Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1554293 - logging-eventrouter event not formatted correctly in El… #1357

Merged
merged 1 commit into from Sep 28, 2018

Conversation

nhosoi
Copy link
Contributor

@nhosoi nhosoi commented Sep 22, 2018

…asticsearch when using MUX

Disable process_kubernetes_events if TRANSFORM_EVENTS is false or MUX client.

Depends on openshift/openshift-ansible#10207

@openshift-ci-robot openshift-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Sep 22, 2018
@nhosoi
Copy link
Contributor Author

nhosoi commented Sep 22, 2018

/test logging

@nhosoi nhosoi added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 23, 2018
@nhosoi nhosoi changed the title Bug 1554293 - logging-eventrouter event not formatted correctly in El… [WIP] Bug 1554293 - logging-eventrouter event not formatted correctly in El… Sep 23, 2018
@nhosoi
Copy link
Contributor Author

nhosoi commented Sep 23, 2018

WIP - adding a test case...

@nhosoi
Copy link
Contributor Author

nhosoi commented Sep 23, 2018

/test logging

@nhosoi
Copy link
Contributor Author

nhosoi commented Sep 23, 2018

Unfortunately, the current fix is incomplete. It does not pass my to-be-added-test cases. :(

@openshift-ci-robot openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Sep 25, 2018
@nhosoi nhosoi removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 25, 2018
@nhosoi nhosoi changed the title [WIP] Bug 1554293 - logging-eventrouter event not formatted correctly in El… Bug 1554293 - logging-eventrouter event not formatted correctly in El… Sep 25, 2018
@nhosoi nhosoi requested a review from richm September 25, 2018 15:19
# Make sure there's no MUX
oc set env ds/logging-fluentd MUX_CLIENT_MODE- 2>&1 | artifact_out
oc label node --all logging-infra-fluentd-
oc label node --all logging-infra-fluentd=true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the oc set env has the effect of restarting fluentd - if that is sufficient in this case, you can get rid of the oc label node and add something like this, to wait for fluentd to be up and running. That is, the usual idiom is - label node with logging-infra-fluentd- to shut off fluentd, then wait for the daemonset numberReady to be 0. Then, edit the fluentd ds, configmap, whatever changes are needed to fluentd. Then, relabel the node, and wait for a fluentd pod to be running:

    # undeploy fluentd
    oc label node --all logging-infra-fluentd- 2>&1 | artifact_out
    os::cmd::try_until_text "oc get daemonset logging-fluentd -o jsonpath='{ .status.numberReady }'" "0" $FLUENTD_WAIT_TIME
    # oc set env, edit ds, whatever
    oc label node --all logging-infra-fluentd=true 2>&1 | artifact_out
    os::cmd::try_until_text "oc get pods -l component=fluentd" "^logging-fluentd-.* Running "

@richm
Copy link
Contributor

richm commented Sep 25, 2018

Finally, the eventrouter test is not running:

[INFO] Logging test suite test-eventrouter started at Tue Sep 25 06:15:35 UTC 2018
No resources found.
[WARNING] Eventrouter not deployed
[INFO] Logging test suite test-eventrouter succeeded at Tue Sep 25 06:15:35 UTC 2018

@richm richm added kind/bug Categorizes issue or PR as related to a bug. component/fluentd do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. release/4.0 labels Sep 25, 2018
@richm
Copy link
Contributor

richm commented Sep 25, 2018

/lgtm
on hold until openshift/openshift-ansible#10207 merges

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Sep 25, 2018
@nhosoi
Copy link
Contributor Author

nhosoi commented Sep 25, 2018

Thank you, @richm . I pushed the new patch based on your suggestion. (I just wanted to make sure Fluentd is restarted with the valid MUX_CLIENT_MODE...)

Regarding the issue eventrouter is not enabled, I thought "-e openshift_logging_install_eventrouter=True" was passed to ansible... Where I can enable it?

@richm
Copy link
Contributor

richm commented Sep 25, 2018

Regarding the issue eventrouter is not enabled, I thought "-e openshift_logging_install_eventrouter=True" was passed to ansible... Where I can enable it?

https://github.com/openshift/aos-cd-jobs/tree/master/sjb/config/test_cases - but it is tricky - I think if you re-enable -e openshift_logging_install_eventrouter=True for the "base" jobs like test_cases/test_branch_openshift_ansible_logging.yml and test_cases/test_branch_openshift_ansible_logging_json_file.yml it will apply to all branches. If that is ok, then you could just look for all places where openshift_logging_install_eventrouter is commented out and add them back.

@nhosoi nhosoi added backport/3.9 backport this PR to release-3.9 backport/3.10 backport/3.11 labels Sep 25, 2018
@nhosoi
Copy link
Contributor Author

nhosoi commented Sep 25, 2018

Regarding the issue eventrouter is not enabled, I thought "-e openshift_logging_install_eventrouter=True" was passed to ansible... Where I can enable it?

https://github.com/openshift/aos-cd-jobs/tree/master/sjb/config/test_cases - but it is tricky - I think if you re-enable -e openshift_logging_install_eventrouter=True for the "base" jobs like test_cases/test_branch_openshift_ansible_logging.yml and test_cases/test_branch_openshift_ansible_logging_json_file.yml it will apply to all branches. If that is ok, then you could just look for all places where openshift_logging_install_eventrouter is commented out and add them back.

Indeed, it's tricky... I'd like to enable the eventrouter test in 3.9 and above. To do so, I'm supposed to discomment from "# -e openshift_logging_install_eventrouter=True" and move it to the right place in all these files? (I don't see 39 or older yml files in the aos-cd-jobs/sjb/config/test_cases directory.)

test_branch_openshift_ansible_logging_310.yml
test_branch_openshift_ansible_logging_json_file_310.yml
test_branch_openshift_ansible_logging_json_file.yml
test_branch_openshift_ansible_logging.yml
test_pull_request_origin_aggregated_logging_journald_310.yml
test_pull_request_origin_aggregated_logging_journald.yml

Thanks.

@richm
Copy link
Contributor

richm commented Sep 25, 2018

I'm supposed to discomment from "# -e openshift_logging_install_eventrouter=True" and move it to the right place in all these files?

yes

@nhosoi
Copy link
Contributor Author

nhosoi commented Sep 25, 2018

@richm , eventrouter was disabled with this comment. I wonder the bug has been already solved?

logging - workaround ansible eventrouter recursion bug

@richm
Copy link
Contributor

richm commented Sep 25, 2018

@richm , eventrouter was disabled with this comment. I wonder the bug has been already solved?

I think so - I think it was an ansible version problem

@nhosoi
Copy link
Contributor Author

nhosoi commented Sep 25, 2018

@richm , eventrouter was disabled with this comment. I wonder the bug has been already solved?

I think so - I think it was an ansible version problem

Thank you, @richm !! I've submit this PR.
openshift-eng/aos-cd-jobs#1594

@nhosoi
Copy link
Contributor Author

nhosoi commented Sep 27, 2018

/retest

@nhosoi
Copy link
Contributor Author

nhosoi commented Sep 27, 2018

bot, retest this please

@nhosoi
Copy link
Contributor Author

nhosoi commented Sep 27, 2018

/test logging
/test json-file

…asticsearch when using MUX

When fluentd is configured as a collector and MUX, event logs from the event
router need to be processed by MUX not by the collector fluentd for the both
MUX_CLIENT_MODE=maximal and minimal cases.  It is because if an event log is
formatted in the collector (note: the event record is put under the kubernetes
key), the log is forwarded to MUX and passed to the k8s-meta plugin and the
existing kubernetes record is overwritten.

To avoid the replacement, if the log is from event router, the tag is rewritten
to ${tag}.raw in input-post-forward-mux.conf, which makes the log treated in
the MUX_CLIENT_MODE=minimal way.

There was another bug in ansible.  That is, the environment variable TRANSFORM_
EVENTS was not set in MUX even if openshift_logging_install_eventrouter is set
to true.  This PR fixes the issue.
openshift/openshift-ansible#10207

In Fluentd run.sh, "process_kubernetes_events false" is set in the filter-viaq-
data-model plugin to suppress processing the event logs.  It is _not_ set, i.e.,
event router log processing is enabled, when TRANSFORM_EVENTS is true and the
fluentd is standalone (no MUX configured) or the fluentd is MUX.

Correct the order of shutdown fluentd, reset MUX_CLIENT_MODE, then restart.
@openshift-ci-robot openshift-ci-robot added lgtm Indicates that a PR is ready to be merged. and removed lgtm Indicates that a PR is ready to be merged. labels Sep 28, 2018
Copy link
Contributor

@richm richm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@nhosoi nhosoi removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 28, 2018
@nhosoi
Copy link
Contributor Author

nhosoi commented Sep 28, 2018

Thanks, @richm . Removed the hold label.

@openshift-merge-robot openshift-merge-robot merged commit 414c62c into openshift:master Sep 28, 2018
@nhosoi
Copy link
Contributor Author

nhosoi commented Oct 3, 2018

/cherrypick release-3.11

@openshift-cherrypick-robot

@nhosoi: new pull request created: #1374

In response to this:

/cherrypick release-3.11

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@nhosoi
Copy link
Contributor Author

nhosoi commented Oct 3, 2018

/cherrypick release-3.10

@openshift-cherrypick-robot

@nhosoi: new pull request created: #1375

In response to this:

/cherrypick release-3.10

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@nhosoi
Copy link
Contributor Author

nhosoi commented Oct 3, 2018

/cherrypick release-3.9

@openshift-cherrypick-robot

@nhosoi: new pull request created: #1376

In response to this:

/cherrypick release-3.9

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/completed backport/3.9 backport this PR to release-3.9 backport/3.10 backport/3.11 component/fluentd kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. release/4.0 size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants