Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate uuid on ingestion to reduce duplicates #974

Merged
merged 1 commit into from Mar 1, 2018

Conversation

jcantrill
Copy link
Contributor

This PR generates record uuids at the source to resolve:

https://bugzilla.redhat.com/show_bug.cgi?id=1548104

where when ES is underload, it fails bulk indexing and retries can produce duplicate entries

@jcantrill jcantrill requested a review from richm February 23, 2018 15:37
@openshift-ci-robot openshift-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Feb 23, 2018
@richm
Copy link
Contributor

richm commented Feb 23, 2018

/test json-file

@richm
Copy link
Contributor

richm commented Feb 23, 2018

/test logging

@jcantrill
Copy link
Contributor Author

/test json-file

@richm
Copy link
Contributor

richm commented Feb 24, 2018

/retest

1 similar comment
@jcantrill
Copy link
Contributor Author

/retest

@richm
Copy link
Contributor

richm commented Feb 24, 2018

please test with #933
or, let's merge 933, then this PR can change the bulk index rejection test to look for expected count == actual count instead of expected count <= actual count

<filter **>
@type elasticsearch_genid
hash_id_key _openshift_es_genid
</filter>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow up to IRC discussion - this should go above filter-post-*.conf. My successful (no duplicates) config was:

<label @INGRESS>
## filters
  @include configs.d/openshift/filter-pre-*.conf
  @include configs.d/openshift/filter-retag-journal.conf
  @include configs.d/openshift/filter-k8s-meta.conf
  @include configs.d/openshift/filter-kibana-transform.conf
  @include configs.d/openshift/filter-k8s-flatten-hash.conf
  @include configs.d/openshift/filter-k8s-record-transform.conf
  @include configs.d/openshift/filter-syslog-record-transform.conf
  @include configs.d/openshift/filter-viaq-data-model.conf
  <filter **>
    @type elasticsearch_genid
    hash_id_key _openshift_es_genid
  </filter>
  @include configs.d/openshift/filter-post-*.conf
</label>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

# requires fluent-plugin-elasticsearch
<filter **>
@type elasticsearch_genid
hash_id_key _openshift_es_genid
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason we are using this particular key only to map and remove it later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what is recommended by the filter. The filter generates the key, copies the value to the 'id' field, and then comments that ES does not like records with '' so it is removed prior to submission

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this file is used.
The "real" fluent.conf comes from ansible.
If we want this in the product by default, we'll need to create configs.d/openshift/filter-post-esid.conf or something like that with this config. And - if we want to easily disable it, we'll need to make the inclusion of this file conditional based on an env var.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack, i read the documentation for the filter type. I didn't realize that id_key in the output plugin denotes what should be used as that key... I read it as it was setting another key to be that value.

Copy link
Contributor

@richm richm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 28, 2018
@jcantrill jcantrill added the backport/3.9 backport this PR to release-3.9 label Feb 28, 2018
@jcantrill
Copy link
Contributor Author

/test json-file

1 similar comment
@richm
Copy link
Contributor

richm commented Feb 28, 2018

/test json-file

@richm
Copy link
Contributor

richm commented Mar 1, 2018

Running test/fluentd-forward.sh:258: executing 'wait_for_fluentd_to_catch_up '' '' 2' expecting success...
FAILURE after 1245.828s: test/fluentd-forward.sh:258: executing 'wait_for_fluentd_to_catch_up '' '' 2' expecting success: the command returned the wrong error code
Standard output from the command:
1
... repeated 853 times

/test json-file

@richm
Copy link
Contributor

richm commented Mar 1, 2018

I think the fluentd-forward test may be getting confused - it may be doing the esid in both fluentd - @jcantrill take a look and see if you need to disable the esid plugin in the client fluentd or the server fluentd

@richm
Copy link
Contributor

richm commented Mar 1, 2018

I see what the problem is - the test is failing here:
https://github.com/openshift/origin-aggregated-logging/blob/master/test/fluentd-forward.sh#L258

The test with FORWARDCNT=2 is testing that you can have a client fluentd sending to 2 server fluentd and each one of them will write the same record, but with a different id, to Elasticsearch. However, with esid, the id is generated in the client fluentd, which is passed through to the 2 servers, one of them gets through, the other is rejected with a duplicate id.

FAILURE after 1243.062s: test/fluentd-forward.sh:258: executing 'wait_for_fluentd_to_catch_up '' '' 2' expecting success: the command returned the wrong error code
Standard output from the command:
1

That is, the test is expecting 2 matching records, but there is only 1.

I think the best fix is to disable the esid plugin on the client fluentd. If that's not possible, then the next best is to create some sort filter to strip out the esid field from the record after generating it but before sending it to the matchs.

@openshift-merge-robot openshift-merge-robot removed the lgtm Indicates that a PR is ready to be merged. label Mar 1, 2018
@@ -75,6 +75,10 @@ update_current_fluentd() {
port 24284\n\
</server>\n\
</store>\n\
<filter **>\n\
@type record_modifier\n\
remove_keys _id\n\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be _openshift_es_genid ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I dont believe so. The gen plugin creates the id and stuffs it in a key that you define. It then uses that to populate the _id field which is what ES uses. We then tell it to remove the original key because apparently ES doesnt like that. In our case are not getting a duplicate record so we need to remove the '_id' field

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, use a record_transformer filter instead of record_modifier

@jcantrill jcantrill force-pushed the 1548104 branch 2 times, most recently from e38d36d to 550a6e2 Compare March 1, 2018 15:35
Copy link
Contributor

@richm richm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 1, 2018
@jcantrill
Copy link
Contributor Author

/test logging

@@ -61,6 +61,10 @@ update_current_fluentd() {
</match>' | oc replace -f -
oc patch configmap/logging-fluentd --type=json --patch '[{ "op": "add", "path": "/data/secure-forward1.conf", "#": "generated config file secure-forward1.conf" }]' 2>&1
oc patch configmap/logging-fluentd --type=json --patch '[{ "op": "replace", "path": "/data/secure-forward1.conf", "value": "\
<filter **>\n\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I missed this earlier - this won't work - you cannot have a <filter> inside of a <match> block - this oc patch is putting the patch inside a <match> block - you might need a separate oc patch in order to edit fluent.conf to add the filter

@openshift-merge-robot openshift-merge-robot removed the lgtm Indicates that a PR is ready to be merged. label Mar 1, 2018
Copy link
Contributor

@richm richm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 1, 2018
@jcantrill
Copy link
Contributor Author

/cherrypick release-3.9

@openshift-cherrypick-robot

@jcantrill: once the present PR merges, I will cherry-pick it on top of release-3.9 in a new PR and assign it to you.

In response to this:

/cherrypick release-3.9

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jcantrill
Copy link
Contributor Author

/cherrypick release-3.7

@openshift-cherrypick-robot

@jcantrill: once the present PR merges, I will cherry-pick it on top of release-3.7 in a new PR and assign it to you.

In response to this:

/cherrypick release-3.7

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jcantrill
Copy link
Contributor Author

/cherrypick release-3.6

@openshift-cherrypick-robot

@jcantrill: once the present PR merges, I will cherry-pick it on top of release-3.6 in a new PR and assign it to you.

In response to this:

/cherrypick release-3.6

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-merge-robot
Copy link
Contributor

/test all [submit-queue is verifying that this PR is safe to merge]

@openshift-merge-robot
Copy link
Contributor

Automatic merge from submit-queue.

@openshift-cherrypick-robot

@jcantrill: new pull request created: #990

In response to this:

/cherrypick release-3.9

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-cherrypick-robot

@jcantrill: new pull request created: #991

In response to this:

/cherrypick release-3.7

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-cherrypick-robot

@jcantrill: new pull request created: #992

In response to this:

/cherrypick release-3.6

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jcantrill jcantrill deleted the 1548104 branch March 7, 2018 16:01
openshift-merge-robot added a commit that referenced this pull request Mar 15, 2018
…74-to-release-3.7

Automatic merge from submit-queue.

[release-3.7] Generate uuid on ingestion to reduce duplicates

This is an automated cherry-pick of #974

/assign jcantrill

Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1556896
openshift-merge-robot added a commit that referenced this pull request Mar 22, 2018
…74-to-release-3.6

Automatic merge from submit-queue.

[release-3.6] Generate uuid on ingestion to reduce duplicates

This is an automated cherry-pick of #974

/assign jcantrill

https://bugzilla.redhat.com/show_bug.cgi?id=1556897
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/completed backport/3.6 backport/3.7 backport/3.9 backport this PR to release-3.9 lgtm Indicates that a PR is ready to be merged. release/3.9 size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants