Generate uuid on ingestion to reduce duplicates #974
/test json-file
/test logging
/test json-file
/retest
/retest
please test with #933
fluentd/configs.d/user/fluent.conf
Outdated
<filter **>
@type elasticsearch_genid
hash_id_key _openshift_es_genid
</filter>
Follow up to IRC discussion - this should go above filter-post-*.conf. My successful (no duplicates) config was:
<label @INGRESS>
## filters
@include configs.d/openshift/filter-pre-*.conf
@include configs.d/openshift/filter-retag-journal.conf
@include configs.d/openshift/filter-k8s-meta.conf
@include configs.d/openshift/filter-kibana-transform.conf
@include configs.d/openshift/filter-k8s-flatten-hash.conf
@include configs.d/openshift/filter-k8s-record-transform.conf
@include configs.d/openshift/filter-syslog-record-transform.conf
@include configs.d/openshift/filter-viaq-data-model.conf
<filter **>
@type elasticsearch_genid
hash_id_key _openshift_es_genid
</filter>
@include configs.d/openshift/filter-post-*.conf
</label>
fixed
fluentd/configs.d/user/fluent.conf
Outdated
# requires fluent-plugin-elasticsearch
<filter **>
@type elasticsearch_genid
hash_id_key _openshift_es_genid
is there a reason we are using this particular key only to map and remove it later?
This is what is recommended by the filter. The filter generates the key, the value is copied to the '_id' field, and the plugin documentation then comments that ES does not like records with '_' keys, so it is removed prior to submission.
I don't think this file is used.
The "real" fluent.conf comes from ansible.
If we want this in the product by default, we'll need to create configs.d/openshift/filter-post-esid.conf or something like that with this config. And - if we want to easily disable it, we'll need to make the inclusion of this file conditional based on an env var.
ack, i read the documentation for the filter type. I didn't realize that id_key in the output plugin denotes what should be used as that key... I read it as it was setting another key to be that value.
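For readers following the exchange above, here is a minimal sketch of how the filter-side and output-side settings pair up. It follows the pattern in the fluent-plugin-elasticsearch documentation rather than the actual output configuration of this project; the `host` and `port` values are placeholders and are not taken from this PR.

```
# Filter side: generate a unique hash for each record and store it
# under the key named by hash_id_key.
<filter **>
  @type elasticsearch_genid
  hash_id_key _openshift_es_genid
</filter>

# Output side (sketch): id_key names the record field whose value is
# used as the Elasticsearch document _id. remove_keys then drops that
# field from the record body before it is submitted.
<match **>
  @type elasticsearch
  # Placeholder connection settings, not from this PR.
  host es.example.com
  port 9200
  id_key _openshift_es_genid
  remove_keys _openshift_es_genid
</match>
```

Because the id is fixed when the record is ingested, a retried bulk request should write to the same document id instead of creating a second document.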
/lgtm
/test json-file
/test json-file
/test json-file
I think the fluentd-forward test may be getting confused - it may be doing the esid in both fluentd instances - @jcantrill take a look and see if you need to disable the esid plugin in the client fluentd or the server fluentd
I see what the problem is - the test is failing here: The test with That is, the test is expecting 2 matching records, but there is only 1. I think the best fix is to disable the esid plugin on the client fluentd. If that's not possible, then the next best is to create some sort of filter to strip out the esid field from the record after generating it but before sending it to the
test/fluentd-forward.sh
Outdated
@@ -75,6 +75,10 @@ update_current_fluentd() {
port 24284\n\
</server>\n\
</store>\n\
<filter **>\n\
@type record_modifier\n\
remove_keys _id\n\
should this be _openshift_es_genid ?
No, I don't believe so. The gen plugin creates the id and stuffs it in a key that you define. It then uses that to populate the _id field, which is what ES uses. We then tell it to remove the original key because apparently ES doesn't like that. In our case we are not getting a duplicate record, so we need to remove the '_id' field.
Also, use a record_transformer filter instead of record_modifier
Force-pushed from e38d36d to 550a6e2
/lgtm
/test logging
test/fluentd-forward.sh
Outdated
@@ -61,6 +61,10 @@ update_current_fluentd() {
</match>' | oc replace -f -
oc patch configmap/logging-fluentd --type=json --patch '[{ "op": "add", "path": "/data/secure-forward1.conf", "#": "generated config file secure-forward1.conf" }]' 2>&1
oc patch configmap/logging-fluentd --type=json --patch '[{ "op": "replace", "path": "/data/secure-forward1.conf", "value": "\
<filter **>\n\
Sorry, I missed this earlier - this won't work. You cannot have a <filter> inside of a <match> block, and this oc patch is putting the filter inside a <match> block. You might need a separate oc patch to edit fluent.conf and add the filter there.
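To illustrate the point about nesting, here is a rough sketch of the layout Fluentd accepts: the filter is a sibling directive placed before the `<match>`, not a child of it. The `secure_forward` output type and the elided settings are assumptions made for illustration. The `remove_keys _id` value mirrors the diff under review, and `record_transformer` follows the suggestion made earlier in this thread.

```
# Not valid: a <filter> nested inside a <match> block.
# <match **>
#   <filter **>
#     ...
#   </filter>
# </match>

# Sketch: the filter as its own top-level directive, ahead of the match,
# so records pass through it before they reach the output.
<filter **>
  @type record_transformer
  remove_keys _id
</filter>

<match **>
  # Assumed output type; the existing forwarding settings stay unchanged.
  @type secure_forward
</match>
```

Fluentd evaluates directives in file order, so the filter has to appear before the `<match>` that consumes the records.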
/lgtm
/cherrypick release-3.9
@jcantrill: once the present PR merges, I will cherry-pick it on top of release-3.9 in a new PR and assign it to you.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/cherrypick release-3.7
@jcantrill: once the present PR merges, I will cherry-pick it on top of release-3.7 in a new PR and assign it to you.
/cherrypick release-3.6
@jcantrill: once the present PR merges, I will cherry-pick it on top of release-3.6 in a new PR and assign it to you.
/test all [submit-queue is verifying that this PR is safe to merge]
Automatic merge from submit-queue.
@jcantrill: new pull request created: #990
@jcantrill: new pull request created: #991
@jcantrill: new pull request created: #992
…74-to-release-3.7 Automatic merge from submit-queue. [release-3.7] Generate uuid on ingestion to reduce duplicates This is an automated cherry-pick of #974 /assign jcantrill Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1556896
…74-to-release-3.6 Automatic merge from submit-queue. [release-3.6] Generate uuid on ingestion to reduce duplicates This is an automated cherry-pick of #974 /assign jcantrill https://bugzilla.redhat.com/show_bug.cgi?id=1556897
This PR generates record uuids at the source to resolve:
https://bugzilla.redhat.com/show_bug.cgi?id=1548104
where, when ES is under load, bulk indexing fails and retries can produce duplicate entries.