-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Metricbeat][Autodiscover Kubernetes] Fix multiple instances reporting same metrics #38471
Conversation
Signed-off-by: constanca <constanca.manteigas@elastic.co>
Signed-off-by: constanca <constanca.manteigas@elastic.co>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please address my comments before merging
Co-authored-by: Giuseppe Santoro <giuseppe.santoro@elastic.co>
Co-authored-by: Giuseppe Santoro <giuseppe.santoro@elastic.co>
Signed-off-by: constanca <constanca.manteigas@elastic.co>
Signed-off-by: constanca <constanca.manteigas@elastic.co>
I finally managed to follow the steps you posted @constanca-m, however even before updating the lease object I already have both hostnames reporting metrics... Which query did you use to take the screen shot you put in the PR description? |
You will only see one host name for the metrics that are unique cluster wide. These metrics are coming from, for example,
Can you filter by the metricset? Example metricset: |
@constanca-m Have you noticed the same behaviour in elastic-agent because the code of leader election provider is somewhat different there ? |
Yes @MichaelKatsoulis , I have not checked, but I documented it in the issue as well as one of the upcoming tasks |
Signed-off-by: constanca <constanca.manteigas@elastic.co>
Signed-off-by: constanca <constanca.manteigas@elastic.co>
Signed-off-by: constanca <constanca.manteigas@elastic.co>
Signed-off-by: constanca <constanca.manteigas@elastic.co>
…g same metrics (#38471) * Fix event id Signed-off-by: constanca <constanca.manteigas@elastic.co> * Update changelog Signed-off-by: constanca <constanca.manteigas@elastic.co> * Update libbeat/autodiscover/providers/kubernetes/kubernetes.go Co-authored-by: Giuseppe Santoro <giuseppe.santoro@elastic.co> * Update libbeat/autodiscover/providers/kubernetes/kubernetes.go Co-authored-by: Giuseppe Santoro <giuseppe.santoro@elastic.co> * add space to log line Signed-off-by: constanca <constanca.manteigas@elastic.co> * change log.debug order Signed-off-by: constanca <constanca.manteigas@elastic.co> * - run leader elector until context is cancelled - add unit tests Signed-off-by: constanca <constanca.manteigas@elastic.co> * fix lint errors Signed-off-by: constanca <constanca.manteigas@elastic.co> * mage check Signed-off-by: constanca <constanca.manteigas@elastic.co> * use assert instead of require Signed-off-by: constanca <constanca.manteigas@elastic.co> * Update changelog Signed-off-by: constanca <constanca.manteigas@elastic.co> * Update changelog Signed-off-by: constanca <constanca.manteigas@elastic.co> * Add test comments Signed-off-by: constanca <constanca.manteigas@elastic.co> * Update docs Signed-off-by: constanca <constanca.manteigas@elastic.co> --------- Signed-off-by: constanca <constanca.manteigas@elastic.co> Co-authored-by: Giuseppe Santoro <giuseppe.santoro@elastic.co> (cherry picked from commit 5947565)
…g same metrics (#38471) (#38761) * Fix event id Signed-off-by: constanca <constanca.manteigas@elastic.co> * Update changelog Signed-off-by: constanca <constanca.manteigas@elastic.co> * Update libbeat/autodiscover/providers/kubernetes/kubernetes.go Co-authored-by: Giuseppe Santoro <giuseppe.santoro@elastic.co> * Update libbeat/autodiscover/providers/kubernetes/kubernetes.go Co-authored-by: Giuseppe Santoro <giuseppe.santoro@elastic.co> * add space to log line Signed-off-by: constanca <constanca.manteigas@elastic.co> * change log.debug order Signed-off-by: constanca <constanca.manteigas@elastic.co> * - run leader elector until context is cancelled - add unit tests Signed-off-by: constanca <constanca.manteigas@elastic.co> * fix lint errors Signed-off-by: constanca <constanca.manteigas@elastic.co> * mage check Signed-off-by: constanca <constanca.manteigas@elastic.co> * use assert instead of require Signed-off-by: constanca <constanca.manteigas@elastic.co> * Update changelog Signed-off-by: constanca <constanca.manteigas@elastic.co> * Update changelog Signed-off-by: constanca <constanca.manteigas@elastic.co> * Add test comments Signed-off-by: constanca <constanca.manteigas@elastic.co> * Update docs Signed-off-by: constanca <constanca.manteigas@elastic.co> --------- Signed-off-by: constanca <constanca.manteigas@elastic.co> Co-authored-by: Giuseppe Santoro <giuseppe.santoro@elastic.co> (cherry picked from commit 5947565) Co-authored-by: Constança Manteigas <113898685+constanca-m@users.noreply.github.com>
Proposed commit message
If the holder of the lease changes when using metricbeat autodiscover, then we have multiple hosts reporting the same metrics.
Please read the issue #38543 for a more detailed description.
This only affects the metrics that are unique cluster wide, like KSM metrics.
Checklist
CHANGELOG.next.asciidoc
orCHANGELOG-developer.next.asciidoc
.How to test this PR locally
If you want to see this fix in action, follow the section Results below.
Edit: here are more detailed steps for this:
Create a two node cluster. You can do it this way.
And
kind-config.yaml
:Build a docker image from this branch, in a place where you have the metricbeat binary.
So build metricbeat with
GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build
in metricbeat directory.You can use this Dockerfile to create the docker image:
Then run:
And then upload it to kind nodes:
Deploy the metricbeat manifest with this image.
I am using this manifest, that only has
state_node
enabled, and a few other metricsets.Don't forget to set up your ES outputs if you are not using the elastic stack.
Then deploy the manifest.
Update the lease object like this, so a lease renewal fail occurrs.
Depending on your current holder, you might have to update
holderIdentity
:Related issues
Closes #34998.
Relates #38543.
Results
Lease belongs to
control-plane
metricbeat instance:Logs from leader,
control-plane
metricbeat instance:Change the leader. You can modify the lease like this, which will cause a failure on lease renewal.
Now we see in the logs from the previous leader
control-plane
metricbeat instance:And in the logs from the new leader
worker
metricbeat instance:Results in discover just reporting metrics from one host.name. Check by comparing the number of documents before and after lease holder changed: they remain the same, so we are not having duplicates like before: