Mitigate prometheus RAM flooding #2020
030f103 to 5c23832
Hmm.. it seems I don't have access to the build log. I compared the Cluster.Version call to
I asked upstream if there will be a backport because IMHO there should be one: kubernetes/kubernetes#68530 (comment)
Thanks for fixing the PR :) If it gets cherry picked, the workaround should still be rolled out for versions v1.11.0 - v1.11.x (x being the release including the backport) for stability. (Or am I missing something?)
@cbeneke Yep exactly
The cherry pick for upstream has been merged.
2ddcd27 to 7098de6
kubernetes/kubernetes#68530 fixes a bug, starting with Kubernetes v1.12, that floods the metrics created by the controller-manager. This fix drops all rest_* series from the controller-manager when v1.11.x is used.
7098de6 to bf08a71
@alvaroaleman @kron4eg @cbeneke
/cherry-pick release/v2.8
Error creating cherry-pick due to:
* Mitigate prometheus RAM flooding
  In kubernetes/kubernetes#68530 a bug will be fixed with v1.12 of kubernetes, which floods the metrics created by the controller-manager. This fix will drop all rest_* series from the controller-manager when v1.11.x is used.
* fix tests
* update fixtures
* restrict prometheus drop rule to 1.11.0-1.11.3 as it got fixed in the versions above
* fix test
(cherry picked from commit 20aeed6)
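The restriction of the drop rule to 1.11.0-1.11.3 mentioned in the commit message could be expressed roughly like this. This is only a sketch in Python for illustration, not the code from this PR; `parse_version` and `needs_rest_drop_rule` are hypothetical names, and pre-release suffixes such as `-rc.1` are not handled.

```python
# Hypothetical helper names; the PR itself implements this check differently.
AFFECTED_MIN = (1, 11, 0)  # first affected release, per the commit message
AFFECTED_MAX = (1, 11, 3)  # last affected release, per the commit message


def parse_version(version):
    """Turn '1.11.2' or 'v1.11.2' into a tuple of ints such as (1, 11, 2)."""
    parts = version.lstrip("v").split(".")
    return tuple(int(part) for part in parts)


def needs_rest_drop_rule(cluster_version):
    """Return True when the cluster version lies in the affected range."""
    return AFFECTED_MIN <= parse_version(cluster_version) <= AFFECTED_MAX
```

Comparing tuples keeps the range check readable and avoids string comparison pitfalls such as "1.9" sorting after "1.11".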
To fix the faulty Prometheus clusters, this commit creates a version of #2020 (github.com/kubermatic/kubermatic) which rolls out the metric drop for _all types_ of clusters (the merge itself does not work, as the .TemplateData field is not implemented yet and I'm too lazy to backport this to v2.7.1). ATTENTION: This commit MUST be reverted when #2020 is merged into this repo, or conflicts will occur.
What this PR does / why we need it:
kubernetes/kubernetes#68530 fixes a bug in Kubernetes that floods the metrics created by the controller-manager. As the fix does not seem to be backported to v1.11, this change drops all rest_* series from the controller-manager when any cluster version v1.11.x is used.
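For readers unfamiliar with how such a drop works: in Prometheus, a `metric_relabel_configs` entry with `action: drop` discards every scraped series whose `__name__` matches the regex. The sketch below builds such an entry as a Python dict and emulates its effect on a list of series names. It is an illustration under assumptions, not the rule as written in this PR; the function names and the sample metric names are hypothetical.

```python
import re


def rest_drop_rule():
    """Build a metric_relabel_configs entry that drops every rest_* series."""
    return {
        "source_labels": ["__name__"],
        "regex": "rest_.*",
        "action": "drop",
    }


def apply_drop_rule(rule, series_names):
    """Emulate the drop action on a list of series names.

    Prometheus anchors relabel regexes at both ends, hence fullmatch.
    """
    pattern = re.compile(rule["regex"])
    return [name for name in series_names if not pattern.fullmatch(name)]


# Illustrative series names only; not taken from the PR.
scraped = ["rest_client_requests_total", "workqueue_depth", "go_goroutines"]
kept = apply_drop_rule(rest_drop_rule(), scraped)
```

Dropping at scrape time means the matching series never reach Prometheus storage, which is what relieves the memory pressure described above.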
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Special notes for your reviewer:
Documentation:
Release note: