Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mitigate prometheus RAM flooding #2020

Merged
merged 6 commits into from Oct 11, 2018

Conversation

cbeneke
Copy link
Contributor

@cbeneke cbeneke commented Sep 18, 2018

What this PR does / why we need it:

kubernetes/kubernetes#68530 fixes a bug in kubernetes, which floods the metrics created by the controller-manager. As the fix seems not to be backported to v1.11 this change will drop all rest_* series from the controller-manager when any clusterversion v1.11.x is used.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

Special notes for your reviewer:

Documentation:

Release note:

Version v1.11.0 - 1.11.3 Clusters will no longer gather `rest_*` metrics from the controller-manager due to a [bug in kubernetes](https://github.com/kubernetes/kubernetes/pull/68530)

@cbeneke cbeneke force-pushed the fix/mitigate-prometheus-ram-flooding branch from 030f103 to 5c23832 Compare September 18, 2018 14:54
@cbeneke
Copy link
Contributor Author

cbeneke commented Sep 18, 2018

Hmm.. it seems I don't have access to the build log. I compared the Cluster.Version call to

{{- if semverCompare ">=1.9.*" .Cluster.Spec.Version }}
.. updated to be in the TemplateData though, as I suspected the structure is comparable. Would be nice if someone can lighten me up how to correctly compare the version there. :)

@alvaroaleman
Copy link
Contributor

I asked upstream if there will be a backport because IMHO there should be one: kubernetes/kubernetes#68530 (comment)

alvaroaleman
alvaroaleman previously approved these changes Sep 19, 2018
@cbeneke
Copy link
Contributor Author

cbeneke commented Sep 19, 2018

Thanks for fixing the PR :) If it gets cherry picked, the workaround should still be rolled out for versions v1.11.0 - v1.11.x (x being the release including the backport) for stability. (Or am I missing something?)

@alvaroaleman
Copy link
Contributor

@cbeneke Yep exactly

kron4eg
kron4eg previously approved these changes Sep 19, 2018
@mrIncompetent
Copy link
Contributor

The cherry pick for upstream has been merged.
We can now limit the affected versions to 1.11.0 - 1.11.3

@mrIncompetent mrIncompetent force-pushed the fix/mitigate-prometheus-ram-flooding branch from 2ddcd27 to 7098de6 Compare October 10, 2018 17:24
cbeneke and others added 4 commits October 10, 2018 19:25
In kubernetes/kubernetes#68530 a bug will be
fixed with v1.12 of kubernetes, which floods the metrics created by
the controller-manager. This fix will drop all rest_* series from
the controller-manager when v1.11.x is used.
@mrIncompetent mrIncompetent force-pushed the fix/mitigate-prometheus-ram-flooding branch from 7098de6 to bf08a71 Compare October 10, 2018 17:25
@mrIncompetent
Copy link
Contributor

@alvaroaleman @kron4eg @cbeneke
I updated the config to only drop those metrics for 1.11.0 - 1.11.3.
PTAL

@mrIncompetent
Copy link
Contributor

/cherry-pick release/v2.8

@mrIncompetent mrIncompetent merged commit 20aeed6 into master Oct 11, 2018
@mrIncompetent mrIncompetent deleted the fix/mitigate-prometheus-ram-flooding branch October 11, 2018 09:35
@kubermatic-bot
Copy link
Contributor

Error creating cherry-pick due to: error executing command git -C REDACTED cherry-pick 20aeed6f6677836021943400203e5709e221333d: out=error: could not apply 20aeed6f... Mitigate prometheus RAM flooding (#2020) hint: after resolving the conflicts, mark the corrected paths hint: with 'git add <paths>' or 'git rm <paths>' hint: and commit the result with 'git commit' , err=exit status 1

mrIncompetent pushed a commit that referenced this pull request Oct 11, 2018
* Mitigate prometheus RAM flooding

In kubernetes/kubernetes#68530 a bug will be
fixed with v1.12 of kubernetes, which floods the metrics created by
the controller-manager. This fix will drop all rest_* series from
the controller-manager when v1.11.x is used.

* fix tests

* update fixtures

* restrict prometheus drop rule to 1.11.0-1.11.3 as it got fixed in the versions above

* fix test

(cherry picked from commit 20aeed6)
mrIncompetent added a commit that referenced this pull request Oct 11, 2018
* Mitigate prometheus RAM flooding

In kubernetes/kubernetes#68530 a bug will be
fixed with v1.12 of kubernetes, which floods the metrics created by
the controller-manager. This fix will drop all rest_* series from
the controller-manager when v1.11.x is used.

* fix tests

* update fixtures

* restrict prometheus drop rule to 1.11.0-1.11.3 as it got fixed in the versions above

* fix test

(cherry picked from commit 20aeed6)
@kubermatic-bot kubermatic-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 7, 2019
multi-io referenced this pull request in multi-io/kubermatic-old Nov 7, 2019
To fix the faulty prometheus clusters this commit creates a version
of #2020 (github.com/kubermatic/kubermatic) which rolls out the
metric drop for _all types_ of clusters (the merge itself does not
work, as the .TemplateData field is not implemented yet and im too
lazy to backport this to v2.7.1).
ATTENTION: This commit MUST be reverted when #2020 is merged into this
repo or conflicts will occur
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-pick/release/v2.8 release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants