Improve cadvisor (integration) tests #55398

Open
discordianfish opened this Issue Nov 9, 2017 · 11 comments

7 participants
@discordianfish
Contributor

discordianfish commented Nov 9, 2017

/kind feature

Every other Kubernetes release, the metrics provided by cAdvisor seem to break. This just happened again in #55397, and earlier in #48483, #33192 and #25131.

These issues usually end up not only in stable releases but even on cloud providers like GKE, where those are sometimes the only supported versions.

I think this could be prevented by improving the cAdvisor integration tests (if there are any yet). At a minimum, they should check that cAdvisor exposes plausible metrics for memory, CPU, disk I/O, filesystem and networking.
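A minimal sketch of such a plausibility check, written in Python for brevity (the Kubernetes test suites themselves are in Go). It assumes the scrape is in the Prometheus text exposition format. The helper names (`parse_samples`, `check_plausible`) are hypothetical, and the metric names in `REQUIRED_METRICS` are assumptions to verify against the cAdvisor version under test. "Plausible" here only means that every required metric is present, all of its samples are finite and non-negative, and at least one series is non-zero.

```python
import math
import re

# ASSUMPTION: metric names cAdvisor has exposed in the versions I know of.
# Verify them against the cAdvisor version being bumped, since names and
# labels have changed between releases.
REQUIRED_METRICS = {
    "memory": "container_memory_usage_bytes",
    "cpu": "container_cpu_usage_seconds_total",
    "disk io": "container_fs_reads_bytes_total",
    "filesystem": "container_fs_usage_bytes",
    "networking": "container_network_receive_bytes_total",
}

_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def parse_samples(text):
    """Parse Prometheus text exposition format into {metric_name: [values]}."""
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE comment
        match = _NAME_RE.match(line)
        if not match:
            raise ValueError(f"unparseable line: {raw!r}")
        name, rest = match.group(0), line[match.end():]
        if rest.startswith("{"):
            # Skip the label set; quoted label values may contain '}' or escapes.
            i, in_quotes = 1, False
            while i < len(rest):
                c = rest[i]
                if in_quotes and c == "\\":
                    i += 1  # skip the escaped character
                elif c == '"':
                    in_quotes = not in_quotes
                elif c == "}" and not in_quotes:
                    break
                i += 1
            rest = rest[i + 1:]
        fields = rest.split()  # sample value, then an optional timestamp
        if not fields:
            raise ValueError(f"missing sample value: {raw!r}")
        samples.setdefault(name, []).append(float(fields[0]))
    return samples


def check_plausible(text, required=REQUIRED_METRICS):
    """Return a list of human-readable problems; an empty list means it passed."""
    samples = parse_samples(text)
    problems = []
    for resource, name in required.items():
        values = samples.get(name)
        if not values:
            problems.append(f"{resource}: {name} missing")
        elif any(not math.isfinite(v) or v < 0 for v in values):
            problems.append(f"{resource}: {name} has negative or non-finite samples")
        elif all(v == 0 for v in values):
            problems.append(f"{resource}: {name} is zero for every series")
    return problems


if __name__ == "__main__":
    scrape = """\
# TYPE container_memory_usage_bytes gauge
container_memory_usage_bytes{id="/"} 1.2e+09
container_cpu_usage_seconds_total{id="/",cpu="total"} 1234.5
container_fs_reads_bytes_total{id="/",device="/dev/sda"} 4096
container_fs_usage_bytes{id="/",device="/dev/sda"} 0
container_network_receive_bytes_total{id="/",interface="eth0"} 98765
"""
    print(check_plausible(scrape))
    # prints ['filesystem: container_fs_usage_bytes is zero for every series']
```

In a real test the scrape would come from a running node rather than a string literal; the all-zero check is a heuristic, since an all-zero family is a common symptom of the kind of breakage described above.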

@dims

Member

dims commented Nov 13, 2017

/sig node

@discordianfish

Contributor

discordianfish commented Nov 15, 2017

In the meantime I ran into the next metrics-related regression. This time it's not cAdvisor, so maybe the scope should be broadened to integration/regression testing for metrics in general. The regressions this time are #52121 and #53485.

@fejta-bot

fejta-bot commented Feb 13, 2018

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@discordianfish

Contributor

discordianfish commented Feb 13, 2018

/remove-lifecycle stale
/lifecycle freeze

@discordianfish

Contributor

discordianfish commented Feb 13, 2018

@brancz Is that something you can get someone to look into?

@brancz

Member

brancz commented Feb 13, 2018

Unlikely. Personally, cAdvisor has been so unreliable for Prometheus metrics that we try not to rely on it and are even looking into collecting the stats differently. Aside from that, I would say this mostly belongs in the cAdvisor repo.

@discordianfish

Contributor

discordianfish commented Feb 14, 2018

@brancz There's no need to stabilize cAdvisor or improve that project itself; I was thinking more about better testing when bumping the cAdvisor version. And it's not cAdvisor-specific but applies to other metric-related changes too (e.g. #52121 and #53485). It would just be great to get people to pay more attention to metric changes in PRs.
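One way to make metric changes visible in review is to scrape the same node before and after a version bump and diff the exposed metric names. The sketch below is a hypothetical helper (`diff_metric_names` is not an existing Kubernetes or cAdvisor tool) and assumes both scrapes are in the Prometheus text exposition format. It only compares metric names; renamed or dropped labels and broken values would need a fuller comparison.

```python
import re

_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def metric_names(text):
    """Return the set of metric names in a Prometheus text-format scrape."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _NAME_RE.match(line)
        if match:
            names.add(match.group(0))
    return names


def diff_metric_names(before, after):
    """Compare two scrapes; return (removed, added) metric names, sorted."""
    old, new = metric_names(before), metric_names(after)
    return sorted(old - new), sorted(new - old)


if __name__ == "__main__":
    before = "container_cpu_usage_seconds_total 1\ncontainer_memory_usage_bytes 2\n"
    after = "container_cpu_usage_seconds_total 1\ncontainer_memory_working_set_bytes 2\n"
    removed, added = diff_metric_names(before, after)
    print("removed:", removed)  # removed: ['container_memory_usage_bytes']
    print("added:", added)  # added: ['container_memory_working_set_bytes']
```

Posting the `removed` list in a PR that bumps cAdvisor (or touches metrics) would at least force a reviewer to acknowledge each disappearing metric.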

@brancz

Member

brancz commented Feb 14, 2018

Understood. I don't have a great answer for you: I would love to, but I don't have the resources to do this. Maybe it's a topic you can bring up with the CNCF CI working group? They already test Prometheus on Kubernetes, so maybe it's something they could look into. (We're also working with them on some unrelated Prometheus CI things; maybe I can bring it up there.)

@fejta-bot

fejta-bot commented May 15, 2018

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@discordianfish

Contributor

discordianfish commented May 15, 2018

/remove-lifecycle stale
/lifecycle freeze

@techtonik

techtonik commented Jul 5, 2018

So which specific metrics should be collected for now to close this issue? I guess the rest of the metrics could be picked up as more issues arise.
