New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: remove kata_shim_netdev metric #9100
Merged
justxuewei
merged 2 commits into
kata-containers:main
from
littlejawa:fix_5738_metrics_memory
Feb 26, 2024
Merged
runtime: remove kata_shim_netdev metric #9100
justxuewei
merged 2 commits into
kata-containers:main
from
littlejawa:fix_5738_metrics_memory
Feb 26, 2024
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
littlejawa
force-pushed
the
fix_5738_metrics_memory
branch
from
February 15, 2024 09:47
6474fe4
to
9ff9857
Compare
/test |
gkurz
approved these changes
Feb 21, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @littlejawa !
@littlejawa you need to rebase to get the fix from #9112 . |
As part of the shim network metrics, the shim is reporting network interfaces from the host with no namespace isolation - this gives insight in interfaces not tied to the kata containers, and causes an increase in resource usage for kata metrics. As the shim itself is not using the network (all its communication with other processes is done with local unix sockets), there is no reason to keep gathering and reporting shim-specific network metrics. Actual network usage of the kata containers can be found from the existing hypervisor network metrics (kata_hypervisor_netdev) and from the agent network metrics (kata_guest_netdev_stat). Fixes: kata-containers#5738 Signed-off-by: Julien Ropé <jrope@redhat.com>
For consistency with the go runtime. As the shim itself is not using the network (all its communication with other processes is done with local unix sockets), there is no reason to keep gathering and reporting shim-specific network metrics. Actual network usage of the kata containers can be found from the existing agent network metrics (kata_guest_netdev_stat). Signed-off-by: Julien Ropé <jrope@redhat.com>
littlejawa
force-pushed
the
fix_5738_metrics_memory
branch
from
February 22, 2024 13:00
9ff9857
to
1c306fe
Compare
/test |
justxuewei
approved these changes
Feb 26, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm, thanks!
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Following discussion on #5738 - I think the
kata_shim_netdev
metrics is unneeded.It is not only using too much memory (causing Prometheus pods to be OOM-killed in some installations), but it also reports data from interfaces that are unrelated to kata (no namespace isolation allowing the reporting of host-wide metrics).
The shim network usage is not giving any useful insight anyway - we already have hypervisor and agent metrics for that.
So rather than trying to fix the memory/namespace question, I'm suggesting to just remove the shim netdev metrics.
I'm also doing it on the rust implementation, even if it's not impacted by the initial bug as far as I can tell: In the rutime-rs case, the metrics are isolated (not verified, but that's my understanding at this point). Also it seems we don't have hypervisor netdev metrics coming from runtime-rs, but we still have the agent's network metrics.
I feel it's worth removing anyway, if only for consistency, but I'm making it in a separate commit in case reviewers ask to keep it as it is.