Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve issue preventing networking container stats from being sent to collector #1218

Conversation

tzifudzi
Copy link
Contributor

@tzifudzi tzifudzi commented Jun 3, 2023

What type of PR is this?

  • bugfix

What this PR does / why we need it

  • This fix ensures that network stats for containerd on Windows are successfully collected. Before this change, other container stats such as CPU and memory are successfully collected, but network stats are not being collected successfully when using containerd.
  • The root cause is that the code for collecting network stats was originally written to work with dockershim which relies on HCS v1 schema. After dockershim removal as Kubernetes's container runtime, containerd adoption has increased and this error is more frequently encountered when using containerd as the runtime. containerd uses HCS v2 schema whereby the network stats need to be queried from the HNS component.

Which issue(s) this PR fixes:

Notes for reviewers

  • This is my first pull request in this windows_exporter project. Any feedback and guidance is appreciated.

Testing/Verification

Manually tested by compiling new binary with changes and subsequently installing and running the new binary. Tested on both Docker and containerd.

When testing with containerd on Kubernetes
  • used containerd 1.6.6
  • used Windows Server 2022 Datacenter 10.0.20348.1726
  • used AWS EKS with the AWS VPC CNI plugin
# HELP windows_container_network_transmit_bytes_total Bytes Sent on Interface
# TYPE windows_container_network_transmit_bytes_total counter
windows_container_network_transmit_bytes_total{container_id="containerd://1ef7eb8f8c96538c8395c3b195029e1e674951b0105d14fe31f955b276b54008",interface="467D7000-DEA3-4BD8-B3CE-424CEEB959D0"} 301289
When testing with Docker on Kubernetes
  • used Docker 20.10.21
  • used Windows Server 2022 Datacenter 10.0.20348.1726
  • used Windows Server 2019 Datacenter 10.0.17763.4377
  • used AWS EKS with the AWS VPC CNI plugin
# HELP windows_container_network_transmit_bytes_total Bytes Sent on Interface
# TYPE windows_container_network_transmit_bytes_total counter
windows_container_network_transmit_bytes_total{container_id="docker://04c0c86b64342c4bf9c8390555bc254e5f07f2cc85e046fe8b5fc8d2f9d81106",interface="52A2F900-32C7-4CB1-832D-8108ED966FB4"} 241375
When testing with Docker on plain Windows host (without Kubernetes)
  • used Docker 24.0.2
  • used Windows Server 2022 Datacenter 10.0.20348.1787
# HELP windows_container_network_transmit_bytes_total Bytes Sent on Interface
# TYPE windows_container_network_transmit_bytes_total counter
windows_container_network_transmit_bytes_total{container_id="docker://062ac7f1925dd3a7054089b05bf9b2b8bdd4920d89fe3b944584e1dbf7e98065",interface="87AD7194-4445-4D5B-869F-5B6AF713C121"} 15745

@tzifudzi tzifudzi force-pushed the bugfix/missing-container-metrics branch 3 times, most recently from 32b7066 to 7ecf363 Compare June 4, 2023 00:13
@tzifudzi tzifudzi changed the title Resolve missing container metrics by querying from HNS Resolve isue preventing networking container stats from being sent to collector Jun 4, 2023
@tzifudzi tzifudzi changed the title Resolve isue preventing networking container stats from being sent to collector Resolve issue preventing networking container stats from being sent to collector Jun 4, 2023
@tzifudzi tzifudzi force-pushed the bugfix/missing-container-metrics branch from 7ecf363 to cec6ae8 Compare June 4, 2023 03:51
@tzifudzi tzifudzi marked this pull request as ready for review June 4, 2023 03:54
@tzifudzi tzifudzi requested a review from a team as a code owner June 4, 2023 03:54
collector/container.go Outdated Show resolved Hide resolved
collector/container.go Outdated Show resolved Hide resolved
@tzifudzi
Copy link
Contributor Author

tzifudzi commented Jun 4, 2023

@jsturtevant Please take a look at my change. Directly mentioning you here as I would appreciate your review.

Copy link
Contributor

@jsturtevant jsturtevant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for picking this up!

collector/container.go Outdated Show resolved Hide resolved
collector/container.go Show resolved Hide resolved
collector/container.go Outdated Show resolved Hide resolved
@tzifudzi tzifudzi force-pushed the bugfix/missing-container-metrics branch from cec6ae8 to 9d5fbfa Compare June 11, 2023 21:25
@tzifudzi tzifudzi force-pushed the bugfix/missing-container-metrics branch 2 times, most recently from 8a80aa1 to 9e0fb3c Compare June 11, 2023 22:25
collector/container.go Outdated Show resolved Hide resolved
@jsturtevant
Copy link
Contributor

@breed808 I think this is close, @tzifudzi is looking into one more thing #1218 (comment) but could we run CI against it?

@breed808
Copy link
Contributor

I suspect the merge conflict is preventing the CI checks from running, might be worth rebasing the feature branch on master and resolving the conflict.

Aside from that, great work on this PR so far! I must profess my lack of k8s+Windows experience, but the conversation in this PR is reassuring: there's been a strong focus on stability and testing 👍

@tzifudzi tzifudzi force-pushed the bugfix/missing-container-metrics branch 3 times, most recently from 24d226b to a4bf8d8 Compare July 20, 2023 05:53
@tzifudzi
Copy link
Contributor Author

@breed808 Thanks for the great feedback. Conflicts have been resolved and I also re-tested the changes manually in nodes with different configurations as listed in the description. Let me know if that resolved the CI checks. Awaiting @jsturtevant's response for one last comment and after that we should be good to go.

collector/container.go Outdated Show resolved Hide resolved
This fix ensures that network stats for containerd on Windows are successfully collected. Before this change, other container stats such as CPU and memory are successfully collected, but network stats are failing for containerd.

The root cause is that the code for collecting network stats was originally written to work with docker which relies on v1 schema. After dockershim removal as Kubernetes's container runtime, containerd adoption has increased and this error is more frequently encountered when using containerd as  the runtime. containerd uses v2 schema whereby the network stats need to be queried from the HNS component.

Signed-off-by: Tatenda Zifudzi <tzifudzi@amazon.com>
@tzifudzi tzifudzi force-pushed the bugfix/missing-container-metrics branch from a4bf8d8 to 9adefdc Compare July 20, 2023 18:24
@jsturtevant
Copy link
Contributor

LGTM

Copy link
Contributor

@breed808 breed808 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Great work with this one.

@breed808 breed808 merged commit f068cf4 into prometheus-community:master Jul 21, 2023
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Missing metrics from 'container' collector Lots of warnings: No Network Stats for container
3 participants