fix network value for stats summary for multiple network interfaces #52144
/assign @dchen1107
3862eb9 to 6277040
The actual code seems reasonable, but this probably needs a big bold release note if/when it gets merged, since it changes the meaning of the network stats (they might suddenly be much larger).
IIUC, @DirectXMan12 you mean that we need to make the release note bold? Maybe we can surround release note by an internal PTAL.
6277040 to 3748f94
/test pull-kubernetes-e2e-gce-bazel
I meant that somewhat figuratively ;-).
This has approval from me, I think, but I'd like SIG Node to chime in for the actual approval.
/ping @vishh @timstclair @yujuhong
pkg/kubelet/stats/helper.go
    rxBytes += inter.RxBytes
    rxErrors += inter.RxErrors
    txBytes += inter.TxBytes
    txErrors += inter.TxErrors
I don't think that summing these statistics over all interfaces is the right approach. For instance, you probably don't care about how much traffic is going over loopback. I think the right solution is to either make it possible to select the monitored interface (see #28407), or modify the API to include a list of network stats.
> For instance, you probably don't care about how much traffic is going over loopback.

Actually, when cAdvisor is asked for network interface values, any interface whose name contains `lo`, `veth`, or `docker` is ignored. So this should not be a problem.
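To make the filtering described above concrete, here is a minimal Go sketch. It is not cAdvisor's actual code: the `ignoredSubstrings` list and the `isIgnored` and `keepInterfaces` helpers are hypothetical names, and the substring rule is taken only from the comment above.

```go
package main

import (
	"fmt"
	"strings"
)

// ignoredSubstrings is a hypothetical list taken from the comment above;
// the list cAdvisor really uses may differ.
var ignoredSubstrings = []string{"lo", "veth", "docker"}

// isIgnored reports whether an interface name contains any ignored substring.
func isIgnored(name string) bool {
	for _, s := range ignoredSubstrings {
		if strings.Contains(name, s) {
			return true
		}
	}
	return false
}

// keepInterfaces returns the names that survive the filter, in input order.
func keepInterfaces(names []string) []string {
	var kept []string
	for _, n := range names {
		if !isIgnored(n) {
			kept = append(kept, n)
		}
	}
	return kept
}

func main() {
	names := []string{"lo", "eth0", "veth1a2b", "docker0", "cbr0", "eth1"}
	fmt.Println(keepInterfaces(names)) // prints [eth0 cbr0 eth1]
}
```

Under this assumed rule a bridge such as `cbr0` is not filtered out, so it would still be included in any sum over the remaining interfaces.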
@tallclair PTAL.
ping @tallclair
OK, I could be sold on this, but I would prefer to also offer the per-interface stats through another method. Perhaps someone from @kubernetes/sig-network-pr-reviews can chime in about how common multiple-interfaces are, and any other potential side effects to combining the stats.
Another example, I just looked at a random node on a GKE cluster and it has a cbr0 device (container bridge), which should definitely be counted separately from the eth0 stats.
> I just looked at a random node on a GKE cluster and it has a cbr0 device (container bridge), which should definitely be counted separately from the eth0 stats.
@tallclair This should be done in cAdvisor.
@@ -59,6 +59,8 @@ const (
    offsetFsTotalUsageBytes
    offsetFsBaseUsageBytes
    offsetFsInodeUsage

    cbr0NetworkSeed = 100
nit: call this something other than seed, since it's not seeding anything. Maybe value, since it's fixed?
Will do
Done. PTAL.
3748f94 to bdf8b23
/test pull-kubernetes-e2e-gce-etcd3
@smarterclayton
You have made me a happy man.
Yes, I think this should go in 1.9.
Ping @thockin @dchen1107 Could you please help add a milestone-related label so this can be merged in 1.9?
[MILESTONENOTIFIER] Milestone Pull Request Current @andyxning @dchen1107 @tallclair @thockin @yujuhong Note: This pull request is marked as Example update:
Pull Request Labels
I bumped the priority to critical-urgent to make sure it is merged into v1.9, since it fixes real production issues and the risk is very low.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: andyxning, dchen1107, tallclair Associated issue: 1788 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
/test all [submit-queue is verifying that this PR is safe to merge]
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.
Thanks @dchen1107 @tallclair
This PR is part of Heapster #1788.
The underlying problem: when a node has more than one network interface other than `lo`, `docker0`, and `veth` (instead of just a single `eth0`), the reported network values are only partial and therefore incorrect. For now, the summary stats API only reports the `eth0` network interface values. The original issues about this can be found in Heapster #1058 and cAdvisor #1593.
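The difference between the old and new behavior can be sketched roughly as follows. This is an illustration only, not the code in `pkg/kubelet/stats/helper.go`: the `InterfaceStats` type and the `defaultOnly` and `sumAll` functions are hypothetical stand-ins, and only the field names `RxBytes`, `RxErrors`, `TxBytes`, and `TxErrors` come from the diff in this PR.

```go
package main

import "fmt"

// InterfaceStats is a hypothetical, simplified stand-in for the
// per-interface counters that cAdvisor reports.
type InterfaceStats struct {
	Name                                 string
	RxBytes, RxErrors, TxBytes, TxErrors uint64
}

// defaultOnly mimics the old behavior: report a single named interface
// (for example eth0) and drop everything else.
func defaultOnly(ifaces []InterfaceStats, name string) (InterfaceStats, bool) {
	for _, inter := range ifaces {
		if inter.Name == name {
			return inter, true
		}
	}
	return InterfaceStats{}, false
}

// sumAll mimics the new behavior: add up the counters of every interface.
func sumAll(ifaces []InterfaceStats) InterfaceStats {
	var total InterfaceStats
	for _, inter := range ifaces {
		total.RxBytes += inter.RxBytes
		total.RxErrors += inter.RxErrors
		total.TxBytes += inter.TxBytes
		total.TxErrors += inter.TxErrors
	}
	return total
}

func main() {
	ifaces := []InterfaceStats{
		{Name: "eth0", RxBytes: 100, TxBytes: 200, TxErrors: 1},
		{Name: "eth1", RxBytes: 50, RxErrors: 2, TxBytes: 25},
	}
	eth0, _ := defaultOnly(ifaces, "eth0")
	fmt.Println(eth0.RxBytes, sumAll(ifaces).RxBytes) // prints 100 150
}
```

With two interfaces, the `eth0`-only view misses the 50 bytes received on `eth1`, which is the "partial" value described above.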
/cc @DirectXMan12 @piosz @xiangpengzhao @vishh @timstclair