Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix network value for stats summary for multiple network interfaces #52144

Conversation

andyxning
Copy link
Member

@andyxning andyxning commented Sep 8, 2017

This PR is part of Heapster #1788.

The original reason is when there are more than one none lo, docker0, veth network interfaces instead of just one eth0, the network interface value is only partial and does not correct. For now, summary stats api only gets the eth0 network interface values.

The original issues about this can be find in Heapster #1058 and Cadvisor #1593.

Fix stats summary network value when multiple network interfaces are available.

/cc @DirectXMan12 @piosz @xiangpengzhao @vishh @timstclair

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Sep 8, 2017
@k8s-github-robot k8s-github-robot added do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Sep 8, 2017
@andyxning
Copy link
Member Author

/assign @dchen1107

@andyxning andyxning force-pushed the fix_network_value_for_stats_summary branch from 3862eb9 to 6277040 Compare September 8, 2017 05:35
@andyxning andyxning changed the title fix network value for stats summary fix network value for stats summary for multiple network interfaces Sep 8, 2017
@DirectXMan12
Copy link
Contributor

the actual code seems reasonable, but this probably needs a big bold release note if/when it gets merged, since it changes the meaning of the network stats (they might suddenly be much larger).

@andyxning
Copy link
Member Author

andyxning commented Sep 8, 2017

this probably needs a big bold release note if/when it gets merged

IIUC, @DirectXMan12 you mean that we need to make the release note bold? Maybe we can surround release note by an internal ** which can not be parsed under "```" but will be correctly parsed when added to a change log markdown file.

PTAL.

@andyxning andyxning force-pushed the fix_network_value_for_stats_summary branch from 6277040 to 3748f94 Compare September 9, 2017 00:15
@andyxning
Copy link
Member Author

/test pull-kubernetes-e2e-gce-bazel

@DirectXMan12
Copy link
Contributor

IIUC, @DirectXMan12 you mean that we need to make the release note bold?

I meant that somewhat figuratively ;-).

@DirectXMan12
Copy link
Contributor

this has approval from me, I think, but I'd like SIG Node to chime in for the actual approval.

@andyxning
Copy link
Member Author

/ping @vishh @timstclair @yujuhong

rxBytes += inter.RxBytes
rxErrors += inter.RxErrors
txBytes += inter.TxBytes
txErrors += inter.TxErrors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that summing these statistics over all interfaces is the right approach. For instance, you probably don't care about how much traffic is going over loopback. I think the right solution is to either make it possible to select the monitored interface (see #28407), or modify the API to include a list of network stats.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For instance, you probably don't care about how much traffic is going over loopback.

Actually, when requesting cadvisor to return network interface values, network interface with name contains lo, veth and docker will be ignored.. So, this should not be the problem.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tallclair PTAL.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping @tallclair

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I could be sold on this, but I would prefer to also offer the per-interface stats through another method. Perhaps someone from @kubernetes/sig-network-pr-reviews can chime in about how common multiple-interfaces are, and any other potential side effects to combining the stats.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another example, I just looked at a random node on a GKE cluster and it has a cbr0 device (container bridge), which should definitely be counted separately from the eth0 stats.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just looked at a random node on a GKE cluster and it has a cbr0 device (container bridge), which should definitely be counted separately from the eth0 stats.

@tallclair This should be done in cAdvisor.

@@ -59,6 +59,8 @@ const (
offsetFsTotalUsageBytes
offsetFsBaseUsageBytes
offsetFsInodeUsage

cbr0NetworkSeed = 100
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: call this something other than seed, since it's not seeding anything. Maybe value, since it's fixed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. PTAL.

@andyxning andyxning force-pushed the fix_network_value_for_stats_summary branch from 3748f94 to bdf8b23 Compare September 13, 2017 04:07
@andyxning
Copy link
Member Author

/test pull-kubernetes-e2e-gce-etcd3
/test pull-kubernetes-e2e-gce-bazel

@andyxning
Copy link
Member Author

@andyxning
Copy link
Member Author

@smarterclayton
/sig node

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Sep 20, 2017
@smarterclayton
Copy link
Contributor

You have made me a happy man.

@tallclair
Copy link
Member

Yes, I think this should go in 1.9.

@tallclair tallclair added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. kind/feature Categorizes issue or PR as related to a new feature. kind/bug Categorizes issue or PR as related to a bug. and removed kind/feature Categorizes issue or PR as related to a new feature. labels Nov 28, 2017
@k8s-github-robot k8s-github-robot removed this from the v1.9 milestone Nov 28, 2017
@andyxning
Copy link
Member Author

Ping @thockin @dchen1107 Could you please help to add milestone related label for this to be merged in 1.9?

@dchen1107 dchen1107 added this to the v1.9 milestone Nov 28, 2017
@k8s-github-robot k8s-github-robot removed this from the v1.9 milestone Nov 28, 2017
@dchen1107 dchen1107 added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. and removed priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. milestone/removed labels Nov 28, 2017
@dchen1107 dchen1107 added this to the v1.9 milestone Nov 28, 2017
@k8s-github-robot
Copy link

[MILESTONENOTIFIER] Milestone Pull Request Current

@andyxning @dchen1107 @tallclair @thockin @yujuhong

Note: This pull request is marked as priority/critical-urgent, and must be updated every 1 day during code freeze.

Example update:

ACK.  In progress
ETA: DD/MM/YYYY
Risks: Complicated fix required
Pull Request Labels
  • sig/node: Pull Request will be escalated to these SIGs if needed.
  • priority/critical-urgent: Never automatically move pull request out of a release milestone; continually escalate to contributor and SIG through all available channels.
  • kind/bug: Fixes a bug discovered during the current release.
Help

@dchen1107
Copy link
Member

I bumped the priority to critical-urgent to make sure it is merged into v1.9 since it fixes the real production issues, and risk is very low.

@dchen1107
Copy link
Member

/lgtm

@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andyxning, dchen1107, tallclair

Associated issue: 1788

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot
Copy link

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot
Copy link

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 8226973 into kubernetes:master Nov 28, 2017
@andyxning andyxning deleted the fix_network_value_for_stats_summary branch November 29, 2017 00:30
@andyxning
Copy link
Member Author

Thanks @dchen1107 @tallclair

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet