
Increase the period of getting container filesystem/network stats #898

Closed

yujuhong opened this issue Sep 29, 2015 · 12 comments

@yujuhong (Contributor)

Frequent checking could cause high CPU usage, as reported by a Kubernetes user in kubernetes/kubernetes#10451 (comment).

Edit: the current housekeeping period is 1s.

/cc @vishh

@yujuhong (Contributor, Author)

/cc @dchen1107

@jimmidyson (Collaborator)

From the referenced issue it seems running du is the culprit. Why does cadvisor run du at all? I don't think we care much about the size of any individual dirs, do we? Shouldn't df or equivalent give enough info?

@jimmidyson (Collaborator)

AFAICT this du check only runs on aufs-backed docker, basically docker running on Ubuntu. Firstly, I don't really like that inconsistency, but secondly, I'm not sure I see the value of this check anyway. What do you think about removing it? In the meantime I'll have a think about how we might implement it in a better fashion.

Alternatively we could reduce the polling frequency, but that just seems like delaying the problem until the number of containers rises.

@yujuhong (Contributor, Author)

AFAICT this du check only runs on aufs-backed docker, basically docker running on Ubuntu.

Do we not check disk usage in other cases?

I think we want more disk usage information exposed to the kubelet. @vishh is probably the one who added this, so he'd defend the choice better than I could.

I agree that reducing the polling frequency is only a short-term solution, but it'd help for the time being since we've got quite a few reports from users. A better implementation is more than welcome :)

@jimmidyson (Collaborator)

Yes, only aufs. See:

    if !self.usesAufsDriver {

If this check is required, the way to perform it depends on the backing storage used for docker. In its current form this check would probably also work for devicemapper loopback, but it isn't going to work for a more production-like deployment such as direct LVM.

However, I can't think of a better way to do it tbh, and I can't see this performing well enough at scale. I still vote to drop the check and open an issue to think about whether we can do this better.

@jimmidyson (Collaborator)

You probably already know this, but I thought it worth noting why du causes CPU spikes. The only way for du to know a dir's size is to traverse the file tree & sum file sizes from file metadata. Unless the files are cached, this is a relatively expensive operation. It also churns the disk cache, which can hurt the performance of other running processes, since files they access may have been evicted & need to be recached.
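For illustration, here's a minimal Go sketch of the kind of work a du-style scan has to do: walk every entry under the tree and sum sizes from per-file metadata, so the cost scales with the number of files, not the bytes used. The path is a hypothetical example and this is not cadvisor's actual code.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// dirSize walks the whole tree under root and sums file sizes from
// metadata. Every file costs at least one stat, which is why this is
// expensive for large trees and pulls inodes through the disk cache.
func dirSize(root string) (int64, error) {
	var total int64
	err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		if !info.IsDir() {
			total += info.Size()
		}
		return nil
	})
	return total, err
}

func main() {
	// Hypothetical aufs layer directory, for illustration only.
	size, err := dirSize("/var/lib/docker/aufs/diff")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%d bytes\n", size)
}
```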

@yujuhong (Contributor, Author)

yujuhong commented Oct 1, 2015

I am okay with disabling this for now since the kubelet hasn't started using it, but I think we'll still want the disk usage information in the near future. @vishh, WDYT?

@vishh (Contributor)

vishh commented Oct 2, 2015

Filesystem stats are useful mainly for figuring out which container is hogging disk space on a given node.
As you said @jimmidyson, there is no easy way to make this work across all storage drivers in docker.
I'm working on a quota-based approach, but even that won't work in all deployments, since it requires setting up quotas.
I personally think the fs usage feature will be useful.
If we can, we should probably add support for other storage backends like lvm and overlay.
For now, reducing the frequency will help. Another option is to place ulimits on du when exec'ing it.
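As a rough sketch of bounding an exec'd du: instead of a literal ulimit (which is awkward to apply to a child process from Go), this lowers the child's scheduling priority via nice(1) and enforces a wall-clock timeout. The timeout value and path are illustrative assumptions, not cadvisor's actual implementation.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

// duWithBudget runs `nice -n 19 du -s <dir>` at the lowest CPU priority;
// the context kills the child if it runs past the timeout, so a huge tree
// can't pin the CPU or hang stats collection indefinitely.
func duWithBudget(dir string, timeout time.Duration) ([]byte, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	cmd := exec.CommandContext(ctx, "nice", "-n", "19", "du", "-s", dir)
	return cmd.Output()
}

func main() {
	// Hypothetical aufs layer directory and budget, for illustration only.
	out, err := duWithBudget("/var/lib/docker/aufs/diff", 30*time.Second)
	if err != nil {
		fmt.Println("du failed or timed out:", err)
		return
	}
	fmt.Print(string(out))
}
```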

@jimmidyson (Collaborator)

See #771 for another reason not to run du inside the containers' filesystem - it blocks container deletes.

@dchen1107 (Collaborator)

@jimmidyson du is a temporary workaround for disk usage tracking in cAdvisor without disk quota. We are working on a proposal / prototype for better disk usage tracking; one proposal is to use disk quota tracking. But before we get there, we need signals at least to detect an out-of-disk condition and propagate that information to upstream layers for management. Thus, increasing the interval for filesystem stats might be an ok workaround for the short term.

@jimmidyson (Collaborator)

@dchen1107 du isn't giving you out-of-disk notifications; that would be handled via df or similar, I'd suggest. du is giving you information on where your disk is being used up, which I agree is useful, but it currently has, to my mind, unacceptable impacts on both performance (high IO & CPU) & stability (blocking container GC). It is also only implemented for aufs, which isn't great.

If we could somehow swap to df or equivalent for the storage backends, that might be a better approach, but I have no idea if that is possible.

@vishh (Contributor)

vishh commented Oct 7, 2015

@jimmidyson: We do not use du for out-of-disk conditions; we use statfs. I intend to add support for devicemapper and overlayfs soon, which should address the "aufs only" concern.
As @dchen1107 mentioned, identifying and getting rid of a disk-hogging container is very useful in practice.
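For contrast with du, here's a minimal Linux-only Go sketch of a statfs-based check: statfs(2) reads whole-filesystem counters in constant time, which is enough to detect an out-of-disk condition, though it can't tell you which container is responsible. This is not cadvisor's actual code.

```go
package main

import (
	"fmt"
	"syscall"
)

// fsUsage returns total and available bytes for the filesystem containing
// path. Unlike du, this is one syscall regardless of how many files exist.
func fsUsage(path string) (totalBytes, freeBytes uint64, err error) {
	var s syscall.Statfs_t
	if err = syscall.Statfs(path, &s); err != nil {
		return 0, 0, err
	}
	bsize := uint64(s.Bsize)
	return s.Blocks * bsize, s.Bavail * bsize, nil
}

func main() {
	total, free, err := fsUsage("/var/lib/docker")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("total=%d free=%d used%%=%.1f\n",
		total, free, 100*float64(total-free)/float64(total))
}
```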
