Cadvisor now publishes per-container inode stats#1489
timstclair merged 1 commit into google:master
Conversation
Issue: kubernetes/kubernetes#33382
timstclair left a comment

I left a few suggestions around optimizing the file counting. I don't know if it's all necessary, but maybe worth considering.
type FsHandler interface {
	Start()
	Usage() (baseUsageBytes uint64, totalUsageBytes uint64)
	Usage() (baseUsageBytes, totalUsageBytes, inodeUsage uint64)
nit: Can we create a struct to return with this info instead? IMHO there are very few situations where a method should return more than 2 values...
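For illustration, the struct-returning interface might look something like this (FsUsage is a hypothetical name, sketching the suggestion rather than this PR's code):

```go
package main

import "fmt"

// FsUsage (hypothetical name) bundles the values that Usage() currently
// returns as a 3-tuple, so the method returns a single value.
type FsUsage struct {
	BaseUsageBytes  uint64
	TotalUsageBytes uint64
	InodeUsage      uint64
}

// FsHandler as it might look with a struct-returning Usage method.
type FsHandler interface {
	Start()
	Usage() FsUsage
	Stop()
}

func main() {
	u := FsUsage{BaseUsageBytes: 512, TotalUsageBytes: 2048, InodeUsage: 7}
	fmt.Printf("base=%d total=%d inodes=%d\n", u.BaseUsageBytes, u.TotalUsageBytes, u.InodeUsage)
}
```

Adding a field later (as this PR does with inode usage) then becomes a non-breaking change for callers that only read the fields they care about.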
}
}
}
}
How about extracting something like a "BackoffPoller" that abstracts all the shared backoff & polling logic between trackInodesUsage & trackDiskUsage?
Actually, on further thought I'm wondering if this should be in the same routine as the du call. The advantages are that containers with very large filesystems wouldn't hog as many exec calls, and there might be less disk thrashing.
if !ignoreMetrics.Has(container.DiskUsageMetrics) {
	handler.fsHandler = common.NewFsHandler(time.Minute, rootfsStorageDir, "", fsInfo)
	handler.fsHandler = common.NewFsHandler(time.Minute, time.Minute, rootfsStorageDir, "", fsInfo)
nit: make constants for the default periods
	err error
)

if fh.rootfs != "" {
Under what conditions would the rootfs not be set? It should probably be an error, or exit the tracking loop...
}
claimFindToken()
defer releaseFindToken()
findCmd := exec.Command("find", dir, "-xdev")
When I tried this, I found `find $dir -xdev -printf '.' | wc -c` to be about twice as fast. Might be a premature optimization though...
I'll go ahead and try it out. It shouldn't hurt to make it faster, and it should still work.
	glog.Errorf("failed to read from stdout for cmds: %v, %v - %v", findCmd.Args, wcCmd.Args, souterr)
}
// return 0, fmt.Errorf("cmd %v output %s", cmd.Args, stdout)
inodeUsage, err := strconv.ParseUint(strings.Fields(stdout)[0], 10, 64)
Is the fields call necessary?
stdout is the number followed by a trailing "\n", so the Fields call was just to remove the newline character.
func (fh *realFsHandler) Start() {
	go fh.trackUsage()
	go fh.trackDiskUsage()
	go fh.trackInodeUsage()
Does it make sense to track inode usage for devicemapper? If not we should skip it.
AFAIK, inodes cannot be tracked for devicemapper.
See the note I made in the Issue. I was able to test this, and the find command works correctly (and takes almost no time, since it always finds 0 inodes) with devicemapper.
}
claimFindToken()
defer releaseFindToken()
findCmd := exec.Command("find", dir, "-xdev")
I think this will double count inodes on base layers shared across containers. That's probably what we want, but it means killing a container will not reclaim all its reported inodes.
It should not be doing that. dir here is expected to be the writable layer of a container. We need to test this.
I am very likely wrong, but I think the "rootfs" directory that it is pointed at is the container's modifiable directory, and shouldn't contain the base layers. At least in most of my tests, the containers start out reporting 0 inodes used, which would indicate to me that they are not counting any shared resources in base layers.
If that's the case, shouldn't it be zero for a new container? That's not what I'm seeing.
I tested this. It only includes the writeable layer.
}
inodes, err := fsInfo.GetDirInodeUsage(dir, time.Minute)
as.NoError(err)
as.True(uint64(numFiles) <= inodes, "expected inodes in dir to be at-least %d; got inodes: %d", numFiles, inodes)
shouldn't numFiles == inodes? Under what circumstances would it be different?
I believe it is actually, numFiles + 1 == inodes, since the directory I created counts as well. I could change it to that, but I wasn't sure if it would consistently be the case.
Confirmed, I changed it to numFiles + 1 == inodes.
stat.Filesystem = &FilesystemStats{
	TotalUsageBytes: &val.Filesystem[0].Usage,
	BaseUsageBytes:  &val.Filesystem[0].BaseUsage,
	InodeUsage:      &val.Filesystem[0].Inodes,
If we skip inode collection for some storage drivers, then we should probably leave InodeUsage nil here when it's not collected.
I think that is what actually happens. &val.Filesystem[0].Inodes may be nil if we skipped collection. In that case, it should also set InodeUsage to nil as well?
I don't think so. If you're taking the address of a variable, it will always be non-nil. The value might be zero, but if possible we should distinguish between 0 and not set (nil).
Wouldn't it actually be more accurate to set InodesUsed to 0 in that case? We only skip collection for some storage drivers if they do not use inodes, which means they use 0 of the inodes shared by the rest of the pods.
Just to document: since these storage drivers do not use inodes, I am leaving this the way it is, since reporting 0 inodes is equivalent to using no inodes. Also, making a change here would require changing the v1 api, or a large refactor of the code.
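To illustrate the nil-vs-zero distinction being discussed (FilesystemStats here is a trimmed-down stand-in, not the real v1 API type):

```go
package main

import "fmt"

// Taking the address of a struct field is always non-nil, so a caller cannot
// distinguish "collected a value of 0" from "not collected" unless the
// pointer itself is left nil.
type FilesystemStats struct {
	InodeUsage *uint64 // nil => not collected; &0 => collected, zero inodes used
}

func describe(s FilesystemStats) string {
	if s.InodeUsage == nil {
		return "inode usage not collected"
	}
	return fmt.Sprintf("inode usage: %d", *s.InodeUsage)
}

func main() {
	var inodes uint64 // zero value
	collected := FilesystemStats{InodeUsage: &inodes}
	skipped := FilesystemStats{} // InodeUsage stays nil
	fmt.Println(describe(collected))
	fmt.Println(describe(skipped))
}
```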
ok to test

@k8s-bot test this

@k8s-bot ok to test

@k8s-bot test this It didn't have permissions to set the status, but the test ran :P

pushed a commit addressing all comments except device-mapper related ones.
d0aee10 to 9835f82
TotalUsageBytes: 0,
BaseUsageBytes:  0,
InodeUsage:      0,
},
nit: no need to specify this (values default to 0)
if extraDiskErr == nil && fh.extraDir != "" {
	fh.usage.BaseUsageBytes = baseUsage
}
//combine errors into a single error to return
nit: space after //, start comments with a Capital.
const maxConcurrentDus = 20

// The maximum number of `find` tasks that can be running at once.
const maxConcurrentFinds = 20
Do we need a separate pool for each of these? On the one hand, if one operation is significantly slower, it prevents it from hogging the tokens. On the other hand, the operations happen in sequence, so the one operation will be blocking anyway. I'm leaning towards a single "exec" pool.
wcCmd := exec.Command("wc", "-c")
wcCmd.Stdin, _ = findCmd.StdoutPipe()

stdoutp, err := wcCmd.StdoutPipe()
I think it would be better to assign a bytes.Buffer to stdout and stderr. I'm not positive, but I think the process could hang if the pipe fills up, waiting for it to be drained.
as.True(expectedSize <= size, "expected dir size to be at-least %d; got size: %d", expectedSize, size)
}

//make sure that the timeout is actually being triggered (found this bug in PR#1489)
nit: spaces & capitals, same below.
dir, err := ioutil.TempDir(os.TempDir(), "")
as.NoError(err)
defer os.RemoveAll(dir)
dataSize := 1024 * 10000 //1000 KB bigger to make sure it triggers tihe timeout
s/tihe/the/
Did you mean 1,000 KB or 10,000 KB?
looks like you missed the typo
numFiles := 1000
for i := 0; i < numFiles; i++ {
	_, err := ioutil.TempFile(dir, "")
	as.NoError(err)
I think you should use a require here, otherwise this could be a lot of error spam.
numFiles := 100000 //make sure we actually trigger the timeout
for i := 0; i < numFiles; i++ {
	_, err := ioutil.TempFile(dir, "")
	as.NoError(err)
eb60f6e to e92659c
Jenkins GCE e2e failed for commit acb2426. Full PR test history. The magic incantation to run this job again is
LGTM once you fix the typo and squash the commits.
ee2e96e to 633578d
…ind . -xdev printf '.'| wc -c' this is published in the v2 api using a new field
633578d to 9fdeefe
Closes: kubernetes#2896. According to the report, google/cadvisor#1422 has been closed. However, the related issue was fixed in google/cadvisor#1489 and merged a long time ago. We can safely remove the known issue now.
Per-container inode stats are available in the v2 api in the FileSystemStats.InodeUsage field, and in the v1 stats in the Filesystem.Inodes field. Per-container inode stats are collected using the command "find . -xdev | wc -l". The collection of inode stats uses the same back-off strategy as measuring per-container bytes used, so that slow calls to find do not severely impact performance.