
fs: get inodes and disk usage via pure go #2171

Merged
merged 1 commit into google:master from replace-du-and-find on Feb 25, 2019

Conversation

namreg
Contributor

@namreg namreg commented Feb 15, 2019

Hello there,
I would like to return to the PR I was working on almost two years ago (#1576). @euank proposed getting rid of the find command. I took his PR as a basis and also got rid of the du command. Now we collect both disk space and inode stats in a single pass.

Why it matters for us: both du and find overwhelm our audit log.

I have also benchmarked the du command against the pure Go implementation, with the following results:

Benchmark_duDiskUsage-8             1000           1597743 ns/op           22551 B/op        165 allocs/op
Benchmark_osStatDiskUsage-8        20000             56153 ns/op           53624 B/op         88 allocs/op

As we can see, the pure Go implementation is ~20x faster.
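For reference, here is a minimal sketch of the single-pass approach this PR describes. This is not the PR's actual code: `getDirUsage` and `usageInfo` are names invented for the sketch, and it assumes Linux, where `os.FileInfo.Sys()` yields a `*syscall.Stat_t`.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// usageInfo holds both stats gathered in one traversal, instead of
// shelling out to `du` (bytes) and `find` (inodes) separately.
type usageInfo struct {
	bytes  uint64
	inodes uint64
}

// getDirUsage walks dir once, summing allocated bytes (st_blocks is
// always in 512-byte units) and counting inodes. Hardlinked regular
// files (nlink > 1) are counted once, keyed by inode number, which
// assumes the tree lives on a single filesystem.
func getDirUsage(dir string) (usageInfo, error) {
	var u usageInfo
	seen := map[uint64]struct{}{} // inodes of hardlinked files already counted
	err := filepath.Walk(dir, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		st, ok := info.Sys().(*syscall.Stat_t)
		if !ok {
			return fmt.Errorf("unsupported stat type for %s", path)
		}
		if !info.IsDir() && st.Nlink > 1 {
			if _, dup := seen[st.Ino]; dup {
				return nil // hardlink to a file we already counted
			}
			seen[st.Ino] = struct{}{}
		}
		u.inodes++
		u.bytes += uint64(st.Blocks) * 512
		return nil
	})
	return u, err
}

func main() {
	u, err := getDirUsage(".")
	if err != nil {
		panic(err)
	}
	fmt.Printf("bytes=%d inodes=%d\n", u.bytes, u.inodes)
}
```

A production version would also need to key hardlinks by (device, inode) and decide how to handle files that disappear mid-walk.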

@k8s-ci-robot
Collaborator

Hi @namreg. Thanks for your PR.

I'm waiting for a google or kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@dims
Collaborator

dims commented Feb 15, 2019

/ok-to-test

@dashpole
Collaborator

How much data is in the directory it is getting disk and inodes on?

@namreg
Contributor Author

namreg commented Feb 16, 2019

@dashpole Do you mean how much data was in the directory during the benchmark? To be honest, I ran it on a small directory containing ~10 items, and I only benchmarked the du command.
I have redone the benchmarks and now get the following results:

goos: linux
goarch: amd64
pkg: github.com/namreg/find-du-bench
BenchmarkDiskUsageNew-2   	   20000	     85928 ns/op	   22720 B/op	      90 allocs/op
BenchmarkDiskUsageOld-2   	      50	  34123838 ns/op	   46605 B/op	     182 allocs/op

The benchmarks were run on the cadvisor source tree, including .git (2557 files, ~57M). The benchmark source code is in the repo https://github.com/namreg/find-du-bench

@dashpole
Collaborator

Can you try it on a large directory? Say 10 GB?

Thanks a bunch for running these experiments.

@euank
Collaborator

euank commented Feb 19, 2019

The best data to use is probably a /var/lib/docker directory using the overlay/overlay2 graphdriver. That's what this code is meant to count in Kubernetes, after all, and that directory will often contain quite a few hardlinks depending on the container images lying around. It could be worth populating it with something approximating a typical k8s workload, either by running the benchmark on a k8s node, or by downloading similar images and running similar sets of containers.
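The hardlink point matters for correctness, not just performance: two paths under overlay2 can share one inode, and counting both double-reports usage. A small standalone demo (the helper names `countUniqueInodes` and `demoHardlinks` are invented for this sketch; Linux assumed):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// countUniqueInodes returns how many distinct inodes back the given
// paths, so hardlinked files are not double-counted.
func countUniqueInodes(paths []string) (int, error) {
	seen := map[uint64]struct{}{}
	for _, p := range paths {
		var st syscall.Stat_t
		if err := syscall.Lstat(p, &st); err != nil {
			return 0, err
		}
		seen[st.Ino] = struct{}{}
	}
	return len(seen), nil
}

// demoHardlinks creates a file and a hardlink to it in a temp dir,
// mimicking shared layer files under /var/lib/docker/overlay2.
func demoHardlinks() (int, error) {
	dir, err := os.MkdirTemp("", "hardlink-demo")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	a := filepath.Join(dir, "a")
	b := filepath.Join(dir, "b")
	if err := os.WriteFile(a, []byte("layer data"), 0o644); err != nil {
		return 0, err
	}
	if err := os.Link(a, b); err != nil {
		return 0, err
	}
	return countUniqueInodes([]string{a, b})
}

func main() {
	n, err := demoHardlinks()
	if err != nil {
		panic(err)
	}
	fmt.Println(n) // prints 1: two paths, one inode
}
```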

@namreg
Contributor Author

namreg commented Feb 20, 2019

@dashpole @euank So, I've run the benchmark on one of the Kubernetes nodes against the /var/lib/docker directory (~17GB in size), as proposed above. Here are the results:

goos: linux
goarch: amd64
BenchmarkDiskUsageNew-4   	       5	 249946403 ns/op	96558353 B/op	  245327 allocs/op
BenchmarkDiskUsageOld-4   	       1	1132808839 ns/op	   63032 B/op	     238 allocs/op

As we can see, the new implementation is still faster, but it appears to consume a lot more memory. I believe this is because the Go runtime does not track memory for processes spawned via exec.Command, so the old implementation's allocations are undercounted.
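That accounting caveat is easy to demonstrate: allocations made by an exec'd child live in the child's own address space and never appear in Go's allocator statistics. A sketch with an invented `execHeapDelta` helper, using `true` as a stand-in for `du`:

```go
package main

import (
	"fmt"
	"os/exec"
	"runtime"
)

// execHeapDelta runs a trivial child process and reports how many bytes
// the Go heap allocated around the call. The child's own memory never
// shows up here, which is why the exec-based benchmark reports so few
// B/op despite `du` doing all the real work. Illustrative only.
func execHeapDelta() (uint64, error) {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	if err := exec.Command("true").Run(); err != nil {
		return 0, err
	}
	runtime.ReadMemStats(&after)
	return after.TotalAlloc - before.TotalAlloc, nil
}

func main() {
	delta, err := execHeapDelta()
	if err != nil {
		panic(err)
	}
	// Only the small Go-side bookkeeping (Cmd struct, pipes) is counted.
	fmt.Printf("Go-heap bytes allocated around exec: %d\n", delta)
}
```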

Nevertheless, here are pprof results:

Showing nodes accounting for 777.70MB, 99.49% of 781.71MB total
Dropped 8 nodes (cum <= 3.91MB)
Showing top 10 nodes out of 26
      flat  flat%   sum%        cum   cum%
  548.19MB 70.13% 70.13%   557.21MB 71.28%  os.(*File).readdirnames
   85.52MB 10.94% 81.07%   145.03MB 18.55%  os.lstatNolog
   64.51MB  8.25% 89.32%    64.51MB  8.25%  syscall.ByteSliceFromString
   61.01MB  7.80% 97.12%    61.01MB  7.80%  strings.Join
    9.03MB  1.15% 98.28%     9.03MB  1.15%  syscall.ParseDirent
    5.50MB   0.7% 98.98%     5.50MB   0.7%  os.newFile
    3.96MB  0.51% 99.49%     3.96MB  0.51%  _/tmp/find-du-bench.DiskUsageNew.func1
         0     0% 99.49%   781.20MB 99.94%  _/tmp/find-du-bench.BenchmarkDiskUsageNew
         0     0% 99.49%   781.20MB 99.94%  _/tmp/find-du-bench.DiskUsageNew
         0     0% 99.49%   557.21MB 71.28%  os.(*File).Readdirnames
(pprof) list .readdirnames
Total: 781.71MB
ROUTINE ======================== os.(*File).readdirnames in /usr/local/go/src/os/dir_unix.go
  548.19MB   557.21MB (flat, cum) 71.28% of Total
         .          .     45:}
         .          .     46:
         .          .     47:func (f *File) readdirnames(n int) (names []string, err error) {
         .          .     48:	// If this file has no dirinfo, create one.
         .          .     49:	if f.dirinfo == nil {
    2.50MB     2.50MB     50:		f.dirinfo = new(dirInfo)
         .          .     51:		// The buffer must be at least a block long.
  454.03MB   454.03MB     52:		f.dirinfo.buf = make([]byte, blockSize)
         .          .     53:	}
         .          .     54:	d := f.dirinfo
         .          .     55:
         .          .     56:	size := n
         .          .     57:	if size <= 0 {
         .          .     58:		size = 100
         .          .     59:		n = -1
         .          .     60:	}
         .          .     61:
   91.66MB    91.66MB     62:	names = make([]string, 0, size) // Empty with room to grow.
         .          .     63:	for n != 0 {
         .          .     64:		// Refill the buffer if necessary
         .          .     65:		if d.bufp >= d.nbuf {
         .          .     66:			d.bufp = 0
         .          .     67:			var errno error
         .          .     68:			d.nbuf, errno = f.pfd.ReadDirent(d.buf)
         .          .     69:			runtime.KeepAlive(f)
         .          .     70:			if errno != nil {
         .          .     71:				return names, wrapSyscallError("readdirent", errno)
         .          .     72:			}
         .          .     73:			if d.nbuf <= 0 {
         .          .     74:				break // EOF
         .          .     75:			}
         .          .     76:		}
         .          .     77:
         .          .     78:		// Drain the buffer
         .          .     79:		var nb, nc int
         .     9.03MB     80:		nb, nc, names = syscall.ParseDirent(d.buf[d.bufp:d.nbuf], n, names)
         .          .     81:		d.bufp += nb
         .          .     82:		n -= nc
         .          .     83:	}
         .          .     84:	if n >= 0 && len(names) == 0 {
         .          .     85:		return names, io.EOF
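Most of that 454MB is a fresh block-sized dirinfo buffer allocated for every directory opened during the walk. If this ever became a problem, directory names can be read through syscall.ReadDirent with a caller-owned buffer reused across directories. A sketch (`listNames` is a hypothetical helper, Linux-only, not part of this PR):

```go
package main

import (
	"fmt"
	"syscall"
)

// listNames reads the entry names of dir using a caller-supplied buffer,
// so a deep walk can reuse one buffer instead of os.(*File).readdirnames
// allocating a block-sized buffer per directory, as the profile shows.
func listNames(dir string, buf []byte) ([]string, error) {
	fd, err := syscall.Open(dir, syscall.O_RDONLY|syscall.O_DIRECTORY, 0)
	if err != nil {
		return nil, err
	}
	defer syscall.Close(fd)

	var names []string
	for {
		n, err := syscall.ReadDirent(fd, buf)
		if err != nil {
			return nil, err
		}
		if n <= 0 {
			break // EOF
		}
		// ParseDirent skips "." and ".." and appends the rest to names.
		_, _, names = syscall.ParseDirent(buf[:n], -1, names)
	}
	return names, nil
}

func main() {
	buf := make([]byte, 8192) // reused across every directory visited
	names, err := listNames(".", buf)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d entries\n", len(names))
}
```

The names slices themselves (the 91.66MB line) would still be allocated; only the per-directory read buffer is saved.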

@dashpole
Collaborator

I am going to run our eviction tests with the PR as another data point.

You mentioned at the beginning: "both du and find overwhelm our audit log". Have you done any experiments to show that this helps your issue?

@dashpole
Collaborator

The eviction tests all pass. The latency of du and find looks about the same as this implementation in the inode eviction test (lots of small files).

@namreg
Contributor Author

namreg commented Feb 21, 2019

@dashpole

You mentioned at the beginning: "both du and find overwhelm our audit log". Have you done any experiments to show that this helps your issue?

Yes, I have deployed this patch to a Kubernetes cluster. As a result, the number of audit log entries decreased drastically. It turns out ionice and nice were showing up there as well, so this patch removes four classes of records from the audit logs: du, find, ionice, and nice.

@dashpole
Collaborator

Ah, I get it now. It was the calls themselves, not the "du and find took ..." log messages, that were overwhelming your audit logging.

@namreg
Contributor Author

namreg commented Feb 21, 2019

@dashpole yeah, exactly :)

Does anything else concern you in this PR?

@dashpole
Collaborator

/retest

@namreg
Contributor Author

namreg commented Feb 25, 2019

@dashpole is there anything else I should fix?

@dashpole
Collaborator

dashpole left a comment

lgtm

@dashpole dashpole merged commit 05529e2 into google:master Feb 25, 2019
@PaulFurtado

By moving this into the cadvisor process, we actually lose a couple of things:

  1. du/find were being wrapped in `ionice -c3 nice -n 19` to make them lower priority. Now the work runs at the same priority as cadvisor/kubelet.
  2. If storage hung, the syscall would hang in the du/find child processes instead of in cadvisor/kubelet. Now kubelet will be unkillable in this situation.

Many monitoring solutions fork off child processes for these tasks for exactly these reasons. I like that this is moving to pure Go, but I think it should still fork and lower the niceness values with syscalls.
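For what it's worth, the niceness half of this does not strictly require forking: on Linux a process can lower its own CPU priority with setpriority(2), and its I/O priority with the ioprio_set(2) syscall, which has no wrapper in the Go syscall package. A sketch assuming amd64, where ioprio_set is syscall number 251 (verify for other architectures); only a separate child process would also give the hang-isolation from point 2:

```go
package main

import (
	"fmt"
	"syscall"
)

const (
	sysIoprioSet     = 251 // x86_64 syscall number for ioprio_set(2)
	ioprioWhoProcess = 1   // IOPRIO_WHO_PROCESS
	ioprioClassIdle  = 3   // IOPRIO_CLASS_IDLE, as `ionice -c3` sets
	ioprioClassShift = 13  // class is stored in the top bits of the value
)

// lowerPriority applies the equivalent of `nice -n 19 ionice -c3` to the
// calling process/thread. Illustrative sketch, not the PR's code.
func lowerPriority() error {
	// nice -n 19 equivalent.
	if err := syscall.Setpriority(syscall.PRIO_PROCESS, 0, 19); err != nil {
		return err
	}
	// ionice -c3 equivalent: idle I/O scheduling class (who=0 means the
	// calling thread; unprivileged since Linux 2.6.25).
	if _, _, errno := syscall.Syscall(sysIoprioSet,
		ioprioWhoProcess, 0, ioprioClassIdle<<ioprioClassShift); errno != 0 {
		return errno
	}
	return nil
}

func main() {
	if err := lowerPriority(); err != nil {
		panic(err)
	}
	fmt.Println("running at nice 19, I/O class idle")
}
```

One caveat: I/O priority is per-thread on Linux and goroutines migrate across OS threads, so a real in-process version would need runtime.LockOSThread around the walk, which is another argument for the forked-child approach suggested above.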

@namreg namreg deleted the replace-du-and-find branch April 3, 2019 14:44