Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix fsmanager.GetStats performance (regression in rc93) #2921

Merged
merged 3 commits into from Apr 30, 2021

Conversation

kolyshkin
Copy link
Contributor

@kolyshkin kolyshkin commented Apr 27, 2021

Commit 88e8350 (PR #2169), among the other things, replaced filepath.Join with
securejoin.SecureJoin for both reads and writes to cgroupfs.

Commits e76ac1c and 31f0f5b (PR #2604) switched more code to use
fscommon.ReadFile (and thus securejoin). Commit 0228226 (PR #2635) introduced
fscommon.OpenFile (which uses securejoin as a fallback if openat2(2) is not available,
which is the case for older kernels), and commit c95e690 (PR #2635) switched
most of cgroup/fs[2] code to use it.

As a result, fs.GetStats() method became noticeable slower, mostly due
to securejoin calling os.Lstat and filepath.Clean.

Using securejoin as a security measure for cgroupfs files is
not well justified, as cgroupfs do not contain symlinks, and none of the
code using it have uncleaned paths. In particular, fs/fs2/systemd
managers do check and sanitize their paths.

This PR modifies the code to not use securejoin for cgroupfs. Instead,
it checks that the opened file is indeed on cgroupfs.

Using a BenchmarkGetStats that this PR introduces, I see the following
improvement (on a CentOS 8 VM):

Before:

BenchmarkGetStats-8               8376            625135 ns/op

After:

BenchmarkGetStats-8              12226            485015 ns/op

An intermediate version, with no fstatfs to check fstype:

BenchmarkGetStats-8              13162            452281 ns/op

path = dir + "/" + file
} else {
var err error
path, err = securejoin.SecureJoin(dir, file)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we certain that we need this for writes, too? Another option may be to check for the caller being root and taking the fast path for that, too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about writes, too. Frankly, I don't know if we should use securejoin or not, just being cautious here as write is potentially more harmful.

Maybe @AkihiroSuda can chime in wrt why it was added in PR #2169. Cc @cyphar.

Copy link
Member

@cyphar cyphar Apr 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a potential for attack along the lines of the /proc attacks we've seen in the past. Unfortunately there isn't a nice solution that doesn't involve a performance penalty -- even with libpathrs. However there are two counterpoints:

  1. We could make a faster slow path for host paths -- just open the target path as O_PATH and then check the target (statfs, readlink == intended path, and so on). This is probably a more practical defense anyway.
  2. SecureJoin doesn't defend against the more powerful bind-mount attack, so arguably using it isn't protecting us as much as (1) would anyway.

But yes, it is important to note that just because something is a host path, it isn't necessarily safe because it's possible that we access it before pivot_root(2) but after the container has done their mounts (now, I'm not sure this actually happens for /sys/fs/cgroup but I'd rather be safe than sorry).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cyphar is your comment about further optimizing the slow (write) path? If yes, we can address it later, as currently I am more worried about the open-for-read performance -- we have frequent calls to GetStats calls from k8s/cadvisor, resulting in lots of open-for-read calls. Writes are relatively seldom compared to that -- they mostly come from Set which I guess is not called that frequently.

I was thinking about using fstatfs(2) right after open -- merely to check that we have just opened cgroupfs/cgroupfs2 file. Perhaps this check is enough, but I am not sure what's the overhead of added fstatfs (will measure it).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about using fstatfs(2) right after open

I am not sure if this adds any security, but adding a fstatfs after open to check that we're still on cgroupfs do not affect performance much (seeing ~5% degradation in BenchmarkGetStats), so I have added this commit.

AFAIK neither v1 or v2 cgroupfs have symlinks so no room for /proc-like exploits.

Given this, maybe we should not use SecureJoin for the write case... 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, from what I see, we always audit/clean cgroup paths so we can't possibly have a path with ../ in it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the PR to

  • not have a distinction between read and write;
  • never use securejoin;
  • always check that the opened file is on cgroupfs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is your comment about further optimizing the slow (write) path? I

I meant to say that we should do what you ended up doing -- having a much less expensive check (fstatfs) which is probably more effective when openat2 isn't available -- for both read and write. (I call non-openat2 the slow path because openat2 does exactly what we need here, and SecureJoin or fstatfs are both less than ideal in comparison.)

AFAIK neither v1 or v2 cgroupfs have symlinks so no room for /proc-like exploits.

They don't have symlinks, but there is room for /proc-like exploits because you could have a mount configuration that messes around with /sys/fs/cgroup and thus results in runc not actually writing to the path you expect (there was a CVE like this in /proc but it wasn't /proc-specific -- it was just targeting the process labels).

However the only real solution for this is openat2(RESOLVE_NO_XDEV) which we use if it's available -- so there's probably not much else you can do.

On my CentOS 8 VM it shows:

> BenchmarkGetStats-8   	    8376	    625135 ns/op

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In case openat2() is not available, it does not make sense to calculate
relpath (and check if path has /sys/fs/cgroup prefix).

Reverse the order of checks to not do that in case openat2 is not
available.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@kolyshkin kolyshkin changed the title Fix fsmanager.GetStats performance regression Fix fsmanager.GetStats performance (regression in rc93) Apr 27, 2021
@kolyshkin kolyshkin added this to the 1.0.0-rc94 milestone Apr 27, 2021
@thaJeztah
Copy link
Member

Looks like CI is failing @kolyshkin (haven't checked details)

Commit 88e8350, among the other things, replaced filepath.Join with
securejoin.SecureJoin for both reads and writes to cgroupfs.

Commits e76ac1c and 31f0f5b switched more code to use
fscommon.ReadFile (and thus securejoin). Commit 0228226 introduced
fscommon.OpenFile (which uses securejoin as the fallback if openat2(2)
is not available, which is the case for older kernels), and commit
c95e690 switched most of cgroup/fs[2] code to use it.

As a result, fs.GetStats() method became noticeable slower, mostly due
to securejoin calling os.Lstat and filepath.Clean.

Using securejoin as a security measure for cgroupfs files is
not well justified, as cgroupfs do not contain symlinks, and none of the
code using it have uncleaned paths. In particular, fs/fs2/systemd
managers do check and sanitize their paths.

This commit modifies the code to not use securejoin. Instead, it checks
that the opened file is indeed on cgroupfs.

Using BenchmarkGetStats on a CentOS 8 VM, I see the following
improvement:

Before:
> BenchmarkGetStats-8               8376            625135 ns/op

After:
> BenchmarkGetStats-8   	   12226	    485015 ns/op

An intermediate version, with no fstatfs to check fstype:
> BenchmarkGetStats-8              13162            452281 ns/op

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@kolyshkin
Copy link
Contributor Author

Looks like CI is failing

The error message is quite funny :)

utils_test.go:36: open /tmp/cgroup_utils_test065049446/cgroup.file: not a cgroup file

I added a check that opened files are on cgroupfs, and forgot to enable TestMode flag in unit tests.

Copy link
Member

@cyphar cyphar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants