New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix fsmanager.GetStats performance (regression in rc93) #2921
Conversation
c78334f
to
474a605
Compare
| path = dir + "/" + file | ||
| } else { | ||
| var err error | ||
| path, err = securejoin.SecureJoin(dir, file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we certain that we need this for writes, too? Another option may be to check for the caller being root and taking the fast path for that, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about writes, too. Frankly, I don't know if we should use securejoin or not, just being cautious here as write is potentially more harmful.
Maybe @AkihiroSuda can chime in wrt why it was added in PR #2169. Cc @cyphar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a potential for attack along the lines of the /proc attacks we've seen in the past. Unfortunately there isn't a nice solution that doesn't involve a performance penalty -- even with libpathrs. However there are two counterpoints:
- We could make a faster slow path for host paths -- just open the target path as
O_PATHand then check the target (statfs,readlink == intended path, and so on). This is probably a more practical defense anyway. SecureJoindoesn't defend against the more powerful bind-mount attack, so arguably using it isn't protecting us as much as (1) would anyway.
But yes, it is important to note that just because something is a host path, it isn't necessarily safe because it's possible that we access it before pivot_root(2) but after the container has done their mounts (now, I'm not sure this actually happens for /sys/fs/cgroup but I'd rather be safe than sorry).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cyphar is your comment about further optimizing the slow (write) path? If yes, we can address it later, as currently I am more worried about the open-for-read performance -- we have frequent calls to GetStats calls from k8s/cadvisor, resulting in lots of open-for-read calls. Writes are relatively seldom compared to that -- they mostly come from Set which I guess is not called that frequently.
I was thinking about using fstatfs(2) right after open -- merely to check that we have just opened cgroupfs/cgroupfs2 file. Perhaps this check is enough, but I am not sure what's the overhead of added fstatfs (will measure it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about using
fstatfs(2)right after open
I am not sure if this adds any security, but adding a fstatfs after open to check that we're still on cgroupfs do not affect performance much (seeing ~5% degradation in BenchmarkGetStats), so I have added this commit.
AFAIK neither v1 or v2 cgroupfs have symlinks so no room for /proc-like exploits.
Given this, maybe we should not use SecureJoin for the write case... 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, from what I see, we always audit/clean cgroup paths so we can't possibly have a path with ../ in it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the PR to
- not have a distinction between read and write;
- never use securejoin;
- always check that the opened file is on cgroupfs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is your comment about further optimizing the slow (write) path? I
I meant to say that we should do what you ended up doing -- having a much less expensive check (fstatfs) which is probably more effective when openat2 isn't available -- for both read and write. (I call non-openat2 the slow path because openat2 does exactly what we need here, and SecureJoin or fstatfs are both less than ideal in comparison.)
AFAIK neither v1 or v2 cgroupfs have symlinks so no room for /proc-like exploits.
They don't have symlinks, but there is room for /proc-like exploits because you could have a mount configuration that messes around with /sys/fs/cgroup and thus results in runc not actually writing to the path you expect (there was a CVE like this in /proc but it wasn't /proc-specific -- it was just targeting the process labels).
However the only real solution for this is openat2(RESOLVE_NO_XDEV) which we use if it's available -- so there's probably not much else you can do.
On my CentOS 8 VM it shows: > BenchmarkGetStats-8 8376 625135 ns/op Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In case openat2() is not available, it does not make sense to calculate relpath (and check if path has /sys/fs/cgroup prefix). Reverse the order of checks to not do that in case openat2 is not available. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
474a605
to
4369873
Compare
|
Looks like CI is failing @kolyshkin (haven't checked details) |
ef4d43a
to
740c8e5
Compare
Commit 88e8350, among the other things, replaced filepath.Join with securejoin.SecureJoin for both reads and writes to cgroupfs. Commits e76ac1c and 31f0f5b switched more code to use fscommon.ReadFile (and thus securejoin). Commit 0228226 introduced fscommon.OpenFile (which uses securejoin as the fallback if openat2(2) is not available, which is the case for older kernels), and commit c95e690 switched most of cgroup/fs[2] code to use it. As a result, fs.GetStats() method became noticeable slower, mostly due to securejoin calling os.Lstat and filepath.Clean. Using securejoin as a security measure for cgroupfs files is not well justified, as cgroupfs do not contain symlinks, and none of the code using it have uncleaned paths. In particular, fs/fs2/systemd managers do check and sanitize their paths. This commit modifies the code to not use securejoin. Instead, it checks that the opened file is indeed on cgroupfs. Using BenchmarkGetStats on a CentOS 8 VM, I see the following improvement: Before: > BenchmarkGetStats-8 8376 625135 ns/op After: > BenchmarkGetStats-8 12226 485015 ns/op An intermediate version, with no fstatfs to check fstype: > BenchmarkGetStats-8 13162 452281 ns/op Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
740c8e5
to
850b2c4
Compare
The error message is quite funny :)
I added a check that opened files are on cgroupfs, and forgot to enable TestMode flag in unit tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Commit 88e8350 (PR #2169), among the other things, replaced filepath.Join with
securejoin.SecureJoin for both reads and writes to cgroupfs.
Commits e76ac1c and 31f0f5b (PR #2604) switched more code to use
fscommon.ReadFile(and thussecurejoin). Commit 0228226 (PR #2635) introducedfscommon.OpenFile(which usessecurejoinas a fallback ifopenat2(2)is not available,which is the case for older kernels), and commit c95e690 (PR #2635) switched
most of cgroup/fs[2] code to use it.
As a result,
fs.GetStats()method became noticeable slower, mostly dueto securejoin calling
os.Lstatandfilepath.Clean.Using securejoin as a security measure for cgroupfs files is
not well justified, as cgroupfs do not contain symlinks, and none of the
code using it have uncleaned paths. In particular, fs/fs2/systemd
managers do check and sanitize their paths.
This PR modifies the code to not use securejoin for cgroupfs. Instead,
it checks that the opened file is indeed on cgroupfs.
Using a BenchmarkGetStats that this PR introduces, I see the following
improvement (on a CentOS 8 VM):
Before:
After:
An intermediate version, with no fstatfs to check fstype: