> **Note**
> Apologies for the long issue; it took a lot of investigation work. It is not AI generated: I took the pains of writing it all down myself.
Hey all, we're perhaps an unusual user of fscrypt, but wanted to share our current challenges. We have a non-trivial production fleet on Kubernetes (100s of nodes, ~20k cores, 150 TiB memory, with high CPU and memory usage across GCP, AWS, Azure). The services in question use fscrypt to encrypt customer data on persistent volumes (block storage) on a per-customer basis. fscrypt runs as a separate Pod in all those nodes with low CPU and memory limits, for efficiency and cost reasons.
The core of the issue is that fscrypt does not respect CPU and memory limits set in cgroups. The behavior under such conditions is pathological, especially with large nodes (>= 64 CPUs).
This is in summary what I see:
- Ignoring CPU limits causes massive slowdowns.
- Ignoring memory limits causes OOM crashes, unless the limits are >= 256 MiB.
- There are large CPU consumption spikes on large machines (the bigger the machine, the bigger the spike).
- These spikes in usage occur only during `fscrypt setup`.
I've run some benchmarks to demonstrate the problem on a 64-core machine while varying CPU and memory limits. Some graphs are below.
- Execution time: large under low CPU limits.
- Memory usage: 140 MiB regardless of limits.
- CPU usage: depends on the number of CPUs rather than on the limits; on a 64-core machine it causes a large spike (150s across all cores).
Root causes
`fscrypt setup` runs tests to determine hashing costs. However:
- Instead of using cgroup limits, it uses `runtime.NumCPU`.
- For memory limits, it uses `Sysinfo.Totalram`.
- Argon2 spawns up to 256 threads for the testing, which is quite expensive on large machines, even more so when the limits are much lower than the CPU count.
- It uses actual CPU usage, rather than wall time, to determine how long to run the tests, which is heavily skewed when the limits are much lower than the CPU count.
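The direction I'm experimenting with is to read the cgroup v2 interface files and fall back to the current behavior when they're absent. A preliminary sketch (it assumes cgroup v2 mounted at `/sys/fs/cgroup`; the helper names are mine, not existing fscrypt functions):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseCPUMax interprets the cgroup v2 cpu.max format: "<quota> <period>"
// in microseconds, or "max <period>" when unlimited. It returns the
// effective CPU count implied by the quota.
func parseCPUMax(s string) (float64, bool) {
	fields := strings.Fields(strings.TrimSpace(s))
	if len(fields) != 2 || fields[0] == "max" {
		return 0, false
	}
	quota, err1 := strconv.ParseFloat(fields[0], 64)
	period, err2 := strconv.ParseFloat(fields[1], 64)
	if err1 != nil || err2 != nil || period == 0 {
		return 0, false
	}
	return quota / period, true
}

// parseMemoryMax interprets cgroup v2 memory.max: a byte count, or "max"
// when unlimited.
func parseMemoryMax(s string) (uint64, bool) {
	t := strings.TrimSpace(s)
	if t == "max" {
		return 0, false
	}
	n, err := strconv.ParseUint(t, 10, 64)
	return n, err == nil
}

func main() {
	// Prefer cgroup limits; a real patch would fall back to
	// runtime.NumCPU / Sysinfo.Totalram when these files are missing
	// or report "max".
	if b, err := os.ReadFile("/sys/fs/cgroup/cpu.max"); err == nil {
		if cpus, ok := parseCPUMax(string(b)); ok {
			fmt.Printf("cgroup CPU limit: %.2f CPUs\n", cpus)
		}
	}
	if b, err := os.ReadFile("/sys/fs/cgroup/memory.max"); err == nil {
		if mem, ok := parseMemoryMax(string(b)); ok {
			fmt.Printf("cgroup memory limit: %d bytes\n", mem)
		}
	}
}
```

With these values, the hashing benchmark could cap its thread count and memory target at the cgroup limits instead of the host's totals.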
Possible fixes
I tried fixing this in several ways on our side, but I think this would be better fixed here. What I tried:
- Remove CPU limits and keep requests: This makes performance really unpredictable. On busy nodes, Kubernetes starts enforcing the actual requests, making the problem appear at random times, rather than consistently.
- Increasing CPU limits: We can't really do that as the nodes are already pretty full, and the excess capacity is essentially wasted after startup.
- Use `--time=5ms` in setup: this fixes the CPU usage by artificially setting a very low target time for the hashing test, but it doesn't fix memory usage.
I believe the only proper fix is to make fscrypt cgroup-aware, and my preliminary tests suggest that the problem goes away completely in that scenario.
Please let me know what you think. I'm happy to contribute a fix; I've fiddled with this for long enough at this point.