scylla does not start when kernel inotify limits are exceeded #7700
Since f3bcd4d ("Merge 'Support SSL Certificate Hot Reloading' from Calle"), we reload certificates as they are modified on disk. This uses inotify, which is limited by a sysctl fs.inotify.max_user_instances, with a default of 128. This is enough for 64 shards only, if both rpc and cql are encrypted; above that startup fails. Increase to 1200, which is enough for 6 instances * 200 shards. Fixes scylladb#7700.
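For anyone checking the numbers in the commit message, here is a rough sketch of the arithmetic. It assumes one inotify instance per reloadable TLS credentials object and two such objects per shard (rpc and cql), as described above. The helper names are hypothetical and are not part of Scylla or Seastar.

```python
from pathlib import Path
from typing import Optional

# Path where Linux exposes the fs.inotify.max_user_instances sysctl.
LIMIT_PATH = Path("/proc/sys/fs/inotify/max_user_instances")


def max_shards(limit: int, instances_per_shard: int) -> int:
    """Largest shard count whose inotify instances fit under the limit."""
    return limit // instances_per_shard


def required_limit(shards: int, instances_per_shard: int) -> int:
    """Limit needed so every shard can create all of its inotify instances."""
    return shards * instances_per_shard


def read_limit(path: Path = LIMIT_PATH) -> Optional[int]:
    """Return the current limit, or None if it cannot be read (e.g. not Linux)."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    # Default limit of 128 with rpc + cql encrypted (2 instances per shard).
    print("shards supported by default:", max_shards(128, 2))
    # The new value: 6 instances on each of 200 shards.
    print("limit for 6 instances * 200 shards:", required_limit(200, 6))
    print("current limit on this host:", read_limit())
```

Running it on a host prints the limit currently in effect, which may help confirm whether the raised sysctl value has actually been applied.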
Maybe this error should not be fatal, and should instead just print an error message in the logs? I guess it depends on whether the mechanisms that rely on inotify also allow other ways of reloading observed files (e.g. via a signal or REST or whatever).
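To illustrate the non-fatal idea, a minimal sketch follows. It is not Scylla or Seastar code: `start_reloader` and `make_watcher` are hypothetical names, and the only fact relied on is that `inotify_init` fails with `EMFILE` once the per-user instance limit is reached.

```python
import errno
import logging


def start_reloader(make_watcher, log):
    """Try to create a file watcher; log and continue if inotify is exhausted.

    `make_watcher` is a hypothetical callable that creates the inotify-backed
    watcher and raises OSError on failure.
    """
    try:
        return make_watcher()
    except OSError as e:
        # EMFILE is what inotify_init reports when the per-user limit on
        # inotify instances is hit; any other error is still propagated.
        if e.errno != errno.EMFILE:
            raise
        log.error(
            "inotify instance limit reached; certificate hot reloading "
            "is disabled (consider raising fs.inotify.max_user_instances)"
        )
        return None
```

Whether returning `None` is acceptable depends on the point raised above: without another reload trigger, certificates would only be picked up on restart.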
Backported to 4.1, 4.2, 4.3.
Since f3bcd4d ("Merge 'Support SSL Certificate Hot Reloading' from Calle"), we reload certificates as they are modified on disk. This uses inotify, which is limited by a sysctl fs.inotify.max_user_instances, with a default of 128. This is enough for 64 shards only, if both rpc and cql are encrypted; above that startup fails. Increase to 1200, which is enough for 6 instances * 200 shards. Fixes #7700. Closes #7701 (cherry picked from commit 390e07d)
Question: Should we try to address this at the Seastar level? While the basic problem of shard multiplication cannot be solved, we could maybe mitigate it for the usage pattern of a shard-shared credentials builder generating a reloadable credentials object per shard. There will be a lot of cross-shard calls when stuff changes, but... We can also add a fallback option for the actual originating shard reloader, to use polling iff inotify is not available.
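As a rough illustration of the polling fallback mentioned above, the sketch below detects changes by comparing `stat()` results between polls. It is not the Seastar reloader; `PollingReloader` and its callback are hypothetical, and a real implementation would run `poll_once` from a timer on the originating shard.

```python
import os


class PollingReloader:
    """Hypothetical fallback: detect file changes by comparing stat() results."""

    def __init__(self, path, on_change):
        self.path = path
        self.on_change = on_change
        self._last = self._signature()

    def _signature(self):
        """Return (mtime, size, inode) for the file, or None if it is missing."""
        try:
            st = os.stat(self.path)
        except FileNotFoundError:
            return None
        return (st.st_mtime_ns, st.st_size, st.st_ino)

    def poll_once(self):
        """Check the file once; return True if it changed since the last poll.

        The callback fires only when the file exists after the change, so a
        deleted certificate is reported as changed but is not reloaded.
        """
        current = self._signature()
        if current == self._last:
            return False
        self._last = current
        if current is not None:
            self.on_change(self.path)
        return True
```

Polling trades reload latency for not consuming an inotify instance, so the poll interval would need to be chosen with certificate rotation in mind.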
We could, but it's a huge amount of work compared to writing to a sysctl file.
I take that as a deprioritization of the idea.
Yes.
Each TLS instance consumes an inotify instance (limited by fs.inotify.max_user_instances), and there can be multiple TLS instances per shard. A large machine can exhaust the limit and will then fail at startup. The default is 128, which is enough for 64 shards when both rpc and cql are encrypted.