inotify file descriptor issue on large instances causes redpanda galaxy module to crash #211

Open
WesWWagner opened this issue Jan 23, 2024 · 0 comments

WesWWagner commented Jan 23, 2024

When building a 15-node im4gn cluster with TLS and Prometheus monitoring enabled, Redpanda fails to start with the following error:

ubuntu@ip-172-31-16-44:~$ journalctl -f -u redpanda | grep -i error
Jan 23 01:13:08 ip-172-31-16-44 rpk[12253]: ERROR 2024-01-23 01:13:08,030 [shard 0] main - application.cc:388 - Failure during startup: std::__1::system_error (error system:24, could not create inotify instance: Too many open files)

ubuntu@ip-172-31-16-44:~$ ulimit -n
1024

I have not yet looked into the code for the galaxy component, but it appears that something is not raising the inotify and open-file limits high enough before starting Redpanda for the first time on large instances (which start more threads because they have more cores, etc.).
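
For anyone reproducing this, here is a minimal sketch for collecting the limits that can produce errno 24 (EMFILE) from inotify on Linux: the per-process open-file limit and the per-user inotify limits under /proc/sys/fs/inotify. The function name `show_limits` is hypothetical and not part of the galaxy module or rpk. I have not confirmed which of these limits is actually being hit here.

```shell
# show_limits is a hypothetical helper (not part of redpanda or rpk).
# It prints the limits that can cause "Too many open files" when
# creating an inotify instance on Linux.
show_limits() {
    # Per-process open file descriptor limits for the current shell.
    echo "nofile_soft=$(ulimit -Sn)"
    echo "nofile_hard=$(ulimit -Hn)"

    # Per-user inotify limits exposed by the kernel.
    for key in max_user_instances max_user_watches; do
        path="/proc/sys/fs/inotify/$key"
        if [ -r "$path" ]; then
            echo "inotify_$key=$(cat "$path")"
        else
            echo "inotify_$key=unavailable"
        fi
    done
}

show_limits
```

Note that `ulimit -n` in an interactive shell reflects that shell's limit, which may differ from the limit applied to the redpanda systemd unit. The running service's effective limits can be read from /proc/<pid>/limits.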

I tested this on 23.3.3 and 23.2.10 and saw the same behavior on both, so it is not a recent regression.
