Introduce "eventloop" style API to better handle netlink event storms #499
Please consider this proposal.
During mass network topology updates, thousands of netlink packets cause excessive resource usage (one thread per event uses a lot of memory and slows processing) and can hang the process (recovery from a failed IPDB commit is slow, and the monitoring thread is prone to dying without the master thread noticing).
This change lets the user process incoming netlink messages in an event loop, either in the main thread or in any user-initiated thread(s). Netlink socket receive errors are passed through the same queue and re-raised in the master thread, so the user program can notice that netlink events were lost and take appropriate action.
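To illustrate the idea, here is a minimal sketch of the pattern using only the standard library. It is not the API added by this PR: `reader`, `event_loop`, `recv` and `handle` are hypothetical names, and the assumption is that a receive failure surfaces as an `OSError` (on a netlink socket an overrun is typically reported as `ENOBUFS`).

```python
import queue
import threading


def reader(recv, events):
    """Hypothetical reader thread: forward messages, or the receive error, to the queue."""
    try:
        while True:
            msg = recv()          # assumed to block until a message arrives
            if msg is None:       # assumed end-of-stream marker
                break
            events.put(msg)
    except OSError as exc:
        # Do not die silently: hand the error to the consumer via the same queue.
        events.put(exc)
        return
    events.put(None)              # tell the consumer the stream ended normally


def event_loop(events, handle):
    """Hypothetical consumer: runs in the main thread or any user thread."""
    while True:
        item = events.get()
        if item is None:
            return
        if isinstance(item, Exception):
            # Re-raise in the consuming thread so the caller notices lost events.
            raise item
        handle(item)
```

A caller would start `reader` in a `threading.Thread`, run `event_loop` itself, and on `OSError` resynchronize its view of the network state before restarting the loop.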
The last of the three commits allows specifying the size of the netlink socket send and receive buffers, and thus controlling the "storm sensitivity" of the system.
Side note: on Linux, the system-wide maximum size of socket receive buffers is controlled by /proc/sys/net/core/rmem_max (and that of send buffers by /proc/sys/net/core/wmem_max); this may be worth mentioning in the documentation.
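For reference, a rough sketch of how buffer sizes can be requested and checked with plain socket options. This is not the interface introduced by the commit: `set_buffer_sizes` and `read_limit` are hypothetical helpers, and the example works on any socket object. On Linux the kernel caps an unprivileged request at the system-wide limit and reports roughly double the requested value to account for bookkeeping overhead, so the effective size should be read back rather than assumed.

```python
import socket


def set_buffer_sizes(sock, rcvbuf, sndbuf):
    """Hypothetical helper: request buffer sizes and return what the kernel granted."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf)
    return (
        sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
        sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
    )


def read_limit(path):
    """Hypothetical helper: read a numeric sysctl file, or return None if unavailable."""
    try:
        with open(path) as f:
            return int(f.read())
    except (OSError, ValueError):
        return None
```

Comparing the granted receive size with `read_limit("/proc/sys/net/core/rmem_max")` shows whether the request was capped by the system-wide limit.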