Introduce "eventloop" style API to better handle netlink event storms #499

Merged
merged 3 commits into from May 14, 2018

crosser
Contributor

@crosser crosser commented May 8, 2018

Please consider this proposal.

During mass network topology updates, thousands of netlink packets lead to excessive resource usage (one thread per event uses a lot of memory and results in slow processing) and a hanging process (recovery from a failed IPDB commit is slow, and the monitoring thread is prone to dying without the master thread noticing).

This change allows the user to process incoming netlink messages in an event loop (in the main thread, or in any user-initiated threads). Netlink socket receive errors are passed via the same queue and re-raised in the master thread, which allows the user program to notice that netlink events were lost and take appropriate action.
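To illustrate the pattern (not the code in this PR), here is a minimal standard-library sketch of one queue carrying both messages and errors. The `EventQueue` class, its `monitor()` method, and `fake_receive()` are hypothetical names made up for this example; only the `nextmsg()` name is borrowed from the commit message. `ENOBUFS` is used as the simulated failure because that is what Linux reports when a netlink receive buffer overflows.

```python
import errno
import os
import queue
import threading


class EventQueue:
    """Hypothetical stand-in for the event queue described above."""

    _STOP = object()  # sentinel marking a clean end of the stream

    def __init__(self, maxsize=0):
        self._queue = queue.Queue(maxsize)

    def monitor(self, receive):
        """Run in the monitoring thread: forward messages and errors."""
        try:
            for msg in receive():
                self._queue.put(msg)
        except Exception as exc:
            # The failure travels through the same queue as the messages.
            self._queue.put(exc)
        else:
            self._queue.put(self._STOP)

    def nextmsg(self):
        """Yield queued messages; re-raise monitor errors in the caller."""
        while True:
            item = self._queue.get()
            if item is self._STOP:
                return
            if isinstance(item, Exception):
                raise item
            yield item


def fake_receive():
    """Simulated netlink source that fails after two messages."""
    yield {"event": "RTM_NEWLINK", "index": 1}
    yield {"event": "RTM_DELLINK", "index": 1}
    raise OSError(errno.ENOBUFS, os.strerror(errno.ENOBUFS))


def run_demo():
    evq = EventQueue()
    worker = threading.Thread(
        target=evq.monitor, args=(fake_receive,), daemon=True
    )
    worker.start()
    seen, error = [], None
    try:
        # The event loop runs in the calling thread, one message at a time.
        for msg in evq.nextmsg():
            seen.append(msg["event"])
    except OSError as exc:
        error = exc
    worker.join()
    return seen, error
```

Because the exception is queued behind the messages that preceded it, the consumer sees every event that was received before the failure and only then learns that the stream is broken.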

The last of the three commits allows specifying the size of the netlink socket send and receive buffers, and thus controlling the "storm sensitivity" of the system.

Sidenote: on Linux, the system-wide maximum size of socket receive and send buffers is controlled with
/proc/sys/net/core/rmem_max and /proc/sys/net/core/wmem_max respectively; it may be worth mentioning in the documentation.
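As a rough illustration of how the requested buffer size interacts with the system-wide limit, the sketch below reads the limits from `/proc` and asks the kernel for one-megabyte buffers with the standard `SO_SNDBUF` and `SO_RCVBUF` socket options. The helper names `read_limit()`, `set_buffers()`, and `demo()` are made up for this example, `/proc/sys/net/core/wmem_max` is the corresponding send-side limit, and a UDP socket stands in for the netlink socket so the snippet runs without special privileges.

```python
import socket
from pathlib import Path

RMEM_MAX = Path("/proc/sys/net/core/rmem_max")
WMEM_MAX = Path("/proc/sys/net/core/wmem_max")


def read_limit(path):
    """Return the system-wide limit in bytes, or None if unavailable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def set_buffers(sock, sndbuf, rcvbuf):
    """Request buffer sizes and return what the kernel actually granted."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    return (
        sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
        sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
    )


def demo():
    # Assumption: a UDP socket is used here only as a stand-in.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        granted = set_buffers(sock, 1048576, 1048576)
    return read_limit(RMEM_MAX), read_limit(WMEM_MAX), granted
```

On Linux the granted values usually differ from the request: the kernel doubles the requested size for bookkeeping and caps unprivileged requests at the system-wide limit, so comparing `granted` with the limits shows whether a one-megabyte request was honored.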

@svinota
Owner

svinota commented May 9, 2018

@celebdor Antoni, pls take a look at the PR

Eugene Crosser added 3 commits May 14, 2018 15:41

Event queue interface is an alternative to "post" callbacks.
While "post" callbacks are executed in a separate thread each,
event queue interface follows the "eventloop" metaphor, i.e.
each netlink event received in the monitoring thread is put in
the queue from which it can be subsequently fetched by calling
`nextmsg()` generator function in the main thread (or in any other
thread(s) started by the user). In the event of packet storm, it
is much nicer to the resources than creating a thread for each
received message.

Signed-off-by: Eugene Crosser <crosser@average.org>

IPDB reinitialization is rather fragile, and sometimes hangs
for a long time, and prevents release() from finishing.
When event queue is used, errors in the monitor thread are
reported to the user via event queue, so the user can take
care of cleanup and reinitialization.

Signed-off-by: Eugene Crosser <crosser@average.org>

Introduce keyword arguments to IPRoute() and IPDB() constructors
to specify netlink socket send and receive buffer sizes. Defaults
are 1048576, 1048576 (one megabyte). Useful for defining strategy
for handling netlink packet storms: specify smaller size when
early bailout is desired. When used in conjunction with eventqueue,
exception will be raised in the main thread when netlink events
are lost, allowing for user-controlled recovery procedure.

Signed-off-by: Eugene Crosser <crosser@average.org>
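
The user-controlled recovery mentioned in the second and third commit messages could look roughly like the loop below. This is a sketch under stated assumptions, not the library's behavior: `consume_with_recovery()`, `open_source`, `handle`, and `resync` are hypothetical names, and lost events are assumed to surface as an `OSError` with `errno.ENOBUFS`.

```python
import errno


def consume_with_recovery(open_source, handle, resync, max_restarts=3):
    """Drain event sources, resynchronizing after lost-event errors.

    open_source() returns an iterator of events (hypothetical factory),
    handle(event) applies one event, resync() rebuilds state from scratch.
    Returns the number of restarts performed.
    """
    restarts = 0
    while True:
        try:
            for event in open_source():
                handle(event)
            return restarts  # the source ended cleanly
        except OSError as exc:
            if exc.errno != errno.ENOBUFS or restarts >= max_restarts:
                raise  # not a lost-event condition, or giving up
            restarts += 1
            resync()


def demo():
    attempts = []
    state = []

    def open_source():
        attempts.append(1)
        if len(attempts) == 1:
            def first():
                yield "link1 up"
                raise OSError(errno.ENOBUFS, "events lost")
            return first()
        return iter(["link2 up"])

    # state.clear stands in for a full resynchronization of cached state.
    restarts = consume_with_recovery(open_source, state.append, state.clear)
    return restarts, state
```

A smaller receive buffer makes the error arrive sooner during a storm, which is the "early bailout" trade-off: less memory spent queuing stale events, at the cost of a full resynchronization.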
@svinota svinota merged commit ef01d3f into svinota:master May 14, 2018