New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce blocking in main Naemon event loop #4

Merged
merged 2 commits into from Oct 13, 2016

Conversation

Projects
None yet
4 participants
@jjneely
Contributor

jjneely commented Sep 19, 2016

With the Merlin network IO now being completely handled inside the
Naemon event loop, its possible under high numbers of checks and TCP
congestion / full buffers that we could end up poll()'ing and blocking
for 100ms for each check Naemon processes. This starves the even loop
for resources, Naemon cannot process check results, and we are not
reading from the Merlin sockets. This then cascades to peers as they
begin to block on a full TCP write buffer.

I've reduced the amount of time poll() will block for from 100ms to
10ms. I wonder if we should not block at all in the event loop.

jjneely added some commits Sep 19, 2016

Reduce blocking in main Naemon event loop
With the Merlin network IO now being completely handled inside the
Naemon event loop, its possible under high numbers of checks and TCP
congestion / full buffers that we could end up poll()'ing and blocking
for 100ms for each check Naemon processes.  This starves the even loop
for resources, Naemon cannot process check results, and we are not
reading from the Merlin sockets.  This then cascades to peers as they
being to block on a full TCP write buffer.

I've reduced the amount of time poll() will block for from 100ms to
10ms.  I wonder if we should not block at all in the event loop.
Do not block in the Naemon event loop
Instead of reducing blocking in the event loop, do not block at all.
Even with 10ms blocking there are still cases of starvation of the event
loop and inability to process results.
@jjneely

This comment has been minimized.

Show comment
Hide comment
@jjneely

jjneely Sep 21, 2016

Contributor

Updated this PR so that we don't block at all in the Naemon event loop. Even with 10ms I still experienced some small recurrences of event loop starvation / unable to process all fds fast enough.

Contributor

jjneely commented Sep 21, 2016

Updated this PR so that we don't block at all in the Naemon event loop. Even with 10ms I still experienced some small recurrences of event loop starvation / unable to process all fds fast enough.

@pengi

This comment has been minimized.

Show comment
Hide comment
@pengi

pengi Oct 13, 2016

Contributor

To have a poll of 0 might have side effects, but I can see that's how it works in the ipc case. Thus I'll pull this locally for a while to test it out before merging.

Great find however!

Contributor

pengi commented Oct 13, 2016

To have a poll of 0 might have side effects, but I can see that's how it works in the ipc case. Thus I'll pull this locally for a while to test it out before merging.

Great find however!

@op5-ci-user op5-ci-user merged commit a5bdae1 into op5:master Oct 13, 2016

@mfalkvidd

This comment has been minimized.

Show comment
Hide comment
@mfalkvidd

mfalkvidd Oct 17, 2016

This fix will be included in tomorrow's release, see https://kb.op5.com/x/34IXAQ for details

mfalkvidd commented Oct 17, 2016

This fix will be included in tomorrow's release, see https://kb.op5.com/x/34IXAQ for details

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment