Fix CPU load of dockerapi container #5376

Merged: 2 commits into mailcow:staging from mstilkerich:fix_dockerapi_cpuload on Aug 28, 2023

Conversation

mstilkerich
Contributor

Previously the handle_pubsub_messages() loop was executing every 10ms when there was no message available. This creates a constant CPU load of around 6% on my system.

Now reading from the redis network socket will block (the coroutine) for up to 30s before it returns when no message is available. Using channel.listen() would be even better, but it lacks the ignore_subscribe_messages option and I could not figure out how to filter the returned messages.
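A minimal sketch of the change, assuming the loop is built on redis-py's asyncio PubSub API (the variable names are illustrative, not the exact dockerapi code):

```python
# Before: with no timeout, get_message() returns None immediately when
# nothing is queued, so the enclosing loop wakes every 10ms and burns CPU.
msg = await pubsub.get_message(ignore_subscribe_messages=True)

# After: a timeout makes get_message() block this coroutine (not the
# whole event loop) for up to 30s while waiting for data on the socket.
msg = await pubsub.get_message(ignore_subscribe_messages=True, timeout=30)
```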

Note: I have very limited, basically non-existent knowledge of Python, but I read up on asyncio and believe I understand the changes I made. Nonetheless, this should of course be cross-checked. I verified that the call does not block the entire event loop, since HTTP requests to dockerapi are still answered immediately.

@milkmaker
Collaborator

Thanks for contributing!

I noticed that you didn't select staging as your base branch. Please change the base branch to staging.
See the attached picture on how to change the base branch to staging: [image: check_prs_if_on_staging.png]

@mstilkerich changed the base branch from master to staging on August 5, 2023, 19:15
@FreddleSpl0it
Collaborator

In the Telegram discussion we had, I pointed to the wrong timeout function. This is the correct one:

await asyncio.sleep(0.01)

This is also the 10ms loop delay @mstilkerich described. I think it would be enough to set this timeout to 250ms and see if the CPU load decreases.
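A sketch of that alternative in context (the loop shape and the pubsub variable are illustrative, assuming redis-py's asyncio PubSub; 0.25 is the value proposed above):

```python
while True:
    msg = await pubsub.get_message(ignore_subscribe_messages=True)
    if msg is not None:
        print(msg)  # real code would dispatch the message here
    # Proposed change: sleep 250ms instead of 10ms, so an idle loop
    # wakes four times per second instead of a hundred.
    await asyncio.sleep(0.25)
```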

@mstilkerich
Contributor Author

That would certainly fix the CPU load, but it would also add up to 250ms of extra delay to the processing of each redis message.

I think what you want is:

  • As long as there is no redis message to process, the task / coroutine should be blocked so the event loop does not schedule it.
    • This is done by adding the timeout to get_message(), which limits the wake-ups to once every 30s when there is no message. If a new message arrives, it becomes available for dispatch by the event loop right away.
    • Better would be listen(), since it blocks until a message is available for processing, but I could not use it for the reasons described above.
  • After processing a redis message, we should give other coroutines a chance to run. This is achieved by the asyncio.sleep(0.01) at the end. In fact, for this purpose we should pass 0 instead, as it appears to offer an optimized code path for selecting other coroutines if any are available for execution.
  • The surrounding async_timeout.timeout() serves, to my understanding, as a watchdog in case processing one message takes longer than expected, and will kill it in that case. The timeout value here must be chosen large enough to accommodate the execution time, including blocking time, of the contained code. (See the sketch after this list.)
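Putting the pieces together, a rough sketch of the resulting loop, assuming redis-py's asyncio API; the 60s watchdog value and the message handling are illustrative, only the pattern is what the PR implements:

```python
import asyncio
import async_timeout
from redis import asyncio as aioredis

async def handle_pubsub_messages(channel: aioredis.client.PubSub):
    while True:
        try:
            # Watchdog: chosen larger than the 30s get_message() timeout
            # below, so it only fires if processing genuinely hangs.
            async with async_timeout.timeout(60):
                # Blocks this coroutine (not the event loop) for up to
                # 30s when no message is waiting on the socket.
                message = await channel.get_message(
                    ignore_subscribe_messages=True, timeout=30
                )
                if message is not None:
                    print(message)  # real code dispatches the message here
                # Yield control; sleep(0) takes an optimized path that
                # just lets other ready coroutines run.
                await asyncio.sleep(0)
        except asyncio.TimeoutError:
            pass
```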

@mstilkerich
Copy link
Contributor Author

AFAIK everything is in place here. Just in case you are waiting for anything from my side, please let me know.

@FreddleSpl0it merged commit 9ba5c13 into mailcow:staging on Aug 28, 2023
@mstilkerich deleted the fix_dockerapi_cpuload branch on October 13, 2023, 14:08