Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mitigate CPU spikes when sending lots of events with lots of data #2449

Merged
merged 9 commits into from
Oct 17, 2023

Conversation

antonpirker
Copy link
Member

@antonpirker antonpirker commented Oct 16, 2023

Increasing the HTTP pool size to better handle the requests. Also not reading the response saves some CPU cycles because TLS stuff uses CPU.

This does not fix all CPU spikes, but instead of spikes happening every 1 in 3-4 times it only happens 1 in 7-8 times with my test script.

FIxes #2384

@antonpirker antonpirker marked this pull request as ready for review October 16, 2023 09:10
Copy link
Member

@sl0thentr0py sl0thentr0py left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are changes that are hard to reason about in terms of impact but they impact the entire SDK.

Can we instead

  • expose num_pools as a config with default 2
  • expose preload_content also as a new option and pass it through
  • try these 2 in production for a while and then change the defaults if it works better

@sl0thentr0py
Copy link
Member

cc @untitaker for a quick sanity check if you see problems with changing this stuff

@untitaker
Copy link
Member

I think you want drain_conn instead of release_conn. If I understood the documentation correctly, you are putting a TCP connection back into the pool while the response body still has to be read.

I would guess that if you start sending new requests into that connection, the server may choose to terminate the connection because from their POV you're misbehaving. Even if that happens, urllib3 may transparently recover from that scenario by creating a new connection, but I am not sure. You'd have to test what actually happens. I suggest spinning up a http server locally and looking at wireshark (open it, look at loopback device lo0 and apply "http" filter)

I don't exactly understand what preload_conn does, but from what I can tell the original True value would've prevented this issue because it just behaves like "draining".

This assumes http1, i have no idea what happens with http2. I also just barely skimmed over the urllib3 docs for this, I have never heard of those specific parameters before, so take it with a grain of salt.

@antonpirker
Copy link
Member Author

Hey! Thanks for the input. Both good ideas. Will make it configurable and also look at the traffic to see what is happening.
(the admin of the urllib3 discord server said I should just release_conn without reading the response, but better save than sorry)

@antonpirker
Copy link
Member Author

Looking at wireshark was a good idea @untitaker. When I set preload_content/release_conn for every HTTP post a new TCP connection is established. This not what we want.

I will just revert the preload_content stuff and make the pool size configurable. Let's see if this helps.

@antonpirker antonpirker enabled auto-merge (squash) October 17, 2023 12:56
@antonpirker antonpirker merged commit d8634d0 into master Oct 17, 2023
283 of 284 checks passed
@antonpirker antonpirker deleted the antonpirker/2384-cpu-spikes branch October 17, 2023 13:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Transport causes CPU Spikes on Larger Envelopes
3 participants