Avoid an unnecessary computation of the length of data for non-stream requests (which determines content-length based upon body content). #5496
Conversation
Good catch @dbaxa, this seems reasonable. Thanks!
I don't think we need to use a set here, was there a reason beyond avoiding a second iteration?
@nateprewitt when profiling requests, it resulted in a reduction in execution time, and it makes more sense to me.
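For context on the set-versus-tuple question, here is an illustrative micro-benchmark (the method names and iteration count are arbitrary, not taken from the PR; it only shows the shape of such a measurement):

```python
import timeit

# Illustrative only: membership test against a tuple vs. a set of HTTP
# method names, similar in spirit to the check discussed in this thread.
METHODS_TUPLE = ("GET", "HEAD", "OPTIONS")
METHODS_SET = {"GET", "HEAD", "OPTIONS"}

t_tuple = timeit.timeit('"POST" in METHODS_TUPLE', globals=globals(), number=1_000_000)
t_set = timeit.timeit('"POST" in METHODS_SET', globals=globals(), number=1_000_000)
print(f"tuple: {t_tuple:.3f}s  set: {t_set:.3f}s")
```

Set membership is average O(1) while a tuple is a linear scan, but with only a handful of elements the absolute difference is tiny, and the set costs more memory, which matches the trade-off raised below.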
@dbaxa the net gain of the change is pretty minor, and we're just swapping memory usage for speed. Since it's not really part of what the PR was raised for, let's consolidate this to just the super_len encapsulation and I think we'll be set.
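To illustrate the kind of change being discussed — computing the body length only when it is actually needed — here is a minimal sketch. The function name and structure are illustrative only, not the actual `super_len` internals in requests:

```python
def body_content_length(body):
    # Illustrative sketch, not the real requests code: return a
    # Content-Length for sized bodies, or None when the length is
    # unknown (e.g. a streaming generator), so the length is only
    # computed for requests that actually carry a measurable body.
    if body is None:
        return None
    if isinstance(body, str):
        return len(body.encode("utf-8"))
    if isinstance(body, (bytes, bytearray)):
        return len(body)
    try:
        return len(body)
    except TypeError:
        return None  # unsized/streaming bodies: defer to chunked transfer


print(body_content_length(b"hello"))  # 5
print(body_content_length(x for x in [b"a"]))  # None
```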
@dbaxa so performance isn't the top concern for Requests. Further, evidence of improvements goes a long way towards convincing others.
2084872 to 71a05cf
@sigmavirus24 I can provide some evidence / data if desired. To be honest, some of the minor optimisations that can be made might not be suitable for requests to take. However, there are certainly some areas where requests could do with improvements (e.g. the slowness of the code executed to obtain proxy information inside an already-created session object for each generated request). I have a benchmark that shows that for one non-cached request & 9999 cached requests, using cachecontrol, ~10 seconds can be saved by setting

As an aside, is there a reason to prefer using a tuple instead of a set for the method check (other than the memory usage trade-off)?
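On the proxy-lookup slowness mentioned above: one general way to avoid re-reading the environment on every request is to cache the lookup once per process. This is a hypothetical sketch — the helper name and the environment-variable handling are illustrative, not the actual requests internals:

```python
import os
from functools import lru_cache


@lru_cache(maxsize=1)
def cached_environ_proxies():
    # Hypothetical helper: read proxy settings from the environment once
    # and reuse the mapping, instead of scanning os.environ per request.
    proxies = {}
    for scheme in ("http", "https"):
        value = os.environ.get(f"{scheme}_proxy") or os.environ.get(f"{scheme.upper()}_PROXY")
        if value:
            proxies[scheme] = value
    return proxies
```

Because `lru_cache(maxsize=1)` memoizes the result, repeated calls return the same mapping without touching `os.environ` again; the trade-off is that proxy changes made to the environment after the first call are not picked up.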
Agreed.
I'm not a maintainer, but in the past, PRs with no description of the changes and no prior discussion with the maintainers have often felt like "drive-by" contributions with no context and no clear value. That makes them easy to dismiss, forget about, or just close outright. I'd find these to be of higher quality if you explained (in a couple of sentences or less) how you found the problem, and the rough measured performance improvements (including OS, Python versions tested, etc.).
Yes well
I'm fairly certain cachecontrol is no longer maintained, unfortunately 🤷
Sure, I'll try to provide some more context in future pull requests.
That's fine, as users who don't want to hit the relative slowness encountered by
I am not sure. I have also yet to look into raising an improvement/suggestion for @ionrock's
Looks good, thanks @dbaxa!
No description provided.