Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid an unnecessary computation of the length of data for non-stream requests (which determine content-length based upon body content). #5496

Merged
merged 1 commit into from
Jun 17, 2020

Conversation

dbaxa
Copy link
Contributor

@dbaxa dbaxa commented Jun 16, 2020

No description provided.

… requests (which determines content-length based upon body content).
@dbaxa dbaxa changed the title Avoid an unnecessary computation of the length of data for non-stream requests (which determines content-length based upon body content). Avoid an unnecessary computation of the length of data for non-stream requests (which determine content-length based upon body content). Jun 16, 2020
Copy link
Member

@nateprewitt nateprewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch @dbaxa, this seems reasonable. Thanks!

Copy link
Member

@nateprewitt nateprewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to use a set here, was there a reason beyond avoiding a second iteration?

@dbaxa
Copy link
Contributor Author

dbaxa commented Jun 16, 2020

@nateprewitt when profiling requests, it resulted in a reduction in execution time & it makes more sense to me.

@nateprewitt
Copy link
Member

@dbaxa the net gain of the change is pretty minor and we're just swapping memory usage for speed. Since it's not really part of what the PR was raised for let's consolidate this to just the super_len encapsulation and I think we'll be set.

@sigmavirus24
Copy link
Contributor

@dbaxa so performance isn't the top concern for Requests. Further, evidence of improvements goes a long way towards convincing others

@dbaxa
Copy link
Contributor Author

dbaxa commented Jun 16, 2020

@sigmavirus24 I can provide some evidence / data if desired. To be honest some of the minor optimisations that can be made might not be suitable for requests to take. However, there are certainly some areas that requests could do with improvements with/to (e.g. the slowness of the code executed as part of obtaining proxy information inside of an already created session object for each/every generated request. I have a benchmark that shows that for one non-cached request & 9999 cached requests, using cachecontrol, ~10 seconds can be saved by setting trust_env to False. Although, arguably it should be possible for cachecontrol to be modified to offer a "fast path").

As an aside, is there a reason to prefer using a tuple instead of a set for the method check (other than the memory usage trade off) ?

@sigmavirus24
Copy link
Contributor

To be honest some of the minor optimisations that can be made might not be suitable for requests to take.

Agreed.

I can provide some evidence / data if desired.

I'm not a maintainer, but in the past PRs with no description of the changes without previously discussing them with the maintainers often feel like "drive-by" contributions with no context and no clear value. That makes them easy to dismiss, forget about, or just close outright. I'd find these to be a higher quality if you explained (in a couple sentences or less) how you found the problem, and the rough measured performance improvements (including OS, python versions tested, etc.).

However, there are certainly some areas that requests could do with improvements with/to (e.g. the slowness of the code executed as part of obtaining proxy information inside of an already created session object for each/every generated request. I have a benchmark that shows that for one non-cached request & 9999 cached requests, using cachecontrol, ~10 seconds can be saved by setting trust_env to False.

Yes well trust_env is likely to stay set to `True for the foreseeable future. I don't recall why we decided it was prudent to reparse every-time but I feel like there was an argument made in that direction when we did it.

Although, arguably it should be possible for cachecontrol to be modified to offer a "fast path"

I'm fairly certain cachecontrol is no longer maintained unfortunately

As an aside, is there a reason to prefer using a tuple instead of a set for the method check (other than the memory usage trade off) ?

🤷

@dbaxa
Copy link
Contributor Author

dbaxa commented Jun 16, 2020

I'm not a maintainer, but in the past PRs with no description of the changes without previously discussing them with the maintainers often feel like "drive-by" contributions with no context and no clear value. That makes them easy to dismiss, forget about, or just close outright. I'd find these to be a higher quality if you explained (in a couple sentences or less) how you found the problem, and the rough measured performance improvements (including OS, python versions tested, etc.).

Sure, I'll try to provide some more context in future pull requests.

Yes well trust_env is likely to stay set to `True for the foreseeable future. I don't recall why we decided it was prudent to reparse every-time but I feel like there was an argument made in that direction when we did it.

That's fine as users who want not to hit the relative slowness encountered by trust_env can disable it - however we should also consider seeing what we can do to reduce the time that the code that executes when trust_env is enabled takes.

I'm fairly certain cachecontrol is no longer maintained unfortunately

I am not sure. I have also yet to look into raising an improvement/suggestion for @ionrock 's cachecontrol to offer the ability to check if a request to a certain url would be cached or not without needing to create a request object (or use a mock request). It would be excellent to be able to make use of the logic that has already been implemented in cachecontrol in a way that would let user's have a "fast path" to obtaining cached responses/content.

@dbaxa dbaxa requested a review from nateprewitt June 17, 2020 00:29
Copy link
Member

@nateprewitt nateprewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks @dbaxa!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants