Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent unnecessary should_bypass_proxies checks in rebuild_proxies #5894

Conversation

dbaxa
Copy link
Contributor

@dbaxa dbaxa commented Jul 29, 2021

Signed-off-by: David Black dblack@atlassian.com

@dbaxa
Copy link
Contributor Author

dbaxa commented Jul 29, 2021

I noticed #5893 prior to pushing the exact same fix for the performance regression noticed that I had made locally. This PR extracts the re-building proxies performance regression fix from #5893 to have it be in its own pr/change . cc
@omermizr

@@ -633,7 +633,8 @@ def send(self, request, **kwargs):
kwargs.setdefault('stream', self.stream)
kwargs.setdefault('verify', self.verify)
kwargs.setdefault('cert', self.cert)
kwargs.setdefault('proxies', self.rebuild_proxies(request, self.proxies))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sigmavirus24 this change is needed because without it we are calling to re-build proxies from the environment even when proxies have been set/provided.

In one use test case I have setup - the time to "retrieve" a cached connection (using the cache control adapter) 10,000 times goes from 9.741 seconds and 25212240 function calls to 9902240 function calls & 6.169 seconds after applying this change.

Copy link
Contributor Author

@dbaxa dbaxa Jul 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this change ->


   Ordered by: internal time
   List reduced from 543 to 30 due to restriction <30>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1410001    0.698    0.000    1.729    0.000 /usr/lib/python3.9/os.py:674(__getitem__)
  1510005    0.598    0.000    3.054    0.000 /usr/lib/python3.9/_collections_abc.py:848(__iter__)
  2760000    0.447    0.000    0.760    0.000 /usr/lib/python3.9/os.py:758(decode)
    10000    0.406    0.000    3.401    0.000 /usr/lib/python3.9/urllib/request.py:2486(getproxies_environment)
  1410001    0.403    0.000    0.640    0.000 /usr/lib/python3.9/os.py:754(encode)
  3010069    0.335    0.000    0.335    0.000 {method 'decode' of 'bytes' objects}
  1400000    0.321    0.000    0.689    0.000 /usr/lib/python3.9/os.py:697(__iter__)
        1    0.308    0.308    0.308    0.308 {method 'do_handshake' of '_ssl._SSLSocket' objects}
  2360120    0.258    0.000    0.380    0.000 {built-in method builtins.isinstance}
    70004    0.179    0.000    0.536    0.000 /usr/lib/python3.9/_collections_abc.py:932(update)
  1760141    0.165    0.000    0.165    0.000 {method 'lower' of 'str' objects}
        1    0.164    0.164    0.164    0.164 {method 'read' of '_ssl._SSLSocket' objects}
1490043/1490041    0.163    0.000    0.163    0.000 {method 'encode' of 'str' objects}
        1    0.147    0.147    0.147    0.147 {method 'connect' of '_socket.socket' objects}
        ...

After this change ->


   Ordered by: internal time
   List reduced from 535 to 30 due to restriction <30>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.326    0.326    0.326    0.326 {method 'do_handshake' of '_ssl._SSLSocket' objects}
        1    0.191    0.191    0.191    0.191 {method 'read' of '_ssl._SSLSocket' objects}
    70004    0.185    0.000    0.553    0.000 /usr/lib/python3.9/_collections_abc.py:932(update)
   890120    0.170    0.000    0.298    0.000 {built-in method builtins.isinstance}
        1    0.150    0.150    0.150    0.150 {method 'connect' of '_socket.socket' objects}
        ...

@omermizr
Copy link

@dbaxa The only thing that looks kinda wonky here to me is that rebuild_proxies has a side effect of removing the Proxy-Authorization header (see #5888). It's probably not the intended behavior (as stated in the issue), but I'm not familiar with the considerations.
Anyway, if it shouldn't do that - then it shouldn't do that even when proxies is not set, and if it should do that - after this PR we no longer do. So this introduces inconsistent behavior...

@dbaxa
Copy link
Contributor Author

dbaxa commented Jul 29, 2021

@omermizr I am not sure I entirely follow what you have said.

I'll re-phase what I think you are suggesting here & add some of my own understanding - which might help -

  1. If proxies have been set we don't need to call rebuild_proxies
  2. If proxies have been set we do call rebuild_proxies
  3. Calling rebuild_proxies removes the Proxy-Authorization header
  4. So if we explicitly set proxy information and Proxy-Authorization has been explicitly set - we won't remove the Proxy-Authorization header <- this sounds like it would work as expected to me, unless the header should be removed if we are to bypass proxying later.
  5. rebuild_auth Does not interact with the Proxy-Authorization header but it does interact with a Authorization header
  6. rebuild_proxies does remove any explicitly set Proxy-Authorization header - as it will try to fill it in from/via get_auth_from_url (if a defined proxy has username & password information as part of the url - it will attempted to be used).
  7. should_bypass_proxies seems to be the main slow path that we should be avoiding if proxies have explicitly been set.

Therefore imho the better fix is to do one of the following:

  • Provide a fast(er) path inside of rebuild_proxies such that we do not call should_bypass_proxies - at least not when proxies have been explicitly provided and/or possibly not when trust_env is set to false (as we don't trust the env and so there is no need to examine the environment)
  • cache and or make obtaining proxies from env faster (the maintainers seem not to want to do this)
  • Do as you did for https://github.com/psf/requests/pull/5893/files and have rebuild_auth interact with proxy auth header - but this modifies public api code

Not removing Proxy-Authorization when proxy information has been explicitly set is an issue as we might not end up selecting the proxy for the given scheme iirc.

@@ -289,7 +289,9 @@ def rebuild_proxies(self, prepared_request, proxies):
new_proxies = proxies.copy()
no_proxy = proxies.get('no_proxy')

bypass_proxy = should_bypass_proxies(url, no_proxy=no_proxy)
bypass_proxy = False
if self.trust_env:
Copy link
Contributor Author

@dbaxa dbaxa Jul 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@omermizr / @sigmavirus24 / @nateprewitt - this change fixes the performance regression of #5891 but does not fix #5888. IMHO fixing #5888 is something that you might do by checking if a proxy has been selected as if one has not been selected you likely would not want to leak proxy credentials. Of course it might be easier to suggest that we do not explicitly support setting Proxy-Authorization as a header and that users must specify proxy credentials in the proxy url.

@dbaxa
Copy link
Contributor Author

dbaxa commented Jul 29, 2021

Before the revised changed ->


   Ordered by: internal time
   List reduced from 543 to 30 due to restriction <30>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   950001    0.583    0.000    1.435    0.000 /usr/lib/python3.9/os.py:674(__getitem__)
  1050005    0.519    0.000    2.563    0.000 /usr/lib/python3.9/_collections_abc.py:848(__iter__)
  1840000    0.372    0.000    0.632    0.000 /usr/lib/python3.9/os.py:758(decode)
    10000    0.341    0.000    2.815    0.000 /usr/lib/python3.9/urllib/request.py:2486(getproxies_environment)
   950001    0.330    0.000    0.525    0.000 /usr/lib/python3.9/os.py:754(encode)
        1    0.310    0.310    0.310    0.310 {method 'do_handshake' of '_ssl._SSLSocket' objects}
  2090069    0.289    0.000    0.289    0.000 {method 'decode' of 'bytes' objects}
  1900120    0.269    0.000    0.424    0.000 {built-in method builtins.isinstance}
   940000    0.264    0.000    0.569    0.000 /usr/lib/python3.9/os.py:697(__iter__)
    70004    0.210    0.000    0.634    0.000 /usr/lib/python3.9/_collections_abc.py:932(update)
        1    0.178    0.178    0.178    0.178 {method 'read' of '_ssl._SSLSocket' objects}
  1530141    0.175    0.000    0.175    0.000 {method 'lower' of 'str' objects}
  ...

After the revised change ->

         10082240 function calls (10062186 primitive calls) in 6.860 seconds

   Ordered by: internal time
   List reduced from 536 to 30 due to restriction <30>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.294    0.294    0.294    0.294 {method 'do_handshake' of '_ssl._SSLSocket' objects}
    70004    0.201    0.000    0.606    0.000 /usr/lib/python3.9/_collections_abc.py:932(update)
   910120    0.192    0.000    0.336    0.000 {built-in method builtins.isinstance}
        1    0.189    0.189    0.189    0.189 {method 'read' of '_ssl._SSLSocket' objects}
        1    0.145    0.145    0.145    0.145 {method 'connect' of '_socket.socket' objects}
        ...

@dbaxa
Copy link
Contributor Author

dbaxa commented Aug 12, 2021

bump cc @sigmavirus24 / @nateprewitt

requests/sessions.py Outdated Show resolved Hide resolved
@nateprewitt nateprewitt changed the title As per #5893 & #5891 address a performance regression by not rebuilding proxies if proxies have been supplied. Prevent unnecessary should_bypass_proxies checks in rebuild_proxies Aug 12, 2021
@dbaxa
Copy link
Contributor Author

dbaxa commented Aug 26, 2021

@nateprewitt can we merge this ?

@nateprewitt
Copy link
Member

I think this is alright as a minor optimization, but doesn't address either of the original issues. I'd like to get those refactored and resolved before we merge anymore changes into the method.

@dbaxa
Copy link
Contributor Author

dbaxa commented Aug 26, 2021

@nateprewitt imho it is slightly more than a minor optimisation (for certain use cases) & I have a concern that fixing/re-factoring the related code is likely to be complex and take a fairly long time. I am not sure how much #5894 (comment) applies to what you feel we should have as the result though. Mainly this part ->

have rebuild_auth interact with proxy auth header - but this modifies public api code

@nateprewitt
Copy link
Member

nateprewitt commented Aug 27, 2021

@nateprewitt imho it is slightly more than a minor optimisation (for certain use cases)

Sure, I was conveying it doesn't affect the hot path for the average user. There may be larger improvements for more niche uses.

I have a concern that fixing/re-factoring the related code is likely to be complex and take a fairly long time. I am not sure how much #5894 (comment) applies to what you feel we should have as the result though.

That may be true, I've left my thoughts on moving forward in the issue, but I don't know if we've arrived at a consensus. Either way, #5893 is a blocker for the next release of Requests, so merging this now or later has no effect on when it's available publicly.

@nateprewitt
Copy link
Member

Hey @dbaxa, now that #5924 is merged would you mind rebasing this change onto the new resolve_proxies function in utils.py?

@nateprewitt nateprewitt added this to the 2.27.0 milestone Dec 29, 2021
Co-authored-by: David Black <dblack@atlassian.com>
@nateprewitt nateprewitt force-pushed the do-not-re-build-proxies-when-proxies-have-been-provided branch from 7442f4e to 0d5347e Compare December 29, 2021 05:12
@nateprewitt nateprewitt changed the base branch from master to main December 29, 2021 05:12
@nateprewitt nateprewitt mentioned this pull request Dec 29, 2021
@nateprewitt nateprewitt merged commit 77d1e9a into psf:main Dec 29, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 29, 2022
This pull request was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants