Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Session for HTTP requests #738

Closed
wants to merge 11 commits into from
Closed

Use Session for HTTP requests #738

wants to merge 11 commits into from

Conversation

hokieg3n1us
Copy link
Contributor

@hokieg3n1us hokieg3n1us commented Aug 2, 2021

This proposed change would resolve any issues similar to issue #17 .

Description: Include a dictionary of configuration for a requests Session that is instantiated by the downloader, including keys for auth, cert, and verify. This allows a user to modify that session by providing a tuple of username and password for basic authentication, or provide client and server certificates. It should also improve performance when making repeated requests to the same host by re-using the underlying urllib3 connection.

Example Usage:

import osmnx as ox

session_config = {}
session_config['auth'] = ('username', 'password')
session_config['cert'] = '/path/to/cert'
session_config['verify'] = '/path/to/CA/'

osmnx.config(session_config=session_config)

hokieg3n1us and others added 7 commits August 1, 2021 21:25
Include a default requests.Session as part of settings that is used for HTTP requests to OSM APIs, and can be overridden using the utils.config. This should improve performance by re-using the underlying TCP connection when sending multiple requests to the same host, and allows a user of osmnx to override the session to provide client/server certificates when making request to any OSM API that uses 2-way SSL verification.
Correct formatting for imports.
Update downloader.py to fix out-of-scope variable.
Re-format code using black (for line length)
Switched to Ubuntu dev machine to properly run lint_test.sh, after introducing formatting issue through Github merge conflict interface.
@codecov
Copy link

codecov bot commented Aug 11, 2021

Codecov Report

Merging #738 (3c35302) into main (81db14d) will decrease coverage by 0.12%.
The diff coverage is 87.50%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #738      +/-   ##
==========================================
- Coverage   96.77%   96.65%   -0.13%     
==========================================
  Files          24       24              
  Lines        2422     2423       +1     
==========================================
- Hits         2344     2342       -2     
- Misses         78       81       +3     
Impacted Files Coverage Δ
osmnx/downloader.py 95.26% <86.36%> (-1.44%) ⬇️
osmnx/settings.py 100.00% <100.00%> (ø)
osmnx/utils.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 81db14d...3c35302. Read the comment docs.

@gboeing
Copy link
Owner

gboeing commented Aug 11, 2021

Thanks @hokieg3n1us let me think about this. Two questions for you:

  1. Would requests.Session have any conflicts/redundancies with the new explicit IP address resolution fixes released in v1.1.1 (Resolve Overpass API requests to a specific IP address #699)?
  2. Could you add more detail about why/how a user would use this in the docstring?

Updated docstring to better describe how to use session. Since Session creates a persistent TCP connection that is used when requests are sent to the same host/port, remove redundant fix from #699. Unable to duplicate network error from tests after
running multiple times on Python 3.6, 3.7, 3.8, and 3.9.
@hokieg3n1us
Copy link
Contributor Author

@gboeing

  1. You're correct that it should be redundant. The requests.Session will create a persistent urllib3 connection that will be re-used for requests to the same host/port, so I backed out the _config_dns function.
  2. I updated the docstring. Maybe we should link to Requests documentation for Advanced Usage to describe the customization that can be done on the session.

Diagnosing the intermittent test failures, determined that the root cause was the server closing the connection due to inactivity. Refactored the settings to expose a dictionary of attributes that can be configured for a requests.Session that is instantiated when a call is made to _osm_network_download, _osm_geometries_download, or nominatum_request. Less elegant, but should allow users to provide authentication and 2-way SSL configuration, while also improving performance during repeated calls to the overpass API.
@hokieg3n1us
Copy link
Contributor Author

@gboeing

I diagnosed the intermittent failures (inactivity on the underlying TCP connection causing the server to close), and refactored significantly to still gain the benefits of using a requests Session, while providing some of the more important configuration options to a user. More of the underlying Session parameters can be exposed as necessary.

Properly check if key exists in session_config, so a user can only provide keys as necessary for their environment.
moving files used for local testing of different python versions
Copy link
Owner

@gboeing gboeing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left just a few comments. My main overall question would be: have you tested this to confirm consistent IP address resolution in line with what #699 provided? If so, could you document those test methods/results here? Thanks!

Comment on lines +683 to +684
utils.log(f"Pausing {this_pause} seconds before making HTTP POST request")
time.sleep(this_pause)
Copy link
Owner

@gboeing gboeing Aug 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These need to be indented back out of the if block.

Copy link
Contributor Author

@hokieg3n1us hokieg3n1us Aug 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That produces an out of scope variable reference to this_pause similar to the recently fixed issue for the response status code.

Comment on lines +211 to +214
and verify as keys. auth is a tuple of (username, password). cert is the
path to a single file (containing the private key and certificate) or a tuple
of both files' paths. verify can be the path to a CA_BUNDLE file, or directory
of trusted CAs.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be easier and more future-proof to just point the user toward the requests documentation on requests.Session.auth, cert, and path since we're just passing these values through to those requests.Session attributes.

@@ -275,7 +278,7 @@ def _get_pause(base_endpoint, recursive_delay=5, default_duration=60):

try:
url = base_endpoint.rstrip("/") + "/status"
response = settings.session.get(url, headers=_get_http_headers())
response = session.get(url, headers=_get_http_headers())
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_create_session is not called in the _get_pause function, but it is called in the other functions that make HTTP requests to the Overpass server?

@@ -451,6 +454,8 @@ def _osm_network_download(polygon, network_type, custom_filter):
else:
osm_filter = _get_osm_filter(network_type)

session = _create_session()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice that _create_session is called 3 times (i.e. separately in the different "download" functions). Is there a reason to do it this way, rather than creating a session one time, for example whenever ox.config is called, and then just using that session throughout?

@hokieg3n1us
Copy link
Contributor Author

I am closing this pull request because I cannot reproduce the intermittent issues causing the remote server to close the connection, and that's difficult to debug properly without more information about OSM's service layer. It's also difficult to elegantly construct a session, without refactoring the downloader.py to be instantiated classes that live/die during graph construction since you can't create a single Session to be used for the entirety of osmnx, because the client can't hold open an inactive TCP connection.

I plan to identify a route of cleanly passing cert, auth, and verify directly to the requests.post/requests.get.

@hokieg3n1us hokieg3n1us deleted the use-requests-session branch August 12, 2021 17:59
@gboeing
Copy link
Owner

gboeing commented Aug 12, 2021

New PR for development at #745.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants