Overpass status endpoint provides incorrect results #698
Comments
Thanks! I believe the approach based on a numeric hostname might break with https. I've found a blog post on this topic, but I haven't done more research on the specifics. Maybe something to keep in mind for a future implementation: https://www.tomechangosubanana.com/2017/forcing-python-requests-to-connect-to-a-specific-ip-address/
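To picture the https concern: the server's certificate is issued for the name overpass-api.de, not for a bare IP address, so a client that connects to the IP still has to tell TLS which name to verify. Below is a minimal standard-library sketch of that idea. It is not the approach from the linked post and not OSMnx code; `make_verifying_context` and `open_tls_to_ip` are hypothetical helper names.

```python
import socket
import ssl


def make_verifying_context():
    # The default context verifies both the certificate chain and the hostname.
    return ssl.create_default_context()


def open_tls_to_ip(ip, hostname, port=443, timeout=10):
    """Connect to one specific IP, but verify the certificate against `hostname`."""
    context = make_verifying_context()
    raw = socket.create_connection((ip, port), timeout=timeout)
    try:
        # server_hostname sets SNI and is the name the certificate is checked against,
        # so verification does not fail merely because we dialed an IP address.
        return context.wrap_socket(raw, server_hostname=hostname)
    except Exception:
        raw.close()
        raise
```

Something like `open_tls_to_ip("178.63.11.215", "overpass-api.de")` would then pin the connection to one server while keeping certificate checks on; whether a higher-level HTTP library can be made to do the same is a separate question.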
@mmd-osm you were right about that. I found a potential workaround for https, but now I'm wondering if this is all necessary: as far as I can tell, requesting the "status" endpoint or the "interpreter" endpoint always resolves to the same IP address as the other. For example:

```python
import requests

with requests.get("https://overpass-api.de/api/status", stream=True) as r:
    print(r.raw._original_response.fp.raw._sock.getpeername()[0])
with requests.get("https://overpass-api.de/api/interpreter", stream=True) as r:
    print(r.raw._original_response.fp.raw._sock.getpeername()[0])
```

This gets the "status" then "interpreter" endpoints and prints the host's IP address for each request. The IP address is always the same across the two requests. Sometimes they are both 178.63.11.215. Other times, when I change my own IP address, they are both 178.63.48.217. But the two requests always resolve to the same IP address. As far as I can tell, the round-robin DNS causes every request I make to any URL at overpass-api.de to resolve to the same IP address as any other (until I change my own IP address). So the slot management results I see at the "status" endpoint will always correspond to the same server I'm querying at the "interpreter" endpoint, right?
I'm still looking for some good references describing the actual behavior, although I suspect this might be entirely implementation dependent. https://en.wikipedia.org/wiki/Round-robin_DNS mentions client-side address caching and reuse in the Drawbacks section, which is very likely what you're seeing here. By the way, I tried your code example above on two different boxes, running Ubuntu 20.04 and Raspbian Linux. In both cases, I'm getting different IP addresses across the two requests.
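To compare what each machine's resolver actually hands back, a small sketch like the following could help. It assumes only the standard-library `socket` module; `resolve_all_ipv4` is a hypothetical helper name, and the output depends entirely on the local resolver and its cache.

```python
import socket


def resolve_all_ipv4(hostname, port=443):
    """Return the sorted, de-duplicated IPv4 addresses the local resolver reports."""
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return sorted({info[4][0] for info in infos})
```

Calling `resolve_all_ipv4("overpass-api.de")` on each box would show whether the resolver returns one address or several. Which of them a given request then connects to is up to the client, which may be why results differ between systems.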
Fascinating. I tested this on Ubuntu 20.04 and Windows 10, and have only been able to get the host IP address to change by changing my own IP, and the two requests' resolved IPs always match each other.
@mmd-osm I have a proposed fix in #699 that seems to work properly for both http and https requests. Would you mind taking a quick look at it? I want to make sure OSMnx is being a good consumer of the Overpass API: hopefully this fix will ensure the status endpoint gives it correct results for the server being queried at the interpreter endpoint. It appears to now, in my testing.
Apologies for the delay, I wanted to give the new code a try and run at least the unit tests. Everything looks fine here: the status query and the subsequent interpreter calls all use the same IP. Switching between both servers also looks fine. For your reference, I have attached one of my log files: log.txt
I've noticed two unrelated points, which might be a topic for another issue.
Since about three years, Thanks again for looking into this topic!
Thanks @mmd-osm.
It appears to be working, but requires an API key which the tests don't currently use. Thanks for the heads-up about the
I've been noticing mysteriously long pauses required by Overpass's status endpoint recently. There may be an explanation now... it's possible that the Overpass status endpoint provides "incorrect" results due to load balancing/redirection among multiple subdomains now. In #697 @mmd-osm says:
and
So, my understanding is that the domain overpass-api.de just redirects to one of its subdomains (currently z.overpass-api.de and lz4.overpass-api.de). So if we check the status endpoint of overpass-api.de, we may see results for subdomain z, but when we submit the query itself it gets redirected to subdomain lz4. This could result in violating the slot management timing for that server, and potentially a long delay before the next query is allowed. This may explain the particularly long delay times between queries I've seen in recent months.
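To make the slot-timing concern concrete, here is a rough sketch of turning a status response into a wait time. The line formats it matches ("N slots available now." and "Slot available after: <timestamp>, in N seconds.") are an assumption about what the status endpoint returns, and `seconds_until_slot` is a hypothetical helper, not OSMnx's implementation.

```python
import re


def seconds_until_slot(status_text):
    """Return 0 if a slot is free, else the shortest wait in seconds.

    Returns None when the text matches neither assumed format.
    """
    if re.search(r"\d+ slots? available now", status_text):
        return 0
    waits = [
        int(seconds)
        for seconds in re.findall(
            r"Slot available after: \S+, in (\d+) seconds", status_text
        )
    ]
    return min(waits) if waits else None
```

Whatever number comes out only describes the server that answered the status request. If the query then lands on the other server, the computed pause says nothing about that server's slots.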
I believe we can do the decoupling suggested by @mmd-osm in Python with the `socket` module. For example, if we run:
we see:
but if I change my computer's IP address and run it again, we see:
So, it looks like `socket.gethostbyname(domain)` does return the IP address of one of its subdomains when you pass in "overpass-api.de". We could then submit all Overpass API requests directly to the IP address resolved by `socket.gethostbyname(domain)`. Something like:
which prints:
This seems to work. I'll try to put together a PR. @mmd-osm let me know if I'm making any naive logical errors here.
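For readers following along, the resolve-once idea could be sketched roughly as below. This is an illustration under stated assumptions, not the code originally posted here or the eventual PR: `pin_url_to_ip` is a hypothetical helper, and it only covers plain http, where sending the original name in a `Host` header is enough.

```python
import socket
from urllib.parse import urlsplit, urlunsplit


def pin_url_to_ip(url, ip=None):
    """Return (pinned_url, headers) that address `url`'s host by one fixed IP."""
    parts = urlsplit(url)
    hostname = parts.hostname
    if ip is None:
        # One DNS lookup; the caller should reuse the result for later requests.
        ip = socket.gethostbyname(hostname)
    # Keep an explicit port if the original URL had one.
    netloc = ip if parts.port is None else f"{ip}:{parts.port}"
    pinned = urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )
    # The server still needs the original name to route the request.
    return pinned, {"Host": hostname}
```

The intended usage would be to resolve "overpass-api.de" once, then pass that same `ip` when pinning both the status URL and the interpreter URL, so both requests reach the same server.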