Overpass status endpoint provides incorrect results #698

Closed
gboeing opened this issue Apr 28, 2021 · 7 comments · Fixed by #699
@gboeing
Owner

gboeing commented Apr 28, 2021

I've been noticing mysteriously long pauses required by Overpass's status endpoint recently. There may be an explanation now... it's possible that the Overpass status endpoint provides "incorrect" results due to load balancing/redirection among multiple subdomains now. In #697 @mmd-osm says:

/api/status may show incorrect results for overpass-api.de, as there are really two independent servers behind this URL. However, /api/status always reflects the status of a single server only. There's currently no workaround for it, unless you're specifically targeting one of the two machines.

and

It's a bit unfortunate that /api/status hasn't been designed in a way to handle multiple servers in a consistent way. Due to this design issue, it's hard to recommend some best practice. At this time, I wouldn't hardcode any of the subdomains, as they might change over time: new servers might get added, or existing ones decommissioned and replaced by bigger machines.

I don't know if it's feasible for osmnx to decouple the "overpass-api.de" name server lookup from sending the actual request to the server. As an example, curl offers the command line option --resolve to influence name resolution, thereby directing the request to one specific IP address. By using the same IP address for both queries and /api/status, you should get a consistent picture.

The clear downside of this approach is that you would essentially need to re-implement DNS round robin, to make sure a request uses another server in case of an issue.

So, my understanding is that the domain overpass-api.de just redirects to one of its subdomains (currently z.overpass-api.de and lz4.overpass-api.de). So if we check the status endpoint of overpass-api.de, we may see results for subdomain z, but when we submit the query itself it gets redirected to subdomain lz4. This could result in violating the slot management timing for that server, and potentially a long delay before the next query is allowed. This may explain the particularly long delay times between queries I've seen in recent months.

I believe we can do the decoupling suggested by @mmd-osm in Python with the socket module.

import socket
socket.gethostbyname("localhost")
# returns 127.0.0.1

For example if we run:

domains = ["overpass-api.de", "z.overpass-api.de", "lz4.overpass-api.de"]
ips = [socket.gethostbyname(d) for d in domains]
for d, ip in zip(domains, ips):
    print(d, "\t", ip)

we see:

overpass-api.de 	 178.63.11.215
z.overpass-api.de 	 178.63.11.215
lz4.overpass-api.de 	 178.63.48.217

but if I change my computer's IP address and run it again, we see:

overpass-api.de 	 178.63.48.217
z.overpass-api.de 	 178.63.11.215
lz4.overpass-api.de 	 178.63.48.217

So, it looks like socket.gethostbyname(domain) does return the IP address of one of its subdomains when you pass in "overpass-api.de". We could then submit all Overpass API requests directly to the IP address resolved by socket.gethostbyname(domain). Something like:

import socket
from urllib.parse import urlparse
import requests
overpass_endpoint = "http://overpass-api.de/api"
# extract the hostname from the endpoint URL
domain = urlparse(overpass_endpoint).hostname
# resolve the hostname once, then address that server by IP from here on
ip = socket.gethostbyname(domain)
overpass_endpoint = overpass_endpoint.replace(domain, ip)
print(domain, ip, overpass_endpoint)
print(requests.get(overpass_endpoint + "/status").text)

which prints:

overpass-api.de 178.63.48.217 http://178.63.48.217/api
Connected as: 2333923515
Current time: 2021-04-28T18:09:53Z
Rate limit: 2
2 slots available now.
Currently running queries (pid, space limit, time limit, start time):
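
For what it's worth, here is a rough sketch of how a client could read slot availability out of a status response like the one above. The function name `parse_overpass_status` is hypothetical, and the "Slot available after: ..., in N seconds." line format is an assumption about what the server prints when no slot is free (it does not appear in the output above).

```python
import re


def parse_overpass_status(status_text):
    """Return (rate_limit, slots_available_now, seconds_to_wait).

    Hypothetical helper. Assumes busy slots are reported as
    "Slot available after: <timestamp>, in <N> seconds."
    """
    rate_limit = None
    slots_now = 0
    waits = []
    for line in status_text.splitlines():
        line = line.strip()
        m = re.match(r"Rate limit: (\d+)", line)
        if m:
            rate_limit = int(m.group(1))
            continue
        m = re.match(r"(\d+) slots? available now", line)
        if m:
            slots_now = int(m.group(1))
            continue
        m = re.match(r"Slot available after: \S+ in (\d+) seconds", line)
        if m:
            waits.append(int(m.group(1)))
    # no need to wait if a slot is free; otherwise wait for the soonest one
    wait = 0 if slots_now > 0 or not waits else min(waits)
    return rate_limit, slots_now, wait


sample = (
    "Connected as: 2333923515\n"
    "Current time: 2021-04-28T18:09:53Z\n"
    "Rate limit: 2\n"
    "2 slots available now.\n"
    "Currently running queries (pid, space limit, time limit, start time):\n"
)
print(parse_overpass_status(sample))  # (2, 2, 0)
```

The result is only meaningful if the status text came from the same server that will receive the query, which is the whole point of this issue.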

This seems to work. I'll try to put together a PR. @mmd-osm let me know if I'm making any naive logical errors here.

@mmd-osm

mmd-osm commented Apr 28, 2021

Thanks! I believe the approach based on a numeric hostname might break in case of https. I've found a blog post on this topic, but I haven't done more research on any specifics. Maybe something to keep in mind for a future implementation: https://www.tomechangosubanana.com/2017/forcing-python-requests-to-connect-to-a-specific-ip-address/
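
A possible alternative, sketched here only as an illustration: leave the hostname in the URL (so certificate checks still see "overpass-api.de") and instead pin name resolution inside the Python process by wrapping socket.getaddrinfo. The names `pin_hostname`, `pin_to_current_ip`, and `unpin` are hypothetical, and this is not necessarily what any eventual fix does. It assumes the HTTP library looks up `socket.getaddrinfo` at call time, which I believe is the case for requests/urllib3.

```python
import socket

# keep a reference to the real resolver so it can be delegated to and restored
_original_getaddrinfo = socket.getaddrinfo


def pin_hostname(hostname, ip):
    """Make every lookup of `hostname` in this process resolve to `ip`."""

    def _pinned_getaddrinfo(host, *args, **kwargs):
        if host == hostname:
            # substitute the pinned address; all other arguments pass through
            return _original_getaddrinfo(ip, *args, **kwargs)
        return _original_getaddrinfo(host, *args, **kwargs)

    socket.getaddrinfo = _pinned_getaddrinfo


def pin_to_current_ip(hostname):
    """Resolve `hostname` once (needs network) and pin it to that address."""
    ip = socket.gethostbyname(hostname)
    pin_hostname(hostname, ip)
    return ip


def unpin():
    """Restore the original resolver."""
    socket.getaddrinfo = _original_getaddrinfo
```

Calling `pin_to_current_ip("overpass-api.de")` once before the first request should then send both the /api/status and /api/interpreter requests to the same machine, while the URL (and so the TLS hostname) stays unchanged. The downside mentioned earlier still applies: if the pinned server goes away, something has to call `unpin()` and resolve again.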

@gboeing
Owner Author

gboeing commented Apr 29, 2021

@mmd-osm you were right about that. I found a potential workaround for https, but now I'm wondering if this is all necessary: as far as I can tell, requesting the "status" endpoint or the "interpreter" endpoint always resolves to the same IP address as the other. For example:

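import requests
# NOTE: this reaches into private urllib3/http.client attributes to read the
# peer address of the underlying socket, so it may break across library versions
with requests.get("https://overpass-api.de/api/status", stream=True) as r:
    print(r.raw._original_response.fp.raw._sock.getpeername()[0])
with requests.get("https://overpass-api.de/api/interpreter", stream=True) as r:
    print(r.raw._original_response.fp.raw._sock.getpeername()[0])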

This gets the "status" then "interpreter" endpoints and prints the host's IP address for each request. The IP address is always the same across the two requests. Sometimes they are both 178.63.11.215. Other times, when I change my own IP address, they are both 178.63.48.217. But the two requests always resolve to the same IP address.

As far as I can tell, the round-robin DNS causes every request I make to any URL at overpass-api.de to resolve to the same IP address as any other (until I change my own IP address). So the slot management results I see at the "status" endpoint will always correspond to the same server I'm querying at the "interpreter" endpoint, right?

@mmd-osm

mmd-osm commented Apr 29, 2021

I'm still looking for some good references describing the actual behavior, although I'm suspecting this might be entirely implementation dependent.

https://en.wikipedia.org/wiki/Round-robin_DNS mentions client-side address caching and reuse in the Drawbacks section, which is very likely what you're seeing here.

By the way, I tried your code example above on two different boxes, running Ubuntu 20.04 and Raspbian Linux. In both cases, I'm getting deviating IP addresses across both requests:

python3 demo.py
178.63.11.215
178.63.48.217
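
To check what the local resolver actually hands back for the name, something like the following sketch could be used. The helper name `resolve_all_ipv4` is hypothetical. Whether it returns one address or several, and in which order they are tried, depends on the OS resolver and any caching in between, so the result may well differ from machine to machine.

```python
import socket


def resolve_all_ipv4(hostname):
    """Return the sorted, de-duplicated IPv4 addresses the resolver reports."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (address, port) pair
    return sorted({info[4][0] for info in infos})
```

For example, `resolve_all_ipv4("overpass-api.de")` would list every A record the resolver currently knows for that name. More than one entry would mean two consecutive requests are free to land on different servers.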

@gboeing
Owner Author

gboeing commented Apr 29, 2021

Fascinating. I tested this on Ubuntu 20.04 and Windows 10, and have only been able to get the host IP address to change by changing my own IP, and the two requests' resolved IPs always match each other.

@gboeing
Owner Author

gboeing commented May 4, 2021

@mmd-osm I have a proposed fix in #699 that seems to work properly for both http and https requests. Would you mind taking a quick look at it? I want to make sure OSMnx is being a good consumer of the Overpass API: hopefully this fix will ensure the status endpoint gives it correct results for the server being queried at the interpreter endpoint. It appears to now, in my testing.

@mmd-osm

mmd-osm commented May 8, 2021

Apologies for the delay, I wanted to give the new code a try and run at least the unit tests. Everything looks fine here: the status query and the subsequent interpreter calls all use the same IP. Switching between both servers also looks fine. For your reference, I have attached one of my log files: log.txt

I've noticed two unrelated points, which might be a topic for another issue.

  1. open.mapquestapi.com 403 -> I believe this service is no longer around.

  2. Shorter Overpass QL statements

For about three years now, (node(poly: ...); way(poly:...); rel(poly:...);); can be written in the abbreviated form nwr(poly: ...);. https://dev.overpass-api.de/blog/loop_and_group.html#nwr has more details on this syntax enhancement.
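
As a small illustration of the difference, here is a sketch that builds both forms as strings in Python. The function names and the polygon coordinates are made up for the example, and this is not how osmnx necessarily assembles its queries.

```python
def poly_statement_verbose(poly):
    """Union of separate node, way, and relation statements for one polygon."""
    return f'(node(poly:"{poly}");way(poly:"{poly}");rel(poly:"{poly}"););'


def poly_statement_nwr(poly):
    """The same selection using the abbreviated nwr statement."""
    return f'nwr(poly:"{poly}");'


# hypothetical polygon given as space-separated "lat lon" pairs
poly = "50.70 7.10 50.70 7.20 50.75 7.15"
print(poly_statement_verbose(poly))
print(poly_statement_nwr(poly))
```

Besides being shorter, the nwr form repeats the polygon coordinate string once instead of three times, which matters when the polygon is long.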

Thanks again for looking into this topic!

@gboeing
Owner Author

gboeing commented May 9, 2021

Thanks @mmd-osm.

"I believe this service is no longer around."

It appears to be working, but requires an API key which the tests don't currently use. Thanks for the heads-up about the nwr syntax: I'll take a look!
