
too many elements cause 414 error #62

Open
jose1711 opened this issue Sep 29, 2016 · 8 comments

@jose1711

Attempting to download a large number of elements (say, using the WaysGet method) ends in a 'Request-URI Too Long' error. It would be nice if osmapi could handle this by splitting the request into chunks.

@austinhartzheim
Contributor

@jose1711 Could you provide an example Way ID that causes this error to occur? I'd be interested in looking into this more if @metaodi thinks we should support this feature.

@jose1711
Author

jose1711 commented Oct 7, 2016

Well, it's really just a very long list of way IDs to download that triggers the error, like wayid1, wayid2, ..., wayid999999.

@metaodi
Owner

metaodi commented Dec 7, 2016

@austinhartzheim feel free to look into that. I think we need a way to stop at some point to avoid an endless loop. Maybe we can use this issue to discuss possible solutions. Do you already have an idea?

In general I think it's good to provide this kind of abstraction, so that a consumer of osmapi doesn't have to care about URL length limits. Something like a generator might come in handy here. I've seen something similar already in the OAI-PMH client implementation of pyoai. Let me know if you want to discuss this more in detail.

@austinhartzheim
Contributor

Excellent. I'm busy with final projects/exams at my university right now but I should have time in late December. If someone else is interested in working on this issue before then, feel free to take it.

@austinhartzheim
Contributor

Root Cause

I've been looking into this issue, and it seems that the URI length limit is not defined in the API server software. Rather, I believe the limit is imposed by the Apache web server itself: the length of the HTTP request line is the limiting factor, and Apache limits it to 8190 bytes by default (via the LimitRequestLine directive).

This is the default value, which has not been set specifically on the servers. (If we wanted to pursue having the value set explicitly on the servers rather than relying on the default, I believe this Chef file would be the location to do it).

Experimental Verification

The following code shows that a request with way IDs 1–1853 stays within Apache's default limit and returns the expected result, whereas adding one more ID pushes the request line over the limit and causes the 414 error we are addressing in this issue (Apache's 8190-byte limit appears to apply to the request line excluding the terminating CRLF):

len('GET /api/0.6/ways?ways=') + len(','.join([str(x) for x in range(1, 1854)])) + len(' HTTP/1.1\r\n')  # 8191 (8189 without CRLF)
len('GET /api/0.6/ways?ways=') + len(','.join([str(x) for x in range(1, 1855)])) + len(' HTTP/1.1\r\n')  # 8196 (8194 without CRLF)

api.WaysGet(range(1, 1854))  # 404 error - expected
api.WaysGet(range(1, 1855))  # 414 error - not expected

Possible Solutions

Here are some of the most likely solutions.

  • Hardcode a URI length limit constant: From an efficiency perspective, we may want to place as many Way IDs into each request as possible (and consequently use fewer HTTP requests). The downside is that any shortening of the limit would cause the 414 errors to return (although this could be easily fixed again by changing the constant). Also, it seems unlikely to me that this limit would be changed.
  • Push for a URI limit in the specification: The OSM wiki (which may or may not be authoritative) does not specify a URI limit, but instead includes a question, possibly indicating a need for future improvement: "How long is too long?" Putting the limit in the specification and in the Chef files for the Apache configuration might be the most responsible way forward and it would help any future API users understand the limitations better. This would allow us to hardcode a URI length limit with more confidence that it will not change in the future.
  • Use conservative limits to preempt future changes: We might consider artificially lowering our length limit constant to prevent future problems that may arise if the length limit is lowered on the servers.
  • Include a constant number of IDs in each request: We could send a constant number of IDs in each request and choose that constant such that we do not violate the length limit. A benefit of this approach is that developers can easily estimate the number of API requests that will be made for each call to NodesGet()/WaysGet()/RelationsGet(). A potential issue is support for 64-bit IDs, which would limit us to ~380 IDs under the current length limits. Furthermore, we lose efficiency by creating extra API calls in situations where we do not need to.
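The fixed-count option can be sketched with a small chunking helper. The names here are hypothetical (osmapi does not ship such a helper), and 380 is the conservative per-request ID count mentioned above for 64-bit IDs:

```python
def chunked(ids, size=380):
    """Yield successive fixed-size chunks from a sequence of IDs.

    380 is a conservative chunk size: even 20-digit 64-bit IDs plus
    comma separators stay under Apache's 8190-byte default limit.
    """
    ids = list(ids)
    for i in range(0, len(ids), size):
        yield ids[i:i + size]

# Example: 1000 IDs are split into batches of 380, 380, and 240 IDs.
sizes = [len(chunk) for chunk in chunked(range(1, 1001))]
```

Each chunk would then be passed to a single WaysGet-style request.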

Discussion

I'm personally leaning towards hardcoding a URI length limit constant, with or without trying to standardize the limit. I believe that the efficiency gains of this approach may be significant. Furthermore, I do not think it is likely that the length limit will be decreased in the future.

I'm interested in hearing your thoughts or alternate solutions.

@metaodi
Owner

metaodi commented Jan 11, 2017

@austinhartzheim thank you very much for this very thorough analysis of the problem at hand.

I have a few things to add:

  • I think we should not be shy to just try a request and see if it fails or not; that is a very pythonic way of handling things anyway. So I wouldn't bother to check and measure the optimal size, but rather use a good default, try the request, and if it fails, try to split it differently; if it still fails, raise the error to the user.
  • I think the places you tracked down are very good to come up with a good default value and/or influence the definition of an actual limit in the API, as you already mentioned. But it still leaves us with at least two problems:
    1. People might still use an old version of the API
    2. People might run osmapi against their very own custom version of the API, running on their own server or with customized code, not running on Apache

All these points lead me to the conclusion that I'd prefer a limit with a good default value that a user of osmapi can override (e.g. in the constructor). If the limit is reached, another request is sent to the OSM API with the remaining items; the results are then put back together and returned to the consumer as "one", so that this whole process is transparent from a user's perspective (i.e. they don't notice it).
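A minimal sketch of that transparent splitting, assuming a hypothetical `fetch` callable that performs one API request for a list of IDs and returns a dict of results:

```python
def get_all(fetch, ids, max_uri_len=8000):
    """Split the ID list so each comma-joined chunk stays under
    max_uri_len, fetch each chunk, and merge the results into a single
    dict so the splitting is invisible to the caller."""
    results = {}
    chunk, chunk_len = [], 0
    for i in ids:
        id_len = len(str(i)) + 1  # +1 for the comma separator
        if chunk and chunk_len + id_len > max_uri_len:
            results.update(fetch(chunk))
            chunk, chunk_len = [], 0
        chunk.append(i)
        chunk_len += id_len
    if chunk:
        results.update(fetch(chunk))
    return results
```

The overridable `max_uri_len` corresponds to the constructor option proposed above; a real implementation would also have to account for the fixed path and query-string overhead.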

@austinhartzheim
Contributor

I like the idea of retrying the request if we see a 414 error.

I think a good strategy would be to start at ~8000 bytes. Upon encountering a 414 error, we divide that number in half and retry. And if we encounter another 414 error, we divide it in half again to ~2000 bytes. After that, we raise an exception if the request is not successful.

The reason for starting at 8000 is that RFC 7230 recommends that servers support at least 8000 byte request lines.

The reason for ending at 2000 is that this is roughly what browsers support, so almost every server (unless configured otherwise) is likely to support it as well.

Also, we can add a configuration option to override the default settings. I'm considering setting the number of retries to zero if that is the case (or perhaps we can make that configurable as well).
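The halving strategy described above could look roughly like this; `RequestUriTooLong` and `fetch_chunk` are stand-ins for illustration, not actual osmapi names:

```python
class RequestUriTooLong(Exception):
    """Stand-in for a 414 response from the server."""

def fetch_with_backoff(fetch_chunk, ids, start_len=8000, min_len=2000):
    """Try progressively smaller URI length budgets: 8000 -> 4000 -> 2000.

    `fetch_chunk(ids, limit)` performs the batched request(s) under the
    given length budget and raises RequestUriTooLong on a 414.
    """
    limit = start_len
    while True:
        try:
            return fetch_chunk(ids, limit)
        except RequestUriTooLong:
            if limit <= min_len:
                raise  # even ~2000 bytes failed: surface the error
            limit //= 2
```

Setting `start_len` equal to a user-supplied override and `min_len` to the same value would effectively disable retries, matching the zero-retries idea mentioned above.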


Also, you mentioned using a generator. Do you want the methods to return a generator instead or should we collect all the results and return them as a list?

@metaodi
Owner

metaodi commented Sep 20, 2017

I stumbled upon this library for retrying; it might be handy for this use case.
About the generators: I quite like the idea of returning generators when we return multiple items.
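A generator-based variant, as suggested, might look like this hypothetical wrapper, which fetches lazily one chunk at a time (`fetch` again stands in for a per-chunk API call returning a dict):

```python
def ways_get_iter(fetch, way_ids, chunk_size=380):
    """Lazily yield (id, data) pairs, fetching one chunk per request.

    Consumers only pay for the requests they actually iterate over;
    dict(ways_get_iter(...)) recovers the current eager behaviour.
    """
    way_ids = list(way_ids)
    for i in range(0, len(way_ids), chunk_size):
        batch = fetch(way_ids[i:i + chunk_size])
        for wid, data in batch.items():
            yield wid, data
```

Returning a generator would be a change to the public interface, so it would likely need a major version bump or a separate iterator method alongside the existing list/dict-returning ones.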
