Add batch geocoding feature #13
Comments
Curious to talk through how to implement this. Basic problems:
I don't have an elegant answer to this; in the demo code I just throw the results onto disk and expect the user to come back and sort it out. |
@sbma44 introducing users to parallel requests is going to run them up against the rate limits pretty quickly, isn't it? Say we create 12 threads and each makes a batch of 50 geocode queries... there's a UX issue to work out. The mbx-geocode command has some support for batching built in already: it can get queries from stdin and doesn't presume that there's only one. We could collect queries into batches, send them to the batch geocode endpoint, and write out the results of each batch, synchronously. If we gave users an async option, maybe we would also put on them the responsibility of looking in the query object of the responses instead of trusting any order. There are already some standard methods in Python that behave like this. The multiprocessing module's imap_unordered() is one. |
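To make the imap_unordered() comparison concrete, here is a minimal sketch of the "don't trust the order, look at the query in the response" idea. It uses a thread pool from the standard library; `fake_geocode` and `geocode_unordered` are hypothetical stand-ins for illustration, not part of the SDK, and no network request is made.

```python
from multiprocessing.pool import ThreadPool


def fake_geocode(query):
    """Hypothetical stand-in for a geocoding call: echoes the query back."""
    return {"query": query, "features": []}


def geocode_unordered(queries, geocode=fake_geocode, workers=4):
    """Yield responses as workers finish; arrival order is not guaranteed."""
    with ThreadPool(workers) as pool:
        for response in pool.imap_unordered(geocode, queries):
            yield response


queries = ["Chester, NJ", "Portland, OR", "Austin, TX"]

# Key each result by the query echoed in the response, not by position.
results = {r["query"]: r for r in geocode_unordered(queries)}
```

The caller ends up with a mapping from query to response, so nothing depends on which worker finished first.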
Good point about ratelimiting. If we don't parallelize, this is as simple as |
Talked about this a bit more with @sgillies -- we're still not sure just yet how to make this feature meaningful to users. Closing for now; if people wind up needing the feature we can reopen. |
Just about to write the same thing. Intention is to make use of the batch geocode endpoint where it makes sense, but not surface it specifically in the Python client. |
@sbma44 @sgillies I'd like to clarify the decisions on batch geocoding and maybe revisit this now that the SDK and the CLI are a bit further along. If I understand correctly, the advantage of using batch geocoding at the API level is that you can get many more geocodes per request. This would be more desirable than making many concurrent requests as you wouldn't hit the rate limit as quickly, right? In the SDK: In the CLI: I'm still not entirely clear on the best way to do this, but batch geocoding is going to come up eventually and right now we don't have a good story for achieving that at the SDK or at the CLI level. I'd like to at least clarify and document some suggested approaches. |
No, the rate limit is per-query, not per request. You don't get anything for free here. The advantages of batch are:
I dislike batch because
We have a few recent features (e.g. country filtering) that will probably be landing in a PR here soon, and adding batch then might make sense. But I think the bigger advantage, by far, would be building in systems for adaptive ratelimiting and parallelized requests, rather than focusing on the batch query mode itself. I have example code that does this but am very sure it could benefit from a less hacky python dev's eyes :P (also, from a non-beta dependency -- that |
We should clarify that in the docs. The batch docs say "Up to 50 queries may be included in a single batch geocoding request" and the rate limits are specified in requests per minute.
All good reasons not to go that route, though the SDK could potentially hide the semicolon delimiting.
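As a rough illustration of what "hiding the semicolon delimiting" could look like, here is a sketch that splits a list of queries into semicolon-joined strings of at most 50 queries each, the limit quoted from the batch docs above. The name `batch_queries` is hypothetical, and URL-encoding and the request itself are deliberately left out.

```python
def batch_queries(queries, batch_size=50):
    """Yield semicolon-joined batches of at most batch_size queries.

    Hypothetical helper, not part of the SDK. A query that itself
    contains a semicolon would be ambiguous once joined, so it is rejected.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    queries = list(queries)
    for query in queries:
        if ";" in query:
            raise ValueError("query contains a semicolon: %r" % query)
    for start in range(0, len(queries), batch_size):
        yield ";".join(queries[start:start + batch_size])


# 120 queries become three batches: 50, 50 and 20 queries.
batches = list(batch_queries(["query %d" % i for i in range(120)]))
```

Since the rate limit is per query, as noted above, batching this way reduces the number of HTTP round trips but not the number of queries counted against the limit.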
That looks like a good approach! I think we could port that logic over to asyncio which is the new standard for asynchronous coding in python. What about this: Adding |
Here's an Some issues with it
Agreed, PR here: https://github.com/mapbox/www.mapbox.com/pull/6565. In this ticket I've been using Thanks very much for tackling the batch endpoint. Quick reactions:
But this is just my own gut reaction. Can you say more about your thinking on where this line belongs? |
Python 2 support? I think Python 2 will be around for a long long time to come, most likely even after official support ends in 2020. We should support it until at least then. My current thinking on the SDK vs CLI: The SDK is low-level and tracks the APIs very closely. The CLI can contain higher level abstractions like batching and concurrent requests. They'd be hidden from the user by a clean command line interface of course. |
That distinction makes sense to me. Still, appropriate use of the rate limit headers is a core part of using the API correctly. Perhaps an |
I guess you could say we're currently handling rate limiting by asking for forgiveness, not permission. IOW, the user of the sdk can just keep making requests until the server tells them "No" which should show up in the response message/statuscode/headers. I like the idea of keeping the sdk simple but providing an |
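A minimal sketch of the "ask for forgiveness" approach combined with the rate limit headers: keep sending, and when the server answers 429, wait until the reset time it reports before retrying. This assumes the reset time arrives as a Unix timestamp in an `X-Rate-Limit-Reset` header; the helper name `request_with_backoff` and the `send` callable (returning a status code, headers and body) are hypothetical and stand in for whatever the SDK or CLI would actually use.

```python
import time


def request_with_backoff(send, now=time.time, sleep=time.sleep, max_attempts=3):
    """Call send() until it is not rate limited or attempts run out.

    send() is a hypothetical callable returning (status_code, headers, body).
    Assumes a 429 response carries X-Rate-Limit-Reset as a Unix timestamp;
    if the header is missing, wait one second instead.
    """
    for _ in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        reset = float(headers.get("X-Rate-Limit-Reset", now() + 1))
        # Never sleep a negative amount if the reset time has already passed.
        sleep(max(0.0, reset - now()))
    raise RuntimeError("still rate limited after %d attempts" % max_attempts)
```

Passing `now` and `sleep` as parameters keeps the helper easy to exercise with a fake clock, without real waiting or network access.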
I'm going to close this in favor of mapbox/mapbox-cli-py#44 and #104 |
https://www.mapbox.com/developers/api/geocoding/#batch
A POST endpoint for these batch requests would be a little easier to use, but no biggie.