Improve batch processing functionality and timeouts handling with large amounts of data #35

Closed
st-polina opened this issue Nov 23, 2020 · 4 comments · Fixed by #37

@st-polina
Member

Two issues combined into one:
Processing 50,000 IP addresses with the library's batch functionality is limited to 1,000 IPs per batch, so users have to split the workload into 50 separate batches. Sometimes the server times out and a request is left incomplete, yet the timed-out requests are still counted against the request limit. We need to improve the batch processing functionality and timeout handling for large amounts of data.

@UmanShahzad
Contributor

@st-polina Hey thanks for filing this issue.

  1. 1000 IPs per batch is an IPinfo API limit, not this SDK's limit; see https://ipinfo.io/developers/batch.
  2. If by timeout you really mean a client timeout, that is adjustable in the SDK; see https://github.com/ipinfo/python#modifying-request-options (it defaults to 2 seconds; a minimal example follows this list). If you really mean the server is timing out, i.e. it still doesn't respond after 60 seconds or more, then we can definitely investigate that in the backend.
  3. As for requests still counting against the request limit despite the client not technically receiving the result: that should be solved once the timeout is properly managed by the client. If we process 1000 IPs on the backend and the client decides to quit (i.e. time out) after 1 second, before receiving the results, it is reasonable for us to still count that against their quota - they just need to wait a bit longer if they want to see the results.
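Here is a minimal sketch of point 2, raising the client-side timeout through the SDK's request options as described in the linked README section. The token value is a placeholder, and the exact option names should be checked against the SDK version in use.

```python
# Minimal sketch: raise the client-side timeout via request options,
# per the "Modifying request options" section of the README linked above.
# ACCESS_TOKEN is a placeholder; the default timeout is 2 seconds.
import ipinfo

ACCESS_TOKEN = "your_token_here"
handler = ipinfo.getHandler(
    ACCESS_TOKEN,
    request_options={"timeout": 60},  # seconds
)
details = handler.getDetails("8.8.8.8")
print(details.ip, details.country)
```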

@coderholic
Member

> 1000 IPs per batch is an IPinfo API limit, not this SDK's limit; see ipinfo.io/developers/batch.

I guess the question is how much we want to abstract this in the client. Should the batch method do chunking itself, or should we make the user do it? Or should we add another method that does chunking for the user? I think the SDKs should probably handle it - allow batch to take an arbitrary number of IPs, and chunk and combine the results.
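A rough sketch of that SDK-side approach, assuming the existing getBatchDetails call and the API limit of 1,000 IPs per request; the wrapper name is illustrative, not part of the library's API.

```python
# Rough sketch of SDK-side chunking: accept an arbitrary number of IPs,
# split them into API-sized chunks, and merge the per-chunk results.
# `get_batch_details_chunked` is an illustrative name, not the library's API.
API_BATCH_LIMIT = 1000  # per https://ipinfo.io/developers/batch

def get_batch_details_chunked(handler, ips, chunk_size=API_BATCH_LIMIT):
    """Look up `ips` in chunks and return one combined result dict."""
    combined = {}
    for start in range(0, len(ips), chunk_size):
        chunk = ips[start:start + chunk_size]
        combined.update(handler.getBatchDetails(chunk))
    return combined
```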

> If by timeout you really mean client timeout, then that is adjustable in the SDK; see ipinfo/python#modifying-request-options (it defaults to 2 seconds). If you really mean the server is timing out, i.e. after 60 seconds or more it still doesn't respond, then we can definitely investigate that in the backend.

I think the issue is that the client is hitting the 2s timeout while the requests are still processing on the server, so the client can't get the results. The bulk endpoint shouldn't have a 2s response timeout when looking up 1k IPs - it should scale with the number of IPs, or have a separate, larger timeout (e.g. 60s).

@UmanShahzad
Contributor

Alright, that makes sense - thanks for clarifying! So the action items in this issue are:

  1. Accept arbitrarily large lists of IPs in a batch function, and have the SDK chunk them and return all the results at once.
  2. Make the timeout for this new batch function scale with the number of IPs. We can measure how long 1k IPs usually take, add 1 second to that, and use it as the default scaling factor per 1k IPs. The user can hard-override the total timeout, or override the default scaling factor (sketched below).
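A rough sketch of how that scaling default could be computed; the 5-second per-1k figure and the function name are illustrative assumptions, not measured values or the SDK's actual implementation.

```python
# Rough sketch of the proposed scaling default: a per-1k-IP time budget
# (measured empirically, plus 1 second of headroom) multiplied by the number
# of chunks, with an optional hard override of the total timeout.
# The 5-second figure is an assumption for illustration only.
import math

BATCH_MAX_SIZE = 1000   # API limit per batch request
TIMEOUT_PER_BATCH = 5   # assumed seconds per 1k IPs, including 1s headroom

def batch_timeout(num_ips, timeout_override=None, per_batch=TIMEOUT_PER_BATCH):
    """Return the total timeout (seconds) to use for a chunked batch lookup."""
    if timeout_override is not None:
        return timeout_override
    return math.ceil(num_ips / BATCH_MAX_SIZE) * per_batch

print(batch_timeout(50_000))       # 250 with the assumed defaults
print(batch_timeout(50_000, 600))  # hard override: 600
```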

@UmanShahzad
Contributor

This is released in v4.1.0.

https://pypi.org/project/ipinfo/4.1.0/
https://github.com/ipinfo/python/releases/tag/v4.1.0

Please let me know if the client has any more feedback.
