
Long intervals during resource iteration can lead to issues #141

Open
hermit-crab opened this issue Jan 8, 2020 · 0 comments
hermit-crab commented Jan 8, 2020

Hello.

Recently there was issue #121, for which a batch read workaround was implemented. I am now experiencing what I believe to be the same or a similar issue, but while using JSON instead of msgpack. Basically, when I do for item in job.items.iter(..., count=X, ...): and there are long intervals during iteration, the count can end up being ignored. I was able to reproduce it with the following snippet:

import time

from scrapinghub import ScrapinghubClient

sh_client = ScrapinghubClient(APIKEY, use_msgpack=False)
take = 10_000
job_id = '168012/276/1'
for i, item in enumerate(sh_client.get_job(job_id).items.iter(count=take, meta='_key')):
    print(f'\r{i} ({item["_key"]})', end='')

    # Simulate a long client-side pause partway through iteration.
    if i == 3000:
        print('\nsleeping')
        time.sleep(60 * 3)

    # With count=take honored, this should be unreachable.
    if i > take:
        print('\nWTF')
        break

With the sleep part removed, the WTF section does not fire and the iterator stops at item 168012/276/1/9999.
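
Until this is resolved, one defensive measure (a minimal sketch, not a fix for the underlying behavior) is to cap the iterator on the client side with itertools.islice, so the limit is enforced regardless of what the API streams back:

from itertools import islice

# Hard client-side cap: islice stops after `take` items even if the server
# keeps streaming past the requested count.
items_iter = sh_client.get_job(job_id).items.iter(count=take, meta='_key')
for i, item in enumerate(islice(items_iter, take)):
    print(f'\r{i} ({item["_key"]})', end='')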

This seems to be more of a ScrapyCloud API platform problem, but I am reporting it here for tracking nonetheless.

For now I am assuming resource/collection iteration is not robust when any client-side delays are possible during retrieval (I haven't tested any other potential issues), and as a habit I will either preload everything at once (.list()) or use .list_iter() when it makes sense, as sketched below.
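
For reference, a rough sketch of the .list_iter() habit, assuming it accepts the same count/meta parameters as iter() and yields chunks as lists, with each chunk fetched in its own request (so a pause between chunks cannot desync one long-lived response):

# Assumed behavior: list_iter() fetches up to `chunksize` items per request
# and yields each batch as a list, so delays only occur between requests.
for chunk in sh_client.get_job(job_id).items.list_iter(chunksize=1000, count=take, meta='_key'):
    for item in chunk:
        handle(item)  # hypothetical per-item handler
    time.sleep(60 * 3)  # a long pause between chunks should now be safe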
