what's the normal time for downloading a paper? #41

Closed
shizhediao opened this issue Apr 10, 2020 · 2 comments
Labels
question Questions about how to use this package.

Comments

@shizhediao

Hi,
Thanks for your great work!
I was wondering what the normal time for downloading a paper is.
I would like to download as many papers as possible for my research, somewhere between 10K and 100K.
Right now each download takes me about 10 seconds. Is it possible to speed this up?
Thanks very much!

@shizhediao shizhediao added the enhancement Requests for new features or improvements. label Apr 10, 2020
@lukasschwab lukasschwab added question Questions about how to use this package. and removed enhancement Requests for new features or improvements. labels Apr 11, 2020
@lukasschwab
Owner

lukasschwab commented Apr 11, 2020

Hmm, this depends on how you're finding the papers to download.

  • Does the 10-second operation include a query, or is it just the call to arxiv.download? It may be possible to improve the query performance.
  • arxiv.download uses urlretrieve; I don't know if this is the quickest solution for downloads in bulk. You might be interested in building your own bulk-download function.
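A minimal sketch of such a bulk-download function, assuming you already have a list of PDF URLs (for example, collected from your query results). It is not part of the arxiv package: bulk_download, max_workers, and fetch are hypothetical names. It times each download separately, which also helps answer the first question above (query time vs. download time), and overlaps a few downloads using a small thread pool. arXiv discourages heavy automated downloading from arxiv.org itself (which is why it offers the S3 bulk access), so keep max_workers small.

```python
import os
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional, Tuple


def bulk_download(
    urls: Iterable[str],
    dirpath: str = ".",
    max_workers: int = 4,
    fetch: Optional[Callable[[str, str], object]] = None,
) -> Dict[str, float]:
    """Download each URL into dirpath and return the seconds spent per URL.

    Hypothetical helper, not part of the arxiv package. `fetch` defaults to
    urllib.request.urlretrieve, the call the reply above says arxiv.download uses.
    """
    fetch = fetch or urllib.request.urlretrieve
    os.makedirs(dirpath, exist_ok=True)

    def one(url: str) -> Tuple[str, float]:
        # Name the file after the last path segment, adding ".pdf" if missing.
        name = url.rstrip("/").rsplit("/", 1)[-1]
        if not name.endswith(".pdf"):
            name += ".pdf"
        start = time.perf_counter()
        fetch(url, os.path.join(dirpath, name))
        return url, time.perf_counter() - start

    # A small pool overlaps network waits; keep it small to stay polite to arXiv.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(one, urls))
```

For example, bulk_download(urls, dirpath="papers", max_workers=2) returns a dict mapping each URL to its download time in seconds. If each entry is still close to 10 seconds, the time is going into the transfer itself rather than the query, and the S3 bulk data is likely the better route.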

Probably the most useful: if you just want as many papers as possible, arXiv offers bulk access to tarfiles of PDFs and source files via S3: https://arxiv.org/help/bulk_data_s3
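A minimal sketch of listing the PDF tarballs in that S3 bucket with boto3. It assumes, as described on the bulk_data_s3 page linked above, that the bucket is named arxiv, that PDF tarballs sit under the pdf/ prefix, and that the bucket is requester-pays (your AWS account is billed for requests and transfer, and needs configured credentials). list_tarballs is a hypothetical helper; check the linked page for the current layout and costs before downloading anything.

```python
from typing import Iterator, Tuple


def list_tarballs(s3_client, prefix: str = "pdf/", bucket: str = "arxiv") -> Iterator[Tuple[str, int]]:
    """Yield (key, size_in_bytes) for every object under `prefix`.

    Hypothetical helper; assumes the requester-pays `arxiv` bucket layout.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    # RequestPayer="requester" acknowledges that your account pays for the request.
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, RequestPayer="requester"):
        for obj in page.get("Contents", []):
            yield obj["Key"], obj["Size"]


# Usage (requires boto3 and AWS credentials; requests are billed to you):
#   import boto3
#   s3 = boto3.client("s3")
#   for key, size in list_tarballs(s3):
#       print(key, size)
```

To fetch a tarball once you have its key, boto3's download_file accepts ExtraArgs={"RequestPayer": "requester"} for requester-pays buckets.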

I'll close this issue for the time being; feel free to reopen it if this doesn't answer your question!

@shizhediao
Author

Thanks for your reply!
In my experiment, the 10 seconds are spent entirely in the call to arxiv.download.
Thanks for pointing out the bulk access, I'll take a look.
Thanks very much!
