Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use parfive in gammapy download #2308

Merged
merged 5 commits into from Aug 23, 2019

Conversation

@Bultako
Copy link
Member

commented Jul 24, 2019

This PR addresses issue #2269.

It uses parfive package for async files downloading, removing the need of multiprocessing, and using an external well-tested solution.

The download times are highly reduced.

Without parfive.

time gammapy download datasets
INFO:gammapy.scripts.downloadclasses:Looking for datasets...
INFO:gammapy.scripts.downloadclasses:Reading https://raw.githubusercontent.com/gammapy/gammapy/master/dev/datasets/gammapy-data-index.json
INFO:gammapy.scripts.downloadclasses:Content will be downloaded in gammapy-datasets
Downloading files [==================================================] 100% 

*** You might want to declare GAMMAPY_DATA env variable
export GAMMAPY_DATA=/Users/jer/Desktop/gammapy-datasets


real	1m46.545s
user	0m15.772s
sys	0m2.390s

With parfive.

time gammapy download datasets
INFO:gammapy.scripts.downloadclasses:Looking for datasets...
INFO:gammapy.scripts.downloadclasses:Reading https://raw.githubusercontent.com/gammapy/gammapy/master/dev/datasets/gammapy-data-index.json
INFO:gammapy.scripts.downloadclasses:Content will be downloaded in gammapy-datasets
Files Downloaded: 100%|██████████████████████████████████████████████ 569/569 [00:36<00:00, 15.56file/s]
                                                                                                                                                                                                                                                                              
*** You might want to declare GAMMAPY_DATA env variable                                                                                                                                                                                                                       
export GAMMAPY_DATA=/Users/jer/Desktop/gammapy-datasets                                                                                                                                                                                                                       
                                                                                                                                                                                                                                                                              
                                                                                                                                                                                                                                                                              
real	0m39.452s
user	0m10.294s
sys	0m1.784s
du -hs gammapy-datasets/
162M	gammapy-datasets/

This PR adds parfive as a new external dependency, which in turn needs the following requirements:

  • Python 3.5+
  • aiohttp
  • tqdm
@Bultako Bultako requested a review from cdeil Jul 24, 2019
@Bultako Bultako self-assigned this Jul 24, 2019
@Bultako Bultako added the feature label Jul 24, 2019
@adonath adonath added this to the 0.14 milestone Jul 25, 2019
@cdeil

This comment has been minimized.

Copy link
Member

commented Jul 25, 2019

@Bultako - Thanks!

Are you available for the Gammapy dev call tomorrow to discuss this?
It's an extra dependency, so it's controversial whether to do this or not.

@registerrier @adonath and I would prefer not to rush this decision, so we've postponed this to v0.14 and don't have to decide today or tomorrow.

Copy link
Member

left a comment

@Bultako - Thanks!

As mentioned in the devcall, I'm overall +1 to use this.
See inline comments.

from urllib.request import urlretrieve, urlopen
from pathlib import Path
from parfive import Downloader

This comment has been minimized.

Copy link
@cdeil

cdeil Jul 26, 2019

Member

Suggest to make parfive an optional dependency, which I think means you have to make this a delayed import, and in the test put @requires_dependency?

pool.close()
retrieve = True
if md5 and path.exists():
md5local = hashlib.md5(path.read_bytes()).hexdigest()

This comment has been minimized.

Copy link
@cdeil

cdeil Jul 26, 2019

Member

I was hoping parfive would allow us to get rid of urllib completely, and also code that checks if files already have been downloaded. Can you try to go that way, to use it for everything?
If some feature is missing (such as e.g. checksum check to avoid downloading multiple times), you could file a feature request with parfive?

@Bultako Bultako force-pushed the Bultako:parfive branch 3 times, most recently from ec296e8 to 13b8254 Jul 26, 2019
@Bultako Bultako force-pushed the Bultako:parfive branch from 13b8254 to 454ec15 Aug 14, 2019
@Bultako

This comment has been minimized.

Copy link
Member Author

commented Aug 23, 2019

@cdeil
There's still one unrelated Codacy issue, so you can proceed now for final review.

@cdeil

This comment has been minimized.

Copy link
Member

commented Aug 23, 2019

@Bultako - Thanks!

(from a quick look at the diff, this seems OK. I didn't try this locally or review it in detail.)

@cdeil
cdeil approved these changes Aug 23, 2019
@cdeil cdeil merged commit 0b1f295 into gammapy:master Aug 23, 2019
8 of 9 checks passed
8 of 9 checks passed
Codacy/PR Quality Review Not up to standards. This pull request quality could be better.
Details
Scrutinizer Analysis: 2 new issues, 1 updated code elements – Tests: passed
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
gammapy.gammapy Build #20190823.2 succeeded
Details
gammapy.gammapy (DevDocs) DevDocs succeeded
Details
gammapy.gammapy (Lint) Lint succeeded
Details
gammapy.gammapy (Test Python36) Test Python36 succeeded
Details
gammapy.gammapy (Test Windows36) Test Windows36 succeeded
Details
gammapy.gammapy (Test Windows37) Test Windows37 succeeded
Details
@Bultako Bultako deleted the Bultako:parfive branch Aug 23, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.