Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add download_all utility method & script #17163

Merged
merged 6 commits into from Nov 24, 2022

Conversation

AnirudhDagar
Copy link
Member

@AnirudhDagar AnirudhDagar commented Oct 7, 2022

Reference issue

#16983

What does this implement/fix?

_download_all.py can be run as a standalone script to download all the data files required by scipy.datasets at once without the need to build SciPy first. This could be useful; for example, SciPy Debian, Fedora etc. packages may have build time constraints for downloading from external links.

As per the discussion with Ralf earlier today, the steps might look something like this:

git clone ...
git submodule update ...
# Download dataset files
python _download_all.py 

...continue the build process

Although I couldn't find the exact SciPy Debian build script in https://salsa.debian.org/python-team/packages/scipy, where something like this should be placed, it would be great to hear @drew-parsons, your thoughts and if could help us review this. Thanks!

Besides, download_all can also be used as a utility method within the scipy.datasets module if required.

from scipy import datasets

# Cache all datasets
datasets.download_all()

# Now work without an internet connection

cc @rgommers

Additional Information

Sharing Network access by Debian package builds by Nathan Willis, talks about some constraints when using the internet at build time in Debian packages for people who didn't know about this strict rule.
I found it a nice read since I recently learnt about the constraint when working on scipy.datasets module.

download_all.py can be run as a simple script to download
all the data files required by scipy.datasets at once.
This is useful for packages that may have some restrictions
e.g, scipy#15607 (comment)
@AnirudhDagar
Copy link
Member Author

@rgommers, a little reminder requesting your review, in case it was missed. I'll raise the clear_cache utility method PR (already implemented) after this lands to avoid unnecessary merge conflicts. Thanks!

Copy link
Member

@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, a couple of changes needed @AnirudhDagar. And apologies for the delay in reviewing

scipy/datasets/_download_all.py Outdated Show resolved Hide resolved
scipy/datasets/__init__.py Outdated Show resolved Hide resolved
scipy/datasets/_download_all.py Outdated Show resolved Hide resolved
scipy/datasets/_download_all.py Outdated Show resolved Hide resolved
@rgommers rgommers added this to the 1.10.0 milestone Nov 24, 2022
Copy link
Member

@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now, thanks @AnirudhDagar. In it goes.

@rgommers rgommers merged commit b265d91 into scipy:main Nov 24, 2022
@AnirudhDagar AnirudhDagar deleted the add_download_all branch November 24, 2022 17:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement scipy.datasets
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants