Fetcher downloads data every time it is called #1638

Closed
BramshQamar opened this Issue Sep 13, 2018 · 10 comments

Contributor

BramshQamar commented Sep 13, 2018

Every fetcher in fetcher.py downloads its data every time it is called, even when the data is already in place.

For example:

from dipy.data.fetcher import fetch_scil_b0, read_siemens_scil_b0
fetch_scil_b0()

Output:

Data size is approximately 9.2MB
Downloading "datasets_multi-site_all_companies.zip" to /Users/bramshqamar/.dipy
Download Progress: [##################################] 100.00% of 9.19 MB
Files successfully downloaded to /Users/bramshqamar/.dipy
Out[1]: ({'datasets_multi-site_all_companies.zip': ('https://digital.lib.washington.edu/researchworks/bitstream/handle/1773/38479/datasets_multi-site_all_companies.zip', None)}, '/Users/bramshqamar/.dipy')

Let's fetch again:
fetch_scil_b0()

Output:

Data size is approximately 9.2MB
Downloading "datasets_multi-site_all_companies.zip" to /Users/bramshqamar/.dipy
Download Progress: [##################################] 100.00% of 9.19 MB
Files successfully downloaded to /Users/bramshqamar/.dipy
Out[2]: ({'datasets_multi-site_all_companies.zip': ('https://digital.lib.washington.edu/researchworks/bitstream/handle/1773/38479/datasets_multi-site_all_companies.zip', None)}, '/Users/bramshqamar/.dipy')

This is the case with every fetcher.

Contributor

BramshQamar commented Sep 18, 2018

I am on a Mac, with Python 3.6.1, NumPy 1.12.1, and NiBabel 2.3.0.

Member

arokem commented Sep 18, 2018

Is the data still on your hard drive between calls? Look in ~/.dipy.
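
A quick way to check this from Python is sketched below. It assumes the data folder is ~/.dipy, as in the output reported above, and uses only the standard library.

```python
import os

# Data folder named in this thread; adjust if your setup stores data elsewhere.
dipy_home = os.path.join(os.path.expanduser("~"), ".dipy")

if os.path.isdir(dipy_home):
    # List every entry with its size so an incomplete download is easy to spot.
    for name in sorted(os.listdir(dipy_home)):
        size = os.path.getsize(os.path.join(dipy_home, name))
        print(f"{name}\t{size} bytes")
else:
    print(f"{dipy_home} does not exist")
```

Running it before and after a fetch call shows whether the file persists between calls.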

Member

skoudoro commented Sep 18, 2018

This dataset does not have an MD5 checksum. Could that be the problem, @arokem?

Member

arokem commented Sep 18, 2018

Might be. I can replicate this bug with the SCIL b0 dataset, but not with other datasets. @BramshQamar: Did you experience this also with other datasets?

I bet we need to change something around this line: https://github.com/nipy/dipy/blob/master/dipy/data/fetcher.py#L175
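
A minimal sketch of how a missing checksum could produce this behaviour, assuming the fetcher skips a download only when the file exists and its MD5 matches the stored value. The names `file_md5` and `needs_download` are hypothetical and are not taken from fetcher.py.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=8192):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(path, stored_md5):
    """Hypothetical skip check: download unless the file exists and matches."""
    return not (os.path.exists(path) and file_md5(path) == stored_md5)


# A throwaway file stands in for an already-downloaded dataset.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example data")
    demo_path = tmp.name

with_checksum = needs_download(demo_path, file_md5(demo_path))
without_checksum = needs_download(demo_path, None)
os.remove(demo_path)

print(with_checksum)     # False: file is present and its checksum matches
print(without_checksum)  # True: a digest string never equals None
```

Under that assumption, a dataset registered with `None` as its checksum can never pass the skip check, which would explain why only the SCIL b0 dataset is downloaded again.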

Member

skoudoro commented Sep 18, 2018

> I bet we need to change something around this line:

Or update the dataset. Having a checksum such as MD5 should be good practice, right? It lets us make sure that the dataset has not changed.

What do you think?

Member

arokem commented Sep 18, 2018

Yes. Even better.

Member

skoudoro commented Sep 18, 2018

So, can you update it? I do not know how to access https://digital.lib.washington.edu/

Member

arokem commented Sep 18, 2018

I don't think you need to access that website. Just add this line: #1643

Contributor

BramshQamar commented Sep 18, 2018

I was testing with some fetchers I wrote and with fetch_scil_b0. I will add MD5 checksums to my fetchers and test them.
Thank you, @arokem and @skoudoro.
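
One way to spot fetchers that still lack a checksum is sketched here. It assumes each fetcher returns the (files, folder) pair seen in the output above, with every file name mapped to a (url, md5) tuple. The helper `files_without_md5` is hypothetical, and the return value is pasted in as a literal so the snippet runs without downloading anything.

```python
# Return value copied from the output reported in this issue.
files, folder = (
    {
        "datasets_multi-site_all_companies.zip": (
            "https://digital.lib.washington.edu/researchworks/bitstream/"
            "handle/1773/38479/datasets_multi-site_all_companies.zip",
            None,
        )
    },
    "/Users/bramshqamar/.dipy",
)


def files_without_md5(files):
    """Hypothetical helper: names of files whose stored MD5 is None."""
    return [name for name, (_url, md5) in files.items() if md5 is None]


missing = files_without_md5(files)
print(missing)
```

Any file name it reports has `None` as its stored MD5, as this dataset does.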

Member

skoudoro commented Sep 23, 2018

Fixed by #1643.

skoudoro closed this on Sep 23, 2018
