Improve data fetcher path handling #44

Closed
rmarkello opened this issue Jun 19, 2019 · 0 comments · Fixed by #66
Labels
refactor Not an enhancement, but not a bug

Comments

@rmarkello
Owner

The issue(s)

If a user does not have a local copy of all the Allen Institute human microarray data, calling abagen.get_expression_data fetches the data automatically. Currently, the fetcher creates a directory called allenbrain in the user's current working directory and downloads and unzips all the data there. Users can optionally pass a path via the data_dir parameter, but that path must point to a directory that contains an allenbrain directory holding all the necessary data. This is confusing, as many people expect data_dir to be the allenbrain directory itself.

Furthermore, most users probably don't want to point to the directory where they store the data on every call (or end up downloading duplicate copies into each working directory). It might be nice to use a directory under $HOME (a la nilearn, sklearn, etc.) as the default place to store the downloaded data and to look for existing data when no data_dir path is provided.

Proposed solutions

To address the first issue, we could modify the data_dir check to accept EITHER the current scenario (a path to a directory that contains an allenbrain directory with all the necessary data) OR a path to a directory that directly contains all the Allen Brain Atlas data (i.e., it does not need to be named allenbrain). So, providing either data_dir='/home/ross/Desktop' or data_dir='/home/ross/Desktop/allenbrain' in the following example should work:

>>> import abagen
>>> atlas = abagen.fetch_desikan_killiany()

# allenbrain directory exists on Desktop
>>> data_dir = '/home/ross/Desktop'
>>> abagen.get_expression_data(atlas.image, data_dir=data_dir)

# data_dir points to allenbrain directory itself
>>> data_dir = '/home/ross/Desktop/allenbrain'
>>> abagen.get_expression_data(atlas.image, data_dir=data_dir)

assuming that /home/ross/Desktop/allenbrain was organized as:

allenbrain/
├── normalized_microarray_donor9861/
├── normalized_microarray_donor10021/
...
└── normalized_microarray_donor15697/
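
The relaxed data_dir check described above could look roughly like the sketch below. This is only an illustration: resolve_data_dir and DONOR_GLOB are hypothetical names (not part of abagen), and it assumes that finding at least one normalized_microarray_donor* directory is enough to recognize the data.

```python
from pathlib import Path

# Hypothetical constant: the donor directory naming shown in the tree above.
DONOR_GLOB = "normalized_microarray_donor*"


def resolve_data_dir(data_dir):
    """Return the directory that directly holds the donor folders, or None.

    Accepts either the `allenbrain` directory itself (under any name) or a
    parent directory that contains a directory named `allenbrain`.
    """
    data_dir = Path(data_dir).expanduser()
    # Check the path as given first, then fall back to <data_dir>/allenbrain.
    for candidate in (data_dir, data_dir / "allenbrain"):
        if candidate.is_dir() and any(
            p.is_dir() for p in candidate.glob(DONOR_GLOB)
        ):
            return candidate
    # Nothing that looks like the microarray data was found.
    return None
```

With this, both '/home/ross/Desktop' and '/home/ross/Desktop/allenbrain' would resolve to the same allenbrain directory, and a None return would signal that the data still need to be fetched.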

To address the second issue, we could create the allenbrain directory in the user's $HOME directory by default (instead of in the current directory). If the data don't already exist there, we can do a quick check in the current working directory (and the data_dir path, if provided) and, if they aren't found in any of those locations, fetch the data to the $HOME directory.
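
A minimal sketch of that lookup, under some assumptions: find_data_dir and _has_donor_dirs are hypothetical names, an explicitly provided data_dir is checked before the $HOME and current-directory locations (the exact order is still open), and the actual download step is left out.

```python
from pathlib import Path

# Hypothetical constant: donor directory naming used by the downloaded data.
DONOR_GLOB = "normalized_microarray_donor*"


def _has_donor_dirs(path):
    """True if `path` is a directory containing at least one donor folder."""
    return path.is_dir() and any(p.is_dir() for p in path.glob(DONOR_GLOB))


def find_data_dir(data_dir=None):
    """Return (path, needs_fetch) for the microarray data.

    `path` is the first location where data were found; if none is found,
    it is the $HOME default and `needs_fetch` is True.
    """
    home_default = Path.home() / "allenbrain"
    candidates = []
    if data_dir is not None:
        # Assumed priority: a user-supplied path wins over the defaults.
        user = Path(data_dir).expanduser()
        candidates += [user, user / "allenbrain"]
    candidates += [home_default, Path.cwd() / "allenbrain"]
    for candidate in candidates:
        if _has_donor_dirs(candidate):
            return candidate, False
    # Nothing found anywhere: the caller should fetch into $HOME/allenbrain.
    return home_default, True
```

Returning the needs_fetch flag rather than downloading inside the function keeps the path logic easy to test without network access.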

@rmarkello rmarkello added the enhancement New feature or request label Jun 19, 2019
@rmarkello rmarkello added refactor Not an enhancement, but not a bug and removed enhancement New feature or request labels Jul 8, 2019