[MAINT, MRG] Enable pooch to perform fetching of datasets #9742
a few prelim comments
The majority of the work was originally done by @drammock in his PR. Really great push! I've added some changes to remove as much redundant code as possible. Here, I have removed almost the entire … This PR does not change the user interface, and so far it works well for me when pulling datasets. If this PR is approved, I can easily add a public interface (turning
Fixing the docs error. Everything else actually LGTM. Many LOC are removed (~800). The only other error is:
However, I'm not entirely sure why that line is producing an error... Any ideas?
I think the error has to do with capsys, or with calling data_path twice. This test succeeds:
But this one fails:
Just one last bit of cruft in requirements_testing.txt. I think this is good to go after that.
Now there is an error with GH actions though:
Do we need pooch somewhere besides where we already added it?
Dang, OK, I'll fix it.
CI failure is unrelated (Windows 3.7 pip timeout).
@@ -0,0 +1,46 @@
# include here the name of the zipped file and the md5 hash
later PR: Put into python code and create temp file for internal
TODO list (from live chat with @agramfort and @adam2392):
Unrelated CI failure.
Indeed, they look unrelated to this PR, though did you see the
Ah, fixed the resource warning. It was caused by files not being closed when a runtime error was raised.
All green! The pooch PR is ready for final review.
@larsoner merge if happy
@@ -261,6 +266,7 @@ def _export_raw(fname, raw, physical_range, add_ch_type):
    onset = onset * 10000
    duration = duration * 10000
    if hdl.writeAnnotation(onset, duration, desc) != 0:
        hdl.close()
This solution is less clean / DRY than one that uses a context manager, like with hdl:
or, if that's not an option:
    from contextlib import contextmanager

    @contextmanager
    def _close_file(fid):
        # Close fid on exit, even if the wrapped block raises.
        try:
            yield
        finally:
            fid.close()
But we can clean this up in a follow-up PR
@@ -17,6 +17,7 @@ elif [ "${TEST_MODE}" == "pip-pre" ]; then
python -m pip install --progress-bar off --upgrade --only-binary ":all" "vtk<=9.0.1"
python -m pip install --progress-bar off https://github.com/pyvista/pyvista/zipball/main
python -m pip install --progress-bar off https://github.com/pyvista/pyvistaqt/zipball/main
python -m pip install --progress-bar off pooch
This is not needed because the last line includes pip install -r requirements_testing.txt, and it's in there.
echo "pooch"
pip install --progress-bar off pooch
Same
hf_sef_raw=pooch.Untar(
    extract_dir=path, members=[f'hf_sef/{subdir}' for subdir in
                               ('MEG', 'SSS', 'subjects')]),
hf_sef_evoked=pooch.Untar(
    extract_dir=path, members=[f'hf_sef/{subdir}' for subdir in
                               ('MEG', 'SSS', 'subjects')]),
@drammock do you know why we needed hf_sef to be like this? These LOC, which I copied, seem to break CircleCI in #9774.
It seems to work perfectly fine with processor = 'untar', untarring to the path. I confirm that the error was not raised when I changed the processor to just a regular pooch.Untar(extract_dir=path
To clarify, I copied these from your original PR aimed to tackle #8674
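To make the distinction in this thread concrete, here is a stdlib-only sketch of extracting selected members versus extracting a whole archive. It is not pooch's implementation and says nothing about why the CI failure occurred; the archive contents, the file names, and the untar helper are all made up for illustration, with only the hf_sef/MEG, hf_sef/SSS and hf_sef/subjects prefixes taken from the diff.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def make_archive(tar_path):
    """Build a tiny tar with an hf_sef-style layout (file names are made up)."""
    names = ["hf_sef/MEG/a.fif", "hf_sef/SSS/b.fif",
             "hf_sef/subjects/c.txt", "hf_sef/extra/d.txt"]
    with tarfile.open(tar_path, "w") as tar:
        for name in names:
            data = name.encode()
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


def untar(tar_path, extract_dir, prefixes=None):
    """Extract everything, or only members located under the given prefixes."""
    with tarfile.open(tar_path) as tar:
        members = tar.getmembers()
        if prefixes is not None:
            members = [m for m in members
                       if any(m.name.startswith(p + "/") for p in prefixes)]
        tar.extractall(extract_dir, members=members)
    # Report the extracted files relative to the extraction directory.
    return sorted(p.relative_to(extract_dir).as_posix()
                  for p in Path(extract_dir).rglob("*") if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "hf_sef.tar"
    make_archive(archive)
    subset = untar(archive, Path(tmp) / "subset",
                   prefixes=[f"hf_sef/{d}" for d in ("MEG", "SSS", "subjects")])
    everything = untar(archive, Path(tmp) / "all")

print(subset)
print(everything)
```

Restricting extraction to a member list skips anything outside the named folders, whereas a plain untar to the path keeps the full archive layout.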
…e-tools#9742)" This reverts commit c42a7b7.
Reference issue
First part of #8679 #9736
closes #9756
What does this implement/fix?
Builds on top of @drammock's work in #8679 to:
- config.py + dataset_checksums.txt file. These define URLs, archive names, configuration keys, folder names, and md5 hashes
- utils/fetching.py with pooch usage in all datasets that used _fetch_file(). I'm not sure how to handle bst, eegbci, hf_sef, limo and other complicated files yet. I think maybe I can copy @drammock's work? I'll have to investigate.
- fsaverage files
- _fetch_file and related files

Note that the checksums file for eegbci is 3058 lines long, so in reality this code diff is:
as of 09/15/21
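The description above mentions a checksums file that pairs archive names with md5 hashes. A minimal sketch of how such a file could be parsed and used to verify a download follows. The "name hash" line format, the function names, and the sample file are assumptions for illustration, not the actual layout of dataset_checksums.txt or the code in this PR.

```python
import hashlib
import tempfile
from pathlib import Path


def parse_checksums(text):
    """Parse lines of '<archive name> <md5 hash>', skipping comments and blanks."""
    registry = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, md5 = line.split()
        registry[name] = md5
    return registry


def md5_of(path, chunk_size=65536):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fid:
        for chunk in iter(lambda: fid.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, registry):
    """Return True if the file's md5 matches its registry entry."""
    return md5_of(path) == registry[Path(path).name]


with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "example.tar.gz"  # hypothetical archive name
    archive.write_bytes(b"not a real archive")
    expected = hashlib.md5(b"not a real archive").hexdigest()
    registry = parse_checksums(
        "# name of the zipped file and the md5 hash\n"
        f"example.tar.gz {expected}\n"
    )
    ok = verify(archive, registry)
    archive.write_bytes(b"corrupted")
    ok_after_corruption = verify(archive, registry)

print(ok, ok_after_corruption)
```

Pooch performs this kind of hash check itself when a registry is supplied; the sketch only shows the idea behind keeping names and hashes in one file.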
Additional information
The next PR should handle:
- _data_path which would then address "Adding an optional token to the dataset fetcher code to allow optional fetching from private repositories" #9736
- _data_path to a public function, perhaps... mne_dataset_path(), which then can be used by 3rd parties who don't want to store data in the MNE datasets section.