Ingest Narratives datasets #952

Merged
merged 68 commits into from Jul 27, 2021

adelavega (Collaborator) commented Jun 15, 2021

Ingest Narratives dataset.

As part of this PR, I'm going to try to make it easier to ingest datasets in the future.
All Datalad wrangling of datasets should be handled automatically by a function, setup_dataset, which takes a remote_address and a preproc_path. Alternatively, it can take a local_path, which indicates that everything is already set up properly.

This function will do the following:

  • Datalad clone of the datasets, adding symlinks where necessary to create a single dataset
  • Datalad get of the raw files necessary for ingestion (.json, .tsv, and the headers of functional .nii.gz files)
  • Create a template JSON config (defaults to all tasks)
  • When cloning, find a unique folder name; the JSON file will reflect that folder name
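
A rough sketch of the non-Datalad parts of these steps is given here. It is an illustration only, not the code in this PR: the helper names (`unique_dataset_dir`, `files_to_get`, `write_template_config`) and the config keys are hypothetical, and the actual Datalad clone/get calls are left out so the sketch runs without Datalad installed.

```python
import json
from pathlib import Path

# Suffixes of the lightweight raw files needed for ingestion.
# Fetching only the headers of functional .nii.gz files is not shown.
RAW_SUFFIXES = (".json", ".tsv")


def unique_dataset_dir(base_dir, name):
    """Return a folder under base_dir that does not exist yet (name, name_1, ...)."""
    base_dir = Path(base_dir)
    candidate = base_dir / name
    counter = 1
    while candidate.exists():
        candidate = base_dir / f"{name}_{counter}"
        counter += 1
    return candidate


def files_to_get(dataset_dir):
    """List the raw files (as relative POSIX paths) that would need to be fetched."""
    dataset_dir = Path(dataset_dir)
    return sorted(
        p.relative_to(dataset_dir).as_posix()
        for p in dataset_dir.rglob("*")
        if p.name.endswith(RAW_SUFFIXES)
    )


def write_template_config(dataset_dir, preproc_path, tasks):
    """Write a template JSON config next to the dataset folder, listing all tasks."""
    dataset_dir = Path(dataset_dir)
    config = {
        # Hypothetical keys; the real config schema is not shown in this PR text.
        "name": dataset_dir.name,  # reflects the unique folder name
        "local_path": str(dataset_dir),
        "preproc_path": str(preproc_path),
        "tasks": {task: {} for task in tasks},  # default: every task, no options
    }
    config_path = dataset_dir.parent / f"{dataset_dir.name}.json"
    config_path.write_text(json.dumps(config, indent=2))
    return config_path
```

The user would then edit the written JSON file (for example, deleting tasks that should not be ingested) before running ingestion.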

After this, the user can edit the JSON file before calling ingest_from_json, which does the following:

  • add_dataset
  • add_task (looping over the listed tasks)
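
As a minimal sketch of that flow (assuming a config with hypothetical `name`, `local_path`, and `tasks` keys), ingest_from_json could look something like the following. The `add_dataset` and `add_task` callables are passed in as arguments here because their real signatures are not shown in this description; that is a choice made for the sketch, not how the PR wires them up.

```python
import json


def ingest_from_json(config_file, add_dataset, add_task):
    """Add the dataset described by config_file, then each listed task.

    add_dataset and add_task are injected so the sketch does not depend
    on the real implementations (their signatures are assumed here).
    """
    with open(config_file) as f:
        config = json.load(f)

    # Step 1: register the dataset itself.
    dataset_id = add_dataset(config["name"], config["local_path"])

    # Step 2: loop over the tasks the user left in the config.
    task_ids = [
        add_task(dataset_id, task_name, **task_options)
        for task_name, task_options in config.get("tasks", {}).items()
    ]
    return dataset_id, task_ids
```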

In terms of extraction, I also want to separate it from the dataset config.
Instead, you can point the convert_stimuli or extract_features file to specific json files (sets specific to that dataset), or leave the fields empty and allow them to attempt to extract all features.
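
One way the "specific JSON file, or fall back to everything" behaviour might be expressed is sketched here. The function name `load_extractors` and the file format (a JSON list of `[name, parameters]` pairs) are assumptions made for illustration, not something defined in this PR.

```python
import json


def load_extractors(extractor_file=None, all_extractors=()):
    """Return extractor specs from a JSON file, or every known extractor.

    extractor_file: path to a dataset-specific JSON list of
        [name, parameters] pairs (hypothetical format), or None.
    all_extractors: the full list to fall back on when no file is given.
    """
    if extractor_file is None:
        # Field left empty: attempt to extract all features.
        return list(all_extractors)
    with open(extractor_file) as f:
        return json.load(f)
```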

It may also be worth revising the master list of extractors, so that automatic extraction can be done at a larger scale. This may be better as its own PR, and would need more fault tolerance (continue with extraction where possible, and maybe save progress or the list of remaining extractions).
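
To make the fault-tolerance idea concrete, here is a small sketch of an extraction loop that keeps going after a failure and records which extractors have finished, so a later run can resume. All names here (`extract_all`, `extract_fn`, the progress file format) are hypothetical; nothing like this is implemented in this PR.

```python
import json
from pathlib import Path


def extract_all(extractor_names, stimuli, extract_fn, progress_file):
    """Run every extractor, skipping finished ones and surviving failures.

    extract_fn(name, stimuli) stands in for the real extraction call.
    Returns (sorted list of completed names, list of (name, error) failures).
    """
    progress_file = Path(progress_file)
    done = set()
    if progress_file.exists():
        # Resume: extractors recorded by a previous run are not repeated.
        done = set(json.loads(progress_file.read_text()))

    failed = []
    for name in extractor_names:
        if name in done:
            continue
        try:
            extract_fn(name, stimuli)
        except Exception as exc:  # keep going with the remaining extractors
            failed.append((name, str(exc)))
            continue
        done.add(name)
        # Save progress after each success so a crash loses at most one step.
        progress_file.write_text(json.dumps(sorted(done)))
    return sorted(done), failed
```

Calling the function again with the same progress file would retry only the extractors that failed or were never reached.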

  • Auto add citation to bibliography (make it a requirement of add_dataset)
  • Update tests
  • Test add_dataset refactor on narratives
  • Ensure that task-json file is not required
  • Add extract_from_json function, or auto extract function
  • Expand list of fmriprep regressors for ingestion, and possibly filter some?
  • Auto drop raw niftis

Related TODOs:

  • Verify all datasets can compile/report on production
  • Check that sentry on staging is not going to the same bucket as production
  • Possibly run ingestion as a Celery job?

codecov-commenter commented Jun 15, 2021

Codecov Report

Merging #952 (e61b95a) into master (e6f49ac) will decrease coverage by 4.10%.
The diff coverage is 49.78%.

@@            Coverage Diff             @@
##           master     #952      +/-   ##
==========================================
- Coverage   82.69%   78.59%   -4.11%     
==========================================
  Files          63       63              
  Lines        3132     3228      +96     
==========================================
- Hits         2590     2537      -53     
- Misses        542      691     +149     

| Impacted Files | Coverage Δ |
|---|---|
| neuroscout/tests/api/test_dataset.py | 100.00% <ø> (ø) |
| neuroscout/populate/convert.py | 44.80% <5.00%> (-27.46%) ⬇️ |
| neuroscout/populate/utils.py | 77.27% <33.33%> (-11.77%) ⬇️ |
| neuroscout/populate/setup.py | 42.60% <42.60%> (ø) |
| neuroscout/models/dataset.py | 82.22% <50.00%> (-9.89%) ⬇️ |
| neuroscout/populate/extract.py | 64.67% <62.50%> (-2.00%) ⬇️ |
| neuroscout/populate/ingest.py | 88.15% <84.09%> (-2.36%) ⬇️ |
| neuroscout/populate/__init__.py | 100.00% <100.00%> (ø) |
| neuroscout/tasks/utils/build.py | 92.30% <100.00%> (+0.17%) ⬆️ |
| neuroscout/tests/api/test_extractor.py | 100.00% <100.00%> (ø) |

... and 5 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

cypress bot commented Jun 15, 2021



Test summary

7 0 0 0 Flakiness 1


Run details

Project Neuroscout
Status Passed
Commit e61b95a
Started Jul 26, 2021 9:21 PM
Ended Jul 26, 2021 9:23 PM
Duration 01:56
OS Linux Ubuntu - 20.04
Browser Electron 87



Flakiness

cypress/integration/analysis_builder.spec.js Flakiness
1 Analysis Builder > analysis builder


cypress bot commented Jun 15, 2021



Test summary

7 0 0 0 Flakiness 1


Run details

Project Neuroscout
Status Passed
Commit 843f839
Started Jul 26, 2021 9:21 PM
Ended Jul 26, 2021 9:24 PM
Duration 02:09
OS Linux Ubuntu - 20.04
Browser Electron 87



Flakiness

cypress/integration/analysis_builder.spec.js Flakiness
1 Analysis Builder > analysis builder


@adelavega adelavega merged commit 47130b3 into master Jul 27, 2021
@adelavega adelavega deleted the ingest_narratives branch July 27, 2021 16:45
@adelavega adelavega mentioned this pull request Oct 25, 2022