Ingest Narratives datasets #952

adelavega · 2021-06-15T03:57:10Z

Ingest Narratives dataset.

As part of this PR, I'm going to try to make it easier to ingest datasets in the future.
All Datalad wrangling of datasets should be done automatically by a single function, setup_dataset, which takes a remote_address and a preproc_path. Alternatively, it can take a local_path, which indicates that everything is already set up properly.

This function will do the following:

- Datalad clone of the datasets, adding symlinks where necessary to create a single dataset
- Datalad get of the raw files necessary for ingestion (.json, .tsv, and headers of functional .nii.gz files)
- Create a template JSON config (defaulting to all tasks)
- When cloning, find a unique folder name; the JSON file will reflect that folder name
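
The last two steps (unique folder name and template JSON config) could look roughly like the sketch below. This is only an illustration, not the code in this PR: the helper names `unique_folder` and `write_template_config`, the `-1`, `-2` suffix scheme, and the config layout are all assumptions, and the Datalad clone/get steps are left out.

```python
import json
from pathlib import Path
from typing import Iterable


def unique_folder(parent: Path, name: str) -> Path:
    """Return parent/name, or parent/name-1, parent/name-2, ... if already taken.

    Hypothetical helper: the suffix scheme is an assumption.
    """
    candidate = parent / name
    suffix = 1
    while candidate.exists():
        candidate = parent / f"{name}-{suffix}"
        suffix += 1
    return candidate


def write_template_config(dataset_dir: Path, tasks: Iterable[str]) -> Path:
    """Write a template JSON config named after the folder, listing every task.

    Hypothetical layout: one top-level key per dataset folder name.
    """
    config = {
        dataset_dir.name: {
            "path": str(dataset_dir),
            # Default to all tasks; the user can prune this before ingesting.
            "tasks": {task: {} for task in tasks},
        }
    }
    out_file = dataset_dir / f"{dataset_dir.name}.json"
    out_file.write_text(json.dumps(config, indent=2))
    return out_file
```

Keying the config by folder name keeps the JSON file and the cloned directory in sync when the same dataset is cloned more than once.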

After this, the user can edit the JSON file prior to calling ingest_from_json, which does the following:
- add_dataset
- add_task (looping over the listed tasks)
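
A minimal sketch of that flow, assuming a config with one top-level key per dataset and a `tasks` mapping. The function name `ingest_from_json_sketch`, the config layout, and the call signatures used for `add_dataset` and `add_task` are assumptions for illustration; the two functions are passed in as callables so the sketch does not depend on the real implementations.

```python
import json
from pathlib import Path
from typing import Callable, List


def ingest_from_json_sketch(config_file, add_dataset: Callable,
                            add_task: Callable) -> List:
    """Add each dataset in the config, then loop over its listed tasks.

    The (name, path) and (task, dataset_id, **params) signatures are
    hypothetical stand-ins, not the real add_dataset/add_task signatures.
    """
    config = json.loads(Path(config_file).read_text())
    dataset_ids = []
    for name, params in config.items():
        dataset_id = add_dataset(name, params["path"])
        # Only tasks still listed in the (possibly user-edited) file are added.
        for task_name, task_params in params.get("tasks", {}).items():
            add_task(task_name, dataset_id, **task_params)
        dataset_ids.append(dataset_id)
    return dataset_ids
```

Because the task loop reads whatever is in the file at call time, deleting a task from the JSON is enough to skip it.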

I also want to separate extraction from the dataset config.
Instead, you can point convert_stimuli or extract_features to specific JSON files (sets specific to that dataset), or leave the fields empty and let them attempt to extract all features.
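
One way the "specific JSON file or everything" choice could be resolved is sketched below. The helper name `resolve_extractors` and the assumption that the JSON file holds a plain list of extractor names are hypothetical; the real files may be structured differently.

```python
import json
from pathlib import Path
from typing import List, Optional


def resolve_extractors(json_file: Optional[str], known: List[str]) -> List[str]:
    """Return the extractors named in json_file, or all known ones if unset.

    Assumes (hypothetically) that the file contains a JSON list of names.
    """
    if not json_file:
        # Field left empty: attempt every known extractor.
        return list(known)
    requested = json.loads(Path(json_file).read_text())
    unknown = [name for name in requested if name not in known]
    if unknown:
        raise KeyError(f"Unknown extractors: {unknown}")
    return requested
```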

It may also be worth revising the master list of extractors, so that automatic extraction can be done at a larger scale. This may be better as its own PR, and would need more fault tolerance (continue with extraction where possible, and maybe save progress or the list of remaining extractions).
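
The fault tolerance described here could be sketched as a loop that records progress after every extractor, so a later run picks up only what is left. Everything below is an assumption for illustration: the name `extract_all`, the progress-file format, and the idea of representing each extractor as a zero-argument callable.

```python
import json
from pathlib import Path
from typing import Callable, Dict


def extract_all(extractors: Dict[str, Callable[[], object]],
                progress_file: Path) -> Dict[str, list]:
    """Run each extractor, continuing past failures and saving progress.

    Hypothetical sketch: progress is a JSON file with "done" and "failed" lists.
    """
    progress = {"done": [], "failed": []}
    if progress_file.exists():
        progress = json.loads(progress_file.read_text())
    for name, run in extractors.items():
        if name in progress["done"]:
            continue  # Finished in an earlier run; skip it.
        try:
            run()
        except Exception:
            # Broad catch on purpose: one bad extractor should not stop the rest.
            if name not in progress["failed"]:
                progress["failed"].append(name)
        else:
            progress["done"].append(name)
            if name in progress["failed"]:
                progress["failed"].remove(name)
        # Save after every extractor so an interrupted run loses little work.
        progress_file.write_text(json.dumps(progress, indent=2))
    return progress
```

Calling the function again with the same progress file retries only the failed or not-yet-run extractors.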

- Automatically add the citation to the bibliography (make it a requirement of add_dataset)
- Update tests
- Test the add_dataset refactor on Narratives
- Ensure that the task JSON file is not required
- Add an extract_from_json function, or an auto-extract function
- Expand the list of fmriprep regressors for ingestion, and possibly filter some?
- Automatically drop raw NIfTI files

Related TODOs:

- Verify all datasets can compile/report on production
- Check that Sentry on staging is not going to the same bucket as production
- Possibly run ingestion as a Celery job?

codecov-commenter · 2021-06-15T03:57:21Z

Codecov Report

Merging #952 (e61b95a) into master (e6f49ac) will decrease coverage by 4.10%.
The diff coverage is 49.78%.

@@            Coverage Diff             @@
##           master     #952      +/-   ##
==========================================
- Coverage   82.69%   78.59%   -4.11%     
==========================================
  Files          63       63              
  Lines        3132     3228      +96     
==========================================
- Hits         2590     2537      -53     
- Misses        542      691     +149

Impacted Files	Coverage Δ
neuroscout/tests/api/test_dataset.py	`100.00% <ø> (ø)`
neuroscout/populate/convert.py	`44.80% <5.00%> (-27.46%)`	⬇️
neuroscout/populate/utils.py	`77.27% <33.33%> (-11.77%)`	⬇️
neuroscout/populate/setup.py	`42.60% <42.60%> (ø)`
neuroscout/models/dataset.py	`82.22% <50.00%> (-9.89%)`	⬇️
neuroscout/populate/extract.py	`64.67% <62.50%> (-2.00%)`	⬇️
neuroscout/populate/ingest.py	`88.15% <84.09%> (-2.36%)`	⬇️
neuroscout/populate/__init__.py	`100.00% <100.00%> (ø)`
neuroscout/tasks/utils/build.py	`92.30% <100.00%> (+0.17%)`	⬆️
neuroscout/tests/api/test_extractor.py	`100.00% <100.00%> (ø)`
... and 5 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e6f49ac...e61b95a. Read the comment docs.

cypress · 2021-06-15T03:59:19Z

Test summary

7 • 0 • 0 • 0 • 1

Run details


Project	Neuroscout
Status	Passed
Commit	`e61b95a`
Started	Jul 26, 2021 9:21 PM
Ended	Jul 26, 2021 9:23 PM
Duration	01:56 💡
OS	Linux Ubuntu - 20.04
Browser	Electron 87

Flakiness

	cypress/integration/analysis_builder.spec.js		1
1	Analysis Builder > analysis builder

This comment has been generated by cypress-bot as a result of this project's GitHub integration settings. You can manage this integration in this project's settings in the Cypress Dashboard

cypress · 2021-06-15T04:02:05Z

Test summary

7 • 0 • 0 • 0 • 1

Run details


Project	Neuroscout
Status	Passed
Commit	`843f839` ℹ️
Started	Jul 26, 2021 9:21 PM
Ended	Jul 26, 2021 9:24 PM
Duration	02:09 💡
OS	Linux Ubuntu - 20.04
Browser	Electron 87

Flakiness

	cypress/integration/analysis_builder.spec.js		1
1	Analysis Builder > analysis builder

…into ingest_narratives

Add basic narratives json dataset config

edc3bc1

adelavega added 26 commits June 28, 2021 19:49

Inital pass at setup function

7df0cb5

Finish setup_dataset

6a950eb

str Path

a432592

Use correct path

36ac72e

output json

27b036f

Don't add preprocesseD

763d884

Correct suffix

e580301

Index output json

d0ef85d

Get correct summary

379caa4

TMP: Set pybids to master

274595a

Add to manage

fc776cd

Seperate add_dataset to new function

e2e5f68

Update reingest, and arguments

4381433

Rename path argument

974139e

Change name to raw address

b6d3644

Change name of summary

a0bf1b5

Relative symlink

40f4f60

count function

bd90ddc

local path Pathlib

5f1552a

Open to write

2d13ba4

Add NNdb to bib

5d9ea2a

Add logic to add to bib upon ingestion

5e8a51c

Use 0.13.1 custom patch

40e73ff

Change flag to skip preproc

e22f0f2

Get dataset desc

136cff3

KeyError not ValueError

013b9db

adelavega added 26 commits July 12, 2021 18:10

Add sentry uri to config

b3c161e

Force reingest

01d750f

Preproc path not address

f22f39d

Optimize rebuilding of layout

a843d2d

Fix typo

4ed3935

Typo 2

2e74f88

fix age map

f9a7380

Vals not val

fa8a0a0

if vals

c7fa8b5

TMP Print summarY

4cc565a

Ignore TSV and download json sidecars

e95d442

Ignore regressors

dd4156d

Add auto fetch

1e85a99

config sentry not null

e5eabb4

Set to string, Sentry config value

7ec24b6

Get path

fe02bc9

Update predictor schema

8167186

Update config file

46d20f9

Merge remote-tracking branch 'origin/master' into ingest_narratives

a7177a7

Refactor extract_from_json file to only encompass extraction

6187cf1

Merge branch 'ingest_narratives' of github.com:neuroscout/neuroscout …

2440d05

…into ingest_narratives

Migrate all dataset configuration files

0edd5dd

Update active dataset count

7d85f69

Fix typo

a2f710b

Fix cypress backend setup

6ef6226

Add auditory extractor group

e61b95a

adelavega merged commit 47130b3 into master Jul 27, 2021

adelavega deleted the ingest_narratives branch July 27, 2021 16:45

adelavega mentioned this pull request Jul 28, 2021

Ingest Narratives dataset #873

Closed

adelavega mentioned this pull request Oct 25, 2022

INGEST: ds004007 #1090

Open
