Ingest Narratives datasets #952
Codecov Report

```diff
@@           Coverage Diff            @@
##           master     #952     +/-   ##
==========================================
- Coverage   82.69%   78.59%   -4.11%
==========================================
  Files          63       63
  Lines        3132     3228      +96
==========================================
- Hits         2590     2537      -53
- Misses        542      691     +149
```

Continue to review full report at Codecov.
Test summary / Run details: view run in the Cypress Dashboard.

This comment has been generated by cypress-bot as a result of this project's GitHub integration settings. You can manage this integration in this project's settings in the Cypress Dashboard.
Ingest Narratives dataset.
As part of this PR, I'm going to try to make it easier to ingest datasets in the future.
All Datalad wrangling of datasets should be done automatically by a function `setup_dataset`, which takes a `remote_address` and a `preproc_path`. Alternatively, it can also take a `local_path`, which would indicate that everything is already set up properly. This function will do the following:
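As a purely illustrative sketch, a `setup_dataset` with the signature described above might look like the following. Only the names `setup_dataset`, `remote_address`, `preproc_path`, and `local_path` come from this PR; the config keys, the skeleton-config idea's exact shape, and the DataLad call indicated in the comment are assumptions.

```python
from pathlib import Path

def setup_dataset(remote_address=None, preproc_path=None, local_path=None):
    """Hypothetical sketch: fetch a dataset (via DataLad) or reuse a local
    copy, then build a skeleton JSON config for later hand-editing."""
    if local_path is not None:
        # A local_path indicates everything is already set up properly
        dataset_path = Path(local_path)
    else:
        # Sketch only: a real implementation might call something like
        # datalad.api.install(source=remote_address, path=dataset_path)
        raise NotImplementedError("DataLad wrangling not sketched here")

    # Skeleton config the user would edit before calling ingest_from_json
    return {
        "dataset_address": remote_address,
        "preproc_address": preproc_path,
        "path": str(dataset_path),
        "tasks": {},  # task entries to be filled in by hand
    }
```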
After this, the user can edit the json file prior to calling `ingest_from_json`, which does the following:

- `add_dataset`
- `add_task` (looping over listed tasks)

In terms of extraction, I also want to separate it from the dataset config.
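The two-step ingest flow above (hand-edit the json, then call `ingest_from_json`, which runs `add_dataset` and then `add_task` per listed task) could be sketched as follows. The stand-in bodies and signatures of `add_dataset` and `add_task` are assumptions, not the real implementations.

```python
import json

def add_dataset(name, address=None, path=None):
    # Hypothetical stand-in for the real ingestion function
    print(f"Adding dataset {name}")
    return name

def add_task(dataset_name, task_name, **task_kwargs):
    # Hypothetical stand-in; the real version would register the task's runs
    print(f"Adding task {task_name} to {dataset_name}")

def ingest_from_json(config_path):
    """Read a hand-edited JSON config and ingest its dataset and tasks."""
    with open(config_path) as f:
        config = json.load(f)
    dataset_name = add_dataset(
        config["name"],
        address=config.get("dataset_address"),
        path=config.get("path"))
    # Loop over listed tasks, as described above
    for task_name, task_kwargs in config.get("tasks", {}).items():
        add_task(dataset_name, task_name, **task_kwargs)
    return dataset_name
```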
Instead, you can point the `convert_stimuli` or `extract_features` file to specific json files (sets specific to that dataset), or leave the fields empty and allow them to attempt to extract all features.
It may also be worth revising the master list of extractors, so that automatic extraction can be done at a larger scale. This may be better as its own PR, and would need more fault tolerance (continue with extraction where possible, and maybe save progress, or the remaining extractions).
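The fault tolerance mentioned above could be as simple as wrapping each extractor call in a try/except, continuing past failures, and saving which pairs succeeded or failed so extraction can be resumed. A minimal sketch, where the function name, progress format, and toy extractors are all assumptions:

```python
import json

def extract_all(stimuli, extractors, progress_path="extraction_progress.json"):
    """Run every extractor on every stimulus, continuing past failures
    and recording done/failed pairs for a later resume."""
    done, failed = [], []
    for stim in stimuli:
        for name, extractor in extractors.items():
            try:
                extractor(stim)
                done.append((stim, name))
            except Exception as e:
                # Continue with extraction; remember what failed and why
                failed.append({"stimulus": stim, "extractor": name,
                               "error": str(e)})
    # Save progress so the remaining extraction can be retried later
    with open(progress_path, "w") as f:
        json.dump({"done": done, "failed": failed}, f, indent=2)
    return done, failed

# Usage sketch with toy extractors:
extractors = {
    "length": len,
    "flaky": lambda s: 1 / 0,  # always fails, to show fault tolerance
}
```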
Related TODOs:

- `add_dataset`
- `add_dataset` refactor on narratives
- `extract_from_json` function, or auto extract function