Updates to submission portal jobs to handle sequencing data #329

pkalita-lbl · 2023-10-18T18:10:32Z

Re: #315 (will close once we actually run the job on production)

We previously had jobs to fetch a submission portal entry, translate it into Study and Biosample objects, validate them, and submit them to Mongo. It turns out Andrew Tritt's submission (and possibly future ones) came with sequencing data. The submission portal can't accept that information, so it was provided in a CSV file mapping sample IDs (that is, the IDs provided in the submission portal, not NMDC-minted IDs) to FASTA files.

These changes:

- Update the submission portal jobs to accept two CSV files whose columns correspond to OmicsProcessing and DataObject slots, plus one extra column used to join on sample IDs. If URLs to such files are provided in the job config, the SubmissionPortalTranslator class expands each row into the corresponding objects.
- Bring the SubmissionPortalTranslator class and its tests up to date with some recent schema changes (DOI stuff, ID patterns, etc.). The test now also runs a schema validation step, which would catch these things earlier (especially if we, ahem, ran tests against PRs in CI).
- Cut a bunch of sample data out of the existing test case in tests/test_data/test_submission_portal_translator_data.yaml. It was redundant as far as testing is concerned, and it made test failure diffs hard to read. I also added a second test case in that file to cover providing OmicsProcessing and DataObject mapping data.
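
To illustrate the CSV join described above, here is a minimal sketch, not the actual SubmissionPortalTranslator code. The join column name `sample_id`, the function `group_rows_by_sample`, and the example column names are all hypothetical; the real mapping files and slot names may differ, and how the resulting records are linked to Biosample objects is not shown.

```python
import csv
import io
from collections import defaultdict

# Hypothetical name for the extra column that joins rows to portal sample IDs.
JOIN_COLUMN = "sample_id"


def group_rows_by_sample(csv_text, join_column=JOIN_COLUMN):
    """Group CSV rows by sample ID.

    Each remaining column is treated as a slot name; empty cells are dropped
    so that unset slots do not appear in the resulting record.
    """
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        sample_id = row.pop(join_column)
        slots = {name: value for name, value in row.items() if value}
        grouped[sample_id].append(slots)
    return dict(grouped)


# Example input with made-up column names and values (assumptions, not real data).
omics_csv = "sample_id,name,instrument_name\nS1,run A,Illumina\nS2,run B,\n"
omics_by_sample = group_rows_by_sample(omics_csv)

# Look up the rows for one submission portal sample while translating it.
rows_for_s1 = omics_by_sample.get("S1", [])
```

Grouping up front keeps the per-sample lookup cheap and allows a sample to have several rows, for example more than one data file.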

nmdc_runtime/site/ops.py

nmdc_runtime/site/translation/submission_portal_translator.py

tests/test_data/test_submission_portal_translator.py

eecavanna

Thanks for updating the tests! I wonder how it'll affect #301.

Looks good to me (I reviewed it in terms of "do I understand what this does in isolation"), with the caveat that I don't have experience with the submission portal or this part of the codebase (i.e. "do I understand what this does to the big picture").

eecavanna

You addressed all my concerns. Thanks!

pkalita-lbl added 5 commits October 16, 2023 10:14

Update submission portal translator to pass validation according to current nmdc-schema

29c23ff

Add op for getting CSV data from arbitrary URL

395842a

Update submission portal graphs to accept a CSV mapping file to produce OmicsProcessing and DataObject records

2926508

Revert docker compose test changes

0f7d4ca

Revert docker compose test change

ca5c780

pkalita-lbl requested review from dwinston and eecavanna October 18, 2023 18:10