Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updates to submission portal jobs to handle sequencing data #329

Merged
merged 6 commits into from
Oct 19, 2023

Conversation

pkalita-lbl
Copy link
Collaborator

Re: #315 (will close once we actually run the job on production)

We previously had jobs to fetch a submission portal entry, translate it into Study and Biosample objects, validate, and submit to Mongo. It turns out Andrew Tritt's submission (and possibly future ones??) came with sequencing data. The submission portal doesn't have the capability to accept that information, so it was provided in a CSV file mapping sample IDs (that is, the ID provided in the submission portal, not an NMDC minted ID) to FASTA files.

These changes:

  • Update the submission portal jobs to accept two CSV files where the columns correspond to OmicsProcessing and DataObject slots, with one extra column to join on sample IDs. If URLs to such files are provided in the job config, each row will be expanded into the corresponding objects by the SubmissionPortalTranslator class.
  • The SubmissionPortalTranslator class and corresponding tests were also a little behind-the-times on some recent schema changes (DOI stuff, ID patterns, etc). That has been resolved, and the test now does a schema validation step to catch these things earlier (especially if we, ahem, ran tests against PRs in CI)
  • I cut out a bunch of sample data from the existing test case in tests/test_data/test_submission_portal_translator_data.yaml -- it was redundant as far as testing is concerned and it made test failure diffs hard to read. I added a second test case in that file to cover the case of providing OmicsProcessing and DataObject mapping data.

nmdc_runtime/site/ops.py Outdated Show resolved Hide resolved
eecavanna
eecavanna previously approved these changes Oct 18, 2023
Copy link
Collaborator

@eecavanna eecavanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating the tests! I wonder how it'll affect #301.

Looks good to me (I reviewed it in terms of "do I understand what this does in isolation"), with the caveat that I don't have experience with the submission portal or this part of the codebase (i.e. "do I understand what this does to the big picture").

Copy link
Collaborator

@eecavanna eecavanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You addressed all my concerns. Thanks!

@pkalita-lbl pkalita-lbl merged commit 46d6543 into main Oct 19, 2023
@pkalita-lbl pkalita-lbl deleted the issue-315 branch October 19, 2023 18:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants