Develop Pacbio for NMDC submission #413

Open
2 of 4 tasks
mslarae13 opened this issue Sep 6, 2023 · 13 comments

@mslarae13
Contributor

mslarae13 commented Sep 6, 2023

Deliverable this task is associated with

See Deliverables tab here: https://docs.google.com/spreadsheets/d/1z_b6WbuTk4pI0Q-Z-rfCgC-8R3m3F2_JDevYuK8CjYE/edit?usp=sharing

  • 3

RACI

Tag people in their roles

Describe the task

Criteria for completion

  • Users can submit metadata via NMDC for JGI Pacbio analysis

Estimate people time

  • 8

Completion Date (Goal)

  • Oct 20
  • Rescheduled, Feb 23rd

Target Sprint Start & End Dates

  • Start: Sept 11
  • End: Oct 20, rescheduled to Feb 23

Tag Blocker/Contingent upon issues

  • [Tag issues]
mslarae13 self-assigned this Sep 6, 2023
@mslarae13
Contributor Author

See the issue above: the slots for metagenome long reads have different requirements from those for metagenome short reads.

To support this check for JGI sample submission, long and short reads will be split apart in the "multi omics" data selection: if metaG is selected, additional checkboxes will appear for long or short reads.

In the metadata file, long and short reads will be added to the analysis type options.

When selected, samples assigned long reads will appear in a "JGI Metagenomics - Long Reads" template tab, and those assigned short reads will appear in a "JGI Metagenomics - Short Reads" tab.
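The tab routing described above can be sketched as follows. This is a minimal illustration, assuming each sample row carries a list of analysis-type values; the permissible value strings (`"metagenome - long reads"`, `"metagenome - short reads"`) and the function name `route_samples_to_tabs` are hypothetical, and only the two tab names come from this issue.

```python
from collections import defaultdict

# Hypothetical analysis-type values mapped to the template tabs named in this issue.
# The real permissible values would come from the submission schema.
TAB_FOR_ANALYSIS_TYPE = {
    "metagenome - long reads": "JGI Metagenomics - Long Reads",
    "metagenome - short reads": "JGI Metagenomics - Short Reads",
}


def route_samples_to_tabs(samples):
    """Group sample IDs by the JGI metagenome template tab they should appear in.

    Each sample is assumed to be a dict with an "id" and a list of
    "analysis_type" values. Analysis types with no JGI metagenome tab
    (e.g. metatranscriptome) are ignored.
    """
    tabs = defaultdict(list)
    for sample in samples:
        for analysis in sample["analysis_type"]:
            tab = TAB_FOR_ANALYSIS_TYPE.get(analysis)
            if tab is not None:
                tabs[tab].append(sample["id"])
    return dict(tabs)
```

A sample assigned both read types lands in both tabs, which is the case the later comments about a single biosample record run into.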

@pkalita-lbl when do you think we can work on this? I think @bmeluch or I can make the updates to add the interface and the requirements?

@pkalita-lbl
Contributor

@mslarae13 can I turn the question around and ask when do we need to have this done?

@ssarrafan
Contributor

Questions are at least in progress. Moving to the next sprint. @mslarae13 let me know if this should be in the backlog instead.

@mslarae13
Contributor Author

We should do this as part of the expansion / updates to the submission portal interface.
See #433

I think this rolls into the updating-tabs task.

@mslarae13
Contributor Author

In the schema, the PacBio instrument value will capture that the data is long reads.

@mslarae13
Contributor Author

Decided to separate long and short reads for metaGs in step 4 (Multi-Omics data, for JGI) and in the analysis slot. When a user selects metaG, they can choose long or short reads.

mslarae13 assigned pkalita-lbl and unassigned mslarae13 Feb 1, 2024
@mslarae13
Contributor Author

@pkalita-lbl

Functionality on the submission portal is great and works with no issues.
I did have a realization / question about a potential problem:

Mark previously asked whether the dna_slot and rna_slot(s) could be consolidated into a single slot.
You concluded here that they could not, because the data goes into Mongo associated with a single biosample.

If sample 1 has long and short read data, don't we have the same issue?
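To make the concern concrete: if rows from the long-read and short-read tabs are merged back into one biosample record, any slot that appears in both tabs can hold only one value. A minimal sketch, assuming rows are plain dicts keyed by a hypothetical `sample_id`; `dna_isolate_meth` (mentioned later in this thread) is used only as an example slot, and the values in the usage are made up. Keeping the first value and reporting the rest is an arbitrary choice for illustration.

```python
def merge_rows_by_sample(rows):
    """Merge template-tab rows into one record per biosample, reporting conflicts.

    Each row is a dict with a hypothetical "sample_id" key plus slot values.
    Because a biosample record holds one value per slot, two tabs that supply
    different values for the same slot cannot both be kept.
    """
    merged = {}
    conflicts = []
    for row in rows:
        record = merged.setdefault(row["sample_id"], {})
        for slot, value in row.items():
            if slot == "sample_id":
                continue
            if slot in record and record[slot] != value:
                # First value wins; the clashing value is recorded as a conflict.
                conflicts.append((row["sample_id"], slot, record[slot], value))
            else:
                record[slot] = value
    return merged, conflicts
```

A long-read row and a short-read row for the same sample with different extraction methods would surface here as a conflict rather than silently overwriting one another.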

mslarae13 self-assigned this Apr 19, 2024
@pkalita-lbl
Contributor

pkalita-lbl commented Apr 22, 2024

🤦🏻 🤦🏻 🤦🏻

Yes, you're absolutely right. That's my bad for not thinking of that. I'll make a new issue to deal with that.

In the meantime, it doesn't really hurt anything to collect data like this in the submission portal. But if we get any submissions with data like that, we'll just need to hold off on bringing that data into Mongo until the issue is resolved.

EDIT: Here's the new issue microbiomedata/nmdc-schema#1937

@mslarae13
Contributor Author

Thanks @pkalita-lbl !
Let's check with @aclum

Alicia, of these JGI-specific DNA vs. RNA slots, do we need to store any of them in NMDC/Mongo, or can they be considered "submission portal & UF specific"?

See the slots in MGInterface in submission schema : https://github.com/microbiomedata/submission-schema/blob/0b9413915f63bd7fa9be70f32061db49dc422009/src/nmdc_submission_schema/schema/nmdc_submission_schema.yaml#L34798

@ssarrafan
Contributor

@aclum can you respond to this when you get a chance?
I'll remove this from the sprint and add backlog label since it hasn't been updated for 2 weeks.

@aclum
Contributor

aclum commented May 6, 2024

I would like to keep dna_isolate_meth and map it to a slot on NMDC's Extraction class. However, in looking at those slots, we've conflated the extraction target and how the extraction was done into one permissible value. If we were to store the values JGI has, we'd need to allow this to be a plain string, because JGI doesn't place any controlled vocabulary on it. This needs further discussion with @turbomam.

@mslarae13
Copy link
Contributor Author

"we've conflated extraction target and how the extraction was done into one permissible value."
Fixed in nmdc-schema and merged into berk-schema.

Make the short- and long-read split now; deal with mapping the one field we care about later.

@mslarae13
Contributor Author

Schema change, post berk.

Projects
Status: 🏗 In Progress
Development

No branches or pull requests

4 participants