Add slots to Biosample class to disambiguate standard MG metadata vs long-read MG metadata #1937

pkalita-lbl · 2024-04-22T18:47:04Z

Montana correctly pointed out in this comment that our implementation of long-read metagenomics was somewhat incomplete.

The changes implemented for microbiomedata/submission-schema#168 added a new JgiMgLrInterface class. It reuses slots that are also used by the JgiMgInterface class. That makes sense from a pure LinkML perspective, but unfortunately it misses an important point about how submission data is brought into MongoDB where it adheres to nmdc-schema.

In the submission data, one sample's metadata might be spread across multiple submission-schema class instances (e.g. a SoilInterface instance and a JgiMgInterface instance), linked together by the unique sample name. When going into Mongo, those instances get collapsed into one instance of the nmdc-schema Biosample class. The issue is that if, in the submission-schema data, one sample has both a JgiMgInterface instance and a JgiMgLrInterface instance, the slot values for one will overwrite the other's when squashing into a Biosample instance.
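A minimal sketch of the collision, assuming the squashing step behaves like a plain dictionary merge keyed on sample name. The `squash` helper, the `samp_name` key, the row layout, and the values are hypothetical illustrations, not the actual ingest code:

```python
def squash(*tabs):
    """Collapse rows from several interface tabs into one dict per sample name."""
    biosamples = {}
    for rows in tabs:
        for row in rows:
            merged = biosamples.setdefault(row["samp_name"], {})
            # Shared slot names are silently overwritten by later tabs.
            merged.update(row)
    return biosamples


# Hypothetical rows for one sample that appears on three interfaces.
soil = [{"samp_name": "S1", "depth": "0.1 m"}]
jgi_mg = [{"samp_name": "S1", "dna_absorb1": 1.85}]     # short-read value
jgi_mg_lr = [{"samp_name": "S1", "dna_absorb1": 2.02}]  # long-read value

result = squash(soil, jgi_mg, jgi_mg_lr)
print(result["S1"])
```

Under these assumptions only the long-read `dna_absorb1` value survives in the merged record; the short-read value is lost without any error.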

This is the reason why we currently need to have pairs of slots like dna_absorb1 and rna_absorb1 instead of just absorb1. With the introduction of long-read MG metadata, these need to become triples of slots (e.g. rna_absorb1, dna_absorb1, and the new dna_lr_absorb1).
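As a rough illustration of why distinct slot names avoid the problem, here is a sketch that merges two hypothetical interface rows for the same sample. The dict-merge stand-in for the ingest step, the `samp_name` key, and the values are assumptions; `dna_lr_absorb1` is the slot name proposed above:

```python
# Hypothetical rows for the same sample from the two JGI MG interfaces.
jgi_mg_row = {"samp_name": "S1", "dna_absorb1": 1.85}
jgi_mg_lr_row = {"samp_name": "S1", "dna_lr_absorb1": 2.02}

# Stand-in for the collapse into a single Biosample record.
biosample = {**jgi_mg_row, **jgi_mg_lr_row}

# Both measurements are retained because the slot names no longer collide.
print(biosample["dna_absorb1"], biosample["dna_lr_absorb1"])
```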


mslarae13 · 2024-04-24T17:24:15Z

Checking with Alicia if NMDC needs to store these slots. If so, which ones?

microbiomedata/issues#413 (comment)

pkalita-lbl · 2024-05-03T21:01:19Z

Removing this from Sprint 35. Not adding to a future sprint right now because it sounds like we need further input before proceeding.

mslarae13 · 2024-06-14T00:12:13Z

A decision was made on 06/12 during the metadata meeting.

From @aclum in microbiomedata/issues#413 (comment)

would like to keep dna_isolate_meth and map it to a slot on NMDC's Extraction class.

We want to track dna_isolate_meth in NMDC, but this is the only slot.
We need to:

Add dna_isolate_meth_long & change dna_isolate_meth to dna_isolate_meth_short

POST BERK

Add these 2 method slots to their correct berk-schema class
Update to an enum with controlled values if needed to map to https://microbiomedata.github.io/berkeley-schema-fy24/nucl_acid_ext/

pkalita-lbl self-assigned this Apr 22, 2024

pkalita-lbl mentioned this issue Apr 22, 2024

Develop Pacbio for NMDC submission microbiomedata/issues#413

Open

4 tasks

pkalita-lbl transferred this issue from microbiomedata/nmdc-server Apr 22, 2024
