Montana correctly pointed out in this comment that our implementation of long-read metagenomics was somewhat incomplete.
The changes implemented for microbiomedata/submission-schema#168 added a new `JgiMgLrInterface` class. It reuses slots that are also used by the `JgiMgInterface` class. That makes sense from a pure LinkML perspective, but unfortunately it misses an important point about how submission data is brought into MongoDB, where it adheres to `nmdc-schema`.
In the submission data, one sample's metadata might be spread across multiple `submission-schema` class instances (e.g. a `SoilInterface` instance and a `JgiMgInterface` instance), linked together by the unique sample name. When going into Mongo, those instances get collapsed into one instance of the `nmdc-schema` `Biosample` class. The issue is that if, in the `submission-schema` data, one sample has both a `JgiMgInterface` instance and a `JgiMgLrInterface` instance, the slot values for one will overwrite the other when squashing into a `Biosample` instance.
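To make the overwrite concrete, here is a minimal sketch of the collapse step. It is not the actual ingest code: `collapse_by_sample_name`, the row layout, the `samp_name` key, and the absorbance values are all assumptions made up for illustration. It only shows what "last write wins" looks like when two interfaces share a slot name.

```python
# Hypothetical, simplified stand-in for the collapse step; NOT the real ingest code.
def collapse_by_sample_name(rows):
    """Merge per-interface rows into one dict per sample name (last write wins)."""
    biosamples = {}
    for row in rows:
        merged = biosamples.setdefault(row["samp_name"], {})
        # A slot name shared by two interfaces is silently overwritten here.
        merged.update(row["slots"])
    return biosamples


# Illustrative data only: both interfaces reuse the same slot name.
rows = [
    {"interface": "JgiMgInterface", "samp_name": "sample-1", "slots": {"dna_absorb1": 1.85}},
    {"interface": "JgiMgLrInterface", "samp_name": "sample-1", "slots": {"dna_absorb1": 1.92}},
]

collapsed = collapse_by_sample_name(rows)
print(collapsed["sample-1"])  # only one dna_absorb1 value is left
```

Whichever instance happens to be processed last determines the value that ends up on the `Biosample`, so the result depends on ordering rather than on anything in the data.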
This is why we currently need pairs of slots like `dna_absorb1` and `rna_absorb1` instead of just `absorb1`. With the introduction of long-read MG metadata, these need to become triples of slots (e.g. `rna_absorb1`, `dna_absorb1`, and -- new -- `dna_lr_absorb1`).
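One way to check for this kind of problem is to look for slot names that more than one interface sets for the same sample. The sketch below is only an illustration under assumed names: `find_slot_collisions`, the row layout, and the `samp_name` key are hypothetical and not part of either schema's tooling. It shows that a shared `dna_absorb1` is flagged, while a dedicated `dna_lr_absorb1` is not.

```python
from collections import defaultdict


# Hypothetical helper for illustration; not part of submission-schema or nmdc-schema.
def find_slot_collisions(rows):
    """Return {(sample_name, slot_name): [interfaces]} for slots set by 2+ interfaces."""
    seen = defaultdict(list)
    for row in rows:
        for slot_name in row["slots"]:
            seen[(row["samp_name"], slot_name)].append(row["interface"])
    return {key: names for key, names in seen.items() if len(names) > 1}


# Both interfaces reuse dna_absorb1 for the same sample: a collision.
shared = [
    {"interface": "JgiMgInterface", "samp_name": "sample-1", "slots": {"dna_absorb1": 1.85}},
    {"interface": "JgiMgLrInterface", "samp_name": "sample-1", "slots": {"dna_absorb1": 1.92}},
]

# The long-read interface uses its own dna_lr_absorb1 slot: no collision.
prefixed = [
    {"interface": "JgiMgInterface", "samp_name": "sample-1", "slots": {"dna_absorb1": 1.85}},
    {"interface": "JgiMgLrInterface", "samp_name": "sample-1", "slots": {"dna_lr_absorb1": 1.92}},
]

shared_collisions = find_slot_collisions(shared)
prefixed_collisions = find_slot_collisions(prefixed)
print(shared_collisions)
print(prefixed_collisions)
```

A check along these lines could run over the interface definitions themselves instead of over data, but that would depend on how the schema is loaded, which is outside what this issue describes.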