Use MIxS samp_taxon_id to model NCBI taxonomy ID from GOLD?
#574
Comments
Comments from discussion with @turbomam: There is a slot in MIxS called specific_host already, which models the taxonomy name or id. We have two approaches here:
@sujaypatil96 is this something you're currently working on? Can I add it to the sprint board for January?
Model this similarly to ENVO terms. Follow up with GSC to figure out why there are three similar terms for taxon ID.
Following up on this, the three MIxS terms with confusing definitions are:
@sujaypatil96 I'll move this to the next sprint, but let me know if you won't be actively working on it for the next couple of weeks.
@ssarrafan yup, we plan to address this at the metadata call today.
The information that we intend to capture in the schema from GOLD is the NCBI taxonomy ID (this is the label for the field that appears on the JGI GOLD website). For example, below is a snippet of the output from the GOLD API:
Semantic considerations
"soil metagenome" is not a host, so neither of the MIxS terms with the word "host" is applicable, leaving only the samp_taxon_id term from the original 3 candidates (specific_host, host_taxid and samp_taxon_id).
Format considerations
Looking at the
We could reconstruct a value that looks like the above example using two GOLD fields: ncbiTaxName and ncbiTaxId. We propose replacing the syntax implied by the MIxS example above with syntax like:
Agree, samp_taxon_id is the right slot. This is NOT collected by the user or in the submission portal, but I think that’s fine. This can just be a GOLD-assigned field.
These descriptions are more or less forward and reverse, so how they're different is unclear, but I think this is a different issue.
Decision during Wednesday 1pm metadata meeting: format change ncbiTaxName to NCBITaxon:####
Thanks for the notes, @mslarae13. A couple of questions:
See regexr.com for experimenting with those patterns.
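For anyone who wants to try the format idea from this thread (a taxon name from ncbiTaxName combined with an NCBITaxon:#### identifier built from ncbiTaxId), here is a minimal Python sketch. The bracketed layout, the regular expression, the helper names, and the sample record are all assumptions for illustration; none of them is a syntax agreed in this issue or confirmed by MIxS.

```python
import re
from typing import Tuple

# Assumed layout: free-text taxon name, one space, then a bracketed
# NCBITaxon CURIE with a purely numeric local ID. This is hypothetical.
SAMP_TAXON_ID_PATTERN = re.compile(r"^(?P<name>.+) \[NCBITaxon:(?P<id>\d+)\]$")


def format_samp_taxon_id(record: dict) -> str:
    """Combine GOLD's ncbiTaxName and ncbiTaxId into a single string."""
    name = str(record["ncbiTaxName"]).strip()
    tax_id = int(record["ncbiTaxId"])  # accepts 410658 or "410658"
    return f"{name} [NCBITaxon:{tax_id}]"


def parse_samp_taxon_id(value: str) -> Tuple[str, int]:
    """Split a formatted value back into (name, numeric taxon ID)."""
    match = SAMP_TAXON_ID_PATTERN.match(value)
    if match is None:
        raise ValueError(f"not in the assumed format: {value!r}")
    return match.group("name"), int(match.group("id"))


# Illustrative record using the two GOLD field names mentioned above.
example_record = {"ncbiTaxName": "soil metagenome", "ncbiTaxId": 410658}
formatted = format_samp_taxon_id(example_record)
print(formatted)
print(parse_samp_taxon_id(formatted))
```

Keeping the name and the ID in one string makes the value readable, while the anchored pattern still lets a validator recover the numeric ID unambiguously.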
Add a slot called ncbi_taxon_id to capture the host taxonomy ID as present in GOLD, and possibly in other sources.
For example, let's look at Gb0291745 on the GOLD website: https://gold.jgi.doe.gov/biosample?id=Gb0291745
Look under the Host Metadata section, and you'll see a value of 3689 associated with the host taxonomy ID.
We need a slot under the Biosample class to capture this information. In this issue, I'm proposing the addition of a slot called ncbi_taxon_id generally, and asserting its usage in the Biosample class using its slots property.
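To make the proposal concrete, here is a rough sketch of a slot defined once and then asserted on Biosample through its slots list. It is written as a plain Python dict that mirrors the top-level "slots" and "classes" layout of a LinkML schema. The description text, the "string" range, and the undefined_slots helper are assumptions for illustration and are not taken from the actual schema.

```python
# Hypothetical schema fragment mirroring a LinkML layout:
# slots are defined once at the top level, then referenced by name
# from each class that uses them.
schema_fragment = {
    "slots": {
        "ncbi_taxon_id": {
            # Assumed wording, not the schema's real description.
            "description": "NCBI taxonomy ID as reported by GOLD.",
            "range": "string",  # assumption; a CURIE-typed range is another option
        },
    },
    "classes": {
        "Biosample": {
            # Asserting usage of the slot on the class.
            "slots": ["ncbi_taxon_id"],
        },
    },
}


def undefined_slots(schema: dict) -> dict:
    """Return {class_name: [slot names used by the class but never defined]}."""
    defined = set(schema.get("slots", {}))
    problems = {}
    for class_name, class_def in schema.get("classes", {}).items():
        missing = [s for s in class_def.get("slots", []) if s not in defined]
        if missing:
            problems[class_name] = missing
    return problems


# An empty result means every slot asserted on a class has a definition.
print(undefined_slots(schema_fragment))
```

Defining the slot generally and only referencing it from Biosample keeps it reusable if another class later needs the same taxonomy ID.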