Re-ID Studies in Napa, verify schema compliance, and ensure pre-requisites for Prod Re-ID #1807

Closed
14 tasks done
mbthornton-lbl opened this issue Feb 28, 2024 · 4 comments

mbthornton-lbl commented Feb 28, 2024

Test runs on the Napa instance and pre-requisites for Metagenomic workflows.
Note:
This test run assumes that the following updates have been applied to the testDB instance:

  • Updates to study, biosample, and omics_processing
  • Updates for Metaproteomic, Metabolomic and NOM workflows and data

microbiomedata/issues#532

Re-ID Studies:

  1. "Stegen": nmdc:sty-11-aygzgv51, formerly gold:Gs0114663
  2. "SPRUCE": nmdc:sty-11-33fbta56, formerly gold:Gs0110138
  3. "EMP": nmdc:sty-11-547rwq94, formerly gold:Gs0154244
  4. "Luquillo": nmdc:sty-11-076c9980, formerly gold:Gs0128850
  5. "CrestedButte": nmdc:sty-11-dcqce727, formerly gold:Gs0135149
  6. "DeepShale": nmdc:sty-11-8fb6t785, formerly gold:Gs0114675
  7. "Populus": nmdc:sty-11-1t150432, formerly gold:Gs0103573
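
The mapping above can be kept as a small lookup table when checking whether references to the legacy GOLD study IDs have been rewritten. This is only a minimal sketch: the names `STUDY_REID`, `reid_study`, and `find_legacy_ids` are hypothetical helpers for illustration, not part of any NMDC tooling.

```python
# Legacy GOLD study ID -> re-minted NMDC study ID, copied from the list above.
STUDY_REID = {
    "gold:Gs0114663": "nmdc:sty-11-aygzgv51",  # Stegen
    "gold:Gs0110138": "nmdc:sty-11-33fbta56",  # SPRUCE
    "gold:Gs0154244": "nmdc:sty-11-547rwq94",  # EMP
    "gold:Gs0128850": "nmdc:sty-11-076c9980",  # Luquillo
    "gold:Gs0135149": "nmdc:sty-11-dcqce727",  # CrestedButte
    "gold:Gs0114675": "nmdc:sty-11-8fb6t785",  # DeepShale
    "gold:Gs0103573": "nmdc:sty-11-1t150432",  # Populus
}


def reid_study(study_id):
    """Return the new ID for a legacy study ID; pass any other ID through unchanged."""
    return STUDY_REID.get(study_id, study_id)


def find_legacy_ids(ids):
    """Return the IDs from `ids` that are still in the legacy GOLD form."""
    return [i for i in ids if i in STUDY_REID]
```

Passing IDs that are not in the table through unchanged keeps the helper safe to run repeatedly over data that has already been re-ID'd.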

Pre-requisites - complete ETL on Napa instance and verify:

Pre-requisites - all ETL recipes fully reproducible:

Study, BioSample and Omics:

Metagenomics:

mbthornton-lbl self-assigned this Feb 28, 2024
@mbthornton-lbl

SPARQL query for Orphaned DataObjects:

PREFIX nmdc: <https://w3id.org/nmdc/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

# Orphan DataObjects: never the object of has_input or has_output.
SELECT * WHERE {
    ?do a nmdc:DataObject .
    # Drop data objects that something consumes as input.
    MINUS {
        ?o nmdc:has_input ?do .
    }
    # Drop data objects that something produces as output.
    MINUS {
        ?o nmdc:has_output ?do .
    }
} LIMIT 100
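
For spot checks without a triple store, the same orphan test can be approximated in memory on exported records. The sketch below assumes each record is a plain dict with an `id` key and optional `has_input` / `has_output` lists; that record shape and the function name `orphan_data_objects` are assumptions for illustration, and the IDs in the usage example are made up.

```python
def orphan_data_objects(data_objects, referrers):
    """Return IDs of data objects that no referrer lists as input or output.

    Mirrors the two minus clauses of the query: collect everything that is
    the object of has_input or has_output, then subtract it.
    """
    referenced = set()
    for rec in referrers:
        referenced.update(rec.get("has_input", []))
        referenced.update(rec.get("has_output", []))
    return [d["id"] for d in data_objects if d["id"] not in referenced]


# Hypothetical example records (IDs are illustrative only).
example_objects = [{"id": "dobj-a"}, {"id": "dobj-b"}, {"id": "dobj-c"}]
example_referrers = [
    {"id": "act-1", "has_input": ["dobj-a"], "has_output": ["dobj-b"]},
]
example_orphans = orphan_data_objects(example_objects, example_referrers)
```

Unlike the query, this version has no limit of 100 and only sees the records it is handed, so it is suited to a single study's export rather than the full database.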

mbthornton-lbl added the Epic label (An Epic contains multiple related Issues. They will generally span multiple sprints.) Mar 8, 2024

aclum commented Mar 8, 2024

I reopened SPRUCE, but this likely pertains to all the studies. The deletion step is missing some of the binning data object records, e.g. those matching
{'description':{$regex:/Gp0208377/}}
In the SPRUCE example, the data object type of these records is Metagenome Bins, CheckM Statistics, or null. The null ones, based on this example, could be captured by a case-insensitive search for metabat2 on the description slot.
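
The selection rule described in this comment could be sketched as follows. The slot name `data_object_type`, the helper `is_binning_record`, and the `MONGO_FILTER` dict are assumptions for illustration and have not been run against the actual collection.

```python
import re

# Data object types named in the comment above.
BINNING_TYPES = {"Metagenome Bins", "CheckM Statistics"}
METABAT2 = re.compile(r"metabat2", re.IGNORECASE)


def is_binning_record(record):
    """True if a data object record looks like binning output.

    Typed records are matched on data_object_type (assumed slot name);
    untyped (null) records fall back to a case-insensitive search for
    "metabat2" in the description.
    """
    dtype = record.get("data_object_type")
    if dtype is not None:
        return dtype in BINNING_TYPES
    return bool(METABAT2.search(record.get("description") or ""))


# Roughly equivalent MongoDB filter document (a sketch, not a tested query).
# In MongoDB, {"field": None} matches documents where the field is null or missing.
MONGO_FILTER = {
    "$or": [
        {"data_object_type": {"$in": sorted(BINNING_TYPES)}},
        {
            "data_object_type": None,
            "description": {"$regex": "metabat2", "$options": "i"},
        },
    ]
}
```

Checking the type first means a typed non-binning record is never selected just because its description happens to mention metabat2.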


mbthornton-lbl commented Mar 11, 2024

@aclum Are we deleting all Binning data objects, or only those with a non-compliant ID?
Should records like this one:

record: nmdc:dobj-11-qm3fbt63 CheckM Statistics CheckM for nmdc:wfmag-11-m0t5hc17.1

be deleted?

NO, only those with non-compliant identifiers.
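
A rough sketch of that rule: keep binning records whose ID is compliant and delete the rest. The regular expression below is only an assumed shape for a compliant data object ID, inferred from the example record above; the authoritative pattern is whatever the schema defines. The name `ids_to_delete` and the non-compliant ID in the usage example are hypothetical.

```python
import re

# ASSUMPTION: a compliant data object ID looks like
# nmdc:dobj-<two digits>-<lowercase alphanumerics>, e.g. nmdc:dobj-11-qm3fbt63.
COMPLIANT_DOBJ_ID = re.compile(r"nmdc:dobj-\d{2}-[a-z0-9]+")


def ids_to_delete(binning_ids):
    """Return only the IDs that do NOT match the assumed compliant pattern."""
    return [i for i in binning_ids if not COMPLIANT_DOBJ_ID.fullmatch(i)]


# The first ID is the record quoted above; the second is a made-up legacy-style ID.
example_delete = ids_to_delete(
    ["nmdc:dobj-11-qm3fbt63", "nmdc:hypothetical-legacy-id"]
)
```

Under this assumption the quoted CheckM Statistics record is kept, since its identifier is already in the new form.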

mbthornton-lbl changed the title from "Re-ID Studies in Production, and verify Napa compliance" to "Re-ID Studies in Napa, verify schema compliance and readiness for Prod Re-ID" Apr 16, 2024
mbthornton-lbl changed the title from "Re-ID Studies in Napa, verify schema compliance and readiness for Prod Re-ID" to "Re-ID Studies in Napa, verify schema compliance, and ensure pre-requisites for Prod Re-ID" Apr 16, 2024
@ssarrafan

@mbthornton-lbl will be continuing to work on this in the next sprint per Slack message. Moving over.
