re-id sequencing omics records with missing data objects #596

aclum · 2024-01-30T01:10:49Z

Deliverable this task is associated with

_See Deliverables tab here: _

[Add the Deliverable #]

RACI

Tag people in their roles

Responsible: @mbthornton-lbl , @sujaypatil96
Accountable: @aclum
Consulted:
Informed:

Describe the task
- [ ] We'll need to make new data objects for these three missing data objects. For nmdc:8a9d164e1310e5b838d6ceb492f64a61, the other two data objects exist (for omics processing ID nmdc:omprc-11-tdt0js09, formerly gold:Gp0452741).

Resolved by: Resolve omics_processing has_output data_object referential integrity exceptions in study nmdc:sty-11-dcqce727 nmdc-schema#1894

Criteria for completion

SPARQL query returns no records for 'objects which are not subjects' query written by @turbomam.
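
The same referential integrity check can be approximated outside SPARQL. The sketch below is not @turbomam's query; it is a minimal Python illustration that assumes the omics processing and data object records have been exported as lists of dicts. The function name `find_dangling_outputs` is hypothetical, and the `has_output` list in the sample record is an assumed example based on the task description above, not a copy of the real record.

```python
def find_dangling_outputs(omics_records, data_objects):
    """Map each omics processing id to the has_output ids that have no data object."""
    known_ids = {obj["id"] for obj in data_objects}
    dangling = {}
    for record in omics_records:
        missing = [out for out in record.get("has_output", []) if out not in known_ids]
        if missing:
            dangling[record["id"]] = missing
    return dangling


# Assumed sample data: the has_output list is illustrative only.
omics_records = [
    {
        "id": "nmdc:omprc-11-tdt0js09",
        "has_output": [
            "nmdc:9bd3cf378610c02776b54cc797d8c07a",
            "nmdc:9d5f99fba241d6bdd933ccbf405bf872",
            "nmdc:8a9d164e1310e5b838d6ceb492f64a61",
        ],
    }
]
data_objects = [
    {"id": "nmdc:9bd3cf378610c02776b54cc797d8c07a"},
    {"id": "nmdc:9d5f99fba241d6bdd933ccbf405bf872"},
]

# An empty result would correspond to the completion criterion being met.
print(find_dangling_outputs(omics_records, data_objects))
```

With the sample data, only the identifier that has no matching data object is reported.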

Estimate people time

[Hours or days of people time. 1 person, 4 hours]

Completion Date (Goal)

[GOAL!]

Target Sprint Start & End Dates

Start:
End:

Tag Blocker/Contingent upon issues

[Tag issues]

aclum · 2024-01-30T01:15:04Z

20240129.missing_has_output.nmdc.omics_processing_set.json

aclum · 2024-01-30T01:34:46Z

Two data objects for gold:Gp0452741 that do exist:

{
  "_id": {
    "$oid": "649b00471ae706d7b5b1c2e2"
  },
  "id": "nmdc:9bd3cf378610c02776b54cc797d8c07a",
  "name": "SAMEA7723902_ERR5003681_interleaved.fq.gz",
  "description": "Raw interleaved fastq for SAMEA7723902_ERR5003681 (gold:Gp0452741)",
  "md5_checksum": "9bd3cf378610c02776b54cc797d8c07a",
  "url": "https://data.microbiomedata.org/data/raw/SAMEA7723902_ERR5003681_interleaved.fq.gz ",
  "file_size_bytes": 114912166
}

{
  "_id": {
    "$oid": "649b00471ae706d7b5b1c2ed"
  },
  "id": "nmdc:9d5f99fba241d6bdd933ccbf405bf872",
  "name": "SAMEA7723902_ERR5004468_interleaved.fq.gz",
  "description": "Raw interleaved fastq for SAMEA7723902_ERR5004468 (gold:Gp0452741)",
  "md5_checksum": "9d5f99fba241d6bdd933ccbf405bf872",
  "url": "https://data.microbiomedata.org/data/raw/SAMEA7723902_ERR5004468_interleaved.fq.gz ",
  "file_size_bytes": 125218381
}

Two SRA run IDs are missing from this grouping: ERR5003109 and ERR5001830.

It is unclear what actually happened during analysis. The workflow activities show only one data object as has_input (nmdc:9bd3cf378610c02776b54cc797d8c07a). cc @Michal-Babins @scanon @hubin-keio Could the workflow code handle only one fastq when these ran (EMP500 sample)?
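
One way to confirm which raw data objects were never consumed by a workflow is to compare an omics processing record's has_output list against the has_input lists of its workflow activities. This is a rough sketch, not code from the NMDC workflows: the function name `outputs_not_consumed`, the activity id, and the shape of the sample records are assumptions made for illustration.

```python
def outputs_not_consumed(omics_record, workflow_activities):
    """Return has_output ids that no workflow activity lists in has_input."""
    consumed = set()
    for activity in workflow_activities:
        consumed.update(activity.get("has_input", []))
    return [out for out in omics_record.get("has_output", []) if out not in consumed]


# Assumed sample records; the activity id is a placeholder, not a real NMDC id.
omics_record = {
    "id": "nmdc:omprc-11-tdt0js09",
    "has_output": [
        "nmdc:9bd3cf378610c02776b54cc797d8c07a",
        "nmdc:9d5f99fba241d6bdd933ccbf405bf872",
    ],
}
workflow_activities = [
    {
        "id": "example-activity-1",  # hypothetical
        "has_input": ["nmdc:9bd3cf378610c02776b54cc797d8c07a"],
    }
]

print(outputs_not_consumed(omics_record, workflow_activities))
```

A non-empty result for a record would be consistent with the workflow having read only one fastq, though it would not by itself explain why.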

mbthornton-lbl · 2024-04-05T19:21:56Z

Resolved by microbiomedata/nmdc-schema#1894

aclum assigned mbthornton-lbl Jan 30, 2024

aclum mentioned this issue Apr 4, 2024

Resolve referential integrity exceptions for nmdc:sty-11-547rwq94 Metagenome Omics Processing records microbiomedata/nmdc-schema#1897

Open

mbthornton-lbl closed this as completed Apr 5, 2024
