re-id sequencing omics records with missing data objects #596

Closed
1 task done
aclum opened this issue Jan 30, 2024 · 3 comments

aclum commented Jan 30, 2024

Deliverable this task is associated with

_See Deliverables tab here: _

  • [Add the Deliverable #]

RACI

Tag people in their roles

Describe the task
- [ ] We'll need to create new data objects for these three missing data objects. For nmdc:8a9d164e1310e5b838d6ceb492f64a61, the other two data objects exist (for omics processing ID nmdc:omprc-11-tdt0js09, formerly gold:Gp0452741).

Criteria for completion

  • SPARQL query returns no records for 'objects which are not subjects' query written by @turbomam.
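
The "objects which are not subjects" check can also be approximated outside SPARQL as a set difference. The following is a minimal Python sketch, not the query written by @turbomam. It assumes the records are available as (subject, predicate, object) triples; the `has_output` predicate and the helper name `dangling_objects` are illustrative assumptions.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def dangling_objects(triples: Iterable[Triple], prefix: str = "nmdc:") -> Set[str]:
    """Return prefixed IDs that appear as an object but never as a subject."""
    triples = list(triples)
    subjects = {s for s, _, _ in triples}
    # Only values that look like IDs are considered; plain literals are skipped.
    objects = {o for _, _, o in triples if o.startswith(prefix)}
    return objects - subjects


# Hypothetical triples built from the IDs in this issue. The has_output
# predicate linking the omics record to its data objects is an assumption.
example = [
    ("nmdc:omprc-11-tdt0js09", "has_output", "nmdc:8a9d164e1310e5b838d6ceb492f64a61"),
    ("nmdc:omprc-11-tdt0js09", "has_output", "nmdc:9bd3cf378610c02776b54cc797d8c07a"),
    ("nmdc:9bd3cf378610c02776b54cc797d8c07a", "name", "SAMEA7723902_ERR5003681_interleaved.fq.gz"),
]

missing = dangling_objects(example)
print(sorted(missing))
```

Under these assumptions, the completion criterion corresponds to `dangling_objects` returning an empty set for the full dataset.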

Estimate people time

  • [Hours or days of people time. 1 person, 4 hours]

Completion Date (Goal)

  • [GOAL!]

Target Sprint Start & End Dates

  • Start:
  • End:

Tag Blocker/Contingent upon issues

  • [Tag issues]

aclum commented Jan 30, 2024

Two data objects for gold:Gp0452741 that do exist:
{
  "_id": {
    "$oid": "649b00471ae706d7b5b1c2e2"
  },
  "id": "nmdc:9bd3cf378610c02776b54cc797d8c07a",
  "name": "SAMEA7723902_ERR5003681_interleaved.fq.gz",
  "description": "Raw interleaved fastq for SAMEA7723902_ERR5003681 (gold:Gp0452741)",
  "md5_checksum": "9bd3cf378610c02776b54cc797d8c07a",
  "url": "https://data.microbiomedata.org/data/raw/SAMEA7723902_ERR5003681_interleaved.fq.gz ",
  "file_size_bytes": 114912166
}

{
  "_id": {
    "$oid": "649b00471ae706d7b5b1c2ed"
  },
  "id": "nmdc:9d5f99fba241d6bdd933ccbf405bf872",
  "name": "SAMEA7723902_ERR5004468_interleaved.fq.gz",
  "description": "Raw interleaved fastq for SAMEA7723902_ERR5004468 (gold:Gp0452741)",
  "md5_checksum": "9d5f99fba241d6bdd933ccbf405bf872",
  "url": "https://data.microbiomedata.org/data/raw/SAMEA7723902_ERR5004468_interleaved.fq.gz ",
  "file_size_bytes": 125218381
}
There are two missing SRA run IDs from this grouping: ERR5003109 and ERR5001830.

It is unclear what actually happened during analysis. Workflow activities only show one data object as has_input (nmdc:9bd3cf378610c02776b54cc797d8c07a). cc @Michal-Babins @scanon @hubin-keio Could the workflow code only handle one fastq when these ran (EMP500 sample)?
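
One way to confirm which existing data objects were never consumed by a workflow is to compare their IDs against the `has_input` lists of the workflow activities. This is a rough Python sketch under assumptions: the activity record shape, the activity ID, and the helper name `unused_inputs` are hypothetical; only the two data object IDs come from this issue.

```python
from typing import Any, Dict, Iterable, Set


def unused_inputs(data_object_ids: Iterable[str],
                  activities: Iterable[Dict[str, Any]]) -> Set[str]:
    """Return data object IDs that no activity lists under has_input."""
    consumed: Set[str] = set()
    for activity in activities:
        consumed.update(activity.get("has_input", []))
    return set(data_object_ids) - consumed


# The two data objects that exist for gold:Gp0452741 (from this issue).
existing = [
    "nmdc:9bd3cf378610c02776b54cc797d8c07a",
    "nmdc:9d5f99fba241d6bdd933ccbf405bf872",
]

# Hypothetical activity record; the ID and field layout are assumptions.
activities = [
    {
        "id": "nmdc:hypothetical-activity",
        "has_input": ["nmdc:9bd3cf378610c02776b54cc797d8c07a"],
    },
]

never_used = unused_inputs(existing, activities)
print(sorted(never_used))
```

If the real activity records match the observation above, the second fastq would show up in `never_used`.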

@mbthornton-lbl

Resolved by microbiomedata/nmdc-schema#1894
