Commands to find mal-formed workflow version and to create updates for Database #186

Closed
aclum opened this issue May 24, 2024 · 6 comments

@aclum
Contributor

aclum commented May 24, 2024

Implement 2 commands:

  • find-affected

These are re-ID'd studies where two versions were appended to the IDs for read QC and read-based analysis:

nmdc:sty-11-076c9980 - 103 read_qc_analysis_activity_set
nmdc:sty-11-dcqce727 - 91 read_qc_analysis_activity_set

nmdc:sty-11-076c9980 - 103 read_based_taxonomy_analysis_activity_set
nmdc:sty-11-dcqce727 - 48 read_based_taxonomy_analysis_activity_set
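The find-affected check could be sketched roughly as follows. This is a minimal illustration, not the command that was implemented: it assumes workflow activity IDs end in one or more `.<n>` version components (e.g. a hypothetical `nmdc:wfrqc-11-abc123.1`), treats any ID with more than one component as malformed, and takes plain `(study_id, collection, activity_id)` tuples in place of a real MongoDB query. Apart from the two study IDs, all sample IDs are made up.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed shape: "<prefix>:<typecode>-<shoulder>-<blade>" plus ".<n>" versions,
# e.g. "nmdc:wfrqc-11-abc123.1" (well formed) or "...abc123.1.1" (malformed).
VERSION_SUFFIX = re.compile(r"((?:\.\d+)+)$")


def version_components(activity_id: str) -> List[str]:
    """Return the trailing version components, e.g. '.1.1' -> ['1', '1']."""
    match = VERSION_SUFFIX.search(activity_id)
    if match is None:
        return []
    return match.group(1).lstrip(".").split(".")


def is_malformed(activity_id: str) -> bool:
    """Flag IDs that carry more than one appended version."""
    return len(version_components(activity_id)) > 1


def find_affected(records: Iterable[Tuple[str, str, str]]) -> Counter:
    """Count malformed activity IDs per (study_id, collection) pair."""
    counts: Counter = Counter()
    for study_id, collection, activity_id in records:
        if is_malformed(activity_id):
            counts[(study_id, collection)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical activity IDs; in practice these would come from mongo.
    sample = [
        ("nmdc:sty-11-076c9980", "read_qc_analysis_activity_set", "nmdc:wfrqc-11-abc123.1.1"),
        ("nmdc:sty-11-076c9980", "read_qc_analysis_activity_set", "nmdc:wfrqc-11-def456.1"),
        ("nmdc:sty-11-dcqce727", "read_based_taxonomy_analysis_activity_set", "nmdc:wfrbt-11-ghi789.1.1"),
    ]
    for (study, collection), n in sorted(find_affected(sample).items()):
        print(study, "-", n, collection)
```

The printed lines follow the same "study - count collection" layout as the counts listed above.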

@ssarrafan

Appears to be active. Moving to the next sprint.

mbthornton-lbl changed the title from "workflow increment issues in prod data from re-iding" to "Commands to find mal-formed workflow version and to create updates for Database" on Jun 18, 2024
@aclum
Contributor Author

aclum commented Jun 24, 2024

@mbthornton-lbl Updated summary of the issue:

read_qc_analysis_activity_set
194 workflow activity records in mongo with multiple version IDs (.1.1).
Of those 194:
190: the data_object_set records do not match (they are .1.1.1); URLs don't resolve.
4: the data_object_set records are consistent; URLs do resolve.
108 workflow activity records are .1 in mongo, but the data_object_set record name and URL are .1.1; URLs don't resolve.
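A per-record consistency check between a workflow activity and its data_object_set entry might look like the sketch below. It rests on assumptions that are not confirmed here: that the data object name or URL embeds the activity ID (minus its prefix) followed by its version suffix, and that the prefix may be spelled `nmdc:` or `nmdc_`, so only the part after the colon is matched. The URL and IDs in the demo are hypothetical.

```python
import re
from typing import Optional, Tuple

VERSION_SUFFIX = re.compile(r"(?:\.\d+)+$")


def split_version(activity_id: str) -> Tuple[str, str]:
    """Split 'nmdc:wfrqc-11-abc123.1.1' into ('nmdc:wfrqc-11-abc123', '.1.1')."""
    match = VERSION_SUFFIX.search(activity_id)
    if match is None:
        return activity_id, ""
    return activity_id[: match.start()], match.group(0)


def version_in_text(activity_id: str, text: str) -> Optional[str]:
    """Return the version suffix that follows the unversioned ID inside `text`."""
    base, _ = split_version(activity_id)
    # Drop the "nmdc:" prefix, since file names may spell it "nmdc_" (assumption).
    local_part = base.rsplit(":", 1)[-1]
    match = re.search(re.escape(local_part) + r"((?:\.\d+)+)", text)
    return match.group(1) if match else None


def check_data_object(activity_id: str, name_or_url: str) -> str:
    """Classify a data object as 'consistent', 'mismatch', or 'missing'."""
    _, expected = split_version(activity_id)
    found = version_in_text(activity_id, name_or_url)
    if found is None:
        return "missing"
    return "consistent" if found == expected else "mismatch"


if __name__ == "__main__":
    url = "https://example.org/data/nmdc:wfrqc-11-abc123.1.1/nmdc_wfrqc-11-abc123.1.1_filtered.fastq.gz"
    print(check_data_object("nmdc:wfrqc-11-abc123.1", url))  # mismatch (.1 vs .1.1)
```

The same function covers both failure modes listed above: a .1.1 activity whose data objects are .1.1.1, and a .1 activity whose data objects are .1.1.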

read_based_taxonomy_analysis_activity_set
151 workflow activity records in mongo with multiple version IDs (.1.1).
Of those 151:
149: the data_object_set records do not match (they are .1.1.1); URLs don't resolve.
2: the data_object_set records are consistent; URLs do resolve.
108 workflow activity records are .1 in mongo, but the data_object_set record name and URL are .1.1; URLs don't resolve.

metagenome_assembly_set
No activity records in mongo have multiple versions.
79 identifiers: the ID with the blade that is in mongo doesn't exist on the file system; URLs don't resolve.
180 workflow activity records are .1 in mongo, but the data_object_set record name and URL are .1.1; URLs don't resolve.
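The metagenome_assembly_set problems are about records that point at paths that do not exist on disk. A filesystem-side check could be sketched like this, assuming (hypothetically) that each workflow activity has its own directory under a data root, named with the full ID exactly as stored in mongo. The data root and the `wfmgas` ID in the demo are made up.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def missing_on_filesystem(data_root: Path, activity_ids: Iterable[str]) -> List[str]:
    """Return the activity IDs that have no matching directory under data_root.

    Hypothetical layout: one directory per workflow activity, named with the
    full ID (blade plus version suffix) as it is recorded in MongoDB.
    """
    return [aid for aid in activity_ids if not (data_root / aid).is_dir()]


if __name__ == "__main__":
    # Demo against a throwaway directory tree instead of the real data root.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "nmdc:wfmgas-11-abc123.1.1").mkdir()  # what exists on disk
        recorded = ["nmdc:wfmgas-11-abc123.1"]        # what mongo records
        print(missing_on_filesystem(root, recorded))  # ['nmdc:wfmgas-11-abc123.1']
```

Because it only tests for existence, this catches both a wrong blade and a wrong version suffix; telling the two apart would need an extra lookup.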

@mbthornton-lbl
Contributor

Summary of find-affected-workflows run on Prod:

Total of 346 malformed workflow ID versions found in the database:
194 from read QC
151 from read-based taxonomy
0 from metagenome assembly

Malformed data paths on the Filesystem:
180 malformed data file paths for metagenome assembly
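For the second command (creating updates for the database), one possible sketch collapses a repeated version suffix back to its first component and emits update specs shaped like entries in the `updates` array of the MongoDB `update` command (a `q` filter and a `u` update). The correction rule (keep only the first version component) and the `id` field name are assumptions for illustration; the sketch does not touch data object records or any documents that reference the old IDs.

```python
import re
from typing import Any, Dict, List

VERSION_SUFFIX = re.compile(r"(?:\.\d+)+$")


def corrected_id(activity_id: str) -> str:
    """Keep only the first version component: '...abc123.1.1' -> '...abc123.1'."""
    match = VERSION_SUFFIX.search(activity_id)
    if match is None:
        return activity_id
    first = match.group(0).split(".")[1]  # ".1.1" -> ["", "1", "1"] -> "1"
    return f"{activity_id[: match.start()]}.{first}"


def build_updates(activity_ids: List[str]) -> List[Dict[str, Any]]:
    """Return {q, u} update specs for every ID whose version needs collapsing."""
    updates = []
    for old in activity_ids:
        new = corrected_id(old)
        if new != old:  # well-formed IDs produce no update
            updates.append({"q": {"id": old}, "u": {"$set": {"id": new}}})
    return updates


if __name__ == "__main__":
    ids = ["nmdc:wfrqc-11-abc123.1.1", "nmdc:wfrqc-11-def456.1"]  # hypothetical
    for spec in build_updates(ids):
        print(spec)
```

Each `q`/`u` pair maps directly onto a pymongo `UpdateOne(filter, update)` that could be sent in one `bulk_write` call.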

@aclum
Contributor Author

aclum commented Jun 25, 2024

I've spun out the ID blade issue to a new ticket, #201. This ticket, #186, is a higher priority to fix than #201 since it impacts more records.

@mbthornton-lbl
Contributor

This is a duplicate of #198 and can be closed.

3 participants