Duplicate HIDs on usegalaxy.org #3818

Open
jmchilton opened this issue Mar 26, 2017 · 6 comments
Comments

@jmchilton
Member

See @bgruening's comment and sample history in #3816 (comment). It seems unrelated to the original issue, so I wanted to track it here. We do not, and never have, relied on a single thread generating HIDs to ensure uniqueness, so I don't think the two are related. Uniqueness should be enforced by Postgres though, so this is very troubling for sure - xref https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/model/mapping.py#L2528.
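
For anyone wanting to check whether a database really contains duplicate HIDs, here is a minimal sketch of the kind of query that could confirm it. It runs against a toy in-memory SQLite table rather than Postgres, and the table and column names (`history_dataset_association`, `history_id`, `hid`) are assumptions for illustration, not a verified copy of the Galaxy schema.

```python
import sqlite3


def find_duplicate_hids(conn):
    """Return (history_id, hid, count) rows where a hid occurs more than once in one history."""
    return conn.execute(
        """
        SELECT history_id, hid, COUNT(*) AS n
        FROM history_dataset_association
        GROUP BY history_id, hid
        HAVING COUNT(*) > 1
        ORDER BY history_id, hid
        """
    ).fetchall()


# Toy in-memory table standing in for the real one (schema is an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history_dataset_association "
    "(id INTEGER PRIMARY KEY, history_id INTEGER, hid INTEGER)"
)
conn.executemany(
    "INSERT INTO history_dataset_association (history_id, hid) VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 2), (2, 1)],  # history 1 carries hid 2 twice
)

duplicates = find_duplicate_hids(conn)
print(duplicates)  # [(1, 2, 2)]
```

The same `GROUP BY ... HAVING` query should work largely unchanged on Postgres if the real table uses these column names.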

@bgruening
Member

@yvanlebras has a history from which a workflow cannot be extracted.

The only thing that I can find in our logs is:

May 16 11:19:51 sn06.galaxyproject.eu gunicorn[3043719]: galaxy.workflow.extract INFO 2024-05-16 11:19:51,877 [pN:main.4,p:3181069,tN:WSGI_3] Cannot find implicit input collection for reference_genome|own_file
May 16 11:19:51 sn06.galaxyproject.eu gunicorn[3043719]: galaxy.workflow.extract WARNING 2024-05-16 11:19:51,882 [pN:main.4,p:3181069,tN:WSGI_3] duplicate hid found in extract_steps [13]
May 16 11:19:51 sn06.galaxyproject.eu gunicorn[3043719]: galaxy.workflow.extract WARNING 2024-05-16 11:19:51,890 [pN:main.4,p:3181069,tN:WSGI_3] duplicate hid found in extract_steps [16]
May 16 11:19:51 sn06.galaxyproject.eu gunicorn[3043719]: galaxy.workflow.extract WARNING 2024-05-16 11:19:51,890 [pN:main.4,p:3181069,tN:WSGI_3] Failed to find matching implicit job - job id is 69246218, implicit pairs are [('output_html', <galaxy.model.HistoryDatasetCollectionAssociation(2569404) at 0x7f09bef7a6d0>)], assoc_name is raw_data.

I tried to find it on Sentry, but had no luck.

If I can help with more details, please let me know.

@yvanlebras
Contributor

Triskel made 3 histories: one with raw data, then 2 different histories where the input data (data collections) were "copied" from the first history. In the last 2 histories, she applied almost the same tools to get results. Workflow extraction works for one of these histories but not for the other. One possibility is that, when creating the third history, she copied the 2 data collections from the second history and not from the first one, so the input data collections in the third history are copies of the second history's collections, which are themselves copies from the first... My 2 cents for now.

@yvanlebras
Contributor

Triskel just tried copying the history for which workflow extraction fails, deleting all analytical steps (keeping only the input data collections), and then applying a tool to it. In that case, workflow extraction works. So this seems more related to an analytical step than to the input datasets...

@yvanlebras
Contributor

OK, it seems this is due to the Qualimap BamQC tool https://ecology.usegalaxy.eu/root?tool_id=toolshed.g2.bx.psu.edu/repos/iuc/qualimap_bamqc/qualimap_bamqc/2.2.2c+galaxy1. Deleting this step allows Triskel to extract a workflow!

@mvdbeek
Member

mvdbeek commented May 16, 2024

It's an unfortunate design choice, but implicit conversions create datasets with the same hid as the source; this isn't problematic per se. Extraction from histories could surely use a good overhaul. In the meantime, it would help to share a record of a problematic history so we can track that in a separate issue.
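
To illustrate the point about implicit conversions, here is a rough sketch of how one might list the hids that more than one item in a history shares. The `(hid, name)` tuples and the item names are hypothetical sample data, not Galaxy's actual model objects or the contents of the history discussed above.

```python
from collections import defaultdict


def shared_hids(items):
    """Return {hid: [names]} for every hid carried by more than one history item."""
    by_hid = defaultdict(list)
    for hid, name in items:
        by_hid[hid].append(name)
    return {hid: names for hid, names in by_hid.items() if len(names) > 1}


# Hypothetical history: the implicitly converted dataset reuses the hid of its source.
history_items = [
    (12, "reads.fastq"),
    (13, "alignment.bam"),
    (13, "alignment.bam (implicit conversion)"),
    (14, "report.html"),
]

shared = shared_hids(history_items)
print(shared)
```

A shared hid found this way is expected when it comes from an implicit conversion, so it does not by itself indicate corrupted data.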

@yvanlebras
Contributor
