Workflows implicit conversions #5384
Comments
Note that implicitly converted datasets are also added to the history, but as hidden datasets (as far as I know). They have the same name and number but are hidden. Therefore I would not agree with your statement:
But I would rather say that the following is optimal (trading space for run time):
Please note the ongoing efforts to have as many tools as possible accept zipped and unzipped inputs: #2312. In the linked PRs you may also find some approaches to do so on the tool side. Nevertheless, I would say that the problem with the implicitly converted datasets is a bug. @mvdbeek, do you agree?
@EngyNasr do you have a small example workflow where the failure occurs?
Here is a small example workflow: https://usegalaxy.eu/u/engy.nasr/w/collection-implicit-conversion-example and a small example history: https://usegalaxy.eu/u/engy.nasr/h/collection-trial-implicit-conversion. You can reproduce the history by running the workflow on the first collection, "Spiked Samples", in the history. As you can see, Filter sequences by ID failed, but if you rerun the tool in the history it will succeed, since it then runs outside the workflow.
Implicit Conversions:
Some tools fail when they run in a workflow but succeed when they run on their own, for example Krakentools: Extract Kraken Reads By ID (older version) and Filter Sequence by ID.
After a bit of investigation, we noticed that when these tools run alone (outside a workflow) they implicitly decompress the zipped input files, so the job succeeds. However, when the same tools run with exactly the same inputs within a workflow, this implicit decompression does not take place, which causes the job to fail.
Example of the implicit datatype conversion performed by the tools while running stand alone without a workflow in a history:
The initial solution was to add the Convert compressed file to uncompressed tool to the workflow before these tools, as shown in green in the figure below.
However, this initial solution is not optimal: with hundreds or thousands of sequence files, running the workflow dramatically increases the size of the user's history.
For that reason, we proposed another solution: updating the tool wrappers themselves to perform the decompression internally, without needing the Convert compressed file to uncompressed tool, as we did for Krakentools: Extract Kraken Reads By ID (current version).
The best solution would be to update Galaxy workflows to perform implicit conversions similar to the ones done when running a tool outside a workflow.
Important note: this implicit conversion issue only occurs when the input is a collection of zipped files. If the input is a single zipped file, these tools work fine both within and outside a workflow.
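To illustrate the wrapper-side approach (decompressing inside the tool instead of relying on an implicit conversion), here is a minimal Python sketch. It is not the actual Krakentools wrapper code; the function names `is_gzipped` and `ensure_uncompressed` are hypothetical, and it assumes the compressed inputs are gzip files. The idea is to check the gzip magic bytes and, if needed, decompress into the job's working directory, so the tool itself always sees plain text and no extra dataset is added to the history.

```python
import gzip
import shutil
from pathlib import Path

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def ensure_uncompressed(src, dest):
    """Write an uncompressed copy of `src` to `dest` and return `dest`.

    Hypothetical helper: gzip input is decompressed, anything else is
    copied unchanged, so the downstream tool can always read `dest`.
    """
    src, dest = Path(src), Path(dest)
    if is_gzipped(src):
        with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
            shutil.copyfileobj(fin, fout)
    else:
        shutil.copyfile(src, dest)
    return dest
```

Checking the file content rather than its extension is a deliberate choice in this sketch, since the file name a tool receives may not carry a `.gz` suffix.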