Roary 3.13.0 fails at usegalaxy.org -- likely installation issue #293

Closed
jennaj opened this issue Mar 13, 2020 · 11 comments

Comments

@jennaj jennaj (Member) commented Mar 13, 2020

Tool: Roary the pangenome pipeline - Quickly generate a core gene alignment from gff3 files (Galaxy Version 3.13.0)

Workaround for end-users: until the tool is corrected at usegalaxy.org and this ticket is closed, it can be run at usegalaxy.eu instead.

Troubleshooting: there appear to be three problems:

  1. Is the working directory path incorrect?
  2. Is a datatype ("dot") missing?
  3. The error help reports that "duplicated inputs were used", but they weren't.

Test histories: use some of the tutorial data from here: https://training.galaxyproject.org/training-material/topics/assembly/

Error from the usegalaxy.org test. It is the same error as the one reported on Galaxy Help: https://help.galaxyproject.org/t/roary-fatal-error-exit-code-2/3164

Dataset Error
An error occurred while running the tool toolshed.g2.bx.psu.edu/repos/iuc/roary/roary/3.13.0.

Error Details
Execution resulted in the following messages:

Fatal error: Exit code 2 ()
Tool generated the following standard error:

Use of uninitialized value in require at /cvmfs/main.galaxyproject.org/deps/_conda/envs/__roary@3.13.0/lib/site_perl/5.26.2/x86_64-linux-thread-multi/Encode.pm line 61.
Usage: extract_proteome_from_gff [options] *.gff
Take in GFF files and create FASTA files of the protein sequences

Options: -o STR output suffix [proteome.faa]
-t INT translation table [11]
-f filter sequences with missing data
-v verbose output to STDOUT
-d STR output directory
-w print version and exit
-h this help message

For further info see: http://sanger-pathogens.github.io/Roary/
Usage: extract_proteome_from_gff [options] *.gff
Take in GFF files and create FASTA files of the protein sequences

Options: -o STR output suffix [proteome.faa]
-t INT translation table [11]
-f filter sequences with missing data
-v verbose output to STDOUT
-d STR output directory
-w print version and exit
-h this help message

For further info see: http://sanger-pathogens.github.io/Roary/
Cant open file: /galaxy-repl/main/jobdir/027/303/27303938/working/out/Sx2fKLLwy8/Prokka on data 11: gff.gff.proteome.faa
Galaxy job runner generated the following standard error:

WARNING:galaxy.model:Datatype class not found for extension 'dot'
WARNING:galaxy.model:Datatype class not found for extension 'dot'
WARNING:galaxy.model:Datatype class not found for extension 'dot'
WARNING:galaxy.model:Datatype class not found for extension 'dot'
WARNING:galaxy.model:Datatype class not found for extension 'dot'
WARNING:galaxy.model:Datatype class not found for extension 'dot'
WARNING:galaxy.model:Datatype class not found for extension 'dot'
WARNING:galaxy.model:Datatype class not found for extension 'dot'
WARNING:galaxy.model:Datatype class not found for extension 'dot'
WARNING:galaxy.model:Datatype class not found for extension 'dot'
Detected Common Potential Problems
The tool was executed with one or more duplicate input datasets. This frequently results in tool errors due to problematic input choices.

ping @davebx @mvdbeek @natefoo

@natefoo natefoo (Member) commented Mar 17, 2020

It's unclear to me what's broken here but apparently the Perl error is not the problem.

@natefoo natefoo (Member) commented Mar 17, 2020

One thing that stands out: why does the tool wrapper copy its inputs? Are symlinks not sufficient?

cp '/galaxy-repl/main/files/038/407/dataset_38407948.dat' 'Prokka on data 5: gff.gff' &&  cp '/galaxy-repl/main/files/038/411/dataset_38411630.dat' 'Prokka on data 11: gff.gff' &&   roary -f out -p ${GALAXY_SLOTS:-1} -e -n -i '95' -cd '99.0' -g '50000'  -t '11' -iv '1.5'  'Prokka on data 5: gff.gff' 'Prokka on data 11: gff.gff'
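A minimal sketch of the same copy step using sanitized names, so that Roary receives readable but space-free file names. This is not the wrapper's actual code: the `sanitize` helper, the allowed character set, and the stand-in `dataset_*.dat` files are illustrative assumptions. Only the Roary options in the final comment are taken from the command above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: replace every character outside [A-Za-z0-9._-]
# (spaces, colons, etc.) with an underscore.
sanitize() {
    printf '%s' "$1" | tr -c 'A-Za-z0-9._-' '_'
}

# Stand-ins for Galaxy datasets, created in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"
printf '##gff-version 3\n' > dataset_1.dat
printf '##gff-version 3\n' > dataset_2.dat

# Source files paired with their Galaxy display names (illustrative).
srcs=(dataset_1.dat dataset_2.dat)
names=('Prokka on data 5: gff.gff' 'Prokka on data 11: gff.gff')

# Copy each dataset to its sanitized display name.
inputs=()
for i in "${!srcs[@]}"; do
    safe=$(sanitize "${names[$i]}")
    cp "${srcs[$i]}" "$safe"
    inputs+=("$safe")
done

printf '%s\n' "${inputs[@]}"
# A wrapper would then pass the sanitized names to Roary, e.g.:
# roary -f out -p "${GALAXY_SLOTS:-1}" -e -n -i '95' -cd '99.0' -g '50000' -t '11' -iv '1.5' "${inputs[@]}"
```

Mapping everything outside `[A-Za-z0-9._-]` to `_` keeps names recognizable. However, two different display names could collapse to the same sanitized name, so a real fix would also need to keep the results distinct.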

@Takadonet any thoughts? Looks like you originally implemented the input handling.

@Takadonet Takadonet commented Mar 17, 2020

The reason is that Roary follows the symlink and uses the target's file name instead of the link's name, so all names would be dataset_###.
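The effect described here can be reproduced with a plain symlink. The link carries the friendly name, but anything that resolves the link sees only the target's name. Below is a minimal illustration with made-up file names. It assumes, as stated above but not verified here, that Roary derives sample names from the resolved path. `realpath` is the GNU coreutils command.

```shell
#!/usr/bin/env bash
set -euo pipefail

demo=$(mktemp -d)
cd "$demo"

# Galaxy stores datasets under opaque names like dataset_<id>.dat.
printf '##gff-version 3\n' > dataset_38407948.dat

# A wrapper could link the dataset under a friendlier name...
ln -s dataset_38407948.dat sample_A.gff

# ...but a program that resolves the link sees only the target's name.
link_name=$(basename sample_A.gff)
resolved_name=$(basename "$(realpath sample_A.gff)")

echo "link name:     $link_name"
echo "resolved name: $resolved_name"
```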

@natefoo natefoo (Member) commented Mar 17, 2020

Ahhh gotcha. Blech, ok, thanks.

@natefoo natefoo (Member) commented Mar 17, 2020

Ah, it's not handling spaces in the input filenames:

Error: Cant access file /galaxy-repl/main/jobdir/027/303/27303938/working/Prokka
Error: Cant access file /galaxy-repl/main/jobdir/027/303/27303938/working/Prokka
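The truncated path ending in `/working/Prokka` is what you would expect if the name were split on whitespace somewhere after Galaxy's quoted command line. The sketch below shows one way that can happen. A quoted name passed to a program arrives as a single argument. If the program later pastes it unquoted into a command string for a shell, it becomes five words, and the first word is `Prokka`. Whether Roary does exactly this internally is an assumption, not something confirmed in this thread.

```shell
#!/usr/bin/env bash
set -euo pipefail

name='Prokka on data 11: gff.gff'

# Quoted on the outer command line, the name arrives as ONE argument.
quoted_count=$(set -- "$name"; echo "$#")

# Hypothetical failure mode: the name is pasted into a command string
# WITHOUT quoting, so the inner shell splits it on spaces.
split_count=$(sh -c "set -- $name; echo \$#")
first_word=$(sh -c "set -- $name; echo \$1")

echo "arguments when quoted:        $quoted_count"
echo "arguments after unquoted use: $split_count"
echo "first word (the 'file' looked up): $first_word"
```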

@Takadonet Takadonet commented Mar 17, 2020

Probably. That was my mistake: I assumed the file names would be command-line friendly.

@natefoo natefoo (Member) commented Mar 17, 2020

They're quoted, though, so I think roary is not reading those params correctly?

@Takadonet Takadonet commented Mar 17, 2020

Roary cannot handle them.

@natefoo natefoo (Member) commented Mar 17, 2020

@bgruening did you fix this manually somehow on usegalaxy.eu?

@bgruening bgruening (Member) commented Mar 18, 2020

I don't think so.

(venv) galaxy@sn04:~/shed_tools/toolshed.g2.bx.psu.edu/repos/iuc/roary/e02e9af2743f/roary$ hg diff
(venv) galaxy@sn04:~/shed_tools/toolshed.g2.bx.psu.edu/repos/iuc/roary/e02e9af2743f/roary$ 

@jennaj jennaj (Member, Author) commented Mar 27, 2020

Summary:

  1. Roary is picky about the format of input dataset names
  2. Spaces in dataset names should be avoided
  3. Each input should have a distinct name
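For command-line use, the rules above can be checked before Roary is launched. The sketch below uses a hypothetical pre-flight helper: `check_names` is not part of Roary or Galaxy. It only tests for whitespace and duplicate names, and it requires bash 4 or later for the associative array.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical pre-flight check for Roary input names.
# Reports each problem on stderr and returns non-zero if any name
# contains whitespace or appears more than once.
check_names() {
    local status=0
    local n
    declare -A seen=()   # local to the function
    for n in "$@"; do
        if [[ $n =~ [[:space:]] ]]; then
            echo "contains whitespace: '$n'" >&2
            status=1
        fi
        if [[ -n ${seen[$n]+x} ]]; then
            echo "duplicate name: '$n'" >&2
            status=1
        fi
        seen[$n]=1
    done
    return "$status"
}

if check_names 'Prokka on data 5: gff.gff' 'Prokka on data 11: gff.gff'; then
    echo "names OK"
else
    echo "rename the inputs before running Roary"
fi
```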

Workarounds for end-users working with individual datasets:

  • If executing tools from the History: click the pencil icon on an input gff dataset to open the Edit Attributes form. On the first tab, edit the name to remove any spaces, then save. Do this for every gff input to avoid the naming problem, then rerun Roary using the renamed inputs.
  • If executing tools from a Workflow: The output gff dataset generated by an upstream tool (likely Prokka) can be renamed to remove spaces as a "post job action" within the Workflow itself. This will pass the renamed gff inputs to Roary and avoid the naming problem.

Notes

The most commonly used upstream tool (Prokka) always inserts spaces into its output dataset names when executed in Galaxy on individual datasets.

When Prokka is executed with a collection input, spaces in dataset names are avoided from the start. Collections and workflows are worth learning about. If interested, please see:


@jennaj jennaj moved this from Install Problem to Done in Tool Lifecycle Sep 10, 2021
@jennaj jennaj closed this Sep 10, 2021
4 participants