SignalP jobs failing with "error running HOW" #24
Testing with the latest code showed another error, my mistake with past
That's fixed in e1a43a4 and this now gives:
I should perhaps special case not being able to find the
This also affects the RXLR Galaxy tool which calls SignalP via this
I now strongly suspect this is a file system issue, where the temporary FASTA file I have created is not ready for reading when SignalP is launched.
The temporary FASTA files are probably not involved. Running this single threaded and testing with a single temporary FASTA file (with sleeps after creating it), I am currently seeing about 80% failure to 20% success for the same job via Galaxy. Adding debugging to the
The problem is inside
The failing step is this multi-line command using a bash here-document to pipe text into the black box binary:
$HOW <<END_OF_HOW | $AWK -v head=$HEAD '
BEGIN {if (head) out=1} # Get everything
/^ T\*SAMPLE\*/ {out=1} # Get default output
/^ #/ {out=1} # Get -w or -s output
/^ *\*\**[^*]/ {out=1;error=1} # Get error messages always!
out==1
END { if (!out) error=1 # No output = error
exit(error)
}
' || exit 1
...
END_OF_HOW
Both files
It is unclear if the problem is one of these, the stdin to the
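To reason about when that pipeline exits non-zero, here is a rough Python re-statement of the awk filter quoted above. It is only a sketch for thinking about the logic, not part of SignalP: the function name `filter_how_output` is made up, and the patterns are copied as they appear in this thread. The point it illustrates is that an empty (or entirely unrecognised) output from `$HOW` is treated as an error, just like an explicit `***` error line.

```python
import re

# Patterns copied from the awk script as quoted in this thread.
SAMPLE = re.compile(r"^ T\*SAMPLE\*")   # default output marker
COMMENT = re.compile(r"^ #")            # -w or -s output marker
ERROR = re.compile(r"^ *\*\**[^*]")     # error messages from the binary


def filter_how_output(lines, head=False):
    """Hypothetical helper mimicking the awk filter.

    Takes lines without trailing newlines and returns (kept_lines, exit_code).
    """
    out = bool(head)  # BEGIN {if (head) out=1}
    error = False
    kept = []
    for line in lines:
        if SAMPLE.search(line):
            out = True
        if COMMENT.search(line):
            out = True
        if ERROR.search(line):
            out = True
            error = True
        if out:  # the bare `out==1` rule prints the current line
            kept.append(line)
    if not out:  # END: no output at all counts as an error
        error = True
    return kept, (1 if error else 0)
```

So if the binary dies silently or writes nothing to stdout, the exit code is 1 and the wrapper takes the `|| exit 1` branch, which would be consistent with the "error running HOW" message without any `***` line being shown.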
We ran into this error message in a Nextflow pipeline, predector (https://github.com/ccdmb/predect/)
We had SignalP working nicely in Galaxy on our old instance, running on the cluster as the Galaxy user. On our new Galaxy instance, running on the same cluster as the associated user's Linux account, this can happen:
Error is being raised here:
pico_galaxy/tools/protein_analysis/signalp3.py
Line 206 in 37d5b47
My script signalp3.py breaks up the input FASTA file into chunks of 500 sequences and by default uses four worker threads at once calling SignalP (which is single threaded). This is on top of the optional Galaxy parallelisation setting which breaks up the parent FASTA input file into chunks of 2000 sequences (i.e. 4 times 500):
pico_galaxy/tools/protein_analysis/signalp3.xml
Line 5 in 37d5b47
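For anyone not familiar with the wrapper, the chunk-and-thread layout described above looks roughly like this. This is a simplified sketch, not the actual code in signalp3.py: the helper names (`read_fasta`, `chunked`, `run_chunks`) and the `worker` callable are hypothetical stand-ins, with `worker` representing whatever writes a chunk to a temporary FASTA file and calls SignalP on it.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def read_fasta(handle):
    """Yield (title, sequence) tuples from an iterable of FASTA lines."""
    title, seq = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(seq)
            title, seq = line[1:], []
        elif title is not None:
            seq.append(line.strip())
    if title is not None:
        yield title, "".join(seq)


def chunked(records, size):
    """Yield lists of up to `size` records each."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def run_chunks(records, worker, chunk_size=500, threads=4):
    """Run `worker` on each chunk, with up to `threads` running at once.

    `worker` is a placeholder for the real per-chunk SignalP call.
    Results come back in chunk order.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(worker, chunked(records, chunk_size)))
```

With Galaxy handing over 2000 sequences per job, this gives four chunks of 500, so all four worker threads start a SignalP child process at nearly the same moment on the same node.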
I've not pinned it down, but I think SignalP is using predictable temp file names which clash when several child processes run on the same cluster node (and we expect sets of four jobs to get started around the same time on the same nodes).
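If the temp file clash theory is right, one thing to try would be giving each child process its own scratch directory. A minimal sketch follows, with loud caveats: `run_isolated` is a hypothetical helper, and it is an untested assumption that the SignalP script and its binaries honour `TMPDIR` or write relative to the working directory. If they hard-code a path, this will not help.

```python
import os
import subprocess
import tempfile


def run_isolated(cmd, prefix="signalp_"):
    """Run cmd (a list of arguments) in a fresh private temp directory.

    The directory is used both as the working directory and as TMPDIR.
    ASSUMPTION: the child process respects TMPDIR and/or its cwd for
    scratch files; this has not been confirmed for SignalP.
    Returns (returncode, stdout, stderr).
    """
    with tempfile.TemporaryDirectory(prefix=prefix) as tmp:
        env = dict(os.environ, TMPDIR=tmp)
        result = subprocess.run(
            cmd, cwd=tmp, env=env, capture_output=True, text=True
        )
        return result.returncode, result.stdout, result.stderr
```

Each call gets a uniquely named directory that is removed afterwards, so four threads launching at once would never share scratch space. Any input or output files would need to be passed as absolute paths, since the working directory changes.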
CC @peterthorpe5