BLASTN causing crash/core dump with ~1% of samples (tested on 3.11.2 and 3.11.11) #118
Comments
If you run
do you get the same crash? What are the contents of the files below?
And what is the version of |
What is the result of these commands?
|
I tried it with amrfinder 3.11.11 (Python 3.7) and 3.11.2 (Python 3.10). The BLAST version is 2.13.0+ in both cases, running on Ubuntu 22.04 LTS in two Mamba environments. The database version used is 2023-04-17.1. With the suggested command line, I still get "Segmentation fault (core dumped)". blastn: "" (empty, 0 bytes). As said, the weird thing is that it only happens in a minority of genomes, |
-rwxrwxr-x 4 vetschool vetschool 276776 Jul 19 2022 /home/vetschool/mambaforge/envs/genomics/bin/blastn* (username = vetschool, genotyping is the env for Python 3.10 which only allows amrfinder 3.11.2, genomics is the env for Python 3.7 which allows amrfinder 3.11.11) |
Since the bug is reproducible, could you post Can you try BLASTN ver. 2.14.0+? |
Blast 2.14 is not available yet via Conda/Mamba? The Salm000048.fna file is available on https://drive.google.com/file/d/11JmHcvVhjvgJz1JFxrokD7PIvyjOw3Rv/view?usp=sharing. |
I have tried Let's check that the blast database is available. What is the result of this command?
Is there enough disk space? |
|
-rw-rw-r-- 1 vetschool vetschool 1612 Apr 26 17:20 /tmp/amrfinder.Bvch0H/db/AMR_DNA-Salmonella I rebooted the computer and Salm000048.fna now worked, so I took the next one (Salm000070.fna), which crashes again; hence the different temporary-directory code, Bvch0H. The output of df -h is: (VirtualBox Ubuntu 22.04 LTS machine running on Windows) |
And now the next one works after a few tries. This is very irritating! Thanks very much for your assistance, by the way! |
Your
|
Are you working on a Windows computer emulating Ubuntu? |
I am working on a Windows 10 computer with VirtualBox 7.08 and a virtual Ubuntu 22.04 Linux machine with Conda (Mamba) environments. So it is Linux itself, not an emulation. I had a look at https://anaconda.org/bioconda/blast/files, and the BLAST file size looks OK there? blastn is about 270 KB. |
I will pass this issue to those who understand |
And it works absolutely fine if I leave out -O Salmonella; the only thing I am missing then is the point mutation resistances. But I can do those with pointfinder. Thanks for your help! |
I thought it might be due to the blast in bioconda, which applies several minor patches to the blast source, but I wasn't able to reproduce the issue with two versions of blast from bioconda. Still a mystery to me. |
You can download NCBI BLAST from https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ and use the |
Thanks! I did notice that each time I re-ran the failed ones, a few (say 5% of the samples) would suddenly work, and then I moved to another virtual computer and all the remaining samples worked fine. I am about to do another big batch, so I will try this and report back. |
I have reformatted the headers and files with SeqFu (https://github.com/telatin/seqfu2): I still use BLAST+ 2.13.0, but now I have not had any dropouts, except for 1 genome that ran fine when done again. All the previous "problem makers" like Salm000048.fna ran absolutely fine. The only difference I can see between the Prokka- and SeqFu-generated files is that with Prokka it's 60 bases per line (like GenBank downloads), whereas with SeqFu everything is on a single line, with no line breaks within a contig. Anyway, I just ran 40k of the 50k Salmonella genomes without a hiccup (still running), so the problem seems to have been resolved. Happy to close it, thanks for the assistance! |
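For readers who want to try the same single-line layout without SeqFu, here is a minimal Python sketch of the reformatting described above. It is not SeqFu's implementation; the function name `unwrap_fasta` is hypothetical, and it only joins wrapped sequence lines, leaving headers untouched. Whether line wrapping is the actual cause of the crash was not confirmed in this thread.

```python
from typing import Iterable, Iterator


def unwrap_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with each record's sequence joined onto one line.

    Hypothetical helper for illustration; headers (lines starting with '>')
    are passed through unchanged and blank lines are dropped.
    """
    seq_parts = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: flush the previous record's sequence first.
            if seq_parts:
                yield "".join(seq_parts)
                seq_parts = []
            yield line
        else:
            seq_parts.append(line)
    # Flush the sequence of the last record.
    if seq_parts:
        yield "".join(seq_parts)


if __name__ == "__main__":
    wrapped = [">contig_1\n", "ACGT\n", "TTGA\n", ">contig_2\n", "GG\n"]
    for out_line in unwrap_fasta(wrapped):
        print(out_line)
```

To convert a file, the generator can be fed an open file handle and its output written line by line to a new file, so that even large assemblies are processed without loading them fully into memory.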
I'm glad you got it working! Thanks for the clue about line length, and thanks for your patience. We'll take a look and see if we can figure anything out. At least we have a potential fix if we hear of other people having the issue and a clue as to what could be breaking. Thanks again for reporting and giving us all the details. |
I am running >20k Salmonella genomes with AMRfinder using the "--plus" switch and "-O Salmonella". In about 1% of the samples it will crash once the BLASTN starts for the point mutation search; if I run it without the -O switch, it is fine with the same sequences. I first thought it could have to do with long contig names, but after running them through Prokka with renamed contig names, it still causes failures.
Below is the output when crashing:
*** ERROR ***
'/home/username/mambaforge/envs/genotyping/bin/blastn' -query 'Salm000001fna/Salm000048.fna' -db /tmp/amrfinder.4Qo99F/db/AMR_DNA-Salmonella -evalue 1e-20 -dust no -max_target_seqs 10000 -num_threads 2 -mt_mode 1 -outfmt '6 qseqid sseqid qstart qend qlen sstart send slen qseq sseq' -out /tmp/amrfinder.4Qo99F/blastn > /tmp/amrfinder.4Qo99F/log 2> /tmp/amrfinder.4Qo99F/blastn-err
status = 35584
Segmentation fault (core dumped)
Anything that can be done for this? Thanks :)
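As a side note on the `status = 35584` line above: assuming this number is a raw POSIX wait status as returned by a `system()`-style call (an assumption, not something stated in the thread), it can be decoded with a little arithmetic. The helper name `decode_wait_status` is hypothetical.

```python
import signal


def decode_wait_status(status: int) -> dict:
    """Decode a raw POSIX wait status (assumed format) into its parts.

    The high byte holds the exit code; the low 7 bits hold a terminating
    signal, if any. Shells conventionally report "child killed by signal N"
    as exit code 128 + N.
    """
    exit_code = (status >> 8) & 0xFF
    term_signal = status & 0x7F
    shell_signal = exit_code - 128 if exit_code > 128 else None
    signal_name = None
    if shell_signal is not None:
        try:
            signal_name = signal.Signals(shell_signal).name
        except ValueError:
            signal_name = None  # number is not a known signal on this platform
    return {
        "exit_code": exit_code,
        "term_signal": term_signal,
        "shell_signal": shell_signal,
        "signal_name": signal_name,
    }


if __name__ == "__main__":
    print(decode_wait_status(35584))
```

Under that assumption, 35584 = 139 × 256, so the exit code is 139 = 128 + 11, and signal 11 is SIGSEGV on Linux. That matches the "Segmentation fault (core dumped)" message printed for the blastn command.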