I am interested in parallel BLCA. I tried splitting the FASTA input.. #14

wolfgangrumpf · 2019-02-19T21:04:59Z

I also am interested in parallel BLCA. I tried splitting the FASTA input into 10 files and then running BLCA separately on each one, but I saw these errors in the output log:

*** ERROR *** No sequences in input file
blastdbcmd is located in your PATH!
muscle is located in your PATH!

Fasta file read in!!
Reading in taxonomy information! ....
blastn is located in your PATH!
Running blast!!
Blastn Finished!!
Read in blast output!
Traceback (most recent call last):
  File "/opt/blca/2.1/2.blca_main.py", line 295, in <module>
    alndic=get_dic_from_aln(k1+".muscle")
  File "/opt/blca/2.1/2.blca_main.py", line 70, in get_dic_from_aln
    alignment=AlignIO.read(aln,"clustal")
  File "/opt/blca/2.1/lib/python2.7/site-packages/Bio/AlignIO/__init__.py", line 435, in read
    first = next(iterator)
  File "/opt/blca/2.1/lib/python2.7/site-packages/Bio/AlignIO/__init__.py", line 357, in parse
    with as_handle(handle, 'rU') as fp:
  File "/gpfs0/export/opt/anaconda-2.3.0/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/opt/blca/2.1/lib/python2.7/site-packages/Bio/File.py", line 113, in as_handle
    with open(handleish, mode, **kwargs) as fp:
IOError: [Errno 2] No such file or directory: '21758886.muscle'
Command line argument error: Argument "entry_batch". File is not accessible: `21758886.dblist'
rm: cannot remove `21758886.dblist': No such file or directory
blastdbcmd is located in your PATH!
muscle is located in your PATH!
Fasta file read in!!
Reading in taxonomy information! ....
blastn is located in your PATH!
Running blast!!
Blastn Finished!!
Read in blast output!
Traceback (most recent call last):
  File "/opt/blca/2.1/2.blca_main.py", line 295, in <module>
    alndic=get_dic_from_aln(k1+".muscle")
  File "/opt/blca/2.1/2.blca_main.py", line 70, in get_dic_from_aln
    alignment=AlignIO.read(aln,"clustal")
  File "/opt/blca/2.1/lib/python2.7/site-packages/Bio/AlignIO/__init__.py", line 435, in read
    first = next(iterator)
  File "/opt/blca/2.1/lib/python2.7/site-packages/Bio/AlignIO/__init__.py", line 382, in parse
    for a in i:
  File "/opt/blca/2.1/lib/python2.7/site-packages/Bio/AlignIO/ClustalIO.py", line 115, in __next__
    ", ".join(known_headers)))
ValueError: >21758886 is not a known CLUSTAL header: CLUSTAL, PROBCONS, MUSCLE, MSAPROBS, Kalign
srun: error: node03: tasks 3,9: Exited with exit code 1
rm: cannot remove `21758886.hitdb.fsa': No such file or directory
rm: cannot remove `21758886.hitdb.fsa': No such file or directory

Originally posted by @wolfgangrumpf in #12 (comment)


yingeddi2008 · 2019-02-19T21:18:49Z

Hi Wolfgangrumpf,

Thanks for taking an interest in our software. I'd be happy to assist you with any issue regarding BLCA.

First, could you please check your blastn version? It should be above 2.5.0. Also, please make sure you did NOT clone the GitHub repo, but downloaded the Python 2.7 version of the package from the release tab. The current GitHub repo mixes Python 2.7 and Python 3 code, so it won't work properly yet.

Best,
Eddi


wolfgangrumpf · 2019-02-20T15:26:07Z

I am using BLAST 2.5.0. I used pyfasta to split the input file into 9 separate files. They are all in the same directory. I am BLASTing in series, so the first one finishes BLASTing and then starts the next step in the workflow while the second one is BLASTing. Should I instead create new directories for each file and segregate the jobs in those compartments?

I'm asking our HPC admin how BLCA was installed....

wolfgangrumpf · 2019-02-20T15:29:23Z

And they say that we installed it from the release tab, not by cloning the distro.

yingeddi2008 · 2019-02-20T16:15:18Z

Hi Wolfgangrumpf,

Judging from the error message, this looks like an input/output issue. Please try running each batch in a separate folder and see how it goes.

Did you first try running the test file without parallelization? Did it work? If it did, you should definitely separate the input files. It seems that you have sequences in different files that share the same IDs.

Let me know how it goes,
Eddi
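
For readers who want to try the "one folder per batch" suggestion, here is a minimal Python sketch. It is not part of BLCA. The log above shows intermediate files named after the sequence ID (`21758886.muscle`, `21758886.dblist`, `21758886.hitdb.fsa`), which suggests they are written to the working directory, so jobs sharing a directory can overwrite or delete each other's files. The function name `split_fasta_into_dirs`, the `chunk_NN` directory names, and the `input.fasta` file name are all hypothetical. The sketch also refuses duplicate IDs and skips empty chunks, on the assumption that an empty chunk could produce the "No sequences in input file" message.

```python
import os


def read_fasta(path):
    """Yield (sequence_id, record_text) for each record in a FASTA file."""
    seq_id, lines = None, []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                if seq_id is not None:
                    yield seq_id, "".join(lines)
                # The ID is the first whitespace-delimited token after '>'.
                tokens = line[1:].split()
                seq_id = tokens[0] if tokens else ""
                lines = [line]
            elif seq_id is not None:
                lines.append(line)
    if seq_id is not None:
        yield seq_id, "".join(lines)


def split_fasta_into_dirs(fasta_path, out_root, n_chunks):
    """Distribute records round-robin into out_root/chunk_NN/input.fasta.

    Raises ValueError on a repeated sequence ID. Returns the list of chunk
    FASTA paths that were actually written (empty chunks are skipped).
    """
    seen = set()
    chunks = [[] for _ in range(n_chunks)]
    for index, (seq_id, text) in enumerate(read_fasta(fasta_path)):
        if seq_id in seen:
            raise ValueError("duplicate sequence ID: %s" % seq_id)
        seen.add(seq_id)
        if not text.endswith("\n"):
            text += "\n"  # keep records separated if the file lacks a final newline
        chunks[index % n_chunks].append(text)

    written = []
    for idx, records in enumerate(chunks):
        if not records:
            continue  # never hand an empty input file to a job
        chunk_dir = os.path.join(out_root, "chunk_%02d" % idx)
        os.makedirs(chunk_dir, exist_ok=True)
        chunk_path = os.path.join(chunk_dir, "input.fasta")
        with open(chunk_path, "w") as out:
            out.writelines(records)
        written.append(chunk_path)
    return written
```

Calling `split_fasta_into_dirs("reads.fasta", "blca_chunks", 9)` (file and directory names are placeholders) would give each job its own directory to run in.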


wolfgangrumpf · 2019-02-20T16:22:23Z

Yes, it ran, albeit very slowly after the initial BLAST job. I split things into 9 files in their own directories and executed 9 jobs, each with 2 CPUs, on a 20-CPU node, and it appears to be working. The hardest part was figuring out the correct SLURM commands to make the jobs run simultaneously, but I finally got it. Thanks for your help!
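
The SLURM commands used here were not posted. As a rough stand-in for the same idea on a single node, the following Python sketch starts one process per chunk, each with its own directory as the working directory. It uses a local thread pool rather than SLURM, so it is a substitute technique and not what was run above. The helper names are hypothetical, and the `python <script> -i <fasta>` invocation is an assumption to confirm against the help output of your installed BLCA release.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def blca_command(chunk_fasta, blca_script):
    """Build the command line for one chunk.

    ASSUMPTION: BLCA is started as `python <script> -i <fasta>`.
    blca_script should be an absolute path, because the job runs with the
    chunk directory as its working directory.
    """
    return ["python", blca_script, "-i", os.path.basename(chunk_fasta)]


def run_chunk(chunk_fasta, make_cmd):
    """Run one job inside the chunk's own directory and log its output there."""
    workdir = os.path.dirname(os.path.abspath(chunk_fasta))
    log_path = os.path.join(workdir, "run.log")
    with open(log_path, "w") as log:
        proc = subprocess.run(
            make_cmd(chunk_fasta),
            cwd=workdir,  # intermediate files land in this directory only
            stdout=log,
            stderr=subprocess.STDOUT,
        )
    return chunk_fasta, proc.returncode


def run_all(chunk_fastas, make_cmd, max_workers=9):
    """Run all chunks concurrently; returns [(chunk_fasta, exit_code), ...]."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda p: run_chunk(p, make_cmd), chunk_fastas))
```

A call might look like `run_all(paths, lambda p: blca_command(p, "/opt/blca/2.1/2.blca_main.py"))`, where the script path is taken from the traceback above and `paths` is a list of per-directory FASTA files. A nonzero exit code in the result marks a chunk whose `run.log` is worth reading.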

yingeddi2008 · 2019-02-26T18:27:32Z

I am closing this issue.

FYI, I just uploaded a utility script for merging multiple BLCA outputs. It could be useful if you want to generate count tables from BLCA taxonomy assignment.
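
The sketch below is not the utility script mentioned above, and it does not build count tables. It covers only the simpler step of stitching per-chunk results back into one file after a split run like the one in this thread. It assumes each output file is line-oriented with the read ID in the first tab-separated column. That format and the function name `merge_outputs` are assumptions, so check them against a real BLCA output file first.

```python
def merge_outputs(output_paths, merged_path):
    """Concatenate per-chunk output files into one file.

    ASSUMPTION: one result per line, read ID in the first tab-separated column.
    Raises ValueError if the same read ID appears twice, which would indicate
    overlapping chunks. Returns the number of lines written.
    """
    seen = set()
    written = 0
    with open(merged_path, "w") as merged:
        for path in output_paths:
            with open(path) as handle:
                for line in handle:
                    if not line.strip():
                        continue  # ignore blank lines
                    read_id = line.split("\t", 1)[0].strip()
                    if read_id in seen:
                        raise ValueError(
                            "read ID %s appears in more than one output" % read_id
                        )
                    seen.add(read_id)
                    merged.write(line if line.endswith("\n") else line + "\n")
                    written += 1
    return written
```

The duplicate check is deliberate: a repeated read ID in the merged file would point to the same overlapping-ID situation that caused the original errors.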

yingeddi2008 closed this as completed Feb 26, 2019
