I am interested in parallel BLCA. I tried splitting the FASTA input.. #14
Hi Wolfgangrumpf,
Thanks for taking an interest in our software. I'd be happy to assist you with any issue regarding BLCA.
First, could you please check your blastn version? It should be above 2.5.0. Also, please make sure you did NOT clone the GitHub repo but downloaded the Python 2.7 package from the release tab. The current GitHub repo mixes Python 2.7 and Python 3 code, so it won't work properly yet.
Best,
Eddi
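A minimal sketch of the version check suggested above, assuming BLAST+ reports its version as `blastn: X.Y.Z+` in the output of `blastn -version`, and reading "above 2.5.0" as "2.5.0 or newer" (an assumption). The helper names `parse_blastn_version` and `blastn_is_recent_enough` are hypothetical and not part of BLCA.

```python
import re
import shutil
import subprocess

# "Above 2.5.0" is read here as "2.5.0 or newer" (an assumption).
MINIMUM_BLASTN = (2, 5, 0)


def parse_blastn_version(text):
    """Return (major, minor, patch) from `blastn -version` output such as
    'blastn: 2.5.0+', or None if no version string is found."""
    match = re.search(r"blastn:\s*(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def blastn_is_recent_enough(text, minimum=MINIMUM_BLASTN):
    """True if the reported version is at least `minimum` (tuple comparison)."""
    version = parse_blastn_version(text)
    return version is not None and version >= minimum


if __name__ == "__main__":
    if shutil.which("blastn") is None:
        print("blastn is not on PATH")
    else:
        result = subprocess.run(["blastn", "-version"],
                                capture_output=True, text=True)
        version = parse_blastn_version(result.stdout)
        status = "OK" if blastn_is_recent_enough(result.stdout) else "too old or unknown"
        print("blastn version:", version, status)
```

Comparing integer tuples rather than strings matters here: as plain strings, `"2.10.1"` sorts before `"2.5.0"`.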
On Tue, Feb 19, 2019 at 3:05 PM Wolfgang Rumpf ***@***.***> wrote:
I am also interested in parallel BLCA. I tried splitting the FASTA input into 10 files and then running BLCA separately on each one, but I saw these errors in the output log:
*** ERROR *** No sequences in input file
blastdbcmd is located in your PATH!
muscle is located in your PATH!
Fasta file read in!!
Reading in taxonomy information! ....
blastn is located in your PATH!
Running blast!!
Blastn Finished!!
Read in blast output!
Traceback (most recent call last):
File "/opt/blca/2.1/2.blca_main.py", line 295, in <module>
alndic=get_dic_from_aln(k1+".muscle")
File "/opt/blca/2.1/2.blca_main.py", line 70, in get_dic_from_aln
alignment=AlignIO.read(aln,"clustal")
File "/opt/blca/2.1/lib/python2.7/site-packages/Bio/AlignIO/__init__.py", line 435, in read
first = next(iterator)
File "/opt/blca/2.1/lib/python2.7/site-packages/Bio/AlignIO/__init__.py", line 357, in parse
with as_handle(handle, 'rU') as fp:
File "/gpfs0/export/opt/anaconda-2.3.0/lib/python2.7/contextlib.py", line 17, in __enter__
return self.gen.next()
File "/opt/blca/2.1/lib/python2.7/site-packages/Bio/File.py", line 113, in as_handle
with open(handleish, mode, **kwargs) as fp:
IOError: [Errno 2] No such file or directory: '21758886.muscle'
Command line argument error: Argument "entry_batch". File is not accessible: `21758886.dblist'
rm: cannot remove `21758886.dblist': No such file or directory
blastdbcmd is located in your PATH!
muscle is located in your PATH!
Fasta file read in!!
Reading in taxonomy information! ....
blastn is located in your PATH!
Running blast!!
Blastn Finished!!
Read in blast output!
Traceback (most recent call last):
File "/opt/blca/2.1/2.blca_main.py", line 295, in <module>
alndic=get_dic_from_aln(k1+".muscle")
File "/opt/blca/2.1/2.blca_main.py", line 70, in get_dic_from_aln
alignment=AlignIO.read(aln,"clustal")
File "/opt/blca/2.1/lib/python2.7/site-packages/Bio/AlignIO/__init__.py", line 435, in read
first = next(iterator)
File "/opt/blca/2.1/lib/python2.7/site-packages/Bio/AlignIO/__init__.py", line 382, in parse
for a in i:
File "/opt/blca/2.1/lib/python2.7/site-packages/Bio/AlignIO/ClustalIO.py", line 115, in __next__
", ".join(known_headers)))
ValueError: >21758886 is not a known CLUSTAL header: CLUSTAL, PROBCONS, MUSCLE, MSAPROBS, Kalign
srun: error: node03: tasks 3,9: Exited with exit code 1
rm: cannot remove `21758886.hitdb.fsa': No such file or directory
rm: cannot remove `21758886.hitdb.fsa': No such file or directory
*Originally posted by @wolfgangrumpf <https://github.com/wolfgangrumpf> in #12 (comment)*
I am using BLAST 2.5.0. I used pyfasta to split the input file into 9 separate files, all in the same directory. I am running BLAST on them in series, so when the first one finishes BLASTing, it moves on to the next step in the workflow while the second one is BLASTing. Should I instead create a new directory for each file and segregate the jobs that way? I'm asking our HPC admin how BLCA was installed...
They say that we installed it from the release tab, not by cloning the repo.
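One way to produce the one-directory-per-chunk layout asked about here, sketched with the Python standard library instead of pyfasta (a swapped-in approach, not what was used in this thread). It assumes plain FASTA input, deals records round-robin into at most `n_chunks` files so that no chunk ends up empty, and writes each chunk as `chunk_NN/input.fasta`; the function, directory, and file names are hypothetical.

```python
import os


def read_fasta(path):
    """Yield (header_line, sequence_lines) for each record in a FASTA file."""
    header, seq = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, seq
                header, seq = line, []
            elif header is not None and line:
                seq.append(line)
    if header is not None:
        yield header, seq


def split_into_dirs(fasta_path, n_chunks, out_root, name="input.fasta"):
    """Deal records round-robin into at most n_chunks FASTA files, each in
    its own directory (out_root/chunk_01, chunk_02, ...). Returns the list
    of chunk directories that were written."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    records = list(read_fasta(fasta_path))
    if not records:
        raise ValueError("no FASTA records found in %s" % fasta_path)
    n = min(n_chunks, len(records))  # never create an empty chunk
    dirs = [os.path.join(out_root, "chunk_%02d" % (i + 1)) for i in range(n)]
    for d in dirs:
        os.makedirs(d, exist_ok=True)
    handles = [open(os.path.join(d, name), "w") for d in dirs]
    try:
        for i, (header, seq) in enumerate(records):
            out = handles[i % n]
            out.write(header + "\n")
            for seq_line in seq:
                out.write(seq_line + "\n")
    finally:
        for handle in handles:
            handle.close()
    return dirs
```

Round-robin keeps the chunk sizes within one record of each other; if the original input order matters downstream, contiguous slices would be the alternative.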
Hi Wolfgangrumpf,
Judging from the error message, it seems like an input/output issue. Please try putting each batch in a different folder and see how it goes. Did you first try running the test file without parallelization? Did it work? If it worked, you should definitely separate the input files. It seems that you have sequences in different files that have the same IDs.
Let me know how it goes,
Eddi
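A quick way to test the duplicate-ID explanation, sketched under two assumptions: a record's ID is the first whitespace-separated word after `>`, and BLCA names its per-query intermediate files after that ID (the log above shows `21758886.muscle`, `21758886.dblist` and `21758886.hitdb.fsa`, and the traceback builds the alignment file name as `k1+".muscle"`). Under those assumptions, any ID this reports could make two jobs sharing a working directory read or delete each other's files. `fasta_ids`, `duplicated_ids`, and the script name `check_ids.py` are hypothetical.

```python
import collections
import sys


def fasta_ids(path):
    """Yield the ID (first whitespace-separated word after '>') of each record."""
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                yield fields[0] if fields else ""


def duplicated_ids(paths):
    """Map every ID that occurs more than once, within or across the given
    files, to the list of files it occurs in (one entry per occurrence)."""
    seen = collections.defaultdict(list)
    for path in paths:
        for seq_id in fasta_ids(path):
            seen[seq_id].append(path)
    return {seq_id: where for seq_id, where in seen.items() if len(where) > 1}


if __name__ == "__main__":
    # Usage: python check_ids.py split1.fasta split2.fasta ...
    for seq_id, where in sorted(duplicated_ids(sys.argv[1:]).items()):
        print(seq_id, "appears in:", ", ".join(where))
```

Passing all of the split files at once catches both kinds of clash: an ID repeated inside one file and an ID shared between files.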
Yes, it ran, albeit very slowly after the initial BLAST job. I split the input into 9 files, each in its own directory, and ran 9 jobs with 2 CPUs each on a 20-CPU node; it appears to be working. The hardest part was figuring out the correct SLURM commands to make the jobs run simultaneously, but I finally got it. Thanks for your help!
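The SLURM commands themselves aren't shown in the thread. As a single-node alternative (not the approach used here), the key idea of one working directory per job can be sketched with `subprocess` and a thread pool: each command runs with `cwd` set to its own chunk directory, so intermediate files written with relative names cannot collide, and at most `max_parallel` jobs run at once. `run_in_dir`, `run_all`, and the log name `job.log` are hypothetical.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_in_dir(command, workdir, log_name="job.log"):
    """Run `command` with `workdir` as its current directory, sending stdout
    and stderr to workdir/log_name. Files the command creates with relative
    names (e.g. '<id>.muscle') therefore stay inside workdir."""
    with open(os.path.join(workdir, log_name), "w") as log:
        return subprocess.call(command, cwd=workdir,
                               stdout=log, stderr=subprocess.STDOUT)


def run_all(command, workdirs, max_parallel):
    """Run the same command once per directory, at most `max_parallel` at a
    time. Returns {workdir: exit_code} so failed chunks can be rerun."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        codes = list(pool.map(lambda d: run_in_dir(command, d), workdirs))
    return dict(zip(workdirs, codes))
```

With nine chunk directories on a 20-CPU node, as described above, a call might look like `run_all(["python", "/opt/blca/2.1/2.blca_main.py", "-i", "input.fasta"], dirs, max_parallel=9)`. The `-i input.fasta` arguments are an assumption about BLCA's interface; check its own help for the real options, including how many threads each job uses, since this sketch only limits how many jobs run at once.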
I am closing this issue. FYI, I just uploaded a utility script for merging multiple BLCA outputs. It could be useful if you want to generate count tables from BLCA taxonomy assignments.
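The merge script mentioned here is not reproduced in the thread. As a rough illustration of the general idea only, the sketch below builds a taxon-by-sample count table from several assignment files, assuming a two-column, tab-separated layout (`<seq_id>` then a taxonomy string). BLCA's actual output format may differ, and if its taxonomy strings embed per-rank confidence values, `normalize` would need to strip them so that identical taxa collapse into one row. `read_assignments`, `count_table`, and `write_tsv` are hypothetical names.

```python
import collections
import csv


def read_assignments(path):
    """Yield the taxonomy column from a tab-separated file with the ASSUMED
    layout '<seq_id>\t<taxonomy>'; shorter rows are skipped."""
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
            if len(row) >= 2:
                yield row[1]


def count_table(samples, normalize=lambda taxonomy: taxonomy):
    """`samples` maps a sample name to its list of output files (e.g. the
    per-chunk outputs of one split run). Returns {taxon: Counter(sample)}."""
    table = collections.defaultdict(collections.Counter)
    for sample, paths in samples.items():
        for path in paths:
            for taxonomy in read_assignments(path):
                table[normalize(taxonomy)][sample] += 1
    return table


def write_tsv(table, sample_names, out_path):
    """Write taxa as rows and samples as columns; missing counts become 0."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["taxon"] + list(sample_names))
        for taxon in sorted(table):
            writer.writerow([taxon] + [table[taxon][s] for s in sample_names])
```

Grouping files by sample lets the per-chunk outputs of a split run, like the 9 chunks above, be counted as one sample without concatenating them first.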