Error writing output: ValueError: max() arg is an empty sequence #6
Hi Bryan,
Thanks for using our software.
After examining your files, I noticed that your database FASTA file and your taxonomy file do not contain the same number of entries: midori_blca_taxonomy.txt has 583043 lines of taxonomy information, while midori_blca.dedup.fasta contains 582591 reads. However, I do not think this caused the error.
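Although the mismatch is probably not the cause here, it is easy to check. The following is a minimal sketch. It assumes that a FASTA ID is the header text after ">" up to the first whitespace, and that the taxonomy file already uses a tab after each ID. `fasta_ids`, `taxonomy_ids` and `compare_ids` are hypothetical helpers, not part of BLCA.

```python
def fasta_ids(lines):
    """Collect sequence IDs from FASTA header lines (text after '>' up to the first whitespace)."""
    return {line[1:].split()[0] for line in lines
            if line.startswith(">") and line[1:].strip()}


def taxonomy_ids(lines):
    """Collect IDs from a taxonomy file, assuming the ID is the first tab-separated field."""
    return {line.rstrip("\n").split("\t", 1)[0] for line in lines if line.strip()}


def compare_ids(fasta_lines, taxonomy_lines):
    """Return (IDs missing from the taxonomy file, IDs missing from the FASTA file)."""
    fa = fasta_ids(fasta_lines)
    tax = taxonomy_ids(taxonomy_lines)
    return sorted(fa - tax), sorted(tax - fa)


# Usage with the files from this thread (paths assumed to be in the working directory):
# with open("midori_blca_dedup.fasta") as fa, open("midori_blca_taxonomy.txt") as tx:
#     no_taxonomy, no_sequence = compare_ids(fa, tx)
```

Both helpers build sets, so duplicate IDs are collapsed. Comparing the raw line count with the size of the set is a quick way to spot them.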
Another thing: the delimiter in the taxonomy file should be a tab instead of a space. I forgot to emphasize this in the instructions. Each line should be formatted as follows:
DQ523177[\t]species:Gammarus tigrinus;genus:Gammarus;family:Gammaridae;order:Amphipoda;class:Malacostraca;phylum:Arthropoda;superkingdom:Eukaryota;
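If a taxonomy file was written with a space after the ID, only that first space should become a tab. Species names such as "Gammarus tigrinus" contain spaces of their own, so replacing every space would break them. Below is a minimal sketch that assumes sequence IDs contain no spaces. `space_to_tab` and `convert_file` are hypothetical names, not BLCA functions.

```python
def space_to_tab(line):
    """Turn 'ID taxonomy' into 'ID<TAB>taxonomy' by replacing only the first space.

    Lines that already contain a tab are returned unchanged, so the function is
    safe to run twice. Assumes sequence IDs themselves contain no spaces.
    """
    if "\t" in line:
        return line
    seq_id, sep, taxonomy = line.partition(" ")
    if not sep:  # no space at all: nothing to convert
        return line
    return seq_id + "\t" + taxonomy


def convert_file(src_path, dst_path):
    """Hypothetical helper: write a tab-delimited copy of a space-delimited taxonomy file."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(space_to_tab(line))
```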
I fixed the error by changing the space delimiter to a tab. The outputs for the two test FASTA files are as follows:
other_BLCA_test_input.fasta.blca.out:
AY937375.1 superkingdom:Eukaryota;100.0;phylum:Cnidaria;100.0;class:Hydrozoa;100.0;order:Siphonophorae;100.0;family:Prayidae;100.0;genus:Praya;100.0;species:Praya dubia;100.0;
DQ133904.1 superkingdom:Eukaryota;100.0;phylum:Porifera;100.0;class:Demospongiae;100.0;order:Poecilosclerida;100.0;family:Tedaniidae;100.0;genus:Tedania;100.0;species:Tedania ignis;100.0;
1 Unclassified
3 superkingdom:Eukaryota;100.0;phylum:Cnidaria;100.0;class:Anthozoa;100.0;order:Actiniaria;100.0;family:Aiptasiidae;100.0;genus:Aiptasia;100.0;species:Aiptasia pulchella;100.0;
2 Unclassified
5 superkingdom:Eukaryota;100.0;phylum:Porifera;100.0;class:Demospongiae;100.0;order:Poecilosclerida;100.0;family:Tedaniidae;100.0;genus:Tedania;100.0;species:Tedania ignis;100.0;
4 superkingdom:Eukaryota;100.0;phylum:Porifera;100.0;class:Demospongiae;100.0;order:Poecilosclerida;100.0;family:Tedaniidae;100.0;genus:Tedania;100.0;species:Tedania ignis;100.0;
uniques10_trim.fasta.blca.out:
sq1abc123 Unclassified
sq10abc123 superkingdom:Eukaryota;100.0;phylum:Porifera;100.0;class:Demospongiae;100.0;order:Poecilosclerida;100.0;family:Microcionidae;100.0;genus:Clathria;100.0;species:Clathria abietina;73.5;
sq3abc123 superkingdom:Eukaryota;100.0;phylum:Cnidaria;100.0;class:Anthozoa;100.0;order:Actiniaria;100.0;family:Aiptasiidae;100.0;genus:Aiptasia;100.0;species:Aiptasia pulchella;100.0;
sq2abc123 Unclassified
sq5abc123 superkingdom:Eukaryota;100.0;phylum:Porifera;100.0;class:Demospongiae;100.0;order:Poecilosclerida;100.0;family:Tedaniidae;100.0;genus:Tedania;100.0;species:Tedania ignis;100.0;
sq7abc123 superkingdom:Eukaryota;100.0;phylum:Cnidaria;100.0;class:Anthozoa;100.0;order:Scleractinia;100.0;family:Agariciidae;100.0;genus:Agaricia;100.0;species:Agaricia agaricites;65.75;
sq6abc123 Unclassified
sq9abc123 superkingdom:Eukaryota;100.0;phylum:Porifera;100.0;class:Demospongiae;100.0;order:Poecilosclerida;100.0;family:Microcionidae;100.0;genus:Clathria;100.0;species:Clathria abietina;56.5;
sq4abc123 superkingdom:Eukaryota;100.0;phylum:Porifera;100.0;class:Demospongiae;100.0;order:Poecilosclerida;100.0;family:Tedaniidae;100.0;genus:Tedania;100.0;species:Tedania ignis;100.0;
sq8abc123 superkingdom:Eukaryota;100.0;phylum:Cnidaria;100.0;class:Anthozoa;100.0;order:Scleractinia;100.0;family:Agariciidae;100.0;genus:Agaricia;100.0;species:Agaricia agaricites;66.5;
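Each output line holds a query ID followed either by "Unclassified" or by alternating rank:taxon and confidence fields, each terminated by ";". The following is a minimal parsing sketch. It assumes the query ID contains no whitespace. `parse_blca_line` is a hypothetical helper, not part of BLCA.

```python
def parse_blca_line(line):
    """Split one output line into (query_id, [(rank, taxon, confidence), ...]).

    An 'Unclassified' result yields an empty list. Assumes the query ID contains
    no whitespace and is separated from the result by a tab or space.
    """
    parts = line.strip().split(None, 1)
    query_id = parts[0]
    result = parts[1] if len(parts) > 1 else "Unclassified"
    if result == "Unclassified":
        return query_id, []
    # Drop the empty string produced by the trailing ';'.
    fields = [f for f in result.split(";") if f]
    levels = []
    # Fields alternate: 'rank:taxon', confidence, 'rank:taxon', confidence, ...
    for name, conf in zip(fields[0::2], fields[1::2]):
        rank, _, taxon = name.partition(":")
        levels.append((rank, taxon, float(conf)))
    return query_id, levels
```

Splitting only on the first whitespace keeps species names such as "Agaricia agaricites" intact.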
I believe that if you do the same, the error message will disappear and BLCA will output the taxonomy of your reads.
I will add a check to the function that reads in the taxonomy file and update the README on GitHub.
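A check of this kind could look roughly like the sketch below. It reports the offending line as soon as the file is read, instead of failing later at output time. This is a hypothetical `read_taxonomy`, not BLCA's actual function. It assumes the "ID<TAB>rank:taxon;rank:taxon;..." format shown above and also rejects duplicate IDs.

```python
def read_taxonomy(lines):
    """Parse 'ID<TAB>rank:taxon;rank:taxon;...' lines into {ID: {rank: taxon}}.

    Raises ValueError with the offending line number for a missing tab,
    a malformed 'rank:taxon' field, or a duplicate ID.
    """
    taxonomy = {}
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if "\t" not in line:
            raise ValueError(f"line {lineno}: expected a tab between the ID and the taxonomy: {line[:60]!r}")
        seq_id, lineage = line.split("\t", 1)
        if seq_id in taxonomy:
            raise ValueError(f"line {lineno}: duplicate ID {seq_id!r}")
        ranks = {}
        for field in lineage.split(";"):
            if not field:
                continue  # trailing ';' leaves an empty field
            rank, sep, taxon = field.partition(":")
            if not sep:
                raise ValueError(f"line {lineno}: field {field!r} is not in 'rank:taxon' form")
            ranks[rank] = taxon
        taxonomy[seq_id] = ranks
    return taxonomy
```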
Please let me know if you have any other problems,
Eddi
On Mon, Feb 12, 2018 at 11:41 AM, Bryan Nguyen <***@***.***> wrote:
Hi Eddi,
I've been trying to get BLCA to run with a custom database. As far as I can tell, I have everything formatted as specified in the README, but BLCA errors out while trying to write the output files with the following messages:
blastdbcmd is located in your PATH!
muscle is located in your PATH!
> > Fasta file read in!!
> > Read in taxonomy information!
blastn is located in your PATH!
> > Running blast!!
> > Blastn Finished!!
> > Read in blast output!
Traceback (most recent call last):
  File "/groups/cbi/shared/apps/BLCA/BLCA.git/2.blca_main.py", line 352, in <module>
    outout.write(le+":"+max(lexsum,key=lexsum.get)+";"+str(max(lexsum.values()))+";")
ValueError: max() arg is an empty sequence
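The traceback itself reflects standard Python behavior. `max()` raises a ValueError whenever it is given an empty iterable, so `lexsum` must have been empty at that point. That is presumably because no taxonomy entries could be matched to the BLAST hits, which is consistent with the delimiter problem. Below is a minimal guard. `top_taxon` is a hypothetical helper rather than BLCA's code, and `lexsum` is assumed to map taxon names to scores.

```python
def top_taxon(lexsum):
    """Return (taxon, score) for the highest-scoring taxon, or None if lexsum is empty.

    max() over an empty dict raises ValueError (the error in the traceback),
    so check for emptiness before calling it.
    """
    if not lexsum:
        return None
    best = max(lexsum, key=lexsum.get)
    return best, lexsum[best]
```

On Python 3.4 and later, `max(lexsum, key=lexsum.get, default=None)` is an alternative. An explicit check also makes it easy to report the query as "Unclassified".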
I managed to get it to run successfully once or twice with different sequence IDs, but I haven't been able to replicate that. Even then, it kept classifying sequences as "Unclassified", though that is probably a separate issue.
I've uploaded my blastdb, the reference FASTA used to generate it (midori_blca_dedup.fasta), the reference taxonomy file (midori_blca_taxonomy.txt), and two test input FASTA files (uniques10_trim.fasta and other_BLCA_test_input.fasta) to Dropbox here: https://www.dropbox.com/sh/m3hie0b9o8ldc29/AABHeBAiS94lwl_skgtEY-qIa?dl=0
Thanks for your help!
Cheers,
Bryan
Fantastic! Yes, the differing number of lines/reads is because I generated the taxonomy file before deduplication; I was hoping it wouldn't make a difference (just being lazy). The Midori database has some duplicate sequence IDs, which I didn't realize until I tried to generate the BLAST db. Thanks for the clarification. I'm looking forward to giving BLCA a try; the results you posted look promising. Cheers,
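Deduplicating the FASTA by ID and then building the taxonomy file from the deduplicated FASTA keeps the two files in step. That avoids the count mismatch noted above. The following is a minimal sketch that keeps the first record for each ID. `dedup_fasta` is a hypothetical helper. Keeping the first copy is an arbitrary choice, not something this thread specifies.

```python
def dedup_fasta(lines):
    """Yield FASTA lines, keeping only the first record seen for each sequence ID.

    The ID is taken as the header text after '>' up to the first whitespace.
    Later records with a repeated ID are dropped entirely, even if their
    sequences differ, so which copy survives depends only on file order.
    """
    seen = set()
    keep = False  # lines before the first header are dropped
    for line in lines:
        if line.startswith(">"):
            header = line[1:].split()
            seq_id = header[0] if header else ""
            keep = seq_id not in seen
            seen.add(seq_id)
        if keep:
            yield line
```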