July 8 2019 release - does it REQUIRE Blast 2.9.0? #20

Open
wolfgangrumpf opened this issue Jul 16, 2019 · 9 comments

Comments

@wolfgangrumpf

I'm considering upgrading BLCA but our cluster doesn't have BLAST 2.9 on it yet. Is 2.9 required, or will the July 8 2019 release work with BLAST 2.8?

@yingeddi2008
Collaborator

yingeddi2008 commented Jul 16, 2019

You are right: the July 8 2019 release should work with previous versions of BLAST. I made some minor changes so that it also works with the latest version, blastn 2.9. Let me know if you find any problems.
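For anyone who wants to confirm which BLAST version a cluster provides before upgrading, here is a minimal sketch. It assumes `blastn` is on the `PATH` and that `blastn -version` prints a line such as `blastn: 2.9.0+`. The function names are illustrative only and are not part of BLCA.

```python
import re
import shutil
import subprocess


def parse_blast_version(text):
    """Extract (major, minor, patch) from text such as 'blastn: 2.9.0+'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in: {text!r}")
    return tuple(int(part) for part in match.groups())


def installed_blast_version(executable="blastn"):
    """Return the version tuple of the installed BLAST, or None if not found."""
    if shutil.which(executable) is None:
        return None
    result = subprocess.run(
        [executable, "-version"], capture_output=True, text=True, check=True
    )
    return parse_blast_version(result.stdout)


if __name__ == "__main__":
    version = installed_blast_version()
    if version is None:
        print("blastn not found on PATH")
    else:
        print("blastn version:", ".".join(str(part) for part in version))
```

Version tuples compare element by element, so `(2, 8, 1) < (2, 9, 0)` is true, which makes it easy to report which release is installed.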

@wolfgangrumpf
Author

wolfgangrumpf commented Jul 16, 2019 via email

@yingeddi2008
Collaborator

Hi Wolfgang,

Please note that the default BLCA database is the NCBI 16S rRNA database, not the NT database that you search when you run BLASTN online. We have noticed some issues with the 16S rRNA database, for example that some of the 16S rRNA fragments are not from type strains. I believe that's the reason why the annotation is off. Since we have no control over NCBI's 16S rRNA database, I can't say that updating the BLCA software will fix your misclassification issue. I do recommend that you use a manually curated database, such as Greengenes or SILVA, instead.

I hope this helps,

Eddi

@dswan

dswan commented Jul 19, 2019

There's also a plethora of sequences in the NCBI 16S database with ambiguous nucleotides; I'd actually thought of applying a filter to remove some of the more egregiously poor sequences. It's a shame, because the ITS Targeted Loci project at the NCBI is far better curated for quality and really focuses on type strains.
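A minimal sketch of such a filter, assuming plain FASTA input and treating any character other than A, C, G, T, or U as ambiguous. The 1% default threshold and the function names are hypothetical choices for illustration, not something BLCA or NCBI prescribes.

```python
import io

# Assumption: every IUPAC code other than A, C, G, T/U counts as ambiguous.
UNAMBIGUOUS = set("ACGTU")


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def ambiguous_fraction(seq):
    """Return the fraction of characters in seq that are not A/C/G/T/U."""
    if not seq:
        return 1.0  # treat empty records as unusable
    ambiguous = sum(1 for base in seq.upper() if base not in UNAMBIGUOUS)
    return ambiguous / len(seq)


def filter_fasta(handle, max_fraction=0.01):
    """Yield only the records whose ambiguous fraction is <= max_fraction."""
    for header, seq in read_fasta(handle):
        if ambiguous_fraction(seq) <= max_fraction:
            yield header, seq


if __name__ == "__main__":
    demo = io.StringIO(">clean\nACGTACGTAC\n>noisy\nACGNNNRYAC\n")
    for header, seq in filter_fasta(demo, max_fraction=0.1):
        print(f">{header}\n{seq}")
```

If the filtered FASTA is used to build a custom database, the taxonomy file has to be trimmed to the same set of accessions.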

One of the things I've been meaning to dig into a little further is the provenance of these files:

ftp://ftp.ncbi.nlm.nih.gov/refseq/TargetedLoci/Bacteria/bacteria.16SrRNA.fna.gz

and

ftp://ftp.ncbi.nlm.nih.gov/refseq/TargetedLoci/Archaea/archaea.16SrRNA.fna.gz

as opposed to the pre-formatted BLAST database. Technically they should all come from the same project, I imagine, but I've noticed a few formatting issues with the BLAST database, probably down to sequence redundancy.

(updated) Having checked these files, they're similar enough to satisfy me that they're from the same source!
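One way such a check could be done (not necessarily how it was done here) is to compare the two sources by sequence content rather than by header. The sketch below assumes both sources are available as FASTA text, for example after dumping the pre-formatted BLAST database with `blastdbcmd`; the function names are illustrative.

```python
import hashlib
import io


def _digest(chunks):
    """Hash one sequence, ignoring case and line wrapping."""
    seq = "".join(chunks).upper()
    return hashlib.sha1(seq.encode("ascii")).hexdigest()


def sequence_digests(handle):
    """Return the set of digests of all sequences in a FASTA text stream."""
    digests = set()
    chunks = None  # stays None until the first header line is seen
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if chunks is not None:
                digests.add(_digest(chunks))
            chunks = []
        elif line and chunks is not None:
            chunks.append(line)
    if chunks is not None:
        digests.add(_digest(chunks))
    return digests


def compare_fasta(handle_a, handle_b):
    """Count distinct sequences shared by, or unique to, each FASTA stream."""
    a, b = sequence_digests(handle_a), sequence_digests(handle_b)
    return {"shared": len(a & b), "only_a": len(a - b), "only_b": len(b - a)}


if __name__ == "__main__":
    first = io.StringIO(">x\nACGT\n>y\nGGGG\n")
    second = io.StringIO(">z some description\nac\ngt\n>w\nTTTT\n")
    print(compare_fasta(first, second))
```

Because redundant sequences collapse to a single digest, the counts describe distinct sequences, which sidesteps the redundancy issue mentioned above.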

@qunfengdong
Owner

qunfengdong commented Jul 19, 2019 via email

@dswan

dswan commented Jul 19, 2019

If you can remove those poor sequences from the NCBI 16S database, I do believe that it'd be better. Any other ITS loci sequences should also work, as long as you can compile the corresponding taxonomic annotation.

I did wonder how BLAST handled these ambiguities, but I assume they would be penalised.

@wolfgangrumpf
Author

wolfgangrumpf commented Jul 19, 2019 via email

@qunfengdong
Owner

qunfengdong commented Jul 19, 2019 via email

@qunfengdong
Owner

qunfengdong commented Jul 19, 2019 via email


4 participants