July 8 2019 release - does it REQUIRE Blast 2.9.0? #20
Comments
You are right. The July 8 2019 release should work with previous versions of BLAST. I made some minor changes so that it also works with the latest version, blastn 2.9. Let me know if you find any problems. |
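For anyone who wants to record which BLAST version a cluster actually provides before upgrading BLCA, here is a minimal Python sketch. It assumes `blastn` is on the PATH and that `blastn -version` prints a line containing a dotted version such as `2.9.0+`. The function names are illustrative and not part of BLCA. Since the July 8 2019 release is expected to work with earlier BLAST versions, this is for logging only, not a requirement check.

```python
import re
import subprocess
from typing import Tuple


def parse_blast_version(text: str) -> Tuple[int, ...]:
    """Extract the first major.minor.patch version found in the text."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in: {text!r}")
    return tuple(int(part) for part in match.groups())


def installed_blastn_version(executable: str = "blastn") -> Tuple[int, ...]:
    """Run `blastn -version` and return the parsed version tuple.

    Assumes the executable is on the PATH; raises if it is missing or fails.
    """
    result = subprocess.run(
        [executable, "-version"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_blast_version(result.stdout)
```

Version tuples compare element by element, so `installed_blastn_version() < (2, 9, 0)` tells you whether the cluster is still on a pre-2.9 release.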
Okay, I’m working on the install now.
Another question for you - I’m using an older release of BLCA with an older database. Some of its results don’t agree with a “simple” BLAST against the current NCBI database. For example, one sequence I am working with is from an ATCC E. coli strain, which NCBI recognizes if I use their online search, but BLCA says it’s E. fergusonii. Is this a database mismatch issue? That is, if I update to the newer BLCA and database, do you think the sequence will be recognized as E. coli?
Cheers,
Wolfgang Rumpf, Ph.D.
————————————
Bioinformatics Analyst
The Institute for Genomic Medicine at
The Abigail Wexner Research Institute
Nationwide Children’s Hospital
—————————————-
Professor
University of Maryland Global Campus
|
Hi Wolfgang, Please note that the default BLCA database is NCBI's 16S rRNA database, not the NT database that you search when you run BLASTN online. We have noticed some issues with the 16S rRNA database, for example that some of the 16S rRNA fragments are not from type strains. I believe that is why the annotation is off. Since we have no control over NCBI's 16S rRNA database, I can't say that updating the BLCA software will fix your misclassification issue. I do recommend that you use a manually curated database, such as Greengenes or SILVA, instead. I hope this helps, Eddi |
There's also a plethora of sequences in the NCBI 16S database with ambiguous nucleotides. I'd thought of applying a filter to remove some of the more egregiously poor sequences, actually. It's a shame, because the ITS targeted loci project at the NCBI is far better curated for quality and really focuses on type strains. One of the things I've been meaning to dig into a little further is the provenance of these files, as opposed to the pre-formatted BLAST database:
ftp://ftp.ncbi.nlm.nih.gov/refseq/TargetedLoci/Bacteria/bacteria.16SrRNA.fna.gz
ftp://ftp.ncbi.nlm.nih.gov/refseq/TargetedLoci/Archaea/archaea.16SrRNA.fna.gz
Technically they should all be the same project, I imagine, but I've noticed a few formatting issues with the BLAST database, probably down to sequence redundancy. (updated) Having checked these files, they're similar enough to satisfy me that they're the same source! |
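The filter idea mentioned above could be sketched roughly as follows in Python. This is only an illustration, not something BLCA ships. The function names, the 1% cutoff, and the output file name are assumptions, and any character other than A, C, G, T or U is counted as ambiguous.

```python
import gzip
from typing import Iterable, Iterator, Tuple

# Bases treated as unambiguous; every other symbol (N, R, Y, ...) counts as ambiguous.
UNAMBIGUOUS = set("ACGTU")


def read_fasta(handle: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def ambiguous_fraction(seq: str) -> float:
    """Fraction of positions that are not A, C, G, T or U (empty counts as 1.0)."""
    if not seq:
        return 1.0
    bad = sum(1 for base in seq.upper() if base not in UNAMBIGUOUS)
    return bad / len(seq)


def filter_fasta(
    records: Iterable[Tuple[str, str]], max_fraction: float = 0.01
) -> Iterator[Tuple[str, str]]:
    """Keep records whose ambiguous fraction is at most max_fraction.

    The 1% default is an arbitrary illustrative cutoff, not a recommendation.
    """
    for header, seq in records:
        if ambiguous_fraction(seq) <= max_fraction:
            yield header, seq


def filter_file(in_path: str, out_path: str, max_fraction: float = 0.01) -> None:
    """Filter a gzipped FASTA file and write the kept records as plain FASTA."""
    with gzip.open(in_path, "rt") as src, open(out_path, "w") as dst:
        for header, seq in filter_fasta(read_fasta(src), max_fraction):
            dst.write(f">{header}\n{seq}\n")
```

For example, `filter_file("bacteria.16SrRNA.fna.gz", "bacteria.16SrRNA.filtered.fna")` would write the retained sequences to a new file (the output name is hypothetical). The matching taxonomy file for BLCA would still need to be rebuilt from the headers that survive.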
If you can remove those poor sequences from the NCBI 16S database, I do believe
that the results would be better. ITS loci sequences should also work as
long as you can compile the corresponding taxonomic annotation.
|
I did wonder how BLAST handled these ambiguities, but I assume they would be penalised. |
I saw that there are instructions for generating the SILVA LSU database for BLCA, but not for the SSU - I don’t suppose anyone has done this already? Or will greengenes provide sufficient resolution?
|
Yes, BLAST should penalize those.
|
No, we have tried neither SILVA LSU nor SSU (the LSU instructions were
kindly provided by Dr. Daniel Swan), and we have not done any systematic
comparison to Greengenes either. We are just making those options
available for the community to use. Sometimes, we do apply multiple
databases to our own projects.
|
I'm considering upgrading BLCA but our cluster doesn't have BLAST 2.9 on it yet. Is 2.9 required, or will the July 8 2019 release work with BLAST 2.8?