Assignment not making sense #31
Comments
Thanks for the report. To be clear, are you using the entire nt database? BLCA is designed to work with marker genes rather than a generic database. You will need to use a particular family of marker genes as the database, so that the database entries are of similar length. Otherwise, the subsequent multiple sequence alignment is not reliable. If you are using the entire nt database, the multiple sequence alignment may be a problem. |
If you can make your database and query available for us to download, we can take a look. |
Hi, could you check if the blastn file you showed was correct? In your
blastn output, the query ID was "haplo_51122", but in the result file the
ID was "seq_01"? I was wondering if they refer to the same thing?
--
Yue "July" Xing, Ph.D.
Postdoctoral research associate
Ph.D. in Genetics and MS in Statistics at Texas A&M University
Center for Biomedical Informatics
Department of Medicine
Stritch School of Medicine
Loyola University Chicago
|
Yes it's correct, I changed it as I copied it in. |
That's weird. Could you also send the sequence of this one entry? I'll take a look. |
Hi @YJulyXing @qunfengdong
Thanks for taking the time to look into this issue. |
Also, we don't think that using the entire nt database would work. You need
to extract a family of marker genes. If the gene is inside a genome, it
would mess up the multiple sequence alignment.
Peter Shum ***@***.***> wrote on Thu, Nov 4, 2021 at 12:27 PM:
… Hi @YJulyXing @qunfengdong
See here the link to the data:
https://drive.google.com/drive/folders/1-WGhTt9wesYZbpY80I-A44QZkCtmtxte?usp=sharing
1. nt.ACC.taxonomy file of the entire nt database ( wget
https://ftp.ncbi.nlm.nih.gov/blast/db/nt.{00..47}.tar.gz)
2. test (the sequence - should be Strombus gigas)
3. test.blastn (the resulting blastn file generated)
4. test.blca.out (clustalo)
5. test1.blca.out (muscle)
Thanks for taking the time to look into this issue.
|
OK, I understand that, but these hits are all the same species, albeit some are whole mitogenomes. Why does the result return a different species from the blast hits? |
@shump2 @YJulyXing We should have made this clearer in our documentation. For BLCA, we expect the database sequences to be of similar length: that is, the sequences are from a gene family. For example, all the 16S gene sequences have more or less similar lengths (not identical, but similar). If the database sequences have dramatically different lengths, the multiple sequence alignment may become a problem. In your case, if some of the sequences correspond to whole mitogenomes but others correspond to a particular gene in the mitogenomes, they are of very different lengths, which may create problems for reliable multiple sequence alignments. |
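To make the length concern concrete, here is a minimal sketch that reads the blastn hit lines posted in this issue and flags subjects that are much longer than the rest. It assumes the 16-column layout seen in that output, with the subject length second from last and the query length last. The function names and the 2x-median cutoff are illustrative choices only, not part of BLCA.

```python
from statistics import median

# Hit lines copied from the blastn output posted in this issue.
# Assumption: 16 whitespace-separated fields per line, with the subject
# length second from last and the query length last.
HITS = """\
haplo_51122 KU317715.1 99.361 313 2 0 1 313 309 621 1.57e-157 568 307 plus 621 313
haplo_51122 MW124469.1 99.361 313 2 0 1 313 343 655 1.57e-157 568 307 plus 655 313
haplo_51122 MZ157283.1 99.361 313 2 0 1 313 384 696 1.57e-157 568 307 plus 15460 313
haplo_51122 KU317714.1 99.042 313 3 0 1 313 309 621 7.32e-156 562 304 plus 622 313
haplo_51122 KU317712.1 99.042 313 3 0 1 313 285 597 7.32e-156 562 304 plus 601 313
haplo_51122 KM245630.1 99.042 313 3 0 1 313 384 696 7.32e-156 562 304 plus 15461 313
haplo_51122 DQ525222.1 98.026 304 6 0 1 304 335 638 7.43e-146 529 286 plus 638 313
haplo_51122 KU317713.1 98.893 271 3 0 1 271 279 549 1.63e-132 484 262 plus 549 313
haplo_51122 MW124560.1 90.096 313 31 0 1 313 346 658 3.63e-109 407 220 plus 658 313
haplo_51122 MW124542.1 90.096 313 31 0 1 313 346 658 3.63e-109 407 220 plus 658 313
haplo_51122 DQ525226.1 90.099 303 30 0 1 303 335 637 2.83e-105 394 213 plus 638 313
"""


def subject_lengths(text):
    """Return {subject accession: subject length} for each hit line."""
    lengths = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 16:
            continue  # skip blank or malformed lines
        lengths[fields[1]] = int(fields[-2])
    return lengths


def length_outliers(lengths, max_ratio=2.0):
    """Return accessions longer than max_ratio times the median length.

    max_ratio is an arbitrary illustrative threshold.
    """
    mid = median(lengths.values())
    return sorted(acc for acc, n in lengths.items() if n > max_ratio * mid)


lengths = subject_lengths(HITS)
print(length_outliers(lengths))
```

On these eleven hits the two flagged accessions are the entries of about 15,460 bp, presumably the whole mitogenomes mentioned above, while the other subjects are 549 to 658 bp.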
What is your blast database? Are you using the default blast database
(16SMicrobial)?
|
@shump2 do you mind providing your blast database? @YJulyXing needs to check if your database entries really have the correct taxa ID in the NCBI taxa database you used. |
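One way to run that check locally is sketched below: look up each blastn hit accession in the taxonomy file and see what lineage, if any, is recorded for it. This assumes the taxonomy file has one entry per line with the accession and the lineage string separated by a tab. That layout is an assumption and should be checked against the actual `nt.ACC.taxonomy` file. The function name and the sample entries are hypothetical placeholders.

```python
def lookup_taxonomy(lines, accessions):
    """Map each requested accession to its lineage string, or None if absent.

    Assumption: each line is "<accession>\t<lineage>".
    """
    wanted = set(accessions)
    found = dict.fromkeys(wanted)  # every requested accession starts as None
    for line in lines:
        acc, sep, lineage = line.rstrip("\n").partition("\t")
        if sep and acc in wanted:
            found[acc] = lineage
    return found


# Hypothetical excerpt; accessions and lineages are placeholders, not real data.
sample = [
    "ACC_A\tspecies:Example one;genus:Example;\n",
    "ACC_B\tspecies:Example two;genus:Example;\n",
]

result = lookup_taxonomy(sample, ["ACC_A", "ACC_C"])
for acc, lineage in sorted(result.items()):
    print(acc, lineage if lineage is not None else "MISSING from taxonomy file")
```

Because the function only iterates over lines, an open file handle can be passed in place of the list, so a file with tens of millions of records does not need to be loaded into memory.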
Hi,
I have created a db (nt.ACC.taxonomy ~82 million records) using an updated version of the entire nt NCBI database.
Some sequence queries fail to assign to the correct species despite the blastn hits being correct.
For example, if I search one sequence against the nt database with the custom taxonomy file, blastn returns 11 hits, but the alignment returns a very strange result: an assignment to COVID. I am not sure where to begin in resolving this. Are there any known issues?
Any assistance here would be much appreciated.
Blastn file
haplo_51122 KU317715.1 99.361 313 2 0 1 313 309 621 1.57e-157 568 307 plus 621 313
haplo_51122 MW124469.1 99.361 313 2 0 1 313 343 655 1.57e-157 568 307 plus 655 313
haplo_51122 MZ157283.1 99.361 313 2 0 1 313 384 696 1.57e-157 568 307 plus 15460 313
haplo_51122 KU317714.1 99.042 313 3 0 1 313 309 621 7.32e-156 562 304 plus 622 313
haplo_51122 KU317712.1 99.042 313 3 0 1 313 285 597 7.32e-156 562 304 plus 601 313
haplo_51122 KM245630.1 99.042 313 3 0 1 313 384 696 7.32e-156 562 304 plus 15461 313
haplo_51122 DQ525222.1 98.026 304 6 0 1 304 335 638 7.43e-146 529 286 plus 638 313
haplo_51122 KU317713.1 98.893 271 3 0 1 271 279 549 1.63e-132 484 262 plus 549 313
haplo_51122 MW124560.1 90.096 313 31 0 1 313 346 658 3.63e-109 407 220 plus 658 313
haplo_51122 MW124542.1 90.096 313 31 0 1 313 346 658 3.63e-109 407 220 plus 658 313
haplo_51122 DQ525226.1 90.099 303 30 0 1 303 335 637 2.83e-105 394 213 plus 638 313
Result
seq_01 superkingdom:Viruses;96.0;phylum:Pisuviricota;96.0;class:Pisoniviricetes;96.0;order:Nidovirales;96.0;family:Coronaviridae;96.0;genus:Betacoronavirus;96.0;species:Severe acute respiratory syndrome-related coronavirus;96.0;
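For comparing an assignment like this against the blastn hits, the result line can be split into ranks, names, and confidence scores. The sketch below infers the format from the single line above (a read ID, then alternating `rank:name` and score fields separated by semicolons). That inference is an assumption, and `parse_result` is a hypothetical helper, not a BLCA function.

```python
# Result line copied from this issue.
RESULT = (
    "seq_01 superkingdom:Viruses;96.0;phylum:Pisuviricota;96.0;"
    "class:Pisoniviricetes;96.0;order:Nidovirales;96.0;"
    "family:Coronaviridae;96.0;genus:Betacoronavirus;96.0;"
    "species:Severe acute respiratory syndrome-related coronavirus;96.0;"
)


def parse_result(line):
    """Return (read_id, [(rank, name, confidence), ...]) for one result line.

    Assumption: the read ID is followed by whitespace, then alternating
    "rank:name" and numeric score fields, each terminated by ";".
    """
    read_id, rest = line.strip().split(None, 1)
    parts = [p for p in rest.split(";") if p]
    levels = []
    for label, score in zip(parts[0::2], parts[1::2]):
        rank, _, name = label.partition(":")
        levels.append((rank, name, float(score)))
    return read_id, levels


read_id, levels = parse_result(RESULT)
for rank, name, conf in levels:
    print(f"{read_id}\t{rank}\t{name}\t{conf}")
```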