trees and alignments genbank species labels for some OTUs missing #197

Open
batson opened this issue Sep 14, 2022 · 1 comment
Labels
bug Something isn't working

Comments


batson commented Sep 14, 2022

Describe the bug
Some OTUs in the Orthomyxovirus tree have good BLAST hits but are not labelled with the corresponding GenBank species name.

For example, palmID_u19687 is Wellfleet Bay virus (100% sequence identity), which was submitted in 2018.

Compare u25189|Quaranfil quaranjavirus, which does carry its species label.

Screenshots
Screen Shot 2022-09-14 at 9 46 26 AM
Screen Shot 2022-09-14 at 9 53 21 AM

@batson batson added the bug Something isn't working label Sep 14, 2022
@ababaian
Member

Good catch. It looks like the GenBank IDs are coming from hits where a Serratus sequence was a centroid in the clustering that went into palmDB.

One way or another, it's necessary to do a BLAST/DIAMOND search against nr (instructions: https://github.com/ababaian/serratus/wiki/DIAMOND-nr) to deplete knowns as a filtering step. This will also catch cases where the virus has been described since Jan 2021, i.e. after the snapshot that went into palmDB.
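A minimal sketch of the "deplete knowns" step, assuming the nr search has already been run and its hits saved in the standard 12-column BLAST/DIAMOND tabular format. The function name `known_queries`, the 90% identity cutoff, and the subject IDs in the example are illustrative assumptions, not part of the Serratus pipeline.

```python
import csv
import io

# Standard 12 columns of BLAST/DIAMOND tabular output (outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def known_queries(tsv_handle, min_identity=90.0):
    """Return {query_id: (subject_id, pident)} for each query's best hit
    at or above min_identity. The cutoff is an assumed, tunable value."""
    best = {}
    reader = csv.DictReader(tsv_handle, fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        pident = float(row["pident"])
        if pident < min_identity:
            continue  # too divergent to count as a known virus
        previous = best.get(row["qseqid"])
        if previous is None or pident > previous[1]:
            best[row["qseqid"]] = (row["sseqid"], pident)
    return best


# Hypothetical hits table; subject IDs are placeholders, not real accessions.
example_hits = io.StringIO(
    "u19687\tsubject_A\t100.0\t100\t0\t0\t1\t100\t1\t100\t1e-50\t200.0\n"
    "u19687\tsubject_B\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-40\t180.0\n"
    "u00001\tsubject_C\t45.0\t100\t55\t0\t1\t100\t1\t100\t1e-05\t50.0\n"
)
known = known_queries(example_hits)
```

Queries that appear in `known` would be treated as already described and relabelled or filtered; everything else stays a candidate novel OTU.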

The fix for this will be to update the GenBank accession for each representative sOTU where any sequence in the cluster is in GenBank. Keeping the issue open as a TODO.
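The proposed fix could be sketched roughly as follows, assuming cluster membership and GenBank annotations are available as plain dictionaries. The function name `label_sotus`, the data layout, and the IDs and accession string in the example are hypothetical placeholders rather than actual palmDB structures.

```python
def label_sotus(clusters, genbank):
    """Assign a GenBank label to each representative sOTU.

    clusters: {centroid_id: [member_id, ...]}
    genbank:  {sequence_id: (accession, species)}
    Returns {centroid_id: (accession, species)} for every cluster in which
    the centroid or any member has a GenBank record.
    """
    labels = {}
    for centroid, members in clusters.items():
        # Prefer the centroid's own record, then fall back to any member.
        for seq_id in [centroid, *members]:
            if seq_id in genbank:
                labels[centroid] = genbank[seq_id]
                break
    return labels


# Hypothetical example: the centroid is a Serratus sequence, but one
# cluster member has a GenBank record, so the label is propagated.
clusters = {
    "u19687": ["u19687", "genbank_member_1"],
    "u99999": ["u99999"],
}
genbank = {"genbank_member_1": ("ACCESSION_PLACEHOLDER", "Wellfleet Bay virus")}
labels = label_sotus(clusters, genbank)
```

Here `u19687` picks up the species name from its GenBank member, while `u99999` stays unlabelled because nothing in its cluster is in GenBank.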


2 participants