graftM create builds an HMM from sequences that have not been deduplicated #121
Comments
Also, the HMM is created using sequences that ultimately do not pass the length cutoff.
Also, the sequences used to build the HMM are not dereplicated at the genus level, which would be a nice feature. I'm working on create now, so leave this one to me.
Also, the duplicates are included in the diamond db, so a read can map to a sequence that is in the diamond database but is not in the tree.
One might argue that the diamond sequences shouldn't be deduplicated, because the deduplication happens only for those positions aligned to the HMM, and diamond doesn't care about HMMs. What do you think, @geronimp?
@wwood Yes, I don't think we should deduplicate the diamond sequences. The branch I'm working on now includes sequences that have not been deduplicated, but it doesn't remove those which did not pass the min percent align filter. I think this is the best way.
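For anyone following along, here is a minimal sketch of the distinction discussed above: deduplicating on the HMM-aligned positions only, while leaving the full set of unaligned sequences untouched for the diamond db. This is not graftM's code. The function names, the input layout (a dict of name to aligned sequence), and the definition of "fraction aligned" as the share of non-gap columns are all assumptions made for illustration.

```python
def deduplicate_aligned(aligned):
    """Group sequences that are identical over the aligned columns.

    `aligned` maps sequence name -> aligned sequence (hypothetical layout,
    all sequences the same length, '-' or '.' for gaps).
    Returns a dict: representative name -> list of names in its group.
    """
    first_seen = {}  # aligned sequence -> name of the first sequence with it
    clusters = {}    # representative name -> all names sharing that alignment
    for name, seq in aligned.items():
        key = seq.upper()  # treat lower/upper case residues as the same
        if key in first_seen:
            clusters[first_seen[key]].append(name)
        else:
            first_seen[key] = name
            clusters[name] = [name]
    return clusters


def passes_min_aligned(seq, min_fraction):
    """Assumed filter: fraction of non-gap columns must reach min_fraction."""
    if not seq:
        return False
    non_gap = sum(1 for c in seq if c not in "-.")
    return non_gap / len(seq) >= min_fraction


# Toy alignment: "a" and "b" are duplicates once case is ignored.
aligned = {"a": "AC-GT", "b": "ac-gt", "c": "ACCGT", "d": "A----"}
clusters = deduplicate_aligned(aligned)

# Representatives that also pass the (assumed) alignment filter would go
# into the HMM and tree; the diamond db would still get every sequence.
for_hmm = [n for n in clusters if passes_min_aligned(aligned[n], 0.5)]
for_diamond = list(aligned)
```

With this split, a read that hits a duplicate in the diamond db can still be traced to a tree member through `clusters`, as long as the representative itself passed the filter.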
This issue seems to be fixed now, thanks to Joel.
Hello Ben,
GraftM create builds the HMM using the sequences exactly as provided, which seems to result in an HMM biased toward duplicated sequences.
Thanks,
Caitlin