Inconsistent subsampling #317
mothur-westcott added a commit that referenced this issue on Mar 20, 2017: ecfbb09
I found the source of the problem and have released version 1.39.5, which corrects it: https://github.com/mothur/mothur/releases/tag/v1.39.5
mothur-westcott closed this on Mar 20, 2017
Thanks! Works great on my end now too.
campenr commented Mar 20, 2017
Hello,
First, I'm raising this as an issue separately from #311 because, although both are about mothur's subsampling, the root cause may be unrelated.
I am finding that I get inconsistent results from summary.single depending on whether I pre-subsample to a set size (in my case 12,000 sequences) using sub.sample(), or let summary.single do the subsampling with subsample=12000.
Some of my samples are being subsampled from more than a million sequences down to only 12,000. The previous bug in issue #311 resulted in uneven distributions when subsampling these large samples with sub.sample, but that is now fixed. However, when I run summary.shared on the original samples and set subsample=12000, I get very different numbers of OTUs for these heavily subsampled samples. The greater the subsampling effort, the greater the observed difference in OTUs between the two methods (sub.sample followed by summary.single, versus summary.single with subsample=12000). See the table below for the actual number of OTUs (sobs) returned by summary.single depending on whether I use all sequences without subsampling, pre-subsample the shared file with sub.sample(nseqs=12000) followed by summary.single(), or just run summary.single on the original shared file with subsample=12000:
I am running the latest version of mothur, 1.37.4 (including the commits that fix the sub.sample issue on Windows), on 64-bit Windows 10.
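For reference, the comparison described above can be sketched as a toy simulation in Python. This is not mothur's code: the function names, the made-up community, and the assumption that the built-in subsample option averages sobs over repeated random draws without replacement are all illustrative. Under those assumptions, a single pre-subsample and the averaged in-place subsample should report similar sobs, which is why a large gap between the two points to a bug rather than to sampling noise.

```python
import random


def subsample_sobs(otu_counts, size, rng):
    """Draw `size` sequences without replacement; return the number of OTUs seen."""
    # Expand the abundance vector into one OTU label per sequence.
    pool = [otu for otu, n in enumerate(otu_counts) for _ in range(n)]
    drawn = rng.sample(pool, size)
    return len(set(drawn))


def mean_subsampled_sobs(otu_counts, size, iters, rng):
    """Average sobs over `iters` independent subsamples (assumed built-in behaviour)."""
    total = sum(subsample_sobs(otu_counts, size, rng) for _ in range(iters))
    return total / iters


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical community: 50 abundant OTUs plus 500 rare ones.
    counts = [1000] * 50 + [2] * 500
    size = 1200

    # Method 1: subsample once, then count OTUs on the subsampled data.
    single = subsample_sobs(counts, size, rng)
    # Method 2: subsample repeatedly inside the summary step and average.
    averaged = mean_subsampled_sobs(counts, size, 50, rng)

    print(f"sobs, all sequences:     {len(counts)}")
    print(f"sobs, single pre-sample: {single}")
    print(f"sobs, averaged in-place: {averaged:.1f}")
```

The two subsampled values differ only by random sampling variation here, so they stay close to each other even though both are far below the unsubsampled count.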
Let me know if there is anything else I can provide to help with this issue.
Cheers
Richard