Inconsistent subsampling #317

Closed
campenr opened this Issue Mar 20, 2017 · 2 comments

Contributor

campenr commented Mar 20, 2017

Hello,

First, I'm raising this as a separate issue from #311: although both concern mothur's subsampling, the root cause may be unrelated.

I am getting inconsistent results from summary.single depending on whether I first subsample to a set size (in my case 12,000 sequences) using sub.sample(), or let summary.single do the subsampling with subsample=12000.

Some of my samples are being subsampled from more than a million sequences down to only 12,000. The previous bug in issue #311 resulted in uneven distributions when subsampling these large samples with sub.sample, but that is now fixed. However, when I run summary.single on the original samples with subsample=12000, I get very different numbers of OTUs for these heavily subsampled samples. The greater the subsampling effort, the greater the difference in observed OTUs between the two methods (sub.sample followed by summary.single, versus summary.single with subsample=12000). The table below shows the number of OTUs (sobs) returned by summary.single in three cases: using all sequences without subsampling, subsampling the shared file first with sub.sample(nseqs=12000) and then running summary.single(), or running summary.single on the original shared file with subsample=12000:

| Sample | Original number of sequences | All sequences | sub.sample(nseqs=12000) | summary.single(subsample=12000) |
|---|---|---|---|---|
| 1 | 1,147,118 | 2,315 | 522 | 6.595 |
| 2 | 769,206 | 1,496 | 537 | 10.837 |
| 3 | 883,794 | 2,225 | 543 | 7.779 |
| 4 | 355,720 | 3,792 | 842 | 63.485 |
| 5 | 351,501 | 871 | 387 | 62.685 |
| 6 | 271,146 | 653 | 330 | 99.583 |
| 7 | 219,120 | 375 | 179 | 117.588 |
| 8 | 22,308 | 1,154 | 1,002 | 1,003.387 |
| 9 | 14,319 | 783 | 758 | 757.326 |
| 10 | 29,100 | 1,625 | 1,200 | 1,255.675 |
| 11 | 12,605 | 468 | 465 | 464.532 |
| 12 | 277,670 | 770 | 189 | 32.813 |
| 13 | 755,534 | 3,743 | 925 | 30.434 |
| 14 | 33,209 | 418 | 330 | 335.374 |
| 15 | 36,559 | 322 | 199 | 203.401 |
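
To illustrate why the two routes would be expected to agree, here is a minimal Python sketch of subsampling without replacement. It is not mothur's implementation: the function names (`subsample_counts`, `sobs`, `mean_sobs`) and the toy community are made up for illustration, and the assumption that the subsample= option reports an average over repeated random draws is inferred from the fractional values in the last column of the table. Under that assumption, a single draw and the average over many draws should give similar OTU counts.

```python
import bisect
import itertools
import random


def subsample_counts(counts, size, rng):
    """Draw `size` reads without replacement from a per-OTU count vector."""
    total = sum(counts)
    if size > total:
        raise ValueError("size exceeds the number of reads in the sample")
    # cumulative[i] is the number of reads belonging to OTUs 0..i
    cumulative = list(itertools.accumulate(counts))
    sub = [0] * len(counts)
    for read_index in rng.sample(range(total), size):
        # The first OTU whose cumulative count exceeds read_index owns this read
        sub[bisect.bisect_right(cumulative, read_index)] += 1
    return sub


def sobs(counts):
    """Number of observed OTUs (OTUs with at least one read)."""
    return sum(1 for c in counts if c > 0)


def mean_sobs(counts, size, iters, rng):
    """Average sobs over `iters` independent subsamples."""
    return sum(sobs(subsample_counts(counts, size, rng)) for _ in range(iters)) / iters


# Hypothetical toy community: three dominant OTUs plus many rare ones
rng = random.Random(42)
counts = [50000, 20000, 5000] + [10] * 200 + [1] * 500

single = sobs(subsample_counts(counts, 1200, rng))   # one draw, like pre-subsampling
averaged = mean_sobs(counts, 1200, 100, rng)         # mean over repeated draws
print(single, averaged)
```

In this sketch the single draw and the averaged value differ only by sampling noise, which is the behaviour I expected from the two methods in the table above.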

I am running the latest version of mothur, 1.37.4 (including the commits that fix the sub.sample issue on Windows), on 64-bit Windows 10.

Let me know if there is anything else I can provide to help with this issue.

Cheers
Richard

Contributor

mothur-westcott commented Mar 20, 2017

I found the source of the problem and have released version 1.39.5 which corrects it. https://github.com/mothur/mothur/releases/tag/v1.39.5

Contributor

campenr commented Mar 20, 2017

Thanks! Works great on my end now too 👍
