Why shouldn't you combine list or shared files from different datasets? #508

mothur-westcott · 2018-08-16T16:37:21Z

We are often asked if datasets can be combined after clustering. We don't recommend it, because the OTUs you get from clustering each dataset separately are not the same as the OTUs you would get if the datasets were combined before clustering. Here's a simple example to illustrate:

DataSet_1:

seq1 seq2 0.023
seq1 seq3 0.027
seq2 seq3 0.015
seq4 seq5 0.018

Clustering this dataset would create the OTUs: seq1,seq2,seq3 seq4,seq5

DataSet_2:

seq6 seq8 0.023
seq6 seq10 0.027
seq7 seq9 0.015
seq8 seq10 0.018

Clustering this dataset would create the OTUs: seq6,seq8,seq10 seq7,seq9

If you merged the list files you would get:

seq1,seq2,seq3 seq6,seq8,seq10 seq4,seq5 seq7,seq9

Now let's look at the OTUs if we clustered both datasets together. The distance file now also contains the distances between sequences from different datasets:

seq1 seq2 0.023
seq1 seq3 0.027
seq2 seq3 0.015
seq4 seq5 0.018
seq6 seq8 0.023
seq6 seq10 0.027
seq7 seq9 0.015
seq8 seq10 0.018
seq1 seq7 0.011
seq1 seq9 0.011
seq2 seq7 0.023
seq3 seq7 0.024
seq5 seq7 0.023
seq5 seq9 0.022

Clustering the combined distances would create the OTUs:

seq1,seq7,seq9 seq10,seq8,seq6 seq3,seq2 seq4,seq5

The results can be more dramatic depending on the datasets you are combining.
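The comparison above can be checked with a small Python sketch. It is not mothur's implementation. It assumes furthest neighbor (complete linkage) clustering at a cutoff of 0.03, and it treats any pair missing from the distance list as being above the cutoff. The issue states neither the method nor the cutoff, so both are assumptions, and the names `parse`, `cluster`, `DATASET_1`, `DATASET_2` and `BETWEEN` are made up for this illustration.

```python
from itertools import combinations

CUTOFF = 0.03  # assumed; the issue does not state a cutoff

DATASET_1 = """
seq1 seq2 0.023
seq1 seq3 0.027
seq2 seq3 0.015
seq4 seq5 0.018
"""

DATASET_2 = """
seq6 seq8 0.023
seq6 seq10 0.027
seq7 seq9 0.015
seq8 seq10 0.018
"""

# Distances between sequences from different datasets. These only
# exist once the datasets are combined before clustering.
BETWEEN = """
seq1 seq7 0.011
seq1 seq9 0.011
seq2 seq7 0.023
seq3 seq7 0.024
seq5 seq7 0.023
seq5 seq9 0.022
"""


def parse(text):
    """Read 'seqA seqB distance' lines into {frozenset pair: distance}."""
    dist = {}
    for line in text.split("\n"):
        if line.strip():
            a, b, d = line.split()
            dist[frozenset((a, b))] = float(d)
    return dist


def cluster(dist, cutoff=CUTOFF):
    """Furthest neighbor clustering (assumed method), returned as a set of OTUs."""
    names = sorted({name for pair in dist for name in pair})
    otus = [frozenset([name]) for name in names]

    def linkage(x, y):
        # Largest pairwise distance between two OTUs; a pair that is
        # not listed is treated as above the cutoff (assumption).
        return max(dist.get(frozenset((a, b)), float("inf")) for a in x for b in y)

    while len(otus) > 1:
        d, x, y = min(
            ((linkage(x, y), x, y) for x, y in combinations(otus, 2)),
            key=lambda t: t[0],
        )
        if d > cutoff:
            break
        otus = [o for o in otus if o != x and o != y] + [x | y]
    return set(otus)


# Cluster separately and merge the lists, versus cluster everything together.
merged = cluster(parse(DATASET_1)) | cluster(parse(DATASET_2))
together = cluster(parse(DATASET_1 + DATASET_2 + BETWEEN))

for label, otus in (("merged lists", merged), ("clustered together", together)):
    print(label, sorted(",".join(sorted(o)) for o in otus))
```

Under these assumptions the sketch should reproduce both sets of OTUs listed above. In the combined run, seq7 and seq9 are closer to seq1 than seq2 and seq3 are, so seq1 ends up in an OTU with seq7 and seq9, and the original seq1,seq2,seq3 OTU is split. Merging the list files cannot capture this, because the between-dataset distances are never seen.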

mothur-westcott added Documentation Support New mothur user labels Aug 16, 2018

mothur-westcott added this to the Bug / Support Tracking milestone Aug 16, 2018

mothur-westcott closed this as completed Nov 9, 2018
