Why shouldn't you combine list or shared files from different datasets? #508

Closed
mothur-westcott opened this issue Aug 16, 2018 · 0 comments


We are often asked whether datasets can be combined after clustering. We don't recommend it, because the OTUs would be different if the datasets were combined before clustering. Here's a simple example to illustrate:

DataSet_1:

seq1 seq2 0.023
seq1 seq3 0.027
seq2 seq3 0.015
seq4 seq5 0.018

Clustering would create the OTUs: seq1,seq2,seq3 seq4,seq5

DataSet_2:

seq6 seq8 0.023
seq6 seq10 0.027
seq7 seq9 0.015
seq8 seq10 0.018

Clustering would create the OTUs: seq6,seq8,seq10 seq7,seq9

If you merged the list files you would get:

seq1,seq2,seq3 seq6,seq8,seq10 seq4,seq5 seq7,seq9

Now let's look at the OTUs if we clustered both datasets together:

seq1 seq2 0.023
seq1 seq3 0.027
seq2 seq3 0.015
seq4 seq5 0.018
seq6 seq8 0.023
seq6 seq10 0.027
seq7 seq9 0.015
seq8 seq10 0.018
seq1 seq7 0.011
seq1 seq9 0.011
seq2 seq7 0.023
seq3 seq7 0.024
seq5 seq7 0.023
seq5 seq9 0.022

Clustering would create the OTUs:

seq1,seq7,seq9 seq10,seq8,seq6 seq3,seq2 seq4,seq5

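Here seq1 leaves the seq1,seq2,seq3 OTU to join seq7 and seq9, so two of the four OTUs no longer match the merged list files. The results can be more dramatic depending on the datasets you are combining.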
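To make the comparison concrete, here is a minimal Python sketch that clusters the example distances both ways. It rests on several assumptions that are not stated in the issue: a cutoff of 0.03, a simple furthest-neighbor (complete-linkage) rule, and treating any pair missing from the list as being above the cutoff. It is not mothur's code and not necessarily the method used to produce the OTUs above; the function names are hypothetical.

```python
from itertools import combinations

CUTOFF = 0.03  # assumed; the issue does not state the cutoff


def parse(text):
    """Read 'seqA seqB distance' lines into {frozenset({A, B}): distance}."""
    dists = {}
    for line in text.strip().splitlines():
        a, b, d = line.split()
        dists[frozenset((a, b))] = float(d)
    return dists


def furthest_neighbor(dists, cutoff=CUTOFF):
    """Greedy complete-linkage clustering; unlisted pairs count as infinitely far."""
    names = sorted({n for pair in dists for n in pair})
    clusters = [frozenset([n]) for n in names]

    def linkage(c1, c2):
        # Complete linkage: the largest distance between any two members.
        return max(dists.get(frozenset((a, b)), float("inf"))
                   for a in c1 for b in c2)

    while True:
        best = None
        for c1, c2 in combinations(clusters, 2):
            d = linkage(c1, c2)
            if d <= cutoff and (best is None or d < best[0]):
                best = (d, c1, c2)
        if best is None:
            break  # no remaining pair of clusters is within the cutoff
        _, c1, c2 = best
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return set(clusters)


def show(otus):
    return " ".join(",".join(sorted(o)) for o in sorted(otus, key=sorted))


ds1 = parse("""
seq1 seq2 0.023
seq1 seq3 0.027
seq2 seq3 0.015
seq4 seq5 0.018
""")
ds2 = parse("""
seq6 seq8 0.023
seq6 seq10 0.027
seq7 seq9 0.015
seq8 seq10 0.018
""")
# Between-dataset distances, which only exist once the datasets are combined.
cross = parse("""
seq1 seq7 0.011
seq1 seq9 0.011
seq2 seq7 0.023
seq3 seq7 0.024
seq5 seq7 0.023
seq5 seq9 0.022
""")

merged = furthest_neighbor(ds1) | furthest_neighbor(ds2)  # cluster, then merge
together = furthest_neighbor({**ds1, **ds2, **cross})     # merge, then cluster

print("merged list files: ", show(merged))
print("clustered together:", show(together))
```

Under these assumptions the two results differ in the same way as the example: merging after clustering keeps seq1,seq2,seq3 together, while clustering the combined distances groups seq1 with seq7 and seq9 and leaves seq2,seq3 on their own. The between-dataset distances are what drive the change, and they are never seen when each dataset is clustered separately.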
