Cluster.fit command #415

Closed
mothur-westcott opened this issue Jan 22, 2018 · 4 comments

@mothur-westcott
Contributor

mothur-westcott commented Jan 22, 2018

Fit new sequences to an existing dataset model.

  • Calculate the distances needed: within the new data, and from the new data to the model
  • Initialize the dataset model: run sens.spec on the existing data model and place each new sequence in its own singleton OTU
  • Move the new sequences between OTUs until the optimal fit is found
  • Create the new list file
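
The steps above can be sketched roughly as follows. This is a minimal Python illustration, not mothur's implementation: the function names (`mcc`, `fit`), the `singleton_` labels, and the choice of the Matthews correlation coefficient as the metric are all assumptions made for the example. It also assumes the distances have already been reduced to a set of "close" pairs (pairs at or below some cutoff), and it only tries moving new sequences into model OTUs.

```python
from itertools import combinations
from math import sqrt


def mcc(assignment, close_pairs):
    """Matthews correlation coefficient over all sequence pairs (assumed metric)."""
    tp = fp = fn = tn = 0
    for a, b in combinations(sorted(assignment), 2):
        same_otu = assignment[a] == assignment[b]
        close = frozenset((a, b)) in close_pairs
        if same_otu and close:
            tp += 1
        elif same_otu:
            fp += 1
        elif close:
            fn += 1
        else:
            tn += 1
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def fit(model, new_seqs, close_pairs):
    """Greedily move new sequences into model OTUs while the metric improves."""
    # Reference sequences keep their OTU; each new sequence starts as a singleton.
    assignment = {seq: otu for otu, members in model.items() for seq in members}
    for seq in new_seqs:
        assignment[seq] = f"singleton_{seq}"  # hypothetical label

    improved = True
    while improved:  # repeat until no single move raises the metric
        improved = False
        for seq in new_seqs:
            best_otu, best_score = assignment[seq], mcc(assignment, close_pairs)
            for otu in model:
                score = mcc({**assignment, seq: otu}, close_pairs)
                if score > best_score:
                    best_otu, best_score = otu, score
            if best_otu != assignment[seq]:
                assignment[seq] = best_otu
                improved = True
    return assignment
```

In this sketch a new sequence that never improves the metric simply stays in its singleton, which is the case a separate "unfitted" output would need to cover.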

Simple Example:
Old Data:
otu1        otu2      otu3   otu4   otu5
A,B,C,D,E   F,G,H,I   J,K    L      M

New Data:
N O P Q R

New List:
otu2    otu4   otu5
O,Q,R   N      P

@mothur-westcott mothur-westcott added this to the Version 1.40.0 milestone Jan 22, 2018
mothur-westcott added a commit that referenced this issue Jan 29, 2018
mothur-westcott added a commit that referenced this issue Jan 29, 2018
@mothur-westcott
Contributor Author

Required: the model's list, fasta and count files, plus the fasta file to fit.
Optional: the model's distance matrix (saves calculation time), and a name or count file for the fit data.
Outputs: the clustered fit data in a list file, with OTU labels taken from your model data. Sequences that are not fitted to an existing OTU are written to an *unfitted.list file.

mothur-westcott added a commit that referenced this issue Jan 29, 2018
mothur-westcott added a commit that referenced this issue Jan 30, 2018
mothur-westcott added a commit that referenced this issue May 3, 2018
mothur-westcott added a commit that referenced this issue May 11, 2018
mothur-westcott added a commit that referenced this issue May 17, 2018
mothur-westcott added a commit that referenced this issue Jun 4, 2018
mothur-westcott added a commit that referenced this issue Jun 5, 2018
mothur-westcott added a commit that referenced this issue Jun 8, 2018
mothur-westcott added a commit that referenced this issue Jun 11, 2018
mothur-westcott added a commit that referenced this issue Jun 12, 2018
mothur-westcott added a commit that referenced this issue Jun 12, 2018
mothur-westcott added a commit that referenced this issue Jun 14, 2018
mothur-westcott added a commit that referenced this issue Jun 15, 2018
mothur-westcott added a commit that referenced this issue Jun 15, 2018
mothur-westcott added a commit that referenced this issue Jul 5, 2018
mothur-westcott added a commit that referenced this issue Jul 16, 2018
@krmaas

krmaas commented Sep 17, 2018

I'm sooooo excited for this option!!!!

@pschloss
Contributor

shhhh! don't tell anyone, it's ultra-top secret 😄

mothur-westcott added a commit that referenced this issue Sep 25, 2018
The criteria parameter allows you to indicate which metric will influence the fitting. Options are fit, combo and both. Default=both.
Using fit means a sequence will be fitted to an OTU if the fit improves the metric for the fitted sequences (only the metric value generated by the fit sequences is considered).
Using combo means a sequence will be fitted to an OTU if the fit improves the metric for the fitted and reference sequences together (the metric value generated by all reference and fit sequences is considered).
Using both means a sequence will be fitted to an OTU if the fit improves either the metric for the fitted sequences (fit) or the metric for the combined sequences (combo).
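
The three options can be read as a simple accept/reject rule for a candidate move. The Python sketch below is only an illustration of that rule as described here; the function name `accept_fit` and its arguments are hypothetical, and it assumes a higher metric value is better.

```python
def accept_fit(fit_before, fit_after, combo_before, combo_after, criteria="both"):
    """Return True if a candidate move should be accepted under `criteria`."""
    fit_better = fit_after > fit_before        # metric over the fitted sequences only
    combo_better = combo_after > combo_before  # metric over reference + fitted sequences
    if criteria == "fit":
        return fit_better
    if criteria == "combo":
        return combo_better
    if criteria == "both":
        return fit_better or combo_better
    raise ValueError(f"unknown criteria: {criteria!r}")
```

Under this reading, a move that helps the fitted sequences but slightly hurts the combined metric is accepted with criteria=fit or criteria=both and rejected with criteria=combo.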

#415
mothur-westcott added a commit that referenced this issue Sep 25, 2018
The printref parameter allows you to indicate whether you want the reference seqs printed with the fit seqs. For example, if you are trying to see how a new patient's data changes the clustering, set printref=t so the old patient and new patient OTUs are printed together. If you want to see how your data would fit with a reference like SILVA, setting printref=f outputs only your sequences to the list file. By default, printref=t for de novo clustering and printref=f when using a reference.
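
As a rough illustration of the difference, the Python sketch below applies both settings to the simple example from the opening comment. The function `build_list` and the dictionaries are hypothetical, and printing reference OTUs that received no new sequences when printref=t is an assumption of the sketch.

```python
def build_list(model, fitted, printref):
    """Return {otu_label: [sequence names]} for the output list."""
    otus = {}
    for otu, members in model.items():
        new_members = [seq for seq, target in fitted.items() if target == otu]
        if printref:
            # Reference and fitted sequences are printed together.
            otus[otu] = list(members) + new_members
        elif new_members:
            # Only the fitted sequences, under the model's OTU labels.
            otus[otu] = new_members
    return otus


model = {
    "otu1": ["A", "B", "C", "D", "E"],
    "otu2": ["F", "G", "H", "I"],
    "otu3": ["J", "K"],
    "otu4": ["L"],
    "otu5": ["M"],
}
fitted = {"N": "otu4", "O": "otu2", "P": "otu5", "Q": "otu2", "R": "otu2"}

fit_only = build_list(model, fitted, printref=False)
with_ref = build_list(model, fitted, printref=True)
```

Here `fit_only` holds just otu2, otu4 and otu5 with the new sequences, matching the "New List" in the example, while `with_ref` keeps all five OTUs with the new sequences appended to the old members.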

#415
mothur-westcott added a commit that referenced this issue Oct 8, 2018
@mothur-westcott
Contributor Author

Add accnos parameter to assign reference sequences.

mothur-westcott added a commit that referenced this issue Dec 18, 2018
The accnos parameter allows you to assign reference sequences by name. This can save time: you can provide a single distance matrix containing all the sequence distances, rather than providing separate sample and reference matrices and having mothur calculate the distances between the sample and reference sequences.
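
To make the idea concrete, here is a small Python sketch of how a list of reference names can partition one combined set of distances. It is not mothur code: `read_accnos` and `split_distances` are hypothetical helpers, the distances are held in a plain dict keyed by name pairs, and the only format assumption is that an accnos file lists one sequence name per line.

```python
def read_accnos(text):
    """Parse accnos-style text: one sequence name per line."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def split_distances(distances, reference_names):
    """Partition pairwise distances by how many ends are reference sequences."""
    groups = {"ref-ref": {}, "ref-sample": {}, "sample-sample": {}}
    for (a, b), dist in distances.items():
        n_ref = (a in reference_names) + (b in reference_names)  # 0, 1 or 2
        key = ("sample-sample", "ref-sample", "ref-ref")[n_ref]
        groups[key][(a, b)] = dist
    return groups
```

With the names in hand, every distance in the combined matrix can be classified directly, so no separate sample-to-reference calculation is needed.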

#415
mothur-westcott added a commit that referenced this issue May 27, 2020
mothur-westcott added a commit that referenced this issue May 27, 2020
3 participants