- helpString += "The cluster command parameter options are phylip, column, name, count, method, cutoff, precision, sim, showabund and timing. Fasta or Phylip or column and name are required.\n";
+ helpString += "The cluster command parameter options are phylip, column, name, count, method, cutoff, precision, sim, showabund, timing, metric, iters, initialize. Fasta or Phylip or column and name are required.\n";
//helpString += "The adjust parameter is used to handle missing distances. If you set a cutoff, adjust=f by default. If not, adjust=t by default. Adjust=f, means ignore missing distances and adjust cutoff as needed with the average neighbor method. Adjust=t, will treat missing distances as 1.0. You can also set the value the missing distances should be set to, adjust=0.5 would give missing distances a value of 0.5.\n";
helpString += "The phylip and column parameters allow you to enter your distance file. \n";
helpString += "The fasta parameter allows you to enter your fasta file for use with the agc or dgc methods. \n";
helpString += "The name parameter allows you to enter your name file. \n";
helpString += "The count parameter allows you to enter your count file. \n A count or name file is required if your distance file is in column format.\n";
helpString += "The iters parameter allows you to set the maxiters for the opticluster method. \n";
helpString += "The metric parameter allows you to select the metric in the opticluster method. Options are Matthews correlation coefficient (mcc), sensitivity (sens), specificity (spec), true positives + true negatives (tptn), false positives + false negatives (fpfn), true positives (tp), true negatives (tn), false positives (fp), false negatives (fn), f1score (f1score), accuracy (accuracy), positive predictive value (ppv), negative predictive value (npv), false discovery rate (fdr). Default=mcc.\n";
+ helpString += "The initialize parameter allows you to select the initial OTU assignment for the opticluster method. Options are singleton, meaning each sequence is assigned to its own OTU, or oneotu, meaning all sequences are assigned to one OTU. Default=singleton.\n";
helpString += "The delta parameter allows you to set the stable value for the metric in the opticluster method (delta=0.0000). \n";
helpString += "The method parameter allows you to enter your clustering method. Options are furthest, nearest, average, weighted, agc, dgc and opti. Default=average. The agc and dgc methods require a fasta file. \n";
helpString += "The processors parameter allows you to specify the number of processors to use. The default is 1.\n";
- helpString += "The cluster.split command parameter options are file, fasta, phylip, column, name, count, cutoff, precision, method, splitmethod, taxonomy, taxlevel, showabund, timing, large, cluster, iters, delta, dist, processors. Fasta or Phylip or column and name are required.\n";
+ helpString += "The cluster.split command parameter options are file, fasta, phylip, column, name, count, cutoff, precision, method, splitmethod, taxonomy, taxlevel, showabund, timing, large, cluster, iters, delta, initialize, dist, processors. Fasta or Phylip or column and name are required.\n";
helpString += "The cluster.split command can split your files in 3 ways. Splitting by distance file, by classification, or by classification also using a fasta file. \n";
helpString += "For the distance file method, you need only provide your distance file and mothur will split the file into distinct groups. \n";
helpString += "For the classification method, you need to provide your distance file and taxonomy file, and set the splitmethod to classify. \n";
helpString += "The iters parameter allows you to set the maxiters for the opticluster method. \n";
helpString += "The metric parameter allows you to select the metric in the opticluster method. Options are Matthews correlation coefficient (mcc), sensitivity (sens), specificity (spec), true positives + true negatives (tptn), false positives + false negatives (fpfn), true positives (tp), true negatives (tn), false positives (fp), false negatives (fn), f1score (f1score), accuracy (accuracy), positive predictive value (ppv), negative predictive value (npv), false discovery rate (fdr). Default=mcc.\n";
helpString += "The delta parameter allows you to set the stable value for the metric in the opticluster method. Default=0.000\n";
+ helpString += "The initialize parameter allows you to select the initial OTU assignment for the opticluster method. Options are singleton, meaning each sequence is assigned to its own OTU, or oneotu, meaning all sequences are assigned to one OTU. Default=singleton.\n";
helpString += "The method parameter allows you to enter your clustering method. Options are furthest, nearest, average, weighted, agc, dgc and opti. Default=average. The agc and dgc methods require a fasta file. \n";
helpString += "The splitmethod parameter allows you to specify how you want to split your distance file before you cluster, default=distance, options distance, classify or fasta. \n";
helpString += "The taxonomy parameter allows you to enter the taxonomy file for your sequences, this is only valid if you are using splitmethod=classify. Be sure your taxonomy file does not include the probability scores. \n";
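The metric help text above names the Matthews correlation coefficient (mcc) as the default. As a rough illustration of what that metric measures, the sketch below computes the standard MCC from the four confusion-matrix counts. The function name `matthewsCorrelation` is hypothetical and not taken from mothur, and returning 0 when the denominator is zero is an assumed convention, not necessarily what mothur does.

```cpp
#include <cmath>

// Hypothetical helper, not part of mothur: standard Matthews correlation
// coefficient from true/false positive/negative pair counts.
// Counts are taken as double so the products below do not overflow.
double matthewsCorrelation(double tp, double tn, double fp, double fn) {
    const double denom =
        std::sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn));
    // Assumed convention: report 0 when any marginal sum is zero.
    if (denom == 0.0) { return 0.0; }
    return (tp * tn - fp * fn) / denom;
}
```

Under this convention, an all-singleton starting state (no true or false positives) scores 0, which leaves the optimizer room to improve the metric as sequences are moved between OTUs.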
@@ -25,27 +25,52 @@ int OptiCluster::initialize(double& value, bool randomize) {
seqBin[numSeqs] = -1;
insertLocation = numSeqs;
- for (int i = 0; i < numSeqs; i++) { bins[i].push_back(i); }
-
- //maps randomized sequences to bins
- for (int i = 0; i < numSeqs; i++) {
- seqBin[i] = bins[i][0];
- randomizeSeqs.push_back(i);
- }
-
- if (randomize) { random_shuffle(randomizeSeqs.begin(), randomizeSeqs.end()); }
-
- //for each sequence (singletons removed on read)
- for (map<int, int>::iterator it = seqBin.begin(); it != seqBin.end(); it++) {
- if (it->second == -1) { }
- else {
- longlong numCloseSeqs = (matrix->getCloseSeqs(it->first)).size(); //does not include self
- falseNegatives += numCloseSeqs;
+ if (initialize == "singleton") {
+
+ //put everyone in own bin
+ for (int i = 0; i < numSeqs; i++) { bins[i].push_back(i); }
+
+ //maps randomized sequences to bins
+ for (int i = 0; i < numSeqs; i++) {
+ seqBin[i] = bins[i][0];
+ randomizeSeqs.push_back(i);
+ }
+
+ if (randomize) { random_shuffle(randomizeSeqs.begin(), randomizeSeqs.end()); }
+
+ //for each sequence (singletons removed on read)
+ for (map<int, int>::iterator it = seqBin.begin(); it != seqBin.end(); it++) {
+ if (it->second == -1) { }
+ else {
+ longlong numCloseSeqs = (matrix->getCloseSeqs(it->first)).size(); //does not include self
+ falseNegatives += numCloseSeqs;
+ }
+ }
+ falseNegatives /= 2; //square matrix
+ trueNegatives = numSeqs * (numSeqs-1)/2 - (falsePositives + falseNegatives + truePositives); //since everyone is a singleton no one clusters together. True negative = num far apart
- trueNegatives = numSeqs * (numSeqs-1)/2 - (falsePositives + falseNegatives + truePositives); //since everyone is a singleton no one clusters together. True negative = num far apart
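The singleton branch above can be read as a counting argument: with every sequence in its own OTU, no pair is clustered together, so every close pair is a false negative and every remaining pair is a true negative. The sketch below restates that with plain containers. `ConfusionCounts`, `initialCounts` and `closeSeqs` are hypothetical names, not mothur's. The `oneotu` counts are inferred from the help text ("all sequences are assigned to one OTU") and are an assumption, since that branch is not shown in this hunk.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical container for the four pair counts.
struct ConfusionCounts {
    std::int64_t tp = 0, tn = 0, fp = 0, fn = 0;
};

// closeSeqs[i] lists the sequences within the cutoff of sequence i,
// excluding i itself; the relation is assumed to be symmetric.
ConfusionCounts initialCounts(const std::vector<std::vector<int>>& closeSeqs,
                              const std::string& initialize) {
    const std::int64_t n = static_cast<std::int64_t>(closeSeqs.size());
    std::int64_t closePairs = 0;
    for (const auto& row : closeSeqs) {
        closePairs += static_cast<std::int64_t>(row.size());
    }
    closePairs /= 2;  // each close pair was counted from both ends
    const std::int64_t totalPairs = n * (n - 1) / 2;

    ConfusionCounts c;
    if (initialize == "singleton") {
        // Nobody is clustered together: close pairs are missed,
        // distant pairs are correctly kept apart.
        c.fn = closePairs;
        c.tn = totalPairs - closePairs;
    } else {
        // Assumed oneotu behavior: everybody is clustered together.
        c.tp = closePairs;
        c.fp = totalPairs - closePairs;
    }
    return c;
}
```

For four sequences where 0-1 and 1-2 are the only close pairs, this gives fn=2, tn=4 for `singleton` and tp=2, fp=4 for `oneotu`.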