KNN metric classifier

Used software:

Implement the kNN metric classifier;
Perform cross-validation and justify the chosen number of folds;
Visualize the data;
Configure the classifier with 2-3 metrics and 2-3 spatial transformations;
Assess quality with the Accuracy metric or, preferably, the F1-measure.

Dataset.txt - sample of objects: dot coordinates (x, y) and class {0, 1}.

Program structure:

Plot:
- buildPlotWithAllDots;
- buildPlotCentroid;
- buildPlotCircle.
Statistic:
- compareClasses;
- computingRecall;
- computingSpecificity;
- computingPrecision;
- computingAccuracy;
- computingF1_measure.
DatasetProcessing:
- getDataset;
- getTrainTestDots;
- getDotsByClass;
- computingManhattanDistance2D;
- computingManhattanDistance3D;
- computingEuclideanDistance2D;
- computingEuclideanDistance3D;
- getCentroid;
- classifyDotCentroid;
- classifyDotCircle;
- classifyKNNCentroid;
- classifyKNNCircle.
Presentation files:
- output_mesh;
- output_table (all vars fixed);
- output_table (k_fold cycle).

Training dots	Test dots	k_neighbors	k_fold	Kernel functions	Metrics for configuring kNN	Spatial coordinate transformations	F1-measure	Recall	Specificity	Precision	Accuracy
30	88	10	1	none	manhattan	none	0,76087	0,833333	0,666667	0,7	0,747126
30	88	10	2	none	manhattan	elliptic	0,833846	0,904255	0,6875	0,774081	0,804598
30	88	10	3	none	manhattan	elliptic	0,82012	0,875969	0,742424	0,773565	0,808429
30	88	10	4	none	euclidean	elliptic	0,843119	0,914634	0,76087	0,787364	0,833333
30	88	10	5	gaussian	manhattan	none	0,855745	0,904545	0,781395	0,812642	0,843678
30	88	10	6	none	euclidean	elliptic	0,856968	0,909091	0,782946	0,812181	0,846743
30	88	10	7	none	manhattan	elliptic	0,85414	0,904762	0,796825	0,811838	0,848933
30	88	10	8	gaussian	manhattan	elliptic	0,901953	0,925	0,863095	0,881464	0,895115
30	88	10	9	none	manhattan	none	0,905819	0,953086	0,830688	0,865636	0,893997
30	88	10	10	none	euclidean	elliptic	0,900326	0,940909	0,839535	0,866322	0,890805

Question: What is a linear classifier?

Answer: It is a classification algorithm based on constructing a linear separating surface. In the two-class case, the separating surface is a hyperplane that divides the feature space into two half-spaces. A metric classifier, by contrast, is based on the concept of similarity between objects. In this task, the similarity measure between objects is distance.
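
To make the metric approach concrete, here is a minimal kNN sketch in Python. The name knn_predict and the toy dots are hypothetical and are not taken from the project code; the sketch uses Euclidean distance and a plain majority vote, with no special tie handling.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training dots.

    `train` is a list of ((x, y), label) pairs; distance is Euclidean.
    """
    # Sort training dots by distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # most_common(1) returns the label that received the most votes.
    return votes.most_common(1)[0][0]

# Toy data for illustration only.
train = [((0.0, 0.0), 0), ((0.1, 0.2), 0),
         ((1.0, 1.0), 1), ((0.9, 1.1), 1), ((1.2, 0.8), 1)]
print(knn_predict(train, (0.05, 0.1), k=3))
print(knn_predict(train, (1.0, 0.9), k=3))
```

Unlike a linear classifier, nothing is fitted here: the training sample itself acts as the model, and only distances to it are compared.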
Question: What is empirical risk?

Answer: This is the average error value of the algorithm on the training sample.
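
As a small illustration, the sketch below computes the empirical risk in Python, assuming the 0-1 loss (an error counts as 1, a correct answer as 0). The function name empirical_risk is hypothetical; other loss functions are possible.

```python
def empirical_risk(y_true, y_pred):
    """Average 0-1 loss of the predictions over the training sample."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    # Count the objects on which the prediction differs from the true class.
    errors = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return errors / len(y_true)

# One error out of four training objects.
print(empirical_risk([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.25
```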
Question: How to determine the optimal number of neighbors (k_neighbors)?

Answer: As a rule, it is computed as sqrt(datasetStartSample). In this problem datasetStartSample = 118, so the optimal k_neighbors ≈ 10. This can be checked experimentally.
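
The rule of thumb can be sketched in Python as follows. Note that sqrt(118) is about 10.86, so getting 10 corresponds to rounding down; rounding down is an assumption of this sketch, and heuristic_k is a hypothetical name.

```python
import math

def heuristic_k(sample_size):
    """Rule-of-thumb neighbor count: integer part of sqrt(sample size)."""
    # math.isqrt returns the floor of the square root as an int.
    return math.isqrt(sample_size)

print(math.sqrt(118))    # about 10.86
print(heuristic_k(118))  # 10
```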
Question: Which cross-validation is used?

Answer: k-fold cross-validation was implemented.
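
For reference, a standard k-fold split can be sketched in Python as shown below. This is the textbook scheme and may differ from the split used in this project; k_fold_indices and the seed parameter are hypothetical names.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    indices = list(range(n))
    # Shuffle once so that folds do not depend on the file order.
    random.Random(seed).shuffle(indices)
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(118, 10))
for train_idx, test_idx in folds:
    print(len(train_idx), len(test_idx))
```

Each object lands in the test part exactly once, and the quality scores are then averaged over the k runs.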
Question: How to determine the optimal number of folds (k_fold)?

Answer: Experimentally. I got the best F1-measure (0,905819) with k_fold equal to 9 (because of the random shuffle, this number may vary slightly between runs). I think that 10 is the optimal k_fold value.
Question: What does your solution contain?

Answer: See the program structure, and also:

2 Spatial coordinate transformations:
- elliptic paraboloid;
- hyperbolic paraboloid.
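
A possible form of these two transformations is sketched below in Python: each 2D dot is lifted to 3D by adding a z coordinate. The exact formulas and the scale parameters a and b are assumptions (the standard paraboloid equations), not taken from the project code.

```python
def elliptic_paraboloid(x, y, a=1.0, b=1.0):
    """Lift (x, y) to (x, y, z) with z = x^2/a^2 + y^2/b^2."""
    return (x, y, (x / a) ** 2 + (y / b) ** 2)

def hyperbolic_paraboloid(x, y, a=1.0, b=1.0):
    """Lift (x, y) to (x, y, z) with z = x^2/a^2 - y^2/b^2."""
    return (x, y, (x / a) ** 2 - (y / b) ** 2)

print(elliptic_paraboloid(1.0, 2.0))    # (1.0, 2.0, 5.0)
print(hyperbolic_paraboloid(1.0, 2.0))  # (1.0, 2.0, -3.0)
```

After such a lift, distances are computed in 3D, which can separate classes that are not well separated in the original plane.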
2 Kernel functions:
- gaussian;
- logistic.
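
The sketch below shows one common way to use such kernels in Python: each neighbor votes with a weight that depends on its distance. The kernel formulas are the standard textbook ones; weighted_vote and the bandwidth parameter are hypothetical and may not match the project code.

```python
import math

def gaussian_kernel(u):
    """Gaussian kernel: exp(-u^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def logistic_kernel(u):
    """Logistic kernel: 1 / (e^u + 2 + e^-u)."""
    return 1.0 / (math.exp(u) + 2.0 + math.exp(-u))

def weighted_vote(neighbors, kernel, bandwidth):
    """Return the class with the largest total kernel weight.

    `neighbors` is a list of (distance, label) pairs.
    """
    scores = {}
    for distance, label in neighbors:
        # Closer neighbors get a larger weight.
        scores[label] = scores.get(label, 0.0) + kernel(distance / bandwidth)
    return max(scores, key=scores.get)

# One very close neighbor of class 0 and two distant neighbors of class 1.
neighbors = [(0.1, 0), (0.9, 1), (1.0, 1)]
print(weighted_vote(neighbors, gaussian_kernel, bandwidth=0.3))
print(weighted_vote(neighbors, logistic_kernel, bandwidth=0.3))
```

In this toy example a plain majority vote would pick class 1, while the kernel-weighted vote picks class 0 because the single close neighbor outweighs the two distant ones.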
2 Metrics for configuring kNN:
- manhattan distance (p=1);
- euclidean distance (p=2).
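
Both metrics are special cases of the Minkowski distance with parameter p. A minimal Python sketch that works for 2D and 3D dots is given below; minkowski_distance is a hypothetical name and is not one of the project functions listed above.

```python
def minkowski_distance(a, b, p):
    """Minkowski distance between two dots of equal dimension."""
    if len(a) != len(b):
        raise ValueError("dots must have the same dimension")
    # p=1 gives the Manhattan distance, p=2 the Euclidean distance.
    return sum(abs(ai - bi) ** p for ai, bi in zip(a, b)) ** (1.0 / p)

print(minkowski_distance((0, 0), (3, 4), p=1))        # 7.0
print(minkowski_distance((0, 0), (3, 4), p=2))        # 5.0
print(minkowski_distance((0, 0, 0), (1, 2, 2), p=2))  # 3.0
```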
Quality assessment:
- Sensitivity or Recall;
- Specificity;
- Precision;
- Accuracy;
- F1-measure.
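
For reference, all five scores can be computed from the confusion-matrix counts, as in the Python sketch below. The function name quality_metrics is hypothetical, and the counts in the example are made up for illustration; they are not taken from the results table.

```python
def quality_metrics(tp, fp, tn, fn):
    """Compute the five quality scores from confusion-matrix counts.

    Assumes every denominator is non-zero (both classes occur and at
    least one object is predicted positive).
    """
    recall = tp / (tp + fn)  # also called sensitivity
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }

# Made-up counts for illustration only.
scores = quality_metrics(tp=40, fp=10, tn=30, fn=8)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```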