## Importing Libraries

In [56]:
import knn
import pandas as pd

## Training and Testing Data Frames

In [35]:
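training_csv_name = 'atomsradii.csv'
testing_csv_name = 'testing.csv'
training_df = pd.read_csv(training_csv_name)
testing_df = pd.read_csv(testing_csv_name)
columns = [0, 1]    # positional indices of the feature columns
class_column = 3    # positional index of the class label column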

## KNN Library

### euclidean_dist function

The euclidean_dist function takes two points as input and calculates the Euclidean distance between them.  
The euclidean distance is given by:  
**Euclidean Distance = ((X<sub>1</sub> - X<sub>2</sub>)<sup>2</sup> + (Y<sub>1</sub> - Y<sub>2</sub>)<sup>2</sup>)<sup>1/2</sup>**
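
The formula above covers two coordinates, but the same sum of squared differences extends to any number of them. The following is a minimal sketch under that assumption; `euclidean_dist_sketch` is a hypothetical name used for illustration and is not the actual `knn.euclidean_dist` implementation, whose source is not shown in this notebook.

```python
import math

def euclidean_dist_sketch(p, q):
    """Hypothetical stand-in for knn.euclidean_dist (real source not shown)."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean_dist_sketch([0, 1], [3, 5]))  # 5.0
```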

#### Example 1

Euclidean Distance between [1,1] and [2,2]

In [6]:
knn.euclidean_dist([1,1],[2,2])

1.4142135623730951

#### Example 2

Euclidean Distance between [0,0] and [2,2]

In [9]:
knn.euclidean_dist([0,0],[2,2])

2.8284271247461903

#### Example 3

Euclidean Distance between [0,1] and [3,5]

In [10]:
knn.euclidean_dist([0,1],[3,5])

5.0

### sorted_euclidean_dist function

The sorted_euclidean_dist function calculates the Euclidean distance between one point and every point in the training data frame. It returns a list of (distance, class) tuples sorted by ascending distance, where class is the class of the corresponding point in the training data frame.
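
One way such a function could be written is sketched here. This is an illustration only: `sorted_dist_sketch` is a hypothetical name, the toy data frame contains made-up values, and the assumption that columns are addressed by position (via `iloc`) is inferred from the integer indices used in this notebook rather than from the `knn` source.

```python
import math
import pandas as pd

def sorted_dist_sketch(point, df, columns, class_column):
    """Hypothetical stand-in for knn.sorted_euclidean_dist (real source not shown)."""
    features = df.iloc[:, columns].to_numpy(dtype=float)  # feature columns by position
    labels = df.iloc[:, class_column].tolist()            # class labels by position
    pairs = [
        (math.sqrt(sum((a - b) ** 2 for a, b in zip(point, row))), label)
        for row, label in zip(features, labels)
    ]
    # Sort on the distance only, so labels are never compared on ties.
    return sorted(pairs, key=lambda pair: pair[0])

# Made-up toy data: two feature columns, a name column, and a class column.
toy_df = pd.DataFrame({
    "x": [0.0, 3.0, 1.0],
    "y": [0.0, 4.0, 0.0],
    "name": ["a", "b", "c"],
    "label": ["A", "B", "A"],
})
toy_sorted = sorted_dist_sketch([0, 0], toy_df, [0, 1], 3)
print(toy_sorted)  # [(0.0, 'A'), (1.0, 'A'), (5.0, 'B')]
```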

#### Example 1

Sorted euclidean distance between the point [0,0] and all points in the training data frame.

In [19]:
sorted_ed1 = knn.sorted_euclidean_dist([0,0],training_df,columns,class_column)
sorted_ed1

[(0.67468511173732, 'Alk'),
 (0.6977105417004963, 'TM'),
 (0.8154140052758476, 'TM'),
 (0.9052071586106686, 'Alk'),
 (0.9264987857520376, 'PT'),
 (1.0080674580602234, 'Alk'),
 (1.0480935072788116, 'Alk'),
 (1.122007130102122, 'PT'),
 (1.1676472069936192, 'PT'),
 (1.2880993750483696, 'PT'),
 (1.3433167906342867, 'Alk'),
 (1.3542894816101911, 'PT'),
 (1.4641379716406513, 'Alk'),
 (1.5250245899656834, 'Alk'),
 (1.627912774076056, 'Alk')]

#### Example 2

Sorted euclidean distance between the point [0.5,0.5] and all points in the training data frame.

In [20]:
sorted_ed2 = knn.sorted_euclidean_dist([0.5,0.5],training_df,columns,class_column)
sorted_ed2

[(0.18681541692269407, 'TM'),
 (0.21633307652783934, 'TM'),
 (0.28, 'PT'),
 (0.39395431207184417, 'Alk'),
 (0.43462627624201466, 'PT'),
 (0.44654227123532214, 'Alk'),
 (0.4933558553417604, 'PT'),
 (0.5554277630799527, 'Alk'),
 (0.5798275605729689, 'Alk'),
 (0.5993329625508679, 'PT'),
 (0.6664082832618455, 'PT'),
 (0.7310950690573695, 'Alk'),
 (0.8448076704197235, 'Alk'),
 (0.8807383266328315, 'Alk'),
 (0.9798469268207152, 'Alk')]

#### Example 3

Sorted euclidean distance between the point [1,1] and all points in the training data frame.

In [21]:
sorted_ed3 = knn.sorted_euclidean_dist([1,1],training_df,columns,class_column)
sorted_ed3

[(0.23259406699226015, 'PT'),
 (0.2433105012119288, 'PT'),
 (0.3448187929913333, 'PT'),
 (0.3512833614050059, 'PT'),
 (0.47507894080878826, 'Alk'),
 (0.5142956348249516, 'Alk'),
 (0.5197114584074513, 'Alk'),
 (0.5326349594234311, 'Alk'),
 (0.5462600113499065, 'PT'),
 (0.6363175307973213, 'TM'),
 (0.7200694410957876, 'Alk'),
 (0.7611832893594026, 'Alk'),
 (0.7789736837660178, 'TM'),
 (0.8100617260431455, 'Alk'),
 (0.9247702417357513, 'Alk')]

### class_prediction function

The class_prediction function takes the sorted Euclidean distance list and a value of k as input and predicts the class of the point from its k nearest neighbours.
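
A common way to turn the k nearest entries into a prediction is a majority vote. The sketch below assumes that rule; `class_prediction_sketch` is a hypothetical name, and both the voting rule and the tie-breaking behaviour (the label seen first among the tied ones wins) are assumptions, since the `knn` source is not shown. The input is the first three entries of the [0,0] distance list from this notebook.

```python
from collections import Counter

def class_prediction_sketch(sorted_pairs, k):
    """Hypothetical majority-vote stand-in for knn.class_prediction."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Keep only the labels of the k closest training points.
    nearest_labels = [label for _, label in sorted_pairs[:k]]
    # most_common orders equal counts by first appearance (assumed tie rule).
    return Counter(nearest_labels).most_common(1)[0][0]

nearest_three = [
    (0.67468511173732, "Alk"),
    (0.6977105417004963, "TM"),
    (0.8154140052758476, "TM"),
]
print(class_prediction_sketch(nearest_three, 3))  # TM
```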

#### Example 1

Taking the sorted_ed1 above for the point [0,0] and the value of k as 3, let's predict the class of the point [0,0]

In [24]:
knn.class_prediction(sorted_ed1,3)

'TM'

#### Example 2

Taking the sorted_ed2 above for the point [0.5,0.5] and the value of k as 4, let's predict the class of the point [0.5,0.5]

In [27]:
knn.class_prediction(sorted_ed2,4)

'TM'

#### Example 3

Taking the sorted_ed3 above for the point [1,1] and the value of k as 5, let's predict the class of the point [1,1]

In [26]:
knn.class_prediction(sorted_ed3,5)

'PT'

### knn_classifier function

The knn_classifier function is a wrapper that takes the training data frame, an input point, a value of k, the column indices of the features in the training data frame, and the class column index as input and predicts the class of the input point.
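
To illustrate how the distance ranking and the prediction step might fit together in a single call, here is a compact sketch. `knn_classifier_sketch` is a hypothetical name, the argument order mirrors the calls in the examples that follow, the majority vote is an assumed decision rule, and the toy data is made up.

```python
import math
from collections import Counter
import pandas as pd

def knn_classifier_sketch(train_df, point, k, columns, class_column):
    """Hypothetical stand-in for knn.knn_classifier (real source not shown)."""
    features = train_df.iloc[:, columns].to_numpy(dtype=float)
    labels = train_df.iloc[:, class_column].tolist()
    # Step 1: rank every training point by its distance to the query point.
    ranked = sorted(
        ((math.sqrt(sum((a - b) ** 2 for a, b in zip(point, row))), label)
         for row, label in zip(features, labels)),
        key=lambda pair: pair[0],
    )
    # Step 2: majority vote among the k nearest labels (assumed rule).
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Made-up toy data with the same column layout as the notebook's indices.
toy_train = pd.DataFrame({
    "x": [0.0, 0.0, 5.0, 5.0],
    "y": [0.0, 1.0, 5.0, 6.0],
    "name": ["a", "b", "c", "d"],
    "label": ["A", "A", "B", "B"],
})
toy_prediction = knn_classifier_sketch(toy_train, [0.2, 0.1], 3, [0, 1], 3)
print(toy_prediction)  # A
```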

#### Example 1

Predicted class of the point [0,0] with k = 3

In [30]:
knn.knn_classifier(training_df,[0,0],3,[0,1],3)

'TM'

#### Example 2

Predicted class of the point [0.5,0.5] with k = 4

In [31]:
knn.knn_classifier(training_df,[0.5,0.5],4,[0,1],3)

'TM'

#### Example 3

Predicted class of the point [1,1] with k = 5

In [32]:
knn.knn_classifier(training_df,[1,1],5,[0,1],3)

'PT'

### knn_accuracy function

The knn_accuracy function is a wrapper that takes the training data frame, the testing data frame, a list of values of k, the class column index, and the column indices of the features as input and returns a dictionary mapping each value of k to its accuracy (as a percentage).
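
The accuracy calculation itself can be sketched independently of the classifier. In the hypothetical `knn_accuracy_sketch` below, the classifier is passed in as a callable with the assumed signature `classify(train_df, point, k, columns, class_column)`, and accuracy is taken to be the percentage of test rows whose predicted class equals the true class. The stub classifier and the toy data are made up purely to exercise the arithmetic.

```python
import pandas as pd

def knn_accuracy_sketch(classify, train_df, test_df, k_values, class_column, columns):
    """Hypothetical stand-in for knn.knn_accuracy (real source not shown)."""
    if len(test_df) == 0:
        raise ValueError("testing data frame is empty")
    accuracies = {}
    for k in k_values:
        correct = 0
        for i in range(len(test_df)):
            point = test_df.iloc[i, columns].tolist()  # feature values of row i
            truth = test_df.iloc[i, class_column]      # true class of row i
            if classify(train_df, point, k, columns, class_column) == truth:
                correct += 1
        accuracies[k] = 100.0 * correct / len(test_df)
    return accuracies

def always_a(train_df, point, k, columns, class_column):
    """Placeholder classifier that ignores its inputs and always predicts 'A'."""
    return "A"

# Made-up test data: three of the five rows belong to class 'A'.
toy_test = pd.DataFrame({
    "x": [0.1, 4.9, 0.3, 0.2, 5.1],
    "y": [0.2, 5.2, 0.1, 0.4, 5.0],
    "name": ["p", "q", "r", "s", "t"],
    "label": ["A", "B", "A", "A", "B"],
})
toy_accuracy = knn_accuracy_sketch(always_a, toy_test, toy_test, [1, 3], 3, [0, 1])
print(toy_accuracy)  # {1: 60.0, 3: 60.0}
```

With five test rows, each correct prediction contributes 20 percentage points, so every accuracy is a multiple of 20.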

#### Example 1

The accuracy of the KNN classifier for the testing data frame for k values 1,2,3,4, and 5

In [49]:
k_list1 = [1,2,3,4,5]

In [50]:
knn.knn_accuracy(training_df,testing_df,k_list1,class_column,columns)

{1: 80.0, 2: 60.0, 3: 60.0, 4: 80.0, 5: 60.0}

#### Example 2

The accuracy of the KNN classifier for the testing data frame for k values 2,4, and 6

In [51]:
k_list2 = [2,4,6]

In [52]:
knn.knn_accuracy(training_df,testing_df,k_list2,class_column,columns)

{2: 60.0, 4: 80.0, 6: 60.0}

#### Example 3

The accuracy of the KNN classifier for the testing data frame for k values 1,3, and 5

In [53]:
k_list3 = [1,3,5]

In [54]:
knn.knn_accuracy(training_df,testing_df,k_list3,class_column,columns)

{1: 80.0, 3: 60.0, 5: 60.0}