# **Introduction**

0. [About Dataset](#t0.)
1. [Import Data & Libraries](#t1.)
2. [Exploratory Data Analysis (EDA)](#t2.)
    * 2.1. [Exploration of Label](#t2.1.)
3. [K Nearest Neighbors (KNN)](#t3.)
    * 3.1. [Elbow Method for Choosing Reasonable K Values](#t3.1.)
    * 3.2. [Grid Search for Choosing Reasonable K Values](#t3.2.)
    * 3.3. [Final Model](#t3.3.)

<a id="t0."></a>
# 0. About Dataset

Sonar (sound navigation and ranging) is a technique that uses sound propagation
(usually underwater, as in submarine navigation) to navigate, communicate with,
or detect objects on or under the surface of the water, such as other vessels.<br><br>
The data set contains the response metrics for 60 separate sonar frequencies sent
out against a known mine field (and known rocks). It has **208** rows and **61** columns: the 60 frequency responses plus the label.<br><br>
Each set of frequency responses is
labeled with the known object the sound was beamed at (either a rock or a
mine).

<a id="t1."></a>
# 1. Import Data & Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [None]:
df= pd.read_csv('../input/sonar-dataset-suitable-for-classification/sonar.all-data.csv')

Data overview:

In [None]:
pd.DataFrame([df.shape],index=['Sonar Dataset'],columns=['Rows','Columns'])

In [None]:
df.head()

In [None]:
df.info()

##### Check for missing values in the data

In [None]:
df.isna().sum().sum()

The result shows that the data has no missing values.

<a id="t2."></a>
# 2. Exploratory Data Analysis (EDA)

<a id="t2.1."></a>
## 2.1. Exploration of Label

In [None]:
sns.countplot(data=df,x='Label')

As the plot shows, the two label classes are roughly balanced.

<a id="t3."></a>
# 3. K Nearest Neighbors (KNN)

#### Determine the Features & Target Variable

In [None]:
X=df.drop('Label',axis=1)
y=df['Label']

#### Split the Data to Train & Test

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.30, random_state = 101)

#### Scaling the Features

In [None]:
from sklearn.preprocessing import StandardScaler

In [None]:
scaler=StandardScaler()

In [None]:
scaler.fit(X_train)

In [None]:
scaled_X_train=scaler.transform(X_train)
scaled_X_test=scaler.transform(X_test)
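
The scaling step above fits on the training set only and then applies the same statistics to both sets. A minimal NumPy sketch of that idea follows; the toy arrays and the `*_toy` names are assumptions for illustration, not the sonar data.

```python
import numpy as np

# Toy data: rows are samples, columns are features (values are made up).
X_train_toy = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test_toy = np.array([[2.0, 40.0]])

# "fit": learn the per-feature mean and standard deviation from the training set only.
mean = X_train_toy.mean(axis=0)
std = X_train_toy.std(axis=0)  # population std (ddof=0), which StandardScaler also uses

# "transform": apply the training statistics to both sets.
scaled_train_toy = (X_train_toy - mean) / std
scaled_test_toy = (X_test_toy - mean) / std

print(scaled_train_toy.mean(axis=0))  # close to 0 for every feature
print(scaled_train_toy.std(axis=0))   # close to 1 for every feature
print(scaled_test_toy)                # test rows reuse the training mean and std
```

Reusing the training statistics for the test set keeps information about the test data out of the preprocessing step.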

#### Train the Model

In [None]:
from sklearn.neighbors import KNeighborsClassifier

knn_model=KNeighborsClassifier(n_neighbors=1)
knn_model.fit(scaled_X_train, y_train)
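
To make the prediction rule concrete, here is a from-scratch sketch of k-nearest-neighbor voting with Euclidean distance. It is a simplified illustration, not scikit-learn's implementation (which uses optimized neighbor search and its own tie handling); `knn_predict` and the toy arrays are hypothetical names introduced here.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_query, k=1):
    """Predict a label for each query row by majority vote among its k nearest training rows."""
    predictions = []
    for row in X_query:
        # Euclidean distance from this query point to every training point.
        distances = np.sqrt(((X_train - row) ** 2).sum(axis=1))
        # Indices of the k smallest distances.
        nearest = np.argsort(distances)[:k]
        # Majority vote among the neighbors' labels.
        votes = Counter(y_train[i] for i in nearest)
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)

# Toy data: two "R" points near the origin, two "M" points near (1, 1).
X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_toy = np.array(["R", "R", "M", "M"])
queries = np.array([[0.05, 0.1], [1.0, 0.9]])

print(knn_predict(X_toy, y_toy, queries, k=1))
```

With `k=1` each query simply takes the label of its single closest training point, which is why this setting is the most sensitive to noisy samples.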

#### Predicting Test Data

In [None]:
y_pred=knn_model.predict(scaled_X_test)

In [None]:
pd.DataFrame({'Y_Test': y_test,'Y_Pred':y_pred})

#### Evaluating the Model

In [None]:
from sklearn.metrics import classification_report,confusion_matrix, accuracy_score,precision_score,f1_score,recall_score

In [None]:
# Calculate metrics, treating mines ("M") as the positive class
accuracy=accuracy_score(y_test,y_pred)
recall=recall_score(y_test,y_pred, average="binary", pos_label="M")
precision=precision_score(y_test,y_pred, average="binary", pos_label="M")
f1=f1_score(y_test,y_pred, average="binary", pos_label="M")

In [None]:
pd.DataFrame({'KNN Metrics': [accuracy, recall, precision,f1]}, index=['accuracy', 'recall', 'precision','f1'])
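
The four metrics above all derive from the confusion-matrix counts. The sketch below recomputes them by hand for a binary problem with `"M"` as the positive label; `binary_metrics` and the toy label lists are hypothetical and only illustrate the formulas.

```python
def binary_metrics(y_true, y_pred, pos_label="M"):
    """Compute accuracy, precision, recall and F1 from true/false positive and negative counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == pos_label and p == pos_label for t, p in pairs)
    fp = sum(t != pos_label and p == pos_label for t, p in pairs)
    fn = sum(t == pos_label and p != pos_label for t, p in pairs)
    tn = sum(t != pos_label and p != pos_label for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Made-up labels: 3 mines and 3 rocks, with one mistake in each direction.
y_true_toy = ["M", "M", "M", "R", "R", "R"]
y_pred_toy = ["M", "M", "R", "R", "R", "M"]

metrics_toy = binary_metrics(y_true_toy, y_pred_toy)
for name, value in metrics_toy.items():
    print(f"{name}: {value:.3f}")
```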

<a id="t3.1."></a>
## 3.1. Elbow Method for Choosing Reasonable K Values

#### Calculate accuracy and error rate for k values


In [None]:
test_error_rate=[]
acc = []
for k in range(1,40):
    knn_model=KNeighborsClassifier(n_neighbors=k)
    knn_model.fit(scaled_X_train,y_train)
    
    y_pred_test=knn_model.predict(scaled_X_test)
    accuracy=accuracy_score(y_test,y_pred_test)
    acc.append(accuracy)
    test_error_rate.append(1-accuracy)

In [None]:
plt.figure(figsize=(8,6))
plt.plot(range(1,40),test_error_rate,label='Test Error',color='#00e68a', linestyle='dashed', marker='o',
         markerfacecolor='#ff66cc', markersize=10)
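# Annotate the lowest test error instead of hard-coding its coordinates
best_k=int(np.argmin(test_error_rate))+1
plt.annotate(text='Optimal K',
            xy=(best_k, min(test_error_rate)),
            fontsize=20,
            xytext=(45, 60),
            textcoords='offset points',
            arrowprops=dict(arrowstyle='->', color='red'),
            bbox=dict(boxstyle='round', fc='0.8'))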
plt.title('Error Rate vs. K Value')
plt.xlabel('K Values')
plt.ylabel('Error Rate')

In [None]:
plt.figure(figsize=(10,6))
plt.plot(range(1,40),acc,color = 'blue',linestyle='dashed', 
         marker='o',markerfacecolor='red', markersize=10)
plt.title('Accuracy vs. K Value')
plt.xlabel('K Values')
plt.ylabel('Accuracy')

From the plots, we can see that the smallest test error is about 0.127, reached at K=1.
This corresponds to an accuracy of 0.873 at K=1, since error rate = 1 - accuracy.

#### Best **k = 1** via Elbow method
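
Reading the best k off the plot amounts to taking the k with the lowest error rate. A small sketch of that selection follows; the error values are hypothetical numbers chosen for illustration, not the notebook's results.

```python
# Hypothetical error rates for k = 1..5 (illustrative only).
k_candidates = [1, 2, 3, 4, 5]
error_rates = [0.13, 0.21, 0.19, 0.24, 0.22]

best_error = min(error_rates)
# index() returns the first position of the minimum, so ties go to the smaller k.
best_k_demo = k_candidates[error_rates.index(best_error)]
best_accuracy = 1 - best_error

print(f"best k = {best_k_demo}, error = {best_error:.3f}, accuracy = {best_accuracy:.3f}")
```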

<a id="t3.2."></a>
## 3.2. Grid Search for Choosing Reasonable K Values

#### Creating a Pipeline to find K value

In [None]:
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

In [None]:
scaler=StandardScaler()
knn_model=KNeighborsClassifier()

In [None]:
operations=[('scaler',scaler),('knn',knn_model)]
pipe=Pipeline(operations)

In [None]:
k_values=list(range(1,40))
param_grid={'knn__n_neighbors':k_values}

In [None]:
full_cv_classifier=GridSearchCV(pipe,param_grid,cv=5,scoring='accuracy')

In [None]:
# Pass the unscaled features: the pipeline's scaler standardizes them inside each CV fold
full_cv_classifier.fit(X_train,y_train)

In [None]:
full_cv_classifier.best_estimator_.get_params()

#### Best **k = 1** via Grid Search
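
The selected k can also be read directly from the fitted search object rather than from the full parameter dictionary. The sketch below shows this on synthetic data from `make_classification`; the data set and the `*_demo` names are assumptions, so the chosen k here says nothing about the sonar data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption; not the sonar data).
X_demo, y_demo = make_classification(n_samples=120, n_features=8, random_state=0)

# Scaling lives inside the pipeline, so each CV fold is scaled with its own training statistics.
pipe_demo = Pipeline([("scaler", StandardScaler()), ("knn", KNeighborsClassifier())])
grid_demo = GridSearchCV(
    pipe_demo,
    {"knn__n_neighbors": list(range(1, 16))},
    cv=5,
    scoring="accuracy",
)
grid_demo.fit(X_demo, y_demo)

# best_params_ holds only the searched parameters; best_score_ is the mean CV accuracy.
print(grid_demo.best_params_)
print(round(grid_demo.best_score_, 3))
```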

### **Note:** Optimum K value in K-Nearest Neighbors
In KNN, choosing the value of k is not easy. A small value of k means that noise has a larger influence on the result, while a large value makes prediction more computationally expensive and can blur the boundary between classes.

##### Data scientists usually choose:
1. An odd number if the number of classes is 2, so that a majority vote cannot tie.

2. Another simple approach is to set k = sqrt(n), where n is the number of data points in the training data.<br><br>
<a href='https://saravananthirumuruganathan.wordpress.com/2010/05/17/a-detailed-introduction-to-k-nearest-neighbor-knn-algorithm/'>**Source**</a>

In [None]:
print(f'Number of data points in training data is {len(X_train)}\n')
print(f'K =  {np.sqrt(len(X_train))}')


So the value of k is approximately 12.
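
The two rules of thumb can be combined in a small helper. This is only a sketch: `sqrt_rule_k` is a hypothetical function, and 145 is an assumed training-set size for a 70/30 split of 208 rows.

```python
import math

def sqrt_rule_k(n_train, force_odd=True):
    """Rule-of-thumb k: round sqrt(n_train), optionally bumped up to the next odd number."""
    k = max(1, round(math.sqrt(n_train)))
    if force_odd and k % 2 == 0:
        k += 1  # an odd k avoids tied votes when there are two classes
    return k

n_assumed = 145  # assumed training-set size, for illustration
print(sqrt_rule_k(n_assumed, force_odd=False))  # plain sqrt rule
print(sqrt_rule_k(n_assumed))                   # sqrt rule adjusted to an odd k
```

Under these assumptions the plain rule gives an even k, so the odd-k rule would nudge it to the neighboring odd value.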

<a id="t3.3."></a>
## 3.3. Final Model

##### **k = 1**

In [None]:
knn_model_1=KNeighborsClassifier(n_neighbors=1)
knn_model_1.fit(scaled_X_train,y_train)
y_pred_1= knn_model_1.predict(scaled_X_test)

In [None]:
print(classification_report(y_test, y_pred_1))

**Accuracy 87%**

##### **k = 12**

In [None]:
knn_model_12=KNeighborsClassifier(n_neighbors=12)
knn_model_12.fit(scaled_X_train,y_train)
y_pred_12= knn_model_12.predict(scaled_X_test)

In [None]:
print(classification_report(y_test, y_pred_12))

**Accuracy 68%**