
# K-Nearest Neighbors (KNN) with scikit-learn

This notebook uses the **K-Nearest Neighbors (KNN)** algorithm 
to classify cars by acceptability using the **car evaluation dataset**. 

## Steps Involved:
1. Import the necessary libraries
2. Load the dataset
3. Preprocess the data
4. Prepare the features and target
5. Split the data into train and test sets
6. Train the KNN model
7. Evaluate the model's performance
8. Make predictions and analyze neighbors


In [None]:

# Step 1: Import necessary libraries
import pandas as pd
import sklearn.model_selection  # needed for sklearn.model_selection.train_test_split
from sklearn import preprocessing
from sklearn.neighbors import KNeighborsClassifier



## Step 2: Load the Dataset

We are using the **Car Evaluation Dataset**, which evaluates cars based on various attributes such as:
- Buying Price (`buying`)
- Maintenance Price (`maint`)
- Number of Doors (`door`)
- Capacity (`persons`)
- Size of Luggage Boot (`lug_boot`)
- Safety (`safety`)

The **target column** is the car's acceptability (`class`), which can be one of:
- `unacc` (unacceptable)
- `acc` (acceptable)
- `good` (good)
- `vgood` (very good)


In [None]:

# Load the dataset
# Assumes car.data starts with a header row:
# buying,maint,door,persons,lug_boot,safety,class
data = pd.read_csv("car.data")
print(data.head())  # Display first few rows of the dataset



## Step 3: Data Preprocessing

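We need to **convert categorical data** into numerical values because KNN computes distances 
between feature vectors, and scikit-learn estimators expect numeric input. We use **Label Encoding**, 
which maps each distinct category in a column to an integer. `LabelEncoder` assigns these integers 
in sorted (alphabetical) order of the labels, not in any order of meaning.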


In [None]:

# Initialize LabelEncoder
le = preprocessing.LabelEncoder()

# Transform categorical data into numerical values.
# Each fit_transform call refits the same encoder, so after the last call
# `le` only remembers the mapping for the "class" column.
buying = le.fit_transform(list(data["buying"]))
maint = le.fit_transform(list(data["maint"]))
door = le.fit_transform(list(data["door"]))
persons = le.fit_transform(list(data["persons"]))
lug_boot = le.fit_transform(list(data["lug_boot"]))
safety = le.fit_transform(list(data["safety"]))
cls = le.fit_transform(list(data["class"]))
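
To see which integer each category receives, the encoder's `classes_` attribute can be inspected after fitting. The sketch below uses a small hand-written list of labels rather than the real `class` column; the names `labels`, `encoder`, `codes`, `class_names`, and `decoded` are illustrative only.

```python
from sklearn.preprocessing import LabelEncoder

# Toy labels standing in for the `class` column (not read from car.data)
labels = ["unacc", "acc", "vgood", "good", "unacc"]

encoder = LabelEncoder()
codes = encoder.fit_transform(labels)

# classes_ holds the original labels in sorted order;
# the position of a label in this list is its integer code
class_names = list(encoder.classes_)
print(class_names)
print(codes.tolist())

# inverse_transform maps integer codes back to the original strings
decoded = list(encoder.inverse_transform(codes))
print(decoded)
```

Keeping one encoder per column (instead of reusing a single one) would make `inverse_transform` available for every feature, at the cost of a few extra variables.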



## Step 4: Prepare Data for Training

We select the features (`X`) and the target (`y`). 
The features include attributes such as buying price, maintenance, etc., 
while the target is the car's acceptability class.


In [None]:

# Select features (X) and target (y)
X = list(zip(buying, maint, door, persons, lug_boot, safety))
y = list(cls)



## Step 5: Split the Data

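We split the data into **training** and **testing** sets using a 90-10 split. 
This ensures that the model is evaluated on data it has not seen before. 
Because no `random_state` is passed, the split, and therefore the reported accuracy, varies from run to run.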


In [None]:

# Split the data into training and testing sets
x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, test_size=0.1)
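
If repeatable results are wanted, one option is to fix the random seed and stratify the split by class. The following is a minimal sketch on toy data, not a change to the notebook's own split; `X_demo`, `y_demo`, and the seed value `42` are arbitrary choices made for illustration.

```python
from sklearn.model_selection import train_test_split

# Toy data: 20 samples, two classes with 10 samples each (illustrative only)
X_demo = [(i, i % 3) for i in range(20)]
y_demo = [0] * 10 + [1] * 10

# random_state fixes the shuffle; stratify keeps class proportions
# roughly equal in the train and test sets
x_tr, x_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.1, random_state=42, stratify=y_demo
)

# Repeating the call with the same seed yields the same split
x_tr2, x_te2, y_tr2, y_te2 = train_test_split(
    X_demo, y_demo, test_size=0.1, random_state=42, stratify=y_demo
)

print(len(x_tr), len(x_te))
print(x_te == x_te2)
```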



## Step 6: Train the KNN Model

We initialize the **KNeighborsClassifier** with `n_neighbors=9` (k = 9, so each prediction is a vote among the 9 nearest training points) 
and train it on the training data.


In [None]:

# Initialize and train the KNN model
model = KNeighborsClassifier(n_neighbors=9)
model.fit(x_train, y_train)
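
To make the idea behind the classifier concrete, here is a rough from-scratch sketch of a KNN prediction: measure the distance from a query point to every training point, keep the k closest, and take a majority vote. It assumes Euclidean distance and unweighted votes, which match the `KNeighborsClassifier` defaults, but it is not how scikit-learn implements the search internally, and tie-breaking may differ. The function `knn_predict` and the toy data are hypothetical.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k):
    """Predict a label by majority vote among the k closest training points."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k pairs with the smallest distances
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbors' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Two well-separated toy clusters (illustrative only)
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
point_labels = ["a", "a", "a", "b", "b", "b"]

near_a = knn_predict(points, point_labels, (1, 1), k=3)
near_b = knn_predict(points, point_labels, (5, 4), k=3)
print(near_a, near_b)
```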



## Step 7: Evaluate the Model

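We calculate the **accuracy** of the model on the testing set. For a classifier, `model.score` returns the mean accuracy: the fraction of test samples whose predicted class matches the actual class.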


In [None]:

# Calculate accuracy of the model
acc = model.score(x_test, y_test)
print("Accuracy =", acc)
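
Accuracy is a single number, so it can hide which classes are being confused. As a hedged illustration, the sketch below computes accuracy by hand and with `accuracy_score`, then builds a confusion matrix. The label lists `y_true` and `y_pred` are made up for the example and are not results from the car dataset.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up encoded labels (illustrative only, not model output)
y_true = [0, 1, 2, 2, 3, 2]
y_pred = [0, 2, 2, 2, 3, 1]

# Accuracy by hand: share of positions where prediction equals truth
manual_acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# The same quantity via scikit-learn
acc_demo = accuracy_score(y_true, y_pred)

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2, 3])

print(manual_acc, acc_demo)
print(cm)
```

In the notebook itself, the same calls could be applied to `y_test` and the output of `model.predict(x_test)`.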



## Step 8: Make Predictions

We use the trained model to make predictions on the test data and compare them with the actual labels.


In [None]:

# Predict the class for test data
predicted = model.predict(x_test)

# LabelEncoder assigns codes in sorted label order: acc=0, good=1, unacc=2, vgood=3
names = ["acc", "good", "unacc", "vgood"]

# Display predictions along with actual values and nearest neighbors
for i in range(len(predicted)):
    print(f"Predicted: {names[predicted[i]]}, Data: {x_test[i]}, Actual: {names[y_test[i]]}")

    # Get the 9 nearest neighbors: a tuple of (distances, indices into the training set)
    neighbors = model.kneighbors([x_test[i]], n_neighbors=9, return_distance=True)
    print("Neighbors:", neighbors)
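
The value returned by `kneighbors` is easier to read once unpacked. With `return_distance=True` it is a pair of arrays, distances and indices, each with one row per query point, and the indices refer to rows of the training data. The sketch below shows this on a toy training set; `train_X`, `train_y`, and `knn` are illustrative names, and `n_neighbors=3` is chosen only to keep the output short.

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy training set with two clusters (illustrative only)
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(train_X, train_y)

# One query point, so both arrays have shape (1, 3)
distances, indices = knn.kneighbors([(5, 5)], n_neighbors=3, return_distance=True)

# Look up the labels of the neighbors through their training-set indices
neighbor_labels = [train_y[j] for j in indices[0]]

print(distances)
print(indices)
print(neighbor_labels)
```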
