# 6. KNN

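```python
# Load the dataset
import pandas as pd

# 'Unnamed: 0' is most likely a row index saved along with the CSV; it is not used as a feature below
df = pd.read_csv("ml_customer_data.csv")
df.head()
```

| Unnamed: 0 | age | salary | purchased |
|------------|-----|--------|-----------|
| 0          | 56  | 19000  | 0         |
| 1          | 46  | 85588  | 1         |
| 2          | 32  | 53304  | 1         |
| 3          | 60  | 84449  | 1         |
| 4          | 25  | 97986  | 0         |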


```python
# Prepare the features and target
from sklearn.model_selection import train_test_split

X = df[['age', 'salary']]
y = df['purchased']

# Hold out 30% of the rows for testing; random_state fixes the shuffle for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
```

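```python
# Train the K-Nearest Neighbors model
from sklearn.neighbors import KNeighborsClassifier

# Default metric is Minkowski with p=2 (Euclidean distance), applied here to raw, unscaled features
model = KNeighborsClassifier(n_neighbors=5)  # n_neighbors (k) can be tuned later
model.fit(X_train, y_train)
```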
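To make the prediction step concrete, and to show why unscaled `salary` dominates the distance, here is a minimal NumPy sketch of majority-vote KNN with Euclidean distance. The five `[age, salary]` rows are made up for illustration (not taken from `ml_customer_data.csv`), `knn_predict` is a hypothetical helper rather than scikit-learn's implementation, and `k=3` is used only because the toy set is tiny.

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=5):
    """Hypothetical helper: majority vote among the k nearest training rows."""
    # Euclidean distance from the query to every training row
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # Indices of the k closest rows, nearest first
    nearest = np.argsort(dists)[:k]
    # Count votes per class (assumes integer labels 0 and 1)
    votes = np.bincount(y_train[nearest], minlength=2)
    return int(np.argmax(votes)), nearest

# Made-up [age, salary] rows; NOT taken from ml_customer_data.csv
X_demo = np.array([
    [25, 50_000],
    [60, 50_500],
    [30, 90_000],
    [58, 20_000],
    [40, 51_000],
], dtype=float)
y_demo = np.array([0, 1, 0, 1, 1])
query = np.array([26, 50_400], dtype=float)

# Raw features: salary differences swamp age differences
raw_label, raw_nearest = knn_predict(X_demo, y_demo, query, k=3)

# z-score both columns using the training rows' mean and standard deviation
mean, std = X_demo.mean(axis=0), X_demo.std(axis=0)
scaled_label, scaled_nearest = knn_predict(
    (X_demo - mean) / std, y_demo, (query - mean) / std, k=3
)

print("raw:", raw_nearest, "->", raw_label)           # nearest row is the 60-year-old
print("scaled:", scaled_nearest, "->", scaled_label)  # nearest row is the 25-year-old
```

On raw features the 26-year-old query's closest neighbor is the 60-year-old with a similar salary; after standardization the 25-year-old becomes closest and the vote flips.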

```python
# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
from sklearn.metrics import (
    accuracy_score, confusion_matrix, classification_report
)

# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Confusion Matrix
cm = confusion_matrix(y_test, y_pred)
tn, fp, fn, tp = cm.ravel()
print("\nConfusion Matrix:")
print(cm)
```

```text
Accuracy: 0.6866666666666666

Confusion Matrix:
[[66 26]
 [21 37]]
```
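For binary 0/1 labels, scikit-learn's `confusion_matrix` puts actual classes on rows and predicted classes on columns, so `cm.ravel()` returns `tn, fp, fn, tp` in that order. The sketch below uses synthetic label vectors constructed only to reproduce the reported counts (66, 26, 21, 37); they are not the real `y_test`/`y_pred`. Passing `labels=[0, 1]` pins the row and column order explicitly.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Synthetic labels built to reproduce the reported counts (not the real test set)
y_true = np.array([0] * 66 + [0] * 26 + [1] * 21 + [1] * 37)
y_hat = np.array([0] * 66 + [1] * 26 + [0] * 21 + [1] * 37)

cm = confusion_matrix(y_true, y_hat, labels=[0, 1])  # rows = actual, columns = predicted
tn, fp, fn, tp = cm.ravel()
print(cm)
print("tn, fp, fn, tp =", tn, fp, fn, tp)
```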


```python
# Derived Metrics
precision = tp / (tp + fp) if (tp + fp) else 0
recall = tp / (tp + fn) if (tp + fn) else 0
f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0
specificity = tn / (tn + fp) if (tn + fp) else 0
fpr = fp / (fp + tn) if (fp + tn) else 0
fnr = fn / (fn + tp) if (fn + tp) else 0

print(f"\nPrecision: {precision:.3f}")
print(f"Recall (Sensitivity): {recall:.3f}")
print(f"F1-score: {f1:.3f}")
print(f"Specificity: {specificity:.3f}")
print(f"False Positive Rate (FPR): {fpr:.3f}")
print(f"False Negative Rate (FNR): {fnr:.3f}")
```

```text
Precision: 0.587
Recall (Sensitivity): 0.638
F1-score: 0.612
Specificity: 0.717
False Positive Rate (FPR): 0.283
False Negative Rate (FNR): 0.362
```
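The hand-computed ratios can be cross-checked against scikit-learn's scorer functions. Specificity is recall of the negative class, so `recall_score(..., pos_label=0)` should equal `tn / (tn + fp)`. The label vectors here are synthetic, built only to reproduce the reported counts (TN=66, FP=26, FN=21, TP=37).

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

# Synthetic labels reproducing TN=66, FP=26, FN=21, TP=37 (not the real test set)
y_true = np.array([0] * 92 + [1] * 58)
y_hat = np.array([0] * 66 + [1] * 26 + [0] * 21 + [1] * 37)

tn, fp, fn, tp = 66, 26, 21, 37

checks = {
    "precision": (tp / (tp + fp), precision_score(y_true, y_hat)),
    "recall": (tp / (tp + fn), recall_score(y_true, y_hat)),
    # 2TP / (2TP + FP + FN) is algebraically equal to the harmonic mean of precision and recall
    "f1": (2 * tp / (2 * tp + fp + fn), f1_score(y_true, y_hat)),
    # Specificity is recall computed for class 0
    "specificity": (tn / (tn + fp), recall_score(y_true, y_hat, pos_label=0)),
}
for name, (manual, sk) in checks.items():
    print(f"{name:12s} manual={manual:.3f} sklearn={sk:.3f}")
```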


```python
# Classification Report
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
```

```text
Classification Report:
              precision    recall  f1-score   support

           0       0.76      0.72      0.74        92
           1       0.59      0.64      0.61        58

    accuracy                           0.69       150
   macro avg       0.67      0.68      0.67       150
weighted avg       0.69      0.69      0.69       150
```
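The two summary rows differ only in how per-class scores are combined: `macro avg` is the unweighted mean over classes, while `weighted avg` weights each class by its support (92 and 58 here). A short sketch recomputing precision and recall both ways from the confusion-matrix counts above; note that support-weighted recall always equals accuracy.

```python
import numpy as np

# Counts from the confusion matrix above (rows = actual, columns = predicted)
cm = np.array([[66, 26],
               [21, 37]])

support = cm.sum(axis=1)                  # actual samples per class: [92, 58]
precision = np.diag(cm) / cm.sum(axis=0)  # per-class TP / predicted count
recall = np.diag(cm) / support            # per-class TP / actual count

macro_precision = precision.mean()
weighted_precision = np.average(precision, weights=support)
macro_recall = recall.mean()
weighted_recall = np.average(recall, weights=support)

print(np.round(precision, 2), np.round(recall, 2))
print(f"macro:    P={macro_precision:.2f} R={macro_recall:.2f}")
print(f"weighted: P={weighted_precision:.2f} R={weighted_recall:.2f}")
```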



```python
# Predict class probabilities (optional)
y_proba = model.predict_proba(X_test)
print("\nFirst 5 Predicted Probabilities:")
print(y_proba[:5])
```

```text
First 5 Predicted Probabilities:
[[0.  1. ]
 [1.  0. ]
 [0.2 0.8]
 [0.6 0.4]
 [1.  0. ]]
```
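With the default `weights='uniform'`, `KNeighborsClassifier.predict_proba` returns the share of the `k` nearest training points in each class, which is why the rows above only take values in steps of 0.2 when `n_neighbors=5`. A small check on made-up 2-D data (not the customer dataset):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Made-up points and noisy binary labels, purely for illustration
X_toy = rng.normal(size=(40, 2))
y_toy = (X_toy[:, 0] + rng.normal(scale=0.5, size=40) > 0).astype(int)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_toy, y_toy)
proba = knn.predict_proba(X_toy[:5])
print(proba)

# Every probability is (number of neighbors in that class) / 5
print(np.allclose(proba * 5, np.round(proba * 5)))
# The predicted class is the column with the larger vote share
print((knn.predict(X_toy[:5]) == knn.classes_[proba.argmax(axis=1)]).all())
```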


## Model Summary

This section summarizes the performance and interpretation of the KNN classifier trained to predict whether a customer will purchase based on their age and salary.

**Accuracy:**  
0.6867  
The model correctly classified 103 of the 150 test customers (66 true negatives + 37 true positives), i.e. 68.67%.
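Accuracy is easier to judge against a trivial baseline. The test set has 92 non-buyers and 58 buyers, so a model that always predicts "no purchase" would already score 92/150 ≈ 61.3%. A sketch using scikit-learn's `DummyClassifier` on synthetic labels with the same class counts; the features are placeholders because the dummy model ignores them. In the notebook itself one would fit on `X_train, y_train` and score on `X_test, y_test`.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Synthetic labels matching the test-set class counts (92 zeros, 58 ones)
y_eval = np.array([0] * 92 + [1] * 58)
X_eval = np.zeros((len(y_eval), 2))  # placeholder features; ignored by the dummy model

baseline = DummyClassifier(strategy="most_frequent").fit(X_eval, y_eval)
baseline_acc = baseline.score(X_eval, y_eval)
knn_acc = 103 / 150  # KNN test accuracy reported above

print(f"Majority-class baseline accuracy: {baseline_acc:.4f}")
print(f"KNN gain over baseline: {knn_acc - baseline_acc:.4f}")
```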

---

**Confusion Matrix:**  

| Actual \\ Predicted | Predicted 0 (No Purchase) | Predicted 1 (Purchase) |
|---------------------|----------------------------|-------------------------|
| Actual 0            | 66 (True Negative)         | 26 (False Positive)     |
| Actual 1            | 21 (False Negative)        | 37 (True Positive)      |

- The model made 47 incorrect predictions out of 150 (26 + 21).
- It missed 21 actual buyers (FN) and wrongly flagged 26 non-buyers as buyers (FP).
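The off-diagonal cells give the error rate directly, which is the complement of accuracy:

$$
\text{Error rate} = \frac{FP + FN}{TP + TN + FP + FN} = \frac{26 + 21}{150} = \frac{47}{150} \approx 0.313 = 1 - \text{Accuracy}
$$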

---

**Classification Report:**

| Class | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| 0 (No Purchase) | 0.76 | 0.72 | 0.74 | 92 |
| 1 (Purchase) | 0.59 | 0.64 | 0.61 | 58 |
| **Macro Avg** | 0.67 | 0.68 | 0.67 | 150 |
| **Weighted Avg** | 0.69 | 0.69 | 0.69 | 150 |

**Overall Accuracy:** 0.69 on 150 test samples.

- **Precision (class 1):** 0.587. Of the 63 customers predicted to buy, only 37 (58.7%) actually bought.
- **Recall (class 1):** 0.638. The model detected 37 of the 58 actual buyers (63.8%).
- **F1-score (class 1):** 0.612. The harmonic mean of precision and recall, indicating a moderate balance between the two.
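Because the F1-score is the harmonic mean of precision and recall, it can be written directly in terms of the confusion-matrix counts:

$$
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
    = \frac{2\,TP}{2\,TP + FP + FN}
    = \frac{2 \cdot 37}{2 \cdot 37 + 26 + 21}
    = \frac{74}{121} \approx 0.612
$$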

---

**Derived Metrics from Confusion Matrix:**

- **Specificity:** 0.717. 66 of the 92 actual non-buyers (71.7%) were correctly identified.
- **False Positive Rate (FPR):** 0.283. 26 of the 92 non-buyers (28.3%) were misclassified as buyers.
- **False Negative Rate (FNR):** 0.362. 21 of the 58 buyers (36.2%) were missed by the model.
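FPR and FNR are complements of specificity and recall, since each pair shares a denominator:

$$
\text{FPR} = \frac{FP}{FP + TN} = 1 - \frac{TN}{TN + FP} = 1 - \text{Specificity} = \frac{26}{92} \approx 0.283
$$

$$
\text{FNR} = \frac{FN}{FN + TP} = 1 - \frac{TP}{TP + FN} = 1 - \text{Recall} = \frac{21}{58} \approx 0.362
$$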

---

**First 5 Predicted Probabilities:**  
[0.0 1.0]  
[1.0 0.0]  
[0.2 0.8]  
[0.6 0.4]  
[1.0 0.0]

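Each row is [P(no purchase), P(purchase)] for one test customer. With `n_neighbors=5` and the default uniform weighting, each value is the fraction of the 5 nearest training customers in that class, so probabilities move in steps of 0.2 (e.g. [0.2 0.8] means 4 of the 5 neighbors purchased). The predicted class is the one with the larger share; these are neighbor vote shares, not calibrated probabilities.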

---

### Final Conclusion

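KNN achieved moderate results on this dataset: 68.7% test accuracy, about 7 percentage points above the 61.3% (92/150) that always predicting "no purchase" would score on the same test set. It found 64% of actual buyers (recall), but only 59% of its predicted buyers actually purchased (precision), so 26 of its 63 positive predictions were false positives. The clearest improvement is feature scaling: the model was fit on raw features, so Euclidean distances are driven almost entirely by `salary` (differences of thousands) rather than `age` (differences of a few years), and age has little influence on which neighbors are chosen. Once the features are scaled (e.g. with `StandardScaler`), `k` can be tuned with cross-validation.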
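One way to act on both suggestions is to place a `StandardScaler` in front of the classifier inside a scikit-learn `Pipeline` and tune `n_neighbors` with `GridSearchCV`, so the scaler is refit on each cross-validation training fold. This is a sketch on synthetic data shaped like `age`/`salary` (generated here, not loaded from `ml_customer_data.csv`); the labelling rule and the grid of `k` values are arbitrary assumptions, and no results for the real dataset are implied.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the customer data (assumption: NOT the real dataset)
rng = np.random.default_rng(42)
n = 500
age = rng.integers(18, 65, size=n)
salary = rng.integers(15_000, 150_000, size=n)
# Made-up noisy labelling rule, purely for illustration
score = 0.05 * (age - 40) + (salary - 80_000) / 30_000 + rng.normal(scale=0.7, size=n)
X = np.column_stack([age, salary]).astype(float)
y = (score > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

pipe = Pipeline([
    ("scale", StandardScaler()),      # puts age and salary on comparable scales
    ("knn", KNeighborsClassifier()),
])
param_grid = {"knn__n_neighbors": list(range(1, 32, 2))}  # odd k avoids ties between two classes

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("best k:", search.best_params_["knn__n_neighbors"])
print(f"CV accuracy: {search.best_score_:.3f}")
print(f"test accuracy: {search.score(X_test, y_test):.3f}")
```

Keeping the scaler inside the pipeline matters: scaling the full dataset before splitting would leak test-set statistics into training.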