# 5. SVM

In [2]:
# Load and Preview Data
import pandas as pd
df = pd.read_csv("ml_customer_data.csv")
df.head()

Unnamed: 0,age,salary,purchased
0,56,19000,0
1,46,85588,1
2,32,53304,1
3,60,84449,1
4,25,97986,0


In [4]:
# Split Features and Target
from sklearn.model_selection import train_test_split
X = df[['age', 'salary']]
y = df['purchased']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
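The test set above ends up with 92 non-buyers and 58 buyers, so the classes are moderately imbalanced. One option, sketched here on synthetic stand-in data rather than `ml_customer_data.csv`, is to pass `stratify=y` so both splits keep roughly the same class ratio. The `demo` frame and its random labels are assumptions made only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the customer data (hypothetical values, same column names)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "age": rng.integers(18, 65, size=500),
    "salary": rng.integers(15_000, 100_000, size=500),
    "purchased": rng.choice([0, 1], size=500, p=[0.6, 0.4]),
})

X_demo = demo[["age", "salary"]]
y_demo = demo["purchased"]

# stratify=y_demo keeps the buyer share nearly identical in train and test
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

print("Buyer share overall:", round(y_demo.mean(), 3))
print("Buyer share train:  ", round(y_tr.mean(), 3))
print("Buyer share test:   ", round(y_te.mean(), 3))
```

Stratifying does not change the model itself; it only makes the reported metrics less sensitive to how the random split happened to fall.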


In [6]:
# Train the Support Vector Machine model
from sklearn.svm import SVC
model = SVC(probability=True, random_state=42)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)
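The model above is fit on raw `age` and `salary`, whose ranges differ by several orders of magnitude. `SVC` with its default RBF kernel is distance-based, so the larger-scaled feature tends to dominate. A minimal sketch of a common alternative, standardizing features inside a pipeline, is shown below. It runs on synthetic stand-in data with a made-up labeling rule (both are assumptions), so its accuracy says nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the customer data (hypothetical values, same column names)
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "age": rng.integers(18, 65, size=n),
    "salary": rng.integers(15_000, 100_000, size=n),
})
# Hypothetical labeling rule, used only so the demo has a learnable signal
demo["purchased"] = ((demo["age"] > 40) & (demo["salary"] > 50_000)).astype(int)

X_demo = demo[["age", "salary"]]
y_demo = demo["purchased"]
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

# The scaler is fit on the training split only, then applied to the test split
scaled_model = make_pipeline(
    StandardScaler(),
    SVC(probability=True, random_state=42),
)
scaled_model.fit(X_tr, y_tr)

acc = scaled_model.score(X_te, y_te)
print("Accuracy with scaling (synthetic data):", acc)
```

Wrapping the scaler and the classifier in one pipeline keeps test-set statistics out of the scaling step, which is why it is usually preferred over scaling the full dataset up front.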

In [8]:
# Evaluate Model
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    classification_report
)


In [14]:
# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
tn, fp, fn, tp = cm.ravel()

print("Confusion Matrix:")
print(cm)

Accuracy: 0.6933333333333334
Confusion Matrix:
[[60 32]
 [14 44]]


In [18]:
# Derived Metrics
precision = tp / (tp + fp) if (tp + fp) else 0
recall = tp / (tp + fn) if (tp + fn) else 0
f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0
specificity = tn / (tn + fp) if (tn + fp) else 0
fpr = fp / (fp + tn) if (fp + tn) else 0
fnr = fn / (fn + tp) if (fn + tp) else 0

print(f" Precision: {precision:.3f}")
print(f" Recall (Sensitivity): {recall:.3f}")
print(f" F1-score: {f1:.3f}")
print(f" Specificity: {specificity:.3f}")
print(f" False Positive Rate (FPR): {fpr:.3f}")
print(f" False Negative Rate (FNR): {fnr:.3f}")


 Precision: 0.579
 Recall (Sensitivity): 0.759
 F1-score: 0.657
 Specificity: 0.652
 False Positive Rate (FPR): 0.348
 False Negative Rate (FNR): 0.241
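As a sanity check, the hand-computed values can be compared against scikit-learn's own metric functions. The sketch below rebuilds label vectors that reproduce the confusion matrix printed earlier, so it runs without the original data. The names `y_true` and `y_hat` are illustrative and not part of the notebook.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

# Confusion matrix reported above: rows = actual, columns = predicted
cm = np.array([[60, 32], [14, 44]])
tn, fp, fn, tp = (int(v) for v in cm.ravel())

# Rebuild label vectors that yield exactly this confusion matrix
y_true = np.array([0] * (tn + fp) + [1] * (fn + tp))
y_hat = np.array([0] * tn + [1] * fp + [0] * fn + [1] * tp)

precision = precision_score(y_true, y_hat)
recall = recall_score(y_true, y_hat)
f1 = f1_score(y_true, y_hat)
# Specificity is recall computed for the negative class
specificity = recall_score(y_true, y_hat, pos_label=0)

print(f"Precision:   {precision:.3f}")
print(f"Recall:      {recall:.3f}")
print(f"F1-score:    {f1:.3f}")
print(f"Specificity: {specificity:.3f}")
print(f"FPR:         {1 - specificity:.3f}")
print(f"FNR:         {1 - recall:.3f}")
```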


In [20]:
# Classification report
print("\n📄 Classification Report:")
print(classification_report(y_test, y_pred))


📄 Classification Report:
              precision    recall  f1-score   support

           0       0.81      0.65      0.72        92
           1       0.58      0.76      0.66        58

    accuracy                           0.69       150
   macro avg       0.69      0.71      0.69       150
weighted avg       0.72      0.69      0.70       150



In [22]:
# Predict class probabilities (optional)
y_proba = model.predict_proba(X_test)
print("First 5 Predicted Probabilities:")
print(y_proba[:5])

First 5 Predicted Probabilities:
[[0.31078761 0.68921239]
 [0.8945439  0.1054561 ]
 [0.43401511 0.56598489]
 [0.65006432 0.34993568]
 [0.84302796 0.15697204]]


## Model Summary

This section summarizes the performance and interpretation of the Support Vector Machine classifier trained to predict whether a customer will purchase based on their age and salary.

**Accuracy:**  
0.6933  
The model classified 104 of the 150 test customers correctly (**69.33%**). For reference, always predicting "no purchase" would score 92/150 = 61.3% on this test set.

---

**Confusion Matrix:**  

| Actual \\ Predicted | Predicted 0 (No Purchase) | Predicted 1 (Purchase) |
|---------------------|----------------------------|-------------------------|
| Actual 0            | 60 (True Negative)         | 32 (False Positive)     |
| Actual 1            | 14 (False Negative)        | 44 (True Positive)      |

- The model misclassified 46 out of 150 customers.
- It missed 14 real buyers (FN) and wrongly labeled 32 non-buyers as buyers (FP).

---

**Classification Report:**

| Class | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| 0     | 0.81      | 0.65   | 0.72     | 92      |
| 1     | 0.58      | 0.76   | 0.66     | 58      |
| **Overall Accuracy** | — | — | **0.69** | 150     |
| **Macro Avg** | 0.69 | 0.71 | 0.69 | 150 |
| **Weighted Avg** | 0.72 | 0.69 | 0.70 | 150 |

- **Precision (class 1)**: 0.579. Only 57.9% of predicted buyers (44 of 76) actually bought.
- **Recall (class 1)**: 0.759. The model caught 75.9% of actual buyers (44 of 58).
- **F1-score (class 1)**: 0.657. This is the harmonic mean of the two, pulled down by the lower precision.
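The macro and weighted averages in the report can be reproduced from the confusion matrix alone. This short sketch assumes the usual definitions: the macro average is the unweighted mean over classes, and the weighted average uses each class's support as its weight.

```python
import numpy as np

# Confusion matrix from the report: rows = actual, columns = predicted
cm = np.array([[60, 32], [14, 44]])

support = cm.sum(axis=1)     # actual counts per class: [92, 58]
predicted = cm.sum(axis=0)   # predicted counts per class: [74, 76]
correct = np.diag(cm)        # correctly classified per class: [60, 44]

precision = correct / predicted
recall = correct / support

macro_precision = precision.mean()
macro_recall = recall.mean()
weighted_precision = np.average(precision, weights=support)
weighted_recall = np.average(recall, weights=support)

print(f"Macro avg:    precision {macro_precision:.2f}, recall {macro_recall:.2f}")
print(f"Weighted avg: precision {weighted_precision:.2f}, recall {weighted_recall:.2f}")
```

The support-weighted recall works out to 104/150, which is the overall accuracy. That is why both appear as 0.69 in the report.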

---

**Derived Metrics (from Confusion Matrix):**  
- **Specificity**: 0.652 → 65.2% of actual non-buyers were correctly identified  
- **False Positive Rate (FPR)**: 0.348 → 34.8% of non-buyers were wrongly predicted as buyers  
- **False Negative Rate (FNR)**: 0.241 → 24.1% of buyers were missed

---

**First 5 Predicted Probabilities:**

[0.3108 0.6892]  
[0.8945 0.1055]  
[0.4340 0.5660]  
[0.6501 0.3499]  
[0.8430 0.1570]  

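Each row shows [P(no purchase), P(purchase)], and the second number is the model's estimated probability that the customer will purchase. With `probability=True`, scikit-learn derives these values by Platt scaling fitted with internal cross-validation, so they are calibrated estimates and can occasionally disagree with the label returned by `predict`.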
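To illustrate how these probabilities support threshold-based decisions, the sketch below applies a few cutoffs to the five rows printed above. The helper `classify` and the threshold values are hypothetical and chosen only for illustration.

```python
import numpy as np

# The five probability rows printed by the notebook
y_proba = np.array([
    [0.31078761, 0.68921239],
    [0.8945439, 0.1054561],
    [0.43401511, 0.56598489],
    [0.65006432, 0.34993568],
    [0.84302796, 0.15697204],
])
p_purchase = y_proba[:, 1]


def classify(p, threshold=0.5):
    """Label a customer as a buyer (1) when P(purchase) >= threshold."""
    return (p >= threshold).astype(int)


print(classify(p_purchase, 0.5))  # [1 0 1 0 0]
print(classify(p_purchase, 0.6))  # [1 0 0 0 0]
print(classify(p_purchase, 0.3))  # [1 0 1 1 0]
```

Raising the threshold labels fewer customers as buyers, which tends to raise precision at the cost of recall. Lowering it does the opposite.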

---

### ✅ Final Conclusion

The SVM model achieved moderate performance: 69.3% accuracy, with **class-1 recall (0.759) clearly higher than class-1 precision (0.579)**. It captures about three quarters of actual buyers but overpredicts purchases, as the 0.348 false positive rate and 0.652 specificity show. The predicted probabilities make it possible to trade precision against recall by choosing a decision threshold other than the default.