# 1. Logistic Regression

This notebook reuses the same customer dataset, `ml_customer_data.csv`, to demonstrate a logistic regression classifier that predicts whether a customer makes a purchase.

In [23]:
# Load and Preview Data
import pandas as pd
df = pd.read_csv("ml_customer_data.csv")
df.head()

Unnamed: 0,age,salary,purchased
0,56,19000,0
1,46,85588,1
2,32,53304,1
3,60,84449,1
4,25,97986,0


In [25]:
# Prepare Features (X) and Target (y)
from sklearn.model_selection import train_test_split
X = df[['age', 'salary']]
y = df['purchased']

# Split the dataset into training and testing sets (70% train, 30% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 42)
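The split above is purely random. When the two classes are not equally common, passing `stratify` to `train_test_split` keeps the class proportions the same in the training and test parts. The sketch below illustrates this on made-up arrays; `X_demo` and `y_demo` are hypothetical stand-ins, not the notebook's data, and the notebook itself does not stratify.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the notebook's dataset): 60 zeros, 40 ones.
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([0] * 60 + [1] * 40)

# stratify=y_demo asks for the same class ratio in the train and test parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

print("share of class 1 overall:", y_demo.mean())
print("share of class 1 in train:", y_tr.mean())
print("share of class 1 in test:", y_te.mean())
```

Whether stratifying changes the results on the real data would have to be checked by rerunning the notebook.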

In [29]:
# Train the Logistic Regression Model
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)
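The model above is fit on the raw `age` and `salary` values, which sit on very different scales. One optional variation, not used in this notebook, is to standardize the features first. `LogisticRegression` applies L2 regularization by default, and that penalty is sensitive to feature scale. The sketch below uses synthetic data with a made-up labelling rule, so its numbers say nothing about the real dataset. `X_demo`, `y_demo`, and `scaled_model` are illustrative names.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the age/salary columns (NOT the notebook's data).
rng = np.random.default_rng(0)
age = rng.integers(18, 65, size=500)
salary = rng.integers(15_000, 100_000, size=500)
X_demo = np.column_stack([age, salary]).astype(float)

# Hypothetical labelling rule, used only to give the model something to learn.
logit = 0.08 * (age - 40) + 0.00006 * (salary - 55_000)
y_demo = (rng.random(500) < 1 / (1 + np.exp(-logit))).astype(int)

# Standardize each feature, then fit logistic regression on the scaled values.
scaled_model = make_pipeline(StandardScaler(), LogisticRegression())
scaled_model.fit(X_demo, y_demo)

# Coefficients are now per standard deviation of each feature.
scaled_coefs = scaled_model.named_steps["logisticregression"].coef_[0]
for name, coef in zip(["age", "salary"], scaled_coefs):
    print(f"{name}: {coef:.3f}")
```

With standardized inputs the two coefficients share a unit (one standard deviation), which makes their magnitudes easier to compare.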

In [31]:
# Evaluate Accuracy
y_pred = model.predict(X_test)

from sklearn.metrics import accuracy_score
print("Accuracy:", accuracy_score(y_test, y_pred))

Accuracy: 0.7733333333333333


In [35]:
# Confusion Matrix and Classification Report (precision, recall, F1-score)
from sklearn.metrics import confusion_matrix, classification_report

# Confusion Matrix
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))

# Classification Report
print("Classification Report:")
print(classification_report(y_test, y_pred))

Confusion Matrix:
[[74 18]
 [16 42]]
Classification Report:
              precision    recall  f1-score   support

           0       0.82      0.80      0.81        92
           1       0.70      0.72      0.71        58

    accuracy                           0.77       150
   macro avg       0.76      0.76      0.76       150
weighted avg       0.77      0.77      0.77       150



In [37]:
# Model Parameters
print("Coefficients (age, salary):", model.coef_)
print("Intercept (bias):", model.intercept_)

Coefficients (age, salary): [[8.17251097e-02 6.29487795e-05]]
Intercept (bias): [-7.46065116]


In [39]:
# Predicted Probabilities
y_proba = model.predict_proba(X_test)
print("First 5 Prediction Probabilities:\n", y_proba[:5])

First 5 Prediction Probabilities:
 [[0.27057365 0.72942635]
 [0.96947661 0.03052339]
 [0.19729648 0.80270352]
 [0.81505008 0.18494992]
 [0.94136873 0.05863127]]


## Model Summary

This section summarizes the performance and interpretation of the Logistic Regression model trained to classify whether a customer will purchase based on their age and salary.

**Accuracy:**  
0.7733  
The model classified 116 of the 150 test customers correctly (74 true negatives plus 42 true positives), an accuracy of 77.33%.

**Confusion Matrix:**  

| Actual \\ Predicted | Predicted 0 (No Purchase) | Predicted 1 (Purchase) |
|---------------------|----------------------------|-------------------------|
| Actual 0            | 74 (True Negative)         | 18 (False Positive)     |
| Actual 1            | 16 (False Negative)        | 42 (True Positive)      |

**Classification Report:**  

| Class | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| 0     | 0.82      | 0.80   | 0.81     | 92      |
| 1     | 0.70      | 0.72   | 0.71     | 58      |
| **Accuracy** |    |        | **0.77** | 150     |

- Precision (class 1): 0.70. Of the 60 customers predicted to buy, 42 actually purchased.
- Recall (class 1): 0.72. Of the 58 actual buyers, 42 were correctly identified.
- F1-score (class 1): 0.71. This is the harmonic mean of precision and recall.
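These figures follow directly from the four confusion-matrix counts. As a quick check, the sketch below recomputes them by hand, along with the accuracy of always predicting the majority class. The variable names are illustrative.

```python
# Counts taken from the confusion matrix above.
tn, fp, fn, tp = 74, 18, 16, 42
total = tn + fp + fn + tp

accuracy = (tp + tn) / total
precision = tp / (tp + fp)  # of predicted buyers, how many bought
recall = tp / (tp + fn)     # of actual buyers, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

# Accuracy of always predicting the majority class (no purchase).
majority_baseline = (tn + fp) / total

print(f"accuracy          = {accuracy:.4f}")
print(f"precision (1)     = {precision:.2f}")
print(f"recall (1)        = {recall:.2f}")
print(f"f1 (1)            = {f1:.2f}")
print(f"majority baseline = {majority_baseline:.3f}")
```

The majority-class baseline gives a floor against which the 0.77 accuracy can be judged.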

**Model Coefficients:**  
- age: +0.0817  
- salary: +0.0000629  
- intercept: -7.4607  

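Both coefficients are positive, so the predicted probability of purchase rises with age and with salary. Each coefficient is the change in log-odds per one unit of its feature, so the two are not directly comparable: age is measured in years and salary in single currency units. One additional year multiplies the odds of purchase by about 1.09 (e^0.0817), while an additional 10,000 in salary multiplies them by about 1.88 (e^0.629). The salary coefficient looks tiny only because one unit of salary is tiny. The intercept is the log-odds when both features are zero, a point with no practical meaning for this data.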
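To make the role of the coefficients concrete, the sketch below plugs the printed parameters into the logistic function by hand. The function name `purchase_probability` is illustrative, and the example input is the second row of the data preview (age 46, salary 85588).

```python
import math

# Parameters printed by the notebook (model.coef_ and model.intercept_).
COEF_AGE = 8.17251097e-02
COEF_SALARY = 6.29487795e-05
INTERCEPT = -7.46065116


def purchase_probability(age, salary):
    """Return P(purchase) as the sigmoid of the linear score (log-odds)."""
    z = INTERCEPT + COEF_AGE * age + COEF_SALARY * salary
    return 1.0 / (1.0 + math.exp(-z))


# Odds ratios: how the odds of purchase scale per change in a feature.
odds_ratio_per_year = math.exp(COEF_AGE)
odds_ratio_per_10k = math.exp(COEF_SALARY * 10_000)

p = purchase_probability(46, 85588)
print(f"P(purchase | age=46, salary=85588) = {p:.3f}")
print(f"odds ratio per year of age         = {odds_ratio_per_year:.3f}")
print(f"odds ratio per 10,000 of salary    = {odds_ratio_per_10k:.3f}")
```

For this customer the probability comes out at roughly 0.84, which should match what `model.predict_proba` returns for the same input, up to rounding of the printed parameters.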

**First 5 Predicted Probabilities:**  
[0.2706 0.7294]  
[0.9695 0.0305]  
[0.1973 0.8027]  
[0.8151 0.1849]  
[0.9414 0.0586]  

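Each row shows [P(no purchase), P(purchase)] for one test customer, and the two values sum to 1. The second number is the model’s estimated probability that the customer will purchase. `predict` returns class 1 when that probability exceeds 0.5.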
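The link between probabilities and class labels can be sketched as a simple threshold rule. The helper `to_labels` and the stricter cut-off of 0.75 are hypothetical, chosen only to show how the labels change when the threshold moves.

```python
# P(purchase) for the first five test rows, copied from the output above.
p_purchase = [0.7294, 0.0305, 0.8027, 0.1849, 0.0586]


def to_labels(probabilities, threshold=0.5):
    """Label a row 1 (purchase) when its probability exceeds the threshold."""
    return [1 if p > threshold else 0 for p in probabilities]


default_labels = to_labels(p_purchase)       # default cut-off of 0.5
strict_labels = to_labels(p_purchase, 0.75)  # hypothetical stricter cut-off

print("threshold 0.50:", default_labels)
print("threshold 0.75:", strict_labels)
```

Raising the threshold trades recall for precision: fewer customers are flagged as buyers, but those flagged carry higher predicted probabilities.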

### ✅ Final Conclusion

Logistic Regression is a reasonable baseline for this binary classification task. It reaches 77% accuracy on the 150-row test set, compared with about 61% (92/150) for always predicting "no purchase". Precision and recall for buyers are close (0.70 and 0.72), the coefficients can be read as changes in log-odds, and the predicted probabilities support decisions that need a threshold other than 0.5.