# 📊 ML Demo Notebook

This notebook demonstrates how the **data engineering pipeline** provides clean, curated features for ML training.

We’ll use the features generated in `data/features/` to train a simple **logistic regression churn model**.

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

### 1. Load engineered features

In [2]:
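# Load every Parquet file under the features directory into one DataFrame
# (the path is relative to this notebook's location).
df = pd.read_parquet("../data/features/")
df.head()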
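
Before going further, it can help to confirm that the loaded frame has the columns the later cells rely on. The following is a minimal sketch, not part of the pipeline itself: `check_features` is a hypothetical helper, the column list is taken from the cells below, and the small `demo` frame is made-up data standing in for the real Parquet output.

```python
import pandas as pd

# Columns the later cells of this notebook reference.
REQUIRED_COLUMNS = ["customer_id", "avg_spent", "event_types"]

def check_features(frame):
    """Return a list of problems found; an empty list means the frame looks usable."""
    problems = []
    missing = [c for c in REQUIRED_COLUMNS if c not in frame.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    for col in REQUIRED_COLUMNS:
        if col in frame.columns and frame[col].isna().any():
            problems.append(f"null values in {col}")
    return problems

# Made-up stand-in data with one deliberate null in avg_spent.
demo = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "avg_spent": [20.0, 75.5, None],
    "event_types": [1, 3, 2],
})
problems = check_features(demo)
print(problems)
```

On the real data, the same check would be called as `check_features(df)`.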

### 2. Create fake churn labels (for demo only)

Since we don’t have real churn data, we’ll **simulate a churn label** with a simple rule:
- Customers with low spending (`avg_spent < 50`) and low activity (`event_types < 2`) → churned (`1`).
- All other customers → retained (`0`).

In [3]:
# Label a customer as churned (1) only when both conditions hold; everyone else is retained (0).
df['churn'] = np.where((df['avg_spent'] < 50) & (df['event_types'] < 2), 1, 0)
df[['customer_id','avg_spent','event_types','churn']].head()
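
A rule like this can produce very few (or very many) churned customers, so it may be worth checking the class balance before training. The sketch below applies the same rule to a small made-up frame (`demo` is illustrative data, not pipeline output) and shows how the boundary values behave.

```python
import numpy as np
import pandas as pd

# Made-up rows chosen to exercise the rule's boundaries.
demo = pd.DataFrame({
    "avg_spent": [10.0, 45.0, 49.9, 50.0, 120.0, 30.0],
    "event_types": [1, 1, 2, 1, 4, 0],
})

# Same rule as the cell above: churned only if BOTH conditions hold.
demo["churn"] = np.where((demo["avg_spent"] < 50) & (demo["event_types"] < 2), 1, 0)

# Share of customers labelled as churned, and the raw counts per class.
churn_rate = demo["churn"].mean()
counts = demo["churn"].value_counts().to_dict()
print(demo)
print(f"churn rate: {churn_rate:.2f}, counts: {counts}")
```

Note that a customer with `avg_spent` of exactly 50, or with `event_types` of exactly 2, is labelled as retained because both comparisons are strict.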

### 3. Train/Test Split

In [4]:
X = df[['avg_spent','event_types']]
y = df['churn']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
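
If the simulated churn class turns out to be small, a plain random split can leave the test set with a different churn rate than the training set. One common option, shown here as a sketch on made-up data rather than as a change to the cell above, is to pass `stratify=y` to `train_test_split` so that both splits keep roughly the same class proportions. The names `X_demo` and `y_demo` are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up data: 100 customers, 20 of them labelled as churned.
X_demo = pd.DataFrame({
    "avg_spent": np.linspace(5, 200, 100),
    "event_types": np.tile([0, 1, 2, 3, 4], 20),
})
y_demo = pd.Series([1] * 20 + [0] * 80, name="churn")

# stratify=y_demo asks for the same class proportions in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

print("train churn rate:", y_tr.mean())
print("test churn rate:", y_te.mean())
```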

### 4. Train Logistic Regression Model

In [5]:
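# Fit a logistic regression with scikit-learn's default settings,
# then predict churn labels for the held-out test set.
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)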
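
The two features live on different scales (a spend amount versus a small count), which can matter for a regularized model such as scikit-learn's `LogisticRegression`. A possible variation, sketched here on synthetic data and not something the notebook itself does, is to standardize the features in a pipeline and to read churn probabilities with `predict_proba` instead of hard labels. The random data and the names `demo`, `scaled_model`, and `churn_proba` are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the engineered features (not real pipeline output).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "avg_spent": rng.uniform(0, 200, size=200),
    "event_types": rng.integers(0, 6, size=200),
})
# Same simulated label rule as in section 2.
demo["churn"] = ((demo["avg_spent"] < 50) & (demo["event_types"] < 2)).astype(int)

X_demo = demo[["avg_spent", "event_types"]]
y_demo = demo["churn"]

# Standardize each feature, then fit the logistic regression on the scaled values.
scaled_model = make_pipeline(StandardScaler(), LogisticRegression())
scaled_model.fit(X_demo, y_demo)

# Column 1 of predict_proba is the estimated probability of class 1 (churned).
churn_proba = scaled_model.predict_proba(X_demo)[:, 1]
print(churn_proba[:5].round(3))
```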

### 5. Evaluate Model

In [6]:
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
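
To make the printed matrix easier to read: for binary labels, scikit-learn's `confusion_matrix` puts true classes in rows and predicted classes in columns, so `ravel()` yields true negatives, false positives, false negatives, and true positives in that order. The sketch below uses small hand-written label lists (`y_true_demo` and `y_pred_demo` are made up) to show how precision and recall for the churn class follow from those four counts.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hand-written labels for illustration only.
y_true_demo = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred_demo = [0, 0, 1, 0, 1, 0, 1, 0]

# Rows are true classes, columns are predicted classes.
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

# Precision: of the customers predicted as churned, how many really churned.
precision = tp / (tp + fp)
# Recall: of the customers who really churned, how many the model caught.
recall = tp / (tp + fn)

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"precision={precision:.3f} recall={recall:.3f}")
```

These are the same precision and recall values that `classification_report` lists in the row for class `1`.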

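✅ This model is deliberately basic, and because the churn labels were simulated from the same two features it trains on, its scores do not measure real predictive power. What it does show is that our **data engineering pipeline produces ML-ready features**.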