# 🤖 Module 4 — Statistics & Machine Learning

> Build predictive models that actually work. Learn the statistics behind ML and create end-to-end pipelines.

## 🚀 How to use this notebook
- Run the cell below to execute the complete ML pipeline lesson.
- Model outputs and evaluation metrics will display inline.
- Use the Playground section to experiment with different algorithms.

In [None]:
# 🔁 Run the original script (source of truth)
%run 4_statistics_ml.py

## 🧪 Playground
Try different algorithms, feature engineering techniques, and evaluation approaches.

In [None]:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Create sample classification problem
X, y = make_classification(n_samples=1000, n_features=4, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train simple model
model = LogisticRegression(random_state=42)
model.fit(X_train, y_train)

# Evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f'🎯 Accuracy: {accuracy:.3f}')
print('\n📊 Detailed Report:')
print(classification_report(y_test, y_pred))
print('\n💡 Try swapping in RandomForestClassifier or SVC!')
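
The hint at the end of the Playground cell suggests trying a random forest or an SVM. One way that swap might look is sketched below. It assumes the same synthetic data as the Playground cell and leaves every hyperparameter at the scikit-learn default, so the numbers are a starting point rather than a tuned comparison. The `candidates` and `swap_results` names are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Same synthetic problem as the Playground cell.
X, y = make_classification(n_samples=1000, n_features=4, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Illustrative candidates; hyperparameters are library defaults, not tuned.
candidates = {
    "RandomForestClassifier": RandomForestClassifier(random_state=42),
    "SVC": SVC(random_state=42),
}

# Fit each candidate on the training split and score it on the held-out split.
swap_results = {}
for name, clf in candidates.items():
    clf.fit(X_train, y_train)
    swap_results[name] = accuracy_score(y_test, clf.predict(X_test))
    print(f"{name}: accuracy = {swap_results[name]:.3f}")
```

Every scikit-learn classifier exposes the same `fit` / `predict` interface, which is why only the constructor line changes between algorithms.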

## 🎯 Practice Tasks (Professional ML)
- Build a complete pipeline: data loading → cleaning → feature engineering → model → evaluation.
- Compare multiple algorithms (Linear, Tree, Ensemble) on the same dataset.
- Implement cross-validation to get robust performance estimates.
- Create feature importance plots to understand what drives predictions.
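
As a possible starting point for these tasks, the sketch below wraps scaling and a model in a scikit-learn `Pipeline`, compares three algorithms with 5-fold cross-validation, and ranks feature importances from a random forest. It makes several assumptions: synthetic data stands in for a loaded and cleaned dataset, `StandardScaler` stands in for feature engineering, and the `feature_names` labels are made up for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic data stands in for a real, already-cleaned dataset.
X, y = make_classification(n_samples=1000, n_features=4, n_classes=2, random_state=42)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

# One representative per family named in the task list.
models = {
    "linear": LogisticRegression(),
    "tree": DecisionTreeClassifier(random_state=42),
    "ensemble": RandomForestClassifier(random_state=42),
}

cv_scores = {}
for name, estimator in models.items():
    # Scaling lives inside the pipeline, so it is re-fit on each training fold
    # and never sees the fold used for scoring.
    pipeline = Pipeline([("scale", StandardScaler()), ("model", estimator)])
    cv_scores[name] = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
    print(f"{name:>8}: {cv_scores[name].mean():.3f} +/- {cv_scores[name].std():.3f}")

# Impurity-based importances from a random forest fit on all the data.
forest = RandomForestClassifier(random_state=42).fit(X, y)
ranked = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for fname, importance in ranked:
    print(f"{fname}: {importance:.3f}")
```

The `ranked` list can be passed to a bar chart (for example with matplotlib) to produce the importance plot. Impurity-based importances are only one option; permutation importance is a common alternative when features are correlated.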

## ✅ Before you move on
- [ ] I understand the difference between regression and classification.
- [ ] I can split data properly (train/validation/test).
- [ ] I know how to evaluate classification models (accuracy, precision, recall, F1).
- [ ] I can identify and handle overfitting.
- [ ] I understand when to use different ML algorithms.
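
To self-check the splitting, evaluation, and overfitting items, a rough sketch like the one below may help. It assumes a 60/20/20 train/validation/test split on synthetic data and uses tree depth as the knob for overfitting; both are illustrative choices, not a prescribed workflow. A large gap between training and validation accuracy is the usual warning sign.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=4, n_classes=2, random_state=42)

# Hold out the test set first (20%), then carve validation out of the rest
# (25% of the remaining 80% = 20% of the total).
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=42
)

# Compare an unconstrained tree with a depth-limited one.
gaps = {}
for label, depth in [("unconstrained", None), ("max_depth=3", 3)]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=42)
    tree.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, tree.predict(X_train))
    val_acc = accuracy_score(y_val, tree.predict(X_val))
    gaps[label] = train_acc - val_acc
    print(f"{label}: train={train_acc:.3f} val={val_acc:.3f} gap={gaps[label]:.3f}")

# Touch the test set once, after model choices are made on validation data.
final_model = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, final_model.predict(X_test), average="binary"
)
print(f"test precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Keeping the test set out of every tuning decision is what makes the final precision, recall, and F1 an honest estimate of performance on unseen data.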