# Week 11 — Deep Learning Fundamentals

**Course:** Applied ML Foundations for SaaS Analytics  
**Week Focus:** Neural networks, backpropagation, and deep learning for SaaS predictions.

---

## 🎯 Learning Objectives

- Understand neural network architecture
- Implement feedforward networks with Keras/TensorFlow
- Apply regularization (Dropout, L1/L2, Early Stopping)
- Compare deep learning vs classical ML
- Build production-ready neural networks

In [None]:
import pandas as pd, numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score, precision_score, recall_score
import warnings
warnings.filterwarnings('ignore')

# Load data
subs = pd.read_csv('../data/subscriptions.csv')
feature_usage = pd.read_csv('../data/feature_usage.csv')
user_events = pd.read_csv('../data/user_events.csv')

# Feature engineering
engagement = feature_usage.groupby('user_id').agg({'usage_count': 'sum', 'feature_name': 'nunique'}).reset_index()
engagement.columns = ['user_id', 'total_usage', 'features_adopted']
events = user_events.groupby('user_id').size().reset_index(name='total_events')

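df = subs[['user_id', 'tenure_days', 'mrr', 'churn_date']].merge(engagement, on='user_id', how='left')
df = df.merge(events, on='user_id', how='left')
# Derive the label before filling NaNs: fillna(0) would also fill churn_date and mark every customer as churned
df['churned'] = df['churn_date'].notna().astype(int)
df = df.drop(columns='churn_date').fillna(0)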

features = ['tenure_days', 'mrr', 'total_usage', 'features_adopted', 'total_events']
X = df[features]
y = df['churned']

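# Split first, then fit the scaler on the training rows only so test statistics do not leak into training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)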

print(f"Dataset: {len(df)} customers | Features: {len(features)}")
print(f"Train: {len(X_train)} | Test: {len(X_test)}")

## Part 1: Neural Network Basics

**Structure:**
- Input layer: Raw features
- Hidden layers: Learn representations
- Output layer: Prediction

**Activation functions:** ReLU, Sigmoid, Tanh
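
To make the structure concrete, here is a minimal NumPy sketch of one forward pass through a tiny network (5 inputs, 4 hidden units, 1 output). The weights are random placeholders, and the helper names `relu`, `sigmoid`, and `forward` are illustrative rather than part of the course code.

```python
import numpy as np

def relu(z):
    # ReLU: zero out negatives, pass positives through unchanged
    return np.maximum(0.0, z)

def sigmoid(z):
    # Sigmoid: squash any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    # Input layer -> hidden layer (ReLU) -> output layer (sigmoid probability)
    hidden = relu(x @ W1 + b1)
    return sigmoid(hidden @ W2 + b2)

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 5))                 # 3 hypothetical customers, 5 features
W1, b1 = rng.normal(size=(5, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

probs = forward(x, W1, b1, W2, b2)
print(probs.shape)                          # (3, 1): one probability per customer
print(np.tanh(np.array([-2.0, 0.0, 2.0])))  # tanh squashes into (-1, 1)
```

A Keras `Dense` layer performs the same matrix multiply, bias add, and activation, but learns the weights during training instead of drawing them at random.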

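**💡 Depth Note:** Why do we need activation functions? Without a nonlinearity between layers, a stack of Dense layers collapses into a single linear transformation. Also explore the vanishing-gradient problem, which is one reason ReLU is usually preferred over sigmoid or tanh in hidden layers.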
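
Two quick checks can help with this question. The sketch below uses random placeholder matrices to show that two stacked linear layers equal one linear layer, and that the sigmoid derivative never exceeds 0.25. The `0.25 ** 10` figure is only an upper bound on the activation-derivative factor across 10 sigmoid layers; it ignores the weights.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 5))
W1 = rng.normal(size=(5, 8))
W2 = rng.normal(size=(8, 3))

# Two linear layers with no activation in between...
two_layers = (x @ W1) @ W2
# ...match a single linear layer whose weight matrix is W1 @ W2
one_layer = x @ (W1 @ W2)
print(np.allclose(two_layers, one_layer))  # True

def sigmoid_grad(z):
    # Derivative of the sigmoid: s * (1 - s)
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

# The derivative peaks at z = 0, so each sigmoid layer scales the gradient by at most this value
max_grad = sigmoid_grad(0.0)
print(max_grad, max_grad ** 10)
```

ReLU has a derivative of exactly 1 for positive inputs, which is why it suffers less from this shrinking effect in deep stacks.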

In [None]:
try:
    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Dense, Dropout
    from tensorflow.keras.callbacks import EarlyStopping
    
    # Simple neural network
    model = Sequential([
        Dense(32, activation='relu', input_dim=X_train.shape[1]),
        Dense(16, activation='relu'),
        Dense(8, activation='relu'),
        Dense(1, activation='sigmoid')
    ])
    
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    
    print("Model architecture:")
    model.summary()
except ImportError:
    print('TensorFlow not installed. Install with: pip install tensorflow')
    print('For CPU-only: pip install tensorflow-cpu')
    print('For GPU (Linux): pip install "tensorflow[and-cuda]"')

## Part 2: Regularization Techniques

**Overfitting problem:** The model memorizes noise in the training data and generalizes poorly to unseen customers

**Solutions:**
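- Dropout: Randomly deactivate a fraction of neurons on each training step
- L1/L2: Add a penalty on weight magnitude to the loss (L1 encourages sparse weights, L2 shrinks large weights)
- Early Stopping: Stop training once validation loss stops improving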
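
As a rough illustration of the L1/L2 item, the penalty is an extra term added to the loss. The sketch below computes both penalties for a hypothetical weight vector; `lam` and `base_loss` are made-up numbers chosen for readability.

```python
import numpy as np

weights = np.array([0.5, -2.0, 0.0, 3.0])   # hypothetical layer weights
lam = 0.01                                  # hypothetical regularization strength
base_loss = 0.40                            # hypothetical cross-entropy loss

l1_penalty = lam * np.sum(np.abs(weights))  # grows linearly with |w|
l2_penalty = lam * np.sum(weights ** 2)     # grows quadratically, so large weights cost the most

print(f"L1 penalty: {l1_penalty:.4f} -> total loss {base_loss + l1_penalty:.4f}")
print(f"L2 penalty: {l2_penalty:.4f} -> total loss {base_loss + l2_penalty:.4f}")
```

In Keras, the same idea is applied per layer by passing `kernel_regularizer=regularizers.l2(0.01)` (from `tensorflow.keras.regularizers`) to a `Dense` layer.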

**💡 Depth Note:** How much dropout? Compare different rates.
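
To build intuition before comparing rates in Keras, here is a NumPy sketch of inverted dropout: zero a fraction `rate` of units, then scale the survivors by `1 / (1 - rate)`, which matches how the Keras `Dropout` layer is documented to behave during training. The `dropout` helper and the all-ones activations are illustrative assumptions.

```python
import numpy as np

def dropout(activations, rate, rng):
    # Keep each unit with probability (1 - rate), then rescale the survivors
    # so the expected activation stays the same as without dropout
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(42)
activations = np.ones((1000, 64))  # hypothetical hidden-layer output

for rate in (0.1, 0.3, 0.5):
    dropped = dropout(activations, rate, rng)
    zero_frac = np.mean(dropped == 0)
    print(f"rate={rate}: {zero_frac:.1%} of units zeroed, mean activation {dropped.mean():.3f}")
```

Higher rates remove more signal on each step, so a rate that is too high can cause underfitting on a small network like the one in this notebook.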

In [None]:
try:
    # Model with regularization
    model_reg = Sequential([
        Dense(64, activation='relu', input_dim=X_train.shape[1]),
        Dropout(0.3),
        Dense(32, activation='relu'),
        Dropout(0.3),
        Dense(16, activation='relu'),
        Dropout(0.2),
        Dense(1, activation='sigmoid')
    ])
    
    model_reg.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    
    # Train with early stopping
    early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
    
    print("Training model with regularization...")
    history = model_reg.fit(
        X_train, y_train,
        validation_split=0.2,
        epochs=50,
        batch_size=32,
        callbacks=[early_stop],
        verbose=0
    )
    
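    # Evaluate: predict once, then threshold at 0.5 for the class-based metrics
    y_proba = model_reg.predict(X_test, verbose=0).ravel()
    y_pred = (y_proba > 0.5).astype(int)
    auc = roc_auc_score(y_test, y_proba)
    print(f"\nRegularized Model - AUC: {auc:.4f}")
    print(f"Precision: {precision_score(y_test, y_pred, zero_division=0):.4f} | Recall: {recall_score(y_test, y_pred):.4f}")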
except Exception as e:
    print(f'Could not train model: {e}')

## Part 3: Deep Learning vs Classical ML

**Classical ML:** Interpretable, fast, works with small data

**Deep Learning:** Usually stronger on large datasets and complex patterns, but harder to interpret (a black box)

**💡 Depth Note:** When should you use each? Compare on different dataset sizes.
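
One way to run this comparison is to train both model families on growing samples. The sketch below uses synthetic data from `make_classification` as a stand-in for the churn table and scikit-learn's `MLPClassifier` as a lightweight neural network, so the numbers it prints say nothing about the course dataset. The sample sizes and layer sizes are arbitrary assumptions.

```python
import warnings
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

warnings.filterwarnings('ignore')  # silence convergence warnings on the small runs

results = []
for n in (200, 1000, 5000):
    # Synthetic stand-in for the churn table, not the course data
    X_syn, y_syn = make_classification(n_samples=n, n_features=10, n_informative=5, random_state=42)
    X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.2, random_state=42, stratify=y_syn)

    lr_syn = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    mlp_syn = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=42).fit(X_tr, y_tr)

    lr_score = roc_auc_score(y_te, lr_syn.predict_proba(X_te)[:, 1])
    mlp_score = roc_auc_score(y_te, mlp_syn.predict_proba(X_te)[:, 1])
    results.append((n, lr_score, mlp_score))
    print(f"n={n:>5} | LogReg AUC: {lr_score:.3f} | MLP AUC: {mlp_score:.3f}")
```

Repeating the loop with several random seeds gives a fairer picture, since a single split on 200 rows is noisy.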

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Classical baseline
lr = LogisticRegression(random_state=42, max_iter=1000)
lr.fit(X_train, y_train)
lr_auc = roc_auc_score(y_test, lr.predict_proba(X_test)[:, 1])

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
rf_auc = roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1])

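print("Comparison (test AUC):")
print(f"Logistic Regression: {lr_auc:.4f}")
print(f"Random Forest:       {rf_auc:.4f}")
# `auc` only exists if the neural network cell above ran successfully
if 'auc' in globals():
    print(f"Neural Network:      {auc:.4f}")
print(f"\n📊 For this dataset size ({len(df)} customers):")
print("   On small tabular data, classical ML often matches or beats a neural network")
print("   Deep learning tends to pay off with much larger datasets")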

## Hands-On Exercises

### Exercise 1: Hyperparameter Tuning
Try different layer sizes, dropout rates, learning rates. What's optimal?
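
One possible way to organize the search is to enumerate every combination up front and loop over it. The grid values below are arbitrary starting points, not recommendations, and the model-building step is left as a comment because it depends on your own code.

```python
from itertools import product

# Hypothetical search space; adjust to your compute budget
grid = {
    'hidden_units': [(32, 16), (64, 32, 16)],
    'dropout_rate': [0.1, 0.3, 0.5],
    'learning_rate': [1e-2, 1e-3],
}

# Cartesian product of all values, one dict per configuration
configs = [dict(zip(grid.keys(), values)) for values in product(*grid.values())]
print(f"{len(configs)} configurations to try")
for config in configs[:3]:
    # In the notebook, build, fit, and score a model for each config here
    print(config)
```

Score each configuration on a validation split rather than the test set, so the test AUC stays an honest estimate.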

### Exercise 2: Visualization
Plot training/validation loss curves. What do they tell us?
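
As a starting point for reading the curves, the sketch below summarizes a dictionary shaped like Keras's `history.history` (keys `loss` and `val_loss`). The numbers are invented for illustration and `summarize_curves` is a hypothetical helper, not part of Keras.

```python
def summarize_curves(history):
    # history: dict with per-epoch 'loss' and 'val_loss' lists
    val_loss = history['val_loss']
    best_idx = min(range(len(val_loss)), key=val_loss.__getitem__)
    return {
        'best_epoch': best_idx + 1,                              # 1-indexed epoch with the lowest validation loss
        'final_gap': val_loss[-1] - history['loss'][-1],         # validation minus training loss at the last epoch
        'stopped_improving': best_idx < len(val_loss) - 1,       # best validation loss came before the final epoch
    }

# Made-up curves: training loss keeps falling while validation loss turns upward
example_history = {
    'loss':     [0.70, 0.55, 0.45, 0.38, 0.33, 0.29],
    'val_loss': [0.72, 0.60, 0.52, 0.50, 0.53, 0.57],
}
summary = summarize_curves(example_history)
print(summary)
```

A widening gap after the best epoch is the usual sign of overfitting. To draw the curves, the same two lists can be passed to `matplotlib.pyplot.plot`.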

In [None]:
# TODO: Hyperparameter grid search
# TODO: Plot learning curves
# TODO: Compare architectures

## Key Takeaways

✅ Neural network fundamentals  
✅ Activation functions and backpropagation  
✅ Regularization for preventing overfitting  
✅ When to use deep learning vs classical ML  

## 🔜 Next Week: Capstone Project