<a href="https://colab.research.google.com/github/sreent/machine-learning/blob/main/Final%20DNN%20Code%20Examples/Bike%20Sharing/Bike%20Sharing%20-%20Regression%20Example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Bike Sharing - Regression Example

This notebook demonstrates the **Universal ML Workflow** applied to a **regression problem** - predicting continuous numerical values instead of categories.

## Learning Objectives

By the end of this notebook, you will be able to:
- Apply neural networks to **regression** (predicting continuous values)
- Understand differences between regression and classification:
  - Output layer: Linear activation vs. Softmax/Sigmoid
  - Loss function: MSE/MAE vs. Cross-entropy
  - Metrics: MAE, RMSE, R² vs. Accuracy, Precision, Recall
- Handle mixed feature types for regression problems
- Evaluate regression models with appropriate metrics

---

## Dataset Overview

| Attribute | Description |
|-----------|-------------|
| **Source** | [UCI Bike Sharing Dataset](https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset) |
| **Problem Type** | Regression |
| **Target Variable** | `cnt` - Total bike rental count |
| **Data Type** | Structured (Mixed Categorical & Numerical) |
| **Features** | Weather, date/time, and environmental variables |

---

## Regression vs. Classification

| Aspect | Regression | Classification |
|--------|------------|----------------|
| **Output** | Continuous number (e.g., 542 bikes) | Category (e.g., "spam") |
| **Output Activation** | Linear (none) | Softmax/Sigmoid |
| **Loss Function** | MSE, MAE, Huber | Cross-entropy |
| **Metrics** | MAE, RMSE, R² | Accuracy, F1, AUC |
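
The output-activation and loss rows of the table can be illustrated with a small NumPy sketch. The helper names below (`linear_output`, `sigmoid`, `softmax`, `mse`, `categorical_cross_entropy`) are illustrative and not part of the notebook; they only show how a regression head leaves the value unbounded while classification heads squash it into probabilities.

```python
import numpy as np

def linear_output(z):
    """Regression head: the pre-activation is returned unchanged."""
    return z

def sigmoid(z):
    """Binary classification head: squashes each value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Multi-class head: each row becomes a probability distribution."""
    shifted = z - np.max(z, axis=-1, keepdims=True)  # for numerical stability
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z, axis=-1, keepdims=True)

def mse(y_true, y_pred):
    """Typical regression loss."""
    return float(np.mean((y_true - y_pred) ** 2))

def categorical_cross_entropy(y_onehot, probs, eps=1e-12):
    """Typical classification loss (expects one-hot targets)."""
    return float(-np.mean(np.sum(y_onehot * np.log(probs + eps), axis=-1)))

z = np.array([542.0, -3.5, 0.0])
regression_pred = linear_output(z)                  # any real number is allowed
binary_prob = sigmoid(z)                            # always between 0 and 1
class_probs = softmax(np.array([[2.0, 1.0, 0.1]]))  # each row sums to 1

print("Linear output: ", regression_pred)
print("Sigmoid output:", binary_prob)
print("Softmax output:", class_probs)
```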

---

## 1. Defining the Problem and Assembling a Dataset

**Problem Statement:** Predict the total number of bike rentals for a given day based on weather and calendar features.

**Business Context:**
- Bike sharing companies need to plan bike distribution across stations
- Accurate demand prediction helps with maintenance scheduling
- Understanding demand drivers informs business strategy

## 2. Choosing a Measure of Success

**Regression Metrics:**

| Metric | Full Name | Interpretation |
|--------|-----------|----------------|
| **MAE** | Mean Absolute Error | Average prediction error in original units (bikes) |
| **RMSE** | Root Mean Squared Error | Penalizes large errors more heavily |
| **R²** | Coefficient of Determination | Proportion of variance explained (1 is perfect; can be negative on held-out data if the model does worse than predicting the mean) |

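**We'll use MAE as the primary metric** - it's interpretable ("on average, we're off by X bikes"). Note that the models below are trained with MSE as the loss, so the printed validation and test scores are MSE values.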
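
As a minimal sketch of how these three metrics are computed, the functions below implement them directly in NumPy. The function names and the four made-up rental counts are assumptions for illustration, not notebook data.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error, in the units of the target (bikes)."""
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root Mean Squared Error; squaring weights large errors more heavily."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Made-up daily counts, for illustration only
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 190.0, 320.0, 380.0])

print(f"MAE:  {mae(y_true, y_pred):.2f} bikes")   # MAE:  15.00 bikes
print(f"RMSE: {rmse(y_true, y_pred):.2f} bikes")  # RMSE: 15.81 bikes
print(f"R^2:  {r2(y_true, y_pred):.3f}")          # R^2:  0.980

# A predictor that is far worse than the mean yields a negative R^2
bad_pred = np.full_like(y_true, 1000.0)
print(f"R^2 of a poor constant predictor: {r2(y_true, bad_pred):.1f}")
```

RMSE is never smaller than MAE on the same predictions; the gap widens when a few errors are much larger than the rest.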

## 3. Deciding on an Evaluation Protocol

- **Hold-out Test Set (10%)**: Final evaluation
- **Validation Set**: Monitor training, early stopping
- **K-Fold Cross-Validation**: Hyperparameter tuning

**Note:** `stratify` requires discrete class labels, so for a continuous target we don't use it - instead we shuffle randomly.
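
To make the K-fold idea concrete, here is a minimal NumPy sketch of shuffled, non-stratified fold generation. The function `kfold_indices` and the sample count of 20 are hypothetical; in practice scikit-learn's `KFold(shuffle=True)` offers the same kind of split.

```python
import numpy as np

def kfold_indices(n_samples, n_splits=5, seed=204):
    """Yield (train_idx, val_idx) pairs from a shuffled, non-stratified split."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    folds = np.array_split(shuffled, n_splits)
    for i in range(n_splits):
        val_idx = folds[i]
        train_idx = np.concatenate(
            [folds[j] for j in range(n_splits) if j != i]
        )
        yield train_idx, val_idx

splits = list(kfold_indices(n_samples=20, n_splits=5))
for fold, (train_idx, val_idx) in enumerate(splits):
    print(f"Fold {fold}: {len(train_idx)} train / {len(val_idx)} validation samples")
```

Each sample lands in exactly one validation fold, so averaging the per-fold scores uses every training row for validation once.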

## 4. Preparing Your Data

### 4.1 Import Libraries

In [None]:
import pandas as pd
import numpy as np

from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split, KFold
from sklearn.metrics import mean_squared_error, r2_score

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, regularizers
# Import model and layer classes from the tf.keras namespace for consistency
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

# Keras Tuner for hyperparameter search
!pip install -q -U keras-tuner
import keras_tuner as kt

import itertools
import matplotlib.pyplot as plt

SEED = 204

tf.random.set_seed(SEED)
np.random.seed(SEED)

import warnings
warnings.filterwarnings('ignore')

In [349]:
df = pd.read_csv('Bike Sharing.csv', sep=',')

df.head()

Unnamed: 0,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,2011-01-01,1,0,1,0,6,0,2,0.344167,0.363625,0.805833,0.160446,331,654,985
1,2,2011-01-02,1,0,1,0,0,0,2,0.363478,0.353739,0.696087,0.248539,131,670,801
2,3,2011-01-03,1,0,1,0,1,1,1,0.196364,0.189405,0.437273,0.248309,120,1229,1349
3,4,2011-01-04,1,0,1,0,2,1,1,0.2,0.212122,0.590435,0.160296,108,1454,1562
4,5,2011-01-05,1,0,1,0,3,1,1,0.226957,0.22927,0.436957,0.1869,82,1518,1600


In [350]:
NUMERICAL_VARIABLES = ['temp', 'atemp', 'hum', 'windspeed']
CATEGORICAL_VARIABLES = ['season', 'holiday', 'weekday', 'workingday', 'weathersit']

In [351]:
features = df[NUMERICAL_VARIABLES + CATEGORICAL_VARIABLES]

In [352]:
TARGET_VARIABLE = 'cnt'

target = df[TARGET_VARIABLE]

In [353]:
TEST_SIZE = 0.1

X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=TEST_SIZE, 
                                                    random_state=SEED, shuffle=True)

In [354]:
preprocessor = ColumnTransformer([
    ('one-hot-encoder', OneHotEncoder(handle_unknown="ignore"), CATEGORICAL_VARIABLES),
    ('standard_scaler', StandardScaler(), NUMERICAL_VARIABLES)])

_ = preprocessor.fit(X_train)

In [355]:
X_train, X_test = preprocessor.transform(X_train), preprocessor.transform(X_test)

In [356]:
y_train, y_test = y_train.values, y_test.values

In [357]:
VALIDATION_SIZE = X_test.shape[0]

X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, 
                                                 test_size=VALIDATION_SIZE,
                                                 shuffle=True, random_state=SEED)

## 5. Developing a Model That Does Better Than a Baseline

**Regression Baselines:**
- **Mean Baseline:** Always predict the mean of training data
- **Linear Model:** Simple linear regression as a sanity check
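
Both baselines can be sketched in a few lines of NumPy. The data here is synthetic and the variable names are illustrative; the sketch scores on the same data it fits, whereas the notebook compares models on the validation set.

```python
import numpy as np

rng = np.random.default_rng(204)

# Synthetic features and target, for illustration only
X = rng.normal(size=(200, 3))
true_w = np.array([50.0, -20.0, 10.0])
y = 4500.0 + X @ true_w + rng.normal(scale=5.0, size=200)

# Mean baseline: always predict the average of the training targets
mean_pred = np.full_like(y, y.mean())
mean_mse = float(np.mean((y - mean_pred) ** 2))

# Linear baseline: ordinary least squares with an intercept column
X_design = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)
linear_mse = float(np.mean((y - X_design @ coef) ** 2))

print(f"Mean baseline MSE:   {mean_mse:.2f}")
print(f"Linear baseline MSE: {linear_mse:.2f}")
```

The MSE of the mean baseline equals the variance of the target, which is why the notebook can use that variance as its reference score.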

In [None]:
INPUT_DIMENSION = X_train.shape[1]

LEARNING_RATE = 1
LOSS_FUNC = 'mean_squared_error'

# Compute baseline: MSE obtained by always predicting the training-set mean
# (this equals the variance of y_train)
baseline = np.mean((y_train - np.mean(y_train))**2)

In [None]:
# Build a simple Single Layer Perceptron (no hidden layers)
# For regression: linear activation (no activation) on output layer
slp_model = Sequential([
    Dense(1, input_shape=(INPUT_DIMENSION,))
], name='Single_Layer_Perceptron')  # set the name via the constructor, not the private `_name`

slp_model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=LEARNING_RATE), 
                  loss=LOSS_FUNC)

slp_model.summary()

In [None]:
batch_size = 64
EPOCHS = 100

In [None]:
# Train the SLP model
history_slp = slp_model.fit(
    X_train, y_train,
    batch_size=batch_size, 
    epochs=EPOCHS,
    validation_data=(X_val, y_val),
    verbose=0
)

slp_val_score = slp_model.evaluate(X_val, y_val, verbose=0)

In [None]:
print('Mean Squared Error (Validation): {:.2f} (baseline={:.2f})'.format(slp_val_score, baseline))

In [None]:
def plot_training_history(history, monitor='loss'):
    loss, val_loss = history.history[monitor], history.history['val_' + monitor]

    if monitor == 'loss':
        monitor = monitor.capitalize()

    epochs = range(1, len(loss)+1)

    plt.plot(epochs, loss, 'b.', label=monitor)
    plt.plot(epochs, val_loss, 'r.', label='Validation ' + monitor)
    plt.xlim([0, len(loss)]) 
                              
    plt.title('Training and Validation ' + monitor)
    plt.xlabel('Epochs')
    plt.ylabel(monitor)
    plt.legend()
    plt.grid()
    
    _ = plt.show()

In [None]:
plot_training_history(history_slp, monitor='loss')

## 6. Scaling Up: Developing a Model That Overfits

Adding hidden layers to capture non-linear relationships between features and bike demand.
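
A toy example shows why a hidden layer matters. The target `y = |x|` is not linear, so the best straight line fits it poorly, while a one-hidden-layer ReLU network with two hand-set units represents it exactly. The weights below are chosen by hand for illustration, not learned, and the data is unrelated to the bike dataset.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# Toy non-linear target: y = |x|
x = np.linspace(-2.0, 2.0, 9)
y = np.abs(x)

# Best possible linear fit (slope + intercept) via least squares
design = np.column_stack([x, np.ones_like(x)])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
linear_mse = float(np.mean((y - design @ coef) ** 2))

# One hidden layer with two ReLU units: |x| = relu(x) + relu(-x)
W_hidden = np.array([[1.0, -1.0]])    # shape (1 input, 2 hidden units)
w_out = np.array([1.0, 1.0])          # linear output layer, no activation
hidden = relu(x[:, None] @ W_hidden)  # shape (9, 2)
mlp_pred = hidden @ w_out
mlp_mse = float(np.mean((y - mlp_pred) ** 2))

print(f"Linear model MSE: {linear_mse:.3f}")
print(f"ReLU network MSE: {mlp_mse:.3f}")
```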

In [ ]:
LEARNING_RATE = 0.01
EPOCHS = 200

# Build a Multi-Layer Perceptron with one hidden layer
mlp_model = Sequential([
    Dense(32, activation='relu', input_shape=(INPUT_DIMENSION,)),
    Dense(1)  # Linear output for regression
], name='Multi_Layer_Perceptron')  # set the name via the constructor, not the private `_name`

mlp_model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=LEARNING_RATE), 
                  loss=LOSS_FUNC)

mlp_model.summary()

In [None]:
# Train the MLP model
history_mlp = mlp_model.fit(
    X_train, y_train,
    batch_size=batch_size, 
    epochs=EPOCHS,
    validation_data=(X_val, y_val),
    verbose=0
)

mlp_val_score = mlp_model.evaluate(X_val, y_val, verbose=0)

In [None]:
print('Mean Squared Error (Validation): {:.2f} (baseline={:.2f})'.format(mlp_val_score, baseline))

In [None]:
plot_training_history(history_mlp, monitor='loss')

## 7. Regularizing Your Model and Tuning Hyperparameters

Using **Hyperband** for efficient hyperparameter tuning with a frozen architecture.

### Why Hyperband?

**Hyperband** is more efficient than grid search because it:
1. Starts training many configurations for a few epochs
2. Eliminates poor performers early
3. Allocates more resources to promising configurations

This "early stopping" approach saves compute time while still exploring widely.
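
The three steps above correspond to successive halving, the building block that Hyperband repeats over several brackets. The sketch below is a simplified, pure-Python illustration and not Keras Tuner's implementation; `successive_halving`, `toy_loss`, and the candidate learning rates are hypothetical.

```python
def successive_halving(configs, evaluate, min_epochs=1, factor=3):
    """Keep the best 1/factor of the configurations each round,
    multiplying the per-configuration epoch budget by `factor`."""
    survivors = list(configs)
    epochs = min_epochs
    history = []  # (number of configurations, epochs each) per round
    while True:
        scored = sorted(survivors, key=lambda cfg: evaluate(cfg, epochs))
        history.append((len(scored), epochs))
        if len(scored) == 1:
            return scored[0], history
        keep = max(1, len(scored) // factor)
        survivors = scored[:keep]  # eliminate poor performers early
        epochs *= factor           # give survivors more resources

def toy_loss(cfg, epochs):
    """Stand-in for a validation loss: lowest near lr=0.003,
    and improving as the epoch budget grows."""
    return abs(cfg["lr"] - 0.003) + 1.0 / epochs

candidates = [{"lr": lr} for lr in
              (0.0001, 0.0003, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0)]

best, history = successive_halving(candidates, toy_loss, min_epochs=1, factor=3)
print("Best configuration:", best)
for n_configs, n_epochs in history:
    print(f"{n_configs} configuration(s) trained for {n_epochs} epoch(s) each")
```

In this run, nine configurations cost 9 + 9 + 9 = 27 epochs in total, versus 81 epochs to train all nine for the full 9 epochs.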

In [ ]:
# Hyperband Model Builder for Regression
def build_model_hyperband(hp):
    """
    Build Bike Sharing model with FROZEN architecture (2 layers: 64 -> 32 neurons).
    Only tunes regularization (Dropout) and learning rate.
    """
    model = keras.Sequential()
    model.add(layers.Input(shape=(INPUT_DIMENSION,)))

    # Fixed architecture: 2 hidden layers with 64 and 32 neurons
    # Layer 1: 64 neurons
    model.add(layers.Dense(64, activation='relu'))
    drop_0 = hp.Float('drop_0', 0.0, 0.5, step=0.1)
    model.add(layers.Dropout(drop_0))

    # Layer 2: 32 neurons
    model.add(layers.Dense(32, activation='relu'))
    drop_1 = hp.Float('drop_1', 0.0, 0.5, step=0.1)
    model.add(layers.Dropout(drop_1))

    # Output layer for regression
    model.add(layers.Dense(1))

    lr = hp.Float('lr', 1e-4, 1e-2, sampling='log')
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=lr),
        loss=LOSS_FUNC
    )
    return model

In [None]:
# Configure Hyperband tuner
tuner = kt.Hyperband(
    build_model_hyperband,
    objective='val_loss',
    max_epochs=20,
    factor=3,
    directory='bike_hyperband',
    project_name='bike_tuning'
)

# Run Hyperband search
tuner.search(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=20,
    batch_size=batch_size
)

In [None]:
# Get best hyperparameters
best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]
print("Best hyperparameters:")
print(f"  Dropout Layer 1: {best_hp.get('drop_0')}")
print(f"  Dropout Layer 2: {best_hp.get('drop_1')}")
print(f"  Learning Rate: {best_hp.get('lr')}")

# Build and train the best model
opt_model = tuner.hypermodel.build(best_hp)
opt_model.summary()

In [None]:
# Train the best model
history_opt = opt_model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=50,
    batch_size=batch_size,
    verbose=1
)

opt_val_score = opt_model.evaluate(X_val, y_val, verbose=0)

In [None]:
print('Mean Squared Error (Validation): {:.2f} (baseline={:.2f})'.format(opt_val_score, baseline))

In [None]:
preds = opt_model.predict(X_test, verbose=0)

print('Mean Squared Error (Test): {:.2f} (baseline={:.2f})'.format(mean_squared_error(y_test, preds), baseline))

---

## 8. Key Takeaways

1. **Regression Output:** Linear activation allows any real number output
2. **Loss:** MSE/MAE instead of cross-entropy  
3. **Metrics:** MAE interpretable; R² shows variance explained

---

## Appendix: Making the Code More Modular

For larger projects or when you want to reuse code across multiple notebooks, you can encapsulate model building and training into reusable functions. Here's how to make this code more modular:

In [None]:
# Modular Model Builder Function
def build_regression_model(input_dimension, hidden_layers=0, hidden_neurons=32, 
                           activation='relu', dropout=None, 
                           optimizer='rmsprop', loss='mean_squared_error', 
                           name=None):
    """
    Build a neural network for regression.
    
    Parameters:
    -----------
    input_dimension : int
        Number of input features
    hidden_layers : int
        Number of hidden layers (0 = single layer perceptron)
    hidden_neurons : int
        Number of neurons per hidden layer
    activation : str
        Activation function for hidden layers
    dropout : float or None
        Dropout rate (None = no dropout)
    optimizer : str or optimizer
        Optimizer for training
    loss : str
        Loss function
    name : str
        Model name
    
    Returns:
    --------
    Compiled Keras Sequential model
    """
    # Pass the name at construction time (None lets Keras pick a default name)
    model = Sequential(name=name)
    
    for layer in range(hidden_layers):
        if layer == 0:
            model.add(Dense(hidden_neurons, activation=activation, 
                           input_shape=(input_dimension,)))
        else:
            model.add(Dense(hidden_neurons, activation=activation))
        
        if dropout is not None:
            model.add(Dropout(dropout))
    
    # Output layer - linear for regression
    if hidden_layers == 0:
        model.add(Dense(1, input_shape=(input_dimension,)))
    else:
        model.add(Dense(1))
        
    model.compile(optimizer=optimizer, loss=loss)
    
    return model

In [None]:
# Modular Training Function
def train_model(model, X_train, y_train, X_val, y_val,
                batch_size=32, epochs=100, callbacks=None, verbose=0):
    """
    Train a Keras model and return training history and validation score.
    
    Parameters:
    -----------
    model : Keras model
        Compiled model to train
    X_train, y_train : arrays
        Training data
    X_val, y_val : arrays
        Validation data
    batch_size : int
        Batch size for training
    epochs : int
        Number of training epochs
    callbacks : list
        Keras callbacks (e.g., EarlyStopping)
    verbose : int
        Verbosity level
    
    Returns:
    --------
    dict with keys: 'model', 'val_score', 'history'
    """
    if callbacks is None:
        callbacks = []
    
    history = model.fit(
        X_train, y_train,
        batch_size=batch_size, 
        epochs=epochs,
        validation_data=(X_val, y_val),
        callbacks=callbacks,
        verbose=verbose
    )
    
    val_score = model.evaluate(X_val, y_val, verbose=0)
    
    return {
        'model': model, 
        'val_score': val_score, 
        'history': history
    }

In [ ]:
# Example: Using the modular functions
# 
# # Build a model
# model = build_regression_model(
#     input_dimension=INPUT_DIMENSION,
#     hidden_layers=2,
#     hidden_neurons=64,
#     dropout=0.3,
#     optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
#     name='Modular_Model'
# )
#
# # Train the model
# results = train_model(
#     model, X_train, y_train, X_val, y_val,
#     batch_size=64, epochs=100
# )
#
# # Access results
# print(f"Validation Score: {results['val_score']}")
# plot_training_history(results['history'])