### California House Pricing Prediction Model

- 20,640 observations
- each observation describes a census block group rather than a single house, hence aggregate features such as medians and averages
- median house value is capped at $500,000

target: median house value for block group (MedHouseVal, in units of $100,000)

metric used: MAE (mean absolute error)

housing features:
- MedInc (median income)
- HouseAge (median age of houses in block group)
- AveRooms (average rooms)
- AveBedrms (average bedrooms)
- Population (block group population)
- AveOccup (people per household average)
- Latitude (geographical coordinate of block group)
- Longitude (geographical coordinate of block group; further west, i.e. more negative, is closer to the coast)

## Setup

In [None]:
import os, sys, random, json, hashlib, datetime

import numpy as np
import pandas as pd

import torch
import torch.nn as nn
import torch.optim as optim

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler # all features on comparable scale for stable and efficient training
from sklearn.metrics import mean_absolute_error, mean_squared_error

In [None]:
# ensure exact same randoms to allow for reproducibility and comparison
SEED = 33 # was 42, then 67

random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)

# ensure reproducibility
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

print(f"Random seed fixed at {SEED}")

Random seed fixed at 33


In [None]:
# environment info for reference
print("Python version:", sys.version)
print("Torch version:", torch.__version__)
print("Numpy version:", np.__version__)

Python version: 3.12.11 (main, Jun  4 2025, 08:56:18) [GCC 11.4.0]
Torch version: 2.8.0+cu126
Numpy version: 2.0.2


In [None]:
from google.colab import drive
drive.mount('/content/drive')

# creation of house_price_prediction/experiment_logs/
project_dir = "/content/drive/MyDrive/house_price_prediction"
log_dir = os.path.join(project_dir, "experiment_logs")
os.makedirs(log_dir, exist_ok=True)

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


### Log Experiments

In [None]:
# logging experiment's model properties and performance metrics into drive directory

def log_experiment(config: dict, metrics: dict, log_dir=log_dir):
  """Save config + metrics to JSON with timestamp + hash"""

  # timestamp creation
  timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
  # hash for reference between different configs
  config_hash = hashlib.md5(json.dumps(config, sort_keys=True).encode()).hexdigest()[:6]
  fname = f"exp_{timestamp}_{config_hash}.json"
  path = os.path.join(log_dir, fname)
  # config and metrics stored as JSON
  with open(path, "w") as f:
    json.dump({"config": config, "metrics": metrics}, f, indent=2)
  print(f"Experiment logged to {path}")

### Fetch Data

In [None]:
# loading dataset
housing = fetch_california_housing(as_frame=True) # pandas dataframe included
df = housing.frame

# structure
print("Shape:", df.shape)
print("Columns:", df.columns.tolist())

# view couple of rows (housing blocks)
print(df.head(2))

Shape: (20640, 9)
Columns: ['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', 'AveOccup', 'Latitude', 'Longitude', 'MedHouseVal']
   MedInc  HouseAge  AveRooms  AveBedrms  Population  AveOccup  Latitude  \
0  8.3252      41.0  6.984127    1.02381       322.0  2.555556     37.88   
1  8.3014      21.0  6.238137    0.97188      2401.0  2.109842     37.86   

   Longitude  MedHouseVal  
0    -122.23        4.526  
1    -122.22        3.585  


### Baseline Split

- train/val/test split as 0.7/0.15/0.15

- geographic cross-validation is a candidate for a later experiment
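
Geographic cross-validation is not implemented in this notebook. As a rough sketch of what it could look like, assuming folds are built from a lat/lon grid so that neighbouring block groups never straddle train and validation (the function name `geo_fold_ids`, the 1-degree cell size and the fold count are all hypothetical choices):

```python
import numpy as np

def geo_fold_ids(lat, lon, cell_deg=1.0, n_folds=5, seed=0):
    """Assign each row a fold id; all rows in one lat/lon grid cell share a fold."""
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    # grid cell index of every block group
    cells = np.stack([np.floor(lat / cell_deg), np.floor(lon / cell_deg)], axis=1)
    unique_cells, cell_idx = np.unique(cells, axis=0, return_inverse=True)
    cell_idx = cell_idx.reshape(-1)
    # shuffle the cells, then deal them out to folds round-robin
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(unique_cells))
    fold_of_cell = np.empty(len(unique_cells), dtype=int)
    fold_of_cell[order] = np.arange(len(unique_cells)) % n_folds
    return fold_of_cell[cell_idx]

# toy coordinates: two Bay Area rows, two Los Angeles rows, one San Diego row
toy_lat = [37.8, 37.9, 34.1, 34.2, 32.7]
toy_lon = [-122.2, -122.3, -118.2, -118.3, -117.1]
folds = geo_fold_ids(toy_lat, toy_lon, n_folds=3)
print(folds)
```

In the notebook this would be called with `X["Latitude"]` and `X["Longitude"]`, and each fold id would be held out in turn.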

In [None]:
# features and their targets
X = df.drop(columns=["MedHouseVal"])
y = df["MedHouseVal"]

# train / temp split
X_train, X_temp, y_train, y_temp = train_test_split(X,
                                                    y,
                                                    test_size=0.3,
                                                    random_state=SEED)
# validation / test split
X_val, X_test, y_val, y_test = train_test_split(X_temp,
                                                y_temp,
                                                test_size=0.5,
                                                random_state=SEED)

# print shapes
print("Train:", X_train.shape, y_train.shape)
print("Val: ", X_val.shape, y_val.shape)
print("Test: ", X_test.shape, y_test.shape)

Train: (14448, 8) (14448,)
Val:  (3096, 8) (3096,)
Test:  (3096, 8) (3096,)


### Preprocessing Data

- feature scaling: standardize feature values for stable optimization
- target transform: log1p compresses the range of MedHouseVal so that the loss reflects relative rather than absolute differences between values; expm1 inverts it
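
A minimal check of the target transform, using made-up target values in the same units as MedHouseVal ($100,000s). It only illustrates that expm1 undoes log1p and that the transformed range is narrower:

```python
import numpy as np

# hypothetical targets in units of $100,000 (not taken from the dataset)
y = np.array([0.5, 1.8, 3.585, 5.0])

y_log = np.log1p(y)       # forward transform: log(1 + y)
y_back = np.expm1(y_log)  # inverse transform: exp(y_log) - 1

# range before and after the transform
range_raw = y.max() - y.min()
range_log = y_log.max() - y_log.min()
print(range_raw, range_log)
```

Because the model is trained on `y_log`, every prediction has to go through `np.expm1` before it is compared with the original targets.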

In [None]:
import joblib

# standardize feature values
scaler = StandardScaler()
#X_train_scaled = scaler.fit_transform(X_train)
#X_val_scaled = scaler.transform(X_val)
#X_test_scaled = scaler.transform(X_test)

# range compression of MedHouseVal
y_train_log = np.log1p(y_train)
y_val_log = np.log1p(y_val)
y_test_log = np.log1p(y_test)

# save preprocessing artifacts
artifacts_dir = os.path.join(project_dir, "artifacts")
os.makedirs(artifacts_dir, exist_ok=True)

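# save scaler used for standardization
# note: the scaler is not fitted yet at this point (the fit calls above are commented out);
# the fitted scaler is created in the feature-engineering cell below and should be re-saved from there
scaler_path = os.path.join(artifacts_dir, "scaler.pkl")
joblib.dump(scaler, scaler_path) # joblib pickles the scaler object, including any fitted mean_/scale_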

# save range compression: log1p
target_transform_info = {
    "transform": "log1p",
    "inverse": "expm1"
}

with open(os.path.join(artifacts_dir, "target_transform.json"), "w") as f:
  json.dump(target_transform_info, f, indent=2)

print("Features scaled, targets transformed, artifacts saved.")

Features scaled, targets transformed, artifacts saved.


In practice we would usually fit a 'dummy regressor' first: a model that always predicts the mean of the training targets, whatever the input. Its error is the reference point.
- Gives a baseline that our model has to beat to be of any use
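
The notebook does not compute this baseline itself. A minimal sketch of how it could be done, assuming a constant predictor fitted on the training targets only (the helper name `dummy_mae` and the toy numbers are illustrative):

```python
import numpy as np

def dummy_mae(y_train, y_eval, strategy="mean"):
    """MAE of a constant predictor fitted on the training targets."""
    y_train = np.asarray(y_train, dtype=float)
    y_eval = np.asarray(y_eval, dtype=float)
    # the constant is derived from the training split only, to avoid leakage
    constant = np.mean(y_train) if strategy == "mean" else np.median(y_train)
    return float(np.mean(np.abs(y_eval - constant)))

# toy targets in units of $100,000 (hypothetical, not the real split)
toy_train = [1.0, 2.0, 3.0, 6.0]
toy_val = [2.0, 4.0]
print(dummy_mae(toy_train, toy_val, "mean"))
print(dummy_mae(toy_train, toy_val, "median"))
```

In the notebook this would be called as `dummy_mae(y_train, y_val)`. scikit-learn's `DummyRegressor` offers the same mean and median strategies.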

### Setting up the Data

### * Updates

- additional features, is_coastal feature, and log1p on skewed features

In [None]:
# @title
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# geographic encodings
def add_geo_encodings(df: pd.DataFrame) -> pd.DataFrame:

  out = df.copy()

  # degrees -> radians
  lat_rad = np.deg2rad(out["Latitude"].to_numpy())
  lon_rad = np.deg2rad(out["Longitude"].to_numpy())
  # new features
  out["lat_sin"] = np.sin(lat_rad)
  out["lat_cos"] = np.cos(lat_rad)
  out["lon_sin"] = np.sin(lon_rad)
  out["lon_cos"] = np.cos(lon_rad)
  return out

# coastal determination
def add_is_coastal(df: pd.DataFrame, band_deg: float = 0.8) -> pd.DataFrame:
  """
  Very rough heuristic, so band_deg is left tunable. Each latitude band is given a
  reference coastline longitude; a block group within band_deg degrees of it is flagged is_coastal
  """
  out = df.copy()
  lat = out["Latitude"].to_numpy()
  lon = out["Longitude"].to_numpy()

  # condition for determining longitude reference
  ref_lon = np.where(lat >= 36.0, -122.5,
                     np.where(lat >= 34.0, -118.5, -117.2))
  # comparing longitude to longitude reference
  is_coastal = (np.abs(lon - ref_lon) <= band_deg).astype(int)
  out["is_coastal"] = is_coastal
  return out

# ratio for better understanding of area density and affluence
"""def add_population_per_household(df: pd.DataFrame) -> pd.DataFrame:
  out = df.copy()

  # retrieve population and aveoccup
  population = out["Population"].to_numpy()
  ave_occup = out["AveOccup"].to_numpy()

  # calculating population_per_household
  population_per_household = np.divide(population, ave_occup, out=np.zeros_like(population),
                                       where=ave_occup!=0)
  out["population_per_household"] = population_per_household
  return out"""

# applying geo encodings, then coastal boolean [ADD add_population_per_household()]
X_train_fe = add_is_coastal(add_geo_encodings(X_train))
X_val_fe = add_is_coastal(add_geo_encodings(X_val))
X_test_fe = add_is_coastal(add_geo_encodings(X_test))

# log1p on skewed features only
skewed_cols = ["Population", "AveRooms", "AveBedrms", "AveOccup"]
for col in skewed_cols:
  for frame in (X_train_fe, X_val_fe, X_test_fe):
    frame[col] = np.log1p(frame[col])

# standardize
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_fe)
X_val_scaled = scaler.transform(X_val_fe)
X_test_scaled = scaler.transform(X_test_fe)

In [None]:
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader
import numpy as np
import os

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Device:", device)

# converting arrays (and series) into tensors
def to_tensor(x):
  return torch.tensor(x, dtype=torch.float32)

# feature data; array -> [N, n_features] tensor (13 columns after feature engineering)
Xtr_t = to_tensor(X_train_scaled)
Xva_t = to_tensor(X_val_scaled)
Xte_t = to_tensor(X_test_scaled)

# target data; series -> [N, 1] tensor
ytr_t = to_tensor(y_train_log.values if hasattr(y_train_log, "values") else y_train_log).view(-1, 1)
yva_t = to_tensor(y_val_log.values if hasattr(y_val_log, "values") else y_val_log).view(-1, 1)
yte_t = to_tensor(y_test_log.values if hasattr(y_test_log, "values") else y_test_log).view(-1, 1)

# combine
train_ds = TensorDataset(Xtr_t, ytr_t)
val_ds = TensorDataset(Xva_t, yva_t)
test_ds = TensorDataset(Xte_t, yte_t)

# dataloaders
BATCH_SIZE = 512
train_loader = DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle=True, drop_last=False)
val_loader = DataLoader(val_ds, batch_size=BATCH_SIZE, shuffle=False, drop_last=False)
test_loader = DataLoader(test_ds, batch_size=BATCH_SIZE, shuffle=False, drop_last=False)

Device: cpu


### Setting up the Model (64 -> 32 -> 16 -> 1)

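- dropout and weight decay to reduce overfitting (overreliance on individual units or large weights)
- batch normalization for more stable, efficient gradient updates
- ReduceLROnPlateau to lower the LR when val MAE stalls, reducing oscillation around the optimum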
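
To make the plateau rule concrete, here is a simplified pure-Python illustration. It is not PyTorch's implementation (which also has threshold and cooldown options); the class name `PlateauLR` is hypothetical. It multiplies the LR by `factor` once the metric has failed to improve for more than `patience` consecutive epochs:

```python
class PlateauLR:
    """Simplified plateau rule: shrink the LR after too many epochs without improvement."""

    def __init__(self, lr, factor=0.5, patience=5, min_lr=0.0):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, metric):
        if metric < self.best:
            self.best = metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        if self.bad_epochs > self.patience:
            # never go below min_lr
            self.lr = max(self.lr * self.factor, self.min_lr)
            self.bad_epochs = 0
        return self.lr

sched = PlateauLR(lr=1e-3, factor=0.5, patience=2)
# val MAE improves twice, stalls for three epochs, then improves again
history = [sched.step(m) for m in [0.50, 0.48, 0.49, 0.49, 0.49, 0.47]]
print(history)
```

The LR only drops on the third stalled epoch, and the reduction only matters if the scheduler is attached to the optimizer that is actually stepping the model.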

In [None]:
class MLPRegressor(nn.Module):
  def __init__(self, in_dim=8, p_drop=0.2):
    super().__init__()
    self.net = nn.Sequential(
        nn.Linear(in_dim, 64),
        nn.BatchNorm1d(64),
        nn.ReLU(),
        nn.Dropout(p_drop),

        nn.Linear(64, 32),
        nn.BatchNorm1d(32),
        nn.ReLU(),
        nn.Dropout(p_drop),

        nn.Linear(32, 16),
        nn.BatchNorm1d(16),
        nn.ReLU(),
        nn.Dropout(p_drop),

        nn.Linear(16, 1)
    )

  def forward(self, x):
    return self.net(x)

In [None]:
model = MLPRegressor(in_dim=Xtr_t.shape[1], p_drop=0.2).to(device)

In [None]:
criterion = nn.MSELoss() # mean squared error
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=5)

In [None]:
# evaluation function on validation for every epoch

def evaluate(model, loader, y_true_original): # y_true_original used so we don't have to convert targets to numpy once more
  model.eval()
  preds_log = []
  with torch.no_grad():
    for xb, _ in loader:
      xb = xb.to(device)
      preds_log.append(model(xb).cpu().numpy())

    # rearrange predictions into 1D vector
    preds_log = np.vstack(preds_log).reshape(-1)
    # invert log1p so predictions are back in MedHouseVal units ($100,000s)
    preds = np.expm1(preds_log)
    # series -> array
    y_true = np.array(y_true_original)
    # MAE & RMSE - used for interpretability
    mae = np.mean(np.abs(preds - y_true))
    rmse = np.sqrt(np.mean((preds - y_true) ** 2))

    return mae, rmse, preds_log

### Training Loop

- early stopping on val MAE (patience of 20 epochs)

measured metrics:
- MSE - mean squared error
- MAE - mean absolute error
- RMSE - root mean squared error
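
A small worked example of the three metrics on made-up numbers, to show how they relate (RMSE is the square root of MSE and is never smaller than MAE, because squaring weights large errors more heavily):

```python
import numpy as np

# hypothetical targets and predictions in units of $100,000
y_true = np.array([1.0, 2.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.0])

err = y_pred - y_true        # [0.5, 0.0, -2.0]
mse = np.mean(err ** 2)      # (0.25 + 0 + 4) / 3
mae = np.mean(np.abs(err))   # (0.5 + 0 + 2) / 3
rmse = np.sqrt(mse)

print(f"MSE={mse:.4f}  MAE={mae:.4f}  RMSE={rmse:.4f}")
```

The single large miss (2.0) dominates MSE and RMSE far more than it dominates MAE.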

In [None]:
EPOCHS = 200
patience = 20
best_val_mae = float("inf")
best_state = None
epochs_no_improve = 0

# target extraction as array
y_val_orig = y_val.values if hasattr(y_val, "values") else y_val
y_test_orig = y_test.values if hasattr(y_test, "values") else y_test

# loop
for epoch in range(1, EPOCHS + 1):
  model.train()
  train_loss = 0.0
  # f-prop, loss, b-prop, g-d
  for xb, yb in train_loader:
    xb, yb = xb.to(device), yb.to(device)
    optimizer.zero_grad()
    pred = model(xb)
    loss = criterion(pred, yb)
    loss.backward()
    optimizer.step()
    # loss.item() is average so multiply by batch size
    train_loss += loss.item() * xb.size(0)

  # average over all training samples
  train_loss /= len(train_loader.dataset)

  # eval on validation in original units ($100,000s)
  val_mae, val_rmse, _ = evaluate(model, val_loader, y_val_orig)
  # for potential LR reduction
  scheduler.step(val_mae)

  # results in log1p and $$$
  print(f"Epoch {epoch:03d} | train_loss(MSE-log): {train_loss:.4f} | "
        f"val_MAE($): {val_mae:.4f} | val_RMSE($): {val_rmse:.4f}")

  # early stopping
  if val_mae < best_val_mae - 1e-6:
    best_val_mae = val_mae
    best_state = {
        "epoch": epoch,
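        # clone the tensors: state_dict() returns references that later optimizer steps overwrite in place
        "model_state_dict": {k: v.detach().clone() for k, v in model.state_dict().items()},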
        "optimizer_state_dict": optimizer.state_dict(),
        "val_mae": best_val_mae
    }
    epochs_no_improve = 0
  else:
    epochs_no_improve += 1
    if epochs_no_improve >= patience:
      print(f"Early stopping at epoch {epoch} (no improvement for {patience} epochs).")
      break

best_model_path = os.path.join(artifacts_dir, "mlp_regressor_best.pt")
if best_state is not None:
  torch.save(best_state, best_model_path)
  print(f"Saved best model to: {best_model_path} (best val MAE: {best_val_mae:.4f})")
else:
  print("Warning: no best state captured.")

Epoch 001 | train_loss(MSE-log): 1.0834 | val_MAE($): 1.8704 | val_RMSE($): 2.1357
Epoch 002 | train_loss(MSE-log): 0.6064 | val_MAE($): 1.4785 | val_RMSE($): 1.7565
Epoch 003 | train_loss(MSE-log): 0.3479 | val_MAE($): 1.1277 | val_RMSE($): 1.4254
Epoch 004 | train_loss(MSE-log): 0.2307 | val_MAE($): 0.8744 | val_RMSE($): 1.1853
Epoch 005 | train_loss(MSE-log): 0.1779 | val_MAE($): 0.6898 | val_RMSE($): 0.9923
Epoch 006 | train_loss(MSE-log): 0.1648 | val_MAE($): 0.6012 | val_RMSE($): 0.8867
Epoch 007 | train_loss(MSE-log): 0.1536 | val_MAE($): 0.5797 | val_RMSE($): 0.8601
Epoch 008 | train_loss(MSE-log): 0.1431 | val_MAE($): 0.5569 | val_RMSE($): 0.8342
Epoch 009 | train_loss(MSE-log): 0.1398 | val_MAE($): 0.5439 | val_RMSE($): 0.8167
Epoch 010 | train_loss(MSE-log): 0.1353 | val_MAE($): 0.5220 | val_RMSE($): 0.7793
Epoch 011 | train_loss(MSE-log): 0.1309 | val_MAE($): 0.5052 | val_RMSE($): 0.7627
Epoch 012 | train_loss(MSE-log): 0.1266 | val_MAE($): 0.5121 | val_RMSE($): 0.7795
Epoc

- Best (lowest) val_MAE of 0.4622 (~$46k) at epoch 51; beyond that, slight overfitting sets in.

### Changes to be made:
- Change model architecture to 64 -> 32 -> 1; a simpler model should generalize better and overfit less.
- Use SiLU instead of ReLU, which is smooth and lets small negative pre-activations through instead of zeroing them.
- Remove BatchNorm, as tabular data sometimes performs better without it.
- Switch to nn.SmoothL1Loss: quadratic (MSE-like) for small errors and linear (MAE-like) for large ones, which limits the influence of outliers.
- Reduce dropout (0.2 -> 0.1) to leave the smaller network more capacity.
- Lower LR (to 5e-4) to reduce the risk of overshooting minima.
- Higher batch size to reduce gradient noise.
- Early stopping patience reduced to 10.
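
To illustrate the SmoothL1Loss point, here is a numpy sketch of the same piecewise formula, assuming the default `beta=1.0` and mean reduction (the helper name `smooth_l1` is illustrative; the notebook itself uses `nn.SmoothL1Loss`):

```python
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    """Mean smooth L1 loss: quadratic below beta, linear above it."""
    diff = np.abs(np.asarray(pred, dtype=float) - np.asarray(target, dtype=float))
    loss = np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    return float(loss.mean())

small = smooth_l1([0.2], [0.0])  # quadratic branch: 0.5 * 0.2**2
large = smooth_l1([3.0], [0.0])  # linear branch: 3.0 - 0.5
print(small, large)
```

Under plain MSE the error of 3.0 would contribute 9.0; here it contributes 2.5, which is why outliers pull the fit less.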

ReduceLROnPlateau [NOT IMPLEMENTED]

### Setting up Model 2 w/ above changes

In [None]:
class MLPRegressor2(nn.Module):
  def __init__(self, in_dim=8, p_drop=0.1):
    super().__init__()
    self.net = nn.Sequential(
        nn.Linear(in_dim, 64),
        nn.SiLU(),
        nn.Dropout(p_drop),

        nn.Linear(64, 32),
        nn.SiLU(),
        nn.Dropout(p_drop),

        nn.Linear(32, 1)
    )

  def forward(self, x):
    return self.net(x)

In [None]:
model_2 = MLPRegressor2(in_dim=Xtr_t.shape[1], p_drop=0.1).to(device)

In [None]:
BATCH_SIZE_2 = 1024
train_loader_2 = DataLoader(train_ds, batch_size=BATCH_SIZE_2, shuffle=True, drop_last=False)
val_loader_2 = DataLoader(val_ds, batch_size=BATCH_SIZE_2, shuffle=False, drop_last=False)
test_loader_2 = DataLoader(test_ds, batch_size=BATCH_SIZE_2, shuffle=False, drop_last=False)

In [None]:
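criterion_2 = nn.SmoothL1Loss() # quadratic for small errors, linear for large errors
optimizer_2 = torch.optim.Adam(model_2.parameters(), lr=5e-4, weight_decay=1e-4)
# note: the scheduler below is bound to `optimizer` (model 1), not `optimizer_2`,
# so it never changes model_2's LR; left as run, see the LR check in Training Loop 5
scheduler_2 = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=5)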

### Training Loop 2

In [None]:
EPOCHS = 200
patience_2 = 10
best_val_mae_2 = float("inf")
best_state_2 = None # same name that is checked before saving
epochs_no_improve = 0

# target extraction as array
y_val_orig = y_val.values if hasattr(y_val, "values") else y_val
y_test_orig = y_test.values if hasattr(y_test, "values") else y_test

# loop
for epoch in range(1, EPOCHS + 1):
  model_2.train()
  train_loss = 0.0
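  # f-prop, loss, b-prop, g-d
  # note: iterates train_loader (batch size 512); train_loader_2 (1024) defined above is not used in this run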
  for xb, yb in train_loader:
    xb, yb = xb.to(device), yb.to(device)
    optimizer_2.zero_grad()
    pred = model_2(xb)
    loss = criterion_2(pred, yb)
    loss.backward()
    optimizer_2.step()
    # loss.item() is average so multiply by batch size
    train_loss += loss.item() * xb.size(0)

  # average over all batches
  train_loss /= len(train_loader.dataset)

  # eval on validation in $$$
  val_mae_2, val_rmse_2, _ = evaluate(model_2, val_loader, y_val_orig)
  # for potential LR reduction
  scheduler_2.step(val_mae_2)

  # results in log1p and $$$
  print(f"Epoch {epoch:03d} | train_loss(MSE-log): {train_loss:.4f} | "
        f"val_MAE($): {val_mae_2:.4f} | val_RMSE($): {val_rmse:.4f}")

  # early stopping
  if val_mae_2 < best_val_mae_2 - 1e-6:
    best_val_mae_2 = val_mae_2
    best_state_2 = {
        "epoch": epoch,
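        # clone the tensors so the snapshot is not overwritten by later updates
        "model_state_dict": {k: v.detach().clone() for k, v in model_2.state_dict().items()},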
        "optimizer_state_dict": optimizer_2.state_dict(),
        "val_mae": best_val_mae_2
    }
    epochs_no_improve = 0
  else:
    epochs_no_improve += 1
    if epochs_no_improve >= patience_2:
      print(f"Early stopping at epoch {epoch} (no improvement for {patience_2} epochs).")
      break

best_model_path_2 = os.path.join(artifacts_dir, "mlp_regressor_best_2.pt")
if best_state_2 is not None:
  torch.save(best_state_2, best_model_path_2)
  print(f"Saved best model to: {best_model_path_2} (best val MAE: {best_val_mae_2:.4f})")
else:
  print("Warning: no best state captured.")

Epoch 001 | train_loss(MSE-log): 0.3682 | val_MAE($): 1.6326 | val_RMSE($): 0.7312
Epoch 002 | train_loss(MSE-log): 0.2275 | val_MAE($): 1.2299 | val_RMSE($): 0.7312
Epoch 003 | train_loss(MSE-log): 0.0983 | val_MAE($): 0.7749 | val_RMSE($): 0.7312
Epoch 004 | train_loss(MSE-log): 0.0470 | val_MAE($): 0.6763 | val_RMSE($): 0.7312
Epoch 005 | train_loss(MSE-log): 0.0406 | val_MAE($): 0.6308 | val_RMSE($): 0.7312
Epoch 006 | train_loss(MSE-log): 0.0372 | val_MAE($): 0.5959 | val_RMSE($): 0.7312
Epoch 007 | train_loss(MSE-log): 0.0339 | val_MAE($): 0.5697 | val_RMSE($): 0.7312
Epoch 008 | train_loss(MSE-log): 0.0320 | val_MAE($): 0.5488 | val_RMSE($): 0.7312
Epoch 009 | train_loss(MSE-log): 0.0304 | val_MAE($): 0.5331 | val_RMSE($): 0.7312
Epoch 010 | train_loss(MSE-log): 0.0293 | val_MAE($): 0.5219 | val_RMSE($): 0.7312
Epoch 011 | train_loss(MSE-log): 0.0280 | val_MAE($): 0.5123 | val_RMSE($): 0.7312
Epoch 012 | train_loss(MSE-log): 0.0275 | val_MAE($): 0.5049 | val_RMSE($): 0.7312
Epoc

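Note: the val_RMSE column above is stale. The print statement uses val_rmse (model 1's last value) instead of val_rmse_2, hence the constant 0.7312.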

### Best Val MAE: 0.4348

### Good MAEs for reference

- Dummy baseline: ~0.52-0.55 MAE
- Tree ensembles: ~0.33-0.36 MAE
- Plain MLPs (this notebook): MAE in the 0.40s

While an error of roughly $40,000 feels like a lot, relative to Californian home values in this dataset (mostly between about $120k and $500k) it isn't bad for data this noisy.
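
Since MedHouseVal is expressed in units of $100,000, the MAE values reported so far convert to dollars as follows (a trivial helper, named `mae_to_dollars` here for illustration):

```python
def mae_to_dollars(mae_units, unit=100_000):
    """Convert an MAE in MedHouseVal units to dollars."""
    return mae_units * unit

# best val MAE of the first two models, as reported above
for label, mae in [("model 1", 0.4622), ("model 2", 0.4348)]:
    print(f"{label}: ${mae_to_dollars(mae):,.0f}")
```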

### * Updates in an attempt to improve val_MAE:

Feature engineering
- Geographic encoding of lon/lat, so the model can pick up patterns shared by houses in similar locations
- Binary (boolean) features, e.g. is_coastal, for more explicit identification
- log1p(x) on skewed features to limit the influence of extreme outliers and improve generalization

### Training Loop 3 w/ *updates

In [None]:
EPOCHS = 200
patience_2 = 10
best_val_mae_2 = float("inf")
best_state_2 = None # same name that is checked before saving
epochs_no_improve = 0
early_stopping = False

# target extraction as array
y_val_orig = y_val.values if hasattr(y_val, "values") else y_val
y_test_orig = y_test.values if hasattr(y_test, "values") else y_test

# loop
for epoch in range(1, EPOCHS + 1):
  model_2.train()
  train_loss = 0.0
  # f-prop, loss, b-prop, g-d
  for xb, yb in train_loader:
    xb, yb = xb.to(device), yb.to(device)
    optimizer_2.zero_grad()
    pred = model_2(xb)
    loss = criterion_2(pred, yb)
    loss.backward()
    optimizer_2.step()
    # loss.item() is average so multiply by batch size
    train_loss += loss.item() * xb.size(0)

  # average over all batches
  train_loss /= len(train_loader.dataset)

  # eval on validation in $$$
  val_mae_2, val_rmse_2, _ = evaluate(model_2, val_loader, y_val_orig)
  # for potential LR reduction
  scheduler_2.step(val_mae_2)

  # results in log1p and $$$
  print(f"Epoch {epoch:03d} | train_loss(MSE-log): {train_loss:.4f} | "
        f"val_MAE($): {val_mae_2:.4f} | val_RMSE($): {val_rmse_2:.4f}")

  # early stopping
  if val_mae_2 < best_val_mae_2 - 1e-6:
    best_val_mae_2 = val_mae_2
    best_state_2 = {
        "epoch": epoch,
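        # clone the tensors so the snapshot is not overwritten by later updates
        "model_state_dict": {k: v.detach().clone() for k, v in model_2.state_dict().items()},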
        "optimizer_state_dict": optimizer_2.state_dict(),
        "val_mae": best_val_mae_2
    }
    epochs_no_improve = 0
  else:
    epochs_no_improve += 1
    if epochs_no_improve >= patience_2 and early_stopping:
      print(f"Early stopping at epoch {epoch} (no improvement for {patience_2} epochs).")
      break

best_model_path_2 = os.path.join(artifacts_dir, "mlp_regressor_best_3.pt")
if best_state_2 is not None:
  torch.save(best_state_2, best_model_path_2)
  print(f"Saved best model to: {best_model_path_2} (best val MAE: {best_val_mae_2:.4f})")
else:
  print("Warning: no best state captured.")

Epoch 001 | train_loss(MSE-log): 0.4631 | val_MAE($): 1.7408 | val_RMSE($): 2.0656
Epoch 002 | train_loss(MSE-log): 0.2527 | val_MAE($): 1.1988 | val_RMSE($): 1.5434
Epoch 003 | train_loss(MSE-log): 0.1054 | val_MAE($): 0.8756 | val_RMSE($): 1.2206
Epoch 004 | train_loss(MSE-log): 0.0670 | val_MAE($): 0.7702 | val_RMSE($): 1.1319
Epoch 005 | train_loss(MSE-log): 0.0528 | val_MAE($): 0.6845 | val_RMSE($): 1.0371
Epoch 006 | train_loss(MSE-log): 0.0439 | val_MAE($): 0.6207 | val_RMSE($): 0.9367
Epoch 007 | train_loss(MSE-log): 0.0378 | val_MAE($): 0.5725 | val_RMSE($): 0.8609
Epoch 008 | train_loss(MSE-log): 0.0335 | val_MAE($): 0.5341 | val_RMSE($): 0.7976
Epoch 009 | train_loss(MSE-log): 0.0300 | val_MAE($): 0.5087 | val_RMSE($): 0.7518
Epoch 010 | train_loss(MSE-log): 0.0285 | val_MAE($): 0.4893 | val_RMSE($): 0.7193
Epoch 011 | train_loss(MSE-log): 0.0269 | val_MAE($): 0.4748 | val_RMSE($): 0.6938
Epoch 012 | train_loss(MSE-log): 0.0265 | val_MAE($): 0.4648 | val_RMSE($): 0.6785
Epoc

note: model_2 has been reset and parameters messed up

### Best val_MAE: 0.3786 - strong

- The updates have had an effect
- Removing early stopping may also have helped reach the 0.30s range

### Updates for Training Loop 4

- include SWA (stochastic weight averaging), which averages the weights over selected late epochs to smooth out late-epoch oscillations.
- set min_lr of 1e-6 for ReduceLROnPlateau and raise the epoch count to 300, so the LR can keep shrinking for fine-grained refinement without decaying to effectively zero [NOT IMPLEMENTED]
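
The averaging that SWA performs can be sketched without PyTorch. The toy below assumes an equal-weight running average over snapshots, which is the default behaviour of `AveragedModel`; the class `RunningWeightAverage` and the oscillating toy weights are hypothetical:

```python
import numpy as np

class RunningWeightAverage:
    """Equal-weight running average of weight snapshots."""

    def __init__(self):
        self.avg = None
        self.n = 0

    def update(self, weights):
        w = np.asarray(weights, dtype=float)
        if self.avg is None:
            self.avg = w.copy()
        else:
            # incremental mean: avg <- avg + (w - avg) / (n + 1)
            self.avg += (w - self.avg) / (self.n + 1)
        self.n += 1
        return self.avg

swa = RunningWeightAverage()
swa_start, swa_update_freq = 200, 5  # values mirror the notebook's SWA config

for epoch in range(1, 216):
    # toy "weights" that bounce between two points late in training
    weights = [1.0, -1.0] if epoch % 10 == 0 else [3.0, 1.0]
    if epoch >= swa_start and (epoch - swa_start) % swa_update_freq == 0:
        swa.update(weights)

print(swa.n, swa.avg)
```

The average sits between the two points the weights bounce between. With PyTorch, the benefit only shows up if the averaged model (`swa_model`) is the one evaluated; since MLPRegressor2 has no BatchNorm layers, `update_bn` should not be needed.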

In [None]:
model_3 = MLPRegressor2(in_dim=Xtr_t.shape[1], p_drop=0.1).to(device)

In [None]:
BATCH_SIZE_3 = 1024
train_loader_3 = DataLoader(train_ds, batch_size=BATCH_SIZE_3, shuffle=True, drop_last=False)
val_loader_3 = DataLoader(val_ds, batch_size=BATCH_SIZE_3, shuffle=False, drop_last=False)
test_loader_3 = DataLoader(test_ds, batch_size=BATCH_SIZE_3, shuffle=False, drop_last=False)

In [None]:
criterion_3 = nn.SmoothL1Loss() # MSE for low errors and MAE for high errors
optimizer_3 = torch.optim.Adam(model_3.parameters(), lr=5e-4, weight_decay=1e-4)
scheduler_3 = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=6, min_lr=1e-6)

In [None]:
from torch.optim.swa_utils import AveragedModel, update_bn
import numpy as np
import torch, os

EPOCHS = 300
patience_3 = 10
best_val_mae_3 = float("inf")
best_state_3 = None # same name that is checked before saving
epochs_no_improve = 0
early_stopping = False

# SWA config
swa_start = 200
swa_update_freq = 5
swa_model = AveragedModel(model_3).to(device)

# target extraction as array
y_val_orig = y_val.values if hasattr(y_val, "values") else y_val
y_test_orig = y_test.values if hasattr(y_test, "values") else y_test

# loop
for epoch in range(1, EPOCHS + 1):
  model_3.train()
  train_loss = 0.0
  # f-prop, loss, b-prop, g-d
  for xb, yb in train_loader:
    xb, yb = xb.to(device), yb.to(device)
    optimizer_3.zero_grad()
    pred = model_3(xb)
    loss = criterion_3(pred, yb)
    loss.backward()
    optimizer_3.step()
    # loss.item() is average so multiply by batch size
    train_loss += loss.item() * xb.size(0)

  # average over all batches
  train_loss /= len(train_loader.dataset)

  # eval on validation in $$$
  val_mae_3, val_rmse_3, _ = evaluate(model_3, val_loader, y_val_orig)
  # for potential LR reduction
  scheduler_3.step(val_mae_3)

  # results in log1p and $$$
  print(f"Epoch {epoch:03d} | train_loss(MSE-log): {train_loss:.4f} | "
        f"val_MAE($): {val_mae_3:.4f} | val_RMSE($): {val_rmse_3:.4f}")

  # early stopping
  if val_mae_3 < best_val_mae_3 - 1e-6:
    best_val_mae_3 = val_mae_3
    best_state_3 = {
        "epoch": epoch,
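        # clone the tensors so the snapshot is not overwritten by later updates
        "model_state_dict": {k: v.detach().clone() for k, v in model_3.state_dict().items()},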
        "optimizer_state_dict": optimizer_3.state_dict(),
        "val_mae": best_val_mae_3
    }
    epochs_no_improve = 0
  else:
    epochs_no_improve += 1
    if epochs_no_improve >= patience_3 and early_stopping:
      print(f"Early stopping at epoch {epoch} (no improvement for {patience_3} epochs).")
      break

  # SWA update
  if epoch >= swa_start and ((epoch - swa_start) % swa_update_freq == 0):
    swa_model.update_parameters(model_3)

best_model_path_3 = os.path.join(artifacts_dir, "mlp_regressor_best_4.pt")
if best_state_3 is not None:
  torch.save(best_state_3, best_model_path_3)
  print(f"Saved best model to: {best_model_path_3} (best val MAE: {best_val_mae_3:.4f})")
else:
  print("Warning: no best state captured.")

Epoch 001 | train_loss(MSE-log): 0.3620 | val_MAE($): 1.5829 | val_RMSE($): 1.9343
Epoch 002 | train_loss(MSE-log): 0.1986 | val_MAE($): 1.1006 | val_RMSE($): 1.4524
Epoch 003 | train_loss(MSE-log): 0.0757 | val_MAE($): 0.7471 | val_RMSE($): 1.0541
Epoch 004 | train_loss(MSE-log): 0.0468 | val_MAE($): 0.6627 | val_RMSE($): 0.9906
Epoch 005 | train_loss(MSE-log): 0.0388 | val_MAE($): 0.6077 | val_RMSE($): 0.9112
Epoch 006 | train_loss(MSE-log): 0.0341 | val_MAE($): 0.5683 | val_RMSE($): 0.8490
Epoch 007 | train_loss(MSE-log): 0.0304 | val_MAE($): 0.5359 | val_RMSE($): 0.8026
Epoch 008 | train_loss(MSE-log): 0.0280 | val_MAE($): 0.5124 | val_RMSE($): 0.7598
Epoch 009 | train_loss(MSE-log): 0.0264 | val_MAE($): 0.4927 | val_RMSE($): 0.7255
Epoch 010 | train_loss(MSE-log): 0.0248 | val_MAE($): 0.4779 | val_RMSE($): 0.6995
Epoch 011 | train_loss(MSE-log): 0.0240 | val_MAE($): 0.4664 | val_RMSE($): 0.6791
Epoch 012 | train_loss(MSE-log): 0.0230 | val_MAE($): 0.4568 | val_RMSE($): 0.6655
Epoc

### Best val_MAE: 0.3725

- Slight improvement following SWA. I do wonder how it would perform if we simply removed the min_lr.

### Updates for Training Loop 5

- epochs to 500
- min_lr to 1e-7

In [None]:
model_4 = MLPRegressor2(in_dim=Xtr_t.shape[1], p_drop=0.1).to(device)

In [None]:
BATCH_SIZE_4 = 1024
train_loader_4 = DataLoader(train_ds, batch_size=BATCH_SIZE_4, shuffle=True, drop_last=False)
val_loader_4 = DataLoader(val_ds, batch_size=BATCH_SIZE_4, shuffle=False, drop_last=False)
test_loader_4 = DataLoader(test_ds, batch_size=BATCH_SIZE_4, shuffle=False, drop_last=False)

In [None]:
criterion_4 = nn.SmoothL1Loss() # MSE for low errors and MAE for high errors
optimizer_4 = torch.optim.Adam(model_4.parameters(), lr=5e-4, weight_decay=1e-4)
scheduler_4 = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=6, min_lr=1e-7)

In [None]:
from torch.optim.swa_utils import AveragedModel, update_bn
import numpy as np
import torch, os

EPOCHS = 500
patience_4 = 10
best_val_mae_4 = float("inf")
best_state_4 = None # same name that is checked before saving
epochs_no_improve = 0
early_stopping = False

# SWA config
swa_start = 350
swa_update_freq = 5
swa_model = AveragedModel(model_4).to(device)

# target extraction as array
y_val_orig = y_val.values if hasattr(y_val, "values") else y_val
y_test_orig = y_test.values if hasattr(y_test, "values") else y_test

# loop
for epoch in range(1, EPOCHS + 1):
  model_4.train()
  train_loss = 0.0
  # f-prop, loss, b-prop, g-d
  for xb, yb in train_loader:
    xb, yb = xb.to(device), yb.to(device)
    optimizer_4.zero_grad()
    pred = model_4(xb)
    loss = criterion_4(pred, yb)
    loss.backward()
    optimizer_4.step()
    # loss.item() is average so multiply by batch size
    train_loss += loss.item() * xb.size(0)

  # average over all batches
  train_loss /= len(train_loader.dataset)

  # eval on validation in $$$
  val_mae_4, val_rmse_4, _ = evaluate(model_4, val_loader, y_val_orig)
  # for potential LR reduction
  scheduler_4.step(val_mae_4)

  current_lr = optimizer_4.param_groups[0]["lr"]

  # results in log1p and $$$
  print(f"Epoch {epoch:03d} | train_loss(MSE-log): {train_loss:.4f} | "
        f"val_MAE($): {val_mae_4:.4f} | val_RMSE($): {val_rmse_4:.4f} | LR: {current_lr:.2e}")

  # early stopping
  if val_mae_4 < best_val_mae_4 - 1e-6:
    best_val_mae_4 = val_mae_4
    best_state_4 = {
        "epoch": epoch,
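        # clone the tensors so the snapshot is not overwritten by later updates
        "model_state_dict": {k: v.detach().clone() for k, v in model_4.state_dict().items()},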
        "optimizer_state_dict": optimizer_4.state_dict(),
        "val_mae": best_val_mae_4
    }
    epochs_no_improve = 0
  else:
    epochs_no_improve += 1
    if epochs_no_improve >= patience_4 and early_stopping:
      print(f"Early stopping at epoch {epoch} (no improvement for {patience_4} epochs).")
      break

  # SWA update
  if epoch >= swa_start and ((epoch - swa_start) % swa_update_freq == 0):
    swa_model.update_parameters(model_4)

best_model_path_4 = os.path.join(artifacts_dir, "mlp_regressor_best_5.pt")
if best_state_4 is not None:
  torch.save(best_state_4, best_model_path_4)
  print(f"Saved best model to: {best_model_path_4} (best val MAE: {best_val_mae_4:.4f})")
else:
  print("Warning: no best state captured.")

Epoch 001 | train_loss(MSE-log): 0.2247 | val_MAE($): 1.1989 | val_RMSE($): 1.5494 | LR: 5.00e-04
Epoch 002 | train_loss(MSE-log): 0.0902 | val_MAE($): 0.7959 | val_RMSE($): 1.1076 | LR: 5.00e-04
Epoch 003 | train_loss(MSE-log): 0.0525 | val_MAE($): 0.6787 | val_RMSE($): 0.9770 | LR: 5.00e-04
Epoch 004 | train_loss(MSE-log): 0.0422 | val_MAE($): 0.6190 | val_RMSE($): 0.9029 | LR: 5.00e-04
Epoch 005 | train_loss(MSE-log): 0.0359 | val_MAE($): 0.5752 | val_RMSE($): 0.8522 | LR: 5.00e-04
Epoch 006 | train_loss(MSE-log): 0.0319 | val_MAE($): 0.5394 | val_RMSE($): 0.7998 | LR: 5.00e-04
Epoch 007 | train_loss(MSE-log): 0.0292 | val_MAE($): 0.5121 | val_RMSE($): 0.7586 | LR: 5.00e-04
Epoch 008 | train_loss(MSE-log): 0.0271 | val_MAE($): 0.4919 | val_RMSE($): 0.7218 | LR: 5.00e-04
Epoch 009 | train_loss(MSE-log): 0.0253 | val_MAE($): 0.4743 | val_RMSE($): 0.6949 | LR: 5.00e-04
Epoch 010 | train_loss(MSE-log): 0.0244 | val_MAE($): 0.4635 | val_RMSE($): 0.6778 | LR: 5.00e-04
Epoch 011 | train_lo

### Best val_MAE: 0.3686

- It seems the LR hasn't changed at all... scheduler_2 to scheduler_4 were constructed with optimizer (model 1's optimizer) rather than their own, so no LR reduction ever reached models 2-4.

### Updates for Training Loop 6

- actually implement ReduceLROnPlateau: pass optimizer_5 to the scheduler so LR reductions apply to the model being trained

In [None]:
model_5 = MLPRegressor2(in_dim=Xtr_t.shape[1], p_drop=0.1).to(device)

In [None]:
BATCH_SIZE_5 = 1024
train_loader_5 = DataLoader(train_ds, batch_size=BATCH_SIZE_5, shuffle=True, drop_last=False)
val_loader_5 = DataLoader(val_ds, batch_size=BATCH_SIZE_5, shuffle=False, drop_last=False)
test_loader_5 = DataLoader(test_ds, batch_size=BATCH_SIZE_5, shuffle=False, drop_last=False)

In [None]:
criterion_5 = nn.SmoothL1Loss() # MSE for low errors and MAE for high errors
optimizer_5 = torch.optim.Adam(model_5.parameters(), lr=5e-4, weight_decay=1e-4)
scheduler_5 = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer_5, mode="min", factor=0.5, patience=6, min_lr=1e-7)

In [None]:
from torch.optim.swa_utils import AveragedModel, update_bn
import numpy as np
import torch, os

EPOCHS = 500
patience_5 = 10
best_val_mae_5 = float("inf")
best_state_5 = None
epochs_no_improve = 0
early_stopping = False

# SWA config
swa_start = 300
swa_update_freq = 5
swa_model = AveragedModel(model_5).to(device)

# target extraction as array
y_val_orig = y_val.values if hasattr(y_val, "values") else y_val
y_test_orig = y_test.values if hasattr(y_test, "values") else y_test

# loop
for epoch in range(1, EPOCHS + 1):
  model_5.train()
  train_loss = 0.0
  # f-prop, loss, b-prop, g-d
  for xb, yb in train_loader_5:
    xb, yb = xb.to(device), yb.to(device)
    optimizer_5.zero_grad()
    pred = model_5(xb)
    loss = criterion_5(pred, yb)
    loss.backward()
    optimizer_5.step()
    # loss.item() is average so multiply by batch size
    train_loss += loss.item() * xb.size(0)

  # average over all training samples
  train_loss /= len(train_loader_5.dataset)

  # eval on validation in $$$
  val_mae_5, val_rmse_5, _ = evaluate(model_5, val_loader_5, y_val_orig)
  # for potential LR reduction
  scheduler_5.step(val_mae_5)

  current_lr = optimizer_5.param_groups[0]["lr"]

  # results in log1p and $$$
  print(f"Epoch {epoch:03d} | train_loss(MSE-log): {train_loss:.4f} | "
        f"val_MAE($): {val_mae_5:.4f} | val_RMSE($): {val_rmse_5:.4f} | LR: {current_lr:.2e}")

  # early stopping
  if val_mae_5 < best_val_mae_5 - 1e-6:
    best_val_mae_5 = val_mae_5
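    best_state_5 = {
        "epoch": epoch,
        # clone the tensors: state_dict() returns references that later updates overwrite in place
        "model_state_dict": {k: v.detach().clone() for k, v in model_5.state_dict().items()},
        "optimizer_state_dict": optimizer_5.state_dict(),
        "val_mae": best_val_mae_5
    }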
    epochs_no_improve = 0
  else:
    epochs_no_improve += 1
    if epochs_no_improve >= patience_5 and early_stopping:
      print(f"Early stopping at epoch {epoch} (no improvement for {patience_5} epochs).")
      break

  # SWA update
  if epoch >= swa_start and ((epoch - swa_start) % swa_update_freq == 0):
    swa_model.update_parameters(model_5)

best_model_path_5 = os.path.join(artifacts_dir, "mlp_regressor_best_6.pt")
if best_state_5 is not None:
  torch.save(best_state_5, best_model_path_5)
  print(f"Saved best model to: {best_model_path_5} (best val MAE: {best_val_mae_5:.4f})")
else:
  print("Warning: no best state captured.")

Epoch 001 | train_loss(MSE-log): 0.6201 | val_MAE($): 1.9994 | val_RMSE($): 2.3098 | LR: 5.00e-04
Epoch 002 | train_loss(MSE-log): 0.4319 | val_MAE($): 1.6458 | val_RMSE($): 1.9839 | LR: 5.00e-04
Epoch 003 | train_loss(MSE-log): 0.2007 | val_MAE($): 1.0366 | val_RMSE($): 1.3813 | LR: 5.00e-04
Epoch 004 | train_loss(MSE-log): 0.0857 | val_MAE($): 0.8490 | val_RMSE($): 1.2282 | LR: 5.00e-04
Epoch 005 | train_loss(MSE-log): 0.0656 | val_MAE($): 0.7430 | val_RMSE($): 1.0461 | LR: 5.00e-04
Epoch 006 | train_loss(MSE-log): 0.0537 | val_MAE($): 0.6657 | val_RMSE($): 0.9488 | LR: 5.00e-04
Epoch 007 | train_loss(MSE-log): 0.0446 | val_MAE($): 0.6048 | val_RMSE($): 0.8685 | LR: 5.00e-04
Epoch 008 | train_loss(MSE-log): 0.0381 | val_MAE($): 0.5586 | val_RMSE($): 0.8008 | LR: 5.00e-04
Epoch 009 | train_loss(MSE-log): 0.0346 | val_MAE($): 0.5227 | val_RMSE($): 0.7520 | LR: 5.00e-04
Epoch 010 | train_loss(MSE-log): 0.0312 | val_MAE($): 0.4984 | val_RMSE($): 0.7248 | LR: 5.00e-04
Epoch 011 | train_lo

### Best val_MAE: 0.3797

- Flooring the LR at 1.00e-07 didn't help: the model stopped improving after the first few LR reductions. Will try without the floor next time.

### Updates for Training Loop 7

- no LR floor (min_lr=0)
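As a rough check on how far away the 1e-7 floor is, the sketch below counts how many halvings it takes to get there from the starting LR of 5e-4. It assumes the scheduler only fires after more than `patience` consecutive non-improving epochs; the variable names are illustrative.

```python
lr, factor, min_lr, patience = 5e-4, 0.5, 1e-7, 6

halvings = 0
while lr > min_lr:
    # each reduction multiplies the LR by `factor`, clipped at the floor
    lr = max(lr * factor, min_lr)
    halvings += 1

# under the stated assumption, each reduction needs patience + 1 stalled epochs
min_stalled_epochs = halvings * (patience + 1)

print(f"halvings to reach floor: {halvings}")
print(f"minimum stalled epochs:  {min_stalled_epochs}")
```

So the floor only comes into play late in a 500-epoch run, after many reductions have already failed to help, which is consistent with removing it making little difference.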

In [None]:
model_6 = MLPRegressor2(in_dim=Xtr_t.shape[1], p_drop=0.1).to(device)

In [None]:
BATCH_SIZE_6 = 1024
train_loader_6 = DataLoader(train_ds, batch_size=BATCH_SIZE_6, shuffle=True, drop_last=False)
val_loader_6 = DataLoader(val_ds, batch_size=BATCH_SIZE_6, shuffle=False, drop_last=False)
test_loader_6 = DataLoader(test_ds, batch_size=BATCH_SIZE_6, shuffle=False, drop_last=False)

In [None]:
criterion_6 = nn.SmoothL1Loss() # MSE for low errors and MAE for high errors
optimizer_6 = torch.optim.Adam(model_6.parameters(), lr=5e-4, weight_decay=1e-4)
scheduler_6 = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer_6, mode="min", factor=0.5, patience=6, min_lr=0)

In [None]:
from torch.optim.swa_utils import AveragedModel, update_bn
import numpy as np
import torch, os

EPOCHS = 500
patience_6 = 10
best_val_mae_6 = float("inf")
best_state_6 = None
epochs_no_improve = 0
early_stopping = False

# SWA config
swa_start = 300
swa_update_freq = 5
swa_model = AveragedModel(model_6).to(device)

# target extraction as array
y_val_orig = y_val.values if hasattr(y_val, "values") else y_val
y_test_orig = y_test.values if hasattr(y_test, "values") else y_test

# loop
for epoch in range(1, EPOCHS + 1):
  model_6.train()
  train_loss = 0.0
  # f-prop, loss, b-prop, g-d
  for xb, yb in train_loader_6:
    xb, yb = xb.to(device), yb.to(device)
    optimizer_6.zero_grad()
    pred = model_6(xb)
    loss = criterion_6(pred, yb)
    loss.backward()
    optimizer_6.step()
    # loss.item() is average so multiply by batch size
    train_loss += loss.item() * xb.size(0)

  # average over all training samples
  train_loss /= len(train_loader_6.dataset)

  # eval on validation in $$$
  val_mae_6, val_rmse_6, _ = evaluate(model_6, val_loader_6, y_val_orig)
  # for potential LR reduction
  scheduler_6.step(val_mae_6)

  current_lr = optimizer_6.param_groups[0]["lr"]

  # results in log1p and $$$
  print(f"Epoch {epoch:03d} | train_loss(MSE-log): {train_loss:.4f} | "
        f"val_MAE($): {val_mae_6:.4f} | val_RMSE($): {val_rmse_6:.4f} | LR: {current_lr:.2e}")

  # early stopping
  if val_mae_6 < best_val_mae_6 - 1e-6:
    best_val_mae_6 = val_mae_6
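    best_state_6 = {
        "epoch": epoch,
        # clone the tensors: state_dict() returns references that later updates overwrite in place
        "model_state_dict": {k: v.detach().clone() for k, v in model_6.state_dict().items()},
        "optimizer_state_dict": optimizer_6.state_dict(),
        "val_mae": best_val_mae_6
    }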
    epochs_no_improve = 0
  else:
    epochs_no_improve += 1
    if epochs_no_improve >= patience_6 and early_stopping:
      print(f"Early stopping at epoch {epoch} (no improvement for {patience_6} epochs).")
      break

  # SWA update
  if epoch >= swa_start and ((epoch - swa_start) % swa_update_freq == 0):
    swa_model.update_parameters(model_6)

best_model_path_6 = os.path.join(artifacts_dir, "mlp_regressor_best_7.pt")
if best_state_6 is not None:
  torch.save(best_state_6, best_model_path_6)
  print(f"Saved best model to: {best_model_path_6} (best val MAE: {best_val_mae_6:.4f})")
else:
  print("Warning: no best state captured.")

Epoch 001 | train_loss(MSE-log): 0.5237 | val_MAE($): 1.8973 | val_RMSE($): 2.2105 | LR: 5.00e-04
Epoch 002 | train_loss(MSE-log): 0.3663 | val_MAE($): 1.5360 | val_RMSE($): 1.8917 | LR: 5.00e-04
Epoch 003 | train_loss(MSE-log): 0.1776 | val_MAE($): 1.0338 | val_RMSE($): 1.3462 | LR: 5.00e-04
Epoch 004 | train_loss(MSE-log): 0.0844 | val_MAE($): 0.8565 | val_RMSE($): 1.1705 | LR: 5.00e-04
Epoch 005 | train_loss(MSE-log): 0.0631 | val_MAE($): 0.7425 | val_RMSE($): 1.0262 | LR: 5.00e-04
Epoch 006 | train_loss(MSE-log): 0.0513 | val_MAE($): 0.6660 | val_RMSE($): 0.9402 | LR: 5.00e-04
Epoch 007 | train_loss(MSE-log): 0.0427 | val_MAE($): 0.6085 | val_RMSE($): 0.8669 | LR: 5.00e-04
Epoch 008 | train_loss(MSE-log): 0.0373 | val_MAE($): 0.5642 | val_RMSE($): 0.8102 | LR: 5.00e-04
Epoch 009 | train_loss(MSE-log): 0.0326 | val_MAE($): 0.5307 | val_RMSE($): 0.7641 | LR: 5.00e-04
Epoch 010 | train_loss(MSE-log): 0.0302 | val_MAE($): 0.5064 | val_RMSE($): 0.7324 | LR: 5.00e-04
Epoch 011 | train_lo

### Best val_MAE: 0.3757

- not an improvement on training loop 5 (0.3686).

- considering reintroducing the additional 16-node layer: even with the LR decreasing, the model found no new patterns (val_MAE of 0.3757), so capacity may be the limiting factor.

### Updates for Training Loop 8

- introduce a 16-node hidden layer to increase capacity, given that optimisation appears to have plateaued
- dropout to 0.15 to force variability in pattern identification
- weight_decay to 3e-4 and beta=0.5 in SmoothL1 so that outliers aren't overfit
- addition of `population_per_household = Population / AveOccup` feature, intended as an indicator of density and affluence (lower is better). Note: since AveOccup is already people per household, this ratio works out to the number of households in the block group.
- min_lr = 1e-7
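To show what changing `beta` does, here is a small pure-Python sketch of the piecewise Smooth L1 definition as given in PyTorch's documentation for `nn.SmoothL1Loss`: quadratic below `beta`, linear above it. The helper `smooth_l1` and the sample errors are illustrative only.

```python
def smooth_l1(error, beta=1.0):
    """Per-element Smooth L1 loss for a single prediction error."""
    abs_err = abs(error)
    if abs_err < beta:
        # quadratic (MSE-like) region for small errors
        return 0.5 * abs_err ** 2 / beta
    # linear (MAE-like) region for large errors
    return abs_err - 0.5 * beta


for err in (0.1, 0.5, 1.0, 3.0):
    print(f"error={err:.1f} | beta=1.0: {smooth_l1(err, 1.0):.3f} "
          f"| beta=0.5: {smooth_l1(err, 0.5):.3f}")
```

With `beta=0.5` the loss switches to the linear regime at half the error size, so more of the mid-sized errors are treated MAE-style rather than squared.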

In [None]:
class MLPRegressor3(nn.Module):
  def __init__(self, in_dim=8, p_drop=0.15):
    super().__init__()
    self.net = nn.Sequential(
        nn.Linear(in_dim, 64),
        nn.SiLU(),
        nn.Dropout(p_drop),

        nn.Linear(64, 32),
        nn.SiLU(),
        nn.Dropout(p_drop),

        nn.Linear(32, 16),
        nn.SiLU(),
        nn.Dropout(p_drop),

        nn.Linear(16, 1)
    )

  def forward(self, x):
    return self.net(x)

In [None]:
model_7 = MLPRegressor3(in_dim=Xtr_t.shape[1], p_drop=0.15).to(device)

In [None]:
BATCH_SIZE_7 = 1024
train_loader_7 = DataLoader(train_ds, batch_size=BATCH_SIZE_7, shuffle=True, drop_last=False)
val_loader_7 = DataLoader(val_ds, batch_size=BATCH_SIZE_7, shuffle=False, drop_last=False)
test_loader_7 = DataLoader(test_ds, batch_size=BATCH_SIZE_7, shuffle=False, drop_last=False)

In [None]:
criterion_7 = nn.SmoothL1Loss(beta=0.5) # MSE for low errors and MAE for high errors
optimizer_7 = torch.optim.Adam(model_7.parameters(), lr=5e-4, weight_decay=3e-4)
scheduler_7 = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer_7, mode="min", factor=0.5, patience=6, min_lr=1e-7)

In [None]:
from torch.optim.swa_utils import AveragedModel, update_bn
import numpy as np
import torch, os

EPOCHS = 500
patience_7 = 10
best_val_mae_7 = float("inf")
best_state_7 = None
epochs_no_improve = 0
early_stopping = False

# SWA config
swa_start = 300
swa_update_freq = 5
swa_model = AveragedModel(model_7).to(device)

# target extraction as array
y_val_orig = y_val.values if hasattr(y_val, "values") else y_val
y_test_orig = y_test.values if hasattr(y_test, "values") else y_test

# loop
for epoch in range(1, EPOCHS + 1):
  model_7.train()
  train_loss = 0.0
  # f-prop, loss, b-prop, g-d
  for xb, yb in train_loader_7:
    xb, yb = xb.to(device), yb.to(device)
    optimizer_7.zero_grad()
    pred = model_7(xb)
    loss = criterion_7(pred, yb)
    loss.backward()
    optimizer_7.step()
    # loss.item() is average so multiply by batch size
    train_loss += loss.item() * xb.size(0)

  # average over all training samples
  train_loss /= len(train_loader_7.dataset)

  # eval on validation in $$$
  val_mae_7, val_rmse_7, _ = evaluate(model_7, val_loader_7, y_val_orig)
  # for potential LR reduction
  scheduler_7.step(val_mae_7)

  current_lr = optimizer_7.param_groups[0]["lr"]

  # results in log1p and $$$
  print(f"Epoch {epoch:03d} | train_loss(MSE-log): {train_loss:.4f} | "
        f"val_MAE($): {val_mae_7:.4f} | val_RMSE($): {val_rmse_7:.4f} | LR: {current_lr:.2e}")

  # early stopping
  if val_mae_7 < best_val_mae_7 - 1e-6:
    best_val_mae_7 = val_mae_7
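    best_state_7 = {
        "epoch": epoch,
        # clone the tensors: state_dict() returns references that later updates overwrite in place
        "model_state_dict": {k: v.detach().clone() for k, v in model_7.state_dict().items()},
        "optimizer_state_dict": optimizer_7.state_dict(),
        "val_mae": best_val_mae_7
    }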
    epochs_no_improve = 0
  else:
    epochs_no_improve += 1
    if epochs_no_improve >= patience_7 and early_stopping:
      print(f"Early stopping at epoch {epoch} (no improvement for {patience_7} epochs).")
      break

  # SWA update
  if epoch >= swa_start and ((epoch - swa_start) % swa_update_freq == 0):
    swa_model.update_parameters(model_7)

best_model_path_7 = os.path.join(artifacts_dir, "mlp_regressor_best_8.pt")
if best_state_7 is not None:
  torch.save(best_state_7, best_model_path_7)
  print(f"Saved best model to: {best_model_path_7} (best val MAE: {best_val_mae_7:.4f})")
else:
  print("Warning: no best state captured.")

Epoch 001 | train_loss(MSE-log): 0.6671 | val_MAE($): 1.8182 | val_RMSE($): 2.1493 | LR: 5.00e-04
Epoch 002 | train_loss(MSE-log): 0.4924 | val_MAE($): 1.4260 | val_RMSE($): 1.8196 | LR: 5.00e-04
Epoch 003 | train_loss(MSE-log): 0.2496 | val_MAE($): 0.9650 | val_RMSE($): 1.3505 | LR: 5.00e-04
Epoch 004 | train_loss(MSE-log): 0.1374 | val_MAE($): 0.7539 | val_RMSE($): 1.0429 | LR: 5.00e-04
Epoch 005 | train_loss(MSE-log): 0.1041 | val_MAE($): 0.6539 | val_RMSE($): 0.9098 | LR: 5.00e-04
Epoch 006 | train_loss(MSE-log): 0.0886 | val_MAE($): 0.5982 | val_RMSE($): 0.8411 | LR: 5.00e-04
Epoch 007 | train_loss(MSE-log): 0.0791 | val_MAE($): 0.5563 | val_RMSE($): 0.7957 | LR: 5.00e-04
Epoch 008 | train_loss(MSE-log): 0.0716 | val_MAE($): 0.5315 | val_RMSE($): 0.7713 | LR: 5.00e-04
Epoch 009 | train_loss(MSE-log): 0.0672 | val_MAE($): 0.5051 | val_RMSE($): 0.7408 | LR: 5.00e-04
Epoch 010 | train_loss(MSE-log): 0.0637 | val_MAE($): 0.4957 | val_RMSE($): 0.7263 | LR: 5.00e-04
Epoch 011 | train_lo

### Best val_MAE: 0.3952

### Training Loop 9

- same settings as training loop 5 (best performing; val_MAE of 0.3686), except with the now-working LR scheduler
- inclusion of SWA, population_per_household
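For reference, the SWA settings used in these loops (`swa_start = 300`, `swa_update_freq = 5`, `EPOCHS = 500`) determine how many weight snapshots get averaged. The sketch below counts them and shows the equal-weight running mean that `AveragedModel` applies by default; the scalar "weights" are hypothetical placeholders for real parameter tensors.

```python
EPOCHS = 500
swa_start = 300
swa_update_freq = 5

# epochs at which the training loop would call swa_model.update_parameters(...)
swa_epochs = [e for e in range(1, EPOCHS + 1)
              if e >= swa_start and (e - swa_start) % swa_update_freq == 0]


def running_mean_update(avg, new, n_averaged):
    """Equal-weight running average: avg + (new - avg) / (n_averaged + 1)."""
    return avg + (new - avg) / (n_averaged + 1)


# hypothetical scalar weight snapshots
snapshots = [0.9, 1.1, 1.0, 1.2]
avg, n = 0.0, 0
for w in snapshots:
    # the first snapshot is copied as-is, later ones are folded into the mean
    avg = w if n == 0 else running_mean_update(avg, w, n)
    n += 1

print(f"SWA snapshots over the run: {len(swa_epochs)}")
print(f"averaged weight: {avg:.4f}")
```

Note that the averaged model only affects results if it is evaluated itself; the cells shown here evaluate and checkpoint the base model each epoch, not `swa_model`.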

In [None]:
model_8 = MLPRegressor2(in_dim=Xtr_t.shape[1], p_drop=0.1).to(device)

In [None]:
BATCH_SIZE_8 = 1024
train_loader_8 = DataLoader(train_ds, batch_size=BATCH_SIZE_8, shuffle=True, drop_last=False)
val_loader_8 = DataLoader(val_ds, batch_size=BATCH_SIZE_8, shuffle=False, drop_last=False)
test_loader_8 = DataLoader(test_ds, batch_size=BATCH_SIZE_8, shuffle=False, drop_last=False)

In [None]:
criterion_8 = nn.SmoothL1Loss() # MSE for low errors and MAE for high errors
optimizer_8 = torch.optim.Adam(model_8.parameters(), lr=5e-4, weight_decay=1e-4)
scheduler_8 = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer_8, mode="min", factor=0.5, patience=6, min_lr=1e-7)

In [None]:
from torch.optim.swa_utils import AveragedModel, update_bn
import numpy as np
import torch, os

EPOCHS = 500
patience_8 = 10
best_val_mae_8 = float("inf")
best_state_8 = None
epochs_no_improve = 0
early_stopping = False

# SWA config
swa_start = 300
swa_update_freq = 5
swa_model = AveragedModel(model_8).to(device)

# target extraction as array
y_val_orig = y_val.values if hasattr(y_val, "values") else y_val
y_test_orig = y_test.values if hasattr(y_test, "values") else y_test

# loop
for epoch in range(1, EPOCHS + 1):
  model_8.train()
  train_loss = 0.0
  # f-prop, loss, b-prop, g-d
  for xb, yb in train_loader_8:
    xb, yb = xb.to(device), yb.to(device)
    optimizer_8.zero_grad()
    pred = model_8(xb)
    loss = criterion_8(pred, yb)
    loss.backward()
    optimizer_8.step()
    # loss.item() is average so multiply by batch size
    train_loss += loss.item() * xb.size(0)

  # average over all training samples
  train_loss /= len(train_loader_8.dataset)

  # eval on validation in $$$
  val_mae_8, val_rmse_8, _ = evaluate(model_8, val_loader_8, y_val_orig)
  # for potential LR reduction
  scheduler_8.step(val_mae_8)

  current_lr = optimizer_8.param_groups[0]["lr"]

  # results in log1p and $$$
  print(f"Epoch {epoch:03d} | train_loss(MSE-log): {train_loss:.4f} | "
        f"val_MAE($): {val_mae_8:.4f} | val_RMSE($): {val_rmse_8:.4f} | LR: {current_lr:.2e}")

  # early stopping
  if val_mae_8 < best_val_mae_8 - 1e-6:
    best_val_mae_8 = val_mae_8
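    best_state_8 = {
        "epoch": epoch,
        # clone the tensors: state_dict() returns references that later updates overwrite in place
        "model_state_dict": {k: v.detach().clone() for k, v in model_8.state_dict().items()},
        "optimizer_state_dict": optimizer_8.state_dict(),
        "val_mae": best_val_mae_8
    }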
    epochs_no_improve = 0
  else:
    epochs_no_improve += 1
    if epochs_no_improve >= patience_8 and early_stopping:
      print(f"Early stopping at epoch {epoch} (no improvement for {patience_8} epochs).")
      break

  # SWA update
  if epoch >= swa_start and ((epoch - swa_start) % swa_update_freq == 0):
    swa_model.update_parameters(model_8)

best_model_path_8 = os.path.join(artifacts_dir, "mlp_regressor_best_9.pt")
if best_state_8 is not None:
  torch.save(best_state_8, best_model_path_8)
  print(f"Saved best model to: {best_model_path_8} (best val MAE: {best_val_mae_8:.4f})")
else:
  print("Warning: no best state captured.")

Epoch 001 | train_loss(MSE-log): 0.4588 | val_MAE($): 1.7846 | val_RMSE($): 2.1158 | LR: 5.00e-04
Epoch 002 | train_loss(MSE-log): 0.2962 | val_MAE($): 1.3721 | val_RMSE($): 1.7349 | LR: 5.00e-04
Epoch 003 | train_loss(MSE-log): 0.1321 | val_MAE($): 0.9568 | val_RMSE($): 1.3301 | LR: 5.00e-04
Epoch 004 | train_loss(MSE-log): 0.0768 | val_MAE($): 0.8198 | val_RMSE($): 1.1556 | LR: 5.00e-04
Epoch 005 | train_loss(MSE-log): 0.0605 | val_MAE($): 0.7347 | val_RMSE($): 1.0570 | LR: 5.00e-04
Epoch 006 | train_loss(MSE-log): 0.0491 | val_MAE($): 0.6678 | val_RMSE($): 0.9721 | LR: 5.00e-04
Epoch 007 | train_loss(MSE-log): 0.0428 | val_MAE($): 0.6139 | val_RMSE($): 0.8923 | LR: 5.00e-04
Epoch 008 | train_loss(MSE-log): 0.0370 | val_MAE($): 0.5682 | val_RMSE($): 0.8282 | LR: 5.00e-04
Epoch 009 | train_loss(MSE-log): 0.0339 | val_MAE($): 0.5332 | val_RMSE($): 0.7795 | LR: 5.00e-04
Epoch 010 | train_loss(MSE-log): 0.0303 | val_MAE($): 0.5085 | val_RMSE($): 0.7405 | LR: 5.00e-04
Epoch 011 | train_lo

### Best val_MAE: 0.3776

- worse val_MAE (0.3776 vs 0.3686 in loop 5) and train_loss despite ReduceLROnPlateau, suggesting that the LR may have dropped too early, causing the model to 'dig deeper' in a local area that wasn't good.

- this suggests that loop 5, with its constant LR, gave the model more time to explore a wider region in search of low minima.

### Running Training Loop 5 (0.3686 - seed 42) on 2 Different Seeds...

- no SWA, no population_per_household, no ReduceLROnPlateau

- seeds: 67 (0.3691), 33 (0.3558)

**AVG: 0.3645 = $36,450**
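The average can be reproduced from the three per-seed results quoted in this section. The sketch below also reports the spread across seeds, which gives a feel for how much of the difference between loops could be seed noise. The conversion assumes the target is expressed in units of $100,000, consistent with the dollar figure above.

```python
import statistics

# best val_MAE per seed, as reported in this section
seed_val_mae = {42: 0.3686, 67: 0.3691, 33: 0.3558}

avg_mae = statistics.mean(seed_val_mae.values())
std_mae = statistics.stdev(seed_val_mae.values())  # sample standard deviation

# target is in units of $100,000
avg_dollars = avg_mae * 100_000

print(f"mean val_MAE: {avg_mae:.4f} (~${avg_dollars:,.0f})")
print(f"std across seeds: {std_mae:.4f}")
```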

SEED 67 | val_MAE: 0.3691

In [None]:
model_9 = MLPRegressor2(in_dim=Xtr_t.shape[1], p_drop=0.1).to(device)

In [None]:
BATCH_SIZE_9 = 1024
train_loader_9 = DataLoader(train_ds, batch_size=BATCH_SIZE_9, shuffle=True, drop_last=False)
val_loader_9 = DataLoader(val_ds, batch_size=BATCH_SIZE_9, shuffle=False, drop_last=False)
test_loader_9 = DataLoader(test_ds, batch_size=BATCH_SIZE_9, shuffle=False, drop_last=False)

In [None]:
criterion_9 = nn.SmoothL1Loss() # MSE for low errors and MAE for high errors
optimizer_9 = torch.optim.Adam(model_9.parameters(), lr=5e-4, weight_decay=1e-4)
# scheduler_9 = torch.optim.lr_scheduler.ReduceLROnPlateau(
    # optimizer_9, mode="min", factor=0.5, patience=6, min_lr=1e-7)

In [None]:
from torch.optim.swa_utils import AveragedModel, update_bn
import numpy as np
import torch, os

EPOCHS = 500
patience_9 = 10
best_val_mae_9 = float("inf")
best_state_9 = None
epochs_no_improve = 0
early_stopping = False

# SWA config
#swa_start = 300
#swa_update_freq = 5
#swa_model = AveragedModel(model_9).to(device)

# target extraction as array
y_val_orig = y_val.values if hasattr(y_val, "values") else y_val
y_test_orig = y_test.values if hasattr(y_test, "values") else y_test

# loop
for epoch in range(1, EPOCHS + 1):
  model_9.train()
  train_loss = 0.0
  # f-prop, loss, b-prop, g-d
  for xb, yb in train_loader_9:
    xb, yb = xb.to(device), yb.to(device)
    optimizer_9.zero_grad()
    pred = model_9(xb)
    loss = criterion_9(pred, yb)
    loss.backward()
    optimizer_9.step()
    # loss.item() is average so multiply by batch size
    train_loss += loss.item() * xb.size(0)

  # average over all training samples
  train_loss /= len(train_loader_9.dataset)

  # eval on validation in $$$
  val_mae_9, val_rmse_9, _ = evaluate(model_9, val_loader_9, y_val_orig)
  # for potential LR reduction
  # scheduler_9.step(val_mae_9)

  current_lr = optimizer_9.param_groups[0]["lr"]

  # results in log1p and $$$
  print(f"Epoch {epoch:03d} | train_loss(MSE-log): {train_loss:.4f} | "
        f"val_MAE($): {val_mae_9:.4f} | val_RMSE($): {val_rmse_9:.4f} | LR: {current_lr:.2e}")

  # early stopping
  if val_mae_9 < best_val_mae_9 - 1e-6:
    best_val_mae_9 = val_mae_9
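    best_state_9 = {
        "epoch": epoch,
        # clone the tensors: state_dict() returns references that later updates overwrite in place
        "model_state_dict": {k: v.detach().clone() for k, v in model_9.state_dict().items()},
        "optimizer_state_dict": optimizer_9.state_dict(),
        "val_mae": best_val_mae_9
    }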
    epochs_no_improve = 0
  else:
    epochs_no_improve += 1
    if epochs_no_improve >= patience_9 and early_stopping:
      print(f"Early stopping at epoch {epoch} (no improvement for {patience_9} epochs).")
      break

  # SWA update
  #if epoch >= swa_start and ((epoch - swa_start) % swa_update_freq == 0):
    #swa_model.update_parameters(model_9)

best_model_path_9 = os.path.join(artifacts_dir, "mlp_regressor_best_10.pt")
if best_state_9 is not None:
  torch.save(best_state_9, best_model_path_9)
  print(f"Saved best model to: {best_model_path_9} (best val MAE: {best_val_mae_9:.4f})")
else:
  print("Warning: no best state captured.")

Epoch 001 | train_loss(MSE-log): 0.6152 | val_MAE($): 1.9676 | val_RMSE($): 2.2743 | LR: 5.00e-04
Epoch 002 | train_loss(MSE-log): 0.3989 | val_MAE($): 1.5624 | val_RMSE($): 1.8874 | LR: 5.00e-04
Epoch 003 | train_loss(MSE-log): 0.1844 | val_MAE($): 1.0277 | val_RMSE($): 1.3199 | LR: 5.00e-04
Epoch 004 | train_loss(MSE-log): 0.0920 | val_MAE($): 0.8468 | val_RMSE($): 1.1807 | LR: 5.00e-04
Epoch 005 | train_loss(MSE-log): 0.0694 | val_MAE($): 0.7552 | val_RMSE($): 1.0885 | LR: 5.00e-04
Epoch 006 | train_loss(MSE-log): 0.0565 | val_MAE($): 0.6834 | val_RMSE($): 1.0077 | LR: 5.00e-04
Epoch 007 | train_loss(MSE-log): 0.0486 | val_MAE($): 0.6278 | val_RMSE($): 0.9304 | LR: 5.00e-04
Epoch 008 | train_loss(MSE-log): 0.0428 | val_MAE($): 0.5825 | val_RMSE($): 0.8587 | LR: 5.00e-04
Epoch 009 | train_loss(MSE-log): 0.0382 | val_MAE($): 0.5468 | val_RMSE($): 0.8096 | LR: 5.00e-04
Epoch 010 | train_loss(MSE-log): 0.0349 | val_MAE($): 0.5180 | val_RMSE($): 0.7551 | LR: 5.00e-04
Epoch 011 | train_lo

SEED 33 | val_MAE: 0.3558

In [None]:
model_10 = MLPRegressor2(in_dim=Xtr_t.shape[1], p_drop=0.1).to(device)

In [None]:
BATCH_SIZE_10 = 1024
train_loader_10 = DataLoader(train_ds, batch_size=BATCH_SIZE_10, shuffle=True, drop_last=False)
val_loader_10 = DataLoader(val_ds, batch_size=BATCH_SIZE_10, shuffle=False, drop_last=False)
test_loader_10 = DataLoader(test_ds, batch_size=BATCH_SIZE_10, shuffle=False, drop_last=False)

In [None]:
criterion_10 = nn.SmoothL1Loss() # MSE for low errors and MAE for high errors
optimizer_10 = torch.optim.Adam(model_10.parameters(), lr=5e-4, weight_decay=1e-4)
# scheduler_10 = torch.optim.lr_scheduler.ReduceLROnPlateau(
    # optimizer_10, mode="min", factor=0.5, patience=6, min_lr=1e-7)

In [None]:
from torch.optim.swa_utils import AveragedModel, update_bn
import numpy as np
import torch, os

EPOCHS = 500
patience_10 = 10
best_val_mae_10 = float("inf")
best_state_10 = None
epochs_no_improve = 0
early_stopping = False

# SWA config
#swa_start = 300
#swa_update_freq = 5
#swa_model = AveragedModel(model_10).to(device)

# target extraction as array
y_val_orig = y_val.values if hasattr(y_val, "values") else y_val
y_test_orig = y_test.values if hasattr(y_test, "values") else y_test

# loop
for epoch in range(1, EPOCHS + 1):
  model_10.train()
  train_loss = 0.0
  # f-prop, loss, b-prop, g-d
  for xb, yb in train_loader_10:
    xb, yb = xb.to(device), yb.to(device)
    optimizer_10.zero_grad()
    pred = model_10(xb)
    loss = criterion_10(pred, yb)
    loss.backward()
    optimizer_10.step()
    # loss.item() is average so multiply by batch size
    train_loss += loss.item() * xb.size(0)

  # average over all training samples
  train_loss /= len(train_loader_10.dataset)

  # eval on validation in $$$
  val_mae_10, val_rmse_10, _ = evaluate(model_10, val_loader_10, y_val_orig)
  # for potential LR reduction
  # scheduler_10.step(val_mae_10)

  current_lr = optimizer_10.param_groups[0]["lr"]

  # results in log1p and $$$
  print(f"Epoch {epoch:03d} | train_loss(MSE-log): {train_loss:.4f} | "
        f"val_MAE($): {val_mae_10:.4f} | val_RMSE($): {val_rmse_10:.4f} | LR: {current_lr:.2e}")

  # early stopping
  if val_mae_10 < best_val_mae_10 - 1e-6:
    best_val_mae_10 = val_mae_10
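    best_state_10 = {
        "epoch": epoch,
        # clone the tensors: state_dict() returns references that later updates overwrite in place
        "model_state_dict": {k: v.detach().clone() for k, v in model_10.state_dict().items()},
        "optimizer_state_dict": optimizer_10.state_dict(),
        "val_mae": best_val_mae_10
    }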
    epochs_no_improve = 0
  else:
    epochs_no_improve += 1
    if epochs_no_improve >= patience_10 and early_stopping:
      print(f"Early stopping at epoch {epoch} (no improvement for {patience_10} epochs).")
      break

  # SWA update
  #if epoch >= swa_start and ((epoch - swa_start) % swa_update_freq == 0):
    #swa_model.update_parameters(model_10)

best_model_path_10 = os.path.join(artifacts_dir, "mlp_regressor_best_11.pt")
if best_state_10 is not None:
  torch.save(best_state_10, best_model_path_10)
  print(f"Saved best model to: {best_model_path_10} (best val MAE: {best_val_mae_10:.4f})")
else:
  print("Warning: no best state captured.")

Epoch 001 | train_loss(MSE-log): 0.4163 | val_MAE($): 1.6730 | val_RMSE($): 2.0084 | LR: 5.00e-04
Epoch 002 | train_loss(MSE-log): 0.2440 | val_MAE($): 1.2093 | val_RMSE($): 1.5216 | LR: 5.00e-04
Epoch 003 | train_loss(MSE-log): 0.0969 | val_MAE($): 0.8065 | val_RMSE($): 1.3176 | LR: 5.00e-04
Epoch 004 | train_loss(MSE-log): 0.0562 | val_MAE($): 0.7293 | val_RMSE($): 1.2983 | LR: 5.00e-04
Epoch 005 | train_loss(MSE-log): 0.0472 | val_MAE($): 0.6552 | val_RMSE($): 1.0995 | LR: 5.00e-04
Epoch 006 | train_loss(MSE-log): 0.0407 | val_MAE($): 0.6010 | val_RMSE($): 1.0018 | LR: 5.00e-04
Epoch 007 | train_loss(MSE-log): 0.0354 | val_MAE($): 0.5526 | val_RMSE($): 0.8733 | LR: 5.00e-04
Epoch 008 | train_loss(MSE-log): 0.0313 | val_MAE($): 0.5164 | val_RMSE($): 0.8050 | LR: 5.00e-04
Epoch 009 | train_loss(MSE-log): 0.0292 | val_MAE($): 0.4884 | val_RMSE($): 0.7463 | LR: 5.00e-04
Epoch 010 | train_loss(MSE-log): 0.0269 | val_MAE($): 0.4677 | val_RMSE($): 0.7035 | LR: 5.00e-04
Epoch 011 | train_lo

### Testing model with parameters from training loop 5 on test set

In [None]:
checkpoint_path = os.path.join(artifacts_dir, "mlp_regressor_best_11.pt")
ckpt = torch.load(checkpoint_path, map_location=device, weights_only=False)

model_10.load_state_dict(ckpt["model_state_dict"])
model_10.eval()

test_mae, test_rmse, y_pred_test = evaluate(model_10, test_loader_10, y_test_orig)

print(f"Test MAE ($): {test_mae:.4f}")
print(f"Test RMSE ($): {test_rmse:.4f}")

Test MAE ($): 0.3792
Test RMSE ($): 0.5515


A test MAE of 0.3792 (roughly $38,000, as the target is in units of $100,000) is competitive. The RMSE of 0.5515 is noticeably higher than the MAE, which indicates some large errors on outliers. That is unsurprising for housing, where prices depend on many more variables than the features used here.
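The link between the MAE/RMSE gap and outliers can be seen with a toy example. The two error lists below are made up for illustration: both have the same MAE, but the one containing a single large error has a much higher RMSE, because squaring weights large errors more heavily.

```python
import math


def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# hypothetical prediction errors (in units of $100,000)
even_errors = [0.4, -0.4, 0.4, -0.4]      # all errors the same size
outlier_errors = [0.1, -0.1, 0.1, -1.3]   # mostly small, one large miss

for name, errs in (("even", even_errors), ("outlier", outlier_errors)):
    print(f"{name:8s} MAE: {mae(errs):.4f} | RMSE: {rmse(errs):.4f} "
          f"| RMSE/MAE: {rmse(errs) / mae(errs):.2f}")
```

For comparison, the test results above give an RMSE/MAE ratio of about 1.45.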

To improve performance further, ideally:

- observations would be individual houses rather than averages over a block of houses
- there would be more data
- there would be a greater number of features