# Notebook 03: Machine Learning Forecasting

**Goal**: Train a predictive model to forecast server CPU usage (the `cpu_p95` target) for capacity planning.

**Steps**:
1. Load processed data (from Notebook 02).
2. Split into Training (2022-2024) and Testing (2025) sets.
3. Train a Random Forest Regressor.
4. Evaluate performance (RMSE, MAE).
5. Save the trained model.
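
The chronological split in step 2 can be sketched as follows. This is a minimal illustration, not the code inside `CapacityForecaster`: the `timestamp` column name, the daily frequency, and the constant placeholder values are assumptions made for the example.

```python
import pandas as pd

# Hypothetical stand-in for the processed metrics; the real column names may differ.
df = pd.DataFrame({
    "timestamp": pd.date_range("2022-01-01", "2025-12-31", freq="D"),
})
df["cpu_p95"] = 50.0  # placeholder values, not real measurements

# Split by time rather than at random, so the model never sees the future.
cutoff = pd.Timestamp("2025-01-01")
train = df[df["timestamp"] < cutoff]   # 2022-2024
test = df[df["timestamp"] >= cutoff]   # 2025

print(len(train), len(test))  # 1096 365
```

Splitting by date instead of shuffling keeps the evaluation honest for a forecasting task: every test row lies strictly after every training row.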

In [1]:
import sys
import os
from pathlib import Path

# Add the project root to sys.path so that the `src` package can be imported
project_root = Path(os.getcwd()).parent
if str(project_root) not in sys.path:
    sys.path.append(str(project_root))

from src.ml_forecasting import CapacityForecaster
import pandas as pd
import matplotlib.pyplot as plt

## 1. Model Training

In [2]:
processed_data = project_root / "data" / "processed" / "processed_metrics.parquet"
model_dir = project_root / "models"

forecaster = CapacityForecaster(
    data_path=str(processed_data),
    model_path=str(model_dir)
)

metrics = forecaster.run(target='cpu_p95')
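
The internals of `CapacityForecaster.run` are not shown in this notebook. As a rough sketch of what such a train, evaluate, and save pipeline might look like with scikit-learn, the example below uses synthetic data; the feature matrix, the hyperparameters, and the `capacity_rf.joblib` file name are all assumptions, not the project's actual settings.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error

rng = np.random.default_rng(42)

# Synthetic features and target standing in for the real train/test sets.
X_train = rng.uniform(0, 1, size=(500, 3))
y_train = 40 + 30 * X_train[:, 0] + rng.normal(0, 2, size=500)
X_test = rng.uniform(0, 1, size=(100, 3))
y_test = 40 + 30 * X_test[:, 0] + rng.normal(0, 2, size=100)

# Train the regressor (hyperparameters chosen only for illustration).
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate on the held-out set and collect the same two metrics.
pred = model.predict(X_test)
metrics = {
    "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
    "mae": float(mean_absolute_error(y_test, pred)),
}

# Persist the model and confirm the reloaded copy predicts identically.
with tempfile.TemporaryDirectory() as tmp:
    model_file = Path(tmp) / "capacity_rf.joblib"  # hypothetical file name
    joblib.dump(model, model_file)
    reloaded = joblib.load(model_file)
    same = bool(np.allclose(reloaded.predict(X_test), pred))
```

Returning the metrics as a dictionary keyed by `rmse` and `mae` mirrors how the result of `forecaster.run` is read in the evaluation cell.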

## 2. Evaluation Summary

In [3]:
print("Training Results:")
print(f"RMSE: {metrics['rmse']:.4f}")
print(f"MAE:  {metrics['mae']:.4f}")

Training Results:
RMSE: 17.6521
MAE:  13.7617
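
For reference, the two metrics can be computed by hand as below. The numbers are made up for the example and are unrelated to the results above. RMSE squares each error before averaging, so it can never be smaller than MAE, and a wide gap between the two suggests that a few large errors dominate.

```python
import math

# Made-up values for illustration only.
actual = [60.0, 70.0, 80.0, 90.0]
predicted = [62.0, 65.0, 80.0, 97.0]

errors = [p - a for p, a in zip(predicted, actual)]

# MAE: mean of the absolute errors.
mae = sum(abs(e) for e in errors) / len(errors)

# RMSE: square root of the mean squared error.
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))

print(f"MAE:  {mae:.4f}")   # MAE:  3.5000
print(f"RMSE: {rmse:.4f}")  # RMSE: 4.4159
```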
