### Machine Learning for Building Energy Consumption Prediction and Comparison  
**Author:** Olivia Solomon  

This notebook walks through the full workflow used for this capstone project:

1. Load and preprocess the dataset  
2. Train the LightGBM models (log-meter + log-intensity ensemble)  
3. Generate predictions for 2017  
4. Compute evaluation metrics  
5. Produce the key figures shown in the final report  

All scripts called here are fully automated and located in the `src/` directory of the [GitHub repository](https://github.com/oliviasolomon/ml-energy-forecasting-capstone).


## 1. Environment & Imports

The following cell loads standard Python libraries and enables autoreload so the notebook updates automatically when local `.py` files change.

In [None]:
%load_ext autoreload
%autoreload 2

import pandas as pd
import numpy as np
from pathlib import Path

import matplotlib.pyplot as plt

from src import data_preprocessing
from src import train_and_predict
from src import evaluation

from IPython.display import Image

BASE_DIR = Path.home()  # change to correct directory

## 2. Preprocess the Raw Data

This step:
- Loads `ml_capstone_data.xlsx`  
- Melts the electricity data into long format  
- Merges metadata + weather  
- Removes invalid meter readings  
- Outputs `data/electricity_long_2017.parquet` for modeling  

In [None]:
data_preprocessing.preprocess()
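
The steps listed above can be sketched on toy data as follows. This is a minimal illustration, not the contents of `src/data_preprocessing.py`: the column names (`building_id`, `building_type`, `sqft`, `air_temp_c`), the toy values, and the rule that a reading is invalid when it is missing or non-positive are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical wide table: one row per timestamp, one column per building.
wide = pd.DataFrame({
    "timestamp": pd.to_datetime(["2016-01-01 00:00", "2016-01-01 01:00"]),
    "bldg_a": [12.5, -1.0],   # the negative reading is treated as invalid below
    "bldg_b": [30.0, 31.5],
})

# Hypothetical metadata and weather tables.
meta = pd.DataFrame({
    "building_id": ["bldg_a", "bldg_b"],
    "building_type": ["lab", "dormitory"],
    "sqft": [50000, 80000],
})
weather = pd.DataFrame({
    "timestamp": wide["timestamp"],
    "air_temp_c": [2.0, 1.5],
})

# Wide -> long: one row per (timestamp, building).
long_df = wide.melt(id_vars="timestamp", var_name="building_id", value_name="meter")

# Attach building metadata and hourly weather.
long_df = long_df.merge(meta, on="building_id", how="left")
long_df = long_df.merge(weather, on="timestamp", how="left")

# Drop invalid readings (assumed rule: missing or non-positive values).
valid_mask = long_df["meter"].notna() & (long_df["meter"] > 0)
long_df = long_df[valid_mask].reset_index(drop=True)
```

Using `how="left"` keeps every meter reading even when metadata or weather is missing for it, which makes gaps visible as `NaN` rather than silently dropping rows.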

## 3. Train LightGBM Models

This command:
- Trains log(meter) and log(meter_per_sqft) models  
- Uses 2016 data for training  
- Uses the final 30 days of 2016 for early stopping  
- Saves model files into `models/`  
- Produces 2017 predictions saved as:  
  `outputs/electricity_2017_predictions_long.csv`

In [None]:
train_and_predict.main()
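
The time-based split and the two log-scale targets described above might look like the sketch below. Model fitting is deliberately left out. The use of `log1p`/`expm1`, the column names, and the equal-weight average in the hypothetical `combine_predictions` helper are assumptions for illustration; the actual logic in `src/train_and_predict.py` may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly 2016 frame for a single building with a constant load.
ts = pd.date_range("2016-01-01", "2016-12-31 23:00", freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"timestamp": ts, "meter": 100.0, "sqft": 50000.0})

# Two log-scale targets (log1p is an assumption; it keeps zero readings finite).
df["log_meter"] = np.log1p(df["meter"])
df["log_intensity"] = np.log1p(df["meter"] / df["sqft"])

# Time-based split: the final 30 days of 2016 are held out for early stopping.
cutoff = df["timestamp"].max() - pd.Timedelta(days=30)
train = df[df["timestamp"] <= cutoff]
valid = df[df["timestamp"] > cutoff]


def combine_predictions(pred_log_meter, pred_log_intensity, sqft):
    """Average two predictions after mapping both back to meter units.

    The equal weighting is an assumed ensembling rule, not a confirmed one.
    """
    from_meter = np.expm1(pred_log_meter)
    from_intensity = np.expm1(pred_log_intensity) * sqft
    return 0.5 * (from_meter + from_intensity)


# Sanity check with perfect predictions: the ensemble recovers the meter value.
combined = combine_predictions(valid["log_meter"], valid["log_intensity"], valid["sqft"])
```

Holding out the last 30 days, rather than a random sample, keeps the validation set strictly later in time than the training set, which mirrors how the models are used on 2017.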

## 4. Load Predictions and Inspect Data
This step loads:
- Processed parquet file (actual 2016–2017)  
- 2017 predictions from the model  

Then examine the shape of each table, preview the first rows, and check the available building types.

In [None]:
df_actual = pd.read_parquet("data/electricity_long_2017.parquet")
df_pred = pd.read_csv("outputs/electricity_2017_predictions_long.csv", parse_dates=["timestamp"])

print("Actual data:", df_actual.shape)
print("Prediction data:", df_pred.shape)

df_pred.head()
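
Checking the available building types could be done along these lines. The frame below is a hypothetical stand-in for `df_pred`, and the column names `building_type` and `predicted` are assumptions; substitute the names actually present in the predictions file.

```python
import pandas as pd

# Hypothetical stand-in for df_pred; real column names may differ.
preds = pd.DataFrame({
    "timestamp": pd.to_datetime(["2017-01-01 00:00"] * 3 + ["2017-01-01 01:00"] * 3),
    "building_type": ["dormitory", "lab", "classroom"] * 2,
    "predicted": [40.0, 95.0, 22.0, 38.5, 96.0, 21.0],
})

# Distinct building types, and how many rows each one contributes.
building_types = sorted(preds["building_type"].unique())
rows_per_type = preds["building_type"].value_counts()

# Time span covered by the predictions.
time_span = (preds["timestamp"].min(), preds["timestamp"].max())

print(building_types)
print(rows_per_type)
print(time_span)
```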

## 5. Evaluation & Visualization

Running the evaluation script generates all figures used in the final report:
- Weekly comparison for Oct 21–28  
- RMSE by building type  
- Feature importance  
- Seasonal profiles (all 3 types)  
- Weekly load shapes (all 3 types)  
- Full-year 2017 predicted vs actual  
- Scatter correlation plot (2017)  

Figures are saved to the `figures/` directory.

In [None]:
evaluation.main()
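
As one illustration of what sits behind a weekly load shape, the sketch below averages actual and predicted load by day of week and hour. It is an assumed aggregation on toy data with hypothetical column names (`actual`, `predicted`), not the code in `src/evaluation.py`.

```python
import pandas as pd

# Hypothetical two-week hourly series with a flat load.
ts = pd.date_range("2017-10-16", periods=14 * 24, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"timestamp": ts, "actual": 50.0, "predicted": 52.0})

# Position of each reading within the week.
df["dayofweek"] = df["timestamp"].dt.dayofweek  # Monday = 0
df["hour"] = df["timestamp"].dt.hour

# Mean load for each of the 7 * 24 hour-of-week slots.
weekly_shape = df.groupby(["dayofweek", "hour"])[["actual", "predicted"]].mean()
```

Plotting `weekly_shape["actual"]` against `weekly_shape["predicted"]` would then give one curve per series across the 168 hourly slots.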

## 6. Metrics Summary

This cell loads the computed RMSE, MAE, and MAPE for each building type.

In [None]:
metrics = pd.read_csv("figures/metrics_by_building_type.csv")
metrics
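
For reference, the three metrics can be computed per building type as in the sketch below. The toy values, the column names, and the hypothetical `error_metrics` helper are assumptions for illustration; the stored CSV is produced by the evaluation script, not by this code. MAPE is expressed in percent and assumes no actual value is zero.

```python
import numpy as np
import pandas as pd


def error_metrics(group):
    """Return RMSE, MAE, and MAPE (in percent) for one group of rows."""
    err = group["predicted"] - group["actual"]
    return pd.Series({
        "rmse": np.sqrt(np.mean(err ** 2)),
        "mae": np.mean(np.abs(err)),
        "mape": 100.0 * np.mean(np.abs(err) / group["actual"]),
    })


# Hypothetical actual and predicted values for two building types.
toy = pd.DataFrame({
    "building_type": ["lab", "lab", "dormitory", "dormitory"],
    "actual": [100.0, 200.0, 50.0, 50.0],
    "predicted": [110.0, 190.0, 45.0, 55.0],
})

# One row of metrics per building type.
summary = toy.groupby("building_type")[["actual", "predicted"]].apply(error_metrics)
print(summary)
```

RMSE penalizes large misses more heavily than MAE, while MAPE normalizes by the actual load, so it allows comparison across building types with very different consumption levels.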

## 7. Display the Figures

Load and display the report figures saved in the `figures/` directory.

In [None]:
from IPython.display import Image, display

# Figure files saved to the figures/ directory by the evaluation step
figure_names = [
    "week_oct21_28_all_types.png",
    "rmse_by_building_type.png",
    "feature_importance_logmeter.png",
    "seasonal_profile_dormitory.png",
    "seasonal_profile_lab.png",
    "seasonal_profile_classroom.png",
    "seasonal_profile_all_types.png",
    "weekly_shape_dormitory.png",
    "weekly_shape_lab.png",
    "weekly_shape_classroom.png",
    "weekly_shape_all_types.png",
    "full_year_2017_all_buildings.png",
    "actual_vs_pred_scatter_2017.png",
]

# A cell renders only its last expression, so display each image explicitly
for name in figure_names:
    display(Image(filename=f"figures/{name}"))

## 8. Conclusion

This notebook executed the entire pipeline:

✔ Preprocessed raw dataset  
✔ Trained LightGBM ensemble models  
✔ Generated hourly 2017 predictions  
✔ Computed error metrics by building category  
✔ Generated all figures used in the capstone report  

The workflow is fully reproducible and can be extended to new datasets or additional building types.