Forecasting building energy demand through time series analysis and machine learning.

erickCantu/TheGreenCitySolutionsGroup

Forecasting building energy demand

By: Rafael Arndt, Erick Cantu, Leon Pichotka and Su Leen Wong

This repository contains files and Jupyter notebooks for our capstone project at the Neuefische Data Science bootcamp. The project focuses on forecasting the hourly energy demand of 9 buildings from the CityLearn Challenge, based on 4 years of energy consumption and weather data. The presentation of the project can be found here.

Introduction

Since energy prices are continuing to rise and the future of the energy situation is rather uncertain, cities may want to investigate the energy consumption of different building sectors to predict future energy demand and identify areas where energy can be saved.

Energy demand forecasting is fundamental for an energy utility’s decision making on:

  • Grid stability

  • Planning power supply activities

  • Reducing energy wastage

Since the data available consists of a series of energy consumption values taken sequentially with a fixed time interval over four years, time series analysis and models are ideal for this problem.

About the dataset

The dataset we used for this project consists of:

  • Synthetic data covering 4 years and 9 buildings from the CityLearn Challenge (southern US suburb)
  • Hourly data of energy demand and solar generation
  • Hourly weather data (temperature, humidity, solar radiation)

The nine buildings in this dataset consist of:

  • Building 1: Office building
  • Building 2: Fast food restaurant
  • Building 3: Standalone retail
  • Building 4: Strip mall retail
  • Buildings 5-9: Multi-family buildings

Problem statement

Our goal is to model the net energy demand of a collection of 9 buildings which are part of the 2021 CityLearn Challenge.

First, the time series data was analyzed for trend and seasonality.

A 24-hour forecast horizon was chosen because weather predictions become less accurate further into the future. In addition, power suppliers' energy management mainly focuses on a 24-hour period.

Different models were applied and compared:

  • Baseline (last year's values)
  • Linear Regression
  • Polynomial Regression
  • SARIMAX
  • Prophet
  • TBats
  • XGBoost
  • Random Forest
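
A baseline of this kind can be sketched as a seasonal naive forecast: each future hour is predicted with the value observed a fixed number of hours earlier. The snippet below is a minimal, dependency-free illustration and not the notebook code; the function name and the `lag` parameter are hypothetical. A lag of 24 repeats the previous day, while a lag of 8760 (assuming a 365-day year) would repeat last year's values.

```python
from typing import List

HOURS_PER_DAY = 24


def seasonal_naive_forecast(
    history: List[float],
    horizon: int = HOURS_PER_DAY,
    lag: int = HOURS_PER_DAY,
) -> List[float]:
    """Forecast each future hour with the value observed `lag` hours before it."""
    if lag < horizon:
        raise ValueError("lag must be >= horizon so every forecast uses observed data")
    if len(history) < lag:
        raise ValueError("history must contain at least `lag` observations")
    # Future step h (0-based) has time index len(history) + h, so the value
    # `lag` hours earlier sits at index len(history) - lag + h.
    start = len(history) - lag
    return history[start:start + horizon]


if __name__ == "__main__":
    two_days = [float(hour) for hour in range(48)]
    # Repeats the most recent 24 observed values as the next-day forecast.
    print(seasonal_naive_forecast(two_days))
```

Keeping the lag configurable makes it easy to compare a "previous day" baseline against a "same hours last year" baseline on the same data.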

Results

A small trend in the net energy demand over the 4 years was found, with a slight increase over the first 3 years and a decrease in the 4th year (corresponding to the trend in the weather data). There is a clear yearly seasonality, with the highest energy demand in summer (due to air conditioning) and the lowest in winter (due to mild winters). Weekly and daily seasonality were also identified.

Time Series Analysis

Figures: Trend and yearly seasonality; Weekly seasonality; Daily seasonality
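
The seasonality decomposition notebooks use a moving average with a period of 7 days. As a rough illustration of why that window is useful, the sketch below averages hourly data over 168 samples (7 × 24), which cancels out any pattern that repeats daily or weekly and leaves the slower trend. The function name is hypothetical and the notebooks may rely on a library routine instead.

```python
from typing import List

HOURS_PER_WEEK = 7 * 24


def moving_average(values: List[float], window: int = HOURS_PER_WEEK) -> List[float]:
    """Return the mean of every full window of `window` consecutive values."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(values) < window:
        return []
    # Keep a running sum so each step costs O(1) instead of O(window).
    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]
        averages.append(running / window)
    return averages


if __name__ == "__main__":
    # Two weeks of a purely daily pattern: the 7-day average is flat.
    daily_pattern = [float(hour % 24) for hour in range(2 * HOURS_PER_WEEK)]
    trend = moving_average(daily_pattern)
    print(min(trend), max(trend))
```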

Modelling

Model benchmark

The tree-based machine learning models (Random Forest and XGBoost) outperformed the time series models (SARIMAX, Prophet, TBats), with the mean absolute error as the metric.
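
The mean absolute error (MAE) is the average of the absolute differences between observed and predicted values, so it is expressed in the same unit as the energy demand. A minimal sketch of the computation, with a hypothetical function name and made-up example values:

```python
from typing import Sequence


def mean_absolute_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between observed and predicted values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one observation is required")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


if __name__ == "__main__":
    # Illustrative numbers only, not project results.
    print(mean_absolute_error([10.0, 12.0, 8.0], [11.0, 10.0, 8.0]))
```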

Forecast for 24 hours

Conclusion

The tree-based machine learning models (Random Forest and XGBoost) produced the forecasts with the lowest mean absolute error relative to the observed data.

Dashboard

The dashboard presents the models' results and lets you compare them for any date in the test year (year 4 = 2022). The dashboard is hosted here: https://greendash.herokuapp.com/

Prerequisites / How to run

The project notebooks require a pyenv environment with Python 3.9.8. To set up the environment, use the requirements file in the repository as follows:

make setup

or

pyenv local 3.9.8
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

The requirements.txt file contains the libraries needed for the EDA, the time series analysis and the dashboard deployment.

The results of the time series models (MAE, MSE and R2) can be tracked with MLflow. The required MLflow URI file is not part of this repository. To run the time series Jupyter notebooks and save the results locally, you need a .mlflow_uri file in the repository root. In bash, run:

echo http://127.0.0.1:5000/ > .mlflow_uri

This creates a local file that stores the URI. The file will not be pushed to GitHub (.mlflow_uri is listed in the .gitignore file).
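
For illustration, a notebook could load the stored URI along these lines. This is a sketch under the assumption that the file holds a single URI line; the helper name is hypothetical and the actual notebooks may read the file differently.

```python
from pathlib import Path


def read_tracking_uri(path: str = ".mlflow_uri") -> str:
    """Read the MLflow tracking URI from a one-line text file."""
    # strip() drops the trailing newline written by `echo`.
    uri = Path(path).read_text(encoding="utf-8").strip()
    if not uri:
        raise ValueError(f"{path} is empty; expected a tracking URI")
    return uri
```

The returned string can then be passed to `mlflow.set_tracking_uri(uri)` before logging any runs.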

Before running the time series notebooks, start the local MLflow UI with:

mlflow ui

and open the link http://127.0.0.1:5000

The dashboard can be started locally by running:

cd dashboard
python forecast_dashboard.py 

and opening the link http://127.0.0.1:8100/

Files in the repo

The original data is available in the CityLearn v1.0.0 release. The preprocessed data for our analysis is located in the preprocessed folder.

The Jupyter notebooks are divided into two sections: Exploratory Data Analysis (EDA) and Time Series (TS) models. The section is part of each notebook's name prefix.

| EDA | Description |
| --- | --- |
| Data Pre-processing | Data setup for time series analysis |
| Weather analysis | Exploratory analysis of weather variables, e.g. temperature, solar radiation |
| Seasonality decomposition | Seasonality decomposition. Moving average period = 7 days |
| Data Visualization | Data visualization |
| Yearly seasonality decomposition | Yearly seasonality decomposition and meteorological seasons analysis. Moving average period = 7 days |

| TS 8 day predictions models | Description |
| --- | --- |
| Baseline | Previous 24 hrs energy demand value |
| Linear Regression | Model with weather and time features |
| Polynomial Regression | Linear regression model with polynomial features, including weather and time features |
| SARIMAX | Model with weather, yearly and weekly seasonalities as exogenous features |
| Prophet | Model with hyperparameter optimization, holidays, weather data as additional regressors and weekly seasonality by meteorological seasons |
| TBats | |
| XGBoost | Model including weather features and time features. Net energy demand as target. Additionally, a 24-hour time lag plus 1-, 2- and 3-week time lags of the target feature are used |
| XGBoost refined | Model including weather features and time features. Additionally, a 24-hour time lag plus 1-, 2- and 3-week time lags of the target feature are used. Every energy demand is predicted separately and summed up afterwards |
| Random Forest | Model including weather features and time features. Net energy demand as target. Additionally, a 24-hour time lag and a 1-week time lag of the target feature are used |
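
The time lags used by the tree-based models can be illustrated with a small sketch. It builds, for every hour, the target value from 24 hours and from 1, 2 and 3 weeks earlier, which is the lag set described for XGBoost. The function name, feature names and the use of `None` for missing history are assumptions for illustration; the notebooks most likely build these features with a dataframe library.

```python
from typing import Dict, List, Optional, Sequence

HOURS_PER_DAY = 24
HOURS_PER_WEEK = 7 * HOURS_PER_DAY
# 24 hours plus 1, 2 and 3 weeks, expressed in hours.
XGBOOST_LAGS = (HOURS_PER_DAY, HOURS_PER_WEEK, 2 * HOURS_PER_WEEK, 3 * HOURS_PER_WEEK)


def build_lag_features(
    target: Sequence[float],
    lags: Sequence[int] = XGBOOST_LAGS,
) -> List[Dict[str, Optional[float]]]:
    """For each hour, look up the target value `lag` hours earlier."""
    rows: List[Dict[str, Optional[float]]] = []
    for t in range(len(target)):
        row: Dict[str, Optional[float]] = {}
        for lag in lags:
            # No value is available until `lag` hours of history exist.
            row[f"lag_{lag}h"] = target[t - lag] if t >= lag else None
        rows.append(row)
    return rows


if __name__ == "__main__":
    hourly_demand = [float(hour) for hour in range(600)]
    features = build_lag_features(hourly_demand)
    print(features[504])
```

Because the shortest lag equals the 24-hour forecast horizon, every lagged value is already known at the time the forecast is made. Rows with `None` entries (the first three weeks here) would typically be dropped before training.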

Future work

We have two approaches for our future work: a business approach and a machine learning optimization approach.

For the business approach, we plan to implement a dashboard solution that predicts energy demand in real time. The solution will allow stakeholders to feed in current data, supporting their decisions on balancing energy demand.

For the machine learning optimization approach, we plan to improve how well the models generalize across different climate zones. We will evaluate a single solution for all climate zones against individual solutions per climate zone. Furthermore, we plan to develop one or more reinforcement learning agents that optimize battery usage, with the aim of reducing costs and improving energy grid stability.