# HVAC optimization case study
Stefan Langenbach  
06/04/2020

## Agenda
1. Management Summary
1. Background
    1. Context of case study
    1. Tasks
    1. Assumptions
    1. Problem statement
1. Model Building
    1. Data Exploration
    1. Feature Engineering
    1. Modeling
    1. Evaluation
1. Model Deployment
    1. Architecture
    1. Deployment
    1. Monitoring
    1. Adaption

## Management Summary
* To regulate HVAC, we built a model that predicts whether a room is occupied
* Our model, a depth-limited decision tree, predicts room occupancy with 99% accuracy on the validation data
* We suggest using cloud services (AWS SageMaker) for model deployment, monitoring and adaption
* Because the available data is limited (all measurements were recorded in February 2015), we strongly advise gathering more data (cf. [EnergyPlus tool](https://energyplus.net)) and re-evaluating the model before conducting any large-scale rollout
* Additionally, to build a business case for HVAC regulation, its main cost drivers must be identified and taken into account when improving the model
* A simple strategy for a staged rollout is to start with a single room, floor or office building and expand from there
* Finally, reminding employees to switch off HVAC when they turn off the lights in the office could be a quick win that does not even require rolling out a model

## Background

### Context
>Your company is keen on transforming their **office buildings** into “smart buildings”, so as to underline their commitment to digitalization.  
>New sensors detecting various attributes have been deployed with the **goal of sensing whether a person is present in a room or not** so as to regulate HVAC accordingly.   
>Your CFO senses a **strong business case** since her gut feeling is that **most people leave climate control running, even when out of the office**.

### Tasks  
1. **Build a model** achieving the desired outcome, quantify its performance, describe your approach and **recommend a way forward to your management**.  
1. **Describe** also how you would ensure, that the **model is deployed**, its performance is being **monitored** and it is being constantly **adapted**.

## Assumptions

### Model building
* The available data was produced by different sensors *inside* a single room within *one* office space located somewhere in Europe
* The names of the provided data files indicate their intended use during modeling:
    * `datatrain.txt` is used for training 
    * `datatest.txt` is used for testing
    * `datatest2.txt` is used for validation
* Occupancy of a room by a person is a sufficient proxy for regulating HVAC
* Occupancy status will be predicted for *every* data point (no resampling)
* Misclassification of occupancy, i.e. predicting that a room is occupied when it is not and vice versa, is *generally* discouraged (no preference for tuning the model towards recall or precision)
* Evaluating the costs/savings generated by the model is out of scope for this case study

### Model deployment
* Data produced by sensors is non-sensitive (no PII) and thus safe to store and process in the cloud
* Sensors can transmit data over the internet using standard protocols
* Deployment and monitoring infrastructure must be scalable and require minimal engineering effort and maintenance

## Problem statement
Given historic data produced by different sensors (temperature, relative humidity, light, CO2 concentration, humidity ratio), predict whether a room is occupied or not (**binary classification**).

## Evaluation metric
Given the assumptions above, model performance is evaluated by the **F1 score**, the harmonic mean of precision and recall.

Recall (pun intended) that **precision** measures how **reliable** a model's positive predictions are, answering the following question for the case at hand: _"Out of those samples where rooms were predicted as occupied, how many were actually occupied?"_. **Recall**, in contrast, measures how many of the actually occupied samples in the dataset our model labeled as occupied.
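To make these definitions concrete, here is a minimal plain-Python sketch that computes precision, recall and F1 for the *occupied* class (label `1`). The function name and the toy labels are illustrative only; for binary labels, scikit-learn's `precision_score`, `recall_score` and `f1_score` should return the same values.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the positive ("occupied") class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Precision: of the samples predicted as occupied, how many were actually occupied?
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of the actually occupied samples, how many were predicted as occupied?
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example: 4 occupied and 6 empty samples; one occupied sample is missed (FN)
# and one empty sample is flagged as occupied (FP).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(precision_recall_f1(y_true, y_pred))  # (0.75, 0.75, 0.75)
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, which matches the assumption that both kinds of misclassification are equally undesirable.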

## Existing research
* The body of knowledge for HVAC optimization includes roughly 7,500 studies according to [Google Scholar](https://scholar.google.com/scholar?hl=de&as_sdt=0%2C5&as_vis=1&q=hvac+optimization+machine+learning+deep+learning&btnG=)
* Research suggests that deep learning approaches are especially well suited to optimizing HVAC systems: 
    * [A short-term building cooling load prediction method using deep learning algorithms](https://www.researchgate.net/publication/315436430_A_shortterm_building_cooling_load_prediction_method_using_deep_learning_algorithms)
    * [Autonomous HVAC Control: A Reinforcement Learning Approach](https://www.researchgate.net/publication/281638226_Autonomous_HVAC_Control_A_Reinforcement_Learning_Approach)
    * [Deep Reinforcement Learning for Building HVAC Control](https://www.researchgate.net/publication/317572268_Deep_Reinforcement_Learning_for_Building_HVAC_Control)
    * [Visible light based occupancy inference using ensemble learning](https://ieeexplore.ieee.org/document/8302496)
* There is even an [AWS SageMaker notebook](https://github.com/awslabs/amazon-sagemaker-examples/tree/master/reinforcement_learning/rl_hvac_coach_energyplus) that implements HVAC optimization using reinforcement learning.  
* Interestingly though, a recent article in the Energy Systems journal [comparing machine learning algorithms for forecasting indoor temperature in smart buildings](https://www.researchgate.net/publication/338826095_A_comparison_of_machine_learning_algorithms_for_forecasting_indoor_temperature_in_smart_buildings) finds that traditional algorithms, e.g. random forests, work well too.  
* This finding is in line with the results of a [study](https://www.sciencedirect.com/science/article/abs/pii/S0378778815304357) investigating the case study dataset. 

## Model building
>do machine learning like the great engineer you are, not like the great machine learning expert you aren’t.

### Data

* All samples in the training, test and validation datasets were recorded in February 2015.  
* Note that the training data was recorded **after** the test data, but **before** the validation data.
* Unfortunately, the **test** data does not include **weekends**, whereas the validation data does.

| | train | test | validation |
| --- | --- | --- | --- |
| samples | 8143 | 2665 | 9752 
| year | 2015 | 2015 | 2015
| month | Feb | Feb | Feb
| days | 7 | 3 | 8
| days of week | all | MON - WED | all
| range | 04/02/2015 - 10/02/2015 | 02/02/2015 - 04/02/2015 | 11/02/2015 - 18/02/2015
| frequency | minute | minute | minute
| missing values | None | None | None
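The figures in this table can be reproduced with a few lines of Python. The sketch below takes a plain list of timestamp strings and assumes the format used in the UCI files (`2015-02-04 17:51:00`); the function name and the sample timestamps are illustrative, and in practice one would first read the files with pandas.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]


def summarize(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Summarize sample count, date range, days, days of week and sampling step."""
    ts = sorted(datetime.strptime(t, fmt) for t in timestamps)
    dates = sorted({t.date() for t in ts})
    # Most frequent gap between consecutive samples (None if there is only one sample).
    gaps = Counter(b - a for a, b in zip(ts, ts[1:]))
    return {
        "samples": len(ts),
        "range": f"{dates[0]:%d/%m/%Y} - {dates[-1]:%d/%m/%Y}",
        "days": len(dates),
        "days_of_week": [DAY_NAMES[d] for d in sorted({t.weekday() for t in ts})],
        "step": gaps.most_common(1)[0][0] if gaps else None,
    }


# Illustrative timestamps, not taken from the dataset.
print(summarize([
    "2015-02-02 14:19:00",
    "2015-02-02 14:20:00",
    "2015-02-02 14:21:00",
    "2015-02-03 09:00:00",
    "2015-02-04 10:43:00",
]))
```

Using the most frequent gap rather than the first gap keeps the "frequency" estimate robust to the occasional jump between recording sessions.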

### Visualization

![Distribution of occupancy train](../references/figures/count_plot_train.svg)  

![Distribution of occupancy test](../references/figures/count_plot_test.svg)

![Heatmap of correlation of features in training data](../references/figures/heatmap_train.svg)

Line plot of average feature values by time of day in hours.  
Note that the average value of **light** tracks **occupancy** quite well.  
![Average feature value by hour](../references/figures/avg_feature_value_by_hour.svg)


Same plot as before, but with data limited to week**days**:

![Average feature value by hour weekday](../references/figures/avg_feature_value_by_hour_weekday.svg)

Same plot as before, but with data limited to week**ends**:

![Average feature value by hour weekend](../references/figures/avg_feature_value_by_hour_weekend.svg)
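The hourly averages behind these plots can be sketched as a simple group-by on the hour of day. The sketch assumes rows shaped like the UCI files (a `date` column plus one column per feature, e.g. `Light`); the function name and sample rows are hypothetical. With pandas, the equivalent is roughly `df.groupby(df["date"].dt.hour)["Light"].mean()` once `date` is parsed as datetimes.

```python
from collections import defaultdict
from datetime import datetime


def hourly_means(rows, feature, weekend=None, fmt="%Y-%m-%d %H:%M:%S"):
    """Average `feature` by hour of day.

    weekend=None uses all samples, True only Saturdays/Sundays, False only Monday-Friday.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        ts = datetime.strptime(row["date"], fmt)
        if weekend is not None and (ts.weekday() >= 5) != weekend:
            continue  # skip samples outside the requested part of the week
        sums[ts.hour] += row[feature]
        counts[ts.hour] += 1
    return {hour: sums[hour] / counts[hour] for hour in sorted(counts)}


rows = [
    {"date": "2015-02-02 10:00:00", "Light": 400.0},  # Monday
    {"date": "2015-02-02 10:30:00", "Light": 500.0},  # Monday
    {"date": "2015-02-07 10:00:00", "Light": 0.0},    # Saturday
]
print(hourly_means(rows, "Light"))                 # {10: 300.0}
print(hourly_means(rows, "Light", weekend=False))  # {10: 450.0}
print(hourly_means(rows, "Light", weekend=True))   # {10: 0.0}
```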

### Features

In addition to the original features and the time components (year, month, day, etc.) already encoded in each sample's timestamp, we engineered the following features:

| feature | description |
| --- | ---
| weekday | day of week (MON, TUE, WED, etc.) on which the sample was recorded
| week | calendar week in which the sample was recorded
| weekend | boolean indicating whether the sample was recorded on a weekend
| office_hour | boolean indicating whether the sample was recorded between 09:00 and 17:00
| lunch_break | boolean indicating whether the sample was recorded between 12:00 and 13:00
| night | boolean indicating whether the sample was recorded between 19:00 and 06:00
| high_co2 | boolean indicating whether the CO2 level exceeds 1000 ppm (cf. [Safe CO2 levels in rooms](https://www.kane.co.uk/knowledge-centre/what-are-safe-levels-of-co-and-co2-in-rooms))
| light_on | boolean indicating whether the light level exceeds 50 lux (cf. [HSE on lighting in offices](https://www.hse.gov.uk/humanfactors/topics/lighting.htm))
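A minimal sketch of how these features could be derived for a single sample. Two details are assumptions not stated above: each time window includes its start hour and excludes its end hour, and "calendar week" is taken to be the ISO week number. The function name is hypothetical.

```python
from datetime import datetime

DAY_NAMES = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]


def engineer_features(ts: datetime, co2_ppm: float, light_lux: float) -> dict:
    """Derive the engineered features for a single sample.

    Assumptions: time windows include the start hour and exclude the end hour,
    and "calendar week" means the ISO week number.
    """
    hour = ts.hour
    return {
        "weekday": DAY_NAMES[ts.weekday()],
        "week": ts.isocalendar()[1],
        "weekend": ts.weekday() >= 5,
        "office_hour": 9 <= hour < 17,
        "lunch_break": 12 <= hour < 13,
        "night": hour >= 19 or hour < 6,  # window wraps around midnight
        "high_co2": co2_ppm > 1000,
        "light_on": light_lux > 50,
    }


print(engineer_features(datetime(2015, 2, 4, 12, 30), co2_ppm=1050.0, light_lux=430.0))
```

Note that the `night` window wraps around midnight, so it needs an `or` rather than a single range check.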

### Model

A simple decision tree (**tree_limited**) with a maximum depth of 3, using only the original features, outperforms all other models we tried, including random forests with various parameters and feature sets.

| model | description | Accuracy train | Accuracy test | F1-Score train | F1-Score test
| --- | --- | --- | --- | --- | ---
| baseline | dummy classifier always predicting the majority class | 0.7303 | 0.7415 | 0.3766 | 0.3620
| tree | decision tree with default parameters | 0.9268 | 0.9587 | 0.8165 | 0.9434
| tree_limited | decision tree with max_depth=3 | 0.9996 | 1.0 | 0.9992 | 1.0
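A minimal scikit-learn sketch of the comparison in the table, assuming the features are already available as numeric arrays. The synthetic data below only stands in for the sensor files, so its scores will not match the table. F1 is computed for the occupied class (scikit-learn's default binary averaging); whether the table used the same averaging is an assumption.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.tree import DecisionTreeClassifier


def compare_models(X_train, y_train, X_test, y_test, random_state=0):
    """Fit the three candidate models and report accuracy and F1 on train and test."""
    models = {
        "baseline": DummyClassifier(strategy="most_frequent"),
        "tree": DecisionTreeClassifier(random_state=random_state),
        "tree_limited": DecisionTreeClassifier(max_depth=3, random_state=random_state),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores = {}
        for split, X, y in (("train", X_train, y_train), ("test", X_test, y_test)):
            y_pred = model.predict(X)
            scores[f"accuracy_{split}"] = accuracy_score(y, y_pred)
            # F1 of the "occupied" class (label 1); 0.0 if it is never predicted.
            scores[f"f1_{split}"] = f1_score(y, y_pred, zero_division=0)
        results[name] = scores
    return results


# Synthetic stand-in data: light and CO2 are higher when the room is occupied.
rng = np.random.default_rng(0)
n = 400
occupied = rng.random(n) < 0.25
light = np.where(occupied, rng.normal(450, 50, n), rng.normal(20, 10, n))
co2 = np.where(occupied, rng.normal(900, 100, n), rng.normal(500, 50, n))
X, y = np.column_stack([light, co2]), occupied.astype(int)
for name, scores in compare_models(X[:300], y[:300], X[300:], y[300:]).items():
    print(name, {k: round(v, 4) for k, v in scores.items()})
```

Splitting by position rather than randomly mirrors the case study, where train and test sets cover different time periods.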

### Evaluation
Feature importances of **tree_limited** model:  
![Feature importances tree_limited model](../references/figures/tree_limited_feat_importances_plot.svg)

Visualization of decision tree generated by **tree_limited** model:  
![Visualization of tree generated by tree_limited model](../references/figures/tree_plot.svg)
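Both artifacts above can be obtained from a fitted scikit-learn tree: `feature_importances_` gives the importances and `export_text` prints the learned rules. This sketch assumes the UCI column names for the original features and uses synthetic data in which only `Light` separates the classes; the helper name is hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Assumed column names, following the header of the UCI occupancy files.
FEATURES = ["Temperature", "Humidity", "Light", "CO2", "HumidityRatio"]


def explain_tree(model, feature_names):
    """Return features sorted by importance and a text rendering of the tree."""
    ranked = sorted(zip(feature_names, model.feature_importances_),
                    key=lambda item: item[1], reverse=True)
    rules = export_text(model, feature_names=list(feature_names))
    return ranked, rules


# Synthetic data in which only "Light" separates occupied from empty samples.
rng = np.random.default_rng(42)
n = 500
y = (rng.random(n) < 0.3).astype(int)
X = rng.normal(size=(n, len(FEATURES)))
X[:, 2] = np.where(y == 1, 400.0, 10.0) + rng.normal(0, 5, n)

model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
ranked, rules = explain_tree(model, FEATURES)
print(ranked[0])
print(rules)
```

The text rendering is handy for reviewing a depth-3 tree in a pull request or log file, where an SVG plot is not available.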

### Validation

The **tree_limited** model achieves an accuracy of **0.99** and an F1 score of **0.98** on the validation dataset, i.e. nearly all samples are classified correctly.  

Confusion matrix of **tree_limited** model:  
![Confusion matrix tree_limited model](../references/figures/conf_matrix_plot_final_model.svg)

## Model deployment

>Only a small fraction of real-world ML systems is composed of the ML code, as shown by the small black box in the middle. The required surrounding infrastructure is vast and complex.

![Hidden technical debt in ML systems](../references/hidden_technical_debt_in_ml_systems.png)

### Architecture
>Standing on the shoulders of giants  

![HVAC architecture using AWS](../references/hvac_architecture.svg)

### Overview of architecture core components
* [IoT Core](https://aws.amazon.com/iot-core/) connects sensors in office buildings to the AWS cloud
* [IoT Analytics](https://aws.amazon.com/iot-analytics/)
    * collects sensor data from IoT Core and stores it in a timeseries data store
    * provides methods to clean, filter, transform and enrich data
    * features a built-in SQL engine to easily query data
* [QuickSight](https://aws.amazon.com/quicksight/) visualizes sensor data in dashboards
* [SageMaker](https://aws.amazon.com/sagemaker/)
    * provides an IDE for machine learning (SageMaker Studio) which supports experiment management (similar to [MLflow](https://www.mlflow.org/)), among many other features
    * runs pipelines to preprocess input data before model building **and** inference
    * supports a great variety of algorithms and frameworks for model building
    * can deploy models for real-time and batch inference
    * automatically monitors deployed models for [concept drift](https://en.wikipedia.org/wiki/Concept_drift) (SageMaker Model Monitor)
* [Greengrass](https://aws.amazon.com/greengrass/)
    * runs the model trained in SageMaker on an edge device inside the office building
    * performs inference on data received from sensors in order to control HVAC

### Deployment

* [MLflow](https://www.mlflow.org/) is an open-source Python package for managing machine learning experiments and deploying models.  
* It allows us to deploy models in various ways, e.g. as REST-based web services, within Docker containers or via cloud-based ML services such as SageMaker.

Deploying a model is straightforward and can be done via the CLI:
```bash
# Serve the model as a local REST web service (default: http://127.0.0.1:5000)
mlflow models serve -m "$PATH_TO_MODEL" [OPTIONS]

# Build a Docker image that serves the model
mlflow models build-docker -m "$PATH_TO_MODEL" [OPTIONS]
```
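Once `mlflow models serve` is running, predictions are requested by POSTing to its `/invocations` route. The sketch below uses only the standard library and assumes the MLflow 1.x input format (a pandas "split"-oriented JSON document), the default address `127.0.0.1:5000`, and feature columns named as in the UCI files; the helper names and the sample reading are hypothetical.

```python
import json
from urllib import request

# Assumed input columns (original sensor features, named as in the UCI files).
FEATURES = ["Temperature", "Humidity", "Light", "CO2", "HumidityRatio"]


def build_payload(rows):
    """Encode sensor readings as a pandas 'split'-oriented JSON document."""
    return json.dumps({
        "columns": FEATURES,
        "data": [[row[name] for name in FEATURES] for row in rows],
    })


def predict_occupancy(rows, url="http://127.0.0.1:5000/invocations"):
    """POST readings to a model served with `mlflow models serve` and return predictions."""
    req = request.Request(
        url,
        data=build_payload(rows).encode("utf-8"),
        headers={"Content-Type": "application/json; format=pandas-split"},
        method="POST",
    )
    with request.urlopen(req) as response:
        return json.loads(response.read())


# Usage (requires a running `mlflow models serve` process; values are illustrative):
# predict_occupancy([{"Temperature": 21.5, "Humidity": 27.0, "Light": 450.0,
#                     "CO2": 800.0, "HumidityRatio": 0.0043}])
```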

Given the architecture outlined above, we deploy the locally trained model to AWS SageMaker (MLflow 1.x CLI):  
```bash
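# Build the MLflow serving container and push it to Amazon ECR
mlflow sagemaker build-and-push-container --build --push -c "$CONTAINER_NAME"

# Deploy the model as a SageMaker endpoint named after the application
mlflow sagemaker deploy -a "$APP_NAME" -m "$PATH_TO_MODEL" [OPTIONS]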
```
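Other services (e.g. a Lambda function or a dashboard backend) can then query the endpoint via boto3's `sagemaker-runtime` client. This is a sketch under assumptions: the endpoint accepts the same pandas "split" JSON as the local MLflow 1.x server, the columns are named as in the UCI files, and `invoke_endpoint` here is a hypothetical helper that takes the endpoint name chosen at deployment time. The client is injectable so the function can be exercised without AWS credentials.

```python
import json

# Assumed input columns (original sensor features, named as in the UCI files).
FEATURES = ["Temperature", "Humidity", "Light", "CO2", "HumidityRatio"]


def invoke_endpoint(rows, endpoint_name, client=None):
    """Send readings to a SageMaker endpoint created by `mlflow sagemaker deploy`."""
    if client is None:
        import boto3  # only needed when talking to AWS
        client = boto3.client("sagemaker-runtime")
    body = json.dumps({
        "columns": FEATURES,
        "data": [[row[name] for name in FEATURES] for row in rows],
    })
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json; format=pandas-split",
        Body=body,
    )
    # The response body is a stream containing the JSON-encoded predictions.
    return json.loads(response["Body"].read())
```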

### Monitoring

* AWS SageMaker has a feature called **Model Monitor** which continuously checks deployed models for data and model drift
* Using Model Monitor, we can capture the input data sent to our model endpoint and the predictions made by the model
* Model Monitor calculates various statistics (mean, standard deviation, min, max, quantiles, etc.) for every feature
* On a user-defined schedule, these statistics are compared against a baseline derived from the data used to train the model
* If data or model drift is detected, we are notified and can check data pipelines or retrain our model
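Model Monitor performs these comparisons as a managed service. To illustrate the underlying idea only, here is a deliberately simplified plain-Python drift check: it flags a feature when the mean of recently captured inputs moves more than a chosen number of baseline standard deviations away from the training mean. The threshold of 3 and the function names are assumptions, and this is not the algorithm Model Monitor uses internally.

```python
from statistics import mean, stdev


def baseline_stats(data):
    """Per-feature mean and standard deviation of the training (baseline) data."""
    return {feature: {"mean": mean(values), "std": stdev(values)}
            for feature, values in data.items()}


def drift_report(baseline, current, z_threshold=3.0):
    """Flag features whose current mean is more than `z_threshold` baseline
    standard deviations away from the baseline mean."""
    flagged = {}
    for feature, stats in baseline.items():
        scale = stats["std"] or 1e-12  # avoid division by zero for constant features
        z = abs(mean(current[feature]) - stats["mean"]) / scale
        if z > z_threshold:
            flagged[feature] = round(z, 2)
    return flagged


baseline = baseline_stats({"Light": [10, 20, 30], "CO2": [500, 600, 700]})
print(drift_report(baseline, {"Light": [80, 90, 100], "CO2": [610, 620, 630]}))  # {'Light': 7.0}
```

A flagged feature is the trigger to inspect the data pipeline first and only then retrain, since a broken sensor and a genuine change in room usage look alike at this level.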

### Adaption

* In a real-world scenario we would need to gather much more data before training and deploying any model
* As sensor readings are unlikely to change drastically within short periods in the real world, **offline** training is sufficient for this use case
* With SageMaker Model Monitor in place, we can deploy our model and retrain and redeploy it whenever drift is detected
* Given the limited amount of available data, retraining may be necessary quite often in the beginning

## References
1. [Google rules of ML](https://developers.google.com/machine-learning/guides/rules-of-ml)
1. [Hidden technical debt in Machine Learning systems](https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf)
1. [Prophet documentation](https://facebook.github.io/prophet/docs/quick_start.html#python-api)
1. [sktime documentation](https://alan-turing-institute.github.io/sktime/)
1. [MLflow documentation](https://www.mlflow.org/docs/latest/index.html)
1. [Kedro documentation](https://kedro.readthedocs.io/en/stable/index.html)
1. [Comet documentation](https://www.comet.ml/docs/)
1. [Streamlit documentation](https://docs.streamlit.io)
1. [AWS SageMaker developer guide](https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html)
1. [AWS SageMaker examples](https://github.com/awslabs/amazon-sagemaker-examples)
1. [Data project checklist](https://www.fast.ai/2020/01/07/data-questionnaire/)
1. [Case study dataset](https://archive.ics.uci.edu/ml/datasets/Occupancy+Detection+)
1. [Paper covering analysis of case study dataset](https://www.sciencedirect.com/science/article/abs/pii/S0378778815304357)
1. [Blog post covering analysis of case study dataset](https://machinelearningmastery.com/how-to-predict-room-occupancy-based-on-environmental-factors/)
1. [Blog post series explaining time series analysis to scikit-learn users](https://www.ethanrosenthal.com/2018/01/28/time-series-for-scikit-learn-people-part1/)
1. [Blog post on how to frame time series forecasting as supervised learning problems in Python](https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/)
1. [Blog post on multivariate time series forecasting with neural networks](https://machinelearningmastery.com/multivariate-time-series-forecasting-lstms-keras/)
1. [Blog post explaining accuracy, recall, precision and F1 score](https://towardsdatascience.com/accuracy-precision-recall-or-f1-331fb37c5cb9)
1. [Blog post on Kedro](https://medium.com/@QuantumBlack/introducing-kedro-the-open-source-library-for-production-ready-machine-learning-code-d1c6d26ce2cf)
1. [Blog post on feature selection methods](https://machinelearningmastery.com/feature-selection-with-real-and-categorical-data/)
1. [Blog post on deploying MLflow models to AWS SageMaker](https://towardsdatascience.com/deploying-models-to-production-with-mlflow-and-amazon-sagemaker-d21f67909198)
1. [Blog post on AWS SageMaker model monitor](https://aws.amazon.com/blogs/aws/amazon-sagemaker-model-monitor-fully-managed-automatic-monitoring-for-your-machine-learning-models/)
1. [Working with time series section of Python data science handbook](https://jakevdp.github.io/PythonDataScienceHandbook/03.11-working-with-time-series.html)
1. [Deep learning for time series forecasting book](https://machinelearningmastery.com/deep-learning-for-time-series-forecasting/)