# Optimizing Building Temperature Control with `pi_optimal`

## Introduction

Efficient energy management is a key component of sustainable building operations, helping to reduce costs and minimize environmental impact. By leveraging **Reinforcement Learning (RL)**, we can optimize energy usage while maintaining occupant comfort. This notebook demonstrates the use of `pi_optimal` to train an RL agent for **temperature control optimization** in buildings, utilizing a dataset of historical energy consumption.

--- 
## Table of Contents

1. [Optimizing Building Temperature Control with `pi_optimal`](#optimizing-building-temperature-control-with-pi_optimal)
2. [Introduction](#introduction)
3. [Problem Statement](#problem-statement)
4. [Dataset](#dataset)
    - [Dataset Features](#dataset-features)
5. [Defining the Reward Function](#defining-the-reward-function)
    - [Implementation](#implementation)
    - [Apply the reward function to the dataset](#apply-the-reward-function-to-the-dataset)
6. [Dataset Preparation](#dataset-preparation)
    - [Configuration](#configuration)
7. [Training the Agent](#training-the-agent)
    - [Agent Initialization](#agent-initialization)
    - [Training the Agent](#training-the-agent-1)
8. [Evaluating and Predicting Actions](#evaluating-and-predicting-actions)
    - [Load Current Data](#load-current-data)
    - [Create Current Dataset](#create-current-dataset)
    - [Predict Optimal Actions](#predict-optimal-actions)
9. [Interpreting the Results](#interpreting-the-results)
    - [Multi-Step Planning](#multi-step-planning)
    - [Decision-Making Options](#decision-making-options)
10. [Visualization](#visualization)
11. [Conclusion](#conclusion)
    - [Key Highlights](#key-highlights)
12. [Next Steps](#next-steps)
13. [References](#references)

---

## Problem Statement

Our objective is to train an RL agent using `pi_optimal` to:

- **Minimize energy consumption**: Optimize cooling intensity to reduce electricity usage.
- **Maintain comfort**: Keep indoor temperature close to a desired level.

The agent will learn from historical data to adjust cooling intensity, balancing these goals effectively.

---

## Dataset

The dataset contains **energy consumption data** from multiple buildings, structured as follows:

### Dataset Features

1. **Simulation Details**:
   - `episode`: Unique identifier for each simulation run.
   - `step`: Time step during data collection.

2. **Time Variables**:
   - `hour`: Hour of the day.
   - `day_type`: Day of the week (e.g., weekday or weekend).

3. **Environmental Context**:
   - `outdoor_dry_bulb_temperature`: Outdoor temperature (°C).
   - `occupant_count`: Number of occupants in the building.

4. **System State**:
   - `indoor_dry_bulb_temperature`: Current indoor temperature (°C).
   - `indoor_dry_bulb_temperature_cooling_set_point`: Desired indoor temperature (°C).

5. **Control Action**:
   - `cooling_device`: Intensity of the cooling system, a continuous value between 0 (low) and 1 (high).

6. **Energy Metrics**:
   - `net_electricity_consumption`: Electricity usage (kWh).
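
Before loading the data, it can help to confirm that a CSV file actually contains these columns. The following is a minimal sketch using only the standard library. `EXPECTED_COLUMNS` and `missing_columns` are illustrative names, not part of `pi_optimal`, and the inline CSV text is a made-up stand-in for the real file.

```python
import csv
import io

# Column names taken from the feature list above.
EXPECTED_COLUMNS = [
    "episode", "step", "hour", "day_type",
    "outdoor_dry_bulb_temperature", "occupant_count",
    "indoor_dry_bulb_temperature",
    "indoor_dry_bulb_temperature_cooling_set_point",
    "cooling_device", "net_electricity_consumption",
]

def missing_columns(csv_text, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from the CSV header."""
    header = next(csv.reader(io.StringIO(csv_text)))
    present = set(header)
    return [name for name in expected if name not in present]

# Hypothetical two-line sample standing in for the real data file.
sample = "episode,step,hour,day_type,cooling_device\n0,0,1,5,0.28\n"
print(missing_columns(sample))
```

An empty list means every expected column is present. With the real file, the same check can be run on the text read from disk.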

In [1]:
import pandas as pd

df_historical_building_energy_consumption = pd.read_csv('data/historical_temperature_control.csv')
df_historical_building_energy_consumption.head()

| Unnamed: 0 | episode | step | day_type | hour | outdoor_dry_bulb_temperature | indoor_dry_bulb_temperature | indoor_dry_bulb_temperature_cooling_set_point | cooling_device | net_electricity_consumption | occupant_count |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 5 | 1 | 24.66 | 23.098652 | 23.222221 | 0.276068 | 0.677881 | 3.0 |
| 1 | 0 | 1 | 5 | 2 | 24.07 | 22.234743 | 22.222221 | 0.301041 | 0.846281 | 3.0 |
| 2 | 0 | 2 | 5 | 3 | 23.9 | 22.22306 | 22.222221 | 0.741433 | 5.384543 | 3.0 |
| 3 | 0 | 3 | 5 | 4 | 23.87 | 22.22225 | 22.222221 | 0.034795 | 1.809869 | 3.0 |
| 4 | 0 | 4 | 5 | 5 | 23.83 | 22.222237 | 22.222221 | 0.98248 | -0.31952 | 3.0 |


In [2]:
# Optional: speeds up training and inference if sklearnex is installed and you are using an Intel CPU.
# Comment out the sklearnex lines below otherwise.

import numpy as np
from sklearnex import patch_sklearn
patch_sklearn()

Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)


In [3]:
# Add the root path to the sys path to load pi_optimal from the parent directory
import sys
sys.path.append("../..")

---

## Defining the Reward Function

The reward function balances two objectives:

1. **Comfort**: Penalizing deviations from the desired temperature.
2. **Cost**: Penalizing excessive energy consumption.

$\text{Reward} = - \left( (\text{Indoor Temperature} - \text{Desired Temperature})^2 + \text{Energy Consumption} \cdot 0.001 \right)$
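
As a worked example, take the first row of the historical dataset shown earlier (indoor temperature 23.098652 °C, electricity consumption 0.677881 kWh) and a desired temperature of 22 °C, the value used in the implementation that follows:

$$
\text{Reward} = -\left( (23.098652 - 22)^2 + 0.677881 \cdot 0.001 \right) \approx -(1.207036 + 0.000678) \approx -1.207714
$$

With these weights the comfort term dominates: a deviation of about 1.1 °C contributes well over a thousand times more to the penalty than the energy term for the same step.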

### Implementation

In [4]:
# Desired indoor temperature
DESIRED_TEMP = 22  # Celsius

# Function to calculate reward
def calculate_reward(row):
    # Temperature comfort penalty
    temp_penalty = (row['indoor_dry_bulb_temperature'] - DESIRED_TEMP) ** 2
    # Energy cost
    energy_cost = row['net_electricity_consumption'] * 0.001
    # Total penalty
    total_penalty = temp_penalty + energy_cost
    # Reward is the negative of the total penalty
    reward = -total_penalty
    return reward

### Apply the reward function to the dataset

In [5]:
# Apply the reward calculation
df_historical_building_energy_consumption['reward'] = df_historical_building_energy_consumption.apply(calculate_reward, axis=1)
df_historical_building_energy_consumption.head()

| Unnamed: 0 | episode | step | day_type | hour | outdoor_dry_bulb_temperature | indoor_dry_bulb_temperature | indoor_dry_bulb_temperature_cooling_set_point | cooling_device | net_electricity_consumption | occupant_count | reward |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 5 | 1 | 24.66 | 23.098652 | 23.222221 | 0.276068 | 0.677881 | 3.0 | -1.207714 |
| 1 | 0 | 1 | 5 | 2 | 24.07 | 22.234743 | 22.222221 | 0.301041 | 0.846281 | 3.0 | -0.055951 |
| 2 | 0 | 2 | 5 | 3 | 23.9 | 22.22306 | 22.222221 | 0.741433 | 5.384543 | 3.0 | -0.05514 |
| 3 | 0 | 3 | 5 | 4 | 23.87 | 22.22225 | 22.222221 | 0.034795 | 1.809869 | 3.0 | -0.051205 |
| 4 | 0 | 4 | 5 | 5 | 23.83 | 22.222237 | 22.222221 | 0.98248 | -0.31952 | 3.0 | -0.04907 |


---

## Dataset Preparation

To train a `pi_optimal` RL agent, we must first load and preprocess the building energy dataset. The `pi_optimal` package provides a custom dataset class that streamlines the preprocessing pipeline. Below are the key parameters that need to be defined during this process:

- **Unit Index**:  
   This parameter, `unit_index`, identifies distinct units in the dataset. In our case, each unit corresponds to one simulation run, identified by the `episode` column.

- **Time Column**:  
   The time column (`timestep_column`) establishes the temporal sequence of data points, enabling the model to learn from historical trends. The `lookback_timesteps` parameter sets how many previous timesteps the RL agent considers when making a decision; this notebook uses 8.

- **Reward Column**:  
   The `reward_column` specifies the target that the agent seeks to optimize. Here, we use the `reward` column computed in the previous section, which reflects the balance between energy efficiency and occupant comfort.

- **State Columns**:  
   The state columns capture the system's current status, including variables that influence energy consumption and comfort levels. Relevant examples include:  
   - `outdoor_dry_bulb_temperature`  
   - `occupant_count`  
   - `day_type`  

   These features help the agent assess the current environment and predict outcomes effectively.

- **Action Columns**:  
   The action columns represent controllable variables, such as the intensity of the cooling device (`cooling_device`). While this example focuses on a single action, `pi_optimal` supports multiple simultaneous actions if needed.

By carefully defining these parameters, we ensure that the RL agent can interpret the dataset's structure, learn from past patterns, and make optimized decisions to reduce energy usage while maintaining comfort.
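
The interplay of `unit_index`, `timestep_column`, and `lookback_timesteps` can be illustrated with a small sketch. It groups rows by unit, orders them by time, and cuts sliding windows of fixed length. `build_lookback_windows` is a hypothetical helper written for this illustration and the toy rows are made up; `pi_optimal` may construct its windows differently (for example, with padding).

```python
from collections import defaultdict

def build_lookback_windows(rows, unit_key, time_key, lookback):
    """Group rows by unit, sort by time, and return sliding windows of `lookback` rows."""
    by_unit = defaultdict(list)
    for row in rows:
        by_unit[row[unit_key]].append(row)

    windows = []
    for unit, unit_rows in by_unit.items():
        unit_rows.sort(key=lambda r: r[time_key])
        # Windows never cross unit boundaries.
        for end in range(lookback, len(unit_rows) + 1):
            windows.append((unit, unit_rows[end - lookback:end]))
    return windows

# Toy data: two episodes with four steps each (values are made up).
rows = [
    {"episode": ep, "step": step, "cooling_device": 0.1 * step}
    for ep in (0, 1)
    for step in range(4)
]
windows = build_lookback_windows(rows, "episode", "step", lookback=3)
for unit, window in windows:
    print(unit, [r["step"] for r in window])
```

Each window contains rows from a single episode only, which is why the unit index has to be declared separately from the time column.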

### Configuration

In [6]:
import pi_optimal as po

LOOKBACK_TIMESTEPS = 8
historical_dataset = po.datasets.timeseries_dataset.TimeseriesDataset(
    df=df_historical_building_energy_consumption,
    lookback_timesteps=LOOKBACK_TIMESTEPS,
    unit_index='episode',
    timestep_column='step',
    reward_column='reward',
    state_columns=[
        'day_type', 'hour', 'outdoor_dry_bulb_temperature',
        'indoor_dry_bulb_temperature', 'occupant_count',
        'net_electricity_consumption',
        'indoor_dry_bulb_temperature_cooling_set_point',
    ],
    action_columns=['cooling_device'],
)

---

## Training the Agent

With the dataset prepared, we can initialize and train the RL agent using `pi_optimal`.

### Agent Initialization

In [7]:
from pi_optimal.agents.agent import Agent

# Initialize the agent
agent = Agent(dataset=historical_dataset,
                type="mpc-continuous") # MPC agent for continuous actions; the horizon is set when calling predict()

### Training the Agent

In [8]:
agent.train()

100%|██████████| 8/8 [00:01<00:00,  4.00it/s]
100%|██████████| 8/8 [00:01<00:00,  4.33it/s]


---

## Evaluating and Predicting Actions

After training the RL agent, the next step is to apply it to new, unseen data. This involves loading the current building energy consumption data, preparing it with the same preprocessing pipeline as the historical dataset, and then using the trained agent to predict the optimal actions (here, the cooling intensity) that reduce energy use while maintaining the desired temperature.

### Load Current Data

In [9]:
import pandas as pd
import pi_optimal as po

# Load the current building energy consumption data
df_current_building_energy_consumption = pd.read_csv('data/current_temperature_control.csv')

# Apply the reward calculation
df_current_building_energy_consumption["reward"] = df_current_building_energy_consumption.apply(calculate_reward, axis=1)

### Create Current Dataset

In [10]:
current_dataset = po.datasets.timeseries_dataset.TimeseriesDataset(
    df=df_current_building_energy_consumption,
    lookback_timesteps=LOOKBACK_TIMESTEPS,
    dataset_config=historical_dataset.dataset_config,  # reuse the historical configuration
    train_processors=False,
    is_inference=True,
)
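
Passing the historical `dataset_config` together with `train_processors=False` suggests that the preprocessing fitted on the historical data is reused rather than refit on the current data. The sketch below illustrates why that matters, using a hypothetical one-column min-max scaler and made-up temperatures. It does not reproduce the processors `pi_optimal` uses internally.

```python
class MinMaxScaler1D:
    """Hypothetical one-column scaler; not a pi_optimal class."""

    def fit(self, values):
        self.low = min(values)
        self.high = max(values)
        return self

    def transform(self, values):
        span = self.high - self.low
        if span == 0:
            return [0.0 for _ in values]
        return [(v - self.low) / span for v in values]

# Made-up outdoor temperatures (°C).
historical = [18.0, 22.0, 26.0, 30.0]
current = [24.0, 27.0]

# Fit once on historical data, then reuse the fitted scaler at inference time.
scaler = MinMaxScaler1D().fit(historical)
print(scaler.transform(current))

# Refitting on the current data maps the same values to a different scale.
refit = MinMaxScaler1D().fit(current)
print(refit.transform(current))
```

A model trained on inputs scaled with the historical range would misread inputs scaled with the refit range, so the fitted preprocessing has to travel with the model.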

### Predict Optimal Actions

In [11]:
best_actions = agent.predict(current_dataset, 
                             inverse_transform=True, 
                             horizon=24)

Iteration: 1, Top-100 Cost: 0.248 (Cost: 0.6362, Uncertainty: 0.3638)
Iteration: 2, Top-100 Cost: 0.1983 (Cost: 0.6331, Uncertainty: 0.3669)
Iteration: 3, Top-100 Cost: 0.2004 (Cost: 0.6133, Uncertainty: 0.3867)
Iteration: 4, Top-100 Cost: 0.2507 (Cost: 0.5117, Uncertainty: 0.4883)
Iteration: 5, Top-100 Cost: 0.239 (Cost: 0.5877, Uncertainty: 0.4123)
Iteration: 6, Top-100 Cost: 0.2338 (Cost: 0.6045, Uncertainty: 0.3955)
Iteration: 7, Top-100 Cost: 0.2402 (Cost: 0.5902, Uncertainty: 0.4098)
Iteration: 8, Top-100 Cost: 0.2276 (Cost: 0.4637, Uncertainty: 0.5363)
Iteration: 9, Top-100 Cost: 0.2178 (Cost: 0.5816, Uncertainty: 0.4184)
Iteration: 10, Top-100 Cost: 0.247 (Cost: 0.5695, Uncertainty: 0.4305)

---

## Interpreting the Results

The agent provides a sequence of optimal actions (cooling intensities) for the time horizon.

In [12]:
for timestep, action in enumerate(best_actions):
    print(f"Timestep {timestep}:")
    print("Cooling device strength:", round(action[0], 2))
    print()
    print("--------------------")
    print()

Timestep 0:
Cooling device strength: 0.26

--------------------

Timestep 1:
Cooling device strength: 0.56

--------------------

Timestep 2:
Cooling device strength: 0.52

--------------------

Timestep 3:
Cooling device strength: 0.56

--------------------

Timestep 4:
Cooling device strength: 0.69

--------------------

Timestep 5:
Cooling device strength: 0.76

--------------------

Timestep 6:
Cooling device strength: 0.61

--------------------

Timestep 7:
Cooling device strength: 0.74

--------------------

Timestep 8:
Cooling device strength: 0.3

--------------------

Timestep 9:
Cooling device strength: 0.12

--------------------

Timestep 10:
Cooling device strength: 0.12

--------------------

Timestep 11:
Cooling device strength: 0.22

--------------------

Timestep 12:
Cooling device strength: 0.21

--------------------

Timestep 13:
Cooling device strength: 0.15

--------------------

Timestep 14:
Cooling device strength: 0.19

--------------------

[output truncated]
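
A plan like this is easier to act on once it is summarized. The sketch below computes the mean intensity and the timestep with the highest intensity for the values printed above (timesteps 0 to 14 only). `summarize_plan` is an illustrative helper, not a `pi_optimal` function.

```python
# Values copied from the printed output above (timesteps 0 to 14).
plan = [0.26, 0.56, 0.52, 0.56, 0.69, 0.76, 0.61, 0.74,
        0.30, 0.12, 0.12, 0.22, 0.21, 0.15, 0.19]

def summarize_plan(actions):
    """Return the mean intensity and the timestep with the highest intensity."""
    mean_intensity = sum(actions) / len(actions)
    peak_step = max(range(len(actions)), key=lambda i: actions[i])
    return mean_intensity, peak_step

mean_intensity, peak_step = summarize_plan(plan)
print(f"Mean intensity: {mean_intensity:.2f}")
print(f"Peak at timestep {peak_step} ({plan[peak_step]:.2f})")
```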

### Multi-Step Planning

The agent optimizes actions by considering their predicted outcomes over a multi-step horizon (24 timesteps in the prediction call above), so each action is chosen for its effect on later timesteps as well as the current one.
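
To make the idea of planning over a horizon concrete, here is a minimal sketch of one simple approach, random shooting: sample many candidate action sequences, simulate each with a model, and keep the cheapest. The room model, its coefficients, and the function names are assumptions made for this illustration. This is not the optimizer that `pi_optimal` implements.

```python
import random

DESIRED_TEMP = 22.0  # °C, same target as the reward function

def simulate(temp, outdoor, actions):
    """Hypothetical room model: drifts toward outdoor temperature, cooled by the device."""
    cost = 0.0
    for a in actions:
        temp = temp + 0.1 * (outdoor - temp) - 1.5 * a
        energy = 5.0 * a  # made-up kWh per step at full intensity
        cost += (temp - DESIRED_TEMP) ** 2 + 0.001 * energy
    return cost

def plan_random_shooting(temp, outdoor, horizon, n_candidates=500, seed=0):
    """Sample random action sequences and keep the one with the lowest simulated cost."""
    rng = random.Random(seed)
    best_actions, best_cost = None, float("inf")
    for _ in range(n_candidates):
        candidate = [rng.random() for _ in range(horizon)]
        cost = simulate(temp, outdoor, candidate)
        if cost < best_cost:
            best_actions, best_cost = candidate, cost
    return best_actions, best_cost

actions, cost = plan_random_shooting(temp=24.0, outdoor=30.0, horizon=6)
print([round(a, 2) for a in actions], round(cost, 3))
```

The cost here mirrors the notebook's reward with the sign flipped, so minimizing the simulated cost corresponds to maximizing the reward.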

### Decision-Making Options

1. **Full Application of Recommended Actions**: We could apply the whole recommended sequence, adjusting the cooling intensity at each timestep according to the agent's plan for the entire horizon (here, 24 timesteps). The building control system then operates on the agent's full plan without re-planning.

2. **Incremental Application**: Alternatively, we could apply only the first action in the sequence, then re-run the agent at the next timestep with updated data to obtain new recommendations. This approach adapts to real-time conditions while still using the agent's ability to look multiple steps ahead.
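
The incremental option is often called receding-horizon control. The loop below sketches it with a stand-in planner and a hypothetical one-step room model; all coefficients and outdoor temperatures are made up. In this notebook's setting, the call to `plan` would be replaced by rebuilding the current dataset from the latest observations and calling `agent.predict`.

```python
DESIRED_TEMP = 22.0  # °C

def step_room(temp, outdoor, action):
    """Hypothetical one-step room model (made-up coefficients)."""
    return temp + 0.1 * (outdoor - temp) - 1.5 * action

def plan(temp, outdoor, horizon):
    """Stand-in planner: per step, pick the grid action that lands closest to the target."""
    grid = [i / 20 for i in range(21)]  # 0.0, 0.05, ..., 1.0
    actions = []
    for _ in range(horizon):
        best = min(grid, key=lambda a: abs(step_room(temp, outdoor, a) - DESIRED_TEMP))
        actions.append(best)
        temp = step_room(temp, outdoor, best)
    return actions

# Receding-horizon loop: re-plan every step, apply only the first action.
temp = 24.0
outdoor_measurements = [30.0, 30.0, 31.0, 32.0, 31.0, 29.0]  # made-up values
for hour, outdoor in enumerate(outdoor_measurements):
    first_action = plan(temp, outdoor, horizon=4)[0]
    temp = step_room(temp, outdoor, first_action)
    print(f"hour {hour}: action={first_action:.2f}, indoor={temp:.2f}")
```

Because the plan is recomputed from the latest measurement at every step, changes in outdoor temperature are absorbed without waiting for the previous plan to finish.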

--- 

## Visualization

`pi_optimal` includes a **trajectory visualizer** for the simulated optimal trajectory. This tool allows you to explore the agent's recommendations and analyze their effects on energy consumption and indoor temperature over time. It provides valuable insights into the agent's behavior and helps evaluate its performance across various scenarios.

In [13]:
from pi_optimal.utils.trajectory_visualizer import TrajectoryVisualizer

trajectory_visualizer = TrajectoryVisualizer(agent, current_dataset, best_actions=best_actions, lookback_timesteps=LOOKBACK_TIMESTEPS)
trajectory_visualizer.display()



---

## Conclusion

This notebook demonstrates how `pi_optimal` can train an RL agent on historical data and use it to plan cooling actions that trade off **energy savings** against **occupant comfort**, which makes it a useful starting point for sustainable building management.


### Key Highlights

- **Efficient Energy Management**: The reward penalizes electricity usage, steering the agent toward lower consumption.
- **Comfort Maintenance**: The reward penalizes deviations from the desired indoor temperature.
- **Scalable and Adaptive**: Can be applied to various buildings with minimal configuration.

---

## Next Steps

1. **Fine-Tuning**:
   - Adjust reward function weights.
   - Experiment with different lookback horizons and agent types.

2. **Deployment**:
   - Integrate the RL agent into a real-time control system for live optimization.
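
For the fine-tuning step, one way to experiment with reward weights is to build the reward function from parameters. The sketch below assumes the same reward form as this notebook; `make_reward_fn`, `comfort_weight`, and `energy_weight` are illustrative names, and the alternative weight of 0.1 is an arbitrary example.

```python
def make_reward_fn(desired_temp=22.0, comfort_weight=1.0, energy_weight=0.001):
    """Build a row-wise reward function with adjustable weights."""
    def reward(row):
        temp_penalty = (row["indoor_dry_bulb_temperature"] - desired_temp) ** 2
        energy_cost = row["net_electricity_consumption"]
        return -(comfort_weight * temp_penalty + energy_weight * energy_cost)
    return reward

# First row of the historical dataset shown earlier.
row = {"indoor_dry_bulb_temperature": 23.098652, "net_electricity_consumption": 0.677881}

default_reward = make_reward_fn()
energy_focused = make_reward_fn(energy_weight=0.1)  # hypothetical heavier energy weight

print(round(default_reward(row), 6))
print(round(energy_focused(row), 6))
```

With the default arguments this reproduces the reward used in this notebook. Because the returned function only indexes the row by column name, it can also be passed to `DataFrame.apply(..., axis=1)` in place of `calculate_reward`.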

## References

- `pi_optimal` Documentation: [GitHub](https://github.com/pi-optimal/pi_optimal)