
# QuickStart

Install the latest CityLearn version from PyPI with the `pip` command:

```
!pip install CityLearn
```

## Centralized RBC
Run the following to simulate an environment controlled by a centralized rule-based control (RBC) agent for a single episode:

```python
from citylearn.agents.rbc import BasicRBC as RBCAgent
from citylearn.citylearn import CityLearnEnv

dataset_name = 'citylearn_challenge_2022_phase_1'
env = CityLearnEnv(dataset_name, central_agent=True, simulation_end_time_step=1000)
model = RBCAgent(env)
model.learn(episodes=1)

# evaluate the cost functions (KPIs) at the end of the episode
kpis = model.env.evaluate()
kpis = kpis.pivot(index='cost_function', columns='name', values='value')
kpis = kpis.dropna(how='all')
display(kpis)
```



| cost_function | Building_1 | Building_2 | Building_3 | Building_4 | Building_5 | District |
|---|---|---|---|---|---|---|
| annual_normalized_unserved_energy_total | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| annual_peak_average | | | | | | 1.095048 |
| carbon_emissions_total | 1.10255 | 1.121634 | 1.160502 | 1.288958 | 1.15636 | 1.166001 |
| cost_total | 1.051644 | 1.049357 | 1.099417 | 1.238857 | 1.060611 | 1.099977 |
| daily_one_minus_load_factor_average | | | | | | 1.006957 |
| daily_peak_average | | | | | | 1.127111 |
| discomfort_delta_average | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| discomfort_delta_maximum | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| discomfort_delta_minimum | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| electricity_consumption_total | 1.15407 | 1.201625 | 1.221118 | 1.35171 | 1.251914 | 1.236088 |
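A rule-based controller applies fixed if-then rules instead of learning from data. The sketch below is a hypothetical hour-of-day rule for a single storage device, meant only to make the idea concrete. The hours, the action size of 0.1 and the sign convention (positive values charge, negative values discharge) are illustrative assumptions, not the rules `BasicRBC` actually uses.

```python
def rule_based_action(hour: int) -> float:
    """Map an hour of day (0-23) to a storage action in [-1, 1].

    Hypothetical rules for illustration only (not BasicRBC's rules):
    positive values charge storage, negative values discharge it.
    """
    if not 0 <= hour <= 23:
        raise ValueError(f"hour must be in 0..23, got {hour}")
    if hour < 6 or hour >= 22:
        return 0.1   # assumed off-peak night hours: charge a little each step
    if 17 <= hour <= 20:
        return -0.1  # assumed evening peak: discharge to shave the peak
    return 0.0       # all other hours: hold the current state of charge


# The same fixed rule is applied every day; nothing is learned from data.
schedule = {hour: rule_based_action(hour) for hour in range(24)}
print(schedule)
```

Because the rules never change, a single episode is enough to evaluate such a controller, which matches the `episodes=1` used in the RBC cell.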


## Decentralized-Independent SAC

Run the following to simulate an environment controlled by decentralized-independent Soft Actor-Critic (SAC) agents for two training episodes:

```python
from citylearn.agents.sac import SAC as RLAgent
from citylearn.citylearn import CityLearnEnv

dataset_name = 'citylearn_challenge_2022_phase_1'
env = CityLearnEnv(dataset_name, central_agent=False, simulation_end_time_step=1000)
model = RLAgent(env)
model.learn(episodes=2, deterministic_finish=True)

# print cost functions at the end of episode
kpis = model.env.evaluate()
kpis = kpis.pivot(index='cost_function', columns='name', values='value')
kpis = kpis.dropna(how='all')
display(kpis)
```



| cost_function | Building_1 | Building_2 | Building_3 | Building_4 | Building_5 | District |
|---|---|---|---|---|---|---|
| annual_normalized_unserved_energy_total | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| annual_peak_average | | | | | | 1.000224 |
| carbon_emissions_total | 1.004662 | 1.0 | 1.006444 | 1.00044 | 1.002564 | 1.002822 |
| cost_total | 1.004593 | 1.0 | 1.005644 | 1.000505 | 1.002343 | 1.002617 |
| daily_one_minus_load_factor_average | | | | | | 0.998608 |
| daily_peak_average | | | | | | 1.001827 |
| discomfort_delta_average | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| discomfort_delta_maximum | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| discomfort_delta_minimum | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| electricity_consumption_total | 1.004508 | 1.0 | 1.006431 | 1.000633 | 1.002679 | 1.00285 |
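The SAC cell above passes `deterministic_finish=True` to `learn()`, which, going by its name, makes the agents act deterministically at the end of training. In a generic SAC policy, training actions are sampled from a Gaussian and squashed through `tanh` into (-1, 1), while the deterministic action is simply `tanh` of the mean. The sketch below illustrates that distinction with hypothetical mean and log-standard-deviation values; it is a generic illustration, not CityLearn's SAC implementation.

```python
import math
import random


def sac_style_action(mean: float, log_std: float, deterministic: bool,
                     rng: random.Random) -> float:
    """Return a tanh-squashed Gaussian action in (-1, 1).

    Stochastic mode samples u ~ N(mean, exp(log_std)) and returns tanh(u);
    deterministic mode skips sampling and returns tanh(mean).
    """
    if deterministic:
        return math.tanh(mean)
    u = rng.gauss(mean, math.exp(log_std))
    return math.tanh(u)


rng = random.Random(0)          # fixed seed so the sketch is reproducible
mean, log_std = 0.3, -1.0       # hypothetical policy outputs for one state
train_actions = [sac_style_action(mean, log_std, False, rng) for _ in range(5)]
eval_action = sac_style_action(mean, log_std, True, rng)
print(train_actions)            # varies around tanh(0.3)
print(eval_action)              # always tanh(0.3)
```

Sampling keeps the agent exploring during training; switching to the mean removes that noise, so the evaluated KPIs reflect the learned policy rather than random exploration.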


## Decentralized-Cooperative MARLISA

Run the following to simulate an environment controlled by decentralized-cooperative MARLISA agents for two training episodes:

```python
from citylearn.agents.marlisa import MARLISA as RLAgent
from citylearn.citylearn import CityLearnEnv

dataset_name = 'citylearn_challenge_2022_phase_1'
env = CityLearnEnv(dataset_name, central_agent=False, simulation_end_time_step=1000)
model = RLAgent(env)
model.learn(episodes=2, deterministic_finish=True)

kpis = model.env.evaluate()
kpis = kpis.pivot(index='cost_function', columns='name', values='value')
kpis = kpis.dropna(how='all')
display(kpis)
```



| cost_function | Building_1 | Building_2 | Building_3 | Building_4 | Building_5 | District |
|---|---|---|---|---|---|---|
| annual_normalized_unserved_energy_total | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| annual_peak_average | | | | | | 1.000144 |
| carbon_emissions_total | 1.001658 | 1.000789 | 1.000947 | 1.001103 | 0.999957 | 1.00089 |
| cost_total | 1.002205 | 1.001202 | 1.000498 | 1.002034 | 0.99978 | 1.001144 |
| daily_one_minus_load_factor_average | | | | | | 0.999698 |
| daily_peak_average | | | | | | 1.000874 |
| discomfort_delta_average | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| discomfort_delta_maximum | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| discomfort_delta_minimum | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| electricity_consumption_total | 1.001355 | 1.000561 | 1.000642 | 1.000686 | 1.000097 | 1.000668 |
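Each evaluation cell above reshapes the output of `evaluate()` with `pivot` and `dropna(how='all')`. The pandas sketch below uses a small, made-up long-format frame with the same `cost_function`, `name` and `value` columns that those cells pivot on. `pivot` turns one row per (cost function, building) pair into one row per cost function and one column per building or district, and `dropna(how='all')` removes cost functions that are undefined everywhere. The KPI values are hypothetical.

```python
import pandas as pd

# Hypothetical long-format KPI frame: one row per (cost_function, name) pair.
# Values are made up; None becomes NaN (an undefined KPI for that column).
long_kpis = pd.DataFrame({
    'cost_function': ['cost_total'] * 3
    + ['daily_peak_average'] * 3
    + ['discomfort_proportion'] * 3,
    'name': ['Building_1', 'Building_2', 'District'] * 3,
    'value': [0.9, 1.1, 1.0, None, None, 0.95, None, None, None],
})

# Reshape: one row per cost function, one column per building/district.
wide = long_kpis.pivot(index='cost_function', columns='name', values='value')

# Drop cost functions that are NaN for every building and the district.
wide = wide.dropna(how='all')
print(wide)
```

After the pivot, `cost_function` becomes the index name and `name` the column-axis name, which is where those two labels in the printed KPI tables come from. Rows such as `daily_peak_average`, defined only at the district level, survive `dropna(how='all')` with empty building cells.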


## Stable Baselines3 Reinforcement Learning Algorithms

Install Stable Baselines3 (pinned to version 2.1.0) together with `shimmy` (pinned to version 0.2.1):

```
!pip install shimmy==0.2.1
!pip install stable-baselines3==2.1.0
```

Before the environment can be used with Stable Baselines3, it needs to be wrapped. First, wrap the environment with the `NormalizedObservationWrapper` (see [docs](https://www.citylearn.net/api/citylearn.wrappers.html#citylearn.wrappers.NormalizedObservationWrapper)) so that the observations served to the agent are min-max normalized to [0, 1] and cyclical observations, e.g., hour, are encoded using sine and cosine transformations.
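To show why cyclical observations get special treatment: as raw numbers, hour 23 and hour 0 are far apart, although they are adjacent in time. The sketch below shows plain min-max scaling and a sine/cosine encoding whose coordinates are then rescaled to [0, 1]. The bounds and periods are illustrative assumptions; the wrapper's actual bounds and transformations come from CityLearn, not from this sketch.

```python
import math


def min_max(x: float, low: float, high: float) -> float:
    """Linearly rescale x from [low, high] to [0, 1]."""
    if high == low:
        raise ValueError("high must differ from low")
    return (x - low) / (high - low)


def encode_cyclical(value: float, period: float) -> tuple[float, float]:
    """Place a periodic value on the unit circle, then rescale both
    coordinates from [-1, 1] to [0, 1]."""
    angle = 2 * math.pi * value / period
    return min_max(math.sin(angle), -1.0, 1.0), min_max(math.cos(angle), -1.0, 1.0)


# A non-periodic observation, e.g. a temperature with assumed bounds of 20-30.
print(min_max(25.0, 20.0, 30.0))  # 0.5

# Hour 23 and hour 0 are one step apart in time; the encoding keeps them close.
print(encode_cyclical(23, 24), encode_cyclical(0, 24))
```

Two coordinates are needed because sine alone maps different hours (e.g. 3 and 9) to the same value; the (sine, cosine) pair identifies each hour uniquely.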

Next, wrap it with the `StableBaselines3Wrapper` (see [docs](https://www.citylearn.net/api/citylearn.wrappers.html#citylearn.wrappers.StableBaselines3Wrapper)), which ensures that observations, actions and rewards are served in a manner compatible with the Stable Baselines3 interface.

The following Stable Baselines3 example uses the `baeda_3dem` dataset, which supports building temperature dynamics.

> ⚠️ **NOTE**: `central_agent` in the `env` must be `True` when using Stable Baselines3, as it does not support multi-agent environments.
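Stable Baselines3 trains a single policy that consumes one observation vector and emits one action vector per step. The sketch below illustrates the reshaping this requires, assuming CityLearn serves observations and expects actions as one list per agent, so that a central agent yields a list holding exactly one list. The function names and values are hypothetical, and this is not the `StableBaselines3Wrapper` source.

```python
def to_single_agent_observation(observations: list[list[float]]) -> list[float]:
    """Unwrap a per-agent observation list into one flat vector.

    Assumes the central-agent layout: a list holding exactly one list of floats.
    """
    if len(observations) != 1:
        raise ValueError(
            f"expected observations for 1 agent, got {len(observations)}; "
            "set central_agent=True"
        )
    return list(observations[0])


def to_per_agent_action(action: list[float]) -> list[list[float]]:
    """Wrap one flat action vector back into a single-agent list."""
    return [list(action)]


central = [[0.5, 0.2, 0.9]]            # hypothetical: one agent sees all buildings
decentralized = [[0.5], [0.2], [0.9]]  # hypothetical: one list per building agent

print(to_single_agent_observation(central))  # [0.5, 0.2, 0.9]
print(to_per_agent_action([0.1, -0.1]))      # [[0.1, -0.1]]
try:
    to_single_agent_observation(decentralized)
except ValueError as error:
    print(error)
```

With the decentralized layout there is no single vector to hand to one policy, which is the practical reason behind the `central_agent=True` requirement.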

```python
from stable_baselines3.sac import SAC
from citylearn.citylearn import CityLearnEnv
from citylearn.wrappers import NormalizedObservationWrapper, StableBaselines3Wrapper

dataset_name = 'baeda_3dem'
env = CityLearnEnv(dataset_name, central_agent=True, simulation_end_time_step=1000)
env = NormalizedObservationWrapper(env)
env = StableBaselines3Wrapper(env)
model = SAC('MlpPolicy', env)

# train for twice the number of simulated time steps
model.learn(total_timesteps=env.time_steps*2)

# evaluate: roll out the trained policy with deterministic actions for one episode
# (under the Gymnasium API, reset() returns (observations, info) and step() returns five values)
observations = env.reset()

while not env.done:
    actions, _ = model.predict(observations, deterministic=True)
    observations, _, _, _ = env.step(actions)

kpis = env.evaluate()
kpis = kpis.pivot(index='cost_function', columns='name', values='value')
kpis = kpis.dropna(how='all')
display(kpis)
```



| cost_function | Building_1 | Building_2 | Building_3 | Building_4 | District |
|---|---|---|---|---|---|
| annual_normalized_unserved_energy_total | 3.251227e-10 | -2.049072e-10 | 0.0 | 2.032326e-10 | 8.086204e-11 |
| annual_peak_average | | | | | 0.5829849 |
| cost_total | 1.03774 | 0.6016434 | 0.761491 | 0.4804734 | 0.7203371 |
| daily_one_minus_load_factor_average | | | | | 0.9960214 |
| daily_peak_average | | | | | 0.630954 |
| discomfort_delta_average | -0.3981834 | 0.7763432 | 3.866776 | 3.287224 | 1.88304 |
| discomfort_delta_maximum | 2.259375 | 2.658461 | 6.857704 | 8.107384 | 4.970731 |
| discomfort_delta_minimum | -3.941513 | -2.659611 | -3.700125 | -2.782747 | -3.270999 |
| discomfort_proportion | 0.2313725 | 0.4893993 | 0.874543 | 0.8301887 | 0.6063759 |
| discomfort_too_cold_proportion | 0.2156863 | 0.008833922 | 0.003654 | 0.004354136 | 0.0581321 |
