# DRL usage example

This notebook uses Stable Baselines 3 (SB3) to train and load an agent. Sinergym itself is agnostic to the DRL algorithm: it provides custom callbacks for SB3 specifically, but it can be used with any DRL library that works with the Gymnasium interface.

## Training a model

We rely on the `train_agent.py` script available in the repository root. This script exercises all the features Sinergym offers for working with deep reinforcement learning algorithms and exposes them as parameters, so the training options can be defined in a JSON file passed to the script.

For more information about how to run `train_agent.py`, see [Train a model](https://ugr-sail.github.io/sinergym/compilation/main/pages/deep-reinforcement-learning.html#train-a-model).

In [1]:
import sys
from datetime import datetime

import gymnasium as gym
import numpy as np
import wandb
from stable_baselines3 import DQN
from stable_baselines3.common.callbacks import CallbackList
from stable_baselines3.common.logger import HumanOutputFormat
from stable_baselines3.common.logger import Logger as SB3Logger
from stable_baselines3.common.monitor import Monitor

import sinergym
import sinergym.utils.gcloud as gcloud
from sinergym.utils.callbacks import *
from sinergym.utils.constants import *
from sinergym.utils.logger import CSVLogger, WandBOutputFormat
from sinergym.utils.rewards import *
from sinergym.utils.wrappers import *

First let's define some variables for the execution.

In [2]:
# Environment ID
environment = "Eplus-5zone-mixed-discrete-stochastic-v1"
# Training episodes
episodes = 5
# Name of the experiment
experiment_date = datetime.today().strftime('%Y-%m-%d_%H:%M')
experiment_name = 'SB3_DQN-' + environment + \
    '-episodes-' + str(episodes)
experiment_name += '_' + experiment_date
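
The naming scheme above can be wrapped in a small helper. This is only a sketch: `build_experiment_name` is a hypothetical function, not part of Sinergym. It writes the time as `%H-%M` instead of `%H:%M`, an assumption made here because the experiment name ends up in the working directory path and colons are not valid in Windows file names.

```python
from datetime import datetime


def build_experiment_name(algorithm, environment, episodes, when=None):
    """Return '<algorithm>-<environment>-episodes-<n>_<timestamp>'.

    Hypothetical helper for illustration only.
    """
    when = when or datetime.today()
    # '-' instead of ':' keeps the name usable as a directory name everywhere
    stamp = when.strftime('%Y-%m-%d_%H-%M')
    return f'{algorithm}-{environment}-episodes-{episodes}_{stamp}'


name = build_experiment_name(
    'SB3_DQN',
    'Eplus-5zone-mixed-discrete-stochastic-v1',
    5,
    when=datetime(2023, 11, 8, 16, 4))
print(name)
```

Passing `when` explicitly makes the name reproducible, which helps when the same name must be reused later, for example in an evaluation run.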

We can combine these experiment executions with [Weights&Biases](https://ugr-sail.github.io/sinergym/compilation/main/pages/deep-reinforcement-learning.html#weights-and-biases-structure) to host all the information they produce. With *wandb*, it is possible to track and visualize the whole DRL training process in real time, register the hyperparameters and details of each experiment, save artifacts such as models and *sinergym* output, and compare different executions.

In [3]:
# Create wandb.config object in order to log all experiment params
experiment_params = {
    'sinergym-version': sinergym.__version__,
    'python-version': sys.version
}
experiment_params.update({'environment':environment,
                          'episodes':episodes,
                          'algorithm':'SB3_DQN'})

# Get wandb init params (you have to specify your own project and entity)
wandb_params = {"project": 'sinergym',
                "entity": 'alex_ugr'}
# Init wandb entry
run = wandb.init(
    name=experiment_name + '_' + wandb.util.generate_id(),
    config=experiment_params,
    ** wandb_params
)

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Currently logged in as: alex_ugr. Use `wandb login --relogin` to force relogin

Now we are ready to create the Gymnasium environment. We use the environment ID defined above; remember that you can [change default environment configuration](https://ugr-sail.github.io/sinergym/compilation/main/pages/notebooks/change_environment.html#Changing-an-environment-registered-in-Sinergym). We also create an `eval_env` to run the evaluation episodes. We can overwrite the environment name with the experiment name if we want.

In [4]:
env = gym.make(environment, env_name=experiment_name)
eval_env = gym.make(environment, env_name=experiment_name+'_EVALUATION')

[ENVIRONMENT] (INFO) : Creating Gymnasium environment... [SB3_DQN-Eplus-5zone-mixed-discrete-stochastic-v1-episodes-5_2023-11-08_16:04]
[MODELING] (INFO) : Experiment working directory created [/workspaces/sinergym/examples/Eplus-env-SB3_DQN-Eplus-5zone-mixed-discrete-stochastic-v1-episodes-5_2023-11-08_16:04-res1]
[MODELING] (INFO) : runperiod established: {'start_day': 1, 'start_month': 1, 'start_year': 1991, 'end_day': 31, 'end_month': 12, 'end_year': 1991, 'start_weekday': 1, 'n_steps_per_hour': 4}
[MODELING] (INFO) : Episode length (seconds): 31536000.0
[MODELING] (INFO) : timestep size (seconds): 900.0
[MODELING] (INFO) : timesteps per episode: 35040
[MODELING] (INFO) : Model Config is correct.
[ENVIRONMENT] (INFO) : Environment SB3_DQN-Eplus-5zone-mixed-discrete-stochastic-v1-episodes-5_2023-11-08_16:04 created successfully.
[WRAPPER DiscretizeEnv] (INFO) : New Discrete Space

We can also add wrappers to the environment. Here we use an action normalization wrapper and a logger (both extensions of `gym.Wrapper`). Normalization is highly recommended for DRL algorithms with continuous action spaces, and the logger monitors the interactions with the environment and saves the data to CSV files. The generated files will be stored as an artifact in *wandb* too.
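
To make the idea of action normalization concrete, the sketch below shows the kind of linear rescaling such a wrapper performs for a continuous action space. It assumes the agent acts in `[-1, 1]` and the simulator expects values in `[low, high]`; the `denormalize` function and the setpoint ranges are illustrative assumptions, not Sinergym's actual `NormalizeAction` implementation.

```python
def denormalize(action, low, high):
    """Map each component of `action` from [-1, 1] to [low[i], high[i]]."""
    return [lo + (a + 1.0) * 0.5 * (hi - lo)
            for a, lo, hi in zip(action, low, high)]


# Hypothetical (heating, cooling) setpoint ranges in degrees Celsius
low = [15.0, 22.5]
high = [22.5, 30.0]

lower_upper = denormalize([-1.0, 1.0], low, high)
midpoint = denormalize([0.0, 0.0], low, high)
print(lower_upper, midpoint)
```

With this mapping the agent always works with actions of the same scale, whatever the physical range of each setpoint.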

In [5]:
env = NormalizeAction(env)
env = LoggerWrapper(env)
eval_env = NormalizeAction(eval_env)
eval_env = LoggerWrapper(eval_env)

[WRAPPER LoggerWrapper] (INFO) : Wrapper initialized.
[WRAPPER LoggerWrapper] (INFO) : Wrapper initialized.


At this point, we have the environment set up and ready to be used. We are going to create our learning model (Stable Baselines 3 DQN), but we can use any other algorithm.

In [6]:
model = DQN('MlpPolicy', env, verbose=1)

Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.


During evaluation, the current model is executed for a fixed number of episodes to decide whether it is the best version of the model seen so far in the training. The generated output will be stored on the *wandb* server too.
We use `LoggerEvalCallback` to print and save the best model evaluated during training.

In [7]:
callbacks = []

# Set up Evaluation and saving best model
eval_callback = LoggerEvalCallback(
    eval_env,
    best_model_save_path=eval_env.get_wrapper_attr('workspace_path') +
    '/best_model/',
    log_path=eval_env.get_wrapper_attr('workspace_path') +
    '/best_model/',
    eval_freq=(eval_env.get_wrapper_attr('timestep_per_episode') - 1) * 2 - 1,
    deterministic=True,
    render=False,
    n_eval_episodes=1)
callbacks.append(eval_callback)
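
The `eval_freq` expression above is easier to read with numbers plugged in. The sketch below assumes 35040 timesteps per episode (the value reported in the environment log), 5 training episodes, and that an evaluation is triggered whenever the number of training steps is a multiple of `eval_freq`. `evaluation_steps` is a hypothetical helper, not part of Sinergym or SB3.

```python
def evaluation_steps(timestep_per_episode, episodes, every_n_episodes=2):
    """Return (eval_freq, training steps at which an evaluation runs)."""
    # Same arithmetic as the eval_freq argument in the cell above
    eval_freq = (timestep_per_episode - 1) * every_n_episodes - 1
    # Total training budget used later in this notebook
    total_timesteps = episodes * (timestep_per_episode - 1)
    steps = list(range(eval_freq, total_timesteps + 1, eval_freq))
    return eval_freq, steps


eval_freq, steps = evaluation_steps(35040, 5)
print(eval_freq, steps)
```

With these numbers an evaluation runs roughly every two training episodes, so a 5-episode training triggers two evaluations, the first of them at step 70077.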


To track the whole training process in *wandb*, we need to give the SB3 logger a wandb-compatible output format (which calls the *wandb* log method during the learning process) and add a logging callback.

In [8]:
# wandb logger and setting in SB3
logger = SB3Logger(
    folder=None,
    output_formats=[
        HumanOutputFormat(
            sys.stdout,
            max_length=120),
        WandBOutputFormat()])
model.set_logger(logger)
# Append callback
log_callback = LoggerCallback()
callbacks.append(log_callback)


callback = CallbackList(callbacks)
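
The logger configured above forwards every record to several output formats at once (console and *wandb*). The sketch below illustrates that fan-out pattern with toy classes; `PrintWriter`, `DictWriter` and `FanOutLogger` are hypothetical stand-ins written for this example, not the SB3 or Sinergym classes.

```python
class PrintWriter:
    """Formats key/value pairs as text lines (stand-in for a console format)."""

    def __init__(self):
        self.lines = []

    def write(self, key_values, step):
        for key, value in sorted(key_values.items()):
            self.lines.append(f'step={step} {key}={value}')


class DictWriter:
    """Stores raw records (stand-in for a remote experiment tracker)."""

    def __init__(self):
        self.records = []

    def write(self, key_values, step):
        self.records.append({'step': step, **key_values})


class FanOutLogger:
    """Collects values and sends the same batch to every output format."""

    def __init__(self, output_formats):
        self.output_formats = output_formats
        self.pending = {}

    def record(self, key, value):
        self.pending[key] = value

    def dump(self, step):
        for fmt in self.output_formats:
            fmt.write(dict(self.pending), step)
        self.pending.clear()


console, tracker = PrintWriter(), DictWriter()
toy_logger = FanOutLogger([console, tracker])
toy_logger.record('episode/cumulative_reward', -1.5)
toy_logger.record('episode/episode_length', 10)
toy_logger.dump(step=10)
print(console.lines, tracker.records)
```

Adding a new destination for the metrics then only requires another object with a `write` method, which is the role the wandb-compatible format plays in the cell above.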

This is the number of total time steps for the training.

In [9]:
timesteps = episodes * (env.get_wrapper_attr('timestep_per_episode') - 1)
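
As a worked example of this budget, the values printed in the environment log (an episode of 31536000 seconds with 900-second timesteps) can be plugged in directly. The variable names below are chosen for this sketch only.

```python
episode_seconds = 31_536_000  # one simulated year, from the environment log
timestep_seconds = 900        # 15 minutes, i.e. 4 steps per hour
timestep_per_episode = episode_seconds // timestep_seconds

episodes = 5
# Same expression as in the cell above
timesteps = episodes * (timestep_per_episode - 1)
print(timestep_per_episode, timesteps)
```

This gives 35040 timesteps per episode and a total training budget of 175195 timesteps.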

Now it is time to train the model with the callbacks defined earlier. This may take a few minutes, depending on your computer.

:warning: The warning messages that appear in the `model.learn()` output occur because Stable Baselines 3 has not yet been adapted to the new standard for getting environment attributes.

In [10]:
model.learn(
    total_timesteps=timesteps,
    callback=callback,
    log_interval=1)

#----------------------------------------------------------------------------------------------#
[ENVIRONMENT] (INFO) : Starting a new episode... [SB3_DQN-Eplus-5zone-mixed-discrete-stochastic-v1-episodes-5_2023-11-08_16:04] [Episode 1]
#----------------------------------------------------------------------------------------------#
[MODELING] (INFO) : Episode directory created [/workspaces/sinergym/examples/Eplus-env-SB3_DQN-Eplus-5zone-mixed-discrete-stochastic-v1-episodes-5_2023-11-08_16:04-res1/Eplus-env-sub_run1]
[MODELING] (INFO) : Weather file USA_NY_New.York-J.F.Kennedy.Intl.AP.744860_TMY3.epw used.
[MODELING] (INFO) : Updated building model with whole Output:Variable available names
[MODELING] (INFO) : Updated building model with whole Output:Meter available names
[MODELING] (INFO) : Adapting weather to building model. [USA_NY_New.York-J.F.Kennedy.Intl.AP.744860_TMY3.epw]
[ENVIRONMENT] (INFO) : Savi

[SIMULATOR] (INFO) : Running EnergyPlus with args: ['-w', '/workspaces/sinergym/examples/Eplus-env-SB3_DQN-Eplus-5zone-mixed-discrete-stochastic-v1-episodes-5_2023-11-08_16:04-res1/Eplus-env-sub_run1/USA_NY_New.York-J.F.Kennedy.Intl.AP.744860_TMY3_Random_1.0_0.0_0.001.epw', '-d', '/workspaces/sinergym/examples/Eplus-env-SB3_DQN-Eplus-5zone-mixed-discrete-stochastic-v1-episodes-5_2023-11-08_16:04-res1/Eplus-env-sub_run1/output', '/workspaces/sinergym/examples/Eplus-env-SB3_DQN-Eplus-5zone-mixed-discrete-stochastic-v1-episodes-5_2023-11-08_16:04-res1/Eplus-env-sub_run1/5ZoneAutoDXVAV.epJSON']
[ENVIRONMENT] (INFO) : Episode 1 started.
[SIMULATOR] (INFO) : handlers initialized.
[SIMULATOR] (INFO) : handlers are ready.
[SIMULATOR] (INFO) : System is ready.
[WRAPPER LoggerWrapper] (INFO) : Creating monitor.csv for current episode (episode 1) if logger is active
------------------------------------------------------------



------------------------------------------------
| action_network/                 |            |
|    index                        | 7          |
| action_simulation/              |            |
|    Cooling_Setpoint_RL          | 23         |
|    Heating_Setpoint_RL          | 22         |
| observation/                    |            |
|    HVAC_electricity_demand_rate | 2765.6453  |
|    air_humidity                 | 12.314985  |
|    air_temperature              | 19.344555  |
|    clg_setpoint                 | 22.5       |
|    co2_emission                 | 0.0        |
|    day_of_month                 | 6.0        |
|    diffuse_solar_radiation      | 0.0        |
|    direct_solar_radiation       | 0.0        |
|    hour                         | 4.0        |
|    htg_setpoint                 | 22.0       |
|    month                        | 1.0        |
|    outdoor_humidity             | 67.0       |
|  

[SIMULATOR] (INFO) : handlers are ready.
[SIMULATOR] (INFO) : System is ready.
[WRAPPER LoggerWrapper] (INFO) : Creating monitor.csv for current episode (episode 2) if logger is active
-----------------------------------------------------
| action_network/                     |             |
|    index                            | 2           |
| action_simulation/                  |             |
|    Cooling_Setpoint_RL              | 28          |
|    Heating_Setpoint_RL              | 17          |
| episode/                            |             |
|    comfort_violation_time(%)        | 51.2        |
|    cumulative_comfort_penalty       | -1.66e+04   |
|    cumulative_energy_penalty        | -1.76e+03   |
|    cumulative_power                 | 3.51e+07    |
|    cumulative_reward                | -18307.732  |
|    cumulative_temperature_violation | 3.31e+04    |
|    episode_length                   | 35040       |
|    mean_comfort_penal



-------------------------------------------------
| action_network/                 |             |
|    index                        | 9           |
| action_simulation/              |             |
|    Cooling_Setpoint_RL          | 22.5        |
|    Heating_Setpoint_RL          | 21          |
| observation/                    |             |
|    HVAC_electricity_demand_rate | 1332.968    |
|    air_humidity                 | 12.466696   |
|    air_temperature              | 17.640285   |
|    clg_setpoint                 | 29.0        |
|    co2_emission                 | 0.0         |
|    day_of_month                 | 6.0         |
|    diffuse_solar_radiation      | 0.0         |
|    direct_solar_radiation       | 0.0         |
|    hour                         | 19.0        |
|    htg_setpoint                 | 16.0        |
|    month                        | 1.0         |
|    outdoor_humidity             | 54.0        |
|    outdoor_temperature          | 0.092068404 |




-----------------------------------------------
| action_network/                 |           |
|    index                        | 6         |
| action_simulation/              |           |
|    Cooling_Setpoint_RL          | 24        |
|    Heating_Setpoint_RL          | 21        |
| observation/                    |           |
|    HVAC_electricity_demand_rate | 0.0       |
|    air_humidity                 | 63.23831  |
|    air_temperature              | 30.472967 |
|    clg_setpoint                 | 22.5      |
|    co2_emission                 | 0.0       |
|    day_of_month                 | 18.0      |
|    diffuse_solar_radiation      | 0.0       |
|    direct_solar_radiation       | 0.0       |
|    hour                         | 18.0      |
|    htg_setpoint                 | 22.0      |
|    month                        | 8.0       |
|    outdoor_humidity             | 60.0      |
|    outdoor_temperature          | 25.364153 |
|    people_occupant              | 0.0 

[SIMULATOR] (INFO) : handlers initialized.
[SIMULATOR] (INFO) : handlers are ready.
[SIMULATOR] (INFO) : System is ready.
[WRAPPER LoggerWrapper] (INFO) : Creating monitor.csv for current episode (episode 1) if logger is active
Progress: |**-------------------------------------------------------------------------------------------------| 2%

Progress: |***************************************************************************************************| 99%
[WRAPPER LoggerWrapper] (INFO) : End of episode, recording summary (progress.csv) if logger is active
[ENVIRONMENT] (INFO) : Environment closed. [SB3_DQN-Eplus-5zone-mixed-discrete-stochastic-v1-episodes-5_2023-11-08_16:04_EVALUATION]
#----------------------------------------------------------------------------------------------#
[ENVIRONMENT] (INFO) : Starting a new episode... [SB3_DQN-Eplus-5zone-mixed-discrete-stochastic-v1-episodes-5_2023-11-08_16:04] [Episode 3]
#----------------------------------------------------------------------------------------------#
[MODELING] (INFO) : Episode directory created [/workspaces/sinergym/examples/Eplus-env-SB3_DQN-Eplus-5zone-mixed-discrete-stochastic-v1-episodes-5_2023-11-08_16:04-res1/Eplus-env-sub_run3]
[MODELING] (INFO) : Weather file USA_NY_New.York-J.F.Kennedy.Intl.AP.7

[WRAPPER LoggerWrapper] (INFO) : Creating monitor.csv for current episode (episode 3) if logger is active
Eval num_timesteps=70077, episode_reward=-30730.39 +/- 0.00
Episode length: 35040.00 +/- 0.00
----------------------------------------------------
| action_network/                     |            |
|    index                            | 1          |
| action_simulation/                  |            |
|    Cooling_Setpoint_RL              | 29         |
|    Heating_Setpoint_RL              | 16         |
| eval/                               |            |
|    comfort_violation(%)             | 68.1       |
|    cumulative_comfort_penalty       | -2.9e+04   |
|    cumulative_energy_penalty        | -1.71e+03  |
|    cumulative_power_consumption     | 3.42e+07   |
|    cumulative_reward                | -3.07e+04  |
|    cumulative_temperature_violation | 5.8e+04    |
|    episode_length                   | 3.5e+04    |
|    mean_comfort_penalty             | -0.828



-----------------------------------------------
| action_network/                 |           |
|    index                        | 6         |
| action_simulation/              |           |
|    Cooling_Setpoint_RL          | 24        |
|    Heating_Setpoint_RL          | 21        |
| observation/                    |           |
|    HVAC_electricity_demand_rate | 244.03078 |
|    air_humidity                 | 28.87472  |
|    air_temperature              | 20.998764 |
|    clg_setpoint                 | 24.0      |
|    co2_emission                 | 0.0       |
|    day_of_month                 | 3.0       |
|    diffuse_solar_radiation      | 1.0       |
|    direct_solar_radiation       | 0.0       |
|    hour                         | 7.0       |
|    htg_setpoint                 | 21.0      |
|    month                        | 1.0       |
|    outdoor_humidity             | 92.5      |
|    outdoor_temperat

[SIMULATOR] (INFO) : handlers are ready.
[SIMULATOR] (INFO) : System is ready.
[WRAPPER LoggerWrapper] (INFO) : Creating monitor.csv for current episode (episode 4) if logger is active
-----------------------------------------------------
| action_network/                     |             |
|    index                            | 7           |
| action_simulation/                  |             |
|    Cooling_Setpoint_RL              | 23          |
|    Heating_Setpoint_RL              | 22          |
| episode/                            |             |
|    comfort_violation_time(%)        | 52.1        |
|    cumulative_comfort_penalty       | -3.59e+04   |
|    cumulative_energy_penalty        | -3.55e+03   |
|    cumulative_power                 | 7.11e+07    |
|    cumulative_reward                | -39423.22   |
|    cumulative_temperature_violation | 7.17e+04    |
|    episode_length                   | 70077       |
|    mean_comfort_penal



------------------------------------------------
| action_network/                 |            |
|    index                        | 2          |
| action_simulation/              |            |
|    Cooling_Setpoint_RL          | 28         |
|    Heating_Setpoint_RL          | 17         |
| observation/                    |            |
|    HVAC_electricity_demand_rate | 1846.887   |
|    air_humidity                 | 19.366085  |
|    air_temperature              | 18.090607  |
|    clg_setpoint                 | 29.0       |
|    co2_emission                 | 0.0        |
|    day_of_month                 | 2.0        |
|    diffuse_solar_radiation      | 0.0        |
|    direct_solar_radiation       | 0.0        |
|    hour                         | 21.0       |
|    htg_setpoint                 | 16.0       |
|    month                        | 1.0        |
|    outdoor_humidity             | 29.0       |
|    outdoor_temperature          | -2.4775262 |
|    people_occupant



-----------------------------------------------
| action_network/                 |           |
|    index                        | 1         |
| action_simulation/              |           |
|    Cooling_Setpoint_RL          | 29        |
|    Heating_Setpoint_RL          | 16        |
| observation/                    |           |
|    HVAC_electricity_demand_rate | 0.0       |
|    air_humidity                 | 89.8657   |
|    air_temperature              | 21.306032 |
|    clg_setpoint                 | 29.0      |
|    co2_emission                 | 0.0       |
|    day_of_month                 | 7.0       |
|    diffuse_solar_radiation      | 0.0       |
|    direct_solar_radiation       | 0.0       |
|    hour                         | 2.0       |
|    htg_setpoint                 | 16.0      |
|    month                        | 6.0       |
|    outdoor_humidity             | 100.0     |
|    outdoor_tempera

[SIMULATOR] (INFO) : handlers are ready.
[SIMULATOR] (INFO) : System is ready.
[WRAPPER LoggerWrapper] (INFO) : Creating monitor.csv for current episode (episode 2) if logger is active
Progress: |**-------------------------------------------------------------------------------------------------| 2%

Progress: |***************************************************************************************************| 99%
[WRAPPER LoggerWrapper] (INFO) : End of episode, recording summary (progress.csv) if logger is active
[ENVIRONMENT] (INFO) : Environment closed. [SB3_DQN-Eplus-5zone-mixed-discrete-stochastic-v1-episodes-5_2023-11-08_16:04_EVALUATION]
#----------------------------------------------------------------------------------------------#
[ENVIRONMENT] (INFO) : Starting a new episode... [SB3_DQN-Eplus-5zone-mixed-discrete-stochastic-v1-episodes-5_2023-11-08_16:04] [Episode 5]
#----------------------------------------------------------------------------------------------#
[MODELING] (INFO) : Episode directory created [/workspaces/sinergym/examples/Eplus-env-SB3_DQN-Eplus-5zone-mixed-discrete-stochastic-v1-episodes-5_2023-11-08_16:04-res1/Eplus-env-sub_run5]
[MODELING] (INFO) : Weather file USA_NY_New.York-J.F.Kennedy.Intl.AP.7

New best mean reward!
[SIMULATOR] (INFO) : handlers are ready.
[SIMULATOR] (INFO) : System is ready.
------------------------------------------------
| action_network/                 |            |
|    index                        | 6          |
| action_simulation/              |            |
|    Cooling_Setpoint_RL          | 24         |
|    Heating_Setpoint_RL          | 21         |
| observation/                    |            |
|    HVAC_electricity_demand_rate | 1553.6073  |
|    air_humidity                 | 14.101513  |
|    air_temperature              | 22.53461   |
|    clg_setpoint                 | 24.0       |
|    co2_emission                 | 0.0        |
|    day_of_month                 | 1.0        |
|    diffuse_solar_radiation      | 189.5      |
|    direct_solar_radiation       | 229.5      |
|    hour                         | 11.0       |
|    htg_setpoint                 | 21.0       |
|    month                        | 1.0   

[SIMULATOR] (INFO) : handlers are ready.
[SIMULATOR] (INFO) : System is ready.
[WRAPPER LoggerWrapper] (INFO) : Creating monitor.csv for current episode (episode 6) if logger is active
-----------------------------------------------------
| action_network/                     |             |
|    index                            | 8           |
| action_simulation/                  |             |
|    Cooling_Setpoint_RL              | 22.5        |
|    Heating_Setpoint_RL              | 22          |
| episode/                            |             |
|    comfort_violation_time(%)        | 52.4        |
|    cumulative_comfort_penalty       | -3.67e+04   |
|    cumulative_energy_penalty        | -3.53e+03   |
|    cumulative_power                 | 7.07e+07    |
|    cumulative_reward                | -40212.746  |
|    cumulative_temperature_violation | 7.34e+04    |
|    episode_length                   | 70077       |
|    mean_comfort_penal



<stable_baselines3.dqn.dqn.DQN at 0x7ff4fe8400d0>

Now, we save the current model (model version when training has finished).

In [11]:
# Save inside the experiment workspace so the model is uploaded with the artifact
model.save(env.get_wrapper_attr('workspace_path') + '/' + experiment_name)



And as always, remember to close the environment.

In [12]:
env.close()

[WRAPPER LoggerWrapper] (INFO) : End of episode, recording summary (progress.csv) if logger is active
Progress: |***************************************************************************************************| 99%
[ENVIRONMENT] (INFO) : Environment closed. [SB3_DQN-Eplus-5zone-mixed-discrete-stochastic-v1-episodes-5_2023-11-08_16:04]


We have to upload all the Sinergym output as a *wandb* artifact. This output includes the whole sinergym_output (with the LoggerWrapper CSV files) and the models generated in the training and evaluation episodes.

In [14]:
artifact = wandb.Artifact(
    name="experiment1",
    type="training")
artifact.add_dir(
    env.get_wrapper_attr('workspace_path'),
    name='training_output/')
artifact.add_dir(
    eval_env.get_wrapper_attr('workspace_path'),
    name='evaluation_output/')
run.log_artifact(artifact)

# wandb has finished
run.finish()

wandb: Adding directory to artifact (/workspaces/sinergym/examples/Eplus-env-SB3_DQN-Eplus-5zone-mixed-discrete-stochastic-v1-episodes-5_2023-11-08_16:04-res1)... Done. 0.1s
wandb: Adding directory to artifact (/workspaces/sinergym/examples/Eplus-env-SB3_DQN-Eplus-5zone-mixed-discrete-stochastic-v1-episodes-5_2023-11-08_16:04_EVALUATION-res1)... Done. 0.0s


Run history:
action_network/index,▂▆▂▇▃▃▇▃▃▆▃▇▃▅▆▃▆▇▁▃▃▃▅▇▃▇▂▂▅▄▂▅▁▇▂█▆▅▂▇
action_simulation/Cooling_Setpoint_RL,▇▁▇▁▆▆▁▆▅▂▅▁▆▃▁▆▁▁█▆▅▅▃▁▅▁▇▇▃▄▇▃█▁▇▁▂▃▇▁
action_simulation/Heating_Setpoint_RL,▂█▂█▃▃█▃▄▇▄█▃▆█▃██▁▃▄▄▆█▄█▂▂▆▅▂▆▁█▂▇▇▆▂█
episode/comfort_violation_time(%),▁▆█
episode/cumulative_comfort_penalty,█▁▁
episode/cumulative_energy_penalty,█▁▁
episode/cumulative_power,▁██
episode/cumulative_reward,█▁▁
episode/cumulative_temperature_violation,▁██
episode/episode_length,▁██

Run summary:
action_network/index,8.0
action_simulation/Cooling_Setpoint_RL,22.5
action_simulation/Heating_Setpoint_RL,22.0
episode/comfort_violation_time(%),52.41948
episode/cumulative_comfort_penalty,-36679.72382
episode/cumulative_energy_penalty,-3533.02357
episode/cumulative_power,70660471.30403
episode/cumulative_reward,-40212.74609
episode/cumulative_temperature_violation,73359.44764
episode/episode_length,70077.0


We have all the experiment results on our local computer, but we can see the execution in *wandb* too:


- If we check our projects, we can see the run listed:

![wandb_projects1](https://github.com/ugr-sail/sinergym/blob/main/images/wandb_projects1.png?raw=true)


- Hyperparameters tracked in the training experiment:

![wandb_training_hyperparameters](https://github.com/ugr-sail/sinergym/blob/main/images/wandb_training_hyperparameters.png?raw=true)



- Artifacts registered (if evaluation is enabled, best model is registered too):
  
![wandb_training_artifact](https://github.com/ugr-sail/sinergym/blob/main/images/wandb_training_artifact.png?raw=true)



- Visualization of metrics in real time:
  
![wandb_training_charts](https://github.com/ugr-sail/sinergym/blob/main/images/wandb_training_charts.png?raw=true)

## Loading a model

We rely on the `load_agent.py` script available in the repository root. This script exercises all the features Sinergym offers for working with loaded deep reinforcement learning models and exposes them as parameters, so the load options can be defined in a JSON file passed to the script.

For more information about how to run `load_agent.py`, see [Load a trained model](https://ugr-sail.github.io/sinergym/compilation/main/pages/deep-reinforcement-learning.html#load-a-trained-model).

First, we define the Sinergym environment ID where we want to check the loaded agent and the name of the evaluation experiment.

In [15]:
# Environment ID
environment = "Eplus-5zone-mixed-discrete-stochastic-v1"
# Episodes
episodes=5
# Evaluation name
evaluation_date = datetime.today().strftime('%Y-%m-%d_%H:%M')
evaluation_name = 'SB3_DQN-EVAL-' + environment + \
    '-episodes-' + str(episodes)
evaluation_name += '_' + evaluation_date

We can also use *wandb* here. We can register this evaluation of a loaded model in a different project so that it is not mixed with the training experiments.

In [16]:

# Create wandb.config object in order to log all experiment params
experiment_params = {
    'sinergym-version': sinergym.__version__,
    'python-version': sys.version
}
experiment_params.update({'environment':environment,
                          'episodes':episodes,
                          'algorithm':'SB3_DQN'})

# Get wandb init params (you have to specify your own project and entity)
wandb_params = {"project": 'sinergym_evaluations',
                "entity": 'alex_ugr'}
# Init wandb entry
run = wandb.init(
    name=evaluation_name + '_' + wandb.util.generate_id(),
    config=experiment_params,
    ** wandb_params
)

We make the Gymnasium environment. It is **important to wrap the environment with the same wrappers as used in training**. We can use the evaluation experiment name to rename the environment.
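
One way to keep training and evaluation environments consistent is to build both through a single factory that applies the same wrapper list. The sketch below uses toy classes to show the pattern; `ToyEnv`, the `Toy*` wrappers, `make_wrapped` and `wrapper_chain` are hypothetical names for this example and do not exist in Sinergym or Gymnasium.

```python
class ToyEnv:
    """Stand-in for a base environment."""

    def __init__(self, name):
        self.name = name


class ToyWrapper:
    """Stand-in for a wrapper: it only keeps a reference to the inner env."""

    def __init__(self, env):
        self.env = env


class ToyNormalizeAction(ToyWrapper):
    pass


class ToyLogger(ToyWrapper):
    pass


# Single source of truth for the wrapper order
WRAPPERS = [ToyNormalizeAction, ToyLogger]


def make_wrapped(name, wrappers=WRAPPERS):
    """Create an environment and apply every wrapper in order."""
    env = ToyEnv(name)
    for wrapper in wrappers:
        env = wrapper(env)
    return env


def wrapper_chain(env):
    """List wrapper class names from the outermost to the innermost."""
    chain = []
    while isinstance(env, ToyWrapper):
        chain.append(type(env).__name__)
        env = env.env
    return chain


train_chain = wrapper_chain(make_wrapped('training'))
eval_chain = wrapper_chain(make_wrapped('evaluation'))
print(train_chain, eval_chain)
```

Because both environments come from the same list, a wrapper added for training cannot be forgotten at evaluation time.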

In [20]:
env = gym.make(environment, env_name=evaluation_name)
env = NormalizeAction(env)
env = LoggerWrapper(env)

[ENVIRONMENT] (INFO) : Creating Gymnasium environment... [SB3_DQN-EVAL-Eplus-5zone-mixed-discrete-stochastic-v1-episodes-5_2023-11-08_16:09]
[MODELING] (INFO) : Experiment working directory created [/workspaces/sinergym/examples/Eplus-env-SB3_DQN-EVAL-Eplus-5zone-mixed-discrete-stochastic-v1-episodes-5_2023-11-08_16:09-res35042]
[MODELING] (INFO) : runperiod established: {'start_day': 1, 'start_month': 1, 'start_year': 1991, 'end_day': 31, 'end_month': 12, 'end_year': 1991, 'start_weekday': 1, 'n_steps_per_hour': 4}
[MODELING] (INFO) : Episode length (seconds): 31536000.0
[MODELING] (INFO) : timestep size (seconds): 900.0
[MODELING] (INFO) : timesteps per episode: 35040
[MODELING] (INFO) : Model Config is correct.
[ENVIRONMENT] (INFO) : Environment SB3_DQN-EVAL-Eplus-5zone-mixed-discrete-stochastic-v1-episodes-5_2023-11-08_16:09 created successfully.
[WRAPPER DiscretizeEnv] (INFO) :

We load the Stable Baselines 3 DQN model from a *wandb* artifact registered by a previous training experiment, although we could equally load a model stored on our local computer.

In [21]:
# get wandb artifact path (to load model)
load_artifact_entity = 'alex_ugr'
load_artifact_project = 'sinergym'
load_artifact_name = 'experiment1'
load_artifact_tag = 'latest'
load_artifact_model_path = 'evaluation_output/best_model/model.zip'
wandb_path = load_artifact_entity + '/' + load_artifact_project + \
    '/' + load_artifact_name + ':' + load_artifact_tag
# Download artifact
artifact = run.use_artifact(wandb_path)
artifact.get_path(load_artifact_model_path).download('.')
# Set model path to local wandb file downloaded
model_path = './' + load_artifact_model_path
model = DQN.load(model_path)
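
The artifact reference built above follows the pattern `entity/project/name:tag`. A small helper makes that structure explicit; `artifact_reference` is a hypothetical function written for this sketch, and the values are the ones used in this notebook.

```python
def artifact_reference(entity, project, name, tag='latest'):
    """Build a reference of the form 'entity/project/name:tag'."""
    return f'{entity}/{project}/{name}:{tag}'


reference = artifact_reference('alex_ugr', 'sinergym', 'experiment1')
print(reference)
```

Changing only `entity` or `project` is enough to point the evaluation at a model trained elsewhere, provided the artifact is accessible.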

As we can see, The *wandb* model we want to load can come from an artifact of an different entity or project from the one we are using to register the evaluation of the loaded model, as long as it is accessible.
The next step is to use the model to predict actions and interact with the environment in order to collect data for evaluating the model.

In [22]:
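for i in range(episodes):
    obs, info = env.reset()
    rewards = []
    # gymnasium signals the end of an episode through either flag
    terminated = truncated = False
    current_month = 0
    while not (terminated or truncated):
        # Ask the loaded model for an action and apply it to the environment
        a, _ = model.predict(obs)
        obs, reward, terminated, truncated, info = env.step(a)
        rewards.append(reward)
        # Print the cumulative reward whenever the simulated month changes
        if info['month'] != current_month:
            current_month = info['month']
            print(info['month'], sum(rewards))
    print(
        'Episode ',
        i,
        'Mean reward: ',
        np.mean(rewards),
        'Cumulative reward: ',
        sum(rewards))
env.close()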

#----------------------------------------------------------------------------------------------#
[ENVIRONMENT] (INFO) : Starting a new episode... [SB3_DQN-EVAL-Eplus-5zone-mixed-discrete-stochastic-v1-episodes-5_2023-11-08_16:09] [Episode 1]
#----------------------------------------------------------------------------------------------#
[MODELING] (INFO) : Episode directory created [/workspaces/sinergym/examples/Eplus-env-SB3_DQN-EVAL-Eplus-5zone-mixed-discrete-stochastic-v1-episodes-5_2023-11-08_16:09-res35042/Eplus-env-sub_run1]
[MODELING] (INFO) : Weather file USA_NY_New.York-J.F.Kennedy.Intl.AP.744860_TMY3.epw used.
[MODELING] (INFO) : Updated building model with whole Output:Variable available names
[MODELING] (INFO) : Updated building model with whole Output:Meter available names
[MODELING] (INFO) : Adapting weather to building model. [USA_NY_New.York-J.F.Kennedy.Intl.AP.744860_TMY3.epw]
[ENVIRONMENT]
[SIMULATOR] (INFO) : handlers initialized.
[SIMULATOR] (INFO) : handlers are ready.
[SIMULATOR] (INFO) : System is ready.
[WRAPPER LoggerWrapper] (INFO) : Creating monitor.csv for current episode (episode 1) if logger is active
1 -2.844980790275461
2 -550.6520763960882
3 -1054.6999435408802
4 -1510.7431015254806
5 -3223.16888975272
6 -4585.64469015879
7 -5997.2605708685605
8 -7584.2880944282515
9 -9522.878708232074
10 -11216.00089309922

[SIMULATOR] (INFO) : handlers are ready.
[SIMULATOR] (INFO) : System is ready.
[WRAPPER LoggerWrapper] (INFO) : Creating monitor.csv for current episode (episode 2) if logger is active
1 -2.8104472134277922
2 -539.2949397939741
3 -1039.4194702407353
4 -1503.7581367997045
5 -3243.706450644304
6 -4619.775860369805
7 -6094.781084889675
8 -7713.525825840575
9 -9702.29296104565
10 -11387.396433434777

[SIMULATOR] (INFO) : handlers are ready.
[SIMULATOR] (INFO) : System is ready.
[WRAPPER LoggerWrapper] (INFO) : Creating monitor.csv for current episode (episode 3) if logger is active
1 -2.994138247738472
2 -545.1862521387377
3 -1050.7762821529848
4 -1505.9556171045572
5 -3213.180333154224
6 -4540.406334758327
7 -6027.943381677397
8 -7647.598745270801
9 -9558.387293790538
10 -11177.457467297521

[SIMULATOR] (INFO) : handlers are ready.
[SIMULATOR] (INFO) : System is ready.
[WRAPPER LoggerWrapper] (INFO) : Creating monitor.csv for current episode (episode 4) if logger is active
1 -2.827133392380849
2 -544.6484325178959
3 -1043.5171454928898
4 -1523.9624406341723
5 -3247.2828010221783
6 -4614.8694743127335
7 -6060.527800057454
8 -7637.374791970513
9 -9605.427071623892
10 -11286.00225321745

[SIMULATOR] (INFO) : handlers are ready.
[SIMULATOR] (INFO) : System is ready.
[WRAPPER LoggerWrapper] (INFO) : Creating monitor.csv for current episode (episode 5) if logger is active
1 -2.865010119360112
2 -544.6171484999464
3 -1057.5882779454844
4 -1509.6998200524527
5 -3204.972436991025
6 -4550.213946351334
7 -5996.278990256148
8 -7586.390169836351
9 -9481.853361773356
10 -11169.978948733207
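
The evaluation loop above needs a trained model and a working EnergyPlus installation. As a dependency-free sketch of the same interaction pattern, the example below runs one episode against a toy stand-in that mimics the gymnasium `reset`/`step` signatures. `ToyEnv`, `run_episode` and the toy reward are assumptions made for illustration; they are not part of Sinergym or Stable Baselines 3. The episode stops when either `terminated` or `truncated` is set, which is the usual convention for the gymnasium API.

```python
class ToyEnv:
    """Hypothetical stand-in exposing gymnasium-style reset() and step()."""

    def __init__(self, steps_per_episode: int = 12):
        self.steps_per_episode = steps_per_episode
        self._step = 0

    def reset(self):
        self._step = 0
        return 0.0, {}  # (observation, info)

    def step(self, action):
        self._step += 1
        reward = -float(action)  # toy reward, unrelated to Sinergym's reward functions
        terminated = False  # this toy task has no terminal state
        truncated = self._step >= self.steps_per_episode  # time limit reached
        return float(self._step), reward, terminated, truncated, {}


def run_episode(env, policy):
    """Run one episode; return (mean reward, cumulative reward, number of steps)."""
    obs, info = env.reset()
    rewards = []
    terminated = truncated = False
    while not (terminated or truncated):
        action = policy(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        rewards.append(reward)
    total = sum(rewards)
    return total / len(rewards), total, len(rewards)


# A constant policy stands in for model.predict
mean_reward, cumulative_reward, n_steps = run_episode(ToyEnv(), policy=lambda obs: 1)
print(mean_reward, cumulative_reward, n_steps)  # -1.0 -12.0 12
```

Wrapping the loop in a function that returns the summary values makes it easier to compare several episodes or models than printing inside the loop.
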

Finally, we register the evaluation data in wandb as an artifact to save it.

In [None]:
artifact = wandb.Artifact(
    name="evaluation1",
    type="evaluating")
artifact.add_dir(
    env.experiment_path,
    name='evaluation_output/')

run.log_artifact(artifact)

# wandb has finished
run.finish()

[34m[1mwandb[0m: Adding directory to artifact (/workspaces/sinergym/examples/Eplus-env-SB3_DQN-EVAL-Eplus-5zone-mixed-discrete-stochastic-v1-episodes-5_2023-08-03_13:36-res1)... Done. 0.1s


The results of the loaded model are stored on our local computer, but the execution can also be inspected in *wandb*:

- If we check the *wandb* project list, we can see that the sinergym_evaluations project has a new run:

![wandb_project2](https://github.com/ugr-sail/sinergym/blob/main/images/wandb_project2.png?raw=true)


- The hyperparameters are tracked in the evaluation experiment, and we can see the previous training artifact used to load the model:

![wandb_evaluating_hyperparameters](https://github.com/ugr-sail/sinergym/blob/main/images/wandb_evaluating_hyperparameters.png?raw=true)



- Artifact registered with Sinergym Output (and CSV files generated with the Logger Wrapper):
  
![wandb_evaluating_artifact](https://github.com/ugr-sail/sinergym/blob/main/images/wandb_evaluating_artifact.png?raw=true)