# DRL usage example

We are going to rely on the `DRL_battery.py` script available in the repository root. This script exposes everything Sinergym offers for working with deep reinforcement learning algorithms and parameterizes it, so that the training options can be defined when the script is executed.

.. note:: For more information about how to run `DRL_battery.py`, please see the [DRL documentation](https://ugr-sail.github.io/sinergym/compilation/html/pages/deep-reinforcement-learning.html#how-use).

In [17]:
import sinergym
from sinergym.utils.callbacks import LoggerEvalCallback
from sinergym.utils.rewards import LinearReward
from sinergym.utils.wrappers import LoggerWrapper
from datetime import datetime
import gym
from stable_baselines3 import DQN
from stable_baselines3.common.callbacks import CallbackList
from stable_baselines3.common.vec_env import DummyVecEnv

First let's define some strings and variables for the execution.

In [18]:
environment = "Eplus-demo-v1"
episodes = 1
experiment_date = datetime.today().strftime('%Y-%m-%d %H:%M')

# register run name
name = f"DQN-{environment}-episodes_{episodes}({experiment_date})"

Now we are ready to create the Gym environment. We pass the type of reward, in our case the `LinearReward` defined by Sinergym, together with some extra configuration parameters. You can define your own reward or use any of the others defined by Sinergym; have a look at :ref:`rewards` for more information.

In [19]:
extra_conf = {
    'timesteps_per_hour': 4,
    'runperiod': (1, 1, 1991, 2, 1, 1992),
}
env = gym.make(environment,
               reward=LinearReward,
               config_params=extra_conf)

[2023-04-08 00:21:04,389] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:Updating idf ExternalInterface object if it is not present...
[2023-04-08 00:21:04,390] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:Updating idf Site:Location and SizingPeriod:DesignDay(s) to weather and ddy file...
[2023-04-08 00:21:04,392] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:Updating idf OutPut:Variable and variables XML tree model for BVCTB connection.
[2023-04-08 00:21:04,392] E
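To give an idea of what a linear reward of this kind computes, here is a minimal, self-contained sketch. It is not Sinergym's `LinearReward` implementation: the function name, the parameter names, the default comfort range, and the default weights are all assumptions chosen for illustration. The idea is simply a weighted sum of an energy penalty and a comfort penalty.

```python
def linear_reward_sketch(power, temperature, comfort_range=(20.0, 23.5),
                         energy_weight=0.5, lambda_energy=1e-4,
                         lambda_temperature=1.0):
    """Hypothetical linear reward (illustrative names and defaults).

    Combines an energy penalty and a comfort penalty into one scalar.
    """
    low, high = comfort_range
    # Distance to the comfort range; zero while the temperature is inside it.
    if temperature < low:
        violation = low - temperature
    elif temperature > high:
        violation = temperature - high
    else:
        violation = 0.0
    energy_term = -lambda_energy * power
    comfort_term = -lambda_temperature * violation
    # energy_weight trades off consumption against comfort.
    return energy_weight * energy_term + (1.0 - energy_weight) * comfort_term


# Inside the comfort range only the energy term contributes.
print(linear_reward_sketch(power=1000.0, temperature=21.0))
# Outside the comfort range the comfort penalty is added.
print(linear_reward_sketch(power=1000.0, temperature=25.5))
```

Raising `energy_weight` in this sketch makes the agent care more about consumption and less about comfort, which is the kind of trade-off a custom reward lets you tune.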

We can also add a wrapper to the environment. We are going to use `LoggerWrapper` (an extension of `gym.Wrapper`), which monitors and logs the interactions with the environment and saves the data into a CSV file.

In [20]:
env = LoggerWrapper(env)
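The idea behind a logging wrapper can be sketched without any dependency on Gym or Sinergym. The classes below are hypothetical and are not the `LoggerWrapper` implementation; they assume an environment whose `step` returns the four-element tuple `(obs, reward, done, info)`, and they only show the pattern of intercepting each step and writing it to a CSV file.

```python
import csv


class CSVLoggerSketch:
    """Hypothetical wrapper that records every step of an environment."""

    def __init__(self, env, path):
        self.env = env
        self.path = path
        self.rows = []

    def reset(self):
        return self.env.reset()

    def step(self, action):
        # Forward the action, then keep a record of what happened.
        obs, reward, done, info = self.env.step(action)
        self.rows.append([len(self.rows), action, reward, done])
        return obs, reward, done, info

    def close(self):
        # Write all recorded interactions to a CSV file.
        with open(self.path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["step", "action", "reward", "done"])
            writer.writerows(self.rows)


class CountdownEnv:
    """Toy environment used only to exercise the wrapper."""

    def __init__(self, length=3):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.length
        return self.t, -float(action), done, {}
```

Wrapping `CountdownEnv()` in `CSVLoggerSketch`, stepping until `done` is true, and calling `close()` produces a CSV file with one header line and one line per step.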

At this point the environment is set up and ready to be used, so we can define and create our learning model. In this case it is going to be a DQN, but any other algorithm can be used (have a look at `DRL_battery.py` and read :ref:`Deep Reinforcement Learning Integration` for more detailed information on the available DRL algorithms).
Feel free to change the values of the model's attributes (or even the model itself) to see the differences.

In [21]:
model = DQN('MlpPolicy', env, verbose=1)

Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.


Now we need to calculate the number of timesteps in each episode, which is used to configure the evaluation.

In [22]:
n_timesteps_episode = env.simulator._eplus_one_epi_len / \
                      env.simulator._eplus_run_stepsize

print(n_timesteps_episode)


35232.0
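The printed value can be cross-checked by hand from the configuration used when the environment was created. The sketch below assumes that the `runperiod` tuple is ordered as (start day, start month, start year, end day, end month, end year) and that both end dates are included in the simulation; the helper name is hypothetical. Under those assumptions the run period covers 367 days, and 367 days × 24 hours × 4 timesteps per hour gives the same number as above.

```python
from datetime import date


def timesteps_per_episode(runperiod, timesteps_per_hour):
    """Hypothetical helper: episode length implied by a run period.

    Assumes runperiod = (start_day, start_month, start_year,
                         end_day, end_month, end_year), end date included.
    """
    start_day, start_month, start_year, end_day, end_month, end_year = runperiod
    start = date(start_year, start_month, start_day)
    end = date(end_year, end_month, end_day)
    n_days = (end - start).days + 1  # count both the first and the last day
    return n_days * 24 * timesteps_per_hour


print(timesteps_per_episode((1, 1, 1991, 2, 1, 1992), 4))  # 35232
```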


Now we need to create a vectorized wrapper for the environment, because the callbacks we are going to use require a vectorized environment.

In [23]:
env_vec = DummyVecEnv([lambda: env])

We are going to use `LoggerEvalCallback` to print and save the best model evaluated during training.

In [24]:
callbacks = []

# Set up Evaluation and saving best model
eval_callback = LoggerEvalCallback(
    env_vec,
    best_model_save_path='best_model/' + name + '/',
    log_path='best_model/' + name + '/',
    eval_freq=n_timesteps_episode * 2,
    deterministic=True,
    render=False,
    n_eval_episodes=2)
callbacks.append(eval_callback)

callback = CallbackList(callbacks)
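With `eval_freq` set to twice the episode length, an evaluation is due every two episodes' worth of steps. The sketch below illustrates that arithmetic only. It assumes the callback evaluates whenever its internal step counter is a multiple of `eval_freq`, with a single environment and the counter advancing once per step; how the counter behaves across repeated `learn` calls is not covered here. The function name is hypothetical.

```python
def evaluation_steps(total_timesteps, eval_freq):
    """Hypothetical helper: step counts at which an evaluation would trigger."""
    return list(range(eval_freq, total_timesteps + 1, eval_freq))


n_timesteps_episode = 35232          # value printed earlier in this example
eval_freq = n_timesteps_episode * 2  # evaluate every two episodes

# A short run never reaches the first evaluation point.
print(evaluation_steps(2000, eval_freq))
# A longer run reaches several evaluation points.
print(evaluation_steps(302000, eval_freq))
```

Under these assumptions, a run shorter than `eval_freq` steps finishes without triggering any evaluation.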

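The total number of training timesteps could be computed from the number of episodes as shown below. The line is left commented out because the training loop in the next cell sets `timesteps` itself.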

In [25]:
# timesteps = episodes * n_timesteps_episode

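Now it is time to train the model with the callbacks defined earlier. Each iteration of the loop trains for a larger number of timesteps and saves the resulting model, so the whole run can take a while, depending on your computer.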

In [26]:
for timesteps in range(2000, 303000, 10000):
    print("Learning timesteps: ", timesteps)
    env.reset()
    model.learn(
        total_timesteps=timesteps,
        callback=callback,
        log_interval=1)
    model.save('DQN_models/Demo-v1_timesteps_' + str(timesteps))

Learning timesteps:  2000
[2023-04-08 00:21:04,817] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:Creating new EnergyPlus simulation episode...
[2023-04-08 00:21:04,830] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:EnergyPlus working directory is in /workspaces/sinergym-475112771eb968ca05ea2e27c16fed29e37cb9fe/Eplus-env-demo-v1-res1/Eplus-env-sub_run1




[2023-04-08 00:21:10,046] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:EnergyPlus episode completed successfully. 
[2023-04-08 00:21:10,047] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:Creating new EnergyPlus simulation episode...
[2023-04-08 00:21:10,059] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:EnergyPlus working directory is in /workspaces/sinergym-475112771eb968ca05ea2e27c16fed29e37cb9fe/Eplus-env-demo-v1-res1/Eplus-env-sub_run2

Note that the training loop above already saves the model after each call to `model.learn`, using `model.save`.
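To get a feel for how much training the loop performs, the schedule produced by `range(2000, 303000, 10000)` can be inspected on its own. This is a rough sketch: it assumes each `model.learn` call runs for its full `total_timesteps`, and it reuses the episode length printed earlier.

```python
schedule = list(range(2000, 303000, 10000))
n_timesteps_episode = 35232  # value printed earlier in this example

total = sum(schedule)
print(len(schedule), schedule[0], schedule[-1])  # 31 2000 302000
print(total)                                     # 4712000
# Approximate number of simulated episodes covered by the whole loop.
print(round(total / n_timesteps_episode, 1))     # 133.7
```

Under that assumption the loop makes 31 calls to `learn` and saves 31 models, one per entry in the schedule.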

And as always, remember to close the environment.

In [27]:
env.close()

[2023-04-08 12:00:33,771] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:EnergyPlus simulation closed successfully. 
