# Implementing EnergyPlus in Ray on Google Colab

Fine-tuning hyperparameters is an important part of any reinforcement learning project. It requires computing power, so this notebook sets up a Ray server on Google Colab in order to run an experiment with Ray Tune and Ray RLlib.

The notebook is organized as follows:

1. **Mounting Google Drive in Colab.** Drive stores the data generated during training.
2. **Installing EnergyPlus.** The learning environment uses [EnergyPlus](https://energyplus.net/) for building simulation.
3. **Installing libraries.** The libraries needed to run the experiment are installed on the virtual machine.
4. **Defining functions.** The functions that make up the reinforcement learning environment are defined.

  4.1. *Checking that the environment is serializable.* To distribute the algorithm across the server, the environment must be serializable. This step checks that it is.

5. **Configuring the algorithm.** The directories and the hyperparameters to tune in the experiment are configured.
6. **Running the experiment.** The configuration from the previous step is run with Ray Tune. This step sets the search and/or early-stopping algorithms, the number of runs, and other settings related to hyperparameter tuning.
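
Step 4.1 matters because Ray sends the environment to its workers in serialized form. The sketch below shows one way such a check could look. It uses only the standard-library `pickle` as a stand-in (Ray relies on cloudpickle, which generally accepts more objects, so this check errs on the strict side). The helper name `is_picklable` and the two sample dictionaries are illustrative assumptions, not part of the notebook.

```python
import pickle
import threading


def is_picklable(obj) -> bool:
    """Return True if `obj` survives a pickle round trip (hypothetical helper)."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except Exception:
        return False


# Plain data, such as an environment configuration dictionary, serializes fine.
plain_config = {"is_test": False, "episode": 0}
# A thread lock is a typical example of state that cannot be serialized.
config_with_lock = {"is_test": False, "lock": threading.Lock()}

print(is_picklable(plain_config))
print(is_picklable(config_with_lock))
```

The first call returns `True` and the second `False`, because a lock cannot be pickled. For a more detailed report, Ray also provides `ray.util.inspect_serializability`, which points at the member of an object that fails to serialize.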

## **1**. Mounting Google Drive in Colab

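You need the EnergyPlus `.sh` installer for Ubuntu 20.04. This is the operating system in Colab. You can download it from [this link](https://github.com/NREL/EnergyPlus/releases/download/v22.1.0/EnergyPlus-22.1.0-ed759b17ee-Linux-Ubuntu20.04-x86_64.sh). Note that the link points to version 22.1.0, whereas the installation cell in section 2 expects the 23.2.0 installer in the `MyDrive/ep_drive` folder of your Drive.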

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive
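
Before running the installation, it can help to confirm that the installer is actually in the mounted Drive. The following is a minimal sketch under stated assumptions: the helper `find_installer` and its glob pattern are hypothetical, and the folder is the `ep_drive` path used by the installation cell.

```python
from pathlib import Path
from typing import Optional


def find_installer(folder: str, pattern: str = "EnergyPlus-*-Linux-*.sh") -> Optional[Path]:
    """Return the first installer in `folder` that matches `pattern`, or None (hypothetical helper)."""
    directory = Path(folder)
    if not directory.is_dir():
        return None
    matches = sorted(directory.glob(pattern))
    return matches[0] if matches else None


# Folder taken from the installation cell of this notebook.
installer = find_installer("/content/drive/MyDrive/ep_drive")
print(installer if installer else "Installer not found: upload the EnergyPlus .sh file to Drive first.")
```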


## **2**. Install EnergyPlus in Colab Server

In [None]:
# Install EnergyPlus to "/usr/local/EnergyPlus-23-2-0"
!chmod +x /content/drive/MyDrive/ep_drive/EnergyPlus-23.2.0-7636e6b3e9-Linux-Ubuntu20.04-x86_64.sh
!sudo /content/drive/MyDrive/ep_drive/EnergyPlus-23.2.0-7636e6b3e9-Linux-Ubuntu20.04-x86_64.sh
# wurlitzer captures C-level stdout/stderr pipes in Python
!pip install wurlitzer
# Check the EnergyPlus installation
print('\n- Check EnergyPlus Version')
!energyplus --version
# Add the EnergyPlus folder to the Python import path (sys.path) so that pyenergyplus can be imported
import sys
sys.path.insert(0, '/usr/local/EnergyPlus-23-2-0')

EnergyPlus, Copyright (c) 1996-2023, The Board of Trustees of the University of Illinois, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from the U.S. Dept. of Energy), Oak Ridge National Laboratory, managed by UT-Battelle, Alliance for Sustainable Energy, LLC, and other contributors. All rights reserved.

NOTICE: This Software was developed under funding from the U.S. Department of Energy and the U.S. Government consequently retains certain rights. As such, the U.S. Government has been granted for itself and others acting on its behalf a paid-up, nonexclusive, irrevocable, worldwide license in the Software to reproduce, distribute copies to the public, prepare derivative works, and perform publicly and display publicly, and to permit others to do so.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

(1)
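
The installation cell above calls `sys.path.insert`, which affects Python imports (it is what later allows `from pyenergyplus.api import EnergyPlusAPI`), whereas the `PATH` environment variable is what subprocesses use to find executables. The sketch below sets both for a given folder. The helper `register_energyplus` is a hypothetical name introduced only for illustration.

```python
import os
import sys


def register_energyplus(install_dir: str) -> bool:
    """Expose an EnergyPlus folder to Python imports and to the shell (hypothetical helper)."""
    if not os.path.isdir(install_dir):
        return False
    # sys.path controls where `import pyenergyplus` is searched for.
    if install_dir not in sys.path:
        sys.path.insert(0, install_dir)
    # PATH controls where subprocesses look for the `energyplus` executable.
    entries = [entry for entry in os.environ.get("PATH", "").split(os.pathsep) if entry]
    if install_dir not in entries:
        os.environ["PATH"] = os.pathsep.join([install_dir] + entries)
    return True


# Installation folder used throughout this notebook.
print(register_energyplus("/usr/local/EnergyPlus-23-2-0"))
```

The function returns `False` without changing anything when the folder does not exist, which makes a failed installation visible early.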

## **3**. Install all the necessary libraries

In [None]:
# Install the libraries
!pip install python-multipart
!pip install kaleido
!pip install "ray[all]==2.9.1"
!pip install gymnasium==0.28.1
!pip install bayesian-optimization==1.4.3
!pip install tensorflow==2.15.0
!pip install torch==2.1.2

# Load the wurlitzer extension so that C-level output is shown in the notebook
%load_ext wurlitzer

Collecting kaleido
  Downloading kaleido-0.2.1-py2.py3-none-manylinux1_x86_64.whl (79.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 79.9/79.9 MB 8.6 MB/s eta 0:00:00
Installing collected packages: kaleido
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
lida 0.0.10 requires python-multipart, which is not installed.
Successfully installed kaleido-0.2.1
Collecting python-multipart
  Downloading python_multipart-0.0.9-py3-none-any.whl (22 kB)
Installing collected packages: python-multipart
Successfully installed python-multipart-0.0.9
The wurlitzer extension is already loaded. To reload it, use:
  %reload_ext wurlitzer
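
The pip output above reports a dependency conflict, so it can be worth confirming that the pinned versions are the ones actually installed. The sketch below does this with the standard-library `importlib.metadata`. The helper `check_pins` is a hypothetical name, and the pins are copied from the installation cell.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple


def check_pins(pins: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each mismatched package to (pinned version, installed version or None)."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed at all.
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Versions pinned in the installation cell of this notebook.
pins = {
    "ray": "2.9.1",
    "gymnasium": "0.28.1",
    "bayesian-optimization": "1.4.3",
    "tensorflow": "2.15.0",
    "torch": "2.1.2",
}
print(check_pins(pins))
```

An empty dictionary means that every pinned package is installed at the expected version.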


## **4**. Function definitions

An alternative is to clone a repository from GitHub.

In [None]:
import numpy as np
import os
import json
import gymnasium as gym
# Used to configure the action and observation spaces.
os.environ['RAY_PICKLE_VERBOSE_DEBUG'] = '2'
import ray
# To initialize Ray.
from ray import air, tune
# To configure the execution of the experiment.
from ray.tune import register_env
# To register the custom environment. RLlib is not compatible with the conventional Gym registration of
# custom environments.
from ray.rllib.algorithms.ppo.ppo import PPOConfig
# To configure the PPO algorithm.
from ray.rllib.algorithms.dqn.dqn import DQNConfig
# To configure the DQN algorithm.
from ray.rllib.algorithms.sac.sac import SACConfig
# To configure the SAC algorithm.
from ray.tune.schedulers import ASHAScheduler
# Early-stopping scheduler used while tuning the hyperparameters.
from ray.tune.search.bayesopt import BayesOptSearch
# Search algorithm used to tune the hyperparameters.
from ray.tune.search import Repeater
# Tool to evaluate multiple seeds for one hyperparameter configuration.
#from env.VENT_ep_gym_env import EnergyPlusEnv_v0
# The EnergyPlus environment configuration. It defines the reward function
# and the execution flow of the MDP.

import sys
import threading
from time import sleep
from typing import Any, Dict, List, Optional
from queue import Empty, Full, Queue
# Used to split the execution into two threads and to communicate EnergyPlus with this environment.


os_platform = sys.platform
if os_platform == "linux":
    sys.path.insert(0, '/usr/local/EnergyPlus-23-2-0')
else:
    sys.path.insert(0, 'C:/EnergyPlusV23-2-0')

from pyenergyplus.api import EnergyPlusAPI
api = EnergyPlusAPI()

"""Action spaces for different agents that operate devices in a centralized way.
"""

def natural_ventilation_action(central_action: int):
    """Translate a centralized action into the individual actions of the two windows.

    Args:
        central_action (int): Index of the centralized action, from 0 to 3.

    Returns:
        list[int]: Pair `[action1, action2]` with the binary action of each window.
    """
    action_space = [
        [0,0],
        [0,1],
        [1,0],
        [1,1]
    ]
    return action_space[central_action]

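def natural_ventilation_central_action(action1: int, action2: int):
    """Translate the individual actions of the two windows into the centralized action.

    Args:
        action1 (int): Binary action (0 or 1) of the first window.
        action2 (int): Binary action (0 or 1) of the second window.

    Returns:
        int: Index of the centralized action, from 0 to 3.

    Raises:
        ValueError: If the pair `[action1, action2]` is not in the action space.
    """
    action_space = [
        [0,0],
        [0,1],
        [1,0],
        [1,1]
    ]
    # list.index raises ValueError when the pair is not a valid combination,
    # instead of leaving the result undefined.
    return action_space.index([action1, action2])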

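class TwoWindowsCentralizedControl:
    """Centralized control of two windows, each one with a binary action."""
    def __init__(self):
        self.action_space = [
            [0,0],
            [0,1],
            [1,0],
            [1,1]
        ]

    def natural_ventilation_action(self, central_action: int):
        """Translate a centralized action into the individual actions of the two windows.

        Args:
            central_action (int): Index of the centralized action, from 0 to 3.

        Returns:
            list[int]: Pair `[action1, action2]` with the binary action of each window.
        """
        return self.action_space[central_action]

    def natural_ventilation_central_action(self, action1: int, action2: int):
        """Translate the individual actions of the two windows into the centralized action.

        Args:
            action1 (int): Binary action (0 or 1) of the first window.
            action2 (int): Binary action (0 or 1) of the second window.

        Returns:
            int: Index of the centralized action, from 0 to 3.

        Raises:
            ValueError: If the pair `[action1, action2]` is not in the action space.
        """
        # list.index raises ValueError when the pair is not a valid combination.
        return self.action_space.index([action1, action2])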

"""Utilities that involve the weather.
"""
import pandas as pd
from pandas.core.frame import DataFrame
import numpy as np

def weather_file(env_config: dict, weather_choice: Optional[int] = None):
    """Select a random or a specific weather file path, together with its latitude, longitude, and altitude,
    from the training options, or return the path used for evaluation.

    Args:
        env_config (dict): Environment configuration with the 'weather_folder' path and the 'is_test' flag.
        weather_choice (int, optional): Index (0 to 2) of the training weather file to use. Defaults to None,
            in which case a file is drawn at random on every call.

    Returns:
        tuple[str, float, float, int]: Return a tuple with the epw path and the respective values for latitude, longitude, and altitude.
    """
    folder_path = env_config['weather_folder']
    if not env_config['is_test']:
        if weather_choice is None:
            # Draw here rather than in the signature: a default argument is evaluated only once,
            # when the function is defined, so it would select the same file on every call.
            weather_choice = np.random.randint(0,3)
        weather_path = [
            ['GEF_Lujan_de_cuyo-hour-H1',-32.985,-68.93,1043],
            ['GEF_Lujan_de_cuyo-hour-H2',-32.985,-68.93,1043],
            ['GEF_Lujan_de_cuyo-hour-H3',-32.985,-68.93,1043],
        ]
        latitud = weather_path[weather_choice][1]
        longitud = weather_path[weather_choice][2]
        altitud = weather_path[weather_choice][3]
        return folder_path+'/'+weather_path[weather_choice][0]+'.epw', latitud, longitud, altitud
    else:
        return folder_path+'/GEF_Lujan_de_cuyo-hour-H4.epw', -32.985,-68.93,1043

class Probabilities:
    def __init__(
        self,
        env_config:dict
    ):
        """This class provides methods to calculate the weather probabilities during training based on the 'epw' weather file.

        Args:
            env_config (dict): Environment configuration with the 'epw' path element.

        Example:
        ```
        >>> from tools.weather_utils import Probabilities, weather_file
        >>> env_config={
                'weather_folder': 'C:/Users/grhen/Documents/GitHub/natural_ventilation_EP_RLlib/epw/GEF',
                'is_test': False,
            }
        >>> env_config['epw'], _, _, _ = weather_file(env_config)
        >>> prob = Probabilities(env_config)
        >>> julian_day = 215
        >>> predictions = prob.ten_days_predictions(julian_day)
        ```
        """
        self.env_config = env_config

        with open(self.env_config["epw"]) as file:
            self.weather_file: DataFrame = pd.read_csv(
                file,
                header = None,
                skiprows = 8
            )
            # Read the epw weather file, skipping its 8 header rows.
        self.ten_rows_added = False
        # Flag that records whether the first ten days have been appended to the end of the file.
        self.complement_10_days()

    def complement_10_days(self):
        """Append the first ten days of the year to the end of the weather file, so that ten-day predictions
        are also available for the last days of December. Each day has 24 hourly rows, so 240 rows are added.
        """
        primeras_10_filas = self.weather_file.head(240)
        # Get the first 240 rows of the weather file.
        self.weather_file = pd.concat([self.weather_file, primeras_10_filas], ignore_index=True)
        # Append the rows to the same weather file.
        self.ten_rows_added = True
        # Set the flag to True.


    # Step 1: filter the data for the given julian day and the following 9 days
    def julian_day_filter(self, dia_juliano: int):
        """Filter the weather data by julian day `n`, building a boolean NumPy array that is True for the
        rows in the range `[n, n+9]`, both inclusive.

        Args:
            dia_juliano (int): First julian day of the filtered range.

        Returns:
            np.ndarray: Boolean mask with one value per row of the weather file.
        """
        if self.ten_rows_added:
            # Extended file. Assuming a standard 8760-hour year, the file has 8760 + 240 rows, fewer than
            # 9240, so the modulo leaves the index unchanged and the appended rows are numbered as days 366 to 375.
            dias_julianos = ((self.weather_file.index % 9240) // 24 + 1)
        else:
            # File without the extra rows: the julian day runs from 1 to 365.
            dias_julianos = (self.weather_file.index % 8760) // 24 + 1
        # Check whether each julian day is within the desired range and return the mask.
        return dias_julianos.isin(range(dia_juliano, dia_juliano + 10))

    def ten_days_predictions(self, julian_day: int):
        """Calculate the predictions of the six variables listed below. Each value is drawn from a normal
        distribution centered on the weather file value, with the standard deviation given for the variable.

            Dry Bulb Temperature in °C with a deviation of 1 °C,
            Relative Humidity in % with a deviation of 10%,
            Wind Direction in degrees with a deviation of 20°,
            Wind Speed in m/s with a deviation of 0.5 m/s,
            Total Sky Cover in % with a deviation of 10%,
            Liquid Precipitation Depth in mm with a deviation of 0.2 mm.

        Args:
            julian_day (int): First julian day of the ten-day prediction range.

        Returns:
            NDArray: One-dimensional array with the ten-day predictions
            (10 days x 24 hours x 6 variables = 1440 values).
        """
        interest_variables = [6, 8, 20, 21, 22, 33]
        # Column positions of the variables in the epw file.
        filtered_data: DataFrame = self.weather_file[self.julian_day_filter(julian_day)][interest_variables]
        # Filter the data for the julian day of interest and the nine days that follow.
        data_list: list = filtered_data.values.tolist()
        # Transform the DataFrame into a list with one inner list per hour. The RLlib observation is
        # one-dimensional, so the nested list is flattened into a single list.
        single_shape_list = []
        for e in range(len(data_list)):
            for v in data_list[e]:
                single_shape_list.append(v)
                # Append the values of each day and hour consecutively to the list.
        desviation = [1, 10, 20, 0.5, 10, 0.2]
        # Standard deviation of each variable, in the same order as the epw columns above.
        prob_index = 0
        for e in range(len(single_shape_list)):
            single_shape_list[e] = np.random.normal(single_shape_list[e], desviation[prob_index])
            if prob_index == (len(desviation)-1):
                prob_index = 0
            else:
                prob_index += 1

        predictions = np.array(single_shape_list)
        # The prediction list is converted into a NumPy array so that it can later be concatenated with
        # the rest of the observation variables.
        return predictions

"""Utilities and methods to configure the execution of the episode in EnergyPlus with RLlib.
"""

def episode_epJSON(env_config: dict):
    """Define the properties of the episode. Some properties, such as the weather or the run period,
    are changed here, while others, such as the volume or the window area ratio, stay fixed.

    Args:
        env_config (dict): Environment configuration.

    Return:
        dict: The method returns the env_config with modifications.
    """
    if not env_config.get('epjson', False):
        env_config = epJSON_path(env_config)
        # If the path to the epJSON file is not already set, it is set here.
    with open(env_config['epjson']) as file:
        epJSON_object: dict = json.load(file)
        # Establish the epJSON Object, it will be manipulated to modify the building model.

    epJSON_object['Building'][next(iter(epJSON_object['Building']))]['north_axis'] = env_config['rotation']
    # The building is oriented as it is positioned on the site.

    ObjectName = next(iter(epJSON_object['RunPeriod']))
    epJSON_object["RunPeriod"][ObjectName]["begin_month"] = 1
    epJSON_object["RunPeriod"][ObjectName]["begin_day_of_month"] = 1
    epJSON_object["RunPeriod"][ObjectName]["end_month"] = 12
    epJSON_object["RunPeriod"][ObjectName]["end_day_of_month"] = 31

    env_config['construction_u_factor'] = u_factor(epJSON_object)
    # The global U factor is calculated.

    for key in [key for key in epJSON_object["InternalMass"].keys()]:
        epJSON_object["InternalMass"][key]["surface_area"] = np.random.randint(10,40) if not env_config['is_test'] else 15
        # The internal thermal mass is modified.

    env_config['inercial_mass'] = inertial_mass(epJSON_object)
    # The total inertial thermal mass is calculated.

    env_config['E_max'] = (0.5+(0.5 - 0.08)*np.random.random_sample()) if not env_config['is_test'] else 2.5/6
    HVAC_names = [key for key in epJSON_object["ZoneHVAC:IdealLoadsAirSystem"].keys()]
    for hvac in range(len(HVAC_names)):
        epJSON_object["ZoneHVAC:IdealLoadsAirSystem"][HVAC_names[hvac]]["maximum_sensible_heating_capacity"] = env_config['E_max']
        epJSON_object["ZoneHVAC:IdealLoadsAirSystem"][HVAC_names[hvac]]["maximum_total_cooling_capacity"] = env_config['E_max']
        # The capacity limits of both cooling and heating are changed.

    env_config["epjson"] = f"{env_config['epjson_output_folder']}/model-{env_config['episode']:08}-{os.getpid():05}.epJSON"
    with open(env_config["epjson"], 'w') as fp:
        json.dump(epJSON_object, fp, sort_keys=False, indent=4)
        # The modified epJSON file is written.

    env_config['epw'],env_config['latitud'], env_config['longitud'], env_config['altitud'] = weather_file(env_config)
    # Assign the epw path and the correspondent latitude, longitude, and altitude.
    return env_config

def epJSON_path(env_config: dict):
    """Define the path to the epJSON file to be simulated.

    Args:
        env_config (dict): Environment configuration that must contain:
            'epjson_folderpath'
            'building_name'

    Return:
        dict: The method returns the env_config with modifications.
    """
    env_config['epjson'] = env_config['epjson_folderpath']+'/'+env_config['building_name']+'.epjson'
    return env_config

def inertial_mass(epJSON_object: dict[str,dict]):
    """Calculate the total thermal (inertial) mass of the building.

    The thermal mass of each layer is M [J/°C] = rho [kg/m3] * C [J/kg°C] * V [m3], where
    V [m3] = area * thickness. The layers of every envelope surface and of every internal mass
    object are added together.

    Args:
        epJSON_object (dict[str,dict]): epJSON model of the building.

    Returns:
        float: Total thermal mass of the building in J/°C.
    """
    # List to store the thermal mass of each surface.
    masas_termicas = []

    # Names of the envelope surfaces and of the internal mass objects.
    building_surfaces = [key for key in epJSON_object["BuildingSurface:Detailed"].keys()]
    internal_mass_surfaces = [key for key in epJSON_object["InternalMass"].keys()]

    all_building_keys = [key for key in epJSON_object.keys()]
    all_material_list = ['Material','Material:NoMass','Material:InfraredTransparent','Material:AirGap',
        'Material:RoofVegetation','WindowMaterial:SimpleGlazingSystem','WindowMaterial:Glazing',
        'WindowMaterial:GlazingGroup:Thermochromic','WindowMaterial:Glazing:RefractionExtinctionMethod',
        'WindowMaterial:Gas','WindowGap:SupportPillar','WindowGap:DeflectionState',
        'WindowMaterial:GasMixture','WindowMaterial:Gap'
    ]
    # Names of the materials of each type present in the model.
    materials_dict = {}
    for material in all_material_list:
        if material in all_building_keys:
            materials_dict[material] = epJSON_object[material].keys()

    # Material types that contribute no thermal mass.
    massless_types = ('Material:NoMass', 'Material:AirGap', 'Material:InfraredTransparent', 'WindowMaterial:Gas')

    # Pair the area of every surface with its construction: first the envelope, then the internal mass.
    surfaces = []
    for surface in building_surfaces:
        surfaces.append((
            material_area(epJSON_object,surface),
            epJSON_object['BuildingSurface:Detailed'][surface]['construction_name']
        ))
    for surface in internal_mass_surfaces:
        surfaces.append((
            epJSON_object['InternalMass'][surface]['surface_area'],
            epJSON_object['InternalMass'][surface]['construction_name']
        ))

    for area, s_construction in surfaces:
        # Add up the thermal mass of each layer of the construction.
        m_surface = 0
        layers = [key for key in epJSON_object['Construction'][s_construction].keys()]
        for layer in layers:
            material = epJSON_object['Construction'][s_construction][layer]
            material_list = find_dict_key_by_nested_key(
                material,
                materials_dict
            )
            if material_list in massless_types:
                m_capa = 0
            else:
                # Thickness, specific heat, and density of the layer material.
                espesor_capa = epJSON_object[material_list][material]['thickness']
                calor_especifico_capa = epJSON_object[material_list][material]['specific_heat']
                densidad_capa = epJSON_object[material_list][material]['density']
                m_capa = area * espesor_capa * calor_especifico_capa * densidad_capa

            # Add the thermal mass of the layer to the surface.
            m_surface += m_capa
        # Store the thermal mass of the surface.
        masas_termicas.append(m_surface)

    # Total thermal mass: every surface is included in the sum.
    M_total = sum(masas_termicas)

    return M_total

def u_factor(epJSON_object: dict[str,dict]):
    """Select all the building surfaces and fenestration surfaces and calculate the
    global U-factor of the building, like EnergyPlus does.

    Args:
        epJSON_object (dict[str,dict]): epJSON model of the building.

    Returns:
        float: Global U-factor in W/°C, the sum of area/resistance over all the surfaces.
    """
    # Lists to store the thermal resistance and the area of each surface.
    resistences = []
    areas = []
    # Names of the envelope and fenestration surfaces.
    building_surfaces = [key for key in epJSON_object['BuildingSurface:Detailed'].keys()]
    fenestration_surfaces = [key for key in epJSON_object['FenestrationSurface:Detailed'].keys()]

    all_building_keys = [key for key in epJSON_object.keys()]
    all_material_list = ['Material','Material:NoMass','Material:InfraredTransparent','Material:AirGap',
        'Material:RoofVegetation','WindowMaterial:SimpleGlazingSystem','WindowMaterial:Glazing',
        'WindowMaterial:GlazingGroup:Thermochromic','WindowMaterial:Glazing:RefractionExtinctionMethod',
        'WindowMaterial:Gas','WindowGap:SupportPillar','WindowGap:DeflectionState',
        'WindowMaterial:GasMixture','WindowMaterial:Gap'
    ]
    # Names of the materials of each type present in the model.
    materials_dict = {}
    for material in all_material_list:
        if material in all_building_keys:
            materials_dict[material] = epJSON_object[material].keys()

    # Thermal conductivity in W/m°C of the gases accepted for 'WindowMaterial:Gas'.
    gas_conductivity = {'Air': 0.0257, 'Argon': 0.0162, 'Xenon': 0.00576, 'Krypton': 0.00943}

    # Area and construction of every surface: first the envelope, then the fenestration.
    constructions = []
    for surface in building_surfaces:
        areas.append(material_area(epJSON_object,surface))
        constructions.append(epJSON_object['BuildingSurface:Detailed'][surface]['construction_name'])
    for fenestration in fenestration_surfaces:
        areas.append(fenestration_area(epJSON_object, fenestration))
        constructions.append(epJSON_object['FenestrationSurface:Detailed'][fenestration]['construction_name'])

    for s_construction in constructions:
        # Add up the thermal resistance of each layer of the construction.
        r_surface = 0
        layers = [key for key in epJSON_object['Construction'][s_construction].keys()]
        for layer in layers:
            material = epJSON_object['Construction'][s_construction][layer]
            material_list = find_dict_key_by_nested_key(
                material,
                materials_dict
            )
            if material_list == 'Material:NoMass' or material_list == 'Material:AirGap':
                r_capa = epJSON_object[material_list][material]['thermal_resistance']
            elif material_list == 'Material:InfraredTransparent':
                r_capa = 0
            elif material_list == 'WindowMaterial:Gas':
                espesor_capa = epJSON_object[material_list][material]['thickness']
                gas_type = epJSON_object[material_list][material]['gas_type']
                if gas_type not in gas_conductivity:
                    # Stop here instead of continuing with an undefined conductivity.
                    raise ValueError(
                        f"Gas type '{gas_type}' is not one of the supported gases: Air, Argon, Xenon, Krypton."
                    )
                r_capa = espesor_capa/gas_conductivity[gas_type]
            else:
                # Resistance from the thickness and the thermal conductivity of the layer material.
                espesor_capa = epJSON_object[material_list][material]['thickness']
                conductividad_capa = epJSON_object[material_list][material]['conductivity']
                r_capa = espesor_capa/conductividad_capa

            # Add the resistance of the layer to the surface.
            r_surface += r_capa
        # Store the resistance of the surface.
        resistences.append(r_surface)

    # Global U-factor in W/°C, accumulated over every surface.
    u_factor = 0
    for n in range(len(areas)):
        u_factor += areas[n]/resistences[n]

    return u_factor

def material_area(epJSON_object, nombre_superficie):
    """Calculate the area of a building surface from its vertices.

    Args:
        epJSON_object (dict): epJSON model of the building.
        nombre_superficie (str): Name of the surface in 'BuildingSurface:Detailed'.

    Returns:
        float: Area of the surface, in the squared units of the vertex coordinates.
    """
    vertices = epJSON_object['BuildingSurface:Detailed'][nombre_superficie]['vertices']
    points = np.array([
        [v['vertex_x_coordinate'], v['vertex_y_coordinate'], v['vertex_z_coordinate']]
        for v in vertices
    ], dtype=float)

    # The area of a planar polygon is half the Euclidean norm of the sum of the cross products of
    # consecutive vertices. This holds for triangles and quadrilaterals alike, whereas half the cross
    # product of one side and one diagonal only gives the area of a single triangle.
    normal = np.cross(points, np.roll(points, -1, axis=0)).sum(axis=0)
    area = 0.5 * float(np.linalg.norm(normal))
    return area

def fenestration_area(epJSON_object, fenestration):
    """_summary_

    Args:
        epJSON_object (_type_): _description_
        fenestration (_type_): _description_

    Returns:
        _type_: _description_
    """

    # Calcula dos vectores que forman dos lados del cuadrilátero
    vector1 = [
        epJSON_object['FenestrationSurface:Detailed'][fenestration]['vertex_2_x_coordinate'] - epJSON_object['FenestrationSurface:Detailed'][fenestration]['vertex_1_x_coordinate'],
        epJSON_object['FenestrationSurface:Detailed'][fenestration]['vertex_2_y_coordinate'] - epJSON_object['FenestrationSurface:Detailed'][fenestration]['vertex_1_y_coordinate'],
        epJSON_object['FenestrationSurface:Detailed'][fenestration]['vertex_2_z_coordinate'] - epJSON_object['FenestrationSurface:Detailed'][fenestration]['vertex_1_z_coordinate']
    ]
    vector2 = [
        epJSON_object['FenestrationSurface:Detailed'][fenestration]['vertex_3_x_coordinate'] - epJSON_object['FenestrationSurface:Detailed'][fenestration]['vertex_1_x_coordinate'],
        epJSON_object['FenestrationSurface:Detailed'][fenestration]['vertex_3_y_coordinate'] - epJSON_object['FenestrationSurface:Detailed'][fenestration]['vertex_1_y_coordinate'],
        epJSON_object['FenestrationSurface:Detailed'][fenestration]['vertex_3_z_coordinate'] - epJSON_object['FenestrationSurface:Detailed'][fenestration]['vertex_1_z_coordinate']
    ]

    # Compute the cross product of the two vectors.
    producto_vectorial = [
        vector1[1] * vector2[2] - vector1[2] * vector2[1],
        vector1[2] * vector2[0] - vector1[0] * vector2[2],
        vector1[0] * vector2[1] - vector1[1] * vector2[0]
    ]

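    # Estimate the area from the cross product. NOTE: the sum of absolute components equals the
    # cross-product magnitude only for axis-aligned surfaces, and the 0.5 factor gives the area of
    # the triangle (v1, v2, v3), i.e. half of a rectangular surface whose vertices are listed in order.
    area = 0.5 * (abs(producto_vectorial[0]) + abs(producto_vectorial[1]) + abs(producto_vectorial[2]))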
    return area

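def find_dict_key_by_nested_key(key, lists_dict):
    """Return the first key of `lists_dict` whose list contains `key`.

    Args:
        key (Any): Element to look for inside the lists.
        lists_dict (Dict[Any, list]): Dictionary that maps each key to a list (or other container).

    Returns:
        Any: The matching dictionary key, or None if no list contains `key`.
    """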
    for dict_key, lst in lists_dict.items():
        if key in lst:
            return dict_key
    return None

"""# ENERGYPLUS RUNNER

This script contains the EnergyPlus Runner, which executes EnergyPlus through its Python API
(version 23.2.0).
"""

class EnergyPlusRunner:
    """This object starts EnergyPlus (`start`), collects observations (`_collect_obs`) and applies
    actions (`_send_actions`), exchanging them through queues with the EnergyPlus Environment thread.
    """
    def __init__(
        self,
        episode: int,
        env_config: Dict[str, Any],
        obs_queue: Queue,
        act_queue: Queue,
        cooling_queue: Queue,
        heating_queue: Queue,
        pmv_queue: Queue,
        ppd_queue: Queue,
        beta_queue: Queue,
        emax_queue: Queue,
        ):
        """The object interacts intensively with the EnergyPlus Environment script, exchanging information
        between two threads. To coordinate them, events are established and a separate queue is defined
        for each channel of information.

        Args:
            episode (int): Episode number.
            env_config (Dict[str, Any]): Environment configuration defined in the call to the EnergyPlus Environment.
            obs_queue (Queue): Queue for the observations sent to the environment.
            act_queue (Queue): Queue for the actions received from the environment.
            cooling_queue (Queue): Queue for the cooling energy meter readings.
            heating_queue (Queue): Queue for the heating energy meter readings.
            pmv_queue (Queue): Queue for the Fanger PMV readings.
            ppd_queue (Queue): Queue for the Fanger PPD readings.
            beta_queue (Queue): Queue for the beta value used in the reward function.
            emax_queue (Queue): Queue for the E_max value used in the reward function.

        Return:
            None.
        """
        self.episode = episode
        self.env_config = env_config
        self.env_config['episode'] = self.episode
        self.obs_queue = obs_queue
        self.act_queue = act_queue
        self.cooling_queue = cooling_queue
        self.heating_queue = heating_queue
        self.pmv_queue = pmv_queue
        self.ppd_queue = ppd_queue
        self.beta_queue = beta_queue
        self.emax_queue = emax_queue
        # Assignment of variables.

        self.obs_event = threading.Event()
        self.act_event = threading.Event()
        self.cooling_event = threading.Event()
        self.heating_event = threading.Event()
        self.pmv_event = threading.Event()
        self.ppd_event = threading.Event()
        self.beta_event = threading.Event()
        self.emax_event = threading.Event()
        # The queue events are generated.

        self.energyplus_exec_thread: Optional[threading.Thread] = None
        self.energyplus_state: Any = None
        self.sim_results: int = 0
        self.initialized = False
        self.init_handles = False
        self.simulation_complete = False
        self.first_observation = True
        # Variables to be used in this thread.

        self.env_config = epJSON_path(self.env_config)
        # The path for the epjson file is defined.

        self.variables = {
            "To": ("Site Outdoor Air Drybulb Temperature", "Environment"), #0
            "Ti": ("Zone Mean Air Temperature", "Thermal Zone"), #1
            "v": ("Site Wind Speed", "Environment"), #2
            "d": ("Site Wind Direction", "Environment"), #3
            "RHo": ("Site Outdoor Air Relative Humidity", "Environment"), #4
            "RHi": ("Zone Air Relative Humidity", "Thermal Zone"), #5
            "T_rad": ("Zone Mean Radiant Temperature", "Thermal Zone"), #del
            "Fanger_PMV":("Zone Thermal Comfort Fanger Model PMV", "People"), #del
            "Fanger_PPD":("Zone Thermal Comfort Fanger Model PPD", "People"), #del
        }
        self.var_handles: Dict[str, int] = {}
        # Declaration of variables this simulation will interact with.

        self.meters = {
            "dh": "Heating:DistrictHeatingWater", #6
            "dc": "Cooling:DistrictCooling" #7
        }
        self.meter_handles: Dict[str, int] = {}
        # Declaration of meters this simulation will interact with.

        self.actuators = {
            "opening_window_1": ("AirFlow Network Window/Door Opening", "Venting Opening Factor", "window_2"), # 8: opening factor between 0.0 and 1.0
            "opening_window_2": ("AirFlow Network Window/Door Opening", "Venting Opening Factor", "window_3"), # 9: opening factor between 0.0 and 1.0
        }
        self.actuator_handles: Dict[str, int] = {}
        # Declaration of actuators this simulation will interact with.
        # Airflow Network Openings (EnergyPlus Documentation)
        # An actuator called “AirFlow Network Window/Door Opening” is available with a control type
        # called “Venting Opening Factor.” It is available in models that have operable openings in the Airflow
        # Network model and that are entered by using either AirflowNetwork:MultiZone:Component:DetailedOpening,
        # AirflowNetwork:MultiZone:Component:SimpleOpening, or AirflowNetwork:MultiZone:Component:HorizontalOpening
        # input objects. This control allows you to use EMS to vary the size of the opening during the
        # airflow model calculations, such as for natural and hybrid ventilation.
        # The unique identifier is the name of the surface (window, door or air boundary), not the name of
        # the associated airflow network input objects. The actuator control involves setting the value of the
        # opening factor between 0.0 and 1.0. Use of this actuator with an air boundary surface is allowed,
        # but will generate a warning since air boundaries are typically always open.

    def start(self):
        """This method initializes EnergyPlus. The episode is configured first, then the callback
        functions are registered and the EnergyPlus thread is created.
        """
        self.env_config = episode_epJSON(self.env_config)
        # Configure the episode.

        self.weather_stats = Probabilities(self.env_config)
        # Specify the statistical weather file.

        self.energyplus_state = api.state_manager.new_state()
        # Start a new EnergyPlus state (required to run the EnergyPlus Python API).

        api.runtime.callback_begin_system_timestep_before_predictor(self.energyplus_state, self._collect_first_obs)
        # Collect the first observation. This is executed only once, at the beginning of the episode.
        # The calling point called “BeginTimestepBeforePredictor” occurs near the beginning of each timestep
        # but before the predictor executes. “Predictor” refers to the step in EnergyPlus modeling when the
        # zone loads are calculated. This calling point is useful for controlling components that affect the
        # thermal loads the HVAC systems will then attempt to meet. Programs called from this point
        # might actuate internal gains based on current weather or on the results from the previous timestep.
        # Demand management routines might use this calling point to reduce lighting or process loads,
        # change thermostat settings, etc.

        api.runtime.callback_begin_zone_timestep_after_init_heat_balance(self.energyplus_state, self._send_actions)
        # Execute the actions in the environment.
        # The calling point called “BeginZoneTimestepAfterInitHeatBalance” occurs at the beginning of each
        # timestep after “InitHeatBalance” executes and before “ManageSurfaceHeatBalance”. “InitHeatBalance” refers to the step in EnergyPlus modeling when the solar shading and daylighting coefficients
        # are calculated. This calling point is useful for controlling components that affect the building envelope including surface constructions and window shades. Programs called from this point might
        # actuate the building envelope or internal gains based on current weather or on the results from the
        # previous timestep. Demand management routines might use this calling point to operate window
        # shades, change active window constructions, etc. This calling point would be an appropriate place
        # to modify weather data values.

        api.runtime.callback_end_zone_timestep_after_zone_reporting(self.energyplus_state, self._collect_obs)
        # Collect the observations after the action executions and use them to provide new actions.
        # The calling point called “EndOfZoneTimestepAfterZoneReporting” occurs at the end of a zone
        # timestep after output variable reporting is finalized. It is useful for preparing calculations that
        # will go into effect the next timestep. Its capabilities are similar to BeginTimestepBeforePredictor,
        # except that input data for current time, date, and weather data align with different timesteps.

        api.runtime.set_console_output_status(self.energyplus_state, self.env_config['ep_terminal_output'])
        # Control of the console printing process.

        def _run_energyplus():
            """Run EnergyPlus in a non-blocking way with Threads.
            """
            cmd_args = self.make_eplus_args()
            print(f"running EnergyPlus with args: {cmd_args}")
            self.sim_results = api.runtime.run_energyplus(self.energyplus_state, cmd_args)
            self.simulation_complete = True

        self.energyplus_exec_thread = threading.Thread(
            target=_run_energyplus,
            args=()
        )
        self.energyplus_exec_thread.start()
        # From this point on, EnergyPlus runs in a separate thread.

    def _collect_obs(self, state_argument):
        """EnergyPlus callback that collects the values of output variables, meters and actuators
        and enqueues them for the EnergyPlus Environment thread.
        """
        if self.simulation_complete or not self._init_callback(state_argument):
            # Skip the observation when the episode has ended, or while the handles are not
            # initialized or the warm-up period is not complete.
            return

        time_step = api.exchange.zone_time_step_number(state_argument)
        hour = api.exchange.hour(state_argument)
        simulation_day = api.exchange.day_of_year(state_argument)
        # Timestep variables.
        obs = {
            **{
                key: api.exchange.get_variable_value(state_argument, handle)
                for key, handle
                in self.var_handles.items()
            },
            **{
                key: api.exchange.get_meter_value(state_argument, handle)
                for key, handle
                in self.meter_handles.items()
            },
            **{
                key: api.exchange.get_actuator_value(state_argument, handle)
                for key, handle
                in self.actuator_handles.items()
            }
        }
        # Variable, meter and actuator values as the observation.
        obs.update(
            {
            'hora': hour,#10
            'simulation_day': simulation_day,#11
            'volumen': self.env_config['volumen'],#12
            'window_area_relation_north': self.env_config['window_area_relation_north'],#13
            'window_area_relation_west': self.env_config['window_area_relation_west'],#14
            'window_area_relation_south': self.env_config['window_area_relation_south'],#15
            'window_area_relation_east': self.env_config['window_area_relation_east'],#16
            'construction_u_factor': self.env_config['construction_u_factor'], #17
            'inercial_mass': self.env_config['inercial_mass'], #18
            'latitud': self.env_config['latitud'], #19
            'longitud':self.env_config['longitud'], #20
            'altitud': self.env_config['altitud'], #21
            'beta': self.env_config['beta'], #22
            'E_max': self.env_config['E_max'], #23
            "rad": api.exchange.today_weather_beam_solar_at_time(state_argument, hour, time_step), #24
            }
        )
        # Update the timestep observation.

        self.cooling_queue.put(obs['dc'])
        self.cooling_event.set()
        self.heating_queue.put(obs['dh'])
        self.heating_event.set()
        self.beta_queue.put(obs['beta'])
        self.beta_event.set()
        self.emax_queue.put(obs['E_max'])
        self.emax_event.set()
        self.pmv_queue.put(obs["Fanger_PMV"])
        self.pmv_event.set()
        self.ppd_queue.put(obs["Fanger_PPD"])
        self.ppd_event.set()
        # Send the reward-related values through their queues before some of them are deleted
        # from the observation below.

        del obs["T_rad"]
        del obs["Fanger_PMV"]
        del obs["Fanger_PPD"]
        # These variables are deleted from the observation because they are difficult to measure.

        next_obs = np.array(list(obs.values()))
        # Transform the observation into a NumPy array, as expected by an RLlib environment.
        weather_prob = self.weather_stats.ten_days_predictions(simulation_day)
        # Query the weather statistics to append them to the observation array. This adds 1440 elements to the observation.
        self.next_obs = np.concatenate([next_obs, weather_prob])

        self.obs_queue.put(self.next_obs)
        self.obs_event.set()
        # Set the observation to communicate with queue.

    def _collect_first_obs(self, state_argument):
        """This method is used to collect only the first observation of the environment when the episode begins.

        Args:
            state_argument (c_void_p): EnergyPlus state pointer. This is created with `api.state_manager.new_state()`.
        """
        if self.first_observation:
            self._collect_obs(state_argument)
            self.first_observation = False
        else:
            return

    def _send_actions(self, state_argument):
        """EnergyPlus callback that sets the actuator values from the last decided action.
        """
        if self.simulation_complete or not self._init_callback(state_argument):
            # Skip the actions when the episode has ended, or while the handles are not
            # initialized or the warm-up period is not complete.
            return

        self.act_event.wait(20)
        # Wait for an action.
        if self.act_queue.empty():
            # Return in the first timestep.
            return

        next_central_action = self.act_queue.get()
        # Get the central action from the EnergyPlus Environment `step` method.
        # For a single agent this is an int value; for multiple agents, a dictionary.
        # TODO: Make this EPRunner able to handle single- and multi-agent configurations, and natural
        # ventilation, shading control or an integrated control.
        next_action = natural_ventilation_action(next_central_action)
        # Transform the centralized action into a list of decentralized actions.

        api.exchange.set_actuator_value(
            state=state_argument,
            actuator_handle=self.actuator_handles["opening_window_1"],
            actuator_value=next_action[0]
        )
        api.exchange.set_actuator_value(
            state=state_argument,
            actuator_handle=self.actuator_handles["opening_window_2"],
            actuator_value=next_action[1]
        )
        # Perform the actions in EnergyPlus simulation.

    def _init_callback(self, state_argument):
        """Initialize EnergyPlus handles and checks if simulation runtime is ready"""
        self.init_handles = self._init_handles(state_argument)
        self.initialized = self.init_handles \
            and not api.exchange.warmup_flag(state_argument)
        return self.initialized

    def _init_handles(self, state_argument):
        """Initialize sensors/actuators handles to interact with during simulation"""
        if not self.init_handles:
            if not api.exchange.api_data_fully_ready(state_argument):
                return False

            self.var_handles = {
                key: api.exchange.get_variable_handle(state_argument, *var)
                for key, var in self.variables.items()
            }
            self.meter_handles = {
                key: api.exchange.get_meter_handle(state_argument, meter)
                for key, meter in self.meters.items()
            }
            self.actuator_handles = {
                key: api.exchange.get_actuator_handle(state_argument, *actuator)
                for key, actuator in self.actuators.items()
            }
            for handles in [
                self.var_handles,
                self.meter_handles,
                self.actuator_handles
            ]:
                if any([v == -1 for v in handles.values()]):
                    available_data = api.exchange.list_available_api_data_csv(state_argument).decode('utf-8')
                    print(
                        f"got -1 handle, check your var/meter/actuator names:\n"
                        f"> variables: {self.var_handles}\n"
                        f"> meters: {self.meter_handles}\n"
                        f"> actuators: {self.actuator_handles}\n"
                        f"> available E+ API data: {available_data}"
                    )

            self.init_handles = True
        return True

    def stop(self):
        """Method to stop the EnergyPlus simulation and join the threads.
        """
        if not self.simulation_complete:
            self.simulation_complete = True
        sleep(3)
        self._flush_queues()
        self.energyplus_exec_thread.join()
        self.energyplus_exec_thread = None
        self.first_observation = True
        api.runtime.clear_callbacks()
        api.state_manager.delete_state(self.energyplus_state)

    def failed(self):
        """This method tells whether an EnergyPlus simulation failed.

        Returns:
            bool: True if EnergyPlus returned a non-zero exit code (the simulation failed), False otherwise.
        """
        return self.sim_results != 0

    def make_eplus_args(self):
        """Make command line arguments to pass to EnergyPlus
        """
        eplus_args = ["-r"] if self.env_config.get("csv", False) else []
        eplus_args += [
            "-w",
            self.env_config["epw"],
            "-d",
            f"{self.env_config['output']}/episode-{self.episode:08}-{os.getpid():05}",
            self.env_config["epjson"]
        ]
        return eplus_args

    def _flush_queues(self):
        """Method to empty the different queue objects.
        """
        for q in [self.obs_queue, self.act_queue, self.cooling_queue,
                  self.heating_queue, self.pmv_queue, self.ppd_queue,
                  self.beta_queue, self.emax_queue]:
            while not q.empty():
                q.get()


"""# ENERGYPLUS RLLIB ENVIRONMENT

This script defines the EnergyPlus environment implemented in RLlib. It requires the
EnergyPlus Runner to be defined.
"""

class EnergyPlusEnv_v0(gym.Env):
    def __init__(
        self,
        env_config
        ):
        """Environment of a building that runs with the EnergyPlus Runner.

        Args:
            env_config (Dict[str, Any]): Environment configuration. Among other entries, it must
                include the keys 'action_space' and 'observation_space'.
        """
        super().__init__()
        # super init of the base class gym.Env.
        self.env_config = env_config
        # assigning the configuration of the environment.
        self.episode = -1
        # variable for the registry of the episode number.
        self.action_space = self.env_config['action_space']
        # assignment of the action space.
        self.observation_space = self.env_config['observation_space']
        # assignment of the observation space.
        self.last_obs = {}
        # dict to save the last observation in the environment.
        self.last_beta = 0.
        # variable to save the last beta in the environment.
        self.last_emax = 0.
        # variable to save the last emax in the environment.
        self.energyplus_runner: Optional[EnergyPlusRunner] = None
        # variable where the EnergyPlus Runner object will be saved.
        self.obs_queue: Optional[Queue] = None
        # queue for observation communication between threads.
        self.act_queue: Optional[Queue] = None
        # queue for actions communication between threads.
        self.cooling_queue: Optional[Queue] = None
        # queue for cooling metric communication between threads.
        self.heating_queue: Optional[Queue] = None
        # queue for heating metric communication between threads.
        self.pmv_queue: Optional[Queue] = None
        # queue for PMV metric communication between threads.
        self.ppd_queue: Optional[Queue] = None
        # queue for PPD metric communication between threads.
        self.beta_queue: Optional[Queue] = None
        # queue for beta value communication between threads. Used in reward function.
        self.emax_queue: Optional[Queue] = None
        # queue for E_max value communication between threads. Used in reward function.

        self.truncate_flag = False

    def reset(
        self, *,
        seed: Optional[int] = None,
        options: Optional[Dict[str, Any]] = None
    ):
        self.episode += 1
        # Increment the episode counter by 1.
        self.timestep = 0

        if not self.truncate_flag:

            if self.energyplus_runner is not None and self.energyplus_runner.simulation_complete:
                # Condition implemented to restart a new episode when the simulation is complete and the EnergyPlus Runner is already initialized.
                self.energyplus_runner.stop()

            # If the EnergyPlus Runner is not initialized, this is a new simulation run.
            self.obs_queue = Queue(maxsize=1)
            self.act_queue = Queue(maxsize=1)
            self.cooling_queue = Queue(maxsize=1)
            self.heating_queue = Queue(maxsize=1)
            self.pmv_queue = Queue(maxsize=1)
            self.ppd_queue = Queue(maxsize=1)
            self.beta_queue = Queue(maxsize=1)
            self.emax_queue = Queue(maxsize=1)
            # Define the queues for flow control between threads with a maximum size of 1, because
            # one EnergyPlus timestep is processed at a time.


            self.energyplus_runner = EnergyPlusRunner(
                # Create the EnergyPlusRunner with the following configuration.
                episode=self.episode,
                env_config=self.env_config,
                obs_queue=self.obs_queue,
                act_queue=self.act_queue,
                cooling_queue=self.cooling_queue,
                heating_queue=self.heating_queue,
                pmv_queue = self.pmv_queue,
                ppd_queue = self.ppd_queue,
                beta_queue = self.beta_queue,
                emax_queue = self.emax_queue
            )

            self.energyplus_runner.start()
            # From this point on, EnergyPlus runs in a separate thread.
            self.energyplus_runner.obs_event.wait()
            # Wait until an observation is made.
            obs = self.obs_queue.get()
            # Get the observation.
            self.energyplus_runner.cooling_event.wait()
            # Wait until a cooling metric reading is made.
            ec = self.cooling_queue.get()
            # Get the cooling metric reading. It is not used here (only in the step method), but
            # reading it is necessary to free the space in the queue.
            self.energyplus_runner.heating_event.wait()
            # Wait until a heating metric reading is made.
            eh = self.heating_queue.get()
            # Get the heating metric reading. It is not used here (only in the step method), but
            # reading it is necessary to free the space in the queue.
            self.energyplus_runner.pmv_event.wait()
            # Wait until a PMV metric reading is made.
            pmv = self.pmv_queue.get()
            # Get the PMV metric reading. It is not used here (only in the step method), but
            # reading it is necessary to free the space in the queue.
            self.energyplus_runner.ppd_event.wait()
            # Wait until a PPD metric reading is made.
            ppd = self.ppd_queue.get()
            # Get the PPD metric reading. It is not used here (only in the step method), but
            # reading it is necessary to free the space in the queue.

            self.energyplus_runner.beta_event.wait()
            # Wait until a beta value reading is made.
            self.beta = self.beta_queue.get()
            # Get the beta value reading. It is used in the step method, and reading it frees
            # the space in the queue.
            self.energyplus_runner.emax_event.wait()
            # Wait until an E_max value reading is made.
            self.emax = self.emax_queue.get()
            # Get the E_max value reading. It is used in the step method, and reading it frees
            # the space in the queue.

            self.last_obs = obs
            # Save the observation as a last observation.
            self.last_beta = self.beta
            # Save the beta as a last beta.
            self.last_emax = self.emax
            # Save the E_max as a last E_max.

        else:
            obs = self.last_obs
            self.truncate_flag = False

        return obs, {}

    def step(self, action):
        self.timestep += 1
        terminated = False
        truncated = False
        # `terminated` marks the end of an episode. It stays False until the
        # environment reaches a terminal state.
        timeout = 40
        # timeout is set to 40 s to handle end-of-simulation cases, which happen asynchronously
        # and show up as the worker thread waiting on this queue (EnergyPlus callback
        # not consuming yet/anymore). The timeout value can be increased if an E+ timestep takes longer.
        if self.energyplus_runner.simulation_complete:
            # simulation_complete is likely to happen after last env step()
            # is called, hence leading to waiting on queue for a timeout.
            if self.energyplus_runner.failed():
                # check for simulation errors.
                raise Exception("Faulty episode")
            terminated = True
            # if the simulation is complete, the episode is ended.
            obs = self.last_obs
            # we use the last observation as a observation for the timestep.
            self.beta = self.last_beta
            # we use the last beta as a beta for the timestep.
            self.emax = self.last_emax
            # we use the last E_max as a E_max for the timestep.
        else:
            # if the simulation is not complete, enqueue action (received by EnergyPlus through
            # dedicated callback) and then wait to get next observation.
            try:
                self.act_queue.put(action,timeout=timeout)
                self.energyplus_runner.act_event.set()
                # Send the action to the EnergyPlus Runner flow.
                self.energyplus_runner.obs_event.wait(timeout=timeout)
                obs = self.obs_queue.get(timeout=timeout)
                # Get the returned observation after the action is applied.
                self.last_obs = obs
                # Update the last observation.
                self.beta = self.beta_queue.get(timeout=timeout)
                self.last_beta = self.beta
                # Get the returned beta after the action is applied and update the last beta.
                self.emax = self.emax_queue.get(timeout=timeout)
                self.last_emax = self.emax
                # Get the returned E_max after the action is applied and update the last E_max.
            except (Full, Empty):
                terminated = True
                # Set the terminated variable into True to finish the episode.
                obs = self.last_obs
                # We use the last observation as a observation for the timestep.
                self.beta = self.last_beta
                # We use the last beta as a beta for the timestep.
                self.emax = self.last_emax
                # We use the last E_max as a E_max for the timestep.

        if self.energyplus_runner.failed():
            # Raise an exception if the episode is faulty.
            raise Exception("Faulty episode")

        self.energyplus_runner.cooling_event.wait(10)
        if self.energyplus_runner.failed():
            raise Exception("Faulty episode")
        self.ec = self.cooling_queue.get()
        self.ec = (abs(self.ec))/(3600000)
        # Cooling energy consumed in the timestep, converted from J to kWh.
        self.energyplus_runner.heating_event.wait(10)
        if self.energyplus_runner.failed():
            raise Exception("Faulty episode")
        self.eh = self.heating_queue.get()
        self.eh = (abs(self.eh))/(3600000)
        # Heating energy consumed in the timestep, converted from J to kWh.
        self.energyplus_runner.pmv_event.wait(10)
        if self.energyplus_runner.failed():
            raise Exception("Faulty episode")
        self.pmv = self.pmv_queue.get()
        # PMV value for the timestep.
        self.energyplus_runner.ppd_event.wait(10)
        if self.energyplus_runner.failed():
            raise Exception("Faulty episode")
        self.ppd = self.ppd_queue.get()
        # PPD value for the timestep.

        reward = (-self.beta*(self.eh + self.ec)/(self.emax) - (1-self.beta)*(self.ppd/100))
        # Compute the reward from the normalized energy use and the PPD.
        infos = {
            'energy': self.eh + self.ec,
            'comfort': self.pmv,
            'ppd': self.ppd,
        }
        # Save energy, comfort (PMV) and PPD in the info dictionary for later analysis.

        #truncated = self.timestep_cut(6*24*5, terminated)

        return obs, reward, terminated, truncated, infos

    def close(self):
        if self.energyplus_runner is not None:
            self.energyplus_runner.stop()

    def render(self, mode="human"):
        pass

    """def timestep_cut(self, num_timestep: int, terminated: bool):
        if self.timestep >= num_timestep and not terminated:
            self.truncate_flag = True
            return True
        else:
            self.truncate_flag = False
            return False"""
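
The runner and the environment exchange one observation and one action per timestep through queues of size 1. The following is a minimal sketch of that handshake, assuming a stand-in simulator thread instead of EnergyPlus; `run_handshake`, `simulator` and the toy policy are hypothetical names introduced only for illustration and are not part of the notebook.

```python
import threading
from queue import Queue, Empty


def run_handshake(n_steps: int = 3, timeout: float = 2.0):
    """Exchange observations and actions between two threads, one step at a time."""
    obs_queue: Queue = Queue(maxsize=1)
    act_queue: Queue = Queue(maxsize=1)
    applied_actions = []

    def simulator():
        # Stand-in for the EnergyPlus callbacks (hypothetical, no EnergyPlus involved).
        for t in range(n_steps):
            obs_queue.put(t, timeout=timeout)            # plays the role of _collect_obs
            try:
                action = act_queue.get(timeout=timeout)  # plays the role of _send_actions
            except Empty:
                return                                   # no action arrived in time
            applied_actions.append(action)

    worker = threading.Thread(target=simulator)
    worker.start()

    observations = []
    for _ in range(n_steps):
        obs = obs_queue.get(timeout=timeout)      # the environment waits for an observation
        observations.append(obs)
        act_queue.put(obs * 10, timeout=timeout)  # toy policy: action = 10 * observation
    worker.join()
    return observations, applied_actions


observations, applied_actions = run_handshake()
print(observations, applied_actions)
```

Because both queues hold at most one item, neither thread can run ahead of the other by more than one step. The timeouts play the same role as in `step`: when the other side stops producing or consuming, the blocked call raises `Empty` or `Full` instead of waiting forever.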

The environment configuration is set below.

In [None]:
"""## DEFINE THE EXPERIMENT CONTROLS
"""
algorithm = 'DQN'
# Define the algorithm to use to train the policy. Options are: PPO, SAC, DQN.
tune_runner = True
# Whether the experiment tunes the hyperparameters (True) or runs a single fixed configuration (False).
restore = False
# Whether to restore a previous experiment. If True, 'restore_path' must be set.
restore_path = ''
# Path to the folder where the experiment to restore is located.

env_config = {
    'weather_folder': '/content/drive/My Drive/ep_drive/epw',
    'output': '/content/drive/My Drive/ep_drive/output',
    'epjson_folderpath': '/content/drive/My Drive/ep_drive/epjson',
    'epjson_output_folder': '/content/drive/My Drive/ep_drive/models',
    # Configure the directories for the experiment.
    'ep_terminal_output': False,
    # For debugging, it is better to print the output of the EnergyPlus simulation process to the terminal.
    'beta': 0.5,
    # This parameter balances energy consumption against the comfort of the inhabitants. A
    # value of 0 gives no importance to energy consumption and a value of 1 gives no importance
    # to comfort. Mathematically, the reward is:
    # r = - beta*normalized_energy - (1-beta)*normalized_comfort
    # The value ranges from 0.0 to 1.0.
    'is_test': False,
    # Use 'is_test': True for evaluation and False for training.
    'test_init_day': 1,
    'action_space': gym.spaces.Discrete(4),
    # action space for simple agent case
    'observation_space': gym.spaces.Box(float("-inf"), float("inf"), (1465,)),
    # observation space for simple agent case

    # BUILDING CONFIGURATION
    'building_name': 'prot_1',
    'volumen': 131.6565,
    'window_area_relation_north': 0,
    'window_area_relation_west': 0,
    'window_area_relation_south': 0.0115243076,
    'window_area_relation_east': 0.0276970753,
    'episode_len': 365,
    'rotation': 0,
}
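
To make the role of `beta` concrete, the reward from `step` can be reproduced as a standalone function. This is a sketch under stated assumptions: the function name `compute_reward` is hypothetical, the energy readings are assumed to be in joules, `emax` is assumed to be expressed in kWh, and the input values are illustrative rather than simulation results.

```python
def compute_reward(heating_j, cooling_j, ppd, beta, emax):
    """Reward as in `step`: weighted sum of normalized energy and PPD, negated."""
    # Convert each energy reading to kWh (1 kWh = 3.6e6 J, assuming joules).
    eh = abs(heating_j) / 3600000
    ec = abs(cooling_j) / 3600000
    energy_term = (eh + ec) / emax  # normalized energy
    comfort_term = ppd / 100        # PPD is a percentage
    return -beta * energy_term - (1 - beta) * comfort_term


# Illustrative values only (not taken from a simulation).
heating_j, cooling_j, ppd, emax = 1.8e6, 0.0, 20.0, 2.0

for beta in (0.0, 0.5, 1.0):
    print(beta, compute_reward(heating_j, cooling_j, ppd, beta, emax))
```

With `beta = 0.0` only the PPD term contributes, and with `beta = 1.0` only the energy term does, which matches the formula given in the `env_config` comments.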

### **4.1.** Environment serializability check

In [None]:
from ray.util import inspect_serializability

# Check whether the environment class can be serialized, which Ray needs in order to distribute it
is_serializable, unserializable_objects = inspect_serializability(EnergyPlusEnv_v0, depth=10)
print(f"Is serializable: {is_serializable}")
if not is_serializable:
    print("Unserializable objects:")
    for obj in unserializable_objects:
        print(obj)

Checking Serializability of <class '__main__.EnergyPlusEnv_v0'>
!!! FAIL serialization: ctypes objects containing pointers cannot be pickled
    Serializing '__enter__' <function Env.__enter__ at 0x7e12871af520>...
    Serializing '__exit__' <function Env.__exit__ at 0x7e12871af5b0>...
    Serializing '__init__' <function EnergyPlusEnv_v0.__init__ at 0x7e1287a75900>...
    !!! FAIL serialization: ctypes objects containing pointers cannot be pickled
    Detected 1 global variables. Checking serializability...
        Serializing 'env_config' {'weather_folder': '/content/drive/My Drive/ep_drive/epw', 'output': '/content/drive/My Drive/ep_drive/output', 'epjson_folderpath': '/content/drive/My Drive/ep_drive/epjson', 'epjson_output_folder': '/content/drive/My Drive/ep_drive/models', 'ep_terminal_output': False, 'beta': 0.5, 'is_test': False, 'test_init_day': 1, 'action_space': Discrete(4), 'observation_space': Box(-inf, inf, (1465,), float32), 'building_name': 'prot_1', 'volumen': 131.6565
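
The failure message indicates that something reachable from the environment class holds a ctypes pointer, which `pickle` refuses to serialize. One common way around this kind of problem, assumed here rather than taken from the notebook, is to ship only plain configuration data between processes and to create the pointer-backed object after deserialization. The sketch below demonstrates the idea with the standard library only; `make_handle` and the `initial_value` key are hypothetical.

```python
import ctypes
import pickle


def is_picklable(obj):
    """Return True if `obj` survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (ValueError, TypeError, pickle.PicklingError):
        return False


def make_handle(config):
    """Hypothetical factory: build the ctypes pointer from plain config data."""
    return ctypes.pointer(ctypes.c_int(config["initial_value"]))


# A ctypes pointer cannot be pickled directly.
raw_pointer = ctypes.pointer(ctypes.c_int(0))
pointer_ok = is_picklable(raw_pointer)

# A plain dict can be pickled, sent to a worker, and used there to
# create the pointer locally.
config = {"initial_value": 7}
restored_config = pickle.loads(pickle.dumps(config))
handle = make_handle(restored_config)

print(pointer_ok, is_picklable(config), handle.contents.value)
```

Applied to this notebook, the same idea would mean creating any EnergyPlus API objects inside the environment (for example in `reset`) instead of at module level, so that the class and `env_config` contain only picklable data. Whether that is the actual cause of the failure here cannot be confirmed from the truncated output.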

## **5**. Algorithm configuration

In [None]:
ray.init()
# Initialize the Ray server.
register_env(name="EPEnv", env_creator=lambda args: EnergyPlusEnv_v0(args))
# Register the environment.

def trial_str_creator(trial):
    return "{}_{}_{}_REP".format(trial.trainable_name, trial.trial_id, tune.search.repeater.TRIAL_INDEX)


2024-02-16 10:33:32,115 INFO worker.py:1715 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265


ValueError: ctypes objects containing pointers cannot be pickled

### **5.1.** PPO Configuration

In [None]:
algorithm = 'PPO'
# PPO Algorithm Config
algo = PPOConfig().training(
  # General Algo Configs
  gamma=0.72 if not tune_runner else tune.uniform(0.7, 0.99),
  # Float specifying the discount factor of the Markov Decision process.
  lr=0.04 if not tune_runner else tune.uniform(0.001, 0.1),
  # The learning rate (float) or learning rate schedule
  #model=,
  # Arguments passed into the policy model. See models/catalog.py for a full list of the
  # available model options.
  train_batch_size=128,# if not tune_runner else tune.choice([128, 256]),
  # PPO Configs
  lr_schedule=None, # List[List[int | float]] | None = NotProvided,
  # Learning rate schedule. In the format of [[timestep, lr-value], [timestep, lr-value], …]
  # Intermediary timesteps will be assigned to interpolated learning rate values. A schedule
  # should normally start from timestep 0.
  use_critic=True, # bool | None = NotProvided,
  # Should use a critic as a baseline (otherwise don’t use value baseline; required for using GAE).
  use_gae=True, # bool | None = NotProvided,
  # If true, use the Generalized Advantage Estimator (GAE) with a value function,
  # see https://arxiv.org/pdf/1506.02438.pdf.
  lambda_=0.20216 if not tune_runner else tune.uniform(0, 1.0), # float | None = NotProvided,
  # The GAE (lambda) parameter.  The generalized advantage estimator for 0 < λ < 1 makes a
  # compromise between bias and variance, controlled by parameter λ.
  use_kl_loss=True, # bool | None = NotProvided,
  # Whether to use the KL-term in the loss function.
  kl_coeff=9.9712 if not tune_runner else tune.uniform(0.3, 10.0), # float | None = NotProvided,
  # Initial coefficient for KL divergence.
  kl_target=0.054921 if not tune_runner else tune.uniform(0.001, 0.1), # float | None = NotProvided,
  # Target value for KL divergence.
  sgd_minibatch_size=48,# if not tune_runner else tune.choice([48, 128]), # int | None = NotProvided,
  # Total SGD batch size across all devices for SGD. This defines the minibatch size
  # within each epoch.
  num_sgd_iter=6,# if not tune_runner else tune.randint(30, 60), # int | None = NotProvided,
  # Number of SGD iterations in each outer loop (i.e., number of epochs to execute per train batch).
  shuffle_sequences=True, # bool | None = NotProvided,
  # Whether to shuffle sequences in the batch when training (recommended).
  vf_loss_coeff=0.38584 if not tune_runner else tune.uniform(0.1, 1.0), # Tune this! float | None = NotProvided,
  # Coefficient of the value function loss. IMPORTANT: you must tune this if you set
  # vf_share_layers=True inside your model’s config.
  entropy_coeff=10.319 if not tune_runner else tune.uniform(0.95, 15.0), # float | None = NotProvided,
  # Coefficient of the entropy regularizer.
  entropy_coeff_schedule=None, # List[List[int | float]] | None = NotProvided,
  # Decay schedule for the entropy regularizer.
  clip_param=0.22107 if not tune_runner else tune.uniform(0.1, 0.4), # float | None = NotProvided,
  # The PPO clip parameter.
  vf_clip_param=39.327 if not tune_runner else tune.uniform(0, 50), # float | None = NotProvided,
  # Clip param for the value function. Note that this is sensitive to the scale of the
  # rewards. If your expected V is large, increase this.
  grad_clip=None, # float | None = NotProvided,
  # If specified, clip the global norm of gradients by this amount.
).environment(
  env="EPEnv",
  observation_space=gym.spaces.Box(float("-inf"), float("inf"), (49,)),
  action_space=gym.spaces.Discrete(4),
  env_config=env_config,
).framework(
  framework = 'torch',
).fault_tolerance(
  recreate_failed_workers = True,
  restart_failed_sub_environments=False,
).rollouts(
  num_rollout_workers = 1,# if not tune_runner else tune.grid_search([0, 1, 3]),
  create_env_on_local_worker=True,
  rollout_fragment_length = 'auto',
  enable_connectors = True,
  #batch_mode="truncate_episodes",
  num_envs_per_worker=1,
).experimental(
  _enable_new_api_stack = True,
).reporting( # multi_agent config goes here
  min_sample_timesteps_per_iteration = 2000,
).checkpointing(
  export_native_model_files = True,
).debugging(
  log_level = "ERROR",
  #seed=7,# if not tune_runner else tune.grid_search([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
).resources(
  num_gpus = 0,
)
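
The `lambda_` comment in the PPO configuration describes GAE as a compromise between bias and variance. A minimal pure-Python sketch of the estimator follows. The function name `gae_advantages` and the sample rewards and value estimates are illustrative assumptions, and RLlib's own implementation may differ in details such as the handling of episode boundaries.

```python
def gae_advantages(rewards, values, last_value, gamma, lam):
    """Generalized Advantage Estimation for one trajectory segment.

    delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    A_t     = delta_t + gamma * lam * A_{t+1}
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


# Illustrative numbers only: three steps with negative rewards.
rewards = [-0.5, -0.4, -0.3]
values = [-1.0, -0.8, -0.5]

one_step = gae_advantages(rewards, values, last_value=0.0, gamma=0.9, lam=0.0)
full_return = gae_advantages(rewards, values, last_value=0.0, gamma=0.9, lam=1.0)
print(one_step)
print(full_return)
```

With `lam=0.0` each advantage is the one-step TD error (low variance, more bias); with `lam=1.0` it is the discounted return minus the value estimate (less bias, more variance). Values in between, such as those sampled by `tune.uniform(0, 1.0)`, interpolate between the two.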

### **5.2.** DQN Configuration

In [None]:
algorithm = 'DQN'
algo = DQNConfig().training(
  # General Algo Configs
  gamma = 0.7 if not tune_runner else tune.uniform(0.7, 0.99),
  lr = 0.1 if not tune_runner else tune.uniform(0.001, 0.3),
  grad_clip = 0.5 if not tune_runner else tune.uniform(0.5, 40.0),
  grad_clip_by = 'global_norm',
  train_batch_size = 8,# if not tune_runner else tune.choice([4, 8, 128, 256]),
  model = {
    "fcnet_hiddens": [1024,512,512,512],
    "fcnet_activation": "relu", #if not tune_runner else tune.choice(['tanh', 'relu', 'swish', 'linear']),
  },
  optimizer = {},
  # DQN Configs
  num_atoms = 40,
  v_min = -1,
  v_max = 0,
  noisy = True,
  sigma0 = 0.66 if not tune_runner else tune.uniform(0, 1),
  dueling = True,
  hiddens = [512],
  double_q = True,
  n_step = 24,
  replay_buffer_config = {
    '_enable_replay_buffer_api': True,
    'type': 'MultiAgentPrioritizedReplayBuffer',
    'capacity': 100000,
    'prioritized_replay_alpha': 0.6,
    'prioritized_replay_beta': 0.4,
    'prioritized_replay_eps': 1e-6,
    'replay_sequence_length': 1,
  },
  categorical_distribution_temperature = 0.5 if not tune_runner else tune.uniform(0, 1),
).environment(
  env="EPEnv",
  env_config=env_config,
).framework(
  framework = 'torch',
).fault_tolerance(
  recreate_failed_workers = True,
  restart_failed_sub_environments=False,
).rollouts(
  num_rollout_workers = 1,
  create_env_on_local_worker=True,
  rollout_fragment_length = 'auto',
  enable_connectors = True,
  num_envs_per_worker=1,
).experimental(
  _enable_new_api_stack = False,
).reporting( # multi_agent config goes here
  min_sample_timesteps_per_iteration = 1000,
).checkpointing(
  export_native_model_files = True,
).debugging(
  log_level = "ERROR",
  #seed=7,
).resources(
  num_gpus = 0,
)
algo.exploration(
  exploration_config={
    "type": "EpsilonGreedy",
    "initial_epsilon": 1.,
    "final_epsilon": 0.,
    "epsilon_timesteps": 6*24*365*10,
  }
)
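
The `exploration_config` above anneals epsilon from 1.0 to 0.0 over `6*24*365*10` timesteps. The sketch below shows how such a schedule evolves, assuming linear interpolation between the initial and final values; the function name `linear_epsilon` is hypothetical and this is not RLlib's code.

```python
def linear_epsilon(timestep, initial_epsilon=1.0, final_epsilon=0.0,
                   epsilon_timesteps=6 * 24 * 365 * 10):
    """Linearly anneal epsilon, then hold it at `final_epsilon`."""
    # Fraction of the annealing period already elapsed, clamped to [0, 1].
    fraction = min(max(timestep / epsilon_timesteps, 0.0), 1.0)
    return initial_epsilon + fraction * (final_epsilon - initial_epsilon)


for step in (0, 262800, 525600, 1000000):
    print(step, linear_epsilon(step))
```

Since the experiment stops at `6*24*365*20` timesteps, under this assumption the agent acts fully greedily (epsilon 0.0) during the second half of each trial.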

### **5.3.** SAC Configuration

In [None]:
algorithm = 'SAC'
algo = SACConfig().training(
  # General Algo Configs
  gamma = 0.99 if not tune_runner else tune.uniform(0.7, 0.99),
  # Float specifying the discount factor of the Markov Decision process.
  lr = 0.1 if not tune_runner else tune.uniform(0.001, 0.1),
  # The learning rate (float) or learning rate schedule
  #grad_clip = None, #float
  # If None, no gradient clipping will be applied. Otherwise, depending on the setting of grad_clip_by, the (float)
  # value of grad_clip will have the following effect: If grad_clip_by=value: Will clip all computed gradients
  # individually inside the interval [-grad_clip, +`grad_clip`]. If grad_clip_by=norm, will compute the L2-norm of
  # each weight/bias gradient tensor individually and then clip all gradients such that these L2-norms do not exceed
  # grad_clip. The L2-norm of a tensor is computed via: sqrt(SUM(w0^2, w1^2, ..., wn^2)) where w[i] are the elements
  # of the tensor (no matter what the shape of this tensor is). If grad_clip_by=global_norm, will compute the square
  # of the L2-norm of each weight/bias gradient tensor individually, sum up all these squared L2-norms across all
  # given gradient tensors (e.g. the entire module to be updated), square root that overall sum, and then clip all
  # gradients such that this global L2-norm does not exceed the given value. The global L2-norm over a list of tensors
  # (e.g. W and V) is computed via: sqrt[SUM(w0^2, w1^2, ..., wn^2) + SUM(v0^2, v1^2, ..., vm^2)], where w[i] and v[j]
  # are the elements of the tensors W and V (no matter what the shapes of these tensors are).
  #grad_clip_by = 'global_norm', #str
  # See grad_clip for the effect of this setting on gradient clipping. Allowed values are value, norm, and global_norm.
  #train_batch_size = 128, # if not tune_runner else tune.randint(128, 257),
  #  Training batch size, if applicable.
  model = {
    "fcnet_hiddens": [256],
    "fcnet_activation": "relu",
  },
  # Arguments passed into the policy model. See models/catalog.py for a full list of the
  # available model options. TODO: Provide ModelConfig objects instead of dicts
  #optimizer = None, #dict
  # Arguments to pass to the policy optimizer. This setting is not used when _enable_new_api_stack=True.
  #max_requests_in_flight_per_sampler_worker = None, #int
  # Max number of inflight requests to each sampling worker. See the FaultTolerantActorManager class for more details.
  # Tuning these values is important when running experiments with large sample batches, where there is the risk that
  # the object store may fill up, causing spilling of objects to disk. This can cause any asynchronous requests to
  # become very slow, making your experiment run slow as well. You can inspect the object store during your experiment
  # via a call to ray memory on your headnode, and by using the ray dashboard. If you’re seeing that the object store
  # is filling up, turn down the number of remote requests in flight, or enable compression in your experiment of
  # timesteps.
  #learner_class = None,
  # The Learner class to use for (distributed) updating of the RLModule. Only used when _enable_new_api_stack=True.

  # SAC Configs
  twin_q = True, #bool
  # Use two Q-networks (instead of one) for action-value estimation. Note: Each Q-network will have its own target network.
  #q_model_config = #~typing.Dict[str, ~typing.Any]
  # Model configs for the Q network(s). These will override MODEL_DEFAULTS. This is treated just as the top-level model
  # dict in setting up the Q-network(s) (2 if twin_q=True). That means, you can do for different observation spaces:
  # obs=Box(1D) -> Tuple(Box(1D) + Action) -> concat -> post_fcnet obs=Box(3D) -> Tuple(Box(3D) + Action) ->
  # vision-net -> concat w/ action -> post_fcnet obs=Tuple(Box(1D), Box(3D)) -> Tuple(Box(1D), Box(3D), Action) ->
  # vision-net -> concat w/ Box(1D) and action -> post_fcnet You can also have SAC use your custom_model as Q-model(s),
  # by simply specifying the custom_model sub-key in below dict (just like you would do in the top-level model dict.
  #policy_model_config = #~typing.Dict[str, ~typing.Any]
  # Model options for the policy function (see q_model_config above for details). The difference to q_model_config above
  # is that no action concat’ing is performed before the post_fcnet stack.
  tau = 1.0, #float
  # Update the target by tau * policy + (1 - tau) * target_policy.
  initial_alpha = 0.5, #float
  # Initial value to use for the entropy weight alpha.
  target_entropy = 'auto', #str | float
  # Target entropy lower bound. If “auto”, will be set to -|A| (e.g. -2.0 for Discrete(2), -3.0 for Box(shape=(3,))). This
  # is the inverse of reward scale, and will be optimized automatically.
  n_step = 10, # if not tune_runner else tune.randint(1, 11), #int
  # N-step target updates. If >1, sars’ tuples in trajectories will be postprocessed to become
  # sa[discounted sum of R][s t+n] tuples.
  store_buffer_in_checkpoints = True, #bool
  # Set this to True, if you want the contents of your buffer(s) to be stored in any saved checkpoints as well. Warnings
  # will be created if: - This is True AND restoring from a checkpoint that contains no buffer data. - This is
  # False AND restoring from a checkpoint that does contain buffer data.
  replay_buffer_config = {
    '_enable_replay_buffer_api': True,
    'type': 'MultiAgentPrioritizedReplayBuffer',
    'capacity': 50000,
    'prioritized_replay_alpha': 0.6,
    'prioritized_replay_beta': 0.4,
    'prioritized_replay_eps': 1e-6,
    'replay_sequence_length': 1,
  },
  # Replay buffer config. Examples: { “_enable_replay_buffer_api”: True, “type”: “MultiAgentReplayBuffer”,
  # “capacity”: 50000, “replay_batch_size”: 32, “replay_sequence_length”: 1, } - OR - { “_enable_replay_buffer_api”: True,
  # “type”: “MultiAgentPrioritizedReplayBuffer”, “capacity”: 50000, “prioritized_replay_alpha”: 0.6,
  # “prioritized_replay_beta”: 0.4, “prioritized_replay_eps”: 1e-6, “replay_sequence_length”: 1, } - Where -
  # prioritized_replay_alpha: Alpha parameter controls the degree of prioritization in the buffer. In other words, when
  # a buffer sample has a higher temporal-difference error, with how much more probability should it be drawn
  # to update the parametrized Q-network. 0.0 corresponds to uniform probability. Setting it much above 1.0 may quickly
  # make the sampling distribution heavily “pointy”, with low entropy. prioritized_replay_beta: Beta
  # parameter controls the degree of importance sampling which suppresses the influence of gradient updates from
  # samples that have higher probability of being sampled via alpha parameter and the temporal-difference error.
  # prioritized_replay_eps: Epsilon parameter sets the baseline probability for sampling so that when the
  # temporal-difference error of a sample is zero, there is still a chance of drawing the sample.
  #training_intensity = #float
  # The intensity with which to update the model (vs collecting samples from the env). If None, uses “natural” values
  # of: train_batch_size / (rollout_fragment_length x num_workers x num_envs_per_worker). If not None, will make sure
  # that the ratio between timesteps inserted into and sampled from the buffer matches the given values. Example:
  # training_intensity=1000.0 train_batch_size=250 rollout_fragment_length=1 num_workers=1 (or 0) num_envs_per_worker=1 ->
  # natural value = 250 / 1 = 250.0 -> will make sure that replay+train op will be executed 4x as often as rollout+insert
  # op (4 * 250 = 1000). See: rllib/algorithms/dqn/dqn.py::calculate_rr_weights for further details.
  clip_actions = True, #bool
  # Whether to clip actions. If actions are already normalized, this should be set to False.
  #grad_clip = #float
  # If not None, clip gradients during optimization at this value.
  optimization_config = { #~typing.Dict[str, ~typing.Any]
    'actor_learning_rate': 0.005,
    'critic_learning_rate': 0.005,
    'entropy_learning_rate': 0.0001,
  },
  # Config dict for optimization. Set the supported keys actor_learning_rate, critic_learning_rate, and
  # entropy_learning_rate in here.
  target_network_update_freq = 144, #int
  # Update the target network every target_network_update_freq steps.
  #_deterministic_loss = #bool
  # Whether the loss should be calculated deterministically (w/o the stochastic action sampling step). True only useful
  # for continuous actions and for debugging.
  #_use_beta_distribution = #bool
  # Use a Beta-distribution instead of a SquashedGaussian for bounded, continuous action spaces (not recommended; for
  # debugging only).
).environment(
  env="EPEnv",
  observation_space=gym.spaces.Box(float("-inf"), float("inf"), (49,)),
  action_space=gym.spaces.Discrete(4),
  env_config=env_config,
).framework(
  framework = 'torch',
).fault_tolerance(
  recreate_failed_workers = True,
  restart_failed_sub_environments=False,
).rollouts(
  num_rollout_workers = 1,# if not tune_runner else tune.grid_search([0, 1, 3]),
  create_env_on_local_worker=True,
  rollout_fragment_length = 'auto',
  enable_connectors = True,
  #batch_mode="truncate_episodes",
  num_envs_per_worker=1,
).experimental(
  _enable_new_api_stack = True,
).reporting( # multi_agent config goes here
  min_sample_timesteps_per_iteration = 2000,
).checkpointing(
  export_native_model_files = True,
).debugging(
  log_level = "ERROR",
).resources(
  num_gpus = 0,
)
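
The `tau` comment in the SAC configuration describes the target network update rule. A small sketch of that rule on plain lists of numbers is given here; `soft_update` is a hypothetical name and the weights are made up for illustration.

```python
def soft_update(target_weights, online_weights, tau):
    """Return tau * online + (1 - tau) * target, element by element."""
    return [
        tau * online + (1.0 - tau) * target
        for online, target in zip(online_weights, target_weights)
    ]


target = [0.0, 2.0]
online = [1.0, 4.0]

print(soft_update(target, online, tau=0.25))  # partial move toward the online weights
print(soft_update(target, online, tau=1.0))   # full copy of the online weights
```

With `tau = 1.0`, as configured above, each update replaces the target weights entirely, so the target network behaves like a periodic hard copy taken every `target_network_update_freq` steps.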



## **6.** Experiment execution

In [None]:
if not restore:
    tune.Tuner(
        algorithm,
        tune_config=tune.TuneConfig(
            mode="max",
            metric="episode_reward_mean",
            num_samples=1000,
            # This is necessary so that the search_alg runs iteratively to improve the hyperparameters
            reuse_actors=False,
            trial_name_creator=trial_str_creator,
            trial_dirname_creator=trial_str_creator,

            #search_alg = Repeater(BayesOptSearch(),repeat=10),
            search_alg = BayesOptSearch(),
            # Search algorithm

            #scheduler = ASHAScheduler(time_attr = 'timesteps_total', max_t=6*24*365*3, grace_period=6*24*365),
            # Scheduler algorithm

        ),
        run_config=air.RunConfig(
            name='BOS_VN_P1_'+str(env_config['beta'])+'_'+str(algorithm),
            stop={"timesteps_total": 6*24*365*20},
            log_to_file=True,

            checkpoint_config=air.CheckpointConfig(
                checkpoint_at_end = True,
                checkpoint_frequency = 40,
                #num_to_keep = 20
            ),
            failure_config=air.FailureConfig(
                max_failures=100
                # Tries to recover a run up to this many times.
            ),
        ),
        param_space=algo.to_dict(),
    ).fit()

else:
    tune.Tuner.restore(
        path=restore_path,
        trainable = algorithm,
        resume_errored=True
    ).fit()
    # Tuner.restore() only rebuilds the Tuner; fit() resumes the experiment.

"""## END EXPERIMENT AND SHUT DOWN THE RAY SERVER
"""
ray.shutdown()
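
The stopping criterion and checkpoint settings above are easier to reason about once translated into episodes and iterations. The arithmetic below is a sketch under two assumptions that the notebook does not state explicitly: the simulation runs at six timesteps per hour, and `episode_len` counts days.

```python
import math

STEPS_PER_HOUR = 6  # assumption: 10-minute simulation timesteps
EPISODE_DAYS = 365  # assumption: env_config['episode_len'] counts days

steps_per_episode = STEPS_PER_HOUR * 24 * EPISODE_DAYS
stop_timesteps = 6 * 24 * 365 * 20  # value passed to `stop` in RunConfig
episodes_per_trial = stop_timesteps / steps_per_episode

# Each iteration samples at least `min_sample_timesteps_per_iteration`
# timesteps, which bounds the number of iterations per trial from above.
min_sample_timesteps = 1000  # DQN setting; PPO and SAC use 2000
max_iterations = math.ceil(stop_timesteps / min_sample_timesteps)
max_periodic_checkpoints = max_iterations // 40  # checkpoint_frequency = 40

print(steps_per_episode, stop_timesteps, episodes_per_trial)
print(max_iterations, max_periodic_checkpoints)
```

Under these assumptions each trial covers 20 simulated years, and `num_samples=1000` multiplies that budget by the number of trials, which is worth keeping in mind given the limited runtime of a Colab session.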
