## Customising the environment class - Actions and Observations

For more complex systems (and therefore more complex Simulink models), custom action and observation spaces are very useful. There can also be external inputs that are not controlled by the agent. Both are shown here using an example from the field of energy systems.

The example is the energy management of a DC microgrid. The microgrid consists of a PV array, a battery storage, a load and a grid connection. The corresponding Simulink file (MicrogridExample.slx) and the required Simscape files (shortCircuit.ssc, VariablePowerLoad.ssc and Wire_2Cond.ssc) are included. The main idea is to optimise the use of the battery storage in order to minimise the energy consumption from the main grid and maximise the use of PV energy.

As a first step, the config file is read and an FMU is generated.

In [3]:
# import packages as in the other examples
import stablerls.gymFMU as gymFMU
import stablerls.configreader as cfg_reader
import stablerls.createFMU as createFMU
import numpy as np
import logging

# normally we don't recommend info-level logging, but here it is used for demonstration
logging.basicConfig(level=logging.INFO)

# read config-file
config = cfg_reader.configreader('03-config.cfg')

# create FMU
createFMU.createFMU(config,'MicrogridExample.slx')

After the FMU has been generated, an instance of the corresponding Gymnasium environment is created. Because info-level logging is active, all inputs and outputs are listed.

In [4]:
# create instance of the model
env = gymFMU.StableRLS(config)

INFO:stablerls.fmutools:Using: 03-MicrogridFMU.fmu
INFO:stablerls.fmutools:Unzipped in C:\Users\cfn2169\AppData\Local\Temp\tmp1cj_nzza
INFO:stablerls.fmutools:Found inputs - access them by the corresponding number:
INFO:stablerls.fmutools: 0: Control.InptPV.ModuleTemperature.ModuleTemperature
INFO:stablerls.fmutools: 1: Control.InptPV.Irradiance.Irradiance
INFO:stablerls.fmutools: 2: Control.InptLoad.LoadPower
INFO:stablerls.fmutools: 3: Control.InptBattery.VoltageReference
INFO:stablerls.fmutools: 4: Control.InptBattery.SOC_Init
INFO:stablerls.fmutools: 5: Control.InptGrid.VoltageReference
INFO:stablerls.fmutools:Found outputs - access them by the corresponding number:
INFO:stablerls.fmutools: 0: Measurement.PV.V
INFO:stablerls.fmutools: 1: Measurement.PV.P
INFO:stablerls.fmutools: 2: Measurement.PV.I
INFO:stablerls.fmutools: 3: Measurement.Grid.V
INFO:stablerls.fmutools: 4: Measurement.Grid.P
INFO:stablerls.fmutools: 5: Measurement.Grid.I
INFO:stablerls.fmutools: 6: Measurement.Load.

The input list shows that only some inputs should be controlled by the agent, while others depend on environmental conditions such as the current weather or are fixed system parameters. In this example, only the voltage references of the grid and battery converters are controlled. Irradiance and module temperature for the PV module model are external inputs, for which real data from an external source can be imported. The initial SOC at the start of each training episode is randomised to cover many possible scenarios. The reference voltages may only change in discrete steps, so the action space consists of two discrete actions with 11 possible states each. During action assignment, they are converted into reference steps between -0.2 and 0.2 V (0.04 V per index, as implemented in *assignAction*).
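
The conversion from a discrete action index to a voltage step can be sketched in isolation. This is a minimal sketch and not part of StableRLS: the helper name `action_to_voltage_step` is hypothetical, and the centre index 5 and the step size of 0.04 V per index are taken from the *assignAction* method defined later in this section.

```python
N_STATES = 11          # number of discrete states per action
CENTRE = 5             # index that leaves the reference unchanged
STEP_PER_INDEX = 0.04  # volts per index, as used in assignAction


def action_to_voltage_step(index: int) -> float:
    """Map a discrete action index (0..10) to a voltage reference step in volts."""
    if not 0 <= index < N_STATES:
        raise ValueError(f"action index {index} outside 0..{N_STATES - 1}")
    return (index - CENTRE) * STEP_PER_INDEX


# all possible steps for one of the two actions, rounded for display
steps = [round(action_to_voltage_step(i), 2) for i in range(N_STATES)]
print(steps)
```

Index 5 yields a zero step, so the action `[5, 5]` keeps both references at their current value.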

In addition, the model exposes more outputs than the agent needs, since some of them contain similar information. Knowledge of the voltages, the source currents and the battery SOC is sufficient. This results in 8 observations. They are continuous but bounded: the observation processing normalises them to values between -1 and +1. For the normalisation, the nominal voltage and current are added to the config file:

```
[General]
nominal_voltage = 48.0
nominal_current = 25.0
```
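
To make the normalisation concrete, the following sketch parses a config fragment like the one above and applies the voltage and current scaling used in *obs_processing*. It is only an illustration under assumptions: it uses Python's standard `configparser` rather than the StableRLS config reader, and the helper names `normalise_voltage` and `normalise_current` are hypothetical.

```python
import configparser

CONFIG_TEXT = """
[General]
nominal_voltage = 48.0
nominal_current = 25.0
"""

# parse the fragment with the standard library (assumption: INI syntax)
parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)
nominal_voltage = parser.getfloat("General", "nominal_voltage")
nominal_current = parser.getfloat("General", "nominal_current")


def normalise_voltage(v: float) -> float:
    """Deviation from nominal voltage, scaled so that +-10 % maps to +-1."""
    return round((v - nominal_voltage) / (0.1 * nominal_voltage), 2)


def normalise_current(i: float) -> float:
    """Current relative to the nominal current."""
    return round(i / nominal_current, 2)


print(normalise_voltage(48.0))  # nominal voltage
print(normalise_voltage(43.2))  # 10 % below nominal
print(normalise_current(12.5))  # half the nominal current
```

A voltage outside the +-10 % band gives a value outside +-1, so the bounds of the observation space only hold while the bus voltage stays within that band.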

Therefore, the spaces need to be modified. For this, a new class is defined that inherits from the StableRLS base class and overrides the corresponding setter methods. In addition to the setter methods, the methods for assigning the actions and processing the observations are overridden to be consistent with the new spaces. For the external, weather-dependent values, the predefined function *FMU_external_input* is used. Here only fixed values are used, but in the next example files containing weather data are read.

In addition, the *reset_* function is changed to choose a random SOC for the start of the simulation and to set the action signals for the first run. This function could be extended further to read real weather data, e.g. for different times.

In [2]:
import stablerls.gymFMU as gymFMU
import stablerls.configreader as cfg_reader
import stablerls.createFMU as createFMU
import gymnasium as gym
import numpy as np
import logging
import random

logger = logging.getLogger(__name__)

class GridEnv(gymFMU.StableRLS):
    def set_action_space(self):
        """Setter function for the action space of the agent. 
        This function overrides the base implementation of StableRLS to 
        choose only certain inputs as actions.

        Returns
        -------
        space : gymnasium.space
            Returns the action space defined by specified FMU inputs
        """
        return gym.spaces.MultiDiscrete([11, 11])
    
    def set_observation_space(self):
        """Setter function for the observation space of the agent. 
        This function overrides the base implementation of StableRLS to 
        choose only certain outputs as observations.

        Returns
        -------
        space : gymnasium.space
            Returns the observation space defined by specified FMU outputs
        """
        # eight observations, each bounded to [-1, 1]
        high = np.ones(8, dtype=np.float32)
        low = -high
        return gym.spaces.Box(low, high)
    
    def assignAction(self, action):
        """Changed assignment of actions to the FMU because only certain inputs
        are used for the agent actions.

        Parameters
        ----------
        action : list
            An action provided by the agent to update the environment state.
        """
        # assign actions to inputs
        # check if actions are within action space
        if not self.action_space.contains(action):
            logger.info(f"The actions are not within the action space. Action: {action}. Time: {self.time}")

        # convert discrete actions (0..10) into steps of the voltage references
        vStepGrid = (action[0] - 5) * 0.04
        vStepBat = (action[1] - 5) * 0.04

        # add them to the current references to get the new setpoints
        # input 5: Control.InptGrid.VoltageReference
        # input 3: Control.InptBattery.VoltageReference
        vRefGrid = self.fmu.fmu.getReal([self.fmu.input[5].valueReference])[0] + vStepGrid
        vRefBat = self.fmu.fmu.getReal([self.fmu.input[3].valueReference])[0] + vStepBat

        # assign actions to the FMU inputs - take care of right indices!
        self.fmu.fmu.setReal([self.fmu.input[5].valueReference], [vRefGrid])
        self.fmu.fmu.setReal([self.fmu.input[3].valueReference], [vRefBat])

    def obs_processing(self, raw_obs):
        """Customised observation processing: Only specific outputs are evaluated. Additionally,
        they are normalised to +-1.

        Parameters
        ----------
        raw_obs : ObsType
            The raw observation defined by all FMU outputs.
        Returns
        -------
        observation : ObsType
            The processed observation for the agent.
        """

        nDec = 2                             # decimals kept after rounding
        vScale = 0.1 * self.nominal_voltage  # +-10 % voltage deviation maps to +-1
        observation = np.array([
            round((raw_obs[0] - self.nominal_voltage) / vScale, nDec),   # PV.V
            round((raw_obs[3] - self.nominal_voltage) / vScale, nDec),   # Grid.V
            round((raw_obs[6] - self.nominal_voltage) / vScale, nDec),   # Load.V
            round((raw_obs[13] - self.nominal_voltage) / vScale, nDec),  # Bat.V
            round(raw_obs[2] / self.nominal_current, nDec),              # PV.I
            round(raw_obs[5] / self.nominal_current, nDec),              # Grid.I
            round(raw_obs[12] / self.nominal_current, nDec),             # Bat.Inet
            round(raw_obs[11], nDec),                                    # Bat.SOC
        ]).astype(np.float32)

        return observation
    
    def reset_(self, seed=None):
        """Since zeros make no sense for the voltage references, the input reset is
        changed. Also, the initial SOC is chosen randomly between limits.

        Parameters
        ----------
        seed : int, optional
            Not used in this example.
        """
        # set voltage references to nominal voltage
        self.fmu.fmu.setReal([self.fmu.input[3].valueReference], [self.nominal_voltage])
        self.fmu.fmu.setReal([self.fmu.input[5].valueReference], [self.nominal_voltage])

        # set initial SOC, randomly chosen
        rand = random.random()
        if rand < 0.15:
            self.soc_init = 0.15
        elif rand > 0.85:
            self.soc_init = 0.85
        else:
            self.soc_init = rand 
        self.fmu.fmu.setReal([self.fmu.input[4].valueReference], [self.soc_init])

        # all other inputs are set during the calculation of the first step since 
        # they are external

        # get the first observation as specified by Gymnasium
        self._next_observation(steps=1)
        return self.obs_processing(self.outputs[self.step_count, :])

    def FMU_external_input(self):
        """This function is called before each FMU step. Here external FMU
        inputs independent of the agent action are set. In this case, this includes
        weather data and the load power.

        Use the code below to access the FMU inputs.
        self.fmu.fmu.setReal([self.fmu.input[0].valueReference], [value])
        """
        # Irradiance
        self.fmu.fmu.setReal([self.fmu.input[1].valueReference], [1000.0])

        # ModuleTemperature
        self.fmu.fmu.setReal([self.fmu.input[0].valueReference], [30.0])

        # LoadPower
        self.fmu.fmu.setReal([self.fmu.input[2].valueReference], [500.0])
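
The initial-SOC selection in *reset_* can be examined on its own. The sketch below reproduces the same clamping with a seeded `random.Random` instance. The function name `sample_soc_init` and the use of a seed are assumptions for illustration, since the class above draws from the module-level generator and does not use `seed`.

```python
import random

SOC_MIN, SOC_MAX = 0.15, 0.85


def sample_soc_init(rng: random.Random) -> float:
    """Draw a value in [0, 1) and clamp it to [SOC_MIN, SOC_MAX]."""
    return min(max(rng.random(), SOC_MIN), SOC_MAX)


rng = random.Random(42)  # fixed seed only to make the sketch reproducible
samples = [sample_soc_init(rng) for _ in range(10_000)]

# share of episodes that start exactly at one of the two limits
at_limits = sum(s in (SOC_MIN, SOC_MAX) for s in samples) / len(samples)
print(f"min={min(samples)}, max={max(samples)}, share at limits={at_limits:.2f}")
```

Because values below 0.15 and above 0.85 are clamped, roughly 30 % of the episodes start exactly at one of the two limits. If an even spread over the range is preferred, `rng.uniform(SOC_MIN, SOC_MAX)` is an alternative.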

With this newly defined class, a new environment object is created and simulated for 10 steps. As the output shows, each observation now consists of the eight processed values: the four voltages, the three currents and the battery SOC.

In [3]:
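# change logging level to warnings only; a second call to logging.basicConfig
# has no effect once the root logger is configured, so set the level directly
logging.getLogger().setLevel(logging.WARNING)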

# read config-file, this stays the same for the changed environment
config = cfg_reader.configreader('03-config.cfg')

# create new env object and reset it before simulating
microgrid = GridEnv(config)
obs = microgrid.reset()

# for this example, the actions are kept constant at the reference value of 48 V
action = np.array([5,5])

terminated = False
truncated = False
while not (terminated or truncated):
    observation, reward, terminated, truncated, info = microgrid.step(action)
    print(f'Action: {action}\nObservation: {observation}\n')

microgrid.close()

Action: [5 5]
Observation: [-0.12 -0.1  -0.17 -0.1   0.    0.11  0.17  0.22]

Action: [5 5]
Observation: [-0.13 -0.1  -0.19 -0.1   0.01  0.15  0.22  0.22]

Action: [5 5]
Observation: [-0.13 -0.1  -0.19 -0.1   0.01  0.16  0.24  0.22]

Action: [5 5]
Observation: [-0.13 -0.1  -0.19 -0.1   0.02  0.16  0.24  0.22]

Action: [5 5]
Observation: [-0.13 -0.1  -0.19 -0.1   0.02  0.16  0.24  0.22]

Action: [5 5]
Observation: [-0.13 -0.1  -0.19 -0.1   0.03  0.16  0.24  0.22]

Action: [5 5]
Observation: [-0.13 -0.1  -0.19 -0.1   0.03  0.16  0.23  0.22]

Action: [5 5]
Observation: [-0.12 -0.1  -0.19 -0.1   0.04  0.16  0.23  0.22]

Action: [5 5]
Observation: [-0.12 -0.1  -0.19 -0.1   0.04  0.16  0.23  0.22]

Action: [5 5]
Observation: [-0.12 -0.1  -0.19 -0.1   0.05  0.16  0.22  0.22]
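
To interpret these numbers, the normalisation can be inverted. The following sketch converts the first printed observation back to physical units, assuming the nominal values from the config file (48 V, 25 A) and the ordering used in *obs_processing*. The helper `denormalise` is hypothetical and not part of StableRLS. Because the observations are rounded to two decimals, the recovered values are approximate.

```python
NOMINAL_VOLTAGE = 48.0  # V, from the config file
NOMINAL_CURRENT = 25.0  # A, from the config file

LABELS = ["PV.V", "Grid.V", "Load.V", "Bat.V",
          "PV.I", "Grid.I", "Bat.Inet", "Bat.SOC"]


def denormalise(observation):
    """Invert the scaling of obs_processing: 4 voltages, 3 currents, 1 SOC."""
    volts = [NOMINAL_VOLTAGE * (1 + 0.1 * x) for x in observation[:4]]
    amps = [NOMINAL_CURRENT * x for x in observation[4:7]]
    soc = [observation[7]]  # the SOC is passed through unscaled
    return dict(zip(LABELS, volts + amps + soc))


first_obs = [-0.12, -0.1, -0.17, -0.1, 0.0, 0.11, 0.17, 0.22]
physical = denormalise(first_obs)
for name, value in physical.items():
    print(f"{name}: {value:.2f}")
```

For the first step this gives a PV voltage of about 47.4 V and a grid current of about 2.75 A, with the battery at an SOC of 0.22.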

