## Defining a reward function
One of the most important steps in reinforcement learning is defining the reward function. This example shows how to do this in StableRLS.

In [1]:
# this contains the environment class
import stablerls.gymFMU as gymFMU
# this will read our config file
import stablerls.configreader as cfg_reader

import numpy as np
import logging

In [22]:
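class my_env(gymFMU.StableRLS):
    def get_reward(self, action, observation):
        """Custom reward: the squared model output."""
        info = {}
        # the reward grows with the magnitude of the FMU output
        reward = observation**2
        # no early termination from the reward; the episode ends when the simulation finishes
        terminated = False
        truncated = False
        return reward, terminated, truncated, info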
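The reward above simply squares the output. As a sketch of something closer to a control task, assume the goal is to keep the single output near a target value. The function below (the name `setpoint_reward` and the `setpoint` and `limit` values are hypothetical, not part of the example model) keeps the same `(reward, terminated, truncated, info)` return signature as `get_reward`, so its body could be moved into the method. It returns a plain `float`, since Gymnasium-based agents generally expect a scalar reward.

```python
import numpy as np


def setpoint_reward(action, observation, setpoint=5.0, limit=50.0):
    """Hypothetical reward: penalize the distance from a setpoint.

    Returns the same 4-tuple as get_reward in this example:
    (reward, terminated, truncated, info).
    """
    observation = np.asarray(observation, dtype=float)
    # negative squared error: the closer to the setpoint, the higher the reward
    error = observation - setpoint
    reward = float(-np.sum(error**2))
    # end the episode early if any output leaves the assumed safe range
    terminated = bool(np.any(np.abs(observation) > limit))
    truncated = False
    info = {"error": error}
    return reward, terminated, truncated, info
```

For the first observation printed below (`[3.]`), this would give a reward of `-4.0` (since `(3 - 5)**2 = 4`) and keep the episode running.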
    

For simplicity, compiled FMU models for Linux and Windows are already included. If you have MATLAB, you can also compile the `*.slx` model yourself; in that case, keep the default `FMU_path` in the config file. Otherwise, change `FMU_path` to `00-Simulink_Windows.fmu` or `00-Simulink_Linux.fmu`, depending on your operating system.
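If you switch between machines, picking the precompiled file can be automated. The helper below is a minimal sketch: `default_fmu_path` is a hypothetical name, the file names come from this example, and it only returns the name; how you write it into `FMU_path` in the config file is up to you. It relies on Python's `platform.system()`, which returns `"Windows"` or `"Linux"` on those systems.

```python
import platform


def default_fmu_path(system=None):
    """Return the precompiled FMU file name for the given (or current) OS."""
    system = system or platform.system()
    if system == "Windows":
        return "00-Simulink_Windows.fmu"
    if system == "Linux":
        return "00-Simulink_Linux.fmu"
    # no precompiled FMU is shipped for other systems (e.g. "Darwin" on macOS)
    raise RuntimeError(
        f"No precompiled FMU for {system!r}; compile the .slx model instead"
    )
```

Calling `default_fmu_path()` on Linux returns `"00-Simulink_Linux.fmu"`.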

In [23]:
# First, read the config file
config = cfg_reader.configreader('00-config.cfg')

# Optionally compile the Simulink model into an FMU.
# This requires MATLAB and the MATLAB Engine for Python!
compile_model = False
if compile_model:
    import stablerls.createFMU as createFMU
    createFMU.createFMU(config, 'SimulinkExample00.slx')
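The cell above switches compilation on or off by hand. A slightly safer variant, sketched below, also checks whether the MATLAB Engine for Python is installed before calling `createFMU`, and falls back to the precompiled FMU otherwise. The function names `matlab_engine_available` and `maybe_compile_fmu` are hypothetical; the check uses `importlib.util.find_spec` on the top-level `matlab` package (the engine is imported as `matlab.engine`), which locates the package without importing it.

```python
from importlib.util import find_spec


def matlab_engine_available():
    """Return True if the MATLAB Engine for Python can be imported."""
    # the engine ships as the top-level package `matlab`; find_spec only
    # locates it and returns None if it is not installed
    return find_spec("matlab") is not None


def maybe_compile_fmu(config, slx_file, enabled=False):
    """Compile the Simulink model only when requested and MATLAB is present.

    Returns True if compilation was started, False otherwise.
    """
    if not enabled:
        return False
    if not matlab_engine_available():
        print("MATLAB Engine for Python not found; using the precompiled FMU.")
        return False
    import stablerls.createFMU as createFMU  # only needed when compiling
    createFMU.createFMU(config, slx_file)
    return True
```

With this helper, the compile step becomes `maybe_compile_fmu(config, 'SimulinkExample00.slx', enabled=True)`.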

The FMU is now available, and the default options of the StableRLS Gymnasium environment are sufficient to run a first simulation.

In [26]:
# create an instance of the environment
env = my_env(config)

# reset the environment before the simulation starts
obs = env.reset()

# start with a fixed action and double it after every step
action = np.array([1, 2, 3, 4])

terminated = False
while not terminated:
    observation, reward, terminated, truncated, info = env.step(action)
    print(f'Action: {action}\nObservation: {observation}\nReward: {reward}\n')
    action = action * 2

env.close()

Action: [1 2 3 4]
Observation: [3.]
Reward: [9.]

Action: [2 4 6 8]
Observation: [6.]
Reward: [36.]



To take results from previous steps into account in a more complex reward calculation, use `env.inputs`/`self.inputs` or `env.outputs`/`self.outputs`.
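As a sketch of such a reward, the function below combines the squared output used in `get_reward` with a penalty on how much the output changed since the previous step. It assumes the output history is a 2D array with one row per step; this layout, the name `change_penalty_reward`, and the `weight` parameter are assumptions, so check how StableRLS actually stores `self.outputs` before relying on it.

```python
import numpy as np


def change_penalty_reward(outputs, step, weight=1.0):
    """Hypothetical reward that also penalizes jumps between consecutive outputs.

    Assumes `outputs` has one row per step and row `step` is the current output.
    """
    current = np.asarray(outputs[step], dtype=float)
    # same base reward as get_reward: the squared output
    reward = float(np.sum(current**2))
    if step > 0:
        previous = np.asarray(outputs[step - 1], dtype=float)
        # subtract the squared change relative to the previous step
        reward -= weight * float(np.sum((current - previous) ** 2))
    return reward
```

Using the two observations printed above as a history (`[[3.0], [6.0]]`), the first step gives `9.0` and the second gives `36 - 9 = 27.0`.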