# Experimental Results 1: SolarBatteryHouse

In this notebook, we run some experiments on the default version of the main Bauwerk environment, `SolarBatteryHouse`. For a full description of the environment, [see here](../../envs/solar_battery_house.ipynb). We recommend rerunning these experiments as baselines for any work that uses this environment.

<div class="alert alert-info">

Note

This notebook uses Gym v0.21 because Stable Baselines3 (SB3) requires it. As a result, the environment API differs from the one used on other pages of the Bauwerk docs.

</div>
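The API difference mainly concerns the return values of `reset()` and `step()`. The following is a minimal sketch of that difference, not Bauwerk code: `ToyOldApiEnv` and `step_new_style` are hypothetical names introduced only for illustration, and the toy environment is assumed to end solely through a time limit.

```python
from typing import Any, Dict, Tuple


class ToyOldApiEnv:
    """Hypothetical stand-in environment that follows the Gym v0.21 API."""

    def __init__(self, max_steps: int = 3) -> None:
        self.max_steps = max_steps
        self.t = 0

    def reset(self) -> float:
        # Gym v0.21: reset() returns only the initial observation
        self.t = 0
        return 0.0

    def step(self, action: float) -> Tuple[float, float, bool, Dict[str, Any]]:
        # Gym v0.21: step() returns (obs, reward, done, info)
        self.t += 1
        done = self.t >= self.max_steps
        # Assumption: the toy episode only ends via its time limit,
        # which is flagged in info as Gym's TimeLimit wrapper does
        info = {"TimeLimit.truncated": done}
        return float(self.t), -abs(action), done, info


def step_new_style(env: ToyOldApiEnv, action: float):
    """Repackage a 4-tuple step result as (obs, reward, terminated, truncated, info)."""
    obs, reward, done, info = env.step(action)
    truncated = bool(info.get("TimeLimit.truncated", False))
    terminated = done and not truncated
    return obs, reward, terminated, truncated, info


env = ToyOldApiEnv(max_steps=2)
obs = env.reset()
print(step_new_style(env, 0.5))  # episode still running
print(step_new_style(env, 0.5))  # time limit reached
```

In newer Gym versions, `reset()` additionally returns an `info` dictionary and `step()` returns the five-element tuple directly, so code on this page that unpacks four values would need adjusting there.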

## Evaluation

There is no single way in which performance on `SolarBatteryHouse` can or should be measured. One obvious choice is to compare the cumulative reward directly, i.e. the overall grid energy payments. However, the magnitude of this value depends entirely on the configuration of the environment, which makes it difficult to compare performance between environments, even within Bauwerk. Instead, because Bauwerk gives access to the optimal control actions, we propose measuring *performance relative to random and optimal control*,

$p' = \frac{p_m-p_r}{p_o-p_r}$,

where $p_m$ is the average reward of the method to be evaluated, and $p_r$ and $p_o$ are the average rewards of random and optimal control, respectively.

<div class="alert alert-info">

Note

With the performance measure $p'$, a value $>0$ means that the method is better than random, and a value close to $1$ means that it is close to optimal. A value $<0$ means that the method is worse than random, i.e. of no practical use.

</div>
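To make the scale of $p'$ concrete, here is a small sketch that evaluates the formula for a few cases. The function name `relative_performance` and all numbers are illustrative assumptions, not results from this notebook.

```python
def relative_performance(p_m: float, p_r: float, p_o: float) -> float:
    """Performance of a method relative to random (0) and optimal (1) control."""
    if p_o == p_r:
        # The measure is undefined if random and optimal control perform equally
        raise ValueError("optimal and random performance must differ")
    return (p_m - p_r) / (p_o - p_r)


# Illustrative average rewards only (hypothetical values)
p_r, p_o = -0.5, -0.1
cases = [
    ("same as random", -0.5),
    ("halfway", -0.3),
    ("same as optimal", -0.1),
    ("worse than random", -0.6),
]
for name, p_m in cases:
    print(f"{name}: {relative_performance(p_m, p_r, p_o):.2f}")
```

Because both the numerator and the denominator are differences of average rewards, the measure does not change if all rewards are shifted or rescaled by a positive constant, which is what makes it comparable across configurations.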



## Baselines

### Random actions

Let's start by establishing the lowest bar: what performance do we get when we just take random actions in the environment (sampled from the action space)? To do this, we first need to decide how many actions we want to evaluate over (`EVAL_LEN`) and set up some helper code.

In [1]:
# Setup and helper code
import bauwerk
import gym
import numpy as np

EVAL_LEN = 24*30 # evaluate on 1 month of actions

# Create SolarBatteryHouse environment
env = gym.make("bauwerk/SolarBatteryHouse-v0")

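def evaluate_actions(actions, env):
    """Replay `actions` in `env` after a reset and return the average reward per step."""
    cum_reward = 0
    obs = env.reset()  # Gym v0.21 API: reset() returns only the observation
    for action in actions:
        # Gym v0.21 API: step() returns (obs, reward, done, info)
        obs, reward, done, info = env.step(np.array(action, dtype=np.float32))
        cum_reward += reward

    return cum_reward / len(actions)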



In [2]:
# mean random performance over 100 trials
random_trials = [evaluate_actions([env.action_space.sample() for _ in range(EVAL_LEN)], env) for _ in range(100)]
random_std = np.std(random_trials)
p_rand = np.mean(random_trials)
# note: std here is between different trials (of multiple actions)
print(f"Avg reward with random actions: {p_rand:.4f} (standard deviation: {random_std:.4f})")

Avg reward with random actions: -0.5459 (standard deviation: 0.0233)


### Optimal actions

`SolarBatteryHouse` is a fully tractable environment. Thus, Bauwerk can easily compute the theoretically optimal actions one can take in the environment.

In [3]:
optimal_actions, _ = bauwerk.solve(env)
p_opt = evaluate_actions(optimal_actions.reshape((-1,1))[:EVAL_LEN], env)
print(f"Avg reward (per step) with optimal actions: {p_opt:.4f}")

Avg reward (per step) with optimal actions: -0.1036


## Reinforcement learning agent

Next, we consider a simple reinforcement learning (RL) agent. We use [Stable Baselines3 (SB3)](https://github.com/DLR-RM/stable-baselines3) to access RL algorithm implementations.

In [4]:
from stable_baselines3 import PPO

model = PPO(
    policy="MultiInputPolicy",
    env="bauwerk/SolarBatteryHouse-v0", 
    verbose=0,
)
model.learn(total_timesteps=25000)

<stable_baselines3.ppo.ppo.PPO at 0xffff5b79f970>

In [5]:
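# Obtain model actions from one rollout, then evaluate them
model_actions = []
obs = env.reset()
for i in range(EVAL_LEN):
    # note: predict() samples actions stochastically unless deterministic=True is passed
    action, _states = model.predict(obs)
    model_actions.append(action)
    obs, _, _, _ = env.step(action)

# evaluate_actions() resets the env and replays the recorded actions
p_model = evaluate_actions(model_actions, env)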
print(f"Avg reward (per step) with model actions: {p_model:.4f}")

Avg reward (per step) with model actions: -0.1676


In [6]:
# Measuring performance relative to random and optimal (p_bar is p' from the Evaluation section)
p_bar = (p_model - p_rand)/(p_opt - p_rand)
print(f"Performance relative to random and optimal: {p_bar:.4f}")

Performance relative to random and optimal: 0.8553
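
As a check, substituting the rounded averages printed above into the definition of $p'$ reproduces this value:

$p' = \frac{-0.1676 - (-0.5459)}{-0.1036 - (-0.5459)} = \frac{0.3783}{0.4423} \approx 0.8553$

The notebook computes the ratio from unrounded averages, so a hand calculation with the rounded numbers can differ slightly in the last digit.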
