# Reinforcement Learning control strategies for Electric Vehicles fleet Virtual Power Plants
This thesis develops an RL agent that manages a Virtual Power Plant (VPP) through EV charging stations in a household environment. The main optimization objectives of the VPP are valley filling, peak shaving, and a zero resulting load over time. The main actions used to reach these objectives are storing energy from renewable resources and pushing power into the grid at times of high demand. The VPP environment is built on ELVIS (Electric Vehicles Infrastructure Simulator), an open library from DAI-Labor (https://github.com/dailab/elvis). The thesis code is currently available at https://github.com/francescomaldonato/RL_VPP_Thesis
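
To make the objectives above concrete, here is a minimal toy sketch of storing surplus energy and releasing it at high demand so that the resulting load moves toward zero. It is not the thesis agent or the `VPPEnv` logic: the function name `dispatch_storage`, the greedy rule, the sign convention (negative load means renewable surplus), and the lossless battery are all assumptions made for illustration. The 0.25 h step matches a 15-minute resolution.

```python
def dispatch_storage(net_load_kw, capacity_kwh, soc_kwh=0.0, step_h=0.25):
    """Greedy toy dispatch (illustrative assumption, not the thesis policy).

    net_load_kw: load per time step in kW, negative values mean surplus.
    Returns the residual load per step and the final stored energy in kWh.
    """
    residual = []
    for load in net_load_kw:
        if load < 0:
            # Surplus: store as much as still fits in the battery.
            charge_kw = min(-load, (capacity_kwh - soc_kwh) / step_h)
            soc_kwh += charge_kw * step_h
            residual.append(load + charge_kw)
        else:
            # Deficit: discharge what is stored to shave the peak.
            discharge_kw = min(load, soc_kwh / step_h)
            soc_kwh -= discharge_kw * step_h
            residual.append(load - discharge_kw)
    return residual, soc_kwh


profile = [-4.0, -2.0, 1.0, 6.0]  # kW over four 15-minute steps
residual, soc = dispatch_storage(profile, capacity_kwh=10.0)
print(residual, soc)  # [0.0, 0.0, 0.0, 1.0] 0.0
```

In this toy profile the stored surplus covers the first deficit completely and most of the second, leaving 1 kW to be drawn from the grid in the last step.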

Author: Francesco Maldonato

## VPP simulator Notebook based on EVs arrival, with random actions [no model loaded]

Installing required packages and dependencies

In [1]:
%%capture
!pip install py-elvis==0.2.1
!pip install pyyaml==5.4
!pip install plotly==5.9.0
!pip install -U kaleido==0.2.1

!pip install stable-baselines3[extra]==1.6.1
!pip install stable-baselines==1.6.1
!pip install sb3-contrib==1.6.1
!pip install gym==0.20.0
!pip install -q wandb==0.13.4
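
Because `%%capture` hides pip's output, a failed or skipped install goes unnoticed. The sketch below compares installed versions against a few of the pins from the cell above. The helper name `check_pins` and the `PINS` dictionary are illustrative. `importlib.metadata` is in the standard library from Python 3.8; on older interpreters the `importlib_metadata` backport is assumed to be installed.

```python
try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:
    # Assumption: the backport package is available on older interpreters.
    from importlib_metadata import PackageNotFoundError, version

# A subset of the pins from the install cell.
PINS = {
    "py-elvis": "0.2.1",
    "plotly": "5.9.0",
    "stable-baselines3": "1.6.1",
    "sb3-contrib": "1.6.1",
    "gym": "0.20.0",
}


def check_pins(pins):
    """Return {package: installed_version_or_None} for every mismatch."""
    problems = {}
    for name, wanted in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None  # package is not installed at all
        if found != wanted:
            problems[name] = found
    return problems


print(check_pins(PINS))  # an empty dict means every pin is satisfied
```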

In [2]:
#Cloning repository and changing directory
!git clone https://github.com/francescomaldonato/RL_VPP_Thesis.git
%cd RL_VPP_Thesis/
%ls

Cloning into 'RL_VPP_Thesis'...
remote: Enumerating objects: 517, done.
remote: Counting objects: 100% (124/124), done.
remote: Compressing objects: 100% (59/59), done.
remote: Total 517 (delta 65), reused 121 (delta 64), pack-reused 393
Receiving objects: 100% (517/517), 188.99 MiB | 25.64 MiB/s, done.
Resolving deltas: 100% (214/214), done.
Checking out files: 100% (223/223), done.
/content/RL_VPP_Thesis
[0m[01;34mAgent_trainer_notebooks[0m/          [01;34mRL_VPP_Thesis[0m/
[01;34mAlgorithm_simulator_notebooks[0m/    [01;34mtrained_models[0m/
[01;34mdata[0m/                             VPP_environment.py
[01;34mEV_experiment_notebooks[0m/          VPP_simulator.ipynb
[01;34mHyperparameters_sweep_notebooks[0m/  [01;34mwandb[0m/
README.md


In [3]:
import yaml
import numpy as np
from VPP_environment import VPPEnv, VPP_Scenario_config
from elvis.config import ScenarioConfig
import os
import torch
import random
#import wandb
#from stable_baselines3 import A2C #The available algorithms in sb3-contrib for the custom environment with MultiInputPolicy
from sb3_contrib.common.maskable.utils import get_action_masks
import stable_baselines3 as sb3
from stable_baselines3.common.env_checker import check_env

#Check if cuda device is available for training
print("Torch-Cuda available device:", torch.cuda.is_available())
print(sb3.get_system_info())
#!wandb --version

Torch-Cuda available device: False
OS: Linux-5.10.133+-x86_64-with-Ubuntu-18.04-bionic #1 SMP Fri Aug 26 08:44:51 UTC 2022
Python: 3.7.14
Stable-Baselines3: 1.6.1
PyTorch: 1.12.1+cu113
GPU Enabled: False
Numpy: 1.21.6
Gym: 0.20.0



In [4]:
# Ensure deterministic behavior
# torch.backends.cudnn.deterministic = True
# random.seed(0)
# torch.manual_seed(0)
# torch.cuda.manual_seed_all(0)
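
The commented-out lines above can be wrapped in one helper that seeds whichever random number generators are importable. This is a sketch under assumptions: the name `seed_everything` is hypothetical, and NumPy and PyTorch are treated as optional so the helper also runs where they are missing.

```python
import random


def seed_everything(seed):
    """Seed the RNGs available in this runtime and report which were seeded."""
    seeded = ["random"]
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
        seeded.append("numpy")
    except ImportError:
        pass  # NumPy not installed: nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # silently ignored without CUDA
        torch.backends.cudnn.deterministic = True
        seeded.append("torch")
    except ImportError:
        pass  # PyTorch not installed: nothing to seed
    return seeded


print(seed_everything(0))
```

Gym action spaces keep their own generator, so random rollouts drawn with `env.action_space.sample()` would additionally need `env.action_space.seed(0)` to be repeatable.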

## Load ELVIS YAML config file
This section loads the EV arrival simulation parameters from a YAML config file in the 'data/config_builder/' folder.

In [5]:
#Loading paths for input data
current_folder = ''
VPP_training_data_input_path = current_folder + 'data/data_training/environment_table/' + 'Environment_data_2019.csv'
VPP_testing_data_input_path = current_folder + 'data/data_testing/environment_table/' + 'Environment_data_2020.csv'
VPP_validating_data_input_path = current_folder + 'data/data_validating/environment_table/' + 'Environment_data_2018.csv'
elvis_input_folder = current_folder + 'data/config_builder/'

case = 'wohnblock_household_simulation_adaptive.yaml' #(loaded by default: 20 EV arrivals per week with 50% average battery)

#To try different simulation parameters, uncomment one of the lines below
#case = 'wohnblock_household_simulation_adaptive_10.yaml' #(10 EV arrivals per week with 50% average battery)
#case = 'wohnblock_household_simulation_adaptive_15.yaml' #(15 EV arrivals per week with 50% average battery)
#case = 'wohnblock_household_simulation_adaptive_25.yaml' #(25 EV arrivals per week with 50% average battery)
#case = 'wohnblock_household_simulation_adaptive_30.yaml' #(30 EV arrivals per week with 50% average battery)
#case = 'wohnblock_household_simulation_adaptive_35.yaml' #(35 EV arrivals per week with 50% average battery)

with open(elvis_input_folder + case, 'r') as file:
    yaml_str = yaml.full_load(file)

elvis_config_file = ScenarioConfig.from_yaml(yaml_str)
VPP_config_file = VPP_Scenario_config(yaml_str)

print(elvis_config_file)
print(VPP_config_file)

Vehicle types: <generator object ScenarioConfig.__str__.<locals>.<genexpr> at 0x7f7e7cf793d0>Mean parking time: 23.99
Std deviation of parking time: 1
Mean value of the SOC distribution: 0.5
Std deviation of the SOC distribution: 0.1
Max parking time: 24
Number of charging events per week: 20
Vehicles are disconnected only depending on their parking time
Queue length: 0
Opening hours: None
Scheduling policy: Uncontrolled

{'start_date': '2022-01-01T00:00:00', 'end_date': '2023-01-01T00:00:00', 'resolution': '0:15:00', 'num_households': 4, 'solar_power': 16, 'wind_power': 12, 'EV_types': [{'battery': {'capacity': 100, 'efficiency': 1, 'max_charge_power': 150, 'min_charge_power': 0}, 'brand': 'Tesla', 'model': 'Model S', 'probability': 1}], 'charging_stations_n': 4, 'EVs_n': 20, 'EVs_n_max': 1044, 'mean_park': 23.99, 'std_deviation_park': 1, 'EVs_mean_soc': 50.0, 'EVs_std_deviation_soc': 10.0, 'EV_load_max': 44, 'EV_load_rated': 14.8, 'EV_load_min': 1, 'houseRWload_max': 10, 'av_max_ener
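
The `start_date`, `end_date` and `resolution` values printed above fix how many time steps lie in one simulated year. The sketch below derives that count with the standard library only. The helper names `parse_resolution` and `count_steps` are illustrative, and whether `VPPEnv` uses exactly this count for its episode length is an assumption.

```python
from datetime import datetime, timedelta


def parse_resolution(text):
    """Parse an 'H:MM:SS' string such as '0:15:00' into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def count_steps(start_date, end_date, resolution):
    """Number of whole resolution intervals between two ISO timestamps."""
    start = datetime.fromisoformat(start_date)
    end = datetime.fromisoformat(end_date)
    return (end - start) // parse_resolution(resolution)


steps = count_steps("2022-01-01T00:00:00", "2023-01-01T00:00:00", "0:15:00")
print(steps)  # 35040, i.e. 365 days x 96 quarter-hours
```

At this resolution a constant power of 1 kW over one step corresponds to 0.25 kWh, which is the conversion to keep in mind when comparing power columns with the energy totals in kWh.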

In [6]:
#TESTING Environment initialization
env = VPPEnv(VPP_testing_data_input_path, elvis_config_file, VPP_config_file)
env.plot_VPP_input_data()

In [7]:
env.plot_ELVIS_data()

In [8]:
#Function to check custom environment and output additional warnings if needed
check_env(env)
env.plot_reward_functions()

- ELVIS.Simulation (Av.EV_SOC=  50.0 %):
 Sum_Energy=kWh  21102.47 , Grid_used_en=kWh  38321.66 , RE-to-vehicle_unused_en=kWh  17219.18 , Total_selling_cost=€  859.31 , Grid_cost=€  1358.71 , Av.EV_en_left=kWh  100.0 , Charging_events=  1043 
- Exp.VPP_goals: Grid_used_en=kWh 0, RE-to-vehicle_unused_en=kWh 0, Grid_cost=€ 0 , Av.EV_en_left=kWh  75.08
Simulating VPP....


## VPP Simulation **testing** dataset with random actions [no model loaded]


In [9]:
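episodes = 1
for episode in range(1, episodes+1):
    state = env.reset()
    done = False
    score = 0
    while not done:
        # The mask is queried but not applied: the action below is sampled uniformly at random
        action_masks = get_action_masks(env)
        action = env.action_space.sample()

        # Gym 0.20 step API: returns (observation, reward, done, info)
        n_state, reward, done, info = env.step(action)
        score += reward
    print('Episode:{} Score:{}'.format(episode, score))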

VPP_table = env.VPP_table
env.plot_VPP_energies()
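
The loop above queries the action mask but samples uniformly from the whole action space. For comparison, here is a minimal sketch of a random baseline that does respect a mask. It rests on two assumptions: the action space is a MultiDiscrete of four dimensions with three choices each (inferred from the four-entry `actions` vectors with values 0 to 2), and the mask is a flat sequence of booleans concatenated per dimension, which is the layout sb3-contrib uses for MultiDiscrete spaces. The helper name `sample_masked_action` is hypothetical.

```python
import random


def sample_masked_action(flat_mask, nvec, rng=random):
    """Pick one allowed choice per dimension from a flat boolean mask.

    flat_mask: booleans of length sum(nvec), True means the choice is allowed.
    nvec: number of choices per action dimension, e.g. [3, 3, 3, 3].
    """
    action, offset = [], 0
    for n in nvec:
        allowed = [i for i in range(n) if flat_mask[offset + i]]
        if not allowed:
            raise ValueError("every choice is masked for one dimension")
        action.append(rng.choice(allowed))
        offset += n
    return action


# Toy mask: the first three dimensions allow exactly one choice each,
# the last dimension allows all three.
mask = [True, False, False,
        False, True, False,
        False, False, True,
        True, True, True]
action = sample_masked_action(mask, [3, 3, 3, 3], rng=random.Random(0))
print(action[:3])  # [0, 1, 2]
```

If those assumptions hold for `VPPEnv`, the call inside the loop would become `sample_masked_action(get_action_masks(env), env.action_space.nvec)`.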

In [10]:
VPP_table.head(15000)

time,0,1,2,3,EVs_id,actions,mask_truth,ev_charged_pwr,ev_discharged_pwr,load,load_reward,EV_reward,rewards
2022-01-01 00:00:00,0.000000,0.0,0.000000,0.000000,"[0, 0, 0, 0]","[2, 1, 1, 2]","[False, False, False, False]",0.000000,0.0,1.887455,-1.846382,0.0,-1.846382
2022-01-01 00:15:00,0.000000,0.0,0.000000,0.000000,"[0, 0, 0, 0]","[2, 2, 2, 1]","[False, False, False, False]",0.000000,0.0,1.607829,-1.275764,0.0,-1.275764
2022-01-01 00:30:00,0.000000,0.0,0.000000,0.000000,"[0, 0, 0, 0]","[1, 2, 1, 2]","[False, False, False, False]",0.000000,0.0,1.265459,-2.457114,0.0,-2.457114
2022-01-01 00:45:00,0.000000,0.0,0.000000,0.000000,"[0, 0, 0, 0]","[0, 0, 0, 1]","[True, True, True, False]",0.000000,0.0,1.974268,-1.336535,0.0,-1.336535
2022-01-01 01:00:00,0.000000,0.0,0.000000,0.000000,"[0, 0, 0, 0]","[1, 0, 0, 1]","[False, True, True, False]",0.000000,0.0,1.301921,-1.897345,0.0,-1.897345
...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-06-06 04:45:00,60.807125,0.0,0.000000,65.175133,"[2570, 0, 0, 2571]","[0, 2, 2, 2]","[False, False, False, False]",5.677725,0.0,-2.838862,-15.172386,0.0,-15.172386
2022-06-06 05:00:00,60.807125,0.0,0.000000,64.250130,"[2570, 0, 0, 2571]","[0, 2, 2, 2]","[False, False, False, False]",0.000000,-3.7,-15.172386,-16.200668,0.0,-16.200668
2022-06-06 05:15:00,60.807125,0.0,47.998642,63.325130,"[2570, 0, 2572, 2571]","[1, 0, 0, 2]","[True, True, False, False]",0.000000,-3.7,-16.200668,-2.050921,0.0,-2.050921
2022-06-06 05:30:00,63.557125,0.0,47.998642,63.075130,"[2570, 0, 2572, 2571]","[0, 2, 1, 2]","[True, False, True, False]",11.000000,-1.0,-2.230553,-2.057899,0.0,-2.057899


In [11]:
env.plot_Elvis_results()

In [12]:
env.plot_VPP_results()

In [13]:
env.plot_VPP_supply_demand()

In [14]:
env.plot_VPP_Elvis_comparison()

In [15]:
env.plot_rewards_results()

In [16]:
env.plot_rewards_stats()

In [17]:
env.plot_EVs_kpi()

In [18]:
env.plot_actions_kpi()

In [19]:
env.plot_load_kpi()

In [20]:
env.plot_yearly_load_log()

## Validating dataset VPP Simulation [no model loaded]

In [21]:
#VALIDATING Environment initialization
env = VPPEnv(VPP_validating_data_input_path, elvis_config_file, VPP_config_file)

Charging event: 3130, Arrival time: 2022-01-01 09:15:00, Parking_time: 24, Leaving_time: 2022-01-02 09:15:00, SOC: 0.5013878707353744, SOC target: 1.0, Connected car: Tesla, Model S 
 ... 
 Charging event: 4172, Arrival time: 2022-12-31 17:45:00, Parking_time: 23.037323903519784, Leaving_time: 2023-01-01 16:47:14.366053, SOC: 0.6214034005373177, SOC target: 1.0, Connected car: Tesla, Model S 

-DATASET: House&RW_energy_sum=kWh  -30085.39 , Grid_used_en=kWh  2136.67 , RE-to-vehicle_unused_en=kWh  -32222.06 , Total_selling_cost=€  -1187.15 , Grid_cost=€  113.34
- ELVIS.Simulation (Av.EV_SOC=  50.0 %):
 Sum_Energy=kWh  13003.47 , Grid_used_en=kWh  34535.56 , RE-to-vehicle_unused_en=kWh  21532.09 , Total_selling_cost=€  619.13 , Grid_cost=€  1502.09 , Charging_events=  1043 
- Exp.VPP_goals: Grid_used_en=kWh 0, RE-to-vehicle_unused_en=kWh 0, Grid_cost=€ 0 , Av.EV_en_left=kWh  80.89
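
The ELVIS summary lines printed so far appear to satisfy a simple energy balance: `Sum_Energy` is approximately `Grid_used_en` minus `RE-to-vehicle_unused_en`. This relation is inferred from the printed figures, not taken from the source of `VPP_environment.py`, so the sketch below is only a sanity check on the logged values. The helper name `balance_gap` is illustrative.

```python
def balance_gap(sum_energy, grid_used, re_unused):
    """Absolute deviation (kWh) from Sum_Energy = Grid_used_en - RE_unused_en."""
    return abs(sum_energy - (grid_used - re_unused))


# (Sum_Energy, Grid_used_en, RE-to-vehicle_unused_en) in kWh,
# copied from the ELVIS.Simulation lines of the testing and validating runs.
elvis_runs = {
    "testing": (21102.47, 38321.66, 17219.18),
    "validating": (13003.47, 34535.56, 21532.09),
}

gaps = {name: balance_gap(*figures) for name, figures in elvis_runs.items()}
for name, gap in gaps.items():
    print(name, round(gap, 2))  # both within rounding of the printed values
```

In the DATASET line the unused term is printed with a negative sign, so there the same quantities add up instead: 2136.67 + (-32222.06) = -30085.39 kWh.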


In [22]:
#Function to check custom environment and output additional warnings if needed
check_env(env)
plot_VPP_input_data = env.plot_VPP_input_data()
plot_VPP_input_data.show()

In [23]:
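episodes = 1
for episode in range(1, episodes+1):
    state = env.reset()
    done = False
    score = 0
    while not done:
        # The mask is queried but not applied: the action below is sampled uniformly at random
        action_masks = get_action_masks(env)
        action = env.action_space.sample()

        # Gym 0.20 step API: returns (observation, reward, done, info)
        n_state, reward, done, info = env.step(action)
        score += reward
    print('Episode:{} Score:{}'.format(episode, score))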

VPP_table = env.VPP_table
env.plot_VPP_energies()

In [24]:
VPP_table.head(15000)

time,0,1,2,3,EVs_id,actions,mask_truth,ev_charged_pwr,ev_discharged_pwr,load,load_reward,EV_reward,rewards
2022-01-01 00:00:00,0.0,0.0,0.0,0.0,"[0, 0, 0, 0]","[0, 2, 1, 1]","[True, False, False, False]",0.0,0.0,-3.284219,-5.129365,0.0,-5.129365
2022-01-01 00:15:00,0.0,0.0,0.0,0.0,"[0, 0, 0, 0]","[1, 0, 0, 2]","[False, True, True, False]",0.0,0.0,-4.142302,-4.921850,0.0,-4.921850
2022-01-01 00:30:00,0.0,0.0,0.0,0.0,"[0, 0, 0, 0]","[1, 0, 2, 0]","[False, True, False, True]",0.0,0.0,-3.953110,-5.359100,0.0,-5.359100
2022-01-01 00:45:00,0.0,0.0,0.0,0.0,"[0, 0, 0, 0]","[0, 2, 2, 0]","[True, False, False, True]",0.0,0.0,-4.395010,-5.053973,0.0,-5.053973
2022-01-01 01:00:00,0.0,0.0,0.0,0.0,"[0, 0, 0, 0]","[2, 2, 0, 2]","[False, False, True, False]",0.0,0.0,-4.059370,-5.052612,0.0,-5.052612
...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-06-06 04:45:00,0.0,0.0,0.0,0.0,"[0, 0, 0, 0]","[2, 0, 2, 1]","[False, True, False, False]",0.0,0.0,-11.893923,-14.472662,0.0,-14.472662
2022-06-06 05:00:00,0.0,0.0,0.0,0.0,"[0, 0, 0, 0]","[2, 1, 1, 1]","[False, False, False, False]",0.0,0.0,-14.419929,-14.684643,0.0,-14.684643
2022-06-06 05:15:00,0.0,0.0,0.0,0.0,"[0, 0, 0, 0]","[2, 0, 0, 0]","[False, True, True, True]",0.0,0.0,-14.653108,-14.041720,0.0,-14.041720
2022-06-06 05:30:00,0.0,0.0,0.0,0.0,"[0, 0, 0, 0]","[0, 0, 1, 0]","[True, True, False, True]",0.0,0.0,-13.945892,-13.520661,0.0,-13.520661


In [25]:
env.plot_Elvis_results()

In [26]:
env.plot_VPP_results()


In [27]:
env.plot_VPP_supply_demand()

In [28]:
env.plot_VPP_Elvis_comparison()

In [29]:
env.plot_rewards_results()

In [30]:
env.plot_rewards_stats()

In [31]:
env.plot_EVs_kpi()

In [32]:
env.plot_load_kpi()

In [33]:
env.plot_yearly_load_log()

## Training dataset VPP Simulation [no model loaded]

In [34]:
#TRAINING Environment initialization
env = VPPEnv(VPP_training_data_input_path, elvis_config_file, VPP_config_file)

Charging event: 6259, Arrival time: 2022-01-01 00:45:00, Parking_time: 24, Leaving_time: 2022-01-02 00:45:00, SOC: 0.5177522526438694, SOC target: 1.0, Connected car: Tesla, Model S 
 ... 
 Charging event: 7301, Arrival time: 2022-12-30 20:15:00, Parking_time: 24, Leaving_time: 2022-12-31 20:15:00, SOC: 0.5687094455239023, SOC target: 1.0, Connected car: Tesla, Model S 

-DATASET: House&RW_energy_sum=kWh  -34117.7 , Grid_used_en=kWh  1556.25 , RE-to-vehicle_unused_en=kWh  -35673.95 , Total_selling_cost=€  -1196.64 , Grid_cost=€  97.86
- ELVIS.Simulation (Av.EV_SOC=  50.0 %):
 Sum_Energy=kWh  8217.79 , Grid_used_en=kWh  31697.31 , RE-to-vehicle_unused_en=kWh  23479.51 , Total_selling_cost=€  454.65 , Grid_cost=€  1320.03 , Charging_events=  1043 
- Exp.VPP_goals: Grid_used_en=kWh 0, RE-to-vehicle_unused_en=kWh 0, Grid_cost=€ 0 , Av.EV_en_left=kWh  84.2


In [35]:
#Function to check custom environment and output additional warnings if needed
check_env(env)
plot_VPP_input_data = env.plot_VPP_input_data()
plot_VPP_input_data.show()

In [36]:
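episodes = 1
for episode in range(1, episodes+1):
    state = env.reset()
    done = False
    score = 0
    while not done:
        # The mask is queried but not applied: the action below is sampled uniformly at random
        action_masks = get_action_masks(env)
        action = env.action_space.sample()

        # Gym 0.20 step API: returns (observation, reward, done, info)
        n_state, reward, done, info = env.step(action)
        score += reward
    print('Episode:{} Score:{}'.format(episode, score))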

VPP_table = env.VPP_table
env.plot_VPP_energies()

In [37]:
VPP_table.head(14995)

time,0,1,2,3,EVs_id,actions,mask_truth,ev_charged_pwr,ev_discharged_pwr,load,load_reward,EV_reward,rewards
2022-01-01 00:00:00,0.000000,0.0,0.0,0.000000,"[0, 0, 0, 0]","[0, 2, 2, 1]","[True, False, False, False]",0.000000,0.0,-2.631544,-4.007342,0.0,-4.007342
2022-01-01 00:15:00,0.000000,0.0,0.0,0.000000,"[0, 0, 0, 0]","[1, 2, 2, 1]","[False, False, False, False]",0.000000,0.0,-3.404405,-4.402765,0.0,-4.402765
2022-01-01 00:30:00,0.000000,0.0,0.0,0.000000,"[0, 0, 0, 0]","[2, 2, 2, 0]","[False, False, False, True]",0.000000,0.0,-3.641659,-3.804908,0.0,-3.804908
2022-01-01 00:45:00,0.000000,0.0,0.0,0.000000,"[0, 0, 0, 0]","[1, 2, 2, 1]","[False, False, False, False]",0.000000,0.0,-3.282945,-6.886972,0.0,-6.886972
2022-01-01 01:00:00,0.000000,0.0,0.0,0.000000,"[0, 0, 0, 0]","[0, 1, 0, 0]","[True, False, True, True]",0.000000,0.0,-6.075669,-5.793399,0.0,-5.793399
...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-06-06 03:30:00,50.557007,0.0,0.0,45.427555,"[8771, 0, 0, 8770]","[2, 1, 2, 1]","[False, False, False, True]",0.000000,-1.0,-5.136722,-2.080486,0.0,-2.080486
2022-06-06 03:45:00,50.307007,0.0,0.0,45.989628,"[8771, 0, 0, 8770]","[2, 1, 0, 2]","[False, False, True, False]",2.248291,-1.0,-2.248291,-9.552344,0.0,-9.552344
2022-06-06 04:00:00,50.057007,0.0,0.0,45.739628,"[8771, 0, 0, 8770]","[1, 2, 2, 2]","[True, False, False, False]",0.000000,-2.0,-9.007579,15.000000,0.0,15.000000
2022-06-06 04:15:00,51.878990,0.0,0.0,45.489628,"[8771, 0, 0, 8770]","[0, 1, 0, 0]","[True, False, True, False]",7.287936,-1.0,0.000000,-6.720751,0.0,-6.720751


In [38]:
env.plot_Elvis_results()

In [39]:
env.plot_VPP_results()

In [40]:
env.plot_VPP_supply_demand()

In [41]:
env.plot_VPP_Elvis_comparison()

In [42]:
env.plot_rewards_results()

In [43]:
env.plot_rewards_stats()

In [44]:
env.plot_EVs_kpi()

In [45]:
env.plot_actions_kpi()

In [46]:
env.plot_load_kpi()

In [47]:
env.plot_yearly_load_log()

In [48]:
#env.close()