# Scenario 1: Solving Battery Arbitrage Problem (Price and Carbon Optimization)

Our partner, a global energy provider, is deploying Li-ion batteries at consumer premises to either maximize economic profit (increase cost savings by charging during off-peak price periods and discharging during peak price periods) or maximize environmental benefit (reduce the overall carbon footprint by charging when the grid relies on renewable sources and discharging when it relies on non-renewable sources).

Thus, it is important to optimize the management and scheduling of these batteries (when to charge or discharge) to maximize the objective.

<img src='../_static/energy_arbitrage.png' height = 200 width = 600>
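
To make the arbitrage idea concrete, here is a minimal sketch of the simplest possible case: one charge and one later discharge over a short price series. The function `best_single_cycle` and the price values are made up for illustration and are not part of EnCortex; the real optimizers in this notebook schedule many cycles under battery constraints.

```python
def best_single_cycle(prices, energy_kwh=1.0, efficiency=1.0):
    """Return (charge_hour, discharge_hour, profit) for the best single
    charge-then-discharge pair. Illustrative only, not an EnCortex API."""
    best = (None, None, 0.0)
    cheapest = 0  # index of the cheapest hour seen before the current one
    for t in range(1, len(prices)):
        if prices[t - 1] < prices[cheapest]:
            cheapest = t - 1
        # buy energy_kwh at the cheapest earlier hour, sell what survives losses now
        profit = energy_kwh * (efficiency * prices[t] - prices[cheapest])
        if profit > best[2]:
            best = (cheapest, t, profit)
    return best


prices = [30.0, 20.0, 25.0, 60.0, 80.0, 40.0]  # made-up hourly prices
charge_hour, discharge_hour, profit = best_single_cycle(prices)
print(charge_hour, discharge_hour, profit)
```

Passing carbon intensities instead of prices gives the analogous carbon-saving schedule, which is why both objectives can share one formulation.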

## Step 1: Import the libraries

First, we import the general-purpose libraries and the EnCortex abstractions needed to solve the problem.

In [1]:
# import all the required general libraries
import os
import numpy as np
import pandas as pd
import seaborn as sns
sns.set() 
import typing as t 
import gym
from gym import spaces
import pytorch_lightning as pl
import matplotlib.pyplot as plt
from IPython.core.interactiveshell import InteractiveShell
from IPython.display import display, Markdown, clear_output
import ipywidgets as widgets
from itertools import repeat
InteractiveShell.ast_node_interactivity="all"

#import the encortex library and all the required dependencies
from encortex.backend import DFBackend
from encortex.env import EnCortexEnv
from encortex.logger import get_experiment_logger
from encortex.utils.data_loaders import load_data
from encortex.data import MarketData
from encortex.contract import Contract
from encortex.decision_unit import DecisionUnit
from encortex.grid import Grid
from encortex.sources import Battery, BatteryAction

from dfdb import create_in_memory_db

from encortex.environments import BatteryArbitrageScenarioEnv
from encortex.optimizers import DRLBattOpt, MILPBattOpt, SimulatedAnnealingOpt
from encortex.datasets.grid import CaliforniaPricesEmissionsData, UKPricesEmissionsData

## Step 2: Inputs from the User 

Next, we present the configurable parameters that the user can tweak to improve performance in this scenario.

1. Optimization Algorithms: We support multiple algorithms:
    - Mixed Integer Linear Programming (MILP)
    - Simulated Annealing (SA)
    - Deep Reinforcement Learning (DRL)

    The user sets the following flags to select the type of algorithm and specifies the solver name used to run the optimization.

    The following cell shows how to run Reinforcement Learning. Deep Q-Networks (DQN) is used for this problem statement. We also support other reinforcement learning algorithms, such as Advantage Actor Critic (A2C) and Proximal Policy Optimization (PPO). The respective solver names are "dqn", "a2c" and "PPO". See the full list of available optimizers [here](../encortex/encortex.optimizers.battery_arbitrage_optimizer.rst).

    Several solvers can be used for MILP. We support OR-Tools ("ort"), Gurobi ("grb"), CPLEX ("cpx"), CyLP ("clp"), ECOS ("eco"), MOSEK ("msk") and others. We recommend OR-Tools as a free, open-source solver that produces correct, reproducible solutions. Gurobi is the other recommended solver; although it is commercial, it takes less time to produce a similar result.

    Simulated Annealing does not require a solver.

In [2]:
#specify the type of optimization to be used:
milp_flag = False 
simulated_annealing_flag = False
solver = ["dqn"] #the algorithm to be used

# # To use MILP:  
# milp_flag = True 
# simulated_annealing_flag = False
# solver = ["grb"] #the algorithm to be used

# # To use SA:
# milp_flag = False 
# simulated_annealing_flag = True
# solver = [""] #the algorithm to be used
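
The flag combinations above can be collapsed into a single validated choice. The following is a minimal sketch and not part of EnCortex: the helper `resolve_optimizer` and the two name sets are hypothetical and only mirror the solver names listed in this section.

```python
# Hypothetical helper (not an EnCortex API): checks the flag/solver
# combination described above and returns the optimizer family to use.
RL_SOLVERS = {"dqn", "a2c", "PPO"}
MILP_SOLVERS = {"ort", "grb", "cpx", "clp", "eco", "msk"}


def resolve_optimizer(milp_flag: bool, simulated_annealing_flag: bool, solver: list) -> str:
    if milp_flag and simulated_annealing_flag:
        raise ValueError("Enable at most one of milp_flag and simulated_annealing_flag.")
    if simulated_annealing_flag:
        return "SA"  # Simulated Annealing needs no solver name
    name = solver[0]
    if milp_flag:
        if name not in MILP_SOLVERS:
            raise ValueError(f"Unknown MILP solver: {name!r}")
        return "MILP"
    if name not in RL_SOLVERS:
        raise ValueError(f"Unknown RL algorithm: {name!r}")
    return "DRL"


print(resolve_optimizer(False, False, ["dqn"]))
```

Failing early on an inconsistent combination avoids discovering the mistake only after the environment and data have been set up.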

2. Selection of Objectives: A user can optimize for any combination of the following objectives by providing their relative importance weights as float values:
    - Carbon Optimization
    - Price Optimization

    For example, the following cell gives equal importance to emissions and prices, so the algorithms optimize both objectives and produce schedules that let the energy operator increase profits and reduce the carbon footprint at the same time.

    Because the data is highly variable, equal weights for emissions and prices do not always lead to the best savings. To address this, we use Pareto optimization curves to arrive at a single point of optimality, as discussed in our paper.

    Since batteries can perform only a limited number of cycles during their lifetime, we use a battery degradation model to account for the battery's lifetime. Hence, a degradation importance weight is provided in addition to the two above.

In [3]:
#provide optimization weights for the objectives
weight_emission = 1.0
weight_price = 1.0
weight_degradation = 0.0

# # Cost Optimization
# weight_emission = 0.0
# weight_price = 1.0
# weight_degradation = 0.0

# # Carbon Optimization
# weight_emission = 1.0
# weight_price = 0.0
# weight_degradation = 0.0

# # Using Degradation + joint optimization
# weight_emission = 1.16
# weight_price = 1.0
# weight_degradation = 1.0
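
One plausible way to read these weights is as the coefficients of a weighted-sum objective. The sketch below assumes that form; the function `weighted_reward`, its sign convention (savings are rewarded, degradation cost is penalized) and the input numbers are illustrative assumptions, not the EnCortex implementation.

```python
def weighted_reward(price_saving, carbon_saving, degradation_cost,
                    weight_price=1.0, weight_emission=1.0, weight_degradation=0.0):
    """Assumed weighted-sum scalarization of the three objectives."""
    return (weight_price * price_saving
            + weight_emission * carbon_saving
            - weight_degradation * degradation_cost)


# Made-up per-step values, only to show how the weights change the score
joint = weighted_reward(2.0, 3.0, 0.5)
cost_only = weighted_reward(2.0, 3.0, 0.5, weight_emission=0.0)
with_degradation = weighted_reward(2.0, 3.0, 0.5,
                                   weight_price=1.0,
                                   weight_emission=1.16,
                                   weight_degradation=1.0)
print(joint, cost_only, with_degradation)
```

In a formulation like this, prices and emissions are in different units, so the weights also act as scaling factors between the terms and not only as priorities.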

3. Battery Configurations: A user can run several experiments by tweaking the battery configuration. The following parameters are exposed as user-configurable inputs:

    - storage_capacity : the battery capacity (in kWh)
    - efficiency : the charging and discharging efficiency (in %); a single value is used for both here, but they can be set separately if they differ
    - depth_of_discharge : the maximum percentage of the capacity that can be discharged at a time (in %), here 90%
    - soc_minimum : the minimum state of charge of the battery, below which the battery should not be operated
    - timestep : the time step between battery decisions
    - degradation_flag : whether to enable the degradation model for the batteries
    - min_battery_capacity_factor : the fraction of the original capacity below which, after degradation from overuse, the battery is no longer considered to be in good health
    - battery_cost_per_kWh : the battery replacement cost (in $/kWh)
    - reduction_coefficient : the factor by which the battery capacity is reduced after the charge-discharge cycles of each degradation period
    - degradation_period_in_days : the period after which the battery degrades
    - action : the battery actions
    - soc_initial : the initial state of charge of the battery for test experiments
    - test_flag : controls whether the battery starts from a random initial state of charge, which is used during training runs/experiments to avoid overfitting


In [4]:
'''
run experiments for the following battery configurations
elements in the list denote different batteries/battery configurations to be used in the scenario together (here just 1 element indicating 1 battery being used)
'''
storage_capacity = [10.]
efficiency=[1.]
depth_of_discharge = [90.] 
soc_minimum = [0.1] 
timestep = [np.timedelta64("1","h")]
degradation_flag = weight_degradation > 0 
min_battery_capacity_factor = [0.8] 
battery_cost_per_kWh = [200.] 
reduction_coefficient = [0.99998] 
degradation_period_in_days = [7.] 
action = [BatteryAction("CHARGE_IDLE_DISCHARGE","actions of the battery",spaces.Discrete(3),True,)] 

soc_initial = [0.5] 
test_flag = [False] 
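
A few derived quantities help sanity-check these values. The sketch below is plain arithmetic on the parameters above, not EnCortex code. It assumes that the capacity is multiplied by reduction_coefficient once per degradation period; the names `steps_per_period`, `usable_energy_kwh`, `capacity_after` and `periods_to_end_of_life` are introduced here for illustration.

```python
import numpy as np

storage_capacity_kwh = 10.0
soc_minimum = 0.1
timestep = np.timedelta64(1, "h")
degradation_period_in_days = 7.0
reduction_coefficient = 0.99998
min_battery_capacity_factor = 0.8

# Decision steps in one degradation period (same formula as the Battery constructor call)
steps_per_period = int(degradation_period_in_days * 24 * (np.timedelta64(60, "m") / timestep))

# Energy that can be drawn before hitting the minimum state of charge
usable_energy_kwh = storage_capacity_kwh * (1.0 - soc_minimum)


def capacity_after(periods: int) -> float:
    """Assumption: capacity shrinks by reduction_coefficient once per period."""
    return storage_capacity_kwh * reduction_coefficient ** periods


# Number of periods until the capacity factor falls to the minimum or below
periods_to_end_of_life = int(np.ceil(np.log(min_battery_capacity_factor)
                                     / np.log(reduction_coefficient)))

print(steps_per_period, usable_energy_kwh, periods_to_end_of_life)
```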


## Step 3: Instantiating Objects of the required abstractions from the framework:

#### Here, the energy operator identifies the entities involved in the scenario and uses the framework-provided abstractions to model them. The following two entities are used:

1. Battery Entity:
We inherit from the storage class to define a Li-ion battery entity. In this scenario, we define three battery actions: charge at the maximum rate, discharge at the maximum rate, or stay idle. The energy operator populates the parameter values based on their battery configuration and instantiates an EnCortex Battery object.

In [5]:
'''
instantiate battery objects into a list based on the no. of batteries/elements provided in the list of configuration parameters 
'''
batteries = []
for ele in range(len(storage_capacity)):
    battery = Battery(
        timestep=timestep,
        name="Li-Ion Battery",
        id=ele,
        description="Li-Ion Battery",
        storage_capacity=storage_capacity[ele],
        charging_efficiency=efficiency[ele],
        discharging_efficiency=efficiency[ele],
        soc_initial=soc_initial[ele],
        depth_of_discharge=depth_of_discharge[ele],
        soc_minimum=soc_minimum[ele],
        degradation_flag=degradation_flag,
        min_battery_capacity_factor=min_battery_capacity_factor[ele],
        battery_cost_per_kWh=battery_cost_per_kWh[ele],
        reduction_coefficient=reduction_coefficient[ele],
        degradation_period=(degradation_period_in_days[ele] * 24 * (np.timedelta64(60, 'm') / timestep[0])).astype(int),
        test_flag=test_flag[ele],
        action=action[ele],
    )
    batteries.append(battery)


2. Simplified real-time market entity as Grid: Since energy arbitrage does not require any bidding decisions in the market, we reduce the real-time market entity to a simplified version that captures only the real-time market prices and the carbon footprint information. This shows the utility of our abstractions, which allow seamless modification or extension of the definitions based on the scenario. See the EnCortex reference documentation for details on the arguments.


This entity requires data to be loaded. There are two ways to provide it:

- __Download data__ from any public source (here, we share a OneDrive [link](https://microsoftapc-my.sharepoint.com/:f:/g/personal/t-vballoli_microsoft_com/Evd4JIo7F4hFjI9Y_MJPVEYBYc4iP2i-OND1gfoCx3xiIQ?e=hhzg4M) to demonstrate this), create a folder named data, and add your train.csv and test.csv files for both forecast and actual data.

- Use the __Data Loaders__ of EnCortex: the framework supports some publicly available datasets (the commented-out lines in the cell below show how to use the data loaders). In that case, replace every forecast_df and actual_df with forecast_df.data and actual_df.data. The existing data loaders take 3 user-specific arguments:
    - train: a flag indicating whether to load training or test data
    - forecasts: a flag indicating whether experiments run on forecasts or actuals
    - forecast_type: a string specifying the type of forecast:
        - noise : Adding noise to the actual values and treating that as forecasts
        - smoothing : Smoothing the actual values and using that as forecasts
        - yesterdays : assuming yesterday's actual data as forecasts for today's data
        - meanprev : assuming mean of previous n days as forecasts for today's data (default)
        - lgbm : using light gradient boosting machine to produce forecasts
        - nbeats: using nbeats model to produce forecasts
        - auto : if forecasts already available load that instead
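
To illustrate what the simpler forecast types mean, here is a minimal pandas sketch on a toy hourly series. It is not the EnCortex data loader code. In particular, reading "meanprev" as the mean of the same hour over the previous n days is an assumption, and `n_days`, the noise scale and the toy values are chosen only for the example.

```python
import numpy as np
import pandas as pd

# Toy hourly series standing in for actual prices (values are made up)
idx = pd.date_range("2020-01-01", periods=24 * 4, freq=pd.Timedelta(hours=1))
actual = pd.Series(np.arange(len(idx), dtype=float), index=idx)

# "yesterdays": the value observed 24 hours earlier
yesterdays = actual.shift(24)

# "meanprev" (assumed reading): mean of the same hour over the previous n days
n_days = 3
meanprev = sum(actual.shift(24 * k) for k in range(1, n_days + 1)) / n_days

# "noise": actual values plus Gaussian noise
rng = np.random.default_rng(0)
noise = actual + rng.normal(0.0, 1.0, size=len(actual))

print(yesterdays.iloc[24], meanprev.iloc[72])
```

The first day of "yesterdays" and the first n days of "meanprev" are NaN because no history exists yet, so a real pipeline would drop or backfill that warm-up window.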



In [6]:
'''
Training data is not required for MILP and Simulated Annealing. For RL, training data is required: read the data (from the CSV files or a data loader),
parse it into the MarketData backend, and feed it to the Grid class to instantiate a Grid object for training.
'''

forecast_df = pd.read_csv("data/UK_data_2018_2019_meanprev.csv")
actual_df = pd.read_csv("data/UK_data_2018_2019_actuals.csv")

# forecast_df = UKPricesEmissionsData(train=True, forecasts=True, forecast_type="meanprev")
# actual_df = UKPricesEmissionsData(train=True, forecasts=False)

forecast_df[['emissions', 'prices']] = forecast_df[['emissions', 'prices']].astype(np.float32)
actual_df[['emissions', 'prices']] = actual_df[['emissions', 'prices']].astype(np.float32)

#parse the training data to the MarketData backend
grid_data = MarketData.parse_backend(
    entity_id = len(storage_capacity) + 2,
    in_memory = True,
    market_id = len(storage_capacity) + 2,
    entity_forecasting_id = len(storage_capacity) + 2,
    timestep = np.timedelta64("5", "m"),
    price_forecast=DFBackend(forecast_df['prices'], forecast_df['timestamps']),
    
    price_actual=DFBackend(actual_df['prices'], actual_df['timestamps']),
    carbon_emissions_forecast=DFBackend(forecast_df['emissions'], forecast_df['timestamps']),
    carbon_emissions_actual=DFBackend(actual_df['emissions'], actual_df['timestamps']),
    carbon_prices_forecast=DFBackend(forecast_df['prices'], forecast_df['timestamps']),
    carbon_prices_actual=DFBackend(actual_df['prices'], actual_df['timestamps']),
    volume_forecast=DFBackend(None, None),
    volume_actual=DFBackend(None, None),
)

#instantiate a training Grid object by passing the parsed training data as an argument to the Grid class
grid = Grid(
    timestep=timestep,
    name = "Grid",
    id = len(storage_capacity) + 2,
    description = "Simple Market as a Grid",
    bid_start_time_schedule = "*/5 * * * *",
    bid_window = np.timedelta64(5, "m"),
    commit_start_schedule = np.timedelta64(5, "m"),
    commit_end_schedule = np.timedelta64(5, "m"),
    data = grid_data,
)


## Step 4: Creating Decision units for the problem statement:

Decision units are built based on the entities and contracts associated with a particular producer. Contracts define the flow of energy between 2 entities in the framework. We use a graph representation of entities as nodes and contracts as edges to identify decision units. A decision unit generates critical information on the schedule and the associated actions based on the included contracts/entities.

In this scenario, contracts are defined between the grid and the batteries installed near the consumer, and the decision unit is built on top of them.

In [7]:
#Formulate the decision unit (both for training and testing) by creating contracts between the grid and the battery
def creating_decision_units(batteries, grid, forecast_df):
    contracts = []
    for battery in batteries:
        contracts.append(Contract(grid,battery))
    decision_unit = DecisionUnit(contracts)
    decision_unit.generate_schedule(
        current_reference_time=np.datetime64(pd.Timestamp(forecast_df['timestamps'][0]))
    )
    return decision_unit

decision_unit = creating_decision_units(batteries, grid, forecast_df)
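
The graph view of entities and contracts can be sketched without the framework. The example below assumes that a decision unit corresponds to a connected component of the contract graph; this is one natural reading of the description above, not a statement about how EnCortex implements it. `find_decision_units` and the entity names are hypothetical.

```python
from collections import defaultdict


def find_decision_units(contracts):
    """Group entities into connected components.

    Each contract is treated as an undirected edge (entity_a, entity_b).
    """
    adjacency = defaultdict(set)
    for a, b in contracts:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, units = set(), []
    for start in adjacency:
        if start in seen:
            continue
        # depth-first traversal collecting everything reachable from start
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        units.append(component)
    return units


units = find_decision_units([
    ("grid", "battery_0"),
    ("grid", "battery_1"),
    ("solar", "consumer"),  # unrelated pair, forms its own unit
])
print(units)
```

In the scenario of this notebook there is a single grid connected to every battery, so this grouping would yield exactly one decision unit.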

## Step 5: Function to Store Results in a dataframe and then later to csv format:

The results are collected into a dataframe for later visualization:
- battery_soc_list: the list of current state of charge values (for MILP)
- action_list: the list of actions (charging/discharging/idle) taken by the optimizer at each step
- power_list: the power associated with the action taken by the optimization algorithm
- carbon_intensity_list: the actual carbon emission intensity values in gCO2eq/kWh
- price_intensity_list: the actual price values in $
- reward_list: the list of rewards received for taking actions in particular states
- carbon_savings_forecast_list: the carbon savings resulting from the action taken in a particular state at a certain timestamp
- price_savings_forecast_list: the price savings resulting from the action taken in a particular state at a certain timestamp

In [8]:
def create_dataframe(
    battery_soc_list, 
    action_list, 
    power_list, 
    carbon_intensity_list,
    price_intensity_list,
    carbon_savings_list,
    price_savings_list,
    reward_list, 
    carbon_savings_forecast_list, 
    price_savings_forecast_list
):
    df=pd.DataFrame()
    df.insert(loc=0, column='Current_SOC', value=battery_soc_list)
    df.insert(loc=1, column='Predicted_Action', value=action_list)
    df.insert(loc=2, column='Predicted_Power_Action', value=power_list)
    df.insert(loc=3, column='Carbon_emissions', value=carbon_intensity_list)
    df.insert(loc=4, column='Carbon_savings', value=carbon_savings_list)
    df.insert(loc=5, column='Forecast_Carbon_savings', value=carbon_savings_forecast_list)
    df.insert(loc=6, column='Price_emissions', value=price_intensity_list)
    df.insert(loc=7, column='Price_savings', value=price_savings_list)
    df.insert(loc=8, column='Forecast_Price_savings', value=price_savings_forecast_list)
    df.insert(loc=9, column='Reward', value=reward_list)  
    return df

#store the results in the results folder
country = "UK" # change it to the respective country name based on the grid's price/emissions data
dir_path = os.path.join(os.getcwd(), f'results_{country}/')

# create the results folder if it does not exist yet
os.makedirs(dir_path, exist_ok=True)


## Step 6: Instantiate the environment object from the scenario specific environment class

The environment forms a key layer in the EnCortex architecture: it provides the data and state information (state space) from the entities that are needed to make a decision (action space), and it acts as the central point that orchestrates all the required decisions based on the schedule.

EnCortex supports some common scenario-based environments, which energy operators can easily extend to similar custom scenarios. BatteryArbitrageScenarioEnv is one of the environments supported by EnCortex; see the EnCortex reference documentation for more details. The step_time_difference is another user-configurable parameter, which sets the time span covered by each optimization step. For MILP, the best results are obtained when the step time difference is set similar to the timestep parameter. For Reinforcement Learning, it must be set equal to the timestep; otherwise the action size increases, which leads to erroneous learning by the agents.

In [None]:
'''
Instantiate an environment object from the scenario specific environment class 
'''
if milp_flag or simulated_annealing_flag:
    step_time_diff = np.timedelta64("1", "D")
else:
    step_time_diff = np.timedelta64("1", "h")

# use a continuous action space only for Simulated Annealing
continuous = simulated_annealing_flag

env = BatteryArbitrageScenarioEnv(
    decision_unit,
    start_time=forecast_df['timestamps'][0],
    timestep=np.timedelta64("1", "h"),
    step_time_difference=step_time_diff,
    horizon=np.timedelta64("1", "D"),
    seed=40,
    weight_emission=weight_emission,
    weight_degradation=weight_degradation,
    weight_price=weight_price,
    logging_interval = 1,
    continuous=continuous
)
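
The effect of step_time_difference can be checked with a little arithmetic. The sketch below is not an EnCortex API: `actions_per_step` is a hypothetical helper, and it assumes that every timestep inside one environment step needs its own charge/idle/discharge choice.

```python
import numpy as np

timestep = np.timedelta64(1, "h")


def actions_per_step(step_time_difference, timestep):
    """Number of battery decisions covered by a single environment step."""
    return int(step_time_difference / timestep)


# RL setting from the cell above: one decision per environment step
rl_actions = actions_per_step(np.timedelta64(1, "h"), timestep)

# MILP / Simulated Annealing setting: a whole day per environment step
milp_actions = actions_per_step(np.timedelta64(1, "D"), timestep)

# With 3 discrete choices per decision, the joint action space grows as 3**k
joint_rl = 3 ** rl_actions
joint_day = 3 ** milp_actions

print(rl_actions, milp_actions, joint_rl, joint_day)
```

Under this assumption, the growth of the joint action space is why a step longer than the timestep is impractical for a discrete-action RL agent, while a solver that plans a whole horizon at once can handle it.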


## Step 7: Training Pipeline for the algorithms:

Training is required only for RL; the testing code is split into 3 sections for RL, MILP and SA.

First, set the seed. This helps reproduce results on the same machine, but it does not guarantee identical results across different machines. The highly noisy learning pattern, the large amount of hyperparameter tuning required, and the unpredictability and lack of explainability of RL agents are further drawbacks of the algorithm.

In [10]:
#setting a seed for reproducibility of experiment results:
pl.seed_everything(40)
seed = 40

Global seed set to 40


40

RL training begins here: we instantiate the DRLBattOpt optimizer object and then save the best trained model to the automatically created model_checkpoints folder for later use.

In [11]:
'''
Training Pipeline for RL:
The code in this cell trains an RL model, but there could be issues when running it inside a Jupyter cell.
Hence, we also provide a separate training script, training_RL.py; try that if the Jupyter cell does not work.

During testing, just load the model saved by the training script.
'''
if not (milp_flag or simulated_annealing_flag):

    #instantiate an optimizer object based on the optimizer chosen, and create the model
    opt = DRLBattOpt(
        env=env,
        seed=seed,
    )
    
    print("...... Starting Training .......")
    if not os.path.exists("model_checkpoints/best_model.zip"):
        model = opt(
            env,
            train_flag=True,
            path = 'model_checkpoints'

        )
    print("------Training Completed--------")


Global seed set to 40


Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
...... Starting Training .......
------Training Completed--------


Mixed Integer Linear Programming (MILP) and Simulated Annealing do not require training; hence, the code for the MILP and SA sections is shown directly in the testing step.

## Step 8: Testing Pipeline for the algorithms :

The following is a sample general-purpose testing function that works with all the optimizers supported by the framework. It returns the accumulated rewards and other savings-related variables to help with visualization.

In [12]:
#The test pipeline:
def testing(model, env: BatteryArbitrageScenarioEnv, opt, milp_flag: bool, simulated_annealing:bool):
    
    #for logging results into csv files
    carbon_intensity_list=[]
    price_intensity_list=[]
    battery_soc_list=[]
    action_list=[]
    power_list=[]
    reward_list=[]
    carbon_savings_list=[]
    price_savings_list=[]
    carbon_savings_forecast_list=[]
    price_savings_forecast_list=[]
    
    #For testing initiate the battery with 50% charge always (initial state of charge of the battery during test experiments : 0.5)
    for batt in env.decision_unit.storage_entities:
        batt: Battery
        batt.current_soc = 0.5
        batt.test = True
        
    #Reset the environment in the beginning and get the state values
    state = env.reset()
    done = env.is_done
    net_reward = 0
    steps=0
    
    #run the episode until it is done
    while not done:
        print("------------------------------------------------------")
        print("State: ", len(state))
        print("Step no : ", steps)
        print("Battery SOC : ", env.decision_unit.storage_entities[0].current_soc)
        #storing the actual values of emissions & prices in the list for logging purpose
        for grid in env.decision_unit.markets:
            grid:Grid
            carbon_intensity_list+=list(grid.data.carbon_emissions_actual[env.time, env.time+env.step_time_difference].reshape(-1))
            price_intensity_list+=list(grid.data.carbon_prices_actual[env.time, env.time+env.step_time_difference].reshape(-1))
         
        #storing the state of charge of the batteries in a list for logging purpose:
        for batt in env.decision_unit.storage_entities:
            batt: Battery
            battery_soc_list+=[batt.current_soc]
             
        if milp_flag or simulated_annealing:
            #In MILP first the values are passed as a decision variable/ Affine Expression - the train flag signifies that
            env.train_flag = True
            
            #model called to solve the objective defined in the environment based on the constraints from the framework abstractions
            model = opt(train_flag=True)
            battery_actions = opt.predict(env)
            
            #get the numeric action values as the predicted action results and hence switch off the train flag
            env.train_flag = False
            
            for batt in env.decision_unit.storage_entities:
                batt: Battery
                if not milp_flag:
                    action_dict = {}
                    action = np.round(battery_actions*3 - 0.5)
                    action_dict[batt.id] = {"time": env.time, "action": action}
                    battery_actions = env.transform(action_dict)

                action_list+=list(battery_actions[batt.id]['Dt']-battery_actions[batt.id]['Ct']+1)
                power_list+=list((battery_actions[batt.id]['Dt']-battery_actions[batt.id]['Ct'])*batt.max_discharging_power)
        else:
            #In RL, since training is already done, just load the model to predict the actions based on the current state
            battery_actions = model.predict(state)[0]

            #transform the actions similar to the environment-specific action transformation for uniformity
            for batt in env.decision_unit.storage_entities:
                #storing the power values into a list for logging purpose
                power_list.append((battery_actions -1)*batt.max_discharging_power)

            # storing actions taken into a list for logging purpose:
            action_list+=[battery_actions]
        
        #Step to the next_state, based on the actions predicted by the optimizers, getting a reward and indicating whether the episode completed or not
        # print("Battery Actions:", battery_actions)
        next_state, reward, done, info = env.step(battery_actions)
        
        #for MILP/SA, storing the remaining state of charge values of the batteries in a list for logging purpose:
        if milp_flag or simulated_annealing:
            if env.step_time_difference == env.horizon:
                for batt in env.decision_unit.storage_entities:
                    batt: Battery
                    battery_soc_list+=info[batt.id]['soc_list'][:-1]
        
            #storing reward values in a list for logging purpose:
            reward_list+=[0]*(int(env.step_time_difference / env.timestep)-1)
        reward_list+=[reward]
        
        net_reward += reward
        state = next_state
        print("Total reward", net_reward)
        
        #storing the savings values in the lists for logging purposes
        carbon_savings_list.append(env.carbon_savings_list)
        price_savings_list.append(env.price_savings_list)
        carbon_savings_forecast_list.append(env.carbon_savings_forecast_list)
        price_savings_forecast_list.append(env.price_savings_forecast_list)
        
        steps+=1
    return net_reward, reward_list, battery_soc_list, action_list, power_list, carbon_intensity_list, price_intensity_list, carbon_savings_list[0], price_savings_list[0], carbon_savings_forecast_list[0], price_savings_forecast_list[0]
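
The action handling inside testing() is easier to follow in isolation. The sketch below reproduces only the rounding rule used for Simulated Annealing and the `(action - 1) * max power` conversion used for logging. The helper names and the 5 kW rating are hypothetical, and the reading 0 = charge, 1 = idle, 2 = discharge is inferred from the expressions in the function above.

```python
import numpy as np


def continuous_to_discrete(a):
    """Map values in [0, 1] to {0, 1, 2} with the rounding rule from testing()."""
    return np.round(np.asarray(a) * 3 - 0.5).astype(int)


def action_to_power(action, max_power_kw):
    """Inferred convention: 0 = charge (negative power), 1 = idle, 2 = discharge."""
    return (np.asarray(action) - 1) * max_power_kw


actions = continuous_to_discrete([0.0, 0.2, 0.5, 0.9, 1.0])
power = action_to_power(actions, max_power_kw=5.0)  # 5 kW is a made-up rating

print(actions.tolist(), power.tolist())
```

Note that np.round rounds halves to the nearest even value, so an input of exactly 1.0 (which gives 2.5 before rounding) maps to 2 and stays inside the valid action range.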

1. The following code snippet generates results on the training dataset using the MILP optimizer.

In [13]:
# Code for MILP to generate results on the training dataset: 
if milp_flag:

    # instantiate an optimizer object based on the optimizer chosen, and create the model
    opt = MILPBattOpt(
        env=env, objective=weight_price, solver=solver[0], seed=seed
    )
    model = opt(train_flag=True)

    # provide the model to the environment so as to prepare the constraints and objectives
    env.set_model(model)

    # test the MILP model
    print("-------Producing results on training set--------")
    net_rewardt, rewardlist, battery_soc_list, action_list, power_list, carbon_intensity_list, price_intensity_list, carbon_savings_list, price_savings_list, carbon_savings_forecast_list, price_savings_forecast_list = testing(
        model, env, opt, milp_flag, simulated_annealing_flag)

    # dump the results into csv file
    traindf = create_dataframe(battery_soc_list, action_list, power_list, carbon_intensity_list, price_intensity_list,
                          carbon_savings_list, price_savings_list, rewardlist, carbon_savings_forecast_list, price_savings_forecast_list)
    traindf.to_csv(dir_path+"traindf_MILP.csv", index=False)

2. The following code snippet generates results on the training dataset using the Simulated Annealing optimizer.

In [14]:
#Code for Simulated Annealing to generate results on the training set:
if simulated_annealing_flag:

    #instantiate an optimizer object based on the optimizer chosen, and create the model
    opt = SimulatedAnnealingOpt(
        env = env, objective= weight_price, seed = seed
    )
    model = opt(train_flag=True)

    #test the Simulated Annealing model
    print("-----------Producing results on training set------------")
    net_rewardt, rewardlist, battery_soc_list, action_list, power_list, carbon_intensity_list, price_intensity_list, carbon_savings_list, price_savings_list, carbon_savings_forecast_list, price_savings_forecast_list = testing(
        model, env, opt, milp_flag, simulated_annealing_flag)

    # dump the results into csv file
    traindf = create_dataframe(battery_soc_list, action_list, power_list, carbon_intensity_list, price_intensity_list,
                          carbon_savings_list, price_savings_list, rewardlist, carbon_savings_forecast_list, price_savings_forecast_list)
    traindf.to_csv(dir_path+"traindf_SA.csv", index=False)

3. Next, we load the saved trained DRL model to generate inference results on the same training dataset.

In [None]:
# Test the Trained RL model on the training data set first:
if not (milp_flag or simulated_annealing_flag): 

    opt = DRLBattOpt(
        env=env,
        seed=seed,
    )

    #load the model first, if trained from the python script training_RL.py 
    opt.load('model_checkpoints/', 'best_model')
    
    
    print("-------Producing results on training set--------")
    #test the RL model on the training set
    net_rewardt,rewardlist,  battery_soc_list, action_list, power_list, carbon_intensity_list, price_intensity_list, carbon_savings_list, price_savings_list, carbon_savings_forecast_list, price_savings_forecast_list = testing(opt.model, env, opt, milp_flag, simulated_annealing_flag)

    # dump the training results into csv file
    traindf= create_dataframe(battery_soc_list, action_list, power_list, carbon_intensity_list, price_intensity_list, carbon_savings_list, price_savings_list,rewardlist,  carbon_savings_forecast_list, price_savings_forecast_list)
    traindf.to_csv(dir_path+"traindf_DRL.csv", index=False)

#### Load Test Data and Reinitialization: 
Now, we need to reinitialize the grid object with the test dataset and update the corresponding decision unit contracts in the environment, so that the environment is ready for testing.

In [16]:
#read test data having emissions & prices values of dataset
#load forecast and actual data for testing/inference 
forecast_df = pd.read_csv("data/UK_data_2020_meanprev.csv")
actual_df = pd.read_csv("data/UK_data_2020_actuals.csv")

# forecast_df = UKPricesEmissionsData(train=False, forecasts=True) 
# actual_df = UKPricesEmissionsData(train=False, forecasts=False)

forecast_df[['emissions', 'prices']] = forecast_df[['emissions', 'prices']].astype(np.float32)
actual_df[['emissions', 'prices']] = actual_df[['emissions', 'prices']].astype(np.float32)

#parse the test data to the MarketData backend
grid_data = MarketData.parse_backend(
    len(storage_capacity) + 2,
    True,
    len(storage_capacity) + 2,
    len(storage_capacity) + 2,
    np.timedelta64("5", "m"),
    price_forecast=DFBackend(forecast_df['prices'], forecast_df['timestamps']),
    price_actual=DFBackend(actual_df['prices'], actual_df['timestamps']),
    carbon_emissions_forecast=DFBackend(forecast_df['emissions'], forecast_df['timestamps']),
    carbon_emissions_actual=DFBackend(actual_df['emissions'], actual_df['timestamps']),
    carbon_prices_forecast=DFBackend(forecast_df['prices'], forecast_df['timestamps']),
    carbon_prices_actual=DFBackend(actual_df['prices'], actual_df['timestamps']),
    volume_forecast=DFBackend(None, None),
    volume_actual=DFBackend(None, None),
)

#instantiate a test grid Object by feeding in the parsed test data as an argument to the Grid class
grid = Grid(
    timestep,
    "Grid",
    len(storage_capacity) + 2,
    "Simple Market as a Grid",
    "*/5 * * * *",
    np.timedelta64(5, "m"),
    np.timedelta64(5, "m"),
    np.timedelta64(5, "m"),
    grid_data,
)

# swap the training grid for the test grid and prepare the environment for inference
env.decision_unit.markets[0] = grid
env.decision_unit.generate_schedule(
    current_reference_time=np.datetime64(pd.Timestamp(forecast_df['timestamps'][0]))
)
env.start_time = pd.Timestamp(forecast_df['timestamps'][0])
# reset the battery's state of charge to 0.5
env.decision_unit.storage_entities[0].current_soc = 0.5

{}
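
A quick sanity check on the test data can be useful before it is parsed into the grid: the forecast and actual series should cover the same timestamps at the 5-minute cadence used above. The following is a minimal sketch on synthetic timestamps, not part of encortex; `check_alignment` is a hypothetical helper introduced only for illustration.

```python
from datetime import datetime, timedelta

def check_alignment(forecast_ts, actual_ts, step=timedelta(minutes=5)):
    """Hypothetical helper: True if both timestamp lists match and are evenly spaced."""
    forecast_ts, actual_ts = list(forecast_ts), list(actual_ts)
    if forecast_ts != actual_ts:
        return False
    # every consecutive pair must be exactly one step apart
    return all(later - earlier == step
               for earlier, later in zip(forecast_ts, forecast_ts[1:]))

# Synthetic timestamps standing in for the 'timestamps' columns (made up for illustration)
start = datetime(2020, 1, 1)
demo_ts = [start + i * timedelta(minutes=5) for i in range(6)]

aligned = check_alignment(demo_ts, demo_ts)

# Shifting one series by a minute breaks the match
shifted = [t + timedelta(minutes=1) for t in demo_ts]
misaligned = check_alignment(demo_ts, shifted)

# Dropping one sample leaves a 10-minute gap
gappy = demo_ts[:3] + demo_ts[4:]
has_gap = not check_alignment(gappy, gappy)

print(aligned, misaligned, has_gap)
```

In the notebook, the two lists would come from the `timestamps` columns of `forecast_df` and `actual_df` after parsing them into datetimes.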

1. As in the training inference pipeline, we reuse the same code snippet to generate results on the test dataset using MILP.

In [17]:
# generate test results for MILP
# Code for MILP to generate results on the test dataset:
if milp_flag:

    # instantiate an optimizer object based on the optimizer chosen, and create the model
    opt = MILPBattOpt(
        env=env, objective=weight_price, solver=solver[0], seed=seed
    )
    model = opt(train_flag=True)

    # provide the model to the environment so as to prepare the constraints and objectives
    env.set_model(model)

    # test the MILP model
    print("-------Producing results on test set--------")
    net_rewardt, rewardlist, battery_soc_list, action_list, power_list, carbon_intensity_list, price_intensity_list, carbon_savings_list, price_savings_list, carbon_savings_forecast_list, price_savings_forecast_list = testing(
        model, env, opt, milp_flag, simulated_annealing_flag)

    # dump the results into csv file
    testdf = create_dataframe(battery_soc_list, action_list, power_list, carbon_intensity_list, price_intensity_list,
                          carbon_savings_list, price_savings_list, rewardlist, carbon_savings_forecast_list, price_savings_forecast_list)
    testdf.to_csv(dir_path+"testdf_MILP.csv", index=False)

2. As in the training inference pipeline, we reuse the same code snippet to generate results on the test dataset using Simulated Annealing (SA).

In [18]:
# generate test results for Simulated Annealing
# Code for SA to generate results on the testing dataset:

if simulated_annealing_flag:
    
    #instantiate an optimizer object based on the optimizer chosen, and create the model
    opt = SimulatedAnnealingOpt(
            env=env, objective=weight_price, seed=seed
        )
    model = opt(train_flag=True)

    #test the Simulated Annealing model
    print("-------Producing results on testing set--------")
    net_rewardt, rewardlist, battery_soc_list, action_list, power_list, carbon_intensity_list, price_intensity_list, carbon_savings_list, price_savings_list, carbon_savings_forecast_list, price_savings_forecast_list = testing(
        model, env, opt, milp_flag, simulated_annealing_flag)

    # dump the results into csv file
    testdf = create_dataframe(battery_soc_list, action_list, power_list, carbon_intensity_list, price_intensity_list,
                          carbon_savings_list, price_savings_list, rewardlist, carbon_savings_forecast_list, price_savings_forecast_list)
    testdf.to_csv(dir_path+"testdf_SA.csv", index=False)


3. Similar to the training inference pipeline, we use the same code snippet to generate results on the test dataset using DRL.

In [None]:
# generate results for RL test data
if not (milp_flag or simulated_annealing_flag):  
    
    #load the model first, if trained from the python script training_RL.py 
    opt.load('model_checkpoints/', 'best_model')
      
    print("....... Starting Inference on the test dataset ........")
    # test the RL model on the test set
    net_rewardt, rewardlist, battery_soc_list, action_list, power_list, carbon_intensity_list, price_intensity_list, carbon_savings_list, price_savings_list, carbon_savings_forecast_list, price_savings_forecast_list = testing(
        opt.model, env, opt, milp_flag, simulated_annealing_flag)

    # dump the test results into a csv file
    testdf = create_dataframe(battery_soc_list, action_list, power_list, carbon_intensity_list, price_intensity_list,
                          carbon_savings_list, price_savings_list, rewardlist, carbon_savings_forecast_list, price_savings_forecast_list)
    testdf.to_csv(dir_path+"testdf_DRL.csv", index=False)

## Step 9: Result Visualization:

Instantiate the visualization object from the environment by passing 2 arguments:
- results_folder : The local folder name, where all the final results are stored
- optimizers : A list of optimizers for which results are present in the results_folder 

In [20]:
vi = env.visualize(results_folder=f"results_{country}", optimizers=["MILP", "DRL"])

After running the following cell, use the widgets to configure the plots:
- A multi-select list to choose one or more optimizers, so that their final savings can be compared. To select several, press Shift + left-click.
- Radio buttons to choose between the training and test files; the schedules generated by each selected optimizer are shown for the chosen file.
- A slider to select the day for which the battery schedules are shown.
- A Plot button to plot the results.

In [21]:

menu = widgets.SelectMultiple(
       options=['MILP', 'DRL', 'SA'],
       value=['MILP'],
       description='Optimizer:',
       disabled = False)
rdbutton = widgets.RadioButtons(
            options=['Training File', 'Test File'],
            value='Test File', 
            layout={'width': 'max-content'},
            description='File:',
            disabled=False
        )

slider = widgets.IntSlider(
                value=0,
                min=0,
                max=int(vi.tr_files[list(menu.value)[0]].shape[0]/(env.horizon/env.timestep)),
                step=1,
                description = "Day:")

button = widgets.Button(description='Plot')
out = widgets.Output()
def on_button_clicked(b):
    with out:
        clear_output()    
        vi.initial_plots(menu.value)

        if rdbutton.value == "Test File":
            # number of days available in the test results file
            n_test_days = int(vi.te_files[list(menu.value)[0]].shape[0]/(env.horizon/env.timestep))
            if slider.value > n_test_days:
                print(f"Please choose a day no greater than Day {n_test_days}, since that is the end of the test dataset")
                return

        train_data = list(vi.tr_files.values()) 
        test_data = list(vi.te_files.values())
        approach = list(menu.value)
        if len(approach) == 1:
            if approach[0] =="MILP":
                train_data=train_data[::2]
                test_data=test_data[::2]
            elif approach[0] == "DRL":
                train_data=train_data[1::2]
                test_data=test_data[1::2]
            else:
                pass

        if rdbutton.value == 'Training File':             
            title = f'Results for the Day {slider.value} of UK Data'
            vi.plot_results(train_data, slider.value, int(env.horizon/env.timestep), title, approach)
        else:
            title = f'Results for the Day {slider.value} of UK Data'
            vi.plot_results(test_data, slider.value, int(env.horizon/env.timestep), title, approach)
        
button.on_click(on_button_clicked)
info = display(Markdown("""# Savings over the whole dataset
- No. of days in the train dataset : {}
- No. of days in the test dataset : {}
\n 
\n 
Choose an Optimizer:""".format(int(vi.tr_files[list(menu.value)[0]].shape[0]/(env.horizon/env.timestep)),int(vi.te_files[list(menu.value)[0]].shape[0]/(env.horizon/env.timestep)))))
display(menu)
display(Markdown('''\n \nChoose the file to view results:'''))
display(rdbutton)
display(Markdown('''\n \nChoose a day for checking schedules:'''))
display(slider, button, out)

# Savings over the whole dataset
- No. of days in the train dataset : 728
- No. of days in the test dataset : 364
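
The day counts above come from dividing the number of rows in a results file by the number of timesteps per day, `env.horizon/env.timestep`. A minimal sketch of that arithmetic follows. It assumes a 24-hour horizon and a 5-minute timestep; both values are assumptions here, since the notebook reads them from `env`. The row counts are illustrative, and `days_in_results` is a hypothetical helper.

```python
from datetime import timedelta

# Assumed values; in the notebook these come from env.horizon and env.timestep
horizon = timedelta(hours=24)
timestep = timedelta(minutes=5)

# Number of scheduling steps in one horizon (one day under the assumption above)
steps_per_day = int(horizon / timestep)

def days_in_results(n_rows, steps_per_day):
    """Hypothetical helper: whole days covered by a results file with n_rows rows."""
    return int(n_rows / steps_per_day)

# Illustrative row counts, not read from the actual CSV files
full_days = days_in_results(104_832, steps_per_day)     # exactly 364 * 288 rows
partial_days = days_in_results(104_900, steps_per_day)  # a trailing partial day is truncated

print(steps_per_day, full_days, partial_days)
```

Because the division is truncated with `int`, a trailing partial day does not add to the count.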

 

 

The initial bar charts compare the "Total Savings" on the train and test files for each of the selected optimizers. The left-hand bar plots show the overall cost savings, whereas the right-hand plots show the carbon savings.

The plot below shows how the battery's state of charge (green line charts, one per selected optimizer) varies with the price (yellow plot) and the carbon emissions (red plot). Depending on the objective selected by the user, the price and carbon variations drive the charging and discharging schedules. For example, a common pattern in the schedule plots is that the battery discharges when prices are high and charges from the utility grid when prices are low, thus maximizing the profit for the consumer. A similar conclusion can be drawn for carbon arbitrage.
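
The charge-low, discharge-high pattern can be illustrated with a toy rule-based schedule. This is only a sketch of the intuition, not the MILP, SA, or DRL optimizers used in this notebook: `threshold_schedule`, its quantile thresholds, the fixed state-of-charge step, and the price values are all hypothetical. The initial state of charge of 0.5 mirrors the value set before testing.

```python
def threshold_schedule(prices, soc=0.5, step=0.25, low_q=0.25, high_q=0.75):
    """Toy rule: charge at or below a low price quantile, discharge at or above a high one."""
    ranked = sorted(prices)
    low = ranked[int(low_q * (len(ranked) - 1))]
    high = ranked[int(high_q * (len(ranked) - 1))]
    actions, socs = [], []
    for price in prices:
        if price <= low and soc + step <= 1.0:
            action = "charge"       # cheap energy: fill the battery
            soc += step
        elif price >= high and soc - step >= 0.0:
            action = "discharge"    # expensive energy: use the stored charge
            soc -= step
        else:
            action = "idle"         # mid-range price, or battery already full/empty
        actions.append(action)
        socs.append(soc)
    return actions, socs

# Made-up prices for one short window
prices = [30, 20, 25, 60, 90, 80, 40, 22]
actions, socs = threshold_schedule(prices)
for price, action, soc in zip(prices, actions, socs):
    print(price, action, soc)
```

Replacing the prices with carbon intensities gives the analogous picture for carbon arbitrage. The optimizers above additionally account for forecasts and battery constraints, which this rule ignores.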