# This notebook explains how to develop an Agent and assess its performance.

***Disclaimer***: This file references files in other directories. For the cross-references to work, it is recommended to start the notebook server from the root directory (`Grid2Op`) of the package and __not__ from the `getting_started` subdirectory:
```bash
cd Grid2Op
jupyter notebook
```

***NB*** For more information about how to use the package, a general help can be built locally (provided that sphinx is installed on the machine) with:
```bash
cd Grid2Op
make html
```
from the top directory of the package (usually `Grid2Op`).

Once built, the help can be accessed [here](../documentation/html/index.html).

It is recommended to have a look at the [0_basic_functionalities](0_basic_functionalities.ipynb) notebook before getting into this one.

**Objective**

This notebook covers the basics of how to "code" an Agent that takes actions on the powergrid. Examples of "expert agents", which take actions based on fixed rules, are given. More generic types of *Agent*, relying for example on machine learning / deep learning, are covered in the notebook [3_TrainingAnAgent](3_TrainingAnAgent.ipynb).

This notebook also describes the *Observation* class, which is useful for taking actions.

In [1]:
import os
import sys
import grid2op

In [2]:
res = None
try:
    from jyquickhelper import add_notebook_menu
    res = add_notebook_menu()
except ModuleNotFoundError:
    print("Impossible to automatically add a menu / table of content to this notebook.\nYou can download \"jyquickhelper\" package with: \n\"pip install jyquickhelper\"")
res

## I) Description of the observations

In this section we cover the observation class. For more information, we recommend having a look at the official documentation, either [here](../documentation/html/observation.html) or in the [Observations.py](../grid2op/Observations.py) file. Only basic concepts are detailed in this notebook.

### I.A) Getting an observation

An observation is returned by `env.step()`. The next cell creates an environment and retrieves an observation instance.

In [3]:
# import the useful classes
from grid2op.Runner import Runner
from grid2op.ChronicsHandler import Multifolder, GridStateFromFileWithForecasts
# make a runner
runner = Runner(init_grid_path=grid2op.CASE_14_FILE,
               path_chron=grid2op.CHRONICS_MLUTIEPISODE,
               gridStateclass=Multifolder,
               gridStateclass_kwargs={"gridvalueClass": GridStateFromFileWithForecasts},
               names_chronics_to_backend = grid2op.NAMES_CHRONICS_TO_BACKEND)
# initialize it
runner.make_env()

To perform a step, as stated in the short description above, we need an action. More information about actions is given in the [2_ActionRepresentation](2_ActionRepresentation.ipynb) notebook. Here we simply do nothing.

In [4]:
do_nothing_act = runner.env.helper_action_player({})
obs, reward, done, info = runner.env.step(do_nothing_act)

### I.B) Information present in an Observation

In this notebook we detail only the "CompleteObservation". `Grid2Op` allows modeling different kinds of observations, for example with incomplete or noisy data. CompleteObservation gives the full state of the powergrid, without any noise. It is the default observation used.

#### a) some of its attributes

An observation has calendar data (eg the time stamp of the observation):

In [5]:
obs.year, obs.month, obs.day, obs.hour_of_day, obs.minute_of_hour, obs.day_of_week

(2019, 1, 1, 0, 10, 1)
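
As an illustration, these calendar fields can be combined into a standard Python `datetime`. The sketch below hard-codes the values printed above rather than reading them from `obs`. It also assumes that `day_of_week` follows the Python convention (Monday = 0), which is consistent with 1 January 2019 being a Tuesday but is not confirmed here.

```python
from datetime import datetime

# Values copied from the output above (not read from a live observation).
year, month, day, hour_of_day, minute_of_hour, day_of_week = 2019, 1, 1, 0, 10, 1

timestamp = datetime(year, month, day, hour_of_day, minute_of_hour)
print(timestamp.isoformat())

# Assumption: day_of_week uses the same convention as datetime.weekday() (Monday = 0).
same_convention = timestamp.weekday() == day_of_week
print(same_convention)
```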

It has some generic information about the powergrid:

In [6]:
print("Number of generators of the powergrid: {}".format(obs.n_gen))
print("Number of loads of the powergrid: {}".format(obs.n_load))
print("Number of powerline of the powergrid: {}".format(obs.n_lines))
print("Number of elements connected to each substations in the powergrid: {}".format(obs.subs_info))
print("Total number of elements: {}".format(obs.dim_topo))

Number of generators of the powergrid: 5
Number of loads of the powergrid: 11
Number of powerline of the powergrid: 20
Number of elements connected to each substations in the powergrid: [3 6 4 6 5 6 3 2 5 3 3 3 4 3]
Total number of elements: 56


It has some information about the generators (each generator can be viewed as a point in a 3 dimensional space)

In [7]:
print("Generators active production: {}".format(obs.prod_p))
print("Generators reactive production: {}".format(obs.prod_q))
print("Generators voltage setpoint : {}".format(obs.prod_v))

Generators active production: [  0.          23.3         69.1          0.         149.48472817]
Generators reactive production: [47.13340788 58.60853089 14.68417071 23.59794742 -0.49975789]
Generators voltage setpoint : [141.075   136.35      0.22256  13.08    143.1    ]


It has some information about the loads (each load is a point in a 3 dimensional space too)

In [8]:
print("Loads active consumption: {}".format(obs.load_p))
print("Loads reactive consumption: {}".format(obs.load_q))
print("Loads voltage (voltage magnitude of the bus to which it is connected) : {}".format(obs.load_v))

Loads active consumption: [18.8 85.1 13.8 44.3  7.1 10.2 27.1  8.2  3.2  5.7 11.8]
Loads reactive consumption: [13.1 60.1  9.5 31.5  4.9  7.1 19.1  5.8  2.2  4.   8.4]
Loads voltage (voltage magnitude of the bus to which it is connected) : [141.075        0.21788455   0.21533202   0.21830444   0.21692824
   0.2111932  137.68220138   0.22256     14.72610185  13.08
   0.21591106]


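In this setting, a powerline can be viewed as a point in an 8 dimensional space, defined by the following four quantities measured at each of its two ends (origin and extremity):
   * active flow
   * reactive flow
   * voltage magnitude
   * current flow

These quantities are accessed as follows: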

In [9]:
print("Origin active flow: {}".format(obs.p_or))
print("Origin reactive flow: {}".format(obs.q_or))
print("Origin current flow: {}".format(obs.a_or))
print("Origin voltage (voltage magnitude to the bus to which the origin end is connected): {}".format(obs.v_or))
print("Extremity active flow: {}".format(obs.p_ex))
print("Extremity reactive flow: {}".format(obs.q_ex))
print("Extremity current flow: {}".format(obs.a_ex))
print("Extremity voltage (voltage magnitude to the bus to which the extremity end is connected): {}".format(obs.v_ex))

Origin active flow: [ 1.08718903e+02  4.07658248e+01 -8.32701461e+00  4.06838088e-01
 -1.65603459e+01  3.00462070e+00  1.37826937e+01  4.38146976e+01
  2.86127910e+01  1.54475549e+01 -1.88518005e+01 -5.44944509e+01
  2.03193013e+01  8.80848061e+00  2.29678572e+01  1.22123569e+01
  6.96746657e+00 -6.80436085e+00 -1.77635684e-14  1.22123569e+01]
Origin reactive flow: [ -8.43868623   7.93892829   6.60041696   8.53341302   0.71187573
   0.16055117   1.76753172   7.3023995    9.62624918   8.27511716
   6.78590648  -4.43027206   2.71055199   4.37671262  10.75525783
  -9.02161853   1.03975934  10.59191251 -22.7723314   13.29921174]
Origin current flow: [4.39955890e+02 1.67563361e+02 2.84132202e+04 2.28444271e+04
 4.44427624e+04 7.95766368e+03 3.69827824e+04 1.81785259e+02
 1.23547356e+02 7.17187011e+01 8.48386918e+01 2.31374890e+02
 5.31779004e+04 2.55156361e+04 6.57907164e+04 6.42537364e+01
 2.98120010e+01 5.27909744e+01 8.92810046e+02 7.07892801e+02]
Origin voltage (voltage magnitude to the

The last piece of information about the powerlines is the $\rho$ ratio, *i.e.* the ratio between the current flow on each powerline and its thermal limit. It can be accessed with:

In [10]:
obs.rho

array([0.01039126, 0.00395766, 0.00103397, 0.00083132, 0.0016173 ,
       0.00028958, 0.00134582, 0.00429356, 0.00291805, 0.00169392,
       0.00200379, 0.00546481, 0.00193517, 0.00092853, 0.00239416,
       0.0015176 , 0.00070413, 0.00124686, 0.00218682, 0.00173389])
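
To make the definition of $\rho$ concrete, here is a minimal sketch that computes it by hand with numpy. The current flows and thermal limits are made-up values for four hypothetical powerlines. They are not taken from the environment above.

```python
import numpy as np

# Hypothetical current flows (A) and thermal limits (A) for 4 powerlines.
current_flow = np.array([120.0, 450.0, 80.0, 310.0])
thermal_limit = np.array([400.0, 400.0, 200.0, 300.0])

# rho is the element-wise ratio between the flow and the limit.
rho = current_flow / thermal_limit

# A powerline is in overflow when its rho exceeds 1.
overloaded = np.flatnonzero(rho > 1.0)
most_loaded = int(np.argmax(rho))
print(rho)
print(overloaded, most_loaded)
```

A value of $\rho$ above 1 means the flow exceeds the thermal limit of the powerline, which is what `obs.timestep_overflow` keeps track of over time.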

The observation also stores information about the topology and the state of the powerlines.

In [11]:
obs.timestep_overflow  # number of time steps each powerline has been in overflow (one component per powerline)
obs.line_status  # status of each powerline: True = connected, False = disconnected
obs.topo_vect  # topology vector: for each element (generator, load, each end of a powerline), the bus to which
# it is connected: 1 = bus 1, 2 = bus 2.

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

More information about this `topology vector` is given in the documentation [here](../documentation/html/observation.html).
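
As a rough illustration, the topology vector can be cut into one chunk per substation using the `subs_info` counts printed earlier. This sketch assumes that the elements of a given substation occupy consecutive positions in the vector. The documentation describes the exact ordering and should be checked before relying on this.

```python
import numpy as np

# Number of elements per substation, copied from the `subs_info` output above.
subs_info = np.array([3, 6, 4, 6, 5, 6, 3, 2, 5, 3, 3, 3, 4, 3])
# Topology vector where every element is on bus 1, as in the output above.
topo_vect = np.ones(subs_info.sum(), dtype=int)

# Assumption: elements of one substation are stored contiguously in topo_vect.
per_substation = np.split(topo_vect, np.cumsum(subs_info)[:-1])

for sub_id, buses in enumerate(per_substation[:3]):
    print("substation {}: {} elements, buses used: {}".format(sub_id, buses.size, np.unique(buses)))
```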

#### b) some of its methods

An observation can be converted to / from a flat numpy vector. This is useful for interacting with machine learning libraries or for storing the observation, but it is less human-readable. It consists of stacking all the above-mentioned information in a single `numpy.float64` vector.

In [12]:
vector_representation_of_observation = obs.to_vect()
vector_representation_of_observation

array([ 2.01900000e+03,  1.00000000e+00,  1.00000000e+00,  1.00000000e+00,
        0.00000000e+00,  1.00000000e+01,  0.00000000e+00,  2.33000000e+01,
        6.91000000e+01,  0.00000000e+00,  1.49484728e+02,  4.71334079e+01,
        5.86085309e+01,  1.46841707e+01,  2.35979474e+01, -4.99757886e-01,
        1.41075000e+02,  1.36350000e+02,  2.22560000e-01,  1.30800000e+01,
        1.43100000e+02,  1.88000000e+01,  8.51000000e+01,  1.38000000e+01,
        4.43000000e+01,  7.10000000e+00,  1.02000000e+01,  2.71000000e+01,
        8.20000000e+00,  3.20000000e+00,  5.70000000e+00,  1.18000000e+01,
        1.31000000e+01,  6.01000000e+01,  9.50000000e+00,  3.15000000e+01,
        4.90000000e+00,  7.10000000e+00,  1.91000000e+01,  5.80000000e+00,
        2.20000000e+00,  4.00000000e+00,  8.40000000e+00,  1.41075000e+02,
        2.17884546e-01,  2.15332022e-01,  2.18304445e-01,  2.16928244e-01,
        2.11193196e-01,  1.37682201e+02,  2.22560000e-01,  1.47261019e+01,
        1.30800000e+01,  

An observation can be copied, of course:

In [13]:
obs2 = obs.copy()

Or reset:

In [14]:
obs2.reset()
print(obs2.prod_p)

None


Or loaded from a vector:

In [15]:
obs2.from_vect(vector_representation_of_observation)
obs2.prod_p

array([  0.        ,  23.3       ,  69.1       ,   0.        ,
       149.48472817])
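
The round trip between named attributes and a flat vector can be sketched with plain numpy. This is only a toy model of the idea, using two attributes whose values are copied from the outputs above. It is not the implementation used by `to_vect` / `from_vect`.

```python
import numpy as np

# Toy "observation" with two attributes (values copied from the outputs above).
attributes = {
    "prod_p": np.array([0.0, 23.3, 69.1, 0.0, 149.48472817]),
    "load_p": np.array([18.8, 85.1, 13.8, 44.3, 7.1, 10.2, 27.1, 8.2, 3.2, 5.7, 11.8]),
}
names = list(attributes)  # fixed attribute order
sizes = [attributes[name].size for name in names]

# "to_vect": stack everything in a single float64 vector.
vect = np.concatenate([attributes[name] for name in names]).astype(np.float64)

# "from_vect": cut the vector back at the cumulative sizes.
chunks = np.split(vect, np.cumsum(sizes)[:-1])
restored = dict(zip(names, chunks))
print(vect.shape, restored["prod_p"])
```

The key design point is that both directions must agree on the attribute order and sizes, which is why they are stored once and reused.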

It is also possible to check whether two observations are equal:

In [16]:
obs == obs2

True

For this type of observation, it is also possible to retrieve the topology as a matrix. The topology matrix can be obtained in two different formats.

*Format 1*: the `connectivity matrix`, which has as many rows / columns as the number of elements in the powergrid (remember that an element is either an end of a powerline, a generator or a load) and which says whether 2 elements are connected to one another or not:

$$
\left\{
\begin{aligned}
\text{conn mat}[i,j] = 0 & ~\text{element i and j are NOT connected to the same bus}\\
\text{conn mat}[i,j] = 1 & ~\text{element i and j are connected to the same bus, or i and j are both end of the same powerline}\\
\end{aligned}
\right.
$$

In [17]:
obs.connectity_matrix()

array([[0., 1., 1., ..., 0., 0., 0.],
       [1., 0., 1., ..., 0., 0., 0.],
       [1., 1., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 1., 1.],
       [0., 0., 0., ..., 1., 0., 1.],
       [0., 0., 0., ..., 1., 1., 0.]])

This representation has the advantage of always having the same dimension, regardless of the topology of the powergrid.
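
The definition above can be reproduced on a toy grid. The sketch below uses a made-up layout with 4 elements: a generator and a powerline origin in substation 0, then the powerline extremity and a load in substation 1. The array names are hypothetical and disconnected elements are ignored.

```python
import numpy as np

# Hypothetical toy grid: element index -> substation, and element index -> bus.
sub_of_element = np.array([0, 0, 1, 1])
topo_vect = np.array([1, 1, 1, 1])
# Each powerline is given by the element indices of its (origin, extremity) ends.
line_ends = [(1, 2)]

# Two elements share a bus when they are in the same substation and on the same bus.
same_sub = sub_of_element[:, None] == sub_of_element[None, :]
same_bus = topo_vect[:, None] == topo_vect[None, :]
conn_mat = (same_sub & same_bus).astype(float)

# Both ends of a powerline are also marked as connected.
for i_or, i_ex in line_ends:
    conn_mat[i_or, i_ex] = 1.0
    conn_mat[i_ex, i_or] = 1.0

# The diagonal is set to 0, as in the output shown above.
np.fill_diagonal(conn_mat, 0.0)
print(conn_mat)
```

Moving an element to bus 2 in `topo_vect` changes the entries of the matrix but not its shape, which stays at (number of elements) x (number of elements).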

*Format 2*: the `bus connectivity matrix` has as many rows / columns as the number of active buses of the powergrid. It should be understood as follows:

$$
\left\{
\begin{aligned}
\text{bus conn mat}[i,j] = 0 & ~\text{no powerline connects bus i to bus j}\\
\text{bus conn mat}[i,j] = 1 & ~\text{at least one powerline connects bus i to bus j (or i == j)}\\
\end{aligned}
\right.
$$

In [18]:
obs.bus_connectivity_matrix()

array([[1., 1., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [1., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 1., 1., 1., 0., 1., 0., 1., 0., 0., 0., 0., 0.],
       [1., 1., 0., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 1., 0., 0., 0., 0., 1., 1., 1., 0.],
       [0., 0., 0., 1., 0., 0., 1., 1., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 1., 0., 1., 1., 0., 0., 0., 1.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 1., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 1., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 1., 1.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 1., 1.]])
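
The same definition can be sketched on a small made-up grid with 4 buses and 3 powerlines. The list of powerlines is hypothetical and only illustrates how the matrix is filled.

```python
import numpy as np

n_bus = 4
# Hypothetical powerlines, each given as (origin bus, extremity bus).
lines = [(0, 1), (1, 2), (1, 3)]

# The diagonal is 1 because of the "or i == j" case in the definition above.
bus_conn_mat = np.eye(n_bus)
for bus_or, bus_ex in lines:
    bus_conn_mat[bus_or, bus_ex] = 1.0
    bus_conn_mat[bus_ex, bus_or] = 1.0

# Number of neighbouring buses of each bus (the diagonal term is removed).
degree = bus_conn_mat.sum(axis=1) - 1
print(bus_conn_mat)
print(degree)
```

Unlike the first format, the size of this matrix depends on the number of active buses, so it changes when a substation is split into two buses.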

#### c) Simulate

As opposed to most RL problems, in this framework we add the possibility to "simulate" the impact of a possible action on the power grid.

This `simulate` method uses the available forecasts (forecasts are made available by the way we loaded the data here, with the class `GridStateFromFileWithForecasts`; for this class, only forecasts 1 time step ahead are provided, but this might be adapted in the future).

Note that this `simulate` function can use a different simulator than the one used by the Environment. For more information, we encourage you to read the official documentation or, if it has been built locally (recommended), to consult [this page](../documentation/html/observation.html#grid2op.Observation.Observation.simulate).

This function will:

1. apply the forecasted injection on the powergrid
2. run a powerflow with the dedicated `simulate` powerflow simulator
3. return:
    1. the anticipated observation (after the action has been taken)
    2. the anticipated reward (of this simulated action)
    3. whether or not there has been an error
    4. some additional information
    
From a user's point of view, this is the main difference with the previous [pypownet](https://github.com/MarvinLer/pypownet) framework. There, this "simulation" was performed directly by the environment, which gave the Agent direct access to the Environment and could break the RL framework (this was not an issue in the first edition of Learning to Run a Power Network, as the Environment was fully observable).

In [19]:
do_nothing_act = runner.env.helper_action_player({})
obs_sim, reward_sim, is_done_sim, info_sim = obs.simulate(do_nothing_act)

In [20]:
obs_sim.prod_p

array([  0.        ,  23.5       ,  69.6       ,   0.        ,
       148.44173557])

In [21]:
obs.prod_p

array([  0.        ,  23.3       ,  69.1       ,   0.        ,
       149.48472817])
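
To see the effect of the forecast, the two production vectors can be compared directly. The sketch below hard-codes the two arrays printed above instead of calling `simulate` again.

```python
import numpy as np

# Values copied from the two outputs above.
prod_p_now = np.array([0.0, 23.3, 69.1, 0.0, 149.48472817])
prod_p_sim = np.array([0.0, 23.5, 69.6, 0.0, 148.44173557])

# Difference between the simulated (forecast) and the current active production.
delta = prod_p_sim - prod_p_now
changed = np.flatnonzero(~np.isclose(delta, 0.0))
print(np.round(delta, 3))
print(changed)
```

The simulated observation therefore differs from the current one even for a "do nothing" action, because it is computed on the forecast injections for the next time step.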

## II) Taking actions based on these

In this section we will make our first *Agent* that will act based on these observations.

All *Agents* must derive from the `grid2op.Agent` class. The main function to implement is the `act` function (more information in the official documentation or, if the documentation has been built locally, [here](../documentation/html/agent.html)).

Basically, the Agent receives a reward and an observation, and suggests a new action. Several *Agents* are predefined in the grid2op package. We won't present them here (for more information see the documentation or the [Agent.py](../grid2op/Agent.py) file); instead, we will make a custom Agent.

This *Agent* will select among:

- do nothing 
- disconnecting the powerline with the highest relative flow
- reconnecting a disconnected powerline
- disconnecting the powerline with the lowest relative flow

by using `simulate` on the corresponding actions, and choosing the one that has the highest predicted reward.
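
The selection rule can be sketched independently of grid2op. In the example below, `fake_simulate` and the reward values are hypothetical stand-ins for `observation.simulate`. The only thing assumed about the real method is that it returns a 4-tuple whose second item is the reward, as shown earlier in this notebook.

```python
def pick_best_action(candidate_actions, simulate):
    """Return the candidate with the highest simulated reward (the first one wins ties)."""
    best_action, best_reward = None, float("-inf")
    for action in candidate_actions:
        _, reward, _, _ = simulate(action)
        if reward > best_reward:
            best_action, best_reward = action, reward
    return best_action, best_reward


# Hypothetical rewards: in a real agent these would come from `observation.simulate`.
fake_rewards = {"do_nothing": 19.9, "disconnect_max": 12.0, "reconnect": 20.1}


def fake_simulate(action):
    # Mimics the (observation, reward, done, info) tuple returned by `simulate`.
    return None, fake_rewards[action], False, {}


best_action, best_reward = pick_best_action(list(fake_rewards), fake_simulate)
print(best_action, best_reward)
```

Listing the "do nothing" action first makes it win ties, so the agent only acts when a simulated action is strictly better.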

Note that this kind of Agent is not particularly smart and is given only as an example.

More information about the creation / manipulation of *Action* will be given in the notebook [2_Action_GridManipulation](2_Action_GridManipulation.ipynb)

In [22]:
from grid2op.Agent import Agent
import numpy as np


class MyAgent(Agent):
    def __init__(self, action_space):
        # initialize the base Agent class with the action space
        Agent.__init__(self, action_space)
        # the "do nothing" action is built once and reused at each step
        self.do_nothing = self.action_space({})
        self.print_next = False
        
    def act(self, observation, reward, done=False):
        # candidate 1: disconnect the powerline with the highest relative flow
        i_max = np.argmax(observation.rho)
        new_status_max = np.zeros(observation.rho.shape)
        new_status_max[i_max] = -1
        act_max = self.action_space({"set_status": new_status_max})
        
        # candidate 2: act on the powerline with the lowest relative flow
        i_min = np.argmin(observation.rho)
        new_status_min = np.zeros(observation.rho.shape)
        if observation.rho[i_min] > 0:
            # all powerlines are connected: try to disconnect this one
            new_status_min[i_min] = -1
            act_min = self.action_space({"set_status": new_status_min})
        else:
            # at least one powerline is disconnected: try to reconnect it on bus 1 at both ends
            new_status_min[i_min] = 1
            act_min = self.action_space({"set_status": new_status_min,
                                         "set_bus": {"lines_or": [(i_min, 1)], "lines_ex": [(i_min, 1)]}})
    
        # simulate the three candidates; only the predicted reward is kept
        _, reward_sim_dn, *_ = observation.simulate(self.do_nothing)
        _, reward_sim_max, *_ = observation.simulate(act_max)
        _, reward_sim_min, *_ = observation.simulate(act_min)
            
        if reward_sim_dn >= reward_sim_max and reward_sim_dn >= reward_sim_min:
            self.print_next = False
            res = self.do_nothing
        elif reward_sim_max >= reward_sim_min:
            self.print_next = True
            res = act_max
            print(res)
        else:
            self.print_next = True
            res = act_min
            print(res)
        return res

We compare this Agent with the DoNothing agent (already implemented in the package), using the 3 episodes made available with this package (only one of them is run here, since `nb_episode=1`). To make the comparison more interesting, it is better to use the L2RPN rewards.

In [23]:
from grid2op.main import main
from grid2op.Agent import DoNothingAgent
from grid2op.Reward import L2RPNReward
res = main(nb_episode=1,
           agent_class=DoNothingAgent,
           path_casefile=grid2op.CASE_14_FILE,
           path_chronics=grid2op.CHRONICS_MLUTIEPISODE,
           names_chronics_to_backend=grid2op.NAMES_CHRONICS_TO_BACKEND,
           gridStateclass_kwargs={"gridvalueClass": GridStateFromFileWithForecasts},
           reward_class=L2RPNReward
          )
print("The results for DoNothing agent are:")
for chron_name, cum_reward, nb_time_step, max_ts in res:
    msg_tmp = "\tFor chronics located at {}\n".format(chron_name)
    msg_tmp += "\t\t - cumulative reward: {:.6f}\n".format(cum_reward)
    msg_tmp += "\t\t - number of time steps completed: {:.0f} / {:.0f}".format(nb_time_step, max_ts)
    print(msg_tmp)

The results for DoNothing agent are:
	For chronics located at /home/donnotben/.local/lib/python3.6/site-packages/grid2op/data/test_multi_chronics/1
		 - cumulative reward: 5739.951023
		 - number of time steps completed: 287 / 287


In [24]:
from grid2op.main import main
from grid2op.Agent import DoNothingAgent
from grid2op.Reward import L2RPNReward
res = main(nb_episode=1,
           agent_class=MyAgent,
           path_casefile=grid2op.CASE_14_FILE,
           path_chronics=grid2op.CHRONICS_MLUTIEPISODE,
           names_chronics_to_backend=grid2op.NAMES_CHRONICS_TO_BACKEND,
           gridStateclass_kwargs={"gridvalueClass": GridStateFromFileWithForecasts},
           reward_class=L2RPNReward
          )
print("The results for the custom agent are:")
for chron_name, cum_reward, nb_time_step, max_ts in res:
    msg_tmp = "\tFor chronics located at {}\n".format(chron_name)
    msg_tmp += "\t\t - cumulative reward: {:.6f}\n".format(cum_reward)
    msg_tmp += "\t\t - number of time steps completed: {:.0f} / {:.0f}".format(nb_time_step, max_ts)
    print(msg_tmp)

This action will:
	 - NOT change anything to the injections
	 - force disconnection of 1 powerlines ([10])
	 - NOT switch any line status
	 - NOT switch anything in the topology
	 - NOT force any particular bus configuration
The results for the custom agent are:
	For chronics located at /home/donnotben/.local/lib/python3.6/site-packages/grid2op/data/test_multi_chronics/1
		 - cumulative reward: 5739.951012
		 - number of time steps completed: 287 / 287


As we can see, the performance (cumulative reward) of the second (custom) Agent is slightly degraded: its cumulative reward is 5739.95***1012***, whereas the baseline (do nothing) reached 5739.95***1023***.

This agent is NOT recommended.

In [25]:
from grid2op.Agent import PowerLineSwitch
res = main(nb_episode=1,
           agent_class=PowerLineSwitch,
           path_casefile=grid2op.CASE_14_FILE,
           path_chronics=grid2op.CHRONICS_MLUTIEPISODE,
           names_chronics_to_backend=grid2op.NAMES_CHRONICS_TO_BACKEND,
           gridStateclass_kwargs={"gridvalueClass": GridStateFromFileWithForecasts},
           reward_class=L2RPNReward
          )
print("The results for the PowerLineSwitch agent are:")
for chron_name, cum_reward, nb_time_step, max_ts in res:
    msg_tmp = "\tFor chronics located at {}\n".format(chron_name)
    msg_tmp += "\t\t - cumulative reward: {:.6f}\n".format(cum_reward)
    msg_tmp += "\t\t - number of time steps completed: {:.0f} / {:.0f}".format(nb_time_step, max_ts)
    print(msg_tmp)

The results for the PowerLineSwitch agent are:
	For chronics located at /home/donnotben/.local/lib/python3.6/site-packages/grid2op/data/test_multi_chronics/1
		 - cumulative reward: 5739.952376
		 - number of time steps completed: 287 / 287


However, we want to emphasize that doing nothing is NOT the best solution, even in this simple case. For example, an agent that at each time step greedily chooses the disconnection / reconnection maximizing the anticipated reward, such as the `PowerLineSwitch` agent run above, reaches a cumulative reward of 5739.95***2376*** in this situation.
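
The three results can be put side by side with a few lines of Python. The numbers below are copied from the outputs of the three runs above. The dictionary and variable names are only illustrative.

```python
# Cumulative rewards copied from the three runs above.
cum_rewards = {
    "DoNothingAgent": 5739.951023,
    "MyAgent": 5739.951012,
    "PowerLineSwitch": 5739.952376,
}
baseline = cum_rewards["DoNothingAgent"]

# Rank the agents from the highest to the lowest cumulative reward.
ranking = sorted(cum_rewards, key=cum_rewards.get, reverse=True)
for name in ranking:
    gap = cum_rewards[name] - baseline
    print("{:<16s} {:.6f} ({:+.6f} vs do nothing)".format(name, cum_rewards[name], gap))
```

The gaps are tiny on this single episode, so this comparison only illustrates the method and says little about which agent is better in general.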