# Prescient Tutorial

## Getting Started
This tutorial demonstrates the basic functionality of Prescient. Please follow the installation instructions in the [README](https://github.com/grid-parity-exchange/Prescient/blob/master/README.md) before proceeding. The tutorial assumes the CBC MIP solver; we will point out where a different solver (CPLEX, Gurobi, Xpress) could be used instead. For larger systems, a commercial MIP solver is recommended.
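
Before going further, it can help to confirm that a solver executable is visible from Python. The following is a minimal sketch, assuming the CBC binary is installed under the name `cbc` and is reachable through `PATH`; the helper `solver_on_path` is a hypothetical name and not part of Prescient.

```python
import shutil

def solver_on_path(executable: str) -> bool:
    """Return True if an executable with this name is found on PATH."""
    return shutil.which(executable) is not None

# "cbc" is assumed to be the name of the CBC binary on this machine
for name in ("cbc",):
    status = "found" if solver_on_path(name) else "not found on PATH"
    print(f"{name}: {status}")
```

A solver that is reached through Python bindings rather than an executable would not be detected by this check.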

## Sample 5-bus case
We will use the example 5-bus case that comes with Prescient. This example is in the "rts-gmlc" format for Prescient, which is designed to be compatible with the RTS-GMLC dataset (publicly available [here](https://github.com/GridMod/RTS-GMLC)). To find out more about the RTS-GMLC system, or if you use the RTS-GMLC system in published research, please see or cite the [RTS-GMLC paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8753693&isnumber=4374138&tag=1).

The format is defined as a collection of csv files, divided into "static" system data and time-varying system data. The static data include branches (`branch.csv`), buses (`bus.csv`), generators (`gen.csv`), initial generator status (`initial_status.csv`), and reserves (`reserves.csv`). The file `timeseries_pointers.csv` links static attributes defined in the aforementioned files to time-varying attributes, for quantities such as load, reserve requirement, and renewables output. Critically, Prescient requires both a day-ahead time series, which serves as the forecast for the day-ahead unit commitment problem, and a real-time time series, which is used as the realized values in the real-time economic dispatch problem. The file `simulation_objects.csv` provides metadata on the time series data, such as begin date, end date, and the temporal resolution.

A complete description of the RTS-GMLC format can be found in the [RTS-GMLC repository](https://github.com/GridMod/RTS-GMLC/tree/master/RTS_Data/SourceData).

In [None]:
import os
sorted(os.listdir("5bus"))

### Looking at the data
First, we'll peek at the `simulation_objects.csv` file:

In [None]:
import pandas as pd
pd.read_csv(os.path.join("5bus","simulation_objects.csv"))

This shows that the effective dates for this data are January 1, 2020 to December 31, 2020, limiting the potential scope of any Prescient simulation to be within these dates. Further, based on the `Period_Resolution` we see that the day-ahead data is specified at an hourly resolution and the real-time data is specified at a 5-minute resolution.

Prescient supports running the real-time economic dispatch problem at a finer or coarser frequency than the provided data, so long as the frequency evenly divides 60 minutes.
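
The divisibility rule can be illustrated with a few lines of plain Python. This is only a sketch of the arithmetic stated above; `divides_hour` is a hypothetical helper, and whether Prescient accepts every value in the resulting list is an assumption this tutorial does not confirm.

```python
def divides_hour(frequency_minutes: int) -> bool:
    """True if an hour splits into a whole number of periods of this length."""
    return frequency_minutes > 0 and 60 % frequency_minutes == 0

# every whole-minute frequency that evenly divides 60 minutes
candidates = [m for m in range(1, 61) if divides_hour(m)]
print(candidates)
```

Under this rule a 15-minute frequency qualifies, while a 7-minute frequency does not.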


Similarly, we can look at the static data files:

In [None]:
pd.read_csv(os.path.join("5bus","branch.csv"))

In [None]:
pd.read_csv(os.path.join("5bus","bus.csv"))

In [None]:
pd.read_csv(os.path.join("5bus","gen.csv"))

In [None]:
pd.read_csv(os.path.join("5bus","reserves.csv"))

Currently, the 5-bus simulation does not define any reserve requirements. We will show later how to add a basic reserve requirement as part of the options to Prescient.


`timeseries_pointers.csv` connects columns in the data files (such as `gen.csv`'s `PMax MW`) to time series data, and also defines the load for each Area defined in `bus.csv`.

In [None]:
pd.read_csv(os.path.join("5bus","timeseries_pointers.csv"))

Here `Simulation` tells us whether the data is for day-ahead or real-time, `Category` can be `Generator`, `Area`, or `Reserve`, `Object` tells us the name or row(s) in the associated data file, and `Parameter` specifies which column is updated with time series data. Finally, `Data File` is the relative or absolute location of the data csv file on disk.


As an example, let's look at the `DAY_AHEAD_renewables.csv`:

In [None]:
pd.read_csv(os.path.join("5bus","DAY_AHEAD_renewables.csv"))

We see the `Year`/`Month`/`Day`/`Period` specified for each value, and the other columns are associated with the `Object` from `timeseries_pointers.csv`. Note that there's only one value for each item -- by the definitions in `timeseries_pointers.csv`, `1_HYDRO` and `2_RTPV` have **both** their `PMin MW` **and** `PMax MW` updated with these time series values, whereas `10_PV` and `4_WIND` have **only** their `PMax MW` updated. Because both `10_PV` and `4_WIND` have a `PMin MW` of `0`, these resources are fully curtailable, whereas `1_HYDRO` and `2_RTPV` are must-take because `PMin MW` and `PMax MW` are set to the same value.
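
The must-take versus curtailable distinction follows mechanically from which parameters each object has pointed at a time series. The sketch below hard-codes the (`Object`, `Parameter`) pairs described in the paragraph above rather than reading `timeseries_pointers.csv`, and the `classify` helper is hypothetical.

```python
from collections import defaultdict

# (Object, Parameter) pairs as described in the text; illustrative only,
# not read from timeseries_pointers.csv
pointer_rows = [
    ("1_HYDRO", "PMin MW"), ("1_HYDRO", "PMax MW"),
    ("2_RTPV", "PMin MW"), ("2_RTPV", "PMax MW"),
    ("10_PV", "PMax MW"),
    ("4_WIND", "PMax MW"),
]

# collect the set of time-varying parameters for each object
params_by_object = defaultdict(set)
for obj, param in pointer_rows:
    params_by_object[obj].add(param)

def classify(params):
    """Label an object from the parameters that follow a time series."""
    if {"PMin MW", "PMax MW"} <= params:
        # assumes both parameters point at the same series, as in this case
        return "must-take"
    if params == {"PMax MW"}:
        # assumes the static PMin MW in gen.csv is 0, as in this case
        return "curtailable"
    return "other"

classification = {obj: classify(params) for obj, params in params_by_object.items()}
print(classification)
```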


Finally, let's look at `DAY_AHEAD_load.csv`:

In [None]:
pd.read_csv(os.path.join("5bus","DAY_AHEAD_load.csv"))

The load for each Area is distributed among the buses by using a load participation factor calculated from the `MW Load` column for each bus. Recalling that data:

In [None]:
pd.read_csv(os.path.join("5bus","bus.csv"))[["Bus ID","MW Load", "Area"]]

So buses 1, 4, and 10 in Area 1 have load participation factors of `0.0`, `1.0` and `0.0`, respectively, while buses 2 and 3 in Area 2 both have load participation factors of `0.5`.
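
As a rough sketch of the calculation, assume each bus's factor is its `MW Load` divided by the total `MW Load` of its Area. The `MW Load` values and the 80 MW area load below are placeholders chosen only to reproduce the factors quoted above; the real values are in `bus.csv`.

```python
# Placeholder "MW Load" values, chosen to reproduce the factors in the text
buses = [
    {"Bus ID": 1, "MW Load": 0.0, "Area": 1},
    {"Bus ID": 4, "MW Load": 100.0, "Area": 1},
    {"Bus ID": 10, "MW Load": 0.0, "Area": 1},
    {"Bus ID": 2, "MW Load": 50.0, "Area": 2},
    {"Bus ID": 3, "MW Load": 50.0, "Area": 2},
]

# total MW Load per Area
area_totals = {}
for bus in buses:
    area_totals[bus["Area"]] = area_totals.get(bus["Area"], 0.0) + bus["MW Load"]

# participation factor = bus MW Load / Area total (assumed formula)
factors = {bus["Bus ID"]: bus["MW Load"] / area_totals[bus["Area"]] for bus in buses}

# split a hypothetical 80 MW Area 2 load across its buses
area_2_load = 80.0
bus_loads = {
    bus["Bus ID"]: factors[bus["Bus ID"]] * area_2_load
    for bus in buses
    if bus["Area"] == 2
}
print(factors)
print(bus_loads)
```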

NOTE: Currently for Prescient you **must** specify load by Area. If load is individualized to a specific bus, then it can be part of its own area.


### Real-Time Data
While we haven't looked at them explicitly, the real-time csv files work in a similar fashion to the day-ahead files. The day-ahead time series are applied to the day-ahead unit commitment, and the real-time time series are applied to the real-time economic dispatch. This allows Prescient to realistically represent uncertainty in the system when committing units in the day-ahead market.

## Running the simulator
Next we show how to set up and run the simulator. From a Python script this can be accomplished easily, if you know the needed options:

In [None]:
from prescient.simulator import Prescient
Prescient().simulate(
        data_path = "5bus",     # Where the source data is
        input_format = "rts-gmlc", # Use the rts-gmlc input format, default is deprecated dat-files
        simulate_out_of_sample = True, # This option directs the simulator to use different forecasts from actuals.
                                       # If False, the simulation is run with forecasts equal to actuals
        run_sced_with_persistent_forecast_errors = True, # This option directs the simulator to use forecasts
                                                         # (adjusted by the current forecast error) for SCED
                                                         # look-ahead periods, instead of using the actuals
                                                         # for SCED look-ahead periods.
        output_directory = "5bus_output", # Where to write the output data
        start_date = "07-10-2020", # Date to start the simulation on, must be within the range of the data.
        num_days = 7, # Number of days to simulate, including the start date. All days must be in the data.
        reserve_factor = 0.1, # Additional reserve factor *not* included in the data.
                              # Input is a fraction of load at every time step
       
        sced_solver = "cbc", # (MI)LP solver to use for the SCED
        sced_frequency_minutes = 15, # SCED frequency in minutes. While a 5-minute SCED provides high-fidelity
                                    # data, a 15 or 60 minute SCED will likely decrease the simulation time
                                    # by approximately a factor of 3 or 10, respectively. The 5-bus test case has
                                    # 5-minute real-time data, but we're under no obligation for the SCED frequency
                                    # to match that -- it can be more or less frequent than the provided data.
        sced_horizon = 1, # Number of look-ahead periods (in sced_frequency_minutes) in the real-time SCED
        sced_slack_type = "ref-bus-and-branches", # Slacks on the branch flows and power-balance at reference bus
                                                  # in SCED. The default has **no** slacks on the branch flows but
                                                  # at every bus's power-balance equation
        ruc_slack_type = "ref-bus-and-branches", # Similar to `sced_slack_type`, except for the day-ahead
                                                 # unit commitment problem.
        ruc_horizon = 36, # Number of hours in unit commitment. Typically needs to be at least 24.
        ruc_mipgap = 0.01, # mipgap for the day-ahead unit commitment
        deterministic_ruc_solver = "cbc", # MILP solver to use for unit commitment
                                          # (e.g., cbc, gurobi, cplex, xpress)
        deterministic_ruc_solver_options = {"feas":"off", "DivingF":"on",}, # additional options for the MIP solver
        output_solver_logs = False, # If True, outputs the logs from the unit commitment and SCED solves
        compute_market_settlements = True, # If True, solves a day-ahead pricing problem (in addition to the
                                           # real-time pricing problem) and computes generator revenue based
                                           # on day-ahead and real-time prices.
        monitor_all_contingencies = True, # If True, monitors and enforces **all** T-1 transmission constraints.
                                          # Can be computationally prohibitive for larger systems.
                                          # A future version of Prescient will allow for supplying a contingency
                                          # monitoring list.
        price_threshold = 1000, # Maximum day-ahead or real-time price for energy in $/MWh. Only affects price
                                # computation, **not** commitment/dispatch.
        contingency_price_threshold = 100, # Penalty factor for contingency constraint violation in $/MWh. Only
                                           # affects price computation, **not** commitment/dispatch.
        reserve_price_threshold = 5, # Maximum day-ahead or real-time price for reserves in $/MWh. Only affects
                                     # price computation, **not** commitment/dispatch.
)
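
To get a feel for how `sced_frequency_minutes` drives the amount of work, the sketch below counts SCED solves for the 7-day run above. It assumes exactly one SCED per `sced_frequency_minutes` interval; `sced_solves` is a hypothetical helper, not a Prescient function.

```python
def sced_solves(num_days: int, sced_frequency_minutes: int) -> int:
    """Number of SCED intervals in num_days, one solve per interval (assumed)."""
    minutes_per_day = 24 * 60
    return num_days * minutes_per_day // sced_frequency_minutes

# compare the frequencies mentioned in the option comments
for freq in (5, 15, 60):
    print(f"{freq:>2}-minute SCED: {sced_solves(7, freq)} solves")
```

Moving from 5 to 15 minutes cuts the count by a factor of 3, in line with the option comment. For a 60-minute SCED the count alone would suggest a factor of 12; the roughly 10x quoted in the comment is the tutorial's own estimate of the run-time effect.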

## Analyzing results
Summary and detailed `*.csv` files are written to the specified output directory (in this case, `5bus_output`).

In [None]:
sorted(os.listdir("5bus_output"))

Below we give a brief description of the contents of each file.
- `bus_detail.csv`: Detailed results (demand, LMP, etc.) by bus.
- `contingency_detail.csv`: Detailed contingency flows for monitored contingencies for each SCED period.
- `daily_summary.csv`: Summary results by day. Demand, renewables data, costs, load shedding/over generation, etc.
- `hourly_gen_summary.csv`: Gives total thermal headroom and data on reserves (shortfall, price) by hour.
- `hourly_summary.csv`: Summary results by hour. Similar to `daily_summary.csv`.
- `line_detail.csv`: Detailed results (flow in MW) by line.
- `overall_simulation_output.csv`: Summary results for the entire simulation run. Similar to `daily_summary.csv`.
- `plots`: Directory containing stackgraphs for every day of the simulation.
- `renewables_detail.csv`: Detailed results (output, curtailment) by renewable generator.
- `runtimes.csv`: Runtimes for each economic dispatch problem.
- `thermal_detail.csv`: Detailed results (dispatch, commitment, costs) per thermal generator.
- `virtual_detail.csv`: Detailed results (dispatch) by virtual generator.


### Plots
Generally, the first things to look at, as a sanity check, are the stackgraphs:

In [None]:
from IPython.display import Image, display
# one stackgraph per simulated day: 2020-07-10 through 2020-07-16
dates = [f"2020-07-1{i}" for i in range(0,7)]
for date in dates:
    display(Image(os.path.join("5bus_output","plots",f"stackgraph_{date}.png")))
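
The hard-coded date strings only work for this particular week. A minimal sketch of deriving them from the `start_date` and `num_days` values passed to `simulate`, assuming `start_date` is read as month-day-year (which matches the July 10 stackgraph file names used above):

```python
from datetime import datetime, timedelta

start_date = "07-10-2020"  # same string passed to simulate(); read as month-day-year
num_days = 7

first_day = datetime.strptime(start_date, "%m-%d-%Y")
# one ISO-formatted date string per simulated day
dates = [
    (first_day + timedelta(days=i)).strftime("%Y-%m-%d")
    for i in range(num_days)
]
print(dates)
```

This also handles runs that cross a month boundary, which the f-string pattern would not.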

### Plotting LMPs over time

In [None]:
from matplotlib import pyplot as plt
def prescient_output_to_df(file_name):
    '''Helper for loading data from Prescient output csv.
        Combines Datetimes into single column.
    '''
    df = pd.read_csv(file_name)
    df['Datetime'] = \
        pd.to_datetime(df['Date']) + \
        pd.to_timedelta(df['Hour'], 'hour') + \
        pd.to_timedelta(df['Minute'], 'minute')
    df.drop(columns=['Date','Hour','Minute'], inplace=True)
    # put 'Datetime' in front
    cols = df.columns.tolist()
    cols = cols[-1:]+cols[:-1]
    return df[cols]

bus_detail = prescient_output_to_df(os.path.join("5bus_output","bus_detail.csv"))
bus_detail.set_index('Datetime', inplace=True)

In [None]:
# Pandas df with the data from bus_detail.csv
bus_detail

In [None]:
# plot the LMPs
bus_detail.groupby("Bus")["LMP"].plot(legend=True, ylabel='LMP ($/MWh)', figsize=(10,5))
plt.show()

As we can see, bus2 and bus3 (in Area 2) sometimes have much higher LMPs than bus1, bus10, and bus4. This is typically driven by binding transmission constraints. We can examine the binding transmission constraints by looking at the files `line_detail.csv` and `contingency_detail.csv`.

### Examining Base-Case Transmission Line Flows

In [None]:
# load in the output data for the base-case nominal line flow
line_detail = prescient_output_to_df(os.path.join("5bus_output","line_detail.csv"))

# load in the branch input file, which has the line limits
branch_csv = pd.read_csv(os.path.join("5bus","branch.csv"), index_col=0)

In [None]:
rate_A_limits = branch_csv['Cont Rating']
# name the index of the limits "Line" so that it
# matches the "Line" index level of line_detail
rate_A_limits.index.name = "Line"
rate_A_limits

In [None]:
import numpy as np
# index line_detail by "Datetime" and "Line"
line_detail.set_index(["Datetime", "Line"], inplace=True)
line_detail["Relative Flow"] = np.abs(line_detail['Flow']/rate_A_limits)
line_detail

In [None]:
# find periods where the flow is near the limit
line_detail[line_detail['Relative Flow']>0.99]

As we can see, there are no lines close to their limits. Next we'll consider the contingency flows.
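
The screening step above boils down to dividing the absolute flow by the line's rating and keeping entries above 0.99. The plain-Python sketch below shows the same arithmetic on made-up numbers; the line names, ratings, and flows are hypothetical and do not come from the 5-bus case.

```python
# hypothetical ratings (MW) per line, for illustration only
limits = {"line_a": 100.0, "line_b": 250.0}

# hypothetical (timestamp, line, flow in MW) records; sign gives flow direction
flows = [
    ("2020-07-10 00:00", "line_a", -99.5),
    ("2020-07-10 00:00", "line_b", 120.0),
    ("2020-07-10 00:15", "line_a", 40.0),
]

# relative flow = |flow| / rating, keyed by (timestamp, line)
relative = {(t, line): abs(flow) / limits[line] for t, line, flow in flows}

# keep only the periods where a line is within 1% of its rating
near_limit = {key: value for key, value in relative.items() if value > 0.99}
print(near_limit)
```

Taking the absolute value matters because a flow of -99.5 MW loads the line just as heavily as +99.5 MW.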

### Examining Transmission Contingency Violations

In [None]:
# load in the output data for the line flow under monitored contingencies
contingency_detail = prescient_output_to_df(os.path.join("5bus_output","contingency_detail.csv"))
contingency_detail.set_index(["Datetime", "Contingency", "Line"], inplace=True)
contingency_detail

In [None]:
# grab the short-term rating used for contingencies
rate_C_limits = branch_csv["STE Rating"]
rate_C_limits.index.name = "Line"
rate_C_limits

In [None]:
contingency_detail["Relative Flow"] = np.abs(contingency_detail['Flow']/rate_C_limits)
contingency_detail

In [None]:
# find periods where the flow is near the limit
contingency_detail[contingency_detail['Relative Flow']>0.99]

As we can see, Line `branch_3_4_1` is at or near its contingency limit for a good portion of the simulation when `branch_1_2` is on contingency outage. This causes congestion between Area 1 (bus1, bus4, & bus10) and Area 2 (bus2 & bus3), and the $100/MWh contingency flow violation cost is split between the LMP on bus2 and bus3.

Depending on the analysis to be done, one could increase or decrease the option `contingency_price_threshold` to achieve a different result.