# WESM Price Prediction - EDA

## I. Dataset Description

### Setup and Loading

In [3]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

pd.set_option('display.max_columns', None)
plt.style.use('seaborn-v0_8-darkgrid')

df = pd.read_csv("final_dataset.csv")
df["datetime"] = pd.to_datetime(df["datetime"])

### The Data

The final dataset is generated by an ETL process that combines data from three categories: **GWAP**, **RTD Regional Summaries**, and **Outage Schedules**. The columns are defined as follows:

- `datetime`: Target End Time of Dispatch Interval
- `GWAP`: Generator Weighted Average Price
- `energy_demand_mw`: MW Requirement for Energy
- `energy_supply_mw`: MW Generation for Energy
- `energy_shortage_mw`: 
- `reserve_demand_mw`: MW Requirement for Reserve Commodities (Non-energy commodity type)
- `reserve_supply_mw`: MW Generation for Reserve Commodities (Non-energy commodity type)
- `outage_count`: Count of outages per datetime
    - Value of 0: no outages occurred at that timestamp
- `GWAP_Lag_1`: GWAP value 5 minutes ago
- `GWAP_Lag_12`: GWAP value 1 hour ago
- `GWAP_Lag_288`: GWAP value at the same time yesterday (24 hours ago)
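
The three lag columns correspond to row offsets of 1, 12, and 288 on a 5-minute grid. The following is a minimal sketch of how such features could be derived, not necessarily how the ETL step actually built them. It uses a synthetic series (the `demo` frame and its values are illustrative only) and assumes the rows are sorted by `datetime` with no missing intervals; if intervals are missing, a row-based shift no longer equals a fixed time offset.

```python
import numpy as np
import pandas as pd

# Synthetic 5-minute series standing in for the real GWAP column (illustrative values only).
idx = pd.date_range("2024-01-01", periods=600, freq="5min")
demo = pd.DataFrame({"datetime": idx, "GWAP": np.arange(600, dtype=float)})

# Row-based lags: 1 row = 5 minutes, 12 rows = 1 hour, 288 rows = 24 hours.
# This holds only if rows are sorted and every 5-minute interval is present.
demo = demo.sort_values("datetime").reset_index(drop=True)
for lag in (1, 12, 288):
    demo[f"GWAP_Lag_{lag}"] = demo["GWAP"].shift(lag)

# The first 288 rows lack a full day of history, so their lags are NaN and are dropped here.
demo_lagged = demo.dropna().reset_index(drop=True)
print(demo_lagged.head())
```

Dropping the incomplete rows is one way to end up with no nulls in the lag columns, as in the `df.info()` output for the real dataset.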

In [4]:
print(df.info())
print(df.describe())
print(df.head())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15260 entries, 0 to 15259
Data columns (total 11 columns):
 #   Column              Non-Null Count  Dtype         
---  ------              --------------  -----         
 0   datetime            15260 non-null  datetime64[ns]
 1   GWAP                15260 non-null  float64       
 2   energy_demand_mw    15260 non-null  float64       
 3   energy_supply_mw    15260 non-null  float64       
 4   energy_shortage_mw  15260 non-null  float64       
 5   reserve_demand_mw   15260 non-null  float64       
 6   reserve_supply_mw   15260 non-null  float64       
 7   outage_count        15260 non-null  float64       
 8   GWAP_Lag_1          15260 non-null  float64       
 9   GWAP_Lag_12         15260 non-null  float64       
 10  GWAP_Lag_288        15260 non-null  float64       
dtypes: datetime64[ns](1), float64(10)
memory usage: 1.3 MB
None
                            datetime          GWAP  energy_demand_mw  \
count                     

As seen above, the minimum recorded GWAP is -9999. We initially suspected a sentinel value (an error code), but after inspecting neighboring data points (e.g., -9998 and -9500) and reading the [WESM Price Determination Methodology](https://www.wesm.ph/downloads/download/TWFya2V0IFJlcG9ydHM=/NTYw), we determined that these are valid prices.
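
One way to make the sentinel check concrete is to count how many distinct values sit near the minimum. A true sentinel usually appears as a single repeated value, while real prices spread over a range. The sketch below uses a handful of illustrative values rather than the dataset itself, and the 1,000-peso band is an arbitrary threshold chosen for the example.

```python
import pandas as pd

PRICE_FLOOR = -10_000.0  # offer price floor in PHP/MWh, as cited from the methodology
BAND = 1_000.0           # hypothetical width of the "near the floor" band

# Illustrative values only, not taken from the dataset.
gwap = pd.Series([3200.0, -9999.0, -9998.0, -9500.0, 2800.0, -9999.0])

# Keep only prices within BAND of the floor.
near_floor = gwap[gwap <= PRICE_FLOOR + BAND]

# Several distinct values near the floor point to real clearing prices, not one error code.
distinct_near_floor = near_floor.nunique()
print(near_floor.value_counts().sort_index())
```

On the real data, the same filter would be applied to `df["GWAP"]`.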

According to the WESM Price Determination Methodology, this phenomenon is driven by "Excess Generation", where the market clears at or near the Offer Price Floor (currently set at -P10,000/MWh).
> "5.4.2 In the event of over-generation, the excess price shall be determined as the offer price floor."

The Logic:
- Over-generation: Occurs when supply exceeds demand.
- "Pay-to-Stay" Strategy: A GWAP of -9999 (just P1.00 above the floor) implies a strategic bid. Baseload generators (such as coal plants) have high startup costs, so they bid highly negative prices, essentially paying the market to keep running, rather than shutting down.
- The Result: The -9999 GWAP indicates that the grid was saturated enough to force prices down to the "desperation bids" of the large plants.
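
The over-generation reasoning above can be checked against the data by comparing the supply surplus in near-floor intervals with the surplus elsewhere. This is a rough sketch on a synthetic frame: the values are illustrative, the 100-peso band is a hypothetical threshold, and treating `energy_supply_mw - energy_demand_mw` as the surplus is our assumption about how the two columns relate.

```python
import pandas as pd

PRICE_FLOOR = -10_000.0  # offer price floor in PHP/MWh, as cited from the methodology
BAND = 100.0             # hypothetical width of the "near the floor" band

# Illustrative values only, not taken from the dataset.
demo = pd.DataFrame({
    "GWAP": [3200.0, -9999.0, 2800.0, -9998.0],
    "energy_demand_mw": [9000.0, 7000.0, 9500.0, 7100.0],
    "energy_supply_mw": [9000.0, 7400.0, 9450.0, 7300.0],
})

# Assumed definition of surplus: scheduled generation minus requirement.
demo["surplus_mw"] = demo["energy_supply_mw"] - demo["energy_demand_mw"]
demo["near_floor"] = demo["GWAP"] <= PRICE_FLOOR + BAND

# Compare the average surplus inside and outside the near-floor intervals.
mean_surplus_near_floor = demo.loc[demo["near_floor"], "surplus_mw"].mean()
mean_surplus_elsewhere = demo.loc[~demo["near_floor"], "surplus_mw"].mean()
print(mean_surplus_near_floor, mean_surplus_elsewhere)
```

If the explanation holds for the real dataset, the near-floor intervals should show a noticeably larger surplus than the rest.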