# Quick Start: ERCOT RTLMP Spike Prediction System

This notebook is a quick start guide to the ERCOT RTLMP spike prediction system. It walks through the core workflow end to end: data fetching, feature engineering, model training, inference, and visualization.

**System Overview**

The ERCOT RTLMP spike prediction system forecasts the probability of spikes in ERCOT Real-Time Locational Marginal Prices (RTLMP), and it does so before day-ahead market closure. Energy storage operators use these predictions to plan battery charging and discharging and to maximize revenue.

**Key Features**

*   Daily inference runs before day-ahead market (DAM) closure
*   72-hour forecast horizon starting from the day after DAM closure
*   Probability predictions for each hour in the forecast horizon
*   Modular code structure with clearly defined interfaces
*   Retraining capability on a two-day cadence
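
To make the forecast window concrete, the sketch below builds the 72 hourly timestamps that start at midnight on the day after DAM closure. The helper name `forecast_window` is an assumption for illustration and is not part of the system's API. Time zones are left out for simplicity.

```python
import pandas as pd

FORECAST_HORIZON = 72  # hours, as listed in the key features


def forecast_window(dam_closure_date: str, horizon_hours: int = FORECAST_HORIZON) -> pd.DatetimeIndex:
    """Hourly timestamps covering the forecast horizon (hypothetical helper)."""
    # Start at 00:00 on the calendar day after DAM closure
    start = pd.Timestamp(dam_closure_date).normalize() + pd.Timedelta(days=1)
    return pd.date_range(start=start, periods=horizon_hours, freq="h")


window = forecast_window("2023-01-10")
print(window[0], window[-1], len(window))
```

For a closure date of 2023-01-10, the window runs from 2023-01-11 00:00 through 2023-01-13 23:00.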


## Notebook Structure

1.  **Setup and Imports**: Import necessary libraries and set up the environment.
2.  **Data Fetching**: Retrieve RTLMP and grid condition data from ERCOT.
3.  **Feature Engineering**: Transform raw data into model-ready features.
4.  **Model Training**: Train an XGBoost model to predict price spikes.
5.  **Inference**: Generate price spike probability forecasts.
6.  **Visualization**: Visualize forecast results.
7.  **Conclusion**: Summary and next steps.

## Setup and Imports

### Import necessary libraries

In [None]:
# External libraries
import pandas as pd  # version 2.0+
import numpy as np  # version 1.24+
import matplotlib.pyplot as plt  # version 3.7+
import seaborn as sns  # version 0.12+
import plotly.express as px  # version 5.14+
import datetime  # Standard
from sklearn.model_selection import train_test_split  # scikit-learn version 1.2+
from sklearn.metrics import roc_auc_score, brier_score_loss, precision_score, recall_score, f1_score  # scikit-learn version 1.2+

### Import internal modules

In [None]:
# Internal modules
from src.backend.data.fetchers.ercot_api import ERCOTDataFetcher  # src/backend/data/fetchers/ercot_api.py
from src.backend.features.feature_pipeline import FeaturePipeline, DEFAULT_FEATURE_CONFIG  # src/backend/features/feature_pipeline.py
from src.backend.models.xgboost_model import XGBoostModel  # src/backend/models/xgboost_model.py
from src.backend.inference.engine import InferenceEngine  # src/backend/inference/engine.py
from src.backend.config.schema import InferenceConfig  # src/backend/config/schema.py
from src.backend.visualization.forecast_plots import ForecastPlotter  # src/backend/visualization/forecast_plots.py

### Define global constants

In [None]:
# Global constants
PRICE_SPIKE_THRESHOLD = 100.0  # $/MWh
FORECAST_HORIZON = 72  # hours
DEFAULT_NODES = ['HB_NORTH', 'HB_SOUTH', 'HB_WEST', 'HB_HOUSTON']

## Data Fetching

### Data sources

This section demonstrates how to fetch RTLMP and grid condition data from ERCOT using the `ERCOTDataFetcher` class. The data is fetched for a specified date range and node locations.

In [None]:
# Initialize ERCOTDataFetcher
data_fetcher = ERCOTDataFetcher()

In [None]:
# Fetch historical RTLMP data
start_date = datetime.datetime(2023, 1, 1)
end_date = datetime.datetime(2023, 1, 10)
rtlmp_df = data_fetcher.fetch_historical_data(start_date, end_date, DEFAULT_NODES)

In [None]:
# Fetch grid condition data
grid_start_date = datetime.datetime(2023, 1, 1)
grid_end_date = datetime.datetime(2023, 1, 10)
grid_df = data_fetcher.fetch_historical_data(grid_start_date, grid_end_date, identifiers=[])

In [None]:
# Display sample data
print("RTLMP Data:")
print(rtlmp_df.head())

print("\nGrid Condition Data:")
print(grid_df.head())
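
The later cells read `timestamp`, `price`, and `node_id` columns from the fetched data. When the ERCOT API is not reachable, a synthetic frame with that assumed schema may be enough to exercise the rest of the notebook. The generator below is only a sketch: `make_synthetic_rtlmp`, the price distribution, and the spike frequency are illustrative assumptions and say nothing about real ERCOT prices.

```python
import numpy as np
import pandas as pd


def make_synthetic_rtlmp(start: str, end: str, nodes: list[str], seed: int = 0) -> pd.DataFrame:
    """Build 5-minute synthetic prices per node (hypothetical stand-in for fetched data)."""
    rng = np.random.default_rng(seed)
    timestamps = pd.date_range(start=start, end=end, freq="5min", inclusive="left")
    frames = []
    for node in nodes:
        # Baseline prices around 30 $/MWh with occasional large spikes
        base = rng.normal(loc=30.0, scale=8.0, size=len(timestamps))
        spikes = rng.random(len(timestamps)) < 0.01
        price = np.where(spikes, base + rng.uniform(100.0, 900.0, len(timestamps)), base)
        frames.append(pd.DataFrame({"timestamp": timestamps, "node_id": node, "price": price}))
    return pd.concat(frames, ignore_index=True)


synthetic_rtlmp_df = make_synthetic_rtlmp("2023-01-01", "2023-01-10", ["HB_NORTH", "HB_SOUTH"])
print(synthetic_rtlmp_df.head())
```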

## Feature Engineering

### Feature engineering process

This section demonstrates how to transform raw data into model-ready features using the `FeaturePipeline` class. The pipeline includes time-based, statistical, weather, and market features.
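
As a rough illustration of what time-based and statistical features can look like, the sketch below derives a few from an hourly price series with plain pandas. It is not the `FeaturePipeline` implementation; the function `basic_features`, the feature names, and the 24-hour window are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd


def basic_features(hourly: pd.DataFrame) -> pd.DataFrame:
    """Add simple time-based and rolling features (illustrative only)."""
    out = hourly.sort_values("timestamp").copy()
    # Time-based features
    out["hour_of_day"] = out["timestamp"].dt.hour
    out["day_of_week"] = out["timestamp"].dt.dayofweek
    out["is_weekend"] = (out["day_of_week"] >= 5).astype(int)
    # Statistical features over the previous 24 hours; shift(1) keeps the
    # current hour's price out of its own features
    past = out["price"].shift(1)
    out["price_mean_24h"] = past.rolling(window=24, min_periods=1).mean()
    out["price_max_24h"] = past.rolling(window=24, min_periods=1).max()
    return out


hours = pd.date_range("2023-01-01", periods=48, freq="h")
demo = pd.DataFrame({"timestamp": hours, "price": np.arange(48, dtype=float)})
print(basic_features(demo).tail(3))
```

The example handles a single price series. With several nodes in one frame, the rolling statistics would need to be computed per node, for instance by grouping on `node_id` first.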

In [None]:
# Initialize FeaturePipeline
feature_pipeline = FeaturePipeline(DEFAULT_FEATURE_CONFIG)

In [None]:
# Add data sources to the pipeline
feature_pipeline.add_data_source('rtlmp_df', rtlmp_df)
feature_pipeline.add_data_source('grid_df', grid_df)

In [None]:
# Create features
features_df = feature_pipeline.create_features()

In [None]:
# Display engineered features
print("Engineered Features:")
print(features_df.head())

## Model Training

### Modeling approach

This section demonstrates how to train an XGBoost model to predict price spikes. The data is split into training and test sets, the model is fit on the training set, and its performance is evaluated on the test set.

In [None]:
# Create target variable
def create_target_variable(rtlmp_df: pd.DataFrame, threshold: float) -> pd.Series:
    """Creates a binary target variable indicating whether a price spike occurred"""
    # Group RTLMP data by hour ('h' replaces the 'H' alias deprecated in recent pandas)
    hourly_data = rtlmp_df.groupby(rtlmp_df['timestamp'].dt.floor('h'))
    
    # For each hour, check if any 5-minute price exceeds the threshold
    def check_spike(group):
        return (group['price'] > threshold).any()
    
    # Create a binary indicator (1 if spike occurred, 0 otherwise)
    target_series = hourly_data.apply(check_spike)
    
    # Return the binary target series
    return target_series

target = create_target_variable(rtlmp_df, PRICE_SPIKE_THRESHOLD)
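
The same target can also be computed without `apply`. The sketch below is a vectorized variant on a four-row toy frame, so the result is easy to verify by eye. The function name `hourly_spike_target` and the output name `spike_occurred` are choices made for this example.

```python
import pandas as pd


def hourly_spike_target(prices: pd.DataFrame, threshold: float) -> pd.Series:
    """1 if any price within the hour exceeds the threshold, else 0."""
    hour = prices["timestamp"].dt.floor("h").rename("hour")
    exceeds = prices["price"] > threshold
    return exceeds.groupby(hour).any().astype(int).rename("spike_occurred")


demo = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2023-01-01 00:00", "2023-01-01 00:05",  # hour 0: one interval above 100
        "2023-01-01 01:00", "2023-01-01 01:05",  # hour 1: all below 100
    ]),
    "price": [25.0, 140.0, 30.0, 99.9],
})
print(hourly_spike_target(demo, threshold=100.0).tolist())  # [1, 0]
```

Like `create_target_variable`, this pools all nodes into one hourly flag. If a per-node target is wanted, grouping by both `node_id` and the floored hour would be one way to get it.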

In [None]:
# Split data into training and testing sets
X = features_df.drop(columns=['timestamp', 'node_id'], errors='ignore') # Drop non-feature columns
y = target.astype(int)  # Convert target to int

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
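
A random split shuffles hours together, so the model can be trained on observations that come after some of the test hours. For time-ordered data, a chronological split is a common alternative. The sketch below assumes the features and the target are both indexed by unique timestamps; `chronological_split` is a hypothetical helper, not part of the system.

```python
import numpy as np
import pandas as pd


def chronological_split(X: pd.DataFrame, y: pd.Series, test_size: float = 0.2):
    """Hold out the most recent fraction of rows as the test set."""
    # Keep only timestamps present in both inputs, in time order
    common = X.index.intersection(y.index).sort_values()
    X, y = X.loc[common], y.loc[common]
    cut = len(common) - int(round(len(common) * test_size))
    return X.iloc[:cut], X.iloc[cut:], y.iloc[:cut], y.iloc[cut:]


idx = pd.date_range("2023-01-01", periods=10, freq="h")
X_demo = pd.DataFrame({"feature": np.arange(10.0)}, index=idx)
y_demo = pd.Series([0, 0, 1, 0, 0, 0, 1, 0, 0, 1], index=idx)
X_tr, X_te, y_tr, y_te = chronological_split(X_demo, y_demo)
print(len(X_tr), len(X_te), X_tr.index.max() < X_te.index.min())
```

Aligning on the shared index first also guards against `X` and `y` having different lengths, which `train_test_split` rejects.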

In [None]:
# Initialize and train XGBoostModel
model = XGBoostModel(
    model_id='rtlmp_spike_model',
    # Binary classification objective, evaluated with log loss
    hyperparameters={'objective': 'binary:logistic', 'eval_metric': 'logloss'},
)
model.train(X_train, y_train)

In [None]:
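# Evaluate model performance
def evaluate_forecast(forecast_df: pd.DataFrame, actual_df: pd.DataFrame, threshold: float) -> dict:
    """Score a forecast against realized spikes.

    Expects `forecast_df` to carry a `spike_probability` column and `actual_df` a
    `spike_occurred` column, both indexed by timestamp. `threshold` is the price
    threshold ($/MWh) the forecast refers to; it is not used in the metric
    calculations below.
    """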
    # Align forecast and actual data by timestamp
    merged_data = pd.merge(forecast_df, actual_df, left_index=True, right_index=True, suffixes=('_forecast', '_actual'))
    
    # Calculate AUC-ROC score
    auc_roc = roc_auc_score(merged_data['spike_occurred'], merged_data['spike_probability'])
    
    # Calculate Brier score
    brier_score = brier_score_loss(merged_data['spike_occurred'], merged_data['spike_probability'])
    
    # Calculate precision, recall, and F1 score
    y_pred = (merged_data['spike_probability'] > 0.5).astype(int)
    precision = precision_score(merged_data['spike_occurred'], y_pred)
    recall = recall_score(merged_data['spike_occurred'], y_pred)
    f1 = f1_score(merged_data['spike_occurred'], y_pred)
    
    # Return dictionary with all metrics
    return {
        'auc_roc': auc_roc,
        'brier_score': brier_score,
        'precision': precision,
        'recall': recall,
        'f1_score': f1
    }

# Score the trained model on the held-out test set and show the result
performance_metrics = model.evaluate(X_test, y_test)
print(performance_metrics)
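
The metrics used in `evaluate_forecast` can be checked by hand on a small example. The sketch below computes the Brier score, precision, and recall with NumPy only, using the same 0.5 probability cut-off. The five outcomes and probabilities are made-up values for illustration.

```python
import numpy as np

y_true = np.array([0, 0, 1, 1, 0])
y_prob = np.array([0.1, 0.6, 0.8, 0.4, 0.2])

# Brier score: mean squared error between predicted probability and outcome
brier = np.mean((y_prob - y_true) ** 2)

# Precision and recall at the 0.5 probability cut-off
y_pred = (y_prob > 0.5).astype(int)
tp = int(np.sum((y_pred == 1) & (y_true == 1)))
fp = int(np.sum((y_pred == 1) & (y_true == 0)))
fn = int(np.sum((y_pred == 0) & (y_true == 1)))
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0

print(f"brier={brier:.3f} precision={precision:.2f} recall={recall:.2f}")
# brier=0.162 precision=0.50 recall=0.50
```

A lower Brier score means better-calibrated probabilities. Precision and recall depend on the chosen cut-off, which matters when spikes are rare.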

In [None]:
# Visualize feature importance
importance = model.get_feature_importance()
# Assumes get_feature_importance() returns a dict of feature name -> score
importance_df = pd.DataFrame({"Feature": list(importance.keys()), "Importance": list(importance.values())})
plt.figure(figsize=(10, 6))
sns.barplot(x="Importance", y="Feature", data=importance_df.sort_values(by="Importance", ascending=False))
plt.title("Feature Importance")
plt.show()

## Inference

### Inference process

This section demonstrates how to generate price spike probability forecasts using the `InferenceEngine` class. The inference engine loads the trained model and generates a 72-hour forecast.

In [None]:
# Initialize InferenceEngine with configuration
inference_config = InferenceConfig(thresholds=[PRICE_SPIKE_THRESHOLD])
inference_engine = InferenceEngine(config=inference_config)

# Load the trained model
inference_engine.load_model(model_id='rtlmp_spike_model')

In [None]:
# Generate 72-hour forecast
forecast_start_date = datetime.datetime(2023, 1, 11)
forecast_data_sources = {
    'rtlmp_df': rtlmp_df,
    'grid_df': grid_df
}
forecast_df = inference_engine.run_inference(forecast_data_sources)

In [None]:
# Display forecast results
print("Forecast Results:")
print(forecast_df.head())
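
Before a forecast is passed on, a few sanity checks can catch obvious problems. The sketch below assumes the forecast frame has a `spike_probability` column and a `node_id` column with one row per node and forecast hour. That layout is an assumption, and `check_forecast` is a hypothetical helper, not part of `InferenceEngine`.

```python
import numpy as np
import pandas as pd


def check_forecast(forecast: pd.DataFrame, horizon_hours: int = 72) -> list[str]:
    """Return a list of problems found in a forecast frame (empty if none)."""
    problems = []
    probs = forecast["spike_probability"]
    if probs.isna().any():
        problems.append("missing probabilities")
    if ((probs < 0) | (probs > 1)).any():
        problems.append("probabilities outside [0, 1]")
    # Assumed layout: one row per node and forecast hour
    rows_per_node = forecast.groupby("node_id").size()
    if (rows_per_node != horizon_hours).any():
        problems.append(f"expected {horizon_hours} rows per node")
    return problems


demo = pd.DataFrame({
    "node_id": ["HB_NORTH"] * 72,
    "spike_probability": np.linspace(0.0, 1.0, 72),
})
print(check_forecast(demo))  # []
```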

## Visualization

### Visualization options

This section demonstrates how to visualize the forecast results using the `ForecastPlotter` class. The visualizations include a probability timeline plot and a threshold comparison plot.

In [None]:
# Initialize ForecastPlotter
plotter = ForecastPlotter()

In [None]:
# Load forecast data
plotter.load_forecast(forecast_df)

In [None]:
# Create probability timeline plot
fig_timeline, ax_timeline = plotter.plot_probability_timeline()
plt.show()

In [None]:
# Create threshold comparison plot
fig_comparison, ax_comparison = plotter.plot_threshold_comparison()
plt.show()

In [None]:
# Create interactive dashboard
fig_dashboard = plotter.create_forecast_dashboard()
fig_dashboard.show()

In [None]:
# Export visualizations (uncomment the lines below to write the files to disk)
# plotter.save_plot(fig_timeline, 'probability_timeline.png')
# plotter.save_plot(fig_comparison, 'threshold_comparison.png')
# plotter.save_interactive_plot(fig_dashboard, 'forecast_dashboard.html')
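
For readers who want to see roughly what a probability timeline involves, the sketch below draws one with plain matplotlib. It is not how `ForecastPlotter` is implemented. The column names `target_timestamp` and `spike_probability`, the 0.5 alert level, and the random demo data are assumptions for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_probability_timeline(forecast: pd.DataFrame, alert_level: float = 0.5):
    """Line plot of hourly spike probability with a reference alert level."""
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(forecast["target_timestamp"], forecast["spike_probability"], marker="o", markersize=3)
    ax.axhline(alert_level, linestyle="--", color="gray", label=f"alert level = {alert_level}")
    ax.set_ylim(0, 1)
    ax.set_xlabel("Forecast hour")
    ax.set_ylabel("Spike probability")
    ax.set_title("72-hour spike probability forecast")
    ax.legend()
    fig.autofmt_xdate()
    return fig, ax


rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "target_timestamp": pd.date_range("2023-01-11", periods=72, freq="h"),
    "spike_probability": rng.random(72),
})
fig, ax = plot_probability_timeline(demo)
```

In a notebook the figure renders when the cell finishes; in a script, call `plt.show()` or `fig.savefig(...)` afterwards.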

## Conclusion

### Summary

In this notebook, we demonstrated the core functionality of the ERCOT RTLMP spike prediction system, including data fetching, feature engineering, model training, inference, and visualization. This provides a foundation for further exploration and customization of the system.

### Next steps for exploration

1.  Experiment with different feature engineering techniques.
2.  Explore different machine learning models and hyperparameter tuning.
3.  Implement more sophisticated visualization techniques.
4.  Integrate the system with real-time data sources and battery optimization workflows.

### References

*   [ERCOT Data API Documentation](https://www.ercot.com/services/data)
*   [XGBoost Documentation](https://xgboost.readthedocs.io/en/stable/)
*   [Scikit-learn Documentation](https://scikit-learn.org/stable/)
*   See other notebooks in the `examples/notebooks` directory for more detailed examples.