# National Energy Consortium (NEC)

## Problem Statement
- **Objective**: Select one plant from 64 options to serve each demand scenario
- **Goal**: Minimise the cost (USD/MWh) of meeting demand
- **Error Metric**: RMSE between optimal and selected plant costs
- **Data**: 3 datasets (demand, plants, generations_costs)

### Error metric

The per-scenario error is defined as:

$$
\text{Error}(d) \;=\; \min_{p \in P} c(p, d) \;-\; c(p'_d, d)
$$

- d: a demand scenario  
- P: set of candidate plants (64 options)  
- p: a plant in P  
- p'_d: the plant selected for scenario d by the model/heuristic  
- c(p, d): cost (USD/MWh) of plant p under scenario d

Notes:
- This is the difference between the optimal (minimum) cost across all plants for scenario d and the cost of the selected plant, so it is zero when the optimal plant is selected and negative otherwise.  
- Use these per-scenario errors to compute aggregate metrics (e.g., RMSE, MAE) across all scenarios.
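
As a minimal sketch of the per-scenario error, assume the costs are arranged as a matrix with one row per scenario and one column per plant. The names `per_scenario_error`, `costs`, and `selected`, and the toy numbers, are hypothetical and only illustrate the definition; they are not taken from the datasets.

```python
import numpy as np

def per_scenario_error(costs: np.ndarray, selected: np.ndarray) -> np.ndarray:
    """Error(d) = min_p c(p, d) - c(p'_d, d) for every scenario d.

    costs:    shape (D, |P|); costs[d, p] is the cost of plant p under scenario d
    selected: shape (D,); column index of the plant chosen for each scenario
    """
    optimal = costs.min(axis=1)                           # best achievable cost per scenario
    chosen = costs[np.arange(costs.shape[0]), selected]   # cost of the selected plant
    return optimal - chosen                               # always <= 0

# Toy data (made up for illustration): 3 scenarios, 4 plants.
costs = np.array([
    [50.0, 42.0, 61.0, 47.0],
    [38.0, 40.0, 35.0, 44.0],
    [55.0, 53.0, 58.0, 53.0],
])
selected = np.array([1, 0, 2])

errors = per_scenario_error(costs, selected)
# Scenario 0 picks the optimal plant (error 0); scenarios 1 and 2 overpay by 3 and 5.
print(errors)
```

Because the optimal cost can never exceed the selected cost, every entry is at most zero; the sign disappears once the errors are squared in an aggregate metric.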

### Score (RMSE)

The aggregate error score (root-mean-square error) across demand scenarios is:

$$
\text{Score} \;=\; \sqrt{\frac{1}{D}\sum_{d=1}^{D}\text{Error}(d)^2}
$$

Where:
- D: total number of demand scenarios  
- Error(d): per-scenario error (as defined earlier, Error(d) = min_{p in P} c(p,d) - c(p'_d,d))

This score summarizes the typical magnitude of the per-scenario selection error (lower is better).
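
A minimal sketch of the score computation, assuming the per-scenario errors are already available as an array. The function name `rmse_score` and the example values are hypothetical.

```python
import numpy as np

def rmse_score(errors) -> float:
    """Root-mean-square of the per-scenario errors (lower is better)."""
    errors = np.asarray(errors, dtype=float)
    return float(np.sqrt(np.mean(errors ** 2)))

# Made-up per-scenario errors for three scenarios.
example_errors = np.array([0.0, -3.0, -5.0])
score = rmse_score(example_errors)
print(f"Score: {score:.4f}")  # sqrt((0 + 9 + 25) / 3)
```

A perfect selection in every scenario gives a score of 0, and squaring means a single large miss weighs more than several small ones.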

In [1]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import LeaveOneGroupOut, GridSearchCV, train_test_split
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import Ridge, Lasso
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import json
import warnings
from dataclasses import dataclass
from typing import Dict, List, Tuple, Optional, Any
warnings.filterwarnings('ignore')

# Set random seed for reproducibility
np.random.seed(42)

# Configure display
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (14, 6)


#### Helper classes and Data Structure

In [4]:
class Logger:
    def __init__(self, verbose: bool = True):
        self.verbose = verbose
    
    def header(self, text:str, width:int=80):
        if self.verbose:
            print("="*width)
            print(text)
            print("="*width)
    
    def subheader(self, text: str):
        if self.verbose:
            print(f"\n{text}")
            # Underline the subheader with dashes of matching length
            print("-" * len(text))
            
    def info(self, text:str, indent:int=0):
        if self.verbose:
            print(" "*indent + f"✔ {text}")
    
    def data(self, text:str, indent: int=0):
        if self.verbose:
            print(" "*indent + f"➤ {text}")
    
    def metric(self, label:str, value: Any, unit:str="", indent:int=0):
        if self.verbose:
            print(" "*indent +f" {label}: {value} {unit}")
        
    def success(self, text:str):
        if self.verbose:
            print(f"\n{'='*80}")
            print(f"✔ {text}")
            print(f"{'='*80}\n")
        

        
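# @dataclass generates the __init__ that Step relies on (StepConfig(name=..., description=...))
@dataclass
class StepConfig: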
    name: str
    description: str
    verbose: bool = True

class Step:
    def __init__(self, config:Optional[StepConfig]=None):
        self.config = config or StepConfig(name=self.__class__.__name__, description="Analysis Step")
        self.logger = Logger(verbose=self.config.verbose) 

    def execute(self, *args, **kwargs):
        raise NotImplementedError("Execute method must be implemented by subclasses")
    
    def get_results(self) -> Dict[str, Any]:
        raise NotImplementedError("get_results method must be implemented by subclasses")

        