Pierre Nikitits
## Course Project: Electricity Price Explanation

Dataset:

- Consumption
- Exchange
- Net Export/Import
- Energy Sources
- Residual Load
- Weather Conditions
- Market Dynamics

Steps:

1. Preprocessing Data
2. Random Forest
3. Training and Validation
4. Hyperparameter Tuning
5. Evaluation
6. Interpretation


## Part 1: Loading and preprocessing the data

In [1]:
import pandas as pd
path = "/Users/pierre/Documents/GitHub/EnsembleLearningProject/Data/"

X_train = pd.read_csv(path + 'X_train.csv').set_index('ID')
y_train = pd.read_csv(path + 'y_train.csv').set_index('ID')
X_test = pd.read_csv(path + 'X_test.csv').set_index('ID')
y_test = pd.read_csv(path + 'y_test.csv').set_index('ID')

In [2]:
print("X_train :", X_train.shape)
print("y_train :", y_train.shape)

print("\nX_test  :", X_test.shape)
print("y_test  :", y_test.shape)

X_train : (1494, 34)
y_train : (1494, 1)

X_test  : (654, 34)
y_test  : (654, 1)


In [3]:
X_train = pd.get_dummies(X_train, columns=['COUNTRY'])
X_test = pd.get_dummies(X_test, columns=['COUNTRY'])

X_train, X_test = X_train.align(X_test, join='inner', axis=1)
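
The encoding and alignment step can be illustrated on a toy example. This is only a sketch: the frames, the `LOAD` column and the country codes below are hypothetical and not taken from the project data. It shows that `join='inner'` drops any dummy column that exists in only one of the two sets.

```python
import pandas as pd

# Hypothetical toy frames: the test set lacks one country seen in training.
train = pd.DataFrame({"COUNTRY": ["FR", "DE", "FR"], "LOAD": [1.0, 2.0, 3.0]})
test = pd.DataFrame({"COUNTRY": ["FR", "FR"], "LOAD": [4.0, 5.0]})

# One-hot encode the categorical column in each frame separately.
train_enc = pd.get_dummies(train, columns=["COUNTRY"])
test_enc = pd.get_dummies(test, columns=["COUNTRY"])

# join="inner" keeps only the columns present in both frames,
# so COUNTRY_DE (absent from the test frame) is dropped from train as well.
train_aligned, test_aligned = train_enc.align(test_enc, join="inner", axis=1)

print(sorted(train_enc.columns))
print(sorted(train_aligned.columns))
```

With an inner join, a category that appears in only one set silently disappears from both, so it is worth checking the column count after alignment.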

In [4]:
# Check for missing values
print("Missing values in X_train:", X_train.isnull().sum().sum())
print("Missing values in X_test:", X_test.isnull().sum().sum())

# Fill missing values with the column medians
# (each set is filled with its own medians)
X_train.fillna(X_train.median(), inplace=True)
X_test.fillna(X_test.median(), inplace=True)
print("\nFill missing values\n")

print("Missing values in X_train:", X_train.isnull().sum().sum())
print("Missing values in X_test:", X_test.isnull().sum().sum())


Missing values in X_train: 1002
Missing values in X_test: 400

Fill missing values

Missing values in X_train: 0
Missing values in X_test: 0
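
Median imputation can also be written with scikit-learn's `SimpleImputer`, fitted on the training set only so that the same statistics are reused for the test set. The sketch below uses small hypothetical frames rather than the project data. It is one possible variant, not the approach used in the cell above.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data with missing values in both sets.
train = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [10.0, 20.0, np.nan]})
test = pd.DataFrame({"a": [np.nan, 100.0], "b": [np.nan, 200.0]})

# Learn the medians on the training data only ...
imputer = SimpleImputer(strategy="median")
train_filled = pd.DataFrame(
    imputer.fit_transform(train), columns=train.columns, index=train.index
)

# ... and reuse them to fill the test data.
test_filled = pd.DataFrame(
    imputer.transform(test), columns=test.columns, index=test.index
)

print(test_filled)
```

Here the missing test values are replaced by the training medians (2.0 for `a`, 15.0 for `b`), so no information from the test set enters the preprocessing.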


In [5]:
print(y_train.shape)
# Convert the single-column DataFrames to 1-D Series, as expected by scikit-learn
y_train = y_train.squeeze()
y_test = y_test.squeeze()
print(y_train.shape)

(1494, 1)
(1494,)


## Part 2: Model definition

In [6]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

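# Random forest with default hyperparameters; random_state fixes the seed for reproducibility
model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# Mean squared error on the held-out test set
mse = mean_squared_error(y_test, predictions)
print(f"Mean Squared Error: {mse}")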


Mean Squared Error: 1.247309519415119
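
The error above is measured directly on the test set. A validation estimate that uses only training data can be obtained with k-fold cross-validation. The sketch below is an assumption-laden illustration: it runs on synthetic data from `make_regression` and uses a small forest (`n_estimators=20`) to keep it fast. In the notebook, `X_demo` and `y_demo` would be replaced by `X_train` and `y_train`.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the training data (hypothetical, not the project data).
X_demo, y_demo = make_regression(
    n_samples=120, n_features=8, noise=10.0, random_state=42
)

# 5-fold cross-validation with shuffling and a fixed seed.
cv = KFold(n_splits=5, shuffle=True, random_state=42)
forest = RandomForestRegressor(n_estimators=20, random_state=42)

# scikit-learn returns the negated MSE so that higher is always better.
neg_mse = cross_val_score(
    forest, X_demo, y_demo, cv=cv, scoring="neg_mean_squared_error"
)
cv_mse = -neg_mse

print(f"CV MSE: {cv_mse.mean():.3f} +/- {cv_mse.std():.3f}")
```

The spread across folds gives a rough idea of how much the score depends on the particular split, which a single test-set number cannot show.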


## Part 3: Training and validation

In [8]:
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from math import sqrt
import numpy as np

rmse = sqrt(mean_squared_error(y_test, predictions))
print(f"Root Mean Squared Error (RMSE): {rmse}")

mae = mean_absolute_error(y_test, predictions)
print(f"Mean Absolute Error (MAE): {mae}")

r2 = r2_score(y_test, predictions)
print(f"R-squared (R²): {r2}")


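# Adjusted R²: corrects R² for the number of features p relative to the sample size n
n = X_test.shape[0]  # number of test samples
p = X_test.shape[1]  # number of features
adjusted_r2 = 1 - (1-r2) * (n-1) / (n-p-1)
print(f"Adjusted R-squared: {adjusted_r2}")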


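def mean_absolute_percentage_error(y_true, y_pred):
    # Note: the ratio blows up when y_true is close to zero,
    # so MAPE can become very large for targets centered near zero
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100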

mape = mean_absolute_percentage_error(y_test, predictions)
print(f"Mean Absolute Percentage Error (MAPE): {mape}%")


Root Mean Squared Error (RMSE): 1.1168301211084517
Mean Absolute Error (MAE): 0.8963478438970939
R-squared (R²): -0.18292638362122648
Adjusted R-squared: -0.24992059628585905
Mean Absolute Percentage Error (MAPE): 426.65073057747725%
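
A negative R² means that the squared error of the model is larger than that of a constant prediction. A reference point can be computed with `DummyRegressor`, which always predicts the training mean. The sketch below uses synthetic noise data (hypothetical, not the project data). In the notebook the same baseline could be fitted on `X_train`, `y_train` and scored on `X_test`, `y_test`.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic stand-in data: the target is pure noise, unrelated to the features.
rng = np.random.default_rng(42)
X_tr, X_te = rng.normal(size=(200, 5)), rng.normal(size=(80, 5))
y_tr, y_te = rng.normal(size=200), rng.normal(size=80)

# Baseline that ignores the features and predicts the training mean.
baseline = DummyRegressor(strategy="mean").fit(X_tr, y_tr)
baseline_pred = baseline.predict(X_te)

baseline_mse = mean_squared_error(y_te, baseline_pred)
baseline_r2 = r2_score(y_te, baseline_pred)

print(f"Baseline MSE: {baseline_mse:.4f}")
print(f"Baseline R²:  {baseline_r2:.4f}")
```

The baseline R² is at most zero on held-out data, because the test mean minimizes the squared error of any constant prediction. A model that scores below this baseline is not yet extracting usable signal from the features.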
