# Your mission

You have just started working on the Ecowatt project at RTE. To avoid possible shortages, one must plan for peaks in national electricity demand. Your manager Mark is going on holiday for a week. While he is absent, you will be solely responsible for forecasting the weekly demand.

To prevent electricity shortages, you must accurately forecast the demand 7 days ahead, on an hourly basis.

Your mission is to train an accurate predictive model with the lowest root mean squared error (RMSE). Mark is a very technical person: he likes to understand every technical detail and would like you to compare the performance of classical models and neural-network-based models.
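As a reminder of the metric, here is a minimal sketch of RMSE computed with NumPy. The helper name `rmse` and the toy values are illustrative assumptions, not part of the project code. Taking the square root of `sklearn.metrics.mean_squared_error` should give the same number.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between two equally shaped sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    # Square the errors, average them, then take the square root
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Toy values for illustration only, not taken from the dataset
example = rmse([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
```

Because errors are squared before averaging, RMSE penalizes large misses more heavily than small ones, and it is expressed in the same unit as the target.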


Your **target variable** is `consommation_totale`.

**Data source**: https://data.enedis.fr/pages/accueil/

In [211]:
import os
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error

import matplotlib.pyplot as plt
import seaborn as sns

In [212]:
%run ./utils.ipynb

In [213]:
TARGET = "consommation_totale"

In [214]:
df = pd.read_csv("data/bilan-electrique.csv")

## Train test split

Define here the range of your train/test split

In [215]:
X_train = df[-1000:-100]
X_test = df[-100:]
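Since the goal is a forecast 7 days ahead on an hourly basis, one option is to hold out exactly 7 × 24 = 168 rows. The sketch below assumes one row per hour, sorted chronologically. The helper `chronological_split` and the toy frame are hypothetical and only illustrate the idea.

```python
import pandas as pd

HORIZON_HOURS = 7 * 24  # 7 days of hourly points


def chronological_split(frame, horizon=HORIZON_HOURS, train_size=None):
    """Hold out the last `horizon` rows as test and keep earlier rows as train."""
    if horizon <= 0 or horizon >= len(frame):
        raise ValueError("horizon must be between 1 and len(frame) - 1")
    test = frame.iloc[-horizon:]
    train = frame.iloc[:-horizon]
    if train_size is not None:
        # Keep only the most recent `train_size` rows before the test window
        train = train.iloc[-train_size:]
    return train, test


# Synthetic hourly frame for illustration only (assumed column names)
toy = pd.DataFrame({
    "horodate": pd.date_range("2024-01-01", periods=500, freq=pd.Timedelta(hours=1)),
    "consommation_totale": range(500),
})
toy_train, toy_test = chronological_split(toy, train_size=300)
```

The split is never shuffled: every training timestamp precedes every test timestamp, so the model is not evaluated on data from its own past.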

# Modeling with ARIMA
In this section, you will build some classical models. The suggested method is ARIMA, but you can try other models such as ARMA, ARIMAX, SARIMAX...

## Modeling
The following code fits an ARIMA model with a single combination of (p,d,q).
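In the (p,d,q) notation, p is the autoregressive order, d the number of times the series is differenced, and q the moving-average order. As a small illustration of what d does, the sketch below differences a toy series with `np.diff`. The values are made up and are not taken from the dataset.

```python
import numpy as np

# Toy series with a curved trend (illustrative values only)
series = np.array([1.0, 4.0, 9.0, 16.0, 25.0])

# d = 1: differences between consecutive points
first_diff = np.diff(series, n=1)

# d = 2: differences of the first differences
second_diff = np.diff(series, n=2)
```

Here the second difference is constant, which is the kind of trend removal that the d parameter is meant to provide. Each differencing pass shortens the series by one point.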

In [216]:
parameters = (2,1,1)
errors, predictions = evaluate_arima_model(
    X_train[TARGET],
    X_test[TARGET],
    parameters
    )
errors


Non-stationary starting autoregressive parameters found. Using zeros as starting parameters.


Non-invertible starting MA parameters found. Using zeros as starting parameters.



3.860381035528318e+17

## Search for the best ARIMA model
We use grid search to find the ARIMA parameters that give the lowest error. This follows the Box-Jenkins methodology.
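The idea of the grid search can be sketched independently of the model: enumerate every (p,d,q) combination, score each one, and keep the lowest score. The function `grid_search_orders` and the placeholder scoring function are hypothetical and do not reproduce `arima_grid_search` from `utils.ipynb`. In practice `score_fn` would fit an ARIMA model and return its test error.

```python
import itertools
import math


def grid_search_orders(score_fn, p_values, d_values, q_values):
    """Return the (p, d, q) order with the lowest score, and that score."""
    best_order, best_score = None, math.inf
    for order in itertools.product(p_values, d_values, q_values):
        try:
            score = score_fn(order)
        except Exception as exc:
            # A failed fit should not abort the whole search
            print(f"ARIMA{order} skipped: {exc}")
            continue
        if score < best_score:
            best_order, best_score = order, score
    return best_order, best_score


def placeholder_score(order):
    """Stand-in for a real model error; minimal at (2, 1, 1) by construction."""
    p, d, q = order
    return abs(p - 2) + abs(d - 1) + abs(q - 1)


best_order, best_value = grid_search_orders(
    placeholder_score, range(1, 3), range(0, 3), range(0, 3)
)
```

With the ranges used in this notebook, the search covers 2 × 3 × 3 = 18 combinations, so the cost grows quickly as the ranges widen.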

In [217]:
best_cfg, best_score = arima_grid_search(X_train[TARGET],
                                            X_test[TARGET],
                                            range(1,3),range(0,3),range(0,3))

ARIMA(1,0,0) RMSE=4775634692700016640.000
ARIMA(1,0,1) RMSE=39709749912179621888.000



Maximum Likelihood optimization failed to converge. Check mle_retvals



ARIMA(1,0,2) RMSE=122652137280996689051648.000
ARIMA(1,1,0) RMSE=404428336767815488.000
ARIMA(1,1,1) RMSE=423435196056450304.000
ARIMA(1,1,2) RMSE=399746415915415808.000
ARIMA(1,2,0) RMSE=434112932225798208.000
ARIMA(1,2,1) RMSE=434148570902039616.000
ARIMA(1,2,2) RMSE=434552004537873216.000
ARIMA(2,0,0) RMSE=19037543649422471168.000
ARIMA(2,0,1) RMSE=50468877124328005632.000



Maximum Likelihood optimization failed to converge. Check mle_retvals



ARIMA(2,0,2) RMSE=38143774315771358510841856.000
ARIMA(2,1,0) RMSE=399359490663385728.000



Non-stationary starting autoregressive parameters found. Using zeros as starting parameters.


Non-invertible starting MA parameters found. Using zeros as starting parameters.



ARIMA(2,1,1) RMSE=386038103552831808.000
ARIMA(2,1,2) RMSE=386458909990416768.000
ARIMA(2,2,0) RMSE=434614928841748672.000
ARIMA(2,2,1) RMSE=434719864329648448.000
ARIMA(2,2,2) RMSE=419938333253831552.000
Best ARIMA(2, 1, 1) MSE=386038103552831808.000


In [218]:
print(best_cfg, best_score)

(2, 1, 1) 3.860381035528318e+17


## Visualization
To get a better view of the difference between the true and predicted values, we plot both signals.

In [219]:
# prepare the dataset for plotting
# use the test-set timestamps so that each date lines up with its prediction
predict_date = X_test["horodate"].values
df_predict = pd.DataFrame(zip(predict_date,
                              predictions, X_test[TARGET].values),
                          columns=["date", "predict", "true"])

In [221]:
import plotly.graph_objects as go  # needed for go.Figure and go.Scatter

fig = go.Figure()

fig.add_trace(go.Scatter(x=df_predict["date"], y=df_predict["predict"], name="predict"))
fig.add_trace(go.Scatter(x=df_predict["date"], y=df_predict["true"], name="true"))
