<center><H1><i>Sequence-to-sequence</i> RNN modeling for forecasting electricity demand in the Gerencia Regional Oriental</H1></center>

<center><img src="https://www.gstatic.com/devrel-devsite/prod/ve2848ad92313fddfcd40baeb58a2f663fe2fd55c371a714a6bb3e329e2b15223/tensorflow/images/lockup.svg"  height="80px" style="padding-bottom:5px;"  /></center>

<center><H2>Julio Waissman Vilanova</H2></center>

<table align="center">
      <td align="center"><a target="_blank" href="https://www.unison.mx">
            <img src="https://www.unison.mx/wp-content/themes/awaken/images/logo.png"  height="70px" style="padding-bottom:5px;"  /></a></td>  
      <td align="center"><a target="_blank" href="https://www.gob.mx/cenace">
            <img src="https://universidad.cenace.gob.mx/pluginfile.php/244/block_html/content/CENACE-logo-completo.png" width="300" style="padding-bottom:5px;" /></a></td>
      <td align="center"><a target="_blank" href="https://colab.research.google.com/github/juliowaissman/rn-cenace/blob/main/Encoder_Oriental.ipynb">
            <img src="https://i.ibb.co/2P3SLwK/colab.png"  style="padding-bottom:5px;" />Run in Google Colab</a></td>

</table>

# Load and clean the data

In [1]:
# Core libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# To normalize the input data
from sklearn.preprocessing import MinMaxScaler

# TensorFlow with Keras
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Plotly makes the charts easier to manipulate
import plotly.express as px
import plotly.graph_objects as go

# Appearance of the matplotlib figures
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (15,7)

In [14]:
url_prontmp = "https://github.com/juliowaissman/rn-cenace/raw/main/proyectos/Tomas/DatosPron_temp.xlsx"
url_demanda = "https://github.com/juliowaissman/rn-cenace/raw/main/proyectos/Tomas/DatosReales_consumo_temp.xlsx"

df_t = pd.read_excel(url_prontmp)
df_d = pd.read_excel(url_demanda)

print(df_t.info())
print(df_d.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 72 entries, 0 to 71
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   FECHA       72 non-null     datetime64[ns]
 1   HORA        72 non-null     int64         
 2   DIA         72 non-null     object        
 3   NUM_SEMANA  72 non-null     int64         
 4   GRADOS      72 non-null     float64       
dtypes: datetime64[ns](1), float64(1), int64(2), object(1)
memory usage: 2.9+ KB
None
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 76920 entries, 0 to 76919
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   FECHA       76920 non-null  datetime64[ns]
 1   HORA        76920 non-null  int64         
 2   DIA         76920 non-null  object        
 3   NUM_SEMANA  76920 non-null  int64         
 4   MW          76920 non-null  int64         
 5   GRADOS      76920 non-null 

In [22]:
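df = df_d.copy()
# The integer index is kept and FECHA stays a regular column,
# which is what the plots and the train/test split below rely on.

# Encode the day of the week as an integer
# (Spanish initials, presumably L = Monday through D = Sunday)
df['DIA'] = df.DIA.replace({
  'L': 0, 'M': 1, 'W': 2, 'J': 3, 'V': 4, 'S': 5, 'D': 6
})

# Readings above 9000 MW are treated as anomalous and halved
df.loc[df.MW > 9000, "MW"] = df.MW[df.MW > 9000] / 2

# Drop rows with non-positive demand
df = df[df.MW > 0]
#df.asfreq('H', method='bfill')

print(df.info())

fig = px.line(df, x="FECHA", y="MW", title='Energy demand, GR Oriente')
fig.show()

fig = px.line(df, x="FECHA", y="GRADOS", title='Temperature, GR Oriente')
fig.show()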

<class 'pandas.core.frame.DataFrame'>
Int64Index: 76914 entries, 0 to 76919
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   FECHA       76914 non-null  datetime64[ns]
 1   HORA        76914 non-null  int64         
 2   DIA         76914 non-null  int64         
 3   NUM_SEMANA  76914 non-null  int64         
 4   MW          76914 non-null  float64       
 5   GRADOS      76914 non-null  float64       
dtypes: datetime64[ns](1), float64(2), int64(3)
memory usage: 4.1 MB
None
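
Dropping the rows with `MW <= 0` can leave holes in the hourly series, and the windowing used later treats consecutive rows as consecutive hours. The following is a minimal sketch of how such gaps could be located. It assumes that `FECHA` holds only the date and that `HORA` is an hour offset that can be added to it; neither is confirmed by the output above. The helper `find_hourly_gaps` and the toy frame are hypothetical.

```python
import pandas as pd

def find_hourly_gaps(frame, date_col="FECHA", hour_col="HORA"):
    """Return one row per place where consecutive timestamps are more than 1 hour apart."""
    # Assumption: date column + hour offset gives the full timestamp
    stamps = frame[date_col] + pd.to_timedelta(frame[hour_col], unit="h")
    stamps = stamps.sort_values().reset_index(drop=True)
    step = stamps.diff()                      # time elapsed since the previous row
    mask = step > pd.Timedelta(hours=1)       # the first row is NaT, which compares False
    return pd.DataFrame({
        "before": stamps.shift(1)[mask],
        "after": stamps[mask],
        "gap": step[mask],
    }).reset_index(drop=True)

# Toy data (illustrative only): hours 3 and 4 are missing
toy = pd.DataFrame({
    "FECHA": pd.to_datetime(["2021-01-01"] * 5),
    "HORA": [0, 1, 2, 5, 6],
    "MW": [5000, 5100, 5200, 5300, 5400],
})
gaps = find_hourly_gaps(toy)
print(gaps)
```

If gaps show up on the real data, one option is to reindex to a regular hourly grid and fill the holes, which is what the commented-out `asfreq` line hints at.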


# Build the training and test sets

In [23]:
df_train = df[df.FECHA.dt.year < 2021]
df_test = df[df.FECHA.dt.year == 2021]


name_attr = ['MW', 'GRADOS', 'HORA', 'DIA', 'NUM_SEMANA']
n_attr = len(name_attr)

# Work on an explicit copy so the scaled columns can be assigned safely
train = df_train[name_attr].copy()
scalers = {}  # A dictionary with one scaler per attribute

# Fit each scaler on the training data only
for attr in name_attr:
  scaler = MinMaxScaler(feature_range=(-1, 1))
  s_s = scaler.fit_transform(train[attr].values.reshape(-1,1))
  scalers[attr] = scaler
  train[attr] = s_s.ravel()

# Reuse the training scalers on the test data
test = df_test[name_attr].copy()
for attr in name_attr:
  scaler = scalers[attr]
  s_s = scaler.transform(test[attr].values.reshape(-1,1))
  test[attr] = s_s.ravel()

print(train.info())
print(test.info())

<class 'pandas.core.frame.DataFrame'>
Int64Index: 70124 entries, 0 to 70128
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   MW          70124 non-null  float64
 1   GRADOS      70124 non-null  float64
 2   HORA        70124 non-null  float64
 3   DIA         70124 non-null  float64
 4   NUM_SEMANA  70124 non-null  float64
dtypes: float64(5)
memory usage: 3.2 MB
None
<class 'pandas.core.frame.DataFrame'>
Int64Index: 6790 entries, 70129 to 76919
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   MW          6790 non-null   float64
 1   GRADOS      6790 non-null   float64
 2   HORA        6790 non-null   float64
 3   DIA         6790 non-null   float64
 4   NUM_SEMANA  6790 non-null   float64
dtypes: float64(5)
memory usage: 318.3 KB
None
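
Because the target is scaled to (-1, 1), both predictions and errors come out in scaled units and have to be mapped back to MW before they can be interpreted. The sketch below reproduces in plain NumPy the linear map that `MinMaxScaler(feature_range=(-1, 1))` applies, together with its inverse. The helper names and the demand values are hypothetical and chosen only for illustration.

```python
import numpy as np

def minmax_fit(values, low=-1.0, high=1.0):
    """Store what is needed for the linear map: data range and target range."""
    values = np.asarray(values, dtype=float)
    return values.min(), values.max(), low, high

def minmax_apply(values, params):
    data_min, data_max, low, high = params
    unit = (np.asarray(values, dtype=float) - data_min) / (data_max - data_min)
    return unit * (high - low) + low

def minmax_invert(scaled, params):
    data_min, data_max, low, high = params
    unit = (np.asarray(scaled, dtype=float) - low) / (high - low)
    return unit * (data_max - data_min) + data_min

def scaled_error_to_units(err, params):
    """Convert an absolute error in scaled units into the original units."""
    data_min, data_max, low, high = params
    return err * (data_max - data_min) / (high - low)

# Hypothetical demand values in MW (not taken from the dataset)
mw_train = np.array([4000.0, 5000.0, 6000.0, 8000.0])
params = minmax_fit(mw_train)
scaled = minmax_apply(mw_train, params)
restored = minmax_invert(scaled, params)
mae_mw = scaled_error_to_units(0.1, params)
print(scaled, restored, mae_mw)
```

With the objects defined in this notebook, the equivalent inverse for a vector of scaled predictions `pred` would be `scalers['MW'].inverse_transform(pred.reshape(-1, 1))`.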





In [25]:
def divide_series(series, n_pasado, n_futuro, n_salto, es_train=True):
  """
  Split a multivariate series into (past, future) windows.

  series: 2-D array (time steps x attributes); column 0 is the target
  n_pasado: number of past observations fed to the encoder
  n_futuro: number of future observations to predict
  n_salto: gap between the end of the past window and the first future observation
  es_train: if True, a window starts at every step; otherwise windows start
            every n_futuro steps, so the predicted horizons do not overlap
  """
  X, y = list(), list() # Collect the windows in lists, convert to ndarrays at the end
  generador = range(len(series)) if es_train else range(0, len(series), n_futuro)
  
  for ini in generador:
    fin_anterior = ini + n_pasado
    fin_actual = fin_anterior + n_salto + n_futuro
    if fin_actual > len(series):
      break
    pasado = series[ini: fin_anterior, :]
    futuro = series[fin_anterior + n_salto: fin_actual, 0].reshape(-1,1)
    X.append(pasado)
    y.append(futuro)
  return np.array(X), np.array(y)

n_pasado = 24 * 14 + 12  # two weeks plus 12 hours of history
n_futuro = 24 * 7        # one week to predict
n_salto = 12             # 12-hour gap between history and forecast

X_train, y_train = divide_series(train.values, n_pasado, n_futuro, n_salto)
X_test, y_test = divide_series(test.values, n_pasado, n_futuro, n_salto, es_train=False)


# Reshape as 3-D tensors: (samples, time steps, features)
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], n_attr))
X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], n_attr))

y_train = y_train.reshape((y_train.shape[0], y_train.shape[1], 1))
y_test = y_test.reshape((y_test.shape[0], y_test.shape[1], 1))

X_train.shape, X_test.shape, y_train.shape, y_test.shape


((69597, 348, 5), (38, 348, 5), (69597, 168, 1), (38, 168, 1))
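
The sample counts in these shapes follow directly from the window arithmetic: each window spans `n_pasado + n_salto + n_futuro` rows, and a new window starts every row for training or every `n_futuro` rows for testing. A short check using the row counts printed by `train.info()` and `test.info()` (the helper name `count_windows` is hypothetical):

```python
def count_windows(n_rows, n_pasado, n_futuro, n_salto, stride=1):
    """Number of complete (past, gap, future) windows that fit in n_rows."""
    span = n_pasado + n_salto + n_futuro   # rows covered by one window
    if n_rows < span:
        return 0
    return (n_rows - span) // stride + 1

n_pasado = 24 * 14 + 12   # 348
n_futuro = 24 * 7         # 168
n_salto = 12

# 70124 training rows, one window per starting row
train_windows = count_windows(70124, n_pasado, n_futuro, n_salto)
# 6790 test rows, one window every n_futuro rows
test_windows = count_windows(6790, n_pasado, n_futuro, n_salto, stride=n_futuro)

print(train_windows, test_windows)   # 69597 38
```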

# Model

In [27]:
encoder_inputs = layers.Input(shape=(n_pasado, n_attr))
#-------------------------------------------------------
# Encoder: an LSTM that reads the past window and returns its last output plus the final states (h, c)
encoder_l1 = layers.LSTM(100, return_state=True)
encoder_outputs1 = encoder_l1(encoder_inputs)
encoder_states1 = encoder_outputs1[1:]
#-------------------------------------------------------
# Decoder input: the encoder output repeated once per future time step
decoder_rvec = layers.RepeatVector(n_futuro)
decoder_inputs = decoder_rvec(encoder_outputs1[0])
#-------------------------------------------------------
# Decoder: an LSTM initialized with the encoder states, emitting one vector per future step
decoder_l1 = layers.LSTM(100, return_sequences=True)
decoder_l1_output = decoder_l1(decoder_inputs, initial_state=encoder_states1)
#-------------------------------------------------------
# Output: the same Dense(1) applied at every future step to predict the scaled MW
decoder_l2 = layers.TimeDistributed(layers.Dense(1))
decoder_outputs = decoder_l2(decoder_l1_output)



modelo_Oriental_7dias = keras.models.Model(encoder_inputs, decoder_outputs)
modelo_Oriental_7dias.summary()

Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_2 (InputLayer)            [(None, 348, 5)]     0                                            
__________________________________________________________________________________________________
lstm_2 (LSTM)                   [(None, 100), (None, 42400       input_2[0][0]                    
__________________________________________________________________________________________________
repeat_vector_1 (RepeatVector)  (None, 168, 100)     0           lstm_2[0][0]                     
__________________________________________________________________________________________________
lstm_3 (LSTM)                   (None, 168, 100)     80400       repeat_vector_1[0][0]            
                                                                 lstm_2[0][1]               

# Training

In [None]:
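# Exponential learning-rate decay: 1e-3 at epoch 0, multiplied by 0.90 at every epoch
reduce_lr = keras.callbacks.LearningRateScheduler(lambda x: 1e-3 * 0.90 ** x)

# Save the weights whenever the validation loss improves
path_checkpoint = "model_checkpoint.h5"
modelckpt_callback = keras.callbacks.ModelCheckpoint(
    monitor="val_loss",
    filepath=path_checkpoint,
    verbose=1,
    save_weights_only=True,
    save_best_only=True,
)

# Stop when the validation loss has not improved for 5 consecutive epochs
es_callback = keras.callbacks.EarlyStopping(
    monitor="val_loss", 
    min_delta=0, 
    patience=5
)

# Mean absolute error, measured on the scaled (-1, 1) target
modelo_Oriental_7dias.compile(
    optimizer=keras.optimizers.Adam(), 
    loss="mae"
)

# The last 20% of the training windows is held out for validation
history = modelo_Oriental_7dias.fit(
    X_train,
    y_train,
    epochs=25,
    validation_split=0.2,
    batch_size=32,
    callbacks=[reduce_lr, es_callback, modelckpt_callback]
)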


Epoch 1/25

Epoch 00001: val_loss improved from inf to 0.12518, saving model to model_checkpoint.h5
Epoch 2/25

Epoch 00002: val_loss improved from 0.12518 to 0.10648, saving model to model_checkpoint.h5
Epoch 3/25

Epoch 00003: val_loss improved from 0.10648 to 0.10470, saving model to model_checkpoint.h5
Epoch 4/25

Epoch 00004: val_loss improved from 0.10470 to 0.10268, saving model to model_checkpoint.h5
Epoch 5/25
 137/1740 [=>............................] - ETA: 9:06 - loss: 0.0814
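
To get a feel for how quickly the scheduler above shrinks the step size, the same expression can be evaluated outside Keras. This sketch only tabulates `1e-3 * 0.90 ** epoch` for the 25 scheduled epochs, with epochs counted from 0 as Keras passes them to the callback. The function name is hypothetical.

```python
import math

def learning_rate(epoch, initial=1e-3, decay=0.90):
    """Same rule as the LearningRateScheduler lambda: initial * decay ** epoch."""
    return initial * decay ** epoch

schedule = [learning_rate(epoch) for epoch in range(25)]

# Number of epochs needed for the learning rate to halve
halving_epochs = math.log(0.5) / math.log(0.90)

for epoch in (0, 1, 5, 10, 24):
    print(f"epoch index {epoch:2d}: lr = {schedule[epoch]:.2e}")
print(f"the learning rate halves roughly every {halving_epochs:.1f} epochs")
```

By the last scheduled epoch the learning rate is below one tenth of its initial value, so most of the large updates happen in the first few epochs.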