<center><H1>Modeling with <i>sequence-to-sequence</i> RNNs to forecast electricity demand in the ZC of Casas Grandes</H1></center>

<center><img src="https://www.gstatic.com/devrel-devsite/prod/ve2848ad92313fddfcd40baeb58a2f663fe2fd55c371a714a6bb3e329e2b15223/tensorflow/images/lockup.svg"  height="80px" style="padding-bottom:5px;"  /></center>

<center><H2>Julio Waissman Vilanova</H2></center>

<table align="center">
      <td align="center"><a target="_blank" href="https://www.unison.mx">
            <img src="https://www.unison.mx/wp-content/themes/awaken/images/logo.png"  height="70px" style="padding-bottom:5px;"  /></a></td>  
      <td align="center"><a target="_blank" href="https://www.gob.mx/cenace">
            <img src="https://universidad.cenace.gob.mx/pluginfile.php/244/block_html/content/CENACE-logo-completo.png" width="300" style="padding-bottom:5px;" /></a></td>
      <td align="center"><a target="_blank" href="https://colab.research.google.com/github/juliowaissman/rn-cenace/blob/main/Encoder_Casas_Grandes.ipynb">
            <img src="https://i.ibb.co/2P3SLwK/colab.png"  style="padding-bottom:5px;" />Run in Google Colab</a></td>

</table>

In [1]:
# Base libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# To normalize the input data
from sklearn.preprocessing import MinMaxScaler

# TensorFlow with Keras
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Plots that are easier to manipulate, with plotly
import plotly.express as px
import plotly.graph_objects as go

# How the matplotlib plots will look
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (15,7)

# Load the data

In [9]:
url = "https://github.com/juliowaissman/rn-cenace/raw/main/proyectos/Claudia/Demanda%20ZC%20Casas%20Grandes_2017-2021.xlsx"

df_raw = pd.read_excel(url)
df_raw.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 41617 entries, 0 to 41616
Data columns (total 5 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   AÑO                    41617 non-null  int64  
 1   MES                    41617 non-null  object 
 2   Día                    41617 non-null  int64  
 3   Hora                   41617 non-null  int64  
 4   DEMANDA CASAS GRANDES  41617 non-null  float64
dtypes: float64(1), int64(3), object(1)
memory usage: 1.6+ MB


In [55]:
df = df_raw.copy()

df["MES"] = df.MES.str.upper()
df['MES'] = df.MES.replace({
    "ENE": 1, "FEB": 2, "MZO": 3, "ABR": 4, "MAY": 5, 'JUN': 6, 
    'JUL': 7, 'AGO': 8, 'SEP': 9, 'OCT': 10, 'NOV': 11, 'DIC': 12, 'MAR': 3
})

print("There is an hour 25")
print(df.query("Hora == 25"))
print(df.iloc[7246:7255, :])

df.rename(columns={"DEMANDA CASAS GRANDES": "Demanda"}, inplace=True)

df['Fecha'] = pd.to_datetime(pd.DataFrame({
    'year': df.AÑO,
    'month': df.MES,
    'day': df.Día,
    'hour':df.Hora
}))
df.set_index(df.Fecha, append=False, inplace=True)

# Drop the rows with demand = 0
df = df[df.Demanda > 0]

# Convert to a time series with hourly increments (currently disabled)
#df = df.asfreq('H', method='pad')

print(df.info())

fig = px.line(df, x=df.index, y="Demanda", title='Energy demand, ZC Casas Grandes')
fig.show()

There is an hour 25
       AÑO  MES  Día  Hora  DEMANDA CASAS GRANDES
7248  2017   10   29    25               79.02978
       AÑO  MES  Día  Hora  DEMANDA CASAS GRANDES
7246  2017   10   29    23               86.87314
7247  2017   10   29    24               83.27583
7248  2017   10   29    25               79.02978
7249  2017   10   30     1               76.88507
7250  2017   10   30     2               78.00801
7251  2017   10   30     3               73.33292
7252  2017   10   30     4               73.35382
7253  2017   10   30     5               73.88614
7254  2017   10   30     6               78.38128
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 41612 entries, 2017-01-01 01:00:00 to 2021-10-01 00:00:00
Data columns (total 6 columns):
 #   Column   Non-Null Count  Dtype         
---  ------   --------------  -----         
 0   AÑO      41612 non-null  int64         
 1   MES      41612 non-null  int64         
 2   Día      41612 non-null  int64         
 3   Hora    
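
The hour column runs from 1 to 24, plus a single 25 on 2017-10-29 (presumably the extra hour at the end of daylight saving time; that reading is an assumption, the data does not say so). The index printed above ends at 2021-10-01 00:00:00, which suggests that hours past 23 roll over into the next day. A minimal sketch of that date-plus-offset arithmetic, using only the standard library; the helper name `hour_to_timestamp` is hypothetical and this is not a copy of what pandas does internally:

```python
from datetime import datetime, timedelta

def hour_to_timestamp(year, month, day, hour):
    """Add the hour as an offset to midnight, so hours >= 24 roll over."""
    return datetime(year, month, day) + timedelta(hours=hour)

h24 = hour_to_timestamp(2017, 10, 29, 24)
h25 = hour_to_timestamp(2017, 10, 29, 25)
next_h1 = hour_to_timestamp(2017, 10, 30, 1)

print(h24)             # 2017-10-30 00:00:00
print(h25)             # 2017-10-30 01:00:00
print(h25 == next_h1)  # True
```

Under this arithmetic, hour 25 of October 29 and hour 1 of October 30 share a timestamp, so the index would contain a duplicate there. Checking `df.index.duplicated().sum()` before re-enabling the commented-out `asfreq` call may be worthwhile.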

# Generate the training, validation, and test sets



In [60]:
df_train = df[df.AÑO < 2021]
df_test = df[df.AÑO == 2021]


name_attr = ['Demanda', 'MES', 'Día', 'Hora']
n_attr = len(name_attr)

train = df_train[name_attr].copy()  # Explicit copy, so the assignments below do not write to a slice of df
scalers = {}  # A dictionary with the scalers

for attr in name_attr:
  scaler = MinMaxScaler(feature_range=(-1, 1))
  s_s = scaler.fit_transform(train[attr].values.reshape(-1,1))
  scalers[attr] = scaler
  train[attr] = s_s.ravel()

test = df_test[name_attr].copy()  # Explicit copy, same reason as above
for attr in name_attr:
  scaler = scalers[attr]
  s_s = scaler.transform(test[attr].values.reshape(-1,1))
  test[attr] = s_s.ravel()

print(train.info())
print(test.info())

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 35061 entries, 2017-01-01 01:00:00 to 2021-01-01 00:00:00
Data columns (total 4 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   Demanda  35061 non-null  float64
 1   MES      35061 non-null  float64
 2   Día      35061 non-null  float64
 3   Hora     35061 non-null  float64
dtypes: float64(4)
memory usage: 1.3 MB
None
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 6551 entries, 2021-01-01 01:00:00 to 2021-10-01 00:00:00
Data columns (total 4 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   Demanda  6551 non-null   float64
 1   MES      6551 non-null   float64
 2   Día      6551 non-null   float64
 3   Hora     6551 non-null   float64
dtypes: float64(4)
memory usage: 255.9 KB
None
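
Each attribute is rescaled to [-1, 1] using the minimum and maximum seen in the training years, and the same fitted scaler is then reused on the 2021 data. A minimal pure-Python sketch of that mapping and its inverse, assuming the usual min-max formula; the function names and the sample values are made up for illustration:

```python
def fit_minmax(values):
    """Return the (min, max) pair learned from the training values."""
    return min(values), max(values)

def scale(x, lo, hi, a=-1.0, b=1.0):
    """Map x from [lo, hi] to [a, b]."""
    return (x - lo) / (hi - lo) * (b - a) + a

def unscale(z, lo, hi, a=-1.0, b=1.0):
    """Inverse of scale: map z from [a, b] back to [lo, hi]."""
    return (z - a) / (b - a) * (hi - lo) + lo

train_demand = [40.0, 60.0, 80.0, 100.0]  # illustrative values, not from the dataset
lo, hi = fit_minmax(train_demand)

print(scale(40.0, lo, hi))    # -1.0
print(scale(100.0, lo, hi))   # 1.0
print(scale(115.0, lo, hi))   # 1.5, a value above the training maximum
print(unscale(scale(73.0, lo, hi), lo, hi))
```

Because the bounds come from the training data only, test values outside the training range land outside [-1, 1], as the third call shows.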

In [66]:
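def divide_series(series, n_pasado, n_futuro, n_salto, es_train=True):
  """
  Split a series into (past, future) windows.

  n_pasado: number of past observations fed to the encoder
  n_futuro: number of future observations to predict
  n_salto: number of observations skipped between the end of the past window and the start of the future ones
  es_train: if True, windows start at every row; otherwise every n_futuro rows

  """
  X, y = list(), list() # Build lists first and convert them to ndarrays at the end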
  generador = range(len(series)) if es_train else range(0, len(series), n_futuro)
  
  for ini in generador:
    fin_anterior = ini + n_pasado
    fin_actual = fin_anterior + n_salto + n_futuro
    if fin_actual > len(series):
      break
    pasado = series[ini: fin_anterior, :]
    futuro = series[fin_anterior + n_salto: fin_actual, 0].reshape(-1,1)
    X.append(pasado)
    y.append(futuro)
  return np.array(X), np.array(y)

n_pasado = 24 * 14 + 12
n_futuro = 24 
n_salto = 12

X_train, y_train = divide_series(train.values, n_pasado, n_futuro, n_salto)
X_test, y_test = divide_series(test.values, n_pasado, n_futuro, n_salto, es_train=False)


# Rearrange as a 3-dimensional tensor
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], n_attr))
X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], n_attr))

y_train = y_train.reshape((y_train.shape[0], y_train.shape[1], 1))
y_test = y_test.reshape((y_test.shape[0], y_test.shape[1], 1))

X_train.shape, X_test.shape, y_train.shape, y_test.shape


((34678, 348, 4), (257, 348, 4), (34678, 24, 1), (257, 24, 1))
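
The shapes above follow from how `divide_series` walks the array: each window needs `n_pasado + n_salto + n_futuro` consecutive rows, training windows start at every row, and test windows start every `n_futuro` rows so that their targets do not overlap. A small sketch of that count (the helper `count_windows` is hypothetical, not part of the notebook):

```python
def count_windows(n_rows, n_pasado, n_futuro, n_salto, step=1):
    """Number of windows divide_series would produce for n_rows rows."""
    span = n_pasado + n_salto + n_futuro  # rows consumed by one window
    if n_rows < span:
        return 0
    last_start = n_rows - span            # last valid start index
    return last_start // step + 1

n_pasado, n_futuro, n_salto = 24 * 14 + 12, 24, 12

print(count_windows(35061, n_pasado, n_futuro, n_salto))                # 34678
print(count_windows(6551, n_pasado, n_futuro, n_salto, step=n_futuro))  # 257
```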

# Model

In [62]:
encoder_inputs = layers.Input(shape=(n_pasado, n_attr))
#-------------------------------------------------------
encoder_l1 = layers.LSTM(100, return_state=True)
encoder_outputs1 = encoder_l1(encoder_inputs)
encoder_states1 = encoder_outputs1[1:]
#-------------------------------------------------------
decoder_rvec = layers.RepeatVector(n_futuro)
decoder_inputs = decoder_rvec(encoder_outputs1[0])
#-------------------------------------------------------
decoder_l1 = layers.LSTM(100, return_sequences=True)
decoder_l1_output = decoder_l1(decoder_inputs, initial_state=encoder_states1)
#-------------------------------------------------------
decoder_l2 = layers.TimeDistributed(layers.Dense(1))
decoder_outputs = decoder_l2(decoder_l1_output)



modeloCG_1 = keras.models.Model(encoder_inputs, decoder_outputs)
modeloCG_1.summary()

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 348, 4)]     0                                            
__________________________________________________________________________________________________
lstm (LSTM)                     [(None, 50), (None,  11000       input_1[0][0]                    
__________________________________________________________________________________________________
repeat_vector (RepeatVector)    (None, 24, 50)       0           lstm[0][0]                       
__________________________________________________________________________________________________
lstm_1 (LSTM)                   (None, 24, 50)       20200       repeat_vector[0][0]              
                                                                 lstm[0][1]                   
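
The parameter counts in the summary can be checked by hand. An LSTM layer has four gates, each with a kernel over the inputs, a recurrent kernel over the hidden state, and a bias, which gives 4 * (units * (units + inputs) + units) parameters. The sketch below (helper name hypothetical) shows that the 11000 and 20200 in the summary correspond to 50 units, whereas the cell above as written builds 100-unit layers, so the printed summary appears to come from a run with a different layer size:

```python
def lstm_params(units, n_inputs):
    """Trainable parameters of a standard LSTM layer with bias."""
    return 4 * (units * (units + n_inputs) + units)

# Sizes reported in the summary: 50 units
print(lstm_params(50, 4))     # 11000, encoder fed with 4 attributes
print(lstm_params(50, 50))    # 20200, decoder fed with the repeated 50-d vector

# Sizes the code as written would give: 100 units
print(lstm_params(100, 4))    # 42000
print(lstm_params(100, 100))  # 80400
```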

# Training

In [64]:
reduce_lr = keras.callbacks.LearningRateScheduler(lambda x: 1e-3 * 0.90 ** x)

path_checkpoint = "model_checkpoint.h5"
modelckpt_callback = keras.callbacks.ModelCheckpoint(
    monitor="val_loss",
    filepath=path_checkpoint,
    verbose=1,
    save_weights_only=True,
    save_best_only=True,
)

es_callback = keras.callbacks.EarlyStopping(
    monitor="val_loss", 
    min_delta=0, 
    patience=5
)

modeloCG_1.compile(
    optimizer=keras.optimizers.Adam(), 
    loss="mae"
)

history = modeloCG_1.fit(
    X_train,
    y_train,
    epochs=25,
    validation_split=0.2,
    batch_size=32,
    callbacks=[reduce_lr, es_callback, modelckpt_callback]
)

Epoch 1/25

Epoch 00001: val_loss improved from inf to 0.13544, saving model to model_checkpoint.h5
Epoch 2/25

Epoch 00002: val_loss improved from 0.13544 to 0.12852, saving model to model_checkpoint.h5
Epoch 3/25

Epoch 00003: val_loss improved from 0.12852 to 0.11769, saving model to model_checkpoint.h5
Epoch 4/25

Epoch 00004: val_loss did not improve from 0.11769
Epoch 5/25

Epoch 00005: val_loss improved from 0.11769 to 0.11728, saving model to model_checkpoint.h5
Epoch 6/25

Epoch 00006: val_loss did not improve from 0.11728
Epoch 7/25

Epoch 00007: val_loss did not improve from 0.11728
Epoch 8/25

Epoch 00008: val_loss did not improve from 0.11728
Epoch 9/25

Epoch 00009: val_loss did not improve from 0.11728
Epoch 10/25

Epoch 00010: val_loss improved from 0.11728 to 0.11535, saving model to model_checkpoint.h5
Epoch 11/25

Epoch 00011: val_loss did not improve from 0.11535
Epoch 12/25

Epoch 00012: val_loss improved from 0.11535 to 0.10858, saving model to model_checkpoint.h5
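
The `LearningRateScheduler` callback above sets the learning rate at the start of each epoch from the zero-based epoch index, here as a 10% decay per epoch starting at 1e-3. A quick sketch of the values this produces (the name `schedule` is only for illustration):

```python
def schedule(epoch, lr0=1e-3, decay=0.90):
    """Learning rate for a zero-based epoch index, as in the lambda above."""
    return lr0 * decay ** epoch

for epoch in (0, 1, 5, 11, 24):
    print(f"epoch {epoch + 1:2d}: lr = {schedule(epoch):.2e}")
```

By epoch 12 the rate has dropped to roughly a third of its initial value, and by epoch 25 to less than a tenth.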

In [72]:
y_est = modeloCG_1.predict(X_test)
y_test[:,:,0].ravel().shape, y_est[:,:,0].ravel().shape, test.index[n_pasado + n_salto:-23].shape

((6168,), (6168,), (6168,))

In [73]:
scaler = scalers['Demanda']

yr = scaler.inverse_transform(y_test[:,:,0].ravel().reshape(-1, 1))
yh = scaler.inverse_transform(y_est[:,:,0].ravel().reshape(-1, 1))

df_est = pd.DataFrame({
    "Real": yr.ravel(),
    "Estimado": yh.ravel(),
    "Fecha": df_test.Fecha[n_pasado + n_salto:-23]     
})

In [74]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=df_est.Fecha, y=df_est.Estimado, name="Estimated"))
fig.add_trace(go.Scatter(x=df_est.Fecha, y=df_est.Real, name="Actual"))
fig.update_layout(title="Demand estimate")
fig.show()

In [71]:
6191 - 6168

23
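
The `-23` used in the slices above comes from this subtraction: the first `n_pasado + n_salto` test rows have no prediction, and the 257 non-overlapping test windows cover 257 * 24 hours, which leaves a tail too short for another window. A sketch of the same arithmetic with the sizes reported earlier:

```python
n_test = 6551                    # rows in the 2021 test set
n_pasado, n_futuro, n_salto = 24 * 14 + 12, 24, 12
n_windows = 257                  # first dimension of y_test

first = n_pasado + n_salto       # index of the first predicted hour
covered = n_windows * n_futuro   # hours that receive a prediction
leftover = n_test - first - covered

print(n_test - first)  # 6191
print(covered)         # 6168
print(leftover)        # 23
```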