One technique for preparing a training dataset for time-series forecasting is called *windowing*. Each training example uses a run of consecutive values of an attribute as its features and a later (future) value of that same attribute as its label.

This notebook walks through an example.
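As a minimal, framework-independent sketch of the idea, the pure-Python code below slides a window of `window_size` values over a series and uses the value that follows as the label. The helper `make_windows` is hypothetical and not part of any library.

```python
def make_windows(series, window_size):
    """Return (features, label) pairs: window_size consecutive values and the value that follows."""
    pairs = []
    # The last valid start index leaves room for window_size features plus one label.
    for start in range(len(series) - window_size):
        features = series[start:start + window_size]
        label = series[start + window_size]
        pairs.append((features, label))
    return pairs


pairs = make_windows(list(range(10)), window_size=4)
for features, label in pairs:
    print(features, "->", label)
```

With `window_size=4` this yields the same six (features, label) pairs that the TensorFlow pipeline in this notebook builds from `tf.data.Dataset.range(10)`.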

In [1]:
import tensorflow as tf


In [5]:
# create a basic tf.data dataset with the integers 0 to 9
dataset = tf.data.Dataset.range(10)

# show
for val in dataset:
    print(val.numpy())

0
1
2
3
4
5
6
7
8
9


In [6]:
# window the data
# window() produces windows of <size> elements,
# moving the window forward <shift> elements each time.
# drop_remainder=True keeps only full windows, so every window has the same length
dataset = dataset.window(size=5, shift=1, drop_remainder=True)

# show
# The result is a dataset of datasets: each window is itself a small dataset,
# so printing it shows a _VariantDataset rather than its values.
for wd in dataset:
    print(wd)

<_VariantDataset element_spec=TensorSpec(shape=(), dtype=tf.int64, name=None)>
<_VariantDataset element_spec=TensorSpec(shape=(), dtype=tf.int64, name=None)>
<_VariantDataset element_spec=TensorSpec(shape=(), dtype=tf.int64, name=None)>
<_VariantDataset element_spec=TensorSpec(shape=(), dtype=tf.int64, name=None)>
<_VariantDataset element_spec=TensorSpec(shape=(), dtype=tf.int64, name=None)>
<_VariantDataset element_spec=TensorSpec(shape=(), dtype=tf.int64, name=None)>


In [7]:
# show
for wd in dataset:
    print([item.numpy() for item in wd])

[0, 1, 2, 3, 4]
[1, 2, 3, 4, 5]
[2, 3, 4, 5, 6]
[3, 4, 5, 6, 7]
[4, 5, 6, 7, 8]
[5, 6, 7, 8, 9]


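To see how `shift` and `drop_remainder` change the result, the sketch below applies `window()` to the same range with different settings and collects each window as a list of Python ints. The helper `show_windows` is illustrative only.

```python
import tensorflow as tf


def show_windows(size, shift, drop_remainder):
    """Collect the windows produced by Dataset.window() as lists of Python ints."""
    ds = tf.data.Dataset.range(10).window(size=size, shift=shift, drop_remainder=drop_remainder)
    # Each element is itself a dataset, so iterate over it to get the values.
    return [[int(v) for v in w.as_numpy_iterator()] for w in ds]


# shift=2: each window starts two elements after the previous one.
every_other = show_windows(size=5, shift=2, drop_remainder=True)
print(every_other)  # [[0, 1, 2, 3, 4], [2, 3, 4, 5, 6], [4, 5, 6, 7, 8]]

# drop_remainder=False: the shorter windows at the end of the series are kept.
with_remainder = show_windows(size=5, shift=1, drop_remainder=False)
print(with_remainder[-4:])  # [[6, 7, 8, 9], [7, 8, 9], [8, 9], [9]]
```

For training, `drop_remainder=True` is usually what you want, because the shorter trailing windows would not have enough values for both features and a label.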


In [9]:
# flat_map() turns each window into a single tensor with batch(5) and flattens
# the result into one dataset of tensors.
dataset = dataset.flat_map(lambda window: window.batch(5))

for window in dataset:
    print(window.numpy())

[0 1 2 3 4]
[1 2 3 4 5]
[2 3 4 5 6]
[3 4 5 6 7]
[4 5 6 7 8]
[5 6 7 8 9]


In [10]:
dataset

<_FlatMapDataset element_spec=TensorSpec(shape=(None,), dtype=tf.int64, name=None)>
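The `None` in the shape appears because `batch(5)` cannot guarantee that every window has exactly five elements. As a sketch, passing `drop_remainder=True` to the inner `batch()` should make TensorFlow report a static shape of `(5,)` instead of `(None,)`, which can help when a model expects a fixed input length:

```python
import tensorflow as tf

window_size = 5
ds = tf.data.Dataset.range(10).window(size=window_size, shift=1, drop_remainder=True)
# drop_remainder=True in batch() tells TensorFlow each window yields exactly window_size elements.
ds = ds.flat_map(lambda w: w.batch(window_size, drop_remainder=True))

print(ds.element_spec)
```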

In [11]:
# build (features, label) pairs
# .map() uses the first four elements of each window as the features
# and the last element as the label
dataset = dataset.map(lambda window: (window[:-1], window[-1]))

# Print the results
for x, y in dataset:
    print("x = ", x.numpy())
    print("y = ", y.numpy())
    print()

x =  [0 1 2 3]
y =  4

x =  [1 2 3 4]
y =  5

x =  [2 3 4 5]
y =  6

x =  [3 4 5 6]
y =  7

x =  [4 5 6 7]
y =  8

x =  [5 6 7 8]
y =  9



In [12]:
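# shuffle
# For time series, shuffling the windows helps avoid sequence bias, i.e. the model
# fitting the order in which the windows are presented. Only the order of the
# (features, label) pairs changes; the values inside each window keep their order.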

dataset = dataset.shuffle(buffer_size=10)  # buffer_size (10) is larger than the number of windows (6), so the shuffle covers the whole dataset

# batch/prefetch
# batch(2): group the (features, label) pairs into batches of two windows,
#           so each training step uses two examples
# prefetch(1): prepare the next batch while the current one is being used for training
dataset = dataset.batch(2).prefetch(1)
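The whole pipeline can be collected into a single function. The sketch below is one way to do it; the name `windowed_dataset` and its parameters are illustrative, and `window_size` here counts only the features, so each raw window holds `window_size + 1` values. A fixed `seed` is passed to `shuffle()` only to make the run repeatable.

```python
import tensorflow as tf


def windowed_dataset(series, window_size, batch_size, shuffle_buffer, seed=None):
    """Turn a 1-D series into shuffled, batched (features, label) pairs."""
    ds = tf.data.Dataset.from_tensor_slices(series)
    # Each window holds window_size features plus one label.
    ds = ds.window(size=window_size + 1, shift=1, drop_remainder=True)
    ds = ds.flat_map(lambda w: w.batch(window_size + 1))
    ds = ds.map(lambda w: (w[:-1], w[-1]))
    ds = ds.shuffle(buffer_size=shuffle_buffer, seed=seed)
    return ds.batch(batch_size).prefetch(1)


dataset = windowed_dataset(tf.range(10, dtype=tf.int64), window_size=4,
                           batch_size=2, shuffle_buffer=10, seed=42)

batches = []
for x, y in dataset:
    print("x =", x.numpy())
    print("y =", y.numpy())
    batches.append((x.numpy(), y.numpy()))
```

With ten values and four features per window there are six pairs, so `batch(2)` produces three batches, each with features of shape `(2, 4)` and labels of shape `(2,)`.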