# Generating sequences

Before you can train neural networks on sequential data, you need to pre-process it. You'll chunk the data into input-target pairs, where each input is a fixed number of consecutive data points and the target is the data point that immediately follows them.

Your task is to define a function called create_sequences() that does this. As inputs, it receives the data stored in a DataFrame, df, and seq_length, the length of each input sequence. As outputs, it should return two NumPy arrays: one with the input sequences and one with the corresponding targets.

* Iterate over the range of the number of data points minus the length of an input sequence.
* Define the inputs x as the slice of df from row i up to (but not including) row i + seq_length, taking the column at index 0 (after the preprocessing below, consumption is the only column).
* Define the target y as the value of df at row index i + seq_length in the same column at index 0.
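
The steps above can be traced on a tiny example. This is only a sketch: toy_df and its five values are made up for illustration, and the toy frame has a single column, so the column index used here is 0.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: five readings in a single column
toy_df = pd.DataFrame({"consumption": [10, 20, 30, 40, 50]})
seq_length = 3

inputs, targets = [], []
# len(toy_df) - seq_length = 2 windows leave room for a target
for i in range(len(toy_df) - seq_length):
    # Rows i .. i + seq_length - 1 form the input window
    inputs.append(toy_df.iloc[i:i + seq_length, 0].to_numpy())
    # The row right after the window is the target
    targets.append(toy_df.iloc[i + seq_length, 0])

inputs = np.array(inputs)
targets = np.array(targets)
print(inputs)   # [[10 20 30]
                #  [20 30 40]]
print(targets)  # [40 50]
```

Each window overlaps the previous one by seq_length - 1 points, which is why n rows yield n - seq_length examples.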

In [32]:
import pandas as pd
import numpy as np
import torch
from torch.utils.data import TensorDataset
import torch.nn as nn

In [2]:
df = pd.read_csv("/kaggle/input/electricity-consumption/electricityConsumptionAndProductioction.csv/electricityConsumptionAndProductioction.csv")

In [6]:
df.head()


,DateTime,Consumption,Production,Nuclear,Wind,Hydroelectric,Oil and Gas,Coal,Solar,Biomass
0,2019-01-01 00:00:00,6352,6527,1395,79,1383,1896,1744,0,30
1,2019-01-01 01:00:00,6116,5701,1393,96,1112,1429,1641,0,30
2,2019-01-01 02:00:00,5873,5676,1393,142,1030,1465,1616,0,30
3,2019-01-01 03:00:00,5682,5603,1397,191,972,1455,1558,0,30
4,2019-01-01 04:00:00,5557,5454,1393,159,960,1454,1458,0,30


In [7]:
# Keep only the 'DateTime' and 'Consumption' columns; .copy() avoids SettingWithCopyWarning
df = df[['DateTime', 'Consumption']].copy()
df.columns = ['timestamp', 'consumption']

# Convert 'timestamp' to datetime and use it as the index
df['timestamp'] = pd.to_datetime(df['timestamp'])
df = df.set_index('timestamp')

# Resample the data to 1-hour intervals ('h' replaces the deprecated 'H' alias in recent pandas)
df = df.resample('h').mean()


In [8]:
df.head()

timestamp,consumption
2019-01-01 00:00:00,6352.0
2019-01-01 01:00:00,6116.0
2019-01-01 02:00:00,5873.0
2019-01-01 03:00:00,5682.0
2019-01-01 04:00:00,5557.0


In [26]:
len(df)

46008

In [20]:

# Split the data into training and testing sets.
# Label-based slicing on a DatetimeIndex includes both endpoints.
train_data = df[:'2023-05-30']  # 2019-01-01 through 2023-05-30
test_data = df['2023-06-01':]   # 2023-06-01 onwards (2023-05-31 falls in neither set)



In [21]:
print(len(train_data))
print(len(test_data))

38664
7320
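
The two lengths add up to 45,984, which is 24 fewer than the 46,008 rows in df: the 24 hourly rows of 2023-05-31 lie between the two slice boundaries. The sketch below reproduces that effect on a hypothetical five-day hourly frame (toy, idx and the part names are illustrative, not part of the notebook).

```python
import pandas as pd

# Hypothetical hourly index covering 2023-05-29 through 2023-06-02 (5 days)
idx = pd.date_range("2023-05-29 00:00", "2023-06-02 23:00",
                    freq=pd.Timedelta(hours=1))
toy = pd.DataFrame({"consumption": range(len(idx))}, index=idx)

# Same boundaries as in the split above: both endpoints are inclusive
train_part = toy.loc[:"2023-05-30"]
test_part = toy.loc["2023-06-01":]

# Rows that belong to neither part (all of 2023-05-31)
skipped = len(toy) - len(train_part) - len(test_part)
print(len(toy), len(train_part), len(test_part), skipped)  # 120 48 48 24

# One way to make the split contiguous: start the test set on 2023-05-31
contiguous_test = toy.loc["2023-05-31":]
print(len(train_part) + len(contiguous_test) == len(toy))  # True
```

Whether to keep the one-day gap is a design choice; the point is only that the two date strings do not meet.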


In [28]:
train_data

timestamp,consumption
2019-01-01 00:00:00,6352.0
2019-01-01 01:00:00,6116.0
2019-01-01 02:00:00,5873.0
2019-01-01 03:00:00,5682.0
2019-01-01 04:00:00,5557.0
...,...
2023-05-30 19:00:00,6309.0
2023-05-30 20:00:00,6411.0
2023-05-30 21:00:00,6647.0
2023-05-30 22:00:00,6283.0


In [22]:
test_data

timestamp,consumption
2023-06-01 00:00:00,5136.0
2023-06-01 01:00:00,4943.0
2023-06-01 02:00:00,4778.0
2023-06-01 03:00:00,4726.0
2023-06-01 04:00:00,4701.0
...,...
2024-03-31 19:00:00,5618.0
2024-03-31 20:00:00,6107.0
2024-03-31 21:00:00,5991.0
2024-03-31 22:00:00,5527.0


In [24]:
train_data.shape

(38664, 1)

# Sequential Dataset

Good job building the create_sequences() function! It's time to use it to create a training dataset for your model.

Just like tabular and image data, sequential data is most easily passed to a model through a torch Dataset and DataLoader. To build a sequential Dataset, you will call create_sequences() to get the NumPy arrays with inputs and targets, and inspect their shapes. Next, you will pass them to a TensorDataset to create a proper torch Dataset, and inspect its length.

Your implementation of create_sequences() and a DataFrame with the training data called train_data are available.

* Call create_sequences(), passing it the training DataFrame and a sequence length of 24 * 24 (576 hourly readings, i.e. 24 days), assigning the result to X_train, y_train.
* Define dataset_train by calling TensorDataset and passing it two arguments, the inputs and the targets created by create_sequences(), both converted from NumPy arrays to tensors of floats.

In [29]:
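def create_sequences(df, seq_length):
    """Return (inputs, targets) NumPy arrays built from column 0 of df."""
    xs, ys = [], []
    # Iterate over every start index that leaves room for a target
    for i in range(len(df) - seq_length):
        # Define inputs: seq_length consecutive values starting at row i
        x = df.iloc[i:(i + seq_length), 0]
        # Define target: the value right after the input window
        y = df.iloc[i + seq_length, 0]
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)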

You can now use create_sequences() to create a set of training or testing examples for the model, where each example consists of an input of seq_length consecutive data points, and the single target, the following data point. 
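
The Python loop can be slow on tens of thousands of rows. As an alternative sketch (not the notebook's method), the same windows can be built with NumPy's sliding_window_view, assuming NumPy 1.20 or newer. The name create_sequences_vectorized and the toy frame are hypothetical.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def create_sequences_vectorized(df, seq_length):
    """Vectorized sketch: same outputs as the loop, built from column 0."""
    values = df.iloc[:, 0].to_numpy()
    # All windows of length seq_length; drop the last one, which has no target
    windows = sliding_window_view(values, seq_length)[:-1]
    # The target of window i is the value at position i + seq_length
    targets = values[seq_length:]
    # Copy so the results are ordinary writable arrays, not views
    return windows.copy(), targets.copy()


# Hypothetical toy data: ten readings, windows of four
toy_df = pd.DataFrame({"consumption": np.arange(10, dtype=float)})
X_toy, y_toy = create_sequences_vectorized(toy_df, seq_length=4)
print(X_toy.shape, y_toy.shape)  # (6, 4) (6,)
```

Note that sliding_window_view raises a ValueError when df has fewer than seq_length rows, whereas the loop version would simply return empty arrays.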

In [30]:
# Define sequence length: 24 * 24 = 576 hourly readings (24 days)
seq_length = 24 * 24

# Use create_sequences to create inputs and targets for training data
X_train, y_train = create_sequences(train_data, seq_length)
print(X_train.shape, y_train.shape)

(38088, 576) (38088,)
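
The notebook does not show the TensorDataset step, so here is a minimal sketch of what it might look like. To stay runnable on its own it uses random stand-in arrays (X_demo, y_demo) with the same column count as X_train; the batch size of 32 is also an assumption.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-ins for X_train and y_train (random values, illustrative only)
X_demo = np.random.rand(100, 576)
y_demo = np.random.rand(100)

# Convert the NumPy arrays to float32 tensors and wrap them in a Dataset
dataset_train = TensorDataset(
    torch.from_numpy(X_demo).float(),
    torch.from_numpy(y_demo).float(),
)
print(len(dataset_train))  # 100

# A DataLoader then serves shuffled mini-batches of (sequence, target) pairs
loader = DataLoader(dataset_train, batch_size=32, shuffle=True)
seqs, labels = next(iter(loader))
print(seqs.shape, labels.shape)  # torch.Size([32, 576]) torch.Size([32])
```

With the real arrays, replacing X_demo and y_demo by X_train and y_train should give a dataset of length 38088.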


As the printed output shows, there are 38,088 training examples, each consisting of 576 inputs and 1 target value. A TensorDataset built from these arrays behaves the same way as the other torch Datasets you have used before, such as the custom WaterDataset or the ImageFolder dataset; you can pass it to a DataLoader in the same way. With the sequential data ready, let's take a look at model architectures suitable for processing sequential data!

# Building a forecasting RNN

It's time to build your first recurrent network! It will be a sequence-to-vector model consisting of an RNN module with two stacked recurrent layers and a hidden_size of 32. After the RNN, a simple linear layer will map the output at the last time step to the single value to be predicted.

In [33]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Define RNN layer
        self.rnn = nn.RNN(
            input_size=1,
            hidden_size=32,
            num_layers=2,
            batch_first=True,
        )
        self.fc = nn.Linear(32, 1)

    def forward(self, x):
        # Initialize first hidden state with zeros: (num_layers, batch_size, hidden_size)
        h0 = torch.zeros(2, x.size(0), 32, device=x.device)
        # Pass x and h0 through recurrent layer
        out, _ = self.rnn(x, h0)
        # Pass recurrent layer's last output through linear layer
        out = self.fc(out[:, -1, :])
        return out
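
Because the RNN is created with input_size=1 and batch_first=True, it expects inputs of shape (batch_size, seq_length, 1), while create_sequences() yields rows of shape (seq_length,). The sketch below repeats the Net class so that it runs on its own and pushes a random stand-in batch through it; the batch size of 4 and the random data are assumptions for illustration.

```python
import torch
import torch.nn as nn


class Net(nn.Module):
    """Copy of the model above, repeated so this sketch is self-contained."""

    def __init__(self):
        super().__init__()
        self.rnn = nn.RNN(input_size=1, hidden_size=32,
                          num_layers=2, batch_first=True)
        self.fc = nn.Linear(32, 1)

    def forward(self, x):
        # Hidden state shape: (num_layers, batch_size, hidden_size)
        h0 = torch.zeros(2, x.size(0), 32, device=x.device)
        out, _ = self.rnn(x, h0)
        # Keep only the last time step: sequence-to-vector
        return self.fc(out[:, -1, :])


net = Net()
# Stand-in batch of 4 sequences with 576 values each
batch = torch.rand(4, 576)
# Add the trailing feature dimension expected by the RNN: (4, 576, 1)
batch = batch.unsqueeze(-1)

with torch.no_grad():
    preds = net(batch)
print(preds.shape)  # torch.Size([4, 1])
```

The model returns one value per sequence with shape (batch_size, 1), so targets of shape (batch_size,) would need a matching reshape (for example preds.squeeze(-1)) before computing a loss.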