
### Transform a Univariate Time Series into a Supervised Learning Problem

In [4]:
import numpy as np

# Split a univariate sequence into samples
def split_sequence(sequence, n_steps):
    X, y = [], []
    for i in range(len(sequence)):
        # Find the end of this pattern
        end_ix = i + n_steps
        # Check if we are beyond the sequence
        if end_ix > len(sequence)-1:
            break
        # Gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return np.array(X), np.array(y)

# Define univariate time series
series = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(series.shape)

# Transform to a supervised learning problem
X, y = split_sequence(series, 3)
print(X.shape, y.shape)

# Show each sample
for i in range(len(X)):
    print(X[i], y[i])

(10,)
(7, 3) (7,)
[1 2 3] 4
[2 3 4] 5
[3 4 5] 6
[4 5 6] 7
[5 6 7] 8
[6 7 8] 9
[7 8 9] 10
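
The loop above can also be expressed without explicit iteration. The following is a minimal sketch, assuming NumPy 1.20 or later (where `sliding_window_view` is available); the function name `split_sequence_vectorized` is hypothetical and not part of the original notebook.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Hypothetical vectorized variant of split_sequence (an illustration, not the original method)
def split_sequence_vectorized(sequence, n_steps):
    sequence = np.asarray(sequence)
    # Each window of n_steps + 1 values holds n_steps inputs followed by one target
    windows = sliding_window_view(sequence, n_steps + 1)
    # Copy so the results are independent arrays rather than views of the series
    X = windows[:, :-1].copy()
    y = windows[:, -1].copy()
    return X, y

series = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
X_vec, y_vec = split_sequence_vectorized(series, 3)
print(X_vec.shape, y_vec.shape)
```

Unlike the loop version, this sketch raises a `ValueError` when the series is shorter than `n_steps + 1`, because no complete window exists.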


### 3D Data Preparation Basics

In [8]:
import numpy as np

# Split a univariate sequence into samples
def split_sequence(sequence, n_steps):
    # Initialize empty lists for input and output
    X, y = [], []

    for i in range(len(sequence)):
        # Find the end of this pattern
        end_ix = i + n_steps

        # Check if we are beyond the sequence
        if end_ix > len(sequence)-1:
            break

        # Gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]

        # Append to input and output lists
        X.append(seq_x)
        y.append(seq_y)

    # Return input and output arrays
    return np.array(X), np.array(y)

# Define univariate time series
series = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print("Original Series Shape:", series.shape)

# Transform to a supervised learning problem
X, y = split_sequence(series, 3)
print("Transformed X Shape:", X.shape)
print("Transformed y Shape:", y.shape)

# Transform input from [samples, features] to [samples, timesteps, features]
X = X.reshape((X.shape[0], X.shape[1], 1))
print("Reshaped X Shape:", X.shape)


Original Series Shape: (10,)
Transformed X Shape: (7, 3)
Transformed y Shape: (7,)
Reshaped X Shape: (7, 3, 1)
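
The trailing feature axis can be added in more than one way. The sketch below uses a stand-in 2D array (`X2d`, a hypothetical name) with the same shape as the transformed X above and compares three equivalent approaches; which one to use is largely a matter of taste.

```python
import numpy as np

# Stand-in for the (7, 3) array of samples produced above
X2d = np.arange(21).reshape(7, 3)

# Three equivalent ways to go from [samples, timesteps] to [samples, timesteps, 1]
X3d_reshape = X2d.reshape((X2d.shape[0], X2d.shape[1], 1))
X3d_newaxis = X2d[..., np.newaxis]
X3d_expand = np.expand_dims(X2d, axis=-1)

print(X3d_reshape.shape, X3d_newaxis.shape, X3d_expand.shape)
print(np.array_equal(X3d_reshape, X3d_newaxis), np.array_equal(X3d_reshape, X3d_expand))
```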


### Data Preparation Example
Consider the following situation:

I have two columns in my data file with 5,000 rows. Column 1 is time (at 1-hour
intervals) and column 2 is the number of sales, and I am trying to forecast the number
of sales for future time steps. Help me set the number of samples, time steps, and
features in this data for an LSTM.

There are a few problems here:
- Data Shape. LSTMs expect 3D input, and it can be challenging to get your head around
this the first time.
- Sequence Length. LSTMs tend to struggle with sequences of more than 200-400 time steps, so the
data will need to be split into subsamples.

We will work through this example in the following 4 steps:
1. Load the Data
2. Drop the Time Column
3. Split Into Samples
4. Reshape Subsequences
Step 1: Load the Data

In [11]:
# example of defining a dataset
from numpy import array
# define the dataset
data = list()
n = 5000
for i in range(n):
    data.append([i+1, (i+1)*10])
data = array(data)
print(data.shape)

(5000, 2)


Running this example prints the shape of the loaded data. We
can see we have 5,000 rows and 2 columns: a standard univariate time series dataset.
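
The example above generates the data in memory. If the data lived in a file instead, loading it might look like the sketch below. The CSV layout (a `time,sales` header row followed by comma-separated values) is an assumption made for illustration, and an in-memory string stands in for the file so the snippet runs on its own.

```python
import io
import numpy as np

# Hypothetical file contents; a real file path could be passed to loadtxt instead
csv_text = "time,sales\n1,10\n2,20\n3,30\n4,40\n5,50\n"

# skiprows=1 skips the assumed header row
loaded = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)
print(loaded[:5])
print(loaded.shape)
```

Note that `loadtxt` returns floating-point values by default, whereas the generated dataset above holds integers.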

Step 2: Drop the Time Column

In [14]:
# example of dropping the time dimension from the dataset
from numpy import array
# define the dataset
data = list()
n = 5000
for i in range(n):
    data.append([i+1, (i+1)*10])
data = array(data)
# drop time
data = data[:, 1]
print(data[:5])
print(data.shape)

[10 20 30 40 50]
(5000,)


Step 3: Split Into Samples

In [15]:
from numpy import array

# Define the dataset
data = list()
n = 5000

for i in range(n):
    data.append([i+1, (i+1)*10])

data = array(data)

# Drop time
data = data[:, 1]

# Split into samples (e.g. 5000/200 = 25)
samples = list()
length = 200

# Step over the 5,000 in jumps of 200
for i in range(0, n, length):
    # Grab from i to i + 200
    sample = data[i:i+length]
    samples.append(sample)

print(len(samples))

25
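
The split above works cleanly because 5,000 is an exact multiple of 200. When it is not, the final sample would be shorter than the rest and the samples could no longer be stacked into a regular 2D array. One possible way to handle this, sketched here with a hypothetical helper `split_into_samples` and a hypothetical series of 5,050 observations, is to drop the incomplete trailing chunk.

```python
import numpy as np

# Hypothetical helper: keep only complete, non-overlapping chunks of the given length
def split_into_samples(values, length):
    values = np.asarray(values)
    # Number of complete chunks that fit in the series
    n_full = len(values) // length
    # Discard the leftover observations at the end of the series
    trimmed = values[:n_full * length]
    return [trimmed[i:i + length] for i in range(0, len(trimmed), length)]

# Hypothetical series whose length is not a multiple of 200
values = np.arange(1, 5051) * 10
chunks = split_into_samples(values, 200)
print(len(chunks), len(chunks[-1]))
```

This sketch discards the most recent observations. For forecasting it may be preferable to discard the oldest observations instead; that is a design choice and not something the original example addresses.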


Step 4: Reshape Subsequences







The LSTM needs data in the format [samples, timesteps, features]. We have 25 samples, 200 time steps per sample, and 1 feature. First, we need to convert our list of arrays into a 2D NumPy array with the shape [25, 200].

In [17]:
from numpy import array
# define the dataset
data = list()
n = 5000
for i in range(n):
    data.append([i+1, (i+1)*10])
data = array(data)
# drop time
data = data[:, 1]
# split into samples (e.g. 5000/200 = 25)
samples = list()
length = 200
# step over the 5,000 in jumps of 200
for i in range(0, n, length):
    # grab from i to i + 200
    sample = data[i:i+length]
    samples.append(sample)
# convert list of arrays into 2d array
data = array(samples)
print(data.shape)

(25, 200)


In [18]:
data


array([[   10,    20,    30, ...,  1980,  1990,  2000],
       [ 2010,  2020,  2030, ...,  3980,  3990,  4000],
       [ 4010,  4020,  4030, ...,  5980,  5990,  6000],
       ...,
       [44010, 44020, 44030, ..., 45980, 45990, 46000],
       [46010, 46020, 46030, ..., 47980, 47990, 48000],
       [48010, 48020, 48030, ..., 49980, 49990, 50000]])

Next, we can use the reshape() function to add one additional dimension for our single
feature and use the existing columns as time steps instead.

In [19]:
# example of creating a 3d array of subsequences
from numpy import array
# define the dataset
data = list()
n = 5000
for i in range(n):
    data.append([i+1, (i+1)*10])
data = array(data)
# drop time
data = data[:, 1]
# split into samples (e.g. 5000/200 = 25)
samples = list()
length = 200
# step over the 5,000 in jumps of 200
for i in range(0, n, length):
    # grab from i to i + 200
    sample = data[i:i+length]
    samples.append(sample)
# convert list of arrays into 2d array
data = array(samples)
# reshape into [samples, timesteps, features]
data = data.reshape((len(samples), length, 1))
print(data.shape)

(25, 200, 1)


In [20]:
data

array([[[   10],
        [   20],
        [   30],
        ...,
        [ 1980],
        [ 1990],
        [ 2000]],

       [[ 2010],
        [ 2020],
        [ 2030],
        ...,
        [ 3980],
        [ 3990],
        [ 4000]],

       [[ 4010],
        [ 4020],
        [ 4030],
        ...,
        [ 5980],
        [ 5990],
        [ 6000]],

       ...,

       [[44010],
        [44020],
        [44030],
        ...,
        [45980],
        [45990],
        [46000]],

       [[46010],
        [46020],
        [46030],
        ...,
        [47980],
        [47990],
        [48000]],

       [[48010],
        [48020],
        [48030],
        ...,
        [49980],
        [49990],
        [50000]]])

And that is it. The data can now be used as an input (X) to an LSTM model, or even a
CNN model.
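
Because the samples here are contiguous and do not overlap, the split and reshape steps can also be collapsed into a single `reshape` call. The following is a minimal sketch under that assumption; the function name `to_lstm_input` is hypothetical, and it expects a 1D series whose length is an exact multiple of the sample length.

```python
import numpy as np

# Hypothetical helper: 1D series -> [samples, timesteps, features] in one step
def to_lstm_input(values, length):
    values = np.asarray(values)
    # reshape cannot produce a regular array if the chunks are uneven, so fail early
    if len(values) % length != 0:
        raise ValueError("series length must be a multiple of the sample length")
    # -1 lets NumPy infer the number of samples
    return values.reshape((-1, length, 1))

# Same sales values as in the example above: 10, 20, ..., 50000
sales = np.arange(1, 5001) * 10
X_lstm = to_lstm_input(sales, 200)
print(X_lstm.shape)
```

The result has the same shape and ordering as the array built step by step above, since `reshape` fills rows in order from the original series.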