# Preparing Time Series Data for CNNs and LSTMs

This notebook walks through preparing time series data for CNNs and LSTMs.

The first part prepares a generated (synthetic) test time series.

The second part prepares a real time series: the historical stock price data of Globe Telecommunications (a publicly listed company in the Philippines).

## Part 1

In [1]:
# import libraries
from numpy import array, ndarray
from pandas import read_csv

In [2]:
# split a univariate sequence into samples
def split_sequence(sequence: ndarray, n_steps: int) -> tuple[ndarray, ndarray]:
    X, y = list(), list()
    # stop once the target index would run past the end of the sequence
    for i in range(len(sequence) - n_steps):
        # index just past the end of this input window (also the target index)
        end_ix = i + n_steps
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

In [3]:
#define univariate time series
series = array([1,2,3,4,5,6,7,8,9,10])
print(series.shape)
#transform to a supervised learning problem
X, y = split_sequence(series, 3)
print(X.shape, y.shape)
for i in range(len(X)):
    print(X[i], y[i])

(10,)
(7, 3) (7,)
[1 2 3] 4
[2 3 4] 5
[3 4 5] 6
[4 5 6] 7
[5 6 7] 8
[6 7 8] 9
[7 8 9] 10


In [4]:
# transform input from [samples, timesteps] to [samples, timesteps, features]
X = X.reshape((X.shape[0], X.shape[1], 1))
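
As a quick sanity check on the reshape above, the sketch below rebuilds the same seven windows with NumPy only and confirms that adding a trailing axis gives the [samples, timesteps, features] layout. The names `X_2d`, `X_3d`, and `X_alt` are illustrative and not part of the notebook.

```python
import numpy as np

# Rebuild the 7 windows of length 3 from the series 1..10 (same values as above).
series = np.arange(1, 11)
X_2d = np.array([series[i:i + 3] for i in range(len(series) - 3)])

# Append a feature axis of size 1: [samples, timesteps] -> [samples, timesteps, features].
X_3d = X_2d.reshape((X_2d.shape[0], X_2d.shape[1], 1))

# Indexing with np.newaxis is an equivalent way to add the trailing axis.
X_alt = X_2d[..., np.newaxis]

print(X_2d.shape)                   # (7, 3)
print(X_3d.shape)                   # (7, 3, 1)
print(np.array_equal(X_3d, X_alt))  # True
```

Deriving the sample and timestep counts from `X.shape` rather than hard-coding them keeps the reshape valid when the series length or window size changes.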

In [5]:
# create test data with an index column and a value column
data = list()
n = 5000
for i in range(n):
    data.append([i+1, (i+1)*10])
data = array(data)


# simulate dropping the index column from the data
data = data[:,1]
print(data[:5])

[10 20 30 40 50]
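
The generated series can be windowed the same way as the toy example. As an alternative to the Python loop in `split_sequence`, the sketch below uses `numpy.lib.stride_tricks.sliding_window_view` (available in NumPy 1.20 and later). The helper name `split_sequence_fast` and the choice of `n_steps=3` are assumptions made for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def split_sequence_fast(sequence: np.ndarray, n_steps: int):
    """Hypothetical vectorized variant of split_sequence (not from the notebook)."""
    # Each window holds n_steps inputs followed by the one value to predict.
    windows = sliding_window_view(sequence, n_steps + 1)
    # Copy so the results are independent, writable arrays rather than views.
    return windows[:, :-1].copy(), windows[:, -1].copy()


# Same generated values as above: 10, 20, ..., 50000.
data = np.arange(1, 5001) * 10

X, y = split_sequence_fast(data, n_steps=3)
print(X.shape, y.shape)  # (4997, 3) (4997,)
print(X[0], y[0])        # [10 20 30] 40

# Add the feature axis expected by CNN and LSTM layers.
X = X.reshape((X.shape[0], X.shape[1], 1))
print(X.shape)           # (4997, 3, 1)
```

For a series of length `n`, this yields `n - n_steps` samples, which matches the 7 samples obtained from the 10-element series earlier.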


## Part 2

In [6]:
# load time series dataset
series = read_csv(filepath_or_buffer='./Data/csv/GLO.csv', header=0, index_col=0)
# preview the data (pandas truncates long frames to the first and last rows)
print(series)
print(series.columns)

                Close       Open       High        Low   Volume
Date                                                           
1986-01-02   172.6575   172.6575   172.6575   172.6575    600.0
1986-01-03   172.6575   172.6575   172.6575   172.6575   1900.0
1986-01-06   175.4879   175.4879   175.4879   175.4879   2800.0
1986-01-07   181.1488   175.4879   181.1488   175.4879   3200.0
1986-01-09   181.1488   181.1488   181.1488   181.1488   2600.0
...               ...        ...        ...        ...      ...
2023-09-12  1781.0000  1786.0000  1809.0000  1781.0000  12985.0
2023-09-13  1787.0000  1799.0000  1799.0000  1781.0000  16850.0
2023-09-14  1795.0000  1795.0000  1795.0000  1782.0000  12070.0
2023-09-15  1752.0000  1795.0000  1810.0000  1752.0000  29105.0
2023-09-18  1752.0000  1789.0000  1789.0000  1752.0000  27225.0

[8861 rows x 5 columns]
Index(['Close', 'Open', 'High', 'Low', 'Volume'], dtype='object')


In [7]:
# preview only the closing price; drop() returns a new DataFrame and leaves series unchanged
series.drop(columns=['Open','High', 'Low', 'Volume'])

Date,Close
1986-01-02,172.6575
1986-01-03,172.6575
1986-01-06,175.4879
1986-01-07,181.1488
1986-01-09,181.1488
...,...
2023-09-12,1781.0000
2023-09-13,1787.0000
2023-09-14,1795.0000
2023-09-15,1752.0000


In [8]:
# convert series into numpy array and only retain the closing prices
series = series['Close'].to_numpy()
print(type(series))
print(series.shape)


<class 'numpy.ndarray'>
(8861,)


In [9]:
# split the time series data
X, y = split_sequence(series, 10)
print(X[:5])

[[172.6575 172.6575 175.4879 181.1488 181.1488 181.1488 181.1488 181.1488
  178.3184 178.3184]
 [172.6575 175.4879 181.1488 181.1488 181.1488 181.1488 181.1488 178.3184
  178.3184 178.3184]
 [175.4879 181.1488 181.1488 181.1488 181.1488 181.1488 178.3184 178.3184
  178.3184 181.1488]
 [181.1488 181.1488 181.1488 181.1488 181.1488 178.3184 178.3184 178.3184
  181.1488 181.1488]
 [181.1488 181.1488 181.1488 181.1488 178.3184 178.3184 178.3184 181.1488
  181.1488 183.9793]]
