## <font color='darkblue'>Preface</font>
([course link](https://machinelearningmastery.com/using-cnn-for-financial-time-series-prediction/)) <b><font size='3ptx'>Convolutional neural networks have their roots in image processing. It was first published in LeNet to recognize the MNIST handwritten digits. However, convolutional neural networks are not limited to handling images.</font></b>

<b>In this tutorial, we are going to look at an example of using CNN for time series prediction with an application from financial markets.</b> By way of this example, we are going to explore some techniques in using Keras for model training as well.

After completing this tutorial, you will know
* What a typical multidimensional financial data series looks like?
* How can CNN applied to time series in a classification problem
* How to use generators to feed data to train a Keras model
* How to provide a custom metric for evaluating a Keras model

<a id='sect_0'></a>
### <font color='darkgreen'>Tutorial overview</font>
This tutorial is divided into 7 parts; they are:
* [**Background of the idea**](#sect_1)
* [**Preprocessing of data**](#sect_2)
* [**Data generator**](#sect_3)
* [**The model**](#sect_4)
* Training, validation, and test
* Extensions
* Does it work?

In [25]:
import enum
import os
import pandas as pd
import numpy as np
import random
from sklearn.preprocessing import StandardScaler

class DataType(enum.Enum):
  TRAIN = 0
  VALID = 1

  
DATADIR = '../../datas/CNNpred_data'
TRAIN_TEST_CUTOFF = '2016-04-21'
TRAIN_VALID_RATIO = 0.75

<a id='sect_1'></a>
## <font color='darkblue'>Background of the idea</font>
In this tutorial we are following the paper titled “CNNpred: CNN-based stock market prediction using a iverse set of variables” by Ehsan Hoseinzade and Saman Haratizadeh. The data file and sample code from the author are available in [github](https://github.com/hoseinzadeehsan/CNNpred-Keras).

<b>The goal of the paper is simple: To predict the next day’s direction of the stock market</b> (<font color='brown'>i.e., up or down compared to today</font>), hence it is a binary classification problem. However, it is interesting to see how this problem are formulated and solved.

We have seen the examples on using CNN for sequence prediction. If we consider [Dow Jones Industrial Average](https://en.wikipedia.org/wiki/Dow_Jones_Industrial_Average) (<font color='brown'>DJIA</font>) as an example, we may build a CNN with 1D convolution for prediction. This makes sense because a 1D convolution on a time series is roughly computing its moving average or using digital signal processing terms, applying a filter to the time series. It should provide some clues about the trend.

However, <b>when we look at financial time series, it is quite a common sense that some derived signals are useful for predictions too. For example, price and volume together can provide a better clue. Also some other technical indicators such as the moving average of different window size are useful too</b>. If we put all these align together, we will have a table of data, which each time instance has multiple features, and the goal is still to predict the direction of one time series.

In the CNNpred paper, 82 such features are prepared for the DJIA time series:
![features](images/1.PNG)

<b>Unlike LSTM, which there is an explicit concept of time steps applied, we present data as a matrix in CNN models</b>. As shown in the table below, the features across multiple time steps are presented as a 2D array.
![features](images/2.PNG)

<a id='sect_2'></a>
## <font color='darkblue'>Preprocessing of data</font>
<b><font size='3ptx'>In the following, we try to implement the idea of the CNNpred from scratch using Tensorflow’s keras API. While there is a reference implementation from the author in the github link above, we reimplement it differently to illustrate some Keras techniques.</font></b>

Firstly the data are five CSV files, each for a different market index, under the Dataset directory from github repository above, or we can also get a copy here ([CNNpred-data.zip](https://machinelearningmastery.com/?attachment_id=13057)). The input data has a date column and a name column to identify the ticker symbol for the market index. We can leave the date column as time index and remove the name column. The rest are all numerical.

For five data file in the directory, we read each of them as a separate pandas DataFrame and keep them in a Python dictionary:

In [2]:
data = {}

for filename in os.listdir(DATADIR):
    if not filename.lower().endswith(".csv"):
        continue # read only the CSV files
    filepath = os.path.join(DATADIR, filename)
    X = pd.read_csv(filepath, index_col="Date", parse_dates=True)
    # basic preprocessing: get the name, the classification
    # Save the target variable as a column in dataframe for easier dropna()
    name = X["Name"][0]
    del X["Name"]
    cols = X.columns
    X["Target"] = (X["Close"].pct_change().shift(-1) > 0).astype(int)
    X.dropna(inplace=True)
    # Fit the standard scaler using the training dataset
    index = X.index[X.index > TRAIN_TEST_CUTOFF]
    index = index[:int(len(index) * TRAIN_VALID_RATIO)]
    scaler = StandardScaler().fit(X.loc[index, cols])
    # Save scale transformed dataframe
    X[cols] = scaler.transform(X[cols])
    data[name] = X

The result of the above code is a DataFrame for each index, which the classification label is the column “Target” while all other columns are input features. We also normalize the data with a standard scaler.

In [3]:
data.keys()

dict_keys(['RUT', 'DJI', 'S&P', 'NYA', 'NASDAQ'])

In [4]:
data['RUT']['Target'][:5]

Date
2010-10-19    1
2010-10-20    0
2010-10-21    1
2010-10-26    0
2010-10-27    0
Name: Target, dtype: int64

As we are going to predict the market direction, we first try to create the classification label. The market direction is defined as the closing index of tomorrow compared to today. If we have read the data into a pandas DataFrame, we can use `X["Close"].pct_change()` to find the percentage change, which a positive change for the market goes up. So we can shift this to one time step back as our label. For example:

In [5]:
X["Close"][10:15]

Date
2010-11-08   -8.335279
2010-11-09   -8.386844
2010-11-10   -8.339116
2010-11-11   -8.409379
2010-11-12   -8.522084
Name: Close, dtype: float64

In [6]:
X["Close"].pct_change()[10:15]

Date
2010-11-08   -0.000388
2010-11-09    0.006186
2010-11-10   -0.005691
2010-11-11    0.008426
2010-11-12    0.013402
Name: Close, dtype: float64

In [7]:
(X["Close"].pct_change().shift(-1) > 0).astype(int)[10:15]

Date
2010-11-08    1
2010-11-09    0
2010-11-10    1
2010-11-11    1
2010-11-12    1
Name: Close, dtype: int64

<b>In time series problems, it is generally reasonable not to split the data into training and test sets randomly, but to set up a cutoff point in which the data before the cutoff is training set while that afterwards is the test set</b>. The scaling above are based on the training set but applied to the entire dataset:
```python
    index = X.index[X.index > TRAIN_TEST_CUTOFF]
    index = index[:int(len(index) * TRAIN_VALID_RATIO)]
```

In [8]:
print(f'We have {X.shape[0]} points as training data')

We have 1114 points as training data


<a id='sect_3'></a>
## <font color='darkblue'>Data generator</font> ([back](#sect_0))
<b><font size='3ptx'>We are not going to use all time steps at once, but instead, we use a fixed length of `N` time steps to predict the market direction at step `N+1`. In this design, the window of N time steps can start from anywhere</font></b>

We can just create a large number of DataFrames with large amount of overlaps with one another. <b>To save memory, we are going to build a data generator for training and validation, as follows</b>:

In [9]:
def datagen(data, seq_len, batch_size, targetcol, kind: DataType):
    """As a generator to produce samples for Keras model.
    
    Args:
      data: Raw data to produce fixed length of data from.
      seq_len: The desired sequence length.
      batch_size: Parameter used in learning.
      targetcol: The target column to make prediction at.
      kind: ('train'|'valid')
      
    Returns:
      Fixed length of sequence dataset.
    """
    batch = []
    while True:
        # Pick one dataframe from the pool
        key = random.choice(list(data.keys()))
        df = data[key]
        input_cols = [c for c in df.columns if c != targetcol]
        index = df.index[df.index < TRAIN_TEST_CUTOFF]
        split = int(len(index) * TRAIN_VALID_RATIO)
        if kind == DataType.TRAIN:
            index = index[:split]   # range for the training set
        elif kind == DataType.VALID:
            index = index[split:]   # range for the validation set
            
        # Pick one position, then clip a sequence length
        while True:
            t = random.choice(index)      # pick one time step
            n = (df.index == t).argmax()  # find its position in the dataframe
            if n-seq_len+1 < 0:
                continue # can't get enough data for one sequence length
            frame = df.iloc[n-seq_len+1:n+1]
            batch.append([frame[input_cols].values, df.loc[t, targetcol]])
            break
            
        # if we get enough for a batch, dispatch
        if len(batch) == batch_size:
            X, y = zip(*batch)
            X, y = np.expand_dims(np.array(X), 3), np.array(y)
            yield X, y
            batch = []

[**Generator**](https://docs.python.org/3/c-api/gen.html) is a special function in Python that does not return a value but to yield in iterations, such that a sequence of data are produced from it. **For a generator to be used in Keras training, it is expected to yield a batch of input data and target. This generator supposed to run indefinitely. Hence the generator function above is created with an infinite loop starts with `while True`**.

In each iteration, it randomly pick one DataFrame from the Python dictionary, then within the range of time steps of the training set (<font color='brown'>i.e., the beginning portion</font>), we start from a random point and take `N` time steps using the pandas `iloc[start:end]` syntax to create a input under the variable frame. This DataFrame will be a 2D array. The target label is that of the last time step. The input data and the label are then appended to the list batch. Until we accumulated for one batch’s size, we dispatch it from the generator.

The last four lines at the code snippet above is to dispatch a batch for training or validation. We collect the list of input data (<font color='brown'>each a 2D array</font>) as well as a list of target label into variables `X` and `y`, then convert them into numpy array so it can work with our Keras model. We need to add one more dimension to the numpy array `X` using <font color='blue'>np.expand_dims()</font> because of the design of the network model, as explained below.

In [10]:
test_data = np.array([[[1, 2, 3], 0], [[4, 5, 6], 1], [[7, 8, 9], 2]], dtype=object)
test_data.shape

(3, 2)

In [11]:
test_data

array([[list([1, 2, 3]), 0],
       [list([4, 5, 6]), 1],
       [list([7, 8, 9]), 2]], dtype=object)

In [12]:
new_test_data = np.expand_dims(test_data, 2)
new_test_data

array([[[list([1, 2, 3])],
        [0]],

       [[list([4, 5, 6])],
        [1]],

       [[list([7, 8, 9])],
        [2]]], dtype=object)

In [13]:
new_test_data.shape

(3, 2, 1)

In [14]:
new_test_data = np.expand_dims(test_data, 0)
new_test_data

array([[[list([1, 2, 3]), 0],
        [list([4, 5, 6]), 1],
        [list([7, 8, 9]), 2]]], dtype=object)

In [15]:
new_test_data.shape

(1, 3, 2)

<a id='sect_4'></a>
## <font color='darkblue'>The Model</font> ([back](#sect_0))
<font size='3ptx'><b>The 2D CNN model presented in the original paper accepts an input tensor of shape 
$N*m*1$ for `N` the number of time steps and `m` the number of features in each time step. The paper assumes 
$N=60$ and $m=82$.</b></font>

The model comprises of three convolutional layers, as described as follows:

In [19]:
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPool2D, Input
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.callbacks import ModelCheckpoint
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, f1_score, mean_absolute_error

In [17]:
def cnnpred_2d(seq_len=60, n_features=82, n_filters=(8,8,8), drop_rate=0.1):
    """2D-CNNpred model according to the paper
    
    Args:
      seq_len: Length of sequence.
      n_features: Number of features.
      n_filters: CNN hyper paramemter.
      drop_rate: Keras hyper parameter.
      
    Returns:
      CNN model.
    """
    model = Sequential([
        Input(shape=(seq_len, n_features, 1)),
        Conv2D(n_filters[0], kernel_size=(1, n_features), activation="relu"),
        Conv2D(n_filters[1], kernel_size=(3,1), activation="relu"),
        MaxPool2D(pool_size=(2,1)),
        Conv2D(n_filters[2], kernel_size=(3,1), activation="relu"),
        MaxPool2D(pool_size=(2,1)),
        Flatten(),
        Dropout(drop_rate),
        Dense(1, activation="sigmoid")
    ])
    return model

and the model is presented by the following:

In [20]:
cnn_model = cnnpred_2d()
cnn_model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_3 (Conv2D)           (None, 60, 1, 8)          664       
                                                                 
 conv2d_4 (Conv2D)           (None, 58, 1, 8)          200       
                                                                 
 max_pooling2d_2 (MaxPooling  (None, 29, 1, 8)         0         
 2D)                                                             
                                                                 
 conv2d_5 (Conv2D)           (None, 27, 1, 8)          200       
                                                                 
 max_pooling2d_3 (MaxPooling  (None, 13, 1, 8)         0         
 2D)                                                             
                                                                 
 flatten_1 (Flatten)         (None, 104)              

The first convolutional layer has 8 units, and is applied across all features in each time step. It is followed by a second convolutional layer to consider three consecutive days at once, for it is a common belief that three days can make a trend in the stock market. It is then applied to a max pooling layer and another convolutional layer before it is flattened into a one-dimensional array and applied to a fully-connected layer with sigmoid activation for binary classification.

### <font color='darkgreen'>Training, validation, and test</font>
That’s it for the model. The paper used [**MAE**](https://en.wikipedia.org/wiki/Mean_absolute_error) as the loss metric and also monitor for accuracy and F1 score to determine the quality of the model. We should point out that F1 score depends on precision and recall ratios, which are both considering the positive classification. The paper, however, consider the average of the F1 from positive and negative classification. Explicitly, it is the F1-macro metric:
![F1-macro metric](images/3.PNG)

The first term in the big parenthesis above is the normal F1 metric that considered positive classifications. And the second term is the reverse, which considered the negative classifications.

While this metric is available in scikit-learn as [**sklearn.metrics**.f1_score()](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html) there is no equivalent in Keras. Hence we would create our own by borrowing code from [this stackexchange question](https://datascience.stackexchange.com/questions/45165/):

In [22]:
from tensorflow.keras import backend as K

def recall_m(y_true, y_pred):
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    recall = true_positives / (possible_positives + K.epsilon())
    return recall

def precision_m(y_true, y_pred):
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    precision = true_positives / (predicted_positives + K.epsilon())
    return precision

def f1_m(y_true, y_pred):
    precision = precision_m(y_true, y_pred)
    recall = recall_m(y_true, y_pred)
    return 2 * ((precision * recall)/(precision + recall + K.epsilon()))

def f1macro(y_true, y_pred):
    f_pos = f1_m(y_true, y_pred)
    # negative version of the data and prediction
    f_neg = f1_m(1-y_true, 1-K.clip(y_pred,0,1))
    return (f_pos + f_neg) / 2

The training process can take hours to complete. Hence we want to save the model in the middle of the training so that we may interrupt and resume it. We can make use of checkpoint features in Keras:

In [23]:
checkpoint_path = "./cp2d-{epoch}-{val_f1macro:.2f}.h5"

callbacks = [
    ModelCheckpoint(
      checkpoint_path,
      monitor='val_f1macro', mode="max", verbose=0,
      save_best_only=True, save_weights_only=False, save_freq="epoch")]

We set up a filename template `checkpoint_path` and ask Keras to fill in the epoch number as well as validation F1 score into the filename. We save it by monitoring the validation’s F1 metric, and this metric is supposed to increase when the model gets better. Hence we pass in the `mode="max"` to it.

It should now be trivial to train our model, as follows:

In [26]:
seq_len    = 60
batch_size = 128
n_epochs   = 20
n_features = 82

model = cnnpred_2d(seq_len, n_features)
model.compile(optimizer="adam", loss="mae", metrics=["acc", f1macro])
history = model.fit(datagen(data, seq_len, batch_size, "Target", "train"),
          validation_data=datagen(data, seq_len, batch_size, "Target", "valid"),
          epochs=n_epochs, steps_per_epoch=400, validation_steps=10, verbose=1,
          callbacks=callbacks)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x7f76485dfac0>

Two points to note in the above snippets. We supplied "acc" as the accuracy as well as the function `f1macro` defined above as the metrics parameter to the <font color='blue'>compile()</font> function. Hence these two metrics will be monitored during training. Because the function is named `f1macro`, we refer to this metric in the checkpoint’s monitor parameter as `val_f1macro`.

Separately, in the <font color='blue'>fit()</font> function, we provided the input data through the <font color='blue'>datagen()</font> generator as defined above. Calling this function will produce a generator, which during the training loop, batches are fetched from it one after another. Similarly, validation data are also provided by the generator.