# Deep Learning Exercise 6 - Time Series Classification

This exercise compares models suitable for time series classification on univariate and multivariate data.

The data come from the [Time Series Classification Website](https://www.timeseriesclassification.com/dataset.php); we will use sensor data from the FordA and Siemens datasets.

Other datasets are also available, and we will show you how to create your own as well.

[Open in Google Colab](https://colab.research.google.com/github/jplatos/VSB-FEI-Deep-Learning/blob/master/dl_06_time_series_classification.ipynb) [Download from GitHub](https://raw.githubusercontent.com/jplatos/VSB-FEI-Deep-Learning/main/dl_06_time_series_classification.ipynb)


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
import tensorflow as tf
import tensorflow.keras as keras
from sklearn.metrics import accuracy_score, confusion_matrix

tf.version.VERSION

Let's download the [FordA](https://www.timeseriesclassification.com/description.php?Dataset=FordA) dataset, converted for our purposes to the [Feather file format](https://arrow.apache.org/docs/python/feather.html), a binary file format for data exchange.

The data originate from the ARFF file format used by the Weka data analysis tool and have classes labeled $\{-1,1\}$, which is not suitable for TensorFlow (scikit-learn has no trouble with it), so we have to convert them to the $\{0,1\}$ set.

In [None]:
train = pd.read_feather('https://github.com/jplatos/VSB-FEI-Deep-Learning/blob/941b1912c0971bef3c2ace907de4883bac8a88a6/datasets/FordA_TRAIN.feather?raw=true')
# Assumption: the test split is stored next to the training file as FordA_TEST.feather.
test = pd.read_feather('https://github.com/jplatos/VSB-FEI-Deep-Learning/blob/941b1912c0971bef3c2ace907de4883bac8a88a6/datasets/FordA_TEST.feather?raw=true')
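# Map the label -1 to 0; assigning the result back avoids in-place edits on a column view.
train['target'] = train['target'].replace({-1: 0})
test['target'] = test['target'].replace({-1: 0})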
print('Train: ',train.shape)
print('Test: ', test.shape)
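The label remapping can be checked on a toy array. `SparseCategoricalCrossentropy` interprets integer labels as class indices starting at 0, which is why $\{-1,1\}$ has to become $\{0,1\}$. The following is a minimal NumPy sketch; the array `raw_labels` is made up for illustration and is not part of the dataset.

```python
import numpy as np

# Toy labels in the original {-1, 1} encoding (illustrative values only).
raw_labels = np.array([-1, 1, 1, -1, 1])

# Map -1 to 0 and leave 1 untouched, giving class indices {0, 1}.
class_ids = np.where(raw_labels == -1, 0, raw_labels)

print(class_ids)             # [0 1 1 0 1]
print(np.unique(class_ids))  # [0 1]
```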

Each row contains 500 time steps of one measurement and a single target value. The time series are already almost normalized, so scaling or normalizing them is not strictly necessary. It may slightly improve the results, but that depends on your experiments.
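If you want to experiment with normalization, one common option is to standardize every series separately (z-score per row). The sketch below is only one possible choice, not the preprocessing used in this exercise; the helper `standardize_rows` and the toy array are hypothetical names introduced for illustration.

```python
import numpy as np

def standardize_rows(x, eps=1e-8):
    """Scale every row (one time series) to zero mean and unit variance."""
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, keepdims=True)
    # eps guards against division by zero for constant series.
    return (x - mean) / (std + eps)

# Toy data: 4 series with 500 time steps, deliberately shifted and scaled.
rng = np.random.default_rng(0)
toy_x = rng.normal(loc=3.0, scale=2.0, size=(4, 500))

toy_scaled = standardize_rows(toy_x)
print(toy_scaled.mean(axis=1).round(6))  # close to 0 for every row
print(toy_scaled.std(axis=1).round(6))   # close to 1 for every row
```

The same function could be applied to `train_x` and `test_x` once they exist, since it uses only statistics of the individual series and therefore does not leak information between the two sets.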

In [None]:
train.head()

A parallel coordinates plot is somewhat awkward to produce in Matplotlib, but this demonstration suffices; other libraries may work better. As you can see, it is very difficult to spot differences between the time series.

In [None]:
colors = ['b', 'g']
plt.figure(figsize=(21,9))
for idx in range(100):
  row = train.iloc[idx]  # one time series plus its label in the last column
  plt.plot(row.iloc[:-1].to_numpy(), c=colors[int(row.iloc[-1])])
plt.tight_layout()
plt.show()

Convert the data into NumPy arrays and separate the *X* and *y* data from each other for the training and testing sets.

In [None]:
train_x, train_y = train.drop(columns=['target']).values, train.target.values
test_x, test_y = test.drop(columns=['target']).values, test.target.values

A helper that prints a simple accuracy metric and, optionally, the confusion matrix.

In [None]:
def compute_metrics(y_true, y_pred, show_confusion_matrix=False):
  print(f'\tAccuracy: {accuracy_score(y_true, y_pred)*100:8.2f}%')
  if (show_confusion_matrix):
    print('\tConfusion matrix:\n', confusion_matrix(y_true, y_pred))
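To make explicit what these two metrics report, here is a NumPy-only sketch that computes them by hand on made-up labels. It follows the scikit-learn convention for the confusion matrix (rows are true classes, columns are predicted classes); the arrays are illustrative only.

```python
import numpy as np

# Made-up ground truth and predictions for a binary problem.
y_true = np.array([0, 0, 1, 1, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0])

# Accuracy: fraction of positions where prediction and truth agree.
accuracy = np.mean(y_true == y_pred)

# Confusion matrix: cm[i, j] counts samples of true class i predicted as class j.
cm = np.zeros((2, 2), dtype=int)
np.add.at(cm, (y_true, y_pred), 1)

print(f'Accuracy: {accuracy * 100:.2f}%')
print(cm)
```

The diagonal of the matrix holds the correctly classified samples, so its sum divided by the total count equals the accuracy.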

Let's try some simple baseline models on the data: a decision tree and a random forest. As you will see, it is difficult for them to achieve good results. The results may differ from run to run because both models involve randomness: DecisionTreeClassifier randomly permutes the features when searching for the best split, and RandomForestClassifier additionally trains each tree on a bootstrap sample (bagging).

In [None]:
base_models = [DecisionTreeClassifier(), RandomForestClassifier()]

for model in base_models:
  model.fit(train_x, train_y)
  y_pred = model.predict(test_x)

  print(type(model).__name__)
  compute_metrics(test_y, y_pred)
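If the run-to-run variation gets in the way of comparing models, both classifiers accept a `random_state` argument that fixes the random choices. The sketch below demonstrates this on synthetic data (the arrays `toy_x` and `toy_y` and the helper `fit_predict` are invented for illustration, and the small `n_estimators` only keeps the example fast).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data: 200 samples with 20 features each.
rng = np.random.default_rng(0)
toy_x = rng.normal(size=(200, 20))
toy_y = (toy_x[:, 0] + toy_x[:, 1] > 0).astype(int)

def fit_predict(seed):
    """Train a small forest with a fixed seed and return class probabilities."""
    model = RandomForestClassifier(n_estimators=10, random_state=seed)
    model.fit(toy_x, toy_y)
    return model.predict_proba(toy_x)

# Two runs with the same seed should give identical outputs.
same_seed_equal = np.array_equal(fit_predict(42), fit_predict(42))
print(same_seed_equal)
```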

## Neural Network models
Let's try some basic neural network models for this task. The first is a classical dense network with three hidden layers and dropout regularization, which is able to beat the random forest classifier.

In [None]:
def show_history(history):
    plt.figure()
    for key in history.history.keys():
        plt.plot(history.epoch, history.history[key], label=key)
    plt.legend()
    plt.tight_layout()

In [None]:
model = keras.Sequential([
    keras.layers.Dense(256, activation='relu', input_shape=train_x[0].shape),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(256, activation='relu'),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(2, activation='softmax')
])

model.summary()
model.compile(optimizer='adam', loss = keras.losses.SparseCategoricalCrossentropy(from_logits=False), metrics = ['accuracy'])

In [None]:
history = model.fit(train_x, train_y, validation_data=(test_x, test_y), epochs=10, batch_size=32)
show_history(history)

To test convolution in a single dimension, we need to reshape the data into the proper format. The format is (coincidentally) the same as the one used by recurrent layers: $(vectors,length,planes)$, that is, samples, time steps, and channels.

In [None]:
train_xc = np.reshape(train_x, (*train_x.shape, 1))
test_xc = np.reshape(test_x, (*test_x.shape, 1))
train_xc.shape, test_xc.shape
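The reshape only appends a channel axis of size 1; no values are moved. A small sketch on a toy array (not the FordA data) shows that `np.reshape`, `np.newaxis`, and `np.expand_dims` all give the same result, so any of them can be used.

```python
import numpy as np

# Toy input: 3 series with 4 time steps each.
toy_x = np.arange(12, dtype=float).reshape(3, 4)

a = np.reshape(toy_x, (*toy_x.shape, 1))  # the approach used in this notebook
b = toy_x[..., np.newaxis]                # indexing with a new axis
c = np.expand_dims(toy_x, axis=-1)        # explicit helper

print(a.shape, b.shape, c.shape)  # (3, 4, 1) (3, 4, 1) (3, 4, 1)
```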

Let's try a single convolution layer as an input mapping; after flattening, it produces a huge number of weights in the Dense layers. The results are not excellent.

In [None]:
model = keras.Sequential([
    keras.layers.Conv1D(64, kernel_size=3, activation='relu', input_shape=train_xc[0].shape),
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(2, activation='softmax')
])

model.summary()
model.compile(optimizer='adam', loss = keras.losses.SparseCategoricalCrossentropy(from_logits=False), metrics = ['accuracy'])

In [None]:
history = model.fit(train_xc, train_y, validation_data=(test_xc, test_y), epochs=10, batch_size=32)
show_history(history)
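The size of this model can be worked out by hand, which shows where the weights come from. The arithmetic below assumes 500 time steps and 1 channel per series and the Keras defaults used in the model (`padding='valid'`, stride 1, biases enabled); compare the total with the output of `model.summary()`.

```python
# Assumed input: 500 time steps, 1 channel.
length, channels = 500, 1
filters, kernel_size = 64, 3

# Conv1D: one kernel of size kernel_size * channels per filter, plus one bias each.
conv_params = kernel_size * channels * filters + filters

# 'valid' padding shortens the sequence by kernel_size - 1.
conv_length = length - kernel_size + 1

# Flatten turns (conv_length, filters) into one long vector.
flat_size = conv_length * filters

# Dense layers: inputs * units weights plus one bias per unit.
dense_params = flat_size * 64 + 64
output_params = 64 * 2 + 2

total_params = conv_params + dense_params + output_params
print(conv_params, flat_size, dense_params, output_params, total_params)
```

Almost all of the weights sit in the first Dense layer, because every one of the flattened values is connected to each of its 64 units.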

A slightly more complicated model is able to beat all previous models while needing fewer weights than the previous convolutional model.

In [None]:
model = keras.Sequential([
    keras.layers.Conv1D(64, kernel_size=3, activation='relu', input_shape=train_xc[0].shape),
    keras.layers.Conv1D(64, kernel_size=3, activation='relu'),
    keras.layers.MaxPool1D(2),
    keras.layers.Conv1D(64, kernel_size=3, activation='relu'),
    keras.layers.Conv1D(64, kernel_size=3, activation='relu'),
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(2, activation='softmax')
])

model.summary()
model.compile(optimizer='adam', loss = keras.losses.SparseCategoricalCrossentropy(from_logits=False), metrics = ['accuracy'])

In [None]:
history = model.fit(train_xc, train_y, validation_data=(test_xc, test_y), epochs=10, batch_size=32)
show_history(history)

An even more capable model with more pooling layers and about 1/4 of the weights of the previous model is able to achieve more than 90% accuracy. It has one big drawback that reduces its ability to achieve better results.

In [None]:
model = keras.Sequential([
    keras.layers.Conv1D(64, kernel_size=3, activation='relu', input_shape=train_xc[0].shape),
    keras.layers.MaxPool1D(2),
    keras.layers.Conv1D(64, kernel_size=3, activation='relu'),
    keras.layers.MaxPool1D(2),
    keras.layers.Conv1D(64, kernel_size=3, activation='relu'),
    keras.layers.MaxPool1D(2),
    keras.layers.Conv1D(64, kernel_size=3, activation='relu'),
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(2, activation='softmax')
])

model.summary()
model.compile(optimizer='adam', loss = keras.losses.SparseCategoricalCrossentropy(from_logits=False), metrics = ['accuracy'])

In [None]:
history = model.fit(train_xc, train_y, validation_data=(test_xc, test_y), epochs=10, batch_size=32)
show_history(history)
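The effect of pooling on model size can be estimated without training anything. The helper below is a hypothetical sketch (the function `count_weights` and the layer lists are not part of the notebook) that tracks the sequence length through Conv1D and MaxPool1D layers and adds up the weights. It assumes 500 time steps, kernel size 3, `'valid'` padding, pooling stride equal to the pool size, and the same Dense(64) and Dense(2) head as the models above.

```python
def count_weights(layers, length=500, channels=1):
    """Return (total_weights, flattened_size) for a Conv1D/MaxPool1D stack
    followed by Flatten, Dense(64) and Dense(2)."""
    total = 0
    for kind, arg in layers:
        if kind == "conv":    # arg = number of filters, kernel size fixed at 3
            total += 3 * channels * arg + arg
            length = length - 3 + 1
            channels = arg
        elif kind == "pool":  # arg = pool size (stride equals pool size)
            length = length // arg
    flat = length * channels
    total += flat * 64 + 64   # Dense(64)
    total += 64 * 2 + 2       # Dense(2)
    return total, flat

# Four convolutions with a single pooling layer in the middle.
one_pool = [("conv", 64), ("conv", 64), ("pool", 2), ("conv", 64), ("conv", 64)]
# Four convolutions with a pooling layer after each of the first three.
three_pools = [("conv", 64), ("pool", 2), ("conv", 64), ("pool", 2),
               ("conv", 64), ("pool", 2), ("conv", 64)]

one_pool_total, one_pool_flat = count_weights(one_pool)
three_pools_total, three_pools_flat = count_weights(three_pools)

print(one_pool_total, one_pool_flat)
print(three_pools_total, three_pools_flat)
print(round(three_pools_total / one_pool_total, 3))
```

Under these assumptions the convolutional layers cost the same in both stacks; the saving comes from the shorter flattened vector that feeds the first Dense layer.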

## Recurrent models
Let's take a more time-series-oriented view of the data and use recurrent models, which should be able to achieve better results when used properly.

In [None]:
model = keras.Sequential([
    keras.layers.GRU(64, activation='tanh', input_shape=train_xc[0].shape, return_sequences=True),
    keras.layers.GRU(32, activation='tanh', return_sequences=True),
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(2, activation='softmax')
])

model.summary()
model.compile(optimizer='adam', loss = keras.losses.SparseCategoricalCrossentropy(from_logits=False), metrics = ['accuracy'])

In [None]:
history = model.fit(train_xc, train_y, validation_data=(test_xc, test_y), epochs=10, batch_size=32)
show_history(history)
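As with the convolutional models, it helps to see where the weights of this recurrent model sit. The sketch below is plain arithmetic, assuming 500 time steps with 1 channel and the TensorFlow 2 default `reset_after=True` for GRU layers (which gives two bias vectors per gate); the helper `gru_params` is a hypothetical name. Compare the numbers with `model.summary()`.

```python
def gru_params(input_dim, units):
    """Weights of a Keras GRU layer, assuming reset_after=True:
    3 gates, each with input weights, recurrent weights and two bias vectors."""
    return 3 * (input_dim * units + units * units + 2 * units)

time_steps = 500  # assumed series length

gru1 = gru_params(input_dim=1, units=64)
gru2 = gru_params(input_dim=64, units=32)

# return_sequences=True keeps one 32-dimensional output per time step,
# so Flatten produces time_steps * 32 values.
flat_size = time_steps * 32
dense_params = flat_size * 64 + 64
output_params = 64 * 2 + 2

total_params = gru1 + gru2 + dense_params + output_params
print(gru1, gru2, flat_size, dense_params, total_params)
```

Under these assumptions the two GRU layers are small compared with the Dense layer that follows Flatten, so most of the capacity of this model is in the dense head rather than in the recurrent part.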