<a href="https://colab.research.google.com/github/msskx/deepLearning/blob/main/timeseries_transformer_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Timeseries classification with a Transformer model

**Author:** [Theodoros Ntakouris](https://github.com/ntakouris)<br>
**Date created:** 2021/06/25<br>
**Last modified:** 2021/08/05<br>
**Description:** This notebook demonstrates how to do timeseries classification using a Transformer model.

In [24]:
from tensorflow.keras.utils import to_categorical

## Introduction

This is the Transformer architecture from
[Attention Is All You Need](https://arxiv.org/abs/1706.03762),
applied to timeseries instead of natural language.

This example requires TensorFlow 2.4 or higher.

## Load the dataset

We are going to use the same dataset and preprocessing as the
[TimeSeries Classification from Scratch](https://keras.io/examples/timeseries/timeseries_classification_from_scratch)
example.

In [2]:
import numpy as np


def readucr(filename):
    data = np.loadtxt(filename, delimiter="\t")
    y = data[:, 0]
    x = data[:, 1:]
    return x, y.astype(int)


root_url = "https://raw.githubusercontent.com/hfawaz/cd-diagram/master/FordA/"

x_train, y_train = readucr(root_url + "FordA_TRAIN.tsv")
x_test, y_test = readucr(root_url + "FordA_TEST.tsv")

x_train = x_train.reshape((x_train.shape[0], x_train.shape[1], 1))
x_test = x_test.reshape((x_test.shape[0], x_test.shape[1], 1))

n_classes = len(np.unique(y_train))

idx = np.random.permutation(len(x_train))
x_train = x_train[idx]
y_train = y_train[idx]

y_train[y_train == -1] = 0
y_test[y_test == -1] = 0

In [11]:
x_train.shape,y_train.shape

((3601, 500, 1), (3601,))

In [22]:
x_test.shape,y_test.shape

((1320, 500, 1), (1320,))

## Build the model

Our model processes a tensor of shape `(batch size, sequence length, features)`,
where `sequence length` is the number of time steps and `features` is the number
of values recorded at each time step (one per input timeseries).

You can replace your classification RNN layers with this one: the
inputs are fully compatible!
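
To make the expected layout concrete, here is a minimal NumPy sketch. The array names and sizes are hypothetical toy values, picked only to mirror the univariate and 12-channel shapes that appear in this notebook.

```python
import numpy as np

# Hypothetical toy data: 8 univariate series with 500 time steps each.
series_2d = np.random.rand(8, 500)

# Add a trailing feature axis so every time step carries exactly one feature.
series_3d = series_2d.reshape((series_2d.shape[0], series_2d.shape[1], 1))
print(series_3d.shape)  # (8, 500, 1)

# Hypothetical multivariate data: 8 recordings, 5000 time steps, 12 channels.
multichannel = np.random.rand(8, 5000, 12)
batch_size, sequence_length, features = multichannel.shape
print(batch_size, sequence_length, features)  # 8 5000 12
```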

In [33]:
import numpy as np
import pandas as pd


def data():
    # Read the signals (long format, one row per time step) and the labels.
    df = pd.read_csv('/content/drive/MyDrive/ECG/Xtrain.csv', index_col=0)
    dft = pd.read_csv('/content/drive/MyDrive/ECG/trainreference.csv')

    # Collect one (time steps, channels) array per recording.
    dataX = []
    for i in range(1600):
        temp = df.loc[df['series_id'] == i].drop(labels='series_id', axis=1)
        # Optional per-series scaling, left disabled:
        # temp = MinMaxScaler(feature_range=(0, 1)).fit_transform(temp)
        dataX.append(np.array(temp))

    dataX = np.array(dataX)
    # Alternative labels, left disabled: np.array(dft[['good', 'bad']])
    dataY = np.array(dft['tag'])

    # First 1280 recordings for training, remaining 320 for testing.
    trainX, trainY = dataX[:1280], dataY[:1280]
    testX, testY = dataX[1280:], dataY[1280:]
    print(trainX.shape, trainY.shape, testX.shape, testY.shape)
    return trainX, trainY, testX, testY

In [34]:
x_train,y_train,x_test,y_test=data()

(1280, 5000, 12) (1280,) (320, 5000, 12) (320,)


In [19]:
from tensorflow import keras
from tensorflow.keras import layers

We include residual connections, layer normalization, and dropout.
The resulting layer can be stacked multiple times.
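
As a rough illustration of why such a block can be stacked, the NumPy sketch below normalizes, applies a shape-preserving sublayer, and adds the inputs back. The helper names `layer_norm` and `residual_block` are hypothetical, the learned scale and shift of a real normalization layer are omitted, and simple functions stand in for attention and the feed-forward part.

```python
import numpy as np


def layer_norm(x, epsilon=1e-6):
    # Normalize each time step across its feature axis (no learned scale or shift).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + epsilon)


def residual_block(inputs, sublayer):
    # Normalize, transform, then add the unmodified inputs back.
    return sublayer(layer_norm(inputs)) + inputs


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 5, 3))  # hypothetical (batch, steps, features)

# Any shape-preserving function can stand in for attention or the feed-forward part.
out = residual_block(x, lambda h: 0.5 * h)
out = residual_block(out, np.tanh)  # blocks stack because the shape is preserved
print(out.shape)  # (2, 5, 3)
```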

The projection layers are implemented through `keras.layers.Conv1D`.
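
A convolution with `kernel_size=1` mixes the features of each time step independently, so it behaves like one matrix multiplication applied to every step. The NumPy sketch below checks this under assumed toy sizes. The variable names are illustrative, and random values stand in for trained layer weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
batch, steps, features, ff_dim = 2, 5, 3, 4

x = rng.normal(size=(batch, steps, features))
kernel = rng.normal(size=(features, ff_dim))  # stand-in for kernel_size=1 weights
bias = rng.normal(size=(ff_dim,))

# Project all time steps at once with a single matrix multiplication.
projected = x @ kernel + bias

# The same projection computed one time step at a time.
stepwise = np.stack([x[:, t, :] @ kernel + bias for t in range(steps)], axis=1)

assert np.allclose(projected, stepwise)
print(projected.shape)  # (2, 5, 4)
```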

In [35]:

def transformer_encoder(inputs, head_size, num_heads, ff_dim, dropout=0):
    # Normalization and Attention
    x = layers.LayerNormalization(epsilon=1e-6)(inputs)
    x = layers.MultiHeadAttention(
        key_dim=head_size, num_heads=num_heads, dropout=dropout
    )(x, x)
    x = layers.Dropout(dropout)(x)
    res = x + inputs

    # Feed Forward Part
    x = layers.LayerNormalization(epsilon=1e-6)(res)
    x = layers.Conv1D(filters=ff_dim, kernel_size=1, activation="relu")(x)
    x = layers.Dropout(dropout)(x)
    x = layers.Conv1D(filters=inputs.shape[-1], kernel_size=1)(x)
    return x + res


The main part of our model is now complete. We can stack several of these
`transformer_encoder` blocks and then add the final Multi-Layer Perceptron
classification head. Apart from a stack of `Dense` layers, we need to reduce
the output tensor of the `transformer_encoder` part of our model down to a
vector of features for each data point in the current batch. A common way to
achieve this is to use a pooling layer. For this example, a
`GlobalAveragePooling1D` layer is sufficient.
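
The effect of global average pooling depends on which axis is averaged. The NumPy sketch below uses a small hypothetical tensor to compare the two choices that the `data_format` argument of `GlobalAveragePooling1D` selects between.

```python
import numpy as np

# Hypothetical encoder output: 2 samples, 4 time steps, 3 features.
encoded = np.arange(24, dtype=float).reshape(2, 4, 3)

# Averaging over the time axis gives one vector of feature values per sample.
# This corresponds to data_format="channels_last".
pooled_over_time = encoded.mean(axis=1)
print(pooled_over_time.shape)  # (2, 3)

# Averaging over the last axis gives one value per time step instead.
# This corresponds to data_format="channels_first" applied to the same tensor.
pooled_over_last_axis = encoded.mean(axis=2)
print(pooled_over_last_axis.shape)  # (2, 4)
```

Either result can feed a `Dense` layer, but the size of the pooled vector differs: the number of features in the first case, the sequence length in the second.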

In [36]:

def build_model(
    input_shape,
    head_size,
    num_heads,
    ff_dim,
    num_transformer_blocks,
    mlp_units,
    dropout=0,
    mlp_dropout=0,
):
    inputs = keras.Input(shape=input_shape)
    x = inputs
    for _ in range(num_transformer_blocks):
        x = transformer_encoder(x, head_size, num_heads, ff_dim, dropout)

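    # Note: with data_format="channels_first" the average is taken over the last
    # axis, giving one value per time step; "channels_last" averages over time.
    x = layers.GlobalAveragePooling1D(data_format="channels_first")(x)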
    for dim in mlp_units:
        x = layers.Dense(dim, activation="relu")(x)
        x = layers.Dropout(mlp_dropout)(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    return keras.Model(inputs, outputs)


In [28]:
x_train.shape[1:]

(5000, 12)

## Train and evaluate

In [1]:
input_shape = x_train.shape[1:]

model = build_model(
    input_shape,
    head_size=256,
    num_heads=4,
    ff_dim=4,
    num_transformer_blocks=4,
    mlp_units=[128],
    mlp_dropout=0.4,
    dropout=0.25,
)

model.compile(
    loss="sparse_categorical_crossentropy",
    optimizer=keras.optimizers.Adam(learning_rate=1e-4),
    metrics=["sparse_categorical_accuracy"],
)
model.summary()

callbacks = [keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)]

model.fit(
    x_train,
    y_train,
    validation_split=0.2,
    epochs=200,
    batch_size=64,
    callbacks=callbacks,
)

model.evaluate(x_test, y_test, verbose=1)


## Conclusions

In about 110-120 epochs (25s each on Colab), the model reaches a training
accuracy of ~0.95, a validation accuracy of ~0.84 and a testing
accuracy of ~0.85, without hyperparameter tuning. And that is for a model
with fewer than 100k parameters. Of course, parameter count and accuracy could be
improved by a hyperparameter search and a more sophisticated learning rate
schedule, or a different optimizer.
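
As one possible example of a more sophisticated learning rate schedule, the sketch below implements linear warmup followed by cosine decay in plain Python. The function name `warmup_cosine_lr` and its default values are assumptions made for illustration and were not used to produce the results above.

```python
import math


def warmup_cosine_lr(step, total_steps, base_lr=1e-4, warmup_steps=100):
    """Linear warmup to base_lr, then cosine decay to zero (illustrative only)."""
    if step < warmup_steps:
        # Ramp up linearly so the first updates are small.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clipped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * min(1.0, progress)))


# Inspect the schedule at a few points of a hypothetical 1000-step run.
for step in (0, 50, 100, 550, 1000):
    print(step, warmup_cosine_lr(step, total_steps=1000))
```

A function like this could be adapted for `keras.callbacks.LearningRateScheduler`, which calls its schedule with the epoch index and the current learning rate.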

You can use the trained model hosted on [Hugging Face Hub](https://huggingface.co/keras-io/timeseries_transformer_classification) and try the demo on [Hugging Face Spaces](https://huggingface.co/spaces/keras-io/timeseries_transformer_classification).

In [6]:
import pandas as pd

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [7]:
df = pd.read_csv('/content/drive/MyDrive/ECG/Xtrain.csv', index_col=0)  # read the file
dft = pd.read_csv('/content/drive/MyDrive/ECG/label.csv')

In [9]:
df

Unnamed: 0,series_id,0,1,2,3,4,5,6,7,8,9,10,11
0,0,0.10004,0.18300,0.08296,-0.13908,0.00976,0.13420,0.55388,0.60268,0.51484,0.40504,0.27572,0.15372
1,0,0.10248,0.18788,0.08540,-0.14152,0.00976,0.13664,0.56364,0.62220,0.52704,0.42212,0.28792,0.15372
2,0,0.10736,0.19520,0.08784,-0.14884,0.00976,0.14152,0.57340,0.62952,0.54412,0.43188,0.29280,0.15860
3,0,0.11468,0.20252,0.08784,-0.15616,0.01464,0.14640,0.58072,0.64660,0.56120,0.44896,0.30012,0.16592
4,0,0.11956,0.20984,0.09028,-0.16104,0.01464,0.15128,0.58560,0.66368,0.57828,0.45628,0.30988,0.16592
...,...,...,...,...,...,...,...,...,...,...,...,...,...
4995,1599,0.03172,-0.06832,-0.10004,0.02196,0.06588,-0.08296,-0.06832,-0.10004,-0.11224,-0.05856,-0.05368,-0.09028
4996,1599,0.02928,-0.06344,-0.09272,0.01952,0.06100,-0.07808,-0.06832,-0.10492,-0.11468,-0.05124,-0.05124,-0.08540
4997,1599,0.02928,-0.06588,-0.09516,0.02196,0.06344,-0.08052,-0.06832,-0.10248,-0.11712,-0.05612,-0.05612,-0.08052
4998,1599,0.02440,-0.07564,-0.10004,0.02928,0.06344,-0.08784,-0.07076,-0.10248,-0.11468,-0.05612,-0.05612,-0.08540


In [10]:
from tsfresh import extract_features
extracted_features = extract_features(df, column_id="series_id", column_sort="series_id")


ERROR:numba.cuda.cudadrv.driver:Call to cuInit results in CUDA_ERROR_NO_DEVICE
Feature Extraction:   0%|          | 7/19200 [02:56<134:29:29, 25.23s/it]


KeyboardInterrupt: ignored