# Anomaly detection in technical systems: anomaly in the water circulation loop

## Testbed description

The Industrial Internet of Things (IIoT) testbed is intended for:

- Demonstrating the capabilities and benefits of implementing IIoT technologies;
- Trialing, verifying and validating new IIoT technologies under laboratory conditions, in order to identify the scientific theories, algorithms and practical tools needed to apply them to pressing industrial problems;
- Conducting educational and research work on IIoT.

The testbed software has a simple, intuitive user interface; instructions for operating the systems are given in a separate user manual.

The testbed consists of a closed water loop driven by a pump, a power supply system, control systems, and a data collection and monitoring system, all built using Industrial Internet of Things technologies.
The testbed comprises the following systems:

1. Water circulation system.
2. Control system of the water circulation system (hereinafter referred to as the Control System).
3. System for monitoring the state of the water circulation system (hereinafter referred to as the Monitoring System).
4. Time-Sensitive Networking (TSN) technology demonstration system.
5. System for storing, processing and visualizing data.

The water circulation system is designed to simulate a water supply system in a laboratory environment and circulates water through water pipes using a water pump.
The water circulation system simulates the following faults:

- Misalignment of the shaft connecting the motor and the water pump;
- Reduction of the flow area of the valve at the pump inlet;
- Reduction of the flow area of the valve at the pump outlet.

The system consists of the following components:
- Water pump
- Electric motor
- Inverter
- Electrovalve (1)
- Electrovalve (2)
- Mechanical lever for misalignment
- Vibration sensors
- Water tank with pipes
- Pressure sensor
- Flow meter
- Thermocouple

In [None]:
from IPython.display import Image
Image(filename="../input/skoltech-anomaly-benchmark-skab-teaser/look.png", width=1000, height=500)

Front panel and composition of the water circulation, control and monitoring systems (quantities in parentheses): 1, 2 - solenoid valve (quantity: 1); 3 - water tank (1); 4 - water pump (1); 5 - emergency stop button (1); 6 - electric motor (1); 7 - inverter (1); 8 - CompactRIO (1); 9 - mechanical lever for shaft misalignment (1). Parts not shown: vibration sensor (2); pressure meter (1); flow meter (1); thermocouple (2).

## Working with the data

The dataset contains 4 anomalies (incidents), all recorded on 2019-07-08:

- MISALIGNMENT OF PUMP AND MOTOR SHAFT (abrupt)  
Abrupt appearance of the defect: 18:39:22  
Abrupt removal of the defect: 18:42:32

- MISALIGNMENT OF PUMP AND MOTOR SHAFT (slow)  
Slow appearance of the defect: 18:44:36-18:45:49  
Abrupt removal of the defect: 18:46:51

- REDUCTION OF FLOW AREA SECTION-1 (top)  
Slow appearance of the defect: 19:06:57-19:07:37  
Slow removal of the defect: 19:10:45-19:11:31

- REDUCTION OF FLOW AREA SECTION-2 (bottom)  
Slow appearance of the defect: 19:14:40-19:16:24  
Slow removal of the defect: 19:19:15-19:21:16

## Problem statement

- **DS problem in business terms:** we need to detect anomalies as soon as they appear.

Metric:
Average Detection Delay (ADD)

$\text{ADD} = \frac{1}{|Y|}\sum_{y \in Y} ( \tau_y - \theta_y )$,

where $Y$ is the set of changepoints and $|Y|$ is their total number,  
$\tau_y$ is the moment of detection of changepoint $y$,  
$\theta_y$ is the moment of changepoint $y$ (the appearance of the anomaly).

- **DS problem in mathematical terms:** we need to construct a model that describes the normal operation of the testbed as accurately as possible.

Metric:
Mean Absolute Error (MAE)

$\text{MAE} = \frac{1}{N} \sum^{N}_{i=1}|x_i - \hat{x}_i|$,

where $N$ is the total number of data instances,  
$x_i$ is the true value at time $i$,  
$\hat{x}_i$ is the predicted value at time $i$.
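
As a quick illustration of the MAE formula, here is a minimal NumPy sketch. The function name `mae` and the arrays `x_true` and `x_pred` are made up for the example and are not taken from the dataset.

```python
import numpy as np


def mae(x_true, x_pred):
    """Mean absolute error over all elements of two equally shaped arrays."""
    x_true = np.asarray(x_true, dtype=float)
    x_pred = np.asarray(x_pred, dtype=float)
    return np.mean(np.abs(x_true - x_pred))


# Hypothetical values: the absolute errors are 0.5, 0.0, 1.0 and 1.0
x_true = np.array([1.0, 2.0, 3.0, 4.0])
x_pred = np.array([1.5, 2.0, 2.0, 5.0])
print(mae(x_true, x_pred))  # 0.625
```

For a two-dimensional array (time steps by signals) the same function averages over every element, which matches what `mean_absolute_error` from scikit-learn returns with its default settings.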

### Importing libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
%matplotlib inline

### Data loading

In [None]:
# Reading the data
raw_data = pd.read_csv('../input/skoltech-anomaly-benchmark-skab-teaser/SkAB teaser.csv', 
                   sep=';', 
                   index_col='datetime', 
                   parse_dates=True).drop('index',axis=1)

In [None]:
# Showing the first 10 rows of the table with the data
raw_data.head(10)

Let's pivot the table, because we need each signal to be in a separate column.

In [None]:
# Pivoting the table
raw_data = raw_data.pivot_table(values='value', index=raw_data.index, columns='id')
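
To see what the pivot does, here is a small sketch on made-up long-format records. The signal names `pressure` and `flow` and all values are hypothetical; only the column names `id` and `value` and the `pivot_table` call mirror the cell above.

```python
import pandas as pd

# Hypothetical long-format records: one row per (timestamp, signal id)
long_df = pd.DataFrame(
    {
        "id": ["pressure", "flow", "pressure", "flow"],
        "value": [1.2, 30.0, 1.3, 31.0],
    },
    index=pd.to_datetime(
        [
            "2019-07-08 17:52:29",
            "2019-07-08 17:52:29",
            "2019-07-08 17:52:30",
            "2019-07-08 17:52:30",
        ]
    ),
)
long_df.index.name = "datetime"

# One row per timestamp, one column per signal id
wide_df = long_df.pivot_table(values="value", index=long_df.index, columns="id")
print(wide_df)
```

Note that `pivot_table` aggregates duplicates with the mean by default, so two readings of the same signal at the same timestamp would be averaged.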

In [None]:
# Showing first 5 rows of the table
raw_data.head()

In [None]:
raw_data.info()

In [None]:
# Plotting
raw_data.plot(figsize=(12,6), marker='o', markersize=3);

Let's cut off the initial interval of the data to exclude the transient (warm-up) period.

In [None]:
# Cutting off
raw_data = raw_data['2019-07-08 17:52:29':]

In [None]:
# Plotting
raw_data.plot(figsize=(12,6), marker='o', markersize=3);

In [None]:
# Saving processed data
# raw_data.to_csv('raw_data.csv')

In [None]:
print(f'The shape of the table (dataframe): {raw_data.shape}')

### Features

In [None]:
raw_data.describe()

In [None]:
# Plotting separate pictures for signals
for name in raw_data.columns:
    raw_data[name].plot(figsize=(12,3), marker='o', markersize=2)
    plt.xlabel('Time')
    plt.ylabel('Value')
    plt.title(f'Signal: {name}')
    plt.show()

### Additional data processing

In [None]:
# todo
def preprocessing(raw_data):
    data = raw_data.copy()
    
    # your code
    
    return data

data = preprocessing(raw_data=raw_data)

### Splitting the data into training, validation and test sets

- Training sample: used to optimize the model's parameters.
- Validation sample: used to select the best model among those fitted on the training sample.
- Test sample: used to assess the quality of the final solution.

Definitions of the training, validation and test samples, with options for constructing them, are given in this [article](https://medium.com/@tekaround/train-validation-test-set-in-machine-learning-how-to-understand-6cdd98d4a764).

See also this [article](https://hunch.net/?p=22) about overfitting in ML.

In [None]:
# Showing the training, validation and test sets
data.plot(figsize=(12,6))
plt.axvspan(data.index[0], 
            '2019-07-08 18:25', 
            color='green', 
            alpha=0.1, 
            label='Training set')
plt.axvspan('2019-07-08 18:25', 
            '2019-07-08 18:35', 
            color='yellow', 
            alpha=0.1, 
            label='Validation set')
plt.axvspan('2019-07-08 18:35', 
            data.index[-1], 
            color='red', 
            alpha=0.1, 
            label='Test set')
plt.legend(bbox_to_anchor =(0.8, -0.2), ncol = 3)
plt.xlabel('Time')
plt.ylabel('Value')
plt.title('Training, validation and test sets');
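
When the boundaries from the plot are turned into actual subsets, it helps to remember that label slicing of a `DatetimeIndex` includes both endpoints, and that a minute-resolution string such as `'2019-07-08 18:25'` covers the whole minute. The sketch below uses a made-up one-sample-per-second frame (`toy` is a stand-in, not the notebook's `data`) to show one possible way of building disjoint splits with half-open intervals, and how much two label slices overlap.

```python
import numpy as np
import pandas as pd

# Toy frame with one sample per second (a stand-in for the real data)
idx = pd.date_range("2019-07-08 18:20:00", "2019-07-08 18:40:00", freq="s")
toy = pd.DataFrame({"signal": np.arange(len(idx), dtype=float)}, index=idx)

train_end = pd.Timestamp("2019-07-08 18:25:00")
val_end = pd.Timestamp("2019-07-08 18:35:00")

# Half-open intervals [start, end) guarantee that no row is shared
train = toy[toy.index < train_end]
val = toy[(toy.index >= train_end) & (toy.index < val_end)]
test = toy[toy.index >= val_end]

# Label slices with minute-resolution strings both contain the 18:25 minute
overlap = toy[:"2019-07-08 18:25"].index.intersection(
    toy["2019-07-08 18:25":"2019-07-08 18:35"].index
)
print(len(train), len(val), len(test), len(overlap))
```

Whether a one-minute overlap matters depends on the task; the mask-based variant simply removes the question.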

### Data scaling (normalizing)

Most ML algorithms need the data to be scaled before fitting.

In [None]:
# Scaler initialization
StSc = StandardScaler()
# Fitting the scaler on the training set only
StSc.fit(data[:'2019-07-08 18:25'])

# Applying the scaler
# training set
train_sc = StSc.transform(data[:'2019-07-08 18:25'])
# validation set
# (label slicing includes both endpoints, so the 18:25 minute is also part of the training slice)
val_sc = StSc.transform(data['2019-07-08 18:25':'2019-07-08 18:35'])
# all data (training, validation and test)
data_sc = StSc.transform(data)
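
`StandardScaler` subtracts the per-column mean and divides by the per-column population standard deviation, both estimated on the data passed to `fit`. The following NumPy sketch reproduces that arithmetic on made-up arrays (`train`, `new`, `scale` and `unscale` are hypothetical names) to show why statistics from the training set are reused for all other data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two hypothetical signals with very different ranges
train = rng.normal(loc=[10.0, 200.0], scale=[2.0, 50.0], size=(500, 2))
new = rng.normal(loc=[10.0, 200.0], scale=[2.0, 50.0], size=(100, 2))

# Statistics are estimated on the training data only
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0)


def scale(x):
    """Standardize with the training statistics."""
    return (x - mean) / std


def unscale(z):
    """Map standardized values back to the original units."""
    return z * std + mean


train_z = scale(train)
new_z = scale(new)
print(train_z.mean(axis=0), train_z.std(axis=0))
```

After scaling, the training columns have zero mean and unit variance, while new data is only approximately standardized, which is expected.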

## Fitting the model

- Link to the Keras with TensorFlow backend course: https://youtu.be/qFJeN9V1ZsI

### Importing libraries

In [None]:
from tensorflow.keras.layers import Input, Dense, BatchNormalization, Activation, Dropout
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import regularizers
from sklearn.metrics import mean_absolute_error, mean_squared_error
from tensorflow.keras.callbacks import EarlyStopping
from itertools import product

### Function for reproducible results

In [None]:
def Random(seed_value):
    # 1. Set `PYTHONHASHSEED` environment variable at a fixed value
    import os
    os.environ['PYTHONHASHSEED'] = str(seed_value)

    # 2. Set `python` built-in pseudo-random generator at a fixed value
    import random
    random.seed(seed_value)

    # 3. Set `numpy` pseudo-random generator at a fixed value
    import numpy as np
    np.random.seed(seed_value)

    # 4. Set `tensorflow` pseudo-random generator at a fixed value
    import tensorflow as tf
    tf.random.set_seed(seed_value)

### Autoencoder description

![ae](https://miro.medium.com/max/700/1*44eDEuZBEsmG_TCAKRI3Kw@2x.png)

### Useful links

- Autoencoder Explained:
https://www.youtube.com/watch?v=H1AllrJ-_30

- Outlier Detection with Autoencoder Ensembles:
https://saketsathe.net/downloads/autoencode.pdf

- About batch normalization layer:
https://arxiv.org/pdf/1502.03167v2.pdf

In [None]:
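# Function for fitting a specific architecture
def arch(param, data):
    """Build and fit an autoencoder with the given hyperparameters.

    Parameters
    ----------
    param : sequence
        (neurons in the 1st layer, neurons in the 2nd layer,
        neurons in the bottleneck, learning rate, batch size).

    data : np.array
        Scaled training data of shape (n_samples, n_features).

    Returns
    -------
    model : tensorflow.keras.Model
        The fitted autoencoder.
    """
    Random(0)
    n_features = data.shape[1]  # 8 signals in this dataset
    input_dots = Input((n_features,))

    # Encoder
    x = Dense(param[0])(input_dots)
    x = BatchNormalization()(x)
    x = Activation('elu')(x)

    x = Dense(param[1])(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)

    bottleneck = Dense(param[2], activation='linear')(x)

    # Decoder (mirrors the encoder)
    x = Dense(param[1])(bottleneck)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)

    x = Dense(param[0])(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)

    out = Dense(n_features, activation='linear')(x)

    model = Model(input_dots, out)
    model.compile(optimizer=Adam(learning_rate=param[3]), loss='mae', metrics=["mse"])

    # The autoencoder is trained to reproduce its own input
    model.fit(data, data,
              validation_split=0.2,
              epochs=10,
              batch_size=param[4],
              verbose=0,
              shuffle=True,
              )
    return model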

### Let's fit an arbitrary model (architecture)

In [None]:
model = arch(param=(6, 5, 4, 0.0001, 30), data=train_sc)

In [None]:
for i in range(val_sc.shape[1]):
    plt.figure(figsize=(12,3))
    plt.plot(StSc.inverse_transform(val_sc)[:, i])
    plt.plot(StSc.inverse_transform(model.predict(val_sc))[:, i])
    plt.show()

Mean Absolute Error (MAE) of the reconstruction on the validation set, computed on the scaled data (see the definition in the problem statement):

In [None]:
mean_absolute_error(val_sc, model.predict(val_sc))

### Hyperparameters to optimize

In [None]:
# Selecting the parameters' grid for checking
n1=[6, 5]
n2=[4, 3]
n3=[2, 1]
lr=[0.05, 0.01]
batch_size=[32, 64]

parameters = product(n1, n2, n3, lr, batch_size)
parameters_list = list(parameters)
print(f'Total number of parameter combinations: {len(parameters_list)}')

In [None]:
# Table with the parameters' grid
pd.DataFrame(parameters_list, columns=['neurons 1st layer',
                                      'neurons 2nd layer',
                                      'neurons 3rd layer',
                                      'learning rate',
                                      'batch size']).head()

### Results of the model selection

In [None]:
from tqdm.notebook import tqdm

In [None]:
# Exhaustive search over the parameter grid
errors = []
for params in tqdm(parameters_list):
    
    model = arch(params, train_sc)
    train_pred = model.predict(train_sc, batch_size=params[4])
    val_pred = model.predict(val_sc, batch_size=params[4])
    
    train_error = mean_absolute_error(train_sc, train_pred)
    val_error = mean_absolute_error(val_sc, val_pred)
    
    errors.append(list(params)+[train_error, val_error])

# Sort the parameters by the error value
df_errors = pd.DataFrame(errors,
                         columns=['neurons 1st layer', 
                                  'neurons 2nd layer', 
                                  'neurons 3rd layer', 
                                  'learning rate', 
                                  'batch size', 
                                  'mae train', 
                                  'mae val'])
df_errors.sort_values('mae val').head()

### Fitting the best model

In [None]:
best_params = parameters_list[df_errors.sort_values('mae val').index[0]]

model = arch(best_params, train_sc)
model.summary()

In [None]:
for i in range(val_sc.shape[1]):
    plt.figure(figsize=(12,3))
    plt.plot(StSc.inverse_transform(val_sc)[:, i])
    plt.plot(StSc.inverse_transform(model.predict(val_sc))[:, i])
    plt.show()

### Health indicator

In [None]:
test_residuals = data_sc - model.predict(data_sc)

pd.DataFrame(test_residuals, columns=data.columns, index = data.index).plot(figsize=(12,6))
plt.xlabel('Time')
plt.ylabel('Residuals')
plt.title('Residuals')
plt.show()

In [None]:
train_residuals = train_sc - model.predict(train_sc)
val_residuals = val_sc - model.predict(val_sc)

UCL = pd.DataFrame(val_residuals).abs().sum(axis=1).quantile(0.99)
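
The upper control limit above is the 0.99 quantile of the health indicator (the sum of absolute residuals over all signals) on the validation set, so roughly 1% of fault-free points are expected to exceed it. The sketch below repeats the same steps on synthetic residuals; the noise level, the size of the simulated fault and the names `val_res`, `test_res` and `health_indicator` are assumptions made for the illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical residuals for 8 signals: small noise during normal operation
val_res = pd.DataFrame(rng.normal(0.0, 0.1, size=(600, 8)))
test_res = pd.DataFrame(rng.normal(0.0, 0.1, size=(300, 8)))
test_res.iloc[200:] += 1.0  # simulated fault: every residual shifts by 1


def health_indicator(residuals):
    """Sum of absolute residuals over all signals, one value per time step."""
    return residuals.abs().sum(axis=1)


# Threshold estimated on fault-free validation data
ucl = health_indicator(val_res).quantile(0.99)
alarms = health_indicator(test_res) > ucl

# Share of alarms before and after the simulated fault
print(alarms.iloc[:200].mean(), alarms.iloc[200:].mean())
```

A higher quantile gives fewer false alarms at the cost of a later detection, so the choice of 0.99 is a trade-off rather than a fixed rule.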

In [None]:
# Health indicator
pd.DataFrame(test_residuals, index=data.index).abs().sum(axis=1).plot(marker='o', 
                                                                      markersize=2, 
                                                                      alpha=0.2, 
                                                                      figsize=(12,6), 
                                                                      label='Health indicator')
# Health indicator with the median filter
pd.DataFrame(test_residuals, index=data.index).abs().sum(axis=1).rolling(3).median().plot(marker='o', 
                                                                                          markersize=2, 
                                                                                          alpha=0.7, 
                                                                                          figsize=(12,6),
                                                                                          label='Smoothed Health indicator')

plt.axvspan(data.index[0], 
            '2019-07-08 18:25', 
            color='green', 
            alpha=0.1, 
            label='Training sample')
plt.axvspan('2019-07-08 18:25', 
            '2019-07-08 18:35', 
            color='yellow', 
            alpha=0.1, 
            label='Validation sample')
plt.axvspan('2019-07-08 18:35', 
            data.index[-1], 
            color='red', 
            alpha=0.1, 
            label='Test sample')

plt.axhline(UCL, color='r', label='Upper control limit')
plt.ylim([0, 4*UCL])
plt.xlabel('Time')
plt.ylabel('Health indicator value')
plt.legend(bbox_to_anchor =(0.8, -0.2), ncol = 3)
plt.show()
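
The rolling median with a window of 3 suppresses isolated spikes in the health indicator, at the price of reacting slightly later to a sustained change. A minimal sketch with made-up values (the series `hi` and the threshold `ucl = 1.0` are hypothetical):

```python
import pandas as pd

# Hypothetical health indicator: one isolated spike, then a sustained shift
hi = pd.Series([0.2, 0.2, 5.0, 0.2, 0.2, 3.0, 3.1, 3.2, 3.0])
smoothed = hi.rolling(3).median()
ucl = 1.0

raw_alarms = hi > ucl
smoothed_alarms = smoothed > ucl  # NaN in the first two positions compares as False

print(raw_alarms.tolist())
print(smoothed_alarms.tolist())
```

Here the raw indicator raises a false alarm on the spike at position 2, while the smoothed one ignores it and first fires at position 6, one sample after the shift begins at position 5.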

In [None]:
# Incident intervals: (start of appearance, end of removal)
incidents = [('2019-07-08 18:39:22', '2019-07-08 18:42:32'),
             ('2019-07-08 18:44:36', '2019-07-08 18:46:51'),
             ('2019-07-08 19:06:57', '2019-07-08 19:11:31'),
             ('2019-07-08 19:14:40', '2019-07-08 19:21:16')]

# Health indicator: sum of absolute residuals over all signals
health_indicator = pd.DataFrame(test_residuals, index=data.index).abs().sum(axis=1)

fig, ax = plt.subplots(figsize=(12,6))
# Timestamps are built directly, so a second missing from the index does not raise an error
for i, (start, end) in enumerate(incidents):
    ax.axvspan(pd.Timestamp(start), pd.Timestamp(end),
               alpha=0.2,
               color='red',
               label='Anomalies (incidents)' if i == 0 else None)
ax.plot(health_indicator.index, health_indicator,
        marker='o', markersize=2, alpha=0.2, label='Health indicator')
ax.plot(health_indicator.index, health_indicator.rolling(3).median(),
        marker='o', markersize=2, alpha=0.7, label='Smoothed Health indicator')

ax.axhline(UCL, color='r', label='Upper control limit')
ax.set_ylim([0, 4*UCL])
ax.set_xlabel('Time')
ax.set_ylabel('Health indicator value')
plt.legend(bbox_to_anchor =(0.8, -0.1), ncol = 3)
plt.show()

Here you can implement and calculate the Average Detection Delay (ADD) defined in the problem statement. For example, $\tau_y$ can be taken as the first moment after the changepoint $\theta_y$ at which the health indicator exceeds the upper control limit.
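
One possible way to compute ADD is sketched below. It assumes a boolean alarm series indexed by time (for instance, the smoothed health indicator compared with the upper control limit) and a list of incident intervals. The function name `average_detection_delay`, the choice to count only the first alarm inside each incident, and the separate count of missed incidents are assumptions of this sketch, not part of the original notebook.

```python
import pandas as pd


def average_detection_delay(alarms, incidents):
    """Mean delay between each incident start and the first alarm inside it.

    alarms    : boolean pandas.Series indexed by timestamp (True = alarm)
    incidents : list of (start, end) timestamp pairs
    Returns (ADD as pandas.Timedelta, or NaT if nothing was detected,
             number of incidents without any alarm).
    """
    delays = []
    missed = 0
    for start, end in incidents:
        start, end = pd.Timestamp(start), pd.Timestamp(end)
        window = alarms[(alarms.index >= start) & (alarms.index <= end)]
        hits = window[window]  # keep only the time steps with an alarm
        if hits.empty:
            missed += 1
        else:
            delays.append(hits.index[0] - start)
    add = pd.Series(delays, dtype="timedelta64[ns]").mean()
    return add, missed


# Toy alarm series: alarms start 8 s and 10 s after the two incident starts
idx = pd.date_range("2019-07-08 18:39:00", periods=120, freq="s")
alarms = pd.Series(False, index=idx)
alarms.iloc[30:50] = True   # first alarm at 18:39:30
alarms.iloc[100:] = True    # first alarm at 18:40:40
toy_incidents = [("2019-07-08 18:39:22", "2019-07-08 18:40:00"),
                 ("2019-07-08 18:40:30", "2019-07-08 18:40:59")]

add, missed = average_detection_delay(alarms, toy_incidents)
print(add, missed)
```

Missed incidents are reported separately rather than folded into the average, because the formula does not define $\tau_y$ for a changepoint that is never detected.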

### Feature importance calculation

In [None]:
def feature_importance(residuals, analysis_type="collective", date_from=None, date_till=None, weigh=True):
    """Feature importance calculation

    Parameters
    ----------
    residuals : pandas.DataFrame
        Residuals with a DatetimeIndex, one column per signal.

    analysis_type : str, "single"/"collective", "collective" by default
        "single" returns the importance for every timestamp,
        "collective" returns one row aggregated over the interval.

    date_from : str in format 'yyyy-mm-dd HH:MM:SS', None by default
        Start of the interval (inclusive); None means the first row.

    date_till : str in format 'yyyy-mm-dd HH:MM:SS', None by default
        End of the interval (inclusive); None means the last row.

    weigh : boolean, True by default
        Used only if analysis_type == "collective". If True, timestamps
        with larger residuals contribute more to the result.

    Returns
    -------
    data : pandas.DataFrame
    """
    # Label-based slicing; a None bound leaves the interval open on that side
    data = residuals[date_from:date_till].abs()

    if analysis_type == "single":
        return data.div(data.sum(axis=1), axis=0) * 100
    if analysis_type == "collective":
        if weigh:
            importance = data.mean().div(data.mean().sum()) * 100
        else:
            importance = (data.div(data.sum(axis=1), axis=0) * 100).mean()
        return importance.to_frame('Feature importance, %').T
    raise ValueError('analysis_type must be "single" or "collective"')
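
The difference between the weighted and unweighted collective importance is easiest to see on a tiny example. The sketch below redoes both calculations by hand on a made-up residual table with two signals, `a` and `b`; the values are hypothetical.

```python
import pandas as pd

# Hypothetical absolute residuals: two time steps, two signals
res = pd.DataFrame({"a": [1.0, 9.0], "b": [1.0, 1.0]}).abs()

# Weighted: share of each signal in the total absolute residual of the interval
weighted = res.mean() / res.mean().sum() * 100

# Unweighted: per-timestamp shares first, then their plain average
unweighted = (res.div(res.sum(axis=1), axis=0) * 100).mean()

print(weighted.round(2).to_dict())
print(unweighted.round(2).to_dict())
```

In the weighted variant the second time step dominates because its residuals are larger (signal `a` gets about 83%), while the unweighted variant treats both time steps equally (signal `a` gets 70%).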

In [None]:
for dates in [['2019-07-08 18:39:22','2019-07-08 18:42:32'],
              ['2019-07-08 18:44:36','2019-07-08 18:46:51'],
              ['2019-07-08 19:06:57','2019-07-08 19:11:31'],
              ['2019-07-08 19:14:40','2019-07-08 19:21:16']]:
    print(f'Incident since {dates[0]} till {dates[1]}')
    display(feature_importance(pd.DataFrame(test_residuals, index=data.index, columns=data.columns), date_from=dates[0], date_till=dates[1]))
    print('\n')

## Thank you for your attention!