This Jupyter Notebook is part of the course [Python for Industry 4.0](https://www.udemy.com/course/python-for-industry-40/?referralCode=D7925A2D76BA4C94CA4E) from [Industry 4.0 Academy](https://www.i40a.com).

Latos© copyright 2022. All Rights Reserved.

# Project 2 - Control Loop Performance Monitoring

## CPM
A typical process plant has between 500 and 5000 control loops. These loops are installed and tuned for the state of the process at the time of tuning.

Control performance deteriorates over the lifetime of a plant as equipment ages and process dynamics change, so it is essential to detect and fix problems related to control loop performance.

Monitoring the loops by visual inspection would consume considerable personnel resources and would limit the investigation to a fraction of the loops. Control loop performance monitoring (CPM) aims to verify automatically whether the controllers, actuators, and sensors are working correctly, and therefore plays a vital role in plant performance. CPM techniques include controller performance assessment, non-linearity detection, oscillation detection, stiction detection, sensor fault detection, and others.

To learn more about CPM, see [this article](https://www.sciencedirect.com/science/article/pii/S2405896316306036).


## Aim of the project
Build simple control loop performance indicators using Python functions.

## Dataset
Complete information about the dataset is available in the [dataset repository](https://github.com/i40a/datasets/blob/main/control_loop/info.md).

## Load the dataset

Load the CSV file from the path below (this may take a while, since the file is large).

In [None]:
import pandas as pd
path = 'https://raw.githubusercontent.com/i40a/datasets/main/control_loop/mods/2/control_loop.csv'

In [None]:
# load csv file
df = pd.read_csv(path, index_col='Time', parse_dates=True)
df.tail()

| Time | Values | Quality | Loop | Variable |
|---|---|---|---|---|
| 2017-10-08 23:59:51.897 | 0.138086 | 2 | FIC14 | SP |
| 2017-10-08 23:59:53.956 | 0.13864 | 2 | FIC14 | SP |
| 2017-10-08 23:59:56.016 | 0.139064 | 2 | FIC14 | SP |
| 2017-10-08 23:59:58.075 | 0.138427 | 2 | FIC14 | SP |
| 2017-10-08 23:59:59.105 | 0.138898 | 2 | FIC14 | SP |


## Explore the dataset

First, understand the data and variables from the [dataset repository](https://github.com/i40a/datasets/blob/main/control_loop/info.md).

Then, use the numerical and visualization libraries to get insights from the data.

### Possible operations
* Number and name of the control loops
* Number and name of variables
* Plot all variables for one loop (FIC32, for example)
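The first two operations can be answered directly with `nunique()` and `unique()`. A minimal sketch on a small hypothetical DataFrame (`df_demo`) in the same long format as the dataset; on the real data, replace `df_demo` with `df`:

```python
import pandas as pd

# Hypothetical miniature of the dataset in the same long format
# (one row per record, with Loop and Variable columns)
df_demo = pd.DataFrame({
    'Values': [0.1, 0.2, 0.3, 0.4, 0.5],
    'Loop': ['FIC14', 'FIC14', 'FIC32', 'FIC32', 'LIC44'],
    'Variable': ['PV', 'SP', 'PV', 'MV', 'OP'],
})

# Number and names of the control loops
n_loops = df_demo['Loop'].nunique()
loop_names = sorted(df_demo['Loop'].unique())

# Number and names of the variables
n_vars = df_demo['Variable'].nunique()
var_names = sorted(df_demo['Variable'].unique())

print(n_loops, loop_names)
print(n_vars, var_names)
```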

In [None]:
# check Loop
df['Loop'].value_counts()

FIC41    338602
FIC14    311050
FIC32    306696
FIC19    303604
LIC44    254026
FIC37    210677
Name: Loop, dtype: int64

In [None]:
# check the different variables
df['Variable'].value_counts()

OP    508137
PV    506254
MV    371734
SP    338530
Name: Variable, dtype: int64

In [None]:
# count values for each variable on each loop
for loop in df['Loop'].unique():
    print(loop)
    df_loop = df[df['Loop'] == loop]
    print(df_loop['Variable'].value_counts())

LIC44
MV    84694
PV    84687
OP    84641
SP        4
Name: Variable, dtype: int64
FIC41
PV    84707
MV    84699
OP    84680
SP    84516
Name: Variable, dtype: int64
FIC37
OP    84703
PV    84390
MV    41581
SP        3
Name: Variable, dtype: int64
FIC32
OP    84716
PV    84714
SP    84705
MV    52561
Name: Variable, dtype: int64
FIC19
OP    84698
SP    84679
PV    83043
MV    51184
Name: Variable, dtype: int64
FIC14
PV    84713
OP    84699
SP    84623
MV    57015
Name: Variable, dtype: int64

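The per-loop counts above can also be obtained in a single table with `pd.crosstab`, which makes sparse variables stand out at a glance (for example, LIC44 and FIC37 above have only 4 and 3 SP records). A minimal sketch on hypothetical toy data; on the real data, call `pd.crosstab(df['Loop'], df['Variable'])`:

```python
import pandas as pd

# Hypothetical toy records in the dataset's long format
df_demo = pd.DataFrame({
    'Loop': ['LIC44', 'LIC44', 'LIC44', 'FIC41', 'FIC41'],
    'Variable': ['PV', 'PV', 'SP', 'PV', 'SP'],
})

# Rows: loops, columns: variables, cells: number of records (0 if absent)
counts = pd.crosstab(df_demo['Loop'], df_demo['Variable'])
print(counts)
```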

In [None]:
# plot Variables over time
loop = 'FIC32'

df_loop = df[df['Loop'] == loop]

import plotly.express as px
fig = px.line(df_loop, x=df_loop.index, y='Values', color='Variable')
fig.show()

## Algorithms

### Faulty sensor

Detects a faulty sensor by its noise level. In this simple indicator, noise is taken as the difference between each record and the previous one (a reasonable approximation when the sampling rate and the noise level are high), and the noise level is the standard deviation of these differences.

Aim: build a function that receives the DataFrame and a loop and returns the noise level.
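To see why the indicator works, consider white measurement noise with standard deviation σ on top of a slow process trend: the consecutive differences of the noise have standard deviation σ√2, while the trend only adds a nearly constant offset. The sketch below uses synthetic signals (an assumption for illustration, not course data) and a hypothetical helper `noise_level_of` to compare a low-noise and a high-noise sensor:

```python
import numpy as np
import pandas as pd

# Synthetic PV signals (assumption, not from the dataset):
# the same slow ramp with low measurement noise and with high measurement noise
rng = np.random.default_rng(seed=0)
t = np.arange(1000)
ramp = 0.001 * t
pv_healthy = pd.Series(ramp + rng.normal(0.0, 0.001, t.size))
pv_noisy = pd.Series(ramp + rng.normal(0.0, 0.02, t.size))


def noise_level_of(series):
    """Hypothetical helper: std of the differences between consecutive records."""
    diffs = series.diff()  # the first element is NaN
    return diffs.std()     # Series.std() skips NaN by default


print(noise_level_of(pv_healthy))  # roughly 0.001 * sqrt(2)
print(noise_level_of(pv_noisy))    # roughly 0.02 * sqrt(2)
```

The ramp contributes the same constant step to every difference, so it does not inflate the standard deviation; only the noise does.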

In [None]:
# select the PV variable from a loop (FIC32, matching the output below)
loop = 'FIC32'

df_loop = df[df['Loop'] == loop]
df_var = df_loop[df_loop['Variable'] == 'PV']
df_var

| Time | Values | Quality | Loop | Variable |
|---|---|---|---|---|
| 2017-10-06 12:00:01.994 | -0.179931 | 2 | FIC32 | PV |
| 2017-10-06 12:00:04.053 | -0.180243 | 2 | FIC32 | PV |
| 2017-10-06 12:00:06.112 | -0.181459 | 2 | FIC32 | PV |
| 2017-10-06 12:00:08.172 | -0.181641 | 2 | FIC32 | PV |
| 2017-10-06 12:00:09.201 | -0.182183 | 2 | FIC32 | PV |
| ... | ... | ... | ... | ... |
| 2017-10-08 23:59:50.867 | 0.163626 | 2 | FIC32 | PV |
| 2017-10-08 23:59:52.927 | 0.155351 | 2 | FIC32 | PV |
| 2017-10-08 23:59:54.986 | 0.149556 | 2 | FIC32 | PV |
| 2017-10-08 23:59:57.045 | 0.149608 | 2 | FIC32 | PV |


In [None]:
# evaluate the difference between each record and the previous one
# using the diff method of the pandas Series
df_diff = df_var['Values'].diff()
df_diff

Time
2017-10-06 12:00:01.994         NaN
2017-10-06 12:00:04.053   -0.000312
2017-10-06 12:00:06.112   -0.001217
2017-10-06 12:00:08.172   -0.000182
2017-10-06 12:00:09.201   -0.000542
                             ...   
2017-10-08 23:59:50.867    0.001079
2017-10-08 23:59:52.927   -0.008275
2017-10-08 23:59:54.986   -0.005795
2017-10-08 23:59:57.045    0.000053
2017-10-08 23:59:59.105   -0.002644
Name: Values, Length: 84714, dtype: float64

In [None]:
# ignore the first value of the diff (NaN) and evaluate the standard deviation
df_diff.iloc[1:].std()

0.004327167685139705

In [None]:
# function with all steps
def noise_level(df, loop):
    """Return the noise level (std of consecutive PV differences) of a loop."""
    # select the PV variable from the loop
    df_loop = df[df['Loop'] == loop]
    df_var = df_loop[df_loop['Variable'] == 'PV']

    # evaluate the difference between each record and the previous one
    df_diff = df_var['Values'].diff()

    # ignore the first value of the diff (NaN) and evaluate the standard deviation
    return df_diff.iloc[1:].std()

In [None]:
# apply the function to a different loop
noise_level(df, 'FIC41')

0.012955504027368154

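To rank all loops at once instead of calling `noise_level` one loop at a time, the same statistic can be computed per loop with `groupby`. A minimal sketch: the helper name `noise_level_all_loops` is hypothetical, and it assumes the long format used above with records in time order within each loop. On the real data, `noise_level_all_loops(df)` should agree with `noise_level(df, loop)` for each loop, since both compute the same quantity.

```python
import pandas as pd


def noise_level_all_loops(df):
    """Hypothetical helper: PV noise level of every loop in one pass.

    Assumes the long format used above (Values, Loop, Variable columns)
    and that records are in time order within each loop.
    """
    pv = df[df['Variable'] == 'PV']
    # std() of the consecutive differences, computed separately for each loop
    return pv.groupby('Loop')['Values'].apply(lambda s: s.diff().std())


# Toy data in the same long format (assumption, not from the dataset)
demo = pd.DataFrame({
    'Values': [0.0, 1.0, 3.0, 10.0, 10.0, 10.0, 5.0],
    'Loop': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'Variable': ['PV', 'PV', 'PV', 'PV', 'PV', 'PV', 'SP'],
})
levels = noise_level_all_loops(demo)
print(levels)
```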
### Valve travel

Evaluates how much the control valve has moved. The more the valve has moved, the lower the loop performance and the greater the valve wear.

Aim: build a function that receives the DataFrame and a loop and returns the valve travel, i.e. the sum of the absolute MV changes between consecutive records (sum(abs(travel_each_time_step))).
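A small worked example of the valve travel formula, using a hypothetical four-sample MV series: the valve moves +1.0, then -0.5, then +1.5, so the travel is 1.0 + 0.5 + 1.5 = 3.0. The vectorized `.abs().sum()` used here is equivalent to `sum(abs(travel_diff[1:]))`, because `Series.sum()` skips the leading NaN by default.

```python
import pandas as pd

# Toy MV series (assumption, not from the dataset)
mv = pd.Series([0.0, 1.0, 0.5, 2.0])

steps = mv.diff()             # NaN, 1.0, -0.5, 1.5
travel = steps.abs().sum()    # NaN is skipped: 1.0 + 0.5 + 1.5
print(travel)                 # 3.0
```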


In [None]:
# select the MV variable from a loop (FIC32, matching the output below)
loop = 'FIC32'

df_loop = df[df['Loop'] == loop]
df_var = df_loop[df_loop['Variable'] == 'MV']
df_var

| Time | Values | Quality | Loop | Variable |
|---|---|---|---|---|
| 2017-10-06 12:00:04.053 | -0.258846 | 2 | FIC32 | MV |
| 2017-10-06 12:00:06.112 | -0.257385 | 2 | FIC32 | MV |
| 2017-10-06 12:00:11.245 | -0.255822 | 2 | FIC32 | MV |
| 2017-10-06 12:00:13.289 | -0.252765 | 2 | FIC32 | MV |
| 2017-10-06 12:00:15.348 | -0.246819 | 2 | FIC32 | MV |
| ... | ... | ... | ... | ... |
| 2017-10-08 23:59:46.764 | -0.000892 | 2 | FIC32 | MV |
| 2017-10-08 23:59:50.867 | -0.000994 | 2 | FIC32 | MV |
| 2017-10-08 23:59:52.927 | -0.002455 | 2 | FIC32 | MV |
| 2017-10-08 23:59:54.986 | -0.014005 | 2 | FIC32 | MV |


In [None]:
# subtract the previous value from each value (distance traveled by the valve between records)
# using the diff method of the pandas Series
travel_diff = df_var['Values'].diff()
travel_diff

Time
2017-10-06 12:00:04.053         NaN
2017-10-06 12:00:06.112    0.001461
2017-10-06 12:00:11.245    0.001563
2017-10-06 12:00:13.289    0.003058
2017-10-06 12:00:15.348    0.005945
                             ...   
2017-10-08 23:59:46.764   -0.001359
2017-10-08 23:59:50.867   -0.000102
2017-10-08 23:59:52.927   -0.001461
2017-10-08 23:59:54.986   -0.011551
2017-10-08 23:59:57.045   -0.002922
Name: Values, Length: 52561, dtype: float64

In [None]:
# evaluate the valve travel.
sum(abs(travel_diff[1:]))

163.69383907185133

In [None]:
# function with all steps
def valve_travel(df, loop):
    # select MV variable from a loop
    df_loop = df[df['Loop'] == loop]
    df_var = df_loop[df_loop['Variable'] == 'MV']

    # subtract the previous value from each value
    travel_diff = df_var['Values'].diff()

    # evaluate the valve travel.
    return sum(abs(travel_diff[1:]))

In [None]:
# test the function on a different loop
valve_travel(df, 'LIC44')

818.8199048919968

### Mean absolute error

Evaluates how far the process is from the setpoint (desired value).

Aim: build a function that receives the DataFrame and a loop and returns the mean absolute error (sum(abs(SP - PV)) / n_samples)
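A small worked example of the MAE formula, assuming PV and SP are already aligned on the same time base (hypothetical values): the absolute errors are 0.5, 0.0 and 1.0, so the MAE is 1.5 / 3 = 0.5.

```python
import pandas as pd

# Toy aligned PV and SP values (assumption, not from the dataset)
pv = pd.Series([1.0, 2.0, 3.0])
sp = pd.Series([1.5, 2.0, 2.0])

# MAE = sum(|SP - PV|) / n = (0.5 + 0.0 + 1.0) / 3
mae_value = (sp - pv).abs().mean()
print(mae_value)  # 0.5
```

The alignment step matters: subtracting two pandas Series matches values by index label, which is why the real data must first be brought to a common time base.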

In [None]:
# select data from a loop (FIC32, matching the output below)
loop = 'FIC32'
df_loop = df[df['Loop'] == loop]
df_loop

| Time | Values | Quality | Loop | Variable |
|---|---|---|---|---|
| 2017-10-06 12:00:04.053 | -0.258846 | 2 | FIC32 | MV |
| 2017-10-06 12:00:06.112 | -0.257385 | 2 | FIC32 | MV |
| 2017-10-06 12:00:11.245 | -0.255822 | 2 | FIC32 | MV |
| 2017-10-06 12:00:13.289 | -0.252765 | 2 | FIC32 | MV |
| 2017-10-06 12:00:15.348 | -0.246819 | 2 | FIC32 | MV |
| ... | ... | ... | ... | ... |
| 2017-10-08 23:59:50.867 | 0.152602 | 2 | FIC32 | SP |
| 2017-10-08 23:59:52.927 | 0.153993 | 2 | FIC32 | SP |
| 2017-10-08 23:59:54.986 | 0.154287 | 2 | FIC32 | SP |
| 2017-10-08 23:59:57.045 | 0.155125 | 2 | FIC32 | SP |


In [None]:
# create pv and sp series selecting only the Values column
pv = df_loop[df_loop['Variable'] == 'PV']['Values']
sp = df_loop[df_loop['Variable'] == 'SP']['Values']

In [None]:
# resample both series to a common 1-minute time base (mean of each bin);
# this is necessary because PV and SP were not recorded at the same instants
pv_resample = pv.resample('1min').mean()
sp_resample = sp.resample('1min').mean()

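What `resample('1min').mean()` does can be seen on two hypothetical series sampled at different instants: each value falls into the 1-minute bin that contains its timestamp, the bin holds the mean of its values, and both series end up on the same index, so they can be concatenated and subtracted row by row.

```python
import pandas as pd

# Two toy series sampled at different instants (assumption, not from the dataset)
pv = pd.Series(
    [1.0, 3.0, 5.0],
    index=pd.to_datetime(['2017-10-06 12:00:10', '2017-10-06 12:00:40',
                          '2017-10-06 12:01:20']),
)
sp = pd.Series(
    [2.0, 4.0],
    index=pd.to_datetime(['2017-10-06 12:00:05', '2017-10-06 12:01:50']),
)

# Average each series inside 1-minute bins
pv_1min = pv.resample('1min').mean()   # 12:00 -> mean(1.0, 3.0), 12:01 -> 5.0
sp_1min = sp.resample('1min').mean()   # 12:00 -> 2.0, 12:01 -> 4.0

# Both resampled series share the same time index, so they align column-wise
aligned = pd.concat((pv_1min, sp_1min), axis=1)
aligned.columns = ['pv', 'sp']
print(aligned)
```

A bin that receives no samples becomes NaN, which is why the next step fills the gaps.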
In [None]:
# concatenate both variables to the same DataFrame, rename columns 
df_concat = pd.concat((pv_resample, sp_resample), axis=1)
df_concat.columns = ['pv', 'sp']
df_concat

| Time | pv | sp |
|---|---|---|
| 2017-10-06 12:00:00 | -0.171397 | -0.163987 |
| 2017-10-06 12:01:00 | -0.152549 | -0.148668 |
| 2017-10-06 12:02:00 | -0.140740 | -0.132039 |
| 2017-10-06 12:03:00 | -0.119967 | -0.116020 |
| 2017-10-06 12:04:00 | -0.106370 | -0.099093 |
| ... | ... | ... |
| 2017-10-08 23:55:00 | 0.113487 | 0.126253 |
| 2017-10-08 23:56:00 | NaN | NaN |
| 2017-10-08 23:57:00 | 0.145568 | 0.152173 |
| 2017-10-08 23:58:00 | 0.150804 | 0.152293 |


In [None]:
# fill NaN values by backfilling (use the next valid observation);
# bfill() replaces the deprecated fillna(method='backfill')
df_concat = df_concat.bfill()
df_concat

| Time | pv | sp |
|---|---|---|
| 2017-10-06 12:00:00 | -0.171397 | -0.163987 |
| 2017-10-06 12:01:00 | -0.152549 | -0.148668 |
| 2017-10-06 12:02:00 | -0.140740 | -0.132039 |
| 2017-10-06 12:03:00 | -0.119967 | -0.116020 |
| 2017-10-06 12:04:00 | -0.106370 | -0.099093 |
| ... | ... | ... |
| 2017-10-08 23:55:00 | 0.113487 | 0.126253 |
| 2017-10-08 23:56:00 | 0.145568 | 0.152173 |
| 2017-10-08 23:57:00 | 0.145568 | 0.152173 |
| 2017-10-08 23:58:00 | 0.150804 | 0.152293 |

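Backfilling copies the next valid observation backwards into each gap; in the output above, the empty 23:56 bin received the 23:57 values. A minimal sketch on a hypothetical frame with one empty bin:

```python
import numpy as np
import pandas as pd

# Toy 1-minute frame with one empty bin, as resample() can leave it
# (assumption, not from the dataset)
idx = pd.date_range('2017-10-08 23:55', periods=4, freq='1min')
frame = pd.DataFrame(
    {'pv': [0.11, np.nan, 0.15, 0.16], 'sp': [0.12, np.nan, 0.15, 0.15]},
    index=idx,
)

# bfill() copies the next valid observation backwards into the gap
filled = frame.bfill()
print(filled)
```

Note that `bfill()` cannot fill a NaN that comes after the last valid observation; such trailing gaps would stay NaN unless `ffill()` is also applied.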

In [None]:
# evaluate the mean absolute error
sum(abs(df_concat['pv'] - df_concat['sp'])) / len(df_concat)

0.0032951240510434186

In [None]:
# function with all steps
def mae(df, loop):
    # select data from a loop
    df_loop = df[df['Loop'] == loop]

    # create pv and sp series selecting only the Values column
    pv = df_loop[df_loop['Variable'] == 'PV']['Values']
    sp = df_loop[df_loop['Variable'] == 'SP']['Values']

    # resample both series to a common 1-minute time base,
    # since PV and SP were not recorded at the same instants
    pv_resample = pv.resample('1min').mean()
    sp_resample = sp.resample('1min').mean()

    # concatenate both variables into the same DataFrame and rename the columns
    df_concat = pd.concat((pv_resample, sp_resample), axis=1)
    df_concat.columns = ['pv', 'sp']

    # fill NaN values by backfilling (next valid observation)
    df_concat = df_concat.bfill()

    # mean absolute error
    res = sum(abs(df_concat['pv'] - df_concat['sp'])) / len(df_concat)

    return res

In [None]:
# test the function
mae(df, 'FIC41')

0.012941068331087643

## Possible improvements

* Add period selection to the functions, so that each indicator can be evaluated over a chosen time range.
* Add a filter to the valve travel indicator to attenuate noise before calculating the travel.
* Evaluate other indicators, such as oscillation detection, slow control, spikes, no signal, and saturation.
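The first two improvements can be sketched together as one hypothetical function, `valve_travel_filtered`. Its name, its parameters (`start`, `end`, `window`) and the choice of a centered moving average as the noise filter are all assumptions for illustration, not the course's solution. It assumes the same long format and a time-sorted DatetimeIndex.

```python
import numpy as np
import pandas as pd


def valve_travel_filtered(df, loop, start=None, end=None, window=5):
    """Hypothetical extension of valve_travel.

    start/end: optional time range (label slicing on the sorted DatetimeIndex).
    window: length of a centered moving average applied to MV before the
    travel is computed; window=1 means no filtering.
    """
    mv = df[(df['Loop'] == loop) & (df['Variable'] == 'MV')]['Values']
    mv = mv.loc[start:end]  # period selection (None keeps the whole range)
    mv_filtered = mv.rolling(window, center=True, min_periods=1).mean()
    # sum of absolute changes between consecutive (filtered) records
    return mv_filtered.diff().abs().sum()


# Toy MV record: slow ramp plus noise, one sample every 2 s
# (assumption, not from the dataset)
idx = pd.date_range('2017-10-06 12:00', periods=200, freq='2s')
rng = np.random.default_rng(seed=1)
demo = pd.DataFrame(
    {'Values': np.linspace(0.0, 1.0, 200) + rng.normal(0.0, 0.05, 200),
     'Loop': 'FIC32', 'Variable': 'MV'},
    index=idx,
)

raw = valve_travel_filtered(demo, 'FIC32', window=1)
smooth = valve_travel_filtered(demo, 'FIC32', window=9)
first_minute = valve_travel_filtered(demo, 'FIC32', end='2017-10-06 12:00:59', window=1)
print(raw, smooth, first_minute)
```

With noisy measurements, most of the unfiltered travel comes from noise rather than real valve movement, so the filtered value is much smaller; the window length is a tuning choice that trades noise rejection against hiding genuine fast moves.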
