# Pumping Events Detection

Detect pumping cycles. Calculate and visualize the health of the pump using the data from the cycles.



## Input

The Pressure.csv file contains the following columns:

- Date - timestamp of the row

- Pressure [Pa] - actual pressure in a vacuum chamber

- PumpState [Enumeration] - actual state of a pump

Each row of the file represents one log entry. PumpState is logged only when the pump changes its state; until the next change, the previously logged state remains valid. Pressure is logged irregularly, only when a pressure measurement is performed.
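
Because PumpState is logged only on a change, rows that carry just a pressure reading have no state of their own. The following is a minimal sketch of carrying the last logged state forward. It assumes the empty PumpState cells are read as NaN; the toy values and the `PumpStateFilled` column name are made up for illustration.

```python
import pandas as pd

# Toy log (made-up values): PumpState is present only where the state was logged.
log = pd.DataFrame({
    "Date": pd.to_datetime([
        "2020-01-01 00:00:00", "2020-01-01 00:00:05",
        "2020-01-01 00:00:09", "2020-01-01 00:00:20",
    ]),
    "Pressure": [100.0, 80.0, None, 40.0],
    "PumpState": [1, None, 2, None],
})

# Carry the last logged state forward so every row knows which state was valid at its time.
log["PumpStateFilled"] = log["PumpState"].ffill()
filled_states = log["PumpStateFilled"].tolist()
print(filled_states)
```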


## Algorithm Assignment

The pumping cycle is defined as follows:

- starts 10 seconds after the PumpState changes its value to pumping (1)

- ends 5 seconds before the PumpState changes its value to pumped (2)

- there are no changes of PumpState during the cycle (but some additional pumping = 1 values may be logged during the cycle)
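
The definition above can be sketched on a toy event log. This is one possible reading of the rules, not necessarily the approach taken later in this notebook: a cycle is a change to pumping (1) whose very next state change is to pumped (2). The function name `find_cycle_windows`, the toy timestamps, and the extra state `3` are hypothetical.

```python
import pandas as pd

def find_cycle_windows(events, start_offset="10s", end_offset="5s"):
    """Return one (DateStart, DateEnd) row per pumping (1) -> pumped (2) transition.

    `events` holds only rows with a logged PumpState, sorted by Date.
    """
    # Drop repeated loggings of the same state so only real state changes remain.
    changed = events["PumpState"] != events["PumpState"].shift()
    changes = events.loc[changed].reset_index(drop=True)

    # A cycle is a change to 1 whose very next change is to 2.
    nxt = changes.shift(-1)
    is_cycle = (changes["PumpState"] == 1) & (nxt["PumpState"] == 2)

    windows = pd.DataFrame({
        "DateStart": changes.loc[is_cycle, "Date"] + pd.Timedelta(start_offset),
        "DateEnd": nxt.loc[is_cycle, "Date"] - pd.Timedelta(end_offset),
    })
    # Discard windows too short to survive the two offsets.
    return windows[windows["DateStart"] < windows["DateEnd"]].reset_index(drop=True)

# Toy events (made up): state 3 stands for any other, hypothetical pump state.
events = pd.DataFrame({
    "Date": pd.to_datetime([
        "2020-01-01 00:00:00", "2020-01-01 00:00:30", "2020-01-01 00:01:00",
        "2020-01-01 00:02:00", "2020-01-01 00:02:10", "2020-01-01 00:03:00",
        "2020-01-01 00:04:00",
    ]),
    "PumpState": [1, 1, 2, 1, 3, 1, 2],
})

windows = find_cycle_windows(events)
print(windows)
```

In this toy log the repeated `1` at 00:00:30 does not break the first cycle, while the change to `3` at 00:02:10 disqualifies the pumping phase that started at 00:02:00.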

## A) Cycle Threshold

Calculate the cycle threshold (minimal Pressure value) for each cycle. Finally, calculate the mean of the cycle threshold values.
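
As a small illustration of the two steps (minimum per cycle, then the mean of those minima), here is a sketch on made-up readings. The `cycle` label column is an assumption; the real data first has to be split into cycles.

```python
import pandas as pd

# Toy pressure readings (made-up values), already labelled with their cycle.
readings = pd.DataFrame({
    "cycle": [0, 0, 0, 1, 1, 1],
    "Pressure": [50.0, 20.0, 9.0, 48.0, 25.0, 11.0],
})

# Cycle threshold = minimal Pressure within each cycle.
thresholds = readings.groupby("cycle")["Pressure"].min()

# Mean of the per-cycle thresholds (not the mean of all readings).
mean_threshold = thresholds.mean()
print(thresholds.tolist(), mean_threshold)
```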

## B) Cycle Trend Health Hypothesis

Consider the following pump health hypothesis: if the pump is healthy, the Pressure should gradually decrease during the pumping cycle. Check whether the data match the hypothesis. Use any approach and visualization means to perform the check. (Hypothesis testing is one possible solution but is not required. A simple "common sense" approach is perfectly OK.)
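
One "common sense" check, sketched below, is to summarise each cycle by the slope of a least-squares straight line and by the share of consecutive readings that do not increase. This is only an illustration on made-up data; the helper name `cycle_trend` and any cut-off used to call a cycle healthy are assumptions, not part of the assignment.

```python
import numpy as np
import pandas as pd

def cycle_trend(cycle):
    """Summarise one cycle: least-squares slope [Pa/s] and share of non-increasing steps."""
    elapsed = (cycle["Date"] - cycle["Date"].iloc[0]).dt.total_seconds().to_numpy()
    pressure = cycle["Pressure"].to_numpy()
    slope = np.polyfit(elapsed, pressure, 1)[0]        # first coefficient is the slope
    share_decreasing = float(np.mean(np.diff(pressure) <= 0))
    return slope, share_decreasing

# Two toy cycles (made-up values) sampled every 10 seconds.
dates = pd.Timestamp("2020-01-01") + pd.to_timedelta([0, 10, 20, 30, 40], unit="s")
healthy = pd.DataFrame({"Date": dates, "Pressure": [50.0, 30.0, 20.0, 12.0, 9.0]})
suspect = pd.DataFrame({"Date": dates, "Pressure": [50.0, 45.0, 48.0, 52.0, 55.0]})

healthy_slope, healthy_share = cycle_trend(healthy)
suspect_slope, suspect_share = cycle_trend(suspect)
print(healthy_slope, healthy_share)
print(suspect_slope, suspect_share)
```

A negative slope together with a high share of non-increasing steps supports the hypothesis for that cycle; a positive slope or a low share flags the cycle for a closer look.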


## Output

1.) Visualize all the cycle threshold values using time as x-axis

2.) Visualize the cycle trend health analysis results

3.) Commit all your code and results back to the repository




### IMPORTS

In [None]:
%matplotlib inline

import pandas as pd
import matplotlib.pyplot as plt
import plotly.graph_objs as go
from plotly.offline import iplot
import plotly
plotly.offline.init_notebook_mode()

pd.set_option('mode.chained_assignment', None)


### LOAD INPUT

In [None]:
df_pressure = pd.read_csv('Pressure.csv')


### ALGORITHM CORE

- get dataframe with PumpState 1 or 2 (defines the start and end of each cycle)
- if the PumpState is the same as the previous one, remove it

In [None]:
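df_cycles_def = df_pressure.query('1 <= PumpState <= 2')
# consecutive rows with the same PumpState share one run id
run_id = (df_cycles_def['PumpState'] != df_cycles_def['PumpState'].shift()).cumsum()
# keep only the first row of each run (the moment the state changed)
df_cycles_def = df_cycles_def.groupby(run_id.values).first()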

df_cycles_def['Date'] = pd.to_datetime(df_cycles_def['Date'])
df_pressure['Date'] = pd.to_datetime(df_pressure['Date'])
df_pressure = (
    df_pressure
    .set_index('Date')
    .drop(['PumpState', 'Int'], axis=1)
)
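
The consecutive-duplicate removal used in the cell above relies on a shift-and-cumsum idiom. A small sketch on a made-up series shows what the intermediate run ids look like:

```python
import pandas as pd

# Toy state sequence (made-up values) with repeated loggings of the same state.
states = pd.Series([1, 1, 2, 2, 1, 2], name="PumpState")

# A new run starts wherever the value differs from the previous row.
run_id = (states != states.shift()).cumsum()

# Keep the first row of each run, i.e. the moment the state actually changed.
first_of_run = states.groupby(run_id.values).first()
print(run_id.tolist())
print(first_of_run.tolist())
```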


- get dataframe only with two columns (df_StartEnd)
- first column is the start time of each cycle (values defined as: original one plus 10 seconds)
- second column is the end time of each cycle (values defined as: original one minus 5 seconds)

In [None]:
df_StartEnd = (
    pd.concat
    ([df_cycles_def.loc[df_cycles_def['PumpState'] == 1].reset_index()['Date'] + pd.Timedelta(seconds=10), 
      df_cycles_def.loc[df_cycles_def['PumpState'] == 2].reset_index()['Date'] - pd.Timedelta(seconds=5)], 
     axis=1)
)
df_StartEnd.columns = ['DateStart', 'DateEnd']


- iterate through the df_StartEnd dataframe and use the start/end times to cut each cycle out of the pressure data
- save each cycle with its datetime data to a new dataframe df_cycles
- create a new dataframe df_min_values to store the minimum value of each cycle (with its date)

In [None]:
cycle_frames = []
min_frames = []

for row in df_StartEnd.itertuples():
    # rows of df_pressure between the cycle start and end (both inclusive)
    df_temp = (
        df_pressure
        .truncate(before=row.DateStart, after=row.DateEnd)
        .reset_index()
    )
    cycle_frames.append(df_temp)
    # keep the row(s) holding the minimum Pressure of this cycle
    min_frames.append(df_temp[df_temp['Pressure'] == df_temp['Pressure'].min()])

# DataFrame.append was removed in pandas 2.0, so collect the frames and concat once
df_cycles = pd.concat(cycle_frames, axis=1)
df_min_values = pd.concat(min_frames, ignore_index=True).set_index('Date')
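
The cycle extraction depends on how `truncate` treats its bounds: on a sorted DatetimeIndex both `before` and `after` are inclusive. A minimal sketch on made-up readings:

```python
import pandas as pd

# Toy pressure readings (made-up values) indexed by time, one every 10 seconds.
pressure = pd.DataFrame(
    {"Pressure": [50.0, 30.0, 20.0, 12.0, 9.0]},
    index=pd.to_datetime([
        "2020-01-01 00:00:00", "2020-01-01 00:00:10", "2020-01-01 00:00:20",
        "2020-01-01 00:00:30", "2020-01-01 00:00:40",
    ]),
)

# Both bounds are inclusive, so readings taken exactly at the bounds are kept.
window = pressure.truncate(
    before=pd.Timestamp("2020-01-01 00:00:10"),
    after=pd.Timestamp("2020-01-01 00:00:30"),
)
window_values = window["Pressure"].tolist()
print(window_values)
```

Note that `truncate` requires a sorted index, so the pressure data has to be in chronological order before the loop runs.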


### BASIC STATISTICS

In [None]:
stats = (
    df_min_values
    .describe()
    .transpose()
)

stats


- the mean of the cycle threshold values is equal to 9.06 Pa

### VISUALIZATION

In [None]:
df_cycles.plot(legend=False);
plt.xlabel('Time [-]');
plt.ylabel('Pressure [Pa]');
plt.title('Cycle trend health analysis');


In [None]:
df_cycles_pressure = df_cycles['Pressure']
df_cycles_date = df_cycles['Date']

# insert the number of cycle you want to display (0 to count-1)
# 'count' can be found in basic statistics
cycle_no = 80

values = go.Scatter(x=df_cycles_date.iloc[:, cycle_no], y=df_cycles_pressure.iloc[:, cycle_no],
                    mode='lines',
                    name='Cycle trend health analysis',
                    connectgaps=True)

layout = {
    'title' : 'Cycle trend health analysis',
    'yaxis' : {'title' : 'Pressure [Pa]'}
}

data = [values]
fig = {'data' : data, 'layout' : layout}

iplot(fig, validate = False)


In [None]:
values = go.Scatter(x=df_min_values.index, y=df_min_values['Pressure'],
                    mode='lines',
                    name='Cycle threshold values',
                    connectgaps=True)

layout = {
    'title' : 'Cycle threshold values',
    'yaxis' : {'title' : 'Pressure [Pa]'}
}

data = [values]
fig = {'data' : data, 'layout' : layout}

iplot(fig, validate = False)
