# This Notebook is an example of Data Science applied to Industrial Data Analysis.

In this case, we'll analyse a Cooler Drum problem using the following input variables:

* Current of input conveyor belt;
* Current of Cooler Drum;
* Temperature of input material;
* Current of the drum's gear reducer;
* Output flow through conveyor belt.

*This opportunity of study was given by Yara International.*

<div style="width:100%;text-align: center;">
<img src="http://static1.squarespace.com/static/5e6b8563380ccc4e4fd26a4f/t/5e9601c8e8579f209b6924f4/1586889770542/Yara_International.jpg?format=1500w" width="400px">
    <div class="caption">Yara International</div>
</div>

In [None]:
import numpy as np
import pandas as pd
import warnings
import matplotlib as mpl
import matplotlib.pyplot as plt
from sklearn import preprocessing
import seaborn as sns
import missingno as msno

# Mute warnings
warnings.filterwarnings('ignore')

In [None]:
df = pd.read_csv('../input/coolerdrumanalysis/ARQV-GERAL.csv',sep=";")

In [None]:
df.head()

Replacing decimal commas with dots (the decimal separator pandas and NumPy expect when parsing numbers):

In [None]:
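# Replace decimal commas with dots in text columns (numeric columns have no .str accessor)
df = df.apply(lambda col: col if pd.api.types.is_numeric_dtype(col)
              else col.str.replace(',', '.', regex=False))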
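As an alternative, pandas can parse decimal commas directly at load time through the `decimal` argument of `read_csv`, which produces float columns without any string replacement. The small inline sample below is made up purely to illustrate the call; its layout (`;` separators, `,` decimals) and column names are assumed to match `ARQV-GERAL.csv`.

```python
import io

import pandas as pd

# Made-up sample in the assumed layout of ARQV-GERAL.csv: ';' separators, ',' decimals
sample = io.StringIO(
    "Date;Time;STATUS-RESF;RESF\n"
    "01/01/2020;00:00;OPERANDO;12,5\n"
    "01/01/2020;00:01;FALHA;3,75\n"
)

# decimal=',' tells the parser to treat ',' as the decimal mark
df_sample = pd.read_csv(sample, sep=";", decimal=",")
print(df_sample.dtypes["RESF"])    # float64
print(df_sample["RESF"].tolist())  # [12.5, 3.75]
```

Applied to this notebook, that would mean adding `decimal=","` to the `read_csv` call, after which the replacement step is no longer needed.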

## Analysing the correlation between STATUS-RESF (status of the cooling drum) and the other variables:

*STATUS-RESF has 2 values: OPERANDO (operating) and FALHA (failure).*

Encoding *STATUS-RESF* as integer category codes. pandas sorts the categories alphabetically, so FALHA becomes 0 and OPERANDO becomes 1.

In [None]:
df['STATUS-RESF'] = df['STATUS-RESF'].astype('category').cat.codes
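A quick check of what `cat.codes` produces, on a made-up series: the inferred string categories are sorted, so FALHA gets code 0 and OPERANDO code 1, while a missing value would get -1. This is why failures appear as 0 later in the notebook (assuming the real column has no missing statuses).

```python
import pandas as pd

# Made-up status values, including one missing entry
status = pd.Series(["OPERANDO", "OPERANDO", "FALHA", "OPERANDO", None])
cat = status.astype("category")

print(list(cat.cat.categories))  # ['FALHA', 'OPERANDO']
print(cat.cat.codes.tolist())    # [1, 1, 0, 1, -1]
```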

Keeping only the process variables (dropping the Date and Time columns):

In [None]:
df_values = df.drop(['Date','Time'],axis=1)

In [None]:
df_values

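Now, min-max scaling every column to the [0, 1] range so that all variables share a common scale. Pearson correlation is unaffected by this rescaling; it mainly helps when plotting the variables together: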

In [None]:
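# Convert the text values to floats, then scale each column to [0, 1]
x = df_values.astype(float).values
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)
# Keep the original column names and index instead of 0..5 integer labels
df_norm = pd.DataFrame(x_scaled, columns=df_values.columns, index=df_values.index)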
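For reference, min-max scaling maps each column with x' = (x - min) / (max - min). The sketch below, on made-up data standing in for two process variables, checks that `MinMaxScaler` matches this formula and that the correlation matrix is identical before and after scaling, since Pearson correlation is invariant to positive linear rescaling.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

# Made-up data standing in for two process variables
rng = np.random.default_rng(0)
raw = pd.DataFrame({"a": rng.normal(50, 10, 200), "b": rng.normal(5, 2, 200)})
raw["b"] += 0.3 * raw["a"]  # give the columns some correlation

# Min-max scaling by hand: (x - min) / (max - min), column by column
manual = (raw - raw.min()) / (raw.max() - raw.min())

scaled = pd.DataFrame(preprocessing.MinMaxScaler().fit_transform(raw),
                      columns=raw.columns)

print(np.allclose(manual, scaled))             # True
print(np.allclose(raw.corr(), scaled.corr()))  # True: Pearson r is scale-invariant
```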

In [None]:
df_norm.head()

In [None]:
df_norm.info()

Renaming the columns for better readability:

In [None]:
df_norm.columns = ['STATUS-RESF','RESF','TP-16-AMP','TP-16-FLOW','TEMP-PROD','VIB-RED']

In [None]:
df_norm.head()

In [None]:
corrMatrix = df_norm.corr()
fig = plt.figure(figsize=(10,10))
sns.heatmap(corrMatrix, annot=True)
plt.show()
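Because STATUS-RESF is binary (0/1) after encoding, the coefficients in its row of the heatmap are point-biserial correlations: r = (M1 - M0) / s * sqrt(p * (1 - p)), where M1 and M0 are the means of the continuous variable in each status group, s is its population standard deviation and p is the share of rows coded 1. The sketch below checks this identity on made-up data; `point_biserial` is a hypothetical helper written for illustration.

```python
import numpy as np


def point_biserial(binary, values):
    """Point-biserial correlation between a 0/1 array and a continuous array."""
    binary = np.asarray(binary, dtype=bool)
    values = np.asarray(values, dtype=float)
    m1 = values[binary].mean()   # mean where status == 1
    m0 = values[~binary].mean()  # mean where status == 0
    p = binary.mean()            # fraction of rows with status == 1
    s = values.std()             # population standard deviation (ddof=0)
    return (m1 - m0) / s * np.sqrt(p * (1 - p))


# Made-up example: the variable tends to be higher when status == 1
rng = np.random.default_rng(1)
status = rng.integers(0, 2, 500)
temp = 40 + 5 * status + rng.normal(0, 3, 500)

r_pb = point_biserial(status, temp)
r_pearson = np.corrcoef(status, temp)[0, 1]
print(np.isclose(r_pb, r_pearson))  # True
```

In other words, a positive value in the STATUS-RESF row means the variable tends to be higher when the drum is operating (code 1) than when it is failing (code 0). `scipy.stats.pointbiserialr` computes the same quantity if SciPy is available.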

## Now, let's look more closely at the rows around the failures:

First, we need to create an index column:

In [None]:
df_norm['IDX'] = range(1, len(df_norm) + 1)

In [None]:
consecutives = df_norm['STATUS-RESF'].diff().ne(0).cumsum()
df_norm.groupby(consecutives).agg(list)
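The `diff().ne(0).cumsum()` idiom labels runs of consecutive identical values: every change of status starts a new group. A small made-up series shows how the run IDs and the grouped rows come out.

```python
import pandas as pd

# Made-up status sequence: 1 = OPERANDO, 0 = FALHA
status = pd.Series([1, 1, 1, 0, 0, 1, 1, 0, 1])

# A new run starts whenever the value changes; diff() is NaN on the first row,
# and NaN != 0, so the first row also opens a run
run_id = status.diff().ne(0).cumsum()
print(run_id.tolist())  # [1, 1, 1, 2, 2, 3, 3, 4, 5]

# One row per run: its status, its length and the row labels it covers
runs = status.groupby(run_id).agg(["first", "size"])
runs["rows"] = status.index.to_series().groupby(run_id).agg(list)
print(runs)
```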

Each group above is a run of consecutive rows with the same status. Now let's collect a window of `n` rows before and after every failure row (STATUS-RESF == 0, i.e. FALHA):

In [None]:
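# Positions of every failure row (STATUS-RESF == 0, i.e. FALHA)
idx = np.flatnonzero(df_norm['STATUS-RESF'].to_numpy() == 0)
n = 5  # rows to keep on each side of a failure row
# Union of the windows [i-n, i+n], clipped to the DataFrame bounds
window = np.unique(np.concatenate([np.arange(max(i - n, 0), min(i + n + 1, len(df_norm)))
                                   for i in idx]))
df_failure = df_norm.iloc[window]
df_failure.head(10)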
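The window selection can also be packaged as a small reusable function. `failure_windows` is a hypothetical helper, assuming failures are coded 0 and rows are addressed by position; unlike a bare `np.concatenate` call, it also copes with data that contains no failure rows.

```python
import numpy as np
import pandas as pd


def failure_windows(df, status_col="STATUS-RESF", failure_code=0, n=5):
    """Rows within n positions (before or after) of any failure row."""
    failure_pos = np.flatnonzero(df[status_col].to_numpy() == failure_code)
    if failure_pos.size == 0:
        return df.iloc[0:0]  # no failures: empty frame with the same columns
    # Each window [i - n, i + n] is clipped to the frame; np.unique merges overlaps
    positions = np.unique(np.concatenate(
        [np.arange(max(i - n, 0), min(i + n + 1, len(df))) for i in failure_pos]))
    return df.iloc[positions]


# Made-up example: 12 rows with failures at positions 3 and 4
demo = pd.DataFrame({"STATUS-RESF": [1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1],
                     "RESF": np.arange(12) / 11})
print(failure_windows(demo, n=2).index.tolist())  # [1, 2, 3, 4, 5, 6]
```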

Let's analyse the correlation again, now only on the rows around failures:

In [None]:
df_failure=df_failure.drop(['IDX'],axis=1)
corrMatrix = df_failure.corr()
fig = plt.figure(figsize=(10,10))
sns.heatmap(corrMatrix, annot=True)
plt.show()

Finally, let's look at some of the values:

In [None]:
df_failure.head(20)

Exporting the DataFrame for further analysis:

In [None]:
df_failure.to_csv("Df_Failure.csv", index = False)

## Visual analysis of the Cooling Drum status through time

In [None]:
df_failure['IDX'] = range(1, len(df_failure) + 1)

In [None]:
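import plotly.express as px

# Line plot of the first five normalized columns
# (STATUS-RESF, RESF, TP-16-AMP, TP-16-FLOW, TEMP-PROD) against the row index
fig = px.line(df_failure.iloc[:, :5], width=1600, height=800)
# plt.savefig would save an empty Matplotlib figure, so export the Plotly figure itself;
# fig.write_image('norm_plot.jpg') also works if the optional kaleido package is installed
fig.write_html('norm_plot.html')
fig.show()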

From the line plot, we can draw some observations:

* The temperature, flow and current around the cooling drum rise before a failure.
* Right before a failure, there is a spike in the cooling drum current (probably compensating for an overload).
* The temperature drops during a failure, possibly leading to clogging.
* The input conveyor belt has little to no contribution to the problem.
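One way to put numbers on these visual observations is to compare, for each variable, its mean over the `n` rows just before each failure starts with its mean during normal operation. This is a sketch, assuming failures are coded 0 in STATUS-RESF and rows are equally spaced in time; `pre_failure_shift` is a hypothetical helper, not part of the original analysis, and the demo data are made up.

```python
import numpy as np
import pandas as pd


def pre_failure_shift(df, status_col="STATUS-RESF", failure_code=0, n=5):
    """Mean of each variable over the n rows preceding each failure onset,
    minus its mean over all operating (non-failure) rows."""
    failing = df[status_col].to_numpy() == failure_code
    # A failure onset is a failing row whose previous row was not failing
    previous = np.concatenate(([False], failing[:-1]))
    onsets = np.flatnonzero(failing & ~previous)
    before = np.unique(np.concatenate(
        [np.arange(max(i - n, 0), i) for i in onsets] + [np.array([], dtype=int)]))
    before = before[~failing[before]]  # drop rows belonging to an earlier failure
    variables = df.drop(columns=status_col)
    return variables.iloc[before].mean() - variables[~failing].mean()


# Made-up example: TEMP-PROD climbs in the two rows before the failure at position 4
demo = pd.DataFrame({
    "STATUS-RESF": [1, 1, 1, 1, 0, 0, 1, 1, 1, 1],
    "TEMP-PROD": [0.2, 0.2, 0.6, 0.8, 0.1, 0.1, 0.2, 0.2, 0.2, 0.2],
})
print(pre_failure_shift(demo, n=2))
```

A positive value means the variable is, on average, higher just before failures than during normal operation. In this notebook it could be applied as `pre_failure_shift(df_norm.drop(columns=['IDX']))`.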