<a href="https://githubtocolab.com/purduelamm/purdue_me597_iiot/blob/main/lab/lab8/PL8_Colab3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Prelab 8.3 Classifying Air-tight Vacuum and Air-leak Vacuum Data using Autoencoders for Anomaly Detection: Y and Z-axis


Because of the [Colab update on March 8, 2023](https://medium.com/google-colab/colab-updated-to-python-3-9-2593f8b1eb79), the default Python version in Colab is 3.9. This causes [TensorFlow version compatibility](https://www.tensorflow.org/install/source#tested_build_configurations) issues between Colab and the Raspberry Pi, because Raspberry Pi OS version 10 (Buster) uses Python 3.7 by default. Therefore, before we get started, let's set up Python 3.7 on Colab and then install the other required, compatible packages. This takes around 2 minutes.

Note that after you install TensorFlow 2.2.0 in the 10th code block, a '**RESTART RUNTIME**' button appears, as captured below. Click the button and then move on to the next cell so that the installed package is applied to the Colab session.

**In addition, make sure that you perform the same procedure in any new Colab file if you want to develop your own machine-learning models and train them for deployment on the Raspberry Pi.**

<br>

<img src="https://github.com/Eunseob/purdue_me597/blob/main/lab/img/prelab10_img0.png?raw=true" width="60%">

<br>

In [None]:
# keep the packages up to date
!sudo apt-get update -y
!sudo apt upgrade -y

In [None]:
# install Python 3.7 on the virtual session (-y answers the confirmation prompt automatically)
!sudo apt-get install -y python3.7 python3.7-dev python3.7-distutils libpython3.7-dev

In [None]:
#change alternatives
!sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.10 1
!sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.7 2

In [None]:
# Check that python3 points to the right location
# The version must be Python 3.7.X
!python3 --version

In [None]:
# install pip
!curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
!python3 get-pip.py --force-reinstall

In [None]:
#install colab's dependencies
!python3 -m pip install ipython ipython_genutils ipykernel jupyter_console prompt_toolkit httplib2 astor

In [None]:
!sudo apt install python3.7-distutils

In [None]:
# link to the old google package
!ln -s /usr/local/lib/python3.10/dist-packages/google \
       /usr/local/lib/python3.7/dist-packages/google


In [None]:
# IPython no longer exposes traitlets like this, it's a separate package now
!sed -i "s/from IPython.utils import traitlets as _traitlets/import traitlets as _traitlets/" /usr/local/lib/python3.7/dist-packages/google/colab/*.py
!sed -i "s/from IPython.utils import traitlets/import traitlets/" /usr/local/lib/python3.7/dist-packages/google/colab/*.py

In [None]:
# install tensorflow version 2.2.0
# After running this, you have to reconnect the session by clicking 'RESTART RUNTIME' button at the end of the output cell 
!pip install tensorflow==2.2.0
!pip install protobuf==3.20.1

In [None]:
# Let's check the installed tensorflow version
# The output cell must be 'TensorFlow Version is 2.2.0'
import tensorflow as tf

print('TensorFlow Version is', tf.__version__)

TensorFlow Version is 2.2.0


In [None]:
# required Python packages for this colab
!pip install matplotlib
!pip install pandas
!pip install scipy==1.4.1
!pip install scikit-learn

In [None]:
# The output should again be 'TensorFlow Version is 2.2.0',
# the version this lab requires.
import tensorflow as tf

print('TensorFlow Version is', tf.__version__)

TensorFlow Version is 2.2.0


In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.fftpack
from tensorflow import keras

from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_curve, auc
from sklearn.model_selection import train_test_split
from tensorflow.keras import layers, losses
from tensorflow.keras.models import Model

In [None]:
# Copying raw data from github dataset file
url = 'https://github.com/Eunseob/purdue_me597/blob/main/lab/lab8/Prelab8_data.csv?raw=true'
#df is the variable where the data is stored
df = pd.read_csv(url)

#Data selection
# X-axis: 'Xacc array [m/s2]'
# Y-axis: 'Yacc array [m/s2]'
# Z-axis: 'Zacc array [m/s2]'
# For example, to use the X-axis:
# AXIS = 'Xacc array [m/s2]'
AXIS =  #Pick and write the axis you want to work <-----------------------------------------------------------------------------

#Exploding the values contained in selected column and converting the string values into float values
df_new = pd.concat([df['Condition'],df[AXIS].str.split(' ', expand=True).astype(float)], axis=1)
ds = df_new.copy()
#Converting the Classifier into binary values
ds.loc[df['Condition'] == 'Vacuuming', 'Status'] = 1
ds.loc[df['Condition'] == 'Air_leakage', 'Status'] = 0
ds.drop('Condition', axis=1, inplace=True)

#Data transformation

raw_data = ds.values
# The last element contains the labels
labels = raw_data[:, -1]

# The other data points are the vacuum accelerometer data
data = raw_data[:, 0:-1]

train_data, test_data, train_labels, test_labels = train_test_split(
    data, labels, test_size=0.2, random_state=21
)
#Normalizing the values of the dataset 
min_val = tf.reduce_min(train_data)
max_val = tf.reduce_max(train_data)

train_data = (train_data - min_val) / (max_val - min_val)
test_data = (test_data - min_val) / (max_val - min_val)

train_data = tf.cast(train_data, tf.float32)
test_data = tf.cast(test_data, tf.float32)
#Splitting the dataset based on classification: train_labels: Vacuuming, ~train_labels: Air Leakage
train_labels = train_labels.astype(bool)
test_labels = test_labels.astype(bool)

normal_train_data = train_data[train_labels]
normal_test_data = test_data[test_labels]

anomalous_train_data = train_data[~train_labels]
anomalous_test_data = test_data[~test_labels]

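portion_of_anomaly_in_training = 0.1 #10% of training data will be anomalies
# Number of anomalies n such that n / (number of normal samples + n) equals the chosen portion
end_size = int(round(len(normal_train_data) * portion_of_anomaly_in_training / (1 - portion_of_anomaly_in_training)))
# Take the anomalies from the training split so that no test samples leak into the training set
combined_train_data = np.append(normal_train_data, anomalous_train_data[:end_size], axis=0)
combined_train_data.shape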

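
The cell above parses each space-separated signal string into floats and then applies min-max normalization using statistics from the training split only. The following is a minimal sketch of that arithmetic in plain Python, with made-up toy rows rather than the lab dataset; `parse_row` and `min_max_scale` are hypothetical helper names introduced here for illustration.

```python
# Hypothetical toy rows: each string mimics one space-separated signal cell.
train_rows = ["0.0 2.0 4.0", "1.0 3.0 5.0"]
test_rows = ["2.5 6.0 -1.0"]


def parse_row(cell):
    """Convert one space-separated string into a list of floats."""
    return [float(value) for value in cell.split(" ")]


def min_max_scale(rows, lo, hi):
    """Scale every value using the supplied minimum and maximum."""
    return [[(value - lo) / (hi - lo) for value in row] for row in rows]


train = [parse_row(row) for row in train_rows]
test = [parse_row(row) for row in test_rows]

# The minimum and maximum come from the training split only,
# so nothing about the test split influences the scaling.
lo = min(min(row) for row in train)
hi = max(max(row) for row in train)

train_scaled = min_max_scale(train, lo, hi)
test_scaled = min_max_scale(test, lo, hi)

print(train_scaled)
print(test_scaled)
```

Because the scaling is fitted on the training split, scaled test values can fall slightly outside the range 0 to 1, as the toy test row does here. A sigmoid output layer cannot reproduce such values exactly, which may add a little to the reconstruction error of those samples.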

In [None]:
#Plotting sample of normal data
plt.grid()
plt.plot(np.arange(1000), normal_train_data[0])
plt.title("A Normal vibration signal")
plt.show()

In [None]:
#Plotting sample of anomalous data
plt.grid()
plt.plot(np.arange(1000), anomalous_train_data[0])
plt.title("An abnormal vibration signal (Air leakage)")
plt.show()

In [None]:
#Creating the artificial neural network using Autoencoder
EMBEDDING_SIZE =  #Define how many neurons in the inner layer   <-----------------------------------------------------------------------------
class AnomalyDetector(Model):
  def __init__(self):
    super(AnomalyDetector, self).__init__()
    self.encoder = tf.keras.Sequential([
      layers.Dense(32, activation="relu"),
      layers.Dense(16, activation="relu"),
      layers.Dense(EMBEDDING_SIZE, activation="relu")]) # Smallest Layer Defined Here
    
    self.decoder = tf.keras.Sequential([
      layers.Dense(16, activation="relu"),
      layers.Dense(32, activation="relu"),
      layers.Dense(1000, activation="sigmoid")])
    
  def call(self, x):
    encoded = self.encoder(x)
    decoded = self.decoder(encoded)
    return decoded

autoencoder = AnomalyDetector()
print("Chosen Embedding Size: ", EMBEDDING_SIZE)

autoencoder.compile(optimizer='adam', loss='mae')
#Training the model. 
history = autoencoder.fit(normal_train_data, normal_train_data, 
          epochs=200, 
          batch_size=200,
          validation_data=(test_data, test_data),
          shuffle=True)

In [None]:
#Plotting the evolution of training and validation loss
plt.plot(history.history["loss"], label="Training Loss")
plt.plot(history.history["val_loss"], label="Validation Loss")
plt.legend()

How do the training and validation loss curves look? Do you need to adjust `EMBEDDING_SIZE` or the number of epochs to reduce the loss further?
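
One way to read the curves is to find the epoch at which the validation loss stops improving. The sketch below uses made-up loss values (not results from this lab) and a hypothetical helper named `best_epoch`; with a real run you could pass `history.history["val_loss"]` instead.

```python
def best_epoch(val_losses, min_delta=0.0):
    """Return (index, value) of the best validation loss.

    A later epoch only replaces the current best if it improves on it
    by more than min_delta.
    """
    best_idx, best_val = 0, val_losses[0]
    for i, value in enumerate(val_losses[1:], start=1):
        if best_val - value > min_delta:
            best_idx, best_val = i, value
    return best_idx, best_val


# Made-up numbers for illustration only.
val_loss = [0.30, 0.22, 0.18, 0.17, 0.171, 0.175, 0.18]
epoch, value = best_epoch(val_loss)
print("Best epoch:", epoch, "validation loss:", value)
```

If the best epoch is far from the last one, training longer is unlikely to help. Keep in mind that the validation set in this notebook contains both normal and air-leak samples, so a validation loss that stays above the training loss is expected.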

In [None]:
#Plotting True positive and false positive rate assessment
reconstructions = autoencoder(test_data)
loss = tf.keras.losses.mae(reconstructions, test_data)
# Flip the test labels so that air leakage (the anomaly) becomes the positive class:
# roc_curve treats higher scores as more positive, and anomalies should have a higher loss.
flipped_labels = 1-test_labels
fpr, tpr, thresholds = roc_curve(flipped_labels, loss)
plt.figure()
lw = 2
plt.plot(fpr, tpr, color='darkorange',
         lw=lw, label='ROC curve ')
plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic example')
plt.legend(loc="lower right")

# plot some thresholds
thresholds_every=20
thresholdsLength = len(thresholds)
colorMap=plt.get_cmap('jet', thresholdsLength)
for i in range(0, thresholdsLength, thresholds_every):
  threshold_value_with_max_four_decimals = str(thresholds[i])[:5]
  plt.scatter(fpr[i], tpr[i], c='black')
  plt.text(fpr[i] - 0.03, tpr[i] + 0.005, threshold_value_with_max_four_decimals, fontdict={'size': 15});

plt.show()
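
Each black point on the ROC graph corresponds to one threshold on the reconstruction loss. As a minimal sketch of how a single point is obtained, the function below counts true and false positives for one threshold, flagging a sample as anomalous when its score is at or above the threshold. The scores and labels are made-up toy values, and `roc_point` is a hypothetical name, not part of scikit-learn.

```python
def roc_point(scores, is_anomaly, threshold):
    """Return (fpr, tpr) when samples with score >= threshold are flagged."""
    tp = fp = fn = tn = 0
    for score, anomaly in zip(scores, is_anomaly):
        flagged = score >= threshold
        if flagged and anomaly:
            tp += 1          # anomaly correctly flagged
        elif flagged and not anomaly:
            fp += 1          # normal sample wrongly flagged
        elif not flagged and anomaly:
            fn += 1          # anomaly missed
        else:
            tn += 1          # normal sample correctly passed
    return fp / (fp + tn), tp / (tp + fn)


# Toy reconstruction losses and labels (True = air leakage), for illustration only.
toy_scores = [0.01, 0.02, 0.03, 0.05, 0.06, 0.08]
toy_is_anomaly = [False, False, True, False, True, True]

for t in (0.03, 0.06):
    fpr_t, tpr_t = roc_point(toy_scores, toy_is_anomaly, t)
    print("threshold", t, "FPR", fpr_t, "TPR", tpr_t)
```

Lowering the threshold catches more anomalies (higher true positive rate) at the cost of flagging more normal samples (higher false positive rate), which is the trade-off to weigh when picking a threshold from the graph.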

In [None]:
roc_auc = auc(fpr, tpr)
print(roc_auc)
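
The area under the ROC curve is computed with the trapezoidal rule over the (FPR, TPR) points. The sketch below reproduces that arithmetic in plain Python on made-up points; `trapezoid_auc` is a hypothetical helper name used only for this illustration.

```python
def trapezoid_auc(fpr_points, tpr_points):
    """Integrate TPR over FPR with the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(fpr_points)):
        width = fpr_points[i] - fpr_points[i - 1]
        mean_height = (tpr_points[i] + tpr_points[i - 1]) / 2.0
        area += width * mean_height
    return area


# Made-up ROC points for illustration only.
fpr_pts = [0.0, 0.0, 0.5, 1.0]
tpr_pts = [0.0, 0.5, 1.0, 1.0]
print(trapezoid_auc(fpr_pts, tpr_pts))

# The diagonal (random guessing) has an area of 0.5.
print(trapezoid_auc([0.0, 1.0], [0.0, 1.0]))
```

An area close to 1 means that the reconstruction loss separates the two classes well at most thresholds, while an area near 0.5 means it does little better than chance.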

In [None]:
threshold =  #Assign a value labeled in black in the ROC graph   <-----------------------------------------------------------------------------
def predict(model, data, threshold):
  reconstructions = model(data)
  loss = tf.keras.losses.mae(reconstructions, data)
  return tf.math.less(loss, threshold), loss

def print_stats(predictions, labels):
  print("Accuracy = {}".format(accuracy_score(labels, predictions)))
  print("Precision = {}".format(precision_score(labels, predictions)))
  print("Recall = {}".format(recall_score(labels, predictions)))

# Run the prediction outside print_stats so that preds is defined before it is used
preds, scores = predict(autoencoder, test_data, threshold)
print_stats(preds, test_labels)
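
To make the three printed numbers easier to interpret, here is a minimal sketch that derives accuracy, precision, and recall from confusion-matrix counts. As in the cell above, `True` stands for the normal (vacuuming) class. The labels and predictions are made-up toy values, and `confusion_counts` and `stats` are hypothetical helper names.

```python
def confusion_counts(labels, predictions):
    """Count outcomes with True (normal / vacuuming) as the positive class."""
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)
    tn = sum(1 for y, p in zip(labels, predictions) if not y and not p)
    return tp, fp, fn, tn


def stats(labels, predictions):
    """Return (accuracy, precision, recall) for boolean labels and predictions."""
    tp, fp, fn, tn = confusion_counts(labels, predictions)
    accuracy = (tp + tn) / len(labels)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Toy values for illustration only.
toy_labels = [True, True, True, False, False]
toy_predictions = [True, True, False, True, False]

accuracy, precision, recall = stats(toy_labels, toy_predictions)
print("Accuracy =", accuracy)
print("Precision =", precision)
print("Recall =", recall)
```

Here precision answers "of the samples predicted normal, how many really were normal?", and recall answers "of the truly normal samples, how many were predicted normal?". A low precision therefore means that air leaks are slipping through as normal.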

### Task 3.1
How do the models trained on the X-axis and Y-axis data compare? Which one does a better job of classifying? Explain your reasoning.



---

Write down your answer to Task 3.1 here.

---


## Working in the Z-axis

Reuse the code from the previous two axes to build a model using the data from the Z-axis.

In [None]:
#Your code here




#

### Task 3.2
Which model (X, Y, or Z) would you choose to classify normal and abnormal readings for the vacuum problem? Explain your reasoning.



---

Write down your answer to Task 3.2 here.

---


### Task 3.3
What other data transformations/extractions would you consider to build a model to classify normal and abnormal data on the vacuum problem?



---

Write down your answer to Task 3.3 here.

---


<br><br>


[View the rubric for this prelab](https://colab.research.google.com/github/purduelamm/purdue_me597_iiot/blob/main/lab/lab8/PL8_Rubric.ipynb)

<br><br>

Return to the [Lab Index Page](https://colab.research.google.com/github/purduelamm/purdue_me597_iiot/blob/main/index.ipynb)