## Import Packages

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [None]:
# Import the 'TensorFlow' package
import tensorflow as tf
from tensorflow import keras

# Check the version of tensorflow
print(tf.__version__)

In [None]:
# Check whether a GPU (on the Google Colab server) is allocated
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
    raise SystemError('GPU device not found')

print('Found GPU at: {}'.format(device_name))

In [None]:
# Access Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Load Raw Data and Extract Acceleration Data
- Generate a single array that contains all of the acceleration data (normal and abnormal)

In [None]:
AccData_pd_load = pd.read_csv('https://github.com/purduelamm/purdue_me597_iiot_online/blob/main/ml_tutorial/Dataset_Acc/AccData.csv?raw=true').iloc[:,1:]
AccData_pd_load.shape

In [None]:
AccData = np.array(AccData_pd_load)
AccData.shape

# Convert Acceleration Data into Spectrogram by STFT

[Tip]

You can set the size of the spectrogram (its time and frequency resolution) by adjusting the number of samples per segment ('nperseg') and the number of samples shared by adjacent segments ('noverlap').
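
As a rough check on the tip above, the spectrogram size can be estimated before calling 'signal.spectrogram'. The sketch below assumes SciPy's default behaviour (one-sided spectrum, FFT length equal to 'nperseg', no padding of the last segment). The helper name `spectrogram_shape` and the record length of 12800 samples are hypothetical, since the notebook does not state how long each acceleration record is.

```python
def spectrogram_shape(n_samples, nperseg, noverlap):
    """Return (frequency bins, time segments) for a one-sided spectrogram."""
    step = nperseg - noverlap                 # hop between segment starts
    n_freq = nperseg // 2 + 1                 # one-sided FFT bins (nfft == nperseg)
    n_time = (n_samples - noverlap) // step   # number of complete segments
    return n_freq, n_time


Fs = 12800          # sampling frequency used in this notebook (Hz)
n_samples = 12800   # hypothetical record length (1 s of data)

n_freq, n_time = spectrogram_shape(n_samples, nperseg=78, noverlap=10)
print(n_freq, n_time)        # 40 188
print(Fs / 78)               # spacing between frequency bins (Hz)
print((78 - 10) / Fs)        # spacing between time segments (s)
```

A larger 'nperseg' gives finer frequency resolution but fewer time segments, while a larger 'noverlap' gives more time segments for the same record.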

In [None]:
from scipy import signal

Fs = 12800  # Sampling Frequency
f,t,AccSTFT = signal.spectrogram(AccData, Fs, nperseg = 78, noverlap = 10)
AccSTFT.shape


## Split Training & Test Data
- Use the 'train_test_split' function
- It randomly splits the data into training and test sets according to the designated ratio.

In [None]:
NoOfData = 180

NormalSet   = AccSTFT[:NoOfData]
AbnormalSet = AccSTFT[NoOfData:]

NoOfSensor  = 1
NormalSet   = NormalSet.reshape(NormalSet.shape[0], NormalSet.shape[1], NormalSet.shape[2], NoOfSensor)
AbnormalSet = AbnormalSet.reshape(AbnormalSet.shape[0], AbnormalSet.shape[1], AbnormalSet.shape[2], NoOfSensor)

NormalSet.shape, AbnormalSet.shape

In [None]:
from sklearn.model_selection    import train_test_split

# Designate test data ratio
TestData_Ratio = 0.2

TrainData_Nor, TestData_Nor = train_test_split(NormalSet  , test_size=TestData_Ratio, random_state=777)
TrainData_Abn, TestData_Abn = train_test_split(AbnormalSet, test_size=TestData_Ratio, random_state=777)

print(TrainData_Nor.shape, TestData_Nor.shape)
print(TrainData_Abn.shape, TestData_Abn.shape)

## Data Labeling (One-hot Encoding)
- Use 'np.zeros' and 'np.ones'
- '[1,0]' refers to 'Normal' and '[0,1]' refers to 'Abnormal' in this tutorial
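
For reference, the same encoding can also be produced from integer class indices. This is only a sketch of an alternative to the 'np.zeros' / 'np.ones' approach used in this notebook; the array `class_idx` is a hypothetical example and is not part of the dataset.

```python
import numpy as np

# Class indices: 0 = Normal, 1 = Abnormal (the convention used in this tutorial)
class_idx = np.array([0, 0, 1, 1, 0])

# Row i of the 2x2 identity matrix is the one-hot vector of class i
one_hot = np.eye(2)[class_idx]
print(one_hot)

# np.argmax along axis 1 converts one-hot rows back to class indices
recovered = np.argmax(one_hot, axis=1)
print(recovered)   # [0 0 1 1 0]
```

The 'np.argmax' step is the same conversion that is applied later to the test labels and predictions before computing the confusion matrix.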

In [None]:
TrainLabel_Nor = np.zeros((TrainData_Nor.shape[0],2))
TrainLabel_Abn = np.ones( (TrainData_Abn.shape[0],2))
TestLabel_Nor  = np.zeros((TestData_Nor.shape[0],2))
TestLabel_Abn  = np.ones( (TestData_Abn.shape[0],2))

TrainLabel_Nor[:,0] = 1  # [1,0]: Normal
TrainLabel_Abn[:,0] = 0  # [0,1]: Abnormal
TestLabel_Nor[:,0]  = 1  # [1,0]: Normal
TestLabel_Abn[:,0]  = 0  # [0,1]: Abnormal

print(TrainLabel_Nor.shape, TestLabel_Nor.shape)
print(TrainLabel_Abn.shape, TestLabel_Abn.shape)

## Data and Label Preparation

In [None]:
TrainData  = np.concatenate([TrainData_Nor , TrainData_Abn ], axis=0)
TestData   = np.concatenate([TestData_Nor  , TestData_Abn  ], axis=0)
TrainLabel = np.concatenate([TrainLabel_Nor, TrainLabel_Abn], axis=0)
TestLabel  = np.concatenate([TestLabel_Nor , TestLabel_Abn ], axis=0)

print(TrainData.shape,  TestData.shape)
print(TrainLabel.shape, TestLabel.shape)


### [Main hyperparameters of CNN]

1. **Number of Convolutional Layers**: The number of convolutional layers in a CNN determines the network's ability to capture hierarchical features of the input data. Each layer can be thought of as a level of abstraction, with initial layers capturing basic features like edges and textures, and deeper layers capturing more complex features. The depth of the network is crucial for learning complex patterns, but increasing the number of convolutional layers increases the computational cost and the risk of overfitting. The depth $D$ of the network is directly related to the number of convolutional layers.

.

2. **Filter Size in Convolutional Layers**: The size of the filters (or kernels) in convolutional layers affects the area of input data that each filter covers. Common filter sizes include [3x3], [5x5], and [7x7]. Smaller filters can capture finer details of the input image, while larger filters capture broader features but with less spatial resolution. The choice of filter size ($F$) is a trade-off between capturing detailed features and computational efficiency.

.

3. **Number of Filters per Convolutional Layer**: The number of filters in a convolutional layer determines how many features are captured from the input data at that layer. More filters allow the network to capture a wider variety of features, enhancing the network's ability to recognize different patterns in the data. However, increasing the number of filters ($N$) also increases the computational complexity and the model's capacity, potentially leading to overfitting.

.

4. **Type of Pooling Layers**: Pooling layers reduce the spatial dimensions of the input feature maps, making the network more efficient and reducing the sensitivity to the exact location of features in the input data. There are two main types of pooling:
  - Max Pooling: Selects the maximum value from each patch of the feature map.
  - Average Pooling: Computes the average value of each patch of the feature map.

  Pooling operations are typically applied with a [2x2] window and a stride of 2, which halves each spatial dimension of the feature maps. This downsampling effect helps to make the model more robust to variations in the position of features within the input data.
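
The two pooling types can be compared on a small example. The sketch below applies non-overlapping [2x2] pooling to a hypothetical 4x4 feature map with plain NumPy; the values are made up for illustration, and the reshape trick only works when the map size is a multiple of the window size.

```python
import numpy as np

# Hypothetical 4x4 feature map
feature_map = np.array([[1, 3, 2, 0],
                        [4, 6, 1, 1],
                        [0, 2, 9, 5],
                        [2, 2, 7, 3]], dtype=float)

# Rearrange into non-overlapping 2x2 patches: patches[i, j] is one 2x2 patch
patches = feature_map.reshape(2, 2, 2, 2).swapaxes(1, 2)

max_pooled = patches.max(axis=(2, 3))    # maximum of each patch
avg_pooled = patches.mean(axis=(2, 3))   # average of each patch

print(max_pooled)   # [[6. 2.] [2. 9.]]
print(avg_pooled)   # [[3.5 1. ] [1.5 6. ]]
```

Both outputs are 2x2, i.e. each spatial dimension of the 4x4 input is halved.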

.

5. **Stride and Padding**: Stride ($S$) and padding ($P$) are hyperparameters that affect the size of the output feature maps produced by convolutional and pooling layers. Stride refers to the number of pixels by which the filter moves across the input image. A stride of 1 means the filter moves one pixel at a time, while a higher stride results in larger movements. Padding involves adding extra pixels around the input image to allow the convolutional operation to be applied more fully at the borders of the image. Proper selection of stride and padding is essential for controlling the dimensions of the output feature maps and ensuring that important information is not lost at the edges of the image.
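
The effect of stride and padding on the output size follows the standard relation $O = \lfloor (W - F + 2P) / S \rfloor + 1$, where $W$ is the input size along one dimension and $O$ is the output size (both symbols are introduced here for illustration). The sketch below evaluates it for the filter sizes and strides used in the grid search of this notebook; the input size of 40 is a hypothetical value.

```python
def conv_output_size(input_size, filter_size, stride, padding=0):
    """Output length along one dimension of a convolution or pooling layer."""
    return (input_size - filter_size + 2 * padding) // stride + 1


input_size = 40   # hypothetical input size along one dimension

for filter_size in (3, 5):
    for stride in (1, 2):
        out = conv_output_size(input_size, filter_size, stride)
        print(f"F={filter_size}, S={stride}, P=0 -> O={out}")

# With F=3, S=1, a padding of P=1 keeps the output the same size as the input
print(conv_output_size(input_size, filter_size=3, stride=1, padding=1))   # 40
```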

.

6. **Activation functions**: Similar to ANNs, activation functions in CNNs introduce non-linearity, enabling the network to learn complex patterns and representations. Common activation functions include:
  - Sigmoid: $f(x) = \frac{1}{1 + e^{-x}}$
  - Hyperbolic Tangent (tanh): $f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$
  - Rectified Linear Unit (ReLU): $f(x) = \max(0, x)$
  - Leaky ReLU: $f(x) = \max(\alpha x, x)$, where $\alpha$ is a small constant (e.g., 0.01)
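
The four functions listed above can be written directly in NumPy. This is a minimal sketch for illustration only; in practice the built-in Keras activations would normally be used, and the input values below are arbitrary.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    return np.maximum(alpha * x, x)


x = np.array([-2.0, 0.0, 2.0])   # arbitrary example inputs
print(sigmoid(x))
print(tanh(x))
print(relu(x))         # [0. 0. 2.]
print(leaky_relu(x))   # [-0.02  0.    2.  ]
```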

.

7. **Learning rate, Optimizer, Loss function, and Epochs**: The considerations for learning rate, optimizer, loss function, and epochs in CNNs are similar to those in ANNs. These hyperparameters play a critical role in the training process, affecting the speed and quality of convergence to the optimal model weights. The choice of optimizer (e.g., Adam, SGD) and loss function (e.g., Cross-Entropy Loss) is dictated by the specific task and data at hand. The learning rate controls the update magnitude of the model weights during training, and the number of epochs determines how many times the entire training dataset is passed through the network.
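
To illustrate how the learning rate and the number of epochs interact, the sketch below runs plain gradient descent on a toy loss $L(w) = (w - 3)^2$. The loss, the starting point, and the assumption of one weight update per epoch are simplifications made here for illustration; when a Keras model is trained, each epoch consists of many mini-batch updates.

```python
def gradient_descent(learning_rate, epochs, w_start=0.0):
    """Minimize the toy loss L(w) = (w - 3)^2 with plain gradient descent."""
    w = w_start
    for _ in range(epochs):
        grad = 2.0 * (w - 3.0)       # dL/dw
        w -= learning_rate * grad    # step size is scaled by the learning rate
    return w


w_small_lr = gradient_descent(learning_rate=0.0001, epochs=1000)
w_large_lr = gradient_descent(learning_rate=0.1, epochs=1000)
print(w_small_lr, w_large_lr)
```

With the smaller learning rate, the weight is still far from the minimum at $w = 3$ after 1000 updates, whereas the larger learning rate has converged. This is why a small learning rate is usually paired with a large number of epochs.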

### Prepare lists of hyperparameters for grid search

In [None]:
# Hyperparameters for grid search
param_FiltS = [3, 5] # filter(kernel) size (only convolution layer)
param_FiltN = [2, 4] # number of filters   (only convolution layer)
param_Strid = [1, 2] # stride              (only convolution layer)

# Fixed hyperparameters
noOfNeuron    = 10
learningRate  = 0.0001
Epoch         = 1000

# Calculate the number of cases
NoOfCases = len(param_FiltS) * len(param_FiltN) * len(param_Strid)
NoOfCases

In [None]:
# Complete this function
def CNN_model(input_data, noOfNeuron, learningRate, filterSize, numOfFilters, stride):
















    return model
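
One possible way to complete the function above is sketched below. The notebook fixes only the function signature, so everything inside the body is an assumption: a single 'Conv2D' layer with 'same' padding and ReLU, one [2x2] max-pooling layer, one hidden dense layer, a two-unit softmax output, the Adam optimizer, and categorical cross-entropy loss. The dummy input shape in the usage lines is also hypothetical.

```python
import numpy as np
from tensorflow import keras


def CNN_model(input_data, noOfNeuron, learningRate, filterSize, numOfFilters, stride):
    # Assumed architecture: one convolution block followed by one hidden dense layer
    model = keras.Sequential([
        keras.Input(shape=input_data.shape[1:]),          # (height, width, channels)
        keras.layers.Conv2D(filters=numOfFilters,
                            kernel_size=(filterSize, filterSize),
                            strides=(stride, stride),
                            padding='same',
                            activation='relu'),
        keras.layers.MaxPooling2D(pool_size=(2, 2), padding='same'),
        keras.layers.Flatten(),
        keras.layers.Dense(noOfNeuron, activation='relu'),
        keras.layers.Dense(2, activation='softmax'),      # [Normal, Abnormal]
    ])

    # Assumed training setup: Adam optimizer with one-hot (categorical) labels
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=learningRate),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model


# Usage with a hypothetical batch of 4 spectrograms of size 40 x 188 x 1
dummy_input = np.zeros((4, 40, 188, 1))
model = CNN_model(dummy_input, noOfNeuron=10, learningRate=0.0001,
                  filterSize=3, numOfFilters=2, stride=1)
print(model.output_shape)   # (None, 2)
```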

In [None]:
# Create an empty dataframe to store the accuracy results
Accuracy_df = pd.DataFrame(np.zeros(shape=(NoOfCases , 4)),
                           columns=['filter size', 'number of filters', 'stride', 'Accuracy'])
Accuracy_df

### Train the CNN models with different combinations of hyperparameters and save them

In [None]:
# Initialize a count value to store the performance of each model
cnt = 0

# Iterate through all possible combinations of filter size, filter number, and stride

















# Display the resulting dataframe with model performances
Accuracy_df
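
The iteration described in the comments above could be organized as in the following sketch. To keep it independent of TensorFlow and Google Drive, the training step is replaced by a hypothetical callback `train_and_score`; the function `run_grid_search` and the `placeholder_scorer` are also illustrative names, and the placeholder returns 0.0 instead of a real accuracy because no model is trained here. The file-name pattern matches the one used later in this notebook to load the best model.

```python
from itertools import product

param_FiltS = [3, 5]   # filter (kernel) size
param_FiltN = [2, 4]   # number of filters
param_Strid = [1, 2]   # stride


def run_grid_search(train_and_score):
    """Call train_and_score once for every hyperparameter combination."""
    results = []
    for fs, fn, st in product(param_FiltS, param_FiltN, param_Strid):
        model_name = f'CNN_FS{fs}_FN{fn}_St{st}.h5'   # one saved model per case
        accuracy = train_and_score(fs, fn, st, model_name)
        results.append({'filter size': fs, 'number of filters': fn,
                        'stride': st, 'Accuracy': accuracy})
    return results


def placeholder_scorer(fs, fn, st, model_name):
    # Stand-in for: build the model, fit it, save it, and return its test accuracy
    return 0.0


results = run_grid_search(placeholder_scorer)
print(len(results))   # 8
print(results[0])
```

In the notebook itself, each result would instead be written to row 'cnt' of 'Accuracy_df', with 'cnt' incremented after every case.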

### Confirm the grid search results

In [None]:
# Sort the Accuracy_df by 'Accuracy' column in descending order
Accuracy_df_sorted = Accuracy_df.sort_values(by='Accuracy', ascending=False).reset_index(drop=True)

# Output the best case
Best_FiltS = int(Accuracy_df_sorted.iloc[0, 0])
Best_FiltN = int(Accuracy_df_sorted.iloc[0, 1])
Best_Strid = int(Accuracy_df_sorted.iloc[0, 2])

print("[Best case]\n" +
      f"Filter size   : [{Best_FiltS},{Best_FiltS}]\n" +
      f"Num of Filters: {Best_FiltN}\n" +
      f"Strides       : {Best_Strid}\n" +
      f"Accuracy: {Accuracy_df_sorted.iloc[0, 3]:.2f}")

In [None]:
# Calculate mean and standard deviation of accuracy for each filter size
mean_accuracy_FiltS = Accuracy_df.groupby(['filter size'])['Accuracy'].agg(['mean', 'std']).reset_index()
mean_accuracy_FiltS

In [None]:
# Calculate mean and standard deviation of accuracy for each number of filters
mean_accuracy_FiltN = Accuracy_df.groupby(['number of filters'])['Accuracy'].agg(['mean', 'std']).reset_index()
mean_accuracy_FiltN

In [None]:
# Calculate mean and standard deviation of accuracy for each stride
mean_accuracy_Strid = Accuracy_df.groupby(['stride'])['Accuracy'].agg(['mean', 'std']).reset_index()
mean_accuracy_Strid

## [Confusion matrix] for the best CNN model

- A confusion matrix is a table that summarizes the performance of a classification model by displaying the number of true positive (TP), true negative (TN), false positive (FP), and false negative (FN) predictions. The rows represent the true class labels, while the columns represent the predicted class labels. In a binary classification problem:

    - TP: The number of instances where the model correctly predicted the positive class.
    - TN: The number of instances where the model correctly predicted the negative class.
    - FP: The number of instances where the model falsely predicted the positive class (actual negative instances).
    - FN: The number of instances where the model falsely predicted the negative class (actual positive instances).
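
The four counts can be computed by hand from two label vectors, as sketched below. The label vectors are hypothetical toy values, not results from this dataset, and the helper name `confusion_counts` is illustrative. The sketch treats class 1 ('Abnormal' in this tutorial) as the positive class; with that convention, scikit-learn's 'confusion_matrix' for labels 0 and 1 is laid out as [[TN, FP], [FN, TP]].

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for binary label vectors."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn


# Hypothetical labels: 0 = Normal, 1 = Abnormal
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 1, 0, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)   # 3 3 1 1
```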

In [None]:
# Retrieve the filter size, number of filters, and stride from the first row of 'Accuracy_df_sorted'
Best_FiltS = int(Accuracy_df_sorted.iloc[0, 0])
Best_FiltN = int(Accuracy_df_sorted.iloc[0, 1])
Best_Strid = int(Accuracy_df_sorted.iloc[0, 2])

# Load the best CNN model using the retrieved hyperparameters
best_cnn_model_name = f'CNN_FS{Best_FiltS}_FN{Best_FiltN}_St{Best_Strid}.h5'
best_cnn_model = keras.models.load_model('/content/drive/MyDrive/Colab Notebooks/SavedFiles/ML_Models/GridSearch_CNN/' + best_cnn_model_name)

# Predict the output (Robotic spot-welding condition) for the test data
Predicted = best_cnn_model.predict(TestData)

# Convert TestLabel and Predicted into vectors to calculate the confusion matrix and evaluation metrics
TestLabel_rev = np.argmax(TestLabel, axis=1)
Predicted_rev = np.argmax(Predicted, axis=1)

# Plot the confusion matrix
import seaborn as sns
from sklearn.metrics import confusion_matrix

# Calculate the confusion matrix
cm = confusion_matrix(TestLabel_rev, Predicted_rev)

plt.figure(figsize=(6, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap=plt.cm.Blues, cbar=False, square=True)
plt.xlabel("Predicted label")
plt.ylabel("True label")
plt.title("Confusion Matrix of the Best CNN Model")
plt.show()

## [Evaluation metrics] for the best CNN model

1. $Accuracy$: The proportion of correctly classified instances out of the total instances. It measures the overall performance of a classification model.

    - $\text{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$

2. $Precision$: The proportion of true positive instances among the instances predicted as positive. It measures how well the model correctly identifies positive instances.

    - $\text{Precision} = \dfrac{TP}{TP + FP}$

3. $Recall$: The proportion of true positive instances among the actual positive instances. It measures the ability of the model to find all the positive instances.

    - $\text{Recall} = \dfrac{TP}{TP + FN}$

4. $F1\ Score$: The harmonic mean of precision and recall. It provides a single score that balances both precision and recall, which is especially useful when dealing with imbalanced datasets.

    - $\text{F1 Score} = \dfrac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
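
The four formulas above can be evaluated directly from the confusion-matrix counts. The sketch below uses hypothetical counts chosen only for illustration, and the helper name `classification_metrics` is not part of the notebook; returning 0.0 when a denominator is zero is a convention assumed here.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return accuracy, precision, recall, f1


# Hypothetical counts, not results from this dataset
accuracy, precision, recall, f1 = classification_metrics(tp=8, tn=5, fp=2, fn=1)
print(f"Accuracy : {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall   : {recall:.2f}")
print(f"F1 Score : {f1:.2f}")
```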

In [None]:
from sklearn import metrics

# Calculate the evaluation metrics
accuracy  = metrics.accuracy_score(TestLabel_rev, Predicted_rev)
precision = metrics.precision_score(TestLabel_rev, Predicted_rev)
recall    = metrics.recall_score(TestLabel_rev, Predicted_rev)
f1_score  = metrics.f1_score(TestLabel_rev, Predicted_rev)

# Print the evaluation metrics
print(f"Best CNN Model Evaluation:\n")
print(f"Accuracy : {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall   : {recall:.2f}")
print(f"F1 Score : {f1_score:.2f}")