### Predictive Maintenance in the Robotic Arms Industry by applying Digital Twin Technology

This is the main working file. It includes a function that simulates a robotic arm. During operation, the arm produces irregularities. The applied algorithms try to detect these irregularities, classify them as anomalies, and help pinpoint the problematic component.

## Imports

Before running the notebook, make sure that all necessary modules and programs are correctly installed. I recommend Python 3.7.9, since parts of the code caused trouble with other versions.

In [None]:
#machine learning
import shap
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
from sklearn.metrics import f1_score
from sklearn.metrics import confusion_matrix
from sklearn.inspection import permutation_importance
import matplotlib.pyplot as plt
import pyspark

#azure
from azure.digitaltwins.core import DigitalTwinsClient
from azure.identity import DefaultAzureCredential
from azure.identity import VisualStudioCodeCredential

#additional installs
import os
import time

#python scripts
import anomaly_detection as ad
import digital_twin_azure as dt
import predictive_maintenance as pm

## Connecting to Azure 

The following code connects your environment to the Azure platform. A browser window will pop up and ask you to log in to your Microsoft account. The login information is then stored so the SDK can access it easily.

In [None]:
!az login

## Credentials

The following code chunk uses the credentials received by the previous step. It will build a service client which will be needed to update the Digital Twin, or run certain queries.

In [None]:
#define the URL of your Digital Twin instance on the Azure platform
url = "SeleniumForest.api.weu.digitaltwins.azure.net"

#store the gathered credentials in a variable
credential = DefaultAzureCredential()

#create an instance of the Digital Twin client
#it can be reused later on
service_client = DigitalTwinsClient(url, credential)
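
Twin updates through the SDK are expressed as a JSON Patch document that is passed to `DigitalTwinsClient.update_digital_twin`. The sketch below builds such a patch from a dictionary of sensor readings. The helper name `build_twin_patch` is hypothetical, and it assumes that every reading maps to a twin property of the same name; the real property names depend on the model behind your twin.

```python
def build_twin_patch(readings, op="replace"):
    """Turn {property: value} into a JSON Patch list (hypothetical helper)."""
    if op not in ("add", "replace"):
        raise ValueError("op must be 'add' or 'replace'")
    # One patch operation per property, addressed by a JSON Pointer path
    return [
        {"op": op, "path": f"/{name}", "value": value}
        for name, value in readings.items()
    ]


# Made-up readings for illustration only
patch = build_twin_patch({"Tool_Temperature": 41.5, "Anomaly_State": 0})

# With a connected client, the patch could then be sent like this:
# service_client.update_digital_twin("RoboArm", patch)
```

As far as I know, `replace` only works for properties that already hold a value, so `op="add"` is the safer choice for the very first update of a twin.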

## Load Data Set

Next, we will load the data set into a pandas dataframe. Make sure that the Data directory exists within the current working directory.

In [None]:
#store the CSV content in a dataframe
df = pd.read_csv('./Data/right_arm.csv')
df.head()

## Pre-Processing

This section covers the pre-processing of the dataframe. First, all blank spaces within the column names are replaced by underscores. This makes working with the names a lot easier, as it allows simple copy and paste and avoids processing errors. In addition, the function extract_every_nth_row() can be used to reduce the processing time. It extracts every n-th row of the dataframe. Depending on the available computational resources, n can be adjusted, or the step can be skipped entirely.

After that, the defined features and the target are split into a training and a test set. Lastly, the index is reset.

In [None]:
#replace blank spaces in the column names with '_'
df.columns = df.columns.str.replace(' ', '_')

#use only required columns
df = df[['Norm_of_Cartesion_Linear_Momentum', 'Robot_Current', 'Tool_Current', 'Tool_Temperature', 'TCP_Force', 'Anomaly_State']]

#optional: make the dataset smaller for testing
#since the dataset is quite big, one can keep this
#code block to keep only every n-th row
def extract_every_nth_row(df, n):
    return df.iloc[::n].copy()

n = 100

df = extract_every_nth_row(df, n)

#separate the features (X) and the target (y)
#features
X = df[['Norm_of_Cartesion_Linear_Momentum', 'Robot_Current', 'Tool_Current', 'Tool_Temperature', 'TCP_Force']]
#target
y = df['Anomaly_State'] 

#Train & Test Split
#for now the testsize is 20% of the total dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

#reset the index to make the subsets iterable again
X_train = X_train.reset_index(drop=True)
X_test = X_test.reset_index(drop=True)
y_train = y_train.reset_index(drop=True)
y_test = y_test.reset_index(drop=True)
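
To see why the index reset matters, here is a small sketch on a made-up frame (the values are not taken from right_arm.csv). Slicing with a step keeps the original row labels, so labels and positions only line up again after `reset_index(drop=True)`.

```python
import pandas as pd

# Toy frame standing in for the real data set (values are made up)
toy = pd.DataFrame({"Robot_Current": range(10), "Anomaly_State": [0, 1] * 5})

# Keep every 3rd row, starting with the first one
sampled = toy.iloc[::3].copy()
print(sampled.index.tolist())   # original labels survive: [0, 3, 6, 9]

# After the reset, labels and positions coincide again
sampled = sampled.reset_index(drop=True)
print(sampled.index.tolist())   # [0, 1, 2, 3]
```

The same applies to the shuffled subsets returned by train_test_split(): if a later step looks rows up by label, for example with `.loc[index]`, a counter that starts at 0 only works once the index has been reset.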

## Simulation

The following code creates and runs the simulation. The function start_machine() ties together all previously created Python scripts. Running start_machine() simulates the startup of the robotic arm, which then receives sensory data from the simulation data set.

First, the code makes sure that the environment is correctly connected to the Azure instance. A browser window might pop up, asking you to log in with your personal credentials. Next, train_model() is used to build a random forest prediction model. Once that step is completed, the simulated robotic arm starts sending sensory data to the digital environment. By running predict_model(), the algorithm predicts the current anomaly state of the machine using the pre-trained model. The prediction is then sent to the Digital Twin together with all received sensory data. Additionally, a dashboard-like plot is displayed, showing all current workloads and the machine's anomaly state. If an anomaly is detected, an alarm pops up to notify the user. Furthermore, a SHAP bar plot appears. This plot shows which components were relevant for the algorithm's predictions, so domain experts can take a closer look at the components identified as faulty.

This continues until the simulation data runs out or the process reaches the predefined threshold (see "set iterations for simulation run").

In [None]:
def start_machine(model_name: str, df, df_sim, index):
    print('Connecting to the Azure platform...')
    service_client = dt.connect_azure()

    print('Machine starting up...')
    print('Training ML algorithm...')
    rf = ad.train_model(X_train, X_test, y_train, y_test)

    print('Model has been trained')

    #set as global to calculate resulting scores
    global y_predicted
    y_predicted = None

    #set empty list to store predictions
    global y_predicted_list
    y_predicted_list = []

    #set iterations for simulation run
    #the loop also stops when the simulation data runs out
    while index < 5 and index < len(df_sim):
        y_predicted = ad.predict_model(df_sim, index, rf)
        y_predicted_list.append(y_predicted)
        dt.update_machine(model_name, df_sim, index, y_predicted)
        dt.plot_twin_state()

        if y_predicted == 1:
            print('Anomaly detected!')
        else:
            print('No anomaly detected.')

        index += 1

    print('Simulation complete. Generating SHAP beeswarm and partial dependence plot...')

    #run function to generate beeswarm for all instances of the simulation data.
    #also prints pdp for chosen feature and single instance with index 0.
    pm.explain_prediction(X_train, y_train, df_sim, 0)

    return y_predicted
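
The stopping rule of the simulation can be looked at in isolation. The following is a stripped-down sketch of the loop only, without the Azure calls and with a stand-in predictor; `run_simulation` and `threshold_predict` are hypothetical names and the threshold of 50 is made up. Note one deliberate difference: the cap here counts processed rows instead of comparing the index to a fixed number, so a non-zero start index still yields up to `max_iterations` predictions.

```python
def run_simulation(rows, predict, start=0, max_iterations=5):
    """Feed rows to `predict` one at a time and collect the results."""
    predictions = []
    index = start
    # Stop at the iteration cap or at the end of the data, whichever comes first
    while index < len(rows) and len(predictions) < max_iterations:
        predictions.append(predict(rows[index]))
        index += 1
    return predictions


# Stand-in predictor: flag a reading as anomalous above a made-up threshold
def threshold_predict(row):
    return int(row > 50)


print(run_simulation([10, 80, 20], threshold_predict))  # [0, 1, 0]
```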

The following code chunk starts the machine. The initial index defines the starting point within the simulation data set. In this case, the X_test set is used as the simulation data set, which makes it possible to compare the prediction results with the actual anomalies.

Note that SHAP might have issues displaying the plot due to interdependencies with other plotting modules. In most cases, calling shap.initjs() fixes this. If the issue persists, I recommend storing the plot as a jpg or png file and opening it manually on your machine instead of within the IDE.
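
Saving a plot to disk can be sketched as follows. The helper `save_current_figure` is a hypothetical name, not part of SHAP or of the project scripts, and the file name is only an example. It relies on the SHAP plotting functions accepting `show=False`, which leaves the figure open so that matplotlib can write it out.

```python
import matplotlib.pyplot as plt


def save_current_figure(path, dpi=150):
    """Write the active matplotlib figure to disk and close it (hypothetical helper)."""
    fig = plt.gcf()
    fig.savefig(path, dpi=dpi, bbox_inches="tight")
    plt.close(fig)
    return path


# Intended use with SHAP: draw without showing, then save.
# shap.plots.beeswarm(shap_values, show=False)
# save_current_figure("shap_beeswarm.png")
```

PNG is the safer choice of the two formats, since matplotlib writes it without extra dependencies.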

In [None]:
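#load the JavaScript that SHAP needs to render its plots in the notebook
shap.initjs()

#the last argument sets the starting index for the simulation data set
start_machine('RoboArm', df, X_test, 0)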

### Simulation Results
The following code can be used to evaluate the digital twin's performance more easily.

In [None]:
#compare only the rows that were actually simulated
#(assumes that the simulation started at index 0)
y_true = y_test[:len(y_predicted_list)]

cm = confusion_matrix(y_true, y_predicted_list)
print("Confusion Matrix:")
print(cm)

#Accuracy of simulation
accuracy = accuracy_score(y_true, y_predicted_list)
print("Accuracy:", accuracy)

#Precision of simulation
precision = precision_score(y_true, y_predicted_list)
print("Precision:", precision)

#F1 score of simulation
#stored as f1 so that the imported f1_score function is not overwritten
f1 = f1_score(y_true, y_predicted_list, average='weighted')
print("F1 score:", f1)

comparison_df = pd.DataFrame({
    'y_test': y_true,
    'prediction_results': y_predicted_list
})
comparison_df.head(50)
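
To make the scores easier to interpret, the sketch below derives them by hand from the four confusion matrix counts. It only illustrates the definitions for binary labels; the function name `binary_metrics` and the example labels are made up.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("need at least one label pair")
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # anomalies found
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal, left alone
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed anomalies
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for illustration only
print(binary_metrics([0, 1, 1, 0, 1], [0, 1, 0, 0, 1]))
```

In this example one anomaly is missed and no false alarm is raised, so precision is perfect while recall is 2/3. An F1 score computed with average='weighted', as in the cell above, averages the per-class scores by class frequency and will generally differ from this binary value.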

### Beeswarm Plot to visualize SHAP values during Simulation

In [None]:
#This code will print a SHAP beeswarm plot of the simulation data (X_test)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_predicted = model.predict(X_test)

explainer = shap.Explainer(model)
shap_values = explainer(X_test)

positive_indices = np.where(y_predicted == 1)[0]
shap_values_positive = shap_values[positive_indices, :, 1]

shap.plots.beeswarm(shap_values_positive)
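
A beeswarm plot is read visually, but the same information can be condensed into a ranking. One common summary is the mean absolute SHAP value per feature, sketched below on made-up numbers. The function name `rank_features` is hypothetical; with the objects from the cell above it could presumably be called as `rank_features(shap_values_positive.values, list(X_test.columns))`.

```python
import numpy as np


def rank_features(shap_matrix, feature_names):
    """Sort features by mean absolute SHAP value, largest first."""
    values = np.asarray(shap_matrix, dtype=float)
    if values.ndim != 2 or values.shape[1] != len(feature_names):
        raise ValueError("expected one column of SHAP values per feature")
    # Average the magnitude over all samples, ignoring the sign
    importance = np.abs(values).mean(axis=0)
    order = np.argsort(importance)[::-1]
    return [(feature_names[i], float(importance[i])) for i in order]


# Made-up SHAP values: 3 samples x 2 features
toy_shap = [[0.1, -0.6], [0.2, 0.4], [-0.3, 0.5]]
for name, score in rank_features(toy_shap, ["Robot_Current", "Tool_Temperature"]):
    print(name, round(score, 3))
```

The feature at the top of the list is the one that moved the anomaly predictions the most on average, which makes it the first candidate for inspection.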

### Partial Dependence Plot 

In [None]:
#This cell and the next one generate a SHAP partial dependence plot for a specific feature.
#The feature can be chosen by changing the first argument of shap.partial_dependence_plot().

#select only the rows that were predicted as anomalies
selected_X_test = X_test.iloc[positive_indices, :]
selected_X_test

#positions of the actual anomalies in y_test
selected_y_test = np.where(y_test == 1)
selected_y_test

In [None]:
#show partial dependence plot
sample_ind=4
shap.partial_dependence_plot(
    "Tool_Temperature", model.predict, selected_X_test, model_expected_value=True,
    feature_expected_value=True, ice=False,
    shap_values=shap_values_positive[sample_ind:sample_ind+1,:]
)
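
For readers unfamiliar with partial dependence, the idea behind the curve can be sketched in a few lines: force one feature to a fixed value for every row, average the model output, and repeat for each value on a grid. The code below is a simplified illustration with a made-up linear model, not the SHAP implementation; `partial_dependence` and `toy_predict` are hypothetical names.

```python
import numpy as np


def partial_dependence(predict, X, feature_idx, grid):
    """Average prediction when one feature is forced to each grid value."""
    X = np.asarray(X, dtype=float)
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature_idx] = value  # overwrite the feature for all rows
        averages.append(float(np.mean(predict(X_mod))))
    return averages


# Toy model: prediction = 2 * feature 0 + feature 1
def toy_predict(X):
    return 2 * X[:, 0] + X[:, 1]


X_toy = [[0.0, 1.0], [1.0, 3.0]]
print(partial_dependence(toy_predict, X_toy, 0, [0.0, 1.0, 2.0]))  # [2.0, 4.0, 6.0]
```

The slope of 2 in the result mirrors the coefficient of feature 0 in the toy model. For the random forest, the curve is usually not linear, and steep regions indicate value ranges where the feature strongly changes the prediction.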

In [None]:
pm.create_shap_beeswarm_all(X_train, y_train, X_test, 0)