# DeviceEngine Class

Dedicated engine for device data, inheriting from the core engine. Each DeviceEngine object represents a unique device with its own set of processing parameters and results.

In [1]:
from src.StreamPort.device.DeviceEngine import DeviceEngine

SyntaxError: f-string: unmatched '[' (DeviceEngine.py, line 411)

ExtractedLabelled contains the simulated errors from the 2D-LC. Batches titled 'Basis' and 'Basis_post' are considered "true" curves, i.e., they represent standard conditions with no faults.

In [None]:
#specify path to get analyses from
base_dir = r'C:\Users\PC0118\Desktop\ExtractedSignals'
#base_dir = r'C:\Users\PC0118\Desktop\ExtractedLabelled'

Create a DeviceEngine object that reads from `base_dir` and print it. It holds no analyses yet.

In [None]:

dev = DeviceEngine(source = base_dir)
dev.print()

A DeviceEngine object created without an explicit source operates on the files in the current working directory.

In [None]:
#dev1 = DeviceEngine()
#dev1.print()
#print(dev1._source)
#del dev1

# ProjectHeaders Class

In [None]:
from src.StreamPort.core import ProjectHeaders

In [None]:
head = ProjectHeaders.ProjectHeaders(dtype = '2-D')

In [None]:
print(head.headers)

In [None]:
head.print()

Add project headers. They can be passed as a ProjectHeaders object or as a dict.

In [None]:
dev.add_headers(headers = {'name': 'Pressure Curve Analysis', 'author': 'Sandeep H.'})
dev.print()

# DeviceAnalysis Class

Each DeviceAnalysis object is a child of the Analysis Class. It holds the details of an Analysis for each individual device.

In [None]:
#from src.StreamPort.device.DeviceAnalysis import DeviceAnalysis

#Creates an empty DeviceAnalysis object and prints it
#devAnalysis = DeviceAnalysis()
#devAnalysis.print()

 
DeviceEngine's find_analyses() method returns a DeviceAnalysis object or a list of DeviceAnalysis objects. It also prints the dataframes for each unique method, paired with the metadata (date, runtime) of each curve.

The method uses the engine's source variable as the path to a directory containing analyses and searches that path.

The path can refer to a directory holding a group of experiments, such as "210812_Gem 2021-08-12 09-49-10", or to a single experiment with its own method-related analysis data, such as "210812_Gem--005.D" or "210812_Gem--007.D".
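As a rough illustration of the directory layout described here, the sketch below collects `.D` analysis folders under a path. The helper `list_analysis_dirs` is hypothetical and not part of StreamPort; `find_analyses()` may discover analyses differently.

```python
import tempfile
from pathlib import Path


def list_analysis_dirs(source):
    """Return all '.D' folders at or below `source`, sorted by path.

    Hypothetical helper for illustration only; not the StreamPort API.
    """
    source = Path(source)
    # A path that is itself a '.D' folder counts as a single analysis.
    if source.is_dir() and source.suffix == ".D":
        return [source]
    return sorted(p for p in source.rglob("*.D") if p.is_dir())


# Build a throwaway directory tree that mimics the layout described above.
with tempfile.TemporaryDirectory() as tmp:
    batch = Path(tmp) / "210812_Gem 2021-08-12 09-49-10"
    for name in ("210812_Gem--005.D", "210812_Gem--007.D"):
        (batch / name).mkdir(parents=True)

    found_in_batch = [p.name for p in list_analysis_dirs(batch)]
    found_single = [p.name for p in list_analysis_dirs(batch / "210812_Gem--005.D")]

print(found_in_batch)
print(found_single)
```

Pointing the helper at the batch folder yields every `.D` folder inside it, while pointing it at one `.D` folder yields just that folder.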



Read analysis objects from engine.

In [None]:
analyses = dev.find_analyses()

Each DeviceEngine object has an attribute _method_ids that records all methods encountered in the analysis of the current Device.

In [None]:
print(dev._method_ids)

Each object also has an attribute _history that holds data on all experiments related to this device.

Add analyses objects that were found using find_analyses() to current device records.

Add analyses in the form of individual DeviceAnalysis objects or a list of such objects.

In [None]:
dev.add_analyses(analyses)

In [None]:
dev.print()

In [None]:
#ana = dev.get_analyses('09:59:42')

get_analyses() always returns a list, even if it contains only one element.

In [None]:
#ana[0].print()

Because the above analysis from the labeled data (09:59:42) deviates slightly from the remaining 'basis' data, it is removed from the dataset, as are the extreme anomalies (001-blank), so that the training data is not polluted.

In [None]:
#dev.remove_analyses('Device Pressure Analysis - 240930_Mix-1_training-data_basis_post 2024-09-30 09-58-37| Start time: 09:59:42 09/30/24')

In [None]:
#dev.print()

# Plot Analyses

DeviceEngine's *plot_analyses()* and *plot_results()* call each analysis object's *plot()* function after dynamically grouping related analyses.
Grouping is based on unique method ids paired with unique experiment dates.
The 'group_by' (str) argument controls how the data is grouped: 'method' (the default) or 'date'.

Plot analyses by calling the built-in plot function and passing each object's index as an argument.

Plot analyses by a word or part of a word present in the analysis date.

Plot all available analyses by omitting the 'analyses' argument.
'group_by' defaults to 'method'.

In [None]:
#dev.plot_analyses('basis', group_by='method')

In [None]:
dev.plot_analyses(group_by='method')

# ProcessingSettings - Feature Extraction

Create a new ProcessingSettings object 

In [None]:
from src.StreamPort.device.DeviceProcSettings import ExtractPressureFeatures

The 'weighted' argument of ExtractPressureFeatures controls whether the pressure curves are first transformed into the percentage change between adjacent datapoints.
It defaults to False, in which case features are extracted from the raw pressure curves.
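To make the transformation concrete, here is a minimal sketch of a percentage-change calculation between adjacent datapoints. The function name `percent_change` and the sample values are assumptions; the actual implementation inside ExtractPressureFeatures may differ.

```python
import numpy as np


def percent_change(curve):
    """Percentage change between adjacent points of a 1-D curve."""
    curve = np.asarray(curve, dtype=float)
    # Difference to the previous point, relative to the previous point.
    return np.diff(curve) / curve[:-1] * 100.0


pressure = np.array([100.0, 110.0, 99.0, 99.0])  # made-up pressure values
weighted_curve = percent_change(pressure)
print(weighted_curve)
```

The transformed curve is one point shorter than the input and describes relative changes rather than absolute pressure.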

In [None]:
settings = ExtractPressureFeatures(weighted=False)

Add processing settings

In [None]:
dev.add_settings(settings)
dev.print()

Now we run the settings to extract pressure features after adding analyses.

In [None]:
pressure_features = settings.run(dev)

In [None]:
print(pressure_features)

Add the extracted features to the results (dict) attribute

In [None]:
dev.add_results(pressure_features)

Retrieve the stored results associated with the current object.

# ProcessingSettings - Seasonal Decomposition

Create a new ProcessingSettings object to extract seasonal components from analyses.

In [None]:
from src.StreamPort.device.DeviceProcSettings import DecomposeCurves

The 'period' argument of DecomposeCurves controls the window size over which the features are calculated. It is set to 30 here.
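The role of a period in a seasonal decomposition can be sketched with a simplified classical additive decomposition in NumPy. This is an illustrative stand-in, not the method DecomposeCurves uses; `decompose_additive` and the synthetic curve are assumptions.

```python
import numpy as np


def decompose_additive(signal, period):
    """Split a 1-D signal into trend, seasonal and residual parts.

    Simplified classical additive decomposition, for illustration only.
    """
    signal = np.asarray(signal, dtype=float)
    # Trend: moving average over one full period (edges are approximate).
    kernel = np.ones(period) / period
    trend = np.convolve(signal, kernel, mode="same")
    detrended = signal - trend
    # Seasonal: mean of the detrended values at each position in the period.
    phase = np.arange(signal.size) % period
    pattern = np.array([detrended[phase == k].mean() for k in range(period)])
    seasonal = pattern[phase]
    residual = signal - trend - seasonal
    return trend, seasonal, residual


t = np.arange(300)
curve = 50.0 + 0.01 * t + 2.0 * np.sin(2 * np.pi * t / 30)  # synthetic curve
trend, seasonal, residual = decompose_additive(curve, period=30)
print(seasonal[:30].round(2))
```

A period that matches the true cycle length yields a seasonal component that repeats exactly every `period` samples; a mismatched period smears the cycle into the residual.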

In [None]:
curve_decompose = DecomposeCurves(period=30)  # period was 30, defaults to 10; try both

In [None]:
dev.add_settings(curve_decompose)
dev.print()

In [None]:
seasonal_components = curve_decompose.run(dev)
print(seasonal_components)

In [None]:
dev.add_results(seasonal_components)

In [None]:
dev.get_results(-1)

Each .D folder is an analysis with a timestamp.

The latest entry in analyses contains the most up-to-date results.

# ProcessingSettings - Fourier Transformation

Create a new ProcessingSettings object to perform a fast Fourier transform on the raw curve and on the seasonal component obtained from the time-series decomposition.
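As a hedged illustration of what a Fourier transform of a curve provides, the sketch below recovers the dominant frequency of a synthetic oscillation with NumPy. The sampling rate and signal are assumed values, and FourierTransform itself may process the data differently.

```python
import numpy as np

sampling_rate = 100.0  # samples per second (assumed value)
t = np.arange(200) / sampling_rate
# Synthetic stand-in for a pressure curve: offset plus a 5 Hz oscillation.
curve = 200.0 + 3.0 * np.sin(2 * np.pi * 5.0 * t)

# Remove the mean so the zero-frequency bin does not dominate.
spectrum = np.abs(np.fft.rfft(curve - curve.mean()))
freqs = np.fft.rfftfreq(curve.size, d=1.0 / sampling_rate)
dominant_freq = freqs[np.argmax(spectrum)]
print(dominant_freq)
```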

In [None]:
from src.StreamPort.device.DeviceProcSettings import FourierTransform

In [None]:
fourier_transform = FourierTransform()

In [None]:
dev.add_settings(fourier_transform)
dev.print()

In [None]:
transformed_seasonal = fourier_transform.run(dev)
print(transformed_seasonal)

In [None]:
dev.add_results(transformed_seasonal)

In [None]:
dev.get_results(-1)

Scaled results are unavailable since the data has not been scaled yet.

Adding features before scaling:
scale_features() calls add_extracted_features() before grouping and scaling data.

add_extracted_features() introduces new features that were extracted from the behaviour of the seasonal and noise components of the raw curves in the frequency domain. These frequencies were binned and averaged in different time-windows and added as features.
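One possible reading of "binned and averaged in different time-windows" is sketched here: the signal is cut into time windows, each window is transformed, and neighbouring frequencies are averaged into a few bins. The function name, window count and bin count are assumptions, not the behaviour of add_extracted_features().

```python
import numpy as np


def binned_spectrum_features(signal, n_windows, n_bins):
    """Mean FFT magnitude per frequency bin, for each time window.

    Returns an array of shape (n_windows, n_bins). Illustrative only.
    """
    signal = np.asarray(signal, dtype=float)
    features = []
    for window in np.array_split(signal, n_windows):
        magnitude = np.abs(np.fft.rfft(window - window.mean()))
        # Group neighbouring frequencies and average each group.
        bins = np.array_split(magnitude, n_bins)
        features.append([b.mean() for b in bins])
    return np.array(features)


rng = np.random.default_rng(0)
noise = rng.normal(size=240)  # synthetic noise component
features = binned_spectrum_features(noise, n_windows=4, n_bins=3)
print(features.shape)
```

Each row can then be flattened into named feature columns such as "window 1, low-frequency bin".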

Two further features are added: the idle time of the batch and the error between the defined and the measured runtime.

# ProcessingSettings - Feature Scaling  


Scale extracted and engineered features to improve the quality of the information we get from them. These prove more useful when visually analysing data

In [None]:
from src.StreamPort.device.DeviceProcSettings import Scaler

The user selects the type of scaler from the preloaded options: 'minmax', 'std' (Standard), 'robust', 'maxabs', 'norm' (Normalizer).
The scaler defaults to Normalizer in the absence of an argument.

The 'replace' argument lets the user replace existing features with the scaled features or create a new entry instead. Defaults to False.
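The option strings plausibly correspond to the scikit-learn preprocessing classes of the same name. The mapping below is an assumption made for illustration, and `apply_scaler` is a hypothetical helper rather than the Scaler class itself.

```python
import numpy as np
from sklearn.preprocessing import (
    MaxAbsScaler,
    MinMaxScaler,
    Normalizer,
    RobustScaler,
    StandardScaler,
)

# Assumed mapping from option strings to scikit-learn classes.
SCALERS = {
    "minmax": MinMaxScaler,
    "std": StandardScaler,
    "robust": RobustScaler,
    "maxabs": MaxAbsScaler,
    "norm": Normalizer,
}


def apply_scaler(features, kind="norm"):
    """Scale a 2-D feature matrix with the scaler named by `kind`."""
    return SCALERS[kind]().fit_transform(features)


# Made-up feature matrix with one extreme value in the first column.
features = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [100.0, 40.0]])
robust_scaled = apply_scaler(features, "robust")
minmax_scaled = apply_scaler(features, "minmax")
print(robust_scaled)
```

RobustScaler centres on the median and scales by the interquartile range, so a single extreme value distorts the other samples less than it would with min-max scaling.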

In [None]:
feature_scaler = Scaler(parameters='robust')

In [None]:
dev.add_settings(feature_scaler)
dev.print()

In [None]:
scaled_features = feature_scaler.run(dev)

In [None]:
dev.add_results(scaled_features)

In [None]:
dev.print()

# Plot Results 

Plot the computed results of feature extraction. The 'features' argument selects what is plotted: *base* for the base features, *decompose* for the seasonal decomposition, *transform* for the Fourier transform.

The raw pressure curves can be plotted by omitting the 'features' argument, in which case only the curves are shown and not the *results* of feature extraction.

In [None]:
#this_method = 'Pac' 

'group_by' allows user to group data either by 'date' or 'method':
1. 'date' prepares data with weight on experiment date. So matching methods on different dates will not be grouped.
2. 'method' prepares data purely on method and groups all available data for the given method.
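The difference between the two grouping modes can be sketched with a small pandas table. The column names and the helper `group_runs` are assumptions for illustration; the engine's internal grouping may be organised differently.

```python
import pandas as pd

# Hypothetical metadata table; column names are assumptions.
runs = pd.DataFrame({
    "method": ["Pac", "Pac", "Pac", "Irino_Kali"],
    "date": ["2024-09-30", "2024-09-30", "2024-10-01", "2024-09-30"],
    "analysis": ["run_1", "run_2", "run_3", "run_4"],
})


def group_runs(frame, group_by="method"):
    """Group analyses by method only, or by method and date."""
    keys = "method" if group_by == "method" else ["method", "date"]
    return {key: grp["analysis"].tolist() for key, grp in frame.groupby(keys)}


by_method = group_runs(runs, "method")
by_date = group_runs(runs, "date")
print(len(by_method), len(by_date))
```

With 'method', all three 'Pac' runs share one group; with 'date', the 'Pac' run from the second day forms a group of its own.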

In [None]:
#dev.plot_results(this_method, group_by='method', interactive=True, scaled=False)

Select features to plot. The 'scaled' argument toggles between plots of scaled and unscaled features. Defaults to True.


In [None]:
#dev.plot_results(results = 'Pac', features ='base', scaled=True, transpose=True, group_by='method')

In [None]:
#dev.plot_results(results = ['11:35:38', '15:53:24'], features ='base', transpose=True, interactive=True, scaled=True, type='bar')

Use the 'interactive' argument to toggle between static and interactive plots.

In [None]:
#dev.plot_results(results = ['11:35:38', '15:53:24'], features ='decompose', scaled=False, interactive= True)

Setting 'type' to 'box' produces a box plot of the data. The options available by default are 'box' and 'scatter'.

In [None]:
#dev.plot_results(results = ['11:35:38', '15:53:24'], features ='transform', scaled=False)

# MachineLearning - Isolation Forest for preliminary classification  

Add class labels to the analysis objects after feature analysis. The first analysis, '001-blank', is assigned a separate class of ML operations because it is a systematic fault.

classify() dynamically assigns class labels, through MLEngine's make_iso_forest(), to all analyses encountered and classified.

First, create a MachineLearningEngine object to enable ML ops on prepared data.

In [None]:
from src.StreamPort.ml.MachineLearningEngine import MachineLearningEngine

Alternative to running iso_forest to create each engine object

get_device_data() of each MLEngine object extracts the relevant, preprocessed data from a given device and updates it to conform to the MLAnalysis object specification.
The isolation forest was designed with this method built in.

In [None]:
ml_engine = MachineLearningEngine()

The random_state (int) argument can be specified to reproduce results. It defaults to None, which uses a random seed.
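The effect of a fixed seed can be checked directly with scikit-learn's IsolationForest, which MakeModelIsoForest presumably wraps (an assumption). The data here is synthetic and `labels_for_seed` is a hypothetical helper.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))  # synthetic feature matrix


def labels_for_seed(seed):
    """Fit an isolation forest with the given seed and return its labels."""
    model = IsolationForest(contamination=0.15, random_state=seed)
    return model.fit_predict(X_demo)  # 1 = inlier, -1 = outlier


first = labels_for_seed(22)
second = labels_for_seed(22)
print(np.array_equal(first, second))
```

Two fits with the same seed on the same data give identical labels, which makes it possible to compare later pipeline changes against a stable baseline.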

In [None]:
from src.StreamPort.ml.MachineLearningProcessingSettings import MakeModelIsoForest
iso_forest = MakeModelIsoForest(dev, random_state=22)#22 seemed to pick better train sets

In [None]:
ml_engine.add_settings(iso_forest)
ml_engine.print()

In [None]:
method_objects = iso_forest.run(ml_engine)

In [None]:
print(method_objects)

In [None]:
pac_engine = method_objects[-2][0]
points = pac_engine.get_data()
pac_anomalies = pac_engine.get_anomalies()
print('dataset', points)
print('anomalies:', pac_anomalies)

In [None]:
import pandas as pd

In [None]:
# Write the anomalies detected for each method to its own CSV file
for i, method_object in enumerate(method_objects):
    engine = method_object[0]
    anomalies = engine.get_anomalies()
    anomalies.to_csv(f'anomalies_{i}.csv', index=False)

In [None]:
pac_preds = method_objects[-2][1]
print('target var', pac_preds)

In [None]:
X = points
y = pac_preds

# Evaluation Metrics - Cross-validation


In [None]:
from sklearn.model_selection import cross_val_score

In [None]:
from sklearn.ensemble import IsolationForest as iso
model = iso(contamination= 0.15, bootstrap= True, random_state=22)

In [None]:
#cv_scores = cross_val_score(model, X, y, cv=5)

# MachineLearning - PCA

make_iso_forest() of the MLEngine class runs the isolation forest, plots the results, and then automatically creates an MLEngine sub-object for each group of analyses per unique method. It can be modified later to save results.

PCA is first used for dimensionality reduction before applying DBSCAN, and also as an alternative outlier detection model.
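One common way to use PCA for outlier detection is the reconstruction error: samples that lie far from the retained components are suspicious. The sketch below shows this on synthetic data with a planted outlier; it is an assumed approach, not necessarily what MakeModelPCASKL does.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic features: only the first two columns carry real variation.
X_demo = 0.01 * rng.normal(size=(60, 5))
X_demo[:, :2] += rng.normal(size=(60, 2))
X_demo[0, 2] += 3.0  # planted outlier that leaves the main 2-D plane

pca_model = PCA(n_components=2)
scores = pca_model.fit_transform(X_demo)
reconstructed = pca_model.inverse_transform(scores)

# Reconstruction error: how far each sample lies from the PCA plane.
errors = np.linalg.norm(X_demo - reconstructed, axis=1)
most_anomalous = int(np.argmax(errors))
print(most_anomalous)
```

The two-dimensional `scores` are what a clustering step such as DBSCAN would consume, while `errors` gives a separate anomaly ranking.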

In [None]:
from src.StreamPort.ml.MachineLearningProcessingSettings import MakeModelPCASKL
pca = MakeModelPCASKL(n_components = 2, center_data= True)

In [None]:
#pac_engine = method_engines['Pac_engine']
#ikali_engine = method_engines['Irino_Kali_engine']

pac_engine.add_settings(pca)
pca_scores = pca.run(pac_engine)
pac_engine.add_results(pca_scores)
pca_results = pac_engine.get_results('pca_model')

In [None]:
print("Principal components:\n", pca_results[1].components_)
print("Explained variance ratio:", pca_results[1].explained_variance_ratio_)

In [None]:
print("Transformed data:\n", pca_results[0])

In [None]:
dev.plot_results('basis_post')

In [None]:
dev.plot_results(results ='basis_post', features ='base', scaled=True, transpose=False, group_by='method')

In [None]:
dev.plot_results(results ='basis_post', features ='base', scaled=True, transpose=True, group_by='method')

In [None]:
import matplotlib.pyplot as plt

In [None]:

x_pca = pca_results[0]
plt.figure(figsize=(8, 6))
plt.scatter(x_pca[:, 0], x_pca[:, 1], color='blue', label='Data Points')

# Add labels and title
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('PCA Score Plot')

sample_names = list(points.index)
samples = [(point.split('|')[-1]).split(' ')[-2] for point in sample_names]

print(len(x_pca))
# Optional: Add text annotations for each point
for i in range(len(x_pca)):
    plt.text(x_pca[i, 0], x_pca[i, 1], samples[i], fontsize=12)

plt.legend()
plt.grid(True)
plt.show()

# Optionally print the explained variance
print("Explained variance ratio:", pca_results[1].explained_variance_ratio_)

# Machine Learning - DBSCAN

In [None]:
from sklearn.cluster import DBSCAN
eps = 0.8 # Adjust based on the k-distance plot
min_samples = 6 # Adjust based on your data
dbscan = DBSCAN(eps=eps, min_samples=min_samples)
dbscan_results = dbscan.fit_predict(x_pca)
print(dbscan_results)
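The k-distance values mentioned in the comment on `eps` can be computed with scikit-learn's NearestNeighbors. This is a sketch on synthetic 2-D points standing in for the PCA scores; plotting the sorted values and reading off the elbow is one common heuristic for choosing `eps`.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
points_2d = rng.normal(size=(80, 2))  # synthetic stand-in for PCA scores
min_samples = 6

# Querying the training data returns each point itself first (distance 0),
# so the last column is the distance to the (min_samples - 1)-th other point.
# This matches DBSCAN, whose min_samples count includes the point itself.
neighbors = NearestNeighbors(n_neighbors=min_samples).fit(points_2d)
distances, _ = neighbors.kneighbors(points_2d)
k_distances = np.sort(distances[:, -1])
print(k_distances[[0, len(k_distances) // 2, -1]].round(3))
```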

In [None]:
plt.figure(figsize=(8, 6))
plt.scatter(x_pca[:, 0], x_pca[:, 1], c=dbscan_results, cmap='viridis', marker='o', s=50)
plt.title('DBSCAN Clustering')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.colorbar(label='Cluster Label')
plt.show()

# Machine Learning - Partial Least Squares 

Partial least squares is used as an additional metric to identify anomalies.

In [None]:
#print(points)

In [None]:
#from sklearn.cross_decomposition import PLSRegression

In [None]:
"""
# Step 3: Fit PLS regression model (let's choose 2 components)
pls = PLSRegression(n_components=2)
y = [i for i in range(0, len(samples))]
pls.fit(points, dbscan_results)# try y, gives different plot, keep looking into it

# Step 4: Project data onto PLS components
X_pls = pls.transform(points)

# Step 5: Plot the PLS components
plt.figure(figsize=(10, 6))

# Plot the first PLS component vs the second PLS component
plt.scatter(X_pls[:, 0], X_pls[:, 1], c=dbscan_results, cmap='viridis', edgecolor='k', s=100)
plt.colorbar(label='Target Variable (y)')
plt.title('PLS Components Plot')
plt.xlabel('PLS Component 1')
plt.ylabel('PLS Component 2')

plt.show()
"""

In [None]:
"""
import numpy as np
# Step 6: Plot explained variance (scree plot)
explained_variance = pls.x_scores_.var(axis=0) / np.var(points, axis=0).sum()

plt.figure(figsize=(8, 5))
plt.bar(range(1, len(explained_variance) + 1), explained_variance, alpha=0.7)
plt.xlabel('PLS Components')
plt.ylabel('Explained Variance')
plt.title('Explained Variance by PLS Components')
plt.show()
"""

# Machine Learning - LSTM

# Machine Learning - Local Outlier Factor (LOF)

# Machine Learning - One-Class SVM

Implementation of One-Class SVM for anomaly detection. It is most appropriate when only normal data is available for training and anomalies must be found in new, unlabeled data. To do: try with Kjell's labeled data.
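The "normal data only" setting can be sketched as follows: the model is fitted on normal samples and then asked to judge new ones. All data here is synthetic and the parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
normal_train = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
new_normal = rng.normal(loc=0.0, scale=1.0, size=(20, 2))
new_faulty = np.full((5, 2), 8.0)  # far from the training cloud

detector = OneClassSVM(nu=0.1, kernel="rbf", gamma="scale")
detector.fit(normal_train)  # trained on normal data only

pred_normal = detector.predict(new_normal)  # 1 = normal, -1 = anomaly
pred_faulty = detector.predict(new_faulty)
print(pred_faulty)
```

In this workflow the 'Basis' batches would play the role of `normal_train`, and the curves with simulated faults would be the new data to classify.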

In [None]:
from sklearn.svm import OneClassSVM

In [None]:

# Fit a One-Class SVM on the PCA scores. If separate normal-only data is
# available, train on that instead.
svm = OneClassSVM(nu=0.1, kernel='rbf', gamma=0.0001)  # nu is an upper bound on the fraction of training points treated as outliers
svm.fit(x_pca)

# Predict anomalies (1 = normal, -1 = outlier)
y_pred = svm.predict(x_pca)


In [None]:
print(y_pred)

In [None]:
print(dbscan_results)

In [None]:
# Visualize the results
plt.figure(figsize=(8, 6))

# Plot normal points (predicted as 1)
plt.scatter(x_pca[y_pred == 1][:, 0], x_pca[y_pred == 1][:, 1], color='blue', label='Normal')

# Plot outliers (predicted as -1)
plt.scatter(x_pca[y_pred == -1][:, 0], x_pca[y_pred == -1][:, 1], color='red', label='Anomalies')

plt.title("One-Class SVM for Anomaly Detection (Multiple Features)")
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.legend()
plt.show()


# Evaluation Metrics - Precision score

Metrics such as precision, recall, and F1 score will be used here to check how effectively the model detects anomalies.
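As a small worked example with made-up labels, the scikit-learn metric functions can be pointed at the anomaly class through `pos_label=-1`, since the detectors in this notebook mark anomalies with -1. The label vectors are invented for illustration.

```python
from sklearn.metrics import (
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Made-up labels: 1 = normal, -1 = anomaly.
y_true_demo = [1, 1, 1, 1, -1, -1, -1, 1, 1, -1]
y_pred_demo = [1, 1, -1, 1, -1, -1, 1, 1, 1, -1]

# Treat the anomaly class (-1) as the positive class.
precision = precision_score(y_true_demo, y_pred_demo, pos_label=-1)
recall = recall_score(y_true_demo, y_pred_demo, pos_label=-1)
f1 = f1_score(y_true_demo, y_pred_demo, pos_label=-1)
# Rows are true classes, columns are predicted classes, ordered [-1, 1].
matrix = confusion_matrix(y_true_demo, y_pred_demo, labels=[-1, 1])
print(precision, recall, f1)
print(matrix)
```

Three of the four true anomalies are found and one normal run is flagged by mistake, so precision, recall and F1 all equal 0.75 here.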

In [None]:
#to evaluate model through cross-validation and evaluation metrics
#from sklearn.metrics import precision_score, recall_score, f1_score, confusion_matrix

In [None]:
# Calculate classification metrics
#precision = precision_score(y_test, y_pred)

# Print the results
#print(f"Precision: {precision:.2f}")

# Evaluation Metrics - Recall score

In [None]:
#recall = recall_score(y_test, y_pred)

#print(f"Recall: {recall:.2f}")

# Evaluation Metrics - F1-score

In [None]:
#f1 = f1_score(y_test, y_pred)

#print(f"F1 Score: {f1:.2f}")

# Evaluation Metrics - Confusion Matrix

In [None]:
#conf_matrix = confusion_matrix(y_test, y_pred)

#print(f"Confusion Matrix:\n{conf_matrix}")

# Reproduce everything here on Orange and then try 26k data

# XML file parsing for real-time classification and maintenance - Bs4

Bs4 with html.parser can parse malformed XML that ElementTree (ET) cannot handle. A future implementation will scan a directory for actuals files.

In [None]:
import os
import pandas as pd

In [None]:
xml_ordner = r'C:\Users\PC0118\Desktop\actuals'

In [None]:
filegroup = os.listdir(xml_ordner)
num_files = len(filegroup)
print(num_files)
for f in filegroup:
    print(f)

In [None]:
filegroup.pop()

In [None]:
from bs4 import BeautifulSoup as bs

In [None]:
file = filegroup[30]

In [None]:
# The with-block closes the file automatically, so no explicit close() is needed
with open(os.path.join(xml_ordner, file), 'r') as f:
    content = f.read()

In [None]:
# Parse the malformed XML with BeautifulSoup
soup = bs(content, 'html.parser')

In [None]:
print(soup.prettify())  # View the prettified XML

# XML parsing - ElementTree

In [None]:
import xml.etree.ElementTree as ET


In [None]:
bad_files = []

In [None]:
good_dfs = {}

Traverse from root to end nodes and find relevant status information to build a dataframe out of.
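The traversal idea can be illustrated on a tiny invented document. The tag names and structure here are assumptions and much simpler than a real actuals file; only the pattern of taking the second child of the root as the data container mirrors the notebook's loop.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Invented miniature document; real actuals files will differ.
xml_text = """
<DataSet>
  <schema>layout description</schema>
  <diffgram>
    <Records>
      <Status><Time>08:30:22</Time><Pressure>101.5</Pressure></Status>
      <Status><Time>08:30:23</Time><Pressure>102.0</Pressure></Status>
    </Records>
  </diffgram>
</DataSet>
"""

root = ET.fromstring(xml_text.strip())
records = root[1][0]  # second child of the root holds the data

# One row per record, one column per child tag.
rows = [{child.tag: child.text for child in record} for record in records]
status_df = pd.DataFrame(rows).set_index("Time")
status_df["Pressure"] = status_df["Pressure"].astype(float)
print(status_df)
```

Building a list of dicts first and creating the DataFrame once avoids repeated `pd.concat` calls, which can become slow on files with tens of thousands of observations.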

In [None]:
for i in range(50, 51):
        try:  

                tree = ET.parse(os.path.join(xml_ordner, filegroup[30]))
                root = tree.getroot()
                #second child of root contains actuals, first child holds schematics for data
                #print(root[0].tag, root[0].attrib)
                diffgrams = root[1]
                #print(len(diffgrams))
                #print(diffgrams.tag, diffgrams.attrib, diffgrams.text)


                
                #dataframe to hold data in xml file initiated with list of entries and sample names identified by timestamp 
                diffgram_df = pd.DataFrame()
                samples = []

                feature = []

                num_observations = len(diffgrams[0])
                print(f"Actuals file: {filegroup[30]} - No fatal errors, num. observations: {num_observations}")

                for element in diffgrams[0]:
                        #print(element.tag, element.attrib)#, element[0].text

                        if element[0].text in samples:
                                feature = pd.DataFrame(feature, index=[f'Analysis - {sample}' for sample in samples])
                                diffgram_df = pd.concat([diffgram_df, feature], axis = 1)
                                feature = []
                                samples = []
                        feature.append({element.tag : element.get('{urn:schemas-microsoft-com:xml-diffgram-v1}id')})
                        samples.append(element[0].text)
                if feature != [] or samples != []:
                        feature = pd.DataFrame(feature, index=[f'Analysis - {sample}' for sample in samples])
                        diffgram_df = pd.concat([diffgram_df, feature], axis = 1)

                good_dfs.update({file : diffgram_df})
                #print(diffgram_df)

        except Exception as e:
                bad_files.append({'file' : filegroup[30],
                                  'error': e})
                print(f"Actuals file: {filegroup[30]} - Error encountered: {e}. Skipping this iteration for file number {35}.")
                continue

print('files read:', 35)

In [None]:
print(35)

In [None]:
print(bad_files)

In [None]:
print(good_dfs.keys())

In [None]:
df = good_dfs['actuals 23.9.2024 8_30_22-252.xml']

In [None]:
print(len(df))

In [None]:
df1 = df.iloc[12800:12850, :]
df2 = df.iloc[26220:26270, :]
df3 = df.iloc[28000:28040, :]

In [None]:
import matplotlib.pyplot as plt
from pandas.plotting import table

In [None]:
# Create a figure to hold the table
fig, ax = plt.subplots(figsize=(6, 2))  # Adjust size as needed
ax.axis('off')  # Hide axes
tbl = table(ax, df1, loc='center', colWidths=[0.2]*df1.shape[1])  # Adjust column width if needed
tbl.auto_set_font_size(False)
tbl.set_fontsize(12)
tbl.scale(1.2, 1.2)  # Scale the table

# Save the table as an image
image_path = 'table_image.png'
plt.savefig(image_path, bbox_inches='tight', dpi=300)
plt.close()


# Dashboard

In [None]:
#packages to create a dashboard
import dash
from dash import dcc
from dash import html 
from dash.dependencies import Input, Output

Set up divisions with the option to select the information to be displayed

# something off here

In [None]:
app = dash.Dash(__name__)
app.layout =html.Div([

                        html.Div([  
                        html.H1('Title', style={'text-align' : 'center'}),


                        dcc.RadioItems(
                                        id='radio-items',
                                        options=[
                                                    {'label' : 'Curves', 'value' : ''},
                                                    {'label' : 'Features', 'value' : 'base'},
                                                    {'label' : 'Decomp', 'value' : 'decompose'},
                                                    {'label' : 'Transform', 'value' : 'transform'}
                                                ],
                                        value=''   #default
                                       ),
                                       html.Div(id='output-container',
                                                style={
                                                    'backgroundColor': '#f9f9f9',
                                                    'border': '1px solid #ccc',
                                                    'padding': '20px',
                                                    'borderRadius': '5px',
                                                    'boxShadow': '2px 2px 12px rgba(0, 0, 0, 0.1)'
                                                    }
                                                )
                                ]),

                        html.Div([
                        dcc.DatePickerRange(
                            id='date-picker-range',
                            start_date='2023-01-01',
                            end_date='2023-12-31',
                            display_format='YYYY-MM-DD'
                        )
                        ], style={'border': '1px solid black', 'padding': '10px', 'margin': '10px'})

                    ]) 

            


In [None]:
#import webbrowser
@app.callback(    
    Output('output-container', 'children'),
    Input('radio-items', 'value')
)
def update_graph(value):
    dev.plot_results('Pac', features=value)
    #webbrowser.open('plot.html')
    return     

if __name__ == '__main__':
    app.run_server(debug=True)