## MachineLearningEngine Class

The MachineLearningEngine class builds on the CoreEngine class. The CoreEngine class serves as a parent class for engines that focus on data, while the MachineLearningEngine class is for engines that focus on learning from data.

In [None]:
from src.StreamPort.ml.MachineLearningEngine import MachineLearningEngine

#Creates an empty MachineLearningEngine object and prints it
engine = MachineLearningEngine()
engine.print()

## MachineLearningAnalysis Class

The MachineLearningAnalysis class builds on the Analysis class. The Analysis class is used to perform analysis on the data.

In [None]:
from src.StreamPort.ml.MachineLearningAnalysis import MachineLearningAnalysis

#Creates an empty MachineLearningAnalysis object and prints it
analysis = MachineLearningAnalysis()
analysis.print()

#### Load the CSV File  

The add_analyses_from_csv method loads the dataset from a CSV file and creates a list of analysis objects. The data is then used to build a matrix indexed by the analysis names, a two-component PCA is run on that matrix, and the resulting scores are visualized in a scatter plot.

In [None]:
from src.StreamPort.ml.MachineLearningEngine import MachineLearningEngine
from sklearn.decomposition import PCA 
import matplotlib.pyplot as plt

#Creates a MachineLearningEngine object, loads the analyses from the CSV file and prints it
path = 'feature_list.csv'
engine = MachineLearningEngine()
engine.add_analyses_from_csv(path)

engine.print()

print("List of analysis objects:")
for analysis in engine._analyses:
    print(f"Analysis: {analysis.name}")
    for key, value in analysis.data.items():
        print(f"{key}: {value}")
    print("\n")

rownames = engine.get_analyses_names()
print("Analysis names: ", rownames)

mat = engine.get_data()
mat.index = rownames
print("Matrix: \n", mat)

pca = PCA(n_components=2)
scores = pca.fit_transform(mat)

plt.scatter(scores[:, 0], scores[:, 1])

plt.xlabel("PCA 1")
plt.ylabel("PCA 2")
plt.title("PCA Scores")
plt.show()
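
The matrix-to-scores step above can be tried without the CSV file. The following is a minimal sketch, assuming a small synthetic table in place of feature_list.csv; the analysis and feature names are invented for illustration, and only NumPy, pandas, and scikit-learn calls are used (no StreamPort code).

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical stand-in for the feature table: 6 analyses x 4 features.
rng = np.random.default_rng(0)
mat = pd.DataFrame(
    rng.normal(size=(6, 4)),
    index=[f"analysis_{i}" for i in range(6)],
    columns=[f"feature_{j}" for j in range(4)],
)

# Project every analysis (row) onto the first two principal components.
pca = PCA(n_components=2)
scores = pca.fit_transform(mat)

# Keep the analysis names attached to the scores for later labeling.
scores_df = pd.DataFrame(scores, index=mat.index, columns=["PC1", "PC2"])
explained = pca.explained_variance_ratio_

print(scores_df)
print("Explained variance ratio:", explained)
```

Reporting the explained variance ratio next to the scores plot helps judge how much of the variation the two plotted components actually capture.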

#### Make a Principal Component Analysis (PCA)

This example uses a machine learning engine that performs PCA on the dataset and visualizes the results. ProcessingSettings is the parent class of MakePCA. ProcessingSettings objects are used to assemble data processing workflows within each engine. The subclass used in the code, MakeModelPCASKL, uses the scikit-learn algorithm to perform the PCA.

In [1]:
from src.StreamPort.ml.MachineLearningEngine import MachineLearningEngine
from src.StreamPort.ml.MachineLearningProcessingSettings import  MakeModelPCASKL

#Creates a MachineLearningEngine object and loads the analyses and classes from CSV files
path = 'feature_list.csv'
engine = MachineLearningEngine()
engine.add_analyses_from_csv(path)

class_path = 'feature_metadata.csv'
engine.add_classes_from_csv(class_path)

engine.print()
print(engine.get_classes())

# !!! make a general data plot
#engine.plot_data()
# x axis is the index of the features (i.e., column names)
# y axis is the value for each analysis
# color legend is applied for each analysis


# Add the ProcessingSettings to the _settings attribute with add_settings
pca_model = MakeModelPCASKL(n_components=2, center_data=True)
engine.add_settings(pca_model)
engine.print()
# Create a method in the ML engine to perform PCA and collect the results
engine.run_workflow()
# The results are added to the _results attribute of the engine
# make a plot method in the ML engine for the PCA results and classes
engine.plot_pca()

# make a loadings plot after confirming the scores plot


Structure of the CSV file: {'number_of_rows': 45, 'number_of_columns': 4445}
Structure of the CSV file: {'number_of_rows': 45, 'number_of_columns': 2}

MachineLearningEngine 
  name: None 
  author: None 
  path: None 
  date: 2024-07-26 18:13:08.230853 
  analyses: 45 
  settings: 0 

['control', 'influent_ozone', 'influent_uv', 'influent_ac', 'effluent']

MachineLearningEngine 
  name: None 
  author: None 
  path: None 
  date: 2024-07-26 18:13:08.230853 
  analyses: 45 
  settings: 1 

Running workflow with settings: MakeModel
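
The settings-then-workflow pattern used above can be illustrated in isolation. The sketch below is not StreamPort's implementation: PCASettingsSketch, its run method, and the result keys are hypothetical names chosen for this example, and the input is random data. It only shows how a settings object might carry the parameters n_components and center_data and hand them to scikit-learn when a workflow loop calls it.

```python
from dataclasses import dataclass

import numpy as np
from sklearn.decomposition import PCA


@dataclass
class PCASettingsSketch:
    """Hypothetical settings object; not part of StreamPort."""

    n_components: int = 2
    center_data: bool = True

    def run(self, data: np.ndarray) -> dict:
        X = np.asarray(data, dtype=float)
        if self.center_data:
            # scikit-learn's PCA also centers internally, so this step
            # mainly makes the choice explicit in the settings.
            X = X - X.mean(axis=0)
        model = PCA(n_components=self.n_components)
        scores = model.fit_transform(X)
        return {
            "scores": scores,                # shape: (n_analyses, n_components)
            "loadings": model.components_.T,  # shape: (n_features, n_components)
            "explained_variance_ratio": model.explained_variance_ratio_,
        }


# Synthetic data: 10 analyses x 5 features (assumption for illustration).
rng = np.random.default_rng(1)
data = rng.normal(size=(10, 5))

# A "workflow" here is simply an ordered list of settings objects.
workflow = [PCASettingsSketch(n_components=2, center_data=True)]
results = {}
for settings in workflow:
    results[type(settings).__name__] = settings.run(data)

pca_result = results["PCASettingsSketch"]
print(pca_result["scores"].shape)    # (10, 2)
print(pca_result["loadings"].shape)  # (5, 2)
```

Returning the loadings alongside the scores means a loadings plot can be drawn later without refitting the model.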


#### Make a Density-Based Spatial Clustering of Applications with Noise (DBSCAN)



In [None]:
from src.StreamPort.ml.MachineLearningEngine import MachineLearningEngine
from src.StreamPort.ml.MachineLearningProcessingSettings import  MakeModelDBSCANSKL

#Creates a MachineLearningEngine object and loads the analyses and classes from CSV files
path = 'feature_list.csv'
engine = MachineLearningEngine()
engine.add_analyses_from_csv(path)

class_path = 'feature_metadata.csv'
engine.add_classes_from_csv(class_path)

engine.print()

#model_pls = MakeModelPLSSKL(n_components=2)
#engine.add_settings(model_pls)
#engine.print()
#engine.run_workflow()
#engine.plot_pls()

model_dbscan = MakeModelDBSCANSKL(eps=True, min_samples=True)
engine.add_settings(model_dbscan)
model_dbscan.run(engine)
engine.print()
engine.run_workflow()
engine.plot_dbscan()
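
For reference, this is how DBSCAN behaves when called directly through scikit-learn, where eps is a distance (float) and min_samples is a count (integer). How MakeModelDBSCANSKL interprets the True values passed above is not shown in this notebook, so the sketch below makes no claim about it; the points and parameter values are made up for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two tight groups of three points each, plus one isolated point.
X = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
    [20.0, 20.0],
])

# eps: neighborhood radius; min_samples: points (including the point itself)
# needed within eps for a point to count as a core point.
model = DBSCAN(eps=0.5, min_samples=2)
labels = model.fit_predict(X)

# Points labeled -1 are treated as noise and belong to no cluster.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))

print(labels)      # [ 0  0  0  1  1  1 -1]
print(n_clusters)  # 2
print(n_noise)     # 1
```

Unlike PCA, DBSCAN results depend strongly on the scale of the features, so eps usually needs to be chosen with the units or scaling of the data in mind.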