<h1> 4. Support Vector Machine Model (Train + Performance Evaluation) </h1>

In [22]:
import sys
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import numpy as np
from pathlib import Path

### Add the project directory to `sys.path`

In [23]:
# Notebooks have no __file__, so resolve the notebook's filename against the
# current working directory to get its absolute path
current_file_path = Path('ensemble_model_random_forest.ipynb').resolve()

# Get the directory of the current file
project_dir = current_file_path.parent

# Add the project directory to sys.path so the local `data` package is importable
sys.path.insert(0, str(project_dir))

# Import the local dataset helper only after sys.path has been updated
from data.input_data import DatasetCreator

### Step 1: Create Datasets


In [24]:
dataset_creator = DatasetCreator()
blob_dataset = dataset_creator.create_blob_dataset()
circles_dataset = dataset_creator.create_make_circles_dataset()
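`DatasetCreator` lives in the project's own `data.input_data` module, which is not shown here. If it is unavailable, a stand-in along the following lines can reproduce data of the same shape. This is a hypothetical sketch: only the sample counts (1100 blob points, 500 circle points) are inferred from the split shapes printed below; `centers=2`, `noise=0.05`, and `factor=0.5` are guesses, not the project's actual settings.

```python
from sklearn.datasets import make_blobs, make_circles


class StandInDatasetCreator:
    """Hypothetical replacement for data.input_data.DatasetCreator.

    Returns dicts with 'X' and 'y' keys, matching how the notebook reads
    the datasets. Generator parameters other than n_samples are assumptions.
    """

    def create_blob_dataset(self, random_state=42):
        # Two Gaussian clusters in 2-D (number of centers is assumed)
        X, y = make_blobs(n_samples=1100, n_features=2, centers=2,
                          random_state=random_state)
        return {'X': X, 'y': y}

    def create_make_circles_dataset(self, random_state=42):
        # Two concentric rings; the inner ring has radius factor * outer radius
        X, y = make_circles(n_samples=500, noise=0.05, factor=0.5,
                            random_state=random_state)
        return {'X': X, 'y': y}


creator = StandInDatasetCreator()
blob = creator.create_blob_dataset()
circles = creator.create_make_circles_dataset()
print(blob['X'].shape, circles['X'].shape)  # (1100, 2) (500, 2)
```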

### Step 2: Preprocessing - Split Data into Training, Validation, and Test Sets

In [25]:
X_blob, y_blob = blob_dataset['X'], blob_dataset['y']
X_circles, y_circles = circles_dataset['X'], circles_dataset['y']

### Split the blob dataset into training, validation, and test sets (60% / 20% / 20%)

In [26]:

# Hold out 20% of the data as the test set
X_blob_train_temp, X_blob_test, y_blob_train_temp, y_blob_test = train_test_split(
    X_blob, y_blob, test_size=0.2, random_state=42)
# Split the remaining 80% into 75% training / 25% validation (60% / 20% of the total)
X_blob_train, X_blob_val, y_blob_train, y_blob_val = train_test_split(
    X_blob_train_temp, y_blob_train_temp, test_size=0.25, random_state=42)

print("Blob Dataset:")
print(f"Train set: {X_blob_train.shape}, Validation set: {X_blob_val.shape}, Test set: {X_blob_test.shape}")

Blob Dataset:
Train set: (660, 2), Validation set: (220, 2), Test set: (220, 2)
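Because the second `test_size=0.25` is a fraction of the 80% left after removing the test set, the two calls give 0.8 × 0.75 = 0.6 train, 0.8 × 0.25 = 0.2 validation, and 0.2 test. The same two-step split is repeated for each dataset, so it can be wrapped in a small helper. `train_val_test_split` below is a hypothetical name, not a scikit-learn function; the dummy arrays only stand in for the 1100-point blob data.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def train_val_test_split(X, y, test_size=0.2, val_size=0.25, random_state=42):
    """Hypothetical helper wrapping two train_test_split calls.

    val_size is a fraction of the data remaining after the test split,
    so the defaults produce a 60% / 20% / 20% train/val/test split.
    """
    X_temp, X_test, y_temp, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state)
    X_train, X_val, y_train, y_val = train_test_split(
        X_temp, y_temp, test_size=val_size, random_state=random_state)
    return X_train, X_val, X_test, y_train, y_val, y_test


# Dummy data with the same number of samples as the blob dataset
X = np.zeros((1100, 2))
y = np.arange(1100) % 2
X_train, X_val, X_test, y_train, y_val, y_test = train_val_test_split(X, y)
print(X_train.shape, X_val.shape, X_test.shape)  # (660, 2) (220, 2) (220, 2)
```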


### Split the circles dataset into training, validation, and test sets (60% / 20% / 20%)

In [27]:
# Hold out 20% of the data as the test set
X_circles_train_temp, X_circles_test, y_circles_train_temp, y_circles_test = train_test_split(
    X_circles, y_circles, test_size=0.2, random_state=42)
# Split the remaining 80% into 75% training / 25% validation (60% / 20% of the total)
X_circles_train, X_circles_val, y_circles_train, y_circles_val = train_test_split(
    X_circles_train_temp, y_circles_train_temp, test_size=0.25, random_state=42)

print("\nCircles Dataset:")
print(f"Train set: {X_circles_train.shape}, Validation set: {X_circles_val.shape}, Test set: {X_circles_test.shape}")


Circles Dataset:
Train set: (300, 2), Validation set: (100, 2), Test set: (100, 2)
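The splits above do not pass `stratify`, so the class counts in each split can drift slightly from the overall ratio. As an optional check, the sketch below uses `train_test_split(..., stratify=...)` and `np.bincount` to keep and verify the class balance. It assumes circles data generated with `make_circles(n_samples=500, noise=0.05, factor=0.5)`, which is a stand-in for the project's dataset, not its confirmed settings.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split

# Assumed stand-in for the notebook's circles data (500 points, 2 classes)
X, y = make_circles(n_samples=500, noise=0.05, factor=0.5, random_state=42)

# stratify keeps the class ratio the same in every split
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=42, stratify=y_temp)

# Count samples per class in each split
for name, labels in [('train', y_train), ('val', y_val), ('test', y_test)]:
    print(name, np.bincount(labels))
# train [150 150]
# val [50 50]
# test [50 50]
```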


### Train a baseline linear SVM on the blob dataset

In [28]:
svm_model_blob = SVC(kernel='linear', C=1.0)
svm_model_blob = svm_model_blob.fit(X_blob_train, y_blob_train)
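With `kernel='linear'`, the fitted `SVC` is a single hyperplane w · x + b = 0, exposed through `coef_` and `intercept_`, and the predicted class is given by the sign of the score. The sketch below checks that on synthetic two-class blobs (an assumed stand-in for `X_blob_train` / `y_blob_train`): the manual score matches `decision_function`, and its sign reproduces `predict`.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Assumed two-class blob data standing in for the notebook's training set
X, y = make_blobs(n_samples=200, centers=2, random_state=0)

clf = SVC(kernel='linear', C=1.0).fit(X, y)

# For a linear kernel, coef_ holds w and intercept_ holds b
w, b = clf.coef_[0], clf.intercept_[0]
scores = X @ w + b

# A positive score maps to classes_[1], a negative one to classes_[0]
pred = np.where(scores > 0, clf.classes_[1], clf.classes_[0])

print("scores match decision_function:", np.allclose(scores, clf.decision_function(X)))
print("sign of score matches predict():", np.array_equal(pred, clf.predict(X)))
print("support vectors per class:", clf.n_support_)
```

Only the points on or inside the margin (the support vectors) determine w and b, which is why `n_support_` is usually much smaller than the training set.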

### Evaluate model performance on Blob Test set

In [29]:
svm_blob_accuracy = accuracy_score(y_blob_test, svm_model_blob.predict(X_blob_test))
print(f"SVM Blob Accuracy on Blob Dataset: {svm_blob_accuracy:.4f}")

SVM Blob Accuracy on Blob Dataset: 0.9545
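The validation split created in Step 2 is not used above; `C=1.0` is fixed by hand. One common way to use it is to choose `C` on the validation set and score the test set only once at the end. The sketch below does this on synthetic blobs (an assumed stand-in; in the notebook, the `X_blob_train` / `X_blob_val` / `X_blob_test` arrays would take their place). The candidate `C` values and the choice to refit on train + validation are illustrative assumptions.

```python
from sklearn.datasets import make_blobs
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumed stand-in data, split the same way as in Step 2 (60/20/20)
X, y = make_blobs(n_samples=1100, centers=2, cluster_std=3.0, random_state=42)
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=42)

# Select C using validation accuracy only; the test set is not touched here
candidate_C = [0.01, 0.1, 1.0, 10.0]
val_scores = {}
for C in candidate_C:
    model = SVC(kernel='linear', C=C).fit(X_train, y_train)
    val_scores[C] = accuracy_score(y_val, model.predict(X_val))

best_C = max(val_scores, key=val_scores.get)

# Refit on train + validation with the chosen C, then evaluate once on test
final_model = SVC(kernel='linear', C=best_C).fit(X_temp, y_temp)
test_acc = accuracy_score(y_test, final_model.predict(X_test))
print(f"validation accuracies: {val_scores}")
print(f"best C = {best_C}, test accuracy = {test_acc:.4f}")
```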


### Train a baseline linear SVM on the circles dataset

In [30]:
svm_model_circle = SVC(kernel='linear', C=1.0)
svm_model_circle = svm_model_circle.fit(X_circles_train, y_circles_train)

### Evaluate model performance on the circles test set

In [31]:
svm_circle_accuracy = accuracy_score(y_circles_test, svm_model_circle.predict(X_circles_test))
print(f"SVM Circles Dataset Accuracy: {svm_circle_accuracy:.4f}")

SVM Circles Dataset Accuracy: 0.4900
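A natural follow-up is to swap the linear kernel for a non-linear one. The sketch below compares `kernel='linear'` with `kernel='rbf'` (scikit-learn's Gaussian kernel, with the default `gamma='scale'`) on synthetic concentric circles. The data parameters are assumptions standing in for the project's circles dataset, so the exact accuracies will differ from the notebook's.

```python
from sklearn.datasets import make_circles
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumed stand-in for the notebook's circles data
X, y = make_circles(n_samples=500, noise=0.05, factor=0.5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Same C, different kernels: only the shape of the decision boundary changes
results = {}
for kernel in ['linear', 'rbf']:
    model = SVC(kernel=kernel, C=1.0).fit(X_train, y_train)
    results[kernel] = accuracy_score(y_test, model.predict(X_test))
print(results)
```

On data like this the RBF kernel can draw a closed, roughly circular boundary around the inner ring, which a linear kernel cannot.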


### Conclusion

The linear SVM performs very well on the Blob dataset, with a test accuracy of 0.9545. A score this close to 1.0 suggests that the classes in the Blob dataset are almost linearly separable. In contrast, the same model performs poorly on the Circles dataset (accuracy of 0.49), which is about what random guessing would achieve if the two classes are balanced. This is expected: one class forms a ring around the other, so no straight line can separate them, and a linear kernel cannot represent that boundary.
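One way to see why the circles defeat a straight line, yet are easy for a non-linear model, is to add the squared radius x1² + x2² as a third feature. In that lifted space the inner and outer rings sit at different "heights", so a plain linear SVM can separate them with a flat plane; this explicit feature map is the idea that kernels apply implicitly. The `lift` function and the `make_circles` parameters below are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumed stand-in for the notebook's circles data
X, y = make_circles(n_samples=500, noise=0.05, factor=0.5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


def lift(X):
    """Hypothetical feature map: append x1^2 + x2^2 as a third column."""
    return np.column_stack([X, (X ** 2).sum(axis=1)])


# A linear SVM, but trained on the lifted 3-D features
lifted_model = SVC(kernel='linear', C=1.0).fit(lift(X_train), y_train)
lifted_acc = accuracy_score(y_test, lifted_model.predict(lift(X_test)))
print(f"Linear SVM on lifted features: {lifted_acc:.4f}")
```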