### **Support Vector Machines (SVM)**

SVMs are mainly used for classification problems. The main goal of an `SVM` is to find the hyperplane in a multi-dimensional space that best separates the classes of data, meaning the one that leaves the largest margin to the nearest points of each class (the support vectors). <br /><br />
A hyperplane is a flat subspace with one dimension fewer than the space that contains it (in a three-dimensional space, it is a plane; in a two-dimensional space, it is a line).
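
To make the definition concrete: a hyperplane can be written as the set of points `x` that satisfy `w · x + b = 0`, and a linear classifier assigns a class according to the sign of `w · x + b`. The sketch below is only an illustration. The values of `w` and `b` are hand-picked assumptions, not parameters learned by an SVM, and `side_of_hyperplane` is a hypothetical helper name.

```python
import numpy as np

# Hand-picked (illustrative) hyperplane in 2D: the line x1 + x2 - 1 = 0
w = np.array([1.0, 1.0])   # normal vector of the hyperplane
b = -1.0                   # offset

def side_of_hyperplane(points, w, b):
    """Return +1 or -1 for the side of w.x + b = 0 each point lies on (0 if on it)."""
    return np.sign(points @ w + b).astype(int)

points = np.array([[2.0, 2.0],    # on the positive side of the line
                   [0.0, 0.0],    # on the negative side of the line
                   [0.5, 0.5]])   # exactly on the line
labels = side_of_hyperplane(points, w, b)
print(labels)
```

A trained linear SVM does the same sign test, but chooses `w` and `b` so that the margin between the classes is as large as possible.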

`USAGE EXAMPLES:`<br />
Image Classification: An SVM can be trained to distinguish between different types of objects in images.<br>

For example, an SVM can be trained to classify images of cats and dogs using features extracted from the images, such as textures and shapes.<br>
With the same approach we can also perform facial recognition.

In [None]:
!pip3 install numpy pandas matplotlib scikit-learn opencv-python

In [None]:
# Import libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.utils import Bunch

In [None]:
# Load sample image dataset
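# load_sample_images() returns a Bunch whose `images` attribute is a list
# of two RGB arrays (china.jpg and flower.jpg).
# Other datasets can be used as long as they provide a list of
# (height, width, 3) image arrays.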
from sklearn.datasets import load_sample_images

dataset = load_sample_images()
images = dataset.images

# Function to extract features from an image
# The image is divided into grid cells of grid_size (rows, cols) pixels,
# and the average RGB color of each cell is used as its feature vector
def extract_features(image, grid_size=(10, 10)):
    height, width, _ = image.shape
    features = []
    for row in range(0, height, grid_size[0]):
        for col in range(0, width, grid_size[1]):
            region = image[row:row + grid_size[0], col:col + grid_size[1]]
            avg_color = np.mean(region, axis=(0, 1))
            features.append(avg_color)
    return np.array(features)

# Extract features from images
X = []
y = []
for idx, image in enumerate(images):
    features = extract_features(image)
    X.append(features)
    y.append(np.full(features.shape[0], idx))  # Label each region of the image with the image index

# Convert lists to numpy arrays
X = np.vstack(X)
y = np.concatenate(y)

# print(X)
# print(y)
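
As a quick sanity check of the feature extraction, the sketch below runs the same `extract_features` function (repeated here so the block is self-contained) on a small synthetic image. The 20x30 red/blue test image is an assumption made up for illustration. It shows that the function returns one row per grid cell, and that a cell straddling two colors gets their average.

```python
import numpy as np

def extract_features(image, grid_size=(10, 10)):
    """Average RGB color of each grid cell, one row per cell."""
    height, width, _ = image.shape
    features = []
    for row in range(0, height, grid_size[0]):
        for col in range(0, width, grid_size[1]):
            region = image[row:row + grid_size[0], col:col + grid_size[1]]
            features.append(np.mean(region, axis=(0, 1)))
    return np.array(features)

# Synthetic 20x30 RGB image: left half pure red, right half pure blue
demo_image = np.zeros((20, 30, 3), dtype=np.uint8)
demo_image[:, :15] = (255, 0, 0)
demo_image[:, 15:] = (0, 0, 255)

demo_features = extract_features(demo_image)
print(demo_features.shape)   # 2 x 3 grid cells, 3 color channels each
print(demo_features)
```

The middle column of cells covers both halves of the image, so its average color is a mix of red and blue. This is one reason a smaller `grid_size` can give cleaner features near region boundaries.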

In [None]:
# Split into training and test sets
# X: features used for prediction
# y: target variable we want to predict
# test_size=0.3: use 30% of the data for the test set and 70% for the training set
# The training set is used to fit the model, i.e. to learn the relationship between the features and the labels
#   (more training data generally helps)
# The test set is held out to check how accurate the predictions are on unseen data
# random_state: seed that controls how the data is shuffled before splitting. If two people run the same function
#   on the same data with the same random_state value, they will get exactly the same training/test split.
# X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
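
The effect of `random_state` can be checked directly. The following is a minimal sketch on made-up toy data (the `X_demo` and `y_demo` arrays and the `test_size=0.25` value are assumptions chosen for illustration). It splits the same data twice with the same seed and confirms that both splits are identical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 20 samples, 2 features, two balanced classes
X_demo = np.arange(40).reshape(20, 2)
y_demo = np.array([0] * 10 + [1] * 10)

# Two splits with the same random_state
split_a = train_test_split(X_demo, y_demo, test_size=0.25, random_state=42)
split_b = train_test_split(X_demo, y_demo, test_size=0.25, random_state=42)

# Compare X_train, X_test, y_train, y_test pairwise
same_split = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print("Identical splits:", same_split)

X_tr, X_te, y_tr, y_te = split_a
print("Train size:", len(X_tr), "Test size:", len(X_te))
```

Omitting `random_state` (or changing its value) would generally produce a different split on each run, which makes results harder to reproduce.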

In [None]:
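# Standardize the features: fit the scaler on the training set only,
# then apply the same transformation to the test set
scaler = StandardScaler()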
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train the SVM model
# Available kernels: 'linear', 'poly', 'rbf', 'sigmoid', 'precomputed'
svm = SVC(kernel='rbf')
svm.fit(X_train, y_train)

# Make predictions
y_pred = svm.predict(X_test)
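
Which kernel works best depends on the data. One possible way to compare them is cross-validation, sketched below on a synthetic toy dataset (`make_moons`) rather than on the image features. The dataset, the 5-fold setting, and the default hyperparameters are all assumptions made for illustration. The scaler is placed inside a pipeline so that it is refit on the training part of each fold.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic two-class data that is not linearly separable
X_toy, y_toy = make_moons(n_samples=200, noise=0.2, random_state=42)

kernel_scores = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    # Scaling happens inside each fold, so no test-fold statistics leak into training
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    scores = cross_val_score(model, X_toy, y_toy, cv=5)
    kernel_scores[kernel] = scores.mean()

for kernel, score in kernel_scores.items():
    print(f"{kernel:>8}: mean accuracy = {score:.3f}")
```

The ranking obtained on this toy data does not necessarily carry over to other datasets. The same loop can be run on `X` and `y` from the cells above to compare kernels on the image features.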

In [None]:
# Evaluate the model

# Confusion matrix
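# With two classes, treating label 1 as "positive" and label 0 as "negative":
# TP (True Positives): Correct predictions where the model predicts the positive class and the instance is positive.
# FP (False Positives): Incorrect predictions where the model predicts the positive class but the instance is negative.
# FN (False Negatives): Incorrect predictions where the model predicts the negative class but the instance is positive.
# TN (True Negatives): Correct predictions where the model predicts the negative class and the instance is negative.
#
# scikit-learn puts the actual classes in rows and the predicted classes in columns,
# with labels in sorted order, so the printed matrix has this layout:
#
#                     Predicted 0 (Neg)    Predicted 1 (Pos)
# Actual 0 (Neg)            TN                    FP
# Actual 1 (Pos)            FN                    TP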
print("Confusion Matrix")
print(confusion_matrix(y_test, y_pred))

# Precision: The ratio of true positives to the total predicted positives (TP / (TP + FP)).
# Recall: The ratio of true positives to the total actual positives (TP / (TP + FN)).
# F1 Score: The harmonic mean of precision and recall, providing a balance between both metrics (2 * (Precision * Recall) / (Precision + Recall)).
# Support: The number of actual occurrences of the class in the data.
print("Classification Report")
print(classification_report(y_test, y_pred))
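
The formulas above can be verified by hand. The sketch below uses a small made-up set of labels (`y_true_demo` and `y_pred_demo` are assumptions for illustration, not outputs of the model above). It reads TN, FP, FN and TP from the confusion matrix, computes precision, recall and F1 from them, and compares the results with scikit-learn's own metric functions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Made-up binary labels: 4 positives and 6 negatives
y_true_demo = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred_demo = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])

# For labels [0, 1], scikit-learn returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")

# The manual values should agree with scikit-learn's implementations
print(np.isclose(precision, precision_score(y_true_demo, y_pred_demo)))
print(np.isclose(recall, recall_score(y_true_demo, y_pred_demo)))
print(np.isclose(f1, f1_score(y_true_demo, y_pred_demo)))
```

In `classification_report`, the same three values are reported once per class, each time treating that class as the positive one.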

In [None]:
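# The following code draws the feature grid over the selected image and outlines
# each cell according to the class the SVM predicts for it (based on average color):
# green for class 0 (the first image) and blue for any other class.
# This can help us to detect objects or regions in images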
import cv2
def visualize_predictions(image, model, grid_size=(10, 10)):
    features = extract_features(image, grid_size)
    features = scaler.transform(features)
    predictions = model.predict(features)
    height, width, _ = image.shape
    result = image.copy()
    for idx, (row, col) in enumerate([(i, j) for i in range(0, height, grid_size[0]) for j in range(0, width, grid_size[1])]):
        color = (0, 255, 0) if predictions[idx] == 0 else (0, 0, 255)
        cv2.rectangle(result, (col, row), (col + grid_size[1], row + grid_size[0]), color, 1)
    plt.imshow(result)
    plt.title('Predictions on the Image')
    plt.show()

# Visualize predictions on the second sample image (index 1)
visualize_predictions(images[1], svm)