# Chapter 5.5 - Nonlinear Learning Algorithms

In [2]:
import numpy as np
import pandas as pd

import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

#import warnings
#warnings.filterwarnings("ignore")

## Support Vector Machines (SVM)

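SVMs learn nonlinear decision boundaries by using a kernel such as the radial basis function (RBF). Nonlinear SVMs also exist for regression problems (scikit-learn's SVR); the cell below uses the classifier, SVC.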
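As a minimal sketch of the regression variant, the snippet below fits scikit-learn's SVR with an RBF kernel to a noiseless sine curve. The data, C=100 and epsilon=0.1 are illustrative assumptions, not values from this chapter, and the variable names are chosen so they do not overwrite X and y used later.

```python
import numpy as np
from sklearn.svm import SVR

# Illustrative 1-D data: 40 sorted points on [0, 5) with a nonlinear target y = sin(x)
rng = np.random.RandomState(0)
X_reg = np.sort(5 * rng.rand(40, 1), axis=0)
y_reg = np.sin(X_reg).ravel()

# RBF-kernel support vector regression; C and epsilon are illustrative, not tuned
reg = SVR(kernel="rbf", C=100, epsilon=0.1)
reg.fit(X_reg, y_reg)

# Training-set root mean squared error of the fitted curve
y_pred = reg.predict(X_reg)
rmse = np.sqrt(np.mean((y_reg - y_pred) ** 2))
print("Training RMSE: %.3f" % rmse)
print("Number of support vectors:", len(reg.support_))
```

SVR only penalizes residuals larger than epsilon, so the fitted curve sits inside a tube of width epsilon around most training points.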

In [5]:
from sklearn.svm import SVC  # Import the Support Vector Classifier (SVC) from sklearn
from sklearn import datasets  # Import datasets module from sklearn for generating sample datasets
import numpy as np  # Import NumPy for numerical computations

# Generate a binary classification dataset with 10 samples and 2 features
X, y = datasets.make_classification(n_samples=10,    # Number of samples (10 samples)
                                    n_features=2,    # Number of features (2 features per sample)
                                    n_redundant=0,   # No redundant features
                                    n_classes=2,     # Binary classification (2 classes)
                                    random_state=1,  # Set seed for reproducibility
                                    shuffle=False)   # Disable shuffling of the data

# Initialize a Support Vector Classifier (SVC) with the Radial Basis Function (RBF) kernel
clf = SVC(kernel='rbf')  # kernel='rbf' is already the default; it is spelled out here for clarity
# gamma (the RBF kernel coefficient) is left at its default, gamma='scale' = 1 / (n_features * X.var())

# Fit the SVC model to the dataset
clf.fit(X, y)  # Train the SVC model using the input data X and labels y

# Print the number of errors (samples that were misclassified)
print("Number of Errors: \n%i" % np.sum(y != clf.predict(X)))  # Compare the true labels with the predicted labels
print()

# Obtain the decision function values for the input data.
# For a binary SVC this is a signed score: the sign gives the predicted class
# (positive -> clf.classes_[1]) and the magnitude grows with distance from the boundary.
# A notebook only displays a cell's last expression, so wrap this in print() to see the values.
clf.decision_function(X)

# Useful internals:

# Array of support vectors (the training points that lie on or inside the margin)
clf.support_vectors_  # also not displayed unless printed

# clf.support_ holds the indices of the support vectors within the original dataset X
print('Do the rows of X at the indices in clf.support_ exactly match clf.support_vectors_?')
np.all(X[clf.support_, :] == clf.support_vectors_)  # Verify that the support vectors match their indices in X

Number of Errors: 
0

Do the rows of X at the indices in clf.support_ exactly match clf.support_vectors_?


True

#### Concepts

Number of Errors: The number of training samples the classifier misclassifies. A value of 0 means the model fits the training data perfectly, but on only 10 samples this says little about performance on new data, which has to be measured on held-out samples.
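To separate fitting the training data from generalizing, a minimal sketch can fit the same kind of RBF SVC on a larger synthetic dataset and count errors on the training split and on a held-out test split. The dataset size (200 samples), the 70/30 split and the variable names are assumptions for illustration; the exact counts depend on the random seeds.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Illustrative data: 200 samples, 2 features (new names so X, y and clf are left untouched)
X_demo, y_demo = datasets.make_classification(n_samples=200, n_features=2, n_redundant=0,
                                              n_classes=2, random_state=1)

# Hold out 30% of the samples; the model never sees them during fitting
X_train, X_test, y_train, y_test = train_test_split(X_demo, y_demo, test_size=0.3,
                                                    random_state=0)

svc_demo = SVC(kernel="rbf").fit(X_train, y_train)

# Misclassification counts on seen (training) and unseen (test) samples
train_errors = np.sum(y_train != svc_demo.predict(X_train))
test_errors = np.sum(y_test != svc_demo.predict(X_test))
print("Training errors: %i of %i" % (train_errors, len(y_train)))
print("Test errors:     %i of %i" % (test_errors, len(y_test)))
```

The test-set count is the one that estimates how the classifier will behave on new data.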

Support Vectors: The training points that lie on or inside the margin. The fitted decision function depends only on these points, so inspecting them shows which samples actually shape the decision boundary.
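The statement that only the support vectors define the boundary can be checked directly. This minimal sketch refits the chapter's 10-sample RBF SVC under new variable names and rebuilds its scores from three fitted attributes, dual_coef_ (the signed dual weights), support_vectors_ and intercept_, using f(x) = sum_i dual_coef_[i] * exp(-gamma * ||sv_i - x||^2) + intercept_. It assumes the default gamma='scale', which scikit-learn documents as 1 / (n_features * X.var()).

```python
import numpy as np
from sklearn import datasets
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

# Same 10-sample dataset as the chapter (new names so X, y and clf are left untouched)
X_small, y_small = datasets.make_classification(n_samples=10, n_features=2, n_redundant=0,
                                                n_classes=2, random_state=1, shuffle=False)
svc_small = SVC(kernel="rbf").fit(X_small, y_small)

# Assumption: the default gamma='scale' resolves to 1 / (n_features * X.var())
gamma = 1.0 / (X_small.shape[1] * X_small.var())

# RBF kernel between each support vector and each sample: shape (n_SV, n_samples)
K = rbf_kernel(svc_small.support_vectors_, X_small, gamma=gamma)

# Weighted sum over support vectors only, plus the intercept
manual_scores = (svc_small.dual_coef_ @ K).ravel() + svc_small.intercept_[0]

print(np.allclose(manual_scores, svc_small.decision_function(X_small)))
```

No non-support training point appears in the sum, which is why removing such a point would leave the fitted boundary unchanged.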

Decision Function: A signed score for each sample. Its sign gives the predicted class, and its magnitude grows with the distance from the decision boundary, so larger magnitudes can be read as higher confidence. The score is not a calibrated probability.
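A short sketch of how the sign and magnitude of the score are used in practice, again on the chapter's 10-sample data under new variable names. It relies on scikit-learn's binary convention that a positive score corresponds to classes_[1].

```python
import numpy as np
from sklearn import datasets
from sklearn.svm import SVC

# Same 10-sample dataset as the chapter (new names so X, y and clf are left untouched)
X_small, y_small = datasets.make_classification(n_samples=10, n_features=2, n_redundant=0,
                                                n_classes=2, random_state=1, shuffle=False)
svc_small = SVC(kernel="rbf").fit(X_small, y_small)

scores = svc_small.decision_function(X_small)

# Binary convention: positive score -> classes_[1], negative score -> classes_[0]
pred_from_sign = np.where(scores > 0, svc_small.classes_[1], svc_small.classes_[0])

# Order samples from least to most confident (smallest to largest |score|)
least_to_most_confident = np.argsort(np.abs(scores))

print(np.array_equal(pred_from_sign, svc_small.predict(X_small)))
print("Least confident sample index:", least_to_most_confident[0])
```

If calibrated probabilities are needed, SVC(probability=True) fits an extra calibration step on top of these scores.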

#### Interpretation

Number of Errors = 0: every training sample is classified correctly. This is training error only, not an estimate of performance on new data.

The final check returned True, confirming that the indices in clf.support_ point to exactly the rows of X stored in clf.support_vectors_; in other words, the support vectors are a subset of the original training samples.

## Random Forest

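A random forest is a meta estimator that fits a number of decision tree learners on various
sub-samples of the dataset and uses averaging to improve the predictive accuracy and control
over-fitting. For classification, scikit-learn's RandomForestClassifier averages the predicted class probabilities of its trees.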
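The averaging step can be verified on a fitted model. In this minimal sketch (illustrative dataset, 25 trees, new variable names), each tree in estimators_ predicts class probabilities and their plain mean is compared with the forest's own predict_proba.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

# Illustrative data (new names so the chapter's X, y and forest are untouched)
X_rf, y_rf = datasets.make_classification(n_samples=100, n_features=4, random_state=0)
rf_demo = RandomForestClassifier(n_estimators=25, random_state=0).fit(X_rf, y_rf)

# Class-probability predictions of every tree: shape (n_trees, n_samples, n_classes)
per_tree = np.stack([tree.predict_proba(X_rf) for tree in rf_demo.estimators_])

# The forest's probabilities are the plain average over its trees
averaged = per_tree.mean(axis=0)
print(np.allclose(averaged, rf_demo.predict_proba(X_rf)))
```

The forest's predict then returns the class with the highest averaged probability.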

Random forest models reduce the risk of overfitting by introducing randomness in three ways:

- building many trees and averaging their predictions (n_estimators)
- training each tree on observations drawn with replacement, i.e., a bootstrapped sample (bootstrap=True, the default)
- splitting each node on the best split among a random subset of the features, drawn anew at every node (max_features)
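The last two sources of randomness can be sketched with NumPy alone. This illustrates the sampling ideas and is not scikit-learn's internal code: bootstrap_indices and random_feature_subset are hypothetical helper names, and the subset size max(1, int(sqrt(n_features))) mirrors the max_features='sqrt' rule.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_indices(n_samples, rng):
    """Hypothetical helper: draw n_samples row indices with replacement."""
    return rng.integers(0, n_samples, size=n_samples)

def random_feature_subset(n_features, rng):
    """Hypothetical helper: pick max(1, int(sqrt(n_features))) distinct features."""
    k = max(1, int(np.sqrt(n_features)))
    return rng.choice(n_features, size=k, replace=False)

# A bootstrap sample repeats some rows and omits others
n = 10_000
idx = bootstrap_indices(n, rng)
unique_fraction = np.unique(idx).size / n  # close to 1 - 1/e (about 0.632) for large n

# Features considered at one split: 4 of 16 under the 'sqrt' rule
features = random_feature_subset(16, rng)

print("Distinct rows in bootstrap sample: %.3f" % unique_fraction)
print("Candidate features for this split:", sorted(features.tolist()))
```

Each tree therefore sees only about 63% of the distinct observations, and each split compares only a few features, which decorrelates the trees before their predictions are averaged.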

In [9]:
from sklearn.ensemble import RandomForestClassifier

# Initialize the Random Forest Classifier with 100 trees (estimators).
# No random_state is set, so the fitted trees differ from run to run.
forest = RandomForestClassifier(n_estimators=100)

# Fit the Random Forest model on the dataset X (features) and y (labels)
forest.fit(X, y)

# Count training errors: compare the true labels (y) with the predicted labels
print("Number of Errors: %i" % np.sum(y != forest.predict(X)))

Number of Errors: 0
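Because each tree is trained on a bootstrap sample, every observation is left out of roughly a third of the trees. scikit-learn can score each observation using only those trees, giving an out-of-bag (OOB) estimate of generalization via oob_score=True. A minimal sketch on an illustrative 200-sample dataset (the dataset and variable names are assumptions):

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

# Illustrative data (new names so the chapter's X, y and forest are untouched)
X_oob, y_oob = datasets.make_classification(n_samples=200, n_features=4, random_state=0)

# oob_score=True requires bootstrap=True, which is the default
forest_oob = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
forest_oob.fit(X_oob, y_oob)

# Training error is optimistic; OOB accuracy only uses trees that never saw each sample
train_errors_oob = np.sum(y_oob != forest_oob.predict(X_oob))
print("Training errors: %i of %i" % (train_errors_oob, len(y_oob)))
print("OOB accuracy: %.3f" % forest_oob.oob_score_)
```

The OOB accuracy plays the same role as a held-out test score without setting any data aside.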
