# Notebook ICD - 16

### Libraries

In [1]:
import numpy as np
import pandas as pd

## SVM from scratch

This implementation follows the core principles of Support Vector Machines, where the main objective is to find the optimal hyperplane that separates classes. The classifier is initialized with three key parameters: 
- the learning rate controls how quickly the model adjusts during training, 
- the regularization parameter (lambda) prevents overfitting by balancing the margin and errors, and 
- the number of iterations sets how many times the algorithm should iterate over the dataset to optimize the hyperplane. 

Inside the class, the **fit** method is used to train the model by adjusting the weights (w) and bias (b) through gradient descent. During training, each data point is classified based on whether it satisfies the margin condition, and if it doesn't, both the weights and the bias are updated accordingly. 

The **predict** method uses the learned weights and bias to classify new instances by calculating the sign of the decision boundary.

In [2]:
class SVM:
    def __init__(self, learning_rate=0.001, lambda_param=0.01, n_iters=1000):
        """
        Initialize the SVM with hyperparameters for learning rate, regularization, and number of iterations.
        
        learning_rate: The step size for each iteration of gradient descent.
        lambda_param: Regularization parameter to prevent overfitting.
        n_iters: The number of times the algorithm will iterate over the dataset to find the optimal hyperplane.
        """
        self.learning_rate = learning_rate  # Controls the speed of convergence
        self.lambda_param = lambda_param    # Regularization parameter (controls the margin)
        self.n_iters = n_iters              # Number of training iterations
        self.w = None                       # Weight vector (learned parameters)
        self.b = None                       # Bias term

    def fit(self, X, y):
        """
        Train the SVM model using the training data.
        
        X: The training feature matrix (n_samples, n_features)
        y: The training labels (n_samples,). Labels should be in {-1, 1}.
        
        This method applies gradient descent to optimize the weights (w) and bias (b)
        to maximize the margin between the classes.
        """
        n_samples, n_features = X.shape
        y_ = np.where(y <= 0, -1, 1)  # Convert labels to -1 and 1 for SVM

        # Initialize the weight vector and bias term
        self.w = np.zeros(n_features)
        self.b = 0

        # Gradient descent optimization loop
        for _ in range(self.n_iters):
            for idx, x_i in enumerate(X):
                condition = y_[idx] * (np.dot(x_i, self.w) - self.b) >= 1
                if condition:
                    # If the condition holds, apply a regularization update (no penalty)
                    self.w -= self.learning_rate * (2 * self.lambda_param * self.w)
                else:
                    # If the condition fails, apply the update to w and b to penalize the misclassification
                    self.w -= self.learning_rate * (2 * self.lambda_param * self.w - np.dot(x_i, y_[idx]))
                    self.b -= self.learning_rate * y_[idx]

    def predict(self, X):
        """
        Predict the labels for the test data.
        
        X: Test feature matrix.
        
        Returns the predicted labels (either -1 or 1).
        """
        # Return the sign of the dot product (linear decision boundary)
        return np.sign(np.dot(X, self.w) - self.b)

### Implementation example

Following the SVM definition, the Weather (Play Tennis) dataset is loaded into memory using pandas. This dataset includes several weather-related attributes such as outlook, temperature, humidity, and windy, which are used to predict whether tennis can be played. This stage prepares the raw data for further preprocessing.

Next step, data preprocessing is essential because SVM requires numerical input. Each categorical feature in the dataset (e.g., sunny, hot, normal) is mapped to a corresponding integer value, allowing the algorithm to process the information effectively. The target label (whether tennis can be played or not) is also converted to numerical values (0 for "no", and 1 for "yes").

After the preprocessing is complete, the dataset is split into features and labels. The features matrix (X) includes all the weather conditions (outlook, temperature, humidity, windy), while the labels vector (y) contains the target values (play tennis: yes or no). This separation allows the model to learn from the feature set while predicting the corresponding target labels.

In [3]:
# Load the dataset (assumed to be uploaded or present in the local system)
data = pd.read_csv('weather.nominal.csv')
print(data.head())

# Convert categorical variables into numerical values for KNN
data['outlook'] = data['outlook'].map({'sunny': 0, 'overcast': 1, 'rainy': 2})
data['temperature'] = data['temperature'].map({'hot': 0, 'mild': 1, 'cool': 2})
data['humidity'] = data['humidity'].map({'high': 0, 'normal': 1})
data['windy'] = data['windy'].astype(int)
data['play'] = data['play'].map({'no': -1, 'yes': 1})

# Define X (features) and y (labels)
X = data.drop(columns='play').values  # Features
y = data['play'].values  # Labels

# Show the first transformed data
#print(X)

    outlook temperature humidity  windy play
0     sunny         hot     high  False   no
1     sunny         hot     high   True   no
2  overcast         hot     high  False  yes
3     rainy        mild     high  False  yes
4     rainy        cool   normal  False  yes


Next, the SVM classifier is initialized with the specified hyperparameters, and the training process is executed using the fit method. The model iterates over the dataset multiple times, adjusting the hyperplane to maximize the margin between the two classes (yes and no). Through each iteration, it improves its prediction ability by refining the weight vector and bias to minimize classification errors.

In [4]:
# Initialize and Train the SVM Classifier
svm = SVM(learning_rate=0.001, lambda_param=0.01, n_iters=1000)
svm.fit(X, y)

# This step initializes the SVM classifier with a specified learning rate, regularization parameter (lambda), and the number of iterations.
# The `fit` method is used to train the classifier using the feature matrix X and the target labels y.
# The model optimizes the weights and bias to create a hyperplane that separates the classes.

Once the model is trained, a test instance is created to simulate new weather conditions (sunny, hot, normal humidity, and windy). This instance is represented as a numerical array, where each feature is encoded to match the format used during training. Finally, the classifier's predict method is employed to classify this test instance. Based on the learned hyperplane, the model outputs whether it is a suitable day to play tennis, and the result is displayed as either "yes" or "no".

In [5]:
# Create a Test Instance
test_instance = np.array([[0, 0, 1, 1]])  # sunny, hot, normal, TRUE

# This step defines a new test instance that represents a weather condition: sunny, hot, normal humidity, and windy.
# The instance is passed as a NumPy array, where each feature is encoded numerically (sunny = 0, hot = 0, normal = 1, windy = TRUE = 1).

# Make a Prediction
prediction = svm.predict(test_instance)
print(f"Prediction for the test instance: {'yes' if prediction[0] == 1 else 'no'}")

# Finally, the trained SVM classifier makes a prediction on the test instance.
# The `predict` method returns a label: either 1 (yes, play tennis) or -1 (no, don't play tennis).
# The output is printed, interpreting the predicted label in the context of the weather conditions.

Prediction for the test instance: yes


## Scikit-learn implementation

Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression and outliers detection. The advantages of support vector machines are:

- Effective in high dimensional spaces.
- Still effective in cases where number of dimensions is greater than the number of samples.
- Uses a subset of training points in the decision function (called support vectors), so it is also memory efficient.
- Versatile: different Kernel functions can be specified for the decision function. Common kernels are provided, but it is also possible to specify custom kernels.

The disadvantages of support vector machines include:
- If the number of features is much greater than the number of samples, avoid over-fitting in choosing Kernel functions and regularization term is crucial.

SVC, NuSVC and LinearSVC are classes capable of performing binary and multi-class classification on a dataset.
C-Support Vector Classification (SVC) implementation is based on libsvm. The fit time scales at least quadratically with the number of samples and may be impractical beyond tens of thousands of samples. For large datasets consider using LinearSVC or SGDClassifier instead, possibly after a Nystroem transformer or other Kernel Approximation.

The multiclass support is handled according to a one-vs-one scheme.

### Library

In [6]:
from sklearn import svm

### Dataset

In [7]:
df = pd.read_csv(r'weather.numeric.csv')

Show dataset

In [8]:
print(df)

    Day   Outlook  Temperature  Humidity    Wind   Play
0     1     sunny           85        85    weak  False
1     2     sunny           80        90  strong  False
2     3  overcast           83        86    weak   True
3     4      rain           70        96    weak   True
4     5      rain           68        80    weak   True
5     6      rain           65        70  strong  False
6     7  overcast           64        65  strong   True
7     8     sunny           72        95    weak  False
8     9     sunny           69        70    weak   True
9    10      rain           75        80    weak   True
10   11     sunny           75        70  strong   True
11   12  overcast           72        90  strong   True
12   13  overcast           81        75    weak   True
13   14      rain           71        91  strong  False


In [9]:
# defining the dependent and independent variables
X_train = df[['Outlook', 'Temperature', 'Humidity', 'Wind']]
y_train = df[['Play']]

print(X_train.head())
print(y_train.head())

    Outlook  Temperature  Humidity    Wind
0     sunny           85        85    weak
1     sunny           80        90  strong
2  overcast           83        86    weak
3      rain           70        96    weak
4      rain           68        80    weak
    Play
0  False
1  False
2   True
3   True
4   True


### From categorical to numeric

In [10]:
from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()

outlook = X_train.iloc[:,0]
outlook_enc = encoder.fit_transform(outlook)
print(outlook.tolist())
print(outlook_enc)

wind = X_train.iloc[:,3]
wind_enc = encoder.fit_transform(wind)
print(wind.tolist())
print(wind_enc)

['sunny', 'sunny', 'overcast', 'rain', 'rain', 'rain', 'overcast', 'sunny', 'sunny', 'rain', 'sunny', 'overcast', 'overcast', 'rain']
[2 2 0 1 1 1 0 2 2 1 2 0 0 1]
['weak', 'strong', 'weak', 'weak', 'weak', 'strong', 'strong', 'weak', 'weak', 'weak', 'strong', 'strong', 'weak', 'strong']
[1 0 1 1 1 0 0 1 1 1 0 0 1 0]


In [11]:
df_outlook = pd.DataFrame(outlook_enc, columns = ['Outlook'])
df_wind = pd.DataFrame(outlook_enc, columns = ['Wind'])
X_train_num = pd.concat([df_outlook, X_train.iloc[:,1], X_train.iloc[:,2], df_wind], axis=1)
print(X_train_num)

    Outlook  Temperature  Humidity  Wind
0         2           85        85     2
1         2           80        90     2
2         0           83        86     0
3         1           70        96     1
4         1           68        80     1
5         1           65        70     1
6         0           64        65     0
7         2           72        95     2
8         2           69        70     2
9         1           75        80     1
10        2           75        70     2
11        0           72        90     0
12        0           81        75     0
13        1           71        91     1


### Generación del modelo

SVC, NuSVC and LinearSVC are classes capable of performing binary and multi-class classification on a dataset.
SVC and NuSVC are similar methods, but accept slightly different sets of parameters and have different mathematical formulations. On the other hand, LinearSVC is another (faster) implementation of Support Vector Classification for the case of a linear kernel.

In [12]:
clf = svm.SVC().fit(X_train_num, y_train)

  y = column_or_1d(y, warn=True)


### Evaluando modelo con nueva instancia

In [13]:
# sunny:2, hot:85, normal:65, strong:0 
new_example = [[2, 60, 65, 1]]
X_test = pd.DataFrame(new_example, columns = ['Outlook', 'Temperature', 'Humidity', 'Wind'])
print(X_test)
print(clf.predict(X_test))

   Outlook  Temperature  Humidity  Wind
0        2           60        65     1
[ True]


### Support vectors

SVMs decision function (detailed in the Mathematical formulation) depends on some subset of the training data, called the support vectors. Some properties of these support vectors can be found in attributes support_vectors_, support_ and n_support_:

In [14]:
# get support vectors
print(clf.support_vectors_)

# get indices of support vectors
print(clf.support_)

# get number of support vectors for each class
print(clf.n_support_)

[[ 2. 85. 85.  2.]
 [ 2. 80. 90.  2.]
 [ 1. 65. 70.  1.]
 [ 2. 72. 95.  2.]
 [ 1. 71. 91.  1.]
 [ 0. 83. 86.  0.]
 [ 1. 70. 96.  1.]
 [ 1. 68. 80.  1.]
 [ 1. 75. 80.  1.]
 [ 2. 75. 70.  2.]
 [ 0. 72. 90.  0.]
 [ 0. 81. 75.  0.]]
[ 0  1  5  7 13  2  3  4  9 10 11 12]
[5 7]
