# Assignment #4: Support Vector Machines (Multiclassification)

### Luke Schwenke

### April 29, 2023

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, classification_report
import matplotlib.pyplot as plt
from sklearn.svm import SVC

In [2]:
import os
os.chdir('/Users/lmschwenke/Downloads')

In [3]:
train = pd.read_csv('train_data.csv')
test = pd.read_csv('test_data.csv')

In [33]:
print(train.shape)
print(test.shape)

(507, 148)
(168, 148)


In [34]:
train.head(3)

Unnamed: 0,class,BrdIndx,Area,Round,Bright,Compact,ShpIndx,Mean_G,Mean_R,Mean_NIR,...,SD_NIR_140,LW_140,GLCM1_140,Rect_140,GLCM2_140,Dens_140,Assym_140,NDVI_140,BordLngth_140,GLCM3_140
0,concrete,1.32,131,0.81,222.74,1.66,2.18,192.94,235.11,240.15,...,31.15,5.04,0.8,0.58,8.56,0.82,0.98,-0.1,1512,1287.52
1,shadow,1.59,864,0.94,47.56,1.41,1.87,36.82,48.78,57.09,...,12.01,3.7,0.52,0.96,7.01,1.69,0.86,-0.14,196,2659.74
2,shadow,1.41,409,1.0,51.38,1.37,1.53,41.72,51.96,60.48,...,18.75,3.09,0.9,0.63,8.32,1.38,0.84,0.1,1198,720.38


In [31]:
train['class'].value_counts()

building     97
concrete     93
tree         89
grass        83
shadow       45
asphalt      45
car          21
soil         20
pool         14
Name: class, dtype: int64

In [5]:
train = train.dropna()
test = test.dropna()

In [6]:
train.columns

Index(['class', 'BrdIndx', 'Area', 'Round', 'Bright', 'Compact', 'ShpIndx',
       'Mean_G', 'Mean_R', 'Mean_NIR',
       ...
       'SD_NIR_140', 'LW_140', 'GLCM1_140', 'Rect_140', 'GLCM2_140',
       'Dens_140', 'Assym_140', 'NDVI_140', 'BordLngth_140', 'GLCM3_140'],
      dtype='object', length=148)

In [7]:
# Create X/y Train and Test
X_train = train.drop('class', axis=1)
y_train = train['class']

X_test = test.drop('class', axis=1)
y_test = test['class']

In [8]:
# Scale X_train
scaler = StandardScaler()
scaler.fit(X_train)
X_train = pd.DataFrame(scaler.transform(X_train), columns=X_train.columns)

# Scale X_test
scaler.fit(X_test)
X_test = pd.DataFrame(scaler.transform(X_test), columns=X_test.columns)

## Random Forest Classifier - Base Model

In [9]:
rf = RandomForestClassifier()
rf.fit(X_train, y_train)

RandomForestClassifier()

In [10]:
y_pred_class = rf.predict(X_test)

In [11]:
print("Random Forest Base Test - Confusion Matrix:\n", confusion_matrix(y_test, y_pred_class),"\n\n")
print("Random Forest Base Test - Classification Report:\n", classification_report(y_test, y_pred_class))

Random Forest Base Test - Confusion Matrix:
 [[14  0  0  0  0  0  0  0  0]
 [ 1 19  0  4  1  0  0  0  0]
 [ 0  1 13  0  0  1  0  0  0]
 [ 0  2  0 20  0  0  0  1  0]
 [ 0  0  0  0 25  0  0  0  4]
 [ 1  0  1  0  0 13  0  0  0]
 [ 3  0  0  0  0  0 13  0  0]
 [ 0  1  0  5  2  0  0  6  0]
 [ 0  0  0  1  1  0  0  0 15]] 


Random Forest Base Test - Classification Report:
               precision    recall  f1-score   support

    asphalt        0.74      1.00      0.85        14
   building        0.83      0.76      0.79        25
        car        0.93      0.87      0.90        15
   concrete        0.67      0.87      0.75        23
      grass        0.86      0.86      0.86        29
       pool        0.93      0.87      0.90        15
     shadow        1.00      0.81      0.90        16
       soil        0.86      0.43      0.57        14
       tree        0.79      0.88      0.83        17

    accuracy                           0.82       168
   macro avg       0.84      0.82  

In [12]:
y_pred_class_train = rf.predict(X_train)

print("Random Forest Base Train - Confusion Matrix:\n", confusion_matrix(y_train, y_pred_class_train),"\n\n")
print("Random Forest Base Train - Classification Report:\n", classification_report(y_train, y_pred_class_train))

Random Forest Base Train - Confusion Matrix:
 [[45  0  0  0  0  0  0  0  0]
 [ 0 97  0  0  0  0  0  0  0]
 [ 0  0 21  0  0  0  0  0  0]
 [ 0  0  0 93  0  0  0  0  0]
 [ 0  0  0  0 83  0  0  0  0]
 [ 0  0  0  0  0 14  0  0  0]
 [ 0  0  0  0  0  0 45  0  0]
 [ 0  0  0  0  0  0  0 20  0]
 [ 0  0  0  0  0  0  0  0 89]] 


Random Forest Base Train - Classification Report:
               precision    recall  f1-score   support

    asphalt        1.00      1.00      1.00        45
   building        1.00      1.00      1.00        97
        car        1.00      1.00      1.00        21
   concrete        1.00      1.00      1.00        93
      grass        1.00      1.00      1.00        83
       pool        1.00      1.00      1.00        14
     shadow        1.00      1.00      1.00        45
       soil        1.00      1.00      1.00        20
       tree        1.00      1.00      1.00        89

    accuracy                           1.00       507
   macro avg       1.00      1.00

In [13]:
importances = rf.feature_importances_

# Sort feature importances in descending order
indices = np.argsort(importances)[::-1]

# Rearrange feature names so they match the sorted feature importances
names = [X_train.columns[i] for i in indices]

# Print the top 5 features
print("Top 5 Features:")
for i in range(5):
    print(f"{i+1}. {names[i]} ({importances[indices[i]]:.4f})")

Top 5 Features:
1. NDVI (0.0476)
2. Mean_NIR (0.0314)
3. Mean_R_40 (0.0307)
4. Mean_R (0.0278)
5. Mean_NIR_40 (0.0267)


## Linear SVM Classifier - Base Model

In [14]:
svm = SVC(kernel='linear')
svm.fit(X_train, y_train)

SVC(kernel='linear')

In [15]:
svm_test_preds = svm.predict(X_test)

print("Linear SVM Base Test - Confusion Matrix:\n", confusion_matrix(y_test, svm_test_preds),"\n\n")
print("Linear SVM Base Test - Classification Report:\n", classification_report(y_test, svm_test_preds))

Linear SVM Base Test - Confusion Matrix:
 [[13  0  0  0  0  0  1  0  0]
 [ 0 22  1  2  0  0  0  0  0]
 [ 0  0 13  1  0  0  0  1  0]
 [ 0  4  0 16  0  0  0  3  0]
 [ 0  0  0  1 22  0  0  0  6]
 [ 0  2  0  0  0 12  1  0  0]
 [ 1  0  0  0  0  0 15  0  0]
 [ 0  0  1  4  3  0  0  6  0]
 [ 0  0  0  1  1  0  0  0 15]] 


Linear SVM Base Test - Classification Report:
               precision    recall  f1-score   support

    asphalt        0.93      0.93      0.93        14
   building        0.79      0.88      0.83        25
        car        0.87      0.87      0.87        15
   concrete        0.64      0.70      0.67        23
      grass        0.85      0.76      0.80        29
       pool        1.00      0.80      0.89        15
     shadow        0.88      0.94      0.91        16
       soil        0.60      0.43      0.50        14
       tree        0.71      0.88      0.79        17

    accuracy                           0.80       168
   macro avg       0.81      0.80      0.

In [16]:
svm_train_preds = svm.predict(X_train)

print("Linear SVM Base Train - Confusion Matrix:\n", confusion_matrix(y_train, svm_train_preds),"\n\n")
print("Linear SVM Base Train - Classification Report:\n", classification_report(y_train, svm_train_preds))

Linear SVM Base Train - Confusion Matrix:
 [[45  0  0  0  0  0  0  0  0]
 [ 0 97  0  0  0  0  0  0  0]
 [ 0  0 21  0  0  0  0  0  0]
 [ 0  0  0 93  0  0  0  0  0]
 [ 0  0  0  0 83  0  0  0  0]
 [ 0  0  0  0  0 14  0  0  0]
 [ 0  0  0  0  0  0 45  0  0]
 [ 0  0  0  0  0  0  0 20  0]
 [ 0  0  0  0  0  0  0  0 89]] 


Linear SVM Base Train - Classification Report:
               precision    recall  f1-score   support

    asphalt        1.00      1.00      1.00        45
   building        1.00      1.00      1.00        97
        car        1.00      1.00      1.00        21
   concrete        1.00      1.00      1.00        93
      grass        1.00      1.00      1.00        83
       pool        1.00      1.00      1.00        14
     shadow        1.00      1.00      1.00        45
       soil        1.00      1.00      1.00        20
       tree        1.00      1.00      1.00        89

    accuracy                           1.00       507
   macro avg       1.00      1.00      

## Support Vector Machine Classifier + Linear Kernel + Grid Search

In [17]:
param_grid = {'C': np.arange(0.01, 10.01, 0.2)}

svm = SVC(kernel='linear')

# Perform grid search cross-validation to find the best hyperparameters
grid_search = GridSearchCV(svm, param_grid=param_grid, cv=5, verbose=0)
grid_search.fit(X_train, y_train)

GridSearchCV(cv=5, estimator=SVC(kernel='linear'),
             param_grid={'C': array([0.01, 0.21, 0.41, 0.61, 0.81, 1.01, 1.21, 1.41, 1.61, 1.81, 2.01,
       2.21, 2.41, 2.61, 2.81, 3.01, 3.21, 3.41, 3.61, 3.81, 4.01, 4.21,
       4.41, 4.61, 4.81, 5.01, 5.21, 5.41, 5.61, 5.81, 6.01, 6.21, 6.41,
       6.61, 6.81, 7.01, 7.21, 7.41, 7.61, 7.81, 8.01, 8.21, 8.41, 8.61,
       8.81, 9.01, 9.21, 9.41, 9.61, 9.81])})

In [18]:
print(f'Best parameters: {grid_search.best_params_}')
print(f'Mean test score: {grid_search.best_score_:.3f}')

# Use the best model to predict on test data
best_model = grid_search.best_estimator_
svm_grid_preds_test = best_model.predict(X_test)

Best parameters: {'C': 0.01}
Mean test score: 0.809


In [19]:
print("SVM + Linear Kernel + Grid Search Test - Confusion Matrix:\n", confusion_matrix(y_test, svm_grid_preds_test),"\n\n")
print("SVM + Linear Kernel + Grid Search Test - Classification Report:\n", classification_report(y_test, svm_grid_preds_test))

SVM + Linear Kernel + Grid Search Test - Confusion Matrix:
 [[13  0  0  0  0  0  1  0  0]
 [ 0 22  0  2  1  0  0  0  0]
 [ 0  1 14  0  0  0  0  0  0]
 [ 0  3  0 19  0  0  0  1  0]
 [ 0  0  0  1 25  0  0  0  3]
 [ 0  1  0  0  0 13  1  0  0]
 [ 3  0  0  0  0  0 13  0  0]
 [ 0  1  0  6  3  0  0  4  0]
 [ 0  0  0  1  0  0  0  0 16]] 


SVM + Linear Kernel + Grid Search Test - Classification Report:
               precision    recall  f1-score   support

    asphalt        0.81      0.93      0.87        14
   building        0.79      0.88      0.83        25
        car        1.00      0.93      0.97        15
   concrete        0.66      0.83      0.73        23
      grass        0.86      0.86      0.86        29
       pool        1.00      0.87      0.93        15
     shadow        0.87      0.81      0.84        16
       soil        0.80      0.29      0.42        14
       tree        0.84      0.94      0.89        17

    accuracy                           0.83       168
   ma

In [20]:
svm_grid_preds_train = best_model.predict(X_train)

print("SVM + Linear Kernel + Grid Search Train - Confusion Matrix:\n", confusion_matrix(y_train, svm_grid_preds_train),"\n\n")
print("SVM + Linear Kernel + Grid Search Train - Classification Report:\n", classification_report(y_train, svm_grid_preds_train))

SVM + Linear Kernel + Grid Search Train - Confusion Matrix:
 [[40  0  0  0  0  0  5  0  0]
 [ 2 87  0  7  0  0  1  0  0]
 [ 0  1 19  1  0  0  0  0  0]
 [ 0  9  0 83  1  0  0  0  0]
 [ 0  1  0  0 70  0  0  0 12]
 [ 0  1  0  0  1 12  0  0  0]
 [ 1  0  0  0  0  0 43  0  1]
 [ 0  3  0  4  2  0  0 11  0]
 [ 0  0  0  0  3  0  1  0 85]] 


SVM + Linear Kernel + Grid Search Train - Classification Report:
               precision    recall  f1-score   support

    asphalt        0.93      0.89      0.91        45
   building        0.85      0.90      0.87        97
        car        1.00      0.90      0.95        21
   concrete        0.87      0.89      0.88        93
      grass        0.91      0.84      0.88        83
       pool        1.00      0.86      0.92        14
     shadow        0.86      0.96      0.91        45
       soil        1.00      0.55      0.71        20
       tree        0.87      0.96      0.91        89

    accuracy                           0.89       507
   

##  Support Vector Machine Classifier + Polynomial Kernel + Grid Search

In [21]:
param_grid = {'C': np.arange(0.01, 10.01, 0.2), 'degree': [2, 3, 4, 5, 6]}

svm = SVC(kernel='poly')

# Perform grid search cross-validation to find the best hyperparameters
grid_search = GridSearchCV(svm, param_grid=param_grid, cv=5, verbose=0)
grid_search.fit(X_train, y_train)

GridSearchCV(cv=5, estimator=SVC(kernel='poly'),
             param_grid={'C': array([0.01, 0.21, 0.41, 0.61, 0.81, 1.01, 1.21, 1.41, 1.61, 1.81, 2.01,
       2.21, 2.41, 2.61, 2.81, 3.01, 3.21, 3.41, 3.61, 3.81, 4.01, 4.21,
       4.41, 4.61, 4.81, 5.01, 5.21, 5.41, 5.61, 5.81, 6.01, 6.21, 6.41,
       6.61, 6.81, 7.01, 7.21, 7.41, 7.61, 7.81, 8.01, 8.21, 8.41, 8.61,
       8.81, 9.01, 9.21, 9.41, 9.61, 9.81]),
                         'degree': [2, 3, 4, 5, 6]})

In [22]:
print(f'Best parameters: {grid_search.best_params_}')
print(f'Mean test score: {grid_search.best_score_:.3f}')

# Use the best model to predict on test data
best_model = grid_search.best_estimator_
svm_pred_test_poly = best_model.predict(X_test)

Best parameters: {'C': 3.81, 'degree': 3}
Mean test score: 0.789


In [23]:
print("SVM + Polynomial Kernel + Grid Search Test - Confusion Matrix:\n", confusion_matrix(y_test, svm_pred_test_poly),"\n\n")
print("SVM + Polynomial Kernel + Grid Search Test - Classification Report:\n", classification_report(y_test, svm_pred_test_poly))

SVM + Polynomial Kernel + Grid Search Test - Confusion Matrix:
 [[13  0  0  0  0  0  1  0  0]
 [ 0 18  0  4  3  0  0  0  0]
 [ 0  2 11  0  0  1  0  1  0]
 [ 0  3  0 19  1  0  0  0  0]
 [ 0  0  0  0 26  0  0  1  2]
 [ 0  4  0  0  0 10  1  0  0]
 [ 1  0  0  0  0  0 14  0  1]
 [ 0  1  0  5  8  0  0  0  0]
 [ 0  0  0  1  3  0  0  0 13]] 


SVM + Polynomial Kernel + Grid Search Test - Classification Report:
               precision    recall  f1-score   support

    asphalt        0.93      0.93      0.93        14
   building        0.64      0.72      0.68        25
        car        1.00      0.73      0.85        15
   concrete        0.66      0.83      0.73        23
      grass        0.63      0.90      0.74        29
       pool        0.91      0.67      0.77        15
     shadow        0.88      0.88      0.88        16
       soil        0.00      0.00      0.00        14
       tree        0.81      0.76      0.79        17

    accuracy                           0.74       1

In [24]:
svm_grid_preds_train_poly = best_model.predict(X_train)

print("SVM + Polynomial Kernel + Grid Search Train - Confusion Matrix:\n", confusion_matrix(y_train, svm_grid_preds_train_poly),"\n\n")
print("SVM + Polynomial Kernel + Grid Search Train - Classification Report:\n", classification_report(y_train, svm_grid_preds_train_poly))

SVM + Polynomial Kernel + Grid Search Train - Confusion Matrix:
 [[44  0  0  0  1  0  0  0  0]
 [ 0 95  0  1  1  0  0  0  0]
 [ 0  0 20  0  1  0  0  0  0]
 [ 0  1  0 91  1  0  0  0  0]
 [ 0  1  0  0 81  0  0  0  1]
 [ 0  0  0  0  1 13  0  0  0]
 [ 0  0  0  0  0  0 45  0  0]
 [ 0  0  0  0 11  0  0  9  0]
 [ 0  0  0  0  5  0  0  0 84]] 


SVM + Polynomial Kernel + Grid Search Train - Classification Report:
               precision    recall  f1-score   support

    asphalt        1.00      0.98      0.99        45
   building        0.98      0.98      0.98        97
        car        1.00      0.95      0.98        21
   concrete        0.99      0.98      0.98        93
      grass        0.79      0.98      0.88        83
       pool        1.00      0.93      0.96        14
     shadow        1.00      1.00      1.00        45
       soil        1.00      0.45      0.62        20
       tree        0.99      0.94      0.97        89

    accuracy                           0.95      

## Support Vector Machine Classifier + RBF Kernel + Grid Search

In [25]:
param_grid = {'C': np.arange(0.01, 10.2, 0.2), 'gamma': [0.01, 0.1, 1, 10, 100]}

# Initialize an SVM model with an RBF kernel
svm = SVC(kernel='rbf')

# Perform grid search
grid_search = GridSearchCV(svm, param_grid=param_grid, cv=5, verbose=0)
grid_search.fit(X_train, y_train)

GridSearchCV(cv=5, estimator=SVC(),
             param_grid={'C': array([1.000e-02, 2.100e-01, 4.100e-01, 6.100e-01, 8.100e-01, 1.010e+00,
       1.210e+00, 1.410e+00, 1.610e+00, 1.810e+00, 2.010e+00, 2.210e+00,
       2.410e+00, 2.610e+00, 2.810e+00, 3.010e+00, 3.210e+00, 3.410e+00,
       3.610e+00, 3.810e+00, 4.010e+00, 4.210e+00, 4.410e+00, 4.610e+00,
       4.810e+00, 5.010e+00, 5.210e+00, 5.410e+00, 5.610e+00, 5.810e+00,
       6.010e+00, 6.210e+00, 6.410e+00, 6.610e+00, 6.810e+00, 7.010e+00,
       7.210e+00, 7.410e+00, 7.610e+00, 7.810e+00, 8.010e+00, 8.210e+00,
       8.410e+00, 8.610e+00, 8.810e+00, 9.010e+00, 9.210e+00, 9.410e+00,
       9.610e+00, 9.810e+00, 1.001e+01]),
                         'gamma': [0.01, 0.1, 1, 10, 100]})

In [26]:
# Print the best hyperparameters and the corresponding mean test score
print(f'Best parameters: {grid_search.best_params_}')
print(f'Mean test score: {grid_search.best_score_:.3f}')

Best parameters: {'C': 2.81, 'gamma': 0.01}
Mean test score: 0.828


In [27]:
# Use the best model to predict on test data
best_model = grid_search.best_estimator_
y_pred_rbf_grid = best_model.predict(X_test)

In [28]:
print("SVM + RBF Kernel + Grid Search Test - Confusion Matrix:\n", confusion_matrix(y_test, y_pred_rbf_grid),"\n\n")
print("SVM + RBF Kernel + Grid Search Test - Classification Report:\n", classification_report(y_test, y_pred_rbf_grid))

SVM + RBF Kernel + Grid Search Test - Confusion Matrix:
 [[13  0  0  0  0  0  1  0  0]
 [ 0 19  0  5  1  0  0  0  0]
 [ 0  0 14  1  0  0  0  0  0]
 [ 0  3  0 20  0  0  0  0  0]
 [ 0  1  0  0 24  0  0  0  4]
 [ 0  0  0  0  0 14  1  0  0]
 [ 1  0  0  1  0  0 14  0  0]
 [ 0  1  0  5  3  0  0  5  0]
 [ 0  0  0  1  0  0  0  0 16]] 


SVM + RBF Kernel + Grid Search Test - Classification Report:
               precision    recall  f1-score   support

    asphalt        0.93      0.93      0.93        14
   building        0.79      0.76      0.78        25
        car        1.00      0.93      0.97        15
   concrete        0.61      0.87      0.71        23
      grass        0.86      0.83      0.84        29
       pool        1.00      0.93      0.97        15
     shadow        0.88      0.88      0.88        16
       soil        1.00      0.36      0.53        14
       tree        0.80      0.94      0.86        17

    accuracy                           0.83       168
   macro av

In [29]:
y_pred_rbf_grid_train = best_model.predict(X_train)

print("SVM + RBF Kernel + Grid Search Train - Confusion Matrix:\n", confusion_matrix(y_train, y_pred_rbf_grid_train),"\n\n")
print("SVM + RBF Kernel + Grid Search Train - Classification Report:\n", classification_report(y_train, y_pred_rbf_grid_train))

SVM + RBF Kernel + Grid Search Train - Confusion Matrix:
 [[45  0  0  0  0  0  0  0  0]
 [ 0 96  0  1  0  0  0  0  0]
 [ 0  0 21  0  0  0  0  0  0]
 [ 0  1  0 92  0  0  0  0  0]
 [ 0  1  0  0 81  0  0  0  1]
 [ 0  0  0  0  0 14  0  0  0]
 [ 0  0  0  0  0  0 45  0  0]
 [ 0  1  0  0  0  0  0 19  0]
 [ 0  0  0  0  1  0  0  0 88]] 


SVM + RBF Kernel + Grid Search Train - Classification Report:
               precision    recall  f1-score   support

    asphalt        1.00      1.00      1.00        45
   building        0.97      0.99      0.98        97
        car        1.00      1.00      1.00        21
   concrete        0.99      0.99      0.99        93
      grass        0.99      0.98      0.98        83
       pool        1.00      1.00      1.00        14
     shadow        1.00      1.00      1.00        45
       soil        1.00      0.95      0.97        20
       tree        0.99      0.99      0.99        89

    accuracy                           0.99       507
   macro 

## Concepts

1 - From the models run in steps 2-6, which performs the best based on the Classification Report? Support your reasoning with evidence around your test data. 

* The best performing model was **SVM with RBF Kernel** with a test set precision of 85%, recall of 83%, and f1-score of 83%. These were the highest values compared to the other models.

2 - Compare models run for steps 4-6 where different kernels were used. What is the benefit of using a polynomial or rbf kernel over a linear kernel? What could be a downside of using a polynomial or rbf kernel? 

* The 3 SVM kernels performed well but the 2 strongest were between Linear and RBF, only differing slightly in their recall values. Additionally, the C values were quite different accross the 3 (0.01, 3.81, 2.81 for Linear, Poly, RBF).

* The main benefit of using a polynomial or RBF kernel over a linear kernel is that they can capture more complex patterns in the data, allowing for more accurate predictions. This is particularly useful when the relationship between input variables and output variables is non-linear or when the data is not separable by a straight line.

* The downsides is that they can be more computationally expensive than a linear kernel and can possibly overfit the model to the training data.

3 - Explain the 'C' parameter used in steps 4-6. What does a small C mean versus a large C in sklearn? Why is it important to use the 'C' parameter when fitting a model? 

* The C parameter controls the distance between the decision boundary and the closest datapoints for each class. SVM wants to find the decision boundary that maximizes the margin while still correctly classifying all the training data.

* A smaller value of C creates a wider margin at the cost of allowing some training examples to be misclassified. Conversely, a larger value of C creates a narrower margin, which may allow for fewer misclassifications. If C is too large, the model may overfit the training data and generalize poorly to new, unseen data. On the other hand, if C is too small, the model may underfit the training data, resulting in poor performance on both the training and test data. Therefore, it's important to choose an appropriate value of C that balances the trade-off between model complexity and generalization performance. This can be achieved through techniques such as grid search or cross-validation.

4 - Scaling our input data does not matter much for Random Forest, but it is a critical step for Support Vector Machines. Explain why this is such a critical step. Also, provide an example of a feature from this data set that could cause issues with our SVMs if not scaled.

* SVMs aim to find the decision boundary that maximizes the margin between classes, and the margin is affected by the scale of the input features. When input features have very different scales, the feature with the larger scale will have a greater influence on the decision boundary than the features with smaller scales. This can lead to biased results and make the SVM more likely to misclassify data points.

* A lot of the features in this dataset are numeric but are on different scales. The Area column for example ranges between 22 and 5,767. Compared to some other columns in this dataset, these are very large numbers that would cause issues if not scaled.

5 - Describe conceptually what the purpose of a kernel is for Support Vector Machines.

* The purpose of a kernel in Support Vector Machines (SVM) is to transform the input data into a higher-dimensional space where it is easier to separate the different classes using a linear decision boundary. In other words, a kernel allows SVM to learn a nonlinear decision boundary in a transformed feature space without explicitly computing the coordinates of the data in that space.

* By using a kernel, SVM can find a decision boundary that maximizes the margin between the classes in the transformed feature space. This can lead to better generalization performance and less overfitting compared to other classification algorithms that do not use a kernel, especially when the number of input features is large or when the data is not linearly separable in the original feature space.