### Assignment 4: Multiclassification of urban land using Support Vector Machines
#### Natalie Kim
UCI's Urban Land Cover Data Set: the classification of urban land cover into 9 different classes that are fairly balanced across train & test. Multi-scale spectral, size, shape, and texture information is used for classification.

Use the SVC & LinearSVC classifiers from sklearn.svm. Use RandomForestClassifier as a baseline to compare its performance against the support vector machines.

#### 1. Data Processing

##### a) Import the data: You are provided separate .csv files for train and test.
- Train shape (507, 148)
- Test shape (168, 148)

In [1]:
import pandas as pd

In [2]:
train = pd.read_csv('/Users/xnxk040/Library/CloudStorage/OneDrive-TheUniversityofChicago/machine learning/ml data/train_data.csv')
test = pd.read_csv('/Users/xnxk040/Library/CloudStorage/OneDrive-TheUniversityofChicago/machine learning/ml data/test_data.csv')

In [3]:
train.shape

(507, 148)

In [4]:
test.shape

(168, 148)

##### b) Remove any rows that have missing data across both sets of data

In [5]:
# Remove NaN and NA
train_cleaned = train.dropna()
test_cleaned = test.dropna()

##### c) The target variable (dependent variable) is called "class", make sure to separate this out into a "y_train" and "y_test" and remove from your "X_train" and "X_test"

In [7]:
X_train = train_cleaned.drop(columns='class')
y_train = train_cleaned['class']

X_test = test_cleaned.drop(columns='class')
y_test = test_cleaned['class']

##### d) Scale all features / predictors (NOT THE TARGET VARIABLE)
Use "StandardScaler". Note: scaling is needed because SVMs are sensitive to the scale of the features.

In [8]:
from sklearn.preprocessing import StandardScaler

In [9]:
scaler = StandardScaler()
scaler.fit(X_train)

In [10]:
# scaled data
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
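
As a sanity check on what the scaling step does, the sketch below reproduces the `StandardScaler` transform by hand. The arrays `X_demo_train` and `X_demo_test` are made-up toy data, not the assignment data, and the sketch assumes the scaler's default settings (center by the training mean, divide by the training population standard deviation).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two features on very different scales.
X_demo_train = np.array([[100.0, 0.1], [200.0, 0.3], [300.0, 0.2], [400.0, 0.6]])
X_demo_test = np.array([[250.0, 0.4]])

demo_scaler = StandardScaler().fit(X_demo_train)

# Manual z-score, using statistics from the training rows only.
mu = X_demo_train.mean(axis=0)
sigma = X_demo_train.std(axis=0)  # ddof=0, the population standard deviation
manual_train = (X_demo_train - mu) / sigma
manual_test = (X_demo_test - mu) / sigma

print(np.allclose(manual_train, demo_scaler.transform(X_demo_train)))
print(np.allclose(manual_test, demo_scaler.transform(X_demo_test)))
```

The test rows are transformed with the training mean and standard deviation, which is why the scaler is fit on the training data only.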

#### 2. Random Forest Classifier - Base Model
Create a simple Random Forest using only default parameters, to compare SVMs to Random Forest in multiclass problems

##### a) Use the RandomForestClassifier in sklearn. Fit your model on the training data

In [11]:
from sklearn.ensemble import RandomForestClassifier

In [13]:
clf = RandomForestClassifier()
clf.fit(X_train_scaled, y_train)

##### b) Use the fitted model to predict on test data. Use the .predict() method to get the predicted classes.

In [14]:
# predicted classes
y_pred_RF = clf.predict(X_test_scaled)

##### c) Calculate the confusion matrix and classification report for the test data

In [15]:
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix

In [16]:
# confusion matrix
confusion_matrix(y_test, y_pred_RF)

array([[14,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 1, 22,  0,  2,  0,  0,  0,  0,  0],
       [ 0,  1, 13,  0,  0,  0,  1,  0,  0],
       [ 0,  5,  0, 18,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0, 25,  0,  0,  0,  4],
       [ 1,  0,  1,  0,  0, 13,  0,  0,  0],
       [ 3,  0,  0,  0,  0,  0, 13,  0,  0],
       [ 0,  1,  0,  5,  2,  0,  0,  6,  0],
       [ 0,  0,  0,  1,  1,  0,  0,  0, 15]])

In [17]:
# classification_report
print(classification_report(y_test, y_pred_RF))

              precision    recall  f1-score   support

    asphalt        0.74      1.00      0.85        14
   building        0.76      0.88      0.81        25
        car        0.93      0.87      0.90        15
   concrete        0.69      0.78      0.73        23
      grass        0.89      0.86      0.88        29
       pool        1.00      0.87      0.93        15
     shadow        0.93      0.81      0.87        16
       soil        1.00      0.43      0.60        14
       tree        0.79      0.88      0.83        17

    accuracy                           0.83       168
   macro avg       0.86      0.82      0.82       168
weighted avg       0.85      0.83      0.82       168



##### d) Calculate predictions for the training data & build the classification report & confusion matrix. Are there signs of overfitting? Why or why not?

In [18]:
# train predicted classes
y_pred_RF_train = clf.predict(X_train_scaled)

In [19]:
# confusion matrix
confusion_matrix(y_train, y_pred_RF_train)

array([[45,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0, 97,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0, 21,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0, 93,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0, 83,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0, 14,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0, 45,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0, 20,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0, 89]])

In [20]:
# classification_report
print(classification_report(y_train, y_pred_RF_train))

              precision    recall  f1-score   support

    asphalt        1.00      1.00      1.00        45
   building        1.00      1.00      1.00        97
        car        1.00      1.00      1.00        21
   concrete        1.00      1.00      1.00        93
      grass        1.00      1.00      1.00        83
       pool        1.00      1.00      1.00        14
     shadow        1.00      1.00      1.00        45
       soil        1.00      1.00      1.00        20
       tree        1.00      1.00      1.00        89

    accuracy                           1.00       507
   macro avg       1.00      1.00      1.00       507
weighted avg       1.00      1.00      1.00       507



Yes, there are clear signs that the Random Forest overfits the training data. Every training metric (accuracy, precision, recall, and f1-score) is 100%, while the test metrics are noticeably lower: test accuracy is 83%, and recall for the soil class drops to 0.43. A perfect training fit is expected from fully grown default trees, so the stronger evidence is the 17-point gap between training and test accuracy.
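
The train/test comparison can also be recomputed directly from the two confusion matrices. The sketch below is only an illustration: the helper name `accuracy_from_confusion` is hypothetical, and the matrices are copied from the Random Forest outputs above.

```python
import numpy as np

def accuracy_from_confusion(cm):
    """Overall accuracy = correctly classified samples (diagonal) / all samples."""
    cm = np.asarray(cm)
    return np.trace(cm) / cm.sum()

# Test confusion matrix copied from the Random Forest output above.
rf_test_cm = [
    [14, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 22, 0, 2, 0, 0, 0, 0, 0],
    [0, 1, 13, 0, 0, 0, 1, 0, 0],
    [0, 5, 0, 18, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 25, 0, 0, 0, 4],
    [1, 0, 1, 0, 0, 13, 0, 0, 0],
    [3, 0, 0, 0, 0, 0, 13, 0, 0],
    [0, 1, 0, 5, 2, 0, 0, 6, 0],
    [0, 0, 0, 1, 1, 0, 0, 0, 15],
]
# The training matrix was purely diagonal, so only the class counts are needed.
rf_train_cm = np.diag([45, 97, 21, 93, 83, 14, 45, 20, 89])

train_acc = accuracy_from_confusion(rf_train_cm)
test_acc = accuracy_from_confusion(rf_test_cm)  # 139 of 168 on the diagonal
gap = train_acc - test_acc
print(f"train={train_acc:.3f} test={test_acc:.3f} gap={gap:.3f}")
```

A large positive gap is the quantity of interest here; a high training score on its own says less.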

##### e) Identify the top 5 features. Feel free to print a list OR to make a plot

In [22]:
# convert feature_importances to pandas series for usability
RF_features = pd.Series(clf.feature_importances_, index=X_train.columns)

# Sort the features based on importance
RF_top_features = RF_features.sort_values(ascending=False)

# Print the top 5 features
print("Top 5 Features:")
print(RF_top_features.head(5))

Top 5 Features:
NDVI         0.038612
Mean_R       0.035632
NDVI_40      0.031103
Mean_R_60    0.030564
Mean_NIR     0.028459
dtype: float64


#### 3. LinearSVM Classifier - Base Model
Create a simple LinearSVC Classifier only using default parameters

##### a) Use LinearSVC in sklearn. Fit your model on the training data

In [23]:
from sklearn import svm
from sklearn.svm import LinearSVC

In [26]:
svc = LinearSVC(dual="auto",max_iter=10000)

In [27]:
svc.fit(X_train_scaled, y_train)

##### b) Use fitted model to predict on test data. Use the .predict() method to get the predicted classes

In [28]:
y_pred_lsvc = svc.predict(X_test_scaled)

##### c) Calculate the confusion matrix and classification report for test data

In [29]:
# confusion matrix
confusion_matrix(y_test, y_pred_lsvc)

array([[13,  0,  0,  0,  0,  0,  1,  0,  0],
       [ 0, 22,  1,  1,  1,  0,  0,  0,  0],
       [ 0,  2, 12,  0,  0,  0,  0,  0,  1],
       [ 1,  6,  0, 15,  0,  0,  0,  0,  1],
       [ 0,  0,  0,  1, 26,  0,  0,  0,  2],
       [ 1,  0,  1,  0,  0, 13,  0,  0,  0],
       [ 2,  0,  0,  0,  0,  0, 14,  0,  0],
       [ 0,  4,  0,  1,  3,  0,  0,  6,  0],
       [ 0,  0,  0,  1,  7,  0,  0,  0,  9]])

In [31]:
# classification_report
print(classification_report(y_test, y_pred_lsvc))

              precision    recall  f1-score   support

    asphalt        0.76      0.93      0.84        14
   building        0.65      0.88      0.75        25
        car        0.86      0.80      0.83        15
   concrete        0.79      0.65      0.71        23
      grass        0.70      0.90      0.79        29
       pool        1.00      0.87      0.93        15
     shadow        0.93      0.88      0.90        16
       soil        1.00      0.43      0.60        14
       tree        0.69      0.53      0.60        17

    accuracy                           0.77       168
   macro avg       0.82      0.76      0.77       168
weighted avg       0.80      0.77      0.77       168



##### d) Calculate the predictions for training data & build the classification report & confusion matrix. Are there signs of overfitting? Why or Why not?

In [32]:
# train predicted classes
y_pred_lsvc_train = svc.predict(X_train_scaled)

In [33]:
# confusion matrix
confusion_matrix(y_train, y_pred_lsvc_train)

array([[45,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0, 97,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0, 21,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0, 93,  0,  0,  0,  0,  0],
       [ 0,  1,  0,  0, 80,  0,  0,  0,  2],
       [ 0,  0,  0,  0,  0, 14,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0, 45,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0, 20,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0, 89]])

In [34]:
# classification_report
print(classification_report(y_train, y_pred_lsvc_train))

              precision    recall  f1-score   support

    asphalt        1.00      1.00      1.00        45
   building        0.99      1.00      0.99        97
        car        1.00      1.00      1.00        21
   concrete        1.00      1.00      1.00        93
      grass        1.00      0.96      0.98        83
       pool        1.00      1.00      1.00        14
     shadow        1.00      1.00      1.00        45
       soil        1.00      1.00      1.00        20
       tree        0.98      1.00      0.99        89

    accuracy                           0.99       507
   macro avg       1.00      1.00      1.00       507
weighted avg       0.99      0.99      0.99       507



Yes, there are also signs of overfitting here. Training accuracy is 99% while test accuracy is only 77%, a gap of 22 points, and the per-class metrics show similar discrepancies (for example, recall for tree is 1.00 on the training data but 0.53 on the test data).

#### 4. Support Vector Machine Classifier + Linear Kernel + Grid Search:
Use GridSearchCV to try various hyperparameters in an SVM with a linear kernel

##### a) Use SVC from sklearn with kernel = "linear". Run the GridSearchCV using the following (SVMs run much faster than RandomForest)
C: 0.01 - 10 in increments of 0.2 (consider using the np.arange() method from numpy to build out a sequence of values). Note: can try out more parameters

Use 5-fold cross-validation and default scoring. Set verbose = 0 to reduce printing

In [40]:
from sklearn.model_selection import GridSearchCV

In [39]:
import numpy as np

In [60]:
# parameter grid
param_grid = {
    'C': np.arange(0.01,10.2,0.2)
}
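
To see exactly which values this grid contains, the short check below rebuilds it on its own. `np.arange` excludes the stop value, and with a floating-point step it is worth confirming the length and endpoints rather than assuming them; the `np.linspace` line is just an equivalent alternative, not something the assignment requires.

```python
import numpy as np

C_grid = np.arange(0.01, 10.2, 0.2)

print(len(C_grid))            # number of candidate C values
print(C_grid[:3], C_grid[-1])  # first few values and the last one

# Equivalent construction with explicit endpoints and count.
C_grid_linspace = np.linspace(0.01, 10.01, 51)
print(np.allclose(C_grid, C_grid_linspace))
```

Because the grid starts at 0.01 and steps by 0.2, its largest value is about 10.01 rather than exactly 10.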

In [61]:
# Initialize classifier
linear_svc = svm.SVC(kernel='linear')

In [62]:
# GridSearchCV
grid_search = GridSearchCV(estimator=linear_svc, param_grid=param_grid, cv=5, verbose=0)

grid_search.fit(X_train_scaled, y_train)

##### b) Identify best performing model:
- .best_params_: attribute that holds the best performing parameters
- .best_estimator_: attribute that holds the best performing model, and can be used for predicting on the X_test

In [63]:
# best performing parameters of model
bestParam = grid_search.best_params_

In [64]:
# best estimator of model
bestModel = grid_search.best_estimator_
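
The fitted search object exposes its results as attributes (note the trailing underscore), not as callable methods. The sketch below shows how they might be inspected; it runs on synthetic data from `make_classification` with a deliberately tiny grid, so the names `X_demo`, `y_demo`, and `demo_search` and any values printed are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic 3-class data standing in for the scaled training set.
X_demo, y_demo = make_classification(
    n_samples=150, n_features=8, n_informative=5, n_classes=3, random_state=0
)

demo_search = GridSearchCV(SVC(kernel="linear"), {"C": [0.01, 0.1, 1.0]}, cv=5)
demo_search.fit(X_demo, y_demo)

print(demo_search.best_params_)            # dict such as {'C': ...}
print(round(demo_search.best_score_, 3))   # mean cross-validated accuracy of the best C
best_demo_model = demo_search.best_estimator_  # refit on all of X_demo
print(best_demo_model.predict(X_demo[:5]))
```

Printing `best_params_` and `best_score_` alongside the test report makes it easier to see which setting was selected and how its cross-validated score compares to the test score.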

##### c) Use the best estimator model to predict on test data. Use the .predict() method to get the predicted classes.

In [65]:
y_pred_LSVM = bestModel.predict(X_test_scaled)

##### d) Calculate the confusion matrix and classification report for test data.

In [66]:
# confusion matrix
confusion_matrix(y_test, y_pred_LSVM)

array([[13,  0,  0,  0,  0,  0,  1,  0,  0],
       [ 0, 22,  0,  2,  1,  0,  0,  0,  0],
       [ 0,  1, 14,  0,  0,  0,  0,  0,  0],
       [ 0,  5,  0, 17,  0,  0,  0,  1,  0],
       [ 0,  0,  0,  1, 25,  0,  0,  0,  3],
       [ 0,  0,  0,  0,  0, 14,  1,  0,  0],
       [ 1,  0,  0,  0,  0,  0, 15,  0,  0],
       [ 0,  3,  0,  5,  2,  0,  0,  4,  0],
       [ 0,  0,  0,  1,  2,  0,  0,  0, 14]])

In [67]:
# classification_report
print(classification_report(y_test, y_pred_LSVM))

              precision    recall  f1-score   support

    asphalt        0.93      0.93      0.93        14
   building        0.71      0.88      0.79        25
        car        1.00      0.93      0.97        15
   concrete        0.65      0.74      0.69        23
      grass        0.83      0.86      0.85        29
       pool        1.00      0.93      0.97        15
     shadow        0.88      0.94      0.91        16
       soil        0.80      0.29      0.42        14
       tree        0.82      0.82      0.82        17

    accuracy                           0.82       168
   macro avg       0.85      0.81      0.82       168
weighted avg       0.83      0.82      0.81       168



##### e) Calculate predictions for the training data & build the classification report & confusion matrix. Are there signs of overfitting? Why or why not?

In [68]:
# train predicted classes
y_pred_LSVM_train = bestModel.predict(X_train_scaled)

In [69]:
# confusion matrix
confusion_matrix(y_train, y_pred_LSVM_train)

array([[40,  0,  0,  0,  0,  0,  5,  0,  0],
       [ 2, 87,  0,  7,  0,  0,  1,  0,  0],
       [ 0,  1, 19,  1,  0,  0,  0,  0,  0],
       [ 0,  9,  0, 83,  1,  0,  0,  0,  0],
       [ 0,  1,  0,  0, 70,  0,  0,  0, 12],
       [ 0,  1,  0,  0,  1, 12,  0,  0,  0],
       [ 1,  0,  0,  0,  0,  0, 43,  0,  1],
       [ 0,  3,  0,  4,  2,  0,  0, 11,  0],
       [ 0,  0,  0,  0,  3,  0,  1,  0, 85]])

In [70]:
# classification_report
print(classification_report(y_train, y_pred_LSVM_train))

              precision    recall  f1-score   support

    asphalt        0.93      0.89      0.91        45
   building        0.85      0.90      0.87        97
        car        1.00      0.90      0.95        21
   concrete        0.87      0.89      0.88        93
      grass        0.91      0.84      0.88        83
       pool        1.00      0.86      0.92        14
     shadow        0.86      0.96      0.91        45
       soil        1.00      0.55      0.71        20
       tree        0.87      0.96      0.91        89

    accuracy                           0.89       507
   macro avg       0.92      0.86      0.88       507
weighted avg       0.89      0.89      0.89       507



Training accuracy is 89% and test accuracy is 82%, a gap of only 7 points, and the weighted-average precision, recall, and f1-score differ by a similar amount (0.89 on the training data versus 0.81 to 0.83 on the test data). Because the discrepancy is small, we do not consider this a sign of overfitting.

#### 5. Support Vector Machine Classifier + Polynomial Kernel + Grid Search
We will now use GridSearchCV to try various hyperparameters in an SVM with a polynomial kernel.

##### a) Use SVC from sklearn with kernel = "poly". Run the GridSearchCV using the following:

- C: 0.01 - 10 in increments of 0.2
- degree: 2, 3, 4, 5, 6

Note: Feel free to try out more parameters, the above is the bare minimum for this assignment.

Use 5-fold cross-validation and the default scoring.

In [71]:
# parameter grid
param_grid2 = {
    'C': np.arange(0.01,10.2,0.2),
    'degree': [2, 3, 4, 5, 6]
}

In [73]:
# Initialize classifier
poly_svc = svm.SVC(kernel='poly')

In [74]:
# GridSearchCV
grid_search2 = GridSearchCV(estimator=poly_svc, param_grid=param_grid2, cv=5, verbose=0)

grid_search2.fit(X_train_scaled, y_train)
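
Adding a second hyperparameter multiplies the amount of work. The arithmetic below is a rough count of model fits for this search, assuming GridSearchCV's default behavior of refitting the best candidate once on the full training set.

```python
import numpy as np

C_values = np.arange(0.01, 10.2, 0.2)
degrees = [2, 3, 4, 5, 6]
n_folds = 5

n_candidates = len(C_values) * len(degrees)  # every (C, degree) combination
n_cv_fits = n_candidates * n_folds           # each combination is fit once per fold
n_total_fits = n_cv_fits + 1                 # plus the final refit of the best model

print(n_candidates, n_cv_fits, n_total_fits)
```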

##### b) Identify the best performing model:
- .best_params_ : This attribute holds the best performing parameters
- .best_estimator_ : This attribute holds the best performing model, and can be used for predicting on the X_test

In [75]:
# best performing parameters of model
bestParam_poly = grid_search2.best_params_

In [76]:
# best estimator of model
bestModel_poly = grid_search2.best_estimator_

##### c) Use the best estimator model to predict on test data. Use the .predict() method to get the predicted classes.

In [77]:
y_pred_polySVM = bestModel_poly.predict(X_test_scaled)

##### d) Calculate the confusion matrix and classification report for test data.

In [78]:
# confusion matrix
confusion_matrix(y_test, y_pred_polySVM)

array([[13,  0,  0,  0,  0,  0,  1,  0,  0],
       [ 0, 22,  0,  2,  1,  0,  0,  0,  0],
       [ 0,  2, 11,  0,  0,  1,  0,  1,  0],
       [ 0,  5,  0, 17,  1,  0,  0,  0,  0],
       [ 0,  0,  0,  0, 26,  0,  0,  1,  2],
       [ 0,  0,  0,  0,  0, 14,  1,  0,  0],
       [ 1,  0,  0,  0,  0,  0, 14,  0,  1],
       [ 0,  2,  0,  5,  7,  0,  0,  0,  0],
       [ 0,  0,  0,  1,  3,  0,  0,  0, 13]])

In [79]:
# classification_report
print(classification_report(y_test, y_pred_polySVM))

              precision    recall  f1-score   support

    asphalt        0.93      0.93      0.93        14
   building        0.71      0.88      0.79        25
        car        1.00      0.73      0.85        15
   concrete        0.68      0.74      0.71        23
      grass        0.68      0.90      0.78        29
       pool        0.93      0.93      0.93        15
     shadow        0.88      0.88      0.88        16
       soil        0.00      0.00      0.00        14
       tree        0.81      0.76      0.79        17

    accuracy                           0.77       168
   macro avg       0.74      0.75      0.74       168
weighted avg       0.73      0.77      0.75       168



##### e) Calculate predictions for the training data & build the classification report & confusion matrix. Are there signs of overfitting? Why or why not?

In [80]:
# train predicted classes
y_pred_polySVM_train = bestModel_poly.predict(X_train_scaled)

In [81]:
# confusion matrix
confusion_matrix(y_train, y_pred_polySVM_train)

array([[44,  0,  0,  0,  1,  0,  0,  0,  0],
       [ 0, 95,  0,  1,  1,  0,  0,  0,  0],
       [ 0,  0, 20,  0,  1,  0,  0,  0,  0],
       [ 0,  1,  0, 91,  1,  0,  0,  0,  0],
       [ 0,  1,  0,  0, 81,  0,  0,  0,  1],
       [ 0,  0,  0,  0,  1, 13,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0, 45,  0,  0],
       [ 0,  0,  0,  0, 11,  0,  0,  9,  0],
       [ 0,  0,  0,  0,  5,  0,  0,  0, 84]])

In [82]:
# classification_report
print(classification_report(y_train, y_pred_polySVM_train))

              precision    recall  f1-score   support

    asphalt        1.00      0.98      0.99        45
   building        0.98      0.98      0.98        97
        car        1.00      0.95      0.98        21
   concrete        0.99      0.98      0.98        93
      grass        0.79      0.98      0.88        83
       pool        1.00      0.93      0.96        14
     shadow        1.00      1.00      1.00        45
       soil        1.00      0.45      0.62        20
       tree        0.99      0.94      0.97        89

    accuracy                           0.95       507
   macro avg       0.97      0.91      0.93       507
weighted avg       0.96      0.95      0.95       507



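Yes, there are signs of overfitting here: training accuracy is 95% while test accuracy is 77%, and the weighted-average f1-score falls from 0.95 to 0.75. The soil class is the clearest case, with no test samples classified correctly (precision, recall, and f1-score of 0.00).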

#### 6. Support Vector Machine Classifier + RBF Kernel + Grid Search
We will now use GridSearchCV to try various hyperparameters in an SVM with an RBF kernel.

##### a) Use SVC from sklearn with kernel = "rbf". Run the GridSearchCV using the following:

- C: 0.01 - 10 in increments of 0.2
- gamma: 0.01,  0.1, 1, 10, 100

Note: Feel free to try out more parameters, the above is the bare minimum for this assignment.

Use 5-fold cross-validation and the default scoring.

In [83]:
# parameter grid
param_grid3 = {
    'C': np.arange(0.01,10.2,0.2),
    'gamma': [0.01, 0.1, 1, 10, 100]
}

In [84]:
# Initialize classifier
rbf_svc = svm.SVC(kernel='rbf')

In [85]:
# GridSearchCV
grid_search3 = GridSearchCV(estimator=rbf_svc, param_grid=param_grid3, cv=5, verbose=0)

grid_search3.fit(X_train_scaled, y_train)

##### b) Identify the best performing model:
- .best_params_ : This attribute holds the best performing parameters
- .best_estimator_ : This attribute holds the best performing model, and can be used for predicting on the X_test

In [89]:
# best performing parameters of model
bestParam_rbf = grid_search3.best_params_

In [87]:
# best estimator of model
bestModel_rbf = grid_search3.best_estimator_

##### c) Use the best estimator model to predict on test data. Use the .predict() method to get the predicted classes.

In [90]:
y_pred_rbfSVM = bestModel_rbf.predict(X_test_scaled)

##### d) Calculate the confusion matrix and classification report for test data.

In [91]:
# confusion matrix
confusion_matrix(y_test, y_pred_rbfSVM)

array([[13,  0,  0,  0,  0,  0,  1,  0,  0],
       [ 0, 21,  0,  3,  1,  0,  0,  0,  0],
       [ 0,  1, 14,  0,  0,  0,  0,  0,  0],
       [ 0,  4,  0, 19,  0,  0,  0,  0,  0],
       [ 0,  1,  0,  0, 26,  0,  0,  0,  2],
       [ 0,  0,  0,  0,  0, 14,  1,  0,  0],
       [ 1,  0,  0,  0,  0,  0, 15,  0,  0],
       [ 0,  2,  0,  4,  3,  0,  0,  5,  0],
       [ 0,  0,  0,  1,  1,  0,  0,  0, 15]])

In [92]:
# classification_report
print(classification_report(y_test, y_pred_rbfSVM))

              precision    recall  f1-score   support

    asphalt        0.93      0.93      0.93        14
   building        0.72      0.84      0.78        25
        car        1.00      0.93      0.97        15
   concrete        0.70      0.83      0.76        23
      grass        0.84      0.90      0.87        29
       pool        1.00      0.93      0.97        15
     shadow        0.88      0.94      0.91        16
       soil        1.00      0.36      0.53        14
       tree        0.88      0.88      0.88        17

    accuracy                           0.85       168
   macro avg       0.88      0.84      0.84       168
weighted avg       0.86      0.85      0.84       168



##### e) Calculate predictions for the training data & build the classification report & confusion matrix. Are there signs of overfitting? Why or why not?

In [93]:
# train predicted classes
y_pred_rbfSVM_train = bestModel_rbf.predict(X_train_scaled)

In [94]:
# confusion matrix
confusion_matrix(y_train, y_pred_rbfSVM_train)

array([[45,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0, 96,  0,  1,  0,  0,  0,  0,  0],
       [ 0,  0, 21,  0,  0,  0,  0,  0,  0],
       [ 0,  1,  0, 92,  0,  0,  0,  0,  0],
       [ 0,  1,  0,  0, 81,  0,  0,  0,  1],
       [ 0,  0,  0,  0,  0, 14,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0, 45,  0,  0],
       [ 0,  1,  0,  0,  0,  0,  0, 19,  0],
       [ 0,  0,  0,  0,  1,  0,  0,  0, 88]])

In [95]:
# classification_report
print(classification_report(y_train, y_pred_rbfSVM_train))

              precision    recall  f1-score   support

    asphalt        1.00      1.00      1.00        45
   building        0.97      0.99      0.98        97
        car        1.00      1.00      1.00        21
   concrete        0.99      0.99      0.99        93
      grass        0.99      0.98      0.98        83
       pool        1.00      1.00      1.00        14
     shadow        1.00      1.00      1.00        45
       soil        1.00      0.95      0.97        20
       tree        0.99      0.99      0.99        89

    accuracy                           0.99       507
   macro avg       0.99      0.99      0.99       507
weighted avg       0.99      0.99      0.99       507



Yes, there are clear signs of overfitting: training accuracy is 99%, but test accuracy is 85%. This 14-point gap is a sign that the model fit the training data too closely.

#### 7. Conceptual Questions

##### a) From the models run in steps 2-6, which performs the best based on the Classification Report? Support your reasoning with evidence around your test data. 
Based on the classification reports, the RBF SVC model performed the best. It had the highest test accuracy of all of the models (0.85) and the highest weighted-average precision, recall, and f1-score (0.86, 0.85, and 0.84). Looking more closely at individual classes, the model reported high precision and recall for asphalt, car, pool, and shadow; soil had perfect precision (1.00) but low recall (0.36). The other models did not perform as well. For example, the Poly SVC had 0 precision, recall, and f1-score for the soil class.
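
For a side-by-side view, the accuracy figures can be collected in one place. The sketch below only re-tabulates the accuracies printed in the classification reports for steps 2-6; the dictionary keys are informal labels chosen here, not names from the notebook.

```python
# Train/test accuracy copied from the classification reports in steps 2-6.
reported = {
    "RandomForest": {"train": 1.00, "test": 0.83},
    "LinearSVC": {"train": 0.99, "test": 0.77},
    "SVC linear": {"train": 0.89, "test": 0.82},
    "SVC poly": {"train": 0.95, "test": 0.77},
    "SVC rbf": {"train": 0.99, "test": 0.85},
}

# Train-test gap per model, rounded to the precision of the reports.
gaps = {name: round(acc["train"] - acc["test"], 2) for name, acc in reported.items()}

best_test = max(reported, key=lambda name: reported[name]["test"])
smallest_gap = min(gaps, key=gaps.get)

# Print models from highest to lowest test accuracy.
for name in sorted(reported, key=lambda n: reported[n]["test"], reverse=True):
    print(f"{name:<13} test={reported[name]['test']:.2f} gap={gaps[name]:.2f}")
print("best test accuracy:", best_test)
print("smallest train-test gap:", smallest_gap)
```

Ranking by test accuracy and by gap gives two different views: the first favors the RBF kernel, the second the linear kernel with grid search.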

##### b) Compare models run for steps 4-6 where different kernels were used. What is the benefit of using a polynomial or rbf kernel over a linear kernel? What could be a downside of using a polynomial or rbf kernel? 
The benefit of using a polynomial or RBF kernel over a linear kernel is that it implicitly maps the data into a higher-dimensional feature space. This makes these kernels well suited to capturing non-linear relationships between our target and predictors that a linear kernel cannot represent.

A downside of using a polynomial or RBF kernel, however, is a greater risk of overfitting due to the added flexibility, as seen in the larger train-test gaps for those models in steps 5 and 6.

##### c) Explain the 'C' parameter used in steps 4-6. What does a small C mean versus a large C in sklearn? Why is it important to use the 'C' parameter when fitting a model? 
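C is a regularization (penalty) parameter which trades off misclassification of training examples against simplicity of the decision surface. In sklearn, the strength of the regularization is inversely proportional to C: a small C tolerates more misclassified training points and makes the decision surface smooth, while a large C penalizes misclassification heavily and aims at classifying all training examples correctly. Tuning C is important because it controls the balance between underfitting (C too small) and overfitting (C too large).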
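
The effect of C can be illustrated on a toy problem. This is a minimal sketch on synthetic, deliberately noisy data from `make_classification` (not the land cover data); the three C values and the names `X_c`, `y_c`, and `results` are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic binary problem with about 10% of labels flipped to add noise.
X_c, y_c = make_classification(
    n_samples=200, n_features=10, n_informative=4, flip_y=0.1, random_state=0
)

results = {}
for C in (0.01, 1.0, 100.0):
    model = SVC(kernel="rbf", C=C).fit(X_c, y_c)
    results[C] = {
        "train_acc": model.score(X_c, y_c),         # accuracy on the data it was fit on
        "n_support": int(model.n_support_.sum()),   # total number of support vectors
    }
    print(C, results[C])
```

On data like this one would typically expect training accuracy to rise as C grows, since a large C pushes the model to fit even the noisy points; whether that helps on held-out data is what the cross-validated grid search is for.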

##### d) Scaling our input data does not matter much for Random Forest, but it is a critical step for Support Vector Machines. Explain why this is such a critical step. Also, provide an example of a feature from this data set that could cause issues with our SVMs if not scaled.
Scaling our data is important because Support Vector Machine algorithms are not scale invariant. In other words, the SVM algorithm is affected by a change to a feature's range of values or unit of measurement. One variable that could have affected our SVMs if it was not scaled is the area. Its units are meters squared, whereas a number of our other variables are ratios rather than distinct measurement values, so its much larger values could have dominated the others and severely affected our model.
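
A small numeric illustration of the point: the RBF kernel depends on the squared Euclidean distance between samples, so a feature with large raw values can swamp a ratio-type feature. The three rows below are made-up values (an area-like column and a ratio column), not taken from the data set, and `ratio_share` is a hypothetical helper.

```python
import numpy as np

# Hypothetical objects: column 0 is area-like (large values), column 1 is a ratio in [0, 1].
X_raw = np.array([[1500.0, 0.10], [1520.0, 0.90], [3000.0, 0.12]])

def ratio_share(a, b):
    """Fraction of the squared Euclidean distance contributed by the ratio column."""
    sq = (a - b) ** 2
    return float(sq[1] / sq.sum())

# Objects 0 and 1 differ hugely in the ratio feature but only slightly in area.
share_raw = ratio_share(X_raw[0], X_raw[1])

# Standardize each column (same formula as StandardScaler's defaults).
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)
share_std = ratio_share(X_std[0], X_std[1])

print(f"raw: {share_raw:.4f}  standardized: {share_std:.4f}")
```

Before scaling, the ratio feature accounts for well under 1% of the distance between the first two objects even though it spans most of its range; after scaling it accounts for nearly all of it.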

##### e) Describe conceptually what the purpose of a kernel is for Support Vector Machines.
A kernel is a function that computes the dot product of two vectors in a higher-dimensional feature space without explicitly constructing that space. It enables the SVM, which is a linear classifier at its core, to learn non-linear decision boundaries and handle more of the complexity found in real-world data.
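
The idea can be checked numerically for a simple case. The sketch below uses a simplified homogeneous polynomial kernel of degree 2, k(x, z) = (x · z)^2, on 2-D inputs (sklearn's "poly" kernel also includes `gamma` and `coef0` terms, which are left out here). The function names `phi` and `poly2_kernel` are illustrative.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly2_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2, computed in the original 2-D space."""
    return float(np.dot(x, z) ** 2)

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

explicit = float(np.dot(phi(x), phi(z)))  # dot product after mapping to 3-D
implicit = poly2_kernel(x, z)             # same quantity without the mapping

print(explicit, implicit)
```

Both routes give the same number, but the kernel never builds the 3-D vectors. For the RBF kernel the corresponding feature space is infinite-dimensional, so the kernel route is the only practical one.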