
#### Copyright 2020 Google LLC.

In [0]:
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Support Vector Machines

Support Vector Machines (SVM) are powerful tools for performing both classification and regression tasks. In this colab we'll create a classification model using an SVM in scikit-learn.

## Load the Data

Let's begin by loading a dataset that we'll use for classification.

In [0]:
import pandas as pd
from sklearn.datasets import load_iris

iris_bunch = load_iris()

iris_df = pd.DataFrame(iris_bunch.data, columns=iris_bunch.feature_names)
iris_df['species'] = iris_bunch.target

iris_df.describe() 

You can see in the data description above that the ranges of values in the columns differ quite a bit. For instance, the mean sepal length is almost twice as big as the mean sepal width.

SVMs are sensitive to features with different scales. We'll run the data through the `StandardScaler` so that all of the features are on a comparable scale.

First let's create the scaler and fit it to our features.

In [0]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(iris_df[iris_bunch.feature_names])

scaler.mean_

We can now transform the data by applying the `scaler`.

In [0]:
iris_df[iris_bunch.feature_names] = scaler.transform(
    iris_df[iris_bunch.feature_names])

iris_df.describe()
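
To make the transformation concrete, here is a minimal sketch of what the scaler computes, assuming the same iris data as above. It standardizes the raw feature array by hand and compares the result with `StandardScaler`. The variable names `X`, `scaled`, and `manual` are illustrative and not part of the colab.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

# Raw (unscaled) iris measurements, one column per feature.
X = load_iris().data

# Scale with scikit-learn.
scaled = StandardScaler().fit_transform(X)

# Scale by hand: subtract each column's mean and divide by its
# population standard deviation (NumPy's default, ddof=0).
manual = (X - X.mean(axis=0)) / X.std(axis=0)

matches = np.allclose(scaled, manual)
print(matches)
```

This also explains why `describe()` reports a standard deviation slightly above 1 for the scaled columns: pandas computes the sample standard deviation (`ddof=1`), while the scaler divides by the population standard deviation.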

Since we scaled the data, the column names are now a bit deceiving. The values are no longer raw centimeters but standardized values with a mean of 0 and unit variance. Let's rename the columns to remove "(cm)" from the names.

In [0]:
iris_df = iris_df.rename(index=str, columns={
  'sepal length (cm)': 'sepal_length',
  'sepal width (cm)': 'sepal_width',
  'petal length (cm)': 'petal_length',
  'petal width (cm)': 'petal_width'})
iris_df.head()

We could use all of the features to train our model, but in this case we are going to pick two features so that we can make some nice visualizations later on in the colab.

In [0]:
features = ['petal_length', 'petal_width']
target = 'species'

Now we can create and train a classifier. There are multiple ways to create an SVM model in scikit-learn. We are going to use the [linear support vector classifier](https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html).

In [0]:
from sklearn.svm import LinearSVC

classifier = LinearSVC()
classifier.fit(iris_df[features], iris_df[target])
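
As a rough illustration of what the fitted linear model contains, the sketch below refits a `LinearSVC` on the two scaled petal features and rebuilds its predictions from the learned weights. It assumes the default one-vs-rest setup, where each class gets its own row of weights and its own intercept. The names `X`, `y`, `clf`, and `manual_scores` are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = load_iris()
# Petal length and petal width are columns 2 and 3 of the iris data.
X = StandardScaler().fit_transform(iris.data[:, 2:4])
y = iris.target

clf = LinearSVC()
clf.fit(X, y)

# One row of weights and one intercept per class (one-vs-rest).
print(clf.coef_.shape, clf.intercept_.shape)

# Score every sample against every class, then pick the best-scoring class.
manual_scores = X @ clf.coef_.T + clf.intercept_
manual_predictions = clf.classes_[manual_scores.argmax(axis=1)]

same_scores = np.allclose(manual_scores, clf.decision_function(X))
same_predictions = np.array_equal(manual_predictions, clf.predict(X))
print(same_scores, same_predictions)
```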

We can now use our model to make predictions. We'll predict on the same data we just trained on and compute an F1 score. Keep in mind that a score computed on training data tends to be optimistic.

In [0]:
from sklearn.metrics import f1_score

predictions = classifier.predict(iris_df[features])

f1_score(iris_df[target], predictions, average='micro')
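
A note on `average='micro'`: for a single-label multiclass problem like this one, the micro-averaged F1 score equals plain accuracy. The sketch below checks that on the same setup and also prints the macro average for comparison. It is a small illustration with its own variable names, not part of the original colab flow.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, f1_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = load_iris()
# Use the two scaled petal features, as in the colab.
X = StandardScaler().fit_transform(iris.data[:, 2:4])
y = iris.target

predictions = LinearSVC().fit(X, y).predict(X)

# Micro averaging pools all decisions; macro averages the per-class F1 scores.
micro_f1 = f1_score(y, predictions, average='micro')
macro_f1 = f1_score(y, predictions, average='macro')
accuracy = accuracy_score(y, predictions)

print(micro_f1, accuracy, macro_f1)
```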

We can visualize the decision boundaries using the pyplot `contourf` function.

In [0]:
import matplotlib.pyplot as plt
import numpy as np

# Find the smallest value in the feature data. We are looking across both
# features since we scaled them. Make the min value a little smaller than
# reality in order to better see all of the points on the chart.
min_val = min(iris_df[features].min()) - 0.25

# Find the largest value in the feature data. Make the max value a little bigger
# than reality in order to better see all of the points on the chart.
max_val = max(iris_df[features].max()) + 0.25

# Create a range of numbers from min to max with some small step. This will be
# used to make multiple predictions that will create the decision boundary
# outline.
rng = np.arange(min_val, max_val, .02)

# Create a grid of points.
xx, yy = np.meshgrid(rng, rng)

# Make predictions on every point in the grid.
predictions = classifier.predict(np.c_[xx.ravel(), yy.ravel()])

# Reshape the predictions for plotting.
zz = predictions.reshape(xx.shape)

# Plot the predictions on the grid.
plt.contourf(xx, yy, zz)

# Plot each class of iris with a different marker.
#   Class 0 with circles
#   Class 1 with triangles
#   Class 2 with squares
for species_and_marker in ((0, 'o'), (1, '^'), (2, 's')):
  plt.scatter(
    iris_df[iris_df[target] == species_and_marker[0]][features[0]],
    iris_df[iris_df[target] == species_and_marker[0]][features[1]],
    marker=species_and_marker[1])
plt.show()
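
The grid construction above can be hard to picture, so here is a tiny sketch of the same `meshgrid`, `ravel`, and `np.c_` steps on a three-value range. The values are made up for illustration.

```python
import numpy as np

# A tiny range standing in for the fine-grained one used in the plot.
rng = np.array([0.0, 1.0, 2.0])

# xx holds the x coordinate of every grid cell, yy the y coordinate.
xx, yy = np.meshgrid(rng, rng)

# Flatten both grids and pair them up: one (x, y) row per grid point.
grid_points = np.c_[xx.ravel(), yy.ravel()]

print(xx.shape)            # (3, 3)
print(grid_points.shape)   # (9, 2)
print(grid_points[:4])     # first four (x, y) pairs
```

Each row of `grid_points` is one point to classify. Reshaping the predictions back to `xx.shape` lines them up with the grid again so that `contourf` can color each region.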

# Exercises

## Exercise 1: Polynomial SVC

The scikit-learn library also has an [SVC](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html) classifier that can use non-linear kernels. Create an `SVC` classifier with a degree-3 polynomial kernel, and train it on the iris data. Make predictions on the iris data that you trained on, and then print out the F1 score.

### **Student Solution**

In [0]:
# Your code goes here

---

### Answer Key

In [0]:
from sklearn.datasets import load_iris
from sklearn.metrics import f1_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

iris_bunch = load_iris()

iris_df = pd.DataFrame(iris_bunch.data, columns=iris_bunch.feature_names)
iris_df['species'] = iris_bunch.target

scaler = StandardScaler()
scaler.fit(iris_df[iris_bunch.feature_names])

iris_df[iris_bunch.feature_names] = scaler.transform(
    iris_df[iris_bunch.feature_names])

iris_df = iris_df.rename(index=str, columns={
  'sepal length (cm)': 'sepal_length',
  'sepal width (cm)': 'sepal_width',
  'petal length (cm)': 'petal_length',
  'petal width (cm)': 'petal_width'})

features = ['petal_length', 'petal_width']
target = 'species'

classifier = SVC(kernel='poly', degree=3)
classifier.fit(iris_df[features], iris_df[target])
predictions = classifier.predict(iris_df[features])

print(f1_score(iris_df[target], predictions, average='micro'))
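
For reference, scikit-learn documents the polynomial kernel as `(gamma * <x, x'> + coef0) ** degree`. The sketch below evaluates that formula by hand on two made-up points and compares it with `sklearn.metrics.pairwise.polynomial_kernel`. The `gamma` and `coef0` values are hypothetical, chosen only for illustration; `SVC` has its own defaults for both.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

# Two small, made-up 2-D points (illustrative values only).
A = np.array([[1.0, 2.0], [0.5, -1.0]])

# Hypothetical kernel settings chosen for illustration.
gamma, coef0, degree = 0.5, 1.0, 3

# Kernel matrix by hand: scaled dot products, shifted, raised to the degree.
manual = (gamma * (A @ A.T) + coef0) ** degree

# The same matrix from scikit-learn's helper.
library = polynomial_kernel(A, A, degree=degree, gamma=gamma, coef0=coef0)

kernels_match = np.allclose(manual, library)
print(manual)
print(kernels_match)
```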

---

## Exercise 2: Plotting

Create a plot that shows the decision boundaries of the polynomial SVC that you created in exercise 1.

### **Student Solution**

In [0]:
# Your code goes here

---

### Answer Key

**Solution**

In [0]:
from sklearn.datasets import load_iris
from sklearn.metrics import f1_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

iris_bunch = load_iris()

iris_df = pd.DataFrame(iris_bunch.data, columns=iris_bunch.feature_names)
iris_df['species'] = iris_bunch.target

scaler = StandardScaler()
scaler.fit(iris_df[iris_bunch.feature_names])

iris_df[iris_bunch.feature_names] = scaler.transform(
    iris_df[iris_bunch.feature_names])

iris_df = iris_df.rename(index=str, columns={
  'sepal length (cm)': 'sepal_length',
  'sepal width (cm)': 'sepal_width',
  'petal length (cm)': 'petal_length',
  'petal width (cm)': 'petal_width'})

features = ['petal_length', 'petal_width']
target = 'species'

classifier = SVC(kernel='poly', degree=3)
classifier.fit(iris_df[features], iris_df[target])
predictions = classifier.predict(iris_df[features])

print(f1_score(iris_df[target], predictions, average='micro'))

min_val = min(iris_df[features].min()) - 1
max_val = max(iris_df[features].max()) + 1
xx, yy = np.meshgrid(np.arange(min_val, max_val, .02),
                     np.arange(min_val, max_val, .02))
predictions = classifier.predict(np.c_[xx.ravel(), yy.ravel()])
zz = predictions.reshape(xx.shape)
plt.contourf(xx, yy, zz)

for species_and_marker in ((0, 'o'), (1, '^'), (2, 's')):
  plt.scatter(
    iris_df[iris_df[target] == species_and_marker[0]][features[0]],
    iris_df[iris_df[target] == species_and_marker[0]][features[1]],
    marker=species_and_marker[1])
plt.show()

---

## Exercise 3: C Hyperparameter

We accepted the default `C` value of 1.0 in the classifier above. Try halving and doubling `C`. How does it affect the F1 score?

Visualize the decision boundaries. Do they visibly change?

### **Student Solution**

In [0]:
# Your code goes here

---

### Answer Key

With `C` halved, the decision boundaries smooth out a bit, and the F1 score goes down.

In [0]:
from sklearn.datasets import load_iris
from sklearn.metrics import f1_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

iris_bunch = load_iris()

iris_df = pd.DataFrame(iris_bunch.data, columns=iris_bunch.feature_names)
iris_df['species'] = iris_bunch.target

scaler = StandardScaler()
scaler.fit(iris_df[iris_bunch.feature_names])

iris_df[iris_bunch.feature_names] = scaler.transform(
    iris_df[iris_bunch.feature_names])

iris_df = iris_df.rename(index=str, columns={
  'sepal length (cm)': 'sepal_length',
  'sepal width (cm)': 'sepal_width',
  'petal length (cm)': 'petal_length',
  'petal width (cm)': 'petal_width'})

features = ['petal_length', 'petal_width']
target = 'species'

classifier = SVC(kernel='poly', degree=3, C=0.5)
classifier.fit(iris_df[features], iris_df[target])
predictions = classifier.predict(iris_df[features])

print(f1_score(iris_df[target], predictions, average='micro'))

min_val = min(iris_df[features].min()) - 1
max_val = max(iris_df[features].max()) + 1
xx, yy = np.meshgrid(np.arange(min_val, max_val, .02),
                     np.arange(min_val, max_val, .02))
predictions = classifier.predict(np.c_[xx.ravel(), yy.ravel()])
zz = predictions.reshape(xx.shape)
plt.contourf(xx, yy, zz)

for species_and_marker in ((0, 'o'), (1, '^'), (2, 's')):
  plt.scatter(
    iris_df[iris_df[target] == species_and_marker[0]][features[0]],
    iris_df[iris_df[target] == species_and_marker[0]][features[1]],
    marker=species_and_marker[1])
plt.show()

With `C` doubled, the decision boundaries get a little more curved, and the F1 score increases.

In [0]:
from sklearn.datasets import load_iris
from sklearn.metrics import f1_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

iris_bunch = load_iris()

iris_df = pd.DataFrame(iris_bunch.data, columns=iris_bunch.feature_names)
iris_df['species'] = iris_bunch.target

scaler = StandardScaler()
scaler.fit(iris_df[iris_bunch.feature_names])

iris_df[iris_bunch.feature_names] = scaler.transform(
    iris_df[iris_bunch.feature_names])

iris_df = iris_df.rename(index=str, columns={
  'sepal length (cm)': 'sepal_length',
  'sepal width (cm)': 'sepal_width',
  'petal length (cm)': 'petal_length',
  'petal width (cm)': 'petal_width'})

features = ['petal_length', 'petal_width']
target = 'species'

classifier = SVC(kernel='poly', degree=3, C=2.0)
classifier.fit(iris_df[features], iris_df[target])
predictions = classifier.predict(iris_df[features])

print(f1_score(iris_df[target], predictions, average='micro'))

min_val = min(iris_df[features].min()) - 1
max_val = max(iris_df[features].max()) + 1
xx, yy = np.meshgrid(np.arange(min_val, max_val, .02),
                     np.arange(min_val, max_val, .02))
predictions = classifier.predict(np.c_[xx.ravel(), yy.ravel()])
zz = predictions.reshape(xx.shape)
plt.contourf(xx, yy, zz)

for species_and_marker in ((0, 'o'), (1, '^'), (2, 's')):
  plt.scatter(
    iris_df[iris_df[target] == species_and_marker[0]][features[0]],
    iris_df[iris_df[target] == species_and_marker[0]][features[1]],
    marker=species_and_marker[1])
plt.show()
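
The two cells above can also be condensed into a loop. The sketch below is one possible way to compare several `C` values side by side, assuming the same two scaled petal features. It reports the training F1 score and the total number of support vectors for each setting. The `results` list and loop variable names are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import f1_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = load_iris()
# Use the two scaled petal features, as in the exercises.
X = StandardScaler().fit_transform(iris.data[:, 2:4])
y = iris.target

results = []
for c in (0.5, 1.0, 2.0):
  clf = SVC(kernel='poly', degree=3, C=c).fit(X, y)
  score = f1_score(y, clf.predict(X), average='micro')
  # n_support_ holds the number of support vectors per class.
  results.append((c, score, int(clf.n_support_.sum())))

for c, score, n_sv in results:
  print(f'C={c}: F1={score:.3f}, support vectors={n_sv}')
```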

---

## Exercise 4: Regression

Use [LinearSVR](https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVR.html) to predict housing prices in the [Boston housing dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_boston.html). Hold out some test data and print the RMSE measured on that held-out data. Note that `load_boston` was removed in scikit-learn 1.2, so this exercise requires an earlier version.

### **Student Solution**

In [0]:
# Your code goes here

---

### Answer Key

In [0]:
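from sklearn.datasets import load_boston
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVR
import math
import pandas as pd

# Note: load_boston was removed in scikit-learn 1.2, so this cell requires an
# earlier version.
boston_bunch = load_boston()

boston_df = pd.DataFrame(boston_bunch.data, columns=boston_bunch.feature_names)
boston_df['price'] = boston_bunch.target

# Hold out 20% of the rows as test data.
train_df, test_df = train_test_split(boston_df, test_size=0.2, random_state=42)
train_df, test_df = train_df.copy(), test_df.copy()

# Fit the scaler on the training data only, then apply it to both splits.
scaler = StandardScaler()
scaler.fit(train_df[boston_bunch.feature_names])

train_df[boston_bunch.feature_names] = scaler.transform(
    train_df[boston_bunch.feature_names])
test_df[boston_bunch.feature_names] = scaler.transform(
    test_df[boston_bunch.feature_names])

model = LinearSVR()
model.fit(train_df[boston_bunch.feature_names], train_df['price'])
predictions = model.predict(test_df[boston_bunch.feature_names])

# RMSE on the held-out test data.
math.sqrt(mean_squared_error(test_df['price'], predictions))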
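
As a quick check on what the final line computes, the sketch below works out RMSE by hand on a few made-up numbers and compares it with the `mean_squared_error` route used above. The values are for illustration only and are not taken from the housing data.

```python
import math
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up targets and predictions, for illustration only.
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.0, 5.0, 9.0])

# RMSE by hand: square the errors, average them, take the square root.
manual_rmse = math.sqrt(np.mean((y_true - y_pred) ** 2))

# The same value via scikit-learn's mean squared error.
library_rmse = math.sqrt(mean_squared_error(y_true, y_pred))

print(manual_rmse, library_rmse)
```

Because the errors are squared before averaging, RMSE is in the same units as the target and penalizes large misses more heavily than small ones.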

---