# Linear models - Beyond linear separations

In this notebook, we illustrate that, with the right preprocessing, a linear model can become flexible enough to fit data where the link between the features and the target is non-linear.

In [None]:
import sklearn

sklearn.set_config(display="diagram")

## Limitation of linear separation

We will create a toy classification dataset on which we expect a linear model to perform poorly.
Let's generate the dataset and display it as a scatter plot.

In [None]:
import numpy as np
import pandas as pd
from sklearn.datasets import make_moons

feature_names = ["Feature #0", "Feature #1"]
target_name = "class"

X, y = make_moons(n_samples=100, noise=0.13, random_state=42)

# We store both the data and target in a dataframe to ease plotting
moons = pd.DataFrame(np.concatenate([X, y[:, np.newaxis]], axis=1),
                     columns=feature_names + [target_name])
moons[target_name] = moons[target_name].astype("category")
X, y = moons[feature_names], moons[target_name]

In [None]:
import seaborn as sns
sns.set_context("poster")

In [None]:
import matplotlib.pyplot as plt

_ = moons.plot.scatter(
    x=feature_names[0], y=feature_names[1], c=y,
    s=50, cmap=plt.cm.RdBu,
)

Looking at the dataset, we observe that a linear separation will not be able to discriminate the two classes well.

<div class="alert alert-success">
    <p><b>EXERCISE</b>:</p>
    <ul>
        <li>Fit a <tt>LogisticRegression</tt> model on the dataset.</li>
        <li>Using the helper class <tt>helper.plotting.DecisionBoundaryDisplay</tt>, draw the decision boundary of the model.</li>
    </ul>
</div>

In [None]:
# %load solutions/solution_35.py

In [None]:
# %load solutions/solution_36.py

<div class="alert alert-success">
    <p><b>EXERCISE</b>:</p>
    <ul>
        <li>Fit a <tt>LogisticRegression</tt> model on the dataset but this time insert a <tt>sklearn.preprocessing.PolynomialFeatures</tt> transformer.</li>
        <li>Using the helper class <tt>helper.plotting.DecisionBoundaryDisplay</tt>, draw the decision boundary of the model.</li>
    </ul>
</div>

In [None]:
# %load solutions/solution_37.py

In [None]:
# %load solutions/solution_38.py

## What about SVM?

Another family of linear algorithms is the Support Vector Machine (SVM). Its training paradigm differs from that of logistic regression: the model tries to find the hyperplane that maximizes the margin, that is, the distance to the points closest to the hyperplane.
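
To make the notion of margin concrete, here is a minimal sketch on six hand-made points (the data and the variable names are assumptions made for this illustration, not part of the notebook's dataset). It uses `SVC(kernel="linear")` with a large `C` to approximate a hard-margin SVM, then reads the margin width `2 / ||w||` from the fitted coefficients.

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated point clouds (hand-made, illustrative values only).
X_margin = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                     [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
y_margin = np.array([0, 0, 0, 1, 1, 1])

# A linear-kernel SVC with a large C approximates a hard-margin SVM.
svm_margin = SVC(kernel="linear", C=1000.0).fit(X_margin, y_margin)

# The margin width is 2 / ||w||, where w is the normal of the hyperplane.
w = svm_margin.coef_[0]
margin_width = 2.0 / np.linalg.norm(w)

# Support vectors are the training points lying on (or inside) the margin.
sv_scores = svm_margin.decision_function(svm_margin.support_vectors_)
print("margin width:", margin_width)
print("support vectors:\n", svm_margin.support_vectors_)
print("decision function at the support vectors:", sv_scores)
```

Only the support vectors determine the hyperplane: moving any other training point, as long as it stays outside the margin, would leave the fitted model unchanged.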

In [None]:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

model = make_pipeline(StandardScaler(), LinearSVC())
model.fit(X, y)

In [None]:
# Helper class used in the exercises above
from helper.plotting import DecisionBoundaryDisplay

display = DecisionBoundaryDisplay.from_estimator(
    model, X, cmap=plt.cm.RdBu,
)
_ = moons.plot.scatter(
    x=feature_names[0], y=feature_names[1], c=y,
    s=50, cmap=plt.cm.RdBu, ax=display.ax_
)

What made SVMs interesting at some point was their ability to become non-linear using the so-called "kernel trick". The kernel trick implicitly projects the data into a higher-dimensional space without explicitly building the projection: only the dot products in this space are computed, through the kernel function. The class `SVC` allows using such kernels. We will use a polynomial kernel to create something similar to the previous pipeline that used `PolynomialFeatures`.
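
The equivalence behind the kernel trick can be checked numerically. The sketch below uses a simplified homogeneous polynomial kernel of degree 2, `(a . b) ** 2`, rather than the exact parametrization used by `SVC` (which also involves `gamma` and `coef0`). The feature map `phi` and the two vectors are assumptions made for the illustration.

```python
import numpy as np


def phi(v):
    """Explicit degree-2 feature map for a 2D vector (illustrative)."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])


def poly2_kernel(a, b):
    """Homogeneous polynomial kernel of degree 2: (a . b) ** 2."""
    return np.dot(a, b) ** 2


a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

explicit = np.dot(phi(a), phi(b))  # dot product in the 3D feature space
implicit = poly2_kernel(a, b)      # same value, computed in the 2D input space
print(explicit, implicit)
```

Both quantities agree up to floating-point rounding, yet the second one never builds the three-dimensional vectors. For high degrees or for the RBF kernel, the explicit space becomes very large or even infinite-dimensional, which is where the trick pays off.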

In [None]:
from sklearn.svm import SVC

model = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=3))
model.fit(X, y)

In [None]:
display = DecisionBoundaryDisplay.from_estimator(
    model, X, cmap=plt.cm.RdBu,
)
_ = moons.plot.scatter(
    x=feature_names[0], y=feature_names[1], c=y,
    s=50, cmap=plt.cm.RdBu, ax=display.ax_
)

One can even use a different type of kernel, for instance the Radial Basis Function (RBF) kernel.

In [None]:
from sklearn.svm import SVC

model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(X, y)

In [None]:
display = DecisionBoundaryDisplay.from_estimator(
    model, X, cmap=plt.cm.RdBu,
)
_ = moons.plot.scatter(
    x=feature_names[0], y=feature_names[1], c=y,
    s=50, cmap=plt.cm.RdBu, ax=display.ax_
)

Be aware that SVMs do not scale well with the number of data points. Sometimes, it is better to use a kernel approximation and build an explicit, approximate feature map with a transformer such as `Nystroem`.
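
As a rough check of what "kernel approximation" means here, the sketch below compares the exact RBF kernel matrix with the dot products of the features produced by `Nystroem`. The value `gamma = 1.0`, the component counts, and the helper `nystroem_error` are arbitrary choices made for the illustration.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.kernel_approximation import Nystroem
from sklearn.metrics.pairwise import rbf_kernel

X_demo, y_demo = make_moons(n_samples=100, noise=0.13, random_state=42)
gamma = 1.0  # arbitrary kernel width chosen for the illustration

# Exact 100 x 100 RBF kernel matrix.
K_exact = rbf_kernel(X_demo, gamma=gamma)


def nystroem_error(n_components):
    """Largest absolute gap between the exact and approximated kernel."""
    nystroem = Nystroem(kernel="rbf", gamma=gamma,
                        n_components=n_components, random_state=0)
    Z = nystroem.fit_transform(X_demo)  # explicit, approximate feature map
    return np.abs(Z @ Z.T - K_exact).max()


error_few = nystroem_error(5)
error_all = nystroem_error(100)  # every sample is used as a landmark
print("5 components:  ", error_few)
print("100 components:", error_all)
```

With only a few components the approximation is coarse. When every training sample is used as a landmark, the kernel matrix is recovered up to numerical error, but the computational benefit is lost, so `n_components` is a trade-off between accuracy and cost.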

In [None]:
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import LogisticRegression

model = make_pipeline(Nystroem(), LogisticRegression())
model.fit(X, y)

In [None]:
display = DecisionBoundaryDisplay.from_estimator(
    model, X, cmap=plt.cm.RdBu,
)
_ = moons.plot.scatter(
    x=feature_names[0], y=feature_names[1], c=y,
    s=50, cmap=plt.cm.RdBu, ax=display.ax_
)

We see that the decision boundary of this model is quite similar to that of an SVM with an RBF kernel. Now, let's do an exercise to demonstrate the scaling limitation of the SVM classifier.

In [None]:
data = pd.read_csv("../datasets/adult-census-numeric-all.csv")
data.head()

In [None]:
target_name = "class"
X = data.drop(columns=target_name)
y = data[target_name]

In [None]:
X.shape

The dataset contains almost 50,000 samples, which is already a lot for an SVM model.
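
A back-of-the-envelope calculation gives an idea of why this size is challenging for a kernel method. It assumes a dense `float64` kernel matrix over roughly 50,000 samples. In practice `SVC` relies on a bounded kernel cache rather than storing the full matrix, so this is an order of magnitude and not a measurement. The Nystroem figure assumes its default of 100 components.

```python
n_samples = 50_000     # approximate number of rows in the dataset
bytes_per_value = 8    # size of one float64 entry

# Full pairwise kernel matrix: n_samples x n_samples entries.
gram_bytes = n_samples ** 2 * bytes_per_value
print(f"full kernel matrix: {gram_bytes / 1e9:.0f} GB")  # 20 GB

# Explicit Nystroem features: n_samples x 100 entries.
nystroem_bytes = n_samples * 100 * bytes_per_value
print(f"Nystroem features:  {nystroem_bytes / 1e6:.0f} MB")  # 40 MB
```

The quadratic growth in the number of samples is what makes kernel SVMs expensive, whereas the explicit feature matrix grows only linearly.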

<div class="alert alert-success">
    <p><b>EXERCISE</b>:</p>
    <ul>
        <li>Split the dataset into training and testing sets.</li>
        <li>Create a model containing an SVM that uses an RBF kernel. Measure the time needed to fit the model.</li>
        <li>Repeat the same experiment with a model that uses a Nystroem kernel approximation and a logistic regression.</li>
        <li>Check the score of both models on the testing set.</li>
    </ul>
</div>

In [None]:
# %load solutions/solution_39.py

In [None]:
# %load solutions/solution_40.py

In [None]:
# %load solutions/solution_41.py

In [None]:
# %load solutions/solution_42.py

In [None]:
# %load solutions/solution_43.py