# Lab assignment №1, part 3

This lab assignment consists of several parts. You are supposed to make some transformations, train some models, estimate the quality of the models and explain your results.

Several comments:
* Don't hesitate to ask questions; it's good practice.
* No private/public sharing, please. Copied assignments will be graded with 0 points.
* Blocks of this lab will be graded separately.

__*This is the third part of the assignment. The first and second parts are waiting for you in the same directory.*__

In [None]:
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

from sklearn.datasets import make_moons
from sklearn.datasets import make_circles

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

from sklearn.metrics import f1_score, accuracy_score

import matplotlib.pyplot as plt
from mlxtend.plotting import plot_decision_regions
import plotly.graph_objects as go
from plotly.subplots import make_subplots

## Part 3. SVM and kernels

The kernel concept is adopted in a variety of ML algorithms (e.g. Kernel PCA, Gaussian Processes, kNN, ...).

So in this task you will examine kernels for the SVM algorithm applied to fairly simple artificial datasets.
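A kernel is just a similarity function computed for every pair of points. As a minimal sketch (toy random points, parameter values chosen arbitrarily), the three kernels used below can be written out by hand with NumPy and checked against `sklearn.metrics.pairwise`. The formulas follow the scikit-learn SVC documentation; the point set and the `gamma`, `coef0`, `degree` values are assumptions for illustration only.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel, sigmoid_kernel

# Five random 2D points stand in for a dataset (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
gamma, coef0, degree = 0.5, 1.0, 3

# Gram matrix of plain inner products <x_i, x_j>
gram = X @ X.T
# Squared Euclidean distances ||x_i - x_j||^2 via broadcasting
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)

K_rbf = np.exp(-gamma * sq_dists)            # exp(-gamma * ||x - x'||^2)
K_poly = (gamma * gram + coef0) ** degree    # (gamma * <x, x'> + coef0)^degree
K_sigm = np.tanh(gamma * gram + coef0)       # tanh(gamma * <x, x'> + coef0)

# Compare the hand-written versions with scikit-learn's implementations
print(np.allclose(K_rbf, rbf_kernel(X, gamma=gamma)))
print(np.allclose(K_poly, polynomial_kernel(X, degree=degree, gamma=gamma, coef0=coef0)))
print(np.allclose(K_sigm, sigmoid_kernel(X, gamma=gamma, coef0=coef0)))
```

An SVM only ever touches the data through such a matrix, which is why swapping the kernel changes the shape of the decision boundary without changing the training algorithm.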

To be clear: we will work on the classification problem throughout the whole notebook.

Let's generate our dataset and take a look at it.

In [None]:
moons_points, moons_labels = make_moons(n_samples=500, noise=0.2, random_state=42)

plt.scatter(moons_points[:, 0], moons_points[:, 1], c=moons_labels)
plt.show()

## 1.1 Pure models.
First let's try to solve this case with good old Logistic Regression and simple (linear kernel) SVM classifier.

Train LR and SVM classifiers (choose params by hand, no CV or intensive grid search needed) and plot their decision regions. Calculate one preferred classification metric.

Describe the results in one or two sentences.

_Tip:_ to plot the classifiers' decisions you could either use sklearn examples ([this](https://scikit-learn.org/stable/auto_examples/neural_networks/plot_mlp_alpha.html#sphx-glr-auto-examples-neural-networks-plot-mlp-alpha-py) or any other) and mess with matplotlib yourself, or use the great [mlxtend](https://github.com/rasbt/mlxtend) package (see their examples for details).

_Pro Tip:_ write a function `plot_decisions` that takes a dataset and an estimator and plots the results, because you will want to use it several times below.
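The accuracy shown on the plots below is computed on the same points the models were fitted on. A held-out estimate is a more honest way to "calculate one preferred classification metric". This is a minimal sketch, assuming a stratified 70/30 split (the split size and `random_state` are arbitrary choices). It reuses the hyperparameters from the notebook and also reports F1, since `train_test_split` and `f1_score` are imported above but never used.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, f1_score

X, y = make_moons(n_samples=500, noise=0.2, random_state=42)
# Stratified hold-out split so both classes keep their proportions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

models = {
    'logreg': LogisticRegression(C=0.1, max_iter=500),
    'linear svm': SVC(kernel='linear', C=0.1),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)          # fit on the training part only
    y_pred = model.predict(X_test)       # evaluate on unseen points
    scores[name] = (accuracy_score(y_test, y_pred), f1_score(y_test, y_pred))
    print('{}: accuracy={:.3f}, F1={:.3f}'.format(name, *scores[name]))
```

Because the two classes of `make_moons` are balanced, accuracy and F1 should tell roughly the same story here; F1 becomes more informative when classes are imbalanced.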

Plotting function

In [None]:
def plot_decisions(fig, X, y, estimator, row, col, prob=False):
    '''
    Plot the decision regions of a fitted classifier on features 0 and 1 of X
    into subplot (row, col) of a plotly figure created with make_subplots.
    For binary problems with prob=True, the predicted probability of class 0 is drawn
    instead of hard labels; for more than 2 classes the prob argument is ignored.
    '''
    # Bounding box of the data with a 0.5 margin on every side
    x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
    y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
    h = 0.01  # grid step

    x_range = np.arange(x_min, x_max, h)
    y_range = np.arange(y_min, y_max, h)
    xx, yy = np.meshgrid(x_range, y_range)
    grid = np.c_[xx.ravel(), yy.ravel()]  # (n_grid_points, 2) array of coordinates

    # Predict only once: probabilities for binary + prob=True, hard labels otherwise
    if len(np.unique(y)) == 2 and prob:
        Z = estimator.predict_proba(grid)[:, 0]
    else:
        Z = estimator.predict(grid)
    Z = Z.reshape(xx.shape)

    # Filled contour of the predictions over the grid
    fig.add_trace(go.Contour(z=Z, x=x_range, y=y_range,
                             colorscale='Viridis', showscale=False),
                  row=row, col=col)

    # Data points colored by their true labels
    fig.add_trace(go.Scatter(x=X[:, 0], y=X[:, 1], mode='markers',
                             marker=dict(size=5, color=y, colorscale='plotly3')),
                  row=row, col=col)

    # Accuracy on X itself (training accuracy when X is the training set) as the x-axis title
    fig.update_xaxes(title_text='Accuracy: {:.3f}'.format(accuracy_score(y, estimator.predict(X))),
                     row=row, col=col)
    
    

In [None]:
lr = LogisticRegression(C = 0.1, max_iter=500)
svm = SVC(kernel='linear', C=0.1, probability=True)

lr.fit(moons_points, moons_labels)
svm.fit(moons_points, moons_labels)

In [None]:
fig = make_subplots(
    rows = 1,
    cols = 2,
    subplot_titles=['Decision region, {}'.format(md) for md in ('svm', 'logreg')]
)

plot_decisions(fig, moons_points, moons_labels, svm, 1, 1, prob=True)
plot_decisions(fig, moons_points, moons_labels, lr, 1, 2, prob=True)

fig.update_layout(
    autosize = False,
    width = 950,
    height = 400,
    showlegend=False)

fig.show()

It is hard to tell from the decision boundaries alone which model is better: both approximate the orientation of the best linear separating boundary.  
The SVM's accuracy is slightly higher, but the difference is not critical here: both models draw a straight line and differ only in the loss they minimize to place it (hinge loss for the SVM, logistic loss for logistic regression).  
To get more interesting results, we will change the kernel.
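The difference between the two linear models comes down to the loss on the margin $m = y \cdot f(x)$, with labels $y \in \{-1, +1\}$ and raw score $f(x)$. A small sketch evaluating both losses on a few hand-picked margins (the margin values are arbitrary):

```python
import numpy as np

# Margin m = y * f(x) with labels y in {-1, +1} and raw score f(x)
margins = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

hinge = np.maximum(0.0, 1.0 - margins)   # SVM (hinge) loss: max(0, 1 - m)
log_loss = np.log1p(np.exp(-margins))    # logistic loss: log(1 + exp(-m))

for m, h, l in zip(margins, hinge, log_loss):
    print('m={:+.1f}  hinge={:.3f}  logistic={:.3f}'.format(m, h, l))
```

The hinge loss is exactly zero once $m \ge 1$, so only points on or inside the margin (the support vectors) influence the SVM solution. The logistic loss is positive everywhere, so every point pulls a little on the logistic regression line. With well-separated data both usually end up with similar lines, which matches the observation above.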

## 1.2 Kernel trick


Now use different kernels (`poly`, `rbf`, `sigmoid`) on SVC to get better results. Play with the `degree` parameter and others.

For each kernel, estimate optimal params, plot decision regions, and calculate the metric you've chosen earlier.

Write a couple of sentences on:

* What has happened to the classification quality?
* How did the decision border change for each kernel?
* What `degree` have you chosen and why?
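The cells below keep only `best_estimator_`, which discards the cross-validated scores needed to justify a chosen `degree`. A minimal sketch that keeps the `GridSearchCV` object instead and reads `best_params_`, `best_score_` and `cv_results_`. The small grid here is an illustrative assumption, deliberately narrower than the notebook's grid, to keep it fast.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.2, random_state=42)

# A deliberately small grid (illustrative values, not the notebook's)
param_grid = {'C': [0.1, 1], 'degree': [2, 3, 4, 5], 'coef0': [0, 1]}
search = GridSearchCV(SVC(kernel='poly'), param_grid, cv=5).fit(X, y)

print('best params:', search.best_params_)
print('mean CV accuracy of best params: {:.3f}'.format(search.best_score_))

# Best mean CV accuracy for each degree (maximum over C and coef0)
results = search.cv_results_
degrees_tried = np.array([p['degree'] for p in results['params']])
for d in param_grid['degree']:
    best_for_d = results['mean_test_score'][degrees_tried == d].max()
    print('degree {}: {:.3f}'.format(d, best_for_d))
```

Looking at the per-degree scores shows whether a high degree is really needed or whether a lower one performs almost as well, which is a reasonable argument for preferring the simpler model.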

In [None]:
# gamma starts above 0: with gamma=0 the RBF and sigmoid kernels are constant
param_grid = {'clf__C' : np.linspace(0.0001, 10, 100),
              'clf__gamma' : np.linspace(0.25, 1, 4)}

pipe = Pipeline([('clf', SVC(kernel='rbf', probability=True))])
svm_rbf = GridSearchCV(pipe, param_grid, n_jobs=-1, verbose=5).fit(moons_points, moons_labels).best_estimator_

param_grid = {'clf__C' : np.linspace(0.0001, 10, 100),
              'clf__gamma' : np.linspace(0.25, 1, 4),
              'clf__coef0' : [0, 0.5, 1]}
pipe = Pipeline([('clf', SVC(kernel='sigmoid', probability=True))])
svm_sigm = GridSearchCV(pipe, param_grid, n_jobs=-1, verbose=5).fit(moons_points, moons_labels).best_estimator_

param_grid = {'clf__C' : np.linspace(0.0001, 10, 100),
              'clf__degree': np.arange(2, 10),
              'clf__coef0' : [0, 0.5, 1]}

pipe = Pipeline([('clf', SVC(kernel='poly', probability=True))])
svm_poly = GridSearchCV(pipe, param_grid, n_jobs=-1, verbose=5).fit(moons_points, moons_labels).best_estimator_

fig = make_subplots(
    rows = 2,
    cols = 2,
    subplot_titles=['Decision region, {}'.format(md) for md in ('svm rbf',
                                                                'svm sigmoid',
                                                                'svm poly')]
)

plot_decisions(fig, moons_points, moons_labels, svm_rbf, 1, 1, prob=True)
plot_decisions(fig, moons_points, moons_labels, svm_sigm, 1, 2, prob=True)
plot_decisions(fig, moons_points, moons_labels, svm_poly, 2, 1, prob=True)

fig.update_layout(
    autosize = False,
    width = 900,
    height = 900,
    showlegend=False)

fig.show()

In [None]:
svm_poly.named_steps['clf']

As we can see, the sigmoid kernel gave the worst fit (even worse than the linear one).  
The Gaussian (RBF) and polynomial kernels showed roughly the same accuracy. However, a closer look at the plots shows that with the RBF kernel class 0 is surrounded by class 1, which looks odd, whereas the polynomial kernel does not enclose one class in the other but draws a curve between the two classes.  
The optimal polynomial degree found by the grid search is 8.
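A likely explanation for the enclosing behaviour of the RBF kernel: the SVM decision function is $f(x) = \sum_i \alpha_i y_i K(x_i, x) + b$, and with $K(x_i, x) = \exp(-\gamma \lVert x - x_i \rVert^2)$ every term vanishes far from the training data, so $f(x) \to b$. One class therefore fills the whole far-away region and the other becomes a bounded island. A small sketch checking this, assuming `gamma=1.0`, `C=1.0` and a few arbitrary far-away points:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.2, random_state=42)
svc = SVC(kernel='rbf', gamma=1.0, C=1.0).fit(X, y)

# Points far away from all training data (arbitrary coordinates)
far_points = np.array([[100.0, 100.0], [-100.0, 50.0], [0.0, -80.0]])

# Every kernel term exp(-gamma * ||x - x_i||^2) underflows to 0 here,
# so the decision function reduces to the intercept b
print(svc.decision_function(far_points))
print(svc.intercept_)
# Consequently all far points receive the same class
print(svc.predict(far_points))
```

The polynomial kernel has no such decay, so its boundary can keep extending between the classes instead of closing around one of them.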

## 1.3 Simpler solution (of a kind)
What if we could use Logistic Regression to successfully solve this task?

Feature generation can help here. Different feature generation techniques are used in real life; a couple of them will be covered in additional lectures.

In this particular case, simple `PolynomialFeatures` ([link](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html)) are able to save the day.

Generate the set of new features, train LR on it, plot decision regions, and calculate the metric.

* Compare SVM's results with this solution (quality, borders type)
* What degree of PolynomialFeatures have you used? Compare it with the same SVM kernel parameter.

In [None]:
degrees = [2, 4, 6, 8, 10, 12]

fig = make_subplots(
    rows = 3,
    cols = 2,
    subplot_titles=['Decision region, logreg degree {}'.format(d) for d in degrees]
)

param_grid = {'clf__C' : np.linspace(0.0001, 10, 100)}

for i, degree in enumerate(degrees):

    pipe = Pipeline([('poly', PolynomialFeatures(degree=degree)), 
                     ('clf', LogisticRegression(fit_intercept=False))])

    log_reg = GridSearchCV(pipe, param_grid, n_jobs=-1).fit(moons_points, moons_labels).best_estimator_
    
    plot_decisions(fig, moons_points, moons_labels, log_reg, i // 2 + 1, i % 2 + 1, prob=True)
    
fig.update_layout(
    autosize = False,
    width = 900,
    height = 900,
    showlegend=False)

fig.show()

Already at degree 6 the accuracy is no worse than the SVM's.  
So we can indeed train logistic regression on transformed features and expect an accuracy gain from artificially introducing nonlinear dependencies on the original features.  
Obviously, though, there is no point in creating features that are linear combinations of the original ones: a linear model can already represent any such combination, so they do not make it more expressive.
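The price of explicit feature generation is the number of features: with $n$ input features, `PolynomialFeatures(degree=d)` produces $\binom{n + d}{d}$ columns (all monomials of total degree at most $d$, bias included). A quick sketch for the degrees tried above with $n = 2$:

```python
from math import comb
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.zeros((1, 2))  # only the number of columns matters here
counts = {}
for degree in [2, 4, 6, 8, 10, 12]:
    n_out = PolynomialFeatures(degree=degree).fit(X).n_output_features_
    counts[degree] = n_out
    # Number of monomials of total degree <= d in n variables is C(n + d, d)
    print(degree, n_out, comb(2 + degree, degree))
```

For two inputs the growth is mild (28 features at degree 6), but it becomes combinatorial with more inputs, whereas a kernel SVM computes the same inner products implicitly without materializing the features.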

## 1.4 Harder problem

Let's make this task a bit more challenging by upgrading the dataset:

In [None]:
circles_points, circles_labels = make_circles(n_samples=500, noise=0.06, random_state=42)

plt.figure(figsize=(5, 5))
plt.scatter(circles_points[:, 0], circles_points[:, 1], c=circles_labels)

And even more:

In [None]:
points = np.vstack((circles_points*2.5 + 0.5, moons_points))
labels = np.hstack((circles_labels, moons_labels + 2))  # +2 to distinguish the moons classes from the circles classes

plt.figure(figsize=(5, 5))
plt.scatter(points[:, 0], points[:, 1], c=labels)

Now do your best using all the approaches above!

Tune LR with generated features and SVM with an appropriate kernel of your choice. You may add some of your favorite models to demonstrate their (and your) strength. Again, plot decision regions and calculate the metric.

Justify the results in a few phrases.

Let's apply the same 3 models and look at the accuracy.

In [None]:
# gamma starts above 0: with gamma=0 the RBF kernel is constant
param_grid = {'clf__C' : np.linspace(0.0001, 10, 10),
              'clf__gamma' : np.linspace(0.25, 1, 4)}

pipe = Pipeline([('clf', SVC(kernel='rbf', probability=True))])
svm_rbf = GridSearchCV(pipe, param_grid, n_jobs=-1).fit(points, labels).best_estimator_

param_grid = {'clf__C' : np.linspace(0.0001, 10, 10),
              'clf__degree': np.arange(2, 8, 2),
              'clf__coef0' : [0, 1]}

pipe = Pipeline([('clf', SVC(kernel='poly', probability=True))])
svm_poly = GridSearchCV(pipe, param_grid, n_jobs=-1, verbose=5).fit(points, labels).best_estimator_

fig = make_subplots(
    rows = 1,
    cols = 2,
    subplot_titles=['Decision region, {}'.format(md) for md in ('svm rbf',
                                                                'svm poly')]
)

plot_decisions(fig, points, labels, svm_rbf, 1, 1, prob=True)
plot_decisions(fig, points, labels, svm_poly, 1, 2, prob=True)

fig.update_layout(
    autosize = False,
    width = 900,
    height = 400,
    showlegend=False)

fig.show()

Now let's look at logistic regression with polynomial features.

In [None]:
degrees = [2, 4, 6, 8, 10, 12]

fig = make_subplots(
    rows = 1,
    cols = 1,
    subplot_titles=['Decision region, {}'.format('logreg')]
)

param_grid = {'clf__C' : np.linspace(0.1, 3, 10),
              'poly__degree': degrees}

pipe = Pipeline([('poly', PolynomialFeatures()), 
                 ('clf', LogisticRegression(fit_intercept=False))])

log_reg = GridSearchCV(pipe, param_grid, n_jobs=-1, verbose=5).fit(points, labels).best_estimator_
    
plot_decisions(fig, points, labels, log_reg, 1, 1, prob=False)
    
fig.update_layout(
    autosize = False,
    width = 450,
    height = 400,
    showlegend=False)

fig.show()

Note also that the optimizer failed to converge (scikit-learn issues a `ConvergenceWarning`), yet the accuracy is still good.
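Convergence problems with high-degree polynomial features usually come from the features living on very different scales ($x^6$ versus $x$). A common remedy is to standardize after the polynomial expansion and give the solver more iterations. A minimal sketch on the moons data, assuming degree 6, `C=1.0` and `max_iter=5000` (illustrative values, not tuned); this swaps in a `StandardScaler` step that the notebook's pipeline does not have.

```python
import warnings
import numpy as np
from sklearn.datasets import make_moons
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.exceptions import ConvergenceWarning

X, y = make_moons(n_samples=500, noise=0.2, random_state=42)

pipe = Pipeline([
    # include_bias=False: a constant column would be zeroed by the scaler anyway
    ('poly', PolynomialFeatures(degree=6, include_bias=False)),
    # Put all monomials on a comparable scale before the solver sees them
    ('scale', StandardScaler()),
    # fit_intercept stays True (default) because the bias column was dropped
    ('clf', LogisticRegression(C=1.0, max_iter=5000)),
])

# Record any convergence warnings raised during fitting
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always', ConvergenceWarning)
    pipe.fit(X, y)

n_conv_warnings = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
print('convergence warnings:', n_conv_warnings)
print('training accuracy: {:.3f}'.format(pipe.score(X, y)))
```

Scaling changes how the L2 penalty acts on each monomial, so the optimal `C` may shift compared with the unscaled pipeline.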

In [None]:
fig = make_subplots(
    rows = 1,
    cols = 1,
    subplot_titles=['Decision region, {}'.format('random forest')]
)

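param_grid = {'poly__degree': [2, 3, 4],
              'clf__max_depth': [2, 3, 5, 10, 15, 20, 25, 50],
              # Fractions of the available features: degrees 2-4 give only 6-15 polynomial
              # features, so integer values such as 20 or 40 would make those fits fail
              'clf__max_features': [0.2, 0.4, 0.6, 0.8, 1.0]}


pipe = Pipeline([('poly', PolynomialFeatures()),
                 ('clf', RandomForestClassifier(random_state=42))])

# GridSearchCV refits best_estimator_ on the full data (refit=True by default),
# so no extra .fit() call is needed
model_rf = GridSearchCV(pipe, param_grid, n_jobs=-1, verbose=5).fit(points, labels).best_estimator_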

plot_decisions(fig, points, labels, model_rf, 1, 1, prob=False)
    
fig.update_layout(
    autosize = False,
    width = 450,
    height = 400,
    showlegend=False)

fig.show()

Random Forest, on the other hand, has overfit the training set, which is visible from the rather narrow strips in the plot. Still, it has captured the main relationship between the class label and the position of the points in space.
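The accuracy in the plot titles is training accuracy, so it cannot reveal overfitting by itself. Comparing it with cross-validated accuracy makes the gap explicit. A minimal sketch on the moons data, assuming 5-fold CV and two arbitrary depth settings (`None` for fully grown trees and `5`):

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_moons(n_samples=500, noise=0.2, random_state=42)

gaps = {}
for max_depth in [None, 5]:
    rf = RandomForestClassifier(max_depth=max_depth, random_state=42)
    train_acc = rf.fit(X, y).score(X, y)                # accuracy on the fitting data
    cv_acc = cross_val_score(rf, X, y, cv=5).mean()     # accuracy on held-out folds
    gaps[max_depth] = (train_acc, cv_acc)
    print('max_depth={}: train={:.3f}, cv={:.3f}, gap={:.3f}'.format(
        max_depth, train_acc, cv_acc, train_acc - cv_acc))
```

A large train/CV gap for unrestricted trees is the numerical counterpart of the narrow strips in the plot; limiting `max_depth` (or raising `min_samples_leaf`) is the usual way to smooth the boundary.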

***Conclusion.***


Once again, logistic regression with polynomial features performs roughly as well as an SVM with a nonlinear kernel. On the harder dataset the classification accuracy naturally dropped, but this is largely due to class overlap: the classes would be hard to separate perfectly.  
The random forest overfits: the plot shows that although it captures the dependencies, it also reacts to noise, which makes the overall decision boundary rougher.  