___
<h1> Machine Learning </h1>
<h2> M. Sc. in Electrical and Computer Engineering </h2>
<h3> Instituto Superior de Engenharia / Universidade do Algarve </h3>

[MEEC](https://ise.ualg.pt/en/curso/1477) / [ISE](https://ise.ualg.pt) / [UAlg](https://www.ualg.pt)

Pedro J. S. Cardoso (pcardoso@ualg.pt)
___

# SVM 

Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression and outlier detection.
See https://scikit-learn.org/stable/modules/svm.html for an overview of the scikit-learn module and https://scikit-learn.org/stable/modules/svm.html#svm-mathematical-formulation for the mathematical formulation.
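For reference, here is the soft-margin primal problem solved by `SVC`, following the scikit-learn formulation linked above. It uses training pairs $(x_i, y_i)$ with $y_i \in \{-1, 1\}$, $i = 1, \dots, n$, and a feature map $\phi$:

$$
\min_{w, b, \zeta} \; \frac{1}{2} w^T w + C \sum_{i=1}^{n} \zeta_i
\quad \text{subject to} \quad y_i \left( w^T \phi(x_i) + b \right) \geq 1 - \zeta_i, \quad \zeta_i \geq 0 .
$$

A larger $C$ penalises margin violations $\zeta_i$ more heavily (weaker regularization). A smaller $C$ tolerates more violations in exchange for a wider margin. The map $\phi$ only enters through the kernel $K(x, x') = \phi(x)^T \phi(x')$. For `kernel='poly'`, scikit-learn uses $K(x, x') = (\gamma \langle x, x' \rangle + r)^d$, with $d$ = `degree` and $r$ = `coef0`.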

## Classification

In [None]:
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()

X_train, X_test, y_train, y_test = train_test_split(iris.data,
                                                    iris.target, 
                                                    random_state=10)

In [None]:
svm = SVC(
    C=.1, 
    kernel='poly', 
    degree=4).fit(X_train, y_train)

score = svm.score(X_test, y_test)
score
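A small sweep over `C` makes the effect of regularization visible on the same split. This is only a sketch, and the list of `C` values is an arbitrary choice for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=10
)

# Same polynomial kernel as above; only the regularization parameter C changes.
# Smaller C = stronger regularization (more margin violations tolerated).
c_values = [0.001, 0.01, 0.1, 1, 10]
scores = {}
for c in c_values:
    model = SVC(C=c, kernel="poly", degree=4).fit(X_train, y_train)
    scores[c] = model.score(X_test, y_test)  # test accuracy

for c, s in scores.items():
    print(f"C={c:<6} test accuracy={s:.3f}")
```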

In [None]:
svm = SVC(
    C=.01, 
    kernel='poly', 
    degree=4).fit(X_train, y_train)

score = svm.score(X_test, y_test)
print(score)
print("If you get 1.0, that is probably pure luck: try other random_state values in train_test_split!")

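See also the scikit-learn example that plots the decision surfaces of several SVM classifiers on the iris dataset: https://scikit-learn.org/stable/auto_examples/svm/plot_iris_svc.html#sphx-glr-auto-examples-svm-plot-iris-svc-py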
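One way to check whether a perfect score is luck is to repeat the split with several seeds and look at the spread of test accuracies. The sketch below uses 10 seeds, an arbitrary choice:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = load_iris()

# Re-split the data with several seeds and record the test accuracy each time.
accuracies = []
for seed in range(10):
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, random_state=seed
    )
    model = SVC(C=0.01, kernel="poly", degree=4).fit(X_train, y_train)
    accuracies.append(model.score(X_test, y_test))

accuracies = np.array(accuracies)
print(f"mean={accuracies.mean():.3f}  std={accuracies.std():.3f}  "
      f"min={accuracies.min():.3f}  max={accuracies.max():.3f}")
```

A wide gap between `min` and `max` means that a single split says little about how good the model really is.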

## Regression

Next, we present a few examples of regression with SVMs (support vector regression, SVR).
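`SVR`, and `LinearSVR` with its default `loss='epsilon_insensitive'`, ignore errors smaller than a tolerance `epsilon` and penalise larger errors linearly: $L_\varepsilon(y, \hat{y}) = \max(0, |y - \hat{y}| - \varepsilon)$. The default `epsilon` is 0.1 for `SVR` and 0.0 for `LinearSVR`. Below is a minimal NumPy sketch of this loss. The function name `epsilon_insensitive_loss` is our own and is not a scikit-learn function:

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_hat, epsilon=0.1):
    """Per-sample epsilon-insensitive loss: max(0, |y - y_hat| - epsilon)."""
    residual = np.abs(np.asarray(y_true) - np.asarray(y_hat))
    return np.maximum(0.0, residual - epsilon)

y_true = np.array([3.0, 5.0, 2.0])
y_hat = np.array([3.05, 4.0, 2.5])

# The first error (0.05) falls inside the epsilon "tube" and costs nothing.
losses = epsilon_insensitive_loss(y_true, y_hat, epsilon=0.1)
print(losses)
```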

### LinearSVR

In [None]:
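%matplotlib inline
from sklearn.svm import LinearSVR
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# load_boston was removed in scikit-learn 1.2; load the same data from its
# original source, as suggested in load_boston's deprecation message
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)

X = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])  # 13 features
y = raw_df.values[1::2, 2]  # target: median house value, in $1000s

X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    shuffle=True,
                                                    random_state=1,
                                                    test_size=0.1)

svm = LinearSVR(C=100,            # weak regularization
                max_iter=100000,  # many iterations, as the features are not scaled
                random_state=1
               ).fit(X_train, y_train)

# for regressors, score() returns the coefficient of determination R^2
score = svm.score(X_test, y_test)
score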
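For regressors, `score` returns the coefficient of determination $R^2 = 1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2$. A value of 1 is a perfect fit, 0 is no better than always predicting the mean, and the value can be negative. Here is a quick check on toy numbers (not the housing data) that the formula matches `sklearn.metrics.r2_score`:

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((y_true - y_hat) ** 2)         # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2_manual = 1 - ss_res / ss_tot
r2_sklearn = r2_score(y_true, y_hat)

print(r2_manual, r2_sklearn)
```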

In [None]:
y_pred = svm.predict(X_test)

plt.figure(figsize=(15,10))

plt.plot(y_test, c='b')
plt.plot(y_pred, c='g')
plt.plot(np.abs(y_pred-y_test), c='r')
plt.xlabel("Index of the house in the test set")
plt.ylabel("Median house value (thousands of USD)")

# raw string: "\D" and "\h" are not valid Python escape sequences
plt.legend(["test", "pred", r"$\Delta = |y_i-\hat{y}_i|$"])
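The red curve can also be summarised by single numbers. Its mean is the mean absolute error (MAE), while the root mean squared error (RMSE) weights large errors more heavily. The sketch below uses toy values (not the housing predictions) with `sklearn.metrics`. RMSE is computed as `np.sqrt(mean_squared_error(...))` because the `squared=False` option was deprecated in recent scikit-learn versions:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Toy values standing in for the true and predicted house values
y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_hat = np.array([12.0, 19.0, 27.0, 40.0])

mae = mean_absolute_error(y_true, y_hat)           # mean of |y_i - y_hat_i|
rmse = np.sqrt(mean_squared_error(y_true, y_hat))  # sqrt of mean squared error

print(f"MAE={mae:.3f}  RMSE={rmse:.3f}")
```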

### SVR: kernel='poly'

After some "fighting" with the hyperparameters, we settled on the following values:

In [None]:
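from sklearn.svm import SVR

svm = SVR(C=10000,            # weak regularization
          kernel='poly',      # K(x, x') = (gamma * <x, x'> + coef0)^degree
          gamma=.00001,       # small gamma, since the features are not scaled
          degree=3,
          max_iter=100000000,
          shrinking=True      # the default; kept for explicitness
         ).fit(X_train, y_train)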

score = svm.score(X_test, y_test)
score        
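Instead of tuning the parameters by hand, `GridSearchCV` can try every combination in a grid and keep the best by cross-validated $R^2$. The sketch below uses synthetic data from `make_regression` as a stand-in for the housing data so that it runs offline. It also adds feature scaling, and the grid values are arbitrary choices:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (assumption: not the housing dataset)
X_syn, y_syn = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_syn_train, X_syn_test, y_syn_train, y_syn_test = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=1
)

# Scale features, then fit a polynomial-kernel SVR
pipe = make_pipeline(StandardScaler(), SVR(kernel="poly"))

# make_pipeline names each step after its lower-cased class name ("svr")
param_grid = {
    "svr__C": [1, 100, 1000],
    "svr__degree": [1, 2, 3],
    "svr__epsilon": [0.1, 1.0],
}
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X_syn_train, y_syn_train)

print(search.best_params_)
print("CV R^2:  ", search.best_score_)
print("test R^2:", search.score(X_syn_test, y_syn_test))
```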

In [None]:
y_pred = svm.predict(X_test)

import pandas as pd

# one row per test house: true value, prediction and signed error
df = pd.DataFrame({"test": y_test, "pred": y_pred, "delta": y_pred - y_test})
df
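To see where the model fails most, one option is to sort the table by the absolute error. The sketch below uses toy values and a hypothetical `errors_df` table rather than the notebook's `df`:

```python
import numpy as np
import pandas as pd

y_true = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
y_hat = np.array([11.0, 14.0, 30.5, 43.0, 48.0])

errors_df = pd.DataFrame({"test": y_true, "pred": y_hat, "delta": y_hat - y_true})
errors_df["abs_delta"] = errors_df["delta"].abs()

# The 3 rows with the largest absolute error, largest first
worst = errors_df.nlargest(3, "abs_delta")
print(worst)
```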


In [None]:
plt.figure(figsize=(15,10))

plt.plot(y_test, c='b')
plt.plot(y_pred, c='g')
plt.plot(np.abs(y_pred-y_test), c='r')
plt.xlabel("Index of the house in the test set")
plt.ylabel("Median house value (thousands of USD)")

plt.legend(["test", "pred", r"$\Delta = |y_i-\hat{y}_i|$"])  # raw string for the LaTeX label

plt.show()

In [None]:
X_test

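# Cross-validation

But how good are our models, really? A single train/test split yields a single score that depends on which samples happen to land in the test set (as the iris example showed). Cross-validation averages the score over several different splits.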
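In k-fold cross-validation the data are split into k folds. The model is trained on k-1 folds and scored on the remaining one, and this is repeated until every fold has been used once for testing. The sketch below is a hand-written 5-fold cross-validation on iris, roughly what `cross_val_score` automates. Note that it uses a shuffled `KFold`, whereas `cross_val_score` with an integer `cv` uses an unshuffled `StratifiedKFold` for classifiers:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.svm import SVC

iris = load_iris()
X_iris, y_iris = iris.data, iris.target

# Shuffle: iris is sorted by class, so contiguous folds would be badly unbalanced
kf = KFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = []
for train_idx, test_idx in kf.split(X_iris):
    model = SVC().fit(X_iris[train_idx], y_iris[train_idx])              # train on k-1 folds
    fold_scores.append(model.score(X_iris[test_idx], y_iris[test_idx]))  # score held-out fold

fold_scores = np.array(fold_scores)
print(fold_scores, fold_scores.mean())
```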

## Classification

In [None]:
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()

svc = SVC()
scores = cross_val_score(estimator=svc,  # model
                         X=iris.data,    # features
                         y=iris.target,  # labels
                         cv=5,           # number of folds (5 is the default since scikit-learn 0.22; see the docs for other splitters)
                         n_jobs=-1,      # use all CPU cores
                         verbose=1,      # verbosity level
                        )
scores
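Passing a splitter object instead of an integer gives control over shuffling and the seed. Reporting the mean and standard deviation is usually more informative than the raw list. A sketch (the seed is an arbitrary choice):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

iris = load_iris()

# Stratified folds keep the class proportions; shuffling removes the effect of row order
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(SVC(), iris.data, iris.target, cv=cv)

print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```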

## Regression

In [None]:
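import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

# load_boston was removed in scikit-learn 1.2; load the same data from its
# original source, as suggested in load_boston's deprecation message
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
X = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
y = raw_df.values[1::2, 2]

svr = SVR(C=1000)

scores = cross_val_score(estimator=svr,  # model
                         X=X,            # features
                         y=y,            # target
                         cv=5,           # number of folds (5 is the default since scikit-learn 0.22)
                         n_jobs=-1,      # use all CPU cores
                         verbose=1,      # verbosity level
                        )
scores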

... the model needs improvement!?
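One common improvement is to standardise the features. The RBF kernel used by `SVR` by default is sensitive to feature scales, and the Boston features range over very different magnitudes. Wrapping the scaler and the model in a pipeline refits the scaler on the training folds only, inside each split. The sketch below uses synthetic data with deliberately stretched columns (an assumption, to keep it self-contained). Whether scaling helps on the housing data has to be checked by running the same comparison there:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic data; stretch the columns so the features live on very different scales
X_syn, y_syn = make_regression(n_samples=300, n_features=4, noise=5.0, random_state=0)
X_syn = X_syn * np.array([1.0, 10.0, 100.0, 1000.0])

cv = KFold(n_splits=5, shuffle=True, random_state=0)

raw = cross_val_score(SVR(C=1000), X_syn, y_syn, cv=cv)
# The scaler is fitted on the training folds only, so no test-fold statistics leak in
scaled = cross_val_score(make_pipeline(StandardScaler(), SVR(C=1000)), X_syn, y_syn, cv=cv)

print(f"unscaled R^2: {raw.mean():.3f}")
print(f"scaled   R^2: {scaled.mean():.3f}")
```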