# Support Vector Machines

## Linear SVM Classification

In [6]:
# The LinearSVC class regularizes the bias term, so you should center the training set first by subtracting its mean.
# This is automatic if you scale the data using the StandardScaler. Moreover, make sure you set the loss hyperparameter
# to "hinge", as it is not the default value. Finally, for better performance you should set the dual hyperparameter to False,
# unless there are more features than training instances. Note that LinearSVC accepts dual=False only with the default
# squared hinge loss; combining it with loss="hinge" raises an error.
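
As a quick check of the centering claim above, the sketch below scales the two petal features of the iris dataset and inspects the resulting column means and standard deviations. The variable names (`X_raw`, `X_scaled`, `feature_means`, `feature_stds`) are illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

# Load the two petal features (length and width) from the iris dataset.
iris = datasets.load_iris()
X_raw = iris["data"][:, [2, 3]]

# StandardScaler subtracts each column's mean and divides by its standard deviation.
X_scaled = StandardScaler().fit_transform(X_raw)

feature_means = X_scaled.mean(axis=0)  # expected to be numerically close to 0
feature_stds = X_scaled.std(axis=0)    # expected to be numerically close to 1

print("means after scaling:", feature_means)
print("stds after scaling: ", feature_stds)
```

Because the scaled features have zero mean, the training set is already centered before LinearSVC sees it.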

In [11]:
# The following Scikit-Learn code loads the iris dataset, scales the features, and then trains a linear SVM model (using
# the LinearSVC class with C = 1 and the hinge loss function, described shortly) to detect Iris-Virginica flowers.



import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris['data'][:, (2, 3)]    # petal length, petal width
y = (iris['target']==2).astype(np.float64)   # Iris-Virginica

svm_clf = Pipeline([
    ('scaler', StandardScaler()),
    ('linear_svc', LinearSVC(C=1, loss='hinge')),
])

svm_clf.fit(X, y)

svm_clf.predict([[5.5, 1.7]])

array([1.])

In [13]:
iris.target_names

array(['setosa', 'versicolor', 'virginica'], dtype='<U10')

In [None]:
# Unlike Logistic Regression classifiers, SVM classifiers do not output probabilities for each class.
# Alternatively, you could use the SVC class, with SVC(kernel="linear", C=1), but it is much slower, especially with
# large training sets, so it is not recommended.
# Another option is to use the SGDClassifier class, with SGDClassifier(loss='hinge', alpha=1/(m*C)), where m is the
# number of training instances. This applies regular Stochastic Gradient Descent to train a linear SVM classifier.
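
The two alternatives described above can be sketched as follows, on the same Iris-Virginica task. This is a minimal illustration, not a benchmark: the names `X_iris`, `y_iris`, `svc_clf`, and `sgd_clf` are hypothetical, and the `max_iter`, `tol`, and `random_state` values passed to SGDClassifier are assumptions chosen only to make the run reproducible. The last line calls `decision_function`, which returns signed scores rather than class probabilities.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X_iris = iris["data"][:, [2, 3]]                   # petal length, petal width
y_iris = (iris["target"] == 2).astype(np.float64)  # 1.0 for Iris-Virginica

m = len(X_iris)  # number of training instances
C = 1

# Alternative 1: SVC with a linear kernel.
svc_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svc", SVC(kernel="linear", C=C)),
])

# Alternative 2: SGDClassifier with the hinge loss and alpha = 1 / (m * C).
sgd_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("sgd", SGDClassifier(loss="hinge", alpha=1 / (m * C),
                          max_iter=1000, tol=1e-3, random_state=42)),
])

svc_clf.fit(X_iris, y_iris)
sgd_clf.fit(X_iris, y_iris)

svc_accuracy = svc_clf.score(X_iris, y_iris)
sgd_accuracy = sgd_clf.score(X_iris, y_iris)
print("SVC training accuracy:", svc_accuracy)
print("SGD training accuracy:", sgd_accuracy)

# Signed score for one flower: positive means the Iris-Virginica side of the boundary.
svc_scores = svc_clf.decision_function([[5.5, 1.7]])
print("SVC decision score:", svc_scores)
```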

## Nonlinear SVM Classification

In [7]:
# One approach to handling nonlinear datasets is to add more features, such as polynomial features (see Chapter 4); in some
# cases this can result in a linearly separable dataset.
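
To make this concrete, here is a small sketch on a made-up one-dimensional dataset (the data and all variable names are assumptions for illustration). With the single feature x1 the classes cannot be split by one threshold, but adding the feature x1 squared makes them linearly separable. It uses SVC with a linear kernel and a large C purely to keep the toy example deterministic.

```python
import numpy as np
from sklearn.svm import SVC

# Nine points on a line; the middle five belong to class 1, the outer four to class 0.
x1 = np.linspace(-4, 4, 9)
y_1d = (np.abs(x1) < 2.5).astype(int)

X_one_feature = x1.reshape(-1, 1)    # original feature only
X_two_features = np.c_[x1, x1 ** 2]  # original feature plus its square

# A linear classifier on x1 alone cannot separate the classes.
acc_one = SVC(kernel="linear", C=100).fit(X_one_feature, y_1d).score(X_one_feature, y_1d)

# After adding x1 ** 2, a straight line in the (x1, x1 ** 2) plane separates them.
acc_two = SVC(kernel="linear", C=100).fit(X_two_features, y_1d).score(X_two_features, y_1d)

print("accuracy with x1 only:      ", acc_one)
print("accuracy with x1 and x1**2: ", acc_two)
```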

In [None]:
# To implement this idea using Scikit-Learn, you can create a Pipeline containing a PolynomialFeatures transformer,
# followed by a StandardScaler and a LinearSVC.
# Let's test this on the moons dataset: this is a toy dataset for binary classification in which the data points are
# shaped as two interleaving half circles. You can generate this dataset using the make_moons() function:



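from sklearn.datasets import make_moons
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Generate the moons dataset so the model is not trained on the iris data from the earlier cells.
# The sample count, noise level, and seed are illustrative choices.
X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

polynomial_svm_clf = Pipeline([
    ("poly_features", PolynomialFeatures(degree=3)),
    ("scaler", StandardScaler()),
    ("svm_clf", LinearSVC(C=10, loss="hinge"))
])
polynomial_svm_clf.fit(X, y)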
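
The number of columns that PolynomialFeatures produces grows quickly with both the degree and the number of input features. The sketch below counts them for a few settings; `count_polynomial_features` is a hypothetical helper written for this illustration, and the settings are arbitrary. With the default `include_bias=True`, the count for n input features and degree d equals the binomial coefficient C(n + d, d).

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures


def count_polynomial_features(n_features, degree):
    """Return how many output features PolynomialFeatures creates (hypothetical helper)."""
    poly = PolynomialFeatures(degree=degree)
    # Fitting on a single dummy row is enough to learn the output dimensionality.
    poly.fit(np.zeros((1, n_features)))
    return poly.n_output_features_


settings = [(2, 3), (2, 10), (100, 3)]  # (number of input features, degree)
feature_counts = {(n, d): count_polynomial_features(n, d) for n, d in settings}

for (n, d), count in feature_counts.items():
    print(f"{n} input features, degree {d}: {count} output features")
```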

### Polynomial Kernel

In [10]:
# ...but at a low polynomial degree this approach cannot deal with very complex datasets, and with a high polynomial
# degree it creates a huge number of features, making the model too slow.
# Fortunately, when using SVMs you can apply an almost miraculous mathematical technique called the kernel trick. It
# makes it possible to get the same result as if you added many polynomial features, even with very high-degree
# polynomials, without actually having to add them.
# Let's test it on the moons dataset:

from sklearn.svm import SVC
poly_kernel_svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel="poly", degree=3, coef0=1, C=5))
])
poly_kernel_svm_clf.fit(X, y)


Pipeline(memory=None,
     steps=[('scaler', StandardScaler(copy=True, with_mean=True, with_std=True)), ('svm_clf', SVC(C=5, cache_size=200, class_weight=None, coef0=1,
  decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
  kernel='poly', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False))])
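
The idea behind the kernel trick can be checked numerically in a small case. The sketch below, for 2D inputs and degree 2, compares the dot product of explicitly expanded feature vectors with the kernel value (a · b + 1)² computed directly on the original vectors. The mapping `phi` and the sample vectors are assumptions chosen for illustration; the call to `polynomial_kernel` uses `gamma=1` and `coef0=1` so that it matches this particular formula.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel


def phi(x):
    """Explicit degree-2 polynomial feature map for a 2D vector (illustrative helper)."""
    x1, x2 = x
    root2 = np.sqrt(2.0)
    return np.array([1.0, root2 * x1, root2 * x2, x1 ** 2, root2 * x1 * x2, x2 ** 2])


a = np.array([0.5, -1.0])
b = np.array([2.0, 3.0])

# Route 1: expand both vectors to 6 features, then take the dot product.
explicit_value = phi(a) @ phi(b)

# Route 2: apply the polynomial kernel to the original 2D vectors.
kernel_value = (a @ b + 1.0) ** 2

# The same kernel computed by Scikit-Learn.
sklearn_value = polynomial_kernel(a.reshape(1, -1), b.reshape(1, -1),
                                  degree=2, gamma=1.0, coef0=1.0)[0, 0]

print(explicit_value, kernel_value, sklearn_value)
```

All three values agree, which is why an SVM can behave as if the polynomial features had been added without ever computing them.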

### Adding Similarity Features 