# Linear models for classification

## Logistic regression

• L2 regularization is 'on' by default (like ridge regression) and is controlled by the **C** parameter.

• Larger C means less regularization (C is the inverse of the regularization strength).

• As with regularized linear regression, it can be important to normalize all features so that they are on the same scale.
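A minimal sketch of the three points above, assuming a synthetic dataset from `make_classification` (not the notebook's data): features are standardized inside a pipeline, and the L2 norm of the learned weights is printed for several values of C. Weaker regularization (larger C) is expected to allow larger weights.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary data (an assumption; stands in for the notebook's dataset)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

coef_norms = {}
for C in [0.01, 1, 100]:
    # Standardize first so the L2 penalty treats every feature on the same scale
    model = make_pipeline(StandardScaler(), LogisticRegression(C=C))
    model.fit(X, y)
    # The fitted LogisticRegression is the last step of the pipeline
    coef_norms[C] = np.linalg.norm(model[-1].coef_)
    print('C = {:>6}: ||w|| = {:.3f}'.format(C, coef_norms[C]))
```

Putting the scaler inside the pipeline means it is fit only on whatever data the pipeline is trained on, which avoids leaking test-set statistics.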

In [None]:
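# Logistic regression for binary classification

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# X_C2, y_C2: two-feature binary data, assumed to be defined in an earlier cell
X_train, X_test, y_train, y_test = train_test_split(X_C2, y_C2, random_state=0)

# C=100: weak regularization
clf = LogisticRegression(C=100).fit(X_train, y_train)

print('Accuracy of Logistic regression classifier on training set: {:.2f}'
      .format(clf.score(X_train, y_train)))
print('Accuracy of Logistic regression classifier on test set: {:.2f}'
      .format(clf.score(X_test, y_test)))

# Predict the class of one new sample; h and w are hypothetical feature values
h, w = 6, 8
print(clf.predict([[h, w]])[0])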

#### Regularization is especially useful for data with a high-dimensional feature space

In [None]:
# Logistic regression regularization: C parameter

for this_C in [0.1, 1, 100]:
    clf = LogisticRegression(C=this_C).fit(X_train, y_train)
    print('Accuracy of Logistic regression classifier with C = {} on training set: {:.2f}'
          .format(this_C, clf.score(X_train, y_train)))
    print('Accuracy of Logistic regression classifier with C = {} on test set: {:.2f}\n'
          .format(this_C, clf.score(X_test, y_test)))

## SVM

**• The strength of regularization is determined by the C parameter**

**• Larger values of C: less regularization**

  – Fit the training data as well as possible

  – Each individual data point is important to classify correctly

**• Smaller values of C: more regularization**

  – More tolerant of errors on individual data points
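To see the trade-off numerically, here is a minimal sketch using synthetic, overlapping blobs (an assumption, not the notebook's data). For a linear SVM the margin width is 2 / ||w||, so stronger regularization (smaller C) is expected to give a wider, more error-tolerant margin.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping Gaussian blobs (synthetic data, for illustration only)
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=0)

results = {}
for C in [0.01, 100]:
    clf = SVC(kernel='linear', C=C).fit(X, y)
    # Margin width of a linear SVM: 2 / ||w||
    margin = 2 / np.linalg.norm(clf.coef_)
    n_sv = int(clf.n_support_.sum())
    results[C] = (margin, n_sv)
    print('C = {:>6}: margin width = {:.3f}, support vectors = {}'.format(C, margin, n_sv))
```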

### Linear Support Vector Machine

In [2]:
from sklearn.svm import SVC, LinearSVC
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_C2, y_C2, random_state=0)

# Linear-kernel SVC: inspect the learned weights and intercept
this_C = 1.0
clf = SVC(kernel='linear', C=this_C).fit(X_train, y_train)
print('Coefficients:\n', clf.coef_)
print('Intercepts:\n', clf.intercept_)

# Linear Support Vector Machine: C parameter
for this_C in [0.00001, 100]:
    clf = LinearSVC(C=this_C).fit(X_train, y_train)
    print('Accuracy of Linear SVC classifier with C = {} on training set: {:.2f}'
          .format(this_C, clf.score(X_train, y_train)))
    print('Accuracy of Linear SVC classifier with C = {} on test set: {:.2f}'
          .format(this_C, clf.score(X_test, y_test)))
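The `coef_` and `intercept_` printed above define the linear decision function f(x) = w · x + b. A minimal sketch, assuming synthetic data and `LinearSVC`, that recomputes this by hand and checks it against scikit-learn's own `decision_function` and `predict`:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic two-feature binary data (an assumption, for illustration)
X, y = make_classification(n_samples=100, n_features=2, n_redundant=0, random_state=0)
clf = LinearSVC(C=1.0).fit(X, y)

# Decision function by hand: f(x) = w . x + b
manual = X @ clf.coef_.ravel() + clf.intercept_[0]
# Positive scores map to clf.classes_[1], the rest to clf.classes_[0]
manual_pred = np.where(manual > 0, clf.classes_[1], clf.classes_[0])

print(np.allclose(manual, clf.decision_function(X)))
print(np.array_equal(manual_pred, clf.predict(X)))
```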

## Kernelized Support Vector Machines

Kernelized SVMs can be used for either classification (`SVC`) or regression (`SVR`).
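For the regression case, a minimal sketch with `SVR` on a noise-free sine curve (synthetic data, and the `C` and `epsilon` values are illustrative assumptions). `SVR` accepts the same kernels as `SVC`; `epsilon` sets the width of the tube within which errors are not penalized.

```python
import numpy as np
from sklearn.svm import SVR

# Noise-free 1-D regression target (synthetic, for illustration)
X = np.linspace(0, 2 * np.pi, 100).reshape(-1, 1)
y = np.sin(X).ravel()

reg = SVR(kernel='rbf', C=10, epsilon=0.05).fit(X, y)
r2 = reg.score(X, y)  # coefficient of determination (R^2) on the training data
print('Training R^2: {:.3f}'.format(r2))
```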

### Classification

### RBF Kernel

• A kernel is a similarity measure (modified dot product) between data points

• The kernel implicitly maps the input data into a higher-dimensional feature space, where it is easier to find a hyperplane that separates the classes.

• The classifier still follows the maximum-margin principle, but because of the non-linear transformation, the decision boundary is not equidistant from the margin-edge points when viewed in the original input space.
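As a sketch of "kernel as similarity", the RBF kernel is K(x, x') = exp(-gamma · ||x - x'||²). The points and the `gamma` value below are illustrative assumptions; the manual computation is checked against `sklearn.metrics.pairwise.rbf_kernel`.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Three illustrative 2-D points
X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]])
gamma = 0.5

# Pairwise squared Euclidean distances, then K = exp(-gamma * d^2)
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_manual = np.exp(-gamma * sq_dists)

K_sklearn = rbf_kernel(X, X, gamma=gamma)
print(np.round(K_manual, 4))
```

Each point has similarity 1 with itself, and similarity shrinks toward 0 as points move apart.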

In [None]:
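from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# X_D2, y_D2: binary data assumed to be defined in an earlier cell
X_train, X_test, y_train, y_test = train_test_split(X_D2, y_D2, random_state=0)

# The default SVC kernel is the radial basis function (RBF)
clf = SVC().fit(X_train, y_train)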
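In recent scikit-learn versions (0.22 and later), the default `gamma='scale'` resolves to 1 / (n_features · X.var()). A minimal sketch, using `make_moons` as a stand-in for the notebook's data (an assumption), that fits the default `SVC()` and an `SVC` with that gamma written out explicitly, then compares their decision functions:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

default_clf = SVC().fit(X, y)  # kernel='rbf', gamma='scale' by default
explicit_gamma = 1.0 / (X.shape[1] * X.var())
explicit_clf = SVC(kernel='rbf', gamma=explicit_gamma).fit(X, y)

same = np.allclose(default_clf.decision_function(X), explicit_clf.decision_function(X))
print('Explicit gamma = {:.3f}, identical decision function: {}'.format(explicit_gamma, same))
```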

#### Polynomial kernel, degree = 3

In [None]:
clf = SVC(kernel = 'poly', degree = 3).fit(X_train, y_train)
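The polynomial kernel is K(x, x') = (gamma · ⟨x, x'⟩ + coef0)^degree. A minimal sketch checking this formula against `sklearn.metrics.pairwise.polynomial_kernel`; the points and `gamma` are illustrative assumptions, and `coef0=0.0` matches `SVC`'s default (note that `polynomial_kernel` itself defaults to `coef0=1`).

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

X = np.array([[1.0, 2.0], [0.5, -1.0], [2.0, 0.0]])
gamma, coef0, degree = 0.5, 0.0, 3

# Polynomial kernel by hand: (gamma * <x, x'> + coef0) ** degree
K_manual = (gamma * X @ X.T + coef0) ** degree
K_sklearn = polynomial_kernel(X, X, degree=degree, gamma=gamma, coef0=coef0)
print(np.round(K_manual, 4))
```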

#### Support Vector Machine with RBF kernel: gamma parameter

• Smaller gamma means a larger similarity radius, i.e. points farther apart are still considered similar. More points are grouped together, giving a smoother decision boundary.

• Larger gamma means the kernel value decays quickly, so points must be close together to be considered similar. This produces more complex, tightly constrained decision boundaries.
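A small numeric sketch of this decay: the RBF similarity exp(-gamma · d²) between two points a fixed distance d apart, for the gamma values used in the next cell. The distance of 1.5 is an illustrative assumption.

```python
import numpy as np

distance = 1.5  # illustrative distance between two points
similarities = []
for gamma in [0.01, 1.0, 10.0]:
    # RBF kernel value for two points `distance` apart
    similarity = np.exp(-gamma * distance ** 2)
    similarities.append(similarity)
    print('gamma = {:>5}: similarity = {:.6f}'.format(gamma, similarity))
```

With small gamma the two points still look almost identical to the kernel; with large gamma their similarity is effectively zero.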

In [None]:
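from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_D2, y_D2, random_state=0)

for this_gamma in [0.01, 1.0, 10.0]:
    clf = SVC(kernel='rbf', gamma=this_gamma).fit(X_train, y_train)
    print('gamma = {}: training accuracy {:.2f}, test accuracy {:.2f}'
          .format(this_gamma, clf.score(X_train, y_train), clf.score(X_test, y_test)))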

#### Support Vector Machine with RBF kernel: using both C and gamma parameter

In [None]:
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_D2, y_D2, random_state=0)

# Optional: normalize features with min-max scaling (fit the scaler on the training set only)
# from sklearn.preprocessing import MinMaxScaler
# scaler = MinMaxScaler()
# X_train = scaler.fit_transform(X_train)
# X_test = scaler.transform(X_test)

for this_gamma in [0.01, 1, 5]:
    for this_C in [0.1, 1, 15, 250]:
        clf = SVC(kernel='rbf', gamma=this_gamma, C=this_C).fit(X_train, y_train)
        print('gamma = {:.2f}, C = {:.2f}: training accuracy {:.2f}, test accuracy {:.2f}'
              .format(this_gamma, this_C, clf.score(X_train, y_train), clf.score(X_test, y_test)))
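Instead of reading the grid by eye, C and gamma can be chosen by cross-validation. A minimal sketch, assuming `make_moons` as stand-in data and reusing the same gamma and C grid as above; the min-max scaler sits inside a `Pipeline`, so it is refit on each training fold only.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scaling inside the pipeline avoids leaking validation-fold statistics
pipe = Pipeline([('scaler', MinMaxScaler()), ('svc', SVC(kernel='rbf'))])
param_grid = {'svc__gamma': [0.01, 1, 5], 'svc__C': [0.1, 1, 15, 250]}

search = GridSearchCV(pipe, param_grid, cv=5).fit(X_train, y_train)
test_acc = search.score(X_test, y_test)
print('Best parameters:', search.best_params_)
print('Test accuracy: {:.2f}'.format(test_acc))
```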