# Lecture 8.2 SVM and Naive Bayes

In [1]:
%matplotlib inline
# import necessary libraries and specify that graphs should be plotted inline. 

# from sklearn.tree import DecisionTreeClassifier
import matplotlib.pyplot as plt
import numpy as np

import warnings
warnings.filterwarnings('ignore')

### Loading Data from Scikit-Learn Datasets
In today's practice, we will use two datasets: the breast cancer dataset and the iris dataset. Both ship with scikit-learn. Run the cell below to inspect the cancer dataset.

In [2]:
from sklearn.datasets import load_breast_cancer # Loading all info of cancer dataset
cancer = load_breast_cancer()
print("Key Elements", cancer.keys())
# cancer.target_names
# cancer.feature_names
# print(cancer.DESCR)

Key Elements dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename'])


## Support Vector Machine


Depending on whether the separator is linear (that is, whether a kernel function is used), we distinguish two types of SVM: linear SVM and kernel SVM. Scikit-Learn implements the two through different classes.

Recall that we also mentioned hard-margin and soft-margin SVM, which differ in whether the classifier tolerates noisy data points (margin violations). This difference is controlled by the hyperparameter C: setting C to a very large number approximates a hard-margin SVM.
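
To make the role of C concrete, the sketch below fits a linear-kernel SVC with a small and a large C and compares the margin width, 2 / ||w||, and the number of support vectors. The synthetic `make_blobs` data and the two C values are assumptions chosen only for illustration; they are not part of the lecture datasets.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Toy two-class data (illustrative only, not one of the lecture datasets)
X_toy, y_toy = make_blobs(n_samples=100, centers=2, cluster_std=1.5, random_state=0)

margins = {}
for C in [0.01, 100]:
    clf = SVC(kernel='linear', C=C).fit(X_toy, y_toy)
    w = clf.coef_[0]                      # weight vector of the separating hyperplane
    margins[C] = 2 / np.linalg.norm(w)    # margin width = 2 / ||w||
    n_sv = clf.n_support_.sum()           # total number of support vectors
    print(f"C={C}: margin width={margins[C]:.3f}, support vectors={n_sv}")
```

A smaller C should give the wider margin, because violations are penalized less and the optimizer can afford a smaller ||w||.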

### Linear SVM
For a basic linear SVM classifier, we use the syntax:
**<center>sklearn.svm.LinearSVC()</center>**
- **C:** Hyperparameter that controls how much the model tolerates margin violations. A smaller C tolerates more violations. The default value is 1.
- Set the random state so that results are reproducible (the solver may shuffle the data randomly).

#### Practice
- Load cancer data (sklearn.datasets.load_breast_cancer), use all variables (except the target) as predictors, split the data.
- Train a linear SVC, leave all settings as default. 
    - What is the training and test score?
- Train a linear SVC with grid search and 5-fold cross validation. *(Hint: You may want to set n_jobs = 2 as an input parameter for GridSearchCV. This saves some time for computation)*
    - Let choices of C be: [0.001, 0.01, 0.1, 1, 10, 100, 100000]. 
    - What is the best C? 
    - Under this case, what is the training and test score? 


In [3]:
# Loading Data
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()

In [4]:
# Splitting Data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, random_state = 42)

In [5]:
# Linear SVC 
from sklearn.svm import LinearSVC

lr_svc = LinearSVC(random_state = 0) # default C=1
lr_svc.fit(X_train, y_train)

lr_svc.score(X_test, y_test)

0.965034965034965
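
The practice asks for both the training and the test score, while the cell above reports only the test score. A minimal self-contained sketch that prints both is given here; it repeats the loading and splitting steps so it can run on its own, and the names `train_acc` and `test_acc` are introduced only for this example. Exact values may vary with the scikit-learn version.

```python
import warnings
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

warnings.filterwarnings('ignore')  # as in the first cell: hides convergence warnings

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=42)

lr_svc = LinearSVC(random_state=0)  # default C=1
lr_svc.fit(X_train, y_train)

# .score returns the mean accuracy on the given data
train_acc = lr_svc.score(X_train, y_train)
test_acc = lr_svc.score(X_test, y_test)
print(f"training accuracy: {train_acc:.3f}")
print(f"test accuracy:     {test_acc:.3f}")
```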

In [7]:
# CPU available
import os
n_cpu = os.cpu_count()
n_cpu

8

In [8]:
# Linear SVC with GridSearchCV
from sklearn.model_selection import GridSearchCV

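# Define the base estimator
linear_svc = LinearSVC(random_state = 22) # fix the random state so results are reproducible

# Define the grid of hyperparameter values to search
params_svc = {'C': [0.001, 0.01, 0.1, 1, 10, 100, 100000]} # C = 100,000 approximates hard-margin SVM

# cv is not set, so GridSearchCV uses its default of 5-fold cross validation
grid_lrsvc = GridSearchCV(linear_svc, params_svc, n_jobs = 2)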

grid_lrsvc.fit(X_train, y_train)

GridSearchCV(estimator=LinearSVC(random_state=22), n_jobs=2,
             param_grid={'C': [0.001, 0.01, 0.1, 1, 10, 100, 100000]})

In [9]:
grid_lrsvc.best_params_

{'C': 0.001}
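
To answer the remaining practice question (training and test score under the best C), one option is to score the fitted grid search object directly, since by default it refits the best model on the whole training set. The sketch below is self-contained and omits n_jobs for simplicity; the names `grid_train_acc` and `grid_test_acc` are introduced only for this example, and the selected C and scores may differ across scikit-learn versions.

```python
import warnings
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import LinearSVC

warnings.filterwarnings('ignore')  # hide convergence warnings, as in the first cell

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=42)

params_svc = {'C': [0.001, 0.01, 0.1, 1, 10, 100, 100000]}
grid_lrsvc = GridSearchCV(LinearSVC(random_state=22), params_svc, cv=5)
grid_lrsvc.fit(X_train, y_train)

print("best C:", grid_lrsvc.best_params_['C'])
print("mean CV accuracy for best C:", grid_lrsvc.best_score_)

# score() uses the best estimator, refit on the full training set
grid_train_acc = grid_lrsvc.score(X_train, y_train)
grid_test_acc = grid_lrsvc.score(X_test, y_test)
print("training accuracy:", grid_train_acc)
print("test accuracy:", grid_test_acc)
```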

### SVM with Kernel Functions
In most cases, SVM is used together with a kernel function, so that the classifier can handle cases that are not linearly separable. For implementation, we use the syntax:
**<center>sklearn.svm.SVC()</center>**
- First, be aware that all three letters of SVC are capitalized.
- **C:** specifies how much the model tolerates margin violations. Same as in the linear case.
- **kernel:** specifies the kernel function. Choose from {'linear', 'poly', 'rbf', 'sigmoid', 'precomputed'}. The first four are kernel functions; 'precomputed' means a precomputed kernel matrix is passed in place of the feature matrix. We introduced only two of them (i.e., poly and rbf). The default value is 'rbf'.
- **degree:** used when kernel = 'poly', to specify the polynomial degree. Default value = 3.
- **gamma:** kernel coefficient, used when kernel = 'rbf' (and also by 'poly' and 'sigmoid'). We can set it manually. The default, 'scale', is computed from the number of features and the variance of the feature values.
- Set the random state for reproducibility. For SVC it only affects the probability estimates (probability = True).

To explore other details, check https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
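
The gamma bullet above says the default depends on the number of features and their variation. As a hedged illustration, the sketch below assumes the documented behaviour of recent scikit-learn versions, where the default gamma='scale' equals 1 / (n_features * X.var()), and checks that setting gamma manually to that value reproduces the default model. The names `manual_gamma` and `same_decision` are introduced only for this example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=42)

# Assumed formula for gamma='scale': 1 / (number of features * variance of all training values)
manual_gamma = 1.0 / (X_train.shape[1] * X_train.var())
print("manual gamma:", manual_gamma)

svc_default = SVC(kernel='rbf').fit(X_train, y_train)                     # gamma='scale'
svc_manual = SVC(kernel='rbf', gamma=manual_gamma).fit(X_train, y_train)  # explicit gamma

# If the formula matches, both models give the same decision values
same_decision = np.allclose(svc_default.decision_function(X_test),
                            svc_manual.decision_function(X_test))
print("same decision function:", same_decision)
```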

#### Practice
- Use the same cancer data and training-test split. Train an SVM model with an rbf kernel, leaving all other parameters and hyperparameters at their defaults. What is the accuracy on the training set and the test set?
- Train an SVM model, let kernel function be rbf. 
    - Apply grid search with 5-fold CV. Let choices of C be: [0.001, 0.01, 0.1, 1, 10, 100, 10000]. 
    - Let choices of gamma be: [0.0001, 0.001,0.001,0.1,1,10]. 
    - Which model is the best? 
    - What is the performance of the training and test set?


In [10]:
from sklearn.svm import SVC
k_svc = SVC(random_state = 0, kernel = 'rbf')
k_svc.fit(X_train, y_train)
k_svc.score(X_test, y_test)

0.951048951048951

In [11]:
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
# Define Function
base_svc = SVC(random_state = 0, kernel = 'rbf') # if poly kernel, then kernel = 'poly'

#define a list of parameters
param_svc_kernel = {'C':   [0.001, 0.01, 0.1, 1, 10, 100, 10000]     ,
                    'gamma':  [0.0001, 0.001,0.001,0.1,1,10]  } # C = 10,000 mimics hard-margin SVM

#apply grid search
grid_ksvc = GridSearchCV(base_svc, param_svc_kernel, cv = 5, n_jobs=2)

grid_ksvc.fit(X_train, y_train)

GridSearchCV(cv=5, estimator=SVC(random_state=0), n_jobs=2,
             param_grid={'C': [0.001, 0.01, 0.1, 1, 10, 100, 10000],
                         'gamma': [0.0001, 0.001, 0.001, 0.1, 1, 10]})

In [12]:
grid_ksvc.best_params_

{'C': 10, 'gamma': 0.0001}
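
For the last practice question, a quick way to inspect training and test performance is to refit an SVC with the best parameters reported above. The sketch below takes C = 10 and gamma = 0.0001 as given from that output rather than rerunning the grid search; the names `best_train_acc` and `best_test_acc` are introduced only for this example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=42)

# Best parameters as reported by the grid search output above
best_svc = SVC(kernel='rbf', C=10, gamma=0.0001, random_state=0)
best_svc.fit(X_train, y_train)

best_train_acc = best_svc.score(X_train, y_train)
best_test_acc = best_svc.score(X_test, y_test)
print(f"training accuracy: {best_train_acc:.3f}")
print(f"test accuracy:     {best_test_acc:.3f}")
```

Equivalently, `grid_ksvc.score(X_train, y_train)` and `grid_ksvc.score(X_test, y_test)` evaluate the refitted best model without a separate fit.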

## Naive Bayes

Naive Bayes can be implemented in several ways. We discuss two specific cases in today's class: (1) all predictors are categorical, and (2) some or all predictors are continuous.

#### Categorical Features
If all predictors are categorical, use syntax:

**<center>sklearn.naive_bayes.CategoricalNB()</center>**
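- **alpha:** A smoothing factor added to the category counts. The default value is 1. To reproduce a hand calculation without smoothing, set alpha = 0. (Depending on the scikit-learn version, an alpha of exactly 0 may be replaced internally by a tiny positive value, which gives practically the same result.)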

#### Continuous Features
If all predictors are continuous, we need to use one of the two methods below:
1. Bin the continuous variables first, then use the previously mentioned syntax, sklearn.naive_bayes.CategoricalNB(). *We do not discuss this method here.*

2. Assume each feature follows a normal distribution within each class. Then we can use the syntax below: 

**<center>sklearn.naive_bayes.GaussianNB()</center>**

For both models (i.e., categorical NB and Gaussian NB), we can obtain the predicted probability of each class using .predict_proba. 

#### Practice 1
Replicate the result of the in-class practice (i.e., firm report example, probability = 0.47).


In [None]:
# Plug in data: 
## Charges: =1 if yes, =0 if no.
## Size: =1 if large, =0 if small
## Y: =1 if T, =0 if F
X = np.array([[1,0], [0,0], [0,1], [0, 1], [0,0], [0,0], [1,0], [1,1], [0,1], [1,1]])
Y = np.array([1,1,1,1,1,1,0,0,0,0])

In [None]:
from sklearn.naive_bayes import CategoricalNB



# New Record: Yes, Small
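
One possible way to complete this practice is sketched below. It uses a near-zero alpha (1e-10, an assumption made so the code behaves the same across scikit-learn versions) to mimic the unsmoothed hand calculation, and it repeats the data so the block runs on its own. The names `cnb`, `p_true`, and `manual` are introduced only for this example.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Columns: Charges (1 = yes, 0 = no), Size (1 = large, 0 = small)
X = np.array([[1,0], [0,0], [0,1], [0,1], [0,0], [0,0], [1,0], [1,1], [0,1], [1,1]])
Y = np.array([1,1,1,1,1,1,0,0,0,0])

# Near-zero alpha: effectively no smoothing, to match the manual result
cnb = CategoricalNB(alpha=1e-10)
cnb.fit(X, Y)

new_record = np.array([[1, 0]])          # Charges = yes, Size = small
proba = cnb.predict_proba(new_record)    # columns follow cnb.classes_, i.e. [0, 1]
p_true = proba[0, 1]                     # P(Y = 1 | yes, small)

# Hand calculation from the counts in the data
prior_1, prior_0 = 6/10, 4/10
like_1 = (1/6) * (4/6)   # P(yes | Y=1) * P(small | Y=1)
like_0 = (3/4) * (1/4)   # P(yes | Y=0) * P(small | Y=0)
manual = prior_1 * like_1 / (prior_1 * like_1 + prior_0 * like_0)

print(round(p_true, 2), round(manual, 2))   # → 0.47 0.47
```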


#### Practice 2
Using the iris data, train a Naive Bayes model. Assume variables are normally distributed.
- Split the data into training and test sets
- Train the model on the training set (Use Gaussian NB)

In [None]:
from sklearn.datasets import load_iris
iris = load_iris() 
# Load the full dataset object; it contains both the data and descriptive info.
iris.keys()

In [None]:
# Splitting data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=22)

In [None]:
# Training the Model
from sklearn.naive_bayes import GaussianNB
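
One possible way to complete this practice is sketched below. It repeats the loading and splitting steps so it runs on its own; the names `gnb`, `gnb_train_acc`, `gnb_test_acc`, and `iris_proba` are introduced only for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=22)

# Gaussian NB: each feature is modelled as normal within each class
gnb = GaussianNB()
gnb.fit(X_train, y_train)

gnb_train_acc = gnb.score(X_train, y_train)
gnb_test_acc = gnb.score(X_test, y_test)
print(f"training accuracy: {gnb_train_acc:.3f}")
print(f"test accuracy:     {gnb_test_acc:.3f}")

# Predicted class probabilities for the first three test records (one column per class)
iris_proba = gnb.predict_proba(X_test[:3])
print(np.round(iris_proba, 3))
```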


