## Support Vector Machines
<br>

Support Vector Machines (SVMs) are __supervised learning__ models for classification and regression problems. SVMs are commonly used for text classification and image recognition, and in general achieve good accuracy on data with __high-dimensional features.__

### Use Case: Predict the Side-Effects Rating of a Drug Based on Online Comments

### Load Libraries

In [None]:
import numpy as np
import pandas as pd

# Visualization
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
import seaborn as sns

# sklearn for feature extraction & modeling
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn import svm
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23; use the standalone package
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import LabelEncoder
# Iteratively read files
import glob
import os

# For displaying images in IPython
from IPython.display import HTML, SVG, display
#from graphviz import Source
import matplotlib
sns.set(color_codes = True)
%matplotlib inline

### Load data files
Drug Rating data: https://archive.ics.uci.edu/ml/datasets/Drug+Review+Dataset+%28Druglib.com%29

citation: Felix Gräßer, Surya Kallumadi, Hagen Malberg, and Sebastian Zaunseder. 2018. Aspect-Based Sentiment Analysis of Drug Reviews Applying Cross-Domain and Cross-Data Learning. In Proceedings of the 2018 International Conference on Digital Health (DH '18). ACM, New York, NY, USA, 121-125. DOI

In [None]:
df = pd.read_csv("drugLib_raw/drugLibTrain_raw.tsv",sep= "\t")

In [None]:
print("Size of training data set: {}".format(df.shape))
print("................................................\n")
df.head()

In [None]:
df = df.dropna() # drop any na / null rows from data

In [None]:
# Load test data separately
test = pd.read_csv("drugLib_raw/drugLibTest_raw.tsv",sep= "\t")
print("Size of test data set: {}".format(test.shape))
print("................................................\n")
test.head()

In [None]:
ratings = df['sideEffects'].value_counts()
ratings

### Split Data ---> Train (80%) Test (20%)

In [None]:
X_train, X_test, y_train, y_test = train_test_split(df["sideEffectsReview"], df["sideEffects"],random_state = 42,
                                                   test_size = 0.20)
X_train.shape,X_test.shape,y_train.shape
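
The class counts printed earlier may be uneven across rating categories. As a minimal sketch (not part of the original workflow), passing `stratify` to `train_test_split` keeps the class proportions the same in both splits. The labels and variable names below are made-up toy data, not the drug review data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 80 "Mild" vs 20 "Severe" (toy data)
toy_y = np.array(["Mild"] * 80 + ["Severe"] * 20)
toy_X = np.arange(len(toy_y)).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    toy_X, toy_y, test_size=0.20, random_state=42, stratify=toy_y
)

# Both splits keep the 80/20 class ratio
print("train share of 'Severe':", np.mean(y_tr == "Severe"))
print("test share of 'Severe': ", np.mean(y_te == "Severe"))
```

For the notebook's own split, the corresponding argument would be `stratify=df["sideEffects"]`, assuming every class has at least two rows.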

### Build Preprocessing ---> Model Training Pipeline

- Using a Naive Bayes classifier

In [None]:
# Building Pipeline for raw text transformation
clf = Pipeline([
    ('vect', CountVectorizer(stop_words= "english")),
    ('tfidf', TfidfTransformer()),
    ('classifier', MultinomialNB()),
    ])

In [None]:
model = clf.fit(X_train,y_train)

In [None]:
print("Accuracy of Naive Bayes Classifier is {}".format(model.score(X_test,y_test)))

### Build Preprocessing ---> Model Training Pipeline

### Using Support Vector Machine

### Introduction:
Rather than modeling each class, we simply find a line or curve (in two dimensions) or manifold (in multiple dimensions) that divides the classes from each other.

<img src="images/svm.jpeg" alt="svm" style="width:30%">

### Goal:
To maximize the margin between the decision boundary and the nearest points on either side of it (the support vectors). Once the boundary is learned, the model assigns target classes (labels) to new cases according to the side of the boundary on which they fall.
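
The idea of a margin can be made concrete on a tiny, hand-made data set. The sketch below assumes a linear kernel and uses a very large `C` to approximate a hard-margin SVM. For a linear SVM with weight vector `w`, the margin width is `2 / ||w||`. The data points and variable names are illustrative only.

```python
import numpy as np
from sklearn import svm

# Toy, linearly separable data: class 0 sits at x = 0, class 1 at x = 2
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y_toy = np.array([0, 0, 1, 1])

# A very large C approximates a hard-margin SVM
toy_clf = svm.SVC(kernel="linear", C=1e6).fit(X_toy, y_toy)

# For a linear kernel, coef_ holds the weight vector w of the boundary
w = toy_clf.coef_[0]
margin_width = 2.0 / np.linalg.norm(w)

print("weight vector w:", w)
print("margin width:", margin_width)
print("support vectors:\n", toy_clf.support_vectors_)
```

Here the two classes are two units apart along the first axis, so the widest possible margin is about 2, with the boundary halfway between them.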

#### Linear & Non Linear Data
- Linear data: two variables are linearly related if their relationship can be expressed as Y = a0 + a1X, which is the equation of a line. Such data can also be divided into two regions using a line.
<br>

<img src="images/linear_nonlinear.png" alt="lin" style="width:50%">
- Non-linear data: has a complex relationship among variables (features) and cannot be easily separated by a line, as shown in the figure above on the right.

### Kernel Trick for Non Linear data
The kernel trick is a mathematical transformation of the existing features into a higher-dimensional feature space. In that space it becomes possible to define a separating boundary that classifies the data into multiple categories.

<img src="images/kernel.png" alt="kernel" style="width:50%">
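
As a rough illustration of the kernel trick, the sketch below uses scikit-learn's `make_circles` to generate two concentric rings, which no straight line can separate. It compares a linear kernel, an RBF kernel, and a linear kernel applied after manually adding the squared radius as a third feature. The data set, parameters, and variable names are assumptions chosen for illustration.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_circles

# Two concentric rings: not separable by a straight line in 2-D
X_circ, y_circ = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=42)

# Linear kernel on the raw 2-D features
linear_acc = svm.SVC(kernel="linear").fit(X_circ, y_circ).score(X_circ, y_circ)

# RBF kernel: implicitly works in a higher-dimensional feature space
rbf_acc = svm.SVC(kernel="rbf").fit(X_circ, y_circ).score(X_circ, y_circ)

# Explicit lift: add the squared radius x1^2 + x2^2 as a third feature,
# after which a flat plane can separate the rings
X_lifted = np.c_[X_circ, (X_circ ** 2).sum(axis=1)]
lifted_acc = svm.SVC(kernel="linear").fit(X_lifted, y_circ).score(X_lifted, y_circ)

print("linear kernel, 2-D features:", linear_acc)
print("rbf kernel, 2-D features:   ", rbf_acc)
print("linear kernel, 3-D lifted:  ", lifted_acc)
```

These are training accuracies on toy data, so they only show separability, not how well each model would generalize.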

In [None]:
# Building Pipeline for raw text transformation
clf = Pipeline([
    ('vect', CountVectorizer(stop_words= "english")),
    ('tfidf', TfidfTransformer()),
    ('classifier', svm.SVC(kernel = "linear")),
    ])

In [None]:
model = clf.fit(X_train,y_train)

In [None]:
print("Accuracy of Support Vector Machine Classifier is {}".format(model.score(X_test,y_test)))
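
`GridSearchCV` is imported above but not used. A minimal sketch of how it could tune the SVM inside a pipeline follows. The reviews, labels, and parameter grid are made up for illustration, and `TfidfVectorizer` is used as a one-step equivalent of `CountVectorizer` followed by `TfidfTransformer`.

```python
from sklearn import svm
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy reviews and labels (not taken from the drug review data)
toy_reviews = [
    "severe nausea and constant headache",
    "terrible dizziness and severe vomiting",
    "severe rash and terrible insomnia",
    "constant headache and terrible nausea",
    "felt fine with zero problems",
    "felt fine and noticed zero issues",
    "zero issues and zero problems, felt great",
    "felt great and noticed zero problems",
]
toy_labels = ["Severe Side Effects"] * 4 + ["No Side Effects"] * 4

toy_pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("classifier", svm.SVC()),
])

# Step name + double underscore + parameter name addresses a pipeline step
param_grid = {
    "classifier__kernel": ["linear", "rbf"],
    "classifier__C": [0.1, 1, 10],
}

# cv=2 only because the toy data is tiny; real data would allow more folds
search = GridSearchCV(toy_pipe, param_grid, cv=2)
search.fit(toy_reviews, toy_labels)

print("best parameters:", search.best_params_)
print("best cross-validated accuracy:", search.best_score_)
```

On the real data, the same pattern would apply with `search.fit(X_train, y_train)`, though an SVM grid search over thousands of reviews can take noticeably longer.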

### Construct Confusion Matrix

In [None]:
# Predict on Test data
y_predicted = model.predict(X_test)
y_predicted[0:10]

In [None]:
# Compute the confusion matrix
cnf_matrix = confusion_matrix(y_test, y_predicted)
np.set_printoptions(precision=2)
cnf_matrix

In [None]:
import itertools

def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

    print(cm)

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    plt.tight_layout()

In [None]:
# Without normalization
plt.figure(figsize= (8,8))
plot_confusion_matrix(cnf_matrix, classes= np.sort(df["sideEffects"].unique()),
                      title='Confusion matrix, without normalization')
# With normalization
plt.figure(figsize= (8,8))
plot_confusion_matrix(cnf_matrix, classes= np.sort(df["sideEffects"].unique())
                      , normalize=True,title='Normalized confusion matrix')

plt.show()

### Build Model on Entire Data and predict Test data

In [None]:
# Building Pipeline for raw text transformation
clf = Pipeline([
    ('vect', CountVectorizer(stop_words= "english")),
    ('tfidf', TfidfTransformer()),
    ('classifier', svm.SVC(kernel = "linear")),
    ])

In [None]:
model = clf.fit(df["sideEffectsReview"],df["sideEffects"])

In [None]:
print("Accuracy of Support Vector Machine Classifier is {}".
      format(model.score(test["sideEffectsReview"],test["sideEffects"])))

### Construct Confusion Matrix

In [None]:
y_pred = model.predict(test["sideEffectsReview"])
#Confusion Matrix
# Compute confusion matrix
cnf_matrix = confusion_matrix(test["sideEffects"], y_pred)
np.set_printoptions(precision=2)
cnf_matrix

In [None]:
# Without normalization
plt.figure(figsize= (8,8))
plot_confusion_matrix(cnf_matrix, classes= np.sort(df["sideEffects"].unique()),
                      title='Confusion matrix, without normalization')
# With normalization
plt.figure(figsize= (8,8))
plot_confusion_matrix(cnf_matrix, classes= np.sort(df["sideEffects"].unique())
                      , normalize=True,title='Normalized confusion matrix')

plt.show()

### Pros & Cons of SVM

Below is an excerpt from the scikit-learn documentation on SVMs.

__The advantages of support vector machines are:__

- Effective in high dimensional spaces.
- Still effective in cases where number of dimensions is greater than the number of samples.
- Uses a subset of training points in the decision function (called support vectors), so it is also memory efficient.
- Versatile: different Kernel functions can be specified for the decision function. Common kernels are provided, but it is also possible to specify custom kernels.
<br>

__The disadvantages of support vector machines include:__

- If the number of features is much greater than the number of samples, avoiding over-fitting when choosing the kernel function and regularization term is crucial.
- SVMs do not directly provide probability estimates; these are calculated using an expensive five-fold cross-validation (see the "Scores and probabilities" section of the scikit-learn documentation).
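
The last point can be illustrated with a small sketch. By default, `SVC` exposes only signed distances to the boundary through `decision_function`. In the scikit-learn versions this sketch assumes, setting `probability=True` enables `predict_proba` at the cost of extra internal cross-validation during `fit`. The data set and variable names are synthetic and illustrative.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_classification

# Synthetic two-class data (illustrative only)
X_syn, y_syn = make_classification(n_samples=100, n_features=5, random_state=0)

# Default SVC: signed distances to the decision boundary, not probabilities
margin_clf = svm.SVC(kernel="linear").fit(X_syn, y_syn)
scores = margin_clf.decision_function(X_syn[:3])

# probability=True fits an extra calibration step, which makes fit slower
proba_clf = svm.SVC(kernel="linear", probability=True, random_state=0).fit(X_syn, y_syn)
proba = proba_clf.predict_proba(X_syn[:3])

print("decision_function scores:", scores)
print("calibrated probabilities:\n", proba)
```

If only a ranking or a hard label is needed, the `decision_function` scores are usually enough and avoid the extra training cost.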

### Further Reading / Exploration

Analytics Vidhya: https://www.analyticsvidhya.com/blog/2017/09/understaing-support-vector-machine-example-code/

kdnuggets: https://www.kdnuggets.com/2017/08/support-vector-machines-learning-svms-examples.html