# SUPPORT VECTOR MACHINE (SVM)

### What is it?
A support vector machine (SVM) is a supervised machine learning model used for classification and regression. From what we have found, it is mostly used for classification. The algorithm separates data into categories the user defines by finding the boundary (hyperplane) that leaves the widest margin between the classes.

### When to use it?
Some say it is the best algorithm to use. Compared to other algorithms, SVM is particularly good in the following situations:
1. When the number of features (variables) and the number of training observations are both very large (e.g. millions of features and millions of instances). At that scale a linear SVM is the practical choice, because training with non-linear kernels slows down sharply as the number of observations grows.
2. When sparsity in the problem is very high (e.g. most of the features have zero value).

It is often used for image classification, gene classification, and drug disambiguation.
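
To make the sparse, high-dimensional case above concrete, here is a minimal sketch. The data is synthetic and every name in it (`X_sparse`, `y_sparse`, `true_weights`, `sparse_clf`) is a hypothetical choice for illustration, not part of the notebook. It uses `LinearSVC`, which accepts SciPy sparse matrices directly.

```python
import numpy as np
from scipy import sparse
from sklearn.svm import LinearSVC

# Synthetic data (an assumption for illustration): 200 samples, 1,000 features,
# with roughly 1% of the entries non-zero.
X_sparse = sparse.random(200, 1000, density=0.01, format="csr", random_state=0)

# Hypothetical labels: the sign of a random linear combination of the features.
rng = np.random.default_rng(0)
true_weights = rng.normal(size=1000)
y_sparse = (X_sparse @ true_weights > 0).astype(int)

# LinearSVC trains on the sparse matrix without converting it to a dense array.
sparse_clf = LinearSVC(max_iter=10000)
sparse_clf.fit(X_sparse, y_sparse)
sparse_preds = sparse_clf.predict(X_sparse)

print("Fraction of non-zero entries:", X_sparse.nnz / (200 * 1000))
print("Training accuracy:", sparse_clf.score(X_sparse, y_sparse))
```

The accuracy printed here is measured on the training data, so it only shows that the model fits; it says nothing about how well it generalizes.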


### Limitations?
1. It can be hard to understand exactly what is going on inside an SVM. It is sometimes called a black box.
2. It does not give probabilities directly. It outputs a class label or a distance from the decision boundary.
3. It can be cumbersome for multiclass problems.
4. Choosing a good kernel function is difficult.
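
Two of the limitations above (no probabilities, and kernel choice) have common workarounds. The sketch below is one possible approach, not the only one: it wraps an SVM in scikit-learn's `CalibratedClassifierCV` to get probability estimates, and it compares three kernels with 5-fold cross-validation. The variable names, the choice of kernels, and the use of feature scaling are assumptions made for illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = datasets.load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# Limitation 2: calibrate the SVM's decision scores into probabilities.
# method="sigmoid" fits a logistic curve to the scores (Platt scaling).
svm_pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
prob_model = CalibratedClassifierCV(svm_pipeline, method="sigmoid", cv=5)
prob_model.fit(X_tr, y_tr)
proba = prob_model.predict_proba(X_te[:5])  # one row per sample, one column per class
print(proba)

# Limitation 4: compare candidate kernels on the training data only.
kernel_scores = {}
for kernel in ("linear", "rbf", "poly"):
    candidate = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    kernel_scores[kernel] = cross_val_score(candidate, X_tr, y_tr, cv=5).mean()
print(kernel_scores)
```

Cross-validating on the training split keeps the test split untouched for the final accuracy check.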


#### For definition of statistics terms visit: http://www.nedarc.org/statisticalHelp/statisticalTermsDictionary.html
#### For definitions of machine learning terms visit: https://developers.google.com/machine-learning/glossary/

In [1]:
#Bring in dependencies
# %matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.metrics import classification_report, confusion_matrix  
from sklearn import metrics
from sklearn import datasets
from sklearn import svm

In [2]:
#Load dataset
cancer = datasets.load_breast_cancer()


#The 30 features used as independent variables
cancer_X = cancer.data

#The response or dependent variable is the diagnosis: 0 = malignant, 1 = benign
cancer_y = cancer.target

#We will look at 30 measurements taken from breast cancer patients' tumors as well as whether each tumor was benign or malignant
# print the names of the 30 features
print("Features: ", cancer.feature_names)

# print the label type of cancer('malignant' 'benign')
print("Target Outcomes: ", cancer.target_names)


Features:  ['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']
Target Outcomes:  ['malignant' 'benign']
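
Before splitting, it can help to confirm the size of the dataset and how the two classes are balanced. This is a small optional check; the names `n_samples`, `n_features`, and `class_counts` are introduced here for illustration.

```python
import numpy as np
from sklearn import datasets

cancer = datasets.load_breast_cancer()

# Number of rows (patients) and columns (features)
n_samples, n_features = cancer.data.shape
print("Samples:", n_samples, "Features:", n_features)

# Count how many samples carry each label (0 = malignant, 1 = benign)
counts = np.bincount(cancer.target)
class_counts = {str(name): int(count) for name, count in zip(cancer.target_names, counts)}
print("Class counts:", class_counts)
```

The classes are not evenly balanced, which is worth keeping in mind when reading a single accuracy number.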


In [3]:
# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(cancer_X, cancer_y, random_state=42) 

In [4]:
#Create a svm Classifier
clf = svm.SVC(kernel='linear') # Linear Kernel

#Train the model using the training sets
clf.fit(X_train, y_train)

# Make predictions using the testing set
y_pred = clf.predict(X_test)
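
The dependencies cell imports `StandardScaler`, but the model above is trained on the raw features. SVMs are sensitive to feature scale, so a common variation is to standardize the features first. The sketch below is that variation under the same split; `scaled_clf` and `scaled_accuracy` are names introduced here, and the result is not claimed to match the notebook's output.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

cancer = datasets.load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=42
)

# The pipeline learns the mean and standard deviation from the training data
# only, then applies the same transformation to the test data at predict time.
scaled_clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
scaled_clf.fit(X_train, y_train)

scaled_accuracy = scaled_clf.score(X_test, y_test)
print("Accuracy with scaled features:", scaled_accuracy)
```

Putting the scaler inside the pipeline avoids leaking information from the test set into the scaling step.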

In [5]:
df = pd.DataFrame({'y test':y_test, 'y prediction':y_pred})
df

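|   | y test | y prediction |
|---|--------|--------------|
| 0 | 1 | 1 |
| 1 | 0 | 0 |
| 2 | 0 | 0 |
| 3 | 1 | 1 |
| 4 | 1 | 1 |
| 5 | 0 | 0 |
| 6 | 0 | 0 |
| 7 | 0 | 0 |
| 8 | 1 | 1 |
| 9 | 1 | 1 |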


In [6]:
# Model Accuracy: how often is the classifier correct for the testing data?
print("Accuracy:",metrics.accuracy_score(y_test, y_pred))

Accuracy: 0.958041958041958
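
Accuracy alone does not show which kind of mistake the model makes. The dependencies cell already imports `confusion_matrix` and `classification_report`, so a natural follow-up, sketched here as a self-contained cell that refits the same linear model, is to break the errors down by class. The name `cm` is introduced here for illustration.

```python
from sklearn import datasets, svm
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

cancer = datasets.load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=42
)

# Same model as above: a linear-kernel SVM on the unscaled features
clf = svm.SVC(kernel="linear")
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are the true classes, columns are the predicted classes,
# both ordered as 0 = malignant, 1 = benign.
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Precision, recall, and F1 score for each class
print(classification_report(y_test, y_pred, target_names=cancer.target_names))
```

For a medical screening task, the malignant row is usually the one to watch, since a malignant tumor predicted as benign is the costlier error.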
