### Jose Portilla's Course project solution
___
# Support Vector Machines Project 


## The Data
We will be using the famous [Iris flower data set](http://en.wikipedia.org/wiki/Iris_flower_data_set). 

The Iris flower data set, or Fisher's Iris data set, is a multivariate data set introduced by Sir Ronald Fisher in 1936 as an example of discriminant analysis.

The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor), so 150 total samples. Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters.

The iris dataset contains measurements for 150 iris flowers from three different species.

The three classes in the Iris dataset:

    Iris-setosa (n=50)
    Iris-versicolor (n=50)
    Iris-virginica (n=50)

The four features of the Iris dataset:

    sepal length in cm
    sepal width in cm
    petal length in cm
    petal width in cm
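
As a quick sanity check of the counts listed above, the sketch below loads the copy of the data set that ships with scikit-learn and confirms the shape and the per-class sample counts. This is an assumption made for illustration: the notebook itself reads a CSV file, and column names and species labels can differ between sources.

```python
import numpy as np
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled copy of the data, not the CSV used in this notebook.
bunch = load_iris()
X, y = bunch.data, bunch.target

n_samples, n_features = X.shape
class_counts = np.bincount(y)  # samples per class, in the order of bunch.target_names

print(n_samples, n_features)
print({name: int(count) for name, count in zip(bunch.target_names, class_counts)})
```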

## Get the data

**Load the iris data. The exercise suggests seaborn (`iris = sns.load_dataset('iris')`); the cell below reads it from a CSV file instead.**

In [None]:
import seaborn as sns
import pandas as pd
iris = pd.read_csv('../input/iris-flower-dataset/IRIS.csv')

In [None]:
iris.head()

Let's visualize the data!

## Exploratory Data Analysis

**Import some libraries you'll need.**

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

**Create a pairplot of the data set.**

In [None]:
sns.pairplot(data=iris, hue='species')

**Create a KDE plot of sepal_length versus sepal_width for the setosa species.**

In [None]:
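# Match 'setosa' as well as 'Iris-setosa', since the label format depends on the data source
setosa = iris[iris['species'].str.contains('setosa')]
# Keyword x/y and fill= replace the deprecated positional data and shade= arguments
sns.kdeplot(data=setosa, x='sepal_width', y='sepal_length',
            cmap="plasma", fill=True)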

## Train Test Split

**Split your data into a training set and a testing set.**

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(iris.drop('species', axis=1), iris['species'])
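
The call above relies on the defaults of `train_test_split`, which hold out a quarter of the rows and shuffle differently on every run. A minimal sketch of a reproducible, class-balanced variant follows. The values `test_size=0.3` and `random_state=101`, the use of `stratify`, and the bundled scikit-learn copy of the data are all assumptions chosen for illustration, not settings taken from this notebook.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: bundled copy of the data so the sketch runs without the CSV file.
X, y = load_iris(return_X_y=True)

# test_size, random_state and stratify are illustrative choices, not the notebook's.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101, stratify=y)

# Stratifying keeps the three species equally represented in the test set.
test_class_counts = np.bincount(y_test)
print(len(X_train), len(X_test), test_class_counts)
```

Fixing `random_state` makes the confusion matrix and classification report comparable from one run to the next.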

## Train a Model

Now it's time to train a Support Vector Machine classifier.

**Call the SVC() model from sklearn and fit the model to the training data.**

In [None]:
from sklearn.svm import SVC
svc = SVC()

In [None]:
svc.fit(X_train, y_train)
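
Because `SVC()` is called without arguments, the model is trained with the library defaults. The sketch below prints the defaults that matter most here (kernel, `C`, `gamma`) and reports held-out accuracy. It assumes the bundled scikit-learn copy of the data and an illustrative split. The default `gamma` depends on the installed scikit-learn version, so it is printed rather than assumed.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumption: bundled data and an illustrative split, not the notebook's own variables.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

model = SVC()
defaults = model.get_params()  # hyperparameters used when none are passed
print(defaults['kernel'], defaults['C'], defaults['gamma'])

model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)  # mean accuracy on the held-out rows
print(f"held-out accuracy: {test_accuracy:.3f}")
```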

## Model Evaluation

**Now get predictions from the model and create a confusion matrix and a classification report.**

In [None]:
predictions = svc.predict(X_test)

In [None]:
from sklearn.metrics import classification_report, confusion_matrix

In [None]:
print(confusion_matrix(y_test, predictions))

In [None]:
print(classification_report(y_test, predictions))
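
To make the two outputs above easier to read, here is a small hand-made example (the labels are invented for illustration, not model output). In scikit-learn's confusion matrix, rows are the true classes and columns are the predicted classes, so off-diagonal entries are mistakes.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical labels: one versicolor flower is misclassified as virginica.
labels = ['setosa', 'versicolor', 'virginica']
y_true = ['setosa', 'setosa', 'versicolor', 'versicolor', 'virginica', 'virginica']
y_pred = ['setosa', 'setosa', 'versicolor', 'virginica', 'virginica', 'virginica']

cm = confusion_matrix(y_true, y_pred, labels=labels)
accuracy = cm.trace() / cm.sum()  # correct predictions lie on the diagonal

# output_dict=True returns the report as nested dicts instead of a formatted string.
report = classification_report(y_true, y_pred, labels=labels, output_dict=True)

print(cm)
print(accuracy)
print(report['versicolor']['recall'], report['virginica']['precision'])
```

The single error lowers recall for versicolor (one of its two flowers was missed) and precision for virginica (one of its three predictions was wrong).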

Let's see if we can tune the parameters to get even better results using a grid search!

## Gridsearch

**Import GridSearchCV from scikit-learn.**

In [None]:
from sklearn.model_selection import GridSearchCV

**Create a dictionary called param_grid and fill out some parameters for C and gamma.**

In [None]:
param_grid = {'C':[0.1,1,10,100], 'gamma':[0.1,0.01, 0.001, 1]}
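
A grid like this is expanded into every combination of the listed values. The sketch below counts the candidates with `ParameterGrid` and estimates the number of model fits, assuming 5-fold cross-validation (the `n_folds` value is an assumption; the actual default depends on the scikit-learn version).

```python
from sklearn.model_selection import ParameterGrid

param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [0.1, 0.01, 0.001, 1]}

candidates = list(ParameterGrid(param_grid))  # one dict per (C, gamma) pair
n_candidates = len(candidates)

n_folds = 5  # assumption: 5-fold cross-validation
n_fits = n_candidates * n_folds  # excludes the final refit on the full training set

print(n_candidates, n_fits)
print(candidates[0])
```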

**Create a GridSearchCV object and fit it to the training data.**

In [None]:
grid = GridSearchCV(svc, param_grid, verbose=3)
grid.fit(X_train, y_train)
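
Once the search has been fitted, its results can be inspected before predicting. The sketch below is a self-contained illustration on the bundled scikit-learn copy of the data with an illustrative split (both assumptions). It shows the attributes `best_params_`, `best_score_` and `best_estimator_`. With the default `refit=True`, calling `predict` on the search object uses the best estimator retrained on the whole training set.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Assumption: bundled data and an illustrative split, not the notebook's own variables.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [0.1, 0.01, 0.001, 1]}
search = GridSearchCV(SVC(), param_grid)
search.fit(X_train, y_train)

print(search.best_params_)           # winning (C, gamma) combination
print(round(search.best_score_, 3))  # mean cross-validated accuracy of that combination

# predict() delegates to search.best_estimator_, refitted on all of X_train.
test_predictions = search.predict(X_test)
```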

**Now use the fitted grid model to predict on the test set, then create a confusion matrix and a classification report for those predictions.**

In [None]:
pred = grid.predict(X_test)

In [None]:
confusion_matrix(y_test, pred)

In [None]:
print(classification_report(y_test, pred))