#Thinkful Data Science Course
##Unit 4: Predicting the Future
##Lesson 8: Evaluating Classifier Performance

###Evaluating Classifier Performance Overview

Throughout the unit we've been splitting our data into training, test, and validation sets. Let's take a moment to discuss why this is necessary. By now you can probably see that learning an estimator and testing its performance on the same data is a methodological mistake. It's as if a professor administered a test with exactly the same questions as the practice test: to get 100%, a student would only have to memorize the solutions to the practice test, without actually learning anything. If you test your estimator on the data used to train it, it has already seen all the answers and can achieve a perfect score, even though it could well fail to predict anything useful on data it has never seen before. This is called overfitting. Predicting on never-before-seen data is the whole point, so knowing how our estimator performs on data it's already seen isn't really useful.
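To make the gap between "seen" and "unseen" data concrete, here is a minimal sketch. It assumes a recent scikit-learn (where `train_test_split` lives in `sklearn.model_selection`) and uses an unpruned `DecisionTreeClassifier` purely as a stand-in for an estimator that can memorize its training data; that choice, the variable names, and the fixed `random_state` are illustrative assumptions, not part of the lesson.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_all, y_all = load_iris(return_X_y=True)
X_seen, X_unseen, y_seen, y_unseen = train_test_split(
    X_all, y_all, test_size=0.4, random_state=0)

# An unpruned tree keeps splitting until it fits the training data.
tree = DecisionTreeClassifier(random_state=0).fit(X_seen, y_seen)

seen_acc = tree.score(X_seen, y_seen)        # data the model was trained on
unseen_acc = tree.score(X_unseen, y_unseen)  # data the model has never seen
print("accuracy on training data:", seen_acc)
print("accuracy on held-out data:", unseen_acc)
```

Only the second number says anything about how the model is likely to behave on new data.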

Holding out a subset of your data for testing, i.e., excluding a subset of your data from your training set, gives you some never-before-seen data to test your estimator's performance. The scikit-learn library has a train_test_split helper function to randomly split data into training and test sets.

When evaluating different settings (“hyperparameters”) for estimators, such as the C setting that must be manually set for an SVM, there is still a risk of overfitting on the test set, because the parameters can be tweaked until the estimator performs optimally on it. This way, knowledge about the test set can “leak” into the model, and we can no longer make reliable claims about how it will generalize (i.e., how it will perform) on never-before-seen data.

To resolve this problem, we can hold out yet another subset of our data for validation. Training proceeds on the training set, evaluation is done on the validation set, and when it seems like we have a good model, we can perform our final evaluation on the test set.
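One way to get the three subsets described above is to call `train_test_split` twice. This is a sketch only: the 60/20/20 proportions, the variable names, and the fixed `random_state` are assumptions chosen for illustration, and a recent scikit-learn is assumed.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X_all, y_all = load_iris(return_X_y=True)

# First, set aside 20% of the data as the final test set.
X_rest, X_final_test, y_rest, y_final_test = train_test_split(
    X_all, y_all, test_size=0.2, random_state=0)

# Then split what is left: 25% of the remaining 80% is 20% of the whole.
X_fit, X_val, y_fit, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0)

print("train:", len(X_fit), "validation:", len(X_val), "test:", len(X_final_test))
```

Hyperparameters would be tuned against `X_val`, and `X_final_test` would be touched only once, at the very end.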

####Use the model_selection.train_test_split() helper function to split the Iris dataset into training and test sets, holding out 40% of the data for testing.
How many points do you have in your training set? In your test set?

In [2]:
# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in sklearn.model_selection
from sklearn.model_selection import train_test_split
import pandas as pd

In [3]:
from sklearn import datasets
iris = datasets.load_iris()

In [4]:
# Build the feature columns in one step
iris_df = pd.DataFrame(iris.data, columns=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'])
iris_df['target'] = iris.target
# Map the integer class labels to species names
iris_df['target_flower'] = iris_df['target'].map({0: 'setosa', 1: 'versicolor', 2: 'virginica'})
iris_df1 = iris_df[iris_df['target_flower']=='setosa']
iris_df2 = iris_df[iris_df['target_flower']=='versicolor']
iris_df3 = iris_df[iris_df['target_flower']=='virginica']

In [5]:
# DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() returns the same NumPy array
X = iris_df[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']].to_numpy()
y = iris_df['target'].to_numpy()

In [6]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.40)

In [7]:
X_train

array([[ 5.9,  3.2,  4.8,  1.8],
       [ 6.7,  3.1,  4.4,  1.4],
       [ 5.6,  3. ,  4.1,  1.3],
       [ 5.6,  2.8,  4.9,  2. ],
       [ 6.1,  2.8,  4.7,  1.2],
       [ 6.5,  3. ,  5.5,  1.8],
       [ 6.7,  2.5,  5.8,  1.8],
       [ 5.6,  2.5,  3.9,  1.1],
       [ 4.9,  3.1,  1.5,  0.1],
       [ 5.7,  2.8,  4.1,  1.3],
       [ 5.9,  3. ,  5.1,  1.8],
       [ 7.1,  3. ,  5.9,  2.1],
       [ 6.3,  2.9,  5.6,  1.8],
       [ 4.5,  2.3,  1.3,  0.3],
       [ 6.2,  2.8,  4.8,  1.8],
       [ 5.2,  3.5,  1.5,  0.2],
       [ 5. ,  3.5,  1.3,  0.3],
       [ 7. ,  3.2,  4.7,  1.4],
       [ 5.5,  3.5,  1.3,  0.2],
       [ 5.7,  2.8,  4.5,  1.3],
       [ 5. ,  2. ,  3.5,  1. ],
       [ 5.5,  2.4,  3.7,  1. ],
       [ 5. ,  3.6,  1.4,  0.2],
       [ 6.4,  2.9,  4.3,  1.3],
       [ 6.7,  3. ,  5. ,  1.7],
       [ 5.2,  2.7,  3.9,  1.4],
       [ 6.4,  2.8,  5.6,  2.1],
       [ 4.3,  3. ,  1.1,  0.1],
       [ 6.7,  3. ,  5.2,  2.3],
       [ 5.6,  3. ,  4.5,  1.5],
       ...

In [8]:
X_test

array([[ 6.2,  2.2,  4.5,  1.5],
       [ 4.4,  3.2,  1.3,  0.2],
       [ 5.8,  2.7,  4.1,  1. ],
       [ 5.1,  3.8,  1.9,  0.4],
       [ 5.5,  2.4,  3.8,  1.1],
       [ 5.8,  2.7,  3.9,  1.2],
       [ 5.7,  2.9,  4.2,  1.3],
       [ 6.2,  2.9,  4.3,  1.3],
       [ 6.5,  3. ,  5.8,  2.2],
       [ 4.8,  3.4,  1.9,  0.2],
       [ 6.8,  3.2,  5.9,  2.3],
       [ 5.1,  3.4,  1.5,  0.2],
       [ 4.8,  3. ,  1.4,  0.1],
       [ 7.7,  3.8,  6.7,  2.2],
       [ 5.7,  4.4,  1.5,  0.4],
       [ 6.9,  3.1,  4.9,  1.5],
       [ 6.7,  3.3,  5.7,  2.1],
       [ 5.9,  3. ,  4.2,  1.5],
       [ 6.1,  2.9,  4.7,  1.4],
       [ 5. ,  3.4,  1.6,  0.4],
       [ 4.9,  3. ,  1.4,  0.2],
       [ 6.7,  3.3,  5.7,  2.5],
       [ 5.5,  4.2,  1.4,  0.2],
       [ 5.4,  3.9,  1.7,  0.4],
       [ 7.6,  3. ,  6.6,  2.1],
       [ 7.2,  3.6,  6.1,  2.5],
       [ 6. ,  2.7,  5.1,  1.6],
       [ 4.8,  3. ,  1.4,  0.3],
       [ 6.3,  3.3,  6. ,  2.5],
       [ 5.4,  3.4,  1.7,  0.2],
       ...

In [9]:
y_train

array([1, 1, 1, 2, 1, 2, 2, 1, 0, 1, 2, 2, 2, 0, 2, 0, 0, 1, 0, 1, 1, 1, 0,
       1, 1, 1, 2, 0, 2, 1, 1, 1, 0, 2, 1, 1, 0, 0, 1, 1, 0, 2, 2, 2, 2, 2,
       0, 2, 2, 2, 0, 0, 0, 1, 2, 1, 0, 0, 0, 1, 1, 1, 2, 1, 0, 0, 0, 0, 0,
       1, 1, 2, 1, 0, 2, 0, 2, 2, 0, 2, 2, 0, 1, 2, 0, 1, 0, 1, 2, 2])

In [10]:
y_test

array([1, 0, 1, 0, 1, 1, 1, 1, 2, 0, 2, 0, 0, 2, 0, 1, 2, 1, 1, 0, 0, 2, 0,
       0, 2, 2, 1, 0, 2, 0, 0, 0, 2, 1, 2, 0, 0, 1, 1, 2, 0, 1, 2, 2, 1, 2,
       1, 2, 0, 0, 0, 2, 0, 1, 2, 2, 2, 1, 2, 2])

####How many points do you have in your training set? 

In [11]:
print('There are', len(X_train), 'points in the training set')

There are 90 points in the training set


####In your test set?

In [12]:
print('There are', len(X_test), 'points in the test set, which is', (len(X_test)/(len(X_test)+len(X_train)))*100, '% of the data.')

There are 60 points in the test set, which is 40.0 % of the data.


####Fit a linear Support Vector Classifier to the training set and evaluate its performance on the test set. 

What is the score? How does it compare to the score in the Support Vector Machine lesson?

In [13]:
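from sklearn.svm import SVC
# Note: from here on, X and y refer to the training split only
X = X_train
y = y_train
# SVC() defaults to an RBF kernel; SVC(kernel='linear') would give a linear classifier
clf = SVC()
clf.fit(X, y)
# X and y are the training data, so this is the training accuracy, not the test accuracy
clf.score(X, y)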

0.97777777777777775

####What is the score?

In [14]:
print('The SVC training score is', clf.score(X, y))

The SVC training score is 0.977777777778
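The exercise asks for a linear classifier scored on the held-out test set. A minimal sketch of that, assuming a recent scikit-learn, could look like the following; the fresh variable names and the fixed `random_state` are illustrative assumptions, and the resulting numbers depend on the split.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X_all, y_all = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.40, random_state=0)

# kernel="linear" makes this a linear support vector classifier.
linear_clf = SVC(kernel="linear")
linear_clf.fit(X_tr, y_tr)

train_score = linear_clf.score(X_tr, y_tr)  # accuracy on data used for fitting
test_score = linear_clf.score(X_te, y_te)   # accuracy on held-out data
print("training accuracy:", train_score)
print("test accuracy:", test_score)
```

The test accuracy is the number to compare against the score from the Support Vector Machine lesson.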


####How does it compare to the score in the Support Vector Machine lesson?

###Cross Validation

The more data we set aside for testing and validation, the less data we have for training, which tends to hurt estimator performance. To resolve this problem, we can use cross-validation (see lesson 4.1.5) to "recycle" data over different folds: each fold takes a turn as the held-out set while the model trains on the rest. In this assignment, we're going to implement 5-fold cross-validation on the Iris dataset to train and test a Support Vector Machine classifier.

####Compute the 5-fold cross-validation score of the SVC from the last assignment.

####Compute the mean score and the standard deviation of the scores.

In [15]:
# sklearn.cross_validation was removed in scikit-learn 0.20; KFold now lives in sklearn.model_selection
from sklearn.model_selection import KFold
import statsmodels.api as sm
import numpy as np

In [18]:
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import r2_score
from sklearn import svm

In [21]:
kf = KFold(n_splits=5)
r2 = []
mae = []
mse = []
for train_index, test_index in kf.split(X):
    model = svm.svc(y_train,X_train)
    f = model.fit()
    y_pred = f.predict(X_test)
    r2.append(r2_score(np.squeeze(np.asarray(y_test)), y_pred))
    mse.append(mean_squared_error(np.squeeze(np.asarray(y_test)), y_pred))
    mae.append(mean_absolute_error(np.squeeze(np.asarray(y_test)), y_pred))

AttributeError: 'module' object has no attribute 'svc'

This first attempt fails because the class is named svm.SVC (uppercase). scikit-learn estimators also take their data in fit(X, y) rather than in the constructor.

In [69]:
kf = KFold(n_splits=5)
r2 = []
mae = []
mse = []
# Note: train_index and test_index are never used below, so every iteration fits the
# same model on X_train and scores it on X_test, and all five "folds" give identical values.
# sm.OLS is also a linear regression, not a classifier, so y_pred is continuous.
for train_index, test_index in kf.split(X):
    model = sm.OLS(y_train, X_train)
    f = model.fit()
    y_pred = f.predict(X_test)
    r2.append(r2_score(np.squeeze(np.asarray(y_test)), y_pred))
    mse.append(mean_squared_error(np.squeeze(np.asarray(y_test)), y_pred))
    mae.append(mean_absolute_error(np.squeeze(np.asarray(y_test)), y_pred))

In [70]:
X_test

array([[ 5. ,  3. ,  1.6,  0.2],
       [ 5.2,  3.5,  1.5,  0.2],
       [ 5.7,  2.9,  4.2,  1.3],
       [ 5.7,  4.4,  1.5,  0.4],
       [ 4.9,  3. ,  1.4,  0.2],
       [ 7.7,  2.6,  6.9,  2.3],
       [ 5.6,  2.5,  3.9,  1.1],
       [ 5. ,  3.5,  1.3,  0.3],
       [ 6.3,  3.3,  4.7,  1.6],
       [ 6.2,  2.9,  4.3,  1.3],
       [ 5. ,  2.3,  3.3,  1. ],
       [ 4.9,  2.5,  4.5,  1.7],
       [ 5.1,  3.5,  1.4,  0.2],
       [ 5. ,  3.4,  1.6,  0.4],
       [ 7.7,  2.8,  6.7,  2. ],
       [ 4.3,  3. ,  1.1,  0.1],
       [ 6.3,  3.3,  6. ,  2.5],
       [ 5.3,  3.7,  1.5,  0.2],
       [ 5.1,  3.8,  1.5,  0.3],
       [ 4.7,  3.2,  1.3,  0.2],
       [ 6.5,  3. ,  5.2,  2. ],
       [ 7.2,  3.2,  6. ,  1.8],
       [ 7.3,  2.9,  6.3,  1.8],
       [ 5.1,  3.8,  1.9,  0.4],
       [ 6.6,  3. ,  4.4,  1.4],
       [ 6.7,  3.1,  5.6,  2.4],
       [ 4.6,  3.2,  1.4,  0.2],
       [ 6.5,  2.8,  4.6,  1.5],
       [ 6.5,  3. ,  5.5,  1.8],
       [ 6. ,  2.7,  5.1,  1.6],
       ...

In [37]:
r2

[0.94809477892140359,
 0.94809477892140359,
 0.94809477892140359,
 0.94809477892140359,
 0.94809477892140359]

In [38]:
mae

[0.14930339544048812,
 0.14930339544048812,
 0.14930339544048812,
 0.14930339544048812,
 0.14930339544048812]

In [39]:
mse

[0.037544776580184777,
 0.037544776580184777,
 0.037544776580184777,
 0.037544776580184777,
 0.037544776580184777]
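The five values in each list above are identical, which suggests the loop is not really evaluating five different folds. For comparison, here is a minimal sketch of 5-fold cross-validation of an SVC using scikit-learn's `cross_val_score` helper, which handles the fold bookkeeping itself. It assumes a recent scikit-learn; the linear kernel and the variable names are illustrative choices. With an integer `cv` and a classifier, the folds are stratified by class.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X_all, y_all = load_iris(return_X_y=True)

svc = SVC(kernel="linear")
# Each of the 5 folds is held out once while the model trains on the other 4.
fold_scores = cross_val_score(svc, X_all, y_all, cv=5)

print("fold scores:", fold_scores)
print("mean: %.3f" % fold_scores.mean())
print("standard deviation: %.3f" % fold_scores.std())
```

The mean summarizes typical performance, and the standard deviation shows how much that performance varies from fold to fold.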

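As the scikit-learn documentation notes, the default score computed at each cross-validation iteration by cross_val_score is the estimator's own score method, which for a classifier is its accuracy. We could tell it to return the F1 score, precision, or recall instead by passing the scoring parameter.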

How do the accuracy scores compare to the F1 scores for this dataset?
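One way to approach this question is to run the same cross-validation twice with different `scoring` strings. The sketch below assumes a recent scikit-learn and a linear SVC; `"f1_macro"` (the unweighted mean of the per-class F1 scores) is one of several multiclass F1 variants and is picked here only as an example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X_all, y_all = load_iris(return_X_y=True)
svc = SVC(kernel="linear")

# Same estimator and folds, two different metrics.
acc_scores = cross_val_score(svc, X_all, y_all, cv=5, scoring="accuracy")
f1_scores = cross_val_score(svc, X_all, y_all, cv=5, scoring="f1_macro")

print("accuracy per fold:", acc_scores)
print("macro F1 per fold:", f1_scores)
print("mean accuracy: %.3f, mean macro F1: %.3f"
      % (acc_scores.mean(), f1_scores.mean()))
```

Because the three Iris classes are the same size, the two metrics would be expected to stay close; on imbalanced data they can diverge considerably.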

In [48]:
import sklearn.metrics as skm

In [53]:
print("Accuracy = %f" %(skm.accuracy_score(y_test,y_pred)))

ValueError: Can't handle mix of multiclass and continuous

####Why doesn't the above code work?

In [66]:
y_test

array([0, 0, 1, 0, 0, 2, 1, 0, 1, 1, 1, 2, 0, 0, 2, 0, 2, 0, 0, 0, 2, 2, 2,
       0, 1, 2, 0, 1, 2, 1, 0, 1, 2, 0, 2, 2, 2, 1, 0, 2, 2, 1, 2, 1, 0, 0,
       1, 2, 2, 1, 0, 0, 1, 0, 2, 0, 1, 0, 0, 0])

In [72]:
y_pred

array([-0.03939035, -0.093304  ,  1.19345255, -0.0478905 , -0.0828931 ,
        2.25476896,  1.01691575, -0.07110942,  1.42301126,  1.16464357,
        0.86728198,  1.58655055, -0.10943756,  0.0662745 ,  2.0350162 ,
       -0.15204857,  2.26896606, -0.10616891, -0.03005057, -0.0894203 ,
        1.75767666,  1.78742809,  1.86074394,  0.1338879 ,  1.20071728,
        2.06171432, -0.05081547,  1.32278227,  1.73086084,  1.57108275,
       -0.05497586,  1.04047302,  2.02843298, -0.02590029,  1.78151594,
        1.95803731,  2.03081521,  1.21864441, -0.05571339,  1.79689653,
        1.98894173,  1.39490452,  2.0062485 ,  1.16444398,  0.04462797,
       -0.16008255,  1.31163384,  1.9363089 ,  2.01133591,  1.23170891,
       -0.24689857, -0.11080025,  1.4173758 , -0.01719577,  1.74128184,
       -0.11985856,  0.84092701, -0.08125373, -0.07001809, -0.16355079])

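It appears y_pred holds continuous values rather than the class labels 0, 1, or 2, because it came from an OLS regression rather than a classifier. Perhaps I just need to round?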

In [80]:
y_pred.round()

array([-0., -0.,  1., -0., -0.,  2.,  1., -0.,  1.,  1.,  1.,  2., -0.,
        0.,  2., -0.,  2., -0., -0., -0.,  2.,  2.,  2.,  0.,  1.,  2.,
       -0.,  1.,  2.,  2., -0.,  1.,  2., -0.,  2.,  2.,  2.,  1., -0.,
        2.,  2.,  1.,  2.,  1.,  0., -0.,  1.,  2.,  2.,  1., -0., -0.,
        1., -0.,  2., -0.,  1., -0., -0., -0.])

In [89]:
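# abs() turns the -0. entries into 0.; note that it would also flip a genuinely negative label such as -1.
P = abs(y_pred.round())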
P

array([ 0.,  0.,  1.,  0.,  0.,  2.,  1.,  0.,  1.,  1.,  1.,  2.,  0.,
        0.,  2.,  0.,  2.,  0.,  0.,  0.,  2.,  2.,  2.,  0.,  1.,  2.,
        0.,  1.,  2.,  2.,  0.,  1.,  2.,  0.,  2.,  2.,  2.,  1.,  0.,
        2.,  2.,  1.,  2.,  1.,  0.,  0.,  1.,  2.,  2.,  1.,  0.,  0.,
        1.,  0.,  2.,  0.,  1.,  0.,  0.,  0.])

In [90]:
P.astype(int)

array([0, 0, 1, 0, 0, 2, 1, 0, 1, 1, 1, 2, 0, 0, 2, 0, 2, 0, 0, 0, 2, 2, 2,
       0, 1, 2, 0, 1, 2, 2, 0, 1, 2, 0, 2, 2, 2, 1, 0, 2, 2, 1, 2, 1, 0, 0,
       1, 2, 2, 1, 0, 0, 1, 0, 2, 0, 1, 0, 0, 0])

In [91]:
y_pred = P.astype(int)

With the predictions rounded to integer class labels, we can now compute accuracy, precision, recall, and F1.

In [92]:
print("Accuracy = %f" %(skm.accuracy_score(y_test,y_pred)))

Accuracy = 0.983333


In [94]:
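# Multiclass targets need an explicit average; 'weighted' matches the old implicit default
print("Precision = %f" % skm.precision_score(y_test, y_pred, average='weighted'))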

Precision = 0.984167




In [95]:
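# Multiclass targets need an explicit average; 'weighted' matches the old implicit default
print("Recall = %f" % skm.recall_score(y_test, y_pred, average='weighted'))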

Recall = 0.983333




In [96]:
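# Multiclass targets need an explicit average; 'weighted' matches the old implicit default
print("F1 score = %f" % skm.f1_score(y_test, y_pred, average='weighted'))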

F1 score = 0.983278


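The scores above come from rounded regression output. As a point of comparison, a classifier's `predict` already returns class labels, so no rounding is needed. The sketch below assumes a recent scikit-learn and a linear SVC; the variable names, the fixed `random_state`, and the choice of `average="weighted"` are illustrative assumptions, and the numbers will differ from those printed above.

```python
from sklearn import metrics as skm
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X_all, y_all = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.40, random_state=0)

svc = SVC(kernel="linear").fit(X_tr, y_tr)
labels_pred = svc.predict(X_te)  # integer class labels, not continuous values

acc = skm.accuracy_score(y_te, labels_pred)
# For multiclass targets, precision/recall/F1 need an explicit averaging mode.
prec = skm.precision_score(y_te, labels_pred, average="weighted")
rec = skm.recall_score(y_te, labels_pred, average="weighted")
f1 = skm.f1_score(y_te, labels_pred, average="weighted")

print("Accuracy = %f" % acc)
print("Precision = %f" % prec)
print("Recall = %f" % rec)
print("F1 score = %f" % f1)
```

With weighted averaging, recall always equals accuracy, which is also why those two values match in the output above.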
