## K-Fold Cross Validation
Measuring accuracy on a single train/test split doesn't account for the variance in the data and may give misleading results. K-Fold cross validation splits the data set into $k$ parts (folds). In each of the $k$ iterations, one fold is held out for testing and the model is trained on the remaining $k-1$ folds, so every fold serves as the test set exactly once. The accuracy is then averaged over all iterations.
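The splitting scheme can be sketched in plain Python to make the fold bookkeeping concrete. This is only an illustration: `k_fold_indices` is a hypothetical helper, not part of any library, and it assumes contiguous, unshuffled folds where the first `n_samples % k` folds receive one extra sample.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, unshuffled folds."""
    # Assumed convention: the first n_samples % k folds get one extra sample
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        # The current fold is the test set
        test = list(range(start, start + size))
        # Everything outside the current fold is the training set
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Toy example: 10 samples split into 5 folds of 2 samples each
for train, test in k_fold_indices(10, 5):
    print("train:", train, "test:", test)
```

Every sample index appears in exactly one test fold, which is what lets the averaged score use all of the data for evaluation.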

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

df = pd.read_csv('Social_Network_Ads.csv')
X = df.iloc[:, 2:4]   # Age and EstimatedSalary; a single-column slice such as 2:3 would still be 2-D, with shape (400, 1)
y = df.iloc[:, 4]

df.head()

|   | User ID | Gender | Age | EstimatedSalary | Purchased |
|---|---|---|---|---|---|
| 0 | 15624510 | Male | 19 | 19000 | 0 |
| 1 | 15810944 | Male | 35 | 20000 | 0 |
| 2 | 15668575 | Female | 26 | 43000 | 0 |
| 3 | 15603246 | Female | 27 | 57000 | 0 |
| 4 | 15804002 | Male | 19 | 76000 | 0 |


In [2]:
# Scale
from sklearn.preprocessing import StandardScaler
X_sca = StandardScaler()
X = X_sca.fit_transform(X)

In [5]:
from sklearn.model_selection import KFold


kfold_cv = KFold(n_splits=10)

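for train_index, test_index in kfold_cv.split(X):
    # Each iteration overwrites the previous split, so only the last fold's split remains after the loop
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]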
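
The loop above only produces the index splits. A minimal sketch of fitting and scoring a model inside each fold might look like the following. It is self-contained, so it uses synthetic data from `make_classification` as a stand-in for the scaled `Social_Network_Ads.csv` features; the names `X_demo`, `y_demo` and `fold_scores` are assumptions made for this illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the two scaled features and the binary target (assumption)
X_demo, y_demo = make_classification(n_samples=400, n_features=2, n_informative=2,
                                     n_redundant=0, random_state=0)
X_demo = StandardScaler().fit_transform(X_demo)

fold_scores = []
for train_idx, test_idx in KFold(n_splits=10).split(X_demo):
    # Fit a fresh classifier on the k-1 training folds
    model = SVC(kernel='linear', random_state=0)
    model.fit(X_demo[train_idx], y_demo[train_idx])
    # Record the accuracy on the held-out fold
    fold_scores.append(model.score(X_demo[test_idx], y_demo[test_idx]))

fold_scores = np.array(fold_scores)
print(fold_scores.mean(), fold_scores.std())
```

Creating a new `SVC` in every iteration keeps the folds independent, so no fitted state carries over from one fold to the next.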
    

In [6]:
from sklearn.svm import SVC #support vector classifier
clf = SVC(kernel='linear', random_state=0).fit(X_train, y_train)


In [10]:
# applying k-fold cross validation
# cross_val_score clones clf and refits it on each of the 10 folds of X_train
from sklearn.model_selection import cross_val_score
accuracies = cross_val_score(clf, X_train, y_train, cv=10)
print(accuracies)
print(accuracies.mean())
print(accuracies.std())

[ 0.90322581  0.90322581  0.77419355  0.87096774  0.77419355  0.86206897
  0.82758621  0.68965517  0.79310345  0.89655172]
0.829477196885
0.0671935884472


## Leave-One-Out Cross Validation

Another type of cross validation is leave-one-out cross validation. Out of the $n$ samples, one is left out for testing and the model is trained on the other $n-1$ samples; this is repeated once for every sample. When $k = n$, that is, when the number of folds equals the number of samples, K-Fold cross validation is the same as leave-one-out cross validation.

In [3]:
X.shape

(400, 2)
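
The relationship between leave-one-out and K-Fold with $k = n$ can be checked on a small example. The sketch below uses scikit-learn's `LeaveOneOut` and `KFold` splitters on a hypothetical 6-sample array, `X_toy`, which is an assumption made to keep the output short. With the 400 samples above, the same comparison would produce 400 splits.

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut

# Hypothetical toy data: 6 samples, 2 features (stand-in for the real X)
X_toy = np.arange(12).reshape(6, 2)

# Leave-one-out: each sample is the test set exactly once
loo_splits = [(tr.tolist(), te.tolist()) for tr, te in LeaveOneOut().split(X_toy)]

# K-Fold with as many folds as samples, without shuffling
kf_splits = [(tr.tolist(), te.tolist())
             for tr, te in KFold(n_splits=len(X_toy)).split(X_toy)]

print(len(loo_splits))          # number of splits equals the number of samples
print(loo_splits == kf_splits)  # the two splitters yield the same train/test indices
```

Because leave-one-out fits one model per sample, its cost grows with $n$, which is one reason a smaller $k$ such as 10 is often used instead.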