### Cross-Validation and Grid Search (numpy)
#### (Implementing k-fold cross-validation using only numpy)

#### Cross-Validation
Cross-validation is a model validation technique that estimates how well a model performs on data it was not trained on. It is commonly used for hyper-parameter tuning: combined with grid search, it helps us find the hyper-parameter values that perform best on held-out data.<br>

#### k-Fold
k-fold is one of the techniques for cross-validation. The method is simple and especially helpful when only a limited amount of training data is available.<br>
To implement k-fold and Grid Search:
 1. Divide the training dataset into k partitions (or folds).
 2. For every hyper-parameter setting you want to evaluate, train the model k times, each time holding out a different partition as 'testing' data and using the rest as training data. Keep track of the testing results.
 3. Choose the hyper-parameters that give the best results on average.
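
The three steps above can be sketched end to end as follows. This is only a minimal illustration under stated assumptions, not the implementation built in this notebook: the model (closed-form ridge regression), the helper names `ridge_fit` and `grid_search_cv`, the candidate values in `lams`, and the synthetic data are all made up for the example. Folds are assigned here with a shuffled fold label per sample rather than with explicit index lists.

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Closed-form ridge regression: solve (X^T X + lam * I) w = X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def grid_search_cv(X, y, lams, k, rng):
    # Step 1: assign every sample to one of k folds (sizes differ by at most one)
    fold_id = rng.permutation(len(y)) % k
    mean_errors = {}
    for lam in lams:
        # Step 2: train k times, holding out a different fold each time
        errors = []
        for fold in range(k):
            test = fold_id == fold
            w = ridge_fit(X[~test], y[~test], lam)
            errors.append(np.mean((X[test] @ w - y[test]) ** 2))
        mean_errors[lam] = float(np.mean(errors))
    # Step 3: keep the value with the lowest average held-out error
    best_lam = min(mean_errors, key=mean_errors.get)
    return best_lam, mean_errors

# Synthetic regression data, made up for this illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=40)
lams = [0.01, 0.1, 1.0, 10.0]
best_lam, mean_errors = grid_search_cv(X, y, lams, k=5, rng=rng)
print(best_lam, mean_errors)
```

The same folds are reused for every candidate value, so the averaged errors are compared on identical splits.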
 
__In the following implementation we will just be creating k-Fold from scratch using lists and arrays.__

In [1]:
import numpy as np

In [2]:
def kFold(n, k):
    #Input:
    #  n: number of training examples
    #  k: number of folds
    #Output:
    #  test_groups: list of k lists with the testing indices of each fold
    #  train_groups: list of k lists with the training indices of each fold
    #Note: if n is not a multiple of k, the n % k leftover indices are never used for testing
    size = n // k                                #size of each test fold (rounded down)
    indices = np.random.permutation(n).tolist()  #indices in random order (plain Python ints), picked one by one below
    test_groups = [[] for _ in range(k)]         #empty test list
    train_groups = [[] for _ in range(k)]        #empty training list
    j=0
    for i in range(size*k):         #loop through indices (size*k rounds down in case n/k is not an integer)
        if len(test_groups[j]) == size:
            j += 1
        test_groups[j].append(indices[i])
    for i in range(k):              #training indices are all indices except the ones in the corresponding testing set
        remaining = list(set(range(n)) - set(test_groups[i]))
        train_groups[i] += np.random.permutation(remaining).tolist()
    return test_groups, train_groups
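
When n is not a multiple of k, the function above leaves n % k indices out of every test fold. If every sample should be tested exactly once, one possible alternative is to let `np.array_split` distribute the remainder over the first folds. The sketch below is a variation for comparison, not the function used in the rest of this notebook; the name `kfold_all_samples` and the `seed` parameter are made up for the example.

```python
import numpy as np

def kfold_all_samples(n, k, seed=None):
    # Shuffle the indices, then cut them into k nearly equal pieces;
    # np.array_split allows uneven pieces, so no index is dropped
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    test_groups = [fold.tolist() for fold in folds]
    # Training indices for fold i are all the other folds joined together
    train_groups = [np.concatenate(folds[:i] + folds[i + 1:]).tolist()
                    for i in range(k)]
    return test_groups, train_groups

test_groups, train_groups = kfold_all_samples(16, 5, seed=0)
print([len(g) for g in test_groups])   # [4, 3, 3, 3, 3]
print([len(g) for g in train_groups])  # [12, 13, 13, 13, 13]
```

The trade-off is that the folds no longer all have the same size, which matters only if the per-fold scores are later averaged without weighting.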

In [3]:
#variables to change
n = 16          #number of samples
k = 5           #folds

In [4]:
test_groups, train_groups = kFold(n,k)
print('Test Groups')
for i in test_groups:
    print(i)
print('\nTrain Groups')
for i in train_groups:
    print(i)

Test Groups
[12, 7, 13]
[3, 2, 9]
[15, 1, 4]
[5, 6, 0]
[11, 10, 14]

Train Groups
[1, 2, 0, 8, 5, 6, 3, 4, 11, 15, 9, 10, 14]
[1, 7, 4, 5, 10, 6, 8, 0, 12, 14, 11, 15, 13]
[13, 10, 5, 7, 9, 0, 2, 3, 11, 14, 8, 12, 6]
[13, 15, 8, 10, 7, 3, 1, 12, 4, 11, 9, 2, 14]
[6, 8, 13, 15, 2, 1, 12, 5, 3, 0, 4, 9, 7]
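
A quick sanity check of the groups printed above: the lists below are copied from that output, and the checks confirm that no index is tested twice, that each training set excludes its own test fold, and which indices never reach a test fold. Since 16 is not a multiple of 5, one index is expected to be left over.

```python
n = 16
# Groups copied from the output printed above
test_groups = [[12, 7, 13], [3, 2, 9], [15, 1, 4], [5, 6, 0], [11, 10, 14]]
train_groups = [
    [1, 2, 0, 8, 5, 6, 3, 4, 11, 15, 9, 10, 14],
    [1, 7, 4, 5, 10, 6, 8, 0, 12, 14, 11, 15, 13],
    [13, 10, 5, 7, 9, 0, 2, 3, 11, 14, 8, 12, 6],
    [13, 15, 8, 10, 7, 3, 1, 12, 4, 11, 9, 2, 14],
    [6, 8, 13, 15, 2, 1, 12, 5, 3, 0, 4, 9, 7],
]

# No index appears in more than one test fold
tested = [i for group in test_groups for i in group]
assert len(tested) == len(set(tested))

# Each training set is exactly the complement of its test fold
for test, train in zip(test_groups, train_groups):
    assert set(test).isdisjoint(train)
    assert set(test) | set(train) == set(range(n))

# Indices that are never used for testing (the n % k leftovers)
never_tested = sorted(set(range(n)) - set(tested))
print(never_tested)  # [8]
```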


In [5]:
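#sample dataset: n samples (rows); the number of columns is set to k only for convenience and is unrelated to the number of folds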
X = np.arange(n*k).reshape(n,k)
y = np.concatenate((np.ones(int(n/2)),-np.ones(n-int(n/2))),axis=0)
print('X-array (size:',X.shape,')\n',X,'\n')
print('y-array (size:',y.shape,')\n',y)

X-array (size: (16, 5) )
 [[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]
 [25 26 27 28 29]
 [30 31 32 33 34]
 [35 36 37 38 39]
 [40 41 42 43 44]
 [45 46 47 48 49]
 [50 51 52 53 54]
 [55 56 57 58 59]
 [60 61 62 63 64]
 [65 66 67 68 69]
 [70 71 72 73 74]
 [75 76 77 78 79]] 

y-array (size: (16,) )
 [ 1.  1.  1.  1.  1.  1.  1.  1. -1. -1. -1. -1. -1. -1. -1. -1.]
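
The labels in `y` are sorted by class, which shows why the indices are shuffled before being split into folds. The sketch below rebuilds the same `y` and counts how many test folds contain a single class when the folds are taken in order, compared with folds cut from a random permutation. The helper `single_class_folds` and the seed are made up for this illustration, and shuffling only makes mixed folds likely, it does not guarantee them.

```python
import numpy as np

n, k = 16, 5
size = n // k
y = np.concatenate((np.ones(n // 2), -np.ones(n - n // 2)))

def single_class_folds(folds):
    # Count test folds whose labels all belong to one class
    return sum(len(np.unique(y[fold])) == 1 for fold in folds)

# Folds taken in order: [0, 1, 2], [3, 4, 5], ...
contiguous = [list(range(j * size, (j + 1) * size)) for j in range(k)]

# Folds cut from a random permutation of the indices
perm = np.random.default_rng(0).permutation(n)
shuffled = [perm[j * size:(j + 1) * size].tolist() for j in range(k)]

print('single-class folds without shuffling:', single_class_folds(contiguous))
print('single-class folds with shuffling:   ', single_class_folds(shuffled))
```

Without shuffling, four of the five test folds contain only one class, so a score computed on them would say little about how well the two classes are separated.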


In [6]:
for i in range(k):
    x_train = X[train_groups[i],:]
    y_train = y[train_groups[i]]
    x_test = X[test_groups[i],:]
    y_test = y[test_groups[i]]
    print('TRAIN / TEST:',i+1,'\nX_train:','\n',x_train)
    print('y_train:','\n',y_train)
    print('\nX_test:','\n',x_test)
    print('y_test:','\n',y_test,'\n')
    print('- '*30)

TRAIN / TEST: 1 
X_train: 
 [[ 5  6  7  8  9]
 [10 11 12 13 14]
 [ 0  1  2  3  4]
 [40 41 42 43 44]
 [25 26 27 28 29]
 [30 31 32 33 34]
 [15 16 17 18 19]
 [20 21 22 23 24]
 [55 56 57 58 59]
 [75 76 77 78 79]
 [45 46 47 48 49]
 [50 51 52 53 54]
 [70 71 72 73 74]]
y_train: 
 [ 1.  1.  1. -1.  1.  1.  1.  1. -1. -1. -1. -1. -1.]

X_test: 
 [[60 61 62 63 64]
 [35 36 37 38 39]
 [65 66 67 68 69]]
y_test: 
 [-1.  1. -1.] 

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
TRAIN / TEST: 2 
X_train: 
 [[ 5  6  7  8  9]
 [35 36 37 38 39]
 [20 21 22 23 24]
 [25 26 27 28 29]
 [50 51 52 53 54]
 [30 31 32 33 34]
 [40 41 42 43 44]
 [ 0  1  2  3  4]
 [60 61 62 63 64]
 [70 71 72 73 74]
 [55 56 57 58 59]
 [75 76 77 78 79]
 [65 66 67 68 69]]
y_train: 
 [ 1.  1.  1.  1. -1.  1. -1.  1. -1. -1. -1. -1. -1.]

X_test: 
 [[15 16 17 18 19]
 [10 11 12 13 14]
 [45 46 47 48 49]]
y_test: 
 [ 1.  1. -1.] 

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
TRAIN / TEST: 3 
X_train: 
 [[65 66 67