There are myriad decisions to make when configuring your deep learning models. Many of these decisions can be resolved by copying the structure of other people’s networks and using heuristics. Ultimately, the best technique is to design small experiments and empirically evaluate the options using real data. This includes:
- high-level decisions
  - the number, size and type of layers in your network. 
- lower-level decisions 
  - the choice of **loss** function, **activation** functions, **optimization** procedure and number of **epochs**.
  
As such, you need a robust test harness that allows you to estimate the performance of a given configuration on unseen data, and to reliably compare that performance to other configurations.

**Splitting:** Large amounts of data and complex models lead to very long training times. As such, it is typical to use a simple split of the data into training and test (validation) datasets. Keras provides two convenient ways of evaluating your deep learning models this way:
1. Use an automatic verification dataset.
2. Use a manual verification dataset.

## 1. Automatic verification dataset
Keras can set aside a portion of your **training data** as **validation data** and evaluate the performance of your model on that validation dataset after **each epoch**. To do this, set the **`validation_split`** argument of the **`fit()`** function to the fraction of your training dataset to hold out (for example, 0.33 for 33%).
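As a rough illustration of what this argument does, the sketch below mimics the split in plain Python. It assumes, as the Keras documentation describes, that the validation samples are taken from the end of the data before any shuffling. The helper name `split_tail` is hypothetical and is not part of Keras.

```python
def split_tail(samples, validation_split):
    """Hold out the last fraction of `samples` (illustrative helper, not a Keras function)."""
    if not 0.0 < validation_split < 1.0:
        raise ValueError("validation_split must be between 0 and 1")
    # Everything before split_at is used for training, the rest for validation.
    split_at = int(len(samples) * (1.0 - validation_split))
    return samples[:split_at], samples[split_at:]


rows = list(range(768))  # stand-in for the 768 rows of the dataset
train, val = split_tail(rows, 0.33)
print(len(train), len(val))
```

Because the held-out rows come from the end of the data, a dataset that is sorted (for example by class) should be shuffled before it is passed to `fit()` with `validation_split`.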

In [None]:
import pandas as pd
df = pd.read_csv('pima-indians-diabetes.csv', header=None)
# X = df[:][0:7] does not work: df[:] is still a DataFrame, so [0:7] slices rows, not columns.
# y = df[:][8] works: [8] selects the column labelled 8 and returns a Series.

X = df.iloc[:, 0:8] #dataframe
y = df.iloc[:, 8] #series

# df.shape : 768 x 9

In [None]:
import numpy as np
np.random.seed(47)

data = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')
X = data[:, 0:8] #array
y = data[:,8] #array

In [None]:
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

model.fit(X, y, validation_split=0.33, epochs=150, batch_size=10)
# batch_size is the number of samples propagated through the network before each weight update.
# Here the algorithm takes the first 10 training samples (1st to 10th), trains the network on them
# and updates the weights. Next it takes the second 10 samples (11th to 20th) and trains again.
# This continues until all training samples have been propagated through the network.
# The last batch is usually the odd one out: when the number of training samples is not divisible
# by 10, the final batch simply contains the remaining samples (768 samples would leave a final
# batch of 8; with validation_split the training portion is smaller than 768).
# (+) Networks typically train faster with mini-batches, because the weights are updated after each batch.
# (-) The smaller the batch, the less accurate the estimate of the gradient.

# 150 epochs = 150 forward passes and 150 backward passes over all the training examples.
# batch size = the number of training examples in one forward/backward pass.
# number of iterations = number of passes, each pass using [batch size] examples.

# To be clear, one pass = one forward pass + one backward pass (the two are not counted separately).
# For example, with 1000 training examples and a batch size of 500, one epoch takes 2 iterations.

# The verbose output for each epoch shows the loss and accuracy on both the training dataset
# (loss, acc) and the validation dataset (val_loss, val_acc).
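The relationship between samples, batch size and iterations described in the comments above can be checked with a few lines of arithmetic. This is a plain-Python sketch; the function names `iterations_per_epoch` and `last_batch_size` are made up for illustration.

```python
import math


def iterations_per_epoch(n_samples, batch_size):
    """Number of weight updates in one epoch; the last batch may be smaller."""
    return math.ceil(n_samples / batch_size)


def last_batch_size(n_samples, batch_size):
    """Size of the final batch in an epoch."""
    remainder = n_samples % batch_size
    return remainder if remainder else batch_size


print(iterations_per_epoch(1000, 500))      # 2
print(iterations_per_epoch(768, 10))        # 77
print(last_batch_size(768, 10))             # 8
print(150 * iterations_per_epoch(768, 10))  # total weight updates over 150 epochs
```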

## 2. Manual verification dataset
In this example we use the handy **`train_test_split()`** function from the scikit-learn library to separate our data into a training and a test dataset. The validation dataset can be specified to the **`fit()`** function in Keras by the **`validation_data`** argument. It takes a tuple of the input and output datasets.

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=47)

model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=150, batch_size=10, verbose=1)

## 3. K-fold Cross Validation
It splits the **training dataset** into **k subsets** and takes turns training models on all subsets except one which is held out, and evaluating model performance on the **held out validation dataset**. The process is repeated until all subsets are given an opportunity to be the **held out validation set**. The performance measure is then **averaged across all models** that are created. 
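To make the procedure concrete, here is a minimal pure-Python sketch of how the folds could be generated, without shuffling or stratification. The function `k_fold_indices` is a hypothetical helper written for this illustration; it is not the scikit-learn implementation.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; each sample is held out exactly once."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


folds = list(k_fold_indices(10, 3))
for train_idx, val_idx in folds:
    print("train:", train_idx, "validation:", val_idx)
```

In a real evaluation, a fresh model would be trained on `train_idx` and scored on `val_idx` in each iteration, and the scores would then be averaged.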

Cross-validation is often not used for evaluating deep learning models because of the greater computational expense. For example k-fold cross-validation is often used with 5 or 10 folds. As such, 5 or 10 models must be constructed and evaluated, greatly adding to the evaluation time of a model. Nevertheless, when the problem is small enough or if you have sufficient compute resources, k-fold cross-validation can give you a less biased estimate of the performance of your model. 

In the example below we use the handy **StratifiedKFold** class from the scikit-learn library to split the training dataset into 10 folds. The folds are **stratified**, meaning that the algorithm **balances the number of instances** of each class in each fold. The example creates and evaluates 10 models using the 10 splits of the data and collects all of the scores. The verbose output for each epoch is turned off by passing verbose=0 to the **`fit()`** and **`evaluate()`** functions on the model.
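What "stratified" means can be sketched in a few lines of plain Python. The round-robin assignment below is a simplification chosen for readability, and `stratified_fold_ids` is a hypothetical helper; scikit-learn's own implementation differs in its details.

```python
from collections import defaultdict


def stratified_fold_ids(labels, k):
    """Assign each sample a fold id so that every class is spread evenly over the k folds."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    fold_of = [None] * len(labels)
    for members in by_class.values():
        # Deal the samples of one class out to the folds in round-robin order.
        for position, idx in enumerate(members):
            fold_of[idx] = position % k
    return fold_of


labels = [0] * 8 + [1] * 4  # imbalanced toy labels: 8 negatives, 4 positives
fold_of = stratified_fold_ids(labels, k=4)
for fold in range(4):
    in_fold = [labels[i] for i, f in enumerate(fold_of) if f == fold]
    print("fold", fold, "negatives:", in_fold.count(0), "positives:", in_fold.count(1))
```

Each fold keeps the 2:1 class ratio of the toy dataset, so no fold ends up without positive examples.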

In [None]:
from sklearn.model_selection import StratifiedKFold

kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=47)

cv_scores = []
for tr, te in kfold.split(X, y):
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    
    model.fit(X[tr], y[tr], epochs=150, batch_size=10, verbose=0)
    scores = model.evaluate(X[te], y[te], verbose=0)
    print("%s: %.2f" %(model.metrics_names[1], scores[1]))
    cv_scores.append(scores[1])
    
print("mean: %.2f std: (+/- %.2f)" %(np.mean(cv_scores), np.std(cv_scores)))

# Can we streamline this?

scikit-learn offers models, validation tools, and hyperparameter tuning (grid search), and a wrapped Keras model can be used with all of them.

### 1) Evaluate Models with Cross-Validation

**KerasClassifier()** and **KerasRegressor()** in Keras take an argument **build_fn**, which is the **name** of the function to call to create your model. 
- You must define a **function** that defines your model, compiles it, and returns it. Let's define a function **`create_model()`** that creates a simple multilayer neural network for the problem.  

- We also pass in the additional arguments **epochs=150** and **batch_size=10**. These are automatically bundled up and passed on to the **`fit()`** function, which is called internally by **KerasClassifier()**. 

To split the dataset: use **`StratifiedKFold()`**

To evaluate the performance using the cross-validation scheme: use **`cross_val_score()`**. 

Wrapping the Keras model **greatly streamlines** the estimation of model accuracy, compared to the manual enumeration of cross-validation folds. 
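The idea of bundling arguments and forwarding them to `fit()` can be sketched with a toy wrapper. Both classes below are hypothetical and exist only for this illustration; the real `KerasClassifier` does considerably more (scoring, prediction, parameter handling).

```python
class TinyWrapper:
    """Toy stand-in for a scikit-learn style wrapper (hypothetical, not the Keras class)."""

    def __init__(self, build_fn, **fit_params):
        self.build_fn = build_fn      # function that returns a fresh model
        self.fit_params = fit_params  # e.g. epochs=150, batch_size=10

    def fit(self, X, y):
        self.model_ = self.build_fn()             # build a new model for every fit
        self.model_.fit(X, y, **self.fit_params)  # forward the stored arguments
        return self


class RecordingModel:
    """Fake model that only records how fit() was called."""

    def fit(self, X, y, **kwargs):
        self.received = kwargs


wrapper = TinyWrapper(build_fn=RecordingModel, epochs=150, batch_size=10)
wrapper.fit([[0.0]], [0])
print(wrapper.model_.received)
```

Building a new model inside every `fit()` call is what lets a cross-validation routine train an independent model per fold from a single wrapper object.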

In [None]:
# MLP(Multi Layered Perceptrons) for 'Pima_Indians_Dataset' with 10-fold_cross_validation via sklearn..

from keras.models import Sequential
from keras.layers import Dense
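# Note: this wrapper ships with older Keras releases; newer releases dropped it,
# and the separate SciKeras package offers a replacement.
from keras.wrappers.scikit_learn import KerasClassifier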

# Define the function to create model, required for 'KerasClassifier()'
def create_model():
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# KerasClassifier calls fit() internally, forwarding epochs and batch_size
model = KerasClassifier(build_fn=create_model, epochs=150, batch_size=10, verbose=0)


# Split the dataset, then fit and evaluate.
# A total of 10 models are created and evaluated and the final average accuracy is displayed
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import cross_val_score

kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=47)
result = cross_val_score(model, X, y, cv = kfold)

print("mean: %.2f (+/- %.2f)" % (result.mean(), result.std()))

### 2) Grid-Search CV!! Model parameters
The previous example showed how easy it is to **wrap your deep learning model from Keras** and use it in functions from the scikit-learn library. In this example we go a step further. We already know we can provide arguments to the fit() function. **The function** that we specify to the **build_fn** argument when creating the `KerasClassifier()` wrapper can also take **arguments**. We can use these arguments to further customize the construction of the model. 

We use **GridSearchCV** to evaluate different configurations for our neural network model and report on the combination that provides the best estimated performance. 

The **create_model()** function is defined to take two arguments, **optimizer** and **init**, both of which must have default values. This allows us to evaluate the effect of using different **1) optimization algorithms** and **2) weight initialization schemes** for our network. 

>After creating our model, we define **arrays of values** for the parameters we wish to search, specifically:
>- **Optimizers** for searching different weight values
>- **Initializers** for preparing the network weights using different schemes
>- Number of **epochs** for training the model for different numbers of exposures to the training dataset
>- **Batches** for varying the number of samples before weight updates

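The options are specified in a **dictionary** and passed to the configuration of the GridSearchCV scikit-learn class. The dictionary keys must match the argument names of `create_model()` (**optimizer**, **init**) and `fit()` (**epochs**, **batch_size**).

This class will evaluate a version of our neural network model for each combination of parameters (2 × 3 × 3 × 3 = 54 combinations of **optimizers**, **initializations**, **epochs** and **batches**). Each combination is then evaluated using the default cross-validation of GridSearchCV, which for classifiers is a stratified k-fold: **3-fold** in older scikit-learn releases and 5-fold since version 0.22. 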
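The size of the search can be verified by enumerating the grid by hand. The sketch below uses only the standard library; the candidate values mirror the lists used in this section's grid search code, and the number of folds is an assumption that depends on the scikit-learn version.

```python
from itertools import product

# Candidate values for each parameter; keys follow the argument names
# of create_model() (optimizer, init) and fit() (epochs, batch_size).
param_grid = {
    "optimizer": ["rmsprop", "adam"],
    "init": ["glorot_uniform", "normal", "uniform"],
    "epochs": [50, 100, 150],
    "batch_size": [5, 10, 20],
}

names = list(param_grid)
combinations = [dict(zip(names, values)) for values in product(*param_grid.values())]

n_folds = 3  # assumed number of cross-validation folds
print(len(combinations))            # 54
print(len(combinations) * n_folds)  # 162 model fits during the search
print(combinations[0])
```

With 150-epoch candidates in the grid, the total number of fits shows why a grid search over a neural network can take a long time even on a small dataset.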

Finally, the performance and combination of configurations for the best model will be displayed, followed by the performance of all combinations of parameters.

In [None]:
# MLP(multi layered perceptrons) for Pima Indians Dataset with grid search via sklearn 

from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

# Function to create model, required for KerasClassifier 
def create_model(optimizer='rmsprop', init='glorot_uniform'): 
    model = Sequential() 
    model.add(Dense(12, input_dim=8, kernel_initializer=init, activation='relu')) 
    model.add(Dense(8, kernel_initializer=init, activation='relu')) 
    model.add(Dense(1, kernel_initializer=init, activation='sigmoid')) 
    # Compile model 
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy']) 
    return model

# create model 
model = KerasClassifier(build_fn=create_model, verbose=0) 

# grid search optimizers, initializers, epochs and batch sizes
optimizers = ['rmsprop','adam']
inits = ['glorot_uniform','normal','uniform']
epochs = [50,100,150]
batches = [5, 10, 20]

# keys must match the arguments of create_model() (optimizer, init) and fit() (epochs, batch_size)
param_grid = dict(optimizer=optimizers, init=inits, epochs=epochs, batch_size=batches)
grid_result = GridSearchCV(estimator=model, param_grid=param_grid).fit(X, y)

# summarize results
print('best score is %f, using %s' %(grid_result.best_score_, grid_result.best_params_))

means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for i,j,k in zip(means, stds, params):
    print('mean: %f ,std: %f with %r' %(i,j,k))