# Simple hold-out validation
Simply set apart some fraction of the data set as the test set. Train on the remaining data and evaluate on the test set.

IMPORTANT: to prevent *information leaks* to the NN, the model mustn't be tuned based on the test set, so a good practice is to *also* reserve a validation set.
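
As a rough illustration of reserving both a validation set and a test set, the sketch below shuffles a toy array once and cuts it into three parts. The helper name `three_way_split`, its parameters, and the toy data are assumptions made for this example, not part of the original notebook.

```python
import numpy as np

def three_way_split(data, num_validation, num_test, seed=0):
    """Shuffle a copy of `data` and split it into train/validation/test arrays."""
    rng = np.random.default_rng(seed)
    shuffled = data[rng.permutation(len(data))]
    # The test set is cut off first and must not be used for tuning.
    test = shuffled[:num_test]
    validation = shuffled[num_test:num_test + num_validation]
    train = shuffled[num_test + num_validation:]
    return train, validation, test

data = np.arange(100).reshape(50, 2)  # 50 toy samples with 2 columns each
train, validation, test = three_way_split(data, num_validation=10, num_test=10)
print(train.shape, validation.shape, test.shape)  # (30, 2) (10, 2) (10, 2)
```

The three parts are disjoint, so any score measured on `test` is unaffected by choices made while looking at `validation`.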

In [None]:
import numpy as np

# Number of samples held out for validation
num_validation_samples = 10000

Shuffling the data is usually appropriate (except when time is an important feature!):

In [None]:
np.random.shuffle(data)

Define the validation and training sets:

In [None]:
validation_data = data[:num_validation_samples]
data = data[num_validation_samples:]

training_data = data[:]

Train the model on the training data, and evaluate it on the validation data:

In [None]:
model = get_model()
model.train(training_data)
validation_score = model.evaluate(validation_data)

# At this point you can tune your model,
# retrain it, evaluate it, tune it again...
# using, as discussed, the validation set

Once the hyperparameters are tuned, it's common to train the final model from scratch on all non-test data available:

In [None]:
model = get_model()
model.train(np.concatenate([training_data,
                        validation_data]))

test_score = model.evaluate(test_data)

**Overview**

This is a very simple evaluation protocol that suffers from one flaw: if little data is available, the validation and test sets may contain too few examples to be statistically representative. This problem is easy to spot: if different random shuffling rounds of the data before splitting end up yielding very different measures of model performance, it is safe to assume the sets are too small. The solutions are discussed next:
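
The shuffle-and-compare check described above can be sketched as follows. This is only an illustration under stated assumptions: the data is synthetic, the "model" is an ordinary least-squares fit standing in for `get_model()`, and the helper `holdout_score` is hypothetical.

```python
import numpy as np

def holdout_score(X, y, num_validation, rng):
    """One hold-out round: shuffle, split, fit least squares, return validation MSE."""
    order = rng.permutation(len(X))
    val_idx, train_idx = order[:num_validation], order[num_validation:]
    # Stand-in model: ordinary least squares fitted with NumPy.
    weights, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
    residuals = X[val_idx] @ weights - y[val_idx]
    return float(np.mean(residuals ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))  # synthetic features
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=60)

# Repeat the hold-out split with a fresh shuffle each time.
scores = [holdout_score(X, y, num_validation=10, rng=rng) for _ in range(20)]
print(f"mean={np.mean(scores):.3f}  std={np.std(scores):.3f}")
```

A standard deviation that is large relative to the mean suggests the validation set is too small for a single hold-out score to be trusted.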

# K-fold validation

The steps are:
- Split your data into K partitions of equal size.
- For each partition i, train a model on the remaining K - 1 partitions, and evaluate it on partition i.
- Calculate the final score as the average of the K scores obtained.

In [None]:
k = 4
num_validation_samples = len(data) // k

np.random.shuffle(data)

validation_scores = []
for fold in range(k):
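    # Select the validation partition for this fold
    validation_data = data[num_validation_samples * fold:
                           num_validation_samples * (fold + 1)]
    # Join the slices before and after the validation partition;
    # np.concatenate works whether data is a list or a NumPy array
    training_data = np.concatenate([
        data[:num_validation_samples * fold],
        data[num_validation_samples * (fold + 1):]])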
        
    model = get_model()
    model.train(training_data)
    validation_score = model.evaluate(validation_data)
    validation_scores.append(validation_score)

validation_score = np.average(validation_scores)
model = get_model()
model.train(data)
test_score = model.evaluate(test_data)
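
Because the fold size is computed as `len(data) // k`, any leftover samples are never used for validation when the data set size is not a multiple of K. One possible way around this, sketched below under the assumption that slightly unequal folds are acceptable, is to split shuffled indices with `np.array_split`. The generator `k_fold_indices` is a hypothetical helper, not part of the original notebook.

```python
import numpy as np

def k_fold_indices(num_samples, k, rng):
    """Yield (train_idx, val_idx) pairs; each sample is in validation exactly once."""
    folds = np.array_split(rng.permutation(num_samples), k)
    for i in range(k):
        # Training indices are all folds except fold i.
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, folds[i]

rng = np.random.default_rng(0)
splits = list(k_fold_indices(num_samples=10, k=4, rng=rng))
for train_idx, val_idx in splits:
    print(len(train_idx), len(val_idx))  # validation fold sizes: 3, 3, 2, 2
```

Indexing the data with `train_idx` and `val_idx` inside the loop then replaces the manual slicing.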

### Iterated K-fold validation with shuffling
When there's relatively little data available, a good practice is to apply K-fold validation multiple times, shuffling the data every time *before* splitting in *K* ways.

The final score is the average of the scores obtained at each run of K-fold validation.

This can be expensive, as it involves training and evaluating *P x K* models (where *P* is the number of iterations used).
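
A minimal sketch of the iterated procedure, assuming synthetic data and an ordinary least-squares fit as a stand-in for `get_model()`; the helper `k_fold_mse` and the values of `p` and `k` are chosen only for illustration.

```python
import numpy as np

def k_fold_mse(X, y, k, rng):
    """Run one shuffled K-fold pass and return the mean validation MSE."""
    folds = np.array_split(rng.permutation(len(X)), k)
    scores = []
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        # Stand-in model: ordinary least squares.
        weights, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
        scores.append(np.mean((X[folds[i]] @ weights - y[folds[i]]) ** 2))
    return float(np.mean(scores))

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))  # synthetic features
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.3, size=40)

p, k = 5, 4  # P iterations of K-fold: P * K = 20 model fits in total
# Each call reshuffles the data before splitting it K ways.
run_scores = [k_fold_mse(X, y, k, rng) for _ in range(p)]
final_score = float(np.mean(run_scores))
print(f"final score (mean MSE over {p} runs): {final_score:.3f}")
```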

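**IMPORTANT:** If the objective is to predict the future given the past, we must *NOT* randomly shuffle the data before splitting, because doing so creates a temporal leak: the model would effectively be trained on data from the future. In such cases, make sure all the data in the test set comes after the data in the training set.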
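
For time-ordered data, one way to avoid the leak is to split chronologically instead of shuffling. The sketch below assumes each sample has a timestamp available; `chronological_split` and the toy arrays are hypothetical names used only for this example.

```python
import numpy as np

def chronological_split(data, timestamps, num_validation, num_test):
    """Sort by time, then take the most recent samples for validation and test."""
    ordered = data[np.argsort(timestamps, kind="stable")]
    num_train = len(ordered) - num_validation - num_test
    train = ordered[:num_train]                                 # oldest samples
    validation = ordered[num_train:num_train + num_validation]  # more recent
    test = ordered[num_train + num_validation:]                 # most recent
    return train, validation, test

timestamps = np.array([5, 1, 4, 2, 3, 0, 7, 6, 9, 8])
data = timestamps * 10  # toy values, so each sample's time is easy to read off
train, validation, test = chronological_split(data, timestamps, 2, 2)
print(train, validation, test)
```

Every training sample here precedes every validation sample, which in turn precedes every test sample.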