# Model Evaluation

Model evaluation starts by splitting the available data into three sets: training, validation, and test. The training set is used to fit the model; the validation set is used to choose the model's configuration parameters (hyperparameters); and the test set is used once, at the end, to estimate how well the model generalizes to unseen data.
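As a rough sketch of such a three-way split, the snippet below shuffles a dataset once and carves it into three parts. The function name `train_val_test_split`, the fixed seed, and the 60/20/20 proportions are illustrative assumptions, not something this text prescribes.

```python
import numpy as np


def train_val_test_split(data, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle `data` and split it into train, validation and test parts."""
    assert val_frac >= 0 and test_frac >= 0 and val_frac + test_frac <= 1

    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(data)  # shuffled copy; `data` is not modified

    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)

    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test


train, val, test = train_val_test_split(np.arange(10))
print(len(train), len(val), len(test))  # 6 2 2
```

The three parts are disjoint by construction, so no test sample can leak into training or hyperparameter selection.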

We consider three classic evaluation recipes: 

* simple hold-out validation
* $K$-fold validation, and 
* iterated $K$-fold validation with shuffling.

## Simple Hold-Out Validation

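```python
import numpy as np
from typing import Tuple


def train_val_split(data: np.ndarray, val_set_frac: float) -> Tuple[np.ndarray, np.ndarray]:
    """Shuffle a copy of `data` and split it into (train, validation) parts."""
    assert 0 <= val_set_frac <= 1

    # Shuffle a copy so the caller's array is left untouched.
    local_cpy = np.copy(data)
    np.random.shuffle(local_cpy)

    # The first `idx` rows form the validation set; the rest is training data.
    idx = int(len(local_cpy) * val_set_frac)
    val_data = local_cpy[:idx]
    train_data = local_cpy[idx:]

    return train_data, val_data
```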

To select hyperparameters, train one model per candidate configuration on the training split, score each on the validation split, and keep the configuration with the best validation score.

```python
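# Pseudocode: `get_model`, `list_of_configurations` and `training_data` are placeholders.
# Split once so that every configuration is scored on the same validation set.
train, val = train_val_split(training_data, val_set_frac=0.2)  # 0.2 is an example value

scores = []
for model_config in list_of_configurations:
    model = get_model(model_config)
    model.fit(train)
    scores.append(model.evaluate(val))

# Assumes a higher score is better; use argmin for a loss.
chosen_config = list_of_configurations[int(np.argmax(scores))]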
```

The chosen configuration is then retrained on all of the training data (training and validation splits together) and evaluated once on the test data.

```python
model = get_model(chosen_config)
model.fit(training_data)
test_score = model.evaluate(test_data)
```
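To make the hold-out recipe concrete, here is a minimal runnable sketch. Everything in it is an assumption chosen for illustration: the synthetic quadratic data, the use of polynomial degree as the only hyperparameter, the helper names `holdout_split` and `mse`, and mean squared error as the score (lower is better). It uses a single validation split shared by all candidates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (illustrative assumption): a noisy quadratic.
x = rng.uniform(-3, 3, size=200)
y = 0.5 * x**2 - x + rng.normal(scale=0.5, size=x.shape)
data = rng.permutation(np.column_stack([x, y]))

# Set the test set aside first; it plays no part in model selection.
test_data, training_data = data[:40], data[40:]


def holdout_split(data, val_set_frac):
    """Return (train, val) after shuffling a copy of `data`."""
    shuffled = rng.permutation(data)
    idx = int(len(shuffled) * val_set_frac)
    return shuffled[idx:], shuffled[:idx]


def mse(coeffs, d):
    """Mean squared error of a polynomial with `coeffs` on rows (x, y) of `d`."""
    return float(np.mean((np.polyval(coeffs, d[:, 0]) - d[:, 1]) ** 2))


# Model selection: fit each candidate on `train`, score it on `val`.
train, val = holdout_split(training_data, val_set_frac=0.25)
degrees = [1, 2, 3, 5]
val_errors = [mse(np.polyfit(train[:, 0], train[:, 1], deg), val) for deg in degrees]
chosen_degree = degrees[int(np.argmin(val_errors))]

# Final evaluation: refit on all training data, score once on the test set.
final_coeffs = np.polyfit(training_data[:, 0], training_data[:, 1], chosen_degree)
test_error = mse(final_coeffs, test_data)
print(chosen_degree, test_error)
```

Because the test set is touched only in the last step, `test_error` is not biased by the search over `degrees`; the validation errors, by contrast, are optimistically biased for the chosen configuration.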