# Evaluation of models

First, copy all the functions and classes from the `dectrees.ipynb` notebook into a `tree.py` file, so that we can import our implementation in this part.

In [1]:
from tree import DecisionTreeModel, read_csv, split_observations_and_labels, gini, entropy

Now we can train our model the same way as before:

In [2]:
dataset = "decision_tree_example.csv"
csv = read_csv(dataset)
data, labels = split_observations_and_labels(csv)

model = DecisionTreeModel(gini)
model.fit(data, labels)

model.tree_.print_tree()

0: google?
T->3: 18?
 T->{' Basic\n': 1}
 F->3:  18?
  T->{' None\n': 1}
  F->{' Premium\n': 3}
F->0: slashdot?
 T->{' None\n': 3}
 F->2:  yes?
  T->3: 19?
   T->{' Basic': 1}
   F->{' Basic\n': 3}
  F->3:  21?
   T->{' Basic\n': 1}
   F->{' None\n': 3}


Let's try to find the best parameters for our dataset. To do so, we will first follow the procedure explained in the course slides:

1. Retrieve a wide set of examples
2. Divide this set into two sets: the training set and the test set
3. Build the classifier with the training set
4. Measure the percentage of examples of the test set that are correctly classified
5. Repeat steps 2 to 4 for different sizes of training and test sets chosen randomly

One iteration of the process might look like this:

In [6]:
def train_test_split(data, labels, train_pctg, rng):
    """
    Splits the input data and their corresponding labels in two
    sets, one for training and one for testing, according to a
    percentage (`train_pctg`) that specifies how many rows will
    end in the training set.

    This method will split the rows randomly, so a random generator
    is expected (see https://docs.python.org/3/library/random.html#random.Random).
    Note that this class lets you call the other functions in `random`
    as methods of the generator (e.g. `rng.randint(10, 20)`).

    This method returns the tuple `(train_data, test_data, train_labels, test_labels)`
    """
    idxs = list(range(len(data)))
    rng.shuffle(idxs)
    
    n_train_rows = int(train_pctg * len(data))
    train_idxs = set(idxs[:n_train_rows])  # a set makes the membership test below O(1)

    train_data, test_data, train_labels, test_labels = [], [], [], []

    for idx, (row, label) in enumerate(zip(data, labels)):
        if idx in train_idxs:
            train_data.append(row)
            train_labels.append(label)
        else:
            test_data.append(row)
            test_labels.append(label)
    return train_data, test_data, train_labels, test_labels


from random import Random
rng = Random(42) # provide a seed to have reproducible results

train_data, test_data, train_labels, test_labels = train_test_split(data, labels, .67, rng)

print(train_data)
print(test_data)

print(train_labels)
print(test_labels)

[['google', ' France', ' yes', '23'], ['digg', ' USA', ' yes', '24'], ['(direct)', ' NewZealand', ' no', ' 12'], ['(direct)', ' UK', ' no', ' 21'], ['google', ' USA', ' no', ' 24'], ['slashdot', ' France', ' yes', '19'], ['digg', ' USA', ' no', ' 18'], ['google', ' UK', ' no', ' 18'], ['digg', ' NewZealand', ' yes', '12'], ['google', ' UK', ' yes', '18']]
[['slashdot', ' USA', ' yes', '18'], ['kiwitobes', ' France', ' yes', '23'], ['google', ' UK', ' no', ' 21'], ['kiwitobes', ' UK', ' no', ' 19'], ['slashdot', ' UK', ' no', ' 21'], ['kiwitobes', ' France', ' yes', '19']]
[' Premium\n', ' Basic\n', ' None\n', ' Basic\n', ' Premium\n', ' None\n', ' None\n', ' None\n', ' Basic\n', ' Basic\n']
[' None\n', ' Basic\n', ' Premium\n', ' None\n', ' None\n', ' Basic']


In [7]:
model = DecisionTreeModel(gini)
model.fit(train_data, train_labels)
print(model.score(test_data, test_labels))

0.16666666666666666


Implement the full procedure in the `evaluate_model` function. Note that we pass a `model` instance. The nice thing about our `DecisionTreeModel` class is that it can be "retrained" on new data. Retraining discards everything learned from the previous examples, so in essence each call to `fit` trains a model from scratch.

This means we do not need to pass the class and the parameters to the `evaluate_model` function, simplifying its definition. **Remember though that the model at the end is trained with the final iteration's data.**

In [11]:
def evaluate_model(model, data, labels, iterations, rng) -> float:
    score_sum = 0.0
    for _ in range(iterations):
        # Draw a fresh random split, retrain from scratch, and accumulate the test score
        train_X, test_X, train_y, test_y = train_test_split(data, labels, .67, rng)
        model.fit(train_X, train_y)
        score_sum += model.score(test_X, test_y)
    return score_sum / iterations

model = DecisionTreeModel(gini)
print(evaluate_model(model, data, labels, 5, Random(42)))

0.19999999999999998


**Exercise:** Try to use different training sizes for evaluating the model. Modify the `evaluate_model` function to have this as a parameter (with a default value).
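
One possible shape for this exercise is sketched below; it is not the reference solution. So that the cell runs without `tree.py`, it uses a hypothetical `MajorityModel` stand-in (not part of our implementation) and a compact rewrite of `train_test_split`. The new `train_pctg` parameter and its default of `0.67` are assumptions chosen to match the value hard-coded above.

```python
from collections import Counter
from random import Random


class MajorityModel:
    """Hypothetical stand-in for DecisionTreeModel: predicts the most common training label."""

    def fit(self, data, labels):
        self.prediction_ = Counter(labels).most_common(1)[0][0]

    def score(self, data, labels):
        # Fraction of test labels equal to the single predicted label
        return sum(label == self.prediction_ for label in labels) / len(labels)


def train_test_split(data, labels, train_pctg, rng):
    # Compact rewrite of the function defined earlier in this notebook
    idxs = list(range(len(data)))
    rng.shuffle(idxs)
    train_idxs = set(idxs[:int(train_pctg * len(data))])
    train_data = [row for i, row in enumerate(data) if i in train_idxs]
    test_data = [row for i, row in enumerate(data) if i not in train_idxs]
    train_labels = [lab for i, lab in enumerate(labels) if i in train_idxs]
    test_labels = [lab for i, lab in enumerate(labels) if i not in train_idxs]
    return train_data, test_data, train_labels, test_labels


def evaluate_model(model, data, labels, iterations, rng, train_pctg=0.67):
    """Average test score over `iterations` random splits of the given size."""
    score_sum = 0.0
    for _ in range(iterations):
        train_X, test_X, train_y, test_y = train_test_split(data, labels, train_pctg, rng)
        model.fit(train_X, train_y)
        score_sum += model.score(test_X, test_y)
    return score_sum / iterations


# Toy data, invented for illustration only
toy_data = [[i] for i in range(10)]
toy_labels = ["a"] * 7 + ["b"] * 3
for pctg in (0.5, 0.67, 0.8):
    print(pctg, evaluate_model(MajorityModel(), toy_data, toy_labels, 5, Random(42), pctg))
```

Keeping `train_pctg` as the last parameter with a default means the existing calls to `evaluate_model` keep working unchanged.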

# Hyper-parameter tuning

As we have seen, some parameters change how the training algorithm behaves (e.g. the score function used when building the decision tree). Those parameters are called **hyper-parameters**. By changing them we affect the result of the algorithm, and with it how well the trained models generalize.

**Hyper-parameter tuning** is the process of automatically selecting the values of those *hyper-parameters* that maximize the prediction capabilities on unseen data.

For this, we first need to define the **parameter space**, which is the domain (the valid values) of each hyper-parameter.

Let's define the parameter space for our `DecisionTreeModel` class:

- `scoref`: $\{gini, entropy\}$
- `beta`: $[0, \infty)$ (note that some `scoref` choices might result in impurities larger than 1, so we don't have an upper bound).
- `prune_threshold`: $[0, \infty)$

For now, we will define a discrete set of values for each parameter, and consider every possible combination of those values (i.e. the Cartesian product).

In [12]:
parameter_space = {
    "scoref": [gini, entropy],
    "beta": [0, 0.1], # try some values
    "prune_threshold": [0, 0.1], # try some values
}

import itertools
def iterate_parametrizations(pspace):
    names = list(pspace.keys())
    for values in itertools.product(*pspace.values()):
        yield dict(zip(names, values))

We can see all the possible parametrizations with:

In [13]:
for params in iterate_parametrizations(parameter_space):
    print(params)

{'scoref': <function gini at 0x79bbe0352200>, 'beta': 0, 'prune_threshold': 0}
{'scoref': <function gini at 0x79bbe0352200>, 'beta': 0, 'prune_threshold': 0.1}
{'scoref': <function gini at 0x79bbe0352200>, 'beta': 0.1, 'prune_threshold': 0}
{'scoref': <function gini at 0x79bbe0352200>, 'beta': 0.1, 'prune_threshold': 0.1}
{'scoref': <function entropy at 0x79bbe03522a0>, 'beta': 0, 'prune_threshold': 0}
{'scoref': <function entropy at 0x79bbe03522a0>, 'beta': 0, 'prune_threshold': 0.1}
{'scoref': <function entropy at 0x79bbe03522a0>, 'beta': 0.1, 'prune_threshold': 0}
{'scoref': <function entropy at 0x79bbe03522a0>, 'beta': 0.1, 'prune_threshold': 0.1}


Now let's define a function that evaluates every parametrization, and use it to find the best parameters for our example dataset.

In [14]:
def select_best_parameters(data, labels, parameter_space, rng):
    best_params, best_score = None, 0.0

    for params in iterate_parametrizations(parameter_space):
        model = DecisionTreeModel(**params)
        score = evaluate_model(model, ...)
        ...
    
    return best_params, best_score


best, score = select_best_parameters(data, labels, parameter_space, Random(42))
print("Best parameters for our model are", best, "with a score of", score)

TypeError: evaluate_model() missing 3 required positional arguments: 'labels', 'iterations', and 'rng'
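
The error above comes from the `...` placeholders that are still left to fill in. A minimal sketch of one way to complete the loop follows, under several assumptions: `ConstantModel` is a hypothetical stand-in so the cell runs without `tree.py`, `model_cls` and `iterations` are extra parameters that the skeleton above does not have, `evaluate_model` is a simplified rewrite of the earlier function, and the best score starts at negative infinity so that a parametrization scoring `0.0` can still be selected.

```python
import itertools
from random import Random


class ConstantModel:
    """Hypothetical stand-in for DecisionTreeModel: always predicts `label`."""

    def __init__(self, label):
        self.label = label

    def fit(self, data, labels):
        pass  # nothing to learn; the prediction is fixed by the hyper-parameter

    def score(self, data, labels):
        return sum(lab == self.label for lab in labels) / len(labels)


def iterate_parametrizations(pspace):
    names = list(pspace.keys())
    for values in itertools.product(*pspace.values()):
        yield dict(zip(names, values))


def evaluate_model(model, data, labels, iterations, rng, train_pctg=0.67):
    # Simplified rewrite of the notebook's function: shuffle, split, fit, score
    score_sum = 0.0
    n_train = int(train_pctg * len(data))
    for _ in range(iterations):
        idxs = list(range(len(data)))
        rng.shuffle(idxs)
        train, test = idxs[:n_train], idxs[n_train:]
        model.fit([data[i] for i in train], [labels[i] for i in train])
        score_sum += model.score([data[i] for i in test], [labels[i] for i in test])
    return score_sum / iterations


def select_best_parameters(data, labels, parameter_space, rng, model_cls, iterations=5):
    best_params, best_score = None, float("-inf")
    for params in iterate_parametrizations(parameter_space):
        model = model_cls(**params)
        score = evaluate_model(model, data, labels, iterations, rng)
        if score > best_score:  # strict comparison keeps the first of any tied candidates
            best_params, best_score = params, score
    return best_params, best_score


# Toy data, invented for illustration only
toy_data = [[i] for i in range(12)]
toy_labels = ["a"] * 11 + ["b"]
best, score = select_best_parameters(
    toy_data, toy_labels, {"label": ["a", "b"]}, Random(42), ConstantModel
)
print("Best parameters:", best, "with a score of", score)
```

Because the same `rng` object is shared by all candidates, each parametrization is scored on different random splits; whether that is acceptable depends on how noisy the scores are for the dataset at hand.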

# Cross validation
Another way to select the best parameters is to perform the cross validation procedure.

![CV procedure](https://scikit-learn.org/stable/_images/grid_search_cross_validation.png)

The process is similar to what we have seen, but instead of doing N iterations over different random splits, we divide the data into `k` disjoint splits (folds). Then, at each iteration, we use one split as the test data and combine the remaining ones as the training data. We repeat this until every split has been used exactly once as the test split, and average the resulting scores.

In [None]:
def cross_val_score(model, data, labels, rng, k):
    ...
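
A possible way to fill in this stub is sketched here; treat it as one option, not the expected answer. It assumes the model exposes the same `fit` and `score` methods used earlier, and it uses a hypothetical `MajorityModel` stand-in so the cell runs without `tree.py`. The fold assignment (every `k`-th shuffled index) and the `ValueError` check on `k` are design choices of this sketch.

```python
from collections import Counter
from random import Random


class MajorityModel:
    """Hypothetical stand-in for DecisionTreeModel: predicts the most common training label."""

    def fit(self, data, labels):
        self.prediction_ = Counter(labels).most_common(1)[0][0]

    def score(self, data, labels):
        return sum(label == self.prediction_ for label in labels) / len(labels)


def cross_val_score(model, data, labels, rng, k):
    """Mean score over k folds; each fold is used exactly once as test data."""
    if not 2 <= k <= len(data):
        raise ValueError("k must be between 2 and the number of rows")

    idxs = list(range(len(data)))
    rng.shuffle(idxs)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one
    folds = [idxs[i::k] for i in range(k)]

    score_sum = 0.0
    for test_idxs in folds:
        held_out = set(test_idxs)
        train_idxs = [i for i in idxs if i not in held_out]
        model.fit([data[i] for i in train_idxs], [labels[i] for i in train_idxs])
        score_sum += model.score([data[i] for i in test_idxs], [labels[i] for i in test_idxs])
    return score_sum / k


# Toy data, invented for illustration only
toy_data = [[i] for i in range(10)]
toy_labels = ["a"] * 9 + ["b"]
print(cross_val_score(MajorityModel(), toy_data, toy_labels, Random(42), k=5))
```

Unlike the repeated random splits used by `evaluate_model`, every row here is tested exactly once, so no example is left out of the evaluation or counted twice.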


**Exercise:** Modify the `select_best_parameters` function to accept a scoring function. Note that different scoring functions can have different parameters, but they must accept at least `model, data, labels`.

Option 1: Use lambda functions (or [partial](https://docs.python.org/3/library/functools.html#functools.partial) for a cleaner approach) to define a function with the other parameters already set, and then have `select_best_parameters` call only `fn(model, data, labels)`:

```python
def select_best_parameters(data, labels, pspace, fn):
    ...
    fn(model, data, labels)
    ...

... = select_best_parameters(data, labels, pspace, lambda m,d,l: evaluate_model(m, d, l, 5, Random(42)))
```
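
The `partial` variant of Option 1 might look like the sketch below. The `evaluate_model` here is only a placeholder with the same signature as ours: it returns the first random draw so that we can see which generator state each call received. One difference is worth noticing: the lambda builds a new `Random(42)` on every call, while `partial` binds a single generator once and shares it across calls.

```python
from functools import partial
from random import Random


def evaluate_model(model, data, labels, iterations, rng):
    # Placeholder with the notebook's signature: returns the first random draw
    # so we can tell which generator state this call received
    return rng.random()


# The lambda body runs on every call, so each call gets a fresh Random(42)
with_lambda = lambda m, d, l: evaluate_model(m, d, l, 5, Random(42))

# partial evaluates its arguments once, so all calls share one generator
with_partial = partial(evaluate_model, iterations=5, rng=Random(42))

print(with_lambda(None, [], []) == with_lambda(None, [], []))    # True
print(with_partial(None, [], []) == with_partial(None, [], []))  # False
```

With the lambda, every parametrization is evaluated on identical splits; with this `partial`, the splits differ from one call to the next. Either can be reasonable, as long as the choice is deliberate.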

Option 2: Use `*args, **kwargs` in `select_best_parameters` to pass additional parameters:
```python
def select_best_parameters(data, labels, pspace, fn, *args, **kwargs):
    ...
    fn(model, data, labels, *args, **kwargs)
    ...

... = select_best_parameters(data, labels, pspace, evaluate_model, 5, Random(42))
```


Now try to select the best parameters using cross-validation instead.