# Tutorial 01: Validating models

This notebook shows how to validate models. It requires running the TUTORIAL_00 notebook first, which precomputes the features used here.

In [10]:
import json
from math import ceil
from time import time
import pandas as pd
from damage.data import DataStream
from damage.models import CNN, RandomSearch

In [4]:
features = pd.read_pickle('../logs/features/test.p')
features.head()

| city | patch_id | date | damage_num | destroyed | raster_date | latitude | longitude | location_index | image |
|---|---|---|---|---|---|---|---|---|---|
| daraa | 960-3520 | 2013-09-07 | 0.0 | 0 | 2017-02-07 | 32.642095 | 36.073268 | 444970884350 | [[[90, 73, 74, 33, 20, 16], [90, 77, 74, 16, 4... |
| daraa | 1600-3520 | 2013-09-07 | 0.0 | 0 | 2017-02-07 | 32.642095 | 36.076701 | 444970884359 | [[[107, 97, 90, 82, 61, 49], [107, 97, 90, 90,... |
| daraa | 1600-4160 | 2013-09-07 | 0.0 | 0 | 2017-02-07 | 32.638662 | 36.076701 | 444958290923 | [[[123, 121, 123, 99, 89, 82], [132, 125, 123,... |
| daraa | 2240-3520 | 2013-09-07 | 0.0 | 0 | 2017-02-07 | 32.642095 | 36.080134 | 444970884368 | [[[123, 121, 107, 255, 227, 206], [115, 117, 1... |
| daraa | 2240-4160 | 2013-09-07 | 0.0 | 0 | 2017-02-07 | 32.638662 | 36.080134 | 444958290932 | [[[189, 178, 173, 255, 178, 165], [156, 150, 1... |


We will use three custom classes: __RandomSearch__, __CNN__ and __DataStream__.

__RandomSearch__ samples hyperparameters for ML models. As of May 2019, only the search space for CNNs has been implemented.

__CNN__ defines a Convolutional Neural Network model and follows the conventions of the Sklearn and Keras APIs, with methods called fit, predict, fit_generator, predict_generator and validate_generator. Here we use the validate_generator method, which takes data generators in the form required by Keras's fit_generator method: each batch is yielded as a tuple of (features, target).

We use the __DataStream__ object to create these generators in two steps. First, the split_by_patch_id method, which follows the conventions of Sklearn splitters, generates the train and test indices. Then the get_data_generator_from_index method turns those indices into data generators.
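To make the two steps concrete, here is a minimal sketch of a group-wise split followed by a batch generator that yields (features, target) tuples. It is not the DataStream implementation: the helpers split_by_group and batch_generator, the toy index and the image shapes are all assumptions made for illustration, and the real class may shuffle, split and batch differently.

```python
import numpy as np
import pandas as pd


def split_by_group(index, level, train_proportion=0.8, seed=0):
    """Hypothetical helper: return (train_positions, test_positions) such
    that no value of `level` appears on both sides of the split."""
    groups = index.get_level_values(level)
    unique_groups = sorted(groups.unique())
    order = np.random.RandomState(seed).permutation(len(unique_groups))
    n_train = int(round(train_proportion * len(unique_groups)))
    train_groups = {unique_groups[i] for i in order[:n_train]}
    is_train = np.array([g in train_groups for g in groups])
    return np.where(is_train)[0], np.where(~is_train)[0]


def batch_generator(features, target, positions, batch_size):
    """Hypothetical helper: yield (features, target) batches forever,
    which is the shape of generator Keras's fit_generator consumes."""
    while True:
        for start in range(0, len(positions), batch_size):
            batch = positions[start:start + batch_size]
            x = np.stack([features[i] for i in batch])
            y = np.asarray([target[i] for i in batch])
            yield x, y


# Toy data (assumed): 5 patches, 2 dates each, 4x4 images with 6 channels.
index = pd.MultiIndex.from_product(
    [['daraa'], ['p0', 'p1', 'p2', 'p3', 'p4'], ['2013-09-07', '2017-02-07']],
    names=['city', 'patch_id', 'date'])
images = [np.zeros((4, 4, 6)) for _ in range(len(index))]
labels = [0, 0, 0, 1, 0, 0, 1, 1, 0, 0]

train_pos, test_pos = split_by_group(index, 'patch_id')
x_batch, y_batch = next(batch_generator(images, labels, train_pos, batch_size=4))
```

Splitting on the patch identifier rather than on individual rows keeps every date of a given patch on the same side of the split, which is the usual reason for a group-wise splitter.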

In [9]:
#### Modelling
sampler = RandomSearch()
Model = CNN
spaces = sampler.sample_cnn(1)  # Only one space is sampled for the purpose of the tutorial
for space in spaces:
    data_stream = DataStream(batch_size=space['batch_size'], train_proportion=0.8)
    num_batches = ceil(len(features) / space['batch_size'])

    # Step 1: generate the train and test indices, split by patch id
    train_index_generator, test_index_generator = data_stream.split_by_patch_id(features['image'])

    # Step 2: turn the indices into generators of (features, target) batches
    train_generator = data_stream.get_data_generator_from_index(
        [features['image'], features['destroyed']], train_index_generator)
    test_indices = list(test_index_generator)
    test_generator = data_stream.get_data_generator_from_index(
        [features['image'], features['destroyed']], test_indices)

    space['epochs'] = 1  # Epochs set to 1 for the tutorial
    # Weight each class by the frequency of the other class,
    # so that the rarer class counts more in the loss
    destroyed_rate = features['destroyed'].mean()
    space['class_weight'] = {
        0: destroyed_rate,
        1: 1 - destroyed_rate,
    }

    model = Model(**space)
    losses = model.validate_generator(train_generator, test_generator,
                                      steps_per_epoch=num_batches,
                                      validation_steps=1,
                                      **space)

    # Record what produced these losses alongside the losses themselves
    losses['model'] = str(Model)
    losses['space'] = space
    losses['features'] = 'test.p'
    with open('logs/experiments/test_{}.json'.format(round(time())), 'w') as f:
        # default=str stringifies values that json cannot serialize,
        # so the file holds a JSON object rather than one quoted string
        json.dump(losses, f, default=str)
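The class weights and the number of batches in the cell above are simple arithmetic, which the following sketch works through on toy labels. The label counts and the batch size are assumed values chosen for illustration, not taken from the tutorial data.

```python
from math import ceil

import pandas as pd

# Toy labels (assumed): 18 intact patches, 2 destroyed ones.
destroyed = pd.Series([0] * 18 + [1] * 2)

# Same rule as in the tutorial cell: each class is weighted by the
# frequency of the other class, so the rare class gets the larger weight.
positive_rate = destroyed.mean()
class_weight = {0: positive_rate, 1: 1 - positive_rate}

# Total weight contributed by each class once the weights are applied.
weight_mass = destroyed.map(class_weight).groupby(destroyed).sum()

# Number of batches needed to cover every row once.
batch_size = 8
num_batches = ceil(len(destroyed) / batch_size)
```

With this rule the two classes contribute the same total weight (18 × 0.1 and 2 × 0.9 are both 1.8), which is how the weighting compensates for class imbalance. The ceil call rounds up so that the last, partially filled batch is still counted.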



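Note: the cell above calls time() when writing the results, so the import cell must be run first. Otherwise it fails with NameError: name 'time' is not defined.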