# Train multi-class classifier.

We train a multi-class classifier to compare the $ClassSim$ values computed with it against those from the one-vs-rest (OVR) case.  
As stated in our paper, this requires only a single classifier.

## Set up

In [1]:
import os
import sys

import numpy as np

import pandas as pd
import glob

import warnings
warnings.filterwarnings('ignore')

In [2]:
BASE_MODEL_PATH="trained_model"
%mkdir -p $BASE_MODEL_PATH

In [3]:
SAVE_MODEL_PATH="{}/multiclass".format(BASE_MODEL_PATH)
%mkdir -p $SAVE_MODEL_PATH

## Data preparation

In [4]:
from models.modelutils import dir2filedict, split_fdict

Using TensorFlow backend.


Load category and file path information.

In [5]:
fdict = dir2filedict("data")

In [6]:
categories = sorted(fdict.keys())
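`dir2filedict` comes from this repository's `models.modelutils`, whose source is not shown here. As a rough mental model only, the sketch below assumes it maps each subdirectory of `data` (one per category) to the list of file paths inside it. `dir2filedict_sketch` is a hypothetical name, and sorting the paths is our own assumption.

```python
import glob
import os
import tempfile


def dir2filedict_sketch(root):
    """Hypothetical stand-in for models.modelutils.dir2filedict.

    Assumes one subdirectory per category under `root` and returns
    {category: [file paths]}, with paths built from `root` as given.
    """
    fdict = {}
    for cat_dir in sorted(glob.glob(os.path.join(root, "*"))):
        if not os.path.isdir(cat_dir):
            continue  # ignore stray files at the top level
        cat = os.path.basename(cat_dir)
        # Sorted for determinism; the real helper's ordering is unknown.
        fdict[cat] = sorted(glob.glob(os.path.join(cat_dir, "*")))
    return fdict


# Tiny demo on a throwaway directory tree (file names are made up).
with tempfile.TemporaryDirectory() as demo_root:
    for cat, names in {"clouds": ["0001.jpeg", "0002.jpeg"], "sea": ["0003.jpeg"]}.items():
        os.makedirs(os.path.join(demo_root, cat))
        for name in names:
            open(os.path.join(demo_root, cat, name), "w").close()
    demo = dir2filedict_sketch(demo_root)
    demo_counts = {cat: len(paths) for cat, paths in demo.items()}
```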

Split the data into {train, validation, test} datasets.

In [7]:
trdict, testdict = split_fdict(fdict, test_size=0.2, random_state = 123)

In [8]:
trdict, valdict = split_fdict(trdict, test_size=0.2, random_state = 456)
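Splitting twice with `test_size=0.2` leaves about 0.8 × 0.8 = 64% of the images for training, 0.8 × 0.2 = 16% for validation and 20% for testing, which agrees with the generator counts reported below (7489, 1880 and 2352 of 11721 images). The hypothetical `split_fdict_sketch` shows one way a seeded, per-category split could work; it is not the repository's `split_fdict` and will not reproduce the exact file lists shown here.

```python
import math
import random


def split_fdict_sketch(fdict, test_size=0.2, random_state=None):
    """Hypothetical per-category split; NOT the real models.modelutils.split_fdict.

    Shuffles each category's file list with a seeded RNG and moves
    ceil(test_size * n) files into the test dict, so every class keeps
    roughly the same proportion in both parts.
    """
    rng = random.Random(random_state)
    train, test = {}, {}
    for cat in sorted(fdict):
        paths = list(fdict[cat])
        rng.shuffle(paths)
        n_test = math.ceil(test_size * len(paths))
        test[cat] = paths[:n_test]
        train[cat] = paths[n_test:]
    return train, test


# Applying it twice, as the notebook does, gives a 64% / 16% / 20% split.
toy = {"clouds": ["img{:03d}.jpeg".format(i) for i in range(100)]}
tr, te = split_fdict_sketch(toy, test_size=0.2, random_state=123)
tr, va = split_fdict_sketch(tr, test_size=0.2, random_state=456)
sizes = (len(tr["clouds"]), len(va["clouds"]), len(te["clouds"]))
```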

In [9]:
valdict['clouds'][0:5]

['data/clouds/0678.jpeg',
 'data/clouds/0701.jpeg',
 'data/clouds/0431.jpeg',
 'data/clouds/0033.jpeg',
 'data/clouds/0290.jpeg']

Here are the expected outputs.  
The output may differ if you create the image URLs yourself or exclude some files for GMM, but it must be identical across {*train.ipynb*, *classifier_similarity.ipynb*, *train_multiclass_classifier.ipynb*, *train_second.ipynb*}, so that every notebook uses the same data split. 

['data/clouds/0678.jpeg',  
 'data/clouds/0701.jpeg',  
 'data/clouds/0431.jpeg',  
 'data/clouds/0033.jpeg',  
 'data/clouds/0290.jpeg']

### Copy images files into temporary directories

To feed the datasets to Keras's `ImageDataGenerator` via `flow_from_directory`, the images are copied into temporary directories with one subdirectory per category.

In [10]:
import tempfile
import shutil

In [11]:
tmp_train_dir = tempfile.TemporaryDirectory()
tmp_valid_dir = tempfile.TemporaryDirectory()
tmp_test_dir = tempfile.TemporaryDirectory()

In [12]:
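def copy_images(tmp_dir, data_dict):
    """Copy images into tmp_dir/<category>/<file>, the layout flow_from_directory expects."""
    for cat, img_paths in data_dict.items():
        cat_dir = os.path.join(tmp_dir.name, cat)
        os.makedirs(cat_dir, exist_ok=True)
        for img_path in img_paths:
            # Keep the original file name inside the category subdirectory.
            shutil.copy2(img_path, os.path.join(cat_dir, os.path.basename(img_path)))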

In [13]:
%%time
copy_images(tmp_train_dir, trdict)

CPU times: user 1.09 s, sys: 5.41 s, total: 6.5 s
Wall time: 6.5 s


In [14]:
%%time
copy_images(tmp_valid_dir, valdict)

CPU times: user 264 ms, sys: 1.29 s, total: 1.56 s
Wall time: 1.56 s


In [15]:
%%time
copy_images(tmp_test_dir, testdict)

CPU times: user 344 ms, sys: 1.58 s, total: 1.92 s
Wall time: 2.03 s


### Create ImageDataGenerator

In [16]:
from keras.preprocessing.image import ImageDataGenerator

In [17]:
IMG_SIZE = 256
BATCH_SIZE = 32

In [18]:
TRAIN_DATAGEN = ImageDataGenerator(
        rescale=1./255,
)

TRAIN_GENERATOR = TRAIN_DATAGEN.flow_from_directory(
        directory=tmp_train_dir.name,
        target_size=(IMG_SIZE, IMG_SIZE),
        class_mode='sparse',
        batch_size=BATCH_SIZE,
)

Found 7489 images belonging to 16 classes.


In [19]:
VALID_DATAGEN = ImageDataGenerator(
        rescale=1./255,
)

VALID_GENERATOR = VALID_DATAGEN.flow_from_directory(
        directory=tmp_valid_dir.name,
        target_size=(IMG_SIZE, IMG_SIZE),
        class_mode='sparse',
        batch_size=BATCH_SIZE,
)

Found 1880 images belonging to 16 classes.


In [20]:
TEST_DATAGEN = ImageDataGenerator(
        rescale=1./255,
)

TEST_GENERATOR = TEST_DATAGEN.flow_from_directory(
        directory=tmp_test_dir.name,
        target_size=(IMG_SIZE, IMG_SIZE),
        class_mode='sparse',
        batch_size=1,
)

Found 2352 images belonging to 16 classes.
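`flow_from_directory` treats each subdirectory as a class and assigns label indices in alphanumeric order of the subdirectory names, so `TRAIN_GENERATOR.class_indices` should line up with `categories = sorted(fdict.keys())`. With `class_mode='sparse'` each label is a single integer index rather than a one-hot vector, and `rescale=1./255` scales pixel values into [0, 1]. The NumPy sketch mimics these three conventions on hypothetical category names (only `clouds` actually appears in this notebook).

```python
import numpy as np

# Hypothetical category names; indices follow the sorted subdirectory names.
subdirs = ["sea", "clouds", "forest"]
class_indices = {name: i for i, name in enumerate(sorted(subdirs))}

# class_mode='sparse': one integer label per image ...
sparse_labels = np.array([class_indices[c] for c in ["forest", "sea", "clouds"]])
# ... whereas class_mode='categorical' would give one-hot rows instead.
one_hot = np.eye(len(class_indices))[sparse_labels]

# rescale=1./255 maps uint8 pixel values into [0, 1].
pixels = np.array([0, 128, 255], dtype=np.uint8)
rescaled = pixels * (1.0 / 255)
```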


## Train multi-class classifier and save it

In [21]:
from keras.applications.inception_v3 import InceptionV3
from keras.models import Model, model_from_json
from keras.layers import Dense, GlobalAveragePooling2D
from keras import optimizers

In [22]:
base_model = InceptionV3(weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(TRAIN_GENERATOR.num_classes, activation='softmax')(x)

model = Model(inputs=base_model.input, outputs=predictions)
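The new head replaces InceptionV3's original top: `GlobalAveragePooling2D` averages each channel of the final feature map over its spatial positions, then a 1024-unit ReLU layer and a softmax layer produce one probability per class. The NumPy forward pass below only illustrates the shapes and operations, with random weights and an arbitrary 5 × 5 spatial size; it is not the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

batch, h, w, channels = 2, 5, 5, 2048   # spatial size chosen arbitrarily
num_classes = 16
features = rng.standard_normal((batch, h, w, channels))

# GlobalAveragePooling2D: average each channel over the two spatial axes.
pooled = features.mean(axis=(1, 2))                 # (batch, 2048)

# Dense(1024, relu) followed by Dense(num_classes, softmax), random weights.
W1 = rng.standard_normal((channels, 1024)) * 0.01
b1 = np.zeros(1024)
W2 = rng.standard_normal((1024, num_classes)) * 0.01
b2 = np.zeros(num_classes)

hidden = np.maximum(pooled @ W1 + b1, 0.0)          # ReLU
logits = hidden @ W2 + b2
logits -= logits.max(axis=1, keepdims=True)         # numerical stability
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
```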

In [23]:
for layer in model.layers[:len(base_model.layers)]:
    layer.trainable = False
for layer in model.layers[len(base_model.layers):]:
    layer.trainable = True
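Because the frozen InceptionV3 layers contribute no trainable weights and global average pooling has none of its own, only the two new `Dense` layers are updated. Assuming the 2048-channel output of InceptionV3 without its top and the 16 classes found above, the trainable parameter count works out as below; `model.summary()` should report the same figure as trainable parameters. The flags are set before `model.compile`, which matters because Keras reads `trainable` when the model is compiled.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected (Dense) layer."""
    return n_in * n_out + n_out


feature_channels = 2048   # channels of InceptionV3's final feature map (include_top=False)
num_classes = 16          # from "Found ... images belonging to 16 classes."

# GlobalAveragePooling2D has no weights, so only the two Dense layers train.
trainable = dense_params(feature_channels, 1024) + dense_params(1024, num_classes)
print(trainable)  # 2114576
```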

In [24]:
optimizer = optimizers.Adam(lr=0.001, decay=0.01)
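In Keras 2.x, as we understand its optimizers, a nonzero `decay` applies inverse-time decay per update step, $lr_t = lr / (1 + decay \cdot t)$, where $t$ counts batches rather than epochs. With 234 batches per epoch the learning rate therefore drops quickly. The sketch below (`keras2_decayed_lr` is a hypothetical helper) tabulates the rate at the end of each of the five epochs.

```python
def keras2_decayed_lr(lr, decay, iterations):
    """Learning rate after `iterations` updates under inverse-time decay (Keras 2.x style)."""
    return lr * (1.0 / (1.0 + decay * iterations))


LR, DECAY = 0.001, 0.01           # values passed to optimizers.Adam
steps_per_epoch = 7489 // 32      # 234 updates per epoch in the training cell

# (epoch, updates so far, learning rate at the end of that epoch)
schedule = [
    (epoch, epoch * steps_per_epoch, keras2_decayed_lr(LR, DECAY, epoch * steps_per_epoch))
    for epoch in range(1, 6)
]
for epoch, updates, lr in schedule:
    print("epoch {}: {} updates, lr = {:.2e}".format(epoch, updates, lr))
```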

In [25]:
model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=["accuracy"])
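`sparse_categorical_crossentropy` takes integer labels, matching `class_mode='sparse'`, and penalizes the negative log of the probability the model assigns to the true class; the `accuracy` metric counts how often the arg-max class equals the label. A minimal NumPy sketch on two made-up predictions:

```python
import numpy as np


def sparse_categorical_crossentropy(y_true, probs, eps=1e-7):
    """Mean of -log(probability assigned to the true class), clipped to avoid log(0)."""
    probs = np.clip(probs, eps, 1.0 - eps)
    return float(-np.log(probs[np.arange(len(y_true)), y_true]).mean())


def sparse_accuracy(y_true, probs):
    """Fraction of rows whose arg-max class equals the integer label."""
    return float((probs.argmax(axis=1) == y_true).mean())


probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])
y_true = np.array([0, 1])
loss = sparse_categorical_crossentropy(y_true, probs)   # -(log 0.7 + log 0.3) / 2
acc = sparse_accuracy(y_true, probs)                    # first row right, second wrong
```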

Execute training.

In [26]:
model.fit_generator(
    generator=TRAIN_GENERATOR
    , steps_per_epoch=TRAIN_GENERATOR.n // BATCH_SIZE 
    , epochs=5
    , verbose=1
    , validation_data=VALID_GENERATOR
    , validation_steps=VALID_GENERATOR.n // BATCH_SIZE
)
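`steps_per_epoch` and `validation_steps` use floor division, so each epoch draws 234 full batches (7,488 images, one short of the training set) and each validation pass scores 58 batches (1,856 of the 1,880 validation images). Rounding up instead would include the final, smaller batch. The arithmetic:

```python
import math

BATCH_SIZE = 32
n_train, n_valid = 7489, 1880   # from the "Found ... images" messages

train_steps = n_train // BATCH_SIZE        # floor division, as in the fit call
valid_steps = n_valid // BATCH_SIZE
seen_train = train_steps * BATCH_SIZE      # images drawn per epoch
seen_valid = valid_steps * BATCH_SIZE      # images scored per validation pass

# Rounding up would cover every image (the last batch is then smaller).
full_train_steps = math.ceil(n_train / BATCH_SIZE)
full_valid_steps = math.ceil(n_valid / BATCH_SIZE)
```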

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f9138c46080>

Save the trained classifier.

In [30]:
import json

In [31]:
model.save_weights('{}/multiclass.h5'.format(SAVE_MODEL_PATH))
with open("{}/multiclass.json".format(SAVE_MODEL_PATH), 'w') as f:
    json.dump(json.loads(model.to_json()), f) # model.to_json() is a STRING of json
with open("{}/multiclass-labels.json".format(SAVE_MODEL_PATH), 'w') as f:
    json.dump(TRAIN_GENERATOR.class_indices, f)
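The labels file stores the `{class name: index}` mapping, which is needed to turn a softmax output index back into a category name. The sketch below shows that round trip with hypothetical class names and a temporary directory. To rebuild the network itself, one would read `multiclass.json` back as text, pass it to `keras.models.model_from_json`, and call `load_weights` on the result with the `.h5` path.

```python
import json
import os
import tempfile

# Hypothetical mapping in the same {name: index} shape as TRAIN_GENERATOR.class_indices.
class_indices = {"clouds": 0, "forest": 1, "sea": 2}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "multiclass-labels.json")
    with open(path, "w") as f:
        json.dump(class_indices, f)
    with open(path) as f:
        loaded = json.load(f)

# Invert to a list so that labels[i] names softmax output i.
labels = [name for name, _ in sorted(loaded.items(), key=lambda kv: kv[1])]
predicted_index = 2             # e.g. the arg-max of one prediction
predicted_label = labels[predicted_index]
```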

## Evaluate the trained model on a simple classification task

We evaluate the trained classifier on the 16-class classification task using the test dataset.  
This evaluation is not part of the experiments in our paper.

In [32]:
%%time

model.evaluate_generator(
    TEST_GENERATOR
    , steps=TEST_GENERATOR.n
)

CPU times: user 5min 51s, sys: 17.3 s, total: 6min 8s
Wall time: 2min 39s


[0.73933519242161094, 0.73979591836734693]

The two values are the test loss (left) and the test accuracy (right).
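Because `TEST_GENERATOR` uses `batch_size=1` and the evaluation runs `TEST_GENERATOR.n` steps, each of the 2,352 test images is scored once, so the reported accuracy should be a whole number of correct predictions divided by 2,352. A quick check:

```python
n_test = 2352                        # TEST_GENERATOR.n
reported_accuracy = 0.73979591836734693

# With batch_size=1 and steps=n, every test image is scored exactly once,
# so accuracy = (correct predictions) / n_test.
n_correct = round(reported_accuracy * n_test)
recomputed = n_correct / n_test
```

The reported accuracy corresponds to 1,740 correct predictions out of 2,352.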