# K-Fold Testing Tool

This notebook is part of a toolset for getting Visual Recognition projects up to speed quickly. The toolset provides helpers that simplify training, testing, and evaluating classifiers.
This particular tool automates k-fold cross validation for IBM Watson Visual Recognition.

## Features
- K-Fold Cross Validation
- Persisting of train, test and result sets

## Image Corpus Layout

The tooling currently works with file- and folder-based image corpora. An image corpus consists of one or more folders, each representing a class that the respective classifier will learn to recognize. Each class folder contains all images used to train and test the classifier on that class. If a classifier has only one class, a folder called negative_examples is required as well.

To get a better understanding of the layout, take a look at this sample folder hierarchy (also contained in this project):

```
 ./corpus
     /audi
         /athree
             3_1.jpg
             ...
         /afour
             a4_1.jpg
             ...
     /mercedes_corpus
         /sclass
             sclass_1.jpg
             ...
         /negative_examples
             negative_sclass_1.jpg
             ...
```
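As a quick sanity check of this layout, the following minimal sketch counts the images in each class folder of one corpus. It does not use `vrtool`; the helper name `count_images_per_class` and the list of accepted file extensions are assumptions made for illustration.

```python
import os
import tempfile
from collections import Counter

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # assumption: accepted image types


def count_images_per_class(corpus_dir):
    """Return {class_name: image_count} for one corpus folder."""
    counts = Counter()
    for class_name in sorted(os.listdir(corpus_dir)):
        class_dir = os.path.join(corpus_dir, class_name)
        if not os.path.isdir(class_dir):
            continue  # ignore stray files next to the class folders
        for file_name in os.listdir(class_dir):
            if os.path.splitext(file_name)[1].lower() in IMAGE_EXTENSIONS:
                counts[class_name] += 1
    return dict(counts)


# Build a tiny throwaway corpus that mirrors the sample hierarchy above
with tempfile.TemporaryDirectory() as root:
    demo_classes = {
        "sclass": ["sclass_1.jpg", "sclass_2.jpg"],
        "negative_examples": ["negative_sclass_1.jpg"],
    }
    for class_name, files in demo_classes.items():
        class_dir = os.path.join(root, "mercedes_corpus", class_name)
        os.makedirs(class_dir)
        for file_name in files:
            open(os.path.join(class_dir, file_name), "w").close()

    image_counts = count_images_per_class(os.path.join(root, "mercedes_corpus"))
    print(image_counts)
```

Running a check like this before training makes it easy to spot classes with very few images.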

# Initialization

In [None]:
# import basic libraries
import time
import os
import sys
import pickle
import json
import numpy as np
import pandas as pd

# import sklearn helpers
from sklearn.model_selection import train_test_split
from sklearn.model_selection import StratifiedKFold
from sklearn import metrics

# import custom VR tooling libs
import vrtool

# Configuration


When using this tool for the first time, you'll find a file called **dummy.config.ini** which needs to be copied and renamed to **config.ini**.


Configure *your* tool by entering your IAM API key and URL of the Visual Recognition service instance.
```
[vr]
IAM_API_KEY:your_IAM_api_key
URL:your_service_url
```
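How `vrtool` reads this file is not shown here. Assuming it is a standard INI file, a minimal sketch with Python's built-in `configparser` could look like this (the helper name `read_vr_settings` is hypothetical, and the values are placeholders):

```python
import configparser

# Sample content in the same shape as config.ini (placeholder values only)
SAMPLE_CONFIG = """
[vr]
IAM_API_KEY:your_IAM_api_key
URL:your_service_url
"""


def read_vr_settings(config_text):
    """Parse the [vr] section and return (api_key, url)."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = parser["vr"]
    # Option names are matched case-insensitively by configparser
    return section["IAM_API_KEY"], section["URL"]


api_key, url = read_vr_settings(SAMPLE_CONFIG)
print(api_key, url)
```

`configparser` splits each line at the first `:` or `=`, so a URL value that itself contains `://` is still read as a single value.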

# Corpus Overview & Statistics

The following section provides an overview of the available image corpora and their statistics.

In [None]:
# The name of the folder that contains the corpora in your project
corpora_folder_name = '../corpus'
config_name = 'config.ini'

# Load config and set up tool
runner = vrtool.Runner(corpora_folder_name, config_name=config_name)

# Print a summary of the available corpora in the corpora directory
corpora = runner.get_available_corpora()

print('\nAvailable image corpora:')
print('\n'.join('{}: {[0]}'.format(*el) for el in enumerate(corpora)))

In [None]:
# Print a detailed overview of the different classes and their distribution within each corpus
corpora_info = runner.get_corpora_info(corpora)

# Create Test / Training Sets

In this step, the training and test sets for a specific classifier are created as follows:
1. Determine the corpus to be used by setting the **corpus_to_train** variable to a corpus name in your corpora folder (e.g. bmw)
2. Set the number of splits **k** for K-Fold cross validation
3. Check if **(k-1) * number of images per class > 10**, otherwise the class won't be used for training

In [None]:
# Select the name of the corpus for which a classifier will be trained
corpus_to_train = 'mercedes'

# Number of splits k for K-Fold cross validation
splits = 4

# Select the right corpus based on the value of corpus_to_train and filter out classes
# that have too few images for training (threshold defined in the filter below)
img_info = [el['image_info'] for el in corpora_info if el['corpus_name'] == corpus_to_train ][0]
negative_examples = [el['negative_examples'] for el in corpora_info if el['corpus_name'] == corpus_to_train ][0]
img_info = img_info.groupby('class_name').filter(lambda x: (splits-1)*len(x) >= 70)

print("Classifier to be trained:", corpus_to_train)
print("Classes to be trained:", img_info['class_name'].unique())
print(img_info.class_name.value_counts())

## Create Experiments
Training and testing sets for the Stratified K Fold cross validation are created. The stratification is based on the class_name labels in the data set.
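The internals of `runner.create_experiments` are not shown in this notebook. As a rough illustration of what stratified splitting does, the sketch below uses scikit-learn's `StratifiedKFold` on hypothetical toy data; the file names, class sizes, and the `folds` structure are assumptions.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical toy data: 8 images of class 'sclass', 4 'negative_examples'
image_paths = np.array(["img_{}.jpg".format(i) for i in range(12)])
class_names = np.array(["sclass"] * 8 + ["negative_examples"] * 4)

splits = 4
skf = StratifiedKFold(n_splits=splits, shuffle=True, random_state=42)

folds = []
for train_idx, test_idx in skf.split(image_paths, class_names):
    # Each fold keeps (path, label) pairs for its training and test part
    folds.append({
        "train": list(zip(image_paths[train_idx], class_names[train_idx])),
        "test": list(zip(image_paths[test_idx], class_names[test_idx])),
    })

for fold_no, fold in enumerate(folds):
    test_classes = sorted(set(label for _, label in fold["test"]))
    print(fold_no, len(fold["train"]), len(fold["test"]), test_classes)
```

Because the split is stratified on the labels, every test fold contains both classes in roughly the same ratio as the full data set.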

In [None]:
experiments = runner.create_experiments(splits, img_info)

# Save Dataframes

All training and testing configurations are saved as a pickle file in the **modelconfigurations** folder, referencing the image data used.
This allows the data to be reused for retraining, testing, or further analysis.
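The file naming used by `runner.save_experiments` is not shown here. The sketch below only illustrates the general save-and-reload round trip with `pickle`, using a hypothetical stand-in for the experiments list and a hypothetical timestamped file name written to a temporary folder.

```python
import os
import pickle
import tempfile
import time

# Hypothetical stand-in for the experiments list (one dict per fold)
experiments_demo = [
    {"train": ["a_1.jpg", "a_2.jpg"], "test": ["a_3.jpg"]},
    {"train": ["a_2.jpg", "a_3.jpg"], "test": ["a_1.jpg"]},
]

with tempfile.TemporaryDirectory() as out_dir:
    # Timestamped name so that repeated runs do not overwrite each other
    file_name = "demo_{}_fold_{}.pkl".format(
        len(experiments_demo), time.strftime("%d-%m-%Y-%H-%M-%S"))
    path = os.path.join(out_dir, file_name)

    with open(path, "wb") as fp:
        pickle.dump(experiments_demo, fp)

    with open(path, "rb") as fp:
        restored = pickle.load(fp)

print(restored == experiments_demo)
```

Only load pickle files that come from a trusted source, since unpickling can execute arbitrary code.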

In [None]:
runner.save_experiments(experiments, corpus_to_train, splits)

# Train Classifier

Train the classifier based on the experiments defined in the previous steps. This might take a couple of minutes depending on the number of training images used.

Internally, the method creates batches of images, which are then zipped and sent to the Visual Recognition API for training.
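The batching itself happens inside `vrtool` and is not shown in this notebook. As a minimal sketch of the idea, the following code splits a list of files into fixed-size batches and zips each batch in memory. The function name `make_zip_batches`, the batch size, and the placeholder payloads are assumptions.

```python
import io
import zipfile


def make_zip_batches(files, batch_size):
    """Split (name, bytes) pairs into batches and zip each batch in memory."""
    archives = []
    for start in range(0, len(files), batch_size):
        buffer = io.BytesIO()
        with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
            for name, data in files[start:start + batch_size]:
                archive.writestr(name, data)
        archives.append(buffer.getvalue())
    return archives


# Hypothetical placeholder payloads instead of real image bytes
demo_files = [("img_{}.jpg".format(i), b"fake-image-bytes") for i in range(5)]
zip_batches = make_zip_batches(demo_files, batch_size=2)

# Count the files in each archive to confirm the batching
batch_sizes = [len(zipfile.ZipFile(io.BytesIO(blob)).namelist()) for blob in zip_batches]
print(batch_sizes)
```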

You can also use previously created experiment pickle files to create classifiers by setting **USE_EXTERNAL_EXPERIMENT_DATA** to **True** and specifying the path to the external experiments.

## Load external experiment data sets for training
By default, this cell does nothing and the data set that was created in this notebook is used.

To train classifiers from previously created experiment pickle files instead, set **USE_EXTERNAL_EXPERIMENT_DATA** to **True** and specify the path to the external experiments.

In [None]:
# Default: False -> use experiments created in this notebook
#          True -> use external experiments created earlier
USE_EXTERNAL_EXPERIMENT_DATA = False

# If True, specify external experiment data path (path_to_experiment.pkl)
EXTERNAL_EXPERIMENT_PATH='modelconfigurations/YOUR_EXPERIMENT_FILE.pkl'

if USE_EXTERNAL_EXPERIMENT_DATA:
    with open(EXTERNAL_EXPERIMENT_PATH,'rb') as f:
        experiments = pickle.load(f)

## Start Training

In [None]:
results = runner.train_k_classifiers(experiments, corpus_to_train, splits)

# Test Classifier 

Performs classifier testing by packaging the image data into several zip files and sending them to the Visual Recognition Service for scoring. 

Main steps:
1. Get the relevant classifier ids to be used for testing
2. Perform the tests


## Get classifier Ids
Loads all available classifiers and tries to find the ones that match your corpus_to_train and cross validation folds.

If more than one classifier per cross validation iteration is found, you will see warnings. 
You need to interactively select the classifiers for each iteration.

In [None]:
collision = False
possible_classifiers = runner.vr_service.get_classifier_ids_by_name(corpus_to_train)
# Create empty list for the classifier ids
classifier_ids = [None] * splits

for class_idx,classifier in enumerate(possible_classifiers):
    for idx in range(splits):
        if possible_classifiers[class_idx]['name'] == corpus_to_train+'_'+str(idx):
            if (classifier_ids[idx] is None):
                classifier_ids[idx] = possible_classifiers[class_idx]['classifier_id']
            else:
                collision = True
                print("Found collision for classifier "+corpus_to_train+ " and fold "+str(idx)
                      +". Already got "+ classifier_ids[idx] +", also found "
                      + possible_classifiers[class_idx]['classifier_id'])
            
if(collision):
    print("----------------------------------------------------------------")
    print("Multiple classifier ids for the same corpus and split found. "
          +"Please select and assign the correct classifier ids manually: ")
    print("Fetching possible classifiers...")
    for idx, current_id in enumerate(classifier_ids):
        possible_ids = runner.vr_service.get_classifier_ids_by_name(corpus_to_train+"_"+str(idx))
        print("Possible ids for current iteration:", possible_ids)
        classifier_id = input("Provide Classifier ID for iteration "+str(idx)+":")
        classifier_ids[idx] = classifier_id
        print("Fetching next possible classifier IDs")
    print("Got the following classifier ids which will be used for testing: ", classifier_ids)
    print(' ')

else:
    print("Classifier ID to be used:", classifier_ids)

In [None]:
# Optional: manually override the classifier id of a fold (valid indices: 0 to splits-1)
#classifier_ids[0] = 'mercedes_0123456789'

## Load external experiment data set for testing
By default, this cell does nothing and the data set that was created in this notebook is used.

To test classifiers with previously created experiment pickle files instead, set **USE_EXTERNAL_EXPERIMENT_DATA** to **True** and specify the path to the external experiments.

In [None]:
# If False, use experiments created in previous steps in this notebook
USE_EXTERNAL_EXPERIMENT_DATA = False

# Otherwise, external experiment data (filename.pkl) will be used from the specified path
EXTERNAL_EXPERIMENT_PATH='modelconfigurations/TRAIN_TEST.pkl'

if USE_EXTERNAL_EXPERIMENT_DATA:
    with open(EXTERNAL_EXPERIMENT_PATH,'rb') as f:
        experiments = pickle.load(f)

## Perform Tests

Test the classifier based on the experiments defined in the previous steps. This might take a couple of minutes (**usually 2-5**) depending on the number of images used for testing. 



In [None]:
results = runner.test_k_classifiers(experiments, classifier_ids, splits)

# Evaluation

In this section, the classifier performance is analyzed based on the tests that were performed in the previous steps.
A confusion matrix is created to analyze the true and false positives and negatives.
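How `vrtool` applies a threshold is not shown in this notebook. The sketch below assumes that a predicted class only counts when its score reaches the threshold and that it is otherwise treated as `negative`. It then builds a confusion matrix with `sklearn.metrics.confusion_matrix`. The data and the helper `apply_threshold` are hypothetical.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical test results: (actual class, top predicted class, top score)
demo_results = [
    ("sclass", "sclass", 0.95),
    ("sclass", "sclass", 0.65),
    ("negative", "sclass", 0.75),
    ("negative", "sclass", 0.40),
]
labels = ["sclass", "negative"]


def apply_threshold(results, threshold):
    """Keep the predicted class only if its score reaches the threshold."""
    y_actual = [actual for actual, _, _ in results]
    y_pred = [pred if score >= threshold else "negative"
              for _, pred, score in results]
    return y_actual, y_pred


matrices = {}
for threshold in [0.7, 0.8]:
    y_actual, y_pred = apply_threshold(demo_results, threshold)
    # Rows are actual classes, columns are predicted classes, in `labels` order
    matrices[threshold] = confusion_matrix(y_actual, y_pred, labels=labels).tolist()
    print(threshold, matrices[threshold])
```

Raising the threshold in this toy data removes the false positive at score 0.75, which illustrates why it is useful to compare several thresholds.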

## Load external data set for evaluation
By default, this cell does nothing and the result data from the current test run in this notebook is used.

To evaluate previously saved results instead, set **USE_EXTERNAL_RESULT_DATA** to **True** and specify the path to the external result pickle file.

In [None]:
# If False, use result data from the current test run in this notebook
USE_EXTERNAL_RESULT_DATA = False

# Otherwise, external result data (filename.pkl) will be used from the specified path
EXTERNAL_RESULT_PATH='modelconfigurations/YOUR_EVALUATION_FILE.pkl'

if USE_EXTERNAL_RESULT_DATA:
    with open(EXTERNAL_RESULT_PATH,'rb') as f:
        experiments = pickle.load(f)

In [None]:
# match results against expected classification results
if not USE_EXTERNAL_RESULT_DATA:
    for idx,experiment in enumerate(experiments):
        evaluation = runner.merge_predicted_and_target_labels(experiment['test'], results[idx])
        experiment['evaluation'] = evaluation

    # save evaluation results for further analysis and documentation
    with open("modelconfigurations/"+corpus_to_train+"_result_"+str(splits)+"_fold_" +time.strftime("%d-%m-%Y-%H-%M-%S")+ ".pkl", "wb") as fp:
        pickle.dump(experiments, fp)

## Plot confusion matrix as table

In [None]:
# print a confusion matrix for each experiment (fold) at the given thresholds
thresholds = [0.7, 0.8]

runner.print_experiment_confusion_matrix(experiments, thresholds)

In [None]:
# print a consolidated confusion matrix across all experiments at the given thresholds
thresholds = [0.6, 0.7, 0.8, 0.9]

runner.print_consolidated_experiment_confusion_matrix(experiments, thresholds)

## Create Classification Report
Creates a classification report including the most important metrics
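For orientation, the kind of metrics such a report typically contains (precision, recall, F1 score, and support per class) can be reproduced with `sklearn.metrics.classification_report`. This is a minimal sketch on hypothetical labels; whether `vrtool` uses this function internally is an assumption.

```python
from sklearn.metrics import classification_report

# Hypothetical labels for six test images after thresholding
y_actual_demo = ["sclass", "sclass", "sclass", "negative", "negative", "negative"]
y_pred_demo = ["sclass", "sclass", "negative", "negative", "negative", "sclass"]

# Human-readable table
print(classification_report(y_actual_demo, y_pred_demo))

# The same metrics as a nested dict, convenient for further processing
report = classification_report(y_actual_demo, y_pred_demo, output_dict=True)
print(report["sclass"])
```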

In [None]:
# separate view on iterations
thresholds = [0.7]

runner.print_experiment_classification_report(experiments, thresholds)

In [None]:
# consolidated classification report
thresholds = [0.6, 0.7, 0.8]

runner.print_consolidated_experiment_classification_report(experiments, thresholds)

In [None]:
# Get overall accuracy metric
thresholds = [0.7, 0.8, 0.85, 0.9]

runner.print_consolidated_experiment_metrics(experiments, thresholds)

# Visualize False Positives & False Negatives

In [None]:
import glob
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
%matplotlib inline

threshold = 0.8

for fold_idx, experiment in enumerate(experiments):
    ev = experiment['evaluation'].copy()
    y_actual, y_pred = runner.get_y_values(experiment['evaluation'], threshold)
    
    
    fpfn = ev[ y_actual!= y_pred ]

    image_count = fpfn.shape[0]

    fig = plt.figure(figsize=(40,130))

    columns = 5
    idx = 0

    for i, row in fpfn.iterrows():
        image = mpimg.imread(row['image_x'])
        ax = fig.add_subplot(int(image_count / columns + 1), columns, idx + 1)
        ax.set_title("is: "+row['class_name']
                             +"\n pred: "
                             + row['predicted_class_1']
                             +" \n file: "
                             +row['image_x'].split('/')[-1]
                             +" \n score: "
                             +str(row['predicted_score_1']), fontsize=25)
        idx = idx +1
        ax.imshow(image, aspect='auto')

# Histogram Threshold Performance

In [None]:
scores = []
for idx,experiment in enumerate(experiments):
    score_list = experiment['evaluation']['predicted_score_1']
    scores = scores + list(score_list)

print(len(scores))
n, bins, patches = plt.hist(scores, 20, density=False, facecolor='green', alpha=0.9)

In [None]:
runner.zip_helper.clean_up()