# Evaluation Tool

This is a set of tools for getting up to speed quickly when delivering Visual Recognition projects. It provides helpers that simplify the training, testing and evaluation of classifiers.
This particular tool helps you automate blind set validation for IBM Watson Visual Recognition classifiers.

## Features
- Automated Classifier Testing
- Persistence of test and result sets

## Image Corpus Layout

Currently, the tooling works with image corpora that are file and folder based. An image corpus can consist of several folders. Each folder represents a class that the respective classifier is able to recognize, and contains all images that will be used to test the classifier on this class.

To get a better understanding of the layout, take a look at this sample folder hierarchy (also contained in this project):

```
 ./corpus
     /mercedes_blindtest
         /sclass
             sclass_1.jpg
             ...
         /negative_examples
             negative_sclass_1.jpg
             ...
```
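
As a rough illustration of this layout, the following sketch counts the images in each class folder of one corpus. The helper name `count_images_per_class` and the list of accepted file extensions are assumptions made for illustration; they are not part of `vrtool`.

```python
import os

# Assumption: these are the image types found in the corpus folders.
IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png')


def count_images_per_class(corpus_path):
    """Return {class_name: image_count} for every class folder in corpus_path."""
    counts = {}
    for class_name in sorted(os.listdir(corpus_path)):
        class_dir = os.path.join(corpus_path, class_name)
        if not os.path.isdir(class_dir):
            continue  # skip stray files next to the class folders
        images = [name for name in os.listdir(class_dir)
                  if name.lower().endswith(IMAGE_EXTENSIONS)]
        counts[class_name] = len(images)
    return counts
```

For the sample hierarchy above, calling `count_images_per_class('./corpus/mercedes_blindtest')` would return one entry for `sclass` and one for `negative_examples`.
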
## Process
1. Prepare your image set: Create a folder in the corpus directory that contains a subfolder for each class of the classifier you want to test. Each subfolder contains the images you want to use for testing.
2. Make sure your config.ini file contains the right API key (either an IAM key or an old-style API key).
3. Set the classifier ID of the classifier you want to test.
4. Run the tests.
5. Evaluate the results.

# Initialization

In [None]:
# import basic libraries
import time
import os
import sys
import pickle
import json
import configparser
import datetime
import numpy as np
import pandas as pd

# import sklearn helpers
from sklearn.model_selection import train_test_split
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import label_binarize
from sklearn import metrics
from numpy import interp  # replaces the deprecated scipy.interp alias

# import custom VR tooling libs
import vrtool

# Configuration


When using this tool for the first time, you'll find a file called **dummy.config.ini** which needs to be copied and renamed to **config.ini**.


Configure *your* tool by entering your IAM API key and URL of the Visual Recognition service instance.
```
[vr]
IAM_API_KEY:your_IAM_api_key
URL:your_service_url
```
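
How `vrtool` reads this file is not shown in this notebook. As a minimal sketch, assuming the section and key names from the snippet above, the values could be loaded with the standard `configparser` module like this. The function name `load_vr_credentials` is hypothetical.

```python
import configparser


def load_vr_credentials(config_path='config.ini'):
    """Return (api_key, url) from the [vr] section of the given config file."""
    # interpolation=None keeps '%' characters in keys or URLs untouched
    config = configparser.ConfigParser(interpolation=None)
    if not config.read(config_path):
        raise FileNotFoundError('Config file not found: {}'.format(config_path))
    section = config['vr']
    # option names are case-insensitive, so the upper-case keys resolve as written
    return section['IAM_API_KEY'], section['URL']
```

Both `:` and `=` are accepted as key/value separators by `configparser`, so the `KEY:value` form used in **config.ini** parses without extra settings.
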

# Corpus Overview & Statistics

The following section provides an overview of the image corpus and its statistics.

In [None]:
# The name of the folder that contains the corpora, currently relative to notebook location
corpora_folder_name = '../corpus'
config_name = 'config.ini'

runner = vrtool.Runner(corpora_folder_name, config_name=config_name)
corpora = runner.get_available_corpora()

# Print a summary of the available corpora in the corpora directory
print()
print('Available image corpora:')
print('\n'.join('{}: {[0]}'.format(*el) for el in enumerate(corpora)))

# Corpus Config
corpus_to_test = 'mercedes_blindtest'

# Statistics
statistics = {}
statistics['corpusname'] = corpus_to_test

In [None]:
# Print a detailed overview of the different classes and their distribution within each corpus
corpora_info = runner.get_corpora_info(corpora)

In [None]:
test_data = [el['image_info'] for el in corpora_info if el['corpus_name'] == corpus_to_test ][0]
negative_test = []
test_data = test_data.groupby('class_name').filter(lambda x: len(x) >= 1)
print(test_data.head())

# Test Classifier 

Performs classifier testing by packaging the image data into several zip files and sending them to the Visual Recognition Service for scoring. 

Main steps:
1. Get the relevant classifier ids to be used for testing
2. Perform the tests
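
How `vrtool` packages the images internally is not shown in this notebook. As a minimal sketch of the general idea, assuming a flat archive layout and a hypothetical batch size of 20 images per zip file, the packaging step could look like this. The function name `zip_in_batches` is illustrative and not part of `vrtool`.

```python
import os
import zipfile


def zip_in_batches(image_paths, output_dir, batch_size=20):
    """Write image_paths into zip files holding at most batch_size images each."""
    os.makedirs(output_dir, exist_ok=True)
    zip_paths = []
    for start in range(0, len(image_paths), batch_size):
        batch = image_paths[start:start + batch_size]
        zip_path = os.path.join(output_dir,
                                'batch_{}.zip'.format(start // batch_size))
        with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as archive:
            for path in batch:
                # store only the file name so the archive has a flat layout;
                # files with identical names would collide in this sketch
                archive.write(path, arcname=os.path.basename(path))
        zip_paths.append(zip_path)
    return zip_paths
```
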


## Select classifier IDs to test

In [None]:
print(json.dumps(runner.vr_instance.list_classifiers().get_result(), indent=4, sort_keys=True))

In [None]:
# set classifier ID
classifier_id = 'CLASSIFIER_ID'

## Perform Tests

Test the classifier based on the experiments defined in the previous steps. This might take a couple of minutes depending on the number of images used for testing.



In [None]:
if len(negative_test) > 0:
    test_data = pd.concat([test_data, negative_test])
    
# perform test
start = datetime.datetime.now()

test_results = runner.test_classifier_with_data_frame(classifier_id, test_data)
end = datetime.datetime.now()

print("Testing finished after:", end - start)


In [None]:
parsed_result = runner.vr_service.parse_img_results(test_results)

# Evaluation

In this section, the classifier performance is analyzed based on the tests that were performed in the previous steps.
A confusion matrix is created to analyze the true and false positives and negatives.
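
To illustrate how a score threshold changes the confusion matrix, here is a small self-contained sketch on made-up data. The column names mirror those used later in this notebook; the toy scores and the helper `confusion_at_threshold` are assumptions for illustration only.

```python
import pandas as pd

# Made-up results: the classifier always answers 'sclass' with varying scores.
toy = pd.DataFrame({
    'class_name': ['sclass', 'sclass', 'negative_examples', 'negative_examples'],
    'predicted_class_1': ['sclass', 'sclass', 'sclass', 'sclass'],
    'predicted_score_1': [0.95, 0.55, 0.80, 0.30],
})


def confusion_at_threshold(df, threshold):
    """Treat predictions scoring below threshold as 'None', then cross-tabulate."""
    out = df.copy()
    out.loc[out['predicted_score_1'] < threshold, 'predicted_class_1'] = 'None'
    return pd.crosstab(out['class_name'], out['predicted_class_1'],
                       rownames=['actual'], colnames=['predicted'])


matrix = confusion_at_threshold(toy, 0.75)
print(matrix)
```

Raising the threshold moves low-scoring predictions into the `None` column, which trades false positives for false negatives.
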

## Load external data set for evaluation
By default, this cell does nothing and the data set that was created in this notebook is used.

You can also use previously created experiment pickle files by setting **USE_EXTERNAL_RESULT_DATA** to **True** and specifying the path to the external result file.

In [None]:
# If False, use result data from the current test run in this notebook
USE_EXTERNAL_RESULT_DATA = False

# Otherwise, external result data (filename.pkl) will be used from the specified path
EXTERNAL_RESULT_PATH='modelconfigurations/YOUR_EVALUATION_FILE.pkl'

if USE_EXTERNAL_RESULT_DATA:
    with open(EXTERNAL_RESULT_PATH,'rb') as f:
        evaluation = pickle.load(f)

In [None]:
if not USE_EXTERNAL_RESULT_DATA:
    # match results against expected classification results
    evaluation = runner.merge_predicted_and_target_labels(test_data, test_results)

    # save evaluation results for further analysis and documentation
    evaluation.to_pickle("modelconfigurations/"+corpus_to_test + "_result_" +time.strftime("%d-%m-%Y-%H-%M-%S")+ ".pkl")

### Save Classification Results to CSV

In [None]:
runner.evaluation_result_to_csv(evaluation, corpus_to_test)

## Plot confusion matrix as table

In [None]:
thresholds = [0.6, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]
pd.options.display.max_colwidth = 600
classification_reports = []
confusion_matrices = []

for threshold in thresholds: 
    ev = evaluation.copy()
    ev.loc[ev['predicted_score_1'] < threshold,'predicted_class_1'] = 'None'
    y_actual, y_pred = runner.get_y_values(ev)
    confusion_matrix = pd.crosstab(y_actual, y_pred)
    confusion_matrices.append((threshold, confusion_matrix))
    print("Overall Accuracy for threshold {0}: {1}".format(threshold ,metrics.accuracy_score(y_actual, y_pred)))
    print("")
    print("Confusion Matrix:")
    print(confusion_matrix)
    classification_report = runner.get_classification_report(y_actual, y_pred)
    classification_reports.append((threshold, classification_report))
    print("")
    print("Classification Report:")
    print(classification_report)
    print('------------------------------------------------------------')

### Save Classification Reports as CSV

In [None]:
runner.classification_reports_to_csv(classification_reports, corpus_to_test)

### Save Confusion Matrix as CSV

In [None]:
runner.confusion_matrix_to_csv(confusion_matrices, corpus_to_test)

## Plot confusion matrix as chart

In [None]:
# extract actual and predicted values from evaluation
y_actual, y_pred = runner.get_y_values(evaluation)

# plot confusion matrix
confmatrix = runner.get_confusion_matrix(y_actual, y_pred)

runner.plot_confusion_matrix(confmatrix, y_actual, y_pred, normalize=True,
                      title='Normalized confusion matrix')

## Create Classification Report
Creates a classification report that includes the most important metrics for each threshold.
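
What `runner.print_classification_report` prints exactly is determined by `vrtool`. As a reminder of the standard definitions that such reports are usually built on, the sketch below computes precision, recall and F1 for one class from plain label lists. The helper name `precision_recall_f1` is hypothetical.

```python
def precision_recall_f1(y_actual, y_pred, label):
    """Return (precision, recall, f1) for a single class label."""
    pairs = list(zip(y_actual, y_pred))
    tp = sum(1 for a, p in pairs if a == label and p == label)  # true positives
    fp = sum(1 for a, p in pairs if a != label and p == label)  # false positives
    fn = sum(1 for a, p in pairs if a == label and p != label)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1
```
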

In [None]:
thresholds = [0.6, 0.7, 0.8, 0.9]

runner.print_classification_report(evaluation, thresholds)

In [None]:
print("Overall Accuracy:",metrics.accuracy_score(y_actual, y_pred))

# Visualize False Positives & False Negatives

In [None]:
import glob
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
%matplotlib inline

threshold = 0.75 
ev = evaluation.copy()
ev.loc[ev['predicted_score_1'] < threshold,'predicted_class_1'] = 'None'

# extract actual and predicted values from evaluation
y_actual, y_pred = runner.get_y_values(ev)

# keep only the misclassified rows (false positives and false negatives)
fpfn = ev[y_actual != y_pred]

image_count = fpfn.shape[0]

fig = plt.figure(figsize=(40,30))

columns = 5
idx = 0

for i, row in fpfn.iterrows():
    image = mpimg.imread(row['image_x'])
    ax = fig.add_subplot(int(image_count / columns + 1), columns, idx + 1)
    ax.set_title("is: "+row['class_name']
                         +"\n pred: "
                         + row['predicted_class_1']
                         +" \n file: "
                         + os.path.basename(row['image_x'])
                         +" \n score: "
                         +str(row['predicted_score_1']), fontsize=25)
    idx = idx +1
    ax.imshow(image, aspect='auto')
    
plt.show()

# Histogram Threshold Performance

In [None]:
result_scores = evaluation['predicted_score_1']

# `normed` was removed from Matplotlib; `density=False` plots raw counts
n, bins, patches = plt.hist(result_scores, 20, density=False, facecolor='green', alpha=0.9)

In [None]:
runner.zip_helper.clean_up()