# Notebook for testing performance of intent classification in Watson Conversation Service

Modified from the Conversation Performance Evaluation notebook (https://github.com/joe4k/wdcutils). This version uses pandas dataframes to hold the data sets, and it can load a complete data set and build a Conversation workspace from it.


## Environment Setup

A few dependencies need to be initialized. The notebook uses pandas for dataframes, scikit-learn (sklearn) to generate the confusion matrix and calculate performance metrics, and matplotlib to visualize the confusion matrix.

### Install Libs/Deps
Installs the main dependencies for the notebook. Skip this cell if the dependencies are already installed, and restart the kernel after running it.

In [None]:
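!pip install pandas
!pip install matplotlib
# The code below reads responses as plain dicts (1.x behavior); 2.x returns DetailedResponse objects.
!pip install "watson-developer-cloud<2.0"
# The "sklearn" PyPI name is deprecated; the package is published as scikit-learn.
!pip install scikit-learn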

!pip freeze

### Import Libs/Deps

In [None]:
import json
import os
import time
import configparser
import itertools 
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt

from sklearn.metrics import confusion_matrix
from sklearn.metrics import precision_recall_fscore_support
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from watson_developer_cloud import ConversationV1
from watson_developer_cloud import WatsonException
from watson_developer_cloud import WatsonApiException

### Load Common Functions

A set of functions used in the notebook:
- run_test_set_df : Iterates through the dataframe rows, calls Watson Conversation with each utterance, and returns the utterances, the actual (correct) intents, the predicted intents and the prediction confidences. An utterance for which no intent is returned is recorded as IRRELEVANT with a confidence of 0.
- plot_confusion_matrix : Uses matplotlib to visualize the confusion matrix inline.
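
The prediction rule inside run_test_set_df is small enough to check on its own: take the first (highest-confidence) intent, or fall back to 'IRRELEVANT' with a confidence of 0 when none is returned. A minimal sketch of that rule, assuming the watson-developer-cloud 1.x response shape (a plain dict with an 'intents' list ranked by confidence); top_intent is a hypothetical helper name and the sample responses are hand-written, not real service output.

```python
# Hypothetical helper isolating the top-intent rule used by run_test_set_df.
# Assumes the 1.x response shape: a dict whose 'intents' list is ordered by
# descending confidence.


def top_intent(conv_response, fallback='IRRELEVANT'):
    """Return (intent, confidence) for the highest-ranked intent, or (fallback, 0)."""
    intents = conv_response.get('intents') or []
    if intents:
        return intents[0]['intent'], intents[0]['confidence']
    return fallback, 0


# Hand-written sample responses (not real service output).
matched = {'intents': [{'intent': 'greeting', 'confidence': 0.92},
                       {'intent': 'goodbye', 'confidence': 0.05}]}
unmatched = {'intents': []}

print(top_intent(matched))
print(top_intent(unmatched))
```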

In [None]:
def run_test_set_df(conversation, workspace_id, output_state, test_df):
    """Send each utterance in test_df to the workspace and collect the results.

    test_df must have 'utterance' and 'intent' columns. If output_state is True,
    progress and the latest raw response are printed every 50 records.
    Returns (actual, predicted, confidence, utterance) lists in row order.
    """
    utterance_value = []
    actual_value = []
    predicted_value = []
    predicted_value_confidence = []
    counter = 0
    for test_data_index, test_data_row in test_df.iterrows():
        counter += 1
        utterance = test_data_row['utterance']
        try:
            # alternate_intents=True returns all matching intents ranked by confidence, not just the top one.
            conv_response = conversation.message(workspace_id=workspace_id, input={'text': utterance}, alternate_intents=True)
        except WatsonApiException as we:
            # Show which record failed, then stop the run.
            print("Watson API Exception ", we)
            print(test_data_index)
            print(test_data_row)
            raise

        if output_state and counter % 50 == 0:
            print("=======================================================================")
            print("Processed count: {0}".format(counter))
            print(json.dumps(conv_response, indent=4))
            print("=======================================================================")

        utterance_value.append(utterance)
        actual_value.append(test_data_row['intent'])
        if conv_response['intents']:
            # The first intent has the highest confidence, so it is the prediction.
            predicted_value.append(conv_response['intents'][0]['intent'])
            predicted_value_confidence.append(conv_response['intents'][0]['confidence'])
        else:
            # No intent returned: record the utterance as IRRELEVANT.
            predicted_value.append('IRRELEVANT')
            predicted_value_confidence.append(0)

    print("\nFinished processing dataframe set. {0} records".format(counter))
    return actual_value, predicted_value, predicted_value_confidence, utterance_value

### Adapted from the sklearn example - http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html
def plot_confusion_matrix(conf_matrix, classes=None, normalize=False, title='', cmap=plt.cm.Blues):
    if classes is None:
        classes = list(range(conf_matrix.shape[0]))

    # Normalize before drawing so the colors and the cell text show the same values.
    if normalize:
        row_sums = conf_matrix.sum(axis=1, keepdims=True)
        # Rows with no true samples (e.g. IRRELEVANT) stay at zero instead of NaN.
        conf_matrix = np.divide(conf_matrix.astype('float'), row_sums,
                                out=np.zeros(conf_matrix.shape), where=row_sums != 0)
        print("Normalized confusion matrix")
    else:
        print("Confusion matrix, without normalization")

    plt.figure()
    plt.imshow(conf_matrix, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    # Write each cell's value, switching to white text on dark cells.
    fmt = '.2f' if normalize else 'd'
    thresh = conf_matrix.max() / 2.
    for i, j in itertools.product(range(conf_matrix.shape[0]), range(conf_matrix.shape[1])):
        plt.text(j, i, format(conf_matrix[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if conf_matrix[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    plt.show()
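
With normalize=True, each row of the matrix is divided by its row total, so cell (i, j) becomes the fraction of utterances with true intent i that were predicted as intent j. A standalone sketch of that computation; normalize_rows is a hypothetical name, and the guard that leaves all-zero rows at zero (for example an 'IRRELEVANT' row when no utterance is truly irrelevant) is an assumption added here, not part of the sklearn example.

```python
import numpy as np


def normalize_rows(conf_matrix):
    """Divide each row by its total so cell (i, j) is the share of true-class-i
    utterances predicted as class j. Rows with no samples stay all zero."""
    counts = np.asarray(conf_matrix, dtype=float)
    row_sums = counts.sum(axis=1, keepdims=True)
    # where= skips rows whose total is zero; out= supplies zeros for them.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums != 0)


# Toy matrix (rows = true intent, columns = predicted intent); the last row
# mimics an 'IRRELEVANT' label that never appears as a true intent.
cm = np.array([[3, 1, 0],
               [0, 2, 2],
               [0, 0, 0]])
print(normalize_rows(cm))
```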

### Initialize Variables

Initializes the main parameters for execution. Parameters are read from a configuration file named **notebooks_config.ini**, which should be in the same directory as this notebook. The file should have the following structure:

```
[CONVERSATION_SERVICE]
USERNAME = YOUR_WATSON_CONVERSATION_USERNAME
PASSWORD = YOUR_WATSON_CONVERSATION_PASSWORD
VERSION = 2017-05-26
LANGUAGE = en
WORKSPACE_ID = A WATSON CONVERSATION WORKSPACE ID
WORKSPACE_NAME = A WATSON CONVERSATION WORKSPACE NAME

[DATASET]
ALL_DATA_CSV_FILE = A_DATA_CSV (e.g. /Users/me/alldata.csv)
EXPERIMENT_DATA_CSV_FILE = A_SEPARATE_EXPERIMENT_DATA_CSV (e.g. /Users/me/experiment_data.csv)

[OUTPUT]
BASE_OUTPUT_DIRECTORY = A_DIRECTORY (e.g. /Users/me/tmp/)
```

At a minimum, provide the Conversation service username and password, plus a complete data file, an experiment data file, or both.

Provide a workspace ID along with an experiment data CSV file when you already have a Conversation workspace built and only want to run a performance experiment.

The all_data_csv_file parameter is used to generate separate train and test sets (simple holdout validation) when a Conversation workspace is going to be created and tested. Both CSV files need 'utterance' and 'intent' columns.
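
The configuration is read with Python's configparser, using fallback values for optional settings. A minimal sketch of that behavior on an in-memory sample with placeholder values (not real credentials). One detail worth knowing: an option written as "KEY =" with nothing after it is read as an empty string, not None, so the sketch normalizes empty values to None with "or None" before any "is None" check; the sample_* names are hypothetical.

```python
import configparser

# Placeholder contents for notebooks_config.ini (not real credentials).
SAMPLE_CONFIG = """
[CONVERSATION_SERVICE]
USERNAME = my-username
PASSWORD = my-password
WORKSPACE_ID =

[DATASET]
EXPERIMENT_DATA_CSV_FILE = /Users/me/experiment_data.csv
"""

sample_config = configparser.ConfigParser(allow_no_value=True)
sample_config.read_string(SAMPLE_CONFIG)

# VERSION is absent, so the fallback value is returned.
sample_version = sample_config.get('CONVERSATION_SERVICE', 'VERSION', fallback='2017-05-26')
# WORKSPACE_ID is present but empty: get() returns '', which "or None" turns into None.
sample_workspace_id = sample_config.get('CONVERSATION_SERVICE', 'WORKSPACE_ID', fallback=None) or None
# ALL_DATA_CSV_FILE is absent, so the None fallback is returned.
sample_all_data = sample_config.get('DATASET', 'ALL_DATA_CSV_FILE', fallback=None)
sample_experiment_data = sample_config.get('DATASET', 'EXPERIMENT_DATA_CSV_FILE', fallback=None)

print(sample_version, sample_workspace_id, sample_all_data, sample_experiment_data)
```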


In [None]:
################# Input Variables #################
config_file = os.path.join(os.getcwd(), 'notebooks_config.ini')
config = configparser.ConfigParser(allow_no_value=True)
config.read(config_file)

conversation_username = config.get('CONVERSATION_SERVICE', 'USERNAME')
conversation_password = config.get('CONVERSATION_SERVICE', 'PASSWORD')
conversation_version = config.get('CONVERSATION_SERVICE', 'VERSION', fallback='2017-05-26')
conversation_language = config.get('CONVERSATION_SERVICE', 'LANGUAGE', fallback='en')
# "or None" treats options that are present but left empty as unset.
conversation_workspace_id = config.get('CONVERSATION_SERVICE', 'WORKSPACE_ID', fallback=None) or None
conversation_workspace_name = config.get('CONVERSATION_SERVICE', 'WORKSPACE_NAME', fallback='Test Workspace')
base_output_directory = config.get('OUTPUT', 'BASE_OUTPUT_DIRECTORY', fallback=os.getcwd())
all_data_csv_file = config.get('DATASET', 'ALL_DATA_CSV_FILE', fallback=None) or None
experiment_data_csv_file = config.get('DATASET', 'EXPERIMENT_DATA_CSV_FILE', fallback=None) or None

if all_data_csv_file is None and experiment_data_csv_file is None:
    raise Exception('Need to specify either a full data file for training/testing or an experiment data file. Check your configuration file.')

################# Generated / Internal Variables #################
current_timestamp = time.strftime("%Y%m%d-%H%M%S")
plot_figure_size = (20,20)
experiment_withold_amount = 0.3

if all_data_csv_file is not None:
    # os.path.join works whether or not the directory ends with a separator
    # (the os.getcwd() fallback does not).
    test_data_csv_file = os.path.join(base_output_directory, 'test_data_' + current_timestamp + '.csv')
    train_data_csv_file = os.path.join(base_output_directory, 'train_data_' + current_timestamp + '.csv')
    test_results_csv_file = os.path.join(base_output_directory, 'test_results_data_' + current_timestamp + '.csv')
    test_confusion_matrix_csv_file = os.path.join(base_output_directory, 'test_confusion_matrix_' + current_timestamp + '.csv')
    if conversation_workspace_id is not None:
        print("You have supplied train/test data as well as a workspace id. A new workspace will be created with the training/test data instead of testing against the supplied workspace id.")

if experiment_data_csv_file is not None:
    experiment_results_csv_file = os.path.join(base_output_directory, 'experiment_results_data_' + current_timestamp + '.csv')
    experiment_confusion_matrix_csv_file = os.path.join(base_output_directory, 'experiment_confusion_matrix_' + current_timestamp + '.csv')


conversation_wrapper = ConversationV1(
    username=conversation_username,
    password=conversation_password,
    version= conversation_version
)

%whos str float

## Data Processing

### Setup Train / Test Data
This cell will read the all_data_csv_file into a pandas dataframe, then split the dataset into a train and test set which will be used further down to create a conversation workspace and run a performance experiment.

The two generated datasets are written out as csv files.
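
As called in this notebook, train_test_split draws a plain random holdout, so an intent with only a few examples can land entirely in one split, and each run produces a different split. A minimal sketch of a stratified, reproducible alternative on a hypothetical toy dataset; it assumes every intent has at least two examples and that the test split has at least one row per intent, since train_test_split raises a ValueError with stratify otherwise.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical labeled data: three intents with four utterances each.
toy_df = pd.DataFrame({
    'intent': ['greeting'] * 4 + ['goodbye'] * 4 + ['thanks'] * 4,
    'utterance': ['hi', 'hello', 'hey there', 'good morning',
                  'bye', 'see you', 'later', 'good night',
                  'thanks', 'thank you', 'cheers', 'much appreciated'],
})

# stratify keeps each intent's share the same in both splits;
# random_state makes the split reproducible across runs.
toy_train, toy_test = train_test_split(
    toy_df, test_size=0.25, stratify=toy_df['intent'], random_state=42)

print(toy_train['intent'].value_counts().to_dict())
print(toy_test['intent'].value_counts().to_dict())
```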

In [None]:
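try:
    df = pd.read_csv(all_data_csv_file)
    print("Number of records: {0}".format(len(df.index)))
except (NameError, ValueError, OSError):
    # NameError: the configuration cell was not run. ValueError/OSError: no path
    # was configured, or the file is missing or unreadable.
    print("Error: Setup is incorrect or incomplete.\n")
    raise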

train_data, test_data = train_test_split(df, test_size = experiment_withold_amount)
print("Number of records in training Set: {0}".format(len(train_data)))
print("Number of records in withhold test Set: {0}".format(len(test_data)))


###### Write out test/train data #######
train_data.to_csv(train_data_csv_file, encoding='utf-8')
test_data.to_csv(test_data_csv_file, encoding='utf-8')

### Setup Experiment Data
This cell will read the experiment_data_csv_file into a pandas dataframe, so that it can be used to gather performance metrics against a conversation workspace.

In [None]:
try:
    es_df = pd.read_csv(experiment_data_csv_file)
    print("Number of experiment set records: {0}".format(len(es_df.index)))
except (NameError, ValueError, OSError):
    # NameError: the configuration cell was not run. ValueError/OSError: no path
    # was configured, or the file is missing or unreadable.
    print("Error: Setup is incorrect or incomplete.\n")
    raise


# Alternatively, load the dataframe manually.
#df = pd.DataFrame([
#    { "intent": "intent_value_1", "utterance": "example 1"},
#    { "intent": "intent_value_1", "utterance": "example 2"},
#    { "intent": "intent_value_1", "utterance": "example 3"},
#    { "intent": "intent_value_2", "utterance": "example 4"},
#    { "intent": "intent_value_2", "utterance": "example 5"}
#], columns=['intent', 'utterance'])
#df.head(3)

## Conversation Workspace Setup
**This section is optional.**

Used to create a Watson Conversation workspace from the supplied data. If you already have a Conversation workspace and are only running an experiment, these cells do not need to be run.

### Create Workspace

Takes the training data split and builds the intent structure used to create a Watson Conversation workspace. The workspace contains an intent model built from the supplied data plus a single dialog node that echoes the intents object identified during execution. The dialog node is not critical, but it is included to suppress the warning message that appears when calling a Conversation workspace that has no dialog.

The created workspace id is stored for testing below.
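
The grouping step in the next cell can also be written as a single list comprehension over groupby. A sketch that produces the same list of {'intent': ..., 'examples': [{'text': ...}]} dicts that the cell passes to create_workspace; build_intents and the toy rows are hypothetical.

```python
import pandas as pd


def build_intents(train_df):
    """Group utterances by intent into a list of
    {'intent': name, 'examples': [{'text': utterance}, ...]} dicts."""
    return [
        {'intent': name, 'examples': [{'text': text} for text in group['utterance']]}
        for name, group in train_df.groupby('intent')  # groupby sorts intent names
    ]


# Hypothetical training rows.
toy_train = pd.DataFrame({
    'intent': ['greeting', 'goodbye', 'greeting'],
    'utterance': ['hi', 'bye', 'hello'],
})
print(build_intents(toy_train))
```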

In [None]:
grouped = train_data.groupby('intent')

intents = []
for name, group in grouped:
    examples = []
    for index, row in group.iterrows(): 
        example = { 'text': row['utterance'] }
        examples.append(example)
    
    intent = { 
        'intent': name,
        'examples': examples
    }
    
    intents.append(intent)

dialog_nodes = [
    {
     'dialog_node': 'anything_else',
     'conditions': 'anything_else',
     'parent': None, 
     'previous_sibling': None,
     'output': {'text': {'values': ['<? intents ?>'], 'selection_policy': 'sequential'}}, 
     'context': None,
     'metadata': None,
     'go_to': None
    }
]

response = conversation_wrapper.create_workspace(
    dialog_nodes=dialog_nodes,
    intents=intents,
    language=conversation_language,
    name=conversation_workspace_name
)
conversation_workspace_id = response['workspace_id']
print("Workspace create response:")
print(json.dumps(response, indent=4))
print("\nWorkspace ID: {0}".format(conversation_workspace_id))

### Check Workspace Status

If you created a workspace, wait for training to complete and the model to become available. Re-run this cell until the output shows that the workspace is available; the output should end with:
> Status is: Available
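
Instead of re-running the cell by hand, the status check can be put in a polling loop. A minimal sketch, assuming the workspace status strings 'Available' and 'Failed'; wait_until_available, the 10-second interval and the 10-minute timeout are hypothetical choices. The status is supplied by a callable so the loop can be exercised offline with a stub.

```python
import time


def wait_until_available(get_status, poll_seconds=10, timeout_seconds=600):
    """Poll get_status() until it returns 'Available'.

    Raises RuntimeError if the status becomes 'Failed' and TimeoutError if
    timeout_seconds pass first. Returns the final status string.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        print("Status is: {}".format(status))
        if status == 'Available':
            return status
        if status == 'Failed':
            raise RuntimeError('Workspace training failed')
        if time.monotonic() >= deadline:
            raise TimeoutError('Workspace not available after {} seconds'.format(timeout_seconds))
        time.sleep(poll_seconds)


# Offline demo: a stub that reports 'Training' twice before 'Available'.
stub_statuses = iter(['Training', 'Training', 'Available'])
wait_until_available(lambda: next(stub_statuses), poll_seconds=0)
```

Against the service, the callable could be lambda: conversation_wrapper.get_workspace(workspace_id=conversation_workspace_id, export=False)['status'], assuming the 1.x dict response used elsewhere in this notebook.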


In [None]:
response = conversation_wrapper.get_workspace(workspace_id=conversation_workspace_id, export=False)
print("Workspace details response:")
print(json.dumps(response, indent=4))
print("\nStatus is: {}".format(response['status']))

## Test / Data Experiment

The following cells run the actual test against the Conversation service. If a train/test set was created, run the 'Execute Test Set' cell. If an experiment set was supplied, run the 'Execute Experiment Set' cell.

### Execute Test Set 

Uses the common function to call the conversation workspace using the test set dataframe, creating a confusion matrix with the results.

If output_status is set to True, the function periodically prints the status of the test: every 50 records it shows how many records have been processed and the full Watson Conversation response for the current record. If it is set to False, only a final message with the total number of processed test records is shown.
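
The labels list passed to confusion_matrix fixes the row and column order, and any record whose actual or predicted value is not in that list is left out of the matrix. This is why 'IRRELEVANT' is added to the label names. A small sketch on hypothetical results shows the effect.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical results: the last 'goodbye' utterance got no intent back and was
# recorded as 'IRRELEVANT'.
actual = ['greeting', 'greeting', 'goodbye', 'goodbye']
predicted = ['greeting', 'goodbye', 'goodbye', 'IRRELEVANT']

# Rows are true intents, columns are predicted intents, both in `labels` order.
with_irrelevant = confusion_matrix(actual, predicted,
                                   labels=['greeting', 'goodbye', 'IRRELEVANT'])
# Leaving 'IRRELEVANT' out of `labels` silently drops that record.
without_irrelevant = confusion_matrix(actual, predicted,
                                      labels=['greeting', 'goodbye'])

print(with_irrelevant)
print(without_irrelevant)
```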

In [None]:
output_status = False
actual_vals, predicted_vals, predicted_conf_vals, test_utterance_vals = run_test_set_df(conversation_wrapper, conversation_workspace_id, output_status, test_data)

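## Get Label Names
# Test set intents first, then any predicted intent missing from them, so that no
# prediction is dropped from the confusion matrix. IRRELEVANT is always included.
label_names = test_data['intent'].drop_duplicates().tolist()
label_names += [p for p in dict.fromkeys(predicted_vals) if p not in label_names]
if 'IRRELEVANT' not in label_names:
    label_names.append('IRRELEVANT')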

#SKLearn Confusion Matrix
result_confusion_matrix = confusion_matrix(actual_vals, predicted_vals, labels=label_names)

### Execute Experiment Set

Uses the common function to call the conversation workspace using the experiment dataframe, creating a confusion matrix with the results.

If output_status is set to True, the function periodically prints the status of the experiment: every 50 records it shows how many records have been processed and the full Watson Conversation response for the current record. If it is set to False, only a final message with the total number of processed experiment records is shown.

In [None]:
output_status = False
es_actual_vals, es_predicted_vals, es_predicted_conf_vals, es_test_utterance_vals = run_test_set_df(conversation_wrapper, conversation_workspace_id, output_status, es_df)
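## Get Label Names
# Experiment set intents first, then any predicted intent missing from them, so that
# no prediction is dropped from the confusion matrix. IRRELEVANT is always included.
es_label_names = es_df['intent'].drop_duplicates().tolist()
es_label_names += [p for p in dict.fromkeys(es_predicted_vals) if p not in es_label_names]
if 'IRRELEVANT' not in es_label_names:
    es_label_names.append('IRRELEVANT')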

#SKLearn Confusion Matrix
es_result_confusion_matrix = confusion_matrix(es_actual_vals, es_predicted_vals, labels=es_label_names)

## Print / Visualize Experiment Results

This section will manipulate and visualize the above test/experiment results. The first portion is against the test set results and the second section runs against the experiment set results.

1. The first cell in each section creates a dataframe of the results, with misses flagged, that can be written to a CSV file.
2. The second cell in each section is optional and just outputs the first ten records of the results dataframe, as a sanity check that the labels make sense.
3. The third cell in each section visualizes the confusion matrix in a plot.
4. The final cell in each section computes the performance metrics for the test using the sklearn classification report. *Note: this cell may print warnings if an intent in the label list has no actual or no predicted records in the results (for example, IRRELEVANT), because precision or recall is undefined for that intent.*
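
The notebook imports precision_recall_fscore_support but does not use it. It returns the same per-intent precision, recall, F1 and support as classification_report, but as arrays, which makes it easy to collect the metrics in a dataframe for sorting or export. A sketch on hypothetical results, assuming scikit-learn 0.22 or later for the zero_division argument, which reports undefined metrics as 0.0 instead of emitting the warning mentioned above.

```python
import pandas as pd
from sklearn.metrics import precision_recall_fscore_support

# Hypothetical results.
actual = ['greeting', 'greeting', 'goodbye', 'goodbye']
predicted = ['greeting', 'goodbye', 'goodbye', 'IRRELEVANT']
labels = ['greeting', 'goodbye', 'IRRELEVANT']

# zero_division=0 reports 0.0 for undefined metrics, e.g. recall for
# 'IRRELEVANT', which never occurs as a true intent here.
precision, recall, f1, support = precision_recall_fscore_support(
    actual, predicted, labels=labels, zero_division=0)

metrics = pd.DataFrame({'precision': precision, 'recall': recall,
                        'f1': f1, 'support': support}, index=labels)
print(metrics)
```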

### Test Set Results

In [None]:
## Results as dataframe
test_results = pd.DataFrame({
     'Utterance': test_utterance_vals,
     'Actual': actual_vals,
     'Predicted': predicted_vals,
     'Confidence' : predicted_conf_vals
    }, columns=['Utterance','Actual','Predicted','Confidence', 'Missed'])
test_results['Missed'] = test_results.apply(lambda x : 'X' if x['Actual'] != x['Predicted'] else '', axis=1)

In [None]:
#OPTIONAL CELL - JUST A SANITY CHECK TO MAKE SURE THE RESULTS CAPTURED MAKE SENSE.
print("Test Set Sample Output: ")
test_results.head(10)

In [None]:
## Plot/Visualize the confusion matrix
%matplotlib inline
mpl.rcParams['figure.figsize'] = plot_figure_size
plot_confusion_matrix(result_confusion_matrix, classes=label_names, title='Intent confusion matrix')

In [None]:
# Compute and print the performance metrics (accuracy, precision, recall, etc) of the classification test
acc = accuracy_score(actual_vals, predicted_vals)
print("Classification Overall Accuracy: {0}\n".format(acc))
print(classification_report(actual_vals, predicted_vals, labels=label_names))

### Experiment (Blind) Set Results

In [None]:
## Results as dataframe
es_test_results = pd.DataFrame({
     'Utterance': es_test_utterance_vals,
     'Actual': es_actual_vals,
     'Predicted': es_predicted_vals,
     'Confidence' : es_predicted_conf_vals
    }, columns=['Utterance','Actual','Predicted','Confidence', 'Missed'])
es_test_results['Missed'] = es_test_results.apply(lambda x : 'X' if x['Actual'] != x['Predicted'] else '', axis=1)

In [None]:
print("Experiment Set Sample Output: ")
es_test_results.head(10)

In [None]:
## Plot/Visualize Data
%matplotlib inline
mpl.rcParams['figure.figsize'] = plot_figure_size
plot_confusion_matrix(es_result_confusion_matrix, classes=es_label_names, title='Intent confusion matrix')

In [None]:
# Compute and print the performance metrics (accuracy, precision, recall, etc.) of the experiment classification
es_acc = accuracy_score(es_actual_vals, es_predicted_vals)
print("Classification Overall Accuracy: {0}\n".format(es_acc))
print(classification_report(es_actual_vals, es_predicted_vals, labels=es_label_names))

## Export Results

Writes the results to CSV files: one file with the utterance, actual intent, predicted intent, confidence and miss flag for each record, and another with the confusion matrix.
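
The export cells write every record. When only the misclassified utterances are needed for review, one option is to filter on the 'Missed' column before writing. A sketch on a hypothetical results frame with the same columns as test_results, written to an in-memory buffer in place of a file path; np.where is used here as a vectorized alternative to the row-wise apply that builds 'Missed'.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical results frame with the same columns the notebook builds.
results = pd.DataFrame({
    'Utterance': ['hi', 'see you', 'bye for now'],
    'Actual': ['greeting', 'goodbye', 'goodbye'],
    'Predicted': ['greeting', 'greeting', 'goodbye'],
    'Confidence': [0.93, 0.41, 0.88],
})
results['Missed'] = np.where(results['Actual'] != results['Predicted'], 'X', '')

# Keep only misclassified rows; index=False omits the dataframe index column.
buffer = io.StringIO()  # stands in for a file path such as test_results_csv_file
results[results['Missed'] == 'X'].to_csv(buffer, index=False)
print(buffer.getvalue())
```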


In [None]:
## Write Test Set Results to File
test_results.to_csv(test_results_csv_file, encoding='utf-8')

tmp_df_out = pd.DataFrame(data=result_confusion_matrix, index= label_names, columns=label_names)
tmp_df_out.to_csv(test_confusion_matrix_csv_file, encoding='utf-8')

In [None]:
## Write Experiment (Blind) Set Results to File
es_test_results.to_csv(experiment_results_csv_file, encoding='utf-8')

tmp_df_out = pd.DataFrame(data=es_result_confusion_matrix, index= es_label_names, columns=es_label_names)
tmp_df_out.to_csv(experiment_confusion_matrix_csv_file, encoding='utf-8')