# Notebook for testing performance of intent classification in Watson Conversation Service
[Watson Developer Cloud](https://www.ibm.com/watsondevelopercloud) is a platform of cognitive services that use machine learning techniques to help partners and clients solve a variety of business problems. Several of the WDC services fall under the **supervised learning** family of machine learning algorithms, that is, algorithms that learn by example. This raises the questions: "How many examples should we provide?" and "When is my solution ready for prime time?"

Training a machine learning solution is an iterative process: you improve the solution by providing new examples, measure the performance of the retrained solution, and repeat. In this notebook, we show how to compute important machine learning metrics (accuracy, precision, recall, and the confusion matrix) to judge the performance of your Watson Conversation service solution. For more details on these metrics, please consult the **[Is Your Chatbot Ready for Prime-Time?](https://developer.ibm.com/dwblog/2016/chatbot-cognitive-performance-metrics-accuracy-precision-recall-confusion-matrix/)** blog post.
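
To make these metrics concrete, here is a minimal pure-Python sketch of how accuracy, precision and recall fall out of the counts behind a confusion matrix. The intent names and the helper `per_class_metrics` are hypothetical and used only for illustration; the notebook itself relies on scikit-learn for these computations.

```python
from collections import Counter

def per_class_metrics(true_labels, predicted_labels):
    """Return overall accuracy and a {label: (precision, recall)} dict."""
    # Each (true, predicted) pair is one cell of the confusion matrix
    pair_counts = Counter(zip(true_labels, predicted_labels))
    labels = sorted(set(true_labels) | set(predicted_labels))
    correct = sum(pair_counts[(label, label)] for label in labels)
    accuracy = correct / len(true_labels)
    metrics = {}
    for label in labels:
        true_pos = pair_counts[(label, label)]
        # Column total: everything the classifier predicted as this label
        predicted_as_label = sum(pair_counts[(other, label)] for other in labels)
        # Row total: everything that truly carries this label
        actually_label = sum(pair_counts[(label, other)] for other in labels)
        precision = true_pos / predicted_as_label if predicted_as_label else 0.0
        recall = true_pos / actually_label if actually_label else 0.0
        metrics[label] = (precision, recall)
    return accuracy, metrics

# Hypothetical intents, for illustration only
true_labels = ['billing', 'billing', 'billing', 'support', 'support']
predicted_labels = ['billing', 'billing', 'support', 'support', 'billing']
accuracy, metrics = per_class_metrics(true_labels, predicted_labels)
print('accuracy:', accuracy)
for label, (precision, recall) in metrics.items():
    print(label, 'precision:', precision, 'recall:', recall)
```

In this toy example 3 of the 5 predictions are correct, so accuracy is 0.6. Precision and recall are both 2/3 for `billing` and both 0.5 for `support`.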


The notebook assumes you have already created a [Watson Conversation Service](https://www.ibm.com/watson/developercloud/conversation.html) instance and trained it on a number of intents.

To use this notebook, you need to provide the following information:
* Credentials for your Watson Conversation instance (URL, username and password)
* Workspace ID for your conversation
* csv file with your text utterances and corresponding intent labels
* results csv file to write the results to
* csv file to write confusion matrix results to

Note that the input test csv file should have a header with the fields **text** and **class**. 
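
As a rough illustration of the expected layout, the sketch below writes two rows with the required **text** and **class** header and reads them back with `csv.DictReader`, the same reader the notebook uses. The utterances and intent names are made up, and an in-memory buffer stands in for a real file.

```python
import csv
import io

# Hypothetical utterances and intent labels, for illustration only
sample_rows = [
    {'text': 'I have a billing question', 'class': 'billing'},
    {'text': 'My router keeps dropping the connection', 'class': 'support'},
]

# Write the rows with the required header (text, class)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['text', 'class'])
writer.writeheader()
writer.writerows(sample_rows)
print(buffer.getvalue())

# Read the rows back keyed by the header fields
buffer.seek(0)
parsed = list(csv.DictReader(buffer))
for row in parsed:
    print(row['class'], '<-', row['text'])
```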

In [None]:
#Import utilities
import json
import csv
from sklearn.metrics import confusion_matrix
from sklearn.metrics import precision_recall_fscore_support
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
import pandas_ml
from pandas_ml import ConfusionMatrix
from watson_developer_cloud import ConversationV1

Provide the path to the parms file, which holds the credentials for your Conversation service, the path to the input
test csv file, and the paths of the output csv files for the results and the confusion matrix.
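
For orientation, here is a sketch of what such a parms file might contain. The key names are the ones this notebook reads. Every value is a placeholder, the helper `missing_keys` is hypothetical, and the bundled example_parms.json may be laid out differently.

```python
import json

# Keys are the ones this notebook reads; every value is a placeholder
example_parms = {
    'url': '<Conversation service URL>',
    'user': '<username>',
    'password': '<password>',
    'workspace_id': '<workspace id>',
    'test_csv_file': '<path to input test csv>',
    'results_csv_file': '<path to output results csv>',
    'confmatrix_csv_file': '<path to output confusion matrix csv>',
}

REQUIRED_KEYS = ['url', 'user', 'password', 'workspace_id',
                 'test_csv_file', 'results_csv_file', 'confmatrix_csv_file']

def missing_keys(parms):
    """Return the required keys that are absent from a loaded parms dict."""
    return [key for key in REQUIRED_KEYS if key not in parms]

# This is the JSON text you would save to the parms file
print(json.dumps(example_parms, indent=2))
print('missing keys:', missing_keys(example_parms))
```

Checking for missing keys right after loading gives a clearer message than a `KeyError` partway through the setup cell.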

In [None]:
# Provide complete path to the file which includes all required parms
# A sample parms file is included (example_parms.json)
convParmsFile = 'PATH to CONVERSATION parms file'
with open(convParmsFile) as parmFile:
    parms = json.load(parmFile)

url=parms['url']
user=parms['user']
password=parms['password']
workspace_id=parms['workspace_id']
test_csv_file=parms['test_csv_file']
results_csv_file=parms['results_csv_file']
confmatrix_csv_file=parms['confmatrix_csv_file']

# Create an object for your Conversation instance
# (url is passed explicitly so the endpoint from the parms file is actually used)
conversation = ConversationV1(
  url=url,
  username=user,
  password=password,
  version='2017-02-03')

Define useful methods to classify using trained Watson Conversation service.
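
The batch method in the next cell keeps only the top-ranked intent of each response. The sketch below isolates that step using hand-written stand-in responses, so it runs without calling the service. It assumes, as the notebook code does, that a response is a dict whose `intents` list is ordered with the best match first. The helper name `top_intent` and the sample values are hypothetical.

```python
def top_intent(response):
    """Return (intent, confidence) for the first listed intent, or ('', 0) if none."""
    intents = response.get('intents') or []
    if intents:
        return intents[0]['intent'], intents[0]['confidence']
    # No intent returned: record an empty label with zero confidence
    return '', 0

# Hand-written stand-ins for service responses (not real output)
matched = {'intents': [{'intent': 'billing', 'confidence': 0.93}]}
unmatched = {'intents': []}

print(top_intent(matched))
print(top_intent(unmatched))
```

An utterance with no returned intent is recorded with an empty label, so it always counts as a misclassification in the metrics.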

In [None]:
# Given a Conversation instance, a workspace ID and a text string, return the Conversation response
def getConversationResponse(conv_instance,workspaceID,string):
    # Each utterance is sent with an empty context so it is classified independently
    context={}
    response = conv_instance.message(
        workspace_id=workspaceID,
        message_input={'text':string},
        context=context)
    return response

# Process multiple text utterances (provided via csv file) in batch. Read the csv file and, for each text
# utterance, get the Conversation response. Aggregate and return the results.
def batchConversation(conv_instance,workspaceID,csvfile):
    test_classes=[]
    convpredict_classes=[]
    convpredict_confidence=[]
    text=[]
    i=0
    with open(csvfile) as infile:
        csvReader=csv.DictReader(infile)
        print('csvfile: ', csvfile)
        for row in csvReader:
            print ('testing row: ', row['text'])
            test_classes.append(row['class'])
            conv_response = getConversationResponse(conv_instance,workspaceID,row['text'])
            #print('response: ', conv_response)
            if conv_response['intents']: 
                convpredict_classes.append(conv_response['intents'][0]['intent'])
                convpredict_confidence.append(conv_response['intents'][0]['confidence'])
            else:
                convpredict_classes.append('')
                convpredict_confidence.append(0)
            text.append(row['text'])
            i = i+1
            if(i%250 == 0):
                print ('Processed ', i, ' records')
        print ('Finished processing ', i, ' records')
    return test_classes, convpredict_classes, convpredict_confidence, text

# Plot confusion matrix as an image
def plot_conf_matrix(conf_matrix):
    plt.figure()
    plt.imshow(conf_matrix)
    plt.show()

# Print confusion matrix to a csv file
def confmatrix2csv(conf_matrix,labels,csvfile):
    # newline='' avoids blank rows between records on Windows (Python 3)
    with open(csvfile, 'w', newline='') as outfile:
        csvWriter = csv.writer(outfile)
        row=list(labels)
        row.insert(0,"")
        csvWriter.writerow(row)
        for i in range(conf_matrix.shape[0]):
            row=list(conf_matrix[i])
            row.insert(0,labels[i])
            csvWriter.writerow(row)
            

In [None]:
# This is an optional step to quickly test the response from Conversation for a given utterance
#testQ='I have a billing question'
#results = getConversationResponse(conversation,workspace_id,testQ)
#print(json.dumps(results, indent=2))

Call Conversation on the specified csv file and collect results.

In [None]:
test_classes,convpredict_classes,convpredict_conf,text=batchConversation(conversation,workspace_id,test_csv_file)

In [None]:
# print results to csv file including original text, the correct label, 
# the predicted label and the confidence reported by Conversation.
csvfileOut=results_csv_file
with open(csvfileOut,'w',newline='') as csvOut:
    outrow=['text','true class','Conversation Predicted class','Confidence']
    csvWriter = csv.writer(csvOut,dialect='excel')
    csvWriter.writerow(outrow)
    for i in range(len(text)):
        outrow=[text[i],test_classes[i],convpredict_classes[i],str(convpredict_conf[i])]
        csvWriter.writerow(outrow)

In [None]:
# Compute confusion matrix
# Sort the labels so the row/column order is the same on every run
labels=sorted(set(test_classes))
conv_confusion_matrix = confusion_matrix(test_classes, convpredict_classes, labels=labels)
convConfMatrix = ConfusionMatrix(test_classes, convpredict_classes)

In [None]:
# Print out confusion matrix with labels to csv file
confmatrix2csv(conv_confusion_matrix,labels,confmatrix_csv_file)

In [None]:
%matplotlib inline
convConfMatrix.plot()

In [None]:
# Compute accuracy of classification
acc=accuracy_score(test_classes, convpredict_classes)
print ('Classification Accuracy: ', acc)

In [None]:
# print precision, recall and f1-scores for the different classes
print(classification_report(test_classes, convpredict_classes, labels=labels))

In [None]:
#Optional if you would like each of these metrics separately
#[precision,recall,fscore,support]=precision_recall_fscore_support(test_classes, convpredict_classes, labels=labels)
#print ('precision: ', precision)
#print ('recall: ', recall)
#print ('f1 score: ', fscore)
#print ('support: ', support)