## Notebook for testing performance of intent classification in Watson Conversation Service
[Watson Developer Cloud](https://www.ibm.com/watsondevelopercloud) (WDC) is a platform of cognitive services that leverage machine learning techniques to help partners and clients solve a variety of business problems. Several of the WDC services fall under the **supervised learning** family of machine learning algorithms, that is, algorithms that learn by example. This raises the questions: "How many examples should we provide?" and "When is my solution ready for prime time?"

Training a machine learning solution is an iterative process: you continually improve the solution by providing new examples and measuring the performance of the trained solution. In this notebook, we show how you can compute important machine learning metrics (accuracy, precision, recall, and the confusion matrix) to judge the performance of your Watson Conversation service solution. For more details on these metrics, please consult the **[Is Your Chatbot Ready for Prime-Time?](https://developer.ibm.com/dwblog/2016/chatbot-cognitive-performance-metrics-accuracy-precision-recall-confusion-matrix/)** blog post.
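
As a rough illustration of what these metrics measure, the following sketch computes accuracy and per-class precision and recall in plain Python. The function name `per_class_metrics` and the toy labels are hypothetical and only serve the example; the notebook itself relies on scikit-learn for these numbers.

```python
from collections import Counter

def per_class_metrics(true_labels, predicted_labels):
    """Return overall accuracy and a dict of per-class precision and recall."""
    pairs = list(zip(true_labels, predicted_labels))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    true_counts = Counter(true_labels)        # how often each class really occurs
    pred_counts = Counter(predicted_labels)   # how often each class was predicted
    correct = Counter(t for t, p in pairs if t == p)
    metrics = {}
    for label in sorted(true_counts):
        # Precision: of all predictions of this class, how many were right
        precision = correct[label] / pred_counts[label] if pred_counts[label] else 0.0
        # Recall: of all true examples of this class, how many were found
        recall = correct[label] / true_counts[label]
        metrics[label] = {"precision": precision, "recall": recall}
    return accuracy, metrics

# Hypothetical toy data, not taken from a real workspace
true_labels = ["billing", "billing", "billing", "support", "support", "cancel"]
predicted_labels = ["billing", "billing", "support", "support", "support", "billing"]
accuracy, metrics = per_class_metrics(true_labels, predicted_labels)
print(accuracy)
print(metrics)
```

In this toy set, `support` has perfect recall (both true examples were found) but lower precision, because one `billing` utterance was also predicted as `support`.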


The notebook assumes you have already created a [Watson Conversation Service](https://www.ibm.com/watson/developercloud/conversation.html) instance and trained it on a number of intents.

To use this notebook, you need to provide the following information:
* Credentials for your Watson Conversation instance (username and password)
* Workspace ID for your Conversation workspace
* CSV file with your text utterances and corresponding intent labels
* Results CSV file to write the results to
* CSV file to write the confusion matrix results to
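
The test file is expected to have two columns, the utterance followed by its intent label. The sketch below shows one way such a file could be produced so that utterances containing commas are quoted correctly. The utterances and intent names are made up for illustration, and an in-memory buffer stands in for a real file.

```python
import csv
import io

# Hypothetical utterances and intent labels; replace with your own test set
rows = [
    ("I have a question about my bill", "billing"),
    ("Hi, can you help me reset my password?", "password_reset"),
]

# Write the rows as CSV; fields containing commas are quoted automatically
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL)
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back yields exactly two columns per row
parsed = list(csv.reader(io.StringIO(csv_text)))
```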

In [None]:
# Only run this cell if you don't have pandas_ml, unicodecsv or watson_developer_cloud installed
!pip install pandas_ml
!pip install unicodecsv
# You can specify the latest version of watson_developer_cloud (1.0.0 as of November 20, 2017)
!pip install -I watson-developer-cloud==1.0.0


In [None]:
#Import utilities
import json
import sys
import codecs
import unicodecsv as csv
from sklearn.metrics import confusion_matrix
from sklearn.metrics import precision_recall_fscore_support
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
import pandas_ml
from pandas_ml import ConfusionMatrix
from watson_developer_cloud import ConversationV1

Provide the path to the parms file, which includes the credentials to access your Conversation service as well as
the paths to the input test csv file and to the output csv files for the results and the confusion matrix.

In [None]:
# Sample parms file data
#{
#	"url": "https://gateway.watsonplatform.net/conversation/api",
#	"user":"YOUR_WCS_USERNAME",
#	"password": "YOUR_WCS_PASSWORD",
#	"workspace_id":"YOUR_WCS_WORKSPACE_ID",
#	"test_csv_file": "COMPLETE_PATH_TO_YOUR_TEST_CSV_FILE",
#	"results_csv_file": "COMPLETE PATH TO RESULTS FILE (any file you can write to)",
#	"confmatrix_csv_file": "COMPLETE PATH TO CONFUSION MATRIX FILE (any file you can write to)"
#}
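
Before calling the service, it can help to check that the parms file contains every key shown in the sample above. The following is a minimal sketch under that assumption; `REQUIRED_KEYS` and `missing_parm_keys` are hypothetical names introduced for this example and are not used elsewhere in the notebook.

```python
import json

# Keys taken from the sample parms file shown above
REQUIRED_KEYS = [
    "url", "user", "password", "workspace_id",
    "test_csv_file", "results_csv_file", "confmatrix_csv_file",
]

def missing_parm_keys(parms):
    """Return the required keys that are absent or empty in the parms dict."""
    return [key for key in REQUIRED_KEYS if not parms.get(key)]

# Hypothetical, deliberately incomplete parms content
sample = json.loads(
    '{"url": "https://gateway.watsonplatform.net/conversation/api",'
    ' "user": "YOUR_WCS_USERNAME"}'
)
missing = missing_parm_keys(sample)
print("Missing keys:", missing)
```

An empty list means the file has everything the later cells read.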


In [None]:
# Provide complete path to the file which includes all required parms
# A sample parms file is included (example_parms.json)
convParmsFile = 'COMPLETE PATH TO YOUR PARM FILE'
parms = ''
with open(convParmsFile) as parmFile:
    parms = json.load(parmFile)

url=parms['url']
user=parms['user']
password=parms['password']
workspace_id=parms['workspace_id']
test_csv_file=parms['test_csv_file']
results_csv_file=parms['results_csv_file']
confmatrix_csv_file=parms['confmatrix_csv_file']

# Create an object for your Conversation instance
# Make sure to update the version to match your WCS workspace
conversation = ConversationV1(
  url=url,
  username=user,
  password=password,
  version='2017-05-26')

Define useful methods to classify using trained Watson Conversation service.

In [None]:
# Given a text string, a Conversation instance and a workspace ID, return the Conversation response
def getConversationResponse(conv_instance,workspaceID,string):
    # Start each request with an empty context so that utterances are classified independently
    context={}
    # remove newlines from input text as that causes WCS to return an error
    string = string.replace("\n","")
    response = conv_instance.message(
        workspace_id=workspaceID, 
        input={'text':string},
        context=context)
    return response

# Process multiple text utterances (provided via csv file) in batch. Effectively, read the csv file and for each text
# utterance, get the Conversation response. Aggregate and return results.
def batchConversation(conv_instance,workspaceID,csvfile):
    test_classes=[]
    convpredict_classes=[]
    convpredict_confidence=[]
    text=[]
    i=0
    print ('reading csv file: ', csvfile)
    with open(csvfile,"rb") as csvfile:
        # For better handling of utf8 encoded text
        csvReader = csv.reader(csvfile, encoding="utf-8-sig")
        for row in csvReader:
            # Assume input text is 2 column csv file, first column is text
            # and second column is the label/class/intent
            # Sometimes, the text string includes commas which may split
            # the text across multiple columns. The following code handles that.
            if len(row) > 2:
                qelements = row[0:len(row)-1]
                utterance = ",".join(qelements)
                test_classes.append(row[len(row)-1])
            else:
                utterance = row[0]
                test_classes.append(row[1])
            utterance = utterance.replace('\r', ' ')
            print ('i: ', i, ' testing row: ', utterance)
          
            conv_response = getConversationResponse(conv_instance,workspaceID,utterance)
            if conv_response['intents']: 
                convpredict_classes.append(conv_response['intents'][0]['intent'])
                convpredict_confidence.append(conv_response['intents'][0]['confidence'])
            else:
                convpredict_classes.append('')
                convpredict_confidence.append(0)
            text.append(utterance)
            i = i+1
            if(i%250 == 0):
                print("")
                print('Processed ', i, ' records')
            if(i%10 == 0):
                sys.stdout.write('.')
        print("")
        print ('Finished processing ', i, ' records')
    return test_classes, convpredict_classes, convpredict_confidence, text

# Plot confusion matrix as an image
def plot_conf_matrix(conf_matrix):
    plt.figure()
    plt.imshow(conf_matrix)
    plt.show()

# Print confusion matrix to a csv file
def confmatrix2csv(conf_matrix,labels,csvfile):
    with open(csvfile, 'wb') as csvfile:
        csvWriter = csv.writer(csvfile)
        row=list(labels)
        row.insert(0,"")
        csvWriter.writerow(row)
        for i in range(conf_matrix.shape[0]):
            row=list(conf_matrix[i])
            row.insert(0,labels[i])
            csvWriter.writerow(row)
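
The column handling inside `batchConversation` can be looked at in isolation. The sketch below restates it as a small helper: everything except the last column is joined back into the utterance, and the last column is taken as the intent label. The helper name `split_utterance_and_label` is hypothetical, and the check for rows with fewer than two columns is an addition that the original loop does not perform.

```python
def split_utterance_and_label(row):
    """Split a parsed csv row into (utterance, label).

    The last column is the label; all earlier columns are rejoined with
    commas in case an unquoted utterance was split across columns.
    """
    if len(row) < 2:
        # Assumption for this sketch: treat such rows as malformed
        raise ValueError("expected at least 2 columns, got %d" % len(row))
    utterance = ",".join(row[:-1])
    label = row[-1]
    return utterance, label

# A well-formed row and a row whose text was split on an unquoted comma
print(split_utterance_and_label(["I have a billing question", "billing"]))
print(split_utterance_and_label(["Hi", " can you help me", "greeting"]))
```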
            

In [None]:
# This is an optional step to quickly test the response from Conversation for a given utterance
#testQ='I have a billing question'
#results = getConversationResponse(conversation,workspace_id,testQ)
#print(json.dumps(results, indent=2))
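
The batch code in this notebook treats the first entry of the `intents` list in the response as the prediction and falls back to an empty label with confidence 0 when the list is empty. The sketch below mirrors that logic on a hand-written dictionary, so it runs without calling the service. The helper name `top_intent` and the sample response are hypothetical, and the response is trimmed to the fields this notebook reads.

```python
def top_intent(response):
    """Return (intent, confidence) for the first intent, or ('', 0) if there is none."""
    intents = response.get('intents') or []
    if not intents:
        return '', 0
    return intents[0]['intent'], intents[0]['confidence']

# Hypothetical response containing only the fields used in this notebook
sample_response = {
    'intents': [{'intent': 'billing', 'confidence': 0.93}],
    'entities': [],
}
print(top_intent(sample_response))
print(top_intent({'intents': []}))
```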

Call Conversation on the specified csv file and collect results.

In [None]:
test_classes,convpredict_classes,convpredict_conf,text=batchConversation(conversation,workspace_id,test_csv_file)

In [None]:
# Print results to csv file including original text, the correct label, 
# the predicted label and the confidence reported by Conversation.
csvfileOut=results_csv_file
csvWriter = codecs.open(csvfileOut, 'w', encoding="utf-8-sig")

csvWriter.write("text"+","+"true class"+","+"Conversation Predicted class"+","+"Confidence")
csvWriter.write("\n")

for i in range(len(text)):
    txt = text[i]
    # Wrap the utterance in double quotes (unless it is already quoted) so that
    # commas inside the text do not split it across columns.
    # The [-1:] slice avoids an IndexError when the text is empty.
    if txt[:1] != '"' or txt.strip()[-1:] != '"':
        txt = "\"" + txt + "\""
    csvWriter.write(txt+","+test_classes[i]+","+convpredict_classes[i]+","+str(convpredict_conf[i]))
    csvWriter.write("\n")
csvWriter.close()
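
Manual quoting does not escape double quotes that appear inside an utterance. As an alternative sketch, assuming Python 3, the standard `csv` module can take care of quoting and escaping. The function name `write_results` is hypothetical, and an in-memory buffer is used here in place of the results file.

```python
import csv
import io

HEADER = ["text", "true class", "Conversation Predicted class", "Confidence"]

def write_results(out, text, true_classes, predicted_classes, confidences):
    """Write one row per utterance; csv.writer quotes and escapes fields as needed."""
    writer = csv.writer(out)
    writer.writerow(HEADER)
    for row in zip(text, true_classes, predicted_classes, confidences):
        writer.writerow(row)

# Hypothetical utterance containing both a comma and double quotes
buffer = io.StringIO()
write_results(buffer, ['He said "cancel it", twice'], ["cancel"], ["cancel"], [0.87])
print(buffer.getvalue())

# Reading the output back recovers the original utterance unchanged
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
```

To write to a real file in Python 3, `open(results_csv_file, 'w', newline='', encoding='utf-8-sig')` could be passed as `out`.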

In [None]:
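# Compute confusion matrix
# Sort the labels so that the row/column order is the same on every run
labels=sorted(set(test_classes))
# Recent scikit-learn versions require labels to be passed as a keyword argument
conv_confusion_matrix = confusion_matrix(test_classes, convpredict_classes, labels=labels)
convConfMatrix = ConfusionMatrix(test_classes, convpredict_classes)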
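
To make the layout of the matrix concrete, here is a small pure-Python sketch that follows the scikit-learn convention of rows for true classes and columns for predicted classes. The function name `build_confusion_matrix` and the toy labels are hypothetical. Unlike the cell above, this sketch builds its label list from both the true and the predicted classes, so an empty prediction (no intent returned) gets its own column.

```python
def build_confusion_matrix(true_labels, predicted_labels):
    """Return (labels, matrix) where matrix[i][j] counts true labels[i] predicted as labels[j]."""
    labels = sorted(set(true_labels) | set(predicted_labels))
    index = {label: position for position, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, predicted_label in zip(true_labels, predicted_labels):
        matrix[index[true_label]][index[predicted_label]] += 1
    return labels, matrix

# Hypothetical toy data; '' stands for "no intent returned"
true_labels = ["billing", "billing", "support", "cancel"]
predicted_labels = ["billing", "support", "support", ""]
labels, matrix = build_confusion_matrix(true_labels, predicted_labels)
print(labels)
for label, row in zip(labels, matrix):
    print(repr(label), row)
```

Correct predictions land on the diagonal; any off-diagonal count identifies a pair of intents that the classifier confuses.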

In [None]:
# Print out confusion matrix with labels to csv file
confmatrix2csv(conv_confusion_matrix,labels,confmatrix_csv_file)

In [None]:
%matplotlib inline
convConfMatrix.plot()

In [None]:
# Compute accuracy of classification
acc=accuracy_score(test_classes, convpredict_classes)
print ('Classification Accuracy: ', acc)

In [None]:
# print precision, recall and f1-scores for the different classes
print(classification_report(test_classes, convpredict_classes, labels=labels))
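
The f1-score in the report is the harmonic mean of precision and recall, so it is high only when both are high. A minimal sketch of that formula follows; the function name `f1_from` is hypothetical, and returning 0.0 when both inputs are zero is a convention chosen for this example.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# A class with perfect recall but only 50% precision
print(f1_from(0.5, 1.0))
```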

In [None]:
#Optional if you would like each of these metrics separately
#[precision,recall,fscore,support]=precision_recall_fscore_support(test_classes, convpredict_classes, labels=labels)
#print ('precision: ', precision)
#print ('recall: ', recall)
#print ('f1 score: ', fscore)
#print ('support: ', support)