# Notebook for testing performance of NLU Sentiment
[Watson Developer Cloud](https://www.ibm.com/watsondevelopercloud) is a platform of cognitive services that leverage machine learning techniques to help partners and clients solve a variety of business problems.

Training a machine learning solution is an iterative process: you improve the solution by providing new examples and measuring the performance of the trained solution after each change. This notebook shows how to compute common machine learning metrics (accuracy, precision, recall, and the confusion matrix) to judge the performance of your solution. For more details on these metrics, see the **[Is Your Chatbot Ready for Prime-Time?](https://developer.ibm.com/dwblog/2016/chatbot-cognitive-performance-metrics-accuracy-precision-recall-confusion-matrix/)** blog post.
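As a rough illustration of what these metrics measure, the following sketch computes them by hand for a handful of labels. The label lists are made up for illustration only; the notebook itself relies on scikit-learn for the real computation.

```python
from collections import Counter

# Hypothetical true and predicted sentiment labels (made up for illustration)
y_true = ["positive", "positive", "negative", "neutral", "negative", "positive"]
y_pred = ["positive", "neutral", "negative", "neutral", "positive", "positive"]

# Count (true, predicted) pairs; this is the confusion matrix in sparse form
pair_counts = Counter(zip(y_true, y_pred))
labels = sorted(set(y_true) | set(y_pred))

# Accuracy: fraction of utterances whose predicted label matches the true label
accuracy = sum(pair_counts[(l, l)] for l in labels) / float(len(y_true))

def precision_recall(label):
    """Return (precision, recall) for one label."""
    tp = pair_counts[(label, label)]
    predicted = sum(pair_counts[(t, label)] for t in labels)  # column total
    actual = sum(pair_counts[(label, p)] for p in labels)     # row total
    precision = tp / float(predicted) if predicted else 0.0
    recall = tp / float(actual) if actual else 0.0
    return precision, recall

print("accuracy:", accuracy)
for label in labels:
    p, r = precision_recall(label)
    print(label, "precision:", p, "recall:", r)
```

Precision answers "of the utterances predicted as this label, how many were right?", while recall answers "of the utterances that truly have this label, how many were found?".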


The notebook assumes you have already created a Watson [Natural Language Understanding](https://www.ibm.com/watson/services/natural-language-understanding/) instance.

To use this notebook, you need to provide the following information:
* Credentials for your NLU instance (username and password)
* A CSV file with your text utterances and the corresponding sentiment labels (positive, negative, neutral)
* Output CSV files to write the per-utterance results and the confusion matrix to
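
For reference, here is a minimal sketch of the expected input layout, assuming one utterance per row with the text in the first column and the label in the last. The sample rows are made up, and the standard `csv` module with an in-memory buffer is used only to keep the sketch self-contained.

```python
import csv
import io

# Made-up sample rows: utterance text first, sentiment label last
rows = [
    ["it was great talking to your agent", "positive"],
    ["I waited an hour, and nobody answered", "negative"],
    ["my order number is 12345", "neutral"],
]

# Write the rows; csv.writer quotes any field that contains a comma
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())

# Read them back: each row splits into exactly [text, label]
buffer.seek(0)
for text, label in csv.reader(buffer):
    print(label, "<-", text)
```

Quoting utterances that contain commas keeps every row at two columns, so the text and the label stay unambiguous.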

In [None]:
# Only run this cell if you don't have pandas_ml, unicodecsv, or watson_developer_cloud installed
!pip install pandas_ml
!pip install unicodecsv
# You can pin a specific version of watson_developer_cloud (1.0.0 was the latest as of November 20, 2017)
#!pip install -I watson-developer-cloud==1.0.0
# Install the latest watson_developer_cloud Python SDK
!pip install --upgrade watson-developer-cloud

In [None]:
#Import utilities
import json
import sys
import codecs
import unicodecsv as csv
from sklearn.metrics import confusion_matrix
from sklearn.metrics import precision_recall_fscore_support
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
import pandas_ml
from pandas_ml import ConfusionMatrix
from watson_developer_cloud import NaturalLanguageUnderstandingV1
# Features are passed to analyze() as a plain dict in this notebook (e.g. {"sentiment": {}}),
# so no separate features import is needed.

Provide the path to the parms file, which holds the credentials for your NLU service, the path to the input test csv file,
and the paths to the output csv files that the results and the confusion matrix are written to.
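
A minimal sketch of loading and checking such a file, assuming the six keys shown in the sample parms data of this notebook. The helper name `load_parms` is hypothetical and not part of the original notebook; the temporary file and its placeholder values stand in for a real parms file.

```python
import json
import os
import tempfile

# Keys the notebook reads from the parms file
REQUIRED_KEYS = ["url", "user", "password", "test_csv_file",
                 "results_csv_file", "confmatrix_csv_file"]

def load_parms(path):
    """Hypothetical helper: load the parms file and fail early on missing keys."""
    with open(path) as parm_file:
        parms = json.load(parm_file)
    missing = [key for key in REQUIRED_KEYS if key not in parms]
    if missing:
        raise KeyError("parms file is missing: " + ", ".join(missing))
    return parms

# Usage with a throwaway file holding placeholder values
sample = {key: "PLACEHOLDER" for key in REQUIRED_KEYS}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump(sample, tmp)
parms = load_parms(tmp.name)
os.remove(tmp.name)
print(sorted(parms))
```

Checking the keys up front surfaces a misnamed entry immediately, rather than partway through a long batch run.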

In [None]:
# Sample parms file data
#{
#	"url": "https://gateway.watsonplatform.net/natural-language-understanding/api/v1",
#	"user":"YOUR_NLU_USERNAME",
#	"password": "YOUR_NLU_PASSWORD",
#	"test_csv_file": "COMPLETE_PATH_TO_YOUR_TEST_CSV_FILE",
#	"results_csv_file": "COMPLETE PATH TO RESULTS FILE (any file you can write to)",
#	"confmatrix_csv_file": "COMPLETE PATH TO CONFUSION MATRIX FILE (any file you can write to)"
#}



In [None]:
# Provide complete path to the file which includes all required parms
# A sample parms file is included (example_parms.json)
nluParmsFile = 'COMPLETE PATH TO YOUR PARMS FILE'
parms = ''
with open(nluParmsFile) as parmFile:
    parms = json.load(parmFile)

url=parms['url']
user=parms['user']
password=parms['password']
test_csv_file=parms['test_csv_file']
results_csv_file=parms['results_csv_file']
confmatrix_csv_file=parms['confmatrix_csv_file']

# Show which parameters were loaded (keys only, so the credentials are not echoed)
print(sorted(parms.keys()))

# Create an object for your NLU instance
natural_language_understanding = NaturalLanguageUnderstandingV1(
    version = '2017-02-27',
    username = user,
    password = password
)


Define useful methods to return sentiment of text using NLU.
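
To make the shape of the data concrete, here is a small sketch of pulling the document-level label and score out of a response. The dictionary below is a hand-written stand-in that mirrors only the fields this notebook reads (`sentiment`, `document`, `label`, `score`); its values are invented, and `extract_sentiment` is a hypothetical helper, not an SDK function.

```python
# Hand-written stand-in for an NLU response; values are invented
mock_response = {
    "sentiment": {
        "document": {"label": "positive", "score": 0.87}
    }
}

def extract_sentiment(response):
    """Hypothetical helper: return (label, score), or ('', 0) if absent."""
    document = response.get("sentiment", {}).get("document", {})
    return document.get("label", ""), document.get("score", 0)

print(extract_sentiment(mock_response))
print(extract_sentiment({}))
```

The empty-string and zero defaults match what the batch function in this notebook records when a response carries no sentiment.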

In [None]:
# Given a text string and an NLU instance, return the NLU sentiment response
def getNLUresponse(nlu_instance,features,string):
    # Use the instance passed in rather than the global object
    nlu_analysis = nlu_instance.analyze(features, string, return_analyzed_text=True)
    return nlu_analysis

# Process multiple text utterances (provided via csv file) in batch. Effectively, read the csv file and for each text
# utterance, get NLU sentiment. Aggregate and return results.
def batchNLU(nlu_instance,features,csvfile):
    test_labels=[]
    nlupredict_labels=[]
    nlupredict_score =[]
    text=[]
    i=0
    print ('reading csv file: ', csvfile)
    with open(csvfile, 'rb') as csvfile:
        # For better handling of utf8 encoded text
        csvReader = csv.reader(csvfile, encoding="utf-8-sig")
        for row in csvReader:
            print(row)
            # Assume the input is a 2-column csv file: the first column is the text
            # and the last column is the label/class/intent.
            # If the text includes unquoted commas, it is split across multiple
            # columns; the following code joins those columns back together.
            if len(row) < 2:
                # Skip blank or malformed rows that have no label column
                continue
            if len(row) > 2:
                utterance = ",".join(row[:-1])
            else:
                utterance = row[0]
            test_labels.append(row[-1])
            utterance = utterance.replace('\r', ' ')
            print ('i: ', i, ' testing row: ', utterance)
            
            nlu_response = getNLUresponse(nlu_instance,features,utterance)
            # Use .get so a response without a 'sentiment' key does not raise KeyError
            if nlu_response.get('sentiment'):
                nlupredict_labels.append(nlu_response['sentiment']['document']['label'])
                nlupredict_score.append(nlu_response['sentiment']['document']['score'])
            else:
                nlupredict_labels.append('')
                nlupredict_score.append(0)
            text.append(utterance)
            
            i = i+1
            if(i%250 == 0):
                print("")
                print("Processed ", i, " records")
            if(i%10 == 0):
                sys.stdout.write('.')
        print("")
        print("Finished processing ", i, " records")
    return test_labels, nlupredict_labels, nlupredict_score, text

# Plot confusion matrix as an image
def plot_conf_matrix(conf_matrix):
    plt.figure()
    plt.imshow(conf_matrix)
    plt.show()

# Print confusion matrix to a csv file
def confmatrix2csv(conf_matrix,labels,csvfile):
    with open(csvfile, 'wb') as csvfile:
        csvWriter = csv.writer(csvfile)
        row=list(labels)
        row.insert(0,"")
        csvWriter.writerow(row)
        for i in range(conf_matrix.shape[0]):
            row=list(conf_matrix[i])
            row.insert(0,labels[i])
            csvWriter.writerow(row)
            

In [None]:
# This is an optional step to quickly test response from NLU for a given utterance
testQ='it was great talking to your agent '
features = {"sentiment":{}}
results = getNLUresponse(natural_language_understanding,features,testQ)
print(json.dumps(results, indent=2))

Call NLU on the specified csv file and collect results.

In [None]:
features = {"sentiment":{}}
test_labels,nlupredict_labels,nlupredict_score,text=batchNLU(natural_language_understanding,features,test_csv_file)

In [None]:
# print results to csv file including original text, the correct label, 
# the predicted label and the score reported by NLU.
csvfileOut=results_csv_file
with open(csvfileOut, 'wb') as csvOut:
    outrow=['text','true label','NLU Predicted label','Score']
    csvWriter = csv.writer(csvOut,dialect='excel')
    csvWriter.writerow(outrow)
    for i in range(len(text)):
        outrow=[text[i],test_labels[i],nlupredict_labels[i],str(nlupredict_score[i])]
        csvWriter.writerow(outrow)

In [None]:
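# Compute confusion matrix
# Sort the labels so the row/column order is stable across runs
labels=sorted(set(test_labels))
# Pass labels by keyword; recent scikit-learn versions do not accept it positionally
nlu_confusion_matrix = confusion_matrix(test_labels, nlupredict_labels, labels=labels)
nluConfMatrix = ConfusionMatrix(test_labels, nlupredict_labels)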

In [None]:
# Print out confusion matrix with labels to csv file
confmatrix2csv(nlu_confusion_matrix,labels,confmatrix_csv_file)

In [None]:
%matplotlib inline
nluConfMatrix.plot()

In [None]:
# Compute accuracy of classification
acc=accuracy_score(test_labels, nlupredict_labels)
print('Classification Accuracy: ', acc)

In [None]:
# print precision, recall and f1-scores for the different classes
print(classification_report(test_labels, nlupredict_labels, labels=labels))

In [None]:
#Optional if you would like each of these metrics separately
#[precision,recall,fscore,support]=precision_recall_fscore_support(test_labels, nlupredict_labels, labels=labels)
#print("precision: ", precision)
#print("recall: ", recall)
#print("f1 score: ", fscore)
#print("support: ", support)