# Watson Assistant Performance Metrics [Client Name]

This notebook performs analytics on the user log records of Watson Assistant (including Voice Interaction). A variety of key business metrics (listed below in Table of Contents) are calculated and displayed in the notebook. The data and metrics are also saved to CSV and can be used for building visualizations in a Cognos Dashboard in Watson Studio. 

Logs can also be exported to a spreadsheet in Section 3.0 for blind testing, which helps improve the performance of your virtual agent.

### Table of Contents
* [1.0 Configuration and Log Collection](#config)
* [2.0 Key Performance Metrics](#performance-metrics)
    * [2.1 Core Metrics](#core-metrics)
        * [2.1.1 Abandonment at Greeting](#abandonment)
        * [2.1.2 Coverage Metric](#coverage-metric)
        * [2.1.3 Escalation Metric](#escalation-metric)
        * [2.1.4 Active Users](#active-users)
        * [2.1.5 Top Intents & Average Confidence Scores](#top-intents-scores)
        * [2.1.6 Top Entities](#top-entities)
        * [2.1.7 Optional: Bilingual Assistants](#bilingual-assistants)
    * [2.2 Voice Interaction Metrics](#voice-metrics)
        * [2.2.1 Containment Rate](#containment-rate)
        * [2.2.2 Active Callers](#active-callers)
        * [2.2.3 SMS Sent](#sms-sent)
    * [2.3 Custom Metrics](#custom-metrics)
        * [2.3.1 Context Variable Count](#context-variable-count)
        * [2.3.2 Response Mentions](#response-mentions)
    * [2.4 Export to CSV](#export-to-csv)
* [3.0 Collecting data for blind testing or new ground truth](#blind-testing)

## Housekeeping <a class="anchor" id="housekeeping"></a>
This section will import libraries and dependencies for this notebook. 

**Action Required:** 
- Update the `project_id` and `project_access_token` in order to access your data assets.
- Upload `getAllLogs.py` and `extractConversations.py` into your project's assets. They can be found at https://github.com/cognitive-catalyst/WA-Testing-Tool/tree/master/log_analytics

In [None]:
# @hidden_cell
# The project token is an authorization token that is used to access project resources like data sources, connections, and used by platform APIs.
from project_lib import Project
project = Project(project_id='XXXXXXXXXXXXXX', project_access_token='XXXXXXXXXXXXXX')
pc = project.project_context


In [None]:
# Import dependencies. Ensure these are loaded into your Studio assets.
# Copy each helper script from the project assets to the local file system so it can be imported.
for fname in ("getAllLogs.py", "extractConversations.py"):
    with open(fname, "wb") as fobj:
        fobj.write(project.get_file(fname).read())

In [None]:
%load_ext autoreload
%autoreload 2

import warnings
warnings.simplefilter("ignore")

!pip install ibm-watson
!pip install --user --upgrade "plotly_express==0.4.0";
!pip install --user --upgrade "matplotlib==3.2.1";
!pip install squarify

import json
import pandas as pd
import getAllLogs
import extractConversations
import plotly.express as px
import plotly.graph_objects as go
import seaborn as sns
import matplotlib.pyplot as plt 

In [None]:
# Custom functions to re-use code throughout notebook
def turn_dict_to_df(df,col_names):
    df = pd.DataFrame.from_dict(df)
    df.reset_index(level=0, inplace=True)
    df.columns = col_names
    return df

## 1.0 Configuration and log collection <a class="anchor" id="config"></a>
The next few cells require some configuration.  Review the variables and update them for your specific assistant.  The comments in the cells guide you in the configuration.

In [None]:
# Action Required: Define the customer name. This prefix will be used for saving CSV & JSON files.
custName = 'XXXXXXX'

### 1.1 Option One: Retrieve logs from the Watson Assistant instance
This option will allow you to retrieve the logs from the Assistant API.

- **For Chat** solutions using an Assistant layer (e.g. web chat), set `workspace_id=None` and provide `assistant_id` filter
- **For Voice Interaction** solutions, define `workspace_id` and leave `assistant_id` filter blank/commented out

Update `log_filter` for any other filters, e.g. update `response_timestamp` if you wish to limit the amount of data retrieved.
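
As a minimal sketch of how such a filter string might be assembled, the helper below joins individual clauses with commas, mirroring the `log_filter` value used in the next cell. The function name `build_log_filter` and its parameters are illustrative assumptions and are not part of `getAllLogs.py`.

```python
def build_log_filter(language="en", since=None, assistant_id=None):
    """Join filter clauses with commas (hypothetical helper, not part of getAllLogs.py)."""
    clauses = ["language::{}".format(language)]
    if since:
        # Limit the amount of data retrieved, e.g. since="2020-05-15"
        clauses.append("response_timestamp>={}".format(since))
    if assistant_id:
        # Only needed for chat solutions that use an Assistant layer
        clauses.append("request.context.system.assistant_id::{}".format(assistant_id))
    return ",".join(clauses)

log_filter_example = build_log_filter(since="2020-05-15")
print(log_filter_example)
```

Building the string from optional parts avoids hand-editing quotes and commas each time a filter is added or removed.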

In [None]:
# Extract logs from your assistant. Comment entire cell out if you are using existing logs in next step.

# The API key, URL, and workspace ID can be found on the "View API Details" page
iam_apikey='XXXXXXXXXXXXXX' # Update this

#url pattern depends on region and when it was created (update one to match your instance)
url="https://api.us-south.assistant.watson.cloud.ibm.com"
#url="https://api.us-east.assistant.watson.cloud.ibm.com"

workspace_id='XXXXXXXXXXXXXX' # Update or set to None without quotes

# Filter API is described at: https://cloud.ibm.com/docs/assistant?topic=assistant-filter-reference#filter-reference
log_filter="language::en,response_timestamp>=2020-05-15" \
# +",request.context.system.assistant_id::<<assistant_id>>" # If using this, uncomment and replace <<assistant_id>>

#Change the number of logs retrieved, default settings will return 100,000 logs (200 pages of 500)
page_size_limit=500
page_num_limit=200

#WA API version
version="2018-09-20" 

rawLogsJson = getAllLogs.getLogs(iam_apikey, url, workspace_id, log_filter, page_size_limit, page_num_limit, version)
rawLogsPath= custName + "_logs.json"

# getAllLogs.writeLogs(rawLogsJson, rawLogsPath) # Saves the logs locally
project.save_data(file_name = rawLogsPath,data = json.dumps(rawLogsJson),overwrite=True); # Saves the logs in Studio/COS
print('\nSaved log data to {}'.format(rawLogsPath))

### 1.2 Option Two: Load logs from JSON file
If you have previously saved the JSON file, you can uncomment this section to load the logs. Otherwise, comment this section out and continue.

In [None]:
# #If you have previously stored your logs on the file system, you can reload them here by uncommenting these lines
# rawLogsPath= custName+"_logs.json"
# rawLogsJson = extractConversations.readLogs(rawLogsPath)

### 1.3 Format logs
Update these fields by following the instructions in the comments.

In [None]:
# Define the conversation correlation field name for your Watson Assistant records.
# Provide the field name as it appears in the log payload (default is 'response.context.conversation_id')
# For a single-skill assistant use 'response.context.conversation_id'
# For a Voice Gateway/Voice Agent assistant use 'request.context.vgwSessionID'
# For a multi-skill assistant you will need to provide your own key
primaryLogKey = "response.context.conversation_id"

# Name of the correlating key as it appears in the data frame columns (remove 'response.context.')
conversationKey='conversation_id'

# Optionally provide a comma-separated list of custom fields you want to extract, in addition to the default fields
#customFieldNames = "response.context.STT_CONFIDENCE,response.context.action,response.context.vgwBargeInOccurred"
customFieldNames = "response.context.vgwSIPFromURI,response.context.vgwSessionID,request.context.vgwSMSFailureReason,\
request.context.vgwSMSUserPhoneNumber,response.context.user_record.calling_about_child,response.context.user_record.covidExposure,\
response.context.user_record.covidSymptoms,response.output.vgwAction.parameters.transferTarget,response.context.language,\
response.context.metadata.user_id"


allLogsDF = extractConversations.extractConversationData(rawLogsJson, primaryLogKey, customFieldNames)
conversationsGroup = allLogsDF.groupby(conversationKey,as_index=False)

print("Total log events:",len(allLogsDF))
allLogsDF.head()

In [None]:
allLogsDF.columns

In [None]:
# Splits the response_timestamp into month, day, and year fields that can be used for Cognos visualizations
allLogsDF["full_date"] = pd.to_datetime(allLogsDF["response_timestamp"])
allLogsDF['month'] = allLogsDF['full_date'].dt.month
allLogsDF['year'] = allLogsDF['full_date'].dt.year
allLogsDF['day'] = allLogsDF['full_date'].dt.day

# 2.0 Key Performance Metrics <a class="anchor" id="performance-metrics"></a>
The notebook will calculate various performance metrics including `coverage` and `containment`. Basic volume metrics will also be provided.

- **2.1 Core Metrics:** These are conversational metrics that apply to both chat and voice solutions.
- **2.2 Voice Interaction Metrics:** Additional measurements for voice solutions including phone calls, call transfers, unique caller IDs, etc.
- **2.3 Custom metrics:** Other ad-hoc analysis. Requires knowledge of Python.
- **2.4 Export to CSV:** Save total logs, key metrics, and uncovered messages to CSV.

## 2.1 Core Metrics <a class="anchor" id="core-metrics"></a>
These metrics apply to all Watson Assistant solutions. For voice solutions, additional metrics are in the next section.

In [None]:
# Let's define a dict{} that we will send to CSV for use in Watson Studio Cognos Dashboard
metrics_dict = {}

# These should match the count in the Watson Assistant Analytics tooling.
totalConvs = len(allLogsDF[conversationKey].unique())
print("Total messages:", len(allLogsDF))
print("Total conversations:", totalConvs)

### 2.1.1 Abandonment at Greeting <a class="anchor" id="abandonment"></a>
- Log events with a blank `input.text` are the greeting messages. This should equal total conversations.
- Filtering out these messages reveals how many conversations were abandoned before the first user utterance.
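
The idea can be illustrated on a small, made-up data set. This is only a sketch: the conversation IDs and utterances below are invented, and `toy_logs` stands in for `allLogsDF`.

```python
import pandas as pd

# Invented log events: every conversation starts with a blank greeting input.
toy_logs = pd.DataFrame({
    "conversation_id": ["c1", "c1", "c2", "c3", "c3"],
    "input.text": ["", "reset password", "", "", "vgwHangUp"],
})

# Inputs that do not represent a real user utterance
non_user_inputs = ["", "vgwHangUp", "vgwPostResponseTimeout"]
user_msgs = toy_logs[~toy_logs["input.text"].isin(non_user_inputs)]

total_convs = toy_logs["conversation_id"].nunique()
convs_with_user_input = user_msgs["conversation_id"].nunique()
abandoned = total_convs - convs_with_user_input
print("Abandoned at greeting:", abandoned)
```

Here only `c1` contains a real user utterance, so the other two conversations count as abandoned at greeting.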

In [None]:
# Filter out blank inputs and vgwHangUp tags in log events
filteredLogsDF = allLogsDF[allLogsDF['input.text'] != ""]
filteredLogsDF = filteredLogsDF[filteredLogsDF['input.text'] != 'vgwHangUp'] 
filteredLogsDF = filteredLogsDF[filteredLogsDF['input.text'] != 'vgwPostResponseTimeout'] 

filteredMessages = len(filteredLogsDF)
filteredConvs = len(filteredLogsDF[conversationKey].unique())
abandonedGreeting = (totalConvs - filteredConvs)
metrics_dict['abandonedGreeting'] = [abandonedGreeting] # Put into metrics dict

print("Total messages:", filteredMessages)
print("Total conversations:", filteredConvs)
print("Abandoned conversations:", abandonedGreeting)

### 2.1.2 Coverage Metric <a class="anchor" id="coverage-metric"></a>
Coverage is the measurement of the portion of total user messages that your assistant is attempting to respond to. 

- **Action Required:** Define the node_ids in `anything_else_nodes` list that represent any responses for uncovered messages.
- Coverage is then calculated as one minus the ratio of uncovered messages (messages that visited any of these nodes) to total messages.
- This metric filters out Voice Gateway actions such as `vgwHangUp`
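
A small worked example may make the calculation concrete. The node IDs and messages below are invented for illustration, and `toy_logs` stands in for the filtered logs.

```python
import pandas as pd

# Invented log data; "node_ae" plays the role of an anything_else node.
toy_logs = pd.DataFrame({
    "input.text": ["hi", "store hours", "asdf", "refund", "???"],
    "nodes_visited": [["node_1"], ["node_2"], ["node_ae"], ["node_3"], ["node_ae", "node_4"]],
})
toy_anything_else = {"node_ae"}

# A message is uncovered if any node it visited is in the uncovered set.
is_uncovered = toy_logs["nodes_visited"].apply(
    lambda nodes: any(node in toy_anything_else for node in nodes)
)
toy_coverage = 1 - is_uncovered.sum() / len(toy_logs)
print("Coverage: {:.0%}".format(toy_coverage))
```

Two of the five messages reach the uncovered node, so coverage is 1 - 2/5.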

In [None]:
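# Define the node_id for anything_else and other uncovered nodes
anything_else_nodes = ['XXXXXXXXXXXXXX'] 

# Collect the index of each message that visited at least one uncovered node.
# A separate list is used so the configured node IDs are not modified while looping.
uncovered_indices = []
for row in filteredLogsDF.itertuples():
    if any(node in anything_else_nodes for node in row.nodes_visited):
        uncovered_indices.append(row.Index)

uncoveredDF = filteredLogsDF[filteredLogsDF.index.isin(uncovered_indices)]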

print("Uncovered messages:",len(uncoveredDF))
coverageMetric = 1-len(uncoveredDF)/filteredMessages
metrics_dict['coverage'] = [coverageMetric] # Put into metrics dict
print("Coverage metric is",'{:.0%}'.format(coverageMetric))

uncoveredDF = uncoveredDF[['input.text','output.text','intent','intent_confidence']]

uncoveredDF.to_csv(custName + "_uncovered_msgs.csv",index=False, header=['Utterance','Response','Intent','Confidence'])

# Show the 10 uncovered messages with the highest intent confidence
uncoveredDF.sort_values('intent_confidence',ascending=False).head(10)

### 2.1.3 Escalation Metric <a class="anchor" id="escalation-metric"></a>
Escalation is defined as responding with a method to contact a live person (e.g. pointing to a 1-800 number). 
- **Action Required:** Define the node_id for where your assistant responds to an escalation request (e.g. `#General-Agent-Escalation`)
- For Voice Interaction solutions, we calculate `call containment` in the next section by counting the number of call transfers in the logs. 

In [None]:
# Define the escalation node
escalation_node = "XXXXXXXXXXXXXX" 
node_visits_escalated = allLogsDF[[escalation_node in x for x in allLogsDF['nodes_visited']]]

print("Total visits to escalation node:",len(node_visits_escalated))

escalationMetric = len(node_visits_escalated)/filteredMessages
metrics_dict['escalation'] = [escalationMetric] # Put into metrics dict
print("\nEscalation metric is",'{:.0%}'.format(escalationMetric))

### 2.1.4 Active Users <a class="anchor" id="active-users"></a>
How many unique users used the assistant?

In [None]:
uniqueUsers = allLogsDF["metadata.user_id"].nunique()
metrics_dict['uniqueUsers'] = [uniqueUsers] # Put into metrics dict
print('Total unique users: {}'.format(uniqueUsers))

### 2.1.5 Top Intents & Average Confidence Scores <a class="anchor" id="top-intents-scores"></a>

In [None]:
# Using pandas aggregators to count how often each intent is selected and its average confidence
intentsDF = filteredLogsDF.groupby('intent',as_index=False).agg({
   'input.text': ['count'], 
   'intent_confidence': ['mean']
})

intentsDF.columns=["intent","count","confidence"] #Flatten the column headers for ease of use

intentsDF = intentsDF[intentsDF['intent'] !=''] # Remove blanks, usually VGW tags + greetings
intentsDF = intentsDF.sort_values('count',ascending=False)
intentsDF.head(5) # If you want specific number shown, edit inside head(). If you want to show all, remove head() 

In [None]:
ax = sns.barplot(x="count", y="intent", data=intentsDF.head(),orient='h',palette="Blues_d").set_title('Top Intents')

### 2.1.6 Top Entities (Skip for now) <a class="anchor" id="top-entities"></a>

In [None]:
entityDF = allLogsDF[allLogsDF["entities"] != ""]
#intentsDF = intentsDF[intentsDF['intent'] !=''] # Remove blanks, usually VGW tags + greetings
entityDF["entities"].iloc[50]

In [None]:
# ax = sns.lineplot(x="week", y="conversations", data=trendDF,label='',sort=False).set_title('Weekly Conversations')

### 2.1.7 Optional: Bilingual Assistants <a class="anchor" id="bilingual-assistants"></a>
For assistants that use a single skill for two different languages. The skill may set a context variable (e.g. `$language=="english"`) and then respond accordingly based on this variable. This cell will count the unique conversation_ids that have a given context variable.
- **Action Required:** Define the `languageVar` that your skill uses to identify the language used to respond to the user.

In [None]:
languageVar = 'language' # define the context variable that you retrieved above in customFields

languageDF = allLogsDF.groupby([languageVar])["conversation_id"].nunique()
languageDF = turn_dict_to_df(languageDF, ['Context Var', 'Count'])
languageDF = languageDF[languageDF['Context Var'] != '']
languageDF

## 2.2 Voice Interaction Metrics <a class="anchor" id="voice-metrics"></a>
These metrics are for Voice Agent solutions. We start with volume metrics.

In [None]:
uniqueCallers = allLogsDF['vgwSIPFromURI'].unique()
uniqueCalls = allLogsDF['vgwSessionID'].unique()

print("Total phone calls:", len(uniqueCalls)) # It will print '1' if there are no calls found in the logs
print("Total unique callers:", len(uniqueCallers))
print("Average messages per call:", int(len(allLogsDF) / len(uniqueCalls)))

In [None]:
# Filters out blank inputs and vgwHangUp tags in log events
filteredLogsDF = allLogsDF[allLogsDF['input.text'] != ""]
filteredLogsDF = filteredLogsDF[filteredLogsDF['input.text'] != 'vgwHangUp'] 
filteredLogsDF = filteredLogsDF[filteredLogsDF['input.text'] != 'vgwPostResponseTimeout'] 

### 2.2.1 Containment Rate <a class="anchor" id="containment-rate"></a>
How many call transfers did the voice solution perform?
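
Containment rate is one minus the number of call transfers divided by the number of unique calls. A minimal sketch on invented data follows; the session IDs and the transfer target are made up, and the shortened column name `transferTarget` is used only for this illustration.

```python
import pandas as pd

# Invented voice log events: one row per message, blank target means no transfer.
toy_calls = pd.DataFrame({
    "vgwSessionID": ["call_a", "call_a", "call_b", "call_c", "call_c", "call_d"],
    "transferTarget": ["", "", "sip:agent@example.com", "", "", ""],
})

total_calls = toy_calls["vgwSessionID"].nunique()
transfers = int((toy_calls["transferTarget"] != "").sum())
toy_containment = 1 - transfers / total_calls
print("Containment rate: {:.0%}".format(toy_containment))
```

One of the four calls was transferred, so three quarters of the calls were contained by the assistant.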

In [None]:
transfersDF = allLogsDF.groupby(["output.vgwAction.parameters.transferTarget"])["vgwSessionID"].count()
transfersDF = turn_dict_to_df(transfersDF, ['TransferTo', 'Count'])
transfersDF = transfersDF[transfersDF['TransferTo'] != '']

print('Call transfer count:', transfersDF['Count'].sum()) 
containmentRate = 1 - transfersDF['Count'].sum() / len(uniqueCalls)
print('Call containment rate:', '{:.0%}'.format(containmentRate))
metrics_dict['callTransfers'] = [transfersDF['Count'].sum()] # Put into metrics dict
metrics_dict['containment'] = [containmentRate] # Put into metrics dict
transfersDF.sort_values('Count',ascending=False)

### 2.2.2 Active Callers <a class="anchor" id="active-callers"></a>
How many unique caller IDs dialed into the voice solution?

In [None]:
callsDF = allLogsDF.groupby(['vgwSIPFromURI'])['vgwSessionID'].nunique()
callsDF = pd.DataFrame.from_dict(callsDF)
callsDF.reset_index(level=0, inplace=True)
callsDF.columns = ['Caller ID', 'Call Count']
print('Total unique caller IDs:', len(callsDF))
metrics_dict['callerIDs'] = [len(callsDF)] # Put into metrics dict
# Display the caller IDs with the most calls (must be the last expression in the cell to be shown)
callsDF.sort_values('Call Count',ascending=False).head()

### 2.2.3 SMS Sent <a class="anchor" id="sms-sent"></a>
A text message can be sent to the caller and can be initiated from within the Watson Assistant JSON editor. This will count the number of SMS sent.

In [None]:
smsDF = allLogsDF[allLogsDF['vgwSMSUserPhoneNumber'] != '']
metrics_dict['sms'] = [len(smsDF)] # Put into metrics dict
print('Total SMS sent to callers: {}'.format(len(smsDF)))

## 2.3 Custom Metrics <a class="anchor" id="custom-metrics"></a>
This section is optional and can be used to create custom metrics. It requires basic knowledge of Python and Pandas. The two example metrics below can be modified, or additional metrics can be added here. **Jump to section 2.4 if you do not wish to build custom metrics.**

### 2.3.1 Context Variable Count <a class="anchor" id="context-variable-count"></a>
Some use cases require context variables in order to track user inputs. For one customer, the assistant asks a series of questions in order to screen the patient. This section counts the unique conversations for each answer, based on context variables stored within `user_record`.

In [None]:
Q1 = allLogsDF.groupby(["user_record.calling_about_child"])["conversation_id"].nunique()
Q1 = turn_dict_to_df(Q1, ['Yes/No', 'Count'])
Q1['Question'] = 'Q1'
Q1 = Q1[Q1['Yes/No'] != '']

In [None]:
Q2 = allLogsDF.groupby(["user_record.covidExposure"])["conversation_id"].nunique()
Q2 = turn_dict_to_df(Q2, ['Yes/No', 'Count'])
Q2['Question'] = 'Q2'
Q2 = Q2[Q2['Yes/No'] != '']

In [None]:
Q3 = allLogsDF.groupby(["user_record.covidSymptoms"])["conversation_id"].nunique()
Q3 = turn_dict_to_df(Q3, ['Yes/No', 'Count'])
Q3['Question'] = 'Q3'
Q3 = Q3[Q3['Yes/No'] != '']

In [None]:
customVarDF = pd.concat([Q1,Q2,Q3]).reset_index(level=0,drop=True)
customVarDF.to_csv(custName+'_ScreeningCount.csv',index=None)
project.save_data(file_name = custName + "_ScreeningCount.csv",data = customVarDF.to_csv(index=False),overwrite=True) # This saves in COS. Comment out if running locally
customVarDF

### 2.3.2 Response Mentions <a class="anchor" id="response-mentions"></a>
A specific customer wanted to identify all mentions of `311` in the responses to users. 

In [None]:
helpDF = allLogsDF[(allLogsDF['output.text'].str.contains('311')) | (allLogsDF['output.text'].str.contains('3-1-1'))] 
print('Total 3-1-1 response mentions:', len(helpDF))

In [None]:
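# NOTE: `calculate_weekly_conversations` and `week_offset` are not defined in this notebook.
# Define them before uncommenting the lines below.
# newDF = pd.DataFrame(calculate_weekly_conversations(helpDF, week_offset))
# newDF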

In [None]:
# ax = sns.barplot(x="week", y="messages", data=newDF,orient='v', palette="Blues_d")

## 2.4 Export to CSV <a class="anchor" id="export-to-csv"></a>
Save important metrics and data from Pandas dataframes to CSV for use in Cognos Analytics dashboards and for deeper data exploration. The three CSV files will be saved to your Assets folder in your Watson Studio project. The files can be used to build visualizations in Cognos Dashboard.
### 2.4.1 Save all logs to CSV

In [None]:
# allLogsDF.to_csv(custName+'_logs.csv',index=False) # This saves if running notebook locally. Comment out for Studio. 
print('Saving logs to {}'.format(custName+ "_logs.csv"))
project.save_data(file_name = custName + "_logs.csv",data = allLogsDF.to_csv(index=False),overwrite=True) # This saves in COS. Comment out if running locally

### 2.4.2 Save KPIs to CSV

In [None]:
metricsDF = pd.DataFrame(metrics_dict)
# metricsDF.to_csv(custName + "_KeyMetrics.csv",index=False) # This saves if running notebook locally. Comment out for Studio. 
project.save_data(file_name = custName + "_KeyMetrics.csv",data = metricsDF.to_csv(index=False),overwrite=True); # This saves in COS. Comment out if running locally
print('Saving key metrics to {}'.format(custName+ "_KeyMetrics.csv"))
metricsDF

### 2.4.3 Save uncovered messages to CSV
Improve Coverage by analyzing these uncovered messages. This might require adding training data to Intents or customizing STT models.

In [None]:
print('\nSaved', len(uncoveredDF), 'messages to:', custName + "_uncovered_msgs.csv")

project.save_data(file_name = custName + "_uncovered_msgs.csv",data = uncoveredDF.to_csv(index=False),overwrite=True); # This saves in COS. Comment out if running locally

## 3.0 Collecting data for blind testing or new ground truth <a class="anchor" id="blind-testing"></a>
This section is **optional** and is intended for those who wish to perform blind testing on user data. Whether we want to assess the performance of our classifier via a blind test or gather new ground truth training data, we need a quick way to extract what our users say in response to open-ended questions. There are multiple ways to extract these utterances depending on the type of assistant.

Regardless of method the general recipe is:

1. Extract user utterances and intents assigned by Watson Assistant
2. Use SMEs to provide the actual intent of each utterance
3. Assess test performance and update training (e.g., via the [Dialog Skill Analysis notebook](https://medium.com/ibm-watson/announcing-dialog-skill-analysis-for-watson-assistant-83cdfb968178))


### 3.1 Gathering initial user responses via hardcoded dialog turn number
For a single-skill assistant we can use the `dialog_turn_counter` field to extract utterances on a given turn. This field uses a 1-based index, i.e., the first turn is index=1. (Python generally assumes a 0-based index.)

If the user speaks first, search on `USER_FIRST_TURN_COUNTER=1`. If the assistant speaks first, use `USER_FIRST_TURN_COUNTER=2`.


In [None]:
USER_FIRST_TURN_COUNTER=2
userFirstTurnView = filteredLogsDF[filteredLogsDF['dialog_turn_counter']==USER_FIRST_TURN_COUNTER]
userFirstTurnDF = userFirstTurnView[["input.text","intent","intent_confidence"]]

userFirstTurnDF.head()

### 3.2 Write out the user utterances to a file
Dataframes are easily exported to a comma-separated file which is easily imported into Excel and other tools.
For a blind test you need at least the user utterance and the predicted intent.
When SMEs review the intents, deliberately choose one of these two options:

1. Include the predicted intent and let the SME make corrections.  This is the fastest approach but may bias the SMEs towards what was already predicted.
2. Remove the predicted intent.  This is more time-consuming for SMEs but generates unbiased labels.

This file-writing code can be used with any of the "gather response patterns" in this notebook.

In [None]:
# Uncomment ONE of the patterns
# Pattern 1: Write out all utterances and predictions
userFirstTurnDF.to_csv(custName+"_utterances.csv",index=False,header=["Utterance","Predicted Intent", "Prediction Confidence"])

#Pattern 2: Write only the user utterance
# userFirstTurnDF = userFirstTurnView[["input.text"]]
# userFirstTurnDF.to_csv("utterances2.csv",index=False,header=["Utterance"])
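
Once SMEs have reviewed the exported file, the predicted and actual intents can be compared to assess blind test performance. The sketch below uses invented utterances and intents; the `Actual Intent` column is an assumption about how SMEs would record their labels and is not produced by this notebook.

```python
import pandas as pd

# Invented SME-reviewed data; "Actual Intent" is a hypothetical column added by the SMEs.
reviewed = pd.DataFrame({
    "Utterance": ["where is my order", "cancel my plan", "talk to a person", "opening times"],
    "Predicted Intent": ["order_status", "order_status", "agent", "store_hours"],
    "Actual Intent": ["order_status", "cancel", "agent", "store_hours"],
})

# Mark each prediction as correct or not, then summarize overall and per intent.
reviewed["Correct"] = reviewed["Predicted Intent"] == reviewed["Actual Intent"]
accuracy = reviewed["Correct"].mean()
per_intent = reviewed.groupby("Actual Intent")["Correct"].mean()

print("Blind test accuracy: {:.0%}".format(accuracy))
print(per_intent)
```

The per-intent breakdown points to the intents that most need additional training data.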

### End of Notebook v1.3 (last modified on 5-22-20)