# Call Center Instrumentation & Analytics (CCIA)

## Watson-Call-Center-Think18 Lab March 2018

This document provides guidance and background for a hands-on Python + IBM Watson lab presented at the IBM Think 2018 conference in March 2018, and for the IPython / Jupyter notebook and Python code released as open source after the event.

The focus is the Call Center Instrumentation and Analytics (CCIA) pattern. The notebook and the information here are meant to help organizations that are beginning to explore the unstructured "dark data" that arises from phone calls to call centers.

### Why is this useful?
Enterprises spend more than $1 trillion on 250 billion customer service calls each year.  By using multiple IBM Watson "signal services" to extract signal from raw audio data, and then applying data analytics, clustering, unsupervised machine learning, and visualizations, technical teams can use data to understand patterns in call centers. The intended outcome is a positive effect on KPIs and ROI.

### What is the process? And what Watson services are used?
- Step 1 - Speech to Text (STT) – Converts Raw Audio to Transcripts; 
- Step 2 - Natural Language Understanding (NLU) - extracts features such as concepts, entities, keywords, categories/topics, sentiment and emotion; 
- Step 3 - Natural Language Classifier (NLC) - is a user trained classification service, with user defined “ground truth” that classifies text chunks; 
- Step 4 - Tone Analyzer (Tone) – uses linguistic analysis to detect emotional and language tones in written text; 
- Step 5 - Call Center Analytics – analyzes and visualizes the data signal to allow for interpretation of the data and, in some cases, actionable insights; 


### Beginner Audience & Focus on Basics
- This is a beginner lab intended to educate on the fundamentals of getting from data to insights with IBM Watson and open source tools 
- Audience may include IT and operations teams curious about enriching unstructured data – the lab is NOT intended for sophisticated call center technologists 
- Lab/code does NOT purport to compete with expensive and sophisticated solutions already on the market 
- The lab and code cover the basics – to educate on the fundamental plumbing and steps, and to provide a base for instrumentation 

### Success Metrics
If successful – the lab participants or notebook users will
1.	Gain experience in using an IPython / Jupyter notebook
https://ipython.org/notebook.html
2.	Connect to four Watson Developer Cloud ‘signal service’ APIs 
https://www.ibm.com/watson/developer/ 
3.	Connect to IBM Cloud Object storage for data read and write 
https://www.ibm.com/cloud/object-storage 
4.	Understand whether/how the tools and methods might benefit their organization
https://github.com/mamoonraja/call-center-think18/tree/master/notebooks



## Notebook 1 – Speech to Text (STT) & First Contact
## Install Python Dependencies

Python’s standard library is very extensive, offering a wide range of facilities.  It contains built-in modules such as `json`, which handles JSON, a lightweight data interchange format.  https://docs.python.org/2/library/index.html and https://docs.python.org/2/library/json.html
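
As a minimal sketch of the `json` module mentioned above: later cells serialize service responses with `json.dumps` and read them back with `json.loads`. The `record` dictionary here is made up for illustration and is not Watson output.

```python
import json

# Illustrative record only; the field names are assumptions, not a Watson schema.
record = {"file": "sample1.ogg", "duration_minutes": 2, "tags": ["address", "positive"]}

# Serialize to a JSON string (indent=2 for readability).
text = json.dumps(record, indent=2)

# Parse the string back into Python objects.
restored = json.loads(text)
print(restored["tags"])
```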

IBM Watson Developer Cloud has a Python client library to quickly get started with the various Watson APIs services.
https://pypi.python.org/pypi/watson-developer-cloud

Using Python with IBM COS: Python support is provided through the Boto 3 library. The boto3 library provides complete access and can source credentials. The cells below import `ibm_boto3` for this purpose. The IBM COS endpoint must be specified when creating a service resource or low-level client, as shown in the documentation:
https://ibm-public-cos.github.io/crs-docs/python


In [290]:
#imports.... Run this each time after restarting the Kernel
!pip install watson_developer_cloud
import watson_developer_cloud as watson
import json
from botocore.client import Config
import ibm_boto3
import requests
from urllib.request import urlopen 




## Set up Cloud Object Storage
IBM Cloud Object Storage is a highly scalable cloud storage service, designed for high durability, resiliency and security. Store, manage and access your data via our self-service portal and RESTful APIs. Connect applications directly to Cloud Object Storage and use other IBM Cloud Services with your data.  If you are creating an instance outside of the lab, go here to create one: https://console.bluemix.net/catalog/services/cloud-object-storage

Once the Cloud Object Storage instance is created (a precondition for this project), go to the Cloud Object Storage dashboard: log in at http://ibm.com/cloud/ (aka https://console.ng.bluemix.net/ ) and, from the Dashboard, select your Cloud Object Storage instance. This takes you to the service dashboard page.

### Credentials
Credentials are also created for you when you create a project. From the service dashboard page, select `Service Credentials` in the left navigation menu, and copy/paste the credentials below:


In [303]:
# For Cloud Object Storage

credentials_os = {
}



### Bucket name
Buckets are created for you when you create a project. From the service dashboard page, select `Buckets` in the left navigation menu, then copy/paste your bucket name below:


In [304]:
credentials_os['BUCKET'] = '<bucket_name>' # copy bucket name from COS
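
Before moving on, it can help to confirm that the pasted credentials contain the fields that later cells read. The helper below is a hypothetical convenience, not part of the original notebook; the key names are the ones used by `set_up_object_storage` and `analyze_sample` in this notebook.

```python
# Keys that later cells in this notebook read from the credentials dictionary.
REQUIRED_COS_KEYS = ("apikey", "endpoints", "resource_instance_id", "BUCKET")

def missing_cos_keys(credentials):
    """Return the required keys that are absent or empty in `credentials`."""
    return [key for key in REQUIRED_COS_KEYS if not credentials.get(key)]

# Example with a deliberately incomplete dictionary (placeholder values only).
example = {"apikey": "<apikey>", "BUCKET": "<bucket_name>"}
print(missing_cos_keys(example))
```

An empty list means every field the notebook relies on is present.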

### How to get audio files?

You can get audio files from the following sources:
    - https://github.com/mamoonraja/call-center-think18/tree/master/resources/audio_samples
    - https://github.com/rustyoldrake/call_center_instrumentation_analytics/blob/master/CCIA_lab_test_audio.zip

Audio files are also uploaded to a Cloud Object Storage bucket for lab purposes, and you can get them by using the read-only credentials we provide below.

In [305]:
credentials_audio_samples = {
    "apikey": "RRKwTeEGGDJtPDii9SPRxOHooiXZnUdpCRcjnzdPo3uh",
    "endpoints": "https://cos-service.bluemix.net/endpoints",
    "iam_apikey_description": "Auto generated apikey during resource-key operation for Instance - crn:v1:bluemix:public:cloud-object-storage:global:a/8739a0c318b37263a932b45c1947965d:7ce353a1-fa6f-4e25-a311-e29b0b2a8ad8::",
    "iam_apikey_name": "auto-generated-apikey-3e922f8a-7453-4038-8410-5399f9306530",
    "iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Reader",
    "iam_serviceid_crn": "crn:v1:bluemix:public:iam-identity::a/8739a0c318b37263a932b45c1947965d::serviceid:ServiceId-3039af12-8feb-4ef5-8306-d6acd6a0ca31",
    "resource_instance_id": "crn:v1:bluemix:public:cloud-object-storage:global:a/8739a0c318b37263a932b45c1947965d:7ce353a1-fa6f-4e25-a311-e29b0b2a8ad8::",
    "BUCKET": 'audio-samples',
}

credentials_audio_samples['BUCKET'] = 'audio-samples'

## Getting STT Credentials 
Importing Credentials – Each Watson signal service (STT, NLC, NLU and Tone) requires credentials: a username and password.
If you already have an IBM Cloud / Bluemix account, log in at https://console.bluemix.net/ to continue. If you have not yet registered for IBM Cloud, register for a free account at https://www.ibm.com/watson/developer/ (registration takes less than 4 minutes and is free). More information: https://www.ibm.com/watson/developer-resources/ 

Once logged in, go to https://console.bluemix.net/developer/watson/dashboard and browse services for SPEECH TO TEXT, select Details, and create the service from https://console.bluemix.net/catalog/services/speech-to-text (you can select the free LITE plan).
The LITE plan for STT “gets you started with 100 minutes per month at no cost”.

The username and password (and URL) are found by clicking on service credentials and then “view credential”. Copy/paste the credentials below: 

In [306]:
credentials_stt = {
}


In [318]:
# The code was removed by DSX for sharing.

In [319]:
# The code was removed by DSX for sharing.

### Set up Object Storage

In [320]:
def set_up_object_storage(credentials_object_storage):
    """Create an IBM COS client from a service-credentials dictionary."""
    # Fetch the endpoints document that lists the IAM and COS hosts
    endpoints = requests.get(credentials_object_storage['endpoints']).json()

    iam_host = (endpoints['identity-endpoints']['iam-token'])
    cos_host = (endpoints['service-endpoints']['cross-region']['us']['public']['us-geo'])

    # Build the token (auth) URL and the storage service URL
    auth_endpoint = "https://" + iam_host + "/oidc/token"
    service_endpoint = "https://" + cos_host

    # Low-level S3-compatible client authenticated with the IAM API key
    client = ibm_boto3.client(
        's3',
        ibm_api_key_id = credentials_object_storage['apikey'],
        ibm_service_instance_id = credentials_object_storage['resource_instance_id'],
        ibm_auth_endpoint = auth_endpoint,
        config = Config(signature_version='oauth'),
        endpoint_url = service_endpoint
       )
    return client

client = set_up_object_storage(credentials_os)
client_global = set_up_object_storage(credentials_audio_samples)
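
To make the endpoint lookup in `set_up_object_storage` easier to follow, here is a sketch of just the URL-building step, run against a hand-written stand-in for the endpoints document. The nested key path is the one used in the function above; the helper name `build_cos_urls` is hypothetical, and the host names in `fake_endpoints` are placeholders, not real service hosts.

```python
def build_cos_urls(endpoints):
    """Build the IAM token URL and COS service URL from an endpoints document."""
    iam_host = endpoints['identity-endpoints']['iam-token']
    cos_host = endpoints['service-endpoints']['cross-region']['us']['public']['us-geo']
    return "https://" + iam_host + "/oidc/token", "https://" + cos_host

# Placeholder document with the same shape the function above expects.
fake_endpoints = {
    'identity-endpoints': {'iam-token': 'iam.example.test'},
    'service-endpoints': {'cross-region': {'us': {'public': {'us-geo': 'cos.example.test'}}}},
}

auth_url, service_url = build_cos_urls(fake_endpoints)
print(auth_url)
print(service_url)
```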



## Speech to Text

The following cell defines two main methods:
 - `get_transcript()` calls the Speech to Text endpoint and generates a text transcript for a sample audio file.
 - `analyze_sample()` gets the sample object from cloud storage, calls `get_transcript()` to fetch the transcript, and saves the transcript in cloud storage as `<file_name>_text.json`.
 
OGG, WAV, FLAC, L16, MP3 and MPEG formats are options for the IBM Watson STT service.  For the lab we use OGG samples. 

In [321]:
#STT

import json
import io
from os.path import join, dirname
from watson_developer_cloud import SpeechToTextV1

speech_to_text = SpeechToTextV1(
    username = credentials_stt['username'],
    password = credentials_stt['password'],
    url = 'https://stream.watsonplatform.net/speech-to-text/api',
)


# OGG, WAV, FLAC, L16, MP3 and MPEG formats are options for the STT service,
# with narrowband (generally telco) and broadband (e.g. higher quality USB mic) audio.
# For the lab, OGG format was used for the sample files. If you use another audio format, e.g. WAV,
# remember to change content_type='audio/ogg' in the code below.

# Get a transcript (very basic version)
def get_transcript(audio):
    transcript = json.dumps(speech_to_text.recognize(audio=audio, content_type='audio/ogg', timestamps=True,
        word_confidence=True), indent=2)
    return transcript

def download_file(path, filename):
    url = path + filename
    print(url)
    r = requests.get(url, stream=True)
    return r.content

def analyze_sample(sample):
    streaming_body = client_global.get_object(Bucket = credentials_audio_samples['BUCKET'], Key=sample)['Body'] #http
    audio = streaming_body.read()
    text = get_transcript(audio)
    client.put_object(Bucket = credentials_os['BUCKET'], Key = sample.split('.')[0] + '_text.json', Body = text)
    return text

def visualize(transcript):
    for result in json.loads(transcript)['results']:
        print(result['alternatives'][0]['transcript'], result['alternatives'][0]['confidence'])    
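
Since `get_transcript` hard-codes `content_type='audio/ogg'`, other formats need a different value. One possible approach is to pick the content type from the file extension. The helper name and the mapping below are assumptions for illustration and cover only a few common formats; L16 is left out because it also requires a sampling rate, and the service documentation should be checked for the full list.

```python
import os

# Extension-to-content-type mapping (partial; an assumption for illustration).
CONTENT_TYPES = {
    '.ogg': 'audio/ogg',
    '.wav': 'audio/wav',
    '.flac': 'audio/flac',
    '.mp3': 'audio/mp3',
}

def content_type_for(filename):
    """Return the audio content type for `filename`, based on its extension."""
    extension = os.path.splitext(filename)[1].lower()
    if extension not in CONTENT_TYPES:
        raise ValueError('Unsupported audio extension: ' + repr(extension))
    return CONTENT_TYPES[extension]

print(content_type_for('sample1-addresschange-positive.ogg'))
```

The returned value could then be passed as `content_type` instead of the hard-coded string.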


## More about audio files

`file_list` holds the audio file names in a list; each OGG file produces its own transcript.  

    - Samples 1,2,3,4,5 are short - about 2 minutes 
    - Samples 6 and 7 are about 7 minutes (the STT SERVICE WILL NEED TIME TO PROCESS)

For longer files and transcription at scale: https://www.ibm.com/watson/developercloud/speech-to-text/api/v1/

### WebSockets
WebSockets includes a single method that establishes a persistent connection with the service over the WebSocket protocol.

### Sessionless
Sessionless includes a method that provides a simple means of transcribing audio without the overhead of establishing and maintaining a session. Sessions provides methods that allow a client to maintain a long, multi-turn exchange, or session, with the service or to establish multiple parallel conversations with a particular instance of the service.

### Asynchronous
Asynchronous provides a non-blocking interface for transcribing audio. You can register a callback URL to be notified of job status and, optionally, results, or you can poll the service to learn job status and retrieve results manually. Longer (e.g. 1 hour) audio files may justify using the asynchronous method, and real-time use may justify a sessions method (both described in this section).
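
The polling option described above can be sketched as a generic loop. This is not the Watson client API: `check_status` is a hypothetical callable standing in for whatever call returns the job status, and the status strings are placeholders for this sketch.

```python
import time

def poll_until_done(check_status, interval_seconds=5.0, max_attempts=60):
    """Call `check_status()` until it returns a terminal status or attempts run out.

    `check_status` is a hypothetical stand-in for a service call; the status
    names 'completed' and 'failed' are placeholders for this sketch.
    """
    for attempt in range(max_attempts):
        status = check_status()
        if status in ('completed', 'failed'):
            return status
        # Wait before the next check, except after the final attempt.
        if attempt < max_attempts - 1:
            time.sleep(interval_seconds)
    return 'timed_out'

# Simulated job that reports 'processing' twice and then 'completed'.
simulated = iter(['processing', 'processing', 'completed'])
print(poll_until_done(lambda: next(simulated), interval_seconds=0))
```

Registering a callback URL avoids this loop entirely, at the cost of hosting an endpoint that the service can reach.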



In [322]:
file_list = ['sample1-addresschange-positive.ogg',
             'sample2-address-negative.ogg',
             'sample3-shirt-return-weather-chitchat.ogg',
             'sample4-angryblender-sportschitchat-recovery.ogg',
             'sample5-calibration-toneandcontext.ogg',
             'jfk_1961_0525_speech_to_put_man_on_moon.ogg',
             'May 1 1969 Fred Rogers testifies before the Senate Subcommittee on Communications.ogg']

   # QA: Double check last filename (spaces vs underscores)

## Transcription Test - Point to first file in list (position 0) and analyze 

In [None]:
# TRANSCRIBE – this is where STT receives the OGG file provided and returns text to `transcript`
# This is a test of ONE transcription from the list (position 0); it may take a minute
transcript = analyze_sample(file_list[0])
visualize(transcript)
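
To make explicit what `visualize` reads, the sketch below applies the same field accesses to a hand-written mock response. The mock text and confidence values are invented for illustration and are not service output; only the `results` / `alternatives` / `transcript` / `confidence` nesting comes from the `visualize` function in this notebook. The `mean_confidence` helper is a hypothetical addition.

```python
import json

# Hand-written mock with the nesting that `visualize` expects (values invented).
mock_transcript = json.dumps({
    "results": [
        {"alternatives": [{"transcript": "hello I would like to change my address ", "confidence": 0.9}]},
        {"alternatives": [{"transcript": "thank you ", "confidence": 0.7}]},
    ]
})

def mean_confidence(transcript):
    """Average the confidence of the top alternative across all results."""
    results = json.loads(transcript)['results']
    scores = [result['alternatives'][0]['confidence'] for result in results]
    return sum(scores) / len(scores) if scores else 0.0

print(round(mean_confidence(mock_transcript), 2))
```

A single summary number like this is one simple way to compare transcription quality across files before moving on to the later analytics steps.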

In [None]:
for filename in file_list:
    print('Processing file: ', filename)
    transcript = analyze_sample(filename)
    visualize(transcript)
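
The loop above stores one transcript per audio file. As a small sketch of the naming rule used in `analyze_sample` (everything before the first dot, plus `_text.json`), the hypothetical helper below lists the object keys to expect in your bucket. Note that a file name containing an extra dot would be cut at the first one.

```python
def transcript_key(sample):
    """Mirror the key naming in `analyze_sample`: text before the first dot + '_text.json'."""
    return sample.split('.')[0] + '_text.json'

for name in ['sample1-addresschange-positive.ogg', 'sample2-address-negative.ogg']:
    print(name, '->', transcript_key(name))
```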