## Install Required Libraries
Install the libraries needed to connect to IBM Db2 Warehouse on the cloud.
Then set the macOS environment variable DYLD_LIBRARY_PATH so that it points to the clidriver libraries installed with the ibm_db package.

In [52]:
!pip install ibm_db



In [53]:
import sys,os

In [54]:
# Provide complete path to your python site-packages install
PYTHON_INSTALL="YOUR_PYTHON_SITE_PACKAGES_INSTALL_PATH"
LIB_PATH=PYTHON_INSTALL + "/clidriver/lib"
ICC_PATH=LIB_PATH + "/icc"
print(LIB_PATH)

/Users/kozhaya/Documents/MyFiles/software/anaconda3/envs/python2/lib/python2.7/site-packages/clidriver/lib


In [55]:
# Note: Python does not expand "$DYLD_LIBRARY_PATH"; it is appended as literal text
os.environ['DYLD_LIBRARY_PATH'] = LIB_PATH + ":" + ICC_PATH + ":" + "$DYLD_LIBRARY_PATH"
print(os.environ['DYLD_LIBRARY_PATH'])

/Users/kozhaya/Documents/MyFiles/software/anaconda3/envs/python2/lib/python2.7/site-packages/clidriver/lib:/Users/kozhaya/Documents/MyFiles/software/anaconda3/envs/python2/lib/python2.7/site-packages/clidriver/lib/icc:$DYLD_LIBRARY_PATH
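
The value printed above ends with the literal text `$DYLD_LIBRARY_PATH`, because Python strings are not expanded the way shell strings are. The following is a minimal sketch of one way to prepend directories while keeping whatever value the variable already holds. The helper name `prepend_library_path` and the `/opt/clidriver` paths are illustrative assumptions, not part of the original notebook.

```python
import os

def prepend_library_path(new_dirs, var="DYLD_LIBRARY_PATH", environ=None):
    """Prepend new_dirs to a path-style variable, keeping any existing value."""
    environ = os.environ if environ is None else environ
    existing = environ.get(var, "")
    # Only append the old value when there is one, to avoid a trailing separator
    parts = list(new_dirs) + ([existing] if existing else [])
    environ[var] = os.pathsep.join(parts)
    return environ[var]

# Illustrative paths only; substitute your own clidriver location.
demo_env = {"DYLD_LIBRARY_PATH": "/usr/local/lib"}
print(prepend_library_path(["/opt/clidriver/lib", "/opt/clidriver/lib/icc"],
                           environ=demo_env))
```

Passing a plain dictionary as `environ` makes it possible to try the helper without touching the real process environment; leaving it out updates `os.environ`.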



In [57]:
import ibm_db
import time
import pandas
import ibm_db_dbi

## Connection to DB2 Warehouse
Get the credentials for your DB2 Warehouse on the cloud instance. To do so, log in to your IBM Cloud account, click the menu icon at the top left, and select Dashboard.
![IBM Cloud Dashboard](./files/dashboard.png "Dashboard")

Next, click your DB2 Warehouse instance to open it, then select Service Credentials in the left navigation column.

The page lists the credentials required to access your service.
Copy the value of the "dsn" key, as shown in the following image:

![DB2 Warehouse Service Credentials](./files/db2_creds.png "Dashboard")

Paste that value into the next cell to create a connection to your DB2 Warehouse on the cloud instance.

In [58]:
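dsn = "YOUR_DB2_WAREHOUSE_DSN"  # paste the "dsn" value from your service credentials
conn = ibm_db.connect(dsn, "", "")
pconn = ibm_db_dbi.Connection(conn)  # DB-API wrapper that pandas can read from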
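
If you copy the whole service-credentials document rather than just one value, the "dsn" entry can be pulled out programmatically. The sketch below assumes the credentials are available as a JSON string. Apart from the "dsn" key mentioned above, the field names, the placeholder values, and the helper name `extract_dsn` are assumptions made for illustration.

```python
import json

# Illustrative credentials document; every value here is a placeholder.
service_credentials = """
{
  "hostname": "YOUR_HOSTNAME",
  "username": "YOUR_USERNAME",
  "password": "YOUR_PASSWORD",
  "dsn": "DATABASE=BLUDB;HOSTNAME=YOUR_HOSTNAME;PORT=50000;PROTOCOL=TCPIP;UID=YOUR_USERNAME;PWD=YOUR_PASSWORD;"
}
"""

def extract_dsn(credentials_text):
    """Return the "dsn" value from a service-credentials JSON string."""
    credentials = json.loads(credentials_text)
    dsn = credentials.get("dsn")
    if not dsn:
        raise KeyError('no "dsn" key found in the credentials')
    return dsn

dsn = extract_dsn(service_credentials)
# Print only the first field so the password is not echoed into the notebook
print(dsn.split(";")[0])
```

Keeping the credentials in one place and printing only a fragment of the DSN helps avoid leaking the password into saved notebook output.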

## Read data from DB2 table into Pandas dataframe
Specify the DB2 table you'd like to read into a Pandas dataframe. The example below loads the table DASH6296.DSX_CLOUDANT_SINGERS_TWEETS into the dataframe pandasDF. Depending on the size of your table, this may take a few minutes to complete.

In [59]:
pandasDF = pandas.read_sql('SELECT * FROM DASH6296.DSX_CLOUDANT_SINGERS_TWEETS', pconn)

In [60]:
# Show the dimensions (rows, columns) of the Pandas dataframe
pandasDF.shape

(198070, 24)
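
For large tables, one option is to read the query result in chunks rather than in a single call. The sketch below shows the `chunksize` parameter of `pandas.read_sql` against an in-memory SQLite table, which is only a stand-in for DB2; the table and column names are made up, and whether the ibm_db_dbi connection streams rows in the same way has not been verified here.

```python
import sqlite3
import pandas

# Stand-in database: an in-memory SQLite table instead of DB2 (illustration only).
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE tweets (id INTEGER, text_clean TEXT)")
demo_conn.executemany(
    "INSERT INTO tweets VALUES (?, ?)",
    [(i, "tweet %d" % i) for i in range(1000)],
)
demo_conn.commit()

# With chunksize set, read_sql returns an iterator of dataframes
# instead of one large dataframe.
chunks = pandas.read_sql("SELECT * FROM tweets", demo_conn, chunksize=250)

# Concatenate the pieces back into a single dataframe.
demoDF = pandas.concat(chunks, ignore_index=True)
print(demoDF.shape)
```

Each chunk could also be filtered or sampled before concatenation, which keeps memory use lower when only part of the table is needed.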

## Specify NLU Credentials
Next, you need to specify the credentials for your Watson Natural Language Understanding (NLU) service. If you don't have an NLU service, you can create one by following [these instructions](https://console.bluemix.net/docs/services/natural-language-understanding/getting-started.html#getting-started-tutorial) and obtaining the service credentials. You need to specify the URL, username, and password.


In [61]:
credentials_json = {
    "nlu_url": "YOUR_WATSON_NLU_URL",
    "nlu_username": "YOUR WATSON NLU USERNAME",
    "nlu_password": "YOUR WATSON NLU PASSWORD",
    "nlu_version": "2017-02-27"
}

## Watson NLU Enrichment Definition
In this cell, import the Watson Developer Cloud Python SDK, parse the
NLU credentials, and define the function to enrich text with NLU.

In [62]:
import watson_developer_cloud
from watson_developer_cloud import NaturalLanguageUnderstandingV1
from watson_developer_cloud.natural_language_understanding_v1 import Features, SentimentOptions, KeywordsOptions
from watson_developer_cloud import WatsonException

## Define credentials for NLU service
nlu_url = credentials_json['nlu_url']
nlu_username = credentials_json['nlu_username']
nlu_password = credentials_json['nlu_password']
nlu_version = credentials_json['nlu_version']
## Pass the URL as well, so the client targets the service named in the credentials
nlu = watson_developer_cloud.NaturalLanguageUnderstandingV1(version = nlu_version,
                                                            url = nlu_url,
                                                            username = nlu_username,
                                                            password = nlu_password)

## Send text to NLU and extract Sentiment and Keywords
## Make sure text is utf-8 encoded
def enrichNLU(text):
    utf8text = text.encode("utf-8")
    # In python3, need to decode to string
    utf8text = utf8text.decode('utf-8')
    
    try:
        result = nlu.analyze(text = utf8text, features = Features(sentiment=SentimentOptions(),keywords=KeywordsOptions()))
        sentiment = result['sentiment']['document']['score']
        sentiment_label = result['sentiment']['document']['label']
        keywords = list(result['keywords'])  
    except WatsonException:
        # On an API error, fall back to a score of 0.0 and empty fields
        result = None
        sentiment = 0.0
        sentiment_label = None
        keywords = None
    return sentiment, sentiment_label, keywords
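
To make the fields that enrichNLU reads easier to follow, here is a minimal sketch that parses a hand-written dictionary shaped like an NLU response. The sample values are made up, and the helper `parse_nlu_result` is hypothetical. Unlike enrichNLU, it keeps only the keyword text rather than the full keyword entries, and it returns None instead of 0.0 when no sentiment is present.

```python
def parse_nlu_result(result):
    """Pull the document sentiment and keyword texts out of an NLU-style dict."""
    document = result.get("sentiment", {}).get("document", {})
    score = document.get("score")
    label = document.get("label")
    # Each keyword entry is a dict; keep only its text
    keyword_texts = [kw["text"] for kw in result.get("keywords", [])]
    return score, label, keyword_texts

# Hand-written sample shaped like an NLU response; the values are made up.
sample_result = {
    "sentiment": {"document": {"score": -0.87, "label": "negative"}},
    "keywords": [
        {"text": "poor service", "relevance": 0.95},
        {"text": "frustrated", "relevance": 0.61},
    ],
}
print(parse_nlu_result(sample_result))
```

Returning None for a missing score keeps failed calls distinguishable from text that is genuinely neutral, which a 0.0 fallback cannot do.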

In [63]:
## Sample text utterance for testing NLU
## Skip this cell unless you want to run a quick test, in which case
## you can un-comment the following lines and run the cell
#t = "I am really frustrated with this poor service"
#nlu_results = enrichNLU(t)
#print(nlu_results)

## Extract a Random Sample of Records
Next, we extract a random sample of records to run NLU enrichment on. Sampling keeps us within the daily limit of free NLU calls.

In [64]:
numrecords = 500.0
fraction = numrecords/pandasDF.shape[0]
dfsample = pandasDF.sample(frac=fraction, replace=False)
# verify extracted sample is the correct size
dfsample.shape

(500, 24)
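
An alternative to computing a fraction is to ask pandas for an exact number of rows. The sketch below uses the `n` and `random_state` parameters of `DataFrame.sample` on a synthetic dataframe; the column names, the row count, and the seed value 42 are illustrative assumptions.

```python
import pandas

# Synthetic stand-in for pandasDF; the column names are illustrative.
demoDF = pandas.DataFrame({"ID": range(10000), "TEXT_CLEAN": ["tweet"] * 10000})

# n requests an exact row count; random_state makes the draw repeatable.
sample_a = demoDF.sample(n=500, replace=False, random_state=42)
sample_b = demoDF.sample(n=500, replace=False, random_state=42)

print(sample_a.shape)
# Both draws use the same seed, so they select the same rows
print(sample_a.index.equals(sample_b.index))
```

Fixing the seed is useful when the enrichment is rerun, since the same records are sent to NLU each time and results stay comparable.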

## NLU Enrichment
Next, we run the NLU enrichment on all records in the extracted sample.
We time the run to report how long the execution took.

In [65]:
## This calls the enrichNLU function which accesses the Watson NLU API
start_time = time.time()
dfsample['SENTIMENT'],dfsample['SENTIMENT_LABEL'],\
dfsample['KEYWORDS'] = zip(*dfsample['TEXT_CLEAN'].map(enrichNLU))
#print(dfsample)
print("total run time is: ", time.time() - start_time)

('total run time is: ', 200.57519388198853)
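
The `zip(*...)` idiom in the cell above turns a sequence of per-record tuples into one sequence per output column. The sketch below shows the same pattern in plain Python with a stand-in function, `fake_enrich`, which is hypothetical and makes no API calls.

```python
def fake_enrich(text):
    """Stand-in for enrichNLU: returns (score, label, keywords) without any API call."""
    negative = "poor" in text
    score = -0.5 if negative else 0.5
    label = "negative" if negative else "positive"
    return score, label, text.split()

texts = ["poor service today", "great concert tonight"]

# map yields one 3-tuple per text; zip(*...) transposes them into three
# parallel tuples, one per output column.
scores, labels, keywords = zip(*map(fake_enrich, texts))

print(scores)
print(labels)
```

Note that the unpacking into three names raises a ValueError when the input is empty, so it is worth checking that the sample contains at least one record first.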


In [21]:
# Print the size of the enriched sample; you should see 3 additional
# columns compared to the initial dataframe
dfsample.shape

(500, 27)