# Sarcasm score with Watson NLU

Apply [Watson Natural Language Understanding](https://cloud.ibm.com/catalog/services/natural-language-understanding) on the same "sarcasm" data, to get `label` (positive/negative/neutral), along with sores on `sadness` `joy` `fear` `disgust` and `anger`  

## 0. Libraries

In [1]:
import pandas as pd
import numpy as np
import json

from ibm_watson import NaturalLanguageUnderstandingV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_watson.natural_language_understanding_v1 import Features,CategoriesOptions,EmotionOptions,KeywordsOptions

## 1. Build "Stopwords"

Just like TF/IDF days, where frequency of words may be inversely related to the importance ... anyone remember LDA, hehe.  

In [2]:
with open("sarcasm.json", 'r') as f:
    sarcasm_data = json.load(f)

print("records total", len(sarcasm_data))

records total 26709


## 2. Process "sarcasm.json" file 

Sarcasm data provided by Laurency Moroney https://storage.googleapis.com/laurencemoroney-blog.appspot.com/sarcasm.json

Data looks like this

In [3]:
from bs4 import BeautifulSoup
import string

stopwords_list = ["a", "about", "above", "after", "again", "against", "all", "am", "an", "and", "any", "are", "as", "at",
             "be", "because", "been", "before", "being", "below", "between", "both", "but", "by", "could", "did", "do",
             "does", "doing", "down", "during", "each", "few", "for", "from", "further", "had", "has", "have", "having",
             "he", "hed", "hes", "her", "here", "heres", "hers", "herself", "him", "himself", "his", "how",
             "hows", "i", "id", "ill", "im", "ive", "if", "in", "into", "is", "it", "its", "itself",
             "lets", "me", "more", "most", "my", "myself", "nor", "of", "on", "once", "only", "or", "other", "ought",
             "our", "ours", "ourselves", "out", "over", "own", "same", "she", "shed", "shell", "shes", "should",
             "so", "some", "such", "than", "that", "thats", "the", "their", "theirs", "them", "themselves", "then",
             "there", "theres", "these", "they", "theyd", "theyll", "theyre", "theyve", "this", "those", "through",
             "to", "too", "under", "until", "up", "very", "was", "we", "wed", "well", "were", "weve", "were",
             "what", "whats", "when", "whens", "where", "wheres", "which", "while", "who", "whos", "whom", "why",
             "whys", "with", "would", "you", "youd", "youll", "youre", "youve", "your", "yours", "yourself",
             "yourselves"]

# Create the mapping table to use in translate()
stopwords_table = str.maketrans('', '', string.punctuation)

### 2.1. Load sentences, cleaning text and exclude "stopwords"

In [4]:
sentences = [] 
labels = []
urls = []

for item in sarcasm_data:
    sentence = item['headline'].lower()
    sentence = sentence.replace(",", " , ")
    sentence = sentence.replace(".", " . ")
    sentence = sentence.replace("-", " - ")
    sentence = sentence.replace("/", " / ")
    soup = BeautifulSoup(sentence)
    sentence = soup.get_text()
    words = sentence.split()
    filtered_sentence = ""
    for word in words:
        word = word.translate(stopwords_table) #replace specified characters with the character described the mapping table
        if word not in stopwords_table:
            filtered_sentence = filtered_sentence + word + " "
    sentences.append(filtered_sentence)
    labels.append(item['is_sarcastic'])
    urls.append(item['article_link'])

### 2.2. Sample data

In [5]:
print("The length of list is: ", len(sentences))

# Show the last N elements
N = 10
print("The last", N, "elements of list are : ", str(sentences[-N:]), "elements of label are : ", str(labels[-N:]))

The length of list is:  26709
The last 10 elements of list are :  ['what you should buy your basic friend  according to pinterest ', 'whats in your mailbox tips on what to do when uncle sam comes knocking ', 'paul ryan is more of a con man than ever ', 'pentagon to withhold budget figures out of respect for american families ', 'pope francis wearing sweater vestments he got for christmas ', 'american politics in moral free  fall ', 'americas best 20 hikes ', 'reparations and obama ', 'israeli ban targeting boycott supporters raises alarm abroad ', 'gourmet gifts for the foodie 2014 '] elements of label are :  [0, 0, 0, 1, 1, 0, 0, 0, 0, 0]


## 2.3. Setup Dataframe

In [6]:
training_size = 50 # mindful of NLU pricing plan 

training_sentences = sentences[0:training_size]

df = pd.DataFrame(training_sentences, columns =['Sarcasm?'])
df.head()

Unnamed: 0,Sarcasm?
0,former versace store clerk sues over secret bl...
1,the roseanne revival catches up to our thorny ...
2,mom starting to fear sons web series closest t...
3,boehner just wants wife to listen not come up...
4,j k rowling wishes snape happy birthday in t...


# 3. Watson NLU
## 3.1. Credentials

In [7]:
IAM_KEY = '**************'
SERVICE_URL = '**************'

## 3.2. Invoke Watson NLU

In [8]:
authenticator = IAMAuthenticator(IAM_KEY)
natural_language_understanding = NaturalLanguageUnderstandingV1(
    version='2021-08-01',
    authenticator=authenticator
)

natural_language_understanding.set_service_url(SERVICE_URL)

## 3.3 Response

In [9]:
responses = []
normalize = []

for index, row in df.iterrows():

    response = natural_language_understanding.analyze(
    text = row['Sarcasm?'],
    features=Features(keywords=KeywordsOptions(sentiment=True,emotion=True,limit=1))).get_result()
    normalize.append(pd.json_normalize(response['keywords']))
    responses.append(response)

In [10]:
normalize

[                         text  relevance  count  sentiment.score  \
 0  former versace store clerk   0.993272      1                0   
 
   sentiment.label  emotion.sadness  emotion.joy  emotion.fear  \
 0         neutral         0.186342     0.033968      0.020928   
 
    emotion.disgust  emotion.anger  
 0         0.123552       0.527004  ,
                        text  relevance  count  sentiment.score  \
 0  roseanne revival catches   0.960538      1        -0.826963   
 
   sentiment.label  emotion.sadness  emotion.joy  emotion.fear  \
 0        negative         0.660705     0.031229      0.023658   
 
    emotion.disgust  emotion.anger  
 0         0.073265       0.178345  ,
             text  relevance  count  sentiment.score sentiment.label  \
 0  closest thing   0.929554      1        -0.806136        negative   
 
    emotion.sadness  emotion.joy  emotion.fear  emotion.disgust  emotion.anger  
 0         0.043082     0.158796      0.600447          0.00413       0.010554 

In [11]:
responses

[{'usage': {'text_units': 1, 'text_characters': 77, 'features': 1},
  'language': 'en',
  'keywords': [{'text': 'former versace store clerk',
    'sentiment': {'score': 0, 'label': 'neutral'},
    'relevance': 0.993272,
    'emotion': {'sadness': 0.186342,
     'joy': 0.033968,
     'fear': 0.020928,
     'disgust': 0.123552,
     'anger': 0.527004},
    'count': 1}]},
 {'usage': {'text_units': 1, 'text_characters': 83, 'features': 1},
  'language': 'en',
  'keywords': [{'text': 'roseanne revival catches',
    'sentiment': {'score': -0.826963, 'label': 'negative'},
    'relevance': 0.960538,
    'emotion': {'sadness': 0.660705,
     'joy': 0.031229,
     'fear': 0.023658,
     'disgust': 0.073265,
     'anger': 0.178345},
    'count': 1}]},
 {'usage': {'text_units': 1, 'text_characters': 79, 'features': 1},
  'language': 'en',
  'keywords': [{'text': 'closest thing',
    'sentiment': {'score': -0.806136, 'label': 'negative'},
    'relevance': 0.929554,
    'emotion': {'sadness': 0.0430

In [12]:
df['Response'] = responses
df['Normalized'] = normalize
df.head()

Unnamed: 0,Sarcasm?,Response,Normalized
0,former versace store clerk sues over secret bl...,"{'usage': {'text_units': 1, 'text_characters':...",text relevance coun...
1,the roseanne revival catches up to our thorny ...,"{'usage': {'text_units': 1, 'text_characters':...",text relevance count ...
2,mom starting to fear sons web series closest t...,"{'usage': {'text_units': 1, 'text_characters':...",text relevance count sentiment....
3,boehner just wants wife to listen not come up...,"{'usage': {'text_units': 1, 'text_characters':...",text relevance count sentiment.score ...
4,j k rowling wishes snape happy birthday in t...,"{'usage': {'text_units': 1, 'text_characters':...",text relevance count sentiment.score sen...


## 3.4. Prettify result

In [13]:
score_df = df

for index, row in score_df.iterrows():
    score_df.loc[index,"Label"]   = score_df.iloc[index]['Response']['keywords'][0]['sentiment']['label']
    score_df.loc[index,"Score"]   = score_df.iloc[index]['Response']['keywords'][0]['sentiment']['score']
    score_df.loc[index,"Sadness"] = score_df.iloc[index]['Response']['keywords'][0]['emotion']['sadness']
    score_df.loc[index,"Joy"]     = score_df.iloc[index]['Response']['keywords'][0]['emotion']['joy']
    score_df.loc[index,"Fear"]    = score_df.iloc[index]['Response']['keywords'][0]['emotion']['fear']
    score_df.loc[index,"Disgust"] = score_df.iloc[index]['Response']['keywords'][0]['emotion']['disgust']
    score_df.loc[index,"Anger"]   = score_df.iloc[index]['Response']['keywords'][0]['emotion']['anger']

score_df.head()

Unnamed: 0,Sarcasm?,Response,Normalized,Label,Score,Sadness,Joy,Fear,Disgust,Anger
0,former versace store clerk sues over secret bl...,"{'usage': {'text_units': 1, 'text_characters':...",text relevance coun...,neutral,0.0,0.186342,0.033968,0.020928,0.123552,0.527004
1,the roseanne revival catches up to our thorny ...,"{'usage': {'text_units': 1, 'text_characters':...",text relevance count ...,negative,-0.826963,0.660705,0.031229,0.023658,0.073265,0.178345
2,mom starting to fear sons web series closest t...,"{'usage': {'text_units': 1, 'text_characters':...",text relevance count sentiment....,negative,-0.806136,0.043082,0.158796,0.600447,0.00413,0.010554
3,boehner just wants wife to listen not come up...,"{'usage': {'text_units': 1, 'text_characters':...",text relevance count sentiment.score ...,negative,-0.901408,0.648336,0.123014,0.063226,0.032541,0.044559
4,j k rowling wishes snape happy birthday in t...,"{'usage': {'text_units': 1, 'text_characters':...",text relevance count sentiment.score sen...,positive,0.918839,0.022918,0.992164,0.00651,0.005448,0.011898


## 3.5. Result sort by ... i.e. 'Joy'

In [14]:
final_df = score_df.drop(columns=['Response', 'Normalized'])

In [15]:
sorted_df = final_df.sort_values(by='Joy', ascending=False)
sorted_df

Unnamed: 0,Sarcasm?,Label,Score,Sadness,Joy,Fear,Disgust,Anger
4,j k rowling wishes snape happy birthday in t...,positive,0.918839,0.022918,0.992164,0.00651,0.005448,0.011898
44,give the gift of play this holiday season,neutral,0.0,0.117082,0.911846,0.01779,0.008611,0.02996
31,gillian jacobs on what its like to kiss adam b...,positive,0.810839,0.05533,0.83113,0.268281,0.020426,0.017507
9,fridays morning email inside trumps presser fo...,neutral,0.0,0.051395,0.776315,0.075755,0.059193,0.027654
6,the fascinating case for eating lab grown meat,positive,0.940957,0.123195,0.705304,0.015357,0.243538,0.016908
37,moana sails straight to the top of the box off...,neutral,0.0,0.11831,0.674794,0.048415,0.012466,0.021123
49,monster undeterred by night light,neutral,0.0,0.047191,0.646341,0.351561,0.016403,0.022273
18,bloombergs program to build better cities just...,positive,0.781383,0.414648,0.628939,0.046366,0.021253,0.01739
32,uber vows to repay nyc drivers tens of million...,negative,-0.849276,0.028886,0.526647,0.07132,0.095028,0.084872
48,nasa now almost positive mars is rocky,positive,0.858009,0.081524,0.525742,0.048404,0.011581,0.026974
