# Calculate sentiment and emotion in Meltwater data

In this notebook we estimate sentiment and emotion scores for the data extracted from Meltwater. _Sentiment_ is a score ranging from -1 to 1, where -1 represents a negative view of the topic discussed in the text and 1 represents a positive view. We also estimate five emotions: anger, disgust, fear, joy and sadness. A text can express several emotions, each to a different degree. Hence, each emotion score ranges from 0 to 1 and the five scores should add up to at most 1.

#### Input
- Dataset with processed tweets from Meltwater (see notebook `Clean Twitter Data from Meltwater`): `Meltwater_processed.csv`

#### Output
- Input dataset extended with extra columns containing the sentiment score (-1 to 1) and the five emotion scores (each from 0 to 1): `tweets_emotions_Notts_Melt_ALL.csv`. The scores are calculated for Nottingham only.


## 1. Preliminaries

Here we just install some packages that are not present by default in our environment. Then, we import all packages, set up our NLU service and read data.

In [1]:
!pip install ibm_watson
!pip install watson_developer_cloud
!pip install ibm_cloud_sdk_core

In [2]:
import pandas as pd
import numpy as np
import os  
import time  # needed for time.sleep in get_values_from_NLU
import warnings
warnings.filterwarnings('ignore')

from ibm_watson import NaturalLanguageUnderstandingV1
# Import the feature classes from ibm_watson, the same package that provides the client above
from ibm_watson.natural_language_understanding_v1 import Features, SentimentOptions, EmotionOptions
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

In [3]:
# Watson NLU API key
API = {
  "apikey": "XXXXXXXXXXXXXXX",
  "iam_apikey_description": "XXXXXXXXXXXXXXX",
  "iam_apikey_name": "XXXXXXXXXXXXXXX",
  "iam_role_crn": "XXXXXXXXXXXXXXX",
  "iam_serviceid_crn": "XXXXXXXXXXXXXXX",
  "url": "XXXXXXXXXXXXXXX"
}


api_key = API['apikey']
url = API['url']

natural_language_understanding = NaturalLanguageUnderstandingV1(version='2020-10-29', authenticator=IAMAuthenticator(api_key))
natural_language_understanding.set_service_url(url)
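
Hard-coding credentials in a notebook makes them easy to leak when the notebook is shared. A minimal sketch of one alternative, assuming the API key and service URL are stored in environment variables. The variable names `WATSON_NLU_APIKEY` and `WATSON_NLU_URL` and the helper `load_nlu_credentials` are hypothetical and not part of the original notebook:

```python
import os


def load_nlu_credentials(env=os.environ):
    """Return (api_key, url) read from environment variables.

    The variable names used here are illustrative assumptions.
    """
    names = ("WATSON_NLU_APIKEY", "WATSON_NLU_URL")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return env[names[0]], env[names[1]]
```

With those variables set, `api_key, url = load_nlu_credentials()` could stand in for the two assignments that read from the `API` dictionary.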

In [4]:
# Read data
df_proc = pd.read_csv('/project_data/data_asset/Meltwater_processed.csv')
df_proc = df_proc[df_proc['City'] == 'Nottingham']
df_proc.head()

## 2. Extract sentiment and emotion

In [68]:
# Define functions

def analyze_using_NLU(analysistext):
    """
    Send a text to Watson Natural Language Understanding and return the response object.
    The sentiment and emotion scores are available in its `result` dictionary.
    """
    response = natural_language_understanding.analyze(
        text=analysistext,
        features=Features(
            sentiment=SentimentOptions(),
            emotion=EmotionOptions()),
        language='en')
    return response


def get_values_from_NLU(df_orig, col_name):  
    """ 
    Pass results from analyze_using_NLU to a dataframe
    """
    df = df_orig.copy()

    sadness = []
    joy = []
    fear = []
    disgust = []
    anger = []
    sentiment = []
    
    count = 1

    for i in range(0, len(df)):
        if count % 10 == 0: 
            print('Pass number:', count) # Print progress
        txt = df[col_name].iloc[i]
        time.sleep(0.5) # Sleep some time not to overload the server
        dictionary = analyze_using_NLU(txt)

        sadness.append(dictionary.result['emotion']['document']['emotion']['sadness'])
        joy.append(dictionary.result['emotion']['document']['emotion']['joy'])
        fear.append(dictionary.result['emotion']['document']['emotion']['fear'])
        disgust.append(dictionary.result['emotion']['document']['emotion']['disgust'])
        anger.append(dictionary.result['emotion']['document']['emotion']['anger'])
        sentiment.append(dictionary.result['sentiment']['document']['score'])
        
        count += 1

    df['sadness'] = sadness 
    df['joy'] = joy
    df['fear'] = fear 
    df['disgust'] = disgust
    df['anger'] = anger
    df['sentiment'] = sentiment
    
    df.reset_index(inplace=True)
    
    return df
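
The nested lookups in `get_values_from_NLU` follow the layout of the `result` dictionary returned by the service. The sketch below collects the six scores into one flat dictionary. The helper name `scores_from_result` is hypothetical, and the numbers in `example_result` are invented for illustration, not real service output:

```python
EMOTIONS = ("sadness", "joy", "fear", "disgust", "anger")


def scores_from_result(result):
    """Flatten an NLU result dictionary into {column name: score}."""
    emotion_scores = result["emotion"]["document"]["emotion"]
    row = {name: emotion_scores[name] for name in EMOTIONS}
    row["sentiment"] = result["sentiment"]["document"]["score"]
    return row


# Invented values, shaped like the lookups used in get_values_from_NLU
example_result = {
    "sentiment": {"document": {"score": 0.5}},
    "emotion": {"document": {"emotion": {
        "sadness": 0.1, "joy": 0.6, "fear": 0.05, "disgust": 0.05, "anger": 0.1}}},
}
print(scores_from_result(example_result))
```

A helper like this keeps the key paths in one place, so a change in the response layout would only need to be handled once.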

In [5]:
# Get sentiment and emotions
# Produce emotions in batches. Batching is not strictly necessary, but it avoids overloading the server and keeps the results collected so far if a request fails part-way through
tweets_emotions = pd.DataFrame()
_df_list = list(np.linspace(0, len(df_proc), 100, dtype = int))
start = 0
batch_count = 1
for point in _df_list:
    print('Batch:', batch_count)
    batch_emotions = get_values_from_NLU(df_proc[start:point+1],'Text_comment')
    # DataFrame.append was removed in pandas 2.0, so use pd.concat instead
    tweets_emotions = pd.concat([tweets_emotions, batch_emotions])
    start = point+1
    batch_count +=1
    print('\n')
    
# Produce all emotions in one batch
# tweets_emotions = get_values_from_NLU(df_proc,'Text_comment')
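
The batch boundaries above come from `np.linspace`, which can yield empty batches when the dataset has fewer rows than boundary points. A sketch of a simpler alternative, assuming fixed-size batches are acceptable. The helper `batch_bounds` and the batch size are hypothetical and not part of the original notebook:

```python
def batch_bounds(n_rows, batch_size):
    """Yield (start, stop) pairs that cover range(n_rows) without gaps or overlap."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_rows, batch_size):
        yield start, min(start + batch_size, n_rows)


print(list(batch_bounds(10, 4)))  # [(0, 4), (4, 8), (8, 10)]
```

Each pair could be used as `df_proc.iloc[start:stop]`. Writing every batch result to its own CSV file would turn the batches into checkpoints that survive a kernel restart.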


In [6]:
# Export tweets and emotions
tweets_emotions.to_csv('/project_data/data_asset/tweets_emotions_Notts_Melt_ALL.csv', index = False)
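
A quick sanity check on the scores can catch missing or malformed values before the exported file is used elsewhere. The sketch below tests one row against the ranges stated in the introduction (sentiment from -1 to 1, each emotion from 0 to 1). The helper name `out_of_range_columns` is hypothetical:

```python
EMOTION_COLUMNS = ("sadness", "joy", "fear", "disgust", "anger")


def out_of_range_columns(row):
    """Return the names of the scores in `row` that fall outside their expected range."""
    bad = [name for name in EMOTION_COLUMNS if not 0.0 <= row[name] <= 1.0]
    if not -1.0 <= row["sentiment"] <= 1.0:
        bad.append("sentiment")
    return bad
```

It could be applied row by row, for example over `tweets_emotions.to_dict('records')`. Missing values (`NaN`) fail both comparisons, so they are reported as out of range as well.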

________

#### Authors
- **Álvaro Corrales Cano** is a Data Scientist within IBM's Cloud Pak Acceleration team. With a background in Economics, Álvaro specialises in a wide array of econometric techniques and causal inference, including regression, discrete choice models, time series and duration analysis.
- **Anthony Ayanwale** is a Data Scientist within IBM's Cloud Pak Acceleration team, where he specialises in Data Science and Machine Learning Solutions. 
- **Nicolas Ayoub** is a Data Scientist within IBM's Cloud Pak Acceleration team, where he specialises in Data Science and Machine Learning Solutions.

Copyright © IBM Corp. 2020. Licensed under the Apache License, Version 2.0. Released as licensed Sample Materials.
