<a href="https://colab.research.google.com/github/junting-huang/data_storytelling/blob/main/case_8_sentiment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# case_7. sentiment

Sentiment analysis is a natural language processing (NLP) technique aimed at determining and extracting the sentiment or emotional tone expressed in a piece of text. The primary objective is to understand whether the sentiment conveyed is positive, negative, or neutral. Sentiment analysis utilizes machine learning algorithms and linguistic techniques to analyze and interpret subjective information from various sources, such as social media, customer reviews, and news articles. This valuable tool has widespread applications, including gauging public opinion, brand monitoring, and customer feedback analysis. Businesses often employ sentiment analysis to gain insights into customer sentiments, allowing them to make informed decisions, enhance customer experiences, and adapt strategies based on the prevailing sentiment within a given context.


Sentiment analysis applied to literature uses NLP techniques to assess and understand the emotional tone, opinions, and attitudes expressed within literary texts. With computational tools, sentiment analysis can reveal the sentiments embedded in characters' dialogue, in narrative passages, and in a work's overall themes. This lets researchers, scholars, and literary enthusiasts gain deeper insight into the emotional nuances of literary works by identifying patterns in sentiment shifts, character dynamics, and thematic developments. Whether applied to classical literature or contemporary novels, sentiment analysis provides a quantitative and systematic way to explore the emotional landscape of a text, contributing to a more nuanced understanding of what authors convey through their prose and characters.
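A sentiment shift can be made concrete as a "sentiment arc": score each sentence (or chapter) in order, then smooth the sequence so the overall trend shows through local noise. Below is a minimal sketch using a simple moving average. The scores are hypothetical placeholders rather than the output of any tool, and the window size of 3 is an arbitrary choice.

```python
def moving_average(scores, window=3):
    """Smooth a sequence of sentiment scores with a simple moving average.

    Returns one value per full window, so the output has
    len(scores) - window + 1 entries.
    """
    if window < 1 or window > len(scores):
        raise ValueError("window must be between 1 and len(scores)")
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


# Hypothetical per-sentence polarity scores for a short passage
scores = [0.5, 0.25, 0.0, -0.25, -0.5, -0.75]
arc = moving_average(scores, window=3)
print(arc)  # → [0.25, 0.0, -0.25, -0.5]
```

The steadily falling values describe a passage that darkens as it goes. Real per-sentence scores could come from TextBlob or VADER, both introduced below.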

## 7.1 sentiment analysis


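The code below uses the TextBlob library to perform sentiment analysis on a sentence you type in. TextBlob reports two scores: *polarity*, which ranges from -1.0 (most negative) to 1.0 (most positive), and *subjectivity*, which ranges from 0.0 (very objective) to 1.0 (very subjective). Feel free to enter different sentences and compare their scores.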

In [None]:
from textblob import TextBlob

# Get a sentence from the user
sentence = input('Please enter a sentence: ')

# Create a TextBlob object
blob = TextBlob(sentence)

# Determine the polarity and subjectivity of the sentence
polarity = blob.sentiment.polarity
subjectivity = blob.sentiment.subjectivity

# Print the results
print(f'The polarity of this sentence is {polarity}.')
print(f'The subjectivity of this sentence is {subjectivity}.')
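Under the hood, TextBlob's default analyzer is lexicon-based: roughly speaking, it looks up words in a dictionary of hand-scored words (mostly adjectives), averages their polarity and subjectivity, and applies extra rules for negations and intensifiers. The toy function below sketches only the averaging-with-negation idea. Its four-word lexicon and its simple negation rule are invented for illustration, so its numbers will not match TextBlob's.

```python
import re

# Invented mini-lexicon: word -> (polarity, subjectivity).
# These numbers are made up for illustration; TextBlob ships its own lexicon.
TOY_LEXICON = {
    "good": (0.7, 0.6),
    "great": (0.8, 0.75),
    "bad": (-0.7, 0.67),
    "boring": (-1.0, 1.0),
}
NEGATIONS = {"not", "never", "no"}


def toy_sentiment(sentence):
    """Average the scores of known words; a negation flips the next word's polarity."""
    words = re.findall(r"[a-z']+", sentence.lower())
    polarities, subjectivities = [], []
    negate = False
    for word in words:
        if word in NEGATIONS:
            negate = True
            continue
        if word in TOY_LEXICON:
            pol, subj = TOY_LEXICON[word]
            polarities.append(-pol if negate else pol)
            subjectivities.append(subj)
        negate = False  # a negation only affects the very next word
    if not polarities:
        return 0.0, 0.0  # no known words: neutral and objective
    return (sum(polarities) / len(polarities),
            sum(subjectivities) / len(subjectivities))


print(toy_sentiment("The film was good"))      # → (0.7, 0.6)
print(toy_sentiment("The film was not good"))  # → (-0.7, 0.6)
```

Negation flips the polarity but leaves the subjectivity unchanged, which mirrors the way the two scores measure different things.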

## 7.2 valence aware dictionary and sentiment reasoner (VADER)

The Valence Aware Dictionary and sentiment Reasoner, or VADER, is a lexicon- and rule-based sentiment analysis tool designed specifically for social media text. Developed by researchers at the Georgia Institute of Technology, VADER handles the informal, colloquial language common on platforms like Twitter and Facebook, including slang and emoticons. Beyond classifying text as positive, negative, or neutral, VADER also captures the intensity of the sentiment expressed. It scores individual words from its lexicon and then adjusts those scores for context, such as negations ("not good"), booster words ("very good"), exclamation marks, and capitalization. It summarizes the result as a compound score between -1 (most negative) and +1 (most positive). Because it needs no training data and works well on short, informal texts, VADER is widely used in social media analytics and opinion mining.
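To show what adjusting scores for context can mean, here is a toy scorer loosely modelled on the kinds of rules VADER applies. A booster word immediately before a sentiment word changes its magnitude, a negation within the three preceding words flips and dampens it, and exclamation marks add emphasis. The word lists and all constants are illustrative assumptions for this sketch, not values taken from VADER's lexicon or source code.

```python
import re

# Illustrative valence lexicon on a -4..+4 scale (made-up values, not VADER's).
VALENCE = {"good": 1.9, "great": 3.1, "sad": -2.1, "terrible": -2.5}
BOOSTERS = {"very": 0.3, "extremely": 0.3, "slightly": -0.3}
NEGATIONS = {"not", "never", "no"}
NEGATION_SCALAR = -0.74  # illustrative factor applied to negated words
EXCLAMATION_BOOST = 0.3  # illustrative emphasis per "!", capped at three


def heuristic_valence(text):
    """Sum word valences after applying booster, negation and '!' rules."""
    words = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    for i, word in enumerate(words):
        if word not in VALENCE:
            continue
        valence = VALENCE[word]
        # A booster immediately before the word pushes it away from zero
        # (or toward zero for a dampener like "slightly").
        if i > 0 and words[i - 1] in BOOSTERS:
            step = BOOSTERS[words[i - 1]]
            valence += step if valence > 0 else -step
        # A negation anywhere in the three preceding words flips and dampens it.
        if any(w in NEGATIONS for w in words[max(0, i - 3):i]):
            valence *= NEGATION_SCALAR
        total += valence
    # Exclamation marks amplify whatever sentiment is already present.
    emphasis = min(text.count("!"), 3) * EXCLAMATION_BOOST
    if total > 0:
        total += emphasis
    elif total < 0:
        total -= emphasis
    return total


for phrase in ("good", "very good", "not good", "very good!!"):
    print(phrase, round(heuristic_valence(phrase), 3))
```

The raw sums are unbounded, which is why a final normalization step is needed to turn them into a score between -1 and +1.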

For more information about VADER, see this Medium post: https://medium.com/mlearning-ai/vader-valence-aware-dictionary-and-sentiment-reasoner-sentiment-analysis-28251536698.

In [None]:
! pip install nltk

In [None]:
import nltk
nltk.download('vader_lexicon')

The *polarity_scores* method of *SentimentIntensityAnalyzer* is used to obtain sentiment scores for a line of text. It returns a dictionary with four keys: *neg*, *neu*, and *pos*, the proportions of the text that read as negative, neutral, and positive, and *compound*, a normalized score between -1 and +1. The compound score is the one most often used to represent the overall sentiment of the text. The function below scores each line of a song separately and averages the results.
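How does an unbounded sum of word valences become a score between -1 and +1? In VADER's reference implementation (and NLTK's port of it), the summed valence x is normalized as x / sqrt(x² + α) with α = 15, and the VADER authors suggest reading compound ≥ 0.05 as positive, ≤ -0.05 as negative, and anything in between as neutral. The sketch below re-implements only that final step in plain Python, so it runs without NLTK; the input sums are hypothetical.

```python
import math


def normalize(score, alpha=15):
    """Map an unbounded valence sum into (-1, 1), as VADER's compound score does."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))


def label(compound, threshold=0.05):
    """Classify a compound score using the commonly recommended +/-0.05 cut-offs."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


for raw in (-6.0, -0.1, 0.0, 1.0, 6.0):
    c = normalize(raw)
    print(f"sum={raw:+.1f}  compound={c:+.4f}  label={label(c)}")
```

A sum of 1.0 maps to exactly 0.25, since 1/√16 = 0.25; larger sums approach ±1 but never reach it, so adding more positive words yields diminishing increases in the compound score.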

In [None]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer

def analyze_sentiment(lyrics):
    sid = SentimentIntensityAnalyzer()

    # Split the lyrics into lines and drop empty ones
    lines = [line.strip() for line in lyrics.split('\n') if line.strip()]
    if not lines:
        return 'Neutral', {'neg': 0.0, 'neu': 0.0, 'pos': 0.0, 'compound': 0.0}

    # Analyze the sentiment of each line and aggregate the results
    total_scores = {'neg': 0.0, 'neu': 0.0, 'pos': 0.0, 'compound': 0.0}
    for line in lines:
        scores = sid.polarity_scores(line)
        for key in total_scores:
            total_scores[key] += scores[key]

    # Average over the non-empty lines only, so blank lines don't dilute the scores
    num_lines = len(lines)
    avg_scores = {key: total_scores[key] / num_lines for key in total_scores}

    # Classify by the sign of the average compound score
    if avg_scores['compound'] > 0:
        sentiment = 'Positive'
    elif avg_scores['compound'] < 0:
        sentiment = 'Negative'
    else:
        sentiment = 'Neutral'

    return sentiment, avg_scores

# Example text: the lyrics of Bob Dylan's "Not Dark Yet" (replace with any lyrics you like)
lyrics = """
Shadows are fallin' and I've been here all day
It's too hot to sleep and time is runnin' away
I feel like my soul has turned into steel
I've still got the scars that the sun didn't heal
There's not even room enough to be anywhere
It's not dark yet, but it's gettin' there
Well, my sense of humanity has gone down the drain
Behind every beautiful thing there's been some kind of pain
She wrote me a letter and she wrote it so kind
She put down in writin' what was in her mind
I just don't see why I should even care
It's not dark yet, but it's gettin' there
Well, I've been to London and I been to gay Paris
I've followed the river and I got to the sea
I've been down on the bottom of the world full of lies
I ain't lookin' for nothin' in anyone's eyes
Sometimes my burden is more than I can bear
It's not dark yet, but it's gettin' there
I was born here and I'll die here against my will
I know it looks like I'm movin' but I'm standin' still
Every nerve in my body is so naked and numb
I can't even remember what it was, I came here to get away from
Don't even hear the murmur of a prayer
It's not dark yet, but it's gettin' there
"""

sentiment, scores = analyze_sentiment(lyrics)

print(f"Sentiment: {sentiment}")
print(f"Scores: {scores}")

Do you agree with the sentiment analysis result?
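Averaging can hide how mixed a text is: a few strongly negative lines and several mildly positive ones may cancel out. One alternative is to label each line with the ±0.05 cut-offs the VADER authors recommend and report the share of each label. The sketch below uses hypothetical per-line compound scores, not the actual VADER output for these lyrics.

```python
from collections import Counter


def line_label_shares(compounds, threshold=0.05):
    """Return the fraction of lines labelled positive, negative or neutral."""
    if not compounds:
        return {"positive": 0.0, "negative": 0.0, "neutral": 0.0}
    labels = []
    for c in compounds:
        if c >= threshold:
            labels.append("positive")
        elif c <= -threshold:
            labels.append("negative")
        else:
            labels.append("neutral")
    counts = Counter(labels)
    n = len(compounds)
    return {lab: counts[lab] / n for lab in ("positive", "negative", "neutral")}


# Hypothetical compound scores for eight lines (their mean is -0.1)
compounds = [-0.6, -0.4, 0.0, 0.0, 0.1, -0.3, 0.0, 0.4]
print(line_label_shares(compounds))
# → {'positive': 0.25, 'negative': 0.375, 'neutral': 0.375}
```

To apply this to the real lyrics, you could collect `SentimentIntensityAnalyzer().polarity_scores(line)['compound']` for each non-empty line and pass the resulting list in.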

## 7.3 Application: Bob Dylan's saddest song

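The following code applies TextBlob sentiment analysis to the lyrics of Bob Dylan's songs and identifies the "saddest" one, defined here as the song with the lowest polarity score. It expects a CSV file with a *title* column and a *lyrics* column.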

In [None]:
import pandas as pd
import nltk
from textblob import TextBlob
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Download the tokenizer models and stop-word list if you haven't done so already
# ('punkt_tab' is required by recent NLTK versions, 'punkt' by older ones)
nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('stopwords')

# Load the data: one row per song, with 'title' and 'lyrics' columns
df = pd.read_csv('bob_dylan_songs.csv')  # replace with your csv file

# Build the stop-word set once, but keep negations, which carry sentiment
stop_words = set(stopwords.words('english')) - {'no', 'nor', 'not'}

# Create a function to clean the lyrics
def clean_lyrics(lyrics):
    word_tokens = word_tokenize(lyrics)
    # Compare in lowercase, since the stop-word list is lowercase
    filtered_lyrics = [w for w in word_tokens if w.lower() not in stop_words]
    return " ".join(filtered_lyrics)

# Apply the cleaning function to the lyrics (missing lyrics become empty strings)
df['cleaned_lyrics'] = df['lyrics'].fillna('').astype(str).apply(clean_lyrics)

# Get the polarity of each song (-1.0 = most negative, 1.0 = most positive)
df['polarity'] = df['cleaned_lyrics'].apply(lambda x: TextBlob(x).sentiment.polarity)

# Sort the dataframe by polarity, lowest first
df = df.sort_values('polarity').reset_index(drop=True)

# The first song in the sorted dataframe is the 'saddest' one
saddest_song = df.iloc[0]

print("According to TextBlob polarity, the saddest Bob Dylan song is:", saddest_song['title'])
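A caveat on stop-word removal: NLTK's English stop-word list includes negations such as "no", "nor", and "not", and dropping them can invert the meaning that a sentiment tool sees. The plain-Python sketch below uses a small hand-picked stop-word set (an assumption standing in for NLTK's much longer list) to show the effect, and one way to keep negations while still dropping other function words.

```python
import re

# Small stand-in for a stop-word list; NLTK's real list is much longer.
STOP_WORDS = {"i", "am", "is", "the", "a", "this", "not", "no", "nor"}
NEGATIONS = {"not", "no", "nor"}


def remove_stop_words(text, keep_negations=False):
    """Drop stop words, optionally preserving negation words."""
    stop = STOP_WORDS - NEGATIONS if keep_negations else STOP_WORDS
    words = re.findall(r"[a-z']+", text.lower())
    return " ".join(w for w in words if w not in stop)


sentence = "I am not happy with this song"
print(remove_stop_words(sentence))                       # → happy with song
print(remove_stop_words(sentence, keep_negations=True))  # → not happy with song
```

Without the negation, a lexicon-based scorer would read the first cleaned string as positive.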


The top 20 saddest songs:

In [None]:
df[['title', 'polarity']].head(20)

## 7.4 emotion analysis

Emotion analysis is a branch of AI that aims to discern and interpret the emotions expressed in written or spoken language. It uses machine learning algorithms and linguistic analysis to identify and categorize emotions such as happiness, sadness, anger, and fear within a given text. Emotion analysis goes beyond simple sentiment analysis: instead of placing a text on a single positive-to-negative scale, it distinguishes between different emotions and, in some approaches, their intensity. Techniques often analyze word choice, context, and linguistic patterns to infer emotional states. In NLP, emotion analysis is applied to sentiment classification, chatbot interactions, social media monitoring, and mental health assessment, among other areas. The aim is to help machines recognize and respond to human emotions, enabling more empathetic and context-aware interactions in various domains.

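First, we load the NRC Emotion Lexicon (EmoLex) and turn it into a pivot table in which each row corresponds to a word and each column to one of ten categories: eight emotions (anger, anticipation, disgust, fear, joy, sadness, surprise, trust) and two sentiments (negative, positive). A cell is 1 if the word is associated with that category and 0 otherwise. For example, the word *abandon* is associated with fear, negative, and sadness.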
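The lexicon file is in "long" format: one line per word-emotion pair, followed by a 0 or 1 association flag. Pivoting turns those pairs into one row per word with one column per emotion. Here is a plain-Python sketch of the same reshaping. The rows mimic the file's tab-separated layout and use only the *abandon* example above; most emotion columns are omitted for brevity.

```python
from collections import defaultdict

# A few rows in the lexicon's "word<TAB>emotion<TAB>association" layout.
# Only the abandon example is used; most emotion columns are omitted.
raw_lines = [
    "abandon\tanger\t0",
    "abandon\tfear\t1",
    "abandon\tjoy\t0",
    "abandon\tnegative\t1",
    "abandon\tsadness\t1",
]


def pivot_lexicon(lines):
    """Reshape (word, emotion, flag) rows into {word: {emotion: flag}}."""
    table = defaultdict(dict)
    for line in lines:
        word, emotion, association = line.split("\t")
        table[word][emotion] = int(association)
    return dict(table)


lexicon = pivot_lexicon(raw_lines)
print(lexicon["abandon"])
# → {'anger': 0, 'fear': 1, 'joy': 0, 'negative': 1, 'sadness': 1}
```

Each inner dictionary plays the role of one row in the pandas pivot table, and each key plays the role of a column.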

In [None]:
import pandas as pd

filepath = 'data/NRC-emotion-lexicon-wordlevel-alphabetized-v0.92.txt' # replace with the path to your lexicon file
emolex_df = pd.read_csv(filepath,  names=["word", "emotion", "association"], sep='\t')
emolex_df = emolex_df.pivot(index='word', columns='emotion', values='association').reset_index() # this is like using pivot table in excel

In [None]:
emolex_df.head(10)

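Then, we add up the emotion columns for every lexicon word that appears in the text. Each total counts how many distinct words in the text are associated with that emotion.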
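The summing step can be sketched without pandas. The tiny lexicon below is made up for illustration and is not taken from the NRC file. Like the *isin* filter in *emotion_analyzer*, the sketch counts each distinct word once. It also converts the totals into proportions, which makes texts of different lengths easier to compare.

```python
import re

# Made-up mini lexicon: word -> set of associated emotions (illustration only).
TOY_EMOLEX = {
    "happy": {"joy", "positive"},
    "excited": {"joy", "anticipation", "positive"},
    "great": {"joy", "positive"},
    "pain": {"fear", "negative", "sadness"},
}


def emotion_totals(text, lexicon=TOY_EMOLEX):
    """Count, per emotion, how many distinct lexicon words appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    totals = {}
    for word in words.intersection(lexicon):
        for emotion in lexicon[word]:
            totals[emotion] = totals.get(emotion, 0) + 1
    return totals


def emotion_shares(totals):
    """Turn raw counts into proportions that sum to 1 (empty input stays empty)."""
    grand_total = sum(totals.values())
    return {e: n / grand_total for e, n in totals.items()} if grand_total else {}


totals = emotion_totals("I am happy and excited about this great opportunity!")
print(sorted(totals.items()))
# → [('anticipation', 1), ('joy', 3), ('positive', 3)]
```

Because repeated words count once, "happy happy happy" scores the same as "happy"; counting every occurrence instead would be an equally valid design choice.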

In [None]:
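import re

def emotion_analyzer(text, df=emolex_df):
    # Lowercase the text and extract word tokens; this strips punctuation
    # and splits on newlines as well as spaces
    words_to_check = set(re.findall(r"[a-z]+", text.lower()))

    # Keep the lexicon rows for words that occur in the text
    # (each distinct word is counted once, however often it appears)
    filtered_df = df[df["word"].isin(words_to_check)]

    # Sum each emotion column over the matched words
    sum_of_values = filtered_df.sum(numeric_only=True)

    return sum_of_values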

In [None]:
text = "I am happy and excited about this great opportunity!"
emotion_analyzer(text)

In [None]:
emotion_analyzer(lyrics)

What does the result tell you about the emotions of the text?