# **NLP GROUP PROJECT - SENTIMENT ANALYSIS**

### **- Wilson Lee**    **- Radhika Patel** **- Arshad Irfan Faisal** **- Felipe Basurto** **- Maurizio Polizzi**

Our goal is to obtain an overall sentiment score for the lyrics of each song. To select the most suitable sentiment model for our case, we created a small dataset to experiment with different pre-trained models, such as Flair, TextBlob, and amanda-cristina/finetuning-sentiment-model-4500-lyrics on Hugging Face, but their results didn't fit what we wanted. Here are the reasons:

* TextBlob model (https://textblob.readthedocs.io/en/dev/):
TextBlob is quite simple to use and gives us an overall sentiment score, but after some research comparing different models, we realized that TextBlob's accuracy is not ideal. (http://bitly.ws/IXek)

* Flair model (https://github.com/flairNLP/flair):
The problem with Flair is that it only gives us a label ('POSITIVE', 'NEGATIVE', 'NEUTRAL') as the output instead of a sentiment score, so it didn't meet our goal.

* amanda-cristina/finetuning-sentiment-model-4500-lyrics (https://huggingface.co/amanda-cristina/finetuning-sentiment-model-4500-lyrics?text=I+like+you+I+love+you):
This model returns a list containing the sentiment score of each word in the input, instead of a single numeric value for the whole song, which is why we eventually decided not to use it.

The model we finally chose is VADER (Valence Aware Dictionary and sEntiment Reasoner). Its output suits the goal of our project best, as it provides an overall score for a whole paragraph of text, and, based on our research, its accuracy is higher than TextBlob's.

--------------

**Importing libraries**

In [1]:
import re
import pandas as pd
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import nltk
import pickle

nltk.download('stopwords')
nltk.download('punkt')
nltk.download('vader_lexicon')

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\fejab\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\fejab\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\fejab\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


True

To run the sentiment analysis on the lyrics, we first need to import the dataset with the cleaned lyrics, which was created in a previous notebook. It is stored as a Parquet file instead of a CSV file because Parquet, a compressed columnar format, is more efficient to read and write.

In [2]:
df_with_lyrics = pd.read_parquet('data/df_with_lyrics.parquet')

df = df_with_lyrics.copy()

Next, we use a pre-trained sentiment analysis model called VADER (Valence Aware Dictionary and sEntiment Reasoner) to give each song a sentiment score based on its lyrics.

As described above, we experimented with Flair, TextBlob, and amanda-cristina/finetuning-sentiment-model-4500-lyrics on Hugging Face, but concluded that VADER suits the goal of our project best, as it provides an overall score for a whole paragraph of text.

More details about VADER: https://towardsdatascience.com/an-short-introduction-to-vader-3f3860208d53

In [3]:
# Create an instance of the SentimentIntensityAnalyzer
sid = SentimentIntensityAnalyzer()

# Return VADER's compound score (-1 to 1) for one song's lyrics;
# non-string values (e.g. missing lyrics) score 0, i.e. neutral
def get_sentiment_score(lyrics):
    if isinstance(lyrics, str):
        return sid.polarity_scores(lyrics)['compound']
    else:
        return 0

# Apply the get_sentiment_score function to the 'Lyrics' column
df['Sentiment_Score'] = df['Lyrics'].apply(get_sentiment_score)
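The `compound` value used above is not an average: as far as we can tell from VADER's source (both the original `vaderSentiment` package and NLTK's port), VADER sums the valence of every word, after applying its rules for negation, intensifiers, capitalization and punctuation, and then squashes that raw total with x / sqrt(x^2 + alpha), where alpha = 15. The sketch below re-implements only that normalization step (the function name `vader_normalize` is ours) to show how quickly long texts saturate towards +/-1.

```python
import math


def vader_normalize(score: float, alpha: float = 15.0) -> float:
    """Squash a raw valence total into [-1, 1], as VADER does for its compound score."""
    norm = score / math.sqrt(score * score + alpha)
    # Clip to [-1, 1] (for alpha > 0 the formula already stays inside this range)
    return max(-1.0, min(1.0, norm))


# Longer texts accumulate larger raw totals, pushing the compound score towards +/-1
for raw_total in (1, 5, 10, 15, 20, 40):
    print(raw_total, round(vader_normalize(raw_total), 4))
```

A raw total of 20 already yields a compound above 0.98, so song lyrics, which are much longer than the short social-media texts VADER was designed for, can easily end up near the extremes. This may help explain the many scores close to 1 and the +/-0.98 thresholds chosen for the sentiment types.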

In [4]:
df.head()

| Unnamed: 0 | Genre | Lyrics | Sentiment_Score |
|---|---|---|---|
| 0 | Blues | true foundation I'm lifting bloodstained Banne... | 0.9946 |
| 1 | Blues | love way spread wings Yes got sweet little ang... | 0.9774 |
| 2 | Blues | Everyday, everyday blues Ooooh, everyday, ever... | 0.9698 |
| 3 | Blues | 28 waist, 44 hips got real crazy legs upsets b... | 0.2732 |
| 4 | Blues | can't even close eyes Three o'clock morning ba... | 0.6973 |


Next, we export the sentiment scores to the main dataset.

In [12]:
lyrics = pd.read_csv('lyrics.csv')

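# pd.concat(axis=1) aligns rows on the index, so both frames must share the same row order and index
lyrics_and_sent = pd.concat([lyrics, df['Sentiment_Score']], axis=1)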
lyrics_and_sent.to_parquet('lyrics_and_sent.parquet')
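`pd.concat(..., axis=1)` pastes columns together by index label, not by row position. We have not checked here that `lyrics.csv` and the parquet-based `df` share the same index, so the toy example below (all data hypothetical) shows what happens when they do not, plus one defensive alternative that checks the row counts and then aligns by position.

```python
import pandas as pd

# Hypothetical toy data: the lyrics table has a fresh 0..2 index, while the
# scores kept the index labels of a different (e.g. filtered) frame
lyrics_toy = pd.DataFrame({"Genre": ["Blues", "Rock", "Pop"]})
scores_toy = pd.Series([0.99, -0.5, 0.1], index=[0, 2, 3], name="Sentiment_Score")

# concat(axis=1) performs an outer join on the index labels, not a positional paste
misaligned = pd.concat([lyrics_toy, scores_toy], axis=1)
print(misaligned)  # 4 rows: label 1 has no score, label 3 has no Genre

# Defensive version: check the row counts, then align by position explicitly
assert len(lyrics_toy) == len(scores_toy), "row counts differ"
aligned = pd.concat(
    [lyrics_toy.reset_index(drop=True), scores_toy.reset_index(drop=True)], axis=1
)
print(aligned)
```

Positional alignment is only correct if both files list the songs in the same order; a shared song identifier column with `merge` would be safer if one exists.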

-------------

Since VADER's compound score ranges from -1 to 1, we then map each song to one of five sentiment types based on its score.

In [5]:
# Define a function to map sentiment scores to sentiment types (We found these thresholds by experimenting with different values)
def map_sentiment_type(score):
    if score <= -0.98:
        return 'Very Negative'
    elif score <= -0.6:
        return 'Slightly Negative'
    elif score <= 0.6:
        return 'Neutral'
    elif score <= 0.98:
        return 'Slightly Positive'
    else:
        return 'Very Positive'

# Apply the map_sentiment_type function to the 'Sentiment_Score' column
df['type'] = df['Sentiment_Score'].apply(map_sentiment_type)
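The same mapping can also be written as a single vectorized call with `pd.cut`. The sketch below (the helper name `map_sentiment_types` is ours) reuses the thresholds of `map_sentiment_type`; since `pd.cut` uses right-closed bins `(a, b]` by default, each boundary value falls into the same type as with the `score <= threshold` comparisons.

```python
import numpy as np
import pandas as pd

# Same thresholds as map_sentiment_type; bins are right-closed: (a, b]
BINS = [-np.inf, -0.98, -0.6, 0.6, 0.98, np.inf]
LABELS = ["Very Negative", "Slightly Negative", "Neutral", "Slightly Positive", "Very Positive"]


def map_sentiment_types(scores: pd.Series) -> pd.Series:
    """Vectorized equivalent of applying map_sentiment_type row by row."""
    return pd.cut(scores, bins=BINS, labels=LABELS, right=True)


# Boundary values included on purpose to check they land in the same type
example = pd.Series([-0.99, -0.98, -0.7, 0.0, 0.6, 0.61, 0.98, 0.99])
print(map_sentiment_types(example).tolist())
```

Usage would be `df['type'] = map_sentiment_types(df['Sentiment_Score'])`. The result is an ordered categorical, so sorting by type follows the negative-to-positive order instead of alphabetical order.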

In [6]:
df.head()

| Unnamed: 0 | Genre | Lyrics | Sentiment_Score | type |
|---|---|---|---|---|
| 0 | Blues | true foundation I'm lifting bloodstained Banne... | 0.9946 | Very Positive |
| 1 | Blues | love way spread wings Yes got sweet little ang... | 0.9774 | Slightly Positive |
| 2 | Blues | Everyday, everyday blues Ooooh, everyday, ever... | 0.9698 | Slightly Positive |
| 3 | Blues | 28 waist, 44 hips got real crazy legs upsets b... | 0.2732 | Neutral |
| 4 | Blues | can't even close eyes Three o'clock morning ba... | 0.6973 | Slightly Positive |


In [7]:
# Count the occurrences of each sentiment type
sentiment_counts = df['type'].value_counts()

# Print the count of each sentiment type
print(sentiment_counts)

Very Positive        333
Slightly Positive    265
Slightly Negative    148
Very Negative        122
Neutral               97
Name: type, dtype: int64
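Raw counts are easier to compare as shares of all songs and in a fixed negative-to-positive order. The sketch below copies the counts printed above into a Series; on the live data, `df['type'].value_counts(normalize=True)` returns the proportions directly.

```python
import pandas as pd

# Counts copied from the value_counts() output above
sentiment_counts = pd.Series(
    {"Very Positive": 333, "Slightly Positive": 265, "Slightly Negative": 148,
     "Very Negative": 122, "Neutral": 97},
    name="type",
)

ORDER = ["Very Negative", "Slightly Negative", "Neutral", "Slightly Positive", "Very Positive"]

# Reorder from most negative to most positive and add each type's share of all songs
summary = sentiment_counts.reindex(ORDER).to_frame("count")
summary["share_%"] = (100 * summary["count"] / summary["count"].sum()).round(1)
print(summary)
print("Total songs:", summary["count"].sum())
```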


We now have a working model that scores the sentiment of our lyrics. We cannot evaluate its accuracy, since our data has no sentiment labels to compare against, so we save the model for later use.

In [9]:
import torch

In [10]:
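# Save the VADER analyzer (sid) for later use.
# Option 1: a pickle file
#with open('NLP_sentiment_model.pkl', 'wb') as f:
#    pickle.dump(sid, f)

# Option 2 (used here): torch.save, which also pickles arbitrary Python objects.
# Recent PyTorch versions default to torch.load(..., weights_only=True), so loading
# this non-tensor object back requires torch.load(..., weights_only=False).
torch.save(sid, 'NLP_sentiment_model.pt')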

-------

## **Transformers** 
Because everything can also be done with transformers.

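In this case, we use a pre-trained DistilBERT model, a distilled version of BERT that its authors report to be 40% smaller and 60% faster than BERT while retaining about 97% of its language-understanding performance. Since it was pre-trained on a huge corpus of text, it can take the context of words and sentences into account. The checkpoint we use, `distilbert-base-uncased-finetuned-sst-2-english`, is fine-tuned on SST-2 for binary sentiment classification, so for each text it returns a `POSITIVE` or `NEGATIVE` label together with the model's confidence in that label, rather than a signed sentiment score.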

The Hugging Face Transformers library makes these kinds of models easy to use.

In [52]:
from transformers import AutoTokenizer, pipeline

# Load the DistilBERT tokenizer and a sentiment-analysis pipeline for the same SST-2 checkpoint
tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased-finetuned-sst-2-english')
classifier = pipeline('sentiment-analysis', model='distilbert-base-uncased-finetuned-sst-2-english')

sentiment_scores = []
sentiment_labels = []

# Process the text and run the sentiment analysis pipeline
for lyrics in df_with_lyrics['Lyrics']:
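    # Truncate the text to at most 500 tokens (DistilBERT's maximum sequence length is 512)
    lyrics = tokenizer(lyrics, truncation=True, max_length=500)['input_ids']
    # Drop [CLS]/[SEP] when decoding; the pipeline adds them again when it re-tokenizes
    lyrics = tokenizer.decode(lyrics, skip_special_tokens=True)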
    
    result = classifier(lyrics)[0]
    sentiment_scores.append(result['score'])
    sentiment_labels.append(result['label'])

# Add the label confidence ('score') and the predicted label to the data
df_with_lyrics['Sentiment_Score'] = sentiment_scores
df_with_lyrics['Sentiment_Label'] = sentiment_labels
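The truncation above keeps only the first 500 tokens, so the end of long songs is never scored. One common workaround, sketched below under our own assumptions, is to split the token ids into consecutive windows, score each window, and average the scores weighted by window length. `chunk_ids` and `length_weighted_score` are hypothetical helpers; `score_fn` stands in for decoding a window (e.g. with `tokenizer.decode(chunk, skip_special_tokens=True)`), running the classifier on it, and returning a signed score.

```python
from typing import Callable, List, Sequence


def chunk_ids(ids: Sequence[int], max_len: int = 500) -> List[List[int]]:
    """Split a token-id sequence into consecutive windows of at most max_len ids."""
    return [list(ids[i:i + max_len]) for i in range(0, len(ids), max_len)]


def length_weighted_score(
    ids: Sequence[int],
    score_fn: Callable[[List[int]], float],
    max_len: int = 500,
) -> float:
    """Score every window with score_fn and average the results, weighted by window length."""
    chunks = chunk_ids(ids, max_len)
    if not chunks:
        return 0.0  # empty input: treat as neutral, like get_sentiment_score does
    total = sum(len(chunk) for chunk in chunks)
    return sum(score_fn(chunk) * len(chunk) for chunk in chunks) / total


# Toy check with a stand-in scorer: the first window is "positive", the rest "negative"
demo_ids = list(range(600))
print(length_weighted_score(demo_ids, lambda chunk: 1.0 if chunk[0] == 0 else -1.0))
```

The ids would come from `tokenizer(lyrics, add_special_tokens=False)['input_ids']`, so that each window leaves room for the special tokens the pipeline adds. This multiplies the number of classifier calls for long songs, which makes the runtime issue worse.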

In [54]:
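# Caution: 'Sentiment_Score' is the confidence of the predicted label (0.5 to 1), not a
# signed score, so songs labeled NEGATIVE are also mapped to positive types here
df_with_lyrics['type'] = df_with_lyrics['Sentiment_Score'].apply(map_sentiment_type)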

In [55]:
df_with_lyrics

| Unnamed: 0 | Genre | Lyrics | Sentiment_Score | Sentiment_Label | type |
|---|---|---|---|---|---|
| 0 | Blues | true foundation I'm lifting bloodstained Banne... | 0.998819 | POSITIVE | Very Positive |
| 1 | Blues | love way spread wings Yes got sweet little ang... | 0.680024 | POSITIVE | Slightly Positive |
| 2 | Blues | Everyday, everyday blues Ooooh, everyday, ever... | 0.986249 | NEGATIVE | Very Positive |
| 3 | Blues | 28 waist, 44 hips got real crazy legs upsets b... | 0.735240 | NEGATIVE | Slightly Positive |
| 4 | Blues | can't even close eyes Three o'clock morning ba... | 0.995777 | NEGATIVE | Very Positive |
| ... | ... | ... | ... | ... | ... |
| 995 | Rock | used love her, kill used love her, hm yeah, ki... | 0.986992 | NEGATIVE | Very Positive |
| 996 | Rock | now, know say goodbye then, seems seen eyes Th... | 0.915832 | POSITIVE | Slightly Positive |
| 997 | Rock | could see tomorrow, plans? one live sorrow, as... | 0.912260 | POSITIVE | Slightly Positive |
| 998 | Rock | One, two, one, two, three, four \*Whistling\* Sh... | 0.836154 | NEGATIVE | Slightly Positive |
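In the table above, songs labeled `NEGATIVE` still end up as "Very Positive" or "Slightly Positive", because the pipeline's `score` is the probability of the predicted label and, for this binary model, is always at least 0.5. A minimal fix, sketched here with a helper name of our own (`signed_sentiment`), is to negate the confidence of `NEGATIVE` predictions so that the score carries a sign like VADER's compound.

```python
def signed_sentiment(result: dict) -> float:
    """Turn one pipeline result into a signed score in [-1, 1].

    POSITIVE keeps its confidence, NEGATIVE gets the negated confidence.
    """
    score = float(result["score"])
    return score if result["label"] == "POSITIVE" else -score


# Results shaped like the pipeline output, using rows 0 and 2 of the table above
rows = [
    {"label": "POSITIVE", "score": 0.998819},
    {"label": "NEGATIVE", "score": 0.986249},
]
print([signed_sentiment(r) for r in rows])
```

With the columns already computed, the same result is `df_with_lyrics['Sentiment_Score'].where(df_with_lyrics['Sentiment_Label'] == 'POSITIVE', -df_with_lyrics['Sentiment_Score'])`. The signed values still lie in [-1, -0.5] or [0.5, 1], so they are not fully on the same scale as VADER's compound score.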


With an average runtime of 4 minutes, this model is still too heavy for our real-time application, so we are going to use the VADER model for inference.

What we can do instead is assume that the Transformer model's scores are more accurate, treat them as ground truth, and use them to evaluate the VADER model.

In [63]:
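import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score, explained_variance_score

# Extract the relevant columns for scoring
vader_scores = df['Sentiment_Score']                    # VADER compound scores (-1 to 1)
transformer_scores = df_with_lyrics['Sentiment_Score']  # DistilBERT label confidences (0.5 to 1)

# Note: scikit-learn's metrics take (y_true, y_pred). As written, VADER is passed as y_true;
# to treat the transformer as ground truth, swap the arguments (this changes only R2 and EVS)

# Calculate the mean squared error (MSE)
mse = mean_squared_error(vader_scores, transformer_scores)

# Calculate the mean absolute error (MAE)
mae = mean_absolute_error(vader_scores, transformer_scores)

# Calculate the root mean squared error (RMSE); the squared=False argument of
# mean_squared_error was deprecated in scikit-learn 1.4 and later removed
rmse = np.sqrt(mse)

# Calculate the R-squared (R2) score
r2 = r2_score(vader_scores, transformer_scores)

# Calculate the explained variance score
evs = explained_variance_score(vader_scores, transformer_scores)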

# Print the regression metrics
print("Mean Squared Error (MSE):", mse)
print("Mean Absolute Error (MAE):", mae)
print("Root Mean Squared Error (RMSE):", rmse)
print("R-squared (R2) Score:", r2)
print("Explained Variance Score (EVS):", evs)

Mean Squared Error (MSE): 1.1282304204940032
Mean Absolute Error (MAE): 0.6741211997841554
Root Mean Squared Error (RMSE): 1.062181914972197
R-squared (R2) Score: -0.578678373915376
Explained Variance Score (EVS): -0.059185035522111074


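Based on these metrics, VADER's scores disagree strongly with the transformer's, as indicated by the relatively high MSE, MAE, and RMSE values. Additionally, the negative R2 and EVS scores suggest that the predictions perform worse than a baseline that always predicts the mean. Part of this gap comes from the comparison itself: the transformer's `score` is the confidence of its predicted label (always between 0.5 and 1), not a signed sentiment, so a song labeled NEGATIVE with high confidence is compared against VADER as if it were strongly positive. It may be necessary to improve the model or the comparison, or to consider alternative approaches, to achieve better accuracy in predicting sentiment scores.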

However, since sentiment is not an objective metric, we can't really conclude that the VADER model is performing poorly; moreover, this comparison is only a test and has no bearing on our real-time application.
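Because the transformer is a binary classifier, a more direct check than regression metrics is how often VADER's polarity agrees with the transformer's label. The sketch below is a hypothetical helper (`sign_agreement`, our name); it skips songs whose compound score lies inside (-0.05, 0.05), the band the VADER authors suggest treating as neutral, since the transformer cannot output a neutral label.

```python
import numpy as np


def sign_agreement(vader_compound, transformer_labels, neutral_band=0.05):
    """Share of songs whose VADER polarity matches the transformer's binary label.

    Songs with |compound| < neutral_band are skipped: VADER would call them
    neutral, a class the binary transformer cannot output.
    """
    compound = np.asarray(vader_compound, dtype=float)
    is_positive = np.asarray(transformer_labels) == "POSITIVE"
    keep = np.abs(compound) >= neutral_band
    if not keep.any():
        return float("nan")
    return float(np.mean((compound[keep] > 0) == is_positive[keep]))


# Rows 0-4 from the two tables above (illustration only, not an evaluation)
vader_first5 = [0.9946, 0.9774, 0.9698, 0.2732, 0.6973]
labels_first5 = ["POSITIVE", "POSITIVE", "NEGATIVE", "NEGATIVE", "NEGATIVE"]
print(sign_agreement(vader_first5, labels_first5))
```

On the full data, this would be called as `sign_agreement(df['Sentiment_Score'], df_with_lyrics['Sentiment_Label'])`.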