# Sentiment Analysis

#### Using distilbert-base-uncased-emotion

https://huggingface.co/bhadresh-savani/distilbert-base-uncased-emotion?text=I+feel+a+bit+let+down

- Classifies the emotion expressed in a sentence, taking context into account.
- The base model is DistilBERT, a distilled version of BERT.
- Output labels: sadness, joy, love, anger, fear, surprise.
- DistilBERT uses knowledge distillation to compress BERT into a smaller model that runs about 60% faster.
- This model is fine-tuned on a Twitter emotion dataset.
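
As a rough illustration of what these labels look like in practice, the sketch below ranks a score list in the shape the pipeline returns when all scores are requested. The numbers are copied from a prediction recorded later in this notebook; the helper name `rank_emotions` is made up for this example and is not part of `transformers`.

```python
# One inner list of {'label', 'score'} dicts per input text.
# Values are copied from a prediction recorded later in this notebook.
prediction = [[
    {'label': 'sadness', 'score': 0.40335676074028015},
    {'label': 'joy', 'score': 0.007068042177706957},
    {'label': 'love', 'score': 0.0018133686389774084},
    {'label': 'anger', 'score': 0.5796093344688416},
    {'label': 'fear', 'score': 0.007454819045960903},
    {'label': 'surprise', 'score': 0.0006976505974307656},
]]


def rank_emotions(prediction):
    """Return (label, score) pairs for the first input, highest score first."""
    scores = prediction[0]
    return sorted(((s['label'], s['score']) for s in scores),
                  key=lambda pair: pair[1], reverse=True)


ranked = rank_emotions(prediction)
top_label, top_score = ranked[0]
total = sum(score for _, score in ranked)

print(top_label, round(top_score, 3))  # anger 0.58
print(round(total, 3))                 # 1.0
```

The six scores add up to roughly one, so the runner-up (here sadness at about 0.40) can be worth inspecting when the top score is not dominant.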

### Import Libraries & Pretrained Model

In [2]:
from transformers import pipeline

In [3]:
classifier = pipeline("text-classification",model='bhadresh-savani/distilbert-base-uncased-emotion', return_all_scores=True)



In [4]:
import pandas as pd
import numpy as np

In [5]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [8]:
path_to_zip = "/content/drive/MyDrive/AppliedCV_Spring2024/complete_data.gz"
complete_dataset = pd.read_csv(path_to_zip, compression='gzip')

In [9]:
complete_dataset

Unnamed: 0.1,Unnamed: 0,track,artist,uri,danceability,energy,key,loudness,mode,speechiness,...,valence,tempo,duration_ms,time_signature,chorus_hit,sections,target,bucket,genre,lyrics
0,0,Wednesday Night Prayer Meeting - Alternative Take,Charles Mingus,spotify:track:42IqaW4VDhFomQMojirykk,0.377,0.4530,5,-14.669,0,0.0636,...,0.6070,126.930,341333,3,35.14482,15,0,WednesdayNightPrayerMeeting-AlternativeTake-Ch...,r&b,3 ContributorsChapter 2 LyricsCHAPTER TWO\n\nT...
1,1,Where The Little Jesus Sleeps,Harry Belafonte,spotify:track:5LZTD5POD0t1sQ2lRGHtt7,0.460,0.0403,5,-24.472,1,0.0543,...,0.1560,87.746,129267,4,43.11708,6,0,WhereTheLittleJesusSleeps-HarryBelafonte.wav,edm,3 ContributorsWhere the Little Jesus Sleeps Ly...
2,2,Jehovah The Lord Will Provide,Harry Belafonte,spotify:track:6lVYbSu7JRob3PVhUtD26l,0.343,0.1180,7,-18.652,1,0.0642,...,0.1660,87.256,179507,3,33.79222,8,0,JehovahTheLordWillProvide-HarryBelafonte.wav,rock,2 ContributorsJehovah the Lord Will Provide Ly...
3,3,Silent Night,Harry Belafonte,spotify:track:28Z2DoBv5PldDhygTpKUFe,0.315,0.0685,0,-21.441,1,0.0381,...,0.1800,89.775,219453,3,20.08166,9,0,SilentNight-HarryBelafonte.wav,rock,"3 ContributorsSilent Night LyricsSilent night,..."
4,4,The Baby Boy,Harry Belafonte,spotify:track:5Hs7dFbMFpeLlT7Z5EGMk4,0.601,0.0288,4,-21.441,1,0.0675,...,0.5150,127.622,205867,4,64.98770,6,0,TheBabyBoy-HarryBelafonte.wav,pop,2 ContributorsThe Baby Boy LyricsThe Virgin Ma...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
28889,28889,Lotus Flowers,Yolta,spotify:track:4t1TljQWJ6ZuoSY67zVvBI,0.172,0.3580,9,-14.430,1,0.0342,...,0.0361,72.272,150857,4,24.30824,7,0,LotusFlowers-Yolta.wav,rock,19 ContributorsVirgin Lyrics[Intro]\n(We got L...
28890,28890,Calling My Spirit,Kodak Black,spotify:track:2MShy1GSSgbmGUxADNIao5,0.910,0.3660,1,-9.954,1,0.0941,...,0.7400,119.985,152000,4,32.53856,8,1,CallingMySpirit-KodakBlack.wav,pop,114 ContributorsCalling My Spirit Lyrics[Verse...
28891,28891,Teenage Dream,Katy Perry,spotify:track:55qBw1900pZKfXJ6Q9A2Lc,0.719,0.8040,10,-4.581,1,0.0355,...,0.6050,119.999,227760,4,20.73371,7,1,TeenageDream-KatyPerry.wav,pop,66 ContributorsTranslationsItalianoTeenage Dre...
28892,28892,Stormy Weather,Oscar Peterson,spotify:track:4o9npmYHrOF1rUxxTVH8h4,0.600,0.1770,7,-16.070,1,0.0561,...,0.5600,120.030,213387,4,21.65301,14,0,StormyWeather-OscarPeterson.wav,pop,2 ContributorsStormy Weather LyricsDon't know ...


In [16]:
complete_dataset['track'] = complete_dataset['track'].str.replace(r'\s+', '', regex=True)
complete_dataset['artist'] = complete_dataset['artist'].str.replace(r'\s+', '', regex=True)
complete_dataset['file'] = complete_dataset['track'] + '-' + complete_dataset['artist']
complete_dataset.drop(['track', 'artist'], axis=1, inplace=True)
complete_dataset = complete_dataset[['file'] + [col for col in complete_dataset.columns if col != 'file']]
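
For reference, the key built in the cell above can be reproduced for a single track with plain Python. `make_song_key` is a hypothetical helper written for this sketch, not something the notebook defines; the sample values come from the table shown earlier.

```python
import re


def make_song_key(track, artist):
    """Drop all whitespace from both fields and join them with a hyphen."""
    squeeze = lambda text: re.sub(r'\s+', '', text)
    return f"{squeeze(track)}-{squeeze(artist)}"


print(make_song_key("Where The Little Jesus Sleeps", "Harry Belafonte"))
# WhereTheLittleJesusSleeps-HarryBelafonte
```

The key is case-sensitive and keeps punctuation, so it only matches `song_name` values that were built the same way.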

In [34]:
complete_dataset.rename(columns={'file': 'song_name'}, inplace=True)

In [10]:
output = pd.read_csv("/content/drive/MyDrive/AppliedCV_Spring2024/data_output.csv")

In [35]:
subset_df = complete_dataset[complete_dataset['song_name'].isin(output['song_name'])]
subset_df = subset_df.drop_duplicates(subset=['song_name'])

In [36]:
subset_df.shape

(992, 22)

In [37]:
output['song_name'].isin(subset_df['song_name']).sum()

992

### Match order of lyrics column to the output

In [44]:
# Map each song name to its row position in `output`, then sort by that position.
order = np.array(output['song_name'])
order_mapping = {value: index for index, value in enumerate(order)}
reordered_df = (
  subset_df[subset_df['song_name'].isin(order)]
  .assign(order_key=lambda x: x['song_name'].map(order_mapping))
  .sort_values(by='order_key')
  .drop(columns='order_key')
  .reset_index(drop=True)
)
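
An alternative way to get the same ordering, sketched here on toy data, is a left merge: pandas keeps the key order of the left frame for `how='left'`, so no explicit sort key is needed. The frames `output_toy` and `subset_toy` are invented stand-ins, and the sketch assumes `song_name` is unique in both frames (duplicates would multiply rows).

```python
import pandas as pd

# Toy stand-ins for `output` and `subset_df`; the real frames have more columns.
output_toy = pd.DataFrame({'song_name': ['c-song', 'a-song', 'b-song']})
subset_toy = pd.DataFrame({
    'song_name': ['a-song', 'b-song', 'c-song'],
    'lyrics': ['la la', 'do re mi', '0'],
})

# A left merge follows the row order of the left frame.
reordered_toy = output_toy[['song_name']].merge(subset_toy, on='song_name', how='left')

print(reordered_toy['song_name'].tolist())  # ['c-song', 'a-song', 'b-song']
print(reordered_toy['lyrics'].tolist())     # ['0', 'la la', 'do re mi']
```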

In [67]:
reordered_df.iloc[1, 0]

'thinkingofyou-mommy'

In [77]:
# 145 of the 992 songs have no lyrics (stored as the string '0').
# For those rows, use the song name as the "lyrics" and predict sentiment from it.

no_lyrics = reordered_df['lyrics'] == '0'
reordered_df.loc[no_lyrics, 'lyrics'] = reordered_df.loc[no_lyrics, 'song_name']

In [78]:
reordered_df['lyrics'].iloc[1]

'thinkingofyou-mommy'

In [59]:
output.iloc[0, 0]

'WhatTheWaterGaveMe-Florence+TheMachine'

In [79]:
classify = lambda lyrics: max(classifier(lyrics)[0], key=lambda x: x['score'])['label']

'joy'

In [98]:
output.shape

(992, 6)

In [134]:
sentiment_pred = []

# Return the highest-scoring emotion label for a piece of text.
classify = lambda lyrics: max(classifier(lyrics)[0], key=lambda x: x['score'])['label']

def classify_with_error(lyrics, i):
  """Classify `lyrics`; on failure, print row index `i` with the error and return None."""
  try:
    return classify(lyrics)
  except Exception as e:
    print(i, ":", e)
    return None

# for row in range(len(output)):
#   print(f"row: {row}")
#   sentiment_pred.append(classify_with_error(reordered_df.iloc[row, -1], row))


In [110]:
reordered_df['lengths'] = reordered_df['lyrics'].apply(len)

In [140]:
max_length = 500  # character cap: start at 1000 and lower it if the classifier raises length errors
truncate = lambda x: x[:max_length]  # slicing leaves shorter strings unchanged
reordered_df['truncated_lyrics'] = reordered_df['lyrics'].apply(truncate)

In [137]:
sentiment_pred = [""]*len(output)
for i in range(len(output)):
  sentiment_pred[i] = classify_with_error(reordered_df['truncated_lyrics'].iloc[i],i)

92 : The size of tensor a (523) must match the size of tensor b (512) at non-singleton dimension 1
125 : The size of tensor a (515) must match the size of tensor b (512) at non-singleton dimension 1
459 : The size of tensor a (611) must match the size of tensor b (512) at non-singleton dimension 1
577 : The size of tensor a (632) must match the size of tensor b (512) at non-singleton dimension 1
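
The errors above are raised because the tokenized lyrics exceed the model's 512-position limit; a character cap only controls the token count indirectly. One alternative, sketched here under assumptions, is to let the tokenizer do the cutting by passing `truncation=True` and `max_length=512` when calling the pipeline, which the text-classification pipeline is assumed to forward to its tokenizer. The function takes the classifier as an argument, and `fake_classifier` is a stand-in invented for this example so the sketch runs without downloading the model.

```python
def classify_truncated(classifier, lyrics, max_tokens=512):
    """Return the top label, asking the tokenizer to cut inputs at max_tokens."""
    # truncation / max_length are assumed to be passed through to the
    # tokenizer by the Hugging Face text-classification pipeline.
    scores = classifier(lyrics, truncation=True, max_length=max_tokens)[0]
    return max(scores, key=lambda s: s['score'])['label']


def fake_classifier(text, **tokenizer_kwargs):
    """Hypothetical stand-in that mimics the pipeline's nested output shape."""
    assert tokenizer_kwargs.get('truncation') is True
    return [[{'label': 'joy', 'score': 0.9}, {'label': 'sadness', 'score': 0.1}]]


print(classify_truncated(fake_classifier, "clap along " * 1000))  # joy
```

With the pipeline created in this notebook, the call would be `classify_truncated(classifier, lyrics)`; this keeps the first 512 tokens of each song rather than the first 500 characters.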


In [141]:
retry = [92, 125, 459, 577]
for i in retry:
  sentiment_pred[i] = classify_with_error(reordered_df['truncated_lyrics'].iloc[i],i)
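
The retry list above is typed in by hand from the printed errors. A possible variation, shown as a sketch with made-up names (`classify_all`, `toy_classify`), records the failing indices while looping so the retry step can reuse them directly. The toy classifier only imitates a length limit and is not the real model.

```python
def classify_all(texts, classify_one):
    """Classify every text; return (labels, failed), where failed lists bad indices."""
    labels, failed = [], []
    for i, text in enumerate(texts):
        try:
            labels.append(classify_one(text))
        except Exception as err:
            print(i, ":", err)
            labels.append(None)
            failed.append(i)
    return labels, failed


def toy_classify(text):
    """Hypothetical classifier that rejects long inputs, like the 512-token limit."""
    if len(text) > 10:
        raise RuntimeError("input too long")
    return 'joy'


texts = ["short", "far too long for the toy model", "fine"]
labels, failed = classify_all(texts, toy_classify)

# Retry only the failures, this time on a shortened copy of the text.
for i in failed:
    labels[i] = toy_classify(texts[i][:10])

print(failed)  # [1]
print(labels)  # ['joy', 'joy', 'joy']
```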

In [142]:
output["sentiment"] = sentiment_pred

In [144]:
output

Unnamed: 0,song_name,genre_output,popularity_output,danceability_output,energy_output,sentiment
0,WhatTheWaterGaveMe-Florence+TheMachine,3,0,[0.63750064],[0.61099994],sadness
1,thinkingofyou-mommy,3,0,[0.527886],[0.3540758],joy
2,Cantando-VicenteFernández,4,0,[0.64045924],[0.617881],joy
3,Blackbird-LordMelody,3,0,[0.70685804],[0.30188432],sadness
4,Toss-Up-N2Deep,3,0,[0.7421246],[0.7239125],joy
...,...,...,...,...,...,...
987,BulletWithButterflyWings-TheSmashingPumpkins,4,0,[0.6430634],[0.75621456],joy
988,HappyWithYou-SamanthaCole,3,0,[0.72664696],[0.6498237],joy
989,IWantCandy-BowWowWow,4,0,[0.65332144],[0.73874915],joy
990,SomethingInMyHouse-DeadOrAlive,4,0,[0.73544574],[0.83352506],fear


In [145]:
output.to_csv("/content/drive/MyDrive/AppliedCV_Spring2024/output_w_sentiment.csv", index=False)

### Example Song Lyrics

#### Happy by Pharrell Williams

In [None]:
lyrics = """It might seem crazy what I'm 'bout to say
Sunshine she's here, you can take a break
I'm a hot air balloon that could go to space
With the air, like I don't care, baby, by the way

(Because I'm happy)
Clap along if you feel like a room without a roof
(Because I'm happy)
Clap along if you feel like happiness is the truth
(Because I'm happy)
Clap along if you know what happiness is to you
(Because I'm happy)
Clap along if you feel like that's what you wanna do

Here come bad news, talking this and that (Yeah!)
Well, give me all you got, don't hold it back (Yeah!)
Well, I should probably warn ya, I'll be just fine (Yeah!)
No offense to you, don’t waste your time, here's why

(Because I'm happy)
Clap along if you feel like a room without a roof
(Because I'm happy)
Clap along if you feel like happiness is the truth
(Because I'm happy)
Clap along if you know what happiness is to you
(Because I'm happy)
Clap along if you feel like that's what you wanna do
"""


In [None]:
prediction = classifier(lyrics)
print(prediction)

NameError: name 'classifier' is not defined

In [None]:
max_label = max(prediction[0], key=lambda x: x['score'])['label']
print(max_label)

joy


#### Wrecking Ball by Miley Cyrus

In [None]:
lyrics = """
[Verse 1]
We clawed, we chained our hearts in vain
We jumped, never asking why
We kissed, I fell under your spell
A love no one could deny

[Pre-Chorus]
Don't you ever say I just walked away
I will always want you
I can't live a lie, running for my life
I will always want you

[Chorus]
I came in like a wrecking ball
I never hit so hard in love
All I wanted was to break your walls
All you ever did was wreck me
Yeah, you, you wreck me

[Verse 2]
I put you high up in the sky
And now, you're not coming down
It slowly turned, you let me burn
And now, we're ashes on the ground

[Pre-Chorus]
Don't you ever say I just walked away
I will always want you
I can't live a lie, running for my life
I will always want you
"""

In [None]:
prediction = classifier(lyrics)
print(prediction)
max_label = max(prediction[0], key=lambda x: x['score'])['label']
print(max_label)

[[{'label': 'sadness', 'score': 0.40335676074028015}, {'label': 'joy', 'score': 0.007068042177706957}, {'label': 'love', 'score': 0.0018133686389774084}, {'label': 'anger', 'score': 0.5796093344688416}, {'label': 'fear', 'score': 0.007454819045960903}, {'label': 'surprise', 'score': 0.0006976505974307656}]]
anger


#### Bad Guy by Billie Eilish

In [None]:
lyrics = """
[Verse 1]
White shirt now red, my bloody nose
Sleepin', you're on your tippy toes
Creepin' around like no one knows
Think you're so criminal
Bruises on both my knees for you
Don't say thank you or please
I do what I want when I'm wanting to
My soul? So cynical

[Chorus]
So you're a tough guy
Like it really rough guy
Just can't get enough guy
Chest always so puffed guy
I'm that bad type
Make your mama sad type
Make your girlfriend mad tight
Might seduce your dad type
I'm the bad guy
Duh

[Post-Chorus]
I'm the bad guy

[Verse 2]
I like it when you take control
Even if you know that you don't
Own me, I'll let you play the role
I'll be your animal
My mommy likes to sing along with me
But she won't sing this song
If she reads all the lyrics
She'll pity the men I know

[Chorus]
So you're a tough guy
Like it really rough guy
Just can't get enough guy
Chest always so puffed guy
I'm that bad type
Make your mama sad type
Make your girlfriend mad tight
Might seduce your dad type
I'm the bad guy
Duh

[Post-Chorus]
I'm the bad guy, duh
I'm only good at bein' bad, bad

[Bridge]
I like when you get mad
I guess I'm pretty glad that you're alone
You said she's scared of me?
I mean, I don't see what she sees
But maybe it's 'cause I'm wearing your cologne

[Outro]
I'm a bad guy
I'm, I'm a bad guy
Bad guy, bad guy
I'm a bad
"""

In [None]:
prediction = classifier(lyrics)
print(prediction)
max_label = max(prediction[0], key=lambda x: x['score'])['label']
print(max_label)

[[{'label': 'sadness', 'score': 0.40335676074028015}, {'label': 'joy', 'score': 0.007068042177706957}, {'label': 'love', 'score': 0.0018133686389774084}, {'label': 'anger', 'score': 0.5796093344688416}, {'label': 'fear', 'score': 0.007454819045960903}, {'label': 'surprise', 'score': 0.0006976505974307656}]]
anger


In [None]:
outputs = pd.read_csv("../../mnt/disks/songsnap/data_output.csv")

FileNotFoundError: [Errno 2] No such file or directory: '../../mnt/disks/songsnap/data_output.csv'