## Where to get song lyrics

There are a few commonly used ways to download the lyrics for a given song:

* Webscraping from YTMusic
* Webscraping from Genius
* Webscraping from AZLyrics
* Retrieving from MusixMatch

I've tried all four. YTMusic was inconsistent and could not be relied on. Genius lyrics were fine but required parsing large amounts of data. AZLyrics was the best because its pages are lightweight, but it has aggressive IP blocking, so using it would require a proxy.

To avoid paying for a proxy, I decided to explore the fourth route. MusixMatch is the lyrics provider that Spotify itself uses. Unfortunately, MusixMatch offers only 30% of a song's lyrics for free; to obtain more, you have to contact MusixMatch directly and request a quote.

For this POC, we will use MusixMatch to obtain song lyrics so we can perform semantic analysis. The code is modular by design, so we could easily swap in AZLyrics plus a proxy in the future.
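
One way to keep the lyric source swappable is to hide it behind a small interface. The sketch below is an assumption about how that could look, not code from this notebook: `LyricsProvider`, `StaticProvider`, and `fetch_all` are hypothetical names, and a real MusixMatch or AZLyrics provider would implement the same `get_lyrics` method.

```python
from typing import Dict, List, Optional, Protocol


class LyricsProvider(Protocol):
    """Hypothetical interface: anything that can look up lyrics for a track."""

    def get_lyrics(self, track_name: str, artist_name: str) -> Optional[str]:
        ...


class StaticProvider:
    """Stand-in provider backed by a dict, usable without network access."""

    def __init__(self, lyrics_by_track: Dict[str, str]) -> None:
        self._lyrics_by_track = lyrics_by_track

    def get_lyrics(self, track_name: str, artist_name: str) -> Optional[str]:
        # The artist is ignored here; a real provider would use it in the lookup.
        return self._lyrics_by_track.get(track_name)


def fetch_all(
    provider: LyricsProvider, track_names: List[str], artist_name: str
) -> Dict[str, Optional[str]]:
    """Map each track name to its lyrics, or None when the provider has none."""
    return {name: provider.get_lyrics(name, artist_name) for name in track_names}


provider = StaticProvider({"Some Track": "la la la"})
result = fetch_all(provider, ["Some Track", "Missing Track"], "Some Artist")
print(result)
```

Because `fetch_all` depends only on the `get_lyrics` method, replacing the provider would not require changes to the code that consumes the lyrics.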

In [16]:
import os
from musixmatch import Musixmatch
import pandas as pd

key = os.environ.get("MUSIXMATCH_API")
musixmatch = Musixmatch(key)

## Using MusixMatch to obtain song lyrics

For this I'll be using the very helpful [python-musixmatch](https://github.com/hudsonbrendon/python-musixmatch) library.

### Loading in song database

To stress test MusixMatch, I picked a niche rap artist with a potentially hard-to-parse name and a large library of somewhat oddly named songs.

In [17]:
file = "out2.csv"
artist_name = "$uicideboy$"
track_info = pd.read_csv(file,index_col=0)
track_info

Unnamed: 0,album_name,track_name,release_date,popularity,duration_ms,acousticness,danceability,energy,instrumentalness,key,liveness,loudness,mode,speechiness,tempo,time_signature,valence
49YpGS0rVcRLtiDvx5JQyp,DIRTIESTNASTIEST$UICIDE,Sorry for the Delay,2022-12-16,0.0,172399,0.00951,0.787,0.889,0.000322,2.0,0.6520,-3.125,1,0.1280,156.027,4,0.677
5dol1hrERJOReznLRJ2VVQ,DIRTIESTNASTIEST$UICIDE,BUCKHEAD,2022-12-16,0.0,183919,0.00026,0.759,0.833,0.057300,11.0,0.1780,-5.010,1,0.0779,140.026,4,0.522
3QQXpvZd9qmzHZ02wDf2im,DIRTIESTNASTIEST$UICIDE,I Dream of Chrome,2022-12-16,0.0,145842,0.04840,0.840,0.934,0.000000,0.0,0.0961,-3.717,1,0.1190,149.994,4,0.670
1UsvO5U72YRU8Xnq8Lp14O,DIRTIESTNASTIEST$UICIDE,Champagne Face,2022-12-16,0.0,140288,0.02310,0.894,0.767,0.000024,10.0,0.5740,-4.695,0,0.1370,144.077,4,0.412
2CkpD7gqMXrrpTCJ9TZ0bw,DIRTIESTNASTIEST$UICIDE,The Serpent and the Rainbow,2022-12-16,0.0,177289,0.00147,0.780,0.780,0.000000,0.0,0.4720,-2.857,1,0.0858,118.014,4,0.446
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4Gy5kycvHxatuBiNQBCPA6,KILL YOURSELF Part I: The $uicide Saga,Kill Yourself,2014-01-01,0.0,174602,0.48900,0.856,0.709,0.000085,10.0,0.0845,-6.976,0,0.0420,110.063,4,0.403
2LRyFZnPdogRD8fjdj0gHr,KILL YOURSELF Part I: The $uicide Saga,Mask & Da Glock,2014-01-01,0.0,247483,0.02310,0.759,0.882,0.000015,9.0,0.5440,-5.996,1,0.1840,129.992,4,0.570
5SN1ffDyC7OtMlZjdOKgHZ,KILL YOURSELF Part I: The $uicide Saga,Maple Syrup,2014-01-01,0.0,171002,0.04770,0.602,0.858,0.039000,2.0,0.2740,-6.671,1,0.0716,129.859,4,0.580
6cDsdfgV7UHdDc2AokAylv,KILL YOURSELF Part I: The $uicide Saga,Kill Yourself - Leaned Out Remix,2014-01-01,0.0,206158,0.10700,0.649,0.738,0.071500,0.0,0.3040,-5.972,0,0.0452,93.245,4,0.523


### Obtaining Song Lyrics

As the output below shows, very few of the 189 songs are dropped.

In [18]:
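for track_id in track_info.index:
    track_name = track_info.loc[track_id,"track_name"]
    try:
        # Look up lyrics by track title and artist name
        lyrics_res = musixmatch.matcher_lyrics_get(track_name,artist_name)
        # If the response has no lyrics at this nested path, the lookup raises
        lyrics = lyrics_res["message"]["body"]["lyrics"]["lyrics_body"]
        track_info.loc[track_id,"lyrics"] = lyrics
    except Exception:
        # Leave the lyrics cell empty for this song and report the miss
        print("Failed to find lyrics for " + track_name)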

Failed to find lyrics for THE_EVIL_THAT_MEN_DO
Failed to find lyrics for One Last Look at the Damage
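
The nested lookup in the loop above can also be written as a helper that returns `None` instead of raising, which avoids a broad `try`/`except`. This is a minimal sketch: `extract_lyrics` is a hypothetical name, the key path is the one used in the cell above, and the shape of the `missing` response is an assumption made for illustration.

```python
from typing import Any, Optional


def extract_lyrics(response: Any) -> Optional[str]:
    """Walk message -> body -> lyrics -> lyrics_body; return None if a step is missing."""
    node = response
    for key in ("message", "body", "lyrics", "lyrics_body"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    # Treat empty or non-string lyrics as missing.
    return node if isinstance(node, str) and node.strip() else None


found = {"message": {"body": {"lyrics": {"lyrics_body": "some lyrics"}}}}
missing = {"message": {"body": []}}  # assumed shape of a response with no match

print(extract_lyrics(found))
print(extract_lyrics(missing))
```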


In [None]:
track_info

Unnamed: 0,album_name,track_name,release_date,popularity,duration_ms,acousticness,danceability,energy,instrumentalness,key,liveness,loudness,mode,speechiness,tempo,time_signature,valence,lyrics
49YpGS0rVcRLtiDvx5JQyp,DIRTIESTNASTIEST$UICIDE,Sorry for the Delay,2022-12-16,0.0,172399,0.00951,0.787,0.889,0.000322,2.0,0.6520,-3.125,1,0.1280,156.027,4,0.677,"(Pimp type brotha from the underground, G, Ski..."
5dol1hrERJOReznLRJ2VVQ,DIRTIESTNASTIEST$UICIDE,BUCKHEAD,2022-12-16,0.0,183919,0.00026,0.759,0.833,0.057300,11.0,0.1780,-5.010,1,0.0779,140.026,4,0.522,"(You did good, $lick)\n\nOoh ooooh!\nYeah, yea..."
3QQXpvZd9qmzHZ02wDf2im,DIRTIESTNASTIEST$UICIDE,I Dream of Chrome,2022-12-16,0.0,145842,0.04840,0.840,0.934,0.000000,0.0,0.0961,-3.717,1,0.1190,149.994,4,0.670,"(Do the beam-up, Scotty)\n(Bubble bath and get..."
1UsvO5U72YRU8Xnq8Lp14O,DIRTIESTNASTIEST$UICIDE,Champagne Face,2022-12-16,0.0,140288,0.02310,0.894,0.767,0.000024,10.0,0.5740,-4.695,0,0.1370,144.077,4,0.412,"(Check the clock, we're running out of time)\n..."
2CkpD7gqMXrrpTCJ9TZ0bw,DIRTIESTNASTIEST$UICIDE,The Serpent and the Rainbow,2022-12-16,0.0,177289,0.00147,0.780,0.780,0.000000,0.0,0.4720,-2.857,1,0.0858,118.014,4,0.446,"(Yeah, fuck)\n(Should a nigga plan a hit)\n(Or..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4Gy5kycvHxatuBiNQBCPA6,KILL YOURSELF Part I: The $uicide Saga,Kill Yourself,2014-01-01,0.0,174602,0.48900,0.856,0.709,0.000085,10.0,0.0845,-6.976,0,0.0420,110.063,4,0.403,This a $crim beat\nDJ $crim with that 808\n\n$...
2LRyFZnPdogRD8fjdj0gHr,KILL YOURSELF Part I: The $uicide Saga,Mask & Da Glock,2014-01-01,0.0,247483,0.02310,0.759,0.882,0.000015,9.0,0.5440,-5.996,1,0.1840,129.992,4,0.570,This a $crim beat\nHere comes the pain!\nDJ $c...
5SN1ffDyC7OtMlZjdOKgHZ,KILL YOURSELF Part I: The $uicide Saga,Maple Syrup,2014-01-01,0.0,171002,0.04770,0.602,0.858,0.039000,2.0,0.2740,-6.671,1,0.0716,129.859,4,0.580,This a $crim beat\nYung Christ (Yeah hoe)\n59 ...
6cDsdfgV7UHdDc2AokAylv,KILL YOURSELF Part I: The $uicide Saga,Kill Yourself - Leaned Out Remix,2014-01-01,0.0,206158,0.10700,0.649,0.738,0.071500,0.0,0.3040,-5.972,0,0.0452,93.245,4,0.523,This a $crim beat\nDJ $crim with that 808\n\n$...


In [None]:
track_info.to_csv("track-info-lyrics.csv")
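
Before scoring the lyrics, it may be worth removing any text that is not part of the song. The sketch below assumes that free-tier `lyrics_body` values can end with a notice line that begins with a run of asterisks; I have not verified that against this dataset, so inspect a few values first. `strip_trailing_notice` and the `marker` default are hypothetical.

```python
def strip_trailing_notice(lyrics: str, marker: str = "*******") -> str:
    """Drop everything from the first line that starts with `marker` onward."""
    kept = []
    for line in lyrics.splitlines():
        if line.strip().startswith(marker):
            break
        kept.append(line)
    # Remove the blank lines left behind before the notice.
    return "\n".join(kept).rstrip()


# Made-up sample text, not real API output.
sample = "first line\nsecond line\n...\n\n******* example notice *******\n(12345)"
cleaned = strip_trailing_notice(sample)
print(repr(cleaned))
```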

## Using Vader for Sentiment Intensity Analysis

Ideally, we want a pre-trained model to do all of our lyric sentiment analysis for us. From my research, quite a few options work well for this purpose (Vader, RoBERTa, TextBlob, Google NL). We will use Vader today to see what results we can obtain.

[Vader](https://github.com/cjhutto/vaderSentiment) is a lexicon and rule-based sentiment analysis tool "specifically attuned to sentiments expressed in social media." For example, Vader can be used to predict that "VADER is smart, handsome, and funny" is a positive sentence and that "VADER is not smart, handsome, nor funny" is a negative one.

Let's see if we can use Vader's classifications of positive, negative, and neutral to draw easy insights from song lyrics to use in our song recommendation algorithm.

In [None]:
def get_sentiment(compound):
    if compound >= 0.05:
        return "Positive"
    if compound <= -0.05:
        return "Negative"
    return "Neutral"

In [None]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()

# Number of songs per sentiment label
counts = {}

for track_id in track_info.index:
    # Read the name outside the try block so the error message can always use it
    track_name = track_info.loc[track_id,"track_name"]
    try:
        lyrics = track_info.loc[track_id,"lyrics"]
        # polarity_scores returns neg/neu/pos proportions and a compound score in [-1, 1]
        vs = analyzer.polarity_scores(lyrics)
        sentiment = get_sentiment(vs["compound"])
        counts[sentiment] = counts.get(sentiment,0) + 1
        print("{:-<65} {} {}".format(track_name, sentiment,str(vs)))
    except Exception:
        # Songs whose lyrics lookup failed have no text to score
        print("We got an error on song",track_name)

Sorry for the Delay---------------------------------------------- Negative {'neg': 0.146, 'neu': 0.734, 'pos': 0.12, 'compound': -0.9503}
BUCKHEAD--------------------------------------------------------- Negative {'neg': 0.179, 'neu': 0.614, 'pos': 0.207, 'compound': -0.7696}
I Dream of Chrome------------------------------------------------ Positive {'neg': 0.014, 'neu': 0.953, 'pos': 0.033, 'compound': 0.4471}
Champagne Face--------------------------------------------------- Neutral {'neg': 0.06, 'neu': 0.848, 'pos': 0.092, 'compound': -0.0402}
The Serpent and the Rainbow-------------------------------------- Negative {'neg': 0.135, 'neu': 0.751, 'pos': 0.114, 'compound': -0.7547}
My Swisher Sweet, But My Sig Sauer------------------------------- Negative {'neg': 0.089, 'neu': 0.863, 'pos': 0.048, 'compound': -0.9337}
Center Core Never More------------------------------------------- Negative {'neg': 0.145, 'neu': 0.807, 'pos': 0.048, 'compound': -0.9381}
Genesis------------------------

In [15]:
counts

{'Negative': 278, 'Positive': 32, 'Neutral': 11}

### Analyzing the results

The fact that a lot of songs by a group called "$uicideboy$" are classified as negative shouldn't be too surprising. However, the share of negative classifications in the counts above is alarmingly high.
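
To put a number on that rate, the counts printed above can be converted to shares. The dictionary below is copied from that output; nothing else is assumed.

```python
# Copied from the `counts` output above.
counts = {"Negative": 278, "Positive": 32, "Neutral": 11}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label:<9}{share:.1%}")
```

Roughly 87% of the classified entries come out negative.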

Additionally, some individual classifications are concerning. For example, "5 Grand at 8 to 1" is one of the few songs classified as positive, but the entire song speaks to themes of change, struggle, and longing, with lyrics such as "Caught up in a nightmare so I don't sleep right I just want them to treat me like they used to." This is not something I would describe as positive.

Another song that was classified as positive is "King Tulip". This song speaks to themes of change, addiction, and isolation, with lyrics such as "They say I made it, and that should be satisfactory Lately, I feel like I have nobody."

There is obviously a lot of potential iteration to do on the work above (especially since we're working with incomplete song lyrics), but it seems as though Vader's "sentiments" don't truly capture song motifs in the way I had hoped. We can still use these in our recommendation algorithm to provide additional data points, but I don't feel comfortable throwing in arbitrary statistics that I don't know how to interpret.
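
One possible iteration: Vader's compound score is a normalized sum of word valences, so scoring an entire song as a single string tends to push the result toward -1 or 1. Scoring line by line and averaging is one alternative worth trying. The sketch below is an assumption about how that could be done, not something tested in this notebook; `mean_line_compound` and `toy_scorer` are hypothetical, and the toy scorer only stands in for Vader so the example runs without the library.

```python
from statistics import mean
from typing import Callable, Optional


def mean_line_compound(lyrics: str, score_line: Callable[[str], float]) -> Optional[float]:
    """Average a per-line score over the non-empty lines of `lyrics`."""
    lines = [line.strip() for line in lyrics.splitlines() if line.strip()]
    if not lines:
        return None
    return mean(score_line(line) for line in lines)


# With Vader installed, the scorer could be:
#   analyzer = SentimentIntensityAnalyzer()
#   score_line = lambda text: analyzer.polarity_scores(text)["compound"]
def toy_scorer(text: str) -> float:
    """Placeholder scorer: negative if the line mentions 'pain', positive otherwise."""
    return -0.5 if "pain" in text.lower() else 0.5


example = "Here comes the pain!\n\nA brighter line\nAnother brighter line"
line_average = mean_line_compound(example, toy_scorer)
print(line_average)
```

Passing the scorer in as a function also makes it easy to compare Vader against another model on the same lyrics.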

However, there is a bright spot. The themes I listed above for the two songs were generated entirely by ChatGPT after I asked it for three themes per song. In my testing, it has been absolutely incredible at interpreting lyrics in a way that no other pre-trained model has been able to match.
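
To use such themes in a recommendation algorithm, they need a numeric form. A minimal sketch, assuming each song comes with a short list of theme strings (the two lists below are the ones quoted above) and encoding them as multi-hot vectors; `multi_hot` is a hypothetical helper.

```python
from typing import Dict, List

# Themes quoted from the discussion above; in practice they would come from ChatGPT.
themes_by_song: Dict[str, List[str]] = {
    "5 Grand at 8 to 1": ["change", "struggle", "longing"],
    "King Tulip": ["change", "addiction", "isolation"],
}

# Fixed, sorted vocabulary so every song gets a vector of the same length.
vocabulary = sorted({theme for themes in themes_by_song.values() for theme in themes})


def multi_hot(themes: List[str], vocab: List[str]) -> List[int]:
    """Return 1 where the song has the theme and 0 otherwise."""
    present = set(themes)
    return [1 if theme in present else 0 for theme in vocab]


features = {song: multi_hot(themes, vocabulary) for song, themes in themes_by_song.items()}

print(vocabulary)
for song, vector in features.items():
    print(song, vector)
```

In practice the theme strings would probably need normalizing (casing, synonyms) before building the vocabulary, since free-form model output is unlikely to reuse exactly the same labels.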



## Next step

The next step is to experiment with classic NLP methods that group songs by distance, and with using ChatGPT for song theme recognition so that the resulting themes can serve as categorical labels for every song in our original algorithm.
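
As a starting point for the distance-based grouping, here is a minimal sketch using bag-of-words cosine distance. This is only one of many classic options and is my assumption, not a method this notebook has settled on; `bag_of_words` and `cosine_distance` are hypothetical helpers, and the sample strings are made up.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count tokens made of letters, digits, $ and apostrophes."""
    return Counter(re.findall(r"[a-z0-9$']+", text.lower()))


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 - cosine similarity: near 0 for the same word mix, 1 for no shared words."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # an empty text shares nothing with any other text
    return 1.0 - dot / (norm_a * norm_b)


song_a = bag_of_words("night after night I can't sleep")
song_b = bag_of_words("I can't sleep at night")
song_c = bag_of_words("sunshine on the water")

print(cosine_distance(song_a, song_b))  # small: many shared words
print(cosine_distance(song_a, song_c))  # 1.0: no shared words
```

Swapping raw counts for TF-IDF weights would be a natural refinement, since it reduces the influence of words that appear in nearly every song.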