# (2.) Attach song characteristics from Spotify (OPTIONAL!)

After downloading the lyrics, we can also attach other characteristics of the songs to the dataset. Maybe we want our generated song to be in a certain **mood**? A very danceable song maybe? However, **we ended up not using these characteristics** in the model as they mainly describe the music and not the lyrics. <br><br>

The idea of this notebook is to use the Spotify API to add song characteristics to the data, which are not necessary for a simple lyrics generator but could help when we want to build a more sophisticated one that also considers genre / mood as user input.<br><br>

To do this, we first have to find each song on Spotify (i.e. get its Spotify ID) and then look up its characteristics: <br>
*1*. Get track-id from artist / title <br>
*2*. Then get audio-features using track-id <br><br>

As the genre is not part of the audio features but is stored by Spotify at the artist level, we need another step to attach the genre of a song to the data. <br>
*3*. Get artist-id and find genre

In [None]:
# imports
import pandas as pd
import numpy as np
import re
from datetime import date
import lg_functions as lg
# install library pycld2 for detecting the main language of a song
!pip install -U pycld2
import pycld2


In [None]:
# load the data
songs = pd.read_csv("./data/songs_only_lyrics.csv")
songs

Unnamed: 0,song,artist,url,lyrics
0,Easy On Me,['Adele'],https://genius.com/Adele-easy-on-me-lyrics,Easy On Me Lyrics\nThere ain't no gold in this...
1,Stay,"['The Kid LAROI', 'Justin Bieber']",https://genius.com/The-kid-laroi-and-justin-bi...,STAY Lyrics\nI do the same thing I told you th...
2,Industry Baby,"['Lil Nas X', 'Jack Harlow']",https://genius.com/Lil-nas-x-and-jack-harlow-i...,INDUSTRY BABY Lyrics\n(D-D-Daytrip took it to ...
3,Fancy Like,['Walker Hayes'],https://genius.com/Walker-hayes-fancy-like-lyrics,"Fancy Like Lyrics\nAyy\nMy girl is bangin', sh..."
4,Bad Habits,['Ed Sheeran'],https://genius.com/Ed-sheeran-bad-habits-lyrics,"Bad Habits Lyrics\n(One, two, three, four)\nOo..."
...,...,...,...,...
9044,The Greatest Romance Ever Sold,['Prince'],https://genius.com/Prince-the-greatest-romance...,The Greatest Romance Ever Sold Lyrics\nThe gre...
9045,The Christmas Song (Chestnuts Roasting On An O...,['Christina Aguilera'],https://genius.com/Christina-aguilera-christma...,Christmas Song (Chestnuts Roasting On An Open ...
9046,Deck The Halls,['SHeDAISY'],https://genius.com/Shedaisy-deck-the-halls-lyrics,Deck The Halls Lyrics(arranged by SHeDAISY and...
9047,I Love You,['Martina McBride'],https://genius.com/Martina-mcbride-i-love-you-...,I Love You Lyrics\nYeah\n\nThe sun is shining ...


In [None]:
# check if there's songs left with no lyrics
len(songs[songs["lyrics"].isnull()]) # should be 0

0

## Some processing and filtering

In [None]:
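# remove the prefix "<song title> Lyrics" that occurs at the beginning of all lyrics
# (anchored with ^ and limited to one replacement so later lines containing "Lyrics" stay intact)
songs["lyrics"] = songs["lyrics"].apply(lambda x: re.sub(r"^.+?Lyrics", "", x, count=1))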

# replace string pattern in column artist as it leads to problems when searching for the song - only needed if we use built in methodology
#  of the lyricsgenius package (eg. searching for songs using the tag "pop")
# eg. for artists Twenty One Pilots as its formatted as "\\u200btwenty one pilots"
#songs["artist"] = songs["artist"].apply(lambda x: re.sub(r"\\u200b", "", x))

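# change column artist into a list (needed for accessing the first artist in later functions)
# raw string avoids invalid-escape warnings; brackets and single quotes are stripped before splitting
songs["artist"] = songs["artist"].apply(lambda x: re.sub(r"[\[\]']", "", x).split(","))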

songs.tail()

Unnamed: 0,song,artist,url,lyrics
9044,The Greatest Romance Ever Sold,[Prince],https://genius.com/Prince-the-greatest-romance...,\nThe greatest romance that's ever been sold\n...
9045,The Christmas Song (Chestnuts Roasting On An O...,[Christina Aguilera],https://genius.com/Christina-aguilera-christma...,Chestnuts roasting on an open fire\nJack Frost...
9046,Deck The Halls,[SHeDAISY],https://genius.com/Shedaisy-deck-the-halls-lyrics,(arranged by SHeDAISY and Phil Symonds)\nDeck ...
9047,I Love You,[Martina McBride],https://genius.com/Martina-mcbride-i-love-you-...,\nYeah\n\nThe sun is shining everyday\nThe clo...
9048,Left & Right,"[""DAngelo"", Method Man And Redman]",https://genius.com/Dangelo-left-and-right-lyrics,"\nYo, yo, yo (Yeah, yeah)\nYo, yo\n\nMy flow's..."


Detect the language of each song's lyrics so that non-English songs can be removed. The package <mark>pycld2</mark> is used for this.
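
For orientation, `pycld2.detect` returns a triple `(isReliable, textBytesFound, details)`, and each entry of `details` has the form `(languageName, languageCode, percent, score)`. The sketch below unpacks a hand-written result of that shape, so it runs without pycld2 installed. The values are made up for illustration, and `top_language` is a hypothetical helper name, not part of pycld2 or `lg_functions`.

```python
def top_language(result):
    """Return (language_code, percent) of the best guess in a pycld2-style result."""
    is_reliable, text_bytes_found, details = result
    language_name, language_code, percent, score = details[0]
    return language_code, percent


# Hand-written example with the same shape as the output of pycld2.detect(...).
# The numbers are illustrative only, not real detector output.
example_result = (
    True,
    120,
    (
        ("ENGLISH", "en", 93, 1200.0),
        ("SPANISH", "es", 6, 300.0),
        ("Unknown", "un", 0, 0.0),
    ),
)

code, percent = top_language(example_result)
print(code, percent)

# The indexing used in this notebook picks the same two fields.
print(example_result[2][0][1:3])
```

Note that `percent` is the share of the text attributed to that language, so a low value usually points to mixed-language lyrics rather than an uncertain guess.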

In [None]:
# detect the main language; pycld2 also reports the percentage of the text in that language, kept here as lang_acc
songs["lang"], songs["lang_acc"] = zip(*songs["lyrics"].apply(lambda x: pycld2.detect(x)[2][0][1:3]))

In [None]:
songs["lang"].value_counts()

en    8865
es     142
ko      20
un       3
de       3
pt       3
fr       2
ro       1
da       1
pl       1
ja       1
sv       1
Name: lang, dtype: int64

In [None]:
# only keep english songs
songs = songs[songs["lang"] == "en"]

In [None]:
# percentage of each song's text detected as English; for some songs it is fairly low, i.e. they may mix English with e.g. Spanish
songs["lang_acc"].value_counts()

99    8626
96      28
97      25
94      21
95      21
93      15
98      13
91      13
92      12
89       9
87       8
88       7
90       6
83       5
85       4
86       4
80       4
71       3
82       3
66       3
84       3
76       3
73       2
77       2
60       2
63       2
75       2
67       2
79       2
74       2
55       1
70       1
78       1
41       1
54       1
61       1
52       1
44       1
64       1
59       1
51       1
81       1
56       1
Name: lang_acc, dtype: int64

In [None]:
# only keep songs where at least 90% of the text was detected as English
songs = songs[songs["lang_acc"] >= 90]

## 1. Search Spotify ID

Look up the Spotify ID (unique for every song) of each song we have lyrics for. This is done by the function `search_sp_id` from `lg_functions`; songs for which no ID is found get the placeholder `none`.
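
The code of `search_sp_id` is not shown in this notebook, so the following is only a minimal sketch of what such a lookup might look like for a single song. It assumes a spotipy-style client whose `search` method returns `{"tracks": {"items": [...]}}`; `find_track_id` and `FakeSpotify` are hypothetical names, and the fake client exists only so the example runs without credentials or network access.

```python
def find_track_id(sp, title, artists, limit=10):
    """Return the Spotify ID of the first search hit, or "none" if nothing is found.

    `sp` is assumed to be a spotipy.Spotify-like client exposing `search`.
    """
    # Query by primary artist plus title, as in the logged request above.
    query = f"{artists[0].strip()} {title}"
    try:
        response = sp.search(q=query, type="track", limit=limit)
    except Exception:
        # Assumption: any request error (e.g. a 404 for an unusual query)
        # is treated as "song not found".
        return "none"
    items = response.get("tracks", {}).get("items", [])
    return items[0]["id"] if items else "none"


class FakeSpotify:
    """Stand-in client (hypothetical) that mimics the shape of a search response."""

    def __init__(self, catalogue):
        self.catalogue = catalogue

    def search(self, q, type="track", limit=10):
        hits = [{"id": tid} for name, tid in self.catalogue.items() if name in q]
        return {"tracks": {"items": hits[:limit]}}


sp = FakeSpotify({"Easy On Me": "0gplL1WMoJ6iYaPgMCL0gX"})
print(find_track_id(sp, "Easy On Me", ["Adele"]))
print(find_track_id(sp, "Unknown Song", ["Nobody"]))
```

Taking the first hit is the simplest choice; it can pick a cover or a live version, which is one reason duplicate IDs can show up later.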

In [None]:
songs = songs.rename(columns = {"song": "title"})
songs = lg.search_sp_id(songs)
songs

number of songs processed: (in 500 steps)
0
500
HTTP Error for GET to https://api.spotify.com/v1/search with Params: {'q': "Perry Como And The Fontane Sisters With Mitchell Ayres And His Orchestra It's Beginning To Look A Lot Like Christmas", 'limit': 10, 'offset': 0, 'type': 'track', 'market': None} returned 404 due to Not found.
1000
1500
2500
3000
3500
4000
4500
5000
5500
6000
6500
7000
7500
8000
8500
9000


Unnamed: 0,title,artist,url,lyrics,lang,lang_acc,sp_id
0,Easy On Me,[Adele],https://genius.com/Adele-easy-on-me-lyrics,\nThere ain't no gold in this river\nThat I've...,en,99,0gplL1WMoJ6iYaPgMCL0gX
1,Stay,"[The Kid LAROI, Justin Bieber]",https://genius.com/The-kid-laroi-and-justin-bi...,\nI do the same thing I told you that I never ...,en,99,5HCyWlXZPP0y6Gqq8TgA20
2,Industry Baby,"[Lil Nas X, Jack Harlow]",https://genius.com/Lil-nas-x-and-jack-harlow-i...,"\n(D-D-Daytrip took it to ten, hey)\nBaby back...",en,99,27NovPIUIRrOZoCHxABJwK
3,Fancy Like,[Walker Hayes],https://genius.com/Walker-hayes-fancy-like-lyrics,"\nAyy\nMy girl is bangin', she's so low mainte...",en,99,58UKC45GPNTflCN6nwCUeF
4,Bad Habits,[Ed Sheeran],https://genius.com/Ed-sheeran-bad-habits-lyrics,"\n(One, two, three, four)\nOoh, ooh\n\nEvery t...",en,99,3rmo8F54jFF8OgYsqTxm5d
...,...,...,...,...,...,...,...
9044,The Greatest Romance Ever Sold,[Prince],https://genius.com/Prince-the-greatest-romance...,\nThe greatest romance that's ever been sold\n...,en,99,3A8pzjcWgAHry1Ix19z7ip
9045,The Christmas Song (Chestnuts Roasting On An O...,[Christina Aguilera],https://genius.com/Christina-aguilera-christma...,Chestnuts roasting on an open fire\nJack Frost...,en,99,none
9046,Deck The Halls,[SHeDAISY],https://genius.com/Shedaisy-deck-the-halls-lyrics,(arranged by SHeDAISY and Phil Symonds)\nDeck ...,en,99,3MAQlKrBxFN5QXR7SqxYQh
9047,I Love You,[Martina McBride],https://genius.com/Martina-mcbride-i-love-you-...,\nYeah\n\nThe sun is shining everyday\nThe clo...,en,99,6hvREHiu0i7PJv7XUHeo5w


In [None]:
# songs for which no id was found
sum(songs["sp_id"] == "none")

123

## 2. Create Audio Features

As a next step, the Spotify ID is used to look up the audio features of each song (danceability, energy, tempo, etc.) with the function `create_audio_features`. The features are stored in a separate dataframe and merged back into the song data later.
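
How `create_audio_features` works internally is not shown here. As a rough sketch, the lookup could be batched as below, assuming a spotipy-style client whose `audio_features(tracks=[...])` accepts up to 100 IDs per call and returns one dict per ID (or `None` when no features exist). `fetch_audio_features` and `FakeSpotify` are hypothetical names used only for this illustration.

```python
def fetch_audio_features(sp, track_ids, batch_size=100):
    """Collect audio-feature dicts for the given IDs, skipping "none" and missing results."""
    valid_ids = [tid for tid in track_ids if tid != "none"]
    features = []
    for start in range(0, len(valid_ids), batch_size):
        batch = valid_ids[start:start + batch_size]
        # One request per batch; drop entries the API could not resolve.
        features.extend(f for f in sp.audio_features(tracks=batch) if f is not None)
    return features


class FakeSpotify:
    """Stand-in client (hypothetical) so the sketch runs offline."""

    def __init__(self):
        self.calls = 0

    def audio_features(self, tracks):
        self.calls += 1
        return [
            None if tid == "missing" else {"id": tid, "danceability": 0.5}
            for tid in tracks
        ]


sp = FakeSpotify()
rows = fetch_audio_features(sp, ["a", "none", "missing", "b", "c"], batch_size=2)
print([row["id"] for row in rows], sp.calls)
```

The resulting list of dicts can be passed to `pd.DataFrame(rows)` to obtain a table with an `id` column, which is what the merge step needs.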

In [None]:
audio = lg.create_audio_features(songs)
audio.tail()

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,id
8651,0.85,0.675,10,-7.915,0,0.0442,0.0107,0.0,0.0914,0.816,131.996,2ZDxfuXmTIRCdXChbtHpW9
8652,0.844,0.457,8,-6.859,1,0.0422,0.37,5e-06,0.0768,0.594,90.022,3A8pzjcWgAHry1Ix19z7ip
8653,0.568,0.828,1,-7.123,0,0.0403,0.0151,2.1e-05,0.167,0.407,118.836,3MAQlKrBxFN5QXR7SqxYQh
8654,0.708,0.597,0,-5.411,1,0.025,0.019,0.0,0.177,0.629,106.706,6hvREHiu0i7PJv7XUHeo5w
8655,0.842,0.404,1,-9.504,1,0.628,0.219,0.0,0.369,0.826,91.963,4EQ0dK6Sg7v685NGrQvuki


## Merge

In [None]:
# merge with songs by id
merged = pd.merge(songs, audio, left_on = "sp_id", right_on = "id", how = "left")

# drop second column id
merged = merged.drop(["id"], axis = 1)

merged.tail(10)

Unnamed: 0,title,artist,url,lyrics,lang,lang_acc,sp_id,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
8828,Get Gone,[Ideal],https://genius.com/Ideal-usa-get-gone-lyrics,"\nHey, come in here for a minute\nSit down\nCo...",en,99,0CofintZCm8MhxiOMrauiT,0.702,0.52,10.0,-8.581,1.0,0.0725,0.435,0.0,0.299,0.467,120.005
8829,Heartbreaker,"[Mariah Carey, Jay-Z]",https://genius.com/Mariah-carey-heartbreaker-l...,\nYeah\nWe're gonna do it like this\nAight\nLe...,en,99,0jsANwwkkHyyeNyuTFq2XO,0.524,0.816,1.0,-5.872,1.0,0.37,0.383,0.0,0.349,0.789,200.031
8830,15 Minutes,[Marc Nelson],https://genius.com/Marc-nelson-15-minutes-lyrics,"Damn, what time is it\nAll snap, I gotta go to...",en,99,none,,,,,,,,,,,
8831,This Gift,[98 Degrees],https://genius.com/98-this-gift-lyrics,The snow is falling the city is white\nYour ey...,en,99,3ggPv9plkk3a5EfI4D9g2L,0.622,0.541,0.0,-8.643,1.0,0.0331,0.476,0.0,0.0819,0.401,86.963
8832,Give You What You Want (Fa Sure),[Chico DeBarge],https://genius.com/Chico-debarge-give-you-what...,"Gonna get it, get it, get it, get it\nBaby, ba...",en,99,2ZDxfuXmTIRCdXChbtHpW9,0.85,0.675,10.0,-7.915,0.0,0.0442,0.0107,0.0,0.0914,0.816,131.996
8833,The Greatest Romance Ever Sold,[Prince],https://genius.com/Prince-the-greatest-romance...,\nThe greatest romance that's ever been sold\n...,en,99,3A8pzjcWgAHry1Ix19z7ip,0.844,0.457,8.0,-6.859,1.0,0.0422,0.37,5e-06,0.0768,0.594,90.022
8834,The Christmas Song (Chestnuts Roasting On An O...,[Christina Aguilera],https://genius.com/Christina-aguilera-christma...,Chestnuts roasting on an open fire\nJack Frost...,en,99,none,,,,,,,,,,,
8835,Deck The Halls,[SHeDAISY],https://genius.com/Shedaisy-deck-the-halls-lyrics,(arranged by SHeDAISY and Phil Symonds)\nDeck ...,en,99,3MAQlKrBxFN5QXR7SqxYQh,0.568,0.828,1.0,-7.123,0.0,0.0403,0.0151,2.1e-05,0.167,0.407,118.836
8836,I Love You,[Martina McBride],https://genius.com/Martina-mcbride-i-love-you-...,\nYeah\n\nThe sun is shining everyday\nThe clo...,en,99,6hvREHiu0i7PJv7XUHeo5w,0.708,0.597,0.0,-5.411,1.0,0.025,0.019,0.0,0.177,0.629,106.706
8837,Left & Right,"[""DAngelo"", Method Man And Redman]",https://genius.com/Dangelo-left-and-right-lyrics,"\nYo, yo, yo (Yeah, yeah)\nYo, yo\n\nMy flow's...",en,99,4EQ0dK6Sg7v685NGrQvuki,0.842,0.404,1.0,-9.504,1.0,0.628,0.219,0.0,0.369,0.826,91.963


In [None]:
len(merged)

8838

In [None]:
# drop duplicates but keep entries with "none" -> they don't have audio features but lyrics can still be used
merged = merged[(~merged["sp_id"].duplicated()) | (merged["sp_id"] == "none")]
len(merged)

8752
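
To see what the filter above does, here is a small self-contained example on made-up rows (the titles and IDs are hypothetical). Repeated real IDs are reduced to their first occurrence, while every row with the placeholder `none` is kept.

```python
import pandas as pd

# Toy data: "id1" appears twice, and two songs have no Spotify ID.
df = pd.DataFrame({
    "title": ["A", "A (remaster)", "B", "C"],
    "sp_id": ["id1", "id1", "none", "none"],
})

# duplicated() marks every repeat after the first occurrence as True.
keep = (~df["sp_id"].duplicated()) | (df["sp_id"] == "none")
deduped = df[keep]
print(deduped["title"].tolist())
```

Without the second condition, song "C" would be dropped as a "duplicate" of "B", although the two only share the placeholder.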

In [None]:
# save as csv
merged.to_csv("./data/songs_features.csv", index = False)

## 3. Add genre

Add the Spotify ID of the primary artist of the song, which is used to determine the genre of the song.

**Note:** The genre of the ***song*** *cannot* be determined with the Spotify API as the genre is only available on artist level. Therefore the genres (list of genres) that the primary artist of the song is related to are taken instead.
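
The implementation of `create_artists_genres` is not part of this notebook. A minimal sketch of the idea, assuming a spotipy-style client with `track(track_id)` and `artist(artist_id)` methods, could look as follows. `primary_artist_genres` and `FakeSpotify` are hypothetical names; the IDs and genres in the fake client are taken from the output table of this notebook.

```python
def primary_artist_genres(sp, track_id):
    """Return (artist_id, genres) for the first artist listed on a track."""
    if track_id == "none":
        # No Spotify ID was found for this song, so there is nothing to look up.
        return "none", []
    track = sp.track(track_id)
    artist_id = track["artists"][0]["id"]
    artist = sp.artist(artist_id)
    return artist_id, artist.get("genres", [])


class FakeSpotify:
    """Stand-in client (hypothetical) so the sketch runs offline."""

    def track(self, track_id):
        return {"artists": [{"id": "5a2EaR3hamoenG9rDuVn8j", "name": "Prince"}]}

    def artist(self, artist_id):
        return {"genres": ["funk", "funk rock", "minneapolis sound", "synth funk"]}


sp = FakeSpotify()
art_id, genres = primary_artist_genres(sp, "3A8pzjcWgAHry1Ix19z7ip")
print(art_id, ", ".join(genres))
```

Joining the list with `", "` gives a single string per song, matching the format of the `genres` column in the output at the end of this section.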

In [None]:
genres = lg.create_artists_genres(merged)

genres.tail(10)

number of songs processed: (in 500 steps)
0


KernelInterrupted: Execution interrupted by the Jupyter kernel.

In [None]:
# drop duplicates but keep entries with "none" -> they don't have audio features but lyrics can still be used
genres = genres[(~genres["sp_id"].duplicated()) | (genres["sp_id"] == "none")]

In [None]:
len(genres)

8708

In [None]:
# save as csv
genres.to_csv("./data/songs_features_genres.csv", index = False)

In [None]:
genres.tail(10)

Unnamed: 0,title,artist,url,lyrics,lang,sp_id,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,art_id,genres
24826,Music Of My Heart,"['""N Sync""', ' Gloria Estefan']",https://genius.com/Gloria-estefan-and-nsync-mu...,\nYou'll never know what you've done for me\nW...,en,0M3ZIWNcizkhYFvn6RuCEz,0.375,0.556,11.0,-7.196,1.0,0.041,0.4,0.0,0.0956,0.397,111.827,6Ff53KvcvAj5U7Z1vojB5o,"boy band, dance pop, post-teen pop"
24827,Will 2K,"['Will Smith', ' K-Ci']",https://genius.com/Will-smith-will-2k-lyrics,\nIts here and I like it\n(Whoo! Ha-ha! Ha-ha!...,en,59xpdlaIK1l5hiYP1KsBxK,0.827,0.783,9.0,-3.918,0.0,0.0869,0.0692,0.0,0.628,0.832,117.933,41qil2VaGbD194gaEcmmyx,"hip hop, pop rap"
24828,Get Gone,['Ideal'],https://genius.com/Ideal-usa-get-gone-lyrics,"\nHey, come in here for a minute\nSit down\nCo...",en,0CofintZCm8MhxiOMrauiT,0.702,0.52,10.0,-8.581,1.0,0.0725,0.435,0.0,0.299,0.467,120.005,2bK1rpFhmGkImiZNuUyHVT,contemporary r&b
24829,Heartbreaker,"['Mariah Carey', ' Jay-Z']",https://genius.com/Mariah-carey-heartbreaker-l...,\nYeah\nWe're gonna do it like this\nAight\nLe...,en,0jsANwwkkHyyeNyuTFq2XO,0.524,0.816,1.0,-5.872,1.0,0.37,0.383,0.0,0.349,0.789,200.031,4iHNK0tOyZPYnBU7nGAgpQ,"dance pop, pop, urban contemporary"
24830,This Gift,['98 Degrees'],https://genius.com/98-this-gift-lyrics,The snow is falling the city is white\nYour ey...,en,3ggPv9plkk3a5EfI4D9g2L,0.622,0.541,0.0,-8.643,1.0,0.0331,0.476,0.0,0.0819,0.401,86.963,6V03b3Y36lolYP2orXn8mV,"boy band, dance pop"
24831,Give You What You Want (Fa Sure),['Chico DeBarge'],https://genius.com/Chico-debarge-give-you-what...,"Gonna get it, get it, get it, get it\nBaby, ba...",en,2ZDxfuXmTIRCdXChbtHpW9,0.85,0.675,10.0,-7.915,0.0,0.0442,0.0107,0.0,0.0914,0.816,131.996,67ISVBZzcCTTKM17Ps00sx,"contemporary r&b, neo soul, r&b, urban contemp..."
24832,The Greatest Romance Ever Sold,['Prince'],https://genius.com/Prince-the-greatest-romance...,\nThe greatest romance that's ever been sold\n...,en,3A8pzjcWgAHry1Ix19z7ip,0.844,0.457,8.0,-6.859,1.0,0.0422,0.37,5e-06,0.0768,0.594,90.022,5a2EaR3hamoenG9rDuVn8j,"funk, funk rock, minneapolis sound, synth funk"
24833,Deck The Halls,['SHeDAISY'],https://genius.com/Shedaisy-deck-the-halls-lyrics,(arranged by SHeDAISY and Phil Symonds)\nDeck ...,en,3MAQlKrBxFN5QXR7SqxYQh,0.568,0.828,1.0,-7.123,0.0,0.0403,0.0151,2.1e-05,0.167,0.407,118.836,2qFe0FyUMK8XXoyOsfYJr2,"contemporary country, country, country dawn, c..."
24834,I Love You,['Martina McBride'],https://genius.com/Martina-mcbride-i-love-you-...,\nYeah\n\nThe sun is shining everyday\nThe clo...,en,6hvREHiu0i7PJv7XUHeo5w,0.708,0.597,0.0,-5.411,1.0,0.025,0.019,0.0,0.177,0.629,106.706,3P33qFNGBVXl86yQYWspFj,"contemporary country, country, country dawn, c..."
24835,Left & Right,"['""DAngelo""', ' Method Man And Redman']",https://genius.com/Dangelo-left-and-right-lyrics,"\nYo, yo, yo (Yeah, yeah)\nYo, yo\n\nMy flow's...",en,2Wmee1fuuP9Ppj13r4BDPp,0.6,0.956,6.0,-5.286,1.0,0.0465,0.139,0.0283,0.0481,0.845,136.008,45yx1rBykdTiIHG65hOgdx,
