
# Intro

**My original dataset contains only about 1,050 observations, of which around 220 were lost when scraping lyrics and 10 more when querying personality analysis. Ideally I want many thousands of observations for a good model, and I would also like to use XGBoost and LightGBM effectively, both of which tend to benefit from large datasets. Hundreds of thousands of observations would be ideal, as I doubt I could get a dataset with millions. Another problem is that none of the data in my original dataset is time-labelled, which matters because the target label, popularity, is likely strongly influenced by a song's release date.**

**Both of these problems are solved by this dataset I found, created by Andrew Thompson: https://components.one/datasets/billboard-200/. It has about 340,000 track observations, all with time labels. I expect to lose many observations again when I scrape lyrics and query personality analysis, but I hope to have at least 100,000 when all is done. Also, my original dataset contains many unpopular album songs, some of which function as intros or interludes. The songs in this dataset all come from albums that charted on the Billboard 200, which may make them significantly better observations for training a model to predict popularity. This dataset also has no null values. The Spotify Web API features I still need to query are popularity and explicit. Then I need to scrape lyrics and query sentiment analysis and personality analysis again. This will all take quite a while, given 340,000 observations. The data is in a SQLite database file.**

# Getting the table from SQL Database File

In [0]:
import sqlite3
import pandas as pd

db = sqlite3.connect('/content/drive/My Drive/billboard-200.db')
cursor = db.cursor()
cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
tables = cursor.fetchall()
cursor.close()
db.close()

tables

[('albums',), ('acoustic_features',)]
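
The two steps here (listing the tables, then loading one) can be wrapped so that the connection is always closed, even if the query fails. This is a minimal sketch, not the code used for this notebook; the helper name `load_table` is hypothetical.

```python
import sqlite3
from contextlib import closing

import pandas as pd


def load_table(db_path, table):
    """Load one table from a SQLite file into a DataFrame (hypothetical helper)."""
    # closing() guarantees db.close() runs when the block exits
    with closing(sqlite3.connect(db_path)) as db:
        names = [row[0] for row in db.execute(
            "SELECT name FROM sqlite_master WHERE type='table';")]
        if table not in names:
            raise ValueError(f"{table!r} not found; available tables: {names}")
        # The name was checked against sqlite_master, so quoting it is safe
        return pd.read_sql(f'SELECT * FROM "{table}"', db)
```

With the file used in this notebook, the call would be `load_table('/content/drive/My Drive/billboard-200.db', 'acoustic_features')`.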

In [0]:
# The table we want is the acoustic_features table
db = sqlite3.connect('/content/drive/My Drive/billboard-200.db')
spotify_df = pd.read_sql("SELECT * from acoustic_features", db)
# Closing the connection once the table is loaded
db.close()
spotify_df

Unnamed: 0,id,song,album,artist,acousticness,danceability,duration_ms,energy,instrumentalness,key,liveness,loudness,mode,speechiness,tempo,time_signature,valence,album_id,date
0,0Veyvc3n9AcLSoK3r1dA12,Voices In My Head,Hoodie SZN,A Boogie Wit da Hoodie,0.0555,0.754,142301.0,0.663,0.000000,6.0,0.101,-6.311,0.0,0.4270,90.195,4.0,0.207,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21
1,77JzXZonNumWsuXKy9vr3U,Beasty,Hoodie SZN,A Boogie Wit da Hoodie,0.2920,0.860,152829.0,0.418,0.000000,7.0,0.106,-9.061,0.0,0.1580,126.023,4.0,0.374,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21
2,18yllZD0TdF7ykcREib8Z1,I Did It,Hoodie SZN,A Boogie Wit da Hoodie,0.1530,0.718,215305.0,0.454,0.000046,8.0,0.116,-9.012,1.0,0.1270,89.483,4.0,0.196,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21
3,1wJRveJZLSb1rjhnUHQiv6,Swervin (feat. 6ix9ine),Hoodie SZN,A Boogie Wit da Hoodie,0.0153,0.581,189487.0,0.662,0.000000,9.0,0.111,-5.239,1.0,0.3030,93.023,4.0,0.434,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21
4,0jAfdqv18goRTUxm3ilRjb,Startender (feat. Offset and Tyga),Hoodie SZN,A Boogie Wit da Hoodie,0.0235,0.736,192779.0,0.622,0.000000,6.0,0.151,-4.653,0.0,0.1330,191.971,4.0,0.506,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
339850,1EU8l9SctgP0gwIFxdjKPA,It'S A Raggy Waltz - Live,The Dave Brubeck Quartet At Carnegie Hall,The Dave Brubeck Quartet,0.6550,0.445,434427.0,0.544,0.000325,8.0,0.235,-11.662,1.0,0.0792,176.723,3.0,0.654,4My0KPjdtzCUfFjToOCiPh,1963
339851,4a0J3zWWe5IXdwWWQSypjq,King For A Day - Live,The Dave Brubeck Quartet At Carnegie Hall,The Dave Brubeck Quartet,0.8350,0.600,375533.0,0.172,0.865000,8.0,0.421,-22.897,1.0,0.0672,135.005,4.0,0.458,4My0KPjdtzCUfFjToOCiPh,1963
339852,28VEEbzNdg6r5gQFY5wWI3,Castilian Drums - Live,The Dave Brubeck Quartet At Carnegie Hall,The Dave Brubeck Quartet,0.1920,0.459,861933.0,0.606,0.612000,10.0,0.398,-13.427,0.0,0.0645,116.325,4.0,0.359,4My0KPjdtzCUfFjToOCiPh,1963
339853,7i3NXBP12BJoerk6sAATx0,Blue Rondo A La Turk - Live,The Dave Brubeck Quartet At Carnegie Hall,The Dave Brubeck Quartet,0.5220,0.444,761573.0,0.508,0.025000,10.0,0.929,-11.111,1.0,0.0526,62.879,4.0,0.592,4My0KPjdtzCUfFjToOCiPh,1963


In [0]:
# converting date feature to datetime object
spotify_df["date"] = pd.to_datetime(spotify_df["date"])
spotify_df["date"]

0        2018-12-21
1        2018-12-21
2        2018-12-21
3        2018-12-21
4        2018-12-21
            ...    
339850   1963-01-01
339851   1963-01-01
339852   1963-01-01
339853   1963-01-01
339854   1963-01-01
Name: date, Length: 339855, dtype: datetime64[ns]
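
The raw column mixes full dates such as 2018-12-21 with bare years such as 1963, and the conversion maps the latter to January 1. The sketch below makes that padding explicit and keeps a flag for it, so padded dates can be told apart later. The helper `parse_release_dates` and the column `date_is_padded` are hypothetical names, and the year-month case is only an assumption about what the data might contain.

```python
import pandas as pd


def parse_release_dates(dates):
    """Parse YYYY, YYYY-MM or YYYY-MM-DD strings, flagging padded values."""
    s = dates.astype(str).str.strip()
    year_only = s.str.fullmatch(r"\d{4}")
    year_month = s.str.fullmatch(r"\d{4}-\d{2}")
    # Pad incomplete dates to the first day of the year or month
    padded = s.where(~year_only, s + "-01-01")
    padded = padded.where(~year_month, padded + "-01")
    # Anything that still does not match becomes NaT instead of raising
    parsed = pd.to_datetime(padded, format="%Y-%m-%d", errors="coerce")
    return pd.DataFrame({"date": parsed,
                         "date_is_padded": year_only | year_month})
```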

In [0]:
# Saving to csv so I don't have to repeat above steps
spotify_df.to_csv('/content/drive/My Drive/spotify_df.csv', index=False)

# Querying the missing popularity and explicit features from Spotify Web API

In [0]:
spotify_df = pd.read_csv("/content/drive/My Drive/spotify_df.csv")
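
A CSV file stores the date column as plain text, so after reloading it is no longer a datetime column unless pandas is told to parse it. A small sketch of the difference, using an in-memory buffer in place of the Drive path (an assumption made only to keep the example self-contained):

```python
import io

import pandas as pd

frame = pd.DataFrame({"id": ["a", "b"],
                      "date": pd.to_datetime(["2018-12-21", "1963-01-01"])})
buffer = io.StringIO()
frame.to_csv(buffer, index=False)

# Plain read: the date column comes back as text
buffer.seek(0)
as_text = pd.read_csv(buffer)

# parse_dates restores the datetime dtype on load
buffer.seek(0)
as_dates = pd.read_csv(buffer, parse_dates=["date"])
```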

In [0]:
spotify_df

Unnamed: 0,id,song,album,artist,acousticness,danceability,duration_ms,energy,instrumentalness,key,liveness,loudness,mode,speechiness,tempo,time_signature,valence,album_id,date
0,0Veyvc3n9AcLSoK3r1dA12,Voices In My Head,Hoodie SZN,A Boogie Wit da Hoodie,0.0555,0.754,142301.0,0.663,0.000000,6.0,0.101,-6.311,0.0,0.4270,90.195,4.0,0.207,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21
1,77JzXZonNumWsuXKy9vr3U,Beasty,Hoodie SZN,A Boogie Wit da Hoodie,0.2920,0.860,152829.0,0.418,0.000000,7.0,0.106,-9.061,0.0,0.1580,126.023,4.0,0.374,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21
2,18yllZD0TdF7ykcREib8Z1,I Did It,Hoodie SZN,A Boogie Wit da Hoodie,0.1530,0.718,215305.0,0.454,0.000046,8.0,0.116,-9.012,1.0,0.1270,89.483,4.0,0.196,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21
3,1wJRveJZLSb1rjhnUHQiv6,Swervin (feat. 6ix9ine),Hoodie SZN,A Boogie Wit da Hoodie,0.0153,0.581,189487.0,0.662,0.000000,9.0,0.111,-5.239,1.0,0.3030,93.023,4.0,0.434,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21
4,0jAfdqv18goRTUxm3ilRjb,Startender (feat. Offset and Tyga),Hoodie SZN,A Boogie Wit da Hoodie,0.0235,0.736,192779.0,0.622,0.000000,6.0,0.151,-4.653,0.0,0.1330,191.971,4.0,0.506,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
339850,1EU8l9SctgP0gwIFxdjKPA,It'S A Raggy Waltz - Live,The Dave Brubeck Quartet At Carnegie Hall,The Dave Brubeck Quartet,0.6550,0.445,434427.0,0.544,0.000325,8.0,0.235,-11.662,1.0,0.0792,176.723,3.0,0.654,4My0KPjdtzCUfFjToOCiPh,1963-01-01
339851,4a0J3zWWe5IXdwWWQSypjq,King For A Day - Live,The Dave Brubeck Quartet At Carnegie Hall,The Dave Brubeck Quartet,0.8350,0.600,375533.0,0.172,0.865000,8.0,0.421,-22.897,1.0,0.0672,135.005,4.0,0.458,4My0KPjdtzCUfFjToOCiPh,1963-01-01
339852,28VEEbzNdg6r5gQFY5wWI3,Castilian Drums - Live,The Dave Brubeck Quartet At Carnegie Hall,The Dave Brubeck Quartet,0.1920,0.459,861933.0,0.606,0.612000,10.0,0.398,-13.427,0.0,0.0645,116.325,4.0,0.359,4My0KPjdtzCUfFjToOCiPh,1963-01-01
339853,7i3NXBP12BJoerk6sAATx0,Blue Rondo A La Turk - Live,The Dave Brubeck Quartet At Carnegie Hall,The Dave Brubeck Quartet,0.5220,0.444,761573.0,0.508,0.025000,10.0,0.929,-11.111,1.0,0.0526,62.879,4.0,0.592,4My0KPjdtzCUfFjToOCiPh,1963-01-01


In [0]:
# Using spotipy package to interact with Spotify Web API
import spotipy

# To access authorised Spotify data
from spotipy.oauth2 import SpotifyClientCredentials

client_credentials_manager = SpotifyClientCredentials(
    client_id=client_id, client_secret=client_secret)
# Spotify object to access API
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager) 
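
`client_id` and `client_secret` are not defined in the cells shown here. One option, sketched below, is to read them from environment variables rather than pasting them into the notebook. `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` are the variable names spotipy itself looks for; the helper `load_spotify_credentials` is hypothetical.

```python
import os


def load_spotify_credentials(env=os.environ):
    """Return (client_id, client_secret) from the environment (hypothetical helper)."""
    try:
        return env["SPOTIPY_CLIENT_ID"], env["SPOTIPY_CLIENT_SECRET"]
    except KeyError as missing:
        # Fail early with the name of the variable that is not set
        raise RuntimeError(
            f"Set {missing.args[0]} before running this cell") from None
```

Usage would be `client_id, client_secret = load_spotify_credentials()` before building the credentials manager.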

In [0]:
# Testing grabbing the popularity of a song
pop = sp.track(spotify_df.iloc[339854,0])
pop["popularity"]

13

In [0]:
import time

In [0]:
# Grabbing popularity and explicit feature for each song

tracks = []

for uri in spotify_df["id"]:
  track_features = {}
  
  # grabbing popularity
  pop = sp.track(uri)
  track_features['popularity'] = pop['popularity']
  track_features['explicit'] = pop['explicit']
  tracks.append(track_features)
  time.sleep(0.1)

popularity = pd.json_normalize(tracks)
popularity

In [0]:
# The loop above was stopped early, so this builds the dataframe from the
# tracks collected up to that point
popularity = pd.json_normalize(tracks)
popularity


Unnamed: 0,popularity,explicit
0,58,True
1,59,True
2,57,True
3,83,True
4,71,True
...,...,...
240516,13,False
240517,15,False
240518,23,False
240519,27,False
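
Calling `sp.track` once per song means roughly 340,000 requests. The Web API also has a several-tracks endpoint, exposed in spotipy as `sp.tracks`, which accepts up to 50 IDs per call. Below is a minimal sketch of batching under that limit, not what was run here. `fetch_popularity` is a hypothetical helper, and `fetch_batch` is a parameter so the logic can be tried without network access. IDs the API cannot resolve come back as `None`, so they are kept as empty rows to preserve the original order.

```python
import time


def fetch_popularity(ids, fetch_batch, batch_size=50, pause=0.1):
    """Collect popularity/explicit for each ID, one row per ID, in order."""
    ids = list(ids)
    rows = []
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        # fetch_batch is expected to return {"tracks": [...]} like sp.tracks
        response = fetch_batch(batch)
        for track in response["tracks"]:
            if track is None:
                # Keep a placeholder so row positions still match the input
                rows.append({"popularity": None, "explicit": None})
            else:
                rows.append({"popularity": track["popularity"],
                             "explicit": track["explicit"]})
        time.sleep(pause)
    return rows
```

With the objects from this notebook, the call would be `fetch_popularity(spotify_df["id"], sp.tracks)`, and the result can be passed to `pd.json_normalize` as before.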


In [0]:
spotify_df.iloc[240521:,0]

240521    55QovqYXaR526sZWxnKQqz
240522    26AKvLdnRWa98vbOWJ2zfe
240523    1SxNgia01ERxRsFRcCu1yv
240524    3MN82VctKXwosPol8bzyDM
240525    1fKWRNMOMoG6NzaQwSkp9K
                   ...          
339850    1EU8l9SctgP0gwIFxdjKPA
339851    4a0J3zWWe5IXdwWWQSypjq
339852    28VEEbzNdg6r5gQFY5wWI3
339853    7i3NXBP12BJoerk6sAATx0
339854    57j1v3ks11sygQuUWcSHaL
Name: id, Length: 99334, dtype: object

In [0]:
tracks = []

for uri in spotify_df.iloc[240521:,0]:
  track_features = {}
  
  # grabbing popularity
  pop = sp.track(uri)
  track_features['popularity'] = pop['popularity']
  track_features['explicit'] = pop['explicit']
  tracks.append(track_features)
  time.sleep(0.1)

popularity2 = pd.json_normalize(tracks)
popularity2


Unnamed: 0,popularity,explicit
0,17,False
1,24,False
2,16,False
3,41,False
4,15,False
...,...,...
99329,9,False
99330,8,False
99331,10,False
99332,11,False


In [0]:
# I ended the first query early so I concatenate both dfs
popularity_complete = pd.concat([popularity, popularity2], ignore_index=True)

In [0]:
# The observations are in the same order as in the original dataset, and both 
# are 0 indexed, so I can merge by index.
spotify_df = spotify_df.merge(popularity_complete, left_index=True, 
                              right_index=True, how="outer")
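
Merging by index is only safe if both frames have exactly the same length and order. With an outer merge, a missing query result would show up as rows with empty values rather than as an error. A sketch of a stricter join, assuming a hypothetical helper `join_by_position`:

```python
import pandas as pd


def join_by_position(left, right):
    """Place two frames side by side, refusing to join if row counts differ."""
    if len(left) != len(right):
        raise ValueError(f"row counts differ: {len(left)} vs {len(right)}")
    # Reset both indexes so rows are matched purely by position
    left = left.reset_index(drop=True)
    right = right.reset_index(drop=True)
    return pd.concat([left, right], axis=1)
```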

In [0]:
spotify_df

Unnamed: 0,id,song,album,artist,acousticness,danceability,duration_ms,energy,instrumentalness,key,liveness,loudness,mode,speechiness,tempo,time_signature,valence,album_id,date,popularity,explicit
0,0Veyvc3n9AcLSoK3r1dA12,Voices In My Head,Hoodie SZN,A Boogie Wit da Hoodie,0.0555,0.754,142301.0,0.663,0.000000,6.0,0.101,-6.311,0.0,0.4270,90.195,4.0,0.207,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,58,True
1,77JzXZonNumWsuXKy9vr3U,Beasty,Hoodie SZN,A Boogie Wit da Hoodie,0.2920,0.860,152829.0,0.418,0.000000,7.0,0.106,-9.061,0.0,0.1580,126.023,4.0,0.374,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,59,True
2,18yllZD0TdF7ykcREib8Z1,I Did It,Hoodie SZN,A Boogie Wit da Hoodie,0.1530,0.718,215305.0,0.454,0.000046,8.0,0.116,-9.012,1.0,0.1270,89.483,4.0,0.196,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,57,True
3,1wJRveJZLSb1rjhnUHQiv6,Swervin (feat. 6ix9ine),Hoodie SZN,A Boogie Wit da Hoodie,0.0153,0.581,189487.0,0.662,0.000000,9.0,0.111,-5.239,1.0,0.3030,93.023,4.0,0.434,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,83,True
4,0jAfdqv18goRTUxm3ilRjb,Startender (feat. Offset and Tyga),Hoodie SZN,A Boogie Wit da Hoodie,0.0235,0.736,192779.0,0.622,0.000000,6.0,0.151,-4.653,0.0,0.1330,191.971,4.0,0.506,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,71,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
339850,1EU8l9SctgP0gwIFxdjKPA,It'S A Raggy Waltz - Live,The Dave Brubeck Quartet At Carnegie Hall,The Dave Brubeck Quartet,0.6550,0.445,434427.0,0.544,0.000325,8.0,0.235,-11.662,1.0,0.0792,176.723,3.0,0.654,4My0KPjdtzCUfFjToOCiPh,1963-01-01,9,False
339851,4a0J3zWWe5IXdwWWQSypjq,King For A Day - Live,The Dave Brubeck Quartet At Carnegie Hall,The Dave Brubeck Quartet,0.8350,0.600,375533.0,0.172,0.865000,8.0,0.421,-22.897,1.0,0.0672,135.005,4.0,0.458,4My0KPjdtzCUfFjToOCiPh,1963-01-01,8,False
339852,28VEEbzNdg6r5gQFY5wWI3,Castilian Drums - Live,The Dave Brubeck Quartet At Carnegie Hall,The Dave Brubeck Quartet,0.1920,0.459,861933.0,0.606,0.612000,10.0,0.398,-13.427,0.0,0.0645,116.325,4.0,0.359,4My0KPjdtzCUfFjToOCiPh,1963-01-01,10,False
339853,7i3NXBP12BJoerk6sAATx0,Blue Rondo A La Turk - Live,The Dave Brubeck Quartet At Carnegie Hall,The Dave Brubeck Quartet,0.5220,0.444,761573.0,0.508,0.025000,10.0,0.929,-11.111,1.0,0.0526,62.879,4.0,0.592,4My0KPjdtzCUfFjToOCiPh,1963-01-01,11,False


In [0]:
# Saving this complete audio features dataset to csv
spotify_df.to_csv("/content/drive/My Drive/spotify_audio.csv", index=False)

# Web Scraping lyrics off Metrolyrics

In [0]:
import requests
from bs4 import BeautifulSoup
import re
import string
import time

**I will not be able to get lyrics for many of the artists below, such as Various Artists, Soundtrack, or Original Cast.**

In [0]:
spotify_df["artist"].value_counts()

Various Artists                     11578
Soundtrack                           8260
Original Cast                         888
Original Broadway Cast Recording      855
Elvis Presley                         772
                                    ...  
Joe Dolce                               1
John O'Banion                           1
Juan Velez                              1
Patrick Hernandez                       1
Yipes                                   1
Name: artist, Length: 8081, dtype: int64
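
One way to avoid spending requests on those rows would be to filter them out before scraping. This notebook does not do that, so the sketch below is only an option; the set of placeholder names is taken from the counts shown here, and `drop_placeholder_artists` is a hypothetical helper.

```python
import pandas as pd

# Artist labels from the counts above that do not name a single performer
PLACEHOLDER_ARTISTS = {
    "Various Artists",
    "Soundtrack",
    "Original Cast",
    "Original Broadway Cast Recording",
}


def drop_placeholder_artists(df, placeholders=PLACEHOLDER_ARTISTS):
    """Return only the rows whose artist is not a placeholder label."""
    return df[~df["artist"].isin(placeholders)]
```

Filtering changes the row positions, so any later merge by index would need to use the filtered frame consistently.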

In [0]:
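# Clean the song names in place (removing " (feat. ...)" and " - Live")
# using regex. regex=True is required in recent pandas versions, where
# str.replace treats the pattern as a literal string by default.

spotify_df["song"] = spotify_df["song"].str.replace(
    r' \([^)]*\)', "", regex=True)
spotify_df["song"] = spotify_df["song"].str.replace(
    r' - Live', "", regex=True)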
spotify_df["song"]

0            Voices In My Head
1                       Beasty
2                     I Did It
3                      Swervin
4                   Startender
                  ...         
339850      It'S A Raggy Waltz
339851          King For A Day
339852         Castilian Drums
339853    Blue Rondo A La Turk
339854               Take Five
Name: song, Length: 339855, dtype: object
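
The same cleaning can be written as a plain function, which makes it easy to try on individual titles and to store the result in a separate column. This is a sketch with a hypothetical `clean_title` helper. Unlike the cell it follows, it only strips " - Live" at the end of a title; like that cell, it removes every parenthesised group, not only `(feat. ...)`.

```python
import re

_PAREN_GROUP = re.compile(r" \([^)]*\)")
_LIVE_SUFFIX = re.compile(r" - Live$")


def clean_title(title):
    """Strip parenthesised groups and a trailing ' - Live' from a song title."""
    without_parens = _PAREN_GROUP.sub("", title)
    return _LIVE_SUFFIX.sub("", without_parens)
```

Applied to the dataframe, `spotify_df["song"].map(clean_title)` could be assigned to a new column so the original titles stay available.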

In [0]:
spotify_df.to_csv("/content/drive/My Drive/spotify_audio.csv", index=False)

In [0]:
spotify_df = pd.read_csv("/content/drive/My Drive/spotify_audio.csv")
spotify_df

Unnamed: 0,id,song,album,artist,acousticness,danceability,duration_ms,energy,instrumentalness,key,liveness,loudness,mode,speechiness,tempo,time_signature,valence,album_id,date,popularity,explicit
0,0Veyvc3n9AcLSoK3r1dA12,Voices In My Head,Hoodie SZN,A Boogie Wit da Hoodie,0.0555,0.754,142301.0,0.663,0.000000,6.0,0.101,-6.311,0.0,0.4270,90.195,4.0,0.207,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,58,True
1,77JzXZonNumWsuXKy9vr3U,Beasty,Hoodie SZN,A Boogie Wit da Hoodie,0.2920,0.860,152829.0,0.418,0.000000,7.0,0.106,-9.061,0.0,0.1580,126.023,4.0,0.374,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,59,True
2,18yllZD0TdF7ykcREib8Z1,I Did It,Hoodie SZN,A Boogie Wit da Hoodie,0.1530,0.718,215305.0,0.454,0.000046,8.0,0.116,-9.012,1.0,0.1270,89.483,4.0,0.196,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,57,True
3,1wJRveJZLSb1rjhnUHQiv6,Swervin,Hoodie SZN,A Boogie Wit da Hoodie,0.0153,0.581,189487.0,0.662,0.000000,9.0,0.111,-5.239,1.0,0.3030,93.023,4.0,0.434,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,83,True
4,0jAfdqv18goRTUxm3ilRjb,Startender,Hoodie SZN,A Boogie Wit da Hoodie,0.0235,0.736,192779.0,0.622,0.000000,6.0,0.151,-4.653,0.0,0.1330,191.971,4.0,0.506,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,71,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
339850,1EU8l9SctgP0gwIFxdjKPA,It'S A Raggy Waltz,The Dave Brubeck Quartet At Carnegie Hall,The Dave Brubeck Quartet,0.6550,0.445,434427.0,0.544,0.000325,8.0,0.235,-11.662,1.0,0.0792,176.723,3.0,0.654,4My0KPjdtzCUfFjToOCiPh,1963-01-01,9,False
339851,4a0J3zWWe5IXdwWWQSypjq,King For A Day,The Dave Brubeck Quartet At Carnegie Hall,The Dave Brubeck Quartet,0.8350,0.600,375533.0,0.172,0.865000,8.0,0.421,-22.897,1.0,0.0672,135.005,4.0,0.458,4My0KPjdtzCUfFjToOCiPh,1963-01-01,8,False
339852,28VEEbzNdg6r5gQFY5wWI3,Castilian Drums,The Dave Brubeck Quartet At Carnegie Hall,The Dave Brubeck Quartet,0.1920,0.459,861933.0,0.606,0.612000,10.0,0.398,-13.427,0.0,0.0645,116.325,4.0,0.359,4My0KPjdtzCUfFjToOCiPh,1963-01-01,10,False
339853,7i3NXBP12BJoerk6sAATx0,Blue Rondo A La Turk,The Dave Brubeck Quartet At Carnegie Hall,The Dave Brubeck Quartet,0.5220,0.444,761573.0,0.508,0.025000,10.0,0.929,-11.111,1.0,0.0526,62.879,4.0,0.592,4My0KPjdtzCUfFjToOCiPh,1963-01-01,11,False


In [0]:
# Testing generating a url for metrolyrics from the dataframe columns
# An example of such a URL is below. The URLs contain no punctuation except -

songs = zip(spotify_df["artist"], spotify_df["song"])
url = "https://www.metrolyrics.com/"
a = list(songs)
# Making a list in the right order, including the lyrics string in between
b = a[0][1].split() + ["lyrics"] + a[0][0].split()
# Removing punctuation from each string in list
c = [c.translate(str.maketrans('', '', string.punctuation)) for c in b]
# Joining the list with a - between each string
d = "-".join(c).lower()
d += ".html"
url += d
url

'https://www.metrolyrics.com/voices-in-my-head-lyrics-a-boogie-wit-da-hoodie.html'
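
The URL construction can be collected into one function so each scraping loop does not have to repeat it. A minimal sketch that follows the same steps as the test cell; `metrolyrics_url` is a hypothetical name.

```python
import string

# Translation table that deletes every ASCII punctuation character
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def metrolyrics_url(artist, song):
    """Build the lyrics URL as <song words>-lyrics-<artist words>.html."""
    words = song.split() + ["lyrics"] + artist.split()
    slug = "-".join(word.translate(_STRIP_PUNCT) for word in words).lower()
    return f"https://www.metrolyrics.com/{slug}.html"
```

A word made up entirely of punctuation (for example `&`) becomes an empty string and leaves a double hyphen in the slug, which is also what the original steps produce.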

In [0]:
# Testing grabbing lyrics for the url generated above.

content = requests.get(url)
soup = BeautifulSoup(content.content, 'html.parser')

# Metrolyrics marks each lyric paragraph (<p>) with class "verse", which makes scraping easy
a = soup.findAll("p", class_="verse")
# Song lyrics are text separated by html tags, so I join them with a space
c = " ".join([b.text for b in a])
# Removing verse labels, and adding spaces to \n for even spacing
d = re.sub(r'\[.*?\]', "", c).strip().replace("\n", " \n ").lower()
# Removing punctuation
e = d.translate(str.maketrans('', '', string.punctuation))
# Removing parentheses around any song lyrics
f = re.sub('[()]', '', e)
# Replacing multiple spaces with just one, this will be chain_lyrics
g = re.sub(r' +', " ", f)
# This will be the clean lyrics without \n
h = re.sub(r' +', " ", g.replace("\n", " "))
h

'monstas gon tear it up all she ever wanted was my heart to hurt no attachments just a gucci purse she know im mad rich she think im usin her my diamonds mad rich they so rude to her and the way im actin is all due to her if i think she thottin im gon do it first im gon call my side bitch and we gon do the work she know im from highbridge my chain show through the shirt the feds locked my man up free montana damn all he ever wanted was his bands up they gave him 10 years for nothin keep your head up if any niggas start to fret nigga lay em out just listen to all my tracks cause i dont wanna do no interviews im not into that and i be feelin like im malcolm i got the x on my back and every nigga in the x can vouch for me thats a fact i dropped my first mixtape and yeah that shit ran laps by the time i dropped the bigger artist like seven million in plaques got too much gold im way too smart to leave my crib without straps and any nigga run up on us gon get hit like that and ima be like n
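
The text-cleaning steps can be separated from the HTTP request so they can be checked on a short string. This sketch follows the same sequence as the cell it comes after, with one simplification: the separate parentheses step is dropped, because `string.punctuation` already contains `(` and `)`. The function name `clean_lyrics` is hypothetical.

```python
import re
import string


def clean_lyrics(raw):
    """Return (chain_lyrics, lyrics) from raw verse text."""
    # Drop [Verse]-style labels, pad newlines with spaces, lowercase
    text = re.sub(r"\[.*?\]", "", raw).strip().replace("\n", " \n ").lower()
    # Remove all ASCII punctuation, including parentheses
    text = text.translate(str.maketrans("", "", string.punctuation))
    # chain_lyrics keeps the line breaks; lyrics flattens them to spaces
    chain_lyrics = re.sub(r" +", " ", text)
    lyrics = re.sub(r" +", " ", chain_lyrics.replace("\n", " "))
    return chain_lyrics, lyrics
```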

**Because scraping these lyrics takes so long, I will have to partition spotify_df into multiple parts and scrape lyrics for each part.**
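
The partitioning could also be automated so that an interrupted run resumes where it stopped. The sketch below is one possible approach, not the procedure used in this notebook: each chunk is written to its own CSV file, and chunks whose file already exists are skipped. `scrape_in_chunks`, `scrape_one`, `out_dir`, and the file naming are all hypothetical.

```python
import os

import pandas as pd


def scrape_in_chunks(df, scrape_one, out_dir, chunk_size=1000):
    """Scrape df chunk by chunk, saving a checkpoint file per chunk."""
    os.makedirs(out_dir, exist_ok=True)
    parts = []
    for start in range(0, len(df), chunk_size):
        path = os.path.join(out_dir, f"lyrics_{start:07d}.csv")
        if not os.path.exists(path):
            chunk = df.iloc[start:start + chunk_size]
            rows = []
            for artist, song in zip(chunk["artist"], chunk["song"]):
                # scrape_one returns a dict, possibly empty on failure
                result = scrape_one(artist, song)
                rows.append({"chain_lyrics": result.get("chain_lyrics"),
                             "lyrics": result.get("lyrics")})
            pd.DataFrame(rows).to_csv(path, index=False)
        # Always read back from disk so finished chunks are reused
        parts.append(pd.read_csv(path))
    return pd.concat(parts, ignore_index=True)
```

Because every row gets an entry even when scraping fails, the combined result keeps the same row order as `df`.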

In [0]:
# Web scraping the lyrics for songs off of metrolyrics.com
# This is what I used for the original 1050 songs
songs = zip(spotify_df["artist"], spotify_df["song"])

song_lyrics = []
for song in songs:
  url = "https://www.metrolyrics.com/"
  # Dictionary for the song lyrics
  lyrics_dict = {}

  try:
    a = list(song)
    b = list(a)[1].split() + ["lyrics"] + list(a)[0].split()
    c = [c.translate(str.maketrans('', '', string.punctuation)) for c in b]
    d = "-".join(c).lower()
    d += ".html"
    url += d

    content = requests.get(url)
    soup = BeautifulSoup(content.content, 'html.parser')

    # Metrolyrics marks each lyric paragraph (<p>) with class "verse", which makes scraping easy
    a = soup.findAll("p", class_="verse")
    # Song lyrics are text separated by html tags, so I join them with a space
    c = " ".join([b.text for b in a])
    # Removing verse labels, and adding spaces to \n for even spacing
    d = re.sub(r'\[.*?\]', "", c).strip().replace("\n", " \n ").lower()
    # Removing punctuation
    e = d.translate(str.maketrans('', '', string.punctuation))
    # Removing parentheses around any song lyrics
    f = re.sub('[()]', '', e)
    # Replacing multiple spaces with just one, this will be chain_lyrics
    g = re.sub(r' +', " ", f)
    # This will be the clean lyrics without \n
    h = re.sub(r' +', " ", g.replace("\n", " "))
    lyrics_dict["chain_lyrics"] = g
    lyrics_dict["lyrics"] = h

    song_lyrics.append(lyrics_dict)

  # If anything goes wrong while scraping, append an empty dict instead
  except Exception:
    song_lyrics.append({})

  # Brief pause after every request, not only after a failure
  time.sleep(0.01)
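
The loop parses whatever page comes back, so a missing page yields empty strings rather than an exception. The sketch below checks the HTTP status and sets a timeout before parsing. `fetch_verses` is a hypothetical helper; its `session` argument is expected to behave like a `requests.Session`, and only `.get(url, timeout=...)`, `.status_code`, and `.content` are used.

```python
from bs4 import BeautifulSoup


def fetch_verses(url, session, timeout=10):
    """Return the joined verse text, or None if the page has no lyrics."""
    response = session.get(url, timeout=timeout)
    if response.status_code != 200:
        # Page missing or server error: nothing to parse
        return None
    soup = BeautifulSoup(response.content, "html.parser")
    verses = soup.find_all("p", class_="verse")
    if not verses:
        return None
    return " ".join(verse.text for verse in verses)
```

Returning `None` for both cases keeps failed songs distinguishable from songs whose lyrics are genuinely empty.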

In [0]:
# Saving progress so far
lyrics_df = pd.json_normalize(song_lyrics)
lyrics_df.to_csv("/content/drive/My Drive/lyrics_df.csv", index=False)

In [0]:
lyrics_df = pd.read_csv("/content/drive/My Drive/lyrics_df.csv")
lyrics_df

Unnamed: 0,chain_lyrics,lyrics
0,monstas gon tear it up all she ever wanted was...,monstas gon tear it up all she ever wanted was...
1,all that we know is the logos and hb \n the gl...,all that we know is the logos and hb the glock...
2,yeah i cant trust these bitches \n i dont got ...,yeah i cant trust these bitches i dont got no ...
3,uh \n yeah \n oh thats london on da track run ...,uh yeah oh thats london on da track run that s...
4,yeah shawty got ass she just got a tummy tuck ...,yeah shawty got ass she just got a tummy tuck ...
...,...,...
99271,how many times have you cut only to feel the p...,how many times have you cut only to feel the p...
99272,close your eyes \n and just hear me sing \n on...,close your eyes and just hear me sing one last...
99273,and you say this aint living \n you say you ca...,and you say this aint living you say you cant ...
99274,a trail of tears beyond redemption \n just a w...,a trail of tears beyond redemption just a word...


In [0]:
# Second partition
spotify_partition1 = spotify_df.iloc[99276:175000,:]
spotify_partition1

In [0]:
songs = zip(spotify_partition1["artist"], spotify_partition1["song"])

song_lyrics = []
for song in songs:
  url = "https://www.metrolyrics.com/"
  # Dictionary for the song lyrics
  lyrics_dict = {}

  try:
    a = list(song)
    b = list(a)[1].split() + ["lyrics"] + list(a)[0].split()
    c = [c.translate(str.maketrans('', '', string.punctuation)) for c in b]
    d = "-".join(c).lower()
    d += ".html"
    url += d

    content = requests.get(url)
    soup = BeautifulSoup(content.content, 'html.parser')

    # Metrolyrics marks each lyric paragraph (<p>) with class "verse", which makes scraping easy
    a = soup.findAll("p", class_="verse")
    # Song lyrics are text separated by html tags, so I join them with a space
    c = " ".join([b.text for b in a])
    # Removing verse labels, and adding spaces to \n for even spacing
    d = re.sub(r'\[.*?\]', "", c).strip().replace("\n", " \n ").lower()
    # Removing punctuation
    e = d.translate(str.maketrans('', '', string.punctuation))
    # Removing parentheses around any song lyrics
    f = re.sub('[()]', '', e)
    # Replacing multiple spaces with just one, this will be chain_lyrics
    g = re.sub(r' +', " ", f)
    # This will be the clean lyrics without \n
    h = re.sub(r' +', " ", g.replace("\n", " "))
    lyrics_dict["chain_lyrics"] = g
    lyrics_dict["lyrics"] = h

    song_lyrics.append(lyrics_dict)

  # If anything goes wrong while scraping, append an empty dict instead
  except Exception:
    song_lyrics.append({})

  # Brief pause after every request, not only after a failure
  time.sleep(0.01)

In [0]:
lyrics_df1 = pd.json_normalize(song_lyrics)
lyrics_df1

Unnamed: 0,chain_lyrics,lyrics
0,,
1,,
2,sitting alone in the cold of the night \n your...,sitting alone in the cold of the night youre t...
3,they say you are right \n i hope that im wrong...,they say you are right i hope that im wrong i ...
4,now there is a light \n in the dark some will ...,now there is a light in the dark some will say...
...,...,...
75719,,
75720,shove me under you again \n i cant wait for th...,shove me under you again i cant wait for this ...
75721,i got somethin up my sleeve \n i know you will...,i got somethin up my sleeve i know you will co...
75722,lets go the day has come to an end \n the sun ...,lets go the day has come to an end the sun is ...


In [0]:
# Concatenating first and second partition
lyrics_df = pd.concat([lyrics_df, lyrics_df1], ignore_index=True)
lyrics_df

Unnamed: 0,chain_lyrics,lyrics
0,monstas gon tear it up all she ever wanted was...,monstas gon tear it up all she ever wanted was...
1,all that we know is the logos and hb \n the gl...,all that we know is the logos and hb the glock...
2,yeah i cant trust these bitches \n i dont got ...,yeah i cant trust these bitches i dont got no ...
3,uh \n yeah \n oh thats london on da track run ...,uh yeah oh thats london on da track run that s...
4,yeah shawty got ass she just got a tummy tuck ...,yeah shawty got ass she just got a tummy tuck ...
...,...,...
174995,,
174996,shove me under you again \n i cant wait for th...,shove me under you again i cant wait for this ...
174997,i got somethin up my sleeve \n i know you will...,i got somethin up my sleeve i know you will co...
174998,lets go the day has come to an end \n the sun ...,lets go the day has come to an end the sun is ...


In [0]:
# Saving progress so far
lyrics_df.to_csv("/content/drive/My Drive/lyrics_df.csv", index=False)

In [0]:
lyrics_df = pd.read_csv("/content/drive/My Drive/lyrics_df.csv")
lyrics_df

Unnamed: 0,chain_lyrics,lyrics
0,monstas gon tear it up all she ever wanted was...,monstas gon tear it up all she ever wanted was...
1,all that we know is the logos and hb \n the gl...,all that we know is the logos and hb the glock...
2,yeah i cant trust these bitches \n i dont got ...,yeah i cant trust these bitches i dont got no ...
3,uh \n yeah \n oh thats london on da track run ...,uh yeah oh thats london on da track run that s...
4,yeah shawty got ass she just got a tummy tuck ...,yeah shawty got ass she just got a tummy tuck ...
...,...,...
174995,,
174996,shove me under you again \n i cant wait for th...,shove me under you again i cant wait for this ...
174997,i got somethin up my sleeve \n i know you will...,i got somethin up my sleeve i know you will co...
174998,lets go the day has come to an end \n the sun ...,lets go the day has come to an end the sun is ...


In [0]:
# The third partition
spotify_partition2 = spotify_df.iloc[175000:260000,:]
spotify_partition2

Unnamed: 0,id,song,album,artist,acousticness,danceability,duration_ms,energy,instrumentalness,key,liveness,loudness,mode,speechiness,tempo,time_signature,valence,album_id,date,popularity,explicit
175000,231g8oadCYb2JCKMHiWn0H,Natural Life,Saturate,Breaking Benjamin,0.000134,0.466,239467.0,0.936,0.325000,1.0,0.1670,-5.962,0.0,0.0401,150.191,3.0,0.555,1yJqr0efi39dFYC1GkA1aI,2002-01-01,39,False
175001,3AZAEuBA6XcDBN2OhoiIkd,Next To Nothing,Saturate,Breaking Benjamin,0.004110,0.594,223867.0,0.789,0.000328,0.0,0.2470,-5.994,0.0,0.0345,106.963,4.0,0.487,1yJqr0efi39dFYC1GkA1aI,2002-01-01,39,False
175002,4LrbjBe6pFbDCXQAyrTzhj,Water,Saturate,Breaking Benjamin,0.000081,0.431,252667.0,0.861,0.322000,1.0,0.0957,-5.424,1.0,0.0474,172.092,4.0,0.364,1yJqr0efi39dFYC1GkA1aI,2002-01-01,38,False
175003,4sSutqK9JRIqWDRBQ9zP9o,Home,Saturate,Breaking Benjamin,0.000156,0.513,217627.0,0.842,0.064500,9.0,0.0501,-6.020,1.0,0.0425,93.900,4.0,0.579,1yJqr0efi39dFYC1GkA1aI,2002-01-01,41,False
175004,4wWF9CU6X4RJCBlTxzyg9y,Phase,Saturate,Breaking Benjamin,0.001020,0.483,270600.0,0.689,0.264000,6.0,0.0921,-6.879,0.0,0.0434,176.130,4.0,0.200,1yJqr0efi39dFYC1GkA1aI,2002-01-01,38,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
259995,2f0auRvwTvwIN3ZGIFhd2d,Two Steps Ahead,Box Of Frogs,Box Of Frogs,0.063500,0.659,271093.0,0.644,0.084700,11.0,0.1280,-10.363,0.0,0.0300,109.491,4.0,0.738,1kVTwsLgH0XUt0ac9Awkb6,1984-01-01,8,False
259996,0SBn7AOys0rceEKCgZoWNn,Into the Dark,Box Of Frogs,Box Of Frogs,0.320000,0.791,246667.0,0.302,0.000376,4.0,0.1140,-13.173,0.0,0.0469,117.003,4.0,0.447,1kVTwsLgH0XUt0ac9Awkb6,1984-01-01,6,False
259997,2m33PlUf3Dd6e2slYKD5bM,Just a Boy Again,Box Of Frogs,Box Of Frogs,0.185000,0.835,338600.0,0.571,0.000645,0.0,0.0786,-11.969,1.0,0.0422,103.067,4.0,0.713,1kVTwsLgH0XUt0ac9Awkb6,1984-01-01,7,False
259998,0jQzPkltxwUwpJeM6CQu0X,Poor Boy,Box Of Frogs,Box Of Frogs,0.057200,0.615,255400.0,0.636,0.013000,2.0,0.0846,-10.568,1.0,0.0308,97.781,4.0,0.661,1kVTwsLgH0XUt0ac9Awkb6,1984-01-01,7,False


In [0]:
songs = zip(spotify_partition2["artist"], spotify_partition2["song"])

song_lyrics = []
for song in songs:
  url = "https://www.metrolyrics.com/"
  # Dictionary for the song lyrics
  lyrics_dict = {}

  try:
    a = list(song)
    b = list(a)[1].split() + ["lyrics"] + list(a)[0].split()
    c = [c.translate(str.maketrans('', '', string.punctuation)) for c in b]
    d = "-".join(c).lower()
    d += ".html"
    url += d

    content = requests.get(url)
    soup = BeautifulSoup(content.content, 'html.parser')

    # Metrolyrics marks each lyric paragraph (<p>) with class "verse", which makes scraping easy
    a = soup.findAll("p", class_="verse")
    # Song lyrics are text separated by html tags, so I join them with a space
    c = " ".join([b.text for b in a])
    # Removing verse labels, and adding spaces to \n for even spacing
    d = re.sub(r'\[.*?\]', "", c).strip().replace("\n", " \n ").lower()
    # Removing punctuation
    e = d.translate(str.maketrans('', '', string.punctuation))
    # Removing parentheses around any song lyrics
    f = re.sub('[()]', '', e)
    # Replacing multiple spaces with just one, this will be chain_lyrics
    g = re.sub(r' +', " ", f)
    # This will be the clean lyrics without \n
    h = re.sub(r' +', " ", g.replace("\n", " "))
    lyrics_dict["chain_lyrics"] = g
    lyrics_dict["lyrics"] = h

    song_lyrics.append(lyrics_dict)

  # If anything goes wrong while scraping, append an empty dict instead
  except Exception:
    song_lyrics.append({})

  # Brief pause after every request, not only after a failure
  time.sleep(0.01)

In [0]:
lyrics_df2 = pd.json_normalize(song_lyrics)
lyrics_df2

Unnamed: 0,chain_lyrics,lyrics
0,hold still all of my life all of my time \n i ...,hold still all of my life all of my time i don...
1,beneath this wave \n i just cant take your bre...,beneath this wave i just cant take your breath...
2,whats all this talk of a notion \n id rather d...,whats all this talk of a notion id rather drin...
3,ive got a little red bow \n and i bought it fo...,ive got a little red bow and i bought it for y...
4,the light is dead in your eye \n so ill keep l...,the light is dead in your eye so ill keep livi...
...,...,...
84995,,
84996,,
84997,,
84998,,


In [0]:
# Concatenating the third partition with the rest so far
lyrics_df = pd.concat([lyrics_df, lyrics_df2], ignore_index=True)
lyrics_df

Unnamed: 0,chain_lyrics,lyrics
0,monstas gon tear it up all she ever wanted was...,monstas gon tear it up all she ever wanted was...
1,all that we know is the logos and hb \n the gl...,all that we know is the logos and hb the glock...
2,yeah i cant trust these bitches \n i dont got ...,yeah i cant trust these bitches i dont got no ...
3,uh \n yeah \n oh thats london on da track run ...,uh yeah oh thats london on da track run that s...
4,yeah shawty got ass she just got a tummy tuck ...,yeah shawty got ass she just got a tummy tuck ...
...,...,...
259995,,
259996,,
259997,,
259998,,


In [0]:
lyrics_df.to_csv("/content/drive/My Drive/lyrics_df.csv", index=False)

In [0]:
# The fourth partition
spotify_partition3 = spotify_df.iloc[260000:339855,:]
spotify_partition3

Unnamed: 0,id,song,album,artist,acousticness,danceability,duration_ms,energy,instrumentalness,key,liveness,loudness,mode,speechiness,tempo,time_signature,valence,album_id,date,popularity,explicit
260000,4rdVZkDBpWvIEIu66TB1SL,Just Like the USA,Knife,Aztec Camera,0.0409,0.722,243227.0,0.472,0.000000,2.0,0.1510,-15.918,1.0,0.0497,128.945,4.0,0.817,4k7k4G9sGUKuXTYqxgQZkd,1984-01-01,11,False
260001,4LFnLp1Lzudln0MowwORBF,Head Is Happy,Knife,Aztec Camera,0.0826,0.693,254267.0,0.444,0.000000,9.0,0.0994,-14.298,1.0,0.0272,107.479,4.0,0.693,4k7k4G9sGUKuXTYqxgQZkd,1984-01-01,8,False
260002,3z2Lf3LyFBIw9uxm9pC4Pa,The Back Door to Heaven,Knife,Aztec Camera,0.0609,0.622,322973.0,0.579,0.000008,4.0,0.0679,-14.241,1.0,0.0279,100.127,4.0,0.459,4k7k4G9sGUKuXTYqxgQZkd,1984-01-01,8,False
260003,7HT6ZXnDZikE5HSQRL0Efj,All I Need Is Everything,Knife,Aztec Camera,0.3070,0.614,350800.0,0.515,0.001200,9.0,0.3560,-15.268,1.0,0.0306,120.885,4.0,0.474,4k7k4G9sGUKuXTYqxgQZkd,1984-01-01,14,False
260004,0nk1mWP7jcCa5591QFVckI,Backwards and Forwards,Knife,Aztec Camera,0.1910,0.708,252893.0,0.287,0.000003,7.0,0.0508,-18.012,1.0,0.0302,108.065,4.0,0.321,4k7k4G9sGUKuXTYqxgQZkd,1984-01-01,9,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
339850,1EU8l9SctgP0gwIFxdjKPA,It'S A Raggy Waltz,The Dave Brubeck Quartet At Carnegie Hall,The Dave Brubeck Quartet,0.6550,0.445,434427.0,0.544,0.000325,8.0,0.2350,-11.662,1.0,0.0792,176.723,3.0,0.654,4My0KPjdtzCUfFjToOCiPh,1963-01-01,9,False
339851,4a0J3zWWe5IXdwWWQSypjq,King For A Day,The Dave Brubeck Quartet At Carnegie Hall,The Dave Brubeck Quartet,0.8350,0.600,375533.0,0.172,0.865000,8.0,0.4210,-22.897,1.0,0.0672,135.005,4.0,0.458,4My0KPjdtzCUfFjToOCiPh,1963-01-01,8,False
339852,28VEEbzNdg6r5gQFY5wWI3,Castilian Drums,The Dave Brubeck Quartet At Carnegie Hall,The Dave Brubeck Quartet,0.1920,0.459,861933.0,0.606,0.612000,10.0,0.3980,-13.427,0.0,0.0645,116.325,4.0,0.359,4My0KPjdtzCUfFjToOCiPh,1963-01-01,10,False
339853,7i3NXBP12BJoerk6sAATx0,Blue Rondo A La Turk,The Dave Brubeck Quartet At Carnegie Hall,The Dave Brubeck Quartet,0.5220,0.444,761573.0,0.508,0.025000,10.0,0.9290,-11.111,1.0,0.0526,62.879,4.0,0.592,4My0KPjdtzCUfFjToOCiPh,1963-01-01,11,False


In [0]:
songs = zip(spotify_partition3["artist"], spotify_partition3["song"])

song_lyrics = []
for song in songs:
  url = "https://www.metrolyrics.com/"
  # Dictionary for the song lyrics
  lyrics_dict = {}

  try:
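    # Build the URL slug: "<song words>-lyrics-<artist words>.html"
    artist, title = song
    words = title.split() + ["lyrics"] + artist.split()
    # Strip punctuation from every word before joining with hyphens
    words = [w.translate(str.maketrans('', '', string.punctuation))
             for w in words]
    url += "-".join(words).lower() + ".html"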

    content = requests.get(url)
    soup = BeautifulSoup(content.content, 'html.parser')

    # Metrolyrics marks each lyric paragraph (<p>) with the class "verse",
    # which makes scraping easy
    a = soup.find_all("p", class_="verse")
    # Song lyrics are text separated by html tags, so I join them with a space
    c = " ".join([b.text for b in a])
    # Removing verse labels, and adding spaces to \n for even spacing
    d = re.sub(r'\[.*?\]', "", c).strip().replace("\n", " \n ").lower()
    # Removing punctuation
    e = d.translate(str.maketrans('', '', string.punctuation))
    # Parentheses are already gone after the punctuation removal above,
    # so this is only a safeguard
    f = re.sub('[()]', '', e)
    # Replacing multiple spaces with just one, this will be chain_lyrics
    g = re.sub(r' +', " ", f)
    # This will be the clean lyrics without \n
    h = re.sub(r' +', " ", g.replace("\n", " "))
    lyrics_dict["chain_lyrics"] = g
    lyrics_dict["lyrics"] = h

    song_lyrics.append(lyrics_dict)

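  # If there is a problem with the scraping, append an empty dict so the
  # results stay in the same order as the rows of the partition
  except Exception:
    song_lyrics.append({})

  # Brief pause after every request, not only after a failure
  time.sleep(0.01)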
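
The URL building and text cleaning steps of the loop above can be checked without touching the network. The following is a minimal sketch, not the notebook's actual code: the helper names `metrolyrics_url` and `clean_lyrics` are hypothetical, and the logic simply mirrors the steps in the loop (slug from song and artist words, bracketed verse labels removed, punctuation stripped, spaces collapsed).

```python
import re
import string

# Translation table that deletes every ASCII punctuation character
_PUNCT = str.maketrans("", "", string.punctuation)


def metrolyrics_url(artist, title):
    """Build '<title words>-lyrics-<artist words>.html' under the site root."""
    words = title.split() + ["lyrics"] + artist.split()
    slug = "-".join(w.translate(_PUNCT) for w in words).lower()
    return "https://www.metrolyrics.com/" + slug + ".html"


def clean_lyrics(raw):
    """Return (chain_lyrics, lyrics) from the raw verse text."""
    # Drop bracketed labels such as [Verse 1], pad newlines with spaces
    text = re.sub(r"\[.*?\]", "", raw).strip().replace("\n", " \n ").lower()
    text = text.translate(_PUNCT)
    # chain_lyrics keeps the line breaks, lyrics does not
    chain = re.sub(r" +", " ", text)
    plain = re.sub(r" +", " ", chain.replace("\n", " "))
    return chain, plain


url = metrolyrics_url("Aztec Camera", "Head Is Happy")
chain, plain = clean_lyrics("[Verse 1]\nHold still, all of my life\nAll of my time")
```

Keeping these two steps in small functions makes it easier to spot why a particular song produced an empty row, since the generated URL can be inspected directly.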

In [0]:
lyrics_df3 = pd.json_normalize(song_lyrics)
lyrics_df3

Unnamed: 0,chain_lyrics,lyrics
0,,
1,i recall the biggest beach \n throwing stones ...,i recall the biggest beach throwing stones alm...
2,my eyes are stuck on sleepless dreams \n this ...,my eyes are stuck on sleepless dreams this wor...
3,tears \n just like the jewels adorn their corp...,tears just like the jewels adorn their corpora...
4,,
...,...,...
79850,instrumental \n fidel girl tv chiuso bluetooth...,instrumental fidel girl tv chiuso bluetooth fs...
79851,,
79852,,
79853,,


In [0]:
# Concatenating the 4th partition with the rest
lyrics_df = pd.concat([lyrics_df, lyrics_df3], ignore_index=True)
lyrics_df

Unnamed: 0,chain_lyrics,lyrics
0,monstas gon tear it up all she ever wanted was...,monstas gon tear it up all she ever wanted was...
1,all that we know is the logos and hb \n the gl...,all that we know is the logos and hb the glock...
2,yeah i cant trust these bitches \n i dont got ...,yeah i cant trust these bitches i dont got no ...
3,uh \n yeah \n oh thats london on da track run ...,uh yeah oh thats london on da track run that s...
4,yeah shawty got ass she just got a tummy tuck ...,yeah shawty got ass she just got a tummy tuck ...
...,...,...
339850,instrumental \n fidel girl tv chiuso bluetooth...,instrumental fidel girl tv chiuso bluetooth fs...
339851,,
339852,,
339853,,


In [0]:
# Saving progress again
lyrics_df.to_csv("/content/drive/My Drive/lyrics_df.csv", index=False)

In [0]:
# Merging the two dfs together. They are in the same order, so I can merge by index
spotify_df = spotify_df.merge(lyrics_df, left_index=True, right_index=True,
                              how='outer')
spotify_df

Unnamed: 0,id,song,album,artist,acousticness,danceability,duration_ms,energy,instrumentalness,key,liveness,loudness,mode,speechiness,tempo,time_signature,valence,album_id,date,popularity,explicit,chain_lyrics,lyrics
0,0Veyvc3n9AcLSoK3r1dA12,Voices In My Head,Hoodie SZN,A Boogie Wit da Hoodie,0.0555,0.754,142301.0,0.663,0.000000,6.0,0.101,-6.311,0.0,0.4270,90.195,4.0,0.207,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,58,True,monstas gon tear it up all she ever wanted was...,monstas gon tear it up all she ever wanted was...
1,77JzXZonNumWsuXKy9vr3U,Beasty,Hoodie SZN,A Boogie Wit da Hoodie,0.2920,0.860,152829.0,0.418,0.000000,7.0,0.106,-9.061,0.0,0.1580,126.023,4.0,0.374,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,59,True,all that we know is the logos and hb \n the gl...,all that we know is the logos and hb the glock...
2,18yllZD0TdF7ykcREib8Z1,I Did It,Hoodie SZN,A Boogie Wit da Hoodie,0.1530,0.718,215305.0,0.454,0.000046,8.0,0.116,-9.012,1.0,0.1270,89.483,4.0,0.196,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,57,True,yeah i cant trust these bitches \n i dont got ...,yeah i cant trust these bitches i dont got no ...
3,1wJRveJZLSb1rjhnUHQiv6,Swervin,Hoodie SZN,A Boogie Wit da Hoodie,0.0153,0.581,189487.0,0.662,0.000000,9.0,0.111,-5.239,1.0,0.3030,93.023,4.0,0.434,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,83,True,uh \n yeah \n oh thats london on da track run ...,uh yeah oh thats london on da track run that s...
4,0jAfdqv18goRTUxm3ilRjb,Startender,Hoodie SZN,A Boogie Wit da Hoodie,0.0235,0.736,192779.0,0.622,0.000000,6.0,0.151,-4.653,0.0,0.1330,191.971,4.0,0.506,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,71,True,yeah shawty got ass she just got a tummy tuck ...,yeah shawty got ass she just got a tummy tuck ...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
339850,1EU8l9SctgP0gwIFxdjKPA,It'S A Raggy Waltz,The Dave Brubeck Quartet At Carnegie Hall,The Dave Brubeck Quartet,0.6550,0.445,434427.0,0.544,0.000325,8.0,0.235,-11.662,1.0,0.0792,176.723,3.0,0.654,4My0KPjdtzCUfFjToOCiPh,1963-01-01,9,False,instrumental \n fidel girl tv chiuso bluetooth...,instrumental fidel girl tv chiuso bluetooth fs...
339851,4a0J3zWWe5IXdwWWQSypjq,King For A Day,The Dave Brubeck Quartet At Carnegie Hall,The Dave Brubeck Quartet,0.8350,0.600,375533.0,0.172,0.865000,8.0,0.421,-22.897,1.0,0.0672,135.005,4.0,0.458,4My0KPjdtzCUfFjToOCiPh,1963-01-01,8,False,,
339852,28VEEbzNdg6r5gQFY5wWI3,Castilian Drums,The Dave Brubeck Quartet At Carnegie Hall,The Dave Brubeck Quartet,0.1920,0.459,861933.0,0.606,0.612000,10.0,0.398,-13.427,0.0,0.0645,116.325,4.0,0.359,4My0KPjdtzCUfFjToOCiPh,1963-01-01,10,False,,
339853,7i3NXBP12BJoerk6sAATx0,Blue Rondo A La Turk,The Dave Brubeck Quartet At Carnegie Hall,The Dave Brubeck Quartet,0.5220,0.444,761573.0,0.508,0.025000,10.0,0.929,-11.111,1.0,0.0526,62.879,4.0,0.592,4My0KPjdtzCUfFjToOCiPh,1963-01-01,11,False,,
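
Merging by index is only safe when both frames have the same number of rows in the same order. A minimal sketch of a guard for that assumption follows; the helper name `merge_by_position` and the toy frames are hypothetical and only illustrate the check.

```python
import pandas as pd


def merge_by_position(left, right):
    """Join two frames column-wise after checking they line up row for row."""
    if len(left) != len(right):
        raise ValueError(f"row counts differ: {len(left)} vs {len(right)}")
    # Renumber both frames from 0 so the index merge pairs rows by position
    left = left.reset_index(drop=True)
    right = right.reset_index(drop=True)
    return left.merge(right, left_index=True, right_index=True, how="outer")


tracks = pd.DataFrame({"song": ["a", "b", "c"]})
scraped = pd.DataFrame({"lyrics": ["la la", None, "na na"]})
merged = merge_by_position(tracks, scraped)
```

With the length check in place, a scraping run that stopped early raises an error instead of silently leaving a block of rows without lyrics.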


In [0]:
# Reloading the merged data from the saved CSV
spotify_df = pd.read_csv("/content/drive/My Drive/spotify_lyrics.csv")
spotify_df

Unnamed: 0,id,song,album,artist,acousticness,danceability,duration_ms,energy,instrumentalness,key,liveness,loudness,mode,speechiness,tempo,time_signature,valence,album_id,date,popularity,explicit,chain_lyrics,lyrics
0,0Veyvc3n9AcLSoK3r1dA12,Voices In My Head,Hoodie SZN,A Boogie Wit da Hoodie,0.0555,0.754,142301.0,0.663,0.000000,6.0,0.101,-6.311,0.0,0.4270,90.195,4.0,0.207,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,58,True,monstas gon tear it up all she ever wanted was...,monstas gon tear it up all she ever wanted was...
1,77JzXZonNumWsuXKy9vr3U,Beasty,Hoodie SZN,A Boogie Wit da Hoodie,0.2920,0.860,152829.0,0.418,0.000000,7.0,0.106,-9.061,0.0,0.1580,126.023,4.0,0.374,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,59,True,all that we know is the logos and hb \n the gl...,all that we know is the logos and hb the glock...
2,18yllZD0TdF7ykcREib8Z1,I Did It,Hoodie SZN,A Boogie Wit da Hoodie,0.1530,0.718,215305.0,0.454,0.000046,8.0,0.116,-9.012,1.0,0.1270,89.483,4.0,0.196,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,57,True,yeah i cant trust these bitches \n i dont got ...,yeah i cant trust these bitches i dont got no ...
3,1wJRveJZLSb1rjhnUHQiv6,Swervin,Hoodie SZN,A Boogie Wit da Hoodie,0.0153,0.581,189487.0,0.662,0.000000,9.0,0.111,-5.239,1.0,0.3030,93.023,4.0,0.434,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,83,True,uh \n yeah \n oh thats london on da track run ...,uh yeah oh thats london on da track run that s...
4,0jAfdqv18goRTUxm3ilRjb,Startender,Hoodie SZN,A Boogie Wit da Hoodie,0.0235,0.736,192779.0,0.622,0.000000,6.0,0.151,-4.653,0.0,0.1330,191.971,4.0,0.506,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,71,True,yeah shawty got ass she just got a tummy tuck ...,yeah shawty got ass she just got a tummy tuck ...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
339850,1EU8l9SctgP0gwIFxdjKPA,It'S A Raggy Waltz,The Dave Brubeck Quartet At Carnegie Hall,The Dave Brubeck Quartet,0.6550,0.445,434427.0,0.544,0.000325,8.0,0.235,-11.662,1.0,0.0792,176.723,3.0,0.654,4My0KPjdtzCUfFjToOCiPh,1963-01-01,9,False,instrumental \n fidel girl tv chiuso bluetooth...,instrumental fidel girl tv chiuso bluetooth fs...
339851,4a0J3zWWe5IXdwWWQSypjq,King For A Day,The Dave Brubeck Quartet At Carnegie Hall,The Dave Brubeck Quartet,0.8350,0.600,375533.0,0.172,0.865000,8.0,0.421,-22.897,1.0,0.0672,135.005,4.0,0.458,4My0KPjdtzCUfFjToOCiPh,1963-01-01,8,False,,
339852,28VEEbzNdg6r5gQFY5wWI3,Castilian Drums,The Dave Brubeck Quartet At Carnegie Hall,The Dave Brubeck Quartet,0.1920,0.459,861933.0,0.606,0.612000,10.0,0.398,-13.427,0.0,0.0645,116.325,4.0,0.359,4My0KPjdtzCUfFjToOCiPh,1963-01-01,10,False,,
339853,7i3NXBP12BJoerk6sAATx0,Blue Rondo A La Turk,The Dave Brubeck Quartet At Carnegie Hall,The Dave Brubeck Quartet,0.5220,0.444,761573.0,0.508,0.025000,10.0,0.929,-11.111,1.0,0.0526,62.879,4.0,0.592,4My0KPjdtzCUfFjToOCiPh,1963-01-01,11,False,,


In [0]:
# Seeing how many songs I managed to get lyrics for. I will only keep those.
# Without lyrics I cannot get sentiment or personality analysis.
spotify_df[spotify_df["lyrics"].notnull()]

Unnamed: 0,id,song,album,artist,acousticness,danceability,duration_ms,energy,instrumentalness,key,liveness,loudness,mode,speechiness,tempo,time_signature,valence,album_id,date,popularity,explicit,chain_lyrics,lyrics
0,0Veyvc3n9AcLSoK3r1dA12,Voices In My Head,Hoodie SZN,A Boogie Wit da Hoodie,0.0555,0.754,142301.0,0.663,0.000000,6.0,0.101,-6.311,0.0,0.4270,90.195,4.0,0.2070,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,58,True,monstas gon tear it up all she ever wanted was...,monstas gon tear it up all she ever wanted was...
1,77JzXZonNumWsuXKy9vr3U,Beasty,Hoodie SZN,A Boogie Wit da Hoodie,0.2920,0.860,152829.0,0.418,0.000000,7.0,0.106,-9.061,0.0,0.1580,126.023,4.0,0.3740,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,59,True,all that we know is the logos and hb \n the gl...,all that we know is the logos and hb the glock...
2,18yllZD0TdF7ykcREib8Z1,I Did It,Hoodie SZN,A Boogie Wit da Hoodie,0.1530,0.718,215305.0,0.454,0.000046,8.0,0.116,-9.012,1.0,0.1270,89.483,4.0,0.1960,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,57,True,yeah i cant trust these bitches \n i dont got ...,yeah i cant trust these bitches i dont got no ...
3,1wJRveJZLSb1rjhnUHQiv6,Swervin,Hoodie SZN,A Boogie Wit da Hoodie,0.0153,0.581,189487.0,0.662,0.000000,9.0,0.111,-5.239,1.0,0.3030,93.023,4.0,0.4340,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,83,True,uh \n yeah \n oh thats london on da track run ...,uh yeah oh thats london on da track run that s...
4,0jAfdqv18goRTUxm3ilRjb,Startender,Hoodie SZN,A Boogie Wit da Hoodie,0.0235,0.736,192779.0,0.622,0.000000,6.0,0.151,-4.653,0.0,0.1330,191.971,4.0,0.5060,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,71,True,yeah shawty got ass she just got a tummy tuck ...,yeah shawty got ass she just got a tummy tuck ...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
339840,0YUgXm3g2TozRf8aAJSs7w,More Than You Know,Warm And Willing,Andy Williams,0.8350,0.232,206507.0,0.231,0.000211,4.0,0.224,-13.919,0.0,0.0334,114.318,4.0,0.0901,5a5tJ1todBkvozHochVs00,1962-01-01,5,False,whether you are here or yonder \n whether you ...,whether you are here or yonder whether you are...
339841,3VWyVAx1aIJwzt1Q7eOeqf,Love Is Here to Stay,Warm And Willing,Andy Williams,0.7970,0.265,163160.0,0.146,0.000002,5.0,0.459,-14.390,1.0,0.0336,108.870,4.0,0.1500,5a5tJ1todBkvozHochVs00,1962-01-01,5,False,its very clear our love is here to stay \n not...,its very clear our love is here to stay not fo...
339842,26opStezaJhoYXFcGSnruX,Warm and Willing,Warm And Willing,Andy Williams,0.8100,0.184,171973.0,0.150,0.000467,0.0,0.138,-15.416,1.0,0.0368,81.187,4.0,0.1280,5a5tJ1todBkvozHochVs00,1962-01-01,4,False,love is for the warm and willing \n waiting fo...,love is for the warm and willing waiting for t...
339843,2AQMj428TntnVZW9ngkxUG,St. Louis Blues,The Dave Brubeck Quartet At Carnegie Hall,The Dave Brubeck Quartet,0.5760,0.504,719267.0,0.611,0.070000,8.0,0.644,-12.726,1.0,0.0676,106.063,4.0,0.7530,4My0KPjdtzCUfFjToOCiPh,1963-01-01,13,False,أنا شعري غامق بس قلبي مطقطق ابيض من زمان جايز ...,أنا شعري غامق بس قلبي مطقطق ابيض من زمان جايز ...


In [0]:
spotify_df = spotify_df[spotify_df["lyrics"].notnull()]
spotify_df = spotify_df.reset_index(drop=True)
spotify_df

Unnamed: 0,id,song,album,artist,acousticness,danceability,duration_ms,energy,instrumentalness,key,liveness,loudness,mode,speechiness,tempo,time_signature,valence,album_id,date,popularity,explicit,chain_lyrics,lyrics
0,0Veyvc3n9AcLSoK3r1dA12,Voices In My Head,Hoodie SZN,A Boogie Wit da Hoodie,0.0555,0.754,142301.0,0.663,0.000000,6.0,0.101,-6.311,0.0,0.4270,90.195,4.0,0.2070,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,58,True,monstas gon tear it up all she ever wanted was...,monstas gon tear it up all she ever wanted was...
1,77JzXZonNumWsuXKy9vr3U,Beasty,Hoodie SZN,A Boogie Wit da Hoodie,0.2920,0.860,152829.0,0.418,0.000000,7.0,0.106,-9.061,0.0,0.1580,126.023,4.0,0.3740,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,59,True,all that we know is the logos and hb \n the gl...,all that we know is the logos and hb the glock...
2,18yllZD0TdF7ykcREib8Z1,I Did It,Hoodie SZN,A Boogie Wit da Hoodie,0.1530,0.718,215305.0,0.454,0.000046,8.0,0.116,-9.012,1.0,0.1270,89.483,4.0,0.1960,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,57,True,yeah i cant trust these bitches \n i dont got ...,yeah i cant trust these bitches i dont got no ...
3,1wJRveJZLSb1rjhnUHQiv6,Swervin,Hoodie SZN,A Boogie Wit da Hoodie,0.0153,0.581,189487.0,0.662,0.000000,9.0,0.111,-5.239,1.0,0.3030,93.023,4.0,0.4340,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,83,True,uh \n yeah \n oh thats london on da track run ...,uh yeah oh thats london on da track run that s...
4,0jAfdqv18goRTUxm3ilRjb,Startender,Hoodie SZN,A Boogie Wit da Hoodie,0.0235,0.736,192779.0,0.622,0.000000,6.0,0.151,-4.653,0.0,0.1330,191.971,4.0,0.5060,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,71,True,yeah shawty got ass she just got a tummy tuck ...,yeah shawty got ass she just got a tummy tuck ...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
144629,0YUgXm3g2TozRf8aAJSs7w,More Than You Know,Warm And Willing,Andy Williams,0.8350,0.232,206507.0,0.231,0.000211,4.0,0.224,-13.919,0.0,0.0334,114.318,4.0,0.0901,5a5tJ1todBkvozHochVs00,1962-01-01,5,False,whether you are here or yonder \n whether you ...,whether you are here or yonder whether you are...
144630,3VWyVAx1aIJwzt1Q7eOeqf,Love Is Here to Stay,Warm And Willing,Andy Williams,0.7970,0.265,163160.0,0.146,0.000002,5.0,0.459,-14.390,1.0,0.0336,108.870,4.0,0.1500,5a5tJ1todBkvozHochVs00,1962-01-01,5,False,its very clear our love is here to stay \n not...,its very clear our love is here to stay not fo...
144631,26opStezaJhoYXFcGSnruX,Warm and Willing,Warm And Willing,Andy Williams,0.8100,0.184,171973.0,0.150,0.000467,0.0,0.138,-15.416,1.0,0.0368,81.187,4.0,0.1280,5a5tJ1todBkvozHochVs00,1962-01-01,4,False,love is for the warm and willing \n waiting fo...,love is for the warm and willing waiting for t...
144632,2AQMj428TntnVZW9ngkxUG,St. Louis Blues,The Dave Brubeck Quartet At Carnegie Hall,The Dave Brubeck Quartet,0.5760,0.504,719267.0,0.611,0.070000,8.0,0.644,-12.726,1.0,0.0676,106.063,4.0,0.7530,4My0KPjdtzCUfFjToOCiPh,1963-01-01,13,False,أنا شعري غامق بس قلبي مطقطق ابيض من زمان جايز ...,أنا شعري غامق بس قلبي مطقطق ابيض من زمان جايز ...
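
The filter-and-renumber step above can also be written with `dropna`, and the share of rows that survive is a quick sanity check on the scraping. This is a minimal sketch on a toy frame; the column values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "song": ["a", "b", "c", "d"],
    "lyrics": ["la la", None, "na na", None],
})

# Keep rows that have lyrics, then renumber them from 0
kept = df.dropna(subset=["lyrics"]).reset_index(drop=True)

# Share of rows that survived the scraping step
coverage = len(kept) / len(df)
```

In the outputs above, roughly 145,000 of about 340,000 rows keep their lyrics, so the same ratio on the real data would come out a little above 0.4.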


In [0]:
# Saving my progress
spotify_df.to_csv("/content/drive/My Drive/spotify_lyrics.csv", index=False)

# Sentiment Analysis from Google Cloud Natural Language API

In [0]:
spotify_df = pd.read_csv("/content/drive/My Drive/spotify_lyrics.csv")
spotify_df.head()

In [0]:
spotify_partition = spotify_df.iloc[:50000,:]
spotify_partition.head()

Unnamed: 0,id,song,album,artist,acousticness,danceability,duration_ms,energy,instrumentalness,key,liveness,loudness,mode,speechiness,tempo,time_signature,valence,album_id,date,popularity,explicit,chain_lyrics,lyrics
0,0Veyvc3n9AcLSoK3r1dA12,Voices In My Head,Hoodie SZN,A Boogie Wit da Hoodie,0.0555,0.754,142301.0,0.663,0.0,6.0,0.101,-6.311,0.0,0.427,90.195,4.0,0.207,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,58,True,monstas gon tear it up all she ever wanted was...,monstas gon tear it up all she ever wanted was...
1,77JzXZonNumWsuXKy9vr3U,Beasty,Hoodie SZN,A Boogie Wit da Hoodie,0.292,0.86,152829.0,0.418,0.0,7.0,0.106,-9.061,0.0,0.158,126.023,4.0,0.374,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,59,True,all that we know is the logos and hb \n the gl...,all that we know is the logos and hb the glock...
2,18yllZD0TdF7ykcREib8Z1,I Did It,Hoodie SZN,A Boogie Wit da Hoodie,0.153,0.718,215305.0,0.454,4.6e-05,8.0,0.116,-9.012,1.0,0.127,89.483,4.0,0.196,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,57,True,yeah i cant trust these bitches \n i dont got ...,yeah i cant trust these bitches i dont got no ...
3,1wJRveJZLSb1rjhnUHQiv6,Swervin,Hoodie SZN,A Boogie Wit da Hoodie,0.0153,0.581,189487.0,0.662,0.0,9.0,0.111,-5.239,1.0,0.303,93.023,4.0,0.434,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,83,True,uh \n yeah \n oh thats london on da track run ...,uh yeah oh thats london on da track run that s...
4,0jAfdqv18goRTUxm3ilRjb,Startender,Hoodie SZN,A Boogie Wit da Hoodie,0.0235,0.736,192779.0,0.622,0.0,6.0,0.151,-4.653,0.0,0.133,191.971,4.0,0.506,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,71,True,yeah shawty got ass she just got a tummy tuck ...,yeah shawty got ass she just got a tummy tuck ...


In [0]:
# Grabbing sentiment analysis of the lyrics from Google Cloud Natural Language API.
# The rate limit is 600 requests per minute, which is 10 requests per second
from googleapiclient.discovery import build
lservice = build('language', 'v1beta1', developerKey=APIKEY)

def sentiment(df):
  # Returns one dict per lyric; an empty dict marks a failed request
  sentiments = []
  for lyric in df["lyrics"]:
    # Named scores so it does not shadow the function name
    scores = {}

    try:
      response = lservice.documents().analyzeSentiment(
        body={
          'document': {
            'type': 'PLAIN_TEXT',
            'content': lyric
          }
        }).execute()
      scores["polarity"] = response['documentSentiment']['polarity']
      scores["magnitude"] = response['documentSentiment']['magnitude']
      sentiments.append(scores)

    except Exception:
      sentiments.append({})

    time.sleep(0.075)
  return sentiments
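
A limit of 600 requests per minute means calls should start at least 60 / 600 = 0.1 seconds apart. The fixed 0.075 s pause above stays under that only as long as each request itself takes at least 0.025 s. The following is a minimal sketch of a throttle that measures elapsed time instead; `throttled_map` and its parameters are hypothetical names, not part of any Google client library.

```python
import time


def throttled_map(func, items, max_per_minute=600,
                  clock=time.monotonic, sleep=time.sleep):
    """Apply func to each item, starting calls at least
    60 / max_per_minute seconds apart."""
    min_interval = 60.0 / max_per_minute
    results = []
    last_start = None
    for item in items:
        if last_start is not None:
            # Sleep only for the part of the interval not yet elapsed
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        try:
            results.append(func(item))
        except Exception:
            # Keep a placeholder so results stay aligned with the input rows
            results.append({})
    return results
```

The `clock` and `sleep` arguments exist only so the timing logic can be exercised without real waiting; in normal use the defaults are enough, for example `throttled_map(analyze_one, df["lyrics"])` with `analyze_one` being whatever function sends a single request.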

In [0]:
sentiments1 = pd.json_normalize(sentiment(spotify_partition))
sentiments1

Unnamed: 0,polarity,magnitude
0,-1.0,1.4
1,0.0,1.5
2,-1.0,1.9
3,0.5,1.4
4,-1.0,2.5
...,...,...
49995,1.0,0.1
49996,1.0,0.4
49997,1.0,0.8
49998,-1.0,0.5


In [0]:
spotify_partition2 = spotify_df.iloc[50000:100000,:]
spotify_partition2.head()

Unnamed: 0,id,song,album,artist,acousticness,danceability,duration_ms,energy,instrumentalness,key,liveness,loudness,mode,speechiness,tempo,time_signature,valence,album_id,date,popularity,explicit,chain_lyrics,lyrics
50000,3o2PfltEMN9yrK0O7BTZia,An Easy Life,Meyrin Fields (EP),Broken Bells,0.144,0.585,174787.0,0.79,4e-06,3.0,0.304,-4.428,1.0,0.0268,76.007,4.0,0.723,74Duvq6GksnldghnjwYRdB,2011-03-29,35,False,so tell me where it hurts \n do those stunts p...,so tell me where it hurts do those stunts prov...
50001,4rq5wYypPiZMTl0mB3fGah,Heartless Empire,Meyrin Fields (EP),Broken Bells,0.00102,0.509,164680.0,0.588,0.513,8.0,0.188,-4.216,0.0,0.037,124.524,3.0,0.603,74Duvq6GksnldghnjwYRdB,2011-03-29,32,False,and what you found was gold \n as black as dri...,and what you found was gold as black as dried ...
50002,5x1B2VxlxDm0HMJbQN9Vzw,What This World Needs,The Altar And The Door,Casting Crowns,0.00328,0.526,281480.0,0.81,1e-05,9.0,0.0692,-5.256,0.0,0.0339,140.039,4.0,0.48,3YNvCS8uVgcJVMfk2Ad8EL,2007-08-28,29,False,what this world needs is not another one hit w...,what this world needs is not another one hit w...
50003,7gO6lB3vqU96BPwhvVCJG5,Every Man,The Altar And The Door,Casting Crowns,0.00347,0.472,286520.0,0.565,0.0,0.0,0.243,-6.137,1.0,0.0264,144.051,4.0,0.28,3YNvCS8uVgcJVMfk2Ad8EL,2007-08-28,39,False,im the man with all ive ever wanted \n all the...,im the man with all ive ever wanted all the to...
50004,3GkIRSRXThXbSfQTKcMApx,Slow Fade,The Altar And The Door,Casting Crowns,0.0492,0.462,277293.0,0.488,0.0,1.0,0.121,-7.181,1.0,0.026,78.015,4.0,0.133,3YNvCS8uVgcJVMfk2Ad8EL,2007-08-28,44,False,be careful little eyes what you see \n its the...,be careful little eyes what you see its the se...


In [0]:
sentiments2 = pd.json_normalize(sentiment(spotify_partition2))
sentiments2

Unnamed: 0,polarity,magnitude
0,-1.0,0.8
1,-1.0,0.4
2,1.0,1.1
3,0.1,1.0
4,0.1,1.4
...,...,...
49995,1.0,0.2
49996,1.0,1.1
49997,-1.0,0.4
49998,1.0,0.8


In [0]:
sentiments_df = pd.concat([sentiments1, sentiments2], ignore_index=True)

In [0]:
sentiments_df.to_csv("/content/drive/My Drive/sentiments.csv", index=False)

In [0]:
spotify_partition3 = spotify_df.iloc[100000:,:]

In [0]:
sentiments3 = pd.json_normalize(sentiment(spotify_partition3))

In [0]:
sentiments_df = pd.concat([sentiments_df, sentiments3], ignore_index=True)
sentiments_df

Unnamed: 0,polarity,magnitude
0,-1.0,1.4
1,0.0,1.5
2,-1.0,1.9
3,0.5,1.4
4,-1.0,2.5
...,...,...
144629,1.0,0.1
144630,1.0,0.8
144631,1.0,0.5
144632,,
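
The partition, query, concatenate and save pattern used above can be wrapped in one loop so that a crash loses at most one chunk of work. This is a minimal sketch under that assumption; `process_in_chunks`, `chunk_size` and `checkpoint_path` are hypothetical names, and `func` stands in for a function like `sentiment` that returns one dict per row.

```python
import pandas as pd


def process_in_chunks(df, func, chunk_size, checkpoint_path=None):
    """Run func on consecutive row slices and concatenate the results."""
    parts = []
    for start in range(0, len(df), chunk_size):
        chunk = df.iloc[start:start + chunk_size]
        # func returns a list of dicts, one per row of the chunk
        parts.append(pd.json_normalize(func(chunk)))
        if checkpoint_path is not None:
            # Overwrite the checkpoint with everything gathered so far
            pd.concat(parts, ignore_index=True).to_csv(
                checkpoint_path, index=False)
    if not parts:
        return pd.DataFrame()
    return pd.concat(parts, ignore_index=True)


toy = pd.DataFrame({"lyrics": ["a", "bb", "ccc", "dddd", "eeeee"]})


def fake_scores(chunk):
    # Stand-in for an API call: one dict per lyric
    return [{"length": len(text)} for text in chunk["lyrics"]]


result = process_in_chunks(toy, fake_scores, chunk_size=2)
```

Because the result is built with `ignore_index=True`, its index runs from 0 in the same order as the input rows, which is what the later merge by index relies on.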


In [0]:
spotify_df = spotify_df.merge(sentiments_df, left_index=True, right_index=True,
                              how="outer")
spotify_df

Unnamed: 0,id,song,album,artist,acousticness,danceability,duration_ms,energy,instrumentalness,key,liveness,loudness,mode,speechiness,tempo,time_signature,valence,album_id,date,popularity,explicit,chain_lyrics,lyrics,polarity,magnitude
0,0Veyvc3n9AcLSoK3r1dA12,Voices In My Head,Hoodie SZN,A Boogie Wit da Hoodie,0.0555,0.754,142301.0,0.663,0.000000,6.0,0.101,-6.311,0.0,0.4270,90.195,4.0,0.2070,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,58,True,monstas gon tear it up all she ever wanted was...,monstas gon tear it up all she ever wanted was...,-1.0,1.4
1,77JzXZonNumWsuXKy9vr3U,Beasty,Hoodie SZN,A Boogie Wit da Hoodie,0.2920,0.860,152829.0,0.418,0.000000,7.0,0.106,-9.061,0.0,0.1580,126.023,4.0,0.3740,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,59,True,all that we know is the logos and hb \n the gl...,all that we know is the logos and hb the glock...,0.0,1.5
2,18yllZD0TdF7ykcREib8Z1,I Did It,Hoodie SZN,A Boogie Wit da Hoodie,0.1530,0.718,215305.0,0.454,0.000046,8.0,0.116,-9.012,1.0,0.1270,89.483,4.0,0.1960,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,57,True,yeah i cant trust these bitches \n i dont got ...,yeah i cant trust these bitches i dont got no ...,-1.0,1.9
3,1wJRveJZLSb1rjhnUHQiv6,Swervin,Hoodie SZN,A Boogie Wit da Hoodie,0.0153,0.581,189487.0,0.662,0.000000,9.0,0.111,-5.239,1.0,0.3030,93.023,4.0,0.4340,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,83,True,uh \n yeah \n oh thats london on da track run ...,uh yeah oh thats london on da track run that s...,0.5,1.4
4,0jAfdqv18goRTUxm3ilRjb,Startender,Hoodie SZN,A Boogie Wit da Hoodie,0.0235,0.736,192779.0,0.622,0.000000,6.0,0.151,-4.653,0.0,0.1330,191.971,4.0,0.5060,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,71,True,yeah shawty got ass she just got a tummy tuck ...,yeah shawty got ass she just got a tummy tuck ...,-1.0,2.5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
144629,0YUgXm3g2TozRf8aAJSs7w,More Than You Know,Warm And Willing,Andy Williams,0.8350,0.232,206507.0,0.231,0.000211,4.0,0.224,-13.919,0.0,0.0334,114.318,4.0,0.0901,5a5tJ1todBkvozHochVs00,1962-01-01,5,False,whether you are here or yonder \n whether you ...,whether you are here or yonder whether you are...,1.0,0.1
144630,3VWyVAx1aIJwzt1Q7eOeqf,Love Is Here to Stay,Warm And Willing,Andy Williams,0.7970,0.265,163160.0,0.146,0.000002,5.0,0.459,-14.390,1.0,0.0336,108.870,4.0,0.1500,5a5tJ1todBkvozHochVs00,1962-01-01,5,False,its very clear our love is here to stay \n not...,its very clear our love is here to stay not fo...,1.0,0.8
144631,26opStezaJhoYXFcGSnruX,Warm and Willing,Warm And Willing,Andy Williams,0.8100,0.184,171973.0,0.150,0.000467,0.0,0.138,-15.416,1.0,0.0368,81.187,4.0,0.1280,5a5tJ1todBkvozHochVs00,1962-01-01,4,False,love is for the warm and willing \n waiting fo...,love is for the warm and willing waiting for t...,1.0,0.5
144632,2AQMj428TntnVZW9ngkxUG,St. Louis Blues,The Dave Brubeck Quartet At Carnegie Hall,The Dave Brubeck Quartet,0.5760,0.504,719267.0,0.611,0.070000,8.0,0.644,-12.726,1.0,0.0676,106.063,4.0,0.7530,4My0KPjdtzCUfFjToOCiPh,1963-01-01,13,False,أنا شعري غامق بس قلبي مطقطق ابيض من زمان جايز ...,أنا شعري غامق بس قلبي مطقطق ابيض من زمان جايز ...,,


In [0]:
# I drop whatever rows I didn't get sentiment analysis for
spotify_df = spotify_df[spotify_df["polarity"].notnull()]
spotify_df = spotify_df.reset_index(drop=True)
spotify_df

Unnamed: 0,id,song,album,artist,acousticness,danceability,duration_ms,energy,instrumentalness,key,liveness,loudness,mode,speechiness,tempo,time_signature,valence,album_id,date,popularity,explicit,chain_lyrics,lyrics,polarity,magnitude
0,0Veyvc3n9AcLSoK3r1dA12,Voices In My Head,Hoodie SZN,A Boogie Wit da Hoodie,0.0555,0.754,142301.0,0.663,0.000000,6.0,0.101,-6.311,0.0,0.4270,90.195,4.0,0.2070,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,58,True,monstas gon tear it up all she ever wanted was...,monstas gon tear it up all she ever wanted was...,-1.0,1.4
1,77JzXZonNumWsuXKy9vr3U,Beasty,Hoodie SZN,A Boogie Wit da Hoodie,0.2920,0.860,152829.0,0.418,0.000000,7.0,0.106,-9.061,0.0,0.1580,126.023,4.0,0.3740,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,59,True,all that we know is the logos and hb \n the gl...,all that we know is the logos and hb the glock...,0.0,1.5
2,18yllZD0TdF7ykcREib8Z1,I Did It,Hoodie SZN,A Boogie Wit da Hoodie,0.1530,0.718,215305.0,0.454,0.000046,8.0,0.116,-9.012,1.0,0.1270,89.483,4.0,0.1960,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,57,True,yeah i cant trust these bitches \n i dont got ...,yeah i cant trust these bitches i dont got no ...,-1.0,1.9
3,1wJRveJZLSb1rjhnUHQiv6,Swervin,Hoodie SZN,A Boogie Wit da Hoodie,0.0153,0.581,189487.0,0.662,0.000000,9.0,0.111,-5.239,1.0,0.3030,93.023,4.0,0.4340,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,83,True,uh \n yeah \n oh thats london on da track run ...,uh yeah oh thats london on da track run that s...,0.5,1.4
4,0jAfdqv18goRTUxm3ilRjb,Startender,Hoodie SZN,A Boogie Wit da Hoodie,0.0235,0.736,192779.0,0.622,0.000000,6.0,0.151,-4.653,0.0,0.1330,191.971,4.0,0.5060,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,71,True,yeah shawty got ass she just got a tummy tuck ...,yeah shawty got ass she just got a tummy tuck ...,-1.0,2.5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
144367,2DJpoS75CjnHOTXSBPKBTn,My One and Only Love,Warm And Willing,Andy Williams,0.6970,0.139,229173.0,0.269,0.000225,11.0,0.255,-14.119,0.0,0.0327,77.814,4.0,0.1590,5a5tJ1todBkvozHochVs00,1962-01-01,5,False,the very thought of you makes my heart sing \n...,the very thought of you makes my heart sing li...,1.0,0.9
144368,0YUgXm3g2TozRf8aAJSs7w,More Than You Know,Warm And Willing,Andy Williams,0.8350,0.232,206507.0,0.231,0.000211,4.0,0.224,-13.919,0.0,0.0334,114.318,4.0,0.0901,5a5tJ1todBkvozHochVs00,1962-01-01,5,False,whether you are here or yonder \n whether you ...,whether you are here or yonder whether you are...,1.0,0.1
144369,3VWyVAx1aIJwzt1Q7eOeqf,Love Is Here to Stay,Warm And Willing,Andy Williams,0.7970,0.265,163160.0,0.146,0.000002,5.0,0.459,-14.390,1.0,0.0336,108.870,4.0,0.1500,5a5tJ1todBkvozHochVs00,1962-01-01,5,False,its very clear our love is here to stay \n not...,its very clear our love is here to stay not fo...,1.0,0.8
144370,26opStezaJhoYXFcGSnruX,Warm and Willing,Warm And Willing,Andy Williams,0.8100,0.184,171973.0,0.150,0.000467,0.0,0.138,-15.416,1.0,0.0368,81.187,4.0,0.1280,5a5tJ1todBkvozHochVs00,1962-01-01,4,False,love is for the warm and willing \n waiting fo...,love is for the warm and willing waiting for t...,1.0,0.5


In [0]:
# Saving progress
spotify_df.to_csv("/content/drive/My Drive/spotify_sentiment.csv", index=False)

# Personality Insights from IBM Watson Cloud API

In [0]:
spotify_df = pd.read_csv("/content/drive/My Drive/spotify_sentiment.csv")
spotify_df

Unnamed: 0,id,song,album,artist,acousticness,danceability,duration_ms,energy,instrumentalness,key,liveness,loudness,mode,speechiness,tempo,time_signature,valence,album_id,date,popularity,explicit,chain_lyrics,lyrics,polarity,magnitude
0,0Veyvc3n9AcLSoK3r1dA12,Voices In My Head,Hoodie SZN,A Boogie Wit da Hoodie,0.0555,0.754,142301.0,0.663,0.000000,6.0,0.101,-6.311,0.0,0.4270,90.195,4.0,0.2070,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,58,True,monstas gon tear it up all she ever wanted was...,monstas gon tear it up all she ever wanted was...,-1.0,1.4
1,77JzXZonNumWsuXKy9vr3U,Beasty,Hoodie SZN,A Boogie Wit da Hoodie,0.2920,0.860,152829.0,0.418,0.000000,7.0,0.106,-9.061,0.0,0.1580,126.023,4.0,0.3740,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,59,True,all that we know is the logos and hb \n the gl...,all that we know is the logos and hb the glock...,0.0,1.5
2,18yllZD0TdF7ykcREib8Z1,I Did It,Hoodie SZN,A Boogie Wit da Hoodie,0.1530,0.718,215305.0,0.454,0.000046,8.0,0.116,-9.012,1.0,0.1270,89.483,4.0,0.1960,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,57,True,yeah i cant trust these bitches \n i dont got ...,yeah i cant trust these bitches i dont got no ...,-1.0,1.9
3,1wJRveJZLSb1rjhnUHQiv6,Swervin,Hoodie SZN,A Boogie Wit da Hoodie,0.0153,0.581,189487.0,0.662,0.000000,9.0,0.111,-5.239,1.0,0.3030,93.023,4.0,0.4340,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,83,True,uh \n yeah \n oh thats london on da track run ...,uh yeah oh thats london on da track run that s...,0.5,1.4
4,0jAfdqv18goRTUxm3ilRjb,Startender,Hoodie SZN,A Boogie Wit da Hoodie,0.0235,0.736,192779.0,0.622,0.000000,6.0,0.151,-4.653,0.0,0.1330,191.971,4.0,0.5060,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,71,True,yeah shawty got ass she just got a tummy tuck ...,yeah shawty got ass she just got a tummy tuck ...,-1.0,2.5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
144367,2DJpoS75CjnHOTXSBPKBTn,My One and Only Love,Warm And Willing,Andy Williams,0.6970,0.139,229173.0,0.269,0.000225,11.0,0.255,-14.119,0.0,0.0327,77.814,4.0,0.1590,5a5tJ1todBkvozHochVs00,1962-01-01,5,False,the very thought of you makes my heart sing \n...,the very thought of you makes my heart sing li...,1.0,0.9
144368,0YUgXm3g2TozRf8aAJSs7w,More Than You Know,Warm And Willing,Andy Williams,0.8350,0.232,206507.0,0.231,0.000211,4.0,0.224,-13.919,0.0,0.0334,114.318,4.0,0.0901,5a5tJ1todBkvozHochVs00,1962-01-01,5,False,whether you are here or yonder \n whether you ...,whether you are here or yonder whether you are...,1.0,0.1
144369,3VWyVAx1aIJwzt1Q7eOeqf,Love Is Here to Stay,Warm And Willing,Andy Williams,0.7970,0.265,163160.0,0.146,0.000002,5.0,0.459,-14.390,1.0,0.0336,108.870,4.0,0.1500,5a5tJ1todBkvozHochVs00,1962-01-01,5,False,its very clear our love is here to stay \n not...,its very clear our love is here to stay not fo...,1.0,0.8
144370,26opStezaJhoYXFcGSnruX,Warm and Willing,Warm And Willing,Andy Williams,0.8100,0.184,171973.0,0.150,0.000467,0.0,0.138,-15.416,1.0,0.0368,81.187,4.0,0.1280,5a5tJ1todBkvozHochVs00,1962-01-01,4,False,love is for the warm and willing \n waiting fo...,love is for the warm and willing waiting for t...,1.0,0.5


In [0]:
%pip install --upgrade "ibm-watson>=4.3.0"

In [0]:
from ibm_watson import PersonalityInsightsV3
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Authenticator object. APIKEY must hold the IBM Cloud API key and has to be
# defined before this cell runs.
authenticator = IAMAuthenticator(APIKEY)
personality_insights = PersonalityInsightsV3(
    version='2017-10-13',
    authenticator=authenticator
)

# Service URL of the Personality Insights instance being queried
personality_insights.set_service_url(
    "https://api.eu-gb.personality-insights.watson.cloud.ibm.com"
    "/instances/168b15d9-dd14-4832-bf98-0b304c149a6c")

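**The cell above expects `APIKEY` to exist already. A minimal sketch of one way to supply it without pasting the key into the notebook, assuming it is stored in an environment variable. The variable name `IBM_WATSON_APIKEY` and the helper `load_api_key` are made up for this example.**

```python
import os

def load_api_key(env_var="IBM_WATSON_APIKEY"):
    """Return the API key stored in the given environment variable.

    The variable name is a hypothetical choice for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key

# Example usage (commented out so the sketch runs without a real key):
# APIKEY = load_api_key()
```
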
In [0]:
# Grabbing the 17 chosen personality traits for each lyric
from ibm_watson import ApiException

def personality_q(df):
  personality = []

  for lyric in df["lyrics"]:
    traits = {}

    try:
      # The lyrics are plain text, so they are sent as text/plain
      profile = personality_insights.profile(
          lyric,
          accept='application/json',
          content_type='text/plain',
      ).get_result()["personality"]

      traits["artistic"] = profile[0]["children"][1]["percentile"]
      traits["emotion"] = profile[0]["children"][2]["percentile"]
      traits["imagination"] = profile[0]["children"][3]["percentile"]
      traits["defiance"] = profile[0]["children"][5]["percentile"]

      traits["assertive"] = profile[2]["children"][1]["percentile"]
      traits["cheerful"] = profile[2]["children"][2]["percentile"]
      traits["outgoing"] = profile[2]["children"][4]["percentile"]
      traits["gregarious"] = profile[2]["children"][5]["percentile"]

      traits["modesty"] = profile[3]["children"][2]["percentile"]
      traits["stubborn"] = profile[3]["children"][3]["percentile"]
      traits["sympathy"] = profile[3]["children"][4]["percentile"]
      traits["trust"] = profile[3]["children"][5]["percentile"]

      traits["fiery"] = profile[4]["children"][0]["percentile"]
      traits["melancholy"] = profile[4]["children"][2]["percentile"]
      traits["immoderation"] = profile[4]["children"][3]["percentile"]
      traits["self-conscious"] = profile[4]["children"][4]["percentile"]
      traits["stress"] = profile[4]["children"][5]["percentile"]

      personality.append(traits)

    # Continue if an ApiException occurs by appending an empty dictionary,
    # which becomes a row of NaN. The API requires at least 100 words to run
    # an analysis, and not all of my lyrics contain that many.
    # Note: this also catches every other API error, not just short lyrics.
    except ApiException:
      personality.append({})

  return pd.json_normalize(personality)
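
**`personality_q` picks each trait by its position in the response, which works only while the order of the entries stays the same. A sketch of a lookup by name instead, assuming each entry carries a `name` field next to `percentile` and `children`. The `sample_profile` below is made-up data in that assumed layout, not a real API response, and `facet_percentiles` is a name invented for the example.**

```python
def facet_percentiles(profile, wanted):
    """Map facet name -> percentile for the facet names listed in `wanted`.

    `profile` is a list of trait dicts, each with a "children" list of
    facet dicts that carry "name" and "percentile" keys (assumed layout).
    """
    found = {}
    for trait in profile:
        for facet in trait.get("children", []):
            if facet.get("name") in wanted:
                found[facet["name"]] = facet["percentile"]
    return found

# Made-up data in the assumed layout, not a real API response.
sample_profile = [
    {"name": "Openness", "percentile": 0.9, "children": [
        {"name": "Adventurousness", "percentile": 0.4},
        {"name": "Artistic interests", "percentile": 0.98},
    ]},
    {"name": "Emotional range", "percentile": 0.5, "children": [
        {"name": "Melancholy", "percentile": 0.78},
    ]},
]

traits = facet_percentiles(sample_profile, {"Artistic interests", "Melancholy"})
print(traits)
```

**A facet that is missing from the response is simply absent from the result, so it would end up as NaN after `pd.json_normalize`.**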


In [0]:
spotify_partition1 = spotify_df.iloc[:101,:]

In [0]:
personality1 = personality_q(spotify_partition1)
personality1

Unnamed: 0,artistic,emotion,imagination,defiance,assertive,cheerful,outgoing,gregarious,modesty,stubborn,sympathy,trust,fiery,melancholy,immoderation,self-conscious,stress
0,0.983429,0.269785,0.845684,0.833923,0.339471,0.814061,0.100284,0.099952,0.179199,0.670662,0.318997,0.001978,0.241852,0.780715,0.587846,0.719468,0.480640
1,0.951363,0.118000,0.993211,0.778178,0.651997,0.856554,0.322349,0.608428,0.010153,0.048880,0.084278,0.000658,0.848225,0.664055,0.801688,0.390560,0.443472
2,0.974373,0.406747,0.700611,0.815956,0.123845,0.771871,0.128306,0.112841,0.631363,0.718800,0.313936,0.001588,0.251038,0.798487,0.450389,0.736179,0.669987
3,0.945732,0.312331,0.866823,0.403949,0.815055,0.961068,0.834103,0.872156,0.008053,0.392424,0.068322,0.011624,0.523789,0.160812,0.532799,0.075188,0.109408
4,0.921820,0.017407,0.824431,0.521194,0.549106,0.949360,0.699198,0.819306,0.011715,0.257703,0.042916,0.000842,0.599420,0.223387,0.610249,0.098895,0.122751
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
96,0.930622,0.745394,0.551028,0.255336,0.488313,0.961293,0.545531,0.644906,0.654739,0.863120,0.439441,0.025029,0.121905,0.468695,0.380692,0.220633,0.303700
97,0.449456,0.920349,0.813274,0.081134,0.971131,0.974093,0.748655,0.888468,0.569916,0.210979,0.569377,0.005932,0.776126,0.814740,0.634973,0.161767,0.743967
98,0.904845,0.849844,0.973929,0.536265,0.674779,0.952063,0.654889,0.592064,0.351695,0.918772,0.281045,0.327625,0.173105,0.357412,0.577869,0.428962,0.237743
99,0.929041,0.771477,0.189124,0.146536,0.376799,0.994985,0.938108,0.982943,0.951342,0.887181,0.586570,0.363326,0.144397,0.137116,0.347495,0.047869,0.311686


In [0]:
spotify_partition2 = spotify_df.iloc[101:998,:]

In [0]:
personality2 = personality_q(spotify_partition2)
personality2

In [0]:
personality = pd.concat([personality1, personality2], ignore_index=True)
personality

Unnamed: 0,artistic,emotion,imagination,defiance,assertive,cheerful,outgoing,gregarious,modesty,stubborn,sympathy,trust,fiery,melancholy,immoderation,self-conscious,stress
0,0.983429,0.269785,0.845684,0.833923,0.339471,0.814061,0.100284,0.099952,0.179199,0.670662,0.318997,0.001978,0.241852,0.780715,0.587846,0.719468,0.480640
1,0.951363,0.118000,0.993211,0.778178,0.651997,0.856554,0.322349,0.608428,0.010153,0.048880,0.084278,0.000658,0.848225,0.664055,0.801688,0.390560,0.443472
2,0.974373,0.406747,0.700611,0.815956,0.123845,0.771871,0.128306,0.112841,0.631363,0.718800,0.313936,0.001588,0.251038,0.798487,0.450389,0.736179,0.669987
3,0.945732,0.312331,0.866823,0.403949,0.815055,0.961068,0.834103,0.872156,0.008053,0.392424,0.068322,0.011624,0.523789,0.160812,0.532799,0.075188,0.109408
4,0.921820,0.017407,0.824431,0.521194,0.549106,0.949360,0.699198,0.819306,0.011715,0.257703,0.042916,0.000842,0.599420,0.223387,0.610249,0.098895,0.122751
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
993,0.979987,0.313413,0.271619,0.733439,0.124827,0.884718,0.164004,0.106537,0.914077,0.965532,0.296567,0.077120,0.017420,0.288538,0.594996,0.667395,0.480486
994,0.999469,0.941313,0.989057,0.486872,0.352928,0.969229,0.545885,0.674081,0.575463,0.879740,0.603159,0.223791,0.085565,0.587142,0.590078,0.177969,0.354906
995,0.988905,0.937101,0.741063,0.018081,0.430030,0.994346,0.845208,0.794048,0.974057,0.860168,0.969754,0.340322,0.222987,0.449860,0.060687,0.222354,0.479499
996,0.482493,0.502376,0.009660,0.755368,0.576295,0.298204,0.514742,0.517921,0.807039,0.573953,0.304102,0.017901,0.157980,0.693738,0.941278,0.282183,0.474742


In [0]:
spotify_partition3 = spotify_df.iloc[998:1998,:]

In [0]:
personality3 = personality_q(spotify_partition3)

In [0]:
personality = pd.concat([personality, personality3], ignore_index=True)
personality

In [0]:
personality = pd.read_csv("/content/drive/My Drive/personality.csv")
# Drop the first column, which holds the row index written by an earlier save
personality = personality.iloc[:,1:]
personality

Unnamed: 0,artistic,emotion,imagination,defiance,assertive,cheerful,outgoing,gregarious,modesty,stubborn,sympathy,trust,fiery,melancholy,immoderation,self-conscious,stress
0,0.983429,0.269785,0.845684,0.833923,0.339471,0.814061,0.100284,0.099952,0.179199,0.670662,0.318997,0.001978,0.241852,0.780715,0.587846,0.719468,0.480640
1,0.951363,0.118000,0.993211,0.778178,0.651997,0.856554,0.322349,0.608428,0.010153,0.048880,0.084278,0.000658,0.848225,0.664055,0.801688,0.390560,0.443472
2,0.974373,0.406747,0.700611,0.815956,0.123845,0.771871,0.128306,0.112841,0.631363,0.718800,0.313936,0.001588,0.251038,0.798487,0.450389,0.736179,0.669987
3,0.945732,0.312331,0.866823,0.403949,0.815055,0.961068,0.834103,0.872156,0.008053,0.392424,0.068322,0.011624,0.523789,0.160812,0.532799,0.075188,0.109408
4,0.921820,0.017407,0.824431,0.521194,0.549106,0.949360,0.699198,0.819306,0.011715,0.257703,0.042916,0.000842,0.599420,0.223387,0.610249,0.098895,0.122751
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1993,0.966506,0.999384,0.833097,0.379966,0.014145,0.448045,0.009024,0.000601,0.999981,0.998282,0.990120,0.027116,0.575858,0.999507,0.639104,0.983790,0.995672
1994,0.999762,0.999935,0.797607,0.988838,0.014706,0.788363,0.284271,0.037100,0.990796,0.998213,0.998043,0.614099,0.005887,0.845420,0.444600,0.869222,0.872836
1995,0.947925,0.997601,0.660800,0.227337,0.105676,0.567652,0.269545,0.031402,0.996275,0.985964,0.999353,0.232827,0.378977,0.987134,0.376898,0.805171,0.958301
1996,0.890352,0.746394,0.298913,0.562965,0.569526,0.817964,0.886211,0.876781,0.812130,0.794370,0.719878,0.774189,0.079530,0.398646,0.867095,0.145551,0.251402
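
**Dropping the first column with `iloc[:,1:]` is only right when the CSV was written together with its index. A small sketch of the difference, using in-memory buffers instead of Drive and a made-up two-column frame:**

```python
import io
import pandas as pd

demo = pd.DataFrame({"artistic": [0.98, 0.95], "emotion": [0.27, 0.12]})

# Written WITH the index: the file gains an unnamed leading column.
with_index = io.StringIO()
demo.to_csv(with_index)
with_index.seek(0)
reloaded_a = pd.read_csv(with_index)
print(list(reloaded_a.columns))

# Written WITHOUT the index: every column in the file is real data.
without_index = io.StringIO()
demo.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_b = pd.read_csv(without_index)
print(list(reloaded_b.columns))
```

**If the file was last saved with `index=False`, slicing off the first column would remove `artistic` rather than an index column.**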


In [0]:
spotify_partition4 = spotify_df.iloc[1998:2998,:]
spotify_partition4

Unnamed: 0,id,song,album,artist,acousticness,danceability,duration_ms,energy,instrumentalness,key,liveness,loudness,mode,speechiness,tempo,time_signature,valence,album_id,date,popularity,explicit,chain_lyrics,lyrics,polarity,magnitude
1998,3d1qN7AHge6BXuVPutcpI5,Some Of It,Desperate Man,Eric Church,0.341000,0.560,195147.0,0.736,0.000010,0.0,0.1550,-7.306,1.0,0.0446,79.843,4.0,0.824,5TjDN2hfsgNWVtP8Ew56Xx,2018-10-05,67,False,beer dont keep \n loves not cheap \n trucks do...,beer dont keep loves not cheap trucks dont wre...,0.3,0.8
1999,55UCRHfezMCFfbGy0tfjTG,Monsters,Desperate Man,Eric Church,0.513000,0.649,200387.0,0.400,0.000001,6.0,0.0774,-10.202,1.0,0.0455,157.895,4.0,0.393,5TjDN2hfsgNWVtP8Ew56Xx,2018-10-05,67,False,i killed my first monster when i was seven yea...,i killed my first monster when i was seven yea...,-0.3,0.8
2000,1vHeu2j2MpfMqqvcXKLtic,Desperate Man,Desperate Man,Eric Church,0.071600,0.711,208733.0,0.796,0.004500,10.0,0.2620,-5.870,1.0,0.0341,109.991,4.0,0.901,5TjDN2hfsgNWVtP8Ew56Xx,2018-10-05,57,False,ive seen the joshua tree \n got down on my kne...,ive seen the joshua tree got down on my knees ...,-1.0,0.6
2001,7KNrJiivZDZuTNfZ7kDMdA,Jukebox And A Bar,Desperate Man,Eric Church,0.787000,0.614,192720.0,0.459,0.000000,3.0,0.0923,-9.871,1.0,0.0333,148.213,4.0,0.545,5TjDN2hfsgNWVtP8Ew56Xx,2018-10-05,52,False,verse \n i think were sorely lacking \n so im ...,verse i think were sorely lacking so im going ...,-1.0,0.8
2002,3uWC9e2W0qKahn2TtjLEH4,I Know Better,4275,Jacquees,0.043500,0.748,174253.0,0.627,0.000000,5.0,0.1140,-5.806,0.0,0.1910,135.109,4.0,0.565,03AdJ15pTDdmxry6qkKwlO,2018-09-07,42,True,mmhmm hmm \n hmm aw yeah \n what \n murphy kid...,mmhmm hmm hmm aw yeah what murphy kid hmm jacq...,-1.0,1.6
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2993,6NvFnHaFXUttOycsNjD0YM,Doomsday,Holy Hell,Architects,0.000026,0.523,248880.0,0.979,0.000026,2.0,0.1530,-4.214,1.0,0.1590,125.038,4.0,0.179,7xWphusyJNzmsFmfoSkchL,2018-11-09,50,False,remember when hell had frozen over \n the cold...,remember when hell had frozen over the cold st...,-1.0,1.0
2994,1yXFAwSMDZmX2ZyDLLyQ9s,Almost Love,Singular: Act I,Sabrina Carpenter,0.111000,0.810,212360.0,0.759,0.000005,9.0,0.0919,-3.029,0.0,0.0376,104.993,4.0,0.615,29mlGxS6kxq1EHxlX1EAZK,2018-11-09,57,False,the moment when i love you is right before you...,the moment when i love you is right before you...,1.0,0.9
2995,359HNzfOXhCMHB1pNKhyfH,Paris,Singular: Act I,Sabrina Carpenter,0.048200,0.576,218080.0,0.671,0.000000,2.0,0.1140,-6.408,0.0,0.0474,114.839,4.0,0.226,29mlGxS6kxq1EHxlX1EAZK,2018-11-09,62,False,if i ask that boy to jump i know he would \n h...,if i ask that boy to jump i know he would he d...,1.0,1.0
2996,72DnQlaqdNhz9QJZXfYe6L,Hold Tight,Singular: Act I,Sabrina Carpenter,0.126000,0.582,175160.0,0.383,0.000000,6.0,0.2500,-8.576,1.0,0.1950,135.535,4.0,0.388,29mlGxS6kxq1EHxlX1EAZK,2018-11-09,45,False,aye done everything and done it again \n not e...,aye done everything and done it again not ever...,-1.0,0.3


In [0]:
personality4 = personality_q(spotify_partition4)
personality4

In [0]:
personality = pd.concat([personality, personality4], ignore_index=True)
personality

Unnamed: 0,artistic,emotion,imagination,defiance,assertive,cheerful,outgoing,gregarious,modesty,stubborn,sympathy,trust,fiery,melancholy,immoderation,self-conscious,stress
0,0.983429,0.269785,0.845684,0.833923,0.339471,0.814061,0.100284,0.099952,0.179199,0.670662,0.318997,0.001978,0.241852,0.780715,0.587846,0.719468,0.480640
1,0.951363,0.118000,0.993211,0.778178,0.651997,0.856554,0.322349,0.608428,0.010153,0.048880,0.084278,0.000658,0.848225,0.664055,0.801688,0.390560,0.443472
2,0.974373,0.406747,0.700611,0.815956,0.123845,0.771871,0.128306,0.112841,0.631363,0.718800,0.313936,0.001588,0.251038,0.798487,0.450389,0.736179,0.669987
3,0.945732,0.312331,0.866823,0.403949,0.815055,0.961068,0.834103,0.872156,0.008053,0.392424,0.068322,0.011624,0.523789,0.160812,0.532799,0.075188,0.109408
4,0.921820,0.017407,0.824431,0.521194,0.549106,0.949360,0.699198,0.819306,0.011715,0.257703,0.042916,0.000842,0.599420,0.223387,0.610249,0.098895,0.122751
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2993,0.997964,0.449819,0.846236,0.734391,0.469236,0.321606,0.207580,0.102950,0.537790,0.878441,0.347346,0.143946,0.007152,0.408801,0.276723,0.347574,0.144480
2994,0.989646,0.985677,0.414052,0.787762,0.006539,0.929619,0.114198,0.201865,0.998245,0.965078,0.686818,0.342040,0.018548,0.570639,0.607553,0.833457,0.876602
2995,0.974077,0.883346,0.404483,0.900081,0.029580,0.234704,0.023702,0.016020,0.984028,0.879654,0.695278,0.070828,0.161238,0.653737,0.389519,0.930596,0.948669
2996,0.997381,0.769346,0.966203,0.672787,0.369988,0.848863,0.185425,0.150896,0.470260,0.961363,0.395533,0.023462,0.215450,0.776369,0.539758,0.592168,0.467035


In [0]:
spotify_partition5 = spotify_df.iloc[2998:4998,:]

In [0]:
personality5 = personality_q(spotify_partition5)
personality5

In [0]:
personality = pd.concat([personality, personality5], ignore_index=True)
personality

Unnamed: 0,artistic,emotion,imagination,defiance,assertive,cheerful,outgoing,gregarious,modesty,stubborn,sympathy,trust,fiery,melancholy,immoderation,self-conscious,stress
0,0.983429,0.269785,0.845684,0.833923,0.339471,0.814061,0.100284,0.099952,0.179199,0.670662,0.318997,0.001978,0.241852,0.780715,0.587846,0.719468,0.480640
1,0.951363,0.118000,0.993211,0.778178,0.651997,0.856554,0.322349,0.608428,0.010153,0.048880,0.084278,0.000658,0.848225,0.664055,0.801688,0.390560,0.443472
2,0.974373,0.406747,0.700611,0.815956,0.123845,0.771871,0.128306,0.112841,0.631363,0.718800,0.313936,0.001588,0.251038,0.798487,0.450389,0.736179,0.669987
3,0.945732,0.312331,0.866823,0.403949,0.815055,0.961068,0.834103,0.872156,0.008053,0.392424,0.068322,0.011624,0.523789,0.160812,0.532799,0.075188,0.109408
4,0.921820,0.017407,0.824431,0.521194,0.549106,0.949360,0.699198,0.819306,0.011715,0.257703,0.042916,0.000842,0.599420,0.223387,0.610249,0.098895,0.122751
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4993,0.842012,0.596860,0.932071,0.182312,0.947798,0.899038,0.863564,0.638427,0.359392,0.822904,0.781093,0.528186,0.045856,0.360186,0.116777,0.083433,0.096796
4994,0.794762,0.968778,0.611698,0.624759,0.788542,0.314244,0.435134,0.157406,0.715278,0.967492,0.996856,0.286909,0.169120,0.618066,0.672432,0.458051,0.210817
4995,0.991355,0.852482,0.999864,0.960685,0.091463,0.121304,0.006406,0.027287,0.843416,0.817498,0.930807,0.062729,0.145868,0.926686,0.910910,0.907188,0.423103
4996,0.993370,0.949072,0.871405,0.932102,0.479059,0.212418,0.218286,0.224716,0.827244,0.973078,0.906626,0.409671,0.126610,0.714141,0.522930,0.694173,0.593700


In [0]:
spotify_partition6 = spotify_df.iloc[4998:13998,:]

In [0]:
personality6 = personality_q(spotify_partition6)
personality6

In [0]:
personality = pd.concat([personality, personality6], ignore_index=True)
personality

Unnamed: 0,artistic,emotion,imagination,defiance,assertive,cheerful,outgoing,gregarious,modesty,stubborn,sympathy,trust,fiery,melancholy,immoderation,self-conscious,stress
0,0.983429,0.269785,0.845684,0.833923,0.339471,0.814061,0.100284,0.099952,0.179199,0.670662,0.318997,1.977780e-03,0.241852,0.780715,0.587846,0.719468,0.480640
1,0.951363,0.118000,0.993211,0.778178,0.651997,0.856554,0.322349,0.608428,0.010153,0.048880,0.084278,6.575984e-04,0.848225,0.664055,0.801688,0.390560,0.443472
2,0.974373,0.406747,0.700611,0.815956,0.123845,0.771871,0.128306,0.112841,0.631363,0.718800,0.313936,1.588150e-03,0.251038,0.798487,0.450389,0.736179,0.669987
3,0.945732,0.312331,0.866823,0.403949,0.815055,0.961068,0.834103,0.872156,0.008053,0.392424,0.068322,1.162439e-02,0.523789,0.160812,0.532799,0.075188,0.109408
4,0.921820,0.017407,0.824431,0.521194,0.549106,0.949360,0.699198,0.819306,0.011715,0.257703,0.042916,8.421960e-04,0.599420,0.223387,0.610249,0.098895,0.122751
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13993,0.566533,0.043130,0.883024,0.173050,0.846260,0.927410,0.484558,0.817444,0.014089,0.377795,0.075654,2.542144e-05,0.638546,0.488024,0.616067,0.218735,0.123458
13994,0.982963,0.107937,0.722990,0.900500,0.224686,0.964461,0.309792,0.186156,0.077652,0.694249,0.065979,4.917525e-03,0.271243,0.437571,0.384234,0.601002,0.200176
13995,0.977729,0.014555,0.940996,0.956487,0.773791,0.865004,0.537938,0.568854,0.000030,0.033977,0.004314,2.408742e-04,0.850932,0.282200,0.884098,0.385182,0.114279
13996,0.747540,0.044445,0.970553,0.425372,0.946561,0.967102,0.789942,0.981505,0.003324,0.016402,0.059636,1.660645e-04,0.949658,0.612355,0.946795,0.106488,0.306350


In [0]:
spotify_partition7 = spotify_df.iloc[13998:24000,:]

In [0]:
personality7 = personality_q(spotify_partition7)
personality7

In [0]:
personality = pd.concat([personality, personality7], ignore_index=True)

In [0]:
personality = personality.iloc[:16634,:]
personality

Unnamed: 0,artistic,emotion,imagination,defiance,assertive,cheerful,outgoing,gregarious,modesty,stubborn,sympathy,trust,fiery,melancholy,immoderation,self-conscious,stress
0,0.983429,0.269785,0.845684,0.833923,0.339471,0.814061,0.100284,0.099952,0.179199,0.670662,0.318997,0.001978,0.241852,0.780715,0.587846,0.719468,0.480640
1,0.951363,0.118000,0.993211,0.778178,0.651997,0.856554,0.322349,0.608428,0.010153,0.048880,0.084278,0.000658,0.848225,0.664055,0.801688,0.390560,0.443472
2,0.974373,0.406747,0.700611,0.815956,0.123845,0.771871,0.128306,0.112841,0.631363,0.718800,0.313936,0.001588,0.251038,0.798487,0.450389,0.736179,0.669987
3,0.945732,0.312331,0.866823,0.403949,0.815055,0.961068,0.834103,0.872156,0.008053,0.392424,0.068322,0.011624,0.523789,0.160812,0.532799,0.075188,0.109408
4,0.921820,0.017407,0.824431,0.521194,0.549106,0.949360,0.699198,0.819306,0.011715,0.257703,0.042916,0.000842,0.599420,0.223387,0.610249,0.098895,0.122751
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
16629,0.610572,0.497648,0.656574,0.842893,0.327532,0.330685,0.303034,0.401232,0.539034,0.334514,0.611507,0.242054,0.597247,0.808790,0.741312,0.739884,0.737564
16630,0.668308,0.458072,0.653067,0.843224,0.305030,0.385853,0.308448,0.461988,0.512044,0.309395,0.617887,0.376763,0.614601,0.796770,0.830509,0.727967,0.764861
16631,0.622659,0.528941,0.660484,0.853855,0.361785,0.300671,0.307022,0.394831,0.541859,0.341028,0.611174,0.365767,0.593277,0.807313,0.715226,0.741994,0.719494
16632,0.601383,0.542283,0.662385,0.853718,0.376679,0.307431,0.308198,0.407658,0.545101,0.335360,0.614443,0.364215,0.608143,0.810041,0.772645,0.730282,0.737626
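
**The cut-off of 16634 above is typed in by hand. Assuming the rows being dropped are the ones at the end where every call failed and `personality_q` appended an empty dictionary, the cut-off could be computed instead. A sketch with a made-up frame; `rows_before_trailing_gap` is a name invented for the example.**

```python
import numpy as np
import pandas as pd

def rows_before_trailing_gap(df):
    """Count rows up to and including the last row with any non-null value."""
    has_data = df.notna().any(axis=1).to_numpy()
    if not has_data.any():
        return 0
    return int(np.flatnonzero(has_data)[-1]) + 1

demo = pd.DataFrame({
    "artistic": [0.9, np.nan, 0.7, np.nan, np.nan],
    "emotion":  [0.2, np.nan, 0.4, np.nan, np.nan],
})

keep = rows_before_trailing_gap(demo)
trimmed = demo.iloc[:keep, :]
print(keep, len(trimmed))
```

**An isolated empty row in the middle (row 1 in the made-up frame) is kept on purpose, so the positions still line up with the rows of `spotify_df`.**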


In [0]:
spotify_partition8 = spotify_df.iloc[16634:26734,:]

Unnamed: 0,id,song,album,artist,acousticness,danceability,duration_ms,energy,instrumentalness,key,liveness,loudness,mode,speechiness,tempo,time_signature,valence,album_id,date,popularity,explicit,chain_lyrics,lyrics,polarity,magnitude
16634,0pfk4nldZlQbtDBi06Lupm,Snapchat,Energia,J Balvin,0.0450,0.921,220320.0,0.439,0.000016,1.0,0.1180,-7.208,1.0,0.0655,123.969,4.0,0.905,4cGc9Eeb3Gjff2Aq5ILLEf,2016-06-24,57,False,ella me tira por la noche \n me manda fotos po...,ella me tira por la noche me manda fotos por e...,-1.0,0.5
16635,0VnHW8YOpMzbob93AGljBM,Hola,Energia,J Balvin,0.1480,0.801,196240.0,0.836,0.000979,4.0,0.1290,-5.367,0.0,0.0884,100.998,4.0,0.607,4cGc9Eeb3Gjff2Aq5ILLEf,2016-06-24,47,False,si después de lo que a pasao \n piensas en mí ...,si después de lo que a pasao piensas en mí deb...,-1.0,0.2
16636,2C2TGgFzrTRIOdQS1vUN5h,Ginza,Energia,J Balvin,0.2080,0.730,171093.0,0.809,0.001300,5.0,0.0804,-6.406,0.0,0.0876,101.965,4.0,0.825,4cGc9Eeb3Gjff2Aq5ILLEf,2016-06-24,69,False,si necesita reggaetón dale \n sigue bailando m...,si necesita reggaetón dale sigue bailando mami...,-0.5,0.2
16637,5AxBtF5bpBeCem2Bq0Fwll,Solitario,Energia,J Balvin,0.0892,0.784,207720.0,0.812,0.011200,1.0,0.1060,-4.499,1.0,0.0588,100.006,4.0,0.704,4cGc9Eeb3Gjff2Aq5ILLEf,2016-06-24,47,False,solitario \n infinity soy un hombre solitarioo...,solitario infinity soy un hombre solitariooh y...,1.0,0.4
16638,18H4UlOn9dxEFkQ9kWBCVJ,35 Pa Las 12,Energia,J Balvin,0.4510,0.702,246840.0,0.624,0.000676,1.0,0.1090,-3.457,1.0,0.1790,148.930,4.0,0.665,4cGc9Eeb3Gjff2Aq5ILLEf,2016-06-24,5,False,ohh ohh yeee ohhh ieeee \n ohhhhh \n vamo a pr...,ohh ohh yeee ohhh ieeee ohhhhh vamo a prende v...,1.0,1.3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
26729,0e0FMXCVBFcW2t3uVM90NJ,Simple Machine,Evermotion,Guster,0.0101,0.624,187240.0,0.794,0.004310,4.0,0.1370,-7.018,1.0,0.0476,167.970,4.0,0.781,26NzwzbIFuoOgtsbIm5ryI,2015-01-13,31,False,static steady plastic motion \n lights flash b...,static steady plastic motion lights flash beat...,-1.0,0.9
26730,3ctly2pUS5Ot33CoKaxmgl,Expectation,Evermotion,Guster,0.4460,0.467,191187.0,0.578,0.000170,11.0,0.0954,-6.467,1.0,0.0344,137.013,4.0,0.250,26NzwzbIFuoOgtsbIm5ryI,2015-01-13,19,False,im alive \n thats the expectation \n oh no \n ...,im alive thats the expectation oh no feels lik...,-1.0,0.3
26731,4C9C8sWHHTOsldu3TRZoKI,Gangway,Evermotion,Guster,0.1130,0.447,192653.0,0.753,0.004860,2.0,0.3550,-5.284,1.0,0.0284,123.460,4.0,0.431,26NzwzbIFuoOgtsbIm5ryI,2015-01-13,25,False,if i could make a wish of i could right a wron...,if i could make a wish of i could right a wron...,-1.0,0.7
26732,2C0GzDKThNKZr4thpolEzX,Kid Dreams,Evermotion,Guster,0.2260,0.503,253240.0,0.568,0.019500,5.0,0.0817,-8.075,1.0,0.0336,88.784,4.0,0.557,26NzwzbIFuoOgtsbIm5ryI,2015-01-13,24,False,so there i was fifteen stuck in \n high school...,so there i was fifteen stuck in high school wa...,1.0,0.2


In [0]:
personality8 = personality_q(spotify_partition8)
personality8

ERROR:root:The number of words 96 is less than the minimum number of words required for analysis: 100
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/ibm_cloud_sdk_core/base_service.py", line 229, in send
    response.status_code, error_message, http_response=response)
ibm_cloud_sdk_core.api_exception.ApiException: Error: The number of words 96 is less than the minimum number of words required for analysis: 100, Code: 400 , X-global-transaction-id: 6118dffe060e485e924c4c3f5862ddb0
ERROR:root:The number of words 70 is less than the minimum number of words required for analysis: 100
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/ibm_cloud_sdk_core/base_service.py", line 229, in send
    response.status_code, error_message, http_response=response)
ibm_cloud_sdk_core.api_exception.ApiException: Error: The number of words 70 is less than the minimum number of words required for analysis: 100, Code: 400 , X-global-transacti

Unnamed: 0,artistic,emotion,imagination,defiance,assertive,cheerful,outgoing,gregarious,modesty,stubborn,sympathy,trust,fiery,melancholy,immoderation,self-conscious,stress
0,0.587477,0.507154,0.669622,0.839412,0.334472,0.330643,0.309930,0.424119,0.533488,0.326215,0.612485,0.270539,0.612906,0.808447,0.755667,0.725966,0.747313
1,0.593545,0.434140,0.677414,0.830876,0.371569,0.306578,0.302414,0.409154,0.511522,0.327938,0.605687,0.216345,0.587502,0.798141,0.697818,0.727110,0.715772
2,0.641673,0.425627,0.681010,0.782984,0.393689,0.436226,0.304867,0.436963,0.479459,0.345632,0.604971,0.351666,0.524989,0.777532,0.635098,0.727644,0.653994
3,0.588890,0.479982,0.669953,0.838402,0.411059,0.319914,0.304490,0.404136,0.525595,0.335591,0.609691,0.303208,0.576650,0.802982,0.688667,0.727523,0.711371
4,0.635742,0.411775,0.682576,0.841156,0.305287,0.403982,0.302994,0.412663,0.483459,0.301045,0.597112,0.474531,0.584563,0.791003,0.710038,0.751090,0.744855
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10095,0.483210,0.614466,0.791187,0.004402,0.170147,0.476026,0.200341,0.314043,0.944821,0.947986,0.454287,0.011603,0.327810,0.959904,0.367588,0.505668,0.712661
10096,0.999678,0.781936,0.994449,0.987118,0.005879,0.066235,0.001171,0.006887,0.490115,0.290116,0.856172,0.148689,0.391590,0.988559,0.477067,0.955391,0.976936
10097,0.999164,0.589154,0.999992,0.999748,0.094155,0.362654,0.114609,0.310047,0.026866,0.094465,0.841245,0.758046,0.699378,0.797153,0.943452,0.942651,0.640199
10098,0.975240,0.594793,0.909666,0.751356,0.315535,0.734684,0.207368,0.224545,0.631730,0.911782,0.844109,0.095943,0.087876,0.484756,0.518084,0.479072,0.334271
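
**The errors logged above come from lyrics with fewer than 100 words. A sketch of a pre-check that could flag those rows before any request is sent. It counts whitespace-separated words, which is an approximation and may not match the service's own count exactly. The frame and the names `MIN_WORDS` and `enough_words` are made up for the example.**

```python
import pandas as pd

MIN_WORDS = 100  # minimum quoted in the error messages

def enough_words(lyric, minimum=MIN_WORDS):
    """True if the lyric has at least `minimum` whitespace-separated words."""
    return isinstance(lyric, str) and len(lyric.split()) >= minimum

demo = pd.DataFrame({"lyrics": ["la " * 120, "too short", None]})
mask = demo["lyrics"].apply(enough_words)
print(mask.tolist())
```

**Rows where the mask is False could get an empty dictionary directly, which keeps the output aligned with the input rows while skipping the round trip to the API.**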


In [0]:
personality

Unnamed: 0,artistic,emotion,imagination,defiance,assertive,cheerful,outgoing,gregarious,modesty,stubborn,sympathy,trust,fiery,melancholy,immoderation,self-conscious,stress
0,0.983429,0.269785,0.845684,0.833923,0.339471,0.814061,0.100284,0.099952,0.179199,0.670662,0.318997,0.001978,0.241852,0.780715,0.587846,0.719468,0.480640
1,0.951363,0.118000,0.993211,0.778178,0.651997,0.856554,0.322349,0.608428,0.010153,0.048880,0.084278,0.000658,0.848225,0.664055,0.801688,0.390560,0.443472
2,0.974373,0.406747,0.700611,0.815956,0.123845,0.771871,0.128306,0.112841,0.631363,0.718800,0.313936,0.001588,0.251038,0.798487,0.450389,0.736179,0.669987
3,0.945732,0.312331,0.866823,0.403949,0.815055,0.961068,0.834103,0.872156,0.008053,0.392424,0.068322,0.011624,0.523789,0.160812,0.532799,0.075188,0.109408
4,0.921820,0.017407,0.824431,0.521194,0.549106,0.949360,0.699198,0.819306,0.011715,0.257703,0.042916,0.000842,0.599420,0.223387,0.610249,0.098895,0.122751
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
16629,0.610572,0.497648,0.656574,0.842893,0.327532,0.330685,0.303034,0.401232,0.539034,0.334514,0.611507,0.242054,0.597247,0.808790,0.741312,0.739884,0.737564
16630,0.668308,0.458072,0.653067,0.843224,0.305030,0.385853,0.308448,0.461988,0.512044,0.309395,0.617887,0.376763,0.614601,0.796770,0.830509,0.727967,0.764861
16631,0.622659,0.528941,0.660484,0.853855,0.361785,0.300671,0.307022,0.394831,0.541859,0.341028,0.611174,0.365767,0.593277,0.807313,0.715226,0.741994,0.719494
16632,0.601383,0.542283,0.662385,0.853718,0.376679,0.307431,0.308198,0.407658,0.545101,0.335360,0.614443,0.364215,0.608143,0.810041,0.772645,0.730282,0.737626


In [0]:
personality = pd.concat([personality, personality8], ignore_index=True)
personality

Unnamed: 0,artistic,emotion,imagination,defiance,assertive,cheerful,outgoing,gregarious,modesty,stubborn,sympathy,trust,fiery,melancholy,immoderation,self-conscious,stress
0,0.983429,0.269785,0.845684,0.833923,0.339471,0.814061,0.100284,0.099952,0.179199,0.670662,0.318997,0.001978,0.241852,0.780715,0.587846,0.719468,0.480640
1,0.951363,0.118000,0.993211,0.778178,0.651997,0.856554,0.322349,0.608428,0.010153,0.048880,0.084278,0.000658,0.848225,0.664055,0.801688,0.390560,0.443472
2,0.974373,0.406747,0.700611,0.815956,0.123845,0.771871,0.128306,0.112841,0.631363,0.718800,0.313936,0.001588,0.251038,0.798487,0.450389,0.736179,0.669987
3,0.945732,0.312331,0.866823,0.403949,0.815055,0.961068,0.834103,0.872156,0.008053,0.392424,0.068322,0.011624,0.523789,0.160812,0.532799,0.075188,0.109408
4,0.921820,0.017407,0.824431,0.521194,0.549106,0.949360,0.699198,0.819306,0.011715,0.257703,0.042916,0.000842,0.599420,0.223387,0.610249,0.098895,0.122751
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
26729,0.483210,0.614466,0.791187,0.004402,0.170147,0.476026,0.200341,0.314043,0.944821,0.947986,0.454287,0.011603,0.327810,0.959904,0.367588,0.505668,0.712661
26730,0.999678,0.781936,0.994449,0.987118,0.005879,0.066235,0.001171,0.006887,0.490115,0.290116,0.856172,0.148689,0.391590,0.988559,0.477067,0.955391,0.976936
26731,0.999164,0.589154,0.999992,0.999748,0.094155,0.362654,0.114609,0.310047,0.026866,0.094465,0.841245,0.758046,0.699378,0.797153,0.943452,0.942651,0.640199
26732,0.975240,0.594793,0.909666,0.751356,0.315535,0.734684,0.207368,0.224545,0.631730,0.911782,0.844109,0.095943,0.087876,0.484756,0.518084,0.479072,0.334271


In [0]:
personality.to_csv("/content/drive/My Drive/personality.csv", index=False)
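
**The partition, query, concat and save steps above are repeated by hand for each slice. A sketch of the same cycle as a loop that writes a checkpoint after every chunk. `query_fn` stands in for `personality_q`; `fake_query`, `run_in_chunks`, the chunk size and the temporary file path are all made up for the example.**

```python
import os
import tempfile
import pandas as pd

def run_in_chunks(df, query_fn, chunk_size, checkpoint_path):
    """Apply query_fn to consecutive row slices, saving after each slice."""
    parts = []
    for start in range(0, len(df), chunk_size):
        chunk = df.iloc[start:start + chunk_size, :]
        parts.append(query_fn(chunk))
        # Overwrite the checkpoint so a crash loses at most one chunk.
        pd.concat(parts, ignore_index=True).to_csv(checkpoint_path, index=False)
    if not parts:
        return pd.DataFrame()
    return pd.concat(parts, ignore_index=True)

def fake_query(chunk):
    """Stand-in for personality_q: one made-up numeric column per row."""
    return pd.DataFrame({"n_words": [len(s.split()) for s in chunk["lyrics"]]})

demo = pd.DataFrame({"lyrics": ["a b c", "d e", "f", "g h i j", "k"]})
path = os.path.join(tempfile.mkdtemp(), "checkpoint.csv")
result = run_in_chunks(demo, fake_query, chunk_size=2, checkpoint_path=path)
print(result["n_words"].tolist())
```

**Because the checkpoint is written with `index=False`, it can be read back with a plain `pd.read_csv` and no column needs to be dropped.**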


**So far, my personality analysis data covers only the first 26,734 tracks, which go back to 2015, because I have run out of API calls for IBM Cloud Personality Insights.**

In [0]:
spotify_df.iloc[:26734,:]

Unnamed: 0,id,song,album,artist,acousticness,danceability,duration_ms,energy,instrumentalness,key,liveness,loudness,mode,speechiness,tempo,time_signature,valence,album_id,date,popularity,explicit,chain_lyrics,lyrics,polarity,magnitude
0,0Veyvc3n9AcLSoK3r1dA12,Voices In My Head,Hoodie SZN,A Boogie Wit da Hoodie,0.0555,0.754,142301.0,0.663,0.000000,6.0,0.1010,-6.311,0.0,0.4270,90.195,4.0,0.207,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,58,True,monstas gon tear it up all she ever wanted was...,monstas gon tear it up all she ever wanted was...,-1.0,1.4
1,77JzXZonNumWsuXKy9vr3U,Beasty,Hoodie SZN,A Boogie Wit da Hoodie,0.2920,0.860,152829.0,0.418,0.000000,7.0,0.1060,-9.061,0.0,0.1580,126.023,4.0,0.374,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,59,True,all that we know is the logos and hb \n the gl...,all that we know is the logos and hb the glock...,0.0,1.5
2,18yllZD0TdF7ykcREib8Z1,I Did It,Hoodie SZN,A Boogie Wit da Hoodie,0.1530,0.718,215305.0,0.454,0.000046,8.0,0.1160,-9.012,1.0,0.1270,89.483,4.0,0.196,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,57,True,yeah i cant trust these bitches \n i dont got ...,yeah i cant trust these bitches i dont got no ...,-1.0,1.9
3,1wJRveJZLSb1rjhnUHQiv6,Swervin,Hoodie SZN,A Boogie Wit da Hoodie,0.0153,0.581,189487.0,0.662,0.000000,9.0,0.1110,-5.239,1.0,0.3030,93.023,4.0,0.434,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,83,True,uh \n yeah \n oh thats london on da track run ...,uh yeah oh thats london on da track run that s...,0.5,1.4
4,0jAfdqv18goRTUxm3ilRjb,Startender,Hoodie SZN,A Boogie Wit da Hoodie,0.0235,0.736,192779.0,0.622,0.000000,6.0,0.1510,-4.653,0.0,0.1330,191.971,4.0,0.506,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,71,True,yeah shawty got ass she just got a tummy tuck ...,yeah shawty got ass she just got a tummy tuck ...,-1.0,2.5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
26729,0e0FMXCVBFcW2t3uVM90NJ,Simple Machine,Evermotion,Guster,0.0101,0.624,187240.0,0.794,0.004310,4.0,0.1370,-7.018,1.0,0.0476,167.970,4.0,0.781,26NzwzbIFuoOgtsbIm5ryI,2015-01-13,31,False,static steady plastic motion \n lights flash b...,static steady plastic motion lights flash beat...,-1.0,0.9
26730,3ctly2pUS5Ot33CoKaxmgl,Expectation,Evermotion,Guster,0.4460,0.467,191187.0,0.578,0.000170,11.0,0.0954,-6.467,1.0,0.0344,137.013,4.0,0.250,26NzwzbIFuoOgtsbIm5ryI,2015-01-13,19,False,im alive \n thats the expectation \n oh no \n ...,im alive thats the expectation oh no feels lik...,-1.0,0.3
26731,4C9C8sWHHTOsldu3TRZoKI,Gangway,Evermotion,Guster,0.1130,0.447,192653.0,0.753,0.004860,2.0,0.3550,-5.284,1.0,0.0284,123.460,4.0,0.431,26NzwzbIFuoOgtsbIm5ryI,2015-01-13,25,False,if i could make a wish of i could right a wron...,if i could make a wish of i could right a wron...,-1.0,0.7
26732,2C0GzDKThNKZr4thpolEzX,Kid Dreams,Evermotion,Guster,0.2260,0.503,253240.0,0.568,0.019500,5.0,0.0817,-8.075,1.0,0.0336,88.784,4.0,0.557,26NzwzbIFuoOgtsbIm5ryI,2015-01-13,24,False,so there i was fifteen stuck in \n high school...,so there i was fifteen stuck in high school wa...,1.0,0.2


In [0]:
# Outer-join the personality scores onto the first 26,734 rows, matching on the index.
spotify_complete = spotify_df.iloc[:26734, :].merge(
    personality, left_index=True, right_index=True, how="outer"
)
spotify_complete

Unnamed: 0,id,song,album,artist,acousticness,danceability,duration_ms,energy,instrumentalness,key,liveness,loudness,mode,speechiness,tempo,time_signature,valence,album_id,date,popularity,explicit,chain_lyrics,lyrics,polarity,magnitude,artistic,emotion,imagination,defiance,assertive,cheerful,outgoing,gregarious,modesty,stubborn,sympathy,trust,fiery,melancholy,immoderation,self-conscious,stress
0,0Veyvc3n9AcLSoK3r1dA12,Voices In My Head,Hoodie SZN,A Boogie Wit da Hoodie,0.0555,0.754,142301.0,0.663,0.000000,6.0,0.1010,-6.311,0.0,0.4270,90.195,4.0,0.207,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,58,True,monstas gon tear it up all she ever wanted was...,monstas gon tear it up all she ever wanted was...,-1.0,1.4,0.983429,0.269785,0.845684,0.833923,0.339471,0.814061,0.100284,0.099952,0.179199,0.670662,0.318997,0.001978,0.241852,0.780715,0.587846,0.719468,0.480640
1,77JzXZonNumWsuXKy9vr3U,Beasty,Hoodie SZN,A Boogie Wit da Hoodie,0.2920,0.860,152829.0,0.418,0.000000,7.0,0.1060,-9.061,0.0,0.1580,126.023,4.0,0.374,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,59,True,all that we know is the logos and hb \n the gl...,all that we know is the logos and hb the glock...,0.0,1.5,0.951363,0.118000,0.993211,0.778178,0.651997,0.856554,0.322349,0.608428,0.010153,0.048880,0.084278,0.000658,0.848225,0.664055,0.801688,0.390560,0.443472
2,18yllZD0TdF7ykcREib8Z1,I Did It,Hoodie SZN,A Boogie Wit da Hoodie,0.1530,0.718,215305.0,0.454,0.000046,8.0,0.1160,-9.012,1.0,0.1270,89.483,4.0,0.196,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,57,True,yeah i cant trust these bitches \n i dont got ...,yeah i cant trust these bitches i dont got no ...,-1.0,1.9,0.974373,0.406747,0.700611,0.815956,0.123845,0.771871,0.128306,0.112841,0.631363,0.718800,0.313936,0.001588,0.251038,0.798487,0.450389,0.736179,0.669987
3,1wJRveJZLSb1rjhnUHQiv6,Swervin,Hoodie SZN,A Boogie Wit da Hoodie,0.0153,0.581,189487.0,0.662,0.000000,9.0,0.1110,-5.239,1.0,0.3030,93.023,4.0,0.434,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,83,True,uh \n yeah \n oh thats london on da track run ...,uh yeah oh thats london on da track run that s...,0.5,1.4,0.945732,0.312331,0.866823,0.403949,0.815055,0.961068,0.834103,0.872156,0.008053,0.392424,0.068322,0.011624,0.523789,0.160812,0.532799,0.075188,0.109408
4,0jAfdqv18goRTUxm3ilRjb,Startender,Hoodie SZN,A Boogie Wit da Hoodie,0.0235,0.736,192779.0,0.622,0.000000,6.0,0.1510,-4.653,0.0,0.1330,191.971,4.0,0.506,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,71,True,yeah shawty got ass she just got a tummy tuck ...,yeah shawty got ass she just got a tummy tuck ...,-1.0,2.5,0.921820,0.017407,0.824431,0.521194,0.549106,0.949360,0.699198,0.819306,0.011715,0.257703,0.042916,0.000842,0.599420,0.223387,0.610249,0.098895,0.122751
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
26729,0e0FMXCVBFcW2t3uVM90NJ,Simple Machine,Evermotion,Guster,0.0101,0.624,187240.0,0.794,0.004310,4.0,0.1370,-7.018,1.0,0.0476,167.970,4.0,0.781,26NzwzbIFuoOgtsbIm5ryI,2015-01-13,31,False,static steady plastic motion \n lights flash b...,static steady plastic motion lights flash beat...,-1.0,0.9,0.483210,0.614466,0.791187,0.004402,0.170147,0.476026,0.200341,0.314043,0.944821,0.947986,0.454287,0.011603,0.327810,0.959904,0.367588,0.505668,0.712661
26730,3ctly2pUS5Ot33CoKaxmgl,Expectation,Evermotion,Guster,0.4460,0.467,191187.0,0.578,0.000170,11.0,0.0954,-6.467,1.0,0.0344,137.013,4.0,0.250,26NzwzbIFuoOgtsbIm5ryI,2015-01-13,19,False,im alive \n thats the expectation \n oh no \n ...,im alive thats the expectation oh no feels lik...,-1.0,0.3,0.999678,0.781936,0.994449,0.987118,0.005879,0.066235,0.001171,0.006887,0.490115,0.290116,0.856172,0.148689,0.391590,0.988559,0.477067,0.955391,0.976936
26731,4C9C8sWHHTOsldu3TRZoKI,Gangway,Evermotion,Guster,0.1130,0.447,192653.0,0.753,0.004860,2.0,0.3550,-5.284,1.0,0.0284,123.460,4.0,0.431,26NzwzbIFuoOgtsbIm5ryI,2015-01-13,25,False,if i could make a wish of i could right a wron...,if i could make a wish of i could right a wron...,-1.0,0.7,0.999164,0.589154,0.999992,0.999748,0.094155,0.362654,0.114609,0.310047,0.026866,0.094465,0.841245,0.758046,0.699378,0.797153,0.943452,0.942651,0.640199
26732,2C0GzDKThNKZr4thpolEzX,Kid Dreams,Evermotion,Guster,0.2260,0.503,253240.0,0.568,0.019500,5.0,0.0817,-8.075,1.0,0.0336,88.784,4.0,0.557,26NzwzbIFuoOgtsbIm5ryI,2015-01-13,24,False,so there i was fifteen stuck in \n high school...,so there i was fifteen stuck in high school wa...,1.0,0.2,0.975240,0.594793,0.909666,0.751356,0.315535,0.734684,0.207368,0.224545,0.631730,0.911782,0.844109,0.095943,0.087876,0.484756,0.518084,0.479072,0.334271


In [0]:
# Keep only tracks that received personality scores, then renumber the rows from 0.
spotify_complete = spotify_complete[spotify_complete["artistic"].notnull()]
spotify_complete = spotify_complete.reset_index(drop=True)
spotify_complete

Unnamed: 0,id,song,album,artist,acousticness,danceability,duration_ms,energy,instrumentalness,key,liveness,loudness,mode,speechiness,tempo,time_signature,valence,album_id,date,popularity,explicit,chain_lyrics,lyrics,polarity,magnitude,artistic,emotion,imagination,defiance,assertive,cheerful,outgoing,gregarious,modesty,stubborn,sympathy,trust,fiery,melancholy,immoderation,self-conscious,stress
0,0Veyvc3n9AcLSoK3r1dA12,Voices In My Head,Hoodie SZN,A Boogie Wit da Hoodie,0.0555,0.754,142301.0,0.663,0.000000,6.0,0.1010,-6.311,0.0,0.4270,90.195,4.0,0.207,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,58,True,monstas gon tear it up all she ever wanted was...,monstas gon tear it up all she ever wanted was...,-1.0,1.4,0.983429,0.269785,0.845684,0.833923,0.339471,0.814061,0.100284,0.099952,0.179199,0.670662,0.318997,0.001978,0.241852,0.780715,0.587846,0.719468,0.480640
1,77JzXZonNumWsuXKy9vr3U,Beasty,Hoodie SZN,A Boogie Wit da Hoodie,0.2920,0.860,152829.0,0.418,0.000000,7.0,0.1060,-9.061,0.0,0.1580,126.023,4.0,0.374,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,59,True,all that we know is the logos and hb \n the gl...,all that we know is the logos and hb the glock...,0.0,1.5,0.951363,0.118000,0.993211,0.778178,0.651997,0.856554,0.322349,0.608428,0.010153,0.048880,0.084278,0.000658,0.848225,0.664055,0.801688,0.390560,0.443472
2,18yllZD0TdF7ykcREib8Z1,I Did It,Hoodie SZN,A Boogie Wit da Hoodie,0.1530,0.718,215305.0,0.454,0.000046,8.0,0.1160,-9.012,1.0,0.1270,89.483,4.0,0.196,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,57,True,yeah i cant trust these bitches \n i dont got ...,yeah i cant trust these bitches i dont got no ...,-1.0,1.9,0.974373,0.406747,0.700611,0.815956,0.123845,0.771871,0.128306,0.112841,0.631363,0.718800,0.313936,0.001588,0.251038,0.798487,0.450389,0.736179,0.669987
3,1wJRveJZLSb1rjhnUHQiv6,Swervin,Hoodie SZN,A Boogie Wit da Hoodie,0.0153,0.581,189487.0,0.662,0.000000,9.0,0.1110,-5.239,1.0,0.3030,93.023,4.0,0.434,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,83,True,uh \n yeah \n oh thats london on da track run ...,uh yeah oh thats london on da track run that s...,0.5,1.4,0.945732,0.312331,0.866823,0.403949,0.815055,0.961068,0.834103,0.872156,0.008053,0.392424,0.068322,0.011624,0.523789,0.160812,0.532799,0.075188,0.109408
4,0jAfdqv18goRTUxm3ilRjb,Startender,Hoodie SZN,A Boogie Wit da Hoodie,0.0235,0.736,192779.0,0.622,0.000000,6.0,0.1510,-4.653,0.0,0.1330,191.971,4.0,0.506,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,71,True,yeah shawty got ass she just got a tummy tuck ...,yeah shawty got ass she just got a tummy tuck ...,-1.0,2.5,0.921820,0.017407,0.824431,0.521194,0.549106,0.949360,0.699198,0.819306,0.011715,0.257703,0.042916,0.000842,0.599420,0.223387,0.610249,0.098895,0.122751
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
25457,0e0FMXCVBFcW2t3uVM90NJ,Simple Machine,Evermotion,Guster,0.0101,0.624,187240.0,0.794,0.004310,4.0,0.1370,-7.018,1.0,0.0476,167.970,4.0,0.781,26NzwzbIFuoOgtsbIm5ryI,2015-01-13,31,False,static steady plastic motion \n lights flash b...,static steady plastic motion lights flash beat...,-1.0,0.9,0.483210,0.614466,0.791187,0.004402,0.170147,0.476026,0.200341,0.314043,0.944821,0.947986,0.454287,0.011603,0.327810,0.959904,0.367588,0.505668,0.712661
25458,3ctly2pUS5Ot33CoKaxmgl,Expectation,Evermotion,Guster,0.4460,0.467,191187.0,0.578,0.000170,11.0,0.0954,-6.467,1.0,0.0344,137.013,4.0,0.250,26NzwzbIFuoOgtsbIm5ryI,2015-01-13,19,False,im alive \n thats the expectation \n oh no \n ...,im alive thats the expectation oh no feels lik...,-1.0,0.3,0.999678,0.781936,0.994449,0.987118,0.005879,0.066235,0.001171,0.006887,0.490115,0.290116,0.856172,0.148689,0.391590,0.988559,0.477067,0.955391,0.976936
25459,4C9C8sWHHTOsldu3TRZoKI,Gangway,Evermotion,Guster,0.1130,0.447,192653.0,0.753,0.004860,2.0,0.3550,-5.284,1.0,0.0284,123.460,4.0,0.431,26NzwzbIFuoOgtsbIm5ryI,2015-01-13,25,False,if i could make a wish of i could right a wron...,if i could make a wish of i could right a wron...,-1.0,0.7,0.999164,0.589154,0.999992,0.999748,0.094155,0.362654,0.114609,0.310047,0.026866,0.094465,0.841245,0.758046,0.699378,0.797153,0.943452,0.942651,0.640199
25460,2C0GzDKThNKZr4thpolEzX,Kid Dreams,Evermotion,Guster,0.2260,0.503,253240.0,0.568,0.019500,5.0,0.0817,-8.075,1.0,0.0336,88.784,4.0,0.557,26NzwzbIFuoOgtsbIm5ryI,2015-01-13,24,False,so there i was fifteen stuck in \n high school...,so there i was fifteen stuck in high school wa...,1.0,0.2,0.975240,0.594793,0.909666,0.751356,0.315535,0.734684,0.207368,0.224545,0.631730,0.911782,0.844109,0.095943,0.087876,0.484756,0.518084,0.479072,0.334271
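
**The merge-then-filter pattern used above can be checked on a toy example. This is only a minimal sketch: `tracks` and `scores` are hypothetical stand-ins for `spotify_df` and `personality`, and all values are made up. Row 3 has an entry in `scores` that is itself missing, which is a case an inner join alone would not remove, so filtering on `notnull()` after an outer join is the safer choice.**

```python
import pandas as pd

# Hypothetical stand-ins for spotify_df and personality (values are made up).
tracks = pd.DataFrame({"song": ["a", "b", "c", "d"],
                       "popularity": [58, 59, 57, 83]})
scores = pd.DataFrame({"artistic": [0.98, 0.95, float("nan")]},
                      index=[0, 2, 3])

# Outer join on the index keeps every track; tracks without a score get NaN.
merged = tracks.merge(scores, left_index=True, right_index=True, how="outer")

# Keep only rows that actually have a score, then renumber the rows from 0.
complete = merged[merged["artistic"].notnull()].reset_index(drop=True)

print(complete)
```

**Only songs `a` and `c` survive here, and the index restarts at 0, which mirrors the drop from 26,734 rows to 25,462 rows above.**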


In [0]:
# Encode the boolean explicit flag as 0/1 and drop the track-length column.
spotify_complete["explicit"] = spotify_complete["explicit"].astype("int64")
spotify_complete = spotify_complete.drop(columns="duration_ms")
spotify_complete

Unnamed: 0,id,song,album,artist,acousticness,danceability,energy,instrumentalness,key,liveness,loudness,mode,speechiness,tempo,time_signature,valence,album_id,date,popularity,explicit,chain_lyrics,lyrics,polarity,magnitude,artistic,emotion,imagination,defiance,assertive,cheerful,outgoing,gregarious,modesty,stubborn,sympathy,trust,fiery,melancholy,immoderation,self-conscious,stress
0,0Veyvc3n9AcLSoK3r1dA12,Voices In My Head,Hoodie SZN,A Boogie Wit da Hoodie,0.0555,0.754,0.663,0.000000,6.0,0.1010,-6.311,0.0,0.4270,90.195,4.0,0.207,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,58,1,monstas gon tear it up all she ever wanted was...,monstas gon tear it up all she ever wanted was...,-1.0,1.4,0.983429,0.269785,0.845684,0.833923,0.339471,0.814061,0.100284,0.099952,0.179199,0.670662,0.318997,0.001978,0.241852,0.780715,0.587846,0.719468,0.480640
1,77JzXZonNumWsuXKy9vr3U,Beasty,Hoodie SZN,A Boogie Wit da Hoodie,0.2920,0.860,0.418,0.000000,7.0,0.1060,-9.061,0.0,0.1580,126.023,4.0,0.374,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,59,1,all that we know is the logos and hb \n the gl...,all that we know is the logos and hb the glock...,0.0,1.5,0.951363,0.118000,0.993211,0.778178,0.651997,0.856554,0.322349,0.608428,0.010153,0.048880,0.084278,0.000658,0.848225,0.664055,0.801688,0.390560,0.443472
2,18yllZD0TdF7ykcREib8Z1,I Did It,Hoodie SZN,A Boogie Wit da Hoodie,0.1530,0.718,0.454,0.000046,8.0,0.1160,-9.012,1.0,0.1270,89.483,4.0,0.196,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,57,1,yeah i cant trust these bitches \n i dont got ...,yeah i cant trust these bitches i dont got no ...,-1.0,1.9,0.974373,0.406747,0.700611,0.815956,0.123845,0.771871,0.128306,0.112841,0.631363,0.718800,0.313936,0.001588,0.251038,0.798487,0.450389,0.736179,0.669987
3,1wJRveJZLSb1rjhnUHQiv6,Swervin,Hoodie SZN,A Boogie Wit da Hoodie,0.0153,0.581,0.662,0.000000,9.0,0.1110,-5.239,1.0,0.3030,93.023,4.0,0.434,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,83,1,uh \n yeah \n oh thats london on da track run ...,uh yeah oh thats london on da track run that s...,0.5,1.4,0.945732,0.312331,0.866823,0.403949,0.815055,0.961068,0.834103,0.872156,0.008053,0.392424,0.068322,0.011624,0.523789,0.160812,0.532799,0.075188,0.109408
4,0jAfdqv18goRTUxm3ilRjb,Startender,Hoodie SZN,A Boogie Wit da Hoodie,0.0235,0.736,0.622,0.000000,6.0,0.1510,-4.653,0.0,0.1330,191.971,4.0,0.506,3r5hf3Cj3EMh1C2saQ8jyt,2018-12-21,71,1,yeah shawty got ass she just got a tummy tuck ...,yeah shawty got ass she just got a tummy tuck ...,-1.0,2.5,0.921820,0.017407,0.824431,0.521194,0.549106,0.949360,0.699198,0.819306,0.011715,0.257703,0.042916,0.000842,0.599420,0.223387,0.610249,0.098895,0.122751
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
25457,0e0FMXCVBFcW2t3uVM90NJ,Simple Machine,Evermotion,Guster,0.0101,0.624,0.794,0.004310,4.0,0.1370,-7.018,1.0,0.0476,167.970,4.0,0.781,26NzwzbIFuoOgtsbIm5ryI,2015-01-13,31,0,static steady plastic motion \n lights flash b...,static steady plastic motion lights flash beat...,-1.0,0.9,0.483210,0.614466,0.791187,0.004402,0.170147,0.476026,0.200341,0.314043,0.944821,0.947986,0.454287,0.011603,0.327810,0.959904,0.367588,0.505668,0.712661
25458,3ctly2pUS5Ot33CoKaxmgl,Expectation,Evermotion,Guster,0.4460,0.467,0.578,0.000170,11.0,0.0954,-6.467,1.0,0.0344,137.013,4.0,0.250,26NzwzbIFuoOgtsbIm5ryI,2015-01-13,19,0,im alive \n thats the expectation \n oh no \n ...,im alive thats the expectation oh no feels lik...,-1.0,0.3,0.999678,0.781936,0.994449,0.987118,0.005879,0.066235,0.001171,0.006887,0.490115,0.290116,0.856172,0.148689,0.391590,0.988559,0.477067,0.955391,0.976936
25459,4C9C8sWHHTOsldu3TRZoKI,Gangway,Evermotion,Guster,0.1130,0.447,0.753,0.004860,2.0,0.3550,-5.284,1.0,0.0284,123.460,4.0,0.431,26NzwzbIFuoOgtsbIm5ryI,2015-01-13,25,0,if i could make a wish of i could right a wron...,if i could make a wish of i could right a wron...,-1.0,0.7,0.999164,0.589154,0.999992,0.999748,0.094155,0.362654,0.114609,0.310047,0.026866,0.094465,0.841245,0.758046,0.699378,0.797153,0.943452,0.942651,0.640199
25460,2C0GzDKThNKZr4thpolEzX,Kid Dreams,Evermotion,Guster,0.2260,0.503,0.568,0.019500,5.0,0.0817,-8.075,1.0,0.0336,88.784,4.0,0.557,26NzwzbIFuoOgtsbIm5ryI,2015-01-13,24,0,so there i was fifteen stuck in \n high school...,so there i was fifteen stuck in high school wa...,1.0,0.2,0.975240,0.594793,0.909666,0.751356,0.315535,0.734684,0.207368,0.224545,0.631730,0.911782,0.844109,0.095943,0.087876,0.484756,0.518084,0.479072,0.334271


In [0]:
# Parse the release-date strings into datetime64 values.
spotify_complete["date"] = pd.to_datetime(spotify_complete["date"])
spotify_complete["date"]

0       2018-12-21
1       2018-12-21
2       2018-12-21
3       2018-12-21
4       2018-12-21
           ...    
25457   2015-01-13
25458   2015-01-13
25459   2015-01-13
25460   2015-01-13
25461   2015-01-13
Name: date, Length: 25462, dtype: datetime64[ns]

In [0]:
spotify_complete.to_csv("/content/drive/My Drive/spotify_complete.csv", index=False)
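
**One thing to keep in mind when reloading this file: CSV stores the `date` column as plain text, so the `datetime64` dtype set above is not preserved. A minimal sketch of the round trip follows, assuming a toy frame in place of `spotify_complete` and an in-memory buffer in place of the Google Drive path so that it runs anywhere.**

```python
import io

import pandas as pd

# Toy frame standing in for spotify_complete (values are made up).
df = pd.DataFrame({
    "song": ["a", "b"],
    "date": pd.to_datetime(["2018-12-21", "2015-01-13"]),
})

# Write to an in-memory buffer instead of a file on Google Drive.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

# parse_dates asks read_csv to convert the column back to datetimes;
# without it the column would come back as strings.
restored = pd.read_csv(buffer, parse_dates=["date"])

print(restored.dtypes)
```

**The same `parse_dates=["date"]` argument should work when reading `spotify_complete.csv` in a later notebook.**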