# Getting Started
#### Imports
We make use of Python 3 along with a few imported libraries: 
- <a href="https://pandas.pydata.org/pandas-docs/stable/">pandas</a> and <a href="https://numpy.org/">numpy</a> to organize our data
- <a href="https://matplotlib.org/">matplotlib</a> and <a href="https://seaborn.pydata.org/">seaborn</a> to visualize our data
- <a href="https://scikit-learn.org/stable/">sklearn</a> to create models and make predictions based on our data
- <a href="https://spotipy.readthedocs.io/en/latest/">spotipy</a>, which we will go into more below.

In [2]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import spotipy.util as util
import seaborn as sb
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import tree, ensemble, neighbors, model_selection, metrics

#### Authorization
Spotipy is really cool! It gives us access to the Spotify API in a more "pythonic" way (simply put, one that is easier for us to use from Python). After applying for API access on the Spotify website, we received a client ID and a client secret, which we use in the authorization code below to reach the API and parse through any data we use.

In [3]:
sb.set()

cid = "6d49981d346842a1844ab1612afae8ba"
csec = "e33407ff688c4d48908d5f5286fe8e41"


client_credentials_manager = SpotifyClientCredentials(client_id=cid, client_secret=csec)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)
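
A common alternative to pasting the ID and secret into the notebook is to read them from environment variables. The sketch below is only an illustration: `load_spotify_credentials` is a hypothetical helper, not part of Spotipy, although `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` are the variable names Spotipy itself looks for.

```python
import os


def load_spotify_credentials(env=None):
    """Return (client_id, client_secret) read from an environment-style mapping.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    try:
        return env["SPOTIPY_CLIENT_ID"], env["SPOTIPY_CLIENT_SECRET"]
    except KeyError as missing:
        # Fail early with a readable message instead of a failed API call later
        raise RuntimeError(f"Missing environment variable: {missing.args[0]}") from None


# Possible usage with the names from the cell above:
# cid, csec = load_spotify_credentials()
# client_credentials_manager = SpotifyClientCredentials(client_id=cid, client_secret=csec)
```

With the variables set in the shell before starting the notebook, the secret never has to appear in a shared file.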

# Data Collection and Processing
#### Creating Playlists
Now we get to the actual data collection process. As soon as we started this project, we hit the question: "How are we going to get the data on every Grammy winner and nominee for the past 10 years?"

Obviously there's no "Grammy" section on Spotify, so we decided to cheat a little bit. Every song, album, user, and playlist has a specific ID within Spotify called its URI, which can easily be found through the settings of whatever page you are on. The important part here is that each playlist has its own ID, so we made our own playlists containing all the data we need!

We created playlists for the winners and the nominees of the past 10 years for Song of the Year, Album of the Year, and Best New Artist. With these playlists, we can access all the data we want much faster, instead of having to manually go through every song that we want.
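
One thing to watch with playlists is that Spotify returns their items in pages, so a long playlist does not arrive in a single response. The sketch below shows one way to follow the pages with Spotipy's `next` method. `collect_all_items` is a hypothetical helper, and it assumes the response includes the `next` field, which means a `fields` filter would have to list `next` or be left out.

```python
def collect_all_items(sp, first_page):
    """Follow 'next' links until every item of a paged response is collected.

    `sp` is a Spotipy client; `first_page` is the dict returned by a call
    such as sp.user_playlist_tracks(...). Hypothetical helper.
    """
    items = list(first_page["items"])
    page = first_page
    # A page without a 'next' link (or with next=None) is the last one
    while page.get("next"):
        page = sp.next(page)
        items.extend(page["items"])
    return items
```

If the response carries no `next` field at all, the helper simply returns the items of the first page.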

#### What data do we actually want?
Let's start with Song of the Year. Spotify assigns numerical values to each song's characteristics. Listed below are the values we are going to retrieve for each song, with descriptions taken from the <a href="https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-features/">Spotify API</a> documentation.
- Name: Name of the song
- Release Date: Release date of the song
- Popularity: How "popular" the song currently is compared to other songs, on a scale from 0 to 100
- Danceability: How suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable. 
- Energy: A measure from 0.0 to 1.0 that represents perceived intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.
- Loudness: The overall loudness of the track in decibels (dB)
- Speechiness: The presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value.
- Acousticness: A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic. 
- Liveness: Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live.
- Valence: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
- Tempo: The overall estimated tempo of a track in beats per minute (BPM).
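
Since most of the features above are defined on a 0.0 to 1.0 scale, a quick range check can catch missing or malformed values before they reach a dataframe. This is a minimal sketch: `out_of_range_features` and `UNIT_RANGE_FEATURES` are hypothetical names, and the keys are the lowercase field names found in the audio features response. Loudness and tempo are left out because they use other units.

```python
# Features documented as lying between 0.0 and 1.0
UNIT_RANGE_FEATURES = (
    "danceability", "energy", "speechiness",
    "acousticness", "liveness", "valence",
)


def out_of_range_features(features):
    """Return the names of 0.0-1.0 features that are missing or out of range."""
    bad = []
    for key in UNIT_RANGE_FEATURES:
        value = features.get(key)
        if value is None or not 0.0 <= value <= 1.0:
            bad.append(key)
    return bad


# Values taken from the "Royals" row of the song table
royals = {"danceability": 0.674, "energy": 0.428, "speechiness": 0.122,
          "acousticness": 0.121, "liveness": 0.132, "valence": 0.337}
print(out_of_range_features(royals))  # prints []
```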

We also keep track of whether each song won or lost in the year that it was nominated. The idea here is to use Machine Learning to find patterns in the characteristics that separate the songs that won from the ones that lost, in order to potentially predict this year's winners.

Finally, we create a dataframe to store all of this data. Keep in mind that Spotify API calls take a bit of time, especially when you run them on hundreds of songs (Album of the Year has over 500 songs, each with a lot of data to parse through). We therefore exported our dataframe to a CSV file immediately, so the collection code only needs to run once and the data is easy to reload.
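
The "collect once, then reload from CSV" idea can be wrapped in a small guard so that rerunning a cell does not repeat the slow API calls. The sketch below uses only the standard library to stay self-contained. `load_or_build_rows` is a hypothetical helper, and in the notebook the same check could wrap `pd.read_csv` and `to_csv` instead.

```python
import csv
import os


def load_or_build_rows(path, build_rows):
    """Return rows from `path` if it exists; otherwise build, save, and return them.

    `build_rows` is a zero-argument function returning a non-empty list of dicts
    (for example, the slow Spotify collection step). Hypothetical helper.
    """
    if os.path.exists(path):
        with open(path, newline="", encoding="utf-8") as f:
            # Values read back from CSV are strings
            return list(csv.DictReader(f))
    rows = build_rows()
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Deleting the CSV file forces a fresh collection on the next run.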

In [3]:
#Access the playlist for song of the year winners for the past 10 years using the URI of the user (Tomi Olusina) 
#who made it and the playlist URI
playlist_tracks = sp.user_playlist_tracks('22b7g6udnfkubfplnsla3ww5a', '7pF07PQEdD4e0AEoo6ue9G', fields='items,uri,name,id,total', market='fr')

#Create arrays for every category in our dataframe
name = []
date = []
pop = []
dance = []
energy = []
loud = []
speech = []
acous = []
live = []
valence = []
tempo = []
win = []

#Parse through the playlist data and add the values we want for every song in the playlist
for i in playlist_tracks['items']:
    #Request the audio features once per track and reuse the result for every column
    features = sp.audio_features('spotify:track:'+i['track']['id'])[0]
    name.append(i['track']['name'])
    date.append(i['track']['album']['release_date'])
    pop.append(i['track']['popularity'])
    dance.append(features['danceability'])
    energy.append(features['energy'])
    loud.append(features['loudness'])
    speech.append(features['speechiness'])
    acous.append(features['acousticness'])
    live.append(features['liveness'])
    valence.append(features['valence'])
    tempo.append(features['tempo'])
    win.append(1)
    
#Now access the nominees for song of the year for the past 10 years and repeat the same process
playlist_tracks = sp.user_playlist_tracks('22b7g6udnfkubfplnsla3ww5a', '4OzkYLxp0VzMNlYBXq2faU', fields='items,uri,name,id,total', market='fr')
for i in playlist_tracks['items']:
    #Request the audio features once per track and reuse the result for every column
    features = sp.audio_features('spotify:track:'+i['track']['id'])[0]
    name.append(i['track']['name'])
    date.append(i['track']['album']['release_date'])
    pop.append(i['track']['popularity'])
    dance.append(features['danceability'])
    energy.append(features['energy'])
    loud.append(features['loudness'])
    speech.append(features['speechiness'])
    acous.append(features['acousticness'])
    live.append(features['liveness'])
    valence.append(features['valence'])
    tempo.append(features['tempo'])
    win.append(0)

#Create a dataframe and add all the columns made
songdata = pd.DataFrame()
songdata.insert(0, "Name", name, True)
songdata.insert(1, "Date Released", date, True)
songdata.insert(2, "Popularity", pop, True)
songdata.insert(3, "Danceability", dance, True)
songdata.insert(4, "Energy", energy, True)
songdata.insert(5, "Loudness", loud, True)
songdata.insert(6, "Speechiness", speech, True)
songdata.insert(7, "Acousticness", acous, True)
songdata.insert(8, "Liveness", live, True)
songdata.insert(9, "Valence", valence, True)
songdata.insert(10, "Tempo", tempo, True)
songdata.insert(11, "Did They Win?", win, True)

#Send it to a CSV file
songdata
songdata.to_csv('topsongs.csv', encoding='utf-8', index=False)

In [4]:
songdb = pd.read_csv("topsongs.csv")
songdb

Unnamed: 0,Name,Date Released,Popularity,Danceability,Energy,Loudness,Speechiness,Acousticness,Liveness,Valence,Tempo,Did They Win?
0,Single Ladies (Put a Ring on It),2008-11-14,71,0.426,0.584,-5.293,0.2960,0.03830,0.1880,0.272,193.437,1
1,Need You Now,2010-01-01,69,0.587,0.622,-5.535,0.0303,0.09270,0.2000,0.231,107.943,1
2,Rolling in the Deep,2011-01-19,77,0.730,0.770,-5.114,0.0298,0.13800,0.0473,0.507,104.948,1
3,We Are Young (feat. Janelle Monáe),2012-02-14,76,0.378,0.638,-5.576,0.0750,0.02000,0.0849,0.735,184.086,1
4,Royals,2013-01-01,79,0.674,0.428,-9.504,0.1220,0.12100,0.1320,0.337,84.878,1
...,...,...,...,...,...,...,...,...,...,...,...,...
98,Umbrella,2008-06-02,81,0.583,0.829,-4.603,0.1340,0.00864,0.0426,0.575,174.028,0
99,American Boy,2008-03-28,80,0.727,0.729,-2.990,0.3260,0.17100,0.0700,0.512,117.932,0
100,Chasing Pavements,2008-01-28,68,0.614,0.470,-6.090,0.0255,0.29100,0.1110,0.329,80.045,0
101,I'm Yours,2008-05-12,84,0.686,0.457,-8.322,0.0468,0.59500,0.1050,0.718,150.953,0
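
Because the collection step is slow, it can help to request audio features for many tracks at once. Spotipy's `audio_features` accepts a list of track IDs (up to 100 per call), so a playlist of this size could be covered in a couple of requests. The sketch below is one possible way to batch the calls; `fetch_audio_features` is a hypothetical helper, not the code used to produce the table above.

```python
def fetch_audio_features(sp, track_ids, batch_size=100):
    """Fetch audio features for many tracks with one request per batch.

    Returns a dict mapping each track ID to its audio-features dict.
    `sp` is a Spotipy client. Hypothetical helper.
    """
    features = {}
    for start in range(0, len(track_ids), batch_size):
        batch = track_ids[start:start + batch_size]
        # Results come back in the same order as the requested IDs
        for track_id, feats in zip(batch, sp.audio_features(batch)):
            features[track_id] = feats  # may be None if Spotify has no data
    return features


# Possible usage with the playlist response from the cell above:
# ids = [i['track']['id'] for i in playlist_tracks['items']]
# features_by_id = fetch_audio_features(sp, ids)
```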


#### Album of the Year
Now we move to Album of the Year nominees and winners of the past 10 years. Obviously, you can't make a playlist of albums, so our workaround was to create playlists containing one song from each album. From there, we could read the album's URI from the song's data, and use that URI to get all the data we need.

The data collection process here is very similar, but there is one slight difference. In the Spotify API, characteristics like danceability and energy apply only to songs, not albums. Therefore, to get the characteristics of an album, we take these characteristics and average them over all the songs in the album. For example, we get the danceability of an album by averaging the danceabilities of every song in the album.

Also, there was a bug in the data with one of the Taylor Swift albums and its URI, so we manually put in its URI and Name. Who knew she could be so problematic?
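
The averaging step described above can be written compactly with the standard library. This is a minimal sketch, assuming each track's features are already available as a dict. `average_features` and `FEATURE_KEYS` are hypothetical names, and skipping tracks without features is an added assumption rather than something the notebook does.

```python
from statistics import mean

FEATURE_KEYS = (
    "danceability", "energy", "loudness", "speechiness",
    "acousticness", "liveness", "valence", "tempo",
)


def average_features(track_features, keys=FEATURE_KEYS):
    """Average each audio feature over the tracks of one album.

    `track_features` is a list of audio-features dicts, one per track.
    Entries that are None (no data for that track) are skipped.
    """
    usable = [f for f in track_features if f is not None]
    return {key: mean(f[key] for f in usable) for key in keys}


# Two made-up tracks, restricted to two features for brevity
album = [
    {"danceability": 0.25, "loudness": -4.0},
    {"danceability": 0.75, "loudness": -6.0},
]
print(average_features(album, keys=("danceability", "loudness")))
# prints {'danceability': 0.5, 'loudness': -5.0}
```

Computing the mean per album this way removes the need to reset the running totals and the counter after every album.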

In [None]:
#Get the songs for the winners once again from the playlist we made
playlist_albums = sp.user_playlist_tracks('22b7g6udnfkubfplnsla3ww5a', '3q7RrjHTPcG1pAL3j4QcJF', fields='items,uri,name,id,total', market='fr')
playlist_albums

#smh Taylor
albums = ['2dqn5yOQWdyGwOpOIi9O4x']
name = ['Fearless']
date = []
pop = []
dance = []
energy = []
loud = []
speech = []
acous = []
live = []
valence = []
tempo = []
win = []

#Populate the name/date/popularity arrays now since that data is accessible, and get a list of the album URIs
for i in playlist_albums['items']:
    name.append(i['track']['album']['name'])
    date.append(i['track']['album']['release_date'])
    pop.append(i['track']['popularity'])
    win.append(1)
    albums.append(i['track']['album']['id'])

#Again, smh Taylor
del albums[1]
del name[1]

danceavg = 0
energyavg = 0
loudavg = 0
speechavg = 0
acousavg = 0
liveavg = 0
valenceavg = 0
tempoavg = 0
count = 0

#Go through each album, and get the average characteristics for each album
for a in albums:
    songs = sp.album_tracks(a)
    for i in songs['items']:
        #Request the audio features once per track and reuse the result for every sum
        features = sp.audio_features('spotify:track:'+i['id'])[0]
        danceavg = danceavg + features['danceability']
        energyavg = energyavg + features['energy']
        loudavg = loudavg + features['loudness']
        speechavg = speechavg + features['speechiness']
        acousavg = acousavg + features['acousticness']
        liveavg = liveavg + features['liveness']
        valenceavg = valenceavg + features['valence']
        tempoavg = tempoavg + features['tempo']
        count = count + 1
    dance.append(danceavg/count)
    energy.append(energyavg/count)
    loud.append(loudavg/count)
    speech.append(speechavg/count)
    acous.append(acousavg/count)
    live.append(liveavg/count)
    valence.append(valenceavg/count)
    tempo.append(tempoavg/count)
    danceavg = 0
    energyavg = 0
    loudavg = 0
    speechavg = 0
    acousavg = 0
    liveavg = 0
    valenceavg = 0
    tempoavg = 0
    count = 0
    
#Repeat the process for nominees
playlist_albums = sp.user_playlist_tracks('22b7g6udnfkubfplnsla3ww5a', '6ua8ZJ16YFgCou0ecqPgdV', fields='items,uri,name,id,total', market='fr')
playlist_albums

albums = []

for i in playlist_albums['items']:
    name.append(i['track']['album']['name'])
    date.append(i['track']['album']['release_date'])
    pop.append(i['track']['popularity'])
    win.append(0)
    albums.append(i['track']['album']['id'])

danceavg = 0
energyavg = 0
loudavg = 0
speechavg = 0
acousavg = 0
liveavg = 0
valenceavg = 0
tempoavg = 0
count = 0

    
for a in albums:
    songs = sp.album_tracks(a)
    for i in songs['items']:
        #Request the audio features once per track and reuse the result for every sum
        features = sp.audio_features('spotify:track:'+i['id'])[0]
        danceavg = danceavg + features['danceability']
        energyavg = energyavg + features['energy']
        loudavg = loudavg + features['loudness']
        speechavg = speechavg + features['speechiness']
        acousavg = acousavg + features['acousticness']
        liveavg = liveavg + features['liveness']
        valenceavg = valenceavg + features['valence']
        tempoavg = tempoavg + features['tempo']
        count = count + 1
    dance.append(danceavg/count)
    energy.append(energyavg/count)
    loud.append(loudavg/count)
    speech.append(speechavg/count)
    acous.append(acousavg/count)
    live.append(liveavg/count)
    valence.append(valenceavg/count)
    tempo.append(tempoavg/count)
    danceavg = 0
    energyavg = 0
    loudavg = 0
    speechavg = 0
    acousavg = 0
    liveavg = 0
    valenceavg = 0
    tempoavg = 0
    count = 0

albumdata = pd.DataFrame()
albumdata.insert(0, "Name", name, True)
albumdata.insert(1, "Date Released", date, True)
albumdata.insert(2, "Popularity", pop, True)
albumdata.insert(3, "Danceability", dance, True)
albumdata.insert(4, "Energy", energy, True)
albumdata.insert(5, "Loudness", loud, True)
albumdata.insert(6, "Speechiness", speech, True)
albumdata.insert(7, "Acousticness", acous, True)
albumdata.insert(8, "Liveness", live, True)
albumdata.insert(9, "Valence", valence, True)
albumdata.insert(10, "Tempo", tempo, True)
albumdata.insert(11, "Did They Win?", win, True)

albumdata
albumdata.to_csv('topalbums.csv', encoding='utf-8', index=False)

In [5]:
albumdb = pd.read_csv("topalbums.csv")
albumdb

Unnamed: 0,Name,Date Released,Popularity,Danceability,Energy,Loudness,Speechiness,Acousticness,Liveness,Valence,Tempo,Did They Win?
0,Fearless,2008-11-10,60,0.592769,0.637308,-5.276000,0.032992,0.175937,0.145554,0.384231,113.235462,1
1,The Suburbs,2010,71,0.448625,0.721381,-7.534563,0.045250,0.217593,0.205238,0.418344,127.591062,1
2,21,2011-01-19,77,0.581091,0.534091,-6.081727,0.036018,0.439371,0.117536,0.414636,133.496909,1
3,Babel,2012-09-21,52,0.404250,0.558167,-8.333750,0.038817,0.185863,0.127025,0.281533,116.001083,1
4,Random Access Memories,2013-05-17,64,0.686462,0.585846,-10.704077,0.044315,0.287906,0.132877,0.488231,114.471077,1
...,...,...,...,...,...,...,...,...,...,...,...,...
99,These Days,2006-01-01,14,0.586674,0.567860,-7.979581,0.031730,0.485630,0.165735,0.530228,121.711581,0
100,In Rainbows,2007-12-28,61,0.529400,0.591700,-8.693000,0.038340,0.460245,0.093220,0.394620,121.130400,0
101,Tha Carter III,2008-01-01,75,0.679063,0.580375,-8.258250,0.225844,0.091419,0.258544,0.525188,125.076688,0
102,Viva La Vida or Death and All His Friends,2008-05-26,80,0.319700,0.658000,-8.284900,0.038630,0.115038,0.208800,0.261720,126.567200,0


#### Best New Artist
Finally, we get the data for Best New Artist. Our workaround here was simply to make playlists with any song from each artist, then use that song to get the artist's URI to work with.

The data collection process is very similar once again, but there is much less data to work with in the Spotify API for this award. We simply retrieved the artist's name, number of followers, and popularity to make models and predictions for this award.

In [5]:
#Same process as before, just use the song's data from the playlist to access the artist's data
#This is easier here because artist data is also included within the original playlist's data
playlist_artists = sp.user_playlist_tracks('22b7g6udnfkubfplnsla3ww5a', '6Wu1dIW3n13jsx0WoODnlQ', fields='items,uri,name,id,total', market='fr')
playlist_artists

name = []
followers = []
pop = []
win = []

for i in playlist_artists['items']:
    name.append(i['track']['album']['artists'][0]['name'])
    artist = sp.artist(i['track']['album']['artists'][0]['id'])
    followers.append(artist['followers']['total'])
    pop.append(artist['popularity'])
    win.append(1)
    
#Note: this is the same playlist ID used for the Album of the Year nominees above
playlist_artists = sp.user_playlist_tracks('22b7g6udnfkubfplnsla3ww5a', '6ua8ZJ16YFgCou0ecqPgdV', fields='items,uri,name,id,total', market='fr')
playlist_artists

for i in playlist_artists['items']:
    name.append(i['track']['album']['artists'][0]['name'])
    artist = sp.artist(i['track']['album']['artists'][0]['id'])
    followers.append(artist['followers']['total'])
    pop.append(artist['popularity'])
    win.append(0) 

    
artistdata = pd.DataFrame()
artistdata.insert(0, "Name", name, True)
artistdata.insert(1, "Followers", followers, True)
artistdata.insert(2, "Popularity", pop, True)
artistdata.insert(3, "Did They Win", win, True)

artistdata.to_csv('topartists.csv', encoding='utf-8', index=False)


In [6]:
artistdb = pd.read_csv("topartists.csv")
artistdb

Unnamed: 0,Name,Followers,Popularity,Did They Win
0,Zac Brown Band,1874027,77,1
1,Esperanza Spalding,236041,62,1
2,Bon Iver,2490997,79,1
3,fun.,3313046,73,1
4,Macklemore & Ryan Lewis,2424691,78,1
...,...,...,...,...
98,Vince Gill,284995,66,0
99,Radiohead,4742554,82,0
100,Lil Wayne,8506468,91,0
101,Coldplay,23419831,93,0


#### Current Nominees

We aren't done yet! We need to create these same 3 dataframes, but for the current nominees for each award. The data collection process is similar once again, so it is reproduced below.
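
Since the same steps are repeated for past and current nominees, the row-building logic could live in one function with an optional win label. The sketch below is an illustration under assumptions: `build_song_rows` and `FEATURE_COLUMNS` are hypothetical names, and `features_by_id` is assumed to be a dict mapping each track ID to its audio-features dict.

```python
# Dataframe column name -> key in the audio-features dict
FEATURE_COLUMNS = {
    "Danceability": "danceability", "Energy": "energy", "Loudness": "loudness",
    "Speechiness": "speechiness", "Acousticness": "acousticness",
    "Liveness": "liveness", "Valence": "valence", "Tempo": "tempo",
}


def build_song_rows(playlist_items, features_by_id, won=None):
    """Build one dict per song, in the same column order as the dataframes above.

    `won` is 1 or 0 for past winners or nominees; leave it as None for the
    current nominees, whose outcome is not known yet.
    """
    rows = []
    for item in playlist_items:
        track = item["track"]
        feats = features_by_id[track["id"]]
        row = {
            "Name": track["name"],
            "Date Released": track["album"]["release_date"],
            "Popularity": track["popularity"],
        }
        for column, key in FEATURE_COLUMNS.items():
            row[column] = feats[key]
        if won is not None:
            row["Did They Win?"] = won
        rows.append(row)
    return rows


# Possible usage: songdata = pd.DataFrame(build_song_rows(items, features_by_id))
```

Passing the resulting list of dicts to `pd.DataFrame` yields the columns in the order they were added to each row.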

In [4]:

playlist_tracks = sp.user_playlist_tracks('22b7g6udnfkubfplnsla3ww5a', '7InjGr6kiABbxyV5q9GPW1', fields='items,uri,name,id,total', market='fr')

name = []
date = []
pop = []
dance = []
energy = []
loud = []
speech = []
acous = []
live = []
valence = []
tempo = []

for i in playlist_tracks['items']:
    #Request the audio features once per track and reuse the result for every column
    features = sp.audio_features('spotify:track:'+i['track']['id'])[0]
    name.append(i['track']['name'])
    date.append(i['track']['album']['release_date'])
    pop.append(i['track']['popularity'])
    dance.append(features['danceability'])
    energy.append(features['energy'])
    loud.append(features['loudness'])
    speech.append(features['speechiness'])
    acous.append(features['acousticness'])
    live.append(features['liveness'])
    valence.append(features['valence'])
    tempo.append(features['tempo'])

songdata = pd.DataFrame()
songdata.insert(0, "Name", name, True)
songdata.insert(1, "Date Released", date, True)
songdata.insert(2, "Popularity", pop, True)
songdata.insert(3, "Danceability", dance, True)
songdata.insert(4, "Energy", energy, True)
songdata.insert(5, "Loudness", loud, True)
songdata.insert(6, "Speechiness", speech, True)
songdata.insert(7, "Acousticness", acous, True)
songdata.insert(8, "Liveness", live, True)
songdata.insert(9, "Valence", valence, True)
songdata.insert(10, "Tempo", tempo, True)

songdata
songdata.to_csv('2020songs.csv', encoding='utf-8', index=False)

In [7]:
songdb2 = pd.read_csv("2020songs.csv")
songdb2

Unnamed: 0,Name,Date Released,Popularity,Danceability,Energy,Loudness,Speechiness,Acousticness,Liveness,Valence,Tempo
0,Always Remember Us This Way,2018-10-05,82,0.553,0.502,-5.972,0.0409,0.299,0.764,0.296,129.976
1,bad guy,2019-03-29,96,0.701,0.425,-10.965,0.375,0.328,0.1,0.562,135.128
2,Bring My Flowers Now,2019-08-23,42,0.557,0.198,-9.911,0.0461,0.948,0.217,0.336,133.823
3,Hard Place,2018-11-02,70,0.614,0.719,-4.694,0.0955,0.179,0.163,0.34,160.075
4,Lover,2019-08-23,88,0.359,0.543,-7.582,0.0919,0.492,0.118,0.453,68.534
5,Norman fucking Rockwell,2019-08-30,77,0.218,0.215,-12.49,0.0368,0.967,0.0948,0.138,76.74
6,Someone You Loved,2019-05-17,96,0.501,0.405,-5.679,0.0319,0.751,0.105,0.446,109.891
7,Truth Hurts,2019-05-03,91,0.715,0.624,-3.046,0.114,0.11,0.123,0.412,158.087


In [None]:
playlist_albums = sp.user_playlist_tracks('22b7g6udnfkubfplnsla3ww5a', '3uMdi0Gvd3rEp2aiWjy5jM', fields='items,uri,name,id,total', market='fr')
playlist_albums

albums = []
name = []
date = []
pop = []
dance = []
energy = []
loud = []
speech = []
acous = []
live = []
valence = []
tempo = []

for i in playlist_albums['items']:
    name.append(i['track']['album']['name'])
    date.append(i['track']['album']['release_date'])
    pop.append(i['track']['popularity'])
    albums.append(i['track']['album']['id'])

danceavg = 0
energyavg = 0
loudavg = 0
speechavg = 0
acousavg = 0
liveavg = 0
valenceavg = 0
tempoavg = 0
count = 0

for a in albums:
    songs = sp.album_tracks(a)
    for i in songs['items']:
        #Request the audio features once per track and reuse the result for every sum
        features = sp.audio_features('spotify:track:'+i['id'])[0]
        danceavg = danceavg + features['danceability']
        energyavg = energyavg + features['energy']
        loudavg = loudavg + features['loudness']
        speechavg = speechavg + features['speechiness']
        acousavg = acousavg + features['acousticness']
        liveavg = liveavg + features['liveness']
        valenceavg = valenceavg + features['valence']
        tempoavg = tempoavg + features['tempo']
        count = count + 1
    dance.append(danceavg/count)
    energy.append(energyavg/count)
    loud.append(loudavg/count)
    speech.append(speechavg/count)
    acous.append(acousavg/count)
    live.append(liveavg/count)
    valence.append(valenceavg/count)
    tempo.append(tempoavg/count)
    danceavg = 0
    energyavg = 0
    loudavg = 0
    speechavg = 0
    acousavg = 0
    liveavg = 0
    valenceavg = 0
    tempoavg = 0
    count = 0
    

albumdata = pd.DataFrame()
albumdata.insert(0, "Name", name, True)
albumdata.insert(1, "Date Released", date, True)
albumdata.insert(2, "Popularity", pop, True)
albumdata.insert(3, "Danceability", dance, True)
albumdata.insert(4, "Energy", energy, True)
albumdata.insert(5, "Loudness", loud, True)
albumdata.insert(6, "Speechiness", speech, True)
albumdata.insert(7, "Acousticness", acous, True)
albumdata.insert(8, "Liveness", live, True)
albumdata.insert(9, "Valence", valence, True)
albumdata.insert(10, "Tempo", tempo, True)

albumdata
albumdata.to_csv('2020albums.csv', encoding='utf-8', index=False)

In [8]:
albumdb2 = pd.read_csv("2020albums.csv")
albumdb2

Unnamed: 0,Name,Date Released,Popularity,Danceability,Energy,Loudness,Speechiness,Acousticness,Liveness,Valence,Tempo
0,"i,i",2019-08-09,58,0.465231,0.324862,-12.271923,0.098315,0.875923,0.239092,0.249631,106.955769
1,Norman Fucking Rockwell!,2019-08-30,77,0.434214,0.271457,-11.989429,0.042664,0.823,0.120293,0.228979,97.773857
2,"WHEN WE ALL FALL ASLEEP, WHERE DO WE GO?",2019-03-29,96,0.583786,0.281721,-15.085071,0.193086,0.656857,0.239307,0.288007,105.494286
3,"thank u, next",2019-02-08,89,0.66825,0.525583,-6.622,0.141642,0.298733,0.154742,0.388917,113.231167
4,I Used To Know Her,2019-08-30,60,0.549263,0.492421,-9.299053,0.165763,0.471753,0.167553,0.370826,107.297263
5,7 EP,2019-06-21,88,0.7515,0.586625,-6.646625,0.110963,0.15275,0.158362,0.531375,142.742375
6,Cuz I Love You,2019-04-19,67,0.706818,0.731545,-4.324364,0.129882,0.051543,0.304291,0.631727,127.846545
7,Father of the Bride,2019-05-03,62,0.642389,0.504167,-7.761889,0.062189,0.522978,0.168917,0.527778,118.150222


In [10]:
playlist_artists = sp.user_playlist_tracks('22b7g6udnfkubfplnsla3ww5a', '5Ihltik64Mn9LOlhgya8UE', fields='items,uri,name,id,total', market='fr')
playlist_artists

name = []
followers = []
pop = []

for i in playlist_artists['items']:
    name.append(i['track']['album']['artists'][0]['name'])
    artist = sp.artist(i['track']['album']['artists'][0]['id'])
    followers.append(artist['followers']['total'])
    pop.append(artist['popularity'])

    
artistdata = pd.DataFrame()
artistdata.insert(0, "Name", name, True)
artistdata.insert(1, "Followers", followers, True)
artistdata.insert(2, "Popularity", pop, True)

artistdata
artistdata.to_csv('2020artists.csv', encoding='utf-8', index=False)

In [9]:
artistdb2 = pd.read_csv("2020artists.csv")
artistdb2

Unnamed: 0,Name,Followers,Popularity
0,Black Pumas,71329,61
1,Billie Eilish,17055834,98
2,Lil Nas X,1900496,87
3,Lizzo,1602632,88
4,Maggie Rogers,509106,76
5,ROSALÍA,1970207,85
6,Tank and The Bangas,115219,50
7,Yola,26757,54
