# Music Over the Decades

Music has been around for tens of thousands of years.
Music first recorded in ---
Spotify created in ---

In this tutorial, we will use the Spotify API to look at popular music over the decades. Specifically, we will look at the All Out 50s, 60s, 70s, 80s, 90s, 00s, and 10s playlists created by Spotify. Together, these should give us a reasonable overview of popular music from the 1950s through the 2010s.


## Part 1: Data Collection

First things first, we need to collect our data. We will use the Spotipy Python library to interact with the Spotify API. The setup steps below come from the Spotipy documentation, which is linked at the end of this tutorial. The client ID and client secret set as environment variables below belong to my personal Spotify account; you will need to substitute your own.

In [1]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import os
import pandas as pd

os.environ['SPOTIPY_CLIENT_ID'] = '14bc2f79e1c9423397627283ad8c518e' # client id variable
os.environ['SPOTIPY_CLIENT_SECRET'] = '84e9c4bd7c0848f59eccbc59f8ce6d97' # client secret variable

auth_manager = SpotifyClientCredentials()
sp = spotipy.Spotify(auth_manager=auth_manager) 
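
When `SpotifyClientCredentials()` is called without arguments, it reads `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` from the environment, which is why the cell above works without passing them explicitly. If you would rather not paste credentials into the notebook, one option is to export them in your shell before starting Jupyter and only check for them here. The sketch below assumes that workflow; `missing_credentials` is a hypothetical helper name, not part of Spotipy.

```python
import os

# Names Spotipy looks up when SpotifyClientCredentials() gets no arguments
REQUIRED_VARS = ("SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET")

def missing_credentials(env=None):
    """Return the required variable names that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]

missing = missing_credentials()
if missing:
    print("Set these before creating the client:", ", ".join(missing))
```

Running this check before creating the client gives a clearer message than an authentication error later on.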

Now that we have access to Spotify, we need to get our playlists and songs along with all of their data so we can perform some analysis. The two functions below come from another tutorial, How to Create Large Music Datasets Using Spotipy by Max Hilsdorf: https://towardsdatascience.com/how-to-create-large-music-datasets-using-spotipy-40e7242cc6a6

I have made some minor modifications to these functions, but they operate the same way as in the linked tutorial. The first function, analyze_playlist, loops over all tracks in a playlist and stores specific data related to each song in a dataframe. The second function, analyze_playlist_dict, calls analyze_playlist on each playlist and concatenates the resulting dataframes into one large dataframe.

In [2]:
def analyze_playlist(playlist_id):
    
    # Create empty dataframe
    playlist_features_list = ["artist", "album", "track_name", "track_id",
                              "danceability", "energy", "key", "loudness",
                              "mode", "speechiness", "instrumentalness", "liveness",
                              "valence", "tempo", "duration_ms", "time_signature"]
    
    playlist_df = pd.DataFrame(columns = playlist_features_list)
    
    # Loop through every track in the playlist, extract features and append the features to the playlist df
    playlist = sp.playlist_tracks(playlist_id)["items"]
    for track in playlist:
        # Create empty dict
        playlist_features = {}
        # Get metadata
        playlist_features["artist"] = track["track"]["album"]["artists"][0]["name"]
        playlist_features["album"] = track["track"]["album"]["name"]
        playlist_features["track_name"] = track["track"]["name"]
        playlist_features["track_id"] = track["track"]["id"]
        
        # Get audio features; skip tracks for which Spotify returns none
        audio_features = sp.audio_features(playlist_features["track_id"])[0]
        if audio_features is None:
            continue
        for feature in playlist_features_list[4:]:
            playlist_features[feature] = audio_features[feature]
        
        # Concat the dfs
        track_df = pd.DataFrame(playlist_features, index = [0])
        playlist_df = pd.concat([playlist_df, track_df], ignore_index = True)
        
    return playlist_df
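
One caveat: `sp.playlist_tracks` returns results one page at a time (Spotipy's default `limit` is 100 items), so `analyze_playlist` as written only sees the first page of a longer playlist. Below is a minimal sketch of collecting every page, assuming a Spotipy client with its `playlist_tracks` and `next` methods. `fetch_all_items` is a hypothetical helper, not something from the linked tutorial.

```python
def fetch_all_items(client, playlist_id):
    """Collect playlist items from every page, not only the first (hypothetical helper)."""
    page = client.playlist_tracks(playlist_id)
    items = list(page["items"])
    # client.next() follows the page's "next" link and returns None when there is none
    while page is not None and page.get("next"):
        page = client.next(page)
        if page is not None:
            items.extend(page["items"])
    # Skip entries whose "track" is missing, which can happen for unavailable tracks
    return [item for item in items if item.get("track")]
```

Inside `analyze_playlist`, `playlist = fetch_all_items(sp, playlist_id)` could stand in for the `sp.playlist_tracks(playlist_id)["items"]` line.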

In [5]:
def analyze_playlist_dict(playlist_dict):
    i = 0
    # Loop through every playlist in the dict and analyze it
    for name, playlist_id in playlist_dict.items():
        playlist_df = analyze_playlist(playlist_id)
        # Add a playlist column so that we can see which playlist a track belongs to
        playlist_df["playlist"] = name
        # Create or concat df
        if i == 0:
            playlist_dict_df = playlist_df
        else:
            playlist_dict_df = pd.concat([playlist_dict_df, playlist_df], ignore_index = True)
        i += 1
        
    return playlist_dict_df
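
A note on speed: `analyze_playlist` makes one `sp.audio_features` request per track, which adds up across seven playlists. `audio_features` also accepts a list of track IDs (up to 100 per call), so the requests can be batched. Here is a rough sketch under that assumption; `audio_features_by_id` is a hypothetical helper, and `client` stands in for the `sp` object.

```python
def audio_features_by_id(client, track_ids, batch_size=100):
    """Map each track ID to its audio-features dict, requesting IDs in batches."""
    features = {}
    for start in range(0, len(track_ids), batch_size):
        batch = track_ids[start:start + batch_size]
        # One request per batch; an entry can be None when no features are available
        for track_id, result in zip(batch, client.audio_features(batch)):
            if result is not None:
                features[track_id] = result
    return features
```

The returned dict can then be looked up by `track_id` while building each row, instead of calling the API inside the loop.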

To access the playlists, we need their playlist URIs. These can be found in the Spotify desktop app. The seven URIs for the playlists mentioned above are stored in the dictionary below.

In [6]:
decades_URIs = {
    'all_out_50s' : 'spotify:playlist:37i9dQZF1DWSV3Tk4GO2fq',
    'all_out_60s' : 'spotify:playlist:37i9dQZF1DXaKIA8E7WcJj',
    'all_out_70s' : 'spotify:playlist:37i9dQZF1DWTJ7xPn4vNaz',
    'all_out_80s' : 'spotify:playlist:37i9dQZF1DX4UtSsGT1Sbe',
    'all_out_90s' : 'spotify:playlist:37i9dQZF1DXbTxeAdrVG2l',
    'all_out_00s' : 'spotify:playlist:37i9dQZF1DX4o1oenSJRJd',
    'all_out_10s' : 'spotify:playlist:37i9dQZF1DX5Ejj0EkURtP'
}
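
A playlist URI has the form `spotify:playlist:<id>`, and the last segment is the same ID that appears in the playlist's share link. Spotipy accepts the full URI, so no conversion is needed here. If you ever need the bare IDs, a small helper along these lines would do; `playlist_id_from_uri` is a hypothetical name, not a Spotipy function.

```python
def playlist_id_from_uri(uri):
    """Return the ID part of a 'spotify:playlist:<id>' URI (hypothetical helper)."""
    parts = uri.split(":")
    if len(parts) != 3 or parts[0] != "spotify" or parts[1] != "playlist" or not parts[2]:
        raise ValueError(f"not a playlist URI: {uri!r}")
    return parts[2]

print(playlist_id_from_uri("spotify:playlist:37i9dQZF1DWSV3Tk4GO2fq"))
# prints 37i9dQZF1DWSV3Tk4GO2fq
```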

Now all we have to do is pass this dict into the second function above to get our beast of a dataframe.

In [None]:
df = analyze_playlist_dict(decades_URIs)

## Part 2: Data Cleaning

## Part 3: Data Exploration and Visualization

## Part 4: Data Analysis

## Part 5: Data Interpretation

https://towardsdatascience.com/how-to-create-large-music-datasets-using-spotipy-40e7242cc6a6
https://spotipy.readthedocs.io/en/2.16.1/#api-reference