# **Editing Kaggle Dataset for Time Comparison**
To analyse Spotify tracks across time more clearly, we used a Kaggle dataset that was specifically designed to comprise tracks from a wide range of eras.

However, this dataset did not contain the feature <u>'artist popularity'</u>. It was thus necessary to use the Spotify API to extract the artist popularity values of these tracks and add them to the dataset.

This Jupyter Notebook walks through the process of adding 'artist popularity' to the Kaggle dataset.

In [1]:
import csv
import spotipy
import pandas as pd
from spotipy.oauth2 import SpotifyClientCredentials
import time # time.sleep() is used throughout data extraction to avoid "max retries" errors from the API

client_id = '6f214ac01be74f798b00a6ca1cc14cb0' # our personal client_id 
client_secret = '131ee3fba4a6432fafb814657def5785' # our personal client_secret

# Obtain authorisation from Spotify
client_credentials_manager = SpotifyClientCredentials(client_id=client_id, client_secret=client_secret)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager, retries=0) 
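
Hard-coding the client ID and secret in a notebook makes them easy to leak when the notebook is shared. A minimal sketch of one alternative, assuming the credentials have been exported as environment variables: the helper `load_spotify_credentials` is hypothetical and not part of the original notebook, and the variable names `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` follow the convention used in spotipy's documentation.

```python
import os

def load_spotify_credentials(env=None):
    """Return (client_id, client_secret) read from environment variables.

    Hypothetical helper; raises a clear error if either variable is unset.
    """
    env = os.environ if env is None else env
    try:
        return env["SPOTIPY_CLIENT_ID"], env["SPOTIPY_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None

# Usage in the notebook (commented out so this sketch runs without credentials):
# client_id, client_secret = load_spotify_credentials()
# client_credentials_manager = SpotifyClientCredentials(client_id=client_id, client_secret=client_secret)
```

The `env` parameter exists only so the helper can be tried with a plain dictionary instead of the real environment.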

## **Retrieve Artist URI and Track URI from CSV file**
> Artist URI is necessary to retrieve the 'artist popularity' metric\
> Track URI is necessary to _filter out local files/tracks_ that do not exist in the Spotify database

In [7]:
df = pd.read_csv('datasets/top_10000_1960-now.csv')

artist_id = []
with open('datasets/top_10000_1960-now.csv', 'r') as file:
    reader = csv.reader(file)
    header = next(reader) # skip the header row

    # For tracks with multiple artists, the main (i.e. the first) artist
    # was chosen as the metric of popularity

    for line in reader:
        if 'local' not in line[0]: # filter out local tracks
            line[2] = line[2].split(',')[0] # only use the main artist
            artist_id.append(line[2])
        else:
            df.drop(df.loc[df['Track URI'] == line[0]].index, inplace=True) # drop local tracks from the DataFrame
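
The two row-level rules applied above (skip local tracks, keep only the first listed artist) can be sketched as small standalone functions. This is an illustration rather than the notebook's code: the names `is_local_track` and `main_artist` are hypothetical, and it assumes local tracks carry URIs beginning with `spotify:local:`, which is stricter than the substring test used in the cell. The local URI and the second artist URI in the sample rows are made-up placeholders.

```python
def is_local_track(track_uri):
    """True for local files, assumed to have URIs starting with 'spotify:local:'."""
    return track_uri.startswith("spotify:local:")

def main_artist(artist_uris):
    """Return the first URI from a comma-separated 'Artist URI(s)' field."""
    return artist_uris.split(",")[0].strip()

# (track URI, artist URI(s)) pairs; placeholders are marked
rows = [
    ("spotify:track:6a8GbQIlV8HBUW3c6Uk9PH", "spotify:artist:0TnOYISbd1XYRBk9myaseg"),
    ("spotify:local:Some+Artist:Some+Album:Some+Title:200", ""),  # made-up local URI
    ("spotify:track:5ydeCNaWDmFbu4zl0roPAH",
     "spotify:artist:4bmymFwDu9zLCiTRUmrewb, spotify:artist:PLACEHOLDER"),  # second URI is a placeholder
]

# Keep the main artist of every non-local track, preserving row order
artist_ids = [main_artist(artists) for uri, artists in rows if not is_local_track(uri)]
print(artist_ids)
```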

## **Helper Function to retrieve 'artist popularity'**

In [8]:
def get_artist_pop(artists, pop_list):
    """Append the popularity of each artist in `artists` to `pop_list`, querying the API in batches of 50."""
    for i in range(0, len(artists), 50):
        artist_interval = artists[i:i+50] # slicing stops at the end of the list, so the last batch may be shorter

        ## Retrieve artist information (i.e. popularity, genres, name)
        # Get artist objects
        artist_info = sp.artists(artist_interval) # returns a dict whose "artists" entry holds one artist object per URI, each with detailed info on the artist

        # Retrieve artist popularity
        popularity = [d['popularity'] for d in artist_info["artists"]]

        pop_list.extend(popularity)

        print(f"{i+50} tracks done", end="; ")
        time.sleep(2) # pause between requests
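
The batching logic in `get_artist_pop` can be exercised without network access by swapping in a stand-in client. The sketch below is an illustration under stated assumptions: `FakeSpotify`, `batched`, and `collect_popularity` are hypothetical names, and the fake's response only mimics the `{"artists": [...]}` shape that the helper reads from.

```python
def batched(items, size=50):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

class FakeSpotify:
    """Hypothetical stand-in for the client; records the size of each batch requested."""
    def __init__(self):
        self.batch_sizes = []

    def artists(self, uris):
        self.batch_sizes.append(len(uris))
        # Made-up popularity: the URI length, so the result is deterministic
        return {"artists": [{"popularity": len(uri)} for uri in uris]}

def collect_popularity(client, artist_uris, size=50):
    """Return one popularity value per URI, in input order."""
    popularity = []
    for batch in batched(artist_uris, size):
        response = client.artists(batch)
        popularity.extend(artist["popularity"] for artist in response["artists"])
    return popularity

fake = FakeSpotify()
uris = [f"spotify:artist:{n}" for n in range(120)]
result = collect_popularity(fake, uris)
print(fake.batch_sizes)  # [50, 50, 20]
```

Returning the list instead of extending one passed in by the caller is a design choice made for this sketch; it keeps the function free of side effects and easier to test.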

## **Extract data using API**

In [9]:
artist_pop = []

for i in range(0, 9901, 1000):
    print(f"Retrieving artist popularity for tracks {i} to {i+1000}")
    get_artist_pop(artist_id[i:i+1000], artist_pop)
    print()

Retrieving artist popularity for tracks 0 to 1000
50 tracks done; 100 tracks done; 150 tracks done; 200 tracks done; 250 tracks done; 300 tracks done; 350 tracks done; 400 tracks done; 450 tracks done; 500 tracks done; 550 tracks done; 600 tracks done; 650 tracks done; 700 tracks done; 750 tracks done; 800 tracks done; 850 tracks done; 900 tracks done; 950 tracks done; 1000 tracks done; 
Retrieving artist popularity for tracks 1000 to 2000
50 tracks done; 100 tracks done; 150 tracks done; 200 tracks done; 250 tracks done; 300 tracks done; 350 tracks done; 400 tracks done; 450 tracks done; 500 tracks done; 550 tracks done; 600 tracks done; 650 tracks done; 700 tracks done; 750 tracks done; 800 tracks done; 850 tracks done; 900 tracks done; 950 tracks done; 1000 tracks done; 
Retrieving artist popularity for tracks 2000 to 3000
50 tracks done; 100 tracks done; 150 tracks done; 200 tracks done; 250 tracks done; 300 tracks done; 350 tracks done; 400 tracks done; 450 tracks done; 500 tracks
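
Since many tracks share a main artist, the number of API requests could be reduced by looking up each distinct artist once and mapping the result back onto the track order. The notebook does not do this; the following is a sketch of the idea, where `popularity_per_track` is a hypothetical helper, `fetch_popularity` is a hypothetical callable standing in for the API lookup, and the sample values are made up.

```python
def popularity_per_track(artist_ids, fetch_popularity):
    """Look up each distinct artist once, then expand back to one value per track.

    `fetch_popularity` is a hypothetical callable: it takes a list of artist
    URIs and returns a list of popularity values in the same order.
    """
    unique_ids = list(dict.fromkeys(artist_ids))  # de-duplicate, keeping first-seen order
    lookup = dict(zip(unique_ids, fetch_popularity(unique_ids)))
    return [lookup[artist] for artist in artist_ids]

calls = []  # records what the fake lookup was asked for

def fake_fetch(ids):
    calls.append(list(ids))
    made_up = {"artist:a": 42, "artist:b": 84}  # invented values for illustration
    return [made_up[i] for i in ids]

track_artists = ["artist:a", "artist:b", "artist:a", "artist:a"]
per_track = popularity_per_track(track_artists, fake_fetch)
print(per_track)  # [42, 84, 42, 42]
```

The result has one entry per track, so it can be assigned to a DataFrame column in the same way as `artist_pop`.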

## **Write to new CSV file**

In [11]:
df["Artist Popularity"] = artist_pop # add artist popularity column
df

Unnamed: 0,Track URI,Track Name,Artist URI(s),Artist Name(s),Album URI,Album Name,Album Artist URI(s),Album Artist Name(s),Album Release Date,Album Image URL,...,Acousticness,Instrumentalness,Liveness,Valence,Tempo,Time Signature,Album Genres,Label,Copyrights,Artist Popularity
0,spotify:track:1XAZlnVtthcDZt2NI1Dtxo,Justified & Ancient - Stand by the Jams,spotify:artist:6dYrdRlNZSKaVxYg5IrvCH,The KLF,spotify:album:4MC0ZjNtVP1nDD5lsLxFjc,Songs Collection,spotify:artist:6dYrdRlNZSKaVxYg5IrvCH,The KLF,1992-08-03,https://i.scdn.co/image/ab67616d0000b27355346b...,...,0.015800,0.112000,0.4080,0.504,111.458,4.0,,Jams Communications,"C 1992 Copyright Control, P 1992 Jams Communic...",42
1,spotify:track:6a8GbQIlV8HBUW3c6Uk9PH,I Know You Want Me (Calle Ocho),spotify:artist:0TnOYISbd1XYRBk9myaseg,Pitbull,spotify:album:5xLAcbvbSAlRtPXnKkggXA,Pitbull Starring In Rebelution,spotify:artist:0TnOYISbd1XYRBk9myaseg,Pitbull,2009-10-23,https://i.scdn.co/image/ab67616d0000b27326d73a...,...,0.014200,0.000021,0.2370,0.800,127.045,4.0,,Mr.305/Polo Grounds Music/J Records,"P (P) 2009 RCA/JIVE Label Group, a unit of Son...",84
2,spotify:track:70XtWbcVZcpaOddJftMcVi,From the Bottom of My Broken Heart,spotify:artist:26dSoYclwsYLMAKD3tpOr4,Britney Spears,spotify:album:3WNxdumkSMGMJRhEgK80qx,...Baby One More Time (Digital Deluxe Version),spotify:artist:26dSoYclwsYLMAKD3tpOr4,Britney Spears,1999-01-12,https://i.scdn.co/image/ab67616d0000b2738e4986...,...,0.560000,0.000001,0.3380,0.706,74.981,4.0,,Jive,P (P) 1999 Zomba Recording LLC,81
3,spotify:track:1NXUWyPJk5kO6DQJ5t7bDu,Apeman - 2014 Remastered Version,spotify:artist:1SQRv42e4PjEYfPhS0Tk9E,The Kinks,spotify:album:6lL6HugNEN4Vlc8sj0Zcse,"Lola vs. Powerman and the Moneygoround, Pt. On...",spotify:artist:1SQRv42e4PjEYfPhS0Tk9E,The Kinks,2014-10-20,https://i.scdn.co/image/ab67616d0000b2731e7c53...,...,0.568000,0.000051,0.0384,0.833,75.311,4.0,,Sanctuary Records,"C © 2014 Sanctuary Records Group Ltd., a BMG C...",65
4,spotify:track:72WZtWs6V7uu3aMgMmEkYe,You Can't Always Get What You Want,spotify:artist:22bE4uQ6baNwSHPVcDxLCe,The Rolling Stones,spotify:album:0c78nsgqX6VfniSNWIxwoD,Let It Bleed,spotify:artist:22bE4uQ6baNwSHPVcDxLCe,The Rolling Stones,1969-12-05,https://i.scdn.co/image/ab67616d0000b27373d927...,...,0.675000,0.000073,0.2890,0.497,85.818,4.0,,Universal Music Group,"C © 2002 ABKCO Music & Records Inc., P ℗ 2002 ...",80
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9994,spotify:track:3kcKlOkQQEPVwxwljbGJ5p,Kernkraft 400 (A Better Day),"spotify:artist:0u6GtibW46tFX7koQ6uNJZ, spotify...","Topic, A7S",spotify:album:2NIChqkijGw4r4Dqfmg0A3,Kernkraft 400 (A Better Day),"spotify:artist:0u6GtibW46tFX7koQ6uNJZ, spotify...","Topic, A7S",2022-06-17,https://i.scdn.co/image/ab67616d0000b273e1cafe...,...,0.184000,0.000020,0.3090,0.400,125.975,4.0,,Virgin,"C © 2022 Topic, under exclusive license to Uni...",73
9995,spotify:track:5k9QrzJFDAp5cXVdzAi02f,Never Say Never - Radio Edit,spotify:artist:1ScZSjoYAihNNm9qlhzDnL,Vandalism,spotify:album:2n506u3HKN3CaEDvAjv5Ct,Never Say Never,spotify:artist:1ScZSjoYAihNNm9qlhzDnL,Vandalism,2005-10-24,https://i.scdn.co/image/ab67616d0000b273b65ad4...,...,0.000354,0.011200,0.3380,0.767,130.978,4.0,,Vicious,"C 2005 Vicious, a division of Vicious Recordin...",22
9996,spotify:track:5ydeCNaWDmFbu4zl0roPAH,Groovejet (If This Ain't Love) [feat. Sophie E...,"spotify:artist:4bmymFwDu9zLCiTRUmrewb, spotify...","Spiller, Sophie Ellis-Bextor",spotify:album:20Q3pGpYiyicF32x5L8ppH,Groovejet (If This Ain't Love) [feat. Sophie E...,spotify:artist:4bmymFwDu9zLCiTRUmrewb,Spiller,2000-08-14,https://i.scdn.co/image/ab67616d0000b27342781a...,...,0.000132,0.088900,0.3610,0.626,123.037,4.0,,Defected Records,"C © 2021 Defected Records Limited, P ℗ 2021 De...",44
9997,spotify:track:0zKbDrEXKpnExhGQRe9dxt,Lay Low,spotify:artist:2o5jDhtHVPhrJdv3cEQ99Z,Tiësto,spotify:album:0EYKSXXTsON8ZA95BuCoXn,Lay Low,spotify:artist:2o5jDhtHVPhrJdv3cEQ99Z,Tiësto,2023-01-06,https://i.scdn.co/image/ab67616d0000b273c8fdaf...,...,0.060700,0.000263,0.3460,0.420,122.060,4.0,,Musical Freedom,"C © 2023 Musical Freedom Label Ltd., P ℗ 2023 ...",85


In [12]:
df.to_csv('datasets/time_comparison_dataset.csv', index=True) # write the updated dataset to 'datasets/time_comparison_dataset.csv'
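
Writing with `index=True` stores the DataFrame index as an extra, unnamed first column, which pandas labels `Unnamed: 0` when the file is read back with default settings. A small sketch, using a made-up two-row frame and in-memory buffers as stand-ins for the real dataset and file, shows the difference between keeping and dropping the index:

```python
import io
import pandas as pd

# Made-up miniature frame standing in for the real dataset
sample = pd.DataFrame({"Track Name": ["A", "B"], "Artist Popularity": [42, 84]})

# Round trip with the index written out
with_index = io.StringIO()
sample.to_csv(with_index, index=True)
with_index.seek(0)
reread_with_index = pd.read_csv(with_index)

# Round trip with the index omitted
without_index = io.StringIO()
sample.to_csv(without_index, index=False)
without_index.seek(0)
reread_without_index = pd.read_csv(without_index)

print(list(reread_with_index.columns))     # includes 'Unnamed: 0'
print(list(reread_without_index.columns))  # only the original columns
```

Passing `index_col=0` to `pd.read_csv` is another way to restore the index on reload instead of gaining an extra column.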