### Spotify Initial Data Collection ###

This notebook uses Spotipy, a Python client for the Spotify Web API, to gather track-level data maintained by Spotify. It produces a dataset describing each of the 50 most popular songs on Spotify in the United States in 2017, as ranked by Spotify and published as an [ordered playlist](https://open.spotify.com/user/spotify/playlist/37i9dQZF1DX7Axsg3uaDZb?si=Yf6l20lBTWu9BzquG35UKg) at the end of the year.

Source: Spotify Web API, [Top Tracks of 2017: USA](https://open.spotify.com/user/spotify/playlist/37i9dQZF1DX7Axsg3uaDZb?si=Yf6l20lBTWu9BzquG35UKg)

Downloaded: 11/22/2021

Srinidhi Ramakrishna

In [1]:
# Importing packages
import spotipy
import time
import pandas as pd
from spotipy.oauth2 import SpotifyClientCredentials

In [2]:
# Loading my developer client ID and client secret from environment variables
# (SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET) rather than hardcoding them
import os

cid = os.environ['SPOTIPY_CLIENT_ID']
secret = os.environ['SPOTIPY_CLIENT_SECRET']
client_credentials_manager = SpotifyClientCredentials(client_id=cid, client_secret=secret)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)

In [3]:
# Collecting track IDs based on the playlist URL
def getTrackIDs(user, playlist_id):
    ids = []
    playlist = sp.user_playlist(user, playlist_id)
    for item in playlist['tracks']['items']:
        track = item['track']
        ids.append(track['id'])
    return ids

ids = getTrackIDs('spotify', '37i9dQZF1DX7Axsg3uaDZb')
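
`getTrackIDs` reads only the first page of items returned with the playlist (up to 100), which is enough for this 50-track playlist. For longer playlists, a paginated variant might look like the sketch below. It assumes a Spotipy client that exposes `playlist_items` and `next`; the function name `collect_track_ids` and the `client` parameter are illustrative and not part of the original notebook.

```python
def collect_track_ids(client, playlist_id, page_size=100):
    """Return the IDs of every track in a playlist, following pagination.

    `client` is assumed to be a spotipy.Spotify instance (or any object
    with compatible `playlist_items` and `next` methods).
    """
    ids = []
    page = client.playlist_items(playlist_id, limit=page_size)
    while page:
        for item in page['items']:
            track = item.get('track')
            # Skip removed or local entries that carry no track ID
            if track and track.get('id'):
                ids.append(track['id'])
        # 'next' holds the URL of the following page, or None on the last one
        page = client.next(page) if page.get('next') else None
    return ids
```

Under these assumptions, `ids = collect_track_ids(sp, '37i9dQZF1DX7Axsg3uaDZb')` would stand in for the call above.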

In [4]:
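# Collecting track metadata and audio features for each song
def getTrackFeatures(track_id):
  meta = sp.track(track_id)
  # audio_features returns a list, so features[0] is this track's entry
  features = sp.audio_features(track_id)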

  # meta
  name = meta['name']
  album = meta['album']['name']
  artist = meta['album']['artists'][0]['name']
  release_date = meta['album']['release_date']
  duration_ms = meta['duration_ms']
  popularity = meta['popularity']
  explicit = meta['explicit']
    
  # features
  acousticness = features[0]['acousticness']
  danceability = features[0]['danceability']
  energy = features[0]['energy']
  instrumentalness = features[0]['instrumentalness']
  liveness = features[0]['liveness']
  loudness = features[0]['loudness']
  speechiness = features[0]['speechiness']
  tempo = features[0]['tempo']
  time_signature = features[0]['time_signature']
  positiveness = features[0]['valence']


  # One row per track, in the same order as the DataFrame columns below
  track = [name, album, artist, release_date, duration_ms, popularity, acousticness, danceability, energy, instrumentalness, liveness, loudness, speechiness, tempo, time_signature, positiveness, explicit]
  return track

In [5]:
# Looping over track IDs to collect one row of track-level metrics per song
tracks = []
for track_id in ids:
  time.sleep(.5)  # brief pause between requests
  tracks.append(getTrackFeatures(track_id))

In [6]:
# Creating dataset
df = pd.DataFrame(tracks, columns = ['name', 'album', 'artist', 'release_date', 'duration_ms', 'popularity', 'acousticness', 'danceability', 'energy', 'instrumentalness', 'liveness', 'loudness', 'speechiness', 'tempo', 'time_signature', 'positiveness', 'explicit'])
df.to_csv("../data/spotify2017raw.csv", sep = ',')
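
The loop in cell [5] makes two API calls per track. Because `sp.audio_features` accepts a list of track IDs (up to 100 per request), the audio features could instead be fetched in batches. The sketch below shows one way to do that; `chunked` and `fetch_audio_features` are hypothetical helper names, and `client` is assumed to be the `sp` object created earlier.

```python
def chunked(seq, size):
    """Yield consecutive slices of `seq` with at most `size` elements."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def fetch_audio_features(client, track_ids, batch_size=100):
    """Map each track ID to its audio-features dict, one request per batch."""
    features_by_id = {}
    for batch in chunked(track_ids, batch_size):
        # audio_features returns one entry per requested ID, in order;
        # an entry is None when no features are available for that track
        for track_id, features in zip(batch, client.audio_features(batch)):
            features_by_id[track_id] = features
    return features_by_id
```

With 50 track IDs this would need a single audio-features request, for example `features_by_id = fetch_audio_features(sp, ids)`, while the metadata from `sp.track` would still be fetched per track.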