# Billboard Hot 100 Turns 60: Top 600 Songs Of All Time

In honor of the 60th anniversary of the Billboard Hot 100 singles chart on August 4, 2018,
Billboard created this [special chart of the 600 biggest songs of all time.](https://open.spotify.com/playlist/0X9hkrRqCCP69Ze1MheAda)

We'll use the Spotipy library to pull data from the Spotify Web API. Besides extracting the 600 songs in the Billboard playlist, we'll also pull audio features for each song, such as danceability, loudness, and energy.

In [65]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
from configparser import ConfigParser
import pandas as pd
from pprint import pprint
import datetime
from datetime import timedelta

## Data Extraction

In [66]:
config = ConfigParser()
config.read('notebook.cfg')
client_id = config['spotify_api']['client_id']
client_secret = config['spotify_api']['client_secret']

#Authenticating with Spotipy
client_credentials_manager = SpotifyClientCredentials(client_id = client_id, client_secret = client_secret)
sp = spotipy.Spotify(client_credentials_manager = client_credentials_manager)
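
The cell above expects a `notebook.cfg` file with a `[spotify_api]` section holding `client_id` and `client_secret`. As a minimal sketch of that layout, the snippet below parses an in-memory copy of such a file. The credential values are placeholders and `load_credentials` is a hypothetical helper, not part of the notebook.

```python
from configparser import ConfigParser

# Hypothetical contents of notebook.cfg; the values are placeholders, not real keys.
SAMPLE_CFG = """
[spotify_api]
client_id = your-client-id
client_secret = your-client-secret
"""


def load_credentials(cfg_text):
    """Parse config text and return (client_id, client_secret)."""
    config = ConfigParser()
    config.read_string(cfg_text)
    section = config["spotify_api"]
    return section["client_id"], section["client_secret"]


demo_id, demo_secret = load_credentials(SAMPLE_CFG)
print(demo_id, demo_secret)
```

Keeping the credentials in a separate file like this means the notebook itself can be shared without exposing them.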

### 1. Playlist Items
Since the playlist_items() function returns at most 100 tracks per call, we'll write a custom generator that pages through all tracks in [the Top 600 Songs list](https://open.spotify.com/playlist/0X9hkrRqCCP69Ze1MheAda).

In [67]:
# Playlist_tracks
def get_playlist_tracks(playlist_id):
    
    pl_id = 'spotify:playlist:'+ playlist_id
    offset = 0
    print(pl_id)
    while True:
        response = sp.playlist_items(pl_id,
                                 offset=offset,
                                 fields='items(track(id,name,artists(name),album(name, album_type, release_date),popularity,duration_ms))'
                                    )
    
        if len(response['items']) == 0:
            break
        
        for item in response['items']:
            yield(item["track"])

        offset = offset + len(response['items'])
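
The offset loop in `get_playlist_tracks` can be tried without network access. The sketch below applies the same pattern to a stand-in data source. `paginate`, `fake_fetch`, and `FAKE_TRACKS` are illustrative names, not Spotipy functions.

```python
def paginate(fetch_page, page_size=100):
    """Yield every item from an offset-based source, one page at a time.

    fetch_page(offset, limit) must return a list of items (empty when done).
    """
    offset = 0
    while True:
        items = fetch_page(offset, page_size)
        if not items:
            break
        yield from items
        # Advance by the number of items actually received.
        offset += len(items)


# Stand-in for the API: 250 fake tracks served in slices.
FAKE_TRACKS = [f"track-{i}" for i in range(250)]


def fake_fetch(offset, limit):
    return FAKE_TRACKS[offset:offset + limit]


all_tracks = list(paginate(fake_fetch))
print(len(all_tracks))  # 250
```

Advancing the offset by the number of items received, rather than by a fixed 100, keeps the loop correct even if a page comes back shorter than requested.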

In [68]:
playlist_link = 'https://open.spotify.com/playlist/0X9hkrRqCCP69Ze1MheAda'
playlist_id = playlist_link.split("/")[-1].split("?")[0]
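
The split-based extraction above takes the last path segment and drops any query string. A slightly more explicit variant, sketched here with the standard library's `urlparse`, gives the same result. `playlist_id_from_link` is a hypothetical helper and the `?si=abc123` suffix is a made-up query string for illustration.

```python
from urllib.parse import urlparse


def playlist_id_from_link(link):
    """Return the last path segment of a playlist URL, ignoring the query string."""
    path = urlparse(link).path  # e.g. '/playlist/<id>'
    return path.rstrip("/").split("/")[-1]


link = "https://open.spotify.com/playlist/0X9hkrRqCCP69Ze1MheAda"
print(playlist_id_from_link(link))                 # 0X9hkrRqCCP69Ze1MheAda
print(playlist_id_from_link(link + "?si=abc123"))  # 0X9hkrRqCCP69Ze1MheAda
```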

In [129]:
#sp.playlist_items(playlist_id)

{'href': 'https://api.spotify.com/v1/playlists/0X9hkrRqCCP69Ze1MheAda/tracks?offset=0&limit=100&additional_types=track%2Cepisode',
 'items': [{'added_at': '2018-08-08T20:04:21Z',
   'added_by': {'external_urls': {'spotify': 'https://open.spotify.com/user/sean081978'},
    'href': 'https://api.spotify.com/v1/users/sean081978',
    'id': 'sean081978',
    'type': 'user',
    'uri': 'spotify:user:sean081978'},
   'is_local': False,
   'primary_color': None,
   'track': {'album': {'album_type': 'compilation',
     'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/7qQJQ3YtcGlqaLg5tcypN2'},
       'href': 'https://api.spotify.com/v1/artists/7qQJQ3YtcGlqaLg5tcypN2',
       'id': '7qQJQ3YtcGlqaLg5tcypN2',
       'name': 'Chubby Checker',
       'type': 'artist',
       'uri': 'spotify:artist:7qQJQ3YtcGlqaLg5tcypN2'}],
     'available_markets': ['AD',
      'AE',
      'AG',
      'AL',
      'AM',
      'AO',
      'AR',
      'AT',
      'AU',
      'AZ',
      'BA',


In [69]:
track_list = [] 
for track in get_playlist_tracks(playlist_id):
    
    track_id = track['id']
    track_name = track['name']
     
    artist_name = track['artists'][0]['name']
   
    album = track['album']['name']
    album_type = track['album']['album_type']
    release_date = track['album']['release_date']

    track_pop = track['popularity']
    track_duration = track['duration_ms']
    
    newlist =  [track_id, track_name, track_duration, track_pop, artist_name, album, album_type, release_date ]
    
    track_list.append(newlist)

spotify:playlist:0X9hkrRqCCP69Ze1MheAda


In [29]:
#track_list

[['3ohLnESFgYACPMCkoTOzqE',
  'The Twist',
  153760,
  53,
  'Chubby Checker',
  'The Best Of Chubby Checker 1959-1963',
  'compilation',
  '2005-01-01'],
 ['0n2SEXB2qoRQg171q7XqeW',
  'Smooth (feat. Rob Thomas)',
  294986,
  69,
  'Santana',
  'Supernatural (Remastered)',
  'album',
  '1999-06-15'],
 ['3E5ndyOfO6vFDEIE42HA8o',
  'Mack the Knife',
  184333,
  61,
  'Bobby Darin',
  "That's All",
  'album',
  '1959'],
 ['32OlwWuMpZ6b0aN2RZOeMS',
  'Uptown Funk (feat. Bruno Mars)',
  269666,
  83,
  'Mark Ronson',
  'Uptown Special',
  'album',
  '2015-01-12'],
 ['7BD50ATrF3Vab5FQy7vtK8',
  'How Do I Live',
  266973,
  68,
  'LeAnn Rimes',
  'Greatest Hits',
  'compilation',
  '2003-11-18'],
 ['0IkKz2J93C94Ei4BvDop7P',
  'Party Rock Anthem',
  262173,
  69,
  'LMFAO',
  'Sorry For Party Rocking',
  'album',
  '2011-01-01'],
 ['4vp2J1l5RD4gMZwGFLfRAu',
  'I Gotta Feeling',
  289133,
  11,
  'Black Eyed Peas',
  'THE E.N.D. (THE ENERGY NEVER DIES)',
  'album',
  '2009-01-01'],
 ['2df5QsXuc

In [70]:
playlist_df = pd.DataFrame(track_list)
playlist_df.columns = ['track_id', 'track_name', 'duration', 'track_pop', 'artist', 'album', 'album_type', 'release_date']
#playlist_df.info()

### Data transformation

In [71]:
#Convert 'duration' from milliseconds to seconds.
playlist_df['duration_sec'] = pd.to_timedelta(playlist_df.duration, unit='ms')
playlist_df.duration_sec = playlist_df.duration_sec.dt.total_seconds().astype(int)
#playlist_df.head()

Another option is to convert the 'duration' column to an easy-to-read MM:SS format using pandas' to_datetime() and strftime().

In [72]:
playlist_df['duration'] = pd.to_datetime(playlist_df['duration'],
             unit='ms').dt.strftime('%M:%S:%f').str[:5]
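
To make the two conversions above concrete, here is the same arithmetic in plain Python, applied to the duration of "The Twist" (153760 ms) from the track list. `ms_to_seconds` and `ms_to_mmss` are illustrative helpers, not part of the notebook.

```python
def ms_to_seconds(duration_ms):
    """Whole seconds, dropping the fractional part (as .astype(int) does above)."""
    return duration_ms // 1000


def ms_to_mmss(duration_ms):
    """Format a duration in milliseconds as zero-padded MM:SS."""
    minutes, seconds = divmod(duration_ms // 1000, 60)
    return f"{minutes:02d}:{seconds:02d}"


print(ms_to_seconds(153760))  # 153
print(ms_to_mmss(153760))     # 02:33
```

One difference worth noting: the `%M` format wraps after 59 minutes, while this version keeps counting, so a 75-minute track would print as `75:00` here. That does not matter for chart singles of a few minutes each.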

In [73]:
columns = ['track_id', 'track_name','track_pop', 'duration','duration_sec', 'artist', 'album', 'album_type', 'release_date']
playlist_df = playlist_df[columns]

In [74]:
playlist_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 600 entries, 0 to 599
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   track_id      600 non-null    object
 1   track_name    600 non-null    object
 2   track_pop     600 non-null    int64 
 3   duration      600 non-null    object
 4   duration_sec  600 non-null    int64 
 5   artist        600 non-null    object
 6   album         600 non-null    object
 7   album_type    600 non-null    object
 8   release_date  600 non-null    object
dtypes: int64(2), object(7)
memory usage: 42.3+ KB


### 2. Track Audio Features
To pull the audio features for each track in [the Top 600 Songs list](https://open.spotify.com/playlist/0X9hkrRqCCP69Ze1MheAda), we'll use Spotipy's audio_features() function.

In [75]:
track_ids = []

for track in track_list:
    track_id = track[0]
    track_ids.append(track_id)

len(track_ids)

600

In [76]:
audiofeatures_list = [] 

for track_id in track_ids :
    features = sp.audio_features(track_id)[0]
    danc = features['danceability']
    enrg = features['energy']
    key = features['key']
    loud = features['loudness']
    mode = features['mode']
    spch = features['speechiness']
    acou = features['acousticness']
    inst = features['instrumentalness']
    live = features['liveness']
    valn = features['valence']
    temp = features['tempo']
    
    flist = [track_id, danc, enrg, key, loud, mode, spch, acou, inst, live, valn, temp]
    audiofeatures_list.append(flist)
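
The loop above makes one request per track, 600 in total. A possible alternative is to send the IDs in batches, assuming `sp.audio_features` accepts a list of up to 100 IDs per call, as the Spotipy documentation describes. The sketch below shows only the batching logic and uses a stand-in (`fake_batch`) so it runs offline. `chunked` and `fetch_features` are hypothetical helpers.

```python
def chunked(ids, size=100):
    """Split a list into consecutive slices of at most `size` elements."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_features(ids, fetch_batch, size=100):
    """Call fetch_batch once per chunk and collect the non-empty results.

    fetch_batch is any callable that takes a list of IDs and returns a list
    of feature dicts (entries may be None for tracks without features).
    """
    results = []
    for batch in chunked(ids, size):
        results.extend(f for f in fetch_batch(batch) if f is not None)
    return results


# Stand-in for the real API call so the sketch runs without credentials.
def fake_batch(batch):
    return [{"id": track_id, "tempo": 120.0} for track_id in batch]


demo_ids = [f"id-{i}" for i in range(600)]
features = fetch_features(demo_ids, fake_batch)
print(len(list(chunked(demo_ids))), len(features))  # 6 600
```

Under that assumption, `fetch_features(track_ids, sp.audio_features)` would need 6 requests for the 600 tracks instead of 600.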

In [82]:
features_df = pd.DataFrame(audiofeatures_list)
features_df.columns = ['track_id', 'danc', 'enrg', 'key', 'loud', 'mode', 'spch', 'acou', 'inst', 'live', 'valn', 'temp']

In [88]:
features_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 600 entries, 0 to 599
Data columns (total 12 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   track_id  600 non-null    object 
 1   danc      600 non-null    float64
 2   enrg      600 non-null    float64
 3   key       600 non-null    int64  
 4   loud      600 non-null    float64
 5   mode      600 non-null    int64  
 6   spch      600 non-null    float64
 7   acou      600 non-null    float64
 8   inst      600 non-null    float64
 9   live      600 non-null    float64
 10  valn      600 non-null    float64
 11  temp      600 non-null    float64
dtypes: float64(9), int64(2), object(1)
memory usage: 56.4+ KB


In [89]:
# Join playlist metadata and audio features on their shared 'track_id' column.
top600_list = pd.merge(playlist_df, features_df, on="track_id", how="inner")

In [90]:
top600_list.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 600 entries, 0 to 599
Data columns (total 20 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   track_id      600 non-null    object 
 1   track_name    600 non-null    object 
 2   track_pop     600 non-null    int64  
 3   duration      600 non-null    object 
 4   duration_sec  600 non-null    int64  
 5   artist        600 non-null    object 
 6   album         600 non-null    object 
 7   album_type    600 non-null    object 
 8   release_date  600 non-null    object 
 9   danc          600 non-null    float64
 10  enrg          600 non-null    float64
 11  key           600 non-null    int64  
 12  loud          600 non-null    float64
 13  mode          600 non-null    int64  
 14  spch          600 non-null    float64
 15  acou          600 non-null    float64
 16  inst          600 non-null    float64
 17  live          600 non-null    float64
 18  valn          600 non-null    