## Lab | Web Scraping Single Page (GNOD part 1)
### Business goal
Check the case_study_gnod.md file.

Make sure you've understood the big picture of your project:

- the goal of the company (Gnod),
- their current product (Gnoosic),
- their strategy, and
- how your project fits into this context.

Re-read the business case and the e-mail from the CTO.

### Instructions - Scraping popular songs
Your product will take a song as input from the user and output another song (the recommendation). In most cases the recommended song should be similar to the input song, but the CTO thinks that if the input song is currently on the top charts, the user will also enjoy a recommendation of another currently popular song.

You have to find data on the internet about currently popular songs. Popvortex maintains a weekly Top 100 of "hot" songs here: http://www.popvortex.com/music/charts/top-100-songs.php.

It's a good place to start! Scrape the current top 100 songs and their respective artists, and put the information into a pandas dataframe.

In [107]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np
from pandas import json_normalize
pd.set_option("display.max_columns", 0)

In [2]:
url = "https://www.popvortex.com/music/charts/top-100-songs.php"

In [3]:
# 3.0 download html with a get request
response = requests.get(url)
response.status_code # 200 status code means OK!

200

In [4]:
# 4.1 parse html (create the 'soup')
soup = BeautifulSoup(response.content, "html.parser")

In [5]:
# 4.2. check that the html code looks like it should
# soup

In [6]:
# Select the rank, title and artist elements with CSS selectors
board_positions = soup.select(".cover-art.col-xs-12.col-sm-4 > p")
titles = soup.select(".chart-content.col-xs-12.col-sm-8 > p > cite")
artists = soup.select(".chart-content.col-xs-12.col-sm-8 > p > em")

# Extract the text of each matched element
board_position_text = [board_position.get_text() for board_position in board_positions]
title_text = [title.get_text() for title in titles]
artist_text = [artist.get_text() for artist in artists]

leader_board_songs_info = {
    "Ranking": board_position_text,
    "Title": title_text,
    "Artist": artist_text
}

singer_songs_list = pd.DataFrame(leader_board_songs_info)
singer_songs_list.head(20)

Unnamed: 0,Ranking,Title,Artist
0,1,Lovin On Me,Jack Harlow
1,2,Lil Boo Thang,Paul Russell
2,3,I Remember Everything (feat. Kacey Musgraves),Zach Bryan
3,4,White Horse,Chris Stapleton
4,5,Save Me (with Lainey Wilson),Jelly Roll
5,6,Lovin On Me,Jack Harlow
6,7,Need A Favor,Jelly Roll
7,8,90s Rap Mashup,Austin Williams
8,9,Cruel Summer,Taylor Swift
9,10,Daylight,David Kushner
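
The three `soup.select` calls in the cell above build three independent lists, so the columns only line up if every chart entry contains exactly one rank, one `cite` and one `em`. A minimal sketch of a per-entry alternative follows. The HTML snippet, the `chart-entry` and `rank` class names, and the `parse_entries` helper are all made up for illustration; they are not Popvortex's real markup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a simplified stand-in for a chart page, not the real site.
SAMPLE_HTML = """
<div class="chart-entry"><p class="rank">1</p><p><cite>Song A</cite><em>Artist A</em></p></div>
<div class="chart-entry"><p class="rank">2</p><p><cite>Song B</cite></p></div>
<div class="chart-entry"><p class="rank">3</p><p><cite>Song C</cite><em>Artist C</em></p></div>
"""

def parse_entries(html):
    """Extract one record per chart entry so rank, title and artist stay aligned."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for entry in soup.select("div.chart-entry"):
        rank = entry.select_one("p.rank")
        title = entry.select_one("cite")
        artist = entry.select_one("em")
        rows.append({
            "Ranking": rank.get_text(strip=True) if rank else None,
            "Title": title.get_text(strip=True) if title else None,
            "Artist": artist.get_text(strip=True) if artist else None,
        })
    return rows

rows = parse_entries(SAMPLE_HTML)
```

Because each record is built from a single entry, a missing artist becomes `None` in that row instead of shifting every later artist up by one. The resulting list of dicts can be passed directly to `pd.DataFrame`.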


## Lab | Web Scraping Multiple Pages
### Expand the project Part 1

In [7]:
url2 = "https://www.billboard.com/charts/hot-100/"

In [8]:
response2 = requests.get(url2)
response2.status_code

200

In [9]:
soup2 = BeautifulSoup(response2.content, "html.parser")

In [10]:
titles = []
artists = []

for i in range(1, 110): 
    title_selector = f"#post-1479786 > div.pmc-paywall > div > div > div > div.chart-results-list.\/\/.lrv-u-padding-t-150.lrv-u-padding-t-050\\@mobile-max > div:nth-child({i + 1}) > ul > li.lrv-u-width-100p > ul > li.o-chart-results-list__item.\/\/.lrv-u-flex-grow-1.lrv-u-flex.lrv-u-flex-direction-column.lrv-u-justify-content-center.lrv-u-border-b-1.u-border-b-0\\@mobile-max.lrv-u-border-color-grey-light.lrv-u-padding-l-1\\@mobile-max > h3"
    artist_selector = f"#post-1479786 > div.pmc-paywall > div > div > div > div.chart-results-list.\/\/.lrv-u-padding-t-150.lrv-u-padding-t-050\\@mobile-max > div:nth-child({i + 1}) > ul > li.lrv-u-width-100p > ul > li.o-chart-results-list__item.\/\/.lrv-u-flex-grow-1.lrv-u-flex.lrv-u-flex-direction-column.lrv-u-justify-content-center.lrv-u-border-b-1.u-border-b-0\\@mobile-max.lrv-u-border-color-grey-light.lrv-u-padding-l-1\\@mobile-max > span"

    title_element = soup2.select_one(title_selector)
    artist_element = soup2.select_one(artist_selector)

    # Append only complete pairs so that titles and artists stay aligned
    if title_element and artist_element:
        titles.append(title_element.get_text(strip=True))
        artists.append(artist_element.get_text(strip=True))

song_list = {'Title': titles, 'Artist': artists}
df_song = pd.DataFrame(song_list, index=range(1, len(titles) + 1))
df_song

Unnamed: 0,Title,Artist
1,Cruel Summer,Taylor Swift
2,Lovin On Me,Jack Harlow
3,Paint The Town Red,Doja Cat
4,Snooze,SZA
5,Is It Over Now? (Taylor's Version) [From The V...,Taylor Swift
...,...,...
96,Mi Ex Tenia Razon,Karol G
97,Different 'Round Here,Riley Green Featuring Luke Combs
98,But I Got A Beer In My Hand,Luke Bryan
99,Better Than Ever,YoungBoy Never Broke Again & Rod Wave


In [63]:
print('Please Enter Your Favourite Song\'s Title and Artist !!!')
title = input('Title:').strip()
artist = input('Artist:').strip()

Please Enter Your Favourite Song's Title and Artist !!!
Title:Snooze
Artist:SZA


In [72]:
def is_it_hot_song(title, artist):
    """Return True if the (title, artist) pair is on the scraped Billboard chart."""
    for t, a in zip(df_song.Title, df_song.Artist):
        if title == t and artist == a:
            return True
    return False

is_hot = is_it_hot_song(title, artist)

if is_hot:
    print('Enjoy Another Hot song of billboard')
    display(df_song.sample())

Enjoy Another Hot song of billboard


Unnamed: 0,Title,Artist
31,Stick Season,Noah Kahan
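
The check above compares strings exactly, and `df_song.sample()` can return the very song the user entered. The following is a rough sketch of one way to handle both points, assuming a chart DataFrame with `Title` and `Artist` columns like `df_song`. The `recommend_hot` helper and the three-row sample chart are illustrative only.

```python
import pandas as pd

def recommend_hot(chart, title, artist):
    """Return a random chart row other than the input song, or None if the input is not on the chart."""
    # Case-insensitive comparison, ignoring surrounding whitespace in the user input
    title_match = chart["Title"].str.casefold() == title.strip().casefold()
    artist_match = chart["Artist"].str.casefold() == artist.strip().casefold()
    is_input = title_match & artist_match
    if not is_input.any():
        return None
    others = chart[~is_input]  # never recommend the song the user just entered
    if others.empty:
        return None
    return others.sample(1)

# Small sample chart for illustration
chart = pd.DataFrame({
    "Title": ["Snooze", "Cruel Summer", "Lovin On Me"],
    "Artist": ["SZA", "Taylor Swift", "Jack Harlow"],
})
pick = recommend_hot(chart, "snooze", " sza ")
```

Returning `None` for songs that are not on the chart leaves room for the similarity-based recommendation described in the instructions.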


## Fetch Songs from Spotify

#### Authentication

In [84]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

In [85]:
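# Read "key: value" pairs from secrets.txt into a dict
secrets_dict = {}
with open("secrets.txt", "r") as file:  # the file is closed automatically
    for line in file.read().split('\n'):
        if len(line) > 0:
            key, value = line.split(':', 1)  # split on the first colon only
            secrets_dict[key.strip()] = value.strip()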

#Initialize SpotiPy with user credentials
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=secrets_dict['ClientID'],
                                                           client_secret=secrets_dict['ClientSecret']))
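
The parsing step above can be isolated into a small function that is easy to check without touching the real credentials file. This is a sketch under a few assumptions: the file holds one `key: value` pair per line, and lines that are blank, start with `#`, or contain no colon are skipped (the comment convention is an assumption, not something the lab specifies). `parse_secrets` is a hypothetical helper name.

```python
def parse_secrets(text):
    """Parse 'key: value' lines into a dict, splitting only on the first colon."""
    secrets = {}
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines, '#' comments and lines without a colon are ignored
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        secrets[key.strip()] = value.strip()
    return secrets

# Dummy values for illustration; never commit real credentials
example = "ClientID: abc123\nClientSecret: s3cr:et\n\n"
parsed = parse_secrets(example)
```

Splitting on the first colon only keeps a value intact even when it contains a colon itself.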

### Playlists

In [99]:
play_list_id = "1hMzceeWw7QiI6vaBkcEJO"

playlist = sp.user_playlist_tracks("spotify", play_list_id)
playlist

{'href': 'https://api.spotify.com/v1/playlists/1hMzceeWw7QiI6vaBkcEJO/tracks?offset=0&limit=100&additional_types=track',
 'items': [{'added_at': '2021-11-26T09:03:43Z',
   'added_by': {'external_urls': {'spotify': 'https://open.spotify.com/user/massivemusic.com'},
    'href': 'https://api.spotify.com/v1/users/massivemusic.com',
    'id': 'massivemusic.com',
    'type': 'user',
    'uri': 'spotify:user:massivemusic.com'},
   'is_local': False,
   'primary_color': None,
   'track': {'album': {'album_type': 'single',
     'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/7abqAQhqbQVO2WgB2twSQL'},
       'href': 'https://api.spotify.com/v1/artists/7abqAQhqbQVO2WgB2twSQL',
       'id': '7abqAQhqbQVO2WgB2twSQL',
       'name': 'MassiveMusic',
       'type': 'artist',
       'uri': 'spotify:artist:7abqAQhqbQVO2WgB2twSQL'}],
     'available_markets': ['AR',
      'AU',
      'AT',
      'BE',
      'BO',
      'BR',
      'BG',
      'CA',
      'CL',
      'CO',
     

In [100]:
playlist["total"] 

10000

In [101]:
playlist['items']

[{'added_at': '2021-11-26T09:03:43Z',
  'added_by': {'external_urls': {'spotify': 'https://open.spotify.com/user/massivemusic.com'},
   'href': 'https://api.spotify.com/v1/users/massivemusic.com',
   'id': 'massivemusic.com',
   'type': 'user',
   'uri': 'spotify:user:massivemusic.com'},
  'is_local': False,
  'primary_color': None,
  'track': {'album': {'album_type': 'single',
    'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/7abqAQhqbQVO2WgB2twSQL'},
      'href': 'https://api.spotify.com/v1/artists/7abqAQhqbQVO2WgB2twSQL',
      'id': '7abqAQhqbQVO2WgB2twSQL',
      'name': 'MassiveMusic',
      'type': 'artist',
      'uri': 'spotify:artist:7abqAQhqbQVO2WgB2twSQL'}],
    'available_markets': ['AR',
     'AU',
     'AT',
     'BE',
     'BO',
     'BR',
     'BG',
     'CA',
     'CL',
     'CO',
     'CR',
     'CY',
     'CZ',
     'DK',
     'DO',
     'DE',
     'EC',
     'EE',
     'SV',
     'FI',
     'FR',
     'GR',
     'GT',
     'HN',
     '

In [102]:
len(playlist["items"])

100

In [103]:
from random import randint
from time import sleep

def get_playlist_tracks(playlist_id):
    results = sp.user_playlist_tracks("spotify",playlist_id)
    tracks = results['items']
    while results['next'] is not None:
        results = sp.next(results)
        tracks = tracks + results['items']
        sleep(randint(1,3000)/1000) # respectful nap
    return tracks

In [104]:
all_tracks = get_playlist_tracks(play_list_id)
len(all_tracks)

10000
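
The loop in `get_playlist_tracks` follows the `next` link of each response until it is `None`. The same pattern can be sketched without any network access by using fake pages. Everything here (`collect_all_items`, the page dicts, the `page-2` and `page-3` keys) is invented for illustration and only mimics the shape of a paginated response.

```python
def collect_all_items(first_page, fetch_next):
    """Follow 'next' links from the first page and gather every item."""
    items = list(first_page["items"])
    page = first_page
    while page.get("next") is not None:
        page = fetch_next(page)
        items.extend(page["items"])  # extend in place instead of rebuilding the list
    return items

# Three fake pages standing in for API responses
pages = [
    {"items": [1, 2], "next": "page-2"},
    {"items": [3, 4], "next": "page-3"},
    {"items": [5], "next": None},
]
lookup = {"page-2": pages[1], "page-3": pages[2]}

all_items = collect_all_items(pages[0], lambda page: lookup[page["next"]])
```

With SpotiPy, `first_page` would be the result of the first playlist request and `fetch_next` would be `sp.next`, as in the cell above; a short sleep between calls can be added inside the loop.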

In [110]:
tracks = json_normalize(all_tracks)

In [111]:
tracks.head()

Unnamed: 0,added_at,is_local,primary_color,added_by.external_urls.spotify,added_by.href,added_by.id,added_by.type,added_by.uri,track.album.album_type,track.album.artists,track.album.available_markets,track.album.external_urls.spotify,track.album.href,track.album.id,track.album.images,track.album.name,track.album.release_date,track.album.release_date_precision,track.album.total_tracks,track.album.type,track.album.uri,track.artists,track.available_markets,track.disc_number,track.duration_ms,track.episode,track.explicit,track.external_ids.isrc,track.external_urls.spotify,track.href,track.id,track.is_local,track.name,track.popularity,track.preview_url,track.track,track.track_number,track.type,track.uri,video_thumbnail.url
0,2021-11-26T09:03:43Z,False,,https://open.spotify.com/user/massivemusic.com,https://api.spotify.com/v1/users/massivemusic.com,massivemusic.com,user,spotify:user:massivemusic.com,single,[{'external_urls': {'spotify': 'https://open.s...,"[AR, AU, AT, BE, BO, BR, BG, CA, CL, CO, CR, C...",https://open.spotify.com/album/1pg4cYRkcyzktgD...,https://api.spotify.com/v1/albums/1pg4cYRkcyzk...,1pg4cYRkcyzktgD4CcX9JM,"[{'height': 640, 'url': 'https://i.scdn.co/ima...",This Magic Moment (Crisp.nl Version),2021-11-24,day,1,album,spotify:album:1pg4cYRkcyzktgD4CcX9JM,[{'external_urls': {'spotify': 'https://open.s...,"[AR, AU, AT, BE, BO, BR, BG, CA, CL, CO, CR, C...",1,188696,False,False,QZSR42108889,https://open.spotify.com/track/6uDlwsguxFDgbak...,https://api.spotify.com/v1/tracks/6uDlwsguxFDg...,6uDlwsguxFDgbakvPyhChp,False,This Magic Moment - Crisp.nl Version,19,https://p.scdn.co/mp3-preview/6906012f7064a3ad...,True,1,track,spotify:track:6uDlwsguxFDgbakvPyhChp,
1,2014-06-20T16:10:54Z,False,,https://open.spotify.com/user/massivemusic.com,https://api.spotify.com/v1/users/massivemusic.com,massivemusic.com,user,spotify:user:massivemusic.com,album,[{'external_urls': {'spotify': 'https://open.s...,"[AR, AU, AT, BE, BO, BR, BG, CA, CL, CO, CR, C...",https://open.spotify.com/album/7bPasMSBOBrih7x...,https://api.spotify.com/v1/albums/7bPasMSBOBri...,7bPasMSBOBrih7xATDtmgk,"[{'height': 640, 'url': 'https://i.scdn.co/ima...","Just a Lil' Beat, Vol. 1 (OOgo & Chomsk')",2012-05-14,day,16,album,spotify:album:7bPasMSBOBrih7xATDtmgk,[{'external_urls': {'spotify': 'https://open.s...,"[AR, AU, AT, BE, BO, BR, BG, CA, CL, CO, CR, C...",1,51373,False,False,FR0Z41200100,https://open.spotify.com/track/00LvjMnpznr4MZF...,https://api.spotify.com/v1/tracks/00LvjMnpznr4...,00LvjMnpznr4MZFSA8x9sA,False,00h00,7,https://p.scdn.co/mp3-preview/3d9a8ac204cfcba4...,True,1,track,spotify:track:00LvjMnpznr4MZFSA8x9sA,
2,2014-06-20T16:10:54Z,False,,https://open.spotify.com/user/massivemusic.com,https://api.spotify.com/v1/users/massivemusic.com,massivemusic.com,user,spotify:user:massivemusic.com,album,[{'external_urls': {'spotify': 'https://open.s...,[],https://open.spotify.com/album/19MOC02Ei3l2Sal...,https://api.spotify.com/v1/albums/19MOC02Ei3l2...,19MOC02Ei3l2SalGUGWozw,"[{'height': 640, 'url': 'https://i.scdn.co/ima...",Aftermath (Remastered),1966-04-15,day,11,album,spotify:album:19MOC02Ei3l2SalGUGWozw,[{'external_urls': {'spotify': 'https://open.s...,[],1,221533,False,False,USA176610050,https://open.spotify.com/track/3E8gEvhLia6w9lQ...,https://api.spotify.com/v1/tracks/3E8gEvhLia6w...,3E8gEvhLia6w9lQv3hfxzM,False,Under My Thumb,0,,True,4,track,spotify:track:3E8gEvhLia6w9lQv3hfxzM,
3,2014-06-20T16:10:54Z,False,,https://open.spotify.com/user/massivemusic.com,https://api.spotify.com/v1/users/massivemusic.com,massivemusic.com,user,spotify:user:massivemusic.com,compilation,[{'external_urls': {'spotify': 'https://open.s...,[],https://open.spotify.com/album/6sX0Hd7MDMIw0Xm...,https://api.spotify.com/v1/albums/6sX0Hd7MDMIw...,6sX0Hd7MDMIw0Xm4nsoAPt,"[{'height': 640, 'url': 'https://i.scdn.co/ima...",March of the Gremlins,2014-01-20,day,14,album,spotify:album:6sX0Hd7MDMIw0Xm4nsoAPt,[{'external_urls': {'spotify': 'https://open.s...,[],1,216234,False,False,DEGT81400256,https://open.spotify.com/track/2jjPlj6KFzCffJ7...,https://api.spotify.com/v1/tracks/2jjPlj6KFzCf...,2jjPlj6KFzCffJ7KVQMwHX,False,Conquer Me,0,,True,5,track,spotify:track:2jjPlj6KFzCffJ7KVQMwHX,
4,2016-10-27T09:04:03Z,False,,https://open.spotify.com/user/massivemusic.com,https://api.spotify.com/v1/users/massivemusic.com,massivemusic.com,user,spotify:user:massivemusic.com,single,[{'external_urls': {'spotify': 'https://open.s...,"[AR, AU, AT, BE, BO, BR, BG, CA, CL, CO, CR, C...",https://open.spotify.com/album/35pzihogLsPIl48...,https://api.spotify.com/v1/albums/35pzihogLsPI...,35pzihogLsPIl48ZBykD4l,"[{'height': 640, 'url': 'https://i.scdn.co/ima...",Dapper (feat. Anderson .Paak),2016-03-11,day,1,album,spotify:album:35pzihogLsPIl48ZBykD4l,[{'external_urls': {'spotify': 'https://open.s...,"[AR, AU, AT, BE, BO, BR, BG, CA, CL, CO, CR, C...",1,193093,False,True,USQX91600320,https://open.spotify.com/track/3seHx466iTcUmWE...,https://api.spotify.com/v1/tracks/3seHx466iTcU...,3seHx466iTcUmWE9dOVB3v,False,Dapper (feat. Anderson .Paak),34,https://p.scdn.co/mp3-preview/bc35e40cee1f733c...,True,1,track,spotify:track:3seHx466iTcUmWE9dOVB3v,


In [118]:
# Expand each track's list of artist dicts into a small DataFrame and
# tag every artist row with the track id, so it can be joined back later
def expand_list_dict2(row):
    df = json_normalize(row['track.artists'])
    df['song_id'] = row['track.id']
    return df

tracks['artists_dfs'] = tracks.apply(expand_list_dict2, axis=1)
tracks['artists_dfs'][1]

Unnamed: 0,href,id,name,type,uri,external_urls.spotify,song_id
0,https://api.spotify.com/v1/artists/63GCe948jqV...,63GCe948jqVvNvFB0DmuGB,Hoosky,artist,spotify:artist:63GCe948jqVvNvFB0DmuGB,https://open.spotify.com/artist/63GCe948jqVvNv...,00LvjMnpznr4MZFSA8x9sA
1,https://api.spotify.com/v1/artists/2EMmqQQmszs...,2EMmqQQmszsCXfVfMRibOQ,La Fine Equipe,artist,spotify:artist:2EMmqQQmszsCXfVfMRibOQ,https://open.spotify.com/artist/2EMmqQQmszsCXf...,00LvjMnpznr4MZFSA8x9sA
2,https://api.spotify.com/v1/artists/1mM65AjhdrT...,1mM65AjhdrTa1eDExLKRsu,oOgo,artist,spotify:artist:1mM65AjhdrTa1eDExLKRsu,https://open.spotify.com/artist/1mM65AjhdrTa1e...,00LvjMnpznr4MZFSA8x9sA
3,https://api.spotify.com/v1/artists/0CJDBEjJQYG...,0CJDBEjJQYGcyC12FI1L0b,Chomsk',artist,spotify:artist:0CJDBEjJQYGcyC12FI1L0b,https://open.spotify.com/artist/0CJDBEjJQYGcyC...,00LvjMnpznr4MZFSA8x9sA


In [119]:
# now we create a new dataframe with all these artists
# (a single concat avoids copying the growing dataframe on every iteration)
artist_df = pd.concat(tracks['artists_dfs'].tolist(), axis=0)

artist_df

Unnamed: 0,href,id,name,type,uri,external_urls.spotify,song_id
0,https://api.spotify.com/v1/artists/7abqAQhqbQV...,7abqAQhqbQVO2WgB2twSQL,MassiveMusic,artist,spotify:artist:7abqAQhqbQVO2WgB2twSQL,https://open.spotify.com/artist/7abqAQhqbQVO2W...,6uDlwsguxFDgbakvPyhChp
0,https://api.spotify.com/v1/artists/63GCe948jqV...,63GCe948jqVvNvFB0DmuGB,Hoosky,artist,spotify:artist:63GCe948jqVvNvFB0DmuGB,https://open.spotify.com/artist/63GCe948jqVvNv...,00LvjMnpznr4MZFSA8x9sA
1,https://api.spotify.com/v1/artists/2EMmqQQmszs...,2EMmqQQmszsCXfVfMRibOQ,La Fine Equipe,artist,spotify:artist:2EMmqQQmszsCXfVfMRibOQ,https://open.spotify.com/artist/2EMmqQQmszsCXf...,00LvjMnpznr4MZFSA8x9sA
2,https://api.spotify.com/v1/artists/1mM65AjhdrT...,1mM65AjhdrTa1eDExLKRsu,oOgo,artist,spotify:artist:1mM65AjhdrTa1eDExLKRsu,https://open.spotify.com/artist/1mM65AjhdrTa1e...,00LvjMnpznr4MZFSA8x9sA
3,https://api.spotify.com/v1/artists/0CJDBEjJQYG...,0CJDBEjJQYGcyC12FI1L0b,Chomsk',artist,spotify:artist:0CJDBEjJQYGcyC12FI1L0b,https://open.spotify.com/artist/0CJDBEjJQYGcyC...,00LvjMnpznr4MZFSA8x9sA
...,...,...,...,...,...,...,...
0,https://api.spotify.com/v1/artists/2I36EjIVz3v...,2I36EjIVz3vDfROgj1MfZ3,Boozoo Bajou,artist,spotify:artist:2I36EjIVz3vDfROgj1MfZ3,https://open.spotify.com/artist/2I36EjIVz3vDfR...,1PNaGC2ihDmVCldSce119E
1,https://api.spotify.com/v1/artists/1stlWvYSCm3...,1stlWvYSCm3sSEIzdKBSeY,Joe Dukie,artist,spotify:artist:1stlWvYSCm3sSEIzdKBSeY,https://open.spotify.com/artist/1stlWvYSCm3sSE...,1PNaGC2ihDmVCldSce119E
2,https://api.spotify.com/v1/artists/5N6EzjkOoyA...,5N6EzjkOoyABhNZJggeXi6,Mousse T.,artist,spotify:artist:5N6EzjkOoyABhNZJggeXi6,https://open.spotify.com/artist/5N6EzjkOoyABhN...,1PNaGC2ihDmVCldSce119E
0,https://api.spotify.com/v1/artists/2oofDquWt9t...,2oofDquWt9tMCETKAHmhlG,Mocky,artist,spotify:artist:2oofDquWt9tMCETKAHmhlG,https://open.spotify.com/artist/2oofDquWt9tMCE...,2887tlqmjY4WcyQ3Dy6dZy
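
As an alternative to building one small DataFrame per track and concatenating them, `pd.json_normalize` can flatten the nested artist lists in one call through its `record_path` and `meta` parameters. The sketch below uses a tiny hand-written sample that only mimics the shape of the playlist items (a `track` dict with `id`, `name` and an `artists` list); the ids and names are placeholders.

```python
import pandas as pd

# Placeholder items shaped like the playlist entries, not real Spotify data
sample_items = [
    {"track": {"id": "t1", "name": "Song One",
               "artists": [{"id": "a1", "name": "Artist A"}]}},
    {"track": {"id": "t2", "name": "Song Two",
               "artists": [{"id": "a2", "name": "Artist B"},
                           {"id": "a3", "name": "Artist C"}]}},
]

# One row per artist; the track id and name are carried along as metadata columns
artists_flat = pd.json_normalize(
    sample_items,
    record_path=["track", "artists"],
    meta=[["track", "id"], ["track", "name"]],
)
```

The metadata columns are named `track.id` and `track.name`, so the artist's own `id` and `name` columns stay distinct from the song's.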


In [120]:
# now we merge (join) the two dataframes; the fields we need are selected afterwards
df_merged = pd.merge(left=tracks,
                    right=artist_df,
                    how='inner',
                    left_on='track.id',
                    right_on='song_id')
df_merged.head()

Unnamed: 0,added_at,is_local,primary_color,added_by.external_urls.spotify,added_by.href,added_by.id,added_by.type,added_by.uri,track.album.album_type,track.album.artists,track.album.available_markets,track.album.external_urls.spotify,track.album.href,track.album.id,track.album.images,track.album.name,track.album.release_date,track.album.release_date_precision,track.album.total_tracks,track.album.type,track.album.uri,track.artists,track.available_markets,track.disc_number,track.duration_ms,track.episode,track.explicit,track.external_ids.isrc,track.external_urls.spotify,track.href,track.id,track.is_local,track.name,track.popularity,track.preview_url,track.track,track.track_number,track.type,track.uri,video_thumbnail.url,artists_dfs,href,id,name,type,uri,external_urls.spotify,song_id
0,2021-11-26T09:03:43Z,False,,https://open.spotify.com/user/massivemusic.com,https://api.spotify.com/v1/users/massivemusic.com,massivemusic.com,user,spotify:user:massivemusic.com,single,[{'external_urls': {'spotify': 'https://open.s...,"[AR, AU, AT, BE, BO, BR, BG, CA, CL, CO, CR, C...",https://open.spotify.com/album/1pg4cYRkcyzktgD...,https://api.spotify.com/v1/albums/1pg4cYRkcyzk...,1pg4cYRkcyzktgD4CcX9JM,"[{'height': 640, 'url': 'https://i.scdn.co/ima...",This Magic Moment (Crisp.nl Version),2021-11-24,day,1,album,spotify:album:1pg4cYRkcyzktgD4CcX9JM,[{'external_urls': {'spotify': 'https://open.s...,"[AR, AU, AT, BE, BO, BR, BG, CA, CL, CO, CR, C...",1,188696,False,False,QZSR42108889,https://open.spotify.com/track/6uDlwsguxFDgbak...,https://api.spotify.com/v1/tracks/6uDlwsguxFDg...,6uDlwsguxFDgbakvPyhChp,False,This Magic Moment - Crisp.nl Version,19,https://p.scdn.co/mp3-preview/6906012f7064a3ad...,True,1,track,spotify:track:6uDlwsguxFDgbakvPyhChp,,...,https://api.spotify.com/v1/artists/7abqAQhqbQV...,7abqAQhqbQVO2WgB2twSQL,MassiveMusic,artist,spotify:artist:7abqAQhqbQVO2WgB2twSQL,https://open.spotify.com/artist/7abqAQhqbQVO2W...,6uDlwsguxFDgbakvPyhChp
1,2014-06-20T16:10:54Z,False,,https://open.spotify.com/user/massivemusic.com,https://api.spotify.com/v1/users/massivemusic.com,massivemusic.com,user,spotify:user:massivemusic.com,album,[{'external_urls': {'spotify': 'https://open.s...,"[AR, AU, AT, BE, BO, BR, BG, CA, CL, CO, CR, C...",https://open.spotify.com/album/7bPasMSBOBrih7x...,https://api.spotify.com/v1/albums/7bPasMSBOBri...,7bPasMSBOBrih7xATDtmgk,"[{'height': 640, 'url': 'https://i.scdn.co/ima...","Just a Lil' Beat, Vol. 1 (OOgo & Chomsk')",2012-05-14,day,16,album,spotify:album:7bPasMSBOBrih7xATDtmgk,[{'external_urls': {'spotify': 'https://open.s...,"[AR, AU, AT, BE, BO, BR, BG, CA, CL, CO, CR, C...",1,51373,False,False,FR0Z41200100,https://open.spotify.com/track/00LvjMnpznr4MZF...,https://api.spotify.com/v1/tracks/00LvjMnpznr4...,00LvjMnpznr4MZFSA8x9sA,False,00h00,7,https://p.scdn.co/mp3-preview/3d9a8ac204cfcba4...,True,1,track,spotify:track:00LvjMnpznr4MZFSA8x9sA,,...,https://api.spotify.com/v1/artists/63GCe948jqV...,63GCe948jqVvNvFB0DmuGB,Hoosky,artist,spotify:artist:63GCe948jqVvNvFB0DmuGB,https://open.spotify.com/artist/63GCe948jqVvNv...,00LvjMnpznr4MZFSA8x9sA
2,2014-06-20T16:10:54Z,False,,https://open.spotify.com/user/massivemusic.com,https://api.spotify.com/v1/users/massivemusic.com,massivemusic.com,user,spotify:user:massivemusic.com,album,[{'external_urls': {'spotify': 'https://open.s...,"[AR, AU, AT, BE, BO, BR, BG, CA, CL, CO, CR, C...",https://open.spotify.com/album/7bPasMSBOBrih7x...,https://api.spotify.com/v1/albums/7bPasMSBOBri...,7bPasMSBOBrih7xATDtmgk,"[{'height': 640, 'url': 'https://i.scdn.co/ima...","Just a Lil' Beat, Vol. 1 (OOgo & Chomsk')",2012-05-14,day,16,album,spotify:album:7bPasMSBOBrih7xATDtmgk,[{'external_urls': {'spotify': 'https://open.s...,"[AR, AU, AT, BE, BO, BR, BG, CA, CL, CO, CR, C...",1,51373,False,False,FR0Z41200100,https://open.spotify.com/track/00LvjMnpznr4MZF...,https://api.spotify.com/v1/tracks/00LvjMnpznr4...,00LvjMnpznr4MZFSA8x9sA,False,00h00,7,https://p.scdn.co/mp3-preview/3d9a8ac204cfcba4...,True,1,track,spotify:track:00LvjMnpznr4MZFSA8x9sA,,...,https://api.spotify.com/v1/artists/2EMmqQQmszs...,2EMmqQQmszsCXfVfMRibOQ,La Fine Equipe,artist,spotify:artist:2EMmqQQmszsCXfVfMRibOQ,https://open.spotify.com/artist/2EMmqQQmszsCXf...,00LvjMnpznr4MZFSA8x9sA
3,2014-06-20T16:10:54Z,False,,https://open.spotify.com/user/massivemusic.com,https://api.spotify.com/v1/users/massivemusic.com,massivemusic.com,user,spotify:user:massivemusic.com,album,[{'external_urls': {'spotify': 'https://open.s...,"[AR, AU, AT, BE, BO, BR, BG, CA, CL, CO, CR, C...",https://open.spotify.com/album/7bPasMSBOBrih7x...,https://api.spotify.com/v1/albums/7bPasMSBOBri...,7bPasMSBOBrih7xATDtmgk,"[{'height': 640, 'url': 'https://i.scdn.co/ima...","Just a Lil' Beat, Vol. 1 (OOgo & Chomsk')",2012-05-14,day,16,album,spotify:album:7bPasMSBOBrih7xATDtmgk,[{'external_urls': {'spotify': 'https://open.s...,"[AR, AU, AT, BE, BO, BR, BG, CA, CL, CO, CR, C...",1,51373,False,False,FR0Z41200100,https://open.spotify.com/track/00LvjMnpznr4MZF...,https://api.spotify.com/v1/tracks/00LvjMnpznr4...,00LvjMnpznr4MZFSA8x9sA,False,00h00,7,https://p.scdn.co/mp3-preview/3d9a8ac204cfcba4...,True,1,track,spotify:track:00LvjMnpznr4MZFSA8x9sA,,...,https://api.spotify.com/v1/artists/1mM65AjhdrT...,1mM65AjhdrTa1eDExLKRsu,oOgo,artist,spotify:artist:1mM65AjhdrTa1eDExLKRsu,https://open.spotify.com/artist/1mM65AjhdrTa1e...,00LvjMnpznr4MZFSA8x9sA
4,2014-06-20T16:10:54Z,False,,https://open.spotify.com/user/massivemusic.com,https://api.spotify.com/v1/users/massivemusic.com,massivemusic.com,user,spotify:user:massivemusic.com,album,[{'external_urls': {'spotify': 'https://open.s...,"[AR, AU, AT, BE, BO, BR, BG, CA, CL, CO, CR, C...",https://open.spotify.com/album/7bPasMSBOBrih7x...,https://api.spotify.com/v1/albums/7bPasMSBOBri...,7bPasMSBOBrih7xATDtmgk,"[{'height': 640, 'url': 'https://i.scdn.co/ima...","Just a Lil' Beat, Vol. 1 (OOgo & Chomsk')",2012-05-14,day,16,album,spotify:album:7bPasMSBOBrih7xATDtmgk,[{'external_urls': {'spotify': 'https://open.s...,"[AR, AU, AT, BE, BO, BR, BG, CA, CL, CO, CR, C...",1,51373,False,False,FR0Z41200100,https://open.spotify.com/track/00LvjMnpznr4MZF...,https://api.spotify.com/v1/tracks/00LvjMnpznr4...,00LvjMnpznr4MZFSA8x9sA,False,00h00,7,https://p.scdn.co/mp3-preview/3d9a8ac204cfcba4...,True,1,track,spotify:track:00LvjMnpznr4MZFSA8x9sA,,...,https://api.spotify.com/v1/artists/0CJDBEjJQYG...,0CJDBEjJQYGcyC12FI1L0b,Chomsk',artist,spotify:artist:0CJDBEjJQYGcyC12FI1L0b,https://open.spotify.com/artist/0CJDBEjJQYGcyC...,00LvjMnpznr4MZFSA8x9sA


In [121]:
df_merged.shape

(11846, 48)

In [122]:
df_final = df_merged[['track.name', 'name', 'song_id']]

### Audio features

In [124]:
# Request audio features in batches of 100 ids per call
chunks = [(i, i+100) for i in range(0, len(df_final), 100)]
audio_features_list = []
for start, stop in chunks:
    id_list100 = df_final['song_id'][start:stop].tolist()
    audio_features_list = audio_features_list + sp.audio_features(id_list100)
    sleep(randint(1,3000)/1000) # respectful nap
len(audio_features_list)

11846
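
Because `df_final` has one row per (song, artist) pair, the loop above requests the same song id more than once. One possible refinement, sketched here with a hypothetical `unique_batches` helper and placeholder ids, is to deduplicate the ids before splitting them into batches.

```python
def unique_batches(ids, size=100):
    """Drop repeated ids (keeping first-seen order) and split the rest into batches."""
    unique_ids = list(dict.fromkeys(ids))  # dicts preserve insertion order
    return [unique_ids[i:i + size] for i in range(0, len(unique_ids), size)]

# Placeholder ids; a batch size of 2 keeps the example small
sample_ids = ["a", "b", "b", "c", "a", "d", "e"]
batches = unique_batches(sample_ids, size=2)
```

Each batch could then be passed to `sp.audio_features` in place of the slices of `df_final['song_id']`, which would also make the later `drop_duplicates` step unnecessary.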

In [125]:
audio_features_df = json_normalize(audio_features_list)

In [126]:
audio_features_df.drop_duplicates(inplace=True) # duplicates because some songs have more artists
len(audio_features_df)

9998

In [127]:
df_w_audio_ft = pd.merge(left=df_final,
                        right=audio_features_df,
                        how='inner',
                        left_on='song_id',
                        right_on='id')
df_w_audio_ft

Unnamed: 0,track.name,name,song_id,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,This Magic Moment - Crisp.nl Version,MassiveMusic,6uDlwsguxFDgbakvPyhChp,0.823,0.671,11,-6.437,1,0.0592,0.55100,0.001440,0.3540,0.662,125.052,audio_features,6uDlwsguxFDgbakvPyhChp,spotify:track:6uDlwsguxFDgbakvPyhChp,https://api.spotify.com/v1/tracks/6uDlwsguxFDg...,https://api.spotify.com/v1/audio-analysis/6uDl...,188697,4
1,00h00,Hoosky,00LvjMnpznr4MZFSA8x9sA,0.375,0.606,4,-9.383,0,0.1970,0.33900,0.000924,0.4240,0.883,91.564,audio_features,00LvjMnpznr4MZFSA8x9sA,spotify:track:00LvjMnpznr4MZFSA8x9sA,https://api.spotify.com/v1/tracks/00LvjMnpznr4...,https://api.spotify.com/v1/audio-analysis/00Lv...,51373,4
2,00h00,La Fine Equipe,00LvjMnpznr4MZFSA8x9sA,0.375,0.606,4,-9.383,0,0.1970,0.33900,0.000924,0.4240,0.883,91.564,audio_features,00LvjMnpznr4MZFSA8x9sA,spotify:track:00LvjMnpznr4MZFSA8x9sA,https://api.spotify.com/v1/tracks/00LvjMnpznr4...,https://api.spotify.com/v1/audio-analysis/00Lv...,51373,4
3,00h00,oOgo,00LvjMnpznr4MZFSA8x9sA,0.375,0.606,4,-9.383,0,0.1970,0.33900,0.000924,0.4240,0.883,91.564,audio_features,00LvjMnpznr4MZFSA8x9sA,spotify:track:00LvjMnpznr4MZFSA8x9sA,https://api.spotify.com/v1/tracks/00LvjMnpznr4...,https://api.spotify.com/v1/audio-analysis/00Lv...,51373,4
4,00h00,Chomsk',00LvjMnpznr4MZFSA8x9sA,0.375,0.606,4,-9.383,0,0.1970,0.33900,0.000924,0.4240,0.883,91.564,audio_features,00LvjMnpznr4MZFSA8x9sA,spotify:track:00LvjMnpznr4MZFSA8x9sA,https://api.spotify.com/v1/tracks/00LvjMnpznr4...,https://api.spotify.com/v1/audio-analysis/00Lv...,51373,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11841,Take It Slow - Mousse T. Remix,Boozoo Bajou,1PNaGC2ihDmVCldSce119E,0.684,0.504,9,-8.913,1,0.0801,0.09720,0.000290,0.3360,0.740,86.226,audio_features,1PNaGC2ihDmVCldSce119E,spotify:track:1PNaGC2ihDmVCldSce119E,https://api.spotify.com/v1/tracks/1PNaGC2ihDmV...,https://api.spotify.com/v1/audio-analysis/1PNa...,216601,4
11842,Take It Slow - Mousse T. Remix,Joe Dukie,1PNaGC2ihDmVCldSce119E,0.684,0.504,9,-8.913,1,0.0801,0.09720,0.000290,0.3360,0.740,86.226,audio_features,1PNaGC2ihDmVCldSce119E,spotify:track:1PNaGC2ihDmVCldSce119E,https://api.spotify.com/v1/tracks/1PNaGC2ihDmV...,https://api.spotify.com/v1/audio-analysis/1PNa...,216601,4
11843,Take It Slow - Mousse T. Remix,Mousse T.,1PNaGC2ihDmVCldSce119E,0.684,0.504,9,-8.913,1,0.0801,0.09720,0.000290,0.3360,0.740,86.226,audio_features,1PNaGC2ihDmVCldSce119E,spotify:track:1PNaGC2ihDmVCldSce119E,https://api.spotify.com/v1/tracks/1PNaGC2ihDmV...,https://api.spotify.com/v1/audio-analysis/1PNa...,216601,4
11844,Birds of a Feather,Mocky,2887tlqmjY4WcyQ3Dy6dZy,0.677,0.410,9,-10.598,0,0.0400,0.59100,0.103000,0.0813,0.515,81.924,audio_features,2887tlqmjY4WcyQ3Dy6dZy,spotify:track:2887tlqmjY4WcyQ3Dy6dZy,https://api.spotify.com/v1/tracks/2887tlqmjY4W...,https://api.spotify.com/v1/audio-analysis/2887...,274013,4
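
The merged table repeats a song's audio features once per artist. If one row per song is preferred, the artist names can be collapsed into a single column. The following is only a sketch: the three sample rows reuse the column names and a few values from the output above, and the `artists` column name is an assumption.

```python
import pandas as pd

# A few rows shaped like df_final (one row per song/artist pair)
rows = pd.DataFrame({
    "track.name": ["00h00", "00h00", "Birds of a Feather"],
    "name": ["Hoosky", "La Fine Equipe", "Mocky"],
    "song_id": ["00LvjMnpznr4MZFSA8x9sA", "00LvjMnpznr4MZFSA8x9sA",
                "2887tlqmjY4WcyQ3Dy6dZy"],
})

# Join all artist names of the same song into one comma-separated string
per_song = (
    rows.groupby(["song_id", "track.name"], sort=False)["name"]
        .agg(", ".join)
        .reset_index(name="artists")
)
```

The result has one row per `song_id`, which could then be merged with the deduplicated audio features on a one-to-one basis.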
