# Working with spotify web api and web scraping

In this notebook the spotify web api will be used with the help of spotipy, a special spotify api package.

The spotify web api will provide all necessary data that will build up to the latest music datasets in 2020


**Following steps are included in the notebook:**

 - web api calls with spotypy
 - extracting data from the api as json
 - extracting data from web application with selenium
 - building pandas DataFrame from the web api data
 - storing the data into csv files for later analysis

.
**Goal**

On later analysis will be focused on the most popular tracks in 2020. \
Means, for the dataset we need spotify playlist for most streamed track and artist, globally and for germany.

Track information and artist information are necessary. Also the popularity value, a spotify index, is key for later analysis.\

Although the spotify audio features, values from their AI analysis, which will index specific features of how the music sound (e.g acoustic, danceability), will be gathered and stored for each track id.  

Official Spotify api wont give any counts on play on a track, its not supported because of business related issues.

Plays on a track are visible at the spotify web app, so it can get gathered through web scraping, which wil be done also in this notebook.





# Preparation

Importing all key librarys.
- spotypy , for working with the spotify API, also the Oauth Handler for access
- json, to read json strings
- making DataFrames width to maximum with ipython.display


In [31]:
# import 
import pandas as pd
import requests
import spotipy 
import json
from spotipy.oauth2 import SpotifyClientCredentials

In [32]:
# showing dataframes wihtout limitation
from IPython.display import display
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)

For working with the api the cliend id and secret token have to be stored for later uses. \
The variable "sp" will store the api acces.

In [33]:
# getting client 

client_id = 'f17e478090d248869355d07445e4ed15'

secret = '882dabecb5484e8c9fcf34c63151a5a7'

manager = SpotifyClientCredentials(client_id=client_id,client_secret=secret)
response = requests.get('https://api.spotify.com/v1')
response
# set up spotify callable
sp = spotipy.Spotify(client_credentials_manager=manager)

## Web scraping

Using Selinium and ChromeWebDriver for getting Track playcounts, which aren´t supported by the spotify api.
I tried to use BeautifulSoup for this but it was more difficult than writing 3 lines in Selenium.

For this case the spotify web app will be scraped.

In [5]:
import urllib.request
import requests
import time
from bs4 import BeautifulSoup
import re
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.expected_conditions import presence_of_element_located

options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument("--test-type")

For the beginning i will set up  manual scraping code for getting a artist top track and with name of the track and plays on the track.

- First the selenium webdriver have to be started. 

- For a reusable code the artist url will be stored as reusable variable. 

- The final callable is driver.get(url1). This will open the chrome website via selenium. 

A little timer is placed, that the webbrowser have time to open the actual link, before the following code is executed. This is important or click actions cant be performed.

## Building elements

Single elements of the scraper will be build herer.

In [7]:
#start web driver
driver= webdriver.Chrome(executable_path=r"C:\Users\name1\Downloads\chromedriver_win32\chromedriver.exe",chrome_options=options) #Path to Chrome Driver

# setting up 
url = 'https://open.spotify.com'


# artist id
# example "The Weekend" a top artist
artist_id = '1Xyo4u8uXC1ZmMpatF05PJ'





  


In [14]:
def go():
    
     #start web driver
    driver= webdriver.Chrome(executable_path=r"C:\Users\name1\Downloads\chromedriver_win32\chromedriver.exe",chrome_options=options) #Path to Chrome Driver

    url = 'https://open.spotify.com'
    #dynamic driver call
    url1 =url+'/artist/'+ artist_id
    driver.get(url1)
    
    time.sleep(1)

    #click cockies agreement
    
    driver.find_element_by_id('onetrust-accept-btn-handler').click()
    #time.sleep(10)
    #click "show more " to get all top track
    
    time.sleep(1)
    
    driver.find_element_by_class_name('_8eaf9c7e0fab279ec19ca81d822db2ad-scss').click()
    
    
    vis = driver.find_elements_by_class_name("_85aaee9fc23ca61102952862a10b544c-scss")
    pretty = 'pretty'
    return vis, pretty

In [20]:
gogo = go()


  after removing the cwd from sys.path.


In [27]:
for i in gogo:
    print(i)

[<selenium.webdriver.remote.webelement.WebElement (session="345f5247ce44597535eafd8971a33430", element="48491b92-1433-4de9-a3e4-d9316a88de3d")>]
pretty


Now cookies have to be accepted. The accept button gets clicked with selenium. Afterwards the "show more" button have to be clicked that all tracks are expanded and we can gather all the track data from the browser.

In [None]:
#click cockies agreement
driver.find_element_by_id('onetrust-accept-btn-handler').click()


In [None]:
#click "show more " to get all top track
        driver.find_element_by_class_name('_8eaf9c7e0fab279ec19ca81d822db2ad-scss').click()

In [None]:
driver.find_elements_by_class_name("85aaee9fc23ca61102952862a10b544c-scss")


In [None]:
"""s= driver.find_elements_by_css_selector("h1")
for i in s:
    print(i.text)
    """


In [None]:
# Get element with tag name 'div'
element = driver.find_element(By.TAG_NAME, 'div')

# Get all the elements available with tag name 'p'
#elements = element.find_elements(By.TAG_NAME, '')


To get Track names with releated playcounts selenium will load the webpage with the artist id.
Then the html class names get triggered with the find_elemnts_by_class_name() method.

Through the chrome browser the different class names got inspected and filled into the driver method.



In [None]:
# get top track playcount
playcount = driver.find_elements_by_class_name("d47b790d001ed769adcd9ddfc0e83acc-scss")
"""
#testing 
for i in playcount:
    print(i.text)
    break"""
# get top tracks names
track_names=driver.find_elements_by_class_name('da0bc4060bb1bdb4abb8e402916af32e-scss')

In [None]:
# get all playcounts
cts=0
for i in range(len(playcount)):
    print(track_names[cts].text,playcount[cts].text)
    cts=cts+1

In [None]:
# get reusable playcount code
cts1=0
lst= []
for i in range(len(playcount)):
    track_name = track_names[cts1].text
    plays = playcount[cts1].text
    lst.append({'track':track_name,
               'plays':plays})
    
    cts1=cts1+1

##### Final test code 
This test code will get a specefied track name from selenium webdriver. Then it will gather the related playcount.

In [None]:
#test code, whith the Track "The Hills" from the artist
cts2=0
for i in range(len(playcount)):
    nam = track_names[cts2].text
    str1= 'The Hills'
    if str1 in nam:
        plays = playcount[cts2].text
        print(plays)
    
        
    cts2=cts2+1


For testing i simulate 3 calls through the spotify web api.
The track and artist id will be directly from the api. Based on both id values, selenium will search the related track plays.


In [None]:

plist= sp.playlist('37i9dQZF1DX4HROODZmf5u')
plist['tracks']['items'][0]['track']['artists'][0]

In [None]:
artid= sp.artist('1Xyo4u8uXC1ZmMpatF05PJ')
artid['followers']['total']
artid['name']

In [None]:
    
for i in range(5):
    try:
            print('3')
            try:
                
                print('4')
                
                try:
                    
                    print('5')
                
                except:
                    
                    print('1')
            
            except:
                
                print('2')
        
    except:
            
        print('6')
    

## Sandbox for testing


In [None]:
#### artist_get final test code 
def artist_get(artist_id):
    artistget =sp.artist(artist_id)
    gen_artist = artistget['genres']
    gen_fl = artistget['followers']['total']
    lens= len(gen_artist)
    print(artistget)
    follower = []
    print(gen_artist)
    
    if lens == 3:
        genre1 = gen_artist[0]
        genre2 = gen_artist[1]
        genre3 = gen_artist[2]
    
    
       
#,genre2,genre3  
# for the weeknd
artist_get('1Xyo4u8uXC1ZmMpatF05PJ')


In [None]:

def artist_get(artist_id):
    artistget =sp.artist(artist_id)
    gen_artist = enumerate(artistget['genres'])
    gen_fl = artistget['followers']['total']
    genre12 = []
    

    
#,genre2,genre3  
# for the weeknd
#genre1,genre2,genre3,gen_fl = artist_get('1Xyo4u8uXC1ZmMpatF05PJ')
#genre1),genre2,genre3,gen_fl 


In [None]:
genre12

In [None]:

def artist_get(artist_id):
 
    artistget =sp.artist(artist_id)
    gen_artist = artistget['genres']
    lens = len(artistget['genres'])
    gen_fl = artistget['followers']['total']
    #print(gen_artist)
    print(len(artistget['genres']))
    print(gen_artist[0])
    if lens >= 3:
        genre1 = gen_artist[0]
        genre2 = gen_artist[1]
        genre3 = gen_artist[2]
        return genre1,genre2,genre3
    if lens == 2:
        genre1 = gen_artist[0]
        genre2 = gen_artist[1]
        return genre1,genre2
    if lens == 1:
        genre1 = gen_artist[0]
        return genre1
    else:
        print('wrong genre get')
#,genre2,genre3  
# for the weeknd
#genre1,genre2,genre3,gen_fl = artist_get('1Xyo4u8uXC1ZmMpatF05PJ')
#genre1),genre2,genre3,gen_fl 
# api call
plist = sp.playlist('37i9dQZF1DX4HROODZmf5u')
    
#count of tracks
total_tracks = plist['tracks']['total']
print(total_tracks)
#emtpy list
tracks_list= []
#count 
cts = 0
for i in range(3):
    aid= plist['tracks']['items'][cts]['track']['artists'][0]['name']
    artist_get(aid)

    
    
    
    cts = cts+1

# Final data gathering by scraping and web api

Now the scraping tool is working fine and api plus scraping data can be work together. Both data sources can be stored in a dataframe. \

Lets now define a function that only needs playlist id to work.

This function will call the spotify web api get track information and audio features.
Also it will gather the artist id within each top track. This artist id will be used to scrape with the selenium code. 
Following data need to be stored for analysis.

- playlist name
- playlist description
- track id
- track name
- track plays
- artist name
- album
- explicit of track content
- popularity ( measured by spotify´s algorithm)
- duration of track length
- 

Below i wrote a function that will automaticly gather track, artist and audio_feature data from different api endpoints.
The spotify data structure was a bit difficult to read especially the artist elements were on different spots.
This function provide data gathering also if different artist worked one track. \7



All data gets stored in one list and this list will be stored as pandas DataFrame at the end of the function.

he paylist_tracks function only needs the playlist id as string, to work.

 

## Final Code (only change when needed)

Below you find the findal function which scrapes the spotify API and spotify web app.

Do not chnage this code.

Final call: **playlist_tracks(id)**



In [79]:
###!!!!!!!!Final Code gathers all tracks

#function for get playlist tracks with audiofeatures

def playlist_tracks(id):
    # api call
    plist = sp.playlist(id)
    
    #count of tracks
    total_tracks = plist['tracks']['total']
    print(total_tracks)
    #emtpy list
    tracks_list= []
    #count 
    cts = 0
    
    #########################################################################
    # get artist genre follower
    
    
    def artist_get(artist_id):
 
        artistget =sp.artist(artist_id)
        gen_artist = artistget['genres']
        lens = len(artistget['genres'])
        followers_a = artistget['followers']['total']
        #print(gen_artist)
        
      
        if lens >= 3:
            genre1 = gen_artist[0]
            genre2 = gen_artist[1]
            genre3 = gen_artist[2]
            #return genre1,genre2,genre3
        if lens == 2:
            genre1 = gen_artist[0]
            genre2 = gen_artist[1]
            genre3 = np.nan
            #return genre1,genre2,genre3
        if lens == 1:
            genre1 = gen_artist[0]
            genre2 = np.nan
            genre3 = np.nan
            #return genre1,genre2,genre3
        else:
            print('wrong genre get')
        return genre1,genre2,genre3,followers_a       
    ##################################################################################################
    #audiofeatures function from spotify API
    
    def audiof(id):
        feat=sp.audio_features(id)
        return feat
    ###################################################################################################
    # join strings function for concenate different artist to one cell
    def tostr(data, sep):
       # Join all the strings in list
        string = sep.join(data)
        return string
    
    ##################################################################################################
    # getting plays from selenium
    def plays_get(artist_id):
    
        try:
        #start web driver
            driver= webdriver.Chrome(executable_path=r"C:\Users\name1\Downloads\chromedriver_win32\chromedriver.exe",chrome_options=options) #Path to Chrome Driver

            url = 'https://open.spotify.com'
            #dynamic driver call
            url1 =url+'/artist/'+ artist_id
            driver.get(url1)
            time.sleep(2)
        except:
            print('driver or url wrong')
        try:
            #click cockies agreement
            driver.find_element_by_id('onetrust-accept-btn-handler').click()
            #time.sleep(10)
            #click "show more " to get all top track
            driver.find_element_by_class_name('_8eaf9c7e0fab279ec19ca81d822db2ad-scss').click()
        except:
            print('selenium clicks are wrong')
        
        # get top tracks names
        track_names_driver=driver.find_elements_by_class_name('da0bc4060bb1bdb4abb8e402916af32e-scss')
        # get top track playcount
        playcount = driver.find_elements_by_class_name("d47b790d001ed769adcd9ddfc0e83acc-scss")
        #time.sleep(1)
    
        #get playcounts
            #print(' selenium didnt find top tracks element in webpage')
    
        listr = driver.find_elements_by_class_name("_85aaee9fc23ca61102952862a10b544c-scss")
        countsonplays1 = []
        ctl11 = 0
        for i in range(10):
             
            nms = track_names_driver[ctl11].text
            print('selenium',nms)
            print('api',title)
            
        
            try:
                if title in nms:
                    f = playcount[ctl11].text
                    countsonplays1.append(f)
                    print(f)
                    #return f
                    ctl11 = 0
                    break
                else:
                    f = np.nan
                    #f = np.nan
                    print('plays not grabbed for', nms)
                    #return e
                    ctl11 = ctl11+1
            

            except:
                f = np.nan
                
        return f,#listr.text
            
            #ctl11 = ctl11+1
    
        
    
       
        #return f
    
    ####################################################################################################################
    
    ##################################################################################################################
    # playlists grabber from Spotify API
    # produces variables for selenium scraper, title
    #for testint ONLY , ONLY 5 API calls
   # for i in range(2):
    for i in range(total_tracks):
        print('iteration: ',i+1,plist['tracks']['items'][cts]['track']['artists'][0]['name'] )
        
       
        # playlist information
        playlistname = plist['name']
        descr = plist['description']
        #id, track title, popularity, artist, explicit
        ids = plist['tracks']['items'][cts]['track']['id']
        title = plist['tracks']['items'][cts]['track']['name']
        
        
        album = plist['tracks']['items'][cts]['track']['album']['name']
        pop = plist['tracks']['items'][cts]['track']['popularity']
        artist = plist['tracks']['items'][cts]['track']['artists'][0]['name']
        artist_id = plist['tracks']['items'][cts]['track']['artists'][0]['id']
        expl = plist['tracks']['items'][cts]['track']['explicit']
        
        
        #get play per function
        playscounts = str(plays_get(artist_id)[0]).replace('.',"")
        #listeners = plays_get(artist_id)[1]
        ######################################################################
   
        ######################################################################
        # if more than 1 artist
        lens = len(plist['tracks']['items'][cts]['track']['artists'])
        if lens >= 2:
            ct1 = 0
            artist1 =[]
            for x in range(lens):
                artist = plist['tracks']['items'][cts]['track']['artists'][ct1]['name']
                artist1.append(artist)
                ct1= ct1+1
        else:
            artist1=[]
            artist2 = plist['tracks']['items'][cts]['track']['artists'][0]['name']
            artist1.append(artist2)
        ######################################################################
        #audio features
        feats = audiof(ids)[0]
        dur = feats['duration_ms']
        dance =feats['danceability']
        energy =feats['energy']
        key = feats['key']
        loud= feats['loudness']
        mode = feats['mode']
        speech = feats['speechiness']
        acoustic = feats['acousticness']
        inst = feats['instrumentalness']
        live = feats['liveness']
        val = feats['valence']
        tempo = feats['tempo']
        sig = feats['time_signature']
        
        
        
        
        #build the later dataframe
        tracks_list.append({
                    'playlist':str(playlistname),
                    'description':str(descr),
                    'track_id':str(ids),
                    'title':str(title),
                    #get plays from selenium, deleting point seperator for clean integer
                    'plays':playscounts,
                    'album':str(album),
                    'artist/s':tostr(artist1,", "),
                    'artist_id':artist_id,
                    'popularity_track':int(pop),
                    #'listeners':str(listeners),
            
                    #objectrom get_artist function
                    'genre1': artist_get(artist_id)[0],
                    'genre2':artist_get(artist_id)[1],
                    'genre3':artist_get(artist_id)[2],
                    'artist_followers':artist_get(artist_id)[3],
            
            
            
                    'explicit':int(expl),
                    'duration':int(dur),
                    'danceability':dance,
                    'energy':energy,
                    'key':key,
                    'loudness':loud,
                    'mode':mode,
                    'speechiness':speech,
                    'acousticness':acoustic,
                    'instrumentalness':inst,
                    'liveness':live,
                    'valence':val,
                    'tempo':tempo,
                    'time_signature':sig
                                        
                   })
        print(' ')
        #increase count
        cts = cts+1
        driver.quit()
    df = pd.DataFrame(tracks_list)
    return df
    

## Test Code

Below is the test code. It runs and gets 5 results.

Final call:playlist_tracks_test(id)


In [68]:
###!!!!!!!!TEST

#function for get playlist tracks with audiofeatures

def playlist_tracks_test(id):
    # api call
    plist = sp.playlist(id)
    
    #count of tracks
    total_tracks = plist['tracks']['total']
    print(total_tracks)
    #emtpy list
    tracks_list= []
    #count 
    cts = 0
    
    #########################################################################
    # get artist genre follower
    
    
    def artist_get(artist_id):
 
        artistget =sp.artist(artist_id)
        gen_artist = artistget['genres']
        lens = len(artistget['genres'])
        followers_a = artistget['followers']['total']
        #print(gen_artist)
        
      
        if lens >= 3:
            genre1 = gen_artist[0]
            genre2 = gen_artist[1]
            genre3 = gen_artist[2]
            #return genre1,genre2,genre3
        if lens == 2:
            genre1 = gen_artist[0]
            genre2 = gen_artist[1]
            genre3 = np.an
            #return genre1,genre2,genre3
        if lens == 1:
            genre1 = gen_artist[0]
            genre2 = np.nan
            genre3 = np.nan
            #return genre1,genre2,genre3
        else:
            print('wrong genre get')
        return genre1,genre2,genre3,followers_a       
    ##################################################################################################
    #audiofeatures function from spotify API
    
    def audiof(id):
        feat=sp.audio_features(id)
        return feat
    ###################################################################################################
    # join strings function for concenate different artist to one cell
    def tostr(data, sep):
       # Join all the strings in list
        string = sep.join(data)
        return string
    
    ##################################################################################################
    # getting plays from selenium
    def plays_get(artist_id):
    
        try:
        #start web driver
            driver= webdriver.Chrome(executable_path=r"C:\Users\name1\Downloads\chromedriver_win32\chromedriver.exe",chrome_options=options) #Path to Chrome Driver

            url = 'https://open.spotify.com'
            #dynamic driver call
            url1 =url+'/artist/'+ artist_id
            driver.get(url1)
            time.sleep(2)
        except:
            print('driver or url wrong')
        try:
            #click cockies agreement
            driver.find_element_by_id('onetrust-accept-btn-handler').click()
            #time.sleep(10)
            #click "show more " to get all top track
            driver.find_element_by_class_name('_8eaf9c7e0fab279ec19ca81d822db2ad-scss').click()
        except:
            print('selenium clicks are wrong')
        
        # get top tracks names
        track_names_driver=driver.find_elements_by_class_name('da0bc4060bb1bdb4abb8e402916af32e-scss')
        # get top track playcount
        playcount = driver.find_elements_by_class_name("d47b790d001ed769adcd9ddfc0e83acc-scss")
        #time.sleep(1)
    
        #get playcounts
            #print(' selenium didnt find top tracks element in webpage')
    
        listr = driver.find_elements_by_class_name("_85aaee9fc23ca61102952862a10b544c-scss")
        countsonplays1 = []
        ctl11 = 0
        for i in range(10):
             
            nms = track_names_driver[ctl11].text
            print('selenium',nms)
            print('api',title)
            
        
            try:
                if title in nms:
                    f = playcount[ctl11].text
                    countsonplays1.append(f)
                    print(f)
                    #return f
                    ctl11 = 0
                    break
                else:
                    #e = 'NaN'
                    print('plays not grabbed for', nms)
                    #return e
                    ctl11 = ctl11+1
            

            except:
                e = 'NaN'
                #return e
        return f,#listr.text
            
            #ctl11 = ctl11+1
    
        
    
       
        #return f
    
    ####################################################################################################################
    
    ##################################################################################################################
    # playlists grabber from Spotify API
    # produces variables for selenium scraper, title
    #for testint ONLY , ONLY 5 API calls
    for i in range(2):
    #for i in range(total_tracks):
        print('iteration: ',i+1,plist['tracks']['items'][cts]['track']['artists'][0]['name'] )
        
       
        # playlist information
        playlistname = plist['name']
        descr = plist['description']
        #id, track title, popularity, artist, explicit
        ids = plist['tracks']['items'][cts]['track']['id']
        title = plist['tracks']['items'][cts]['track']['name']
        
        
        album = plist['tracks']['items'][cts]['track']['album']['name']
        pop = plist['tracks']['items'][cts]['track']['popularity']
        artist = plist['tracks']['items'][cts]['track']['artists'][0]['name']
        artist_id = plist['tracks']['items'][cts]['track']['artists'][0]['id']
        expl = plist['tracks']['items'][cts]['track']['explicit']
        
        
        #get play per function
        playscounts = str(plays_get(artist_id)[0]).replace('.',"")
        #listeners = plays_get(artist_id)[1]
        ######################################################################
   
        ######################################################################
        # if more than 1 artist
        lens = len(plist['tracks']['items'][cts]['track']['artists'])
        if lens >= 2:
            ct1 = 0
            artist1 =[]
            for x in range(lens):
                artist = plist['tracks']['items'][cts]['track']['artists'][ct1]['name']
                artist1.append(artist)
                ct1= ct1+1
        else:
            artist1=[]
            artist2 = plist['tracks']['items'][cts]['track']['artists'][0]['name']
            artist1.append(artist2)
        ######################################################################
        #audio features
        feats = audiof(ids)[0]
        dur = feats['duration_ms']
        dance =feats['danceability']
        energy =feats['energy']
        key = feats['key']
        loud= feats['loudness']
        mode = feats['mode']
        speech = feats['speechiness']
        acoustic = feats['acousticness']
        inst = feats['instrumentalness']
        live = feats['liveness']
        val = feats['valence']
        tempo = feats['tempo']
        sig = feats['time_signature']
        
        
        
        
        #build the later dataframe
        tracks_list.append({
                    'playlist':str(playlistname),
                    'description':str(descr),
                    'track_id':str(ids),
                    'title':str(title),
                    #get plays from selenium, deleting point seperator for clean integer
                    'plays':playscounts,
                    'album':str(album),
                    'artist/s':tostr(artist1,", "),
                    'artist_id':artist_id,
                    'popularity_track':int(pop),
                    #'listeners':str(listeners),
            
                    #objectrom get_artist function
                    'genre1': artist_get(artist_id)[0],
                    'genre2':artist_get(artist_id)[1],
                    'genre3':artist_get(artist_id)[2],
                    'artist_followers':artist_get(artist_id)[3],
            
            
            
                    'explicit':int(expl),
                    'duration':int(dur),
                    'danceability':dance,
                    'energy':energy,
                    'key':key,
                    'loudness':loud,
                    'mode':mode,
                    'speechiness':speech,
                    'acousticness':acoustic,
                    'instrumentalness':inst,
                    'liveness':live,
                    'valence':val,
                    'tempo':tempo,
                    'time_signature':sig
                                        
                   })
        print(' ')
        #increase count
        cts = cts+1
    
    df = pd.DataFrame(tracks_list)
    return df
    driver.quit()

In [69]:
best_tracks_ger1 = playlist_tracks_test('37i9dQZF1DX4HROODZmf5u')
best_tracks_ger1.head()

50
iteration:  1 The Weeknd




selenium Blinding Lights
api Blinding Lights
1.747.226.514
wrong genre get
wrong genre get
wrong genre get
wrong genre get
 
iteration:  2 Apache 207
selenium Roller
api Roller
235.689.262
 


Unnamed: 0,playlist,description,track_id,title,plays,album,artist/s,artist_id,popularity_track,genre1,genre2,genre3,artist_followers,explicit,duration,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,time_signature
0,Top Tracks 2020 Deutschland,Die meistgestreamten Tracks 2020 in Deutschland. Cover: The Weeknd,0VjIjW4GlUZAMYd2vXMi3b,Blinding Lights,1747226514,After Hours,The Weeknd,1Xyo4u8uXC1ZmMpatF05PJ,97,canadian contemporary r&b,canadian pop,pop,26634329,0,200040,0.514,0.73,1,-5.934,1,0.0598,0.00146,9.5e-05,0.0897,0.334,171.005,4
1,Top Tracks 2020 Deutschland,Die meistgestreamten Tracks 2020 in Deutschland. Cover: The Weeknd,6hw1Sy9wZ8UCxYGdpKrU6M,Roller,235689262,Platte,Apache 207,1qQLhymHXFPtP5U8KNKsm6,73,german hip hop,,,1322493,1,157093,0.941,0.758,10,-6.47,0,0.17,0.0256,0.00258,0.193,0.683,128.017,4


In [None]:
 
# most streamed artists in germany, male and female
#https://open.spotify.com/playlist/37i9dQZF1DWTdV9tXbHOAv?si=A9OeQAOuSmWhOki7-kdiqg
most_artist_ger_test= playlist_tracks_test('37i9dQZF1DWTdV9tXbHOAv')



In [None]:
most_artist_ger_test.head()

In [None]:
most_artist_ger_test.info()

In [None]:
most_artist_ger.head(5)

## Getting the data

All datasets can be build up from scratch by calling the functions with the playlist id.

In [80]:
# top global track 2020
#https://open.spotify.com/playlist/37i9dQZF1DX7Jl5KP2eZaS?si=T0vjmXV7TbqJ6yakXhikWw

most_tracks_glob = playlist_tracks('37i9dQZF1DX7Jl5KP2eZaS')


50
iteration:  1 The Weeknd




selenium Blinding Lights
api Blinding Lights
1.747.226.514
wrong genre get
wrong genre get
wrong genre get
wrong genre get
 
iteration:  2 Tones And I
selenium Dance Monkey
api Dance Monkey
1.986.783.025
 
iteration:  3 Roddy Ricch
selenium The Box
api The Box
1.084.828.362
wrong genre get
wrong genre get
wrong genre get
wrong genre get
 
iteration:  4 SAINt JHN
selenium Roses - Imanbek Remix
api Roses - Imanbek Remix
1.080.664.694
wrong genre get
wrong genre get
wrong genre get
wrong genre get
 
iteration:  5 Dua Lipa
selenium UN DIA (ONE DAY) (Feat. Tainy)
api Don't Start Now
plays not grabbed for UN DIA (ONE DAY) (Feat. Tainy)
selenium Prisoner (feat. Dua Lipa)
api Don't Start Now
plays not grabbed for Prisoner (feat. Dua Lipa)
selenium Levitating (feat. DaBaby)
api Don't Start Now
plays not grabbed for Levitating (feat. DaBaby)
selenium Don't Start Now
api Don't Start Now
1.232.173.577
wrong genre get
wrong genre get
wrong genre get
wrong genre get
 
iteration:  6 DaBaby
selenium R

wrong genre get
wrong genre get
wrong genre get
wrong genre get
 
iteration:  33 BTS
selenium Life Goes On
api Dynamite
plays not grabbed for Life Goes On
selenium Dynamite
api Dynamite
519.279.843
wrong genre get
wrong genre get
wrong genre get
wrong genre get
 
iteration:  34 BENEE
selenium Supalonely (feat. Gus Dapperton)
api Supalonely (feat. Gus Dapperton)
475.537.409
wrong genre get
wrong genre get
wrong genre get
wrong genre get
 
iteration:  35 Surf Mesa
selenium ily (i love you baby) (feat. Emilee)
api ily (i love you baby) (feat. Emilee)
474.462.784
 
iteration:  36 Lady Gaga
selenium Shallow
api Rain On Me (with Ariana Grande)
plays not grabbed for Shallow
selenium Rain On Me (with Ariana Grande)
api Rain On Me (with Ariana Grande)
467.375.493
wrong genre get
wrong genre get
wrong genre get
wrong genre get
 
iteration:  37 Travis Scott
selenium goosebumps
api HIGHEST IN THE ROOM
plays not grabbed for goosebumps
selenium HIGHEST IN THE ROOM
api HIGHEST IN THE ROOM
789.747.737

In [82]:
most_tracks_glob.head(50)

Unnamed: 0,playlist,description,track_id,title,plays,album,artist/s,artist_id,popularity_track,genre1,genre2,genre3,artist_followers,explicit,duration,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,time_signature
0,Top Tracks of 2020,The Global Top Tracks of 2020. Cover: The Weeknd,0VjIjW4GlUZAMYd2vXMi3b,Blinding Lights,1747226514.0,After Hours,The Weeknd,1Xyo4u8uXC1ZmMpatF05PJ,97,canadian contemporary r&b,canadian pop,pop,26634329,0,200040,0.514,0.73,1,-5.934,1,0.0598,0.00146,9.5e-05,0.0897,0.334,171.005,4
1,Top Tracks of 2020,The Global Top Tracks of 2020. Cover: The Weeknd,1rgnBhdG2JDFTbYkYRZAku,Dance Monkey,1986783025.0,Dance Monkey,Tones And I,2NjfBq1NflQcKSeiDooVjY,71,australian pop,,,2435056,0,209755,0.825,0.593,6,-6.401,0,0.0988,0.688,0.000161,0.17,0.54,98.078,4
2,Top Tracks of 2020,The Global Top Tracks of 2020. Cover: The Weeknd,0nbXyq5TXYPCO7pr3N8S4I,The Box,1084828362.0,Please Excuse Me For Being Antisocial,Roddy Ricch,757aE44tKEUQEqRuT6GnEB,90,melodic rap,rap,trap,4991249,1,196653,0.896,0.586,10,-6.687,0,0.0559,0.104,0.0,0.79,0.642,116.971,4
3,Top Tracks of 2020,The Global Top Tracks of 2020. Cover: The Weeknd,2Wo6QQD1KMDWeFkkjLqwx5,Roses - Imanbek Remix,1080664694.0,Roses (Imanbek Remix),"SAINt JHN, Imanbek",0H39MdGGX6dbnnQPt6NQkZ,76,melodic rap,pop rap,rap,597487,1,176219,0.785,0.721,8,-5.457,1,0.0506,0.0149,0.00432,0.285,0.894,121.962,4
4,Top Tracks of 2020,The Global Top Tracks of 2020. Cover: The Weeknd,3PfIrDoz19wz7qK7tYeu62,Don't Start Now,1232173577.0,Future Nostalgia,Dua Lipa,6M2wZ9GZgrQXHCFfjv46we,85,dance pop,pop,pop dance,21383592,0,183290,0.793,0.793,11,-4.521,0,0.083,0.0123,0.0,0.0951,0.679,123.95,4
5,Top Tracks of 2020,The Global Top Tracks of 2020. Cover: The Weeknd,7ytR5pFWmSjzHJIeQkgog4,ROCKSTAR (feat. Roddy Ricch),903523389.0,BLAME IT ON BABY,"DaBaby, Roddy Ricch",4r63FhuTkUYltbVAg5TQnk,93,north carolina hip hop,rap,,5143332,1,181733,0.746,0.69,11,-7.956,1,0.164,0.247,0.0,0.101,0.497,89.977,4
6,Top Tracks of 2020,The Global Top Tracks of 2020. Cover: The Weeknd,6UelLqGlWMcVH1E5c4H7lY,Watermelon Sugar,936927979.0,Fine Line,Harry Styles,6KImCVD70vtIoJWnq6nGn3,93,pop,post-teen pop,,11155234,0,174000,0.548,0.816,0,-4.209,1,0.0465,0.122,0.0,0.335,0.557,95.39,4
7,Top Tracks of 2020,The Global Top Tracks of 2020. Cover: The Weeknd,7eJMfftS33KTjuF7lTsMCx,death bed (coffee for your head),807108257.0,death bed (coffee for your head),"Powfu, beabadoobee",6bmlMHgSheBauioMgKv2tn,90,sad rap,,,700959,0,173333,0.726,0.431,8,-8.765,0,0.135,0.731,0.0,0.696,0.348,144.026,4
8,Top Tracks of 2020,The Global Top Tracks of 2020. Cover: The Weeknd,2rRJrJEo19S2J82BDsQ3F7,Falling,947758562.0,Nicotine,Trevor Daniel,7uaIm6Pw7xplS8Dy06V6pT,79,alternative r&b,melodic rap,pop rap,526921,0,159382,0.784,0.43,10,-8.756,0,0.0364,0.123,0.0,0.0887,0.236,127.087,4
9,Top Tracks of 2020,The Global Top Tracks of 2020. Cover: The Weeknd,7qEHsqek33rTcFNT9PFqLf,Someone You Loved,1656458655.0,Divinely Uninspired To A Hellish Extent,Lewis Capaldi,4GNC7GD6oZMSxPGyXy4MNB,91,pop,uk pop,,6088447,0,182161,0.501,0.405,1,-5.679,1,0.0319,0.751,0.0,0.105,0.446,109.891,4


# Exporting Datasets

Every spotify playlist was converted to a dataframe. 
This dataframes get stored as a csv, for every playlist.

In [None]:
import seaborn as sb
import matplotlib.pyplot as plt
plt.style.use('dark_background')
sb.barplot(data=most_artist_ger,x='popularity',y='artist/s',color='lightgreen')


### grabbing

In [None]:
# top global track 2020
#https://open.spotify.com/playlist/37i9dQZF1DX7Jl5KP2eZaS?si=T0vjmXV7TbqJ6yakXhikWw

most_tracks_glob = playlist_tracks('37i9dQZF1DX7Jl5KP2eZaS')


Unnamed: 0,playlist,description,track_id,title,plays,album,artist/s,artist_id,popularity_track,genre1,genre2,genre3,artist_followers,explicit,duration,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,time_signature
0,Top Tracks of 2020,The Global Top Tracks of 2020. Cover: The Weeknd,0VjIjW4GlUZAMYd2vXMi3b,Blinding Lights,1747226514.0,After Hours,The Weeknd,1Xyo4u8uXC1ZmMpatF05PJ,97,canadian contemporary r&b,canadian pop,pop,26634329,0,200040,0.514,0.73,1,-5.934,1,0.0598,0.00146,9.5e-05,0.0897,0.334,171.005,4
1,Top Tracks of 2020,The Global Top Tracks of 2020. Cover: The Weeknd,1rgnBhdG2JDFTbYkYRZAku,Dance Monkey,1986783025.0,Dance Monkey,Tones And I,2NjfBq1NflQcKSeiDooVjY,71,australian pop,,,2435056,0,209755,0.825,0.593,6,-6.401,0,0.0988,0.688,0.000161,0.17,0.54,98.078,4
2,Top Tracks of 2020,The Global Top Tracks of 2020. Cover: The Weeknd,0nbXyq5TXYPCO7pr3N8S4I,The Box,1084828362.0,Please Excuse Me For Being Antisocial,Roddy Ricch,757aE44tKEUQEqRuT6GnEB,90,melodic rap,rap,trap,4991249,1,196653,0.896,0.586,10,-6.687,0,0.0559,0.104,0.0,0.79,0.642,116.971,4
3,Top Tracks of 2020,The Global Top Tracks of 2020. Cover: The Weeknd,2Wo6QQD1KMDWeFkkjLqwx5,Roses - Imanbek Remix,1080664694.0,Roses (Imanbek Remix),"SAINt JHN, Imanbek",0H39MdGGX6dbnnQPt6NQkZ,76,melodic rap,pop rap,rap,597487,1,176219,0.785,0.721,8,-5.457,1,0.0506,0.0149,0.00432,0.285,0.894,121.962,4
4,Top Tracks of 2020,The Global Top Tracks of 2020. Cover: The Weeknd,3PfIrDoz19wz7qK7tYeu62,Don't Start Now,1232173577.0,Future Nostalgia,Dua Lipa,6M2wZ9GZgrQXHCFfjv46we,85,dance pop,pop,pop dance,21383592,0,183290,0.793,0.793,11,-4.521,0,0.083,0.0123,0.0,0.0951,0.679,123.95,4
5,Top Tracks of 2020,The Global Top Tracks of 2020. Cover: The Weeknd,7ytR5pFWmSjzHJIeQkgog4,ROCKSTAR (feat. Roddy Ricch),903523389.0,BLAME IT ON BABY,"DaBaby, Roddy Ricch",4r63FhuTkUYltbVAg5TQnk,93,north carolina hip hop,rap,,5143332,1,181733,0.746,0.69,11,-7.956,1,0.164,0.247,0.0,0.101,0.497,89.977,4
6,Top Tracks of 2020,The Global Top Tracks of 2020. Cover: The Weeknd,6UelLqGlWMcVH1E5c4H7lY,Watermelon Sugar,936927979.0,Fine Line,Harry Styles,6KImCVD70vtIoJWnq6nGn3,93,pop,post-teen pop,,11155234,0,174000,0.548,0.816,0,-4.209,1,0.0465,0.122,0.0,0.335,0.557,95.39,4
7,Top Tracks of 2020,The Global Top Tracks of 2020. Cover: The Weeknd,7eJMfftS33KTjuF7lTsMCx,death bed (coffee for your head),807108257.0,death bed (coffee for your head),"Powfu, beabadoobee",6bmlMHgSheBauioMgKv2tn,90,sad rap,,,700959,0,173333,0.726,0.431,8,-8.765,0,0.135,0.731,0.0,0.696,0.348,144.026,4
8,Top Tracks of 2020,The Global Top Tracks of 2020. Cover: The Weeknd,2rRJrJEo19S2J82BDsQ3F7,Falling,947758562.0,Nicotine,Trevor Daniel,7uaIm6Pw7xplS8Dy06V6pT,79,alternative r&b,melodic rap,pop rap,526921,0,159382,0.784,0.43,10,-8.756,0,0.0364,0.123,0.0,0.0887,0.236,127.087,4
9,Top Tracks of 2020,The Global Top Tracks of 2020. Cover: The Weeknd,7qEHsqek33rTcFNT9PFqLf,Someone You Loved,1656458655.0,Divinely Uninspired To A Hellish Extent,Lewis Capaldi,4GNC7GD6oZMSxPGyXy4MNB,91,pop,uk pop,,6088447,0,182161,0.501,0.405,1,-5.679,1,0.0319,0.751,0.0,0.105,0.446,109.891,4


In [None]:
most_tracks_glob.info()

In [83]:
most_tracks_glob.to_csv('most_streamed_tracks2020_global.csv',index=False)

__________________________


In [None]:
plt.figure(figsize=(10,30))
sb.barplot(data=most_tracks_glob,y='title',x='popularity',color='lightgreen')


In [None]:
# most streamed tracks2020 germany
most_tracks_ger = playlist_tracks('37i9dQZF1DX4HROODZmf5u')




In [None]:
most_tracks_ger.head(10)

In [None]:
most_artist_ger.to_csv('most_streamed_artist2020_germany.csv',index=False)

In [None]:

#spotify:playlist:37i9dQZF1DX4HROODZmf5u

most_tracks_ger.to_csv('most_streamed_tracks2020_germany.csv',index=False)

In [None]:
"""# spotify self playlists
cts=0
spotys = sp.user_playlists('spotify',limit=50)

for i in range(100):
    lists=spotys['items'][cts]['name']
    followers=spotys['items']
    print(lists)
    cts=cts+1
"""

In [None]:
#https://open.spotify.com/playlist/37i9dQZF1DWXgY89J4Sjdb?si=wA30zNdCSOqmMvi7fgmHKQ
#top artists 2020 globally

most_artist_glob =playlist_tracks('37i9dQZF1DWXgY89J4Sjdb')
most_artist_glob.to_csv('most_streamed_artist2020_global.csv',index=False)

In [None]:
most_artist_glob.head(60)