## Lab | Web Scraping Single Page

### Instructions - Scraping popular songs
Your product will take a song as input from the user and output another song (the recommendation). In most cases, the recommended song should be similar to the input song, but the CTO thinks that if the input song is currently on the top charts, the user will prefer a recommendation that is also popular at the moment.

You have to find data on the internet about currently popular songs. Billboard maintains a weekly Top 100 of "hot" songs here: https://www.billboard.com/charts/hot-100.

It's a good place to start! Scrape the current top 100 songs and their respective artists, and put the information into a pandas dataframe.

### Hot Songs (Top 100 this week)

In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import random
from random import randint
from time import sleep
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import datetime
import time

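# login info for Spotify API saved in local file, one "key:value" pair per line
with open(r"C:\Users\luana\secrets.txt", "r") as secrets_file:
    string = secrets_file.read()

secrets_dict = {}
for line in string.split('\n'):
    if len(line) > 0:
        # split only on the first colon so values containing ':' stay intact
        key, value = line.split(':', 1)
        secrets_dict[key.strip()] = value.strip()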

sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=secrets_dict['cid'],
                                                           client_secret=secrets_dict['secret']))

In [2]:
# Note: this URL points to the PopVortex top-100 chart rather than the Billboard page linked above
url = 'https://www.popvortex.com/music/charts/top-100-songs.php'

In [3]:
response = requests.get(url)
response.status_code # 200 status code means OK!

200

In [4]:
soup = BeautifulSoup(response.content, "html.parser")

In [5]:
# soup

In [5]:
# extracting lists
song_list = soup.select("p > cite")
artist_list = soup.select("p > em")

song = []
artist = []

# iterate through both result sets in parallel and retrieve all the data
for song_tag, artist_tag in zip(song_list, artist_list):
    song.append(song_tag.get_text())
    artist.append(artist_tag.get_text())

print(song)
print(artist)

['Heart Like A Truck', 'Unholy', 'Anti-Hero', 'No Horse To Ride', 'Bring Me to Life', 'wait in the truck', 'Rockstar', 'Tishomingo', 'Ghost', 'Made You Look', 'Kryptonite', 'Thank God', 'Unstoppable', "I Ain't Worried", 'Thought You Should Know', 'Son Of A Sinner', "Something in the Orange (Z&E's Version)", "I'm Good (Blue)", 'Iris', "it's been a year", 'Lift Me Up (From Black Panther: Wakanda Forever - Music From and Inspired By)', 'Have You Ever Seen the Rain', 'Sweet Home Alabama', 'As It Was', 'Here Without You', 'Rock and a Hard Place', "What's Up?", "Livin' On a Prayer", 'Far Away', 'Erbody But Me (feat. Bizzy & Krizz Kaliko)', 'Players', 'Cold Heart (PNAU Remix)', 'Wasted On You', 'One Thing At A Time', 'CUFF IT', 'All the Small Things', 'Watermelon Moonshine', 'You Proof', 'How You Remind Me', 'Hold My Hand', "Creepin'", 'Photograph', 'OMG', 'Motorcycle Drive By', 'Mr. Jones', 'Daydream', 'Goo Goo Muck', 'Way of the Triune God (Hallelujah Version)', 'Fall In Love', 'Renegade', 

In [6]:
#creating dataframe

hot_songs = pd.DataFrame({"song":song,
                       "artist":artist
                      })
hot_songs.head(50)

Unnamed: 0,song,artist
0,Heart Like A Truck,Lainey Wilson
1,Unholy,Sam Smith & Kim Petras
2,Anti-Hero,Taylor Swift
3,No Horse To Ride,Luke Grimes
4,Bring Me to Life,Evanescence
5,wait in the truck,HARDY & Lainey Wilson
6,Rockstar,Nickelback
7,Tishomingo,Zach Bryan
8,Ghost,Tom MacDonald
9,Made You Look,Meghan Trainor


## Lab | Web Scraping Multiple Pages

### Instructions - Scraping popular songs

#### Prioritize the MVP
In the previous lab, you had to scrape data about "hot songs". It's critical to be on track with that part, as it was part of the request from the CTO.

If you couldn't finish the first lab, use this time to go back there.

#### Expand the project
If you're done, you can try to expand the project on your own. Here are a few suggestions:

- Find other lists of hot songs on the internet and scrape them too: having a bigger pool of songs will be awesome!
- Apply the same logic to other "groups" of songs: the best songs from a decade or from a country / culture / language / genre.
- Wikipedia maintains a large collection of lists of songs: https://en.wikipedia.org/wiki/Lists_of_songs

### Top hits by year

In [8]:
# I wanna do the same thing for several years, so I will create a function

def url_text(year):
    return f"https://playback.fm/charts/top-100-songs/{year}"

url2 = url_text(2021)

In [9]:
response2 = requests.get(url2)
response2.status_code # 200 status code means OK!

soup2 = BeautifulSoup(response2.content, "html.parser")
# soup2

In [10]:
print(soup2.select('.song')[0].get_text())
print(soup2.select('.song')[1].get_text())
print(soup2.select('.song')[198].get_text())
print(soup2.select('.song')[199].get_text())
# each song is coming up twice, so I will have to pay attention to the indexes



                       Levitating
                       

Levitating


                       Thot Shit
                       

Thot Shit


In [11]:
soup2.select('.artist')[0].get_text()
soup2.select('.artist')[99].get_text() #this will need cleaning

'\n                   Megan Thee Stallion\n                   '
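
The two quirks seen above (every `.song` match appearing twice, and text padded with newlines and spaces) can be handled with slicing and `str.strip()`. Below is a minimal sketch on hard-coded strings that imitate the scraped text; the helper name `clean_chart_text` and the sample lists are illustrative, not part of the page or of BeautifulSoup.

```python
def clean_chart_text(raw_songs, raw_artists):
    """Return (songs, artists) with duplicates skipped and whitespace removed.

    Assumes, as observed above, that each song shows up twice in a row,
    so keeping every second entry (the odd indexes) is enough.
    """
    songs = [text.strip() for text in raw_songs[1::2]]
    artists = [text.strip() for text in raw_artists]
    return songs, artists


# Hypothetical sample imitating the get_text() results of the selected tags
raw_songs = ["\n   Levitating\n   ", "\nLevitating\n",
             "\n   Thot Shit\n   ", "\nThot Shit\n"]
raw_artists = ["\n   Dua Lipa & DaBaby\n   ", "\n   Megan Thee Stallion\n   "]

songs, artists = clean_chart_text(raw_songs, raw_artists)
print(songs)
print(artists)
```

Using `strip()` instead of `replace('\n', '')` also removes the spaces that surround each name, which matters later when the names are used in search queries.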

In [12]:
#Now that I know how to find the data and what the outcome is, I can create a function

def pop_songs_by_year(year):
    # start with an empty frame so there is always something to return,
    # even when the request or the parsing fails
    top_songs = pd.DataFrame(columns=["song", "artist", "year"])

    try:
        url = f"https://playback.fm/charts/top-100-songs/{year}"
        response = requests.get(url)
        soup = BeautifulSoup(response.content, "html.parser")

        songs_list = soup.select('.song')
        artists_list = soup.select('.artist')

        # each song comes up twice, so keep only the odd indexes
        song = [tag.get_text().strip() for tag in songs_list[1::2]]
        # strip the newlines and padding around each artist name
        artist = [tag.get_text().strip() for tag in artists_list]

        top_songs = pd.DataFrame({"song": song,
                                  "artist": artist})
        top_songs['year'] = str(year)

    except Exception as error:
        print(f"Could not scrape {year}: {error}")

    # respectful pause between requests
    sleep(randint(1, 4))

    return top_songs

### !!! Careful with next cell! Long running time!!! 

In [13]:
# # Scraping the 100 most popular songs for each year from 2021 back to 1971.

# songs_df = pop_songs_by_year(2021)

# for year in range(2020, 1970, -1):
#     songs_df = pd.concat([songs_df, pop_songs_by_year(year)])

In [17]:
# exporting csv so that I don't have to run it again every time I restart the notebook
songs_df.to_csv(r'C:\Users\luana\github\spotify-song-recommender\csv_files\songs_df.csv', index = False, header=True)

In [16]:
# Use next line if notebook is restarted
# songs_df= pd.read_csv(r'C:\Users\luana\github\spotify-song-recommender\csv_files\songs_df.csv')

songs_df

Unnamed: 0,song,artist,year
0,Levitating,Dua Lipa & DaBaby,2021
1,Drivers License,Olivia Rodrigo,2021
2,Save Your Tears,The Weeknd & Ariana Grande,2021
3,Montero (Call Me by Your Name),Lil Nas X,2021
4,Blinding Lights,The Weeknd,2021
...,...,...,...
5092,He's Gonna Step On You Again,John Kongos,1971
5093,Wild World,Cat Stevens,1971
5094,Love Her Madly,The Doors,1971
5095,Amazing Grace,Judy Collins,1971
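
Saving the scraped result and reloading it after a restart, as done above, can be wrapped in one helper that only runs the slow step when no saved file exists. This is a minimal sketch that uses the standard library's `csv` module instead of pandas so it runs on its own; `load_or_build`, the temporary file, and the stand-in `fake_scrape` function are all hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def load_or_build(csv_path, build_rows, fieldnames):
    """Read rows from csv_path if it exists; otherwise build and save them."""
    csv_path = Path(csv_path)
    if csv_path.exists():
        with csv_path.open(newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))

    rows = build_rows()  # the slow step, e.g. scraping every year
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return rows


calls = []


def fake_scrape():
    """Stand-in for the real scraping loop; records how often it runs."""
    calls.append(1)
    return [{"song": "Levitating", "artist": "Dua Lipa & DaBaby", "year": "2021"}]


with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "songs_df.csv"
    first = load_or_build(path, fake_scrape, ["song", "artist", "year"])
    second = load_or_build(path, fake_scrape, ["song", "artist", "year"])

print(len(calls))  # the stand-in scraper ran only on the first call
```

With a helper like this there is no need to comment cells in and out to avoid re-running the long scrape.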


### Merging hot songs this week with top songs by year

In [18]:
hot_songs["year"]=2022
hot_songs

Unnamed: 0,song,artist,year
0,Unholy,Sam Smith & Kim Petras,2022
1,Heart Like A Truck,Lainey Wilson,2022
2,it's been a year,Ashley Cooke,2022
3,Anti-Hero,Taylor Swift,2022
4,Bring Me to Life,Evanescence,2022
...,...,...,...
95,I Like You (A Happier Song) [feat. Doja Cat],Post Malone,2022
96,Tennessee Orange,Megan Moroney,2022
97,Something in the Orange,Zach Bryan,2022
98,I Won't Let Go,Rascal Flatts,2022


In [19]:
hot_top_songs = pd.concat([hot_songs, songs_df], axis=0).reset_index(drop=True)

hot_top_songs['song'] = hot_top_songs['song'].str.lower()
hot_top_songs['artist'] = hot_top_songs['artist'].str.lower()
hot_top_songs

#I think 5000 songs are enough for now :)

Unnamed: 0,song,artist,year
0,unholy,sam smith & kim petras,2022
1,heart like a truck,lainey wilson,2022
2,it's been a year,ashley cooke,2022
3,anti-hero,taylor swift,2022
4,bring me to life,evanescence,2022
...,...,...,...
5192,he's gonna step on you again,john kongos,1971
5193,wild world,cat stevens,1971
5194,love her madly,the doors,1971
5195,amazing grace,judy collins,1971


### Starting the Song Recommender

In [20]:
def song_recommender():
    user_song = input('Please enter the name of a song you like\n').strip().lower()
    if user_song in hot_top_songs['song'].tolist():
        # random.randint includes both ends, so the upper bound must be len - 1
        i = random.randint(0, len(hot_top_songs) - 1)
        print('You may also like', hot_top_songs['song'][i].title(), "by", hot_top_songs['artist'][i].title())
    else:
        print('There is no song recommendation for you now. Please try again later')

In [21]:
song_recommender()

Please enter the name of a song you like
Unholy
You may also like Forever & Ever by Demis Roussos


In [22]:
song_recommender()

Please enter the name of a song you like
wild world
You may also like Breaking The Habit by Linkin Park


In [23]:
song_recommender()

Please enter the name of a song you like
Test
There is no song recommendation for you now. Please try again later
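
The instructions ask for a popular recommendation when the input song is itself currently popular. One possible way to do that is sketched below on a small hard-coded list instead of `hot_top_songs`: rows with year 2022 are treated as the hot songs, and the song the user typed is never returned. The function name `recommend`, the sample catalog, and the year-based rule are assumptions for illustration.

```python
import random


def recommend(user_song, catalog, hot_year=2022, rng=random):
    """Pick a random (song, artist) from catalog, or None if the input is unknown.

    catalog is a list of (song, artist, year) tuples with lowercase names.
    If the input is a hot song, only other hot songs are candidates.
    """
    user_song = user_song.strip().lower()
    matches = [row for row in catalog if row[0] == user_song]
    if not matches:
        return None

    is_hot = any(year == hot_year for _, _, year in matches)
    candidates = [
        (song, artist)
        for song, artist, year in catalog
        if song != user_song and (not is_hot or year == hot_year)
    ]
    return rng.choice(candidates) if candidates else None


catalog = [
    ("unholy", "sam smith & kim petras", 2022),
    ("anti-hero", "taylor swift", 2022),
    ("wild world", "cat stevens", 1971),
    ("love her madly", "the doors", 1971),
]

print(recommend("Unholy", catalog))  # the only other 2022 song in this sample
print(recommend("test", catalog))    # unknown song, so None
```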


## Lab | API wrappers - Create your collection of songs & audio features

### Instructions

To move forward with the project, you need to create a collection of songs with their audio features - as large as possible!

These are the songs that we will cluster. And, later, when the user inputs a song, we will find the cluster to which the song belongs and recommend a song from the same cluster. The more songs you have, the more accurate and diverse recommendations you'll be able to give. Although... you might want to make sure the collected songs are "curated" in a certain way. Try to find playlists of songs that are diverse, but also that meet certain standards.

The process of sending hundreds or thousands of requests can take some time - it's normal if you have to wait a few minutes (or, if you're ambitious, even hours) to get all the data you need.

One idea for collecting as many songs as possible is to start with all the songs of a big, diverse playlist, then go to every artist present in the playlist and grab every song of every album of that artist. The number of songs you collect per playlist will grow very quickly!
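
The cluster-based recommendation described above can be sketched in a few lines once cluster centers exist. The example assumes two audio features (danceability, energy), two made-up cluster centers, and nearest-center assignment by Euclidean distance; the instructions do not fix a clustering algorithm, so every name and number here is hypothetical.

```python
import math
import random


def nearest_cluster(features, centers):
    """Return the index of the center closest to features (Euclidean distance)."""
    distances = [math.dist(features, center) for center in centers]
    return distances.index(min(distances))


def recommend_from_cluster(user_features, songs, centers, rng=random):
    """Pick a random song whose features fall in the same cluster as the input."""
    target = nearest_cluster(user_features, centers)
    same_cluster = [
        name for name, features in songs.items()
        if nearest_cluster(features, centers) == target
    ]
    return rng.choice(same_cluster) if same_cluster else None


# Made-up centers and songs: (danceability, energy)
centers = [(0.3, 0.3), (0.8, 0.8)]
songs = {
    "calm song a": (0.25, 0.35),
    "calm song b": (0.35, 0.20),
    "dance song a": (0.85, 0.90),
}

print(nearest_cluster((0.2, 0.3), centers))
print(recommend_from_cluster((0.9, 0.7), songs, centers))
```

In the real project the feature vectors would come from the audio features collected in this lab, and the centers from whichever clustering model is fitted on them.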

In [24]:
# I will slice the dataframe to use as test

df_slice = hot_top_songs[0:20]
df_slice

Unnamed: 0,song,artist,year
0,unholy,sam smith & kim petras,2022
1,heart like a truck,lainey wilson,2022
2,it's been a year,ashley cooke,2022
3,anti-hero,taylor swift,2022
4,bring me to life,evanescence,2022
5,tishomingo,zach bryan,2022
6,no horse to ride,luke grimes,2022
7,made you look,meghan trainor,2022
8,rockstar,nickelback,2022
9,wait in the truck,hardy & lainey wilson,2022


In [25]:
results = sp.search(q='unholy artist:sam smith', type='track', limit=1)
# results

In [26]:
results.keys()

dict_keys(['tracks'])

In [27]:
results['tracks'].keys()

dict_keys(['href', 'items', 'limit', 'next', 'offset', 'previous', 'total'])

In [28]:
# I am gonna use the list that I have and extract spotify information about it
# I will assume that if I search for the song and artist as listed above, the first result will be the correct song

# I will need these queries:

print(results["tracks"]["items"][0]["name"])
print(results["tracks"]["items"][0]["artists"][0]["name"])
print(results["tracks"]["items"][0]["artists"][1]["name"]) # only in case more than one artist: 
# print(results["tracks"]["items"][0]["artists"][2]["name"]) # I will scrape up to 3 in the function
print(results["tracks"]["items"][0]["popularity"])
sp.audio_features(results["tracks"]["items"][0]["id"])

Unholy (feat. Kim Petras)
Sam Smith
Kim Petras
99


[{'danceability': 0.714,
  'energy': 0.472,
  'key': 2,
  'loudness': -7.375,
  'mode': 1,
  'speechiness': 0.0864,
  'acousticness': 0.013,
  'instrumentalness': 4.51e-06,
  'liveness': 0.266,
  'valence': 0.238,
  'tempo': 131.121,
  'type': 'audio_features',
  'id': '3nqQXoyQOWXiESFLlDF1hG',
  'uri': 'spotify:track:3nqQXoyQOWXiESFLlDF1hG',
  'track_href': 'https://api.spotify.com/v1/tracks/3nqQXoyQOWXiESFLlDF1hG',
  'analysis_url': 'https://api.spotify.com/v1/audio-analysis/3nqQXoyQOWXiESFLlDF1hG',
  'duration_ms': 156943,
  'time_signature': 4}]

In [29]:
def get_songs_info(dataframe):
    #creating df with needed columns
    playlist_df = pd.DataFrame(columns=['song','artist','popularity','danceability','energy','key','loudness','mode',
                                    'speechiness','acousticness','instrumentalness','liveness','valence','tempo',
                                    'type','id','uri','track_href','analysis_url','duration_ms','time_signature'])
    
    #for loop to go through a dataframe containing names of songs and artists
    for i in range(len(dataframe)):
        try:
            # Get song's name and artist from the dataframe and generate a query with them
            # Searching with more than one artist won't generate any result, so I will take only the first word
            # Some songs have very long names and cause a 400 (Bad Request). Therefore, limiting to 50 characters
            artist_name=dataframe['artist'][i].split(' ')[0]
            query = dataframe['song'][i][:50]+" artist:"+artist_name[:50]
            results = sp.search(q=query, type='track', limit=1)
            
            # Song name, artists and popularity obtained directly with request

            song = results["tracks"]["items"][0]["name"]

            # Getting up to three different artists for each song, joined with " & "
            artist_names = [a["name"] for a in results["tracks"]["items"][0]["artists"][:3]]
            artists = " & ".join(artist_names)

            popularity = results["tracks"]["items"][0]["popularity"]

            track_dict = {'song':song, 'artist':artists, 'popularity':popularity}
            df_track = pd.DataFrame([track_dict])
            
            # Now the audio features
            song_id = results["tracks"]["items"][0]["id"]
            features_dict = sp.audio_features(song_id)
            df_features = pd.DataFrame(features_dict)
            
            # Complete df for one song
            song_df = pd.concat([df_track, df_features], axis=1)
            
            # Progress report every 25%
            if i == round((len(dataframe)*0.25)):
                print("25% complete")
            elif i == round((len(dataframe)*0.5)):
                print("50% complete")
            elif i == round((len(dataframe)*0.75)):
                print('75% complete')
            else:
                pass
            
            # Respectful nap
            wait_time = random.uniform(0.5,1)
            sleep(wait_time)
    
            # Complete df with all songs
            playlist_df = pd.concat([playlist_df, song_df],axis=0)
            
        except Exception:
            # skip songs whose search or feature lookup fails
            pass
    
    return playlist_df
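
The query rule used inside `get_songs_info` (first word of the artist, song title capped at 50 characters) can be pulled into a small helper and checked without calling the API. This is only a sketch; `build_search_query` is a hypothetical name, and the string it returns is what would be passed as `q` to `sp.search`.

```python
def build_search_query(song, artist, max_len=50):
    """Build a search string of the form '<song> artist:<first word of artist>'."""
    words = artist.split()
    first_word = words[0] if words else ""
    return f"{song[:max_len]} artist:{first_word[:max_len]}"


print(build_search_query("unholy", "sam smith & kim petras"))
# unholy artist:sam

# Artificial 80-character title to show the 50-character cap
capped = build_search_query("a" * 80, "zach bryan")
print(len(capped.split(" artist:")[0]))
# 50
```

Keeping the rule in one place makes it easier to try other variants later, such as using the full name of the first artist.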

In [30]:
%%time
playlist_slice = get_songs_info(df_slice)

25% complete
50% complete
75% complete
CPU times: total: 328 ms
Wall time: 17.8 s


In [31]:
playlist_slice

Unnamed: 0,song,artist,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,...,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,Unholy (feat. Kim Petras),Sam Smith & Kim Petras,99,0.714,0.472,2,-7.375,1,0.0864,0.013,...,0.266,0.238,131.121,audio_features,3nqQXoyQOWXiESFLlDF1hG,spotify:track:3nqQXoyQOWXiESFLlDF1hG,https://api.spotify.com/v1/tracks/3nqQXoyQOWXi...,https://api.spotify.com/v1/audio-analysis/3nqQ...,156943,4
0,Heart Like A Truck,Lainey Wilson,75,0.587,0.632,9,-5.416,1,0.0311,0.148,...,0.158,0.392,139.974,audio_features,5ZCjp56T8J6d7amJyFSzrI,spotify:track:5ZCjp56T8J6d7amJyFSzrI,https://api.spotify.com/v1/tracks/5ZCjp56T8J6d...,https://api.spotify.com/v1/audio-analysis/5ZCj...,199040,4
0,it's been a year,Ashley Cooke,61,0.364,0.431,11,-7.959,1,0.0556,0.665,...,0.339,0.391,77.28,audio_features,2untQJbPOaB1BjyZKIVuZv,spotify:track:2untQJbPOaB1BjyZKIVuZv,https://api.spotify.com/v1/tracks/2untQJbPOaB1...,https://api.spotify.com/v1/audio-analysis/2unt...,216671,4
0,Anti-Hero,Taylor Swift,97,0.637,0.643,4,-6.571,1,0.0519,0.13,...,0.142,0.533,97.008,audio_features,0V3wPSX9ygBnCm8psDIegu,spotify:track:0V3wPSX9ygBnCm8psDIegu,https://api.spotify.com/v1/tracks/0V3wPSX9ygBn...,https://api.spotify.com/v1/audio-analysis/0V3w...,200690,4
0,Bring Me To Life,Evanescence,82,0.331,0.943,4,-3.188,0,0.0698,0.00721,...,0.242,0.296,94.612,audio_features,0COqiPhxzoWICwFCS4eZcp,spotify:track:0COqiPhxzoWICwFCS4eZcp,https://api.spotify.com/v1/tracks/0COqiPhxzoWI...,https://api.spotify.com/v1/audio-analysis/0COq...,235893,4
0,Tishomingo,Zach Bryan,63,0.56,0.576,6,-6.92,1,0.03,0.309,...,0.0988,0.478,133.491,audio_features,1TsiwVwHhLgVK8sxBchINM,spotify:track:1TsiwVwHhLgVK8sxBchINM,https://api.spotify.com/v1/tracks/1TsiwVwHhLgV...,https://api.spotify.com/v1/audio-analysis/1Tsi...,188883,4
0,No Horse To Ride,Luke Grimes,63,0.674,0.334,2,-7.726,1,0.0277,0.143,...,0.11,0.352,120.972,audio_features,2NnlmzSeHsqBmgxAJXWeJ3,spotify:track:2NnlmzSeHsqBmgxAJXWeJ3,https://api.spotify.com/v1/tracks/2NnlmzSeHsqB...,https://api.spotify.com/v1/audio-analysis/2Nnl...,133874,4
0,Made You Look,Meghan Trainor,95,0.838,0.525,10,-3.562,1,0.0665,0.345,...,0.0771,0.884,144.981,audio_features,0QHEIqNKsMoOY5urbzN48u,spotify:track:0QHEIqNKsMoOY5urbzN48u,https://api.spotify.com/v1/tracks/0QHEIqNKsMoO...,https://api.spotify.com/v1/audio-analysis/0QHE...,134256,4
0,Rockstar,Nickelback,68,0.616,0.91,0,-3.004,1,0.0386,0.0459,...,0.343,0.693,144.072,audio_features,6n9yCXvLhnYMgJIiIcMu7D,spotify:track:6n9yCXvLhnYMgJIiIcMu7D,https://api.spotify.com/v1/tracks/6n9yCXvLhnYM...,https://api.spotify.com/v1/audio-analysis/6n9y...,252040,4
0,wait in the truck (feat. Lainey Wilson),HARDY & Lainey Wilson,78,0.534,0.466,6,-6.98,1,0.0272,0.277,...,0.0979,0.23,140.0,audio_features,7trjNYF5ek7zX4GKSHQZbP,spotify:track:7trjNYF5ek7zX4GKSHQZbP,https://api.spotify.com/v1/tracks/7trjNYF5ek7z...,https://api.spotify.com/v1/audio-analysis/7trj...,277660,4


### !!! Careful with next cell! Long running time!!! 

In [None]:
# Let's be bold and see what happens :D

# playlist_df = get_songs_info(hot_top_songs) # as comment to not risk running it again by accident

In [32]:
# Use next line if notebook is restarted
# playlist_df= pd.read_csv(r'C:\Users\luana\github\spotify-song-recommender\csv_files\playlist_df.csv')

playlist_df.shape

(5003, 21)

In [33]:
playlist_df.head()

Unnamed: 0,song,artist,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,...,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,Unholy (feat. Kim Petras),Sam Smith & Kim Petras,99,0.714,0.472,2.0,-7.375,1.0,0.0864,0.013,...,0.266,0.238,131.121,audio_features,3nqQXoyQOWXiESFLlDF1hG,spotify:track:3nqQXoyQOWXiESFLlDF1hG,https://api.spotify.com/v1/tracks/3nqQXoyQOWXi...,https://api.spotify.com/v1/audio-analysis/3nqQ...,156943.0,4.0
1,I'm Good (Blue),David Guetta & Bebe Rexha,98,0.561,0.965,7.0,-3.673,0.0,0.0343,0.00383,...,0.371,0.304,128.04,audio_features,4uUG5RXrOk84mYEfFvj3cK,spotify:track:4uUG5RXrOk84mYEfFvj3cK,https://api.spotify.com/v1/tracks/4uUG5RXrOk84...,https://api.spotify.com/v1/audio-analysis/4uUG...,175238.0,4.0
2,Thank God,Kane Brown & Katelyn Brown,78,0.738,0.455,3.0,-8.735,1.0,0.0352,0.695,...,0.107,0.441,99.945,audio_features,1brnLTvarI9D1hLP6z2Ar8,spotify:track:1brnLTvarI9D1hLP6z2Ar8,https://api.spotify.com/v1/tracks/1brnLTvarI9D...,https://api.spotify.com/v1/audio-analysis/1brn...,174560.0,4.0
3,wait in the truck (feat. Lainey Wilson),HARDY & Lainey Wilson,75,0.534,0.466,6.0,-6.98,1.0,0.0272,0.277,...,0.0979,0.23,140.0,audio_features,7trjNYF5ek7zX4GKSHQZbP,spotify:track:7trjNYF5ek7zX4GKSHQZbP,https://api.spotify.com/v1/tracks/7trjNYF5ek7z...,https://api.spotify.com/v1/audio-analysis/7trj...,277660.0,4.0
4,Son Of A Sinner,Jelly Roll,77,0.365,0.541,7.0,-7.489,1.0,0.0304,0.0419,...,0.281,0.329,86.041,audio_features,25VQoiuyc0HkC5FQTj1a8G,spotify:track:25VQoiuyc0HkC5FQTj1a8G,https://api.spotify.com/v1/tracks/25VQoiuyc0Hk...,https://api.spotify.com/v1/audio-analysis/25VQ...,232093.0,4.0


In [35]:
playlist_df = playlist_df.drop(columns=0)
playlist_df

Unnamed: 0,song,artist,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,...,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,Unholy (feat. Kim Petras),Sam Smith & Kim Petras,99,0.714,0.472,2.0,-7.375,1.0,0.0864,0.01300,...,0.2660,0.238,131.121,audio_features,3nqQXoyQOWXiESFLlDF1hG,spotify:track:3nqQXoyQOWXiESFLlDF1hG,https://api.spotify.com/v1/tracks/3nqQXoyQOWXi...,https://api.spotify.com/v1/audio-analysis/3nqQ...,156943.0,4.0
1,I'm Good (Blue),David Guetta & Bebe Rexha,98,0.561,0.965,7.0,-3.673,0.0,0.0343,0.00383,...,0.3710,0.304,128.040,audio_features,4uUG5RXrOk84mYEfFvj3cK,spotify:track:4uUG5RXrOk84mYEfFvj3cK,https://api.spotify.com/v1/tracks/4uUG5RXrOk84...,https://api.spotify.com/v1/audio-analysis/4uUG...,175238.0,4.0
2,Thank God,Kane Brown & Katelyn Brown,78,0.738,0.455,3.0,-8.735,1.0,0.0352,0.69500,...,0.1070,0.441,99.945,audio_features,1brnLTvarI9D1hLP6z2Ar8,spotify:track:1brnLTvarI9D1hLP6z2Ar8,https://api.spotify.com/v1/tracks/1brnLTvarI9D...,https://api.spotify.com/v1/audio-analysis/1brn...,174560.0,4.0
3,wait in the truck (feat. Lainey Wilson),HARDY & Lainey Wilson,75,0.534,0.466,6.0,-6.980,1.0,0.0272,0.27700,...,0.0979,0.230,140.000,audio_features,7trjNYF5ek7zX4GKSHQZbP,spotify:track:7trjNYF5ek7zX4GKSHQZbP,https://api.spotify.com/v1/tracks/7trjNYF5ek7z...,https://api.spotify.com/v1/audio-analysis/7trj...,277660.0,4.0
4,Son Of A Sinner,Jelly Roll,77,0.365,0.541,7.0,-7.489,1.0,0.0304,0.04190,...,0.2810,0.329,86.041,audio_features,25VQoiuyc0HkC5FQTj1a8G,spotify:track:25VQoiuyc0HkC5FQTj1a8G,https://api.spotify.com/v1/tracks/25VQoiuyc0Hk...,https://api.spotify.com/v1/audio-analysis/25VQ...,232093.0,4.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4998,He's Gonna Step on You Again,John Kongos,28,0.712,0.874,0.0,-6.835,0.0,0.0328,0.01670,...,0.3510,0.705,107.083,audio_features,4Kqp11O4KoQ6cHAtH9Kf1x,spotify:track:4Kqp11O4KoQ6cHAtH9Kf1x,https://api.spotify.com/v1/tracks/4Kqp11O4KoQ6...,https://api.spotify.com/v1/audio-analysis/4Kqp...,260773.0,5.0
4999,Wild World,Yusuf / Cat Stevens,77,0.480,0.542,0.0,-8.510,1.0,0.0348,0.34500,...,0.1130,0.567,152.853,audio_features,7mjSHL2Eb0kAwiKbvNNyD9,spotify:track:7mjSHL2Eb0kAwiKbvNNyD9,https://api.spotify.com/v1/tracks/7mjSHL2Eb0kA...,https://api.spotify.com/v1/audio-analysis/7mjS...,200560.0,4.0
5000,Love Her Madly,The Doors,66,0.565,0.587,4.0,-7.393,0.0,0.0317,0.05730,...,0.0416,0.966,147.462,audio_features,3MFFDRC4wTN9JNGtzXsZlN,spotify:track:3MFFDRC4wTN9JNGtzXsZlN,https://api.spotify.com/v1/tracks/3MFFDRC4wTN9...,https://api.spotify.com/v1/audio-analysis/3MFF...,198467.0,4.0
5001,Amazing Grace,Judy Collins,45,0.192,0.263,3.0,-11.830,1.0,0.0332,0.94500,...,0.2030,0.196,87.747,audio_features,6Sueudn0VQA4AXRsFKQbFl,spotify:track:6Sueudn0VQA4AXRsFKQbFl,https://api.spotify.com/v1/tracks/6Sueudn0VQA4...,https://api.spotify.com/v1/audio-analysis/6Sue...,248693.0,4.0


In [36]:
# exporting csv so that I don't have to run it again every time I restart the notebook
# playlist_df.to_csv(r'C:\Users\luana\github\spotify-song-recommender\csv_files\playlist_df.csv', index = False, header=True)

### Separating hot from top songs

In [39]:
top_1971_2021 = pd.read_csv(r'C:\Users\luana\github\spotify-song-recommender\csv_files\playlist_df.csv')
# I will remove the hot songs from this dataset and keep only the top songs until 2021
top_1971_2021 = top_1971_2021[94:].reset_index(drop=True)
top_1971_2021.head()

Unnamed: 0,song,artist,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,...,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,Levitating (feat. DaBaby),Dua Lipa & DaBaby,85,0.702,0.825,6.0,-3.787,0.0,0.0601,0.00883,...,0.0674,0.915,102.977,audio_features,5nujrmhLynf4yMoMtj8AQF,spotify:track:5nujrmhLynf4yMoMtj8AQF,https://api.spotify.com/v1/tracks/5nujrmhLynf4...,https://api.spotify.com/v1/audio-analysis/5nuj...,203064.0,4.0
1,drivers license,Olivia Rodrigo,88,0.561,0.431,10.0,-8.81,1.0,0.0578,0.768,...,0.106,0.137,143.875,audio_features,5wANPM4fQCJwkGd4rN57mH,spotify:track:5wANPM4fQCJwkGd4rN57mH,https://api.spotify.com/v1/tracks/5wANPM4fQCJw...,https://api.spotify.com/v1/audio-analysis/5wAN...,242013.0,4.0
2,Save Your Tears,The Weeknd,90,0.68,0.826,0.0,-5.487,1.0,0.0309,0.0212,...,0.543,0.644,118.051,audio_features,5QO79kh1waicV47BqGRL3g,spotify:track:5QO79kh1waicV47BqGRL3g,https://api.spotify.com/v1/tracks/5QO79kh1waic...,https://api.spotify.com/v1/audio-analysis/5QO7...,215627.0,4.0
3,MONTERO (Call Me By Your Name),Lil Nas X,83,0.593,0.503,8.0,-6.725,0.0,0.22,0.293,...,0.405,0.71,178.781,audio_features,1SC5rEoYDGUK4NfG82494W,spotify:track:1SC5rEoYDGUK4NfG82494W,https://api.spotify.com/v1/tracks/1SC5rEoYDGUK...,https://api.spotify.com/v1/audio-analysis/1SC5...,137704.0,4.0
4,Blinding Lights,The Weeknd,91,0.514,0.73,1.0,-5.934,1.0,0.0598,0.00146,...,0.0897,0.334,171.005,audio_features,0VjIjW4GlUZAMYd2vXMi3b,spotify:track:0VjIjW4GlUZAMYd2vXMi3b,https://api.spotify.com/v1/tracks/0VjIjW4GlUZA...,https://api.spotify.com/v1/audio-analysis/0VjI...,200040.0,4.0


In [40]:
top_1971_2021.to_csv(r'C:\Users\luana\github\spotify-song-recommender\csv_files\top_1971_2021.csv', index = False, header=True)

To expand my list of songs, I will search for Spotify playlists and collect the songs from them.

For processing reasons, I will do this in another notebook.