## Lab | Web Scraping Single Page

### Instructions - Scraping popular songs
Your product will take a song as input from the user and output another song (the recommendation). In most cases, the recommended song should be similar to the input song, but the CTO thinks that if the input song is currently on the top charts, the user will prefer a recommendation of another song that is also popular at the moment.

You have to find data on the internet about currently popular songs. Billboard maintains a weekly Top 100 of "hot" songs here: https://www.billboard.com/charts/hot-100.

It's a good place to start! Scrape the current top 100 songs and their respective artists, and put the information into a pandas dataframe.

### Hot Songs (Top 100 this week)

In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import random
from time import sleep
from random import randint

In [2]:
# note: this is PopVortex's iTunes Top 100 chart, used here instead of the Billboard Hot 100
url = 'https://www.popvortex.com/music/charts/top-100-songs.php'

In [3]:
response = requests.get(url)
response.status_code # 200 status code means OK!

200

In [4]:
soup = BeautifulSoup(response.content, "html.parser")

In [5]:
soup

<!DOCTYPE html>
<html lang="en"><head><meta charset="utf-8"/><title>iTunes Top 100 Songs Chart 2022</title><meta content="width=device-width, initial-scale=1" name="viewport"/><meta content="iTunes top 100 songs chart list. The most popular hit music and trending songs of 2022. Chart of today's current iTunes top 100 songs is updated daily." name="description"/><meta content="iTunes Top 100 Songs Chart 2022" property="og:title"><meta content="Chart of the top 100 songs on iTunes. Chart list of the top 100 song downloads of 2022 is updated daily." property="og:description"><meta content="article" property="og:type"><meta content="https://www.popvortex.com/images/logo-facebook.png" property="og:image"/><meta content="PopVortex" property="og:site_name"/><meta content="https://www.popvortex.com/music/charts/top-100-songs.php" property="og:url"/><meta content="100000239962942" property="fb:admins"/><meta content="178831188827052" property="fb:app_id"/><link href="/favicon.png" rel="shortcut

In [6]:
# extracting lists
song = []
artist = []

song_list = soup.select("p > cite")
artist_list = soup.select("p > em")

# iterate through both result sets in parallel and retrieve the text of each tag
# (zip stops at the shorter list, so a length mismatch cannot raise an IndexError)
for song_tag, artist_tag in zip(song_list, artist_list):
    song.append(song_tag.get_text())
    artist.append(artist_tag.get_text())

print(song)
print(artist)

['Unholy', 'Eagle (feat. KB)', "I'm Good (Blue)", 'wait in the truck', 'Everywhere', 'Thank God', 'Make It With You', 'A Thousand Years', 'Son Of A Sinner', 'You Proof', "I Ain't Worried", 'the mockingbird & THE CROW', 'CUFF IT', 'Unstoppable', 'Left and Right', 'TRUCK BED', 'She Had Me At Heads Carolina', 'here lies country music', 'Wasted On You', 'The Kind of Love We Make', 'Fall In Love', 'Lose Yourself', 'Starved', 'Shallow', 'Under the Influence', 'As It Was', 'Super Freaky Girl', '1,2,3 Eoi!', 'Life Is a Highway', 'Celestial', 'Love Me Like You Do', 'About Damn Time', 'Hold Me Closer', 'Earned It', 'Shivers', 'Sunroof', 'Victoria’s Secret', 'Everything I Own', '2 Be Loved (Am I Ready)', "Baby I'm-A Want You", 'Numb', 'Next Thing You Know', 'Way of the Triune God (Hallelujah Version)', 'High Heels', "You'll Be In My Heart", "Gangsta's Paradise (feat. L.V.)", 'I Like You (A Happier Song) [feat. Doja Cat]', 'Rock and a Hard Place', 'Tennessee Orange', 'Neon You', 'Vegas (From the O
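
The loop above pairs the `p > cite` and `p > em` results by position, which only works if both lists have the same length and order. A minimal sketch of an alternative is to read both tags from the same parent `<p>`, so a title is never paired with the wrong artist. The HTML snippet and the `extract_pairs` helper are invented for illustration; they only assume the structure implied by the selectors, and the real page markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure implied by "p > cite" / "p > em"
sample_html = """
<div>
  <p><cite>Unholy</cite><em>Sam Smith &amp; Kim Petras</em></p>
  <p><cite>Everywhere</cite><em>Fleetwood Mac</em></p>
  <p>A paragraph without song data</p>
</div>
"""

def extract_pairs(soup):
    """Return (song, artist) tuples, reading both tags from the same <p>."""
    pairs = []
    for paragraph in soup.find_all("p"):
        # recursive=False looks at direct children only, like the ">" selector
        cite = paragraph.find("cite", recursive=False)
        em = paragraph.find("em", recursive=False)
        if cite is not None and em is not None:
            pairs.append((cite.get_text(strip=True), em.get_text(strip=True)))
    return pairs

pairs = extract_pairs(BeautifulSoup(sample_html, "html.parser"))
print(pairs)
```

Paragraphs that lack either tag are skipped instead of shifting every later row by one.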

In [7]:
#creating dataframe

hot_songs = pd.DataFrame({"song":song,
                       "artist":artist
                      })
hot_songs

Unnamed: 0,song,artist
0,Unholy,Sam Smith & Kim Petras
1,Eagle (feat. KB),Transformation Worship
2,I'm Good (Blue),David Guetta & Bebe Rexha
3,wait in the truck,HARDY & Lainey Wilson
4,Everywhere,Fleetwood Mac
...,...,...
95,Havana (Live),Camila Cabello
96,Lady Marmalade,"Christina Aguilera, Lil' Kim, Mýa & P!nk"
97,Narco,Blasterjaxx & Timmy Trumpet
98,Half Of Me (feat. Riley Green),Thomas Rhett


## Lab | Web Scraping Multiple Pages

### Instructions - Scraping popular songs

#### Prioritize the MVP
In the previous lab, you had to scrape data about "hot songs". It's critical to be on track with that part, as it was part of the request from the CTO.

If you couldn't finish the first lab, use this time to go back there.

#### Expand the project
If you're done, you can try to expand the project on your own. Here are a few suggestions:

- Find other lists of hot songs on the internet and scrape them too: having a bigger pool of songs will be awesome!
- Apply the same logic to other "groups" of songs: the best songs from a decade or from a country / culture / language / genre.
- Wikipedia maintains a large collection of lists of songs: https://en.wikipedia.org/wiki/Lists_of_songs

### Top hits by year

In [8]:
# I wanna do the same thing for several years, so I will create a function

def url_text(year):
    # build the chart URL for the given year
    return "https://playback.fm/charts/top-100-songs/" + str(year)

url2 = url_text(2021)

In [9]:
response2 = requests.get(url2)
response2.status_code # 200 status code means OK!

soup2 = BeautifulSoup(response2.content, "html.parser")
soup2

<!DOCTYPE html>

<html class="birthday" lang="en-US">
<head>
<meta charset="utf-8"/>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<meta content="upgrade-insecure-requests" http-equiv="Content-Security-Policy">
<meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible"/>
<meta content="width=device-width, initial-scale=1.0,minimum-scale=1.0,maximum-scale=5.0" name="viewport">
<meta content="always" name="referrer">
<title>Top 100 Pop Song Chart for 2021</title>
<meta content="Find the top 100 Pop songs for the year of 2021 and listen to them all! Can you guess the number one Pop song in 2021? Find out now!" name="description"/>
<meta content="" name="keywords"/>
<meta content="7934465" property="fb:admins"/>
<meta content="983502645001411" property="fb:app_id"/>
<meta content="website" property="og:type"/>
<meta content="Top 100 Pop Song Chart for 2021" property="og:title"/>
<meta content="Find the top 100 Pop songs for the year of 2021 and listen to them all!

In [10]:
print(soup2.select('.song')[0].get_text())
print(soup2.select('.song')[1].get_text())
print(soup2.select('.song')[198].get_text())
print(soup2.select('.song')[199].get_text())
# each song is coming up twice, so I will have to pay attention to the indexes



Levitating


Levitating


Thot Shit


Thot Shit
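
Since every title is matched twice by `.song`, one way to keep a single copy is to take every second element and strip the surrounding whitespace. The sketch below uses plain strings in place of the `get_text()` results. The surrounding newlines are an assumption based on the printed output, and `keep_every_second` is a hypothetical helper name.

```python
# Stand-in for [tag.get_text() for tag in soup2.select('.song')];
# the surrounding newlines are assumed from the printed output above
raw_songs = ["\nLevitating\n", "\nLevitating\n", "\nThot Shit\n", "\nThot Shit\n"]

def keep_every_second(texts):
    """Keep the items at odd positions (1, 3, 5, ...) and strip whitespace."""
    return [text.strip() for text in texts[1::2]]

songs = keep_every_second(raw_songs)
print(songs)  # ['Levitating', 'Thot Shit']
```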


In [11]:
soup2.select('.artist')[0].get_text()
soup2.select('.artist')[99].get_text() #this will need cleaning

'\nMegan Thee Stallion\n'

In [12]:
#Now that I know how to find the data and what the outcome is, I can create a function

def pop_songs_by_year(year):
    # start with an empty frame so the function always has something to return,
    # even if the request or the parsing fails
    top_songs = pd.DataFrame(columns=["song", "artist", "year"])

    try:
        url = "https://playback.fm/charts/top-100-songs/" + str(year)
        response = requests.get(url)
        soup = BeautifulSoup(response.content, "html.parser")

        # each song shows up twice in the page, so keep only every second match
        song = [tag.get_text().strip() for tag in soup.select('.song')[1::2]]
        # strip the newlines around the artist names
        artist = [tag.get_text().strip() for tag in soup.select('.artist')]

        top_songs = pd.DataFrame({"song": song,
                                  "artist": artist})

        top_songs['year'] = str(year)

        # wait a few seconds so consecutive requests do not hammer the server
        wait_time = randint(1, 4)
        sleep(wait_time)

    except Exception as error:
        # report the failure instead of silently ignoring it
        print(f"Could not scrape {year}: {error}")

    return top_songs

### !!! Careful with next cell! Long running time!!! 

In [15]:
# # Scraping the 100 most popular songs for 2021 and for each of the 50 years before it (2020 back to 1971).

# songs_df = pop_songs_by_year(2021)

# list_of_years = []
# for year in range (50):
#     list_of_years.append(2020-year)

# for index in range(len(list_of_years)):
#     songs_df = pd.concat([songs_df, pop_songs_by_year(list_of_years[index])])
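
Because this loop runs for a long time, a single failing year should not throw away everything collected so far. A minimal sketch of that idea, using plain lists of dictionaries instead of DataFrames so it runs offline: `collect_years` and `fake_scrape` are hypothetical names, and `fake_scrape` stands in for a real scraping function.

```python
def collect_years(years, scrape):
    """Call scrape(year) for each year; return (all_rows, failed_years)."""
    all_rows, failed = [], []
    for year in years:
        try:
            all_rows.extend(scrape(year))
        except Exception as error:
            # remember the year so it can be retried later
            print(f"Skipping {year}: {error}")
            failed.append(year)
    return all_rows, failed

# Hypothetical stand-in for a real scraper; it fails for one year on purpose
def fake_scrape(year):
    if year == 2019:
        raise ValueError("page not available")
    return [{"song": f"song {year}", "artist": "someone", "year": str(year)}]

years = list(range(2021, 2017, -1))  # 2021, 2020, 2019, 2018
rows, failed = collect_years(years, fake_scrape)
print(len(rows), failed)
```

The list of failed years can then be passed to `collect_years` again without repeating the years that already succeeded.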

In [21]:
songs_df

Unnamed: 0,song,artist,year
0,Levitating,Dua Lipa & DaBaby,2021
1,Drivers License,Olivia Rodrigo,2021
2,Save Your Tears,The Weeknd & Ariana Grande,2021
3,Montero (Call Me by Your Name),Lil Nas X,2021
4,Blinding Lights,The Weeknd,2021
...,...,...,...
95,He's Gonna Step On You Again,John Kongos,1971
96,Wild World,Cat Stevens,1971
97,Love Her Madly,The Doors,1971
98,Amazing Grace,Judy Collins,1971


In [23]:
# exporting to csv so that I don't have to run the scraping again every time I restart the notebook
songs_df.to_csv(r'C:\Users\luana\IH-Labs\lab-web-scraping-multiple-pages\songs_df.csv', index = False, header=True)
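
The export above saves the scraped data; the complementary step is to read that file back when it already exists and only scrape otherwise. A minimal sketch, assuming a hypothetical `load_or_build` helper and a temporary folder in place of the real path; `fake_build` stands in for the slow scraping loop.

```python
from pathlib import Path
import tempfile

import pandas as pd

def load_or_build(csv_path, build):
    """Read csv_path if it exists; otherwise call build() and save its result."""
    csv_path = Path(csv_path)
    if csv_path.exists():
        # dtype=str keeps the year column as text, as in the scraped frame
        return pd.read_csv(csv_path, dtype=str)
    frame = build()
    frame.to_csv(csv_path, index=False)
    return frame

# Hypothetical builder standing in for the slow scraping loop
calls = []
def fake_build():
    calls.append(1)
    return pd.DataFrame({"song": ["Levitating"],
                         "artist": ["Dua Lipa & DaBaby"],
                         "year": ["2021"]})

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "songs_df.csv"
    first = load_or_build(path, fake_build)   # builds the frame and writes the file
    second = load_or_build(path, fake_build)  # reads the file, no rebuild

print(len(calls), second["song"].tolist())
```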

### Merging hot songs this week with top songs by year

In [24]:
hot_songs["year"]="2022"
hot_songs

Unnamed: 0,song,artist,year
0,Unholy,Sam Smith & Kim Petras,2022
1,Eagle (feat. KB),Transformation Worship,2022
2,I'm Good (Blue),David Guetta & Bebe Rexha,2022
3,wait in the truck,HARDY & Lainey Wilson,2022
4,Everywhere,Fleetwood Mac,2022
...,...,...,...
95,Havana (Live),Camila Cabello,2022
96,Lady Marmalade,"Christina Aguilera, Lil' Kim, Mýa & P!nk",2022
97,Narco,Blasterjaxx & Timmy Trumpet,2022
98,Half Of Me (feat. Riley Green),Thomas Rhett,2022


In [25]:
hot_top_songs = pd.concat([hot_songs, songs_df], axis=0).reset_index(drop=True)

hot_top_songs['song'] = hot_top_songs['song'].str.lower()
hot_top_songs['artist'] = hot_top_songs['artist'].str.lower()
hot_top_songs

#I think 5000 songs are enough for now :)

Unnamed: 0,song,artist,year
0,unholy,sam smith & kim petras,2022
1,eagle (feat. kb),transformation worship,2022
2,i'm good (blue),david guetta & bebe rexha,2022
3,wait in the truck,hardy & lainey wilson,2022
4,everywhere,fleetwood mac,2022
...,...,...,...
5192,he's gonna step on you again,john kongos,1971
5193,wild world,cat stevens,1971
5194,love her madly,the doors,1971
5195,amazing grace,judy collins,1971
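
If the same song appears in more than one chart, it shows up in several rows of the merged frame and is more likely to be picked at random. A minimal sketch of removing such repeats with `drop_duplicates`, keeping the first occurrence. The rows below are invented placeholders, not scraped data.

```python
import pandas as pd

# Placeholder rows: "song a" appears in two different years
sample = pd.DataFrame({
    "song": ["song a", "song a", "song b"],
    "artist": ["artist a", "artist a", "artist b"],
    "year": ["2021", "2020", "2022"],
})

# Treat rows with the same song and artist as duplicates and keep the first one
unique_songs = sample.drop_duplicates(subset=["song", "artist"]).reset_index(drop=True)
print(unique_songs)
```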


### Starting the Song Recommender

In [46]:
def song_recommender():
    user_song = input('Please enter the name of a song you like\n').strip().lower()
    if user_song in hot_top_songs['song'].tolist():
        # randrange excludes the upper bound, so the index is always valid
        # (randint(0, len(...)) could return len(...) and raise a KeyError)
        i = random.randrange(len(hot_top_songs))
        print('You may also like', hot_top_songs['song'][i].title(), "by", hot_top_songs['artist'][i].title())
    else:
        print('There is no song recommendation for you now. Please try again later')

In [48]:
song_recommender()

Please enter the name of a song you like
Unholy
You may also like Always by Atlantic Starr


In [49]:
song_recommender()

Please enter the name of a song you like
Hello
You may also like Bad And Boujee by Migos Featuring Lil Uzi Vert


In [50]:
song_recommender()

Please enter the name of a song you like
song not in the list
There is no song recommendation for you now. Please try again later
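
The instructions ask for a hot song to be answered with another hot song, while `song_recommender` picks a random row from the whole merged table. A minimal sketch of that rule, using plain lists of `(song, artist)` tuples instead of the DataFrame: `recommend` is a hypothetical helper, and the `rng` parameter is an assumption added so the random source can be swapped in tests.

```python
import random

def recommend(user_song, hot, older, rng=random):
    """Return a (song, artist) recommendation, or None if the song is unknown.

    hot and older are lists of (song, artist) tuples with lowercase names.
    """
    title = user_song.strip().lower()
    # Look in the hot list first, then in the older charts
    for pool in (hot, older):
        if any(song == title for song, _ in pool):
            # Recommend from the same pool, but never the input song itself
            candidates = [pair for pair in pool if pair[0] != title]
            return rng.choice(candidates) if candidates else None
    return None

hot = [("unholy", "sam smith & kim petras"), ("everywhere", "fleetwood mac")]
older = [("wild world", "cat stevens"), ("love her madly", "the doors")]

print(recommend("Unholy", hot, older))                # ('everywhere', 'fleetwood mac')
print(recommend("song not in the list", hot, older))  # None
```

With only two hot songs in the sample, the first call has a single candidate, so its result does not depend on the random draw.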
