# Lab | Web Scraping Single Page

## The goal of this lab is to create a function, `scrape_hot100()`, that scrapes the current top 100 songs and their respective artists from https://www.billboard.com/charts/hot-100, puts the information into a pandas DataFrame, and saves the DataFrame as a CSV file in the current folder.

### Import libraries

In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd

### Connecting to the website:

In [2]:
url = "https://www.billboard.com/charts/hot-100/"

# Download the HTML with a GET request:
response = requests.get(url)
response.status_code # if we get 200, the request succeeded and we're good to go

200

### Parsing html to create the soup

In [3]:
soup = BeautifulSoup(response.content, 'html.parser')

#Check that the html code looks like it should

#soup.prettify()
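
The download and parse steps above can be wrapped in one small helper that fails loudly when the request does not succeed. This is only a sketch: the name `fetch_soup`, the `timeout` value, and the `get` parameter (a hook that lets the function be tried without network access) are assumptions for illustration, not part of the lab.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url, get=requests.get, timeout=10):
    """Download `url` and return the parsed page (hypothetical helper)."""
    response = get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError on 4xx/5xx responses,
    # so a failed download is not silently parsed as if it were the chart.
    response.raise_for_status()
    return BeautifulSoup(response.content, "html.parser")


# Usage (needs network access):
# soup = fetch_soup("https://www.billboard.com/charts/hot-100/")
```

Raising on a bad status code replaces the manual check of `status_code`, which is easy to forget once the steps are moved into a function.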

### Getting the song list

In [9]:
songs = []
for title in soup.select("li.o-chart-results-list__item > h3"):
    songs.append(title.get_text().strip())  # strip() removes leading and trailing whitespace
    
songs

['As It Was',
 'First Class',
 'Heat Waves',
 'Big Energy',
 'Enemy',
 'Stay',
 "Don't Think Jesus",
 'Woman',
 'Super Gremlin',
 'Ghost',
 'Thats What I Want',
 'Bad Habits',
 'abcdefu',
 'Shivers',
 'Cold Heart (PNAU Remix)',
 'Easy On Me',
 'Need To Know',
 'Save Your Tears',
 'One Right Now',
 'In A Minute',
 'Levitating',
 "'Til You Can't",
 'Industry Baby',
 'MAMIII',
 'Bam Bam',
 'Hrs And Hrs',
 "We Don't Talk About Bruno",
 'Right On',
 'Never Say Never',
 "Doin' This",
 'Wasted On You',
 'AA',
 'Good 4 U',
 'Sweetest Pie',
 'Fingers Crossed',
 'I Hate U',
 'Boyfriend',
 'To The Moon!',
 'You Right',
 'Numb Little Bug',
 'Fancy Like',
 'Sand In My Boots',
 'What Happened To Virgil',
 'Pushin P',
 'Beers On Me',
 'The Motto',
 "She's All I Wanna Be",
 "When You're Gone",
 'Buy Dirt',
 'About Damn Time',
 'Shake It',
 'Light Switch',
 'If I Was A Cowboy',
 'Peru',
 'Flowers',
 'Nail Tech',
 'Freaky Deaky',
 '23',
 'Trouble With A Heartbreak',
 'Broadway Girls',
 'Heart On Fire',
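
To see what the selector `li.o-chart-results-list__item > h3` matches, it can help to try it on a tiny hand-written page. The HTML below is an invented sample that only imitates the class names used above; the real Billboard markup is more deeply nested and may change over time.

```python
from bs4 import BeautifulSoup

# Invented sample markup, not copied from billboard.com
SAMPLE_HTML = """
<ul>
  <li class="o-chart-results-list__item">
    <h3> As It Was </h3>
    <span class="c-label">Harry Styles</span>
  </li>
  <li class="o-chart-results-list__item">
    <div><h3>Nested heading</h3></div>
  </li>
  <li class="other-item"><h3>Unrelated heading</h3></li>
</ul>
"""

sample_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# ">" is the child combinator: only <h3> tags that are direct children
# of an <li> with the class o-chart-results-list__item are selected.
sample_titles = [
    h3.get_text().strip()
    for h3 in sample_soup.select("li.o-chart-results-list__item > h3")
]
print(sample_titles)  # ['As It Was']
```

The nested `<h3>` is skipped because it sits inside a `<div>`, and the last one is skipped because its `<li>` has a different class.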


### Getting the artist list

In [11]:
artists = []

# Collect the text of every label in each chart row (artist names mixed with numbers, dashes and tags):

for artist in soup.select("li.o-chart-results-list__item > span.c-label"):
    artists.append(artist.get_text().strip())
    
artists

The selector also matches labels that are not artist names (numbers, dashes, and the tags `NEW` and `RE-ENTRY`), so we have to clean the list:

In [13]:
# Labels that are not artist names
removal = ['-', 'NEW', 'RE-\nENTRY']

# Build a new list instead of calling remove() while iterating, which can skip elements
artists = [label for label in artists if not label.isdigit() and label not in removal]
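
The cleaning step can be checked on a small hand-made list before running it on the scraped data. This is a sketch with invented sample labels; `clean_labels` and `NOISE` are hypothetical names. It filters into a new list, which avoids the pitfalls of removing items from a list while looping over it.

```python
# Labels that are not artist names (same values as the removal list above)
NOISE = {"-", "NEW", "RE-\nENTRY"}


def clean_labels(labels):
    """Keep only labels that are neither pure digits nor known noise tags."""
    return [
        label
        for label in labels
        if not label.isdigit() and label not in NOISE
    ]


# Invented sample imitating the mixed labels returned by the selector
raw_labels = [
    "Harry Styles", "1", "1", "3",
    "Jack Harlow", "NEW", "-", "-", "1",
    "Glass Animals", "RE-\nENTRY", "-", "2", "64",
]
print(clean_labels(raw_labels))
# ['Harry Styles', 'Jack Harlow', 'Glass Animals']
```

Note that `str.isdigit()` would also drop an artist whose name consists only of digits, so the result is worth comparing against the length of the song list.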

### Creating a DataFrame adding both lists as columns

In [14]:
hot100_df = pd.DataFrame({"songs": songs, "artists": artists})
hot100_df

| | songs | artists |
|---|---|---|
| 0 | As It Was | Harry Styles |
| 1 | First Class | Jack Harlow |
| 2 | Heat Waves | Glass Animals |
| 3 | Big Energy | Latto |
| 4 | Enemy | Imagine Dragons X JID |
| ... | ... | ... |
| 95 | P Power | Gunna Featuring Drake |
| 96 | Money So Big | Yeat |
| 97 | Blick Blick! | Coi Leray & Nicki Minaj |
| 98 | Fall In Love | Bailey Zimmerman |
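
Building the DataFrame from two separately scraped lists only works if both lists have the same length; otherwise pandas raises a `ValueError`. A minimal sketch of an explicit check with a more descriptive message, assuming a hypothetical helper named `build_chart_frame`:

```python
import pandas as pd


def build_chart_frame(songs, artists):
    """Combine the two lists into a DataFrame, failing early on a mismatch."""
    if len(songs) != len(artists):
        # A mismatch usually means a selector matched extra elements
        # or the cleaning step removed too much or too little.
        raise ValueError(
            f"got {len(songs)} songs but {len(artists)} artists"
        )
    return pd.DataFrame({"songs": songs, "artists": artists})


sample_frame = build_chart_frame(
    ["As It Was", "First Class"],
    ["Harry Styles", "Jack Harlow"],
)
print(sample_frame.shape)  # (2, 2)
```

Equal lengths do not prove that each song is paired with the right artist, so a quick look at the first and last rows is still worthwhile.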


### Creating a custom function with all the steps we performed before

In [16]:
def scrape_hot100(link):
    # Download the page and stop early if the request failed
    music = requests.get(link)
    music.raise_for_status()
    soup = BeautifulSoup(music.content, 'html.parser')

    # Song titles: the <h3> directly inside each chart row
    songs = []
    for title in soup.select("li.o-chart-results-list__item > h3"):
        songs.append(title.get_text().strip())  # strip() removes leading and trailing whitespace

    # Labels in each chart row: artist names mixed with numbers, dashes and tags
    artists = []
    for artist in soup.select("li.o-chart-results-list__item > span.c-label"):
        artists.append(artist.get_text().strip())

    # Keep only the artist names
    removal = ['-', 'NEW', 'RE-\nENTRY']
    artists = [label for label in artists if not label.isdigit() and label not in removal]

    hot100_df = pd.DataFrame({"songs": songs, "artists": artists})

    # Save the chart as a CSV file in the current folder
    hot100_df.to_csv("hot100.csv", index=False)

    return hot100_df

In [17]:
hot100_df = scrape_hot100("https://www.billboard.com/charts/hot-100/")
hot100_df

| | songs | artists |
|---|---|---|
| 0 | As It Was | Harry Styles |
| 1 | First Class | Jack Harlow |
| 2 | Heat Waves | Glass Animals |
| 3 | Big Energy | Latto |
| 4 | Enemy | Imagine Dragons X JID |
| ... | ... | ... |
| 95 | P Power | Gunna Featuring Drake |
| 96 | Money So Big | Yeat |
| 97 | Blick Blick! | Coi Leray & Nicki Minaj |
| 98 | Fall In Love | Bailey Zimmerman |
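
The function saves the file with `index=False`. The sketch below illustrates why that matters when the CSV is read back: without it, the row index is written as an unnamed first column, which pandas loads as `Unnamed: 0`. The sample rows are taken from the output above, and an in-memory buffer stands in for `hot100.csv` so nothing is written to disk.

```python
import io

import pandas as pd

sample_df = pd.DataFrame(
    {
        "songs": ["As It Was", "First Class"],
        "artists": ["Harry Styles", "Jack Harlow"],
    }
)

# With index=False only the two data columns are written.
restored = pd.read_csv(io.StringIO(sample_df.to_csv(index=False)))
print(list(restored.columns))  # ['songs', 'artists']

# With the default index=True the index becomes an extra, unnamed column.
with_index = pd.read_csv(io.StringIO(sample_df.to_csv()))
print(list(with_index.columns))  # ['Unnamed: 0', 'songs', 'artists']
```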
