# Lab | Web Scraping Multiple Pages

Business goal:
- Check the case_study_gnod.md file.

- Make sure you've understood the big picture of your project:

    - the goal of the company (Gnod),
    - their current product (Gnoosic),
    - their strategy, and
    - how your project fits into this context.

Re-read the business case and the e-mail from the CTO, take a look at the flowchart and create an initial Trello board with the tasks you think you'll have to accomplish.

### Instructions - Scraping popular songs

##### Prioritize the MVP
In the previous lab, you had to scrape data about "hot songs". It's critical to be on track with that part, as it was part of the request from the CTO.

If you couldn't finish the first lab, use this time to complete it.

##### Expand the project
If you're done, you can try to expand the project on your own. Here are a few suggestions:

- Find other lists of hot songs on the internet and scrape them too: having a bigger pool of songs will be awesome!
- Apply the same logic to other "groups" of songs: the best songs from a decade or from a country / culture / language / genre.
- Wikipedia maintains a large collection of lists of songs: https://en.wikipedia.org/wiki/Lists_of_songs
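Many of these Wikipedia list pages present their songs in HTML tables. The following is a minimal sketch for turning one of those tables into a DataFrame. It assumes the table carries Wikipedia's usual `wikitable` CSS class and has a single header row of `<th>` cells. The function name `parse_wikitable` is hypothetical, and merged cells (`rowspan`/`colspan`) are not handled:

```python
from bs4 import BeautifulSoup
import pandas as pd


def parse_wikitable(html, table_index=0):
    """Turn the n-th table with class 'wikitable' into a DataFrame.

    Assumes the first row holds <th> headers and every data row has one
    <td>/<th> cell per column; rows with a different cell count are skipped.
    """
    soup = BeautifulSoup(html, 'html.parser')
    # class_ matches tables whose class list *contains* 'wikitable' (e.g. 'wikitable sortable')
    tables = soup.find_all('table', class_='wikitable')
    if len(tables) <= table_index:
        raise ValueError('No wikitable found at that index')
    rows = tables[table_index].find_all('tr')
    if not rows:
        return pd.DataFrame()
    headers = [th.get_text(strip=True) for th in rows[0].find_all('th')]
    records = []
    for row in rows[1:]:
        # Wikipedia often marks the first cell of a row as <th scope="row">
        cells = [c.get_text(strip=True) for c in row.find_all(['td', 'th'])]
        if len(cells) == len(headers):  # skip malformed or merged rows
            records.append(dict(zip(headers, cells)))
    return pd.DataFrame(records, columns=headers)
```

Pass it the page text, e.g. `parse_wikitable(requests.get(url).text)`. Rows that use merged cells are silently dropped, so compare the row count with what the page shows.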

##### Practice web scraping
As you've seen, scraping the internet is a skill that can get you all sorts of information. Here are some small challenges you can try to gain more experience in the field:

- Retrieve the Wikipedia page for "Python" and create a list of the links on that page: url = 'https://en.wikipedia.org/wiki/Python'
- Find the number of titles that have changed in the United States Code since its last release point: url = 'http://uscode.house.gov/download/download.shtml'
- Create a Python list with the names on the FBI's Ten Most Wanted list: url = 'https://www.fbi.gov/wanted/topten'
- Display information about the 20 latest earthquakes reported by the EMSC (date, time, latitude, longitude and region name) as a pandas DataFrame: url = 'https://www.emsc-csem.org/Earthquake/'
- List all language names and their number of articles, in the order they appear on wikipedia.org: url = 'https://www.wikipedia.org/'
- Create a list of the different kinds of datasets available on data.gov.uk: url = 'https://data.gov.uk/'
- Display the top 10 languages by number of native speakers in a pandas DataFrame: url = 'https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers'
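For the first challenge, here is a minimal sketch that collects every link on a page and resolves relative paths such as `/wiki/...` against the page URL. The helper name `extract_links` is hypothetical. Dropping in-page `#` anchors and duplicates are choices made here, not requirements of the lab:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html, base_url):
    """Return absolute URLs of all <a href> links, in page order, without duplicates."""
    soup = BeautifulSoup(html, 'html.parser')
    links = []
    seen = set()
    for a in soup.find_all('a', href=True):  # href=True skips <a> tags without an href
        href = a['href']
        if href.startswith('#'):  # in-page anchor, not a separate page
            continue
        absolute = urljoin(base_url, href)  # '/wiki/X' -> 'https://en.wikipedia.org/wiki/X'
        if absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links
```

With the Python URL above, you would call `extract_links(requests.get(url).text, url)`.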

# Lab | Web Scraping Single Page

# Goal
Get the top 100 songs from the Billboard Hot 100 chart and store them in a pandas DataFrame.

## Import libraries

In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd

## Scraping the content from the web

In [2]:
# Send a GET request to the Billboard Hot 100 URL
url = 'https://www.billboard.com/charts/hot-100'
response = requests.get(url)
response.status_code

200
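`requests.get` without a `timeout` can wait indefinitely. An error page (403, 404, 500) would also be parsed as if it were the chart. The sketch below is a small fetch helper that addresses both problems. It assumes a browser-like `User-Agent` is acceptable to the target sites. The name `fetch_soup` and the header value are hypothetical:

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical User-Agent string; some sites reject the default one sent by requests
HEADERS = {'User-Agent': 'Mozilla/5.0 (compatible; gnod-scraping-lab)'}


def fetch_soup(url, session=None, timeout=10):
    """Download a page and return it parsed with BeautifulSoup.

    Raises requests.HTTPError for 4xx/5xx responses instead of
    silently parsing an error page.
    """
    # Use a caller-supplied session (e.g. to reuse connections) or plain requests.get
    getter = session.get if session is not None else requests.get
    response = getter(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()
    return BeautifulSoup(response.content, 'html.parser')
```

`soup = fetch_soup('https://www.billboard.com/charts/hot-100')` would replace In [2] and In [3].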

In [3]:
# Create a BeautifulSoup object to parse the HTML content
soup = BeautifulSoup(response.content, 'html.parser')

In [4]:
# Getting a list of all matching elements
result = soup.find_all('div', class_='o-chart-results-list-row-container')

In [5]:
# Initialize an empty list to hold one dict per song
data = []

In [6]:
# Retrieve the song name and artist for each chart row
for res in result:
    title_tag = res.find('h3')
    song_name = title_tag.text.strip()
    # The artist is taken from the first <span> that follows the title <h3>
    artist = title_tag.find_next('span').text.strip()
    data.append({'Song': song_name, 'Artist': artist})
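The loop above raises `AttributeError` as soon as a row lacks an `<h3>`. That can happen when Billboard inserts ad slots or changes its markup. In addition, `find_next('span')` can return a `<span>` from outside the row. The sketch below is a more defensive version. It assumes the same row class and the same title-then-first-span layout used above. `parse_chart_rows` is a hypothetical name:

```python
from bs4 import BeautifulSoup


def parse_chart_rows(soup, row_class='o-chart-results-list-row-container'):
    """Return a list of {'Song', 'Artist'} dicts, skipping incomplete rows."""
    records = []
    for row in soup.find_all('div', class_=row_class):
        title_tag = row.find('h3')
        if title_tag is None:
            continue  # row without a title: skip it instead of crashing
        artist_tag = title_tag.find_next('span')
        # find_next can leave the row, so keep the <span> only if this row contains it
        if artist_tag is None or not any(p is row for p in artist_tag.parents):
            continue
        records.append({'Song': title_tag.get_text(strip=True),
                        'Artist': artist_tag.get_text(strip=True)})
    return records
```

`data = parse_chart_rows(soup)` then feeds straight into `pd.DataFrame(data)`. Compare `len(data)` with 100 to spot rows that were skipped.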

In [7]:
# Converting into a DataFrame
top_100_songs = pd.DataFrame(data) 
top_100_songs

|  | Song | Artist |
|---|---|---|
| 0 | Last Night | Morgan Wallen |
| 1 | Fast Car | Luke Combs |
| 2 | Calm Down | Rema & Selena Gomez |
| 3 | Flowers | Miley Cyrus |
| 4 | All My Life | Lil Durk Featuring J. Cole |
| ... | ... | ... |
| 95 | Angel, Pt. 1 | Kodak Black, NLE Choppa, Jimin, JVKE & Muni Long |
| 96 | Girl In Mine | Parmalee |
| 97 | Moonlight | Kali Uchis |
| 98 | Classy 101 | Feid x Young Miko |
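The DataFrame index starts at 0, but chart positions start at 1. Below is a small sketch that adds an explicit rank column. It assumes the rows were collected in chart order (position 1 first). `add_rank` is a hypothetical helper:

```python
import pandas as pd


def add_rank(df):
    """Return a copy of df with a 1-based 'Rank' column derived from row order."""
    ranked = df.reset_index(drop=True).copy()
    ranked.insert(0, 'Rank', list(range(1, len(ranked) + 1)))
    return ranked
```

Usage: `top_100_songs = add_rank(top_100_songs)`. A quick `len(top_100_songs) == 100` check also catches rows lost to markup changes.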


# Lab | Web Scraping Multiple Pages

## Top 100 Songs in Germany

In [8]:
# Send a GET request to the official German charts page (offiziellecharts.de)
url = 'https://www.offiziellecharts.de/charts?rCH=2'
response = requests.get(url)
response.status_code

200

In [9]:
# Create a BeautifulSoup object to parse the HTML content
soup = BeautifulSoup(response.content, 'html.parser')

In [10]:
# Collect the artist text of every chart entry, in page order
german_artists = []
for a in soup.find_all("span", attrs={"class": "info-artist"}):
    german_artists.append(a.get_text())
    
german_artists

['Ski Aggu, Joost & Otto Waalkes',
 'Udo Lindenberg & Apache 207',
 'RAF Camora / Ahmad Amin',
 'Ayliva',
 'Yung Yury & Damn Yury',
 'Apache 207',
 'Apache 207',
 'Apache 207',
 'David Kushner',
 'Miley Cyrus',
 'Ayliva',
 'Tiësto',
 'Apache 207',
 'David Guetta / Anne-Marie / Coi Leray',
 'Luca-Dante Spadafora, Niklas Dee, Octavian',
 'Nina Chuba',
 '01099 - Paul & Ski Aggu',
 'Purple Disco Machine x Kungs',
 'Creeds',
 'Peter Fox feat. Inéz',
 'Miksu / MacLoud & Makko',
 'David Guetta & Bebe Rexha',
 'Bonez MC',
 'Nina Chuba',
 'Ikke Hüftgold x Schürze x DJ Robin',
 'Dave & Central Cee',
 'Pashanim',
 'Loreen',
 'RAF Camora / Luciano',
 'Hava & Dardan',
 'HoodBlaq',
 'Ufo361 / Lucidbeatz',
 'Montez feat. SDP',
 'Eminem',
 'Apache 207',
 'Rema',
 'Tom Odell',
 'Ski Aggu, Endzone & Ericson',
 'Olexesh / Bonez MC',
 'Shindy feat. Nate Dogg',
 'P!nk',
 'Jamule',
 'Michael Schulte x R3hab',
 'Finch x Tream',
 'Ayliva feat. Mero',
 'Harry Styles',
 'Julian Sommer x Mia Julia',
 'Libianca',

In [11]:
# Collect the song title of every chart entry, in page order
german_songs = []
for a in soup.find_all("span", attrs={"class": "info-title"}):
    german_songs.append(a.get_text())
    
german_songs

['Friesenjung',
 'Komet',
 'Strada',
 'In deinen Armen',
 'Tabu.',
 'Breaking Your Heart',
 'Was weißt du schon',
 'Wenn das so bleibt',
 'Daylight',
 'Flowers',
 'Aber sie',
 'Lay Low',
 'Neunzig',
 "Baby Don't Hurt Me",
 'Mädchen auf dem Pferd',
 'Wildberry Lillet',
 'Anders',
 'Substitution',
 'Push Up',
 'Zukunft Pink',
 'Nachts wach',
 "I'm Good (Blue)",
 'Alles nur kein Star',
 'Mangos mit Chili',
 'Bumsbar',
 'Sprinter',
 'Bagchaser Can',
 'Tattoo',
 'All Night',
 'Normal',
 'Pass auf',
 'Match3',
 'Fieber',
 'Mockingbird',
 'Roller',
 'Calm Down',
 'Another Love',
 'Party Sahne',
 'Gramm für Gramm',
 'How Come?',
 'Trustfall',
 'Alemania',
 'Waterfall',
 'Liebe auf der Rückbank',
 'Sie weiß',
 'As It Was',
 'Peter Pan',
 'People',
 '10:35',
 'Toscana Fanboys',
 'Eyes Closed',
 'Zelten auf Kies',
 'Not Fair',
 'Miracle',
 'One Touch (00212)',
 'Give It To Me',
 'Makeba',
 'Bamba',
 'Anti-Hero',
 'Living In A Haze',
 'Weekends',
 "Creepin'",
 'Round One',
 'Aperol im Glas',
 'Bad

In [12]:
# Convert the lists to Pandas Series
artists_series = pd.Series(german_artists, name='Artists')
songs_series = pd.Series(german_songs, name='Songs')

In [13]:
# Concatenate the two Series side by side (axis=1) into a DataFrame
top_100_songs_germany = pd.concat([artists_series, songs_series], axis=1)
top_100_songs_germany

|  | Artists | Songs |
|---|---|---|
| 0 | Ski Aggu, Joost & Otto Waalkes | Friesenjung |
| 1 | Udo Lindenberg & Apache 207 | Komet |
| 2 | RAF Camora / Ahmad Amin | Strada |
| 3 | Ayliva | In deinen Armen |
| 4 | Yung Yury & Damn Yury | Tabu. |
| ... | ... | ... |
| 95 | Ed Sheeran | Shivers |
| 96 | Ricchi & Poveri | Sarà perché ti amo |
| 97 | Macklemore & Ryan Lewis feat. Ray Dalton | Can't Hold Us |
| 98 | Imanbek & BYOR | Belly Dancer |
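In [10] and In [11] collect artists and titles in two independent passes, and In [13] joins them by position. If a single entry is missing one of the two fields, every later row is silently misaligned, and `pd.concat` pads the shorter Series with `NaN` at the end. The sketch below reads both fields from the same entry element instead. The selectors are parameters. The wrapper element of each offiziellecharts.de entry is not shown in this notebook, so `entry_selector` is a placeholder to fill in after inspecting the page. `scrape_entries` is a hypothetical name:

```python
from bs4 import BeautifulSoup
import pandas as pd


def scrape_entries(soup, entry_selector, title_selector, artist_selector):
    """Build a DataFrame with one row per chart entry.

    Title and artist are looked up inside the same entry element, so a
    missing field becomes NaN in that row instead of shifting every
    following row out of alignment.
    """
    rows = []
    for entry in soup.select(entry_selector):  # CSS selector for each chart entry
        title = entry.select_one(title_selector)
        artist = entry.select_one(artist_selector)
        rows.append({
            'Artists': artist.get_text(strip=True) if artist else None,
            'Songs': title.get_text(strip=True) if title else None,
        })
    return pd.DataFrame(rows, columns=['Artists', 'Songs'])
```

For this page the title and artist selectors would be `'span.info-title'` and `'span.info-artist'`, matching the classes used above.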


## Top 100 Songs in South Africa

In [14]:
# Send a GET request to the PopVortex South Africa top songs page
url = 'https://www.popvortex.com/music/south-africa/top-songs.php'
response = requests.get(url)
response.status_code

200

In [15]:
# Create a BeautifulSoup object to parse the HTML content
soup = BeautifulSoup(response.content, 'html.parser')

In [16]:
# Collect the song title from every <cite class="title"> element
south_africa_songs = []
for a in soup.find_all("cite", attrs={"class": "title"}):
    south_africa_songs.append(a.get_text())
    
south_africa_songs

['Mnike (feat. DJ Maphorisa, Nandipha808, Ceeka RSA & Tyron Dee)',
 '24 Hours',
 'Sgudi Snyc',
 'Hamba Juba',
 'Gangnam Style (feat. DJ Maphorisa & Kabza De Small)',
 'Someone You Loved',
 'Stimela',
 "If Anything's Left",
 'Kiss Me',
 'Kunkra (feat. Xduppy, ShaunMusiq & Ftears)',
 'Flowers',
 'Umbayimbayi',
 'Calm Down',
 "Thath'Indawo (Live) [feat. Mpumi Mtsweni]",
 'iMpumelelo (feat. Da Muziqal Chef)',
 'Heaven',
 'Thixo Wami (feat. Zola, Big Zulu & Riot)',
 'Padam Padam',
 'Khaki',
 'Praise (feat. Brandon Lake, Chris Brown & Chandler Moore)',
 'fukumean',
 'Akudingwa Nasibani',
 'The Sound of Silence',
 "Hallelujah Nkateko (Lihle's Version)",
 'Lift Me Up (From Black Panther: Wakanda Forever - Music From and Inspired By)',
 'Your Love',
 'Seasons',
 'Surrender',
 'Asibe Happy',
 'Rush',
 'As It Was',
 'TRUSTFALL',
 'Calm Down',
 'People',
 'Mohigan Sun (feat. Murumba Pitch)',
 'All My Life (feat. J. Cole)',
 'Foute',
 'Thando (feat. Lowsheen)',
 'Ukholo Lwam (A Song of Hope)',
 'Im

In [17]:
# Collect the artist from every <em class="artist"> element
south_africa_artists = []
for a in soup.find_all("em", attrs={"class": "artist"}):
    south_africa_artists.append(a.get_text())
    
south_africa_artists

['Tyler ICU & Tumelo.za',
 'Kaylow',
 'De Mthuda, Da Muziqal Chef & Eemoh',
 'Lady Amar, JL SA, Cici & Murumba Pitch',
 'Mas Musiq & Daliwonga',
 'Lewis Capaldi',
 '2Point1, Ntate Stunna & Nthabi Sings',
 'Jamie Fine',
 'Dermot Kennedy',
 'Myztro & Daliwonga',
 'Miley Cyrus',
 'Inkabi Zezwe, Sjava & Big Zulu',
 'Rema & Selena Gomez',
 'Spirit of Praise',
 'Sam Deep & Eemoh',
 'Niall Horan',
 'Zakwe',
 'Kylie Minogue',
 'Ricus Nel',
 'Elevation Worship',
 'Gunna',
 'Sindi Ntombela',
 'Disturbed',
 'Joyous Celebration',
 'Rihanna',
 'Azana',
 'Lloyiso',
 'Natalie Taylor',
 'Kabza De Small, DJ Maphorisa & Ami Faku',
 'Ayra Starr',
 'Harry Styles',
 'P!nk',
 'Rema',
 'Libianca',
 'MÖRDA & Oscar Mbo',
 'Lil Durk',
 'Appel',
 'Wanitwa Mos, Master KG & Seemah',
 'Intimate Worshippers',
 'Nathi',
 'Mellow & Sleazy & TmanXpress',
 'Davido',
 'Ayra Starr',
 'Jain',
 'Rooksein',
 'De Mthuda, Da Muziqal Chef & Kwiish SA',
 'Felo Le Tee & Mellow & Sleazy',
 'uMjabulisi',
 'Falling In Reverse',
 'Kg
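The same loop pattern is repeated for each tag/class pair in In [10], In [11], In [16] and In [17]. The sketch below shows a small helper that could replace it. It is hypothetically named `texts_by_class`. Note that it also strips surrounding whitespace, which the original `get_text()` calls do not:

```python
from bs4 import BeautifulSoup


def texts_by_class(soup, tag, css_class):
    """Return the stripped text of every <tag class=css_class> element, in page order."""
    return [el.get_text(strip=True) for el in soup.find_all(tag, class_=css_class)]
```

Usage: `south_africa_artists = texts_by_class(soup, 'em', 'artist')` or `german_songs = texts_by_class(soup, 'span', 'info-title')`.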

In [18]:
# Convert the lists to Pandas Series
artists_series = pd.Series(south_africa_artists, name='Artists')
songs_series = pd.Series(south_africa_songs, name='Songs')

In [19]:
# Concatenate the two Series side by side (axis=1) into a DataFrame
top_100_songs_south_africa = pd.concat([artists_series, songs_series], axis=1)
top_100_songs_south_africa

|  | Artists | Songs |
|---|---|---|
| 0 | Tyler ICU & Tumelo.za | Mnike (feat. DJ Maphorisa, Nandipha808, Ceeka ... |
| 1 | Kaylow | 24 Hours |
| 2 | De Mthuda, Da Muziqal Chef & Eemoh | Sgudi Snyc |
| 3 | Lady Amar, JL SA, Cici & Murumba Pitch | Hamba Juba |
| 4 | Mas Musiq & Daliwonga | Gangnam Style (feat. DJ Maphorisa & Kabza De S... |
| ... | ... | ... |
| 95 | Michelle Simonal & Amazonics | Cry for Help |
| 96 | Ian Storm & David Atsman | Waves (feat. Marissa) |
| 97 | Deep Narratives | Amiba Snakes |
| 98 | The Tuten Brothers | Feelin' Famous |
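Several songs appear on both charts, sometimes with different casing or a "(feat. ...)" suffix ("Trustfall" vs "TRUSTFALL", "All My Life" vs "All My Life (feat. J. Cole)"). The sketch below merges the per-country charts into one pool of songs. It is a simplification: duplicates are matched on a normalised title only, so two different songs with the same title would be merged. `song_key` and `build_song_pool` are hypothetical names:

```python
import re

import pandas as pd


def song_key(title):
    """Normalise a title for matching: drop '(feat. ...)' / '[feat. ...]' and ignore case."""
    title = re.sub(r'\s*[\(\[]feat\.[^\)\]]*[\)\]]', '', title, flags=re.IGNORECASE)
    return title.strip().casefold()


def build_song_pool(charts):
    """Stack per-country charts into one DataFrame and drop repeated songs.

    charts: dict mapping a country name to a DataFrame with 'Artists' and 'Songs'.
    The first occurrence of each normalised title is kept.
    """
    frames = [df.assign(Country=country) for country, df in charts.items()]
    pool = pd.concat(frames, ignore_index=True)
    pool['key'] = pool['Songs'].map(song_key)
    return pool.drop_duplicates(subset='key').drop(columns='key').reset_index(drop=True)
```

Usage: `build_song_pool({'Germany': top_100_songs_germany, 'South Africa': top_100_songs_south_africa})`. The Billboard DataFrame uses `Song`/`Artist`, so rename it first with `top_100_songs.rename(columns={'Song': 'Songs', 'Artist': 'Artists'})` before adding it to the dict.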
