# Lab | Web Scraping Single Page
**Business goal:**

- Check the `case_study_gnod.md` file.
- Make sure you've understood the big picture of your project:
    - the goal of the company (Gnod),
    - their current product (Gnoosic),
    - their strategy, and
    - how your project fits into this context.
- Re-read the business case and the e-mail from the CTO, take a look at the flowchart, and create an initial Trello board with the tasks you think you'll have to accomplish.

**Instructions - Scraping popular songs**

Your product will take a song as input from the user and output another song (the recommendation). In most cases, the recommended song should be similar to the input song, but the CTO thinks that if the input song is currently on the top charts, the user will prefer a recommendation that is also popular at the moment.

You have to find data on the internet about currently popular songs. Billboard maintains a weekly Top 100 of "hot" songs here: https://www.billboard.com/charts/hot-100.

It's a good place to start! Scrape the current top 100 songs and their respective artists, and put the information into a pandas dataframe.

## Loading the libraries

In [1]:
from bs4 import BeautifulSoup
import requests
import re
import pandas as pd

## Storing the URL

In [2]:
url = "https://www.billboard.com/charts/hot-100"

## Getting the html code of the web page

In [3]:
response = requests.get(url)
response.status_code # 200 status code means OK!

200
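
Checking `status_code` by eye works in a notebook, but the same request is repeated for every site scraped later on. A minimal sketch of a reusable helper follows, assuming a 10-second timeout is acceptable; the name `fetch_html` and the User-Agent string are hypothetical and not part of the lab.

```python
import requests


def fetch_html(url, timeout=10):
    """Fetch a page and return its HTML, raising on HTTP error statuses."""
    # Hypothetical User-Agent value; a site may respond differently to the default one.
    headers = {"User-Agent": "gnod-lab-scraper/0.1"}
    response = requests.get(url, headers=headers, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx responses instead of a manual check.
    response.raise_for_status()
    return response.text
```

With this in place, `fetch_html(url)` could stand in for the `requests.get(url)` call and the status check in the cell above.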

## Parsing the html code

In [4]:
soup = BeautifulSoup(response.content, "html.parser")
#soup

## Retrieving the desired info from the soup

In [5]:
#artists
soup.find_all("span", class_="chart-element__information__artist")

[<span class="chart-element__information__artist text--truncate color--secondary">Olivia Rodrigo</span>,
 <span class="chart-element__information__artist text--truncate color--secondary">Cardi B</span>,
 <span class="chart-element__information__artist text--truncate color--secondary">The Weeknd</span>,
 <span class="chart-element__information__artist text--truncate color--secondary">The Weeknd</span>,
 <span class="chart-element__information__artist text--truncate color--secondary">24kGoldn Featuring iann dior</span>,
 <span class="chart-element__information__artist text--truncate color--secondary">Ariana Grande</span>,
 <span class="chart-element__information__artist text--truncate color--secondary">Chris Brown &amp; Young Thug</span>,
 <span class="chart-element__information__artist text--truncate color--secondary">Dua Lipa Featuring DaBaby</span>,
 <span class="chart-element__information__artist text--truncate color--secondary">Ariana Grande</span>,
 <span class="chart-element__info

In [6]:
#songs
soup.find_all("span", class_="chart-element__information__song")

[<span class="chart-element__information__song text--truncate color--primary">Drivers License</span>,
 <span class="chart-element__information__song text--truncate color--primary">Up</span>,
 <span class="chart-element__information__song text--truncate color--primary">Blinding Lights</span>,
 <span class="chart-element__information__song text--truncate color--primary">Save Your Tears</span>,
 <span class="chart-element__information__song text--truncate color--primary">Mood</span>,
 <span class="chart-element__information__song text--truncate color--primary">34+35</span>,
 <span class="chart-element__information__song text--truncate color--primary">Go Crazy</span>,
 <span class="chart-element__information__song text--truncate color--primary">Levitating</span>,
 <span class="chart-element__information__song text--truncate color--primary">Positions</span>,
 <span class="chart-element__information__song text--truncate color--primary">What You Know Bout Love</span>,
 <span class="chart-elem
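
The two result sets above still contain full `<span>` tags. As a small sketch of how the text can be pulled out of them, the snippet below parses a two-tag stand-in for the page (the markup is copied from the output above, the variable names are illustrative) and calls `get_text()` on each tag. Note that the `&amp;` entity comes back as a plain `&`.

```python
from bs4 import BeautifulSoup

# Stand-in markup copied from the output above, not a live page.
html = (
    '<span class="chart-element__information__artist text--truncate color--secondary">'
    "Chris Brown &amp; Young Thug</span>"
    '<span class="chart-element__information__song text--truncate color--primary">'
    "Go Crazy</span>"
)
demo_soup = BeautifulSoup(html, "html.parser")

# class_ matches a tag if the given name is one of its CSS classes.
artist_tags = demo_soup.find_all("span", class_="chart-element__information__artist")
song_tags = demo_soup.find_all("span", class_="chart-element__information__song")

# get_text() returns the tag's text with HTML entities decoded.
artist_names = [tag.get_text() for tag in artist_tags]
song_titles = [tag.get_text() for tag in song_tags]
```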

## Store info in dataframe

In [7]:
# query each result set once, then keep only the text of each tag
song = [tag.get_text() for tag in soup.select(".chart-element__information__song")]
artist = [tag.get_text() for tag in soup.select(".chart-element__information__artist")]
    
df = pd.DataFrame({'artist': artist, 'song': song})
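
Building the two lists independently assumes every chart entry has both a song and an artist tag; if one is missing, the columns drift out of alignment or differ in length. One possible alternative, sketched here on stand-in markup with simplified, hypothetical class names (`song`, `artist`), is to walk the list items and look up both fields inside each item.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Stand-in markup; the second item deliberately has no artist tag.
html = """
<ol>
  <li><span class="song">Drivers License</span><span class="artist">Olivia Rodrigo</span></li>
  <li><span class="song">Up</span></li>
</ol>
"""
demo_soup = BeautifulSoup(html, "html.parser")


def text_or_none(parent, selector):
    """Return the stripped text of the first match inside parent, or None."""
    tag = parent.select_one(selector)
    return tag.get_text(strip=True) if tag is not None else None


# One dict per list item keeps artist and song from the same entry together.
rows = [
    {"artist": text_or_none(item, ".artist"), "song": text_or_none(item, ".song")}
    for item in demo_soup.select("ol li")
]
demo_df = pd.DataFrame(rows, columns=["artist", "song"])
```

A missing field then shows up as an empty cell in its own row rather than shifting every later row.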

In [8]:
#artist

In [9]:
#song

In [10]:
df

| | artist | song |
| --- | --- | --- |
| 0 | Olivia Rodrigo | Drivers License |
| 1 | Cardi B | Up |
| 2 | The Weeknd | Blinding Lights |
| 3 | The Weeknd | Save Your Tears |
| 4 | 24kGoldn Featuring iann dior | Mood |
| ... | ... | ... |
| 95 | Jordan Davis | Almost Maybes |
| 96 | DaBaby | Masterpiece |
| 97 | Miley Cyrus Featuring Dua Lipa | Prisoner |
| 98 | Sabrina Carpenter | Skin |


# Lab | Web Scraping Multiple Pages

**Instructions**

**Expand the project**

If you're done, you can try to expand the project on your own. Here are a few suggestions:

1. Find other lists of hot songs on the internet and scrape them too: having a bigger pool of songs will be awesome!
2. Apply the same logic to other "groups" of songs: the best songs from a decade or from a country / culture / language / genre.
3. Wikipedia maintains a large collection of lists of songs: https://en.wikipedia.org/wiki/Lists_of_songs

**Practice web scraping**

See Part 2 of Lab.

## Expand the project

### hot songs - year end - from billboard (list 1)

In [11]:
url = "https://www.billboard.com/charts/year-end/hot-100-songs"

In [12]:
response = requests.get(url)
response.status_code # 200 status code means OK!

200

In [13]:
soup1 = BeautifulSoup(response.content, "html.parser")
#soup1

In [14]:
# query each result set once, then keep only the text of each tag
song1 = [tag.get_text() for tag in soup1.select(".ye-chart-item__title")]
artist1 = [tag.get_text() for tag in soup1.select(".ye-chart-item__artist")]
    
df_bb = pd.DataFrame({'artist': artist1, 'song': song1})
df_bb

| | artist | song |
| --- | --- | --- |
| 0 | `\n\nThe Weeknd\n\n` | `\nBlinding Lights\n` |
| 1 | `\n\nPost Malone\n\n` | `\nCircles\n` |
| 2 | `\n\nRoddy Ricch\n\n` | `\nThe Box\n` |
| 3 | `\n\nDua Lipa\n\n` | `\n Don't Start Now\n` |
| 4 | `\nDaBaby Featuring Roddy Ricch\n` | `\nRockstar\n` |
| ... | ... | ... |
| 95 | `\n\nMorgan Wallen\n\n` | `\nMore Than My Hometown\n` |
| 96 | `\n\nLuke Combs\n\n` | `\nLovin' On You\n` |
| 97 | `\n\nMoneybagg Yo\n\n` | `\nSaid Sum\n` |
| 98 | `\nH.E.R. Featuring YG\n` | `\nSlide\n` |


In [15]:
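# Series.replace() without regex=True only swaps values that match the whole cell,
# so it would leave these strings untouched; .str.strip() trims the surrounding whitespace
df_bb['artist'] = df_bb['artist'].str.strip()
df_bb['song'] = df_bb['song'].str.strip()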
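
The newlines in the year-end table come from whitespace inside the scraped tags. One option, sketched here on a stand-in snippet rather than the live page, is to strip at extraction time with `get_text(strip=True)` so the DataFrame never contains them. The class name is taken from the cell above; the surrounding `<div>` is an assumption.

```python
from bs4 import BeautifulSoup

# Stand-in markup with the same kind of padding seen in the table above.
html = '<div class="ye-chart-item__title">\nBlinding Lights\n</div>'
demo_soup = BeautifulSoup(html, "html.parser")
title_tag = demo_soup.select_one(".ye-chart-item__title")

raw_title = title_tag.get_text()              # keeps the surrounding newlines
clean_title = title_tag.get_text(strip=True)  # trims whitespace around the text
```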

### hot songs from tonspion.de (list 2)
Tracklist: "Deutsche Musik: Die 100 besten deutschsprachigen Songs" ("German Music: The 100 Best German-Language Songs")

In [16]:
url = "https://www.tonspion.de/news/playlist-die-besten-deutschsprachigen-lieder"

In [17]:
response = requests.get(url)
response.status_code # 200 status code means OK!

200

In [18]:
soup2 = BeautifulSoup(response.content, "html.parser")
#soup2

In [19]:
# collect the text of every item in the ordered list
artist_song = [item.get_text() for item in soup2.select("ol li")]
    
#artist_song

In [20]:
df_tsp = pd.DataFrame({'artist_song': artist_song})
# split on the first ' - ' only, so titles that contain the separator stay intact
df_tsp[['artist', 'song']] = df_tsp['artist_song'].str.split(' - ', n=1, expand=True)
df_tsp.drop(columns = "artist_song",inplace=True)

df_tsp

| | artist | song |
| --- | --- | --- |
| 0 | Beginner & Samy Deluxe | Füchse |
| 1 | Casper | Im Ascheregen |
| 2 | Chefket | Rap & Soul |
| 3 | Fatoni & Dexter | 32 Grad |
| 4 | K.I.Z & Henning May | Hurra die Welt geht unter |
| ... | ... | ... |
| 95 | Marteria | Endboss |
| 96 | Die Orsons | Schwung in die Kiste |
| 97 | Casper & Blixa Bargeld & Dagobert & Sizarr | Lang lebe der Tod |
| 98 | Einstürzende Neubauten | Yü-Gung (Adrian Sherwood Remix) |
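
Splitting on `' - '` assumes every list item follows the "Artist - Song" pattern. As a hedged illustration of what happens at the edges, the plain-Python helper below (the name `split_artist_song` is hypothetical) splits on the first separator only and returns `None` for the artist when no separator is present.

```python
def split_artist_song(entry, sep=" - "):
    """Split 'Artist - Song' on the first separator; artist is None if it is absent."""
    artist, found, song = entry.partition(sep)
    if not found:
        # No separator: treat the whole entry as the song title.
        return None, entry.strip()
    return artist.strip(), song.strip()


pairs = [
    split_artist_song("Casper - Im Ascheregen"),
    split_artist_song("Artist - Title - Remix"),   # made-up entry with two separators
    split_artist_song("Untitled"),                 # made-up entry with no separator
]
```

Here the second entry keeps `"Title - Remix"` together as the song, and the third has no artist.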


### hot songs from MTV (list 3)
tracklist "Official Single Top 100"

In [21]:
url = "http://www.mtv.de/charts/c6mc86/single-top-100"

In [22]:
response = requests.get(url)
response.status_code # 200 status code means OK!

200

In [23]:
soup3 = BeautifulSoup(response.content, "html.parser")
#soup3

In [24]:
#soup3.select(".videoTitle")

In [25]:
#soup3.select(".artist")

In [26]:
# query each result set once, then keep only the text of each tag
song3 = [tag.get_text() for tag in soup3.select(".videoTitle")]
artist3 = [tag.get_text() for tag in soup3.select(".artist")]
    
df_mtv = pd.DataFrame({'artist': artist3, 'song': song3})
df_mtv

| | artist | song |
| --- | --- | --- |
| 0 | Nathan Evans | Wellerman |
| 1 | Kasimir1441 & badmomzjay | Ohne dich |
| 2 | Jamule & Capital Bra | No comprendo |
| 3 | Tiesto | The Business |
| 4 | Olivia Rodrigo | drivers license |
| 5 | The Weeknd | Save Your Tears |
| 6 | Ufo361 | Wings |
| 7 | Masked Wolf | Astronaut In The Ocean |
| 8 | Jason Derulo | Love Not War (The Tampa Beat) |
| 9 | Zoe Wees | Girls Like Us |
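
Since the aim of scraping several lists is a bigger pool of songs, the separate DataFrames eventually need to be merged. The sketch below shows one way this might be done with `pd.concat`, using two tiny stand-in frames instead of the scraped ones. The `source` column and the case-insensitive duplicate check are assumptions, prompted by the same song appearing as "Drivers License" and "drivers license" in different lists.

```python
import pandas as pd

# Stand-in frames with a few rows taken from the tables above.
hot100 = pd.DataFrame(
    {"artist": ["Olivia Rodrigo", "The Weeknd"],
     "song": ["Drivers License", "Save Your Tears"]}
)
mtv = pd.DataFrame(
    {"artist": ["Olivia Rodrigo", "Nathan Evans"],
     "song": ["drivers license", "Wellerman"]}
)

# Tag each frame with where it came from, then stack them.
combined = pd.concat(
    [hot100.assign(source="billboard_hot_100"), mtv.assign(source="mtv_single_top_100")],
    ignore_index=True,
)

# Build a case-insensitive key and keep the first occurrence of each song.
key = combined["artist"].str.casefold() + "|" + combined["song"].str.casefold()
deduped = combined.loc[~key.duplicated()].reset_index(drop=True)
```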
