![logo_ironhack_blue 7](https://user-images.githubusercontent.com/23629340/40541063-a07a0a8a-601a-11e8-91b5-2f13e4e6b441.png)

# Lab | Web Scraping Multiple Pages

#### Expand the project

If you're done, you can try to expand the project on your own. Here are a few suggestions:

- Find other lists of hot songs on the internet and scrape them too: having a bigger pool of songs will be awesome!
- Apply the same logic to other "groups" of songs: the best songs from a decade or from a country / culture / language / genre.
- Wikipedia maintains a large collection of lists of songs: https://en.wikipedia.org/wiki/Lists_of_songs

<h1 style="color: #00BFFF;">00 |</h1>

In [1]:
# 📚 The good all basics
import pandas as pd # dataframe managment

# 🥣 Let's make a beautiful soup
from bs4 import BeautifulSoup # for web scrapping
import requests # to kindly request to scrape the web

In [2]:
# 🎯 Specific functions
def get_soup(link):
    try:
        request = requests.get(link)
        request.raise_for_status()  # returns an HTTPError if the response is not OK
        soup = BeautifulSoup(request.content, "html.parser") # .content returns raw bytes
        print("All good! Response code is", request.status_code)
        return soup
    except requests.exceptions.HTTPError as err:
        if request.status_code == 404:
            print("404: Oops, sorry we can't find that page!")
        else:
            print("The error code is", err.args[0]) # look up the 1st argument from HTTPError
        return None

<h1 style="color: #00BFFF;">01 | Data Wrangling</h1>

<h3 style="color: #008080;">1. Itunes Top 100 Songs Chart 2023</h3>

In [3]:
itunes_link = "https://www.popvortex.com/music/charts/top-100-songs.php"

In [4]:
soup = get_soup(itunes_link)

All good! Response code is 200


<h3 style="color: #008080;">Getting the song titles</h3>

In [5]:
song1 = soup.find_all("cite", attrs={"class": "title"})
s1_names = [tag.get_text() for tag in song1]
s1_df = pd.DataFrame(s1_names, columns=["Songs"])
print(s1_df)

                                               Songs
0   Barbie World (with Aqua) [From Barbie The Album]
1                                           Fast Car
2                           Barbie World (with Aqua)
3                                         Last Night
4                                       Need A Favor
..                                               ...
92                                             Lover
93                                          Drinkaby
94                                    King of Hearts
95                                   Ella Baila Sola
96                                   Halfway To Hell

[97 rows x 1 columns]


In [6]:
# We get 97 songs not by an error, in the original website there are 97 songs-artists

<h3 style="color: #008080;">Getting the artists names</h3>

In [7]:
artist1 = soup.find_all("em", attrs={"class": "artist"})
a1_names = [tag.get_text() for tag in artist1]
a1_df = pd.DataFrame(a1_names, columns=["Artists"])
print(a1_df)

                        Artists
0       Nicki Minaj & Ice Spice
1                    Luke Combs
2       Nicki Minaj & Ice Spice
3                 Morgan Wallen
4                    Jelly Roll
..                          ...
92                 Taylor Swift
93                Cole Swindell
94                   Kim Petras
95  Eslabon Armado & Peso Pluma
96                   Jelly Roll

[97 rows x 1 columns]


<h2 style="color: #008080;">Presenting the Results</h2>

In [8]:
# We do a final dataframe, concatenating the songs and artists
df = pd.concat([s1_df, a1_df], axis=1)
df

Unnamed: 0,Songs,Artists
0,Barbie World (with Aqua) [From Barbie The Album],Nicki Minaj & Ice Spice
1,Fast Car,Luke Combs
2,Barbie World (with Aqua),Nicki Minaj & Ice Spice
3,Last Night,Morgan Wallen
4,Need A Favor,Jelly Roll
...,...,...
92,Lover,Taylor Swift
93,Drinkaby,Cole Swindell
94,King of Hearts,Kim Petras
95,Ella Baila Sola,Eslabon Armado & Peso Pluma


<h3 style="color: #008080;">2. Top 50 Singles Chart</h3>

In [9]:
single_link = "https://www.aria.com.au/charts/singles-chart/2023-06-26"

In [10]:
soup2 = get_soup(single_link)

All good! Response code is 200


<h3 style="color: #008080;">Getting the song titles</h3>

In [13]:
song2 = soup2.find_all("a", attrs={"class": "c-chart-item__title"})
s2_names = [tag.get_text() for tag in song2]
s2_df = pd.DataFrame(s2_names, columns=["Songs"])
print(s2_df)

                                      Songs
0                                  Sprinter
1                                Last Night
2                                  Fast Car
3                                   Flowers
4                                  Daylight
5                      The Beginning: Cupid
6                                 Kill Bill
7                               Die For You
8                                 Anti-Hero
9                              Boy's a liar
10                                Calm Down
11                                As It Was
12                              All My Life
13                  Something in the Orange
14                                    Karma
15                          I Ain't Worried
16                        Until I Found You
17                                  Miracle
18                              Padam Padam
19                                 Creepin'
20                               Area Codes
21                            Am

<h3 style="color: #008080;">Getting the artists names</h3>

In [14]:
artist2 = soup.find_all("a", attrs={"class": "c-chart-item__artist"})
a2_names = [tag.get_text() for tag in artist1]
a2_df = pd.DataFrame(a2_names, columns=["Artists"])
print(a2_df)

                        Artists
0       Nicki Minaj & Ice Spice
1                    Luke Combs
2       Nicki Minaj & Ice Spice
3                 Morgan Wallen
4                    Jelly Roll
..                          ...
92                 Taylor Swift
93                Cole Swindell
94                   Kim Petras
95  Eslabon Armado & Peso Pluma
96                   Jelly Roll

[97 rows x 1 columns]


In [15]:
# We do a final dataframe, concatenating the songs and artists
df = pd.concat([s2_df, a2_df], axis=1)
df

Unnamed: 0,Songs,Artists
0,Sprinter,Nicki Minaj & Ice Spice
1,Last Night,Luke Combs
2,Fast Car,Nicki Minaj & Ice Spice
3,Flowers,Morgan Wallen
4,Daylight,Jelly Roll
...,...,...
92,,Taylor Swift
93,,Cole Swindell
94,,Kim Petras
95,,Eslabon Armado & Peso Pluma


<h3 style="color: #008080;">3. Top 100 Songs (Spain)</h3>

In [16]:
spain_link = "https://www.popvortex.com/music/spain/top-songs.php"

In [17]:
soup3 = get_soup(spain_link)

All good! Response code is 200


<h3 style="color: #008080;">Getting the song titles</h3>

In [18]:
song3 = soup3.find_all("cite", attrs={"class": "title"})
s3_names = [tag.get_text() for tag in song3]
s3_df = pd.DataFrame(s3_names, columns=["Songs"])
print(s3_df)

                                                Songs
0                                          Nochentera
1                                           LAS BABYS
2                                    One Of The Girls
3                                            Clavaíto
4                                           Acróstico
..                                                ...
95  La Nit de Sant Joan (Nova Versió) (feat. Viole...
96                    Padre / Niña de Tus Ojos (Live)
97                                        Gimlet Gaze
98                                        Mimosa Mist
99                                     NUNCA VOY SOLO

[100 rows x 1 columns]


<h3 style="color: #008080;">Getting the artists names</h3>

In [19]:
artist3 = soup3.find_all("em", attrs={"class": "artist"})
a3_names = [tag.get_text() for tag in artist3]
a3_df = pd.DataFrame(a3_names, columns=["Artists"])
print(a3_df)

                                Artists
0                                 Vicco
1                                Aitana
2   The Weeknd, JENNIE & Lily Rose Depp
3                Chanel & Abraham Mateo
4                               Shakira
..                                  ...
95                            Strombers
96     Full Life Music & Daniel Calveti
97                      Midnight Cycler
98                      Midnight Cycler
99                        KHEA & Milo j

[100 rows x 1 columns]


In [20]:
# We do a final dataframe, concatenating the songs and artists
df = pd.concat([s3_df, a3_df], axis=1)
df

Unnamed: 0,Songs,Artists
0,Nochentera,Vicco
1,LAS BABYS,Aitana
2,One Of The Girls,"The Weeknd, JENNIE & Lily Rose Depp"
3,Clavaíto,Chanel & Abraham Mateo
4,Acróstico,Shakira
...,...,...
95,La Nit de Sant Joan (Nova Versió) (feat. Viole...,Strombers
96,Padre / Niña de Tus Ojos (Live),Full Life Music & Daniel Calveti
97,Gimlet Gaze,Midnight Cycler
98,Mimosa Mist,Midnight Cycler
