# Lab 1 | Web Scraping Single Page

## Instructions - Scraping popular songs
Your product will take a song as input from the user and output another song (the recommendation). In most cases the recommended song should be similar to the input song, but the CTO thinks that if the input song is currently on the top charts, the user will prefer a recommendation that is also popular at the moment. <br>
<br>
You have to find data on the internet about currently popular songs. Billboard maintains a weekly Top 100 of "hot" songs here: https://www.billboard.com/charts/hot-100.<br>
<br>
It's a good place to start! Scrape the current top 100 songs and their respective artists, and put the information into a pandas dataframe.

In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import re

In [2]:
url = "https://www.billboard.com/charts/hot-100"
response = requests.get(url)
response.status_code

200

In [3]:
soup = BeautifulSoup(response.content, "html.parser")
# print(soup.prettify())

In [4]:
# soup.find_all("div", attrs={"class": "o-chart-results-list-row-container"}) 

In [5]:
# The first song in the list has a different format than the rest

cls = "c-title a-no-trucate a-font-primary-bold-s u-letter-spacing-0021 u-font-size-23@tablet " \
      "lrv-u-font-size-16 u-line-height-125 u-line-height-normal@mobile-max a-truncate-ellipsis " \
      "u-max-width-245 u-max-width-230@tablet-only u-letter-spacing-0028@tablet"

soup.find_all("h3", attrs={"class": cls}) 

[<h3 class="c-title a-no-trucate a-font-primary-bold-s u-letter-spacing-0021 u-font-size-23@tablet lrv-u-font-size-16 u-line-height-125 u-line-height-normal@mobile-max a-truncate-ellipsis u-max-width-245 u-max-width-230@tablet-only u-letter-spacing-0028@tablet" id="title-of-a-story">
 
 	
 	
 		
 					Last Night		
 	
 </h3>]

In [6]:
titles = [soup.find("h3", attrs={"class": cls}).get_text()]
titles

['\n\n\t\n\t\n\t\t\n\t\t\t\t\tLast Night\t\t\n\t\n']

In [7]:
# Rest of the song titles

cls = "c-title a-no-trucate a-font-primary-bold-s u-letter-spacing-0021 lrv-u-font-size-18@tablet " \
      "lrv-u-font-size-16 u-line-height-125 u-line-height-normal@mobile-max a-truncate-ellipsis " \
      "u-max-width-330 u-max-width-230@tablet-only"

# soup.find_all("h3", attrs={"class": cls}) 

In [8]:
# Search once and reuse the result instead of calling find_all on every iteration
for tag in soup.find_all("h3", attrs={"class": cls}):
    titles.append(tag.get_text())

# titles

In [9]:
# Keep the first 100 entries and remove tabs and line breaks

titles = titles[:100]

titles = [re.sub(r'[\r\n\t]', '', x) for x in titles]
# titles 

In [10]:
len(titles)

100
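
The regular expression above deletes every tab and line break, which works here because the title is surrounded by whitespace only at its ends. As a hedged alternative sketch (the helper name `clean_text` is an assumption, not part of the lab), splitting on whitespace and re-joining also collapses any internal run of whitespace to a single space instead of gluing words together:

```python
def clean_text(raw: str) -> str:
    """Trim both ends and collapse internal whitespace runs to one space."""
    # str.split() with no arguments splits on any whitespace run and drops empty pieces
    return " ".join(raw.split())


raw_title = "\n\n\t\n\t\n\t\t\n\t\t\t\t\tLast Night\t\t\n\t\n"
print(clean_text(raw_title))  # Last Night
```

It could replace the `re.sub` call in the list comprehension, for example `[clean_text(x) for x in titles]`.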

In [11]:
# First artist

cls = "c-label a-no-trucate a-font-primary-s lrv-u-font-size-14@mobile-max u-line-height-normal@mobile-max " \
      "u-letter-spacing-0021 lrv-u-display-block a-truncate-ellipsis-2line u-max-width-330 " \
      "u-max-width-230@tablet-only u-font-size-20@tablet"

soup.find_all("span", attrs={"class": cls}) 

[<span class="c-label a-no-trucate a-font-primary-s lrv-u-font-size-14@mobile-max u-line-height-normal@mobile-max u-letter-spacing-0021 lrv-u-display-block a-truncate-ellipsis-2line u-max-width-330 u-max-width-230@tablet-only u-font-size-20@tablet">
 	
 	Morgan Wallen
 </span>]

In [12]:
artists = [soup.find("span", attrs={"class": cls}).get_text()]
artists

['\n\t\n\tMorgan Wallen\n']

In [13]:
# Rest of the artists

cls = "c-label a-no-trucate a-font-primary-s lrv-u-font-size-14@mobile-max u-line-height-normal@mobile-max u-letter-spacing-0021 lrv-u-display-block a-truncate-ellipsis-2line u-max-width-330 u-max-width-230@tablet-only"

# Search once and reuse the result instead of calling find_all on every iteration
for tag in soup.find_all("span", attrs={"class": cls}):
    artists.append(tag.get_text())

In [14]:
# Remove extra characters 

artists = [re.sub(r'[\r\n\t]', '', x) for x in artists]
# artists 

In [15]:
len(artists)

100
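
Collecting titles and artists in two separate passes relies on both lists keeping the same order and length. A possible alternative, sketched below under assumptions, is to walk each chart row and read both fields from it. The HTML here is a made-up, heavily simplified stand-in for the real page: only the container class `o-chart-results-list-row-container` and the `c-title` / `c-label` class names come from the cells above. The real markup has many more elements per row, so these selectors would need checking against the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; NOT the real Billboard HTML
sample_html = """
<div class="o-chart-results-list-row-container">
  <h3 class="c-title a-no-trucate">
      Last Night
  </h3>
  <span class="c-label a-no-trucate">
      Morgan Wallen
  </span>
</div>
<div class="o-chart-results-list-row-container">
  <h3 class="c-title a-no-trucate">Flowers</h3>
  <span class="c-label a-no-trucate">Miley Cyrus</span>
</div>
"""


def parse_rows(html):
    """Return one {"artist", "title"} dict per chart row found in html."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for row in soup.find_all("div", class_="o-chart-results-list-row-container"):
        # class_ with a single name matches tags that carry it among other classes
        title_tag = row.find("h3", class_="c-title")
        artist_tag = row.find("span", class_="c-label")
        if title_tag is None or artist_tag is None:
            continue  # skip rows that do not have both fields
        rows.append({
            "artist": artist_tag.get_text(strip=True),
            "title": title_tag.get_text(strip=True),
        })
    return rows


rows = parse_rows(sample_html)
print(rows[0])  # {'artist': 'Morgan Wallen', 'title': 'Last Night'}
```

The resulting list of dicts can be passed directly to `pd.DataFrame(rows)`.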

In [16]:
# Create dataframe

billboard = pd.DataFrame({"artist": artists, "title": titles})
billboard

Unnamed: 0,artist,title
0,Morgan Wallen,Last Night
1,Miley Cyrus,Flowers
2,Luke Combs,Fast Car
3,Rema & Selena Gomez,Calm Down
4,Lil Durk Featuring J. Cole,All My Life
...,...,...
95,Jelly Roll With Lainey Wilson,Save Me
96,Yandel & Feid,Yandel 150
97,Rosalia & Rauw Alejandro,Beso
98,Morgan Wallen,I Wrote The Book


# Lab 2 | Web Scraping Multiple Pages

## Instructions
**Expand the project**<br>
If you're done, you can try to expand the project on your own. Here are a few suggestions:

- Find other lists of hot songs on the internet and scrape them too: having a bigger pool of songs will be awesome!
- Apply the same logic to other "groups" of songs: the best songs from a decade or from a country / culture / language / genre.
- Wikipedia maintains a large collection of lists of songs: https://en.wikipedia.org/wiki/Lists_of_songs

### Rolling Stone: 100 best songs of 2022
https://www.rollingstone.com/music/music-lists/best-songs-2022-list-1234632381/lainey-wilson-heart-like-a-truck-1234632707/

In [17]:
url = "https://www.rollingstone.com/music/music-lists/best-songs-2022-list-1234632381/lainey-wilson-heart-like-a-truck-1234632707/"
response = requests.get(url)
response.status_code

200

In [18]:
soup = BeautifulSoup(response.content, "html.parser")

In [19]:
rolling_stone_songs = []

for a in soup.find_all("article", attrs={"class": "pmc-fallback-list-item"}):
    rolling_stone_songs.append(a.find("h2").get_text())
    
rolling_stone_songs

['Lainey Wilson, ‘Heart Like a Truck’',
 'Chronixx, ‘Never Give Up’',
 'Plains, ‘Problem With It’',
 'Hurray for the Riff Raff, ‘Saga’',
 'Camilo ft. Grupo Firme, ‘Alaska’',
 'Ingrid Andress, ‘Yearbook’',
 'Jack Harlow, ‘First Class’',
 'Psy feat. Suga, ‘That That’',
 'Dead Cross, ‘Reign of Error’',
 'Ethel Cain, ‘American Teenager’',
 'Gladie, ‘Nothing’',
 'Guided by Voices, ‘Alex Bell’',
 '(G)I-dle, ‘Nxde’',
 'The Weeknd, ‘Take My Breath’',
 'Nayeon, ‘Pop!’',
 'Bill Callahan, ‘Coyotes’',
 'Protoje feat. Lila Iké, ‘Late at Night’',
 'Blood Orange, ‘Jesus Freak Lighter’',
 'Camila Cabello feat. Maria Becerra, ‘Hasta Los Dientes’',
 'Charli XCX and Tiësto, ‘Hot in It’',
 'Daddy Yankee and Bad Bunny, ‘X Ultima Vez’',
 'The 1975, ‘Part of the Band’',
 'Rauw Alejandro and Baby Rasta, ‘Punto 40’',
 'Florence + the Machine, ‘Choreomania’',
 'Saba feat. Day Wave, ‘2012’',
 'Le Sserafim, ‘Antifragile’',
 'Alvvays, ‘Pomeranian Spinster’',
 'Big Bang, ‘Still Life’',
 'Yeah Yeah Yeahs, ‘Blacktop’

These are the first 50 songs. Now we will get the next 50 in the same way.

In [20]:
url = "https://www.rollingstone.com/music/music-lists/best-songs-2022-list-1234632381/bad-bunny-ft-bomba-estereo-ojitos-lindos-1234632596/"
response = requests.get(url)
response.status_code

200

In [21]:
soup = BeautifulSoup(response.content, "html.parser")

In [22]:
for a in soup.find_all("article", attrs={"class": "pmc-fallback-list-item"}):
    rolling_stone_songs.append(a.find("h2").get_text())
    
rolling_stone_songs

['Lainey Wilson, ‘Heart Like a Truck’',
 'Chronixx, ‘Never Give Up’',
 'Plains, ‘Problem With It’',
 'Hurray for the Riff Raff, ‘Saga’',
 'Camilo ft. Grupo Firme, ‘Alaska’',
 'Ingrid Andress, ‘Yearbook’',
 'Jack Harlow, ‘First Class’',
 'Psy feat. Suga, ‘That That’',
 'Dead Cross, ‘Reign of Error’',
 'Ethel Cain, ‘American Teenager’',
 'Gladie, ‘Nothing’',
 'Guided by Voices, ‘Alex Bell’',
 '(G)I-dle, ‘Nxde’',
 'The Weeknd, ‘Take My Breath’',
 'Nayeon, ‘Pop!’',
 'Bill Callahan, ‘Coyotes’',
 'Protoje feat. Lila Iké, ‘Late at Night’',
 'Blood Orange, ‘Jesus Freak Lighter’',
 'Camila Cabello feat. Maria Becerra, ‘Hasta Los Dientes’',
 'Charli XCX and Tiësto, ‘Hot in It’',
 'Daddy Yankee and Bad Bunny, ‘X Ultima Vez’',
 'The 1975, ‘Part of the Band’',
 'Rauw Alejandro and Baby Rasta, ‘Punto 40’',
 'Florence + the Machine, ‘Choreomania’',
 'Saba feat. Day Wave, ‘2012’',
 'Le Sserafim, ‘Antifragile’',
 'Alvvays, ‘Pomeranian Spinster’',
 'Big Bang, ‘Still Life’',
 'Yeah Yeah Yeahs, ‘Blacktop’

In [23]:
len(rolling_stone_songs)

100

In [24]:
# Create a dataframe from the list

rolling_stone = pd.DataFrame({"song": rolling_stone_songs})
rolling_stone

Unnamed: 0,song
0,"Lainey Wilson, ‘Heart Like a Truck’"
1,"Chronixx, ‘Never Give Up’"
2,"Plains, ‘Problem With It’"
3,"Hurray for the Riff Raff, ‘Saga’"
4,"Camilo ft. Grupo Firme, ‘Alaska’"
...,...
95,"Rosalia, ‘Despecha’"
96,"Taylor Swift, ‘Karma’"
97,"Steve Lacy, ‘Bad Habit’"
98,"Beyonce, ‘Cuff It’"


In [25]:
# Split artist and title

rolling_stone = rolling_stone['song'].str.split(",", n=1, expand=True)
rolling_stone.columns = ["artist", "title"]
rolling_stone

Unnamed: 0,artist,title
0,Lainey Wilson,‘Heart Like a Truck’
1,Chronixx,‘Never Give Up’
2,Plains,‘Problem With It’
3,Hurray for the Riff Raff,‘Saga’
4,Camilo ft. Grupo Firme,‘Alaska’
...,...,...
95,Rosalia,‘Despecha’
96,Taylor Swift,‘Karma’
97,Steve Lacy,‘Bad Habit’
98,Beyonce,‘Cuff It’


In [26]:
rolling_stone["title"][0]

' ‘Heart Like a Truck’'

In [27]:
# Remove initial blank space and quotation marks

rolling_stone["title"] = rolling_stone["title"].str.strip(" ‘’")
rolling_stone

Unnamed: 0,artist,title
0,Lainey Wilson,Heart Like a Truck
1,Chronixx,Never Give Up
2,Plains,Problem With It
3,Hurray for the Riff Raff,Saga
4,Camilo ft. Grupo Firme,Alaska
...,...,...
95,Rosalia,Despecha
96,Taylor Swift,Karma
97,Steve Lacy,Bad Habit
98,Beyonce,Cuff It


In [28]:
rolling_stone["title"][0]

'Heart Like a Truck'
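
The two pandas steps above (split at the first comma, then strip the space and curly quotes) can be illustrated on a single string. This is a minimal sketch in plain Python; `split_entry` is a hypothetical helper name, and the second input is an invented entry used only to show why the split is limited to the first comma.

```python
def split_entry(entry: str) -> tuple[str, str]:
    """Split 'Artist, ‘Title’' at the first comma and strip spaces and curly quotes."""
    # partition splits at the first comma only, like str.split(",", n=1) in pandas
    artist, _, title = entry.partition(",")
    return artist.strip(), title.strip(" ‘’")


print(split_entry("Lainey Wilson, ‘Heart Like a Truck’"))
# ('Lainey Wilson', 'Heart Like a Truck')

# Invented entry: the comma inside the title is preserved
print(split_entry("Some Artist, ‘One, Two’"))
# ('Some Artist', 'One, Two')
```

Two limits are worth keeping in mind: an artist name that itself contains a comma would be split in the wrong place, and `strip` would also remove an apostrophe that legitimately ends a title.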

### NPR: 100 best songs of 2021
https://www.npr.org/2021/12/02/1054377950/the-100-best-songs-of-2021-page-1 <br>
This list is split across five pages. We will start by scraping one of the pages and then write a loop to scrape all of them.

In [29]:
url = "https://www.npr.org/2021/12/02/1054380365/the-100-best-songs-of-2021-page-5"
response = requests.get(url)
response.status_code

200

In [30]:
soup = BeautifulSoup(response.content, "html.parser")

In [31]:
npr_songs = []

for a in soup.find_all("h3", attrs={"class": "edTag"}):
    npr_songs.append(a.get_text())

npr_songs

['Olivia Rodrigo',
 '"deja vu" ',
 'Emily Scott Robinson',
 '"Let \'em Burn" ',
 'Mdou Moctar',
 '"Afrique Victime" ',
 'Mitski',
 '"Working for the Knife" ',
 'Lucky Daye (feat. Yebba)',
 '"How Much Can A Heart Take" ',
 'Remi Wolf',
 '"Grumpy Old Man" ',
 'Noname',
 '"Rainforest" ',
 'Sun-EL Musician (feat. Simmy)',
 '"Higher"',
 'Baby Keem (feat. Kendrick Lamar)',
 '"range brothers" ',
 'Brandee Younger',
 '"Reclamation"',
 'Brandi Carlile',
 '"Broken Horses" ',
 'Olivia Rodrigo',
 '"good 4 u" ',
 'Chlöe',
 '"Have Mercy" ',
 'Rauw Alejandro',
 '"Todo de Ti" ',
 'Cassandra Jenkins',
 '"Hard Drive" ',
 'Sharon Van Etten & Angel Olsen',
 '"Like I Used To" ',
 'Megan Thee Stallion',
 '"Thot S***" ',
 'Lucy Dacus',
 '"Thumbs" ',
 'Wet Leg',
 '"Chaise Longue" ',
 'Lil Nas X',
 '"MONTERO (Call Me By Your Name)" ',
 '< Previous']

In [32]:
# Remove the last element since it's not a song (it's the link to the previous page)

npr_songs.pop()
npr_songs

['Olivia Rodrigo',
 '"deja vu" ',
 'Emily Scott Robinson',
 '"Let \'em Burn" ',
 'Mdou Moctar',
 '"Afrique Victime" ',
 'Mitski',
 '"Working for the Knife" ',
 'Lucky Daye (feat. Yebba)',
 '"How Much Can A Heart Take" ',
 'Remi Wolf',
 '"Grumpy Old Man" ',
 'Noname',
 '"Rainforest" ',
 'Sun-EL Musician (feat. Simmy)',
 '"Higher"',
 'Baby Keem (feat. Kendrick Lamar)',
 '"range brothers" ',
 'Brandee Younger',
 '"Reclamation"',
 'Brandi Carlile',
 '"Broken Horses" ',
 'Olivia Rodrigo',
 '"good 4 u" ',
 'Chlöe',
 '"Have Mercy" ',
 'Rauw Alejandro',
 '"Todo de Ti" ',
 'Cassandra Jenkins',
 '"Hard Drive" ',
 'Sharon Van Etten & Angel Olsen',
 '"Like I Used To" ',
 'Megan Thee Stallion',
 '"Thot S***" ',
 'Lucy Dacus',
 '"Thumbs" ',
 'Wet Leg',
 '"Chaise Longue" ',
 'Lil Nas X',
 '"MONTERO (Call Me By Your Name)" ']

In [33]:
# Take only the artists (1st, 3rd, 5th, ... elements, i.e. list indices 0, 2, 4, ...)

artists = []
i = 0

while i < len(npr_songs):
    artists.append(npr_songs[i])
    i += 2

artists  

['Olivia Rodrigo',
 'Emily Scott Robinson',
 'Mdou Moctar',
 'Mitski',
 'Lucky Daye (feat. Yebba)',
 'Remi Wolf',
 'Noname',
 'Sun-EL Musician (feat. Simmy)',
 'Baby Keem (feat. Kendrick Lamar)',
 'Brandee Younger',
 'Brandi Carlile',
 'Olivia Rodrigo',
 'Chlöe',
 'Rauw Alejandro',
 'Cassandra Jenkins',
 'Sharon Van Etten & Angel Olsen',
 'Megan Thee Stallion',
 'Lucy Dacus',
 'Wet Leg',
 'Lil Nas X']

In [34]:
# Take only the titles (2nd, 4th, 6th, ... elements, i.e. list indices 1, 3, 5, ...)

titles = []
i = 1

while i < len(npr_songs):
    titles.append(npr_songs[i])
    i += 2

titles    

['"deja vu" ',
 '"Let \'em Burn" ',
 '"Afrique Victime" ',
 '"Working for the Knife" ',
 '"How Much Can A Heart Take" ',
 '"Grumpy Old Man" ',
 '"Rainforest" ',
 '"Higher"',
 '"range brothers" ',
 '"Reclamation"',
 '"Broken Horses" ',
 '"good 4 u" ',
 '"Have Mercy" ',
 '"Todo de Ti" ',
 '"Hard Drive" ',
 '"Like I Used To" ',
 '"Thot S***" ',
 '"Thumbs" ',
 '"Chaise Longue" ',
 '"MONTERO (Call Me By Your Name)" ']
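
The two `while` loops above step through the list two items at a time. The same split can be sketched with slice notation, and `zip` then pairs each artist with the title that follows it. The sample list below is copied from the first three artist/title pairs of the scraped output.

```python
# Sample taken from the scraped output above (first three artist/title pairs)
npr_songs = [
    "Olivia Rodrigo", '"deja vu" ',
    "Emily Scott Robinson", '"Let \'em Burn" ',
    "Mdou Moctar", '"Afrique Victime" ',
]

artists = npr_songs[0::2]  # indices 0, 2, 4, ...
titles = npr_songs[1::2]   # indices 1, 3, 5, ...

# zip pairs each artist with the title that follows it
pairs = list(zip(artists, titles))
print(pairs[0])  # ('Olivia Rodrigo', '"deja vu" ')
```

This only works when the list strictly alternates artist and title; any extra element (such as a navigation link) shifts every pair after it.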

Now that we have successfully extracted the elements we wanted from one page, let's create a for loop to scrape all 5 pages.

In [35]:
# First, define a function for extracting the information

def get_song_info(soup):
    npr_songs = []
    artists = []
    titles = []

    for a in soup.find_all("h3", attrs={"class": "edTag"}):
        npr_songs.append(a.get_text())

    npr_songs.pop()
    
    i = 0
    while i < len(npr_songs):
        artists.append(npr_songs[i])
        i += 2
        
    i = 1
    while i < len(npr_songs):
        titles.append(npr_songs[i])
        i += 2
        
    dct = {"artist": artists, "title": titles}
    
    return dct
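
`get_song_info` calls `pop()` exactly once, which assumes every page ends with a single navigation heading. That holds for the page inspected above, where the last element is `'< Previous'`, but other pages might carry a different number of pager links (for instance a "Next >" link, which is an assumption and was not checked here). A hedged sketch of a filter that drops such entries by their look rather than by position:

```python
def drop_nav_links(texts):
    """Remove entries that look like pager links rather than artists or titles.

    Assumption: pager links start with '<' or end with '>' (for example
    '< Previous', as seen above, or a hypothetical 'Next >').
    """
    kept = []
    for text in texts:
        stripped = text.strip()
        if stripped.startswith("<") or stripped.endswith(">"):
            continue
        kept.append(text)
    return kept


sample = ["Lil Nas X", '"MONTERO (Call Me By Your Name)" ', "< Previous"]
print(drop_nav_links(sample))
# ['Lil Nas X', '"MONTERO (Call Me By Your Name)" ']
```

Filtering this way keeps the artist/title alternation intact regardless of how many pager links a page has.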

In [36]:
# Loop over all pages and scrape each one

npr = pd.DataFrame()

for page in range(1, 6):
    request = requests.get(f"https://www.npr.org/2021/12/02/1054380365/the-100-best-songs-of-2021-page-{page}")
    soup = BeautifulSoup(request.content, 'html.parser')
    info_dct = get_song_info(soup)
    new_df = pd.DataFrame.from_dict(info_dct)
    npr = pd.concat([npr, new_df])    
    
npr

Unnamed: 0,artist,title
0,Olivia Rodrigo,"""deja vu"""
1,Emily Scott Robinson,"""Let 'em Burn"""
2,Mdou Moctar,"""Afrique Victime"""
3,Mitski,"""Working for the Knife"""
4,Lucky Daye (feat. Yebba),"""How Much Can A Heart Take"""
...,...,...
15,Sharon Van Etten & Angel Olsen,"""Like I Used To"""
16,Megan Thee Stallion,"""Thot S***"""
17,Lucy Dacus,"""Thumbs"""
18,Wet Leg,"""Chaise Longue"""


In [37]:
# Remove quotation marks

# First check that replace is safe by counting the quotation marks (should be 200: two per title)
counting = npr["title"].str.count('"')
counting.sum()

200

In [38]:
# Now we can use replace, knowing that it will not remove anything we want to keep

npr["title"] = npr["title"].str.replace('"', '')
npr

Unnamed: 0,artist,title
0,Olivia Rodrigo,deja vu
1,Emily Scott Robinson,Let 'em Burn
2,Mdou Moctar,Afrique Victime
3,Mitski,Working for the Knife
4,Lucky Daye (feat. Yebba),How Much Can A Heart Take
...,...,...
15,Sharon Van Etten & Angel Olsen,Like I Used To
16,Megan Thee Stallion,Thot S***
17,Lucy Dacus,Thumbs
18,Wet Leg,Chaise Longue


In [39]:
npr.duplicated().value_counts()

True     80
False    20
dtype: int64
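
Finding 80 duplicated rows out of 100 suggests that the five requests returned the same 20 songs. A small check like the following sketch can flag repeated pages before the dataframes are concatenated. The function name and the `pages` structure are assumptions made for illustration, and the data is a toy example.

```python
def find_repeated_pages(pages):
    """Return page numbers whose content equals that of an earlier page.

    pages maps page number -> list of (artist, title) tuples.
    """
    seen = {}       # page content -> first page number that had it
    repeated = []
    for number in sorted(pages):
        key = tuple(pages[number])
        if key in seen:
            repeated.append(number)
        else:
            seen[key] = number
    return repeated


# Toy data: pages 1 and 2 are identical, page 3 differs
pages = {
    1: [("Olivia Rodrigo", "deja vu"), ("Mitski", "Working for the Knife")],
    2: [("Olivia Rodrigo", "deja vu"), ("Mitski", "Working for the Knife")],
    3: [("Wet Leg", "Chaise Longue")],
}
print(find_repeated_pages(pages))  # [2]
```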

In [40]:
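# The loop only ever returns the last page. A likely cause: every URL built in the loop keeps
# the article ID 1054380365 (taken from the page-5 URL) and changes only the "page-{n}" suffix,
# while the page-1 link given above uses a different ID (1054377950).
# Requesting the "page-1" suffix with the page-5 ID returns the page-5 songs again: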

request = requests.get("https://www.npr.org/2021/12/02/1054380365/the-100-best-songs-of-2021-page-1")
soup = BeautifulSoup(request.content, 'html.parser')

npr_songs = []
for a in soup.find_all("h3", attrs={"class": "edTag"}):
    npr_songs.append(a.get_text())
    
npr_songs

['Olivia Rodrigo',
 '"deja vu" ',
 'Emily Scott Robinson',
 '"Let \'em Burn" ',
 'Mdou Moctar',
 '"Afrique Victime" ',
 'Mitski',
 '"Working for the Knife" ',
 'Lucky Daye (feat. Yebba)',
 '"How Much Can A Heart Take" ',
 'Remi Wolf',
 '"Grumpy Old Man" ',
 'Noname',
 '"Rainforest" ',
 'Sun-EL Musician (feat. Simmy)',
 '"Higher"',
 'Baby Keem (feat. Kendrick Lamar)',
 '"range brothers" ',
 'Brandee Younger',
 '"Reclamation"',
 'Brandi Carlile',
 '"Broken Horses" ',
 'Olivia Rodrigo',
 '"good 4 u" ',
 'Chlöe',
 '"Have Mercy" ',
 'Rauw Alejandro',
 '"Todo de Ti" ',
 'Cassandra Jenkins',
 '"Hard Drive" ',
 'Sharon Van Etten & Angel Olsen',
 '"Like I Used To" ',
 'Megan Thee Stallion',
 '"Thot S***" ',
 'Lucy Dacus',
 '"Thumbs" ',
 'Wet Leg',
 '"Chaise Longue" ',
 'Lil Nas X',
 '"MONTERO (Call Me By Your Name)" ',
 '< Previous']
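
The page-1 link listed at the top of this section carries a different article ID (`1054377950`) than the one used in the loop (`1054380365`), so one option to try is an explicit mapping from page number to article ID. The sketch below is written under assumptions: that the ID is what selects the page has not been verified, and only the IDs for pages 1 and 5 appear in this notebook. The IDs for pages 2 to 4 would have to be looked up (for example from each page's pager links), so the helper refuses to build a URL for them rather than guess.

```python
BASE = "https://www.npr.org/2021/12/02"
SLUG = "the-100-best-songs-of-2021-page"

# Only these two IDs appear in this notebook; pages 2-4 must be looked up
PAGE_IDS = {1: "1054377950", 5: "1054380365"}


def build_url(page: int) -> str:
    """Build the URL for a page, failing loudly when its ID is unknown."""
    if page not in PAGE_IDS:
        raise KeyError(f"No article ID recorded for page {page}")
    return f"{BASE}/{PAGE_IDS[page]}/{SLUG}-{page}"


print(build_url(1))
# https://www.npr.org/2021/12/02/1054377950/the-100-best-songs-of-2021-page-1
```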

### The 200 Best Songs of the 1960s
https://pitchfork.com/features/lists-and-guides/6405-the-200-greatest-songs-of-the-1960s/ <br>
Another multi-page list to try (this one has 10 pages).

In [41]:
request = requests.get("https://pitchfork.com/features/lists-and-guides/6405-the-200-greatest-songs-of-the-1960s/")
soup = BeautifulSoup(request.content, 'html.parser')

artists = []
for a in soup.find_all("ul", attrs={"class": "artist-list list-blurb__artists"}):
    artists.append(a.get_text())
    
artists

['The Kinks',
 'Nina Simone',
 'Dionne Warwick',
 'Charles Mingus',
 'Irma Thomas',
 'James Brown',
 'The Foundations',
 'Johnny and June Carter Cash',
 'Alton Ellis',
 'The Cannonball Adderley Quintet',
 'Leonard Cohen',
 'The Sonics',
 'Tyrannosaurus Rex',
 'The Walker Brothers',
 'The Hollies',
 'The Temptations',
 'James Brown',
 'Bobby Darin',
 'Patsy Cline',
 'France Gall']

In [42]:
titles = []
for a in soup.find_all("h2", attrs={"class": "list-blurb__work-title"}):
    titles.append(a.get_text())
    
titles

['“Sunny Afternoon”',
 '“Black Is the Color of My True Love’s Hair”',
 '“Walk on By”',
 '“Solo Dancer”',
 '“Time Is on My Side”',
 '“Night Train (Live at the Apollo)”',
 '“Build Me Up Buttercup”',
 '“Jackson”',
 '“I’m Still in Love With You”',
 '“Mercy, Mercy, Mercy”',
 '“So Long, Marianne”',
 '“Strychnine”',
 '“Debora”',
 '“The Sun Ain’t Gonna Shine Anymore”',
 '“Bus Stop”',
 '“Get Ready”',
 '“Mother Popcorn (You Got to Have a Mother for Me)”',
 '“Beyond the Sea”',
 '“She’s Got You”',
 '“Laisse Tomber les Filles”']

In [43]:
# First, define a function for extracting the information

def get_song_info(soup):
    artists = []
    titles = []

    for a in soup.find_all("ul", attrs={"class": "artist-list list-blurb__artists"}):
        artists.append(a.get_text())

    for a in soup.find_all("h2", attrs={"class": "list-blurb__work-title"}):
        titles.append(a.get_text())
        
    dct = {"artist": artists, "title": titles}
    
    return dct

In [44]:
# Loop over all pages and scrape each one

sixties = pd.DataFrame()

for page in range(1, 11):
    request = requests.get(f"https://pitchfork.com/features/lists-and-guides/6405-the-200-greatest-songs-of-the-1960s/?page={page}")
    soup = BeautifulSoup(request.content, 'html.parser')
    info_dct = get_song_info(soup)
    new_df = pd.DataFrame.from_dict(info_dct)
    sixties = pd.concat([sixties, new_df])    
    
sixties

Unnamed: 0,artist,title
0,The Kinks,“Sunny Afternoon”
1,Nina Simone,“Black Is the Color of My True Love’s Hair”
2,Dionne Warwick,“Walk on By”
3,Charles Mingus,“Solo Dancer”
4,Irma Thomas,“Time Is on My Side”
...,...,...
15,The Beatles,“A Day in the Life”
16,Bob Dylan,“Like a Rolling Stone”
17,Sam Cooke,“A Change Is Gonna Come”
18,The Jackson 5,“I Want You Back”
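
The NPR and Pitchfork loops share one pattern: fetch a page, parse it, append the rows. A hedged sketch of that pattern as a reusable helper follows. `scrape_pages`, `fetch` and `parse` are hypothetical names, and the fetcher is passed in as an argument so the loop can be tried without network access. Pausing between requests is a common courtesy to the server, not something the lab requires.

```python
import time


def scrape_pages(urls, fetch, parse, delay=1.0):
    """Fetch each URL, parse it into a list of rows, and return all rows.

    fetch: callable taking a URL and returning the page HTML
    parse: callable taking HTML and returning a list of dicts
    delay: seconds to wait between consecutive requests
    """
    rows = []
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(delay)  # no pause before the first request
        rows.extend(parse(fetch(url)))
    return rows


# Stand-ins for a real HTTP request and a get_song_info-style parser
fake_site = {
    "page1": "The Kinks|Sunny Afternoon",
    "page2": "Dionne Warwick|Walk on By",
}


def fake_fetch(url):
    return fake_site[url]


def fake_parse(html):
    artist, title = html.split("|")
    return [{"artist": artist, "title": title}]


rows = scrape_pages(["page1", "page2"], fake_fetch, fake_parse, delay=0)
print(len(rows))  # 2
```

With the real libraries, `fetch` could be `lambda url: requests.get(url, timeout=10).content`, and the collected rows can be turned into a dataframe once with `pd.DataFrame(rows)` instead of concatenating inside the loop.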


In [45]:
# Removing quotation marks

sixties["title"] = sixties["title"].str.strip("“”")
sixties

Unnamed: 0,artist,title
0,The Kinks,Sunny Afternoon
1,Nina Simone,Black Is the Color of My True Love’s Hair
2,Dionne Warwick,Walk on By
3,Charles Mingus,Solo Dancer
4,Irma Thomas,Time Is on My Side
...,...,...
15,The Beatles,A Day in the Life
16,Bob Dylan,Like a Rolling Stone
17,Sam Cooke,A Change Is Gonna Come
18,The Jackson 5,I Want You Back


### Join dataframes

In [46]:
songs = pd.concat([billboard, rolling_stone, sixties]).reset_index(drop=True)
songs

Unnamed: 0,artist,title
0,Morgan Wallen,Last Night
1,Miley Cyrus,Flowers
2,Luke Combs,Fast Car
3,Rema & Selena Gomez,Calm Down
4,Lil Durk Featuring J. Cole,All My Life
...,...,...
395,The Beatles,A Day in the Life
396,Bob Dylan,Like a Rolling Stone
397,Sam Cooke,A Change Is Gonna Come
398,The Jackson 5,I Want You Back


In [47]:
# Check for duplicates

songs.duplicated().value_counts()

False    400
dtype: int64
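
`duplicated()` only flags rows that match exactly, so the same song scraped from two sites with different capitalisation or spacing would pass unnoticed. A sketch of a looser check in plain Python follows; the normalisation rules are an assumption about what should count as the same song, and the rows are toy data.

```python
def normalize(text: str) -> str:
    """Case-fold and collapse whitespace so near-identical strings compare equal."""
    return " ".join(text.casefold().split())


def find_near_duplicates(songs):
    """Return (artist, title) pairs whose normalized form was already seen."""
    seen = set()
    duplicates = []
    for artist, title in songs:
        key = (normalize(artist), normalize(title))
        if key in seen:
            duplicates.append((artist, title))
        else:
            seen.add(key)
    return duplicates


# Toy rows; the second Bob Dylan entry differs only in case and spacing
songs = [
    ("Bob Dylan", "Like a Rolling Stone"),
    ("Sam Cooke", "A Change Is Gonna Come"),
    ("bob dylan", "Like a  Rolling Stone "),
]
print(find_near_duplicates(songs))
# [('bob dylan', 'Like a  Rolling Stone ')]
```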

In [48]:
# Save the csv

songs.to_csv("songs.csv")

### Conclusion
We obtained a CSV with 400 songs and no duplicate rows (the current Billboard top 100, Rolling Stone's 100 best songs of 2022, and Pitchfork's 200 best songs of the 1960s). <br>
We also tried to scrape NPR's 100 best songs of 2021, but the loop returned the same 20 songs for all five pages, so that list was left out.