# lab-web-scraping-single-page

## Case Study: The site for recommendations - "Gnod"

### Scenario

#### You have been hired as a Data Analyst for "Gnod".
#### "Gnod" is a site that provides recommendations for music, art, literature and products based on collaborative filtering algorithms. Their flagship product is the music recommender, which you can try at www.gnoosic.com. The site asks users to input 3 bands they like, and computes similarity scores with the rest of the users. Then, they recommend to the user bands that users with similar tastes have picked.
#### "Gnod" is a small company, and its only revenue stream so far is ads on the site. In the future, they would like to explore partnership options with music apps (such as Deezer, SoundCloud or even Apple Music and Spotify). However, for that to be possible, they need to expand and improve their recommendations.
#### That's precisely where you come in. They have hired you as a Data Analyst, and they expect you to bring a mix of technical expertise and business mindset to the table.
#### Jane, CTO of Gnod, has sent you an email assigning you your first task.

### Task(s)
#### This is an e-mail that Jane, CTO of Gnod, sent to your inbox during your first weeks working there.
#### Dear xxxxxxxx, We are thrilled to welcome you as a Data Analyst for Gnoosic!
#### As you know, we are trying to come up with ways to enhance our music recommendations. One of the new features we'd like to research is to recommend songs (not only bands). We're also aware of the limitations of our collaborative filtering algorithms, and would like to give users two new possibilities when searching for recommendations:
#### Songs that are actually similar to the ones they picked from an acoustic point of view.
#### Songs that are popular around the world right now, independently from their tastes.
#### Coming up with the perfect song recommender will take us months - no need to stress out too much. In this first week, we want you to explore new data sources for songs. The Internet is full of information and our first step is to acquire it and do an initial exploration. Feel free to use APIs or directly scrape the web to collect as much information as possible about popular songs. Eventually, we'll need to collect data from millions of songs, but we can start with a few hundred or a few thousand from each source and see if the collected features are useful.
#### Once the data is collected, we want you to create clusters of songs that are similar to each other. The idea is that if a user inputs a song from one group, we'll prioritize giving them recommendations of songs from that same group.
#### On Friday, you will present your work to me and Marek, the CEO and founder. Full disclosure: I need you to be very convincing about this whole song recommender, as this has been my personal push and the main reason we hired you!
#### Be open-minded about this process: we are agile, and that means that we define our products and features on the go, while exploring the tools and the data that are available to us. We'd love you to provide your own vision of the product and the next steps to be taken.
#### Lots of luck and strength for this first week with us!
#### -Jane

#### Business goal:
#### Check the case_study_gnod.md file.
#### Make sure you've understood the big picture of your project:
#### the goal of the company (Gnod),
#### their current product (Gnoosic),
#### their strategy, and
#### how your project fits into this context.
#### Re-read the business case and the e-mail from the CTO, take a look at the flowchart and create an initial Trello board with the tasks you think you'll have to accomplish.

### Instructions - Scraping popular songs

#### Your product will take a song as an input from the user and will output another song (the recommendation). In most cases, the recommended song will have to be similar to the input song, but the CTO thinks that if the input song is on the top charts at the moment, the user will prefer a recommendation that is also popular at the moment.
#### You have to find data on the internet about currently popular songs. Billboard maintains a weekly Top 100 of "hot" songs here: https://www.billboard.com/charts/hot-100.
#### It's a good place to start! Scrape the current top 100 songs and their respective artists, and put the information into a pandas dataframe.

In [40]:
# importing useful libraries 
#! pip install bs4 # (installing the library in case it is not installed)
from bs4 import BeautifulSoup
import requests
import pandas as pd

In [41]:
# Let's get the URL and store it in a variable
url="https://www.billboard.com/charts/hot-100/"

In [42]:
# requesting the page with requests.get and printing the HTTP status code
request_charts = requests.get(url)
print("request_charts:", request_charts.status_code)
# a status code of 200 means the request succeeded

request_charts: 200
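
Printing the status code works for a one-off check, but a script that runs unattended usually wants the request to fail loudly. The following is a minimal sketch, not the notebook's own method: `fetch_html` and the `User-Agent` value are hypothetical names introduced here, while `timeout` and `raise_for_status()` are standard parts of the `requests` API.

```python
import requests

# Hypothetical User-Agent string (an assumption, not taken from the lab).
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; gnod-lab-scraper/0.1)"}


def fetch_html(url: str, timeout: float = 10.0) -> bytes:
    """Download a page and return its raw bytes, raising on HTTP errors."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx responses instead of silently continuing.
    response.raise_for_status()
    return response.content
```

Calling `fetch_html(url)` with the `url` variable defined earlier would return the same bytes as `request_charts.content`, provided the request succeeds.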


In [43]:
# the content attribute holds the raw response body as bytes
request_charts.content[:100]
# the body is one long bytes object of HTML, so we show only the first 100 bytes
# to confirm that we received the source of the page

b'<!DOCTYPE html>\n<!--[if IE 6]>\n<html id="ie6" lang="en-US">\n<![endif]-->\n<!--[if IE 7]>\n<html id="ie'

In [66]:
# parsing the response body with the built-in 'html.parser' so that we can navigate the HTML
# printing the prettified version of soup instead of the plain soup, so it is less messy than before
soup = BeautifulSoup(request_charts.content, 'html.parser')
# soup
# prettify() indents the HTML, although the result is not always perfect
print(soup.prettify()[:3000])
# the HTML looks as expected and is now stored in a BeautifulSoup object

<!DOCTYPE html>
<!--[if IE 6]>
<html id="ie6" lang="en-US">
<![endif]-->
<!--[if IE 7]>
<html id="ie7" lang="en-US">
<![endif]-->
<!--[if IE 8]>
<html id="ie8" lang="en-US">
<![endif]-->
<!--[if !(IE 6) | !(IE 7) | !(IE 8) ]><!-->
<html lang="en-US">
 <!--<![endif]-->
 <head>
  <meta charset="utf-8"/>
  <meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible"/>
  <meta content="#ffffff" name="theme-color"/>
  <meta content="width=device-width, initial-scale=1.0" name="viewport">
   <!-- Add to home screen for iOS -->
   <meta content="black-translucent" name="apple-mobile-web-app-status-bar-style"/>
   <link href="https://www.billboard.com/wp-content/themes/vip/pmc-billboard-2021/assets/app/icons/apple-touch-icon.png" rel="apple-touch-icon" sizes="180x180"/>
   <!-- Tile icons for Windows -->
   <meta content="https://www.billboard.com/wp-content/themes/vip/pmc-billboard-2021/assets/app/browserconfig.xml" name="msapplication-config"/>
   <meta content="https://www.billboard.com/wp

#### playing with the code

In [45]:
# basic tree navigation: let's get the text of the page's <title> tag
soup.title.get_text(strip=True)

'Billboard Hot 100 – Billboard'

In [46]:
# or
soup.title.string

'Billboard Hot 100 – Billboard'

In [47]:
soup.title.name

'title'
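
For the page title, `.string` and `get_text()` return the same text, but they differ once a tag has nested children. The sketch below uses a small made-up HTML fragment (the "(Remix)" heading is invented for illustration) to show the difference, including how `strip=True` joins the pieces.

```python
from bs4 import BeautifulSoup

# Made-up fragment: the second heading contains a nested <em> tag.
html = '<div><h3 id="a">  As It Was  </h3><h3 id="b">Stay <em>(Remix)</em></h3></div>'
demo = BeautifulSoup(html, "html.parser")

first = demo.find("h3", id="a")
second = demo.find("h3", id="b")

print(repr(first.string))                 # '  As It Was  ' (whitespace kept)
print(first.get_text(strip=True))         # As It Was
print(second.string)                      # None, because the tag has more than one child
print(second.get_text())                  # Stay (Remix)
print(second.get_text(strip=True))        # Stay(Remix), each piece is stripped before joining
print(second.get_text(" ", strip=True))   # Stay (Remix), pieces joined with a space
print(first.name)                         # h3
```

When a tag may contain nested markup, `get_text(" ", strip=True)` is usually the safer choice for collecting display text.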

In [48]:
# finding the children of the body
# for i in soup.body.children:
#     print(i)
#     print()

In [49]:
# soup.p returns only the first <p> tag in the document
soup.p

<p class="c-tagline a-font-primary-s lrv-u-padding-b-1">THE WEEK’S MOST POPULAR CURRENT SONGS ACROSS ALL GENRES, RANKED BY STREAMING ACTIVITY DATA BY ONLINE MUSIC SOURCES TRACKED BY LUMINATE, RADIO AIRPLAY AUDIENCE IMPRESSIONS AS MEASURED BY LUMINATE AND SALES DATA AS COMPILED BY LUMINATE.</p>

In [50]:
# getting the first 10 <li> (list item) tags
soup.find_all("li")[:10]

[<li class="o-nav__list-item lrv-u-align-items-center lrv-u-flex">
 <a class="c-link lrv-a-unstyle-link lrv-a-unstyle-link lrv-u-color-brand-accent-blue:hover lrv-a-hover-effect lrv-u-whitespace-nowrap lrv-u-color-grey-lightest" href="/charts">
 	Charts</a>
 </li>,
 <li class="o-nav__list-item lrv-u-align-items-center lrv-u-flex">
 <a class="c-link lrv-a-unstyle-link lrv-a-unstyle-link lrv-u-color-brand-accent-blue:hover lrv-a-hover-effect lrv-u-whitespace-nowrap lrv-u-color-grey-lightest" href="https://www.billboard.com/c/music/">
 	Music</a>
 </li>,
 <li class="o-nav__list-item lrv-u-align-items-center lrv-u-flex">
 <a class="c-link lrv-a-unstyle-link lrv-a-unstyle-link lrv-u-color-brand-accent-blue:hover lrv-a-hover-effect lrv-u-whitespace-nowrap lrv-u-color-grey-lightest" href="https://www.billboard.com/c/culture/">
 	Culture</a>
 </li>,
 <li class="o-nav__list-item lrv-u-align-items-center lrv-u-flex">
 <a class="c-link lrv-a-unstyle-link lrv-a-unstyle-link lrv-u-color-brand-accen

In [51]:
# getting the text of every <p> tag and showing the first 100
ps = [i.get_text(strip=True) for i in soup.find_all("p")]
ps[:100]

['THE WEEK’S MOST POPULAR CURRENT SONGS ACROSS ALL GENRES, RANKED BY STREAMING ACTIVITY DATA BY ONLINE MUSIC SOURCES TRACKED BY LUMINATE, RADIO AIRPLAY AUDIENCE IMPRESSIONS AS MEASURED BY LUMINATE AND SALES DATA AS COMPILED BY LUMINATE.',
 'Lizzo',
 'Last week',
 'Weeks at no. 1',
 'Weeks on chart',
 'B.Slatktin, E.B.Frederic, L.Price, M.McLaren, M.Jefferson, R.Larkins, S.Hague, T.M.Thomas',
 'Ricky Reed, B.Slatkin',
 'Nice Life/Atlantic',
 'Biggest gain in digital sales',
 'Gains In Performance',
 'Week of July 30, 2022',
 'click to see more',
 'B.Slatktin, E.B.Frederic, L.Price, M.McLaren, M.Jefferson, R.Larkins, S.Hague, T.M.Thomas',
 'Ricky Reed, B.Slatkin',
 'Nice Life/Atlantic',
 'Biggest gain in digital sales',
 'Gains In Performance',
 'H.Styles, T.E.P.Hull, T.Johnson',
 'Kid Harpoon, T.Johnson',
 'Erskine/Columbia',
 'K.Bush',
 'K.Bush',
 'Fish People/Noble And Brite/Rhino/Warner',
 'Biggest gain in airplay',
 'Gains In Performance',
 'J.T.Harlow, D.Ford, J.Velazquez, R.Chahay
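
The two cells above fetch every matching tag and then slice the result. As a small sketch on made-up HTML, the `limit` argument of `find_all` stops the search early instead, and `find` (or the `soup.p` shortcut) returns just the first match.

```python
from bs4 import BeautifulSoup

# Made-up fragment that mimics a navigation list followed by two paragraphs.
html = "<ul><li>Charts</li><li>Music</li><li>Culture</li></ul><p>Lizzo</p><p>Last week</p>"
demo = BeautifulSoup(html, "html.parser")

limited = demo.find_all("li", limit=2)   # stops after two matches
sliced = demo.find_all("li")[:2]         # finds all matches, then keeps two

limited_text = [li.get_text() for li in limited]
sliced_text = [li.get_text() for li in sliced]
first_paragraph = demo.find("p").get_text()

print(limited_text)      # ['Charts', 'Music']
print(sliced_text)       # ['Charts', 'Music']
print(first_paragraph)   # Lizzo
```

Both forms give the same elements; `limit` only matters for speed on large pages.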

#### songs

In [52]:
# Let's find the songs by recognizing a pattern in their tags
soup.find_all('h3', attrs={'class': 'c-title a-no-trucate a-font-primary-bold-s u-letter-spacing-0021 lrv-u-font-size-18@tablet lrv-u-font-size-16 u-line-height-125 u-line-height-normal@mobile-max a-truncate-ellipsis u-max-width-330 u-max-width-230@tablet-only'})

[<h3 class="c-title a-no-trucate a-font-primary-bold-s u-letter-spacing-0021 lrv-u-font-size-18@tablet lrv-u-font-size-16 u-line-height-125 u-line-height-normal@mobile-max a-truncate-ellipsis u-max-width-330 u-max-width-230@tablet-only" id="title-of-a-story">
 
 	
 	
 		
 					As It Was		
 	
 </h3>,
 <h3 class="c-title a-no-trucate a-font-primary-bold-s u-letter-spacing-0021 lrv-u-font-size-18@tablet lrv-u-font-size-16 u-line-height-125 u-line-height-normal@mobile-max a-truncate-ellipsis u-max-width-330 u-max-width-230@tablet-only" id="title-of-a-story">
 
 	
 	
 		
 					Running Up That Hill (A Deal With God)		
 	
 </h3>,
 <h3 class="c-title a-no-trucate a-font-primary-bold-s u-letter-spacing-0021 lrv-u-font-size-18@tablet lrv-u-font-size-16 u-line-height-125 u-line-height-normal@mobile-max a-truncate-ellipsis u-max-width-330 u-max-width-230@tablet-only" id="title-of-a-story">
 
 	
 	
 		
 					First Class		
 	
 </h3>,
 <h3 class="c-title a-no-trucate a-font-primary-bold-s u-letter-sp

In [53]:
# looping and getting the text only
for title in soup.find_all('h3', attrs={'class': 'c-title a-no-trucate a-font-primary-bold-s u-letter-spacing-0021 lrv-u-font-size-18@tablet lrv-u-font-size-16 u-line-height-125 u-line-height-normal@mobile-max a-truncate-ellipsis u-max-width-330 u-max-width-230@tablet-only'}):
    print(title.get_text())



	
	
		
					As It Was		
	



	
	
		
					Running Up That Hill (A Deal With God)		
	



	
	
		
					First Class		
	



	
	
		
					Wait For U		
	



	
	
		
					Me Porto Bonito		
	



	
	
		
					Break My Soul		
	



	
	
		
					Heat Waves		
	



	
	
		
					Late Night Talking		
	



	
	
		
					Jimmy Cooks		
	



	
	
		
					Big Energy		
	



	
	
		
					I Like You (A Happier Song)		
	



	
	
		
					Wasted On You		
	



	
	
		
					Bad Habit		
	



	
	
		
					Titi Me Pregunto		
	



	
	
		
					Sunroof		
	



	
	
		
					The Kind Of Love We Make		
	



	
	
		
					Stay		
	



	
	
		
					Glimpse Of Us		
	



	
	
		
					Numb Little Bug		
	



	
	
		
					Ghost		
	



	
	
		
					Get Into It (Yuh)		
	



	
	
		
					You Proof		
	



	
	
		
					Moscow Mule		
	



	
	
		
					Bad Habits		
	



	
	
		
					She Had Me At Heads Carolina		
	



	
	
		
					I Ain't Worried		
	



	
	
		
					Shivers		
	



	
	
		
					Cold Heart (PNAU Remix)		
	



	
	
		
					Boyfriend		
	



	
	
		
				

In [54]:
# creating a list where we append the titles, titles have specific tag and specific attribute
songs = []
for title in soup.find_all('h3', attrs={'class': 'c-title a-no-trucate a-font-primary-bold-s u-letter-spacing-0021 lrv-u-font-size-18@tablet lrv-u-font-size-16 u-line-height-125 u-line-height-normal@mobile-max a-truncate-ellipsis u-max-width-330 u-max-width-230@tablet-only'}):
    songs.append(title.get_text().strip())  # strip() removes leading and trailing whitespace
# getting the songs (the No. 1 song, 'About Damn Time', is not matched because its <h3> has a different class string)
songs

['As It Was',
 'Running Up That Hill (A Deal With God)',
 'First Class',
 'Wait For U',
 'Me Porto Bonito',
 'Break My Soul',
 'Heat Waves',
 'Late Night Talking',
 'Jimmy Cooks',
 'Big Energy',
 'I Like You (A Happier Song)',
 'Wasted On You',
 'Bad Habit',
 'Titi Me Pregunto',
 'Sunroof',
 'The Kind Of Love We Make',
 'Stay',
 'Glimpse Of Us',
 'Numb Little Bug',
 'Ghost',
 'Get Into It (Yuh)',
 'You Proof',
 'Moscow Mule',
 'Bad Habits',
 'She Had Me At Heads Carolina',
 "I Ain't Worried",
 'Shivers',
 'Cold Heart (PNAU Remix)',
 'Boyfriend',
 'Vegas',
 'Sticky',
 'Something In The Orange',
 'In A Minute',
 'Fall In Love',
 'Hot Shit',
 'Like I Love Country Music',
 'Enemy',
 'Damn Strait',
 'Woman',
 'Provenza',
 'Rock And A Hard Place',
 'Super Gremlin',
 'Thats What I Want',
 'Efecto',
 'Take My Name',
 'Thousand Miles',
 'What Happened To Virgil',
 'Sweetest Pie',
 'Master Of Puppets',
 'Left And Right',
 'Sleazy Flow',
 '2step',
 '5 Foot 9',
 'Cooped Up',
 'Despues de La Playa'
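
Matching on the full class string is exact, which is why an entry whose `<h3>` carries slightly different classes is skipped. The sketch below reproduces that behaviour on simplified, made-up markup (the `size-23` and `size-18` class names are stand-ins, not Billboard's real classes) and shows that matching on a single shared class catches every heading.

```python
from bs4 import BeautifulSoup

# Simplified stand-in markup: the first heading has a different class string.
html = """
<ul>
  <li><h3 class="c-title size-23">About Damn Time</h3></li>
  <li><h3 class="c-title size-18">As It Was</h3></li>
  <li><h3 class="c-title size-18">First Class</h3></li>
</ul>
"""
demo = BeautifulSoup(html, "html.parser")

# A multi-class string is compared against the exact value of the class attribute.
exact = [h.get_text(strip=True) for h in demo.find_all("h3", attrs={"class": "c-title size-18"})]

# A single class name matches any tag that has that class among its classes.
by_token = [h.get_text(strip=True) for h in demo.find_all("h3", class_="c-title")]

print(exact)     # ['As It Was', 'First Class']
print(by_token)  # ['About Damn Time', 'As It Was', 'First Class']
```

On a real page a single class such as `c-title` may also match unrelated headings, so it usually needs to be combined with a parent selector.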

#### second way to get the songs

In [55]:
# Let's find the songs with a CSS selector instead
soup.select("li.o-chart-results-list__item > h3")

[<h3 class="c-title a-no-trucate a-font-primary-bold-s u-letter-spacing-0021 u-font-size-23@tablet lrv-u-font-size-16 u-line-height-125 u-line-height-normal@mobile-max a-truncate-ellipsis u-max-width-245 u-max-width-230@tablet-only u-letter-spacing-0028@tablet" id="title-of-a-story">
 
 	
 	
 		
 					About Damn Time		
 	
 </h3>,
 <h3 class="c-title a-no-trucate a-font-primary-bold-s u-letter-spacing-0021 lrv-u-font-size-18@tablet lrv-u-font-size-16 u-line-height-125 u-line-height-normal@mobile-max a-truncate-ellipsis u-max-width-330 u-max-width-230@tablet-only" id="title-of-a-story">
 
 	
 	
 		
 					As It Was		
 	
 </h3>,
 <h3 class="c-title a-no-trucate a-font-primary-bold-s u-letter-spacing-0021 lrv-u-font-size-18@tablet lrv-u-font-size-16 u-line-height-125 u-line-height-normal@mobile-max a-truncate-ellipsis u-max-width-330 u-max-width-230@tablet-only" id="title-of-a-story">
 
 	
 	
 		
 					Running Up That Hill (A Deal With God)		
 	
 </h3>,
 <h3 class="c-title a-no-trucate a-fo

In [56]:
# looping and getting the text only
for title in soup.select("li.o-chart-results-list__item > h3"):
    print(title.get_text())



	
	
		
					About Damn Time		
	



	
	
		
					As It Was		
	



	
	
		
					Running Up That Hill (A Deal With God)		
	



	
	
		
					First Class		
	



	
	
		
					Wait For U		
	



	
	
		
					Me Porto Bonito		
	



	
	
		
					Break My Soul		
	



	
	
		
					Heat Waves		
	



	
	
		
					Late Night Talking		
	



	
	
		
					Jimmy Cooks		
	



	
	
		
					Big Energy		
	



	
	
		
					I Like You (A Happier Song)		
	



	
	
		
					Wasted On You		
	



	
	
		
					Bad Habit		
	



	
	
		
					Titi Me Pregunto		
	



	
	
		
					Sunroof		
	



	
	
		
					The Kind Of Love We Make		
	



	
	
		
					Stay		
	



	
	
		
					Glimpse Of Us		
	



	
	
		
					Numb Little Bug		
	



	
	
		
					Ghost		
	



	
	
		
					Get Into It (Yuh)		
	



	
	
		
					You Proof		
	



	
	
		
					Moscow Mule		
	



	
	
		
					Bad Habits		
	



	
	
		
					She Had Me At Heads Carolina		
	



	
	
		
					I Ain't Worried		
	



	
	
		
					Shivers		
	



	
	
		
					Cold Heart (PNAU Remix)		
	



	
	
	

In [57]:
# creating a list where we append the titles, titles have specific tag and specific attribute
songs = []
for title in soup.select("li.o-chart-results-list__item > h3"):
    songs.append(title.get_text().strip())  # strip() removes leading and trailing whitespace
# getting the songs
songs

['About Damn Time',
 'As It Was',
 'Running Up That Hill (A Deal With God)',
 'First Class',
 'Wait For U',
 'Me Porto Bonito',
 'Break My Soul',
 'Heat Waves',
 'Late Night Talking',
 'Jimmy Cooks',
 'Big Energy',
 'I Like You (A Happier Song)',
 'Wasted On You',
 'Bad Habit',
 'Titi Me Pregunto',
 'Sunroof',
 'The Kind Of Love We Make',
 'Stay',
 'Glimpse Of Us',
 'Numb Little Bug',
 'Ghost',
 'Get Into It (Yuh)',
 'You Proof',
 'Moscow Mule',
 'Bad Habits',
 'She Had Me At Heads Carolina',
 "I Ain't Worried",
 'Shivers',
 'Cold Heart (PNAU Remix)',
 'Boyfriend',
 'Vegas',
 'Sticky',
 'Something In The Orange',
 'In A Minute',
 'Fall In Love',
 'Hot Shit',
 'Like I Love Country Music',
 'Enemy',
 'Damn Strait',
 'Woman',
 'Provenza',
 'Rock And A Hard Place',
 'Super Gremlin',
 'Thats What I Want',
 'Efecto',
 'Take My Name',
 'Thousand Miles',
 'What Happened To Virgil',
 'Sweetest Pie',
 'Master Of Puppets',
 'Left And Right',
 'Sleazy Flow',
 '2step',
 '5 Foot 9',
 'Cooped Up',
 '
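
The selector used above relies on the CSS child combinator `>`, which only matches direct children. A minimal sketch on made-up markup (only the `o-chart-results-list__item` class name comes from the notebook) contrasts it with the descendant form that uses a space.

```python
from bs4 import BeautifulSoup

# Made-up markup: one direct <h3>, one nested inside a <div>, one in an unrelated <li>.
html = """
<ul>
  <li class="o-chart-results-list__item"><h3>About Damn Time</h3></li>
  <li class="o-chart-results-list__item"><div><h3>Nested heading</h3></div></li>
  <li class="other"><h3>Not a chart row</h3></li>
</ul>
"""
demo = BeautifulSoup(html, "html.parser")

# "A > B": B must be a direct child of A.
direct = [h.get_text(strip=True) for h in demo.select("li.o-chart-results-list__item > h3")]

# "A B": B may be nested at any depth inside A.
nested = [h.get_text(strip=True) for h in demo.select("li.o-chart-results-list__item h3")]

print(direct)  # ['About Damn Time']
print(nested)  # ['About Damn Time', 'Nested heading']
```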

#### artists

In [58]:
# Let's find the artists by recognizing a pattern in their tags
soup.find_all('span', attrs={'class': 'c-label a-no-trucate a-font-primary-s lrv-u-font-size-14@mobile-max u-line-height-normal@mobile-max u-letter-spacing-0021 lrv-u-display-block a-truncate-ellipsis-2line u-max-width-330 u-max-width-230@tablet-only'})

[<span class="c-label a-no-trucate a-font-primary-s lrv-u-font-size-14@mobile-max u-line-height-normal@mobile-max u-letter-spacing-0021 lrv-u-display-block a-truncate-ellipsis-2line u-max-width-330 u-max-width-230@tablet-only">
 	
 	Harry Styles
 </span>,
 <span class="c-label a-no-trucate a-font-primary-s lrv-u-font-size-14@mobile-max u-line-height-normal@mobile-max u-letter-spacing-0021 lrv-u-display-block a-truncate-ellipsis-2line u-max-width-330 u-max-width-230@tablet-only">
 	
 	Kate Bush
 </span>,
 <span class="c-label a-no-trucate a-font-primary-s lrv-u-font-size-14@mobile-max u-line-height-normal@mobile-max u-letter-spacing-0021 lrv-u-display-block a-truncate-ellipsis-2line u-max-width-330 u-max-width-230@tablet-only">
 	
 	Jack Harlow
 </span>,
 <span class="c-label a-no-trucate a-font-primary-s lrv-u-font-size-14@mobile-max u-line-height-normal@mobile-max u-letter-spacing-0021 lrv-u-display-block a-truncate-ellipsis-2line u-max-width-330 u-max-width-230@tablet-only">
 	
 	Fut

In [59]:
# looping and getting the text only
for artist in soup.find_all('span', attrs={'class': 'c-label a-no-trucate a-font-primary-s lrv-u-font-size-14@mobile-max u-line-height-normal@mobile-max u-letter-spacing-0021 lrv-u-display-block a-truncate-ellipsis-2line u-max-width-330 u-max-width-230@tablet-only'}):
    print(artist.get_text())


	
	Harry Styles


	
	Kate Bush


	
	Jack Harlow


	
	Future Featuring Drake & Tems


	
	Bad Bunny & Chencho Corleone


	
	Beyonce


	
	Glass Animals


	
	Harry Styles


	
	Drake Featuring 21 Savage


	
	Latto


	
	Post Malone Featuring Doja Cat


	
	Morgan Wallen


	
	Steve Lacy


	
	Bad Bunny


	
	Nicky Youre & dazy


	
	Luke Combs


	
	The Kid LAROI & Justin Bieber


	
	Joji


	
	Em Beihold


	
	Justin Bieber


	
	Doja Cat


	
	Morgan Wallen


	
	Bad Bunny


	
	Ed Sheeran


	
	Cole Swindell


	
	OneRepublic


	
	Ed Sheeran


	
	Elton John & Dua Lipa


	
	Dove Cameron


	
	Doja Cat


	
	Drake


	
	Zach Bryan


	
	Lil Baby


	
	Bailey Zimmerman


	
	Cardi B, Ye & Lil Durk


	
	Kane Brown


	
	Imagine Dragons X JID


	
	Scotty McCreery


	
	Doja Cat


	
	Karol G


	
	Bailey Zimmerman


	
	Kodak Black


	
	Lil Nas X


	
	Bad Bunny


	
	Parmalee


	
	The Kid LAROI


	
	Lil Durk Featuring Gunna


	
	Megan Thee Stallion & Dua Lipa


	
	Metallica


	
	Charlie Puth Featuring Jung Kook


	
	S

In [60]:
# creating a list where we append the artists, artists have specific tag and specific attribute
artists = []
for artist in soup.find_all('span', attrs={'class': 'c-label a-no-trucate a-font-primary-s lrv-u-font-size-14@mobile-max u-line-height-normal@mobile-max u-letter-spacing-0021 lrv-u-display-block a-truncate-ellipsis-2line u-max-width-330 u-max-width-230@tablet-only'}):
    artists.append(artist.get_text().strip()) 
# getting the artists (the output shows that the No. 1 artist, Lizzo, is not matched by this class string)
artists

['Harry Styles',
 'Kate Bush',
 'Jack Harlow',
 'Future Featuring Drake & Tems',
 'Bad Bunny & Chencho Corleone',
 'Beyonce',
 'Glass Animals',
 'Harry Styles',
 'Drake Featuring 21 Savage',
 'Latto',
 'Post Malone Featuring Doja Cat',
 'Morgan Wallen',
 'Steve Lacy',
 'Bad Bunny',
 'Nicky Youre & dazy',
 'Luke Combs',
 'The Kid LAROI & Justin Bieber',
 'Joji',
 'Em Beihold',
 'Justin Bieber',
 'Doja Cat',
 'Morgan Wallen',
 'Bad Bunny',
 'Ed Sheeran',
 'Cole Swindell',
 'OneRepublic',
 'Ed Sheeran',
 'Elton John & Dua Lipa',
 'Dove Cameron',
 'Doja Cat',
 'Drake',
 'Zach Bryan',
 'Lil Baby',
 'Bailey Zimmerman',
 'Cardi B, Ye & Lil Durk',
 'Kane Brown',
 'Imagine Dragons X JID',
 'Scotty McCreery',
 'Doja Cat',
 'Karol G',
 'Bailey Zimmerman',
 'Kodak Black',
 'Lil Nas X',
 'Bad Bunny',
 'Parmalee',
 'The Kid LAROI',
 'Lil Durk Featuring Gunna',
 'Megan Thee Stallion & Dua Lipa',
 'Metallica',
 'Charlie Puth Featuring Jung Kook',
 'SleazyWorld Go Featuring Lil Baby',
 'Ed Sheeran Feat

#### second way to get the artists

In [61]:
for artist in soup.select("li.o-chart-results-list__item > span.c-label"):
    print(artist.get_text())
# we could also use
# for artist in soup.select("li.o-chart-results-list__item > span.c-label"):
#   print(artist.text.strip())


	
	1


	
	Lizzo


	
	2


	
	1


	
	14


	
	2


	
	1


	
	14


	
	2


	
	Harry Styles


	
	1


	
	1


	
	16


	
	1


	
	1


	
	16


	
	3


	
	Kate Bush


	
	4


	
	3


	
	28


	
	4


	
	3


	
	28


	
	4


	
	Jack Harlow


	
	3


	
	1


	
	15


	
	3


	
	1


	
	15


	
	5


	
	Future Featuring Drake & Tems


	
	5


	
	1


	
	12


	
	5


	
	1


	
	12


	
	6


	
	Bad Bunny & Chencho Corleone


	
	6


	
	6


	
	11


	
	6


	
	6


	
	11


	
	7


	
	Beyonce


	
	9


	
	7


	
	5


	
	9


	
	7


	
	5


	
	8


	
	Glass Animals


	
	7


	
	1


	
	79


	
	7


	
	1


	
	79


	
	9


	
	Harry Styles


	
	11


	
	4


	
	9


	
	11


	
	4


	
	9


	
	10


	
	Drake Featuring 21 Savage


	
	8


	
	1


	
	5


	
	8


	
	1


	
	5


	
	11


	
	Latto


	
	10


	
	3


	
	39


	
	10


	
	3


	
	39


	
	12


	
	Post Malone Featuring Doja Cat


	
	14


	
	9


	
	7


	
	14


	
	9


	
	7


	
	13


	
	Morgan Wallen


	
	12


	
	9


	
	36


	
	12


	
	9


	
	36


	
	14


	
	Steve Lacy


	
	50


	
	14


	
	3


	
	50




In [62]:
# creating the list 'artists' and using strip() to remove leading and trailing whitespace
# note: this selector also matches the chart positions and other labels, not only the artist names
artists = []
for artist in soup.select("li.o-chart-results-list__item > span.c-label"):
    artists.append(artist.get_text().strip()) 
# getting the artists
artists

['1',
 'Lizzo',
 '2',
 '1',
 '14',
 '2',
 '1',
 '14',
 '2',
 'Harry Styles',
 '1',
 '1',
 '16',
 '1',
 '1',
 '16',
 '3',
 'Kate Bush',
 '4',
 '3',
 '28',
 '4',
 '3',
 '28',
 '4',
 'Jack Harlow',
 '3',
 '1',
 '15',
 '3',
 '1',
 '15',
 '5',
 'Future Featuring Drake & Tems',
 '5',
 '1',
 '12',
 '5',
 '1',
 '12',
 '6',
 'Bad Bunny & Chencho Corleone',
 '6',
 '6',
 '11',
 '6',
 '6',
 '11',
 '7',
 'Beyonce',
 '9',
 '7',
 '5',
 '9',
 '7',
 '5',
 '8',
 'Glass Animals',
 '7',
 '1',
 '79',
 '7',
 '1',
 '79',
 '9',
 'Harry Styles',
 '11',
 '4',
 '9',
 '11',
 '4',
 '9',
 '10',
 'Drake Featuring 21 Savage',
 '8',
 '1',
 '5',
 '8',
 '1',
 '5',
 '11',
 'Latto',
 '10',
 '3',
 '39',
 '10',
 '3',
 '39',
 '12',
 'Post Malone Featuring Doja Cat',
 '14',
 '9',
 '7',
 '14',
 '9',
 '7',
 '13',
 'Morgan Wallen',
 '12',
 '9',
 '36',
 '12',
 '9',
 '36',
 '14',
 'Steve Lacy',
 '50',
 '14',
 '3',
 '50',
 '14',
 '3',
 '15',
 'Bad Bunny',
 '13',
 '5',
 '11',
 '13',
 '5',
 '11',
 '16',
 'Nicky Youre & dazy',
 '17',


In [63]:
# keeping the artist names only
# we drop the purely numeric entries and the known non-name markers in a single pass,
# which avoids removing items from the list while iterating over it
elements_to_remove = ['-', 'NEW', 'RE-\nENTRY']

artists = [label for label in artists if not label.isdigit() and label not in elements_to_remove]
# we could also slice the unfiltered list by position
# artists[1::8]
# because, in the rows shown in the output above, the artist name sits at index 1 of every group of 8 labels
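
The two clean-up strategies mentioned in the cell above can be compared on a small sample. This is a sketch only: the 16 labels are the first two groups copied from the earlier output, where each chart row contributes 8 labels and the artist sits at index 1 of its group; `keep_names` is a hypothetical helper name.

```python
# First two groups of labels, copied from the output shown earlier.
labels = [
    "1", "Lizzo", "2", "1", "14", "2", "1", "14",
    "2", "Harry Styles", "1", "1", "16", "1", "1", "16",
]

# Markers that the notebook treats as non-names.
NON_NAME_MARKERS = {"-", "NEW", "RE-\nENTRY"}


def keep_names(values):
    """Return the entries that are neither purely numeric nor a known marker."""
    return [v for v in values if not v.isdigit() and v not in NON_NAME_MARKERS]


# Two extra markers are appended here only to exercise the filter.
by_filter = keep_names(labels + ["-", "NEW"])

# Positional alternative: take index 1, 9, 17, ... of the unfiltered list.
by_position = labels[1::8]

print(by_filter)    # ['Lizzo', 'Harry Styles']
print(by_position)  # ['Lizzo', 'Harry Styles']
```

Each approach has a weak spot: the filter would drop an artist whose name consists only of digits, and the slice assumes every chart row yields exactly 8 labels.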

In [64]:
# getting only the names
artists

['Lizzo',
 'Harry Styles',
 'Kate Bush',
 'Jack Harlow',
 'Future Featuring Drake & Tems',
 'Bad Bunny & Chencho Corleone',
 'Beyonce',
 'Glass Animals',
 'Harry Styles',
 'Drake Featuring 21 Savage',
 'Latto',
 'Post Malone Featuring Doja Cat',
 'Morgan Wallen',
 'Steve Lacy',
 'Bad Bunny',
 'Nicky Youre & dazy',
 'Luke Combs',
 'The Kid LAROI & Justin Bieber',
 'Joji',
 'Em Beihold',
 'Justin Bieber',
 'Doja Cat',
 'Morgan Wallen',
 'Bad Bunny',
 'Ed Sheeran',
 'Cole Swindell',
 'OneRepublic',
 'Ed Sheeran',
 'Elton John & Dua Lipa',
 'Dove Cameron',
 'Doja Cat',
 'Drake',
 'Zach Bryan',
 'Lil Baby',
 'Bailey Zimmerman',
 'Cardi B, Ye & Lil Durk',
 'Kane Brown',
 'Imagine Dragons X JID',
 'Scotty McCreery',
 'Doja Cat',
 'Karol G',
 'Bailey Zimmerman',
 'Kodak Black',
 'Lil Nas X',
 'Bad Bunny',
 'Parmalee',
 'The Kid LAROI',
 'Lil Durk Featuring Gunna',
 'Megan Thee Stallion & Dua Lipa',
 'Metallica',
 'Charlie Puth Featuring Jung Kook',
 'SleazyWorld Go Featuring Lil Baby',
 'Ed Sh
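
Collecting songs and artists in two separate passes works only as long as both lists stay aligned. One alternative, sketched here on made-up markup, is to start from each title and look up the artist next to it. The sketch assumes that the artist `<span>` is a sibling following the title `<h3>` inside the same list item; the notebook does not confirm this layout, and the `item` class is a simplified stand-in.

```python
from bs4 import BeautifulSoup

# Made-up rows: each row has a rank label, a title with its artist, and a statistic.
html = """
<ul class="row">
  <li class="item"><span class="c-label">1</span></li>
  <li class="item"><h3>About Damn Time</h3><span class="c-label">Lizzo</span></li>
  <li class="item"><span class="c-label">2</span></li>
</ul>
<ul class="row">
  <li class="item"><span class="c-label">2</span></li>
  <li class="item"><h3>As It Was</h3><span class="c-label">Harry Styles</span></li>
  <li class="item"><span class="c-label">1</span></li>
</ul>
"""
demo = BeautifulSoup(html, "html.parser")

pairs = []
for title in demo.select("li.item > h3"):
    # Look for the artist label among the siblings that follow this title.
    artist = title.find_next_sibling("span", class_="c-label")
    pairs.append((
        title.get_text(strip=True),
        artist.get_text(strip=True) if artist else None,
    ))

print(pairs)  # [('About Damn Time', 'Lizzo'), ('As It Was', 'Harry Styles')]
```

Because each artist is read relative to its own title, the numeric labels never enter the result and no clean-up step is needed.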

#### Scrape the current top 100 songs and their respective artists, and put the information into a pandas dataframe.

In [65]:
# creating a dataframe with the top 100 songs and their respective artists
top100 = pd.DataFrame({"songs": songs, "artists": artists})
top100

|     | songs | artists |
| --- | --- | --- |
| 0 | About Damn Time | Lizzo |
| 1 | As It Was | Harry Styles |
| 2 | Running Up That Hill (A Deal With God) | Kate Bush |
| 3 | First Class | Jack Harlow |
| 4 | Wait For U | Future Featuring Drake & Tems |
| ... | ... | ... |
| 95 | Arson | j-hope |
| 96 | Right On | Lil Baby |
| 97 | Cash In Cash Out | Pharrell Williams Featuring 21 Savage & Tyler,... |
| 98 | La Corriente | Bad Bunny & Tony Dize |
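
`pd.DataFrame` needs both lists to have the same length, so it can help to check that explicitly before building the table. The sketch below uses the first three chart entries from the output above; the explicit length check and the 1-based `rank` index are additions made here for illustration, not part of the notebook.

```python
import pandas as pd

# First three entries, copied from the output above.
songs = ["About Damn Time", "As It Was", "Running Up That Hill (A Deal With God)"]
artists = ["Lizzo", "Harry Styles", "Kate Bush"]

# Fail early with a clear message if the two scrapes got out of step.
if len(songs) != len(artists):
    raise ValueError(f"{len(songs)} songs but {len(artists)} artists were scraped")

chart = pd.DataFrame({"songs": songs, "artists": artists})

# Use the chart position (starting at 1) as the index instead of 0-based row numbers.
chart.index = pd.RangeIndex(start=1, stop=len(chart) + 1, name="rank")

print(chart)
```

With this index, `chart.loc[1]` returns the No. 1 entry, which matches how the chart is read.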
