#### Business goal

- Check the `case_study_gnod.md` file.
- Make sure you've understood the big picture of your project:

  - the goal of the company (`Gnod`),
  - their current product (`Gnoosic`),
  - their strategy, and
  - how your project fits into this context.

#### Instructions

1. Scraping popular songs

Your product will take a song as input from the user and output another song (the recommendation). In most cases, the recommended song should be similar to the input song, but the CTO thinks that if the input song is on the top charts at the moment, the user will prefer a recommendation that is also popular at the moment.

You have to find data on the internet about currently popular songs. Billboard maintains a weekly Top 100 of "hot" songs here: [https://www.billboard.com/charts/hot-100](https://www.billboard.com/charts/hot-100).

It's a good place to start! Scrape the current top 100 songs and their respective artists, and put the information into a pandas dataframe.

2. Expand the project

If you're done, you can try to expand the project on your own. Here are a few suggestions:

- Find other lists of hot songs on the internet and scrape them too: having a bigger pool of songs will be awesome!
- Apply the same logic to other "groups" of songs: the best songs from a decade or from a country / culture / language / genre.
- Wikipedia maintains a large collection of lists of songs: https://en.wikipedia.org/wiki/Lists_of_songs

POSSIBLE MUSIC CHART LINKS TO SCRAPE FOR THE PROJECT: <br>
Kworb: https://kworb.net/spotify/country/global_daily_totals.html <br>
Rolling Stone: https://www.rollingstone.com/music/music-lists/best-songs-2022-list-1234632381/camilo-ft-grupo-firme-alaska-1234632703/ <br>
Billboard: https://www.billboard.com/charts/hot-100/ <br>
YouTube charts: https://charts.youtube.com/?hl=es

In [1]:
from bs4 import BeautifulSoup
import pandas as pd
import requests
import re
#import math

### 1. Scraping popular songs

In [2]:
#pd.read_html('https://www.billboard.com/charts/hot-100')

In [3]:
url = 'https://www.billboard.com/charts/hot-100'
response = requests.get(url)

response.status_code

200

200: Everything went okay and the result has been returned (if any).
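A 200 is the happy path, but scraping targets can also answer with 403, 429 or 5xx codes. The following is a minimal sketch of a fetch helper that fails loudly in those cases. The function name `fetch_html`, the 10-second timeout and the `User-Agent` string are illustrative assumptions, not part of the original notebook.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the raw HTML bytes of `url`, raising on HTTP errors.

    `fetch_html`, the timeout and the User-Agent value are hypothetical
    choices made for this sketch.
    """
    headers = {"User-Agent": "Mozilla/5.0 (course project scraper)"}
    response = requests.get(url, headers=headers, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx responses
    response.raise_for_status()
    return response.content
```

With this helper, `html = fetch_html('https://www.billboard.com/charts/hot-100')` could stand in for the separate `requests.get` call and manual status check.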

In [4]:
html = response.content

The song title is found under the tag `h3` with `class="c-title a-no-trucate a-font-primary-bold-s u-letter-spacing-0021 u-font-size-23@tablet lrv-u-font-size-16 u-line-height-125 u-line-height-normal@mobile-max a-truncate-ellipsis u-max-width-245 u-max-width-230@tablet-only u-letter-spacing-0028@tablet"`

The artist is found under the tag `span` with `class="c-label a-no-trucate a-font-primary-s lrv-u-font-size-14@mobile-max u-line-height-normal@mobile-max u-letter-spacing-0021 lrv-u-display-block a-truncate-ellipsis-2line u-max-width-330 u-max-width-230@tablet-only u-font-size-20@tablet"`

In [14]:
soup = BeautifulSoup(html, 'html.parser')
#soup

In [6]:
# Defining the class strings (cls1: song title of the first entry, cls2: its artist)
cls1 = "c-title a-no-trucate a-font-primary-bold-s u-letter-spacing-0021 u-font-size-23@tablet " \
      "lrv-u-font-size-16 u-line-height-125 u-line-height-normal@mobile-max a-truncate-ellipsis " \
      "u-max-width-245 u-max-width-230@tablet-only u-letter-spacing-0028@tablet"
cls2 = "c-label a-no-trucate a-font-primary-s lrv-u-font-size-14@mobile-max u-line-height-normal@mobile-max " \
      "u-letter-spacing-0021 lrv-u-display-block a-truncate-ellipsis-2line u-max-width-330 " \
      "u-max-width-230@tablet-only u-font-size-20@tablet"

# Defining the tags
song_tags = soup.find_all('h3', attrs={'class': cls1}) 

artist_tags = soup.find_all('span', attrs={'class': cls2})

# Extracting song and artist pairs
for song_tag, artist_tag in zip(song_tags, artist_tags):
    print(f'Song: {song_tag.text.strip()}, Artist: {artist_tag.text.strip()}')

Song: Last Night, Artist: Morgan Wallen


Only the first pair is returned, so the number-one entry apparently uses different class strings than the rest of the chart.
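Passing the full class string to `find_all` only matches elements whose `class` attribute is exactly that string, which is why entries with slightly different sizing classes are skipped. A minimal sketch of an alternative is to match on one shared class and pair each title with the next label. The sample HTML below is invented for illustration and much simpler than the real Billboard markup, where `c-title` and `c-label` may also appear on unrelated elements, so the counts should be checked before relying on this.

```python
from bs4 import BeautifulSoup

# Invented sample markup: two entries whose class lists differ,
# mimicking the first entry vs. the rest of the chart.
sample_html = """
<ul>
  <li><h3 class="c-title u-font-size-23@tablet">Song A</h3>
      <span class="c-label u-font-size-20@tablet">Artist A</span></li>
  <li><h3 class="c-title lrv-u-font-size-18@tablet">Song B</h3>
      <span class="c-label">Artist B</span></li>
</ul>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

pairs = []
# class_="c-title" matches any <h3> that has c-title among its classes
for title in sample_soup.find_all("h3", class_="c-title"):
    # find_next() returns the first matching element after the title
    artist = title.find_next("span", class_="c-label")
    pairs.append((
        title.get_text(strip=True),
        artist.get_text(strip=True) if artist else None,
    ))

print(pairs)
```

Pairing each title with its own following label also avoids relying on two separately collected lists staying aligned.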

In [7]:
cls3 = "c-title a-no-trucate a-font-primary-bold-s u-letter-spacing-0021 lrv-u-font-size-18@tablet " \
        "lrv-u-font-size-16 u-line-height-125 u-line-height-normal@mobile-max a-truncate-ellipsis " \
        "u-max-width-330 u-max-width-230@tablet-only"
cls4 = "c-label a-no-trucate a-font-primary-s lrv-u-font-size-14@mobile-max u-line-height-normal@mobile-max " \
        "u-letter-spacing-0021 lrv-u-display-block a-truncate-ellipsis-2line u-max-width-330 " \
        "u-max-width-230@tablet-only"

song_tags = soup.find_all('h3', attrs={'class': cls3}) 

artist_tags = soup.find_all('span', attrs={'class': cls4})

for song_tag, artist_tag in zip(song_tags, artist_tags):
    print(f'Song: {song_tag.text.strip()}, Artist: {artist_tag.text.strip()}')

Song: Fast Car, Artist: Luke Combs
Song: Calm Down, Artist: Rema & Selena Gomez
Song: Flowers, Artist: Miley Cyrus
Song: All My Life, Artist: Lil Durk Featuring J. Cole
Song: Favorite Song, Artist: Toosii
Song: Karma, Artist: Taylor Swift Featuring Ice Spice
Song: Kill Bill, Artist: SZA
Song: Creepin', Artist: Metro Boomin, The Weeknd & 21 Savage
Song: Ella Baila Sola, Artist: Eslabon Armado X Peso Pluma
Song: Sure Thing, Artist: Miguel
Song: Anti-Hero, Artist: Taylor Swift
Song: Snooze, Artist: SZA
Song: Something In The Orange, Artist: Zach Bryan
Song: Die For You, Artist: The Weeknd & Ariana Grande
Song: Fukumean, Artist: Gunna
Song: Need A Favor, Artist: Jelly Roll
Song: Cruel Summer, Artist: Taylor Swift
Song: La Bebe, Artist: Yng Lvcas x Peso Pluma
Song: You Proof, Artist: Morgan Wallen
Song: Un x100to, Artist: Grupo Frontera X Bad Bunny
Song: Thinkin' Bout Me, Artist: Morgan Wallen
Song: Rock And A Hard Place, Artist: Bailey Zimmerman
Song: Cupid, Artist: Fifty Fifty
Song: Searc

In [8]:
# Initializing an empty list
songs = []
artists = []

In [9]:
# Adding song/artist pairs to list
for song_tag, artist_tag in zip(song_tags, artist_tags):
    songs.append(song_tag.text.strip())
    artists.append(artist_tag.text.strip())

In [10]:
# Turning list into dataframe
top_songs = pd.DataFrame({'Song':songs, 'Artist':artists})
top_songs

Unnamed: 0,Song,Artist
0,Fast Car,Luke Combs
1,Calm Down,Rema & Selena Gomez
2,Flowers,Miley Cyrus
3,All My Life,Lil Durk Featuring J. Cole
4,Favorite Song,Toosii
...,...,...
94,"Angel, Pt. 1","Kodak Black, NLE Choppa, Jimin, JVKE & Muni Long"
95,Girl In Mine,Parmalee
96,Moonlight,Kali Uchis
97,Classy 101,Feid x Young Miko


#### I am still missing the very first song and artist.

In [12]:
# Extracting song and artist number 1
song1 = soup.find('h3', attrs={'class': cls1}).text.strip()
artist1 = soup.find('span', attrs={'class': cls2}).text.strip()

# Inserting them at first position in each list
songs.insert(0, song1)
artists.insert(0, artist1)

In [13]:
top_songs = pd.DataFrame({'Song':songs, 'Artist':artists})
top_songs

Unnamed: 0,Song,Artist
0,Last Night,Morgan Wallen
1,Fast Car,Luke Combs
2,Calm Down,Rema & Selena Gomez
3,Flowers,Miley Cyrus
4,All My Life,Lil Durk Featuring J. Cole
...,...,...
95,"Angel, Pt. 1","Kodak Black, NLE Choppa, Jimin, JVKE & Muni Long"
96,Girl In Mine,Parmalee
97,Moonlight,Kali Uchis
98,Classy 101,Feid x Young Miko
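Because `zip` silently stops at the shorter of the two lists, a length mismatch between songs and artists would go unnoticed. A minimal sketch of a guarded constructor follows. The helper name `build_chart_frame`, the `expected_rows` parameter and the 1-based `Rank` index are assumptions for illustration, and the rank is only meaningful if the lists are in chart order.

```python
import pandas as pd


def build_chart_frame(songs, artists, expected_rows=None):
    """Build a Song/Artist dataframe with a 1-based Rank index.

    Hypothetical helper: raises if the lists are misaligned and prints a
    warning if the row count differs from `expected_rows`.
    """
    if len(songs) != len(artists):
        raise ValueError(
            f"{len(songs)} songs but {len(artists)} artists: lists are misaligned"
        )
    frame = pd.DataFrame({"Song": songs, "Artist": artists})
    # Use chart position (1, 2, 3, ...) instead of the default 0-based index
    frame.index = pd.RangeIndex(start=1, stop=len(frame) + 1, name="Rank")
    if expected_rows is not None and len(frame) != expected_rows:
        print(f"Warning: expected {expected_rows} rows, got {len(frame)}")
    return frame


# Small demo with the first two chart entries shown above
demo = build_chart_frame(["Last Night", "Fast Car"], ["Morgan Wallen", "Luke Combs"])
print(demo)
```

For the Hot 100, calling `build_chart_frame(songs, artists, expected_rows=100)` would flag a chart that came back incomplete.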


### 2. Expand the project

#### Scraping a larger number of songs.

In [26]:
spotify = requests.get('https://kworb.net/spotify/country/global_daily_totals.html')
print('Spotify:', spotify.status_code)
rollingstone = requests.get('https://www.rollingstone.com/music/music-lists/best-songs-2022-list-1234632381/camilo-ft-grupo-firme-alaska-1234632703/')
print('Rolling Stone:', rollingstone.status_code)
wikipedia = requests.get('https://en.wikipedia.org/wiki/Lists_of_songs')
print('Wikipedia:', wikipedia.status_code)

Spotify: 200
Rolling Stone: 200
Wikipedia: 200


#### Choosing Spotify Global from kworb.net to scrape a larger number of songs. Here both the artist name and the song title are within `a` tags, as separate elements inside the same parent `td`.

In [52]:
def scraping_songs_artists(music):
    """Collect song and artist names from table cells that hold <a> tags."""
    songs = []
    artists = []

    for item in music:
        links = item.find_all('a')
        # The first link is the artist, the second is the song title;
        # 'None.' marks a link that is missing from the cell.
        artist = links[0].get_text() if len(links) > 0 else 'None.'
        song = links[1].get_text() if len(links) > 1 else 'None.'

        songs.append(song)
        artists.append(artist)

    return {'Songs': songs, 'Artists': artists}

response = requests.get('https://kworb.net/spotify/country/global_daily_totals.html')
soup = BeautifulSoup(response.content, 'html.parser')

# Finding all songs on the page
songs = soup.find_all('td', {'class': 'text mp'})

# Calling the function to scrape song and artist details
song_artist_dict = scraping_songs_artists(songs)

# Printing the scraped data
print(song_artist_dict)



In [51]:
spotify_global = pd.DataFrame(song_artist_dict)
spotify_global

Unnamed: 0,Songs,Artists
0,Blinding Lights,The Weeknd
1,Shape of You,Ed Sheeran
2,Someone You Loved,Lewis Capaldi
3,Sunflower - Spider-Man: Into the Spider-Verse,Post Malone
4,Stay,The Kid LAROI
...,...,...
9324,Adrenalina,Wisin
9325,Work,Iggy Azalea
9326,Främling,Orup
9327,Nina,Ed Sheeran
