# Case: GNOD
You have been hired as a Data Analyst for Gnod.

Gnod is a site that provides recommendations for music, art, literature and products based on collaborative filtering algorithms. Their flagship product is the music recommender, which you can try at www.gnoosic.com. The site asks users to input 3 bands they like, and computes similarity scores with the rest of the users. Then, they recommend to the user bands that users with similar tastes have picked.

Gnod is a small company, and its only revenue stream so far is ads on the site. In the future, they would like to explore partnership options with music apps (such as Deezer, SoundCloud or even Apple Music and Spotify). But for that to be possible, they need to expand and improve their recommendations.

That’s precisely where you come in. They have hired you as a Data Analyst, and they expect you to bring a mix of technical expertise and business mindset to the table.

Jane, CTO of Gnod, has sent you an email assigning you your first task.

### Lab | Web Scraping Single Page

In [1]:
# get the Top 100
# https://www.popvortex.com/music/charts/top-100-songs.php
# https://playback.fm/charts

##### Lab 1
Start with Step 1:
Goal: List of 100 songs

- Scrape a list of **top 100 songs**
- Elements: Artist and title of the song

##### Lab 2
- In a different notebook, get more songs
- Focus on practicing web scraping (try at least 2)

##### Top 100 2023

In [2]:
# 1. import libraries
from bs4 import BeautifulSoup
import requests
import pandas as pd

In [3]:
# 2. find url and store it in a variable
url = "https://www.popvortex.com/music/charts/top-100-songs.php"

In [4]:
# 3. download html with a get request
response = requests.get(url) #gets the html and puts in response
#response.status_code
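
Before parsing, it can help to confirm that the request actually succeeded instead of leaving the status check commented out. A minimal sketch, assuming a hypothetical helper named `fetch_html` and an arbitrary 10-second timeout (neither is part of the original notebook):

```python
import requests


def fetch_html(url, timeout=10):
    """Download a page and return its raw bytes.

    `fetch_html` and the 10-second timeout are illustrative assumptions.
    """
    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError on 4xx/5xx responses
    response.raise_for_status()
    return response.content
```

The returned bytes could then be passed to `BeautifulSoup(..., "html.parser")` in the same way as `response.content`.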

In [5]:
# 4.1. parse html (create the 'soup')
soup = BeautifulSoup(response.content, "html.parser")

In [6]:
# 4.2. check that the html code looks like it should
#soup.prettify()

In [7]:
# 5. retrieve/extract the desired info 
    # 1. Name of the artist
    # 2. Title of the song
    
soup.select("#chart-position-1 > div.chart-content.col-xs-12.col-sm-8 > p")

[<p class="title-artist"><cite class="title">Separate Ways (Worlds Apart) [feat. Lzzy Hale]</cite><em class="artist">Daughtry</em></p>]

In [8]:
soup.select("p.title-artist")

[<p class="title-artist"><cite class="title">Separate Ways (Worlds Apart) [feat. Lzzy Hale]</cite><em class="artist">Daughtry</em></p>,
 <p class="title-artist"><cite class="title">Anti-Hero</cite><em class="artist">Taylor Swift</em></p>,
 <p class="title-artist"><cite class="title">Unholy</cite><em class="artist">Sam Smith &amp; Kim Petras</em></p>,
 <p class="title-artist"><cite class="title">Heart Like A Truck</cite><em class="artist">Lainey Wilson</em></p>,
 <p class="title-artist"><cite class="title">Temperature</cite><em class="artist">Sean Paul</em></p>,
 <p class="title-artist"><cite class="title">Son Of A Sinner</cite><em class="artist">Jelly Roll</em></p>,
 <p class="title-artist"><cite class="title">Made You Look</cite><em class="artist">Meghan Trainor</em></p>,
 <p class="title-artist"><cite class="title">Lift Me Up (From Black Panther: Wakanda Forever - Music From and Inspired By)</cite><em class="artist">Rihanna</em></p>,
 <p class="title-artist"><cite class="title">Thoug

In [9]:
# just get the title
soup.select("p.title-artist cite")[0].get_text()

'Separate Ways (Worlds Apart) [feat. Lzzy Hale]'

In [10]:
# just get the artist
soup.select("p.title-artist em")[0].get_text()

'Daughtry'

In [11]:
# 6. Create lists to put everything in a table
artist = []
title = []

# 6.1. Define the number of iterations for the loop
num_iter = len(soup.select("p.title-artist"))

nameartist = soup.select("p.title-artist em")
Ctitle = soup.select("p.title-artist cite")

for i in range(num_iter):
    artist.append(nameartist[i].get_text())
    title.append(Ctitle[i].get_text())

print(artist)
print(title)

['Daughtry', 'Taylor Swift', 'Sam Smith & Kim Petras', 'Lainey Wilson', 'Sean Paul', 'Jelly Roll', 'Meghan Trainor', 'Rihanna', 'Morgan Wallen', 'HARDY & Lainey Wilson', 'Lady Gaga', 'Kane Brown & Katelyn Brown', 'OneRepublic', 'Sia', 'David Guetta & Bebe Rexha', 'Zach Bryan', 'Harry Styles', 'Tom MacDonald', 'Bailey Zimmerman', 'Morgan Wallen', 'Coi Leray', 'Bazzi', 'Morgan Wallen', 'Morgan Wallen', 'Rema & Selena Gomez', 'JVKE', 'Luke Grimes', 'Mike Posner', 'Bailey Zimmerman', 'Metro Boomin, The Weeknd & 21 Savage', 'Lady Gaga', 'Beyoncé', 'Shania Twain', 'David Guetta & Bebe Rexha', 'Luke Combs', 'Old Dominion', 'Elton John & Britney Spears', 'Chris Brown', 'SZA', 'Brandon Lake', 'Luke Combs', 'Fuerza Regida & Grupo Frontera', 'Sleep Token', 'Lizzo', 'Bad Omens', 'MONSTA X', 'Miles Guo', 'Three Dog Night', 'Carin Leon & Grupo Frontera', 'Chris Stapleton', 'Zach Bryan', 'Lily Meola', 'The Weeknd', 'P!nk', 'Jordan Davis', 'Jelly Roll', 'Nate Smith', 'Lil Uzi Vert', 'Adam Lambert', 'L
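
The loop above pairs two separately selected lists by index, which only lines up if every entry has both a `cite` and an `em`. A hedged alternative sketch reads both fields from the same `p.title-artist` element. The inline HTML is a two-row sample copied from the output above and stands in for the downloaded page; `sample_html`, `sample_soup` and `rows` are illustrative names.

```python
from bs4 import BeautifulSoup

# Two entries in the same shape as the chart markup shown above
sample_html = """
<p class="title-artist"><cite class="title">Anti-Hero</cite><em class="artist">Taylor Swift</em></p>
<p class="title-artist"><cite class="title">Unholy</cite><em class="artist">Sam Smith &amp; Kim Petras</em></p>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")

rows = []
for entry in sample_soup.select("p.title-artist"):
    cite = entry.select_one("cite")
    em = entry.select_one("em")
    # Skip entries missing either field so artist and title never drift apart
    if cite is None or em is None:
        continue
    rows.append({"artist": em.get_text(), "title": cite.get_text()})

print(rows)
```

A list of dicts like `rows` can be passed directly to `pd.DataFrame`.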

In [12]:
# 7. Put everything in a dataframe

top100 = pd.DataFrame({"artist": artist,
                      "title": title
                      })

In [13]:
top100.tail(60)

Unnamed: 0,artist,title
39,Brandon Lake,Gratitude
40,Luke Combs,"Going, Going, Gone"
41,Fuerza Regida & Grupo Frontera,Bebe Dame
42,Sleep Token,The Summoning
43,Lizzo,About Damn Time
44,Bad Omens,Just Pretend
45,MONSTA X,Beautiful Liar
46,Miles Guo,Papa (Version A)
47,Three Dog Night,Never Been to Spain
48,Carin Leon & Grupo Frontera,Que Vuelvas


**Comment**: If there is more than one artist, should the column be split?
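
One possible way to split multi-artist entries, sketched under the assumption that `&` and `,` only ever act as separators. That assumption fails for band names that contain those characters, so treat this as a starting point. The sample rows come from the scraped output above; `SEPARATOR`, `sample` and `artist_list` are illustrative names.

```python
import re

import pandas as pd

# Two rows taken from the scraped output above
sample = pd.DataFrame({
    "artist": ["Taylor Swift", "Sam Smith & Kim Petras"],
    "title": ["Anti-Hero", "Unholy"],
})

# Split on "&" or "," together with any surrounding spaces
SEPARATOR = re.compile(r"\s*[&,]\s*")

sample["artist_list"] = sample["artist"].apply(SEPARATOR.split)
# One row per individual artist; the title is repeated for each of them
exploded = sample.explode("artist_list").reset_index(drop=True)
print(exploded[["artist_list", "title"]])
```

Keeping the original `artist` column alongside `artist_list` makes it possible to match on either the full credit or a single artist later.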

##### Top 100 2021

In [14]:
# 1. find url and store it in a variable
url = "https://playback.fm/charts/top-100-songs/2021"

# 2. download html with a get request
response = requests.get(url) #gets the html and puts in response
#response.status_code

In [15]:
# 3.1. parse html (create the 'soup')
soup2 = BeautifulSoup(response.content, "html.parser")

In [16]:
# 3.2. check that the html code looks like it should
#soup2.prettify()

In [17]:
# 4. retrieve/extract the desired info 
    # 1. Name of the artist
    # 2. Title of the song
    
soup2.select("#myTable")

[<table class="chartTbl" id="myTable">
 <thead>
 <tr class="tableHead">
 <th>Rank</th>
 <th><span class="mobile-only">Song</span><span class="mobile-hide">Artist</span></th>
 <th><span class="mobile-hide">Title</span></th>
 </tr>
 </thead>
 <tr itemprop="track" itemscope="" itemtype="https://schema.org/MusicRecording">
 <td>1</td>
 <td>
 <span class="mobile-only song">
 <a href="/charts/top-100-songs/video/2021/Dua-Lipa--DaBaby-Levitating" itemprop="name">
                        Levitating
                        </a>
 </span>
 <a class="artist" href="/artist/dua-lipa-top-songs" itemprop="byArtist">
                    Dua Lipa &amp; DaBaby
                    </a>
 <meta content="/artist/dua-lipa-top-songs" itemprop="url">
 </meta></td>
 <td class="mobile-hide">
 <a href="/charts/top-100-songs/video/2021/Dua-Lipa--DaBaby-Levitating">
 <span class="red-play">►</span>
 <span class="song" itemprop="name">Levitating</span>
 </a>
 </td>
 <td class="mobile-only play">
 <a href="/charts/top

In [18]:
soup2.select("span > a")[0].get_text()

'\n                       Levitating\n                       '

In [20]:
soup2.select("table.chartTbl a")[1].get_text()

'\n                   Dua Lipa & DaBaby\n                   '

In [22]:
title = []

# 6.1. Define the number of iterations for the loop
num_iter = len(soup2.select("span > a"))

Ctitle = soup2.select("span > a")

for i in range(num_iter):
    title.append(Ctitle[i].get_text())

title

['\n                       Levitating\n                       ',
 '\n                       Drivers License\n                       ',
 '\n                       Save Your Tears\n                       ',
 '\n                       Montero (Call Me by Your Name)\n                       ',
 '\n                       Blinding Lights\n                       ',
 '\n                       Good 4 U\n                       ',
 '\n                       Mood\n                       ',
 '\n                       Peaches\n                       ',
 '\n                       Leave the Door Open\n                       ',
 '\n                       Astronaut in the Ocean\n                       ',
 '\n                       Kiss Me More\n                       ',
 '\n                       Stay\n                       ',
 '\n                       Easy on Me\n                       ',
 '\n                       Butter\n                       ',
 '\n                       Without You\n             
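
The loop above collects only titles, and each string keeps its newlines and padding. A hedged sketch of a row-by-row alternative that reads artist and title together and trims them with `get_text(strip=True)`. The inline HTML is a trimmed copy of the first table row shown in the `#myTable` output; `sample_html`, `records` and `sample_df` are illustrative names.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Trimmed copy of one table row, in the same shape as the #myTable output above
sample_html = """
<table class="chartTbl" id="myTable">
 <tr itemprop="track">
  <td>1</td>
  <td>
   <span class="mobile-only song">
    <a itemprop="name">
        Levitating
    </a>
   </span>
   <a class="artist" itemprop="byArtist">
        Dua Lipa &amp; DaBaby
   </a>
  </td>
  <td class="mobile-hide">
   <a><span class="song" itemprop="name">Levitating</span></a>
  </td>
 </tr>
</table>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")

records = []
for row in sample_soup.select('#myTable tr[itemprop="track"]'):
    artist_tag = row.select_one("a.artist")
    # Restrict to the desktop cell; "span.song" alone also matches the mobile span
    title_tag = row.select_one("td.mobile-hide span.song")
    if artist_tag is None or title_tag is None:
        continue
    records.append({
        # strip=True drops the newlines and padding around the text
        "artist": artist_tag.get_text(strip=True),
        "title": title_tag.get_text(strip=True),
    })

sample_df = pd.DataFrame(records)
print(sample_df)
```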

In [34]:
top100_21 = pd.DataFrame({"title": title})
top100_21

Unnamed: 0,title
0,\n Levitating\n ...
1,\n Drivers License\n ...
2,\n Save Your Tears\n ...
3,\n Montero (Call Me by Y...
4,\n Blinding Lights\n ...
...,...
95,\n Leave Before You Love...
96,\n Beggin\n ...
97,\n Famous Friends\n ...
98,\n Lil Bit\n ...


In [35]:
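def clean_art(df):
    # Remove newline characters from every string and return the result;
    # the padding spaces around the text are left in place
    return df.replace(r'\n', '', regex=True)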

In [36]:
top100_21 = top100_21.apply(clean_art)
top100_21


Unnamed: 0,title
0,Levitating ...
1,Drivers License ...
2,Save Your Tears ...
3,Montero (Call Me by You...
4,Blinding Lights ...
...,...
95,Leave Before You Love M...
96,Beggin ...
97,Famous Friends ...
98,Lil Bit ...
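
The trailing `...` in the output above suggests the titles still carry padding spaces, since only the `\n` characters were removed. A minimal sketch of trimming both at once with pandas' `.str.strip()`, using a two-row stand-in for `top100_21` (the names `raw` and `cleaned` are illustrative):

```python
import pandas as pd

# Stand-in for the scraped titles, with the same kind of padding
raw = pd.DataFrame({
    "title": ["\n      Levitating\n      ", "\n      Drivers License\n      "],
})

cleaned = raw.copy()
# str.strip() removes leading and trailing whitespace, including newlines
cleaned["title"] = cleaned["title"].str.strip()
print(cleaned["title"].tolist())
```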


### Build an MVP

User inputs a song -> Is it a hot song? If YES: recommend another hot song. If NO: recommend a "similar" song.

1. Get a song and artist from the user
2. Check whether it is in my list
3. If yes: recommend another hot song
4. If no: "cannot recommend another song at the moment"

In [None]:
# Turn everything into lower cases
top100['title']=top100['title'].str.lower()
top100['artist']=top100['artist'].str.lower()
top100.head(2)

In [None]:
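artist = input("Which artist are you listening to? ").strip().lower()
title = input("Which song by this artist are you listening to? ").strip().lower()
print("\n")
music = [artist, title]

# Row-wise checks: `music in top100.values` compares element by element,
# so it is also True when artist and title match in different rows
song_match = (top100["artist"] == artist) & (top100["title"] == title)
artist_match = top100["artist"] == artist

if song_match.any():
    print("Try this song next!")
    print("\n")
    # Sample from the other rows so the same song is not recommended back
    print(str(top100[~song_match].sample(n=1)).title())

elif artist_match.any():
    # The artist is in the chart, but not with this title
    print("Try this song next!")
    print("\n")
    print(str(top100.sample(n=1)).title())

else:
    print("Cannot make a recommendation")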
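
To try the lookup without typing into `input()`, the same idea can be wrapped in a function. This is a sketch only: `recommend_hot_song` is a hypothetical helper, it assumes a chart with lowercase `artist` and `title` columns, and it covers just the exact-song case (not the artist-only branch). The three sample rows come from the scraped output earlier in the notebook.

```python
import pandas as pd


def recommend_hot_song(chart, artist, title, random_state=None):
    """Return a one-row DataFrame with a different hot song, or None.

    Hypothetical helper: `chart` is assumed to have lowercase
    'artist' and 'title' columns.
    """
    artist = artist.strip().lower()
    title = title.strip().lower()
    # Artist and title must match within the same row
    is_same_song = (chart["artist"] == artist) & (chart["title"] == title)
    if not is_same_song.any():
        return None
    others = chart[~is_same_song]
    if others.empty:
        return None
    return others.sample(n=1, random_state=random_state)


sample_chart = pd.DataFrame({
    "artist": ["taylor swift", "sam smith & kim petras", "lainey wilson"],
    "title": ["anti-hero", "unholy", "heart like a truck"],
})
pick = recommend_hot_song(sample_chart, "Taylor Swift", "Anti-Hero", random_state=0)
print(pick)
```

Returning `None` instead of printing leaves the "cannot recommend" message to the caller, which keeps the function easy to test.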

In [None]:
music

In [None]:
# if music in top100.values:
#     print(str(top100.sample(n=1)).title())
#     print("Try this song next!")
# elif music[0] in top100.values:
#     print(str(top100.sample(n=1)).title())
#     print("Try this song next!")
# else:
#     print("Cannot make a recommendation")