#### Instructions 

#### Prioritize the MVP

In the previous lab, you had to scrape data about "hot songs". It's critical to be on track with that part, as it was part of the request from the CTO.
 - Done

If you couldn't finish the first lab, use this time to go back there.

**User experience:**

- What happens if the user inputs a song that doesn't exist?
  - We return an error message.
- What do we do with songs that have the same name, but a different artist?
  - We can ask the user to type the artist name.
- How do we deal with typos?
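
One possible way to deal with typos is fuzzy matching against the scraped titles. The sketch below uses `difflib.get_close_matches` from the Python standard library. The helper name `closest_title`, the sample titles, and the `0.8` cutoff are illustrative assumptions, not part of the lab.

```python
from difflib import get_close_matches

def closest_title(user_text, titles, cutoff=0.8):
    """Return the known title most similar to user_text, or None if nothing is close enough."""
    # Compare in lowercase, but remember the original spelling of each title
    lowered = {title.lower(): title for title in titles}
    matches = get_close_matches(user_text.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None

titles = ["Easy On Me", "Heat Waves", "Stay"]  # sample data, assumed for illustration
print(closest_title("heat wavs", titles))
print(closest_title("test", titles))
```

A higher cutoff makes the matching stricter, so short inputs are less likely to be matched to an unrelated title.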

**Architecture:**

- Do we build the interaction with the user in the same notebook as the web-scraping?
- Where do we store the scraped songs?
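
One simple answer to the storage question is a CSV file that the scraping notebook writes and the user-facing notebook reads. The sketch below uses only the standard library. The function names and the file name `hot_songs.csv` are assumptions made for illustration.

```python
import csv
import os
import tempfile

def save_songs(songs, path):
    """Write a list of {"title": ..., "artist": ...} dicts to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["title", "artist"])
        writer.writeheader()
        writer.writerows(songs)

def load_songs(path):
    """Read the CSV file back into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

# Usage with a temporary folder, so the example does not touch real files
path = os.path.join(tempfile.mkdtemp(), "hot_songs.csv")
save_songs([{"title": "Easy On Me", "artist": "Adele"}], path)
print(load_songs(path))
```

With a pandas DataFrame, `to_csv(path, index=False)` and `pd.read_csv(path)` would play the same roles.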

**Scheduling / Automation:**

- Should we scrape Billboard / Wikipedia every time a user sends a request?
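
Since the Hot 100 is a weekly chart, scraping on every request is probably unnecessary. A minimal sketch of one alternative, assuming the scraped songs are cached in a local file: re-scrape only when that file is missing or too old. The name `needs_refresh` and the one-day interval are hypothetical choices.

```python
import os
import time

ONE_DAY = 24 * 60 * 60  # assumed refresh interval, in seconds

def needs_refresh(path, max_age_seconds=ONE_DAY, now=None):
    """Return True if the cached file is missing or older than max_age_seconds."""
    if not os.path.exists(path):
        return True
    now = time.time() if now is None else now
    # getmtime gives the time of the last write to the file
    return now - os.path.getmtime(path) > max_age_seconds
```

The request handler would call the scraper only when `needs_refresh(...)` is true and read the cached file otherwise.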

**Testing:**

- Does it work when you test it with a real user (a colleague)?

Chances are that more issues will appear, and that not all of them will be solved during this session. But what's important is that the issues have been identified.

In [1]:
def billboard_scraping():
    # 1. import libraries
    from bs4 import BeautifulSoup
    import requests
    import pandas as pd

    # 2. find url and store it in a variable
    url = "https://www.billboard.com/charts/hot-100/"

    # 3. download html with a get request 
    response = requests.get(url)

    # 4.1. parse html (create the 'soup')
    soup = BeautifulSoup(response.content, "html.parser")

    s = soup.find_all('h3', class_='c-title a-no-trucate a-font-primary-bold-s u-letter-spacing-0021 lrv-u-font-size-18@tablet lrv-u-font-size-16 u-line-height-125 u-line-height-normal@mobile-max a-truncate-ellipsis u-max-width-330 u-max-width-230@tablet-only')

    t = soup.find_all('h3', class_='c-title a-font-primary-bold-l a-font-primary-bold-m@mobile-max lrv-u-color-black u-color-white@mobile-max lrv-u-margin-r-150')
    
    #5 collect the titles: the first chart entry (t) uses different classes than the rest (s)
    title = [t[0].get_text().replace("\n","")]
    for tag in s:
        title.append(tag.get_text().replace("\n", ""))

    s = soup.find_all('span', class_='c-label a-no-trucate a-font-primary-s lrv-u-font-size-14@mobile-max u-line-height-normal@mobile-max u-letter-spacing-0021 lrv-u-display-block a-truncate-ellipsis-2line u-max-width-330 u-max-width-230@tablet-only')

    t = soup.find_all('p', class_='c-tagline a-font-primary-l a-font-primary-m@mobile-max lrv-u-color-black u-color-white@mobile-max lrv-u-margin-tb-00 lrv-u-padding-t-025 lrv-u-margin-r-150')

    # collect the artists the same way: first chart entry (t), then the rest (s)
    artist = [t[0].get_text().replace("\n","")]
    for tag in s:
        artist.append(tag.get_text().replace("\n", ""))
    
    #6 create the dataframe
    billboard = pd.DataFrame({"title":title,
                           "artist":artist,
                          })

    return billboard

billboard_scraping()

| | title | artist |
|---|---|---|
| 0 | We Don't Talk About Bruno | Carolina Gaitan, Mauro Castillo, Adassa, Rhenz... |
| 1 | Do We Have A Problem? | Nicki Minaj X Lil Baby |
| 2 | Easy On Me | Adele |
| 3 | Heat Waves | Glass Animals |
| 4 | Stay | The Kid LAROI & Justin Bieber |
| ... | ... | ... |
| 95 | Iffy | Chris Brown |
| 96 | When I'm Gone | Alesso / Katy Perry |
| 97 | Fair Trade | Drake Featuring Travis Scott |
| 98 | Megan's Piano | Megan Thee Stallion |


In [104]:
billboard_hot_100 = billboard_scraping()

In [3]:
billboard_hot_100

| | title | artist |
|---|---|---|
| 0 | We Don't Talk About Bruno | Carolina Gaitan, Mauro Castillo, Adassa, Rhenz... |
| 1 | Do We Have A Problem? | Nicki Minaj X Lil Baby |
| 2 | Easy On Me | Adele |
| 3 | Heat Waves | Glass Animals |
| 4 | Stay | The Kid LAROI & Justin Bieber |
| ... | ... | ... |
| 95 | Iffy | Chris Brown |
| 96 | When I'm Gone | Alesso / Katy Perry |
| 97 | Fair Trade | Drake Featuring Travis Scott |
| 98 | Megan's Piano | Megan Thee Stallion |


In [4]:
def user_input():
    x = input("Please enter the title of a song: ")
    
    return x

In [6]:
def lowercase_values(dataframe):
    for column_name in list(dataframe.select_dtypes(include='object').columns.values):
        dataframe[column_name] = dataframe[column_name].str.lower()


In [134]:
# Goal: print the title and the artist if the requested song is in the Billboard Hot 100,
# then recommend another song from the list (if/else approach)

import pandas as pd
from Levenshtein import distance as lev

#df2 = pd.DataFrame([["easy on me","not adele"]],columns=['title', 'artist'])
#billboard_hot_100 = pd.concat([billboard_hot_100, df2],ignore_index=True)
x = user_input().lower()
lowercase_values(billboard_hot_100)

# Typo handling: snap the input to the closest title if it is within 2 edits.
# Each title is compared with the original input, so one match cannot chain into another.
closest = min(billboard_hot_100["title"], key=lambda title: lev(title, x))
if lev(closest, x) < 3:
    x = closest

matches = billboard_hot_100[billboard_hot_100["title"] == x]
if matches.empty:
    print("The song is not in the hot list")
else:
    if matches.shape[0] > 1:
        # Several songs share this title: let the user pick one by its index
        print(matches)
        y = int(input("Please enter the number next to the song: "))
        song = billboard_hot_100["title"][y]
        artist = billboard_hot_100["artist"][y]
    else:
        song = matches["title"].iloc[0]
        artist = matches["artist"].iloc[0]
    print("Your selected song is: "+ song +" from "+ artist)
    # Recommend a random song from the hot list
    reco = billboard_hot_100.sample(n = 1)
    song = list(reco["title"])[0]
    artist = list(reco["artist"])[0]
    print("You should listen to: "+ song +" from "+ artist)

Please enter the title of a song: test
The song is not in the hot list


# Lab | API wrappers - Create your collection of songs & audio features


#### Instructions 


To move forward with the project, you need to create a collection of songs with their audio features - as large as possible! 

These are the songs that we will cluster. And, later, when the user inputs a song, we will find the cluster to which the song belongs and recommend a song from the same cluster.
The more songs you have, the more accurate and diverse recommendations you'll be able to give. Although... you might want to make sure the collected songs are "curated" in a certain way. Try to find playlists of songs that are diverse, but also that meet certain standards.

The process of sending hundreds or thousands of requests can take some time - it's normal if you have to wait a few minutes (or, if you're ambitious, even hours) to get all the data you need.
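
One way to reduce the number of requests is batching: Spotify's audio-features endpoint accepts up to 100 tracks per call. The sketch below assumes `sp` is a `spotipy.Spotify` client like the one created later in this notebook. The helper names `chunked` and `fetch_audio_features` are hypothetical.

```python
def chunked(items, size=100):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def fetch_audio_features(sp, track_ids, batch_size=100):
    """Fetch audio features in batches; `sp` is assumed to be a spotipy.Spotify client."""
    features = []
    for batch in chunked(track_ids, batch_size):
        # audio_features returns one entry per track, which can be None
        # when Spotify has no features for that track
        features.extend(f for f in sp.audio_features(batch) if f is not None)
    return features
```

For 706 tracks this would mean 8 requests instead of 706.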

An idea for collecting as many songs as possible is to start with all the songs of a big, diverse playlist and then go to every artist present in the playlist and grab every song of every album of that artist. The number of songs you collect per playlist will grow very quickly!
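
The playlist-to-artists-to-albums idea can be sketched as follows. This assumes `sp` is a `spotipy.Spotify` client and `playlist_items` is the list of playlist entries returned by the playlist endpoint. It uses the `artist_albums`, `album_tracks`, and `next` methods of spotipy, while the helper names are hypothetical.

```python
def iter_pages(sp, page):
    """Yield every item of a paginated Spotify response, following the `next` links."""
    while page:
        yield from page["items"]
        page = sp.next(page) if page["next"] else None

def track_ids_for_artists(sp, playlist_items):
    """Return the IDs of every track on every album of the artists in a playlist."""
    artist_ids = {
        artist["id"]
        for item in playlist_items
        if item.get("track")  # skip entries without track data
        for artist in item["track"]["artists"]
        if artist.get("id")
    }
    track_ids = set()  # a set avoids storing the same track twice
    for artist_id in artist_ids:
        for album in iter_pages(sp, sp.artist_albums(artist_id)):
            for track in iter_pages(sp, sp.album_tracks(album["id"])):
                track_ids.add(track["id"])
    return track_ids
```

Each artist costs at least one request for the album list plus one per album, so this step is where most of the waiting time is likely to go.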


Billboard Hot 100 playlist

https://open.spotify.com/playlist/6UeSakyzhiEt4NB3UAd6NQ?si=19d7543e2fb94fe6

Chill Pill playlist

https://open.spotify.com/playlist/7aZ5mBWDDCNy7wmdCf4FCX?si=d28cf9ef49bd497a

Reference: appending a list of dictionaries or a Series to an existing pandas DataFrame

https://www.geeksforgeeks.org/append-list-of-dictionary-and-series-to-a-existing-pandas-dataframe-in-python/

Longest Playlist on Spotify (10k songs)

https://open.spotify.com/playlist/5S8SJdl1BDc0ugpkEvFsIL

In [135]:
import config

In [136]:
import spotipy
import json
from spotipy.oauth2 import SpotifyClientCredentials

sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id= config.client_id,
                                                           client_secret= config.client_secret))

results = sp.search(q="Lose Yourself",limit=1,market="GB")
results

{'tracks': {'href': 'https://api.spotify.com/v1/search?query=Lose+Yourself&type=track&market=GB&offset=0&limit=1',
  'items': [{'album': {'album_type': 'single',
     'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/7dGJo4pcD2V6oG8kP0tJRR'},
       'href': 'https://api.spotify.com/v1/artists/7dGJo4pcD2V6oG8kP0tJRR',
       'id': '7dGJo4pcD2V6oG8kP0tJRR',
       'name': 'Eminem',
       'type': 'artist',
       'uri': 'spotify:artist:7dGJo4pcD2V6oG8kP0tJRR'}],
     'external_urls': {'spotify': 'https://open.spotify.com/album/1rfORa9iYmocEsnnZGMVC4'},
     'href': 'https://api.spotify.com/v1/albums/1rfORa9iYmocEsnnZGMVC4',
     'id': '1rfORa9iYmocEsnnZGMVC4',
     'images': [{'height': 640,
       'url': 'https://i.scdn.co/image/ab67616d0000b273b6ef2ebd34efb08cb76f6eec',
       'width': 640},
      {'height': 300,
       'url': 'https://i.scdn.co/image/ab67616d00001e02b6ef2ebd34efb08cb76f6eec',
       'width': 300},
      {'height': 64,
       'url': 'https://i.

Tip: to read a JSON response like this more easily, run `import pprint` and then `pprint.pprint(playlist)`, where `playlist` is the response dictionary.
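
Rather than reading the whole response, it can help to pull out only the fields needed later. A minimal sketch, assuming the response has the `tracks` → `items` structure shown in the search output above. The function name and the trimmed sample values (including the placeholder ID) are illustrative only.

```python
def summarize_first_track(results):
    """Pull a few readable fields out of a track search response."""
    items = results["tracks"]["items"]
    if not items:
        return None  # the search returned no tracks
    track = items[0]
    return {
        "name": track["name"],
        "artists": [artist["name"] for artist in track["artists"]],
        "id": track["id"],
        "uri": track["uri"],
    }

# Trimmed sample response with placeholder ID values
sample = {"tracks": {"items": [{
    "name": "Lose Yourself",
    "artists": [{"name": "Eminem"}],
    "id": "TRACK_ID",
    "uri": "spotify:track:TRACK_ID",
}]}}
print(summarize_first_track(sample))
```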


In [68]:
import pandas as pd
df = pd.DataFrame({})
df

In [69]:
def get_playlist_tracks(username, playlist_id):
    results = sp.user_playlist_tracks(username,playlist_id,market="GB")
    tracks = results['items']
    while results['next']:
        results = sp.next(results)
        tracks.extend(results['items'])
    return tracks

chillpill_tracks = get_playlist_tracks("spotify", "7aZ5mBWDDCNy7wmdCf4FCX")
chillpill_tracks

[{'added_at': '2020-01-14T14:23:30Z',
  'added_by': {'external_urls': {'spotify': 'https://open.spotify.com/user/solenita'},
   'href': 'https://api.spotify.com/v1/users/solenita',
   'id': 'solenita',
   'type': 'user',
   'uri': 'spotify:user:solenita'},
  'is_local': False,
  'primary_color': None,
  'track': {'album': {'album_type': 'single',
    'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/0LyfQWJT6nXafLPZqxe9Of'},
      'href': 'https://api.spotify.com/v1/artists/0LyfQWJT6nXafLPZqxe9Of',
      'id': '0LyfQWJT6nXafLPZqxe9Of',
      'name': 'Various Artists',
      'type': 'artist',
      'uri': 'spotify:artist:0LyfQWJT6nXafLPZqxe9Of'}],
    'external_urls': {'spotify': 'https://open.spotify.com/album/6K44xJeoORjgqFWovNnSkg'},
    'href': 'https://api.spotify.com/v1/albums/6K44xJeoORjgqFWovNnSkg',
    'id': '6K44xJeoORjgqFWovNnSkg',
    'images': [{'height': 640,
      'url': 'https://i.scdn.co/image/ab67616d0000b2737392a38627de9efd21169116',
      'wi

In [70]:
# Collect one dict of audio features per track, then build the DataFrame once.
# (DataFrame.append was removed in pandas 2.0, and appending row by row is slow.)
rows = []
for item in chillpill_tracks:
    track = item["track"]
    if track is not None and track["id"] is not None:
        audio_feature_dict = sp.audio_features(track["uri"])[0]
        if audio_feature_dict is not None:
            rows.append(audio_feature_dict)
df = pd.DataFrame(rows)

In [72]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 706 entries, 0 to 705
Data columns (total 18 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   danceability      706 non-null    float64
 1   energy            706 non-null    float64
 2   key               706 non-null    int64  
 3   loudness          706 non-null    float64
 4   mode              706 non-null    int64  
 5   speechiness       706 non-null    float64
 6   acousticness      706 non-null    float64
 7   instrumentalness  706 non-null    float64
 8   liveness          706 non-null    float64
 9   valence           706 non-null    float64
 10  tempo             706 non-null    float64
 11  type              706 non-null    object 
 12  id                706 non-null    object 
 13  uri               706 non-null    object 
 14  track_href        706 non-null    object 
 15  analysis_url      706 non-null    object 
 16  duration_ms       706 non-null    int64  
 1