# Business goal:

Check the case_study_gnod.md file.

Make sure you've understood the big picture of your project:

- the goal of the company (Gnod),
- their current product (Gnoosic),
- their strategy, and
- how your project fits into this context.

Re-read the business case and the e-mail from the CTO, take a look at the flowchart and create an initial Trello board with the tasks you think you'll have to accomplish.

# Instructions - Scraping popular songs

Your product will take a song as input from the user and output another song (the recommendation). In most cases, the recommended song will have to be similar to the input song, but the CTO thinks that if the input song is on the top charts at the moment, the user will enjoy a recommendation of a song that is also popular right now.

You have to find data on the internet about currently popular songs. Billboard maintains a weekly Top 100 of "hot" songs here: https://www.billboard.com/charts/hot-100.

It's a good place to start! Scrape the current top 100 songs and their respective artists, and put the information into a pandas dataframe.
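The scraping step can be tried on a small, made-up HTML snippet before pointing it at the live page. This sketch assumes the `chart-element__information` class names that the cells below select on; Billboard's markup can change, so check the selectors in your browser's developer tools first. Looping over each entry container, rather than keeping two separate lists, keeps every song paired with its own artist.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical, trimmed-down markup using the class names selected later in this
# notebook; the live Billboard page may use different ones.
SAMPLE_HTML = """
<ol>
  <li><span class="chart-element__information">
    <span class="chart-element__information__song">Save Your Tears</span>
    <span class="chart-element__information__artist">The Weeknd &amp; Ariana Grande</span>
  </span></li>
  <li><span class="chart-element__information">
    <span class="chart-element__information__song">Leave The Door Open</span>
    <span class="chart-element__information__artist">Silk Sonic (Bruno Mars &amp; Anderson .Paak)</span>
  </span></li>
</ol>
"""

def parse_chart(html):
    """Return a DataFrame with one row per chart entry (columns: song, artist)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # Iterate over entry containers so song and artist can never drift out of step.
    for entry in soup.select("span.chart-element__information"):
        song = entry.select_one("span.chart-element__information__song")
        artist = entry.select_one("span.chart-element__information__artist")
        rows.append({
            "song": song.get_text(strip=True) if song else None,
            "artist": artist.get_text(strip=True) if artist else None,
        })
    return pd.DataFrame(rows, columns=["song", "artist"])

chart = parse_chart(SAMPLE_HTML)
print(chart)
```

On the live page you would pass `response.content` instead of `SAMPLE_HTML`.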

Hooray! You have been hired as a Data Analyst for **Gnod!**

Gnod is a site that provides recommendations for music, art, literature and products based on collaborative filtering algorithms. Their flagship product is the music recommender, which you can try at [www.gnoosic.com](http://www.gnoosic.com/). The site asks users to input 3 bands they like, and computes similarity scores with the rest of the users. Then, they recommend to the user bands that users with similar tastes have picked.
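The collaborative filtering idea described above (compare a user's bands with everyone else's, then suggest what similar users picked) can be sketched in a few lines. Jaccard similarity is an assumption chosen for illustration; the text does not say which similarity score Gnod actually uses, and the users and bands are made up.

```python
# Minimal user-based collaborative filtering sketch (not Gnod's actual algorithm).
# Each user is represented by the set of bands they entered.

def jaccard(a, b):
    """Similarity between two sets of bands: |intersection| / |union|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def recommend_bands(target, others, top_n=3):
    """Score bands the target hasn't picked by the similarity of the users who picked them."""
    scores = {}
    for bands in others.values():
        sim = jaccard(target, bands)
        if sim == 0:
            continue  # users with nothing in common contribute nothing
        for band in bands - target:
            scores[band] = scores.get(band, 0.0) + sim
    # Highest score first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [band for band, _ in ranked[:top_n]]

users = {
    "ana": {"Radiohead", "Muse", "Portishead"},
    "ben": {"Radiohead", "Muse", "Coldplay"},
    "cai": {"Metallica", "Slayer", "Megadeth"},
}
print(recommend_bands({"Radiohead", "Muse", "Blur"}, users))  # ['Coldplay', 'Portishead']
```

Users who share no band with the target (here `cai`) have no influence, which is also why pure collaborative filtering struggles with users or songs that have little overlap with anyone else.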

Gnod is a small company, and its only revenue stream so far is ads on the site. In the future, they would like to explore partnership options with music apps (such as Deezer, SoundCloud or even Apple Music and Spotify). But for that to be possible, they need to expand and improve their recommendations.

That's precisely where you come in. They have hired you as a Data Analyst, and they expect you to bring a mix of technical expertise and business mindset to the table.

Jane, CTO of Gnod, has sent you an email assigning you your first task.

## **The Challenge**

This is the e-mail Jane, CTO of Gnod, sent to your inbox during your first weeks there.

> Dear xxxxxxxx,
>
> We are thrilled to welcome you as a Data Analyst for Gnoosic!
>
> As you know, we are trying to come up with ways to enhance our music recommendations. One of the new features we'd like to research is recommending songs (not only bands). We're also aware of the limitations of our collaborative filtering algorithms, and would like to give users two new possibilities when searching for recommendations:
>
> - Songs that are actually similar to the ones they picked from an acoustic point of view.
> - Songs that are popular around the world right now, regardless of their tastes.
>
> Coming up with the perfect song recommender will take us months, so there's no need to stress out too much. In this first week, we want you to explore new data sources for songs. The internet is full of information, and our first step is to acquire it and do an initial exploration. Feel free to use APIs or scrape the web directly to collect as much information as possible about popular songs. Eventually, we'll need to collect data on millions of songs, but we can start with a few hundred or a few thousand from each source and see if the collected features are useful.
>
> Once the data is collected, we want you to create clusters of songs that are similar to each other. The idea is that if a user inputs a song from one group, we'll prioritize giving them recommendations of songs from that same group.
>
> **On Friday**, you will present your work to me and Marek, the CEO and founder. Full disclosure: I need you to be very convincing about this whole song recommender, as it has been my personal push and the main reason we hired you!
>
> Be open-minded about this process: we are agile, which means we define our products and features on the go, while exploring the tools and the data available to us. We'd love you to provide your own vision of the product and the next steps to be taken.
>
> Lots of luck and strength for this first week with us!
>
> Jane

Have fun and enjoy the ride!
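Jane's clustering idea (group similar songs, then recommend from the input song's group) can be sketched with a plain k-means. This is an illustrative sketch, not the method the project settles on: the two features stand in for audio features such as energy and danceability, the song names and values are invented, and in practice you would scale the features and would likely use a library implementation such as scikit-learn's `KMeans`.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means on equal-length feature tuples; returns one label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct starting points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels

def same_cluster(song, songs, labels):
    """Other songs that share the input song's cluster."""
    target = labels[songs.index(song)]
    return [s for s, lab in zip(songs, labels) if lab == target and s != song]

# Invented (energy, danceability) values for six hypothetical songs.
songs = ["calm_a", "calm_b", "calm_c", "loud_a", "loud_b", "loud_c"]
features = [(0.1, 0.2), (0.15, 0.25), (0.2, 0.1), (0.9, 0.8), (0.85, 0.95), (0.95, 0.9)]
labels = kmeans(features, k=2)
print(same_cluster("calm_a", songs, labels))
```

Recommending from `same_cluster(...)` is exactly the "prioritize songs from the same group" rule; choosing the number of clusters is a separate decision.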

In [1]:
# Scrape the current top 100 songs and their respective artists, and put the information into a pandas dataframe.

# List one from Billboard Charts

In [2]:
# 1. importing libraries
from bs4 import BeautifulSoup
import requests
import pandas as pd


# 2. store the URL in a variable
url = "https://www.billboard.com/charts/hot-100"

# 3. download html with a get request
response = requests.get(url)

In [3]:
response.status_code

200

A status code of 200 means the request succeeded (OK).

In [4]:
soup = BeautifulSoup(response.content, 'html.parser')

soup.select(" span.chart-element__information > span.chart-element__information__artist.text--truncate.color--secondary")

#charts > div > div.chart-list__wrapper > div > ol > li:nth-child(1) > button > span.chart-element__information > span.chart-element__information__artist.text--truncate.color--secondary

In [5]:
lst = soup.select(" span.chart-element__information > span.chart-element__information__artist.text--truncate.color--secondary")
lst

[<span class="chart-element__information__artist text--truncate color--secondary">The Weeknd &amp; Ariana Grande</span>,
 <span class="chart-element__information__artist text--truncate color--secondary">Silk Sonic (Bruno Mars &amp; Anderson .Paak)</span>,
 <span class="chart-element__information__artist text--truncate color--secondary">Justin Bieber Featuring Daniel Caesar &amp; Giveon</span>,
 <span class="chart-element__information__artist text--truncate color--secondary">Polo G</span>,
 <span class="chart-element__information__artist text--truncate color--secondary">Dua Lipa Featuring DaBaby</span>,
 <span class="chart-element__information__artist text--truncate color--secondary">Doja Cat Featuring SZA</span>,
 <span class="chart-element__information__artist text--truncate color--secondary">Lil Nas X</span>,
 <span class="chart-element__information__artist text--truncate color--secondary">Masked Wolf</span>,
 <span class="chart-element__information__artist text--truncate color--seco

In [6]:
soup.select('span.chart-element__information__artist')[0].text

'The Weeknd & Ariana Grande'

In [7]:
soup.select('span.chart-element__information__song')[0].text

'Save Your Tears'

In [8]:
lst_artists = soup.select("span.chart-element__information > span.chart-element__information__artist.text--truncate.color--secondary")
# print the artist name of every chart entry
for artist in lst_artists:
    print(artist.text)

The Weeknd & Ariana Grande
Silk Sonic (Bruno Mars & Anderson .Paak)
Justin Bieber Featuring Daniel Caesar & Giveon
Polo G
Dua Lipa Featuring DaBaby
Doja Cat Featuring SZA
Lil Nas X
Masked Wolf
Cardi B
Olivia Rodrigo
The Weeknd
Olivia Rodrigo
SpotemGottem Featuring Pooh Shiesty Or DaBaby
Lil Tjay Featuring 6LACK
Lil Baby
Saweetie Featuring Doja Cat
Giveon
Maroon 5 Featuring Megan Thee Stallion
Pop Smoke
24kGoldn Featuring iann dior
Machine Gun Kelly X blackbear
Pooh Shiesty Featuring Lil Durk
The Kid LAROI
Gabby Barrett
Chris Brown & Young Thug
Tate McRae
Ariana Grande
Eric Church
Luke Combs
Drake
Moneybagg Yo
Billie Eilish
Bad Bunny & Jhay Cortez
Ariana Grande
Moneybagg Yo
Young Thug & Gunna Featuring Drake
Mooski
Young Thug & Gunna
Drake Featuring Lil Baby
Yung Bleu Featuring Drake
Dua Lipa
Glass Animals
Jake Owen
SZA
Pop Smoke Featuring Lil Baby & DaBaby
Sam Hunt
Coi Leray Featuring Lil Durk
Travis Scott & HVME
Kali Uchis
Rod Wave
Doja Cat
Moneybagg Yo Featuring BIG30
Moneybagg Yo & 

In [9]:

song_name = []
artists = []


In [10]:
# from tqdm.notebook import tqdm
#tqdm


#len_songs = len(lst_artists)

#len_songs

In [11]:
soup

<!DOCTYPE html>

<html class="" lang="">
<head>
<meta charset="utf-8"/>
<meta content="IE=edge" http-equiv="X-UA-Compatible"/>
<meta content="width=device-width, initial-scale=1, user-scalable=no" name="viewport"/>
<title>The Hot 100 Chart | Billboard</title>
<meta content="The Hot 100 Chart" name="title" property="title">
<meta content="@billboard" name="twitter:site"/>
<meta content="Billboard" property="og:site_name">
<meta content="article" property="og:type">
<link href="/manifest.json" rel="manifest"/>
<style>
        .chart-pro-access {
            background-image: url('https://www.billboard.com/assets/1620152677/images/piano/chart-pro-access-mb.png?f85ba153dd79ad9832a8');
        }

        @media (min-width: 769px) {
            .chart-pro-access {
                background-image: url('https://www.billboard.com/assets/1620152677/images/piano/chart-pro-access-dk.png?f85ba153dd79ad9832a8');
            }
        }
    </style>
<script async="async" data-cfasync="false" src="ht

In [12]:
songs=soup.select("span.chart-element__information")

In [13]:
len_songs=len(songs)
len_songs

100

In [14]:
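# select once, outside the loop, instead of re-running soup.select on every iteration
song_tags = soup.select("span.chart-element__information__song")
artist_tags = soup.select("span.chart-element__information__artist")
for i in range(len_songs):
    song_name.append(song_tags[i].text)
    artists.append(artist_tags[i].text)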
   

In [15]:
song_name

['Save Your Tears',
 'Leave The Door Open',
 'Peaches',
 'Rapstar',
 'Levitating',
 'Kiss Me More',
 'Montero (Call Me By Your Name)',
 'Astronaut In The Ocean',
 'Up',
 'Drivers License',
 'Blinding Lights',
 'Deja Vu',
 'Beat Box',
 'Calling My Phone',
 'On Me',
 'Best Friend',
 'Heartbreak Anniversary',
 'Beautiful Mistakes',
 'What You Know Bout Love',
 'Mood',
 "My Ex's Best Friend",
 'Back In Blood',
 'Without You',
 'The Good Ones',
 'Go Crazy',
 'You Broke Me First.',
 '34+35',
 'Hell Of A View',
 'Forever After All',
 "What's Next",
 'Time Today',
 'Therefore I Am',
 'Dakiti',
 'Positions',
 'Shottas (Lala)',
 'Solid',
 'Track Star',
 'Ski',
 'Wants And Needs',
 "You're Mines Still",
 "We're Good",
 'Heat Waves',
 'Made For You',
 'Good Days',
 'For The Night',
 "Breaking Up Was Easy In The 90's",
 'No More Parties',
 'Goosebumps',
 'Telepatia',
 'Tombstone',
 'Streets',
 'Go!',
 'Hard For The Next',
 'If Pain Was A Person',
 "What's Your Country Song",
 'Just The Way',
 'Hold

In [16]:
artists

['The Weeknd & Ariana Grande',
 'Silk Sonic (Bruno Mars & Anderson .Paak)',
 'Justin Bieber Featuring Daniel Caesar & Giveon',
 'Polo G',
 'Dua Lipa Featuring DaBaby',
 'Doja Cat Featuring SZA',
 'Lil Nas X',
 'Masked Wolf',
 'Cardi B',
 'Olivia Rodrigo',
 'The Weeknd',
 'Olivia Rodrigo',
 'SpotemGottem Featuring Pooh Shiesty Or DaBaby',
 'Lil Tjay Featuring 6LACK',
 'Lil Baby',
 'Saweetie Featuring Doja Cat',
 'Giveon',
 'Maroon 5 Featuring Megan Thee Stallion',
 'Pop Smoke',
 '24kGoldn Featuring iann dior',
 'Machine Gun Kelly X blackbear',
 'Pooh Shiesty Featuring Lil Durk',
 'The Kid LAROI',
 'Gabby Barrett',
 'Chris Brown & Young Thug',
 'Tate McRae',
 'Ariana Grande',
 'Eric Church',
 'Luke Combs',
 'Drake',
 'Moneybagg Yo',
 'Billie Eilish',
 'Bad Bunny & Jhay Cortez',
 'Ariana Grande',
 'Moneybagg Yo',
 'Young Thug & Gunna Featuring Drake',
 'Mooski',
 'Young Thug & Gunna',
 'Drake Featuring Lil Baby',
 'Yung Bleu Featuring Drake',
 'Dua Lipa',
 'Glass Animals',
 'Jake Owen',
 

In [17]:
# build a DataFrame from the two parallel lists (song_name, artists)
songs = pd.DataFrame({'song_name':song_name,'artists':artists})
songs.head(100)

|     | song_name | artists |
|-----|-----------|---------|
| 0 | Save Your Tears | The Weeknd & Ariana Grande |
| 1 | Leave The Door Open | Silk Sonic (Bruno Mars & Anderson .Paak) |
| 2 | Peaches | Justin Bieber Featuring Daniel Caesar & Giveon |
| 3 | Rapstar | Polo G |
| 4 | Levitating | Dua Lipa Featuring DaBaby |
| ... | ... | ... |
| 95 | 4 Da Gang | 42 Dugg & Roddy Ricch |
| 96 | Blame It On You | Jason Aldean |
| 97 | Wasted On You | Morgan Wallen |
| 98 | Way Less Sad | AJR |


In [18]:
songs.to_csv('Lab Web-Scraping1')  # to_csv writes the file and returns None when given a path, so nothing is assigned

# List two from Official Charts

In [19]:
# 1. import libraries (already imported in the Billboard section above)
#from bs4 import BeautifulSoup
#import requests
#import pandas as pd

# 2. store the URL in a variable
url = "https://www.officialcharts.com/charts/singles-chart/"

# 3. download html with a get request
response = requests.get(url)

In [20]:
response.status_code

200

In [21]:
response.content

b'\r\n\r\n<!doctype html>\r\n<!--[if lt IE 7]><html class="no-js ie6 oldie" lang="en"><![endif]-->\r\n<!--[if IE 7]><html class="no-js ie7 oldie" lang="en"><![endif]-->\r\n<!--[if IE 8]><html class="no-js ie8 oldie" lang="en"><![endif]-->\r\n<!--[if gt IE 8]><!-->\r\n<html class="no-js" lang="en">\r\n<!--<![endif]-->\r\n\r\n<head>\r\n    \r\n\r\n<meta charset="utf-8" />\r\n<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1" />\r\n\r\n<title>Official Singles Chart Top 100 | Official Charts Company</title>\r\n<meta name="description" content="The Official UK Top 40 chart is compiled by the Official Charts Company, based on official sales of sales of downloads, CD, vinyl, audio streams and video streams. The Top 40 is broadcast on BBC Radio 1 and MTV, the full Top 100 is published exclusively on OfficialCharts.com." />\r\n<meta name="keywords" content="Top 40, UK Top 40, Charts, Top 40 UK, UK Charts, UK singles chart, Music Charts, Official UK Top 40, Charts 2012, Hit 40 UK, UK 

In [22]:
soup = BeautifulSoup(response.content, 'html.parser')
soup


<!DOCTYPE html>

<!--[if lt IE 7]><html class="no-js ie6 oldie" lang="en"><![endif]-->
<!--[if IE 7]><html class="no-js ie7 oldie" lang="en"><![endif]-->
<!--[if IE 8]><html class="no-js ie8 oldie" lang="en"><![endif]-->
<!--[if gt IE 8]><!-->
<html class="no-js" lang="en">
<!--<![endif]-->
<head>
<meta charset="utf-8"/>
<meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible"/>
<title>Official Singles Chart Top 100 | Official Charts Company</title>
<meta content="The Official UK Top 40 chart is compiled by the Official Charts Company, based on official sales of sales of downloads, CD, vinyl, audio streams and video streams. The Top 40 is broadcast on BBC Radio 1 and MTV, the full Top 100 is published exclusively on OfficialCharts.com." name="description"/>
<meta content="Top 40, UK Top 40, Charts, Top 40 UK, UK Charts, UK singles chart, Music Charts, Official UK Top 40, Charts 2012, Hit 40 UK, UK Chart, Official Singles Chart, Official Albums Chart, Number 1, Number One" name="k

In [23]:
artist_lst = soup.select("div > div.title-artist > div.artist > a")
artist_lst

[<a href="/artist/55226/lil-nas-x/">LIL NAS X</a>,
 <a href="/artist/59818/justin-bieber-caesar-giveon/">JUSTIN BIEBER/CAESAR/GIVEON</a>,
 <a href="/artist/59630/joel-corry-raye-david-guetta/">JOEL CORRY/RAYE/DAVID GUETTA</a>,
 <a href="/artist/60027/tion-wayne-and-russ-millions/">TION WAYNE &amp; RUSS MILLIONS</a>,
 <a href="/artist/59350/riton-nightcrawlers-mufasa/">RITON/NIGHTCRAWLERS/MUFASA</a>,
 <a href="/artist/59953/doja-cat-ft-sza/">DOJA CAT FT SZA</a>,
 <a href="/artist/55623/polo-g/">POLO G</a>,
 <a href="/artist/51252/tom-grennan/">TOM GRENNAN</a>,
 <a href="/artist/59389/nathan-evans-220kid-billen-ted/">NATHAN EVANS/220KID/BILLEN TED</a>,
 <a href="/artist/14412/tiesto/">TIESTO</a>,
 <a href="/artist/59583/ella-henderson-and-tom-grennan/">ELLA HENDERSON &amp; TOM GRENNAN</a>,
 <a href="/artist/59366/atb-topic-a7s/">ATB/TOPIC/A7S</a>,
 <a href="/artist/56921/olivia-rodrigo/">OLIVIA RODRIGO</a>,
 <a href="/artist/59670/d-block-europe-raye/">D-BLOCK EUROPE/RAYE</a>,
 <a href="

In [24]:
artist_lst = soup.select("div > div.title-artist > div.artist > a")
# print every artist; a separate loop variable avoids overwriting artists_2
for artist in artist_lst:
    print(artist.text)

LIL NAS X
JUSTIN BIEBER/CAESAR/GIVEON
JOEL CORRY/RAYE/DAVID GUETTA
TION WAYNE & RUSS MILLIONS
RITON/NIGHTCRAWLERS/MUFASA
DOJA CAT FT SZA
POLO G
TOM GRENNAN
NATHAN EVANS/220KID/BILLEN TED
TIESTO
ELLA HENDERSON & TOM GRENNAN
ATB/TOPIC/A7S
OLIVIA RODRIGO
D-BLOCK EUROPE/RAYE
MIMI WEBB
MASKED WOLF
DAVE
AVA MAX
GLASS ANIMALS
WEEKND
MAJESTIC & BONEY M
SILK SONIC/BRUNO MARS/PAAK
NAVOS
TOM ZANETTI
CARDI B
LIL TJAY & 6LACK
JUSTIN BIEBER
DUA LIPA
AURORA
AJ TRACEY
KSI FT YUNGBLUD & POLO G
RAG'N'BONE MAN & PINK
A1 & J1
DUA LIPA
GIVEON
WES NELSON FT YXNG BANE
OLIVIA RODRIGO
YEARS & YEARS
BECKY HILL
WEEKND
KID LAROI
JOEL CORRY FT MNEK
NOIZU
BAD BOY CHILLER CREW
ANNE-MARIE/KSI/DIGITAL FARM
JAMES ARTHUR
FRED AGAIN & BLESSED MADONNA
TRAVIS SCOTT & HVME
NATHAN DAWE/ANNE-MARIE/MOSTACK
GRIFF
POLO G
RAG'N'BONE MAN
MAROON 5/MEGAN THEE STALLION
PINK & WILLOW SAGE HART
YOUNG THUG/GUNNA/DRAKE
REGARD/TROYE SIVAN/TATE MCRAE
CENTRAL CEE
JORJA SMITH
MEDUZA FT DERMOT KENNEDY
SHANE CODD
PAUL WOOLFORD & AMBER MARK
FAT

In [25]:
songs_lst = soup.select("div > div.title-artist > div.title > a")
songs_lst

[<a href="/search/singles/montero-(call-me-by-your-name)/">MONTERO (CALL ME BY YOUR NAME)</a>,
 <a href="/search/singles/peaches/">PEACHES</a>,
 <a href="/search/singles/bed/">BED</a>,
 <a href="/search/singles/body/">BODY</a>,
 <a href="/search/singles/friday/">FRIDAY</a>,
 <a href="/search/singles/kiss-me-more/">KISS ME MORE</a>,
 <a href="/search/singles/rapstar/">RAPSTAR</a>,
 <a href="/search/singles/little-bit-of-love/">LITTLE BIT OF LOVE</a>,
 <a href="/search/singles/wellerman/">WELLERMAN</a>,
 <a href="/search/singles/the-business/">THE BUSINESS</a>,
 <a href="/search/singles/let's-go-home-together/">LET'S GO HOME TOGETHER</a>,
 <a href="/search/singles/your-love-(9pm)/">YOUR LOVE (9PM)</a>,
 <a href="/search/singles/deja-vu/">DEJA VU</a>,
 <a href="/search/singles/ferrari-horses/">FERRARI HORSES</a>,
 <a href="/search/singles/good-without/">GOOD WITHOUT</a>,
 <a href="/search/singles/astronaut-in-the-ocean/">ASTRONAUT IN THE OCEAN</a>,
 <a href="/search/singles/titanium/">TIT

In [26]:
songs_lst = soup.select("div > div.title-artist > div.title > a")
# print every title; a separate loop variable avoids overwriting songs_2
for song in songs_lst:
    print(song.text)

MONTERO (CALL ME BY YOUR NAME)
PEACHES
BED
BODY
FRIDAY
KISS ME MORE
RAPSTAR
LITTLE BIT OF LOVE
WELLERMAN
THE BUSINESS
LET'S GO HOME TOGETHER
YOUR LOVE (9PM)
DEJA VU
FERRARI HORSES
GOOD WITHOUT
ASTRONAUT IN THE OCEAN
TITANIUM
MY HEAD & MY HEART
HEAT WAVES
SAVE YOUR TEARS
RASPUTIN
LEAVE THE DOOR OPEN
BELIEVE ME
DIDN'T KNOW
UP
CALLING MY PHONE
HOLD ON
LEVITATING
RUNAWAY
LITTLE MORE LOVE
PATIENCE
ANYWHERE AWAY FROM HERE
LATEST TRENDS
WE'RE GOOD
HEARTBREAK ANNIVERSARY
NICE TO MEET YA
DRIVERS LICENSE
STARSTRUCK
LAST TIME
BLINDING LIGHTS
WITHOUT YOU
HEAD & HEART
SUMMER 91 (LOOKING BACK)
DON'T YOU WORRY ABOUT ME
DON'T PLAY
MEDICINE
MAREA (WE'VE LOST DANCING)
GOOSEBUMPS
WAY TOO LONG
BLACK HOLE
MARTIN & GINA
ALL YOU EVER WANTED
BEAUTIFUL MISTAKES
COVER ME IN SUNSHINE
SOLID
YOU
COMMITMENT ISSUES
ADDICTED
PARADISE
GET OUT MY HEAD
HEAT
SUNSHINE (THE LIGHT)
MR PERFECTLY FINE (TAYLOR'S VERSION)
6 FOR 6
TONIGHT
STREETS
HOW DOES IT FEEL
MERCURY
ANOTHER LOVE
WATERMELON SUGAR
TRACK STAR
SOMEONE YOU LOVED

In [27]:
soup.select("tr:nth-child(2) > td:nth-child(3) > div > div.title-artist > div.artist > a")[0].text


'LIL NAS X'

In [28]:
soup.select("tr:nth-child(2) > td:nth-child(3) > div > div.title-artist > div.title > a")

[<a href="/search/singles/montero-(call-me-by-your-name)/">MONTERO (CALL ME BY YOUR NAME)</a>]

In [29]:
soup.select("tr:nth-child(2) > td:nth-child(3) > div > div.title-artist > div.title > a")[0].text

'MONTERO (CALL ME BY YOUR NAME)'

In [30]:
songs_2 = []
artists_2 = []

In [31]:
len_songs = len(songs_lst)
len_songs

100

In [32]:
len_artist = len(artist_lst)
len_artist

100

In [33]:
# reuse the selection from above instead of re-running soup.select for every item
for tag in songs_lst:
    songs_2.append(tag.text)

In [34]:
songs_2

['MONTERO (CALL ME BY YOUR NAME)',
 'PEACHES',
 'BED',
 'BODY',
 'FRIDAY',
 'KISS ME MORE',
 'RAPSTAR',
 'LITTLE BIT OF LOVE',
 'WELLERMAN',
 'THE BUSINESS',
 "LET'S GO HOME TOGETHER",
 'YOUR LOVE (9PM)',
 'DEJA VU',
 'FERRARI HORSES',
 'GOOD WITHOUT',
 'ASTRONAUT IN THE OCEAN',
 'TITANIUM',
 'MY HEAD & MY HEART',
 'HEAT WAVES',
 'SAVE YOUR TEARS',
 'RASPUTIN',
 'LEAVE THE DOOR OPEN',
 'BELIEVE ME',
 "DIDN'T KNOW",
 'UP',
 'CALLING MY PHONE',
 'HOLD ON',
 'LEVITATING',
 'RUNAWAY',
 'LITTLE MORE LOVE',
 'PATIENCE',
 'ANYWHERE AWAY FROM HERE',
 'LATEST TRENDS',
 "WE'RE GOOD",
 'HEARTBREAK ANNIVERSARY',
 'NICE TO MEET YA',
 'DRIVERS LICENSE',
 'STARSTRUCK',
 'LAST TIME',
 'BLINDING LIGHTS',
 'WITHOUT YOU',
 'HEAD & HEART',
 'SUMMER 91 (LOOKING BACK)',
 "DON'T YOU WORRY ABOUT ME",
 "DON'T PLAY",
 'MEDICINE',
 "MAREA (WE'VE LOST DANCING)",
 'GOOSEBUMPS',
 'WAY TOO LONG',
 'BLACK HOLE',
 'MARTIN & GINA',
 'ALL YOU EVER WANTED',
 'BEAUTIFUL MISTAKES',
 'COVER ME IN SUNSHINE',
 'SOLID',
 'YOU'

In [35]:
# reuse the selection from above instead of re-running soup.select for every item
for tag in artist_lst:
    artists_2.append(tag.text)

In [36]:
artists_2

['LIL NAS X',
 'JUSTIN BIEBER/CAESAR/GIVEON',
 'JOEL CORRY/RAYE/DAVID GUETTA',
 'TION WAYNE & RUSS MILLIONS',
 'RITON/NIGHTCRAWLERS/MUFASA',
 'DOJA CAT FT SZA',
 'POLO G',
 'TOM GRENNAN',
 'NATHAN EVANS/220KID/BILLEN TED',
 'TIESTO',
 'ELLA HENDERSON & TOM GRENNAN',
 'ATB/TOPIC/A7S',
 'OLIVIA RODRIGO',
 'D-BLOCK EUROPE/RAYE',
 'MIMI WEBB',
 'MASKED WOLF',
 'DAVE',
 'AVA MAX',
 'GLASS ANIMALS',
 'WEEKND',
 'MAJESTIC & BONEY M',
 'SILK SONIC/BRUNO MARS/PAAK',
 'NAVOS',
 'TOM ZANETTI',
 'CARDI B',
 'LIL TJAY & 6LACK',
 'JUSTIN BIEBER',
 'DUA LIPA',
 'AURORA',
 'AJ TRACEY',
 'KSI FT YUNGBLUD & POLO G',
 "RAG'N'BONE MAN & PINK",
 'A1 & J1',
 'DUA LIPA',
 'GIVEON',
 'WES NELSON FT YXNG BANE',
 'OLIVIA RODRIGO',
 'YEARS & YEARS',
 'BECKY HILL',
 'WEEKND',
 'KID LAROI',
 'JOEL CORRY FT MNEK',
 'NOIZU',
 'BAD BOY CHILLER CREW',
 'ANNE-MARIE/KSI/DIGITAL FARM',
 'JAMES ARTHUR',
 'FRED AGAIN & BLESSED MADONNA',
 'TRAVIS SCOTT & HVME',
 'NATHAN DAWE/ANNE-MARIE/MOSTACK',
 'GRIFF',
 'POLO G',
 "RAG'N

In [37]:
songs_offical_charts = pd.DataFrame({'songs_2':songs_2,'artists_2':artists_2})
songs_offical_charts.head(100)

|     | songs_2 | artists_2 |
|-----|---------|-----------|
| 0 | MONTERO (CALL ME BY YOUR NAME) | LIL NAS X |
| 1 | PEACHES | JUSTIN BIEBER/CAESAR/GIVEON |
| 2 | BED | JOEL CORRY/RAYE/DAVID GUETTA |
| 3 | BODY | TION WAYNE & RUSS MILLIONS |
| 4 | FRIDAY | RITON/NIGHTCRAWLERS/MUFASA |
| ... | ... | ... |
| 95 | LIFE GOES ON | PS1 FT ALEX HOSKING |
| 96 | KUKOC | AJ TRACEY FT NAV |
| 97 | HEADSHOT | LIL TJAY/POLO G/FIVIO FOREIGN |
| 98 | 34+35 | ARIANA GRANDE |


In [38]:
songs_offical_charts.head()

|     | songs_2 | artists_2 |
|-----|---------|-----------|
| 0 | MONTERO (CALL ME BY YOUR NAME) | LIL NAS X |
| 1 | PEACHES | JUSTIN BIEBER/CAESAR/GIVEON |
| 2 | BED | JOEL CORRY/RAYE/DAVID GUETTA |
| 3 | BODY | TION WAYNE & RUSS MILLIONS |
| 4 | FRIDAY | RITON/NIGHTCRAWLERS/MUFASA |


In [39]:
songs = songs.rename(columns={'song_name':'song','artists':'artist'})
songs.head()

|     | song | artist |
|-----|------|--------|
| 0 | Save Your Tears | The Weeknd & Ariana Grande |
| 1 | Leave The Door Open | Silk Sonic (Bruno Mars & Anderson .Paak) |
| 2 | Peaches | Justin Bieber Featuring Daniel Caesar & Giveon |
| 3 | Rapstar | Polo G |
| 4 | Levitating | Dua Lipa Featuring DaBaby |


In [40]:
songs_offical_charts=songs_offical_charts.rename(columns={'songs_2':'song','artists_2':'artist'})

In [41]:
frames=[songs,songs_offical_charts]

In [42]:
songs_list_merged=pd.concat(frames)

In [43]:
songs_list_merged

|     | song | artist |
|-----|------|--------|
| 0 | Save Your Tears | The Weeknd & Ariana Grande |
| 1 | Leave The Door Open | Silk Sonic (Bruno Mars & Anderson .Paak) |
| 2 | Peaches | Justin Bieber Featuring Daniel Caesar & Giveon |
| 3 | Rapstar | Polo G |
| 4 | Levitating | Dua Lipa Featuring DaBaby |
| ... | ... | ... |
| 95 | LIFE GOES ON | PS1 FT ALEX HOSKING |
| 96 | KUKOC | AJ TRACEY FT NAV |
| 97 | HEADSHOT | LIL TJAY/POLO G/FIVIO FOREIGN |
| 98 | 34+35 | ARIANA GRANDE |
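`pd.concat(frames)` keeps both original indexes, so the labels 0 to 99 appear twice, and songs that chart in both countries (for example Peaches or Rapstar) are listed twice in different casing. A sketch of one way to handle both, assuming that matching on a stripped, lower-cased title is good enough; that is a simplification, since it would also merge two different songs that happen to share a title. `merge_charts` is a hypothetical helper name.

```python
import pandas as pd

def merge_charts(*charts):
    """Stack chart DataFrames (columns: song, artist) and drop titles listed more than once."""
    merged = pd.concat(charts, ignore_index=True)  # fresh 0..n-1 index instead of 0..99 twice
    # Case-insensitive key, because the Official Charts titles are upper case.
    key = merged["song"].str.strip().str.lower()
    return merged[~key.duplicated()].reset_index(drop=True)

billboard = pd.DataFrame({"song": ["Peaches", "Rapstar"],
                          "artist": ["Justin Bieber Featuring Daniel Caesar & Giveon", "Polo G"]})
official = pd.DataFrame({"song": ["PEACHES", "BED"],
                         "artist": ["JUSTIN BIEBER/CAESAR/GIVEON", "JOEL CORRY/RAYE/DAVID GUETTA"]})
print(merge_charts(billboard, official))
```

The first occurrence wins, so Billboard's spelling is kept when a song is on both charts.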


In [44]:
songs_list_merged.to_csv('songs_top_200')  # writes the file; to_csv returns None when given a path

In [45]:
import pandas as pd 

In [46]:
songs_200 = pd.read_csv('songs_top_200')

In [47]:
songs_200

|     | Unnamed: 0 | song | artist |
|-----|------------|------|--------|
| 0 | 0 | Save Your Tears | The Weeknd & Ariana Grande |
| 1 | 1 | Leave The Door Open | Silk Sonic (Bruno Mars & Anderson .Paak) |
| 2 | 2 | Peaches | Justin Bieber Featuring Daniel Caesar & Giveon |
| 3 | 3 | Rapstar | Polo G |
| 4 | 4 | Levitating | Dua Lipa Featuring DaBaby |
| ... | ... | ... | ... |
| 195 | 95 | LIFE GOES ON | PS1 FT ALEX HOSKING |
| 196 | 96 | KUKOC | AJ TRACEY FT NAV |
| 197 | 97 | HEADSHOT | LIL TJAY/POLO G/FIVIO FOREIGN |
| 198 | 98 | 34+35 | ARIANA GRANDE |


In [119]:
songs_200_df=songs_200.drop(['Unnamed: 0'], axis = 1)
songs_200_df

|     | song | artist |
|-----|------|--------|
| 0 | Save Your Tears | The Weeknd & Ariana Grande |
| 1 | Leave The Door Open | Silk Sonic (Bruno Mars & Anderson .Paak) |
| 2 | Peaches | Justin Bieber Featuring Daniel Caesar & Giveon |
| 3 | Rapstar | Polo G |
| 4 | Levitating | Dua Lipa Featuring DaBaby |
| ... | ... | ... |
| 195 | LIFE GOES ON | PS1 FT ALEX HOSKING |
| 196 | KUKOC | AJ TRACEY FT NAV |
| 197 | HEADSHOT | LIL TJAY/POLO G/FIVIO FOREIGN |
| 198 | 34+35 | ARIANA GRANDE |
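The extra `Unnamed: 0` column that has to be dropped comes from `to_csv` writing the DataFrame index by default: reading the file back turns that index into an unnamed column. A small round trip through an in-memory buffer shows the difference (the two-row DataFrame is made up). Passing `index=False` when saving, or `index_col=0` when reading, would make the `drop` step unnecessary.

```python
import io
import pandas as pd

df = pd.DataFrame({"song": ["Peaches", "BED"], "artist": ["Justin Bieber", "Joel Corry"]})

# Default: the index is written as an unnamed first column ...
with_index = pd.read_csv(io.StringIO(df.to_csv()))
print(list(with_index.columns))     # ['Unnamed: 0', 'song', 'artist']

# ... while index=False writes only the data columns.
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))
print(list(without_index.columns))  # ['song', 'artist']
```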


In [137]:
#sian says go through the function and replace the names with names from your data - name of columns - song and artist 
#name of data frame (if changed)
#once updated - run the function 

def recommender():
    from random import randint
    #input from user
    song=input("Pick a song from the playlist ")
    if len(song)==0:
        print("Lost in thoughts?")
    else: 
        song=song.lower() #convert input to lowercase
        songs_200_df["song"]=songs_200_df["song"].apply(lambda x:x.lower())
        filter_song=songs_200_df[songs_200_df["song"]==song]
        #filter_song=songs_200_df[(songs_200_df["song"].str.lower()).str.contains(song)]# convert target to lowercase
        #check if its in the list we have
        if len(filter_song) ==0:
            print("JOP says you are lazy")

        else:
            # if song is in billboard hot 100, confirm it, recommend another random hot song
            print("That's a hot song")
            random_song = randint(0, len(songs_200_df)-1)
            print("You might also like " + songs_200_df["song"][random_song] + " by " + songs_200_df["artist"][random_song])
            
            
            
            
            
            
            

In [138]:
recommender()

Pick a song from the playlist 0
JOP says you are lazy
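Because `recommender()` reads from `input()`, it is hard to test, and it can recommend the very song the user typed. A sketch of a non-interactive variant: the function name `recommend_hot_song`, the `(song, artist)` list format and the not-found message are illustrative choices, not part of the original notebook. Passing a seeded `random.Random` makes the pick reproducible.

```python
import random

def recommend_hot_song(song, hot_songs, rng=random):
    """Return (message, recommendation) for a title typed by the user.

    hot_songs is a list of (song, artist) pairs; the recommendation is None
    when the input is empty or not in the hot list.
    """
    query = song.strip().lower()
    if not query:
        return "Lost in thoughts?", None
    titles = [title.lower() for title, _ in hot_songs]
    if query not in titles:
        return "That song isn't in the current hot list.", None
    # Exclude the song the user typed so it is never echoed back.
    candidates = [pair for pair in hot_songs if pair[0].lower() != query]
    if not candidates:
        return "That's a hot song", None
    title, artist = rng.choice(candidates)
    return "That's a hot song", f"You might also like {title} by {artist}"

hot = [("Peaches", "Justin Bieber"), ("Rapstar", "Polo G"), ("BED", "Joel Corry")]
print(recommend_hot_song("peaches", hot, random.Random(1)))
```

With the DataFrame from above, the list could be built as `list(zip(songs_200_df["song"], songs_200_df["artist"]))`.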


# Challenge three: create a Spotify song collection

# Instructions

* To move forward with the project, you need to create a collection of songs with their audio features - as large as possible!


* These are the songs that we will cluster. And, later, when the user inputs a song, we will find the cluster to which the song belongs and recommend a song from the same cluster. The more songs you have, the more accurate and diverse recommendations you'll be able to give. Although... you might want to make sure the collected songs are "curated" in a certain way. Try to find playlists of songs that are diverse, but also that meet certain standards.


* The process of sending hundreds or thousands of requests can take some time - it's normal if you have to wait a few minutes (or, if you're ambitious, even hours) to get all the data you need.


* An idea for collecting as many songs as possible is to start with all the songs of a big, diverse playlist, then go to every artist in the playlist and grab every song of every album by that artist. The number of songs you collect per playlist will grow very quickly!
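The playlist, artists, albums, tracks expansion described in the last bullet can be sketched as below. It assumes a `spotipy.Spotify` client `sp` (created in the cells below) and uses only `artist_albums`, `album_tracks` and `next`, following each page's `next` link. `collect_artist_track_ids` and `max_albums_per_artist` are hypothetical names; the cap exists because the number of requests grows quickly. By default `artist_albums` may also return compilations and "appears on" releases; check the documentation of your spotipy version for how to filter album groups.

```python
def collect_artist_track_ids(sp, artist_ids, max_albums_per_artist=None):
    """Collect track IDs from every album of each artist (sp: a spotipy.Spotify client)."""
    track_ids = []
    seen = set()
    for artist_id in artist_ids:
        # Page through all albums of this artist.
        albums = []
        page = sp.artist_albums(artist_id)
        while page:
            albums.extend(page["items"])
            page = sp.next(page) if page["next"] else None
        if max_albums_per_artist is not None:
            albums = albums[:max_albums_per_artist]
        # Page through the tracks of each album.
        for album in albums:
            page = sp.album_tracks(album["id"])
            while page:
                for track in page["items"]:
                    # Two artists from the playlist can share an album, so skip repeats.
                    if track["id"] not in seen:
                        seen.add(track["id"])
                        track_ids.append(track["id"])
                page = sp.next(page) if page["next"] else None
    return track_ids
```

The artist IDs would come from the playlist items, e.g. `artist["id"] for item in playlist["items"] for artist in item["track"]["artists"]`.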

In [51]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import getpass
import pandas as pd

In [52]:
client_id = getpass.getpass('client_id')
client_secret = getpass.getpass('client_secret')

client_id········
client_secret········


In [53]:

# use the credentials entered above instead of pasting them into the notebook
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=client_id,
                                                          client_secret=client_secret))
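Hard-coding a client ID and secret in a notebook exposes them to anyone who sees it; keys that have been shared should be regenerated in the Spotify developer dashboard. An alternative to typing them in each session is to read them from environment variables. `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` are the names spotipy itself looks for: `SpotifyClientCredentials()` called without arguments reads them. The helper below is a hypothetical sketch that fails with a readable message when they are missing.

```python
import os

def load_spotify_credentials(env=os.environ):
    """Return (client_id, client_secret) from SPOTIPY_CLIENT_ID / SPOTIPY_CLIENT_SECRET."""
    names = ("SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError("Set these environment variables first: " + ", ".join(missing))
    return env["SPOTIPY_CLIENT_ID"], env["SPOTIPY_CLIENT_SECRET"]
```

Usage: `client_id, client_secret = load_spotify_credentials()`, then pass both to `SpotifyClientCredentials` as in the cell above.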

# Explore Drake

In [54]:
results = sp.search(q="Drake", limit=50)

In [55]:
type(results)

dict

In [56]:
results.keys()

dict_keys(['tracks'])

In [57]:
results

{'tracks': {'href': 'https://api.spotify.com/v1/search?query=Drake&type=track&offset=0&limit=50',
  'items': [{'album': {'album_type': 'single',
     'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/3TVXtAsR1Inumwj472S9r4'},
       'href': 'https://api.spotify.com/v1/artists/3TVXtAsR1Inumwj472S9r4',
       'id': '3TVXtAsR1Inumwj472S9r4',
       'name': 'Drake',
       'type': 'artist',
       'uri': 'spotify:artist:3TVXtAsR1Inumwj472S9r4'}],
     'available_markets': ['AD',
      'AE',
      'AG',
      'AL',
      'AM',
      'AO',
      'AR',
      'AT',
      'AU',
      'AZ',
      'BA',
      'BB',
      'BD',
      'BE',
      'BF',
      'BG',
      'BH',
      'BI',
      'BJ',
      'BN',
      'BO',
      'BR',
      'BS',
      'BT',
      'BW',
      'BY',
      'BZ',
      'CA',
      'CH',
      'CI',
      'CL',
      'CM',
      'CO',
      'CR',
      'CV',
      'CW',
      'CY',
      'CZ',
      'DE',
      'DJ',
      'DK',
      'DM',
   

In [58]:
results['tracks'].keys()

dict_keys(['href', 'items', 'limit', 'next', 'offset', 'previous', 'total'])

In [59]:
results['tracks']['href']

'https://api.spotify.com/v1/search?query=Drake&type=track&offset=0&limit=50'

In [60]:
results["tracks"]["items"]

[{'album': {'album_type': 'single',
   'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/3TVXtAsR1Inumwj472S9r4'},
     'href': 'https://api.spotify.com/v1/artists/3TVXtAsR1Inumwj472S9r4',
     'id': '3TVXtAsR1Inumwj472S9r4',
     'name': 'Drake',
     'type': 'artist',
     'uri': 'spotify:artist:3TVXtAsR1Inumwj472S9r4'}],
   'available_markets': ['AD',
    'AE',
    'AG',
    'AL',
    'AM',
    'AO',
    'AR',
    'AT',
    'AU',
    'AZ',
    'BA',
    'BB',
    'BD',
    'BE',
    'BF',
    'BG',
    'BH',
    'BI',
    'BJ',
    'BN',
    'BO',
    'BR',
    'BS',
    'BT',
    'BW',
    'BY',
    'BZ',
    'CA',
    'CH',
    'CI',
    'CL',
    'CM',
    'CO',
    'CR',
    'CV',
    'CW',
    'CY',
    'CZ',
    'DE',
    'DJ',
    'DK',
    'DM',
    'DO',
    'DZ',
    'EC',
    'EE',
    'EG',
    'ES',
    'FI',
    'FJ',
    'FM',
    'FR',
    'GA',
    'GB',
    'GD',
    'GE',
    'GH',
    'GM',
    'GN',
    'GQ',
    'GR',
    'GT',
    'GW'

In [61]:
results["tracks"]['limit']

50

In [62]:
results['tracks']['next']

'https://api.spotify.com/v1/search?query=Drake&type=track&offset=50&limit=50'

In [63]:
results['tracks']['offset']

0

In [64]:
results['tracks']['previous']

In [65]:
results['tracks']['total']

29784
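`total` (29,784) is far more than the 50 items one call returns: `limit` is the page size, `offset` the position of the first item, and `next` is the same query with `offset` advanced by `limit`. The arithmetic can be sketched as below. The search endpoint only lets you page so deep; the 1,000-result cap used here is an assumption to verify against the current Spotify Web API documentation.

```python
def page_offsets(total, limit=50, max_results=1000):
    """Offsets needed to page through `total` results, `limit` at a time.

    max_results is an assumed cap on how deep a search can page.
    """
    reachable = min(total, max_results)
    return list(range(0, reachable, limit))

offsets = page_offsets(29784)                   # total reported for "Drake" above
print(len(offsets), offsets[:3], offsets[-1])   # 20 [0, 50, 100] 950
```

Each offset would then be passed as `sp.search(q="Drake", limit=50, offset=offset)`.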

# Explore items

In [66]:
len(results['tracks']['items'])

50

In [67]:
results['tracks']['items'][0].keys()

dict_keys(['album', 'artists', 'available_markets', 'disc_number', 'duration_ms', 'explicit', 'external_ids', 'external_urls', 'href', 'id', 'is_local', 'name', 'popularity', 'preview_url', 'track_number', 'type', 'uri'])

In [68]:
results['tracks']['items'][0]['album'].keys()

dict_keys(['album_type', 'artists', 'available_markets', 'external_urls', 'href', 'id', 'images', 'name', 'release_date', 'release_date_precision', 'total_tracks', 'type', 'uri'])

In [69]:
results['tracks']['items'][0]['album']['artists']

[{'external_urls': {'spotify': 'https://open.spotify.com/artist/3TVXtAsR1Inumwj472S9r4'},
  'href': 'https://api.spotify.com/v1/artists/3TVXtAsR1Inumwj472S9r4',
  'id': '3TVXtAsR1Inumwj472S9r4',
  'name': 'Drake',
  'type': 'artist',
  'uri': 'spotify:artist:3TVXtAsR1Inumwj472S9r4'}]

In [70]:
results['tracks']['items'][0]['album']['id']

'5LuoozUhs2pl3glZeAJl89'

In [71]:
results['tracks']['items'][0]['album']['name']

'Scary Hours 2'

In [72]:
results['tracks']['items'][0]['album']['release_date']

'2021-03-05'

In [73]:
results['tracks']['items'][0]['album']['total_tracks']

3

In [74]:
results['tracks']['items'][0]['artists']

[{'external_urls': {'spotify': 'https://open.spotify.com/artist/3TVXtAsR1Inumwj472S9r4'},
  'href': 'https://api.spotify.com/v1/artists/3TVXtAsR1Inumwj472S9r4',
  'id': '3TVXtAsR1Inumwj472S9r4',
  'name': 'Drake',
  'type': 'artist',
  'uri': 'spotify:artist:3TVXtAsR1Inumwj472S9r4'},
 {'external_urls': {'spotify': 'https://open.spotify.com/artist/5f7VJjfbwm532GiveGC0ZK'},
  'href': 'https://api.spotify.com/v1/artists/5f7VJjfbwm532GiveGC0ZK',
  'id': '5f7VJjfbwm532GiveGC0ZK',
  'name': 'Lil Baby',
  'type': 'artist',
  'uri': 'spotify:artist:5f7VJjfbwm532GiveGC0ZK'}]

In [75]:
results['tracks']['items'][0]['id']

'65OVbaJR5O1RmwOQx0875b'

In [76]:
results['tracks']['items'][0]['name']

'Wants and Needs (feat. Lil Baby)'

In [77]:
results['tracks']['items'][0]['popularity']

88

In [78]:
results['tracks']['items'][0]['uri']

'spotify:track:65OVbaJR5O1RmwOQx0875b'
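The fields explored above can be flattened into one table row per track. A sketch, assuming a full track object as returned by search (or `item['track']` in a playlist); album-track listings return simplified objects without `album` or `popularity`, so this helper would not apply to them as written. `track_record` is a hypothetical name, and the sample item reuses values shown above.

```python
def track_record(item):
    """Flatten one full Spotify track object into a flat dict (one table row)."""
    return {
        "id": item["id"],
        "name": item["name"],
        # A track can credit several artists (e.g. Drake and Lil Baby).
        "artists": ", ".join(artist["name"] for artist in item["artists"]),
        "album": item["album"]["name"],
        "release_date": item["album"]["release_date"],
        "popularity": item["popularity"],
        "uri": item["uri"],
    }

sample = {
    "id": "65OVbaJR5O1RmwOQx0875b",
    "name": "Wants and Needs (feat. Lil Baby)",
    "artists": [{"name": "Drake"}, {"name": "Lil Baby"}],
    "album": {"name": "Scary Hours 2", "release_date": "2021-03-05"},
    "popularity": 88,
    "uri": "spotify:track:65OVbaJR5O1RmwOQx0875b",
}
print(track_record(sample))
```

With the search results above, `[track_record(item) for item in results['tracks']['items']]` gives a list of rows ready for `pd.DataFrame`.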

# Playlist

In [81]:
playlist = sp.user_playlist_tracks("The Swinging Turtle", "5TZkls9cEOzWDR6qCxwDot")

In [82]:
playlist.keys()

dict_keys(['href', 'items', 'limit', 'next', 'offset', 'previous', 'total'])

In [83]:
len(playlist['items'])

100
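
`len(playlist['items'])` is 100, which equals the default page `limit` of `user_playlist_tracks`. If the playlist is longer (compare with `playlist['total']`), the remaining tracks sit on later pages, linked through `next`. Below is a sketch that follows the pages with spotipy's `Spotify.next()` until `next` is `None`. The helper name `collect_track_ids` is hypothetical, and skipping entries with no track or no ID (such as local files) is a defensive assumption:

```python
def collect_track_ids(sp, first_page):
    """Gather track IDs from every page of a paged playlist-tracks response.

    `sp` is assumed to be a spotipy.Spotify client; sp.next(page) fetches
    the page that page['next'] points to.
    """
    ids = []
    page = first_page
    while page is not None:
        for item in page["items"]:
            track = item.get("track")
            # Skip empty entries and tracks without an ID (e.g. local files)
            if track and track.get("id"):
                ids.append(track["id"])
        # 'next' is None on the last page, which ends the loop
        page = sp.next(page) if page["next"] else None
    return ids
```

Usage: `Hiphop = collect_track_ids(sp, playlist)`.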

In [84]:
# Print the ID of every track in the playlist
for item in playlist['items']:
    print(item['track']['id'])

# Collect the same IDs in a list for the audio-features request
Hiphop = [item['track']['id'] for item in playlist['items']]

43PGPuHIlVOc04jrZVh9L6
7ytR5pFWmSjzHJIeQkgog4
65OVbaJR5O1RmwOQx0875b
5vGLcdRuSbUhD8ScwsGSdA
2r6OAV3WsYtXuXjvJ1lIDi
0PvFJmanyNQMseIFrU708S
3aQem4jVGdhtg116TmJnHz
5Kskr9LcNYa0tpt5f0ZEJx
4TIqzdAssasqx3DAe6cG9J
285pBltuF7vW8TeWk8hdRR
0e7ipj03S05BNilyu5bRzt
5yY9lUy8nbvjM1Uyo1Uqoc
6gBFPUFcJLzWGx4lenP6h2
2SAqBLGA283SUiwJ3xOUVI
2Ox2c1WEJDeQCHTXPE3YKM
6cQ08IpBxxfGcSKxqE3NmB
17vGPZ5EsdvtgAOCD4FLWI
3fUKv3kIexMFOmYsUYvbXJ
3eekarcy7kvN4yt5ZFzltW
6rTInqW3YECMkQsBEHw4sd
4Iedi94TIaB2GGb1nMB68v
6Qfcmquht50gQWbUbXiDnf
7o4gBbTM6UBLkOYPw9xMCz
6toQdWWc4noiOk3Eo5mVDS
0tIHHwV2eL10rpQ1fiyDjz
4oYH9ASuD0zMUpGKiyhEBf
02kDW379Yfd5PzW5A6vuGt
2nPqkrxKkDu71sYiHDFPjn
0rozVEMymlZH9dvu2jFr8M
6uFn47ACjqYkc0jADwEdj1
1dg3qy5DjoJodawfOCgrTP
0k7wmahjkn389wAZdz19Cv
5YEOzOojehCqxGQCcQiyR4
3RHCJjmc6dZycqVUqYZgLI
62vpWI1CHwFy7tMIcSStl8
1UooOpZ9LTGd3WEgbeioy7
5DHKtFkoRokYppclmGhQEX
5SWnsxjhdcEDc7LJjq9UHk
1KWdCGCAnSJync4NlTu8Ei
78QR3Wp35dqAhFEc2qAGjE
0mZcIuwAUL7sI6amRIDxuQ
0nbXyq5TXYPCO7pr3N8S4I
5axuUkBGhxBjXfi2f4DTyx
7GX5flRQZVH

In [85]:
Hiphop

['43PGPuHIlVOc04jrZVh9L6',
 '7ytR5pFWmSjzHJIeQkgog4',
 '65OVbaJR5O1RmwOQx0875b',
 '5vGLcdRuSbUhD8ScwsGSdA',
 '2r6OAV3WsYtXuXjvJ1lIDi',
 '0PvFJmanyNQMseIFrU708S',
 '3aQem4jVGdhtg116TmJnHz',
 '5Kskr9LcNYa0tpt5f0ZEJx',
 '4TIqzdAssasqx3DAe6cG9J',
 '285pBltuF7vW8TeWk8hdRR',
 '0e7ipj03S05BNilyu5bRzt',
 '5yY9lUy8nbvjM1Uyo1Uqoc',
 '6gBFPUFcJLzWGx4lenP6h2',
 '2SAqBLGA283SUiwJ3xOUVI',
 '2Ox2c1WEJDeQCHTXPE3YKM',
 '6cQ08IpBxxfGcSKxqE3NmB',
 '17vGPZ5EsdvtgAOCD4FLWI',
 '3fUKv3kIexMFOmYsUYvbXJ',
 '3eekarcy7kvN4yt5ZFzltW',
 '6rTInqW3YECMkQsBEHw4sd',
 '4Iedi94TIaB2GGb1nMB68v',
 '6Qfcmquht50gQWbUbXiDnf',
 '7o4gBbTM6UBLkOYPw9xMCz',
 '6toQdWWc4noiOk3Eo5mVDS',
 '0tIHHwV2eL10rpQ1fiyDjz',
 '4oYH9ASuD0zMUpGKiyhEBf',
 '02kDW379Yfd5PzW5A6vuGt',
 '2nPqkrxKkDu71sYiHDFPjn',
 '0rozVEMymlZH9dvu2jFr8M',
 '6uFn47ACjqYkc0jADwEdj1',
 '1dg3qy5DjoJodawfOCgrTP',
 '0k7wmahjkn389wAZdz19Cv',
 '5YEOzOojehCqxGQCcQiyR4',
 '3RHCJjmc6dZycqVUqYZgLI',
 '62vpWI1CHwFy7tMIcSStl8',
 '1UooOpZ9LTGd3WEgbeioy7',
 '5DHKtFkoRokYppclmGhQEX',
 

In [86]:
# these are the audio features for all songs in the playlist
sp.audio_features(tracks=Hiphop)

[{'danceability': 0.789,
  'energy': 0.536,
  'key': 6,
  'loudness': -6.862,
  'mode': 1,
  'speechiness': 0.242,
  'acousticness': 0.41,
  'instrumentalness': 0,
  'liveness': 0.129,
  'valence': 0.437,
  'tempo': 81.039,
  'type': 'audio_features',
  'id': '43PGPuHIlVOc04jrZVh9L6',
  'uri': 'spotify:track:43PGPuHIlVOc04jrZVh9L6',
  'track_href': 'https://api.spotify.com/v1/tracks/43PGPuHIlVOc04jrZVh9L6',
  'analysis_url': 'https://api.spotify.com/v1/audio-analysis/43PGPuHIlVOc04jrZVh9L6',
  'duration_ms': 165926,
  'time_signature': 4},
 {'danceability': 0.746,
  'energy': 0.69,
  'key': 11,
  'loudness': -7.956,
  'mode': 1,
  'speechiness': 0.164,
  'acousticness': 0.247,
  'instrumentalness': 0,
  'liveness': 0.101,
  'valence': 0.497,
  'tempo': 89.977,
  'type': 'audio_features',
  'id': '7ytR5pFWmSjzHJIeQkgog4',
  'uri': 'spotify:track:7ytR5pFWmSjzHJIeQkgog4',
  'track_href': 'https://api.spotify.com/v1/tracks/7ytR5pFWmSjzHJIeQkgog4',
  'analysis_url': 'https://api.spotify.com

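The Spotify audio-features endpoint accepts at most 100 track IDs per request. The single call above works because this playlist has exactly 100 tracks. Once the song pool grows beyond that, the features have to be requested in chunks. A sketch, assuming `sp` is an authenticated `spotipy.Spotify` client; `fetch_audio_features` is a hypothetical helper name, and `None` entries (which the API can return for IDs it has no features for) are dropped:

```python
import pandas as pd


def fetch_audio_features(sp, track_ids, batch_size=100):
    """Request audio features in batches of at most 100 IDs (the endpoint's limit)."""
    rows = []
    for start in range(0, len(track_ids), batch_size):
        batch = track_ids[start:start + batch_size]
        # Entries can be None when Spotify has no features for an ID
        rows.extend(f for f in sp.audio_features(tracks=batch) if f is not None)
    return pd.DataFrame(rows)
```
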
# Unsupervised learning: predicting song IDs using clusters



1. Determine the audio features of the input song by querying the Spotify API with it. Store the result in a dataframe or numpy array (if the latter, keep track of which element in the array corresponds to which feature). Don't spend too much time implementing guesswork in your program (for example, checking whether the user typed the song correctly). To get a working recommender ready, first assume that the user enters a track name and/or an artist that exactly matches how they are stored in the Spotify database.

2. Take the large collection of songs you have gathered, together with their audio features, and train a k-means model on them, similar to how we did in the lecture (scaling is important!). Keep in mind that it doesn't make sense to train a model on artist names, titles or track IDs. However, you need to be able to reassign the correct track IDs to the rows later, after the model has determined the cluster IDs. You can do this by adding the track ID back to the dataframe that contains the cluster IDs, but only if you haven't shuffled the rows. You can optimize the parameters a bit, but for a first iteration of your model, just go with the default values of KMeans and stick with them. Improve the model later by tuning the parameters.

3. "Predicting" which cluster ID a song belongs to works like this: take the audio features of the song and put them into a numpy array, say `song_array`, with one row and one column per feature. The order of the features must exactly match the order the model learned, and the array must be transformed with the same fitted scaler that was used for training. Call the fitted model with `kmeans.predict(song_array)` to make the prediction. The outcome is a single cluster ID.
The recommendation then works like this: take the predicted cluster ID and subset your Spotify dataframe (the one from step 2, with the cluster IDs added back) to that cluster, e.g. `df[df["cluster_id"] == 2]`, then return a random song from that subset. Take a look at `pandas.DataFrame.sample`.
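
A minimal end-to-end sketch of the three steps, under these assumptions: `sp` is an authenticated `spotipy.Spotify` client, the input track and artist match Spotify exactly (as step 1 allows), and `songs` is a DataFrame of audio features with an `id` column, like `audio_features_hiphop`. It also relies on Spotify's `track:`/`artist:` search filters. The function names and the `FEATURES` list are hypothetical. KMeans keeps its default `n_clusters=8`, in line with step 2.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Feature columns in a fixed order; training and prediction must share it
FEATURES = ["danceability", "energy", "key", "loudness", "mode", "speechiness",
            "acousticness", "instrumentalness", "liveness", "valence", "tempo"]


def get_song_features(sp, track, artist):
    """Step 1: look up the input song and return its features as a 1-row DataFrame."""
    results = sp.search(q=f"track:{track} artist:{artist}", type="track", limit=1)
    items = results["tracks"]["items"]
    if not items:
        raise ValueError(f"No Spotify match for {track!r} by {artist!r}")
    features = sp.audio_features(tracks=[items[0]["id"]])[0]
    return pd.DataFrame([features])[FEATURES]


def fit_clusters(songs, n_clusters=8, random_state=0):
    """Step 2: scale the features, fit k-means, and attach cluster IDs to the rows."""
    scaler = StandardScaler().fit(songs[FEATURES])
    kmeans = KMeans(n_clusters=n_clusters, random_state=random_state)
    kmeans.fit(scaler.transform(songs[FEATURES]))
    # Rows were never shuffled, so labels_ lines up with the track IDs
    clustered = songs.assign(cluster_id=kmeans.labels_)
    return scaler, kmeans, clustered


def recommend(song_features, scaler, kmeans, clustered, random_state=None):
    """Step 3: predict the song's cluster and return a random track ID from it."""
    # Reuse the fitted scaler: the model only understands scaled features
    cluster = kmeans.predict(scaler.transform(song_features[FEATURES]))[0]
    same_cluster = clustered[clustered["cluster_id"] == cluster]
    return same_cluster.sample(1, random_state=random_state)["id"].iloc[0]
```

Typical use: `scaler, kmeans, clustered = fit_clusters(songs)` once, then `recommend(get_song_features(sp, track, artist), scaler, kmeans, clustered)` for every user input.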

In [87]:
audio_features_hiphop = pd.DataFrame(sp.audio_features(tracks=Hiphop))

In [88]:
audio_features_hiphop.head(100)

,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,0.789,0.536,6,-6.862,1,0.2420,0.4100,0.000000,0.1290,0.437,81.039,audio_features,43PGPuHIlVOc04jrZVh9L6,spotify:track:43PGPuHIlVOc04jrZVh9L6,https://api.spotify.com/v1/tracks/43PGPuHIlVOc...,https://api.spotify.com/v1/audio-analysis/43PG...,165926,4
1,0.746,0.690,11,-7.956,1,0.1640,0.2470,0.000000,0.1010,0.497,89.977,audio_features,7ytR5pFWmSjzHJIeQkgog4,spotify:track:7ytR5pFWmSjzHJIeQkgog4,https://api.spotify.com/v1/tracks/7ytR5pFWmSjz...,https://api.spotify.com/v1/audio-analysis/7ytR...,181733,4
2,0.578,0.449,1,-6.349,1,0.2860,0.0618,0.000002,0.1190,0.100,136.006,audio_features,65OVbaJR5O1RmwOQx0875b,spotify:track:65OVbaJR5O1RmwOQx0875b,https://api.spotify.com/v1/tracks/65OVbaJR5O1R...,https://api.spotify.com/v1/audio-analysis/65OV...,192956,4
3,0.719,0.648,3,-7.600,0,0.1250,0.2000,0.000000,0.1270,0.660,140.201,audio_features,5vGLcdRuSbUhD8ScwsGSdA,spotify:track:5vGLcdRuSbUhD8ScwsGSdA,https://api.spotify.com/v1/tracks/5vGLcdRuSbUh...,https://api.spotify.com/v1/audio-analysis/5vGL...,123263,4
4,0.905,0.647,10,-5.065,0,0.1070,0.0187,0.000000,0.2820,0.367,130.970,audio_features,2r6OAV3WsYtXuXjvJ1lIDi,spotify:track:2r6OAV3WsYtXuXjvJ1lIDi,https://api.spotify.com/v1/tracks/2r6OAV3WsYtX...,https://api.spotify.com/v1/audio-analysis/2r6O...,190534,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,0.867,0.744,2,-5.171,1,0.2270,0.2680,0.000000,0.0713,0.645,84.005,audio_features,4cSSL3YafYjM3yjgFO1vJg,spotify:track:4cSSL3YafYjM3yjgFO1vJg,https://api.spotify.com/v1/tracks/4cSSL3YafYjM...,https://api.spotify.com/v1/audio-analysis/4cSS...,173288,4
96,0.901,0.464,5,-9.789,0,0.0645,0.3680,0.000017,0.2380,0.638,109.004,audio_features,1fewSx2d5KIZ04wsooEBOz,spotify:track:1fewSx2d5KIZ04wsooEBOz,https://api.spotify.com/v1/tracks/1fewSx2d5KIZ...,https://api.spotify.com/v1/audio-analysis/1few...,203267,4
97,0.763,0.532,0,-11.322,1,0.2600,0.1250,0.000006,0.0907,0.276,145.062,audio_features,0hvkWQjfuTO8bbPwOWRuD3,spotify:track:0hvkWQjfuTO8bbPwOWRuD3,https://api.spotify.com/v1/tracks/0hvkWQjfuTO8...,https://api.spotify.com/v1/audio-analysis/0hvk...,162000,4
98,0.884,0.346,8,-8.228,0,0.3510,0.0151,0.000007,0.0871,0.376,75.016,audio_features,2fQrGHiQOvpL9UgPvtYy6G,spotify:track:2fQrGHiQOvpL9UgPvtYy6G,https://api.spotify.com/v1/tracks/2fQrGHiQOvpL...,https://api.spotify.com/v1/audio-analysis/2fQr...,220307,4


In [89]:
audio_features_hiphop.describe()

,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature
count,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0
mean,0.77161,0.57342,5.17,-7.25047,0.5,0.218112,0.15567,0.002493,0.166935,0.427733,126.10674,191768.43,3.95
std,0.118308,0.114526,3.95187,2.020845,0.502519,0.132139,0.164359,0.012177,0.117884,0.191049,28.360808,37431.058319,0.35887
min,0.456,0.298,0.0,-12.326,0.0,0.0317,0.000407,0.0,0.0565,0.0605,74.722,115200.0,1.0
25%,0.7075,0.49825,1.0,-8.73325,0.0,0.0969,0.029425,0.0,0.10175,0.28075,103.82425,166414.5,4.0
50%,0.7875,0.5735,5.0,-6.8605,0.5,0.2015,0.0933,0.0,0.1215,0.4275,132.0155,190193.0,4.0
75%,0.868,0.643,8.0,-5.81575,1.0,0.315,0.23575,2e-06,0.1715,0.547,144.116,216306.5,4.0
max,0.963,0.869,11.0,-3.37,1.0,0.53,0.703,0.101,0.79,0.961,202.015,312820.0,5.0
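
The `describe()` output shows why scaling matters before k-means. `tempo` (standard deviation about 28) and `key` (about 4) vary far more than `danceability`, `energy` or `valence` (about 0.1 to 0.2), so unscaled Euclidean distances would be dominated by a few columns. `StandardScaler` turns each column into a z-score. As a worked example, here is the first track's tempo with the mean and standard deviation from the table above:

$$
z = \frac{x - \mu}{\sigma}, \qquad
z_{\text{tempo}} = \frac{81.039 - 126.10674}{28.360808} \approx -1.589
$$

`StandardScaler` divides by the population standard deviation (`ddof=0`), whereas `describe()` reports the sample standard deviation (`ddof=1`). With n = 100 rows, the corresponding value in `X_prep` is therefore slightly different, about -1.597.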


In [90]:
audio_features_hiphop.keys()

Index(['danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness',
       'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo',
       'type', 'id', 'uri', 'track_href', 'analysis_url', 'duration_ms',
       'time_signature'],
      dtype='object')

In [91]:
print(audio_features_hiphop['danceability'])

0     0.789
1     0.746
2     0.578
3     0.719
4     0.905
      ...  
95    0.867
96    0.901
97    0.763
98    0.884
99    0.590
Name: danceability, Length: 100, dtype: float64


In [92]:
print(audio_features_hiphop['energy'])

0     0.536
1     0.690
2     0.449
3     0.648
4     0.647
      ...  
95    0.744
96    0.464
97    0.532
98    0.346
99    0.618
Name: energy, Length: 100, dtype: float64


In [93]:
print(audio_features_hiphop['key'])

0      6
1     11
2      1
3      3
4     10
      ..
95     2
96     5
97     0
98     8
99     1
Name: key, Length: 100, dtype: int64


In [94]:
print(audio_features_hiphop['loudness'])  

0     -6.862
1     -7.956
2     -6.349
3     -7.600
4     -5.065
       ...  
95    -5.171
96    -9.789
97   -11.322
98    -8.228
99    -5.756
Name: loudness, Length: 100, dtype: float64


In [95]:
print(audio_features_hiphop['mode']) 

0     1
1     1
2     1
3     0
4     0
     ..
95    1
96    0
97    1
98    0
99    1
Name: mode, Length: 100, dtype: int64


In [96]:
print(audio_features_hiphop['speechiness']) 

0     0.2420
1     0.1640
2     0.2860
3     0.1250
4     0.1070
       ...  
95    0.2270
96    0.0645
97    0.2600
98    0.3510
99    0.3340
Name: speechiness, Length: 100, dtype: float64


In [97]:
print(audio_features_hiphop['acousticness']) 

0     0.4100
1     0.2470
2     0.0618
3     0.2000
4     0.0187
       ...  
95    0.2680
96    0.3680
97    0.1250
98    0.0151
99    0.0127
Name: acousticness, Length: 100, dtype: float64


In [98]:
print(audio_features_hiphop['instrumentalness']) 

0     0.000000
1     0.000000
2     0.000002
3     0.000000
4     0.000000
        ...   
95    0.000000
96    0.000017
97    0.000006
98    0.000007
99    0.000000
Name: instrumentalness, Length: 100, dtype: float64


In [99]:
print(audio_features_hiphop['liveness']) 

0     0.1290
1     0.1010
2     0.1190
3     0.1270
4     0.2820
       ...  
95    0.0713
96    0.2380
97    0.0907
98    0.0871
99    0.2440
Name: liveness, Length: 100, dtype: float64


In [100]:
print(audio_features_hiphop['valence']) 

0     0.437
1     0.497
2     0.100
3     0.660
4     0.367
      ...  
95    0.645
96    0.638
97    0.276
98    0.376
99    0.153
Name: valence, Length: 100, dtype: float64


In [101]:
print(audio_features_hiphop['tempo']) 

0      81.039
1      89.977
2     136.006
3     140.201
4     130.970
       ...   
95     84.005
96    109.004
97    145.062
98     75.016
99     96.459
Name: tempo, Length: 100, dtype: float64


In [102]:
print(audio_features_hiphop['type']) 

0     audio_features
1     audio_features
2     audio_features
3     audio_features
4     audio_features
           ...      
95    audio_features
96    audio_features
97    audio_features
98    audio_features
99    audio_features
Name: type, Length: 100, dtype: object


In [103]:
audio_features_hiphop.columns

Index(['danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness',
       'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo',
       'type', 'id', 'uri', 'track_href', 'analysis_url', 'duration_ms',
       'time_signature'],
      dtype='object')

In [105]:
feats = audio_features_hiphop[['danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness',
       'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo',
       'type', 'id']]

In [106]:
feats

,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id
0,0.789,0.536,6,-6.862,1,0.2420,0.4100,0.000000,0.1290,0.437,81.039,audio_features,43PGPuHIlVOc04jrZVh9L6
1,0.746,0.690,11,-7.956,1,0.1640,0.2470,0.000000,0.1010,0.497,89.977,audio_features,7ytR5pFWmSjzHJIeQkgog4
2,0.578,0.449,1,-6.349,1,0.2860,0.0618,0.000002,0.1190,0.100,136.006,audio_features,65OVbaJR5O1RmwOQx0875b
3,0.719,0.648,3,-7.600,0,0.1250,0.2000,0.000000,0.1270,0.660,140.201,audio_features,5vGLcdRuSbUhD8ScwsGSdA
4,0.905,0.647,10,-5.065,0,0.1070,0.0187,0.000000,0.2820,0.367,130.970,audio_features,2r6OAV3WsYtXuXjvJ1lIDi
...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,0.867,0.744,2,-5.171,1,0.2270,0.2680,0.000000,0.0713,0.645,84.005,audio_features,4cSSL3YafYjM3yjgFO1vJg
96,0.901,0.464,5,-9.789,0,0.0645,0.3680,0.000017,0.2380,0.638,109.004,audio_features,1fewSx2d5KIZ04wsooEBOz
97,0.763,0.532,0,-11.322,1,0.2600,0.1250,0.000006,0.0907,0.276,145.062,audio_features,0hvkWQjfuTO8bbPwOWRuD3
98,0.884,0.346,8,-8.228,0,0.3510,0.0151,0.000007,0.0871,0.376,75.016,audio_features,2fQrGHiQOvpL9UgPvtYy6G


In [107]:
feats_to_cluster = feats.drop(['type','id'], axis=1)

In [108]:
feats_to_cluster

,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
0,0.789,0.536,6,-6.862,1,0.2420,0.4100,0.000000,0.1290,0.437,81.039
1,0.746,0.690,11,-7.956,1,0.1640,0.2470,0.000000,0.1010,0.497,89.977
2,0.578,0.449,1,-6.349,1,0.2860,0.0618,0.000002,0.1190,0.100,136.006
3,0.719,0.648,3,-7.600,0,0.1250,0.2000,0.000000,0.1270,0.660,140.201
4,0.905,0.647,10,-5.065,0,0.1070,0.0187,0.000000,0.2820,0.367,130.970
...,...,...,...,...,...,...,...,...,...,...,...
95,0.867,0.744,2,-5.171,1,0.2270,0.2680,0.000000,0.0713,0.645,84.005
96,0.901,0.464,5,-9.789,0,0.0645,0.3680,0.000017,0.2380,0.638,109.004
97,0.763,0.532,0,-11.322,1,0.2600,0.1250,0.000006,0.0907,0.276,145.062
98,0.884,0.346,8,-8.228,0,0.3510,0.0151,0.000007,0.0871,0.376,75.016


In [112]:
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Fit the scaler on the numeric audio features only (type and id were dropped)
scaler = StandardScaler().fit(feats_to_cluster)

In [114]:
X_prep = scaler.transform(feats_to_cluster)

In [116]:
# Fit k-means with 7 clusters; random_state fixes the initialisation so the labels are reproducible
kmeans = KMeans(n_clusters=7, random_state=3)
kmeans.fit(X_prep)
kmeans.predict(X_prep)

array([4, 1, 5, 1, 1, 1, 2, 6, 4, 4, 2, 1, 1, 5, 6, 1, 2, 6, 6, 6, 6, 4,
       6, 5, 6, 5, 1, 2, 6, 5, 1, 5, 0, 6, 4, 6, 5, 3, 2, 0, 2, 0, 6, 1,
       1, 5, 4, 1, 5, 1, 2, 2, 6, 1, 4, 1, 2, 5, 0, 2, 6, 6, 5, 2, 6, 5,
       2, 4, 2, 1, 2, 6, 1, 1, 1, 5, 5, 1, 1, 6, 6, 2, 1, 2, 1, 1, 1, 4,
       2, 2, 5, 3, 4, 2, 1, 1, 6, 5, 6, 2], dtype=int32)
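
The notebook fixes `n_clusters=7`, while the instructions suggest starting from the defaults and tuning later. One common heuristic for choosing k is the elbow method. Fit k-means for a range of k, record the inertia (the within-cluster sum of squared distances, `KMeans.inertia_`), and look for the k where it stops dropping sharply. A sketch, assuming `X_prep` is the scaled matrix from above; the helper name `inertia_by_k` is hypothetical, and `n_init=10` is set explicitly so results do not depend on the scikit-learn version's default:

```python
from sklearn.cluster import KMeans


def inertia_by_k(X, k_values, random_state=3):
    """Fit k-means for each k and return {k: inertia} for an elbow plot."""
    inertias = {}
    for k in k_values:
        model = KMeans(n_clusters=k, random_state=random_state, n_init=10)
        inertias[k] = model.fit(X).inertia_
    return inertias
```

For example, `inertia_by_k(X_prep, range(2, 15))` can be plotted as k against inertia. With only 100 songs, the curve may not show a clear elbow, which is one more reason to collect a larger song pool.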