# Predicting which songs I am most likely to enjoy based on my listening history
In this project we will be using Spotify's API to gather data about my listening habits and analyze what makes my favorite songs good for me. I will then web-scrape billboard's top 100 songs and analyze their songs, and with the help of machine learning, hopefully predict the top 3 songs I am most likely to enjoy. From there, I want to create a playlist that will add 3 songs weekly, and hopefully get me back to music. 

## Let's Start with Billboard's Top 100 Songs of the Week

In [1]:
#Step 1 - Collect billboard's top 100 songs 
import requests
from bs4 import BeautifulSoup
import pandas as pd
url = 'https://www.billboard.com/charts/hot-100'

### Import the url and use BeautifulSoup to organize the code

In [2]:
response = requests.get(url)
html = response.content
soup = BeautifulSoup(html)
soup

<!DOCTYPE html>
<html class="" lang="">
<head>
<meta charset="utf-8"/>
<meta content="IE=edge" http-equiv="X-UA-Compatible"/>
<meta content="width=device-width, initial-scale=1, user-scalable=no" name="viewport"/>
<title>The Hot 100 Chart | Billboard</title>
<meta content="The Hot 100 Chart" name="title" property="title"/>
<meta content="@billboard" name="twitter:site"/>
<meta content="Billboard" property="og:site_name"/>
<meta content="article" property="og:type"/>
<link href="/manifest.json" rel="manifest"/>
<link href="/charts/hot-100" rel="canonical"/>
<link href="https://www.billboard.com/assets/1595881739/images/favicon.ico?719fbf92a6bb95f8698f" rel="shortcut icon" type="image/vnd.microsoft.icon"/>
<link href="https://www.billboard.com/assets/1595881739/images/BB_favicon144.png?719fbf92a6bb95f8698f" rel="apple-touch-icon"/>
<link href="https://www.billboard.com" rel="dns-prefetch"/>
<link href="https://www.google-analytics.com/" rel="dns-prefetch"/>
<link href="https://www.google

###  Select the tag where artist names  and songs are stored and clean up the code so we are left with just relevant information.

In [3]:
artists_top100 = [artists_top100.text for artists_top100 in soup.find_all('span', attrs = {"chart-element__information__artist text--truncate color--secondary"})]
artists_top100




['DaBaby Featuring Roddy Ricch',
 'Jack Harlow Featuring DaBaby, Tory Lanez & Lil Wayne',
 'DJ Khaled Featuring Drake',
 'The Weeknd',
 'SAINt JHN',
 'Megan Thee Stallion Featuring Beyonce',
 'Harry Styles',
 'DJ Khaled Featuring Drake',
 'Juice WRLD x Marshmello',
 'Chris Brown & Young Thug',
 'Lil Mosey',
 'Jawsh 685 x Jason Derulo',
 'Juice WRLD',
 'Lady Gaga & Ariana Grande',
 'Harry Styles',
 'Gabby Barrett',
 'Dua Lipa',
 'Post Malone',
 'Lewis Capaldi',
 'Lil Baby & 42 Dugg',
 'Doja Cat Featuring Nicki Minaj',
 'Justin Bieber Featuring Quavo',
 'Pop Smoke Featuring Lil Baby & DaBaby',
 'StaySolidRocky',
 'Luke Bryan',
 'Miranda Lambert',
 'Dua Lipa',
 'Future Featuring Drake',
 'Powfu Featuring beabadoobee',
 'Trevor Daniel',
 'Maren Morris',
 'Pop Smoke Featuring 50 Cent & Roddy Ricch',
 'Sam Hunt',
 'Roddy Ricch',
 'Maddie & Tae',
 'Juice WRLD & Marshmello Featuring Polo G & The Kid LAROI',
 'Lil Baby',
 'Juice WRLD',
 'Morgan Wallen',
 'Surfaces',
 'Rod Wave Featuring ATR Son

In [4]:
top100 = [top100.text for top100 in soup.find_all('span', attrs = {'chart-element__information__song text--truncate color--primary'})]
top100

['Rockstar',
 'Whats Poppin',
 'Popstar',
 'Blinding Lights',
 'Roses',
 'Savage',
 'Watermelon Sugar',
 'Greece',
 'Come & Go',
 'Go Crazy',
 'Blueberry Faygo',
 'Savage Love (Laxed - Siren Beat)',
 'Wishing Well',
 'Rain On Me',
 'Adore You',
 'I Hope',
 'Break My Heart',
 'Circles',
 'Before You Go',
 'We Paid',
 'Say So',
 'Intentions',
 'For The Night',
 'Party Girl',
 'One Margarita',
 'Bluebird',
 "Don't Start Now",
 'Life Is Good',
 'Death Bed',
 'Falling',
 'The Bones',
 'The Woo',
 'Hard To Forget',
 'The Box',
 'Die From A Broken Heart',
 'Hate The Other Side',
 'The Bigger Picture',
 'Conversations',
 "Chasin' You",
 'Sunday Best',
 'Rags2Riches',
 "Life's A Mess",
 'Emotionally Scarred',
 'Said Sum',
 'Toosie Slide',
 'Girls In The Hood',
 'Supalonely',
 'Walk Em Down',
 'Blood On My Jeans',
 'One Big Country Song',
 'Righteous',
 'If The World Was Ending',
 'Got What I Got',
 'I Love My Country',
 'Got It On Me',
 'Done',
 'Like That',
 'Stuck With U',
 'Be A Light',
 "Do

## Using Spotify's API to Collect Data From Billboards Top 100

### Import libraries and getpsass to hide important information

In [5]:
import json
import requests
import spotipy
import spotipy.util as util

In [6]:
from getpass import getpass
from spotipy.oauth2 import SpotifyClientCredentials
import sys

In [7]:
spotify_id = getpass()
#spotify:user:1234896446

········


In [139]:
#Get authorization token

token = util.prompt_for_user_token('1234896446',
                           scope = 'playlist-modify-public',        
                           client_id='e97099ce419a4e91832bc49f3bfb3372',
                           client_secret= spotify_id,
                           redirect_uri= 'http://127.0.0.1:9090')
if token:
    sp = spotipy.Spotify(auth=token)

### Get the uri for all the songs on top100 
 

In [9]:
#Search the song on Spotify
sptf = sp.search(q= top100[2],
          type = 'track'      
     )
sptf

{'tracks': {'href': 'https://api.spotify.com/v1/search?query=Popstar&type=track&offset=0&limit=10',
  'items': [{'album': {'album_type': 'single',
     'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/0QHgL1lAIqAw0HtD7YldmP'},
       'href': 'https://api.spotify.com/v1/artists/0QHgL1lAIqAw0HtD7YldmP',
       'id': '0QHgL1lAIqAw0HtD7YldmP',
       'name': 'DJ Khaled',
       'type': 'artist',
       'uri': 'spotify:artist:0QHgL1lAIqAw0HtD7YldmP'},
      {'external_urls': {'spotify': 'https://open.spotify.com/artist/3TVXtAsR1Inumwj472S9r4'},
       'href': 'https://api.spotify.com/v1/artists/3TVXtAsR1Inumwj472S9r4',
       'id': '3TVXtAsR1Inumwj472S9r4',
       'name': 'Drake',
       'type': 'artist',
       'uri': 'spotify:artist:3TVXtAsR1Inumwj472S9r4'}],
     'available_markets': ['AD',
      'AE',
      'AL',
      'AR',
      'AT',
      'AU',
      'BA',
      'BE',
      'BG',
      'BH',
      'BO',
      'BR',
      'BY',
      'CA',
      'CH',
      

In [10]:
#export the song URI
track_uri = []
for music in top100:
    sptf =sp.search(q= music, type = 'track')
    track_uri.append(sptf['tracks']['items'][0]['uri'])
track_uri

['spotify:track:7ytR5pFWmSjzHJIeQkgog4',
 'spotify:track:1jaTQ3nqY3oAAYyCTbIvnM',
 'spotify:track:6EDO9iiTtwNv6waLwa1UUq',
 'spotify:track:0VjIjW4GlUZAMYd2vXMi3b',
 'spotify:track:24Yi9hE78yPEbZ4kxyoXAI',
 'spotify:track:1xQ6trAsedVPCdbtDAmk0c',
 'spotify:track:6UelLqGlWMcVH1E5c4H7lY',
 'spotify:track:35RJhm1pEovTBwnNR0zWad',
 'spotify:track:7y7w4tl4MaRC2UMEj1mPtr',
 'spotify:track:1IIKrJVP1C9N7iPtG6eOsK',
 'spotify:track:6wJYhPfqk3KGhHRG76WzOh',
 'spotify:track:1xQ6trAsedVPCdbtDAmk0c',
 'spotify:track:6o3QUC5oAE4g6WxRIFcZtb',
 'spotify:track:24ySl2hOPGCDcxBxFIqWBu',
 'spotify:track:3jjujdWJ72nww5eGnfs2E7',
 'spotify:track:6VJTip0bQuZIzTVitK6Z8R',
 'spotify:track:017PF4Q3l4DBUiWoXk4OWT',
 'spotify:track:21jGcNKet2qwijlDFuPiPb',
 'spotify:track:7ce20yLkzuXXLUhzIDoZih',
 'spotify:track:6gxKUmycQX7uyMwJcweFjp',
 'spotify:track:3Dv1eDb0MEgF93GpLXlucZ',
 'spotify:track:364dI1bYnvamSnBJ8JcNzN',
 'spotify:track:0PvFJmanyNQMseIFrU708S',
 'spotify:track:5RqR4ZCCKJDcBLIn4sih9l',
 'spotify:track:

### Get the artist's uri

In [11]:
#test getting the uri for one item

oi = sp.search(q= music, type = 'track')
oi['tracks']['items'][0]['album']['artists'][0]['uri']


'spotify:artist:6l3HvQ5sa6mXTsMTB19rO5'

In [12]:
#gets the uri for all the tracks and creates a list

artist_uri = []
for music in top100:
    sptf1 =sp.search(q= music, type = 'track')
    artist_uri.append(sptf1['tracks']['items'][0]['album']['artists'][0]['uri'])
artist_uri

['spotify:artist:4r63FhuTkUYltbVAg5TQnk',
 'spotify:artist:2LIk90788K0zvyj2JJVwkJ',
 'spotify:artist:0QHgL1lAIqAw0HtD7YldmP',
 'spotify:artist:1Xyo4u8uXC1ZmMpatF05PJ',
 'spotify:artist:0H39MdGGX6dbnnQPt6NQkZ',
 'spotify:artist:56mfhUDKa1vec6rSLZV5Eg',
 'spotify:artist:6KImCVD70vtIoJWnq6nGn3',
 'spotify:artist:0QHgL1lAIqAw0HtD7YldmP',
 'spotify:artist:4MCBfE4596Uoi2O4DtmEMz',
 'spotify:artist:7bXgB6jMjp9ATFy66eO08Z',
 'spotify:artist:5zctI4wO9XSKS8XwcnqEHk',
 'spotify:artist:56mfhUDKa1vec6rSLZV5Eg',
 'spotify:artist:4MCBfE4596Uoi2O4DtmEMz',
 'spotify:artist:1HY2Jd0NmPuamShAr6KMms',
 'spotify:artist:6KImCVD70vtIoJWnq6nGn3',
 'spotify:artist:6TQj5BFPooTa08A7pk8AQ1',
 'spotify:artist:6M2wZ9GZgrQXHCFfjv46we',
 'spotify:artist:246dkjvS1zLTtiykXe5h60',
 'spotify:artist:4GNC7GD6oZMSxPGyXy4MNB',
 'spotify:artist:5f7VJjfbwm532GiveGC0ZK',
 'spotify:artist:5cj0lLjcoR7YOSnhnX0Po5',
 'spotify:artist:1uNFoZAHBGtllmzznpCI3s',
 'spotify:artist:0eDvMgVFoNV3TpwtrVCoTj',
 'spotify:artist:1XLWox9w1Yvbodui0

### Now let's start getting information about these songs

### Getting the genre  

In [13]:
#searches for an artist and uses keys to get the genre

sp.artist(artist_uri[0])['genres']

['north carolina hip hop', 'rap']

In [14]:
genre = []
for uri in artist_uri:
    sptf2 =sp.artist(uri)
    genre.append(sptf2['genres'])
genre

[['north carolina hip hop', 'rap'],
 ['deep underground hip hop', 'kentucky hip hop', 'pop rap', 'rap', 'trap'],
 ['dance pop',
  'hip hop',
  'miami hip hop',
  'pop',
  'pop rap',
  'rap',
  'southern hip hop',
  'trap'],
 ['canadian contemporary r&b', 'canadian pop', 'pop'],
 ['melodic rap', 'pop rap', 'rap', 'trap'],
 [],
 ['pop', 'post-teen pop'],
 ['dance pop',
  'hip hop',
  'miami hip hop',
  'pop',
  'pop rap',
  'rap',
  'southern hip hop',
  'trap'],
 ['chicago rap', 'melodic rap'],
 ['dance pop', 'pop', 'pop rap', 'r&b', 'rap'],
 ['melodic rap', 'rap conscient', 'vapor trap'],
 [],
 ['chicago rap', 'melodic rap'],
 ['dance pop', 'pop'],
 ['pop', 'post-teen pop'],
 ['big room',
  'chicago house',
  'edm',
  'electro house',
  'electropop',
  'house',
  'pop',
  'progressive house',
  'tropical house',
  'vocal house'],
 ['dance pop', 'pop', 'uk pop'],
 ['dfw rap', 'melodic rap', 'rap'],
 ['pop', 'uk pop'],
 ['atl hip hop', 'atl trap', 'rap'],
 ['la indie', 'pop'],
 ['canadia

###  Getting the songs' danceability

In [15]:
danceability = []
for music in track_uri:
    spot = sp.audio_features(music)
    danceability.append(spot[0]['danceability'])

###  Getting the songs' energy


In [16]:
energy = []
for music in track_uri:
    spot = sp.audio_features(music)
    energy.append(spot[0]['energy'])

###  Getting the songs' duration

In [17]:
duration = []
for music in track_uri:
    tm = sp.audio_analysis(music)
    duration.append(tm['track']['duration'])

### Let's create a label 1:100 for the songs 

In [18]:
pos = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100]



## Let's create a dataset with all the information we have 

In [19]:
colnames = ['Position','Artists', 'Song', 'Genre', 'Danceability', 'Energy', 'Duration']
topmusic = []
topmusic.append(pos)
topmusic.append(artists_top100)
topmusic.append(top100)
topmusic.append(genre)
topmusic.append(danceability)
topmusic.append(energy)
topmusic.append(duration)

In [20]:
topmusic = {'Position': pos, 'Artists': artists_top100, 'Songs': top100, 'Genre': genre, 'Danceability': danceability, 'Energy': energy,'Duration':duration}


In [21]:
topmusic = pd.DataFrame(topmusic)

In [334]:
topmusic['origem'] = 'billboard'

In [335]:
topmusic

Unnamed: 0,Position,Artists,Songs,Genre,Danceability,Energy,Duration,origem
0,1,DaBaby Featuring Roddy Ricch,Rockstar,"[north carolina hip hop, rap]",0.746,0.690,181.73334,billboard
1,2,"Jack Harlow Featuring DaBaby, Tory Lanez & Lil...",Whats Poppin,"[deep underground hip hop, kentucky hip hop, p...",0.923,0.604,139.74132,billboard
2,3,DJ Khaled Featuring Drake,Popstar,"[dance pop, hip hop, miami hip hop, pop, pop r...",0.800,0.560,200.22118,billboard
3,4,The Weeknd,Blinding Lights,"[canadian contemporary r&b, canadian pop, pop]",0.514,0.730,200.04000,billboard
4,5,SAINt JHN,Roses,"[melodic rap, pop rap, rap, trap]",0.770,0.724,176.84000,billboard
...,...,...,...,...,...,...,...,...
95,96,Gunna Featuring Young Thug,Dollaz On My Head,"[atl hip hop, melodic rap, rap, trap]",0.825,0.458,197.76000,billboard
96,97,Topic & A7S,Breaking Me,"[german dance, pop edm, tropical house]",0.789,0.720,166.79388,billboard
97,98,Pop Smoke Featuring Karol G,Enjoy Yourself,[brooklyn drill],0.773,0.688,197.93814,billboard
98,99,Parker McCollum,Pretty Heart,"[contemporary country, texas country]",0.562,0.683,244.79773,billboard


In [23]:
topmusic.to_csv('topmusic.csv')

# Creating a Personal Music Database

Before streaming existed, there was Last.FM, a website that allowed you to synchronize your mp3 player to their platform and would give you stats on your listening habits. I was a heavy user around 2006, so I was able to retrieve a lot of data from there. The rest was supplemented by my listening habits Spotify keeps.

## Importing Last.FM Data

In [24]:
import pandas

In [88]:
lastfm = pd.read_csv('LastFM.csv', header = None, index_col=None)
lastfm

Unnamed: 0,0,1,2,3
0,King Princess,Pussy Is God,Pussy Is God,
1,King Princess,Talia,Talia,28 Jul 2020 18:32
2,King Princess,1950,1950,28 Jul 2020 18:29
3,No Party For Cao Dong,醜奴兒,Intro,28 Jul 2020 18:25
4,No Party For Cao Dong,醜奴兒,我們,28 Jul 2020 18:23
...,...,...,...,...
24218,The Cure,Galore (The Singles 1987-1997),Letter To Elise,23 Jun 2006 00:50
24219,The Kooks,Inside In Inside Out,Naive,23 Jun 2006 00:46
24220,Belle and Sebastian,2004-03-24: Copenhagen Denmark (disc 2),Asleep on a Sunbeam,23 Jun 2006 00:42
24221,Le Tigre,Le Tigre,Friendship Station,23 Jun 2006 00:39


In [91]:
#dropping the extra column
lastfm = lastfm.drop(1, axis = 1)

KeyError: '[1] not found in axis'

In [92]:
lastfm.columns = ['Artists','Songs','Date Listened']

In [93]:
lastfm

Unnamed: 0,Artists,Songs,Date Listened
0,King Princess,Pussy Is God,
1,King Princess,Talia,28 Jul 2020 18:32
2,King Princess,1950,28 Jul 2020 18:29
3,No Party For Cao Dong,Intro,28 Jul 2020 18:25
4,No Party For Cao Dong,我們,28 Jul 2020 18:23
...,...,...,...
24218,The Cure,Letter To Elise,23 Jun 2006 00:50
24219,The Kooks,Naive,23 Jun 2006 00:46
24220,Belle and Sebastian,Asleep on a Sunbeam,23 Jun 2006 00:42
24221,Le Tigre,Friendship Station,23 Jun 2006 00:39


In [29]:
#group by artists
lastfm_group = lastfm.groupby('Artists').count()
lastfm_group.sort_values('Songs')

Unnamed: 0_level_0,Songs,Date Listened
Artists,Unnamed: 1_level_1,Unnamed: 2_level_1
Carbona,0,41
Montage,0,6
Monty Python,0,1
Cidade Negra,0,13
Morrissey,0,1
...,...,...
American Football,362,365
Cake,377,424
The Used,387,413
MAE,388,388


## Find the artists on Spotify and collect the genre

In [82]:
#search for every artist's URI
from tqdm.auto import tqdm

import numpy as np


lastfm_artist_uri = []

for i in tqdm(lastfm_group.index):
    try:
        lastfm_artist_uri.append(sp.search(i)['tracks']['items'][0]['artists'][0]['uri'])
    except:
        lastfm_artist_uri.append(np.nan)

HBox(children=(IntProgress(value=0, max=708), HTML(value='')))




In [72]:
type(lastfm_artist_uri)

list

In [83]:
lastfm_genre = []
for x in tqdm (lastfm_artist_uri):
    try:
        sptf3 =sp.artist(x)
        lastfm_genre.append(sptf3['genres'])
    except:
        lastfm_genre.append([])
lastfm_genre

HBox(children=(IntProgress(value=0, max=708), HTML(value='')))




[['boy band', 'dance pop', 'europop', 'pop'],
 [],
 ['dance pop', 'pop', 'pop rap', 'post-teen pop'],
 [],
 ['modern rock', 'pop punk', 'post-grunge', 'rock'],
 ['new wave pop', 'pop rock'],
 ['brooklyn drill'],
 [],
 ['post-rock'],
 [],
 [],
 ['metalcore', 'pop punk', 'screamo'],
 ['indiecoustica', 'neo mellow', 'piano rock', 'pop rock'],
 ['bubblegum dance', 'dance pop', 'europop'],
 ['album rock', 'australian rock', 'hard rock', 'rock'],
 ['beatlesque',
  'british invasion',
  'classic rock',
  'merseybeat',
  'psychedelic rock',
  'rock'],
 ['beatlesque',
  'british invasion',
  'classic rock',
  'merseybeat',
  'psychedelic rock',
  'rock'],
 ['melodic rap'],
 ['regional mexican pop'],
 ['dance pop', 'edm', 'pop'],
 ['atl hip hop',
  'dance pop',
  'hip hop',
  'pop',
  'pop rap',
  'rap',
  'southern hip hop'],
 ['alabama indie', 'indie rock', 'modern blues rock', 'rock'],
 ['cantautor',
  'latin',
  'latin arena pop',
  'latin pop',
  'mexican pop',
  'rock en espanol',
  'spani

## Find the music on Spotify and collect the data

In [121]:
lastfm_music_group = lastfm.groupby('Songs').count()
lastfm_music_group
sp.search

Unnamed: 0_level_0,Artists,Date Listened
Songs,Unnamed: 1_level_1,Unnamed: 2_level_1
#1,7,7
(*)Black Tongue,6,6
(I Can't Get No) Satisfaction,8,8
(I Got That) Boom Boom,3,3
(Marie's the Name) His Latest Flame,1,1
...,...,...
情歌,1,1
我們,1,1
還願,1,1
醜,2,2


In [175]:
lastfm

Unnamed: 0,Artists,Songs,Date Listened
0,King Princess,Pussy Is God,
1,King Princess,Talia,28 Jul 2020 18:32
2,King Princess,1950,28 Jul 2020 18:29
3,No Party For Cao Dong,Intro,28 Jul 2020 18:25
4,No Party For Cao Dong,我們,28 Jul 2020 18:23
...,...,...,...
24218,The Cure,Letter To Elise,23 Jun 2006 00:50
24219,The Kooks,Naive,23 Jun 2006 00:46
24220,Belle and Sebastian,Asleep on a Sunbeam,23 Jun 2006 00:42
24221,Le Tigre,Friendship Station,23 Jun 2006 00:39


In [130]:
lastfm_track_uri = []

for i in tqdm(lastfm_music_group.index):
    try:
        lastfm_track_uri.append(sp.search(i)['tracks']['items'][0]['uri'])
    except:
        lastfm_track_uri.append(np.nan)

HBox(children=(IntProgress(value=0, max=6196), HTML(value='')))




### Get Song's Danceability 

In [134]:
lastfm_danceability = []
for music in tqdm (lastfm_track_uri):
    try:
        x = sp.audio_features(music)
        lastfm_danceability.append(x[0]['danceability'])
    except:
        lastfm_danceability.append([])
    

HBox(children=(IntProgress(value=0, max=6196), HTML(value='')))




### Get Song's Energy 


In [136]:
lastfm_energy = []
for music in tqdm(lastfm_track_uri):
    try:
        x = sp.audio_features(music)
        lastfm_energy.append(x[0]['energy'])
    except:
        lastfm_energy.append([])
  

HBox(children=(IntProgress(value=0, max=6196), HTML(value='')))




### Let's Create a Dataset 

In [181]:
len(lastfm_track_uri)

6196

In [185]:
data = pd.DataFrame({'Songs': lastfm_music_group.index,'track_uri': lastfm_track_uri, 'danceability':lastfm_danceability, 'energy':lastfm_energy})
data

Unnamed: 0,Songs,track_uri,danceability,energy
0,#1,spotify:track:3GDnVUfriizmbSEOLieKP8,0.652,0.612
1,(*)Black Tongue,spotify:track:0KdjtHI5Acg6SMoomxQaCb,0.216,0.969
2,(I Can't Get No) Satisfaction,spotify:track:2PzU4IB8Dr6mxV3lHuaG34,0.723,0.863
3,(I Got That) Boom Boom,spotify:track:5epx5YtoMbV0GrL9qx9kVY,0.864,0.87
4,(Marie's the Name) His Latest Flame,spotify:track:4qbLOU1T7xgTH87eUSkvJ1,0.661,0.917
...,...,...,...,...
6191,情歌,spotify:track:7rd1wNn9MY4fPL9HhAaJSw,0.616,0.144
6192,我們,spotify:track:143NwGtUqaVYjXrY1v5jum,0.301,0.622
6193,還願,spotify:track:2Wew8zqBUtIXwZnxpIHdSf,0.502,0.202
6194,醜,spotify:track:4ecuKqF3MB966IrrYBTepC,0.757,0.65


In [187]:
lastfm_final = pd.merge(left=lastfm, right=data, on='Songs', how='left')

In [189]:
data1 = pd.DataFrame({'Artists': lastfm_group.index , 'artist_uri' : lastfm_artist_uri, 'genre' : lastfm_genre  })

In [192]:
lastfm_final = pd.merge(left = lastfm_final, right = data1, on = 'Artists', how = 'left') 

In [193]:
lastfm_final

Unnamed: 0,Artists,Songs,Date Listened,track_uri,danceability,energy,artist_uri,genre
0,King Princess,Pussy Is God,,spotify:track:6VCeywT4JeawuZOUkQ1okx,0.739,0.621,spotify:artist:6beUvFUlKliUYJdLOXNj9C,"[dance pop, electropop, indie pop, nyc pop, pop]"
1,King Princess,Talia,28 Jul 2020 18:32,spotify:track:53jbdPQBaH6WaQvW0zmGBs,0.401,0.507,spotify:artist:6beUvFUlKliUYJdLOXNj9C,"[dance pop, electropop, indie pop, nyc pop, pop]"
2,King Princess,1950,28 Jul 2020 18:29,spotify:track:0CZ8lquoTX2Dkg7Ak2inwA,0.6,0.535,spotify:artist:6beUvFUlKliUYJdLOXNj9C,"[dance pop, electropop, indie pop, nyc pop, pop]"
3,No Party For Cao Dong,Intro,28 Jul 2020 18:25,spotify:track:27py1Q0fMmpuSYiOAKBZPb,0.665,0.831,spotify:artist:3HXSUfI76zVZk71UMAeVfp,"[chinese indie, taiwan indie, taiwan pop, taiw..."
4,No Party For Cao Dong,我們,28 Jul 2020 18:23,spotify:track:143NwGtUqaVYjXrY1v5jum,0.301,0.622,spotify:artist:3HXSUfI76zVZk71UMAeVfp,"[chinese indie, taiwan indie, taiwan pop, taiw..."
...,...,...,...,...,...,...,...,...
24218,The Cure,Letter To Elise,23 Jun 2006 00:50,spotify:track:7mEGddVRDdESAibWOnbXoA,0.53,0.592,spotify:artist:1HY2Jd0NmPuamShAr6KMms,"[dance pop, pop]"
24219,The Kooks,Naive,23 Jun 2006 00:46,spotify:track:7BHPGtpuuWWsvE7cCaMuEU,0.391,0.808,spotify:artist:1GLtl8uqKmnyCWxHmw9tL4,"[brighton indie, indie pop, modern rock, rock]"
24220,Belle and Sebastian,Asleep on a Sunbeam,23 Jun 2006 00:42,spotify:track:5SQ2d653ZtDmLU0N1CXVus,0.618,0.507,spotify:artist:4I2BJf80C0skQpp1sQmA0h,"[alternative rock, anti-folk, baroque pop, cha..."
24221,Le Tigre,Friendship Station,23 Jun 2006 00:39,spotify:track:3m8KgKtNqtdMieItf9xA8X,0.722,0.979,spotify:artist:2n6FviARgtjjimZXu18uRM,"[alternative dance, dance-punk, electroclash, ..."


In [194]:
lastfm_final.to_csv('lastfmdata.csv')

In [203]:
lastfm_project = lastfm_final.drop(columns = ['track_uri', 'artist_uri'])

In [328]:
lastfm_project = lastfm_project.drop('Date Listened', axis = 1)

## Importing Spotify's Data 

In [291]:
#We need a different token for this, let's change the scope

token = util.prompt_for_user_token('1234896446',
                           scope = 'user-top-read',        
                           client_id='e97099ce419a4e91832bc49f3bfb3372',
                           client_secret= spotify_id,
                           redirect_uri= 'http://127.0.0.1:9090')
if token:
    sp = spotipy.Spotify(auth=token)

### Saved Tracks 

In [41]:
#Checks my saved tracks
saved50 = sp.current_user_saved_tracks(limit=50, offset=0)
saved50

{'href': 'https://api.spotify.com/v1/me/tracks?offset=0&limit=50',
 'items': [{'added_at': '2020-07-29T13:17:14Z',
   'track': {'album': {'album_type': 'compilation',
     'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/0LyfQWJT6nXafLPZqxe9Of'},
       'href': 'https://api.spotify.com/v1/artists/0LyfQWJT6nXafLPZqxe9Of',
       'id': '0LyfQWJT6nXafLPZqxe9Of',
       'name': '',
       'type': 'artist',
       'uri': 'spotify:artist:0LyfQWJT6nXafLPZqxe9Of'}],
     'available_markets': [],
     'external_urls': {'spotify': 'https://open.spotify.com/album/717UG2du6utFe7CdmpuUe3'},
     'href': 'https://api.spotify.com/v1/albums/717UG2du6utFe7CdmpuUe3',
     'id': '717UG2du6utFe7CdmpuUe3',
     'images': [],
     'name': '',
     'release_date': '2012-01-05',
     'release_date_precision': 'day',
     'total_tracks': 20,
     'type': 'album',
     'uri': 'spotify:album:717UG2du6utFe7CdmpuUe3'},
    'artists': [{'external_urls': {'spotify': 'https://open.spotify.co

In [42]:
#Getting another 50 tracks, due to an API limit
saved100 = sp.current_user_saved_tracks(limit=50, offset=50)
saved100

{'href': 'https://api.spotify.com/v1/me/tracks?offset=50&limit=50',
 'items': [{'added_at': '2017-08-27T23:36:36Z',
   'track': {'album': {'album_type': 'album',
     'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/27M9shmwhIjRo7WntpT9Rp'},
       'href': 'https://api.spotify.com/v1/artists/27M9shmwhIjRo7WntpT9Rp',
       'id': '27M9shmwhIjRo7WntpT9Rp',
       'name': 'Frank Turner',
       'type': 'artist',
       'uri': 'spotify:artist:27M9shmwhIjRo7WntpT9Rp'}],
     'available_markets': ['AD',
      'AE',
      'AL',
      'AR',
      'AT',
      'AU',
      'BA',
      'BE',
      'BG',
      'BH',
      'BO',
      'BR',
      'BY',
      'CA',
      'CH',
      'CL',
      'CO',
      'CR',
      'CY',
      'CZ',
      'DE',
      'DK',
      'DO',
      'DZ',
      'EC',
      'EE',
      'EG',
      'ES',
      'FI',
      'FR',
      'GR',
      'GT',
      'HK',
      'HN',
      'HR',
      'HU',
      'ID',
      'IL',
      'IN',
      'IS',
   

In [43]:
#Getting another 50 tracks, due to an API limit
saved150 = sp.current_user_saved_tracks(limit=50, offset=100)
saved150

{'href': 'https://api.spotify.com/v1/me/tracks?offset=100&limit=50',
 'items': [{'added_at': '2017-06-16T00:26:55Z',
   'track': {'album': {'album_type': 'album',
     'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/0YrtvWJMgSdVrk3SfNjTbx'},
       'href': 'https://api.spotify.com/v1/artists/0YrtvWJMgSdVrk3SfNjTbx',
       'id': '0YrtvWJMgSdVrk3SfNjTbx',
       'name': 'Death Cab for Cutie',
       'type': 'artist',
       'uri': 'spotify:artist:0YrtvWJMgSdVrk3SfNjTbx'}],
     'available_markets': ['AD',
      'AE',
      'AL',
      'AR',
      'AT',
      'AU',
      'BA',
      'BE',
      'BG',
      'BH',
      'BO',
      'BR',
      'BY',
      'CA',
      'CH',
      'CL',
      'CO',
      'CR',
      'CY',
      'CZ',
      'DE',
      'DK',
      'DO',
      'DZ',
      'EC',
      'EE',
      'EG',
      'ES',
      'FI',
      'FR',
      'GB',
      'GR',
      'GT',
      'HK',
      'HN',
      'HR',
      'HU',
      'ID',
      'IE',
      '

In [59]:
#Getting another 50 tracks, due to an API limit
saved200 = sp.current_user_saved_tracks(limit=50, offset=150)
saved200

{'href': 'https://api.spotify.com/v1/me/tracks?offset=150&limit=50',
 'items': [{'added_at': '2017-01-14T19:02:15Z',
   'track': {'album': {'album_type': 'album',
     'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/3HXSUfI76zVZk71UMAeVfp'},
       'href': 'https://api.spotify.com/v1/artists/3HXSUfI76zVZk71UMAeVfp',
       'id': '3HXSUfI76zVZk71UMAeVfp',
       'name': 'No Party For Cao Dong',
       'type': 'artist',
       'uri': 'spotify:artist:3HXSUfI76zVZk71UMAeVfp'}],
     'available_markets': [],
     'external_urls': {'spotify': 'https://open.spotify.com/album/7yTq3BMkGtRDa76rLBFo07'},
     'href': 'https://api.spotify.com/v1/albums/7yTq3BMkGtRDa76rLBFo07',
     'id': '7yTq3BMkGtRDa76rLBFo07',
     'images': [{'height': 640,
       'url': 'https://i.scdn.co/image/ab67616d0000b273e63f20f7e63a76d7db8350da',
       'width': 640},
      {'height': 300,
       'url': 'https://i.scdn.co/image/ab67616d00001e02e63f20f7e63a76d7db8350da',
       'width': 300},


In [155]:
#Joining them into one dataset
saved = saved50['items'] + saved100['items'] + saved150['items'] + saved200['items']

In [156]:
#getting the artists' uri
artisturi = []
for x in saved:
    artisturi.append(x['track']['artists'][0]['uri'])
    
artisturi

['spotify:artist:33ScadVnbm2X8kkUqOkC6Z',
 'spotify:artist:05SdqPzK4m3k1ljK2wrTSP',
 'spotify:artist:0hDjKSKjl1DC7ovYTDJHe8',
 'spotify:artist:55VydwMyCuGcavwPuhutPL',
 'spotify:artist:2FXC3k01G6Gw61bmprjgqS',
 'spotify:artist:2FXC3k01G6Gw61bmprjgqS',
 'spotify:artist:2FXC3k01G6Gw61bmprjgqS',
 'spotify:artist:2FXC3k01G6Gw61bmprjgqS',
 'spotify:artist:2FXC3k01G6Gw61bmprjgqS',
 'spotify:artist:2FXC3k01G6Gw61bmprjgqS',
 'spotify:artist:2FXC3k01G6Gw61bmprjgqS',
 'spotify:artist:2FXC3k01G6Gw61bmprjgqS',
 'spotify:artist:2FXC3k01G6Gw61bmprjgqS',
 'spotify:artist:2FXC3k01G6Gw61bmprjgqS',
 'spotify:artist:2FXC3k01G6Gw61bmprjgqS',
 'spotify:artist:2FXC3k01G6Gw61bmprjgqS',
 'spotify:artist:2FXC3k01G6Gw61bmprjgqS',
 'spotify:artist:2FXC3k01G6Gw61bmprjgqS',
 'spotify:artist:2FXC3k01G6Gw61bmprjgqS',
 'spotify:artist:2FXC3k01G6Gw61bmprjgqS',
 'spotify:artist:2FXC3k01G6Gw61bmprjgqS',
 'spotify:artist:2FXC3k01G6Gw61bmprjgqS',
 'spotify:artist:2FXC3k01G6Gw61bmprjgqS',
 'spotify:artist:2FXC3k01G6Gw61bmp

### Get the Genre of These Artists 

In [174]:
saved_genre = []
for v in tqdm (artisturi):
    y = sp.artist(v)
    saved_genre.append(y['genres'])
saved_genre

HBox(children=(IntProgress(value=0, max=200), HTML(value='')))




[['latin', 'latin hip hop', 'puerto rican pop', 'reggaeton', 'tropical'],
 ['emo rap', 'pop rap', 'sacramento hip hop'],
 ['indie r&b', 'pop'],
 ['emo', 'post-hardcore', 'screamo'],
 ['irish singer-songwriter', 'pop'],
 ['irish singer-songwriter', 'pop'],
 ['irish singer-songwriter', 'pop'],
 ['irish singer-songwriter', 'pop'],
 ['irish singer-songwriter', 'pop'],
 ['irish singer-songwriter', 'pop'],
 ['irish singer-songwriter', 'pop'],
 ['irish singer-songwriter', 'pop'],
 ['irish singer-songwriter', 'pop'],
 ['irish singer-songwriter', 'pop'],
 ['irish singer-songwriter', 'pop'],
 ['irish singer-songwriter', 'pop'],
 ['irish singer-songwriter', 'pop'],
 ['irish singer-songwriter', 'pop'],
 ['irish singer-songwriter', 'pop'],
 ['irish singer-songwriter', 'pop'],
 ['irish singer-songwriter', 'pop'],
 ['irish singer-songwriter', 'pop'],
 ['irish singer-songwriter', 'pop'],
 ['irish singer-songwriter', 'pop'],
 ['irish singer-songwriter', 'pop'],
 ['irish singer-songwriter', 'pop'],
 ['i

### Get the Name of These Artists

In [226]:
saved_artist = []
for x in saved:
    saved_artist.append(x['track']['artists'][0]['name'])
    
saved_artist



['',
 'Hobo Johnson',
 'Bruno Major',
 'The Used',
 'Hozier',
 'Hozier',
 'Hozier',
 'Hozier',
 'Hozier',
 'Hozier',
 'Hozier',
 'Hozier',
 'Hozier',
 'Hozier',
 'Hozier',
 'Hozier',
 'Hozier',
 'Hozier',
 'Hozier',
 'Hozier',
 'Hozier',
 'Hozier',
 'Hozier',
 'Hozier',
 'Hozier',
 'Hozier',
 'Hozier',
 'Hozier',
 'Hozier',
 'Hozier',
 'Hozier',
 'Hozier',
 'Hozier',
 'Hozier',
 'Hozier',
 'Hozier',
 'Hozier',
 'Hozier',
 'Hozier',
 'Hozier',
 'Hozier',
 'Hozier',
 'Hozier',
 'Hozier',
 'Frank Turner',
 'Frank Turner',
 'Frank Turner',
 'Frank Turner',
 'Frank Turner',
 'Frank Turner',
 'Frank Turner',
 'Frank Turner',
 'Frank Turner',
 'Frank Turner',
 'Frank Turner',
 'Frank Turner',
 'Frank Turner',
 'Frank Turner',
 'Frank Turner',
 'Frank Turner',
 'Frank Turner',
 'Frank Turner',
 'Local Natives',
 'T-Pain',
 'State Champs',
 'Dance Gavin Dance',
 'New Years Day',
 'The Amity Affliction',
 'Andy Black',
 'Grayscale',
 'Capsize',
 'Boston Manor',
 'Eat Your Heart Out',
 'The Plot 

### Get Songs Information

In [234]:
#getting the songs' uri
songsuri = []
for x in saved:
    songsuri.append(x['track']['uri'])
    
songsuri

['spotify:track:74soW454u2CuC4yaxDVClP',
 'spotify:track:68VR6uEFIwkBEZSBzyeJZ1',
 'spotify:track:6AxRGtu8gdKPeynxdHsmzC',
 'spotify:track:2PzjnmkdMr6eOkFwQgqmyC',
 'spotify:track:2QsquiKBcjvDUlO6QyvMEs',
 'spotify:track:7uttm8Iurm5uK67Vr9G2Sp',
 'spotify:track:0SFq19lgG4qbQAZaNZ3xhO',
 'spotify:track:2ApgD2AGOa4XxzWDdqHM65',
 'spotify:track:2BnhDoRRohPQhs7zrqVq2L',
 'spotify:track:6dRjQxGUFZRY3ydat2JSu7',
 'spotify:track:4li9KmEUg6RVrAe1Y4p86W',
 'spotify:track:3ieUtb4ecQgEYxae8dzEUi',
 'spotify:track:29bXfCo3mQVzohwPJXohaU',
 'spotify:track:2SopPHDZ2cKMbFumNtFQL9',
 'spotify:track:5yZtHXjySe6jDjOM6Yi3Mw',
 'spotify:track:0L3oQdYUvVbsOOPJSkZ3C3',
 'spotify:track:0bgP0aKOxvdFxB2EfDrulY',
 'spotify:track:1IbgYIuVwJsibCcT5GFYHy',
 'spotify:track:2EiGECydkS2M8OCcRHQZhT',
 'spotify:track:5FRiiYGylFmqV4dPgwncK2',
 'spotify:track:2c32F1jt47FbgVmJXwkDGq',
 'spotify:track:5ZPVQMYn6nD6KYY9U6lwVA',
 'spotify:track:13eVZd07tPhyCJXGCf4y0V',
 'spotify:track:1GVfAOyh0mGrK6Hx7HwPty',
 'spotify:track:

In [324]:
#get songs' names
saved_songs = []
for x in saved:
    saved_songs.append(x['track']['name'])

In [325]:
saved_songs

['',
 'Happiness',
 'Nothing',
 'Take It Away',
 'Nina Cried Power',
 'Almost (Sweet Music)',
 'Movement',
 'No Plan',
 'Nobody',
 'To Noise Making (Sing)',
 'As It Was',
 'Shrike',
 'Talk',
 'Be',
 'Dinner & Diatribes',
 'Would That I',
 'Sunlight',
 'Wasteland, Baby!',
 'Take Me To Church',
 'Angel Of Small Death & The Codeine Scene',
 'Jackie And Wilson',
 'Someone New',
 'To Be Alone',
 'From Eden',
 'In A Week',
 'Sedated',
 'Work Song',
 'Like Real People Do',
 'It Will Come Back',
 "Foreigner's God",
 'Cherry Wine - Live',
 'In The Woods Somewhere',
 'Run',
 "Arsonist's Lullabye",
 'My Love Will Never Die',
 'From Eden - Live',
 'Jackie And Wilson - Live',
 'Someone New - Live',
 'Work Song - Live',
 'Take Me To Church - Live',
 'Problem / Regulate - BBC Live Lounge / 2015',
 "Whole Lotta Love - The Dermot O'Leary Saturday Sessions Show / BBC Radio 2 / 2015",
 'Do I Wanna Know? - Live At the BBC / 2015',
 'Lay Me Down - BBC Live Lounge / 2015',
 'Peggy Sang The Blues',
 'I Still

In [237]:
#get danceability
sptf_danceability = []
for music in tqdm (songsuri):
    x = sp.audio_features(music)
    sptf_danceability.append(x[0]['danceability'])


HBox(children=(IntProgress(value=0, max=200), HTML(value='')))

In [238]:
#get energy
sptf_energy = []
for music in tqdm(songsuri):
    x = sp.audio_features(music)
    sptf_energy.append(x[0]['energy'])
    

HBox(children=(IntProgress(value=0, max=200), HTML(value='')))

## Creating a DataFrame 

In [326]:
data_spotify = pd.DataFrame({'Artists': saved_artist , 'artist_uri' : artisturi, 'Songs': saved_songs,  'danceability':sptf_danceability, 'energy' : sptf_energy, 'genre' : saved_genre })

In [330]:
data_spotify.to_csv('spotify.csv')

In [332]:
data_spotify = data_spotify.drop('artist_uri', axis = 1)

In [339]:
data_spotify['origin'] = 'spotify'

In [336]:
lastfm_project ['origin'] = 'lastfm'

In [340]:
#Combines both lastfm and spotify data
personal_df = pd.concat([lastfm_project, data_spotify] )
personal_df

Unnamed: 0,Artists,Songs,danceability,energy,genre,origin,origem
0,King Princess,Pussy Is God,0.739,0.621,"[dance pop, electropop, indie pop, nyc pop, pop]",lastfm,
1,King Princess,Talia,0.401,0.507,"[dance pop, electropop, indie pop, nyc pop, pop]",lastfm,
2,King Princess,1950,0.6,0.535,"[dance pop, electropop, indie pop, nyc pop, pop]",lastfm,
3,No Party For Cao Dong,Intro,0.665,0.831,"[chinese indie, taiwan indie, taiwan pop, taiw...",lastfm,
4,No Party For Cao Dong,我們,0.301,0.622,"[chinese indie, taiwan indie, taiwan pop, taiw...",lastfm,
...,...,...,...,...,...,...,...
195,Los Hermanos,Sapato Novo,0.499,0.0906,"[brazilian rock, mpb, nova mpb, rock alternati...",spotify,spotify
196,Los Hermanos,Pois é,0.664,0.328,"[brazilian rock, mpb, nova mpb, rock alternati...",spotify,spotify
197,Los Hermanos,É de Lágrima,0.197,0.165,"[brazilian rock, mpb, nova mpb, rock alternati...",spotify,spotify
198,Los Hermanos,Anna Júlia,0.427,0.817,"[brazilian rock, mpb, nova mpb, rock alternati...",spotify,spotify


In [343]:
personal_df

Unnamed: 0,Artists,Songs,danceability,energy,genre,origin
0,King Princess,Pussy Is God,0.739,0.621,"[dance pop, electropop, indie pop, nyc pop, pop]",lastfm
1,King Princess,Talia,0.401,0.507,"[dance pop, electropop, indie pop, nyc pop, pop]",lastfm
2,King Princess,1950,0.6,0.535,"[dance pop, electropop, indie pop, nyc pop, pop]",lastfm
3,No Party For Cao Dong,Intro,0.665,0.831,"[chinese indie, taiwan indie, taiwan pop, taiw...",lastfm
4,No Party For Cao Dong,我們,0.301,0.622,"[chinese indie, taiwan indie, taiwan pop, taiw...",lastfm
...,...,...,...,...,...,...
195,Los Hermanos,Sapato Novo,0.499,0.0906,"[brazilian rock, mpb, nova mpb, rock alternati...",spotify
196,Los Hermanos,Pois é,0.664,0.328,"[brazilian rock, mpb, nova mpb, rock alternati...",spotify
197,Los Hermanos,É de Lágrima,0.197,0.165,"[brazilian rock, mpb, nova mpb, rock alternati...",spotify
198,Los Hermanos,Anna Júlia,0.427,0.817,"[brazilian rock, mpb, nova mpb, rock alternati...",spotify


# Measuring the distance between my personal music preference and billboard's top 100

In [346]:
from scipy.spatial.distance import pdist, squareform

In [385]:
top = topmusic[['Artists','Songs','Danceability','Energy','Genre','origem']]

In [386]:
top.columns = personal_df.columns

In [387]:
all_df = pd.concat([personal_df, top])

In [388]:
all_df = all_df.reset_index(drop=True)

In [426]:
personal_df_clean = all_df.loc[~all_df.genre.apply(lambda x: len(x) == 0), :].reset_index(drop=True)

personal_df_clean = personal_df_clean.loc[personal_df_clean.danceability.apply(lambda x: type(x) == float), :].reset_index(drop=True)

In [454]:
personal_df_clean

Unnamed: 0,Artists,Songs,danceability,energy,genre,origin
0,King Princess,Pussy Is God,0.739,0.621,"[dance pop, electropop, indie pop, nyc pop, pop]",lastfm
1,King Princess,Talia,0.401,0.507,"[dance pop, electropop, indie pop, nyc pop, pop]",lastfm
2,King Princess,1950,0.6,0.535,"[dance pop, electropop, indie pop, nyc pop, pop]",lastfm
3,No Party For Cao Dong,Intro,0.665,0.831,"[chinese indie, taiwan indie, taiwan pop, taiw...",lastfm
4,No Party For Cao Dong,我們,0.301,0.622,"[chinese indie, taiwan indie, taiwan pop, taiw...",lastfm
...,...,...,...,...,...,...
22069,Gunna Featuring Young Thug,Dollaz On My Head,0.825,0.458,"[atl hip hop, melodic rap, rap, trap]",billboard
22070,Topic & A7S,Breaking Me,0.789,0.72,"[german dance, pop edm, tropical house]",billboard
22071,Pop Smoke Featuring Karol G,Enjoy Yourself,0.773,0.688,[brooklyn drill],billboard
22072,Parker McCollum,Pretty Heart,0.562,0.683,"[contemporary country, texas country]",billboard


In [427]:
personal_df_clean.shape

(22074, 6)

In [428]:
dummies = pd.get_dummies(personal_df_clean.genre.apply(pd.Series).stack()).sum(level=0)

In [429]:
dummies.shape

(22074, 614)

In [430]:
df_dist = pd.concat([personal_df_clean[['danceability','energy','origin']], dummies], axis=1)
df_dist

Unnamed: 0,danceability,energy,origin,a cappella,abstract idm,acid rock,acoustic pop,action rock,adoracao,adult standards,...,vocal house,washington indie,water,welsh metal,welsh rock,witch house,wrestling,yacht rock,york indie,zolo
0,0.739,0.621,lastfm,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0.401,0.507,lastfm,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0.6,0.535,lastfm,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0.665,0.831,lastfm,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0.301,0.622,lastfm,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
22069,0.825,0.458,billboard,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
22070,0.789,0.72,billboard,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
22071,0.773,0.688,billboard,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
22072,0.562,0.683,billboard,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [431]:
x = df_dist.query('origin == "billboard"').drop('origin', axis=1)

In [432]:
x.iloc[[0], :]

Unnamed: 0,danceability,energy,a cappella,abstract idm,acid rock,acoustic pop,action rock,adoracao,adult standards,afro house,...,vocal house,washington indie,water,welsh metal,welsh rock,witch house,wrestling,yacht rock,york indie,zolo
21980,0.746,0.69,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [433]:
y = df_dist.query('origin != "billboard"').drop('origin', axis=1)

In [434]:
y

Unnamed: 0,danceability,energy,a cappella,abstract idm,acid rock,acoustic pop,action rock,adoracao,adult standards,afro house,...,vocal house,washington indie,water,welsh metal,welsh rock,witch house,wrestling,yacht rock,york indie,zolo
0,0.739,0.621,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0.401,0.507,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0.6,0.535,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0.665,0.831,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0.301,0.622,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21975,0.499,0.0906,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
21976,0.664,0.328,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
21977,0.197,0.165,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
21978,0.427,0.817,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [435]:
concatted_df = pd.concat([x.iloc[[0], :], y.iloc[[0],:]])
concatted_df

Unnamed: 0,danceability,energy,a cappella,abstract idm,acid rock,acoustic pop,action rock,adoracao,adult standards,afro house,...,vocal house,washington indie,water,welsh metal,welsh rock,witch house,wrestling,yacht rock,york indie,zolo
21980,0.746,0.69,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
0,0.739,0.621,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [437]:
pdist(concatted_df)[0]

2.6466601595218076

In [439]:
distances_b_s = []
avg_distances = []

for i in tqdm(range(len(x))):
    for j in tqdm(range(len(y))):

        concatted_df = pd.concat([x.iloc[[i], :], y.iloc[[j],:]])

        distances_b_s.append(pdist(concatted_df)[0])
        
    avg_distances.append(np.mean(distances_b_s))

HBox(children=(IntProgress(value=0, max=94), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

HBox(children=(IntProgress(value=0, max=21980), HTML(value='')))

In [449]:
x['distance_to_my_spotify'] = avg_distances

In [452]:
x.sort_values(by = 'distance_to_my_spotify', ascending = False)

Unnamed: 0,danceability,energy,a cappella,abstract idm,acid rock,acoustic pop,action rock,adoracao,adult standards,afro house,...,washington indie,water,welsh metal,welsh rock,witch house,wrestling,yacht rock,york indie,zolo,distance_to_my_spotify
21980,0.746,0.69,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,2.592791
22037,0.894,0.635,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,2.748705
22016,0.63,0.446,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,2.749700
22036,0.568,0.545,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,2.750032
22017,0.585,0.641,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,2.750368
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21988,0.755,0.578,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,2.943604
21984,0.77,0.724,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,2.976951
21986,0.695,0.343,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,2.982743
21983,0.514,0.73,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,2.982958


In [472]:
x.describe

<bound method NDFrame.describe of       danceability energy  a cappella  abstract idm  acid rock  acoustic pop  \
21980        0.746   0.69           0             0          0             0   
21981        0.923  0.604           0             0          0             0   
21982          0.8   0.56           0             0          0             0   
21983        0.514   0.73           0             0          0             0   
21984         0.77  0.724           0             0          0             0   
...            ...    ...         ...           ...        ...           ...   
22069        0.825  0.458           0             0          0             0   
22070        0.789   0.72           0             0          0             0   
22071        0.773  0.688           0             0          0             0   
22072        0.562  0.683           0             0          0             0   
22073         0.61  0.569           0             0          0             0   

     

In [468]:
personal_df_clean.loc[x.index,['Artists', 'Songs', 'danceability']]

TypeError: cannot concatenate object of type '<class 'list'>'; only Series and DataFrame objs are valid

In [479]:
df_dist.sort_values(by = 'distance_to_my_spotify', ascending = False)

Unnamed: 0,Artists,Songs,danceability,energy,genre,origin,a cappella,abstract idm,acid rock,acoustic pop,...,washington indie,water,welsh metal,welsh rock,witch house,wrestling,yacht rock,york indie,zolo,distance_to_my_spotify
21982,,,0.8,0.56,,,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.071628
21983,,,0.514,0.73,,,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.982958
21986,,,0.695,0.343,,,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.982743
21984,,,0.77,0.724,,,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.976951
21988,,,0.755,0.578,,,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.943604
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
22069,Gunna Featuring Young Thug,Dollaz On My Head,0.825,0.458,"[atl hip hop, melodic rap, rap, trap]",billboard,,,,,...,,,,,,,,,,
22070,Topic & A7S,Breaking Me,0.789,0.72,"[german dance, pop edm, tropical house]",billboard,,,,,...,,,,,,,,,,
22071,Pop Smoke Featuring Karol G,Enjoy Yourself,0.773,0.688,[brooklyn drill],billboard,,,,,...,,,,,,,,,,
22072,Parker McCollum,Pretty Heart,0.562,0.683,"[contemporary country, texas country]",billboard,,,,,...,,,,,,,,,,


In [None]:
df_dist = pd.concat( [ personal_df_clean, x]) 


In [344]:
# todas as musicas do last fm 24000 DONE
# musicas favoritas do spotify 200
# user top tracks 100 long term 100 short term
# buscar caracteristicas dessas 24400 musicas  no spotify (danceability ...) DONE

# opção1: analise no tableau
# opção2: recomendação do top 100 - calcular distância entre top1 e as 24400 [0.8, 0.1, 0.1, 0.1, 0.1, 0.3 ,0.9]
# obtem distancia media

# definir X['danceability', ...]
# from scipy.spatial.distance import pdist, squareform
# X = pd.concat([top100, lastfm, favorites])['danceability','energy',...]
# distances = squareform(pdist(X))
# distances.iloc[0:100]

# Create a new playlist

In [8]:
#client_credentials_manager = SpotifyClientCredentials('e97099ce419a4e91832bc49f3bfb3372',spotify_id)
#sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)

## Create a playlist

In [15]:
sp.user_playlist_create('1234896446', 
                     name= 'Billboard Top 100', 
                     public=True, 
                     description='Top 100 songs of the week')

{'collaborative': False,
 'description': 'Top 100 songs of the week',
 'external_urls': {'spotify': 'https://open.spotify.com/playlist/7rhinDHkx14eQgE2nvSwNZ'},
 'followers': {'href': None, 'total': 0},
 'href': 'https://api.spotify.com/v1/playlists/7rhinDHkx14eQgE2nvSwNZ',
 'id': '7rhinDHkx14eQgE2nvSwNZ',
 'images': [],
 'name': 'Billboard Top 100',
 'owner': {'display_name': 'Camila Aguileras',
  'external_urls': {'spotify': 'https://open.spotify.com/user/1234896446'},
  'href': 'https://api.spotify.com/v1/users/1234896446',
  'id': '1234896446',
  'type': 'user',
  'uri': 'spotify:user:1234896446'},
 'primary_color': None,
 'public': True,
 'snapshot_id': 'MSw3ZmQ5ODkzNjIxNjM0YjU2ZTZiOGRlZDgzY2NkMDkxNDNkZDRhZjcx',
 'tracks': {'href': 'https://api.spotify.com/v1/playlists/7rhinDHkx14eQgE2nvSwNZ/tracks',
  'items': [],
  'limit': 100,
  'next': None,
  'offset': 0,
  'previous': None,
  'total': 0},
 'type': 'playlist',
 'uri': 'spotify:playlist:7rhinDHkx14eQgE2nvSwNZ'}

### First let's try to find the songs' information on spotify: 

In [35]:
sptf = sp.search(q= top100[2],
          type = 'track'      
     )
sptf

{'tracks': {'href': 'https://api.spotify.com/v1/search?query=Blinding+Lights&type=track&offset=0&limit=10',
  'items': [{'album': {'album_type': 'album',
     'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/1Xyo4u8uXC1ZmMpatF05PJ'},
       'href': 'https://api.spotify.com/v1/artists/1Xyo4u8uXC1ZmMpatF05PJ',
       'id': '1Xyo4u8uXC1ZmMpatF05PJ',
       'name': 'The Weeknd',
       'type': 'artist',
       'uri': 'spotify:artist:1Xyo4u8uXC1ZmMpatF05PJ'}],
     'available_markets': ['AD',
      'AE',
      'AR',
      'AT',
      'AU',
      'BE',
      'BG',
      'BH',
      'BO',
      'BR',
      'CA',
      'CH',
      'CL',
      'CO',
      'CR',
      'CY',
      'CZ',
      'DE',
      'DK',
      'DO',
      'DZ',
      'EC',
      'EE',
      'EG',
      'ES',
      'FI',
      'FR',
      'GB',
      'GR',
      'GT',
      'HK',
      'HN',
      'HU',
      'ID',
      'IE',
      'IL',
      'IN',
      'IS',
      'IT',
      'JO',
      'JP',
 