# Spotify Playlist Analysis


<span class = "myhighlight">Objective.</span> The goal of this project is to implement k-means clustering, a widely used machine-learning technique, in Python and apply it to analyze Spotify playlist data. Along the way we write functions that use lists, sets, dictionaries, sorting, and graph data structures for computational problem solving and analysis.


In [42]:
import csv
import time
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

First, we create a client credentials manager for server-to-server authentication by passing our app's credentials to spotipy's [`SpotifyClientCredentials`](https://github.com/spotipy-dev/spotipy/blob/master/spotipy/oauth2.py#L261) class. This authorization flow takes only a client ID and client secret and requires no user interaction; in exchange, it can only reach endpoints that do not need access to a user's private data.
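To make the flow concrete, here is a sketch of the token request a client-credentials manager sends, following Spotify's Client Credentials flow documentation: a POST to the accounts token endpoint with HTTP Basic authentication and the form field `grant_type=client_credentials`. `build_token_request` is a hypothetical helper that only builds the request and never contacts Spotify; spotipy performs this step (plus token caching and refresh) internally.

```python
import base64
from urllib.parse import urlencode

# Token endpoint used by Spotify's Client Credentials flow
TOKEN_URL = "https://accounts.spotify.com/api/token"

def build_token_request(client_id: str, client_secret: str):
    """Return (url, headers, body) for a client-credentials token request.

    The request is only constructed here, not sent.
    """
    # HTTP Basic auth: base64-encode "client_id:client_secret"
    raw = f"{client_id}:{client_secret}".encode("utf-8")
    auth = base64.b64encode(raw).decode("ascii")
    headers = {
        "Authorization": f"Basic {auth}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urlencode({"grant_type": "client_credentials"})
    return TOKEN_URL, headers, body

url, headers, body = build_token_request("my-client-id", "my-client-secret")
print(body)  # grant_type=client_credentials
```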
    

In [43]:
# Read the client ID and secret from environment variables instead of
# hard-coding them (spotipy's conventional variable names)
import os
client_id = os.environ['SPOTIPY_CLIENT_ID']
client_secret = os.environ['SPOTIPY_CLIENT_SECRET']

# Spotify authentication token
client_credentials_manager = SpotifyClientCredentials(client_id=client_id, client_secret=client_secret)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)

Next, we need detailed information about each track in the playlist. The following function takes a track ID, the ID of the track's main artist, and the playlist ID, and returns the track's metadata, its audio features, the artist's popularity and genres, and the playlist name.


In [44]:
# Get playlist song features and artist info
def playlist_features(track_id, artist_id, playlist_id):
    
    # Query the Spotify Web API for the track, its audio features,
    # its main artist, and the name of the playlist it belongs to
    meta = sp.track(track_id)
    audio_features = sp.audio_features(track_id)
    artist_info = sp.artist(artist_id)
    playlist_info = sp.playlist(playlist_id, fields='name')

    # Metadata
    name = meta['name']
    track_id = meta['id']
    album = meta['album']['name']
    artist = meta['album']['artists'][0]['name']
    artist_id = meta['album']['artists'][0]['id']
    release_date = meta['album']['release_date']
    length = meta['duration_ms']
    popularity = meta['popularity']

    # Main artist popularity and genres
    artist_pop = artist_info["popularity"]
    artist_genres = artist_info["genres"]

    # Track features
    acousticness = audio_features[0]['acousticness']
    danceability = audio_features[0]['danceability']
    energy = audio_features[0]['energy']
    instrumentalness = audio_features[0]['instrumentalness']
    liveness = audio_features[0]['liveness']
    loudness = audio_features[0]['loudness']
    speechiness = audio_features[0]['speechiness']
    tempo = audio_features[0]['tempo']
    valence = audio_features[0]['valence']
    key = audio_features[0]['key']
    mode = audio_features[0]['mode']
    time_signature = audio_features[0]['time_signature']
    
    # Basic playlist info
    playlist_name = playlist_info['name']

    return [name, track_id, album, artist, artist_id, release_date, length, popularity, 
            artist_pop, artist_genres, acousticness, danceability, 
            energy, instrumentalness, liveness, loudness, speechiness, 
            tempo, valence, key, mode, time_signature, playlist_name]
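The twelve repeated `audio_features[0][...]` lookups in the function above could be collapsed into a list comprehension over the key names, which keeps the key order and the column order in one place. A sketch; `FEATURE_KEYS` and `extract_features` are hypothetical names, and the sample record is made-up illustrative data, not a real API response.

```python
# The twelve audio-feature keys read by playlist_features, in the same order
FEATURE_KEYS = [
    "acousticness", "danceability", "energy", "instrumentalness",
    "liveness", "loudness", "speechiness", "tempo", "valence",
    "key", "mode", "time_signature",
]

def extract_features(audio_features_item: dict, keys=FEATURE_KEYS) -> list:
    """Return the values for `keys`, in order, from one audio-features record."""
    return [audio_features_item[k] for k in keys]

# Hypothetical record shaped like one element of sp.audio_features(...)
sample = {k: 0.5 for k in FEATURE_KEYS}
sample.update({"tempo": 120.0, "loudness": -6.0, "key": 5, "mode": 1, "time_signature": 4})

row = extract_features(sample)
print(len(row))  # 12
```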

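Choose a playlist to analyze and copy its URL from the Spotify player. The code below extracts the playlist ID from that link and uses spotipy's `playlist_tracks` method to retrieve the ID and main artist of each track. Because the API returns playlist items in pages (spotipy requests up to 100 items per page by default), `get_playlist_tracks` follows each page's `next` link with `sp.next` until all tracks have been collected.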



In [45]:
def get_playlist_tracks(playlist_URI):
    tracks = []
    results = sp.playlist_tracks(playlist_URI)
    tracks = results['items']
    while results['next']:
        results = sp.next(results)
        tracks.extend(results['items'])     
    return tracks

In [46]:
# Spotify playlist url
playlist_links = ["https://open.spotify.com/playlist/1nvpVNmzL7Vi1pXcQEiaLx?si=6842f41a58284be3"]

playlist_ids = []
track_ids = []
artist_uris = []

for link in playlist_links:
    playlist_URI = link.split("/")[-1].split("?")[0]
    
    # Extract song ids and artists from playlist
    for i in get_playlist_tracks(playlist_URI):
        track_ids.append(i['track']["id"])
        artist_uris.append(i['track']["artists"][0]["uri"])
        playlist_ids.append(playlist_URI)
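The string splitting above (`link.split("/")[-1].split("?")[0]`) works for plain share links; the standard library's URL parser is a slightly sturdier alternative that tolerates a trailing slash and rejects links that are not playlists. A hedged sketch; `playlist_id_from_url` is a hypothetical helper name.

```python
from urllib.parse import urlparse

def playlist_id_from_url(link: str) -> str:
    """Extract the playlist ID from an open.spotify.com playlist URL."""
    # urlparse separates the path from the "?si=..." query string
    path = urlparse(link).path                 # e.g. "/playlist/<id>"
    parts = [p for p in path.split("/") if p]  # drop empty segments
    if len(parts) < 2 or parts[-2] != "playlist":
        raise ValueError(f"not a playlist URL: {link}")
    return parts[-1]

link = "https://open.spotify.com/playlist/1nvpVNmzL7Vi1pXcQEiaLx?si=6842f41a58284be3"
print(playlist_id_from_url(link))  # 1nvpVNmzL7Vi1pXcQEiaLx
```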
  

In [47]:
len(playlist_ids)

106

--------------------

The following code loops over the track IDs collected above and calls `playlist_features` for each one. We then build a pandas DataFrame from the resulting rows, giving each field a column name, and save it to a CSV file.

In [48]:
# Loop over track ids
all_tracks = [playlist_features(track_ids[i], artist_uris[i], playlist_ids[i])
              for i in range(len(track_ids))]
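Each call to `playlist_features` makes four API requests (track, audio features, artist, playlist), so the loop above sends about 4 × 106 = 424 requests. spotipy also provides batch methods, `sp.tracks`, `sp.artists`, and `sp.audio_features`, that accept lists of IDs; Spotify's Web API reference caps these at 50 IDs per request for tracks and artists and 100 for audio features. A hedged sketch of the chunking step, with `chunked` as a hypothetical helper; the spotipy calls appear only in comments and are not executed here.

```python
from typing import Iterator, List, Sequence

def chunked(ids: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` IDs."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])

# With the notebook's client this could look like (not run here):
#   features = [f for batch in chunked(track_ids, 100) for f in sp.audio_features(batch)]
#   metadata = [t for batch in chunked(track_ids, 50) for t in sp.tracks(batch)["tracks"]]

ids = [f"id{i}" for i in range(106)]
print([len(b) for b in chunked(ids, 100)])  # [100, 6]
print([len(b) for b in chunked(ids, 50)])   # [50, 50, 6]
```

Batching trades a few large responses for many small ones, which also makes rate limiting less likely.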

In [49]:
# Create dataframe
df = pd.DataFrame(
    all_tracks, columns=['name', 'track_id', 'album', 'artist', 'artist_id','release_date',
                     'length', 'popularity', 'artist_pop', 'artist_genres',
                     'acousticness', 'danceability', 'energy',
                     'instrumentalness', 'liveness', 'loudness',
                     'speechiness', 'tempo', 'valence', 'key', 'mode',
                     'time_signature', 'playlist'])
# Save to csv file
df.to_csv("data/my_playlist.csv", sep=',')

--------------------------------------------------------


#### Spotify Playlists Data Extraction

In [50]:
spotify_playlists = pd.read_csv('data/spotify_playlists.csv', encoding_errors='ignore', index_col=0, header=0)
spotify_playlists['playlist'].value_counts()

New Music Friday      100
New Pop Picks         100
just hits             100
Hip Hop Controller     99
RapCaviar              51
Today's Top Hits       50
Hot Hits USA           50
Name: playlist, dtype: int64

------------------------------------------------------

### The Data


How many songs do we have?

In [51]:
# Number of rows and columns
rows, cols = df.shape
print(f'Number of songs: {rows}')
print(f'Number of attributes per song: {cols}')

Number of songs: 106
Number of attributes per song: 23


In [52]:
# Format a track as "Artist - Title"
def getMusicName(elem):
    return f"{elem['artist']} - {elem['name']}"

# Select song and get track info
anySong = df.loc[15]
anySongName = getMusicName(anySong)
print('name:', anySongName)

name: ODIE - In My Head


-----------------------

## Spotify Songs - Similarity Search




Below, we build a query that retrieves similar songs based on the Euclidean distance between their audio-feature vectors. The Euclidean distance between two points is the length of the line segment connecting them, so the closer the distance is to 0, the more similar two songs are.
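For two feature vectors $\mathbf{p}$ and $\mathbf{q}$ with $n$ attributes, the distance is $d(\mathbf{p},\mathbf{q}) = \sqrt{\sum_{i=1}^{n}(p_i - q_i)^2}$. A minimal NumPy sketch of that formula; `euclidean` is a hypothetical helper, equivalent to `np.linalg.norm(p - q)`.

```python
import numpy as np

def euclidean(p, q) -> float:
    """Square root of the sum of squared coordinate differences."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sqrt(np.sum((p - q) ** 2)))

print(euclidean([0, 0], [3, 4]))           # 5.0
print(euclidean([0.2, 0.8], [0.2, 0.8]))   # 0.0 (identical feature vectors)
```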



#### [KNN Algorithm](https://www.kaggle.com/code/leomauro/spotify-songs-similarity-search/notebook)


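The k-nearest neighbors (KNN) query returns the $k$ elements closest to a query point. (A range query, by contrast, returns every element within a predefined radius of the query point.) Our `knnQuery` function also returns the $k$ farthest elements, which we use below to find the least similar songs.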
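A KNN query must compute the distance from the query point to every candidate. With NumPy, that can be one array operation instead of a Python loop over DataFrame rows. A sketch assuming the features are already in a 2-D float array; `knn_indices` is a hypothetical name, and unlike `knnQuery` below it returns the farthest points farthest-first.

```python
import numpy as np

def knn_indices(points: np.ndarray, query: np.ndarray, k: int):
    """Return (nearest, farthest) row indices of `points` relative to `query`.

    nearest is ordered closest-first; farthest is ordered farthest-first.
    """
    # Distance from the query to every row, computed in one vectorized step
    dists = np.linalg.norm(points - query, axis=1)
    order = np.argsort(dists, kind="stable")
    return order[:k], order[::-1][:k]

points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]])
near, far = knn_indices(points, np.array([0.1, 0.0]), k=2)
print(near, far)  # [0 1] [3 2]
```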



In [53]:
def knnQuery(queryPoint, arrCharactPoints, k):
    queryVals = queryPoint.tolist()
    distVals = []
    
    # Work on a copy so the caller's DataFrame is not modified
    tmp = arrCharactPoints.copy(deep = True)  
    for _, row in tmp.iterrows():
        feat = row.values.tolist()
        
        # Calculate sum of squared differences
        ssd = sum((feat[i] - queryVals[i]) ** 2 for i in range(len(queryVals)))
        
        # Get euclidean distance
        distVals.append(ssd ** 0.5)
        
    tmp['distance'] = distVals
    tmp = tmp.sort_values('distance')
    
    # K closest and furthest points
    return tmp.head(k).index, tmp.tail(k).index

In [54]:
# Execute KNN removing the query point
def querySimilars(df, columns, idx, func, param):
    arr = df[columns].copy(deep = True)
    queryPoint = arr.loc[idx]
    arr = arr.drop([idx])
    return func(queryPoint, arr, param)

**KNN Query Example.** 

The helper functions let us choose both the query point and the attribute columns used to measure similarity. The following code selects seven audio features, all on a 0-to-1 scale, and searches for the $k$ songs closest to (and farthest from) the query song.

Let's search for $k=3$ songs similar to the query point $\textrm{songIndex} = 4$.

In [55]:
from sklearn import preprocessing
scaler = preprocessing.MinMaxScaler()  # instantiate a scaler
# Most audio features lie in [0, 1], but loudness is measured in dB
# (typically negative), so rescale it to the same [0, 1] range
loudness2 = df["loudness"].values
loudness_scaled = scaler.fit_transform(loudness2.reshape(-1, 1))
df['loudness_scaled'] = loudness_scaled.ravel()
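With its default `feature_range=(0, 1)`, `MinMaxScaler` maps each column through $x' = (x - x_{\min}) / (x_{\max} - x_{\min})$. The same arithmetic in plain NumPy, as a sketch; `min_max_scale` is a hypothetical helper, and it assumes the column is not constant.

```python
import numpy as np

def min_max_scale(values) -> np.ndarray:
    """Rescale values to [0, 1] via (x - min) / (max - min)."""
    x = np.asarray(values, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        raise ValueError("cannot scale a constant column")
    return (x - x.min()) / span

# Loudness in dB is usually negative; after scaling the quietest track is 0
scaled = min_max_scale([-12.0, -6.0, -3.0])
print(scaled)  # approximately 0.0, 0.667, 1.0
```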

In [56]:
# Select song and column attributes
songIndex = 4 # query point
columns = ['acousticness', 'danceability', 'energy', 'instrumentalness', 'liveness', 
           #'loudness_scaled', 'tempo', 
           'speechiness', 'valence']

# Set query parameters
func, param = knnQuery,3

# Implement query
response = querySimilars(df, columns, songIndex, func, param)

print("---- Query Point ----")
print(getMusicName(df.loc[songIndex]))
print('---- k = 3 similar songs ----')
for track_id in response[0]:
    track_name = getMusicName(df.loc[track_id])
    print(track_name)
print('---- k = 3 nonsimilar songs ----')
for track_id in response[1]:
    track_name = getMusicName(df.loc[track_id])
    print(track_name)

---- Query Point ----
AG Club - Memphis
---- k = 3 similar songs ----
Roddy Ricch - Stop Breathing
The Game - Eazy
Lil Yachty - Yacht Club (feat. Juice WRLD)
---- k = 3 nonsimilar songs ----
Post Malone - Internet
ODIE - In My Head
Frank Ocean - In My Room


The code below runs the same query for every track in the playlist and tallies how often each song appears among the $k$ most similar, and among the $k$ least similar, songs of the other tracks.

In [57]:
from collections import Counter

similar_count = Counter()     # times a song is among another song's k nearest
nonsimilar_count = Counter()  # times a song is among another song's k farthest

for track_index in df.index:
    nearest, farthest = querySimilars(df, columns, track_index, func, param)

    # Tally similar and non-similar songs by "Artist - Title"
    similar_count.update(getMusicName(df.loc[i]) for i in nearest)
    nonsimilar_count.update(getMusicName(df.loc[i]) for i in farthest)
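The loop above runs one full KNN query per track, i.e. $O(n^2)$ distance computations done row by row in Python. For a playlist-sized dataset the whole distance matrix fits easily in memory, so a sketch with NumPy broadcasting can compute it once, mask the diagonal (each query point itself, which `querySimilars` drops), and tally neighbours with `collections.Counter`. `all_pairs_knn` is a hypothetical helper; memory grows as $n^2$, so this suits small playlists rather than whole catalogs.

```python
import numpy as np
from collections import Counter

def all_pairs_knn(points: np.ndarray, k: int) -> np.ndarray:
    """For every row, return the indices of its k nearest other rows."""
    # (n, n) matrix of pairwise Euclidean distances via broadcasting
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Exclude each point from its own neighbour list
    np.fill_diagonal(dists, np.inf)
    return np.argsort(dists, axis=1, kind="stable")[:, :k]

points = np.array([[0.0], [1.0], [1.1], [10.0]])
neighbours = all_pairs_knn(points, k=1)

# How often each row appears as some other row's nearest neighbour
counts = Counter(neighbours.ravel().tolist())
print(counts)
```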

NON SIMILAR SONG COUNT:

In [60]:
nonsimilar = dict(sorted(nonsimilar_count.items(), key=lambda item: item[1], reverse=True))

print('---- NON SIMILAR SONG COUNTS ----')
for track_name, track_count in nonsimilar.items():
    if track_count >= 8:
        print(track_name, ':', track_count)

---- NON SIMILAR SONG COUNTS ----
Frank Ocean - In My Room : 83
ODIE - In My Head : 46
Post Malone - Internet : 42
Blxst - Hurt : 25
Lil Uzi Vert - The Way Life Goes (feat. Nicki Minaj & Oh Wonder) - Remix : 19
Kanye West - Waves : 12
Tyla Yaweh - Understand Me : 11
Mark Battles - Lemme Talk : 9
Kanye West - Violent Crimes : 9
SAINt JHN - The Best Part of Life : 8


SIMILAR SONG COUNT:

In [61]:
similar = dict(sorted(similar_count.items(), key=lambda item: item[1], reverse=True))

print('---- SIMILAR SONG COUNTS ----')
for track_name, track_count in similar.items():
    if track_count >= 5:
        print(track_name, ':', track_count)

---- SIMILAR SONG COUNTS ----
YoungBoy Never Broke Again - Home Ain't Home (feat. Rod Wave) : 9
Kodak Black - MoshPit (feat. Juice WRLD) : 8
Lil Xan - Lies (feat. Lil Skies) : 8
Tyla Yaweh - High Right Now (feat. Wiz Khalifa) - Remix : 7
iann dior - I might : 7
mike. - commas : 7
Juice WRLD - Life's A Mess (feat. Halsey) : 6
Fresco Trey - Key To My Heart : 6
Azizi Gibson - Rain : 6
Mac Miller - Weekend (feat. Miguel) : 5
Polo G - RAPSTAR : 5
Rae Sremmurd - Denial : 5
Juice WRLD - Stay High : 5
Lil Uzi Vert - The Way Life Goes (feat. Nicki Minaj & Oh Wonder) - Remix : 5
Sheff G - Weight On Me : 5
whiterosemoxie - west side boys : 5
Post Malone - Waiting For Never : 5
Post Malone - Big Lie : 5
Juice WRLD - In My Head : 5
Justin Stone - Goldmine : 5
Juice WRLD - Rich And Blind : 5
Baby Keem - 16 : 5
KILJ - No Remedy : 5


---------------------------------------------------------------


### Similar Artists Web Visual


First, we rank the artists in the playlist by how often they occur. `DataFrame.value_counts` returns the number of rows for each unique (artist, artist ID) pair, sorted in descending order.


In [62]:
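# Count tracks per (artist, artist_id) pair, most frequent first
tallyArtists = df.value_counts(["artist", "artist_id"]).reset_index(name='counts')
# Row 1 is the second most frequent artist (Post Malone here); use .iloc[0] for the top artist
topArtist = tallyArtists['artist_id'].iloc[1]
tallyArtists.head(4)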

|   | artist | artist_id | counts |
|---|--------|-----------|--------|
| 0 | Juice WRLD | 4MCBfE4596Uoi2O4DtmEMz | 10 |
| 1 | Post Malone | 246dkjvS1zLTtiykXe5h60 | 8 |
| 2 | SAINt JHN | 0H39MdGGX6dbnnQPt6NQkZ | 3 |
| 3 | Lil Uzi Vert | 4O15NlyKLIASxsJ0PrXPfz | 3 |
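The same `value_counts` call on a toy frame shows how the ranking and the positional lookup fit together: `.iloc[0]` picks the most frequent artist and `.iloc[1]` the runner-up. The frame below is made-up illustrative data, not the notebook's playlist.

```python
import pandas as pd

# Hypothetical miniature playlist (illustrative values only)
tracks = pd.DataFrame({
    "artist":    ["A", "B", "A", "C", "A", "B"],
    "artist_id": ["a1", "b1", "a1", "c1", "a1", "b1"],
})

# Count each (artist, artist_id) pair; the result is sorted in descending order
tally = tracks.value_counts(["artist", "artist_id"]).reset_index(name="counts")
print(tally)

top_artist_id = tally["artist_id"].iloc[0]     # most frequent artist
runner_up_id = tally["artist_id"].iloc[1]      # second most frequent artist
```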


#### Links Dataset

We retrieve the selected artist and its related artists by passing the artist ID to spotipy's `artist` and `artist_related_artists` methods. According to Spotify, related artists are determined from its community's listening history; the code below treats the returned list as ordered from most to least similar.

In [63]:
# create links table
a = sp.artist(topArtist)
ra = sp.artist_related_artists(topArtist)

# dictionary of lists 
links_dict = {"source_name": [], "source_id": [], "target_name": [], "target_id": []}
for artist in ra['artists']:
    links_dict["source_name"].append(a['name'])
    links_dict["source_id"].append(a['id'])
    links_dict["target_name"].append(artist['name'])
    links_dict["target_id"].append(artist['id'])
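The `links_dict` table built above is an edge list: each row is an edge from a source artist to one of its related artists. A sketch of turning such an edge list into an adjacency map, which makes degree counts and traversals straightforward. `build_adjacency` is a hypothetical helper, treating edges as undirected is an assumption made for the web visual, and the edge list is a toy example.

```python
from collections import defaultdict

def build_adjacency(sources, targets):
    """Turn parallel source/target lists into an undirected adjacency map."""
    graph = defaultdict(set)
    for s, t in zip(sources, targets):
        graph[s].add(t)
        graph[t].add(s)   # treat "related to" as symmetric
    return dict(graph)

# Hypothetical edge list in the same shape as links_dict
links_example = {
    "source_name": ["X", "X", "Y"],
    "target_name": ["Y", "Z", "Z"],
}
adj = build_adjacency(links_example["source_name"], links_example["target_name"])
degree = {name: len(nbrs) for name, nbrs in adj.items()}
print(degree)  # {'X': 2, 'Y': 2, 'Z': 2}
```

Because neighbours are stored in sets, a pair that appears twice in the edge list counts once here, whereas a plain count of name occurrences counts it twice.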

Let's take it a step further and query the API for the artists related to the first four artists in that list. In other words, we collect two generations of related artists around the selected artist.

In [65]:
for i in range(0, 4):
    a = sp.artist(links_dict['target_id'][i])
    ra = sp.artist_related_artists(links_dict['target_id'][i])
    time.sleep(.5)
    for artist in ra['artists']:
        links_dict["source_name"].append(a['name'])
        links_dict["source_id"].append(a['id'])
        links_dict["target_name"].append(artist['name'])
        links_dict["target_id"].append(artist['id'])

# Convert links dict to dataframe
links = pd.DataFrame(links_dict) 

# Export to excel sheet             
links.to_excel("data/links.xlsx", index = False)
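The two cells above amount to a breadth-first expansion of the related-artists graph, limited to two levels and to the first four neighbours of the seed artist. A generalized sketch: `expand_related` is hypothetical, and `related` is an assumed callback (for example, a wrapper around `sp.artist_related_artists` that also sleeps between calls). Unlike the notebook loop, it never queries the same artist twice. The toy graph stands in for real API data.

```python
from collections import deque

def expand_related(seed, related, depth=2, expand=None):
    """Breadth-first expansion of a 'related artists' graph.

    related(artist) -> list of related artists, most similar first (assumed).
    All returned neighbours become edges, but only the first `expand`
    neighbours of each queried artist are queried in turn.
    Returns a list of (source, target) edges.
    """
    edges = []
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        artist, level = queue.popleft()
        if level == depth:          # artists at the last level are not queried
            continue
        for rank, other in enumerate(related(artist)):
            edges.append((artist, other))
            if other not in seen and (expand is None or rank < expand):
                seen.add(other)
                queue.append((other, level + 1))
    return edges

# Toy graph standing in for Spotify's related-artists data (hypothetical)
toy = {"S": ["A", "B", "C"], "A": ["S", "D"], "B": ["E"], "C": ["F"], "D": ["G"]}
calls = []

def toy_related(artist):
    calls.append(artist)            # record which artists get queried
    return toy.get(artist, [])

edges = expand_related("S", toy_related, depth=2, expand=2)
print(edges)
print(calls)  # ['S', 'A', 'B']
```

With `depth=2, expand=4` this mirrors the notebook: the seed plus its first four related artists are queried.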

In [66]:
links.head(3)

|   | source_name | source_id | target_name | target_id |
|---|-------------|-----------|-------------|-----------|
| 0 | Post Malone | 246dkjvS1zLTtiykXe5h60 | Rae Sremmurd | 7iZtZyCzp3LItcw1wtPI3D |
| 1 | Post Malone | 246dkjvS1zLTtiykXe5h60 | Huncho Jack | 6extd4B6hl8VTmnlhpl2bY |
| 2 | Post Malone | 246dkjvS1zLTtiykXe5h60 | Tyla Yaweh | 1MXZ0hsGic96dWRDKwAwdr |


#### Points Dataset

In [67]:
# create "points" table             
all_artist_ids = list(set(links_dict['source_id'] + links_dict['target_id']))


In [68]:
from collections import Counter

# Number of edges each artist name appears in (as source or target)
name_count = Counter(links_dict['source_name'] + links_dict['target_name'])

In [69]:
# dictionary of lists 
points_dict = {"id": [], "name": [], "connections": [], "followers": [], "popularity": [], "url": [], "image": []}

for artist_id in all_artist_ids:
    time.sleep(.5)  # throttle requests to avoid hitting the API rate limit
    a = sp.artist(artist_id)
    points_dict['id'].append(artist_id)
    points_dict['name'].append(a['name'])
    points_dict['connections'].append(name_count[a['name']])
    points_dict['followers'].append(a['followers']['total'])
    points_dict['popularity'].append(a['popularity'])
    points_dict['url'].append(a['external_urls']['spotify'])
    # Some artists have no profile image
    points_dict['image'].append(a['images'][0]['url'] if a['images'] else None)

# Convert points dict to dataframe
points = pd.DataFrame(points_dict) 

# Export to excel sheet             
points.to_excel("data/points.xlsx", index = False)

In [70]:
points.head(3)

|   | id | name | connections | followers | popularity | url | image |
|---|----|------|-------------|-----------|------------|-----|-------|
| 0 | 0VRj0yCOv2FXJNP47XQnx5 | Quavo | 3 | 6179310 | 82 | https://open.spotify.com/artist/0VRj0yCOv2FXJN... | https://i.scdn.co/image/ab6761610000e5eb1454de... |
| 1 | 7iZtZyCzp3LItcw1wtPI3D | Rae Sremmurd | 45 | 7008543 | 73 | https://open.spotify.com/artist/7iZtZyCzp3LItc... | https://i.scdn.co/image/ab6761610000e5eb209b54... |
| 2 | 34Y0ldeyUv7jBvukWOGASO | Bobby Shmurda | 2 | 1390475 | 60 | https://open.spotify.com/artist/34Y0ldeyUv7jBv... | https://i.scdn.co/image/ab6761610000e5ebee12e6... |


#### Flourish Network Graph

The following visualization is based on the [Spotify Similar Artists API](https://unboxed-analytics.com/data-technology/visualizing-rap-communities-wtih-python-spotifys-api/) article and was created with Flourish Studio.


In [None]:
%%html

<iframe src='https://flo.uri.sh/visualisation/12232729/embed' title='Interactive or visual content' class='flourish-embed-iframe' frameborder='0' scrolling='no' style='width:100%;height:600px;' sandbox='allow-same-origin allow-forms allow-scripts allow-downloads allow-popups allow-popups-to-escape-sandbox allow-top-navigation-by-user-activation'></iframe><div style='width:100%!;margin-top:4px!important;text-align:right!important;'><a class='flourish-credit' href='https://public.flourish.studio/visualisation/12232729/?utm_source=embed&utm_campaign=visualisation/12232729' target='_top' style='text-decoration:none!important'><img alt='Made with Flourish' src='https://public.flourish.studio/resources/made_with_flourish.svg' style='width:105px!important;height:16px!important;border:none!important;margin:0!important;'> </a></div>

------------------------------------------


## Clustering with pycaret


In [76]:
# Select columns to keep; .copy() makes an independent DataFrame, so adding
# columns to it later does not trigger pandas' SettingWithCopyWarning
data_keep = df[['track_id', 'name', 'danceability', 'energy', 'tempo', 'valence']].copy()
data_keep.describe()

|       | danceability | energy | tempo | valence |
|-------|--------------|--------|-------|---------|
| count | 106.0 | 106.0 | 106.0 | 106.0 |
| mean | 0.6795 | 0.586915 | 119.174302 | 0.398518 |
| std | 0.115911 | 0.122452 | 33.817415 | 0.208096 |
| min | 0.375 | 0.214 | 74.013 | 0.0357 |
| 25% | 0.62075 | 0.50275 | 83.60175 | 0.2475 |
| 50% | 0.7005 | 0.5985 | 120.1085 | 0.373 |
| 75% | 0.7605 | 0.67025 | 150.7845 | 0.5545 |
| max | 0.935 | 0.881 | 178.046 | 0.835 |


In [79]:
# Build each track's Spotify URL from its ID (vectorized string concatenation)
data_keep['url'] = 'https://open.spotify.com/track/' + data_keep['track_id']


In [84]:
from pycaret.clustering import *
s = setup(data_keep, normalize = True, 
                   ignore_features = ['name', 'track_id', 'url'],
                   session_id = 42)

ModuleNotFoundError: No module named 'pycaret'

In [None]:
# Create Model
m = create_model('kmeans', num_clusters=3)
results = assign_model(m)
results[['Cluster', 'name']].head(10)


In [None]:
# Show a random sample of up to 15 songs from one cluster
def select_cluster(cluster_id):
    _df = results[results['Cluster'] == f'Cluster {cluster_id}'][['name', 'url']]
    return _df.sample(n=min(15, len(_df)), random_state=42)

# With num_clusters=3 the clusters are labeled 'Cluster 0' to 'Cluster 2'
select_cluster(2)

In [85]:
!pip install pycaret

Successfully installed catboost-1.1.1 cufflinks-0.17.3 imbalanced-learn-0.10.0 kmodes-0.12.2 lightgbm-3.3.3 mlflow-2.1.1 pycaret-2.2.2
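If pycaret cannot be installed in a given environment, the same idea (standardize the audio features, then fit k-means with three clusters) can be sketched directly with scikit-learn. This uses z-score scaling, which is pycaret's default `normalize_method`; `cluster_tracks` is a hypothetical helper, the demo frame is made-up data, and the cluster numbers are arbitrary, so they need not match pycaret's labels.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def cluster_tracks(frame: pd.DataFrame, features, n_clusters=3, seed=42) -> pd.Series:
    """Standardize `features` and label each row with a k-means cluster id."""
    X = StandardScaler().fit_transform(frame[features].to_numpy(dtype=float))
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return pd.Series(model.fit_predict(X), index=frame.index, name="Cluster")

# Hypothetical stand-in for data_keep with two obvious groups
demo = pd.DataFrame({
    "danceability": [0.90, 0.88, 0.91, 0.30, 0.32, 0.29],
    "energy":       [0.85, 0.80, 0.83, 0.20, 0.25, 0.22],
})
labels = cluster_tracks(demo, ["danceability", "energy"], n_clusters=2)
print(labels.tolist())
```

On the notebook's data this would be called as `cluster_tracks(data_keep, ['danceability', 'energy', 'tempo', 'valence'])`.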

---------------------------------------------

## Organized Songs in a Playlist

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import cluster, decomposition

In [None]:
songs = pd.read_csv('data/my_playlist.csv', encoding_errors='ignore', index_col=0, header=0)


In [None]:
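# Song names as labels; the 12 audio-feature columns as features
labels = songs['name'].values
feature_cols = ['acousticness', 'danceability', 'energy', 'instrumentalness',
                'liveness', 'loudness', 'speechiness', 'tempo', 'valence',
                'key', 'mode', 'time_signature']
X = songs[feature_cols].to_numpy(dtype=float)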

In [None]:
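# Affinity Propagation (not k-means): it chooses the number of clusters itself,
# and a lower (more negative) preference yields fewer clusters
model = cluster.AffinityPropagation(preference=-200)
model.fit(X)


In [None]:
from collections import defaultdict

# Group song names by predicted cluster
predictions = defaultdict(list)
for p, n in zip(model.predict(X), labels):
    predictions[p].append(n)

for p in sorted(predictions):
    print("Category", p)
    print("-----")
    for n in predictions[p]:
        print(n)
    print("")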
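The objective at the top calls for implementing k-means, while the notebook delegates clustering to pycaret and, in this last section, to scikit-learn's `AffinityPropagation`, a different algorithm. A minimal from-scratch sketch of Lloyd's k-means algorithm in NumPy follows; `kmeans` and `_nearest` are hypothetical names, initialization picks random rows (k-means++ is the usual improvement), and features such as tempo and loudness should be scaled first or they will dominate the distances.

```python
import numpy as np

def _nearest(X: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Index of the closest centroid (Euclidean) for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
    return dists.argmin(axis=1)

def kmeans(X, k: int, n_iter: int = 100, seed: int = 0):
    """Lloyd's algorithm for k-means. Returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise with k distinct rows chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _nearest(X, centroids)                       # assignment step
        new_centroids = np.array([                            # update step
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)                                 # keep empty clusters in place
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    # Final labels are consistent with the returned centroids
    return _nearest(X, centroids), centroids

blobs = np.array([[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]])
labels, centroids = kmeans(blobs, k=2)
print(labels)  # the first three rows share one label, the last three the other
```

Applied to this section's data, `kmeans(X, k=3)` would cluster the 12 audio features, ideally after min-max or z-score scaling.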