# Installing Spotipy

To use the Spotify Web API through Spotipy, we first need a Spotify account and an app registered under it, which provides a client ID and a client secret. With those credentials we initialize the API client and then look at the `search` method, which takes a query `q`. In this example we try it with Taylor Swift:

## Loading credentials from another config file


In [1]:
# config.py holds the app credentials; import only the names used below
from config import client_id, client_secret
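
The notebook does not show `config.py` itself. As a sketch only, such a module could read the two values from environment variables so the secrets stay out of version control. The helper name `load_credentials` and the variable names `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` are assumptions made for illustration.

```python
import os


def load_credentials(env=None):
    """Return (client_id, client_secret) from a mapping of environment variables."""
    env = os.environ if env is None else env
    try:
        return env["SPOTIPY_CLIENT_ID"], env["SPOTIPY_CLIENT_SECRET"]
    except KeyError as missing:
        # Fail early with a readable message instead of a bare KeyError
        raise RuntimeError(f"Missing credential: {missing}") from None


# Example with a stand-in mapping instead of the real environment
client_id, client_secret = load_credentials(
    {"SPOTIPY_CLIENT_ID": "my-id", "SPOTIPY_CLIENT_SECRET": "my-secret"}
)
```

Calling `load_credentials()` with no argument would read from the real process environment.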

## Starting with Spotify API


**Caution!**

The Spotify API limits the number of calls you can make within a 30-second window. If you exceed this limit, requests fail with a "429" error code, and you may have to wait several hours before you can make new requests.

Be extremely careful to avoid this problem.
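
One way to react to a 429 instead of hammering the API is to wait and retry. The helper below is a minimal sketch, not part of Spotipy: `call_with_retry` is a hypothetical name, and it assumes the raised exception exposes `http_status` and `headers` attributes (as `spotipy.SpotifyException` does) and that the response carries a `Retry-After` header given in seconds.

```python
import time


def call_with_retry(func, max_retries=3, default_wait=30, sleep=time.sleep):
    """Call func(); on an HTTP 429 error, wait and try again."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception as exc:
            # Assumption: the exception carries http_status and headers
            is_rate_limit = getattr(exc, "http_status", None) == 429
            if not is_rate_limit or attempt == max_retries:
                raise
            headers = getattr(exc, "headers", None) or {}
            # Fall back to default_wait when no Retry-After header is present
            wait = int(headers.get("Retry-After", default_wait))
            sleep(wait)
```

In the notebook this could wrap a single request, for example `call_with_retry(lambda: sp.search(q="Taylor Swift"))`.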

In [2]:
# Import libraries

import time

import numpy as np
import pandas as pd
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

In [3]:
# Establish the connection: initialize Spotipy with the app credentials
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=client_id,
                                                           client_secret=client_secret))

In [4]:
hot_songs = pd.read_csv('hot_songs.csv')
not_hot_songs = pd.read_csv('not_hot_songs.csv')

hot_songs.head()

Unnamed: 0,Title,Artist
0,Paint The Town Red,Doja Cat
1,Snooze,SZA
2,Fast Car,Luke Combs
3,Cruel Summer,Taylor Swift
4,I Remember Everything,Zach Bryan Featuring Kacey Musgraves


In [5]:
not_hot_songs.head()

Unnamed: 0,Title,Artist
0,Yellow,Coldplay
1,All The Small Things,blink-182
2,Breathe,Faith Hill
3,In the End,Linkin Park
4,Bye Bye Bye,*NSYNC


## Search functions

In [7]:
# EXAMPLE
# Search for a particular artist to see which tracks come back. Here the artist is Taylor Swift.

name = "Taylor Swift"  # q must be a single string, not a list
result = sp.search(q=name)
result['tracks']['items'][1]['artists']

[{'external_urls': {'spotify': 'https://open.spotify.com/artist/06HL4z0CvFAxyc27GXpf02'},
  'href': 'https://api.spotify.com/v1/artists/06HL4z0CvFAxyc27GXpf02',
  'id': '06HL4z0CvFAxyc27GXpf02',
  'name': 'Taylor Swift',
  'type': 'artist',
  'uri': 'spotify:artist:06HL4z0CvFAxyc27GXpf02'}]
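
The search response is a nested dictionary, so it can help to flatten it into a few fields per track. The sketch below uses a hand-written stand-in for the response (not real API output), and `summarize_tracks` is a hypothetical helper name.

```python
def summarize_tracks(result):
    """Return (track name, artist names, track id) for each item in a search result."""
    return [
        (item["name"], [artist["name"] for artist in item["artists"]], item["id"])
        for item in result["tracks"]["items"]
    ]


# Hand-written stand-in with the same nesting as result['tracks']['items']
sample = {
    "tracks": {
        "items": [
            {
                "name": "Example Song",
                "id": "abc123",
                "artists": [{"name": "Example Artist"}],
            }
        ]
    }
}

print(summarize_tracks(sample))
```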

### Create search_song function

In [8]:
def search_song1(df: pd.DataFrame, limit=1) -> pd.DataFrame:
    """Look up a Spotify track ID for every (Title, Artist) row of df."""
    sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=client_id,
                                                               client_secret=client_secret))
    chunks = np.array_split(df, 50)  # process the songs in 50 chunks
    chunks_ids = []
    for chunk_index, chunk in enumerate(chunks):
        print("Collecting IDs for chunk...", chunk_index)
        for _, row in chunk.iterrows():
            title, artist = row['Title'], row['Artist']  # Get title and artist from row
            try:
                # The field filters are "track:" and "artist:", separated by a space
                query = f"track:{title} artist:{artist}"
                results = sp.search(q=query, limit=limit)
                track_id = results['tracks']['items'][0]['id']
                chunks_ids.append(track_id)
            except Exception as e:
                print(f"Song not found for Title: {title}, Artist: {artist}")
                print(f"Error occurred: {e}")
                chunks_ids.append("None")  # placeholder keeps the rows aligned with df
        time.sleep(20)  # pause between chunks to stay under the rate limit
    return pd.DataFrame(chunks_ids, columns=['track_id'])
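
Some chart entries credit guests in the artist field (for example "Zach Bryan Featuring Kacey Musgraves"), which can make an artist filter too strict. As a sketch, the query could be built by a small helper that keeps only the lead artist. `build_query` is a hypothetical name, and splitting on the word "Featuring" is an assumption about how the CSV credits guests.

```python
import re


def build_query(title, artist):
    """Build a Spotify search query using the track: and artist: field filters."""
    # Assumption: guest artists follow the word "Featuring"; keep the lead artist only
    lead_artist = re.split(r"\s+Featuring\s+", artist, maxsplit=1)[0]
    return f"track:{title} artist:{lead_artist}"


print(build_query("Snooze", "SZA"))
print(build_query("I Remember Everything", "Zach Bryan Featuring Kacey Musgraves"))
```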

#### Apply function to not_hot_songs

In [9]:
nhs1 = search_song1(not_hot_songs)

Collecting IDs for chunk... 0
Collecting IDs for chunk... 1
Collecting IDs for chunk... 2
Collecting IDs for chunk... 3
Collecting IDs for chunk... 4
Collecting IDs for chunk... 5
Collecting IDs for chunk... 6
Collecting IDs for chunk... 7
Collecting IDs for chunk... 8
Collecting IDs for chunk... 9
Collecting IDs for chunk... 10
Collecting IDs for chunk... 11
Collecting IDs for chunk... 12
Collecting IDs for chunk... 13
Collecting IDs for chunk... 14
Collecting IDs for chunk... 15
Collecting IDs for chunk... 16
Collecting IDs for chunk... 17
Collecting IDs for chunk... 18
Collecting IDs for chunk... 19
Collecting IDs for chunk... 20
Collecting IDs for chunk... 21
Song not found for Title: Intro, Artist: The xx
Error occurred: list index out of range
Collecting IDs for chunk... 22
Song not found for Title: Over, Artist: Drake
Error occurred: list index out of range
Collecting IDs for chunk... 23
Collecting IDs for chunk... 24
Collecting IDs for chunk... 25
Collecting IDs for chunk... 26

In [10]:
nhs1.head(100)

Unnamed: 0,track_id
0,3AJwUDP919kvQ9QcozQPxg
1,2m1hi0nfMR9vdGC8UcrnwU
2,3y4LxiYMgDl4RethdzpmNe
3,60a0Rd6pjrkxjPbaKzXjfq
4,62bOmKYxYg7dhrC6gH9vFn
...,...
95,3VEZvzr84WVnoorZ4tlBSw
96,7w57O4o0xCTn9YpKuaPZDd
97,10GJQkjRJcZhGTLagFOC62
98,6znv7i4Wif5fLwI6OjKHZ4


In [11]:
nhs1.size

2298

In [12]:
#Concat df: not_hot_songs+ids

not_hot_songs_ids1 = pd.concat([not_hot_songs, nhs1], axis=1)
not_hot_songs_ids1 = not_hot_songs_ids1[not_hot_songs_ids1['track_id'] != "None"]
not_hot_songs_ids1.head()

Unnamed: 0,Title,Artist,track_id
0,Yellow,Coldplay,3AJwUDP919kvQ9QcozQPxg
1,All The Small Things,blink-182,2m1hi0nfMR9vdGC8UcrnwU
2,Breathe,Faith Hill,3y4LxiYMgDl4RethdzpmNe
3,In the End,Linkin Park,60a0Rd6pjrkxjPbaKzXjfq
4,Bye Bye Bye,*NSYNC,62bOmKYxYg7dhrC6gH9vFn
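
The concat-then-filter step works because both frames share the same 0..n-1 index, so `axis=1` pairs row *i* of the songs with row *i* of the IDs. A toy version with made-up rows shows the effect, including the `"None"` placeholder for a failed search:

```python
import pandas as pd

songs = pd.DataFrame({"Title": ["A", "B", "C"], "Artist": ["x", "y", "z"]})
ids = pd.DataFrame({"track_id": ["id_a", "None", "id_c"]})  # "None" marks a failed search

with_ids = pd.concat([songs, ids], axis=1)        # rows are paired by index
found = with_ids[with_ids["track_id"] != "None"]  # drop songs without an ID
found = found.reset_index(drop=True)              # renumber rows 0..n-1 after filtering

print(found)
```

Resetting the index is optional here, but it keeps later positional joins from silently misaligning.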


In [None]:
#not_hot_songs_ids = not_hot_songs_ids1[['Title', 'Artist', 'track_id']]

In [15]:
not_hot_songs_ids1.head(100)


In [14]:
#Save the final DataFrame with ID's
not_hot_songs_ids1.to_csv('not_hot_songs_ids.csv', index=False)

#### Apply function to hot_songs

In [16]:
hs1 = search_song1(hot_songs)

Collecting IDs for chunk... 0
Collecting IDs for chunk... 1
Collecting IDs for chunk... 2
Collecting IDs for chunk... 3
Collecting IDs for chunk... 4
Collecting IDs for chunk... 5
Collecting IDs for chunk... 6
Collecting IDs for chunk... 7
Collecting IDs for chunk... 8
Collecting IDs for chunk... 9
Collecting IDs for chunk... 10
Collecting IDs for chunk... 11
Collecting IDs for chunk... 12
Collecting IDs for chunk... 13
Collecting IDs for chunk... 14
Collecting IDs for chunk... 15
Collecting IDs for chunk... 16
Collecting IDs for chunk... 17
Collecting IDs for chunk... 18
Collecting IDs for chunk... 19
Collecting IDs for chunk... 20
Collecting IDs for chunk... 21
Collecting IDs for chunk... 22
Collecting IDs for chunk... 23
Collecting IDs for chunk... 24
Collecting IDs for chunk... 25
Collecting IDs for chunk... 26
Collecting IDs for chunk... 27
Collecting IDs for chunk... 28
Collecting IDs for chunk... 29
Collecting IDs for chunk... 30
Collecting IDs for chunk... 31
Collecting IDs for

In [17]:
hs1.head(100)

Unnamed: 0,track_id
0,56y1jOTK0XSvJzVv9vHQBK
1,4iZ4pt7kvcaH6Yo8UoZ4s2
2,1mMLMZYXkMueg65jRRWG1l
3,2EGaDf0cPX789H3LNeB03D
4,4KULAymBBJcPRpk1yO4dOG
...,...
95,6BOP0cv1eeXcvi1oE8bDVZ
96,6VdBDm20nkyk6A29P785aJ
97,3jfywRZH6cB2iLyKqo4EZd
98,1o8Z7GD1CeOaVBEyuzu4HO


In [18]:
# Concat df: hot_songs + ids
# Note: the same title by two artists (e.g. Neil Young and Bob Marley) is not a duplicate;
# a track ID depends on song, artist, and album.
# Optional: hot_songs_ids1 = hot_songs_ids1.drop_duplicates(keep="first")

hot_songs_ids1 = pd.concat([hot_songs, hs1], axis=1)
hot_songs_ids1 = hot_songs_ids1[hot_songs_ids1['track_id'] != "None"]
hot_songs_ids1.head(100)

Unnamed: 0,Title,Artist,track_id
0,Paint The Town Red,Doja Cat,56y1jOTK0XSvJzVv9vHQBK
1,Snooze,SZA,4iZ4pt7kvcaH6Yo8UoZ4s2
2,Fast Car,Luke Combs,1mMLMZYXkMueg65jRRWG1l
3,Cruel Summer,Taylor Swift,2EGaDf0cPX789H3LNeB03D
4,I Remember Everything,Zach Bryan Featuring Kacey Musgraves,4KULAymBBJcPRpk1yO4dOG
...,...,...,...
95,Standing Room Only,Tim McGraw,6BOP0cv1eeXcvi1oE8bDVZ
96,Checkmate,Rod Wave,6VdBDm20nkyk6A29P785aJ
97,Can't Have Mine,Dylan Scott,3jfywRZH6cB2iLyKqo4EZd
98,On My Mama,Victoria Monet,1o8Z7GD1CeOaVBEyuzu4HO


In [21]:
#Save the final DataFrame with ID's
hot_songs_ids1.to_csv('hot_songs_ids1.csv', index=False)

## Audio features function

In [22]:
hs_ids = pd.read_csv('hot_songs_ids1.csv')
nhs_ids = pd.read_csv('not_hot_songs_ids.csv')

In [23]:
def get_audio_features(track_ids):
    """Fetch Spotify audio features for a Series of track IDs."""
    sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=client_id,
                                                               client_secret=client_secret))

    chunks = np.array_split(track_ids, 50)  # Split track_ids into 50 chunks
    audio_features = []
    for index, chunk in enumerate(chunks):
        print(f"Collecting audio features for Chunk: {index}")
        list_of_ids = [str(track_id) for track_id in chunk]
        features = sp.audio_features(list_of_ids)
        audio_features += features
        # Pause between requests to avoid rate limiting
        time.sleep(20)
    return pd.DataFrame(audio_features)
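
Splitting into a fixed number of chunks lets the chunk size grow with the input. Since Spotify's audio-features endpoint accepts at most 100 IDs per request, fixed-size batches are an alternative. This is a sketch only; `batches` is a hypothetical helper, not a Spotipy function.

```python
def batches(items, size=100):
    """Yield consecutive slices of at most `size` items."""
    items = list(items)
    for start in range(0, len(items), size):
        yield items[start:start + size]


# 250 made-up IDs split into batches of at most 100
fake_ids = [f"id_{n}" for n in range(250)]
print([len(batch) for batch in batches(fake_ids)])
```

Each batch could then be passed to `sp.audio_features(batch)` with a pause between calls, as in the function above.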


### Audio features: not_hot_songs

In [24]:
audio_features_hs1 = get_audio_features(nhs_ids['track_id'])

Collecting audio features for Chunk: 0
Collecting audio features for Chunk: 1
Collecting audio features for Chunk: 2
Collecting audio features for Chunk: 3
Collecting audio features for Chunk: 4
Collecting audio features for Chunk: 5
Collecting audio features for Chunk: 6
Collecting audio features for Chunk: 7
Collecting audio features for Chunk: 8
Collecting audio features for Chunk: 9
Collecting audio features for Chunk: 10
Collecting audio features for Chunk: 11
Collecting audio features for Chunk: 12
Collecting audio features for Chunk: 13
Collecting audio features for Chunk: 14
Collecting audio features for Chunk: 15
Collecting audio features for Chunk: 16
Collecting audio features for Chunk: 17
Collecting audio features for Chunk: 18
Collecting audio features for Chunk: 19
Collecting audio features for Chunk: 20
Collecting audio features for Chunk: 21
Collecting audio features for Chunk: 22
Collecting audio features for Chunk: 23
Collecting audio features for Chunk: 24
Collecting

In [28]:
audio_features_hs1.head(5)

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,0.429,0.661,11,-7.227,1,0.0281,0.00239,0.000121,0.234,0.285,173.372,audio_features,3AJwUDP919kvQ9QcozQPxg,spotify:track:3AJwUDP919kvQ9QcozQPxg,https://api.spotify.com/v1/tracks/3AJwUDP919kv...,https://api.spotify.com/v1/audio-analysis/3AJw...,266773,4
1,0.434,0.897,0,-4.918,1,0.0488,0.0103,0.0,0.612,0.684,148.726,audio_features,2m1hi0nfMR9vdGC8UcrnwU,spotify:track:2m1hi0nfMR9vdGC8UcrnwU,https://api.spotify.com/v1/tracks/2m1hi0nfMR9v...,https://api.spotify.com/v1/audio-analysis/2m1h...,167067,4
2,0.529,0.496,7,-9.007,1,0.029,0.173,0.0,0.251,0.278,136.859,audio_features,3y4LxiYMgDl4RethdzpmNe,spotify:track:3y4LxiYMgDl4RethdzpmNe,https://api.spotify.com/v1/tracks/3y4LxiYMgDl4...,https://api.spotify.com/v1/audio-analysis/3y4L...,250547,4
3,0.556,0.864,3,-5.87,0,0.0584,0.00958,0.0,0.209,0.4,105.143,audio_features,60a0Rd6pjrkxjPbaKzXjfq,spotify:track:60a0Rd6pjrkxjPbaKzXjfq,https://api.spotify.com/v1/tracks/60a0Rd6pjrkx...,https://api.spotify.com/v1/audio-analysis/60a0...,216880,4
4,0.61,0.926,8,-4.843,0,0.0479,0.031,0.0012,0.0821,0.861,172.638,audio_features,62bOmKYxYg7dhrC6gH9vFn,spotify:track:62bOmKYxYg7dhrC6gH9vFn,https://api.spotify.com/v1/tracks/62bOmKYxYg7d...,https://api.spotify.com/v1/audio-analysis/62bO...,200400,4


In [29]:
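# Concat: Title/Artist of the songs that have an ID (nhs_ids) + audio features
# nhs_ids excludes the songs whose search failed, so its rows line up with audio_features_hs1

not_hot_songs_features = pd.concat([nhs_ids[['Title', 'Artist']], audio_features_hs1], axis=1)
not_hot_songs_features.head()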

Unnamed: 0,Title,Artist,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,Yellow,Coldplay,0.429,0.661,11.0,-7.227,1.0,0.0281,0.00239,0.000121,0.234,0.285,173.372,audio_features,3AJwUDP919kvQ9QcozQPxg,spotify:track:3AJwUDP919kvQ9QcozQPxg,https://api.spotify.com/v1/tracks/3AJwUDP919kv...,https://api.spotify.com/v1/audio-analysis/3AJw...,266773.0,4.0
1,All The Small Things,blink-182,0.434,0.897,0.0,-4.918,1.0,0.0488,0.0103,0.0,0.612,0.684,148.726,audio_features,2m1hi0nfMR9vdGC8UcrnwU,spotify:track:2m1hi0nfMR9vdGC8UcrnwU,https://api.spotify.com/v1/tracks/2m1hi0nfMR9v...,https://api.spotify.com/v1/audio-analysis/2m1h...,167067.0,4.0
2,Breathe,Faith Hill,0.529,0.496,7.0,-9.007,1.0,0.029,0.173,0.0,0.251,0.278,136.859,audio_features,3y4LxiYMgDl4RethdzpmNe,spotify:track:3y4LxiYMgDl4RethdzpmNe,https://api.spotify.com/v1/tracks/3y4LxiYMgDl4...,https://api.spotify.com/v1/audio-analysis/3y4L...,250547.0,4.0
3,In the End,Linkin Park,0.556,0.864,3.0,-5.87,0.0,0.0584,0.00958,0.0,0.209,0.4,105.143,audio_features,60a0Rd6pjrkxjPbaKzXjfq,spotify:track:60a0Rd6pjrkxjPbaKzXjfq,https://api.spotify.com/v1/tracks/60a0Rd6pjrkx...,https://api.spotify.com/v1/audio-analysis/60a0...,216880.0,4.0
4,Bye Bye Bye,*NSYNC,0.61,0.926,8.0,-4.843,0.0,0.0479,0.031,0.0012,0.0821,0.861,172.638,audio_features,62bOmKYxYg7dhrC6gH9vFn,spotify:track:62bOmKYxYg7dhrC6gH9vFn,https://api.spotify.com/v1/tracks/62bOmKYxYg7d...,https://api.spotify.com/v1/audio-analysis/62bO...,200400.0,4.0
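
A positional concat is only correct when both frames have the same rows in the same order. A more defensive alternative, sketched here with made-up rows, is to join on the track ID itself: the `id` column returned with the audio features matches the `track_id` found by the search.

```python
import pandas as pd

ids = pd.DataFrame({
    "Title": ["A", "C"],
    "Artist": ["x", "z"],
    "track_id": ["id_a", "id_c"],
})
# Features deliberately in a different order to show that the join still matches
features = pd.DataFrame({"id": ["id_c", "id_a"], "tempo": [120.0, 90.0]})

merged = ids.merge(features, left_on="track_id", right_on="id", how="left")
print(merged[["Title", "Artist", "tempo"]])
```

With `how="left"`, a song whose features are missing stays in the table with `NaN` values instead of shifting the rows below it.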


In [30]:
#Save the final dataframe not_hot_songs+ids+audio_features
not_hot_songs_features.to_csv('not_hot_songs_features.csv', index=False)

### Audio features: hot_songs

In [25]:
audio_features_hs = get_audio_features(hs_ids['track_id'])

Collecting audio features for Chunk: 0
Collecting audio features for Chunk: 1
Collecting audio features for Chunk: 2
Collecting audio features for Chunk: 3
Collecting audio features for Chunk: 4
Collecting audio features for Chunk: 5
Collecting audio features for Chunk: 6
Collecting audio features for Chunk: 7
Collecting audio features for Chunk: 8
Collecting audio features for Chunk: 9
Collecting audio features for Chunk: 10
Collecting audio features for Chunk: 11
Collecting audio features for Chunk: 12
Collecting audio features for Chunk: 13
Collecting audio features for Chunk: 14
Collecting audio features for Chunk: 15
Collecting audio features for Chunk: 16
Collecting audio features for Chunk: 17
Collecting audio features for Chunk: 18
Collecting audio features for Chunk: 19
Collecting audio features for Chunk: 20
Collecting audio features for Chunk: 21
Collecting audio features for Chunk: 22
Collecting audio features for Chunk: 23
Collecting audio features for Chunk: 24
Collecting

In [32]:
# Use hs_ids (songs that have an ID) so the rows line up with the audio features
hot_songs_features = pd.concat([hs_ids[['Title', 'Artist']], audio_features_hs], axis=1)

In [33]:
#Save the final dataframe hot_songs+ids+audio_features
hot_songs_features.to_csv("hot_songs_features.csv", index=False)

In [None]:
# Next steps:
# - Select the numerical audio features.
# - PCA, ISOMAP, t-SNE -> reduce the selected audio features to 2 dimensions, then draw a scatter plot of the songs.
# - Clustering: KMeans, DBSCAN, HDBSCAN
# - song_recommender(ISOMAP, KMeans)
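
As a starting point for the dimensionality-reduction step, PCA to two components can be sketched with NumPy alone. The function name `pca_2d` is an illustrative assumption, and scikit-learn's `PCA` would be the more usual choice. The sample rows are the danceability, energy, and tempo values of the five songs shown in the output above.

```python
import numpy as np


def pca_2d(features):
    """Standardize the columns and project the rows onto the first two principal components."""
    X = np.asarray(features, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0                      # avoid dividing by zero for constant columns
    X = (X - X.mean(axis=0)) / std           # zero mean, unit variance per feature
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:2].T                      # coordinates along the top two components


# danceability, energy, tempo for the first five not_hot_songs rows
sample = [
    [0.429, 0.661, 173.372],
    [0.434, 0.897, 148.726],
    [0.529, 0.496, 136.859],
    [0.556, 0.864, 105.143],
    [0.610, 0.926, 172.638],
]
coords = pca_2d(sample)
print(coords.shape)
```

The two columns of `coords` can be used directly as x and y in a scatter plot, and as input to a clustering step.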