# SongSearch
Using Spotify song data for similarity search.

## Table of Contents:
* [Dataset](#Dataset)
* [Visualizing the embeddings](#Visualizing_the_embeddings)
* [Similarity Search](#Similarity_search)
    * [K Nearest Neighbors](#knn)
    * [Cosine Similarity](#cosine_similarity)
    * [Optimized Cosine Similarity](#optim_cosine_similarity)

In [2]:
import pandas as pd
import numpy as np

---
## Dataset <a class="anchor" id="Dataset"></a>

About 30k songs (32,833 rows) retrieved from the Spotify API.

Source: https://www.kaggle.com/datasets/joebeachcapital/30000-spotify-songs

Uploaded in December 2023.

### Import the song data

In [3]:
csv_file = "../dataset/spotify_songs.csv"
df = pd.read_csv(csv_file)
print(f'No of songs: {df.shape[0]}, No of columns: {df.shape[1]}') # print shape of dataset

No of songs: 32833, No of columns: 23


In [4]:
df.head() # print first 5 rows

| | track_id | track_name | track_artist | track_popularity | track_album_id | track_album_name | track_album_release_date | playlist_name | playlist_id | playlist_genre | ... | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | duration_ms |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6f807x0ima9a1j3VPbc7VN | I Don't Care (with Justin Bieber) - Loud Luxur... | Ed Sheeran | 66 | 2oCs0DGTsRO98Gh5ZSl2Cx | I Don't Care (with Justin Bieber) [Loud Luxury... | 2019-06-14 | Pop Remix | 37i9dQZF1DXcZDD7cfEKhW | pop | ... | 6 | -2.634 | 1 | 0.0583 | 0.102 | 0.0 | 0.0653 | 0.518 | 122.036 | 194754 |
| 1 | 0r7CVbZTWZgbTCYdfa2P31 | Memories - Dillon Francis Remix | Maroon 5 | 67 | 63rPSO264uRjW1X5E6cWv6 | Memories (Dillon Francis Remix) | 2019-12-13 | Pop Remix | 37i9dQZF1DXcZDD7cfEKhW | pop | ... | 11 | -4.969 | 1 | 0.0373 | 0.0724 | 0.00421 | 0.357 | 0.693 | 99.972 | 162600 |
| 2 | 1z1Hg7Vb0AhHDiEmnDE79l | All the Time - Don Diablo Remix | Zara Larsson | 70 | 1HoSmj2eLcsrR0vE9gThr4 | All the Time (Don Diablo Remix) | 2019-07-05 | Pop Remix | 37i9dQZF1DXcZDD7cfEKhW | pop | ... | 1 | -3.432 | 0 | 0.0742 | 0.0794 | 2.3e-05 | 0.11 | 0.613 | 124.008 | 176616 |
| 3 | 75FpbthrwQmzHlBJLuGdC7 | Call You Mine - Keanu Silva Remix | The Chainsmokers | 60 | 1nqYsOef1yKKuGOVchbsk6 | Call You Mine - The Remixes | 2019-07-19 | Pop Remix | 37i9dQZF1DXcZDD7cfEKhW | pop | ... | 7 | -3.778 | 1 | 0.102 | 0.0287 | 9e-06 | 0.204 | 0.277 | 121.956 | 169093 |
| 4 | 1e8PAfcKUYoKkxPhrHqw4x | Someone You Loved - Future Humans Remix | Lewis Capaldi | 69 | 7m7vv9wlQ4i0LFuJiE2zsQ | Someone You Loved (Future Humans Remix) | 2019-03-05 | Pop Remix | 37i9dQZF1DXcZDD7cfEKhW | pop | ... | 1 | -4.672 | 1 | 0.0359 | 0.0803 | 0.0 | 0.0833 | 0.725 | 123.976 | 189052 |


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32833 entries, 0 to 32832
Data columns (total 23 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   track_id                  32833 non-null  object 
 1   track_name                32828 non-null  object 
 2   track_artist              32828 non-null  object 
 3   track_popularity          32833 non-null  int64  
 4   track_album_id            32833 non-null  object 
 5   track_album_name          32828 non-null  object 
 6   track_album_release_date  32833 non-null  object 
 7   playlist_name             32833 non-null  object 
 8   playlist_id               32833 non-null  object 
 9   playlist_genre            32833 non-null  object 
 10  playlist_subgenre         32833 non-null  object 
 11  danceability              32833 non-null  float64
 12  energy                    32833 non-null  float64
 13  key                       32833 non-null  int64  
 14  loudne

In [6]:
df.describe()

| | track_popularity | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | duration_ms |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 32833.0 | 32833.0 | 32833.0 | 32833.0 | 32833.0 | 32833.0 | 32833.0 | 32833.0 | 32833.0 | 32833.0 | 32833.0 | 32833.0 | 32833.0 |
| mean | 42.477081 | 0.65485 | 0.698619 | 5.374471 | -6.719499 | 0.565711 | 0.107068 | 0.175334 | 0.084747 | 0.190176 | 0.510561 | 120.881132 | 225799.811622 |
| std | 24.984074 | 0.145085 | 0.18091 | 3.611657 | 2.988436 | 0.495671 | 0.101314 | 0.219633 | 0.22423 | 0.154317 | 0.233146 | 26.903624 | 59834.006182 |
| min | 0.0 | 0.0 | 0.000175 | 0.0 | -46.448 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4000.0 |
| 25% | 24.0 | 0.563 | 0.581 | 2.0 | -8.171 | 0.0 | 0.041 | 0.0151 | 0.0 | 0.0927 | 0.331 | 99.96 | 187819.0 |
| 50% | 45.0 | 0.672 | 0.721 | 6.0 | -6.166 | 1.0 | 0.0625 | 0.0804 | 1.6e-05 | 0.127 | 0.512 | 121.984 | 216000.0 |
| 75% | 62.0 | 0.761 | 0.84 | 9.0 | -4.645 | 1.0 | 0.132 | 0.255 | 0.00483 | 0.248 | 0.693 | 133.918 | 253585.0 |
| max | 100.0 | 0.983 | 1.0 | 11.0 | 1.275 | 1.0 | 0.918 | 0.994 | 0.994 | 0.996 | 0.991 | 239.44 | 517810.0 |


---


### Processing the dataset:
Encode each playlist genre and subgenre as an integer ID:

In [7]:
# Get the unique genres and subgenres
unique_genres = df['playlist_genre'].unique()
unique_subgenres = df['playlist_subgenre'].unique()

# Map each genre to a multiple of 100, starting at 100, so that 0 stays reserved for 'Other'
genre_mapping = {genre: (i + 1) * 100 for i, genre in enumerate(unique_genres)}
genre_mapping['Other'] = 0  # Assign 0 to 'Other' genre

# Map each subgenre to its genre's ID plus a 1-based offset (101, 102, ... for the first genre)
subgenre_mapping = {}
for genre in unique_genres:
    subgenres = df[df['playlist_genre'] == genre]['playlist_subgenre'].unique()
    for i, subgenre in enumerate(subgenres):
        subgenre_mapping[subgenre] = genre_mapping[genre] + i + 1

subgenre_mapping['Other'] = 0  # Assign 0 to 'Other' subgenre

# Create new columns 'genre_id' and 'subgenre_id' with the assigned numbers
df['genre_id'] = df['playlist_genre'].map(genre_mapping)
df['genre_id'] = df['genre_id'].fillna(0).astype(int)
df['subgenre_id'] = df['playlist_subgenre'].map(subgenre_mapping)
df['subgenre_id'] = df['subgenre_id'].fillna(0).astype(int)

<b>Create song embeddings from characteristics</b>

In [8]:
# Select the audio-feature columns that make up each song's embedding
embedding_headers = ["danceability", "energy", "key", "loudness", "mode", "speechiness", "acousticness", "instrumentalness", "liveness", "valence", "tempo"]
# embedding_headers = ["genre_id", "subgenre_id", "danceability", "energy", "key", "loudness", "mode", "speechiness", "acousticness", "instrumentalness", "liveness", "valence", "tempo"]

# Store the embeddings
embedding_df = df[embedding_headers]

<b>Normalizing (standardizing) the embeddings</b>

In [9]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler() # Create an instance of StandardScaler
embedding_matrix = scaler.fit_transform(embedding_df) # Fit and transform the embeddings
print(embedding_matrix[0])

[ 0.64204909  1.20161406  0.1731999   1.36712341  0.87617693 -0.48136238
 -0.33389784 -0.37795302 -0.80922951  0.03190765  0.04292678]
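`StandardScaler` standardizes each feature column on its own, z = (x − μ) / σ with σ the population standard deviation, rather than scaling each song vector to unit length. This matters here because the raw columns live on very different scales (in the summary table above, tempo has a standard deviation of about 26.9 and danceability about 0.145), so without scaling, tempo would dominate every dot product. A minimal sketch on a made-up 3×2 matrix checks that formula against scikit-learn:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up feature matrix: 3 songs x 2 features (danceability-like, tempo-like)
X = np.array([[0.5, 120.0],
              [0.7, 100.0],
              [0.9, 140.0]])

# Manual standardization: subtract each column's mean and divide by its
# population standard deviation (ddof=0), which is what StandardScaler does
manual = (X - X.mean(axis=0)) / X.std(axis=0)

scaled = StandardScaler().fit_transform(X)
print(np.allclose(manual, scaled))  # True

# Every column now has mean 0 and standard deviation 1
print(np.allclose(scaled.mean(axis=0), 0.0), np.allclose(scaled.std(axis=0), 1.0))  # True True
```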


---
## Visualizing the embeddings <a class="anchor" id="Visualizing_the_embeddings"></a>

In [10]:
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
import plotly.graph_objects as go
from plotly.offline import plot

In [24]:
# embedding_matrix contains the song embeddings after being normalized
embeddings = embedding_matrix

# Perform t-SNE to reduce the dimensions to 3
tsne = TSNE(n_components=3, random_state=42)
embeddings_tsne = tsne.fit_transform(embeddings)

In [37]:
# Create a DataFrame with the t-SNE embeddings and genre information
tsne_df = pd.DataFrame(data=embeddings_tsne, columns=['t-SNE1', 't-SNE2', 't-SNE3'])
tsne_df['genre'] = df['playlist_subgenre']
tsne_df['track_name'] = df['track_name']
tsne_df['artist'] = df['track_artist']

# Create custom data for the hover template
custom_data = np.stack((tsne_df['track_name'], tsne_df['artist'], tsne_df['genre']), axis=-1)

# Create an interactive 3D Plotly plot
fig = go.Figure(data=go.Scatter3d(
    x=tsne_df['t-SNE1'],
    y=tsne_df['t-SNE2'],
    z=tsne_df['t-SNE3'],
    mode='markers',
    marker=dict(
        size=3,
        color=tsne_df['genre'].astype('category').cat.codes,
        colorscale='viridis',
        opacity=0.7
    ),
    customdata=custom_data,
    hovertemplate='<b>Track:</b> %{customdata[0]}<br><b>Artist:</b> %{customdata[1]}<br><b>Genre:</b> %{customdata[2]}<extra></extra>'
))

fig.update_layout(
    title='Interactive Visualization of Song Embeddings',
    scene=dict(
        xaxis_title='x1',
        yaxis_title='x2',
        zaxis_title='x3'
    )
)

# Save the interactive plot as an HTML file
plot(fig, filename='song_embeddings_3d.html')

'song_embeddings_3d.html'

---
## Similarity Search <a class="anchor" id="Similarity_search"></a>

### K Nearest Neighbors: <a class="anchor" id="knn"></a>

In [71]:
from sklearn.neighbors import NearestNeighbors

def find_similar_songs(song_embeddings, song_index, k=10):
    # Create a NearestNeighbors object
    nn = NearestNeighbors(n_neighbors=k, metric='cosine')
    
    # Fit the NearestNeighbors object with the song embeddings
    nn.fit(song_embeddings)
    
    # Get the embedding of the selected song
    selected_song_embedding = song_embeddings[song_index]
    
    # Find the k nearest neighbors of the selected song
    distances, indices = nn.kneighbors([selected_song_embedding])
    
    # Get the distances and indices of the similar songs
    similar_song_distances = distances[0]
    similar_song_indices = indices[0]
    
    # Return the distances and indices of the similar songs
    return similar_song_distances, similar_song_indices
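`find_similar_songs` builds and fits a new `NearestNeighbors` object on every call. If several songs are going to be queried against the same embeddings, one option is to fit once and pass a batch of query rows to `kneighbors`, which returns one row of distances and indices per query. With `metric='cosine'`, scikit-learn should fall back to a brute-force search, so fitting itself is cheap; the saving is mostly in not rebuilding the object. A minimal sketch, where `build_index` and `query_index` are hypothetical helper names and the random matrix only stands in for `embedding_matrix`:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def build_index(song_embeddings, k=10):
    """Hypothetical helper: fit a cosine-distance NearestNeighbors index once."""
    nn = NearestNeighbors(n_neighbors=k, metric="cosine")
    nn.fit(song_embeddings)
    return nn

def query_index(nn, song_embeddings, song_indices):
    """Hypothetical helper: neighbours for several query songs in one call.

    Returns (distances, indices), each of shape (len(song_indices), k).
    """
    queries = song_embeddings[np.asarray(song_indices)]
    return nn.kneighbors(queries)

# Random stand-in for embedding_matrix: 100 songs x 11 features
rng = np.random.default_rng(0)
emb = rng.standard_normal((100, 11))

nn = build_index(emb, k=5)
dist, idx = query_index(nn, emb, [0, 42])
print(dist.shape, idx.shape)  # (2, 5) (2, 5)
print(idx[:, 0])              # each query's closest match is itself
```

In the notebook this would look like `build_index(embedding_matrix, k)` followed by `query_index(nn, embedding_matrix, [30000, 220, 300])`.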

In [72]:
%%time
song_embeddings = embedding_matrix
id = 30000
song = df.iloc[id]
print(f'ID: {id}, Name: {song["track_name"]}, Artist(s): {song["track_artist"]}')
print(f'Genre: [{song["playlist_genre"]}, {song["playlist_subgenre"]}]\n')

k = 10  # Number of neighbours to retrieve (the query song itself is usually one of them)

# Find similar songs
similar_song_distances, similar_song_indices = find_similar_songs(song_embeddings, id, k)

ID: 30000, Name: My House, Artist(s): Flo Rida
Genre: [edm, pop edm]



In [73]:
# Print the details of the similar songs
for i, index in enumerate(similar_song_indices):
    if i == 0:
        continue  # Skip the first neighbour, assumed to be the query song itself
    print(f"Song {i}:")
    print("  Index:", index)
    print("  Title:", df.loc[index, 'track_name'])
    print("  Artist:", df.loc[index, 'track_artist'])
    print("  Distance:", similar_song_distances[i])
    print()

Song 1:
  Index: 4813
  Title: My House
  Artist: Flo Rida
  Distance: 0.0

Song 2:
  Index: 17859
  Title: No Te Vayas
  Artist: Camilo
  Distance: 0.046839562599604534

Song 3:
  Index: 23672
  Title: Per Un Milione
  Artist: Boomdabash
  Distance: 0.048672934094705855

Song 4:
  Index: 14648
  Title: Dirt in my Eyes
  Artist: Cold War Kids
  Distance: 0.0490695935056602

Song 5:
  Index: 25030
  Title: Tell Me How You Feel - Radio Mix
  Artist: Joy Enriquez
  Distance: 0.05361673465518446

Song 6:
  Index: 3712
  Title: I Took A Pill In Ibiza - Seeb Remix
  Artist: Mike Posner
  Distance: 0.059688812912257694

Song 7:
  Index: 5325
  Title: I Took A Pill In Ibiza - Seeb Remix
  Artist: Mike Posner
  Distance: 0.059688812912257694

Song 8:
  Index: 30108
  Title: I Took A Pill In Ibiza - Seeb Remix
  Artist: Mike Posner
  Distance: 0.059688812912257694

Song 9:
  Index: 2620
  Title: Blow That Smoke (feat. Tove Lo)
  Artist: Major Lazer
  Distance: 0.06313268377241965
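Several neighbours above are the same track listed in different playlists: index 4813 is another row for "My House", and "I Took A Pill In Ibiza - Seeb Remix" appears three times (3712, 5325, 30108) with identical distances. A minimal sketch that filters such repeats, assuming duplicate rows of a song share the same `track_id` (an assumption about the dataset, worth checking with `df['track_id'].duplicated().sum()`). `dedupe_neighbors` is a hypothetical helper:

```python
import pandas as pd

def dedupe_neighbors(df, neighbor_indices, query_index, k):
    """Hypothetical helper: keep the first k neighbours whose track_id differs
    from the query's and from every neighbour already kept.

    Assumes duplicate rows of the same song share a `track_id`.
    """
    seen = {df.loc[query_index, "track_id"]}  # treat copies of the query as seen
    kept = []
    for idx in neighbor_indices:
        track = df.loc[idx, "track_id"]
        if track in seen:
            continue  # a repeat of the query or of an earlier result
        seen.add(track)
        kept.append(idx)
        if len(kept) == k:
            break
    return kept

# Toy frame: rows 1 and 3 are the query track "a", rows 2 and 4 share track "b"
toy = pd.DataFrame({"track_id": ["x", "a", "b", "a", "b", "c"]})
print(dedupe_neighbors(toy, [1, 3, 2, 4, 5, 0], query_index=1, k=3))  # [2, 5, 0]
```

Because filtering removes results, one would request more neighbours than needed, for example `find_similar_songs(song_embeddings, id, 3 * k)`, and keep the first `k` survivors.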



### Cosine similarity: <a class="anchor" id="cosine_similarity"></a>

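For two vectors $A$ and $B$:

$$\text{cosine\_similarity\_vec}(A, B) = \frac{A \cdot B}{\lVert A \rVert_2 \, \lVert B \rVert_2}$$

Applied row-wise, for a matrix $A$ with rows $a_i$, a vector $b$, and a matrix $B$ with rows $b_j$:

$$\text{cosine\_similarity\_matrix\_vec}(A, b)_i = \frac{a_i \cdot b}{\lVert a_i \rVert_2 \, \lVert b \rVert_2}, \qquad \text{cosine\_similarity\_matrix}(A, B)_{ij} = \frac{a_i \cdot b_j}{\lVert a_i \rVert_2 \, \lVert b_j \rVert_2}$$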

In [74]:
def cosine_similarity_vec(vecA, vecB):
    dp = np.dot(vecA, vecB)  # dot product of vectors A and B
    # L2 (Euclidean) norms of the two vectors
    norm1 = np.linalg.norm(vecA)
    norm2 = np.linalg.norm(vecB)
    return dp / (norm1 * norm2)

def cosine_similarity_matrix_vec(matrixA, vecB):
    dp = np.dot(matrixA, vecB)  # dot product of each row of A with vector B
    # Row-wise L2 norms of matrix A and the L2 norm of vector B
    norm1 = np.linalg.norm(matrixA, axis=1)
    norm2 = np.linalg.norm(vecB)
    return dp / (norm1 * norm2)

def cosine_similarity_matrix(matrixA, matrixB):
    dp = np.dot(matrixA, matrixB.T)  # dot products of every row of A with every row of B
    # Row-wise L2 norms of matrices A and B
    norm1 = np.linalg.norm(matrixA, axis=1)
    norm2 = np.linalg.norm(matrixB, axis=1)
    return dp / np.outer(norm1, norm2)
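`NearestNeighbors(metric='cosine')` reports cosine *distance*, which scikit-learn defines as 1 minus cosine similarity. That is why the K-nearest-neighbour distance for "No Te Vayas" (0.046839562599604534) and its brute-force score further down (0.9531604374003954) sum to 1. A minimal sketch that checks a compact copy of `cosine_similarity_matrix_vec` against `sklearn.metrics.pairwise.cosine_similarity` and `cosine_distances`, on random data standing in for `embedding_matrix`:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity, cosine_distances

def cosine_similarity_matrix_vec(matrixA, vecB):
    # Same computation as the notebook's function: cosine of each row of A with vecB
    return (matrixA @ vecB) / (np.linalg.norm(matrixA, axis=1) * np.linalg.norm(vecB))

rng = np.random.default_rng(1)
M = rng.standard_normal((50, 11))  # stand-in for embedding_matrix
q = M[7]                           # query song

ours = cosine_similarity_matrix_vec(M, q)
ref = cosine_similarity(M, q.reshape(1, -1)).ravel()
dist = cosine_distances(M, q.reshape(1, -1)).ravel()

print(np.allclose(ours, ref))       # True: matches scikit-learn's similarity
print(np.allclose(dist, 1 - ours))  # True: distance = 1 - similarity
```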

<b>Softmax function:</b>

In [75]:
def softmax(x):
    exp_x = np.exp(x - np.max(x))
    return exp_x / np.sum(exp_x)
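The softmax maps a score vector x to probabilities softmax(x)_i = exp(x_i − max x) / Σ_j exp(x_j − max x). Subtracting the maximum cancels between numerator and denominator, so the result is unchanged, but it keeps `np.exp` from overflowing on large inputs. A quick check of both properties on made-up scores:

```python
import numpy as np

def softmax(x):
    # Same as the notebook's softmax: subtract the max before exponentiating
    exp_x = np.exp(x - np.max(x))
    return exp_x / np.sum(exp_x)

scores = np.array([0.95, 0.90, 0.10])  # made-up similarity scores
p = softmax(scores)

print(np.isclose(p.sum(), 1.0))                  # True: a probability distribution
print(np.allclose(p, softmax(scores + 1000.0)))  # True: adding a constant changes nothing
print(softmax(np.array([1000.0, 1000.0])))       # [0.5 0.5]; np.exp(1000.0) alone would overflow
```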

---

<b>Run similarity search</b>

In [80]:
%%time
# Cosine similarity between song `id` and every song in the dataset (itself included)
id = 30000
song = df.iloc[id]
print(f'ID: {id}, Name: {song["track_name"]}, Artist(s): {song["track_artist"]}')
print(f'Genre: [{song["playlist_genre"]}, {song["playlist_subgenre"]}]\n')
res = cosine_similarity_matrix_vec(embedding_matrix, embedding_matrix[id])
# softmax_res = softmax(res)

ID: 30000, Name: My House, Artist(s): Flo Rida
Genre: [edm, pop edm]

CPU times: user 32.8 ms, sys: 38.4 ms, total: 71.1 ms
Wall time: 13.4 ms


In [81]:
%%time
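k = 20  # Take the 20 highest-scoring songs, then drop the first (k - 1 = 19 remain)
# Sort ascending, keep the last k (highest scores), reverse to best-first, and drop the
# first entry, which is meant to be the query itself. Exact duplicates of the query
# also score 1.0, so the dropped entry can be a duplicate instead (see the output below).
top_k_idx = np.argsort(res)[-k:][::-1][1:]
top_k_scores = res[top_k_idx]  # Cosine similarity of each selected song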

CPU times: user 3.46 ms, sys: 16 ms, total: 19.4 ms
Wall time: 2.08 ms


Display metadata of top k songs

In [82]:
for idx, score in zip(top_k_idx, top_k_scores):
    song = df.iloc[idx]
    print(f'ID: {idx}, Name: {song["track_name"]}, Artist(s): {song["track_artist"]}')
    print(f'Genre: [{song["playlist_genre"]}, {song["playlist_subgenre"]}], Score: {score}\n')

ID: 30000, Name: My House, Artist(s): Flo Rida
Genre: [edm, pop edm], Score: 1.0

ID: 17859, Name: No Te Vayas, Artist(s): Camilo
Genre: [latin, latin pop], Score: 0.9531604374003954

ID: 23672, Name: Per Un Milione, Artist(s): Boomdabash
Genre: [r&b, hip pop], Score: 0.9513270659052938

ID: 14648, Name: Dirt in my Eyes, Artist(s): Cold War Kids
Genre: [rock, permanent wave], Score: 0.9509304064943395

ID: 25030, Name: Tell Me How You Feel - Radio Mix, Artist(s): Joy Enriquez
Genre: [r&b, new jack swing], Score: 0.9463832653448154

ID: 5325, Name: I Took A Pill In Ibiza - Seeb Remix, Artist(s): Mike Posner
Genre: [pop, indie poptimism], Score: 0.9403111870877422

ID: 3712, Name: I Took A Pill In Ibiza - Seeb Remix, Artist(s): Mike Posner
Genre: [pop, electropop], Score: 0.9403111870877422

ID: 30108, Name: I Took A Pill In Ibiza - Seeb Remix, Artist(s): Mike Posner
Genre: [edm, pop edm], Score: 0.9403111870877422

ID: 2620, Name: Blow That Smoke (feat. Tove Lo), Artist(s): Major Lazer
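The ranking above still opens with the query itself (ID 30000, score 1.0): duplicates of "My House" tie at 1.0, so the `[1:]` slice dropped a duplicate (apparently ID 4813, which the K-nearest-neighbour search returned at distance 0.0) rather than the query. A minimal sketch that excludes indices explicitly and uses `np.argpartition`, which selects the top k in linear time instead of sorting all 32,833 scores. `top_k_excluding` is a hypothetical helper:

```python
import numpy as np

def top_k_excluding(scores, k, exclude=()):
    """Hypothetical helper: indices of the k highest scores, best first,
    never returning an index listed in `exclude`."""
    scores = np.asarray(scores, dtype=float).copy()
    exclude = np.unique(np.asarray(list(exclude), dtype=int))
    scores[exclude] = -np.inf                  # excluded rows can never win
    k = min(k, scores.size - exclude.size)
    if k <= 0:
        return np.array([], dtype=int)
    top = np.argpartition(scores, -k)[-k:]     # the k best, in arbitrary order
    return top[np.argsort(scores[top])[::-1]]  # sort only those k, best first

res = np.array([0.2, 1.0, 0.9, 1.0, 0.5])      # made-up scores; index 1 is the query
print(top_k_excluding(res, k=3, exclude=[1]))  # [3 2 4]
```

In the notebook this would be called as `top_k_excluding(res, 20, exclude=[id])`. Note that index 3 in the toy example plays the role of ID 4813: a duplicate of the query still scores 1.0, so a `track_id` filter is also needed to remove it.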


---
<b>Search with the average of two song embeddings</b>

In [121]:
# Pick two seed songs; their embeddings are averaged below into a single query vector
id1 = 220
song = df.iloc[id1]
print(f'ID: {id1}, Name: {song["track_name"]}, Artist(s): {song["track_artist"]}')
print(f'Genre: [{song["playlist_genre"]}, {song["playlist_subgenre"]}]\n')

id2 = 300
song = df.iloc[id2]
print(f'ID: {id2}, Name: {song["track_name"]}, Artist(s): {song["track_artist"]}')
print(f'Genre: [{song["playlist_genre"]}, {song["playlist_subgenre"]}]\n')

def concatenate_song_vectors(*song_vectors):
    # Concatenate the song vectors vertically
    concatenated_matrix = np.vstack(song_vectors)
    return concatenated_matrix

concatenated_matrix = concatenate_song_vectors(embedding_matrix[id1], embedding_matrix[id2])
average_embedding = np.mean(concatenated_matrix, axis=0)

ID: 220, Name: Roses - Imanbek Remix, Artist(s): SAINt JHN
Genre: [pop, dance pop]

ID: 300, Name: Summer Days (feat. Macklemore & Patrick Stump of Fall Out Boy), Artist(s): Martin Garrix
Genre: [pop, dance pop]



In [122]:
%%time

avg_res = cosine_similarity_matrix_vec(embedding_matrix, average_embedding)
k = 10  # Number of top similar songs to retrieve
top_k_idx = np.argsort(avg_res)[-k:][::-1] # Get the indices of the top k similar songs
top_k_scores = avg_res[top_k_idx] # Get the scores of the top k similar songs

CPU times: user 76.4 ms, sys: 85.6 ms, total: 162 ms
Wall time: 22.1 ms


In [123]:
for idx, score in zip(top_k_idx, top_k_scores):
    song = df.iloc[idx]
    print(f'ID: {idx}, Name: {song["track_name"]}, Artist(s): {song["track_artist"]}')
    print(f'Genre: [{song["playlist_genre"]}, {song["playlist_subgenre"]}], Score: {score}\n')

ID: 1488, Name: Rumors, Artist(s): Lindsay Lohan
Genre: [pop, post-teen pop], Score: 0.8518682256465477

ID: 30038, Name: Lonely, Artist(s): Carson Lueders
Genre: [edm, pop edm], Score: 0.8413254304031813

ID: 14648, Name: Dirt in my Eyes, Artist(s): Cold War Kids
Genre: [rock, permanent wave], Score: 0.8404903490182943

ID: 18923, Name: Lo Que Pasó, Pasó, Artist(s): Daddy Yankee
Genre: [latin, reggaeton], Score: 0.8277508433689182

ID: 25514, Name: Doing Alright, Artist(s): Bastian Steven
Genre: [r&b, neo soul], Score: 0.8271037985607315

ID: 24818, Name: If I Had No Loot, Artist(s): Tony! Toni! Toné!
Genre: [r&b, new jack swing], Score: 0.8261462820222887

ID: 22628, Name: La Negra Tiene Tumbao, Artist(s): Celia Cruz
Genre: [r&b, urban contemporary], Score: 0.8169560596331372

ID: 7334, Name: Pretty Girls, Artist(s): Britney Spears
Genre: [rap, southern hip hop], Score: 0.813493912646819

ID: 1739, Name: Pretty Girls, Artist(s): Britney Spears
Genre: [pop, post-teen pop], Score: 0.81
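The averaging approach extends to any number of seed songs. In this sketch, `search_from_seeds` (a hypothetical helper) also removes the seeds from the ranking, since a seed can easily be among the closest songs to the average. The `normalize_seeds` option is a variation, not something the notebook does: it scales each seed to unit length before averaging, so a seed with a larger norm does not pull the average direction more than the others.

```python
import numpy as np

def search_from_seeds(embeddings, seed_ids, k=10, normalize_seeds=False):
    """Hypothetical helper: rank songs by cosine similarity to the mean of
    several seed embeddings, leaving the seeds themselves out of the results."""
    seed_ids = np.asarray(seed_ids)
    seeds = embeddings[seed_ids]
    if normalize_seeds:
        # Variation (not in the notebook): unit-length seeds pull equally on the average
        seeds = seeds / np.linalg.norm(seeds, axis=1, keepdims=True)
    query = seeds.mean(axis=0)
    scores = embeddings @ query / (np.linalg.norm(embeddings, axis=1) * np.linalg.norm(query))
    scores[seed_ids] = -np.inf            # never recommend a seed back
    order = np.argsort(scores)[::-1][:k]  # best first
    return order, scores[order]

# Random stand-in for embedding_matrix: 200 songs x 11 features
rng = np.random.default_rng(2)
emb = rng.standard_normal((200, 11))

idx, sc = search_from_seeds(emb, [3, 17], k=5)
print(idx, sc.round(3))
```

With the notebook's data, the equivalent call would be `search_from_seeds(embedding_matrix, [220, 300], k=10)`.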

---
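### Optimized Cosine Similarity <a class="anchor" id="optim_cosine_similarity"></a>

The brute-force search above scores the query against all 32,833 songs. This section trades a little accuracy for speed: the embeddings are grouped into 300 clusters with k-means, the query is compared with each cluster's average embedding, and only the songs in the few most similar clusters (3 by default) are scored. A close match that falls outside those clusters is missed.
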
In [188]:
from sklearn.cluster import KMeans

# Step 1: Perform clustering on the embedding dataset
num_clusters = 300
kmeans = KMeans(n_clusters=num_clusters, random_state=42)
cluster_labels = kmeans.fit_predict(embedding_matrix)


In [189]:
# Step 2: Create the average embeddings matrix
cluster_averages = []
cluster_indices = []
for i in range(num_clusters):
    cluster_embeddings = embedding_matrix[cluster_labels == i]
    cluster_average = np.mean(cluster_embeddings, axis=0)
    cluster_averages.append(cluster_average)
    cluster_indices.append(np.where(cluster_labels == i)[0])

cluster_averages = np.array(cluster_averages)
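The loop above compares all 32,833 labels against each of the 300 cluster IDs. A vectorized sketch computes the same means with `np.add.at` and `np.bincount`, and the per-cluster member lists with one stable sort; `cluster_means` and `cluster_members` are hypothetical names. The resulting means should be very close to `kmeans.cluster_centers_`, which could likely be used directly instead.

```python
import numpy as np

def cluster_means(embeddings, labels, num_clusters):
    """Hypothetical helper: mean embedding per cluster without a Python loop."""
    sums = np.zeros((num_clusters, embeddings.shape[1]))
    np.add.at(sums, labels, embeddings)               # add each row to its cluster's sum
    counts = np.bincount(labels, minlength=num_clusters)
    return sums / counts[:, None]                     # an empty cluster would give NaN

def cluster_members(labels, num_clusters):
    """Hypothetical helper: row indices per cluster, like cluster_indices above."""
    order = np.argsort(labels, kind="stable")         # rows grouped by label, ascending
    bounds = np.cumsum(np.bincount(labels, minlength=num_clusters))[:-1]
    return np.split(order, bounds)

# Small stand-in data: 30 rows, 4 features, labels cycling through 5 clusters
rng = np.random.default_rng(3)
emb = rng.standard_normal((30, 4))
labels = np.arange(30) % 5

loop_means = np.array([emb[labels == i].mean(axis=0) for i in range(5)])
print(np.allclose(cluster_means(emb, labels, 5), loop_means))  # True
```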

<b>Visualizing the clusters:</b>

In [174]:
# cluster averages contains the average embedding of each cluster
embeddings = cluster_averages

# Perform t-SNE to reduce the dimensions to 3
tsne = TSNE(n_components=3, random_state=42)
embeddings_tsne = tsne.fit_transform(embeddings)

In [175]:
# Create a DataFrame with the t-SNE embeddings and cluster information
tsne_df = pd.DataFrame(data=embeddings_tsne, columns=['t-SNE1', 't-SNE2', 't-SNE3'])
tsne_df['cluster'] = np.arange(len(cluster_averages))

# Create custom data for the hover template
custom_data = np.stack((tsne_df['cluster'],), axis=-1)  # shape (n_clusters, 1), so customdata[0] is the cluster ID

# Create an interactive 3D Plotly plot
fig = go.Figure(data=go.Scatter3d(
    x=tsne_df['t-SNE1'],
    y=tsne_df['t-SNE2'],
    z=tsne_df['t-SNE3'],
    mode='markers',
    marker=dict(
        size=5,
        color=tsne_df['cluster'],
        colorscale='viridis',
        opacity=0.8
    ),
    customdata=custom_data,
    hovertemplate='<b>Cluster:</b> %{customdata[0]}<extra></extra>'
))

fig.update_layout(
    title='Interactive Visualization of Cluster Average Embeddings',
    scene=dict(
        xaxis_title='x1',
        yaxis_title='x2',
        zaxis_title='x3'
    )
)

# Save the interactive plot as an HTML file
plot(fig, filename='cluster_average_embeddings_3d.html')

'cluster_average_embeddings_3d.html'

<b>Running the similarity search:</b>

In [233]:
# Step 3: Find similar songs
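def retrieve_similar_songs(song_embedding, top_k=11, num_clusters_to_compare=3):
    """Approximate search: score only the songs in the `num_clusters_to_compare`
    clusters whose average embedding is most similar to `song_embedding`.
    Returns the top_k - 1 best matches and their scores. Reads the globals
    cluster_averages, cluster_indices, embedding_matrix and df."""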
    # Compare the song embedding to the cluster average embeddings
    cluster_res = cosine_similarity_matrix_vec(cluster_averages, song_embedding)
    top_clusters = np.argsort(cluster_res)[-num_clusters_to_compare:][::-1]
    # Concatenate the embeddings and metadata from the top clusters
    top_clusters_embeddings = []
    top_clusters_metadata = []
    
    for top_cluster in top_clusters:
        cluster_embeddings = embedding_matrix[cluster_indices[top_cluster]]
        cluster_metadata = df.iloc[cluster_indices[top_cluster]]
        
        top_clusters_embeddings.append(cluster_embeddings)
        top_clusters_metadata.append(cluster_metadata)
    
    top_clusters_embeddings = np.concatenate(top_clusters_embeddings, axis=0)
    top_clusters_metadata = pd.concat(top_clusters_metadata, ignore_index=True)
    
    # Calculate cosine similarity within the concatenated top clusters
    res = cosine_similarity_matrix_vec(top_clusters_embeddings, song_embedding)
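    # Best first; [1:] drops the top hit, meant to be the query itself
    # (a duplicate of the query can tie at 1.0 and be dropped instead)
    top_indices = np.argsort(res)[-top_k:][::-1][1:]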
    
    # Get the top similar songs and their metadata
    similar_songs = top_clusters_metadata.iloc[top_indices]
    similar_songs_scores = res[top_indices]
    
    return similar_songs, similar_songs_scores

In [259]:
# Example usage
song_index = 99 # Specify the index of the song you want to find similar songs for
song_embedding = embedding_matrix[song_index]
song = df.iloc[song_index]
print(f'ID: {song_index}, Name: {song["track_name"]}, Artist(s): {song["track_artist"]}')
print(f'Genre: [{song["playlist_genre"]}, {song["playlist_subgenre"]}]\n')

ID: 99, Name: Good Things Fall Apart (with Jon Bellion), Artist(s): ILLENIUM
Genre: [pop, dance pop]



In [260]:
%%time

similar_songs, similar_songs_scores = retrieve_similar_songs(song_embedding)

# Print the similar songs and their scores
print(f"\nSimilar songs for: {df.iloc[song_index]['track_name']} - {df.iloc[song_index]['track_artist']}\n")
for i, (index, song) in enumerate(similar_songs.iterrows()):
    print(f"{i+1}. {song['track_name']} - {song['track_artist']} [Score: {similar_songs_scores[i]:.4f}]")


Similar songs for: Good Things Fall Apart (with Jon Bellion) - ILLENIUM

1. Good Things Fall Apart (with Jon Bellion) - ILLENIUM [Score: 1.0000]
2. Whatever It Takes - Imagine Dragons [Score: 0.9369]
3. Say It Ain't So - Weezer [Score: 0.9357]
4. Say It Ain't So - Weezer [Score: 0.9357]
5. Made For You - John De Sohn [Score: 0.9285]
6. Obsessed - Hogland [Score: 0.9167]
7. Blue Savannah - Erasure [Score: 0.9041]
8. Everyday - Logic [Score: 0.9006]
9. Everyday - Logic [Score: 0.9006]
10. Mr. Angel - Tommy Newport [Score: 0.8927]
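Clustered search is approximate: a true nearest neighbour that sits in a cluster outside the probed ones is never scored. The sketch below measures this as recall@k against brute force on random stand-in data. `cluster_search`, `brute_search` and `recall_at_k` are hypothetical helpers; unlike `retrieve_similar_songs`, they take the embeddings, labels and centroids as arguments instead of reading globals, and they exclude the query by index rather than dropping the first hit (which, as the list above shows, can leave the query in the results). In the notebook the centroids could be `cluster_averages` or `kmeans.cluster_centers_`. Random Gaussian data has little cluster structure, so recall on the real embeddings will differ.

```python
import numpy as np
from sklearn.cluster import KMeans

def cosine_to(matrix, vec):
    # Cosine similarity of every row of `matrix` with `vec`
    return matrix @ vec / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(vec))

def cluster_search(embeddings, labels, centroids, query_idx, k=10, n_probe=3):
    """Hypothetical helper: approximate top-k by cosine, scoring only the songs
    in the n_probe clusters closest to the query. Query excluded by index."""
    q = embeddings[query_idx]
    probe = np.argsort(cosine_to(centroids, q))[::-1][:n_probe]
    cand = np.flatnonzero(np.isin(labels, probe))  # rows belonging to probed clusters
    cand = cand[cand != query_idx]
    scores = cosine_to(embeddings[cand], q)
    return cand[np.argsort(scores)[::-1][:k]]

def brute_search(embeddings, query_idx, k=10):
    """Hypothetical helper: exact top-k by cosine over all rows, query excluded."""
    scores = cosine_to(embeddings, embeddings[query_idx])
    scores[query_idx] = -np.inf
    return np.argsort(scores)[::-1][:k]

def recall_at_k(exact, approx):
    # Fraction of the exact top-k that the approximate search also found
    return len(set(exact) & set(approx)) / len(exact)

rng = np.random.default_rng(4)
emb = rng.standard_normal((2000, 11))  # stand-in for embedding_matrix
km = KMeans(n_clusters=20, n_init=10, random_state=42).fit(emb)

exact = brute_search(emb, 0, k=10)
approx = cluster_search(emb, km.labels_, km.cluster_centers_, 0, k=10, n_probe=3)
print(f"recall@10 probing 3 of 20 clusters: {recall_at_k(exact, approx):.2f}")
```

Probing every cluster (`n_probe=20` here) makes the search exact; raising `num_clusters_to_compare` in the notebook trades speed for recall in the same way.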
