# Song Vectorizer & Music Similarity Explorer Demo

This notebook demonstrates how to use the `song-reccomender` library to embed songs, find similar tracks, and visualize the resulting embedding space in 2D.

In [8]:
import os
import sys

# Add the project root (the parent directory) to sys.path so the `src` package is importable
sys.path.append(os.path.abspath('../'))

from src.embeddings.embedder import AudioEmbedder
from src.similarity.recommender import Recommender
from src.visualization.projector import Projector
from src.visualization.plot import plot_embeddings_interactive

## 1. Setup Paths

In [9]:
INPUT_DIR = '../data/audio'  # Put your audio files here (this demo uses .m4a; .mp3 and .wav also work)
OUTPUT_DIR = '../data/embeddings'

## 2. Generate Embeddings
This step loads the CLAP model (`laion/clap-htsat-unfused`), embeds every audio file in `INPUT_DIR`, and saves the embeddings together with a `metadata.json` file to `OUTPUT_DIR`.

In [20]:
embedder = AudioEmbedder()
embedder.embed_library(INPUT_DIR, OUTPUT_DIR)

Loading CLAP model: laion/clap-htsat-unfused...
Model moved to MPS.
Found 49 audio files in ../data/audio



PySoundFile failed. Trying audioread instead.
librosa.core.audio.__audioread_load
	Deprecated as of librosa version 0.10.0.
	It will be removed in librosa version 1.0.
PySoundFile failed. Trying audioread instead.
	Audioread support is deprecated in librosa 0.10.0 and will be removed in version 1.0.

Embedding songs: 100%|██████████| 49/49 [00:00<00:00, 62.20it/s]

Finished embedding. Metadata saved to ../data/embeddings/metadata.json
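
The "Found 49 audio files" line suggests that `embed_library` first scans the input directory for audio files. The source does not show how it does this, so the following is only a minimal sketch of one way to do it. The function name `find_audio_files` and the extension set `AUDIO_EXTENSIONS` are hypothetical and are not part of the library.

```python
from pathlib import Path

# Hypothetical extension set; the real AudioEmbedder may accept a different one.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac"}


def find_audio_files(input_dir):
    """Return sorted paths of files under input_dir that have a known audio extension."""
    root = Path(input_dir)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )
```

Calling `find_audio_files(INPUT_DIR)` before embedding is a quick way to check that the directory contains what you expect. The comparison is case-insensitive, so `Track.M4A` and `track.m4a` are both picked up.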

## 3. Find Recommendations
Load the saved embeddings and metadata from `OUTPUT_DIR`, list the indexed songs, then query for the tracks most similar to a chosen song.

In [21]:
recommender = Recommender(OUTPUT_DIR)

# List all available songs
for i, meta in enumerate(recommender.metadata):
    print(f"{i}: {meta.get('filename')}")

Loaded 49 embeddings.
0: Taylor Swift - Bad Blood.m4a
1: Taylor Swift - Love Story.m4a
2: The Chainsmokers - All We Know (feat. Phoebe Ryan).m4a
3: Taylor Swift - Shake It Off.m4a
4: The Chainsmokers - Hope (feat. Winona Oak).m4a
5: Cloud & Kunfu - Bunt..m4a
6: Martin Garrix & Dua Lipa - Scared to Be Lonely.m4a
7: Martin Garrix - Don't Look Down (feat. Usher).m4a
8: The Chainsmokers - Closer (feat. Halsey).m4a
9: Armin van Buuren & Martin Garrix - Sleepless Nights (feat. Libby Whitehouse).m4a
10: The Chainsmokers - Roses (feat. ROZES).m4a
11: The Chainsmokers - This Feeling (feat. Kelsea Ballerini).m4a
12: Martin Garrix & Bebe Rexha - In the Name of Love.m4a
13: Martin Garrix & Troye Sivan - There for You.m4a
14: ILLENIUM & Teddy Swims - All That Really Matters.m4a
15: ILLENIUM & X Ambassadors - In Your Arms.m4a
16: The Chainsmokers - Closer (feat. Halsey) [R3hab Remix].m4a
17: Martin Garrix - Animals.m4a
18: ILLENIUM & Annika Wells - Nightlight.m4a
19: Taylor Swift - Blank Space.m4a
2

In [22]:
idx = 35  # Index of the query song in recommender.metadata
if 0 <= idx < len(recommender.metadata):
    query_song = recommender.metadata[idx]['filename']
    print(f"Finding recommendations for: {query_song}")

    # Retrieve the 5 songs most similar to the query
    recs = recommender.recommend(song_index=idx, k=5)
    for r in recs:
        print(f"- {r['filename']} (Score: {r['similarity_score']:.4f})")

Finding recommendations for: Taylor Swift - The Fate of Ophelia.m4a
- The Chainsmokers - All We Know (feat. Phoebe Ryan).m4a (Score: 0.9341)
- ILLENIUM & Ryan Tedder - With Your Love.m4a (Score: 0.9110)
- Taylor Swift - You Need To Calm Down.m4a (Score: 0.9043)
- The Chainsmokers - This Feeling (feat. Kelsea Ballerini).m4a (Score: 0.8953)
- The Chainsmokers - Hope (feat. Winona Oak).m4a (Score: 0.8946)
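
The scores above are consistent with cosine similarity, although the source does not confirm which metric `Recommender` uses. As a rough illustration only, here is a minimal sketch of a top-k lookup under that assumption. The functions `cosine_similarity` and `recommend` below are stand-ins written for this example, not the library's implementation; only the result keys `filename` and `similarity_score` are taken from the cell above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(embeddings, metadata, song_index, k=5):
    """Return the k songs most similar to embeddings[song_index], excluding the query itself."""
    query = embeddings[song_index]
    scored = [
        {
            "filename": metadata[i]["filename"],
            "similarity_score": cosine_similarity(query, emb),
        }
        for i, emb in enumerate(embeddings)
        if i != song_index
    ]
    # Highest similarity first
    scored.sort(key=lambda r: r["similarity_score"], reverse=True)
    return scored[:k]


# Toy 2-D vectors standing in for real embeddings
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
toy_metadata = [{"filename": name} for name in ["a", "b", "c", "d"]]
toy_recs = recommend(toy_embeddings, toy_metadata, song_index=0, k=2)
```

Excluding the query index matters: a song always has similarity 1.0 with itself, so it would otherwise occupy the top slot of every result list.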


## 4. Visualize Embeddings
Project high-dimensional embeddings to 2D using UMAP.

In [23]:
projector = Projector(method='umap')
projections = projector.fit_transform(recommender.embeddings)

plot_embeddings_interactive(projections, recommender.metadata)


n_jobs value 1 overridden to 1 by setting random_state. Use no seed for parallelism.
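
To make the shape of the projection step concrete (one high-dimensional row per song in, one 2D point per song out), here is a small sketch that swaps in PCA via NumPy's SVD instead of UMAP. This is a different, linear technique and is not what `Projector(method='umap')` does. The function name `project_pca_2d`, the random placeholder data, and the embedding dimension of 512 are all assumptions made for this example.

```python
import numpy as np


def project_pca_2d(embeddings):
    """Project row vectors onto their first two principal components (PCA, not UMAP)."""
    X = np.asarray(embeddings, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance
    _, _, vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ vt[:2].T


# Placeholder data: 49 random vectors; the dimension 512 is an assumption
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(49, 512))
projections = project_pca_2d(fake_embeddings)
print(projections.shape)  # (49, 2)
```

PCA preserves global variance structure, while UMAP tries to preserve local neighborhoods, so the two plots can look quite different for the same embeddings.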

