Embeat: A Music Recommendation System Based on Acoustic Features

Introduction

Embeat is a music recommendation system built on Spotify acoustic feature data. It encodes audio features into vectors via a contrastive learning model and combines them with a multi-channel recall strategy to deliver high-quality music recommendations.

Key Features:

Acoustic Similarity: The EmbeatMLP model, trained on Spotify Audio Features (key, tempo, energy, valence, etc.), encodes acoustic features into 64-dim vectors
Genre Awareness: Leverages 6,000+ micro-genre tags to precisely assign genres to 2M+ artists, preventing "acoustically similar but stylistically different" recommendations
Multi-Channel Recall: 5 parallel recall channels (Acoustic Similarity / Same-Genre Popular / Same Artist / Similar Artists / Playlist Collaborative Filtering), merged and scored for final output
Playlist Collaborative Filtering: Track2Vec (Word2Vec-inspired) learns track co-occurrence patterns from 1.88M Spotify playlists
Millisecond-Level Response: Powered by the Qdrant vector database, retrieval across 45M tracks completes in 30–100ms

Roadmap

If you find this project helpful, please give it a ⭐️. It means a lot to a personal project, thanks!

2026-06-26: Open-source initial codebase + EmbeatMLP model weights
2026-06-26: Open-source 45M tracks dataset + technical documentation
100 Stars: Open-source Qdrant database
1K Stars: Open-source Track2Vec model weights + 1.88M playlists dataset

Demo

Below are example recommendation results from Embeat (please unmute before playing)

Uptown Funk - Bruno Mars [dance pop, pop]

Seed Track	Embeat #1	Embeat #2	Embeat #3
Uptown Funk - Bruno Mars	CAN'T STOP THE FEELING! - Justin Timberlake	Happy - Pharrell Williams	I Like to Move It - will.i.am
demo_1_seed_track.mp4	demo_1_embeat_1.mp4	demo_1_embeat_2.mp4	demo_1_embeat_3.mp4

杀死那个石家庄人 - 万能青年旅店 [chinese indie rock]

Seed Track	Embeat #1	Embeat #2	Embeat #3
杀死那个石家庄人 - 万能青年旅店	大石碎胸口 - 万能青年旅店	凄美地 - 郭顶	不要停止我的音乐 - 痛仰乐队
demo_2_seed_track.mp4	demo_2_embeat_1.mp4	demo_2_embeat_2.mp4	demo_2_embeat_3.mp4

Sis puella magica! - 梶浦由記 [anime score, japanese vgm]

Seed Track	Embeat #1	Embeat #2	Embeat #3
Sis puella magica! - 梶浦由記	Decretum - 梶浦由記	Zoltraak - Evan Call	Arrietty's Song - Cécile Corbel
demo_3_seed_track.mp4	demo_3_embeat_1.mp4	demo_3_embeat_2.mp4	demo_3_embeat_3.mp4

Gizeh - Oskar Schuster [compositional ambient]

Seed Track	Embeat #1	Embeat #2	Embeat #3
Gizeh - Oskar Schuster	Vleurgat - Oskar Schuster	Sleeping Lotus - Joep Beving	Travelling - James Spiteri
demo_4_seed_track.mp4	demo_4_embeat_1.mp4	demo_4_embeat_2.mp4	demo_4_embeat_3.mp4

LLM Blind Evaluation

Using the LLM-as-a-Judge method (GPT-5.5 / Gemini Flash 3.5 / Claude Sonnet 4.6), Embeat was compared against Netease Cloud Music in blind A/B tests:

Judge Model	Embeat Wins	Netease Wins
Claude Sonnet 4.6	8	2
Gemini Flash 3.5	9	1
GPT-5.5	6	4
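
The per-judge and pooled win rates implied by the table can be tallied as below. This is only arithmetic on the counts shown above; pooling assumes every comparison carries equal weight, and the table does not say whether the three judges rated the same cases.

```python
# Win counts copied from the table above: (Embeat wins, Netease wins) per judge.
results = {
    "Claude Sonnet 4.6": (8, 2),
    "Gemini Flash 3.5": (9, 1),
    "GPT-5.5": (6, 4),
}


def win_rate(wins: int, losses: int) -> float:
    """Fraction of comparisons won."""
    return wins / (wins + losses)


per_judge = {judge: win_rate(w, l) for judge, (w, l) in results.items()}
total_wins = sum(w for w, _ in results.values())
total_losses = sum(l for _, l in results.values())
overall = win_rate(total_wins, total_losses)

for judge, rate in per_judge.items():
    print(f"{judge}: {rate:.0%}")
print(f"Pooled: {total_wins}/{total_wins + total_losses} = {overall:.1%}")
```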

Conclusions:

Embeat's core strength is its balance between style precision and artist diversity, with a particularly clear advantage for niche styles that span languages and cultures
Netease Cloud Music remains competitive mainly in its deep coverage of local Mandarin-language content
For detailed comparison, please refer to the technical documentation

System Architecture

Model Details

EmbeatMLP - Acoustic Feature Encoding Model

Input: 4 discrete features (key, mode, tempo, time_signature) + 7 continuous features (danceability, energy, speechiness, instrumentalness, valence, acousticness, liveness)
Architecture: Dual-tower MLP (Discrete Tower + Acoustic Tower -> Backbone)
Output: 64-dim L2-normalized vectors
Training: Masked InfoNCE Loss, batch_size=4096, converges in ~70 steps
Extremely small parameter count, supports real-time CPU-only inference
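
To make the training objective more concrete, here is a minimal pure-Python sketch of an InfoNCE loss with a mask over in-batch negatives. It is not the code in train/loss.py: the function names, the `negative_mask` argument, and the assumption that the mask removes selected pairs from the negatives are all illustrative. The default temperature mirrors the `--tau 0.05` value used in the training command.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def masked_info_nce(anchors, positives, negative_mask, tau=0.05):
    """Mean InfoNCE loss over a batch (hypothetical helper).

    anchors[i] and positives[i] form the positive pair. negative_mask[i][j]
    says whether positives[j] may serve as a negative for anchors[i]; the
    diagonal entry is ignored because it is always the positive.
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    losses = []
    for i in range(len(a)):
        # Cosine similarity to every candidate, divided by the temperature.
        logits = [sum(x * y for x, y in zip(a[i], p[j])) / tau
                  for j in range(len(p))]
        # Keep the positive plus every negative the mask allows.
        kept = [logits[j] for j in range(len(p))
                if j == i or negative_mask[i][j]]
        # Numerically stable log-sum-exp for the denominator.
        m = max(kept)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in kept))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


anchors = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]]
positives = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.2]]
all_negatives = [[True] * 3 for _ in range(3)]
# Tracks 0 and 2 are near-duplicates, so exclude them as mutual negatives.
mask = [[True, True, False], [True, True, True], [False, True, True]]

print(masked_info_nce(anchors, positives, all_negatives))
print(masked_info_nce(anchors, positives, mask))
```

Masking out a near-duplicate pair removes a term from the softmax denominator, so the masked loss is lower than the unmasked one for the same batch.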

Track2Vec - Playlist Collaborative Filtering Model

Based on Word2Vec Skip-Gram, treating playlists as "sentences" and tracks as "words"
Training data: 1.88M Spotify playlists
Vocabulary: 1.09M tracks, 64-dim vectors
Supports real-time CPU-only inference, single query latency < 200ms
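
The "playlists as sentences, tracks as words" idea can be illustrated by generating Skip-Gram training pairs from a single playlist. This is a sketch of the general Skip-Gram windowing technique, not the project's training code; `skipgram_pairs`, the window size, and the track IDs are placeholders chosen for the example.

```python
def skipgram_pairs(playlist, window=2):
    """Return (center, context) pairs for tracks within `window` positions."""
    pairs = []
    for i, center in enumerate(playlist):
        lo = max(0, i - window)
        hi = min(len(playlist), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, playlist[j]))
    return pairs


# One playlist, treated as one "sentence" of track IDs (placeholder IDs).
playlist = ["t1", "t2", "t3", "t4"]
pairs = skipgram_pairs(playlist, window=1)
for center, context in pairs:
    print(center, "->", context)
```

Tracks that often appear near each other in playlists produce many shared pairs, which is what pulls their vectors together during training.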

Multi-Channel Recall

Input seed track: track_id / track_name + artist_name
  │
  ├─ Channel 1: Acoustic Similarity Recall (genre filtering + EmbeatMLP cosine similarity)
  ├─ Channel 2: Same-Genre Popular Recall (genre filtering + popularity ranking)
  ├─ Channel 3: Same Artist Recall (same artist + EmbeatMLP cosine similarity)
  ├─ Channel 4: Similar Artists Recall (similar artists + EmbeatMLP cosine similarity)
  ├─ Channel 5: Playlist Collaborative Filtering (Track2Vec cosine similarity)
  │
  ├─ ISRC Deduplication / Re-ranking / Same-Artist Ratio Control
  │
  └─ Output: Top-K Recommendation List
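
The merge stage at the bottom of the diagram could look roughly like the sketch below. The README does not specify the scoring formula, so everything here is an assumption made for illustration: the `Candidate` fields, the per-channel weights, summing scores for tracks recalled by several channels, and the `max_artist_ratio` cap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    track_id: str
    isrc: str
    artist: str
    channel: str
    score: float  # channel-local score, assumed to lie in [0, 1]


def merge_channels(candidates, channel_weights, top_k=10, max_artist_ratio=0.3):
    """Deduplicate by ISRC, re-rank by weighted score, cap tracks per artist."""
    merged = {}
    for c in candidates:
        weighted = channel_weights.get(c.channel, 1.0) * c.score
        if c.isrc in merged:
            # Same recording seen again: keep the first entry, add the score.
            first, total = merged[c.isrc]
            merged[c.isrc] = (first, total + weighted)
        else:
            merged[c.isrc] = (c, weighted)

    ranked = sorted(merged.values(), key=lambda item: item[1], reverse=True)

    max_per_artist = max(1, int(top_k * max_artist_ratio))
    per_artist = {}
    result = []
    for cand, total in ranked:
        if per_artist.get(cand.artist, 0) >= max_per_artist:
            continue  # this artist already fills its share of the list
        per_artist[cand.artist] = per_artist.get(cand.artist, 0) + 1
        result.append((cand.track_id, round(total, 4)))
        if len(result) == top_k:
            break
    return result


candidates = [
    Candidate("a1", "ISRC1", "X", "acoustic", 0.9),
    Candidate("a1_remaster", "ISRC1", "X", "playlist_cf", 0.8),
    Candidate("a2", "ISRC2", "X", "same_artist", 0.7),
    Candidate("b1", "ISRC3", "Y", "genre_popular", 0.6),
    Candidate("c1", "ISRC4", "Z", "similar_artists", 0.5),
]
top = merge_channels(candidates, channel_weights={}, top_k=3, max_artist_ratio=0.34)
print(top)
```

In this toy run the two entries sharing an ISRC collapse into one, and the second track by artist X is skipped because the cap allows only one track per artist in a list of three.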

Project Structure

Embeat/
├── assets/                 # Static assets folder
├── checkpoints/            # Model weights folder
│   ├── EmbeatMLP/          # EmbeatMLP model weights
│   └── Track2Vec/          # Track2Vec model weights (requires separate download)
├── data/                   # Data processing folder (not fully organized)
├── infer/                  # Inference code folder
│   ├── Embeat.py           # Embeat recommendation system core
│   ├── EmbeatUtils.py      # Embeat extension utilities
│   ├── infer.py            # EmbeatMLP inference entry point
│   ├── eval_infer.py       # EmbeatMLP evaluation utilities
│   └── hf_to_qdrant.py     # HF Dataset to Qdrant database
├── train/                  # Training code folder
│   ├── model.py            # EmbeatMLP model definition
│   ├── dataset.py          # HF Dataset processing
│   ├── sampler.py          # Positive/negative sample sampler
│   ├── loss.py             # Masked InfoNCE Loss
│   ├── trainer.py          # EmbeatMLP trainer
│   ├── train.py            # EmbeatMLP training entry point
│   └── train_track2vec.py  # Track2Vec training entry point
├── requirements.txt
└── LICENSE

Getting Started

Requirements (recommended)

Python >= 3.10
PyTorch >= 2.6, < 2.7 (required for training)
CUDA >= 12.0 (required for training)
Qdrant (required for inference)

Installation

conda create -n embeat python=3.10
conda activate embeat

# Install PyTorch (CUDA 12.x), see https://pytorch.org/get-started/previous-versions/
pip install "torch>=2.6,<2.7" --index-url https://download.pytorch.org/whl/cu126

pip install -r requirements.txt

Train EmbeatMLP

# Place the training data (HuggingFace Dataset format) under data/datasets/ and name it spotify_45m_tracks_metadata
python -m train.train \
    --dataset data/datasets/spotify_45m_tracks_metadata@10000000 \
    --batch-size 4096 \
    --max-steps 200 \
    --lr 1e-4 \
    --tau 0.05 \
    --ckpt-dir checkpoints

Train Track2Vec

# Prepare playlist training data (txt format, one playlist per line, space-separated track_ids)
cd train
python train_track2vec.py
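
For reference, the playlist text format described above (one playlist per line, space-separated track_ids) can be written and read back as follows. The helper names, the file name, and the `min_tracks` filter are assumptions for this sketch; train_track2vec.py may expect a different path or apply its own filtering.

```python
import os
import tempfile


def write_playlists(path, playlists):
    """Write one playlist per line, track IDs separated by single spaces."""
    with open(path, "w", encoding="utf-8") as f:
        for tracks in playlists:
            f.write(" ".join(tracks) + "\n")


def read_playlists(path, min_tracks=2):
    """Read playlists back, skipping lines with fewer than `min_tracks` IDs."""
    playlists = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            tracks = line.split()
            if len(tracks) >= min_tracks:
                playlists.append(tracks)
    return playlists


# Placeholder track IDs; real files would contain Spotify track IDs.
example = [["trackA", "trackB", "trackC"], ["trackB", "trackD"], ["trackE"]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "playlists.txt")
    write_playlists(path, example)
    loaded = read_playlists(path)

print(loaded)
```

A playlist with a single track yields no co-occurrence information, which is why the sketch drops it on load.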

Inference: Compute Acoustic Similarity Between Two Tracks

from infer.infer import infer

song_a = {"key": 7, "mode": 1, "tempo": 137, "time_signature": 4,
          "danceability": 0.54, "energy": 0.56, "speechiness": 0.02,
          "instrumentalness": 0.0, "valence": 0.41, "acousticness": 0.23,
          "liveness": 0.1}

song_b = {"key": 5, "mode": 0, "tempo": 87, "time_signature": 4,
          "danceability": 0.67, "energy": 0.65, "speechiness": 0.05,
          "instrumentalness": 0.03, "valence": 0.57, "acousticness": 0.27,
          "liveness": 0.19}

similarity = infer(sample_a=song_a, sample_b=song_b,
                   checkpoint_path="checkpoints/EmbeatMLP/model.pt")

print(f"Similarity: {similarity}")
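
Before calling `infer`, it can help to check that a feature dictionary is complete and within range. The sketch below is a hypothetical pre-check, not part of Embeat; it assumes the value ranges documented for Spotify audio features (key from -1 to 11, mode 0 or 1, time_signature from 3 to 7, and the seven continuous features in [0, 1]). Whether `infer` itself validates or clips inputs is not stated in this README.

```python
CONTINUOUS = ("danceability", "energy", "speechiness", "instrumentalness",
              "valence", "acousticness", "liveness")
DISCRETE = ("key", "mode", "tempo", "time_signature")


def validate_features(sample):
    """Return a list of problems; an empty list means the sample looks valid."""
    problems = [f"missing field: {name}"
                for name in DISCRETE + CONTINUOUS if name not in sample]
    if problems:
        return problems
    if sample["key"] not in range(-1, 12):
        problems.append("key must be an integer in -1..11")
    if sample["mode"] not in (0, 1):
        problems.append("mode must be 0 (minor) or 1 (major)")
    if not sample["tempo"] > 0:
        problems.append("tempo must be a positive BPM value")
    if sample["time_signature"] not in range(3, 8):
        problems.append("time_signature must be an integer in 3..7")
    for name in CONTINUOUS:
        if not 0.0 <= sample[name] <= 1.0:
            problems.append(f"{name} must be in [0, 1]")
    return problems


song_a = {"key": 7, "mode": 1, "tempo": 137, "time_signature": 4,
          "danceability": 0.54, "energy": 0.56, "speechiness": 0.02,
          "instrumentalness": 0.0, "valence": 0.41, "acousticness": 0.23,
          "liveness": 0.1}
bad_song = dict(song_a, key=12, energy=1.5)

print(validate_features(song_a))
print(validate_features(bad_song))
```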

Inference: Qdrant-Based Music Recommendation

# 1. Start the Qdrant service and import the database
# 2. Query recommendations via command line
cd infer
python Embeat.py -t 5pIcwtJYNJx93l420oR2Vm   # Query by Spotify Track ID
python Embeat.py -s "晴天 - Jay Chou"   # Query by track name and artist
python Embeat.py -a "Jay Chou"   # Query by artist name

Acknowledgements

License

Scope	License
Code, Model Weights	MIT
Datasets, Database	CC-BY-NC 4.0
