English | 简体中文
Embeat is a music recommendation system built on Spotify acoustic feature data. It encodes audio features into vectors via a contrastive learning model and combines them with a multi-channel recall strategy to deliver high-quality music recommendations.
Key Features:
- Acoustic Similarity: The EmbeatMLP model, trained on Spotify Audio Features (key, tempo, energy, valence, etc.), encodes acoustic features into 64-dim vectors
- Genre Awareness: Leverages 6,000+ micro-genre tags to precisely assign genres to 2M+ artists, preventing "acoustically similar but stylistically different" recommendations
- Multi-Channel Recall: 5 parallel recall channels (Acoustic Similarity / Same-Genre Popular / Same Artist / Similar Artists / Playlist Collaborative Filtering), merged and scored for final output
- Playlist Collaborative Filtering: Track2Vec (Word2Vec-inspired) learns track co-occurrence patterns from 1.88M Spotify playlists
- Millisecond-Level Response: Powered by the Qdrant vector database, retrieval across 45M tracks completes in 30–100ms
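Together, acoustic similarity and genre awareness form one retrieval rule. First drop candidates whose genres do not overlap the seed's, then rank the rest by cosine similarity of their EmbeatMLP vectors. The brute-force sketch below shows that ordering on toy data. `Track`, `cosine` and `acoustic_recall` are hypothetical names. Treating "shares at least one genre tag" as the filter is an assumption. Embeat itself runs this search in Qdrant rather than scanning a list.

```python
import math
from dataclasses import dataclass


@dataclass
class Track:
    track_id: str
    genres: set[str]       # micro-genre tags (assumed to come from the artist)
    vector: list[float]    # EmbeatMLP embedding (64 dims in Embeat, any length here)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; equals the dot product when both vectors are unit-length."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def acoustic_recall(seed: Track, catalog: list[Track], k: int = 10) -> list[tuple[str, float]]:
    """Genre filter first, then rank the surviving tracks by acoustic (cosine) similarity."""
    same_style = [t for t in catalog
                  if t.track_id != seed.track_id and t.genres & seed.genres]
    scored = [(t.track_id, cosine(seed.vector, t.vector)) for t in same_style]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

In the test data, a track with exactly the seed's acoustic vector but an unrelated genre gets excluded. That is the "acoustically similar but stylistically different" case the filter is meant to prevent.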
If you find this project helpful, please give it a ⭐️. For a personal project, every star means a lot. Thanks!
- 2026-06-26: Open-source initial codebase + EmbeatMLP model weights
- 2026-06-26: Open-source 45M tracks dataset + technical documentation
- 100 Stars: Open-source Qdrant database
- 1K Stars: Open-source Track2Vec model weights + 1.8M playlists dataset
Below are example recommendation results from Embeat (please unmute before playing)
Uptown Funk - Bruno Mars [dance pop, pop]
| Seed Track | Embeat #1 | Embeat #2 | Embeat #3 |
|---|---|---|---|
| Uptown Funk - Bruno Mars | CAN'T STOP THE FEELING! - Justin Timberlake | Happy - Pharrell Williams | I Like to Move It - will.i.am |
| demo_1_seed_track.mp4 | demo_1_embeat_1.mp4 | demo_1_embeat_2.mp4 | demo_1_embeat_3.mp4 |
杀死那个石家庄人 - 万能青年旅店 [chinese indie rock]
| Seed Track | Embeat #1 | Embeat #2 | Embeat #3 |
|---|---|---|---|
| 杀死那个石家庄人 - 万能青年旅店 | 大石碎胸口 - 万能青年旅店 | 凄美地 - 郭顶 | 不要停止我的音乐 - 痛仰乐队 |
| demo_2_seed_track.mp4 | demo_2_embeat_1.mp4 | demo_2_embeat_2.mp4 | demo_2_embeat_3.mp4 |
Sis puella magica! - 梶浦由記 [anime score, japanese vgm]
| Seed Track | Embeat #1 | Embeat #2 | Embeat #3 |
|---|---|---|---|
| Sis puella magica! - 梶浦由記 | Decretum - 梶浦由記 | Zoltraak - Evan Call | Arrietty's Song - Cécile Corbel |
| demo_3_seed_track.mp4 | demo_3_embeat_1.mp4 | demo_3_embeat_2.mp4 | demo_3_embeat_3.mp4 |
Gizeh - Oskar Schuster [compositional ambient]
| Seed Track | Embeat #1 | Embeat #2 | Embeat #3 |
|---|---|---|---|
| Gizeh - Oskar Schuster | Vleurgat - Oskar Schuster | Sleeping Lotus - Joep Beving | Travelling - James Spiteri |
| demo_4_seed_track.mp4 | demo_4_embeat_1.mp4 | demo_4_embeat_2.mp4 | demo_4_embeat_3.mp4 |
Using the LLM-as-a-Judge method (judges: GPT-5.5, Gemini Flash 3.5 and Claude Sonnet 4.6), Embeat was compared against NetEase Cloud Music in blind A/B tests, with 10 comparisons per judge:
| Judge Model | Embeat Wins | NetEase Wins | Ties |
|---|---|---|---|
| Claude Sonnet 4.6 | 8 | 2 | 0 |
| Gemini Flash 3.5 | 9 | 1 | 0 |
| GPT-5.5 | 6 | 4 | 0 |
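For an aggregate view, you can pool the verdict counts in the table. In this small sketch, the `results` dict and `win_rate` helper are ours. Counting a tie as half a win is an assumed convention, though no ties occurred here. All three judges scored the same 10 pairs, so the 30 pooled verdicts are not 30 independent comparisons.

```python
# Verdict counts copied from the table: (Embeat wins, NetEase wins, ties)
results = {
    "Claude Sonnet 4.6": (8, 2, 0),
    "Gemini Flash 3.5": (9, 1, 0),
    "GPT-5.5": (6, 4, 0),
}


def win_rate(wins: int, losses: int, ties: int) -> float:
    """Share of comparisons Embeat won, counting a tie as half a win."""
    return (wins + 0.5 * ties) / (wins + losses + ties)


for judge, counts in results.items():
    print(f"{judge}: {win_rate(*counts):.0%}")

# Column-wise sums across judges: (total wins, total losses, total ties)
pooled = tuple(sum(column) for column in zip(*results.values()))
print(f"Pooled: {pooled[0]}/{sum(pooled)} = {win_rate(*pooled):.1%}")
```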
Conclusions:
- Embeat's core strength lies in balancing style precision with artist diversity, with a particularly notable advantage for niche styles that span languages and cultures
- NetEase Cloud Music remained worth consulting only for its deeper coverage of local Mandarin-language content
- For a detailed comparison, please refer to the technical documentation
EmbeatMLP - Acoustic Feature Encoding Model
- Input: discrete features (key, mode, tempo, time_signature; 64 dims) + continuous features (7 features such as energy, valence and danceability; 64 dims)
- Architecture: Dual-tower MLP (Discrete Tower + Acoustic Tower -> Backbone)
- Output: 64-dim L2-normalized vectors
- Training: Masked InfoNCE Loss, batch_size=4096, converges in ~70 steps
- Extremely small parameter count, supports real-time CPU-only inference
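A framework-free NumPy sketch can make the architecture and loss more concrete. It is not the repository's `train/model.py` or `train/loss.py`. The hidden sizes, random initialization and ReLU layer layout are assumptions. Reading "masked" as removing in-batch false negatives from the InfoNCE denominator is also an assumption (for example, other tracks already known to be similar to the anchor). The temperature default of 0.05 matches the `--tau` value in the training command.

```python
import numpy as np

rng = np.random.default_rng(0)


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit length so that cosine similarity becomes a dot product."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def init_mlp(sizes: list[int]) -> list[tuple[np.ndarray, np.ndarray]]:
    """Random (weight, bias) pairs for a stack of fully connected layers."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def run_mlp(x: np.ndarray, layers) -> np.ndarray:
    """Apply the layers with ReLU between them, but not after the last one."""
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x


class DualTowerSketch:
    """Discrete tower + acoustic tower, concatenated and fed to a shared backbone.
    Hypothetical layer sizes; the 7 raw continuous features are fed in directly."""

    def __init__(self, discrete_dim=64, continuous_dim=7, hidden=128, out_dim=64):
        self.discrete_tower = init_mlp([discrete_dim, hidden, 64])
        self.acoustic_tower = init_mlp([continuous_dim, hidden, 64])
        self.backbone = init_mlp([128, hidden, out_dim])

    def __call__(self, discrete: np.ndarray, continuous: np.ndarray) -> np.ndarray:
        h = np.concatenate([run_mlp(discrete, self.discrete_tower),
                            run_mlp(continuous, self.acoustic_tower)], axis=-1)
        return l2_normalize(run_mlp(h, self.backbone))  # unit-length output vectors


def masked_info_nce(anchors, positives, false_negative_mask, tau=0.05):
    """InfoNCE where positives[i] is the positive for anchors[i].
    false_negative_mask[i, j] = True drops candidate j from anchor i's denominator;
    the diagonal (the true positive) is never masked."""
    logits = anchors @ positives.T / tau
    mask = false_negative_mask.copy()
    np.fill_diagonal(mask, False)
    logits = np.where(mask, -np.inf, logits)
    logits = logits - logits.max(axis=1, keepdims=True)  # stabilize exp()
    log_prob = np.diag(logits) - np.log(np.exp(logits).sum(axis=1))
    return float(-log_prob.mean())
```

The outputs are unit-length, so the cosine similarity of two tracks reduces to `z_a @ z_b`. That is one plausible way to produce a pairwise score such as the one printed by `infer`.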
Track2Vec - Playlist Collaborative Filtering Model
- Based on Word2Vec Skip-Gram, treating playlists as "sentences" and tracks as "words"
- Training data: 1.88M Spotify playlists
- Vocabulary: 1.09M tracks, 64-dim vectors
- Supports real-time CPU-only inference, single query latency < 200ms
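To make "playlists as sentences" concrete, the sketch below generates the (center, context) pairs that Skip-Gram trains on. It reads the playlist text format used for training (one playlist per line, space-separated track_ids). `read_playlists` and `skipgram_pairs` are hypothetical helpers, and the window size is an arbitrary choice. A full Word2Vec implementation also uses negative sampling and usually subsamples very frequent tokens.

```python
from collections import Counter
from collections.abc import Iterable, Iterator


def read_playlists(lines: Iterable[str]) -> list[list[str]]:
    """One playlist per line, track_ids separated by spaces; blank lines are skipped."""
    return [line.split() for line in lines if line.strip()]


def skipgram_pairs(playlists: list[list[str]], window: int = 2) -> Iterator[tuple[str, str]]:
    """Each track predicts its neighbours within `window` positions in the same
    playlist, just as Skip-Gram treats words within a sentence."""
    for playlist in playlists:
        for i, center in enumerate(playlist):
            lo, hi = max(0, i - window), min(len(playlist), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    yield center, playlist[j]


# Tracks that frequently share contexts end up close in the learned embedding space.
playlists = read_playlists(["t1 t2 t3", "", "t2 t3 t4"])
pair_counts = Counter(skipgram_pairs(playlists, window=1))
print(pair_counts.most_common(2))
```

With gensim, for example, `Word2Vec(sentences=playlists, vector_size=64, sg=1, window=5, min_count=1)` trains a Skip-Gram model on the same input, and `model.wv.most_similar(track_id)` queries it. The README does not say whether `train_track2vec.py` uses gensim.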
```
Input seed track: track_id / track_name + artist_name
│
├─ Channel 1: Acoustic Similarity Recall (genre filtering + EmbeatMLP cosine similarity)
├─ Channel 2: Same-Genre Popular Recall (genre filtering + popularity ranking)
├─ Channel 3: Same Artist Recall (same artist + EmbeatMLP cosine similarity)
├─ Channel 4: Similar Artists Recall (similar artists + EmbeatMLP cosine similarity)
├─ Channel 5: Playlist Collaborative Filtering (Track2Vec cosine similarity)
│
├─ ISRC Deduplication / Re-ranking / Same-Artist Ratio Control
│
└─ Output: Top-K Recommendation List
```
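The final merge stage in the diagram fits in a few lines of Python. Everything here is illustrative: `Candidate`, `merge_channels`, the per-channel weights and the 30% same-artist cap are hypothetical. Keeping the best weighted score per ISRC is only one way to fuse channels. Summing scores or reciprocal-rank fusion would instead reward tracks found by several channels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    track_id: str
    isrc: str      # the same recording released on several albums shares one ISRC
    artist: str
    score: float   # channel-local relevance, e.g. a cosine similarity


def merge_channels(channels: dict[str, list[Candidate]],
                   weights: dict[str, float],
                   k: int = 10,
                   max_artist_ratio: float = 0.3) -> list[Candidate]:
    # 1. Merge + ISRC deduplication: keep the best weighted score per recording.
    best: dict[str, tuple[float, Candidate]] = {}
    for channel, candidates in channels.items():
        w = weights.get(channel, 1.0)
        for cand in candidates:
            s = w * cand.score
            if cand.isrc not in best or s > best[cand.isrc][0]:
                best[cand.isrc] = (s, cand)

    # 2. Re-rank by weighted score.
    ranked = sorted(best.values(), key=lambda item: item[0], reverse=True)

    # 3. Same-artist ratio control: at most max_artist_ratio * k slots per artist.
    cap = max(1, int(k * max_artist_ratio))
    per_artist: dict[str, int] = {}
    result: list[Candidate] = []
    for _, cand in ranked:
        if per_artist.get(cand.artist, 0) >= cap:
            continue
        per_artist[cand.artist] = per_artist.get(cand.artist, 0) + 1
        result.append(cand)
        if len(result) == k:
            break
    return result
```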
```
Embeat/
├── assets/                  # Static assets folder
├── checkpoints/             # Model weights folder
│   ├── EmbeatMLP/           # EmbeatMLP model weights
│   └── Track2Vec/           # Track2Vec model weights (requires separate download)
├── data/                    # Data processing folder (not fully organized)
├── infer/                   # Inference code folder
│   ├── Embeat.py            # Embeat recommendation system core
│   ├── EmbeatUtils.py       # Embeat extension utilities
│   ├── infer.py             # EmbeatMLP inference entry point
│   ├── eval_infer.py        # EmbeatMLP evaluation utilities
│   └── hf_to_qdrant.py      # HF Dataset to Qdrant database
├── train/                   # Training code folder
│   ├── model.py             # EmbeatMLP model definition
│   ├── dataset.py           # HF Dataset processing
│   ├── sampler.py           # Positive/negative sample sampler
│   ├── loss.py              # Masked InfoNCE Loss
│   ├── trainer.py           # EmbeatMLP trainer
│   ├── train.py             # EmbeatMLP training entry point
│   └── train_track2vec.py   # Track2Vec training entry point
├── requirements.txt
└── LICENSE
```
- Python >= 3.10
- PyTorch >= 2.6, < 2.7 (required for training)
- CUDA >= 12.0 (required for training)
- Qdrant (required for inference)
```bash
conda create -n embeat python=3.10
conda activate embeat
# Install PyTorch (CUDA 12.x), see https://pytorch.org/get-started/previous-versions/
pip install "torch>=2.6,<2.7" --index-url https://download.pytorch.org/whl/cu126
pip install -r requirements.txt
```

```bash
# Prepare the training data in HuggingFace Dataset format under data/datasets/, named spotify_45m_tracks_metadata
python -m train.train \
    --dataset data/datasets/spotify_45m_tracks_metadata@10000000 \
    --batch-size 4096 \
    --max-steps 200 \
    --lr 1e-4 \
    --tau 0.05 \
    --ckpt-dir checkpoints
```

```bash
# Prepare the playlist training data (plain text, one playlist per line, track_ids separated by spaces)
cd train
python train_track2vec.py
```

```python
# Run from the repository root so that the infer package is importable
from infer.infer import infer

# Spotify Audio Features (key: pitch class 0-11, mode: 1 = major / 0 = minor, tempo in BPM)
song_a = {"key": 7, "mode": 1, "tempo": 137, "time_signature": 4,
          "danceability": 0.54, "energy": 0.56, "speechiness": 0.02,
          "instrumentalness": 0.0, "valence": 0.41, "acousticness": 0.23,
          "liveness": 0.1}
song_b = {"key": 5, "mode": 0, "tempo": 87, "time_signature": 4,
          "danceability": 0.67, "energy": 0.65, "speechiness": 0.05,
          "instrumentalness": 0.03, "valence": 0.57, "acousticness": 0.27,
          "liveness": 0.19}

# Compute the EmbeatMLP similarity between the two tracks
similarity = infer(sample_a=song_a, sample_b=song_b,
                   checkpoint_path="checkpoints/EmbeatMLP/model.pt")
print(f"Similarity: {similarity}")
```

```bash
# 1. Start the Qdrant service and import the database
# 2. Query recommendations from the command line
cd infer
python Embeat.py -t 5pIcwtJYNJx93l420oR2Vm   # Query by Spotify Track ID
python Embeat.py -s "晴天 - Jay Chou"          # Query by track name and artist
python Embeat.py -a "Jay Chou"                 # Query by artist name
```

- GD Music (Live Demo): https://music.gdstudio.xyz
- Bilibili: https://space.bilibili.com/13715770
- Telegram: https://t.me/gdstudio_music
| Scope | License |
|---|---|
| Code, Model Weights | MIT |
| Datasets, Database | CC-BY-NC 4.0 |

