gdstudio-org/Embeat

Embeat: A Music Recommendation System Based on Acoustic Features

Introduction

Embeat is a music recommendation system built on Spotify acoustic feature data. It encodes audio features into vectors via a contrastive learning model and combines them with a multi-channel recall strategy to deliver high-quality music recommendations.

Key Features:

  • Acoustic Similarity: The EmbeatMLP model, trained on Spotify Audio Features (key, tempo, energy, valence, etc.), encodes acoustic features into 64-dim vectors
  • Genre Awareness: Leverages 6,000+ micro-genre tags to precisely assign genres to 2M+ artists, preventing "acoustically similar but stylistically different" recommendations
  • Multi-Channel Recall: 5 parallel recall channels (Acoustic Similarity / Same-Genre Popular / Same Artist / Similar Artists / Playlist Collaborative Filtering), merged and scored for final output
  • Playlist Collaborative Filtering: Track2Vec (Word2Vec-inspired) learns track co-occurrence patterns from 1.88M Spotify playlists
  • Millisecond-Level Response: Powered by the Qdrant vector database, retrieval across 45M tracks completes in 30–100ms
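As a rough illustration of how acoustic similarity and genre awareness combine, the sketch below runs a brute-force, genre-filtered cosine search over a tiny in-memory catalog. The catalog, the field names (`id`, `vector`, `genres`), and both function names are hypothetical; the real system delegates this search to Qdrant over 64-dim vectors.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so that dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def genre_filtered_top_k(seed, catalog, k=3):
    """Return the ids of the k tracks most similar to `seed` that share a genre with it."""
    seed_vec = l2_normalize(seed["vector"])
    seed_genres = set(seed["genres"])
    scored = []
    for track in catalog:
        if track["id"] == seed["id"]:
            continue  # never recommend the seed itself
        if not seed_genres & set(track["genres"]):
            continue  # genre filter: skip stylistically unrelated tracks
        cand_vec = l2_normalize(track["vector"])
        score = sum(a * b for a, b in zip(seed_vec, cand_vec))
        scored.append((score, track["id"]))
    scored.sort(reverse=True)
    return [track_id for _, track_id in scored[:k]]

# Toy 3-dim vectors stand in for the 64-dim embeddings.
catalog = [
    {"id": "seed", "vector": [1.0, 0.0, 0.0], "genres": ["dance pop", "pop"]},
    {"id": "a", "vector": [0.9, 0.1, 0.0], "genres": ["pop"]},
    {"id": "b", "vector": [1.0, 0.0, 0.0], "genres": ["death metal"]},
    {"id": "c", "vector": [0.0, 1.0, 0.0], "genres": ["dance pop"]},
]
result = genre_filtered_top_k(catalog[0], catalog, k=2)
print(result)
```

Track `b` has the same vector as the seed but is dropped by the genre filter, which is the "acoustically similar but stylistically different" case the feature list describes.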

Roadmap

If you find this project helpful, please give it a ⭐️. It means a lot to a personal project, thanks!

Demo

Below are example recommendation results from Embeat (please unmute before playing)

Uptown Funk - Bruno Mars [dance pop, pop]

  • Seed Track: Uptown Funk - Bruno Mars (demo_1_seed_track.mp4)
  • Embeat #1: CAN'T STOP THE FEELING! - Justin Timberlake (demo_1_embeat_1.mp4)
  • Embeat #2: Happy - Pharrell Williams (demo_1_embeat_2.mp4)
  • Embeat #3: I Like to Move It - will.i.am (demo_1_embeat_3.mp4)

杀死那个石家庄人 - 万能青年旅店 [chinese indie rock]

  • Seed Track: 杀死那个石家庄人 - 万能青年旅店 (demo_2_seed_track.mp4)
  • Embeat #1: 大石碎胸口 - 万能青年旅店 (demo_2_embeat_1.mp4)
  • Embeat #2: 凄美地 - 郭顶 (demo_2_embeat_2.mp4)
  • Embeat #3: 不要停止我的音乐 - 痛仰乐队 (demo_2_embeat_3.mp4)

Sis puella magica! - 梶浦由記 [anime score, japanese vgm]

  • Seed Track: Sis puella magica! - 梶浦由記 (demo_3_seed_track.mp4)
  • Embeat #1: Decretum - 梶浦由記 (demo_3_embeat_1.mp4)
  • Embeat #2: Zoltraak - Evan Call (demo_3_embeat_2.mp4)
  • Embeat #3: Arrietty's Song - Cécile Corbel (demo_3_embeat_3.mp4)

Gizeh - Oskar Schuster [compositional ambient]

  • Seed Track: Gizeh - Oskar Schuster (demo_4_seed_track.mp4)
  • Embeat #1: Vleurgat - Oskar Schuster (demo_4_embeat_1.mp4)
  • Embeat #2: Sleeping Lotus - Joep Beving (demo_4_embeat_2.mp4)
  • Embeat #3: Travelling - James Spiteri (demo_4_embeat_3.mp4)

LLM Blind Evaluation

Using the LLM-as-a-Judge method with three judge models (GPT-5.5, Gemini Flash 3.5, Claude Sonnet 4.6), Embeat was compared against Netease Cloud Music in blind A/B tests:

Judge Model          Embeat Wins   Netease Wins   Tie
Claude Sonnet 4.6    8             2              0
Gemini Flash 3.5     9             1              0
GPT-5.5              6             4              0
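For readers who want a single summary number, the counts in the table can be turned into win rates. This is plain arithmetic on the reported figures; treating ties as non-wins is an assumption of this sketch (it has no effect here, since no ties were recorded).

```python
# Win / loss / tie counts copied from the table above.
results = {
    "Claude Sonnet 4.6": (8, 2, 0),
    "Gemini Flash 3.5": (9, 1, 0),
    "GPT-5.5": (6, 4, 0),
}

def win_rate(wins, losses, ties):
    """Fraction of comparisons won by Embeat; ties count as non-wins."""
    total = wins + losses + ties
    return wins / total if total else 0.0

per_judge = {judge: win_rate(*counts) for judge, counts in results.items()}
total_wins = sum(counts[0] for counts in results.values())
total_trials = sum(sum(counts) for counts in results.values())
overall = total_wins / total_trials

for judge, rate in per_judge.items():
    print(f"{judge}: {rate:.0%}")
print(f"Overall: {total_wins}/{total_trials} = {overall:.1%}")
```

With ten comparisons per judge, a single flipped verdict moves a judge's win rate by ten percentage points, so the per-judge figures are best read as indicative.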

Conclusions:

  • Embeat's core strength is its balance between style precision and artist diversity, with a particularly notable advantage for niche styles that span languages and cultures
  • Netease Cloud Music remains a useful reference only for its deep coverage of local Mandarin-language content
  • For detailed comparison, please refer to the technical documentation

System Architecture

Model Details

EmbeatMLP - Acoustic Feature Encoding Model

  • Input: 64-dim discrete features (key, mode, tempo, time_signature) + 64-dim continuous features (7 features: danceability, energy, speechiness, instrumentalness, valence, acousticness, liveness)
  • Architecture: Dual-tower MLP (Discrete Tower + Acoustic Tower -> Backbone)
  • Output: 64-dim L2-normalized vectors
  • Training: Masked InfoNCE Loss, batch_size=4096, converges in ~70 steps
  • Extremely small parameter count, supports real-time CPU-only inference
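To make the training objective more concrete, here is a minimal pure-Python sketch of an InfoNCE loss with a mask. It is not the code in `train/loss.py`: the function name, the argument layout, and the meaning of the mask (a `False` entry removes that in-batch candidate from the denominator, for example a suspected false negative) are assumptions. The temperature default of 0.05 matches the `--tau` value used in the training command.

```python
import math

def masked_info_nce(anchors, positives, mask, tau=0.05):
    """Mean InfoNCE loss over a batch of L2-normalized vectors.

    anchors[i] and positives[i] form the positive pair; positives[j] with
    j != i act as in-batch negatives. mask[i][j] == False removes candidate j
    from anchor i's denominator (assumed semantics of "masked").
    """
    losses = []
    for i, anchor in enumerate(anchors):
        # Cosine similarity (dot product of unit vectors) scaled by temperature.
        logits = [sum(a * p for a, p in zip(anchor, pos)) / tau for pos in positives]
        kept = [x for j, x in enumerate(logits) if j == i or mask[i][j]]
        top = max(kept)  # subtract the max for numerical stability
        log_denominator = top + math.log(sum(math.exp(x - top) for x in kept))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)

anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.0], [0.0, 1.0]]
full_mask = [[True, True], [True, True]]
no_negatives = [[True, False], [False, True]]

loss_full = masked_info_nce(anchors, positives, full_mask)
loss_masked = masked_info_nce(anchors, positives, no_negatives)
print(loss_full, loss_masked)
```

With every negative masked out, the denominator contains only the positive pair and the loss is exactly zero; with orthogonal negatives kept, it is small but positive.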

Track2Vec - Playlist Collaborative Filtering Model

  • Based on Word2Vec Skip-Gram, treating playlists as "sentences" and tracks as "words"
  • Training data: 1.88M Spotify playlists
  • Vocabulary: 1.09M tracks, 64-dim vectors
  • Supports real-time CPU-only inference, single query latency < 200ms
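The "playlists as sentences, tracks as words" idea can be sketched by generating the (center, context) pairs that a Skip-Gram model trains on. The function name and the window size are hypothetical and are not taken from `train_track2vec.py`; an actual training run would feed such co-occurrences to a Word2Vec implementation.

```python
def skipgram_pairs(playlist, window=2):
    """Yield (center, context) track pairs at most `window` positions apart."""
    for i, center in enumerate(playlist):
        lo = max(0, i - window)
        hi = min(len(playlist), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, playlist[j]

# Two toy playlists; each inner list is one "sentence" of track IDs.
playlists = [["t1", "t2", "t3"], ["t2", "t4"]]
pairs = [pair for pl in playlists for pair in skipgram_pairs(pl, window=1)]
print(pairs)
```

Tracks that often appear near each other in playlists produce many shared pairs, which pulls their vectors together during training.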

Multi-Channel Recall

Input seed track: track_id / track_name + artist_name
  │
  ├─ Channel 1: Acoustic Similarity Recall (genre filtering + EmbeatMLP cosine similarity)
  ├─ Channel 2: Same-Genre Popular Recall (genre filtering + popularity ranking)
  ├─ Channel 3: Same Artist Recall (same artist + EmbeatMLP cosine similarity)
  ├─ Channel 4: Similar Artists Recall (similar artists + EmbeatMLP cosine similarity)
  ├─ Channel 5: Playlist Collaborative Filtering (Track2Vec cosine similarity)
  │
  ├─ ISRC Deduplication / Re-ranking / Same-Artist Ratio Control
  │
  └─ Output: Top-K Recommendation List
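The post-processing stage of the diagram (ISRC deduplication, re-ranking, same-artist ratio control) could look roughly like the sketch below. Everything specific here is an assumption made for illustration: the candidate field names, the re-ranking rule (a small bonus for tracks recalled by several channels), and the ratio cap. The logic in `infer/Embeat.py` may differ.

```python
from collections import defaultdict

def merge_channels(channel_results, top_k=5, max_same_artist_ratio=0.4, seed_artist=None):
    """Merge per-channel candidate lists into one ranked, deduplicated list.

    channel_results maps a channel name to a list of candidate dicts with
    keys "isrc", "artist", and "score" (hypothetical field names).
    """
    best = {}
    hits = defaultdict(int)
    for candidates in channel_results.values():
        for cand in candidates:
            hits[cand["isrc"]] += 1
            # ISRC deduplication: keep the highest-scoring copy of each recording.
            if cand["isrc"] not in best or cand["score"] > best[cand["isrc"]]["score"]:
                best[cand["isrc"]] = cand
    # Re-ranking (assumed rule): reward tracks recalled by several channels.
    ranked = sorted(best.values(),
                    key=lambda c: c["score"] + 0.1 * (hits[c["isrc"]] - 1),
                    reverse=True)
    # Same-artist ratio control: cap how many results share the seed's artist.
    max_same = int(top_k * max_same_artist_ratio)
    output, same_count = [], 0
    for cand in ranked:
        if cand["artist"] == seed_artist:
            if same_count >= max_same:
                continue
            same_count += 1
        output.append(cand)
        if len(output) == top_k:
            break
    return output

channels = {
    "acoustic": [{"isrc": "A", "artist": "X", "score": 0.9},
                 {"isrc": "B", "artist": "Y", "score": 0.8}],
    "same_artist": [{"isrc": "A", "artist": "X", "score": 0.7},
                    {"isrc": "C", "artist": "X", "score": 0.85},
                    {"isrc": "D", "artist": "X", "score": 0.6}],
    "playlist_cf": [{"isrc": "B", "artist": "Y", "score": 0.75},
                    {"isrc": "E", "artist": "Z", "score": 0.5}],
}
merged = merge_channels(channels, top_k=4, max_same_artist_ratio=0.5, seed_artist="X")
print([c["isrc"] for c in merged])
```

In this toy run, track `D` is skipped because two tracks by the seed artist are already in the list, so the lower-scoring `E` takes the last slot.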

Project Structure

Embeat/
├── assets/                 # Static assets folder
├── checkpoints/            # Model weights folder
│   ├── EmbeatMLP/          # EmbeatMLP model weights
│   └── Track2Vec/          # Track2Vec model weights (requires separate download)
├── data/                   # Data processing folder (not fully organized)
├── infer/                  # Inference code folder
│   ├── Embeat.py           # Embeat recommendation system core
│   ├── EmbeatUtils.py      # Embeat extension utilities
│   ├── infer.py            # EmbeatMLP inference entry point
│   ├── eval_infer.py       # EmbeatMLP evaluation utilities
│   └── hf_to_qdrant.py     # HF Dataset to Qdrant database
├── train/                  # Training code folder
│   ├── model.py            # EmbeatMLP model definition
│   ├── dataset.py          # HF Dataset processing
│   ├── sampler.py          # Positive/negative sample sampler
│   ├── loss.py             # Masked InfoNCE Loss
│   ├── trainer.py          # EmbeatMLP trainer
│   ├── train.py            # EmbeatMLP training entry point
│   └── train_track2vec.py  # Track2Vec training entry point
├── requirements.txt
└── LICENSE

Getting Started

Requirements (recommended)

  • Python >= 3.10
  • PyTorch >= 2.6, < 2.7 (required for training)
  • CUDA >= 12.0 (required for training)
  • Qdrant (required for inference)

Installation

conda create -n embeat python=3.10
conda activate embeat

# Install PyTorch (CUDA 12.x), see https://pytorch.org/get-started/previous-versions/
pip install "torch>=2.6,<2.7" --index-url https://download.pytorch.org/whl/cu126

pip install -r requirements.txt

Train EmbeatMLP

# Prepare training data in HuggingFace Dataset format under data/datasets/ and name it spotify_45m_tracks_metadata
python -m train.train \
    --dataset data/datasets/spotify_45m_tracks_metadata@10000000 \
    --batch-size 4096 \
    --max-steps 200 \
    --lr 1e-4 \
    --tau 0.05 \
    --ckpt-dir checkpoints

Train Track2Vec

# Prepare playlist training data (txt format, one playlist per line, space-separated track_ids)
cd train
python train_track2vec.py
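The playlist file format described above (one playlist per line, space-separated track IDs) can be read with a few lines of Python. This is a sketch only: `read_playlists` and its `min_tracks` filter are hypothetical and are not taken from `train_track2vec.py`.

```python
import io

def read_playlists(lines, min_tracks=2):
    """Parse one playlist per line, with track IDs separated by whitespace."""
    playlists = []
    for line in lines:
        track_ids = line.split()
        if len(track_ids) >= min_tracks:  # a lone track carries no co-occurrence signal
            playlists.append(track_ids)
    return playlists

# In practice `lines` would be an open text file; StringIO keeps the sketch self-contained.
sample = io.StringIO("trackA trackB trackC\n\ntrackD\ntrackB trackE\n")
playlists = read_playlists(sample)
print(playlists)
```

Blank lines and single-track playlists are dropped because they contribute no track pairs to learn from.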

Inference: Compute Acoustic Similarity Between Two Tracks

from infer.infer import infer

song_a = {"key": 7, "mode": 1, "tempo": 137, "time_signature": 4,
          "danceability": 0.54, "energy": 0.56, "speechiness": 0.02,
          "instrumentalness": 0.0, "valence": 0.41, "acousticness": 0.23,
          "liveness": 0.1}

song_b = {"key": 5, "mode": 0, "tempo": 87, "time_signature": 4,
          "danceability": 0.67, "energy": 0.65, "speechiness": 0.05,
          "instrumentalness": 0.03, "valence": 0.57, "acousticness": 0.27,
          "liveness": 0.19}

similarity = infer(sample_a=song_a, sample_b=song_b,
                   checkpoint_path="checkpoints/EmbeatMLP/model.pt")

print(f"Similarity: {similarity}")
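Before calling `infer`, it can help to check that a feature dict is complete. The helper below is a hypothetical convenience and not part of the repository; it assumes the eleven keys shown in the example are all required and that the seven continuous features lie in [0, 1], as they do in Spotify's audio feature data.

```python
DISCRETE_KEYS = ("key", "mode", "tempo", "time_signature")
CONTINUOUS_KEYS = ("danceability", "energy", "speechiness", "instrumentalness",
                   "valence", "acousticness", "liveness")

def check_features(sample):
    """Return a list of problems found in a feature dict (empty if none)."""
    problems = []
    for name in DISCRETE_KEYS + CONTINUOUS_KEYS:
        if name not in sample:
            problems.append(f"missing feature: {name}")
    for name in CONTINUOUS_KEYS:
        value = sample.get(name)
        if value is not None and not 0.0 <= value <= 1.0:
            problems.append(f"{name}={value} is outside [0, 1]")
    return problems

good = {"key": 7, "mode": 1, "tempo": 137, "time_signature": 4,
        "danceability": 0.54, "energy": 0.56, "speechiness": 0.02,
        "instrumentalness": 0.0, "valence": 0.41, "acousticness": 0.23,
        "liveness": 0.1}
bad = dict(good, energy=1.4)
del bad["liveness"]

print(check_features(good))
print(check_features(bad))
```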

Inference: Qdrant-Based Music Recommendation

# 1. Start the Qdrant service and import the database
# 2. Query recommendations via command line
cd infer
python Embeat.py -t 5pIcwtJYNJx93l420oR2Vm   # Query by Spotify Track ID
python Embeat.py -s "晴天 - Jay Chou"   # Query by track name and artist
python Embeat.py -a "Jay Chou"   # Query by artist name

Related Links

GDMusic Embeat

Acknowledgements

License

Scope                  License
Code, Model Weights    MIT
Datasets, Database     CC-BY-NC 4.0

About

Content-based music recommendation system, training on Spotify 45M tracks & 1.8M playlists.
