gladia-research-group/cocola

COCOLA

Introduction

This is the official repository for COCOLA: Coherence-Oriented Contrastive Learning of Musical Audio Representations. The code for the CompoNet baseline can be found at https://github.com/EmilianPostolache/componet.

Installation

Create virtual environment (Optional)

conda create --name cocola python=3.11
conda activate cocola

Install dependencies

pip install -r requirements.txt

Install datasets

To use MoisesDB for training, validation, or testing, download it from the official website and unzip it inside ~/moisesdb_contrastive. The other datasets (CocoChorales, Slakh2100, Musdb) are automatically downloaded and extracted by their respective PyTorch Datasets.
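Before launching training, it can help to confirm that the MoisesDB archive was unpacked in the expected location. The following is a minimal sketch that assumes only the ~/moisesdb_contrastive path stated above. The helper name `moisesdb_ready` is hypothetical and not part of this repository, and it only checks that the folder exists and is non-empty, not that its contents are valid.

```python
from pathlib import Path

# Location stated in the installation notes above.
DEFAULT_ROOT = Path.home() / "moisesdb_contrastive"


def moisesdb_ready(root: Path = DEFAULT_ROOT) -> bool:
    """Return True if `root` is an existing, non-empty directory.

    Hypothetical helper: it does not validate the dataset contents.
    """
    return root.is_dir() and any(root.iterdir())
```

Calling `moisesdb_ready()` with no arguments checks the default location.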

Usage

This project uses LightningCLI. For info about usage:

python main.py --help

For info about subcommands usage:

python main.py fit --help
python main.py validate --help
python main.py test --help
python main.py predict --help

You can pass a YAML config file as command line argument instead of specifying each parameter in the command:

python main.py fit --config path/to/config.yaml

See configs for examples of config files.

Example: Training a contrastive model on CocoChorales + MoisesDB + Slakh2100

python main.py fit --config configs/train_all_submixtures_efficientnet.yaml
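To launch runs from a script, for example to loop over several config files, the same command can be assembled programmatically. This is a sketch that assumes it is run from the repository root, where main.py lives. `build_command` is a hypothetical helper, and it uses only the subcommands and the `--config` flag shown above.

```python
import sys

# Subcommands listed in the usage section above.
SUBCOMMANDS = ("fit", "validate", "test", "predict")


def build_command(subcommand: str, config_path: str) -> list[str]:
    """Build the argv list for `python main.py <subcommand> --config <path>`."""
    if subcommand not in SUBCOMMANDS:
        raise ValueError(f"unknown subcommand: {subcommand!r}")
    # sys.executable keeps the call inside the active (e.g. conda) environment.
    return [sys.executable, "main.py", subcommand, "--config", config_path]


cmd = build_command("fit", "configs/train_all_submixtures_efficientnet.yaml")
# To actually start training: subprocess.run(cmd, check=True)
```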

Pretrained Models

Model Checkpoint: coco_submixtures_efficientnet_bilinear
Train Dataset: CocoChorales
Train Config: configs/train_coco_submixtures_efficientnet.yaml
Description: COCOLA model trained on the CocoChorales dataset, using EfficientNet as the embedding model and Bilinear Similarity as the similarity measure. Submixtures of stems are used during training, with 5-second audio examples sampled at 16 kHz.

Model Checkpoint: all_submixtures_efficientnet_bilinear
Train Dataset: CocoChorales + Slakh2100 + MoisesDB
Train Config: configs/train_all_submixtures_efficientnet.yaml
Description: COCOLA model trained on the CocoChorales, Slakh2100 and MoisesDB datasets, using EfficientNet as the embedding model and Bilinear Similarity as the similarity measure. Submixtures of stems are used during training, with 5-second audio examples sampled at 16 kHz.

Example: calculating COCOLA Score using a pretrained model

from contrastive_model.contrastive_model import CoCola

model = CoCola.load_from_checkpoint("/path/to/checkpoint.ckpt")

model.eval()

similarities = model(x)

where x is a dictionary of the form:

x = {
    "anchor": torch.randn(batch_size, 1, 16000*5, dtype=torch.float32), # 5 seconds at 16 kHz
    "positive": torch.randn(batch_size, 1, 16000*5, dtype=torch.float32) # 5 seconds at 16 kHz
}

If batch_size is 1, model(x) returns the COCOLA Score between x["anchor"] and x["positive"].
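Real recordings rarely contain exactly 5 seconds of audio, so each waveform has to be cropped or padded to 16000*5 samples before it is packed into x. The following is a minimal sketch that assumes mono 1-D tensors already resampled to 16 kHz. `fit_length` and `make_pair` are hypothetical helpers. Cropping from the start and zero-padding at the end is an illustrative choice, not necessarily the preprocessing used to train the released checkpoints.

```python
import torch
import torch.nn.functional as F

SAMPLE_RATE = 16000            # Hz
NUM_SAMPLES = SAMPLE_RATE * 5  # 5 seconds -> 80000 samples


def fit_length(wave: torch.Tensor, num_samples: int = NUM_SAMPLES) -> torch.Tensor:
    """Crop or right-pad a 1-D waveform with zeros to exactly `num_samples`."""
    wave = wave[:num_samples]
    return F.pad(wave, (0, num_samples - wave.shape[0]))


def make_pair(anchor: torch.Tensor, positive: torch.Tensor) -> dict:
    """Pack two mono waveforms into the anchor/positive dict with batch_size 1."""
    return {
        "anchor": fit_length(anchor).float().reshape(1, 1, -1),
        "positive": fit_length(positive).float().reshape(1, 1, -1),
    }


# Stand-in waveforms: one shorter and one longer than 5 seconds.
x = make_pair(torch.randn(70000), torch.randn(90000))
```

Both entries of x then have shape (1, 1, 80000), matching the layout shown above for batch_size equal to 1.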

Troubleshooting

CocoChorales Dataset

Remove string_track001353 from the train split, as one of its stems contains fewer frames than the others.
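One way to apply this is to move the track out of the train split rather than delete it, so the change is reversible. This is a sketch that assumes each track is stored as a folder named after it inside the train split directory. The directory layout and the helper name `quarantine_track` are hypothetical, so adjust the paths to wherever the dataset was extracted on your machine.

```python
import shutil
from pathlib import Path

BAD_TRACK = "string_track001353"  # track named in the note above


def quarantine_track(train_dir: Path, backup_dir: Path, track: str = BAD_TRACK) -> bool:
    """Move `train_dir/track` into `backup_dir`; return False if it is not there."""
    source = train_dir / track
    if not source.is_dir():
        return False
    backup_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(source), str(backup_dir / track))
    return True
```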
