EnCodecMAE: Leveraging neural codecs for universal audio representation learning

This is EnCodecMAE, an audio feature extractor pretrained with masked language modelling to predict discrete targets generated by EnCodec, a neural audio codec. For more details about the architecture and pretraining procedure, read the paper.

Updates:

2024/5/23 Updated the paper on arXiv. New models with better performance across all downstream tasks are available for feature extraction. Code for the older version is here
2024/2/29 New code to go from encodecmae to the waveform domain, with pretrained generative audio models from this paper.
2024/2/14 Leveraging Pre-Trained Autoencoders for Interpretable Prototype Learning of Music Audio was accepted to ICASSP 2024 XAI Workshop.
2023/10/23 Prompting for audio generation.

Usage

Feature extraction using pretrained models

Try our example Colab notebook, or follow these steps:

1) Clone the EnCodecMAE library:

git clone https://github.com/habla-liaa/encodecmae.git

2) Install it:

cd encodecmae
pip install -e .

3) Extract embeddings in Python:

from encodecmae import load_model

model = load_model('mel256-ec-base_st', device='cuda:0')
features = model.extract_features_from_file('gsc/bed/00176480_nohash_0.wav')
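To extract embeddings for a whole folder rather than a single file, the call above can be wrapped in a small loop. This is a minimal sketch, not part of the library: `extract_folder` is a hypothetical helper, and it assumes only that the extractor is a callable taking a file path, as `model.extract_features_from_file` does in the snippet above. The type and shape of the returned features are left as-is.

```python
from pathlib import Path
from typing import Any, Callable, Dict


def extract_folder(
    folder: str,
    extract_fn: Callable[[str], Any],
    pattern: str = "*.wav",
) -> Dict[str, Any]:
    """Apply extract_fn to every file under folder that matches pattern.

    extract_folder is a hypothetical helper (not part of encodecmae).
    Returns a dict mapping each file path to whatever extract_fn returns.
    """
    features = {}
    # Sort the paths so the output order is reproducible across runs.
    for path in sorted(Path(folder).rglob(pattern)):
        features[str(path)] = extract_fn(str(path))
    return features


# Usage with the pretrained model (requires encodecmae and a GPU):
# from encodecmae import load_model
# model = load_model('mel256-ec-base_st', device='cuda:0')
# feats = extract_folder('gsc/bed', model.extract_features_from_file)
```

Passing the extractor as an argument keeps the helper independent of the model, so the same loop works with any of the pretrained checkpoints.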

Pretrain your models

1) Install docker and docker-compose on your system. You'll also need to install the NVIDIA Container Toolkit to access GPUs from a docker container.

2) Execute the start_docker.sh script

Before running the script, edit docker-compose.yml: in the volumes section, change the paths to match the ones on your system. You'll need a folder called datasets with the following subfolders:

audioset_24k/unbalanced_train
fma_large_24k
librilight_med_24k

All audio files must be converted to a 24 kHz sampling rate.
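One way to do the 24 kHz conversion is to call ffmpeg on every file while mirroring the folder layout. The sketch below is an assumption, not the authors' preprocessing: it requires ffmpeg on your PATH, the helper names `ffmpeg_resample_cmd` and `resample_tree` are hypothetical, and the `*.wav` pattern should be adapted to the formats in your datasets.

```python
import subprocess
from pathlib import Path
from typing import List

TARGET_SR = 24000  # sampling rate expected by the pretraining setup


def ffmpeg_resample_cmd(src: Path, dst: Path, sample_rate: int = TARGET_SR) -> List[str]:
    """Build an ffmpeg command that resamples src and writes the result to dst.

    -y overwrites existing outputs, -ar sets the output sampling rate.
    """
    return ["ffmpeg", "-y", "-i", str(src), "-ar", str(sample_rate), str(dst)]


def resample_tree(src_root: Path, dst_root: Path, pattern: str = "*.wav") -> int:
    """Resample every matching file under src_root into dst_root.

    The relative folder structure is preserved. Returns the number of
    files converted. Raises CalledProcessError if ffmpeg fails.
    """
    count = 0
    for src in sorted(src_root.rglob(pattern)):
        dst = dst_root / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(ffmpeg_resample_cmd(src, dst), check=True)
        count += 1
    return count


# Example (the source path is a placeholder for your own data):
# resample_tree(Path("raw/fma_large"), Path("datasets/fma_large_24k"))
```

Writing to a separate output root leaves the original files untouched, which makes it easy to rerun the conversion if a different target rate is ever needed.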

You may also need to modify device_ids if you have a different number of GPUs.

Then, run:

chmod +x start_docker.sh
./start_docker.sh

This will build the encodecmae image, start a container using docker compose, and attach to it.

3) Install the encodecmae package inside the container

cd workspace/encodecmae
pip install -e .

4) Run the training script

chmod +x scripts/run_pretraining.sh
scripts/run_pretraining.sh

The training script uses my own library for executing pipelines configured with gin: ginpipe. By modifying the config files (with .gin extension), you can control aspects of the training and the model configuration. I plan to explain my approach to ML pipelines, and how to use gin and ginpipe in a future blog article. Stay tuned!
