Skip to content

knoriy/CLARA

Repository files navigation

CLARA: Multilingual Contrastive Learning for Audio Representation Acquisition

Arxiv CI testing

Overview

CLARA is designed for multilingual audio representation through a contrastive learning approach. Our aim is to develop a shared representation for various languages and acoustic scenarios. We leverage a rich multilingual audio-text dataset, augmented for diversity. With CLARA, we focus on building a comprehensive model for speech, targeting emotion detection, sound categorisation, and cross-modal retrieval in both zero-shot and few-shot settings. The results demonstrate its potential for universal speech representation that is adaptable to new languages and tasks, minimising reliance on labelled data and enhancing cross-lingual adaptability.

Note: This project is in active development. Contributions are encouraged and welcomed.

Models

We will provide our models for all to use, ready to download from Huggingface. Additionally, we provide models fine-tuned on specific datasets, ensuring optimised performance for specialized tasks. Below, you'll find an organised listing of our base models and their fine-tuned counterparts, complete with download links for each.

Size Parameters Model Download
small # M x
medium 109 M x
large # M x

Finetuned model of varous datasets

FineTuned Base Model Model Download
AudioSet medium x
Crema-D medium x
MSWC medium x

If you've fine-tuned CLARA on your dataset and wish to feature it here, please contact us.

Installation

Clone the repository:

# clone CLARA   
git clone https://github.com/knoriy/CLARA.git
cd CLARA

Conda

Create a conda environment:

# Create conda env
conda env create -f environments/env.yaml

Docker

Build and run the container (Nvidia Docker required for GPU):

docker build --no-cache ./environments/ -t knoriy/clara
docker run -it --rm --gpus=all -v $(pwd):/workspace --name clara knoriy/clara

By default the container starts a juypter notebook, to start the container in interactive mode, use:

docker run -it --rm --gpus=all -v $(pwd):/workspace --name clara knoriy/clara bash

Pip

Note: This has not been fully tested. If you find any issue please open an issue, with code to replicate the problem.

This CLARA is setup as a package which means you can now easily import any file into any other file, like so:

pip install git+https://github.com/knoriy/CLARA.git

Train model

CLARA is built upon pytorch-lightning (PL). For guidance, please refer to the PL CLI documentation.

For a list of all parameters, you can use the following command:

python clara/train.py fit --help

To fit and train the model on your own data,

python clara/train.py fit \
    --trainer path/to/trainer_config.yml \
    --model path/to/model_config.yml \
    --data path/to/data_config.yml

We provide some default config files for training CLARA --data.root_data_path should be used to direct to tar sharded dataset, this follows the format of webdataset. We currently support locally stored data and those stored on aws S3.

python clara/train.py fit \
    --config ./config/config/base.yaml \
    --trainer ./config/config/trainer/base.yaml \
    --model ./config/config/model/pl_clara_100M.yaml \
    --data ./config/config/data/base.yaml \
    --data.root_data_path path/to/dataset/ \
    --data.num_workers 6 \
    --data.batch_size 6 \
    --data.dataset_list ./config/dataset_list.txt \
    --trainer.logger.name clara_100M_FT_RAV \

Eval

Supported Tasks and Datasets

This project facilitates various audio classification tasks, namely:

  • Emotion
  • Gender
  • Sounds
  • Speech

Currently, we extend support to the following datasets for each task:

Sounds Classification:

  • ESC50
  • AudioSet
  • US8K
  • FSD50K

Emotion Classification:

  • EMNS
  • EmoV-DB
  • CREMA-D
  • RAVDESS

Speech Classification:

  • MSWC

Utilise these datasets to perform nuanced audio classification across various domains, enhancing your model's understanding and predictive capabilities.

Zeroshot

python clara/eval/test_zeroshot.py \
--model_path path/to/checkpoint.ckpt \
--task emotion \
--dataset_name ravdess \
--root_cfg_path ./config/

Retrieval

python clara/eval/test_retrieval.py \
--model_path path/to/checkpoint.ckpt \
--task sounds \
--dataset_name audioset \
--root_cfg_path ./config/

Citation

@article{noriy_clara:_2023,
  title = {{CLARA}: {Multilingual} {Contrastive} {Learning} for {Audio} {Representation} {Acquisition}},
  shorttitle = {{CLARA}},
  author = {Noriy, Kari A. and Yang, Xiaosong and Budka, Marcin and Zhang, Jian Jun},
  note = {arXiv:2310.11830 [cs, eess]},
  url = {http://arxiv.org/abs/2310.11830},
  doi = {10.48550/arXiv.2310.11830},
  year = {2023}
}

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published