CLARA is designed for multilingual audio representation through a contrastive learning approach. Our aim is to develop a shared representation for various languages and acoustic scenarios. We leverage a rich multilingual audio-text dataset, augmented for diversity. With CLARA, we focus on building a comprehensive model for speech, targeting emotion detection, sound categorisation, and cross-modal retrieval in both zero-shot and few-shot settings. The results demonstrate its potential for universal speech representation that is adaptable to new languages and tasks, minimising reliance on labelled data and enhancing cross-lingual adaptability.
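To make the objective concrete, the sketch below shows a CLIP-style symmetric contrastive loss over paired audio and text embeddings. It is illustrative only and not CLARA's exact implementation; the function name, shapes, and temperature handling are assumptions.

```python
# Illustrative CLIP-style symmetric contrastive loss over paired
# audio/text embeddings (not CLARA's exact implementation).
import torch
import torch.nn.functional as F

def contrastive_loss(audio_emb, text_emb, temperature=0.07):
    # Normalise so the dot product becomes cosine similarity.
    audio_emb = F.normalize(audio_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # logits[i, j] = similarity between audio i and text j, scaled by temperature.
    logits = audio_emb @ text_emb.t() / temperature

    # Matching audio/text pairs sit on the diagonal.
    targets = torch.arange(audio_emb.size(0), device=audio_emb.device)

    # Symmetric cross-entropy: audio-to-text plus text-to-audio.
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2
```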
Note: This project is in active development. Contributions are encouraged and welcomed.
We will provide our models for all to use, ready to download from Hugging Face. We also provide models fine-tuned on specific datasets for optimised performance on specialised tasks. Below is an organised listing of our base models and their fine-tuned counterparts, with download links for each.
| Size | Parameters | Model Download |
|---|---|---|
| small | # M | x |
| medium | 109 M | x |
| large | # M | x |
| Fine-tuned | Base Model | Model Download |
|---|---|---|
| AudioSet | medium | x |
| Crema-D | medium | x |
| MSWC | medium | x |
If you've fine-tuned CLARA on your dataset and wish to feature it here, please contact us.
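Once checkpoints are published on Hugging Face, they can also be fetched programmatically. The snippet below is a sketch only: the repository ID and filename are placeholders, not the actual published artefact names.

```python
# Sketch: fetch a released checkpoint from the Hugging Face Hub.
# repo_id and filename are placeholders, not the real artefact names.
from huggingface_hub import hf_hub_download

checkpoint_path = hf_hub_download(
    repo_id="knoriy/clara-medium",   # placeholder repository ID
    filename="clara_medium.ckpt",    # placeholder checkpoint filename
)
print(checkpoint_path)
```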
Clone the repository:
```bash
# clone CLARA
git clone https://github.com/knoriy/CLARA.git
cd CLARA
```
Create a conda environment:
```bash
# Create conda env
conda env create -f environments/env.yaml
```
Build and run the container (Nvidia Docker required for GPU):
```bash
docker build --no-cache ./environments/ -t knoriy/clara
docker run -it --rm --gpus=all -v $(pwd):/workspace --name clara knoriy/clara
```
By default, the container starts a Jupyter notebook server. To start the container in interactive mode instead, use:

```bash
docker run -it --rm --gpus=all -v $(pwd):/workspace --name clara knoriy/clara bash
```
Note: This has not been fully tested. If you find any issues, please open an issue with code to reproduce the problem.
CLARA is set up as a Python package, which means you can install it and import its modules from any other project:
```bash
pip install git+https://github.com/knoriy/CLARA.git
```
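As a minimal check that the install worked (the top-level module name `clara` is assumed from the repository layout), the package can then be imported from any Python project:

```python
# Sketch: confirm the installed package is importable.
# The module name `clara` is assumed from the repository layout.
import clara

print(clara.__file__)  # location of the installed package
```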
CLARA is built on PyTorch Lightning (PL). For guidance, please refer to the PL CLI documentation.
For a list of all parameters, you can use the following command:
```bash
python clara/train.py fit --help
```
To train the model on your own data:
```bash
python clara/train.py fit \
    --trainer path/to/trainer_config.yml \
    --model path/to/model_config.yml \
    --data path/to/data_config.yml
```
We provide default config files for training CLARA. `--data.root_data_path` should point to a tar-sharded dataset following the WebDataset format (a sketch of the expected shard layout follows the example command below). We currently support data stored locally or on AWS S3.
```bash
python clara/train.py fit \
    --config ./config/config/base.yaml \
    --trainer ./config/config/trainer/base.yaml \
    --model ./config/config/model/pl_clara_100M.yaml \
    --data ./config/config/data/base.yaml \
    --data.root_data_path path/to/dataset/ \
    --data.num_workers 6 \
    --data.batch_size 6 \
    --data.dataset_list ./config/dataset_list.txt \
    --trainer.logger.name clara_100M_FT_RAV
```
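As an illustration of the WebDataset layout mentioned above, the sketch below inspects one tar shard with the `webdataset` library. The shard filename and the per-sample keys (`flac`, `json`) are assumptions for illustration, not the repository's documented schema.

```python
# Sketch: inspect one WebDataset tar shard.
# The shard name and the 'flac'/'json' sample keys are illustrative assumptions.
import webdataset as wds

dataset = wds.WebDataset("path/to/dataset/shard-000000.tar")

for sample in dataset:
    # Each sample is a dict keyed by file extension (plus '__key__' for the basename).
    print(sample["__key__"], list(sample.keys()))
    break
```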
This project supports several audio classification tasks:

- Emotion
- Gender
- Sounds
- Speech

We currently support the following datasets across these tasks:
- ESC50
- AudioSet
- US8K
- FSD50K
- EMNS
- EmoV-DB
- CREMA-D
- RAVDESS
- MSWC
Utilise these datasets to evaluate and fine-tune the model across a range of audio classification domains.
To run zero-shot classification:

```bash
python clara/eval/test_zeroshot.py \
    --model_path path/to/checkpoint.ckpt \
    --task emotion \
    --dataset_name ravdess \
    --root_cfg_path ./config/
```
To run cross-modal retrieval evaluation:

```bash
python clara/eval/test_retrieval.py \
    --model_path path/to/checkpoint.ckpt \
    --task sounds \
    --dataset_name audioset \
    --root_cfg_path ./config/
```
If you use this work, please cite:

```bibtex
@article{noriy_clara_2023,
  title      = {{CLARA}: Multilingual Contrastive Learning for Audio Representation Acquisition},
  shorttitle = {{CLARA}},
  author     = {Noriy, Kari A. and Yang, Xiaosong and Budka, Marcin and Zhang, Jian Jun},
  note       = {arXiv:2310.11830 [cs, eess]},
  url        = {http://arxiv.org/abs/2310.11830},
  doi        = {10.48550/arXiv.2310.11830},
  year       = {2023}
}
```