Siamese Networks to Evaluate Vocal Imitations

A program that trains a Siamese neural network to estimate the likelihood that a vocal imitation is an imitation of a given reference recording. It offers options to train with pairwise and triplet loss functions, as well as a variety of utilities that help the user run replicable and reviewable trials.
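As a rough illustration of the two loss options, the sketch below computes a pairwise loss (here, binary cross-entropy on a predicted match probability) and a triplet loss (here, a margin-based hinge on embedding distances) in plain Python. This README does not describe the exact formulations used by train.py, so the function names, the margin value, and the choice of Euclidean distance are assumptions made for illustration only.

```python
import math


def pairwise_loss(p_match, is_match, eps=1e-7):
    """Binary cross-entropy for one (imitation, reference) pair.

    p_match is the network's predicted probability that the imitation
    matches the reference; is_match is the ground-truth label.
    """
    p = min(max(p_match, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -math.log(p) if is_match else -math.log(1.0 - p)


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss that pushes the anchor closer to the positive than to
    the negative by at least `margin` (an assumed hyperparameter)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Toy 2-D embeddings (made-up values, not real model outputs).
imitation = [0.0, 0.0]
imitated_ref = [0.0, 0.5]   # distance 0.5 from the imitation
other_ref = [3.0, 4.0]      # distance 5.0 from the imitation

well_separated = triplet_loss(imitation, imitated_ref, other_ref)
swapped = triplet_loss(imitation, other_ref, imitated_ref)
print(well_separated, swapped)
```

The pairwise loss scores one imitation/reference pair at a time, while the triplet loss compares an imitation against a matching and a non-matching reference in a single term.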

Prerequisites

The only software you need to get started is a Python 3 installation and pip. To train on the GPU, you also need to install CUDA.

Installation

Clone the master branch of the repository and install the dependencies (preferably in a virtual environment):

pip install -r requirements.txt

Finally, fill in the values in example_config.yaml and rename it to config.yaml.

Training

To see all training options:

python train.py -h

An example call that trains both pairwise and triplet loss models on 10 categories, using a fresh data split and set of weights, for 300 epochs and 20 trials, using CUDA:

python train.py -c -t -p -e 300 -tr 20 -rs -rw
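For readers who want to see how the flags in the example call line up with the description, here is a minimal argparse sketch. It is not the parser from train.py: the flag-to-meaning mapping is inferred from the sentence above, and the `dest` names, help strings, and defaults are hypothetical. Run `python train.py -h` for the authoritative list.

```python
import argparse


def build_parser():
    """Illustrative stand-in for the training CLI (assumed semantics)."""
    parser = argparse.ArgumentParser(description="Sketch of the example training call")
    parser.add_argument("-c", dest="cuda", action="store_true",
                        help="train on the GPU with CUDA (inferred)")
    parser.add_argument("-t", dest="triplet", action="store_true",
                        help="train a triplet loss model (inferred)")
    parser.add_argument("-p", dest="pairwise", action="store_true",
                        help="train a pairwise loss model (inferred)")
    parser.add_argument("-e", dest="epochs", type=int, default=1,
                        help="number of epochs (default is hypothetical)")
    parser.add_argument("-tr", dest="trials", type=int, default=1,
                        help="number of trials (default is hypothetical)")
    parser.add_argument("-rs", dest="random_split", action="store_true",
                        help="use a fresh data split (inferred)")
    parser.add_argument("-rw", dest="random_weights", action="store_true",
                        help="use a fresh set of weights (inferred)")
    return parser


# Parse the same argument string as the example call above.
args = build_parser().parse_args("-c -t -p -e 300 -tr 20 -rs -rw".split())
print(vars(args))
```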

Datasets

This program can interact with two datasets released by the Interactive Audio Lab.

Vocal Imitation contains ~10 reference sounds for each of 302 sound concepts from Google's AudioSet ontology. For each concept, one of these references, deemed the "canonical" reference, has (on average) 18 vocal imitations of that sound.
- Zenodo download link
- Because this dataset includes negative fine-grain pairs (e.g. an imitation of a dog barking paired with a different recording of a dog barking than the one that was imitated), it enables us to train using triplet loss.
Vocal Sketch contains 240 references with (on average) 28 imitations of each reference sound. Two versions of this dataset are available and supported; version 1.0 has the same number of references but fewer imitations per reference, and is a subset of version 1.1.
- Zenodo download link, v1.1
- Zenodo download link, v1.0
- This dataset cannot be used for training with triplet loss.
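
To make the idea of fine-grained negatives concrete, the sketch below builds (imitation, positive, negative) triplets from a toy mapping of sound concepts to recordings. The record layout, the file names, and the `build_triplets` helper are hypothetical and do not reflect how the repository stores or loads either dataset.

```python
def build_triplets(concepts):
    """Build (imitation, positive, negative) triplets.

    `concepts` maps a concept name to a dict with:
      "canonical":  the reference recording that was imitated
      "others":     other reference recordings of the same concept
      "imitations": vocal imitations of the canonical reference
    """
    triplets = []
    for entry in concepts.values():
        for imitation in entry["imitations"]:
            for other in entry["others"]:
                # Positive: the recording that was actually imitated.
                # Negative: a different recording of the same concept.
                triplets.append((imitation, entry["canonical"], other))
    return triplets


# Toy data with made-up file names.
toy = {
    "dog_bark": {
        "canonical": "dog_bark_ref_0.wav",
        "others": ["dog_bark_ref_1.wav", "dog_bark_ref_2.wav"],
        "imitations": ["dog_bark_imit_a.wav", "dog_bark_imit_b.wav"],
    }
}

triplets = build_triplets(toy)
for triplet in triplets:
    print(triplet)
```

With an empty "others" list the helper returns no triplets, which is one way to picture why a dataset without alternative recordings of the same sound offers no fine-grained negatives.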

Citation

TODO: insert paper citation

Contact

Contact Brian Margolis (BrianMargolis2019 [at] u.northwestern.edu) with any questions regarding this work.

interactiveaudiolab/Siamese-Vocal-Imitations
