GitHub - mmgyorke/cae-invar-wav: Learning Complex Basis Functions for Invariant Signal Representations with the Complex Autoencoder

Complex Autoencoder for Learning Invariant Signal Representations

PyTorch implementation of the Complex Autoencoder proposed in

Learning Complex Basis Functions for Invariant Representations of Audio [pdf]
Stefan Lattner¹, Andreas Arzt², Monika Dörfler³, 2019
20th International Society for Music Information Retrieval Conference, ISMIR 2019, Delft, The Netherlands
(best paper award winner)

¹Sony CSL Paris
²Institute of Computational Perception, JKU Linz
³Faculty of Mathematics, University of Vienna

Outline

Prerequisites
Quick Start
Training for Audio (CQT Representation)
Convert Audio Files to Invariant Features
Extract Repeated Themes and Sections
1D Signal Experiment
Rotated MNIST Experiment

Prerequisites

PyTorch
Librosa
ConfigObj
Pillow
Numpy
Scipy
Matplotlib
Scikit-Learn
Torchvision
Tqdm

To install (update) the requirements using pip, run:

  pip install -r requirements.txt

Audio Experiments (CQT)

Quick Start

Edit the text file filelist_audio.txt, or keep the default entries (not recommended, as they contain too little data). The file should list one audio file path per line, in the form

filelist_audio.txt:

data/audio1.wav
data/audio2.wav
data/audio3.wav

Save the file in the root folder of the project. To reproduce the paper results, we recommend ~3 hours of polyphonic piano music [download].
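If the training audio sits in one folder, the file list can also be generated rather than typed by hand. The following is a minimal sketch, not part of the repository: the helper name write_filelist and its parameters are hypothetical, and it assumes the one-path-per-line format shown above.

```python
from pathlib import Path


def write_filelist(audio_dir, out_path="filelist_audio.txt", pattern="*.wav"):
    """Write one audio path per line (sorted) and return the written paths.

    Hypothetical helper: assumes train.py expects one path per line,
    as in the example entries above.
    """
    # Search the folder recursively so nested album folders are included.
    paths = sorted(Path(audio_dir).rglob(pattern))
    lines = [p.as_posix() for p in paths]
    # End the file with a newline only when there is at least one entry.
    Path(out_path).write_text("\n".join(lines) + ("\n" if lines else ""))
    return lines
```

For example, write_filelist("data") would list every .wav file below the data folder and save the result as filelist_audio.txt in the current directory.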

Run all steps for repeated section discovery (training, conversion, motive extraction, evaluation) using the following command (choose a run_keyword):

  ./run_batch_audio.sh run_keyword

Note that the files in filelist_audio.txt are only used for training, while motive extraction and evaluation are performed on the JKUPDD dataset.

Training for Audio (CQT Representation)

This experiment yields 2D Complex Basis Functions which resemble 2D Fourier components.

1. Create a text file (or use the existing filelist_audio.txt) that lists the audio files to use for training, as shown in the section above.
2. Start training using the following command (choose a run_keyword, add --help to list all parameters):

  python train.py run_keyword filelist_audio.txt config_cqt.ini

Note that file preprocessing is cached. When changing data-related parameters or modifying the content of filelist_audio.txt, pass the flag --refresh-cache so that the cache is rebuilt. An experiment folder output/run_keyword will be created, where all files regarding the experiment (including plots and parameters of the trained network) will be placed.

Convert Audio Files to Invariant Features

1. Create a text file (or use the existing filelist_audio.txt) that lists the audio files to convert (see Section Quick Start).

2a. To convert the files, run (using the same run_keyword as in training):

  python convert.py run_keyword filelist_audio.txt config_cqt.ini

The converted files will be saved (bz2 compressed pickle) in the experiment folder output/run_keyword. A method to load the compressed files as numpy arrays can be found in complex_auto/utils.py -> load_pyc_bz(filename).
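The repository's own loader, load_pyc_bz, is the supported way to read these files. For readers who want to see what a bz2-compressed pickle involves, here is a minimal standard-library sketch. The function names load_bz2_pickle and save_bz2_pickle are hypothetical, and the sketch assumes the files are plain pickles wrapped in bz2 compression, as described above. Only unpickle files you trust, since unpickling can execute arbitrary code.

```python
import bz2
import pickle


def load_bz2_pickle(filename):
    """Read one Python object from a bz2-compressed pickle file."""
    with bz2.open(filename, "rb") as handle:
        return pickle.load(handle)


def save_bz2_pickle(obj, filename):
    """Write one Python object as a bz2-compressed pickle file."""
    with bz2.open(filename, "wb") as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)
```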

2b. In order to also create a self-similarity matrix for each audio file, run

  python convert.py run_keyword filelist_audio.txt config_cqt.ini --self-sim-matrix

The self-similarity matrices are saved in the output/run_keyword folder together with a file ss_matrices_filelist.txt, which lists the paths to the stored matrices (used by the motive extractor in the next step).
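As background, a self-similarity matrix compares every feature frame of a file with every other frame of the same file, so repeated sections show up as off-diagonal stripes. The README does not state which similarity measure convert.py uses. The sketch below uses cosine similarity purely as an assumption, and the function name is hypothetical.

```python
import math


def cosine_self_similarity(frames):
    """Return S where S[i][j] is the cosine similarity of frames i and j.

    `frames` is a list of equal-length feature vectors (one per time step).
    Cosine similarity is an illustrative choice, not necessarily what
    convert.py computes.
    """
    norms = [math.sqrt(sum(x * x for x in frame)) for frame in frames]
    n = len(frames)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            dot = sum(a * b for a, b in zip(frames[i], frames[j]))
            denom = norms[i] * norms[j]
            # All-zero frames have no direction; treat them as dissimilar.
            value = dot / denom if denom > 0 else 0.0
            sim[i][j] = sim[j][i] = value  # the matrix is symmetric
    return sim
```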

Extract Repeated Themes and Sections

The motive extractor reads the file ss_matrices_filelist.txt generated in step 2b above. Ensure that you have performed that step and that you use the same run_keyword for all steps.

Run the motive extractor using

  python extract_motives.py run_keyword -r 2 -th 0.01 -csv jku_csv_files.txt

Depending on the CAE training and the dataset used, different -r and -th values may lead to better results. The discovered patterns are written to .segraw files in the folder output/run_keyword. Evaluation using extract_motives_eval.py together with extract_motives.py is currently only implemented for the JKUPDD dataset, because ground-truth annotations and bpm information have to be available. To test the full pipeline, see Section Quick Start.
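The three documented steps (training, conversion with self-similarity matrices, motive extraction) can also be chained from Python. run_batch_audio.sh remains the reference way to run the full pipeline, including evaluation. The sketch below only assembles the commands exactly as they appear in this README, with hypothetical function names, and leaves out the evaluation step because its command line is not documented here.

```python
import subprocess
import sys


def pipeline_commands(run_keyword, filelist="filelist_audio.txt",
                      config="config_cqt.ini", r="2", th="0.01",
                      csv="jku_csv_files.txt"):
    """Build the train, convert and extract commands as argument lists."""
    py = sys.executable  # use the interpreter that runs this script
    return [
        [py, "train.py", run_keyword, filelist, config],
        [py, "convert.py", run_keyword, filelist, config, "--self-sim-matrix"],
        [py, "extract_motives.py", run_keyword,
         "-r", r, "-th", th, "-csv", csv],
    ]


def run_pipeline(run_keyword, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in pipeline_commands(run_keyword):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline as soon as one step fails.
            subprocess.run(cmd, check=True)
```

Calling run_pipeline("my_run") prints the commands without running them; pass dry_run=False from the project root to execute them in order.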

1D Signal Experiment

Training for Audio (Time Domain)

This experiment yields 1D Complex Basis Functions which resemble complex Gabor-like filters.

1. Create a text file (or use the existing filelist_signal.txt) that lists the audio files to use for training, in the same format as in Section Quick Start. Here, a single audio file is sufficient (provided all frequencies are present).
2. Start training using the following command (choose a run_keyword, add --help to list all parameters):

  python train.py run_keyword filelist_signal.txt config_signal1D.ini

All results can be found in output/run_keyword.
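To give some intuition for why Gabor-like complex filters are useful for invariant features, the sketch below builds an analytic complex Gabor filter (a Gaussian window times a complex sinusoid) and applies it to a sinusoid at two different phases. The magnitude of the response stays practically the same, while the phase of the response follows the shift. This is a textbook filter with illustrative parameters, not the basis functions learned by the network, and all function names are hypothetical.

```python
import cmath
import math


def gabor(length, freq, sigma):
    """Complex Gabor filter: Gaussian envelope times a complex sinusoid.

    `freq` is in cycles per sample, `sigma` is the envelope width in samples.
    """
    center = (length - 1) / 2
    return [
        math.exp(-((t - center) ** 2) / (2 * sigma ** 2))
        * cmath.exp(2j * math.pi * freq * (t - center))
        for t in range(length)
    ]


def response(signal, filt):
    """Complex inner product of a signal window with the filter."""
    return sum(s * f.conjugate() for s, f in zip(signal, filt))


def sinusoid(length, freq, phase):
    """Real cosine with the given frequency (cycles/sample) and phase."""
    return [math.cos(2 * math.pi * freq * t + phase) for t in range(length)]


filt = gabor(length=256, freq=0.1, sigma=20.0)
r_a = response(sinusoid(256, 0.1, phase=0.0), filt)
r_b = response(sinusoid(256, 0.1, phase=1.0), filt)

# Magnitudes agree closely; the phase difference recovers the 1.0 rad shift.
print(abs(r_a), abs(r_b), cmath.phase(r_b / r_a))
```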

Rotated MNIST Experiment

Training for Rotated MNIST

This experiment yields basis functions representing the complex eigenvectors of rotation.

1. Download the rotated MNIST dataset using this link and place it in the data folder: ./data/mnist_rot.pyc.bz.
2. Start training using the following command (choose a run_keyword, add --help to list all parameters):

  python train.py run_keyword no_filelist config_mnist.ini

All results can be found in output/run_keyword.
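As a toy illustration of what "complex eigenvectors of rotation" means, consider rotation of points in the plane rather than of whole images. The 2×2 rotation matrix has the complex eigenvector (1, -i) with eigenvalue e^{iθ}: rotating the vector only multiplies it by a phase factor. The sketch below checks this numerically. It is a two-dimensional analogy with hypothetical names, not the pixel-space basis functions the network learns.

```python
import cmath
import math


def rotate(theta, vec):
    """Apply the 2x2 rotation matrix for angle theta to a 2-vector."""
    c, s = math.cos(theta), math.sin(theta)
    x, y = vec
    return (c * x - s * y, s * x + c * y)


theta = 0.7                 # arbitrary rotation angle in radians
v = (1, -1j)                # candidate complex eigenvector
rotated = rotate(theta, v)
eigenvalue = cmath.exp(1j * theta)

# Rotation acts on v as multiplication by a unit-magnitude phase factor.
residual = max(abs(rotated[k] - eigenvalue * v[k]) for k in range(2))
print(residual)
```

Because the eigenvalue has magnitude one, the magnitude of a projection onto such a vector does not change under rotation; only its phase does.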
