An implementation of the GlotNet vocoder as a variant of WaveNet. The WaveNet implementation is inherited from https://r9y9.github.io/wavenet_vocoder/.
The repo is still immature, so there may be defects, bugs, and so on. I sincerely appreciate any feedback, suggestions, or contributions. Thanks for your interest in the repo; enjoy!
- Focus on local conditioning of GlotNet, which is essential for a vocoder.
- Mixture of logistic distributions loss / sampling
- Various audio samples
- Fast inference by caching intermediate states in convolutions, similar to arXiv:1611.09482 (a minimal sketch follows this list).
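To sketch the caching idea (class and method names here are illustrative, not this repo's actual API): each dilated convolution keeps a small rolling buffer of its past inputs, so emitting one new sample costs a single kernel application instead of re-running the convolution over the whole history.

```python
import torch
import torch.nn as nn

class CachedCausalConv1d(nn.Module):
    """Rolling-buffer trick from arXiv:1611.09482 (illustrative sketch)."""
    def __init__(self, in_channels, out_channels, kernel_size=2, dilation=1):
        super().__init__()
        self.kernel_size = kernel_size
        self.dilation = dilation
        self.conv = nn.Conv1d(in_channels, out_channels,
                              kernel_size, dilation=dilation)
        self.buffer = None  # holds the last (kernel_size-1)*dilation+1 inputs

    def incremental_step(self, x):
        # x: (batch, in_channels, 1) -- one new time step
        size = (self.kernel_size - 1) * self.dilation + 1
        if self.buffer is None:
            self.buffer = x.new_zeros(x.size(0), x.size(1), size)
        # shift the buffer left by one sample and append the new input
        self.buffer = torch.cat([self.buffer[:, :, 1:], x], dim=2)
        return self.conv(self.buffer)  # (batch, out_channels, 1)
```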
Figure 1: A WaveNet vocoder (left) uses acoustic features (AC) and past signal samples to generate the next speech sample. In contrast, GlotNet (right) operates on the simpler glottal excitation signal, which is then filtered by a vocal tract (VT) filter already parametrized in the acoustic features.
- Python 3
- SciPy
- CUDA >= 8.0
- PyTorch >= v0.4.0
- TensorFlow >= v1.3
Input data to the network must be 'raw'; the other options such as 'mu-law-quantization' are for the pure WaveNet. I prefer not to use upsampling of the conditional features. If you are eager to do so, go over the scripts for the necessary modifications and arrange the interpolation layers' factors so that their product yields 254. For now the preprocessing handles LJSpeech only; if you want to feed in another dataset, the corresponding scripts must be provided.
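If you do enable upsampling, a sanity check like the following keeps the factors consistent (the variable name is made up; 254 is the frame shift quoted above):

```python
import numpy as np

# Hypothetical per-layer interpolation factors; their product must
# equal the 254-sample frame shift mentioned above.
upsample_scales = [2, 127]
assert int(np.prod(upsample_scales)) == 254
```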
There are many hyperparameters to be tuned depending on the data. For typical datasets, parameters known to work well (presets) are provided in the repository; see the presets directory for details. Note that preprocess.py, train.py, and synthesis.py accept an optional --preset=<json> parameter, which specifies where to load preset parameters from. If you are going to use preset parameters, then you must use the same --preset=<json> throughout preprocessing, training, and evaluation. e.g.,
python preprocess.py --preset=presets/cmu_arctic_8bit.json cmu_arctic ~/data/cmu_arctic
python train.py --preset=presets/cmu_arctic_8bit.json --data-root=./data/cmu_arctic
instead of
python preprocess.py cmu_arctic ~/data/cmu_arctic
# warning! this may use hyper parameters different from those used at the preprocessing stage
python train.py --preset=presets/cmu_arctic_8bit.json --data-root=./data/cmu_arctic
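For what it's worth, a preset is just a JSON dictionary of hyperparameter overrides. A sketch of how such a file is applied, assuming the tf.contrib.training.HParams-style hparams object inherited from the upstream wavenet_vocoder (the import path is illustrative):

```python
from hparams import hparams  # illustrative import; the repo's hparams object

# Override defaults with the preset; the same preset must be applied in
# preprocessing, training, and synthesis so all stages agree.
with open("presets/cmu_arctic_8bit.json") as f:
    hparams.parse_json(f.read())
```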
- CMU ARCTIC (en): http://festvox.org/cmu_arctic/
- LJSpeech (en): https://keithito.com/LJ-Speech-Dataset/
Usage:
python preprocess.py ${dataset_name} ${dataset_path} ${out_dir} --preset=<json>
Supported ${dataset_name}s for now are:
- ljspeech (single speaker)
- cmu_arctic (multi-speaker)
Assuming you use preset parameters known to work well for the CMU ARCTIC dataset and have the data in ~/data/cmu_arctic, you can preprocess the data by:
python preprocess.py cmu_arctic ~/data/cmu_arctic ./data/cmu_arctic --preset=presets/cmu_arctic_8bit.json
When this is done, you will see time-aligned extracted features (audio, glottal excitation, vocal tract filter, and mel-spectrogram) in ./data/cmu_arctic.
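You can sanity-check the result by loading one of the generated .npy files (the filename pattern below matches the one used in the synthesis example later in this README):

```python
import numpy as np

# Time-aligned conditional features written by preprocess.py
mel = np.load("./data/cmu_arctic/cmu_arctic-mel-00001.npy")
print(mel.shape)  # (number_of_frames, cin_channels)
```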
Note: for multi-GPU training, you had better ensure that batch_size % num_gpu == 0.
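A standalone check before launching a multi-GPU run (not part of train.py; batch_size here stands in for whatever your preset sets):

```python
import torch

batch_size = 8  # stand-in for the preset's batch_size
num_gpu = torch.cuda.device_count()
assert num_gpu > 0 and batch_size % num_gpu == 0, \
    "batch_size must be divisible by num_gpu"
```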
Usage:
python train.py --data-root=${data-root} --preset=<json> --hparams="parameters you want to override"
Important options:
--speaker-id=<n>: (Multi-speaker dataset only) Specifies which speaker's data is used for training. If this is not specified, all training data are used. For example, if you are trying to build a speaker-dependent WaveNet vocoder for speaker awb of CMU ARCTIC, then you have to specify --speaker-id=0. Speaker IDs are automatically assigned as follows:
In [1]: from nnmnkwii.datasets import cmu_arctic
In [2]: [(i, s) for (i,s) in enumerate(cmu_arctic.available_speakers)]
Out[2]:
[(0, 'awb'),
(1, 'bdl'),
(2, 'clb'),
(3, 'jmk'),
(4, 'ksp'),
(5, 'rms'),
(6, 'slt')]
python train.py --data-root=./data/cmu_arctic/ --speaker-id=0 \
--hparams="cin_channels=80,gin_channels=-1"
Logs are dumped in the ./log directory by default. You can monitor logs with TensorBoard:
tensorboard --logdir=log
Usage:
python synthesis.py ${checkpoint_path} ${output_dir} --preset=<json> --hparams="parameters you want to override"
Important options:
--length=<n>: (Unconditional WaveNet only) Number of time steps to generate.
--conditional=<path>: (Required for conditional WaveNet) Path to the local conditional features (.npy). If this is specified, the number of time steps to generate is determined by the size of the conditional features.
e.g.,
python synthesis.py --hparams="parameters you want to override" \
checkpoints_awb/checkpoint_step000100000.pth \
generated/test_awb \
--conditional=./data/cmu_arctic/cmu_arctic-mel-00001.npy
Combine the vocal tract filter and the glottal excitation generated by GlotNet to obtain the final speech waveform.
Usage:
python postprocess.py ${vocal-tract_path} ${glot_path} ${output_dir} --preset=<json> --hparams="parameters you want to override"
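As a rough picture of what this step does, here is a sketch of source-filter synthesis, assuming the vocal tract is stored as frame-wise all-pole (LPC) coefficients; see postprocess.py for the repo's actual parametrization and file formats:

```python
import numpy as np
from scipy.signal import lfilter

def source_filter_synthesis(excitation, lpc_frames, hop_size):
    """Filter the GlotNet excitation with the vocal tract filter 1/A(z),
    frame by frame, carrying the filter state across frame boundaries."""
    out = np.zeros_like(excitation)
    zi = np.zeros(lpc_frames.shape[1] - 1)  # filter state
    for i, a in enumerate(lpc_frames):      # a[0] == 1.0 by convention
        start = i * hop_size
        seg = excitation[start:start + hop_size]
        if seg.size == 0:
            break
        y, zi = lfilter([1.0], a, seg, zi=zi)
        out[start:start + seg.size] = y
    return out
```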
- Lauri Juvela, Vassilis Tsiaras, Bajibabu Bollepalli, et al, "Speaker-independent raw waveform model for glottal excitation", arXiv:1804.09593, Apr. 2018
- Aaron van den Oord, Sander Dieleman, Heiga Zen, et al, "WaveNet: A Generative Model for Raw Audio", arXiv:1609.03499, Sep 2016.
- Aaron van den Oord, Yazhe Li, Igor Babuschkin, et al, "Parallel WaveNet: Fast High-Fidelity Speech Synthesis", arXiv:1711.10433, Nov 2017.
- Akira Tamamori, Tomoki Hayashi, et al, "Speaker-Dependent WaveNet Vocoder", Proceedings of Interspeech, 2017.
- Jonathan Shen, Ruoming Pang, Ron J. Weiss, et al, "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions", arXiv:1712.05884, Dec 2017.
- Wei Ping, Kainan Peng, Andrew Gibiansky, et al, "Deep Voice 3: 2000-Speaker Neural Text-to-Speech", arXiv:1710.07654, Oct. 2017.
- Tom Le Paine, Pooya Khorrami, Shiyu Chang, et al, "Fast Wavenet Generation Algorithm", arXiv:1611.09482, Nov. 2016