FloWaveNet : A Generative Flow for Raw Audio

Unofficial TensorFlow implementation of the paper "FloWaveNet : A Generative Flow for Raw Audio".

Requirements

Python 3.5
TensorFlow 1.12
librosa

How to use

Download the LJ-Speech dataset and unpack it:

>>> tar -xvf LJSpeech-1.1.tar.bz2
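
Before preprocessing, it can help to confirm that the archive unpacked as expected. The sketch below assumes the standard LJSpeech-1.1 layout (a pipe-separated metadata.csv next to a wavs/ directory). The helper name check_ljspeech is hypothetical and is not part of this repository.

```python
import csv
import os


def check_ljspeech(root):
    """Count metadata rows whose audio file exists under root/wavs.

    Hypothetical helper for illustration; not part of this repository.
    """
    metadata_path = os.path.join(root, "metadata.csv")
    found = 0
    with open(metadata_path, encoding="utf-8", newline="") as f:
        # Each row is: id|transcript|normalized transcript.
        # QUOTE_NONE keeps quotation marks inside transcripts from
        # being treated as CSV quoting.
        reader = csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE)
        for row in reader:
            if not row:
                continue
            wav_path = os.path.join(root, "wavs", row[0] + ".wav")
            if os.path.isfile(wav_path):
                found += 1
    return found


# Example call (requires the unpacked dataset in the working directory):
# print(check_ljspeech("LJSpeech-1.1"))
```

If the returned count is zero or much lower than the number of rows in metadata.csv, the archive was probably unpacked to a different directory than the one passed to --in_dir.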

Preprocess the dataset using the following command:

>>> python3 preprocessing.py --in_dir=LJSpeech-1.1 --out_dir=training_data

Run training:

>>> python3 train.py

Features

Multi-GPU training
Global conditioning features
Mixed precision training

With mixed precision training (enabled by default), training takes 7.5 days on a single GPU with 11 GB of memory. To train in float32 instead, set dtype=tf.float32 and scale=1.0 in hparams.py.
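
The scale setting is presumably a loss-scaling factor, which is the usual companion to float16 training. That reading is an assumption, not something confirmed by hparams.py. The NumPy sketch below illustrates why such a factor matters: a small gradient underflows to zero in float16, but survives if it is scaled up first and unscaled in float32. The values LOSS_SCALE and true_grad are made up for illustration.

```python
import numpy as np

# Hypothetical values chosen for illustration; not taken from hparams.py.
LOSS_SCALE = 1024.0
true_grad = 1e-8  # smaller than the smallest positive float16 value

# Without scaling, the gradient underflows to zero in float16.
unscaled = np.float16(true_grad)

# Scaling the loss multiplies every gradient by the same factor
# (chain rule), which moves it into float16's representable range.
scaled = np.float16(true_grad * LOSS_SCALE)

# Unscale in float32 before applying the weight update.
recovered = np.float32(scaled) / np.float32(LOSS_SCALE)

print(unscaled, scaled, recovered)
```

With scale=1.0 no scaling takes place, which is consistent with the float32 setting described above, where underflow of this kind is not a concern.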

Several examples of synthesis can be found here.

Todo list

Learning rate and batch size tuning for efficient multi-GPU training

Reference

Official PyTorch implementation: https://github.com/ksw0306/FloWaveNet


ryhorv/tf-flowavenet
