Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing

Unofficial PyTorch implementation of Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing. This repository is based on the iSTFTNet GitHub repository (Paper).

Disclaimer: This repo is built for testing purposes.

Training :

python train.py --config config.json

In train.py, change --input_wavs_dir to the path of your LJSpeech-1.1/wavs directory.
In config.json, set latent_dim to select AV128, AV192, or AV256 (default).
Following Section 3.3 of the paper, set dec_istft_input to cartesian (default), polar, or both.
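
The two config.json settings above can also be changed from a script rather than by hand. The following is a minimal sketch, not part of this repository: the helper name `update_config` is hypothetical, it assumes `latent_dim` and `dec_istft_input` are top-level keys in config.json, and it assumes AV128, AV192, and AV256 correspond to `latent_dim` values of 128, 192, and 256.

```python
import json
from pathlib import Path

# Assumption: AV128 / AV192 / AV256 map to these latent_dim values.
ALLOWED_LATENT_DIMS = (128, 192, 256)
# Options listed in this README for dec_istft_input.
ALLOWED_ISTFT_INPUTS = ("cartesian", "polar", "both")


def update_config(path, latent_dim=256, dec_istft_input="cartesian"):
    """Hypothetical helper: set two keys in a JSON config, keeping all other keys."""
    if latent_dim not in ALLOWED_LATENT_DIMS:
        raise ValueError(
            f"latent_dim must be one of {ALLOWED_LATENT_DIMS}, got {latent_dim!r}"
        )
    if dec_istft_input not in ALLOWED_ISTFT_INPUTS:
        raise ValueError(
            f"dec_istft_input must be one of {ALLOWED_ISTFT_INPUTS}, got {dec_istft_input!r}"
        )

    path = Path(path)
    config = json.loads(path.read_text())

    # Assumption: both keys live at the top level of the config file.
    config["latent_dim"] = latent_dim
    config["dec_istft_input"] = dec_istft_input

    path.write_text(json.dumps(config, indent=4) + "\n")
    return config
```

For example, `update_config("config.json", latent_dim=192, dec_istft_input="polar")` would select the AV192 variant with polar decoder input before running `python train.py --config config.json`. Rejecting unknown values up front avoids starting a long training run with a mistyped option.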

Note:

Validation loss of AV256 during training.
In our test, it converges almost 3x faster than HiFi-GAN V1 (using the official repo as the reference).

Citations :

@article{Webber2022AutovocoderFW,
  title={Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing},
  author={Jacob J. Webber and Cassia Valentini-Botinhao and Evelyn Williams and Gustav Eje Henter and Simon King},
  journal={ArXiv},
  year={2022},
  volume={abs/2211.06989}
}
