✨ SuperVoice Vocoder

Easy-to-use SOTA vocoder for speech synthesis from mel spectrograms.

Description

This repository contains an easy-to-use SOTA vocoder. It uses the original BigVSAN pre-trained weights, with weight normalization removed, and most of the original code, repackaged for plug-and-play use. The model is tailored to one specific set of mel spectrogram parameters, the set most common in voice synthesis tasks.

Model: BigVSAN
Sample rate: 24000 Hz
Mel spectrogram:
- Number of mel bins: 100
- Number of FFT: 1024
- Hop Length: 256
- Window Length: 1024
- Norm: slaney
- Scale: slaney
- Power: 1.0 (amplitude spectrogram)
- Center: true
- Padding: reflect

Evaluation

To evaluate the model, use the evaluation notebook (eval.ipynb), which runs anywhere torch and torchaudio are installed.

How to use

This model is available using Torch Hub:

import torch

# Load model
bigvsan = torch.hub.load(repo_or_dir='ex3ndr/supervoice-vocoder', model='bigvsan')

# Source mel spectrogram, shape (100 mel bins, frames); random values as a placeholder
spec = torch.randn(100, 1234)

# Synthesized audio
audio = bigvsan.generate(spec)
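
To relate spectrogram length to audio length, the hop length and sample rate listed in the description are enough. The helper names below are hypothetical, and the duration estimate assumes the vocoder emits one hop (256 samples) of audio per mel frame, which is typical for this kind of model but not confirmed here.

```python
SAMPLE_RATE = 24000  # Hz, from the model description
HOP_LENGTH = 256     # samples between consecutive frames


def frames_for_samples(num_samples: int) -> int:
    """Frame count of a centered STFT: one frame per full hop, plus one."""
    return 1 + num_samples // HOP_LENGTH


def approx_duration_seconds(num_frames: int) -> float:
    """Approximate audio duration, assuming one hop of samples per frame."""
    return num_frames * HOP_LENGTH / SAMPLE_RATE


# One second of audio, and the 1234-frame placeholder used above.
print(frames_for_samples(SAMPLE_RATE))
print(round(approx_duration_seconds(1234), 2))
```

Under that assumption, the 1234-frame placeholder corresponds to roughly 13 seconds of audio.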

License

MIT
