ncnn-hifi-GAN

VULKAN support ...

HiFi-GAN is a GAN-based, high-speed neural vocoder for efficient, high-fidelity speech synthesis in TTS pipelines and for realistic voice conversion.

HiFi-GAN addresses the poor voice quality of earlier GAN-based vocoders.

The HiFi-GAN paper reports that a small-footprint version of the model generates 22.05 kHz speech 13.4 times faster than real time on CPU, with quality comparable to an autoregressive counterpart.

Deep-learning-based TTS generates speech from text in two stages:

- generate a mel-spectrogram from text, typically with a model such as Tacotron or FastSpeech;
- generate speech from the mel-spectrogram, with a vocoder such as WaveNet or WaveRNN.

WaveNet produces speech whose quality is close to that of human speech, but its generation speed is too slow. More recent GAN-based vocoders, such as MelGAN, aim to speed up speech generation, but they sacrifice quality for efficiency. HiFi-GAN was designed to provide both efficiency and quality.


How to use.

Download the hifivoice model and place it in the /models folder.
hifivoice.exe -i melgram_flipped.jpg
The vocoder expects mel-spectrogram values in the range of approximately -11 to 2. The example input is a mel-spectrogram saved as a regular jpg file, whose pixel values lie in the range 0..255, so the values must be rescaled before use: Mel_Image = Mel_Image * (1/255) * 13 - 11, which gives values from -11 to 2.
Input mel-spectrogram parameters:
- n_fft = 1024
- num_mels = 80
- sampling_rate = 22050
- hop_size = 256
- win_size = 1024
- fmin = 0
- fmax = 8000

ncnn is a high-performance neural network inference framework.

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis.

magicse/ncnn-hifi-GAN
