
Nvidia Riva support #321

Merged 5 commits on May 6, 2023

Conversation

@kfatehi (Contributor) commented Apr 29, 2023

This PR adds support for Nvidia Riva. The Riva stack serves a gRPC service, which seemed non-trivial (impossible?) to interface with from the extension environment, so in this implementation I rely on a companion web service that accepts an HTTP GET request, similar to the one used for IBM Watson, and returns the desired ogg file.

Audio Samples

riva-english-female-1.webm
riva-english-male-1.webm

Screenshots

(three screenshots attached)

@kfatehi kfatehi marked this pull request as draft April 29, 2023 06:26
@kfatehi (Author) commented Apr 29, 2023

Converted it to a draft because I forgot to implement the prosody features, which Riva supports; it's just a matter of piping them through the proxy.

But would this PR even be approved? I think it's important to support Riva, considering it's a fully offline yet very high-quality solution. One just needs the GPU power to run the models: by default the Riva stack seems to consume around 13 GB of GPU memory, so I can only do this because I'm using an Nvidia 4090. Still pretty cool though.

@kfatehi (Author) commented Apr 29, 2023

Before this is merged, I'd also like to reduce the blind code-copying I did from the IBM engine. But I wanted to get it out there because it does work, and to see whether there's enough interest for me to finish it to a higher standard of quality. As it stands, it works for me.

@kfatehi kfatehi changed the title add riva support Nvidia Riva support Apr 29, 2023
@ken107 (Owner) commented Apr 29, 2023

I'll merge it. This is indeed very cool. Even though most users right now won't have the hardware needed to run it locally, an offline next-gen TTS voice that's free to use is the future we'd like to get to. When the time comes, perhaps the TTS subsystem can be deployed as a native app that extensions communicate with via native messaging. (For reference: https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tutorials-tts-contents.html)

@kfatehi (Author) commented Apr 29, 2023

Ok great. I will clean things up, implement prosody control, and notify you when it's ready for review.

What do you mean by native messaging? I'm not thrilled about interposing a proxy between the extension and Riva, but it looks like I can't open a TCP connection directly from the extension API: https://developer.chrome.com/docs/extensions/reference/sockets_tcp/

Even if we were willing to add the complexity of gRPC to the extension, is that even allowed, or must everything be HTTP? Your point about native messaging made me curious.

I think it'd be better to eliminate the proxy and let people connect directly to the Riva stack, since the stack alone is hard enough for a novice to set up. To summarize my setup:

The link you provided goes to their table of contents, which I basically followed to get the Riva stack running on my gaming computer. It runs Windows 11, Docker Desktop, and WSL2 with Ubuntu; WSL2 is needed to run the bash scripts that control the stack, which itself runs inside Docker with GPU support (a relatively new WSL2 feature).

@ken107 (Owner) commented Apr 29, 2023

Well, at this time I think only a few users will be able to use this feature, so it's basically experimental. It doesn't have to be perfect, as long as it doesn't affect existing functionality. You don't need to implement the prosody feature now; leaving it for later is fine. And using a proxy is fine too; it'll be a work in progress.

Native messaging is a way for an extension to talk to a native app that's installed separately in the OS. The native app would still have to do the gRPC proxying, so it's no different from the HTTP proxy you're using. Never mind about native messaging.

As long as someone with the necessary hardware can set it up, this will be a fun experimental function they can try out.

@kfatehi kfatehi marked this pull request as ready for review April 30, 2023 04:46
@kfatehi (Author) commented Apr 30, 2023

@ken107 Ok great, it's ready for your review.

  1. Implemented prosody (pitch and rate interpreted within Riva). The extension passes a rate of 1 to the player and lets the proxy rescale the requested rate into Riva's preferred scale; pitch is rescaled the same way. Works great. (A rescaling sketch follows this list.)
  2. Implemented streaming in the proxy (text -> wav -> ogg) so latency is as low as possible, at the cost of lower throughput, which is worth it for personal use. It's working perfectly and is super fast now regardless of sentence length. (A transcoding sketch follows this list.)
  3. Implemented prefetching, which seems to work well and made paragraph-to-paragraph transitions even faster.
  4. Fixed a bug where multi-sentence input was breaking Riva; the fix was as simple as tokenizing the input paragraph into sentences. (A tokenizer sketch follows this list.)
  5. Added the MIT license.
  6. Published to Docker Hub.
  7. Updated the README to reflect the new API (JSON -> OGG stream) and how to use it straight from Docker Hub.
  8. Got the Docker image down to 141 MB compressed (380 MB according to `docker images`) by rewriting the Dockerfile with Alpine Linux.

All in all, quite comprehensive! I'm prouder of the proxy now than I was before. A good Samaritan rich in GPU power could, in theory, deploy this, crank the WEB_CONCURRENCY environment variable way up, and serve many people. In the meantime, people with modern consumer GPUs can use it themselves.

Great job on read-aloud -- it was surprisingly easy to pull this off and it's still not even Sunday yet :)
