Nvidia Riva support #321
Conversation
Converted it to a draft because I forgot to implement the prosody features, which Riva supports, so it's just a matter of piping them through the proxy. But would this PR even be approved? I think it's important to support Riva, considering it's a fully offline yet very high quality solution; one just needs the GPU power to run the models. By default, the Riva stack consumes around 13 GB of GPU memory, so I can only do this because I'm using an Nvidia 4090. Still pretty cool though.
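For reference, here's roughly what "piping prosody through the proxy" would look like: a minimal sketch that wraps the input text in a standard SSML `<prosody>` element before handing it to the synthesis backend. The function name and defaults are my own placeholders, not anything from this PR; the `rate`/`pitch` attribute names come from the SSML spec.

```python
from typing import Optional
from xml.sax.saxutils import escape

def wrap_prosody(text: str, rate: Optional[str] = None, pitch: Optional[str] = None) -> str:
    """Wrap text in SSML, adding a <prosody> element when any attribute is given.

    Hypothetical helper; attribute names ("rate", "pitch") follow the SSML spec.
    """
    attrs = []
    if rate:
        attrs.append(f'rate="{rate}"')
    if pitch:
        attrs.append(f'pitch="{pitch}"')
    body = escape(text)  # text must be XML-escaped before embedding in SSML
    if not attrs:
        return f"<speak>{body}</speak>"
    return f"<speak><prosody {' '.join(attrs)}>{body}</prosody></speak>"
```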
I'd also like to reduce the blind code-copying I did from the IBM engine before this is merged, but I wanted to get it out there because it does work, and I want to see if there's interest in me completing it to a higher standard of quality. As it stands, it works for me.
I'll merge it. This is indeed very cool. Even though most users right now won't have the hardware needed to run it locally, the idea of having an offline next-gen TTS voice that's free to use is the future we'd like to get to. When the time comes, perhaps the TTS subsystem can be deployed as a native app and extensions communicate with it via native messaging. (For reference, https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tutorials-tts-contents.html)
Ok great. I will clean things up, implement prosody control, and notify you when it's ready for review. What do you mean by native messaging? I'm not thrilled about interjecting a proxy between the extension and Riva, but as far as I can tell, extensions cannot use TCP directly: https://developer.chrome.com/docs/extensions/reference/sockets_tcp/ Even if we were willing to add the complexity of gRPC to the extension, is that even allowed, or must everything be HTTP? You made me curious with your point about native messaging. I think it'd be better to eliminate the proxy and just let people connect directly to the Riva stack, since that alone is hard enough for a novice to set up. To summarize my setup: the link you provided goes to their table of contents, which I basically followed to get the Riva stack running on my gaming computer. That machine runs Windows 11 with Docker Desktop and WSL2 (Ubuntu), which is needed to run the bash scripts that control the stack; the stack itself runs inside Docker with GPU support, a relatively new WSL2 feature.
Well, at this time only a few users will be able to use this feature, so it's basically experimental; it doesn't have to be perfect as long as it doesn't affect existing functionality. So you don't need to implement the prosody feature, or you can leave it for later. Using a proxy is fine too; it'll be a work in progress. Native messaging is a way for an extension to talk to a native app that's installed separately in the OS. That native app would still have to do the gRPC proxying, so using HTTP like you're doing makes no real difference. Never mind about native messaging, then. As long as someone with the necessary hardware can set it up, this will be a fun experimental feature they can try out.
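For anyone curious about what native messaging would have involved: the extension talks to the native host over stdio, where each message is UTF-8 JSON preceded by a 32-bit length in native byte order (per Chrome's native messaging protocol). A minimal sketch of that framing, with names of my own choosing:

```python
# Sketch of Chrome's native messaging wire format: each message is
# UTF-8-encoded JSON preceded by a 32-bit length in native byte order.
import json
import struct

def encode_message(obj) -> bytes:
    """Frame a JSON-serializable object for sending to/from a native host."""
    payload = json.dumps(obj).encode("utf-8")
    return struct.pack("=I", len(payload)) + payload

def decode_message(data: bytes):
    """Read one framed message back out of a byte buffer."""
    (length,) = struct.unpack("=I", data[:4])
    return json.loads(data[4:4 + length].decode("utf-8"))
```

Since the native host would still have to speak gRPC to Riva, this framing buys nothing over the existing HTTP proxy, which is why it's moot here.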
@ken107 Ok great, it's ready for your review.
All in all, quite comprehensive! I'm more proud of the proxy now than before. A good Samaritan rich in GPU power could in theory deploy this, crank the WEB_CONCURRENCY environment variable way up, and serve many people. In the meantime, people with modern consumer GPUs can also use it. Great job on read-aloud -- it was surprisingly easy to pull this off, and it's still not even Sunday yet :)
This PR adds support for Nvidia Riva. The Riva stack serves a gRPC service, which seemed non-trivial (impossible?) to interface with from the extension environment, so this implementation relies on a companion web service that accepts an HTTP GET request, similar to the one used for IBM Watson, and returns the desired Ogg file.
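To illustrate the shape of the proxy's request handling, here is a minimal sketch of parsing a Watson-style GET query string into a synthesis request. The parameter names (`text`, `voice`) and the default voice string are assumptions for illustration, not the actual names used by this PR or by Riva:

```python
# Hypothetical sketch: turn a Watson-style GET URL into a synthesis request.
# Parameter names and the default voice are placeholders, not this PR's API.
from urllib.parse import parse_qs, urlparse

def parse_synthesis_request(url: str) -> dict:
    query = parse_qs(urlparse(url).query)
    return {
        "text": query.get("text", [""])[0],
        "voice": query.get("voice", ["default-voice"])[0],  # placeholder default
    }
```

The proxy would then pass this request to the Riva gRPC client and stream the synthesized audio back as the HTTP response body.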
Audio Samples
riva-english-female-1.webm
riva-english-male-1.webm
Screenshots