
Nvidia Riva support #321

Merged 5 commits on May 6, 2023

Conversation

@kfatehi (Contributor) commented Apr 29, 2023

This PR adds support for Nvidia Riva. The Riva stack serves a gRPC service, which seemed non-trivial (impossible?) to interface with from the extension environment, so in this implementation I rely on a companion web service that accepts an HTTP GET request, similar to the one used for IBM Watson, and returns the desired ogg file.

Audio Samples

riva-english-female-1.webm
riva-english-male-1.webm

Screenshots

(three screenshots attached)

@kfatehi kfatehi marked this pull request as draft April 29, 2023 06:26
@kfatehi (Author) commented Apr 29, 2023

Converted it to a draft because I forgot to implement the prosody features, which Riva supports; it's just a matter of piping them through the proxy.

But would this PR even be approved? I think it's important to support Riva, considering it's a fully offline yet very high-quality solution. One just needs the GPU power to run the models: by default the Riva stack seems to consume around 13 GB of GPU memory, so I can only do this because I'm using an Nvidia 4090. Still pretty cool though.

@kfatehi (Author) commented Apr 29, 2023

Before this is merged, I'd also like to reduce the blind code-copying I did from the IBM engine. But I wanted to get it out there because it does work, and to see whether there's enough interest for me to finish it to a higher standard of quality. As it stands, it works for me.

@kfatehi kfatehi changed the title add riva support Nvidia Riva support Apr 29, 2023
@ken107 (Owner) commented Apr 29, 2023

I'll merge it. This is indeed very cool. Even though most users right now won't have the hardware needed to run it locally, an offline next-gen TTS voice that's free to use is the future we'd like to get to. When the time comes, perhaps the TTS subsystem can be deployed as a native app that extensions communicate with via native messaging. (For reference: https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tutorials-tts-contents.html)

@kfatehi (Author) commented Apr 29, 2023

Ok great. I will clean things up, implement prosody control, and notify you when it's ready for review.

What do you mean by native messaging? I'm not thrilled about interposing a proxy between the extension and Riva, but it looks like I can't open a TCP connection directly from the extension API: https://developer.chrome.com/docs/extensions/reference/sockets_tcp/

Even if we were willing to add the complexity of gRPC to the extension, is that even allowed, or must everything be HTTP? Your point about native messaging made me curious.

I think it'd be better to eliminate the proxy and let people connect directly to the Riva stack, since the stack alone is hard enough for a novice to set up. To summarize my setup:

The link you provided goes to their table of contents, which I basically followed to get the Riva stack running on my gaming computer. It runs Windows 11, Docker Desktop, and WSL2 with Ubuntu; WSL2 is needed to run the bash scripts that control the stack, which itself runs inside Docker with GPU support (a relatively new WSL2 feature).

@ken107 (Owner) commented Apr 29, 2023

Well, at this time I think only a few users will be able to use this feature, so it's basically experimental. It doesn't have to be perfect, as long as it doesn't affect existing functionality. You don't need to implement the prosody feature now; leaving it for later is fine. And using a proxy is fine too; it'll be a work in progress.

Native messaging is a way for an extension to talk to a native app that's installed separately in the OS. The native app would still have to do the gRPC proxying, so it's no different from the HTTP proxy you're using. Never mind about native messaging.

As long as someone with the necessary hardware can set it up, this will be a fun experimental function they can try out.

@kfatehi kfatehi marked this pull request as ready for review April 30, 2023 04:46
@kfatehi (Author) commented Apr 30, 2023

@ken107 Ok great, it's ready for your review.

  1. Implemented prosody (pitch and rate interpreted within Riva). The extension passes a rate of 1 to the player and lets the proxy rescale the requested rate into Riva's preferred scale; pitch is rescaled the same way. Works great. (A rescaling sketch follows this list.)
  2. Implemented streaming in the proxy (text -> wav -> ogg) so latency is as low as possible, at the cost of lower throughput, which is worth it for personal use. It's working perfectly and is super fast now regardless of sentence length. (A transcoding sketch follows this list.)
  3. Implemented prefetching, which seems to work well and made paragraph-to-paragraph transitions even faster.
  4. Fixed a bug where multi-sentence input was breaking Riva; the fix was as simple as tokenizing the input paragraph into sentences. (A tokenizer sketch follows this list.)
  5. Added the MIT license.
  6. Published to Docker Hub.
  7. Updated the README to reflect the new API (JSON -> OGG stream) and how to use it straight from Docker Hub.
  8. Got the Docker image down to 141 MB compressed (380 MB according to `docker images`) by rewriting the Dockerfile with Alpine Linux.

All in all, quite comprehensive! I'm prouder of the proxy now than I was before. A good Samaritan rich in GPU power could, in theory, deploy this, crank the WEB_CONCURRENCY environment variable way up, and serve many people. In the meantime, people with modern consumer GPUs can use it themselves.

Great job on read-aloud -- it was surprisingly easy to pull this off and it's still not even Sunday yet :)
