Local TTS Engine Requirements #22
Are there undocumented requirements to use the new local glados-tts? I have tried directly cloning and running this project against a clean Ubuntu install, walking through every listed dependency, without success. Right now, on a clean install with everything listed installed, running it I get:
What are your machine specs? The pytorch model being used requires a CPU that supports AVX2 instructions. Also, are you running it in a VM?
I initially ran it on Windows via the Linux subsystem, but that is a whole can of worms when dealing with audio, so I abandoned it pretty quickly. I then tried natively on Windows, but that complains about espeak not being installed even though I installed it after seeing that error; it still fails with the same message. I used the Ubuntu subsystem to inspect /proc/cpuinfo, so it appears my Windows system does support AVX2, but because Windows itself seems to be the problem, it won't work there. The error above is from a different, dedicated piece of hardware (not a VM) that does not report avx or avx2 in its cpuinfo. So that may be the requirement I am missing: a piece of hardware that supports AVX2. From what I've read, the Pi doesn't support this, correct?
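For anyone else checking their hardware the same way, the AVX2 flag can be read straight out of /proc/cpuinfo. A minimal sketch (Linux only, so on Windows run it inside WSL as described above):

```python
# Check for AVX2 support by scanning the CPU flags (Linux/WSL only).
def has_avx2(path="/proc/cpuinfo"):
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return "avx2" in line.split()
    return False

print("AVX2 supported:", has_avx2())
```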
Yes, sadly at this point the Pi doesn't support the TTS engine. Someone had given us very vague instructions on how to 'maybe' get it to work, but it was way above my head and didn't look all that promising. I have managed to get the tts-engine to work on my Windows machine. There is something I had to do to get espeak to work: something about adding its library to the Windows environment variables. When it comes to audio, I also got it to work on Windows for the engine. I haven't updated the main GLaDOS assistant to include the same code, as I only use Windows for the TTS and my Pi for everything else.
Do you know which variable you added to Windows? Also, do you have a link to the comment concerning Pi compatibility?
Just did some googling and testing, so if anybody else comes here looking for the answer: you'll need espeak-ng, https://github.com/espeak-ng/espeak-ng/releases. Then, referencing this issue, bootphon/phonemizer#44, add the env variable PHONEMIZER_ESPEAK_LIBRARY. Assuming you didn't change the default install location of espeak-ng, you can point it at the libespeak-ng.dll in the eSpeak NG install folder.
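In Python that just means setting the variable before phonemizer is imported. A sketch, assuming the default 64-bit eSpeak NG install path (adjust if yours differs):

```python
import os

# Point phonemizer at the espeak-ng DLL before importing it. This path is
# the default for the 64-bit eSpeak NG installer; adjust it if you
# installed espeak-ng somewhere else.
os.environ["PHONEMIZER_ESPEAK_LIBRARY"] = r"C:\Program Files\eSpeak NG\libespeak-ng.dll"

from phonemizer.backend import EspeakBackend  # noqa: E402

backend = EspeakBackend("en-us")
print(backend.phonemize(["okay, here we go"]))
```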
Yea, good job. That's what I had to do. Thanks for documenting it here.
@SuperJonotron @eternalliving Sorry for the late reply. I'm the person who made this TTS. Both the TTS and vocoder were optimized with torchscript. The TTS specifically always runs on the CPU, because it is so fast that it does not need GPU speedup. The vocoder is much more costly; with a more efficient vocoder, the e2e latency would be extremely low. In the folder there are also high- and low-quality CPU models which remove the requirement for a GPU. All of the models (excluding the GPU vocoder) have been quantized for CPU inference. I believe that internally they use XNNPACK, which has SSE, AVX, AVX2, AVX512, and even NEON implementations, so it should even be able to run on a Raspberry Pi (extremely slowly).
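For reference, you can check which quantized engines your own PyTorch build ships with via the standard torch.backends.quantized API:

```python
import torch

# Engines this PyTorch build was compiled with, e.g. ['none', 'fbgemm',
# 'qnnpack'] on a typical x86 desktop wheel.
print("Supported engines:", torch.backends.quantized.supported_engines)
print("Current engine:", torch.backends.quantized.engine)
```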
@R2D2FISH Seems to be an error associated with missing AVX2, but if there's something else that needs to happen for this to run on a Pi, let me know, because I don't think it's supported due to that requirement. I'd be happy to be proven wrong, though.
@SuperJonotron Not sure if this works, but try running this command before loading the models: |
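The command itself didn't survive here, but judging from the backend names discussed in the replies below, it is presumably the PyTorch quantized-engine switch, along these lines:

```python
import torch

# Select QNNPACK as the quantized inference backend before the torchscript
# models are loaded. QNNPACK ships SSE/NEON kernels, so it runs on CPUs
# without AVX2, including aarch64 boards like the Raspberry Pi.
torch.backends.quantized.engine = "qnnpack"
```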
@R2D2FISH I had seen that comment somewhere but was unclear on where it was intended to be used, so thanks for the clarification. Tested it out and it seems to work. Here are some benchmark comparisons for anybody else interested in performance before going this route.

More powerful computer with AVX2 support:
With qnnpack set: ~60 second startup
Approximately 12.7x slower with qnnpack than the default backend on the same AVX2 system

On RPi4 Model B Rev 1.4, 8GB RAM:
Approximately 4x slower than the AVX2 option on the more powerful system

So the RPi runs faster with qnnpack than the more powerful AVX2 system does with qnnpack, but is still slower overall. Using the cache still returns instantly, though, so this should let you generate a library locally and keep instant responses if using the cache option.
@SuperJonotron Awesome! On AVX2 systems you might want to test replacing 'qnnpack' with 'xnnpack', 'fbgemm', or 'mkldnn'. I'll modify the code to autoselect a backend based on the host device. I suspect that mkldnn will be the fastest, but I can't test this myself at the moment because my pytorch build is missing mkldnn (it was causing build errors).
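A minimal sketch of what that autoselection could look like; the preference order here is an assumption for illustration, not the project's actual code:

```python
import torch

def pick_quantized_engine(preferred=("fbgemm", "x86", "qnnpack")):
    """Pick the first available quantized backend from a preference list.

    fbgemm targets AVX2/AVX-512 desktops, qnnpack targets ARM/NEON;
    'none' is always supported and means no quantized kernels.
    """
    supported = torch.backends.quantized.supported_engines
    for engine in preferred:
        if engine in supported:
            torch.backends.quantized.engine = engine
            return engine
    return torch.backends.quantized.engine  # fall back to the default

print("Using quantized engine:", pick_quantized_engine())
```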
@R2D2FISH Tested out the other options on the AVX2 hardware; here are the performance results.

Running with fbgemm (same message 4 times):
Running with nothing specified (same message 4 times):

Looks to me like an AVX2 system already chooses the most optimized setting, since these times look basically the same. Hopefully that means updating the code to choose the correct backend on other hardware is all that's necessary, since AVX2 is already handled.
@jhughesbiot Good to know. |
@SuperJonotron A few things to note. First of all, the reason the startup is so long is that, in order to load the models into RAM, it runs something like 4 empty strings in a row when it first loads. You may want to experiment with altering that number or removing it. Additionally, it was discovered that my "quantized" models are actually slower than the standard version, so the project switched to that one. This may not be the case on aarch64, so you might want to try the 'vocoder-cpu-hq.pt' model and see if it performs any better. Finally, and perhaps most excitingly, the Raspberry Pi 4 apparently has full Vulkan 1.2 conformance now, so you may actually be able to run these models on the GPU with the right drivers and a custom build (I don't think prebuilt pytorch has Vulkan enabled).
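Swapping vocoders should just be a matter of pointing torch.jit.load at the other torchscript file. A sketch, where the models/ folder and the glados.pt file name are assumptions about the repo layout; adjust the paths to match your checkout:

```python
import torch

# Load the TTS plus the HQ CPU vocoder instead of the GPU one. Paths are
# assumptions based on this thread, not confirmed repo layout.
glados = torch.jit.load("models/glados.pt", map_location="cpu")
vocoder = torch.jit.load("models/vocoder-cpu-hq.pt", map_location="cpu")
```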
I don't really have any experience with torch, Vulkan, or the various models created for this project, so I'm not really sure where I'd start on getting the RPi onto a custom build, but I'd be happy to test something out.
Looks like using vocoder-cpu-hq.pt instead of the GPU model on the RPi4, as is, drops the time by a little more than half. I'm seeing about a 1.7x slower time than the AVX2 option (versus ~4x with the GPU model), which isn't that bad if you're already not expecting real-time responses.
@SuperJonotron That's actually amazingly quick for a Pi. I'll try to cook up a Vulkan-enabled aarch64 build on my laptop later today if you'd like to try it out. I'll throw in ThinLTO for good measure ;). I'm pretty sure the reason the quantized models are not running very fast on desktop is that qnnpack and xnnpack are disabled in the desktop builds (at least on Windows). I trained this model on an extremely cobbled-together version of pytorch and rocm, both built from source to allow me to train on my AMD GPU laptop, so a lot of options were left turned on, which is probably why I was seeing better performance with the quantized builds. Are you running Raspberry Pi OS?
@R2D2FISH For both systems, I am using my fork, which wraps this project in Docker. The base image for that container is Ubuntu 20.04. On my Windows machine with AVX2 I run this via the Linux subsystem (WSL) and Docker Desktop. The RPi is running Ubuntu 20.04 Desktop, but since both tests run in the same Docker environment (Ubuntu 20.04), there's really no difference in what the host runs; they just use the hardware available. I'll definitely try out anything that might improve speed.
I managed to get glados-tts to run on my system that doesn't support AVX2 by rebuilding pytorch. You can find my blog post about it here: https://blog.longearsfor.life/blog/2023/11/26/building-pytorch-for-systems-without-avx2-instructions/. I hope this helps anyone running into similar problems.