feat: backend support for TTS (Bark, etc.) #126
Hi, thanks for the suggestion. It sounds like an interesting idea; I'll see what I can do about it, but only after I have every previous feature request out of the way. In the meantime, if you could implement a working prototype in Python and provide us with implementation examples, that would be sublime. Thanks.
Bark is rather unstable, slow, and overkill for an assistant. Piper, however, seems fine, and it also has Python support. I also wonder whether the server or the client should be responsible for TTS. Piper is written in C++, so a WASM port is possible, if desired.
I'll be looking into this in the near future! In the meantime, TTS support has already been implemented with the legacy Web Speech API. Thanks!
Since we already have the speaker button there, I think we can integrate Piper, since it's lightweight and fast. The only requirement is that the server have Piper installed via:
Directory structure: /flask-piper-app. Python:
Here's the HTML (which you can convert to Svelte):
This way, our model responses will not sound like Stephen Hawking.
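The Python and HTML snippets referenced above did not survive this thread's export. As a rough sketch of what such a Flask wrapper around the Piper CLI could look like (the route name, model filename, and helper names are my own assumptions, not the original code):

```python
# Hypothetical sketch of a Flask endpoint that shells out to the piper CLI.
# Assumes `piper` is on PATH and a voice model file is available locally.
import subprocess
import tempfile

def build_piper_cmd(model_path, out_path):
    # piper reads the text to synthesize from stdin and writes a WAV file.
    return ["piper", "--model", model_path, "--output_file", out_path]

def synthesize(text, model_path="en_US-lessac-medium.onnx"):
    out_path = tempfile.NamedTemporaryFile(suffix=".wav", delete=False).name
    subprocess.run(
        build_piper_cmd(model_path, out_path),
        input=text.encode("utf-8"),
        check=True,
    )
    return out_path

# Flask wiring; guarded so the helpers above work even without Flask installed.
try:
    from flask import Flask, request, send_file

    app = Flask(__name__)

    @app.route("/tts", methods=["POST"])
    def tts():
        wav_path = synthesize(request.json["text"])
        return send_file(wav_path, mimetype="audio/wav")
except ImportError:
    pass
```

The browser side would then POST the model's response text to `/tts` and play the returned WAV, which matches the client-to-server flow described later in this thread.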
I'll actively take a look after #216, but Piper doesn't seem to support macOS. If any of you know any workarounds for this, please let us know. Thanks. I'm encountering this issue: rhasspy/piper#203
Let's get the ball rolling on this one! Stay tuned!
If I may, I'd also suggest that this feature have an option to use OpenAI TTS as well, considering there's already a place to input your API key in the UI. Their model sounds more natural for those of us attempting to use AI for language learning.
Piper will likely support WASM compilation soon, which would allow browser-side generation: rhasspy/piper#352
I actually made a pull request that integrated Piper, but I deleted it because I recall Timothy saying it is not well supported on his MacBook, or on Mac in general. If you want, I can make a Piper integration again, but it would necessitate removing the browser speech synthesis default, unless someone would be kind enough to add a new "piper button" as a sign that I should bring it back. The new speaker icon should differentiate between browser speech synthesis (the default) and the one used for Piper (I'm not very good at Svelte, but I know quite a lot about JavaScript). The speech will not be browser-generated (no WASM yet): the client will send the prompt response to the server, and the audio generated by Piper on the server will be served back to the browser. The only downside is that for longer prompts the rendered audio file would be larger, at least in the most simplified implementation (without a complex compression algorithm).

Let me know if this would still be helpful, and I can generate a new pull request. Alternatively, we can create a piper branch in this repo for research purposes, so other developers can look at and build on the work. Because, if I'm not mistaken, OpenAI's Whisper server is not free of charge; it's fast, but not free. Piper is better than Bark, since Bark needs a huge GPU and takes hours on smaller GPUs before it can talk back to the user's text prompt. With Piper, a medium-quality voice will generate between 1 and 5 MB for a message as long as this comment. Piper should be installed where the UI is running; it will then generate the voice on the server in about 10 to 30 seconds, sometimes longer, and a longer text might require a minute. But if you run Piper on a GPU, it's as quick as lightning, and the only remaining downside is how to compress the audio file after Piper generates it.

I'm sure there are countless developers here who could figure that out on top of the simplest example, because for longer text the output reaches more MB, and the voice model (--model WHATEVER-medium.onnx) is quite large (up to 70 MB). It shouldn't be included in the pull request, but it can be downloaded after running the Piper Flask server or a bash script (which could also be included in the Ollama WebUI run script).
OpenAI TTS support has been added with #656! As for local TTS support, Piper seems promising, so let's wait until they merge the two blocking PRs.
Thanks, Timothy.
The Piper library seems to be unmaintained. Looking for alternatives at the moment; open to suggestions!
Piper works well on Mac too, if you build from source and make a tiny change to the CMakeLists 🙈 I am pretty sure @synesthesiam will get around to merging those pull requests; Piper seems to be his baby, after all. I played around with bark.cpp and coqui.ai TTS, and both are far too slow to be useful.
I agree; of the big three projects for local TTS, Piper is probably the best hope we've got. I really don't understand how this particular niche is so devoid of development; it's one of the most asked-for features in any local AI project.
Piper is definitely still being maintained! As @jmtatsch said, I've just been busy with other stuff. One thing that's held up development is needing to replace the espeak-ng library due to its license. I think this niche is fairly devoid of development because very few projects leave the demo stage before the authors are on to the next model/paper. I want Piper to be more of a "boring" technology in the sense that it does a job well without always chasing state-of-the-art.
I very much agree with that part of the Unix philosophy: do one thing and do it well. Thanks for the status update @synesthesiam 🙏
I deployed TTS/STT on my own server, and there's a REST API. How can I integrate my own API into this web UI?
Can the existing base URL for OpenAI TTS be made configurable?
This looks very promising. The API seems to work well, and it's a similar Docker-based setup to Ollama. I agree; just allowing OPENAI_BASE_URL to be tweaked for audio would go a long way toward fully local Whisper + XTTS-v2 with this.
FYI, I made this work with a local openedai-speech (linked above) on my branch, here: https://github.com/lee-b/open-webui. It currently requires an extra environment variable and uses a custom Dockerfile and runner script, but it works. I'll integrate this better if the core team wants to advise on their preferred way to solve some of the issues I hacked around.
Any way to fix this?
The server is running:
Fixed with the following, kudos to ChatGPT:
I think it would be best if Open WebUI just let us set a different TTS base URL via an environment variable like OPENAI_TTS_BASE_URL.
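To illustrate what pointing a client at a configurable base URL looks like, here is a minimal Python sketch against an OpenAI-compatible speech endpoint. The `/v1/audio/speech` path and the `model`/`voice`/`input` fields follow OpenAI's audio API; the OPENAI_TTS_BASE_URL variable name is only the suggestion from this thread, not an existing Open WebUI setting:

```python
# Minimal sketch: call an OpenAI-compatible TTS endpoint at a configurable
# base URL (e.g. a local openedai-speech server instead of api.openai.com).
import json
import os
import urllib.request

def speech_url(base_url):
    # Normalize the base URL and append OpenAI's speech endpoint path.
    return base_url.rstrip("/") + "/v1/audio/speech"

def synthesize(text, base_url=None, api_key="", model="tts-1", voice="alloy"):
    # OPENAI_TTS_BASE_URL is a hypothetical variable name from this thread.
    base = base_url or os.environ.get("OPENAI_TTS_BASE_URL", "https://api.openai.com")
    req = urllib.request.Request(
        speech_url(base),
        data=json.dumps({"model": model, "voice": voice, "input": text}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()  # raw audio bytes
```

With this shape, switching between OpenAI and a local server is just a matter of changing one URL, which is exactly why a single env variable would suffice.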
@tjbck, would you be open to the approach taken in https://github.com/lee-b/open-webui?
Is there a simple way to change the TTS model to my own now? I can't stand the voice of this robot lol.
Since cbd18ec you should be able to set your own OpenAI-compatible base URL.
In case this helps anyone who is running the
Note that |
Works wonderfully now.
I'll leave it up to @oliverbob to decide whether to call this issue fixed or not; otherwise I will close it as such in a few days if we don't hear from them.
Is it possible to have native support for Bark TTS, or the LangChain version of it, since we already have that microphone prompt?