-
-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Does this include a way of using it from an API? #3
Comments
what a great detailed answer, thank you! I will try. on a slightly unrelated note - it seems like whatever call goes through to the chat completions api results in audio generation too? So maybe my question is a little moot, because I thought I could have the plugin turned on, but only generate audio sometimes, for only some of the calls |
Yeah I noticed something similar to that myself the other day. It shouldn't do that, but Ive had no time to look at it yet. I would imagine that unchecking the "Activate TTS" in the text-gen-webui interface will stop it doing that. Though at some point Ill have to do something with the API on it.. As I mentioned, I yet have to refine those, In all honestly, I didnt think anyone would be using them just yet. let me know how you get on and I can take a look some time (no promises as I need a break at some point) but if its nice and easy to do.... I might be able to push something through :) |
oh.. I probably didnt mention, you can run it as a standalone app. It doesnt have to be run through Text-generation-webui.... So you can use it that way if that works better for you. just literally "python script.py" If you want a separate install for it, but to use the same models, just follow the custom model setup in the docs and you can point one of the installs at the others model folder. |
FYI... Im probably 70% of the way through a new api.... one that has ALL options within it....I may even include a "play the wav file option" through the actual terminal window (no web page etc). Otherwise, I do have it already feeding back out the location to the wav file as a path e.g. c:\folder\folder\folder\myaudio.wav and a web page address e.g. http://127.0.0.1/auidio/myaudio.wav.... So if I include the ability for it to play at within the console session, you can either make it auto play the wav OR pull the details into your own application, either as a file path or web address and play it from there. No promises when Ill have this finished. |
Amazing work on this extension, much appreciated and well though off! How can we use the regular Webui api import requests url = "http://127.0.0.1:5000/v1/chat/completions" headers = {"Content-Type": "application/json"} To simply get back the File_Path of the audio file like its listed in the WebUI chat from a normal prompt call. Seems like the API breaks. |
@mercuryyy you wont get it on the normal api of text-generation-webui, but I am working on a whole new API for interacting with it (as mentioned above). This one is for external requests to AllTalk.. so say you have your own software and you want it to generate audio, then give you a file path or http://ipaddress:port/audio/myfile.wav But are you saying you want to pull the information from you actual chat/conversation within text-generation-webui? e.g. you are chatting within the interface of text-generation-webui and all the details in there are posted back out? |
Yes exactly, of course we can do it as side script, but then we would be making 2 api requests. I think a good option to have would be be to be able to chat/conversation with text-generation-webui via the http://127.0.0.1:5000/v1/chat/completions Right now if you have TTS enabled and you make a request to "v1/chat/completions" the script will error. i posted it here in detail - oobabooga/text-generation-webui#4944 |
I don't doubt its something that could be done, but taking a quick look, its something Oobabooga would probably have to do as its a core change to the API of text-generation-webui, whereas the bits I'm doing are just within my own code and I have no way to feed into the main API of text-generation-webui and make it send out the details of a file/web address. I'm assuming you've looked around the api for text-gen https://github.com/oobabooga/text-generation-webui/wiki/12-%E2%80%90-OpenAI-API#compatibility--not-so-compatibility (it says audio is supported) and the actual openai api too https://platform.openai.com/docs/api-reference/audio I'm sure its absolutely possible to achieve, its just not within my code base to do that. However, if the API from OpenAi is a 1-2-1 copy, there's no direct "just output the file name" that I can see as a standalone option, but as I say, thats not to say it cant be done. As you said though, if you are creating/using an application say like SillyTavern to interact with text-generation-webui to get your text in and out, yes, you will be able to then make a follow on request to AllTalks API to generate the TTS output and that will feed you back both path and webpage addresses. It will also allow you to control/change the voice you are using, use narration, filter/clean or not filter the text, choose if you name your file something specific, choose if you just keep stacking up wav files or keep over-writing just the one. |
Yeah good point, i tried /v1/audio/ didnt respond at all but i think its more of a base API to simple TTS. What i would probably end up doing is just editing the "v1/chat/completions" output to fix call your API before json response. will be basically the same thing :) |
i'm not able to get the windows curl examples to work, not with the built in curl from windows or the most recent one. |
Oh, its the HTML formatting missing out the 's (will correct it). The line would be:
|
turns out the error only happens in powershell. works fine in command prompt, looks like powershell doesn't escape properly or causes some really odd formatting problems but that appears to have been the issue (as well as the missing ' ). works fine with your above curl command in cmd, just plain doesn't work in powershell but that's fine tbh. |
@rktvr I've updated the code now, so you can update if you want it to look correct/be able to copy/paste from the built in docs :) |
For anyone who wishes to try the new AllTalk API, the tty_server.py can be downloaded from the DEV branch here: https://github.com/erew123/alltalk_tts/blob/dev/tts_server.py You can save it over your existing tts_server.py file. Its all roughly documented in the online documentation. It now has a web output address. The command line is a bit more unwieldly, but it does give a lot more flexibility. I have tested it, but I cant say I've spent 3+ hours long testing it. |
Guys, as Ive not heard back from you, Im closing this for now. The new version of AllTalk has been released, now with Finetuning of the model with any voice you like! https://github.com/erew123/alltalk_tts?tab=readme-ov-file#-finetuning-a-model |
@Madd0g Not sure if this covers what you were interested in, but the API suite is now leaps and bounds from where it was 2x weeks ago. It now supports the option to play the TTS generated through the command prompt where the script is running from/your audio device on that machine. You also have a JSON response giving you the file location on disk and also a web address (if you want to play back the sound somewhere else). Full details are here https://github.com/erew123/alltalk_tts?#-api-suite-and-json-curl |
@erew123 that's great, thank you. I'll try it. The performance wasn't too great for me while in interactive chat (too slow to generate on my mac), hopefully I'll find the creation via API a little more useful. Can it be used fully separately now and not generate voice for every API completion call? |
@Madd0g Well you have multiple options if you don't want it interacting with Text-gen-webui:
Beyond that, you the API is pretty flexible. Out of curiosity.. are you on an M series mac? I might be able to do something to get the generation running much faster! I would have to change some bits in the code. |
Yes I am. That would be great, I was kind of disappointed at the speed, took at least 4-5x longer than the text to generate. I don't have a lot of time these days, but I'll test it when I can. |
@Madd0g it will take me a while to write the code to do it..... not sure if Ill get there today. But ill let you know when I do... |
Can I call some API to generate audio?
I'm already using textgen UI as a hub for other tools to connect, It would be nice to be able to generate audio not just from the UI.
Thanks
The text was updated successfully, but these errors were encountered: