
Does this include a way of using it from an API? #3

Closed
Madd0g opened this issue Dec 14, 2023 · 22 comments
Labels: question (Further information is requested)

Comments

Madd0g commented Dec 14, 2023

Can I call some API to generate audio?

I'm already using the textgen UI as a hub for other tools to connect to. It would be nice to be able to generate audio from places other than the UI.

Thanks

erew123 commented Dec 15, 2023

Yes, you can via JSON calls. I haven't written much on this yet, but the functionality is there. It also operates as a standalone app if you don't want to start AllTalk as part of text-generation-webui, e.g. run python script.py at the command prompt in the AllTalk folder (though you need a Python environment loaded, of course).

If you look at the CURL commands section in the built-in documentation, you can see the basics of using CURL at the command line. Those same requests can obviously be sent as JSON calls from any software you can get to make JSON calls.

It wants these four bits of info sent over in a JSON or CURL command:

"text": text,
"voice": voice,
"language": language,
"output_file": output_file,

Text is the block of text you want to generate. Voice is the voice sample name from the voices folder. Language is in the form en etc. (look at the languages.json file for details). And output_file is the wav file name it will create in the output folder. It's not sending the wav file over the API, just dumping it in the folder and returning a success message with the path to the file.

So, you would have to figure out a way to make it play back. That said, the other day I did think about making it speak things as standard when you do things like change the model in text-generation-webui. I just haven't looked at whether this will work yet; I think the limitation is that the text-generation-webui page has to be at the forefront of your screen. I've just not tested all this out yet.

As of this time, however, anything sent over to /api/generate will not go through any text filtering, so preferably you will need to:

  1. Clean your text. Basically there's no need for any punctuation characters other than full stops and commas. Everything else seems to potentially cause it to generate strange sounds on occasion. You don't have to do this, but you might occasionally get a strange noise if not (a rough cleaning sketch in Python follows after this list). So I would send things over like this:

Original Text: *This is text.* "This is quoted text." This is just standard text!

into:

New String: This is text. This is quoted text. This is just standard text.

  2. As it's not going through my filtering, it's not automatically going to split out text between narrator/character. I can, and had debated, setting up a separate API call for this, though. So if you really need this, ask me. It's just not high on my agenda yet. Obviously you can control the voice sample used for each text generation you send by changing the "voice" sample it uses.
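
Here is a minimal sketch of that kind of cleanup in Python. It is illustrative only, not part of AllTalk; exactly which characters you strip or replace is up to you:

import re

def clean_for_tts(text: str) -> str:
    # Strip asterisks, quotes, underscores and backticks.
    text = re.sub(r'[*"“”_`]', '', text)
    # Replace other sentence-ending punctuation with a full stop.
    text = re.sub(r'[!?;:]', '.', text)
    # Collapse any doubled-up whitespace left behind.
    return re.sub(r'\s+', ' ', text).strip()

print(clean_for_tts('*This is text.* "This is quoted text." This is just standard text!'))
# -> This is text. This is quoted text. This is just standard text.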

The only other note is that I don't know what the limits are on how much text you can send at one time. Obviously, though, the more text you send in one go, the longer it's going to take to generate.

Screenshot from the built-in instructions (you can copy/paste these from the instructions page): [screenshot]

Example of a Python call, if you had something with Python:

import requests

base_url = "http://127.0.0.1:7851"  # default AllTalk address/port

def send_generate_request(text, voice, language, output_file):
    url = f"{base_url}/api/generate"
    payload = {
        "text": text,
        "voice": voice,
        "language": language,
        "output_file": output_file,
    }
    headers = {"Content-Type": "application/json"}
    response = requests.post(url, json=payload, headers=headers)
    return response.json()
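
For example, a hypothetical call (the exact keys in the returned JSON depend on your AllTalk version, so the response is just printed here rather than assuming field names):

result = send_generate_request(
    "This is text to generate as TTS", "female_01.wav", "en", "outputfile.wav"
)
print(result)  # success message plus the path to the generated wav in the output folder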

Probably more information than you needed/wanted... but hopefully this gives you a starting point!

Thanks

Madd0g commented Dec 15, 2023

what a great detailed answer, thank you! I will try.

On a slightly unrelated note: it seems like whatever call goes through to the chat completions API results in audio generation too?

So maybe my question is a little moot, because I thought I could have the plugin turned on but only generate audio sometimes, for only some of the calls.

erew123 commented Dec 15, 2023

Yeah, I noticed something similar myself the other day. It shouldn't do that, but I've had no time to look at it yet. I would imagine that unchecking "Activate TTS" in the text-gen-webui interface will stop it doing that. Though at some point I'll have to do something with the API on it. As I mentioned, I have yet to refine those; in all honesty, I didn't think anyone would be using them just yet.

Let me know how you get on and I can take a look some time (no promises, as I need a break at some point), but if it's nice and easy to do I might be able to push something through :)

erew123 commented Dec 15, 2023

Oh, I probably didn't mention, you can run it as a standalone app. It doesn't have to be run through text-generation-webui.

So you can use it that way if that works better for you, just literally python script.py.

If you want a separate install for it but want to use the same models, just follow the custom model setup in the docs and you can point one of the installs at the other's model folder.

erew123 commented Dec 16, 2023

FYI, I'm probably 70% of the way through a new API, one that has ALL options within it. I may even include a "play the wav file" option through the actual terminal window (no web page etc.). Otherwise, I already have it feeding back the location of the wav file as a path, e.g. c:\folder\folder\folder\myaudio.wav, and a web page address, e.g. http://127.0.0.1/audio/myaudio.wav.

So if I include the ability for it to play within the console session, you can either make it auto-play the wav OR pull the details into your own application, either as a file path or web address, and play it from there (a rough sketch of that follows below). No promises on when I'll have this finished.
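
To illustrate the "pull the details into your own application" route, here is a minimal sketch, assuming you already have a returned web address for a generated file (the URL and port below are just taken from the examples in this thread, so treat them as assumptions):

import requests

# Hypothetical web address returned by AllTalk for a generated wav file.
audio_url = "http://127.0.0.1:7851/audio/myaudio.wav"

# Download the wav and save it locally; play it back with whatever audio library you prefer.
wav_bytes = requests.get(audio_url).content
with open("myaudio.wav", "wb") as f:
    f.write(wav_bytes)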

mercuryyy commented:

@erew123

Amazing work on this extension, much appreciated and well thought out!

How can we use the regular Webui API

import requests

url = "http://127.0.0.1:5000/v1/chat/completions"

headers = {"Content-Type": "application/json"}
history = []
while True:
    user_message = input("> ")
    history.append({"role": "user", "content": user_message})
    data = {
        "mode": "chat",
        "character": "assistant_1",
        "messages": history,
        "preset": "LLaMA-Precise",
        "stream": False,
    }
    ....

To simply get back the file path of the audio file, like it's listed in the WebUI chat from a normal prompt call?

Seems like the API breaks.

erew123 commented Dec 17, 2023

@mercuryyy You won't get it on the normal API of text-generation-webui, but I am working on a whole new API for interacting with it (as mentioned above). This one is for external requests to AllTalk, so say you have your own software and you want it to generate audio, then give you a file path or http://ipaddress:port/audio/myfile.wav.

But are you saying you want to pull the information from your actual chat/conversation within text-generation-webui? E.g. you are chatting within the interface of text-generation-webui and all the details in there are posted back out?

mercuryyy commented:

Yes, exactly. Of course we can do it as a side script, but then we would be making two API requests.

I think a good option to have would be to be able to chat with text-generation-webui via the http://127.0.0.1:5000/v1/chat/completions endpoint and get any of the TTS model responses if they are enabled.

Right now, if you have TTS enabled and you make a request to "v1/chat/completions", the script will error. I posted it here in detail: oobabooga/text-generation-webui#4944

erew123 commented Dec 17, 2023

I don't doubt it's something that could be done, but taking a quick look, it's something Oobabooga would probably have to do, as it's a core change to the API of text-generation-webui, whereas the bits I'm doing are just within my own code and I have no way to feed into the main API of text-generation-webui and make it send out the details of a file/web address. I'm assuming you've looked around the API for text-gen https://github.com/oobabooga/text-generation-webui/wiki/12-%E2%80%90-OpenAI-API#compatibility--not-so-compatibility (it says audio is supported) and the actual OpenAI API too https://platform.openai.com/docs/api-reference/audio

I'm sure it's absolutely possible to achieve, it's just not within my code base to do that. However, if the API from OpenAI is a one-to-one copy, there's no direct "just output the file name" option that I can see as a standalone, but as I say, that's not to say it can't be done.

As you said, though, if you are creating/using an application, say like SillyTavern, to interact with text-generation-webui to get your text in and out, then yes, you will be able to make a follow-on request to AllTalk's API to generate the TTS output, and that will feed you back both path and webpage addresses. It will also allow you to control/change the voice you are using, use narration, filter/clean or not filter the text, choose whether to name your file something specific, and choose whether to keep stacking up wav files or keep overwriting just the one. (A rough two-step sketch follows below.)
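
To illustrate that two-step flow, here is a minimal sketch in Python. It assumes text-generation-webui's OpenAI-compatible endpoint on port 5000 and AllTalk's /api/generate on port 7851 (both as shown earlier in this thread), and that the chat reply comes back in the standard OpenAI choices/message shape:

import requests

def chat_then_tts(user_message, history):
    # 1) Get the assistant's reply from text-generation-webui's OpenAI-compatible API.
    history.append({"role": "user", "content": user_message})
    chat = requests.post(
        "http://127.0.0.1:5000/v1/chat/completions",
        json={"mode": "chat", "messages": history, "stream": False},
        headers={"Content-Type": "application/json"},
    ).json()
    reply = chat["choices"][0]["message"]["content"]
    history.append({"role": "assistant", "content": reply})

    # 2) Follow-on request to AllTalk to turn that reply into speech.
    tts = requests.post(
        "http://127.0.0.1:7851/api/generate",
        json={
            "text": reply,
            "voice": "female_01.wav",
            "language": "en",
            "output_file": "outputfile.wav",
        },
        headers={"Content-Type": "application/json"},
    ).json()
    return reply, tts  # tts holds the success message/path described above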

mercuryyy commented:

Yeah, good point. I tried /v1/audio/ and it didn't respond at all, but I think it's more of a base API for simple TTS.

What I would probably end up doing is just editing the "v1/chat/completions" output to first call your API before the JSON response. It will be basically the same thing :)

rktvr commented Dec 20, 2023

I'm not able to get the Windows curl examples to work, not with the built-in curl from Windows or the most recent one. Using the example curl command nets this error:
curl: (3) URL rejected: Malformed input to a URL function curl: (3) unmatched close brace/bracket in URL position 70: text, voice: female_07.wav, language: en, output_file: outputfile.wav} ^

erew123 commented Dec 20, 2023

Oh, it's the HTML formatting missing out the 's (will correct it). The line would be:

curl -X POST -H "Content-Type: application/json" -d "{\"text\": \"This is text to generate as TTS\", \"voice\": \"female_01.wav\", \"language\": \"en\", \"output_file\": \"outputfile.wav\"}" http://127.0.0.1:7851/api/generate

rktvr commented Dec 20, 2023

Turns out the error only happens in PowerShell; it works fine in Command Prompt. It looks like PowerShell doesn't escape properly or causes some really odd formatting problems, but that appears to have been the issue (as well as the missing '). It works fine with your above curl command in cmd, it just plain doesn't work in PowerShell, but that's fine tbh. This is what I get for getting too used to PowerShell, hah.

erew123 commented Dec 20, 2023

@rktvr I've updated the code now, so you can update if you want it to look correct/be able to copy/paste from the built-in docs :)

rktvr commented Dec 20, 2023

yeah looks good now

erew123 added the "question" label (Further information is requested) on Dec 21, 2023

erew123 commented Dec 21, 2023

For anyone who wishes to try the new AllTalk API, the tts_server.py can be downloaded from the DEV branch here:

https://github.com/erew123/alltalk_tts/blob/dev/tts_server.py

You can save it over your existing tts_server.py file.

It's all roughly documented in the online documentation. It now has a web output address. The command line is a bit more unwieldy, but it does give a lot more flexibility.

I have tested it, but I can't say I've spent 3+ hours testing it.

erew123 commented Dec 24, 2023

Guys, as I've not heard back from you, I'm closing this for now. The new version of AllTalk has been released, now with finetuning of the model with any voice you like! https://github.com/erew123/alltalk_tts?tab=readme-ov-file#-finetuning-a-model

erew123 closed this as completed on Dec 24, 2023

erew123 commented Dec 29, 2023

@Madd0g Not sure if this covers what you were interested in, but the API suite is now leaps and bounds from where it was two weeks ago. It now supports the option to play the generated TTS through the command prompt where the script is running/your audio device on that machine. You also get a JSON response giving you the file location on disk and also a web address (if you want to play back the sound somewhere else). Full details are here https://github.com/erew123/alltalk_tts?#-api-suite-and-json-curl

Madd0g commented Dec 30, 2023

@erew123 that's great, thank you. I'll try it.

The performance wasn't too great for me in interactive chat (too slow to generate on my Mac); hopefully I'll find creation via the API a little more useful.

Can it be used fully separately now and not generate voice for every API completion call?

erew123 commented Dec 30, 2023

@Madd0g Well, you have multiple options if you don't want it interacting with text-generation-webui (e.g. running it standalone or unchecking "Activate TTS", as mentioned above). Beyond that, the API is pretty flexible.

Out of curiosity, are you on an M-series Mac? I might be able to do something to get the generation running much faster! I would have to change some bits in the code.

https://sebastianraschka.com/blog/2022/pytorch-m1-gpu.html
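
For reference, the gist of that article is PyTorch's MPS backend for Apple Silicon. A minimal device check looks something like this (illustrative only, not AllTalk code):

import torch

# Prefer Apple's Metal Performance Shaders backend when available, otherwise fall back to CPU.
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")
print(f"Using device: {device}")

# Tensors and models move to that device with the usual .to(device) call.
x = torch.randn(3, 3).to(device)
print(x.device)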

Madd0g commented Dec 30, 2023

> Out of curiosity, are you on an M-series Mac? I might be able to do something to get the generation running much faster! I would have to change some bits in the code.

Yes I am. That would be great; I was kind of disappointed at the speed, it took at least 4-5x longer than the text to generate.

I don't have a lot of time these days, but I'll test it when I can.

erew123 commented Dec 30, 2023

@Madd0g It will take me a while to write the code to do it, and I'm not sure if I'll get there today. But I'll let you know when I do.
