
Does this include a way of using it from an API? #3

Closed
Madd0g opened this issue Dec 14, 2023 · 22 comments
Labels: question (Further information is requested)

Comments

Madd0g commented Dec 14, 2023

Can I call some API to generate audio?

I'm already using the textgen UI as a hub for other tools to connect to. It would be nice to be able to generate audio from places other than the UI.

Thanks

erew123 commented Dec 15, 2023

Yes, you can via JSON calls. I haven't written much on this yet, but the functionality is there. It also operates as a standalone app if you don't want to start AllTalk as part of text-generation-webui, e.g. run python script.py at the command prompt in the AllTalk folder (though you need a Python environment loaded, of course).

If you look at the CURL commands section in the built-in documentation, you can see the basics of using CURL at the command line. Those same requests can obviously be sent as JSON calls from any software you can get to make JSON calls.

It wants these four bits of info sent over in a JSON or CURL command:

"text": text,
"voice": voice,
"language": language,
"output_file": output_file,

Text is the block of text you want to generate. Voice is the voice sample name from the voices folder. Language is in the form en etc. (look at the languages.json file for details). And output_file is the wav file name it will create in the output folder. It's not sending the wav file over the API, just dumping it in the folder and returning a success message with the path to the file.

So, you would have to figure out a way to make it play back. That said, the other day I did think about making it speak things as standard when you do things like change the model in text-generation-webui. I just haven't looked at whether this will work yet; I think the limitation is that the text-generation-webui page has to be at the forefront of your screen. I've just not tested all this out yet.

As of this time, however, anything sent over to /api/generate will not go through any text filtering, so preferably you will need to:

  1. Clean your text. Basically there's no need for any punctuation characters other than full stops and commas. Everything else seems to potentially cause it to generate strange sounds on occasion. You don't have to do this, but you might occasionally get a strange noise if not (a rough cleaning sketch in Python follows after this list). So I would send things over like this:

Original Text: *This is text.* "This is quoted text." This is just standard text!

into:

New String: This is text. This is quoted text. This is just standard text.

  2. As it's not going through my filtering, it's not automatically going to split out text between narrator/character. I can, and had debated, setting up a separate API call for this, though. So if you really need this, ask me. It's just not high on my agenda yet. Obviously you can control the voice sample used for each text generation you send by changing the "voice" sample it uses.
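
Here is a minimal sketch of that kind of cleanup in Python. It is illustrative only, not part of AllTalk; exactly which characters you strip or replace is up to you:

import re

def clean_for_tts(text: str) -> str:
    # Strip asterisks, quotes, underscores and backticks.
    text = re.sub(r'[*"“”_`]', '', text)
    # Replace other sentence-ending punctuation with a full stop.
    text = re.sub(r'[!?;:]', '.', text)
    # Collapse any doubled-up whitespace left behind.
    return re.sub(r'\s+', ' ', text).strip()

print(clean_for_tts('*This is text.* "This is quoted text." This is just standard text!'))
# -> This is text. This is quoted text. This is just standard text.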

The only other note is that I don't know what the limits are on how much text you can send at one time. Obviously, though, the more text you send in one go, the longer it's going to take to generate.

Screenshot from the built-in instructions (you can copy/paste these from the instructions page): [screenshot]

Example of a Python call, if you had something with Python:

import requests

base_url = "http://127.0.0.1:7851"  # default AllTalk address/port

def send_generate_request(text, voice, language, output_file):
    url = f"{base_url}/api/generate"
    payload = {
        "text": text,
        "voice": voice,
        "language": language,
        "output_file": output_file,
    }
    headers = {"Content-Type": "application/json"}
    response = requests.post(url, json=payload, headers=headers)
    return response.json()
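
For example, a hypothetical call (the exact keys in the returned JSON depend on your AllTalk version, so the response is just printed here rather than assuming field names):

result = send_generate_request(
    "This is text to generate as TTS", "female_01.wav", "en", "outputfile.wav"
)
print(result)  # success message plus the path to the generated wav in the output folder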

Probably more information than you needed/wanted... but hopefully this gives you a starting point!

Thanks

Madd0g commented Dec 15, 2023

what a great detailed answer, thank you! I will try.

On a slightly unrelated note: it seems like whatever call goes through to the chat completions API results in audio generation too?

So maybe my question is a little moot, because I thought I could have the plugin turned on but only generate audio sometimes, for only some of the calls.

erew123 commented Dec 15, 2023

Yeah, I noticed something similar myself the other day. It shouldn't do that, but I've had no time to look at it yet. I would imagine that unchecking "Activate TTS" in the text-gen-webui interface will stop it doing that. Though at some point I'll have to do something with the API on it. As I mentioned, I have yet to refine those; in all honesty, I didn't think anyone would be using them just yet.

Let me know how you get on and I can take a look some time (no promises, as I need a break at some point), but if it's nice and easy to do I might be able to push something through :)

erew123 commented Dec 15, 2023

Oh, I probably didn't mention, you can run it as a standalone app. It doesn't have to be run through text-generation-webui.

So you can use it that way if that works better for you, just literally python script.py.

If you want a separate install for it but want to use the same models, just follow the custom model setup in the docs and you can point one of the installs at the other's model folder.

erew123 commented Dec 16, 2023

FYI, I'm probably 70% of the way through a new API, one that has ALL options within it. I may even include a "play the wav file" option through the actual terminal window (no web page etc.). Otherwise, I already have it feeding back the location of the wav file as a path, e.g. c:\folder\folder\folder\myaudio.wav, and a web page address, e.g. http://127.0.0.1/audio/myaudio.wav.

So if I include the ability for it to play within the console session, you can either make it auto-play the wav OR pull the details into your own application, either as a file path or web address, and play it from there (a rough sketch of that follows below). No promises on when I'll have this finished.
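
To illustrate the "pull the details into your own application" route, here is a minimal sketch, assuming you already have a returned web address for a generated file (the URL and port below are just taken from the examples in this thread, so treat them as assumptions):

import requests

# Hypothetical web address returned by AllTalk for a generated wav file.
audio_url = "http://127.0.0.1:7851/audio/myaudio.wav"

# Download the wav and save it locally; play it back with whatever audio library you prefer.
wav_bytes = requests.get(audio_url).content
with open("myaudio.wav", "wb") as f:
    f.write(wav_bytes)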

mercuryyy commented:

@erew123

Amazing work on this extension, much appreciated and well thought out!

How can we use the regular Webui API

import requests

url = "http://127.0.0.1:5000/v1/chat/completions"

headers = {"Content-Type": "application/json"}
history = []
while True:
    user_message = input("> ")
    history.append({"role": "user", "content": user_message})
    data = {
        "mode": "chat",
        "character": "assistant_1",
        "messages": history,
        "preset": "LLaMA-Precise",
        "stream": False,
    }
    ....

To simply get back the file path of the audio file, like it's listed in the WebUI chat from a normal prompt call?

Seems like the API breaks.

erew123 commented Dec 17, 2023

@mercuryyy You won't get it on the normal API of text-generation-webui, but I am working on a whole new API for interacting with it (as mentioned above). This one is for external requests to AllTalk, so say you have your own software and you want it to generate audio, then give you a file path or http://ipaddress:port/audio/myfile.wav.

But are you saying you want to pull the information from your actual chat/conversation within text-generation-webui? E.g. you are chatting within the interface of text-generation-webui and all the details in there are posted back out?

mercuryyy commented:

Yes, exactly. Of course we can do it as a side script, but then we would be making two API requests.

I think a good option to have would be to be able to chat with text-generation-webui via the http://127.0.0.1:5000/v1/chat/completions endpoint and get any of the TTS model responses if they are enabled.

Right now, if you have TTS enabled and you make a request to "v1/chat/completions", the script will error. I posted it here in detail: oobabooga/text-generation-webui#4944

erew123 commented Dec 17, 2023

I don't doubt it's something that could be done, but taking a quick look, it's something Oobabooga would probably have to do, as it's a core change to the API of text-generation-webui, whereas the bits I'm doing are just within my own code and I have no way to feed into the main API of text-generation-webui and make it send out the details of a file/web address. I'm assuming you've looked around the API for text-gen https://github.com/oobabooga/text-generation-webui/wiki/12-%E2%80%90-OpenAI-API#compatibility--not-so-compatibility (it says audio is supported) and the actual OpenAI API too https://platform.openai.com/docs/api-reference/audio

I'm sure it's absolutely possible to achieve, it's just not within my code base to do that. However, if the API from OpenAI is a one-to-one copy, there's no direct "just output the file name" option that I can see as a standalone, but as I say, that's not to say it can't be done.

As you said, though, if you are creating/using an application, say like SillyTavern, to interact with text-generation-webui to get your text in and out, then yes, you will be able to make a follow-on request to AllTalk's API to generate the TTS output, and that will feed you back both path and webpage addresses. It will also allow you to control/change the voice you are using, use narration, filter/clean or not filter the text, choose whether to name your file something specific, and choose whether to keep stacking up wav files or keep overwriting just the one. (A rough two-step sketch follows below.)
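
To illustrate that two-step flow, here is a minimal sketch in Python. It assumes text-generation-webui's OpenAI-compatible endpoint on port 5000 and AllTalk's /api/generate on port 7851 (both as shown earlier in this thread), and that the chat reply comes back in the standard OpenAI choices/message shape:

import requests

def chat_then_tts(user_message, history):
    # 1) Get the assistant's reply from text-generation-webui's OpenAI-compatible API.
    history.append({"role": "user", "content": user_message})
    chat = requests.post(
        "http://127.0.0.1:5000/v1/chat/completions",
        json={"mode": "chat", "messages": history, "stream": False},
        headers={"Content-Type": "application/json"},
    ).json()
    reply = chat["choices"][0]["message"]["content"]
    history.append({"role": "assistant", "content": reply})

    # 2) Follow-on request to AllTalk to turn that reply into speech.
    tts = requests.post(
        "http://127.0.0.1:7851/api/generate",
        json={
            "text": reply,
            "voice": "female_01.wav",
            "language": "en",
            "output_file": "outputfile.wav",
        },
        headers={"Content-Type": "application/json"},
    ).json()
    return reply, tts  # tts holds the success message/path described above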

mercuryyy commented:

Yeah, good point. I tried /v1/audio/ and it didn't respond at all, but I think it's more of a base API for simple TTS.

What I would probably end up doing is just editing the "v1/chat/completions" output to first call your API before the JSON response. It will be basically the same thing :)

rktvr commented Dec 20, 2023

I'm not able to get the Windows curl examples to work, not with the built-in curl from Windows or the most recent one. Using the example curl command nets this error:
curl: (3) URL rejected: Malformed input to a URL function curl: (3) unmatched close brace/bracket in URL position 70: text, voice: female_07.wav, language: en, output_file: outputfile.wav} ^

erew123 commented Dec 20, 2023

Oh, it's the HTML formatting missing out the 's (will correct it). The line would be:

curl -X POST -H "Content-Type: application/json" -d "{\"text\": \"This is text to generate as TTS\", \"voice\": \"female_01.wav\", \"language\": \"en\", \"output_file\": \"outputfile.wav\"}" http://127.0.0.1:7851/api/generate

rktvr commented Dec 20, 2023

Turns out the error only happens in PowerShell; it works fine in Command Prompt. It looks like PowerShell doesn't escape properly or causes some really odd formatting problems, but that appears to have been the issue (as well as the missing '). It works fine with your above curl command in cmd, it just plain doesn't work in PowerShell, but that's fine tbh. This is what I get for getting too used to PowerShell, hah.

erew123 commented Dec 20, 2023

@rktvr I've updated the code now, so you can update if you want it to look correct/be able to copy/paste from the built-in docs :)

rktvr commented Dec 20, 2023

yeah looks good now

erew123 added the "question" label (Further information is requested) on Dec 21, 2023

erew123 commented Dec 21, 2023

For anyone who wishes to try the new AllTalk API, the tts_server.py can be downloaded from the DEV branch here:

https://github.com/erew123/alltalk_tts/blob/dev/tts_server.py

You can save it over your existing tts_server.py file.

It's all roughly documented in the online documentation. It now has a web output address. The command line is a bit more unwieldy, but it does give a lot more flexibility.

I have tested it, but I can't say I've spent 3+ hours testing it.

erew123 commented Dec 24, 2023

Guys, as I've not heard back from you, I'm closing this for now. The new version of AllTalk has been released, now with finetuning of the model with any voice you like! https://github.com/erew123/alltalk_tts?tab=readme-ov-file#-finetuning-a-model

erew123 closed this as completed on Dec 24, 2023

erew123 commented Dec 29, 2023

@Madd0g Not sure if this covers what you were interested in, but the API suite is now leaps and bounds from where it was two weeks ago. It now supports the option to play the generated TTS through the command prompt where the script is running/your audio device on that machine. You also get a JSON response giving you the file location on disk and also a web address (if you want to play back the sound somewhere else). Full details are here https://github.com/erew123/alltalk_tts?#-api-suite-and-json-curl

Madd0g commented Dec 30, 2023

@erew123 that's great, thank you. I'll try it.

The performance wasn't too great for me in interactive chat (too slow to generate on my Mac); hopefully I'll find creation via the API a little more useful.

Can it be used fully separately now and not generate voice for every API completion call?

erew123 commented Dec 30, 2023

@Madd0g Well, you have multiple options if you don't want it interacting with text-generation-webui (e.g. running it standalone or unchecking "Activate TTS", as mentioned above). Beyond that, the API is pretty flexible.

Out of curiosity, are you on an M-series Mac? I might be able to do something to get the generation running much faster! I would have to change some bits in the code.

https://sebastianraschka.com/blog/2022/pytorch-m1-gpu.html
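
For reference, the gist of that article is PyTorch's MPS backend for Apple Silicon. A minimal device check looks something like this (illustrative only, not AllTalk code):

import torch

# Prefer Apple's Metal Performance Shaders backend when available, otherwise fall back to CPU.
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")
print(f"Using device: {device}")

# Tensors and models move to that device with the usual .to(device) call.
x = torch.randn(3, 3).to(device)
print(x.device)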

Madd0g commented Dec 30, 2023

> Out of curiosity, are you on an M-series Mac? I might be able to do something to get the generation running much faster! I would have to change some bits in the code.

Yes I am. That would be great; I was kind of disappointed at the speed, it took at least 4-5x longer than the text to generate.

I don't have a lot of time these days, but I'll test it when I can.

erew123 commented Dec 30, 2023

@Madd0g It will take me a while to write the code to do it, and I'm not sure if I'll get there today. But I'll let you know when I do.
