AllTalk v2 BETA Download Details & Discussion #245

erew123 · 2024-06-06T15:41:42Z

Maintainer

I'm intending this to be a simple place for people to discuss the v2 BETA. If there is something you want to discuss or a small technical issue, we can do that here. If you have a big technical issue, where there will probably be crashes/logs and lots to discuss, let's open a ticket for that.

For those who want to try it, the BETA is here https://github.com/erew123/alltalk_tts/tree/alltalkbeta (sorry, the instructions are a bit rough and ready at the moment).

This is not a direct update over V1, as you would need to delete/rebuild some of the Python environment (it's completely possible, I just don't have time to explain ATM).

Since uploading, I have tested a fresh download/setup on Windows as a standalone installation. That seems to be fine.

Over the last 2-3 days I had to change some code for when AllTalk is installed into Text-generation-webui's Python environment. I'm reasonably sure that all should be OK if you install AllTalk that way, but I've not managed to do hours of testing.

With the Linux installation, I had a lot of difficulties with Nvidia's own packages breaking other Nvidia packages when they installed {insert swear words here}. As such, this took me about 40-50 installation tests on Linux, plus writing my own fix to repair some symlinks during the installation. Generally all should be good with Linux installs, and now you know why I was delayed (40-50 installs x 20 minutes each + troubleshooting), but there was no point pushing something out that might cause problems down the line.

Otherwise:

Please read the installation instructions on the BETA page.
Please pay attention to anything you are told in the terminal/command prompt, especially on start-up, or if you think something isn't doing what it should be doing.
Much of the documentation is built in, but I haven't had time to flesh it out and make everything sound nice and fluffy.

Now that the core of AllTalk is rebuilt, I hope to just make sure things are stable, fix any issues or clear up any documentation before adding in other features/engines etc that I've not had time to yet.

In effect though, the code base should be mostly stable now, so if people want to look at adding/changing any code, that should be ok.

FYI, for those who are interested in trying to get AMD, Intel or Mac acceleration working, you're welcome to give it a go. Obviously different TTS engines will be capable of supporting different things. However, all the AllTalk code that talks to a TTS engine is now broken out per engine, which you can find in /system/tts_engines/{enginename}/. You can simply work on one of those, OR even just copy one and add it to the tts_engines.json list as an engine to test/work with. There is no need to touch or interact with any other code.
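For anyone wanting to script that copy-and-register step, here is a minimal sketch. It assumes tts_engines.json keeps its engine list under a key called engines_available, with each entry holding name and selected_model fields; those key names are assumptions, so check them against your own copy of the file first. Files inside the copied folder may still refer to the original engine name, and this sketch does not touch them.

```python
import json
import shutil
from pathlib import Path


def clone_engine(engines_dir: Path, template: str, new_name: str, registry_file: Path) -> Path:
    """Copy an existing engine folder and register the copy in tts_engines.json.

    ASSUMPTION: the registry stores engines as a list under "engines_available",
    and each entry has "name" and "selected_model" fields.
    """
    src = engines_dir / template
    dst = engines_dir / new_name
    if not src.is_dir():
        raise FileNotFoundError(f"template engine folder not found: {src}")
    if dst.exists():
        raise FileExistsError(f"engine folder already exists: {dst}")

    registry = json.loads(registry_file.read_text(encoding="utf-8"))
    engines = registry.setdefault("engines_available", [])  # ASSUMED key name
    if any(entry.get("name") == new_name for entry in engines):
        raise ValueError(f"engine '{new_name}' is already registered")

    # Copy the whole engine folder; files inside may still reference the old name.
    shutil.copytree(src, dst)
    engines.append({"name": new_name, "selected_model": ""})  # ASSUMED fields
    registry_file.write_text(json.dumps(registry, indent=4), encoding="utf-8")
    return dst
```

Usage would look like clone_engine(Path("system/tts_engines"), "xtts", "xtts_test", Path("system/tts_engines/tts_engines.json")), where the location of tts_engines.json is itself a guess. Copying rather than editing in place keeps the working engine untouched while you experiment.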

I welcome general feedback here, but if you have a big technical problem, let's do that through a ticket and try to keep this thread cleaner for general discussion.

The Feature Requests List of things I haven't gotten around to is here

I've literally spent X days in front of a computer screen, 10+ hours each day, so do excuse me taking a day or two's break (aka, my responses may be slow).

One Piper voice (that I know of), `en_US-ryan-xxxxxx.onnx`, has issues. It's a known issue with that voice, not AllTalk. It sometimes works and sometimes speaks a garbled mess.

Thanks

mercuryyy · 2024-06-07T05:13:46Z

Installing and testing now. Amazing job! Will post results/comments soon.

mercuryyy Jun 7, 2024

I was able to get RVC to work. Can you recommend some default RVC voice models to download that would improve XTTS in general?

erew123 Jun 7, 2024
Maintainer Author

RE "[AllTalk ENG] Warning: Model 'tts_models--en--jenny--jenny' does not match any known model type."

That's just AllTalk trying to load a default model when none exists, hence the warning. But noted. I did tidy up tts_engines.json, but it didn't upload correctly, so I've just uploaded another copy. Will tidy that up at some point.

When a model has been downloaded, you should be able to use the Refresh Server Settings button to update the dropdowns. Worst case, Swap TTS Engine is the "Have you tried turning it off and on again" of AllTalk, so it should always get you back to square one.

Where can we download some default RVC voices just for testing?

In the documentation there's a link to https://voice-models.com/, with 74,000 voices (so it claims).

erew123 Jun 7, 2024
Maintainer Author

Quick note: for RVC, Hop length should probably be 130 and Index influence 0.75.

mercuryyy Jun 7, 2024

So for VITS I did all that. I manually checked, and tts_models--en--jenny--jenny is in the models dir, but it's still giving the error.

Yeah, I saw https://voice-models.com/. Thank you!

erew123 Jun 10, 2024
Maintainer Author

Hi @mercuryyy With the VITS tts engine loaded, you should have any VITS models listed in the "Load Different Model" dropdown. Select a VITS model in there and click the button and see if that resolves it for you.

Jxspa · 2024-06-07T10:01:45Z

All good here (Windows 10). Thank you for your hard work. It looks great! Love being able to easily switch between finetuned XTTS models.

bollerdominik · 2024-06-08T07:48:23Z

Thanks for your work. Testing it on RunPod Ubuntu.

Installation worked fine but running it I get

The "Running in Docker" message is strange, as I don't have Docker installed.

After manually editing script.py I got it to work. Unfortunately I can't get DeepSpeed to work. It says it is installed (DeepSpeed version : 0.14.2+cu121torch2.2), but all requests run with DeepSpeed: False and it is not clear how to enable it.

I can't get the Gradio UI to work, since RunPod creates a Cloudflare Tunnel and, afaik, there is no way to specify a custom API/Gradio domain during the AllTalk setup.

I still hope a future version of AllTalk can integrate nicely with running the application in the cloud (RunPod, Colab, etc.).

erew123 Jun 10, 2024
Maintainer Author

Hi @bollerdominik Colab and Docker/RunPod all need a few minor changes, though I'll be working on Colab first. I'm hoping this will be easy, but it could be another 10-20+ hours of testing various things, so I didn't want to delay getting the BETA out. Hopefully I'll have an update on this soon though.

Re DeepSpeed: that is set on a per-engine basis (where engines support it).

You need to enable it there and then reload the TTS engine or model.

Thanks

bollerdominik Jun 10, 2024

I can't use the Gradio UI because of the issue mentioned above. Is there any manual way to enable DeepSpeed? I tried updating confignew.json but it still shows

  "deepspeed_capable": true,
  "deepspeed_available": true,
  "deepspeed_enabled": false,

in the API.

Edit: Managed to enable it by reading the code and calling /api/deepspeed?new_deepspeed_value=true. Now DeepSpeed works.
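For anyone else on a headless or tunnelled setup, here is a minimal sketch of that call using only Python's standard library. It assumes the API is on the default 127.0.0.1:7851 shown in the startup log, and that the endpoint accepts a POST with the value in the query string (as it did in AllTalk v1); check both against the API documentation in the interface before relying on it.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def deepspeed_request(enable: bool, host: str = "127.0.0.1", port: int = 7851) -> Request:
    """Build the request that toggles DeepSpeed via /api/deepspeed.

    ASSUMPTION: the endpoint takes a POST with new_deepspeed_value in the query string.
    """
    query = urlencode({"new_deepspeed_value": str(enable).lower()})
    return Request(f"http://{host}:{port}/api/deepspeed?{query}", method="POST")


def set_deepspeed(enable: bool, host: str = "127.0.0.1", port: int = 7851, timeout: float = 120.0) -> str:
    """Send the request and return the server's response body as text.

    A generous timeout is used because switching DeepSpeed may reload the model.
    """
    with urlopen(deepspeed_request(enable, host, port), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With AllTalk running, print(set_deepspeed(True)) should show the server's reply; if your tunnel exposes the API on a different host or port, pass those instead.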

erew123 Jun 10, 2024
Maintainer Author

@bollerdominik I've updated the code today, so if you can get a second tunnel working to the server, you will have access to the Gradio interface. The tunnel will need to pass to port 7852. Gradio will be accessible, but you will not be able to generate TTS in Gradio (yet), though it will give you control over the models.

That aside, XTTS is currently the only TTS engine in there that supports DeepSpeed. You can change the XTTS engine setting (the one you would normally change in the interface) by going to /system/tts_engines/xtts/ and editing model_settings.json, changing "deepspeed_enabled": false, to "deepspeed_enabled": true,
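The same edit can be scripted. A minimal sketch: because the exact nesting of model_settings.json isn't shown here, it searches the whole file for the "deepspeed_enabled" key rather than assuming where it sits. Stop AllTalk first (or reload the engine afterwards) so the change is picked up. Note that json.dumps rewrites the file's indentation.

```python
import json
from pathlib import Path
from typing import Any


def set_key_everywhere(node: Any, key: str, value: Any) -> int:
    """Set every occurrence of `key` in nested dicts/lists to `value`; return how many were changed."""
    changed = 0
    if isinstance(node, dict):
        for k in node:
            if k == key:
                node[k] = value  # replacing a value (not adding keys) is safe while iterating
                changed += 1
            else:
                changed += set_key_everywhere(node[k], key, value)
    elif isinstance(node, list):
        for item in node:
            changed += set_key_everywhere(item, key, value)
    return changed


def enable_deepspeed(settings_file: Path, enabled: bool = True) -> int:
    """Flip "deepspeed_enabled" in an engine's model_settings.json and save the file."""
    data = json.loads(settings_file.read_text(encoding="utf-8"))
    changed = set_key_everywhere(data, "deepspeed_enabled", enabled)
    if changed == 0:
        raise KeyError(f'"deepspeed_enabled" not found in {settings_file}')
    settings_file.write_text(json.dumps(data, indent=4), encoding="utf-8")
    return changed
```

For example, enable_deepspeed(Path("system/tts_engines/xtts/model_settings.json")), run from the AllTalk folder.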

Suiyou · 2024-06-09T04:13:15Z

Great work! It installed without issues. Through the interface I was able to download almost everything (except new RVC voices), switch between TTS engines/models, use RVC voices, etc. In my opinion XTTS is still the best free option for speed with decently emotive voices, and coupled with RVC you can make it sound even better. I was already using another project to do that for my TTS Generator output files before joining them.

If you want, take a look at a GitHub project called Applio, which has an integrated search and download for RVC voices. I don't remember which TTS engine it uses, but it's the fastest I've tried, though not as emotive, and without RVC it doesn't sound quite as good. The only downside is that it has a hard limit on the number of lines you can generate per generation.

I primarily use the TTS Generator, but it's nice to set everything up, swap to TTS Gen and keep working with it. Speaking of TTS Gen, it would be convenient if RVC, joining and transcoding were performed at the end, after I finish checking the lines I have to regenerate, to avoid the overhead time (I know I could disable RVC, but then I'd have to process them with another project, as I've been doing so far).

Also, TTS Gen has a bug: after it finishes generating every chunk, it won't enable the export, play, clear, etc. options; you have to regenerate a chunk to get them enabled.

I think there is a bug in the Voice2RVC tab: it won't show me the models I have.

Lastly, I tried the option to analyze the accuracy of the generated audio. It's a great addition, but it's impractically slow on my machine. Instead, I use another Whisper project to transcribe the audio and a Python script to compare the transcriptions with the lines in the JSON file generated by TTS Gen; 1000 files at a time takes me about 15 minutes.

I still need to check all the other new functionality added in this beta, but the project keeps getting better every time I check.

erew123 Jun 10, 2024
Maintainer Author

Hi @Suiyou Thanks for the detailed feedback. With TTSGen, I've not really touched its code yet. I'm hoping it may be possible to pull it into Gradio.

RVC Generation after initially generating your TTS, I can see how that would be useful. It will be a decent extra bit of coding to figure that one out, but I'll think about how to get that working.

Re Voice2RVC, have you got RVC globally enabled? Global Settings > RVC settings? That's the only reason I can think of for the list remaining at Disabled.

Re "TTS Gen has a bug that after it finishes generating every chunk it won't enable the options to export, play, clear, etc.": that one I've not seen happen. Humm, I wonder if that could be a browser-specific issue. What browser do you use?

Re "analyze the accuracy of the generated audio" I know there may be faster options out there depending on how whisper is loaded in or other voice models. I have stored a few links on other ones to look at, though typically they do require a higher memory use. Will be something I re-visit when I look at the TTS gen setup (and possibly general audio transcoding).

Thanks

Suiyou Jun 11, 2024

Hi!

RVC Generation after initially generating your TTS, I can see how that would be useful. It will be a decent extra bit of coding to figure that one out, but I'll think about how to get that working.

Yes, it would be useful, thanks.

Re Voice2RVC, have you got RVC globally enabled? Global Settings > RVC settings? That's the only reason I can think of for the list remaining at Disabled.

Now I can't replicate it. The voices are shown even when I set Global Settings > RVC settings to disabled (of course it throws an error then), but as soon as I enable the option, the conversion goes through without a problem. It may have happened because I initially tried to use symlinks so I didn't have to duplicate those files between the other application I was using and AllTalk.

Re "TTS Gen has a bug that after it finishes generating every chunk it won't enable the options to export, play, clear, etc.": that one I've not seen happen. Humm, I wonder if that could be a browser-specific issue. What browser do you use?

I tested it in Opera and Chrome, but I figured it out: it happens when you select "playback: No playback". Since I usually generate a book's worth of text, I don't need to hear the generations right away. I was about to test it in Firefox too, but I forgot to change the playback type and then saw the difference.

Re "analyze the accuracy of the generated audio" I know there may be faster options out there depending on how whisper is loaded in or other voice models. I have stored a few links on other ones to look at, though typically they do require a higher memory use. Will be something I re-visit when I look at the TTS gen setup (and possibly general audio transcoding).

To test the accuracy I use this GitHub project https://github.com/jhj0517/Whisper-WebUI (the version you can install with Pinokio). Then I open the JSON file, replace the path to each wav so it points at the outputs directory, and use this script:
compare.txt
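The attached compare.txt isn't reproduced here, but a comparison of that kind can be sketched with Python's standard-library difflib. This assumes you already have, per wav file, the expected text and the Whisper transcription; reading them out of the TTS Generator JSON is left out because that file's layout isn't shown. The 0.9 threshold is an arbitrary starting point, not a recommended value.

```python
import re
from difflib import SequenceMatcher


def normalise(text: str) -> str:
    """Lower-case, drop punctuation and collapse whitespace so only the words are compared."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def similarity(expected: str, transcribed: str) -> float:
    """Return a 0..1 similarity ratio between two texts after normalising both."""
    return SequenceMatcher(None, normalise(expected), normalise(transcribed)).ratio()


def flag_mismatches(pairs: dict[str, tuple[str, str]], threshold: float = 0.9) -> list[tuple[str, float]]:
    """Return (wav name, score) for every file scoring below threshold, worst first."""
    scored = [(name, similarity(expected, got)) for name, (expected, got) in pairs.items()]
    flagged = [item for item in scored if item[1] < threshold]
    return sorted(flagged, key=lambda item: item[1])
```

Because the texts are normalised first, differences in capitalisation or punctuation alone won't flag a file; only differences in the words themselves lower the score.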

The one thing I don't know is whether it's possible to teach the model how to pronounce certain words, especially names and made-up terms. Depending on the words, I currently rewrite them so they sound close enough to the real pronunciation.
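That respelling workaround can at least be automated before the text is sent for generation. A minimal sketch: it is plain whole-word text substitution, not a feature of AllTalk or XTTS, and the dictionary entries are made-up examples.

```python
import re


def apply_respellings(text: str, respellings: dict[str, str]) -> str:
    """Replace whole-word, case-insensitive matches with phonetic respellings."""
    # Longest keys first so multi-word entries win over their component words.
    for word in sorted(respellings, key=len, reverse=True):
        pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
        replacement = respellings[word]
        # A function replacement stops backslashes in the respelling being treated as escapes.
        text = pattern.sub(lambda _match: replacement, text)
    return text


# Hypothetical entries: a character name and an invented term.
RESPELLINGS = {"Aoife": "Eefa", "Xylarith": "Zy-lar-ith"}
```

For example, apply_respellings("Aoife met aoife's friend from Xylarith.", RESPELLINGS) gives "Eefa met Eefa's friend from Zy-lar-ith.", so the source text can stay correctly spelled while the TTS input uses the respelling.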

Thanks for the hard work.

erew123 Jun 12, 2024
Maintainer Author

@Suiyou Yes, it may have issues with symlinks... I'm really not sure about Python's handling of that. I do know that Firefox has some quirks playing back audio, so I would assume any browsers built off Firefox may have quirks too. A ticket where I looked into streaming issues is here #143. I haven't explored further to see if there are other quirks (there may be); I just don't have time at the moment to test across all browsers.
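If symlinked model files are in play, Python's own view of them can be checked with a short diagnostic. A symlink whose target is missing still shows up as a directory entry, but Path.exists() follows the link and returns False. A minimal sketch: the *.pth pattern and folder are examples, and this does not show how AllTalk itself scans for RVC models.

```python
import os
from fnmatch import fnmatch
from pathlib import Path


def check_model_links(folder: Path, pattern: str = "*.pth") -> dict[str, list[str]]:
    """Classify matching files under folder as regular files, working symlinks or broken symlinks.

    Symlinked sub-folders are followed; files reached through them count as regular files.
    """
    report: dict[str, list[str]] = {"files": [], "symlinks": [], "broken": []}
    for root, _dirs, names in os.walk(folder, followlinks=True):
        for name in names:
            if not fnmatch(name, pattern):
                continue
            path = Path(root) / name
            if path.is_symlink():
                # exists() follows the link, so it is False when the target is missing.
                key = "symlinks" if path.exists() else "broken"
            else:
                key = "files"
            report[key].append(str(path.relative_to(folder)))
    for entries in report.values():
        entries.sort()
    return report
```

Anything listed under "broken" points at a target Python cannot reach, which would be one explanation for a model not appearing in a dropdown.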

Re "teach the model how to pronounce certain words": I'm assuming you mean XTTS models. In theory yes, but I don't know how much finetuning it would need. You can teach them other/new languages with enough training, so in theory any new word should be possible. You would have to research building a training set for such a thing on Coqui's site.

Dagbafrosty · 2024-06-09T20:24:32Z

Hey, thanks for the beta! One issue I noticed is that I can't access the Gradio page from other devices on the network. The API page and TTS Generator page are accessible using 192.168..., but not Gradio. Not sure if this is an issue on my end. Everything is accessible from the host computer on 127.0 etc.
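To narrow down this kind of issue, here is a minimal standard-library check you can run from the other device. 7851 and 7852 are the API and Gradio ports from the startup log; the LAN address is a placeholder. If 7851 connects but 7852 does not, the Gradio server is most likely listening only on 127.0.0.1 rather than on all interfaces (or the port is blocked by a firewall).

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False
```

For example, compare port_reachable("192.168.x.x", 7851) with port_reachable("192.168.x.x", 7852), substituting the host machine's real LAN address.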

erew123 Jun 10, 2024
Maintainer Author

Hi @Dagbafrosty I know what this will be. It comes back to a change I made for the Colab setups etc. It's not your computer, so don't worry. I will hopefully have an update for this soon and will post back on here.

Thanks

m-eideh · 2024-06-10T11:23:02Z

Hi, thank you for the great work.

I was trying to create a dataset for Arabic language, however, I'm getting the following error:

If I switch the language to English (but still use Arabic audio files), it works fine, generates wavs correctly, and actually translates the sentences in metadata_train & metadata_eval to English.

So it's understanding the language fine, but there seems to be an issue with file/sentence generation in native Arabic.

m-eideh Jun 11, 2024

Thank you for the prompt reply and fix @erew123!

It is now indeed generating wavs and transcribing Arabic correctly! However, there are a couple of major issues with the wav files generation:

Even-sequenced files are being generated in "finetune\tmp-trn\wavs" (e.g. xx_00000002.wav, xx_00000008.wav)
Odd-sequenced files are being generated in "finetune\tmp-trn\wavs\wavs" (e.g. xx_00000009.wav)
The file names don't match the files listed in the metadata CSV files. For example, in metadata_train.csv here:

The first two sentences are listed as "wavs/731_00000002.wav" and "wavs/731_00000004.wav". But when listening to the wav files, the actual sentences are in "wavs/wavs/731_00000001.wav" and "wavs/wavs/731_00000003.wav" respectively.

This is probably related to the first two issues. I'll gladly test more scenarios if needed.

m-eideh Jun 11, 2024

One more thing to note is that all files "finetune\tmp-trn\wavs" are less than 1 second long and are basically unusable, while the ones in "finetune\tmp-trn\wavs\wavs" are the actual usable 15-second files (length which I specified in the settings).

erew123 Jun 12, 2024
Maintainer Author

Hi @m-eideh

I've just (after a very long coding slog) updated the finetuning. You will need to update the requirements to use it: start the Python environment and pip install word2number, then you should be good to run finetuning. The paths and CSVs will (should) match up fine now.

I have added a dataset validation option (you will see the tab). This is a best effort to run Whisper against the originally generated wav samples and compare to see if the wav files match what's inside your csv files. I have no idea how well Whisper will work on other languages either for dataset generation or for this validation. But it may help you out there.

Re "are less than 1 second long and are basically unusable" and "length which I specified in the settings". I've set the code to make wav files a minimum of 1 second long, which should be fine for most cases, its ok to pick up only 1 or 2 words and train on that. The setting for 15 seconds that you set, that is a maximum wav file length, not minimum. The issue here is that the trainer can only handle training a of 12 seconds of audio at a time (or something close to that), so Whisper used to sometimes create wav files that were 2 minutes long and obviously, that 2 minute file would only get processed once per epoch with only 12 seconds of audio from it being used, meaning that a lot of the audio was never trained on with overly large wav files. As such, we split down those larger wav files into smaller files, so that individually each file will have a chance of its audio being used, rather than never used. Hopefully that makes sense. Thanks

m-eideh Jun 13, 2024

After some testing, the issues in my post are pretty much resolved, thank you @erew123!

One last thing I have noticed (maybe this is related to Whisper): when these 1-second wavs are generated, the sentences in the metadata files are duplicated, but the wav file for the second occurrence doesn't exist.

For example, these 2 lines in metadata_train are identical:

However, upon checking the wavs folder, "wavs/731_00000007.wav" exists, is only 1 second long and contains only a few words of the full sentence printed in the metadata file, while "wavs/731_00000008.wav" doesn't exist at all.

I saw something similar with 2 and 3 second files. Out of 115 wav files generated from 4 audio sources, 8 files were affected while the rest were correctly generated at 15 seconds long with the correct sentences.

Not a huge issue as I can just manually remove these from the metadata files for now, but something to note.
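That clean-up can be done automatically in the meantime. A minimal sketch, assuming the metadata files use the Coqui XTTS layout of a header row followed by pipe-separated audio_file|text|speaker_name rows, with audio paths relative to the dataset folder; check that against your own files first. It writes a cleaned copy rather than overwriting the original.

```python
import csv
from pathlib import Path


def drop_missing_wavs(metadata_csv: Path, dataset_dir: Path, output_csv: Path) -> list[str]:
    """Copy rows whose wav file exists to output_csv; return the audio paths that were missing.

    ASSUMPTION: pipe-delimited rows with a header; first column = wav path relative to dataset_dir.
    """
    with metadata_csv.open(encoding="utf-8", newline="") as f:
        rows = list(csv.reader(f, delimiter="|"))
    if not rows:
        raise ValueError(f"{metadata_csv} is empty")
    header, body = rows[0], rows[1:]
    kept, missing = [header], []
    for row in body:
        if not row:
            continue  # skip blank lines
        if (dataset_dir / row[0]).is_file():
            kept.append(row)
        else:
            missing.append(row[0])
    with output_csv.open("w", encoding="utf-8", newline="") as f:
        csv.writer(f, delimiter="|", lineterminator="\n").writerows(kept)
    return missing
```

Printing the returned list also gives a record of which entries were removed, so they can be checked against the source audio later.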

erew123 Jun 13, 2024
Maintainer Author

@m-eideh Yeah, I thought I spotted that possibly happening, but I couldn't manage to reproduce it with the datasets I tested. So I've just faked up a couple of missing files in my dataset and added a console/terminal printout when you run the dataset validation. It will at least tell you which files are missing:

So that might get you 80% of the way there, and I'll revisit the code some time in future.

gshawn3 · 2024-06-11T13:13:09Z

I ran into a couple of issues running the beta as an Oobabooga plugin. The first is that on first startup, it looked for firstrun.py in the wrong path.

05:43:44-831231 INFO     Loading the extension "alltalk_tts"
[AllTalk TTS]     _    _ _ _____     _ _       _____ _____ ____
[AllTalk TTS]    / \  | | |_   _|_ _| | | __  |_   _|_   _/ ___|
[AllTalk TTS]   / _ \ | | | | |/ _` | | |/ /    | |   | | \___ \
[AllTalk TTS]  / ___ \| | | | | (_| | |   <     | |   | |  ___) |
[AllTalk TTS] /_/   \_\_|_| |_|\__,_|_|_|\_\    |_|   |_| |____/
[AllTalk TTS]
python: can't open file 'C:\\Users\\Admin\\Desktop\\text_generation_webui\\system\\config\\firstrun.py': [Errno 2] No such file or directory

Error occurred while running the script: Command '['python', 'system/config/firstrun.py']' returned non-zero exit status 2.

In my specific case, it should have looked for that file in C:\\Users\\Admin\\Desktop\\text_generation_webui\\extensions\\alltalk_tts\\system\\config\\firstrun.py

Editing the script_path variable with the correct path fixed the issue on the next startup.
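A minimal sketch of a path fix that doesn't depend on the working directory: resolve firstrun.py relative to the calling script's own location and launch it with the same interpreter via sys.executable, rather than whatever "python" is on PATH. This is an illustration, not the change that was made in AllTalk; base_dir defaults to the folder containing the calling file.

```python
import subprocess
import sys
from pathlib import Path
from typing import Optional


def firstrun_script(base_dir: Path) -> Path:
    """Location of firstrun.py relative to the AllTalk folder."""
    return base_dir / "system" / "config" / "firstrun.py"


def run_firstrun(base_dir: Optional[Path] = None) -> int:
    """Run firstrun.py with the current interpreter, independent of the working directory."""
    if base_dir is None:
        # Folder containing this file, e.g. .../extensions/alltalk_tts in TGWUI mode.
        base_dir = Path(__file__).resolve().parent
    script_path = firstrun_script(base_dir)
    if not script_path.is_file():
        raise FileNotFoundError(f"firstrun.py not found at {script_path}")
    # sys.executable keeps us in the same Python environment; "python" on PATH may differ.
    completed = subprocess.run([sys.executable, str(script_path)], cwd=base_dir, check=True)
    return completed.returncode
```

With check=True a non-zero exit raises CalledProcessError, which surfaces the same failure as the log above but with the full resolved path in the message.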

The second issue is trickier and I haven't been able to figure it out. Despite seemingly having all the requirements installed correctly, the app complains that there is no module named "system" (the import that loads the Gradio themes fails):

05:53:32-686542 INFO     Starting Text generation web UI
05:53:32-689532 INFO     Loading the extension "alltalk_tts"
[AllTalk TTS]     _    _ _ _____     _ _       _____ _____ ____
[AllTalk TTS]    / \  | | |_   _|_ _| | | __  |_   _|_   _/ ___|
[AllTalk TTS]   / _ \ | | | | |/ _` | | |/ /    | |   | | \___ \
[AllTalk TTS]  / ___ \| | | | | (_| | |   <     | |   | |  ___) |
[AllTalk TTS] /_/   \_\_|_| |_|\__,_|_|_|\_\    |_|   |_| |____/
[AllTalk TTS]
[AllTalk TTS] Config file update: No Updates required
[AllTalk TTS] Start-up Mode     : Text-gen-webui mode
[AllTalk TTS] WAV file deletion : Disabled
[AllTalk TTS] Github updated    : 6th June 2024 at 22:23
[AllTalk ENG] Transcoding       : ffmpeg found
[AllTalk ENG] DeepSpeed version : Not available
[AllTalk ENG] Python Version    : 3.11.9
[AllTalk ENG] PyTorch Version   : 2.2.1+cu121
[AllTalk ENG] CUDA Version      : 12.1
[AllTalk ENG]
[AllTalk ENG] Model/Engine : xttsv2_2.0.3 loading into cuda
[AllTalk ENG] Model License: https://coqui.ai/cpml.txt
[AllTalk ENG] Load time : 11.66 seconds.
[AllTalk TTS]
[AllTalk TTS] API Address : 127.0.0.1:7851
[AllTalk TTS] Gradio Light: http://127.0.0.1:7852
[AllTalk TTS] Gradio Dark : http://127.0.0.1:7852?__theme=dark
[AllTalk TTS]
05:53:55-750966 ERROR    Could not import the requirements for 'alltalk_tts'. Make sure to install the requirements for the extension.

                         * To install requirements for all available extensions, launch the
                           update_wizard script for your OS and choose the B option.

                         * To install the requirements for this extension alone, launch the
                           cmd script for your OS and paste the following command in the
                           terminal window that appears:

                         Linux / Mac:

                         pip install -r extensions/alltalk_tts/requirements.txt --upgrade

                         Windows:

                         pip install -r extensions\alltalk_tts\requirements.txt --upgrade

05:53:55-753956 ERROR    Failed to load the extension "alltalk_tts".
Traceback (most recent call last):
  File "C:\Users\Admin\Desktop\text_generation_webui\modules\extensions.py", line 37, in load_extensions
    extension = importlib.import_module(f"extensions.{name}.script")
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Admin\Desktop\text_generation_webui\installer_files\env\Lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "C:\Users\Admin\Desktop\text_generation_webui\extensions\alltalk_tts\script.py", line 1141, in <module>
    import system.gradio_pages.themes.loadThemes as loadThemes
ModuleNotFoundError: No module named 'system'

Running on local URL:  http://127.0.0.1:7860

Note that I've tried installing the requirements first through atsetup.bat, and subsequently via pip install -r system\requirements\requirements_textgen.txt, but either way the error persists. (Side note: the requirements-install instructions in the error message above still need to be updated.)

Here are the relevant sections from running the diagnostics:

CUDA Device : NVIDIA GeForce RTX 3090 Ti
CUDA Memory : 23.99 GB
CUDA Version: 12.1
CUDA Working: Success - CUDA is available and working.
CUDA_HOME   : C:\Users\Admin\Desktop\text_generation_webui\installer_files\env
Cublas64_11 : C:\Users\Admin\Desktop\text_generation_webui\installer_files\env\Lib\site-packages\nvidia/cublas\bin\cublas64_11.dll

  If you do not have a CUDA version and CUDA is failing, you will not have your
  TTS engines being accelerated with CUDA. CUDA is only available on Nvidia GPU
  and is setup by installing PyTorch with a correct CUDA version in your Python
  virtual environment.

PyTorch Version  : 2.2.1+cu121
Python Version   : 3.11.9
Python Executable: C:\Users\Admin\Desktop\text_generation_webui\installer_files\env\python.exe

  AllTalk has been validated to run on Python 3.11.x versions and also PyTorch
  2.0.x to 2.2.x. Earlier or later versions of PyTorch and Python may not work.

Conda Environment: C:\Users\Admin\Desktop\text_generation_webui\installer_files\env

Python Search Path:
  C:\Users\Admin\Desktop\text_generation_webui\extensions\alltalk_tts
  C:\Users\Admin\Desktop\text_generation_webui\installer_files\env\python311.zip
  C:\Users\Admin\Desktop\text_generation_webui\installer_files\env\DLLs
  C:\Users\Admin\Desktop\text_generation_webui\installer_files\env\Lib
  C:\Users\Admin\Desktop\text_generation_webui\installer_files\env
  C:\Users\Admin\Desktop\text_generation_webui\installer_files\env\Lib\site-packages
  C:\Users\Admin\Desktop\text_generation_webui\installer_files\env\Lib\site-packages\win32
  C:\Users\Admin\Desktop\text_generation_webui\installer_files\env\Lib\site-packages\win32\lib
  C:\Users\Admin\Desktop\text_generation_webui\installer_files\env\Lib\site-packages\Pythonwin

  If you are correctly in the AllTalk Python virtual environment, you will
  expect to see 'alltalk_environment' as part of the path of the above folders.
  If you are running AllTalk as part of Text-generation-webui, you should see
  'text-generation-webui' listed in the path of the above folders. If you dont
  see them mentioned, you have probably not started the correct Python virtual
  environment.

Requirements file package comparison:
  coqui-tts           Required: >= 0.24.1        Installed: 0.24.1
  faster-whisper      Required: >= 1.0.*         Installed: 1.0.2
  gradio              Required: >= 4.26.0        Installed: 4.26.0
  importlib_metadata  Required: >= 7.1.*         Installed: 7.1.0
  inputimeout         Required: >= 1.0.4         Installed: 1.0.4
  Jinja2              Required: >= 3.1.*         Installed: 3.1.2
  librosa             Required: >= 0.10.2.post1  Installed: 0.10.2.post1
  nvidia-cublas-cu11  Required: >= 11.11.3.6     Installed: 11.11.3.6
  nvidia-cudnn-cu11   Required: >= 9.1.1.17      Installed: 9.1.1.17
  onnxruntime-gpu     Required: >= 1.18.*        Installed: 1.18.0
  pydantic            Required: >= 2.7.*         Installed: 2.7.3
  python-ffmpeg       Required: >= 2.0.*         Installed: 2.0.12
  python-Levenshtein  Required: >= 0.25.1        Installed: 0.25.1
  praat-parselmouth   Required: >= 0.4.*         Installed: 0.4.3
  pyworld             Required: >= 0.3.*         Installed: 0.3.4
  sounddevice         Required: >= 0.4.7         Installed: 0.4.7
  soundfile           Required: >= 0.12.*        Installed: 0.12.1
  spacy               Required: >= 3.7.1         Installed: 3.7.5
  torchcrepe          Required: >= 0.0.2         Installed: 0.0.22
  tqdm                Required: >= 4.66.*        Installed: 4.66.4
  unidic-lite         Required: >= 1.0.8         Installed: 1.0.8
  uvicorn             Required: >= 0.29.0        Installed: 0.30.1

(Another side note, the message above mentions "you should see 'text-generation-webui' listed in the path of the above folders." That is no longer correct, because dashes in the folder name now cause AllTalk to throw an error on startup. Could be confusing to some users.)

Let me know if you'd like me to try anything specific to troubleshoot.

erew123 Jun 12, 2024
Maintainer Author

@gshawn3 I knew exactly what this was the second I saw it, and I realised what I have done (or not done). I had to write a chunk of extra code to identify when AllTalk is running in TGWUI, which I did... and then, in my rush dealing with other code issues, I didn't merge it into the main script. So I'll take a shot at getting it merged in today.
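For anyone curious about the general mechanism: when TGWUI loads the extension as extensions.alltalk_tts.script, the AllTalk folder itself isn't necessarily on sys.path, so a top-level import system... can fail with exactly this ModuleNotFoundError. A common pattern, sketched here as an illustration rather than the code that was merged, is to put the extension's own folder on sys.path before those imports.

```python
import sys
from pathlib import Path


def ensure_on_sys_path(folder: Path) -> bool:
    """Prepend folder to sys.path if it is missing; return True if it was added."""
    folder_str = str(folder.resolve())
    if folder_str in sys.path:
        return False
    sys.path.insert(0, folder_str)
    return True


# In an extension's script.py this would run before any `import system...` lines:
# ensure_on_sys_path(Path(__file__).resolve().parent)
```

Prepending (rather than appending) means the extension's own "system" package is found before any other module of the same name elsewhere on the path.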

gshawn3 Jun 12, 2024

Thanks for being super responsive! I can wait, there's no rush on my end... The V1 of AllTalk still works great 🙂

erew123 Jun 12, 2024
Maintainer Author

Should be done. You can probably just git pull and it should be ok, but feel free to re-apply the requirements to be sure.

Mithadon · 2024-06-12T01:32:45Z

From my experience with v1, and looking at the screenshot of v2, this is going to be phenomenal. So glad you're doing all of this. Thank you!

gboross · 2024-06-12T08:09:41Z

Hello, the second version is really great. By the way, is it possible to make it serve multiple clients simultaneously, rather than sequentially as the requests come in? So, can it be asynchronous? Of course, given enough resources, but could it be done even through Docker? Thanks.

erew123 Jun 12, 2024
Maintainer Author

Hi @gboross Please see the Feature Requests list and the links in there on Streaming to see where that is at (link here).

StellarBeing25 · 2024-06-12T09:07:01Z

Hey, V2 is great. Here are some suggestions to further streamline the user interface. The contents of the AllTalk v2 Beta, Generate Help, API Endpoints & Dev, and About This Project tabs should all be moved under the Documentation and Help section. TTS-generation settings can also be shifted under Global Settings. Please consider.

erew123 Jun 12, 2024
Maintainer Author

Hi @StellarBeing25 My ultimate goal was potentially to make these modular, so you can in effect turn off certain pages/things in the interface. I've not had a chance to do that yet, though. That aside, I do intend to put documentation in the documentation section. The issue is people finding it/reading it. I often find I spend quite a lot of time pointing people to the documentation, so while it's in BETA, I have left things quite prominent in the interface, in the hope it will ease my burden and also so people can tell me about problems they had with the explanations in the documentation.

The TTS-generation settings are unique to each individual TTS generation. They are not stored settings, so having them elsewhere in the interface does not make sense for people who may be developing/want to test out certain things. On the flip side, I may be able to make the extra settings (for want of a better term) an accordion:

Where you have an expandable section to get to these other features, which would probably cover off most of what you're suggesting?

StellarBeing25 Jun 12, 2024

Forgot to add: It would be nice to have RVC pitch adjustment also available under Generate TTS and Voice2RVC since it frequently needs to be adjusted depending upon the TTS voice selected.

jeddyhhh · 2024-06-14T01:51:05Z

Hey, great work on the project, I've been using v1 for a few days now and have started moving towards v2 with my project totally-real-news-bot so I can use RVC models.

I'm using alltalkv2 in TGWUI mode

I'm using the API with Piper TTS. That works great, and generation is multiple times faster than Coqui (my PC build is Chinese e-waste), but when I use RVC, it seems to start the conversion process and finds a .pth model, and VRAM usage goes up, but then my CPU shoots to 100% as if it's processing something.

Is it possible RVC conversion is in cpu mode or could my setup be incorrectly configured?

2 replies

erew123 Jun 14, 2024
Maintainer Author

Hi @jeddyhhh, I'm actually not sure whether a CPU mode for RVC will or won't work at the moment. I stripped apart and rebuilt a decent amount of RVC to get it working on Python 3.11, though I never looked at CPU specifically (not enough time to check every variation before the BETA, with lots of other code to deal with). What I can say is that RVC is a two-step process: dealing with the index file, and then dealing with the model. I'm 95% sure that the index file stage would work OK. However, if you want to test that, you can move the index file out of the folder and still run RVC to see if that changes anything for you. My code will (should) say "hey, no index to process, so I'll just get on with generating TTS". That way you can eliminate one step and see whether it makes a difference. If that works, then it's an indexing issue; if it doesn't, that would suggest RVC doesn't work on CPU. That said, RVC is quite heavy processing, so I can't say how long it would take on a CPU. I'd suggest trying with a smaller TTS sample first.
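The "move the index file out and re-run" test can be made reversible with a small context manager. This is a minimal sketch, not part of AllTalk: `index_files_moved_aside` is a hypothetical helper, and it assumes `voice_dir` is whichever folder holds the voice's `.pth` model and `.index` file (the exact folder layout is an assumption).

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def index_files_moved_aside(voice_dir):
    """Temporarily move every *.index file out of voice_dir, restoring them on exit."""
    voice_dir = Path(voice_dir)
    backup_dir = Path(tempfile.mkdtemp(prefix="rvc_index_backup_"))
    moved = []  # (backup location, original location) pairs
    try:
        # Materialise the glob first so we don't mutate the directory mid-iteration.
        for index_file in list(voice_dir.glob("*.index")):
            target = backup_dir / index_file.name
            shutil.move(str(index_file), str(target))
            moved.append((target, index_file))
        yield [original for _, original in moved]
    finally:
        # Put the index files back even if the RVC test run raised an error.
        for target, original in moved:
            shutil.move(str(target), str(original))
        shutil.rmtree(backup_dir, ignore_errors=True)
```

You would wrap the RVC test generation in `with index_files_moved_aside("path/to/voice_folder"):` so only the model step runs, and the index file is restored afterwards either way.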

jeddyhhh Jun 15, 2024

Hey, thanks for the reply. I think something is weird with the TGWUI mode (or my TGWUI is configured incorrectly).
I've just installed alltalkv2 as a standalone app, and RVC conversion works on the GPU with the same settings I used in TGWUI mode.

I'm pretty sure it's trying to do the RVC conversion on the CPU in TGWUI mode, which I didn't know was possible. It takes way too long but doesn't crash, and there are no errors; it just takes hours to convert 1 minute of audio.

I'll just use alltalkv2 as a standalone app for now, it all works as expected. Thanks :)
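When the same settings use the GPU in one install and the CPU in another, a quick first check is whether PyTorch in each Python environment (standalone vs TGWUI's) can see CUDA at all. A minimal sketch, assuming you run it inside each environment in turn; the helper name is hypothetical and this only checks PyTorch, not RVC's own device selection:

```python
import importlib.util


def torch_device_report() -> str:
    """Return 'cuda', 'cpu', or 'torch-not-installed' for the current Python environment."""
    # Only import torch if it is present, so the check also runs in a bare environment.
    if importlib.util.find_spec("torch") is None:
        return "torch-not-installed"
    import torch

    # A CPU-only PyTorch build (or a driver/CUDA mismatch) makes this False; code that
    # picks its device with torch.cuda.is_available() will then run on the CPU.
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(torch_device_report())
```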

ibrah3m · 2024-06-19T10:24:22Z

ibrah3m
Jun 19, 2024

I tested the project, it's amazing!

It works perfectly with English.
In Arabic I'm still struggling a little; it needs more work. I fine-tuned XTTS for 100 epochs, but it didn't make a notable difference. I saw there are different TTS engines, but I haven't tried them yet.

0 replies

Mithadon · 2024-06-19T18:26:23Z

Mithadon
Jun 19, 2024

I've been using v2 for a while now and it's fantastic. I use the standalone version, usually with SillyTavern. I did have some difficulty getting the SillyTavern settings to work: there are so many voice/narrator dropdowns that have to be matched with selections in the web UI, and I find that confusing. It would be great to have some way to save presets; for example, selecting preset A would populate the AllTalk character and narrator voices and the RVC character and narrator voices.

Can't wait for the large generator to be added to the main web UI. That is my main wish, together with being able to import .txt files into it, or better yet, process an entire folder of .txt files. Wow!! Having RVC applied at the same time is so much less hassle than exporting a .wav and then running it through the RVC web UI manually...

Last thing: are you aware of any XTTS2 finetunes for accents or gender (in English)? I've googled a lot and been to websites that claim to host tons of models but have found only a handful of XTTS2 finetunes, and not a single interesting one.

Thx!

1 reply

erew123 Jun 20, 2024
Maintainer Author

Hi @Mithadon. The voice settings within ST are stored in SillyTavern's own "voicemap" save file. I did originally save things separately, but got a polite nudge (a friendly telling-off) from the ST devs and was told to leave ST to store things in their voice map. In theory (as I understand it, at least) this should store your main character voice setup on a per-character-card basis, though it won't save the RVC voices etc. separately as part of the voice map; I think those are a more global saved setting. So I don't think it's something I can change easily, as it doesn't align with how they want the ST code to work...

When you say "large generator", do you mean the TTS Generator? It's still there as the web page version. The link is on the tab, and it will pull the default AllTalk settings that you can set centrally. That means if you set an RVC voice as the default voice, it will generate the TTS with RVC voices; you just won't be able to select them on the web page, only in the Gradio interface's central/global settings. Updating the TTS Generator code to Gradio is, well, challenging, let's say, mainly because of some limitations/complexities that Gradio introduces. I've had two attempts at it and can't get the list generation to work correctly. All the other bits work, but generating dynamic lists of text/TTS that you can edit is a problem, so I'm considering whether that can or can't be achieved. TBD.

Re finetunes, I'm not aware of them generally being available on the internet, and I'm not sure anything has specifically been set up to share them. Though if you want to put a post up in the Discussions area here asking if people want to share, I've no issue with that... I guess there is a question of where they would be put to be shared... but you're welcome to put up a post.

Dolyfin · 2024-06-21T04:35:39Z

Dolyfin
Jun 21, 2024

Would you be looking to add MeloTTS to v2 at some point? Seems like one of the better (and faster) TTS models that you can also train locally.

0 replies

ElevatedKitten · 2024-06-22T13:45:00Z

ElevatedKitten
Jun 22, 2024

Can't get it to work :(

[AllTalk TTS] _ _ _ _____ _ _ _____ _____ ____
[AllTalk TTS] / \ | | |_ | | | | __ | | / |
[AllTalk TTS] / _ \ | | | | |/ _ | | |/ / | | | | _
[AllTalk TTS] / ___ | | | | | (| | | < | | | | ) |
[AllTalk TTS] // __|| ||_,|||_\ || || |___/
[AllTalk TTS]
[AllTalk TTS] Config file update: No Updates required
Traceback (most recent call last):
  File "C:\Users\USER\Desktop\alltalkbeta\script.py", line 190, in <module>
    import gradio as gr
  File "C:\Users\USER\Desktop\alltalkbeta\alltalk_environment\env\Lib\site-packages\gradio\__init__.py", line 3, in <module>
    import gradio._simple_templates
  File "C:\Users\USER\Desktop\alltalkbeta\alltalk_environment\env\Lib\site-packages\gradio\_simple_templates\__init__.py", line 1, in <module>
    from .simpledropdown import SimpleDropdown
  File "C:\Users\USER\Desktop\alltalkbeta\alltalk_environment\env\Lib\site-packages\gradio\_simple_templates\simpledropdown.py", line 6, in <module>
    from gradio.components.base import FormComponent
  File "C:\Users\USER\Desktop\alltalkbeta\alltalk_environment\env\Lib\site-packages\gradio\components\__init__.py", line 1, in <module>
    from gradio.components.annotated_image import AnnotatedImage
  File "C:\Users\USER\Desktop\alltalkbeta\alltalk_environment\env\Lib\site-packages\gradio\components\annotated_image.py", line 9, in <module>
    import PIL.Image
  File "C:\Users\USER\Desktop\alltalkbeta\alltalk_environment\env\Lib\site-packages\PIL\Image.py", line 100, in <module>
    from . import _imaging as core
ImportError: DLL load failed while importing _imaging: The specified module could not be found.

2 replies

erew123 Jun 22, 2024
Maintainer Author

Try running start_environment.bat, then:

pip install --upgrade --force-reinstall pillow

I'm not sure what may be breaking that at the moment, but that should fix it.

ElevatedKitten Jun 22, 2024

Thanks, that fixed it!
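To confirm after a reinstall that Pillow's compiled core now loads in the active environment (i.e. after start_environment.bat), you can attempt the same import that failed in the traceback. A minimal sketch; `pillow_native_ok` is a hypothetical helper, not part of AllTalk:

```python
import importlib


def pillow_native_ok() -> tuple[bool, str]:
    """Try to import Pillow's image module and report whether its compiled core loaded."""
    try:
        # PIL.Image imports the compiled PIL._imaging extension; a missing or
        # mismatched DLL raises ImportError here.
        importlib.import_module("PIL.Image")
    except ImportError as exc:  # also covers Pillow not being installed at all
        return False, str(exc)
    return True, "ok"


if __name__ == "__main__":
    print(pillow_native_ok())
```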

qJake · 2024-06-22T18:42:15Z

qJake
Jun 22, 2024

On a fresh install of Debian 12 with an Nvidia GPU, I followed the setup instructions: cloned the beta branch, ran ./atsetup.sh, and waited for it to finish. I didn't see any errors, but the console cleared itself a few times during installation, so I wasn't able to review the output.

I noticed some python requirements were missing when I tried to run it, so I tried to reapply/reinstall requirements, and got this error:

error: command 'gcc' failed: No such file or directory

In atsetup.sh, should we check for gcc early and warn the user if it's not found?

1 reply

qJake Jun 22, 2024

Follow-up: pyworld still failed to build. I also needed g++ in addition to gcc to resolve the issue, so these are additional system prerequisites:

sudo apt install gcc g++
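atsetup.sh is a shell script, so a real early check would live there; the sketch below only illustrates the idea in Python using `shutil.which`, checking for the two compilers mentioned above before any packages are built. `missing_build_tools` is a hypothetical helper, not something AllTalk ships:

```python
import shutil

# Compilers needed to build packages such as pyworld, per the follow-up above.
REQUIRED_TOOLS = ("gcc", "g++")


def missing_build_tools(tools=REQUIRED_TOOLS) -> list[str]:
    """Return the names of required executables that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_build_tools()
    if missing:
        print("Missing build tools:", ", ".join(missing))
        print("On Debian/Ubuntu: sudo apt install gcc g++")
```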

qJake · 2024-06-22T19:51:10Z

qJake
Jun 22, 2024

On Debian 12 (fresh install), after installing the proper Nvidia drivers, launching AllTalk Beta resulted in the following error:

ERROR:    Traceback (most recent call last):
  File "/home/alltalk/alltalk_tts_beta/alltalk_environment/env/lib/python3.11/site-packages/starlette/routing.py", line 732, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/home/alltalk/alltalk_tts_beta/alltalk_environment/env/lib/python3.11/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/alltalk/alltalk_tts_beta/tts_server.py", line 182, in startup_shutdown
    await model_engine.setup()
  File "/home/alltalk/alltalk_tts_beta/system/tts_engines/xtts/model_engine.py", line 171, in setup
    await self.handle_tts_method_change(tts_model)
  File "/home/alltalk/alltalk_tts_beta/system/tts_engines/xtts/model_engine.py", line 414, in handle_tts_method_change
    self.model = await self.xtts_manual_load_model(model_name)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alltalk/alltalk_tts_beta/system/tts_engines/xtts/model_engine.py", line 354, in xtts_manual_load_model
    self.model.load_checkpoint(
  File "/home/alltalk/alltalk_tts_beta/alltalk_environment/env/lib/python3.11/site-packages/TTS/tts/models/xtts.py", line 790, in load_checkpoint
    self.gpt.init_gpt_for_inference(kv_cache=self.args.kv_cache, use_deepspeed=use_deepspeed)
  File "/home/alltalk/alltalk_tts_beta/alltalk_environment/env/lib/python3.11/site-packages/TTS/tts/layers/xtts/gpt.py", line 223, in init_gpt_for_inference
    self.ds_engine = deepspeed.init_inference(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alltalk/alltalk_tts_beta/alltalk_environment/env/lib/python3.11/site-packages/deepspeed/__init__.py", line 346, in init_inference
    engine = InferenceEngine(model, config=ds_inference_config)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alltalk/alltalk_tts_beta/alltalk_environment/env/lib/python3.11/site-packages/deepspeed/inference/engine.py", line 158, in __init__
    self._apply_injection_policy(config)
  File "/home/alltalk/alltalk_tts_beta/alltalk_environment/env/lib/python3.11/site-packages/deepspeed/inference/engine.py", line 418, in _apply_injection_policy
    replace_transformer_layer(client_module, self.module, checkpoint, config, self.config)
  File "/home/alltalk/alltalk_tts_beta/alltalk_environment/env/lib/python3.11/site-packages/deepspeed/module_inject/replace_module.py", line 354, in replace_transformer_layer
    replaced_module = replace_module(model=model,
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alltalk/alltalk_tts_beta/alltalk_environment/env/lib/python3.11/site-packages/deepspeed/module_inject/replace_module.py", line 603, in replace_module
    replaced_module, _ = _replace_module(model, policy, state_dict=sd)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alltalk/alltalk_tts_beta/alltalk_environment/env/lib/python3.11/site-packages/deepspeed/module_inject/replace_module.py", line 663, in _replace_module
    _, layer_id = _replace_module(child,
                  ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alltalk/alltalk_tts_beta/alltalk_environment/env/lib/python3.11/site-packages/deepspeed/module_inject/replace_module.py", line 663, in _replace_module
    _, layer_id = _replace_module(child,
                  ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alltalk/alltalk_tts_beta/alltalk_environment/env/lib/python3.11/site-packages/deepspeed/module_inject/replace_module.py", line 639, in _replace_module
    replaced_module = policies[child.__class__][0](child,
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alltalk/alltalk_tts_beta/alltalk_environment/env/lib/python3.11/site-packages/deepspeed/module_inject/replace_module.py", line 310, in replace_fn
    new_module = replace_with_policy(child,
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alltalk/alltalk_tts_beta/alltalk_environment/env/lib/python3.11/site-packages/deepspeed/module_inject/replace_module.py", line 247, in replace_with_policy
    _container.create_module()
  File "/home/alltalk/alltalk_tts_beta/alltalk_environment/env/lib/python3.11/site-packages/deepspeed/module_inject/containers/gpt2.py", line 20, in create_module
    self.module = DeepSpeedGPTInference(_config, mp_group=self.mp_group)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alltalk/alltalk_tts_beta/alltalk_environment/env/lib/python3.11/site-packages/deepspeed/model_implementations/transformers/ds_gpt.py", line 20, in __init__
    super().__init__(config, mp_group, quantize_scales, quantize_groups, merge_count, mlp_extra_grouping)
  File "/home/alltalk/alltalk_tts_beta/alltalk_environment/env/lib/python3.11/site-packages/deepspeed/model_implementations/transformers/ds_transformer.py", line 58, in __init__
    inference_module = builder.load()
                       ^^^^^^^^^^^^^^
  File "/home/alltalk/alltalk_tts_beta/alltalk_environment/env/lib/python3.11/site-packages/deepspeed/ops/op_builder/builder.py", line 472, in load
    self.validate_torch_version(torch_info)
  File "/home/alltalk/alltalk_tts_beta/alltalk_environment/env/lib/python3.11/site-packages/deepspeed/ops/op_builder/builder.py", line 147, in validate_torch_version
    raise RuntimeError("PyTorch version mismatch! DeepSpeed ops were compiled and installed "
RuntimeError: PyTorch version mismatch! DeepSpeed ops were compiled and installed with a different version than what is being used at runtime. Please re-install DeepSpeed or switch torch versions. Install torch version=2.2, Runtime torch version=2.3

ERROR:    Application startup failed. Exiting.

The fix was to downgrade PyTorch to 2.2.0 by activating the conda environment and running:

conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=12.1 -c pytorch -c nvidia
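The error above comes from DeepSpeed comparing the PyTorch version its ops were built against ("Install torch version=2.2") with the runtime version ("Runtime torch version=2.3") at the major.minor level. A minimal sketch of that comparison, using hypothetical helper names that are not part of AllTalk or DeepSpeed, which you could use to sanity-check an environment:

```python
import re


def major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings like '2.3.0+cu121' or '2.2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def deepspeed_torch_compatible(build_torch: str, runtime_torch: str) -> bool:
    """True when the torch version DeepSpeed was built with matches the runtime at major.minor."""
    return major_minor(build_torch) == major_minor(runtime_torch)
```

For example, `deepspeed_torch_compatible("2.2", torch.__version__)` returns False while the environment still has torch 2.3.x installed, and True after the downgrade to 2.2.0.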

I also had to make sure the following lines were at the bottom of my .bashrc file:

export CUDA_HOME=/usr/local/cuda-12.1
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64

With that, the beta is working with CUDA+DeepSpeed!
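Since the .bashrc lines only take effect in new shells, it can help to confirm that the running process actually sees them. A minimal sketch, assuming Linux-style colon-separated paths; `cuda_env_problems` is a hypothetical helper:

```python
import os


def cuda_env_problems(env=None) -> list[str]:
    """Report problems with a CUDA_HOME-based PATH / LD_LIBRARY_PATH setup (Linux paths)."""
    env = os.environ if env is None else env
    cuda_home = env.get("CUDA_HOME", "").rstrip("/")
    if not cuda_home:
        return ["CUDA_HOME is not set"]
    problems = []
    # The .bashrc lines put $CUDA_HOME/bin on PATH and $CUDA_HOME/lib64 on LD_LIBRARY_PATH.
    expected = {"PATH": f"{cuda_home}/bin", "LD_LIBRARY_PATH": f"{cuda_home}/lib64"}
    for var, directory in expected.items():
        # These search-path variables are colon-separated on Linux.
        if directory not in env.get(var, "").split(":"):
            problems.append(f"{directory} is not on {var}")
    return problems


if __name__ == "__main__":
    for problem in cuda_env_problems():
        print("CUDA env:", problem)
```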

0 replies

IIEleven11 · 2024-06-22T23:46:28Z

IIEleven11
Jun 22, 2024

Could not load library libcudnn_ops_infer.so.8. Error: libcudnn_ops_infer.so.8: cannot open shared object file: No such file or directory
./start_finetune.sh: line 5: 400362 Aborted python finetune.py

This error has been a nightmare. I'm trying to install on Ubuntu 22.04. I can get the web UI up and running, but when I hit the "create dataset" button it throws this error. I have tried PyTorch with both the cu121 and cu118 index URLs, and with both pip and conda. I have reinstalled CUDA and cuDNN. I am running out of ideas.

3 replies

erew123 Jun 23, 2024
Maintainer Author

Hi @IIEleven11

I'm not sure what condition your Python environment is currently in; however, you should have libcudnn_ops_infer.so.8 in the alltalk_environment path.

If you don't, or are unsure, feel free to delete the alltalk_environment folder, re-run ./atsetup.sh and run the install again, which will get you back to a stable state.

After that, let's check whether the environment starts correctly, as it should set the paths for the Python virtual environment. Run ./start_environment.sh, which should change the terminal prompt and load the Python environment.

Then try running python finetune.py and see whether you get the same error.

If it works at that point, just let me know and I will see what I can do to change how start_finetune.sh starts. If not, let's move this to an issue ticket, as we may have to swap a chunk of information and it would be better not to have a huge chain of text here in the discussions area.

Either way, I'm pretty sure it's just the CUDA_HOME environment variable not being set correctly, which can be a real pain in the ass but is relatively easy to solve.
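Two separate things can be checked here: whether libcudnn_ops_infer.so.8 exists anywhere under alltalk_environment, and whether the dynamic loader can find it by bare name (which depends on the library search path, e.g. LD_LIBRARY_PATH). If the file exists but the second check fails, the search path is the likely culprit. A minimal sketch with hypothetical helper names:

```python
import ctypes
from pathlib import Path


def find_library_files(root, name="libcudnn_ops_infer.so.8") -> list[Path]:
    """Recursively search root (e.g. the alltalk_environment folder) for a library file."""
    return sorted(Path(root).rglob(name))


def loader_can_find(name="libcudnn_ops_infer.so.8") -> bool:
    """Ask the dynamic loader to open the library by bare name via the normal search path."""
    try:
        ctypes.CDLL(name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    hits = find_library_files("alltalk_environment")
    print("Found on disk:", [str(p) for p in hits] or "none")
    print("Loadable via the library search path:", loader_can_find())
```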

IIEleven11 Jun 23, 2024

What's tricky is that the script will run and the UI is accessible; the error is only thrown when you push "create dataset". I believe it's specifically a problem with whisper or faster-whisper.

I was able to get it going with the standalone option, though. The issue is something to do with the newest versions of CUDA and cuDNN, which is what I had. Sadly, I was tired of chasing dependency errors and didn't look into the underlying cause too deeply.

To solve this, I had to roll back CUDA to 12.1 while removing and then reinstalling cuDNN. I was using WSL2/Ubuntu 22.04, which matters because that requires a specific cuDNN version. There are a few Stack Overflow and GitHub issues I was looking at for a solution. These point you towards a partially deprecated solution that uses Nvidia's old keyring, so be careful: it will throw an error that leads you down the wrong rabbit hole.

erew123 Jun 23, 2024
Maintainer Author

Glad you have it sorted!

If you were on a build of cuDNN etc. later than CUDA 12.1, that would probably be the issue. Pretty much everything built within Python is (currently) based on CUDA 11.8 or 12.1 as a standard. Although later CUDA versions are available from Nvidia for use within Python, many things just don't work on later builds.

So you can have your GPU driver on any version of CUDA, but your Python environment (currently) needs packages installed into it that are built for either CUDA 11.8 or 12.1 (with 12.4 potentially being approved some time this year).
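To see which CUDA version the Python side is actually built for (as opposed to what the driver reports), PyTorch exposes `torch.version.cuda`: a string such as "12.1" for CUDA builds, and None for CPU-only builds. A minimal sketch that checks it against the 11.8/12.1 set mentioned above; the helper name and the hard-coded set are assumptions that will date as support changes:

```python
# The CUDA builds described above as the current Python-side standard (mid-2024).
SUPPORTED_CUDA_BUILDS = {"11.8", "12.1"}


def cuda_build_supported(cuda_version) -> bool:
    """Check a PyTorch build's CUDA version string (e.g. torch.version.cuda)."""
    if cuda_version is None:  # CPU-only PyTorch builds report None
        return False
    major_minor = ".".join(cuda_version.split(".")[:2])
    return major_minor in SUPPORTED_CUDA_BUILDS
```

Inside the activated environment you would call `cuda_build_supported(torch.version.cuda)`; the driver's own CUDA version (as shown by nvidia-smi) is a separate thing and can be newer.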

To be super clear for anyone else reading: Python environments and GPU drivers are completely different things, as are the CUDA development toolkits and add-ons like cuDNN.


erew123 Jun 6, 2024 Maintainer

One Piper voice (that I know of), `en_US-ryan-xxxxxx.onnx`, has issues. It's a known issue with the voice, not with AllTalk: it sometimes works and sometimes speaks a garbled mess.

Replies: 18 comments · 34 replies

erew123 Jun 7, 2024 Maintainer Author

RE "[AllTalk ENG] Warning: Model 'tts_models--en--jenny--jenny' does not match any known model type."

Where can we download some default RVC voices just for testing?
