Add Stable Audio, please! #319
Comments
Hi, thanks for requesting this! I have been procrastinating on it, actually. One question: such a model would require a Hugging Face account and a login to be used, since https://huggingface.co/stabilityai/stable-audio-open-1.0 cannot be downloaded automatically. Would you be OK with that? Please respond, as this could really determine whether or not people use it. |
I don't have a problem downloading the model this way, maybe you could ask for the login to download it? So those who have it can use it, those who don't can't. I don't know why it's tied to a license, but I've seen a video of it making quite good sound effects, so after the login the model would be downloaded. |
I'd be interested in trying this out too, please. |
For my part, I'm OK with it. Thanks! |
a hearty same from I
|
I've already downloaded the checkpoint. I presume that those who are enjoying your interface are the sort of people who already have a huggingface account. |
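For readers wondering what the login requirement means in practice, here is a minimal sketch of fetching a gated checkpoint with the `huggingface_hub` package. It assumes you have accepted the model's license on its Hugging Face page and created an access token exported as `HF_TOKEN`. The file names in `FILES` and the helper names are assumptions for illustration, not necessarily what the webui does.

```python
import os
from pathlib import Path

REPO_ID = "stabilityai/stable-audio-open-1.0"
# Assumed file names; check the repository's file list before relying on them.
FILES = ("model_config.json", "model.safetensors")


def resolve_token(env=os.environ):
    """Return the access token from the environment, or None if it is unset."""
    token = env.get("HF_TOKEN", "").strip()
    return token or None


def download_checkpoint(dest_dir, token):
    """Download the gated files into dest_dir and return their local paths."""
    # Imported lazily so resolve_token() works without the package installed.
    from huggingface_hub import hf_hub_download

    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    return [
        hf_hub_download(repo_id=REPO_ID, filename=name, token=token, local_dir=dest)
        for name in FILES
    ]
```

A caller would first check `resolve_token()` and, if it returns a token, pass it to `download_checkpoint()` together with a target folder of their choice. Users without an account simply get `None` and can be shown a message instead of a failed download.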
Stable audio has been added but is causing some problems so it might be added-removed a few times until it's 'stable'. |
Also, I just want to clarify, after extensive research: Stable Audio is not a 'Stable Diffusion 1.5' moment, because it has a restrictive, potentially dangerous license (which might be legally unenforceable or impossible to defend in court; it's the very same infamous SD3 license), and I saw comments about Facebook's (notably similarly non-commercially licensed) AudioGen/MusicGen performing similarly. My biggest issue so far is that running the 'official' inference code results in ~14 GB of RAM usage, and due to memory management my system with 24 GB RAM and 24 GB VRAM would often just fail. That being said, I really appreciate receiving information about what people want to try and see. |
I concur on the VRAM issue. I often saturate my RTX 3090 with its 24 GB of VRAM using MusicGen. I have not been able to test MultiBandDiffusion due to VRAM saturation. I have seen that Python does not release the VRAM it takes up, so it blocks the GPU. I have to restart the machine to free the VRAM. |
Restarting the webui should be enough. Additionally, after I fix the bugs arising from adding this new model, I can spend more time on 'unload model' buttons throughout the UI; however, there will always be some leftovers that aren't unloaded. |
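As a rough illustration of what an 'unload model' button could do, here is a minimal sketch assuming a PyTorch-based model. The `ModelSlot` class is hypothetical and not part of the webui. Dropping the last reference and calling `torch.cuda.empty_cache()` returns cached blocks to the driver, but any tensor still referenced elsewhere stays allocated, and the CUDA context itself is only released when the process exits, which matches the "leftovers" mentioned above.

```python
import gc


class ModelSlot:
    """Holds at most one loaded model so it can be dropped on request."""

    def __init__(self):
        self.model = None

    def load(self, factory):
        """Replace any current model with the one built by factory()."""
        self.unload()
        self.model = factory()
        return self.model

    def unload(self):
        """Drop the reference and ask PyTorch to return cached GPU memory."""
        self.model = None
        gc.collect()  # collect reference cycles that may still hold tensors
        try:
            import torch
        except ImportError:
            return  # nothing GPU-side to release without PyTorch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # frees cached blocks, not live tensors
```

Keeping the model behind a single holder like this makes it easier to guarantee that no stray reference survives the unload.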
And as we are talking of other models ... maybe people are interested in ... Toucan TTS with 7000 languages |
For this project it seems decent but could be hard to handle if it means everyone has to install espeak. |
OK, never mind, Stable Audio is amazing sometimes. If you have the GPU for it, it generates quickly (anything below the 'default size', which I think is 47 seconds, is not going to generate faster, but if you want a full sample it's so quick), and it often generates without needing the extra steering that you would expect with MusicGen. That being said, the license is still the way it is. |
I will close this issue as Stable Audio has been added. In the future it will be added to the React UI too. I optimized the memory a bit, so while usage does spike, the spike is very brief and you can use the remaining VRAM freely; I tested this by running Stable Diffusion alongside Stable Audio. (Edit: with 'half', the steady memory consumption is only 6 GB, but there is still a spike of 14 GB lasting a few seconds, which could perhaps be reduced to allow running on smaller GPUs.) Finally, I invested some GPU resources to generate Stable Audio samples and test different prompts at https://promptecho.com/stableaudio . The parameters are quite useful:
|
Thank you for fantastic work and this addon! I will use it! |
Well done. Thank you so much. I have downloaded the latest version. I am getting an error in the Stable_Audio tab: Error: expected an indented block after 'if' statement on line 548 (stable_audio.py, line 550). Any ideas how I can resolve this? |
Thanks for reporting, fixed it, just update normally or do a git pull for a very quick update. |
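For anyone who hits the same message in another file: Python raises it when an `if` has no indented body. One common cause (an assumption here, since the thread does not show the offending lines) is a body that was commented out, which would also explain why the error is reported two lines after the `if`. The sketch below reproduces the situation with `compile()`, so nothing is executed.

```python
# The only line under the `if` is a comment, so the block is effectively empty.
BROKEN = "if ready:\n    # print('go')\nprint('done')\n"
# A `pass` statement gives the `if` a real body.
FIXED = "if ready:\n    pass  # placeholder body\nprint('done')\n"


def check_syntax(source, filename="<example>"):
    """Return None if source compiles, else a short syntax error message."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as exc:  # IndentationError is a subclass of SyntaxError
        return f"{exc.msg} ({filename}, line {exc.lineno})"
    return None


print(check_syntax(BROKEN))
print(check_syntax(FIXED))
```

Running `python -m py_compile` on the affected file gives the same check from the command line.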
It is fixed for me. Thank you very much. Now the fun begins.
|
Well sort of ...
I have downloaded the checkpoint to the *data/models/stable-audio/* folder
but I'm getting the errors below when I try to load it in the webUI.
  File "/home/admin/anaconda3/envs/python310/lib/python3.10/site-packages/gradio/queueing.py", line 407, in call_prediction
    output = await route_utils.call_process_api(
  File "/home/admin/anaconda3/envs/python310/lib/python3.10/site-packages/gradio/route_utils.py", line 226, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/admin/anaconda3/envs/python310/lib/python3.10/site-packages/gradio/blocks.py", line 1550, in process_api
    result = await self.call_function(
  File "/home/admin/anaconda3/envs/python310/lib/python3.10/site-packages/gradio/blocks.py", line 1185, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/admin/anaconda3/envs/python310/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/admin/anaconda3/envs/python310/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
  File "/home/admin/anaconda3/envs/python310/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "/home/admin/anaconda3/envs/python310/lib/python3.10/site-packages/gradio/utils.py", line 661, in wrapper
    response = f(*args, **kwargs)
  File "/mnt/5231ec3e-8240-4386-b11a-c8f7218327ab/tts-generation-webui-main/src/stable_audio/stable_audio.py", line 115, in load_model_helper
    model_config=load_model_config(model_name),
  File "/mnt/5231ec3e-8240-4386-b11a-c8f7218327ab/tts-generation-webui-main/src/stable_audio/stable_audio.py", line 111, in load_model_config
    with open(path) as f:
NotADirectoryError: [Errno 20] Not a directory: 'data/models/stable-audio/SD_AUDIO_V1_model.safetensors/model_config.json'
Traceback (most recent call last):
  File "/home/admin/anaconda3/envs/python310/lib/python3.10/site-packages/gradio/queueing.py", line 407, in call_prediction
    output = await route_utils.call_process_api(
  File "/home/admin/anaconda3/envs/python310/lib/python3.10/site-packages/gradio/route_utils.py", line 226, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/admin/anaconda3/envs/python310/lib/python3.10/site-packages/gradio/blocks.py", line 1550, in process_api
    result = await self.call_function(
  File "/home/admin/anaconda3/envs/python310/lib/python3.10/site-packages/gradio/blocks.py", line 1185, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/admin/anaconda3/envs/python310/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/admin/anaconda3/envs/python310/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
  File "/home/admin/anaconda3/envs/python310/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "/home/admin/anaconda3/envs/python310/lib/python3.10/site-packages/gradio/utils.py", line 661, in wrapper
    response = f(*args, **kwargs)
  File "/mnt/5231ec3e-8240-4386-b11a-c8f7218327ab/tts-generation-webui-main/src/stable_audio/stable_audio.py", line 115, in load_model_helper
    model_config=load_model_config(model_name),
  File "/mnt/5231ec3e-8240-4386-b11a-c8f7218327ab/tts-generation-webui-main/src/stable_audio/stable_audio.py", line 111, in load_model_config
    with open(path) as f:
NotADirectoryError: [Errno 20] Not a directory: 'data/models/stable-audio/model.ckpt/model_config.json'
*Any ideas are most welcome ..*
|
Ok I think I figured it out:
|
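A note for readers who hit the same `NotADirectoryError`: the paths in the traceback suggest that the loader joins the base folder, the selected model name, and `model_config.json`, so a checkpoint file placed directly in `data/models/stable-audio/` is treated as if it were a folder. Under that assumption, each model would need its own subfolder containing the config next to the weights. The helpers below are hypothetical, not the webui's code; they only sketch how one could check a folder layout.

```python
import tempfile
from pathlib import Path

CONFIG_NAME = "model_config.json"


def list_model_dirs(base_dir):
    """Return names of subfolders of base_dir that contain a config file."""
    base = Path(base_dir)
    return sorted(
        entry.name
        for entry in base.iterdir()
        if entry.is_dir() and (entry / CONFIG_NAME).is_file()
    )


def misplaced_files(base_dir):
    """Return names of plain files lying directly in base_dir."""
    return sorted(entry.name for entry in Path(base_dir).iterdir() if entry.is_file())


# Demonstration in a temporary folder: one loose checkpoint, one proper folder.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "model.ckpt").write_bytes(b"")  # loose file: would trigger the error
    good = base / "stable-audio-open-1.0"   # hypothetical folder name
    good.mkdir()
    (good / CONFIG_NAME).write_text("{}")
    (good / "model.safetensors").write_bytes(b"")
    usable = list_model_dirs(base)
    loose = misplaced_files(base)
    print("usable:", usable)
    print("loose:", loose)
```

Offering only the entries from `list_model_dirs` in a model dropdown would avoid selecting a loose file in the first place.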
Concerning GPU memory, it has a very low memory footprint in comparison to MusicGen. My RTX 3090 has no problem with SD audio for the moment. |
So about the outputs - I want to avoid spending a huge amount of time on integrating with the old favorites system and move on to a new system. |
Now files are being saved to outputs-rvc/stableaudio/... |
Commercial use is now OK for most people, which makes Stable Audio one of the best, if not the best, open source models we have! (Many other famous models are not open source, are non-commercial, etc.) https://stability.ai/news/license-update |
Thank you for sharing this update. This is excellent news from SD. I was starting to worry that the SD project would fold to the GAFA pressure ... which is still a possibility ... |
Hello. I have been doing tests. I now get individually named folders in outputs-rvc, but they are empty. The audio file still appears at the folder root and is overwritten each time. I replaced the stableaudio file in src with the new one, but maybe there is something else to swap too? Many thanks |
The audio file appearing at the root comes from the official package. It's a strange decision, and they should stop doing that.
The empty folders are a bug. It doesn't happen for me on Windows, so maybe it's a scipy/wavfile problem, and hopefully it shows up as an error in the logs. Do other tabs save properly, like bark or tortoise?
|
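On the empty output folders discussed here: one way to make such a failure visible is to check the written file right after saving and raise an error if it holds no audio. The sketch below is only an illustration under stated assumptions. It uses the standard library `wave` module rather than scipy, the folder names are made up, and `save_wav` is a hypothetical helper, not the webui's save routine.

```python
import math
import struct
import tempfile
import wave
from pathlib import Path


def save_wav(path, samples, sample_rate):
    """Write mono float samples in [-1, 1] as 16-bit PCM and verify the result."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
    if path.stat().st_size <= 44:  # a file this small holds a header at most
        raise OSError(f"no audio data written to {path}")
    return path


with tempfile.TemporaryDirectory() as tmp:
    rate = 44100
    # A tenth of a second of a 440 Hz tone as stand-in audio.
    tone = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate // 10)]
    out = save_wav(Path(tmp) / "stableaudio" / "example" / "audio.wav", tone, rate)
    size = out.stat().st_size
    print(out.name, size)
```

Raising instead of silently continuing means the problem appears in the logs with the exact path that failed.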
No problem with TTS or bark |
I'm using Rocky Linux 8.10 |
Incidentally, with the latest stable diffusion file, all my outputs are now 48 seconds long, whether the total seconds are set above or below 48. |
|
So I think it could be the parentheses and file system names. I will add a fix for removing the parentheses, but could you try just a simple 'water' and see if that generation gets saved?
OK, that helps a lot to know.
Yes, I always saw that behaviour; have you ever seen it generate a different length? For me, if I put, say, 10s, the audio will be silent but the output is still 48 seconds. I tried online demos and saw the same, so I was waiting for stable diffusion to fix this.
I will check this part. value here means the default value, but it could be related. |
Or maybe not as the sample_rate = 32000 |
I checked the source of Stable Audio and it does seem like sample_size as defined within their code could determine the output length, but the gradio API they have made does not allow changing the length. They have a more internal API but it still seems like their model generates the audio equivalent of 512 by 512. |
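To make the length discussion concrete, here is the arithmetic as a small sketch. The numbers are assumptions: `sample_size` 2097152 and `sample_rate` 44100 are the values I believe the published `model_config.json` for stable-audio-open-1.0 uses, so check your local copy and substitute if it differs. Under those values the clip length is fixed at about 47.55 seconds, which would explain both the "47 seconds" limit and the roughly 48-second files. `trim` is a hypothetical helper showing how an output could be cut to the requested length afterwards.

```python
SAMPLE_SIZE = 2_097_152  # assumed: samples generated per clip
SAMPLE_RATE = 44_100     # assumed: samples per second


def clip_seconds(sample_size=SAMPLE_SIZE, sample_rate=SAMPLE_RATE):
    """Length in seconds of a clip with sample_size samples."""
    return sample_size / sample_rate


def trim(samples, seconds, sample_rate=SAMPLE_RATE):
    """Keep only the first `seconds` of a sequence of samples."""
    keep = max(0, int(round(seconds * sample_rate)))
    return samples[:keep]


print(f"{clip_seconds():.2f} s")  # prints "47.55 s"
```

Trimming after generation does not make generation faster, which is consistent with the earlier observation that shorter requests take the same time.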
I've done so many tests that I am probably getting confused as to what I can do with what service. |
Got it, I see we are doing a lot of back and forth so it might be useful to go on the new discord server. I will be busy for a while but hopefully can do more from there. |
The 48 seconds is interesting because the official limit is 47 seconds. I'm sorry that you can't set a custom length, but the website says it's fixed:
The filenames and folder names are really long; with a prompt you can easily reach the Windows path length limit (260 characters by default). I'd rather say the date-seed format could be more manageable, since the file you're describing is next to it anyway. |
Fixed the filenames:
|
Great! Now you can better manage your folders and files, thank you! |
Even the file names need a little fix. Stable Audio Generator produced such a prompt, and it is not saved because of the characters it contains:
The problem is with the \n:
|
Should be fixed in the latest update #342 |
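The filename problems raised in this thread (parentheses, `\n` inside prompts, very long names) can be illustrated with a small sanitizer. This is a sketch of the general technique, not the fix that went into the update; `safe_name`, `date_seed_name`, and the 80-character cap are assumptions chosen for the example.

```python
import re
from datetime import datetime

MAX_NAME = 80  # assumed cap, well below common file system limits


def safe_name(prompt, max_len=MAX_NAME):
    """Turn a free-form prompt into a single-line, file-system-friendly name."""
    name = re.sub(r"\s+", " ", prompt)    # newlines and tabs become spaces
    name = re.sub(r"[^\w\- ]", "", name)  # drop parentheses, slashes, colons, ...
    name = name.strip().replace(" ", "_")
    return name[:max_len].rstrip("_") or "untitled"


def date_seed_name(seed, when=None):
    """Build a short 'date__seed' name as an alternative to prompt-based names."""
    when = when or datetime.now()
    return f"{when:%Y-%m-%d_%H-%M-%S}__{seed}"


print(safe_name("Rain (heavy)\non a tin roof: 128 BPM"))
# prints "Rain_heavy_on_a_tin_roof_128_BPM"
print(date_seed_name(42, datetime(2024, 7, 10, 12, 37, 0)))
# prints "2024-07-10_12-37-00__42"
```

With a date-seed folder name, the full prompt can live in a metadata file inside the folder, so nothing is lost by keeping the name short.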
Great work! I tested it, the save works perfectly! Thank you! |
Please add Stable Audio to the options, if you please! Thank you very much in advance!
https://github.com/Stability-AI/stable-audio-tools
And model here:
https://huggingface.co/stabilityai/stable-audio-open-1.0