How to load NPZ file of my voice? #379
Comments
This is easy. Here is an example of my code change. The only change needed is where it says range(11), and then name your file to match. Or, if you want to call it something else, code that here. Starting at line 74 (in my version of generation.py, at least): ALLOWED_PROMPTS = {"announcer"} |
Hey @RageshAntony! Sorry, I don't have an answer for you (I just started exploring this repo today). Best, |
also, i tried that npz generator, I have not been able to produce a working npz with it. different errors every time... it generates the npz but they don't work... |
a little update. i followed the api in that link to this endpoint |
How did you use it? I renamed as "en_speaker_10.npz" and loaded as "v2/en_speaker_10", but still get "history prompt not found" error |
Same issue with me. |
Ahh, the issue is in line 71 to 75 in generation.py. The |
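A minimal sketch of why a bare name such as v2/en_speaker_10 or v2/ragesh can be rejected with "history prompt not found". It assumes the built-in names are assembled roughly the way the ALLOWED_PROMPTS snippet quoted earlier in this thread suggests: a fixed set plus a numbered loop per language. The language list is a shortened, hypothetical stand-in, range(10) is an assumption about the unmodified file, and the real generation.py may differ.

```python
# Simplified stand-in for the name check; this is not Bark's actual code.
LANGS = ["en", "de", "it"]  # hypothetical subset of the supported languages


def build_allowed_prompts(n_speakers=10):
    """Return the set of built-in prompt names with n_speakers per language."""
    allowed = {"announcer"}
    for lang in LANGS:
        for prefix in ("", "v2/"):
            for n in range(n_speakers):  # range(10) yields 0..9 only
                allowed.add(f"{prefix}{lang}_speaker_{n}")
    return allowed


def is_builtin_name(name, allowed):
    """True if a bare name (no .npz suffix) would pass the lookup."""
    return name in allowed


ALLOWED = build_allowed_prompts()
```

Under these assumptions, v2/en_speaker_9 passes while v2/en_speaker_10 and v2/ragesh do not, which matches the workaround of either widening the range or reusing an existing speaker name.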
The voice cloning is not working. :( |
@RahulBhalley Maybe some issue with Generated NPZ or bark not supporting it |
u two didn't listen to a word I said. |
This was the best result I could get |
it played a lot of music and was not the voice it was supposed to be, even when prompted to be male, so.. i dunno |
however, that does mean that the npz generator is working. I suspect it is just very picky, i.e., you need to speak more clearly, remove background noise, and maybe say a specific phrase instead of a generic random one |
if you want to follow my progress on this I've been actively working on this for a bit now :P |
ok, i got a better result this time. but it still makes a lot of music noises, and it was my voice I used to make the npz file, and I prompted bark with [man] in the text and it still sounds like voice 6 female... So I dunno... |
@cybershrapnel I added speakers up to 15.
The output includes v2/en_speaker_10 as well, but I still got "History prompt not found". Only then did I remove the IF block that throws the error, and then I got an assertion error. I attached the NPZ file for reference. cc: @RahulBhalley |
thats wrong |
u dont need to edit your generation.py file. if you don't understand, set it back the way it was and rename ur file as en_speaker_9.npz
# Set up sample rate
SAMPLE_RATE = 22050
or if you really want it in the v2 folder
# Set up sample rate
SAMPLE_RATE = 22050
the v2 thing is not important. thats the directory |
If you don’t care about using other speakers already present, simply use any of those names. My issue was that voice didn’t sound like it should have. |
same, the npz is not correct or the models don't support it, not sure which |
I think the correct npz file is being loaded. And the model should support every voice if it’s trained on a humongous dataset like VALL-E. But I’m doubtful about the way the latent features are extracted from the voice. Maybe that part has some issue. It’s not able to capture the timbre information from my voice. Furthermore, sometimes the same speaker sounds different. The team should provide an argument to control the randomness of each inference, like HuggingFace gives for Stable Diffusion (that |
agreed, but Ive heard examples with bark using other voices. so... Does anyone have any example npz files we can play with? |
Did the same. But still getting Assertion error |
Could you please give me some links? I straightaway started generating speech instead of looking at other’s generated speeches. |
why do you keep using v2 in the path. Im not trying to be a jerk, but you clearly dont understand very basic coding concepts. |
@cybershrapnel
|
v2 npzs are different, i think. that would explain your error. |
@cybershrapnel |
@cybershrapnel Still get assertion error |
let me check with the clone_voice.ipynb notebook |
you need to reinstall, u clearly messed something up if thats not working, because it still sounds like ur having a path issue. i tried it under both normal and v2 and it worked |
also, keep in mind there have been serious changes to this repo lately, and I think they broke it. I rolled back to an older version because of the memory issues the new versions introduce on 8 GB and smaller cards. i think its due to the increased speed in inference but not sure. when i use the current version, i get a lot of garbled audio. the old version is almost perfect but very slow |
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
File "C:\Python310\lib\site-packages\bark\api.py", line 113, in generate_audio
Process finished with exit code 1 |
Regarding loading the npz: you must pass the full path, including the .npz extension, as in this example:
from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav

# Download and load the models
preload_models()

prompt = "Hello, my name is Suno. And, uh — and I like pizza. [laughs] But I also have other interests such as playing tic tac toe."
# Full path to the custom voice file, including the .npz extension
history_prompt = "/path/to/history_prompt.npz"
audio_array = generate_audio(prompt, history_prompt=history_prompt)
# Save the generated audio to disk
write_wav("/path/to/audio.wav", SAMPLE_RATE, audio_array)
About the other error: your npz is not correct, and you should seek help where you created that npz. PS: Looking at the link you provided, they use a technique I tried before but it doesn't work, |
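Since the remaining assertion error points at the contents of the file rather than its path, a quick inspection of the .npz can narrow things down. The sketch below uses only NumPy. The three array names and the numeric limits are assumptions based on a reading of Bark's generation.py, not something confirmed in this thread, and check_history_prompt is a hypothetical helper, so verify the constants against your installed version.

```python
import numpy as np

# Assumed limits from a reading of Bark's generation.py; verify locally.
SEMANTIC_VOCAB_SIZE = 10_000
CODEBOOK_SIZE = 1024
N_COARSE_CODEBOOKS = 2
N_FINE_CODEBOOKS = 8


def check_history_prompt(path):
    """Return a list of problems found in a voice .npz (empty list = looks OK)."""
    with np.load(path) as data:
        arrays = {key: data[key] for key in data.files}

    problems = [
        f"missing array: {key}"
        for key in ("semantic_prompt", "coarse_prompt", "fine_prompt")
        if key not in arrays
    ]
    if problems:
        return problems

    # Semantic tokens: a non-empty 1-D array of ids within the vocabulary.
    sem = arrays["semantic_prompt"]
    if sem.ndim != 1 or sem.size == 0:
        problems.append(f"semantic_prompt has shape {sem.shape}, expected 1-D")
    elif sem.min() < 0 or sem.max() >= SEMANTIC_VOCAB_SIZE:
        problems.append("semantic_prompt values out of range")

    # Coarse and fine tokens: 2-D arrays with one row per codebook.
    expected_rows = (
        ("coarse_prompt", N_COARSE_CODEBOOKS),
        ("fine_prompt", N_FINE_CODEBOOKS),
    )
    for key, rows in expected_rows:
        arr = arrays[key]
        if arr.ndim != 2 or arr.shape[0] != rows or arr.shape[1] == 0:
            problems.append(f"{key} has shape {arr.shape}, expected ({rows}, N)")
        elif arr.min() < 0 or arr.max() >= CODEBOOK_SIZE:
            problems.append(f"{key} values out of range")
    return problems
```

Calling check_history_prompt("/path/to/history_prompt.npz") and printing the result shows which array, if any, deviates from the assumed layout.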
You're probably using the old cloning tech which produced invalid .npz files very often. Use the new hubert based methods. Most popular Bark UIs have it built in (including mine) and the original repo is https://github.com/gitmylo/bark-voice-cloning-HuBERT-quantizer |
Crafting a voice with text prompts is generally done with a random voice, and then saving the bark output as a new .npz file. If you're cloning, the text prompt isn't going to shape the voice much. It does shape it somewhat, so you can save the sample again and make a new version. For example, here's a variant of v2/en_speaker_3 I modified to speak faster. But that's a lot more fiddly.
en_speaker_03_double_expresso.mp4
As an example of voice crafting, try using a random voice (no history_prompt, no .npz file) with this prompt: You'll get a very high percentage of slow calm female voices. |
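The "save the output as a new .npz file" step amounts to writing a few token arrays to disk and reading them back later. Here is a minimal round-trip sketch with NumPy alone. The three key names are the ones Bark voice files are assumed to use, and save_voice and load_voice are hypothetical helper names, not part of Bark.

```python
import numpy as np

# Array names a Bark voice file is assumed to contain.
PROMPT_KEYS = ("semantic_prompt", "coarse_prompt", "fine_prompt")


def save_voice(path, full_generation):
    """Write the three token arrays of a generation to a .npz voice file."""
    if not path.endswith(".npz"):
        raise ValueError("voice files must end in .npz")
    missing = [key for key in PROMPT_KEYS if key not in full_generation]
    if missing:
        raise KeyError(f"generation is missing: {missing}")
    np.savez(path, **{key: np.asarray(full_generation[key]) for key in PROMPT_KEYS})


def load_voice(path):
    """Read a voice file back into a plain dict of NumPy arrays."""
    with np.load(path) as data:
        return {key: data[key] for key in PROMPT_KEYS}
```

Saving a slightly different take under a new file name each time keeps earlier variants available for comparison.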
Hi guys, I am using bark with the Hugging Face transformers API, and when I tried to pass history_prompt I got the error "TypeError: string indices must be integers, not 'str'". It seems that the transformers API expects a dictionary-like type whose values must be torch.Tensor. Any clues to solve the problem? |
@JeavanCode just pass the path of the npz file, but it usually complains about custom npz files that work fine with the original bark. It looks like, if the npz is not exactly in the format it needs, it does nothing to crop the data the way the original bark does.
from scipy.io import wavfile
from transformers import AutoProcessor, BarkModel

processor = AutoProcessor.from_pretrained("suno/bark-small")
model = BarkModel.from_pretrained("suno/bark-small")

# The custom voice goes to the processor as voice_preset, not to model.generate
voice_preset = "/path/to/history_prompt.npz"
inputs = processor("Hello, my dog is cute, I need him in my life", voice_preset=voice_preset)

audio_array = model.generate(**inputs, semantic_max_new_tokens=100)
audio_array = audio_array.cpu().numpy().squeeze()

# Save the generated audio at the model's sample rate
sample_rate = model.generation_config.sample_rate
wavfile.write("/path/to/audio.wav", sample_rate, audio_array) |
Thanks! I didn't know I needed to pass the path as voice_preset to the processor instead of passing history_prompt to model.generate. BTW, how did you figure it out? Is there a document or handbook or something? I always get confused when calling APIs like BarkModel.from_pretrained("suno/bark-small"); I don't understand how to trace back through code like this. |
Documentation https://huggingface.co/docs/transformers/model_doc/bark |
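Hi, I created an npz file with an Italian cloned voice, but it's not good with the Italian language. Do I need to create a new HuBERT base model and then train on audio? |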
Yes, to clone Italian you need a hubert model specific for Italian. |
How can I get a guide to train a HuBERT base model?
On Fri, 3 Nov 2023 at 15:51, Jairo Correa ***@***.***> wrote:
… Hi,
I created npz file with italian clone voice, but it's not good with
italian language. I need to create a new hubert base model and after I will
train audio?
Yes, to clone Italian you need a hubert model specific for Italian.
|
https://github.com/gitmylo/bark-voice-cloning-HuBERT-quantizer#how-do-i-train-it-myself |
I already did that, but the base model is in English.
I mean, how can I create a base HuBERT model in Italian for training on other speakers?
|
Read the "How do I train it myself?" section; it explains how to create a new model in any language you want. |
I repeat: I did that, but it's not good with Italian, because the base .pth is in English.
|
@Maverick1983 Looks like you are having trouble finding it, so I will copy and paste it here: https://github.com/gitmylo/bark-voice-cloning-HuBERT-quantizer#how-do-i-train-it-myself
How do I train it myself?
Simply run the training commands. A simple way to create semantic data and wavs for training is with my script: bark-data-gen. But remember that the creation of the wavs will take around the same time, if not longer, than the creation of the semantics. This can take a while to generate because of that.
For example, if you have a dataset with zips containing audio files, one zip for semantics and one for the wav files, inside of a folder called "Literature":
You should run process.py --path Literature --mode prepare for extracting all the data to one directory.
You should run process.py --path Literature --mode prepare2 for creating HuBERT semantic vectors, ready for training.
You should run process.py --path Literature --mode train for training.
And when your model has trained enough, you can run process.py --path Literature --mode test to test the latest model.
To create the dataset, use this repository as an example, but CHANGE THE BOOKS TO ITALIAN BOOKS so it works with ITALIAN: https://github.com/gitmylo/bark-data-gen
After you do all these things you will have a PTH file for ITALIAN |
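The four process.py stages named in the quoted README (prepare, prepare2, train, test) run in a fixed order. A small sketch that builds those command lines for a dataset folder follows. build_commands and run_all are hypothetical helpers, the script is assumed to sit in the current working directory, and running everything unattended assumes the train stage stops on its own, which may not hold.

```python
import subprocess
import sys

# Stage names as they appear in the quoted README, in execution order.
MODES = ("prepare", "prepare2", "train", "test")


def build_commands(dataset_dir, script="process.py"):
    """Return one argument list per stage, in the order they should run."""
    return [
        [sys.executable, script, "--path", dataset_dir, "--mode", mode]
        for mode in MODES
    ]


def run_all(dataset_dir, script="process.py"):
    """Run every stage in sequence, stopping at the first failure."""
    for cmd in build_commands(dataset_dir, script):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

For an Italian model, the folder passed as dataset_dir would hold the data generated from Italian text, for example run_all("Literature").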
I already did that... but it does not speak good Italian.
On Sun, 5 Nov 2023 at 03:19, Jairo Correa ***@***.***> wrote:
… @Maverick1983 <https://github.com/Maverick1983> Looks like you are having
trouble finding it so I will copy and paste it here
https://github.com/gitmylo/bark-voice-cloning-HuBERT-quantizer#how-do-i-train-it-myself
------------------------------
How do I train it myself?
Simply run the training commands.
A simple way to create semantic data and wavs for training, is with my
script: bark-data-gen <https://github.com/gitmylo/bark-data-gen>. But
remember that the creation of the wavs will take around the same time if
not longer than the creation of the semantics. This can take a while to
generate because of that.
For example, if you have a dataset with zips containing audio files, one
zip for semantics, and one for the wav files. Inside of a folder called
"Literature"
You should run process.py --path Literature --mode prepare for extracting
all the data to one directory
You should run process.py --path Literature --mode prepare2 for creating
HuBERT semantic vectors, ready for training
You should run process.py --path Literature --mode train for training
And when your model has trained enough, you can run process.py --path
Literature --mode test to test the latest model.
------------------------------
To create the dataset use this repository as example but CHANGE THE BOOKS
TO ITALIAN BOOKS so it works with ITALIAN
https://github.com/gitmylo/bark-data-gen
------------------------------
After you do all this things you will have a PTH file for ITALIAN
|
It works for others; try opening an issue in the https://github.com/gitmylo/bark-voice-cloning-HuBERT-quantizer repository |
I created a NPZ file via this site
https://huggingface.co/spaces/fffiloni/clone-voice-for-bark
Then I put it in the /assets/prompts/v2/ as ragesh.npz
Then I loaded it like this
audio_array = generate_audio(text_prompt, history_prompt="v2/ragesh")
But I get
ValueError: history prompt not found
Then I tried like
audio_array = generate_audio(text_prompt, history_prompt="/path/to/../v2/ragesh") and still the same error
Then I tried like
audio_array = generate_audio(text_prompt, history_prompt="/path/to/../v2/ragesh.npz")
Then I got
Please help me to load NPZ file of my voice