How to load NPZ file of my voice ? #379

Open
RageshAntony opened this issue Jul 5, 2023 · 51 comments

@RageshAntony

I created an NPZ file with this site:
https://huggingface.co/spaces/fffiloni/clone-voice-for-bark

Then I put it in the /assets/prompts/v2/ as ragesh.npz

Then I loaded it like this

audio_array = generate_audio(text_prompt, history_prompt="v2/ragesh")

But I get

ValueError: history prompt not found

Then I tried

audio_array = generate_audio(text_prompt, history_prompt="/path/to/../v2/ragesh")

and still got the same error.

Then I tried
audio_array = generate_audio(text_prompt, history_prompt="/path/to/../v2/ragesh.npz")

Then I got

100%|██████████| 471/471 [00:06<00:00, 75.90it/s]
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-7-defec2c3955c> in <cell line: 13>()
     11      But I also have other interests such as playing tic tac toe.
     12 """
---> 13 audio_array = generate_audio(text_prompt, history_prompt="/content/bark/bark/assets/prompts/v2/ragesh.npz")
     14 
     15 # save audio to disk

2 frames
/usr/local/lib/python3.10/dist-packages/bark/generation.py in generate_coarse(x_semantic, history_prompt, temp, top_k, top_p, silent, max_coarse_history, sliding_window_len, use_kv_caching)
    569             and x_coarse_history.max() <= CODEBOOK_SIZE - 1
    570             and (
--> 571                 round(x_coarse_history.shape[-1] / len(x_semantic_history), 1)
    572                 == round(semantic_to_coarse_ratio / N_COARSE_CODEBOOKS, 1)
    573             )

AssertionError:

Please help me load the NPZ file of my voice.
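
The assertion at generation.py line 571 in the traceback above compares the length of the coarse history with the length of the semantic history. A minimal sketch of that comparison follows, so a generated file can be checked before it is handed to Bark. The rate constants and the array keys semantic_prompt and coarse_prompt are assumptions based on my reading of Bark's source, and check_prompt_shapes and shapes_from_npz are hypothetical helper names, so verify them against your installed version.

```python
# Assumed values of Bark's constants; check them in your generation.py.
SEMANTIC_RATE_HZ = 49.9
COARSE_RATE_HZ = 75
N_COARSE_CODEBOOKS = 2


def check_prompt_shapes(semantic_shape, coarse_shape):
    """Return a list of problems found in a history prompt's array shapes."""
    problems = []
    if len(semantic_shape) != 1 or semantic_shape[0] == 0:
        problems.append(f"semantic_prompt must be 1-D and non-empty, got {semantic_shape}")
    if len(coarse_shape) != 2 or coarse_shape[0] != N_COARSE_CODEBOOKS:
        problems.append(
            f"coarse_prompt must have shape ({N_COARSE_CODEBOOKS}, T), got {coarse_shape}"
        )
    if not problems:
        # Same rounding as the failing assertion: coarse frames per semantic token.
        actual = round(coarse_shape[-1] / semantic_shape[0], 1)
        expected = round(COARSE_RATE_HZ / SEMANTIC_RATE_HZ, 1)
        if actual != expected:
            problems.append(f"coarse/semantic length ratio is {actual}, expected {expected}")
    return problems


def shapes_from_npz(path):
    """Read the two array shapes from an .npz file (needs NumPy)."""
    import numpy as np  # imported here so the shape check above has no dependencies

    with np.load(path) as data:
        return data["semantic_prompt"].shape, data["coarse_prompt"].shape
```

Calling check_prompt_shapes(*shapes_from_npz("/path/to/history_prompt.npz")) returns an empty list when the shapes look consistent. It only checks shapes, not whether the token values are in range or whether the voice will sound right.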

@cybershrapnel

This is easy.
Open your generation.py file.
You will see it expects the file name to be a language prefix, an underscore, "speaker", another underscore, and then a number n.
You need to alter the range so that it includes 10 (range(11) in the code below), and then name your new npz file
en_speaker_10.npz
or 11 if you have more, etc., adjusting the range to match.
Your issue is that you aren't telling your script where the npz file is.

Here is my code change. The only change needed is where it says range(11); then name your file correctly. Or, if you want to call it something else, code that here.

Starting at line 74, at least in my version of generation.py:

ALLOWED_PROMPTS = {"announcer"}
for _, lang in SUPPORTED_LANGS:
    for prefix in ("", f"v2{os.path.sep}"):
        for n in range(11):
            ALLOWED_PROMPTS.add(f"{prefix}{lang}_speaker_{n}")

@RahulBhalley

Hey @RageshAntony!

Sorry, I don't have an answer for you (I just started exploring this repo today).
I wanted to know what code is behind https://huggingface.co/spaces/fffiloni/clone-voice-for-bark?

Best,
Rahul

@cybershrapnel

Also, I tried that npz generator and have not been able to produce a working npz with it. I get different errors every time. It generates the npz files, but they don't work.

@cybershrapnel

A little update: I followed the API in that link to this endpoint
https://fffiloni-clone-voice-for-bark.hf.space/
and it did generate a working npz, though it took a lot longer. I think the other one isn't passing the audio file.
But it still didn't really work. It was very garbled at the beginning, and then it still sounded like voice 6, only deeper.

@RageshAntony
Author

RageshAntony commented Jul 8, 2023

and it did generate a working npz, and it took a lot longer. I think the other one isn't passing the audio file.??

How did you use it? I renamed the file to "en_speaker_10.npz" and loaded it as "v2/en_speaker_10", but I still get the "history prompt not found" error.

@RahulBhalley

I renamed as "en_speaker_10.npz" and loaded as "v2/en_speaker_10", but still get "history prompt not found" error

Same issue with me.

@RahulBhalley

Ahh, the issue is in lines 71 to 75 of generation.py. The ALLOWED_PROMPTS set restricts the speaker names, so ours is not included in it and ValueError("history prompt not found") is raised.
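
To see which names pass that check, the loop quoted earlier in the thread can be rebuilt in isolation. This is only a sketch: SUPPORTED_LANGS here is a shortened, hypothetical stand-in for Bark's real list, build_allowed_prompts is a hypothetical helper, and the default of 10 speakers is an assumption inferred from the advice to use number 9 or lower.

```python
import os

# Shortened, hypothetical stand-in for Bark's SUPPORTED_LANGS list.
SUPPORTED_LANGS = [("English", "en"), ("German", "de")]


def build_allowed_prompts(n_speakers=10):
    """Rebuild the set of built-in prompt names the name check accepts."""
    allowed = {"announcer"}
    for _, lang in SUPPORTED_LANGS:
        for prefix in ("", f"v2{os.path.sep}"):
            for n in range(n_speakers):
                allowed.add(f"{prefix}{lang}_speaker_{n}")
    return allowed


allowed = build_allowed_prompts()
for name in ("ragesh", "en_speaker_9", "en_speaker_10"):
    key = f"v2{os.path.sep}{name}"
    print(key, "accepted" if key in allowed else "rejected")
```

A custom name such as ragesh is never in the set, whatever the range is, which matches the "history prompt not found" error reported above.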

@RahulBhalley

The voice cloning is not working. :(

@RageshAntony
Author

@RahulBhalley
I deleted that if block, but now I get an assertion error.

Maybe there is some issue with the generated NPZ, or Bark doesn't support it.

@cybershrapnel

You two didn't listen to a word I said.
You need to edit generation.py to allow the array to go to 11, ffs.
Otherwise, rename your npz file as number 9 or 8 if you don't know how to edit an array. But you shouldn't be editing this py file if you don't know how to read an array.

@cybershrapnel

It played a lot of music and was not the voice it was supposed to be, even when prompted to be male, so... I dunno.

@cybershrapnel

However, that does mean the npz generator is working. I suspect it is just very picky, i.e. you need to speak more clearly, remove background noise, and maybe say a specific phrase instead of a generic random one.

@cybershrapnel

if you want to follow my progress on this I've been actively working on this for a bit now :P

https://www.xtdevelopment.net/audio/

@cybershrapnel

OK, I got a better result this time, but it still makes a lot of music noises. I used my own voice to make the npz file and prompted bark with [man] in the text, and it still sounds like voice 6, female... So I dunno...
https://github.com/suno-ai/bark/assets/17352697/bf6ab70e-4f65-4bcf-a027-f468b3729514

@RageshAntony
Author

RageshAntony commented Jul 9, 2023

@cybershrapnel
I already did that. I extended the range to 15:

ALLOWED_PROMPTS = {"announcer"}
for _, lang in SUPPORTED_LANGS:
    for prefix in ("", f"v2{os.path.sep}"):
        for n in range(15):
            ALLOWED_PROMPTS.add(f"{prefix}{lang}_speaker_{n}")

print(ALLOWED_PROMPTS)

The output includes v2/en_speaker_10 as well.

But I still got "history prompt not found".

Only then did I remove the if block that throws the error.

Then I got the assertion error.

I attached the NPZ file for reference:
en_speaker_10.npz.zip

cc: @RahulBhalley

@cybershrapnel

That's wrong.

@cybershrapnel

cybershrapnel commented Jul 9, 2023

You don't need to edit your generation.py file if you don't understand it. Set it back the way it was and rename your file to en_speaker_9.npz.
Do not change generation.py.
Then call it in your script like this:

# Set up sample rate
SAMPLE_RATE = 22050
HISTORY_PROMPT = "en_speaker_9"
SPEAKER = HISTORY_PROMPT

Or, if you really want it in the v2 folder:

# Set up sample rate
SAMPLE_RATE = 22050
HISTORY_PROMPT = "v2/en_speaker_9"
SPEAKER = HISTORY_PROMPT

The v2 part is not important; that's just the directory.

@RahulBhalley

RahulBhalley commented Jul 9, 2023

If you don’t care about keeping the speakers that are already present, simply reuse one of their names.

My issue was that the voice didn’t sound like it should have.

@cybershrapnel

Same. Either the npz is not correct or the models don't support it; I'm not sure which.

@RahulBhalley

RahulBhalley commented Jul 9, 2023

I think the correct npz file is being loaded. And the model should support every voice if it’s trained on a humongous dataset like VALL-E.

But I’m doubtful about the way the latent features are extracted from the voice. Maybe that part has some issue. It’s not able to capture the timbre information from my voice.

Furthermore, sometimes the same speaker sounds different. The team should provide an argument to control the randomness of each inference, like the generator argument HuggingFace gives for Stable Diffusion. Bark won’t be useful if the speaking style and voice vary unpredictably on every inference.
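
Until such an argument exists, one common workaround is to seed the global random number generators before each call. The sketch below assumes Bark's sampling draws from the global PyTorch generator, which this thread does not confirm, and seed_everything is a hypothetical helper name. Even with fixed seeds, some GPU kernels are not deterministic, so identical output is not guaranteed.

```python
import random


def seed_everything(seed):
    """Seed the global RNGs; NumPy and PyTorch are seeded only if installed."""
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch

        torch.manual_seed(seed)  # seeds the default PyTorch generator
    except ImportError:
        pass
```

A typical use would be to call seed_everything(0) immediately before generate_audio(...) and reuse the same seed whenever the same take is wanted.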

@cybershrapnel

Agreed, but I've heard examples of bark using other voices, so... Does anyone have any example npz files we can play with?

@RageshAntony
Author

@cybershrapnel

I did the same, but I'm still getting the assertion error.

(image attachment)

@RahulBhalley

but Ive heard examples with bark using other voices.

Could you please give me some links? I started generating speech right away instead of listening to what others had generated.

@cybershrapnel

Why do you keep using v2 in the path? I'm not trying to be a jerk, but you clearly don't understand very basic coding concepts.
Stop trying to call it v2, or if you are going to call it v2, put the file in the v2 folder. You have nothing but a path issue, which we can't help you with. Paths are a very basic coding principle. You need to learn the basics of paths before you go any further. Your only issue is the file path, that's it. Nothing else is going on except that you are pathing your file wrong.

@RageshAntony
Author

@cybershrapnel
Well, I have 7 years of coding experience. Let me tell you in detail what I did:

  1. I ran the sample code with the default speaker "v2/en_speaker_9". It executed and played successfully.
  2. Then I overwrote the original en_speaker_9.npz with my own "en_speaker_9.npz".
  3. Then I ran it again.
  4. This time I got the assertion error.

@cybershrapnel

v2 npzs are different, I think, which would explain your error.
That generator clearly makes v1 npzs.

@RageshAntony
Author

@cybershrapnel
Maybe. Let me check.

@RageshAntony
Author

@cybershrapnel
I replaced the "en_speaker_9" file inside the prompts folder.

I still get the assertion error.

@RageshAntony
Author

let me check with the clone_voice.ipynb notebook

@cybershrapnel

You need to reinstall. You clearly messed something up if that's not working, because it still sounds like you're having a path issue. I tried it under both the normal and v2 folders and it worked.

@cybershrapnel

Also, keep in mind there have been serious changes to this repo lately, and I think they broke it. I rolled back to an older version because of the memory issues the new versions introduce on cards with 8 GB or less. I think that's due to the increased inference speed, but I'm not sure. With the current version I get a lot of garbled audio; the old version is almost perfect but very slow.

@EricKong1985

EricKong1985 commented Jul 11, 2023

warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
100%|██████████| 92/92 [00:00<00:00, 112.01it/s]
Traceback (most recent call last):

  File "C:\Python310\lib\site-packages\bark\api.py", line 113, in generate_audio
    out = semantic_to_waveform(
  File "C:\Python310\lib\site-packages\bark\api.py", line 54, in semantic_to_waveform
    coarse_tokens = generate_coarse(
  File "C:\Python310\lib\site-packages\bark\generation.py", line 571, in generate_coarse
    round(x_coarse_history.shape[-1] / len(x_semantic_history), 1)
AssertionError

Process finished with exit code 1

I followed this thread to clone my voice and then hit this error. Does anyone know how to fix it?

@jn-jairo
Contributor

jn-jairo commented Jul 14, 2023

Regarding loading the npz: you must pass the full path, including the .npz extension, like this:

from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav

preload_models()

prompt = "Hello, my name is Suno. And, uh — and I like pizza. [laughs] But I also have other interests such as playing tic tac toe."

history_prompt = "/path/to/history_prompt.npz"

audio_array = generate_audio(prompt, history_prompt=history_prompt)

write_wav("/path/to/audio.wav", SAMPLE_RATE, audio_array)

About the other error: your npz is not valid, and you should seek help wherever you created it.

P.S.: Looking at the link you provided, they use a technique I tried before that doesn't work. As far as I know, there is no reliable method to really clone your voice. I tried the hubert based method mentioned below and it works fine.
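
The two behaviours reported in this thread (a bare name is checked against the built-in list, while a string ending in .npz is treated as a file path) can be sketched as follows. This is an illustration of the behaviour described here, not Bark's actual code; resolve_history_prompt, allowed_prompts and assets_dir are hypothetical names.

```python
import os


def resolve_history_prompt(prompt, allowed_prompts, assets_dir):
    """Map a history_prompt string to the file that would be loaded."""
    if prompt.endswith(".npz"):
        # Explicit file path: used as-is, with no name check.
        return prompt
    # Bare name: normalise the separators, then require a built-in name.
    name = os.path.join(*prompt.split("/"))
    if name not in allowed_prompts:
        raise ValueError("history prompt not found")
    return os.path.join(assets_dir, f"{name}.npz")
```

Under this reading, "/path/to/../v2/ragesh" fails the name check even though the file exists, and adding the .npz extension skips the check, which is why the error in the original post changed from ValueError to AssertionError.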

@JonathanFly
Contributor

You're probably using the old cloning tech which produced invalid .npz files very often. Use the new hubert based methods. Most popular Bark UIs have it built in (including mine) and the original repo is https://github.com/gitmylo/bark-voice-cloning-HuBERT-quantizer

@JonathanFly
Contributor

JonathanFly commented Jul 17, 2023

ok, i got a better result this time. but it still makes a lot of music noises and it was my voice I used to make the npz file and I prompted bark with [man] in the text and it still sound like voice 6 female... So I dunno... https://github.com/suno-ai/bark/assets/17352697/bf6ab70e-4f65-4bcf-a027-f468b3729514

Crafting a voice with text prompts is generally done by starting from a random voice and then saving the bark output as a new .npz file. If you're cloning, the text prompt isn't going to shape the voice much. It does shape it somewhat, so you can save the sample again and make a new version. For example, here's a variant of v2/en_speaker_3 I modified to speak faster. But that's a lot more fiddly.
v2_en_speaker_3_double_expresso.zip

en_speaker_03_double_expresso.mp4

As an example of voice crafting try using a random voice (no history_prompt, no .npz file) with this prompt:
Listen to my soothing, relaxing voice. Breathe calmly in, and out. Slowly close your eyes. Continue to breathe at this slow pace. Feel the air expand your lungs with each in breath.

You'll get a very high percentage of slow calm female voices.
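
Saving a generated voice for reuse might look like the sketch below. It assumes that generate_audio accepts output_full=True and then returns a (prompt dictionary, audio array) pair, and that bark.api provides save_as_prompt; both are my understanding of Bark's API rather than something stated in this thread, and craft_voice is a hypothetical helper name.

```python
def craft_voice(text, out_path):
    """Generate speech with a random voice and save that voice as a prompt."""
    if not out_path.endswith(".npz"):
        raise ValueError("out_path must end with .npz")

    # Imported here because loading Bark is slow and needs the models downloaded.
    from bark import generate_audio, preload_models
    from bark.api import save_as_prompt

    preload_models()
    # No history_prompt is passed, so Bark picks a random voice.
    # Assumption: output_full=True returns (prompt_dict, audio_array).
    full_generation, audio_array = generate_audio(text, output_full=True)
    save_as_prompt(out_path, full_generation)
    return audio_array
```

The saved file can then be passed back as history_prompt="/path/to/voice.npz" to keep the same voice across generations.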

@JeavanCode

Hi guys, I am using bark with the Hugging Face transformers API, and when I tried to pass history_prompt I got the error "TypeError: string indices must be integers, not 'str'". It seems the transformers API expects a dictionary-like type whose values must be torch.tensor. Any clues on how to solve the problem?

@jn-jairo
Contributor

Hi guys, I am using bark with the Hugging Face transformers API, and when I tried to pass history_prompt I got the error "TypeError: string indices must be integers, not 'str'". It seems the transformers API expects a dictionary-like type whose values must be torch.tensor. Any clues on how to solve the problem?

@JeavanCode just pass the path of the npz file. Note that it often complains about custom npz files that work fine with the original bark. It looks like if the npz is not in exactly the format it needs, it does nothing to crop the data the way the original bark does.

from scipy.io import wavfile
from transformers import AutoProcessor, BarkModel

processor = AutoProcessor.from_pretrained("suno/bark-small")
model = BarkModel.from_pretrained("suno/bark-small")

voice_preset = "/path/to/history_prompt.npz"

inputs = processor("Hello, my dog is cute, I need him in my life", voice_preset=voice_preset)

audio_array = model.generate(**inputs, semantic_max_new_tokens=100)
audio_array = audio_array.cpu().numpy().squeeze()

sample_rate = model.generation_config.sample_rate
wavfile.write("/path/to/audio.wav", sample_rate, audio_array)

@JeavanCode

Thanks! I didn't know I needed to pass the file as voice_preset to the processor instead of passing history_prompt to model.generate. By the way, how did you figure it out? Is there documentation or a handbook somewhere? I always get confused when calling APIs like BarkModel.from_pretrained("suno/bark-small"); I don't understand how to trace back through code like this.

@jn-jairo
Contributor

Thanks! I didn't know I needed to pass the file as voice_preset to the processor instead of passing history_prompt to model.generate. By the way, how did you figure it out? Is there documentation or a handbook somewhere? I always get confused when calling APIs like BarkModel.from_pretrained("suno/bark-small"); I don't understand how to trace back through code like this.

Documentation https://huggingface.co/docs/transformers/model_doc/bark
and source code https://github.com/huggingface/transformers

@Maverick1983

Hi,
I created an npz file with a cloned Italian voice, but it doesn't sound good in Italian. Do I need to create a new hubert base model and then train it on audio?

@jn-jairo
Contributor

jn-jairo commented Nov 3, 2023

Hi,
I created an npz file with a cloned Italian voice, but it doesn't sound good in Italian. Do I need to create a new hubert base model and then train it on audio?

Yes, to clone Italian you need a hubert model specific for Italian.

@Maverick1983

Maverick1983 commented Nov 3, 2023 via email

@jn-jairo
Contributor

jn-jairo commented Nov 3, 2023

How can I get a guide to train a hubert base model?

https://github.com/gitmylo/bark-voice-cloning-HuBERT-quantizer#how-do-i-train-it-myself

@Maverick1983

Maverick1983 commented Nov 3, 2023 via email

@jn-jairo
Contributor

jn-jairo commented Nov 3, 2023

I already did it, but the base model is in English. I mean, how can I create a base hubert model in Italian for training other speakers?

Read the "How do I train it myself?" section; it explains how to create a new model in any language you want.

@Maverick1983

Maverick1983 commented Nov 4, 2023 via email

@jn-jairo
Contributor

jn-jairo commented Nov 5, 2023

@Maverick1983 Looks like you are having trouble finding it, so I will copy and paste it here:

https://github.com/gitmylo/bark-voice-cloning-HuBERT-quantizer#how-do-i-train-it-myself


How do I train it myself?

Simply run the training commands.

A simple way to create semantic data and wavs for training, is with my script: bark-data-gen. But remember that the creation of the wavs will take around the same time if not longer than the creation of the semantics. This can take a while to generate because of that.

For example, say you have a dataset of zips containing audio files, one zip for the semantics and one for the wav files, inside a folder called "Literature":

You should run process.py --path Literature --mode prepare for extracting all the data to one directory

You should run process.py --path Literature --mode prepare2 for creating HuBERT semantic vectors, ready for training

You should run process.py --path Literature --mode train for training

And when your model has trained enough, you can run process.py --path Literature --mode test to test the latest model.
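
The four steps above can be chained from a small script. This is only a sketch: it assumes process.py sits in the current directory and is run with the current Python interpreter, and build_commands and run_pipeline are hypothetical helpers. The train step may run for a long time, so in practice you may prefer to start each step by hand.

```python
import subprocess
import sys

# The four modes, in the order the quoted instructions list them.
MODES = ("prepare", "prepare2", "train", "test")


def build_commands(dataset_path, script="process.py"):
    """Return one argv list per step; nothing is executed here."""
    return [
        [sys.executable, script, "--path", dataset_path, "--mode", mode]
        for mode in MODES
    ]


def run_pipeline(dataset_path):
    """Run the steps one after another, stopping at the first failure."""
    for argv in build_commands(dataset_path):
        subprocess.run(argv, check=True)
```

For the example folder, run_pipeline("Literature") would issue the same commands as the text above.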


To create the dataset, use this repository as an example, but CHANGE THE BOOKS TO ITALIAN BOOKS so it works with ITALIAN:

https://github.com/gitmylo/bark-data-gen


After you do all these things, you will have a PTH file for ITALIAN.

@Maverick1983

Maverick1983 commented Nov 5, 2023 via email

@jn-jairo
Contributor

jn-jairo commented Nov 6, 2023

I already did it... But it does not speak good Italian.

It works for others; try opening an issue at https://github.com/gitmylo/bark-voice-cloning-HuBERT-quantizer
