This repository has been archived by the owner on May 28, 2019. It is now read-only.

New Language #23

Closed
gaziway opened this issue Oct 23, 2017 · 4 comments

Comments

@gaziway

gaziway commented Oct 23, 2017

First of all, thank you for releasing the code.
I would like to know how difficult it would be to train on a speaker's data in a new language such as Turkish. As far as I saw, the generation step needs some kind of pronunciation dictionary. But what about the pre-processing steps, Merlin and the other tools: are they language agnostic? Thank you in advance.

@adampolyak
Contributor

Yes, it is possible.

Preparing new data requires 2 steps:

  1. Extract phonemes - it is possible to do so using https://github.com/bootphon/phonemizer. The documentation suggests that Turkish is supported.

  2. Extract acoustic features - these features are language agnostic. You can extract them using this script - just update the relevant paths to the tools directory downloaded in this repo.
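For readers following step 1, here is a minimal sketch of how phonemizer output might be turned into integer ids. It assumes the `phonemize` function from the phonemizer package with the espeak backend and the language code `tr`. The helper names (`phonemize_turkish`, `build_symbol_table`, `encode`) and the one-id-per-character mapping are hypothetical; the thread does not say which phoneme inventory or encoding the model expects, so treat this as an illustration rather than the repo's method.

```python
def phonemize_turkish(text):
    """Return a phoneme string for Turkish text (assumes phonemizer + espeak)."""
    # Imported lazily so the helpers below work without phonemizer installed.
    from phonemizer import phonemize
    return phonemize(text, language="tr", backend="espeak", strip=True)


def build_symbol_table(phoneme_strings):
    """Map every distinct character seen in the corpus to an integer id.

    Assumption: one character equals one symbol. Multi-character phones
    would be split by this simplification.
    """
    symbols = sorted({ch for s in phoneme_strings for ch in s})
    return {sym: idx for idx, sym in enumerate(symbols)}


def encode(phoneme_string, table):
    """Convert a phoneme string into a list of integer ids."""
    return [table[ch] for ch in phoneme_string]


# Usage (requires phonemizer and espeak to be installed):
#   strings = [phonemize_turkish(line) for line in transcript_lines]
#   table = build_symbol_table(strings)
#   ids = [encode(s, table) for s in strings]
```

The symbol table is built from the whole corpus first so that the same symbol always maps to the same id across utterances.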

@gaziway
Author

gaziway commented Oct 31, 2017

Thank you for your answer; I am still working through it.
Let me summarize what I understand so that you can correct me if needed.
New data is prepared by extract_feats.py, which accepts folders of txt and wav files as input.
So the natural next question is how to combine the steps you proposed with extract_feats.py.
One alternative is:

  1. Replace the content of the original text files with the phoneme codes produced by the phonemizer tool.
  2. To extract acoustic features, try to combine the code from your second point with extract_feats.py.

@adampolyak
Contributor

You can try to run extract_feats.py as usual and then simply update the generated npz:

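import numpy as np

# Load the npz written by extract_feats.py into a mutable dict
save_dict = dict(np.load(npz_path))
# phoneme_features is a placeholder for the phonemizer output (run phonemizer here)
save_dict['text_features'] = np.array(phoneme_features)
np.savez_compressed(new_npz_path, **save_dict)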
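The npz update above can be wrapped in a small function and tried on a throwaway file. This is only a sketch: the function name `replace_text_features`, the `audio_features` key, and the `int64` dtype are assumptions made for illustration. The thread only confirms that the npz holds a `text_features` entry.

```python
import os
import tempfile

import numpy as np


def replace_text_features(npz_path, new_npz_path, phoneme_ids):
    """Copy an npz file, swapping in new text features and keeping other arrays."""
    with np.load(npz_path) as data:
        save_dict = {key: data[key] for key in data.files}
    # Assumption: text features are stored as an integer array of phoneme ids.
    save_dict["text_features"] = np.asarray(phoneme_ids, dtype=np.int64)
    np.savez_compressed(new_npz_path, **save_dict)


def demo():
    """Round-trip a tiny fake npz to check that only text_features changes."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "utt.npz")
        dst = os.path.join(tmp, "utt_tr.npz")
        # "audio_features" is a hypothetical key standing in for the acoustic data.
        np.savez_compressed(
            src,
            text_features=np.array([1, 2, 3]),
            audio_features=np.zeros((4, 2)),
        )
        replace_text_features(src, dst, [7, 8, 9, 10])
        with np.load(dst) as out:
            return out["text_features"].tolist(), out["audio_features"].shape
```

Writing to a new path instead of overwriting keeps the original extract_feats.py output available if the phoneme encoding needs to be redone.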

@alex73

alex73 commented Jan 9, 2019

I have my own voice dataset in the LJSpeech layout, with metadata.csv and wavs/*.wav files.
I am also able to convert text into phonemes via phonemizer.
I ran this script and it created bap/*.bap, lf0/*.lf0 and mgc/*.mgc files for each of my wav files.

But what should my next step be for training?
