
Model training question #26

Open
Cpgrach opened this issue Nov 14, 2022 · 4 comments

Comments


Cpgrach commented Nov 14, 2022

Hi, thanks for sharing the code.
I have a folder with wav files from different speakers. I don't understand what to do next to get a trained model: what type of files should go in the "mels" and "embeds" folders, and how exactly do I generate them? Are there any more detailed instructions?


zwan074 commented Dec 17, 2022

I have the same issue with the input data format. Please provide more instructions.


li1jkdaw commented May 19, 2023

Hi! Sorry for the brief description of the training process in the readme, and for such a late response (I hope it will still be useful to put it here).

The whole folder structure in your data directory data_dir (the directory you set in train_enc.py and train_dec.py before training starts) should look like this:

data_dir/wavs/spk1/spk1_000001.wav
data_dir/wavs/spk1/spk1_000002.wav
(and all other wav files for speaker spk1, then)
data_dir/wavs/spk2/spk2_abc.wav
data_dir/wavs/spk2/spk2_xyz.wav
(and all other wav files for speaker spk2, and so on for all of your speakers)
The important thing is that the filenames of your wav files must start with "<speaker_id>_"; the remaining part can be any string that uniquely identifies the corresponding wav file.
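For illustration, here is a minimal sketch that checks this naming convention over the layout above (the check itself is just my assumption about how one might verify a dataset; it is not code from this repo):

```python
from pathlib import Path

data_dir = Path("data_dir")  # set to your actual data directory

for wav_path in data_dir.glob("wavs/*/*.wav"):
    speaker_id = wav_path.parent.name
    # every filename must start with "<speaker_id>_"
    if not wav_path.name.startswith(speaker_id + "_"):
        print(f"bad filename: {wav_path}")
```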

As for the mels and embeds subfolders, they should have the same structure:

data_dir/mels/spk1/spk1_000001_mel.npy
data_dir/mels/spk1/spk1_000002_mel.npy
...
data_dir/mels/spk2/spk2_abc_mel.npy
data_dir/mels/spk2/spk2_xyz_mel.npy
...
data_dir/embeds/spk1/spk1_000001_embed.npy
data_dir/embeds/spk1/spk1_000002_embed.npy
...
data_dir/embeds/spk2/spk2_abc_embed.npy
data_dir/embeds/spk2/spk2_xyz_embed.npy
...

The important thing here is that the npy file containing the mel-spectrogram for a given wav file must have the same name with _mel appended. The same holds for the npy files containing speaker embeddings: their names get _embed appended.

The mel-spectrograms and speaker embeddings that fill the mels and embeds subfolders can be computed with the functions get_mel and get_embed, respectively, defined in the Jupyter notebook inference.ipynb. These functions return numpy arrays that should be saved with np.save.
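For illustration, a preprocessing loop along these lines can fill both subfolders. This is a minimal sketch: it assumes get_mel and get_embed have been copied out of inference.ipynb into the current script, and that both take a path to a wav file and return a numpy array.

```python
import os
import numpy as np

# get_mel and get_embed are assumed to be copied here from inference.ipynb
data_dir = "data_dir"  # set to your actual data directory

for spk in sorted(os.listdir(os.path.join(data_dir, "wavs"))):
    os.makedirs(os.path.join(data_dir, "mels", spk), exist_ok=True)
    os.makedirs(os.path.join(data_dir, "embeds", spk), exist_ok=True)
    for wav_name in sorted(os.listdir(os.path.join(data_dir, "wavs", spk))):
        if not wav_name.endswith(".wav"):
            continue
        base = wav_name[:-len(".wav")]
        wav_path = os.path.join(data_dir, "wavs", spk, wav_name)
        # filenames mirror the wav name, with _mel / _embed appended
        np.save(os.path.join(data_dir, "mels", spk, base + "_mel.npy"),
                get_mel(wav_path))
        np.save(os.path.join(data_dir, "embeds", spk, base + "_embed.npy"),
                get_embed(wav_path))
```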

After you do that, you can write some wav filenames (without ".wav") to filelists/valid.txt to use them for validation. Also, if for some reason you don't want specific wavs to be used during training, you can add them in the same format to filelists/exceptions.txt; otherwise you can leave this file empty.

The paths to valid.txt and exceptions.txt should be set in train_dec.py (variables val_file and exc_file, respectively), along with the path to the data directory data_dir. After these paths, train_dec.py also contains a list of training parameters (such as epochs, batch_size and learning_rate). Some other important model hyperparameters can be set in params.py.
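As a concrete example, a validation filelist could be produced like this (a sketch under my assumption that valid.txt lists one utterance name per line, without the ".wav" extension):

```python
import os
import random

data_dir = "data_dir"  # set to your actual data directory

names = []
for spk in os.listdir(os.path.join(data_dir, "wavs")):
    for wav_name in os.listdir(os.path.join(data_dir, "wavs", spk)):
        if wav_name.endswith(".wav"):
            names.append(wav_name[:-len(".wav")])

random.seed(0)  # reproducible choice of validation utterances
valid = random.sample(names, k=min(10, len(names)))
with open("filelists/valid.txt", "w") as f:
    f.write("\n".join(valid) + "\n")
```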

Then you can finally launch train_dec.py with the pre-trained encoder in the logs_enc directory. If you also want to train the encoder yourself (e.g. if your language is not English, or you want to use a dataset richer than LibriTTS), you have to do some additional data preparation.

For training the encoder you'll need two additional subfolders, mels_mode and textgrids, with the following structure:

data_dir/mels_mode/spk1/spk1_000001_avgmel.npy
data_dir/mels_mode/spk1/spk1_000002_avgmel.npy
...
data_dir/mels_mode/spk2/spk2_abc_avgmel.npy
data_dir/mels_mode/spk2/spk2_xyz_avgmel.npy
...
data_dir/textgrids/spk1/spk1_000001.TextGrid
data_dir/textgrids/spk1/spk1_000002.TextGrid
...
data_dir/textgrids/spk2/spk2_abc.TextGrid
data_dir/textgrids/spk2/spk2_xyz.TextGrid
...

As for the alignment TextGrid files in the textgrids subfolder, please refer to the Montreal Forced Aligner for instructions on how to obtain such alignment files from wavs. To get the average voice mel-spectrograms in the mels_mode subfolder, please run the get_avg_mels.ipynb Jupyter notebook.
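If you want to sanity-check an alignment before training, the third-party textgrid package can read the files produced by the Montreal Forced Aligner (a sketch for illustration; the package is my assumption and is not part of this repo):

```python
import textgrid  # pip install textgrid

tg = textgrid.TextGrid.fromFile("data_dir/textgrids/spk1/spk1_000001.TextGrid")
for tier in tg.tiers:  # MFA typically produces "words" and "phones" interval tiers
    print(tier.name)
    for interval in tier.intervals[:3]:  # first few (start, end, label) intervals
        print(interval.minTime, interval.maxTime, interval.mark)
```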

After this has been done, you can launch train_enc.py to start training your encoder.


Cpgrach commented Sep 4, 2023

Thank you very much for the answer. Can you tell me whether there are any encoders for the Russian language, or any datasets on which such an encoder could be trained?

@Biyani404198

> (quoting @li1jkdaw's training instructions above)

Hi,
I have followed these steps and created the TextGrid files. Now I want to create the mels_mode subdirectory. I am using the get_avg_mels.ipynb Jupyter notebook, but I only get the mels_mode and lens dictionaries; there are no further instructions on how to create the _avgmel.npy files from these two dictionaries.
Can you please help?
