<a href="https://colab.research.google.com/github/rmcpantoja/My-Colab-Notebooks/blob/main/notebooks/ForwardTacotron_training.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 📓`Forward Tacotron` training notebook. 📓

*Version: 1.0.*

---

This notebook has been developed by [rmcpantoja](https://github.com/rmcpantoja)

# ✉️ Thanks:

* To [Xx_Nessu_xX](https://fakeyou.com/profile/Xx_Nessu_xX) for the design and notebook fixes.
* To [Exink](http://github.com/exink) for help in the development of this notebook.


## 📝credits:

* [as-ideas/ForwardTacotron repository](https://github.com/as-ideas/ForwardTacotron).

*last update: 2023/03/26*

In [None]:
#@markdown ### 👁️check allocated GPU.
#@markdown ---
#@markdown You need at least a Tesla T4; on a slower GPU, training takes much longer. If you were assigned an older GPU such as a K80, go to the menu bar, select Runtime > Disconnect and delete runtime, then reconnect.

!nvidia-smi -L
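
If you want the GPU check to be explicit rather than read by eye, the output of `nvidia-smi -L` can be parsed. This is a minimal sketch: `parse_gpu_names` and `is_recommended` are hypothetical helpers, and the `RECOMMENDED_GPUS` list is an assumption (the notebook only says "at least a Tesla T4").

```python
import re
from typing import List

# Assumption: GPU models treated as "a T4 or better"; not an official ForwardTacotron requirement.
RECOMMENDED_GPUS = ("T4", "P100", "V100", "A100", "L4")

def parse_gpu_names(smi_output: str) -> List[str]:
    """Extract GPU names from `nvidia-smi -L` lines like 'GPU 0: Tesla T4 (UUID: GPU-...)'."""
    return re.findall(r"^GPU \d+: (.+?) \(UUID:", smi_output, flags=re.MULTILINE)

def is_recommended(gpu_name: str) -> bool:
    """True if a recommended model name appears as a separate word in the GPU name."""
    return any(re.search(rf"\b{model}\b", gpu_name) for model in RECOMMENDED_GPUS)

sample = "GPU 0: Tesla T4 (UUID: GPU-1234abcd)\n"
for name in parse_gpu_names(sample):
    print(name, "ok" if is_recommended(name) else "consider reconnecting")
```

In Colab, `out = !nvidia-smi -L` captures the command output as a list of lines, so `parse_gpu_names("\n".join(out))` can be applied to the real output.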

In [None]:
#@markdown ### 📁mount google drive.
#@markdown ---
#@markdown Mounting Drive is essential: it stores the checkpoints and preprocessed datasets that Forward Tacotron works with. Some important notes:
#@markdown * Check your available storage space in [Drive](http://drive.google.com/). The larger your dataset, the more free space you need.

from google.colab import drive
drive.mount('/content/drive', force_remount=True)
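
The free-space check can also be done from Python with `shutil.disk_usage`. A minimal sketch: `free_gib` and `enough_space` are hypothetical helpers, and the default headroom factor of 3 (zipped wavs, preprocessed features, checkpoints) is an assumption, not a measured requirement. The figure reported for the FUSE-mounted Drive folder may not match your Drive quota exactly, so confirm it on the Drive web page.

```python
import shutil

def free_gib(path: str) -> float:
    """Free space at `path` in GiB; shutil.disk_usage returns (total, used, free) in bytes."""
    return shutil.disk_usage(path).free / 2**30

def enough_space(path: str, dataset_gib: float, factor: float = 3.0) -> bool:
    """Rough check that `path` has room for `factor` times the zipped dataset size.
    The default factor of 3 is an assumption, not a measured requirement."""
    return free_gib(path) >= dataset_gib * factor

print(f"{free_gib('/'):.1f} GiB free on /")
```

For example, `enough_space("/content/drive/MyDrive", 1.5)` asks whether Drive has room for roughly three times a 1.5 GiB dataset zip.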

In [None]:
#@markdown ## 💻install process.
#@markdown ---
#@markdown This will install the synthesizer and other important dependencies.

#@markdown * Note: restart the runtime if prompted, then run this cell again and continue with the rest of the notebook.

%cd /content
import os
if not os.path.exists("/content/ForwardTacotron"):
  !git clone https://github.com/as-ideas/ForwardTacotron
# pip:
!pip install numba librosa pyworld phonemizer webrtcvad PyYAML dataclasses soundfile scipy tensorboard matplotlib unidecode inflect resemblyzer==0.1.1-dev pandas
!pip install --upgrade gdown
%cd /content/ForwardTacotron
# remove git metadata (-f so that re-running this cell doesn't fail):
!rm -rf .git/
# apt (-y answers the confirmation prompt automatically):
!apt install -y espeak-ng
print("ready")
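
To confirm the command-line tools are available before moving on, you can look them up on `PATH`. A minimal sketch with a hypothetical `missing_tools` helper; the tool list (espeak-ng from the apt step above, plus `unzip` and `zip`, which later cells call) is an assumption based on this notebook's cells.

```python
import shutil
from typing import List

def missing_tools(names: List[str]) -> List[str]:
    """Return the command-line tools from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]

# Assumed list: espeak-ng (installed above), unzip/zip (used by the dataset and backup cells).
missing = missing_tools(["espeak-ng", "unzip", "zip"])
print("missing:", missing or "none")
```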

# 🗂️ project setup.

In [None]:
%cd /content/ForwardTacotron
#@markdown ### 🔧settings wizard.

#@markdown These options control data and training settings. Run this cell, adjust the values, and click **apply settings**.

#@markdown ---
# imports:
import os
import ipywidgets as widgets
from IPython.display import display, Markdown
from utils.files import read_config, save_config

# interface:
model_type = widgets.Dropdown(
    options=['Single speaker', 'Multiple speakers'],
    value='Single speaker',
    description='Model variant:',
)

tts_model_id = widgets.Text(
    value='ExampleTTS',
    description='desired name for the model:',
)
tts_model = widgets.ToggleButtons(
    options=['forward_tacotron', 'multi_forward_tacotron', 'fast_pitch'],
    description='Model to train:',
)
continue_training = widgets.Checkbox(
    value=False,
    description='Continue training?',
)

preprocess_path = widgets.Text(
    value='/content/drive/MyDrive/ForwardTacotron/EjemploTTS/dataset_preprocessed.zip',
    description='Preprocessed data (.zip) on Drive (if continuing):',
    disabled=True
)

custom_save_dir = widgets.Checkbox(
    value=False,
    description='Save checkpoints and preprocessed data to a custom path? (recommended)',
)
save_dir = widgets.Text(
    value='/content/drive/MyDrive/ForwardTacotron/EjemploTTS',
    description="Save directory (used if the box above is checked):",
    disabled=True
)
sample_rate = widgets.IntText(
    value=22050,
    min=16000,
    max=48000,
    step=1000,
    description='Sample rate (optional):',
    style={'description_width': 'initial'}
)
metafile_format = widgets.Dropdown(
    options=['ljspeech', 'ljspeech_multi', 'pandas', 'vctk'],
    value='ljspeech',
    description='Transcript format:',
    style={'description_width': 'initial'}
)
n_val = widgets.IntText(
    value=10,
    min=1,
    max=200,
    step=1,
    description='Number of validation samples (adjust to the size of your dataset):',
    style={'description_width': 'initial'}
)

language = widgets.Dropdown(
    options=['en-029', 'en-gb', 'en-gb-scotland', 'en-gb-x-gbclan', 'en-gb-x-gbcwmd', 'en-gb-x-rp', 'en-us'],
    value='en-us',
    description='Dataset language variation:',
    style={'description_width': 'initial'}
)
plot_every = widgets.IntText(
    value=1000,
    min=500,
    max=5000,
    step=1000,
    description='Step interval to generate model training samples (tensorboard):',
    style={'description_width': 'initial'}
)
applyBTN = widgets.Button(
    description='apply settings',
    button_style='success'
)

def on_continue_training_change(change):
    if change['new']:
        preprocess_path.disabled = False
    else:
        preprocess_path.disabled = True

def on_custom_save_dir_change(change):
    if change['new']:
        save_dir.disabled = False
    else:
        save_dir.disabled = True

def check_config(model_type):
    if model_type == "Single speaker":
        config_path = "configs/singlespeaker.yaml"
    elif model_type == "Multiple speakers":
        config_path = "configs/multispeaker.yaml"
    else:
        raise Exception("Unsupported model type. Currently, you can choose between Single speaker or Multiple speakers.")
    return config_path

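def save_settings(b):
    # update the notebook-level config_path so later cells use the config chosen here
    # (the button's on_click ignores this function's return value)
    global config_path
    tts_id = tts_model_id.value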
    display(Markdown(f"""
# Settings summary:

* Model type: {model_type.value}.
* TTS Models Name: {tts_id}.
* TTS model to use: {tts_model.value}.
* Continue a training: {continue_training.value}.
* Use a custom save directory for the models: {custom_save_dir.value}.
* Sample rate: {sample_rate.value}.
* Transcript format: {metafile_format.value}.
* Number of validations: {n_val.value}.
* Language: {language.value}
* Generation of training samples every {plot_every.value} steps.

If there's something you need to fix, you can adjust and apply the settings again.
    """))
    config_path = check_config(model_type.value)
    config = read_config(config_path)
    config['tts_model_id'] = tts_id
    if tts_model.value == "multi_forward_tacotron" and model_type.value == "Single speaker":
        raise Exception("The multi_forward_tacotron model is only supported on multispeaker models.")
    config['tts_model'] = tts_model.value
    if continue_training.value:
        !unzip -q "{preprocess_path.value}" -d /content/ForwardTacotron
    if custom_save_dir.value:
        if not os.path.exists(save_dir.value):
            os.makedirs(save_dir.value)
    else:
        print("Warning! Checkpoints and preprocessed data will only be saved in the local project folder, which is lost when the runtime ends.")
        save_dir.value = "/content/ForwardTacotron"
    config['dsp']['sample_rate'] = sample_rate.value
    config['dsp']['vad_sample_rate'] = sample_rate.value
    if metafile_format.value == "ljspeech" and model_type.value == "Multiple speakers":
        raise Exception("The ljspeech format is supported only on Single speaker models.")
    elif metafile_format.value == "ljspeech_multi" or metafile_format.value == "pandas" or metafile_format.value == "vctk":
        if model_type.value == "Single speaker":
            raise Exception("The ljspeech_multi, pandas and vctk formats are not supported for Single speaker models.")
    config['preprocessing']['metafile_format'] = metafile_format.value
    config['preprocessing']['n_val'] = n_val.value
    config['preprocessing']['language'] = language.value
    # reduce workers in dur extraction:
    config['duration_extraction']['num_workers'] = 2
    # Tacotron singlespeaker (80k steps):
    if model_type.value == "Single speaker":
        config['tacotron']['training']['schedule'] = ['5,  1e-3,  10_000,  32', '3,   1e-4,  20_000,  16', '2,   1e-4,  30_000,  8', '1,   1e-4,  40_000,  8', '5,  1e-3,  50_000,  32', '3,   1e-4,  60_000,  16', '2,   1e-4,  70_000,  8', '1,   1e-4,  80_000,  8']
    # todo: multispeaker pretrained models.
    # plot:
    config['tacotron']['training']['plot_every'] = plot_every.value
    if model_type.value == "Multiple speakers":
        config['multi_forward_tacotron']['training']['plot_every'] = plot_every.value
    else:
        config['forward_tacotron']['training']['plot_every'] = plot_every.value
        config['fast_pitch']['training']['plot_every'] = plot_every.value
    # Manage pretrained models:
    if model_type.value == "Single speaker":
        if not continue_training.value:
            if not os.path.exists(save_dir.value+"/checkpoints/"+tts_id+".tacotron"):
                os.makedirs(save_dir.value+"/checkpoints/"+tts_id+".tacotron")
            print(f"Download the pretrained model in: {save_dir.value}/checkpoints/{tts_id}.tacotron")
            !gdown -q 1-_p_NZ3Njhrx03E2VVeyBxR-1kKo3YQp -O "{save_dir.value}/checkpoints/{tts_id}.tacotron/latest_model.pt"
        else:
            print(f"Training will resume in: {save_dir.value}/checkpoints/{tts_id}.tacotron")
    else:
        print(f"Warning! Currently, there's no pre-trained model for {model_type.value}. We're probably working on it. If you want, you can submit a pull request on GitHub. A model will be trained from scratch.")
    # check checkpoints:
    if continue_training.value:
        if custom_save_dir.value:
            if not os.path.exists(save_dir.value+"/checkpoints/"+tts_id+".tacotron/latest_model.pt"):
                raise Exception("It looks like you're trying to continue training, but I can't find the model at the specified path. Please fix it: "+save_dir.value+"/checkpoints/"+tts_id+".tacotron/latest_model.pt")
    # phoneme singlespeaker:
    config['preprocessing']['use_phonemes'] = True
    # attention:
    if model_type.value == "Multiple speakers":
        config['multi_forward_tacotron']['training']['filter_attention'] = True
        config['multi_forward_tacotron']['training']['min_attention_sharpness'] = 0.5
        config['multi_forward_tacotron']['training']['min_attention_alignment'] = 0.75
    else:
        config['forward_tacotron']['training']['filter_attention'] = True
        config['forward_tacotron']['training']['min_attention_sharpness'] = 0.5
        config['forward_tacotron']['training']['min_attention_alignment'] = 0.75
    save_config(config, config_path)
    print("Configuration saved successfully!")
    return config_path

continue_training.observe(on_continue_training_change, names='value')
custom_save_dir.observe(on_custom_save_dir_change, names='value')

display(model_type)
display(tts_model_id)
display(tts_model)
display(Markdown("The `multi_forward_tacotron` option is supported for multi-speaker models only."))
display(continue_training)
display(preprocess_path)
display(custom_save_dir)
display(save_dir)
display(sample_rate)
display(metafile_format)
display(Markdown("The `ljspeech` format is the only one used for single-speaker models."))
display(n_val)
display(language)
display(Markdown('''
Here's a table to choose the desired language code:

| Code | Language |
|:---:|:---:|
|en-029|English (Caribbean)|
|en-gb|English (Great Britain)|
|en-gb-scotland|English (Scotland)|
|en-gb-x-gbclan|English (Lancaster)|
|en-gb-x-gbcwmd|English (West Midlands)|
|en-gb-x-rp|English (Received Pronunciation)|
|en-us|English (United States)|
'''))
display(plot_every)
display(Markdown("Note: this setting will apply to all models: Tacotron, Forward_tacotron, multi_forward_tacotron (if training with multiple speakers), and FastPitch."))
display(applyBTN)
applyBTN.on_click(save_settings)
config_path = check_config(model_type.value)
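
The compatibility rules that `save_settings()` enforces with exceptions can also be checked up front, before anything is downloaded or unzipped. A minimal sketch: `validate_settings` is a hypothetical helper that mirrors the checks in the cell above; it is not part of ForwardTacotron.

```python
from typing import List

def validate_settings(model_type: str, tts_model: str, metafile_format: str) -> List[str]:
    """Return every conflict between the chosen options (an empty list means OK).
    Mirrors the checks made in save_settings()."""
    errors = []
    single = model_type == "Single speaker"
    if model_type not in ("Single speaker", "Multiple speakers"):
        errors.append(f"Unsupported model type: {model_type}")
    if tts_model == "multi_forward_tacotron" and single:
        errors.append("multi_forward_tacotron is only supported on multi-speaker models.")
    if metafile_format == "ljspeech" and not single:
        errors.append("The ljspeech format is supported only on Single speaker models.")
    if metafile_format in ("ljspeech_multi", "pandas", "vctk") and single:
        errors.append(f"The {metafile_format} format requires a multi-speaker model.")
    return errors

print(validate_settings("Single speaker", "forward_tacotron", "ljspeech"))
```

Calling `validate_settings(model_type.value, tts_model.value, metafile_format.value)` returns an empty list for a valid combination. Collecting all errors at once, instead of raising on the first one, lets you fix every conflicting option in a single pass.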

In [None]:
#@markdown ### ⚙️Apply patches according to settings.
#@markdown ---
#@markdown Before continuing, it's recommended to run this cell to patch the paths where the models are saved. By skipping this cell, these will be saved to the root folder of the project instead of the custom save directory (if you have the corresponding box checked).

tts_id = tts_model_id.value
voc_id = tts_model_id.value+"_voc"
name = "test"

print("applying patch...")
with open('/content/ForwardTacotron/utils/paths.py', 'w') as f:
  f.write('''
import os
from pathlib import Path


class Paths:
    """Manages and configures the paths used by WaveRNN, Tacotron, and the data."""
    def __init__(self, data_path, tts_id):

        # directories
        self.base = Path(__file__).parent.parent.expanduser().resolve()
        self.data = Path(data_path).expanduser().resolve()
        self.quant = self.data/'quant'
        self.mel = self.data/'mel'
        self.gta = self.data/'gta'
        self.att_pred = self.data/'att_pred'
        self.alg = self.data/'alg'
        self.speaker_emb = self.data/'speaker_emb'
        self.mean_speaker_emb = self.data/'mean_speaker_emb'
        self.raw_pitch = self.data/'raw_pitch'
        self.phon_pitch = self.data/'phon_pitch'
        self.phon_energy = self.data/'phon_energy'
        self.model_output = self.base / 'model_output'
        self.save_dir = Path("'''+save_dir.value+'''").expanduser().resolve()
        self.taco_checkpoints = self.save_dir/'checkpoints/'''+tts_id+'''.tacotron'
        self.taco_log = self.taco_checkpoints / 'logs'
        self.forward_checkpoints = self.save_dir/'checkpoints/'''+tts_id+'''.forward'
        self.forward_log = self.forward_checkpoints/'logs'

        # pickle objects
        self.train_dataset = self.data / 'train_dataset.pkl'
        self.val_dataset = self.data / 'val_dataset.pkl'
        self.text_dict = self.data / 'text_dict.pkl'
        self.speaker_dict = self.data / 'speaker_dict.pkl'
        self.att_score_dict = self.data / 'att_score_dict.pkl'
        # future:
        self.duration_stats = self.data / 'duration_stats.pkl'

        self.create_paths()

    def create_paths(self):
        os.makedirs(self.data, exist_ok=True)
        os.makedirs(self.quant, exist_ok=True)
        os.makedirs(self.mel, exist_ok=True)
        os.makedirs(self.gta, exist_ok=True)
        os.makedirs(self.alg, exist_ok=True)
        os.makedirs(self.speaker_emb, exist_ok=True)
        os.makedirs(self.mean_speaker_emb, exist_ok=True)
        os.makedirs(self.att_pred, exist_ok=True)
        os.makedirs(self.raw_pitch, exist_ok=True)
        os.makedirs(self.phon_pitch, exist_ok=True)
        os.makedirs(self.phon_energy, exist_ok=True)
        os.makedirs(self.taco_checkpoints, exist_ok=True)
        os.makedirs(self.forward_checkpoints, exist_ok=True)

    def get_tts_named_weights(self, name):
        """Gets the path for the weights in a named tts checkpoint."""
        return self.taco_checkpoints / f'{name}_weights.pyt'

    def get_tts_named_optim(self, name):
        """Gets the path for the optimizer state in a named tts checkpoint."""
        return self.taco_checkpoints / f'{name}_optim.pyt'

    def get_voc_named_weights(self, name):
        """Gets the path for the weights in a named voc checkpoint."""
        return self.voc_checkpoints/f'{name}_weights.pyt'

    def get_voc_named_optim(self, name):
        """Gets the path for the optimizer state in a named voc checkpoint."""
        return self.voc_checkpoints/f'{name}_optim.pyt'
''')
print("Done!")
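
The patched `Paths` class places the Tacotron and Forward checkpoints under the custom save directory instead of the project root. The resulting layout can be previewed with `pathlib`. A sketch: `checkpoint_dirs` is a hypothetical helper, and unlike the patched file it does not call `expanduser()` or `resolve()`.

```python
from pathlib import PurePosixPath

def checkpoint_dirs(save_dir: str, tts_id: str) -> dict:
    """Checkpoint and log folders, laid out as in the patched utils/paths.py."""
    base = PurePosixPath(save_dir) / "checkpoints"
    taco = base / f"{tts_id}.tacotron"
    forward = base / f"{tts_id}.forward"
    return {
        "taco_checkpoints": taco,
        "taco_log": taco / "logs",
        "forward_checkpoints": forward,
        "forward_log": forward / "logs",
    }

for key, path in checkpoint_dirs("/content/drive/MyDrive/ForwardTacotron/EjemploTTS", "ExampleTTS").items():
    print(f"{key}: {path}")
```

Because the patch pastes `save_dir.value` between double quotes in the generated file, a path containing a double quote or backslash would produce a broken `paths.py`; building that literal with `repr(save_dir.value)` would avoid the problem.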

## working with the dataset.

**You can skip these cells if you have already preprocessed your dataset and want to resume training from the last saved checkpoint. Otherwise, expand this section and read the instructions for each cell.**
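
Before preprocessing, it can help to sanity-check the transcript. A minimal sketch, assuming the LJSpeech-style layout used by the `ljspeech` format: one `file_id|transcript` entry per line (an optional middle column is ignored), where `file_id` names a `.wav` file without its extension. `check_ljspeech_lines` and `check_ljspeech_metadata` are hypothetical helpers; the multi-speaker formats (`ljspeech_multi`, `pandas`, `vctk`) use other layouts and are not covered.

```python
from pathlib import Path
from typing import Iterable, List, Set

def check_ljspeech_lines(lines: Iterable[str], wav_ids: Set[str]) -> List[str]:
    """Return problems found in pipe-separated 'file_id|transcript' lines."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are skipped, not reported
        fields = [field.strip() for field in line.split("|")]
        file_id, text = fields[0], fields[-1]
        if len(fields) < 2 or not file_id or not text:
            problems.append(f"line {lineno}: expected 'file_id|transcript'")
        elif file_id not in wav_ids:
            problems.append(f"line {lineno}: no audio file {file_id}.wav")
    return problems

def check_ljspeech_metadata(csv_path: str, wav_dir: str) -> List[str]:
    """Read the transcript and compare it against the .wav files in wav_dir."""
    wav_ids = {wav.stem for wav in Path(wav_dir).glob("*.wav")}
    lines = Path(csv_path).read_text(encoding="utf-8").splitlines()
    return check_ljspeech_lines(lines, wav_ids)
```

Once the wavs have been extracted to `/content/dataset/wavs`, `check_ljspeech_metadata(list_path, "/content/dataset/wavs")` lists lines that are malformed or point at missing audio.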

In [None]:
import zipfile
import os
import os.path
#@markdown ### 💾dataset preprocessing.
#@markdown ---
#@markdown * Note: If you are going to preprocess larger datasets, it is recommended to have more space available on your drive.
#@markdown ---
#@markdown #### 🔊wavs path (zip file):
wavs_path = "/content/drive/MyDrive/Wavs_m.zip" #@param {type:"string"}
#@markdown ---
#@markdown #### ✍️Transcript path (`metadata.csv` by default):
list_path = "/content/drive/MyDrive/list.csv" #@param {type:"string"}
list_filename = os.path.basename(list_path)
#@markdown ---
%cd /content
# -p: don't fail if the folders already exist
!mkdir -p dataset/wavs
%cd dataset
if zipfile.is_zipfile(wavs_path):
  # -o: overwrite existing files instead of prompting when the cell is re-run
  !unzip -o -j "$wavs_path" -d /content/dataset/wavs
else:
  print("Warning: the wavs path is not a zip file.")
if list_path.endswith('.txt'):
  raise Exception("The transcript file must have a .csv extension.")
if not os.path.exists(list_path):
  raise Exception("Error: the transcript file does not exist. Please check the path and try again.")
else:
  !cp "{list_path}" /content/ForwardTacotron
%cd /content/ForwardTacotron
print("Running preprocess...")
!python preprocess.py --path /content/dataset --config "{config_path}" --metafile "{list_filename}"
if custom_save_dir.value:
  print("Backing up preprocessed data...")
  if model_type.value == "Multiple speakers":
    !zip -r "{save_dir.value}/dataset_preprocessed.zip" data_multispeaker
  elif model_type.value == "Single speaker":
    !zip -r "{save_dir.value}/dataset_preprocessed.zip" configs data
  else:
    raise Exception("The model type is not recognized. Remember that you can only choose between a single speaker or multiple speakers.")
  print("The preprocessed data has been archived, which lets you resume training from your checkpoints later. The configs may not be included in the archive, so note down the settings from this session: you will need them to resume.")
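
The backup relies on `zip -r`, which recurses into each folder. The same archive can be built from Python with the standard `zipfile` module. A sketch: `zip_folders` is a hypothetical helper, and unlike `zip -r`, which updates an existing archive, opening it in `"w"` mode replaces the archive.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import Iterable

def zip_folders(zip_path: str, root_dir: str, folders: Iterable[str]) -> int:
    """Recursively add each folder (relative to root_dir) to a new archive, like `zip -r`.
    Returns the number of files written."""
    root = Path(root_dir)
    count = 0
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for folder in folders:
            folder_path = root / folder
            if not folder_path.is_dir():
                raise FileNotFoundError(f"Nothing to back up: {folder_path}")
            for file in sorted(folder_path.rglob("*")):
                if file.is_file():
                    # store paths relative to root_dir so the archive unzips back in place
                    zf.write(str(file), arcname=file.relative_to(root).as_posix())
                    count += 1
    return count

# Demo on a throwaway project folder:
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "data" / "mel").mkdir(parents=True)
    (Path(tmp) / "data" / "mel" / "a.npy").write_bytes(b"\x00" * 16)
    (Path(tmp) / "configs").mkdir()
    (Path(tmp) / "configs" / "singlespeaker.yaml").write_text("tts_model_id: demo\n")
    archive = Path(tmp) / "dataset_preprocessed.zip"
    demo_count = zip_folders(str(archive), tmp, ["configs", "data"])
    with zipfile.ZipFile(str(archive)) as zf:
        demo_names = sorted(zf.namelist())
```

In this notebook the call would look like `zip_folders(f"{save_dir.value}/dataset_preprocessed.zip", "/content/ForwardTacotron", ["configs", "data"])`; the relative paths match what the settings wizard expects when it unzips the backup into `/content/ForwardTacotron`.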

### <font color='red'>⚠️ Caution!</font> Run this cell only if a dataset is already loaded in your ForwardTacotron session and you want to train a different one. Its contents will be deleted. ⚠️

In [None]:
#@markdown ### <font color='red'>Remove the current dataset (if it exists):</font>
#@markdown ---
#@markdown The dataset and its preprocessed data are kept in the working folders, so you must remove them before preprocessing and training a different dataset. Run this cell to do so.
# dataset (extracted to /content/dataset by the preprocessing cell):
!rm -rf /content/dataset
# preprocessed:
!rm -rf /content/ForwardTacotron/data/*
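
`rm -rf data/*` empties the preprocessed-data folder while keeping the folder itself. A Python equivalent using `shutil.rmtree`, as a sketch; `clear_contents` is a hypothetical helper. One difference: the shell glob `*` skips hidden entries, whereas `Path.iterdir()` also returns them, so they are removed too.

```python
import shutil
import tempfile
from pathlib import Path

def clear_contents(folder: str) -> int:
    """Delete everything inside `folder` but keep the folder itself.
    Returns the number of top-level entries removed (0 if the folder doesn't exist)."""
    path = Path(folder)
    if not path.is_dir():
        return 0
    removed = 0
    for entry in list(path.iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # a subfolder such as mel/ or quant/
        else:
            entry.unlink()  # a file or symlink such as train_dataset.pkl
        removed += 1
    return removed

# Demo on a throwaway folder shaped like data/:
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / "data"
    (demo_dir / "mel").mkdir(parents=True)
    (demo_dir / "train_dataset.pkl").write_bytes(b"")
    demo_removed = clear_contents(str(demo_dir))
    demo_left = sorted(p.name for p in demo_dir.iterdir())
    demo_kept = demo_dir.is_dir()
```

Applied here, it would be `clear_contents("/content/ForwardTacotron/data")`.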

# 🏋️Train!
Training takes time: it needs a while to stabilize, and reaching the final results can take hours or sometimes a few days. Please read the instructions in each cell carefully.

In [None]:
#@markdown ### 📈Run tensorboard extension.
#@markdown ---
#@markdown Tensorboard lets you visualize the training process. To inspect it, open the **audio**, **image** or **scalars** tabs.
%load_ext tensorboard
print(f"directory: {save_dir.value}/checkpoints")
%tensorboard --logdir "{save_dir.value}/checkpoints"
import tensorflow as tf
import datetime

In [None]:
#@markdown ### 🎤Train 1: Tacotron.
#@markdown ---
#@markdown Note how training is divided:
#@markdown * The model will be trained for a total of 40k steps. By default, a backup is saved every 10k steps, so keep an eye on your storage. ___(You can delete old backups instead: the checkpoint that actually matters, and the one that is used, is latest_model.pt, which is saved most often.)___
#@markdown * Training also follows a schedule that applies different parameters (reduction factor, learning rate and batch size) at set steps.
#@markdown
#@markdown Let's train!

!python train_tacotron.py --config "{config_path}"
# include att score, pitch, att, alignments and more:
if custom_save_dir.value:
  print("But first, backing up the work that's just been done...")
  # -r: recurse into the folder (without it, only the empty directory entry is stored)
  if model_type.value == "Multiple speakers":
    !zip -r "{save_dir.value}/dataset_preprocessed.zip" data_multispeaker
  elif model_type.value == "Single speaker":
    !zip -r "{save_dir.value}/dataset_preprocessed.zip" data
  else:
    raise Exception("The model type is not recognized. Remember that you can only choose between a single speaker or multiple speakers.")
print("Done!")
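
Each Tacotron schedule entry set by the settings wizard is a string of four comma-separated values, `r, lr, step, batch_size`: reduction factor, learning rate, the step at which the session ends, and batch size. The sketch below parses those strings and picks the session for a given step. It approximates how ForwardTacotron's training loop walks the schedule (train each session until its step is reached, then move on); it is not the project's code, and `TacoSession`, `parse_schedule` and `active_session` are hypothetical names.

```python
from typing import List, NamedTuple, Optional

class TacoSession(NamedTuple):
    r: int           # reduction factor: mel frames predicted per decoder step
    lr: float        # learning rate
    max_step: int    # the session runs until the model reaches this step
    batch_size: int

def parse_schedule(entries: List[str]) -> List[TacoSession]:
    """Parse entries like '5,  1e-3,  10_000,  32' (int() accepts the underscores)."""
    sessions = []
    for entry in entries:
        r, lr, step, batch_size = (field.strip() for field in entry.split(","))
        sessions.append(TacoSession(int(r), float(lr), int(step), int(batch_size)))
    return sessions

def active_session(sessions: List[TacoSession], step: int) -> Optional[TacoSession]:
    """Return the first session whose max_step is still ahead, or None when the schedule is done."""
    for session in sessions:
        if step < session.max_step:
            return session
    return None

schedule = parse_schedule(['5,  1e-3,  10_000,  32', '3,   1e-4,  20_000,  16', '1,   1e-4,  40_000,  8'])
print(active_session(schedule, 15_000))
```

Under this model, if a loaded checkpoint's step is already past the last entry's `step`, `active_session` returns `None` and there is nothing left to train, so resuming only makes progress when the schedule extends beyond the checkpoint's step.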

In [None]:
#@markdown ### 🚀Train 2: Forward Tacotron.
#@markdown ---
#@markdown This will train the final model for Forward Tacotron, taking into account the work done previously.
#@markdown * It also follows a schedule. By default, `forward_tacotron` and `fast_pitch` train up to 300k steps and `multi_forward_tacotron` up to 600k, but fewer steps may be enough.
#@markdown * The quality of this model depends on the Tacotron model trained above. If only a few files are used for training because of bad attention (this is reported when training starts), there is a problem with the dataset: review it, fix what is necessary, add more data, or check Tensorboard carefully.
#@markdown * If you run out of memory during the session, lower the batch size here.
batch_size = 24 #@param {type:"integer"}
# check TTS_model:
config_path = check_config(model_type.value)
config = read_config(config_path)
if tts_model.value == "forward_tacotron":
    config['forward_tacotron']['training']['schedule'] = ['5e-5,  150_000,  '+str(batch_size), '1e-5,  300_000,  '+str(batch_size)]
elif tts_model.value == "multi_forward_tacotron":
    config['multi_forward_tacotron']['training']['schedule'] = ['5e-5,  500_000,  '+str(batch_size), '1e-5,  600_000,  '+str(batch_size)]
elif tts_model.value == "fast_pitch":
    config['fast_pitch']['training']['schedule'] = ['1e-5,  5_000,  '+str(batch_size), '5e-5,  100_000,  '+str(batch_size), '2e-5,  300_000,  '+str(batch_size)]
else:
    raise Exception(f"This TTS model isn't supported: {tts_model.value}.")
save_config(config, config_path)
#@markdown ---

!python train_forward.py --config "{config_path}"
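
The three branches in the cell above differ only in their (learning rate, step) pairs, with the batch size appended to each entry. A table-driven sketch that builds the same strings; `FORWARD_SCHEDULES` and `build_schedule` are hypothetical names, and the values are copied from that cell.

```python
from typing import Dict, List, Tuple

# (learning rate, max step) pairs copied from the cell above; only the batch size varies.
FORWARD_SCHEDULES: Dict[str, List[Tuple[str, str]]] = {
    "forward_tacotron": [("5e-5", "150_000"), ("1e-5", "300_000")],
    "multi_forward_tacotron": [("5e-5", "500_000"), ("1e-5", "600_000")],
    "fast_pitch": [("1e-5", "5_000"), ("5e-5", "100_000"), ("2e-5", "300_000")],
}

def build_schedule(tts_model: str, batch_size: int) -> List[str]:
    """Build schedule strings in the same 'lr,  step,  batch_size' form as the cell above."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    try:
        steps = FORWARD_SCHEDULES[tts_model]
    except KeyError:
        raise ValueError(f"This TTS model isn't supported: {tts_model}.") from None
    return [f"{lr},  {step},  {batch_size}" for lr, step in steps]

print(build_schedule("forward_tacotron", 24))
```

With it, `config[tts_model.value]['training']['schedule'] = build_schedule(tts_model.value, batch_size)` would replace the `if`/`elif` chain, and supporting another model becomes a one-line change to the table.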

# 🧍Have you finished training for today?🏠🚶
🔊Test the model in the synthesis notebook by clicking [here.](https://colab.research.google.com/drive/1yHdMGB5H6JG44TAN5BNcv7f95beN3qYZ)