<a href="https://colab.research.google.com/github/rmcpantoja/My-Colab-Notebooks/blob/main/notebooks/ForwardTacotron_training(beta7).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# `Forward Tacotron` training notebook
This notebook has been developed by [rmcpantoja](https://github.com/rmcpantoja)

collaborator: [Xx_Nessu_XX](https://fakeyou.com/profile/Xx_Nessu_xX)

## credits:

* [as-ideas/ForwardTacotron repository](https://github.com/as-ideas/ForwardTacotron).


### <ins>Important!</ins>:

* For now, this notebook is not well suited to training on small datasets. I plan to retrain Tacotron on the LJSpeech dataset; after that, training on small datasets should become practical.


*last update: 2023/02/25*

In [None]:
#@markdown ### check allocated GPU.
#@markdown ---
#@markdown You need at least a Tesla T4, since training takes much longer on slower GPUs. If you were assigned a GPU such as a K80, go to the menu bar, select Runtime > Disconnect and delete runtime, then reconnect to get a different GPU.
#@markdown * You can also run this notebook without a GPU (not recommended) by disabling hardware acceleration in the notebook settings.
!nvidia-smi -L

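The GPU check above can also be done programmatically. The following is a minimal sketch, assuming `nvidia-smi -L` prints one line per GPU in the form `GPU 0: Tesla T4 (UUID: ...)`. The helper names `gpu_names` and `is_suitable`, and the `SLOW_GPUS` list, are illustrative assumptions and not part of this notebook.

```python
import re

# Hypothetical list of GPU models considered too slow for this training.
SLOW_GPUS = ("K80",)


def gpu_names(smi_output: str) -> list[str]:
    """Extract GPU model names from `nvidia-smi -L` style output."""
    pattern = re.compile(r"^GPU \d+: (.+?) \(UUID: .*\)$")
    names = []
    for line in smi_output.splitlines():
        match = pattern.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


def is_suitable(name: str) -> bool:
    """Return False when the model name contains one of the slow GPU names."""
    return not any(slow in name for slow in SLOW_GPUS)


# Sample line with a placeholder UUID, used instead of calling nvidia-smi.
sample = "GPU 0: Tesla T4 (UUID: GPU-00000000-0000-0000-0000-000000000000)"
for name in gpu_names(sample):
    print(name, "-> ok" if is_suitable(name) else "-> consider reconnecting")
```

In Colab, the real output could be captured with `smi_output = !nvidia-smi -L` and joined with newlines before being passed to `gpu_names`.
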
In [None]:
#@markdown ### mount google drive.
#@markdown ---
#@markdown Drive is needed to store the checkpoints and preprocessed datasets that Forward Tacotron works with. An important note:
#@markdown * Check your available storage space in [Drive](http://drive.google.com/). The larger the dataset, the more free space you need.

from google.colab import drive
drive.mount('/content/drive', force_remount=True)

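Free space can also be checked from Python before preprocessing. This is a minimal sketch using the standard library; the helper names `free_gib` and `has_room` are illustrative, and the value reported for a mounted Drive folder may not match the quota shown in the Drive web interface.

```python
import shutil


def free_gib(path: str) -> float:
    """Return the free space, in GiB, of the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 1024**3


def has_room(path: str, required_gib: float) -> bool:
    """Return True when at least `required_gib` GiB are free at `path`."""
    return free_gib(path) >= required_gib


# "/" is used so the sketch runs anywhere; in Colab the mounted Drive
# would be "/content/drive".
print(f"{free_gib('/'):.1f} GiB free")
```
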
In [None]:
#@markdown ## install process.
#@markdown ---
#@markdown This will install the synthesizer and other important dependencies.
%cd /content
import os
from os.path import exists
if (not os.path.exists("/content/ForwardTacotron")):
  print("Cloning repository...")
  !git clone https://github.com/as-ideas/ForwardTacotron
else:
  print("The working repository already exists. Skipping...")
# pip:
!pip install numba librosa pyworld phonemizer webrtcvad PyYAML dataclasses soundfile scipy tensorboard matplotlib unidecode inflect resemblyzer==0.1.1-dev pandas
#!pip install git+https://github.com/wkentaro/gdown.git
%cd /content/ForwardTacotron
!rm -r .git/
#apt:
!apt install -y espeak-ng
!wget https://github.com/mikefarah/yq/releases/download/v4.29.2/yq_linux_amd64.tar.gz
!tar -xvf yq_linux_amd64.tar.gz
!mv /content/ForwardTacotron/yq_linux_amd64 /content/ForwardTacotron/yq
#!bash install-man-page.sh
print("ready")

# Apply patches.

## **Please run after settings cell.**

In [None]:
name = "test"
from IPython.core.magic import register_line_cell_magic

@register_line_cell_magic
def writetemplate(line, cell):
    with open(line, 'w') as f:
        f.write(cell.format(**globals()))

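The `writetemplate` magic above fills the cell text with `str.format(**globals())` before writing it to disk. A minimal sketch of that substitution follows; `render_template` is an illustrative helper, not part of the notebook. It shows that single braces are replaced by variable values, while doubled braces are written out as literal braces, which matters when the generated file itself contains f-strings.

```python
def render_template(template: str, values: dict) -> str:
    """Fill `{key}` placeholders the same way `str.format` does."""
    return template.format(**values)


# Single braces are substituted; doubled braces come out as literal braces.
template = "self.save_dir/'checkpoints/{tts_id}.forward'  # f'{{name}}_weights.pyt'"
rendered = render_template(template, {"tts_id": "ExampleTTS"})
print(rendered)
```

A placeholder with no matching variable raises `KeyError`, which is why the settings cell has to run before the template cell.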

In [None]:
%%writetemplate /content/ForwardTacotron/utils/paths.py
import os
from pathlib import Path


class Paths:
    """Manages and configures the paths used by WaveRNN, Tacotron, and the data."""
    def __init__(self, data_path, tts_id):

        # directories
        self.base = Path(__file__).parent.parent.expanduser().resolve()

        # Data Paths
        self.data = Path(data_path).expanduser().resolve()
        self.quant = self.data/'quant'
        self.mel = self.data/'mel'
        self.gta = self.data/'gta'
        self.att_pred = self.data/'att_pred'
        self.alg = self.data/'alg'
        self.speaker_emb = self.data/'speaker_emb'
        self.mean_speaker_emb = self.data/'mean_speaker_emb'
        self.raw_pitch = self.data/'raw_pitch'
        self.phon_pitch = self.data/'phon_pitch'
        self.phon_energy = self.data/'phon_energy'
        self.model_output = self.base / 'model_output'
        self.save_dir = Path("{save_dir}").expanduser().resolve()
        self.taco_checkpoints = self.save_dir/'checkpoints/{tts_id}.tacotron'
        self.taco_log = self.taco_checkpoints / 'logs'
        self.forward_checkpoints = self.save_dir/'checkpoints/{tts_id}.forward'
        self.forward_log = self.forward_checkpoints/'logs'

        # pickle objects
        self.train_dataset = self.data / 'train_dataset.pkl'
        self.val_dataset = self.data / 'val_dataset.pkl'
        self.text_dict = self.data / 'text_dict.pkl'
        self.speaker_dict = self.data / 'speaker_dict.pkl'
        self.att_score_dict = self.data / 'att_score_dict.pkl'

        self.create_paths()

    def create_paths(self):
        os.makedirs(self.data, exist_ok=True)
        os.makedirs(self.quant, exist_ok=True)
        os.makedirs(self.mel, exist_ok=True)
        os.makedirs(self.gta, exist_ok=True)
        os.makedirs(self.alg, exist_ok=True)
        os.makedirs(self.speaker_emb, exist_ok=True)
        os.makedirs(self.mean_speaker_emb, exist_ok=True)
        os.makedirs(self.att_pred, exist_ok=True)
        os.makedirs(self.raw_pitch, exist_ok=True)
        os.makedirs(self.phon_pitch, exist_ok=True)
        os.makedirs(self.phon_energy, exist_ok=True)
        os.makedirs(self.taco_checkpoints, exist_ok=True)
        os.makedirs(self.forward_checkpoints, exist_ok=True)

    # Braces are doubled below so that writetemplate's str.format() leaves
    # a literal {name} in the generated file instead of substituting it.
    def get_tts_named_weights(self, name):
        """Gets the path for the weights in a named tts checkpoint."""
        return self.taco_checkpoints / f'{{name}}_weights.pyt'

    def get_tts_named_optim(self, name):
        """Gets the path for the optimizer state in a named tts checkpoint."""
        return self.taco_checkpoints / f'{{name}}_optim.pyt'

    def get_voc_named_weights(self, name):
        """Gets the path for the weights in a named voc checkpoint."""
        return self.voc_checkpoints/f'{{name}}_weights.pyt'

    def get_voc_named_optim(self, name):
        """Gets the path for the optimizer state in a named voc checkpoint."""
        return self.voc_checkpoints/f'{{name}}_optim.pyt'



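The patched `Paths` class above places checkpoints and logs under `save_dir`. The sketch below mirrors only that part of the layout so it is easy to see where files end up; `checkpoint_layout` is an illustrative helper, and unlike the class it does not call `expanduser()` or `resolve()`, nor does it create the directories.

```python
from pathlib import Path


def checkpoint_layout(save_dir: str, tts_id: str) -> dict[str, Path]:
    """Return the checkpoint and log directories used for a given model id."""
    root = Path(save_dir) / "checkpoints"
    taco = root / f"{tts_id}.tacotron"
    forward = root / f"{tts_id}.forward"
    return {
        "tacotron": taco,
        "tacotron_logs": taco / "logs",
        "forward": forward,
        "forward_logs": forward / "logs",
    }


layout = checkpoint_layout("/content/drive/MyDrive/ForwardTacotron/ExampleTTS", "ExampleTTS")
for label, path in layout.items():
    print(f"{label}: {path.as_posix()}")
```
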

# project preparation.

In [None]:
%cd /content/ForwardTacotron
#@markdown ### settings.

#@markdown These are some options with which we can modify settings related to data and training.

#@markdown ---
#@markdown #### Choose the model variant to use:
import os
model_type = "Single speaker" #@param ["Single speaker", "Multiple speakers"]
if model_type == "Single speaker":
  config_path = "configs/singlespeaker.yaml"
elif model_type == "Multiple speakers":
  config_path = "configs/multispeaker.yaml"
else:
  raise Exception("Model type not supported. Currently, you can choose between a single speaker or multiple speakers.")
#@markdown ---
#@markdown #### desired name for the TTS model
tts_model_id = "ExampleTTS" #@param {type:"string"}
tts_id = tts_model_id
!./yq -i '.tts_model_id = "{tts_model_id}"' "{config_path}"
#@markdown ---
# waveRNN vocoder removed from last version
#@markdown #### Choose the model type to be trained on this dataset:

#@markdown The `multi_forward_tacotron` option is supported for multi-speaker models only.
tts_model = "forward_tacotron" #@param ["forward_tacotron", "multi_forward_tacotron", "fast_pitch"]
if tts_model == "multi_forward_tacotron" and model_type == "Single speaker":
  raise Exception("The multi_forward_tacotron model is only supported on multi-speaker models.")
!./yq -i '.tts_model = "{tts_model}"' "{config_path}"
#@markdown ---
#@markdown #### Continue training?
continue_training = False #@param {type:"boolean"}
#@markdown You can set the location of the preprocessed dataset that was saved to your Drive
preprocess_path = "/content/drive/MyDrive/ForwardTacotron/ExampleTTS/dataset_preprocessed.zip" #@param {type:"string"}
if continue_training:
  !unzip "$preprocess_path" -d /content/ForwardTacotron
#@markdown ---
#@markdown #### Save checkpoints and preprocessing to a custom path?
custom_save_dir = True #@param {type:"boolean"}

#@markdown If this checkbox is on, where do you want to save them?
save_dir = "/content/drive/MyDrive/ForwardTacotron/ExampleTTS" #@param {type:"string"}
if custom_save_dir:
  if not os.path.exists(save_dir):
    os.makedirs(save_dir)
else:
  print("Warning! Checkpoints and preprocessing will only be saved in the local project folder, not to a custom path.")
  save_dir = "/content/ForwardTacotron"
#@markdown ---
#@markdown #### Choose the sample rate: (Optional)
sample_rate = "22050" #@param ["16000", "22050", "24000", "32000", "44100", "48000"] {allow-input: true}
!./yq -i '.dsp.sample_rate = {sample_rate}' "{config_path}"
if sample_rate == "16000":
  !./yq -i '.dsp.vad_sample_rate = 11025' "{config_path}"
#@markdown ---
#@markdown #### Choose the transcript metafile format:

#@markdown The `ljspeech` format is the only one used for single-speaker models.

metafile_format = "ljspeech" #@param ["ljspeech", "ljspeech_multi", "pandas", "vctk"]
if metafile_format == "ljspeech" and model_type == "Multiple speakers":
  raise Exception("The ljspeech format is only compatible with single-speaker models.")
else:
  !./yq -i '.preprocessing.metafile_format = "{metafile_format}"' "{config_path}"
#@markdown ---
#@markdown #### Number of validation samples (you can adjust it according to the size of the dataset).
n_val = 100 #@param {type:"integer"}
!./yq -i '.preprocessing.n_val = {n_val}' "{config_path}"
#@markdown ---
#@markdown #### Choose the language variation in which you have this dataset:
#@markdown Here is a table to choose the desired language code:

#@markdown Code|Language
#@markdown en-029|English (Caribbean)
#@markdown en-gb|English (Great Britain)
#@markdown en-gb-scotland|English (Scotland)
#@markdown en-gb-x-gbclan|English (Lancaster)
#@markdown en-gb-x-gbcwmd|English (West Midlands)
#@markdown en-gb-x-rp|English (Received Pronunciation)
#@markdown en-us|English (America)

language = 'en-us' #@param ["en-029", "en-gb", "en-gb-scotland", "en-gb-x-gbclan", "en-gb-x-gbcwmd", "en-gb-x-rp", "en-us"]
!./yq -i '.preprocessing.language = "{language}"' "{config_path}"
#@markdown ---
# reduce workers in dur extraction:
!./yq -i '.duration_extraction.num_workers = 2' "{config_path}"
#@markdown #### Step interval to generate model training signals
#@markdown This sets how often, in steps, the figures, images and audio samples that show training progress in TensorBoard (in the following cells) are generated.
#@markdown * Note: this setting will apply to all models: Tacotron, Forward_tacotron, multi_forward_tacotron (if training with multiple speakers), and FastPitch.
plot_every = 5000 #@param {type:"integer"}
!./yq -i '.tacotron.training.plot_every = {plot_every}' "{config_path}"
if model_type == "Multiple speakers":
  !./yq -i '.multi_forward_tacotron.training.plot_every = {plot_every}' "{config_path}"
else:
  !./yq -i '.forward_tacotron.training.plot_every = {plot_every}' "{config_path}"
!./yq -i '.fast_pitch.training.plot_every = {plot_every}' "{config_path}"
#@markdown ---
# phoneme singlespeaker:
!./yq -i '.preprocessing.use_phonemes = true' "{config_path}"
# attention:
if model_type == "Multiple speakers":
  !./yq -i '.multi_forward_tacotron.training.filter_attention = True' "{config_path}"
  !./yq -i '.multi_forward_tacotron.training.min_attention_sharpness = 0.25' "{config_path}"
  !./yq -i '.multi_forward_tacotron.training.min_attention_alignment = 0.5' "{config_path}"
else:
  !./yq -i '.forward_tacotron.training.filter_attention = True' "{config_path}"
  !./yq -i '.forward_tacotron.training.min_attention_sharpness = 0.25' "{config_path}"
  !./yq -i '.forward_tacotron.training.min_attention_alignment = 0.5' "{config_path}"

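Each `yq -i '.a.b = value'` line in the settings cell rewrites one key of the YAML config in place. As a rough mental model only, the sketch below applies the same kind of dotted-path assignment to a plain Python dictionary. `set_by_path` is an illustrative helper and the sample values are made up; it does not read or write YAML and is not how `yq` is implemented.

```python
def set_by_path(config: dict, dotted_key: str, value) -> None:
    """Assign `value` at a dotted path, creating intermediate dicts as needed."""
    keys = dotted_key.split(".")
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value


# Made-up starting values, standing in for a loaded config file.
config = {"dsp": {"sample_rate": 22050}, "preprocessing": {"n_val": 200}}
set_by_path(config, "dsp.sample_rate", 16000)
set_by_path(config, "preprocessing.language", "en-us")
set_by_path(config, "forward_tacotron.training.plot_every", 5000)
print(config)
```
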
## working with the dataset.

**You can skip these cells if you have already preprocessed the dataset and want to continue training from the last saved checkpoint. Otherwise, expand this section and read the instructions for each cell.**

In [None]:
import zipfile
import os
import os.path
#@markdown ### dataset preprocessing.
#@markdown ---
#@markdown * Note: If you are going to preprocess larger datasets, it is recommended to have more space available on your drive.
#@markdown ---
#@markdown #### wavs path (zip file):
wavs_path = "/content/drive/MyDrive/Wavs_m.zip" #@param {type:"string"}
#@markdown ---
#@markdown #### Transcript path (must be a .csv file, e.g. metadata.csv):
list_path = "/content/drive/MyDrive/list_m.txt" #@param {type:"string"}
list_filename = os.path.basename(list_path)
#@markdown ---
%cd /content
# preprocess.py is given --path /content/dataset below, so the wavs and the
# transcript are placed in that folder.
!mkdir -p /content/dataset/wavs
if zipfile.is_zipfile(wavs_path):
  !unzip -j "$wavs_path" -d /content/dataset/wavs
else:
  print("Warning: the wavs path is not a zip file.")
if list_path.endswith('.txt'):
  raise Exception("The transcript file must have a .csv extension.")
if not os.path.exists(list_path):
  raise Exception("Error: the transcript file does not exist. Please check the path and try again.")
else:
  !cp "$list_path" /content/dataset
%cd /content/ForwardTacotron
print("Running preprocess...")
!python preprocess.py --path /content/dataset --config "{config_path}" --metafile "{list_filename}"
if custom_save_dir:
  print("Backing up preprocessed data...")
  !zip -r "$save_dir/dataset_preprocessed.zip" configs data

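Before running the preprocessing cell above, it can help to confirm that every transcript entry has a matching wav file. The sketch below assumes an LJSpeech-style transcript in which each line starts with the file id followed by `|`. `missing_wavs` is an illustrative helper, and the sample ids are made up.

```python
def missing_wavs(metafile_lines, wav_stems):
    """Return ids listed in the transcript that have no matching wav file."""
    missing = []
    for line in metafile_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        file_id = line.split("|", 1)[0]
        if file_id not in wav_stems:
            missing.append(file_id)
    return missing


# Made-up sample data: two transcript lines, only one wav present.
lines = ["audio_0001|Hello world.", "audio_0002|A second sentence."]
stems = {"audio_0001"}
print(missing_wavs(lines, stems))
```

With real data, `stems` could be built from the wavs folder with `{p.stem for p in Path(wavs_dir).glob("*.wav")}`, and `lines` read from the transcript file.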
### <font color='red'>Caution!</font> Run this cell only if the working folder already holds a dataset and you want to train a different one. The contents will be deleted.

In [None]:
#@markdown ### remove the current dataset (if it exists):
#@markdown ---
#@markdown Datasets are kept in the working folder, so the current one has to be removed before training another. If that is what you want, run this cell.
# dataset
!rm -rf /content/dataset
# preprocessed:
!rm -rf /content/ForwardTacotron/data/*

# Train!
Training takes hours, and sometimes a few days, to become stable and reach the final results. Please read the instructions in each cell carefully.

***LJSpeech pretrained model soon!***

In [None]:
#@markdown ### Run tensorboard extension.
#@markdown ---
#@markdown TensorBoard is used to visualize the model training process. To follow the progress, go to the **audio**, **image** or **scalars** tabs.
%load_ext tensorboard
%tensorboard --logdir "{save_dir}/checkpoints"
import tensorflow as tf
import datetime

In [None]:
#@markdown ### Train 1: Tacotron.
#@markdown ---
#@markdown One thing to keep in mind is that training is split into two stages: Tacotron first, then Forward Tacotron.
#@markdown * The model is trained for a total of 40k steps. By default, backups are saved every 10k steps, so keep an eye on storage. ___(You can delete old backups; the checkpoint that really matters is latest_model.pt, which is saved most often.)___
#@markdown * This training also follows a schedule that applies different parameters at different stages.
#@markdown
#@markdown Let's train!

!python train_tacotron.py --config "{config_path}"
# Back up the attention scores, pitch, alignments and other data produced by this stage:
print("Backing up the work that's just been done...")
!zip -ur "$save_dir/dataset_preprocessed.zip" configs data
print("Done!")

In [None]:
#@markdown ### Train 2: Forward Tacotron.
#@markdown ---
#@markdown This trains the final Forward Tacotron model, building on the Tacotron training done previously.
#@markdown * It also follows a schedule. By default, it trains up to 300k steps, but it can work with fewer.
#@markdown * Note that the training files are filtered based on the attention scores from the Tacotron model. If only a few files remain because of bad attention (this is reported when training starts), there is a problem in the dataset: review it, fix what is necessary, add more data, or inspect TensorBoard carefully.
#@markdown * You can lower the batch size here if you run out of memory during the session.
batch_size = 24 #@param {type:"integer"}
# Each schedule entry is written as a quoted string so that yq stores it as a single value.
!./yq -i '.forward_tacotron.training.schedule[0] = "5e-5,  150_000,  {batch_size}"' "{config_path}"
!./yq -i '.forward_tacotron.training.schedule[1] = "1e-5,  300_000,  {batch_size}"' "{config_path}"
#@markdown ---
!python train_forward.py --config "{config_path}"

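The schedule entries set in the cell above are comma-separated strings such as `5e-5,  150_000,  24`. The sketch below shows one way to read such an entry, assuming the fields are learning rate, final step and batch size, in that order. `parse_schedule_entry` is an illustrative helper; ForwardTacotron's own parsing may differ.

```python
def parse_schedule_entry(entry: str) -> tuple[float, int, int]:
    """Split 'lr, max_step, batch_size' into typed values."""
    lr, max_step, batch_size = (part.strip() for part in entry.split(","))
    # int() accepts underscores as digit separators, e.g. "150_000".
    return float(lr), int(max_step), int(batch_size)


schedule = ["5e-5,  150_000,  24", "1e-5,  300_000,  24"]
for entry in schedule:
    print(parse_schedule_entry(entry))
```

Read this way, the two default entries would mean a learning rate of 5e-5 up to step 150k, then 1e-5 up to step 300k, both with the chosen batch size.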
# Have you finished training for today?
Test the model in the synthesis notebook by clicking [here.](https://colab.research.google.com/drive/1yHdMGB5H6JG44TAN5BNcv7f95beN3qYZ)