<a href="https://colab.research.google.com/github/rmcpantoja/AHK-scripts-for-accessibility/blob/main/notebooks/piper_multilingual_training_notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# <font color="pink"> **[Piper](https://github.com/rhasspy/piper) training notebook.**
## ![Piper logo](https://contribute.rhasspy.org/img/logo.png)

---

- Notebook made by [rmcpantoja](http://github.com/rmcpantoja)
- Collaborator: [Xx_Nessu_xX](https://fakeyou.com/profile/Xx_Nessu_xX)

# <font color="pink">🔧 ***First steps.*** 🔧

In [None]:
#@markdown ## <font color="pink"> **Google Colab Anti-Disconnect.** 🔌
#@markdown ---
#@markdown #### Prevents Colab from disconnecting while idle. The session will still end after <font color="orange">**6 to 12 hours**</font>.

from IPython.display import Javascript, display

# Click the Colab "Connect" button once a minute so the session is not flagged as idle.
js_code = '''
function ClickConnect(){
  console.log("Working");
  document.querySelector("colab-toolbar-button#connect").click()
}
setInterval(ClickConnect, 60000)
'''
display(Javascript(js_code))

In [None]:
#@markdown ## <font color="pink"> **Check GPU type.** 👁️
#@markdown ---
#@markdown #### A more capable GPU trains faster. By default, you will have a <font color="orange">**Tesla T4**</font>.
!nvidia-smi
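
The GPU name and memory can also be read from Python, which is handy when choosing a batch size later. This is a minimal sketch, assuming `nvidia-smi` is on the PATH; the helper names `parse_gpu_csv` and `query_gpus` are made up for this example.

```python
import subprocess

def parse_gpu_csv(text):
    """Parse 'name, memory' CSV lines printed by nvidia-smi into (name, memory) tuples."""
    gpus = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        # Split on the last comma so a comma inside the GPU name would not break parsing.
        name, _, memory = line.rpartition(",")
        gpus.append((name.strip(), memory.strip()))
    return gpus

def query_gpus():
    """Return the visible GPUs, or an empty list if nvidia-smi is missing or fails."""
    cmd = ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return []
    return parse_gpu_csv(result.stdout)

for name, memory in query_gpus():
    print(f"{name}: {memory}")
```

An empty result usually means the runtime has no GPU attached.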

In [None]:
#@markdown # <font color="pink"> **Mount Google Drive.** 📂
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

In [None]:
#@markdown # <font color="pink"> **Install software.** 📦
#@markdown #### This cell installs the synthesizer and the dependencies needed for training. (This may take a while.)

#@markdown <font color="orange">**Note: Please restart the runtime environment when the cell execution is finished. Then you can continue with the training section.**

# clone:
!git clone https://github.com/rmcpantoja/piper
%cd piper/src/python
!pip install --upgrade pip
!pip install --upgrade wheel setuptools
!pip install -r requirements.txt
!pip install torchtext==0.12.0
!pip install torchvision==0.12.0
!bash build_monotonic_align.sh
!apt-get install -y espeak-ng
%cd /content

# <font color="pink"> 🤖 ***Training.*** 🤖

In [None]:
#@markdown # <font color="pink"> **1. Extract dataset.** 📥
#@markdown #### Important: the audio files must be in <font color="orange">**WAV format (16000 or 22050 Hz, 16-bit, mono) and, for convenience, numbered. Example:**

#@markdown * <font color="orange">**1.wav**</font>
#@markdown * <font color="orange">**2.wav**</font>
#@markdown * <font color="orange">**3.wav**</font>
#@markdown * <font color="orange">**.....**</font>

#@markdown ---

%cd /content
# -p creates both folders and does not fail if the cell is run again.
!mkdir -p /content/dataset/wavs
%cd /content/dataset
#@markdown ### Audio dataset path to unzip
zip_path = "/content/drive/MyDrive/wavs.zip" #@param {type:"string"}
!unzip "{zip_path}" -d /content/dataset/wavs
#@markdown ---
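
The format requirements above (16000 or 22050 Hz, 16-bit, mono) can be checked before preprocessing. The following is a minimal sketch using only the standard-library `wave` module; the helper names `check_wav` and `check_dataset` are hypothetical and not part of Piper.

```python
import wave
from pathlib import Path

ALLOWED_RATES = (16000, 22050)

def check_wav(path):
    """Return a list of problems found in one WAV file (an empty list means it looks fine)."""
    problems = []
    try:
        with wave.open(str(path), "rb") as wav:
            if wav.getframerate() not in ALLOWED_RATES:
                problems.append(f"sample rate {wav.getframerate()} Hz")
            if wav.getsampwidth() != 2:  # 2 bytes per sample = 16-bit
                problems.append(f"{8 * wav.getsampwidth()}-bit samples")
            if wav.getnchannels() != 1:
                problems.append(f"{wav.getnchannels()} channels")
    except (wave.Error, EOFError) as err:
        problems.append(f"unreadable: {err}")
    return problems

def check_dataset(wav_dir):
    """Map each problematic file (path relative to wav_dir) to its list of problems."""
    root = Path(wav_dir)
    report = {}
    for path in sorted(root.rglob("*.wav")):
        problems = check_wav(path)
        if problems:
            report[str(path.relative_to(root))] = problems
    return report
```

Calling `check_dataset("/content/dataset/wavs")` after unzipping returns an empty dictionary when every file matches the requirements.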

In [None]:
#@markdown # <font color="pink"> **2. Upload the transcript file.** 📝
#@markdown ---
#@markdown #### Important: the transcript records what the character says in each audio file, one line per file, in the following structure:

#@markdown * wavs/1.wav|This is what my character says in audio 1.
#@markdown * wavs/2.wav|This is the text that the character says in audio 2.
#@markdown * ...

#@markdown And so on. The transcript must be a .csv file encoded as UTF-8 without BOM.

%cd /content/dataset
from google.colab import files
# -f avoids an error when no previous metadata.csv exists.
!rm -f /content/dataset/metadata.csv
listfn, length = files.upload().popitem()
if listfn != "metadata.csv":
  !mv "$listfn" metadata.csv
%cd ..
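
The transcript rules above (one `path|text` pair per line, UTF-8 without BOM) can be verified on the uploaded bytes before preprocessing. This is a minimal sketch; `validate_metadata` is a hypothetical helper, and it only checks the first `|` on each line, so extra columns in multi-speaker transcripts are not inspected.

```python
import codecs

def validate_metadata(raw_bytes):
    """Return a list of (line_number, message) problems for a metadata.csv payload."""
    problems = []
    if raw_bytes.startswith(codecs.BOM_UTF8):
        problems.append((1, "file starts with a UTF-8 BOM"))
        raw_bytes = raw_bytes[len(codecs.BOM_UTF8):]
    try:
        text = raw_bytes.decode("utf-8")
    except UnicodeDecodeError as err:
        # Line 0 marks a problem with the file as a whole.
        return problems + [(0, f"not valid UTF-8: {err.reason}")]
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        path, sep, transcript = line.partition("|")
        if not sep:
            problems.append((number, "missing '|' separator"))
        elif not path.strip():
            problems.append((number, "empty audio path"))
        elif not transcript.strip():
            problems.append((number, "empty transcript"))
    return problems
```

In Colab, `files.upload()` returns a dictionary that maps each file name to its bytes, so the second value unpacked in the cell above could be passed to this function directly.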

In [None]:
#@markdown # <font color="pink"> **3. Preprocess dataset.** 🔄

import os
#@markdown ### First of all, select the language of your dataset.
language = "English (U.S.)" #@param ["Català", "Dansk", "Deutsch", "Ελληνικά", "English (British)", "English (U.S.)", "Español", "Suomi", "Français", "ქართული", "Icelandic", "Italiano", "қазақша", "नेपाली", "Nederlands", "Norsk", "Polski", "Português (Brasil)", "Русский", "Svenska", "украї́нська", "Tiếng Việt", "简体中文"]
#@markdown ---
# language definition:
languages = {
    "Català": "ca",
    "Dansk": "da",
    "Deutsch": "de",
    "Ελληνικά": "grc",
    "English (British)": "en",
    "English (U.S.)": "en-us",
    "Español": "es",
    "Suomi": "fi",
    "Français": "fr",
    "Icelandic": "is",
    "Italiano": "it",
    "ქართული": "ka",
    "қазақша": "kk",
    "नेपाली": "ne",
    "Nederlands": "nl",
    "Norsk": "nb",
    "Polski": "pl",
    "Português (Brasil)": "pt-br",
    "Русский": "ru",
    "Svenska": "sv",
    "украї́нська": "uk",
    "Tiếng Việt": "vi-vn-x-central",
    "简体中文": "yue"
}

def _get_language(code):
    return languages[code]

final_language = _get_language(language)
#@markdown ### Choose a name for your model:
model_name = "Test" #@param {type:"string"}
#@markdown ---
# output:
#@markdown ### Choose the working folder: (recommended to save to Drive)

#@markdown The working folder will be used in preprocessing, but also in training the model.
output_path = "/content/drive/MyDrive/colab/piper" #@param {type:"string"}
output_dir = os.path.join(output_path, model_name)
# exist_ok=True makes the cell safe to re-run.
os.makedirs(output_dir, exist_ok=True)
#@markdown ---
#@markdown ### Choose dataset format:
dataset_format = "ljspeech" #@param ["ljspeech", "mycroft"]
#@markdown ---
#@markdown ### Is this a single speaker dataset? Otherwise, uncheck:
single_speaker = True #@param {type:"boolean"}
if single_speaker:
  force_sp = " --single-speaker"
else:
  force_sp = ""
#@markdown ---
#@markdown ### Select the sample rate of the dataset:
sample_rate = "16000" #@param ["16000", "22050"]
#@markdown ---
%cd /content/piper/src/python
!python -m piper_train.preprocess \
  --language {final_language} \
  --input-dir /content/dataset \
  --output-dir "{output_dir}" \
  --dataset-format {dataset_format} \
  --sample-rate {sample_rate} \
  {force_sp}
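
The way the form values turn into a preprocessing command can be sketched in plain Python. This is an illustration only: `build_preprocess_command` is a hypothetical helper, and it uses only the flags that already appear in the cell above.

```python
import shlex

def build_preprocess_command(language, input_dir, output_dir,
                             dataset_format="ljspeech", sample_rate=16000,
                             single_speaker=True):
    """Assemble the piper_train.preprocess call as an argument list."""
    args = [
        "python", "-m", "piper_train.preprocess",
        "--language", language,
        "--input-dir", input_dir,
        "--output-dir", output_dir,
        "--dataset-format", dataset_format,
        "--sample-rate", str(sample_rate),
    ]
    if single_speaker:
        # The flag is only added for single-speaker datasets.
        args.append("--single-speaker")
    return args

cmd = build_preprocess_command("en-us", "/content/dataset",
                               "/content/drive/MyDrive/colab/piper/Test")
print(shlex.join(cmd))  # shell-safe rendering of the same command
```

Building the command as a list and rendering it with `shlex.join` keeps paths that contain spaces intact, which matters for Drive folders with spaces in their names.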

In [None]:
#@markdown # <font color="pink"> **4. Settings.** 🧰
import json
import ipywidgets as widgets
from IPython.display import display

#@markdown ### Fine-tune this dataset?
finetune = True #@param {type:"boolean"}
#@markdown ---
if finetune:
    ft_command = '--resume_from_checkpoint "/content/pretrained.ckpt" '
    try:
        with open('/content/piper/notebooks/pretrained_models.json') as f:
            pretrained_models = json.load(f)
        if final_language in pretrained_models:
            models = pretrained_models[final_language]
            model_options = [(model_name, model_name) for model_name, model_url in models.items()]
            model_dropdown = widgets.Dropdown(description = "Choose pretrained model", options=model_options)
            download_button = widgets.Button(description="Download")
            def download_model(btn):
                model_name = model_dropdown.value
                model_url = pretrained_models[final_language][model_name]
                if model_url.startswith("1"):
                    !gdown "{model_url}" -O "/content/pretrained.ckpt"
                elif model_url.startswith("https://drive.google.com/file/d/"):
                    !gdown "{model_url}" -O "/content/pretrained.ckpt" --fuzzy
                else:
                    !wget "{model_url}" -O "/content/pretrained.ckpt"
            download_button.on_click(download_model)
            display(model_dropdown, download_button)
        else:
            raise Exception(f"There are no pretrained models available for the language {final_language}")
    except FileNotFoundError:
        raise Exception("The pretrained_models.json file was not found.")
else:
    ft_command = ""
#@markdown ### Choose batch size based on this dataset:
batch_size = 16 #@param {type:"integer"}
#@markdown ---
#@markdown ### Validation split:
validation_split = 0.01 #@param {type:"number"}
#@markdown ---
#@markdown ### Choose the quality for this model:

#@markdown * x-low - 16 kHz audio, 5-7M params
#@markdown * low - 16 kHz audio, 15-20M params
#@markdown * medium - 22.05 kHz audio, 15-20M params
#@markdown * high - 22.05 kHz audio, 28-32M params
quality = "x-low" #@param ["high", "x-low", "medium"]
#@markdown ---
#@markdown ### How often (in epochs) should training checkpoints be saved?
#@markdown The larger your dataset, the smaller this interval should be, because each epoch takes longer to complete.
checkpoint_epochs = 25 #@param {type:"integer"}
#@markdown ---
#@markdown ### Step interval to generate model samples:
log_every_n_steps = 250 #@param {type:"integer"}
#@markdown ---
#@markdown ### Training epochs:
max_epochs = 10000 #@param {type:"integer"}
#@markdown ---
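
The settings above interact: batch size and validation split determine how many steps one epoch takes, and the checkpoint interval determines how many steps pass between saves. The sketch below works through that arithmetic. It assumes the validation set is the dataset size times the split, rounded down, and that a final partial batch still counts as a step; how Piper actually rounds these is not confirmed here. `training_schedule` is a hypothetical helper.

```python
import math

def training_schedule(num_utterances, batch_size=16, validation_split=0.01,
                      checkpoint_epochs=25, max_epochs=10000):
    """Estimate steps per epoch and checkpoint spacing from the notebook settings."""
    num_validation = int(num_utterances * validation_split)  # assumed rounding: floor
    num_training = num_utterances - num_validation
    steps_per_epoch = math.ceil(num_training / batch_size)   # partial batch counts as a step
    return {
        "training_utterances": num_training,
        "steps_per_epoch": steps_per_epoch,
        "steps_between_checkpoints": steps_per_epoch * checkpoint_epochs,
        "checkpoints_saved": max_epochs // checkpoint_epochs,
    }

# Example: a dataset of 1000 utterances with the default settings.
print(training_schedule(1000))
```

With 1000 utterances and the defaults, 990 utterances are used for training, one epoch takes 62 steps, and a checkpoint is written every 1550 steps.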

In [None]:
#@markdown # <font color="pink"> **5. Run the TensorBoard extension.** 📈

#@markdown TensorBoard visualizes the model's training results while it is being trained.
%load_ext tensorboard
%tensorboard --logdir "{output_dir}"

In [None]:
#@markdown # <font color="pink"> **6. Train.** 🏋️‍♂️

#@markdown <font color="orange">**Note: Remember to empty your Drive trash from time to time so that old checkpoints do not keep consuming storage.**
!python -m piper_train \
    --dataset-dir "{output_dir}" \
    --accelerator 'gpu' \
    --devices 1 \
    --batch-size {batch_size} \
    --validation-split {validation_split} \
    --num-test-examples 2 \
    --quality {quality} \
    --checkpoint-epochs {checkpoint_epochs} \
    --check_val_every_n_epoch {checkpoint_epochs} \
    --log_every_n_steps {log_every_n_steps} \
    --max_epochs {max_epochs} \
    {ft_command}\
    --precision 32
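
If the session disconnects mid-training, the newest checkpoint in the working folder is the natural candidate for `--resume_from_checkpoint`, the same flag the settings cell uses for fine-tuning. Below is a minimal sketch for locating it. It assumes checkpoints are `.ckpt` files somewhere under the working folder; PyTorch Lightning's default layout is `lightning_logs/version_N/checkpoints/`, but that is not confirmed for this fork. `latest_checkpoint` is a hypothetical helper.

```python
from pathlib import Path

def latest_checkpoint(working_dir):
    """Return the most recently modified .ckpt file under working_dir, or None."""
    checkpoints = list(Path(working_dir).rglob("*.ckpt"))
    if not checkpoints:
        return None
    # Pick by modification time rather than by name, since file names vary.
    return max(checkpoints, key=lambda p: p.stat().st_mtime)
```

For example, `latest_checkpoint(output_dir)` returns a `Path` that can be substituted into `ft_command` before re-running the training cell.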

# Have you finished training and want to test the model?

Export your model using the [model exporter notebook](https://colab.research.google.com/github/rmcpantoja/piper/blob/master/notebooks/piper_model_exporter.ipynb)!