<a href="https://colab.research.google.com/github/helplanes/JS/blob/main/XTTS2_Inf_Train.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Dataset building + XTTS2 finetuning and inference

### **Install**
Run the first 3 cells. Allow Google Colab to access your Google Drive account so the finished model can be copied to it later, and ignore any pip install errors in the first cell.

When the demo is ready, click the link shown after `Running on public URL: `.


### **Download the model & processed dataset**

Run the last 3 cells


### **Import a model & processed dataset**
Run the first 3 cells. Allow Google Colab to access your Google Drive account so the model you want to use can be imported from it. Make sure you have already uploaded the model files (.pth, config.json, vocab.json, and any speaker audio reference from the dataset), and ignore any pip install errors in the first cell.

When the demo is ready, click the link shown after `Running on public URL: `.

Skip to Inference

Put "/content/drive/MyDrive/XTTS2_Inf_Train/model.pth" in the checkpoint path section

Put "/content/drive/MyDrive/XTTS2_Inf_Train/config.json" in the config path section

Put "/content/drive/MyDrive/XTTS2_Inf_Train/vocab.json" in the vocab path section

Put "/content/drive/MyDrive/XTTS2_Inf_Train/(name of any .wav file from the processed dataset, which must also be in the Google Drive folder)" in the speaker reference audio section. Output quality varies depending on which reference audio you choose.

NOTE: If the model is in a subfolder of the Google Drive XTTS2_Inf_Train folder, include that subfolder in every path, for example: "/content/drive/MyDrive/XTTS2_Inf_Train/test/model.pth"

Load Fine Tuned Model, input any text and click on Inference
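
The paths above all point into the same Drive folder, so a quick check before loading can catch a missing upload. This is a minimal sketch, assuming the folder layout described above. `missing_model_files` is a hypothetical helper written for illustration; it is not part of XTTS2 or the demo UI.

```python
from pathlib import Path

# File names taken from the instructions above.
REQUIRED_FILES = ("model.pth", "config.json", "vocab.json")

def missing_model_files(folder):
    """Return a list of items missing from the model folder (empty if complete)."""
    folder = Path(folder)
    problems = [name for name in REQUIRED_FILES if not (folder / name).is_file()]
    # At least one .wav file is needed as the speaker reference audio.
    if not any(folder.glob("*.wav")):
        problems.append("speaker reference (*.wav)")
    return problems
```

For example, `missing_model_files("/content/drive/MyDrive/XTTS2_Inf_Train")` should return an empty list once all four items are in place. For a model in a subfolder, pass the subfolder path instead.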


You can also watch this guide: https://www.youtube.com/watch?v=8tpDiiouGxc . Note that it is based on the original Colab, which differs slightly from this one.

In [None]:
#@title Install XTTS2
!rm -rf TTS/ # delete repo to be able to reinstall if needed
!git clone --branch xtts_demo -q https://github.com/coqui-ai/TTS.git
!pip install --use-deprecated=legacy-resolver -q -e TTS
!pip install --use-deprecated=legacy-resolver -q -r TTS/TTS/demos/xtts_ft_demo/requirements.txt
!pip install -q typing_extensions==4.8 numpy==1.26.2

In [None]:
#@title Mount Google Drive
from google.colab import drive
import shutil
drive.mount('/content/drive')
!mkdir -p /content/drive/MyDrive/XTTS2_Inf_Train/

In [None]:
#@title Run UI
!python TTS/TTS/demos/xtts_ft_demo/xtts_demo.py

In [None]:
#@title Download the Processed Dataset (Audio Speaker Reference) of the finetuned model
from google.colab import files

!zip -q -r dataset.zip /tmp/xtts_ft/dataset
files.download('dataset.zip')

In [None]:
#@title Download the finetuned Model (.pth, config.json, vocab.json)
from google.colab import files
import os
import glob
import torch

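def find_latest_best_model(folder_path):
    # Search all run folders for best_model.pth and return the most recent one (or None).
    search_path = os.path.join(folder_path, '**', 'best_model.pth')
    candidates = glob.glob(search_path, recursive=True)
    return max(candidates, key=os.path.getctime, default=None)

model_path = find_latest_best_model("/tmp/xtts_ft/run/training/")
if model_path is None:
    raise FileNotFoundError("No best_model.pth found under /tmp/xtts_ft/run/training/; run training first.")
checkpoint = torch.load(model_path, map_location=torch.device("cpu"))
# Drop the optimizer state and any DVAE weights before saving the checkpoint.
del checkpoint["optimizer"]
for key in list(checkpoint["model"].keys()):
    if "dvae" in key:
        del checkpoint["model"][key]
torch.save(checkpoint, "model.pth")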
model_dir = os.path.dirname(model_path)
files.download(os.path.join(model_dir, 'config.json'))
files.download(os.path.join(model_dir, 'vocab.json'))
files.download('model.pth')

In [None]:
#@title Download the Dataset & Model in Google Drive
#@markdown Requires the two previous cells to have been run first. Copying to Google Drive can be much faster than a browser download.
from google.colab import drive
import os
import shutil
drive.mount('/content/drive')

model_name = "test" #@param {type:"string"}
give_model_name = True #@param {type:"boolean"}

if give_model_name:
  # Create the directory for the model in Google Drive
  model_folder = "/content/drive/MyDrive/XTTS2_Inf_Train/" + model_name
  !mkdir -p {model_folder}
  shutil.copy(os.path.join(model_dir, 'config.json'), model_folder)
  shutil.copy(os.path.join(model_dir, 'vocab.json'), model_folder)
  shutil.copy('model.pth', model_folder)
  shutil.copy('dataset.zip', model_folder)
else:
  !mkdir -p /content/drive/MyDrive/XTTS2_Inf_Train
  shutil.copy(os.path.join(model_dir, 'config.json'), "/content/drive/MyDrive/XTTS2_Inf_Train/config.json")
  shutil.copy(os.path.join(model_dir, 'vocab.json'), "/content/drive/MyDrive/XTTS2_Inf_Train/vocab.json")
  shutil.copy('model.pth', "/content/drive/MyDrive/XTTS2_Inf_Train/model.pth")
  shutil.copy('dataset.zip', "/content/drive/MyDrive/XTTS2_Inf_Train/dataset.zip")

# Task
Finetune an XTTS2 model using your dataset located at `/content/drive/MyDrive/tts-training/bhumika_training` by launching the XTTS2 finetuning UI in Google Colab.

## Install XTTS2 and Dependencies

### Subtask:
Install all necessary libraries and clone the XTTS2 repository required for finetuning and inference.


#### Instructions
Run the first code cell to install XTTS2 and its dependencies. You can ignore any `pip install` errors that may appear during this step, as indicated in the notebook.

**Reasoning**:
The next step is to execute the code block that installs XTTS2 and its dependencies, as instructed in the previous markdown block.



In [1]:
#@title Install XTTS2
!rm -rf TTS/ # delete repo to be able to reinstall if needed
!git clone --branch xtts_demo -q https://github.com/coqui-ai/TTS.git
!pip install --use-deprecated=legacy-resolver -q -e TTS
!pip install --use-deprecated=legacy-resolver -q -r TTS/TTS/demos/xtts_ft_demo/requirements.txt
!pip install -q typing_extensions==4.8 numpy==1.26.2

  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  error: subprocess-exited-with-error
  
  × Getting requirements to build editable did not run successfully.
  │ exit code: 1
  ╰─> No available output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  Getting requirements to build editable ... error
error: subprocess-exited-with-error

× Getting requirements to build editable did not run successfully.
│ exit code: 1
╰─> No available output.

note: This error originates from a subprocess, and is likely not a problem with pip.
  Installing build dependencies ... done
  error: subprocess-exited-with-error
  
  × Getting requirements to build wheel did not run 

## Mount Google Drive

### Subtask:
Mount your Google Drive to `/content/drive`, allowing the Colab environment to access your dataset located at `/content/drive/MyDrive/tts-training/bhumika_training`.


#### Instructions
Run the code cell that mounts Google Drive. This will prompt you to authorize Google Colab to access your Google Drive. Follow the instructions to complete the mounting process.

**Reasoning**:
The next step is to execute the code block that mounts Google Drive, which is required to access the dataset for finetuning.



In [2]:
from google.colab import drive
import shutil
drive.mount('/content/drive')
!mkdir /content/drive/MyDrive/XTTS2_Inf_Train/

Mounted at /content/drive
mkdir: cannot create directory ‘/content/drive/MyDrive/XTTS2_Inf_Train/’: File exists
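
Once the drive is mounted, it can help to confirm that the dataset folder is actually visible before starting the UI. The sketch below is an illustration only: `count_files_by_extension` is a hypothetical helper, and it makes no assumption about which file types the dataset contains.

```python
from collections import Counter
from pathlib import Path

def count_files_by_extension(folder):
    """Count files under `folder` (recursively), grouped by lowercase extension."""
    root = Path(folder)
    if not root.is_dir():
        raise FileNotFoundError(f"Folder not found (is Drive mounted?): {root}")
    return Counter(p.suffix.lower() for p in root.rglob("*") if p.is_file())
```

Calling `count_files_by_extension("/content/drive/MyDrive/tts-training/bhumika_training")` gives a quick summary of what the folder holds. A `FileNotFoundError` here usually points to a mount or path problem rather than a training problem.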


## Launch XTTS2 Finetuning UI

### Subtask:
Start the Gradio-based user interface for XTTS2 finetuning.


#### Instructions
Run the code cell that launches the XTTS2 finetuning UI. This will start a server and provide a public URL. Once the URL appears, click on it to access the finetuning interface.

**Reasoning**:
To start the Gradio-based user interface for XTTS2 finetuning, the Python script `xtts_demo.py` needs to be executed. This launches the Gradio interface and makes it accessible via a public URL.



In [3]:
!python TTS/TTS/demos/xtts_ft_demo/xtts_demo.py

Traceback (most recent call last):
  File "/content/TTS/TTS/demos/xtts_ft_demo/xtts_demo.py", line 6, in <module>
    import gradio as gr
  File "/usr/local/lib/python3.12/dist-packages/gradio/__init__.py", line 3, in <module>
    import gradio._simple_templates
  File "/usr/local/lib/python3.12/dist-packages/gradio/_simple_templates/__init__.py", line 1, in <module>
    from .simpledropdown import SimpleDropdown
  File "/usr/local/lib/python3.12/dist-packages/gradio/_simple_templates/simpledropdown.py", line 7, in <module>
    from gradio.components.base import Component, FormComponent
  File "/usr/local/lib/python3.12/dist-packages/gradio/components/__init__.py", line 1, in <module>
    from gradio.components.annotated_image import AnnotatedImage
  File "/usr/local/lib/python3.12/dist-packages/gradio/components/annotated_image.py", line 14, in <module>
    from gradio import processing_utils, utils
  File "/usr/local/lib/python3.12/dist-packages/gradio/processing_utils.py", line 29, 

**Reasoning**:
The previous attempt to launch the XTTS2 finetuning UI failed due to an `ImportError` related to `typing_extensions` and `pydantic_core`. This indicates a version incompatibility, specifically that the installed `typing_extensions==4.8` is too old. `pydantic-core` requires `typing-extensions>=4.14.1`. To resolve this, I need to explicitly upgrade `typing_extensions` to a compatible version, such as `4.14.1`.
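
Before reinstalling, the installed version can be compared against the minimum quoted above. This is a rough sketch using only the standard library. The helper names are hypothetical, and the comparison only handles plain numeric releases such as `4.8` or `4.14.1`. Pre-release or local version strings would need a proper version parser.

```python
from importlib import metadata

def version_tuple(version):
    """Turn a plain dotted release string like '4.14.1' into (4, 14, 1)."""
    return tuple(int(part) for part in version.split("."))

def meets_minimum(installed, minimum):
    """True if `installed` is at least `minimum` (plain numeric releases only)."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '4.14' and '4.14.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b

def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

With the pin from the install cell, `meets_minimum("4.8", "4.14.1")` is false, which matches the incompatibility described above. `installed_version("typing_extensions")` reports what the current environment actually has.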



In [4]:
!pip install -q typing_extensions==4.14.1
print('Upgraded typing_extensions to 4.14.1 to resolve import error.')

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
shap 0.50.0 requires numpy>=2, but you have numpy 1.26.2 which is incompatible.
Upgraded typing_extensions to 4.14.1 to resolve import error.


**Reasoning**:
Now that `typing_extensions` has been upgraded, I need to re-attempt launching the XTTS2 finetuning UI to see if the dependency issue has been resolved.



In [5]:
import os
if not os.path.exists('TTS'):
  !git clone --branch xtts_demo -q https://github.com/coqui-ai/TTS.git
!python TTS/TTS/demos/xtts_ft_demo/xtts_demo.py

Traceback (most recent call last):
  File "/content/TTS/TTS/demos/xtts_ft_demo/xtts_demo.py", line 14, in <module>
    from TTS.demos.xtts_ft_demo.utils.formatter import format_audio_list
ModuleNotFoundError: No module named 'TTS'


**Reasoning**:
The previous attempt to launch the XTTS2 finetuning UI failed with `ModuleNotFoundError: No module named 'TTS'`, indicating that the `TTS` package itself was not properly installed in editable mode. This was likely due to the `subprocess-exited-with-error` during the initial `pip install -e TTS` command. Now that `typing_extensions` has been updated to a compatible version, I need to re-run the `pip install -e TTS` and `pip install -r requirements.txt` commands to ensure the `TTS` module is correctly set up before attempting to launch the UI again. I will remove the explicit `numpy` installation from the original install cell to avoid potential conflicts, as it was not directly related to the current `ModuleNotFoundError` and `typing_extensions` is now correctly set.



In [6]:
print('Re-installing TTS package and its requirements to resolve ModuleNotFoundError...')
!pip install --use-deprecated=legacy-resolver -q -e TTS
!pip install --use-deprecated=legacy-resolver -q -r TTS/TTS/demos/xtts_ft_demo/requirements.txt

print('Attempting to launch XTTS2 finetuning UI again...')
!python TTS/TTS/demos/xtts_ft_demo/xtts_demo.py

Re-installing TTS package and its requirements to resolve ModuleNotFoundError...
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  error: subprocess-exited-with-error
  
  × Getting requirements to build editable did not run successfully.
  │ exit code: 1
  ╰─> No available output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  Getting requirements to build editable ... error
error: subprocess-exited-with-error

× Getting requirements to build editable did not run successfully.
│ exit code: 1
╰─> No available output.

note: This error originates from a subprocess, and is likely not a problem with pip.
  Installing build dependencies ... done
  error: subprocess-exited-with-er

**Reasoning**:
The `ModuleNotFoundError: No module named 'TTS'` persists despite re-running the installation commands, suggesting that the editable install is not correctly adding the TTS directory to Python's path. To address this, I will explicitly add the `/content/TTS` directory to the `PYTHONPATH` environment variable before attempting to launch the UI again. This should ensure Python can locate the `TTS` module.
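
The reason this can work is that a child Python process reads `PYTHONPATH` from the environment it inherits and adds those entries to its `sys.path`. The sketch below demonstrates that mechanism in isolation with a throwaway directory. `child_sees_path` is a hypothetical helper and is not part of the notebook.

```python
import os
import subprocess
import sys

def child_sees_path(directory):
    """Run a child Python with `directory` prepended to PYTHONPATH and report
    whether that directory shows up in the child's sys.path."""
    directory = os.path.abspath(directory)
    env = dict(os.environ)
    existing = env.get("PYTHONPATH", "")
    # Avoid a trailing separator when PYTHONPATH was previously empty.
    env["PYTHONPATH"] = directory if not existing else directory + os.pathsep + existing
    code = "import sys; print(sys.argv[1] in sys.path)"
    result = subprocess.run(
        [sys.executable, "-c", code, directory],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip() == "True"
```

This only shows that the path is picked up by the child process. Any third-party packages that the code under that path imports still have to be installed separately.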



In [7]:
import os

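# Prepend the TTS checkout to PYTHONPATH so child `python` processes can import it.
# Split on the path separator to avoid substring matches and a trailing separator.
paths = [p for p in os.environ.get('PYTHONPATH', '').split(os.pathsep) if p]
if '/content/TTS' not in paths:
    os.environ['PYTHONPATH'] = os.pathsep.join(['/content/TTS'] + paths)
    print('Added /content/TTS to PYTHONPATH.')
else:
    print('/content/TTS already in PYTHONPATH.')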

print('Attempting to launch XTTS2 finetuning UI again with updated PYTHONPATH...')
!python TTS/TTS/demos/xtts_ft_demo/xtts_demo.py

Added /content/TTS to PYTHONPATH.
Attempting to launch XTTS2 finetuning UI again with updated PYTHONPATH...
Traceback (most recent call last):
  File "/content/TTS/TTS/demos/xtts_ft_demo/xtts_demo.py", line 14, in <module>
    from TTS.demos.xtts_ft_demo.utils.formatter import format_audio_list
  File "/content/TTS/TTS/demos/xtts_ft_demo/utils/formatter.py", line 5, in <module>
    from faster_whisper import WhisperModel
ModuleNotFoundError: No module named 'faster_whisper'


**Reasoning**:
The previous attempt to launch the XTTS2 finetuning UI failed with `ModuleNotFoundError: No module named 'faster_whisper'`. This indicates that the `faster_whisper` library, which is a dependency for the XTTS2 demo, is missing. To resolve this, I need to explicitly install `faster_whisper`.



In [8]:
print('Installing missing dependency: faster_whisper...')
!pip install -q faster_whisper

print('Attempting to launch XTTS2 finetuning UI again after installing faster_whisper...')
!python TTS/TTS/demos/xtts_ft_demo/xtts_demo.py

Installing missing dependency: faster_whisper...
Attempting to launch XTTS2 finetuning UI again after installing faster_whisper...
Traceback (most recent call last):
  File "/content/TTS/TTS/demos/xtts_ft_demo/xtts_demo.py", line 14, in <module>
    from TTS.demos.xtts_ft_demo.utils.formatter import format_audio_list
  File "/content/TTS/TTS/demos/xtts_ft_demo/utils/formatter.py", line 14, in <module>
    from TTS.tts.layers.xtts.tokenizer import multilingual_cleaners
  File "/content/TTS/TTS/tts/layers/__init__.py", line 1, in <module>
    from TTS.tts.layers.losses import *
  File "/content/TTS/TTS/tts/layers/losses.py", line 5, in <module>
    from coqpit import Coqpit
ModuleNotFoundError: No module named 'coqpit'


**Reasoning**:
The previous attempt to launch the XTTS2 finetuning UI failed with `ModuleNotFoundError: No module named 'coqpit'`. This indicates that the `coqpit` library, which is a dependency for the XTTS2 demo, is missing. To resolve this, I need to explicitly install `coqpit`.



In [9]:
print('Installing missing dependency: coqpit...')
!pip install -q coqpit

print('Attempting to launch XTTS2 finetuning UI again after installing coqpit...')
!python TTS/TTS/demos/xtts_ft_demo/xtts_demo.py

Installing missing dependency: coqpit...
Attempting to launch XTTS2 finetuning UI again after installing coqpit...
Traceback (most recent call last):
  File "/content/TTS/TTS/demos/xtts_ft_demo/xtts_demo.py", line 14, in <module>
    from TTS.demos.xtts_ft_demo.utils.formatter import format_audio_list
  File "/content/TTS/TTS/demos/xtts_ft_demo/utils/formatter.py", line 14, in <module>
    from TTS.tts.layers.xtts.tokenizer import multilingual_cleaners
  File "/content/TTS/TTS/tts/layers/xtts/tokenizer.py", line 6, in <module>
    import pypinyin
ModuleNotFoundError: No module named 'pypinyin'


**Reasoning**:
The previous attempt to launch the XTTS2 finetuning UI failed with `ModuleNotFoundError: No module named 'pypinyin'`. This indicates that the `pypinyin` library, which is a dependency for the XTTS2 demo, is missing. To resolve this, I need to explicitly install `pypinyin`.



In [10]:
print('Installing missing dependency: pypinyin...')
!pip install -q pypinyin

print('Attempting to launch XTTS2 finetuning UI again after installing pypinyin...')
!python TTS/TTS/demos/xtts_ft_demo/xtts_demo.py

Installing missing dependency: pypinyin...
Attempting to launch XTTS2 finetuning UI again after installing pypinyin...
Traceback (most recent call last):
  File "/content/TTS/TTS/demos/xtts_ft_demo/xtts_demo.py", line 14, in <module>
    from TTS.demos.xtts_ft_demo.utils.formatter import format_audio_list
  File "/content/TTS/TTS/demos/xtts_ft_demo/utils/formatter.py", line 14, in <module>
    from TTS.tts.layers.xtts.tokenizer import multilingual_cleaners
  File "/content/TTS/TTS/tts/layers/xtts/tokenizer.py", line 8, in <module>
    from hangul_romanize import Transliter
ModuleNotFoundError: No module named 'hangul_romanize'


**Reasoning**:
The previous attempt to launch the XTTS2 finetuning UI failed with `ModuleNotFoundError: No module named 'hangul_romanize'`. This indicates that the `hangul_romanize` library, which is a dependency for the XTTS2 demo, is missing. To resolve this, I need to explicitly install `hangul_romanize`.



In [11]:
print('Installing missing dependency: hangul_romanize...')
!pip install -q hangul_romanize

print('Attempting to launch XTTS2 finetuning UI again after installing hangul_romanize...')
!python TTS/TTS/demos/xtts_ft_demo/xtts_demo.py

Installing missing dependency: hangul_romanize...
Attempting to launch XTTS2 finetuning UI again after installing hangul_romanize...
Traceback (most recent call last):
  File "/content/TTS/TTS/demos/xtts_ft_demo/xtts_demo.py", line 14, in <module>
    from TTS.demos.xtts_ft_demo.utils.formatter import format_audio_list
  File "/content/TTS/TTS/demos/xtts_ft_demo/utils/formatter.py", line 14, in <module>
    from TTS.tts.layers.xtts.tokenizer import multilingual_cleaners
  File "/content/TTS/TTS/tts/layers/xtts/tokenizer.py", line 10, in <module>
    from num2words import num2words
ModuleNotFoundError: No module named 'num2words'
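
The tracebacks above reveal one missing module per run, because Python stops at the first failed import. A less iterative approach is to check all suspected modules in a single pass. The sketch below uses only the import names that appear in the tracebacks above. `missing_modules` is a hypothetical helper, and the list may well be incomplete.

```python
from importlib.util import find_spec

# Import names taken from the tracebacks above; more may be needed.
DEMO_IMPORTS = ["faster_whisper", "coqpit", "pypinyin", "hangul_romanize", "num2words"]

def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]
```

Running `missing_modules(DEMO_IMPORTS)` in a notebook cell lists everything still absent, so the remaining packages can be installed together. This checks the kernel's own environment, which is assumed here to be the same interpreter that `!python` uses.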
