<a href="https://colab.research.google.com/github/hmezer/TorToiSe-TTS-RVC-Voice-Generation/blob/main/TorToiSe_RVC_generation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


**A General Overview of the Process**

TorToiSe first generates text-to-speech (TTS) audio from fine-tuned weights. That TTS output is then passed through an RVC voice conversion model that uses its own fine-tuned weights (a "timbre", in RVC's terms) to produce a more realistic voice file.

***

**Initial Setup: Downloading the Models**

The Drive folder that I use to store and upload the model weights is available at this link:
https://drive.google.com/drive/folders/1JIK8uqMYhggNn4x5qp9uxClyC9d3EfJs?usp=sharing

I recommend downloading the folder's contents and uploading them to your own Drive. The code blocks in this notebook import models from that folder and back up newly trained models to it, so on later runs everything is available from your own Drive.

Note: I could not find a better way to share my trained models. If you have any advice, mail me at ezerhuseyinmert@gmail.com

***
***

**Weight Storages for TorToiSe and RVC**

The Drive folders for the TorToiSe TTS and RVC voice conversion weight sets are:

*TorToiSe*: https://drive.google.com/drive/folders/1MrqlhjZsvH_N_DsJ13Kn437_0P7CjPuG?usp=sharing

*RVC*: https://drive.google.com/drive/folders/1--_4so5U9ykcGF7ZzxFhj1vKS7Txzlpi?usp=drive_link

***

**Voice Datasets**

The voice datasets that I used to train the weights are in this Drive folder: https://drive.google.com/drive/folders/125oL1P4VJ5WB_wgxXi4-U1O4Pqjxs0R7?usp=drive_link


***
***

**Gimmicks**

* I have fine-tuned weights for both TorToiSe and RVC. You can import them from Drive with the relevant code blocks below to see how well the process works. Note that I used the same collection of voice files to fine-tune the weights for both TTS (TorToiSe) and voice conversion (RVC). To get the output closest to the original voice, I therefore use the TorToiSe and RVC weights trained for the same voice. Mixing different weights across the two steps might produce interesting results, but I avoided it because a single pass through the pipeline already takes quite a long time.

* I once tried passing a TorToiSe-generated voice file through RVC repeatedly to see whether further iterations would polish it. There is a folder with an example that you can check, but running RVC more than once seems only to corrupt the pronunciation without improving anything.
  * Here is the experiment: https://drive.google.com/drive/folders/1-1gciI4Sq416GzN5SnaYmbGAjE-ei_j2?usp=drive_link

* I sometimes use only TorToiSe or only RVC, so initializing both may be unnecessary. They are two distinct steps, and my implementation cannot run both Gradio interfaces at the same time. Adjust the usage to your needs.

# DOWNLOAD THE DRIVE FOLDER
This section only needs to be run once. It downloads the folder from my Drive to the Colab session and then uploads it to your own Drive, so that you can access the content easily the next time you use the models.
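
Before relying on your own Drive copy, it can help to confirm that it has the sub-folders the later cells read from or write to. The following is only a sketch: the helper name `missing_subfolders` is hypothetical, and the folder list is an assumption gathered from the paths used elsewhere in this notebook, so adjust it if your layout differs.

```python
import os

# Sub-folders that later cells of this notebook read from or write to.
# This list is an assumption collected from the paths used in those cells.
EXPECTED_SUBFOLDERS = [
    "tortoise-backup/training",
    "tortoise-backup/voices",
    "rvc-models",
    "voice-datasets",
    "tortoise-results",
]


def missing_subfolders(base_dir, expected=EXPECTED_SUBFOLDERS):
    """Return the expected sub-folders that do not exist under base_dir."""
    return [
        sub for sub in expected
        if not os.path.isdir(os.path.join(base_dir, *sub.split("/")))
    ]


# Report anything that is missing from the Drive copy (prints nothing if complete).
base = "/content/drive/MyDrive/ai-voice-cloning-base"
for sub in missing_subfolders(base):
    print(f"missing: {os.path.join(base, sub)}")
```

Folders such as `tortoise-results` are only created by the export cells, so a "missing" line for them is expected on a fresh copy.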


In [None]:
#@title Update gdown
!pip uninstall gdown -y && pip install gdown
!gdown -V

In [None]:
!gdown --v

In [None]:
#@title Download the Drive folder to Colab session
!mkdir -p /content/gdown
# download the TTS to Colab session
!mkdir -p /content/gdown/ai-voice-cloning-base/tortoise-backup
!gdown --folder https://drive.google.com/drive/folders/1MrqlhjZsvH_N_DsJ13Kn437_0P7CjPuG?usp=sharing -O /content/gdown/ai-voice-cloning-base/tortoise-backup


In [None]:
# download the RVC to Colab session
!mkdir -p /content/gdown/ai-voice-cloning-base/rvc-models
!gdown --folder https://drive.google.com/drive/folders/1--_4so5U9ykcGF7ZzxFhj1vKS7Txzlpi?usp=sharing -O /content/gdown/ai-voice-cloning-base/rvc-models

In [None]:
#@title Mount Drive
from google.colab import drive
drive.mount('/content/drive')

In [None]:
#@title Upload the folder to your Drive
!mkdir -p /content/drive/MyDrive/ai-voice-cloning-base
# -r is required because the downloaded content consists of folders
!cp -r /content/gdown/ai-voice-cloning-base/* /content/drive/MyDrive/ai-voice-cloning-base/

# INITIALIZE COLAB SESSION

Remove the `sample_data` folder that Colab creates automatically in each session, then mount Drive.

In [None]:
#@title Remove the default sample data and set the notebook up
%cd /content
!rm -rf sample_data

import os

In [None]:
#@title Mount Drive
from google.colab import drive
drive.mount('/content/drive')

# TORTOISE TTS 🐢

In [None]:
#@title Initialize TorToiSe
#clone repository for tortoise and set it up
%cd /content
!apt install -y python3.10-venv
!git clone https://git.ecker.tech/mrq/ai-voice-cloning/
%cd /content/ai-voice-cloning

!python -m venv /venv

!./setup-cuda.sh

In [None]:
#@title RECOVER TTS MODELS I: list the models in Drive

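import os

tortoise_models_path = "/content/drive/MyDrive/ai-voice-cloning-base/tortoise-backup/training"

for model in os.listdir(tortoise_models_path):
    audio_dir = os.path.join(tortoise_models_path, model, "audio")
    models_dir = os.path.join(tortoise_models_path, model, "finetune/models")
    # Count the audio files; a missing folder counts as zero instead of raising
    num_audio = len(os.listdir(audio_dir)) if os.path.isdir(audio_dir) else 0
    is_audio = num_audio >= 1
    # Checkpoints are named "<epoch>_gpt.pth", so the part before "_" is the epoch
    checkpoints = os.listdir(models_dir) if os.path.isdir(models_dir) else []
    epochs = [checkpoint.split("_")[0] for checkpoint in checkpoints]
    print(f"{model=}, {is_audio=} ({num_audio}), {epochs=}")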


In [None]:
#@title RECOVER TTS MODELS II: recover the model

#@markdown Indicate the model name to recover
MODEL = "dc-narrator" #@param {type: "string"}

#@markdown Indicate the model epoch number to recover
EPOCH = "100" #@param {type: "string"}
MOD_TO_LOAD = EPOCH + "_gpt.pth"

%cd /content/ai-voice-cloning/training
!mkdir -p ./{MODEL}
%cd /content/ai-voice-cloning/training/{MODEL}
!mkdir -p ./finetune
%cd /content/ai-voice-cloning/training/{MODEL}/finetune
!mkdir -p ./models

!cp -r /content/drive/MyDrive/ai-voice-cloning-base/tortoise-backup/training/{MODEL}/audio /content/ai-voice-cloning/training/{MODEL}
!cp -r /content/drive/MyDrive/ai-voice-cloning-base/tortoise-backup/training/{MODEL}/finetune/models/{MOD_TO_LOAD} /content/ai-voice-cloning/training/{MODEL}/finetune/models
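
The checkpoint copy in the cell above can also be sketched in plain Python, which makes the expected folder layout (`<model>/finetune/models/<epoch>_gpt.pth`) explicit. This is a minimal sketch, not the notebook's own method: the function names `checkpoint_paths` and `recover_checkpoint` are hypothetical, and it covers only the checkpoint file, not the `audio` folder.

```python
import os
import shutil

# Default roots, taken from the paths used in the cell above.
DRIVE_TRAINING = "/content/drive/MyDrive/ai-voice-cloning-base/tortoise-backup/training"
SESSION_TRAINING = "/content/ai-voice-cloning/training"


def checkpoint_paths(model, epoch, drive_root=DRIVE_TRAINING, session_root=SESSION_TRAINING):
    """Return (source, destination) for the checkpoint '<epoch>_gpt.pth'."""
    relative = os.path.join(model, "finetune", "models", f"{epoch}_gpt.pth")
    return os.path.join(drive_root, relative), os.path.join(session_root, relative)


def recover_checkpoint(model, epoch, drive_root=DRIVE_TRAINING, session_root=SESSION_TRAINING):
    """Copy one checkpoint from the Drive backup into the session, creating folders as needed."""
    source, destination = checkpoint_paths(model, epoch, drive_root, session_root)
    os.makedirs(os.path.dirname(destination), exist_ok=True)
    shutil.copy2(source, destination)
    return destination
```

For example, `recover_checkpoint("dc-narrator", "100")` would copy `100_gpt.pth` for the `dc-narrator` model, provided that file exists in the Drive backup.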

In [None]:
#@title RECOVER TTS VOICE I: list the voices in Drive

tortoise_voices_path = "/content/drive/MyDrive/ai-voice-cloning-base/tortoise-backup/voices"

for voice in os.listdir(tortoise_voices_path):
    is_audio = False
    num_audio = len(os.listdir(os.path.join(tortoise_voices_path, voice)))
    if num_audio >= 1:
        is_audio = True
    print(f"{voice=}, {is_audio=} ({num_audio})")


In [None]:
#@title RECOVER TTS VOICE II: recover the voice

#@markdown Indicate the voice name to recover
VOICE = "elysium" #@param {type: "string"}

!cp -r /content/drive/MyDrive/ai-voice-cloning-base/tortoise-backup/voices/{VOICE} /content/ai-voice-cloning/voices

In [None]:
#@title Run TorToiSe WebUI
%cd /content/ai-voice-cloning/
!./start.sh --share

In [None]:
#@title Export the TTS generation to drive

#@markdown Indicate the voice from which to export
VOICE = "joyce-messier" #@param {type: "string"}

#@markdown Indicate the name of the folder to export to
FOLDER = "dr-moreau" #@param {type: "string"}

!mkdir -p /content/drive/MyDrive/ai-voice-cloning-base/tortoise-results/{FOLDER}
!cp -r /content/ai-voice-cloning/results/{VOICE}/* /content/drive/MyDrive/ai-voice-cloning-base/tortoise-results/{FOLDER}/

In [None]:
#move generations by tortoise outside and delete tortoise
!mkdir -p /content/tortoise-results
!cp -r /content/ai-voice-cloning/results/* /content/tortoise-results
%cd /content
!rm -r ai-voice-cloning

# RVC VOICE CONVERSION

In [None]:
#@title Initialize RVC setup

#install dependencies for RVC
%cd /content
!apt-get -y install build-essential python3-dev ffmpeg
!pip3 install --upgrade setuptools wheel
!pip3 install --upgrade pip
!pip3 install faiss-cpu==1.7.2 fairseq gradio==3.14.0 ffmpeg ffmpeg-python praat-parselmouth pyworld numpy==1.23.5 numba==0.56.4 librosa==0.9.2

#clone repository for RVC
!git clone --depth=1 -b stable https://github.com/fumiama/Retrieval-based-Voice-Conversion-WebUI

%cd /content/Retrieval-based-Voice-Conversion-WebUI
!mkdir -p pretrained uvr5_weights

#install aria2
!apt -y install -qq aria2


#download the pretrained base models (plain and f0 variants of D and G at 32k, 40k and 48k)
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/D32k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o D32k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/D40k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o D40k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/D48k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o D48k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/G32k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o G32k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/G40k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o G40k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/G48k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o G48k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/f0D32k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o f0D32k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/f0D40k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o f0D40k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/f0D48k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o f0D48k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/f0G32k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o f0G32k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/f0G40k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o f0G40k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/f0G48k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o f0G48k.pth

#download speech separation model
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/uvr5_weights/HP2-人声vocals+非人声instrumentals.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/uvr5_weights -o HP2-人声vocals+非人声instrumentals.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/uvr5_weights/HP5-主旋律人声vocals+其他instrumentals.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/uvr5_weights -o HP5-主旋律人声vocals+其他instrumentals.pth

#download hubert-base
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/hubert_base.pt -d /content/Retrieval-based-Voice-Conversion-WebUI -o hubert_base.pt

In [None]:
#@title RECOVER RVC MODEL I: list the models in Drive

rvc_models_path = "/content/drive/MyDrive/ai-voice-cloning-base/rvc-models"

for model in os.listdir(rvc_models_path):
    is_index, is_D, is_G, is_main = False, False, False, False
    D_list, G_list, main_list = [], [], []
    for file in os.listdir(os.path.join(rvc_models_path, model)):
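        if file.endswith(".index"):
            is_index = True
        if file.endswith(".pth"):
            # Remove only the ".pth" suffix; str.strip(".pth") would also eat
            # any leading or trailing ".", "p", "t" or "h" characters of the name
            file = file[:-len(".pth")]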
            if "_D_" in file:
                is_D = True
                D_list.append(file.split("_D_")[-1])
            elif "_G_" in file:
                is_G = True
                G_list.append(file.split("_G_")[-1])
            else:
                is_main = True
                main_list.append(file)
    if is_index and is_main and is_D and is_G:
        print(f"{model=}\n\t\tmain -> {main_list}\n\t\tD -> {D_list}\n\t\tG -> {G_list}\n\t\t")
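
The listing above depends on the backup naming pattern `<model>_D_<epoch>.pth` and `<model>_G_<epoch>.pth` used by the backup cell in this notebook. As a rough sketch of how such names can be taken apart, the hypothetical helper below (`parse_checkpoint_name` is not part of RVC or of this notebook) uses a regular expression and returns `None` for anything that does not match, such as the final `<model><epoch>.pth` weights or the `.index` files.

```python
import re

# "<model>_D_<epoch>.pth" or "<model>_G_<epoch>.pth"
_CHECKPOINT_RE = re.compile(r"^(?P<name>.+)_(?P<kind>[DG])_(?P<epoch>\d+)\.pth$")


def parse_checkpoint_name(filename):
    """Split a D/G checkpoint file name into (model name, kind, epoch).

    Returns None when the file name does not follow the pattern.
    """
    match = _CHECKPOINT_RE.match(filename)
    if match is None:
        return None
    return match.group("name"), match.group("kind"), int(match.group("epoch"))


print(parse_checkpoint_name("joshua-long-v3_D_9000.pth"))
print(parse_checkpoint_name("joshua-long-v39000.pth"))
```

The final weights file is deliberately not parsed: in a name like `joshua-long-v39000.pth` the boundary between the model name and the epoch cannot be recovered from the file name alone.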

In [None]:
#@title RECOVER RVC MODEL II: recover the specified model
#@markdown You need to manually check the file names of the models in the "logs" folder and modify the file name at the end of the command below.

#@markdown Model general name
MODGENNAME = "joshua-long-v3-model" #@param {type:"string"}
#@markdown Model name
MODELNAME = "joshua-long-v3"  #@param {type:"string"}
#@markdown Model epoch
MODELEPOCH = "9000"  #@param {type:"string"}

LOADDIR = "/content/drive/MyDrive/ai-voice-cloning-base/rvc-models/" + MODGENNAME


!mkdir -p /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}

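# Note: the backup cell in this notebook saves G_<epoch>.pth under the *_D_* name
# and D_<epoch>.pth under the *_G_* name, so the two copies below undo that swap.
!cp {LOADDIR}/{MODELNAME}_D_{MODELEPOCH}.pth /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/G_{MODELEPOCH}.pth
!cp {LOADDIR}/{MODELNAME}_G_{MODELEPOCH}.pth /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/D_{MODELEPOCH}.pth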
!mkdir -p /content/{MODELNAME}
!cp {LOADDIR}/*.index /content/{MODELNAME}
!cp {LOADDIR}/*.npy /content/{MODELNAME}
!cp {LOADDIR}/{MODELNAME}{MODELEPOCH}.pth /content/Retrieval-based-Voice-Conversion-WebUI/weights/{MODELNAME}.pth

In [None]:
#@title Run RVC WebUI
%cd /content/Retrieval-based-Voice-Conversion-WebUI
# %load_ext tensorboard
# %tensorboard --logdir /content/Retrieval-based-Voice-Conversion-WebUI/logs
!python3 infer-web.py --colab --pycmd python3

# TRAINING AND BACKUP

In [None]:
#@title UNZIP AUDIO DATASETS I: list the datasets in Drive

datasets_path = "/content/drive/MyDrive/ai-voice-cloning-base/voice-datasets"

datasets = [dataset for dataset in os.listdir(datasets_path) if ".zip" in dataset]

for dataset in datasets:
    print(dataset)

In [None]:
#@title UNZIP AUDIO DATASETS II: unzip the dataset

dataset_path = "/content/drive/MyDrive/ai-voice-cloning-base/voice-datasets/"

#@markdown Indicate the name of the dataset
DATASET = "disco-elysium-narrator-10secs.zip" #@param {type: "string"}

DATASET = dataset_path + DATASET

!mkdir -p /content/dataset
!unzip -d /content/dataset -B {DATASET}

In [None]:
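# Remove the unzipped dataset and the instruments/vocals folders from the session.
# The folders are listed one by one because IPython treats {...} in "!" commands as a Python expression.
%cd /content
!rm -rf ./dataset ./instruments ./vocals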

In [None]:
#@title Also place dataset into TTS voices folder

#@markdown Indicate the voice name you want to create
VOICENAME = "dc-narrator" #@param {type: "string"}

%cd /content/ai-voice-cloning/voices
!mkdir -p ./{VOICENAME}

!cp /content/dataset/* /content/ai-voice-cloning/voices/{VOICENAME}/

In [None]:
#@title Backup the RVC trained models to Drive
#@markdown You need to check the file name of the model under the logs folder by yourself, and manually modify the file name at the end of the command below

#@markdown Indicate the model file name you want to create
MODELFILE = "sinatra-exp-model" #@param {type: "string"}
# @markdown Indicate the model name
MODELNAME = "sinatra-exp"  # @param {type:"string"}
# @markdown Indicate the model epoch
MODELEPOCH = 2000  # @param {type:"integer"}

%cd /content/drive/MyDrive/ai-voice-cloning-base/rvc-models/
!mkdir -p ./{MODELFILE}

# Note: as written, G_<epoch>.pth is saved under the *_D_* name and D_<epoch>.pth under
# the *_G_* name. The recovery cell reverses this, so keep both cells in sync if you change it.
!cp /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/G_{MODELEPOCH}.pth /content/drive/MyDrive/ai-voice-cloning-base/rvc-models/{MODELFILE}/{MODELNAME}_D_{MODELEPOCH}.pth
!cp /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/D_{MODELEPOCH}.pth /content/drive/MyDrive/ai-voice-cloning-base/rvc-models/{MODELFILE}/{MODELNAME}_G_{MODELEPOCH}.pth
!cp /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/added_*.index /content/drive/MyDrive/ai-voice-cloning-base/rvc-models/{MODELFILE}
!cp /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/total_*.npy /content/drive/MyDrive/ai-voice-cloning-base/rvc-models/{MODELFILE}

!cp /content/Retrieval-based-Voice-Conversion-WebUI/weights/{MODELNAME}.pth /content/drive/MyDrive/ai-voice-cloning-base/rvc-models/{MODELFILE}/{MODELNAME}{MODELEPOCH}.pth

In [None]:
#@title Clean the TTS training and voices folders for generation

#@markdown Indicate the voice files to keep, separated by commas with no spaces around them (format: 'voice_1.wav,voice_2.wav')
VOICES_TO_KEEP = "Conceptualization-THOUGHT  ART COP-10.wav,Conceptualization-OFFICE  FILE CABINET-36.wav,Conceptualization-APT  STUDENT COMMUNIST-241.wav" #@param {type: "string"}

#@markdown Indicate the voice general name
VOICE = "dc-narrator" #@param {type: "string"}

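# Recreate the voices folder so that it only holds the files to keep
!rm -rf /content/ai-voice-cloning/voices/{VOICE}
!mkdir -p /content/ai-voice-cloning/voices/{VOICE}

for VOICE_TO_KEEP in VOICES_TO_KEEP.split(","):
  # Quote the source path because the file names may contain spaces
  !cp "/content/ai-voice-cloning/training/{VOICE}/audio/{VOICE_TO_KEEP}" /content/ai-voice-cloning/voices/{VOICE}/

# Replace the training audio with the kept files
!rm -rf /content/ai-voice-cloning/training/{VOICE}/audio
!mkdir -p /content/ai-voice-cloning/training/{VOICE}/audio
!cp -r /content/ai-voice-cloning/voices/{VOICE}/* /content/ai-voice-cloning/training/{VOICE}/audio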
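
Because the file names to keep may contain spaces, the same selection step can also be sketched with `shutil` instead of shell commands, which avoids quoting issues altogether. This is an illustrative sketch only: `copy_kept_voices` is a hypothetical helper, and it assumes the comma-separated format of `VOICES_TO_KEEP` used above.

```python
import os
import shutil


def copy_kept_voices(names_csv, src_dir, dst_dir):
    """Copy the comma-separated file names from src_dir into dst_dir.

    Returns the list of file names that were copied.
    """
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for name in names_csv.split(","):
        name = name.strip()  # only trims the ends; spaces inside a name are kept
        if not name:
            continue
        shutil.copy2(os.path.join(src_dir, name), os.path.join(dst_dir, name))
        copied.append(name)
    return copied
```

A call such as `copy_kept_voices(VOICES_TO_KEEP, f"/content/ai-voice-cloning/training/{VOICE}/audio", f"/content/ai-voice-cloning/voices/{VOICE}")` would mirror the loop in the cell above.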

In [None]:
#@title Clean the TTS training folder

#@markdown Indicate the voice name
VOICE = "dc-narrator" #@param {type: "string"}

%cd /content
!mkdir -p TEMP

!cp -r /content/ai-voice-cloning/training/{VOICE}/audio /content/TEMP
!cp -r /content/ai-voice-cloning/training/{VOICE}/finetune/models /content/TEMP

%cd /content/ai-voice-cloning/training
!rm -r ./{VOICE}

%cd /content/ai-voice-cloning/training
!mkdir -p ./{VOICE}

!cp -r /content/TEMP/audio /content/ai-voice-cloning/training/{VOICE}

%cd /content/ai-voice-cloning/training/{VOICE}
!mkdir -p finetune

!cp -r /content/TEMP/models /content/ai-voice-cloning/training/{VOICE}/finetune

%cd /content
!rm -r ./TEMP

In [None]:
#@title Save TTS model to Drive

#@markdown Indicate the trained model name
TRAINEDMODEL = "dc-narrator" #@param {type: "string"}

#@markdown Indicate the model epoch
TRAINEDEPOCH = "25" #@param {type: "string"}
MODEL_TO_SAVE = TRAINEDEPOCH + "_gpt.pth"

%cd /content/drive/MyDrive/ai-voice-cloning-base/tortoise-backup/training
!mkdir -p ./{TRAINEDMODEL}
%cd /content/drive/MyDrive/ai-voice-cloning-base/tortoise-backup/training/{TRAINEDMODEL}
!mkdir -p ./finetune
%cd /content/drive/MyDrive/ai-voice-cloning-base/tortoise-backup/training/{TRAINEDMODEL}/finetune
!mkdir -p ./models

!cp -r /content/ai-voice-cloning/training/{TRAINEDMODEL}/audio /content/drive/MyDrive/ai-voice-cloning-base/tortoise-backup/training/{TRAINEDMODEL}
!cp -r /content/ai-voice-cloning/training/{TRAINEDMODEL}/finetune/models/{MODEL_TO_SAVE} /content/drive/MyDrive/ai-voice-cloning-base/tortoise-backup/training/{TRAINEDMODEL}/finetune/models

In [None]:
#@title Save TTS voice to Drive

#@markdown Indicate the voice name
VOICE = "joshua-graham" #@param {type: "string"}

!cp -r /content/ai-voice-cloning/voices/{VOICE} /content/drive/MyDrive/ai-voice-cloning-base/tortoise-backup/voices/

In [None]:
#@title Export generations from TorToiSe to Drive
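import datetime

# Timestamp without colons, so that it is safe to use as a folder name
file_name = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")

#@markdown Indicate the name of the voice model
VOICE = "joshua-voice" #@param {type: "string"}

# Create the timestamped folder, then export the generations from TorToiSe to Drive
!mkdir -p /content/drive/MyDrive/ai-voice-cloning-base/tortoise-results/{file_name}
!cp -r /content/ai-voice-cloning/results/{VOICE}/* /content/drive/MyDrive/ai-voice-cloning-base/tortoise-results/{file_name}/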
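
The timestamped export can also be written as a small Python function, which is easier to reuse and test than the shell lines. The sketch below is one possible approach rather than the notebook's own method: `export_results` is a hypothetical name, and the timestamp format is an assumption chosen to avoid characters that are awkward in folder names.

```python
import datetime
import os
import shutil


def export_results(results_dir, export_root, now=None):
    """Copy results_dir into a new timestamped folder under export_root.

    Returns the path of the folder that was created.
    """
    now = now or datetime.datetime.now()
    folder = now.strftime("%Y-%m-%d_%H-%M-%S")
    target = os.path.join(export_root, folder)
    # copytree creates the target folder and any missing parent folders
    shutil.copytree(results_dir, target, dirs_exist_ok=True)
    return target
```

For example, `export_results(f"/content/ai-voice-cloning/results/{VOICE}", "/content/drive/MyDrive/ai-voice-cloning-base/tortoise-results")` would place the generations for `VOICE` in a folder named after the current time.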