## Before training

This notebook keeps the last 3 generations of models in Google Drive. One generation is larger than 1 GB, so you need at least 3 GB of free space in Google Drive. If you do not have that much free space, consider creating another Google account.

Training requires more than 10 GB of VRAM (a T4 should be enough). Inference needs much less.
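
The storage requirement above can be checked before training starts. The following is a minimal sketch, not part of the original notebook: the helper name `has_free_space` is hypothetical, and the Drive path assumes the mount point used in the "Mount Google Drive" cell. The value reported for a mounted Drive may not match the account quota exactly, so treat the result as a rough guide.

```python
import os
import shutil

GIB = 1024 ** 3
REQUIRED_GIB = 3  # three generations at roughly 1 GB each, per the note above


def has_free_space(path, required_gib=REQUIRED_GIB):
    """Return True if the filesystem holding `path` has at least `required_gib` GiB free."""
    return shutil.disk_usage(path).free >= required_gib * GIB


# Assumed mount point; fall back to the current directory when Drive is not mounted.
drive_root = "/content/drive/MyDrive"
check_path = drive_root if os.path.isdir(drive_root) else "."
print(f"{check_path}: enough free space = {has_free_space(check_path)}")
```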

## Installation

In [None]:
#@title Connect to colab runtime and check GPU
!nvidia-smi

In [None]:
#@title Install dependencies
#@markdown pip may fail to resolve some dependencies and print an ERROR; this can be ignored.
!python -m pip install -U pip setuptools wheel
%pip install -U ipython~=7.34.0

#@markdown Branch (for development)
BRANCH = "none" #@param {"type": "string"}
if BRANCH == "none":
    %pip install -U so-vits-svc-fork
else:
    %pip install -U git+https://github.com/34j/so-vits-svc-fork.git@{BRANCH}

#@markdown ### To make the runtime **restart automatically** after installation, uncomment `exit()` below
# exit()

In [None]:
#@title Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

In [None]:
#@title Install rclone for OneDrive
# !apt install -y fuse3
!sudo -v ; curl https://rclone.org/install.sh | sudo bash
!rclone -V

In [None]:
#@title Mount OneDrive
# ![ ! -d onedrive ] && mkdir onedrive
# !rclone mount colab:/ /content/onedrive --config /content/drive/MyDrive/rclone.conf --vfs-cache-mode full --daemon

## Training

In [None]:
#@title Clean dataset directory
!rm -rf "dataset_raw_raw"
!rm -rf "dataset_raw"
!rm -rf "dataset"

In [None]:
#@title Make dataset directory
!mkdir -p "dataset_raw_raw"
!mkdir -p "dataset_raw"
!mkdir -p "dataset"

In [None]:
#@title Copy your dataset_raw_raw
#@markdown **We assume that your dataset_raw_raw is in your Google Drive's `so-vits-svc-fork/dataset_raw_raw/(speaker_name)` directory.**
DATASET_NAME = "kiritan" #@param {type: "string"}
!cp -R /content/drive/MyDrive/so-vits-svc-fork/dataset_raw_raw/{DATASET_NAME}/ -t "dataset_raw_raw/"

In [None]:
#@title Automatically split audio files into multiple files
!svc pre-split

In [None]:
#@title Copy your dataset_raw
#@markdown **We assume that your dataset_raw is in your Google Drive's `so-vits-svc-fork/dataset_raw/(speaker_name)` directory.**
DATASET_NAME = "kiritan" #@param {type: "string"}
!cp -R /content/drive/MyDrive/so-vits-svc-fork/dataset_raw/{DATASET_NAME}/ -t "dataset_raw/"

In [None]:
#@title Download dataset (Tsukuyomi-chan JVS)
#@markdown You can download this dataset if you don't have your own dataset.
#@markdown Make sure you agree to the license when using this dataset.
#@markdown https://tyc.rei-yumesaki.net/material/corpus/#toc6
# !wget https://tyc.rei-yumesaki.net/files/sozai-tyc-corpus1.zip
# !unzip sozai-tyc-corpus1.zip
# !mv "/content/つくよみちゃんコーパス Vol.1 声優統計コーパス（JVSコーパス準拠）/おまけ：WAV（+12dB増幅＆高音域削減）/WAV（+12dB増幅＆高音域削減）" "dataset_raw/tsukuyomi"

In [None]:
#@title Automatic preprocessing: resample to 44100Hz and mono
!svc pre-resample
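
After resampling, the format of an output file can be spot-checked from Python. This is a sketch under stated assumptions: `wav_format` and `is_44k_mono` are hypothetical helpers, and the standard-library `wave` module only reads PCM files, so a file stored in another encoding would raise `wave.Error` instead of returning a result. The demo writes a short silent file rather than touching the real dataset.

```python
import os
import tempfile
import wave


def wav_format(path):
    """Return (sample_rate, channels) read from a PCM WAV header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate(), wav_file.getnchannels()


def is_44k_mono(path):
    """True when the file is 44100 Hz with a single channel."""
    return wav_format(path) == (44100, 1)


# Demo on a 10 ms silent file written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.wav")
    with wave.open(demo, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)      # 16-bit samples
        out.setframerate(44100)
        out.writeframes(b"\x00\x00" * 441)
    print(is_44k_mono(demo))
```

To check real output, call `is_44k_mono` on a few files under `dataset/`.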

In [None]:
#@title Divide filelists and generate config.json
CONFIG_TYPE = "so-vits-svc-4.0v1" #@param ["quickvc", "so-vits-svc-4.0v1-legacy", "so-vits-svc-4.0v1"]
!svc pre-config -t {CONFIG_TYPE}

In [None]:
#@title Back up config files
!cp -r configs drive/MyDrive/so-vits-svc-fork/
!cp -r filelists drive/MyDrive/so-vits-svc-fork/

In [None]:
#@title Restore backed-up config files
!cp -r drive/MyDrive/so-vits-svc-fork/configs .
!cp -r drive/MyDrive/so-vits-svc-fork/filelists .

In [None]:
#@title Generate hubert and f0
F0_METHOD = "crepe" #@param ["crepe", "crepe-tiny", "parselmouth", "dio", "harvest"]
FORCE_REBUILD_ON = True #@param {type:"boolean"}
if FORCE_REBUILD_ON:
    !svc pre-hubert -fm {F0_METHOD}
else:
    !svc pre-hubert -fm {F0_METHOD} -nf

In [None]:
#@title Back up or download the preprocessed dataset (hubert and f0)
DATASET_NAME = "kiritan" #@param {type: "string"}

BACKUP_ON = True #@param {type:"boolean"}
if BACKUP_ON:
    !zip -r dataset.zip dataset
    # !zip -r dataset.wav.zip dataset -i dataset/**/*.wav
    # !zip -r dataset.data.pt.zip dataset -i dataset/**/*.data.pt

    !mkdir -p drive/MyDrive/so-vits-svc-fork/datasets/{DATASET_NAME}/
    !rclone mkdir colab:/so-vits-svc-fork/datasets/{DATASET_NAME}/ --config /content/drive/MyDrive/rclone.conf

    !rm -f *.{{md5,sha1,sha256}}.sum
    !rclone hashsum MD5 . --output-file checksum.md5.sum --filter "+ dataset{{,.wav,.data.pt}}.zip" --filter "- dataset/**" --filter "- *.{{md5,sha1,sha256}}.sum" --filter "- **"
    !rclone hashsum SHA1 . --output-file checksum.sha1.sum --filter "+ dataset{{,.wav,.data.pt}}.zip" --filter "- dataset/**" --filter "- *.{{md5,sha1,sha256}}.sum" --filter "- **"
    !rclone hashsum SHA256 . --output-file checksum.sha256.sum --filter "+ dataset{{,.wav,.data.pt}}.zip" --filter "- dataset/**" --filter "- *.{{md5,sha1,sha256}}.sum" --filter "- **"
    
    # !cp -f dataset{{,.wav,.data.pt}}.zip drive/MyDrive/so-vits-svc-fork/datasets/{DATASET_NAME}/
    !cp -f *.{{md5,sha1,sha256}}.sum drive/MyDrive/so-vits-svc-fork/datasets/{DATASET_NAME}/
    !rclone copy . colab:/so-vits-svc-fork/datasets/{DATASET_NAME}/ --config /content/drive/MyDrive/rclone.conf --no-traverse --filter "+ dataset{{,.wav,.data.pt}}.zip" --filter "- dataset/**" --filter "- *.{{md5,sha1,sha256}}.sum" --filter "- **"
    !rclone copy . colab:/so-vits-svc-fork/datasets/{DATASET_NAME}/ --config /content/drive/MyDrive/rclone.conf --no-traverse --filter "+ /*.{{md5,sha1,sha256}}.sum" --filter "- **"

    # !rclone check . drive/MyDrive/so-vits-svc-fork/datasets/{DATASET_NAME}/ --one-way --download --differ - --missing-on-dst - --error - --filter "+ dataset{{,.wav,.data.pt}}.zip" --filter "- dataset/**" --filter "- *.{{md5,sha1,sha256}}.sum" --filter "- **"
    !rclone check . drive/MyDrive/so-vits-svc-fork/datasets/{DATASET_NAME}/ --one-way --download --differ - --missing-on-dst - --error - --filter "+ /*.{{md5,sha1,sha256}}.sum" --filter "- **"
    # !rclone check . colab:/so-vits-svc-fork/datasets/{DATASET_NAME}/ --config /content/drive/MyDrive/rclone.conf --one-way --download --differ - --missing-on-dst - --error - --filter "+ dataset{{,.wav,.data.pt}}.zip" --filter "- dataset/**" --filter "- *.{{md5,sha1,sha256}}.sum" --filter "- **"
    !rclone check checksum.md5.sum colab:/so-vits-svc-fork/datasets/{DATASET_NAME}/ --config /content/drive/MyDrive/rclone.conf --checkfile MD5 --one-way --download --differ - --missing-on-dst - --error - --filter "+ dataset{{,.wav,.data.pt}}.zip" --filter "- dataset/**" --filter "- *.{{md5,sha1,sha256}}.sum" --filter "- **"
    !rclone check . colab:/so-vits-svc-fork/datasets/{DATASET_NAME}/ --config /content/drive/MyDrive/rclone.conf --one-way --download --differ - --missing-on-dst - --error - --filter "+ /*.{{md5,sha1,sha256}}.sum" --filter "- **"

    # !rclone checksum MD5 checksum.md5.sum colab:/so-vits-svc-fork/datasets/{DATASET_NAME}/ --config /content/drive/MyDrive/rclone.conf --one-way --download --differ - --missing-on-dst - --error - --filter "+ dataset{{,.wav,.data.pt}}.zip" --filter "- dataset/**" --filter "- *.{{md5,sha1,sha256}}.sum" --filter "- **"

    # !ps -ef | grep rclone
    # !iftop -t -s 1 -n
else:
    !cp -f drive/MyDrive/so-vits-svc-fork/datasets/{DATASET_NAME}/*.{{md5,sha1,sha256}}.sum .
    !rclone copy colab:/so-vits-svc-fork/datasets/{DATASET_NAME}/ . --config /content/drive/MyDrive/rclone.conf --no-traverse --filter "+ dataset{{,.wav,.data.pt}}.zip" --filter "- dataset/**" --filter "- *.{{md5,sha1,sha256}}.sum" --filter "- **"

    !rclone check checksum.md5.sum . --checkfile MD5 --one-way --differ - --missing-on-dst - --error - --filter "+ dataset{{,.wav,.data.pt}}.zip" --filter "- dataset/**" --filter "- *.{{md5,sha1,sha256}}.sum" --filter "- **"
    # !rclone checksum MD5 checksum.md5.sum . --one-way --differ - --missing-on-dst - --error - --filter "+ dataset{{,.wav,.data.pt}}.zip" --filter "- dataset/**" --filter "- *.{{md5,sha1,sha256}}.sum" --filter "- **"

    # !unzip -d ./ drive/MyDrive/so-vits-svc-fork/datasets/{DATASET_NAME}/dataset.zip
    !unzip -d ./ dataset.zip
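
The cell above relies on `rclone check --checkfile MD5` to compare archives against `checksum.md5.sum`. For readers without rclone at hand, the same idea can be sketched in plain Python. This is illustrative only: `md5_of` and `verify_sum_file` are hypothetical helpers, and the sum file is assumed to use the usual `md5sum`-style layout of one `HASH  NAME` pair per line.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sum_file(sum_file, base_dir="."):
    """Return {name: bool} for every 'HASH  NAME' line in `sum_file`."""
    results = {}
    for line in Path(sum_file).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        target = Path(base_dir) / name
        # A missing file counts as a failed check.
        results[name] = target.is_file() and md5_of(target) == expected.lower()
    return results
```

For example, `verify_sum_file("checksum.md5.sum")` run next to `dataset.zip` would report whether the downloaded archive matches the recorded hash.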

In [None]:
#@title Check dataset
!ls dataset/*/*/* | wc -l
!ls dataset/*/*/*.wav | wc -l
!ls dataset/*/*/*.data.pt | wc -l
!ls -lt dataset/**/* | tail -n 10
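
The same counts can be gathered in Python, which makes it easy to compare them programmatically. This is a rough equivalent rather than an exact one: `count_dataset_files` is a hypothetical helper, and it walks `dataset/` at any depth, whereas the `ls dataset/*/*/*` patterns above look exactly three levels down.

```python
from pathlib import Path


def count_dataset_files(root="dataset"):
    """Count all files, .wav files and .data.pt files under `root`."""
    root = Path(root)
    files = [p for p in root.rglob("*") if p.is_file()] if root.is_dir() else []
    return {
        "total": len(files),
        "wav": sum(p.name.endswith(".wav") for p in files),
        "data_pt": sum(p.name.endswith(".data.pt") for p in files),
    }


counts = count_dataset_files("dataset")
print(counts)
if counts["wav"] != counts["data_pt"]:
    print("Warning: the number of .wav and .data.pt files differs")
```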

In [None]:
#@title Train
TENSORBOARD_ON = True #@param {type:"boolean"}
if TENSORBOARD_ON:
    %load_ext tensorboard
    %tensorboard --logdir drive/MyDrive/so-vits-svc-fork/logs/44k
!svc train --model-path drive/MyDrive/so-vits-svc-fork/logs/44k

## Training Cluster model

In [None]:
#@title Train cluster model (Optional)
!svc train-cluster --output-path drive/MyDrive/so-vits-svc-fork/logs/44k/kmeans.pt

## Inference

In [None]:
#@title Get the author's voice as a source
import random
INDEX = random.randint(1, 49)
NAME = str(INDEX)
TYPE = "fsd50k" #@param ["", "digit", "dog", "fsd50k"]
CUSTOM_FILEPATH = "" #@param {type: "string"}
if CUSTOM_FILEPATH != "":
    # Path to your own file, given without the ".wav" extension (it is appended below)
    NAME = CUSTOM_FILEPATH
else:
    # Voice samples that can be downloaded directly are hard to find, so public datasets are used.
    # The URL is built in Python first because an f-string is not evaluated inside a `!` shell line.
    if TYPE == "dog":
        # Assumption: file names use a zero-padded 4-digit index
        URL = f"https://huggingface.co/datasets/437aewuh/dog-dataset/resolve/main/dogs/dogs_{INDEX:04d}.wav"
    elif TYPE == "digit":
        # george, jackson, lucas, nicolas, ...
        URL = f"https://github.com/Jakobovski/free-spoken-digit-dataset/raw/master/recordings/0_george_{INDEX}.wav"
    elif TYPE == "fsd50k":
        # /resolve/ returns the raw file; /blob/ returns an HTML page
        URL = f"https://huggingface.co/datasets/Fhrozen/FSD50k/resolve/main/clips/dev/{10000 + INDEX}.wav"
    else:
        URL = f"https://zunko.jp/sozai/utau/voice_{'kiritan' if INDEX < 25 else 'itako'}{INDEX % 5 + 1}.wav"
    !wget "{URL}" -O "{NAME}.wav"

from IPython.display import Audio, display

display(Audio(f"{NAME}.wav"))

In [None]:
#@title Use trained model
#@markdown **Put your .wav file in the `so-vits-svc-fork/audio` directory**
from IPython.display import Audio, display

AUDIO_PATH = 'drive/MyDrive/so-vits-svc-fork/audio/' #@param {type: "string"}
NAME = "test" #@param {type: "string"}
TRANSPOSE = 0 #@param {type: "number"}
F0_METHOD = "crepe" #@param ["crepe", "crepe-tiny", "parselmouth", "dio", "harvest"]

AUTO_PREDICT_F0 = True #@param {type:"boolean"}
if AUTO_PREDICT_F0:
    !svc infer {AUDIO_PATH}{NAME}.wav -m drive/MyDrive/so-vits-svc-fork/logs/44k/ -c drive/MyDrive/so-vits-svc-fork/logs/44k/config.json -t {TRANSPOSE} -fm {F0_METHOD}
else:
    !svc infer {AUDIO_PATH}{NAME}.wav -m drive/MyDrive/so-vits-svc-fork/logs/44k/ -c drive/MyDrive/so-vits-svc-fork/logs/44k/config.json -t {TRANSPOSE} -fm {F0_METHOD} -na
display(Audio(f"{AUDIO_PATH}{NAME}.wav", autoplay=False))
display(Audio(f"{AUDIO_PATH}{NAME}.out.wav", autoplay=False))

In [None]:
#@title Use trained model (with cluster)
from IPython.display import Audio, display

AUDIO_PATH = 'drive/MyDrive/so-vits-svc-fork/audio/' #@param {type: "string"}
NAME = "test" #@param {type: "string"}
SPEAKER = "kiritan" #@param {type: "string"}
TRANSPOSE = 0 #@param {type: "number"}
F0_METHOD = "crepe" #@param ["crepe", "crepe-tiny", "parselmouth", "dio", "harvest"]

AUTO_PREDICT_F0 = True #@param {type:"boolean"}
if AUTO_PREDICT_F0:
    !svc infer {AUDIO_PATH}{NAME}.wav -s {SPEAKER} -r 0.1 -m drive/MyDrive/so-vits-svc-fork/logs/44k/ -c drive/MyDrive/so-vits-svc-fork/logs/44k/config.json -k drive/MyDrive/so-vits-svc-fork/logs/44k/kmeans.pt -t {TRANSPOSE} -fm {F0_METHOD}
else:
    !svc infer {AUDIO_PATH}{NAME}.wav -s {SPEAKER} -r 0.1 -m drive/MyDrive/so-vits-svc-fork/logs/44k/ -c drive/MyDrive/so-vits-svc-fork/logs/44k/config.json -k drive/MyDrive/so-vits-svc-fork/logs/44k/kmeans.pt -t {TRANSPOSE} -fm {F0_METHOD} -na
display(Audio(f"{AUDIO_PATH}{NAME}.wav", autoplay=False))
display(Audio(f"{AUDIO_PATH}{NAME}.out.wav", autoplay=False))
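
The two inference cells above repeat the same long command with small variations. One way to avoid the duplication is to assemble the argument list in Python. This is only a sketch: `build_infer_command` is a hypothetical helper, and it uses only the flags that already appear in the cells above (`-m`, `-c`, `-t`, `-fm`, `-s`, `-r`, `-k`, `-na`), with the same values.

```python
import posixpath
import shlex


def build_infer_command(audio_file, model_dir, transpose=0, f0_method="crepe",
                        auto_predict_f0=True, speaker=None, cluster_model=None):
    """Assemble the `svc infer` argument list used in the cells above."""
    cmd = [
        "svc", "infer", audio_file,
        "-m", model_dir,
        "-c", posixpath.join(model_dir, "config.json"),
        "-t", str(transpose),
        "-fm", f0_method,
    ]
    if speaker is not None:
        cmd += ["-s", speaker]
    if cluster_model is not None:
        # The cluster cell passes `-r 0.1` together with `-k`.
        cmd += ["-r", "0.1", "-k", cluster_model]
    if not auto_predict_f0:
        # Mirrors the `else` branch of the cells above.
        cmd.append("-na")
    return cmd


MODEL_DIR = "drive/MyDrive/so-vits-svc-fork/logs/44k/"
cmd = build_infer_command(
    "drive/MyDrive/so-vits-svc-fork/audio/test.wav",
    MODEL_DIR,
    speaker="kiritan",
    cluster_model=posixpath.join(MODEL_DIR, "kmeans.pt"),
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` instead of a `!` line.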

### Backup models

In [None]:
DATASET_NAME = "kiritan" #@param {type: "string"}

!rm -f *.{{md5,sha1,sha256}}.sum
!rclone mkdir colab:/so-vits-svc-fork/models/{DATASET_NAME}/ --config /content/drive/MyDrive/rclone.conf
!rclone copy colab:/so-vits-svc-fork/models/{DATASET_NAME}/ . --config /content/drive/MyDrive/rclone.conf --no-traverse --filter "+ /*.{{md5,sha1,sha256}}.sum" --filter "- **"

# !rclone ls colab:/so-vits-svc-fork/models/{DATASET_NAME}/ --config /content/drive/MyDrive/rclone.conf --filter "+ /*.{{md5,sha1,sha256}}.sum" --filter "- **"
# !rclone ls . --filter "+ /*.{{md5,sha1,sha256}}.sum" --filter "- **"
# !rclone ls drive/MyDrive/so-vits-svc-fork/logs/44k/ --filter "+ config.json" --filter "- {{D,G}}_0.pth" --filter "+ {{D,G}}_*.pth" --filter "+ kmeans.pt" --filter "- *.{{md5,sha1,sha256}}.sum" --filter "- **"

# `sort -o` is used instead of `>`: a shell redirect would truncate the checksum file before it is read.
!sort -uk 2 -o checksum.md5.sum <(rclone hashsum MD5 drive/MyDrive/so-vits-svc-fork/logs/44k/ --filter "+ config.json" --filter "- {{D,G}}_0.pth" --filter "+ {{D,G}}_*.pth" --filter "+ kmeans.pt" --filter "- *.{{md5,sha1,sha256}}.sum" --filter "- **") <(cat checksum.md5.sum)
!sort -uk 2 -o checksum.sha1.sum <(rclone hashsum SHA1 drive/MyDrive/so-vits-svc-fork/logs/44k/ --filter "+ config.json" --filter "- {{D,G}}_0.pth" --filter "+ {{D,G}}_*.pth" --filter "+ kmeans.pt" --filter "- *.{{md5,sha1,sha256}}.sum" --filter "- **") <(cat checksum.sha1.sum)
!sort -uk 2 -o checksum.sha256.sum <(rclone hashsum SHA256 drive/MyDrive/so-vits-svc-fork/logs/44k/ --filter "+ config.json" --filter "- {{D,G}}_0.pth" --filter "+ {{D,G}}_*.pth" --filter "+ kmeans.pt" --filter "- *.{{md5,sha1,sha256}}.sum" --filter "- **") <(cat checksum.sha256.sum)

!cp -f *.{{md5,sha1,sha256}}.sum drive/MyDrive/so-vits-svc-fork/logs/44k/
!rclone copy drive/MyDrive/so-vits-svc-fork/logs/44k/ colab:/so-vits-svc-fork/models/{DATASET_NAME}/ --config /content/drive/MyDrive/rclone.conf --no-traverse --filter "+ config.json" --filter "- {{D,G}}_0.pth" --filter "+ {{D,G}}_*.pth" --filter "+ kmeans.pt" --filter "- *.{{md5,sha1,sha256}}.sum" --filter "- **"
!rclone copy . colab:/so-vits-svc-fork/models/{DATASET_NAME}/ --config /content/drive/MyDrive/rclone.conf --no-traverse --filter "+ /*.{{md5,sha1,sha256}}.sum" --filter "- **"

!rclone check . drive/MyDrive/so-vits-svc-fork/logs/44k/ --one-way --download --differ - --missing-on-dst - --error - --filter "+ /*.{{md5,sha1,sha256}}.sum" --filter "- **"
!rclone check drive/MyDrive/so-vits-svc-fork/logs/44k/ colab:/so-vits-svc-fork/models/{DATASET_NAME}/ --config /content/drive/MyDrive/rclone.conf --one-way --download --differ - --missing-on-dst - --error - --filter "+ config.json" --filter "- {{D,G}}_0.pth" --filter "+ {{D,G}}_*.pth" --filter "+ kmeans.pt" --filter "- *.{{md5,sha1,sha256}}.sum" --filter "- **"
!rclone check . colab:/so-vits-svc-fork/models/{DATASET_NAME}/ --config /content/drive/MyDrive/rclone.conf --one-way --download --differ - --missing-on-dst - --error - --filter "+ /*.{{md5,sha1,sha256}}.sum" --filter "- **"
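
The `--filter` chain used throughout the backup cell decides which files in `logs/44k/` are uploaded. rclone applies filter rules in order and the first matching rule wins; files that match no rule are included. The sketch below approximates that behaviour for flat file names using `fnmatch`. It is not rclone's own matcher: the `{D,G}` alternatives are expanded by hand, `**` is reduced to `*`, and `is_included` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Rules in the same order as the --filter flags above ("+" include, "-" exclude).
RULES = [
    ("+", "config.json"),
    ("-", "D_0.pth"),
    ("-", "G_0.pth"),
    ("+", "D_*.pth"),
    ("+", "G_*.pth"),
    ("+", "kmeans.pt"),
    ("-", "*.md5.sum"),
    ("-", "*.sha1.sum"),
    ("-", "*.sha256.sum"),
    ("-", "*"),  # stands in for the final "- **" catch-all
]


def is_included(name):
    """Return True if the first matching rule for `name` is an include rule."""
    for action, pattern in RULES:
        if fnmatchcase(name, pattern):
            return action == "+"
    return True  # no rule matched: included by default


for name in ["config.json", "G_0.pth", "G_800.pth", "kmeans.pt", "checksum.md5.sum"]:
    print(name, is_included(name))
```

Because the exclude rules for `D_0.pth` and `G_0.pth` come before the `D_*.pth` and `G_*.pth` include rules, the step-0 checkpoints are skipped while later checkpoints are kept.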

### Pretrained models

In [None]:
#@title https://huggingface.co/TachibanaKimika/so-vits-svc-4.0-models/tree/main
!wget -N "https://huggingface.co/TachibanaKimika/so-vits-svc-4.0-models/resolve/main/riri/G_riri_220.pth"
!wget -N "https://huggingface.co/TachibanaKimika/so-vits-svc-4.0-models/resolve/main/riri/config.json"

In [None]:
from IPython.display import Audio, display

AUDIO_PATH = 'drive/MyDrive/so-vits-svc-fork/audio/' #@param {type: "string"}
NAME = "test" #@param {type: "string"}

AUTO_PREDICT_F0 = True #@param {type:"boolean"}
if AUTO_PREDICT_F0:
    !svc infer {AUDIO_PATH}{NAME}.wav -c config.json -m G_riri_220.pth
else:
    !svc infer {AUDIO_PATH}{NAME}.wav -c config.json -m G_riri_220.pth -na
display(Audio(f"{AUDIO_PATH}{NAME}.wav", autoplay=False))
display(Audio(f"{AUDIO_PATH}{NAME}.out.wav", autoplay=False))

In [None]:
#@title https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main
!wget -N "https://huggingface.co/therealvul/so-vits-svc-4.0/resolve/main/Pinkie%20(speaking%20sep)/G_166400.pth"
!wget -N "https://huggingface.co/therealvul/so-vits-svc-4.0/resolve/main/Pinkie%20(speaking%20sep)/config.json"

In [None]:
from IPython.display import Audio, display

AUDIO_PATH = 'drive/MyDrive/so-vits-svc-fork/audio/' #@param {type: "string"}
NAME = "test" #@param {type: "string"}

AUTO_PREDICT_F0 = True #@param {type:"boolean"}
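# Braces in the speaker name are doubled so that IPython passes them through literally
# instead of trying to expand `{neutral}` as a Python variable.
if AUTO_PREDICT_F0:
    !svc infer {AUDIO_PATH}{NAME}.wav --speaker "Pinkie {{neutral}}" -c config.json -m G_166400.pth
else:
    !svc infer {AUDIO_PATH}{NAME}.wav --speaker "Pinkie {{neutral}}" -c config.json -m G_166400.pth -na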
display(Audio(f"{AUDIO_PATH}{NAME}.wav", autoplay=False))
display(Audio(f"{AUDIO_PATH}{NAME}.out.wav", autoplay=False))