**Original project link: https://github.com/yxlllc/DDSP-SVC**

This is a fast-training SVC project that suits the free version of Colab well: a reasonably trained model file can be obtained within the free usage limit, and inference places very low demands on the quality of the input source. Part of the Colab code is borrowed from the Sovits Colab notebook.

Process the data in advance.
Clean, isolated vocals are required. Whispered or breathy sounds are unsuitable because F0 is hard to extract from them, so it is best to remove them.
Split the audio into slices of 2 s to 10 s. Slices longer than 20 s may also work, since this project is not memory-hungry, but slices shorter than 2 s are not allowed.
Remember to resample to 44.1 kHz while processing. Other sample rates will run but greatly reduce efficiency.
Because of the free-version limit, it is best to preprocess locally and to avoid non-44.1 kHz data. Even with average-quality data, a single quota period is enough to get good training results.
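
The 44.1 kHz and 2 s rules above can be checked locally before uploading. The following is only a minimal sketch using Python's standard `wave` module; the function names `check_slice` and `check_dataset` are hypothetical helpers, not part of DDSP-SVC, and `wave` may reject files that are not plain PCM WAV.

```python
import wave
from pathlib import Path

TARGET_RATE = 44100  # sample rate recommended above
MIN_SECONDS = 2.0    # slices shorter than this are not allowed


def check_slice(path):
    """Return a list of human-readable problems found in one .wav slice."""
    problems = []
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        seconds = wav.getnframes() / rate
    if rate != TARGET_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {TARGET_RATE} Hz")
    if seconds < MIN_SECONDS:
        problems.append(f"duration is {seconds:.2f} s, shorter than {MIN_SECONDS} s")
    return problems


def check_dataset(audio_dir):
    """Map each problematic .wav file under audio_dir to its list of problems."""
    report = {}
    for path in sorted(Path(audio_dir).rglob("*.wav")):
        problems = check_slice(path)
        if problems:
            report[str(path)] = problems
    return report
```

Calling `check_dataset("data/train/audio")` returns an empty dictionary when every slice passes both checks.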

Dataset file structure.

Place all the training set data (.wav format audio slices) in data/train/audio

Put all the validation set data (.wav format audio slices) into data/val/audio

Pack the data folder into a zip archive named data.zip and upload it to the root directory of Google Drive

It is recommended to preprocess locally to save time against the usage limit. After preprocessing, package and upload the data in the same way as above
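
As a rough illustration of the packaging step, the sketch below builds the archive with Python's standard `zipfile` module so that its entries start with the folder name (for example `data/train/audio/...`). The helper name `pack_dataset` is hypothetical, and any zip tool that produces the same layout works equally well.

```python
import zipfile
from pathlib import Path


def pack_dataset(data_dir, zip_path="data.zip"):
    """Zip data_dir so the archive has one top-level folder named after it."""
    data_dir = Path(data_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(data_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the parent so entries start with "data/".
                archive.write(path, path.relative_to(data_dir.parent).as_posix())
    return zip_path
```

For example, `pack_dataset("data")` run next to the data folder writes data.zip in the current directory.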

Two training methods are provided: the "combsub-based model" (combsub) and the "sinusoidal additive synthesizer-based model" (sins). The latter generally performs worse overall than the former but is still offered as an option.

Modifying hyperparameters such as "bs" to use more video memory may not improve efficiency (when I tried it, efficiency actually dropped). With the default parameters, training runs at about 4.85 batch/s with almost no overfitting.
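
The throughput figure gives a quick way to budget a session. The sketch below is plain arithmetic, assuming one training step per batch and a constant speed; the helper name `estimate_hours` and the 50,000-step target are illustrative only. At 4.85 batch/s, 50,000 steps take roughly 2.9 hours.

```python
def estimate_hours(steps, batches_per_second=4.85):
    """Estimate wall-clock hours needed to run `steps` batches at a given throughput."""
    if batches_per_second <= 0:
        raise ValueError("batches_per_second must be positive")
    # steps / (batches per second) gives seconds; divide by 3600 for hours
    return steps / batches_per_second / 3600
```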

3/6: Added code to download the pre-trained base model. In testing, convergence was about 40% faster, with almost no timbre leakage at 50k steps. The testing was not thorough, but it is worth a try.

3/8: Added multi-speaker training. The dataset structure is shown below. Single-speaker training can still use the previous dataset layout without any changes. The multi-speaker model is enabled by modifying the 'n_spk' option in the configuration file.

```
# Training set
# 1st speaker
data/train/audio/1/aaa.wav
data/train/audio/1/bbb.wav
...
# 2nd speaker
data/train/audio/2/ccc.wav
data/train/audio/2/ddd.wav
...

# Validation set
# 1st speaker
data/val/audio/1/eee.wav
data/val/audio/1/fff.wav
...
# 2nd speaker
data/val/audio/2/ggg.wav
data/val/audio/2/hhh.wav
...
```
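
To sanity-check this layout before training, one option is to list the speaker subfolders that actually contain audio. This is only a sketch: `list_speakers` and `count_speakers` are hypothetical helpers, and treating the resulting count as the value for 'n_spk' is an assumption rather than something the project confirms here.

```python
from pathlib import Path


def list_speakers(audio_dir):
    """Return the sorted names of subfolders of audio_dir that contain .wav files."""
    return sorted(
        folder.name
        for folder in Path(audio_dir).iterdir()
        if folder.is_dir() and any(folder.glob("*.wav"))
    )


def count_speakers(audio_dir):
    """Count speaker subfolders (0 means the flat single-speaker layout)."""
    return len(list_speakers(audio_dir))
```

Running `list_speakers("data/train/audio")` and `list_speakers("data/val/audio")` and comparing the two lists shows whether a speaker folder is missing from one of the sets.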

In [None]:

#@title Check GPU
!nvidia-smi

In [None]:

#@title Clone the github repository
!git clone https://github.com/yxlllc/DDSP-SVC

In [None]:
#@title Install dependencies
%cd /content/DDSP-SVC
!pip install pyworld praat-parselmouth torchcrepe einops local_attention

In [None]:
#@title Download necessary files
!wget -P pretrain/hubert/ https://github.com/bshall/hubert/releases/download/v0.1/hubert-soft-0d54a1f4.pt
!wget -P pretrain/ https://github.com/openvpi/vocoders/releases/download/nsf-hifigan-v1/nsf_hifigan_20221211.zip
!unzip -d /content/DDSP-SVC/pretrain /content/DDSP-SVC/pretrain/nsf_hifigan_20221211.zip

In [None]:
#@title Loading Google Drive
from google.colab import drive
drive.mount('/content/drive')

In [None]:
#@title Get dataset from cloud disk
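# Adjust the path and file name as needed. Assumes data.zip has a top-level data/ folder.
# For an archive made by the backup cell (absolute paths), use "-d /" instead.
!unzip -d /content/DDSP-SVC /content/drive/MyDrive/data.zip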

File structure:

Put all training set data (audio slices in .wav format) into data/train/audio

Put all validation set data (.wav format audio slices) into data/val/audio

In [None]:
#@title Preprocess data (skip if already preprocessed)
#@markdown ##Select training method
way = "combsub" #@param ["combsub","sins"]

# Run preprocessing with the config that matches the selected method
if way == "combsub":
  !python preprocess.py -c configs/combsub.yaml
elif way == "sins":
  !python preprocess.py -c configs/sins.yaml

In [None]:
#@title Packing/Backing Up Datasets
!zip -r dataset.zip /content/DDSP-SVC/data
!cp /content/DDSP-SVC/dataset.zip /content/drive/MyDrive/

In [None]:
#@title Set model backup
#@markdown **Whether to back up the model to Google Drive. Colab may disconnect at any time, so backing up is recommended. By default, the model is saved to the DDSP-SVC folder in the root directory of Google Drive.**
Save_to_drive = True #@param {type:"boolean"}
if Save_to_drive:
  # Replace the local exp folder with a symlink to the Drive folder so checkpoints are written to Drive
  !rm -rf /content/DDSP-SVC/exp
  !mkdir -p /content/drive/MyDrive/DDSP-SVC
  !ln -s /content/drive/MyDrive/DDSP-SVC /content/DDSP-SVC/exp

In [None]:
#@title Download pre-trained model (optional)
!wget -P exp https://github.com/yxlllc/DDSP-SVC/releases/download/1.0/opencpop.zip
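# -p avoids an error if the folders already exist
!mkdir -p /content/DDSP-SVC/exp/combsub-test/
!mkdir -p /content/DDSP-SVC/exp/sin-test/
!unzip -d /content/DDSP-SVC/exp /content/DDSP-SVC/exp/opencpop.zip
# Copy the base model into each experiment folder; the folder names must match the experiment directory set in the corresponding config
!cp /content/DDSP-SVC/exp/opencpop/model_300000.pt /content/DDSP-SVC/exp/sin-test/
!cp /content/DDSP-SVC/exp/opencpop/model_300000.pt /content/DDSP-SVC/exp/combsub-test/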

**Before training, open the "/DDSP-SVC/configs" folder in the "Files" tab in the left sidebar and open "combsub.yaml" or "sins.yaml" (depending on your training method). On line 35, change "cache_device: 'cpu'" to "cache_device: 'cuda'", then save the change from the "File" menu at the top.**

**This will further speed up training.**
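
If editing the file by hand is inconvenient, the same change could be made from a cell. The sketch below is one possible approach, not the project's own tooling: `use_cuda_cache` is a hypothetical helper that rewrites the `cache_device` value in the config text by name rather than by line number.

```python
import re


def use_cuda_cache(config_text):
    """Return config_text with cache_device switched from 'cpu' to 'cuda'."""
    # Group 1 keeps the indentation and key, group 2 keeps the optional quote character.
    pattern = r"^(\s*cache_device:\s*)(['\"]?)cpu\2"
    return re.sub(pattern, r"\g<1>\g<2>cuda\g<2>", config_text, flags=re.MULTILINE)


# Example use in a notebook cell (run from /content/DDSP-SVC):
#   from pathlib import Path
#   cfg = Path("configs/combsub.yaml")
#   cfg.write_text(use_cuda_cache(cfg.read_text()))
```

Matching on the key name keeps the edit working even if the line number shifts in a later version of the config.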

**When using the pre-trained base model, it may be possible to raise the "lr" value on line 39 moderately in the early stage of training. (This is untested, so use your own judgment, and do not change it if you are unsure.)**

In [None]:
#@title Start Training
#@markdown ##Select training method
way = "combsub" #@param ["combsub","sins"]

# Switch to the repository first so the relative "exp" log directory resolves correctly
%cd /content/DDSP-SVC

%load_ext tensorboard
%tensorboard --logdir exp

if way == "combsub":
  !python train.py -c configs/combsub.yaml
elif way == "sins":
  !python train.py -c configs/sins.yaml

In [None]:
#@title Inference (for multi-speaker models, modify the code to specify the speaker)
#@markdown **Upload the processed ".wav" input file to the root directory of Google Drive, then set the options below**

#@markdown **Select the training method used for the model**
way = "combsub" #@param ["combsub","sins"]
#@markdown **Name of the ".wav" file, without the extension**
input_file = "input" #@param {type:"string"}
input_path = "/content/drive/MyDrive/"
input_name = input_path + input_file
# Drive folder that holds the trained models; must match the folder used in the model backup cell
model_path = "/content/drive/MyDrive/DDSP-SVC"
#@markdown **Pitch adjustment**
keychange = "0"  #@param {type:"string"}
# Assumes the experiment folder is named "<way>-test" (like combsub-test above); adjust if your config differs
!python main.py -i {input_name}.wav -m {model_path}/{way}-test/model_best.pt -o {input_name}_result.wav -k {keychange} -e true