<a href="https://colab.research.google.com/github/luxiya0615/yezaikai/blob/main/vits%E6%A8%A1%E5%9E%8B%E8%AE%AD%E7%BB%83%E7%AC%94%E8%AE%B0%E6%9C%AC.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Forked from https://github.com/CjangCjengh/vits/blob/main/vits.ipynb

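This notebook trains single-speaker and multi-speaker VITS models. It does not cover speech synthesis.

**By default only one checkpoint is kept to save space, so back up the checkpoint each time before you restart training.**

**By default a checkpoint is saved every 200 steps. You can change this in the "save a checkpoint every N steps" field (`hparams_eval_interval`).**

**Do not exit before a save message appears in the training output, or you may lose progress.**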
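
The backup advice above can be sketched as a small helper. This is only an illustration: the function name `backup_checkpoints` and the backup folder are hypothetical, and it assumes the checkpoints are the `.pth` files stored in the model folder on Google Drive.

```python
import shutil
import time
from pathlib import Path


def backup_checkpoints(model_dir, backup_root):
    """Copy every .pth file in model_dir into a new timestamped folder."""
    model_dir = Path(model_dir)
    target = Path(backup_root) / time.strftime("backup_%Y%m%d_%H%M%S")
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for ckpt in sorted(model_dir.glob("*.pth")):
        shutil.copy2(ckpt, target / ckpt.name)
        copied.append(target / ckpt.name)
    return copied


# Hypothetical paths; adjust them to your own model location and name.
# backup_checkpoints("/content/drive/MyDrive/test", "/content/drive/MyDrive/backups")
```

Run it before starting a new training session so the previous checkpoint survives even if the new run overwrites it.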

[tacotron2 notebook](https://colab.research.google.com/drive/18fbCupSaQde-FtF2Z2Na-LP5BrukjNMs?usp=sharing)

[VITS notebook with emotion embedding support](https://colab.research.google.com/drive/10MkPCQhhTs30jwUSMpZ8mTbptqpUOLnl?usp=sharing)

[Single-speaker dataset toolkit](https://colab.research.google.com/drive/1oM3HuRdGtONgpNNTredRCYeG_JrdF1be?usp=sharing)




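### FAQ
Q: I still get "FileNotFoundError" after extracting the archive.

Make sure the folders **directly** inside the archive are `wavs` and `filelists`. If that does not help, check that the archive format is one of the supported formats.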
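
A quick way to confirm the layout after extraction is sketched below. The helper name `missing_dataset_dirs` is hypothetical, and the example path assumes the archive was extracted into `/content/vits` as in the extraction cell of this notebook.

```python
from pathlib import Path


def missing_dataset_dirs(root, required=("wavs", "filelists")):
    """Return the required folder names that are not directly under root."""
    root = Path(root)
    return [name for name in required if not (root / name).is_dir()]


# Example (assumes the archive was extracted into /content/vits):
# missing = missing_dataset_dirs("/content/vits")
# if missing:
#     print("Missing folders:", missing)
```

An empty result means both folders sit at the top level; anything else usually means the archive wrapped them in an extra parent folder.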

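Q: ValueError: too many values to unpack (expected 3)

For a single-speaker model, the transcript txt file does not need the speaker index in the middle column; use the same format as tacotron2.

For a multi-speaker model, the transcript text must not contain the '|' character.

Q: IndexError: list index out of range

The transcript file must not contain any empty lines.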
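
Both errors above come from malformed transcript lines, so it can help to check the file before preprocessing. The sketch below is an assumption-based illustration: the name `check_filelist` is hypothetical, and it assumes `path|text` lines for single-speaker data and `path|speaker_id|text` lines for multi-speaker data.

```python
def check_filelist(lines, multi_speaker):
    """Return (line_number, problem) pairs for a list of filelist lines."""
    # Assumed formats: path|speaker_id|text (multi) or path|text (single).
    expected = 3 if multi_speaker else 2
    problems = []
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped:
            problems.append((number, "empty line"))
            continue
        fields = stripped.split("|")
        if len(fields) != expected:
            problems.append(
                (number, f"expected {expected} fields, found {len(fields)}")
            )
    return problems


# Usage (path taken from the default config in this notebook):
# with open("/content/vits/filelists/list.txt", encoding="utf-8") as f:
#     for number, problem in check_filelist(f.readlines(), multi_speaker=True):
#         print(f"line {number}: {problem}")
```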

Q: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8d in position 21: invalid start byte

This is an encoding problem in the txt file; save it as UTF-8.
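
If re-saving the file in an editor is inconvenient, the conversion can be scripted. This is a minimal sketch: `convert_to_utf8` is a hypothetical helper, and the original encoding is something you have to know or guess, since it cannot be read reliably from the file itself.

```python
def convert_to_utf8(path, source_encoding):
    """Re-save a text file as UTF-8, decoding it with source_encoding."""
    # newline="" keeps the original line endings untouched.
    with open(path, "r", encoding=source_encoding, newline="") as handle:
        content = handle.read()
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(content)


# The source encoding is an assumption you must supply, for example
# "shift_jis" for some Japanese text files or "gbk" for some Chinese ones.
# convert_to_utf8("filelists/list.txt", "shift_jis")
```

If the wrong source encoding is given, the read step either raises `UnicodeDecodeError` or produces garbled text, so keep a copy of the original file.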

Q: RuntimeError: shape '\[1, 1, 264836\]' is invalid for input of size 529672

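The wav files must be **mono**.

Q: CUDA out of memory

~~Good news: CUDA out of memory~~

The batch_size is too large or the audio files are too long. Try reducing batch_size or shortening the audio files.

Q: How large should the training set be, and how long should I train?

Start with at least 200 utterances; 2,000 or more is better. Train to around iteration 200, that is, until "iteration 200" appears in the output.

The quality of the collected audio also has a large effect on the result. Prefer clearly pronounced, emotionally even recordings. Use the same audio file format as for tacotron2, with clips roughly 3\~10 seconds long.

Q: My dataset is really too small. Is there a workaround?

If a character has too few recordings, try mixing in recordings of other characters in the **same language** first. Once the synthesized speech is pronounced correctly, switch to a dataset that contains only the one character and continue training (think of it as "fine-tuning"). **Note that the speakers in the config must not change between the two stages.**

Q: AssertionError: 4D tensors expect 4 values for padding

The error occurs because the audio file is stereo, which adds an extra dimension.

Convert the audio files to 22050 Hz **mono** wav files.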
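
To find the files that still need converting, the channel count and sample rate can be read with Python's standard `wave` module. This sketch only detects problems and does not convert anything; `wav_problems` is a hypothetical helper, and `wave` reads only uncompressed PCM wav files, so other encodings raise `wave.Error`.

```python
import wave


def wav_problems(path, expected_rate=22050):
    """Return a list of format problems found in a PCM wav file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
    problems = []
    if channels != 1:
        problems.append(f"{channels} channels (expected mono)")
    if rate != expected_rate:
        problems.append(f"{rate} Hz (expected {expected_rate} Hz)")
    return problems


# Usage (assumes the dataset was extracted to /content/vits/wavs):
# import glob
# for path in sorted(glob.glob("/content/vits/wavs/*.wav")):
#     issues = wav_problems(path)
#     if issues:
#         print(path, issues)
```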

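Q: How do I resume training from my previous progress?

If the model path on Google Drive has not changed, just run the same notebook again.

Q: I have trained for a long time and the synthesis quality is still poor. What can I do?

Possible causes:
1. The dataset contains many errors; transcripts and audio do not match
2. The wrong cleaner was selected at training or synthesis time; the cleaner and symbols do not match
3. The pronunciation is unclear or the delivery is agitated
4. The clips are too long or too short
5. There are too few samples

Q: Some characters have fewer recordings and sound worse than the others. Is there a good fix?

Try duplicating that character's lines several times in the transcript txt file, for example: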
```
wavs/001.wav|0|バラバラ
wavs/002.wav|0|バラバラ
```
becomes
```
wavs/001.wav|0|バラバラ
wavs/002.wav|0|バラバラ
wavs/001.wav|0|バラバラ
wavs/002.wav|0|バラバラ
wavs/001.wav|0|バラバラ
wavs/002.wav|0|バラバラ
```
Repeating lines this way makes the model see that character's audio more often, which may improve the speech quality.
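
Copying lines by hand gets tedious for larger files, so the duplication can be scripted. The sketch below is illustrative: `oversample_speaker` is a hypothetical helper that assumes the multi-speaker `path|speaker_id|text` format shown above. It groups the copies of each line together rather than repeating the whole block, which differs from the manual example only in line order.

```python
def oversample_speaker(lines, speaker_id, copies):
    """Repeat every line of one speaker so it appears `copies` times in total."""
    result = []
    for line in lines:
        fields = line.rstrip("\n").split("|")
        # Only lines in path|speaker_id|text form for the chosen speaker are repeated.
        is_target = len(fields) == 3 and fields[1] == str(speaker_id)
        result.extend([line] * (copies if is_target else 1))
    return result


# Usage: triple the lines of speaker 0 and keep every other speaker as is.
# with open("filelists/list.txt", encoding="utf-8") as f:
#     new_lines = oversample_speaker(f.readlines(), speaker_id=0, copies=3)
```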

### Other references

[Common errors in Tacotron2, Vits, SoVits, and Diffsvc and how to fix them](https://www.bilibili.com/read/cv20636396)




In [None]:
#@title Setup
#@markdown Define the helper functions `run_command`, `run_command_by_line`, `get_symbols`, `get_tensorboard_showing` and `clean_empty_lines`
# forked from https://www.endpointdev.com/blog/2015/01/getting-realtime-output-using-python/
import os
import re
import subprocess
def run_command(command_args):
    """Run a command, print its stdout and stderr once it finishes, and return the exit code."""
    def print_pipe(raw):
        return print(raw.decode("utf-8", errors="replace"), end='')
    try:
      process = subprocess.Popen(command_args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
      out, err = process.communicate()
    except OSError as e:
      # The command could not be started (for example, the executable was not found).
      print(f'Failed to run {command_args}: {e}')
      return None
    print_pipe(out)
    print_pipe(err)
    return process.returncode

def run_command_by_line(command_args):
    """Run a command and print its output line by line while it is running."""
    def print_pipe(raw):
        return print(raw.decode("utf-8", errors="replace"), end='')
    # stderr is merged into stdout so that a full stderr pipe cannot block the child process.
    with subprocess.Popen(command_args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT) as process:
      # Iterating over the pipe reads until EOF, so no trailing output is lost.
      for raw_line in process.stdout:
        print_pipe(raw_line)
    return

'''
Defines the set of symbols used in text input to the model.
'''

symbols_map = {
    "japanese_cleaners": {
        "_pad": '_',
        "_punctuation": ',.!?-',
        "_letters": 'AEINOQUabdefghijkmnoprstuvwyzʃʧ↓↑ '
    },
    "japanese_cleaners2": {
        "_pad": '_',
        "_punctuation": ',.!?-~…',
        "_letters": 'AEINOQUabdefghijkmnoprstuvwyzʃʧʦ↓↑ ',
    },
    "korean_cleaners": {
        "_pad": '_',
        "_punctuation": ',.!?…~',
        "_letters": 'ㄱㄴㄷㄹㅁㅂㅅㅇㅈㅊㅋㅌㅍㅎㄲㄸㅃㅆㅉㅏㅓㅗㅜㅡㅣㅐㅔ ',
    },
    "cjke_cleaners2": {
        "_pad": '_',
        "_punctuation": ',.!?…~',
        "_letters": 'NQabdefghijklmnopstuvwxyzɑæʃʑçɯɪɔɛɹðəɫɥɸʊɾʒθβŋɦ⁼ʰ`',
    },
}


def get_symbols(specify_cleaners):
    if re.match(r'english_cleaners', specify_cleaners):
        specify_cleaners = "cjke_cleaners2"
    if specify_cleaners not in symbols_map.keys():
        raise ValueError("No symbols are defined for the given cleaner!")
    symbols = symbols_map[specify_cleaners]
    return [symbols["_pad"]] + list(symbols["_punctuation"]) + list(symbols["_letters"])

def get_tensorboard_showing(logdir):
    from multiprocessing import Process
    from tensorboard import notebook
    import tensorflow as tf
    import time

    def run_tb():
        run_command_by_line(["tensorboard","--reload_interval", "30",  "--logdir", logdir, "--bind_all"])

    def monitor_tb():
        while True:
            try:
                notebook.display(height=998)
                break
            except Exception as e:
                print(e)
                time.sleep(3)

    Process(target=run_tb).start()
    Process(target=monitor_tb).start()

def clean_empty_lines(file):
    """Remove empty lines from a text file in place, leaving no trailing newline."""
    with open(file, "r", encoding="utf-8") as file_in:
        lines = [line.strip() for line in file_in]
    # Joining the non-empty lines avoids a trailing newline even when the
    # original file ended with one or more empty lines.
    with open(file, "w", encoding="utf-8") as file_out:
        file_out.write("\n".join(line for line in lines if line))

In [None]:
#@title Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

In [None]:
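#@title Use cached data (optional)
#@markdown If you cached data earlier, enter the path of the cache archive. You can then skip the steps from "Download dependencies" through "Preprocess" (except installing the dependencies) <br />
#@markdown Otherwise skip this cell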
hh_cache_data_use_path = "/content/drive/MyDrive/cache.7z" # @param {type:"string"}
run_command_by_line(["7z", "x", hh_cache_data_use_path, "vits"])
os.chdir('/content/vits')
!pip install -r requirements.txt
!sudo apt-get install espeak -y
!sudo apt-get install p7zip-full p7zip-rar
!pip install demjson
!pip install transformers

In [None]:
#@title Download dependencies
#@markdown Uncheck to turn off space saving (clones the `main` branch instead of `save-space-2`)
colab_save_space = True #@param {type:"boolean"}
os.chdir('/content')
run_command_by_line(["git", "clone", "https://github.com/wind4000/vits.git", "-b", "save-space-2" if colab_save_space else "main"])
os.chdir('/content/vits')


In [None]:
#@markdown Install dependencies
os.chdir('/content/vits')
!rm /usr/local/bin/cmake
!cmake --version
!pip install -r requirements.txt
!sudo apt-get install espeak -y
!sudo apt-get install p7zip-full p7zip-rar
!pip install setuptools==57.5
!pip install demjson
!pip install -U numpy==1.23.0

In [None]:
#@title Extract the dataset
#@markdown Archive path
import subprocess
dataset_path = "/content/drive/MyDrive/dataset/YOURDATASET.rar"  #@param {type:"string"}
os.chdir('/content/vits')
run_command_by_line(["7z", "x", dataset_path])

**Currently supported cleaners (results differ from the tacotron2 version)**

cleaners from https://github.com/CjangCjengh/vits

The English cleaners come from `cjke_cleaners2` in https://github.com/CjangCjengh/vits

|No.|Cleaner name|Language|
|---|---|---|
|1. |japanese_cleaners|Japanese|
|2. |korean_cleaners|Korean|
|3. |english_cleaners_ipa|English|
|4. |english_cleaners_ipa2|English|
|5. |english_cleaners_lazy_ipa|English|

In [None]:
#@title Generate the config file
# forked from https://github.com/CjangCjengh/vits/blob/main/configs/japanese_ss_base2.json
#@markdown Config file name
json_filename = "test.json" #@param {type:"string"}
#@markdown Number of training epochs
hparams_epochs = 2000 #@param {type:"integer"}
#@markdown Save a checkpoint every N steps
hparams_eval_interval = 200 #@param {type:"integer"}
#@markdown Number of files per step, i.e. batch size (16 or fewer recommended)
hparams_batch_size = 12 #@param {type:"integer"}
#@markdown Training filelist
hparams_training_files = "/content/vits/filelists/list.txt" #@param {type:"string"}
#@markdown Validation filelist
hparams_validation_files = "/content/vits/filelists/list.txt" #@param {type:"string"}
#@markdown Cleaner to use
hparams_cleaner =  "japanese_cleaners" #@param {type:"string"}
#@markdown Speaker names; separate multiple speakers with ASCII commas
hparams_speaker = "test" #@param {type:"string"}
#@markdown Model name
hparams_model_name = "test" #@param {type:"string"}

hparams_symbols = get_symbols(hparams_cleaner)
speakers = [speaker.strip() for speaker in hparams_speaker.split(",")]
print("speakers: ")
for i, speaker in enumerate(speakers):
  print("\t{a}: {b}".format(a=i, b=speaker))
training_json = {
  "train": {
    "log_interval": 200,
    "eval_interval": hparams_eval_interval,
    "seed": 1234 ,
    "epochs": hparams_epochs,
    "learning_rate": 2e-4,
    "betas": [0.8, 0.99],
    "eps": 1e-9,
    "batch_size": hparams_batch_size,
    "fp16_run": True,
    "lr_decay": 0.999875,
    "segment_size": 8192,
    "init_lr_ratio": 1,
    "warmup_epochs": 0,
    "c_mel": 45,
    "c_kl": 1.0
  },
  "data": {
    "training_files": hparams_training_files + ".cleaned",
    "validation_files": hparams_validation_files + ".cleaned",
    "text_cleaners":[hparams_cleaner],
    "max_wav_value": 32768.0,
    "sampling_rate": 22050,
    "filter_length": 1024,
    "hop_length": 256,
    "win_length": 1024,
    "n_mel_channels": 80,
    "mel_fmin": 0.0,
    "mel_fmax": None,
    "add_blank": True,
    "n_speakers": len(speakers) if len(speakers) > 1 else 0,
    "cleaned_text": True
  },
  "model": {
    "inter_channels": 192,
    "hidden_channels": 192,
    "filter_channels": 768,
    "n_heads": 2,
    "n_layers": 6,
    "kernel_size": 3,
    "p_dropout": 0.1,
    "resblock": "1",
    "resblock_kernel_sizes": [3,7,11],
    "resblock_dilation_sizes": [[1,3,5], [1,3,5], [1,3,5]],
    "upsample_rates": [8,8,2,2],
    "upsample_initial_channel": 512,
    "upsample_kernel_sizes": [16,16,4,4],
    "n_layers_q": 3,
    "use_spectral_norm": False,
  },
  "speakers": speakers,
  "symbols": hparams_symbols
}

if len(speakers) > 1:
  training_json["model"]["gin_channels"] = 256

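import demjson
# Write the training config as JSON.
os.chdir('/content/vits/configs')
training_json_text = demjson.encode(training_json)
with open(json_filename, "w", encoding="utf-8") as file:
  file.write(training_json_text)

# Overwrite text/symbols.py so the symbol table matches the chosen cleaner.
os.chdir('/content/vits/text')
with open("symbols.py", "w", encoding="utf-8") as file:
  print("symbols = ", hparams_symbols, sep="", file=file)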
os.chdir('/content/vits')



In [None]:
#@title Preprocess
#@markdown Try to remove empty lines automatically
is_auto_clean_empty = False # @param {type:"boolean"}
if is_auto_clean_empty:
  print(f'---0. Clean empty line...---')
  clean_empty_lines(hparams_training_files)
  clean_empty_lines(hparams_validation_files)
print(f'---1. Preprocess...---')
os.chdir('/content/vits/monotonic_align')
!python setup.py build_ext --inplace
os.chdir('/content/vits')
run_command(["python", "preprocess.py", "--text_index", "2" if len(speakers) > 1 else "1", "--text_cleaners", hparams_cleaner, "--filelists", hparams_training_files, hparams_validation_files])

In [None]:
#@title Cache data (optional)
#@markdown **Caching lets you skip some tedious steps later, but it takes a lot of space. You can skip this cell.**
%cd /content
hh_cache_data_path = "/content/drive/MyDrive/cache.7z" # @param {type:"string"}
run_command_by_line(["7z", "a", hh_cache_data_path, "vits"])

In [None]:
#@title Train

#@markdown Model save location <br >
model_path = "/content/drive/MyDrive" # @param {type:"string"}
#@markdown Enable TensorBoard visualization
enable_tb = False  # @param {type:"boolean"}
if enable_tb:
  logdir = os.path.join(model_path, hparams_model_name)
  get_tensorboard_showing(logdir)
os.chdir('/content/vits')
run_command_by_line(["python", "train_ms.py" if len(speakers) > 1 else "train.py", "-c", "configs/{json}".format(json=json_filename), "-m", hparams_model_name, "-o", model_path])

The directory layout is as follows <br >
```
--Model save location (model_path)
---Model name (hparams_model_name)
----logs
----G.pth
----D.pth
```

Example: if the model save location `model_path` is `/content/drive/MyDrive` and the model name `hparams_model_name` is `test`,

put the models at `/content/drive/MyDrive/test/G.pth` and `/content/drive/MyDrive/test/D.pth`.
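
The same path rule can be written out in code, which is handy for checking where a checkpoint should go before copying it. This is a small sketch; `checkpoint_paths` is a hypothetical helper, and the file names `G.pth` and `D.pth` are taken from the layout above.

```python
import os


def checkpoint_paths(model_path, model_name):
    """Build the expected G.pth and D.pth paths for a model."""
    model_dir = os.path.join(model_path, model_name)
    return os.path.join(model_dir, "G.pth"), os.path.join(model_dir, "D.pth")


g_path, d_path = checkpoint_paths("/content/drive/MyDrive", "test")
print(g_path)
print(d_path)
```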

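## Tools

This section helps software such as [MoeTTS](https://github.com/luoyily/MoeTTS) synthesize speech with VITS.

Before running this section you must run these steps: "Setup", "Download dependencies", "Mount Google Drive", and "Generate the config file".

This code does not need a GPU, so it can run on a non-GPU runtime and still works after you reach the GPU quota.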

In [None]:
#@title Generate the config files for MoeTTS
#@markdown Save path
moetts_savepath = "/content/drive/MyDrive/" #@param {type:"string"}
moetts_filepath = moetts_savepath + "config.json"
moetts_filepath_symbol = moetts_savepath + "moetts.json"
training_json["data"]["text_cleaners"] = ["custom_cleaners"]
training_json_text = demjson.encode(training_json)
moetts_symbols = {"symbols": hparams_symbols}
moetts_symbols_text = demjson.encode(moetts_symbols)
with open(moetts_filepath, "w") as file:
  file.write(training_json_text)
with open(moetts_filepath_symbol, "w") as file:
  file.write(moetts_symbols_text)
print("Saved to", moetts_filepath)
print("Saved to", moetts_filepath_symbol)

In [None]:
#@title Convert text before synthesis
os.chdir('/content/vits')
import text
input_text = "\u3053\u308C\u304B\u3089\u3082\u3001\u304A\u308C\u305F\u3061\u304C\u305F\u3061\u3068\u307E\u3089\u306A\u3044\u304B\u304E\u308A\u3001\u9053\u306F\u7D9A\u304F\u3002" #@param {type:"string"}
input_cleaners = "japanese_cleaners" #@param {type:"string"}
try:
  output_text = text._clean_text(input_text, [input_cleaners])
  print("Converted text:", output_text)
except Exception as e:
  print("Is the text invalid?", e)