## Multi-Accent and Multi-Lingual Voice Clone Demo with MeloTTS

In [22]:
import os
import torch
from openvoice import se_extractor
from openvoice.api import ToneColorConverter

### Initialization

In this example, we use the checkpoints from OpenVoiceV2. OpenVoiceV2 is trained with more aggressive augmentations and thus demonstrates better robustness in some cases.

In [17]:
ckpt_converter = 'checkpoints_v2/converter'
device = "cuda:0" if torch.cuda.is_available() else "cpu"
output_dir = 'outputs_v2'

tone_color_converter = ToneColorConverter(f'{ckpt_converter}/config.json', device=device)
tone_color_converter.load_ckpt(f'{ckpt_converter}/checkpoint.pth')

os.makedirs(output_dir, exist_ok=True)

Loaded checkpoint 'checkpoints_v2/converter/checkpoint.pth'
missing/unexpected keys: [] []
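The initialization cell reads two files from the converter directory, `config.json` and `checkpoint.pth`. As a minimal sketch, the check below reports which of them are absent before the converter is constructed. The helper name `missing_converter_files` is hypothetical and is not part of OpenVoice.

```python
from pathlib import Path

# Files that the initialization cell reads from the converter directory.
REQUIRED_CONVERTER_FILES = ("config.json", "checkpoint.pth")


def missing_converter_files(ckpt_dir):
    """Return the required file names that are absent from ckpt_dir.

    Hypothetical helper for illustration; not part of the OpenVoice API.
    """
    root = Path(ckpt_dir)
    return [name for name in REQUIRED_CONVERTER_FILES if not (root / name).is_file()]


missing = missing_converter_files("checkpoints_v2/converter")
if missing:
    print("Missing converter files:", ", ".join(missing))
else:
    print("Converter checkpoint looks complete.")
```

Running this first can turn a file-not-found error during loading into a clearer message about which download is incomplete.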


### Obtain Tone Color Embedding
We extract a tone color embedding only for the target speaker. The source tone color embeddings can be loaded directly from the `checkpoints_v2/base_speakers/ses` folder.

In [19]:

reference_speaker = 'resources/example_reference.mp3' # This is the voice you want to clone
target_se, audio_name = se_extractor.get_se(reference_speaker, tone_color_converter, vad=True)

OpenVoice version: v2




Downloading: "https://github.com/snakers4/silero-vad/zipball/master" to /home/piai/.cache/torch/hub/master.zip
[(0.0, 58.8188125)]
after vad: dur = 58.81798185941043
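The source embeddings are stored as `.pth` files, and this notebook later loads them from `checkpoints_v2/base_speakers/ses`. The sketch below lists the embeddings present in that folder, which may help confirm that a given base speaker is covered. The helper name `list_source_embeddings` is an assumption made for illustration. It only inspects file names and does not load any tensors.

```python
from pathlib import Path


def list_source_embeddings(ses_dir="checkpoints_v2/base_speakers/ses"):
    """Map each speaker key (the file stem) to the path of its .pth file.

    Hypothetical helper; returns an empty dict if the folder does not exist.
    """
    root = Path(ses_dir)
    if not root.is_dir():
        return {}
    return {path.stem: path for path in sorted(root.glob("*.pth"))}


for key, path in list_source_embeddings().items():
    print(f"{key}: {path}")
```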




### Use MeloTTS as Base Speakers

MeloTTS is a high-quality multi-lingual text-to-speech library by @MyShell.ai. It supports English (American, British, Indian, Australian, Default), Spanish, French, Chinese, Japanese, and Korean. In the following example, we use the MeloTTS models as the base speakers.
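Each MeloTTS speaker needs a matching source embedding. The synthesis loop in this notebook finds it by lowercasing the speaker key and replacing underscores with hyphens. The sketch below isolates that mapping. The function name `source_embedding_path` is hypothetical, and the key used in the example is illustrative only.

```python
def source_embedding_path(speaker_key, ses_dir="checkpoints_v2/base_speakers/ses"):
    """Build the embedding path for a MeloTTS speaker key.

    Mirrors the normalization used in the synthesis loop:
    lowercase the key, then replace underscores with hyphens.
    Hypothetical helper for illustration only.
    """
    normalized = speaker_key.lower().replace("_", "-")
    return f"{ses_dir}/{normalized}.pth"


# Illustrative key; the actual keys come from model.hps.data.spk2id.
print(source_embedding_path("EN_NEWEST"))
# -> checkpoints_v2/base_speakers/ses/en-newest.pth
```

Keeping the mapping in one place makes it easier to spot a speaker key whose embedding file is missing before synthesis starts.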

In [25]:
from melo.api import TTS

texts = {
    # 'EN_NEWEST': "Did you ever hear a folk tale about a giant turtle?",  # The newest English base speaker model
    # 'EN': "Did you ever hear a folk tale about a giant turtle?",
    'ES': "El resplandor del sol acaricia las olas, pintando el cielo con una paleta deslumbrante.",
    'FR': "La lueur dorée du soleil caresse les vagues, peignant le ciel d'une palette éblouissante.",
    'ZH': "在这次vacation中，我们计划去Paris欣赏埃菲尔铁塔和卢浮宫的美景。",
    'JP': "彼は毎朝ジョギングをして体を健康に保っています。",
    'KR': "안녕하세요! 오늘은 날씨가 정말 좋네요.",
}


src_path = f'{output_dir}/tmp.wav'

# Speed is adjustable
speed = 1.0

for language, text in texts.items():
    # Load the MeloTTS base model for this language
    model = TTS(language=language, device=device)
    speaker_ids = model.hps.data.spk2id

    for speaker_key in speaker_ids.keys():
        speaker_id = speaker_ids[speaker_key]
        # Embedding files use lowercase, hyphenated names
        speaker_key = speaker_key.lower().replace('_', '-')

        source_se = torch.load(f'checkpoints_v2/base_speakers/ses/{speaker_key}.pth', map_location=device)
        # Workaround: when running on CPU, report MPS as unavailable
        if torch.backends.mps.is_available() and device == 'cpu':
            torch.backends.mps.is_available = lambda: False
        # Synthesize the base-speaker audio (src_path is overwritten on every iteration)
        model.tts_to_file(text, speaker_id, src_path, speed=speed)
        save_path = f'{output_dir}/output_v2_{speaker_key}.wav'

        # Run the tone color converter
        encode_message = "@MyShell"
        tone_color_converter.convert(
            audio_src_path=src_path,
            src_se=source_se,
            tgt_se=target_se,
            output_path=save_path,
            message=encode_message)



 > Text split to sentences.
El resplandor del sol acaricia las olas, pintando el cielo con una paleta deslumbrante.




 > Text split to sentences.
La lueur dorée du soleil caresse les vagues, peignant le ciel d'une palette éblouissante.



 > Text split to sentences.
在这次vacation中,
我们计划去Paris欣赏埃菲尔铁塔和卢浮宫的美景.


Building prefix dict from the default dictionary ...
Dumping model to file cache /tmp/jieba.cache
Loading model cost 0.689 seconds.
Prefix dict has been built successfully.

 > Text split to sentences.
彼は毎朝ジョギングをして体を健康に保っています.




 > Text split to sentences.
안녕하세요! 오늘은 날씨가 정말 좋네요.



you have to install python-mecab-ko. install it...
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)