## Multi-Accent and Multi-Lingual Voice Clone Demo with MeloTTS

In [1]:
import os
import torch
from openvoice import se_extractor
from openvoice.api import ToneColorConverter



Importing the dtw module. When using in academic works please cite:
  T. Giorgino. Computing and Visualizing Dynamic Time Warping Alignments in R: The dtw Package.
  J. Stat. Soft., doi:10.18637/jss.v031.i07.



### Initialization

In this example, we use the checkpoints from OpenVoiceV2. OpenVoiceV2 is trained with more aggressive augmentations and thus demonstrates better robustness in some cases.

In [2]:
ckpt_converter = 'checkpoints_v2/converter'
device = "cuda:0" if torch.cuda.is_available() else "cpu"
output_dir = 'outputs_v2'

tone_color_converter = ToneColorConverter(f'{ckpt_converter}/config.json', device=device)
tone_color_converter.load_ckpt(f'{ckpt_converter}/checkpoint.pth')

os.makedirs(output_dir, exist_ok=True)

Loaded checkpoint 'checkpoints_v2/converter/checkpoint.pth'
missing/unexpected keys: [] []
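
If the converter fails to load, a common cause is that the checkpoint files were not downloaded to the expected folder. The following is a minimal sketch of a pre-flight check, assuming the converter folder holds exactly the two files referenced above (`config.json` and `checkpoint.pth`). The helper `missing_checkpoint_files` is hypothetical and not part of OpenVoice.

```python
import os

# Hypothetical helper (not part of OpenVoice): list expected files that are absent.
def missing_checkpoint_files(ckpt_dir, names=("config.json", "checkpoint.pth")):
    return [
        os.path.join(ckpt_dir, name)
        for name in names
        if not os.path.isfile(os.path.join(ckpt_dir, name))
    ]

# Same folder as `ckpt_converter` in the cell above.
missing = missing_checkpoint_files("checkpoints_v2/converter")
if missing:
    print("Missing checkpoint files:", missing)
```

Running this before constructing `ToneColorConverter` gives a clearer message than a file-not-found traceback from inside the loader.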


In [3]:
tone_color_converter.device

'cpu'

### Obtain Tone Color Embedding
We only extract the tone color embedding for the target speaker. The source tone color embeddings can be loaded directly from the `checkpoints_v2/base_speakers/ses` folder.

In [3]:

#reference_speaker = 'resources/example_reference.mp3' # This is the voice you want to clone
reference_speaker = 'resources/training_enrique.mp3' # This is the voice you want to clone
target_se, audio_name = se_extractor.get_se(reference_speaker, tone_color_converter, vad=False)

OpenVoice version: v2


Estimating duration from bitrate, this may be inaccurate
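
To see which source tone color embeddings are available, the embedding folder can simply be listed. This is a small sketch that assumes the embeddings are stored as `.pth` files under `checkpoints_v2/base_speakers/ses`, the path used by the conversion loop in this notebook. The function name `list_source_embeddings` is hypothetical.

```python
from pathlib import Path

# Hypothetical helper: return the base name of every .pth file in the folder.
def list_source_embeddings(ses_dir="checkpoints_v2/base_speakers/ses"):
    return sorted(p.stem for p in Path(ses_dir).glob("*.pth"))

# Prints an empty list if the folder does not exist or holds no embeddings.
print(list_source_embeddings())
```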


#### Use MeloTTS as Base Speakers

MeloTTS is a high-quality multi-lingual text-to-speech library by @MyShell.ai, supporting languages including English (American, British, Indian, Australian, Default), Spanish, French, Chinese, Japanese, and Korean. In the following example, we use the MeloTTS models as the base speakers.
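
Each MeloTTS speaker is paired with a precomputed source embedding by file name: the cell in this section lowercases the speaker key and replaces underscores with hyphens before building the path. The sketch below isolates that mapping. The helper `embedding_path_for` is hypothetical, and the speaker keys in the loop are illustrative assumptions rather than a confirmed list of MeloTTS speakers.

```python
SES_DIR = "checkpoints_v2/base_speakers/ses"

# Hypothetical helper mirroring the notebook's key normalization.
def embedding_path_for(speaker_key, ses_dir=SES_DIR):
    normalized = speaker_key.lower().replace("_", "-")
    return f"{ses_dir}/{normalized}.pth"

# Illustrative keys only (assumed, not read from a MeloTTS model).
for key in ("EN-US", "EN_INDIA", "ES"):
    print(key, "->", embedding_path_for(key))
```

If a key has no matching file, `torch.load` raises an error, so printing the resolved path first can make a mismatch easier to spot.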

In [4]:
from melo.api import TTS

texts = {
    'EN_NEWEST': "Did you ever hear a folk tale about a giant turtle?",  # The newest English base speaker model
    'EN': "Did you ever hear a folk tale about a giant turtle?",
    'ES': "El resplandor del sol acaricia las olas, pintando el cielo con una paleta deslumbrante.",
    'FR': "La lueur dorée du soleil caresse les vagues, peignant le ciel d'une palette éblouissante.",
    'ZH': "在这次vacation中，我们计划去Paris欣赏埃菲尔铁塔和卢浮宫的美景。",
    'JP': "彼は毎朝ジョギングをして体を健康に保っています。",
    'KR': "안녕하세요! 오늘은 날씨가 정말 좋네요.",
}


src_path = f'{output_dir}/tmp.wav'

# Speed is adjustable
speed = 1.0

for language, text in texts.items():
    # Load the MeloTTS base model for this language
    model = TTS(language=language, device=device)
    speaker_ids = model.hps.data.spk2id

    for speaker_key in speaker_ids.keys():
        speaker_id = speaker_ids[speaker_key]
        # Embedding files use lowercase names with hyphens instead of underscores
        speaker_key = speaker_key.lower().replace('_', '-')

        # Load the precomputed tone color embedding of the base speaker
        source_se = torch.load(f'checkpoints_v2/base_speakers/ses/{speaker_key}.pth', map_location=device)
        # Synthesize the text with the base speaker into a temporary file
        model.tts_to_file(text, speaker_id, src_path, speed=speed)
        save_path = f'{output_dir}/output_v2_{speaker_key}.wav'

        # Run the tone color converter
        encode_message = "@MyShell"
        tone_color_converter.convert(
            audio_src_path=src_path,
            src_se=source_se,
            tgt_se=target_se,
            output_path=save_path,
            message=encode_message)

Downloading tokenizer_config.json: 100%|██████████| 251/251 [00:00<00:00, 252kB/s]
Downloading vocab.txt: 100%|██████████| 231k/231k [00:00<00:00, 2.43MB/s]
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/enrique/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
[nltk_data] Downloading package cmudict to /home/enrique/nltk_data...
[nltk_data]   Unzipping corpora/cmudict.zip.
Downloading tokenizer_config.json: 100%|██████████| 48.0/48.0 [00:00<00:00, 30.4kB/s]
Downloading config.json: 100%|██████████| 570/570 [00:00<00:00, 515kB/s]
Downloading vocab.txt: 100%|██████████| 232k/232k [00:00<00:00, 9.79MB/s]
Downloading tokenizer.json: 100%|██████████| 466k/466k [00:00<00:00, 13.3MB/s]
Downloading tokenizer_config.json: 100%|██████████| 48.0/48.0 [00:00<00:00, 37.5kB/s]
Downloading config.json: 100%|██████████| 625/625 [00:00<00:00, 667kB/s]
Downloading vocab.txt: 100%|██████████| 872k/872k [00:00<00:00, 23.3MB/s]
Downloading to

 > Text split to sentences.
Did you ever hear a folk tale about a giant turtle?


Downloading pytorch_model.bin: 100%|██████████| 440M/440M [00:12<00:00, 34.9MB/s]
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
100%|██████████| 1/1 [00:16<00:00, 16.07s/it]
Downloading config.json: 100%|██████████| 3.49k/3.49k [00:00<00:00, 2.65MB/s]
Downloading checkpoint.pth: 100%|██████████| 208M/208M [00:05<00:00, 35.2MB/s] 


 > Text split to sentences.
Did you ever hear a folk tale about a giant turtle?


100%|██████████| 1/1 [00:01<00:00,  1.18s/it]


 > Text split to sentences.
Did you ever hear a folk tale about a giant turtle?


100%|██████████| 1/1 [00:01<00:00,  1.18s/it]


 > Text split to sentences.
Did you ever hear a folk tale about a giant turtle?


100%|██████████| 1/1 [00:01<00:00,  1.64s/it]


 > Text split to sentences.
Did you ever hear a folk tale about a giant turtle?


100%|██████████| 1/1 [00:01<00:00,  1.50s/it]


 > Text split to sentences.
Did you ever hear a folk tale about a giant turtle?


100%|██████████| 1/1 [00:01<00:00,  1.23s/it]
Downloading config.json: 100%|██████████| 3.43k/3.43k [00:00<00:00, 3.37MB/s]
Downloading checkpoint.pth: 100%|██████████| 208M/208M [00:05<00:00, 36.3MB/s] 


 > Text split to sentences.
El resplandor del sol acaricia las olas, pintando el cielo con una paleta deslumbrante.


100%|██████████| 1/1 [00:03<00:00,  3.80s/it]
Downloading config.json: 100%|██████████| 3.40k/3.40k [00:00<00:00, 3.54MB/s]
Downloading checkpoint.pth: 100%|██████████| 208M/208M [00:06<00:00, 34.3MB/s] 


 > Text split to sentences.
La lueur dorée du soleil caresse les vagues, peignant le ciel d'une palette éblouissante.


Downloading pytorch_model.bin: 100%|██████████| 445M/445M [00:12<00:00, 35.3MB/s]
Some weights of the model checkpoint at dbmdz/bert-base-french-europeana-cased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
100%|██████████| 1/1 [00:16<00:00, 16.42s/it]
Downloading config.json: 100%|██████████| 2.30k/2.30k [00:00<00:00, 1.71MB/s]
Downloading checkpoint.pth: 100%|██████████| 208M/208M [00:06<00:00, 33.9MB/s] 


 > Text split to sentences.
在这次vacation中,
我们计划去Paris欣赏埃菲尔铁塔和卢浮宫的美景.


  0%|          | 0/2 [00:00<?, ?it/s]Building prefix dict from the default dictionary ...
Dumping model to file cache /tmp/jieba.cache
Loading model cost 0.407 seconds.
Prefix dict has been built successfully.
Downloading pytorch_model.bin: 100%|██████████| 672M/672M [00:19<00:00, 34.6MB/s]
Some weights of the model checkpoint at bert-base-multilingual-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
100%|██████████| 2/2 [00:25<00:00, 12.85s/it]
Downloading conf

 > Text split to sentences.
彼は毎朝ジョギングをして体を健康に保っています.


100%|██████████| 1/1 [00:02<00:00,  2.58s/it]
Downloading config.json: 100%|██████████| 3.40k/3.40k [00:00<00:00, 2.72MB/s]
Downloading checkpoint.pth: 100%|██████████| 208M/208M [00:05<00:00, 35.1MB/s] 


 > Text split to sentences.
안녕하세요! 오늘은 날씨가 정말 좋네요.


  0%|          | 0/1 [00:00<?, ?it/s]

you have to install python-mecab-ko. install it...
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Collecting python-mecab-ko
  Downloading python_mecab_ko-1.3.5-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.4 kB)
Collecting python-mecab-ko-dic (from python-mecab-ko)
  Downloading python_mecab_ko_dic-2.1.1.post2-py3-none-any.whl.metadata (1.4 kB)
Downloading python_mecab_ko-1.3.5-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (578 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 578.4/578.4 kB 4.6 MB/s eta 0:00:00
Downloading python_mecab_ko_dic-2.1.1.post2-py3-none-any.whl (34.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 34.5/34.5 MB

Downloading pytorch_model.bin: 100%|██████████| 476M/476M [00:13<00:00, 34.7MB/s]
Some weights of the model checkpoint at kykim/bert-kor-base were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
100%|██████████| 1/1 [00:23<00:00, 23.89s/it]
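
The `huggingface/tokenizers` message in the log above suggests setting the `TOKENIZERS_PARALLELISM` environment variable explicitly. A minimal sketch of doing so from Python follows. It assumes the line runs in an early cell, before any tokenizer is used in the process, and whether it fully silences the message in a given environment is worth verifying.

```python
import os

# Set before the tokenizers library is first used in this process,
# so that the fork triggered by the package install does not emit the warning.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```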
