1. Tested on the following libraries:
    -   NVidia GeForce RTX 2060 6GB
    -   Windows 11 + WSL2
    -   Ubuntu 22.04
    -   Python 3.10
    -   CUDA Toolkit 11.8
    -   openai 1.6.1
    -   TTS 0.22.0
    -   deepspeed 0.12.6

2. Create and run the following script `xtts_download.py` to download the model:

```python
# xtts_download.py
import os
os.environ["COQUI_TOS_AGREED"]="1"

from TTS.utils.manage import ModelManager
print("Downloading...")
mm =  ModelManager(output_prefix="~/gai/models/tts")
model_name="tts_models/multilingual/multi-dataset/xtts_v2"
mm.download_model(model_name)
print("Downloaded")
```

Take note that loading the model for the first time will take a while for deepspeed to compile the model.

In [1]:
## XTTS_TTS
from gai.gen.tts.XTTS_TTS import XTTS_TTS
config={
            "model_name": "tts-coqui-xtts",
            "type": "tts",
            "engine": "XTTS_TTS",
            "model_path": "models/tts/tts_models--multilingual--multi-dataset--xtts_v2",
            "max_seq_len": 128,
        }
tts = XTTS_TTS(config)
response = tts.create(
  voice="Vjollca Johnnie",
  input="The definition of insanity is doing the same thing over and over and expecting different results.",
  language="en",
  stream=True
)
from IPython.display import Audio
audio_data = b''.join(chunk for chunk in response)
Audio(audio_data, rate=24000)

2024-06-22 06:08:10 INFO gai.gen.tts.XTTS_TTS:[32mXTTS generating...[0m
2024-06-22 06:08:10 INFO gai.gen.tts.XTTS_TTS:[32mLoading XTTS...[0m


Loading XTTS...


  from .autonotebook import tqdm as notebook_tqdm


[2024-06-22 06:08:34,822] [INFO] [real_accelerator.py:161:get_accelerator] [32mSetting ds_accelerator to cuda (auto detect)[0m
[2024-06-22 06:08:36,019] [INFO] [logging.py:96:log_dist] [32m[Rank -1] DeepSpeed info: version=0.12.6, git-hash=unknown, git-branch=unknown[0m
[2024-06-22 06:08:36,025] [INFO] [logging.py:96:log_dist] [32m[Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1[0m


Using /home/roylai/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/roylai/.cache/torch_extensions/py310_cu121/transformer_inference/build.ninja...
Building extension module transformer_inference...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)


ninja: no work to do.
Time to load transformer_inference op: 0.21056342124938965 seconds
[2024-06-22 06:08:37,265] [INFO] [logging.py:96:log_dist] [32m[Rank -1] DeepSpeed-Inference config: {'layer_id': 0, 'hidden_size': 1024, 'intermediate_size': 4096, 'heads': 16, 'num_hidden_layers': -1, 'dtype': torch.float32, 'pre_layer_norm': True, 'norm_type': <NormType.LayerNorm: 1>, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 1, 'scale_attention': True, 'triangular_masking': True, 'local_attention': False, 'window_size': 1, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'mlp_act_func_type': <ActivationFuncType.GELU: 1>, 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': False, 'max_out_tokens': 1024, 'min_out_tokens': 1, 'scale_attn_by_inverse_layer_idx': False, 'enable_qkv_quantization': False, 'use_mup': False, 'return_single_tuple': False, 'set_empty_params': False, 'transposed_m

Loading extension module transformer_inference...
2024-06-22 06:08:39 INFO gai.gen.tts.XTTS_TTS:[32mXTTS Loaded.[0m
2024-06-22 06:08:39 INFO gai.gen.tts.XTTS_TTS:[32mXTTS completed.[0m


------------------------------------------------------
Free memory : 4.624023 (GigaBytes)  
Total memory: 7.999573 (GigaBytes)  
Requested memory: 0.335938 (GigaBytes) 
Setting maximum total tokens (input + output) to 1024 
WorkSpace: 0x79d000000 
------------------------------------------------------


In [None]:
## Gaigen
from gai.gen import Gaigen
gen = Gaigen.GetInstance("../../../gai-gen/gai.json").load('xtts-2')
response = gen.create(
  voice="Vjollca Johnnie",
  input="The definition of insanity is doing the same thing over and over and expecting different results.",
  language="en",
  stream=True
)
from IPython.display import Audio
Audio(response, rate=24000)

GENERATING:


2024-06-12 14:11:44 INFO gai.gen.Gaigen:[32mGaigen: Loading generator xtts-2...[0m
2024-06-12 14:11:44 INFO gai.gen.tts.TTS:[32mLoading TTS...[0m
2024-06-12 14:11:44 INFO gai.gen.tts.TTS:[32mUsing tts model XTTS_TTS...[0m
2024-06-12 14:11:44 INFO gai.gen.tts.XTTS_TTS:[32mLoading XTTS...[0m


Loading TTS...
Loading XTTS...


  from .autonotebook import tqdm as notebook_tqdm


[2024-06-12 14:12:06,345] [INFO] [real_accelerator.py:161:get_accelerator] [32mSetting ds_accelerator to cuda (auto detect)[0m
[2024-06-12 14:12:07,177] [INFO] [logging.py:96:log_dist] [32m[Rank -1] DeepSpeed info: version=0.12.6, git-hash=unknown, git-branch=unknown[0m
[2024-06-12 14:12:07,181] [INFO] [logging.py:96:log_dist] [32m[Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1[0m


Using /home/roylai/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/roylai/.cache/torch_extensions/py310_cu121/transformer_inference/build.ninja...
Building extension module transformer_inference...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)


ninja: no work to do.
Time to load transformer_inference op: 0.08993363380432129 seconds
[2024-06-12 14:12:08,102] [INFO] [logging.py:96:log_dist] [32m[Rank -1] DeepSpeed-Inference config: {'layer_id': 0, 'hidden_size': 1024, 'intermediate_size': 4096, 'heads': 16, 'num_hidden_layers': -1, 'dtype': torch.float32, 'pre_layer_norm': True, 'norm_type': <NormType.LayerNorm: 1>, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 1, 'scale_attention': True, 'triangular_masking': True, 'local_attention': False, 'window_size': 1, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'mlp_act_func_type': <ActivationFuncType.GELU: 1>, 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': False, 'max_out_tokens': 1024, 'min_out_tokens': 1, 'scale_attn_by_inverse_layer_idx': False, 'enable_qkv_quantization': False, 'use_mup': False, 'return_single_tuple': False, 'set_empty_params': False, 'transposed_m

Loading extension module transformer_inference...
2024-06-12 14:12:10 INFO gai.gen.tts.XTTS_TTS:[32mXTTS Loaded.[0m
2024-06-12 14:12:10 INFO gai.gen.tts.XTTS_TTS:[32mXTTS generating...[0m
2024-06-12 14:12:10 INFO gai.gen.tts.XTTS_TTS:[32mXTTS completed.[0m


TypeError: float() argument must be a string or a real number, not 'generator'