## Fine-Tuning a Text-to-Speech Model

This process follows the official [documentation](https://tts.readthedocs.io/en/latest/what_makes_a_good_dataset.html#what-makes-a-good-dataset) of the TTS library.

In [1]:
!pip install git+https://github.com/statscol/TTS@dev
!pip install gruut-lang-es  # Spanish phoneme support for gruut

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting git+https://github.com/statscol/TTS@dev
  Cloning https://github.com/statscol/TTS (to revision dev) to /tmp/pip-req-build-yv37ozqq
  Running command git clone --filter=blob:none --quiet https://github.com/statscol/TTS /tmp/pip-req-build-yv37ozqq
  Resolved https://github.com/statscol/TTS to commit c5a28dbea5828b089835413287d19d892d600fde
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting install
  Downloading install-1.3.5-py3-none-any.whl (3.2 kB)
Collecting trainer==0.0.20
  Downloading trainer-0.0.20-py3-none-any.whl (45 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 45.2/45.2 kB 1.7 MB/s eta 0:00:00
Collecting jamo
  Downloading jamo-0.4.1-py3-none-any.whl (9.5 kB)
Collecting g2pkk>=0.1.1
  Downlo

In [2]:
import numpy as np 
import pandas as pd 
import os
import logging


In [3]:
## Mount Google Drive to save files

from google.colab import drive

drive.mount('/content/drive2/')

Mounted at /content/drive2/


In [4]:
## Let's also create a folder to save files

import os

DEFAULT_DRIVE_FOLDER="/content/drive2/MyDrive/tts-ai/"
os.makedirs(DEFAULT_DRIVE_FOLDER, exist_ok=True)


## Clone repo with data

In [3]:
!git clone https://github.com/statscol/tts-ai-public-figure.git
%cd tts-ai-public-figure

/content/tts-ai-public-figure


In [6]:
!unzip -q audios_labeled.zip

In [7]:
import re

text="el co2 que emitimos en los 7 continentes"

def num2letters(text):
    """Spell out each digit in Spanish, one digit at a time (so "12" becomes "uno dos")."""
    n2l={'0':' cero','1':' uno','2':' dos','3':' tres','4':' cuatro','5':' cinco','6':' seis','7':' siete','8':' ocho','9':' nueve'}
    text=re.sub(r"(\d)", lambda x: n2l[x.group(0)], text)
    text=re.sub(r"\s\s+", " ", text)  # collapse the extra spaces introduced above
    return text

num2letters(text)

'el co dos que emitimos en los siete continentes'

In [8]:
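import pandas as pd
import json

# manifest.json holds one JSON object per line
with open('manifest.json', 'r', encoding='utf-8') as f:
    data=[json.loads(line) for line in f]

# Build "<clip id>|<transcript>." rows; chr(92) is a backslash, used to take the file name from the manifest path
data=[f"{i['audio_filepath'].split(chr(92))[-1].replace('.wav','')}|{num2letters(i['text'])}." for i in data]
data[:5]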

['ea50ff45-0vid1|vengo de uno de los tres países mas bellos de la tierra.',
 '641cca69-0vid2|porque muchas veces el crimen y la violencia disminuyen si.',
 '89444fc2-1vid1|allí hay una explosión de vida miles de.',
 '4c5d5277-1vid2|deja de haber hambre, si hay agua potable, si los jóvenes tienen posibilidades de sexo acc.',
 'ef41bbcf-2vid1|peces multicolores en los mares, en los cielos, en las tierra.']

In [9]:
%mkdir tts-dataset
%mv audio tts-dataset/wavs 

In [10]:
# Write one "<clip id>|<transcript>" row per line
with open('tts-dataset/metadata.txt','w', encoding='utf-8') as file:
    for audio in data:
        file.write(audio+"\n")


## Let's check the output file

In [4]:
%%bash
tail -10 tts-dataset/metadata.txt

32c1b3ff-161vid1|ado, cuan dependiente son de lo que acabará, co.
77006a13-162vid1|con la especie humana, si observan que los pueblos se llenan de a.
b1d7a02e-163vid1|hambre y de sed, y emigran por millones hacia el norte, has.
e172e4ac-164vid1|y a donde está el agua, entonces ustedes los encierran, constru.
6da9017d-165vid1|truyen muros, despliegan ametralladores, les disparan.
27a3a43b-167vid1|duplican la mentalidad, de quien creó políticamente.
0947d2f6-175vid4|en su sociedad no lo pueden, digamos se tienen que mover muy prudente.
cc028208-177vid4|eh hay, digamos las cifras son contundentes, es que hay un millón.
ca23c690-178vid4|de muertos latinoamericanos, es que millones de personas nam.
ab9ce438-179vid4|mayoría negras, han pasado por consumir o por portar una pequeña cantida.
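
Beyond inspecting the tail of the file, it can help to confirm that every row has the `id|text` shape and a matching clip under `wavs/`. The following is a minimal sketch of such a check. The helper name `check_metadata` is hypothetical and not part of the TTS library; it only assumes the two-column layout and folder names used in this notebook.

```python
import os


def check_metadata(metadata_path, wav_dir):
    """Return (bad_lines, missing_wavs) for an 'id|text' metadata file."""
    bad_lines, missing_wavs = [], []
    with open(metadata_path, "r", encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            cols = line.rstrip("\n").split("|")
            # Each row must have exactly two non-empty columns: clip id and transcript.
            if len(cols) != 2 or not cols[0] or not cols[1].strip():
                bad_lines.append(line_no)
                continue
            # The formatter later looks for <wav_dir>/<clip id>.wav
            wav_path = os.path.join(wav_dir, cols[0] + ".wav")
            if not os.path.isfile(wav_path):
                missing_wavs.append(cols[0])
    return bad_lines, missing_wavs


# Only runs when the dataset folder from the cells above exists.
if os.path.exists("tts-dataset/metadata.txt"):
    bad, missing = check_metadata("tts-dataset/metadata.txt", "tts-dataset/wavs")
    print(f"{len(bad)} malformed lines, {len(missing)} clips without audio")
```

Rows whose transcript itself contains a `|` are reported as malformed, which is intentional here because the formatter splits on that character.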


## Prepare Trainer

The code is taken from the TTS [documentation](https://tts.readthedocs.io/en/latest/finetuning.html) on fine-tuning.

In [10]:
#!tts --list_models  # use this to list the available models
# in the meantime, let's use a pre-trained model for Spanish

!tts --model_name tts_models/es/mai/tacotron2-DDC --text "Hola"

 > Downloading model to /root/.local/share/tts/tts_models--es--mai--tacotron2-DDC
100% 575M/575M [00:24<00:00, 23.8MiB/s]
 > Model's license - MPL
 > Check https://www.mozilla.org/en-US/MPL/2.0/ for more info.
 > Downloading model to /root/.local/share/tts/vocoder_models--universal--libri-tts--fullband-melgan
100% 109M/109M [00:04<00:00, 22.1MiB/s]
 > Model's license - MPL
 > Check https://www.mozilla.org/en-US/MPL/2.0/ for more info.
 > Using model: Tacotron2
 > Setting up Audio Processor...
 | > sample_rate:16000
 | > resample:False
 | > num_mels:80
 | > log_func:np.log10
 | > min_level_db:-100
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:20
 | > fft_size:1024
 | > power:1.5
 | > preemphasis:0.0
 | > griffin_lim_iters:60
 | > signal_norm:True
 | > symmetric_norm:True
 | > mel_fmin:50.0
 | > mel_fmax:7600.0
 | > pitch_fmin:0.0
 | > pitch_fmax:640.0
 | > spec_gain:1.0
 | > stft_pad_mode:reflect
 | > max_norm:4.0
 | > clip_norm:True
 | > do_trim_silence:True
 | >

In [11]:
from IPython.display import Audio
wn = Audio('tts_output.wav')
display(wn)

### Install loggers and log in

In [5]:
##install logger and instantiate it

#!pip install wandb -q

project = "tts-petro-ai"
display_name = "VITS-es-2"

In [13]:
import wandb 
wandb.login() ##use open session to log in

wandb.init(project=project, name=display_name)

wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: Currently logged in as: jfparra. Use `wandb login --relogin` to force relogin


In [14]:
## let's see where the model was downloaded
os.listdir("/root/.local/share/tts/tts_models--es--mai--tacotron2-DDC/")

['config.json', 'model_file.pth', 'scale_stats.npy']

### For custom training, we might need to define our own text_cleaner and formatter


In [None]:
# ## Custom cleaner and formatter for Spanish. I had to fork the repo and update its formatters and cleaners files, which now include the following functions

# import re

# _whitespace_re = re.compile(r"\s+")

# def replace_symbols(text, lang="en"):
#     """Replace symbols based on the language tag.
#     Args:
#       text:
#        Input text.
#       lang:
#         Language identifier. ex: "en", "fr", "pt", "ca", "es".
#     Returns:
#       The modified text
#       example:
#         input args:
#             text: "si l'avi cau, diguem-ho"
#             lang: "ca"
#         Output:
#             text: "si lavi cau, diguemho"
#     """
#     text = text.replace(";", ",")
#     text = text.replace("-", " ") if lang != "ca" else text.replace("-", "")
#     text = text.replace(":", ",")
#     if lang == "en":
#         text = text.replace("&", " and ")
#     elif lang == "fr":
#         text = text.replace("&", " et ")
#     elif lang == "pt":
#         text = text.replace("&", " e ")
#     elif lang == "ca":
#         text = text.replace("&", " i ")
#         text = text.replace("'", "")
#     elif lang == "es":
#         text = text.replace("&", " y ")
#         text = text.replace("'", "")
#     return text

# def lowercase(text):
#     return text.lower()

# def collapse_whitespace(text):
#     return re.sub(_whitespace_re, " ", text).strip()

# def remove_aux_symbols(text):
#     text = re.sub(r"[\<\>\(\)\[\]\"]+", "", text)
#     return text

# def spanish_cleaners(text):
#     """Basic pipeline for Spanish text. There is no need to expand abbreviations and
#     numbers, the phonemizer already does that"""
#     text = lowercase(text)
#     text = replace_symbols(text, lang="es")
#     text = remove_aux_symbols(text)
#     text = collapse_whitespace(text)
#     return text

# def ljspeech_custom(root_path, meta_file, **kwargs):  # pylint: disable=unused-argument
#     """Normalizes the LJSpeech meta data file to TTS format
#     https://keithito.com/LJ-Speech-Dataset/"""
#     txt_file = os.path.join(root_path, meta_file)
#     items = []
#     speaker_name = "ljspeech"
#     with open(txt_file, "r", encoding="utf-8") as ttf:
#         for line in ttf:
#             cols = line.split("|")
#             wav_file = os.path.join(root_path, "wavs", cols[0] + ".wav")
#             text = cols[1]  # upstream reads cols[2], which breaks on this two-column metadata file
#             items.append({"text": text, "audio_file": wav_file, "speaker_name": speaker_name, "root_path": root_path})
#     return items
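
To see what the cleaning steps above do to a sentence without installing the fork, here is a minimal standalone sketch. The name `spanish_cleaners_sketch` is hypothetical, and padding the `&` replacement with spaces is an assumption of this sketch rather than something confirmed by the fork.

```python
import re

_whitespace_re = re.compile(r"\s+")


def spanish_cleaners_sketch(text):
    """Lowercase, normalise a few symbols, drop brackets and quotes, collapse spaces."""
    text = text.lower()
    # Semicolons and colons become commas, hyphens become spaces.
    for symbol in (";", ":"):
        text = text.replace(symbol, ",")
    text = text.replace("-", " ")
    # Assumption: surround "y" with spaces so neighbouring words are not fused.
    text = text.replace("&", " y ")
    text = text.replace("'", "")
    # Remove auxiliary symbols: angle brackets, parentheses, square brackets, double quotes.
    text = re.sub(r"[<>()\[\]\"]+", "", text)
    return _whitespace_re.sub(" ", text).strip()


print(spanish_cleaners_sketch('Hola;  "Mundo"  (prueba) -  A&B'))
# prints: hola, mundo prueba a y b
```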





## Selecting Model and ModelConfig

This notebook uses **VITS**. GlowTTS or Tacotron could be used instead, but they need more memory, so I could not train them on the Colab free tier.

In [7]:
#writefile train_sp_recipe.py

##tacotron needs more ram for training

import os
from trainer import Trainer, TrainerArgs
from TTS.tts.configs.glow_tts_config import GlowTTSConfig
from TTS.tts.configs.tacotron2_config import Tacotron2Config
from TTS.tts.models.vits import Vits, VitsAudioConfig
from TTS.tts.configs.shared_configs import BaseDatasetConfig,CharactersConfig
from TTS.tts.datasets import load_tts_samples
from TTS.tts.configs.vits_config import VitsConfig
from TTS.tts.models.glow_tts import GlowTTS
from TTS.tts.models.tacotron2 import Tacotron2
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.utils.audio import AudioProcessor
#import wandb 

DEFAULT_SAMPLE_RATE=16000

data_path="/content/tts-ai-public-figure/"

characters_conf=CharactersConfig(
        pad="<PAD>",
        bos="<BOS>",
        eos="<EOS>",
        blank="<BLNK>",
        characters="abcdefghijklmnopqrstuvwxyzáéíñóú ",
        punctuations="!¡'(),-.:;¿?",
    )

#wandb.login() ##use open session to log in

project = "tts-petro-ai"
display_name = "VITS-es-1"

#wandb_log=wandb.init(project=project, name=display_name)


dataset_config = BaseDatasetConfig(
    formatter="ljspeech_custom",meta_file_train="metadata.txt", path=os.path.join(data_path,"tts-dataset/"))

# INITIALIZE THE TRAINING CONFIGURATION
# Configure the model. Every config class inherits the BaseTTSConfig.

config = VitsConfig(
    run_name=display_name,
    project_name=project,
    batch_size=16,
    eval_batch_size=8,
    num_loader_workers=4,
    num_eval_loader_workers=4,
    run_eval=True,
    test_delay_epochs=-1,
    save_checkpoints=True,
    save_n_checkpoints=2,
    save_best_after=1000,
    epochs=100,
    characters=characters_conf,
    text_cleaner="spanish_cleaners",
    use_phonemes=True,
    phoneme_language="es-es",
    phoneme_cache_path=os.path.join(DEFAULT_DRIVE_FOLDER,"phoneme_cache"),
    compute_input_seq_cache=True,
    print_step=25,
    print_eval=False,
    mixed_precision=True,
    output_path=DEFAULT_DRIVE_FOLDER,
    datasets=[dataset_config],
#    dashboard_logger = 'wandb'
)

# AudioProcessor.init_from_config does not let us pass the sample rate directly, so override it on the config
config.audio['sample_rate']=DEFAULT_SAMPLE_RATE
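
The training logs in this notebook report "not found characters" such as `θ` and `ɾ`. With `use_phonemes=True` the tokenizer receives IPA symbols, and these are not in the `characters` string above. The sketch below shows one way to list such symbols before training. The helper `missing_symbols` and the sample strings are hypothetical, and nothing here calls the TTS library. A possible remedy, not verified in this notebook, is to add the reported symbols to the vocabulary passed to `CharactersConfig`.

```python
def missing_symbols(texts, characters, punctuations):
    """Return the sorted symbols that occur in `texts` but not in the vocabulary."""
    vocab = set(characters) | set(punctuations)
    seen = set("".join(texts))
    return sorted(seen - vocab)


# Same vocabulary strings as in the CharactersConfig above.
characters = "abcdefghijklmnopqrstuvwxyzáéíñóú "
punctuations = "!¡'(),-.:;¿?"

# Hypothetical phonemized strings, only to exercise the helper.
samples = ["θjelo", "peɾo"]
print(missing_symbols(samples, characters, punctuations))
```

Running the same check on the real phonemizer output for every transcript would give the full list of symbols the vocabulary needs.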



In [15]:
## Audio processor
ap = AudioProcessor.init_from_config(config)
#=VitsAudioConfig.init_from_config(config)

# INITIALIZE THE TOKENIZER
tokenizer, config = TTSTokenizer.init_from_config(config)
train_samples, eval_samples = load_tts_samples(
    dataset_config,
    eval_split=True,
    eval_split_max_size=config.eval_split_max_size,
    eval_split_size=config.eval_split_size,
)


#model = Tacotron2(config, ap, tokenizer, speaker_manager=None)
model=Vits(config, ap, tokenizer, speaker_manager=None)


## see TrainerArgs here (https://github.com/coqui-ai/Trainer/blob/main/trainer/trainer.py)
trainer = Trainer(
    TrainerArgs(restore_path="/content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_08+10PM-7712052/best_model.pth",
                gpu=0), ##default gpu device 0
    config, DEFAULT_DRIVE_FOLDER, model=model, train_samples=train_samples, eval_samples=eval_samples
)


 > Setting up Audio Processor...
 | > sample_rate:16000
 | > resample:False
 | > num_mels:80
 | > log_func:np.log10
 | > min_level_db:0
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:None
 | > fft_size:1024
 | > power:None
 | > preemphasis:0.0
 | > griffin_lim_iters:None
 | > signal_norm:None
 | > symmetric_norm:None
 | > mel_fmin:0
 | > mel_fmax:None
 | > pitch_fmin:None
 | > pitch_fmax:None
 | > spec_gain:20.0
 | > stft_pad_mode:reflect
 | > max_norm:1.0
 | > clip_norm:True
 | > do_trim_silence:False
 | > trim_db:60
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:None
 | > base:10
 | > hop_length:256
 | > win_length:1024
 | > Found 405 files in /content/tts-ai-public-figure/tts-dataset


 > Training Environment:
 | > Current device: 0
 | > Num. of GPUs: 1
 | > Num. of CPUs: 2
 | > Num. of Torch Threads: 1
 | > Torch seed: 54321
 | > Torch CUDNN: True
 | > Torch CUDNN deterministic: False
 | > Torch CUDNN benchmark: False
 > Start Tensorboard: tensorboard --logdir=/content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052
 > Restoring from best_model.pth ...
 > Restoring Model...
 > Restoring Optimizer...
 > Restoring Scaler...
 > Model restored from step 2210

 > Model has 83043436 parameters
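
The `restore_path` above is hardcoded to one timestamped run folder. As a convenience, the newest `best_model.pth` can be looked up instead. This is a minimal sketch under the assumption that each run folder sits directly under the output folder and starts with the run name, as the paths in this notebook suggest. The helper `latest_best_model` is hypothetical and not part of the Trainer API.

```python
import glob
import os


def latest_best_model(output_root, run_prefix):
    """Return the newest best_model.pth under run folders starting with run_prefix, or None."""
    pattern = os.path.join(output_root, run_prefix + "*", "best_model.pth")
    candidates = glob.glob(pattern)
    if not candidates:
        return None
    # Pick the checkpoint file that was modified most recently.
    return max(candidates, key=os.path.getmtime)


# Prints None when the Drive folder is not mounted or holds no matching run.
checkpoint = latest_best_model("/content/drive2/MyDrive/tts-ai/", "VITS-es-1")
print(checkpoint)
```

The returned path could then be passed as `restore_path`, after checking that it is not `None`.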


## Let's run the training script

In [None]:
# !CUDA_VISIBLE_DEVICES="0" python train_sp_recipe.py \
#     --restore_path /root/.local/share/tts/tts_models--es--mai--tacotron2-DDC/model_file.pth

trainer.fit()


 > EPOCH: 0/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:17:52)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 14/26 -- GLOBAL_STEP: 2225
     | > loss_disc: 2.73510  (2.71666)
     | > loss_disc_real_0: 0.17843  (0.19080)
     | > loss_disc_real_1: 0.30627  (0.19283)
     | > loss_disc_real_2: 0.24192  (0.23810)
     | > loss_disc_real_3: 0.25352  (0.23730)
     | > loss_disc_real_4: 0.21376  (0.20456)
     | > loss_disc_real_5: 0.26213  (0.26998)
     | > loss_0: 2.73510  (2.71666)
     | > grad_norm_0: 6.46276  (9.68908)
     | > loss_gen: 1.83010  (1.86350)
     | > loss_kl: 1.73453  (1.67419)
     | > loss_feat: 1.97614  (1.97933)
     | > loss_mel: 25.28261  (24.46132)
     | > loss_duration: 1.82576  (1.74378)
     | > amp_scaler: 512.00000  (3035.42857)
     | > loss_1: 32.64914  (31.72211)
     | > grad_norm_1: 264.70023  (121.62417)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.89240  (0.81550)
     | > loader_time: 0.01410  (0.00785)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.
['<BLNK>', 'i', '<BLNK>', 't', '<BLNK>', ' ', '<BLNK>', 't', '<BLNK>', 'o', '<BLNK>', 'o', '<BLNK>', 'k', '<BLNK>', ' ', '<BLNK>', 'm', '<BLNK>', 'e', '<BLNK>', ' ', '<BLNK>', 'k', '<BLNK>', 'i', '<BLNK>', 't', '<BLNK>', 'e', '<BLNK>', ' ', '<BLNK>', 'a', '<BLNK>', ' ', '<BLNK>', 'l', '<BLNK>', 'o', '<BLNK>', 'n', '<BLNK>', 'ɡ', '<BLNK>', ' ', '<BLNK>', 't', '<BLNK>', 'i', '<BLNK>', 'm', '<BLNK>', 'e', '<BLNK>', ' ', '<BLNK>', 't', '<BLNK>', 'o', '<BLNK>', ' ', '<BLNK>', 'd',


  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.47802 (+0.00000)
     | > avg_loss_disc: 2.65182 (+0.00000)
     | > avg_loss_disc_real_0: 0.52053 (+0.00000)
     | > avg_loss_disc_real_1: 0.10750 (+0.00000)
     | > avg_loss_disc_real_2: 0.24460 (+0.00000)
     | > avg_loss_disc_real_3: 0.15851 (+0.00000)
     | > avg_loss_disc_real_4: 0.40067 (+0.00000)
     | > avg_loss_disc_real_5: 0.13709 (+0.00000)
     | > avg_loss_0: 2.65182 (+0.00000)
     | > avg_loss_gen: 2.55073 (+0.00000)
     | > avg_loss_kl: 2.06975 (+0.00000)
     | > avg_loss_feat: 3.07863 (+0.00000)
     | > avg_loss_mel: 25.17411 (+0.00000)
     | > avg_loss_duration: 1.63365 (+0.00000)
     | > avg_loss_1: 34.50686 (+0.00000)

 > BEST MODEL : /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052/best_model_2237.pth

 > EPOCH: 1/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TR



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 13/26 -- GLOBAL_STEP: 2250
     | > loss_disc: 2.74702  (2.72806)
     | > loss_disc_real_0: 0.26178  (0.21188)
     | > loss_disc_real_1: 0.22908  (0.25154)
     | > loss_disc_real_2: 0.23624  (0.24553)
     | > loss_disc_real_3: 0.27362  (0.24830)
     | > loss_disc_real_4: 0.18287  (0.24151)
     | > loss_disc_real_5: 0.23496  (0.25016)
     | > loss_0: 2.74702  (2.72806)
     | > grad_norm_0: 11.93253  (16.96728)
     | > loss_gen: 1.82288  (1.98276)
     | > loss_kl: 1.33672  (1.59219)
     | > loss_feat: 2.20925  (2.06982)
     | > loss_mel: 22.11871  (23.92949)
     | > loss_duration: 1.89877  (1.73876)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 29.38633  (31.31302)
     | > grad_norm_1: 191.71942  (136.06023)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 1.93200  (1.13788)
     | > loader_time: 0.02160  (0.01207)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.66065 (+0.18263)
     | > avg_loss_disc: 2.74049 (+0.08867)
     | > avg_loss_disc_real_0: 0.12751 (-0.39302)
     | > avg_loss_disc_real_1: 0.14968 (+0.04218)
     | > avg_loss_disc_real_2: 0.20766 (-0.03694)
     | > avg_loss_disc_real_3: 0.25883 (+0.10032)
     | > avg_loss_disc_real_4: 0.19654 (-0.20412)
     | > avg_loss_disc_real_5: 0.23624 (+0.09915)
     | > avg_loss_0: 2.74049 (+0.08867)
     | > avg_loss_gen: 1.63423 (-0.91650)
     | > avg_loss_kl: 1.40711 (-0.66264)
     | > avg_loss_feat: 3.08888 (+0.01025)
     | > avg_loss_mel: 24.91580 (-0.25830)
     | > avg_loss_duration: 1.65997 (+0.02633)
     | > avg_loss_1: 32.70600 (-1.80086)

 > BEST MODEL : /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052/best_model_2263.pth

 > EPOCH: 2/100
 --> /conte



   --> STEP: 12/26 -- GLOBAL_STEP: 2275
     | > loss_disc: 2.56848  (2.61715)
     | > loss_disc_real_0: 0.15256  (0.20546)
     | > loss_disc_real_1: 0.23512  (0.21569)
     | > loss_disc_real_2: 0.31560  (0.24850)
     | > loss_disc_real_3: 0.25286  (0.24944)
     | > loss_disc_real_4: 0.20687  (0.23038)
     | > loss_disc_real_5: 0.29977  (0.24242)
     | > loss_0: 2.56848  (2.61715)
     | > grad_norm_0: 6.94050  (8.90741)
     | > loss_gen: 1.94389  (1.99458)
     | > loss_kl: 1.24274  (1.49293)
     | > loss_feat: 1.92527  (2.19887)
     | > loss_mel: 23.87654  (23.75289)
     | > loss_duration: 1.77644  (1.72628)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 30.76487  (31.16555)
     | > grad_norm_1: 221.71133  (208.03186)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 1.00030  (0.90077)
     | > loader_time: 0.01030  (0.01502)


 > EVALUATION


  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.67673 (+0.01608)
     | > avg_loss_disc: 2.72564 (-0.01485)
     | > avg_loss_disc_real_0: 0.21989 (+0.09238)
     | > avg_loss_disc_real_1: 0.14049 (-0.00919)
     | > avg_loss_disc_real_2: 0.16768 (-0.03998)
     | > avg_loss_disc_real_3: 0.18335 (-0.07548)
     | > avg_loss_disc_real_4: 0.10881 (-0.08773)
     | > avg_loss_disc_real_5: 0.25622 (+0.01998)
     | > avg_loss_0: 2.72564 (-0.01485)
     | > avg_loss_gen: 1.50015 (-0.13408)
     | > avg_loss_kl: 1.96315 (+0.55604)
     | > avg_loss_feat: 1.88602 (-1.20286)
     | > avg_loss_mel: 21.77735 (-3.13845)
     | > avg_loss_duration: 1.65640 (-0.00357)
     | > avg_loss_1: 28.78307 (-3.92293)

 > BEST MODEL : /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052/best_model_2289.pth

 > EPOCH: 3/100
 --> /conte



   --> STEP: 11/26 -- GLOBAL_STEP: 2300
     | > loss_disc: 2.62095  (2.60090)
     | > loss_disc_real_0: 0.36629  (0.21642)
     | > loss_disc_real_1: 0.22478  (0.20666)
     | > loss_disc_real_2: 0.21390  (0.24444)
     | > loss_disc_real_3: 0.24289  (0.24778)
     | > loss_disc_real_4: 0.27738  (0.23033)
     | > loss_disc_real_5: 0.24788  (0.23944)
     | > loss_0: 2.62095  (2.60090)
     | > grad_norm_0: 21.18637  (10.55456)
     | > loss_gen: 1.99629  (2.04093)
     | > loss_kl: 1.57548  (1.61350)
     | > loss_feat: 2.31545  (2.27286)
     | > loss_mel: 22.77358  (23.89148)
     | > loss_duration: 1.70960  (1.72468)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 30.37041  (31.54345)
     | > grad_norm_1: 185.92996  (185.16280)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 1.01490  (1.06250)
     | > loader_time: 0.01590  (0.00956)


 > EVALUATION


  --> EVAL PERFORMANCE
     | > avg_loader_time: 1.24020 (+0.56347)
     | > avg_loss_disc: 2.54801 (-0.17763)
     | > avg_loss_disc_real_0: 0.27337 (+0.05348)
     | > avg_loss_disc_real_1: 0.22099 (+0.08050)
     | > avg_loss_disc_real_2: 0.18924 (+0.02156)
     | > avg_loss_disc_real_3: 0.20540 (+0.02204)
     | > avg_loss_disc_real_4: 0.13924 (+0.03042)
     | > avg_loss_disc_real_5: 0.15516 (-0.10106)
     | > avg_loss_0: 2.54801 (-0.17763)
     | > avg_loss_gen: 1.90455 (+0.40440)
     | > avg_loss_kl: 2.08142 (+0.11827)
     | > avg_loss_feat: 2.65435 (+0.76833)
     | > avg_loss_mel: 23.44324 (+1.66589)
     | > avg_loss_duration: 1.65868 (+0.00227)
     | > avg_loss_1: 31.74224 (+2.95916)


 > EPOCH: 4/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:21:02)



   --> STEP: 10/26 -- GLOBAL_STEP: 2325
     | > loss_disc: 2.42004  (2.57167)
     | > loss_disc_real_0: 0.14101  (0.18645)
     | > loss_disc_real_1: 0.24486  (0.22007)
     | > loss_disc_real_2: 0.23910  (0.23660)
     | > loss_disc_real_3: 0.27336  (0.23484)
     | > loss_disc_real_4: 0.27228  (0.24571)
     | > loss_disc_real_5: 0.22453  (0.23182)
     | > loss_0: 2.42004  (2.57167)
     | > grad_norm_0: 6.16633  (11.25119)
     | > loss_gen: 2.47047  (2.07071)
     | > loss_kl: 1.39961  (1.57593)
     | > loss_feat: 2.73209  (2.41923)
     | > loss_mel: 23.85012  (24.28101)
     | > loss_duration: 1.71999  (1.72297)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 32.17228  (32.06985)
     | > grad_norm_1: 167.31795  (209.30803)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.74430  (0.73189)
     | > loader_time: 0.00870  (0.00813)


 > EVALUATION


  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.47646 (-0.76374)
     | > avg_loss_disc: 2.22029 (-0.32772)
     | > avg_loss_disc_real_0: 0.17045 (-0.10292)
     | > avg_loss_disc_real_1: 0.08680 (-0.13419)
     | > avg_loss_disc_real_2: 0.25711 (+0.06787)
     | > avg_loss_disc_real_3: 0.23008 (+0.02468)
     | > avg_loss_disc_real_4: 0.14247 (+0.00324)
     | > avg_loss_disc_real_5: 0.18079 (+0.02563)
     | > avg_loss_0: 2.22029 (-0.32772)
     | > avg_loss_gen: 2.09200 (+0.18745)
     | > avg_loss_kl: 1.38418 (-0.69724)
     | > avg_loss_feat: 3.39793 (+0.74358)
     | > avg_loss_mel: 24.72383 (+1.28059)
     | > avg_loss_duration: 1.66514 (+0.00646)
     | > avg_loss_1: 33.26307 (+1.52084)


 > EPOCH: 5/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:21:37)



   --> STEP: 9/26 -- GLOBAL_STEP: 2350
     | > loss_disc: 2.66976  (2.61111)
     | > loss_disc_real_0: 0.32761  (0.21896)
     | > loss_disc_real_1: 0.21573  (0.21185)
     | > loss_disc_real_2: 0.21805  (0.23995)
     | > loss_disc_real_3: 0.21903  (0.24523)
     | > loss_disc_real_4: 0.28346  (0.23052)
     | > loss_disc_real_5: 0.21345  (0.23612)
     | > loss_0: 2.66976  (2.61111)
     | > grad_norm_0: 18.46738  (14.00691)
     | > loss_gen: 1.84985  (1.99778)
     | > loss_kl: 1.70586  (1.57942)
     | > loss_feat: 2.32126  (2.30964)
     | > loss_mel: 21.85607  (23.52626)
     | > loss_duration: 1.68474  (1.72532)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 29.41777  (31.13843)
     | > grad_norm_1: 185.78157  (182.58723)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.86240  (0.75270)
     | > loader_time: 0.00740  (0.00964)


 > EVALUATION


  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.47099 (-0.00547)
     | > avg_loss_disc: 2.29965 (+0.07936)
     | > avg_loss_disc_real_0: 0.18939 (+0.01894)
     | > avg_loss_disc_real_1: 0.17100 (+0.08420)
     | > avg_loss_disc_real_2: 0.13316 (-0.12395)
     | > avg_loss_disc_real_3: 0.22381 (-0.00626)
     | > avg_loss_disc_real_4: 0.23770 (+0.09522)
     | > avg_loss_disc_real_5: 0.13405 (-0.04674)
     | > avg_loss_0: 2.29965 (+0.07936)
     | > avg_loss_gen: 2.20004 (+0.10804)
     | > avg_loss_kl: 1.98004 (+0.59586)
     | > avg_loss_feat: 3.92437 (+0.52644)
     | > avg_loss_mel: 24.88089 (+0.15705)
     | > avg_loss_duration: 1.66031 (-0.00482)
     | > avg_loss_1: 34.64564 (+1.38257)


 > EPOCH: 6/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:22:12)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 8/26 -- GLOBAL_STEP: 2375
     | > loss_disc: 2.73622  (2.86348)
     | > loss_disc_real_0: 0.30465  (0.21279)
     | > loss_disc_real_1: 0.16285  (0.33961)
     | > loss_disc_real_2: 0.19534  (0.24846)
     | > loss_disc_real_3: 0.23281  (0.25407)
     | > loss_disc_real_4: 0.23708  (0.24151)
     | > loss_disc_real_5: 0.27945  (0.27440)
     | > loss_0: 2.73622  (2.86348)
     | > grad_norm_0: 20.99614  (26.04968)
     | > loss_gen: 1.91127  (2.11477)
     | > loss_kl: 1.48384  (1.65969)
     | > loss_feat: 2.55401  (2.21352)
     | > loss_mel: 25.78886  (24.67996)
     | > loss_duration: 1.77969  (1.72957)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 33.51767  (32.39751)
     | > grad_norm_1: 170.66927  (169.49448)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.91920  (0.87932)
     | > loader_time: 0.00740  (0.01538)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.65854 (+0.18755)
     | > avg_loss_disc: 2.64112 (+0.34147)
     | > avg_loss_disc_real_0: 0.42980 (+0.24041)
     | > avg_loss_disc_real_1: 0.17055 (-0.00045)
     | > avg_loss_disc_real_2: 0.16343 (+0.03027)
     | > avg_loss_disc_real_3: 0.16593 (-0.05789)
     | > avg_loss_disc_real_4: 0.14598 (-0.09172)
     | > avg_loss_disc_real_5: 0.15527 (+0.02121)
     | > avg_loss_0: 2.64112 (+0.34147)
     | > avg_loss_gen: 1.86123 (-0.33881)
     | > avg_loss_kl: 1.85247 (-0.12757)
     | > avg_loss_feat: 2.26283 (-1.66154)
     | > avg_loss_mel: 22.83853 (-2.04236)
     | > avg_loss_duration: 1.67550 (+0.01518)
     | > avg_loss_1: 30.49055 (-4.15509)


 > EPOCH: 7/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:22:46)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 7/26 -- GLOBAL_STEP: 2400
     | > loss_disc: 2.56703  (2.61891)
     | > loss_disc_real_0: 0.25233  (0.19120)
     | > loss_disc_real_1: 0.18343  (0.20417)
     | > loss_disc_real_2: 0.25775  (0.25134)
     | > loss_disc_real_3: 0.28591  (0.24643)
     | > loss_disc_real_4: 0.30518  (0.21827)
     | > loss_disc_real_5: 0.23751  (0.24850)
     | > loss_0: 2.56703  (2.61891)
     | > grad_norm_0: 14.08738  (12.86020)
     | > loss_gen: 2.09970  (2.00655)
     | > loss_kl: 1.43825  (1.49631)
     | > loss_feat: 2.22128  (2.08851)
     | > loss_mel: 25.97661  (25.51559)
     | > loss_duration: 1.72572  (1.72322)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 33.46156  (32.83018)
     | > grad_norm_1: 237.63066  (206.02516)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.73840  (0.83004)
     | > loader_time: 0.00740  (0.01025)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.48323 (-0.17531)
     | > avg_loss_disc: 2.24869 (-0.39242)
     | > avg_loss_disc_real_0: 0.20531 (-0.22449)
     | > avg_loss_disc_real_1: 0.17508 (+0.00453)
     | > avg_loss_disc_real_2: 0.14580 (-0.01762)
     | > avg_loss_disc_real_3: 0.24079 (+0.07486)
     | > avg_loss_disc_real_4: 0.23798 (+0.09200)
     | > avg_loss_disc_real_5: 0.17641 (+0.02115)
     | > avg_loss_0: 2.24869 (-0.39242)
     | > avg_loss_gen: 2.38974 (+0.52851)
     | > avg_loss_kl: 1.71255 (-0.13992)
     | > avg_loss_feat: 4.00676 (+1.74393)
     | > avg_loss_mel: 27.83235 (+4.99382)
     | > avg_loss_duration: 1.63966 (-0.03584)
     | > avg_loss_1: 37.58105 (+7.09049)


 > EPOCH: 8/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:23:21)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 6/26 -- GLOBAL_STEP: 2425
     | > loss_disc: 2.68991  (2.73481)
     | > loss_disc_real_0: 0.15360  (0.19687)
     | > loss_disc_real_1: 0.22512  (0.21001)
     | > loss_disc_real_2: 0.28350  (0.26675)
     | > loss_disc_real_3: 0.30617  (0.26370)
     | > loss_disc_real_4: 0.31000  (0.24740)
     | > loss_disc_real_5: 0.31611  (0.29035)
     | > loss_0: 2.68991  (2.73481)
     | > grad_norm_0: 9.15919  (10.04848)
     | > loss_gen: 2.18158  (2.00737)
     | > loss_kl: 1.52964  (1.59557)
     | > loss_feat: 1.97894  (2.08693)
     | > loss_mel: 23.76956  (23.77728)
     | > loss_duration: 1.75438  (1.73154)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 31.21411  (31.19870)
     | > grad_norm_1: 106.99963  (194.51691)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.74340  (0.74520)
     | > loader_time: 0.00880  (0.00807)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.48595 (+0.00272)
     | > avg_loss_disc: 2.72793 (+0.47924)
     | > avg_loss_disc_real_0: 0.43390 (+0.22859)
     | > avg_loss_disc_real_1: 0.15716 (-0.01792)
     | > avg_loss_disc_real_2: 0.17404 (+0.02823)
     | > avg_loss_disc_real_3: 0.30377 (+0.06298)
     | > avg_loss_disc_real_4: 0.23997 (+0.00199)
     | > avg_loss_disc_real_5: 0.16746 (-0.00895)
     | > avg_loss_0: 2.72793 (+0.47924)
     | > avg_loss_gen: 2.17503 (-0.21471)
     | > avg_loss_kl: 1.80516 (+0.09261)
     | > avg_loss_feat: 2.59153 (-1.41522)
     | > avg_loss_mel: 24.07339 (-3.75896)
     | > avg_loss_duration: 1.66858 (+0.02892)
     | > avg_loss_1: 32.31370 (-5.26735)


 > EPOCH: 9/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:23:56)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 5/26 -- GLOBAL_STEP: 2450
     | > loss_disc: 2.58553  (2.71153)
     | > loss_disc_real_0: 0.20808  (0.20568)
     | > loss_disc_real_1: 0.23920  (0.25218)
     | > loss_disc_real_2: 0.21578  (0.25232)
     | > loss_disc_real_3: 0.22261  (0.23503)
     | > loss_disc_real_4: 0.24386  (0.25237)
     | > loss_disc_real_5: 0.27312  (0.27714)
     | > loss_0: 2.58553  (2.71153)
     | > grad_norm_0: 4.71811  (13.73054)
     | > loss_gen: 1.79655  (1.94977)
     | > loss_kl: 1.58194  (1.58685)
     | > loss_feat: 2.26153  (2.03940)
     | > loss_mel: 22.57178  (23.25478)
     | > loss_duration: 1.67068  (1.73295)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 29.88247  (30.56374)
     | > grad_norm_1: 82.80609  (162.66393)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.71840  (0.72649)
     | > loader_time: 0.00620  (0.00931)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.47919 (-0.00677)
     | > avg_loss_disc: 2.52323 (-0.20471)
     | > avg_loss_disc_real_0: 0.13748 (-0.29642)
     | > avg_loss_disc_real_1: 0.18979 (+0.03263)
     | > avg_loss_disc_real_2: 0.18644 (+0.01240)
     | > avg_loss_disc_real_3: 0.22473 (-0.07904)
     | > avg_loss_disc_real_4: 0.15671 (-0.08326)
     | > avg_loss_disc_real_5: 0.19955 (+0.03209)
     | > avg_loss_0: 2.52323 (-0.20471)
     | > avg_loss_gen: 1.70597 (-0.46906)
     | > avg_loss_kl: 1.67651 (-0.12865)
     | > avg_loss_feat: 2.57352 (-0.01801)
     | > avg_loss_mel: 23.50788 (-0.56552)
     | > avg_loss_duration: 1.69530 (+0.02672)
     | > avg_loss_1: 31.15917 (-1.15453)


 > EPOCH: 10/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:24:30)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 4/26 -- GLOBAL_STEP: 2475
     | > loss_disc: 2.53203  (2.56657)
     | > loss_disc_real_0: 0.16587  (0.18195)
     | > loss_disc_real_1: 0.22841  (0.23187)
     | > loss_disc_real_2: 0.20736  (0.24282)
     | > loss_disc_real_3: 0.24354  (0.25420)
     | > loss_disc_real_4: 0.24535  (0.22541)
     | > loss_disc_real_5: 0.27397  (0.25137)
     | > loss_0: 2.53203  (2.56657)
     | > grad_norm_0: 4.46213  (9.73198)
     | > loss_gen: 1.97860  (2.06572)
     | > loss_kl: 1.33300  (1.47433)
     | > loss_feat: 2.03582  (2.23383)
     | > loss_mel: 22.50082  (23.74731)
     | > loss_duration: 1.70795  (1.75177)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 29.55620  (31.27296)
     | > grad_norm_1: 220.39622  (234.41235)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.84670  (0.78620)
     | > loader_time: 0.00720  (0.00852)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.70499 (+0.22580)
     | > avg_loss_disc: 2.40711 (-0.11612)
     | > avg_loss_disc_real_0: 0.18048 (+0.04300)
     | > avg_loss_disc_real_1: 0.18115 (-0.00864)
     | > avg_loss_disc_real_2: 0.19837 (+0.01194)
     | > avg_loss_disc_real_3: 0.42037 (+0.19563)
     | > avg_loss_disc_real_4: 0.19244 (+0.03573)
     | > avg_loss_disc_real_5: 0.26488 (+0.06534)
     | > avg_loss_0: 2.40711 (-0.11612)
     | > avg_loss_gen: 2.41591 (+0.70995)
     | > avg_loss_kl: 1.90630 (+0.22978)
     | > avg_loss_feat: 3.20187 (+0.62836)
     | > avg_loss_mel: 23.43633 (-0.07154)
     | > avg_loss_duration: 1.64236 (-0.05294)
     | > avg_loss_1: 32.60277 (+1.44360)


 > EPOCH: 11/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:25:04)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 3/26 -- GLOBAL_STEP: 2500
     | > loss_disc: 2.74208  (2.74870)
     | > loss_disc_real_0: 0.32665  (0.19876)
     | > loss_disc_real_1: 0.21064  (0.22072)
     | > loss_disc_real_2: 0.22508  (0.24519)
     | > loss_disc_real_3: 0.24338  (0.22289)
     | > loss_disc_real_4: 0.18713  (0.26302)
     | > loss_disc_real_5: 0.22679  (0.25728)
     | > loss_0: 2.74208  (2.74870)
     | > grad_norm_0: 25.89211  (18.58291)
     | > loss_gen: 1.95216  (1.93916)
     | > loss_kl: 2.06285  (1.79069)
     | > loss_feat: 2.08377  (1.99249)
     | > loss_mel: 23.18318  (23.18597)
     | > loss_duration: 1.79558  (1.75236)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 31.07754  (30.66067)
     | > grad_norm_1: 284.84009  (248.24565)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.83180  (0.85597)
     | > loader_time: 0.00600  (0.01214)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.47758 (-0.22741)
     | > avg_loss_disc: 2.56187 (+0.15477)
     | > avg_loss_disc_real_0: 0.20230 (+0.02182)
     | > avg_loss_disc_real_1: 0.09250 (-0.08865)
     | > avg_loss_disc_real_2: 0.32088 (+0.12251)
     | > avg_loss_disc_real_3: 0.20975 (-0.21061)
     | > avg_loss_disc_real_4: 0.21344 (+0.02100)
     | > avg_loss_disc_real_5: 0.19976 (-0.06512)
     | > avg_loss_0: 2.56187 (+0.15477)
     | > avg_loss_gen: 1.87841 (-0.53750)
     | > avg_loss_kl: 1.90160 (-0.00469)
     | > avg_loss_feat: 3.81533 (+0.61346)
     | > avg_loss_mel: 26.67904 (+3.24270)
     | > avg_loss_duration: 1.67502 (+0.03266)
     | > avg_loss_1: 35.94940 (+3.34663)


 > EPOCH: 12/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:25:38)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 2/26 -- GLOBAL_STEP: 2525
     | > loss_disc: 2.69784  (2.61586)
     | > loss_disc_real_0: 0.21887  (0.20183)
     | > loss_disc_real_1: 0.28871  (0.24562)
     | > loss_disc_real_2: 0.18357  (0.24937)
     | > loss_disc_real_3: 0.21174  (0.25449)
     | > loss_disc_real_4: 0.31048  (0.25327)
     | > loss_disc_real_5: 0.20142  (0.23015)
     | > loss_0: 2.69784  (2.61586)
     | > grad_norm_0: 9.34839  (8.79992)
     | > loss_gen: 2.19129  (2.17935)
     | > loss_kl: 1.49638  (1.61353)
     | > loss_feat: 2.48486  (2.54167)
     | > loss_mel: 23.01337  (24.12983)
     | > loss_duration: 1.75589  (1.73447)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 30.94179  (32.19885)
     | > grad_norm_1: 97.07733  (204.89012)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.71700  (0.77565)
     | > loader_time: 0.00900  (0.00805)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.48072 (+0.00314)
     | > avg_loss_disc: 2.56898 (+0.00711)
     | > avg_loss_disc_real_0: 0.36360 (+0.16130)
     | > avg_loss_disc_real_1: 0.23078 (+0.13828)
     | > avg_loss_disc_real_2: 0.17745 (-0.14343)
     | > avg_loss_disc_real_3: 0.12911 (-0.08065)
     | > avg_loss_disc_real_4: 0.17007 (-0.04337)
     | > avg_loss_disc_real_5: 0.19909 (-0.00067)
     | > avg_loss_0: 2.56898 (+0.00711)
     | > avg_loss_gen: 2.11891 (+0.24049)
     | > avg_loss_kl: 1.86468 (-0.03693)
     | > avg_loss_feat: 2.81100 (-1.00433)
     | > avg_loss_mel: 22.55545 (-4.12359)
     | > avg_loss_duration: 1.65863 (-0.01639)
     | > avg_loss_1: 31.00867 (-4.94074)


 > EPOCH: 13/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:26:14)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 1/26 -- GLOBAL_STEP: 2550
     | > loss_disc: 2.66182  (2.66182)
     | > loss_disc_real_0: 0.17025  (0.17025)
     | > loss_disc_real_1: 0.24227  (0.24227)
     | > loss_disc_real_2: 0.21910  (0.21910)
     | > loss_disc_real_3: 0.30217  (0.30217)
     | > loss_disc_real_4: 0.27217  (0.27217)
     | > loss_disc_real_5: 0.22251  (0.22251)
     | > loss_0: 2.66182  (2.66182)
     | > grad_norm_0: 7.47480  (7.47480)
     | > loss_gen: 2.01497  (2.01497)
     | > loss_kl: 1.69765  (1.69765)
     | > loss_feat: 1.89987  (1.89987)
     | > loss_mel: 22.54731  (22.54731)
     | > loss_duration: 1.71666  (1.71666)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 29.87646  (29.87646)
     | > grad_norm_1: 181.40155  (181.40155)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.78440  (0.78445)
     | > loader_time: 0.00760  (0.00758)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.48288 (+0.00215)
     | > avg_loss_disc: 2.19346 (-0.37553)
     | > avg_loss_disc_real_0: 0.16938 (-0.19422)
     | > avg_loss_disc_real_1: 0.20555 (-0.02523)
     | > avg_loss_disc_real_2: 0.23840 (+0.06095)
     | > avg_loss_disc_real_3: 0.20453 (+0.07542)
     | > avg_loss_disc_real_4: 0.12302 (-0.04705)
     | > avg_loss_disc_real_5: 0.18492 (-0.01417)
     | > avg_loss_0: 2.19346 (-0.37553)
     | > avg_loss_gen: 2.35388 (+0.23498)
     | > avg_loss_kl: 2.05273 (+0.18805)
     | > avg_loss_feat: 3.76610 (+0.95510)
     | > avg_loss_mel: 24.52137 (+1.96592)
     | > avg_loss_duration: 1.67100 (+0.01237)
     | > avg_loss_1: 34.36507 (+3.35641)


 > EPOCH: 14/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:26:48)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 0/26 -- GLOBAL_STEP: 2575
     | > loss_disc: 2.59212  (2.59212)
     | > loss_disc_real_0: 0.29593  (0.29593)
     | > loss_disc_real_1: 0.24546  (0.24546)
     | > loss_disc_real_2: 0.24057  (0.24057)
     | > loss_disc_real_3: 0.25247  (0.25247)
     | > loss_disc_real_4: 0.20907  (0.20907)
     | > loss_disc_real_5: 0.26048  (0.26048)
     | > loss_0: 2.59212  (2.59212)
     | > grad_norm_0: 17.87387  (17.87387)
     | > loss_gen: 2.39905  (2.39905)
     | > loss_kl: 1.69861  (1.69861)
     | > loss_feat: 3.42240  (3.42240)
     | > loss_mel: 25.48267  (25.48267)
     | > loss_duration: 1.70020  (1.70020)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 34.70292  (34.70292)
     | > grad_norm_1: 134.75880  (134.75880)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.97920  (0.97917)
     | > loader_time: 0.72610  (0.72606)


   --> STEP: 25/26 -- GLOBAL_STEP: 2600
     | > loss_disc: 2.93231  (2.66843)
 



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.72636 (+0.24348)
     | > avg_loss_disc: 2.42748 (+0.23402)
     | > avg_loss_disc_real_0: 0.14148 (-0.02790)
     | > avg_loss_disc_real_1: 0.13920 (-0.06636)
     | > avg_loss_disc_real_2: 0.17689 (-0.06151)
     | > avg_loss_disc_real_3: 0.14707 (-0.05746)
     | > avg_loss_disc_real_4: 0.10144 (-0.02157)
     | > avg_loss_disc_real_5: 0.21825 (+0.03333)
     | > avg_loss_0: 2.42748 (+0.23402)
     | > avg_loss_gen: 1.69661 (-0.65728)
     | > avg_loss_kl: 1.23040 (-0.82233)
     | > avg_loss_feat: 3.28830 (-0.47780)
     | > avg_loss_mel: 28.10257 (+3.58120)
     | > avg_loss_duration: 1.65082 (-0.02017)
     | > avg_loss_1: 35.96870 (+1.60362)


 > EPOCH: 15/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:27:22)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 24/26 -- GLOBAL_STEP: 2625
     | > loss_disc: 2.72474  (2.64186)
     | > loss_disc_real_0: 0.16540  (0.20746)
     | > loss_disc_real_1: 0.24397  (0.20930)
     | > loss_disc_real_2: 0.25047  (0.24495)
     | > loss_disc_real_3: 0.25988  (0.24601)
     | > loss_disc_real_4: 0.31134  (0.23305)
     | > loss_disc_real_5: 0.27551  (0.24480)
     | > loss_0: 2.72474  (2.64186)
     | > grad_norm_0: 9.39457  (18.95376)
     | > loss_gen: 1.96339  (2.02703)
     | > loss_kl: 1.20298  (1.49521)
     | > loss_feat: 2.13446  (2.25011)
     | > loss_mel: 23.94029  (24.05515)
     | > loss_duration: 1.81309  (1.77303)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 31.05421  (31.60053)
     | > grad_norm_1: 195.28848  (204.75787)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.84780  (0.77587)
     | > loader_time: 0.00690  (0.00980)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.52824 (-0.19812)
     | > avg_loss_disc: 2.91303 (+0.48555)
     | > avg_loss_disc_real_0: 0.54530 (+0.40382)
     | > avg_loss_disc_real_1: 0.17162 (+0.03242)
     | > avg_loss_disc_real_2: 0.25604 (+0.07914)
     | > avg_loss_disc_real_3: 0.34155 (+0.19447)
     | > avg_loss_disc_real_4: 0.25420 (+0.15275)
     | > avg_loss_disc_real_5: 0.25207 (+0.03382)
     | > avg_loss_0: 2.91303 (+0.48555)
     | > avg_loss_gen: 2.26661 (+0.57001)
     | > avg_loss_kl: 1.53731 (+0.30691)
     | > avg_loss_feat: 1.65075 (-1.63755)
     | > avg_loss_mel: 23.44357 (-4.65900)
     | > avg_loss_duration: 1.66734 (+0.01652)
     | > avg_loss_1: 30.56558 (-5.40311)


 > EPOCH: 16/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:27:56)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 23/26 -- GLOBAL_STEP: 2650
     | > loss_disc: 2.60809  (2.64362)
     | > loss_disc_real_0: 0.17535  (0.19748)
     | > loss_disc_real_1: 0.19885  (0.21662)
     | > loss_disc_real_2: 0.23881  (0.24707)
     | > loss_disc_real_3: 0.23958  (0.24486)
     | > loss_disc_real_4: 0.25982  (0.23203)
     | > loss_disc_real_5: 0.22051  (0.24475)
     | > loss_0: 2.60809  (2.64362)
     | > grad_norm_0: 4.65061  (12.47512)
     | > loss_gen: 2.01448  (1.98604)
     | > loss_kl: 1.54969  (1.43938)
     | > loss_feat: 2.21146  (2.18946)
     | > loss_mel: 23.34823  (23.91680)
     | > loss_duration: 1.72481  (1.76973)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 30.84867  (31.30141)
     | > grad_norm_1: 256.46921  (178.81839)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 1.16180  (0.77505)
     | > loader_time: 0.00600  (0.00947)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.47574 (-0.05250)
     | > avg_loss_disc: 2.54058 (-0.37245)
     | > avg_loss_disc_real_0: 0.12164 (-0.42366)
     | > avg_loss_disc_real_1: 0.16772 (-0.00390)
     | > avg_loss_disc_real_2: 0.19284 (-0.06319)
     | > avg_loss_disc_real_3: 0.15348 (-0.18806)
     | > avg_loss_disc_real_4: 0.16326 (-0.09093)
     | > avg_loss_disc_real_5: 0.18965 (-0.06242)
     | > avg_loss_0: 2.54058 (-0.37245)
     | > avg_loss_gen: 1.59837 (-0.66824)
     | > avg_loss_kl: 1.82369 (+0.28637)
     | > avg_loss_feat: 2.41328 (+0.76253)
     | > avg_loss_mel: 24.87691 (+1.43334)
     | > avg_loss_duration: 1.66202 (-0.00533)
     | > avg_loss_1: 32.37426 (+1.80867)


[4m[1m > EPOCH: 17/100[0m
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

[1m > TRAINING (2023-04-28 21:28:31) [



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 22/26 -- GLOBAL_STEP: 2675
     | > loss_disc: 2.70178  (2.60702)
     | > loss_disc_real_0: 0.25475  (0.19434)
     | > loss_disc_real_1: 0.16223  (0.21035)
     | > loss_disc_real_2: 0.26051  (0.24399)
     | > loss_disc_real_3: 0.24073  (0.23515)
     | > loss_disc_real_4: 0.24397  (0.22803)
     | > loss_disc_real_5: 0.25094  (0.24808)
     | > loss_0: 2.70178  (2.60702)
     | > grad_norm_0: 7.98664  (10.82421)
     | > loss_gen: 1.94136  (2.00020)
     | > loss_kl: 1.41478  (1.40784)
     | > loss_feat: 2.00891  (2.24843)
     | > loss_mel: 23.52256  (23.47497)
     | > loss_duration: 1.73494  (1.77845)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 30.62254  (30.90988)
     | > grad_norm_1: 193.58046  (231.57690)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.69920  (0.76136)
     | > loader_time: 0.00590  (0.00861)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.49088 (+0.01514)
     | > avg_loss_disc: 2.74817 (+0.20759)
     | > avg_loss_disc_real_0: 0.21953 (+0.09789)
     | > avg_loss_disc_real_1: 0.21214 (+0.04442)
     | > avg_loss_disc_real_2: 0.19645 (+0.00361)
     | > avg_loss_disc_real_3: 0.21206 (+0.05858)
     | > avg_loss_disc_real_4: 0.51882 (+0.35556)
     | > avg_loss_disc_real_5: 0.25385 (+0.06419)
     | > avg_loss_0: 2.74817 (+0.20759)
     | > avg_loss_gen: 2.20259 (+0.60422)
     | > avg_loss_kl: 1.47537 (-0.34831)
     | > avg_loss_feat: 2.07837 (-0.33491)
     | > avg_loss_mel: 21.33174 (-3.54517)
     | > avg_loss_duration: 1.66677 (+0.00475)
     | > avg_loss_1: 28.75484 (-3.61941)

 > BEST MODEL : /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052/best_model_2679.pth

[4m[1m > EPOCH: 18/100[0m
 --> /cont



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 21/26 -- GLOBAL_STEP: 2700
     | > loss_disc: 2.54200  (2.67489)
     | > loss_disc_real_0: 0.24616  (0.18667)
     | > loss_disc_real_1: 0.18098  (0.23156)
     | > loss_disc_real_2: 0.21006  (0.24849)
     | > loss_disc_real_3: 0.20819  (0.24200)
     | > loss_disc_real_4: 0.22256  (0.26270)
     | > loss_disc_real_5: 0.21857  (0.24831)
     | > loss_0: 2.54200  (2.67489)
     | > grad_norm_0: 19.10000  (13.30800)
     | > loss_gen: 2.13030  (2.01098)
     | > loss_kl: 1.29714  (1.46566)
     | > loss_feat: 2.55339  (2.21162)
     | > loss_mel: 23.59073  (23.44823)
     | > loss_duration: 1.82841  (1.77610)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 31.39997  (30.91259)
     | > grad_norm_1: 315.15479  (205.91460)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.80430  (0.91664)
     | > loader_time: 0.00870  (0.01029)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.65196 (+0.16108)
     | > avg_loss_disc: 2.71225 (-0.03592)
     | > avg_loss_disc_real_0: 0.09942 (-0.12011)
     | > avg_loss_disc_real_1: 0.20714 (-0.00500)
     | > avg_loss_disc_real_2: 0.19740 (+0.00095)
     | > avg_loss_disc_real_3: 0.23691 (+0.02485)
     | > avg_loss_disc_real_4: 0.16014 (-0.35868)
     | > avg_loss_disc_real_5: 0.16883 (-0.08502)
     | > avg_loss_0: 2.71225 (-0.03592)
     | > avg_loss_gen: 1.62102 (-0.58158)
     | > avg_loss_kl: 1.43640 (-0.03897)
     | > avg_loss_feat: 4.16884 (+2.09047)
     | > avg_loss_mel: 26.56277 (+5.23104)
     | > avg_loss_duration: 1.68206 (+0.01529)
     | > avg_loss_1: 35.47109 (+6.71625)


[4m[1m > EPOCH: 19/100[0m
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

[1m > TRAINING (2023-04-28 21:29:56) [



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 20/26 -- GLOBAL_STEP: 2725
     | > loss_disc: 2.68090  (2.67734)
     | > loss_disc_real_0: 0.20178  (0.21055)
     | > loss_disc_real_1: 0.23593  (0.22710)
     | > loss_disc_real_2: 0.25822  (0.24710)
     | > loss_disc_real_3: 0.23803  (0.24640)
     | > loss_disc_real_4: 0.26176  (0.23342)
     | > loss_disc_real_5: 0.23330  (0.25105)
     | > loss_0: 2.68090  (2.67734)
     | > grad_norm_0: 6.29361  (16.52412)
     | > loss_gen: 1.66618  (1.99953)
     | > loss_kl: 1.31541  (1.45770)
     | > loss_feat: 1.86423  (2.13931)
     | > loss_mel: 22.33727  (23.50745)
     | > loss_duration: 1.80470  (1.77532)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 28.98778  (30.87932)
     | > grad_norm_1: 191.12904  (199.17987)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.67970  (0.76230)
     | > loader_time: 0.00580  (0.01071)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.70906 (+0.05710)
     | > avg_loss_disc: 2.37623 (-0.33602)
     | > avg_loss_disc_real_0: 0.25697 (+0.15755)
     | > avg_loss_disc_real_1: 0.23317 (+0.02603)
     | > avg_loss_disc_real_2: 0.22779 (+0.03039)
     | > avg_loss_disc_real_3: 0.33109 (+0.09417)
     | > avg_loss_disc_real_4: 0.23142 (+0.07128)
     | > avg_loss_disc_real_5: 0.19091 (+0.02208)
     | > avg_loss_0: 2.37623 (-0.33602)
     | > avg_loss_gen: 2.47552 (+0.85450)
     | > avg_loss_kl: 1.44795 (+0.01155)
     | > avg_loss_feat: 2.76212 (-1.40672)
     | > avg_loss_mel: 23.99981 (-2.56296)
     | > avg_loss_duration: 1.65522 (-0.02684)
     | > avg_loss_1: 32.34062 (-3.13047)


[4m[1m > EPOCH: 20/100[0m
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

[1m > TRAINING (2023-04-28 21:30:29) [



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 19/26 -- GLOBAL_STEP: 2750
     | > loss_disc: 2.65178  (2.61173)
     | > loss_disc_real_0: 0.22568  (0.19070)
     | > loss_disc_real_1: 0.27406  (0.21341)
     | > loss_disc_real_2: 0.23998  (0.23822)
     | > loss_disc_real_3: 0.25158  (0.24801)
     | > loss_disc_real_4: 0.30437  (0.23539)
     | > loss_disc_real_5: 0.28437  (0.24856)
     | > loss_0: 2.65178  (2.61173)
     | > grad_norm_0: 7.86295  (10.85916)
     | > loss_gen: 1.96794  (1.98803)
     | > loss_kl: 1.36261  (1.48602)
     | > loss_feat: 2.08391  (2.24652)
     | > loss_mel: 23.56423  (23.98683)
     | > loss_duration: 1.79200  (1.77411)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 30.77068  (31.48152)
     | > grad_norm_1: 344.48651  (222.71458)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.77520  (0.75973)
     | > loader_time: 0.00680  (0.00957)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.47798 (-0.23108)
     | > avg_loss_disc: 2.58544 (+0.20921)
     | > avg_loss_disc_real_0: 0.28235 (+0.02539)
     | > avg_loss_disc_real_1: 0.13895 (-0.09422)
     | > avg_loss_disc_real_2: 0.16951 (-0.05828)
     | > avg_loss_disc_real_3: 0.20251 (-0.12857)
     | > avg_loss_disc_real_4: 0.16917 (-0.06226)
     | > avg_loss_disc_real_5: 0.19086 (-0.00005)
     | > avg_loss_0: 2.58544 (+0.20921)
     | > avg_loss_gen: 1.80713 (-0.66839)
     | > avg_loss_kl: 1.96953 (+0.52158)
     | > avg_loss_feat: 2.40190 (-0.36022)
     | > avg_loss_mel: 22.80750 (-1.19231)
     | > avg_loss_duration: 1.66650 (+0.01128)
     | > avg_loss_1: 30.65256 (-1.68806)


[4m[1m > EPOCH: 21/100[0m
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

[1m > TRAINING (2023-04-28 21:31:05) [



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 18/26 -- GLOBAL_STEP: 2775
     | > loss_disc: 2.43943  (2.62203)
     | > loss_disc_real_0: 0.20908  (0.18800)
     | > loss_disc_real_1: 0.20307  (0.22266)
     | > loss_disc_real_2: 0.23103  (0.24791)
     | > loss_disc_real_3: 0.23698  (0.23847)
     | > loss_disc_real_4: 0.16134  (0.21985)
     | > loss_disc_real_5: 0.26697  (0.24988)
     | > loss_0: 2.43943  (2.62203)
     | > grad_norm_0: 10.02452  (12.82694)
     | > loss_gen: 2.18544  (2.02822)
     | > loss_kl: 1.04543  (1.46240)
     | > loss_feat: 2.72243  (2.29996)
     | > loss_mel: 24.88552  (23.07959)
     | > loss_duration: 1.82070  (1.77570)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 32.65952  (30.64587)
     | > grad_norm_1: 237.79163  (228.40877)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.79560  (0.77133)
     | > loader_time: 0.00690  (0.00990)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.47507 (-0.00292)
     | > avg_loss_disc: 2.49662 (-0.08882)
     | > avg_loss_disc_real_0: 0.19430 (-0.08805)
     | > avg_loss_disc_real_1: 0.10324 (-0.03571)
     | > avg_loss_disc_real_2: 0.19098 (+0.02148)
     | > avg_loss_disc_real_3: 0.27552 (+0.07301)
     | > avg_loss_disc_real_4: 0.24194 (+0.07278)
     | > avg_loss_disc_real_5: 0.23588 (+0.04502)
     | > avg_loss_0: 2.49662 (-0.08882)
     | > avg_loss_gen: 1.96050 (+0.15337)
     | > avg_loss_kl: 1.71572 (-0.25382)
     | > avg_loss_feat: 3.03887 (+0.63697)
     | > avg_loss_mel: 24.39992 (+1.59242)
     | > avg_loss_duration: 1.65061 (-0.01589)
     | > avg_loss_1: 32.76561 (+2.11305)


[4m[1m > EPOCH: 22/100[0m
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

[1m > TRAINING (2023-04-28 21:31:38) [



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 17/26 -- GLOBAL_STEP: 2800
     | > loss_disc: 2.72670  (2.59639)
     | > loss_disc_real_0: 0.16684  (0.18850)
     | > loss_disc_real_1: 0.20075  (0.20941)
     | > loss_disc_real_2: 0.26489  (0.24031)
     | > loss_disc_real_3: 0.27395  (0.24158)
     | > loss_disc_real_4: 0.26021  (0.23581)
     | > loss_disc_real_5: 0.25423  (0.24885)
     | > loss_0: 2.72670  (2.59639)
     | > grad_norm_0: 8.20823  (15.42239)
     | > loss_gen: 2.21962  (2.04300)
     | > loss_kl: 1.33654  (1.49473)
     | > loss_feat: 2.44283  (2.37647)
     | > loss_mel: 25.35480  (23.27780)
     | > loss_duration: 1.79357  (1.77337)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 33.14735  (30.96537)
     | > grad_norm_1: 105.06622  (194.33711)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.68720  (0.78292)
     | > loader_time: 0.00690  (0.00928)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.48115 (+0.00608)
     | > avg_loss_disc: 2.43163 (-0.06499)
     | > avg_loss_disc_real_0: 0.32102 (+0.12672)
     | > avg_loss_disc_real_1: 0.28135 (+0.17810)
     | > avg_loss_disc_real_2: 0.24596 (+0.05498)
     | > avg_loss_disc_real_3: 0.17102 (-0.10450)
     | > avg_loss_disc_real_4: 0.28248 (+0.04053)
     | > avg_loss_disc_real_5: 0.19820 (-0.03768)
     | > avg_loss_0: 2.43163 (-0.06499)
     | > avg_loss_gen: 2.65067 (+0.69017)
     | > avg_loss_kl: 1.46231 (-0.25341)
     | > avg_loss_feat: 3.17912 (+0.14026)
     | > avg_loss_mel: 25.91936 (+1.51944)
     | > avg_loss_duration: 1.66201 (+0.01139)
     | > avg_loss_1: 34.87347 (+2.10786)


[4m[1m > EPOCH: 23/100[0m
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

[1m > TRAINING (2023-04-28 21:32:13) [



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 16/26 -- GLOBAL_STEP: 2825
     | > loss_disc: 2.51543  (2.60306)
     | > loss_disc_real_0: 0.13419  (0.18048)
     | > loss_disc_real_1: 0.18754  (0.20686)
     | > loss_disc_real_2: 0.18132  (0.24771)
     | > loss_disc_real_3: 0.16981  (0.23753)
     | > loss_disc_real_4: 0.26909  (0.24875)
     | > loss_disc_real_5: 0.21356  (0.25054)
     | > loss_0: 2.51543  (2.60306)
     | > grad_norm_0: 16.15522  (11.99712)
     | > loss_gen: 2.11859  (2.07132)
     | > loss_kl: 1.39999  (1.50762)
     | > loss_feat: 2.98685  (2.50894)
     | > loss_mel: 26.09316  (23.95554)
     | > loss_duration: 1.80478  (1.77110)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 34.40336  (31.81452)
     | > grad_norm_1: 190.51585  (210.14737)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.72650  (0.78571)
     | > loader_time: 0.01030  (0.01101)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.75062 (+0.26948)
     | > avg_loss_disc: 2.96939 (+0.53776)
     | > avg_loss_disc_real_0: 0.65492 (+0.33390)
     | > avg_loss_disc_real_1: 0.22300 (-0.05834)
     | > avg_loss_disc_real_2: 0.29948 (+0.05352)
     | > avg_loss_disc_real_3: 0.35485 (+0.18382)
     | > avg_loss_disc_real_4: 0.24459 (-0.03788)
     | > avg_loss_disc_real_5: 0.25085 (+0.05265)
     | > avg_loss_0: 2.96939 (+0.53776)
     | > avg_loss_gen: 2.76060 (+0.10993)
     | > avg_loss_kl: 1.90569 (+0.44338)
     | > avg_loss_feat: 1.87221 (-1.30691)
     | > avg_loss_mel: 20.82895 (-5.09041)
     | > avg_loss_duration: 1.65556 (-0.00645)
     | > avg_loss_1: 29.02300 (-5.85046)


[4m[1m > EPOCH: 24/100[0m
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

[1m > TRAINING (2023-04-28 21:32:47) [



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 15/26 -- GLOBAL_STEP: 2850
     | > loss_disc: 2.75726  (2.63166)
     | > loss_disc_real_0: 0.16303  (0.17781)
     | > loss_disc_real_1: 0.24293  (0.21301)
     | > loss_disc_real_2: 0.23238  (0.24649)
     | > loss_disc_real_3: 0.24688  (0.25380)
     | > loss_disc_real_4: 0.28822  (0.24585)
     | > loss_disc_real_5: 0.23554  (0.25060)
     | > loss_0: 2.75726  (2.63166)
     | > grad_norm_0: 11.30067  (15.28273)
     | > loss_gen: 1.83142  (2.02054)
     | > loss_kl: 1.28239  (1.46007)
     | > loss_feat: 1.68208  (2.31456)
     | > loss_mel: 21.73529  (23.47960)
     | > loss_duration: 1.84550  (1.76876)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 28.37668  (31.04354)
     | > grad_norm_1: 151.65414  (216.18266)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.72970  (0.76336)
     | > loader_time: 0.00760  (0.01018)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.49032 (-0.26031)
     | > avg_loss_disc: 2.32621 (-0.64318)
     | > avg_loss_disc_real_0: 0.14444 (-0.51048)
     | > avg_loss_disc_real_1: 0.14915 (-0.07386)
     | > avg_loss_disc_real_2: 0.29171 (-0.00777)
     | > avg_loss_disc_real_3: 0.33147 (-0.02337)
     | > avg_loss_disc_real_4: 0.27000 (+0.02540)
     | > avg_loss_disc_real_5: 0.13106 (-0.11979)
     | > avg_loss_0: 2.32621 (-0.64318)
     | > avg_loss_gen: 2.30349 (-0.45711)
     | > avg_loss_kl: 1.59936 (-0.30632)
     | > avg_loss_feat: 3.40900 (+1.53680)
     | > avg_loss_mel: 27.51772 (+6.68877)
     | > avg_loss_duration: 1.67796 (+0.02240)
     | > avg_loss_1: 36.50753 (+7.48453)


[4m[1m > EPOCH: 25/100[0m
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

[1m > TRAINING (2023-04-28 21:33:23) [



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 14/26 -- GLOBAL_STEP: 2875
     | > loss_disc: 2.64518  (2.69202)
     | > loss_disc_real_0: 0.09187  (0.17231)
     | > loss_disc_real_1: 0.24120  (0.23255)
     | > loss_disc_real_2: 0.22452  (0.25925)
     | > loss_disc_real_3: 0.23773  (0.24016)
     | > loss_disc_real_4: 0.21440  (0.24449)
     | > loss_disc_real_5: 0.24618  (0.26008)
     | > loss_0: 2.64518  (2.69202)
     | > grad_norm_0: 29.33607  (9.71169)
     | > loss_gen: 2.12284  (1.97806)
     | > loss_kl: 1.59856  (1.44310)
     | > loss_feat: 2.02829  (2.18338)
     | > loss_mel: 22.27712  (23.73664)
     | > loss_duration: 1.83725  (1.76450)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 29.86407  (31.10568)
     | > grad_norm_1: 281.17902  (179.49570)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.87130  (0.75881)
     | > loader_time: 0.02320  (0.00975)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.50145 (+0.01113)
     | > avg_loss_disc: 2.88599 (+0.55978)
     | > avg_loss_disc_real_0: 0.27332 (+0.12888)
     | > avg_loss_disc_real_1: 0.18399 (+0.03485)
     | > avg_loss_disc_real_2: 0.21520 (-0.07651)
     | > avg_loss_disc_real_3: 0.18725 (-0.14422)
     | > avg_loss_disc_real_4: 0.27980 (+0.00981)
     | > avg_loss_disc_real_5: 0.18985 (+0.05879)
     | > avg_loss_0: 2.88599 (+0.55978)
     | > avg_loss_gen: 1.59895 (-0.70454)
     | > avg_loss_kl: 1.55829 (-0.04107)
     | > avg_loss_feat: 1.36305 (-2.04596)
     | > avg_loss_mel: 20.52205 (-6.99567)
     | > avg_loss_duration: 1.66633 (-0.01162)
     | > avg_loss_1: 26.70867 (-9.79886)

 > BEST MODEL : /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052/best_model_2887.pth

[4m[1m > EPOCH: 26/100[0m
 --> /cont



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 13/26 -- GLOBAL_STEP: 2900
     | > loss_disc: 2.67262  (2.55889)
     | > loss_disc_real_0: 0.19648  (0.17740)
     | > loss_disc_real_1: 0.22580  (0.21026)
     | > loss_disc_real_2: 0.24648  (0.23766)
     | > loss_disc_real_3: 0.24629  (0.23924)
     | > loss_disc_real_4: 0.22656  (0.22731)
     | > loss_disc_real_5: 0.20933  (0.24945)
     | > loss_0: 2.67262  (2.55889)
     | > grad_norm_0: 6.32674  (12.07480)
     | > loss_gen: 1.85390  (2.07201)
     | > loss_kl: 1.62181  (1.40991)
     | > loss_feat: 2.20076  (2.48551)
     | > loss_mel: 22.60306  (23.53851)
     | > loss_duration: 1.92209  (1.75883)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 30.20163  (31.26477)
     | > grad_norm_1: 248.25851  (191.04359)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.87910  (0.90663)
     | > loader_time: 0.02750  (0.00987)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.63430 (+0.13285)
     | > avg_loss_disc: 3.06943 (+0.18344)
     | > avg_loss_disc_real_0: 0.80401 (+0.53069)
     | > avg_loss_disc_real_1: 0.17723 (-0.00677)
     | > avg_loss_disc_real_2: 0.23129 (+0.01609)
     | > avg_loss_disc_real_3: 0.22612 (+0.03887)
     | > avg_loss_disc_real_4: 0.25604 (-0.02376)
     | > avg_loss_disc_real_5: 0.27858 (+0.08873)
     | > avg_loss_0: 3.06943 (+0.18344)
     | > avg_loss_gen: 2.75262 (+1.15368)
     | > avg_loss_kl: 1.32642 (-0.23187)
     | > avg_loss_feat: 2.03262 (+0.66957)
     | > avg_loss_mel: 23.58642 (+3.06437)
     | > avg_loss_duration: 1.65531 (-0.01103)
     | > avg_loss_1: 31.35339 (+4.64472)


 > EPOCH: 27/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:34:48)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 12/26 -- GLOBAL_STEP: 2925
     | > loss_disc: 2.73657  (2.62134)
     | > loss_disc_real_0: 0.21142  (0.19192)
     | > loss_disc_real_1: 0.21175  (0.21635)
     | > loss_disc_real_2: 0.25121  (0.23520)
     | > loss_disc_real_3: 0.21352  (0.24039)
     | > loss_disc_real_4: 0.24996  (0.23996)
     | > loss_disc_real_5: 0.26268  (0.24395)
     | > loss_0: 2.73657  (2.62134)
     | > grad_norm_0: 9.02101  (17.46781)
     | > loss_gen: 2.06383  (2.12282)
     | > loss_kl: 1.32632  (1.47707)
     | > loss_feat: 2.05139  (2.60987)
     | > loss_mel: 22.32138  (24.13408)
     | > loss_duration: 1.81259  (1.74856)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 29.57550  (32.09240)
     | > grad_norm_1: 234.71947  (134.60123)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.70030  (0.80247)
     | > loader_time: 0.00760  (0.00872)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.77577 (+0.14148)
     | > avg_loss_disc: 2.35571 (-0.71372)
     | > avg_loss_disc_real_0: 0.12345 (-0.68055)
     | > avg_loss_disc_real_1: 0.12636 (-0.05086)
     | > avg_loss_disc_real_2: 0.17166 (-0.05963)
     | > avg_loss_disc_real_3: 0.16448 (-0.06164)
     | > avg_loss_disc_real_4: 0.23841 (-0.01763)
     | > avg_loss_disc_real_5: 0.22146 (-0.05711)
     | > avg_loss_0: 2.35571 (-0.71372)
     | > avg_loss_gen: 1.99287 (-0.75975)
     | > avg_loss_kl: 1.64790 (+0.32147)
     | > avg_loss_feat: 3.43201 (+1.39939)
     | > avg_loss_mel: 24.28557 (+0.69915)
     | > avg_loss_duration: 1.66638 (+0.01107)
     | > avg_loss_1: 33.02472 (+1.67133)


 > EPOCH: 28/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:35:22)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 11/26 -- GLOBAL_STEP: 2950
     | > loss_disc: 2.60225  (2.63024)
     | > loss_disc_real_0: 0.23485  (0.19163)
     | > loss_disc_real_1: 0.17961  (0.21860)
     | > loss_disc_real_2: 0.23459  (0.23907)
     | > loss_disc_real_3: 0.28882  (0.24090)
     | > loss_disc_real_4: 0.24360  (0.21970)
     | > loss_disc_real_5: 0.20602  (0.24343)
     | > loss_0: 2.60225  (2.63024)
     | > grad_norm_0: 6.90396  (10.61746)
     | > loss_gen: 2.27141  (2.03559)
     | > loss_kl: 1.21764  (1.44684)
     | > loss_feat: 2.52728  (2.41456)
     | > loss_mel: 22.23238  (23.80138)
     | > loss_duration: 1.72886  (1.73645)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 29.97756  (31.43483)
     | > grad_norm_1: 203.27017  (181.06929)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.71230  (0.79221)
     | > loader_time: 0.00880  (0.00848)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.54553 (-0.23024)
     | > avg_loss_disc: 2.79402 (+0.43831)
     | > avg_loss_disc_real_0: 0.46903 (+0.34557)
     | > avg_loss_disc_real_1: 0.14655 (+0.02018)
     | > avg_loss_disc_real_2: 0.29395 (+0.12229)
     | > avg_loss_disc_real_3: 0.35447 (+0.18999)
     | > avg_loss_disc_real_4: 0.42756 (+0.18915)
     | > avg_loss_disc_real_5: 0.28883 (+0.06737)
     | > avg_loss_0: 2.79402 (+0.43831)
     | > avg_loss_gen: 2.85263 (+0.85976)
     | > avg_loss_kl: 1.72292 (+0.07502)
     | > avg_loss_feat: 2.30280 (-1.12920)
     | > avg_loss_mel: 25.27821 (+0.99264)
     | > avg_loss_duration: 1.67542 (+0.00904)
     | > avg_loss_1: 33.83198 (+0.80727)


 > EPOCH: 29/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:35:57)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 10/26 -- GLOBAL_STEP: 2975
     | > loss_disc: 2.53737  (2.62502)
     | > loss_disc_real_0: 0.12391  (0.18987)
     | > loss_disc_real_1: 0.27563  (0.21274)
     | > loss_disc_real_2: 0.27962  (0.25160)
     | > loss_disc_real_3: 0.23583  (0.24568)
     | > loss_disc_real_4: 0.21501  (0.23011)
     | > loss_disc_real_5: 0.27076  (0.26204)
     | > loss_0: 2.53737  (2.62502)
     | > grad_norm_0: 8.76891  (15.93456)
     | > loss_gen: 2.09251  (2.04012)
     | > loss_kl: 1.54441  (1.48102)
     | > loss_feat: 2.48617  (2.53363)
     | > loss_mel: 23.14991  (24.78917)
     | > loss_duration: 1.72556  (1.73882)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 30.99856  (32.58276)
     | > grad_norm_1: 204.97166  (180.52257)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.74130  (0.73432)
     | > loader_time: 0.00790  (0.00940)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.48340 (-0.06213)
     | > avg_loss_disc: 2.50636 (-0.28766)
     | > avg_loss_disc_real_0: 0.03463 (-0.43440)
     | > avg_loss_disc_real_1: 0.30153 (+0.15498)
     | > avg_loss_disc_real_2: 0.23581 (-0.05814)
     | > avg_loss_disc_real_3: 0.26274 (-0.09173)
     | > avg_loss_disc_real_4: 0.22549 (-0.20208)
     | > avg_loss_disc_real_5: 0.25032 (-0.03851)
     | > avg_loss_0: 2.50636 (-0.28766)
     | > avg_loss_gen: 2.15330 (-0.69933)
     | > avg_loss_kl: 1.86710 (+0.14418)
     | > avg_loss_feat: 2.85655 (+0.55375)
     | > avg_loss_mel: 23.10999 (-2.16822)
     | > avg_loss_duration: 1.62098 (-0.05444)
     | > avg_loss_1: 31.60792 (-2.22406)


 > EPOCH: 30/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:36:32)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 9/26 -- GLOBAL_STEP: 3000
     | > loss_disc: 2.54115  (2.55365)
     | > loss_disc_real_0: 0.18384  (0.20059)
     | > loss_disc_real_1: 0.19304  (0.23059)
     | > loss_disc_real_2: 0.23870  (0.23772)
     | > loss_disc_real_3: 0.24934  (0.24615)
     | > loss_disc_real_4: 0.22420  (0.22459)
     | > loss_disc_real_5: 0.29454  (0.25922)
     | > loss_0: 2.54115  (2.55365)
     | > grad_norm_0: 7.42733  (16.29092)
     | > loss_gen: 2.14724  (2.12644)
     | > loss_kl: 1.49684  (1.59402)
     | > loss_feat: 2.34140  (2.64830)
     | > loss_mel: 22.72805  (23.73387)
     | > loss_duration: 1.68787  (1.74298)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 30.40141  (31.84560)
     | > grad_norm_1: 186.55441  (181.93262)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.76510  (0.73626)
     | > loader_time: 0.00900  (0.00794)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.48363 (+0.00023)
     | > avg_loss_disc: 2.81016 (+0.30380)
     | > avg_loss_disc_real_0: 0.40162 (+0.36700)
     | > avg_loss_disc_real_1: 0.21241 (-0.08912)
     | > avg_loss_disc_real_2: 0.20764 (-0.02817)
     | > avg_loss_disc_real_3: 0.31586 (+0.05312)
     | > avg_loss_disc_real_4: 0.26256 (+0.03708)
     | > avg_loss_disc_real_5: 0.24442 (-0.00590)
     | > avg_loss_0: 2.81016 (+0.30380)
     | > avg_loss_gen: 2.12361 (-0.02970)
     | > avg_loss_kl: 1.08345 (-0.78365)
     | > avg_loss_feat: 1.50443 (-1.35212)
     | > avg_loss_mel: 20.43092 (-2.67907)
     | > avg_loss_duration: 1.66494 (+0.04396)
     | > avg_loss_1: 26.80735 (-4.80057)


 > EPOCH: 31/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:37:06)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 8/26 -- GLOBAL_STEP: 3025
     | > loss_disc: 2.64550  (2.63667)
     | > loss_disc_real_0: 0.25644  (0.18856)
     | > loss_disc_real_1: 0.21737  (0.21857)
     | > loss_disc_real_2: 0.17823  (0.24098)
     | > loss_disc_real_3: 0.20508  (0.23661)
     | > loss_disc_real_4: 0.26314  (0.23483)
     | > loss_disc_real_5: 0.22797  (0.25434)
     | > loss_0: 2.64550  (2.63667)
     | > grad_norm_0: 16.52052  (11.28617)
     | > loss_gen: 1.87330  (1.93547)
     | > loss_kl: 1.48820  (1.49574)
     | > loss_feat: 2.33368  (2.12206)
     | > loss_mel: 23.44549  (23.02891)
     | > loss_duration: 1.80062  (1.74999)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 30.94129  (30.33218)
     | > grad_norm_1: 263.16229  (257.24344)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.89220  (0.80478)
     | > loader_time: 0.00970  (0.00769)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.68043 (+0.19681)
     | > avg_loss_disc: 2.51746 (-0.29270)
     | > avg_loss_disc_real_0: 0.20113 (-0.20050)
     | > avg_loss_disc_real_1: 0.12088 (-0.09153)
     | > avg_loss_disc_real_2: 0.23854 (+0.03091)
     | > avg_loss_disc_real_3: 0.23953 (-0.07633)
     | > avg_loss_disc_real_4: 0.18668 (-0.07589)
     | > avg_loss_disc_real_5: 0.21225 (-0.03217)
     | > avg_loss_0: 2.51746 (-0.29270)
     | > avg_loss_gen: 1.90840 (-0.21521)
     | > avg_loss_kl: 1.58931 (+0.50586)
     | > avg_loss_feat: 2.61757 (+1.11314)
     | > avg_loss_mel: 20.76747 (+0.33655)
     | > avg_loss_duration: 1.66953 (+0.00459)
     | > avg_loss_1: 28.55227 (+1.74493)


 > EPOCH: 32/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:37:41)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 7/26 -- GLOBAL_STEP: 3050
     | > loss_disc: 2.61730  (2.59675)
     | > loss_disc_real_0: 0.18387  (0.18099)
     | > loss_disc_real_1: 0.21734  (0.19871)
     | > loss_disc_real_2: 0.21919  (0.24434)
     | > loss_disc_real_3: 0.23525  (0.25777)
     | > loss_disc_real_4: 0.20527  (0.22277)
     | > loss_disc_real_5: 0.25939  (0.24954)
     | > loss_0: 2.61730  (2.59675)
     | > grad_norm_0: 4.86254  (7.89994)
     | > loss_gen: 2.12161  (2.05984)
     | > loss_kl: 1.40546  (1.51669)
     | > loss_feat: 2.14989  (2.29726)
     | > loss_mel: 23.94451  (23.42156)
     | > loss_duration: 1.73417  (1.74371)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 31.35565  (31.03906)
     | > grad_norm_1: 178.04750  (222.99757)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.73030  (0.83743)
     | > loader_time: 0.00710  (0.00723)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.49059 (-0.18985)
     | > avg_loss_disc: 2.24066 (-0.27680)
     | > avg_loss_disc_real_0: 0.21727 (+0.01614)
     | > avg_loss_disc_real_1: 0.14668 (+0.02580)
     | > avg_loss_disc_real_2: 0.19938 (-0.03916)
     | > avg_loss_disc_real_3: 0.24741 (+0.00788)
     | > avg_loss_disc_real_4: 0.19430 (+0.00763)
     | > avg_loss_disc_real_5: 0.14167 (-0.07058)
     | > avg_loss_0: 2.24066 (-0.27680)
     | > avg_loss_gen: 2.45780 (+0.54941)
     | > avg_loss_kl: 1.41507 (-0.17423)
     | > avg_loss_feat: 4.44219 (+1.82462)
     | > avg_loss_mel: 27.35872 (+6.59126)
     | > avg_loss_duration: 1.67574 (+0.00621)
     | > avg_loss_1: 37.34954 (+8.79727)


 > EPOCH: 33/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:38:14)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 6/26 -- GLOBAL_STEP: 3075
     | > loss_disc: 2.67097  (2.57174)
     | > loss_disc_real_0: 0.16160  (0.16439)
     | > loss_disc_real_1: 0.26491  (0.21748)
     | > loss_disc_real_2: 0.33802  (0.25424)
     | > loss_disc_real_3: 0.30260  (0.23694)
     | > loss_disc_real_4: 0.20869  (0.26727)
     | > loss_disc_real_5: 0.29363  (0.22862)
     | > loss_0: 2.67097  (2.57174)
     | > grad_norm_0: 7.75995  (6.77942)
     | > loss_gen: 2.26016  (2.14600)
     | > loss_kl: 1.31277  (1.51653)
     | > loss_feat: 2.51493  (2.57854)
     | > loss_mel: 23.81860  (23.50931)
     | > loss_duration: 1.76778  (1.74442)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 31.67423  (31.49480)
     | > grad_norm_1: 250.21558  (204.39032)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.72710  (0.76417)
     | > loader_time: 0.00720  (0.00875)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.47533 (-0.01525)
     | > avg_loss_disc: 2.93386 (+0.69320)
     | > avg_loss_disc_real_0: 0.41033 (+0.19306)
     | > avg_loss_disc_real_1: 0.22922 (+0.08254)
     | > avg_loss_disc_real_2: 0.23078 (+0.03140)
     | > avg_loss_disc_real_3: 0.26396 (+0.01655)
     | > avg_loss_disc_real_4: 0.32427 (+0.12997)
     | > avg_loss_disc_real_5: 0.23865 (+0.09698)
     | > avg_loss_0: 2.93386 (+0.69320)
     | > avg_loss_gen: 1.96472 (-0.49308)
     | > avg_loss_kl: 1.86777 (+0.45270)
     | > avg_loss_feat: 1.70006 (-2.74213)
     | > avg_loss_mel: 21.15711 (-6.20161)
     | > avg_loss_duration: 1.64570 (-0.03005)
     | > avg_loss_1: 28.33537 (-9.01418)


 > EPOCH: 34/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:38:49)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 5/26 -- GLOBAL_STEP: 3100
     | > loss_disc: 2.61561  (2.73037)
     | > loss_disc_real_0: 0.29409  (0.21009)
     | > loss_disc_real_1: 0.17904  (0.24147)
     | > loss_disc_real_2: 0.28449  (0.25715)
     | > loss_disc_real_3: 0.18997  (0.26535)
     | > loss_disc_real_4: 0.30239  (0.24251)
     | > loss_disc_real_5: 0.25618  (0.24938)
     | > loss_0: 2.61561  (2.73037)
     | > grad_norm_0: 13.64611  (11.71760)
     | > loss_gen: 1.92978  (1.98188)
     | > loss_kl: 1.39331  (1.48229)
     | > loss_feat: 2.64617  (2.18793)
     | > loss_mel: 24.17492  (23.66819)
     | > loss_duration: 1.68908  (1.73537)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 31.83326  (31.05565)
     | > grad_norm_1: 196.98544  (215.22937)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.73120  (0.73098)
     | > loader_time: 0.00550  (0.00683)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.49876 (+0.02343)
     | > avg_loss_disc: 3.17990 (+0.24604)
     | > avg_loss_disc_real_0: 0.77397 (+0.36364)
     | > avg_loss_disc_real_1: 0.30042 (+0.07120)
     | > avg_loss_disc_real_2: 0.35422 (+0.12343)
     | > avg_loss_disc_real_3: 0.38788 (+0.12392)
     | > avg_loss_disc_real_4: 0.30168 (-0.02259)
     | > avg_loss_disc_real_5: 0.29456 (+0.05591)
     | > avg_loss_0: 3.17990 (+0.24604)
     | > avg_loss_gen: 3.04087 (+1.07615)
     | > avg_loss_kl: 1.51995 (-0.34782)
     | > avg_loss_feat: 1.51953 (-0.18053)
     | > avg_loss_mel: 20.17322 (-0.98389)
     | > avg_loss_duration: 1.65793 (+0.01223)
     | > avg_loss_1: 27.91150 (-0.42387)


 > EPOCH: 35/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:39:23)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 4/26 -- GLOBAL_STEP: 3125
     | > loss_disc: 2.67406  (2.83894)
     | > loss_disc_real_0: 0.23594  (0.24907)
     | > loss_disc_real_1: 0.18999  (0.21068)
     | > loss_disc_real_2: 0.16851  (0.25527)
     | > loss_disc_real_3: 0.18888  (0.25225)
     | > loss_disc_real_4: 0.21090  (0.29006)
     | > loss_disc_real_5: 0.18307  (0.23843)
     | > loss_0: 2.67406  (2.83894)
     | > grad_norm_0: 8.90100  (28.37074)
     | > loss_gen: 2.20614  (1.88140)
     | > loss_kl: 1.74455  (1.66600)
     | > loss_feat: 2.36117  (2.18430)
     | > loss_mel: 23.43653  (23.76802)
     | > loss_duration: 1.73898  (1.75473)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 31.48738  (31.25445)
     | > grad_norm_1: 212.25771  (154.06438)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.72560  (0.73361)
     | > loader_time: 0.00820  (0.00905)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.74290 (+0.24414)
     | > avg_loss_disc: 2.53419 (-0.64571)
     | > avg_loss_disc_real_0: 0.16117 (-0.61280)
     | > avg_loss_disc_real_1: 0.08091 (-0.21951)
     | > avg_loss_disc_real_2: 0.18075 (-0.17346)
     | > avg_loss_disc_real_3: 0.18956 (-0.19832)
     | > avg_loss_disc_real_4: 0.20808 (-0.09359)
     | > avg_loss_disc_real_5: 0.22716 (-0.06740)
     | > avg_loss_0: 2.53419 (-0.64571)
     | > avg_loss_gen: 1.62874 (-1.41212)
     | > avg_loss_kl: 1.60820 (+0.08824)
     | > avg_loss_feat: 3.01501 (+1.49548)
     | > avg_loss_mel: 25.13700 (+4.96379)
     | > avg_loss_duration: 1.69930 (+0.04137)
     | > avg_loss_1: 33.08826 (+5.17676)


 > EPOCH: 36/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:39:58)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 3/26 -- GLOBAL_STEP: 3150
     | > loss_disc: 2.63041  (2.63291)
     | > loss_disc_real_0: 0.18184  (0.20608)
     | > loss_disc_real_1: 0.29399  (0.23109)
     | > loss_disc_real_2: 0.27077  (0.22999)
     | > loss_disc_real_3: 0.24873  (0.23188)
     | > loss_disc_real_4: 0.24438  (0.22458)
     | > loss_disc_real_5: 0.23108  (0.23812)
     | > loss_0: 2.63041  (2.63291)
     | > grad_norm_0: 5.02330  (8.37891)
     | > loss_gen: 1.95271  (1.99468)
     | > loss_kl: 1.47061  (1.50060)
     | > loss_feat: 2.03406  (2.09241)
     | > loss_mel: 23.16169  (22.56177)
     | > loss_duration: 1.80177  (1.76924)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 30.42085  (29.91871)
     | > grad_norm_1: 384.55261  (300.53540)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.87490  (0.83920)
     | > loader_time: 0.00670  (0.01323)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.72754 (-0.01537)
     | > avg_loss_disc: 2.64675 (+0.11256)
     | > avg_loss_disc_real_0: 0.12185 (-0.03932)
     | > avg_loss_disc_real_1: 0.19538 (+0.11447)
     | > avg_loss_disc_real_2: 0.16385 (-0.01690)
     | > avg_loss_disc_real_3: 0.19357 (+0.00401)
     | > avg_loss_disc_real_4: 0.20198 (-0.00610)
     | > avg_loss_disc_real_5: 0.18992 (-0.03724)
     | > avg_loss_0: 2.64675 (+0.11256)
     | > avg_loss_gen: 1.62829 (-0.00045)
     | > avg_loss_kl: 1.47537 (-0.13283)
     | > avg_loss_feat: 3.32357 (+0.30856)
     | > avg_loss_mel: 25.17974 (+0.04273)
     | > avg_loss_duration: 1.64820 (-0.05110)
     | > avg_loss_1: 33.25517 (+0.16691)


 > EPOCH: 37/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:40:31)



   --> STEP: 2/26 -- GLOBAL_STEP: 3175
     | > loss_disc: 2.65650  (2.63053)
     | > loss_disc_real_0: 0.32356  (0.29553)
     | > loss_disc_real_1: 0.23644  (0.19614)
     | > loss_disc_real_2: 0.29617  (0.30316)
     | > loss_disc_real_3: 0.30420  (0.27620)
     | > loss_disc_real_4: 0.27326  (0.22728)
     | > loss_disc_real_5: 0.26609  (0.26333)
     | > loss_0: 2.65650  (2.63053)
     | > grad_norm_0: 20.31141  (18.72904)
     | > loss_gen: 2.22583  (2.32536)
     | > loss_kl: 1.63414  (1.54607)
     | > loss_feat: 2.60436  (2.51346)
     | > loss_mel: 24.05099  (25.16233)
     | > loss_duration: 1.75427  (1.74413)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 32.26959  (33.29136)
     | > grad_norm_1: 183.17348  (179.83408)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.91900  (0.92367)
     | > loader_time: 0.00710  (0.00691)


 > EVALUATION





 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.50947 (-0.21807)
     | > avg_loss_disc: 2.32069 (-0.32606)
     | > avg_loss_disc_real_0: 0.18551 (+0.06366)
     | > avg_loss_disc_real_1: 0.12594 (-0.06944)
     | > avg_loss_disc_real_2: 0.09937 (-0.06448)
     | > avg_loss_disc_real_3: 0.17439 (-0.01918)
     | > avg_loss_disc_real_4: 0.11446 (-0.08752)
     | > avg_loss_disc_real_5: 0.14543 (-0.04449)
     | > avg_loss_0: 2.32069 (-0.32606)
     | > avg_loss_gen: 2.09081 (+0.46252)
     | > avg_loss_kl: 1.76025 (+0.28488)
     | > avg_loss_feat: 4.44709 (+1.12352)
     | > avg_loss_mel: 25.28371 (+0.10397)
     | > avg_loss_duration: 1.67646 (+0.02826)
     | > avg_loss_1: 35.25832 (+2.00315)


 > EPOCH: 38/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:41:07)



   --> STEP: 1/26 -- GLOBAL_STEP: 3200
     | > loss_disc: 2.69583  (2.69583)
     | > loss_disc_real_0: 0.22712  (0.22712)
     | > loss_disc_real_1: 0.24493  (0.24493)
     | > loss_disc_real_2: 0.23725  (0.23725)
     | > loss_disc_real_3: 0.28306  (0.28306)
     | > loss_disc_real_4: 0.23734  (0.23734)
     | > loss_disc_real_5: 0.24363  (0.24363)
     | > loss_0: 2.69583  (2.69583)
     | > grad_norm_0: 6.53823  (6.53823)
     | > loss_gen: 1.85563  (1.85563)
     | > loss_kl: 1.80907  (1.80907)
     | > loss_feat: 1.90788  (1.90788)
     | > loss_mel: 22.19630  (22.19630)
     | > loss_duration: 1.73432  (1.73432)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 29.50321  (29.50321)
     | > grad_norm_1: 229.55370  (229.55370)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.74970  (0.74969)
     | > loader_time: 0.00680  (0.00685)


 > EVALUATION





 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.47519 (-0.03428)
     | > avg_loss_disc: 2.60189 (+0.28120)
     | > avg_loss_disc_real_0: 0.15311 (-0.03240)
     | > avg_loss_disc_real_1: 0.17668 (+0.05074)
     | > avg_loss_disc_real_2: 0.19380 (+0.09444)
     | > avg_loss_disc_real_3: 0.25084 (+0.07645)
     | > avg_loss_disc_real_4: 0.24766 (+0.13320)
     | > avg_loss_disc_real_5: 0.23554 (+0.09011)
     | > avg_loss_0: 2.60189 (+0.28120)
     | > avg_loss_gen: 1.89051 (-0.20030)
     | > avg_loss_kl: 1.66616 (-0.09409)
     | > avg_loss_feat: 2.42327 (-2.02382)
     | > avg_loss_mel: 23.22198 (-2.06173)
     | > avg_loss_duration: 1.66167 (-0.01479)
     | > avg_loss_1: 30.86359 (-4.39474)


 > EPOCH: 39/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:41:41)



   --> STEP: 0/26 -- GLOBAL_STEP: 3225
     | > loss_disc: 2.39853  (2.39853)
     | > loss_disc_real_0: 0.11751  (0.11751)
     | > loss_disc_real_1: 0.15558  (0.15558)
     | > loss_disc_real_2: 0.16219  (0.16219)
     | > loss_disc_real_3: 0.19660  (0.19660)
     | > loss_disc_real_4: 0.21511  (0.21511)
     | > loss_disc_real_5: 0.20459  (0.20459)
     | > loss_0: 2.39853  (2.39853)
     | > grad_norm_0: 9.91991  (9.91991)
     | > loss_gen: 2.20251  (2.20251)
     | > loss_kl: 1.82652  (1.82652)
     | > loss_feat: 3.94177  (3.94177)
     | > loss_mel: 27.56564  (27.56564)
     | > loss_duration: 1.74459  (1.74459)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 37.28103  (37.28103)
     | > grad_norm_1: 195.86511  (195.86511)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.97700  (0.97704)
     | > loader_time: 0.70310  (0.70315)


   --> STEP: 25/26 -- GLOBAL_STEP: 3250
     | > loss_disc: 2.08525  (2.57773)
   



 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.62019 (+0.14500)
     | > avg_loss_disc: 2.54097 (-0.06092)
     | > avg_loss_disc_real_0: 0.25147 (+0.09835)
     | > avg_loss_disc_real_1: 0.14165 (-0.03502)
     | > avg_loss_disc_real_2: 0.23069 (+0.03688)
     | > avg_loss_disc_real_3: 0.24085 (-0.01000)
     | > avg_loss_disc_real_4: 0.16569 (-0.08197)
     | > avg_loss_disc_real_5: 0.25558 (+0.02004)
     | > avg_loss_0: 2.54097 (-0.06092)
     | > avg_loss_gen: 2.01369 (+0.12318)
     | > avg_loss_kl: 1.59189 (-0.07427)
     | > avg_loss_feat: 3.18076 (+0.75750)
     | > avg_loss_mel: 22.94765 (-0.27433)
     | > avg_loss_duration: 1.64748 (-0.01419)
     | > avg_loss_1: 31.38147 (+0.51789)


 > EPOCH: 40/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:42:16)



   --> STEP: 24/26 -- GLOBAL_STEP: 3275
     | > loss_disc: 2.74897  (2.63401)
     | > loss_disc_real_0: 0.39525  (0.20359)
     | > loss_disc_real_1: 0.31786  (0.22182)
     | > loss_disc_real_2: 0.31678  (0.24675)
     | > loss_disc_real_3: 0.30608  (0.24961)
     | > loss_disc_real_4: 0.27388  (0.23675)
     | > loss_disc_real_5: 0.27099  (0.24648)
     | > loss_0: 2.74897  (2.63401)
     | > grad_norm_0: 45.28003  (18.73160)
     | > loss_gen: 2.34085  (2.12765)
     | > loss_kl: 1.15472  (1.46095)
     | > loss_feat: 2.17642  (2.58742)
     | > loss_mel: 22.29465  (24.04506)
     | > loss_duration: 1.81026  (1.78023)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 29.77690  (32.00131)
     | > grad_norm_1: 136.94461  (189.74850)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.79810  (0.76759)
     | > loader_time: 0.00620  (0.00812)


 > EVALUATION





 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.76223 (+0.14204)
     | > avg_loss_disc: 2.53605 (-0.00492)
     | > avg_loss_disc_real_0: 0.12217 (-0.12930)
     | > avg_loss_disc_real_1: 0.12122 (-0.02044)
     | > avg_loss_disc_real_2: 0.16628 (-0.06441)
     | > avg_loss_disc_real_3: 0.14089 (-0.09995)
     | > avg_loss_disc_real_4: 0.18733 (+0.02164)
     | > avg_loss_disc_real_5: 0.15923 (-0.09635)
     | > avg_loss_0: 2.53605 (-0.00492)
     | > avg_loss_gen: 1.66771 (-0.34598)
     | > avg_loss_kl: 2.06609 (+0.47420)
     | > avg_loss_feat: 3.72101 (+0.54025)
     | > avg_loss_mel: 25.51881 (+2.57117)
     | > avg_loss_duration: 1.67333 (+0.02585)
     | > avg_loss_1: 34.64696 (+3.26549)


 > EPOCH: 41/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:42:49)



   --> STEP: 23/26 -- GLOBAL_STEP: 3300
     | > loss_disc: 2.61059  (2.60723)
     | > loss_disc_real_0: 0.27329  (0.20151)
     | > loss_disc_real_1: 0.18880  (0.21287)
     | > loss_disc_real_2: 0.28880  (0.23874)
     | > loss_disc_real_3: 0.22674  (0.24658)
     | > loss_disc_real_4: 0.27678  (0.22906)
     | > loss_disc_real_5: 0.23149  (0.24678)
     | > loss_0: 2.61059  (2.60723)
     | > grad_norm_0: 14.92385  (12.15017)
     | > loss_gen: 2.13858  (2.05283)
     | > loss_kl: 1.14109  (1.46663)
     | > loss_feat: 2.35219  (2.41560)
     | > loss_mel: 23.17537  (23.38355)
     | > loss_duration: 1.73131  (1.77433)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 30.53855  (31.09295)
     | > grad_norm_1: 154.01810  (187.37283)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.82300  (0.75809)
     | > loader_time: 0.00670  (0.00767)


 > EVALUATION





 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.47975 (-0.28248)
     | > avg_loss_disc: 2.47325 (-0.06280)
     | > avg_loss_disc_real_0: 0.05170 (-0.07046)
     | > avg_loss_disc_real_1: 0.09599 (-0.02523)
     | > avg_loss_disc_real_2: 0.18380 (+0.01751)
     | > avg_loss_disc_real_3: 0.23790 (+0.09701)
     | > avg_loss_disc_real_4: 0.12547 (-0.06186)
     | > avg_loss_disc_real_5: 0.23011 (+0.07088)
     | > avg_loss_0: 2.47325 (-0.06280)
     | > avg_loss_gen: 1.65959 (-0.00812)
     | > avg_loss_kl: 1.31252 (-0.75357)
     | > avg_loss_feat: 2.84236 (-0.87865)
     | > avg_loss_mel: 22.95301 (-2.56580)
     | > avg_loss_duration: 1.66821 (-0.00512)
     | > avg_loss_1: 30.43570 (-4.21126)


 > EPOCH: 42/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:43:24)



   --> STEP: 22/26 -- GLOBAL_STEP: 3325
     | > loss_disc: 2.63247  (2.59546)
     | > loss_disc_real_0: 0.23153  (0.20348)
     | > loss_disc_real_1: 0.18512  (0.20950)
     | > loss_disc_real_2: 0.20631  (0.23760)
     | > loss_disc_real_3: 0.19043  (0.23298)
     | > loss_disc_real_4: 0.21728  (0.22985)
     | > loss_disc_real_5: 0.23404  (0.24354)
     | > loss_0: 2.63247  (2.59546)
     | > grad_norm_0: 10.77635  (14.80196)
     | > loss_gen: 2.03852  (2.03182)
     | > loss_kl: 1.60315  (1.42607)
     | > loss_feat: 2.38244  (2.41620)
     | > loss_mel: 23.12143  (23.11810)
     | > loss_duration: 1.74354  (1.77896)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 30.88908  (30.77116)
     | > grad_norm_1: 223.22267  (220.73280)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.70220  (0.75601)
     | > loader_time: 0.00580  (0.00973)


 > EVALUATION





 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.50051 (+0.02076)
     | > avg_loss_disc: 2.63046 (+0.15720)
     | > avg_loss_disc_real_0: 0.36317 (+0.31147)
     | > avg_loss_disc_real_1: 0.21777 (+0.12178)
     | > avg_loss_disc_real_2: 0.25415 (+0.07036)
     | > avg_loss_disc_real_3: 0.18339 (-0.05451)
     | > avg_loss_disc_real_4: 0.20549 (+0.08003)
     | > avg_loss_disc_real_5: 0.22714 (-0.00297)
     | > avg_loss_0: 2.63046 (+0.15720)
     | > avg_loss_gen: 2.27781 (+0.61821)
     | > avg_loss_kl: 1.49028 (+0.17775)
     | > avg_loss_feat: 2.76714 (-0.07522)
     | > avg_loss_mel: 25.27506 (+2.32205)
     | > avg_loss_duration: 1.66616 (-0.00204)
     | > avg_loss_1: 33.47645 (+3.04075)


 > EPOCH: 43/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:43:58)



   --> STEP: 21/26 -- GLOBAL_STEP: 3350
     | > loss_disc: 2.63294  (2.62258)
     | > loss_disc_real_0: 0.09628  (0.19107)
     | > loss_disc_real_1: 0.20708  (0.21722)
     | > loss_disc_real_2: 0.20230  (0.24571)
     | > loss_disc_real_3: 0.19493  (0.23360)
     | > loss_disc_real_4: 0.23547  (0.24035)
     | > loss_disc_real_5: 0.24226  (0.24633)
     | > loss_0: 2.63294  (2.62258)
     | > grad_norm_0: 16.75503  (15.76854)
     | > loss_gen: 1.83076  (2.03096)
     | > loss_kl: 1.20333  (1.35786)
     | > loss_feat: 2.05916  (2.43881)
     | > loss_mel: 22.54850  (24.03720)
     | > loss_duration: 1.83228  (1.77664)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 29.47403  (31.64147)
     | > grad_norm_1: 259.08713  (188.06479)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.70310  (0.75641)
     | > loader_time: 0.00560  (0.00822)


 > EVALUATION





 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.49524 (-0.00526)
     | > avg_loss_disc: 2.38782 (-0.24264)
     | > avg_loss_disc_real_0: 0.31673 (-0.04644)
     | > avg_loss_disc_real_1: 0.14004 (-0.07773)
     | > avg_loss_disc_real_2: 0.23947 (-0.01469)
     | > avg_loss_disc_real_3: 0.38133 (+0.19794)
     | > avg_loss_disc_real_4: 0.30994 (+0.10444)
     | > avg_loss_disc_real_5: 0.18476 (-0.04238)
     | > avg_loss_0: 2.38782 (-0.24264)
     | > avg_loss_gen: 2.72379 (+0.44599)
     | > avg_loss_kl: 1.81342 (+0.32314)
     | > avg_loss_feat: 3.42359 (+0.65645)
     | > avg_loss_mel: 24.03963 (-1.23543)
     | > avg_loss_duration: 1.66412 (-0.00204)
     | > avg_loss_1: 33.66456 (+0.18811)


 > EPOCH: 44/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:44:32)



   --> STEP: 20/26 -- GLOBAL_STEP: 3375
     | > loss_disc: 2.61763  (2.60686)
     | > loss_disc_real_0: 0.19821  (0.18230)
     | > loss_disc_real_1: 0.18173  (0.21423)
     | > loss_disc_real_2: 0.24135  (0.24491)
     | > loss_disc_real_3: 0.23240  (0.25047)
     | > loss_disc_real_4: 0.23805  (0.23475)
     | > loss_disc_real_5: 0.24629  (0.26002)
     | > loss_0: 2.61763  (2.60686)
     | > grad_norm_0: 5.41728  (11.48271)
     | > loss_gen: 2.00466  (2.02701)
     | > loss_kl: 1.27572  (1.37666)
     | > loss_feat: 2.39762  (2.42704)
     | > loss_mel: 23.47182  (23.72180)
     | > loss_duration: 1.80212  (1.77584)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 30.95194  (31.32835)
     | > grad_norm_1: 224.48485  (182.21031)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.68660  (0.76320)
     | > loader_time: 0.00650  (0.00862)


 > EVALUATION





 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.73899 (+0.24375)
     | > avg_loss_disc: 2.83023 (+0.44241)
     | > avg_loss_disc_real_0: 0.08002 (-0.23671)
     | > avg_loss_disc_real_1: 0.14360 (+0.00355)
     | > avg_loss_disc_real_2: 0.20589 (-0.03358)
     | > avg_loss_disc_real_3: 0.20442 (-0.17691)
     | > avg_loss_disc_real_4: 0.17058 (-0.13935)
     | > avg_loss_disc_real_5: 0.20188 (+0.01712)
     | > avg_loss_0: 2.83023 (+0.44241)
     | > avg_loss_gen: 1.41031 (-1.31348)
     | > avg_loss_kl: 1.52269 (-0.29074)
     | > avg_loss_feat: 3.25016 (-0.17343)
     | > avg_loss_mel: 23.61105 (-0.42858)
     | > avg_loss_duration: 1.66139 (-0.00273)
     | > avg_loss_1: 31.45560 (-2.20897)


 > EPOCH: 45/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:45:06)



   --> STEP: 19/26 -- GLOBAL_STEP: 3400
     | > loss_disc: 2.44473  (2.59418)
     | > loss_disc_real_0: 0.20369  (0.21049)
     | > loss_disc_real_1: 0.24871  (0.21395)
     | > loss_disc_real_2: 0.22466  (0.23790)
     | > loss_disc_real_3: 0.27203  (0.23380)
     | > loss_disc_real_4: 0.19958  (0.23555)
     | > loss_disc_real_5: 0.22193  (0.24205)
     | > loss_0: 2.44473  (2.59418)
     | > grad_norm_0: 13.72571  (17.19996)
     | > loss_gen: 2.18265  (2.09819)
     | > loss_kl: 1.34106  (1.42023)
     | > loss_feat: 2.75005  (2.61600)
     | > loss_mel: 24.12519  (23.42541)
     | > loss_duration: 1.77563  (1.77467)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 32.17458  (31.33450)
     | > grad_norm_1: 161.65877  (218.20786)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.80890  (0.76539)
     | > loader_time: 0.00830  (0.01125)


 > EVALUATION





 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.49328 (-0.24572)
     | > avg_loss_disc: 2.54849 (-0.28174)
     | > avg_loss_disc_real_0: 0.26881 (+0.18879)
     | > avg_loss_disc_real_1: 0.18764 (+0.04404)
     | > avg_loss_disc_real_2: 0.21967 (+0.01379)
     | > avg_loss_disc_real_3: 0.24879 (+0.04437)
     | > avg_loss_disc_real_4: 0.20624 (+0.03565)
     | > avg_loss_disc_real_5: 0.16783 (-0.03405)
     | > avg_loss_0: 2.54849 (-0.28174)
     | > avg_loss_gen: 2.06801 (+0.65770)
     | > avg_loss_kl: 1.52615 (+0.00346)
     | > avg_loss_feat: 2.41430 (-0.83586)
     | > avg_loss_mel: 21.81791 (-1.79315)
     | > avg_loss_duration: 1.66547 (+0.00408)
     | > avg_loss_1: 29.49183 (-1.96376)


 > EPOCH: 46/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:45:41)



   --> STEP: 18/26 -- GLOBAL_STEP: 3425
     | > loss_disc: 2.60408  (2.59344)
     | > loss_disc_real_0: 0.27380  (0.18815)
     | > loss_disc_real_1: 0.21693  (0.21587)
     | > loss_disc_real_2: 0.27040  (0.24022)
     | > loss_disc_real_3: 0.22424  (0.24345)
     | > loss_disc_real_4: 0.22531  (0.22351)
     | > loss_disc_real_5: 0.23205  (0.24726)
     | > loss_0: 2.60408  (2.59344)
     | > grad_norm_0: 13.19401  (10.17692)
     | > loss_gen: 1.93479  (2.04797)
     | > loss_kl: 1.41867  (1.41973)
     | > loss_feat: 2.31708  (2.51094)
     | > loss_mel: 24.52976  (23.88757)
     | > loss_duration: 1.82582  (1.77688)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 32.02611  (31.64309)
     | > grad_norm_1: 189.73193  (167.84424)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.83170  (0.76264)
     | > loader_time: 0.00740  (0.00862)


 > EVALUATION





 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.49106 (-0.00222)
     | > avg_loss_disc: 2.86617 (+0.31768)
     | > avg_loss_disc_real_0: 0.30614 (+0.03733)
     | > avg_loss_disc_real_1: 0.30525 (+0.11761)
     | > avg_loss_disc_real_2: 0.29293 (+0.07326)
     | > avg_loss_disc_real_3: 0.17864 (-0.07015)
     | > avg_loss_disc_real_4: 0.22037 (+0.01413)
     | > avg_loss_disc_real_5: 0.19751 (+0.02968)
     | > avg_loss_0: 2.86617 (+0.31768)
     | > avg_loss_gen: 2.30489 (+0.23688)
     | > avg_loss_kl: 1.41199 (-0.11416)
     | > avg_loss_feat: 2.81840 (+0.40410)
     | > avg_loss_mel: 22.92962 (+1.11172)
     | > avg_loss_duration: 1.68142 (+0.01596)
     | > avg_loss_1: 31.14633 (+1.65450)


 > EPOCH: 47/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:46:15)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 17/26 -- GLOBAL_STEP: 3450
     | > loss_disc: 2.73702  (2.60724)
     | > loss_disc_real_0: 0.26131  (0.19060)
     | > loss_disc_real_1: 0.22039  (0.22298)
     | > loss_disc_real_2: 0.21974  (0.24744)
     | > loss_disc_real_3: 0.22278  (0.24162)
     | > loss_disc_real_4: 0.22928  (0.23813)
     | > loss_disc_real_5: 0.25693  (0.26002)
     | > loss_0: 2.73702  (2.60724)
     | > grad_norm_0: 9.35099  (12.09195)
     | > loss_gen: 1.98146  (2.10493)
     | > loss_kl: 1.43342  (1.49841)
     | > loss_feat: 2.16355  (2.62428)
     | > loss_mel: 23.77267  (23.77964)
     | > loss_duration: 1.80561  (1.77275)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 31.15670  (31.78002)
     | > grad_norm_1: 154.80763  (163.32292)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.67520  (0.76715)
     | > loader_time: 0.00670  (0.00915)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.49805 (+0.00699)
     | > avg_loss_disc: 2.63414 (-0.23203)
     | > avg_loss_disc_real_0: 0.26426 (-0.04189)
     | > avg_loss_disc_real_1: 0.11552 (-0.18973)
     | > avg_loss_disc_real_2: 0.14010 (-0.15284)
     | > avg_loss_disc_real_3: 0.24669 (+0.06805)
     | > avg_loss_disc_real_4: 0.34563 (+0.12526)
     | > avg_loss_disc_real_5: 0.17795 (-0.01956)
     | > avg_loss_0: 2.63414 (-0.23203)
     | > avg_loss_gen: 2.07492 (-0.22997)
     | > avg_loss_kl: 1.83959 (+0.42760)
     | > avg_loss_feat: 3.48887 (+0.67047)
     | > avg_loss_mel: 25.01642 (+2.08680)
     | > avg_loss_duration: 1.67023 (-0.01120)
     | > avg_loss_1: 34.09003 (+2.94370)


 > EPOCH: 48/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:46:50)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 16/26 -- GLOBAL_STEP: 3475
     | > loss_disc: 2.47742  (2.57081)
     | > loss_disc_real_0: 0.14920  (0.17501)
     | > loss_disc_real_1: 0.20033  (0.20494)
     | > loss_disc_real_2: 0.21070  (0.24352)
     | > loss_disc_real_3: 0.19081  (0.24452)
     | > loss_disc_real_4: 0.23480  (0.24013)
     | > loss_disc_real_5: 0.27855  (0.26064)
     | > loss_0: 2.47742  (2.57081)
     | > grad_norm_0: 6.12767  (9.10589)
     | > loss_gen: 2.10294  (2.09982)
     | > loss_kl: 1.56651  (1.48033)
     | > loss_feat: 2.43162  (2.62344)
     | > loss_mel: 23.00313  (23.60771)
     | > loss_duration: 1.80786  (1.77102)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 30.91206  (31.58232)
     | > grad_norm_1: 268.34564  (157.96989)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.70300  (0.77394)
     | > loader_time: 0.00710  (0.00870)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.77807 (+0.28003)
     | > avg_loss_disc: 2.72192 (+0.08778)
     | > avg_loss_disc_real_0: 0.40739 (+0.14314)
     | > avg_loss_disc_real_1: 0.22229 (+0.10677)
     | > avg_loss_disc_real_2: 0.20964 (+0.06954)
     | > avg_loss_disc_real_3: 0.27884 (+0.03215)
     | > avg_loss_disc_real_4: 0.21702 (-0.12862)
     | > avg_loss_disc_real_5: 0.24901 (+0.07106)
     | > avg_loss_0: 2.72192 (+0.08778)
     | > avg_loss_gen: 2.19338 (+0.11845)
     | > avg_loss_kl: 1.12763 (-0.71196)
     | > avg_loss_feat: 2.16168 (-1.32719)
     | > avg_loss_mel: 23.54426 (-1.47215)
     | > avg_loss_duration: 1.68735 (+0.01712)
     | > avg_loss_1: 30.71430 (-3.37573)


 > EPOCH: 49/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:47:23)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 15/26 -- GLOBAL_STEP: 3500
     | > loss_disc: 2.69980  (2.55589)
     | > loss_disc_real_0: 0.15253  (0.18969)
     | > loss_disc_real_1: 0.21155  (0.20370)
     | > loss_disc_real_2: 0.27316  (0.24532)
     | > loss_disc_real_3: 0.26146  (0.23553)
     | > loss_disc_real_4: 0.25446  (0.21851)
     | > loss_disc_real_5: 0.27040  (0.24174)
     | > loss_0: 2.69980  (2.55589)
     | > grad_norm_0: 12.43908  (9.63757)
     | > loss_gen: 1.81628  (2.05079)
     | > loss_kl: 1.28518  (1.42093)
     | > loss_feat: 1.69251  (2.57058)
     | > loss_mel: 20.76674  (23.32709)
     | > loss_duration: 1.83748  (1.76947)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 27.39820  (31.13885)
     | > grad_norm_1: 171.32495  (187.26630)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.72180  (0.76950)
     | > loader_time: 0.00670  (0.00875)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.52097 (-0.25710)
     | > avg_loss_disc: 2.39702 (-0.32490)
     | > avg_loss_disc_real_0: 0.19325 (-0.21414)
     | > avg_loss_disc_real_1: 0.20102 (-0.02127)
     | > avg_loss_disc_real_2: 0.20460 (-0.00504)
     | > avg_loss_disc_real_3: 0.14713 (-0.13171)
     | > avg_loss_disc_real_4: 0.24651 (+0.02950)
     | > avg_loss_disc_real_5: 0.17233 (-0.07669)
     | > avg_loss_0: 2.39702 (-0.32490)
     | > avg_loss_gen: 2.12332 (-0.07006)
     | > avg_loss_kl: 1.54027 (+0.41264)
     | > avg_loss_feat: 3.28377 (+1.12209)
     | > avg_loss_mel: 23.92537 (+0.38110)
     | > avg_loss_duration: 1.67232 (-0.01503)
     | > avg_loss_1: 32.54504 (+1.83074)


 > EPOCH: 50/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:47:57)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 14/26 -- GLOBAL_STEP: 3525
     | > loss_disc: 2.48567  (2.55191)
     | > loss_disc_real_0: 0.15925  (0.18253)
     | > loss_disc_real_1: 0.23476  (0.19721)
     | > loss_disc_real_2: 0.19318  (0.23681)
     | > loss_disc_real_3: 0.22616  (0.24203)
     | > loss_disc_real_4: 0.20654  (0.23137)
     | > loss_disc_real_5: 0.26046  (0.24572)
     | > loss_0: 2.48567  (2.55191)
     | > grad_norm_0: 4.36956  (7.39469)
     | > loss_gen: 2.17437  (2.08283)
     | > loss_kl: 1.41490  (1.42027)
     | > loss_feat: 2.60998  (2.57927)
     | > loss_mel: 24.08993  (23.47708)
     | > loss_duration: 1.82566  (1.76344)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 32.11484  (31.32290)
     | > grad_norm_1: 246.99202  (185.09494)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.74990  (0.73854)
     | > loader_time: 0.01060  (0.00847)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.48072 (-0.04025)
     | > avg_loss_disc: 2.17471 (-0.22231)
     | > avg_loss_disc_real_0: 0.09132 (-0.10193)
     | > avg_loss_disc_real_1: 0.14615 (-0.05487)
     | > avg_loss_disc_real_2: 0.19700 (-0.00760)
     | > avg_loss_disc_real_3: 0.25301 (+0.10588)
     | > avg_loss_disc_real_4: 0.22415 (-0.02236)
     | > avg_loss_disc_real_5: 0.27959 (+0.10726)
     | > avg_loss_0: 2.17471 (-0.22231)
     | > avg_loss_gen: 2.40499 (+0.28167)
     | > avg_loss_kl: 1.14814 (-0.39213)
     | > avg_loss_feat: 3.87064 (+0.58687)
     | > avg_loss_mel: 24.12537 (+0.20001)
     | > avg_loss_duration: 1.66199 (-0.01033)
     | > avg_loss_1: 33.21113 (+0.66609)


 > EPOCH: 51/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:48:31)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 13/26 -- GLOBAL_STEP: 3550
     | > loss_disc: 2.53046  (2.60698)
     | > loss_disc_real_0: 0.18219  (0.21007)
     | > loss_disc_real_1: 0.18479  (0.20920)
     | > loss_disc_real_2: 0.21178  (0.23932)
     | > loss_disc_real_3: 0.22911  (0.24129)
     | > loss_disc_real_4: 0.19983  (0.21083)
     | > loss_disc_real_5: 0.20800  (0.24640)
     | > loss_0: 2.53046  (2.60698)
     | > grad_norm_0: 7.72916  (18.59147)
     | > loss_gen: 2.10721  (2.08935)
     | > loss_kl: 1.45618  (1.44250)
     | > loss_feat: 3.27253  (2.63971)
     | > loss_mel: 24.48023  (23.53115)
     | > loss_duration: 1.90924  (1.75473)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 33.22540  (31.45744)
     | > grad_norm_1: 67.31596  (181.45084)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.88960  (0.76042)
     | > loader_time: 0.00640  (0.00990)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.47820 (-0.00253)
     | > avg_loss_disc: 2.63350 (+0.45880)
     | > avg_loss_disc_real_0: 0.06066 (-0.03066)
     | > avg_loss_disc_real_1: 0.16480 (+0.01865)
     | > avg_loss_disc_real_2: 0.24013 (+0.04313)
     | > avg_loss_disc_real_3: 0.22505 (-0.02796)
     | > avg_loss_disc_real_4: 0.17979 (-0.04437)
     | > avg_loss_disc_real_5: 0.25644 (-0.02315)
     | > avg_loss_0: 2.63350 (+0.45880)
     | > avg_loss_gen: 1.65114 (-0.75385)
     | > avg_loss_kl: 1.37499 (+0.22686)
     | > avg_loss_feat: 2.35758 (-1.51306)
     | > avg_loss_mel: 23.10408 (-1.02129)
     | > avg_loss_duration: 1.64768 (-0.01431)
     | > avg_loss_1: 30.13548 (-3.07565)


 > EPOCH: 52/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:49:06)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 12/26 -- GLOBAL_STEP: 3575
     | > loss_disc: 2.62121  (2.63572)
     | > loss_disc_real_0: 0.11771  (0.20509)
     | > loss_disc_real_1: 0.21655  (0.21621)
     | > loss_disc_real_2: 0.28185  (0.24440)
     | > loss_disc_real_3: 0.21588  (0.24846)
     | > loss_disc_real_4: 0.25221  (0.23034)
     | > loss_disc_real_5: 0.29507  (0.25544)
     | > loss_0: 2.62121  (2.63572)
     | > grad_norm_0: 20.43129  (21.99044)
     | > loss_gen: 2.30966  (2.10905)
     | > loss_kl: 0.96639  (1.38954)
     | > loss_feat: 2.17770  (2.56753)
     | > loss_mel: 22.56838  (23.26968)
     | > loss_duration: 1.80378  (1.74358)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 29.82591  (31.07939)
     | > grad_norm_1: 172.18167  (170.59335)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.71720  (0.79940)
     | > loader_time: 0.00910  (0.01208)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.72466 (+0.24646)
     | > avg_loss_disc: 2.43968 (-0.19382)
     | > avg_loss_disc_real_0: 0.12455 (+0.06389)
     | > avg_loss_disc_real_1: 0.16263 (-0.00217)
     | > avg_loss_disc_real_2: 0.13296 (-0.10718)
     | > avg_loss_disc_real_3: 0.28274 (+0.05770)
     | > avg_loss_disc_real_4: 0.23616 (+0.05638)
     | > avg_loss_disc_real_5: 0.17801 (-0.07842)
     | > avg_loss_0: 2.43968 (-0.19382)
     | > avg_loss_gen: 2.01751 (+0.36637)
     | > avg_loss_kl: 1.26789 (-0.10710)
     | > avg_loss_feat: 3.83434 (+1.47676)
     | > avg_loss_mel: 24.71243 (+1.60835)
     | > avg_loss_duration: 1.65779 (+0.01011)
     | > avg_loss_1: 33.48997 (+3.35449)


 > EPOCH: 53/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:49:40)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 11/26 -- GLOBAL_STEP: 3600
     | > loss_disc: 2.52946  (2.56328)
     | > loss_disc_real_0: 0.20056  (0.17814)
     | > loss_disc_real_1: 0.20883  (0.21618)
     | > loss_disc_real_2: 0.23789  (0.24868)
     | > loss_disc_real_3: 0.29034  (0.22975)
     | > loss_disc_real_4: 0.26841  (0.23506)
     | > loss_disc_real_5: 0.29466  (0.23721)
     | > loss_0: 2.52946  (2.56328)
     | > grad_norm_0: 12.71353  (11.58036)
     | > loss_gen: 2.29859  (2.16921)
     | > loss_kl: 1.39544  (1.42482)
     | > loss_feat: 2.87503  (2.88103)
     | > loss_mel: 21.80971  (23.99481)
     | > loss_duration: 1.73174  (1.73975)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 30.11051  (32.20963)
     | > grad_norm_1: 131.54491  (176.77086)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.70530  (0.79590)
     | > loader_time: 0.00800  (0.00964)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.60380 (-0.12085)
     | > avg_loss_disc: 2.68247 (+0.24279)
     | > avg_loss_disc_real_0: 0.37482 (+0.25027)
     | > avg_loss_disc_real_1: 0.19100 (+0.02837)
     | > avg_loss_disc_real_2: 0.20513 (+0.07217)
     | > avg_loss_disc_real_3: 0.23472 (-0.04803)
     | > avg_loss_disc_real_4: 0.16406 (-0.07210)
     | > avg_loss_disc_real_5: 0.21358 (+0.03557)
     | > avg_loss_0: 2.68247 (+0.24279)
     | > avg_loss_gen: 2.27900 (+0.26149)
     | > avg_loss_kl: 1.57888 (+0.31098)
     | > avg_loss_feat: 2.46383 (-1.37051)
     | > avg_loss_mel: 22.87894 (-1.83349)
     | > avg_loss_duration: 1.67285 (+0.01506)
     | > avg_loss_1: 30.87350 (-2.61646)


 > EPOCH: 54/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:50:13)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 10/26 -- GLOBAL_STEP: 3625
     | > loss_disc: 2.73534  (2.59120)
     | > loss_disc_real_0: 0.23400  (0.19367)
     | > loss_disc_real_1: 0.29031  (0.21565)
     | > loss_disc_real_2: 0.27227  (0.24252)
     | > loss_disc_real_3: 0.30131  (0.23997)
     | > loss_disc_real_4: 0.34478  (0.22942)
     | > loss_disc_real_5: 0.22303  (0.22953)
     | > loss_0: 2.73534  (2.59120)
     | > grad_norm_0: 10.32054  (17.01215)
     | > loss_gen: 2.04700  (2.12548)
     | > loss_kl: 1.57002  (1.50693)
     | > loss_feat: 2.42511  (2.72460)
     | > loss_mel: 24.23206  (24.83705)
     | > loss_duration: 1.72498  (1.73516)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 31.99917  (32.92922)
     | > grad_norm_1: 78.22584  (129.74162)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.73660  (0.75468)
     | > loader_time: 0.00850  (0.00875)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.55323 (-0.05057)
     | > avg_loss_disc: 2.44529 (-0.23718)
     | > avg_loss_disc_real_0: 0.25329 (-0.12153)
     | > avg_loss_disc_real_1: 0.27491 (+0.08392)
     | > avg_loss_disc_real_2: 0.13497 (-0.07016)
     | > avg_loss_disc_real_3: 0.16703 (-0.06768)
     | > avg_loss_disc_real_4: 0.14721 (-0.01685)
     | > avg_loss_disc_real_5: 0.15778 (-0.05580)
     | > avg_loss_0: 2.44529 (-0.23718)
     | > avg_loss_gen: 2.11283 (-0.16617)
     | > avg_loss_kl: 1.74711 (+0.16824)
     | > avg_loss_feat: 3.64665 (+1.18281)
     | > avg_loss_mel: 26.47107 (+3.59213)
     | > avg_loss_duration: 1.67853 (+0.00568)
     | > avg_loss_1: 35.65619 (+4.78269)


 > EPOCH: 55/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:50:48)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 9/26 -- GLOBAL_STEP: 3650
     | > loss_disc: 2.78959  (2.63736)
     | > loss_disc_real_0: 0.33155  (0.23580)
     | > loss_disc_real_1: 0.18895  (0.20296)
     | > loss_disc_real_2: 0.25306  (0.24109)
     | > loss_disc_real_3: 0.25589  (0.23916)
     | > loss_disc_real_4: 0.27450  (0.21513)
     | > loss_disc_real_5: 0.26242  (0.24992)
     | > loss_0: 2.78959  (2.63736)
     | > grad_norm_0: 20.89263  (19.43092)
     | > loss_gen: 2.11459  (2.06929)
     | > loss_kl: 1.79825  (1.42388)
     | > loss_feat: 2.49434  (2.62066)
     | > loss_mel: 23.39539  (24.85036)
     | > loss_duration: 1.67102  (1.73913)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 31.47359  (32.70332)
     | > grad_norm_1: 115.35475  (199.25476)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.72690  (0.72567)
     | > loader_time: 0.00950  (0.00947)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.50969 (-0.04355)
     | > avg_loss_disc: 2.57858 (+0.13328)
     | > avg_loss_disc_real_0: 0.11345 (-0.13984)
     | > avg_loss_disc_real_1: 0.16273 (-0.11219)
     | > avg_loss_disc_real_2: 0.17823 (+0.04326)
     | > avg_loss_disc_real_3: 0.30499 (+0.13795)
     | > avg_loss_disc_real_4: 0.24917 (+0.10196)
     | > avg_loss_disc_real_5: 0.24830 (+0.09053)
     | > avg_loss_0: 2.57858 (+0.13328)
     | > avg_loss_gen: 1.84088 (-0.27195)
     | > avg_loss_kl: 1.75801 (+0.01090)
     | > avg_loss_feat: 2.08049 (-1.56616)
     | > avg_loss_mel: 20.75351 (-5.71756)
     | > avg_loss_duration: 1.67429 (-0.00424)
     | > avg_loss_1: 28.10718 (-7.54901)


 > EPOCH: 56/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:51:23)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 8/26 -- GLOBAL_STEP: 3675
     | > loss_disc: 2.58917  (2.58537)
     | > loss_disc_real_0: 0.28655  (0.19989)
     | > loss_disc_real_1: 0.18064  (0.20671)
     | > loss_disc_real_2: 0.27361  (0.25388)
     | > loss_disc_real_3: 0.24305  (0.23448)
     | > loss_disc_real_4: 0.25157  (0.23222)
     | > loss_disc_real_5: 0.26138  (0.26227)
     | > loss_0: 2.58917  (2.58537)
     | > grad_norm_0: 17.33964  (10.85674)
     | > loss_gen: 2.01337  (2.10242)
     | > loss_kl: 1.46204  (1.56324)
     | > loss_feat: 2.91646  (2.64500)
     | > loss_mel: 23.82539  (23.31572)
     | > loss_duration: 1.80222  (1.74348)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 32.01949  (31.36986)
     | > grad_norm_1: 290.26300  (245.94763)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.86240  (0.77910)
     | > loader_time: 0.00880  (0.00908)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.70791 (+0.19823)
     | > avg_loss_disc: 2.70096 (+0.12238)
     | > avg_loss_disc_real_0: 0.35335 (+0.23990)
     | > avg_loss_disc_real_1: 0.23739 (+0.07466)
     | > avg_loss_disc_real_2: 0.25240 (+0.07416)
     | > avg_loss_disc_real_3: 0.21932 (-0.08567)
     | > avg_loss_disc_real_4: 0.27793 (+0.02875)
     | > avg_loss_disc_real_5: 0.25106 (+0.00276)
     | > avg_loss_0: 2.70096 (+0.12238)
     | > avg_loss_gen: 2.31435 (+0.47347)
     | > avg_loss_kl: 2.01450 (+0.25649)
     | > avg_loss_feat: 2.38980 (+0.30931)
     | > avg_loss_mel: 21.64882 (+0.89531)
     | > avg_loss_duration: 1.65521 (-0.01908)
     | > avg_loss_1: 30.02269 (+1.91551)


 > EPOCH: 57/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:51:57)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 7/26 -- GLOBAL_STEP: 3700
     | > loss_disc: 2.49579  (2.58364)
     | > loss_disc_real_0: 0.12405  (0.16835)
     | > loss_disc_real_1: 0.14749  (0.20777)
     | > loss_disc_real_2: 0.20151  (0.23995)
     | > loss_disc_real_3: 0.29111  (0.24950)
     | > loss_disc_real_4: 0.19538  (0.22509)
     | > loss_disc_real_5: 0.27977  (0.25171)
     | > loss_0: 2.49579  (2.58364)
     | > grad_norm_0: 13.23802  (12.84231)
     | > loss_gen: 2.02726  (2.05319)
     | > loss_kl: 1.45072  (1.44196)
     | > loss_feat: 2.66930  (2.54654)
     | > loss_mel: 23.86865  (22.69293)
     | > loss_duration: 1.74171  (1.74379)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 31.75763  (30.47841)
     | > grad_norm_1: 266.18451  (264.31735)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.83750  (0.84753)
     | > loader_time: 0.02140  (0.01833)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.77001 (+0.06209)
     | > avg_loss_disc: 2.80444 (+0.10349)
     | > avg_loss_disc_real_0: 0.36939 (+0.01604)
     | > avg_loss_disc_real_1: 0.25069 (+0.01330)
     | > avg_loss_disc_real_2: 0.20801 (-0.04439)
     | > avg_loss_disc_real_3: 0.20982 (-0.00950)
     | > avg_loss_disc_real_4: 0.19965 (-0.07827)
     | > avg_loss_disc_real_5: 0.19000 (-0.06106)
     | > avg_loss_0: 2.80444 (+0.10349)
     | > avg_loss_gen: 2.03688 (-0.27748)
     | > avg_loss_kl: 1.38392 (-0.63058)
     | > avg_loss_feat: 2.35931 (-0.03049)
     | > avg_loss_mel: 22.73580 (+1.08698)
     | > avg_loss_duration: 1.68386 (+0.02865)
     | > avg_loss_1: 30.19978 (+0.17709)


 > EPOCH: 58/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:52:30)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 6/26 -- GLOBAL_STEP: 3725
     | > loss_disc: 2.69621  (2.70391)
     | > loss_disc_real_0: 0.17017  (0.19608)
     | > loss_disc_real_1: 0.24192  (0.22634)
     | > loss_disc_real_2: 0.25535  (0.24016)
     | > loss_disc_real_3: 0.22977  (0.24448)
     | > loss_disc_real_4: 0.27842  (0.21792)
     | > loss_disc_real_5: 0.24427  (0.25244)
     | > loss_0: 2.69621  (2.70391)
     | > grad_norm_0: 8.06965  (14.11070)
     | > loss_gen: 2.06506  (2.01555)
     | > loss_kl: 1.58880  (1.39478)
     | > loss_feat: 2.02054  (2.31502)
     | > loss_mel: 22.79703  (23.28469)
     | > loss_duration: 1.75157  (1.73028)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 30.22300  (30.74032)
     | > grad_norm_1: 378.50970  (280.55170)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.72570  (0.79280)
     | > loader_time: 0.01000  (0.01040)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.48639 (-0.28362)
     | > avg_loss_disc: 2.47622 (-0.32822)
     | > avg_loss_disc_real_0: 0.34331 (-0.02609)
     | > avg_loss_disc_real_1: 0.12657 (-0.12412)
     | > avg_loss_disc_real_2: 0.17817 (-0.02984)
     | > avg_loss_disc_real_3: 0.08778 (-0.12204)
     | > avg_loss_disc_real_4: 0.18749 (-0.01216)
     | > avg_loss_disc_real_5: 0.17397 (-0.01602)
     | > avg_loss_0: 2.47622 (-0.32822)
     | > avg_loss_gen: 1.99226 (-0.04461)
     | > avg_loss_kl: 1.82027 (+0.43634)
     | > avg_loss_feat: 3.49624 (+1.13692)
     | > avg_loss_mel: 25.23227 (+2.49647)
     | > avg_loss_duration: 1.66152 (-0.02234)
     | > avg_loss_1: 34.20255 (+4.00277)


 > EPOCH: 59/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:53:05)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 5/26 -- GLOBAL_STEP: 3750
     | > loss_disc: 2.56890  (2.62157)
     | > loss_disc_real_0: 0.24384  (0.19459)
     | > loss_disc_real_1: 0.18574  (0.20829)
     | > loss_disc_real_2: 0.25248  (0.24448)
     | > loss_disc_real_3: 0.20835  (0.25002)
     | > loss_disc_real_4: 0.23203  (0.22992)
     | > loss_disc_real_5: 0.21505  (0.25646)
     | > loss_0: 2.56890  (2.62157)
     | > grad_norm_0: 11.51540  (14.19714)
     | > loss_gen: 2.24299  (2.07764)
     | > loss_kl: 1.49818  (1.53741)
     | > loss_feat: 3.08604  (2.61977)
     | > loss_mel: 22.33029  (22.35959)
     | > loss_duration: 1.67963  (1.72773)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 30.83712  (30.32213)
     | > grad_norm_1: 215.19943  (178.74754)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.73250  (0.72978)
     | > loader_time: 0.00800  (0.00849)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.51443 (+0.02804)
     | > avg_loss_disc: 2.34977 (-0.12646)
     | > avg_loss_disc_real_0: 0.33809 (-0.00522)
     | > avg_loss_disc_real_1: 0.15499 (+0.02842)
     | > avg_loss_disc_real_2: 0.13370 (-0.04447)
     | > avg_loss_disc_real_3: 0.18876 (+0.10098)
     | > avg_loss_disc_real_4: 0.11377 (-0.07372)
     | > avg_loss_disc_real_5: 0.14503 (-0.02895)
     | > avg_loss_0: 2.34977 (-0.12646)
     | > avg_loss_gen: 2.28762 (+0.29535)
     | > avg_loss_kl: 1.91969 (+0.09942)
     | > avg_loss_feat: 4.17198 (+0.67575)
     | > avg_loss_mel: 26.13223 (+0.89996)
     | > avg_loss_duration: 1.64585 (-0.01567)
     | > avg_loss_1: 36.15737 (+1.95482)


 > EPOCH: 60/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:53:39)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 4/26 -- GLOBAL_STEP: 3775
     | > loss_disc: 2.67782  (2.63123)
     | > loss_disc_real_0: 0.40322  (0.21932)
     | > loss_disc_real_1: 0.22491  (0.20354)
     | > loss_disc_real_2: 0.34578  (0.25977)
     | > loss_disc_real_3: 0.28076  (0.24502)
     | > loss_disc_real_4: 0.22007  (0.21718)
     | > loss_disc_real_5: 0.25392  (0.24550)
     | > loss_0: 2.67782  (2.63123)
     | > grad_norm_0: 36.75163  (24.71079)
     | > loss_gen: 2.32769  (2.18588)
     | > loss_kl: 1.40469  (1.39670)
     | > loss_feat: 2.58982  (2.63607)
     | > loss_mel: 23.88747  (23.24053)
     | > loss_duration: 1.71383  (1.75034)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 31.92351  (31.20952)
     | > grad_norm_1: 189.11278  (211.62436)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.72120  (0.73033)
     | > loader_time: 0.01020  (0.00981)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.48858 (-0.02585)
     | > avg_loss_disc: 2.52938 (+0.17962)
     | > avg_loss_disc_real_0: 0.26149 (-0.07660)
     | > avg_loss_disc_real_1: 0.15581 (+0.00082)
     | > avg_loss_disc_real_2: 0.16345 (+0.02975)
     | > avg_loss_disc_real_3: 0.22594 (+0.03719)
     | > avg_loss_disc_real_4: 0.26587 (+0.15210)
     | > avg_loss_disc_real_5: 0.23378 (+0.08876)
     | > avg_loss_0: 2.52938 (+0.17962)
     | > avg_loss_gen: 2.15763 (-0.12998)
     | > avg_loss_kl: 1.35449 (-0.56520)
     | > avg_loss_feat: 3.17147 (-1.00052)
     | > avg_loss_mel: 23.73106 (-2.40117)
     | > avg_loss_duration: 1.66660 (+0.02076)
     | > avg_loss_1: 32.08125 (-4.07611)


 > EPOCH: 61/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:54:14)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 3/26 -- GLOBAL_STEP: 3800
     | > loss_disc: 2.58778  (2.58047)
     | > loss_disc_real_0: 0.16549  (0.14932)
     | > loss_disc_real_1: 0.20455  (0.20160)
     | > loss_disc_real_2: 0.24397  (0.23924)
     | > loss_disc_real_3: 0.27921  (0.23215)
     | > loss_disc_real_4: 0.16765  (0.19437)
     | > loss_disc_real_5: 0.21121  (0.23839)
     | > loss_0: 2.58778  (2.58047)
     | > grad_norm_0: 5.26342  (20.56562)
     | > loss_gen: 2.25623  (2.04277)
     | > loss_kl: 1.57713  (1.39945)
     | > loss_feat: 2.47568  (2.47525)
     | > loss_mel: 23.74947  (23.82480)
     | > loss_duration: 1.78275  (1.75206)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 31.84126  (31.49432)
     | > grad_norm_1: 135.60135  (166.18306)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.84200  (0.77335)
     | > loader_time: 0.00580  (0.00832)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.71573 (+0.22715)
     | > avg_loss_disc: 3.02095 (+0.49157)
     | > avg_loss_disc_real_0: 0.47829 (+0.21680)
     | > avg_loss_disc_real_1: 0.15571 (-0.00009)
     | > avg_loss_disc_real_2: 0.23779 (+0.07434)
     | > avg_loss_disc_real_3: 0.25492 (+0.02898)
     | > avg_loss_disc_real_4: 0.38471 (+0.11884)
     | > avg_loss_disc_real_5: 0.25316 (+0.01937)
     | > avg_loss_0: 3.02095 (+0.49157)
     | > avg_loss_gen: 2.22909 (+0.07146)
     | > avg_loss_kl: 1.82958 (+0.47510)
     | > avg_loss_feat: 1.90806 (-1.26341)
     | > avg_loss_mel: 21.03805 (-2.69301)
     | > avg_loss_duration: 1.65813 (-0.00848)
     | > avg_loss_1: 28.66291 (-3.41835)


 > EPOCH: 62/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:54:47)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 2/26 -- GLOBAL_STEP: 3825
     | > loss_disc: 2.81902  (2.80659)
     | > loss_disc_real_0: 0.07807  (0.11985)
     | > loss_disc_real_1: 0.23060  (0.17488)
     | > loss_disc_real_2: 0.34372  (0.29713)
     | > loss_disc_real_3: 0.27172  (0.25428)
     | > loss_disc_real_4: 0.31372  (0.36279)
     | > loss_disc_real_5: 0.28778  (0.25878)
     | > loss_0: 2.81902  (2.80659)
     | > grad_norm_0: 38.56098  (23.96168)
     | > loss_gen: 2.28190  (2.18701)
     | > loss_kl: 1.49642  (1.51697)
     | > loss_feat: 2.00072  (2.40040)
     | > loss_mel: 20.62901  (23.22277)
     | > loss_duration: 1.75152  (1.73405)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 28.15957  (31.06120)
     | > grad_norm_1: 93.61497  (169.16029)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.86110  (0.86751)
     | > loader_time: 0.01330  (0.01029)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.50772 (-0.20800)
     | > avg_loss_disc: 2.67911 (-0.34184)
     | > avg_loss_disc_real_0: 0.31634 (-0.16195)
     | > avg_loss_disc_real_1: 0.21062 (+0.05490)
     | > avg_loss_disc_real_2: 0.20728 (-0.03051)
     | > avg_loss_disc_real_3: 0.19889 (-0.05603)
     | > avg_loss_disc_real_4: 0.21749 (-0.16722)
     | > avg_loss_disc_real_5: 0.25250 (-0.00066)
     | > avg_loss_0: 2.67911 (-0.34184)
     | > avg_loss_gen: 2.07499 (-0.15410)
     | > avg_loss_kl: 0.96264 (-0.86694)
     | > avg_loss_feat: 2.86384 (+0.95578)
     | > avg_loss_mel: 23.17448 (+2.13643)
     | > avg_loss_duration: 1.65175 (-0.00637)
     | > avg_loss_1: 30.72771 (+2.06480)


 > EPOCH: 63/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:55:22)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 1/26 -- GLOBAL_STEP: 3850
     | > loss_disc: 2.66949  (2.66949)
     | > loss_disc_real_0: 0.35508  (0.35508)
     | > loss_disc_real_1: 0.16793  (0.16793)
     | > loss_disc_real_2: 0.25884  (0.25884)
     | > loss_disc_real_3: 0.20941  (0.20941)
     | > loss_disc_real_4: 0.18554  (0.18554)
     | > loss_disc_real_5: 0.27035  (0.27035)
     | > loss_0: 2.66949  (2.66949)
     | > grad_norm_0: 27.28374  (27.28374)
     | > loss_gen: 1.87944  (1.87944)
     | > loss_kl: 1.67509  (1.67509)
     | > loss_feat: 2.21954  (2.21954)
     | > loss_mel: 24.47942  (24.47942)
     | > loss_duration: 1.70398  (1.70398)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 31.95747  (31.95747)
     | > grad_norm_1: 226.47954  (226.47954)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.74800  (0.74803)
     | > loader_time: 0.00770  (0.00772)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.52511 (+0.01739)
     | > avg_loss_disc: 2.38368 (-0.29543)
     | > avg_loss_disc_real_0: 0.10979 (-0.20654)
     | > avg_loss_disc_real_1: 0.18261 (-0.02801)
     | > avg_loss_disc_real_2: 0.11666 (-0.09062)
     | > avg_loss_disc_real_3: 0.16236 (-0.03653)
     | > avg_loss_disc_real_4: 0.19706 (-0.02043)
     | > avg_loss_disc_real_5: 0.21562 (-0.03688)
     | > avg_loss_0: 2.38368 (-0.29543)
     | > avg_loss_gen: 2.00884 (-0.06615)
     | > avg_loss_kl: 0.72126 (-0.24139)
     | > avg_loss_feat: 3.92263 (+1.05879)
     | > avg_loss_mel: 23.77502 (+0.60054)
     | > avg_loss_duration: 1.66808 (+0.01632)
     | > avg_loss_1: 32.09583 (+1.36812)


 > EPOCH: 64/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:55:57)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 0/26 -- GLOBAL_STEP: 3875
     | > loss_disc: 2.74412  (2.74412)
     | > loss_disc_real_0: 0.19235  (0.19235)
     | > loss_disc_real_1: 0.26756  (0.26756)
     | > loss_disc_real_2: 0.19367  (0.19367)
     | > loss_disc_real_3: 0.21645  (0.21645)
     | > loss_disc_real_4: 0.23769  (0.23769)
     | > loss_disc_real_5: 0.25020  (0.25020)
     | > loss_0: 2.74412  (2.74412)
     | > grad_norm_0: 5.92852  (5.92852)
     | > loss_gen: 2.09293  (2.09293)
     | > loss_kl: 1.73321  (1.73321)
     | > loss_feat: 2.64118  (2.64118)
     | > loss_mel: 24.01439  (24.01439)
     | > loss_duration: 1.73331  (1.73331)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 32.21502  (32.21502)
     | > grad_norm_1: 169.49565  (169.49565)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 1.00310  (1.00308)
     | > loader_time: 0.68200  (0.68203)


   --> STEP: 25/26 -- GLOBAL_STEP: 3900
     | > loss_disc: 3.02156  (2.60726)
   



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.50777 (-0.01734)
     | > avg_loss_disc: 2.61345 (+0.22977)
     | > avg_loss_disc_real_0: 0.23841 (+0.12862)
     | > avg_loss_disc_real_1: 0.13493 (-0.04768)
     | > avg_loss_disc_real_2: 0.20082 (+0.08417)
     | > avg_loss_disc_real_3: 0.17333 (+0.01097)
     | > avg_loss_disc_real_4: 0.22503 (+0.02797)
     | > avg_loss_disc_real_5: 0.23946 (+0.02384)
     | > avg_loss_0: 2.61345 (+0.22977)
     | > avg_loss_gen: 1.99529 (-0.01355)
     | > avg_loss_kl: 1.48808 (+0.76682)
     | > avg_loss_feat: 3.33601 (-0.58662)
     | > avg_loss_mel: 24.87126 (+1.09624)
     | > avg_loss_duration: 1.65237 (-0.01570)
     | > avg_loss_1: 33.34301 (+1.24718)


 > EPOCH: 65/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:56:31)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 24/26 -- GLOBAL_STEP: 3925
     | > loss_disc: 2.79286  (2.62476)
     | > loss_disc_real_0: 0.13330  (0.19169)
     | > loss_disc_real_1: 0.17197  (0.20427)
     | > loss_disc_real_2: 0.21848  (0.24163)
     | > loss_disc_real_3: 0.22824  (0.24125)
     | > loss_disc_real_4: 0.29978  (0.23747)
     | > loss_disc_real_5: 0.27776  (0.24419)
     | > loss_0: 2.79286  (2.62476)
     | > grad_norm_0: 26.76853  (12.70578)
     | > loss_gen: 1.94348  (2.02257)
     | > loss_kl: 1.26513  (1.35687)
     | > loss_feat: 2.16657  (2.48419)
     | > loss_mel: 22.94823  (23.21044)
     | > loss_duration: 1.81772  (1.75941)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 30.14113  (30.83349)
     | > grad_norm_1: 198.51123  (209.65710)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.80880  (0.75756)
     | > loader_time: 0.01370  (0.00888)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.79626 (+0.28849)
     | > avg_loss_disc: 2.47423 (-0.13922)
     | > avg_loss_disc_real_0: 0.16617 (-0.07225)
     | > avg_loss_disc_real_1: 0.16949 (+0.03456)
     | > avg_loss_disc_real_2: 0.11745 (-0.08337)
     | > avg_loss_disc_real_3: 0.24996 (+0.07663)
     | > avg_loss_disc_real_4: 0.17029 (-0.05474)
     | > avg_loss_disc_real_5: 0.21987 (-0.01959)
     | > avg_loss_0: 2.47423 (-0.13922)
     | > avg_loss_gen: 1.85272 (-0.14257)
     | > avg_loss_kl: 1.46258 (-0.02549)
     | > avg_loss_feat: 3.42410 (+0.08809)
     | > avg_loss_mel: 24.46476 (-0.40650)
     | > avg_loss_duration: 1.66779 (+0.01542)
     | > avg_loss_1: 32.87195 (-0.47106)


 > EPOCH: 66/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:57:05)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 23/26 -- GLOBAL_STEP: 3950
     | > loss_disc: 2.59136  (2.60919)
     | > loss_disc_real_0: 0.20385  (0.18475)
     | > loss_disc_real_1: 0.17809  (0.22279)
     | > loss_disc_real_2: 0.25772  (0.24175)
     | > loss_disc_real_3: 0.28378  (0.25033)
     | > loss_disc_real_4: 0.20886  (0.22674)
     | > loss_disc_real_5: 0.25993  (0.24382)
     | > loss_0: 2.59136  (2.60919)
     | > grad_norm_0: 4.82212  (11.40157)
     | > loss_gen: 2.03340  (2.08171)
     | > loss_kl: 1.68214  (1.38004)
     | > loss_feat: 2.44455  (2.53099)
     | > loss_mel: 22.40758  (23.51166)
     | > loss_duration: 1.70844  (1.75519)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 30.27611  (31.25960)
     | > grad_norm_1: 94.27046  (165.09462)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.84940  (0.77104)
     | > loader_time: 0.00690  (0.00892)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.51077 (-0.28550)
     | > avg_loss_disc: 2.65701 (+0.18278)
     | > avg_loss_disc_real_0: 0.35091 (+0.18474)
     | > avg_loss_disc_real_1: 0.09688 (-0.07261)
     | > avg_loss_disc_real_2: 0.18331 (+0.06586)
     | > avg_loss_disc_real_3: 0.14176 (-0.10820)
     | > avg_loss_disc_real_4: 0.11174 (-0.05855)
     | > avg_loss_disc_real_5: 0.20626 (-0.01361)
     | > avg_loss_0: 2.65701 (+0.18278)
     | > avg_loss_gen: 1.95470 (+0.10198)
     | > avg_loss_kl: 1.05681 (-0.40577)
     | > avg_loss_feat: 4.10568 (+0.68158)
     | > avg_loss_mel: 28.58822 (+4.12347)
     | > avg_loss_duration: 1.65685 (-0.01094)
     | > avg_loss_1: 37.36227 (+4.49032)


 > EPOCH: 67/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:57:39)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 22/26 -- GLOBAL_STEP: 3975
     | > loss_disc: 2.65144  (2.68686)
     | > loss_disc_real_0: 0.14989  (0.20581)
     | > loss_disc_real_1: 0.26113  (0.22058)
     | > loss_disc_real_2: 0.23951  (0.25023)
     | > loss_disc_real_3: 0.18372  (0.24174)
     | > loss_disc_real_4: 0.24459  (0.24326)
     | > loss_disc_real_5: 0.21533  (0.24526)
     | > loss_0: 2.65144  (2.68686)
     | > grad_norm_0: 9.67924  (15.94071)
     | > loss_gen: 2.32141  (2.06158)
     | > loss_kl: 1.18672  (1.47000)
     | > loss_feat: 3.09704  (2.47835)
     | > loss_mel: 25.18379  (23.56649)
     | > loss_duration: 1.73524  (1.75581)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 33.52421  (31.33223)
     | > grad_norm_1: 219.84583  (212.44235)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.70900  (0.75026)
     | > loader_time: 0.00570  (0.00849)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.50598 (-0.00479)
     | > avg_loss_disc: 2.42490 (-0.23210)
     | > avg_loss_disc_real_0: 0.22276 (-0.12814)
     | > avg_loss_disc_real_1: 0.13741 (+0.04053)
     | > avg_loss_disc_real_2: 0.21575 (+0.03243)
     | > avg_loss_disc_real_3: 0.27786 (+0.13611)
     | > avg_loss_disc_real_4: 0.18593 (+0.07418)
     | > avg_loss_disc_real_5: 0.24894 (+0.04268)
     | > avg_loss_0: 2.42490 (-0.23210)
     | > avg_loss_gen: 2.11430 (+0.15960)
     | > avg_loss_kl: 1.54695 (+0.49014)
     | > avg_loss_feat: 3.23718 (-0.86850)
     | > avg_loss_mel: 26.97470 (-1.61353)
     | > avg_loss_duration: 1.62690 (-0.02995)
     | > avg_loss_1: 35.50003 (-1.86223)


 > EPOCH: 68/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:58:14)



   --> STEP: 21/26 -- GLOBAL_STEP: 4000
     | > loss_disc: 2.59765  (2.63545)
     | > loss_disc_real_0: 0.21591  (0.20110)
     | > loss_disc_real_1: 0.28811  (0.23367)
     | > loss_disc_real_2: 0.23637  (0.23739)
     | > loss_disc_real_3: 0.23058  (0.24010)
     | > loss_disc_real_4: 0.24200  (0.22331)
     | > loss_disc_real_5: 0.23448  (0.25066)
     | > loss_0: 2.59765  (2.63545)
     | > grad_norm_0: 6.93799  (11.55125)
     | > loss_gen: 2.02418  (2.05876)
     | > loss_kl: 1.19996  (1.36074)
     | > loss_feat: 2.55742  (2.54735)
     | > loss_mel: 23.16113  (23.62731)
     | > loss_duration: 1.81824  (1.75606)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 30.76094  (31.35023)
     | > grad_norm_1: 137.62642  (161.42087)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.68970  (0.76030)
     | > loader_time: 0.00580  (0.00931)


 > EVALUATION





 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.50323 (-0.00274)
     | > avg_loss_disc: 2.83425 (+0.40934)
     | > avg_loss_disc_real_0: 0.50450 (+0.28174)
     | > avg_loss_disc_real_1: 0.17911 (+0.04170)
     | > avg_loss_disc_real_2: 0.29699 (+0.08124)
     | > avg_loss_disc_real_3: 0.24471 (-0.03315)
     | > avg_loss_disc_real_4: 0.34582 (+0.15989)
     | > avg_loss_disc_real_5: 0.23284 (-0.01609)
     | > avg_loss_0: 2.83425 (+0.40934)
     | > avg_loss_gen: 2.60195 (+0.48765)
     | > avg_loss_kl: 1.57344 (+0.02649)
     | > avg_loss_feat: 2.27898 (-0.95820)
     | > avg_loss_mel: 22.69312 (-4.28158)
     | > avg_loss_duration: 1.66326 (+0.03636)
     | > avg_loss_1: 30.81075 (-4.68928)


 > EPOCH: 69/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:58:48)



   --> STEP: 20/26 -- GLOBAL_STEP: 4025
     | > loss_disc: 2.55209  (2.64408)
     | > loss_disc_real_0: 0.14102  (0.18267)
     | > loss_disc_real_1: 0.19145  (0.20922)
     | > loss_disc_real_2: 0.22167  (0.24485)
     | > loss_disc_real_3: 0.20464  (0.24118)
     | > loss_disc_real_4: 0.26808  (0.25991)
     | > loss_disc_real_5: 0.25267  (0.24634)
     | > loss_0: 2.55209  (2.64408)
     | > grad_norm_0: 10.94605  (11.22933)
     | > loss_gen: 1.99159  (2.00591)
     | > loss_kl: 1.18966  (1.33746)
     | > loss_feat: 2.49750  (2.39478)
     | > loss_mel: 24.68166  (23.58046)
     | > loss_duration: 1.76764  (1.74700)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 32.12804  (31.06559)
     | > grad_norm_1: 215.16542  (198.74136)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.68270  (0.75532)
     | > loader_time: 0.00690  (0.00942)


 > EVALUATION





 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.76981 (+0.26658)
     | > avg_loss_disc: 3.02612 (+0.19187)
     | > avg_loss_disc_real_0: 0.70827 (+0.20377)
     | > avg_loss_disc_real_1: 0.22682 (+0.04771)
     | > avg_loss_disc_real_2: 0.19409 (-0.10289)
     | > avg_loss_disc_real_3: 0.34868 (+0.10397)
     | > avg_loss_disc_real_4: 0.24245 (-0.10337)
     | > avg_loss_disc_real_5: 0.30363 (+0.07079)
     | > avg_loss_0: 3.02612 (+0.19187)
     | > avg_loss_gen: 2.94067 (+0.33872)
     | > avg_loss_kl: 1.62157 (+0.04813)
     | > avg_loss_feat: 2.23826 (-0.04072)
     | > avg_loss_mel: 23.64550 (+0.95238)
     | > avg_loss_duration: 1.61487 (-0.04839)
     | > avg_loss_1: 32.06087 (+1.25012)


 > EPOCH: 70/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:59:22)



   --> STEP: 19/26 -- GLOBAL_STEP: 4050
     | > loss_disc: 2.61011  (2.68960)
     | > loss_disc_real_0: 0.13124  (0.21058)
     | > loss_disc_real_1: 0.24842  (0.21926)
     | > loss_disc_real_2: 0.25328  (0.23924)
     | > loss_disc_real_3: 0.23692  (0.24609)
     | > loss_disc_real_4: 0.22773  (0.24530)
     | > loss_disc_real_5: 0.24852  (0.25341)
     | > loss_0: 2.61011  (2.68960)
     | > grad_norm_0: 8.96814  (16.38893)
     | > loss_gen: 1.91044  (1.96623)
     | > loss_kl: 1.14519  (1.42713)
     | > loss_feat: 2.08377  (2.29598)
     | > loss_mel: 22.20929  (23.15022)
     | > loss_duration: 1.75719  (1.75042)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 29.10588  (30.58998)
     | > grad_norm_1: 164.01871  (174.16463)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.74600  (0.76227)
     | > loader_time: 0.00570  (0.00911)


 > EVALUATION





 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.51506 (-0.25476)
     | > avg_loss_disc: 2.50903 (-0.51709)
     | > avg_loss_disc_real_0: 0.24932 (-0.45895)
     | > avg_loss_disc_real_1: 0.15990 (-0.06692)
     | > avg_loss_disc_real_2: 0.26545 (+0.07136)
     | > avg_loss_disc_real_3: 0.17813 (-0.17054)
     | > avg_loss_disc_real_4: 0.18420 (-0.05825)
     | > avg_loss_disc_real_5: 0.27543 (-0.02820)
     | > avg_loss_0: 2.50903 (-0.51709)
     | > avg_loss_gen: 2.03186 (-0.90882)
     | > avg_loss_kl: 1.25252 (-0.36905)
     | > avg_loss_feat: 2.59800 (+0.35974)
     | > avg_loss_mel: 23.58988 (-0.05562)
     | > avg_loss_duration: 1.63702 (+0.02216)
     | > avg_loss_1: 31.10927 (-0.95160)


 > EPOCH: 71/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 21:59:56)



   --> STEP: 18/26 -- GLOBAL_STEP: 4075
     | > loss_disc: 2.64278  (2.57015)
     | > loss_disc_real_0: 0.24061  (0.18167)
     | > loss_disc_real_1: 0.22073  (0.20773)
     | > loss_disc_real_2: 0.27642  (0.23950)
     | > loss_disc_real_3: 0.26836  (0.24969)
     | > loss_disc_real_4: 0.21199  (0.22716)
     | > loss_disc_real_5: 0.26026  (0.24115)
     | > loss_0: 2.64278  (2.57015)
     | > grad_norm_0: 8.75241  (8.12359)
     | > loss_gen: 1.96456  (2.05372)
     | > loss_kl: 1.56162  (1.45863)
     | > loss_feat: 2.04080  (2.56530)
     | > loss_mel: 21.26785  (22.61268)
     | > loss_duration: 1.76334  (1.74977)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 28.59817  (30.44010)
     | > grad_norm_1: 228.77448  (175.55133)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.78180  (0.75181)
     | > loader_time: 0.00610  (0.01009)


 > EVALUATION





 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.50934 (-0.00572)
     | > avg_loss_disc: 2.59727 (+0.08824)
     | > avg_loss_disc_real_0: 0.15578 (-0.09354)
     | > avg_loss_disc_real_1: 0.25873 (+0.09883)
     | > avg_loss_disc_real_2: 0.35072 (+0.08527)
     | > avg_loss_disc_real_3: 0.28412 (+0.10599)
     | > avg_loss_disc_real_4: 0.25225 (+0.06805)
     | > avg_loss_disc_real_5: 0.32511 (+0.04968)
     | > avg_loss_0: 2.59727 (+0.08824)
     | > avg_loss_gen: 2.29896 (+0.26710)
     | > avg_loss_kl: 1.73718 (+0.48466)
     | > avg_loss_feat: 2.15127 (-0.44673)
     | > avg_loss_mel: 20.06740 (-3.52247)
     | > avg_loss_duration: 1.62302 (-0.01400)
     | > avg_loss_1: 27.87784 (-3.23144)


 > EPOCH: 72/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 22:00:31)



   --> STEP: 17/26 -- GLOBAL_STEP: 4100
     | > loss_disc: 2.43773  (2.63240)
     | > loss_disc_real_0: 0.13680  (0.19459)
     | > loss_disc_real_1: 0.17300  (0.21460)
     | > loss_disc_real_2: 0.20019  (0.24997)
     | > loss_disc_real_3: 0.21285  (0.23976)
     | > loss_disc_real_4: 0.19594  (0.23771)
     | > loss_disc_real_5: 0.22220  (0.25963)
     | > loss_0: 2.43773  (2.63240)
     | > grad_norm_0: 9.65783  (13.20963)
     | > loss_gen: 2.26524  (2.03775)
     | > loss_kl: 1.15016  (1.33571)
     | > loss_feat: 3.15649  (2.48472)
     | > loss_mel: 23.32650  (22.82648)
     | > loss_duration: 1.75718  (1.74702)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 31.65558  (30.43168)
     | > grad_norm_1: 252.56303  (199.03622)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.67990  (0.77584)
     | > loader_time: 0.00910  (0.00945)


 > EVALUATION





 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.48629 (-0.02305)
     | > avg_loss_disc: 2.52312 (-0.07415)
     | > avg_loss_disc_real_0: 0.17085 (+0.01508)
     | > avg_loss_disc_real_1: 0.13772 (-0.12101)
     | > avg_loss_disc_real_2: 0.35465 (+0.00393)
     | > avg_loss_disc_real_3: 0.20270 (-0.08142)
     | > avg_loss_disc_real_4: 0.23990 (-0.01236)
     | > avg_loss_disc_real_5: 0.27605 (-0.04906)
     | > avg_loss_0: 2.52312 (-0.07415)
     | > avg_loss_gen: 2.16631 (-0.13266)
     | > avg_loss_kl: 1.57653 (-0.16065)
     | > avg_loss_feat: 3.08935 (+0.93808)
     | > avg_loss_mel: 24.14202 (+4.07461)
     | > avg_loss_duration: 1.64825 (+0.02523)
     | > avg_loss_1: 32.62246 (+4.74462)


 > EPOCH: 73/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 22:01:05)



   --> STEP: 16/26 -- GLOBAL_STEP: 4125
     | > loss_disc: 2.48108  (2.59189)
     | > loss_disc_real_0: 0.19212  (0.19230)
     | > loss_disc_real_1: 0.22196  (0.21720)
     | > loss_disc_real_2: 0.24878  (0.24848)
     | > loss_disc_real_3: 0.24995  (0.23833)
     | > loss_disc_real_4: 0.23241  (0.23091)
     | > loss_disc_real_5: 0.28718  (0.25439)
     | > loss_0: 2.48108  (2.59189)
     | > grad_norm_0: 4.98374  (12.13727)
     | > loss_gen: 2.03466  (2.07509)
     | > loss_kl: 1.50284  (1.43298)
     | > loss_feat: 3.00578  (2.61468)
     | > loss_mel: 24.40717  (23.08818)
     | > loss_duration: 1.77048  (1.73444)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 32.72093  (30.94538)
     | > grad_norm_1: 298.11560  (221.15942)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.71440  (0.77570)
     | > loader_time: 0.00920  (0.01079)


 > EVALUATION





 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.73580 (+0.24951)
     | > avg_loss_disc: 2.68321 (+0.16009)
     | > avg_loss_disc_real_0: 0.37707 (+0.20622)
     | > avg_loss_disc_real_1: 0.20320 (+0.06548)
     | > avg_loss_disc_real_2: 0.17747 (-0.17718)
     | > avg_loss_disc_real_3: 0.22540 (+0.02270)
     | > avg_loss_disc_real_4: 0.27237 (+0.03247)
     | > avg_loss_disc_real_5: 0.25734 (-0.01871)
     | > avg_loss_0: 2.68321 (+0.16009)
     | > avg_loss_gen: 2.35306 (+0.18675)
     | > avg_loss_kl: 2.10691 (+0.53038)
     | > avg_loss_feat: 2.63327 (-0.45608)
     | > avg_loss_mel: 21.77051 (-2.37150)
     | > avg_loss_duration: 1.60706 (-0.04119)
     | > avg_loss_1: 30.47082 (-2.15164)


 > EPOCH: 74/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 22:01:39)



   --> STEP: 15/26 -- GLOBAL_STEP: 4150
     | > loss_disc: 2.63030  (2.61081)
     | > loss_disc_real_0: 0.13635  (0.19268)
     | > loss_disc_real_1: 0.16298  (0.19907)
     | > loss_disc_real_2: 0.26095  (0.24158)
     | > loss_disc_real_3: 0.25414  (0.23837)
     | > loss_disc_real_4: 0.19860  (0.23086)
     | > loss_disc_real_5: 0.23228  (0.24456)
     | > loss_0: 2.63030  (2.61081)
     | > grad_norm_0: 16.67233  (12.44837)
     | > loss_gen: 2.22943  (2.04213)
     | > loss_kl: 1.18496  (1.37862)
     | > loss_feat: 2.10694  (2.53948)
     | > loss_mel: 22.27370  (23.26569)
     | > loss_duration: 1.78300  (1.73263)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 29.57802  (30.95856)
     | > grad_norm_1: 231.54710  (205.65102)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.71550  (0.78444)
     | > loader_time: 0.00680  (0.00790)


 > EVALUATION





 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.57218 (-0.16363)
     | > avg_loss_disc: 2.87080 (+0.18759)
     | > avg_loss_disc_real_0: 0.49145 (+0.11438)
     | > avg_loss_disc_real_1: 0.45498 (+0.25178)
     | > avg_loss_disc_real_2: 0.24041 (+0.06294)
     | > avg_loss_disc_real_3: 0.31781 (+0.09241)
     | > avg_loss_disc_real_4: 0.31232 (+0.03996)
     | > avg_loss_disc_real_5: 0.32375 (+0.06641)
     | > avg_loss_0: 2.87080 (+0.18759)
     | > avg_loss_gen: 3.00326 (+0.65020)
     | > avg_loss_kl: 1.00153 (-1.10538)
     | > avg_loss_feat: 2.72473 (+0.09146)
     | > avg_loss_mel: 23.93572 (+2.16521)
     | > avg_loss_duration: 1.65810 (+0.05104)
     | > avg_loss_1: 32.32334 (+1.85252)


 > EPOCH: 75/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 22:02:12)



   --> STEP: 14/26 -- GLOBAL_STEP: 4175
     | > loss_disc: 2.61249  (2.83338)
     | > loss_disc_real_0: 0.20214  (0.21401)
     | > loss_disc_real_1: 0.29627  (0.28045)
     | > loss_disc_real_2: 0.19459  (0.24423)
     | > loss_disc_real_3: 0.19402  (0.24596)
     | > loss_disc_real_4: 0.24822  (0.26100)
     | > loss_disc_real_5: 0.24899  (0.25564)
     | > loss_0: 2.61249  (2.83338)
     | > grad_norm_0: 7.67314  (16.98594)
     | > loss_gen: 1.89267  (1.92040)
     | > loss_kl: 1.17837  (1.33700)
     | > loss_feat: 2.18766  (2.11243)
     | > loss_mel: 23.33668  (23.35501)
     | > loss_duration: 1.80255  (1.72900)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 30.39792  (30.45383)
     | > grad_norm_1: 203.48413  (112.46461)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.71430  (0.74183)
     | > loader_time: 0.01000  (0.00979)


 > EVALUATION





 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.51188 (-0.06029)
     | > avg_loss_disc: 2.96045 (+0.08965)
     | > avg_loss_disc_real_0: 0.47019 (-0.02126)
     | > avg_loss_disc_real_1: 0.25848 (-0.19650)
     | > avg_loss_disc_real_2: 0.15931 (-0.08109)
     | > avg_loss_disc_real_3: 0.26923 (-0.04857)
     | > avg_loss_disc_real_4: 0.27702 (-0.03530)
     | > avg_loss_disc_real_5: 0.19230 (-0.13146)
     | > avg_loss_0: 2.96045 (+0.08965)
     | > avg_loss_gen: 2.01328 (-0.98998)
     | > avg_loss_kl: 1.34244 (+0.34091)
     | > avg_loss_feat: 2.49986 (-0.22487)
     | > avg_loss_mel: 24.07465 (+0.13892)
     | > avg_loss_duration: 1.58231 (-0.07579)
     | > avg_loss_1: 31.51254 (-0.81080)


 > EPOCH: 76/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 22:02:47)



   --> STEP: 13/26 -- GLOBAL_STEP: 4200
     | > loss_disc: 2.58608  (2.62063)
     | > loss_disc_real_0: 0.26051  (0.19889)
     | > loss_disc_real_1: 0.21765  (0.20849)
     | > loss_disc_real_2: 0.20778  (0.23753)
     | > loss_disc_real_3: 0.24782  (0.23827)
     | > loss_disc_real_4: 0.22304  (0.24464)
     | > loss_disc_real_5: 0.23437  (0.25167)
     | > loss_0: 2.58608  (2.62063)
     | > grad_norm_0: 13.48545  (15.88427)
     | > loss_gen: 1.98825  (2.01771)
     | > loss_kl: 1.26888  (1.44615)
     | > loss_feat: 2.50701  (2.43759)
     | > loss_mel: 21.93085  (22.27311)
     | > loss_duration: 1.85600  (1.71863)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 29.55098  (29.89319)
     | > grad_norm_1: 54.16677  (210.18364)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.86330  (0.76197)
     | > loader_time: 0.00830  (0.00809)


 > EVALUATION





 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.51648 (+0.00459)
     | > avg_loss_disc: 2.66918 (-0.29127)
     | > avg_loss_disc_real_0: 0.32162 (-0.14856)
     | > avg_loss_disc_real_1: 0.19695 (-0.06153)
     | > avg_loss_disc_real_2: 0.17936 (+0.02005)
     | > avg_loss_disc_real_3: 0.27217 (+0.00294)
     | > avg_loss_disc_real_4: 0.21167 (-0.06535)
     | > avg_loss_disc_real_5: 0.24718 (+0.05489)
     | > avg_loss_0: 2.66918 (-0.29127)
     | > avg_loss_gen: 2.13878 (+0.12550)
     | > avg_loss_kl: 1.22526 (-0.11718)
     | > avg_loss_feat: 2.19794 (-0.30192)
     | > avg_loss_mel: 22.96083 (-1.11382)
     | > avg_loss_duration: 1.60347 (+0.02116)
     | > avg_loss_1: 30.12628 (-1.38626)


 > EPOCH: 77/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 22:03:21)



   --> STEP: 12/26 -- GLOBAL_STEP: 4225
     | > loss_disc: 2.68349  (2.57652)
     | > loss_disc_real_0: 0.25611  (0.20964)
     | > loss_disc_real_1: 0.30742  (0.20932)
     | > loss_disc_real_2: 0.21479  (0.23663)
     | > loss_disc_real_3: 0.18483  (0.22990)
     | > loss_disc_real_4: 0.23102  (0.23496)
     | > loss_disc_real_5: 0.24321  (0.23755)
     | > loss_0: 2.68349  (2.57652)
     | > grad_norm_0: 32.64149  (19.70645)
     | > loss_gen: 2.13106  (2.23164)
     | > loss_kl: 1.21510  (1.44281)
     | > loss_feat: 2.62116  (2.89957)
     | > loss_mel: 25.92388  (24.39054)
     | > loss_duration: 1.73107  (1.70958)
     | > amp_scaler: 1024.00000  (640.00000)
     | > loss_1: 33.62226  (32.67414)
     | > grad_norm_1: 203.13208  (209.59027)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.83000  (0.79696)
     | > loader_time: 0.00650  (0.01002)


 > EVALUATION





 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.72635 (+0.20987)
     | > avg_loss_disc: 2.67334 (+0.00416)
     | > avg_loss_disc_real_0: 0.11777 (-0.20386)
     | > avg_loss_disc_real_1: 0.36876 (+0.17181)
     | > avg_loss_disc_real_2: 0.25529 (+0.07593)
     | > avg_loss_disc_real_3: 0.14584 (-0.12633)
     | > avg_loss_disc_real_4: 0.18465 (-0.02702)
     | > avg_loss_disc_real_5: 0.26754 (+0.02035)
     | > avg_loss_0: 2.67334 (+0.00416)
     | > avg_loss_gen: 1.81941 (-0.31937)
     | > avg_loss_kl: 1.53009 (+0.30483)
     | > avg_loss_feat: 2.52925 (+0.33131)
     | > avg_loss_mel: 22.34519 (-0.61565)
     | > avg_loss_duration: 1.62018 (+0.01671)
     | > avg_loss_1: 29.84412 (-0.28216)


 > EPOCH: 78/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 22:03:55)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 11/26 -- GLOBAL_STEP: 4250
     | > loss_disc: 2.55643  (2.62310)
     | > loss_disc_real_0: 0.18200  (0.18750)
     | > loss_disc_real_1: 0.19740  (0.25945)
     | > loss_disc_real_2: 0.20011  (0.22252)
     | > loss_disc_real_3: 0.22018  (0.24038)
     | > loss_disc_real_4: 0.20922  (0.21428)
     | > loss_disc_real_5: 0.19078  (0.23199)
     | > loss_0: 2.55643  (2.62310)
     | > grad_norm_0: 9.27125  (9.57806)
     | > loss_gen: 2.04195  (1.95287)
     | > loss_kl: 1.56208  (1.48826)
     | > loss_feat: 2.88297  (2.31166)
     | > loss_mel: 22.37083  (22.82500)
     | > loss_duration: 1.68883  (1.70112)
     | > amp_scaler: 1024.00000  (1024.00000)
     | > loss_1: 30.54666  (30.27891)
     | > grad_norm_1: 84.38572  (146.63977)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.71440  (0.84844)
     | > loader_time: 0.00780  (0.01073)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.75512 (+0.02877)
     | > avg_loss_disc: 2.33551 (-0.33783)
     | > avg_loss_disc_real_0: 0.30300 (+0.18523)
     | > avg_loss_disc_real_1: 0.18225 (-0.18651)
     | > avg_loss_disc_real_2: 0.17145 (-0.08384)
     | > avg_loss_disc_real_3: 0.22409 (+0.07825)
     | > avg_loss_disc_real_4: 0.20921 (+0.02456)
     | > avg_loss_disc_real_5: 0.19806 (-0.06948)
     | > avg_loss_0: 2.33551 (-0.33783)
     | > avg_loss_gen: 2.47367 (+0.65427)
     | > avg_loss_kl: 1.49421 (-0.03588)
     | > avg_loss_feat: 3.89711 (+1.36785)
     | > avg_loss_mel: 23.92540 (+1.58021)
     | > avg_loss_duration: 1.60984 (-0.01034)
     | > avg_loss_1: 33.40022 (+3.55610)


 > EPOCH: 79/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 22:04:29)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 10/26 -- GLOBAL_STEP: 4275
     | > loss_disc: 2.62158  (2.58897)
     | > loss_disc_real_0: 0.11189  (0.16899)
     | > loss_disc_real_1: 0.23686  (0.21056)
     | > loss_disc_real_2: 0.20199  (0.23345)
     | > loss_disc_real_3: 0.20810  (0.23005)
     | > loss_disc_real_4: 0.24397  (0.23582)
     | > loss_disc_real_5: 0.23946  (0.24742)
     | > loss_0: 2.62158  (2.58897)
     | > grad_norm_0: 16.62246  (13.08378)
     | > loss_gen: 2.08393  (2.03550)
     | > loss_kl: 1.33499  (1.21483)
     | > loss_feat: 2.59730  (2.52120)
     | > loss_mel: 21.80479  (22.19403)
     | > loss_duration: 1.69706  (1.69334)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 29.51808  (29.65891)
     | > grad_norm_1: 354.07227  (157.46431)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.73460  (0.75774)
     | > loader_time: 0.00890  (0.01132)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.53045 (-0.22467)
     | > avg_loss_disc: 2.24586 (-0.08965)
     | > avg_loss_disc_real_0: 0.10144 (-0.20156)
     | > avg_loss_disc_real_1: 0.13749 (-0.04475)
     | > avg_loss_disc_real_2: 0.15946 (-0.01199)
     | > avg_loss_disc_real_3: 0.17225 (-0.05184)
     | > avg_loss_disc_real_4: 0.13946 (-0.06975)
     | > avg_loss_disc_real_5: 0.23239 (+0.03433)
     | > avg_loss_0: 2.24586 (-0.08965)
     | > avg_loss_gen: 1.88825 (-0.58542)
     | > avg_loss_kl: 1.87103 (+0.37683)
     | > avg_loss_feat: 4.13990 (+0.24279)
     | > avg_loss_mel: 24.34988 (+0.42449)
     | > avg_loss_duration: 1.62099 (+0.01116)
     | > avg_loss_1: 33.87007 (+0.46985)


 > EPOCH: 80/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 22:05:04)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 9/26 -- GLOBAL_STEP: 4300
     | > loss_disc: 2.65050  (2.60130)
     | > loss_disc_real_0: 0.11176  (0.19172)
     | > loss_disc_real_1: 0.18469  (0.22298)
     | > loss_disc_real_2: 0.20671  (0.23716)
     | > loss_disc_real_3: 0.25356  (0.24509)
     | > loss_disc_real_4: 0.17123  (0.23642)
     | > loss_disc_real_5: 0.22728  (0.24658)
     | > loss_0: 2.65050  (2.60130)
     | > grad_norm_0: 18.08053  (12.36123)
     | > loss_gen: 1.86204  (2.05873)
     | > loss_kl: 1.26262  (1.42940)
     | > loss_feat: 2.62540  (2.63927)
     | > loss_mel: 24.02004  (23.45692)
     | > loss_duration: 1.64858  (1.69733)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 31.41868  (31.28166)
     | > grad_norm_1: 102.22977  (204.58980)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.72020  (0.73352)
     | > loader_time: 0.00690  (0.00728)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.51212 (-0.01833)
     | > avg_loss_disc: 2.68643 (+0.44057)
     | > avg_loss_disc_real_0: 0.38185 (+0.28041)
     | > avg_loss_disc_real_1: 0.22520 (+0.08771)
     | > avg_loss_disc_real_2: 0.27921 (+0.11975)
     | > avg_loss_disc_real_3: 0.28214 (+0.10988)
     | > avg_loss_disc_real_4: 0.18238 (+0.04293)
     | > avg_loss_disc_real_5: 0.27387 (+0.04148)
     | > avg_loss_0: 2.68643 (+0.44057)
     | > avg_loss_gen: 2.48796 (+0.59971)
     | > avg_loss_kl: 1.55222 (-0.31882)
     | > avg_loss_feat: 2.87440 (-1.26550)
     | > avg_loss_mel: 21.87126 (-2.47862)
     | > avg_loss_duration: 1.60795 (-0.01304)
     | > avg_loss_1: 30.39380 (-3.47627)


 > EPOCH: 81/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 22:05:38)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 8/26 -- GLOBAL_STEP: 4325
     | > loss_disc: 2.74150  (2.66024)
     | > loss_disc_real_0: 0.41976  (0.21985)
     | > loss_disc_real_1: 0.23651  (0.21113)
     | > loss_disc_real_2: 0.23692  (0.25068)
     | > loss_disc_real_3: 0.21097  (0.23031)
     | > loss_disc_real_4: 0.24488  (0.24028)
     | > loss_disc_real_5: 0.23682  (0.23437)
     | > loss_0: 2.74150  (2.66024)
     | > grad_norm_0: 26.12265  (20.57246)
     | > loss_gen: 1.99594  (2.05831)
     | > loss_kl: 1.46389  (1.47320)
     | > loss_feat: 2.54897  (2.57553)
     | > loss_mel: 22.36042  (22.84357)
     | > loss_duration: 1.75632  (1.68470)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 30.12553  (30.63531)
     | > grad_norm_1: 259.73297  (213.64517)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.82110  (0.75405)
     | > loader_time: 0.00630  (0.00910)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.51191 (-0.00021)
     | > avg_loss_disc: 2.84177 (+0.15534)
     | > avg_loss_disc_real_0: 0.50409 (+0.12224)
     | > avg_loss_disc_real_1: 0.20741 (-0.01780)
     | > avg_loss_disc_real_2: 0.35944 (+0.08022)
     | > avg_loss_disc_real_3: 0.20738 (-0.07476)
     | > avg_loss_disc_real_4: 0.29314 (+0.11076)
     | > avg_loss_disc_real_5: 0.27097 (-0.00290)
     | > avg_loss_0: 2.84177 (+0.15534)
     | > avg_loss_gen: 2.70555 (+0.21759)
     | > avg_loss_kl: 1.34994 (-0.20228)
     | > avg_loss_feat: 2.74869 (-0.12572)
     | > avg_loss_mel: 23.43608 (+1.56482)
     | > avg_loss_duration: 1.60196 (-0.00599)
     | > avg_loss_1: 31.84221 (+1.44842)


 > EPOCH: 82/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 22:06:13)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 7/26 -- GLOBAL_STEP: 4350
     | > loss_disc: 2.66156  (2.73478)
     | > loss_disc_real_0: 0.13112  (0.23448)
     | > loss_disc_real_1: 0.26611  (0.22309)
     | > loss_disc_real_2: 0.20850  (0.24952)
     | > loss_disc_real_3: 0.29084  (0.26111)
     | > loss_disc_real_4: 0.20044  (0.24600)
     | > loss_disc_real_5: 0.22147  (0.24838)
     | > loss_0: 2.66156  (2.73478)
     | > grad_norm_0: 12.51545  (22.46390)
     | > loss_gen: 1.95481  (2.00676)
     | > loss_kl: 1.39805  (1.33342)
     | > loss_feat: 2.92008  (2.46160)
     | > loss_mel: 24.60902  (22.90809)
     | > loss_duration: 1.68255  (1.67877)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 32.56452  (30.38864)
     | > grad_norm_1: 174.77899  (221.75197)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.88800  (0.85339)
     | > loader_time: 0.01750  (0.01419)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.75039 (+0.23848)
     | > avg_loss_disc: 2.35610 (-0.48567)
     | > avg_loss_disc_real_0: 0.19399 (-0.31010)
     | > avg_loss_disc_real_1: 0.17027 (-0.03713)
     | > avg_loss_disc_real_2: 0.21585 (-0.14359)
     | > avg_loss_disc_real_3: 0.23484 (+0.02747)
     | > avg_loss_disc_real_4: 0.16332 (-0.12982)
     | > avg_loss_disc_real_5: 0.21197 (-0.05900)
     | > avg_loss_0: 2.35610 (-0.48567)
     | > avg_loss_gen: 2.29861 (-0.40694)
     | > avg_loss_kl: 1.83079 (+0.48085)
     | > avg_loss_feat: 4.17453 (+1.42585)
     | > avg_loss_mel: 25.85674 (+2.42066)
     | > avg_loss_duration: 1.59668 (-0.00528)
     | > avg_loss_1: 35.75735 (+3.91513)


 > EPOCH: 83/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 22:06:46)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 6/26 -- GLOBAL_STEP: 4375
     | > loss_disc: 2.57077  (2.57427)
     | > loss_disc_real_0: 0.15225  (0.18558)
     | > loss_disc_real_1: 0.19476  (0.22142)
     | > loss_disc_real_2: 0.26666  (0.24627)
     | > loss_disc_real_3: 0.20019  (0.24905)
     | > loss_disc_real_4: 0.24729  (0.22245)
     | > loss_disc_real_5: 0.23437  (0.23037)
     | > loss_0: 2.57077  (2.57427)
     | > grad_norm_0: 7.10327  (6.87411)
     | > loss_gen: 1.93712  (2.04377)
     | > loss_kl: 1.29444  (1.48093)
     | > loss_feat: 2.43555  (2.56914)
     | > loss_mel: 23.40388  (23.91964)
     | > loss_duration: 1.72350  (1.68606)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 30.79450  (31.69954)
     | > grad_norm_1: 245.66823  (212.90263)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.72140  (0.82185)
     | > loader_time: 0.00620  (0.01137)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.51572 (-0.23466)
     | > avg_loss_disc: 2.53455 (+0.17845)
     | > avg_loss_disc_real_0: 0.32944 (+0.13546)
     | > avg_loss_disc_real_1: 0.23535 (+0.06508)
     | > avg_loss_disc_real_2: 0.44011 (+0.22426)
     | > avg_loss_disc_real_3: 0.26466 (+0.02982)
     | > avg_loss_disc_real_4: 0.20987 (+0.04655)
     | > avg_loss_disc_real_5: 0.21138 (-0.00059)
     | > avg_loss_0: 2.53455 (+0.17845)
     | > avg_loss_gen: 2.78935 (+0.49074)
     | > avg_loss_kl: 1.17594 (-0.65484)
     | > avg_loss_feat: 3.44683 (-0.72770)
     | > avg_loss_mel: 24.57789 (-1.27885)
     | > avg_loss_duration: 1.57739 (-0.01929)
     | > avg_loss_1: 33.56741 (-2.18994)


 > EPOCH: 84/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 22:07:21)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 5/26 -- GLOBAL_STEP: 4400
     | > loss_disc: 2.53200  (2.63475)
     | > loss_disc_real_0: 0.18082  (0.18891)
     | > loss_disc_real_1: 0.17862  (0.17903)
     | > loss_disc_real_2: 0.31463  (0.24508)
     | > loss_disc_real_3: 0.27038  (0.26216)
     | > loss_disc_real_4: 0.19308  (0.23766)
     | > loss_disc_real_5: 0.21425  (0.23519)
     | > loss_0: 2.53200  (2.63475)
     | > grad_norm_0: 5.53058  (12.22151)
     | > loss_gen: 2.02317  (2.00669)
     | > loss_kl: 1.57773  (1.65898)
     | > loss_feat: 3.23601  (2.65294)
     | > loss_mel: 22.97779  (22.75528)
     | > loss_duration: 1.60501  (1.67063)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 31.41970  (30.74452)
     | > grad_norm_1: 208.14069  (190.68413)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.72470  (0.73407)
     | > loader_time: 0.01000  (0.00877)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.51055 (-0.00517)
     | > avg_loss_disc: 2.68148 (+0.14693)
     | > avg_loss_disc_real_0: 0.12025 (-0.20919)
     | > avg_loss_disc_real_1: 0.19971 (-0.03565)
     | > avg_loss_disc_real_2: 0.18753 (-0.25257)
     | > avg_loss_disc_real_3: 0.25089 (-0.01378)
     | > avg_loss_disc_real_4: 0.25474 (+0.04486)
     | > avg_loss_disc_real_5: 0.33058 (+0.11920)
     | > avg_loss_0: 2.68148 (+0.14693)
     | > avg_loss_gen: 1.85929 (-0.93005)
     | > avg_loss_kl: 1.57453 (+0.39858)
     | > avg_loss_feat: 1.70212 (-1.74472)
     | > avg_loss_mel: 19.58298 (-4.99491)
     | > avg_loss_duration: 1.59975 (+0.02236)
     | > avg_loss_1: 26.31866 (-7.24874)

 > BEST MODEL : /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052/best_model_4421.pth

 > EPOCH: 85/100
 --> /cont



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 4/26 -- GLOBAL_STEP: 4425
     | > loss_disc: 2.55362  (2.65997)
     | > loss_disc_real_0: 0.21031  (0.19621)
     | > loss_disc_real_1: 0.17181  (0.18639)
     | > loss_disc_real_2: 0.25321  (0.26331)
     | > loss_disc_real_3: 0.24057  (0.23043)
     | > loss_disc_real_4: 0.20245  (0.20084)
     | > loss_disc_real_5: 0.28146  (0.31081)
     | > loss_0: 2.55362  (2.65997)
     | > grad_norm_0: 5.33518  (8.02307)
     | > loss_gen: 2.09601  (1.97518)
     | > loss_kl: 1.48978  (1.50620)
     | > loss_feat: 2.34584  (2.14694)
     | > loss_mel: 23.30715  (22.76163)
     | > loss_duration: 1.64862  (1.67943)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 30.88740  (30.06938)
     | > grad_norm_1: 201.57292  (205.79506)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.89220  (0.84191)
     | > loader_time: 0.00860  (0.01085)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.69079 (+0.18024)
     | > avg_loss_disc: 2.32599 (-0.35548)
     | > avg_loss_disc_real_0: 0.11778 (-0.00247)
     | > avg_loss_disc_real_1: 0.13505 (-0.06466)
     | > avg_loss_disc_real_2: 0.19362 (+0.00608)
     | > avg_loss_disc_real_3: 0.22152 (-0.02937)
     | > avg_loss_disc_real_4: 0.15727 (-0.09747)
     | > avg_loss_disc_real_5: 0.21501 (-0.11557)
     | > avg_loss_0: 2.32599 (-0.35548)
     | > avg_loss_gen: 2.02417 (+0.16488)
     | > avg_loss_kl: 1.45295 (-0.12158)
     | > avg_loss_feat: 3.67037 (+1.96826)
     | > avg_loss_mel: 25.35003 (+5.76705)
     | > avg_loss_duration: 1.57858 (-0.02117)
     | > avg_loss_1: 34.07610 (+7.75743)


 > EPOCH: 86/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 22:08:45)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 3/26 -- GLOBAL_STEP: 4450
     | > loss_disc: 2.73357  (2.64768)
     | > loss_disc_real_0: 0.16042  (0.20013)
     | > loss_disc_real_1: 0.22380  (0.21963)
     | > loss_disc_real_2: 0.21889  (0.24047)
     | > loss_disc_real_3: 0.24114  (0.24787)
     | > loss_disc_real_4: 0.24781  (0.23302)
     | > loss_disc_real_5: 0.24714  (0.24933)
     | > loss_0: 2.73357  (2.64768)
     | > grad_norm_0: 9.76865  (19.06503)
     | > loss_gen: 2.08631  (2.00650)
     | > loss_kl: 1.46944  (1.29318)
     | > loss_feat: 2.43288  (2.44368)
     | > loss_mel: 23.90615  (23.35807)
     | > loss_duration: 1.72521  (1.69195)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 31.61999  (30.79339)
     | > grad_norm_1: 349.45062  (203.17197)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.73420  (0.74216)
     | > loader_time: 0.00800  (0.00717)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.78722 (+0.09644)
     | > avg_loss_disc: 2.77138 (+0.44539)
     | > avg_loss_disc_real_0: 0.62703 (+0.50925)
     | > avg_loss_disc_real_1: 0.09199 (-0.04306)
     | > avg_loss_disc_real_2: 0.17693 (-0.01669)
     | > avg_loss_disc_real_3: 0.13208 (-0.08944)
     | > avg_loss_disc_real_4: 0.19691 (+0.03964)
     | > avg_loss_disc_real_5: 0.22853 (+0.01353)
     | > avg_loss_0: 2.77138 (+0.44539)
     | > avg_loss_gen: 2.47227 (+0.44810)
     | > avg_loss_kl: 1.63986 (+0.18691)
     | > avg_loss_feat: 3.46335 (-0.20703)
     | > avg_loss_mel: 23.60316 (-1.74687)
     | > avg_loss_duration: 1.60044 (+0.02186)
     | > avg_loss_1: 32.77908 (-1.29702)


 > EPOCH: 87/100
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

 > TRAINING (2023-04-28 22:09:19)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 2/26 -- GLOBAL_STEP: 4475
     | > loss_disc: 2.80187  (2.72892)
     | > loss_disc_real_0: 0.05157  (0.14879)
     | > loss_disc_real_1: 0.23043  (0.20798)
     | > loss_disc_real_2: 0.29633  (0.28668)
     | > loss_disc_real_3: 0.27071  (0.23952)
     | > loss_disc_real_4: 0.26710  (0.26203)
     | > loss_disc_real_5: 0.22844  (0.24981)
     | > loss_0: 2.80187  (2.72892)
     | > grad_norm_0: 48.42089  (29.98845)
     | > loss_gen: 2.14925  (1.95829)
     | > loss_kl: 1.32784  (1.40874)
     | > loss_feat: 2.98293  (2.43525)
     | > loss_mel: 22.75992  (22.60836)
     | > loss_duration: 1.67891  (1.66913)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 30.89884  (30.07978)
     | > grad_norm_1: 108.81158  (195.44504)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.86220  (0.86872)
     | > loader_time: 0.00610  (0.00589)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.53185 (-0.25537)
     | > avg_loss_disc: 2.39250 (-0.37888)
     | > avg_loss_disc_real_0: 0.39684 (-0.23019)
     | > avg_loss_disc_real_1: 0.16672 (+0.07473)
     | > avg_loss_disc_real_2: 0.28322 (+0.10629)
     | > avg_loss_disc_real_3: 0.18692 (+0.05485)
     | > avg_loss_disc_real_4: 0.22320 (+0.02629)
     | > avg_loss_disc_real_5: 0.18837 (-0.04017)
     | > avg_loss_0: 2.39250 (-0.37888)
     | > avg_loss_gen: 2.72349 (+0.25122)
     | > avg_loss_kl: 1.39662 (-0.24323)
     | > avg_loss_feat: 4.06127 (+0.59792)
     | > avg_loss_mel: 24.62743 (+1.02427)
     | > avg_loss_duration: 1.58107 (-0.01937)
     | > avg_loss_1: 34.38988 (+1.61081)


[4m[1m > EPOCH: 88/100[0m
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

[1m > TRAINING (2023-04-28 22:09:54) [



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 1/26 -- GLOBAL_STEP: 4500
     | > loss_disc: 2.55344  (2.55344)
     | > loss_disc_real_0: 0.32209  (0.32209)
     | > loss_disc_real_1: 0.20255  (0.20255)
     | > loss_disc_real_2: 0.29564  (0.29564)
     | > loss_disc_real_3: 0.26552  (0.26552)
     | > loss_disc_real_4: 0.24449  (0.24449)
     | > loss_disc_real_5: 0.25192  (0.25192)
     | > loss_0: 2.55344  (2.55344)
     | > grad_norm_0: 17.62689  (17.62689)
     | > loss_gen: 1.96916  (1.96916)
     | > loss_kl: 1.62504  (1.62504)
     | > loss_feat: 2.69691  (2.69691)
     | > loss_mel: 24.17413  (24.17413)
     | > loss_duration: 1.67627  (1.67627)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 32.14152  (32.14152)
     | > grad_norm_1: 192.56920  (192.56920)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.79500  (0.79502)
     | > loader_time: 0.00860  (0.00864)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.51411 (-0.01774)
     | > avg_loss_disc: 2.80608 (+0.41358)
     | > avg_loss_disc_real_0: 0.41940 (+0.02256)
     | > avg_loss_disc_real_1: 0.13753 (-0.02919)
     | > avg_loss_disc_real_2: 0.15537 (-0.12785)
     | > avg_loss_disc_real_3: 0.32346 (+0.13654)
     | > avg_loss_disc_real_4: 0.18637 (-0.03683)
     | > avg_loss_disc_real_5: 0.21070 (+0.02233)
     | > avg_loss_0: 2.80608 (+0.41358)
     | > avg_loss_gen: 1.98668 (-0.73681)
     | > avg_loss_kl: 1.48832 (+0.09169)
     | > avg_loss_feat: 2.93724 (-1.12403)
     | > avg_loss_mel: 22.69785 (-1.92958)
     | > avg_loss_duration: 1.57954 (-0.00153)
     | > avg_loss_1: 30.68964 (-3.70025)


[4m[1m > EPOCH: 89/100[0m
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

[1m > TRAINING (2023-04-28 22:10:29) [



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 0/26 -- GLOBAL_STEP: 4525
     | > loss_disc: 2.60459  (2.60459)
     | > loss_disc_real_0: 0.27905  (0.27905)
     | > loss_disc_real_1: 0.11987  (0.11987)
     | > loss_disc_real_2: 0.16366  (0.16366)
     | > loss_disc_real_3: 0.25410  (0.25410)
     | > loss_disc_real_4: 0.15145  (0.15145)
     | > loss_disc_real_5: 0.22246  (0.22246)
     | > loss_0: 2.60459  (2.60459)
     | > grad_norm_0: 19.84483  (19.84483)
     | > loss_gen: 2.37062  (2.37062)
     | > loss_kl: 1.79795  (1.79795)
     | > loss_feat: 3.73139  (3.73139)
     | > loss_mel: 26.03210  (26.03210)
     | > loss_duration: 1.67697  (1.67697)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 35.60903  (35.60903)
     | > grad_norm_1: 261.12384  (261.12384)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 1.03280  (1.03281)
     | > loader_time: 0.69570  (0.69571)


   --> STEP: 25/26 -- GLOBAL_STEP: 4550
     | > loss_disc: 1.65279  (2.53558)
 



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.50424 (-0.00987)
     | > avg_loss_disc: 2.21330 (-0.59279)
     | > avg_loss_disc_real_0: 0.18011 (-0.23928)
     | > avg_loss_disc_real_1: 0.13979 (+0.00227)
     | > avg_loss_disc_real_2: 0.22952 (+0.07415)
     | > avg_loss_disc_real_3: 0.16639 (-0.15707)
     | > avg_loss_disc_real_4: 0.22116 (+0.03479)
     | > avg_loss_disc_real_5: 0.14495 (-0.06575)
     | > avg_loss_0: 2.21330 (-0.59279)
     | > avg_loss_gen: 2.33194 (+0.34525)
     | > avg_loss_kl: 1.40680 (-0.08152)
     | > avg_loss_feat: 4.19636 (+1.25912)
     | > avg_loss_mel: 24.43856 (+1.74071)
     | > avg_loss_duration: 1.55734 (-0.02220)
     | > avg_loss_1: 33.93099 (+3.24136)


[4m[1m > EPOCH: 90/100[0m
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

[1m > TRAINING (2023-04-28 22:11:04) [



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 24/26 -- GLOBAL_STEP: 4575
     | > loss_disc: 2.53818  (2.55339)
     | > loss_disc_real_0: 0.16365  (0.18407)
     | > loss_disc_real_1: 0.19726  (0.20997)
     | > loss_disc_real_2: 0.23785  (0.24377)
     | > loss_disc_real_3: 0.23654  (0.23614)
     | > loss_disc_real_4: 0.24164  (0.23680)
     | > loss_disc_real_5: 0.22397  (0.24879)
     | > loss_0: 2.53818  (2.55339)
     | > grad_norm_0: 6.42669  (9.11595)
     | > loss_gen: 2.07075  (2.10883)
     | > loss_kl: 1.41657  (1.38604)
     | > loss_feat: 2.76965  (2.77715)
     | > loss_mel: 24.20816  (23.00885)
     | > loss_duration: 1.67465  (1.67399)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 32.13977  (30.95487)
     | > grad_norm_1: 127.92777  (183.65826)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.83320  (0.75815)
     | > loader_time: 0.00690  (0.00867)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.78954 (+0.28530)
     | > avg_loss_disc: 2.79357 (+0.58027)
     | > avg_loss_disc_real_0: 0.21685 (+0.03674)
     | > avg_loss_disc_real_1: 0.19473 (+0.05493)
     | > avg_loss_disc_real_2: 0.30057 (+0.07105)
     | > avg_loss_disc_real_3: 0.33873 (+0.17234)
     | > avg_loss_disc_real_4: 0.39854 (+0.17737)
     | > avg_loss_disc_real_5: 0.27268 (+0.12773)
     | > avg_loss_0: 2.79357 (+0.58027)
     | > avg_loss_gen: 2.22070 (-0.11123)
     | > avg_loss_kl: 1.62727 (+0.22047)
     | > avg_loss_feat: 2.48603 (-1.71033)
     | > avg_loss_mel: 23.97853 (-0.46003)
     | > avg_loss_duration: 1.58152 (+0.02418)
     | > avg_loss_1: 31.89405 (-2.03695)


[4m[1m > EPOCH: 91/100[0m
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

[1m > TRAINING (2023-04-28 22:11:38) [



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 23/26 -- GLOBAL_STEP: 4600
     | > loss_disc: 2.75606  (2.58816)
     | > loss_disc_real_0: 0.39459  (0.20424)
     | > loss_disc_real_1: 0.21277  (0.20553)
     | > loss_disc_real_2: 0.27250  (0.23909)
     | > loss_disc_real_3: 0.24156  (0.24326)
     | > loss_disc_real_4: 0.24383  (0.24399)
     | > loss_disc_real_5: 0.25989  (0.24259)
     | > loss_0: 2.75606  (2.58816)
     | > grad_norm_0: 30.79561  (14.80457)
     | > loss_gen: 1.95835  (2.11704)
     | > loss_kl: 1.45772  (1.28229)
     | > loss_feat: 2.39134  (2.79721)
     | > loss_mel: 22.91502  (23.57904)
     | > loss_duration: 1.62706  (1.67808)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 30.34948  (31.45365)
     | > grad_norm_1: 113.73142  (164.20828)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.81230  (0.77360)
     | > loader_time: 0.01000  (0.01121)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.53385 (-0.25570)
     | > avg_loss_disc: 2.52187 (-0.27170)
     | > avg_loss_disc_real_0: 0.28557 (+0.06872)
     | > avg_loss_disc_real_1: 0.21403 (+0.01931)
     | > avg_loss_disc_real_2: 0.14900 (-0.15157)
     | > avg_loss_disc_real_3: 0.22360 (-0.11512)
     | > avg_loss_disc_real_4: 0.20150 (-0.19703)
     | > avg_loss_disc_real_5: 0.23390 (-0.03878)
     | > avg_loss_0: 2.52187 (-0.27170)
     | > avg_loss_gen: 2.30070 (+0.08000)
     | > avg_loss_kl: 1.82465 (+0.19738)
     | > avg_loss_feat: 3.77497 (+1.28894)
     | > avg_loss_mel: 23.82950 (-0.14903)
     | > avg_loss_duration: 1.58854 (+0.00701)
     | > avg_loss_1: 33.31835 (+1.42430)


[4m[1m > EPOCH: 92/100[0m
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

[1m > TRAINING (2023-04-28 22:12:12) [



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 22/26 -- GLOBAL_STEP: 4625
     | > loss_disc: 2.72721  (2.60304)
     | > loss_disc_real_0: 0.15327  (0.20111)
     | > loss_disc_real_1: 0.26751  (0.20824)
     | > loss_disc_real_2: 0.30000  (0.24008)
     | > loss_disc_real_3: 0.21242  (0.23799)
     | > loss_disc_real_4: 0.27873  (0.24029)
     | > loss_disc_real_5: 0.23891  (0.24651)
     | > loss_0: 2.72721  (2.60304)
     | > grad_norm_0: 12.86373  (9.73291)
     | > loss_gen: 1.87078  (2.04222)
     | > loss_kl: 1.54189  (1.35280)
     | > loss_feat: 1.98084  (2.61808)
     | > loss_mel: 20.75407  (22.55016)
     | > loss_duration: 1.65588  (1.66990)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 27.80345  (30.23316)
     | > grad_norm_1: 93.64819  (191.55394)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.72160  (0.75950)
     | > loader_time: 0.00590  (0.00835)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.50795 (-0.02590)
     | > avg_loss_disc: 2.47441 (-0.04745)
     | > avg_loss_disc_real_0: 0.11446 (-0.17112)
     | > avg_loss_disc_real_1: 0.27492 (+0.06089)
     | > avg_loss_disc_real_2: 0.29615 (+0.14715)
     | > avg_loss_disc_real_3: 0.21078 (-0.01283)
     | > avg_loss_disc_real_4: 0.23204 (+0.03054)
     | > avg_loss_disc_real_5: 0.24845 (+0.01455)
     | > avg_loss_0: 2.47441 (-0.04745)
     | > avg_loss_gen: 2.29785 (-0.00286)
     | > avg_loss_kl: 1.61673 (-0.20793)
     | > avg_loss_feat: 3.21422 (-0.56075)
     | > avg_loss_mel: 22.12717 (-1.70232)
     | > avg_loss_duration: 1.57166 (-0.01688)
     | > avg_loss_1: 30.82762 (-2.49073)


[4m[1m > EPOCH: 93/100[0m
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

[1m > TRAINING (2023-04-28 22:12:48) [



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 21/26 -- GLOBAL_STEP: 4650
     | > loss_disc: 2.57789  (2.61631)
     | > loss_disc_real_0: 0.10565  (0.19218)
     | > loss_disc_real_1: 0.19981  (0.21698)
     | > loss_disc_real_2: 0.21662  (0.24483)
     | > loss_disc_real_3: 0.21084  (0.23740)
     | > loss_disc_real_4: 0.21941  (0.23396)
     | > loss_disc_real_5: 0.26631  (0.25253)
     | > loss_0: 2.57789  (2.61631)
     | > grad_norm_0: 17.43215  (12.35150)
     | > loss_gen: 1.95066  (2.05912)
     | > loss_kl: 1.28324  (1.26482)
     | > loss_feat: 2.41510  (2.61705)
     | > loss_mel: 22.14908  (23.00301)
     | > loss_duration: 1.69002  (1.67253)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 29.48811  (30.61653)
     | > grad_norm_1: 243.49158  (202.69708)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.68960  (0.75712)
     | > loader_time: 0.00570  (0.00827)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.51275 (+0.00480)
     | > avg_loss_disc: 2.69334 (+0.21893)
     | > avg_loss_disc_real_0: 0.18675 (+0.07229)
     | > avg_loss_disc_real_1: 0.23058 (-0.04434)
     | > avg_loss_disc_real_2: 0.23673 (-0.05942)
     | > avg_loss_disc_real_3: 0.25512 (+0.04434)
     | > avg_loss_disc_real_4: 0.24296 (+0.01092)
     | > avg_loss_disc_real_5: 0.24656 (-0.00189)
     | > avg_loss_0: 2.69334 (+0.21893)
     | > avg_loss_gen: 1.91005 (-0.38779)
     | > avg_loss_kl: 1.23055 (-0.38618)
     | > avg_loss_feat: 2.09467 (-1.11955)
     | > avg_loss_mel: 21.67835 (-0.44883)
     | > avg_loss_duration: 1.57807 (+0.00641)
     | > avg_loss_1: 28.49168 (-2.33594)


[4m[1m > EPOCH: 94/100[0m
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

[1m > TRAINING (2023-04-28 22:13:22) [



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 20/26 -- GLOBAL_STEP: 4675
     | > loss_disc: 2.52715  (2.59135)
     | > loss_disc_real_0: 0.15509  (0.19472)
     | > loss_disc_real_1: 0.20652  (0.20677)
     | > loss_disc_real_2: 0.27470  (0.23794)
     | > loss_disc_real_3: 0.24868  (0.23669)
     | > loss_disc_real_4: 0.29137  (0.23146)
     | > loss_disc_real_5: 0.23616  (0.25362)
     | > loss_0: 2.52715  (2.59135)
     | > grad_norm_0: 7.03549  (12.21600)
     | > loss_gen: 2.12279  (2.04813)
     | > loss_kl: 1.03211  (1.33699)
     | > loss_feat: 2.73563  (2.62910)
     | > loss_mel: 22.37065  (22.70294)
     | > loss_duration: 1.67120  (1.66727)
     | > amp_scaler: 512.00000  (512.00000)
     | > loss_1: 29.93239  (30.38443)
     | > grad_norm_1: 292.92688  (191.10423)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 0.68880  (0.76427)
     | > loader_time: 0.00600  (0.01027)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 4
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 46
 | > Avg text length: 66.5
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 80022.0
 | > Avg audio length: 80022.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.79953 (+0.28678)
     | > avg_loss_disc: 2.35082 (-0.34252)
     | > avg_loss_disc_real_0: 0.07744 (-0.10931)
     | > avg_loss_disc_real_1: 0.15084 (-0.07974)
     | > avg_loss_disc_real_2: 0.22542 (-0.01131)
     | > avg_loss_disc_real_3: 0.24327 (-0.01184)
     | > avg_loss_disc_real_4: 0.18859 (-0.05437)
     | > avg_loss_disc_real_5: 0.26229 (+0.01574)
     | > avg_loss_0: 2.35082 (-0.34252)
     | > avg_loss_gen: 2.07280 (+0.16275)
     | > avg_loss_kl: 1.50264 (+0.27209)
     | > avg_loss_feat: 3.32922 (+1.23456)
     | > avg_loss_mel: 22.67395 (+0.99561)
     | > avg_loss_duration: 1.56327 (-0.01479)
     | > avg_loss_1: 31.14190 (+2.65022)


[4m[1m > EPOCH: 95/100[0m
 --> /content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052

[1m > TRAINING (2023-04-28 22:13:56) [



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: es-es
		| > phoneme backend: gruut
	| > 6 not found characters:
	| > ɡ
	| > θ
	| > ͡
	| > ʃ
	| > ʝ
	| > ɾ
| > Number of instances : 401
 | > Preprocessing samples
 | > Max text length: 94
 | > Min text length: 20
 | > Avg text length: 61.45885286783042
 | 
 | > Max audio length: 80022.0
 | > Min audio length: 64166.0
 | > Avg audio length: 79982.45885286783
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
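
The evaluation losses in the log above move up and down from epoch to epoch, which is hard to follow by eye. A minimal sketch, assuming the log text has been saved or copied as a list of lines, that pulls each `avg_*` metric and its change out of one `EVAL PERFORMANCE` block. The helper name `parse_eval_block` is hypothetical and not part of the TTS library; the regular expressions only assume the line shape visible in this notebook, and they tolerate terminal color codes that raw exports often leave in the text.

```python
import re

# Matches terminal color codes such as "\x1b[92m"; the escape byte is optional
# because some notebook exports keep only the "[92m" part.
ANSI_RE = re.compile(r"\x1b?\[\d+m")
# Matches lines shaped like "| > avg_loss_mel: 22.67395 (+0.99561)".
METRIC_RE = re.compile(
    r"\|\s*>\s*(avg_\w+):\s*([-+]?\d+\.\d+)\s*\(([-+]\d+\.\d+)\)"
)


def parse_eval_block(lines):
    """Return {metric_name: (value, delta)} for one EVAL PERFORMANCE block."""
    metrics = {}
    for line in lines:
        clean = ANSI_RE.sub("", line)
        match = METRIC_RE.search(clean)
        if match:
            name, value, delta = match.groups()
            metrics[name] = (float(value), float(delta))
    return metrics


# Sample lines taken from the last evaluation shown in the log;
# the first two keep color-code residue to show that it is ignored.
sample = [
    "     | > avg_loss_mel:[91m 22.67395 [0m(+0.99561)",
    "     | > avg_loss_duration:[92m 1.56327 [0m(-0.01479)",
    "     | > avg_loss_1: 31.14190 (+2.65022)",
]
metrics = parse_eval_block(sample)
print(metrics["avg_loss_mel"])
```

Collecting these dictionaries per epoch into a `pandas.DataFrame` would make it straightforward to plot `avg_loss_mel` over time and judge whether the run is still improving.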


In [6]:
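import locale

# Colab workaround: report UTF-8 as the preferred encoding so the `!tts`
# shell command below does not fail with a "UTF-8 locale is required" error.
locale.getpreferredencoding = lambda do_setlocale=True: "UTF-8"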

In [7]:
## Test the model: synthesize one sentence with the best checkpoint of the run above

!tts --text "Hola soy gustavo petro y esta es mi voz ja ja ja" \
      --model_path "/content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052/best_model.pth" \
      --config_path "/content/drive2/MyDrive/tts-ai/VITS-es-1-April-28-2023_09+16PM-7712052/config.json" \
      --out_path output_test.wav


 > Using model: vits
 > Setting up Audio Processor...
 | > sample_rate:16000
 | > resample:False
 | > num_mels:80
 | > log_func:np.log10
 | > min_level_db:0
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:None
 | > fft_size:1024
 | > power:None
 | > preemphasis:0.0
 | > griffin_lim_iters:None
 | > signal_norm:None
 | > symmetric_norm:None
 | > mel_fmin:0
 | > mel_fmax:None
 | > pitch_fmin:None
 | > pitch_fmax:None
 | > spec_gain:20.0
 | > stft_pad_mode:reflect
 | > max_norm:1.0
 | > clip_norm:True
 | > do_trim_silence:False
 | > trim_db:60
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:None
 | > base:10
 | > hop_length:256
 | > win_length:1024
 > Text: Hola soy gustavo petro y esta es mi voz ja ja ja
 > Text splitted to sentences.
['Hola soy gustavo petro y esta es mi voz ja ja ja']
['<BLNK>', 'o', '<BLNK>', 'l', '<BLNK>', 'a', '<BLNK>', ' ', '<BLNK>', 's', '<BLNK>', 'o', '
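
The token list above alternates `<BLNK>` with phoneme symbols, which matches the `add_blank: True` setting in the training log. The same log repeatedly reports "6 not found characters" (ɡ, θ, ͡, ʃ, ʝ, ɾ), meaning those symbols are absent from the model vocabulary. The following is a plain-Python sketch of the behaviour these messages suggest, not the TTS library's own code: symbols missing from the vocabulary are discarded, then a blank token is placed around every remaining symbol. The function name `tokenize_sketch` and the tiny vocabulary are assumptions made for illustration.

```python
def tokenize_sketch(phonemes, vocab, blank="<BLNK>"):
    """Drop symbols missing from `vocab`, then surround each kept symbol with blanks."""
    kept, not_found = [], []
    for symbol in phonemes:
        if symbol in vocab:
            kept.append(symbol)
        elif symbol not in not_found:
            not_found.append(symbol)
    # Interleave: blank, symbol, blank, symbol, ..., blank
    tokens = [blank] * (2 * len(kept) + 1)
    tokens[1::2] = kept
    return tokens, not_found


# Hypothetical vocabulary that lacks the IPA symbol "ɾ",
# one of the six characters reported as not found in the log.
vocab = set("aeioup ")
tokens, not_found = tokenize_sketch("peɾo", vocab)
print(tokens)
print(not_found)
```

If the library behaves this way, every tap "ɾ" in the training text is silently lost, so adding the six missing symbols to the character set in the model config before training is likely worth considering.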

In [8]:
from IPython.display import Audio, display

# Play the synthesized file inline in the notebook
wn = Audio('output_test.wav')
display(wn)
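
Before listening, it can help to confirm that the generated file has the sample rate the model reports (`sample_rate:16000` in the output above) and a plausible duration. For reference, the training clips top out at 80022 samples, roughly 5 seconds at 16 kHz. A minimal sketch using only the standard library, assuming `output_test.wav` is a plain PCM WAV file; the helper name `wav_info` is hypothetical.

```python
import os
import wave


def wav_info(path):
    """Return (sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
        frames = wav_file.getnframes()
    return rate, frames / rate


# Only inspect the file if the synthesis cell above has produced it.
if os.path.exists("output_test.wav"):
    rate, seconds = wav_info("output_test.wav")
    print(f"sample rate: {rate} Hz, duration: {seconds:.2f} s")
```

A sample rate other than 16000 Hz here would suggest that a different config was loaded than the one used for training.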