**Fine-tune or train a new VITS model**

Script composed from work by the [Coqui TTS](https://github.com/coqui-ai/TTS) contributors, the OpenAI team, and the YouTube channels https://www.youtube.com/c/ThorstenMueller and https://www.youtube.com/c/NanoNomad

**Run this cell to connect your Google Drive account to save files.**

In [None]:
from google.colab import drive

drive.mount("/content/drive")

**Set paths and then run the next cell**

ds_name is the dataset directory (will be created).

output_directory is the training output directory, a subdirectory of ds_name (will be created). Several later cells hard-code "traineroutput", so keep the default unless you also edit those cells.

upload_dir is where your samples are stored (will be created).

MODEL_FILE is the default path to the VITS model downloaded using Coqui (no need to change it).

RUN_NAME is a short name describing your training run.

In [None]:
import os

ds_name = "me2-dataset" #@param {type:"string"}
output_directory = "traineroutput" #@param {type:"string"}
upload_dir = "upload" #@param {type:"string"}
MODEL_FILE = "/root/.local/share/tts/tts_models--fi--css10--vits/model_file.pth.tar" #@param {type:"string"}
upload_dir = "/content/drive/MyDrive/" + upload_dir
RUN_NAME = "VITS-fi" #@param {type:"string"}


OUT_PATH = "/content/drive/MyDrive/"+ds_name+"/traineroutput/"
# -p creates parent directories and avoids an error when the directory already exists
!mkdir -p $upload_dir
!mkdir -p /content/drive/MyDrive/$ds_name/wavs/
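
The same directory layout can also be created from plain Python, which is safe to re-run. This is a minimal sketch: `make_dataset_dirs` and its `drive_root` parameter are illustrative names, not part of the original notebook, and the sketch also creates the trainer output directory up front.

```python
from pathlib import Path

def make_dataset_dirs(drive_root, ds_name, upload_dir, output_directory="traineroutput"):
    """Create the upload, dataset, wavs and trainer-output directories.

    Returns a dict of the created paths. Safe to call repeatedly.
    """
    root = Path(drive_root)
    paths = {
        "upload": root / upload_dir,
        "dataset": root / ds_name,
        "wavs": root / ds_name / "wavs",
        "output": root / ds_name / output_directory,
    }
    for p in paths.values():
        # parents=True builds missing parents; exist_ok=True makes re-runs harmless
        p.mkdir(parents=True, exist_ok=True)
    return paths
```

In Colab this would be called with the Drive mount point and the short directory names, for example `make_dataset_dirs("/content/drive/MyDrive", "me2-dataset", "upload")`.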

**Set run type.**

continue resumes an interrupted session.

restore begins a new session from the default model file above (downloaded from the Coqui Hub by the download cell later on).

restore-ckpt begins a new session from a prior fine-tuned checkpoint. You can set this later on in the training section in Part 2.

new starts training without loading any checkpoint.

In [None]:
run_type = "restore-ckpt" #@param ["continue","restore","restore-ckpt","new"]
print(run_type + " run selected")

**Download and Build Rnnoise (https://github.com/xiph/rnnoise) and Requirements**

In [None]:
#@title
!pip install pyloudnorm
!git clone https://github.com/xiph/rnnoise.git
!sudo apt-get install -y curl autoconf automake libtool python-dev pkg-config sox ffmpeg
%cd /content/rnnoise
!sh autogen.sh
!sh configure
!make clean
!make

**Install Sox, Install OpenAI Whisper STT+Translation (https://github.com/openai/whisper)**

In [None]:
#@title
%cd /content
!sudo apt install -y sox
!git clone https://github.com/openai/whisper.git
!pip install git+https://github.com/openai/whisper.git 


**Install Coqui TTS** (https://github.com/coqui-ai/TTS) and the espeak-ng phonemizer (https://github.com/espeak-ng/espeak-ng), and download the Coqui TTS source and examples from GitHub.
Currently set to force install Coqui Trainer==0.0.20

In [None]:
#@title
%cd /content
!sudo apt-get install -y espeak-ng
!git clone https://github.com/coqui-ai/TTS.git
!pip install TTS
!pip install Trainer==0.0.20

**(Optional) List pretrained models available on the Coqui Hub**

In [None]:
!tts --list_models

**Audio Preprocessing Options**

Recommended: leave all set to "True".

run_denoise: run Xiph's rnnoise on the samples.

run_splits: split samples at silences of 0.2 seconds, then force-split any remaining long segments into 8-second pieces. Click "view code" and find the 'sox' lines to change these intervals.

use_audio_filter: apply a 50 Hz highpass filter and an 8000 Hz lowpass filter. Click "view code" and find the 'sox' lines to change these frequencies if needed.

normalize_audio: apply -6 dB peak and -25 LUFS loudness normalization.

In [None]:
run_denoise = "True" #@param ["True", "False"]
run_splits = "True" #@param ["True", "False"]
use_audio_filter = "True" #@param ["True", "False"]
normalize_audio = "True" #@param ["True", "False"]
#start_sil_dur = 0.2 #@param {type:"number"}
#end_sil_dur = 0.2 #@param {type:"number"}
#sample_max = 8 #@param {type:"number"}
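
As a rough illustration of what the normalization option does: moving a clip to a target level amounts to multiplying every sample by one gain factor. The sketch below shows only that arithmetic and assumes the level has already been measured (the notebook uses `pyloudnorm` for the measurement). `gain_for_target` is an illustrative helper, not a library function.

```python
def gain_for_target(measured_db, target_db):
    """Linear gain that moves a signal from measured_db to target_db.

    Works for any level expressed in dB (peak dBFS or integrated LUFS):
    a change of X dB corresponds to an amplitude factor of 10 ** (X / 20).
    """
    return 10.0 ** ((target_db - measured_db) / 20.0)

# A clip measured at -19 LUFS needs about half its amplitude to reach -25 LUFS.
gain = gain_for_target(-19.0, -25.0)
print(round(gain, 3))  # 0.501
```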


**Process**
This section converts the mp3 and wav files in upload_dir to 22050 Hz mono wav files, then passes them through rnnoise.

A 50 Hz highpass filter and an 8000 Hz lowpass filter are applied, the gain is reduced to limit potential clipping, and the audio is normalized (-6 dB peak, -25 LUFS). These settings should be fine for general-purpose use.

The output is then segmented at silences of 0.2 seconds (click "show code" below and change 0.2 in the sox line to a different silence duration if needed).

The segmented audio is passed through sox again to force-split any long segments (above 8 seconds). Files smaller than 35 KB are deleted.
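
The force-split step can be pictured as cutting each clip into consecutive windows of at most 8 seconds. A minimal sketch of that bookkeeping follows. It only computes the cut points and does not touch any audio, and `split_points` is an illustrative name rather than anything sox provides.

```python
def split_points(duration_s, max_len_s=8.0):
    """Return (start, end) pairs covering duration_s in chunks of at most max_len_s."""
    segments = []
    start = 0.0
    while start < duration_s:
        end = min(start + max_len_s, duration_s)
        segments.append((start, end))
        start = end
    return segments

print(split_points(19.5))  # [(0.0, 8.0), (8.0, 16.0), (16.0, 19.5)]
```

The last piece can be much shorter than 8 seconds, which is one reason very small files are deleted afterwards.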

In [None]:
#@title

from pathlib import Path
import os
import subprocess
import soundfile as sf
import pyloudnorm as pyln
import sys
import glob
%cd $upload_dir
#!ls -al
!rm -rf $upload_dir/22k_1ch
!mkdir $upload_dir/22k_1ch

#Convert and resample uploaded mp3/wav clips to 1 channel, 22khz
!find . -name '*.mp3' -exec bash -c 'for f; do ffmpeg -hide_banner -loglevel error -i "$f" -acodec pcm_s16le -ar 22050 -ac 1 22k_1ch/"${f%.mp3}".wav ; done' _ {} +
!find . -name '*.wav' -exec bash -c 'for f; do ffmpeg -hide_banner -loglevel error -i "$f" -acodec pcm_s16le -ar 22050 -ac 1 22k_1ch/"${f%.wav}".wav ; done' _ {} +
#!ls -al $upload_dir/22k_1ch
print("Files converted to 22khz 1ch wav")
if run_denoise=="True":
  print("Running denoise...")
  orig_wavs= upload_dir + "/22k_1ch/"
  print(orig_wavs)

  # Imports are already done at the top of this cell.
  rnn = "/content/rnnoise/examples/rnnoise_demo"
  paths = glob.glob(os.path.join(orig_wavs, '*.wav'))
  for filepath in paths:
    base = os.path.basename(filepath)
    tp_s = upload_dir + "/22k_1ch/denoise/"
    tf_s = upload_dir + "/22k_1ch/denoise/" + base
    target_path = Path(tp_s)
    target_file = Path(tf_s)
    print("From: " + str(filepath))
    print("To: " + str(target_file))
	
    # Stereo to mono; upsample to 48000 Hz
    # added -G to fix gain, -v 0.8
    subprocess.run(["sox", "-G", "-v", "0.8", filepath, "48k.wav", "remix", "-", "rate", "48000"])
    subprocess.run(["sox", "48k.wav", "-c", "1", "-r", "48000", "-b", "16", "-e", "signed-integer", "-t", "raw", "temp.raw"]) # convert wav to raw
    subprocess.run(["/content/rnnoise/examples/rnnoise_demo", "temp.raw", "rnn.raw"]) # apply rnnoise
    subprocess.run(["sox", "-G", "-v", "0.8", "-r", "48k", "-b", "16", "-e", "signed-integer", "rnn.raw", "-t", "wav", "rnn.wav"]) # convert raw back to wav

    subprocess.run(["mkdir", "-p", str(target_path)])
    if use_audio_filter=="True":
      print("Running highpass/lowpass & resample")
      subprocess.run(["sox", "rnn.wav", str(target_file), "remix", "-", "highpass", "50", "lowpass", "8000", "rate", "22050"]) 
      # apply high/low pass filter and change sr to 22050Hz
      data, rate = sf.read(target_file)
    elif use_audio_filter=="False":
      print("Running resample without filter")
      subprocess.run(["sox", "rnn.wav", str(target_file), "remix", "-", "rate", "22050"]) 
      # no filter: only change sr to 22050Hz
      data, rate = sf.read(target_file)
    # peak normalize audio to -6 dB
    if normalize_audio=="True":
      print("Output normalized")
      # NOTE: peak_normalized_audio is computed but never written; the file saved
      # below is only loudness-normalized.
      peak_normalized_audio = pyln.normalize.peak(data, -6.0)

      # measure the loudness first
      meter = pyln.Meter(rate) # create BS.1770 meter
      loudness = meter.integrated_loudness(data)

      # loudness normalize audio to -25 dB LUFS
      loudness_normalized_audio = pyln.normalize.loudness(data, loudness, -25.0)
      sf.write(target_file, data=loudness_normalized_audio, samplerate=22050)
      print("")
    elif normalize_audio=="False":
      print("File written without normalizing")
      sf.write(target_file, data=data, samplerate=22050)
      print("")

  # Temp files are written to the current directory (upload_dir), not target_path
  !rm -f 48k.wav rnn.wav temp.raw rnn.raw

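elif run_denoise=="False":
  # orig_wavs is otherwise only set in the denoise branch, which raises NameError here
  orig_wavs = upload_dir + "/22k_1ch/"
  paths = glob.glob(os.path.join(orig_wavs, '*.wav'))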
  for filepath in paths:
    print("Skipping denoise...")
    base = os.path.basename(filepath)
    tp_s = upload_dir + "/22k_1ch/denoise/"
    tf_s = upload_dir + "/22k_1ch/denoise/" + base
    target_path = Path(tp_s)
    target_file = Path(tf_s)
    print("From: " + str(filepath))
    print("To: " + str(target_file))
    subprocess.run(["sox", "-G", "-v", "0.8", filepath, "48k.wav", "remix", "-", "rate", "48000"])
    subprocess.run(["sox", "48k.wav", "-c", "1", "-r", "48000", "-b", "16", "-e", "signed-integer", "-t", "raw", "temp.raw"]) # convert wav to raw
    # rnnoise is skipped, so convert the unprocessed temp.raw (not rnn.raw) back to wav
    subprocess.run(["sox", "-G", "-v", "0.8", "-r", "48k", "-b", "16", "-e", "signed-integer", "temp.raw", "-t", "wav", "rnn.wav"])
    subprocess.run(["mkdir", "-p", str(target_path)])
    if use_audio_filter=="True":
      print("Running filter...")
      subprocess.run(["sox", "rnn.wav", str(target_file), "remix", "-", "highpass", "50", "lowpass", "8000", "rate", "22050"]) # apply high/low pass filter and change sr to 22050Hz
      data, rate = sf.read(target_file)
    elif use_audio_filter=="False":
      print("Skipping filter...")
      subprocess.run(["sox", "rnn.wav", str(target_file), "remix", "-", "rate", "22050"]) # no filter: only change sr to 22050Hz
      data, rate = sf.read(target_file)
    # peak normalize audio to -6 dB
    if normalize_audio=="True":
      print("Output normalized")
      peak_normalized_audio = pyln.normalize.peak(data, -6.0)

# measure the loudness first
      meter = pyln.Meter(rate) # create BS.1770 meter
      loudness = meter.integrated_loudness(data)

# loudness normalize audio to -25 dB LUFS
      loudness_normalized_audio = pyln.normalize.loudness(data, loudness, -25.0)
      sf.write(target_file, data=loudness_normalized_audio, samplerate=22050)
      print("")
    if normalize_audio=="False":
      print("File written without normalizing")
      sf.write(target_file, data=data, samplerate=22050)
      print("")
  # Temp files are written to the current directory (upload_dir), not target_path
  !rm -f 48k.wav rnn.wav temp.raw rnn.raw

if run_splits=="False":
  print("Copying files without splitting...")
  !mkdir -p /content/drive/MyDrive/$ds_name/wavs
  !cp $target_path/*.wav /content/drive/MyDrive/$ds_name/wavs
if run_splits=="True":
  print("Splitting output and copying...")
  %cd $target_path
  !rm -rf splits
  !mkdir splits
  !for FILE in *.wav; do sox "$FILE" splits/"$FILE" --show-progress silence 1 0.2 0.1% 1 0.2 0.1% : newfile : restart ; done
#alt split method: force splits of 8 seconds, however this will split words. Comment the above with # and remove the # below to change
#!for FILE in *.wav; do sox "$FILE" splits/"$FILE" --show-progress trim 0 8 : restart ; done
  %cd splits
  !mkdir resplit
  !for FILE in *.wav; do sox "$FILE" resplit/"$FILE" --show-progress trim 0 8 : newfile : restart ; done
  %cd resplit
  !find . -name "*.wav" -type f -size -35k -delete
  #!ls -al
  %cd /content/drive/MyDrive/$ds_name/

  !mkdir wavs
  !cp $target_path/splits/resplit/*.wav /content/drive/MyDrive/$ds_name/wavs
  %cd /content/drive/MyDrive/$ds_name/wavs
#  !ls -al
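
To get a feel for what the 35 KB size cutoff means in time, the duration of an uncompressed WAV file can be estimated from its size. This is a back-of-the-envelope sketch that assumes 16-bit mono PCM at 22050 Hz (as produced above) and a 44-byte header, which is typical but not guaranteed. `wav_duration_seconds` is an illustrative helper.

```python
def wav_duration_seconds(size_bytes, sample_rate=22050, bits_per_sample=16,
                         channels=1, header_bytes=44):
    """Approximate duration of an uncompressed PCM WAV file from its size."""
    bytes_per_second = sample_rate * channels * bits_per_sample // 8
    # Subtract the (assumed) header size; never return a negative duration.
    return max(size_bytes - header_bytes, 0) / bytes_per_second

print(round(wav_duration_seconds(35 * 1024), 2))  # 0.81
```

So the cutoff removes clips shorter than roughly 0.8 seconds.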

**Run this cell once only.**

**Load the OpenAI Whisper model into memory. It is currently set to download the medium model, which works fairly well for Finnish STT. Loading it multiple times tends to crash Colab.**

Click "show code" and swap the commented line to use a different model. See the Whisper documentation for the available models.

In [None]:
#@title
import whisper
import os, os.path
import glob
import pandas as pd

from pathlib import Path


#model = whisper.load_model("medium.en")
model = whisper.load_model("medium")

**Run Whisper on the generated audio clips and write a transcript named metadata.csv, in LJSpeech format, to the dataset directory.**

In [None]:
#@title
wavs = '/content/drive/MyDrive/'+ds_name+'/wavs'

paths = glob.glob(os.path.join(wavs, '*.wav'))
print(len(paths))

with open('/content/drive/MyDrive/'+ds_name+'/metadata.csv', 'w', encoding='utf-8') as outfile:
    for filepath in paths:
        result = model.transcribe(filepath, language="fi")
        output = result["text"].strip()
        output = output.replace("\n","")
        # File ID is the file name without its extension (splitext keeps any other dots)
        thefile = os.path.splitext(os.path.basename(filepath))[0]
        # LJSpeech format: id|text|normalized text (the same text is used for both)
        line = thefile + '|' + output + '|' + output
        outfile.write(line + '\n')
        print(line)
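
For reference, each row of an LJSpeech-style metadata.csv has three pipe-separated fields: the file ID without extension, the transcript, and a normalized transcript. The sketch below builds one such row. `ljspeech_line` is an illustrative helper, and replacing pipes and collapsing whitespace is an extra precaution that the cell above does not take.

```python
import os

def ljspeech_line(wav_path, text):
    """Build one metadata.csv row: <file id>|<text>|<normalized text>."""
    file_id = os.path.splitext(os.path.basename(wav_path))[0]
    # Pipes and newlines would break the pipe-delimited format, so replace them
    # and collapse runs of whitespace into single spaces.
    clean = " ".join(text.replace("|", " ").split())
    return f"{file_id}|{clean}|{clean}"

print(ljspeech_line("/data/wavs/clip001.wav", " Hyvää päivää.\n"))
# clip001|Hyvää päivää.|Hyvää päivää.
```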


**Display dataset**

In [None]:
#@title
!cat /content/drive/MyDrive/$ds_name/metadata.csv
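
Beyond reading the file by eye, a quick automated check can catch rows that would trip up training, such as an empty transcript (which may happen for clips that contain no speech) or an ID without a matching wav file. This is a minimal sketch. `check_metadata` is an illustrative helper, and it assumes the three-field layout written by the cell above.

```python
import os

def check_metadata(metadata_path, wav_dir):
    """Return a list of problems found in an LJSpeech-style metadata file."""
    problems = []
    with open(metadata_path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            fields = line.rstrip("\n").split("|")
            if len(fields) != 3:
                problems.append(f"line {lineno}: expected 3 fields, got {len(fields)}")
                continue
            file_id, text, _ = fields
            if not text.strip():
                problems.append(f"line {lineno}: empty transcript for {file_id}")
            if not os.path.isfile(os.path.join(wav_dir, file_id + ".wav")):
                problems.append(f"line {lineno}: missing {file_id}.wav")
    return problems
```

In the notebook this could be called as `check_metadata('/content/drive/MyDrive/'+ds_name+'/metadata.csv', '/content/drive/MyDrive/'+ds_name+'/wavs')`. An empty list means no problems were found.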

**Download the VITS model and generate a sample wav file at /content/ljspeech-vits.wav. This file will be deleted when your Colab session is closed.**

In [None]:
!tts --text "Terve. Olen malli. Minua koulutetaan nyt." --model_name "tts_models/fi/css10/vits" --out_path /content/ljspeech-vits.wav

**Load Tensorboard**

In [None]:
import torch 
%load_ext tensorboard

**Load Dashboard**
The dashboard may take several minutes to appear; until then it shows a blank white box. Ad blockers may need to whitelist Colab domains for it to work.

In [None]:
%tensorboard --logdir /content/drive/MyDrive/$ds_name/$output_directory/

**Load libs**

In [None]:
from trainer import Trainer, TrainerArgs

from TTS.tts.configs.shared_configs import BaseDatasetConfig, CharactersConfig
from TTS.tts.configs.vits_config import VitsConfig
from TTS.tts.datasets import load_tts_samples
from TTS.tts.models.vits import Vits, VitsAudioConfig
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.utils.audio import AudioProcessor


In [None]:
output_path = "/content/drive/MyDrive/"+ds_name + "/" + output_directory + "/"
SKIP_TRAIN_EPOCH = False

In [None]:
dataset_config = BaseDatasetConfig(
    # os.path.join discards output_path when the second argument is absolute, so pass the dataset path directly
    formatter="ljspeech", meta_file_train="metadata.csv", language="fi", path="/content/drive/MyDrive/" + ds_name
)

In [None]:
audio_config = VitsAudioConfig(
    sample_rate=22050, win_length=1024, hop_length=256, num_mels=80, mel_fmin=0, mel_fmax=None
)

config = VitsConfig(
    audio=audio_config,
    run_name="vits_ljspeech_oma",
    batch_size=16,
    eval_batch_size=16,
    batch_group_size=16,
#   num_loader_workers=8,
    num_loader_workers=2,
    num_eval_loader_workers=2,
    run_eval=True,
    test_delay_epochs=-1,
    epochs=1000,
    save_step=1000,
    save_checkpoints=True,
    save_n_checkpoints=4,
    save_best_after=1000,
    #text_cleaner="english_cleaners",
    text_cleaner="multilingual_cleaners",
    use_phonemes=True,
    phoneme_language="fi",
    phoneme_cache_path=os.path.join(output_path, "phoneme_cache"),
    characters=CharactersConfig(
      characters_class="TTS.tts.utils.text.characters.Graphemes",
      vocab_dict=None,
      pad="<PAD>",
      eos="<EOS>",
      bos="<BOS>",
      blank="<BLNK>",
      characters="abcdefghijklmnopqrstuvwxyzˈŋˌːøɪɡ\u00af\u00b7\u00df\u00e0\u00e1\u00e2\u00e3\u00e4\u00e6\u00e7\u00e8\u00e9\u00ea\u00eb\u00ec\u00ed\u00ee\u00ef\u00f1\u00f2\u00f3\u00f4\u00f5\u00f6\u00f9\u00fa\u00fb\u00fc\u00ff\u0101\u0105\u0107\u0113\u0119\u011b\u012b\u0131\u0142\u0144\u014d\u0151\u0153\u015b\u016b\u0171\u017a\u017c\u01ce\u01d0\u01d2\u01d4\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044a\u044b\u044c\u044d\u044e\u044f\u0451\u0454\u0456\u0457\u0491",
      punctuations="!'(),-.:;? ",
      phonemes=None,
      is_unique=True,
      is_sorted=True
    ),
    compute_input_seq_cache=True,
    print_step=25,
    print_eval=True,
    mixed_precision=True,
    output_path=output_path,
    datasets=[dataset_config],
    cudnn_benchmark=False,
    test_sentences=[
    [
        "Sateenkaari on spektrin väreissä esiintyvä ilmakehän optinen ilmiö. Se syntyy, kun valo taittuu pisaran etupinnasta, heijastuu pisaran takapinnasta ja taittuu jälleen pisaran etupinnasta.",
        "css10",
        None,
        "fi"
    ],
    [
        "Moi, minun nimeni on Aleksanteri.",
        "css10",
        None,
        "fi"
    ],
    [
        "Tämä on outo ilmiö. En tiedä mitä tässä tapahtuu.",
        "css10",
        None,
        "fi"
    ],
    ]
)

# INITIALIZE THE AUDIO PROCESSOR
# Audio processor is used for feature extraction and audio I/O.
# It mainly serves to the dataloader and the training loggers.
ap = AudioProcessor.init_from_config(config)
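
To make the audio settings above more concrete, the sketch below converts `sample_rate`, `win_length` and `hop_length` into milliseconds and frames per second. The frame count is approximate, since the exact number depends on how the spectrogram is padded. `frame_stats` and `n_frames` are illustrative helpers.

```python
import math

def frame_stats(sample_rate=22050, win_length=1024, hop_length=256):
    """Express the STFT window and hop sizes in milliseconds and frames per second."""
    return {
        "hop_ms": 1000.0 * hop_length / sample_rate,
        "win_ms": 1000.0 * win_length / sample_rate,
        "frames_per_second": sample_rate / hop_length,
    }

def n_frames(duration_s, sample_rate=22050, hop_length=256):
    """Approximate number of spectrogram frames for a clip (one frame per hop)."""
    return math.ceil(duration_s * sample_rate / hop_length)

stats = frame_stats()
print(round(stats["hop_ms"], 2), round(stats["win_ms"], 2))  # 11.61 46.44
print(n_frames(8))  # 690
```

With these settings an 8-second clip corresponds to roughly 690 frames.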


In [None]:
# INITIALIZE THE TOKENIZER
# Tokenizer is used to convert text to sequences of token IDs.
# config is updated with the default characters if not defined in the config.
tokenizer, config = TTSTokenizer.init_from_config(config)

# LOAD DATA SAMPLES
# Each sample is a list of ```[text, audio_file_path, speaker_name]```
# You can define your custom sample loader returning the list of samples.
# Or define your custom formatter and pass it to the `load_tts_samples`.
# Check `TTS.tts.datasets.load_tts_samples` for more details.
train_samples, eval_samples = load_tts_samples(
    dataset_config,
    eval_split=True,
    eval_split_max_size=config.eval_split_max_size,
    eval_split_size=config.eval_split_size,
)
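
Once the samples are loaded, it can help to estimate how the step-based settings in the config (`save_step=1000`, `batch_size=16`) translate into epochs for a given dataset size. This is a rough sketch that assumes one training step per batch and that the last partial batch is kept. Both helpers are illustrative.

```python
import math

def steps_per_epoch(n_train_samples, batch_size=16):
    """Number of batches needed to see every training sample once."""
    return math.ceil(n_train_samples / batch_size)

def epochs_between_checkpoints(n_train_samples, batch_size=16, save_step=1000):
    """How many epochs pass between two checkpoints saved every save_step steps."""
    return save_step / steps_per_epoch(n_train_samples, batch_size)

# Example: 400 training clips give 25 steps per epoch, so a checkpoint every 40 epochs.
print(steps_per_epoch(400), epochs_between_checkpoints(400))  # 25 40.0
```

In the notebook the first argument would be `len(train_samples)`.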

In [None]:
model = Vits.init_from_config(config)

**If continuing a run: use the next cell to list all run directories.**

**Copy the name of the run you want to continue, or restore a checkpoint from, and paste it into the box after that.**

In [None]:
#@title
!ls -al /content/drive/MyDrive/$ds_name/traineroutput

**Run folder to continue from or Run folder that contains your restore checkpoint**

In [None]:
run_folder = "vits_ljspeech_oma-April-12-2023_10+34AM-0000000" #@param {type:"string"}


List the checkpoints in the run folder. A checkpoint only needs to be selected for a restore run.

A continue run takes a run directory: the Trainer loads the last best-loss checkpoint on its own, according to the config.json stored in that directory. A restore run takes a model file.

In [None]:
#@title
!ls -al /content/drive/MyDrive/$ds_name/traineroutput/$run_folder
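
Picking the newest checkpoint from that listing can also be done in code. The sketch below assumes checkpoints are named `checkpoint_<step>.pth`, as in the example filename used in this notebook, and compares step numbers numerically, because a plain alphabetical sort would rank `checkpoint_9000.pth` after `checkpoint_105000.pth`. `latest_checkpoint` is an illustrative helper.

```python
import os
import re

def latest_checkpoint(run_dir):
    """Return the checkpoint_<step>.pth file with the highest step, or None."""
    pattern = re.compile(r"^checkpoint_(\d+)\.pth$")
    best_step, best_name = -1, None
    for name in os.listdir(run_dir):
        match = pattern.match(name)
        if match and int(match.group(1)) > best_step:
            best_step, best_name = int(match.group(1)), name
    return best_name
```

The returned file name could then be pasted into (or assigned to) `ckpt_file`.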

**If changing to a different "restore" checkpoint to begin a new training session with a model you are already training, set the checkpoint filename here**

In [None]:
ckpt_file = "checkpoint_105000.pth" #@param {type:"string"}
print(ckpt_file + " selected for restore run")
if run_type=="continue":
  print("Warning:\n restore checkpoint selected, but run type set to continue.\nTrainer will load best loss from checkpoint directory.\n Are you sure this is what you want to do?\n\nIf not, change the run type below to 'restore'")
elif run_type=="restore-ckpt":
  print("Warning:\n restore checkpoint selected, run type set to restore from selected checkpoint, not default base model.\nIf this is not correct, adjust the run type.")


**Last chance to change run type**

In [None]:
run_type = "restore-ckpt" #@param ["continue","restore","restore-ckpt","new"]
print(run_type + " run selected")

**(Session recovery: Reset selected model file back to default predownloaded path)**

In [None]:
#@title /root/.local/share/tts/tts_models--fi--css10--vits/model_file.pth.tar
ckpt_file = "/root/.local/share/tts/tts_models--fi--css10--vits/model_file.pth.tar"
print(ckpt_file + " selected for restore run")

**(Optional) Freeze selected modules. The Trainer must be reinitialized if these are changed.**

In [None]:
print("Current reinit_text_encoder value: " + str(config.model_args.reinit_text_encoder))
reinit_te_status = "False" #@param ["False", "True"]
if reinit_te_status=="False":
  print("Text encoder will not be reinitialized")
elif reinit_te_status=="True":
  config.model_args.reinit_text_encoder=True
  print("Model arguments set to reinitialize text encoder")
# Dedented so the current value is always printed, not only when the text encoder is reinitialized
print("Current reinit_DP value: " + str(config.model_args.reinit_DP))
reinit_DP_status = "False" #@param ["False", "True"]
if reinit_DP_status=="False":
  print("DP will not be reinitialized")
elif reinit_DP_status=="True":
  config.model_args.reinit_DP=True
  print("Model arguments set to reinitialize DP")
print("Current freeze_waveform_decoder value: " + str(config.model_args.freeze_waveform_decoder))
freeze_waveform_decoder_status = "False" #@param ["False", "True"]
if freeze_waveform_decoder_status=="False":
  print("Waveform decoder will NOT be frozen")
  config.model_args.freeze_waveform_decoder=False
elif freeze_waveform_decoder_status=="True":
  config.model_args.freeze_waveform_decoder=True
  print("Waveform decoder FROZEN")
print("Current freeze_flow_decoder value: " + str(config.model_args.freeze_flow_decoder))
freeze_flow_decoder_status = "False" #@param ["False", "True"]
if freeze_flow_decoder_status=="False":
  print("Flow decoder will NOT be frozen")
  # Use real booleans, consistent with the other freeze options
  config.model_args.freeze_flow_decoder=False
elif freeze_flow_decoder_status=="True":
  config.model_args.freeze_flow_decoder=True
  print("Flow decoder FROZEN")
print("Current freeze_encoder value: " + str(config.model_args.freeze_encoder))
freeze_encoder_status = "False" #@param ["False", "True"]
if freeze_encoder_status=="False":
  print("Text encoder will NOT be frozen")
  config.model_args.freeze_encoder=False
elif freeze_encoder_status=="True":
  config.model_args.freeze_encoder=True
  print("Text encoder FROZEN")
print("Current freeze_DP value: " + str(config.model_args.freeze_DP))
freeze_DP_status = "False" #@param ["False", "True"]
if freeze_DP_status=="False":
  print("Duration predictor will NOT be frozen")
  config.model_args.freeze_DP=False
elif freeze_DP_status=="True":
  config.model_args.freeze_DP=True
  print("Duration predictor FROZEN")        
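
The same pattern of toggles can be written more compactly by applying a dictionary of flags in a loop. The sketch below uses a `SimpleNamespace` as a stand-in for `config.model_args`, with the option names taken from the cell above. `apply_flags` is an illustrative helper, not part of Coqui TTS.

```python
from types import SimpleNamespace

def apply_flags(model_args, flags):
    """Set boolean attributes on model_args, rejecting unknown option names."""
    for name, value in flags.items():
        if not hasattr(model_args, name):
            raise AttributeError(f"model_args has no option {name!r}")
        setattr(model_args, name, bool(value))
    return model_args

# Stand-in for config.model_args (assumption: every option defaults to False).
demo_args = SimpleNamespace(
    reinit_text_encoder=False, reinit_DP=False, freeze_waveform_decoder=False,
    freeze_flow_decoder=False, freeze_encoder=False, freeze_DP=False,
)
apply_flags(demo_args, {"freeze_encoder": True, "freeze_DP": True})
print(demo_args.freeze_encoder, demo_args.freeze_waveform_decoder)  # True False
```

Rejecting unknown names guards against a typo silently creating a new attribute that the model never reads.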

**Init the trainer**

In [None]:
#@title
print(run_type)
if run_type=="continue":
  CONTINUE_PATH="/content/drive/MyDrive/"+ds_name+"/traineroutput/"+run_folder
  trainer = Trainer(
    TrainerArgs(continue_path=CONTINUE_PATH, skip_train_epoch=SKIP_TRAIN_EPOCH),
    config,
    output_path=OUT_PATH,
    model=model,
    train_samples=train_samples,
    eval_samples=eval_samples,
  )
elif run_type=="restore":
  trainer = Trainer(
    TrainerArgs(restore_path=MODEL_FILE, skip_train_epoch=SKIP_TRAIN_EPOCH),
    config,
    output_path=OUT_PATH,
    model=model,
    train_samples=train_samples,
    eval_samples=eval_samples,
  )
elif run_type=="restore-ckpt":
  trainer = Trainer(
    TrainerArgs(restore_path="/content/drive/MyDrive/"+ds_name+"/traineroutput/"+run_folder+"/"+ckpt_file, skip_train_epoch=SKIP_TRAIN_EPOCH),
    config,
    output_path=OUT_PATH,
    model=model,
    train_samples=train_samples,
    eval_samples=eval_samples,
  )
elif run_type=="new":
  trainer = Trainer(
    TrainerArgs(),
    config,
    output_path=OUT_PATH,
    model=model,
    train_samples=train_samples,
    eval_samples=eval_samples,
  )
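
The four run types differ only in which path, if any, is handed to the Trainer. The sketch below summarizes that mapping as a small function returning a `(continue_path, restore_path)` pair, with `None` meaning "not used". `resolve_run_paths` and its parameter names are illustrative, and the function does not call the Trainer itself.

```python
def resolve_run_paths(run_type, output_root, run_folder, ckpt_file, base_model_file):
    """Map a run type to the (continue_path, restore_path) pair it implies."""
    run_dir = output_root.rstrip("/") + "/" + run_folder
    if run_type == "continue":
        return run_dir, None                      # resume from a run directory
    if run_type == "restore":
        return None, base_model_file              # start from the base model file
    if run_type == "restore-ckpt":
        return None, run_dir + "/" + ckpt_file    # start from a chosen checkpoint
    if run_type == "new":
        return None, None                         # no checkpoint at all
    raise ValueError(f"unknown run_type: {run_type!r}")

print(resolve_run_paths("restore-ckpt", "/out", "runA", "checkpoint_105000.pth", "/base.pth"))
# (None, '/out/runA/checkpoint_105000.pth')
```

Printing the resolved pair before creating the Trainer is a cheap way to confirm that the selected run type points at the file or directory you expect.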

**Run training**

In [None]:
trainer.fit()

In [None]:
!nvidia-smi

Script to extract model without discriminator.

In [None]:
import torch
from TTS.tts.models.vits import Vits
from TTS.config import load_config
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.utils.audio import AudioProcessor

tts_checkpoint="/content/drive/MyDrive/"+ds_name+"/traineroutput/"+run_folder+"/"+ckpt_file
tts_config_path="/content/drive/MyDrive/"+ds_name+"/traineroutput/"+run_folder+"/config.json"
save_checkpoint = "/content/drive/MyDrive/"+ds_name+"/traineroutput/"+run_folder+"/"+"model.pth"

config = load_config(tts_config_path)
ap = AudioProcessor.init_from_config(config)
tokenizer, config = TTSTokenizer.init_from_config(config)

# init model
model = Vits(config, ap, tokenizer, speaker_manager=None)
model.load_checkpoint(config, tts_checkpoint, eval=True)
model.disc = None
model_state = model.state_dict()
state = {
    "model": model_state
    }
torch.save(state, save_checkpoint)
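
To confirm that the exported file no longer carries discriminator weights, the key names of the saved state can be inspected. The sketch below works on any mapping of parameter names and assumes that discriminator parameters live under a `disc.` prefix, which is inferred from the `model.disc` attribute above rather than confirmed. The key names in the demo are made up for illustration.

```python
def count_keys_with_prefix(state_dict, prefix="disc."):
    """Count entries whose parameter name starts with the given prefix."""
    return sum(1 for key in state_dict if key.startswith(prefix))

# Hypothetical parameter names, used only to demonstrate the check.
demo_state = {
    "text_encoder.emb.weight": 0,
    "waveform_decoder.conv_pre.weight": 0,
    "disc.nets.0.convs.0.weight": 0,
}
print(count_keys_with_prefix(demo_state))  # 1
```

On the real export, `count_keys_with_prefix(torch.load(save_checkpoint, map_location="cpu")["model"])` would be expected to return 0 if the discriminator was removed.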