# Application Example: Spanish to English Translation

This notebook demonstrates how to use NVIDIA NeMo (https://github.com/NVIDIA/NeMo) to build an interactive demo that translates Spanish audio files to English, as well as Spanish text to English.

## Steps:

1. **Instantiate Pre-Trained NeMo Models from NVIDIA NGC**
   - Utilize the models provided by NVIDIA NGC to start the process.

2. **Transcribe Audio with Spanish Speech Recognition Model**
   - The Spanish audio is transcribed using a speech recognition model trained on Spanish.

3. **Translate Text to English with Automatic Translation Model**
   - After transcription, the Spanish text is translated into English using an automatic translation model.
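
The steps above can be sketched as a small function that chains transcription and translation. This is a minimal sketch, not the notebook's implementation: `spanish_audio_to_english`, `fake_transcribe`, and `fake_translate` are hypothetical names, and the two stand-in functions only exist so the example runs without NeMo installed.

```python
from typing import Callable, List


def spanish_audio_to_english(
    audio_path: str,
    transcribe: Callable[[List[str]], List[str]],
    translate: Callable[[List[str]], List[str]],
) -> str:
    """Run step 2 (speech to Spanish text), then step 3 (Spanish to English)."""
    spanish_text = transcribe([audio_path])[0]
    return translate([spanish_text])[0]


# Hypothetical stand-ins for the real models (assumption: both take and return lists).
def fake_transcribe(paths: List[str]) -> List[str]:
    return ["cómo te llamas" for _ in paths]


def fake_translate(texts: List[str]) -> List[str]:
    return ["What's your name" for _ in texts]


result = spanish_audio_to_english("como-te-llamas.wav", fake_transcribe, fake_translate)
print(result)
```

With the real models loaded later in this notebook, the two callables would wrap `quartznet.transcribe(paths2audio_files=...)` and `translate.translate(..., source_lang="es", target_lang="en")`.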


### Import all the necessary packages

In [49]:
import ipywidgets as widgets
from IPython.display import display, Audio
import IPython

In [50]:
# Ignore pre-production warnings
import warnings
warnings.filterwarnings('ignore')
import nemo
# Import Speech Recognition collection
import nemo.collections.asr as nemo_asr
# Import Natural Language Processing collection
import nemo.collections.nlp as nemo_nlp
# Import Speech Synthesis collection
import nemo.collections.tts as nemo_tts
# We'll use this to listen to audio
import IPython

### Starting MLflow experiment

In [51]:
import mlflow
mlflow.set_experiment("NeMoInferencePipeline")


<Experiment: artifact_location='/phoenix/mlflow/488162201757723049', creation_time=1707186401657, experiment_id='488162201757723049', last_update_time=1707186401657, lifecycle_stage='active', name='NeMoInferencePipeline', tags={}>

### Loading from local saved modelsHere, instead of downloading the models directly from NGC via code, we are showing that we can access the models that were downloaded previously, using Ai Studio assets manager


In [52]:
quartznet = nemo_asr.models.EncDecCTCModel.restore_from(restore_path="/mnt/c/Users/ViDi132/Desktop/ModelsNGC/stt_es_quartznet15x5/stt_es_quartznet15x5.nemo").cuda()

translate = nemo_nlp.models.MTEncDecModel.restore_from(restore_path="/mnt/c/Users/ViDi132/Desktop/ModelsNGC/nmt_es_en_transformer12x2/nmt_es_en_transformer12x2.nemo").cuda()

spectrogram_generator = nemo_tts.models.FastPitchModel.restore_from(restore_path="/mnt/c/Users/ViDi132/Desktop/ModelsNGC/tts_en_fastpitch/tts_en_fastpitch_align_ipa.nemo").cuda()

vocoder = nemo_tts.models.HifiGanModel.restore_from(restore_path="/mnt/c/Users/ViDi132/Desktop/ModelsNGC/tts_en_hifigan_adapter/tts_en_hifigan_adapter.nemo").cuda()

[NeMo W 2024-02-06 09:18:52 modelPT:161] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
    Train config : 
    manifest_filepath: /raid/noneval.json
    sample_rate: 16000
    labels:
    - ' '
    - a
    - b
    - c
    - d
    - e
    - f
    - g
    - h
    - i
    - j
    - k
    - l
    - m
    - 'n'
    - o
    - p
    - q
    - r
    - s
    - t
    - u
    - v
    - w
    - x
    - 'y'
    - z
    - ''''
    - á
    - é
    - í
    - ó
    - ú
    - ñ
    - ü
    batch_size: 16
    trim_silence: true
    max_duration: 16.7
    shuffle: true
    is_tarred: false
    tarred_audio_filepaths: null
    num_workers: 8
    pin_memory: true
    
[NeMo W 2024-02-06 09:18:52 modelPT:168] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup th

[NeMo I 2024-02-06 09:18:52 features:289] PADDING: 16
[NeMo I 2024-02-06 09:18:54 save_restore_connector:249] Model EncDecCTCModel was successfully restored from /mnt/c/Users/ViDi132/Desktop/ModelsNGC/stt_es_quartznet15x5/stt_es_quartznet15x5.nemo.
[NeMo I 2024-02-06 09:19:49 tokenizer_utils:179] Getting YouTokenToMeTokenizer with model: /tmp/tmp_3z7ck45/tokenizer.32000.BPE.model with r2l: False.
[NeMo I 2024-02-06 09:19:49 tokenizer_utils:179] Getting YouTokenToMeTokenizer with model: /tmp/tmp_3z7ck45/tokenizer.32000.BPE.model with r2l: False.


[NeMo W 2024-02-06 09:19:49 modelPT:161] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
    Train config : 
    src_file_name: /raid/sharded_tarfiles_60_even/batches.tokens.16000._OP_1..302_CL_.tar
    tgt_file_name: /raid/sharded_tarfiles_60_even/batches.tokens.16000._OP_1..302_CL_.tar
    tokens_in_batch: 16000
    clean: true
    max_seq_length: 512
    cache_ids: false
    cache_data_per_node: false
    use_cache: false
    shuffle: true
    num_samples: -1
    drop_last: false
    pin_memory: false
    num_workers: 8
    load_from_cached_dataset: false
    reverse_lang_direction: true
    load_from_tarred_dataset: true
    metadata_path: /raid/sharded_tarfiles_60_even/metadata.json
    tar_shuffle_n: 100
    
[NeMo W 2024-02-06 09:19:49 modelPT:168] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validat

[NeMo I 2024-02-06 09:19:54 nlp_overrides:752] Model MTEncDecModel was successfully restored from /mnt/c/Users/ViDi132/Desktop/ModelsNGC/nmt_es_en_transformer12x2/nmt_es_en_transformer12x2.nemo.


 NeMo-text-processing :: INFO     :: Creating ClassifyFst grammars.
I0206 09:20:01.148831 140647469093312 tokenize_and_classify.py:86] Creating ClassifyFst grammars.
[NeMo W 2024-02-06 09:20:29 experimental:26] `<class 'nemo.collections.tts.g2p.models.i18n_ipa.IpaG2p'>` is experimental and not ready for production yet. Use at your own risk.
[NeMo W 2024-02-06 09:20:34 i18n_ipa:124] apply_to_oov_word=None, This means that some of words will remain unchanged if they are not handled by any of the rules in self.parse_one_word(). This may be intended if phonemes and chars are both valid inputs, otherwise, you may see unexpected deletions in your input.
[NeMo W 2024-02-06 09:20:34 experimental:26] `<class 'nemo.collections.common.tokenizers.text_to_speech.tts_tokenizers.IPATokenizer'>` is experimental and not ready for production yet. Use at your own risk.
[NeMo W 2024-02-06 09:20:34 modelPT:161] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() metho

[NeMo I 2024-02-06 09:20:34 features:289] PADDING: 1
[NeMo I 2024-02-06 09:20:34 save_restore_connector:249] Model FastPitchModel was successfully restored from /mnt/c/Users/ViDi132/Desktop/ModelsNGC/tts_en_fastpitch/tts_en_fastpitch_align_ipa.nemo.


[NeMo W 2024-02-06 09:20:43 modelPT:161] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
    Train config : 
    dataset:
      _target_: nemo.collections.tts.data.tts_dataset.VocoderDataset
      manifest_filepath: /ws/mel-dataset/Hifitts-VCTK-FP-mels-118-large/hifigan_train.json
      sample_rate: 44100
      n_segments: 16384
      max_duration: null
      min_duration: 0.75
      load_precomputed_mel: true
      hop_length: 512
    dataloader_params:
      drop_last: false
      shuffle: true
      batch_size: 40
      num_workers: 4
      pin_memory: true
    
[NeMo W 2024-02-06 09:20:43 modelPT:168] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s). 
    Validation config : 
    dataset:
      _target_: n

[NeMo I 2024-02-06 09:20:43 features:289] PADDING: 0
[NeMo I 2024-02-06 09:20:43 features:297] STFT using exact pad
[NeMo I 2024-02-06 09:20:43 features:289] PADDING: 0
[NeMo I 2024-02-06 09:20:43 features:297] STFT using exact pad
[NeMo I 2024-02-06 09:20:44 save_restore_connector:249] Model HifiGanModel was successfully restored from /mnt/c/Users/ViDi132/Desktop/ModelsNGC/tts_en_hifigan_adapter/tts_en_hifigan_adapter.nemo.
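
Each `restore_from` call above takes a hard-coded absolute path. One possible way to keep the four checkpoint locations in one place and fail early when a file is missing is sketched below. `MODEL_FILES` and `resolve_model_paths` are hypothetical helpers; the relative paths simply mirror the folder layout used in the cell above.

```python
from pathlib import Path
from typing import Dict

# Relative locations copied from the restore_from calls above.
MODEL_FILES = {
    "asr": "stt_es_quartznet15x5/stt_es_quartznet15x5.nemo",
    "nmt": "nmt_es_en_transformer12x2/nmt_es_en_transformer12x2.nemo",
    "tts": "tts_en_fastpitch/tts_en_fastpitch_align_ipa.nemo",
    "vocoder": "tts_en_hifigan_adapter/tts_en_hifigan_adapter.nemo",
}


def resolve_model_paths(base_dir: str) -> Dict[str, str]:
    """Return checkpoint paths under base_dir, raising if any file is missing."""
    base = Path(base_dir)
    paths = {name: base / rel for name, rel in MODEL_FILES.items()}
    missing = [str(p) for p in paths.values() if not p.is_file()]
    if missing:
        raise FileNotFoundError("Missing .nemo checkpoints: " + ", ".join(missing))
    return {name: str(p) for name, p in paths.items()}
```

The returned dictionary could then feed the `restore_path=` arguments, for example `restore_path=paths["asr"]`.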


### Get an audio sample in Spanish


In [53]:
# Path to a local Spanish audio sample (nothing is downloaded in this cell)
Audio_sample = 'como-te-llamas.wav'


# Listen to it
IPython.display.Audio(Audio_sample)
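
Before transcribing, it can help to inspect the file. The QuartzNet train config logged above lists `sample_rate: 16000`, so a quick check of the WAV header is a reasonable sanity test. This is a sketch using only the standard library; `wav_info` and `matches_asr_rate` are hypothetical helpers, and whether a mismatch actually matters depends on how the audio loader resamples.

```python
import wave

EXPECTED_ASR_RATE = 16000  # taken from the QuartzNet train config logged above


def wav_info(path):
    """Return (sample_rate, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / float(rate)
    return rate, channels, duration


def matches_asr_rate(path, expected_rate=EXPECTED_ASR_RATE):
    """True when the file's sample rate equals the rate the model was trained on."""
    return wav_info(path)[0] == expected_rate
```

For example, `wav_info(Audio_sample)` would report the properties of the sample used in this notebook.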

### Transcribe audio file
We use the Spanish speech recognition model (QuartzNet) to convert the audio into text.


In [54]:
# Convert our audio sample to text
files = [Audio_sample]
raw_text = ''
text = ''
for fname, transcription in zip(files, quartznet.transcribe(paths2audio_files=files)):
  raw_text = transcription


Transcribing:   0%|          | 0/1 [00:00<?, ?it/s]

In [55]:
print(raw_text)

cómo te llamas


### Translate Spanish text into EnglishNeMo's NMT models have a handy .translate() method.




In [56]:
translations = translate.translate([raw_text], source_lang="es", target_lang="en")
final = translations

In [57]:
print(final)

["What's your name"]
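
Since `.translate()` takes a list and returns a list, a thin wrapper can guard against blank transcriptions (for example, from silent audio) while keeping the output aligned with the input. This is a sketch under that assumption; `translate_nonempty` is a hypothetical helper, and `translate_fn` stands for any callable that maps a list of strings to a list of strings.

```python
def translate_nonempty(texts, translate_fn):
    """Translate only non-blank strings, keeping the output aligned with the input."""
    indexed = [(i, t) for i, t in enumerate(texts) if t.strip()]
    results = [""] * len(texts)
    if indexed:
        translated = translate_fn([t for _, t in indexed])
        for (i, _), out in zip(indexed, translated):
            results[i] = out
    return results
```

With the model above, `translate_fn` could be `lambda batch: translate.translate(batch, source_lang="es", target_lang="en")`.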


### Generate English audio from textSpeech generation from text typically has two steps:

Generate spectrogram from the text. In this example we will use FastPitch model for this.
Generate actual audio from the spectrogram. In this example we will use HifiGan model for this


In [58]:
# A helper function which combines TTS models to go directly from 
# text to audio
def text_to_audio(text):
    parsed = spectrogram_generator.parse(text)
    spectrogram = spectrogram_generator.generate_spectrogram(tokens=parsed)
    audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram)
    return audio.to('cpu').detach().numpy()
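
The array returned by `text_to_audio` can also be written to disk. Below is a minimal standard-library sketch that stores mono float samples as 16-bit PCM. `save_wav` is a hypothetical helper, and the 44100 Hz default is an assumption based on the `sample_rate: 44100` entry in the HiFi-GAN config logged earlier.

```python
import struct
import wave


def save_wav(samples, path, sample_rate=44100):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM."""
    frames = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))  # clip out-of-range values
        frames += struct.pack("<h", int(clipped * 32767))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
```

Assuming the first axis of the returned array is the batch dimension, a call might look like `save_wav(text_to_audio("Hello")[0], "hello.wav")`.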

In [59]:
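for text in translations:
    audio_output = text_to_audio(text)
    # Inside a loop, an Audio object only renders when displayed explicitly.
    # 44100 Hz matches the sample_rate in the HiFi-GAN config logged above.
    display(IPython.display.Audio(audio_output, rate=44100))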

[NeMo W 2024-02-06 09:20:45 fastpitch:291] parse() is meant to be called in eval mode.
[NeMo W 2024-02-06 09:20:45 fastpitch:368] generate_spectrogram() is meant to be called in eval mode.


In [60]:
IPython.display.Audio(audio_output, rate=44100)  # vocoder config reports sample_rate: 44100

### Register Individual Model
If you choose to register each individual model in MLflow, follow the example below:

In [14]:
import os
import mlflow
from mlflow.pyfunc import PythonModel

class NeMoWrapper(PythonModel):
    def load_context(self, context):
        # Restore the model once, from the .nemo file logged as the "model" artifact
        self.model = nemo_asr.models.EncDecCTCModel.restore_from(context.artifacts["model"])

    def predict(self, context, model_input):
        # model_input: a list of paths to audio files
        return list(self.model.transcribe(paths2audio_files=model_input))

def log_nemo_model(model_name, model_path, requirements):
    model_info = mlflow.pyfunc.log_model(
        artifact_path=model_name,
        python_model=NeMoWrapper(),
        artifacts={"model": model_path},
        pip_requirements=requirements
    )
    # URI of the model logged in this run
    return model_info.model_uri

# Save the model and get the path
model_path = "quartznet_model.nemo"
quartznet.save_to(model_path)

# Log the model
with mlflow.start_run():
    requirements = [
        "nemo_toolkit[all]=={}".format(nemo.__version__)
        # Add other specific dependencies if needed
    ]

    model_uri = log_nemo_model("quartznet_model", model_path, requirements)
    mlflow.register_model(model_uri=model_uri, name="quartznet_model")


Downloading artifacts:   0%|          | 0/1 [00:00<?, ?it/s]

Successfully registered model 'quartznet_model'.
2024/02/06 02:44:20 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. Model name: quartznet_model, version 1
Created version '1' of model 'quartznet_model'.


### Registering a NeMo Model Combination
If you choose to create an execution pipeline by combining different NeMo models, follow these steps:

In [74]:
import os
import mlflow
import nemo
import nemo.collections.asr as nemo_asr
import nemo.collections.nlp as nemo_nlp
import nemo.collections.tts as nemo_tts
from mlflow.pyfunc import PythonModel

class NeMoCombinedModel(PythonModel):
    def __init__(self, asr_model_path, nlp_model_path, tts_model_path, vocoder_model_path):
        self.asr_model_path = asr_model_path
        self.nlp_model_path = nlp_model_path
        self.tts_model_path = tts_model_path
        self.vocoder_model_path = vocoder_model_path

    def predict(self, context, model_input):
        # Load the models. Note: they are restored on every call, which makes each request slow.
        asr_model = nemo_asr.models.EncDecCTCModel.restore_from(self.asr_model_path)
        nlp_model = nemo_nlp.models.MTEncDecModel.restore_from(self.nlp_model_path)
        tts_model = nemo_tts.models.FastPitchModel.restore_from(self.tts_model_path)
        vocoder_model = nemo_tts.models.HifiGanModel.restore_from(self.vocoder_model_path)

        # Check if input is a file path (treating as audio) or text
        if isinstance(model_input, str) and os.path.isfile(model_input):
            # Process audio: Transcribe, Translate and TTS
            transcription = asr_model.transcribe(paths2audio_files=[model_input])[0]
            translation = nlp_model.translate([transcription], source_lang="es", target_lang="en")[0]
            parsed = tts_model.parse(translation)
            spectrogram = tts_model.generate_spectrogram(tokens=parsed)
            audio = vocoder_model.convert_spectrogram_to_audio(spec=spectrogram)
            return audio.to('cpu').detach().numpy().tolist()

        # Otherwise, assume it is text for translation
        else:
            # Process text: Just Translate
            translation = nlp_model.translate([model_input], source_lang="es", target_lang="en")[0]
            return translation

def log_combined_nemo_model(asr_model_path, nlp_model_path, tts_model_path, vocoder_model_path, model_name, requirements):
    combined_model = NeMoCombinedModel(asr_model_path, nlp_model_path, tts_model_path, vocoder_model_path)
    model_info = mlflow.pyfunc.log_model(
        artifact_path=model_name,
        python_model=combined_model,
        pip_requirements=requirements
    )
    # URI of the model logged in this run
    return model_info.model_uri

# Configure MLflow experiment
mlflow.set_experiment("NeMoCombinedInferencePipeline")

# Paths to saved models
asr_model_path = "/mnt/c/Users/ViDi132/Desktop/ModelsNGC/stt_es_quartznet15x5/stt_es_quartznet15x5.nemo"
nlp_model_path = "/mnt/c/Users/ViDi132/Desktop/ModelsNGC/nmt_es_en_transformer12x2/nmt_es_en_transformer12x2.nemo"
tts_model_path = "/mnt/c/Users/ViDi132/Desktop/ModelsNGC/tts_en_fastpitch/tts_en_fastpitch_align_ipa.nemo"
vocoder_model_path = "/mnt/c/Users/ViDi132/Desktop/ModelsNGC/tts_en_hifigan_adapter/tts_en_hifigan_adapter.nemo"

# Log and register the combined model
with mlflow.start_run():
    requirements = [
        "nemo_toolkit[all]=={}".format(nemo.__version__)
        # Add other specific dependencies if needed
    ]

    model_uri = log_combined_nemo_model(asr_model_path, nlp_model_path, tts_model_path, vocoder_model_path, "combined_nemo_model", requirements)
    mlflow.register_model(model_uri=model_uri, name="combined_nemo_model")


Registered model 'combined_nemo_model' already exists. Creating a new version of this model...
2024/02/06 10:49:39 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. Model name: combined_nemo_model, version 7
Created version '7' of model 'combined_nemo_model'.
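
The combined `predict` method restores four models each time it is called. One way to avoid repeating that work is to cache each model after its first load. The sketch below shows the idea with a hypothetical `LazyModels` class and a stand-in `fake_loader`, so it runs without NeMo; it is an illustration of load-once caching, not the notebook's implementation.

```python
from typing import Any, Callable, Dict


class LazyModels:
    """Load each model at most once, on first use."""

    def __init__(self, loaders: Dict[str, Callable[[], Any]]):
        self._loaders = loaders
        self._cache: Dict[str, Any] = {}

    def get(self, name: str) -> Any:
        if name not in self._cache:
            # First request for this model: run its loader and remember the result.
            self._cache[name] = self._loaders[name]()
        return self._cache[name]


# Hypothetical loader that records how many times it was called.
load_calls = []


def fake_loader():
    load_calls.append(1)
    return object()


models = LazyModels({"asr": fake_loader})
first = models.get("asr")
second = models.get("asr")  # served from the cache, no second load
print(len(load_calls))
```

In a real wrapper, each loader could be a function that calls the corresponding `restore_from(...)`. MLflow's `PythonModel.load_context` hook is another place where one-time loading can happen.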


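### Deploying a Model with Conda

Serving with `--env-manager conda` requires a `conda` executable on the PATH. In the run below it is not installed, so the command fails with `FileNotFoundError`.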



In [76]:
!mlflow models serve -m "runs:/bf40e08635044651aa66da803161e9ec/combined_nemo_model" --env-manager conda --port 5001


  value = self.callback(ctx, self, value)
Downloading artifacts: 100%|█████████████████████| 1/1 [00:00<00:00, 299.79it/s]
2024/02/06 12:20:27 INFO mlflow.models.flavor_backend_registry: Selected backend for flavor 'python_function'
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/mlflow/utils/conda.py", line 227, in get_or_create_conda_env
    process._exec_cmd([conda_path, "--help"], throw_on_error=False)
  File "/usr/local/lib/python3.10/dist-packages/mlflow/utils/process.py", line 95, in _exec_cmd
    process = subprocess.Popen(
  File "/usr/lib/python3.10/subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.10/subprocess.py", line 1863, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'conda'

During handling of the above exception, another exception occurred:

Traceback (most recent c
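
The traceback above comes from `conda` not being on the PATH. A small pre-check can pick the environment manager before building the serve command. This is a sketch only: `pick_env_manager` and `serve_command` are hypothetical helpers, the run ID in the usage note is a placeholder, and only the `-m`, `--env-manager`, and `--port` flags already used in this notebook are assumed.

```python
import shutil


def pick_env_manager(which=shutil.which):
    """Return 'conda' if a conda executable is on PATH, else 'virtualenv'."""
    return "conda" if which("conda") else "virtualenv"


def serve_command(model_uri, port=5001, which=shutil.which):
    """Build the argument list for `mlflow models serve`."""
    return [
        "mlflow", "models", "serve",
        "-m", model_uri,
        "--env-manager", pick_env_manager(which),
        "--port", str(port),
    ]
```

For example, `serve_command("runs:/<run_id>/combined_nemo_model")` returns a list that could be passed to `subprocess.run`.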

### Deploying a Model with virtualenv

In [21]:
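# `!export` only affects a throwaway subshell; %env sets the variable for the kernel and its child processes
%env MLFLOW_DISABLE_ENV_MANAGER_CONDA_WARNING=TRUE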

In [65]:
!mlflow models serve -m "runs:/bf40e08635044651aa66da803161e9ec/combined_nemo_model" --port 5001

Downloading artifacts: 100%|█████████████████████| 1/1 [00:00<00:00, 176.67it/s]
2024/02/06 09:28:44 INFO mlflow.models.flavor_backend_registry: Selected backend for flavor 'python_function'
Traceback (most recent call last):
  File "/usr/local/bin/mlflow", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1126, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1051, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1393, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-pac

### Deploying a Model with Flask

In [77]:
from flask import Flask, request, jsonify
import mlflow.pyfunc
import numpy as np
import os

app = Flask(__name__)

# Load the model
model = mlflow.pyfunc.load_model("runs:/74360ddce3c847c294c7afe28e9b5428/combined_nemo_model")

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()
    input_data = data.get('text') or data.get('audio_path')

    if not input_data:
        return jsonify({"error": "No text or audio_path provided"}), 400

    # Model prediction
    try:
        result = model.predict(input_data)

        # If input is a file path, it is audio
        if isinstance(input_data, str) and os.path.isfile(input_data):
            # Result is audio
            result = {"audio": result.tolist() if isinstance(result, np.ndarray) else result}
        else:
            # Result is translated text
            result = {"translation": result}

        return jsonify(result)
    except Exception as e:
        return jsonify({"error": str(e)}), 500

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5001)


 * Serving Flask app '__main__'
 * Debug mode: off


 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:5001
 * Running on http://10.137.137.111:5001
I0206 12:32:07.034360 140647469093312 _internal.py:96] Press CTRL+C to quit
[NeMo W 2024-02-06 12:32:23 modelPT:161] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
    Train config : 
    manifest_filepath: /raid/noneval.json
    sample_rate: 16000
    labels:
    - ' '
    - a
    - b
    - c
    - d
    - e
    - f
    - g
    - h
    - i
    - j
    - k
    - l
    - m
    - 'n'
    - o
    - p
    - q
    - r
    - s
    - t
    - u
    - v
    - w
    - x
    - 'y'
    - z
    - ''''
    - á
    - é
    - í
    - ó
    - ú
    - ñ
    - ü
    batch_size: 16
    trim_silence: true
    max_duration: 16.7
    shuffle: true
    is_tarred: false
    tarred_audio_filepaths: null
    num_workers: 8
    pin_memory: true
    
[NeMo W 2024-02-06 12:

[NeMo I 2024-02-06 12:32:23 features:289] PADDING: 16
[NeMo I 2024-02-06 12:32:27 save_restore_connector:249] Model EncDecCTCModel was successfully restored from /mnt/c/Users/ViDi132/Desktop/ModelsNGC/stt_es_quartznet15x5/stt_es_quartznet15x5.nemo.
[NeMo I 2024-02-06 12:33:28 tokenizer_utils:179] Getting YouTokenToMeTokenizer with model: /tmp/tmpq8ukw20f/tokenizer.32000.BPE.model with r2l: False.
[NeMo I 2024-02-06 12:33:28 tokenizer_utils:179] Getting YouTokenToMeTokenizer with model: /tmp/tmpq8ukw20f/tokenizer.32000.BPE.model with r2l: False.


[NeMo W 2024-02-06 12:33:28 modelPT:161] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
    Train config : 
    src_file_name: /raid/sharded_tarfiles_60_even/batches.tokens.16000._OP_1..302_CL_.tar
    tgt_file_name: /raid/sharded_tarfiles_60_even/batches.tokens.16000._OP_1..302_CL_.tar
    tokens_in_batch: 16000
    clean: true
    max_seq_length: 512
    cache_ids: false
    cache_data_per_node: false
    use_cache: false
    shuffle: true
    num_samples: -1
    drop_last: false
    pin_memory: false
    num_workers: 8
    load_from_cached_dataset: false
    reverse_lang_direction: true
    load_from_tarred_dataset: true
    metadata_path: /raid/sharded_tarfiles_60_even/metadata.json
    tar_shuffle_n: 100
    
[NeMo W 2024-02-06 12:33:28 modelPT:168] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validat

[NeMo I 2024-02-06 12:33:40 nlp_overrides:752] Model MTEncDecModel was successfully restored from /mnt/c/Users/ViDi132/Desktop/ModelsNGC/nmt_es_en_transformer12x2/nmt_es_en_transformer12x2.nemo.


 NeMo-text-processing :: INFO     :: Creating ClassifyFst grammars.
I0206 12:33:49.662055 140639308084800 tokenize_and_classify.py:86] Creating ClassifyFst grammars.
[NeMo W 2024-02-06 12:34:39 experimental:26] `<class 'nemo.collections.tts.g2p.models.i18n_ipa.IpaG2p'>` is experimental and not ready for production yet. Use at your own risk.
[NeMo W 2024-02-06 12:34:41 i18n_ipa:124] apply_to_oov_word=None, This means that some of words will remain unchanged if they are not handled by any of the rules in self.parse_one_word(). This may be intended if phonemes and chars are both valid inputs, otherwise, you may see unexpected deletions in your input.
[NeMo W 2024-02-06 12:34:41 experimental:26] `<class 'nemo.collections.common.tokenizers.text_to_speech.tts_tokenizers.IPATokenizer'>` is experimental and not ready for production yet. Use at your own risk.
[NeMo W 2024-02-06 12:34:41 modelPT:161] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() metho

[NeMo I 2024-02-06 12:34:41 features:289] PADDING: 1
[NeMo I 2024-02-06 12:34:42 save_restore_connector:249] Model FastPitchModel was successfully restored from /mnt/c/Users/ViDi132/Desktop/ModelsNGC/tts_en_fastpitch/tts_en_fastpitch_align_ipa.nemo.


[NeMo W 2024-02-06 12:34:52 modelPT:161] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
    Train config : 
    dataset:
      _target_: nemo.collections.tts.data.tts_dataset.VocoderDataset
      manifest_filepath: /ws/mel-dataset/Hifitts-VCTK-FP-mels-118-large/hifigan_train.json
      sample_rate: 44100
      n_segments: 16384
      max_duration: null
      min_duration: 0.75
      load_precomputed_mel: true
      hop_length: 512
    dataloader_params:
      drop_last: false
      shuffle: true
      batch_size: 40
      num_workers: 4
      pin_memory: true
    
[NeMo W 2024-02-06 12:34:52 modelPT:168] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s). 
    Validation config : 
    dataset:
      _target_: n

[NeMo I 2024-02-06 12:34:52 features:289] PADDING: 0
[NeMo I 2024-02-06 12:34:52 features:297] STFT using exact pad
[NeMo I 2024-02-06 12:34:52 features:289] PADDING: 0
[NeMo I 2024-02-06 12:34:52 features:297] STFT using exact pad
[NeMo I 2024-02-06 12:34:55 save_restore_connector:249] Model HifiGanModel was successfully restored from /mnt/c/Users/ViDi132/Desktop/ModelsNGC/tts_en_hifigan_adapter/tts_en_hifigan_adapter.nemo.


I0206 12:34:55.827695 140639308084800 _internal.py:96] 127.0.0.1 - - [06/Feb/2024 12:34:55] "POST /predict HTTP/1.1" 200 -
I0206 12:38:23.125063 140639282906688 _internal.py:96] 127.0.0.1 - - [06/Feb/2024 12:38:23] "POST /predict HTTP/1.1" 200 -
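
The `/predict` route above accepts a JSON body with either a `text` or an `audio_path` key. A client request could be built with the standard library as sketched below. `build_predict_request` is a hypothetical helper, and the base URL is the local address printed in the Flask log.

```python
import json
import urllib.request


def build_predict_request(base_url, text=None, audio_path=None):
    """Build a POST request for the /predict route defined above."""
    if (text is None) == (audio_path is None):
        raise ValueError("Provide exactly one of text or audio_path")
    payload = {"text": text} if text is not None else {"audio_path": audio_path}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_predict_request("http://127.0.0.1:5001", text="Hola")
# Sending it requires the Flask app to be running:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```

Because the model checks `audio_path` with `os.path.isfile`, the path has to exist on the server's filesystem, not the client's.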


### Example of text-only application

In [16]:

while True:
    text = input("Translate to English (type 'exit' to quit): ")

    if text.lower() == 'exit':
        break

    try:
        translations = translate.translate([text], source_lang="es", target_lang="en")
        final = translations[0]
        print("Translated text:", final)
    except Exception as e:
        print("An error occurred during translation:", e)


Translate to English (type 'exit' to quit):  Hola


Translated text: Hello there


Translate to English (type 'exit' to quit):  Despacito


Translated text: Slowly


Translate to English (type 'exit' to quit):  exit
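
The loop above mixes user input with translation, which makes it hard to test. A sketch of the same logic as a plain function over any iterable of lines follows. `translate_session` is a hypothetical helper, and `translate_fn` is assumed to map a list of strings to a list of strings, as `.translate()` does.

```python
def translate_session(lines, translate_fn, stop_word="exit"):
    """Translate each input line until stop_word is seen; return the outputs."""
    outputs = []
    for line in lines:
        text = line.strip()
        if text.lower() == stop_word:
            break
        if not text:
            continue  # skip blank input instead of sending it to the model
        outputs.append(translate_fn([text])[0])
    return outputs
```

The function can be driven by a list of strings in a test, or by lines read from a file or from standard input.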


### Example of Audio-to-Audio Translation

In [44]:
while True:
    audio_path = input("Enter the path of the .wav file or (exit) to exit. ").strip()

    if audio_path.lower() == 'exit':
        break

    try:
        # Speech -> Spanish text
        raw_text = quartznet.transcribe(paths2audio_files=[audio_path])[0]
        print(f"Transcription: {raw_text}")

        # Spanish text -> English text
        translation = translate.translate([raw_text], source_lang="es", target_lang="en")[0]
        print(f"Translation to English: {translation}")

        # English text -> speech (44100 Hz matches the HiFi-GAN config logged earlier)
        audio_output = text_to_audio(translation)
        display(IPython.display.Audio(audio_output, rate=44100))

    except Exception as e:
        print(f"An error has occurred: {e}")

Enter the path of the .wav file or (exit) to exit.  ds-experiments/NeMo-Demos/como-te-llamas.wav


Transcribing:   0%|          | 0/1 [00:00<?, ?it/s]

[NeMo E 2024-02-06 03:57:29 segment:249] Loading ds-experiments/NeMo-Demos/como-te-llamas.wav via SoundFile raised RuntimeError: `Error opening 'ds-experiments/NeMo-Demos/como-te-llamas.wav': System error.`. NeMo will fallback to loading via pydub.


An error has occurred: [Errno 2] No such file or directory: 'ds-experiments/NeMo-Demos/como-te-llamas.wav'


Enter the path of the .wav file or (exit) to exit.  como-te-llamas.wav


Transcribing:   0%|          | 0/1 [00:00<?, ?it/s]

Transcription: cómo te llamas


[NeMo W 2024-02-06 03:57:46 fastpitch:291] parse() is meant to be called in eval mode.
[NeMo W 2024-02-06 03:57:46 fastpitch:368] generate_spectrogram() is meant to be called in eval mode.


Translation to English: What's your name


Enter the path of the .wav file or (exit) to exit.  exit
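
The first attempt in the session above fails because the relative path does not exist from the notebook's working directory. Validating the path before calling the model gives a clearer error. This is a minimal sketch; `resolve_audio_path` is a hypothetical helper and `base_dir` is an assumed parameter for the directory that relative paths are resolved against.

```python
from pathlib import Path


def resolve_audio_path(user_input, base_dir="."):
    """Return an absolute path to an existing .wav file, or raise a clear error."""
    candidate = Path(user_input.strip()).expanduser()
    if not candidate.is_absolute():
        candidate = Path(base_dir) / candidate
    if candidate.suffix.lower() != ".wav":
        raise ValueError(f"Expected a .wav file, got: {candidate.name}")
    if not candidate.is_file():
        raise FileNotFoundError(f"No such audio file: {candidate}")
    return str(candidate.resolve())
```

The returned path could then be passed to `quartznet.transcribe(paths2audio_files=[...])`.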
