<a href="https://www.nvidia.com/dli"> <img src="images/DLI_Header.png" alt="Header" style="width: 400px;"/> </a>

# 7.0 TTS Pipeline Deployment with NVIDIA Riva
## (part of Lab 2)

In this notebook, we'll deploy TTS with [NVIDIA Riva](https://developer.nvidia.com/riva). After the model is deployed in Riva, you can issue inference requests to the Riva server from a client.

**[7.1 Launch Riva Server](#7.1-Launch-Riva-Server)<br>**
&nbsp;&nbsp;&nbsp;&nbsp;[7.1.1 Riva Configuration](#7.1.1-Riva-Configuration)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[7.1.2 Exercise: Configure Riva for TTS](#7.1.2-Exercise:-Configure-Riva-for-TTS)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[7.1.3 Riva Start Services](#7.1.3-Riva-Start-Services)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[7.1.4 Riva Available Services Check](#7.1.4-Riva-Available-Services-Check)<br>
**[7.2 Riva TTS Service Request](#7.2-Riva-TTS-Service-Request)<br>**
&nbsp;&nbsp;&nbsp;&nbsp;[7.2.1 Python Client Demo](#7.2.1-Python-Client-Demo)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[7.2.2 Customizing TTS](#7.2.2-Customizing-TTS)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[7.2.3 Exercise: Customize Phonemes](#7.2.3-Exercise:-Customize-Phonemes)<br>
**[7.3 Stop Riva Services](#7.3-Stop-Riva-Services)<br>**
**[7.4 Shut Down the Kernel](#7.4-Shut-Down-the-Kernel)<br>**

### Notebook Dependencies
The steps in this notebook assume that you have:

1. **NGC Credentials**<br>Be sure you have added your NGC credentials as described in the [NGC Setup notebook](003_Intro_NGC_Setup.ipynb).

---
# 7.1 Launch Riva Server
We'll repeat the deployment process with Riva that we used for ASR, but this time we'll deploy the TTS models instead. Once again, we'll use the Riva Quick Start scripts downloaded from NGC.

Set `RIVA_QS` to the `riva_quickstart` location.  We'll use a new location for the model repo to just contain the TTS models, as this will be a little faster to deploy.

In [None]:
# Set the Riva Quick Start directory
WORKSPACE='/dli_workspace'
RIVA_QS = WORKSPACE + "/riva_quickstart"
RIVA_MODEL_REPO = WORKSPACE + "/riva-tts-model-repo"
!mkdir -p $RIVA_MODEL_REPO

In [None]:
!ls $RIVA_QS

As with ASR, we can initialize the models using `riva_init.sh`, then start and stop the server with `riva_start.sh` and `riva_stop.sh`. We also need to set flags and values in `config.sh` to specify which services and models we want to initiate and start. 

## 7.1.1 Riva Configuration

Open [config.sh](dli_workspace/riva_quickstart/config.sh) and note the following important sections, which may still be set for ASR deployment.  You'll need to modify these sections.

##### Enable/Disable Riva Services
For each service, a `true` value means that the service is enabled on the server. For example, if we just want to run a TTS server, we can set the `service_enabled_tts` parameter to `true` and all other parameters to `false`. Enabling a service also means that all NGC models listed for it later in the config file will be downloaded.
```yaml
# Enable or Disable Riva Services
service_enabled_asr=true
service_enabled_nlp=true
service_enabled_tts=true
```
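
As a quick sanity check, the service flags can be read back programmatically. The sketch below is only an illustration: `read_service_flags` is a hypothetical helper (not part of Riva), and it assumes each flag sits on its own line in the `service_enabled_<name>=<value>` form shown above. It runs on a sample string here; in practice you could pass in the text of your own `config.sh`.

```python
import re

def read_service_flags(config_text):
    """Return {service_name: bool} for every service_enabled_* line."""
    flags = {}
    # Hypothetical pattern: matches lines such as "service_enabled_tts=true"
    pattern = re.compile(r"^\s*service_enabled_(\w+)\s*=\s*(\w+)\s*$")
    for line in config_text.splitlines():
        match = pattern.match(line)
        if match:
            name, value = match.groups()
            flags[name] = value.lower() == "true"
    return flags

# Sample text standing in for the relevant part of config.sh
sample_config = """
# Enable or Disable Riva Services
service_enabled_asr=false
service_enabled_nlp=false
service_enabled_tts=true
"""

flags = read_service_flags(sample_config)
enabled = [name for name, is_on in flags.items() if is_on]
print("Enabled services:", enabled)
```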

##### Specify the Language
You can specify the language code for the models that will be loaded.  The instructions and available language codes are included in the `config.sh` file: 
```yaml
# Language code to fetch models of a specify language
# Currently only ASR supports languages other than English
# Supported language codes: ar-AR, en-US, en-GB, de-DE, es-ES, es-US, fr-FR, hi-IN, it-IT, ja-JP, ru-RU, ko-KR, pt-BR, zh-CN
# for any language other than English, set service_enabled_nlp and service_enabled_tts to False
# for multiple languages enter space separated language codes.
language_code=("en-US")
```

##### Set the Encryption Key
We want encryption to be consistent across all of our projects, so this key should be the same as the one used to export our original model (and it already is!). For the purposes of this class, this setting won't change.
```yaml
# Specify the encryption key to use to deploy models
MODEL_DEPLOY_KEY="tlt_encode"
```

##### Set the Model Location
`riva_model_loc` should be the folder that contains both the `rmir` and `models` folders. This value needs to be changed to the actual absolute path for a given project.
```yaml
# Custom models produced by NeMo or TLT and prepared using riva-build
# may also be copied manually to this location $(riva_model_loc/rmir).
#
# Models ($riva_model_loc/models)
# During the riva_init process, the RMIR files in $riva_model_loc/rmir
# are inspected and optimized for deployment. The optimized versions are
# stored in $riva_model_loc/models. The riva server exclusively uses these
# optimized versions.
riva_model_loc="/riva-model-repo"
```
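
To confirm that a model location has the expected layout, a small check like the following can help. This is a sketch under assumptions: `missing_model_dirs` is a hypothetical helper, and it only tests that the `rmir` and `models` subfolders exist, not that they contain usable models. The demonstration uses a throwaway temporary directory rather than the real model repo.

```python
from pathlib import Path
import tempfile

def missing_model_dirs(riva_model_loc):
    """Return the expected subfolders that are absent from riva_model_loc."""
    root = Path(riva_model_loc)
    return [name for name in ("rmir", "models") if not (root / name).is_dir()]

# Demonstrate on a temporary directory instead of the real model repo
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "rmir").mkdir()
    before = missing_model_dirs(tmp)   # "models" has not been created yet
    (Path(tmp) / "models").mkdir()
    after = missing_model_dirs(tmp)    # both subfolders now exist

print("Missing before:", before)
print("Missing after:", after)
```

For this lab you could call `missing_model_dirs(RIVA_MODEL_REPO)` with the variable set at the top of the notebook.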

## 7.1.2 Exercise: Configure Riva for TTS

Open [config.sh](dli_workspace/riva_quickstart/config.sh) and modify it to:
* Deploy only the TTS service 
* Specify only English
* Specify the `/dli_workspace/riva-tts-model-repo` model repo location where we've preloaded the TTS models

Save your work.

If you're not sure what to change, take a peek at the [solution](solutions/ex7.1.2_config.sh).

Check your work.  The `diff` comparison in the following cell should have no output.

In [None]:
# Check your work
!diff solutions/ex7.1.2_config.sh dli_workspace/riva_quickstart/config.sh

In [None]:
# Quick fix!
!cp solutions/ex7.1.2_config.sh dli_workspace/riva_quickstart/config.sh

## 7.1.3 Riva Start Services

The `riva_init.sh` script downloads the required Riva containers, downloads the models listed in `config.sh`, and optimizes models as required with [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt). Since we've already used the ServiceMaker `riva-deploy` tool to optimize the models we are using, `riva_init.sh` won't have much to do, but it is provided here for completeness.

The `riva_start.sh` script starts the server.

In [None]:
# Initialize Riva (5 minutes)
# Models have been preloaded, so TensorRT builds ("deployment") will be skipped
!cd $RIVA_QS && bash riva_init.sh config.sh

In [None]:
# Run Riva Start. This will start the server. (15 seconds)
!cd $RIVA_QS && bash riva_start.sh config.sh

Riva TTS services should be running when you see the message "Riva server is ready..."

##### Troubleshooting:
If the server fails to start, open a terminal and clean the Riva model repository with:

```bash
cd /dli_workspace/riva_quickstart && bash riva_clean.sh config.sh
```
   
Then run the Riva Start Services steps again, as explained previously.
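
Before sending requests, it can also help to confirm that something is listening on the server port. The sketch below uses only the Python standard library; `wait_for_port` is a hypothetical helper, and an open port only shows that a process accepts TCP connections, not that every model has finished loading.

```python
import socket
import time

def wait_for_port(host, port, timeout_s=60.0, interval_s=1.0):
    """Poll until a TCP connection to host:port succeeds or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means a process is listening on the port
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

With the server address used by the client later in this notebook, the call would be `wait_for_port("localhost", 50051)`.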

## 7.1.4 Riva Available Services Check

To check the exposed Riva services, run the `docker logs riva-speech` command. 

You should see the following models ready:



```
+----------------------------------------+---------+--------+
| Model                                  | Version | Status |
+----------------------------------------+---------+--------+
| fastpitch_hifigan_ensemble-English-US  | 1       | READY  |
| riva-onnx-fastpitch_encoder-English-US | 1       | READY  |
| riva-trt-hifigan-English-US            | 1       | READY  |
| spectrogram_chunker-English-US         | 1       | READY  |
| tts_postprocessor-English-US           | 1       | READY  |
| tts_preprocessor-English-US            | 1       | READY  |
+----------------------------------------+---------+--------+
```

In [None]:
!docker logs riva-speech
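
The log output is long, so scanning the model table by eye is easy to get wrong. As a rough sketch, the table can be parsed to list any model that is not `READY`. `parse_model_table` is a hypothetical helper, and it assumes the rows appear in the logs exactly in the three-column layout shown above, with no extra prefix on each line.

```python
def parse_model_table(log_text):
    """Return {model_name: status} from the ASCII model table in the logs."""
    statuses = {}
    for line in log_text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Data rows have three cells and a numeric version; this skips
        # the border lines and the header row.
        if len(cells) == 3 and cells[0] != "Model" and cells[1].isdigit():
            statuses[cells[0]] = cells[2]
    return statuses

# Sample text copied from the expected table above
sample_log = """
+----------------------------------------+---------+--------+
| Model                                  | Version | Status |
+----------------------------------------+---------+--------+
| fastpitch_hifigan_ensemble-English-US  | 1       | READY  |
| riva-onnx-fastpitch_encoder-English-US | 1       | READY  |
| riva-trt-hifigan-English-US            | 1       | READY  |
| spectrogram_chunker-English-US         | 1       | READY  |
| tts_postprocessor-English-US           | 1       | READY  |
| tts_preprocessor-English-US            | 1       | READY  |
+----------------------------------------+---------+--------+
"""

statuses = parse_model_table(sample_log)
not_ready = [name for name, status in statuses.items() if status != "READY"]
print(len(statuses), "models found; not ready:", not_ready)
```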

---
# 7.2 Riva TTS Service Request
To access the Riva API, we need to:
1. Start the Riva Speech Skills server. (already done!)
2. Install the [Riva Client library](https://github.com/nvidia-riva/tutorials#running-the-riva-client). (already done for this course!)
3. Set up requests using the [documentation tutorial example](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tutorials/tts-python-basics-and-customization-with-ssml.html) for speech synthesis.

## 7.2.1 Python Client Demo

Riva TTS supports a number of options when making a text-to-speech request to the gRPC endpoint. Let's learn more about the parameters used in the request below:

- `language_code` - Language of the generated audio. `en-US` represents English (US) and is currently the only language supported out of the box.
- `encoding` - Type of audio encoding to generate. Currently, only `LINEAR_PCM` is supported.
- `sample_rate_hz` - Sample rate of the generated audio. Depends on the output audio device and is usually 22 kHz or 44 kHz.
- `voice_name` - Voice used to synthesize the audio. Currently, Riva offers two out-of-the-box voices (`English-US.Female-1`, `English-US.Male-1`).

In [None]:
import numpy as np
import IPython.display as ipd
import riva.client

Now that everything is set up, let's provide some input text that we want our models to speak.

In [None]:
sample_rate_hz = 44100

def remove_braces(braced_text):
    """Strip the "{@" and "}" markers from the processed text."""
    return braced_text.replace("{@", "").replace("}", "")

def tts_predict(text):
    """Synthesize `text` and return (int16 audio samples, processed text)."""
    auth = riva.client.Auth(uri='localhost:50051')
    riva_tts = riva.client.SpeechSynthesisService(auth)
    req = {
            "text"           : text,
            "language_code"  : "en-US",
            "encoding"       : riva.client.AudioEncoding.LINEAR_PCM,    # Currently only LINEAR_PCM is supported
            "sample_rate_hz" : sample_rate_hz,                          # Generate 44.1 kHz audio
            "voice_name"     : "English-US.Female-1"                    # The name of the voice to generate
    }
    resp = riva_tts.synthesize(**req)
    # The response holds raw 16-bit PCM bytes; view them as int16 samples
    audio_samples = np.frombuffer(resp.audio, dtype=np.int16)
    return audio_samples, remove_braces(resp.meta.processed_text)

In [None]:
audio_samples, processed_text = tts_predict("Hi, my name is Dana and I work for NVIDIA.")
# tts_predict already strips the braces from the processed text
print(processed_text)
ipd.Audio(audio_samples, rate=sample_rate_hz)
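
If you want to keep the generated audio rather than only play it in the notebook, the raw PCM samples can be written to a WAV file with the standard library. This is a minimal sketch: `save_pcm16_wav` is a hypothetical helper, it assumes 16-bit little-endian samples (matching the `np.int16` view used above on typical hardware), and the single-channel default is an assumption. A short generated tone stands in for the TTS output so the example runs without a server.

```python
import math
import os
import struct
import tempfile
import wave

def save_pcm16_wav(path, pcm_bytes, sample_rate_hz, channels=1):
    """Write raw 16-bit PCM bytes to a WAV file."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)      # assumption: mono audio
        wav_file.setsampwidth(2)             # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate_hz)
        wav_file.writeframes(pcm_bytes)

# Stand-in for TTS output: 0.1 s of a 440 Hz tone at 44.1 kHz
rate = 44100
tone = b"".join(
    struct.pack("<h", int(12000 * math.sin(2 * math.pi * 440 * n / rate)))
    for n in range(rate // 10)
)

demo_path = os.path.join(tempfile.gettempdir(), "riva_tts_demo.wav")
save_pcm16_wav(demo_path, tone, rate)

# Read the file back to confirm what was written
with wave.open(demo_path, "rb") as check:
    frames_written = check.getnframes()
    rate_written = check.getframerate()
print(frames_written, "frames at", rate_written, "Hz")
```

For the audio returned by `tts_predict`, the call would be `save_pcm16_wav("dana.wav", audio_samples.tobytes(), sample_rate_hz)`, where the file name is just an example.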

## 7.2.2 Customizing TTS

The Speech Synthesis Markup Language (SSML) specification is a markup for directing the performance of the virtual speaker. Riva supports portions of SSML, allowing you to adjust pitch, rate, and pronunciation of the generated audio. SSML support is available only for the FastPitch model at this time. SSML input must be a valid XML document with the `<speak>` root tag; input that is not valid XML, or valid XML with a different root tag, is treated as raw input text.
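
Since input with an unexpected root tag is treated as raw text, a malformed request may be read aloud literally instead of raising an error. A quick local check can catch that before calling the server. The sketch below assumes `<speak>` is the expected root, as in the examples in this notebook; `has_speak_root` is a hypothetical helper, and it does not verify that Riva supports the tags inside.

```python
import xml.etree.ElementTree as ET

def has_speak_root(ssml_text):
    """Return True if ssml_text parses as XML and its root element is <speak>."""
    try:
        root = ET.fromstring(ssml_text)
    except ET.ParseError:
        # Not well-formed XML (for example plain text or an unclosed tag)
        return False
    return root.tag == "speak"

candidates = [
    '<speak><prosody rate="100%">NVIDIA</prosody></speak>',  # well formed
    '<voice>NVIDIA</voice>',                                 # different root tag
    '<speak><prosody>NVIDIA</speak>',                        # unclosed prosody tag
    'NVIDIA',                                                # plain text
]
results = [has_speak_root(text) for text in candidates]
print(results)
```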

Riva TTS supports the following SSML tags:
- The `prosody` tag, whose `rate`, `pitch`, and `volume` attributes control the rate, pitch, and volume of the generated audio.
- The `phoneme` tag, which allows us to control the pronunciation of the generated audio.
- The `sub` tag, which allows us to replace the pronunciation of the specified word or phrase with a different word or phrase.
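
Writing SSML strings by hand makes it easy to mismatch quotes or forget a closing tag. One possible approach is to compose the markup from small helper functions. The helpers below (`speak`, `prosody`, `phoneme`) are hypothetical names introduced for illustration and are not part of the Riva client; they only build strings, using the standard library for XML escaping.

```python
from xml.sax.saxutils import escape, quoteattr

def phoneme(text, ph):
    """Wrap plain text in a phoneme tag with the given pronunciation."""
    return f"<phoneme ph={quoteattr(ph)}>{escape(text)}</phoneme>"

def prosody(inner_ssml, **attrs):
    """Wrap existing SSML in a prosody tag, e.g. prosody(x, rate="100%")."""
    attr_text = "".join(f" {key}={quoteattr(str(value))}" for key, value in attrs.items())
    return f"<prosody{attr_text}>{inner_ssml}</prosody>"

def speak(inner_ssml):
    """Wrap existing SSML in the speak root tag."""
    return f"<speak>{inner_ssml}</speak>"

# Plain text is escaped; helper output is passed through as markup
sentence = escape("I work for ") + phoneme("NVIDIA", "ɛnˈvɪdiə") + escape(".")
ssml = speak(prosody(sentence, pitch="0.1", rate="100%"))
print(ssml)
```

The resulting string can be passed to `tts_predict` in the same way as the handwritten examples in this section.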

Let's look at [customization of Riva TTS with these SSML tags](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tutorials/tts-python-basics-and-customization-with-ssml.html#customizing-riva-tts-audio-output-with-ssml) in some detail.

In [None]:
# prosody tag example
text='<speak><prosody pitch="0.1" rate="100%">NVIDIA</prosody></speak>'
audio_samples, processed_text = tts_predict(text)
print("preprocessed text: ", processed_text)
ipd.Audio(audio_samples, rate=sample_rate_hz)

To fix the pronunciation, you can use the phoneme tag.  A list of the TTS phones supported in Riva can be found [here](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tts/tts-phones.html). 

In [None]:
# prosody tag and phoneme tag example
text='<speak><prosody pitch="0.1" rate="100%"><phoneme ph="ɛnˈvɪdiə">NVIDIA</phoneme></prosody></speak>'
audio_samples, processed_text = tts_predict(text)
print("preprocessed text: ", processed_text)
ipd.Audio(audio_samples, rate=sample_rate_hz)

In [None]:
# prosody tag and phoneme tag example
text='<speak><prosody pitch="0.1" rate="100%">Hi, my name is Dana and I work for <phoneme ph="ɛnˈvɪdiə">NVIDIA</phoneme>.</prosody></speak>'
audio_samples, processed_text = tts_predict(text)
print("preprocessed text: ", processed_text)
ipd.Audio(audio_samples, rate=sample_rate_hz)

## 7.2.3 Exercise: Customize Phonemes
Let's revisit the pronunciation we worked on in the NeMo notebook.  

In [None]:
audio_samples, processed_text = tts_predict("His name is Adam Grzywaczewski and he works for NVIDIA.")
# tts_predict already strips the braces from the processed text
print(processed_text)
ipd.Audio(audio_samples, rate=sample_rate_hz)

The pronunciations for "Grzywaczewski" and "NVIDIA" were not correct. Change those pronunciations to "ɡzɪvɑˈtʃɛvski" and "ɛnˈvɪdiə" using SSML tags.<br>
If you get stuck, you can check the [solution](solutions/ex7.2.3.ipynb).

In [None]:
text=#FIXME
audio_samples, processed_text = tts_predict(text)
print("preprocessed text: ", processed_text)
ipd.Audio(audio_samples, rate=sample_rate_hz)

---
# 7.3 Stop Riva Services 
We need to stop Riva services as we will be modifying the deployed models.

In [None]:
# Run Riva Stop. 
!bash $RIVA_QS/riva_stop.sh

---
# 7.4 Shut Down the Kernel
<h3 style="color:red;">Important!</h3>

From the menu above, choose ***Kernel->Shut Down Kernel*** to fully clear GPU memory before moving on.

---
<h2 style="color:green;">Congratulations!</h2>

In this notebook, you have:
- Launched Riva TTS service
- Requested the TTS service using a Python client API
- Customized pronunciations

In the next notebook, you'll "put it all together" to deploy a [Full Pipeline](008_Full_Pipeline.ipynb).
