# Lesson 7: Text to Speech

### A Comparatively Challenging Task

- In classification tasks, there is typically one correct label, or a small set of correct labels, for each input.
- In automatic speech recognition (ASR), there is one correct transcription for a given utterance.
- Text to speech (TTS) runs in the opposite direction and is harder in this respect: the same sentence can be spoken in numerous valid ways, depending on voice, dialect, and speaking style.
- Despite these challenges, open-source models are available that handle TTS tasks effectively.

- If you would like to run this code on your own machine, you can install the following:

```
!pip install transformers
!pip install gradio
!pip install timm
!pip install inflect
!pip install phonemizer
```

**Note:** `py-espeak-ng` is only available on Linux operating systems.

To run locally on a Linux machine, run these commands:
```
sudo apt-get update
sudo apt-get install espeak-ng
pip install py-espeak-ng
```
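Before building the pipeline, it can help to confirm that the `espeak-ng` system package is actually installed. The sketch below is a rough check under one assumption: that the `apt-get install espeak-ng` step above puts an executable named `espeak-ng` on your `PATH`. The helper name `espeak_ng_available` is hypothetical, and it does not check the Python packages installed with `pip`.

```python
import shutil


def espeak_ng_available() -> bool:
    """Return True if an `espeak-ng` executable can be found on PATH.

    Hypothetical helper: it only looks for the system binary, as a proxy
    for the espeak-ng installation; it does not import any Python package.
    """
    return shutil.which("espeak-ng") is not None


if __name__ == "__main__":
    if espeak_ng_available():
        print("espeak-ng found")
    else:
        print("espeak-ng not found; install it with apt-get first")
```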

### Build the `text-to-speech` pipeline using the 🤗 Transformers Library

- Here is some code that suppresses warning messages.

```python
from transformers.utils import logging

logging.set_verbosity_error()
```

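```python
from transformers import pipeline

# This path points to a local copy of the model. To download it from the
# Hugging Face Hub instead, pass model="kakao-enterprise/vits-ljs".
narrator = pipeline("text-to-speech",
                    model="./models/kakao-enterprise/vits-ljs")
```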

More information about the model: [kakao-enterprise/vits-ljs](https://huggingface.co/kakao-enterprise/vits-ljs)

```python
text = """
Researchers at the Allen Institute for AI, \
HuggingFace, Microsoft, the University of Washington, \
Carnegie Mellon University, and the Hebrew University of \
Jerusalem developed a tool that measures atmospheric \
carbon emitted by cloud servers while training machine \
learning models. After a model’s size, the biggest variables \
were the server’s location and time of day it was active.
"""
```

```python
narrated_text = narrator(text)
```

```python
narrated_text
```

```
{'audio': array([[-0.00048845, -0.00021843, -0.00043199, ...,  0.0009256 ,
          0.00134011,  0.00149822]], dtype=float32),
 'sampling_rate': 22050}
```
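The output is a dictionary: `audio` holds a float32 array of shape `(1, num_samples)` (a single channel), and `sampling_rate` is the number of samples per second, 22050 here. Dividing the sample count by the sampling rate gives the clip length in seconds. The helper below is a hypothetical sketch that works on any dict with those two keys, whether `audio` is a NumPy array or a nested list; the usage line feeds it a stand-in output rather than a real model result.

```python
def clip_duration_seconds(output: dict) -> float:
    """Return the length of a text-to-speech pipeline output in seconds.

    Assumes the structure printed above: output["audio"] is indexed as
    [channel][sample] and output["sampling_rate"] is in Hz.
    """
    num_samples = len(output["audio"][0])  # samples in the first (only) channel
    return num_samples / output["sampling_rate"]


# Usage with a stand-in output: 44100 silent samples at 22050 Hz.
fake_output = {"audio": [[0.0] * 44100], "sampling_rate": 22050}
print(clip_duration_seconds(fake_output))  # 2.0
```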

```python
from IPython.display import Audio as IPythonAudio

# Play the first (and only) channel at the model's sampling rate
IPythonAudio(narrated_text["audio"][0],
             rate=narrated_text["sampling_rate"])
```
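`IPython.display.Audio` only plays sound inside a notebook. If you run the code as a plain Python script, one option is to write the samples to a WAV file instead. The sketch below uses only the standard-library `wave` and `array` modules and assumes, as in the output above, that the samples are floats roughly in the range [-1, 1]; it clips them and converts them to 16-bit PCM. The function name `save_wav` and the file name `narration.wav` are hypothetical.

```python
import array
import sys
import wave


def save_wav(samples, sampling_rate: int, path: str) -> None:
    """Write mono float samples (roughly in [-1, 1]) to a 16-bit PCM WAV file."""
    pcm = array.array("h")  # signed 16-bit integers
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip so scaling cannot overflow int16
        pcm.append(int(s * 32767))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)              # mono, matching the (1, num_samples) output
        wf.setsampwidth(2)              # 2 bytes = 16 bits per sample
        wf.setframerate(sampling_rate)  # e.g. 22050 for this model
        wf.writeframes(pcm.tobytes())


# Usage with the pipeline output from above (not executed here):
# save_wav(narrated_text["audio"][0], narrated_text["sampling_rate"], "narration.wav")
```

Scaling by 32767 rather than 32768 keeps both +1.0 and -1.0 inside the int16 range after clipping.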

### Try it yourself!
- Try this model with your own text-to-speech examples!