# Getting started with Speech AI

![ASR](./images/asr_diarization_tutorial.png)

By the end of this Getting Started notebook, you will be able to use Automatic Speech Recognition (ASR), Neural Machine Translation (NMT) and Text-to-Speech (TTS) APIs in your projects, opening up a wide range of possibilities for Speech AI applications.

Happy learning!

## Instructions

### 1 - Making API Requests to Speech AI Models

➡️ Before continuing, please consult the **[00-making-http-request.MD](https://github.com/eleapttn/workshop-mastering-speech-ai/blob/main/docs/00-making-http-request.MD)** documentation to discover how to make HTTP API calls to the AI Endpoints models using Python.

This will be essential for interacting with the different AI models!

### 2 - ASR Pipeline

Let's start with the Automatic Speech Recognition model.

🎯 Your task is to **send an audio file as input and receive a text transcription**. 

To do that, you will have to:

- Play the audio file, so that you can hear what is being said
- Determine the `ASR` model you want to work with among the ones available on [AI Endpoints](https://endpoints.ai.cloud.ovh.net/)
- Get its endpoint `URL`
- Set up the necessary request headers
- Provide the input data expected by the `ASR` model
- Send your request
- Print the response and analyze the audio transcription!

**➡️ You can find all the necessary information, such as the endpoint URL, format of expected input data, and code examples, in the [01-ASR.MD](https://github.com/eleapttn/workshop-mastering-speech-ai/blob/main/docs/01-ASR.MD) documentation file.**
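The steps above can be sketched as follows. This is a minimal illustration and not the workshop solution: it uses only the Python standard library and builds the request without sending it. The URL is a placeholder, and the multipart field name `audio`, the `Bearer` token header, and the helper name `build_asr_request` are assumptions made for the example. The real values are in 01-ASR.MD.

```python
import uuid
import urllib.request

# Hypothetical placeholder: replace with the endpoint URL given in 01-ASR.MD.
ASR_URL = "https://example.invalid/api/v1/asr/recognize"


def build_asr_request(audio_bytes, filename, token, url=ASR_URL, field="audio"):
    """Build (without sending) a multipart/form-data POST carrying one audio file.

    Assumptions: the model accepts a file upload in a form field named `field`
    and authenticates with a Bearer token. Check the documentation for both.
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: audio/wav\r\n\r\n",
        audio_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {
        "Accept": "application/json",  # we expect a text (JSON) transcription back
        "Authorization": f"Bearer {token}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Demo with dummy bytes; in the notebook, read one of the provided audio samples instead.
request = build_asr_request(b"RIFF-demo-bytes", "sample.wav", token="YOUR_TOKEN")
print(request.get_method(), len(request.data), "bytes")
```

To actually send it, you would pass `request` to `urllib.request.urlopen` and decode the JSON body of the response. The documentation may use a different HTTP client, in which case the same headers and file upload apply.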

💡 *The ASR solution is provided in the [0_SPEECH_AI_BASICS_SOLUTION.ipynb](https://github.com/eleapttn/workshop-mastering-speech-ai/blob/main/notebooks/solutions/0_SPEECH_AI_BASICS_SOLUTION.ipynb) notebook, but try to tackle the task on your own and ask questions before checking the solution!*

⬇️ *You can write the code in the cell below.* ⬇️

In [None]:
# To execute a code cell, select it and click the ▶️ button in the menu above the notebook, or press `SHIFT + ENTER`.

# Some audio files are provided in the /workspace/workshop-mastering-speech-ai/samples/audio_samples/ directory

# Write the code to play one of these audio samples, before transcribing it (refer to documentation provided above)



Now that you know what the audio file contains, send it to the ASR model with an API request to transcribe it.

In [None]:
# To execute a code cell, select it and click the ▶️ button in the menu above the notebook, or press `SHIFT + ENTER`.

# Then, send this audio file to the ASR model by making an API request

### 3 - NMT Pipeline

Now that you have successfully completed the `ASR` task, let's move on to the Neural Machine Translation model. 

🎯 Your task is to **translate a given text from one language to another**.

Remember that you'll need to: 

- Determine the `NMT` model you want to work with among the ones available on [AI Endpoints](https://endpoints.ai.cloud.ovh.net/)
- Get its endpoint `URL`
- Set up the necessary request headers
- Adjust the `JSON` request data to suit the newly chosen `NMT` model
- Send your request
- Print the response and analyze the translated text!

**➡️ You can find all the necessary information, such as the NMT endpoint URL, format of expected input data (we are not sending an audio file anymore), and code examples, in the [02-NMT.MD](https://github.com/eleapttn/workshop-mastering-speech-ai/blob/main/docs/02-NMT.MD) documentation file.**
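As a rough sketch of what changes compared with the audio upload, the request below carries a `JSON` body instead of a file. It uses only the Python standard library and does not send anything. The URL is a placeholder, and the payload field names (`source_language`, `target_language`, `text`), the `Bearer` token header, and the helper name `build_nmt_request` are assumptions for illustration. The format the model really expects is described in 02-NMT.MD.

```python
import json
import urllib.request

# Hypothetical placeholder: replace with the endpoint URL given in 02-NMT.MD.
NMT_URL = "https://example.invalid/api/v1/nmt/translate_text"


def build_nmt_request(text, source_lang, target_lang, token, url=NMT_URL):
    """Build (without sending) a JSON POST request asking for a translation."""
    # Field names are assumptions; adapt them to the chosen NMT model.
    payload = {
        "source_language": source_lang,
        "target_language": target_lang,
        "text": text,
    }
    headers = {
        "Accept": "application/json",
        "Content-Type": "application/json",  # the body is JSON, not an audio file
        "Authorization": f"Bearer {token}",
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_nmt_request("Hello, world!", "en", "fr", token="YOUR_TOKEN")
print(request.get_method(), request.full_url)
print(json.loads(request.data))
```

Keeping the payload construction in one function makes it easy to swap the language pair or the field names once you have checked them against the documentation.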

💡 *The NMT solution is provided in the [0_SPEECH_AI_BASICS_SOLUTION.ipynb](https://github.com/eleapttn/workshop-mastering-speech-ai/blob/main/notebooks/solutions/0_SPEECH_AI_BASICS_SOLUTION.ipynb) notebook. Feel free to try the task on your own and ask questions before checking the solution.*

⬇️ *You can code that in the cell below.* ⬇️

In [None]:
# Here is a text example that you can try to translate from English to another language:
input_text = "Devoxx Belgium is an annual conference for software developers and IT professionals. Held in Antwerp, Belgium, it is one of the largest and most well-known conferences in Europe, attracting thousands of attendees from around the world."

# To execute a code cell, select it and click the ▶️ button in the menu above the notebook, or press `SHIFT + ENTER`.



### 4 - TTS Pipeline

Great job with the NMT model! 

🎯 Let's explore the Text-to-Speech model now. Your goal is to **convert text to human-like speech**. The process is similar to what you have done in the `ASR` and `NMT` sections, but be aware that the `accept` header will be different this time, as we want to generate an audio file, not text.

**➡️ You can find all the necessary information, such as the endpoint URL, format of expected input data, and code examples, in the [03-TTS.MD](https://github.com/eleapttn/workshop-mastering-speech-ai/blob/main/docs/03-TTS.MD) documentation file.**
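To illustrate the difference in the `accept` header, here is a minimal sketch using only the Python standard library. Everything specific in it is an assumption: the placeholder URL, the single-field payload, the `application/octet-stream` value, the `Bearer` token header, and the idea that the model returns raw 16-bit mono PCM at 16 kHz. The helper names `build_tts_request` and `save_pcm_as_wav` are hypothetical as well. Refer to 03-TTS.MD for the actual request and response formats.

```python
import json
import os
import tempfile
import urllib.request
import wave

# Hypothetical placeholder: replace with the endpoint URL given in 03-TTS.MD.
TTS_URL = "https://example.invalid/api/v1/tts/text_to_audio"


def build_tts_request(text, token, url=TTS_URL):
    """Build (without sending) a JSON POST request that asks for audio back."""
    payload = {"text": text}  # assumption: the real model may require more fields
    headers = {
        "Accept": "application/octet-stream",  # assumption: binary audio response
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def save_pcm_as_wav(pcm_bytes, path, sample_rate=16000):
    """Wrap raw 16-bit mono PCM bytes in a WAV container so players can read them."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)


# Demo with one second of silence standing in for the model response.
silence = b"\x00\x00" * 16000
out_path = os.path.join(tempfile.gettempdir(), "tts_demo.wav")
save_pcm_as_wav(silence, out_path)
with wave.open(out_path, "rb") as wav_file:
    print(wav_file.getnframes() / wav_file.getframerate(), "seconds")
```

In a notebook, a saved file like this can then be played with `IPython.display.Audio(out_path)`.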

💡 *The TTS solution is provided in the [0_SPEECH_AI_BASICS_SOLUTION.ipynb](https://github.com/eleapttn/workshop-mastering-speech-ai/blob/main/notebooks/solutions/0_SPEECH_AI_BASICS_SOLUTION.ipynb) notebook. Feel free to try the task on your own and ask questions before checking the solution.*

⬇️ *You can code that in the cell below.* ⬇️

In [None]:
# Write your code here

# To execute a code cell, select it and click the ▶️ button in the menu above the notebook, or press `SHIFT + ENTER`.



Once you've got the model response, play your generated audio file:

In [None]:
# Play the generated audio file here



### Conclusion

Congratulations on completing this introduction to Speech AI models and AI APIs!

You have successfully:

- Discovered the `ASR`, `NMT`, and `TTS` models.
- Learned how to work with `APIs` and send `HTTP` requests.
- Provided the correct input data and used the right header settings to achieve the desired output.

Now you are ready to move on to another notebook, where you will put your new skills into practice with more advanced Speech AI tasks, such as subtitle generation, voice dubbing and other exciting Speech AI features.

The `/notebooks/1_GENERATE_SRT_FILE.ipynb` notebook is waiting for you!