# Speech to Text and Language Translator

IBM Watson™ Speech to Text is a cloud-native solution that uses deep-learning AI algorithms to apply knowledge about grammar, language structure, and audio/voice signal composition to create customizable speech recognition for optimal text transcription. IBM Watson™ Language Translator allows us to translate text programmatically from one language into another.

The goal of this project is to implement these technologies using Python.

### References for this project

* https://github.com/watson-developer-cloud/python-sdk
* https://cloud.ibm.com/apidocs/speech-to-text?code=python
* https://cloud.ibm.com/apidocs/language-translator?code=python

<hr>

## Preparation

### Install required packages

`ibm-watson` is a Python client library for getting started quickly with the various [Watson API](http://www.ibm.com/watson/developercloud/) services. See more information about this package [here](https://pypi.org/project/ibm-watson/).
 
`pandas` is a Python package that provides fast, flexible, and expressive data structures designed to make working with structured (tabular, multidimensional, potentially heterogeneous) and time series data both easy and intuitive. See more information about this package [here](https://pypi.org/project/pandas/).

Note:

* `ibm-watson` only supports Python 3.5 or above.
* `ibm-watson` must be version `4.7.1` or above.
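
The version requirement above can be checked from Python before running the rest of the notebook. The following is a minimal sketch, not part of the original project: the helper names `parse_version`, `meets_minimum`, and `installed_version` are hypothetical, and it assumes Python 3.8 or above, where `importlib.metadata` is available.

```python
from importlib import metadata
from itertools import takewhile


def parse_version(version):
    """Turn a dotted version such as '4.7.1' into a tuple of ints: (4, 7, 1)."""
    parts = []
    for piece in version.split("."):
        # Keep only the leading digits, so '0rc1' is read as 0.
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def meets_minimum(installed, minimum="4.7.1"):
    """Return True when the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def installed_version(package="ibm-watson"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("ibm-watson is not installed")
elif meets_minimum(version):
    print("ibm-watson", version, "is recent enough")
else:
    print("ibm-watson", version, "is older than 4.7.1")
```

The comparison is deliberately simple and ignores pre-release ordering, which is enough for a quick sanity check.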

In [None]:
!pip install --upgrade "ibm-watson>=4.7.1"
!pip install pandas

### Import required modules

In [None]:
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_watson import SpeechToTextV1 
from ibm_watson import LanguageTranslatorV3
from pandas import json_normalize

### Specify the sample audio

Specify the audio sample we will use. For this project, we will use the file `PolynomialRegressionandPipelines.mp3`.

Check out the supported audio formats [here](https://cloud.ibm.com/docs/speech-to-text-data?topic=speech-to-text-data-audio-formats).
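
Because the content type passed to the service has to match the audio format, it can help to derive it from the file extension. This is a small sketch under assumptions: the helper `guess_content_type` and the `CONTENT_TYPES` table are hypothetical, and the table lists only a few common formats, so it should be verified against the supported-formats page linked above.

```python
import os

# Assumed mapping from file extension to content type (not exhaustive).
CONTENT_TYPES = {
    ".mp3": "audio/mp3",
    ".wav": "audio/wav",
    ".flac": "audio/flac",
    ".ogg": "audio/ogg",
    ".webm": "audio/webm",
}


def guess_content_type(path):
    """Return the content type for an audio file based on its extension."""
    extension = os.path.splitext(path)[1].lower()
    if extension not in CONTENT_TYPES:
        raise ValueError("unsupported or unknown audio extension: %r" % extension)
    return CONTENT_TYPES[extension]


print(guess_content_type("PolynomialRegressionandPipelines.mp3"))
```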

In [None]:
filename = '<AUDIO_FILE>'

<hr>

## Transcribe audio to text

The IBM Watson™ Speech to Text service transcribes audio to text, which lets us add speech transcription capabilities to applications.

See more information about this product [here](https://www.ibm.com/cloud/watson-speech-to-text).

<br>

### Add IBM Watson™ Speech to Text Credentials

We can create our instance [here](https://cloud.ibm.com/catalog/services/speech-to-text).

In [None]:
API_S2T = '<YOUR_S2T_APIKEY>'
URL_S2T = '<YOUR_S2T_URL>'
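
As an alternative to pasting credentials into the notebook, they can be read from environment variables. The sketch below is an assumption rather than the method used in this project: the helper `load_credentials` and the variable names `S2T_APIKEY` and `S2T_URL` are hypothetical, and a plain dictionary stands in for the real environment so the example runs anywhere.

```python
import os


def load_credentials(prefix, environ=None):
    """Read '<prefix>_APIKEY' and '<prefix>_URL' from a mapping such as os.environ."""
    if environ is None:
        environ = os.environ
    names = [prefix + "_APIKEY", prefix + "_URL"]
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return environ[names[0]], environ[names[1]]


# Demo mapping with placeholder values; in practice, omit the second argument
# so that the real environment is used.
demo_env = {"S2T_APIKEY": "demo-key", "S2T_URL": "https://example.invalid"}
API_S2T, URL_S2T = load_credentials("S2T", demo_env)
print(URL_S2T)
```

The same helper could be reused for the Language Translator credentials by passing a different prefix.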

### Speech To Text Authentication

IBM Cloud Identity and Access Management (IAM) is the primary method to authenticate to the API.

Explanation:

* The `IAMAuthenticator` uses the API key passed as its `apikey` argument to obtain a bearer token and adds that token to each request.
* `SpeechToTextV1` is the service class we will use.
* The `set_service_url()` method sets the base URL, passed as its `service_url` argument, to which HTTP requests are sent.

Read more about authentication [here](https://cloud.ibm.com/apidocs/speech-to-text?code=python#authentication).

In [None]:
s2t_auth = IAMAuthenticator(API_S2T)
speech_to_text = SpeechToTextV1(authenticator=s2t_auth)
speech_to_text.set_service_url(URL_S2T)

### Recognize the audio

Here we use the `recognize()` method, which sends audio and returns transcription results for a recognition request.

We can pass a maximum of 100 MB and a minimum of 100 bytes of audio with a request.
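
The size limits above can be checked locally before sending a request. This is a minimal sketch: the helper names are hypothetical, and it assumes that 100 MB means 100 × 1024 × 1024 bytes, which the source does not spell out.

```python
import os

MIN_AUDIO_BYTES = 100
# Assumption: "100 MB" is interpreted as 100 * 1024 * 1024 bytes.
MAX_AUDIO_BYTES = 100 * 1024 * 1024


def audio_size_ok(num_bytes):
    """Return True when the size is within the documented request limits."""
    return MIN_AUDIO_BYTES <= num_bytes <= MAX_AUDIO_BYTES


def check_audio_file(path):
    """Return the file size in bytes, raising ValueError if it is out of range."""
    size = os.path.getsize(path)
    if not audio_size_ok(size):
        raise ValueError("audio size %d bytes is outside the allowed range" % size)
    return size


print(audio_size_ok(5 * 1024 * 1024))
```

Calling `check_audio_file(filename)` before `recognize()` would surface an oversized file without a round trip to the service.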

In [None]:
# Open the audio in binary mode; content_type must match the file format.
with open(filename, mode="rb") as audio_file:
    s2t_response = speech_to_text.recognize(audio=audio_file, content_type='audio/mp3')

### Explore the transcription result

The output looks like this:

```
{
    'result_index': 0,
    'results': [
        {'final': True, 'alternatives': [{'transcript': ..., 'confidence': ...}]},
        {'final': True, 'alternatives': [{'transcript': ..., 'confidence': ...}]},
        ...
    ]
}
```

Explanation:

* The `result_index` field provides a unique identifier for the results.
* The `results` field provides an array of information about the transcription results.
* The `final` field has a value of `true` to indicate that these results will not change, `false` for interim results, which are subject to change.
* The `alternatives` field provides an array of transcription results. For this request, the array includes a single element.
* The `confidence` field is a score that indicates the service's confidence in the transcript.
* The `transcript` field provides the results of the transcription.

Learn more about the recognition result [here](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-basic-response).
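
To make the fields above concrete, the sketch below walks a hand-written response with the same shape and flags final results whose confidence falls below a threshold. The sample values, the helper `low_confidence_segments`, and the threshold of 0.8 are illustrative assumptions, not output from the service.

```python
# Hand-written sample that mirrors the documented response shape.
sample_response = {
    "result_index": 0,
    "results": [
        {"final": True,
         "alternatives": [{"transcript": "hello world ", "confidence": 0.94}]},
        {"final": True,
         "alternatives": [{"transcript": "this is a test ", "confidence": 0.61}]},
    ],
}


def low_confidence_segments(response, threshold=0.8):
    """Return (transcript, confidence) pairs for final results below the threshold."""
    flagged = []
    for result in response["results"]:
        if not result.get("final", False):
            continue  # skip interim results, which may still change
        best = result["alternatives"][0]
        confidence = best.get("confidence")
        if confidence is not None and confidence < threshold:
            flagged.append((best["transcript"].strip(), confidence))
    return flagged


print(low_confidence_segments(sample_response))
```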

In [None]:
s2t_response.result

### Normalize the result

Next, we normalize the `alternatives` records into a table.

In [None]:
json_normalize(s2t_response.result['results'],"alternatives")
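
For intuition, the flattening performed here can be approximated in plain Python: each entry of every `alternatives` list becomes one row. This is a rough sketch of the idea using a hand-written sample, not a description of how pandas implements it, and the helper `flatten_alternatives` is hypothetical.

```python
def flatten_alternatives(results):
    """Return one dict per alternative, similar to one table row per record."""
    rows = []
    for result in results:
        for alternative in result.get("alternatives", []):
            rows.append({
                "transcript": alternative.get("transcript"),
                "confidence": alternative.get("confidence"),
            })
    return rows


# Hand-written sample with the same shape as the 'results' list.
sample_results = [
    {"final": True, "alternatives": [{"transcript": "hello world ", "confidence": 0.94}]},
    {"final": True, "alternatives": [{"transcript": "this is a test ", "confidence": 0.61}]},
]

for row in flatten_alternatives(sample_results):
    print(row)
```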

### Cleaning the result

We will collect each `transcript` value into a `list`, so that each element holds the transcript of one result.

```
s2t_response.result             <--- contains two keys: result_index and results.
└── results
    └── alternatives[0]         <--- alternatives is a one-element list of dictionaries.
        └── transcript

In [None]:
s2t_responses_list = []
for responses in s2t_response.result['results']:
    s2t_responses_list.append(responses['alternatives'][0]['transcript'])

s2t_responses_list

### Final result

Then we join the elements of `s2t_responses_list` into a single string.

In [None]:
final_result_s2t = ' '.join(s2t_responses_list)
final_result_s2t
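
If the individual transcripts carry leading or trailing spaces, joining them with `' '` can leave double spaces in the final string. The sketch below shows one way to collapse them; the helper `join_transcripts` is hypothetical and the sample strings are made up.

```python
def join_transcripts(transcripts):
    """Join transcripts into one string with single spaces between words."""
    # split() with no argument drops leading, trailing, and repeated whitespace.
    return " ".join(" ".join(transcripts).split())


print(join_transcripts(["hello world ", " this is  a test "]))
```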

<hr>

## Translate the text to another language

IBM Watson™ Language Translator allows us to translate text programmatically from one language into another language.

See more information about this product [here](https://www.ibm.com/watson/services/language-translator/).

<br>

### Add IBM Watson™ Language Translator Credentials

We can create our instance [here](https://cloud.ibm.com/catalog/services/language-translator).

In this example, we use Language Translator Version 2018-05-01.

Read about versioning [here](https://github.com/watson-developer-cloud/api-guidelines/#versioning).

In [None]:
API_LT = '<YOUR_LT_APIKEY>'
URL_LT = '<YOUR_LT_URL>'
VER_LT = '2018-05-01'

### Language Translator Authentication

IBM Cloud Identity and Access Management (IAM) is the primary method to authenticate to the API.

Explanation:

* The `IAMAuthenticator` uses the API key passed as its `apikey` argument to obtain a bearer token and adds that token to each request.
* `LanguageTranslatorV3` is the service class we will use.
* The `set_service_url()` method sets the base URL, passed as its `service_url` argument, to which HTTP requests are sent.

Read more about authentication [here](https://cloud.ibm.com/apidocs/language-translator-data?code=python#authentication).

In [None]:
lt_auth = IAMAuthenticator(API_LT)
language_translator = LanguageTranslatorV3(version=VER_LT, authenticator=lt_auth)
language_translator.set_service_url(URL_LT)

### Get a list of supported languages

The `list_identifiable_languages()` method returns the language code (for example, `en` for English or `es` for Spanish) and the name of each language.

You can also see the supported languages [here](https://cloud.ibm.com/docs/language-translator?topic=language-translator-translation-models).
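
Once the list of languages is available, a language code can be looked up by name. The sketch below works on a hand-written sample that assumes each entry has `language` and `name` keys under a top-level `languages` list; the helper `find_language_code` is hypothetical.

```python
# Hand-written sample; the real list is much longer.
sample_languages = {
    "languages": [
        {"language": "en", "name": "English"},
        {"language": "id", "name": "Indonesian"},
        {"language": "es", "name": "Spanish"},
    ]
}


def find_language_code(languages_result, name):
    """Return the code for a language name (case-insensitive), or None."""
    wanted = name.strip().lower()
    for entry in languages_result["languages"]:
        if entry["name"].lower() == wanted:
            return entry["language"]
    return None


print(find_language_code(sample_languages, "Indonesian"))
```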

In [None]:
json_normalize(language_translator.list_identifiable_languages().get_result(), "languages")

### Translate from EN to ID

The `translate()` method translates the input text from the source language to the target language.

The `text` parameter takes UTF-8 encoded text, with a maximum of 50 KB (51,200 bytes) per request. In this example, we use the text from `final_result_s2t`.
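
Because the limit is counted in bytes rather than characters, a long transcript may need to be measured after UTF-8 encoding and split before translation. The following is a minimal sketch, assuming a split on whitespace is acceptable (it collapses line breaks and repeated spaces); the helper names are hypothetical.

```python
MAX_TEXT_BYTES = 51200  # 50 KB, as stated above


def utf8_size(text):
    """Return the number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def split_for_translation(text, limit=MAX_TEXT_BYTES):
    """Split text on whitespace into chunks of at most `limit` UTF-8 bytes."""
    chunks = []
    current_words, current_size = [], 0
    for word in text.split():
        word_size = utf8_size(word)
        if word_size > limit:
            raise ValueError("a single word exceeds the byte limit")
        # A joining space costs one extra byte, except for the first word.
        extra = word_size if not current_words else word_size + 1
        if current_size + extra > limit:
            chunks.append(" ".join(current_words))
            current_words, current_size = [word], word_size
        else:
            current_words.append(word)
            current_size += extra
    if current_words:
        chunks.append(" ".join(current_words))
    return chunks


print(split_for_translation("aa bb cc", limit=5))
```

Each chunk could then be passed to `translate()` separately and the translations joined afterwards, at the cost of losing context across chunk boundaries.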

We can specify `model_id` using format `source-target`. For example, `en-de` selects the IBM-provided base model for English-to-German translation.
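
The `source-target` format can be assembled from two language codes. This is a small sketch with a hypothetical helper, `build_model_id`; it only builds the string and does not check that the service offers a model for that pair.

```python
def build_model_id(source, target):
    """Return a model ID in 'source-target' form, for example 'en-de'."""
    source, target = source.strip(), target.strip()
    if not source or not target:
        raise ValueError("both source and target language codes are required")
    return source + "-" + target


print(build_model_id("en", "id"))
```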

Read more about this [here](https://cloud.ibm.com/apidocs/language-translator?code=python#translate).


In [None]:
tl_response = language_translator.translate(text=final_result_s2t, model_id='en-id')
tl_result = tl_response.get_result()

### Explore the translation result

The output looks like this:

```
{
    'translations': [{'translation': ...}],
    'word_count': ...,
    'character_count': ...
}
```

Explanation:

* `word_count`: Number of words in the input text.
* `character_count`: Number of characters in the input text.
* `translations`: List of translation output in UTF-8, corresponding to the input text entries.

Read more about the response [here](https://cloud.ibm.com/apidocs/language-translator-data?code=python#translate).
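
The structure described above can be explored on a hand-written sample. In this sketch the values are made up for illustration and the helper `collect_translations` is hypothetical; it gathers the `translation` string from every entry, which also covers requests that contain more than one input text.

```python
# Hand-written sample that mirrors the documented response shape.
sample_result = {
    "translations": [
        {"translation": "Halo dunia"},
        {"translation": "Selamat pagi"},
    ],
    "word_count": 4,
    "character_count": 23,
}


def collect_translations(result):
    """Return the translated strings in the order they appear in the response."""
    return [entry["translation"] for entry in result.get("translations", [])]


print(collect_translations(sample_result))
```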

In [None]:
tl_result

### Get the translation result

We will take the `translation` value and save it to `final_result_tl`.

```
tl_result
└── translations[0]     <--- translations is a one-element list of dictionaries.
    └── translation   
```

In [None]:
final_result_tl = tl_result['translations'][0]['translation']

### Final result

In [None]:
final_result_tl