# Lab 7.1: Implementing a Multi-lingual Solution

In this lab, you will use Amazon Transcribe, Amazon Translate, and Amazon Polly to convert an audio file from English to Spanish.

## Learning objectives

- Interact with Amazon Transcribe, Amazon Translate, and Amazon Polly through their APIs using the AWS SDK for Python (Boto3).
- Create a solution to a multi-lingual translation business problem.

## Introducing the business scenario

In this lab, you play the role of a machine learning developer working for a media company that translates videos between multiple languages.  


## Lab Steps

To complete this lab, you will follow these steps:

1. ([Amazon Transcribe example](#1.-Amazon-Transcribe-example))
2. ([Amazon Translate example](#2.-Amazon-Translate-example)) 
3. ([Amazon Polly example](#3.-Amazon-Polly-example))
4. ([Challenge Exercise](#4.-Challenge-Exercise))
    
## Submitting your work

1. In the lab console, choose **Submit** to record your progress and, when prompted, choose **Yes**.

1. If the results don't display after a couple of minutes, return to the top of the lab instructions and choose **Grades**.

**Tip:** You can submit your work multiple times. After you change your work, choose **Submit** again. Your last submission is what will be recorded for this lab.

1. To find detailed feedback on your work, choose **Details** followed by **View Submission Report**.    
    

## 1. Amazon Transcribe example
([Go to top](#Lab-7.1:-Implementing-a-Multi-lingual-Solution))

In this step, you will use a Boto3 client to call Amazon Transcribe and convert an audio file into text. After running the example, you can open [Amazon Transcribe](https://console.aws.amazon.com/transcribe/home?region=us-east-1) in the AWS Console to see the transcription.
The transcription takes a few minutes to complete.

In [None]:
import uuid
import json
import boto3
from time import sleep

bucket = 'c46255a638438l1748394t1w538120888142-labbucket-12figcw8iu648'
database_access_role_arn = 'arn:aws:iam::538120888142:role/service-role/c46255a638438l1748394t1w5-ComprehendDataAccessRole-1A1092NM0Q4C7'
translate_access_role_arn = 'arn:aws:iam::538120888142:role/c46255a638438l1748394t1w53812088-TranslateDemoRole-VDMR4T2LVEBF'

transcribe_client = boto3.client("transcribe")

The sample file named `test.wav` can be found in the `/s3` folder. It contains the audio phrase **Test, Hello, hello, hello. This is a test. Test, test, test.**

In [None]:
# Set the S3 URI of the input media file
media_input_uri = f's3://{bucket}/lab71/transcribe-sample/test.wav'

Start by creating the transcription job using `test.wav` as the input. Note that you need to specify an output location.

In [None]:
#create the transcription job
job_uuid = uuid.uuid1()
transcribe_job_name = f"transcribe-job-{job_uuid}"
transcribe_output_filename = 'transcribe_output.txt'

response = transcribe_client.start_transcription_job(
    TranscriptionJobName=transcribe_job_name,
    Media={'MediaFileUri': media_input_uri},
    MediaFormat='wav',
    LanguageCode='en-US',
    OutputBucketName=bucket,
    OutputKey=transcribe_output_filename
)

Wait until the job completes.

In [None]:
job=None
while True:
    job = transcribe_client.get_transcription_job(TranscriptionJobName = transcribe_job_name)
    if job['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED','FAILED']:
        break
    print('.', end='')
    sleep(20)
        
print(job['TranscriptionJob']['TranscriptionJobStatus'])

If the status above is **COMPLETED**, then you can proceed. Otherwise, correct the error and rerun the previous cells.
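The loop above keeps polling until the job reaches a terminal state, however long that takes. As a hedged alternative, the sketch below caps the number of attempts. `wait_for_status` and its parameters are hypothetical names, not part of Boto3, and a stub stands in for the status call so the sketch runs without AWS access.

```python
import time


def wait_for_status(fetch_status, terminal_states, delay_seconds=20, max_attempts=30):
    """Poll fetch_status() until it returns a terminal state or attempts run out.

    fetch_status is any zero-argument callable that returns the current status string.
    """
    for attempt in range(max_attempts):
        status = fetch_status()
        if status in terminal_states:
            return status
        # Skip the sleep after the final attempt.
        if attempt < max_attempts - 1:
            time.sleep(delay_seconds)
    raise TimeoutError(f"No terminal status after {max_attempts} attempts")


# Stand-in for a real status call (hypothetical data, no AWS access needed).
fake_statuses = iter(["IN_PROGRESS", "IN_PROGRESS", "COMPLETED"])
final_status = wait_for_status(
    lambda: next(fake_statuses), {"COMPLETED", "FAILED"}, delay_seconds=0
)
print(final_status)
```

In this notebook, `fetch_status` could wrap the `get_transcription_job` call and return `job['TranscriptionJob']['TranscriptionJobStatus']`.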

You can grab the output file using the results from the `get_transcription_job` method call.

In [None]:
transcription_file = job['TranscriptionJob']['Transcript']['TranscriptFileUri']
print(transcription_file)
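The printed value is an HTTPS URL rather than an `s3://` URI. If you need the bucket and key from it, a minimal sketch follows. It assumes a path-style URL of the form `https://s3.<region>.amazonaws.com/<bucket>/<key>` with no percent-encoded characters in the key. The helper name `split_path_style_s3_url` and the example URL are hypothetical.

```python
from urllib.parse import urlparse


def split_path_style_s3_url(url):
    """Return (bucket, key) from a path-style S3 URL."""
    # The URL path looks like "/<bucket>/<key>"; drop the leading slash first.
    path = urlparse(url).path.lstrip("/")
    bucket_name, _, key = path.partition("/")
    if not bucket_name or not key:
        raise ValueError(f"Not a path-style S3 URL: {url}")
    return bucket_name, key


# Hypothetical URL, for illustration only.
example_url = "https://s3.us-east-1.amazonaws.com/example-bucket/transcribe_output.txt"
print(split_path_style_s3_url(example_url))
```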

Download the file from Amazon S3 using the S3 client.

In [None]:
s3_client = boto3.client('s3')
with open(transcribe_output_filename, 'wb') as f:
    s3_client.download_fileobj(bucket, transcribe_output_filename, f)

Open the file and read the contents into a JSON object.

In [None]:
with open(transcribe_output_filename) as f:
    data = json.load(f)

In [None]:
data

The actual transcription can be found here:

In [None]:
data['results']['transcripts'][0]['transcript']
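Besides the full transcript, the JSON that Amazon Transcribe writes also carries an `items` list under `results`, with one entry per word or punctuation mark and a confidence score stored as a string. The sketch below assumes that layout and lists the words below a confidence threshold. The sample data is hand-written for illustration, and `low_confidence_words` is a hypothetical helper.

```python
def low_confidence_words(transcribe_output, threshold=0.9):
    """Return (word, confidence) pairs for spoken words below the threshold."""
    flagged = []
    for item in transcribe_output["results"]["items"]:
        # Punctuation entries carry no meaningful confidence, so skip them.
        if item["type"] != "pronunciation":
            continue
        best = item["alternatives"][0]
        confidence = float(best["confidence"])  # stored as a string in the JSON
        if confidence < threshold:
            flagged.append((best["content"], confidence))
    return flagged


def word(content, confidence):
    """Build a hand-written 'pronunciation' item for the sample below."""
    return {"type": "pronunciation",
            "alternatives": [{"confidence": confidence, "content": content}]}


def punctuation(content):
    """Build a hand-written 'punctuation' item for the sample below."""
    return {"type": "punctuation",
            "alternatives": [{"confidence": "0.0", "content": content}]}


# Hand-written sample that mimics the assumed output layout.
sample_output = {
    "results": {
        "transcripts": [{"transcript": "Hello. This is a test."}],
        "items": [
            word("Hello", "0.99"), punctuation("."),
            word("This", "0.98"), word("is", "0.97"),
            word("a", "0.62"), word("test", "0.95"), punctuation("."),
        ],
    },
}

print(low_confidence_words(sample_output))
```

With the real output, you would pass `data` instead of `sample_output`.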

## 2. Amazon Translate example

([Go to top](#Lab-7.1:-Implementing-a-Multi-lingual-Solution))

In this step, you will use a Boto3 client to call Amazon Translate and convert a text file from English to Spanish. After running the cell, you can open [Amazon Translate](https://console.aws.amazon.com/translate/home?region=us-east-1#batch-translation) in the AWS Console to see the translation.
The translation and details about the job are in the Batch Translation section. The text file containing the translation will be in your Amazon S3 bucket. There will also be a details folder containing a JSON file with details about the translation, such as the source and target languages.


Start by creating a translation job. You need to provide the input and output locations. Note that Amazon Translate can translate the same text into multiple target languages. In this example, you will use Spanish, for which the language code is **es**.

In [None]:
import uuid

translate_client = boto3.client(service_name='translate')

input_data = f's3://{bucket}/lab71/translate-sample'
output_data = f's3://{bucket}'

job_uuid = uuid.uuid1()
translate_job_name = f"translate-job-{job_uuid}"
translate_job_submission = translate_client.start_text_translation_job(
    JobName=translate_job_name,
    InputDataConfig={'S3Uri': input_data, 'ContentType':'text/plain'},
    OutputDataConfig={'S3Uri': output_data},
    DataAccessRoleArn=translate_access_role_arn,
    SourceLanguageCode='en',
    TargetLanguageCodes=['es']
)
translate_job_id = translate_job_submission['JobId']

You can use the job ID extracted in the cell above to poll the job status and wait for the job to complete. This can take a few minutes.

In [None]:
while True:
    translate_job = translate_client.describe_text_translation_job(JobId=translate_job_id)
    if translate_job['TextTranslationJobProperties']['JobStatus'] in ['COMPLETED','FAILED']:
        break
    sleep(20)
    print('.', end='')

print(translate_job['TextTranslationJobProperties']['JobStatus'])

If the above cell finished with **COMPLETED**, then you can proceed. If not, go back, fix the error, and try again.

The name of the output folder is built from the account number and the job ID. The following cell creates the path using this information.

In [None]:
account_id = boto3.client('sts').get_caller_identity().get('Account')
translate_output_path = f'{account_id}-TranslateText-{translate_job_id}/'

Amazon Translate outputs several files. You are interested in the `.txt` file. The following code downloads the `.txt` file, which contains the results of the translation.

In [None]:
s3_resource = boto3.resource('s3')
my_bucket = s3_resource.Bucket(bucket)

for my_bucket_object in my_bucket.objects.filter(Prefix=translate_output_path):
    key = my_bucket_object.key
    if key.endswith('.txt'):
        # Slice off the folder prefix. str.lstrip() would strip a set of
        # characters and could also remove the start of the file name.
        file = key[len(translate_output_path):].lstrip('/')
        print(file)
        with open(file, 'wb') as f:
            s3_client.download_fileobj(bucket, key, f)
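One detail is worth checking when you turn an S3 key into a local file name: `str.lstrip` removes any leading characters found in its argument, not a literal prefix. The sketch below contrasts the two approaches. The account ID, job ID, file name, and the `strip_prefix` helper are all hypothetical.

```python
def strip_prefix(key, prefix):
    """Return key without the leading prefix; leave it unchanged if the prefix is absent."""
    return key[len(prefix):] if key.startswith(prefix) else key


# Hypothetical account ID, job ID, and file name, for illustration only.
prefix = "123456789012-TranslateText-abc123/"
key = prefix + "es.test.txt"

# lstrip treats its argument as a set of characters, so it keeps stripping
# into the file name while the leading characters appear in that set.
stripped_by_charset = key.lstrip(prefix)
stripped_by_prefix = strip_prefix(key, prefix)

print(repr(stripped_by_charset))
print(repr(stripped_by_prefix))
```

Here the character-set version also removes the leading `es` of the file name, because both letters occur in the prefix. On Python 3.9 or later, `str.removeprefix` does the same job as the slicing helper.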

## 3. Amazon Polly example

([Go to top](#Lab-7.1:-Implementing-a-Multi-lingual-Solution))

In this step, you will use a Boto3 client to call [Amazon Polly](https://console.aws.amazon.com/polly/home/SynthesisTasks) and create a vocalization of a text file in Spanish.
After you run the cell, open your Amazon S3 bucket to see the output. The output is an MP3 file with a long string for a file name. You can open the file and hear the Lucia voice saying "Prueba de prueba, este es una prueba."


In [None]:
polly_client = boto3.client('polly')

itemname = 'lab71/polly-sample/es.test.txt'
obj = s3_resource.Object(bucket, itemname )
body = obj.get()['Body'].read().decode('utf-8')

response = polly_client.start_speech_synthesis_task(
    Engine='standard',
    OutputFormat='mp3',
    OutputS3BucketName=bucket,
    Text=body,
    VoiceId='Lucia'
) 


You can extract the task id from the response.

In [None]:
task_id = response['SynthesisTask']['TaskId']

You can use this `task_id` to check whether the task has completed.

In [None]:
while True:
    polly_job = polly_client.get_speech_synthesis_task(TaskId=task_id)
    if polly_job['SynthesisTask']['TaskStatus'] in ['completed','failed']:
        break
    sleep(20)
    print('.', end='')

print(polly_job['SynthesisTask']['TaskStatus'])

If the above cell exits with **completed**, then proceed. If not, fix the problem before proceeding.

The following cell will download the results.

In [None]:
s3_client = boto3.client('s3')
polly_output_filename = f'{task_id}.mp3'
with open(polly_output_filename, 'wb') as f:
    s3_client.download_fileobj(bucket, polly_output_filename, f)

## 4. Challenge Exercise

([Go to top](#Lab-7.1:-Implementing-a-Multi-lingual-Solution))
    
Your challenge for this lab is to create a translated audio file from a video with an English audio channel. You can use the code from the previous three examples as a template for your solution.

You can find the video for the challenge in your Amazon S3 bucket in the `lab71/challenge` folder. It is named `sample.mp4`. You can also find this file in the `/s3` folder in this notebook instance.
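Before filling in the cells, it can help to see how the three stages chain together. The sketch below is only an outline under stated assumptions: `translate_media` and the stub stage functions are hypothetical, the bucket name is a placeholder, and the stubs return fixed strings so the outline runs without AWS access.

```python
def translate_media(media_uri, transcribe, translate, synthesize):
    """Chain the three stages; each argument after media_uri is a callable for one stage."""
    english_text = transcribe(media_uri)     # media -> English text
    spanish_text = translate(english_text)   # English text -> Spanish text
    return synthesize(spanish_text)          # Spanish text -> audio file name


# Stub stages (hypothetical). Real versions would wrap the Amazon Transcribe,
# Amazon Translate, and Amazon Polly cells from sections 1 to 3.
calls = []


def fake_transcribe(uri):
    calls.append("transcribe")
    return "This is a test."


def fake_translate(text):
    calls.append("translate")
    return "Esta es una prueba."


def fake_synthesize(text):
    calls.append("synthesize")
    return "example-task-id.mp3"


result = translate_media(
    "s3://example-bucket/lab71/challenge/sample.mp4",
    fake_transcribe, fake_translate, fake_synthesize,
)
print(result, calls)
```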


In [None]:
# Set the S3 URI of the challenge video
media_input_uri = f's3://{bucket}/lab71/challenge/sample.mp4'

In [None]:
#create the transcription job
job_uuid = uuid.uuid1()
transcribe_job_name = f"challenge-job-{job_uuid}"
transcribe_output_filename = 'challenge_output.txt'

response = transcribe_client.start_transcription_job(
    TranscriptionJobName=transcribe_job_name,
    Media={'MediaFileUri': media_input_uri},
    MediaFormat='mp4',
    LanguageCode='en-US',
    OutputBucketName=bucket,
    OutputKey=f'challenge/{transcribe_output_filename}'
)

In [None]:
job=None
while True:
    job = transcribe_client.get_transcription_job(TranscriptionJobName = transcribe_job_name)
    if job['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED','FAILED']:
        break
    print('.', end='')
    sleep(20)
        
print(job['TranscriptionJob']['TranscriptionJobStatus'])

In [None]:
transcription_file = job['TranscriptionJob']['Transcript']['TranscriptFileUri']
print(transcription_file)

In [None]:
s3_client = boto3.client('s3')
with open(transcribe_output_filename, 'wb') as f:
    s3_client.download_fileobj(bucket, f'challenge/{transcribe_output_filename}', f)

In [None]:
with open(transcribe_output_filename) as f:
    data = json.load(f)

In [None]:
challenge_text = data['results']['transcripts'][0]['transcript']
print(challenge_text)

In [None]:
s3_resource = boto3.resource('s3')
s3_resource.Object(bucket, 'challenge-translate-input/input.txt').put(Body=challenge_text)

In [None]:
import uuid

translate_client = boto3.client(service_name='translate')

input_data = f's3://{bucket}/challenge-translate-input'
output_data = f's3://{bucket}/challenge-translate-output'

job_uuid = uuid.uuid1()
translate_job_name = f"translate-job-{job_uuid}"
translate_job_submission = translate_client.start_text_translation_job(
    JobName=translate_job_name,
    InputDataConfig={'S3Uri': input_data, 'ContentType':'text/plain'},
    OutputDataConfig={'S3Uri': output_data},
    DataAccessRoleArn=translate_access_role_arn,
    SourceLanguageCode='en',
    TargetLanguageCodes=['es']
)
translate_job_id = translate_job_submission['JobId']

In [None]:
while True:
    translate_job = translate_client.describe_text_translation_job(JobId=translate_job_id)
    if translate_job['TextTranslationJobProperties']['JobStatus'] in ['COMPLETED','FAILED']:
        break
    sleep(20)
    print('.', end='')

print(translate_job['TextTranslationJobProperties']['JobStatus'])

In [None]:
account_id = boto3.client('sts').get_caller_identity().get('Account')

In [None]:
translate_output_path = f'challenge-translate-output/{account_id}-TranslateText-{translate_job_id}/'

In [None]:
s3_resource = boto3.resource('s3')
my_bucket = s3_resource.Bucket(bucket)

# Default to None so the print below works even if no .txt file is found
polly_input = None
for my_bucket_object in my_bucket.objects.filter(Prefix=translate_output_path):
    file = my_bucket_object.key
    if file.endswith('.txt'):
        polly_input = file

print(polly_input)

In [None]:
polly_client = boto3.client('polly')

obj = s3_resource.Object(bucket, polly_input )
body = obj.get()['Body'].read().decode('utf-8')

response = polly_client.start_speech_synthesis_task(
    Engine='standard',
    OutputFormat='mp3',
    OutputS3BucketName=bucket,
    Text=body,
    VoiceId='Lucia'
) 


In [None]:
task_id = response['SynthesisTask']['TaskId']

In [None]:
while True:
    polly_job = polly_client.get_speech_synthesis_task(TaskId=task_id)
    if polly_job['SynthesisTask']['TaskStatus'] in ['completed','failed']:
        break
    sleep(20)
    print('.', end='')

print(polly_job['SynthesisTask']['TaskStatus'])

In [None]:
s3_client = boto3.client('s3')
polly_output_filename = f'{task_id}.mp3'
with open(polly_output_filename, 'wb') as f:
    s3_client.download_fileobj(bucket, polly_output_filename, f)

*©2021 Amazon Web Services, Inc. or its affiliates. All rights reserved. This work may not be reproduced or redistributed, in whole or in part, without prior written permission from Amazon Web Services, Inc. Commercial copying, lending, or selling is prohibited. All trademarks are the property of their owners.*
