# Use AI Services with Amazon SageMaker


This notebook demonstrates the integration and use of various AWS AI services to process text and speech.

## Setup: Initializing AWS AI services

In [1]:
import boto3
import json
import urllib.request
import time
from IPython.display import Audio, display

# Initialize clients
textract = boto3.client('textract')
comprehend = boto3.client('comprehend')
translate = boto3.client('translate')
transcribe = boto3.client('transcribe')
polly = boto3.client('polly')
s3 = boto3.resource('s3')
print("Initialized")

Initialized


## Step 1: Document processing with Amazon Textract

This code helps us read and extract text from PDF documents stored in Amazon S3.

Let's break down what this code does:

1. We create a function called `extract_text_from_pdf` that needs two pieces of information:

    - `bucket_name`: the name of our S3 storage location
    - `document_name`: the name of our PDF file

2. The function uses Amazon Textract (a service that can read documents) to:

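    - Read our PDF file directly from S3 (the synchronous `detect_document_text` call used here handles single-page documents; multipage PDFs need the asynchronous `start_document_text_detection` API)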
    - Find all the text in the document
    - Organize the text line by line

3. After the text is extracted, we:

    - Keep only the `LINE` blocks (skipping the `PAGE` and `WORD` blocks that Textract also returns)
    - Join all the lines together with line breaks between them
    - Return the complete text as one big string

4. Finally, we use this function by:

    - Specifying our S3 bucket and PDF file names
    - Calling our function to get the text
    - Printing the first 1500 characters of the extracted text
    
    
Think of it like having a robot that can read a PDF document and write down all the text it sees, line by line!

In [2]:
def extract_text_from_pdf(bucket_name, document_name):
    response = textract.detect_document_text(
        Document={'S3Object': {'Bucket': bucket_name, 'Name': document_name}}
    )
    text = [item['Text'] for item in response['Blocks'] if item['BlockType'] == 'LINE']
    return '\n'.join(text)


# Replace with your bucket and file
bucket_name = "lab-data-bucket-303334514428-fc52eb70"
document_name = "ABC_Corporation.pdf"
extracted_text = extract_text_from_pdf(bucket_name, document_name)
print("Extracted Text:\n", extracted_text[:1500])

Extracted Text:
 ABC Corporation - Client Communication Analysis Report
Q1 2025
Executive Summary:
Our organization currently handles 10,000+ daily client communications across multiple languages and regions. This
document outlines our current operational challenges and metrics.
Current Process Statistics:
- Daily Document Volume: 10,000+ communications
- Annual Processing Cost: $1.5M
- Current Error Rate: 15%
- Processing Time: Up to 48 hours per document
- Manual Staff Required: 20+ full-time employees
Client Feedback Sample:
We are extremely frustrated with the long processing times for our documentation. While the staff is professional, the
delays are impacting our business operations significantly. We need faster turnaround times and better accuracy in
translations. The current error rate in document processing is unacceptable for our business needs.
International Client Requirements:
Our clients in Asia Pacific region require immediate translation services for their documents, wh
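
To see why the function filters on `BlockType == 'LINE'`, the sketch below applies the same filter to a small hand-written response. The `sample_response` dictionary and the `lines_from_blocks` helper are assumptions made for illustration: the dictionary only mimics the shape of a Textract response (real responses carry more fields, such as geometry and confidence) and is not output from the lab document.

```python
# Hypothetical, trimmed-down Textract-style response (not real service output).
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Q1 2025"},
        {"BlockType": "WORD", "Text": "Q1"},
        {"BlockType": "WORD", "Text": "2025"},
        {"BlockType": "LINE", "Text": "Executive Summary:"},
        {"BlockType": "WORD", "Text": "Executive"},
        {"BlockType": "WORD", "Text": "Summary:"},
    ]
}


def lines_from_blocks(blocks):
    """Return the text of LINE blocks only, in document order."""
    return [block["Text"] for block in blocks if block["BlockType"] == "LINE"]


sample_lines = lines_from_blocks(sample_response["Blocks"])
print("\n".join(sample_lines))
```

Keeping `WORD` blocks as well would repeat every word a second time, which is why only `LINE` blocks are joined.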

## Step 2: Sentiment analysis with Amazon Comprehend

This code helps us understand the emotional tone of our text using Amazon Comprehend (which is like having an expert reader who can detect feelings in text). Here's what this code does:

1. We create a function called `analyze_sentiment` that needs one piece of information:
   - `text`: the text we want to analyze for sentiment

2. The function uses Amazon Comprehend to:
   - Look at the first 5000 characters of our text (Amazon Comprehend limits each request to 5 KB of UTF-8 encoded text, which for plain English text is about 5000 characters)
   - Analyze the text for emotional tone
   - Set the language to English ('en')

3. After the analysis, we:
   - Store the sentiment results in a variable called `sentiment_result`
   - Print "Sentiment Analysis Result:" as a header
   - Print the detailed results in a nice, formatted way (using json.dumps)


Think of it like having an emotional detective who reads your text and tells you whether it is positive, negative, neutral, or mixed!

In [3]:
def analyze_sentiment(text):
    response = comprehend.detect_sentiment(
        Text=text[:5000],  # Comprehend limit
        LanguageCode='en'
    )
    return response


sentiment_result = analyze_sentiment(extracted_text)
print("Sentiment Analysis Result:")
print(json.dumps(sentiment_result, indent=2))

Sentiment Analysis Result:
{
  "Sentiment": "NEUTRAL",
  "SentimentScore": {
    "Positive": 0.014260094612836838,
    "Negative": 0.0058274585753679276,
    "Neutral": 0.9793285131454468,
    "Mixed": 0.0005839981604367495
  },
  "ResponseMetadata": {
    "RequestId": "0df079a9-d8a1-4773-8ea1-68a9e070de51",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "x-amzn-requestid": "0df079a9-d8a1-4773-8ea1-68a9e070de51",
      "content-type": "application/x-amz-json-1.1",
      "content-length": "165",
      "date": "Sat, 05 Jul 2025 03:13:34 GMT"
    },
    "RetryAttempts": 0
  }
}


Let's break down what this result tells us about our text:

1. Overall Sentiment: The sentiment of the text was classified as "NEUTRAL"

2. Sentiment Score Breakdown:
   - Neutral: 97.93% confidence (very high)
   - Positive: 1.43% confidence
   - Negative: 0.58% confidence
   - Mixed: 0.06% confidence

This means Amazon Comprehend is 97.93% confident that the text has a neutral tone, with minimal positive, negative, or mixed sentiment detected. This kind of result is common when analyzing formal or technical documents.
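
The percentages in the breakdown can be reproduced from the raw `SentimentScore` values. The sketch below copies the four scores from the output shown earlier; `summarize_scores` is a hypothetical helper name, not part of the notebook or of boto3.

```python
# SentimentScore values copied from the Amazon Comprehend output shown earlier.
sentiment_score = {
    "Positive": 0.014260094612836838,
    "Negative": 0.0058274585753679276,
    "Neutral": 0.9793285131454468,
    "Mixed": 0.0005839981604367495,
}


def summarize_scores(scores):
    """Return (top_label, {label: 'NN.NN%'}) for a SentimentScore dict."""
    top_label = max(scores, key=scores.get)
    percentages = {label: f"{value * 100:.2f}%" for label, value in scores.items()}
    return top_label, percentages


top_label, percentages = summarize_scores(sentiment_score)
print(top_label, percentages)
```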

The rest of the output (ResponseMetadata) is just technical information about the API call, including:
- A unique request ID
- HTTP status code 200 (meaning successful request)
- The timestamp of when the analysis was performed
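
One caveat about the `text[:5000]` slice in `analyze_sentiment`: it counts characters, while the service limit is measured in bytes of UTF-8. For plain English text the two are the same, but accented or non-Latin text uses more than one byte per character. A minimal sketch of a byte-aware cut follows; `truncate_utf8` is a hypothetical helper, and the 5000-byte default is an assumption chosen to match the notebook's slice.

```python
def truncate_utf8(text, max_bytes=5000):
    """Cut text so its UTF-8 encoding fits in max_bytes without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" drops a trailing partial character left by the byte slice.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


ascii_text = "a" * 6000
accented_text = "é" * 6000  # each "é" takes 2 bytes in UTF-8

print(len(truncate_utf8(ascii_text)))     # characters kept from ASCII text
print(len(truncate_utf8(accented_text)))  # characters kept from accented text
```

With this helper, the call could pass `Text=truncate_utf8(text)` instead of `Text=text[:5000]`.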

## Step 3: Language translation with Amazon Translate

This code helps us translate text from one language to another using Amazon Translate (which is like having a multilingual translator at your service). Here's what this code does:

1. We create a function called `translate_text` that needs two pieces of information:
   - `text`: the text we want to translate
   - `target_lang`: the language we want to translate to (defaults to Spanish 'es' if not specified)

2. The function uses Amazon Translate to:
   - Take the first 5000 characters of our text (Amazon Translate limits each real-time request to 10,000 bytes of UTF-8 text)
   - Automatically detect the source language ('auto')
   - Convert it to our target language
   - Extract just the translated text from the response

3. After creating the function, we:
   - Call it with our extracted text and specify French ('fr') as the target language
   - Print "Translated Text (French):" as a header
   - Show the first 1500 characters of the translated text

Think of it like having an instant translator who can take your text and convert it into any language you choose!

In [4]:
def translate_text(text, target_lang='es'):
    response = translate.translate_text(
        Text=text[:5000],
        SourceLanguageCode='auto',
        TargetLanguageCode=target_lang
    )
    return response['TranslatedText']


translated_text = translate_text(extracted_text, 'fr')
print("Translated Text (French):\n", translated_text[:1500])

Translated Text (French):
 ABC Corporation - Rapport d'analyse des communications avec les clients
PREMIER TRIMESTRE 2025
Résumé :
Notre organisation gère actuellement plus de 10 000 communications quotidiennes avec les clients dans plusieurs langues et régions. Ce
Ce document décrit nos défis opérationnels actuels et nos indicateurs.
Statistiques de processus actuelles :
- Volume quotidien de documents : plus de 10 000 communications
- Coût de traitement annuel : 1,5 million de dollars
- Taux d'erreur actuel : 15 %
- Temps de traitement : jusqu'à 48 heures par document
- Personnel manuel requis : plus de 20 employés à temps plein
Exemple de commentaires des clients :
Nous sommes extrêmement frustrés par les longs délais de traitement de notre documentation. Bien que le personnel soit professionnel,
les retards ont un impact significatif sur nos activités commerciales. Nous avons besoin de délais d'exécution plus rapides et d'une meilleure précision
traductions. Le taux d'erreur actuel
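
Because `translate_text` sends only the first 5000 characters, anything beyond that point is left untranslated. One possible approach for longer documents is to split the text into chunks on line boundaries and translate each chunk separately. The sketch below shows only the splitting step; `chunk_by_lines` is a hypothetical helper, and the 5000-byte default is an assumption chosen to match the notebook's slice.

```python
def chunk_by_lines(text, max_bytes=5000):
    """Group whole lines into chunks whose UTF-8 size stays within max_bytes.

    A single line longer than max_bytes is kept as its own oversized chunk;
    handling that case is left out of this sketch.
    """
    chunks, current, current_size = [], [], 0
    for line in text.split("\n"):
        line_size = len(line.encode("utf-8")) + 1  # +1 for the newline
        if current and current_size + line_size > max_bytes:
            chunks.append("\n".join(current))
            current, current_size = [], 0
        current.append(line)
        current_size += line_size
    if current:
        chunks.append("\n".join(current))
    return chunks


sample = "line one\nline two\nline three"
sample_chunks = chunk_by_lines(sample, max_bytes=18)
print(sample_chunks)

# Possible use with the notebook's function (requires AWS access):
# full_translation = "\n".join(translate_text(c, 'fr') for c in chunk_by_lines(extracted_text))
```

Splitting on line boundaries keeps each sentence fragment intact within a chunk, although a sentence that spans a chunk boundary may still be translated with less context.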

## Step 4: Text-to-speech conversion with Amazon Polly

This code converts written text into spoken audio using Amazon Polly (which is like having a professional voice actor read your text). Here's what this code does:

1. We create a function called `text_to_speech` that needs two pieces of information:
   - `text`: the text we want to convert to speech
   - `filename`: what we want to name our audio file

2. The function uses Amazon Polly to:
   - Take the first 3000 characters of our text (a single `synthesize_speech` request accepts at most 3000 billed characters)
   - Create an MP3 audio file
   - Use 'Joanna' as the voice (one of Amazon Polly's AI voices)

3. After getting the audio, we:
   - Read the audio data stream
   - Save the MP3 file to our S3 bucket in an 'audio-output' folder
   - Print a confirmation message with the file location
   - Create an audio player that can play the sound directly in our notebook

4. Finally, we:
   - Call the function with our extracted text (first 3000 characters)
   - Display an audio player in the notebook that we can use to listen to the text


Think of it like having a virtual assistant who can read your text out loud and save the recording for you to play anytime!

In [8]:
def text_to_speech(text, filename):
    # print(text)
    response = polly.synthesize_speech(
        Text=text[:3000],
        OutputFormat='mp3',
        VoiceId='Joanna'
    )

    # Get the audio stream content
    audio_stream = response['AudioStream'].read()

    # Upload to S3
    s3.Object(bucket_name, f"audio-output/{filename}.mp3").put(
        Body=audio_stream
    )

    print(f"Audio saved: s3://{bucket_name}/audio-output/{filename}.mp3")

    # Create audio player with the audio data directly
    return Audio(data=audio_stream, autoplay=False)


# Call the function and display the audio player
audio_player = text_to_speech(extracted_text[:3000], "lab-output")
display(audio_player)

ABC Corporation - Client Communication Analysis Report
Q1 2025
Executive Summary:
Our organization currently handles 10,000+ daily client communications across multiple languages and regions. This
document outlines our current operational challenges and metrics.
Current Process Statistics:
- Daily Document Volume: 10,000+ communications
- Annual Processing Cost: $1.5M
- Current Error Rate: 15%
- Processing Time: Up to 48 hours per document
- Manual Staff Required: 20+ full-time employees
Client Feedback Sample:
We are extremely frustrated with the long processing times for our documentation. While the staff is professional, the
delays are impacting our business operations significantly. We need faster turnaround times and better accuracy in
translations. The current error rate in document processing is unacceptable for our business needs.
International Client Requirements:
Our clients in Asia Pacific region require immediate translation services for their documents, while European clie
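
The `AudioStream` returned by `synthesize_speech` is a streaming object, so it is good practice to close it once the bytes have been read. A minimal sketch using `contextlib.closing` follows. The `read_audio_bytes` helper is hypothetical, and `fake_response` is a stand-in built from `io.BytesIO` so the example runs without AWS access; it is not a real Polly response.

```python
import io
from contextlib import closing


def read_audio_bytes(response):
    """Read the whole AudioStream from a synthesize_speech-style response, then close it."""
    with closing(response["AudioStream"]) as stream:
        return stream.read()


# Stand-in for a Polly response (assumption for illustration only).
fake_response = {"AudioStream": io.BytesIO(b"fake mp3 bytes")}
audio_bytes = read_audio_bytes(fake_response)
print(len(audio_bytes), fake_response["AudioStream"].closed)
```

In `text_to_speech`, the same helper could replace the direct `response['AudioStream'].read()` call.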

## Step 5: Audio-to-text transcription with Amazon Transcribe

This code converts spoken audio back into text using Amazon Transcribe (which is like having a professional transcriptionist listen to audio and write down what they hear). Here's what this code does:

1. We create a function called `transcribe_audio` that needs one piece of information:
   - `audio_file`: the name of the MP3 file we want to transcribe

2. The function uses Amazon Transcribe to:
   - Start a transcription job under a lab-specific name (Amazon Transcribe requires job names to be unique within an account, so reusing a name causes a conflict error)
   - Point to our audio file in the S3 bucket
   - Specify that it's an MP3 file
   - Set the language to US English

3. The function then:
   - Checks the job status every 5 seconds
   - Waits until the job is either COMPLETED or FAILED
   - If successful, gets the transcript text from the result
   - If failed, returns an error message

4. Finally, we:
   - Call the function with our audio file name
   - Print the transcribed text


Think of it like having someone listen to your audio recording and type out everything they hear, word for word!

In [9]:
def transcribe_audio(audio_file):
    # Job names must be unique within the account, so append a timestamp
    # to let this cell be rerun without a name conflict.
    job_name = f"lab-transcription-job-{int(time.time())}"
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={'MediaFileUri': f"s3://{bucket_name}/audio-output/{audio_file}"},
        MediaFormat='mp3',
        LanguageCode='en-US'
    )

    # Poll every 5 seconds until the job reaches a terminal state
    while True:
        status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
        job = status['TranscriptionJob']
        if job['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
            break
        time.sleep(5)

    # Get the transcript text from the JSON file
    if job['TranscriptionJobStatus'] == 'COMPLETED':
        transcript_uri = job['Transcript']['TranscriptFileUri']
        with urllib.request.urlopen(transcript_uri) as response:
            data = json.loads(response.read())
        return data['results']['transcripts'][0]['transcript']
    else:
        # FailureReason is filled in by Amazon Transcribe when a job fails
        return f"Transcription failed: {job.get('FailureReason', 'unknown reason')}"

audio_file_name = "lab-output.mp3"
transcript_text = transcribe_audio(audio_file_name)
print("Transcription Text:", transcript_text)

Transcription Text: ABC Corporation Client communication analysis Report Q1 2025 executive summary. Our organization currently handles 10,000 plus daily client communications across multiple languages and regions. This document outlines our current operational challenges and metrics. Current process statistics, daily document volume, 10,000 plus communications, annual processing cost, $1.5 million current error rate, 15%, process. Time up to 48 hours per document, manual staff required, 20 plus full-time employees, client feedback sample. We are extremely frustrated with the long processing times for our documentation. While the staff is professional, the delays are impacting our business operations significantly. We need faster turnaround times and better accuracy in translations. The current error rate in document processing is unacceptable for our business needs. International client requirements. Clients in Asia Pacific region require immediate translation services for their docume
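
The polling loop in `transcribe_audio` waits indefinitely if a job never reaches `COMPLETED` or `FAILED`. A sketch of the same loop with a timeout follows. `wait_for_job` is a hypothetical helper, the 600-second default is an arbitrary assumption, and the status sequence is simulated so the example runs without AWS access.

```python
import time


def wait_for_job(get_status, poll_seconds=5, timeout_seconds=600, sleep=time.sleep):
    """Poll get_status() until it returns COMPLETED or FAILED, or the timeout passes.

    get_status is any zero-argument callable that returns the job status string.
    """
    waited = 0
    while True:
        status = get_status()
        if status in ("COMPLETED", "FAILED"):
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"Job still {status} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds


# Simulated status sequence (assumption for illustration only).
fake_statuses = iter(["QUEUED", "IN_PROGRESS", "IN_PROGRESS", "COMPLETED"])
final_status = wait_for_job(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(final_status)
```

With the notebook's client, `get_status` could be a lambda that calls `transcribe.get_transcription_job(...)` and returns `['TranscriptionJob']['TranscriptionJobStatus']`.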

## DIY challenge

This is a template function for use in the DIY section. Here's what this code is set up to do:

1. The function `diy_polly_conversion` currently:
   - Uses the same text as before
   - Has a 3000 character limit
   - Uses 'Matthew' as the AI voice (different from our previous example with 'Joanna')

2. `OutputFormat` must be set to a valid Amazon Polly output format; use `'mp3'` to match the `.mp3` file name.
3. `Body`, in the code that uploads the .mp3 file to S3, must be the `audio_stream` variable itself, not a quoted string.
4. Refer to the Polly code cell in Step 4 for guidance.

In [10]:
def diy_polly_conversion(text, filename):

    # TODO: Modify this function
    response = polly.synthesize_speech(
        Text=text[:3000],
        OutputFormat='mp3',
        VoiceId='Matthew'
    )

    # Get the audio stream content
    audio_stream = response['AudioStream'].read()

    # Upload the audio bytes to S3 (pass the variable, not the string "audio_stream")
    s3.Object(bucket_name, f"diy-output/{filename}.mp3").put(
        Body=audio_stream
    )

    print(f"Audio saved: s3://{bucket_name}/diy-output/{filename}.mp3")

    # Create audio player with the audio data directly
    return Audio(data=audio_stream, autoplay=False)


# Call the function and display the audio player
audio_player = diy_polly_conversion(extracted_text[:3000], "diy-result")
display(audio_player)

Audio saved: s3://lab-data-bucket-303334514428-fc52eb70/diy-output/diy-result.mp3
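
The `Audio saved` message is printed whenever the upload call returns, so on its own it does not confirm that the object holds audio. If `Body` were given the quoted string `"audio_stream"` instead of the variable, the upload would likely still succeed and store those 12 characters. The sketch below is a rough sanity check that could be run on the data before uploading; `looks_like_mp3` is a hypothetical helper based on a common heuristic, not an Amazon Polly or Amazon S3 feature.

```python
def looks_like_mp3(data):
    """Heuristic: True if data starts with an ID3 tag or an MPEG audio frame sync."""
    if not isinstance(data, (bytes, bytearray)) or len(data) < 3:
        return False
    if data[:3] == b"ID3":
        return True
    # An MPEG audio frame begins with 11 set bits: 0xFF, then the top 3 bits of the next byte.
    return data[0] == 0xFF and (data[1] & 0xE0) == 0xE0


print(looks_like_mp3("audio_stream".encode("utf-8")))  # the literal placeholder string
print(looks_like_mp3(b"\xff\xf3\x44\xc4"))             # bytes that start with a frame sync
```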
