Objective:
- Provide an example of creating alt-text descriptions for images using Amazon Bedrock, specifically Claude 3.5 Sonnet v1.

What are we using?
- Python
- AWS
-- Amazon Bedrock
-- IAM
-- Boto3 (AWS SDK for Python)
- Claude 3.5 Sonnet

What do you need to install?
- python-dotenv
-- to load our creds
- boto3
- pybase64
-- optional; the code below uses the standard-library base64 module
- pillow
-- images sent to the LLM need to be smaller than 5 MB. We use Pillow (PIL) to scale down images that are too large


In [None]:
# start by installing dependencies
import sys

!{sys.executable} -m pip install python-dotenv
!{sys.executable} -m pip install boto3
!{sys.executable} -m pip install pybase64
!{sys.executable} -m pip install pillow


In [None]:
import os
import dotenv
import boto3
import json
from PIL import Image
import base64

In [None]:
# load environment variables
# we use override=True to ensure that the values are refreshed if we edit them on the external 
# configuration file since there seems to be a bug with the Jupyter extension for VS Code where 
# it doesn't reload them even if you close and open the notebook again
dotenv.load_dotenv(".env", override=True)

In [None]:
# set our credentials from the environment values loaded from the .env file
AWS_ACCESS_KEY_ID = os.environ.get('AWS_ACCESS_KEY_ID')
AWS_SECRET_ACCESS_KEY = os.environ.get('AWS_SECRET_ACCESS_KEY')
AWS_REGION = os.environ.get('AWS_REGION')
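
Because `os.environ.get` returns `None` for a missing key, a typo in the .env file only shows up later as a confusing client error. A minimal sketch of an early check follows; the helper name `missing_settings` is hypothetical and not part of boto3 or python-dotenv.

```python
import os

# Names this notebook expects to find in the .env file
REQUIRED_SETTINGS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION")


def missing_settings(env=os.environ, required=REQUIRED_SETTINGS):
    """Return the names of required settings that are unset or empty."""
    return [name for name in required if not env.get(name)]


missing = missing_settings()
if missing:
    print("Missing settings in .env:", ", ".join(missing))
```

Running this right after loading the .env file makes a missing region or key visible before the client is created.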

In [None]:
# instantiate a bedrock client using boto3 (AWS' official Python SDK)
bedrock_runtime_client = boto3.client(
    'bedrock-runtime',
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    region_name=AWS_REGION
)

To use the model, you need its model ID. This can be retrieved from the Bedrock console or via the AWS CLI.  

**Using the Bedrock interface**
+ Navigate to Amazon Bedrock
+ In the foundation models list, select Claude
+ Select the Claude 3.5 Sonnet v1 model. You will see the Model ID and Model ARN. You need the model ID.

**Using the AWS CLI**
+ aws bedrock list-foundation-models --by-provider anthropic

[AWS Documentation: list-foundation-models](https://docs.aws.amazon.com/cli/latest/reference/bedrock/list-foundation-models.html)

Example response (truncated):

    [cloudshell-user@ip-10-138-4-48 ~]$ aws bedrock list-foundation-models --by-provider anthropic

    {  
        "modelSummaries": [          
            {  
                "modelArn": "arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-instant-v1:2:100k",  
                "modelId": "anthropic.claude-instant-v1:2:100k",  
                "modelName": "Claude Instant",  
                "providerName": "Anthropic",  
                "inputModalities": [  
                    "TEXT"  
                ],  
                "outputModalities": [  
                    "TEXT"  
                ],  
                "responseStreamingSupported": true,  
                "customizationsSupported": [],  
                "inferenceTypesSupported": [  
                    "PROVISIONED"  
                ],  
                "modelLifecycle": {  
                    "status": "ACTIVE"  
                }   
            }  
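
The same listing can be fetched from Python and filtered for models that accept images, which is what alt-text generation needs. The sketch below assumes the response has the shape shown in the example above; `image_capable_model_ids` and `list_anthropic_models` are hypothetical helper names, while `list_foundation_models(byProvider=...)` is the boto3 call on the `bedrock` control-plane client.

```python
def image_capable_model_ids(response):
    """Return IDs of active models whose input modalities include IMAGE."""
    ids = []
    for summary in response.get("modelSummaries", []):
        accepts_images = "IMAGE" in summary.get("inputModalities", [])
        is_active = summary.get("modelLifecycle", {}).get("status") == "ACTIVE"
        if accepts_images and is_active:
            ids.append(summary["modelId"])
    return ids


def list_anthropic_models(region_name):
    """Call the Bedrock control-plane API (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the filter above works without it

    client = boto3.client("bedrock", region_name=region_name)
    return client.list_foundation_models(byProvider="anthropic")


# Usage (makes a network call):
# print(image_capable_model_ids(list_anthropic_models("us-west-2")))
```

Note that the listing uses the `bedrock` client, not the `bedrock-runtime` client used for invoking the model.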

In [None]:
# select the model id
model_id = "anthropic.claude-3-5-sonnet-20240620-v1:0"

In [None]:
# There is a 5 MB limit on the image data buffer. This means
# that we need to make sure images are scaled.

# The payload below declares media_type "image/jpeg", and Claude does not
# accept TIFF input, so we always re-encode the image as JPEG in memory.
# If the file is larger than 4.5 MB, we also scale it down first.
import io

filename = 'SPEC-PA-56-0024-Box17-Folder22_0031.tif'
image_path = 'data/' + filename

img = Image.open(image_path)

if os.path.getsize(image_path) >= 4500000:
    # resize to a fixed width while preserving the aspect ratio
    base_width = 800
    wpercent = base_width / float(img.size[0])
    hsize = int(float(img.size[1]) * wpercent)
    img = img.resize((base_width, hsize), Image.Resampling.LANCZOS)

# JPEG has no alpha channel, so convert to RGB before saving
buffer = io.BytesIO()
img.convert('RGB').save(buffer, format='JPEG')
encoded_image = base64.b64encode(buffer.getvalue()).decode()
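
The resize arithmetic in the cell above can be checked without touching an image file. The sketch below isolates it in a hypothetical helper, `scaled_size`, and adds a second hypothetical helper, `base64_length`, to show how much base64 encoding inflates the data. Whether the 5 MB limit applies to the raw bytes or to the encoded string is not stated in this notebook, so that is worth confirming against the Bedrock documentation.

```python
import math


def scaled_size(width, height, base_width=800):
    """Return (width, height) scaled to base_width, keeping the aspect ratio."""
    ratio = base_width / float(width)
    return base_width, int(height * ratio)


def base64_length(num_bytes):
    """Length in characters of the padded base64 encoding of num_bytes bytes."""
    return 4 * math.ceil(num_bytes / 3)


print(scaled_size(1600, 1200))   # (800, 600)
print(base64_length(4_500_000))  # 6000000
```

Base64 output is about one third larger than its input, so a file just under the 4.5 MB threshold encodes to roughly 6 million characters.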

In [None]:
prompt = """Generate WCAG 2.1-compliant alt text and a longer description for a stand-alone image. The output must be in strict JSON format as follows:
    {\"image\":  {\"alt\": \"Alternative text\",\"desc\": \"Long-description\"}}
    Follow these guidelines to create appropriate and effective alt text:
    1. Image Description:
       - Describe the key elements of the image, including objects, people, scenes, and any visible text.
       - Consider the image’s role within the PDF. What information or function does it provide?
    2. WCAG 2.1 Compliance:
       a) Text in Image:
          - If duplicated nearby, use empty alt text: alt=""
          - For functional text (e.g., icons), describe the function
          - Otherwise, include the exact text
       b) Functional Images:
          - For links/buttons, describe the action/destination
       c) Informative Images:
          - Provide a concise description of essential information
          - For complex images, summarize key data or direct to full information
       d) Decorative Images:
          - Use empty alt text: alt=""
    3. Output Guidelines:
       - Keep alt text short, clear, and relevant and no longer than 25 words
       - Keep long description clear and relevant and no longer than 250 words
       - Ensure it enhances accessibility for assistive technology users
    Remember:
    - Provide only the JSON output with no additional explanation
    - Do not use unnecessary phrases like “Certainly!” or “Here’s the alt text:”
    - If you’re unsure about specific details, focus on describing what you can clearly determine from the context provided
    Now, based on the information given and these guidelines, generate the appropriate alt text in the required JSON format."""

payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/jpeg",
                        "data": encoded_image
                    }
                },
                {
                    "type": "text",
                    "text": prompt
                }
            ]
        }
    ],
    "max_tokens": 1024,  # ample for a 25-word alt text plus a 250-word description
    "anthropic_version": "bedrock-2023-05-31"
}
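
The `media_type` in the payload has to match the actual image bytes. As a sketch, the format can be detected from the file's leading magic bytes instead of being hard-coded. The helper names `detect_media_type` and `image_block` are hypothetical; the four media types listed are the image formats Claude accepts.

```python
import base64


def detect_media_type(data):
    """Guess the media type of an image from its leading magic bytes."""
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "image/gif"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    # TIFF and other formats end up here and need converting first
    raise ValueError("unsupported image format; convert to JPEG, PNG, GIF or WebP")


def image_block(data):
    """Build the image content item for the messages payload."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": detect_media_type(data),
            "data": base64.b64encode(data).decode(),
        },
    }
```

With this in place, an unconverted TIFF raises a `ValueError` locally instead of being sent with a mismatched label.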

Each model family on Bedrock expects its own request format.  

For Claude 3 and Claude 3.5 models, the template is the following:

    payload = {  
        "messages": [  
            {  
                "role": "",  
                "content": []  
            }  
        ],  
        "anthropic_version": ""  
    }  


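Messages is an array of JSON objects that must contain at least one item. Each message must strictly follow the schema and declare:
- role: this can be either user or assistant. A system prompt is not a message role; it goes in a separate top-level system field.
- content: this is also an array, because you can send multiple content items (for example, an image and a text prompt) in one API request to Claude. It must contain at least one item.

The request body also needs max_tokens, which the payload above sets.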
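
The schema rules can be checked locally before a request is sent. In the Anthropic Messages format, a message role is either "user" or "assistant", and `max_tokens` and `anthropic_version` sit at the top level of the body. The following is a minimal sketch, not an exhaustive validator, and `check_payload` is a hypothetical helper name.

```python
ALLOWED_ROLES = {"user", "assistant"}


def check_payload(payload):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    messages = payload.get("messages") or []
    if not messages:
        problems.append("messages must contain at least one item")
    for i, message in enumerate(messages):
        if message.get("role") not in ALLOWED_ROLES:
            problems.append(f"messages[{i}].role must be 'user' or 'assistant'")
        if not message.get("content"):
            problems.append(f"messages[{i}].content must not be empty")
    for field in ("anthropic_version", "max_tokens"):
        if field not in payload:
            problems.append(f"{field} is required")
    return problems
```

Calling `check_payload` on the empty template reports the blank role, the empty content, and the missing `max_tokens`.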

https://community.aws/content/2dfToY7frDS4y8LsTkntgBzORju/hands-on?lang=en


The image was loaded and converted to base64 in an earlier cell, so the payload is ready to send.

In [None]:
# we're ready to invoke the model!
response = bedrock_runtime_client.invoke_model(
    modelId=model_id,
    contentType="application/json",
    body=json.dumps(payload)
)
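
Calls to a hosted model can fail transiently, for example when requests are throttled. One common pattern, sketched here as an assumption rather than something this notebook requires, is to retry with exponential backoff. `call_with_retry` is a hypothetical helper that wraps any zero-argument callable; the `sleep` parameter exists only so the delay can be replaced in tests.

```python
import time


def call_with_retry(call, attempts=3, base_delay=1.0,
                    retry_on=(Exception,), sleep=time.sleep):
    """Run call(); on a retryable error, wait and try again, doubling the delay."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the last error
            sleep(base_delay * 2 ** attempt)


# Usage with the client from this notebook (makes a network call):
# from botocore.exceptions import ClientError
# response = call_with_retry(
#     lambda: bedrock_runtime_client.invoke_model(
#         modelId=model_id,
#         contentType="application/json",
#         body=json.dumps(payload),
#     ),
#     retry_on=(ClientError,),
# )
```

Retrying on every `ClientError` is deliberately coarse. A stricter version would inspect the error code and retry only throttling errors.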

In [None]:
# now we need to read the response. It comes back as a stream of bytes 
# so if we want to display the response in one go we need to read the full stream first
# then convert it to a string as json and load it as a dictionary 
# so we can access the field containing the content without all the metadata noise
output_binary = response["body"].read()
output_json = json.loads(output_binary)
output = output_json["content"][0]["text"]


In [None]:
print(output)
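
The prompt asks for strict JSON, so the printed text can be parsed into its two fields and checked against the word limits the prompt sets. This is a sketch under the assumption that the model's reply contains a single JSON object of the requested shape; `parse_alt_text` is a hypothetical helper name.

```python
import json


def parse_alt_text(output, max_alt_words=25, max_desc_words=250):
    """Extract alt text and long description from the model's JSON reply."""
    # Tolerate stray text around the JSON by slicing from the first "{"
    # to the last "}"
    start, end = output.find("{"), output.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    image = json.loads(output[start:end + 1])["image"]
    alt, desc = image["alt"], image["desc"]
    return {
        "alt": alt,
        "desc": desc,
        "alt_within_limit": len(alt.split()) <= max_alt_words,
        "desc_within_limit": len(desc.split()) <= max_desc_words,
    }


# Usage with the output from the cell above:
# result = parse_alt_text(output)
# print(result["alt"])
```

The two boolean flags make it easy to spot replies that ignore the 25-word and 250-word limits.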