# Interacting with Claude 3-Sonnet with images

## Context

Claude 3 can accept an image along with text. This lets you ask questions about an image, opening up another dimension of interactivity. Claude 3 requires the new Messages API body format. The following is an example of a multimodal request in the Messages API format.

Please see [Claude Vision](https://docs.anthropic.com/claude/docs/vision) for more details on Claude 3 multimodal capabilities and [Amazon Bedrock Claude Messages API](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-messages.html) for working with the new Messages API on Bedrock.


In [None]:
!pip3 install -qU boto3

In [None]:
{
  "modelId": "anthropic.claude-3-sonnet-20240229-v1:0",
  "contentType": "application/json",
  "accept": "application/json",
  "body": {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1000,
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "image",
            "source": {
              "type": "base64",
              "media_type": "image/png",
              "data": "iVBORw..."
            }
          },
          {
            "type": "text",
            "text": "What's in this image?"
          }
        ]
      }
    ]
  }
}
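
The same body can also be assembled in Python before it is serialized. The sketch below is illustrative only: `build_image_request` is a hypothetical helper name that does not appear elsewhere in this notebook, and the image bytes are placeholder data rather than a real picture.

```python
import base64
import json


def build_image_request(image_bytes, media_type, question, max_tokens=1000):
    """Return a Messages API body (as a dict) with one image and one text block."""
    return {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        # "messages" is a list with one entry per conversation turn.
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": media_type,
                            "data": base64.b64encode(image_bytes).decode("utf-8"),
                        },
                    },
                    {"type": "text", "text": question},
                ],
            }
        ],
    }


# Placeholder bytes stand in for the contents of a real image file.
request_body = build_image_request(b"\x89PNG placeholder", "image/png", "What's in this image?")
serialized_body = json.dumps(request_body)
```

The serialized string is what would be passed as the `body` argument of `invoke_model`.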

## Notebook Walkthrough

In this notebook, we provide an image to the Claude 3-Sonnet model, with model identifier __"anthropic.claude-3-sonnet-20240229-v1:0"__, together with a text query asking what is in the image. To do this, we package the image and text into the __Messages API__ format and call the __invoke_model__ function from __bedrock-runtime__ inside the helper function defined below to generate a response from Claude 3.

## Setup

### Install the libraries required by this notebook

In [None]:
%pip install --upgrade pip
# Quote the version specifiers so the shell does not treat ">" as an output redirect.
%pip install "boto3>=1.33.2" --force-reinstall --quiet
%pip install "botocore>=1.33.2" --force-reinstall --quiet


### Restart the kernel so that the updated packages installed above are loaded

In [None]:
# restart kernel
from IPython.core.display import HTML
HTML("<script>Jupyter.notebook.kernel.restart()</script>")

### Follow the steps below to set up necessary packages
1. Import the libraries needed to create the __bedrock-runtime__ client that invokes foundation models, to format our JSON bodies, and to convert our images into base64 encoding.

In [None]:
import boto3
import json
import base64

bedrock_client = boto3.client('bedrock-runtime',region_name='us-east-1')


### Define helper function to pass our models, messages, and inference parameters

In [None]:
def generate_message(bedrock_runtime, model_id, messages, max_tokens,top_p,temp):

    body=json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "messages": messages,
            "temperature": temp,
            "top_p": top_p
        }  
    )  
    
    response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
    response_body = json.loads(response.get('body').read())

    return response_body

## Example use case: Extract information from an image

The following image, __"calendar_screenshot.png"__, will be used in the demo.

![./images/calendar_screenshot.png](./images/calendar_screenshot.png)

### Process the image

Here we encode the image as base64. The result will be used as the image component of the message given to Claude 3. For further details on processing images for use in an API call, please see [Claude Vision](https://docs.anthropic.com/claude/docs/vision).

In [None]:
# Read the reference image from file and encode it as a base64 string.
with open('./images/calendar_screenshot.png', "rb") as image_file:
    content_image = base64.b64encode(image_file.read()).decode('utf8')
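
The `media_type` sent with an image should describe the file's real format, which is easy to get wrong when a file is renamed. As a rough sketch, the format can be guessed from the leading bytes of the file. `guess_media_type` is a hypothetical helper, and it only covers the four image types that Claude 3 accepts (JPEG, PNG, GIF, and WebP).

```python
def guess_media_type(image_bytes):
    """Guess an image media type from the file's leading magic bytes."""
    if image_bytes.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if image_bytes.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if image_bytes.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    # WebP files are RIFF containers with "WEBP" at byte offset 8.
    if image_bytes[:4] == b"RIFF" and image_bytes[8:12] == b"WEBP":
        return "image/webp"
    raise ValueError("Unrecognized image format")
```

It would be called on the raw bytes read from the file, before they are base64-encoded.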

### Create message payload that incorporates text and image input

Here we create the multimodal content message for our input to Claude 3 with separate JSON objects for the text component and the image component.

In [None]:
prompt = """
Extract the following information from the image in JSON format. Use the provided field names and include the field descriptions as specified:

- "title": The title of the session.
- "time": The time of the event in EDT.
- "duration": The duration of the session in minutes.
- "speakers": A list of objects, each containing:
  - "name": The name of the person.
  - "title": The title or position of the person.
  - "organization": The organization the person is associated with.
  - "role": The role the person has for the session - Example: speaker, moderator, panelist 

Ensure that events without speakers are not included in the output (example - Lunch).
"""

In [None]:
message_mm=[

    { "role": "user",
      "content": [
      {"type": "image","source": { "type": "base64","media_type":"image/png","data": content_image}},
      {"type": "text","text":prompt}
      ]
    }
]


### Generate the response from Claude 3

Finally, we can see the multimodal capabilities in action, asking Claude 3 about what is in the image. The model identifier we are using for this example is __"anthropic.claude-3-sonnet-20240229-v1:0"__.

In [None]:
response = generate_message(bedrock_client, model_id = "anthropic.claude-3-sonnet-20240229-v1:0",messages=message_mm,max_tokens=1024,temp=0.5,top_p=0.9)
response

In [None]:
print(response['content'][0]['text'])
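
Because the prompt asks for JSON, the returned text usually needs to be parsed before it can be used, and the model may wrap the JSON in a sentence or a code fence. The following is a minimal sketch of one way to handle that. `extract_json` is a hypothetical helper, and the sample text is made up for illustration rather than taken from a real response.

```python
import json


def extract_json(text):
    """Parse the first JSON object or array found in text, ignoring surrounding prose."""
    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        if char in "{[":
            try:
                # raw_decode parses one JSON value starting at index.
                value, _ = decoder.raw_decode(text, index)
                return value
            except json.JSONDecodeError:
                continue
    raise ValueError("No JSON object or array found in the text")


# Made-up text standing in for response['content'][0]['text'].
sample_text = 'Here is the result:\n[{"title": "Keynote", "duration": 30}]\nLet me know if you need more.'
parsed = extract_json(sample_text)
```

In the notebook, the same function could be applied to `response['content'][0]['text']`.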

## Next Steps

Now that we have seen how to use the multimodal capabilities of Claude 3-Sonnet on Amazon Bedrock, try asking a different question about the image.

# Function Calling

In [None]:
# Read the form image as raw bytes; the Converse API takes image bytes directly, not a base64 string.
with open('./images/FINCIN-105.png', "rb") as image_file:
    content_image = image_file.read()

In [None]:
tools =[
    {
            "toolSpec": {
                "name": "extract",
                "description": "Accurately extract the information from the form provided",
                "inputSchema": {
                    "json": {
                        "type": "object",
                        "properties": {
                            "formType": {
                                "type": "string",
                                "description": "The type or name of the form."
                            },
                            "formAgency": {
                                "type": "string",
                                "description": "The agency responsible for the form"
                            },
                            "formDate": {
                                "type": "string",
                                "description": "The date of the form in MM/YYYY format"
                            },
                            "PartI": {
                                "type": "object",
                                "properties": {
                                    "nameOfPerson": {
                                        "type": "string",
                                        "description": "The name of the person"
                                    },
                                    "identificationNumber": {
                                        "type": "string",
                                        "description": "The personal identification number"
                                    },
                                    "dateOfBirth": {
                                        "type": "string",
                                        "description": "The date of birth of the person in MM/DD/YYYY format"
                                    },
                                    "permanentAddress": {
                                        "type": "string",
                                        "description": "The permanent address in the United States or abroad"
                                    },
                                    "countryCitizenship": {
                                        "type": "string",
                                        "description": "The country or countries of citizenship"
                                    },
                                    "addressWhileInUS": {
                                        "type": "string",
                                        "description": "The address of the person while in the United States"
                                    },
                                    "PassportNo": {
                                        "type": "string",
                                        "description": "The passport number"
                                    },
                                    "passportCountry": {
                                        "type": "string",
                                        "description": "The passport country in English"
                                    },
                                    "usVisaDate": {
                                        "type": "string",
                                        "description": "The US Visa Date in MM/DD/YYYY format"
                                    },
                                    "usVisaIssuedAt": {
                                        "type": "string",
                                        "description": "The place in the United States where the visa was issued"
                                    },
                                    "usViasNo": {
                                        "type": "string",
                                        "description": "The immigration alien number or visa number"
                                    },
                                    "exportedFromCity": {
                                        "type": "string",
                                        "description": "The US port or city that departed from for export from the US"
                                    },
                                    "exportedToCity": {
                                        "type": "string",
                                        "description": "The foreign city or country that arrived for export from the US"
                                    },
                                    "importedFromCity": {
                                        "type": "string",
                                        "description": "The foreign city or country that departed from for import to the US"
                                    },
                                    "importedToCity": {
                                        "type": "string",
                                        "description": "The US port or city that imported into the US"
                                    },
                                    "dateShipped": {
                                        "type": "string",
                                        "description": "The date that the currency was mailed or shipped in MM/DD/YYYY format"
                                    },
                                    "dateReceived": {
                                        "type": "string",
                                        "description": "The date that the mailed or shipped currency was received in MM/DD/YYYY format"
                                    },
                                    "methodShipped": {
                                        "type": "string",
                                        "description": "The method of shipment"
                                    },
                                    "carrier": {
                                        "type": "string",
                                        "description": "The name of the carrier for shipment"
                                    },
                                    "shippedTo": {
                                        "type": "string",
                                        "description": "The name and address that the currency was mailed or shipped to"
                                    },
                                    "receivedFrom": {
                                        "type": "string",
                                        "description": "The name and address that the currency was mailed or shipped from."
                                    }
                                },
                                "required": [
                                    "nameOfPerson",
                                    "identificationNumber",
                                    "dateOfBirth",
                                    "permanentAddress",
                                    "countryCitizenship",
                                    "addressWhileInUS",
                                    "PassportNo",
                                    "passportCountry",
                                    "usVisaDate",
                                    "usVisaIssuedAt",
                                    "usViasNo",
                                    "exportedFromCity",
                                    "exportedToCity",
                                    "importedFromCity",
                                    "importedToCity",
                                    "dateShipped",
                                    "dateReceived",
                                    "methodShipped",
                                    "carrier",
                                    "shippedTo",
                                    "receivedFrom"
                                ]
                            },
                            "PartII": {
                                "type": "object",
                                "properties": {
                                    "nameOnWhoseBehalf": {
                                        "type": "string",
                                        "description": "The name of person(s) or business on whose behalf import or export was conducted"
                                    },
                                    "addressOnWhoseBehalf": {
                                        "type": "string",
                                        "description": "The permanent address in US or abroad on whose behalf import or export was conducted"
                                    },
                                    "typeOfActivity": {
                                        "type": "string",
                                        "description": "The type of business activity, occupation, or profession"
                                    },
                                    "isBusinessABank": {
                                        "type": "string",
                                        "description": "Yes or No, is the business a bank?"
                                    }
                                },
                                "required": [
                                    "nameOnWhoseBehalf",
                                    "addressOnWhoseBehalf",
                                    "typeOfActivity",
                                    "isBusinessABank"
                                ]
                            },
                            "PartIII": {
                                "type": "object",
                                "properties": {
                                    "currencyAndCoinsAmount": {
                                        "type": "string",
                                        "description": "amount of currency and coins"
                                    },
                                    "otherMonetaryInstruments": {
                                        "type": "string",
                                        "description": "amount of other monetary instruments"
                                    },
                                    "totalAmount": {
                                        "type": "string",
                                        "description": "The total amount of currency and monetary instruments"
                                    },
                                    "otherCurrency": {
                                        "type": "string",
                                        "description": "Currency name if other than US currency"
                                    },
                                    "otherCountry": {
                                        "type": "string",
                                        "description": "Name of country of other currency"
                                    }
                                },
                                "required": [
                                    "currencyAndCoinsAmount",
                                    "otherMonetaryInstruments",
                                    "totalAmount",
                                    "otherCurrency",
                                    "otherCountry"
                                ]
                            },
                            "PartIV": {
                                "type": "object",
                                "properties": {
                                    "nameAndTitle": {
                                        "type": "string",
                                        "description": "The name and title of the person completing this form"
                                    },
                                    "signature": {
                                        "type": "string",
                                        "description": "the signature of the person completing this form"
                                    },
                                    "dateOfReport": {
                                        "type": "string",
                                        "description": "The date of the completion of this report in MM/DD/YYYY format"
                                    }
                                },
                                "required": [
                                    "nameAndTitle",
                                    "signature",
                                    "dateOfReport"
                                ]
                            }
                        },
                        "required": [
                            "formType",
                            "formAgency",
                            "formDate",
                            "PartI",
                            "PartII",
                            "PartIII",
                            "PartIV"
                        ]
                    }
                }
            }

    }
]
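
In a schema this large it is easy for a name in a `required` list to drift from the property it refers to. The sketch below walks a JSON schema and reports every required name that has no matching entry in `properties`. `find_missing_required` is a hypothetical helper, and the small example schema is invented for illustration.

```python
def find_missing_required(schema, path="root"):
    """Return (path, name) pairs for required names that have no matching property."""
    problems = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in properties:
            problems.append((path, name))
    # Recurse into nested objects such as the form's parts.
    for name, subschema in properties.items():
        if subschema.get("type") == "object":
            problems.extend(find_missing_required(subschema, f"{path}.{name}"))
    return problems


# Invented schema with two deliberate mismatches.
example_schema = {
    "type": "object",
    "properties": {
        "PartI": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
            "required": ["name", "age"],
        }
    },
    "required": ["PartI", "PartIi"],
}
example_problems = find_missing_required(example_schema)
```

For the tool defined above, the call would be `find_missing_required(tools[0]["toolSpec"]["inputSchema"]["json"])`. An empty list means every required name resolves to a property.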


### Create message payload that incorporates text and image input

Here we create the multimodal content message for our input to Claude 3 with separate JSON objects for the text component and the image component.

In [None]:
prompt = """
Extract information from this document in JSON.  Be very detailed and accurate.  Ensure you extract all of the information.  If a field has no information, mark its value as UNKNOWN.

Use the extract tool
"""

### Generate the response from Claude 3

Finally, we can see the multimodal capabilities in action, asking Claude 3 about what is in the image. The model identifier we are using for this example is __"anthropic.claude-3-sonnet-20240229-v1:0"__.

In [None]:
message_mm = [
    {
        "role": "user",
        "content": [
            {
                "text": prompt
            },
            {
                    "image": {
                        "format": 'png',
                        "source": {
                            "bytes": content_image
                        }
                    }
            }
        ]
    }
]

In [None]:
response = bedrock_client.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    inferenceConfig={
        "temperature": 1.0,
        "maxTokens": 2048
    },
    messages=message_mm,
    toolConfig={"tools": tools}
)

In [None]:
print(json.dumps(response['output']['message']['content'][0]['toolUse']['input'],indent=2))
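
Indexing `content[0]` assumes the tool call is the first content block. A Converse response can contain a text block before the `toolUse` block, so a slightly more defensive approach is to search the blocks. The sketch below uses a hypothetical helper, `get_tool_input`, and a hand-written dictionary shaped like a Converse response rather than real model output.

```python
def get_tool_input(response, tool_name):
    """Return the input of the first toolUse block for tool_name, or None if absent."""
    for block in response["output"]["message"]["content"]:
        tool_use = block.get("toolUse")
        if tool_use and tool_use.get("name") == tool_name:
            return tool_use["input"]
    return None


# Hand-written stand-in for a Converse response; the values are illustrative.
example_response = {
    "output": {
        "message": {
            "role": "assistant",
            "content": [
                {"text": "I will extract the form fields."},
                {
                    "toolUse": {
                        "toolUseId": "example-id",
                        "name": "extract",
                        "input": {"formType": "FinCEN Form 105"},
                    }
                },
            ],
        }
    },
    "stopReason": "tool_use",
}
example_input = get_tool_input(example_response, "extract")
```

In the notebook this would be called as `get_tool_input(response, "extract")`, with a check for `None` before printing.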

In [None]:
ocr_text = """
FINANCIAL GENERAL
DEPARTMENT OF THE TREASURY
OMB NO 1506-0014
FinCEN Form
105
FINANCIAL CRIMES ENFORCEMENT NETWORK
To be filed with the Bureau of
July 2017
REPORT OF INTERNATIONAL
Customs and Border Protection
For Paperwork Reduction Act
Department of the Treasury
TRANSPORTATION OF CURRENCY
Notice and Privacy Act Notice,
FinCEN
OR MONETARY INSTRUMENTS
see back of form.
Please type or print.
31 U.S.C. 5316; 31 CFR 1010.340 and 1010.306
PART I
FOR A PERSON DEPARTING OR ENTERING THE UNITED STATES, OR A PERSON SHIPPING MAILING OR RECEIVING CURRENCY OR
MONETARY INSTRUMENTS (IF ACTING FOR ANYONE ELSE, ALSO COMPLETE PART II BELOW.)
1. NAME (Last or family, first, and middle)
2. IDENTIFICATION NO. (See instructions)
3. DATE OF BIRTH (Mo/Day/Yr.)
John SMITH
123-45-6789
04 01 1980
4. PERMANENT ADDRESS IN UNITED STATES OR ABROAD
5. YOUR COUNTRY OR COUNTRIES OF
CITIZENSHIP
Ziegelhutten weg 37, 60598 Frankfurt
6. ADDRESS WHILE IN THE UNITED STATES
Germany
7. PASSPORT NO. & COUNTRY
123 S. 50TH
KANSAS CITY, MO
87654321 - DE
8. U.S. VISA DATE (Mo./Day/Yr.)
9. PLACE UNITED STATES VISA WAS ISSUED
10. IMMIGRATION ALIEN NO.
01
01
2022
NEW YORK
N/A
11. IF CURRENCY OR MONETARY INSTRUMENT IS ACCOMPANIED BY A PERSON, COMPLETE 11a OR 11b. not both
A. EXPORTED FROM THE UNITED STATES
COMPLETE "A" OR "B" NOT BOTH
B. IMPORTED INTO THE UNITED STATES
Departed From: (U.S. Port/City in U.S.)
Arrived At (Foreign City/Country)
Departed From: (Foreign City/Country)
Arrived At: (City in U.S.)
12. IF CURRENCY OR MONETARY INSTRUMENT WAS MAILED OR OTHERWISE SHIPPED, COMPLETE 12a THROUGH 12f
12a. DATE SHIPPED (Mo/Day/Yr.)
12b. DATE RECEIVED (Mo/Day/Yr.)
12c. METHOD OF SHIPMENT (e.g. U.S. Mail, Public Carrier, etc.)
12d. NAME OF CARRIER
12e SHIPPED TO (Name and Address)
12f. RECEIVED FROM (Name and Address)
PART II INFORMATION ABOUT PERSON(S) OR BUSINESS ON WHOSE BEHALF IMPORTATION OR EXPORTATION WAS CONDUCTED
13. NAME (Last or family, first, and middle or Business Name)
JANE DOE
14. PERMANENT ADDRESS IN UNITED STATES OR ABROAD
456 N Ozark Rd.
15. TYPE OF BUSINESS ACTIVITY, OCCUPATION, OR PROFESSION
15a. IS THE BUSINESS A BANK?
PERSONAL
Yes
No
PART III CURRENCY AND MONETARY INSTRUMENT INFORMATION (SEE INSTRUCTIONS ON REVERSE)(To be completed by everyone)
17. IF OTHER THAN U.S. CURRENCY
16. TYPE AND AMOUNT OF CURRENCY/MONETARY INSTRUMENTS
IS INVOLVED, PLEASE COMPLETE
Currency and Coins
$ 100,000
BLOCKS AAND B.
A. Currency Name
Other Monetary Instruments
$
(Specify type, issuing entity and date, and serial or other identifying number.)
B. Country
(TOTAL)
$ 100,000
PARTIV
SIGNATURE OF PERSON COMPLETING THIS REPORT
Under penalties of perjury, I declare that have examined this report, and to the best of my knowledge and belief it is true, correct and complete.
18. NAME AND TITLE (Print)
19. SIGNATURE
20. DATE OF REPORT (Mo./Day/Yr.)
John Smith - Owner
J.South
08 15 2023
CUSTOMS AND BORDER PROTECTION USE ONLY
PORT CODE
CBP QUERY?
COUNT VERIFIED
VOLUNTARY
THIS SHIPMENT IS
INBOUND
OUTBOUND
REPORT
Yes
No
Yes
No
Yes
No
DATE
AIRLINE/FLIGHT/VESSEL
LICENSE PLATE
INSPECTOR (Name and Badge Number)
STATE/COUNTRY
NUMBER
FinCEN FORM 105
"""

In [None]:
prompt = f"""
Extract information from this image in JSON.  Be very detailed and accurate.  Ensure you extract all of the information.  If a field has no information, mark its value as UNKNOWN.

Here is the raw text from the document to improve the results:
<text>
{ocr_text}
</text>

Use the extract tool
"""

In [None]:
message_mm = [
    {
        "role": "user",
        "content": [
            {
                "text": prompt
            },
            {
                    "image": {
                        "format": 'png',
                        "source": {
                            "bytes": content_image
                        }
                    }
            }
        ]
    }
]

In [None]:
response = bedrock_client.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    inferenceConfig={
        "temperature": 1.0,
        "maxTokens": 2048
    },
    messages=message_mm,
    toolConfig={"tools": tools}
)

print(json.dumps(response['output']['message']['content'][0]['toolUse']['input'],indent=2))

In [None]:
response['usage']

In [None]:
print(f"Total Cost: ${response['usage']['inputTokens']*0.003/1000 + response['usage']['outputTokens']*0.015/1000}")
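
The cost arithmetic above can be wrapped in a small function so the per-token rates are explicit and easy to change. This is a sketch: `estimate_cost` is a hypothetical helper, its default rates are the per-1,000-token figures used in the cell above, and the usage numbers are invented for illustration.

```python
def estimate_cost(usage, input_rate_per_1k=0.003, output_rate_per_1k=0.015):
    """Estimate the request cost in dollars from a Converse usage dictionary."""
    input_cost = usage["inputTokens"] * input_rate_per_1k / 1000
    output_cost = usage["outputTokens"] * output_rate_per_1k / 1000
    return input_cost + output_cost


# Invented token counts standing in for response['usage'].
example_usage = {"inputTokens": 2000, "outputTokens": 1000, "totalTokens": 3000}
example_cost = estimate_cost(example_usage)
print(f"Total Cost: ${example_cost:.4f}")
```

In the notebook, `estimate_cost(response['usage'])` would give the same figure as the expression above.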

In [None]:
# For comparison: tables + forms + signatures + layout is $65.00 per 1,000 pages, or 6.5 cents per page.