## Multimodal Requests & Streaming

*(Coding along with deeplearning.ai's online course [Building toward Computer Use with Anthropic - Learn how an AI Assistant is built to use and accomplish tasks on computers](https://learn.deeplearning.ai/courses/building-toward-computer-use-with-anthropic/lesson/1/introduction) taught by Colt Steele)*

In [1]:
# https://learn.deeplearning.ai/courses/building-toward-computer-use-with-anthropic/lesson/3/working-with-the-api
from anthropic import Anthropic
import pandas as pd

anthropic_api_key = pd.read_csv("~/tmp/anthropic/anthropic-key-1.txt", sep=" ", header=None)[0][0]
print("Don't be a fool and send your API key to GitHub")

client = Anthropic(api_key=anthropic_api_key)
MODEL_NAME = "claude-3-5-sonnet-20241022"

Don't be a fool and send your API key to GitHub
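
Reading the key through an environment variable is a common alternative to a key file. The sketch below is an assumption about setup, not what the course does: `load_api_key` is a hypothetical helper that prefers `ANTHROPIC_API_KEY` and falls back to the first token of a plain-text file.

```python
import os
from pathlib import Path


def load_api_key(key_file=None, env_var="ANTHROPIC_API_KEY"):
    """Return the API key from the environment, else from a text file.

    Hypothetical helper: the name and the fallback order are assumptions.
    """
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    if key_file is not None:
        # take the first whitespace-separated token of the file
        tokens = Path(key_file).expanduser().read_text().split()
        if tokens:
            return tokens[0]
    raise RuntimeError(f"no API key found in ${env_var} or {key_file}")
```

With this in place the client could be created as `Anthropic(api_key=load_api_key("~/tmp/anthropic/anthropic-key-1.txt"))`.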


### Content Blocks

In [2]:
# taking a closer look at the message structure
messages = [
    {
        "role": "user",
        # here we're setting content to a string which is a shortcut
        "content": "tell me a joke"
    }
]

response = client.messages.create(
    messages=messages,
    model=MODEL_NAME,
    max_tokens=200
)

print(response.content[0].text)

Here's a classic one:

Why don't scientists trust atoms?
Because they make up everything! 😄


In [3]:
messages = [
    {
        "role": "user",
        # here we're setting content to a list
        # having just one string like above is a shortcut
        "content": [
            # single content block
            {"type": "text", "text": "tell me a joke about crypto trading"},
        ]
    }
]

response = client.messages.create(
    messages=messages,
    model=MODEL_NAME,
    max_tokens=200
)

print(response.content[0].text)

Here's one:

Why did the crypto trader get kicked out of the gym?

Because he kept doing too many pump and dumps!


In [4]:
messages = [
    {
        "role": "user",
        # now we have a list of content blocks in a single message
        # the blocks are read in order as parts of the same user turn
        "content": [
            {"type": "text", "text": "who"},
            {"type": "text", "text": "made"},
            {"type": "text", "text": "you?"},
        ]
    }
]

response = client.messages.create(
    messages=messages,
    model=MODEL_NAME,
    max_tokens=200
)

print(response.content[0].text)

I'm Claude, an AI assistant created by Anthropic. I aim to be direct and honest about this.
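
The string form and the list form describe the same message. As a sketch of that equivalence (the helper name `as_content_blocks` is hypothetical, and it only mirrors the shortcut as the comments above describe it; it is not SDK code):

```python
def as_content_blocks(content):
    """Expand the string shortcut into an explicit list of content blocks."""
    if isinstance(content, str):
        return [{"type": "text", "text": content}]
    # already a list of blocks: return a shallow copy unchanged
    return list(content)


shortcut = {"role": "user", "content": "tell me a joke"}
explicit = {"role": "user", "content": as_content_blocks(shortcut["content"])}
print(explicit["content"])  # [{'type': 'text', 'text': 'tell me a joke'}]
```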


### Image Prompts

In [5]:
# from IPython.display import Image
# Image(filename='../assets/images/food.png') 

<img src="../assets/images/food.png" width="50%" />

#### __Image Messages__

In [6]:
import base64
# opens the image file in "read binary" mode
with open("../assets/images/food.png", "rb") as image_file:
    # reads the contents of the image as a bytes object
    binary_data = image_file.read() 
    # encodes the binary data using Base64 encoding
    base_64_encoded_data = base64.b64encode(binary_data) 
    # decodes base_64_encoded_data from bytes to a string
    base64_string = base_64_encoded_data.decode('utf-8')

In [7]:
base64_string[:100] # encoded image data

'iVBORw0KGgoAAAANSUhEUgAACIIAAAW8CAYAAABciLjCAAAMTGlDQ1BJQ0MgUHJvZmlsZQAASImVVwdYU1cbPndkQggQiICMsJcg'
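
The `iVBORw0KGgo` prefix is not random: it is the Base64 form of the 8-byte PNG file signature, so it shows up at the start of any Base64-encoded PNG. A small check using only the prefix printed above (the helper name `looks_like_png` is made up for this sketch):

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def looks_like_png(b64_text):
    """Decode the first 12 Base64 characters (9 bytes) and compare the signature."""
    head = base64.b64decode(b64_text[:12])
    return head[:8] == PNG_SIGNATURE


prefix = "iVBORw0KGgoAAAANSUhEUgAACIIAAAW8"
print(looks_like_png(prefix))  # True
```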

__How do we structure a message that contains an image?__

In [10]:
messages = [
    {
        "role": "user",
        "content": [{  
            # first item of content list
            "type": "image", # new type, above we just had 'text'
            "source": { # source key set to dictionary
                "type": "base64", # how the image is supplied: inline as base64 here (the API has since added other source types, e.g. URL)
                "media_type": "image/png", # the image media type
                "data": base64_string # the actual image data
            },
        },
        {
            # second content block
            "type": "text",
            "text": """How many to-go containers of each type 
            are in this image?"""
        }]
    }
]

In [11]:
response = client.messages.create(
    messages=messages,
    model=MODEL_NAME,
    max_tokens=200
)
print(response.content[0].text)

In this image, there are:
- 3 rectangular plastic containers with clear lids (appears to be standard takeout containers)
- 3 white paper/cardboard folded takeout boxes (often called "Chinese takeout boxes" or "oyster pails")

So there's a total of 6 containers shown on what appears to be a dark wooden table surface.


#### __Image Block Helper Function__

In [12]:
import base64
import mimetypes

def create_image_message(image_path):
    # open the image file in "read binary" mode
    with open(image_path, "rb") as image_file:
        # Read the contents of the image as a bytes object
        binary_data = image_file.read()
    # encode the binary data using Base64 encoding
    base64_encoded_data = base64.b64encode(binary_data)
    # decode base64_encoded_data from bytes to a string
    base64_string = base64_encoded_data.decode('utf-8')
    # get the MIME type of the image based on its file extension
    mime_type, _ = mimetypes.guess_type(image_path)
    # create the image block
    image_block = {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": mime_type,
            "data": base64_string
        }
    }
    return image_block
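
`mimetypes.guess_type` goes only by the file extension, so it can return `None` or a type the API does not take. The variant below is a sketch that rejects such files up front. The set of accepted types (JPEG, PNG, GIF, WebP) is an assumption based on the Anthropic vision documentation and should be checked against the current docs; `SUPPORTED_MEDIA_TYPES` and `create_checked_image_block` are names invented here.

```python
import base64
import mimetypes

# assumption: media types accepted for image blocks; verify against current docs
SUPPORTED_MEDIA_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}


def create_checked_image_block(image_path):
    """Build a base64 image block, failing early on unsupported file types."""
    media_type, _ = mimetypes.guess_type(image_path)
    if media_type not in SUPPORTED_MEDIA_TYPES:
        raise ValueError(
            f"unsupported or unknown media type for {image_path!r}: {media_type}"
        )
    with open(image_path, "rb") as image_file:
        data = base64.b64encode(image_file.read()).decode("utf-8")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": data},
    }
```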

<img src="../assets/images/plant.png" width="50%" />

In [14]:
messages = [
    {
        "role": "user",
        "content": [
            create_image_message("../assets/images/plant.png"),
            {"type": "text", "text": "What species is this?"}
        ]
    }
]

response = client.messages.create(
    model=MODEL_NAME,
    max_tokens=2048,
    messages=messages
)
print(response.content[0].text)

This appears to be a Nepenthes pitcher plant, which is a type of carnivorous plant. The image shows the distinctive pitcher-shaped trap that these plants use to catch and digest insects. The pitcher has a yellowish-green coloration with reddish striping, which is typical of many Nepenthes species. Without more detailed views of the plant's overall structure and growing conditions, it would be difficult to identify the exact species, as there are over 170 known species of Nepenthes. These plants are native to Southeast Asia and are also known as tropical pitcher plants or monkey cups.


#### __A Real World Use Case__

<img src="../assets/images/invoice.png" width="50%" />

In [17]:
messages = [
    {
        "role": "user",
        "content": [
            create_image_message("../assets/images/invoice.png"),
            {"type": "text", "text": """
                Generate a JSON object representing the contents
                of this invoice.  It should include all dates,
                dollar amounts, and addresses. 
                Only respond with the JSON itself.
            """
            }
        ]
    }
]

response = client.messages.create(
    model=MODEL_NAME,
    max_tokens=2048,
    messages=messages
)
print(response.content[0].text)

{
  "invoice_number": "INV-2024-0042",
  "date": "2025-03-17",
  "due_date": "2025-04-16",
  "company": {
    "name": "ACME CORPORATION",
    "address": "123 Business Avenue",
    "city": "Silicon Valley",
    "state": "CA",
    "zip": "94025",
    "email": "accounts@acmecorp.com"
  },
  "bill_to": {
    "address": "789 Market Street, Suite 500",
    "city": "Los Angeles",
    "state": "CA",
    "zip": "90015"
  },
  "line_items": [
    {
      "description": "Enterprise Software License",
      "quantity": 1,
      "unit_price": 5000.00,
      "amount": 5000.00
    },
    {
      "description": "Implementation Services",
      "quantity": 40,
      "unit_price": 150.00,
      "amount": 6000.00
    },
    {
      "description": "Premium Support Plan (Annual)",
      "quantity": 1,
      "unit_price": 2500.00,
      "amount": 2500.00
    }
  ],
  "subtotal": 13500.00,
  "tax_rate": 8.5,
  "tax_amount": 1147.50,
  "total": 14647.50,
  "payment_terms": "Net 30",
  "late_fee_rate": 1.5
}
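
Because the prompt asks for JSON only, the reply can be handed to `json.loads` and sanity-checked. The sketch below is hypothetical post-processing, not part of the course notebook: `parse_json_reply` also strips a Markdown code fence in case the model adds one, and the checks recompute the subtotal and total. The sample text is an abbreviated copy of the output above.

```python
import json

FENCE = "`" * 3  # Markdown code fence marker


def parse_json_reply(text):
    """Parse a model reply expected to be JSON, tolerating a surrounding fence."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # drop the opening fence line (with optional language tag) and the closing fence
        _, _, cleaned = cleaned.partition("\n")
        cleaned = cleaned.rsplit(FENCE, 1)[0]
    return json.loads(cleaned)


reply = """
{
  "line_items": [
    {"description": "Enterprise Software License", "amount": 5000.00},
    {"description": "Implementation Services", "amount": 6000.00},
    {"description": "Premium Support Plan (Annual)", "amount": 2500.00}
  ],
  "subtotal": 13500.00,
  "tax_rate": 8.5,
  "tax_amount": 1147.50,
  "total": 14647.50
}
"""

invoice = parse_json_reply(reply)
line_total = sum(item["amount"] for item in invoice["line_items"])
print(line_total == invoice["subtotal"])  # True
print(round(invoice["subtotal"] + invoice["tax_amount"], 2) == invoice["total"])  # True
```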


### Streaming

In [18]:
response = client.messages.create(
    max_tokens=1024,
    messages=[{"role": "user", "content": "write a poem"}],
    model=MODEL_NAME,
)

# without streaming, nothing comes back until the entire response has been generated
print(response.content[0].text)

Here's an original poem for you:

"Whispers of Dawn"

Morning light breaks through silver clouds,
As dewdrops dance on emerald leaves,
The world awakens from its shroud,
While gentle winds whisper through trees.

Birds paint melodies in the air,
Their songs a gift to greet the day,
Nature's symphony beyond compare,
Chasing shadows of night away.

In this moment, time stands still,
As beauty unfolds, pure and bright,
Promise lingers on the hill,
Where darkness yields to golden light.


In [24]:
# with streaming, text arrives as it is generated
# this shortens the wait for the first words; the total generation time stays the same
with client.messages.stream(
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Write a poem about trees"},
        # prefilled assistant turn: the model continues from this text
        {"role": "assistant", "content": "Tangier"}
    ],
    model=MODEL_NAME,
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

 Island, located in Chesapeake Bay,
[…]y sinking into the waves day by day.
[…] lived there for hundreds of years,
[…] watermen, crabbers - now facing their fears.

[…] seas threaten this unique way of life,
[…] storms grow stronger, bringing more strife.
[…] distinct dialect, a piece of history so rare,
[…] soon be lost to the sea's relentless stare.

[…]ing to their ancestral home,
[…]es erode and waters still roam.
[…], while others must leave,
[…]erished island continues to grieve.

[…] this vanishing land,
[…]nity built on water and sand?
[…] one we all should know,
[…] climate change and the tides' ebb and flow.

[…] the physical disappearance of Tangier Island due to sea level rise and erosion, as well as the cultural loss that accompanies it. The island's unique community and dialect represent a living connection to early American history that is at risk of being lost.]
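
`text_stream` yields only newly generated text, which is why the output starts at " Island" and the prefilled `Tangier` is not echoed. If the full text is wanted, the prefill has to be prepended by hand. A sketch of collecting the chunks while printing them; `fake_stream` is a stand-in generator so the example runs without an API call, and `collect_stream` is a hypothetical helper:

```python
def collect_stream(chunks, prefill=""):
    """Print chunks as they arrive and return prefill plus the full streamed text."""
    parts = [prefill]
    for text in chunks:
        print(text, end="", flush=True)
        parts.append(text)
    print()  # finish the line once the stream ends
    return "".join(parts)


def fake_stream():
    # stand-in for stream.text_stream; a real stream yields chunks of varying size
    yield from [" Island,", " located in", " Chesapeake Bay,"]


full_text = collect_stream(fake_stream(), prefill="Tangier")
print(full_text)  # Tangier Island, located in Chesapeake Bay,
```

With the real client, `stream.text_stream` would take the place of `fake_stream()` inside the `with client.messages.stream(...)` block.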