<img src="https://drive.google.com/uc?id=1aDMlYVXlyWUCMcOXwtZPF77BXAhvZzwM" alt="Alt text" width="700"/>

In this notebook, we'll see how to generate a caption for an image using OpenAI models.

In [None]:
%%capture --no-stderr
%pip install --quiet -U openai gdown

In [None]:
import getpass
import os

def _set_env(var: str):
    if not os.environ.get(var):
        os.environ[var] = getpass.getpass(f"{var}: ")

_set_env("OPENAI_API_KEY")

First things first, let’s download a beautiful photo of Bengaluru.

In [None]:
import gdown

# Google Drive file ID
file_id = "1g8ybWMjFVrasbaXPgHG4hD9Er57pL6Cj"
url = f"https://drive.google.com/uc?id={file_id}"

# Download the image
gdown.download(url, output="image.png", quiet=False)

Now, load the image with Pillow and resize it before sending it to the LLM.

> This step matters! Image inputs are billed in tokens, and larger images generally cost more, so sending full-resolution pictures can increase your token usage substantially.
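
To get a feel for why resizing matters, here is a rough sketch of a token estimate. It assumes the tile-based rule described in OpenAI's vision guide for `gpt-4o`-style models: fit the image within 2048×2048, scale the shortest side down to 768 px, then charge 170 tokens per 512 px tile plus a flat 85. The function name `estimate_image_tokens` is hypothetical, and the "downscale only, never upscale" behaviour is an assumption, so treat the numbers as approximate and check the `usage` field of a real response for actual billing.

```python
import math


def estimate_image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Rough token estimate for one image (assumed tile-based pricing rule)."""
    if detail == "low":
        # Low-detail images are assumed to cost a flat base amount.
        return 85

    # Step 1: fit within a 2048 x 2048 square, keeping the aspect ratio.
    scale = min(1.0, 2048 / max(width, height))
    w, h = round(width * scale), round(height * scale)

    # Step 2: shrink so the shortest side is at most 768 px
    # (assumption: smaller images are not upscaled).
    scale = min(1.0, 768 / min(w, h))
    w, h = round(w * scale), round(h * scale)

    # Step 3: count 512 px tiles; 170 tokens each, plus 85 base tokens.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles


print(estimate_image_tokens(2048, 4096))  # large photo
print(estimate_image_tokens(512, 512))    # resized image
```

Under these assumptions, a 512×512 image fits in a single tile, while a 2048×4096 photo needs six.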

In [None]:
from PIL import Image

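# Open the image and resize it to 512x512.
# Note: a fixed (512, 512) target does not preserve the aspect ratio,
# so non-square photos will look stretched.
img = Image.open("image.png").convert("RGB")
img_resized = img.resize((512, 512))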

In [None]:
img_resized
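
Resizing to a fixed square distorts photos that are not square. If that matters for your use case, one option is to compute a target size that keeps the aspect ratio. The helper below is a minimal sketch; `fit_within` is a hypothetical name, not part of Pillow.

```python
def fit_within(width: int, height: int, max_side: int = 512) -> tuple[int, int]:
    """Return a size whose longest side is at most max_side, keeping the aspect ratio."""
    # Never upscale: cap the scale factor at 1.0.
    scale = min(1.0, max_side / max(width, height))
    # Guard against a dimension rounding down to zero for very thin images.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(1024, 512))  # landscape photo, scaled down
print(fit_within(300, 200))   # already small, left unchanged
```

With the image loaded above, this could be used as `img.resize(fit_within(*img.size))`. Pillow's `Image.thumbnail((512, 512))` applies the same idea in place.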

We can't send a local image file as-is. First, we need to encode its bytes as a base64 string so it can be embedded in the request.

In [None]:
import base64
from io import BytesIO

def encode_image(image: Image.Image) -> str:
    """Serialize the image to PNG in memory and return it as a base64 string."""
    buffered = BytesIO()
    image.save(buffered, format="PNG")
    return base64.b64encode(buffered.getvalue()).decode()


base64_image = encode_image(img_resized)
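
The base64 string is later wrapped in a data URL, and the MIME type in that URL should match the format the image was saved in (PNG here). The sketch below shows the idea on raw bytes; `to_data_url` is a hypothetical helper, and the format-to-MIME mapping only covers the two formats mentioned in this notebook.

```python
import base64

# Assumed mapping from Pillow-style format names to MIME types.
MIME_TYPES = {"PNG": "image/png", "JPEG": "image/jpeg"}


def to_data_url(raw: bytes, image_format: str = "PNG") -> str:
    """Wrap raw image bytes in a base64 data URL with a matching MIME type."""
    mime = MIME_TYPES[image_format.upper()]
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# Any bytes work for the demonstration; real usage would pass the PNG bytes.
url = to_data_url(b"hello", "PNG")
print(url)

# The payload decodes back to the original bytes.
payload = url.split(",", 1)[1]
assert base64.b64decode(payload) == b"hello"
```

Keep in mind that base64 output is roughly one third larger than the raw bytes, which is one more reason to resize before encoding.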

Finally, let's call the model.

In [None]:
from openai import OpenAI

client = OpenAI()

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe what you see in the picture",
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{base64_image}",
                    },
                },
            ],
        }
    ],
    model="gpt-4o",
)
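
The message structure above can be factored into a small builder, which also makes it easy to set the optional `detail` field on the image part (`"low"`, `"high"`, or `"auto"`, as described in OpenAI's vision guide). This is a sketch only; `build_user_message` is a hypothetical helper, not part of the `openai` package.

```python
ALLOWED_DETAIL = {"low", "high", "auto"}


def build_user_message(prompt: str, data_url: str, detail: str = "auto") -> dict:
    """Build a user message with one text part and one image part."""
    if detail not in ALLOWED_DETAIL:
        raise ValueError(f"detail must be one of {sorted(ALLOWED_DETAIL)}")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                # "detail" hints at how much resolution the model should process.
                "image_url": {"url": data_url, "detail": detail},
            },
        ],
    }


message = build_user_message(
    "Describe what you see in the picture",
    "data:image/png;base64,aGVsbG8=",  # placeholder payload for illustration
    detail="low",
)
print(message["content"][1]["image_url"]["detail"])
```

The result can be passed as `messages=[message]` to `client.chat.completions.create`.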

In [None]:
print(chat_completion.choices[0].message.content)
