## GPT-4V

- link: https://platform.openai.com/docs/guides/vision
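- formats: PNG (.png), JPEG (.jpeg/.jpg), WEBP (.webp), non-animated GIF (.gif)
- token cost per image (chosen per image with the `detail` parameter: `low`, `high`, or `auto`)
    - high: first scale the image to fit within a 2048 x 2048 px square (keeping the aspect ratio), then scale it so the shortest side is 768px; count the 512px tiles needed to cover the result; each tile costs 170 tokens, plus a flat 85 tokens
    - low: a flat 85 tokens; the model receives a low-resolution 512 x 512 px version of the image
- notes: image metadata is not supported (the model does not read it)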
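The high-detail rule above is plain arithmetic, so it can be turned into a small estimator. This is a sketch of the documented tiling rule, not an official API: the name `image_tokens` is hypothetical, `auto` detail is not modelled, and it assumes images are only ever downscaled (a small image is not enlarged to reach 768px), which the rule as written leaves open.

```python
import math


def image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Estimate GPT-4V image tokens (hypothetical helper, sketch only)."""
    if detail == "low":
        return 85  # flat cost regardless of image size
    if detail != "high":
        raise ValueError("detail must be 'low' or 'high'")

    w, h = float(width), float(height)

    # Step 1: fit within a 2048 x 2048 square, keeping the aspect ratio.
    # Assumption: only downscale, never upscale.
    scale = min(1.0, 2048 / max(w, h))
    w, h = w * scale, h * scale

    # Step 2: scale so the shortest side is 768px (again, downscale only).
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale

    # Step 3: count 512px tiles covering the image; 170 tokens each, plus 85.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 170 * tiles + 85
```

For example, a 1024 x 1024 image at high detail becomes 768 x 768, which needs 2 x 2 tiles: 4 x 170 + 85 = 765 tokens. A 2048 x 4096 image becomes 1024 x 2048 and then 768 x 1536, i.e. 2 x 3 tiles and 1105 tokens.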


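```python
# Setup: the API key and (optional) organization are read from a .env file
import base64
import urllib.request
from os import getenv

import requests
from dotenv import load_dotenv
from openai import OpenAI
from PIL import Image

load_dotenv()  # load variables from .env into the process environment
OPENAI_KEY = getenv("OPENAI_KEY")
OPENAI_ORG = getenv("OPENAI_ORG")  # may be None if no organization is set
```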

### 1. Image URL

```python
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
client = OpenAI(
    api_key=OPENAI_KEY,
    organization=OPENAI_ORG,
)
response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    # The image reference is an object with a "url" key
                    "image_url": {"url": image_url},
                },
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0].message.content)

# Save a local copy of the image for the base64 example below
urllib.request.urlretrieve(image_url, "example.jpg")
# img = Image.open("example.jpg")
# img
```

```text
The image displays a serene landscape featuring a wooden boardwalk extending through a lush green meadow. The meadow is surrounded by diverse vegetation, possibly indicating a natural, possibly wetland, environment. The sky above is partly cloudy, suggesting a fair weather day with ample sunlight enhancing the vibrancy of the scene.

('example.jpg', <http.client.HTTPMessage at 0x7f2e56901910>)
```
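The per-image `detail` field selects between the low and high token costs listed at the top (`auto`, the default, lets the API decide). A minimal sketch of a helper that builds one user message with that field set; `vision_message` is a hypothetical name, and the layout follows the object form `{"url": ..., "detail": ...}` used for `image_url`.

```python
def vision_message(prompt: str, image_url: str, detail: str = "auto") -> dict:
    """Build a user message pairing a text prompt with one image (sketch)."""
    if detail not in {"low", "high", "auto"}:
        raise ValueError("detail must be 'low', 'high' or 'auto'")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            # "detail" controls how the image is resized and billed
            {"type": "image_url", "image_url": {"url": image_url, "detail": detail}},
        ],
    }
```

It drops into the call above as `messages=[vision_message("What's in this image?", image_url, detail="low")]`, which should bill the image at the flat 85-token rate.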

### 2. Base64 Image

```python
# Encode a local image file as a base64 string
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


# Image downloaded in the previous cell
image_path = "example.jpg"
base64_image = encode_image(image_path)

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {OPENAI_KEY}",
}
if OPENAI_ORG:
    # Only send the organization header when one is configured
    headers["OpenAI-Organization"] = OPENAI_ORG

payload = {
    "model": "gpt-4-vision-preview",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    # Embed the image as a data URL; the MIME type must match the file
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
            ],
        }
    ],
    "max_tokens": 300,
}

response = requests.post(
    "https://api.openai.com/v1/chat/completions", headers=headers, json=payload
)
response.raise_for_status()  # surface HTTP errors instead of a KeyError below
print(response.json()["choices"][0]["message"]["content"])
```

```text
This image shows a beautiful natural landscape featuring a wooden boardwalk extending through a lush green field with tall grasses. The sky is mostly clear with a few fluffy clouds, and it appears to be a sunny day. The scene looks peaceful and inviting, perfect for a nature walk or for simply enjoying the outdoors. The boardwalk provides a path that encourages exploration while protecting the natural vegetation. Trees and bushes are visible in the distance, adding to the diversity of the greenery.
```
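The base64 cell hard-codes `image/jpeg` in the data URL, which is only right for JPEG files. A sketch of a hypothetical `to_data_url` helper that picks the MIME type from the file extension, restricted to the formats listed at the top of this section, and rejects anything else. It trusts the extension and does not inspect the file contents, which is an assumption worth keeping in mind.

```python
import base64
from pathlib import Path

# Supported formats from the list at the top, mapped to their MIME types
SUPPORTED_MIME = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
    ".gif": "image/gif",
}


def to_data_url(image_path: str) -> str:
    """Return a base64 data URL for a supported image file (sketch)."""
    suffix = Path(image_path).suffix.lower()
    mime = SUPPORTED_MIME.get(suffix)
    if mime is None:
        raise ValueError(f"unsupported image format: {suffix!r}")
    encoded = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

In the payload above, `{"type": "image_url", "image_url": {"url": to_data_url("example.jpg")}}` would then replace the hand-built string.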
