LLMs are known to differ in response style and accuracy between text prompts and image creation or image analysis. Sometimes even very fundamental truths get stretched, so I am curious whether Anthropic (Claude 3.5 Haiku) and Gemini (1.5 Pro) describe a circle differently when prompted with text compared to an image.

I will start by prompting Anthropic and Gemini with text only, then upload an image of a circle and ask each model to describe it in the same way, and compare the two kinds of response.

This could reveal how these models respond to text versus visual prompting: text responses may be more detail- and information-oriented, while visual responses may be more aesthetically focused.
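The comparison described above amounts to a small grid of model by prompt type. A minimal sketch of that bookkeeping follows. The `run_grid` helper and the dummy model functions are hypothetical and only illustrate the layout; in this notebook the real API calls are made cell by cell in the sections below.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-in type: each callable takes a prompt condition
# ("text" or "image") and returns the model's reply as a string.
ModelFn = Callable[[str], str]


def run_grid(models: Dict[str, ModelFn],
             conditions=("text", "image")) -> Dict[Tuple[str, str], str]:
    """Query every model under every condition and collect the replies."""
    results = {}
    for name, ask in models.items():
        for condition in conditions:
            results[(name, condition)] = ask(condition)
    return results


# Dummy models so the sketch runs without API keys.
demo_models = {
    "anthropic": lambda condition: f"anthropic reply to {condition} prompt",
    "gemini": lambda condition: f"gemini reply to {condition} prompt",
}

for key, reply in run_grid(demo_models).items():
    print(key, "->", reply)
```

Keeping the replies keyed by `(model, condition)` makes it easy to compare text against image for one model, or one model against the other for the same prompt type.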

## Anthropic
How well can Anthropic describe shapes with no image prompting?

In [1]:
from dotenv import load_dotenv
from anthropic import Anthropic
import base64
import httpx
from IPython.display import display, Image, Markdown

# load api keys
_ = load_dotenv()

In [2]:
client = Anthropic()

In [3]:
for model in client.models.list().data:
    print(model.id)

claude-3-7-sonnet-20250219
claude-3-5-sonnet-20241022
claude-3-5-haiku-20241022
claude-3-5-sonnet-20240620
claude-3-haiku-20240307
claude-3-opus-20240229
claude-3-sonnet-20240229
claude-2.1
claude-2.0


In [4]:
OPUS = 'claude-3-opus-20240229'
SONNET = 'claude-3-5-sonnet-20241022'
HAIKU = 'claude-3-5-haiku-20241022'

In [5]:
message = client.messages.create(
    model=HAIKU,
    max_tokens=1000,
    temperature=0.0,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "Can you please describe, in as much detail as possible, what a circle is?"}
    ]
)

print(message.content[0].text)

A circle is a fundamental geometric shape defined as a set of points in a plane that are all equidistant from a central point, called the center. Here's a detailed description:

Geometric Properties:
- Perfectly round and symmetrical
- Two-dimensional shape
- Has no corners or edges
- Continuous and smooth curve

Key Components:
1. Center Point
- The fixed point from which all points on the circle's circumference are the same distance
- Located at the middle of the circle

2. Radius
- A line segment from the center to any point on the circumference
- All radii in a circle are exactly equal in length
- Determines the circle's size

3. Circumference
- The complete outer boundary of the circle
- Length calculated by the formula: 2πr (where r is the radius)
- Represents the total distance around the circle

4. Diameter
- A line segment passing through the center
- Connects two points on the circumference
- Always twice the length of the radius
- Splits the circle exactly in half

Mathemati

In [20]:
def encode_image(image_path):
    """Read an image file and return its contents as a base64-encoded string."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

circle = encode_image('circle.png')
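The next cell hardcodes the media type as `'image/png'`. If other image formats were tried, the media type could be guessed from the file name instead. The sketch below uses a hypothetical helper, `image_block`, which is not part of the Anthropic SDK; it only builds a dictionary in the same shape as the image block used in the following request.

```python
import base64
import mimetypes


def image_block(image_path):
    """Build a base64 image content block, guessing the media type from the file name."""
    media_type, _ = mimetypes.guess_type(image_path)
    if media_type is None or not media_type.startswith("image/"):
        raise ValueError(f"Cannot infer an image media type for {image_path!r}")
    with open(image_path, "rb") as image_file:
        data = base64.b64encode(image_file.read()).decode("utf-8")
    # Same dictionary layout as the image block in the request below.
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": data},
    }
```

With this helper, `image_block('circle.png')` could replace the handwritten image dictionary in the `messages` list.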

In [21]:
message = client.messages.create(
    model=HAIKU,
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": 'image/png',
                        "data": circle,
                    },
                },
                {
                    "type": "text",
                    "text": "Can you please describe what the image is in as much detail as possible?"
                }
            ],
        }
    ],
)
print(message)

Message(id='msg_01DYjQ9m9qw7J2xadWLazebi', content=[TextBlock(citations=None, text='The image shows a simple, perfect circle drawn with a black outline on a white background. The circle is symmetrical, with a uniform line weight and crisp, clean edges. Its diameter appears to be consistent throughout, creating a precise geometric shape. There are no additional details, patterns, or markings within the circle - it is a basic, minimalist representation of a circle.', type='text')], model='claude-3-5-haiku-20241022', role='assistant', stop_reason='end_turn', stop_sequence=None, type='message', usage=Usage(cache_creation_input_tokens=0, cache_read_input_tokens=0, input_tokens=126, output_tokens=80))
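Printing the whole `Message` object shows the metadata (token usage, stop reason) but makes the reply itself harder to read. A small sketch for pulling out only the text follows. `message_text` is a hypothetical helper, and the `SimpleNamespace` object is a stand-in with the same `content` / `type` / `text` shape as the output above, so the example runs without an API call.

```python
from types import SimpleNamespace


def message_text(message):
    """Join the text of every text block in a message's content list."""
    return "".join(block.text for block in message.content if block.type == "text")


# Stand-in for a real response object (assumption: same attribute names
# as the Message printed above).
fake_message = SimpleNamespace(
    content=[SimpleNamespace(type="text", text="example reply")]
)
print(message_text(fake_message))
```

This gives the same result as the `message.content[0].text` used in the text-prompt cell when there is a single text block, and also copes with replies split across several blocks.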


## Gemini
How well can Gemini describe shapes compared to Anthropic?

In [9]:
import google.generativeai as genai
from IPython.display import Image, display, Markdown
from dotenv import load_dotenv
import os
_ = load_dotenv()

In [10]:
genai.configure()

In [12]:
model = genai.GenerativeModel(model_name="models/gemini-1.5-pro-latest")

response = model.generate_content("Can you please describe, in as much detail as possible, what a circle is?")

print(response.text)

A circle is a fundamental geometric shape defined as follows:

**Formal Definition:**

A circle is the set of all points in a plane that are equidistant from a given point called the *center*.  The constant distance between the center and any point on the circle is called the *radius*.

**Key Properties and Features:**

* **Center:** The fixed point in the plane from which all points on the circle are equidistant.
* **Radius (r):**  The distance between the center and any point on the circle.  It's a crucial defining characteristic.  All radii within a given circle are of equal length.
* **Diameter (d):** A line segment that passes through the center and whose endpoints lie on the circle.  The diameter is twice the length of the radius (d = 2r).
* **Circumference (C):** The distance around the circle.  It's calculated using the formula C = 2πr or C = πd, where π (pi) is a mathematical constant approximately equal to 3.14159.
* **Area (A):** The region enclosed by the circle.  It's calc

In [16]:
sample_file = genai.upload_file(path="circle.png",
                                display_name="circle")

print(f"Uploaded file '{sample_file.display_name}' as: {sample_file.uri}")

Uploaded file 'circle' as: https://generativelanguage.googleapis.com/v1beta/files/fmd3frbfwft5


In [18]:
model = genai.GenerativeModel(model_name="models/gemini-1.5-pro-latest")

response = model.generate_content(["Can you please describe what the image is in as much detail as possible?", sample_file])

print(response.text)

The image is a simple, unfilled black circle on a white background. The circle's line is of uniform thickness and appears to be perfectly smooth, with no breaks or variations in its width.  There are no other elements or markings within or outside the circle.  The background is a solid, untextured white.


## Discussion
Anthropic and Gemini behave very similarly to each other: for both, the tone of the response depends on whether the prompt was text or an image.

### Text Prompting
For both LLMs, the text responses describing a circle were highly mathematical, covering how to identify circles, how to construct them, and how they are used in related fields such as physics.

I judged these responses to be accurate and strongly educational; they even taught me a few new things about the properties of a circle.

### Image Prompting
In contrast, when presented with a standard image of a circle from Wikipedia, both LLMs took a significantly more relaxed approach. Both the text and image prompts included the words "in as much detail as possible", yet the text responses ran to several paragraphs while the image responses were only about four sentences long.

The image responses covered line thickness and color, but said little else about the shape's properties or applications beyond naming it as a circle.
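The length difference can be made concrete by counting sentences and words in each reply. The sketch below does this for the two image replies, copied from the outputs above. The text replies are truncated in the notebook output, so they are left out. The sentence splitter is a rough regex, not a proper tokenizer, and `length_stats` is a name introduced here for illustration.

```python
import re

# Image-prompt replies copied from the outputs above.
image_replies = {
    "anthropic": (
        "The image shows a simple, perfect circle drawn with a black outline on a "
        "white background. The circle is symmetrical, with a uniform line weight "
        "and crisp, clean edges. Its diameter appears to be consistent throughout, "
        "creating a precise geometric shape. There are no additional details, "
        "patterns, or markings within the circle - it is a basic, minimalist "
        "representation of a circle."
    ),
    "gemini": (
        "The image is a simple, unfilled black circle on a white background. The "
        "circle's line is of uniform thickness and appears to be perfectly smooth, "
        "with no breaks or variations in its width. There are no other elements or "
        "markings within or outside the circle. The background is a solid, "
        "untextured white."
    ),
}


def length_stats(text):
    """Return a rough (sentence_count, word_count) pair for a reply."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return len(sentences), len(text.split())


for name, reply in image_replies.items():
    sentences, words = length_stats(reply)
    print(f"{name}: {sentences} sentences, {words} words")
```

Running the same function on the full text replies, once captured without truncation, would give a like-for-like comparison.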

## Conclusion
These LLMs' image identification seems much less adaptive than their text responses. With text, both models quickly moved past the basic idea of a circle and gave highly educated answers, whereas with an image they seemed to get stuck in the identification phase.

This is representative of many of the problems seen in LLM-based image identification and creation. Text-based responses are not immune to hallucinations or inconsistencies; however, the reasoning behind text-based responses seems more developed and accurate than that behind image-based responses.

I am curious to see whether newer models share this issue, or whether it remains a problem in the underlying theory of image identification and generation in LLMs. In principle, I believe these LLMs should be tuned to recognize some basic images, including a circle, so that when I present an image of a circle, they can describe its thickness and colors as well as its mathematical applications.