## Do VLMs reason with math?

```python
import os
import time

import PIL.Image
import google.generativeai as genai
```



### Test variables

```python
## Configure the API client (expects GOOGLE_API_KEY in the environment)
api_key = os.environ.get("GOOGLE_API_KEY")
if api_key is None:
    raise RuntimeError("Set the GOOGLE_API_KEY environment variable first.")
genai.configure(api_key=api_key)

## Select model
model_name = "gemini-1.5-flash"
# model_name = "gemini-1.5-pro"
# model_name = "gemini-2.0-flash-exp"
model = genai.GenerativeModel(model_name=model_name)

## Choose images (expected result in each comment)
image_1_path = "./op1.png"  # 12 + 335 = 347
image_2_path = "./op2.png"  # 3 + 14 - 25 = -8
image_3_path = "./op3.png"  # 55 * 14 = 770

## Load images
image_1 = PIL.Image.open(image_1_path)
image_2 = PIL.Image.open(image_2_path)
image_3 = PIL.Image.open(image_3_path)

## Select prompts
prompt_1 = "Solve the mathematical operation in the image"
prompt_2 = "This image contains numbers and one mathematical operation symbol. Please, describe the equation in the image and every step along the way to solve it."
prompt_3 = "Solve the mathematical operation in the image. Provide only the final result."
```
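Every request cell below repeats the same `t1 = time.time()` / `t2 = time.time()` pattern. A minimal sketch of a reusable timer follows; the name `timed_call` is hypothetical, and it uses `time.perf_counter`, a monotonic clock intended for measuring intervals, rather than `time.time` as the cells do.

```python
import time
from typing import Any, Callable, Tuple


def timed_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds).

    Hypothetical helper: time.perf_counter is monotonic, so the elapsed
    time cannot go negative if the system clock is adjusted mid-call.
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed
```

With the `model` and prompts defined above, a cell could become `response, elapsed = timed_call(model.generate_content, [prompt_1, image_1])`, followed by `print(elapsed)` and `print(response.text)`.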

#### Image 1: 12 + 335 = 347

```python
t1 = time.time()
response = model.generate_content([prompt_1, image_1])
t2 = time.time()
print(t2-t1)
print(response.text)
```

```text
1.3748745918273926
Here's the solution:

12 + 335 = 347
```


#### Image 2: 3 + 14 - 25 = -8

```python
t1 = time.time()
response = model.generate_content([prompt_1, image_2])
t2 = time.time()
print(t2-t1)
print(response.text)
```

```text
1.3265769481658936
Here's how to solve the equation 3 + 14 - 25:

1. **Addition:** 3 + 14 = 17

2. **Subtraction:** 17 - 25 = -8

Therefore, the answer is $\boxed{-8}$
```
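The response evaluates the expression strictly left to right, which is correct because addition and subtraction share the same precedence and are left-associative. A minimal sketch of that procedure, assuming the expression contains only non-negative integers joined by `+` and `-` (the function name `eval_left_to_right` is hypothetical):

```python
import re
from typing import List, Tuple


def eval_left_to_right(expr: str) -> Tuple[int, List[str]]:
    """Evaluate integers joined by '+' and '-', strictly left to right.

    Assumes non-negative integer operands and optional spaces.
    Returns the final value and one string per intermediate step.
    """
    tokens = re.findall(r"\d+|[+-]", expr)
    value = int(tokens[0])
    steps = []
    # Operators sit at odd indices, operands at the following even indices.
    for op, num in zip(tokens[1::2], tokens[2::2]):
        operand = int(num)
        new_value = value + operand if op == "+" else value - operand
        steps.append(f"{value} {op} {operand} = {new_value}")
        value = new_value
    return value, steps


value, steps = eval_left_to_right("3 + 14 - 25")
for step in steps:
    print(step)  # "3 + 14 = 17", then "17 - 25 = -8"
print(value)     # -8
```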


#### Image 3: 55 * 14 = 770

```python
t1 = time.time()
response = model.generate_content([prompt_1, image_3])
t2 = time.time()
print(t2-t1)
print(response.text)
```

```text
0.8219888210296631
55 x 14 = 770
```
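The three replies above are easy to check by eye, but scoring more images calls for an automatic check. A minimal sketch, assuming the last integer in the response text is the model's final answer (a heuristic, not something the API guarantees; `last_integer` is a hypothetical name). The reply strings are excerpts copied from the outputs above.

```python
import re
from typing import Optional


def last_integer(text: str) -> Optional[int]:
    """Return the last integer (optionally negative) in text, or None.

    A '-' counts as a sign only when it sits directly before the digits,
    so '17 - 25' yields 17 and 25 (not -25), and '17 - 25 = -8' ends in -8.
    Caveat: unspaced text such as '3+14-25' would yield -25 as a match.
    """
    matches = re.findall(r"-?\d+", text)
    return int(matches[-1]) if matches else None


responses = {
    "image_1": "Here's the solution:\n\n12 + 335 = 347",
    "image_2": r"Therefore, the answer is $\boxed{-8}$",
    "image_3": "55 x 14 = 770",
}
expected = {"image_1": 347, "image_2": -8, "image_3": 770}

for key, text in responses.items():
    print(key, last_integer(text) == expected[key])
```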


#### Image 1: Refined prompt (step by step)

t1 = time.time()
response = model.generate_content([prompt_2, image_1])
t2 = time.time()
print(t2-t1)
print(response.text)

```text
3.4331560134887695
The image shows the equation 12 + 335.  Here's how to solve it:

**1. Add the ones column:** 2 + 5 = 7

**2. Add the tens column:** 1 + 3 = 4

**3. Add the hundreds column:**  3 (there's only a 3 in the hundreds column)

**4. Combine the results:** The sum of the ones, tens, and hundreds columns gives us the final answer.

Therefore, 12 + 335 = 347
```
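The response adds column by column from the ones place. Its steps never mention carries, which is harmless here because no column sum reaches 10. A minimal sketch of column addition that also propagates carries, assuming non-negative integer operands (`column_add` is a hypothetical name):

```python
from typing import List, Tuple


def column_add(a: int, b: int) -> Tuple[int, List[Tuple[int, int, int, int]]]:
    """Add two non-negative integers column by column, ones place first.

    Returns the sum and one (digit_a, digit_b, carry_in, written_digit)
    tuple per column.
    """
    columns = []
    result = 0
    place = 1
    carry = 0
    while a > 0 or b > 0 or carry > 0:
        da, db = a % 10, b % 10
        column_sum = da + db + carry
        written = column_sum % 10
        columns.append((da, db, carry, written))
        result += written * place
        carry = column_sum // 10  # carried into the next column
        a, b = a // 10, b // 10
        place *= 10
    return result, columns


total, columns = column_add(12, 335)
print(total)    # 347
print(columns)  # [(2, 5, 0, 7), (1, 3, 0, 4), (0, 3, 0, 3)]
```

The three tuples match the ones, tens, and hundreds steps in the response; a case such as `column_add(95, 7)` exercises the carry path and returns 102.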


#### Image 2: Refined prompt (step by step)

```python
t1 = time.time()
response = model.generate_content([prompt_2, image_2])
t2 = time.time()
print(t2-t1)
print(response.text)
```

```text
1.396599531173706
The image contains the equation 3 + 14 - 25.  Here's how to solve it step-by-step:

1. **Addition:** First, perform the addition operation: 3 + 14 = 17

2. **Subtraction:** Next, perform the subtraction operation: 17 - 25 = -8

Therefore, the solution to the equation 3 + 14 - 25 is $\boxed{-8}$
```


#### Image 3: Refined prompt (step by step)

```python
t1 = time.time()
response = model.generate_content([prompt_2, image_3])
t2 = time.time()
print(t2-t1)
print(response.text)
```

```text
4.252299785614014
The image shows the equation 55 x 14.  Here's how to solve it:

**Method 1: Standard Multiplication**

1. **Multiply 55 by 4 (the ones digit of 14):**
   55 x 4 = 220. Write down 220.

2. **Multiply 55 by 10 (the tens digit of 14):**
   55 x 10 = 550.  Write this below the 220, but shifted one place to the left (because it represents hundreds and thousands).

3. **Add the two results:**
   220
+ 550
-------
  770

Therefore, 55 x 14 = 770

**Method 2: Distributive Property**

This method breaks down the multiplication into smaller, easier steps:

1. **Rewrite 14 as 10 + 4:**  The equation becomes 55 x (10 + 4)

2. **Distribute the 55:** This means multiply 55 by both 10 and 4 separately: (55 x 10) + (55 x 4)

3. **Calculate each multiplication:**
   55 x 10 = 550
   55 x 4 = 220

4. **Add the results:** 550 + 220 = 770

Therefore, 55 x 14 = 770  using the distributive property.
```
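Both methods in the response rest on the same place-value decomposition, 55 × 14 = 55 × (10 + 4) = 550 + 220. (The tens digit of 14 is 1; its place value turns the partial product into 55 × 1 × 10 = 550.) A minimal sketch of long multiplication that builds one partial product per multiplier digit, assuming non-negative integers (`partial_products` is a hypothetical name):

```python
from typing import List, Tuple


def partial_products(multiplicand: int, multiplier: int) -> Tuple[int, List[int]]:
    """Long multiplication via place-value partial products.

    For each digit of the multiplier (ones digit first), compute
    multiplicand * digit * place_value, then sum the partial products.
    Assumes non-negative integers.
    """
    partials = []
    place = 1
    while multiplier > 0:
        digit = multiplier % 10
        partials.append(multiplicand * digit * place)
        multiplier //= 10
        place *= 10
    return sum(partials), partials


product, partials = partial_products(55, 14)
print(partials)  # [220, 550]
print(product)   # 770
```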


#### Last try: ask for just the answer

```python
t1 = time.time()
response = model.generate_content([prompt_3, image_1])
t2 = time.time()
print(t2-t1)
print(response.text)
```

```text
1.94980788230896
347
```


```python
t1 = time.time()
response = model.generate_content([prompt_3, image_2])
t2 = time.time()
print(t2-t1)
print(response.text)
```

```text
0.6368255615234375
-8
```


```python
t1 = time.time()
response = model.generate_content([prompt_3, image_3])
t2 = time.time()
print(t2-t1)
print(response.text)
```

```text
4.519131898880005
220
```
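Comparing the three final-answer-only replies with the expected values makes the failure explicit: the reply for image 3 is 220, not 770. 220 equals 55 × 4, the first partial product in the step-by-step response for image 3, though this single run cannot show why the model stopped there. A minimal sketch of the comparison, with the replies copied from the outputs above:

```python
# Replies to prompt_3 ("Provide only the final result"), copied from the outputs above.
final_only_replies = {"image_1": "347", "image_2": "-8", "image_3": "220"}

# Ground truth, taken from the comments on the image paths.
expected = {"image_1": 12 + 335, "image_2": 3 + 14 - 25, "image_3": 55 * 14}

for name, reply in final_only_replies.items():
    got = int(reply.strip())
    status = "correct" if got == expected[name] else "WRONG"
    print(f"{name}: got {got}, expected {expected[name]} -> {status}")

# Names of the images whose final-only answer is wrong.
wrong = [name for name, reply in final_only_replies.items()
         if int(reply.strip()) != expected[name]]
print(wrong)  # ['image_3']

# The wrong answer coincides with the ones-digit partial product of 55 x 14.
print(55 * 4 == 220)  # True
```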
