How well do the GPT-4V and Gemini Pro Vision models perform zero-shot Visual Question Answering (VQA) on Data Structures?
We create a standard, repeatable process for selecting and obtaining VQA tasks in accordance with the Computer Science Curricula 2013: Curriculum Guidelines for Undergraduate Degree Programs in Computer Science by The Joint Task Force on Computing Curricula Association for Computing Machinery (ACM) IEEE Computer Society.
The following instructions use a bash terminal and assume you have Python and Git installed on your machine.
-
Clone the repository
git clone https://github.com/gutbash/lmm-graph-tree-vqa.git cd lmm-graph-tree-vqa
-
Create a virtual environment
python -m venv .venv
-
Activate the virtual environment
Linux and macOS:
source .venv/bin/activate
Windows:
source .venv/Scripts/activate
-
Install the dependencies
pip install -r requirements.txt
-
Set Environment Variables
mv .env.example .env
Edit the
.env
file and set the environment variables.
Structures
At the core of the project are the data structures. These are the base structures that are used to generate images for the VQA tasks.
There are four base classes: BinaryTree
, BinarySearchTree
, DirectedGraph
, UndirectedGraph
.
You can generate an individual image directly from these classes, but it is not the conventional approach.
The following example generates an image of a binary tree:
from generation.structures.tree import BinaryTree
structure = BinaryTree()
structure.generate()
structure.fill()
structure.draw(save=True, path='test.png')
Generators
Generate an individual image.
The conventional way to generate an individual image is to use the Generator
.
The following example does the same as the previous example:
from generation.structures.tree import BinaryTree
from generation.generator import Generator
from pathlib import Path
import asyncio
generator = Generator()
async def run_generation():
generated = await generator.generate_structure(structure_class=BinaryTree)
filled = await generator.fill_structure(structure_instance=generated)
await generator.draw_structure(structure_instance=filled, save=True, save_path=Path('test/'), save_name='test.png')
asyncio.run(run_generation())
Batch Generators
Generate a batch of images.
Use the BatchGenerator
to create a batch of images. This will also link text and image prompts into the yaml
data.
The following example generates a batch of binary trees:
from generation.structures.tree import BinaryTree
from generation.generator import BatchGenerator
from pathlib import Path
import asyncio
batch_generator = BatchGenerator()
async def run_batch():
await batch_generator.generate_batch(
structure_class=BinaryTree,
type='bit',
yaml_name='binary_tree.yaml',
yaml_path=Path('data/'),
save_path=Path('images/binary_tree/'),
text_path=Path('text/'),
text_name='binary_tree_text.yaml',
)
asyncio.run(run_batch())
Messages
Build a template for a prompt for a model with a list of messages.
OpenAI can use the following message types for prompts: SystemMessage
, UserMessage
, and AssistantMessage
.
The following example creates a typical prompt for OpenAI:
from evaluation.models.messages.message import UserMessage, SystemMessage, AssistantMessage
from pathlib import Path
openai_messages = [
UserMessage(content="Answer this question: What is in this image?", images=[Path('test/test.png')]),
]
DeepMind can use the following message types for prompts: ImageMessage
and BaseMessage
.
The following example creates a typical prompt for DeepMind:
from evaluation.models.messages.message import ImageMessage, BaseMessage
from pathlib import Path
deepmind_messages = [
BaseMessage(content="Answer this question: What is in this image?"),
ImageMessage(image=Path('test/test.png')),
]
Message Keys
Insert text/image prompts from the
yaml
data into messages.
Keys are replaced with the yaml
data's text and image prompts during evaluation. Within a message, there are two keys that can be used within a string of a message's content or image:
{{content}}
for the text prompt{{image}}
for the image prompt
The following example shows the same message lists as the previous examples using message keys:
from evaluation.models.messages.message import UserMessage, SystemMessage, AssistantMessage, ImageMessage, BaseMessage
openai_messages = [
UserMessage(content="Answer this question: {{content}}", images=["{{image}}"]),
]
deepmind_messages = [
BaseMessage(content="Answer this question: {{content}}"),
ImageMessage(image="{{image}}"),
]
Models
Create instances of models for evaluation.
There are two models that can be created for evaluation: OpenAI
and DeepMind
.
The following example creates instances of both models:
from evaluation.models.openai import OpenAI
from evaluation.models.deepmind import DeepMind
from dotenv import load_dotenv
import asyncio
import os
load_dotenv()
openai_api_key = os.environ.get('OPENAI_API_KEY_DEV')
deepmind_api_key = os.environ.get('DEEPMIND_API_KEY_DEV')
openai = OpenAI(api_key=openai_api_key)
deepmind = DeepMind(api_key=deepmind_api_key)
You can directly run completions from these models given a list of messages:
from evaluation.models.messages.message import UserMessage, SystemMessage, ImageMessage, BaseMessage
openai_messages = [UserMessage(content="{{content}}", images=["{{image}}"])]
deepmind_messages = [BaseMessage(content="{{content}}"), ImageMessage(image="{{image}}")]
async def run_completions():
await openai.arun(messages=openai_messages)
await deepmind.arun(messages=deepmind_messages)
asyncio.run(run_completions())
Evaluation
Evaluate models on prompts once images are batch generated and automatically linked to the yaml
data with the Evaluator
.
The following example evaluates the OpenAI model on a batch of binary trees:
from evaluation.evaluator import Evaluator
from evaluation.models.openai import OpenAI
from evaluation.models.messages.message import UserMessage, SystemMessage, AssistantMessage
from pathlib import Path
from dotenv import load_dotenv
import asyncio
import os
load_dotenv()
openai_api_key = os.environ.get('OPENAI_API_KEY_DEV')
openai = OpenAI(
api_key=openai_api_key,
)
messages = [UserMessage(content="{{content}}", images=["{{image}}"])]
evaluator = Evaluator()
async def run_evaluation():
await evaluator.evaluate(
model=openai,
messages=messages,
yaml_path=Path('data/'),
yaml_name='binary_tree.yaml',
csv_path=Path('results/'),
csv_name='openai.csv',
repeats=3,
)
asyncio.run(run_evaluation())