Vision Testing Multi-Modal Models

Ever wonder how well GPT-4V does at recalling text from an image?

Let's try it out.

Want to see an overview video? Check out this tweet.

git clone https://github.com/gkamradt/MultiModalVisionTesting

cd MultiModalVisionTesting

python3 -m venv venv

source venv/bin/activate

pip install -r requirements.txt

python3 main.py

This repo already contains all of the results and images from the test.

If you want to run a single test yourself, delete that test's result from the results file and then run the script.

Or, if you want to start completely from scratch, uncomment vtt.reset_results() in main.py.
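As a rough illustration of the "delete one result, then rerun" step, here is a minimal sketch. It assumes the results file is a JSON list of objects, each with a "test_id" field. Both the layout and the field name are hypothetical, and so is the remove_result helper, which is not part of this repo. Check the actual file in the results folder and adapt accordingly.

```python
import json
from pathlib import Path


def remove_result(results_path, test_id):
    """Drop every entry whose "test_id" matches, then rewrite the file.

    Hypothetical helper: assumes the file holds a JSON list of dicts
    and that each dict identifies its test with a "test_id" key.
    Returns the number of entries removed.
    """
    path = Path(results_path)
    results = json.loads(path.read_text())
    # Keep everything except the entry (or entries) we want to rerun.
    kept = [r for r in results if r.get("test_id") != test_id]
    path.write_text(json.dumps(kept, indent=2))
    return len(results) - len(kept)
```

After removing an entry this way, running python3 main.py should pick that test up again, provided the script skips tests that already have a saved result, as the instructions above imply.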

Shoutout to Bryan Bischof for his conversations and feedback on this project.

Note: This repo will likely move over to the Needle/Haystack repository as we build out more tests there.

Contributions are welcome, but it would be higher leverage to implement this in the Needle/Haystack repository first.

LICENSE: Feel free to do whatever you want with the code. Attribution would be appreciated.

Made with ❤️ by Greg Kamradt
