
gkamradt/MultiModalVisionTesting


Vision Testing Multi-Modal Models

Ever wonder how well GPT-4V does at recalling text from an image?

Let's try it out

Want to see an overview video? Check out this tweet.

git clone https://github.com/gkamradt/MultiModalVisionTesting

cd MultiModalVisionTesting

python3 -m venv venv

source venv/bin/activate

pip install -r requirements.txt

python3 main.py

results

This repo already contains all of the results and images from the test.

If you want to rerun a single test yourself, delete that test's entry from the results file and then run the script.

Or, if you want to start completely from scratch, uncomment vtt.reset_results() in main.py.
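The rerun behavior described above (tests that already have a saved result are skipped, so deleting an entry makes only that test run again) can be sketched as follows. This is a minimal illustration, not the repo's code: it assumes the results live in a JSON object keyed by test ID, and every name here (`load_results`, `delete_result`, `run_pending`, `clear_results`, `results.json`) is hypothetical. `clear_results` only mimics what vtt.reset_results() is described as doing.

```python
import json
import os
import tempfile

# Hypothetical layout: a JSON object mapping a test ID to its saved result.
# The repo's actual results file name and schema may differ.


def load_results(path):
    """Return saved results, or an empty dict if the file does not exist yet."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def save_results(path, results):
    with open(path, "w") as f:
        json.dump(results, f, indent=2)


def delete_result(path, test_id):
    """Drop one test's entry so the next run re-executes only that test."""
    results = load_results(path)
    results.pop(test_id, None)
    save_results(path, results)


def run_pending(path, test_ids, run_test):
    """Run only tests that have no saved result; return the IDs that ran."""
    results = load_results(path)
    ran = []
    for test_id in test_ids:
        if test_id in results:
            continue  # a result already exists, so skip this test
        results[test_id] = run_test(test_id)
        ran.append(test_id)
    save_results(path, results)
    return ran


def clear_results(path):
    """Hypothetical stand-in for a full reset: wipe every saved result."""
    save_results(path, {})


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "results.json")
        fake_test = lambda tid: {"recalled_text": f"output for {tid}"}
        print(run_pending(path, ["img_1", "img_2"], fake_test))  # both run
        print(run_pending(path, ["img_1", "img_2"], fake_test))  # nothing runs
        delete_result(path, "img_2")
        print(run_pending(path, ["img_1", "img_2"], fake_test))  # only img_2
```

Keying results by test ID makes "delete one entry" and "reset everything" the only two controls needed: the script itself decides what to run by checking which keys are missing.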

Shoutout to Bryan Bischof for his conversations and feedback on this project.

Note: This repo will likely move over to the Needle/Haystack repository as we build out more tests there.

Contributions are welcome, but help would be higher leverage if you implement this in the Needle/Haystack repository first.

LICENSE: Feel free to do whatever you want with the code, attribution would be appreciated.

Made with ❤️ by Greg Kamradt
