Ever wonder how well GPT-4V does at recalling text from an image?
Let's try it out.
Want to see an overview video? Check out this tweet.
git clone https://github.com/gkamradt/MultiModalVisionTesting
cd MultiModalVisionTesting
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python3 main.py
This repo already contains all of the results and images from the test.
If you want to rerun a single test yourself, delete its entry from the results file and then run the script.
Or, if you want to start completely from scratch, uncomment vtt.reset_results() in main.py.
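If you would rather not edit the results file by hand, the deletion step can be scripted. The sketch below is only an illustration: it assumes the results file is a JSON list of objects, which may not match this repo's actual format, and `drop_results` is a hypothetical helper that is not part of the codebase. Check the real file before using anything like it.

```python
import json
from pathlib import Path


def drop_results(results_path, should_drop):
    """Remove matching entries from a results file and return how many were removed.

    ASSUMPTION: the file holds a JSON list of objects. The repo's real
    results format may differ, so adapt the loading and saving as needed.
    """
    path = Path(results_path)
    results = json.loads(path.read_text())

    # Keep every entry the caller does not want to rerun.
    kept = [entry for entry in results if not should_drop(entry)]

    # Write the filtered list back so the script sees those tests as missing.
    path.write_text(json.dumps(kept, indent=2))
    return len(results) - len(kept)
```

For example, `drop_results("results.json", lambda r: r.get("image") == "example.png")` would remove entries for one image, where both the file name and the `image` key are placeholders. Running python3 main.py afterwards should then redo only the removed tests, as described above.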
Shoutout to Bryan Bischof for his conversations and feedback on this project.
Note: This repo will likely move over to the Needle/Haystack repository as we build out more tests there.
Contributions welcome, but help would be higher leverage by implementing this on Needle/Haystack first.
LICENSE: Feel free to do whatever you want with the code; attribution would be appreciated.
Made with ❤️ by Greg Kamradt