GridGPT

GridGPT is a class you can use in your ChatGPT Vision API projects so that ChatGPT with vision can finally give you accurate coordinates when you ask it to locate specific objects. This enables things like letting ChatGPT Vision control your mouse through the GUI and click on things accurately. There is some extra work to do to add the translation layer from grid cell to actual button click, but that is pretty easy to solve.
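
The translation layer mentioned above could look roughly like the sketch below. It is not part of this repository: the function names are hypothetical, and it assumes cells are labelled spreadsheet-style (column letters plus a 1-based row number, so "A1" is the top-left cell), which may differ from the labels your grid actually uses.

```python
import re


def column_index(letters: str) -> int:
    """Convert spreadsheet-style column letters (A..Z, AA, AB, ...) to a 0-based index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def cell_center(label: str, cell_size: int, image_width: int, image_height: int) -> tuple[int, int]:
    """Return the pixel at the center of a grid cell such as "C2".

    Assumed labelling (hypothetical): column letters followed by a 1-based row number.
    Cells on the right and bottom edges may be narrower than cell_size, so the
    cell is clipped to the image before the center is computed.
    """
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", label.strip())
    if match is None:
        raise ValueError(f"Unrecognised cell label: {label!r}")

    col = column_index(match.group(1))
    row = int(match.group(2)) - 1
    left, top = col * cell_size, row * cell_size
    if row < 0 or left >= image_width or top >= image_height:
        raise ValueError(f"Cell {label!r} lies outside the image")

    right = min(left + cell_size, image_width)
    bottom = min(top + cell_size, image_height)
    return (left + right) // 2, (top + bottom) // 2
```

The returned point is in image coordinates. If the image is a full-resolution screenshot, it could be passed to a mouse-automation call such as `pyautogui.click(x, y)`; if the screenshot was resized first, the point would need to be scaled back to screen coordinates.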

#Example 1: Small image, 50-pixel cell size, asking for a single grid cell

#Example 2: Large 4K image, 100-pixel cell size, asking for a group of grid cells

#Example 3: Full prompt example, sent as the first message in a new chat, and ChatGPT with Vision nails it

How it works

At runtime it takes your image and uses Pillow to:

Lay down cells with a semi-transparent white background. You can modify the intensity of this in the code.
Draw the grid according to the cell size. The cells should cover the entire picture even if the image dimensions are not evenly divisible by the cell size.
Add a semi-transparent text identifier to each cell so ChatGPT can tell the cells apart.
Then take the output file and send it to ChatGPT Vision along with the prompt.txt I included (it requires two modifications).
After some fine-tuning of the parameters, ChatGPT will be able to tell you exactly which grid cells to click on for the object you are looking for.
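
The Pillow steps above can be sketched as follows. This is an illustration under stated assumptions, not the code in main.py: `draw_grid`, `column_letters`, and the parameters `fill_alpha` and `text_alpha` are hypothetical names, the labels are spreadsheet-style ("A1" top-left), and Pillow's built-in default font stands in for the bundled arial.ttf.

```python
import math

from PIL import Image, ImageDraw, ImageFont


def column_letters(index: int) -> str:
    """Convert a 0-based column index to spreadsheet-style letters (A..Z, AA, ...)."""
    letters = ""
    index += 1
    while index > 0:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def draw_grid(image, cell_size=100, fill_alpha=60, text_alpha=160):
    """Overlay a labelled grid on an image; return the new image and the labels used.

    fill_alpha controls how strongly the white wash covers each cell (0-255),
    text_alpha controls the opacity of the cell labels (0-255).
    """
    base = image.convert("RGBA")
    overlay = Image.new("RGBA", base.size, (255, 255, 255, 0))
    draw = ImageDraw.Draw(overlay)
    font = ImageFont.load_default()  # assumption: swap in a TrueType font for larger labels

    # Round up so partial cells on the right and bottom edges are still drawn.
    cols = math.ceil(base.width / cell_size)
    rows = math.ceil(base.height / cell_size)

    labels = []
    for row in range(rows):
        for col in range(cols):
            left, top = col * cell_size, row * cell_size
            # Clip edge cells to the image bounds.
            right = min(left + cell_size, base.width) - 1
            bottom = min(top + cell_size, base.height) - 1
            label = f"{column_letters(col)}{row + 1}"
            draw.rectangle(
                [left, top, right, bottom],
                fill=(255, 255, 255, fill_alpha),
                outline=(0, 0, 0, 255),
            )
            draw.text((left + 3, top + 3), label, font=font, fill=(0, 0, 0, text_alpha))
            labels.append(label)

    # Blend the overlay onto the picture and drop alpha so it can be saved as JPEG.
    return Image.alpha_composite(base, overlay).convert("RGB"), labels
```

A call such as `draw_grid(Image.open("x.jpg"), cell_size=50)` would return the gridded image, which can be saved with `.save(...)` and attached to the prompt. Drawing on a separate RGBA overlay and compositing it afterwards keeps the original pixels visible through the wash, which is one way to get the adjustable intensity described above.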

quinny1187/GridGPT

Folders and files

Latest commit

History

Repository files navigation

GridGPT

How it works

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages