vision

vision is a simple OpenAI CLI and GPTScript Tool for interacting with vision models.

Prerequisites

Node.js
OpenAI API key

Usage

Import vision into any .gpt script by referencing this GitHub repo.

Tools: github.com/gptscript-ai/gpt4-v-vision

Describe the images at the following locations:
- examples/eiffel-tower.png
- https://avatars.githubusercontent.com/u/158112119?s=400&u=d2c6ae055a80ced8209f4aab2562986a97d79e9f&v=4

You will be prompted to enter your OpenAI API key if you have not provided it before.

Testing Changes

Clone this repository or download the source code:

git clone git@github.com:gptscript-ai/gpt4-v-vision.git
cd gpt4-v-vision

Install the npm dependencies:
```
npm install
```

Import the local tool.gpt file to test local changes.

Here's a simple example:

# The tool script import path is relative to the directory of the script importing it; in this case ./examples
Tools: ../tool.gpt
Description: This script is used to test local changes to the vision tool by invoking it with a simple prompt and image references.

Describe the images at the following locations:
- examples/eiffel-tower.png
- https://avatars.githubusercontent.com/u/158112119?s=400&u=d2c6ae055a80ced8209f4aab2562986a97d79e9f&v=4

Run the script from the root directory of this repo:

# Disable response caching to ensure the tool is always called for testing purposes
gptscript --disable-cache examples/test.gpt

Running the CLI

$ node index.js --help
Usage: index [options] <prompt> <images...>

Utility for processing images with the OpenAI API

Arguments:
  prompt                      Prompt to send to the vision model
  images                      List of image URIs to process. Supports file:// and https:// protocols. Images must be jpeg or png.

Options:
  --openai-api-key <key>      OpenAI API Key (env: OPENAI_API_KEY)
  --openai-base-url <string>  OpenAI base URL (env: OPENAI_BASE_URL)
  --openai-org-id <string>    OpenAI Org ID to use (env: OPENAI_ORG_ID)
  --max-tokens <number>       Max tokens to use (default: 2048, env: MAX_TOKENS)
  --model <model>             Model to process images with (choices: "gpt-4-vision-preview", default: "gpt-4-vision-preview", env: MODEL)
  --detail <detail>           Fidelity to use when processing images (choices: "low", "high", "auto", default: "auto", env: DETAIL)
  -h, --help                  display help for command
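
The help text lists an environment variable next to most options. The sketch below shows one way such a fallback could be resolved, assuming the common precedence of command-line flag first, then environment variable, then default. `resolveOption` is a hypothetical helper for illustration and is not taken from index.js.

```javascript
// Hypothetical helper: returns the first defined value among the
// command-line flag, the named environment variable, and the default.
function resolveOption(flagValue, envName, defaultValue, env = process.env) {
  if (flagValue !== undefined) return flagValue;
  if (env[envName] !== undefined && env[envName] !== '') return env[envName];
  return defaultValue;
}

// Illustrative environment; a real run would read process.env instead.
const sampleEnv = { MAX_TOKENS: '1024' };

console.log(resolveOption(undefined, 'MAX_TOKENS', 2048, sampleEnv)); // flag omitted, env value used
console.log(resolveOption('512', 'MAX_TOKENS', 2048, sampleEnv));     // flag takes precedence over env
console.log(resolveOption(undefined, 'DETAIL', 'auto', sampleEnv));   // neither set, default used
```

Values read from the environment are strings, so a numeric option such as `--max-tokens` would still need to be parsed before use.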

Ask a question about an image in a local file

node index.js 'Describe the picture' 'file://examples/eiffel-tower.png'

Ask a question about an image at a remote URL

node index.js 'Describe the picture' 'https://github.com/gptscript-ai/vision/blob/main/examples/eiffel-tower.png?raw=true'
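
The CLI accepts both `file://` and `https://` image URIs. As a rough sketch of how each kind could be turned into an image part for an OpenAI chat message, the code below passes remote URLs through unchanged and embeds local files as base64 data URLs. `toImagePart` and `MIME_TYPES` are hypothetical names, and treating everything after `file://` as a plain path is an assumption based on the example above, not a description of index.js.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical lookup table: the CLI help states that images must be jpeg or png.
const MIME_TYPES = { '.png': 'image/png', '.jpg': 'image/jpeg', '.jpeg': 'image/jpeg' };

// Hypothetical helper: converts one image URI into a chat message content part.
function toImagePart(uri, detail = 'auto') {
  if (uri.startsWith('https://')) {
    // Remote images are referenced by URL.
    return { type: 'image_url', image_url: { url: uri, detail } };
  }
  if (uri.startsWith('file://')) {
    // Assumption: the text after "file://" is a path, possibly relative,
    // as in file://examples/eiffel-tower.png.
    const filePath = uri.slice('file://'.length);
    const mime = MIME_TYPES[path.extname(filePath).toLowerCase()];
    if (!mime) throw new Error(`Unsupported image type: ${filePath}`);
    // Local images are read from disk and embedded as a base64 data URL.
    const data = fs.readFileSync(filePath).toString('base64');
    return { type: 'image_url', image_url: { url: `data:${mime};base64,${data}`, detail } };
  }
  throw new Error(`Unsupported protocol: ${uri}`);
}

console.log(toImagePart('https://example.com/picture.png', 'low'));
```

Embedding the file contents keeps the request self-contained, since the API cannot read paths on the local machine.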

Ask a question related to multiple images

node index.js 'Do you think these two portraits are by the same artist?' 'https://github.com/gptscript-ai/vision/blob/main/examples/eiffel-tower.png?raw=true' 'file://examples/eiffel-tower.png'
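
When several images are given, they can all travel in a single user message alongside the prompt. The following is a minimal sketch of such a request body, using the defaults shown in the help text (`gpt-4-vision-preview`, 2048 max tokens, `auto` detail). `buildVisionRequest` is a hypothetical helper and the example.com URLs are placeholders; no network call is made.

```javascript
// Hypothetical helper: builds a chat completion request body
// for one prompt and any number of image URLs.
function buildVisionRequest(prompt, imageUrls, options = {}) {
  const { model = 'gpt-4-vision-preview', maxTokens = 2048, detail = 'auto' } = options;
  return {
    model,
    max_tokens: maxTokens,
    messages: [
      {
        role: 'user',
        content: [
          // The text prompt comes first, followed by one part per image.
          { type: 'text', text: prompt },
          ...imageUrls.map((url) => ({ type: 'image_url', image_url: { url, detail } })),
        ],
      },
    ],
  };
}

// Placeholder URLs for illustration only.
const request = buildVisionRequest('Compare these two images', [
  'https://example.com/first.png',
  'https://example.com/second.png',
]);
console.log(JSON.stringify(request, null, 2));
```

Sending all images in one message lets the model answer questions that compare them, which separate requests would not allow.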
