Pixie - Command-line Image Analysis Using GPT-4 with Vision API

Overview

Pixie analyzes an image using OpenAI's GPT-4 with Vision API. The script allows users to provide an image file, optionally provide a prompt to guide the analysis, and specify the maximum number of output tokens. The script then scales the image so its largest dimension is 512 pixels or less, sends the image to GPT-4 for analysis, and outputs the GPT-4 analysis along with some basic information about the image and the approximate computational cost incurred.

Features

Image analysis using GPT-4 with Vision API.
Customizable prompts for tailored image analysis.
Control over the maximum number of output tokens.

Prerequisites

Python 3.x
A virtual environment (recommended for Python package management)

Setup

Step 1: Clone the Repository

Clone the repository to your local machine using:

git clone https://github.com/peterkaminski/pixie

Step 2: Create a Virtual Environment

Navigate to the cloned directory and create a virtual environment:

python3 -m venv venv

Step 3: Activate the Virtual Environment

Activate the virtual environment.

On Windows:

.\venv\Scripts\activate

On Unix or MacOS:

source venv/bin/activate

Step 4: Install Required Packages

Install the necessary Python packages:

pip install -r requirements.txt

Step 5: Set up OpenAI API key

Sign up for an OpenAI developer account, and generate an API key in settings. See Where do I find my API Key? in the OpenAI Help Center. OpenAI provides a document about Best Practices for API Key Safety, which you should review.

Then on Unix or MacOS, copy the env.sh-template file to env.sh, then edit env.sh to insert your OpenAI API key. Add your API key to the shell environment:

source env.sh

On Windows, the process will be similar, but we do not have documentation for the process yet.

In any case, make sure you do not push any file with your API key back to your upstream Git host, such as GitHub.

Usage

Running the Script

Run the script like this:

./pixie.py [-p <optional-prompt>] [-t <max-tokens>] -i <image-file-name>

Arguments

-i, --input: (Required) The path and name of the image file to analyze.
-p, --prompt: (Optional) A prompt to guide the GPT-4 analysis. Default is "Describe this image."
-t, --tokens: (Optional) The maximum number of output tokens. Default is 300.

Example

./pixie.py -i example.jpg -p "Consider this image. Print applicable keywords and a short description." -t 500

This command analyzes the image example.jpg with the prompt "Consider this image. Print applicable keywords and a short description." and a maximum of 500 tokens for the output.

You can leave off the -p and -t arguments, and sensible defaults will be used.

Output

The script will output:

The original dimensions of the image.
The aspect ratio of the image.
The number of tokens used in the analysis and the approximate cost.
A warning note if the maximum output tokens limit was met.
The analysis response from GPT-4.

Processing Multiple Files

Pixie can be used to process multiple files using standard shell techniques. Here are two examples for Unix or MacOS.

for i in *.png ; do /path/to/pixie.py -i "$i" > "$i.txt"; done

find /path/to/images -iname '*.png' \
-exec sh -c '/path/to/pixie.py -i "$1" > "$1.txt"' _ {} \;

Pixie currently outputs to STDOUT, but in the future, there will be a --output / -o flag to specify output destination, which will be preferred over the output redirection techniques above.

Notes

The software is provided "as is", without warranty of any kind. See the LICENSE file for more details.
This is a "1.0" version of the script, so some features may be missing, more error checking may be needed, etc. Please provide feedback!
You are responsible for providing your OpenAI API key to operate the software. Your API key is not part of this software. You are responsible for keeping your API key secure.
The image analysis is generated by GPT-4; if you get a surprising written response, or a refusal to analyze a script, it's due to OpenAI and GPT-4. Try providing a different prompt with --prompt.
The approximate cost is calculated using hardcoded prices retrieved on 2023-12-15 from OpenAI Pricing. Before you can rely on the approximate price calcuation, you must confirm the prices are up-to-date and the same as given on the OpenAI website.
This version is hard-coded to use the low detail setting, to maintain lower API costs for the user. A future version may enable the use of high detail as well. The low detail setting requires
If the GPT-4 output is truncated due to reaching the maximum token limit, consider increasing the --tokens value.
see the [[Roadmap.md]] file for plans and ideas for future development.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contributing

Contributions to this script are welcome. Feel free to fork, modify, and send a pull request.

Support

For any issues or questions, please open an issue in the repository or contact Peter Kaminski.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

.gitignore

.gitignore

LICENSE

LICENSE

README.md

README.md

Roadmap.md

Roadmap.md

env.sh-template

env.sh-template

pixie.py

pixie.py

requirements.txt

requirements.txt

Repository files navigation

Pixie - Command-line Image Analysis Using GPT-4 with Vision API

Overview

Features

Prerequisites

Setup

Step 1: Clone the Repository

Step 2: Create a Virtual Environment

Step 3: Activate the Virtual Environment

Step 4: Install Required Packages

Step 5: Set up OpenAI API key

Usage

Running the Script

Arguments

Example

Output

Processing Multiple Files

Notes

License

Contributing

Support

About

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
Roadmap.md		Roadmap.md
env.sh-template		env.sh-template
pixie.py		pixie.py
requirements.txt		requirements.txt

License

peterkaminski/pixie

Folders and files

Latest commit

History

Repository files navigation

Pixie - Command-line Image Analysis Using GPT-4 with Vision API

Overview

Features

Prerequisites

Setup

Step 1: Clone the Repository

Step 2: Create a Virtual Environment

Step 3: Activate the Virtual Environment

Step 4: Install Required Packages

Step 5: Set up OpenAI API key

Usage

Running the Script

Arguments

Example

Output

Processing Multiple Files

Notes

License

Contributing

Support

About

Resources

License

Stars

Watchers

Forks

Languages