Pixie analyzes an image using OpenAI's GPT-4 with Vision API. The script allows users to provide an image file, optionally provide a prompt to guide the analysis, and specify the maximum number of output tokens. The script then scales the image so its largest dimension is 512 pixels or less, sends the image to GPT-4 for analysis, and outputs the GPT-4 analysis along with some basic information about the image and the approximate computational cost incurred.
- Image analysis using GPT-4 with Vision API.
- Customizable prompts for tailored image analysis.
- Control over the maximum number of output tokens.
- Python 3.x
- A virtual environment (recommended for Python package management)
Clone the repository to your local machine using:
git clone https://github.com/peterkaminski/pixie
Navigate to the cloned directory and create a virtual environment:
python3 -m venv venv
Activate the virtual environment.
On Windows:
.\venv\Scripts\activate
On Unix or MacOS:
source venv/bin/activate
Install the necessary Python packages:
pip install -r requirements.txt
Sign up for an OpenAI developer account, and generate an API key in settings. See Where do I find my API Key? in the OpenAI Help Center. OpenAI provides a document about Best Practices for API Key Safety, which you should review.
Then on Unix or MacOS, copy the env.sh-template
file to env.sh
, then edit env.sh
to insert your OpenAI API key. Add your API key to the shell environment:
source env.sh
On Windows, the process will be similar, but we do not have documentation for the process yet.
In any case, make sure you do not push any file with your API key back to your upstream Git host, such as GitHub.
Run the script like this:
./pixie.py [-p <optional-prompt>] [-t <max-tokens>] -i <image-file-name>
-i
,--input
: (Required) The path and name of the image file to analyze.-p
,--prompt
: (Optional) A prompt to guide the GPT-4 analysis. Default is "Describe this image."-t
,--tokens
: (Optional) The maximum number of output tokens. Default is 300.
./pixie.py -i example.jpg -p "Consider this image. Print applicable keywords and a short description." -t 500
This command analyzes the image example.jpg
with the prompt "Consider this image. Print applicable keywords and a short description." and a maximum of 500 tokens for the output.
You can leave off the -p
and -t
arguments, and sensible defaults will be used.
The script will output:
- The original dimensions of the image.
- The aspect ratio of the image.
- The number of tokens used in the analysis and the approximate cost.
- A warning note if the maximum output tokens limit was met.
- The analysis response from GPT-4.
Pixie can be used to process multiple files using standard shell techniques. Here are two examples for Unix or MacOS.
for i in *.png ; do /path/to/pixie.py -i "$i" > "$i.txt"; done
find /path/to/images -iname '*.png' \
-exec sh -c '/path/to/pixie.py -i "$1" > "$1.txt"' _ {} \;
Pixie currently outputs to STDOUT, but in the future, there will be a --output
/ -o
flag to specify output destination, which will be preferred over the output redirection techniques above.
- The software is provided "as is", without warranty of any kind. See the LICENSE file for more details.
- This is a "1.0" version of the script, so some features may be missing, more error checking may be needed, etc. Please provide feedback!
- You are responsible for providing your OpenAI API key to operate the software. Your API key is not part of this software. You are responsible for keeping your API key secure.
- The image analysis is generated by GPT-4; if you get a surprising written response, or a refusal to analyze a script, it's due to OpenAI and GPT-4. Try providing a different prompt with
--prompt
. - The approximate cost is calculated using hardcoded prices retrieved on 2023-12-15 from OpenAI Pricing. Before you can rely on the approximate price calcuation, you must confirm the prices are up-to-date and the same as given on the OpenAI website.
- This version is hard-coded to use the
low
detail setting, to maintain lower API costs for the user. A future version may enable the use ofhigh
detail as well. Thelow
detail setting requires - If the GPT-4 output is truncated due to reaching the maximum token limit, consider increasing the
--tokens
value. - see the [[Roadmap.md]] file for plans and ideas for future development.
This project is licensed under the MIT License. See the LICENSE file for details.
Contributions to this script are welcome. Feel free to fork, modify, and send a pull request.
For any issues or questions, please open an issue in the repository or contact Peter Kaminski.