Screen AI

Screen AI is a tool that analyzes images or screenshots using Ollama's vision models to extract text (OCR), to identify potential errors or anomalies, or answer questions.

Features

Take Screenshot: gui starts screenshot utility
OCR Analysis: Extracts text from images while preserving formatting
Error Detection: Reviews extracted text for mistakes and anomalies
Interactive Chat: Ask follow-up questions about the analyzed image
Dual Interface: Both command-line and GUI versions available
Configurable: Customizable server settings and prompts

Applications

`screen-ai-console`

Command-line interface for terminal-based image analysis with interactive chat.

`screen-ai-gui`

Graphical interface with split-pane view showing the image and chat side-by-side.

Installation

Install dependencies (or go with your distribution packages):
```
pip install -r requirements.txt
```
Ensure Ollama server is running on your LLM host with a vision model that can work with images:
```
ollama pull llama3.2-vision
ollama pull granite3.2-vision
ollama serve
```

Configuration

copy screen-ai.conf to your ~/.screen-ai.conf. It has the variables (with examples):

OLLAMA_SERVER_URL = 'http://your_ollama_host:11434'
MODEL_NAME = 'granite3.2-vision' 
SCREENSHOT_COMMAND = 'spectacle -b -r -n -o /tmp/$FILENAME'
INITIAL_PROMPT = """Your prompt here."""

Usage

Console Version

# Basic usage
./screen-ai-console --image screenshot.png

# With custom config
./screen-ai-console -i image.jpg -c /path/to/config

# With debug output
./screen-ai-console -i image.png --debug

GUI Version

# Basic usage
./screen-ai-gui --image screenshot.png

# With custom config and debug
./screen-ai-gui -i image.jpg -c /path/to/config --debug

# Take a screenshot and process
./screen-ai-gui

Command Line Options

Both applications support:

--image, -i: Path to image file (required for console
--config, -c: Path to config file (default: ~/.screen-ai.conf)
--debug, -d: Enable debug output (GUI only)
--help, -h: Show help message

Interactive Features

Console: Type follow-up questions or 'exit'/'quit' to end
GUI: Use the chat input field, arrow keys for command history
Image Context: Follow-up questions maintain visual context of the original image

Requirements

Python 3.7+
PySide6 (for GUI)
ollama
Running Ollama server with vision model

File Structure

screen-ai/
├── screen-ai-console     # CLI application
├── screen-ai-gui         # GUI application  
├── config_loader.py      # Configuration loader
├── requirements.txt      # Python dependencies
└── README.md            # This file

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Screen AI

Features

Applications

`screen-ai-console`

`screen-ai-gui`

Installation

Configuration

Usage

Console Version

GUI Version

Command Line Options

Interactive Features

Requirements

File Structure

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
.gitignore		.gitignore
README.md		README.md
config_loader.py		config_loader.py
requirements.txt		requirements.txt
screen-ai-console		screen-ai-console
screen-ai-gui		screen-ai-gui
screen-ai.conf		screen-ai.conf

rsimai/screen-ai

Folders and files

Latest commit

History

Repository files navigation

Screen AI

Features

Applications

screen-ai-console

screen-ai-gui

Installation

Configuration

Usage

Console Version

GUI Version

Command Line Options

Interactive Features

Requirements

File Structure

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

`screen-ai-console`

`screen-ai-gui`

Packages