GitHub - neirzhei/ScreenScribe: Offline-first agent that generates spoken, conversational commentary on screen activity using a local multimodal pipeline (Vision-LLM-TTS) with a resource-conscious architecture.

An offline-first AI companion that provides real-time, spoken commentary on your screen activity to offer encouragement and witty observations.

How It Works

This program is designed to run entirely offline on consumer hardware. It operates in a simple loop:

Capture: Periodically, it takes a screenshot of the user's primary monitor.
Analyze: A vision model analyzes the screenshot to generate a factual description of the on-screen activity.
Comment: A large language model (LLM) takes this description and generates a short, conversational, and encouraging or witty comment.
Speak: A text-to-speech (TTS) model synthesizes the comment into audio and plays it aloud.
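The four steps above can be sketched as a single pass that is repeated on a timer. This is a minimal illustration, not code from the repository: the stage functions (`capture`, `describe`, `comment`, `speak`), `run_once`, and `run_loop` are hypothetical names, and the real modules under `src/` may be structured quite differently.

```python
import time
from typing import Callable, Optional

# Hypothetical stage signatures; the real modules under src/ may differ.
CaptureFn = Callable[[], bytes]      # returns an encoded screenshot
DescribeFn = Callable[[bytes], str]  # vision model: image -> factual description
CommentFn = Callable[[str], str]     # LLM: description -> short remark
SpeakFn = Callable[[str], None]      # TTS: synthesize and play the remark


def run_once(capture: CaptureFn, describe: DescribeFn,
             comment: CommentFn, speak: SpeakFn) -> str:
    """Run one Capture -> Analyze -> Comment -> Speak pass and return the remark."""
    screenshot = capture()
    description = describe(screenshot)
    remark = comment(description)
    speak(remark)
    return remark


def run_loop(capture: CaptureFn, describe: DescribeFn, comment: CommentFn,
             speak: SpeakFn, interval_seconds: float,
             max_iterations: Optional[int] = None) -> None:
    """Repeat the pipeline, sleeping between passes (forever if max_iterations is None)."""
    done = 0
    while max_iterations is None or done < max_iterations:
        run_once(capture, describe, comment, speak)
        done += 1
        # Only sleep if another pass will follow.
        if max_iterations is None or done < max_iterations:
            time.sleep(interval_seconds)
```

Passing the stages in as plain callables keeps the loop independent of any particular vision, LLM, or TTS library, so each stage can be swapped or stubbed (for example with lambdas returning fixed strings) when testing the loop on its own.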

To conserve system resources, each AI model is loaded into memory only when needed and unloaded immediately after its task is complete.
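One common way to express "load only when needed, unload right after" in Python is a context manager. The sketch below assumes nothing about how ScreenScribe actually manages its models; `loaded`, `load`, and `unload` are hypothetical names. Dropping the last reference and calling `gc.collect()` lets Python reclaim ordinary memory, while GPU-backed libraries may need their own release call (for PyTorch, `torch.cuda.empty_cache()`), which is what the optional `unload` hook is for.

```python
import gc
from contextlib import contextmanager
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")


@contextmanager
def loaded(load: Callable[[], T],
           unload: Optional[Callable[[T], None]] = None) -> Iterator[T]:
    """Load a model for the duration of a with-block, then release it."""
    model = load()
    try:
        yield model
    finally:
        # Runs even if the task inside the with-block raises.
        if unload is not None:
            unload(model)  # library-specific cleanup, e.g. freeing GPU memory
        del model          # drop this function's reference to the model
        gc.collect()       # ask the collector to reclaim it promptly
```

A task would then be written as `with loaded(load_vision_model) as vision: description = vision(screenshot)`, where `load_vision_model` is again a hypothetical loader. The model only stays in memory if the caller keeps its own reference after the block ends.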

Installation & Usage

This project is containerized using Docker for easy setup.

Prerequisites:

A Linux-based operating system (tested on Debian).
Docker and Docker Compose installed.
An active internet connection for the initial model download.

Configuration:

Clone the repository.
Key parameters like model repositories, GPU layer offloading (LLM_GPU_LAYERS), and commentary frequency (MIN/MAX_INTERVAL_MINUTES) can be adjusted in src/config.py.
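For orientation, a configuration module of this kind might look roughly like the sketch below. Only the parameter names `LLM_GPU_LAYERS`, `MIN_INTERVAL_MINUTES`, and `MAX_INTERVAL_MINUTES` come from the description above (the latter two are assumed to be separate settings); the repository placeholders, the default values, and the `next_delay_seconds` helper are hypothetical. It also assumes the wait between comments is drawn uniformly between the two bounds.

```python
import random
from typing import Optional

# Placeholder values for illustration; the real src/config.py may differ.
VISION_MODEL_REPO = "example-org/vision-model"  # placeholder, not a real repository
LLM_MODEL_REPO = "example-org/llm-model"        # placeholder
TTS_MODEL_REPO = "example-org/tts-model"        # placeholder

LLM_GPU_LAYERS = 0         # number of LLM layers offloaded to the GPU; 0 keeps the LLM on the CPU
MIN_INTERVAL_MINUTES = 5   # shortest wait between comments
MAX_INTERVAL_MINUTES = 15  # longest wait between comments


def next_delay_seconds(rng: Optional[random.Random] = None) -> float:
    """Pick a random wait between comments, within the configured bounds."""
    if MIN_INTERVAL_MINUTES > MAX_INTERVAL_MINUTES:
        raise ValueError("MIN_INTERVAL_MINUTES must not exceed MAX_INTERVAL_MINUTES")
    rng = rng or random.Random()
    return rng.uniform(MIN_INTERVAL_MINUTES, MAX_INTERVAL_MINUTES) * 60
```

Randomizing the interval, rather than using a fixed period, keeps the commentary from feeling mechanical; passing a seeded `random.Random` makes the delay reproducible in tests.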

Running the Application:

Open a terminal in the project's root directory.
Run the command: docker compose up --build
ScreenScribe will now run and periodically comment on your screen activity. To stop it, press Ctrl+C in the same terminal.

Potential Improvements

Two-way Communication: Allow the user to reply to ScreenScribe's comments.
Conversational Memory: A short-term memory system, allowing ScreenScribe to recall the last few interactions. This will make its comments more contextually relevant, natural, and engaging over time.
GPU Acceleration: Significantly reduce the time from screenshot to spoken comment.
Cross-Platform Support: Re-engineer the screen capture and audio playback modules to work on Windows and macOS, making the project OS-independent.
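Conversational memory is listed only as a future improvement, so the following is one possible sketch of it, not code from the repository. `ShortTermMemory`, `remember`, and `as_prompt_context` are hypothetical names; the idea is a fixed-size buffer of recent (description, comment) pairs that could be prepended to the LLM prompt.

```python
from collections import deque
from typing import Deque, Tuple


class ShortTermMemory:
    """Keeps only the most recent exchanges; older ones fall off automatically."""

    def __init__(self, max_items: int = 3) -> None:
        # A deque with maxlen discards the oldest entry when a new one is appended.
        self._items: Deque[Tuple[str, str]] = deque(maxlen=max_items)

    def remember(self, description: str, comment: str) -> None:
        """Store what was on screen and what was said about it."""
        self._items.append((description, comment))

    def as_prompt_context(self) -> str:
        """Format recent exchanges as text to include in the next LLM prompt."""
        lines = [f"- Screen: {d} | You said: {c}" for d, c in self._items]
        return "Recent interactions:\n" + "\n".join(lines) if lines else ""
```

Bounding the buffer keeps the prompt short, which matters for a small local LLM with a limited context window, and avoids memory growing without limit over a long session.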
