OpenSift


Please note that this is a hobby project and a proof of concept; it may be insecure and contain security holes.

Sift faster. Study smarter.

OpenSift is an AI-powered study assistant that helps students ingest large amounts of information (URLs, PDFs, lecture notes) and intelligently sift through it using semantic search and AI generation.


🎯 Why OpenSift?

Students don’t struggle because they lack information.
They struggle because they have too much of it.

OpenSift helps by:

  • Ingesting textbooks, PDFs, and web articles
  • Finding only the most relevant sections
  • Grounding AI responses in your materials
  • Streaming answers in real-time
  • Supporting conversational study Q&A (Study Chat) with concrete recommendations and next steps
  • Generating assignment-focused study plans (Assignment Planner) with milestones and prioritized checklists
  • Generating structured study guides, key points, and quizzes

🎬 Demo

OpenSift Demo


🖼 Screenshots

Full Chat · Study Guide · Key Points · Quiz Me


🚀 Quick Start

1. Create a virtual environment

```bash
python3.13 -m venv .venv
source .venv/bin/activate
```

(Recommended: Python 3.12 or 3.13)

One-command bootstrap (recommended)

From backend/:

```bash
./setup.sh
```

This script will:

  • verify Python 3.12+
  • create/activate .venv
  • install dependencies (openai, anthropic, sentence-transformers, -r requirements.txt)
  • check for missing claude/codex CLIs and ask whether to install them
  • prompt for API keys/tokens and write backend/.env
  • run setup + security audit (python opensift.py setup --skip-key-prompts --no-launch)
  • offer launch targets for local gateway/terminal and Docker gateway/terminal

2. Install dependencies

```bash
pip install -U pip setuptools wheel
pip install openai
pip install anthropic
pip install sentence-transformers
pip install -r requirements.txt
```

3. Set API Keys (Optional)

Supported providers:

  • Claude Code (setup-token)
  • ChatGPT Codex (OAuth token)
  • Claude API (Anthropic)
  • OpenAI API

Example:

```bash
export OPENAI_API_KEY="your-key"
export ANTHROPIC_API_KEY="your-key"
```

Claude Code users:

```bash
claude setup-token
export CLAUDE_CODE_OAUTH_TOKEN="..."
unset ANTHROPIC_API_KEY
```

Codex users:

```bash
export CHATGPT_CODEX_OAUTH_TOKEN="..."
export OPENSIFT_CODEX_CMD="codex"
```

If `codex --help` prints `Render your codex`, that executable is a different npm package; set `OPENSIFT_CODEX_CMD` to your ChatGPT Codex CLI executable. If `CHATGPT_CODEX_OAUTH_TOKEN` is not set, OpenSift will auto-read Codex credentials from:

  • /app/.codex/auth.json (Docker-first default)
  • ~/.codex/auth.json (host/user fallback)

If no provider is configured, OpenSift will still retrieve relevant passages but won’t generate AI summaries.
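The auto-discovery described above can be sketched as follows. This is a hypothetical illustration, not OpenSift's actual loader: the `auth.json` schema assumed here (`{"tokens": {"access_token": ...}}`) and the function name are assumptions.

```python
import json
from pathlib import Path

# Hypothetical sketch of the Codex auth.json auto-discovery described above.
# The file schema ({"tokens": {"access_token": ...}}) is an assumption, not
# OpenSift's documented format.
CANDIDATE_PATHS = [
    Path("/app/.codex/auth.json"),      # Docker-first default
    Path.home() / ".codex/auth.json",   # host/user fallback
]

def discover_codex_token(paths=CANDIDATE_PATHS):
    """Return the first readable Codex token, or None if nothing is configured."""
    for path in paths:
        try:
            data = json.loads(path.read_text())
        except (OSError, ValueError):
            continue  # missing or malformed file: try the next candidate
        token = data.get("tokens", {}).get("access_token")
        if token:
            return token
    return None
```

Returning `None` rather than raising mirrors the behavior described below: without a provider, retrieval still works but AI generation is skipped.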

4. Run the app

4.a How to run it


From backend/:

✅ Guided setup + launch wizard (recommended)

```bash
python opensift.py setup
```

This workflow lets users:

  • Enter/update API keys and tokens (OPENAI_API_KEY, ANTHROPIC_API_KEY, CLAUDE_CODE_OAUTH_TOKEN, CHATGPT_CODEX_OAUTH_TOKEN)
  • Save settings to backend/.env
  • Prompt to install missing Claude Code / ChatGPT Codex CLIs (or skip)
  • Choose launch mode: gateway, ui, terminal, or both

✅ Gateway runner (recommended for local orchestration)

```bash
python opensift.py gateway --with-mcp
```

Gateway mode:

  • Supervises OpenSift UI and optional MCP server from one command
  • Runs startup health checks (/health)
  • Handles graceful shutdown for all managed processes
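The startup health check described above amounts to polling an endpoint until it responds. A minimal sketch, assuming a generic `probe` callable (in the real gateway this would be an HTTP GET against `/health`):

```python
import time

# Sketch of a startup health check like the gateway's /health poll.
# `probe` is any zero-argument callable returning True when the service
# is up; function name and retry policy are illustrative assumptions.
def wait_healthy(probe, attempts=10, delay=0.5):
    for _ in range(attempts):
        try:
            if probe():
                return True
        except Exception:
            pass  # treat connection errors as "not up yet"
        time.sleep(delay)
    return False
```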

✅ Web UI (localhost)

```bash
python opensift.py ui --reload
```

✅ Terminal chatbot

```bash
python opensift.py terminal --provider claude_code
```

Terminal thinking/streaming controls:

```bash
python opensift.py terminal --provider claude --thinking
```

In-session commands:

  • /thinking on|off
  • /show-thinking on|off
  • /true-stream on|off
  • /stream on|off

Run a security audit any time:

```bash
python opensift.py security-audit --fix-perms
```

Example: separate class namespace + quiz mode:

```bash
python opensift.py terminal --provider claude_code --owner bio101 --mode quiz
```

Example: open-ended study coaching mode:

```bash
python opensift.py terminal --provider claude_code --owner bio101 --mode study_chat
```

Example: deadline-aware planning mode:

```bash
python opensift.py terminal --provider claude_code --owner bio101 --mode assignment_planner
```

Then inside the terminal chat:

  • Ingest a URL: /ingest url https://en.wikipedia.org/wiki/Photosynthesis
  • Ingest a file: /ingest file /path/to/chapter1.pdf
  • Ask questions normally.

Supported chat modes (UI + terminal + MCP prompt builder):

  • study_chat (default): open-ended, context-grounded study conversation and recommendations
  • assignment_planner: concrete plan with next steps, short timeline, and checklist
  • study_guide: structured guide output
  • key_points: concise key concepts
  • quiz: quiz + answer key
  • explain: layered explanation with misconceptions
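A mode dispatch like the one above could look roughly like this. The instruction strings, function name, and prompt layout are illustrative assumptions, not OpenSift's actual prompt builder:

```python
# Hypothetical sketch of mapping a chat mode to a mode-specific instruction
# prepended to the retrieval-grounded prompt; wording is illustrative.
MODE_INSTRUCTIONS = {
    "study_chat": "Hold an open-ended study conversation grounded in the sources.",
    "assignment_planner": "Produce a concrete plan with next steps, timeline, and checklist.",
    "study_guide": "Produce a structured study guide.",
    "key_points": "List the key concepts concisely.",
    "quiz": "Write a quiz followed by an answer key.",
    "explain": "Explain in layers and address common misconceptions.",
}

def build_prompt(mode, question, context_chunks):
    # Unknown modes fall back to the default study_chat behavior.
    instruction = MODE_INSTRUCTIONS.get(mode, MODE_INSTRUCTIONS["study_chat"])
    context = "\n\n".join(context_chunks)
    return f"{instruction}\n\nContext:\n{context}\n\nQuestion: {question}"
```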

4.b Old Method:

```bash
uvicorn ui_app:app --reload --host 127.0.0.1 --port 8001
```

4.c Docker (recommended for consistent local runtime)

From repository root:

```bash
docker compose up --build opensift-gateway
```

Then open:

http://127.0.0.1:8001/

Useful Docker commands:

```bash
touch backend/.env

# Start in background
docker compose up -d --build opensift-gateway

# Start interactive terminal in Docker
docker compose run --rm opensift-terminal

# Stop
docker compose down

# View logs
docker compose logs -f opensift-gateway
```

Docker notes:

  • Docker publishes OpenSift on loopback by default (127.0.0.1:8001).
  • You can override bind address when needed (for relay/proxy testing):
    • OPENSIFT_BIND_ADDR=0.0.0.0 docker compose up --build
  • Claude Code and Codex CLIs are installed in Docker image builds by default.
    • Disable either install if needed:
      • INSTALL_CLAUDE_CODE_CLI=false docker compose up --build
      • INSTALL_CODEX_CLI=false docker compose up --build
    • Override npm package names if your org mirrors npm packages:
      • CLAUDE_CODE_NPM_PACKAGE=@anthropic-ai/claude-code
      • CODEX_NPM_PACKAGE=@openai/codex
  • Host-installed CLIs are not visible inside containers unless installed in the image.
  • CLI auth state is persisted under backend/.codex and backend/.claude.
    • Docker defaults OPENSIFT_CODEX_AUTH_PATH=/app/.codex/auth.json.
    • Docker sets OPENSIFT_CODEX_SKIP_GIT_REPO_CHECK=true so Codex can run under /app even when it is not a git-trusted workspace.
    • Device auth inside Docker:
      • docker exec -it opensift-gateway sh -lc 'HOME=/app codex login --device-auth'
      • docker exec -it opensift-gateway claude setup-token
  • If traffic must arrive from known relay/proxy egress IPs, allowlist them:
    • OPENSIFT_TRUSTED_CLIENT_IPS=143.204.130.84
    • or CIDR list: OPENSIFT_TRUSTED_CLIENT_CIDRS=143.204.128.0/20
  • OpenSift can still make outbound calls (e.g., Hugging Face model download) while inbound access remains localhost-guarded.
  • Embeddings warmup is enabled by default in Docker to avoid first-chat retrieval interruptions:
    • OPENSIFT_PRELOAD_EMBEDDINGS=true
  • The container mounts ./backend to /app, so local state persists:
    • .chroma, .opensift_sessions, .opensift_library, .opensift_quiz_attempts, .opensift_flashcards, .opensift_auth.json, SOUL.md
  • Provider keys/tokens are loaded from backend/.env via env_file.
  • ChatGPT Codex auto-discovery still works if auth is provided by env (CHATGPT_CODEX_OAUTH_TOKEN).
  • Gateway mode in Docker starts UI + MCP automatically.
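The relay/proxy allowlist check implied by `OPENSIFT_TRUSTED_CLIENT_IPS` and `OPENSIFT_TRUSTED_CLIENT_CIDRS` can be sketched with the standard library's `ipaddress` module. Function name and comma-separated parsing are assumptions; the real middleware may differ:

```python
import ipaddress

# Sketch of a trusted-client check for the allowlist env vars above.
# Parsing rules (comma-separated values) are an assumption.
def is_trusted(client_ip, trusted_ips="", trusted_cidrs=""):
    ip = ipaddress.ip_address(client_ip)
    # Exact-IP allowlist, e.g. OPENSIFT_TRUSTED_CLIENT_IPS=143.204.130.84
    if client_ip in {s.strip() for s in trusted_ips.split(",") if s.strip()}:
        return True
    # CIDR allowlist, e.g. OPENSIFT_TRUSTED_CLIENT_CIDRS=143.204.128.0/20
    for cidr in (s.strip() for s in trusted_cidrs.split(",") if s.strip()):
        if ip in ipaddress.ip_network(cidr):
            return True
    return False
```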

Image versioning notes:

  • Local docker compose builds are tagged as opensift-opensift-gateway:latest.
  • For external publishing, use version-aligned tags, e.g.:
    • ghcr.io/opensift/opensift-gateway:1.6.2-alpha
    • ghcr.io/opensift/opensift-gateway:latest

Open:

http://127.0.0.1:8001/

The chatbot page is the default UI.

💬 Chat-First Workflow

Everything happens inside the chatbot interface.

You can:

📥 Ingest

  • Paste a URL and ingest it
  • Upload a PDF, TXT, or MD file
  • Keep materials separated using the owner field

🔎 Ask Questions

  • Ask conceptual questions
  • Request study guides
  • Generate quizzes
  • Compare topics
  • Extract key points

⚡ Streaming Responses

Responses stream live as they are generated.

You’ll see:

  • Retrieval phase
  • Source citations
  • Incremental streaming output

---

🧠 How OpenSift Works

  1. Text is chunked into semantic segments
  2. Each chunk is embedded into vector space
  3. Embeddings are stored in ChromaDB
  4. Queries retrieve the most relevant chunks
  5. The AI generates answers grounded in those chunks
  6. Responses stream back to the UI
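The pipeline can be sketched end to end in a few lines. OpenSift itself uses sentence-transformers embeddings and ChromaDB; this toy version substitutes a bag-of-words vector and an in-memory store so it runs with no dependencies, purely to illustrate the chunk → embed → store → retrieve flow:

```python
import math
from collections import Counter

# Illustrative chunk -> embed -> store -> retrieve pipeline. The embedding
# here is a toy bag-of-words Counter, NOT the semantic embeddings OpenSift
# actually uses; the class and function names are assumptions.
def chunk(text, size=40):
    """Split text into roughly size-character segments on word boundaries."""
    words, chunks, cur = text.split(), [], ""
    for w in words:
        if cur and len(cur) + len(w) + 1 > size:
            chunks.append(cur)
            cur = w
        else:
            cur = f"{cur} {w}".strip()
    if cur:
        chunks.append(cur)
    return chunks

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ToyVectorStore:
    def __init__(self):
        self.rows = []  # (chunk_text, vector)

    def ingest(self, text):
        for c in chunk(text):
            self.rows.append((c, embed(c)))

    def query(self, question, k=2):
        q = embed(question)
        ranked = sorted(self.rows, key=lambda r: cosine(q, r[1]), reverse=True)
        return [c for c, _ in ranked[:k]]
```

The generation step (5) would then feed `query(...)` results into the provider prompt as grounding context.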

---

🗂 Owners (Namespaces)

Use the owner field in the chat UI to separate subjects.

Examples:

  • bio101
  • chem_midterm
  • cs_final
  • history_notes

Each owner has:

  • Separate vector results
  • Separate chat history
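One common way to implement such isolation (a sketch under assumed names, not OpenSift's storage code) is to key every collection by owner:

```python
# Hypothetical owner-scoped store; OpenSift keys ChromaDB collections and
# chat history by owner, but the internals shown here are illustrative.
class OwnerScopedStore:
    def __init__(self):
        self._collections = {}  # owner -> list of documents

    def add(self, owner, doc):
        self._collections.setdefault(owner, []).append(doc)

    def docs(self, owner):
        # Only this owner's materials are visible; other owners stay isolated.
        return list(self._collections.get(owner, []))
```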

---

🛠 Supported Providers

| Provider | Requires Key | Streaming | Notes |
| --- | --- | --- | --- |
| Claude Code | Setup token | Yes\* | Recommended |
| Claude API | API key | Yes | Anthropic |
| OpenAI | API key | Yes | GPT-5.2 default |
| ChatGPT Codex | OAuth token | Yes\* | Codex CLI via OPENSIFT_CODEX_CMD (non-interactive codex exec) |

\* Claude Code currently uses chunk-streaming; Codex now attempts native CLI streaming and falls back if unavailable.

📂 Project Structure

```
backend/
├── app/
│   ├── chunking.py
│   ├── ingest.py
│   ├── llm.py
│   ├── providers.py
│   ├── settings.py
│   └── vectordb.py
├── templates/
│   └── chat.html
├── static/
├── ui_app.py
└── requirements.txt
```

πŸ” Environment Variables

Optional but recommended:

```bash
OPENAI_API_KEY=
ANTHROPIC_API_KEY=
OPENSIFT_LOG_LEVEL=INFO
OPENSIFT_LOG_DIR=.opensift_logs
OPENSIFT_LOG_MAX_BYTES=5242880
OPENSIFT_LOG_BACKUP_COUNT=5
OPENSIFT_SOUL_PATH=~/.opensift/SOUL.md
OPENSIFT_BREAK_REMINDERS_ENABLED=true
OPENSIFT_BREAK_REMINDER_EVERY_USER_MSGS=6
OPENSIFT_BREAK_REMINDER_MIN_MINUTES=45
CHATGPT_CODEX_OAUTH_TOKEN=
OPENSIFT_CODEX_CMD=codex
OPENSIFT_CODEX_ARGS=
OPENSIFT_CODEX_AUTH_PATH=/app/.codex/auth.json
OPENSIFT_MAX_URL_REDIRECTS=5
```
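The `OPENSIFT_LOG_*` variables map naturally onto Python's stdlib logging. A minimal sketch, assuming stdlib `RotatingFileHandler` (OpenSift's actual logging setup may differ):

```python
import logging
import logging.handlers
import os

# Sketch of wiring OPENSIFT_LOG_* env vars into a rotating file handler.
# The function name and defaults mirror the env var list above.
def build_file_handler(log_dir=None):
    log_dir = log_dir or os.environ.get("OPENSIFT_LOG_DIR", ".opensift_logs")
    os.makedirs(log_dir, exist_ok=True)
    handler = logging.handlers.RotatingFileHandler(
        os.path.join(log_dir, "opensift.log"),
        maxBytes=int(os.environ.get("OPENSIFT_LOG_MAX_BYTES", 5 * 1024 * 1024)),
        backupCount=int(os.environ.get("OPENSIFT_LOG_BACKUP_COUNT", 5)),
        delay=True,  # do not open the file until the first record is emitted
    )
    handler.setLevel(os.environ.get("OPENSIFT_LOG_LEVEL", "INFO"))
    return handler
```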

🧾 Logging

OpenSift now includes centralized logging across UI, gateway, terminal chat, and MCP server.

  • Default log file: backend/.opensift_logs/opensift.log
  • Console logging + rotating file logs are enabled by default
  • Configure with OPENSIFT_LOG_* env vars above

Health diagnostics:

  • GET `/health` now includes `diagnostics.codex_auth_detected` (boolean only, no secret values)

🎨 SOUL Personality (Study Style)

OpenSift now supports a global study style personality stored in ~/.opensift/SOUL.md by default (override with OPENSIFT_SOUL_PATH) and applied everywhere (UI, terminal, all owners).

  • UI: edit Global Study Style (SOUL) in the left sidebar, then click Save Style
  • Terminal: use /style, /style set <text>, /style clear
  • Styles are injected into generation prompts while still grounding answers in retrieved sources
  • Legacy per-owner SOUL entries are automatically migrated into the global style block
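The style-injection step can be sketched as a prompt wrapper. This is an illustrative assumption; OpenSift's actual prompt wording is not shown here:

```python
# Hypothetical sketch of injecting the global SOUL style into a generation
# prompt while keeping answers grounded in retrieved sources.
def apply_soul_style(base_prompt, soul_text):
    if not soul_text or not soul_text.strip():
        return base_prompt  # no style configured: prompt is unchanged
    return (
        "Follow this study style when answering:\n"
        f"{soul_text.strip()}\n\n"
        "Always ground answers in the retrieved sources.\n\n"
        f"{base_prompt}"
    )
```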

🧘 Wellness Break Reminders

OpenSift can proactively remind learners to pause, hydrate, and rest during long study sessions.

  • Reminders can include water/stretch/mental-health/sleep cues
  • Triggered periodically during chat sessions (UI + terminal)
  • Controlled by OPENSIFT_BREAK_REMINDER_* environment variables
  • UI controls are available in the left sidebar (enable toggle + frequency settings)
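The trigger implied by `OPENSIFT_BREAK_REMINDER_EVERY_USER_MSGS` and `OPENSIFT_BREAK_REMINDER_MIN_MINUTES` combines a message-count cadence with a minimum time gap. A sketch under assumed semantics (fire every N user messages, but never sooner than the minimum interval):

```python
import time

# Hypothetical reminder trigger; the exact interaction between the two
# OPENSIFT_BREAK_REMINDER_* knobs is an assumption.
def should_remind(user_msg_count, last_reminder_ts, now=None,
                  every_msgs=6, min_minutes=45):
    now = time.time() if now is None else now
    if user_msg_count == 0 or user_msg_count % every_msgs != 0:
        return False  # not at a message-count boundary yet
    return (now - last_reminder_ts) >= min_minutes * 60
```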

🧭 Roadmap

  • True token streaming from providers
  • Chat memory persistence (SQLite)
  • User authentication
  • Multi-user support
  • OCR support for scanned PDFs
  • Docker deployment
  • UI theming

---

📜 License

MIT

---

💡 Philosophy

OpenSift helps students focus on understanding, not searching.

It retrieves relevant material and organizes it intelligently so learners can study faster and retain more.
