# tiny-oracle

> Ask anything. Get answers from your corpus — and beyond.
tiny-oracle is a terminal-first Q&A system where a lightweight CLI connects to a high-throughput server that retrieves context from a massive corpus and generates intelligent answers. Built to scale. Built to reason.
## Architecture

```
┌─────────────────────────────────────────────────────────┐
│                         CLIENT                          │
│                                                         │
│   $ tiny-oracle ask "What is the boiling point of X?"   │
│                           │                             │
│                        CLI (Go)                         │
│                           │                             │
│                    HTTP/gRPC Request                    │
└───────────────────────────┼─────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────┐
│                         SERVER                          │
│                                                         │
│  ┌──────────┐    ┌───────────────┐    ┌──────────────┐  │
│  │  Router  │───▶│   Retriever   │───▶│    Answer    │  │
│  │  (HTTP)  │    │ (Vector Search│    │  Generator   │  │
│  │          │    │    / RAG)     │    │  (LLM layer) │  │
│  └──────────┘    └───────────────┘    └──────────────┘  │
│                          │                              │
│               ┌──────────▼──────────┐                   │
│               │    Corpus Store     │                   │
│               │    (Vector DB +     │                   │
│               │   Raw Documents)    │                   │
│               └─────────────────────┘                   │
└─────────────────────────────────────────────────────────┘
```
## Phase 1: Core Plumbing

**Goal:** CLI and server can talk to each other. No intelligence yet.
- `tiny-oracle` CLI binary (Go) with `ask <question>` command
- Sends HTTP POST to server, prints response to terminal
- Basic Go HTTP server with `POST /ask` endpoint (sketched below)
- Config file support (`~/.tiny-oracle/config.yaml`) — server URL, auth token
- Graceful error handling (server down, timeout, etc.)
Milestone: $ tiny-oracle ask "hello" → server responds "hello received"

## Phase 2: Corpus Ingestion & Retrieval

**Goal:** Server can search a real corpus and return relevant chunks.
- Document ingestion pipeline — accepts `.txt`, `.pdf`, `.md` files
- Chunks documents into passages and embeds them (see the chunking sketch below)
- Vector database integration (Qdrant or pgvector)
- `tiny-oracle ingest ./my-corpus/` CLI command
- Server retrieves top-K relevant chunks for any incoming query
Milestone: $ tiny-oracle ask "X" → returns relevant raw passages from corpus

## Phase 3: LLM Answer Generation

**Goal:** Server uses retrieved context + LLM to generate a clean answer.
- LLM integration (Claude API / OpenAI / local Ollama)
- Prompt: `[System] + [Retrieved Context] + [User Question]` (assembled as sketched below)
- Token streaming — CLI displays the answer in real time
- Fallback: "I don't have enough information" when context is weak
- Optional: single-session conversation history
Milestone: $ tiny-oracle ask "What is X?" → coherent, context-grounded answer

## Phase 4: Scale & Performance

**Goal:** Server handles massive concurrent load without breaking.
- Go goroutines + worker pool for request handling (sketched below)
- Connection pooling for vector DB
- Caching layer for frequent queries (Redis or in-memory LRU)
- API key authentication + per-user rate limiting
- Observability: `/health`, `/metrics`, latency tracking
- Load testing at 10k → 100k → 1M concurrent requests

**Milestone:** Sustained high-concurrency load with <200ms p95 latency
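
Go's `net/http` already runs each request on its own goroutine, so the pool's real job is to bound concurrency on the expensive retrieval + LLM path rather than to spawn work. A semaphore-style sketch; the interface is an assumption:

```go
package pool

import "context"

// Pool caps how many expensive operations (retrieval + LLM calls)
// run at once; excess requests wait until a slot frees up.
type Pool struct {
	sem chan struct{}
}

func New(size int) *Pool {
	return &Pool{sem: make(chan struct{}, size)}
}

// Do runs fn once a slot is free, or gives up early if the caller's
// context (e.g. the HTTP request) is cancelled while waiting.
func (p *Pool) Do(ctx context.Context, fn func() error) error {
	select {
	case p.sem <- struct{}{}:
		defer func() { <-p.sem }()
		return fn()
	case <-ctx.Done():
		return ctx.Err()
	}
}
```

The router could wrap the Retriever → Answer Generator call in `Do(r.Context(), ...)` and map a cancelled context to an HTTP error, so queued requests fail fast instead of piling up.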

## Phase 5: Reasoning Layer

**Goal:** Answer questions that go beyond the corpus using logic and reasoning.
- Query classifier — corpus retrieval vs. pure reasoning (heuristic sketch below)
- Reasoning engine with chain-of-thought prompting
- Handles riddles, logic puzzles, spatial reasoning, math
- Hybrid mode: corpus + reasoning for complex questions
- Honest fallback hierarchy:
  1. Check corpus → found → generate grounded answer
  2. Not found → attempt reasoning
  3. Can't reason confidently → say so
Milestone: $ tiny-oracle ask "I am behind Sumon, Sumon is behind me. How?" → "Both are facing away from each other."

## Tech Stack

| Layer | Technology |
|---|---|
| CLI Client | Go (cobra) |
| HTTP Server | Go (net/http or chi) |
| Embedding | Python worker / OpenAI API / local model |
| Vector Store | Qdrant (or pgvector) |
| LLM | Claude API (Anthropic) |
| Cache | Redis (Phase 4) |
| Config | YAML (~/.tiny-oracle/config.yaml) |
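
The last row points at the Phase 1 config file. A sketch of loading it with `gopkg.in/yaml.v3`; only the server URL and auth token are named above, and the YAML key names here are assumptions:

```go
package config

import (
	"os"
	"path/filepath"

	"gopkg.in/yaml.v3"
)

// Config mirrors ~/.tiny-oracle/config.yaml. Key names are illustrative.
type Config struct {
	ServerURL string `yaml:"server_url"`
	AuthToken string `yaml:"auth_token"`
}

// Load reads the config file from the user's home directory.
func Load() (*Config, error) {
	home, err := os.UserHomeDir()
	if err != nil {
		return nil, err
	}
	data, err := os.ReadFile(filepath.Join(home, ".tiny-oracle", "config.yaml"))
	if err != nil {
		return nil, err
	}
	var cfg Config
	if err := yaml.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}
```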

## Quick Start

```sh
# Clone the repo
git clone https://github.com/yourname/tiny-oracle
cd tiny-oracle

# Build CLI
go build -o tiny-oracle ./cmd/cli

# Run server
go run ./cmd/server

# Ask a question
./tiny-oracle ask "What is the speed of light?"
```

## Philosophy

Most Q&A systems are either too dumb (keyword search) or too expensive (pure LLM).
tiny-oracle is built on a simple belief:

> Retrieved knowledge + real reasoning = genuine intelligence.

Start grounded. Think when needed. Be honest when you don't know.

## Roadmap

- Phase 1: Core plumbing
- Phase 2: Corpus ingestion & retrieval
- Phase 3: LLM answer generation
- Phase 4: Scale & performance
- Phase 5: Reasoning layer

## License

MIT