-
Notifications
You must be signed in to change notification settings - Fork 0
Home
sarmakska edited this page May 3, 2026
·
5 revisions
Working vision OCR starter. Drop a photo of a receipt, get structured JSON.
Built by Sarma Linux. MIT licence.
Upload a photo of a receipt. The app sends it to a vision-capable language model and extracts structured fields: vendor name, address, transaction date and time, itemised line items with quantity and unit price, subtotal, tax, tip, total, currency, and payment method when visible.
Returns clean JSON, validated against a Zod schema. Renders the result as a table. Wire it to Supabase, Xero, QuickBooks, n8n, or whatever your finance stack needs. The hard part is solved.
- Small business teams replacing manual receipt entry.
- Builders prototyping an AI expense or bookkeeping product.
- Engineers who want to understand how vision models work end to end.
-
Image downscaling —
sharpresizes and re-encodes before the API call. Roughly 4x cost saving with no measurable accuracy loss on receipts. - Auto-rotate — EXIF orientation corrected before the model sees the image.
- Zod validation — malformed model output is rejected at the boundary, not downstream.
-
Persistence stub —
lib/persist.tsis a no-op by default. Drop in a Supabase insert or webhook. -
Swap path —
lib/vision.tsis one function. Replace its body to usegpt-4oor any other vision model.
| Cost element | Approx cost (Claude 3.5 Sonnet) |
|---|---|
| Image input + system prompt | ~£0.006 |
| Output JSON | ~£0.008 |
| Per scan | ~£0.013 |
Next.js 14 App Router, TypeScript, Anthropic Claude vision (claude-3-5-sonnet-latest), sharp, Zod, Tailwind CSS.
- Architecture — scan flow diagram, component table, failure modes, token cost table
- Quick-Start — clone, install, env vars, first scan
- Vision-Models — swapping to OpenAI, local Llama, model comparison
- Configuration — all env vars, tuning image size
- Wire-to-Database — Supabase, Xero, QuickBooks, n8n integration paths
- Edge-Cases — blurry images, multi-page PDFs, hand-written receipts
- Deployment — Vercel one-click, Node runtime requirement
- Roadmap — what is shipped and what is next