sarmakska edited this page May 3, 2026 · 5 revisions

receipt-scanner

Working vision OCR starter. Drop a photo of a receipt, get structured JSON.

Built by Sarma Linux. MIT licence.


What this is

Upload a photo of a receipt. The app sends it to a vision-capable language model and extracts structured fields: vendor name, address, transaction date and time, itemised line items with quantity and unit price, subtotal, tax, tip, total, currency, and payment method when visible.

The app returns JSON validated against a Zod schema and renders the result as a table. From there, wire it to Supabase, Xero, QuickBooks, n8n, or whatever your finance stack needs. The extraction step is already handled.
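To make the extracted shape concrete, here is a minimal sketch of the fields listed above together with a boundary check. It is a dependency-free stand-in written in plain TypeScript, not the project's Zod schema; every field name, `Receipt`, and `parseReceipt` are assumptions made for illustration.

```typescript
// Hypothetical field names for illustration; the project's real schema may differ.
interface LineItem {
  description: string;
  quantity: number;
  unitPrice: number;
}

interface Receipt {
  vendor: string;
  address: string | null;
  dateTime: string; // ISO 8601, e.g. "2026-05-03T12:41:00"
  currency: string; // ISO 4217 code, e.g. "GBP"
  lineItems: LineItem[];
  subtotal: number;
  tax: number;
  tip: number | null;
  total: number;
  paymentMethod: string | null; // null when not visible on the receipt
}

type Json = Record<string, unknown>;

const isObject = (v: unknown): v is Json =>
  typeof v === "object" && v !== null && !Array.isArray(v);
const isNum = (v: unknown): v is number =>
  typeof v === "number" && Number.isFinite(v);
const isStr = (v: unknown): v is string => typeof v === "string";
// Accepts an explicit null or a value passing `check`; a missing key is rejected.
const orNull = (check: (v: unknown) => boolean) => (v: unknown): boolean =>
  v === null || check(v);

const isLineItem = (v: unknown): boolean =>
  isObject(v) && isStr(v["description"]) && isNum(v["quantity"]) && isNum(v["unitPrice"]);

// Parse raw model output and reject anything that does not match the shape.
function parseReceipt(modelOutput: string): Receipt {
  const data: unknown = JSON.parse(modelOutput); // throws on non-JSON output
  const lineItems = isObject(data) ? data["lineItems"] : undefined;
  const valid =
    isObject(data) &&
    isStr(data["vendor"]) &&
    orNull(isStr)(data["address"]) &&
    isStr(data["dateTime"]) &&
    isStr(data["currency"]) &&
    Array.isArray(lineItems) &&
    lineItems.every(isLineItem) &&
    isNum(data["subtotal"]) &&
    isNum(data["tax"]) &&
    orNull(isNum)(data["tip"]) &&
    isNum(data["total"]) &&
    orNull(isStr)(data["paymentMethod"]);
  if (!valid) throw new Error("Model output does not match the receipt shape");
  return data as unknown as Receipt;
}
```

Throwing at this point keeps malformed output from reaching the table renderer or any downstream integration.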

Who this is for

  • Small business teams replacing manual receipt entry.
  • Builders prototyping an AI expense or bookkeeping product.
  • Engineers who want to understand how vision models work end to end.

Key features

  • Image downscaling — sharp resizes and re-encodes before the API call. Roughly 4x cost saving with no measurable accuracy loss on receipts.
  • Auto-rotate — EXIF orientation corrected before the model sees the image.
  • Zod validation — malformed model output is rejected at the boundary, not downstream.
  • Persistence stub — lib/persist.ts is a no-op by default. Drop in a Supabase insert or webhook.
  • Swap path — lib/vision.ts is one function. Replace its body to use gpt-4o or any other vision model.
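The downscaling step comes down to capping the longest edge while preserving aspect ratio. The sketch below shows only that arithmetic; in the app the resize itself is done by sharp. The helper name `fitWithin` is hypothetical, and the 1568 px cap is assumed from the token cost reference rather than confirmed as the app's default.

```typescript
interface Size {
  width: number;
  height: number;
}

// Scale a size down so its longest edge is at most maxEdge, keeping the
// aspect ratio. Images already within the limit are returned unchanged,
// so nothing is ever upscaled.
function fitWithin(size: Size, maxEdge: number): Size {
  const longest = Math.max(size.width, size.height);
  if (longest <= maxEdge) return { ...size };
  const scale = maxEdge / longest;
  return {
    width: Math.max(1, Math.round(size.width * scale)),
    height: Math.max(1, Math.round(size.height * scale)),
  };
}

// A 3024 x 4032 phone photo, capped at 1568 px on the longest edge.
const phonePhoto: Size = { width: 3024, height: 4032 };
const resized = fitWithin(phonePhoto, 1568);
console.log(resized); // { width: 1176, height: 1568 }
```

For this example the pixel count drops by a factor of about 6.6, which is why resizing before the API call matters for cost.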

Token cost reference (typical UK till receipt, 1568px max)

Cost element                   Approx cost (Claude 3.5 Sonnet)
Image input + system prompt    ~£0.006
Output JSON                    ~£0.008
Per scan                       ~£0.013
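For budgeting, the per-scan figure scales linearly with volume. A minimal sketch, assuming the approximate £0.013 per scan from the table holds for your receipts; `estimateCostGbp` is a hypothetical helper, not part of the repository.

```typescript
// Approximate per-scan cost in GBP, taken from the table above.
const COST_PER_SCAN_GBP = 0.013;

// Estimate spend for a number of scans, rounded to the nearest penny.
function estimateCostGbp(scans: number, costPerScan: number = COST_PER_SCAN_GBP): number {
  if (!Number.isInteger(scans) || scans < 0) {
    throw new RangeError("scans must be a non-negative integer");
  }
  return Math.round(scans * costPerScan * 100) / 100;
}

console.log(estimateCostGbp(1000)); // 13
```

Actual spend depends on receipt length and image size, so treat the result as an order-of-magnitude estimate.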

Stack

Next.js 14 App Router, TypeScript, Anthropic Claude vision (claude-3-5-sonnet-latest), sharp, Zod, Tailwind CSS.
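As a rough illustration of what a vision call with this stack carries, the sketch below builds a request body in the shape the Anthropic Messages API accepts for a base64 image plus a text instruction. The function name `buildScanRequest`, the prompt wording, and the `max_tokens` value are assumptions for illustration and are not taken from the repository.

```typescript
// Content blocks in the shape the Anthropic Messages API accepts.
interface ImageBlock {
  type: "image";
  source: { type: "base64"; media_type: "image/jpeg"; data: string };
}
interface TextBlock {
  type: "text";
  text: string;
}
interface ScanRequest {
  model: string;
  max_tokens: number;
  system: string;
  messages: { role: "user"; content: Array<ImageBlock | TextBlock> }[];
}

// Hypothetical prompt; tune the wording and the token limit for your receipts.
const SYSTEM_PROMPT = "Extract the receipt fields and reply with JSON only.";

function buildScanRequest(
  jpegBase64: string,
  model: string = "claude-3-5-sonnet-latest",
): ScanRequest {
  return {
    model,
    max_tokens: 1024,
    system: SYSTEM_PROMPT,
    messages: [
      {
        role: "user",
        content: [
          // The downscaled, re-encoded JPEG goes first, then the instruction.
          { type: "image", source: { type: "base64", media_type: "image/jpeg", data: jpegBase64 } },
          { type: "text", text: "Read this receipt." },
        ],
      },
    ],
  };
}
```

A body like this can be passed to the SDK's `messages.create` call or sent to the Messages endpoint directly. Keeping the payload construction in one function is what makes swapping models a local change.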


Wiki pages

  • Architecture — scan flow diagram, component table, failure modes, token cost table
  • Quick-Start — clone, install, env vars, first scan
  • Vision-Models — swapping to OpenAI, local Llama, model comparison
  • Configuration — all env vars, tuning image size
  • Wire-to-Database — Supabase, Xero, QuickBooks, n8n integration paths
  • Edge-Cases — blurry images, multi-page PDFs, hand-written receipts
  • Deployment — Vercel one-click, Node runtime requirement
  • Roadmap — what is shipped and what is next

Repository

github.com/sarmakska/receipt-scanner
