Home
Working vision OCR starter. Drop a photo of a receipt, get structured JSON.
Built by Sarma Linux. MIT licence. Source at github.com/sarmakska/receipt-scanner.
Upload a photo of a receipt. The app sends it to a vision-capable language model and extracts structured fields: vendor name, address, transaction date and time, itemised line items with quantity and unit price, subtotal, tax, tip, total, currency, and payment method when visible.
Returns clean JSON, validated against a Zod schema. Renders the result as a table. Wire it to Supabase, Xero, QuickBooks, n8n, or whatever your finance stack needs. The hard part is solved.
- Small business teams replacing manual receipt entry.
- Builders prototyping an AI expense or bookkeeping product.
- Engineers who want to understand how vision models work end to end.
This is a single-process Next.js 14 application. There is no separate worker, queue, or database in the default build. The whole pipeline runs server-side inside one API route, which keeps the API key off the client and makes the cost surface easy to reason about.
```mermaid
flowchart TD
    A[Browser: app/page.tsx] -->|multipart upload| B[app/api/scan/route.ts]
    B --> C[sharp: resize, re-encode, auto-rotate]
    C --> D[lib/vision.ts: single vision API call]
    D --> E[lib/schema.ts: Zod parse + validate]
    E -->|valid| F[lib/persist.ts: save stub]
    E -->|valid| G[JSON response back to UI table]
    E -->|invalid| H[400 with validation error]
```
The request lifecycle, step by step:
1. Upload. app/page.tsx posts the file as multipart form data to /api/scan. No client-side processing, so the browser never holds an API key.
2. Pre-process. sharp reads the bytes, corrects EXIF orientation, downscales the longest edge to MAX_IMAGE_PX (default 1568), and re-encodes to JPEG. This is the single biggest cost lever: a 12MP phone photo becomes a fraction of the input tokens with no measurable accuracy loss on receipts.
3. Vision call. lib/vision.ts sends the base64 image plus a JSON-only system prompt to the model. This is one function and one network call. Everything provider-specific lives here.
4. Validate. The raw model output is parsed by the Zod schema in lib/schema.ts. Malformed output is rejected at this boundary, so a hallucinated or truncated response never reaches your database or your UI.
5. Persist and respond. Valid receipts pass through lib/persist.ts (a no-op stub by default) and are returned to the UI, which renders them as a table.
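To make step 2 concrete, here is a minimal sketch of the longest-edge arithmetic. It is not the project's code: in the app, sharp performs the actual resize. `fitWithin` and `Size` are hypothetical names used only to show how much a typical phone photo shrinks at the default MAX_IMAGE_PX of 1568.

```typescript
// Hypothetical helper (not part of the repository): computes the output
// dimensions when the longest edge is capped at maxEdge, preserving the
// aspect ratio and never upscaling.
interface Size {
  width: number;
  height: number;
}

function fitWithin(size: Size, maxEdge: number): Size {
  const longest = Math.max(size.width, size.height);
  if (longest <= maxEdge) {
    return { ...size }; // already small enough, leave untouched
  }
  const scale = maxEdge / longest;
  return {
    width: Math.round(size.width * scale),
    height: Math.round(size.height * scale),
  };
}

// 4032 x 3024 is a common 12MP phone resolution (assumed for illustration).
const original: Size = { width: 4032, height: 3024 };
const resized = fitWithin(original, 1568);

// Ratio of pixel counts before and after the downscale.
const pixelRatio =
  (original.width * original.height) / (resized.width * resized.height);
```

Under these assumptions the frame becomes 1568 x 1176, about 6.6 times fewer pixels. That is a pixel ratio, not a token ratio: the actual token saving depends on how the provider bills image input.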
The schema is the contract. The model is asked for JSON, but models occasionally return prose, partial objects, or wrong types. Rather than trust the output, every scan must parse() clean before anything downstream sees it. This is why swapping the model (Claude to gpt-4o to a local Llama) requires no changes outside lib/vision.ts: the rest of the app only ever sees a validated Receipt.
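The boundary described above can be sketched without any dependencies. The block below is a hand-written stand-in for the Zod schema in lib/schema.ts, not a copy of it: the field names (`vendor`, `currency`, `total`, `lineItems`) and the helpers `parseReceipt` and `statusFor` are assumptions chosen for illustration, and the real contract covers more fields.

```typescript
// Dependency-free stand-in for the Zod schema in lib/schema.ts.
// Field names are assumptions, not the repository's actual contract.
interface LineItem {
  description: string;
  quantity: number;
  unitPrice: number;
}

interface Receipt {
  vendor: string;
  currency: string;
  total: number;
  lineItems: LineItem[];
}

type ParseResult =
  | { ok: true; receipt: Receipt }
  | { ok: false; error: string };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function isFiniteNumber(value: unknown): value is number {
  return typeof value === "number" && Number.isFinite(value);
}

function parseLineItem(value: unknown): LineItem | null {
  if (!isRecord(value)) return null;
  const { description, quantity, unitPrice } = value;
  if (typeof description !== "string") return null;
  if (!isFiniteNumber(quantity) || !isFiniteNumber(unitPrice)) return null;
  return { description, quantity, unitPrice };
}

// Parses raw model output. Anything that is not valid JSON of the expected
// shape is rejected here, before persistence or rendering.
function parseReceipt(raw: string): ParseResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, error: "model output is not valid JSON" };
  }
  if (!isRecord(data)) return { ok: false, error: "expected a JSON object" };

  const { vendor, currency, total, lineItems } = data;
  if (typeof vendor !== "string") return { ok: false, error: "vendor must be a string" };
  if (typeof currency !== "string") return { ok: false, error: "currency must be a string" };
  if (!isFiniteNumber(total)) return { ok: false, error: "total must be a number" };
  if (!Array.isArray(lineItems)) return { ok: false, error: "lineItems must be an array" };

  const items: LineItem[] = [];
  for (const item of lineItems) {
    const parsed = parseLineItem(item);
    if (parsed === null) return { ok: false, error: "invalid line item" };
    items.push(parsed);
  }
  return { ok: true, receipt: { vendor, currency, total, lineItems: items } };
}

// Maps the parse result onto the HTTP status the route would return.
function statusFor(result: ParseResult): 200 | 400 {
  return result.ok ? 200 : 400;
}
```

Prose, truncated JSON, and a total returned as the string "12.50" all take the same rejection path, which is the property that lets the rest of the app treat a Receipt as trusted.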
| File | Responsibility |
|---|---|
| app/page.tsx | Upload UI, renders the parsed receipt table |
| app/api/scan/route.ts | Orchestrates the pipeline, returns validated JSON or a 400 |
| lib/vision.ts | The single vision API call. The only provider-specific code |
| lib/schema.ts | The Zod contract. Receipt and lineItem types |
| lib/persist.ts | save() stub. Replace with a Supabase insert or webhook |
| docs/schema.sql | Postgres / Supabase tables that mirror the Zod contract |
Expense capture for a small team. Staff snap a photo on their phone, the scan returns structured fields, and you insert straight into Supabase. Wire lib/persist.ts to a single insert against the tables in docs/schema.sql. See Wire-to-Database.
Feeding an accounting tool. After a valid scan, POST the JSON to the Xero or QuickBooks expense API. The validated Receipt shape maps cleanly onto their expense models. The mapping lives next to your save() call.
Automation fan-out with n8n. Add a webhook target in app/api/scan/route.ts and POST every validated receipt to an n8n workflow. From there you can branch on vendor, route for approval, or push to a spreadsheet without touching this codebase again.
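As a rough sketch of the n8n fan-out, the helper below builds the arguments for a fetch call that forwards a validated receipt. Everything here is an assumption for illustration: `buildWebhookRequest`, the reduced `Receipt` shape, and the example.com URL are placeholders, and the repository does not ship this code.

```typescript
// Hypothetical, reduced receipt shape; the real contract lives in lib/schema.ts.
interface Receipt {
  vendor: string;
  currency: string;
  total: number;
}

interface WebhookRequest {
  url: string;
  init: {
    method: "POST";
    headers: Record<string, string>;
    body: string;
  };
}

// Hypothetical helper: builds the fetch arguments for posting a validated
// receipt to a webhook. It performs no network I/O itself.
function buildWebhookRequest(webhookUrl: string, receipt: Receipt): WebhookRequest {
  if (!webhookUrl.startsWith("https://")) {
    throw new Error("webhook URL must use https");
  }
  return {
    url: webhookUrl,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(receipt),
    },
  };
}

// Placeholder URL; a real n8n webhook URL comes from your own workflow.
const request = buildWebhookRequest("https://n8n.example.com/webhook/receipts", {
  vendor: "Corner Cafe",
  currency: "GBP",
  total: 4.5,
});

// In the route, after validation succeeds, this would be sent with:
//   await fetch(request.url, request.init);
```

Keeping the request construction separate from the fetch call makes the payload easy to unit test and keeps the webhook failure handling in one place in the route.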
Provider benchmarking. Want to compare Claude against gpt-4o on your own receipts? Replace the body of lib/vision.ts, keep the same JSON contract, and the UI and validation stay identical. See Vision-Models.
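One simple way to compare two providers on the same receipt is to diff their validated outputs field by field. This is a sketch under stated assumptions, not part of the repository: `ReceiptSummary` and `differingFields` are hypothetical, and the flat field set is a simplification of the real Receipt type.

```typescript
// Hypothetical flat summary of a validated receipt; field names are assumptions.
interface ReceiptSummary {
  vendor: string;
  currency: string;
  total: number;
  lineItemCount: number;
}

// Returns the names of the fields on which two provider outputs disagree.
function differingFields(
  a: ReceiptSummary,
  b: ReceiptSummary
): (keyof ReceiptSummary)[] {
  const keys: (keyof ReceiptSummary)[] = [
    "vendor",
    "currency",
    "total",
    "lineItemCount",
  ];
  return keys.filter((key) => a[key] !== b[key]);
}

// Made-up outputs from two providers for the same photo.
const fromProviderA: ReceiptSummary = {
  vendor: "Corner Cafe",
  currency: "GBP",
  total: 12.5,
  lineItemCount: 3,
};
const fromProviderB: ReceiptSummary = {
  vendor: "Corner Cafe",
  currency: "GBP",
  total: 12.8,
  lineItemCount: 3,
};

const disagreements = differingFields(fromProviderA, fromProviderB);
```

Because both outputs have already passed the same schema, any disagreement is a genuine reading difference rather than a formatting one. Exact string equality is strict for vendor names, so a real benchmark would probably normalise case and whitespace first.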
| Cost element | Approx cost (Claude 3.5 Sonnet) |
|---|---|
| Image input + system prompt | ~£0.006 |
| Output JSON | ~£0.008 |
| Per scan | ~£0.013 |
Resizing in step 2 is what keeps this number small. Disabling the downscale roughly quadruples the input token cost on a full-resolution phone photo.
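For budgeting, the per-scan figure scales linearly with volume. The sketch below uses the approximate ~£0.013 from the table; `estimateSpendGbp` and the monthly volume are hypothetical, and real spend will vary with receipt length and provider pricing.

```typescript
// Per-scan estimate taken from the table above (approximate, in GBP).
const PER_SCAN_GBP = 0.013;

// Returns the estimated spend for a number of scans, rounded to whole pence.
function estimateSpendGbp(
  scans: number,
  perScanGbp: number = PER_SCAN_GBP
): number {
  if (!Number.isInteger(scans) || scans < 0) {
    throw new RangeError("scans must be a non-negative integer");
  }
  return Math.round(scans * perScanGbp * 100) / 100;
}

// Hypothetical volume: 40 receipts per working day, 21 working days a month.
const monthlyScans = 40 * 21;
const monthlySpend = estimateSpendGbp(monthlyScans);
```

At that assumed volume of 840 scans a month the estimate comes to roughly £10.92.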
Missing ANTHROPIC_API_KEY or a 401 from the model. Copy .env.example to .env.local and set a key with vision access. The key is read server-side only; it is never exposed to the browser. Restart the dev server after changing env files.
The build fails with a sharp native binary error. sharp ships platform-specific binaries. If your package manager skipped its build script, run the rebuild step for it (pnpm rebuild sharp). On serverless platforms, confirm the platform provides the native libraries. Vercel does. See Deployment.
A scan returns a 400 validation error. The model returned output that did not satisfy the Zod schema. This is the boundary doing its job. Inspect the raw response, and if the failure is systematic, tighten the system prompt in lib/vision.ts or relax the affected field in lib/schema.ts. See Edge-Cases.
HEIC photos from iPhones fail to decode. HEIC support depends on the sharp build on your platform. Locally this usually works; on serverless, verify HEIC is supported or convert to JPEG upstream.
Multi-page PDF receipts only read the first page. By design this handles one image per scan. Rasterise each page upstream and scan them individually. See Edge-Cases.
Blurry or low-light photos return sparse fields. The model returns what it can read. Improve capture conditions, or raise MAX_IMAGE_PX to retain more detail at the cost of more tokens. See Configuration.
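Several of the issues above come down to environment variables. A minimal sketch of reading them defensively is shown below, assuming only the two names that appear in this page (ANTHROPIC_API_KEY and MAX_IMAGE_PX, default 1568). `loadConfig` and `ScanConfig` are hypothetical and may not match how the repository actually reads its configuration.

```typescript
interface ScanConfig {
  apiKey: string;
  maxImagePx: number;
}

const DEFAULT_MAX_IMAGE_PX = 1568;

// Hypothetical loader: fails fast with a readable message instead of letting
// a missing key surface later as a 401 from the model.
function loadConfig(env: Record<string, string | undefined>): ScanConfig {
  const apiKey = env.ANTHROPIC_API_KEY?.trim();
  if (!apiKey) {
    throw new Error(
      "ANTHROPIC_API_KEY is not set; copy .env.example to .env.local"
    );
  }

  const rawPx = env.MAX_IMAGE_PX;
  if (rawPx === undefined || rawPx.trim() === "") {
    return { apiKey, maxImagePx: DEFAULT_MAX_IMAGE_PX };
  }

  const maxImagePx = Number(rawPx);
  if (!Number.isInteger(maxImagePx) || maxImagePx <= 0) {
    throw new Error(`MAX_IMAGE_PX must be a positive integer, got "${rawPx}"`);
  }
  return { apiKey, maxImagePx };
}

// On the server this would be called as loadConfig(process.env).
```

Taking the environment as a parameter rather than reading process.env directly keeps the function testable and makes it obvious that it only runs server-side.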
Next.js 14 App Router, TypeScript, Anthropic Claude vision (claude-3-5-sonnet-latest), sharp, Zod, Tailwind CSS.
- Architecture: scan flow diagram, component table, failure modes, token cost table
- Quick-Start: clone, install, env vars, first scan
- Vision-Models: swapping to OpenAI, local Llama, model comparison
- Configuration: all env vars, tuning image size
- Wire-to-Database: Supabase, Xero, QuickBooks, n8n integration paths
- Edge-Cases: blurry images, multi-page PDFs, hand-written receipts
- Deployment: Vercel one-click, Node runtime requirement
- Roadmap: what is shipped and what is next