Voice-to-document API — Turn voice or text into structured drafts (invoices, quotes), render PDFs, and share via link. Built for handymen, artisans, and field workers who want to capture work and send professional documents from their phone.
Tensek (this repo) is the Go backend that powers:
- Speech-to-text — Transcribe audio with OpenAI Whisper
- Structured extraction — LLM turns transcript into invoice/quote JSON (client, line items, totals)
- PDF generation — Render documents with your logo and branding
- Storage & share — Upload to Supabase Storage (or S3-compatible), optional short links for sharing (e.g. WhatsApp, iMessage)
The API is stateless where possible; optional Postgres + Supabase give you persistence, documents, and short-link redirects.
| Feature | Description |
|---|---|
| Transcribe | POST /api/v1/voice/transcribe — Audio (base64 or multipart) → text |
| Drafts | POST /api/v1/drafts — Voice or text → structured invoice/quote JSON |
| Render | POST /api/v1/documents/render — Document JSON → PDF, upload, return URL |
| Short links | Optional short codes; frontend redirects /l/:code → storage URL |
| Invoices | List, get, mark paid, edit (voice/text) with LLM |
| Auth | JWT (Supabase or custom secret); optional DISABLE_AUTH for local dev |
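The Auth row mentions verifying JWTs against a Supabase or custom secret. The sketch below shows one way a shared-secret check could look using only the standard library. It assumes HS256-signed tokens and an `exp` claim. The `claims` type, `signHS256`, and `verifyHS256` are hypothetical names for illustration, and the code in internal/handler may well rely on a JWT library instead.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// claims holds only the fields this sketch checks; real tokens carry more.
type claims struct {
	Sub string `json:"sub"`
	Exp int64  `json:"exp"` // expiry, seconds since the Unix epoch
}

// JWT segments use unpadded base64url.
var b64 = base64.RawURLEncoding

// mac returns the HMAC-SHA256 of msg under secret.
func mac(msg string, secret []byte) []byte {
	h := hmac.New(sha256.New, secret)
	h.Write([]byte(msg))
	return h.Sum(nil)
}

// signHS256 builds a token. It exists only so the example can be exercised.
func signHS256(c claims, secret []byte) (string, error) {
	payload, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	msg := b64.EncodeToString([]byte(`{"alg":"HS256","typ":"JWT"}`)) + "." + b64.EncodeToString(payload)
	return msg + "." + b64.EncodeToString(mac(msg, secret)), nil
}

// verifyHS256 checks the algorithm, the signature, and the expiry of token.
func verifyHS256(token string, secret []byte, nowUnix int64) (claims, error) {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return claims{}, errors.New("malformed token")
	}
	headerJSON, err := b64.DecodeString(parts[0])
	if err != nil {
		return claims{}, errors.New("bad header encoding")
	}
	var header struct {
		Alg string `json:"alg"`
	}
	if err := json.Unmarshal(headerJSON, &header); err != nil || header.Alg != "HS256" {
		return claims{}, errors.New("unsupported algorithm")
	}
	sig, err := b64.DecodeString(parts[2])
	if err != nil || !hmac.Equal(sig, mac(parts[0]+"."+parts[1], secret)) {
		return claims{}, errors.New("invalid signature")
	}
	payload, err := b64.DecodeString(parts[1])
	if err != nil {
		return claims{}, errors.New("bad payload encoding")
	}
	var c claims
	if err := json.Unmarshal(payload, &c); err != nil {
		return claims{}, errors.New("bad payload")
	}
	if c.Exp <= nowUnix {
		return claims{}, errors.New("token expired")
	}
	return c, nil
}

func main() {
	secret := []byte("local-dev-secret") // stand-in for SUPABASE_JWT_SECRET
	tok, _ := signHS256(claims{Sub: "user-1", Exp: 2000}, secret)
	c, err := verifyHS256(tok, secret, 1000)
	fmt.Println(c.Sub, err)
}
```

Pinning the accepted algorithm before checking the signature avoids trusting whatever the token header claims, and `hmac.Equal` keeps the comparison constant-time.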
- Go 1.22+ — HTTP server, business logic
- Gin — Router, middleware
- OpenAI — Whisper (STT), GPT-4o-mini (extraction, edits)
- Postgres — Optional (Supabase Postgres or any DATABASE_URL)
- Supabase — Optional storage bucket + auth
- gofpdf — PDF generation (invoice layout, logo, font)
- Go 1.22+
- OpenAI API key (required) — for STT and extraction
- Supabase (optional) — for storage, auth, and DB
- PNG logo (optional) — e.g. tensek.png at project root for PDF header
- TTF font (optional) — e.g. LibreFranklin-Regular.ttf for PDF body text
git clone https://github.com/your-org/whisper.git
cd whisper
go mod download

Copy the example env and set at least your OpenAI key:
cp .env.example .env
# Edit .env: set OPENAI_API_KEY=sk-...

For local dev without auth:
DISABLE_AUTH=true
OPENAI_API_KEY=sk-...
# Optional: DATABASE_URL, SUPABASE_* for storage and DB

go run ./cmd/server

Server listens on :8080. Health: http://localhost:8080/health.
# Transcribe audio (body: base64 audio or multipart)
curl -X POST http://localhost:8080/api/v1/voice/transcribe \
-H "Content-Type: application/json" \
-d '{"audio":"<base64>"}'
# Create draft from text (no auth when DISABLE_AUTH=true)
curl -X POST http://localhost:8080/api/v1/drafts \
-H "Content-Type: application/json" \
-d '{"type":"invoice","text":"Client: Acme Corp. Two hours plumbing, 80 per hour. Total 160."}'

You can also import the Postman collection to explore all endpoints: postman/Whisper-API.postman_collection.json (in Postman: Import → upload that file).
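For callers written in Go, the same two requests can be built with net/http. This is a minimal sketch that mirrors the JSON bodies from the curl calls. The helper names (`encodeTranscribe`, `encodeDraft`, `newJSONPost`) are hypothetical, standard (padded) base64 is assumed for the audio field, and passing the JWT as an `Authorization: Bearer` header is an assumption rather than something this README specifies.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"net/http"
)

// transcribeBody mirrors the {"audio":"<base64>"} payload from the curl example.
type transcribeBody struct {
	Audio string `json:"audio"`
}

// draftBody mirrors the {"type":...,"text":...} payload from the curl example.
type draftBody struct {
	Type string `json:"type"`
	Text string `json:"text"`
}

// encodeTranscribe base64-encodes raw audio bytes and wraps them in JSON.
func encodeTranscribe(audio []byte) ([]byte, error) {
	return json.Marshal(transcribeBody{Audio: base64.StdEncoding.EncodeToString(audio)})
}

// encodeDraft builds the JSON body for a text-based draft request.
func encodeDraft(docType, text string) ([]byte, error) {
	return json.Marshal(draftBody{Type: docType, Text: text})
}

// newJSONPost builds a POST request with a JSON body.
// token may be empty when the server runs with DISABLE_AUTH=true.
func newJSONPost(url string, body []byte, token string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token) // assumed header scheme
	}
	return req, nil
}

func main() {
	body, err := encodeDraft("invoice", "Client: Acme Corp. Two hours plumbing, 80 per hour. Total 160.")
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	req, err := newJSONPost("http://localhost:8080/api/v1/drafts", body, "")
	if err != nil {
		fmt.Println("request error:", err)
		return
	}
	// Nothing is sent here; against a running server, pass req to http.DefaultClient.Do.
	fmt.Println(req.Method, req.URL.Path, string(body))
}
```

The response shapes are not reproduced here; see pkg/schema for the actual types.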
| Variable | Required | Description |
|---|---|---|
| PORT | No | Server port (default 8080) |
| OPENAI_API_KEY | Yes | OpenAI API key for Whisper + GPT |
| CORS_ALLOW_ORIGINS | No | Comma-separated origins or * (default *) |
| DISABLE_AUTH | No | true = skip JWT (testing only) |
| SUPABASE_URL | No* | Supabase project URL |
| SUPABASE_KEY | No* | Supabase anon or service key |
| SUPABASE_JWT_SECRET | No* | JWT secret for verifying access tokens |
| DATABASE_URL | No | Postgres connection string (Supabase or other) |
| FONT_PATH | No | Path to TTF font (default LibreFranklin-Regular.ttf) |
| LOGO_PATH | No | Path to PNG logo (default tensek.png) |
| FRONTEND_BASE_URL | No | Base URL for short links (e.g. https://app.example.com) |
| SMTP_* | No | Optional email sending (host, port, user, password, from) |
* Auth needs either SUPABASE_JWT_SECRET or both SUPABASE_URL + SUPABASE_KEY unless DISABLE_AUTH=true.
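The defaults and the auth rule above can be expressed as a small loader. The following is a sketch only: the `Config` struct, `loadConfig`, and `orDefault` are hypothetical names, not the contents of internal/config, and only a subset of the variables is covered. Taking a `getenv` function instead of calling `os.Getenv` directly is a design choice made here so the rules can be checked without touching the process environment.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Config covers the variables whose defaults or rules the table spells out.
type Config struct {
	Port              string
	OpenAIKey         string
	CORSOrigins       string
	DisableAuth       bool
	SupabaseURL       string
	SupabaseKey       string
	SupabaseJWTSecret string
	FontPath          string
	LogoPath          string
}

// orDefault returns def when v is empty.
func orDefault(v, def string) string {
	if v == "" {
		return def
	}
	return v
}

// loadConfig reads variables through getenv and applies the documented
// defaults and requirements.
func loadConfig(getenv func(string) string) (Config, error) {
	cfg := Config{
		Port:              orDefault(getenv("PORT"), "8080"),
		OpenAIKey:         getenv("OPENAI_API_KEY"),
		CORSOrigins:       orDefault(getenv("CORS_ALLOW_ORIGINS"), "*"),
		DisableAuth:       getenv("DISABLE_AUTH") == "true",
		SupabaseURL:       getenv("SUPABASE_URL"),
		SupabaseKey:       getenv("SUPABASE_KEY"),
		SupabaseJWTSecret: getenv("SUPABASE_JWT_SECRET"),
		FontPath:          orDefault(getenv("FONT_PATH"), "LibreFranklin-Regular.ttf"),
		LogoPath:          orDefault(getenv("LOGO_PATH"), "tensek.png"),
	}
	if cfg.OpenAIKey == "" {
		return Config{}, errors.New("OPENAI_API_KEY is required")
	}
	// Auth needs the JWT secret, or both the project URL and key,
	// unless auth is disabled outright.
	hasSecret := cfg.SupabaseJWTSecret != ""
	hasProject := cfg.SupabaseURL != "" && cfg.SupabaseKey != ""
	if !cfg.DisableAuth && !hasSecret && !hasProject {
		return Config{}, errors.New("set SUPABASE_JWT_SECRET, or SUPABASE_URL and SUPABASE_KEY, or DISABLE_AUTH=true")
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("would listen on :" + cfg.Port)
}
```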
whisper/
├── cmd/server/ # Entrypoint
├── internal/
│ ├── config/ # Env config
│ ├── extract/ # LLM extraction (invoice/quote)
│ ├── handler/ # HTTP handlers + auth
│ ├── ingest/ # Request parsing
│ ├── logger/ # Structured logging
│ ├── pdf/ # PDF layout (gofpdf)
│ ├── render/ # Render by type, storage upload
│ ├── send/ # Optional email
│ ├── storage/ # Supabase Storage client
│ ├── store/ # Postgres (documents, short links)
│ └── stt/ # OpenAI Whisper client
├── pkg/schema/ # Shared types (invoice, document types)
├── k8s/ # Kubernetes deployment
├── .github/workflows/ # CI (e.g. build + deploy)
├── docs/ # Prompts and integration notes
├── Dockerfile
├── .env.example
└── README.md
go run ./cmd/server

Or use air for live reload.

go test ./...

docker build -t whisper .
docker run -p 8080:8080 -e OPENAI_API_KEY=sk-... -e DISABLE_AUTH=true whisper

The Dockerfile expects tensek.png and LibreFranklin-Regular.ttf in the build context (project root). Omit or replace for your own branding.
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /health | No | Health check |
| GET | /api/v1/links/:code | No | Resolve short link → redirect |
| POST | /api/v1/voice/transcribe | Yes* | Audio → text |
| POST | /api/v1/drafts | Yes* | Voice/text → structured document |
| POST | /api/v1/documents/render | Yes* | Document → PDF, upload, URL |
| GET | /api/v1/invoices/stats | Yes* | Invoice stats |
| GET | /api/v1/invoices | Yes* | List invoices |
| GET | /api/v1/invoices/:id | Yes* | Get invoice |
| POST | /api/v1/invoices/:id/paid | Yes* | Mark paid |
| POST | /api/v1/invoices/:id/edit | Yes* | Edit invoice (voice/text) |
* Not required when DISABLE_AUTH=true.
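The split between public and protected routes, and the DISABLE_AUTH bypass, can be illustrated with a small router. The sketch below deliberately uses the standard library's net/http instead of Gin so that it has no dependencies, and it is not the project's handler code. The in-memory `links` map stands in for the Postgres short-link store, `staticToken` stands in for real JWT verification, the 302 status and the `Bearer` header scheme are assumptions, and HTTP method checks are omitted for brevity.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// staticToken returns a verifier that accepts exactly one token.
// It is a placeholder for real JWT verification.
func staticToken(valid string) func(string) error {
	return func(token string) error {
		if token != valid {
			return errors.New("invalid token")
		}
		return nil
	}
}

// requireAuth wraps next and rejects requests without a valid bearer token.
// It lets everything through when disableAuth is true.
func requireAuth(disableAuth bool, verify func(string) error, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !disableAuth {
			token, ok := strings.CutPrefix(r.Header.Get("Authorization"), "Bearer ")
			if !ok || verify(token) != nil {
				http.Error(w, "unauthorized", http.StatusUnauthorized)
				return
			}
		}
		next.ServeHTTP(w, r)
	})
}

// newRouter wires two public routes and one protected route from the table.
func newRouter(disableAuth bool, links map[string]string, verify func(string) error) http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	})
	mux.HandleFunc("/api/v1/links/", func(w http.ResponseWriter, r *http.Request) {
		code := strings.TrimPrefix(r.URL.Path, "/api/v1/links/")
		target, found := links[code]
		if !found {
			http.NotFound(w, r)
			return
		}
		http.Redirect(w, r, target, http.StatusFound)
	})
	invoices := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, "[]") // placeholder payload
	})
	mux.Handle("/api/v1/invoices", requireAuth(disableAuth, verify, invoices))
	return mux
}

// probe sends one in-memory GET and returns the status code and Location header.
func probe(h http.Handler, path, token string) (int, string) {
	req := httptest.NewRequest(http.MethodGet, path, nil)
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Header().Get("Location")
}

func main() {
	links := map[string]string{"abc123": "https://storage.example.com/invoice.pdf"}
	router := newRouter(false, links, staticToken("demo-token"))
	for _, path := range []string{"/health", "/api/v1/links/abc123", "/api/v1/invoices"} {
		status, location := probe(router, path, "")
		fmt.Println(path, status, location)
	}
}
```

Keeping the bypass inside the middleware means the route table stays the same in development and production; only the flag changes.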
Request/response shapes: see pkg/schema and the handlers in internal/handler. Frontend integration notes: docs/NEXTJS_PROMPT.md.
- Kubernetes — See k8s/deployment.yaml; inject env (e.g. Secret for OPENAI_API_KEY, DATABASE_URL).
- CI — .github/workflows/kube.yaml shows an example build-and-deploy; set IMAGE_NAME, APP_NAME, DOMAIN and secrets for your registry and cluster.
The Tensek logo at the top of this README is tensek.png in the repo root. The same file is used in generated PDFs when LOGO_PATH is set. For your own fork:
- Replace tensek.png with your logo (PNG recommended).
- Or set LOGO_PATH to another path and ensure it’s available in the container if you deploy with Docker/Kubernetes.
Contributions are welcome. Please open an issue or a PR. For large changes, open an issue first to align on direction.
- Run go test ./... and keep the build green.
- Follow existing style (Go standard layout, structured logging in internal/logger).
This project is open source under the MIT License.
