Deployment

sarmakska edited this page May 31, 2026 · 2 revisions

Three viable paths. Pick the one that matches your team's ops appetite.

Path A: Vercel (recommended for solo / small teams)

Easiest. Deploys on every push. Free tier covers most personal projects.

Steps

  1. Push your fork to GitHub
  2. Go to vercel.com/new, import the repo
  3. Vercel detects Next.js, no config needed
  4. In environment variables, add OPENAI_API_KEY
  5. Click Deploy

The first deploy is typically live in about 90 seconds. Subsequent pushes auto-deploy.

Custom domain

Vercel project → Settings → Domains. Add yours, point DNS, done.

Limits to know

Limit | Free | Pro ($20/mo)
Function invocations | 100k/mo | 1M/mo
Function memory | 1024MB | 3008MB
Function timeout | 10s default, 60s max | up to 300s
Bandwidth | 100GB | 1TB

The 60-second function timeout, which is the Free-tier maximum, is set in app/api/upload/route.ts (export const maxDuration = 60). This is enough for PDFs up to ~200 pages with text-embedding-3-small. Above that, switch to background processing or use a deployment target with a longer timeout.
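One way to notice when an upload is approaching that limit is to track the remaining time budget while embedding. The sketch below is an illustration under assumptions, not the project's actual route code: `SAFETY_MARGIN_MS`, `remainingBudgetMs`, and `embedWithinBudget` are hypothetical names, and `embedBatch` stands in for whatever function calls the embeddings API.

```typescript
// Sketch only: a time-budget guard for a long-running upload handler.
// maxDuration mirrors the value exported from app/api/upload/route.ts.
const maxDuration = 60; // seconds

// Hypothetical margin reserved for building the response and writing the index.
const SAFETY_MARGIN_MS = 5_000;

// Milliseconds left before the handler should stop starting new work.
function remainingBudgetMs(startedAt: number, now: number = Date.now()): number {
  const budget = maxDuration * 1000 - SAFETY_MARGIN_MS;
  return Math.max(0, budget - (now - startedAt));
}

// Embeds batches in order and stops early once the budget is spent.
// Returns how many batches completed, so the caller can hand the rest
// to background processing.
async function embedWithinBudget<T>(
  batches: T[],
  embedBatch: (batch: T) => Promise<void>,
  startedAt: number = Date.now(),
  clock: () => number = Date.now,
): Promise<number> {
  let done = 0;
  for (const batch of batches) {
    if (remainingBudgetMs(startedAt, clock()) === 0) break;
    await embedBatch(batch);
    done += 1;
  }
  return done;
}
```

If `embedWithinBudget` returns fewer batches than it was given, the document is too large for a single request at the current timeout.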

Path B: Docker on a VPS (recommended for production)

More control, predictable cost, no platform lock-in.

Dockerfile

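# Stage 1: install dependencies from the lockfile only, so this layer caches well
FROM node:24-alpine AS deps
WORKDIR /app
COPY package*.json pnpm-lock.yaml* ./
RUN corepack enable && pnpm install --frozen-lockfile

# Stage 2: build the app (requires output: 'standalone' in next.config.mjs)
FROM node:24-alpine AS builder
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
RUN corepack enable && pnpm build

# Stage 3: minimal runtime image containing only the standalone output
FROM node:24-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production
# Bind to all interfaces so the published port is reachable from outside the container
ENV HOSTNAME=0.0.0.0
COPY --from=builder /app/.next/standalone ./
COPY --from=builder /app/.next/static ./.next/static
COPY --from=builder /app/public ./public
EXPOSE 3000
CMD ["node", "server.js"]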

Add output: 'standalone' to next.config.mjs to make the standalone build available.
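For reference, a minimal config with that setting might look like the following. This is a sketch that assumes no other options are set; anything already in your file stays alongside `output`.

```typescript
// next.config.mjs
/** @type {import('next').NextConfig} */
const nextConfig = {
  // Emit .next/standalone with server.js and a trimmed node_modules,
  // which the runner stage of the Dockerfile copies.
  output: 'standalone',
};

export default nextConfig;
```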

Run

docker build -t rag-over-pdf .
docker run -d \
  -e OPENAI_API_KEY=sk-... \
  -p 3000:3000 \
  --restart unless-stopped \
  rag-over-pdf

Recommended VPS

  • Hetzner CCX13 (€13/mo, 2 vCPU, 8GB) — fits this app comfortably with room for pgvector
  • Fly.io (~$5/mo at low scale) — global edge, easy deploy
  • Render ($7/mo starter) — Heroku-feel, simple

Front it with Caddy or Cloudflare for HTTPS.

Path C: Self-host with persistent vector DB

When you outgrow the in-memory store, you need Postgres + pgvector running too.

Topology

graph LR
  CDN[Cloudflare] --> NX[Caddy]
  NX --> APP[rag-over-pdf<br/>Node container]
  APP --> PG[(Postgres 16<br/>pgvector)]
  APP --> OAI[OpenAI API]

  classDef ext fill:#a78bfa,stroke:#a78bfa,color:#fff
  class OAI ext

docker-compose.yml

services:
  app:
    build: .
    environment:
      OPENAI_API_KEY: ${OPENAI_API_KEY}
      DATABASE_URL: postgresql://rag:${PG_PASS}@db:5432/rag
    depends_on: [db]
    restart: unless-stopped
    ports: ["3000:3000"]
  db:
    image: pgvector/pgvector:pg16
    environment:
      POSTGRES_USER: rag
      POSTGRES_PASSWORD: ${PG_PASS}
      POSTGRES_DB: rag
    volumes: [pgdata:/var/lib/postgresql/data]
    restart: unless-stopped
volumes:
  pgdata:

Once Postgres is up, add the table schema from the Swap to pgvector page.

Environment variables checklist

Before deploying, verify all of these:

  • OPENAI_API_KEY set
  • EMBEDDING_MODEL if overriding default
  • CHAT_MODEL if overriding default
  • DATABASE_URL if using pgvector
  • Function timeout extended if you'll index large PDFs
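The variable checks in this list can also be run at startup so a misconfigured deploy fails fast. The sketch below assumes that only `OPENAI_API_KEY` is always required and that `DATABASE_URL` is required only on the pgvector path; `checkEnv`, `EnvReport`, and the `usePgvector` flag are hypothetical names, not part of the project.

```typescript
// Sketch only: report missing settings before the app starts serving traffic.
type Env = Record<string, string | undefined>;

interface EnvReport {
  missing: string[];   // required variables that are unset or empty
  overrides: string[]; // optional variables that are set
}

// usePgvector is a hypothetical flag: pass true when deploying Path C.
function checkEnv(env: Env, usePgvector: boolean): EnvReport {
  const required = ["OPENAI_API_KEY", ...(usePgvector ? ["DATABASE_URL"] : [])];
  const optional = ["EMBEDDING_MODEL", "CHAT_MODEL"];
  const isSet = (name: string): boolean => (env[name] ?? "").trim() !== "";
  return {
    missing: required.filter((name) => !isSet(name)),
    overrides: optional.filter((name) => isSet(name)),
  };
}
```

In a Node process this would be called as `checkEnv(process.env, true)`, exiting with an error if `missing` is non-empty. The function timeout is a platform setting, so it still has to be checked by hand.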

Smoke test after deploy

# Should return 200
curl -I https://your-domain.com

# Upload a PDF (returns a docId, page count, and the document list)
curl -F "file=@test.pdf" https://your-domain.com/api/upload

# List indexed documents
curl https://your-domain.com/api/upload

# Ask a question (response is an NDJSON stream: a citations event, then token
# events, then a done event)
curl -N -X POST https://your-domain.com/api/chat \
  -H "Content-Type: application/json" \
  -d '{"question": "what is this about?"}'
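To check the chat stream from a script rather than by eye, the NDJSON lines have to be reassembled across chunk boundaries before parsing. The sketch below shows one way to do that. The `type` field and its values are assumptions inferred from the event names in the comment above, so confirm them against the actual payload; `createNdjsonParser` and `ChatEvent` are hypothetical names.

```typescript
// Sketch only: split an NDJSON stream into parsed events.
// The `type` field is an assumption; check the real /api/chat payload.
interface ChatEvent {
  type: string;
  [key: string]: unknown;
}

// Returns a feeder that buffers partial lines across chunk boundaries.
function createNdjsonParser(onEvent: (event: ChatEvent) => void) {
  let buffer = "";
  return {
    // Accepts the next decoded text chunk from the response body.
    push(chunk: string): void {
      buffer += chunk;
      const lines = buffer.split("\n");
      buffer = lines.pop() ?? ""; // the last piece may be an incomplete line
      for (const line of lines) {
        if (line.trim() !== "") onEvent(JSON.parse(line) as ChatEvent);
      }
    },
    // Flushes a final line that arrived without a trailing newline.
    end(): void {
      if (buffer.trim() !== "") onEvent(JSON.parse(buffer) as ChatEvent);
      buffer = "";
    },
  };
}
```

Feed it text decoded from the `fetch` response body (for example with `TextDecoder`), then call `end()` when the stream closes. A passing smoke test would see the citations event first and the done event last.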

Monitoring

Vercel: built-in analytics + function logs.

Self-host: pipe stdout to a log aggregator such as Logtail or Axiom, or read it with journalctl if you're old-school.

Track:

  • Upload error rate (PDFs that fail to parse)
  • Average chunks per upload
  • Average time-to-first-token on chat
  • OpenAI rate-limit responses
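Since the self-hosted path relies on stdout, one simple approach is to emit each of these as a JSON line that the aggregator can index. The sketch below is an assumption about how that could look: the metric names, `formatMetric`, and `createTtftTimer` are hypothetical and not part of the project.

```typescript
// Sketch only: one JSON line per metric, suitable for printing to stdout.
// Metric names are hypothetical; use whatever your aggregator expects.
type MetricName =
  | "upload_error"
  | "upload_chunks"
  | "chat_ttft_ms"
  | "openai_rate_limited";

function formatMetric(name: MetricName, value: number, at: Date = new Date()): string {
  return JSON.stringify({ ts: at.toISOString(), metric: name, value });
}

// Measures time-to-first-token: call start() before sending the chat
// request and firstToken() when the first token event arrives.
function createTtftTimer(clock: () => number = Date.now) {
  let startedAt: number | null = null;
  return {
    start(): void {
      startedAt = clock();
    },
    firstToken(): number | null {
      if (startedAt === null) return null;
      const elapsed = clock() - startedAt;
      startedAt = null; // only the first token counts
      return elapsed;
    },
  };
}
```

A handler would print `formatMetric("chat_ttft_ms", elapsed)` with `console.log`, and averages can then be computed in the aggregator rather than in the app.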
