Skip to content

Configuration

sarmakska edited this page May 3, 2026 · 3 revisions

Configuration

All configuration lives in .env.local. There are six knobs.

Required

OPENAI_API_KEY

Your OpenAI API key. Used for both embeddings and chat completions.

OPENAI_API_KEY=sk-proj-...

Get one at platform.openai.com/api-keys.

Optional (with defaults)

EMBEDDING_MODEL

Default: text-embedding-3-small (1536 dims, $0.02 per 1M tokens)

Alternatives:

  • text-embedding-3-large (3072 dims, $0.13 per 1M tokens) — slightly better recall, 2x cost, 2x latency
  • text-embedding-ada-002 (1536 dims, deprecated but still works) — older

Stick with the small model unless you have measured retrieval quality issues.

CHAT_MODEL

Default: gpt-4o-mini

Alternatives:

  • gpt-4o — better, more expensive
  • gpt-4-turbo — older, similar cost to 4o
  • gpt-3.5-turbo — cheaper but noticeably worse at instruction following

gpt-4o-mini is the right default. It is cheap, fast, and follows the "answer only from the chunks" instruction reliably.

CHUNK_SIZE

Default: 1000 (characters)

Bigger chunks = more context per retrieval, fewer chunks total, fewer embeddings. Smaller chunks = more precise retrieval, more granularity.

Content type Recommended size
Code, structured docs 500-800
Prose (articles, books) 1000-1500
Tabular data, reference 200-500 with metadata

CHUNK_OVERLAP

Default: 200 (characters)

Overlap means a sentence that spans two chunks is findable in either. 20 percent of CHUNK_SIZE is the standard heuristic.

Set to 0 for highly structured content (code, JSON) where there's no sentence to straddle.

TOP_K

Default: 5

How many chunks to retrieve and pass to the LLM as context.

TOP_K Trade-off
1-3 Fast, cheap, may miss the answer
5-7 Balanced (recommended)
10+ More context, more cost, can confuse the LLM

The right value depends on your chunk size and content density.

Tuning workflow

  1. Pick a PDF and write 10 questions you know the answer to.
  2. Run with defaults. Score correctness manually.
  3. If retrieval is missing relevant chunks, increase TOP_K or decrease CHUNK_SIZE.
  4. If the LLM is hallucinating, increase the system prompt's "say so plainly" emphasis or check that retrieval is actually finding the right chunks.
  5. If responses are slow, decrease TOP_K or switch to gpt-4o-mini.

Cost example

For a 50-page PDF (roughly 100k characters):

  • 100 chunks at 1000 chars each
  • Embedding cost: 100k tokens × £0.000016/1k = roughly £0.0016 once
  • Each question: 1 embedding (£0.000016) + 1 chat completion (£0.0008 with 4o-mini)
  • Total per question: under £0.001

You can answer 1000 questions on a 50-page PDF for under £1.

Clone this wiki locally