Configuration

All configuration lives in .env.local. There are six knobs.
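As a rough sketch of how the six settings and their documented defaults fit together, the loader below reads them from a mapping of environment variables. The `Settings` class and `load_settings` function are hypothetical names used for illustration, not the project's actual code, and the sketch assumes the variables from .env.local have already been loaded into the process environment.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the six configuration values."""
    openai_api_key: str
    embedding_model: str
    chat_model: str
    chunk_size: int
    chunk_overlap: int
    top_k: int


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (os.environ by default), applying the documented defaults."""
    env = os.environ if env is None else env

    # The API key is the only required value, so fail early if it is missing.
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        raise ValueError("OPENAI_API_KEY is required")

    settings = Settings(
        openai_api_key=api_key,
        embedding_model=env.get("EMBEDDING_MODEL", "text-embedding-3-small"),
        chat_model=env.get("CHAT_MODEL", "gpt-4o-mini"),
        chunk_size=int(env.get("CHUNK_SIZE", "1000")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "200")),
        top_k=int(env.get("TOP_K", "5")),
    )

    # Sanity checks (an assumption of this sketch, not documented behaviour).
    if settings.chunk_size <= 0:
        raise ValueError("CHUNK_SIZE must be positive")
    if not 0 <= settings.chunk_overlap < settings.chunk_size:
        raise ValueError("CHUNK_OVERLAP must be >= 0 and smaller than CHUNK_SIZE")
    if settings.top_k < 1:
        raise ValueError("TOP_K must be at least 1")
    return settings
```

Passing a plain dict instead of `os.environ` makes the defaults easy to check in a test without touching the real environment.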

Required

`OPENAI_API_KEY`

Your OpenAI API key. Used for both embeddings and chat completions.

OPENAI_API_KEY=sk-proj-...

Get one at platform.openai.com/api-keys.

Optional (with defaults)

`EMBEDDING_MODEL`

Default: text-embedding-3-small (1536 dims, $0.02 per 1M tokens)

Alternatives:

text-embedding-3-large (3072 dims, $0.13 per 1M tokens): slightly better recall, 6.5x the cost, roughly 2x latency
text-embedding-ada-002 (1536 dims, deprecated but still works) — older

Stick with the small model unless you have measured retrieval quality issues.

`CHAT_MODEL`

Default: gpt-4o-mini

Alternatives:

gpt-4o — better, more expensive
gpt-4-turbo: older, and more expensive than gpt-4o
gpt-3.5-turbo: older, cheaper than gpt-4o but not cheaper than gpt-4o-mini, and noticeably worse at instruction following

gpt-4o-mini is the right default. It is cheap, fast, and follows the "answer only from the chunks" instruction reliably.

`CHUNK_SIZE`

Default: 1000 (characters)

Bigger chunks = more context per retrieval, fewer chunks total, fewer embeddings. Smaller chunks = more precise retrieval, more granularity.

| Content type | Recommended size (characters) |
| --- | --- |
| Code, structured docs | 500-800 |
| Prose (articles, books) | 1000-1500 |
| Tabular data, reference | 200-500 with metadata |

`CHUNK_OVERLAP`

Default: 200 (characters)

Overlap means a sentence that spans two chunks is findable in either. 20 percent of CHUNK_SIZE is the standard heuristic.

Set to 0 for highly structured content (code, JSON) where there's no sentence to straddle.
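To make the interaction between CHUNK_SIZE and CHUNK_OVERLAP concrete, here is a minimal sketch of fixed-size character chunking with overlap. The `chunk_text` function is a hypothetical illustration; the project's real splitter may differ, for example by breaking on sentence or paragraph boundaries.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into character chunks where each chunk repeats the tail of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    # Each new chunk starts this many characters after the previous one.
    step = chunk_size - chunk_overlap

    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks
```

With the defaults, each chunk starts 800 characters after the previous one, so the last 200 characters of one chunk are also the first 200 of the next. Setting `chunk_overlap=0` gives back-to-back chunks with no repetition.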

`TOP_K`

Default: 5

How many chunks to retrieve and pass to the LLM as context.

| TOP_K | Trade-off |
| --- | --- |
| 1-3 | Fast, cheap, may miss the answer |
| 5-7 | Balanced (recommended) |
| 10+ | More context, more cost, can confuse the LLM |

The right value depends on your chunk size and content density.
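For intuition about what TOP_K controls, the sketch below ranks chunks by cosine similarity to the query embedding and keeps the best `k`. Cosine similarity and the in-memory lists are assumptions made for illustration; this page does not say which similarity metric or vector store the project uses, and `top_k_chunks` is a hypothetical name.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec: list[float], chunk_vecs: list[list[float]],
                 chunks: list[str], k: int = 5) -> list[str]:
    """Return the k chunks whose embeddings are most similar to the query embedding."""
    scored = [
        (cosine_similarity(query_vec, vec), chunk)
        for vec, chunk in zip(chunk_vecs, chunks)
    ]
    # Highest similarity first; slicing handles k larger than the number of chunks.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]
```

The prompt sent to the chat model grows roughly linearly with `k` times the chunk size, which is why a larger TOP_K raises both cost and latency.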

Tuning workflow

1. Pick a PDF and write 10 questions you know the answer to.
2. Run with defaults. Score correctness manually.
3. If retrieval is missing relevant chunks, increase TOP_K or decrease CHUNK_SIZE.
4. If the LLM is hallucinating, first check that retrieval is actually finding the right chunks, then strengthen the system prompt's instruction to say so plainly when the answer is not in the chunks.
5. If responses are slow, decrease TOP_K, or switch CHAT_MODEL back to gpt-4o-mini if you changed it.
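One way to tell a retrieval miss from a hallucination is to check, for each test question, whether any retrieved chunk contains a phrase you know belongs to the answer. The sketch below is a hypothetical helper along those lines: `retrieve` stands in for whatever function returns the top chunks for a question, and the substring match is a crude proxy that complements manual scoring rather than replacing it.

```python
from typing import Callable


def retrieval_hit_rate(cases: list[tuple[str, str]],
                       retrieve: Callable[[str], list[str]]) -> float:
    """Fraction of (question, expected_phrase) cases where some retrieved chunk contains the phrase."""
    if not cases:
        return 0.0
    hits = 0
    for question, expected_phrase in cases:
        retrieved = retrieve(question)
        # Case-insensitive substring check against every retrieved chunk.
        if any(expected_phrase.lower() in chunk.lower() for chunk in retrieved):
            hits += 1
    return hits / len(cases)
```

A low hit rate points at retrieval (adjust TOP_K or CHUNK_SIZE); a high hit rate with wrong answers points at the chat model or the system prompt.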

Cost example

For a 50-page PDF (roughly 100k characters):

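About 125 chunks of up to 1000 chars each (with the default 200-character overlap, each chunk starts 800 characters after the previous one)
Embedding cost: about 125k characters is roughly 31k tokens (at about 4 characters per token) × £0.000016/1k = roughly £0.0005 once. The £0.000016 per 1k tokens figure is the $0.02 per 1M rate above, assuming about £0.80 to the dollar
Each question: 1 embedding (at most £0.000016, since a question is far shorter than 1k tokens) + 1 chat completion (£0.0008 with 4o-mini)
Total per question: under £0.001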

You can answer 1000 questions on a 50-page PDF for under £1.
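To rerun this estimate with your own numbers, the arithmetic can be sketched as a small function. Everything here is an assumption to adjust: the 4 characters per token ratio is a rule of thumb for English text, the embedding price is the USD rate quoted for text-embedding-3-small above, the gpt-4o-mini prices ($0.15 per 1M input tokens, $0.60 per 1M output tokens) should be checked against the current pricing page, and the question, prompt, and answer lengths are guesses. `estimate_costs` is a hypothetical name, and its results are in USD, so they will not match a hand estimate made under different assumptions.

```python
import math


def estimate_costs(doc_chars: int = 100_000, chunk_size: int = 1000, chunk_overlap: int = 200,
                   top_k: int = 5, chars_per_token: float = 4.0,
                   embed_price_per_m: float = 0.02,
                   chat_in_price_per_m: float = 0.15, chat_out_price_per_m: float = 0.60,
                   question_tokens: int = 50, prompt_overhead_tokens: int = 200,
                   answer_tokens: int = 300) -> dict:
    """Rough USD cost of indexing one document and answering one question about it."""
    # Number of overlapping chunks needed to cover the document.
    step = chunk_size - chunk_overlap
    if doc_chars <= chunk_size:
        chunks = 1
    else:
        chunks = math.ceil((doc_chars - chunk_size) / step) + 1

    # One-off indexing cost: every chunk is embedded once (upper bound: all chunks full).
    index_tokens = chunks * chunk_size / chars_per_token
    index_cost = index_tokens * embed_price_per_m / 1_000_000

    # Per-question cost: embed the question, then send top_k chunks to the chat model.
    question_embed_cost = question_tokens * embed_price_per_m / 1_000_000
    input_tokens = top_k * chunk_size / chars_per_token + prompt_overhead_tokens + question_tokens
    chat_cost = (input_tokens * chat_in_price_per_m + answer_tokens * chat_out_price_per_m) / 1_000_000

    return {
        "chunks": chunks,
        "index_cost_usd": index_cost,
        "per_question_usd": question_embed_cost + chat_cost,
    }
```

Calling `estimate_costs()` with no arguments uses the documented defaults for a 100k-character document; pass `top_k=10` or a larger `chunk_size` to see how the per-question cost scales.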
