Configuration

All configuration lives in .env.local. There are eight knobs.

Required

`OPENAI_API_KEY`

Your OpenAI API key. Used for embeddings, the LLM reranker, and chat completions.

OPENAI_API_KEY=sk-proj-...

Get one at platform.openai.com/api-keys.

Optional (with defaults)

`EMBEDDING_MODEL`

Default: text-embedding-3-small (1536 dims, $0.02 per 1M tokens)

Alternatives:

text-embedding-3-large (3072 dims, $0.13 per 1M tokens): slightly better recall, 6.5x the cost, roughly 2x latency
text-embedding-ada-002 (1536 dims): older, superseded by the text-embedding-3 models but still works

Stick with the small model unless you have measured retrieval quality issues.

`CHAT_MODEL`

Default: gpt-4o-mini

Used for both answer generation and the LLM reranker.

Alternatives:

gpt-4o: better, more expensive
gpt-4-turbo: older, and pricier than gpt-4o

gpt-4o-mini is the right default. It is cheap, fast, follows the "answer only from the passages" instruction reliably, and reranks well enough in one batched call. If the reranker call fails for any reason, retrieval falls back to a deterministic lexical reranker, so a model hiccup never breaks a question.
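The fallback behaviour described above can be sketched roughly as follows. This is an illustration, not the project's code: `llm_rerank` is a hypothetical callable standing in for the batched model call, and the term-overlap scoring is only an assumed stand-in for whatever the deterministic lexical reranker actually does.

```python
import re
from typing import Callable, Sequence


def lexical_rerank(question: str, passages: Sequence[str], top_k: int) -> list[str]:
    """Deterministic fallback: rank passages by shared question terms (assumed scoring)."""
    terms = set(re.findall(r"\w+", question.lower()))

    def overlap(passage: str) -> int:
        return len(terms & set(re.findall(r"\w+", passage.lower())))

    # sorted() is stable, so passages with equal overlap keep their retrieval order.
    return sorted(passages, key=overlap, reverse=True)[:top_k]


def rerank_with_fallback(
    question: str,
    passages: Sequence[str],
    top_k: int,
    llm_rerank: Callable[[str, Sequence[str], int], Sequence[str]],  # hypothetical
) -> list[str]:
    """Try the LLM reranker first; on any failure, fall back to the lexical one."""
    try:
        return list(llm_rerank(question, passages, top_k))
    except Exception:
        return lexical_rerank(question, passages, top_k)
```

Catching every exception is deliberate in this sketch: the stated goal is that a model hiccup never breaks a question, so a degraded ranking is preferred over an error.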

`CHUNK_SIZE`

Default: 1000 (characters)

Bigger chunks = more context per retrieved chunk, fewer chunks total, fewer embeddings. Smaller chunks = more precise retrieval, but more chunks to embed and less context in each.

| Content type | Recommended size |
| --- | --- |
| Code, structured docs | 500-800 |
| Prose (articles, books) | 1000-1500 |
| Tabular data, reference | 200-500 with metadata |

`CHUNK_OVERLAP`

Default: 200 (characters)

Overlap means a sentence that spans two chunks is findable in either. 20 percent of CHUNK_SIZE is the standard heuristic.

Set to 0 for highly structured content (code, JSON) where there's no sentence to straddle.
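To make the interaction between `CHUNK_SIZE` and `CHUNK_OVERLAP` concrete, here is a minimal sketch of a sliding character window. It assumes plain fixed-width splitting; the project may well split on sentence or paragraph boundaries instead, and `chunk_text` is a name made up for this example.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into character windows of chunk_size that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

With the defaults the window advances 800 characters at a time, so the last 200 characters of one chunk are the first 200 of the next. With `chunk_overlap=0` the chunks simply tile the text.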

`TOP_K`

Default: 5

How many chunks the reranker keeps and passes to the model as context. Hybrid search first pulls a wider candidate pool (roughly four times TOP_K, minimum 20) for the reranker to work over, then the reranker trims to TOP_K.

| TOP_K | Trade-off |
| --- | --- |
| 1-3 | Fast, cheap, may miss the answer |
| 5-7 | Balanced (recommended) |
| 10+ | More context, more cost, can confuse the model |

The right value depends on your chunk size and content density.
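The relationship between `TOP_K` and the candidate pool can be sketched as below. The text only says "roughly four times TOP_K, minimum 20", so the exact formula, the function name, and the parameter names here are assumptions.

```python
def candidate_pool_size(top_k: int, multiplier: int = 4, floor: int = 20) -> int:
    """Number of hybrid-search candidates handed to the reranker (assumed formula)."""
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    return max(floor, multiplier * top_k)


for top_k in (3, 5, 10):
    print(top_k, candidate_pool_size(top_k))
# prints:
# 3 20
# 5 20
# 10 40
```

Under this reading, the pool stays at 20 for any `TOP_K` of 5 or below and only grows beyond that.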

`HYBRID_DENSE_WEIGHT` and `HYBRID_LEXICAL_WEIGHT`

Default: 1 and 1 (plain, unweighted Reciprocal Rank Fusion)

Hybrid search fuses two rankings, dense (embeddings) and lexical (BM25), with weighted RRF. Each ranking contributes weight / (60 + rank). Equal weights reproduce plain RRF, which is the right default. Tilt the weights when you know your corpus leans one way:

| Corpus | Suggestion |
| --- | --- |
| Codes, SKUs, error strings, identifiers | Raise `HYBRID_LEXICAL_WEIGHT` to 1.5-2 |
| Prose, articles, paraphrase-heavy text | Raise `HYBRID_DENSE_WEIGHT` to 1.5-2 |
| Mixed or unknown | Leave both at 1 |
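As an illustration of the weight / (60 + rank) formula, here is a small sketch of weighted RRF over two ranked lists of chunk ids. It assumes ranks start at 1 and breaks ties by id; neither detail is confirmed by this page, and `weighted_rrf` is a name invented for the example.

```python
from collections import defaultdict

RRF_K = 60  # the constant in weight / (60 + rank)


def weighted_rrf(dense_ids, lexical_ids, dense_weight=1.0, lexical_weight=1.0):
    """Fuse two rankings; each contributes weight / (RRF_K + rank), rank starting at 1."""
    scores = defaultdict(float)
    for weight, ranking in ((dense_weight, dense_ids), (lexical_weight, lexical_ids)):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += weight / (RRF_K + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))


dense = ["a", "b", "c"]    # ranking from embeddings
lexical = ["c", "a", "d"]  # ranking from BM25

print(weighted_rrf(dense, lexical))                      # ['a', 'c', 'b', 'd']
print(weighted_rrf(dense, lexical, lexical_weight=2.0))  # ['c', 'a', 'd', 'b']
```

Doubling the lexical weight is enough to move `c`, the top BM25 hit, ahead of `a`, and to lift `d` (found only lexically) above `b` (found only by embeddings).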

Weights of 0 are allowed: set the lexical weight to 0 for dense-only fusion, or the dense weight to 0 for lexical-only. Any non-negative finite value is accepted; an invalid value falls back to the default.
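The validation rule above (non-negative, finite, otherwise fall back to the default) might look something like this. The helper name `read_weight` and the idea of reading straight from the process environment are assumptions made for the sketch.

```python
import math
import os
from typing import Mapping


def read_weight(name: str, default: float = 1.0, env: Mapping[str, str] = os.environ) -> float:
    """Return env[name] as a non-negative finite float, or default if missing or invalid."""
    raw = env.get(name)
    if raw is None:
        return default
    try:
        value = float(raw)
    except ValueError:
        return default  # not a number at all, e.g. "high"
    if not math.isfinite(value) or value < 0:
        return default  # rejects nan, inf, and negative weights
    return value


dense_weight = read_weight("HYBRID_DENSE_WEIGHT")
lexical_weight = read_weight("HYBRID_LEXICAL_WEIGHT")
```

Note that `float()` happily parses `"nan"` and `"inf"`, which is why the explicit `math.isfinite` check is needed on top of the `try`/`except`.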

Tuning workflow

1. Pick a PDF and write 10 questions you know the answer to.
2. Run with defaults. Score correctness manually.
3. If retrieval is missing relevant chunks, increase TOP_K or decrease CHUNK_SIZE.
4. If an exact term is missed, confirm it is in the extracted text; the BM25 side of hybrid search should otherwise surface it.
5. If the model is hallucinating, check that retrieval is finding the right chunks; the citations make it obvious which passages were used.
6. If responses are slow or costly, decrease TOP_K. That shrinks the context sent to the model and, for values above 5, the candidate pool the reranker works over (the pool never drops below 20).

Cost example

For a 50-page PDF (roughly 100k characters):

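About 100 chunks at 1000 chars each, or closer to 125 once the default 200-character overlap is counted.
Embedding cost: roughly 25k to 31k tokens (English text averages about 4 characters per token) at roughly £0.000016 per 1k tokens, about £0.0004 to £0.0005, once.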
Each question: 1 question embedding, 1 reranking call over the candidate pool, and 1 chat completion. With gpt-4o-mini the rerank plus generation is well under a penny.
Total per question: under £0.002.

You can answer 1000 questions on a 50-page PDF for around £1 to £2. Drop the reranker if you want to remove one call.
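The one-off embedding cost can be checked with a few lines of arithmetic. This sketch uses the £0.000016 per 1k tokens figure from this section and assumes about 4 characters per token for English text; both helper names are made up for the example.

```python
import math

PRICE_PER_1K_TOKENS_GBP = 0.000016  # embedding price quoted in this section
CHARS_PER_TOKEN = 4                 # assumption: rough average for English text


def chunk_count(total_chars: int, chunk_size: int = 1000, chunk_overlap: int = 200) -> int:
    """How many sliding-window chunks a document of total_chars produces."""
    if total_chars <= chunk_size:
        return 1 if total_chars > 0 else 0
    step = chunk_size - chunk_overlap
    return math.ceil((total_chars - chunk_size) / step) + 1


def embedding_cost_gbp(total_chars: int, chunk_size: int = 1000, chunk_overlap: int = 200,
                       chars_per_token: float = CHARS_PER_TOKEN) -> float:
    """Approximate one-off cost of embedding every chunk of the document."""
    chunks = chunk_count(total_chars, chunk_size, chunk_overlap)
    # Slight overestimate: the final chunk is usually shorter than chunk_size.
    tokens = chunks * chunk_size / chars_per_token
    return tokens / 1000 * PRICE_PER_1K_TOKENS_GBP


print(chunk_count(100_000))                   # 125 chunks with the default overlap
print(round(embedding_cost_gbp(100_000), 6))  # 0.0005
```

Even under the pessimistic assumption of one token per character and no overlap, the same function gives about £0.0016, so the embedding step is negligible next to the per-question cost.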
