Configuration
All configuration lives in .env.local. There are six knobs.
Your OpenAI API key. Used for both embeddings and chat completions.
OPENAI_API_KEY=sk-proj-...
Get one at platform.openai.com/api-keys.
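As a rough illustration of how the six knobs and their defaults could be read from a file like .env.local, here is a minimal Python sketch. Only OPENAI_API_KEY, CHUNK_SIZE and TOP_K are names that appear in this document. The keys EMBEDDING_MODEL, CHAT_MODEL and CHUNK_OVERLAP, and the helpers parse_env and load_config, are hypothetical and may not match what the project actually uses.

```python
# Defaults as listed in this section. Key names marked "hypothetical"
# are illustrative guesses, not confirmed by the project.
DEFAULTS = {
    "EMBEDDING_MODEL": "text-embedding-3-small",  # hypothetical key name
    "CHAT_MODEL": "gpt-4o-mini",                  # hypothetical key name
    "CHUNK_SIZE": "1000",
    "CHUNK_OVERLAP": "200",                       # hypothetical key name
    "TOP_K": "5",
}


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_config(text: str) -> dict[str, str]:
    """Overlay file values on the defaults; the API key has no default."""
    config = {**DEFAULTS, **parse_env(text)}
    if not config.get("OPENAI_API_KEY"):
        raise ValueError("OPENAI_API_KEY is required")
    return config


example = "OPENAI_API_KEY=sk-proj-...\nTOP_K=7\n"
config = load_config(example)
print(config["TOP_K"], config["CHUNK_SIZE"])  # overridden value, then default
```

Anything left out of the file falls back to the default, so a file containing only the API key is a valid configuration under this sketch.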
Embedding model. Default: text-embedding-3-small (1536 dims, $0.02 per 1M tokens)
Alternatives:
- text-embedding-3-large (3072 dims, $0.13 per 1M tokens): slightly better recall, 6.5x cost, 2x latency
- text-embedding-ada-002 (1536 dims, deprecated but still works): older
Stick with the small model unless you have measured retrieval quality issues.
Chat model. Default: gpt-4o-mini
Alternatives:
- gpt-4o: better, more expensive
- gpt-4-turbo: older, more expensive than 4o
- gpt-3.5-turbo: older, noticeably worse at instruction following, and not cheaper than gpt-4o-mini
gpt-4o-mini is the right default. It is cheap, fast, and follows the "answer only from the chunks" instruction reliably.
Chunk size (CHUNK_SIZE). Default: 1000 (characters)
Bigger chunks = more context per retrieval, fewer chunks total, fewer embeddings. Smaller chunks = more precise retrieval, more granularity.
| Content type | Recommended size |
|---|---|
| Code, structured docs | 500-800 |
| Prose (articles, books) | 1000-1500 |
| Tabular data, reference | 200-500 with metadata |
Chunk overlap. Default: 200 (characters)
With overlap, a sentence that straddles a chunk boundary still appears whole in at least one chunk, provided it is shorter than the overlap. 20 percent of CHUNK_SIZE is a common heuristic.
Set to 0 for highly structured content (code, JSON) where there's no sentence to straddle.
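To make the interaction between chunk size and overlap concrete, here is a minimal character-based chunker. It is a sketch under the assumption that chunks are plain fixed-size character windows; the project's actual splitter may respect sentence or paragraph boundaries. The function name chunk_text is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that share `overlap` chars."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(sample)  # defaults: 1000 chars, 200 overlap
print(len(chunks), [len(c) for c in chunks])  # 3 [1000, 1000, 900]
```

With the defaults each window advances 800 characters, so the last 200 characters of one chunk are repeated at the start of the next. Setting overlap to 0 makes the windows tile the text with no repetition.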
Top-k (TOP_K). Default: 5
How many chunks to retrieve and pass to the LLM as context.
| TOP_K | Trade-off |
|---|---|
| 1-3 | Fast, cheap, may miss the answer |
| 5-7 | Balanced (recommended) |
| 10+ | More context, more cost, can confuse the LLM |
The right value depends on your chunk size and content density.
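The retrieval step that TOP_K controls can be sketched as ranking chunks by similarity to the question and keeping the best k. The example below assumes cosine similarity over embedding vectors and uses tiny hand-written 3-dimensional vectors in place of real embeddings. This document does not say which similarity measure or vector store the project uses, so treat both as assumptions. The function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunk_vecs, chunks, k=5):
    """Return the k (score, chunk) pairs most similar to the query, best first."""
    scored = [
        (cosine_similarity(query_vec, vec), chunk)
        for vec, chunk in zip(chunk_vecs, chunks)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy stand-ins for real embeddings (assumption: real vectors have 1536 dims).
chunks = ["refund policy", "shipping times", "warranty terms"]
chunk_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
query_vec = [1.0, 0.0, 0.0]

for score, chunk in top_k_chunks(query_vec, chunk_vecs, chunks, k=2):
    print(f"{score:.3f}  {chunk}")
```

Raising k only ever appends lower-scoring chunks to the end of the list, which is why large values add cost and can dilute the context passed to the LLM.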
- Pick a PDF and write 10 questions you know the answer to.
- Run with defaults. Score correctness manually.
- If retrieval is missing relevant chunks, increase TOP_K or decrease CHUNK_SIZE.
- If the LLM is hallucinating, increase the system prompt's "say so plainly" emphasis or check that retrieval is actually finding the right chunks.
- If responses are slow, decrease TOP_K or switch to gpt-4o-mini.
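The "is retrieval finding the right chunks" check in the tuning steps above can be partly automated. The sketch below measures how often a known answer phrase shows up in the retrieved chunks at a given TOP_K. It does not replace manual scoring of the final answers. The retriever here is a toy word-overlap ranker standing in for real embedding search, and retrieval_hit_rate and toy_retrieve are hypothetical names.

```python
def retrieval_hit_rate(cases, retrieve, top_k):
    """Fraction of (question, expected_phrase) cases whose phrase appears
    in at least one of the top_k retrieved chunks."""
    if not cases:
        return 0.0
    hits = 0
    for question, phrase in cases:
        retrieved = retrieve(question, top_k)
        if any(phrase.lower() in chunk.lower() for chunk in retrieved):
            hits += 1
    return hits / len(cases)


DOC_CHUNKS = [
    "The warranty lasts two years from purchase.",
    "Shipping takes five business days.",
    "Returns are accepted within 30 days.",
]


def toy_retrieve(question, top_k):
    """Stand-in retriever: rank chunks by number of words shared with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        DOC_CHUNKS,
        key=lambda c: len(q_words & set(c.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


cases = [
    ("How long does the warranty last?", "two years"),
    ("How many days does shipping take?", "five business days"),
    ("What is the return window in days?", "30 days"),
]

for k in (1, 3):
    print(k, round(retrieval_hit_rate(cases, toy_retrieve, k), 2))
```

In this toy data the third question is missed at k=1 and found at k=3, which is the pattern that suggests raising TOP_K or shrinking CHUNK_SIZE.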
For a 50-page PDF (roughly 100k characters):
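- About 100 chunks at 1000 chars each (closer to 125 with the default 200-character overlap)
- Embedding cost: at most 100k tokens (counting one token per character, a deliberate overestimate) × £0.000016/1k = roughly £0.0016 once. The £0.000016/1k rate is the $0.02 per 1M price at roughly £0.80 to the dollar.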
- Each question: 1 embedding (£0.000016) + 1 chat completion (£0.0008 with 4o-mini)
- Total per question: under £0.001
You can answer 1000 questions on a 50-page PDF for under £1.
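The arithmetic above can be reproduced in a few lines. The sketch uses only the per-unit figures quoted in this section (in GBP). The chars_per_token parameter is an added assumption: 1.0 reproduces the upper bound used in the text, while about 4 characters per token is a common rule of thumb for English.

```python
# Figures quoted in the estimate above, in GBP.
EMBED_PRICE_PER_1K_TOKENS = 0.000016
QUESTION_EMBED_COST = 0.000016   # the text budgets one 1k-token unit per question
CHAT_COST_PER_QUESTION = 0.0008  # gpt-4o-mini figure from the text


def indexing_cost(doc_chars: int, chars_per_token: float = 1.0) -> float:
    """One-off cost of embedding the whole document."""
    tokens = doc_chars / chars_per_token
    return tokens / 1000 * EMBED_PRICE_PER_1K_TOKENS


def total_cost(doc_chars: int, questions: int) -> float:
    """Indexing once plus one embedding and one chat completion per question."""
    per_question = QUESTION_EMBED_COST + CHAT_COST_PER_QUESTION
    return indexing_cost(doc_chars) + questions * per_question


print(f"index once:     £{indexing_cost(100_000):.4f}")     # £0.0016
print(f"1000 questions: £{total_cost(100_000, 1000):.2f}")  # £0.82
```

Chat completions dominate the total. The one-off indexing cost is negligible by comparison, so switching chat models moves the bill far more than switching embedding models.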