Commit dcecbdb

docs: add HyDE retrieval configuration guide
1 parent 313e586 commit dcecbdb

1 file changed

Lines changed: 260 additions & 0 deletions

docs/HYDE_RETRIEVAL.md

@@ -0,0 +1,260 @@
# HyDE (Hypothetical Document Embeddings) Retrieval

HyDE improves RAG and memory retrieval by generating a hypothetical answer before
embedding. Instead of embedding the raw user query, HyDE first asks an LLM to
produce a plausible answer, then embeds *that* answer for vector search. The
hypothesis is usually closer in embedding space to the stored documents than the
question is, which tends to improve recall.

Based on:
- Gao et al. 2023 "Precise Zero-Shot Dense Retrieval without Relevance Labels"
- Lei et al. 2025 "Never Come Up Empty: Adaptive HyDE Retrieval for Improving
  LLM Developer Support"

## How It Works

```
Standard:  Query --> Embed(query) --> Vector Search --> Results
HyDE:      Query --> LLM(hypothesis) --> Embed(hypothesis) --> Vector Search --> Results
                          ^                    ^
                   Extra LLM call     Better semantic match
```

The key insight: questions and answers tend to occupy different regions of
embedding space. A question like "What causes memory leaks in Node?" can sit far
from the answer text "Memory leaks in Node.js are caused by...". A hypothetical
answer *generated from the question* is usually much closer to the stored answer,
so it scores a higher cosine similarity against it.

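The geometric claim above can be illustrated with a toy example. The sketch below uses bag-of-words counts as a stand-in for a real embedding model, so it only captures word overlap; `embedToy`, `cosineSimilarity`, and the sample strings are hypothetical and not part of `@framers/agentos`.

```typescript
// Toy stand-in for an embedding model: word counts over a shared vocabulary.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z]+/g) ?? [];
}

function embedToy(text: string, vocab: string[]): number[] {
  const tokens = tokenize(text);
  return vocab.map((word) => tokens.filter((t) => t === word).length);
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat an all-zero vector as having no similarity to anything.
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

const storedDoc =
  'Memory leaks in Node are caused by lingering event listeners and unbounded caches';
const question = 'What causes memory leaks in Node?';
const hypothesis =
  'Memory leaks in Node are usually caused by event listeners that are never removed';

// Shared vocabulary built from all three texts.
const vocab = Array.from(
  new Set(tokenize(storedDoc).concat(tokenize(question), tokenize(hypothesis))),
);

const docVec = embedToy(storedDoc, vocab);
const simQuestion = cosineSimilarity(embedToy(question, vocab), docVec);
const simHypothesis = cosineSimilarity(embedToy(hypothesis, vocab), docVec);

// In this toy setup the hypothesis scores higher than the raw question.
console.log({ simQuestion, simHypothesis });
```

A real embedding model captures meaning rather than word overlap, so the gap will differ, but the direction of the effect is what HyDE relies on.
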
## When to Use HyDE

**Good candidates:**
- Knowledge base queries where the question phrasing differs from document style
- Vague or exploratory queries ("that thing about deployment")
- Memory recall where stored traces are statement-form, not question-form
- Background/batch processing where latency is less critical

**Avoid when:**
- Real-time chat with tight latency budgets (adds one LLM call per query)
- Simple keyword-style lookups where direct embedding already works well
- The query is already in statement/answer form

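One way to apply these guidelines is a small gate that decides per request whether to set `hyde.enabled`. The sketch below is a rough heuristic under stated assumptions: `shouldUseHyde`, its input fields, and the word-count cutoffs are all hypothetical and should be tuned against your own traffic.

```typescript
interface HydeGateInput {
  query: string;
  latencyBudgetMs: number; // how long the caller can wait for retrieval
  expectedHydeLatencyMs: number; // rough estimate of the extra LLM call
}

function shouldUseHyde(input: HydeGateInput): boolean {
  const trimmed = input.query.trim();
  if (trimmed.length === 0) return false;

  // Tight latency budget: skip the extra LLM call.
  if (input.latencyBudgetMs < input.expectedHydeLatencyMs) return false;

  const wordCount = trimmed.split(/\s+/).length;
  const looksLikeQuestion =
    /\?$/.test(trimmed) || /^(what|why|how|when|where|which|who)\b/i.test(trimmed);

  // Keyword-style lookups: direct embedding usually works well already.
  if (wordCount <= 2 && !looksLikeQuestion) return false;

  // Long sentences ending in a period are treated as already answer-shaped.
  if (!looksLikeQuestion && wordCount >= 8 && trimmed.endsWith('.')) return false;

  return true;
}

console.log(
  shouldUseHyde({
    query: 'that thing about deployment',
    latencyBudgetMs: 2000,
    expectedHydeLatencyMs: 400,
  }),
);
```

The result can then be passed as the `enabled` flag of the per-request `hyde` option.
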
## Configuration

### agent.config.json

HyDE is configured per-request, not globally. The `HydeRetriever` class and
its config types are exported from `@framers/agentos/rag`.

```json
{
  "rag": {
    "hyde": {
      "enabled": true,
      "initialThreshold": 0.7,
      "minThreshold": 0.3,
      "thresholdStep": 0.1,
      "adaptiveThreshold": true,
      "maxHypothesisTokens": 200,
      "fullAnswerGranularity": true
    }
  }
}
```

### Configuration Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `enabled` | `boolean` | `false` | Master switch for HyDE |
| `initialThreshold` | `number` | `0.7` | Starting similarity threshold |
| `minThreshold` | `number` | `0.3` | Lowest threshold before giving up |
| `thresholdStep` | `number` | `0.1` | How much to reduce threshold per step |
| `adaptiveThreshold` | `boolean` | `true` | Enable step-down when no results found |
| `maxHypothesisTokens` | `number` | `200` | Max tokens for hypothesis generation |
| `fullAnswerGranularity` | `boolean` | `true` | Generate full prose answers vs keywords |

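The defaults in the table can be mirrored in a small helper that merges per-request overrides. This is a sketch only: `HydeOptions`, `DEFAULT_HYDE_OPTIONS`, and `resolveHydeOptions` are local, hypothetical names (the library exports its own config types), and the range checks are an assumption rather than documented behavior.

```typescript
// Local mirror of the options table; not the library's exported type.
interface HydeOptions {
  enabled: boolean;
  initialThreshold: number;
  minThreshold: number;
  thresholdStep: number;
  adaptiveThreshold: boolean;
  maxHypothesisTokens: number;
  fullAnswerGranularity: boolean;
}

const DEFAULT_HYDE_OPTIONS: HydeOptions = {
  enabled: false,
  initialThreshold: 0.7,
  minThreshold: 0.3,
  thresholdStep: 0.1,
  adaptiveThreshold: true,
  maxHypothesisTokens: 200,
  fullAnswerGranularity: true,
};

function resolveHydeOptions(overrides: Partial<HydeOptions> = {}): HydeOptions {
  const options: HydeOptions = { ...DEFAULT_HYDE_OPTIONS, ...overrides };

  // Assumed sanity checks: catch settings that would make stepping meaningless.
  if (options.minThreshold > options.initialThreshold) {
    throw new RangeError('minThreshold must not exceed initialThreshold');
  }
  if (options.thresholdStep <= 0) {
    throw new RangeError('thresholdStep must be positive');
  }
  return options;
}

// Per-request override: enable HyDE and start from a stricter threshold.
const requestOptions = resolveHydeOptions({ enabled: true, initialThreshold: 0.8 });
console.log(requestOptions);
```
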
## Programmatic API

### 1. RetrievalAugmentor (main RAG pipeline)

```typescript
import OpenAI from 'openai';
import { RetrievalAugmentor } from '@framers/agentos/rag';

// Reads OPENAI_API_KEY from the environment.
const openai = new OpenAI();

// `config`, `embeddingManager`, and `vectorStoreManager` are assumed to be
// created elsewhere in your application.
const augmentor = new RetrievalAugmentor();
await augmentor.initialize(config, embeddingManager, vectorStoreManager);

// Register an LLM caller for hypothesis generation
augmentor.setHydeLlmCaller(async (systemPrompt, userPrompt) => {
  const response = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: userPrompt },
    ],
    max_tokens: 200,
  });
  return response.choices[0].message.content ?? '';
});

// Enable HyDE per-request
const result = await augmentor.retrieveContext('What causes memory leaks?', {
  hyde: {
    enabled: true,
    // Optional: pre-supply a hypothesis to skip the LLM call
    // hypothesis: 'Memory leaks are caused by...',
    // Optional: tune thresholds for this request
    // initialThreshold: 0.8,
    // minThreshold: 0.4,
  },
});

// HyDE diagnostics are in the result
console.log(result.diagnostics?.hyde);
// {
//   hypothesis: 'Memory leaks in Node.js are typically caused by...',
//   hypothesisLatencyMs: 342,
//   effectiveThreshold: 0.7,
//   thresholdSteps: 0,
// }
```

### 2. MultimodalIndexer (cross-modal search)

```typescript
import { MultimodalIndexer, HydeRetriever } from '@framers/agentos/rag';

const indexer = new MultimodalIndexer({
  embeddingManager,
  vectorStore,
  visionProvider,
});

// Attach a HyDE retriever.
// `myLlmCaller` is a (systemPrompt, userPrompt) => Promise<string> function,
// like the one registered in section 1.
indexer.setHydeRetriever(new HydeRetriever({
  llmCaller: myLlmCaller,
  embeddingManager,
  config: { enabled: true },
}));

// Search with HyDE
const results = await indexer.search('architecture diagram', {
  modalities: ['image'],
  hyde: { enabled: true },
});
```

### 3. CognitiveMemoryManager (memory recall)

```typescript
import { CognitiveMemoryManager, HydeRetriever } from '@framers/agentos';

const memoryManager = new CognitiveMemoryManager();
await memoryManager.initialize(config);

// Attach a HyDE retriever.
// `myLlmCaller` is a (systemPrompt, userPrompt) => Promise<string> function,
// like the one registered in section 1.
memoryManager.setHydeRetriever(new HydeRetriever({
  llmCaller: myLlmCaller,
  embeddingManager,
  config: { enabled: true },
}));

// Retrieve memories with HyDE
const result = await memoryManager.retrieve(
  'that deployment discussion',
  currentMood,
  { hyde: true },
);
```

### 4. Standalone HydeRetriever

```typescript
import { HydeRetriever } from '@framers/agentos/rag';

const retriever = new HydeRetriever({
  llmCaller: async (system, user) => {
    // Your LLM call here. `hypotheticalAnswer` is a placeholder for the
    // text your model returns.
    return hypotheticalAnswer;
  },
  embeddingManager,
  config: {
    enabled: true,
    adaptiveThreshold: true,
    initialThreshold: 0.7,
    minThreshold: 0.3,
  },
});

// Generate hypothesis only
const { hypothesis, latencyMs } = await retriever.generateHypothesis(
  'What is retrieval augmented generation?',
);

// Full retrieve cycle with adaptive thresholding
const result = await retriever.retrieve({
  query: 'What is RAG?',
  vectorStore: myVectorStore,
  collectionName: 'knowledge-base',
});
```

## Adaptive Thresholding

HyDE supports adaptive threshold stepping: if no results are found at the
initial similarity threshold, it steps down until content is found or the
minimum threshold is reached. This makes it much less likely that HyDE "comes
up empty"; if nothing clears the minimum threshold, the result is still empty.

```
Initial threshold: 0.7  -->  No results
Step down to:      0.6  -->  No results
Step down to:      0.5  -->  Found 3 results! (stop here)
```

The `thresholdSteps` diagnostic tells you how many steps were needed.

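The stepping behavior described above can be sketched as a small loop. This is an illustration under assumptions, not the library's implementation: `searchWithAdaptiveThreshold`, `ThresholdSearch`, and the demo corpus are hypothetical, and `thresholdSteps` is assumed to count step-downs (0 when the initial threshold already returns hits).

```typescript
interface ScoredHit {
  id: string;
  score: number;
}

// Hypothetical search callback: returns hits whose similarity is >= minScore.
type ThresholdSearch = (minScore: number) => Promise<ScoredHit[]>;

interface AdaptiveResult {
  hits: ScoredHit[];
  effectiveThreshold: number;
  thresholdSteps: number;
}

// Round to 6 decimals so repeated subtraction does not drift (0.7 - 0.4 !== 0.3).
const roundThreshold = (x: number): number => Math.round(x * 1e6) / 1e6;

async function searchWithAdaptiveThreshold(
  search: ThresholdSearch,
  initialThreshold = 0.7,
  minThreshold = 0.3,
  thresholdStep = 0.1,
): Promise<AdaptiveResult> {
  if (thresholdStep <= 0) throw new RangeError('thresholdStep must be positive');

  let steps = 0;
  let threshold = roundThreshold(initialThreshold);
  while (true) {
    const hits = await search(threshold);
    const next = roundThreshold(initialThreshold - (steps + 1) * thresholdStep);
    // Stop on the first non-empty result, or when the next step would go below the floor.
    if (hits.length > 0 || next < minThreshold) {
      return { hits, effectiveThreshold: threshold, thresholdSteps: steps };
    }
    steps += 1;
    threshold = next;
  }
}

// Demo corpus: three hits between 0.5 and 0.6, one weak hit at 0.2.
const demoCorpus: ScoredHit[] = [
  { id: 'a', score: 0.55 },
  { id: 'b', score: 0.52 },
  { id: 'c', score: 0.51 },
  { id: 'd', score: 0.2 },
];
const demoSearch: ThresholdSearch = async (minScore) =>
  demoCorpus.filter((hit) => hit.score >= minScore);

searchWithAdaptiveThreshold(demoSearch).then((result) => console.log(result));
```

With the demo corpus this reproduces the trace shown above: no hits at 0.7 or 0.6, then three hits at 0.5 after two step-downs.
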
## Audit Trail

When `includeAudit: true` is passed to `retrieveContext()`, HyDE operations
appear in the audit trail with operation type `'hyde'`:

```typescript
const result = await augmentor.retrieveContext(query, {
  hyde: { enabled: true },
  includeAudit: true,
});

const hydeOp = result.auditTrail?.operations.find(
  (op) => op.operationType === 'hyde',
);
// hydeOp.hydeDetails.hypothesis
// hydeOp.hydeDetails.effectiveThreshold
// hydeOp.hydeDetails.thresholdSteps
// hydeOp.tokenUsage (embedding + LLM tokens)
```

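For logging or debugging, the audit entries can be condensed into one line per HyDE operation. The sketch below declares its own structural types that mirror only the fields named above; `summarizeHydeOperations` and the type names are hypothetical, and the library's real audit types may carry more fields.

```typescript
// Local structural types mirroring only the fields shown in this guide.
interface HydeDetails {
  hypothesis: string;
  effectiveThreshold: number;
  thresholdSteps: number;
}

interface AuditOperation {
  operationType: string;
  hydeDetails?: HydeDetails;
}

interface AuditTrail {
  operations: AuditOperation[];
}

function summarizeHydeOperations(auditTrail?: AuditTrail): string[] {
  const operations = auditTrail?.operations ?? [];
  const lines: string[] = [];
  for (const op of operations) {
    if (op.operationType !== 'hyde' || !op.hydeDetails) continue;
    const { hypothesis, effectiveThreshold, thresholdSteps } = op.hydeDetails;
    // Truncate long hypotheses so each summary stays on one log line.
    const preview =
      hypothesis.length > 60 ? `${hypothesis.slice(0, 57)}...` : hypothesis;
    lines.push(
      `threshold=${effectiveThreshold} steps=${thresholdSteps} hypothesis="${preview}"`,
    );
  }
  return lines;
}

const sampleTrail: AuditTrail = {
  operations: [
    { operationType: 'embedding' },
    {
      operationType: 'hyde',
      hydeDetails: {
        hypothesis: 'Memory leaks are caused by...',
        effectiveThreshold: 0.5,
        thresholdSteps: 2,
      },
    },
  ],
};

console.log(summarizeHydeOperations(sampleTrail));
```
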
## Performance Implications

| Metric | Without HyDE | With HyDE |
|--------|-------------|-----------|
| LLM calls per query | 0 | 1 |
| Embedding calls | 1 | 1 (hypothesis instead of query) |
| Vector searches | 1 | 1-N (N = adaptive steps) |
| Typical added latency | 0 | 200-500ms (LLM generation) |
| Recall improvement | baseline | +10-30% on vague queries |

The model used for hypothesis generation is whichever one the registered caller
invokes, so choosing a small, fast model keeps the overhead low. Using
`gpt-4o-mini` or similar typically keeps the added latency under 300ms for most
queries.

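To check these latency figures against your own model and network, the LLM caller can be wrapped so that each call records its duration. This is a sketch: `withLatencyLog` and `percentile` are hypothetical helpers, and only the `(systemPrompt, userPrompt) => Promise<string>` caller shape comes from this guide.

```typescript
type LlmCaller = (systemPrompt: string, userPrompt: string) => Promise<string>;

// Wraps a caller and reports how long each call took, even when it fails.
function withLatencyLog(
  caller: LlmCaller,
  onSample: (latencyMs: number) => void,
): LlmCaller {
  return async (systemPrompt, userPrompt) => {
    const start = Date.now();
    try {
      return await caller(systemPrompt, userPrompt);
    } finally {
      onSample(Date.now() - start);
    }
  };
}

// Nearest-rank percentile over collected samples (p in 1..100).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new RangeError('no samples collected');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(sorted.length, Math.max(1, rank)) - 1];
}

// Usage with a stand-in caller; replace it with your real LLM call.
const latencySamples: number[] = [];
const timedCaller = withLatencyLog(
  async () => 'hypothetical answer',
  (ms) => latencySamples.push(ms),
);

timedCaller('system prompt', 'What causes memory leaks?').then(() => {
  console.log('p50 latency (ms):', percentile(latencySamples, 50));
});
```

The wrapped caller can be registered in place of the original one, and the collected samples compared with the `hypothesisLatencyMs` diagnostic.
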
## Graceful Degradation

HyDE degrades gracefully in each of the following failure scenarios:

1. **No LLM caller registered**: Falls back to direct query embedding with a
   diagnostic message.
2. **LLM call fails**: Falls back to direct query embedding.
3. **Hypothesis embedding fails**: Falls back to direct query embedding.
4. **No results at any threshold**: Returns empty results (same as without HyDE).

The system never throws because of a HyDE failure; it falls back to the
standard retrieval path.
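
The first three fallbacks can be sketched as a single function that always ends up with some embedding to search with. This is an illustration under assumptions, not the library's code: `embedWithHydeFallback`, the `Embedder` type, the system prompt text, and treating an empty hypothesis as a failure are all hypothetical.

```typescript
type LlmCaller = (systemPrompt: string, userPrompt: string) => Promise<string>;
type Embedder = (text: string) => Promise<number[]>;

interface HydeEmbedding {
  vector: number[];
  usedHyde: boolean;
  note?: string; // diagnostic message when falling back
}

async function embedWithHydeFallback(
  query: string,
  embed: Embedder,
  llmCaller?: LlmCaller,
): Promise<HydeEmbedding> {
  // Scenario 1: no caller registered, so embed the query directly.
  if (!llmCaller) {
    return { vector: await embed(query), usedHyde: false, note: 'no LLM caller registered' };
  }
  try {
    const hypothesis = await llmCaller(
      'Write a short passage that answers the question.', // placeholder prompt
      query,
    );
    // Assumption: an empty hypothesis is treated like a failed LLM call.
    if (hypothesis.trim().length === 0) throw new Error('empty hypothesis');
    return { vector: await embed(hypothesis), usedHyde: true };
  } catch (err) {
    // Scenarios 2 and 3: LLM or hypothesis embedding failed, use the raw query.
    const note = err instanceof Error ? err.message : String(err);
    return { vector: await embed(query), usedHyde: false, note };
  }
}

// Stand-in embedder for the demo: a one-dimensional "vector" of the text length.
const demoEmbed: Embedder = async (text) => [text.length];

embedWithHydeFallback('What is RAG?', demoEmbed, async () => {
  throw new Error('LLM unavailable');
}).then((result) => console.log(result));
```

If embedding the raw query itself fails, the error propagates, which matches the standard retrieval path without HyDE.
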
