Commit 207f887

fix(memory/typed-network): tolerant observer parsing for gpt-5-mini extraction
Phase 4c smoke surfaced 240+ zod validation errors across 47 extraction failures at gpt-5-mini. The shipped observer was strict-mode: any validation error on any field of any fact dropped the entire fact array for that session. Five tolerance fixes, all in the observer + schema:

1. **Auto-wrap top-level array.** gpt-5-mini frequently returns a bare facts array instead of `{facts: [...]}`. Detect `Array.isArray` after `JSON.parse` and wrap before schema validation. Covers 39 of the 240 errors.
2. **Per-fact tolerance.** Replace `TypedExtractionSchema.parse` (all-or-nothing) with `TypedExtractionFactSchema.safeParse` per fact in a new `extractFactsFromContainer` helper. Bad facts drop silently; good facts in the same response are kept. IDs are sequential post-drop indices for contiguous addressing.
3. **Schema defaults on optional arrays + temporal block.** `participants`, `reasoning_markers`, and `entities` default to `[]`; `temporal` defaults to `{mention: ''}`, and `temporal.mention` itself defaults to `''`. Downstream `rankByTemporalOverlap` already handles empty mention strings via its start/end interval fallback. Covers ~65 of the 240 errors.
4. **Bank coercion via `z.preprocess`.** Uppercase the bank value before the enum check so a lowercase `'world'` coerces to `'WORLD'` rather than dropping the fact. Covers ~65 of the 240 errors.
5. **Retry-on-outer-failure.** Spec section 6 says "malformed outputs are retried once with the validation error appended to the prompt"; this was specified but never shipped. Now implemented for catastrophic outer failures only (invalid JSON, primitive value, missing facts key). Per-fact failures do not trigger a retry. `MAX_ATTEMPTS = 2`.

Behavior change: `extract()` never throws on extractable input. Persistent catastrophic failure returns `[]`; bad individual facts are dropped silently. The three existing tests that asserted strict-mode throws (empty text, unknown bank, confidence outside [0, 1]) were updated to assert tolerant-mode drops, and six new tests cover the new tolerant behaviors.
Tests: 81/81 typed-network pass (was 75; +6 new). Wider memory suite: 742/742 pass.
1 parent: be7ded8 · commit: 207f887

3 files changed

Lines changed: 336 additions & 41 deletions
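
The schema-side changes (fixes 3 and 4) land in extraction-schema.ts, which the diffs below do not show. As a rough, stdlib-only sketch of the same per-fact tolerance surface — the real schema uses zod's defaults, `z.preprocess`, and `safeParse`, and the names `parseFact` and `TolerantFact`, plus every bank name other than `WORLD`, are hypothetical here:

```typescript
// Hypothetical stdlib-only model of the tolerant per-fact validation that
// extraction-schema.ts implements with zod. Returns null for a fact that
// should be dropped, mirroring safeParse's failure path.
type TolerantFact = {
  text: string;
  bank: string;
  temporal: { mention: string };
  participants: string[];
  reasoning_markers: string[];
  entities: string[];
  confidence: number;
};

// Only 'WORLD' appears in this commit; the other W/E/O/S expansions are guesses.
const BANKS = new Set(['WORLD', 'EXPERIENCE', 'OPINION', 'SELF']);

function parseFact(candidate: unknown): TolerantFact | null {
  if (typeof candidate !== 'object' || candidate === null) return null;
  const c = candidate as Record<string, unknown>;
  // Required: non-empty text and confidence in [0, 1].
  if (typeof c.text !== 'string' || c.text.length === 0) return null;
  if (typeof c.confidence !== 'number' || c.confidence < 0 || c.confidence > 1) {
    return null;
  }
  // Fix 4: uppercase-coerce bank before the enum check.
  const bank = typeof c.bank === 'string' ? c.bank.toUpperCase() : '';
  if (!BANKS.has(bank)) return null;
  // Fix 3: default the optional arrays and the temporal block.
  const strings = (v: unknown): string[] =>
    Array.isArray(v) ? v.filter((x): x is string => typeof x === 'string') : [];
  const t = c.temporal as { mention?: unknown } | null | undefined;
  const mention =
    t !== undefined && t !== null && typeof t.mention === 'string' ? t.mention : '';
  return {
    text: c.text,
    bank,
    temporal: { mention },
    participants: strings(c.participants),
    reasoning_markers: strings(c.reasoning_markers),
    entities: strings(c.entities),
    confidence: c.confidence,
  };
}
```

The null return mirrors the observer's drop-silently contract: callers filter out nulls instead of catching throws.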

src/memory/retrieval/typed-network/TypedNetworkObserver.ts

Lines changed: 125 additions & 22 deletions
@@ -2,18 +2,42 @@
  * @file TypedNetworkObserver.ts
  * @description LLM-driven extractor that turns a conversation block
  * into 0+ {@link TypedFact}s. Wraps the 6-step extraction prompt and
- * the zod-validated parsing of the LLM's structured-output response.
+ * the tolerant zod parsing of the LLM's structured-output response.
  *
  * Production wiring: a typical caller constructs the observer once per
  * pipeline (re-using the same `gpt-5-mini` adapter), then invokes
  * {@link TypedNetworkObserver.extract} per session. The returned facts
  * are then upserted into a {@link TypedNetworkStore} and embedded by
  * the host's {@link IEmbeddingManager}.
  *
+ * **Tolerance design (Phase 4c smoke fix):** the parser accepts the
+ * common deviations gpt-5-mini emits at scale, rather than throwing on
+ * any deviation:
+ *
+ * 1. **Code-fence stripping**: triple-backtick fences (with or without
+ *    language tag) are removed before JSON parse.
+ * 2. **Top-level array auto-wrap**: a bare `[fact, fact]` is wrapped
+ *    as `{facts: [...]}` before schema validation.
+ * 3. **Per-fact tolerance**: facts are validated one at a time via
+ *    `TypedExtractionFactSchema.safeParse`. Bad facts are dropped
+ *    silently; good facts in the same response are kept.
+ * 4. **Schema-level defaults**: `temporal`, `participants`,
+ *    `reasoning_markers`, and `entities` default to sensible empties
+ *    when the LLM omits them. `bank` is uppercase-coerced. See
+ *    {@link TypedExtractionFactSchema} for the full tolerance surface.
+ * 5. **Retry-on-outer-failure**: if the catastrophic outer parse
+ *    fails (invalid JSON, primitive value, neither array nor object
+ *    with `facts`), the extractor retries once with the validation
+ *    error appended to the user prompt. Implements spec section 6's
+ *    retry path that was specified but never shipped.
+ *
+ * The extract method NEVER throws on extractable input; persistent
+ * outer failure returns `[]` so the caller can continue ingest.
+ *
  * @module @framers/agentos/memory/retrieval/typed-network/TypedNetworkObserver
  */
 
-import { TypedExtractionSchema } from './prompts/extraction-schema.js';
+import { TypedExtractionFactSchema } from './prompts/extraction-schema.js';
 import {
   TYPED_EXTRACTION_SYSTEM_PROMPT,
   buildExtractionUserPrompt,
@@ -48,6 +72,13 @@ export interface TypedNetworkObserverOptions {
   temperature?: number;
 }
 
+/**
+ * Maximum total LLM invocations per `extract` call. The first attempt
+ * uses the base prompt; the second appends the validation error from
+ * the first attempt for the model to self-correct against.
+ */
+const MAX_ATTEMPTS = 2;
+
 /**
  * The 6-step extractor. Stateless aside from its constructor options;
  * safe to share across concurrent extractions.
@@ -64,33 +95,104 @@ export class TypedNetworkObserver {
   }
 
   /**
-   * Extract typed facts from a conversation block. Uses the 6-step
-   * prompt + zod-validated parsing. The resulting facts have stable
-   * IDs of the form `<sessionId>-fact-<index>` so re-extraction
-   * against the same content reproduces the same IDs.
+   * Extract typed facts from a conversation block.
+   *
+   * Resulting facts have stable IDs of the form
+   * `<sessionId>-fact-<index>`, where `<index>` is the sequential
+   * POST-DROP position so dropped facts produce contiguous IDs in the
+   * returned array.
+   *
+   * **Never throws on extractable input.** Catastrophic outer parse
+   * failures (invalid JSON, primitive value, missing facts key) get
+   * one retry; persistent failure returns `[]`. Bad individual facts
+   * are dropped silently via per-fact `safeParse`.
    *
    * @param sessionText - Full conversation text. Will be wrapped in
    *   the user prompt's delimiters automatically.
    * @param sessionId - Stable identifier used to namespace the
    *   resulting fact IDs.
    * @returns Array of {@link TypedFact}s, possibly empty.
-   * @throws ZodError if the LLM output fails schema validation.
-   * @throws SyntaxError if the LLM output is not valid JSON.
    */
   async extract(sessionText: string, sessionId: string): Promise<TypedFact[]> {
-    const raw = await this.llm.invoke({
-      system: TYPED_EXTRACTION_SYSTEM_PROMPT,
-      user: buildExtractionUserPrompt(sessionText),
-      maxTokens: this.maxTokens,
-      temperature: this.temperature,
-    });
-    // Strip markdown code fences if the LLM wraps the JSON in them
-    // (some models do this even with explicit "no commentary" prompts).
-    const stripped = stripCodeFence(raw);
-    const json = JSON.parse(stripped);
-    const parsed = TypedExtractionSchema.parse(json);
-    return parsed.facts.map((f, idx) => ({
-      id: `${sessionId}-fact-${idx}`,
+    const baseUserPrompt = buildExtractionUserPrompt(sessionText);
+    let lastValidationError: string | null = null;
+
+    for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt += 1) {
+      // First attempt uses the bare prompt; retry appends the
+      // validation error so the model can self-correct.
+      const userPrompt =
+        lastValidationError === null
+          ? baseUserPrompt
+          : `${baseUserPrompt}\n\nThe previous response failed validation: ${lastValidationError}\nReturn JSON matching the schema strictly. Do not add commentary.`;
+
+      const raw = await this.llm.invoke({
+        system: TYPED_EXTRACTION_SYSTEM_PROMPT,
+        user: userPrompt,
+        maxTokens: this.maxTokens,
+        temperature: this.temperature,
+      });
+
+      const stripped = stripCodeFence(raw);
+
+      // Parse JSON. SyntaxError captures bad-JSON outer failures into
+      // the retry path.
+      let json: unknown;
+      try {
+        json = JSON.parse(stripped);
+      } catch (err) {
+        lastValidationError = err instanceof Error ? err.message : String(err);
+        continue;
+      }
+
+      // Auto-wrap top-level array. gpt-5-mini frequently emits a bare
+      // facts array instead of `{facts: [...]}`; this recovers the
+      // most common deviation.
+      const container = Array.isArray(json) ? { facts: json } : json;
+
+      // Outer-shape validation. We accept any object with a `facts`
+      // array; per-fact validation runs in `extractFactsFromContainer`.
+      if (
+        typeof container !== 'object' ||
+        container === null ||
+        !('facts' in container) ||
+        !Array.isArray((container as { facts: unknown }).facts)
+      ) {
+        lastValidationError =
+          'expected JSON object with a "facts" array; got unexpected outer shape';
+        continue;
+      }
+
+      return extractFactsFromContainer(
+        (container as { facts: unknown[] }).facts,
+        sessionId,
+      );
+    }
+
+    // Both attempts failed at the outer layer; return empty rather
+    // than throwing so the caller can continue ingest. The caller is
+    // responsible for downstream "no typed facts in this session"
+    // semantics.
+    return [];
+  }
+}
+
+/**
+ * Run per-fact tolerance over a candidate array. Returns only the
+ * facts that pass {@link TypedExtractionFactSchema} validation;
+ * silently drops the rest. IDs are sequential post-drop indices to
+ * keep the output array contiguously addressable.
+ */
+function extractFactsFromContainer(
+  candidates: unknown[],
+  sessionId: string,
+): TypedFact[] {
+  const facts: TypedFact[] = [];
+  for (const candidate of candidates) {
+    const result = TypedExtractionFactSchema.safeParse(candidate);
+    if (!result.success) continue;
+    const f = result.data;
+    facts.push({
+      id: `${sessionId}-fact-${facts.length}`,
       bank: f.bank,
       text: f.text,
      embedding: [],
@@ -99,8 +201,9 @@ export class TypedNetworkObserver {
       reasoningMarkers: f.reasoning_markers,
       entities: f.entities,
       confidence: f.confidence,
-    }));
+    });
   }
+  return facts;
 }
 
 /**
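
For context, the `stripCodeFence` helper this diff calls predates the commit and is not among its changed lines. A plausible minimal re-implementation matching the documented behavior (fences with or without a language tag); anything beyond that is assumed:

```typescript
// Hypothetical re-implementation of stripCodeFence (the real helper is not
// part of this commit). Removes one wrapping triple-backtick fence, with or
// without a language tag, and trims; non-fenced input is only trimmed.
function stripCodeFence(raw: string): string {
  const trimmed = raw.trim();
  // Optional language tag after the opening fence, lazy capture of the body.
  const match = trimmed.match(/^```[a-zA-Z]*\s*([\s\S]*?)\s*```$/);
  return match ? match[1] : trimmed;
}
```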

src/memory/retrieval/typed-network/__tests__/TypedNetworkObserver.test.ts

Lines changed: 156 additions & 6 deletions
@@ -79,13 +79,14 @@ describe('TypedNetworkObserver', () => {
     expect(facts[0].reasoningMarkers).toEqual(['Because', 'we use']);
   });
 
-  it('throws on missing required field (zod validation)', async () => {
+  it('drops fact with empty text (per-fact tolerance)', async () => {
     const llm = mockLLM('{"facts": [{"text": ""}]}');
     const obs = new TypedNetworkObserver({ llm });
-    await expect(obs.extract('blah', 'session-2')).rejects.toThrow();
+    const facts = await obs.extract('blah', 'session-2');
+    expect(facts).toEqual([]);
   });
 
-  it('throws on unknown bank label', async () => {
+  it('drops fact with bank label that does not coerce to W/E/O/S', async () => {
     const llm = mockLLM(JSON.stringify({
       facts: [{
         text: 'foo',
@@ -98,10 +99,11 @@ describe('TypedNetworkObserver', () => {
       }],
     }));
     const obs = new TypedNetworkObserver({ llm });
-    await expect(obs.extract('text', 's1')).rejects.toThrow();
+    const facts = await obs.extract('text', 's1');
+    expect(facts).toEqual([]);
   });
 
-  it('throws on confidence outside [0, 1]', async () => {
+  it('drops fact with confidence outside [0, 1]', async () => {
     const llm = mockLLM(JSON.stringify({
       facts: [{
         text: 'foo',
@@ -114,7 +116,155 @@ describe('TypedNetworkObserver', () => {
       }],
     }));
     const obs = new TypedNetworkObserver({ llm });
-    await expect(obs.extract('text', 's1')).rejects.toThrow();
+    const facts = await obs.extract('text', 's1');
+    expect(facts).toEqual([]);
+  });
+
+  // -------------------------------------------------------------------------
+  // Tolerance fixes (Phase 4c smoke surfaced 240+ zod errors at gpt-5-mini)
+  // -------------------------------------------------------------------------
+
+  it('auto-wraps top-level array as {facts: ...} when LLM omits the wrapping object', async () => {
+    // gpt-5-mini frequently returns a bare facts array instead of {facts: [...]}.
+    // The observer detects this shape and wraps it so the rest of the pipeline
+    // works unchanged.
+    const llm = mockLLM(JSON.stringify([{
+      text: 'Berlin is in Germany',
+      bank: 'WORLD',
+      temporal: { mention: '2026-04-26' },
+      participants: [],
+      reasoning_markers: [],
+      entities: ['Berlin', 'Germany'],
+      confidence: 1.0,
+    }]));
+    const obs = new TypedNetworkObserver({ llm });
+    const facts = await obs.extract('User: Where is Berlin?', 'session-aw');
+    expect(facts).toHaveLength(1);
+    expect(facts[0].bank).toBe('WORLD');
+    expect(facts[0].entities).toContain('Berlin');
+  });
+
+  it('drops invalid facts and keeps valid facts in the same response', async () => {
+    // Per-fact tolerance: one bad apple does not spoil the bunch. The
+    // shipped strict-mode parser threw on any single-fact failure, losing
+    // every other fact in the same extraction call.
+    const llm = mockLLM(JSON.stringify({
+      facts: [
+        {
+          text: 'Berlin is in Germany',
+          bank: 'WORLD',
+          temporal: { mention: '2026-04-26' },
+          participants: [],
+          reasoning_markers: [],
+          entities: ['Berlin'],
+          confidence: 1.0,
+        },
+        null,
+        'a string fact',
+        {
+          text: 'Munich is in Germany',
+          bank: 'WORLD',
+          temporal: { mention: '2026-04-26' },
+          participants: [],
+          reasoning_markers: [],
+          entities: ['Munich'],
+          confidence: 1.0,
+        },
+      ],
+    }));
+    const obs = new TypedNetworkObserver({ llm });
+    const facts = await obs.extract('blah', 'session-pft');
+    expect(facts).toHaveLength(2);
+    expect(facts.map((f) => f.text)).toEqual([
+      'Berlin is in Germany',
+      'Munich is in Germany',
+    ]);
+  });
+
+  it('defaults missing array fields (participants, reasoning_markers, entities) to []', async () => {
+    // gpt-5-mini frequently omits empty array fields entirely instead of
+    // emitting them as []. The schema accepts the missing fields and fills
+    // in [].
+    const llm = mockLLM(JSON.stringify({
+      facts: [{
+        text: 'Berlin is in Germany',
+        bank: 'WORLD',
+        temporal: { mention: '2026-04-26' },
+        confidence: 1.0,
+        // missing: participants, reasoning_markers, entities
+      }],
+    }));
+    const obs = new TypedNetworkObserver({ llm });
+    const facts = await obs.extract('blah', 'session-def');
+    expect(facts).toHaveLength(1);
+    expect(facts[0].participants).toEqual([]);
+    expect(facts[0].reasoningMarkers).toEqual([]);
+    expect(facts[0].entities).toEqual([]);
+  });
+
+  it('coerces lowercase bank to uppercase before validation', async () => {
+    // The 6-step prompt instructs UPPERCASE banks but the LLM sometimes
+    // emits lowercase. A single uppercase coercion at parse time recovers
+    // the fact instead of dropping it.
+    const llm = mockLLM(JSON.stringify({
+      facts: [{
+        text: 'Berlin is in Germany',
+        bank: 'world',
+        temporal: { mention: '2026-04-26' },
+        participants: [],
+        reasoning_markers: [],
+        entities: ['Berlin'],
+        confidence: 1.0,
+      }],
+    }));
+    const obs = new TypedNetworkObserver({ llm });
+    const facts = await obs.extract('blah', 'session-co');
+    expect(facts).toHaveLength(1);
+    expect(facts[0].bank).toBe('WORLD');
+  });
+
+  it('retries once when outer parse fails completely (spec section 6 retry path)', async () => {
+    // Spec §6: "malformed outputs are retried once with the validation
+    // error appended to the prompt." Originally specified, not implemented
+    // in shipping code. Only retries on catastrophic outer failure
+    // (invalid JSON, primitive value, missing facts key); per-fact errors
+    // are handled silently via tolerance above.
+    let calls = 0;
+    const llm: ITypedExtractionLLM = {
+      invoke: async () => {
+        calls += 1;
+        if (calls === 1) return 'definitely not json';
+        return JSON.stringify({
+          facts: [{
+            text: 'Berlin is in Germany',
+            bank: 'WORLD',
+            temporal: { mention: '2026-04-26' },
+            participants: [],
+            reasoning_markers: [],
+            entities: ['Berlin'],
+            confidence: 1.0,
+          }],
+        });
+      },
+    };
+    const obs = new TypedNetworkObserver({ llm });
+    const facts = await obs.extract('blah', 'session-rt');
+    expect(calls).toBe(2);
+    expect(facts).toHaveLength(1);
+  });
+
+  it('returns [] when retry also fails (no infinite retry loop)', async () => {
+    let calls = 0;
+    const llm: ITypedExtractionLLM = {
+      invoke: async () => {
+        calls += 1;
+        return 'still not json';
+      },
+    };
+    const obs = new TypedNetworkObserver({ llm });
+    const facts = await obs.extract('blah', 'session-rt2');
+    expect(calls).toBe(2);
+    expect(facts).toEqual([]);
   });
 
   it('tolerates triple-backtick code fence around JSON', async () => {
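
The tests reference a `mockLLM` helper and an `ITypedExtractionLLM` interface defined above the excerpted hunks. A plausible minimal shape, assuming `invoke` is the interface's only method and takes the same argument object the observer passes in the first diff:

```typescript
// Assumed minimal shape of the test double used throughout the suite:
// a canned-response LLM satisfying the single-method extraction interface.
interface ITypedExtractionLLM {
  invoke(args: {
    system: string;
    user: string;
    maxTokens?: number;
    temperature?: number;
  }): Promise<string>;
}

// Returns an LLM stub that ignores its prompt and echoes `response`.
function mockLLM(response: string): ITypedExtractionLLM {
  return { invoke: async () => response };
}
```

The stub ignores `system`/`user` entirely, which is exactly what the retry tests rely on: they count calls rather than inspect prompts.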
