Commit 207f887

fix(memory/typed-network): tolerant observer parsing for gpt-5-mini extraction
Phase 4c smoke surfaced 240+ zod validation errors across 47 extraction failures at gpt-5-mini. The shipped observer was strict-mode: any validation error on any field of any fact dropped the entire fact array for that session. Five tolerance fixes, all in the observer + schema:

1. **Auto-wrap top-level array.** gpt-5-mini frequently returns a bare facts array instead of `{facts: [...]}`. Detect `Array.isArray` after `JSON.parse` and wrap before schema validation. Covers 39 of the 240 errors.
2. **Per-fact tolerance.** Replace `TypedExtractionSchema.parse` (all-or-nothing) with `TypedExtractionFactSchema.safeParse` per fact in a new `extractFactsFromContainer` helper. Bad facts drop silently; good facts in the same response are kept. IDs are sequential post-drop indices for contiguous addressing.
3. **Schema defaults on optional arrays + temporal block.** `participants`, `reasoning_markers`, and `entities` default to `[]`; `temporal` defaults to `{mention: ''}`, and `temporal.mention` itself defaults to `''`. Downstream `rankByTemporalOverlap` already handles empty mention strings via its start/end interval fallback. Covers ~65 of the 240 errors.
4. **Bank coercion via `z.preprocess`.** Uppercase the bank value before the enum check so a lowercase `'world'` coerces to `'WORLD'` rather than dropping the fact. Covers ~65 of the 240 errors.
5. **Retry-on-outer-failure.** Spec section 6 says "malformed outputs are retried once with the validation error appended to the prompt"; this was specified but never shipped. Now implemented for catastrophic outer failures only (invalid JSON, primitive value, missing facts key). Per-fact failures do not trigger a retry. `MAX_ATTEMPTS = 2`.

Behavior change: `extract()` never throws on extractable input. Persistent catastrophic failure returns `[]`; bad individual facts are dropped silently. The three existing tests that asserted strict-mode throws (empty text, unknown bank, confidence outside [0, 1]) were updated to assert tolerant-mode drops, and six new tests cover the new tolerant behaviors.
Tests: 81/81 typed-network pass (was 75; +6 new). Wider memory suite: 742/742 pass.
1 parent: be7ded8 · commit: 207f887

3 files changed

Lines changed: 336 additions & 41 deletions
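
The schema-side changes (fixes 3 and 4) land in extraction-schema.ts, which the diffs below do not show. As a rough, stdlib-only sketch of the same per-fact tolerance surface — the real schema uses zod's defaults, `z.preprocess`, and `safeParse`, and the names `parseFact` and `TolerantFact`, plus every bank name other than `WORLD`, are hypothetical here:

```typescript
// Hypothetical stdlib-only model of the tolerant per-fact validation that
// extraction-schema.ts implements with zod. Returns null for a fact that
// should be dropped, mirroring safeParse's failure path.
type TolerantFact = {
  text: string;
  bank: string;
  temporal: { mention: string };
  participants: string[];
  reasoning_markers: string[];
  entities: string[];
  confidence: number;
};

// Only 'WORLD' appears in this commit; the other W/E/O/S expansions are guesses.
const BANKS = new Set(['WORLD', 'EXPERIENCE', 'OPINION', 'SELF']);

function parseFact(candidate: unknown): TolerantFact | null {
  if (typeof candidate !== 'object' || candidate === null) return null;
  const c = candidate as Record<string, unknown>;
  // Required: non-empty text and confidence in [0, 1].
  if (typeof c.text !== 'string' || c.text.length === 0) return null;
  if (typeof c.confidence !== 'number' || c.confidence < 0 || c.confidence > 1) {
    return null;
  }
  // Fix 4: uppercase-coerce bank before the enum check.
  const bank = typeof c.bank === 'string' ? c.bank.toUpperCase() : '';
  if (!BANKS.has(bank)) return null;
  // Fix 3: default the optional arrays and the temporal block.
  const strings = (v: unknown): string[] =>
    Array.isArray(v) ? v.filter((x): x is string => typeof x === 'string') : [];
  const t = c.temporal as { mention?: unknown } | null | undefined;
  const mention =
    t !== undefined && t !== null && typeof t.mention === 'string' ? t.mention : '';
  return {
    text: c.text,
    bank,
    temporal: { mention },
    participants: strings(c.participants),
    reasoning_markers: strings(c.reasoning_markers),
    entities: strings(c.entities),
    confidence: c.confidence,
  };
}
```

The null return mirrors the observer's drop-silently contract: callers filter out nulls instead of catching throws.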

src/memory/retrieval/typed-network/TypedNetworkObserver.ts

Lines changed: 125 additions & 22 deletions
@@ -2,18 +2,42 @@
  * @file TypedNetworkObserver.ts
  * @description LLM-driven extractor that turns a conversation block
  * into 0+ {@link TypedFact}s. Wraps the 6-step extraction prompt and
- * the zod-validated parsing of the LLM's structured-output response.
+ * the tolerant zod parsing of the LLM's structured-output response.
  *
  * Production wiring: a typical caller constructs the observer once per
  * pipeline (re-using the same `gpt-5-mini` adapter), then invokes
  * {@link TypedNetworkObserver.extract} per session. The returned facts
  * are then upserted into a {@link TypedNetworkStore} and embedded by
  * the host's {@link IEmbeddingManager}.
  *
+ * **Tolerance design (Phase 4c smoke fix):** the parser accepts the
+ * common deviations gpt-5-mini emits at scale, rather than throwing on
+ * any deviation:
+ *
+ * 1. **Code-fence stripping**: triple-backtick fences (with or without
+ *    language tag) are removed before JSON parse.
+ * 2. **Top-level array auto-wrap**: a bare `[fact, fact]` is wrapped
+ *    as `{facts: [...]}` before schema validation.
+ * 3. **Per-fact tolerance**: facts are validated one at a time via
+ *    `TypedExtractionFactSchema.safeParse`. Bad facts are dropped
+ *    silently; good facts in the same response are kept.
+ * 4. **Schema-level defaults**: `temporal`, `participants`,
+ *    `reasoning_markers`, and `entities` default to sensible empties
+ *    when the LLM omits them. `bank` is uppercase-coerced. See
+ *    {@link TypedExtractionFactSchema} for the full tolerance surface.
+ * 5. **Retry-on-outer-failure**: if the catastrophic outer parse
+ *    fails (invalid JSON, primitive value, neither array nor object
+ *    with `facts`), the extractor retries once with the validation
+ *    error appended to the user prompt. Implements spec section 6's
+ *    retry path that was specified but never shipped.
+ *
+ * The extract method NEVER throws on extractable input; persistent
+ * outer failure returns `[]` so the caller can continue ingest.
+ *
  * @module @framers/agentos/memory/retrieval/typed-network/TypedNetworkObserver
  */
 
-import { TypedExtractionSchema } from './prompts/extraction-schema.js';
+import { TypedExtractionFactSchema } from './prompts/extraction-schema.js';
 import {
   TYPED_EXTRACTION_SYSTEM_PROMPT,
   buildExtractionUserPrompt,
@@ -48,6 +72,13 @@ export interface TypedNetworkObserverOptions {
   temperature?: number;
 }
 
+/**
+ * Maximum total LLM invocations per `extract` call. The first attempt
+ * uses the base prompt; the second appends the validation error from
+ * the first attempt for the model to self-correct against.
+ */
+const MAX_ATTEMPTS = 2;
+
 /**
  * The 6-step extractor. Stateless aside from its constructor options;
  * safe to share across concurrent extractions.
@@ -64,33 +95,104 @@ export class TypedNetworkObserver {
   }
 
   /**
-   * Extract typed facts from a conversation block. Uses the 6-step
-   * prompt + zod-validated parsing. The resulting facts have stable
-   * IDs of the form `<sessionId>-fact-<index>` so re-extraction
-   * against the same content reproduces the same IDs.
+   * Extract typed facts from a conversation block.
+   *
+   * Resulting facts have stable IDs of the form
+   * `<sessionId>-fact-<index>`, where `<index>` is the sequential
+   * POST-DROP position so dropped facts produce contiguous IDs in the
+   * returned array.
+   *
+   * **Never throws on extractable input.** Catastrophic outer parse
+   * failures (invalid JSON, primitive value, missing facts key) get
+   * one retry; persistent failure returns `[]`. Bad individual facts
+   * are dropped silently via per-fact `safeParse`.
    *
    * @param sessionText - Full conversation text. Will be wrapped in
    *   the user prompt's delimiters automatically.
    * @param sessionId - Stable identifier used to namespace the
    *   resulting fact IDs.
    * @returns Array of {@link TypedFact}s, possibly empty.
-   * @throws ZodError if the LLM output fails schema validation.
-   * @throws SyntaxError if the LLM output is not valid JSON.
    */
   async extract(sessionText: string, sessionId: string): Promise<TypedFact[]> {
-    const raw = await this.llm.invoke({
-      system: TYPED_EXTRACTION_SYSTEM_PROMPT,
-      user: buildExtractionUserPrompt(sessionText),
-      maxTokens: this.maxTokens,
-      temperature: this.temperature,
-    });
-    // Strip markdown code fences if the LLM wraps the JSON in them
-    // (some models do this even with explicit "no commentary" prompts).
-    const stripped = stripCodeFence(raw);
-    const json = JSON.parse(stripped);
-    const parsed = TypedExtractionSchema.parse(json);
-    return parsed.facts.map((f, idx) => ({
-      id: `${sessionId}-fact-${idx}`,
+    const baseUserPrompt = buildExtractionUserPrompt(sessionText);
+    let lastValidationError: string | null = null;
+
+    for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt += 1) {
+      // First attempt uses the bare prompt; retry appends the
+      // validation error so the model can self-correct.
+      const userPrompt =
+        lastValidationError === null
+          ? baseUserPrompt
+          : `${baseUserPrompt}\n\nThe previous response failed validation: ${lastValidationError}\nReturn JSON matching the schema strictly. Do not add commentary.`;
+
+      const raw = await this.llm.invoke({
+        system: TYPED_EXTRACTION_SYSTEM_PROMPT,
+        user: userPrompt,
+        maxTokens: this.maxTokens,
+        temperature: this.temperature,
+      });
+
+      const stripped = stripCodeFence(raw);
+
+      // Parse JSON. SyntaxError captures bad-JSON outer failures into
+      // the retry path.
+      let json: unknown;
+      try {
+        json = JSON.parse(stripped);
+      } catch (err) {
+        lastValidationError = err instanceof Error ? err.message : String(err);
+        continue;
+      }
+
+      // Auto-wrap top-level array. gpt-5-mini frequently emits a bare
+      // facts array instead of `{facts: [...]}`; this recovers the
+      // most common deviation.
+      const container = Array.isArray(json) ? { facts: json } : json;
+
+      // Outer-shape validation. We accept any object with a `facts`
+      // array; per-fact validation runs in `extractFactsFromContainer`.
+      if (
+        typeof container !== 'object' ||
+        container === null ||
+        !('facts' in container) ||
+        !Array.isArray((container as { facts: unknown }).facts)
+      ) {
+        lastValidationError =
+          'expected JSON object with a "facts" array; got unexpected outer shape';
+        continue;
+      }
+
+      return extractFactsFromContainer(
+        (container as { facts: unknown[] }).facts,
+        sessionId,
+      );
+    }
+
+    // Both attempts failed at the outer layer; return empty rather
+    // than throwing so the caller can continue ingest. The caller is
+    // responsible for downstream "no typed facts in this session"
+    // semantics.
+    return [];
+  }
+}
+
+/**
+ * Run per-fact tolerance over a candidate array. Returns only the
+ * facts that pass {@link TypedExtractionFactSchema} validation;
+ * silently drops the rest. IDs are sequential post-drop indices to
+ * keep the output array contiguously addressable.
+ */
+function extractFactsFromContainer(
+  candidates: unknown[],
+  sessionId: string,
+): TypedFact[] {
+  const facts: TypedFact[] = [];
+  for (const candidate of candidates) {
+    const result = TypedExtractionFactSchema.safeParse(candidate);
+    if (!result.success) continue;
+    const f = result.data;
+    facts.push({
+      id: `${sessionId}-fact-${facts.length}`,
       bank: f.bank,
       text: f.text,
      embedding: [],
@@ -99,8 +201,9 @@ export class TypedNetworkObserver {
       reasoningMarkers: f.reasoning_markers,
       entities: f.entities,
       confidence: f.confidence,
-    }));
+    });
   }
+  return facts;
 }
 
 /**
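
For context, the `stripCodeFence` helper this diff calls predates the commit and is not among its changed lines. A plausible minimal re-implementation matching the documented behavior (fences with or without a language tag); anything beyond that is assumed:

```typescript
// Hypothetical re-implementation of stripCodeFence (the real helper is not
// part of this commit). Removes one wrapping triple-backtick fence, with or
// without a language tag, and trims; non-fenced input is only trimmed.
function stripCodeFence(raw: string): string {
  const trimmed = raw.trim();
  // Optional language tag after the opening fence, lazy capture of the body.
  const match = trimmed.match(/^```[a-zA-Z]*\s*([\s\S]*?)\s*```$/);
  return match ? match[1] : trimmed;
}
```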

src/memory/retrieval/typed-network/__tests__/TypedNetworkObserver.test.ts

Lines changed: 156 additions & 6 deletions
@@ -79,13 +79,14 @@ describe('TypedNetworkObserver', () => {
     expect(facts[0].reasoningMarkers).toEqual(['Because', 'we use']);
   });
 
-  it('throws on missing required field (zod validation)', async () => {
+  it('drops fact with empty text (per-fact tolerance)', async () => {
     const llm = mockLLM('{"facts": [{"text": ""}]}');
     const obs = new TypedNetworkObserver({ llm });
-    await expect(obs.extract('blah', 'session-2')).rejects.toThrow();
+    const facts = await obs.extract('blah', 'session-2');
+    expect(facts).toEqual([]);
   });
 
-  it('throws on unknown bank label', async () => {
+  it('drops fact with bank label that does not coerce to W/E/O/S', async () => {
     const llm = mockLLM(JSON.stringify({
       facts: [{
         text: 'foo',
@@ -98,10 +99,11 @@ describe('TypedNetworkObserver', () => {
       }],
     }));
     const obs = new TypedNetworkObserver({ llm });
-    await expect(obs.extract('text', 's1')).rejects.toThrow();
+    const facts = await obs.extract('text', 's1');
+    expect(facts).toEqual([]);
   });
 
-  it('throws on confidence outside [0, 1]', async () => {
+  it('drops fact with confidence outside [0, 1]', async () => {
     const llm = mockLLM(JSON.stringify({
       facts: [{
         text: 'foo',
@@ -114,7 +116,155 @@ describe('TypedNetworkObserver', () => {
       }],
     }));
     const obs = new TypedNetworkObserver({ llm });
-    await expect(obs.extract('text', 's1')).rejects.toThrow();
+    const facts = await obs.extract('text', 's1');
+    expect(facts).toEqual([]);
+  });
+
+  // -------------------------------------------------------------------------
+  // Tolerance fixes (Phase 4c smoke surfaced 240+ zod errors at gpt-5-mini)
+  // -------------------------------------------------------------------------
+
+  it('auto-wraps top-level array as {facts: ...} when LLM omits the wrapping object', async () => {
+    // gpt-5-mini frequently returns a bare facts array instead of {facts: [...]}.
+    // The observer detects this shape and wraps it so the rest of the pipeline
+    // works unchanged.
+    const llm = mockLLM(JSON.stringify([{
+      text: 'Berlin is in Germany',
+      bank: 'WORLD',
+      temporal: { mention: '2026-04-26' },
+      participants: [],
+      reasoning_markers: [],
+      entities: ['Berlin', 'Germany'],
+      confidence: 1.0,
+    }]));
+    const obs = new TypedNetworkObserver({ llm });
+    const facts = await obs.extract('User: Where is Berlin?', 'session-aw');
+    expect(facts).toHaveLength(1);
+    expect(facts[0].bank).toBe('WORLD');
+    expect(facts[0].entities).toContain('Berlin');
+  });
+
+  it('drops invalid facts and keeps valid facts in the same response', async () => {
+    // Per-fact tolerance: one bad apple does not spoil the bunch. The
+    // shipped strict-mode parser threw on any single-fact failure, losing
+    // every other fact in the same extraction call.
+    const llm = mockLLM(JSON.stringify({
+      facts: [
+        {
+          text: 'Berlin is in Germany',
+          bank: 'WORLD',
+          temporal: { mention: '2026-04-26' },
+          participants: [],
+          reasoning_markers: [],
+          entities: ['Berlin'],
+          confidence: 1.0,
+        },
+        null,
+        'a string fact',
+        {
+          text: 'Munich is in Germany',
+          bank: 'WORLD',
+          temporal: { mention: '2026-04-26' },
+          participants: [],
+          reasoning_markers: [],
+          entities: ['Munich'],
+          confidence: 1.0,
+        },
+      ],
+    }));
+    const obs = new TypedNetworkObserver({ llm });
+    const facts = await obs.extract('blah', 'session-pft');
+    expect(facts).toHaveLength(2);
+    expect(facts.map((f) => f.text)).toEqual([
+      'Berlin is in Germany',
+      'Munich is in Germany',
+    ]);
+  });
+
+  it('defaults missing array fields (participants, reasoning_markers, entities) to []', async () => {
+    // gpt-5-mini frequently omits empty array fields entirely instead of
+    // emitting them as []. The schema accepts the missing fields and fills
+    // in [].
+    const llm = mockLLM(JSON.stringify({
+      facts: [{
+        text: 'Berlin is in Germany',
+        bank: 'WORLD',
+        temporal: { mention: '2026-04-26' },
+        confidence: 1.0,
+        // missing: participants, reasoning_markers, entities
+      }],
+    }));
+    const obs = new TypedNetworkObserver({ llm });
+    const facts = await obs.extract('blah', 'session-def');
+    expect(facts).toHaveLength(1);
+    expect(facts[0].participants).toEqual([]);
+    expect(facts[0].reasoningMarkers).toEqual([]);
+    expect(facts[0].entities).toEqual([]);
+  });
+
+  it('coerces lowercase bank to uppercase before validation', async () => {
+    // The 6-step prompt instructs UPPERCASE banks but the LLM sometimes
+    // emits lowercase. A single uppercase coercion at parse time recovers
+    // the fact instead of dropping it.
+    const llm = mockLLM(JSON.stringify({
+      facts: [{
+        text: 'Berlin is in Germany',
+        bank: 'world',
+        temporal: { mention: '2026-04-26' },
+        participants: [],
+        reasoning_markers: [],
+        entities: ['Berlin'],
+        confidence: 1.0,
+      }],
+    }));
+    const obs = new TypedNetworkObserver({ llm });
+    const facts = await obs.extract('blah', 'session-co');
+    expect(facts).toHaveLength(1);
+    expect(facts[0].bank).toBe('WORLD');
+  });
+
+  it('retries once when outer parse fails completely (spec section 6 retry path)', async () => {
+    // Spec §6: "malformed outputs are retried once with the validation
+    // error appended to the prompt." Originally specified, not implemented
+    // in shipping code. Only retries on catastrophic outer failure
+    // (invalid JSON, primitive value, missing facts key); per-fact errors
+    // are handled silently via tolerance above.
+    let calls = 0;
+    const llm: ITypedExtractionLLM = {
+      invoke: async () => {
+        calls += 1;
+        if (calls === 1) return 'definitely not json';
+        return JSON.stringify({
+          facts: [{
+            text: 'Berlin is in Germany',
+            bank: 'WORLD',
+            temporal: { mention: '2026-04-26' },
+            participants: [],
+            reasoning_markers: [],
+            entities: ['Berlin'],
+            confidence: 1.0,
+          }],
+        });
+      },
+    };
+    const obs = new TypedNetworkObserver({ llm });
+    const facts = await obs.extract('blah', 'session-rt');
+    expect(calls).toBe(2);
+    expect(facts).toHaveLength(1);
+  });
+
+  it('returns [] when retry also fails (no infinite retry loop)', async () => {
+    let calls = 0;
+    const llm: ITypedExtractionLLM = {
+      invoke: async () => {
+        calls += 1;
+        return 'still not json';
+      },
+    };
+    const obs = new TypedNetworkObserver({ llm });
+    const facts = await obs.extract('blah', 'session-rt2');
+    expect(calls).toBe(2);
+    expect(facts).toEqual([]);
   });
 
   it('tolerates triple-backtick code fence around JSON', async () => {
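
The tests reference a `mockLLM` helper and an `ITypedExtractionLLM` interface defined above the excerpted hunks. A plausible minimal shape, assuming `invoke` is the interface's only method and takes the same argument object the observer passes in the first diff:

```typescript
// Assumed minimal shape of the test double used throughout the suite:
// a canned-response LLM satisfying the single-method extraction interface.
interface ITypedExtractionLLM {
  invoke(args: {
    system: string;
    user: string;
    maxTokens?: number;
    temperature?: number;
  }): Promise<string>;
}

// Returns an LLM stub that ignores its prompt and echoes `response`.
function mockLLM(response: string): ITypedExtractionLLM {
  return { invoke: async () => response };
}
```

The stub ignores `system`/`user` entirely, which is exactly what the retry tests rely on: they count calls rather than inspect prompts.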
