taOSmd: triple-gate dedup + claim-level memify granularity #210

@jaylfc

Follow-up to feedback from @m13v on #198. Two concrete changes to the memify / compression pipeline.

Triple-gate before cosine dedup

Current compression pass dedups purely on cosine similarity over normalised chunks. This merges memories that are semantically close but factually distinct:

  • "met with Jay on Monday about budgets"
  • "met with Jay on Tuesday about hiring"

Cosine similarity between the two is >0.9, so they get merged, but they're different events. Fix:

  1. Extract (subject, relation, object) triple from each candidate chunk.
  2. Bucket candidates by overlapping triples.
  3. Apply cosine similarity dedup within a bucket only — cosine becomes a secondary filter, not the primary key.
  4. Across-bucket merges are never allowed regardless of cosine score.

Needs an entity+predicate extractor in the compression pipeline. We already have the machinery in the KG extraction step; the work is wiring it as a pre-dedup gate instead of a parallel concern.
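A minimal sketch of the gate described above, assuming each candidate chunk already carries its extracted triples and an embedding. `Chunk`, `triple_gated_dedup` and the example triples are hypothetical names for illustration, not the existing taOSmd API. Here "overlapping" is taken to mean "shares at least one identical triple"; chunks with no extracted triples are never merged.

```python
from collections import defaultdict
from math import sqrt
from typing import NamedTuple

Triple = tuple[str, str, str]  # (subject, relation, object)


class Chunk(NamedTuple):
    text: str
    embedding: list[float]
    triples: frozenset[Triple]  # assumed output of the KG extractor


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def triple_gated_dedup(chunks: list[Chunk], threshold: float = 0.9) -> list[Chunk]:
    # Gate 1: bucket chunk indices by each triple they contain.
    buckets: dict[Triple, list[int]] = defaultdict(list)
    for index, chunk in enumerate(chunks):
        for triple in chunk.triples:
            buckets[triple].append(index)

    # Gate 2: cosine is only ever compared inside a bucket,
    # so chunks that share no triple can never be merged.
    dropped: set[int] = set()
    for indices in buckets.values():
        for position, i in enumerate(indices):
            if i in dropped:
                continue
            for j in indices[position + 1:]:
                if j in dropped:
                    continue
                if cosine(chunks[i].embedding, chunks[j].embedding) >= threshold:
                    dropped.add(j)  # keep the earlier chunk, drop the later one

    return [chunk for index, chunk in enumerate(chunks) if index not in dropped]


# Toy data: embeddings are near-identical, triples differ for Monday vs Tuesday.
monday = Chunk("met with Jay on Monday about budgets", [1.0, 0.10],
               frozenset({("Jay", "discussed", "budgets")}))
tuesday = Chunk("met with Jay on Tuesday about hiring", [1.0, 0.12],
                frozenset({("Jay", "discussed", "hiring")}))
restated = Chunk("Monday budget meeting with Jay", [1.0, 0.11],
                 frozenset({("Jay", "discussed", "budgets")}))

kept = triple_gated_dedup([monday, tuesday, restated])
```

With this toy input, `restated` shares a triple with `monday` and is dropped, while `tuesday` survives despite its high cosine score. A real implementation would need to decide how to treat partially matching triples (same subject, different object), which this sketch does not attempt.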

Claim-level memify granularity

Current pipeline over-chunks: one paragraph can produce 4+ overlapping facts that all carry identical metadata. The storage waste is manageable, but retrieval quality suffers because redundant atomic facts dilute the ranking.

Move to one-assertable-claim-per-node, and let the graph edges carry composition. Example:

  • ❌ Today: 4 nodes for "Jay works at JAN LABS", "JAN LABS is a company", "Jay is employed", "Jay's employer is JAN LABS"
  • ✅ Target: 1 node {subject: Jay, predicate: works_at, object: JAN LABS} — composition lives on the edges
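One way the target shape could look, as a rough sketch only. `Claim`, `ClaimGraph`, `PREDICATE_ALIASES` and the `same_subject` edge label are all hypothetical and not taken from the current codebase; the alias table stands in for whatever normalisation the extractor ends up doing.

```python
from dataclasses import dataclass, field

# Hypothetical alias table: restatements of one relation map to one predicate.
PREDICATE_ALIASES = {
    "employer_is": "works_at",
    "is_employed_by": "works_at",
}


@dataclass(frozen=True)
class Claim:
    subject: str
    predicate: str
    object: str


def canonical(subject: str, predicate: str, obj: str) -> Claim:
    # Normalise the predicate so restated facts hash to the same node.
    return Claim(subject, PREDICATE_ALIASES.get(predicate, predicate), obj)


@dataclass
class ClaimGraph:
    claims: set[Claim] = field(default_factory=set)
    edges: set[tuple[Claim, str, Claim]] = field(default_factory=set)

    def add(self, claim: Claim) -> bool:
        """Store the claim; return False if an identical node already exists."""
        if claim in self.claims:
            return False
        self.claims.add(claim)
        return True

    def link(self, source: Claim, relation: str, target: Claim) -> None:
        # Composition is expressed as an edge between claim nodes,
        # not as extra overlapping nodes.
        self.add(source)
        self.add(target)
        self.edges.add((source, relation, target))

    def neighbours(self, claim: Claim) -> list[Claim]:
        return [target for source, _, target in self.edges if source == claim]


graph = ClaimGraph()
for raw in [("Jay", "works_at", "JAN LABS"),
            ("Jay", "employer_is", "JAN LABS"),
            ("Jay", "is_employed_by", "JAN LABS")]:
    graph.add(canonical(*raw))

employment = canonical("Jay", "works_at", "JAN LABS")
meeting = canonical("Jay", "discussed", "budgets")
graph.link(employment, "same_subject", meeting)
```

After the loop the three restatements collapse into a single `works_at` node; the later `link` call adds a second, distinct claim and connects the two with an edge rather than duplicating either fact.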

Acceptance

  • Triple extractor gates dedup; Monday/Tuesday test case stays as two distinct memories
  • memify output drops to ~1 claim per assertable statement; benchmark LongMemEval-S retrieval to verify Recall@5 doesn't regress
  • Graph traversal replaces node-level composition in at least one retrieval path
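For the Recall@5 check, a small helper along these lines could compare a baseline run against the new pipeline. This uses a common definition of Recall@k (fraction of relevant items that appear in the top k results); whether it matches the LongMemEval-S scoring exactly is an assumption to confirm against the benchmark's own tooling. The function names and the `tolerance` parameter are hypothetical.

```python
from statistics import mean


def recall_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int = 5) -> float:
    """Fraction of relevant items that appear in the top-k ranked results."""
    if not relevant_ids:
        raise ValueError("relevant_ids must not be empty")
    hits = len(relevant_ids & set(ranked_ids[:k]))
    return hits / len(relevant_ids)


def mean_recall_at_k(runs: list[tuple[list[str], set[str]]], k: int = 5) -> float:
    """Average Recall@k over (ranked_ids, relevant_ids) pairs, one per query."""
    return mean(recall_at_k(ranked, relevant, k) for ranked, relevant in runs)


def has_regressed(baseline: float, candidate: float, tolerance: float = 0.0) -> bool:
    """True if the candidate score falls below the baseline by more than tolerance."""
    return candidate < baseline - tolerance
```

A regression test would run both pipeline versions over the same queries, feed the rankings through `mean_recall_at_k`, and fail when `has_regressed` returns true.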

Labels: agents, enhancement, feature, kilo-duplicate, kilo-triaged
