obversary/memory-guided-eval

Memory-guided eval

A small research harness for studying memory-guided document (and workflow) transformations: what to do after extraction, how to rank candidate pipelines, and how to learn from traces without collapsing routing and learning into one opaque loop.

Core loop

Memory → Suggest → User Choice → Transform → Evaluate
   ↑                                      ↓
   └──────────── Learn / Update ─────────┘
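
The cycle above can be sketched in a few lines of Python. This is a toy illustration, not the repo's run_eval.py. The pipeline names, the `Memory` dataclass, the evaluator, and the learning rate are all hypothetical, and the ranking ignores document content. It does keep one property the design depends on: `suggest` only reads memory, while `learn` returns a new snapshot instead of mutating the old one.

```python
from dataclasses import dataclass, field

# Hypothetical toy pipelines: each maps document text to some transformed output.
PIPELINES = {
    "summarize": lambda text: text[:40],
    "extract_tables": lambda text: [line for line in text.splitlines() if "|" in line],
    "chunk": lambda text: [text[i:i + 50] for i in range(0, len(text), 50)],
}


@dataclass
class Memory:
    # One weight per pipeline; every pipeline starts at 1.0 (hypothetical prior).
    weights: dict = field(default_factory=lambda: {name: 1.0 for name in PIPELINES})


def suggest(memory, doc):
    """Fast loop: rank pipelines by weight, ties broken by name. Never writes memory.

    This toy ranking ignores `doc`; a real router would also score document features.
    """
    return sorted(memory.weights, key=lambda name: (-memory.weights[name], name))


def evaluate(output):
    """Toy evaluator: 1.0 if the pipeline produced anything, else 0.0."""
    return 1.0 if output else 0.0


def learn(memory, log, lr=0.5):
    """Slow loop: return a NEW Memory with chosen weights moved toward observed scores."""
    weights = dict(memory.weights)
    for entry in log:
        name = entry["pipeline"]
        weights[name] += lr * (entry["score"] - weights[name])
    return Memory(weights)


def run_cycle(memory, doc, choose=lambda ranked: ranked[0]):
    ranked = suggest(memory, doc)        # Memory -> Suggest
    choice = choose(ranked)              # User Choice (default: accept the top suggestion)
    output = PIPELINES[choice](doc)      # Transform
    score = evaluate(output)             # Evaluate
    log = [{"pipeline": choice, "score": score}]
    return learn(memory, log), ranked    # Learn / Update produces the next snapshot


mem = Memory()
doc = "Quarterly report.\nRevenue grew."
mem, first = run_cycle(mem, doc, choose=lambda ranked: "extract_tables")
print(first)              # ['chunk', 'extract_tables', 'summarize']
print(suggest(mem, doc))  # ['chunk', 'summarize', 'extract_tables']
```

The user picks `extract_tables`, which finds no table rows in this document, so its weight drops from 1.0 to 0.5 and it falls to the bottom of the next ranking.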

Why this exists

Many pipelines stop at “PDF → text / chunks / embeddings.” The harder systems question is:

Given a document (or signal), what transformation should happen next?

This repo treats that as a decision problem: rank actions using a fast loop (read-only routing), log outcomes, and apply a slow loop (batch-style updates to weights and failure counts).

Architecture: fast vs slow

Piece       Role                                                May mutate long-term memory?
Fast loop   Primer + scoring → ranked pipeline suggestions      No
Slow loop   Aggregate logs → update weights / failure modes     Yes
Memory      Priors, weights, failure counts (shared snapshot)   Updated only by slow loop

Design rule: the fast loop does not call memory.update(...). It ranks and suggests; execution and logging feed the slow loop.
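
One way to make this rule hard to break by accident is to hand the fast loop a read-only view of memory. The sketch below assumes memory is just two maps (weights and failure counts) and uses the standard library's `types.MappingProxyType` for the read-only view. The names `snapshot`, `fast_loop`, and `slow_loop`, the per-failure penalty, and the learning rate are hypothetical; this is not the code in router/attention_router or optimize/constraint_optimizer.

```python
from collections import Counter
from types import MappingProxyType


def snapshot(weights, failures):
    """Freeze memory: the fast loop can read this mapping but cannot assign into it."""
    return MappingProxyType({
        "weights": MappingProxyType(dict(weights)),
        "failures": MappingProxyType(dict(failures)),
    })


def fast_loop(memory, candidates, penalty=0.1):
    """Rank candidates by weight minus a penalty per recorded failure. Read-only."""
    def score(name):
        return memory["weights"].get(name, 0.0) - penalty * memory["failures"].get(name, 0)
    return sorted(candidates, key=lambda name: (-score(name), name))


def slow_loop(memory, logs, lr=0.2):
    """Aggregate a batch of logs, then return a NEW snapshot; the old one is untouched."""
    weights = dict(memory["weights"])
    failures = Counter(dict(memory["failures"]))
    rewards = {}
    for entry in logs:
        rewards.setdefault(entry["pipeline"], []).append(1.0 if entry["ok"] else 0.0)
        if not entry["ok"]:
            failures[entry["pipeline"]] += 1
    for name, rs in rewards.items():
        # One update per pipeline, toward the batch's mean reward.
        old = weights.get(name, 0.0)
        weights[name] = old + lr * (sum(rs) / len(rs) - old)
    return snapshot(weights, failures)


mem = snapshot({"ocr": 0.6, "summarize": 0.5}, {})
print(fast_loop(mem, ["ocr", "summarize"]))  # ['ocr', 'summarize']

logs = [
    {"pipeline": "ocr", "ok": False},
    {"pipeline": "ocr", "ok": False},
    {"pipeline": "summarize", "ok": True},
]
new_mem = slow_loop(mem, logs)
print(fast_loop(new_mem, ["ocr", "summarize"]))  # ['summarize', 'ocr']
```

Two failed `ocr` runs both lower its weight and add failure counts, so the next ranking flips. A write such as `mem["weights"]["ocr"] = 1.0` raises `TypeError`; the proxy guards against accidental writes from the fast loop rather than acting as a security boundary.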

Failure clusters as interventions

A failure cluster becomes useful when it changes the system.

This repo treats clustered failure modes as candidate intervention signals for:

  • routing policy
  • memory policy
  • tool constraints
  • bandit weights

See docs/failure_clusters_as_interventions.md.
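
A minimal sketch of turning failure clusters into intervention candidates, assuming each logged failure carries a pipeline name and a failure-mode label. Grouping by exact `(pipeline, mode)` key stands in for real clustering; the mode labels, the mode-to-target mapping, and the `min_size` threshold are hypothetical. The output is a list of proposals aimed at the four targets listed above, not changes applied to the system.

```python
from collections import Counter

# Hypothetical failure-mode labels mapped to the four intervention targets.
TARGET_FOR_MODE = {
    "wrong_pipeline_suggested": "routing policy",
    "stale_prior": "memory policy",
    "tool_timeout": "tool constraints",
    "repeated_low_reward": "bandit weights",
}


def cluster_failures(failures):
    """Group failures by exact (pipeline, mode) key: a stand-in for real clustering."""
    return Counter((f["pipeline"], f["mode"]) for f in failures)


def propose_interventions(failures, min_size=3):
    """Turn each cluster with at least min_size members into a candidate (not an action)."""
    proposals = []
    for (pipeline, mode), size in cluster_failures(failures).most_common():
        if size < min_size:
            continue  # too small to justify changing the system
        proposals.append({
            "target": TARGET_FOR_MODE.get(mode, "unmapped: needs triage"),
            "pipeline": pipeline,
            "mode": mode,
            "evidence": size,
        })
    return proposals


failures = (
    [{"pipeline": "extract_tables", "mode": "tool_timeout"}] * 3
    + [{"pipeline": "summarize", "mode": "stale_prior"}]
)
for proposal in propose_interventions(failures):
    print(proposal)
# {'target': 'tool constraints', 'pipeline': 'extract_tables', 'mode': 'tool_timeout', 'evidence': 3}
```

The single `stale_prior` failure stays below the threshold, so only the repeated timeout becomes a candidate. Keeping the `evidence` count on each proposal lets a reviewer (or the slow loop) decide whether it is worth acting on.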

Repository layout

memory-guided-eval/
├── docs/             # Design notes (e.g., failure clusters as interventions)
├── memory/           # Priors, clippings, failure examples (on disk)
├── eval/             # Experiments and logs
├── interventions/    # Candidate policies from cluster signals (scaffold)
├── pipelines/        # Transformation stubs — replace with your steps
├── router/           # Fast loop (attention_router)
├── optimize/         # Slow loop (constraint_optimizer)
└── run_eval.py       # Minimal demo

Run the demo

From the repo root:

python run_eval.py

This prints ranked suggestions for a sample document, then applies one slow-loop step from a synthetic log line.

Honest scope (anti-overfitting)

This repository does not claim:

  • Production readiness, SLOs, or security review
  • Generalization beyond the toy scoring rules and hand-authored priors shown here
  • Parity with full RL, bandits, or neural routers — this is explicitly a small, inspectable scaffold
  • Novelty in the literature; it is an implementation sketch of a common pattern (decision vs learning separation)

Evidence posture

  • Verified here: run_eval.py runs; fast and slow loops are importable and deterministic for the demo inputs
  • Described but not verified: regret-based updates, richer priors, embedding-based retrieval, real pipeline execution — roadmap, not results
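
The determinism claim in the first bullet can be checked mechanically. A common source of nondeterminism in ranking is tie order, so the sketch below re-ranks shuffled copies of the same candidates and requires identical results every time. `rank` is a hypothetical stand-in for the fast loop, not the repo's router; the trial count and seed are arbitrary.

```python
import random


def rank(weights, candidates):
    """Toy fast-loop ranking: higher weight first, ties broken by name."""
    return sorted(candidates, key=lambda name: (-weights.get(name, 0.0), name))


def check_deterministic(rank_fn, weights, candidates, trials=20, seed=0):
    """Re-rank shuffled copies of `candidates`; every result must match the first."""
    rng = random.Random(seed)  # seeded, so the check itself is reproducible
    expected = rank_fn(weights, list(candidates))
    for _ in range(trials):
        shuffled = list(candidates)
        rng.shuffle(shuffled)
        if rank_fn(weights, shuffled) != expected:
            return False
    return True


weights = {"ocr": 0.5, "chunk": 0.5, "summarize": 0.9}
print(check_deterministic(rank, weights, ["ocr", "chunk", "summarize"]))  # True
```

A ranking that sorts on weight alone can fail this check: Python's `sorted` is stable, so tied candidates keep whatever order they arrived in.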

Related writing

The public obversarystudios.org book has been reset to ground zero. Until a new site page exists, use this README and the issues in this repo for narrative alongside the code.

Name on GitHub

Suggested short description:

Lightweight harness for memory-guided document workflows: fast-loop routing, slow-loop learning, and failure-weighted evaluation traces.

Suggested topics: ai-systems, document-processing, evaluation, human-in-the-loop, workflow, rag.

License

MIT — see LICENSE.
