A small research harness for studying memory-guided document (and workflow) transformations: what to do after extraction, how to rank candidate pipelines, and how to learn from traces without collapsing routing and learning into one opaque loop.
Memory → Suggest → User Choice → Transform → Evaluate
  ↑                                              ↓
  └─────────────── Learn / Update ───────────────┘
Many pipelines stop at “PDF → text / chunks / embeddings.” The harder systems question is:
Given a document (or signal), what transformation should happen next?
This repo treats that as a decision problem: rank actions using a fast loop (read-only routing), log outcomes, and apply a slow loop (batch-style updates to weights and failure counts).
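The ranking step of the fast loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the repo's actual `attention_router` code: `MemorySnapshot`, `rank_pipelines`, the pipeline names, and the scoring rule (learned weight minus a penalty per recorded failure) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MemorySnapshot:
    """Hypothetical view of memory handed to the fast loop.

    frozen=True blocks attribute reassignment; the dicts themselves are
    still ordinary dicts, so the fast loop must simply not write to them.
    """
    weights: dict = field(default_factory=dict)         # pipeline name -> learned weight
    failure_counts: dict = field(default_factory=dict)  # pipeline name -> observed failures


def rank_pipelines(candidates, snapshot, failure_penalty=0.1):
    """Return (name, score) pairs sorted best-first, reading memory only."""
    scored = []
    for name in candidates:
        weight = snapshot.weights.get(name, 0.0)
        failures = snapshot.failure_counts.get(name, 0)
        scored.append((name, weight - failure_penalty * failures))
    # Highest score first; ties are broken by name so the order is deterministic.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


snapshot = MemorySnapshot(
    weights={"summarize": 0.8, "extract_tables": 0.6, "translate": 0.5},
    failure_counts={"summarize": 4},
)
ranked = rank_pipelines(["summarize", "extract_tables", "translate"], snapshot)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

In this toy input, `summarize` has the highest weight but falls to last place because of its recorded failures, which is the kind of effect the logged outcomes are meant to produce.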
- Browse: github.com/architectfromthefuture/memory-guided-eval
- Clone:
git clone https://github.com/architectfromthefuture/memory-guided-eval.git
| Piece | Role | May mutate long-term memory? |
|---|---|---|
| Fast loop | Primer + scoring → ranked pipeline suggestions | No |
| Slow loop | Aggregate logs → update weights / failure modes | Yes |
| Memory | Priors, weights, failure counts (shared snapshot) | Updated only by slow loop |
Design rule: the fast loop does not call memory.update(...). It ranks and suggests; execution and logging feed the slow loop.
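One way to make this rule concrete is to give the fast loop a read-only view and let the slow loop return new state instead of mutating in place. The sketch below assumes a log record with `pipeline`, `reward`, and `failure_mode` fields and a simple step-toward-reward update; these names and the update rule are hypothetical and are not taken from the repo's `constraint_optimizer`.

```python
from collections import Counter
from types import MappingProxyType


def slow_loop_step(weights, failure_counts, log_records, learning_rate=0.1):
    """Aggregate logged outcomes into new weights and failure counts.

    Returns new dicts rather than mutating the inputs, so any view the
    fast loop holds stays unchanged until the caller swaps it out.
    """
    new_weights = dict(weights)
    new_failures = Counter(failure_counts)
    for record in log_records:
        name = record["pipeline"]
        old = new_weights.get(name, 0.0)
        # Move the weight a small step toward the observed reward.
        new_weights[name] = old + learning_rate * (record["reward"] - old)
        if record.get("failure_mode"):
            new_failures[name] += 1
    return new_weights, dict(new_failures)


weights = {"summarize": 0.5}
failures = {"summarize": 0}
# The fast loop only receives a read-only proxy; writing to it raises TypeError.
fast_view = MappingProxyType(weights)

logs = [
    {"pipeline": "summarize", "reward": 1.0, "failure_mode": None},
    {"pipeline": "summarize", "reward": 0.0, "failure_mode": "truncated_output"},
]
weights, failures = slow_loop_step(weights, failures, logs)
```

After the step, `fast_view` still reflects the old weights. Publishing the update is a separate, explicit action (building a new proxy over the new dict), which keeps routing and learning from blending into one loop.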
A failure cluster becomes useful when it changes the system.
This repo treats clustered failure modes as candidate intervention signals for:
- routing policy
- memory policy
- tool constraints
- bandit weights
See docs/failure_clusters_as_interventions.md.
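As a rough illustration of the idea, the sketch below groups failures by label and proposes an intervention target for each cluster that is large enough. The failure labels, the label-to-target mapping, and `propose_interventions` are invented for this example; clustering here is plain grouping by exact label, and the design note may describe something richer.

```python
from collections import Counter

# Hypothetical mapping from a failure-mode label to the intervention target
# it most plausibly informs. The labels are made up for this sketch.
INTERVENTION_TARGETS = {
    "wrong_pipeline_chosen": "routing policy",
    "stale_prior": "memory policy",
    "tool_timeout": "tool constraints",
    "low_reward": "bandit weights",
}


def propose_interventions(failure_labels, min_cluster_size=2):
    """Group failures by label and propose one intervention per large cluster."""
    clusters = Counter(failure_labels)
    proposals = []
    for label, count in clusters.most_common():
        if count < min_cluster_size:
            continue  # too few examples to justify changing the system
        target = INTERVENTION_TARGETS.get(label, "unassigned")
        proposals.append({"failure_mode": label, "count": count, "target": target})
    return proposals


labels = [
    "tool_timeout", "stale_prior", "tool_timeout",
    "tool_timeout", "stale_prior", "low_reward",
]
proposals = propose_interventions(labels)
for proposal in proposals:
    print(proposal)
```

The proposals are candidates only: nothing is applied automatically, which matches the scaffold status of `interventions/`.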
memory-guided-eval/
├── docs/ # Design notes (e.g., failure clusters as interventions)
├── memory/ # Priors, clippings, failure examples (on disk)
├── eval/ # Experiments and logs
├── interventions/ # Candidate policies from cluster signals (scaffold)
├── pipelines/ # Transformation stubs — replace with your steps
├── router/ # Fast loop (attention_router)
├── optimize/ # Slow loop (constraint_optimizer)
└── run_eval.py # Minimal demo
From the repo root:
python run_eval.py

This prints ranked suggestions for a sample document, then applies one slow-loop step from a synthetic log line.
This repository does not claim:
- Production readiness, SLOs, or security review
- Generalization beyond the toy scoring rules and hand-authored priors shown here
- Parity with full RL, bandits, or neural routers — this is explicitly a small, inspectable scaffold
- Novelty in the literature; it is an implementation sketch of a common pattern (decision vs learning separation)
Evidence posture
- Verified here: run_eval.py runs; fast and slow loops are importable and deterministic for the demo inputs
- Described but not verified: regret-based updates, richer priors, embedding-based retrieval, real pipeline execution (roadmap, not results)
The public obversarystudios.org book is reset (ground zero). For narrative alongside this code, use this README and issues in this repo until a new site page exists.
Suggested short description:
Lightweight harness for memory-guided document workflows: fast-loop routing, slow-loop learning, and failure-weighted evaluation traces.
Suggested topics: ai-systems, document-processing, evaluation, human-in-the-loop, workflow, rag.
MIT — see LICENSE.