Memory-guided eval

A small research harness for studying memory-guided document (and workflow) transformations: what to do after extraction, how to rank candidate pipelines, and how to learn from traces without collapsing routing and learning into one opaque loop.

Core loop

Memory → Suggest → User Choice → Transform → Evaluate
   ↑                                      ↓
   └──────────── Learn / Update ─────────┘

Why this exists

Many pipelines stop at “PDF → text / chunks / embeddings.” The harder systems question is:

Given a document (or signal), what transformation should happen next?

This repo treats that as a decision problem: rank actions using a fast loop (read-only routing), log outcomes, and apply a slow loop (batch-style updates to weights and failure counts).
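The fast-loop ranking described above can be sketched as follows. This is a minimal illustration, not the repo's actual scoring code: `MemorySnapshot`, `rank_pipelines`, the `failure_penalty` parameter, and the pipeline names are all hypothetical, and the linear score (weight minus a per-failure penalty) is an assumed toy rule.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class MemorySnapshot:
    """Hypothetical read-only view of memory; field names are assumptions."""
    weights: Mapping[str, float]
    failure_counts: Mapping[str, int]


def rank_pipelines(snapshot, candidates, failure_penalty=0.1):
    """Score each candidate from the snapshot and return (name, score) pairs, best first.

    Assumed toy rule: score = weight - failure_penalty * failure_count.
    Nothing here writes to memory; the function only reads the snapshot.
    """
    scored = []
    for name in candidates:
        weight = snapshot.weights.get(name, 0.0)
        failures = snapshot.failure_counts.get(name, 0)
        scored.append((name, weight - failure_penalty * failures))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative values only; these are not priors shipped with the repo.
snapshot = MemorySnapshot(
    weights={"summarize": 0.6, "extract_tables": 0.8, "translate": 0.3},
    failure_counts={"extract_tables": 4},
)
ranking = rank_pipelines(snapshot, ["summarize", "extract_tables", "translate"])
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this sketch, `extract_tables` has the highest raw weight but drops below `summarize` once its recorded failures are counted, which is the kind of effect the logged outcomes are meant to have on later suggestions.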

Published repository

Browse: github.com/architectfromthefuture/memory-guided-eval
Clone: git clone https://github.com/architectfromthefuture/memory-guided-eval.git

Architecture: fast vs slow

| Piece | Role | May mutate long-term memory? |
| --- | --- | --- |
| Fast loop | Primer + scoring → ranked pipeline suggestions | No |
| Slow loop | Aggregate logs → update weights / failure modes | Yes |
| Memory | Priors, weights, failure counts (shared snapshot) | Updated only by slow loop |

Design rule: the fast loop does not call memory.update(...). It ranks and suggests; execution and logging feed the slow loop.
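One way to make the design rule hard to break by accident is to hand the fast loop a read-only snapshot instead of the memory object itself. The sketch below assumes a hypothetical `Memory` class with `snapshot()` and `update()` methods. Only the name `memory.update(...)` comes from this README; its signature here is invented for illustration.

```python
from types import MappingProxyType


class Memory:
    """Hypothetical memory store; only the slow loop should hold this object."""

    def __init__(self, weights):
        self._weights = dict(weights)

    def snapshot(self):
        # Copy first, then wrap: the copy isolates the snapshot from later
        # updates, and MappingProxyType rejects item assignment.
        return MappingProxyType(dict(self._weights))

    def update(self, pipeline, delta):
        # Assumed signature; intended to be called from the slow loop only.
        self._weights[pipeline] = self._weights.get(pipeline, 0.0) + delta


memory = Memory({"summarize": 0.5})
view = memory.snapshot()  # this is all the fast loop would receive

try:
    view["summarize"] = 1.0  # a fast-loop write attempt
    fast_loop_mutated = True
except TypeError:
    fast_loop_mutated = False

memory.update("summarize", -0.25)  # slow-loop write

print(fast_loop_mutated)
print(view["summarize"], memory.snapshot()["summarize"])
```

The earlier snapshot keeps its old value after the slow-loop update, so a routing decision made during a batch update still sees a consistent view.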

Failure clusters as interventions

A failure cluster becomes useful when it changes the system.

This repo treats clustered failure modes as candidate intervention signals for:

routing policy
memory policy
tool constraints
bandit weights

See docs/failure_clusters_as_interventions.md.
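As a rough illustration of turning clusters into intervention signals, the sketch below groups failure records by an exact `failure_mode` label and proposes an intervention target once a group is large enough. Grouping by label stands in for real clustering here. The record format, the failure mode names, the `INTERVENTION_TARGETS` mapping, and `min_cluster_size` are all assumptions, not the contents of `interventions/`.

```python
from collections import Counter

# Hypothetical mapping from a failure mode to one of the four targets listed above.
INTERVENTION_TARGETS = {
    "wrong_pipeline_chosen": "routing policy",
    "stale_prior": "memory policy",
    "tool_timeout": "tool constraints",
    "low_reward": "bandit weights",
}


def propose_interventions(failure_log, min_cluster_size=2):
    """Return candidate interventions for failure modes seen at least min_cluster_size times."""
    counts = Counter(record["failure_mode"] for record in failure_log)
    proposals = []
    for mode, count in counts.most_common():
        if count >= min_cluster_size and mode in INTERVENTION_TARGETS:
            proposals.append(
                {"failure_mode": mode, "count": count, "target": INTERVENTION_TARGETS[mode]}
            )
    return proposals


# Synthetic failure records, invented for this example.
failure_log = [
    {"failure_mode": "tool_timeout"},
    {"failure_mode": "stale_prior"},
    {"failure_mode": "tool_timeout"},
    {"failure_mode": "low_reward"},
    {"failure_mode": "stale_prior"},
    {"failure_mode": "tool_timeout"},
]
proposals = propose_interventions(failure_log)
for proposal in proposals:
    print(proposal)
```

The output is a list of candidates, not applied changes: a one-off failure such as `low_reward` above produces no proposal, and deciding whether to act on a proposal is left to the slow loop or a human.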

Repository layout

memory-guided-eval/
├── docs/             # Design notes (e.g., failure clusters as interventions)
├── memory/           # Priors, clippings, failure examples (on disk)
├── eval/             # Experiments and logs
├── interventions/    # Candidate policies from cluster signals (scaffold)
├── pipelines/        # Transformation stubs — replace with your steps
├── router/           # Fast loop (attention_router)
├── optimize/         # Slow loop (constraint_optimizer)
└── run_eval.py       # Minimal demo

Run the demo

From the repo root:

python run_eval.py

This prints ranked suggestions for a sample document, then applies one slow-loop step from a synthetic log line.
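For orientation, a single slow-loop step over one synthetic log line might look like the sketch below. It is not the code in `run_eval.py` or `optimize/`: the JSON log format, the `slow_loop_step` function, and the moving-average update with `learning_rate` are assumptions chosen to keep the example small.

```python
import json


def slow_loop_step(weights, failure_counts, log_line, learning_rate=0.5):
    """Apply one logged outcome and return new (weights, failure_counts) dicts.

    Assumed update rule: move the pipeline's weight toward the observed reward,
    and increment its failure count when the run did not succeed.
    The inputs are copied, so the caller's dicts are left untouched.
    """
    record = json.loads(log_line)
    name = record["pipeline"]
    reward = float(record["reward"])

    new_weights = dict(weights)
    new_failures = dict(failure_counts)

    old_weight = new_weights.get(name, 0.0)
    new_weights[name] = old_weight + learning_rate * (reward - old_weight)
    if not record["success"]:
        new_failures[name] = new_failures.get(name, 0) + 1
    return new_weights, new_failures


# Synthetic log line in an assumed format.
log_line = '{"pipeline": "summarize", "reward": 0.0, "success": false}'
old_weights = {"summarize": 0.5}
weights, failures = slow_loop_step(old_weights, {}, log_line)
print(weights, failures)
```

Returning new dictionaries instead of mutating the inputs keeps any snapshot already handed to the fast loop unchanged until the slow loop publishes the next one.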

Honest scope (anti–overfitting)

This repository does not claim:

Production readiness, SLOs, or security review
Generalization beyond the toy scoring rules and hand-authored priors shown here
Parity with full RL, bandits, or neural routers — this is explicitly a small, inspectable scaffold
Novelty in the literature; it is an implementation sketch of a common pattern (decision vs learning separation)

Evidence posture

Verified here: run_eval.py runs; fast and slow loops are importable and deterministic for the demo inputs
Described but not verified: regret-based updates, richer priors, embedding-based retrieval, real pipeline execution — roadmap, not results

Related writing

The public book at obversarystudios.org has been reset to ground zero. Until a new site page exists, use this README and the issues in this repo for narrative alongside the code.

Name on GitHub

Suggested short description:

Lightweight harness for memory-guided document workflows: fast-loop routing, slow-loop learning, and failure-weighted evaluation traces.

Suggested topics: ai-systems, document-processing, evaluation, human-in-the-loop, workflow, rag.

License

MIT — see LICENSE.
