An empirical test of the question:
Can test-time training (TTT) fast-weight updates serve as a multi-turn conversation memory mechanism, replacing the context window?
We feed a model a dialogue containing facts, let TTT update its fast weights during ingestion, then drop the conversation from context and ask probe questions. The fast-weight state is the only place the facts could persist. We measure how many of them the model can still answer.
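Concretely, the probe protocol looks roughly like the sketch below. The method and attribute names (`ingest`, `clear_context`, `fast_weights`) are placeholders for illustration, not the actual interface exposed by `model_utils.py` or `run_experiment.py`.

```python
# Illustrative sketch of the probe protocol; all method names are hypothetical.
def probe_with_ttt_memory(model, conversation, questions):
    # 1. Ingest: a forward pass over the dialogue lets the TTT layers
    #    update their fast weights in place.
    fast_weights = model.ingest(conversation)

    # 2. Forget the text: the conversation is dropped from the context
    #    window, so the fast-weight state is the only place the facts
    #    could still live.
    model.clear_context()

    # 3. Probe: answer each question with an empty context but the updated
    #    fast weights, then score by exact match (EM).
    return [model.generate(q, fast_weights=fast_weights) for q in questions]
```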
The vehicle is the In-Place TTT architecture from ByteDance-Seed (ICLR 2026 oral) on Qwen3-8B.
NEGATIVE under our minimum-scale (B-mini) training. EM(A) = 0.929 with conversation in context, EM(B) = 0.000 with TTT memory only, EM(C) = 0.016 no-memory baseline. memory_efficiency_ratio = EM(B)/EM(A) = 0.000. The TTT-modified weights actively degrade the model below the no-memory baseline — they are perturbation noise, not encoded memory. Full report in RESULTS.md.
This is not a refutation of In-Place TTT at paper scale (5000 steps × seq 65536 × 8×H100, joint base+TTT training); it is an answer to the literal experimental question under realistic single-GPU constraints.
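For reference, the headline metric is just the ratio of exact-match rates between conditions B and A. A minimal sketch of how it could be computed from the per-condition result files, assuming each record carries `prediction` and `answer` fields (the real logic and schema live in `evaluate.py`):

```python
import json

def exact_match_rate(path):
    # Fraction of probes whose prediction exactly matches the gold answer.
    # The field names here are assumptions, not the actual results schema.
    with open(path) as f:
        records = json.load(f)
    hits = sum(r["prediction"].strip() == r["answer"].strip() for r in records)
    return hits / len(records)

em_a = exact_match_rate("results/condition_a.json")  # conversation in context: 0.929
em_b = exact_match_rate("results/condition_b.json")  # TTT fast-weight memory only: 0.000
memory_efficiency_ratio = em_b / em_a                # 0.000 in our run
```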
- `SPEC.md` — refined experimental design
- `DECISIONS.md` — every research-level fork in the road, and why
- `STATE.md` — full chronological run log
- `RESULTS.md` — final report with numbers, figures, and interpretation
ttt-conv-memory/
├── README.md # this file
├── SPEC.md # detailed experimental spec
├── STATE.md # live state — updated after every step
├── DECISIONS.md # every fork + rationale
├── RESULTS.md # final report (written at end)
├── requirements.txt # pinned Python deps for the GPU host
├── setup.sh # bare-metal Linux GPU host install
├── modal_app.py # Modal serverless entry point
├── build_benchmark.py # generate the 300-sample benchmark
├── benchmark_v1.json # the generated benchmark (committed)
├── model_utils.py # model loading + TTT cache / fast-weight control
├── train_minimal.py # minimal continual-pretrain to bring TTT params out of init
├── run_experiment.py # main 4-condition pipeline
├── evaluate.py # metrics + figures
├── results/ # per-condition outputs + final report
│ ├── condition_a.json
│ ├── condition_b.json
│ ├── condition_c.json
│ ├── condition_d.json
│ ├── report.json
│ └── figures/
└── logs/ # raw run logs (training, inference, sanity checks)
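For a sense of what `build_benchmark.py` produces, one sample plausibly looks like the sketch below. Every field name and value here is illustrative only; the committed `benchmark_v1.json` is the authoritative schema.

```python
# Illustrative only: benchmark_v1.json defines the real schema.
sample = {
    "conversation": [
        {"role": "user", "content": "My sister Dana moved to Lisbon last March."},
        {"role": "assistant", "content": "Noted: Dana has been in Lisbon since March."},
        # ... each conversation packs 5-10 such facts across several turns
    ],
    "probes": [
        {"question": "Where does the user's sister live now?", "answer": "Lisbon"},
        {"question": "What is the user's sister called?", "answer": "Dana"},
    ],
}
```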
| Question | Answer |
|---|---|
| Hardware | Single A100-40G via Modal |
| Cost budget | ~$25-30 (covered by Modal $30/mo free credit) |
| Model | Qwen3-8B (base) + In-Place TTT layers |
| Checkpoint | Self-trained minimal (see DECISIONS.md D-001) |
| Dataset | 300 synthetic conversations × 5-10 facts each |
| Conditions | A: context baseline / B: TTT memory / C: no memory / D: TTT + distractor |
| Verdict bar | Memory efficiency ratio = EM(B) / EM(A) > 0.7 ⇒ "this path works" |
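The four conditions differ only in what the model has available at probe time. A compact way to picture them (this dict is illustrative, not the actual configuration format used by `run_experiment.py`):

```python
# What the model sees when the probe questions are asked (illustrative).
CONDITIONS = {
    "A": {"context": "conversation", "ttt_fast_weights": False},  # upper bound: text in context
    "B": {"context": "empty",        "ttt_fast_weights": True},   # memory must live in fast weights
    "C": {"context": "empty",        "ttt_fast_weights": False},  # floor: no memory of any kind
    "D": {"context": "empty",        "ttt_fast_weights": True,    # as B, but an unrelated distractor
          "distractor_ingested": True},                           # conversation was also ingested
}
```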
# 1. Build the benchmark (CPU-only, ~30s)
python build_benchmark.py
# 2. End-to-end on Modal
modal run modal_app.py::full_pipeline
# OR step-by-step on a Linux GPU host
bash setup.sh
python train_minimal.py
python run_experiment.py --condition all
python evaluate.py

MIT (see LICENSE).