TL;DR: I co-trained a summarizer and a generator to learn a compression scheme for text in the same token space as the base model, so the generator can continue from the compressed context with almost the same quality while using an order of magnitude fewer context tokens. Along the way the model discovers its own compression tricks: aggressive pruning, dense punctuation (lots of semicolons), and even occasionally switching into Mandarin to pack more information per token.
You can read the full blog post here.
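To make the idea concrete, here is a minimal inference-time sketch of what the TL;DR describes, not the repo's actual code: a summarizer emits a much shorter compression of a long context in the same token space, and the generator continues from that compressed prefix instead of the full text. The model name, prompt wording, and 10x token budget are illustrative assumptions, and a single pretrained model stands in for the co-trained summarizer/generator pair.

```python
# Minimal sketch of the compress-then-continue loop described above.
# Assumptions (not from the repo): the model name, the prompt wording,
# and the 10x token budget; one pretrained model stands in for the
# co-trained summarizer/generator pair.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # hypothetical base model
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, device_map="auto")

def compress(context: str, ratio: int = 10) -> str:
    """Ask the summarizer to rewrite `context` ~`ratio`x shorter,
    in the same token space the generator reads."""
    budget = max(1, len(tok(context)["input_ids"]) // ratio)
    prompt = (f"Compress the following text into at most {budget} tokens, "
              f"keeping everything needed to continue it:\n\n{context}\n\nCompressed:")
    ids = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**ids, max_new_tokens=budget, do_sample=False)
    return tok.decode(out[0, ids["input_ids"].shape[1]:], skip_special_tokens=True)

def continue_from(compressed: str, max_new_tokens: int = 128) -> str:
    """Generator continues from the compressed context instead of the full one."""
    ids = tok(compressed, return_tensors="pt").to(model.device)
    out = model.generate(**ids, max_new_tokens=max_new_tokens, do_sample=False)
    return tok.decode(out[0, ids["input_ids"].shape[1]:], skip_special_tokens=True)

long_context = "..."  # your long document
print(continue_from(compress(long_context)))
```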
It's super simple.
- Set your `TINKER_API_KEY` and `WANDB_API_KEY` in a `.env` file.
- Run `uv sync`.
- Run `uv run run_train.py`.
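For reference, the `.env` file is plain text in the project root; the variable names come from the first step above, and the values here are placeholders for your own keys:

```
TINKER_API_KEY=your-tinker-api-key
WANDB_API_KEY=your-wandb-api-key
```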