
Incremental training and chat context fix for microgpt #1015

Merged
aaylward merged 10 commits into main from incremental_training
Feb 17, 2026

Conversation

@aaylward
Collaborator

Summary

  • Checkpoint/resume for training — custom Adam optimizer with serializable m/v state replaces candle_nn::AdamW. New --resume and --checkpoint-every CLI flags let long training runs be interrupted and continued with identical results.
  • Fix chat context exhaustion — reserve 1/4 of the context window for generation (previously only 1 token was reserved) and snap history truncation to turn boundaries. The logic is shared via Tokenizer::truncate_chat_prompt() by both the CLI and the inference service, and the tokens_dropped count is reported to users.
  • Bump CLI to 0.3.1, update all three READMEs, fix server_pal deprecation warning.

Fixes #1012
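To make the checkpoint/resume guarantee concrete, here is a minimal, self-contained sketch of the idea behind the custom optimizer: an Adam whose first/second-moment buffers (m/v) and step counter are plain data that can be saved with a checkpoint, so a resumed run continues with identical results. Names and the flat f64 parameter layout are illustrative, not the actual microgpt/candle types.

```rust
// Adam with serializable state: checkpointing m, v, and step (alongside the
// weights) is enough for a resumed run to match an uninterrupted one exactly.
#[derive(Clone)]
struct Adam {
    lr: f64,
    beta1: f64,
    beta2: f64,
    eps: f64,
    step: u64,   // saved in the checkpoint so bias correction resumes correctly
    m: Vec<f64>, // first-moment accumulator, one entry per parameter
    v: Vec<f64>, // second-moment accumulator
}

impl Adam {
    fn new(n_params: usize, lr: f64) -> Self {
        Adam { lr, beta1: 0.9, beta2: 0.999, eps: 1e-8, step: 0,
               m: vec![0.0; n_params], v: vec![0.0; n_params] }
    }

    fn update(&mut self, params: &mut [f64], grads: &[f64]) {
        self.step += 1;
        let bc1 = 1.0 - self.beta1.powi(self.step as i32);
        let bc2 = 1.0 - self.beta2.powi(self.step as i32);
        for i in 0..params.len() {
            self.m[i] = self.beta1 * self.m[i] + (1.0 - self.beta1) * grads[i];
            self.v[i] = self.beta2 * self.v[i] + (1.0 - self.beta2) * grads[i] * grads[i];
            let (m_hat, v_hat) = (self.m[i] / bc1, self.v[i] / bc2);
            params[i] -= self.lr * m_hat / (v_hat.sqrt() + self.eps);
        }
    }
}

fn main() {
    let grads = [0.1_f64, -0.2, 0.3];

    // Uninterrupted run: 10 steps.
    let (mut p_full, mut opt_full) = ([1.0_f64; 3], Adam::new(3, 0.01));
    for _ in 0..10 { opt_full.update(&mut p_full, &grads); }

    // Interrupted run: 4 steps, "checkpoint" (clone stands in for save/load),
    // then resume for 6 more.
    let (mut p, mut opt) = ([1.0_f64; 3], Adam::new(3, 0.01));
    for _ in 0..4 { opt.update(&mut p, &grads); }
    let (mut opt_resumed, mut p_resumed) = (opt.clone(), p);
    for _ in 0..6 { opt_resumed.update(&mut p_resumed, &grads); }

    assert_eq!(p_full, p_resumed); // bit-for-bit identical, as the PR claims
    println!("resume matches uninterrupted run");
}
```

This mirrors the shape of test_checkpoint_resume_roundtrip: N steps plus a resumed M steps must equal an uninterrupted N+M run, which only holds if m, v, and step all survive the round trip.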

Test plan

  • New test_checkpoint_resume_roundtrip — trains N steps, checkpoints, resumes M more, verifies weights match uninterrupted N+M run
  • New test_optimizer_state_serialization_roundtrip — verifies m/v save/load preserves values exactly
  • All 38 existing cargo test -p microgpt tests pass
  • bazel test //... — 75/75 pass
  • cargo build — zero warnings across entire workspace

🤖 Generated with Claude Code

aaylward and others added 9 commits February 17, 2026 14:04
Replace candle_nn::AdamW with a custom Adam optimizer that exposes
serializable m/v accumulators and step counter. Add --resume and
--checkpoint-every CLI flags so long training runs can be interrupted
and continued with identical results.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reserve 1/4 of the context window for generation instead of just 1
token, and snap history truncation to turn boundaries so the model
sees coherent conversation context.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add Tokenizer::truncate_chat_prompt() to the library — reserves 1/4 of
the context window for generation and snaps truncation to turn
boundaries. Use it in both the CLI chat loop and the /chat endpoint.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
truncate_chat_prompt() now returns the number of tokens dropped.
CLI prints a notice when history is truncated; the /chat endpoint
includes tokens_dropped in the response body.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
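The truncation rule in the commits above can be sketched as follows: reserve a quarter of the context window for generation, keep only the most recent whole turns that fit, and report how many tokens were dropped. This is a hedged illustration, not the actual implementation — the real logic lives in Tokenizer::truncate_chat_prompt() and operates on the tokenizer's ids; the per-turn Vec<u32> shape here is assumed.

```rust
// Keep the newest whole turns that fit into the prompt budget; return the
// surviving token stream and the number of tokens dropped.
fn truncate_chat_prompt(turns: &[Vec<u32>], context_len: usize) -> (Vec<u32>, usize) {
    let budget = context_len - context_len / 4; // reserve 1/4 for generation
    let total: usize = turns.iter().map(Vec::len).sum();
    let mut kept = 0usize;
    let mut first_kept = turns.len();
    // Walk backwards from the newest turn, admitting whole turns only,
    // so truncation always snaps to a turn boundary.
    for (i, turn) in turns.iter().enumerate().rev() {
        if kept + turn.len() > budget { break; }
        kept += turn.len();
        first_kept = i;
    }
    let prompt = turns[first_kept..].iter().flatten().copied().collect();
    (prompt, total - kept)
}

fn main() {
    // Three 10-token turns in a 32-token context: the budget is 24 tokens,
    // so only the last two whole turns (20 tokens) survive and 10 are dropped.
    let turns = vec![vec![1u32; 10], vec![2; 10], vec![3; 10]];
    let (prompt, dropped) = truncate_chat_prompt(&turns, 32);
    assert_eq!((prompt.len(), dropped), (20, 10));
    println!("kept {} tokens, dropped {}", prompt.len(), dropped);
}
```

Snapping to turn boundaries means a partially clipped turn is dropped entirely rather than split mid-message, so the model only ever sees coherent conversation context; the dropped count is what the CLI notice and the /chat endpoint's tokens_dropped field surface to users.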
Fix the server_pal deprecation warning by using TimeoutLayer::with_status_code(408) instead of the deprecated API.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Split main.rs into three files:
- main.rs: CLI parsing and dispatch
- train.rs: unified training loop via TrainingData trait
- infer.rs: generate, chat, and info commands

Eliminates ~100 lines of duplication between the old run_train
and run_train_chat functions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
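The deduplication described above can be illustrated with a small sketch: a TrainingData trait abstracts over batch sources, so one training loop can drive both the plain-text and chat datasets that previously had separate, duplicated loops. The trait name comes from the commit message; everything else here is assumed for illustration.

```rust
// A batch source the unified training loop can consume.
trait TrainingData {
    /// Next (input, target) token pair, or None when the data is exhausted.
    fn next_batch(&mut self) -> Option<(Vec<u32>, Vec<u32>)>;
}

/// Plain next-token prediction over a flat token stream.
struct TextData { tokens: Vec<u32>, pos: usize, block: usize }

impl TrainingData for TextData {
    fn next_batch(&mut self) -> Option<(Vec<u32>, Vec<u32>)> {
        // Need block inputs plus one shifted target token.
        if self.pos + self.block + 1 > self.tokens.len() { return None; }
        let x = self.tokens[self.pos..self.pos + self.block].to_vec();
        let y = self.tokens[self.pos + 1..=self.pos + self.block].to_vec();
        self.pos += self.block;
        Some((x, y))
    }
}

/// One loop serves any source; a real version would also run the
/// forward/backward pass, optimizer step, and checkpointing here.
fn train(data: &mut dyn TrainingData) -> usize {
    let mut steps = 0;
    while let Some((_x, _y)) = data.next_batch() { steps += 1; }
    steps
}

fn main() {
    let mut data = TextData { tokens: (0..10).collect(), pos: 0, block: 4 };
    let steps = train(&mut data); // two full 4-token blocks fit in 10 tokens
    assert_eq!(steps, 2);
    println!("trained for {} steps", steps);
}
```

A chat dataset would implement the same trait with its own batching (e.g. over formatted turns), letting the old run_train and run_train_chat collapse into a single loop.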
@aaylward aaylward merged commit f8c42a9 into main Feb 17, 2026
9 checks passed
@aaylward aaylward deleted the incremental_training branch February 17, 2026 20:23


Development

Successfully merging this pull request may close these issues.

microgpt: support incremental training (resume from checkpoint)
