Incremental training and chat context fix for microgpt#1015
Merged
Conversation
Replace candle_nn::AdamW with a custom Adam optimizer that exposes serializable m/v accumulators and a step counter. Add --resume and --checkpoint-every CLI flags so long training runs can be interrupted and continued with identical results. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
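The key idea is that resumable training needs the optimizer's moment accumulators and step counter to live in plain, serializable fields rather than inside an opaque optimizer object. A minimal sketch of such an Adam step over a flat parameter slice (names like `AdamState` are illustrative, not the PR's actual API):

```rust
// Sketch of an Adam update with explicit, serializable state.
// In the real PR this state would be written to / read from checkpoints;
// here it is just plain Vecs and a counter.
struct AdamState {
    m: Vec<f32>, // first-moment (mean) accumulator
    v: Vec<f32>, // second-moment (uncentered variance) accumulator
    t: u64,      // step counter, needed for bias correction
}

impl AdamState {
    fn new(n: usize) -> Self {
        Self { m: vec![0.0; n], v: vec![0.0; n], t: 0 }
    }

    // One Adam step over a flat parameter slice.
    fn step(&mut self, params: &mut [f32], grads: &[f32],
            lr: f32, b1: f32, b2: f32, eps: f32) {
        self.t += 1;
        let bc1 = 1.0 - b1.powi(self.t as i32); // bias corrections
        let bc2 = 1.0 - b2.powi(self.t as i32);
        for i in 0..params.len() {
            self.m[i] = b1 * self.m[i] + (1.0 - b1) * grads[i];
            self.v[i] = b2 * self.v[i] + (1.0 - b2) * grads[i] * grads[i];
            let m_hat = self.m[i] / bc1;
            let v_hat = self.v[i] / bc2;
            params[i] -= lr * m_hat / (v_hat.sqrt() + eps);
        }
    }
}

fn main() {
    let mut p = vec![1.0f32];
    let mut st = AdamState::new(1);
    // On the very first step, m_hat = g and v_hat = g^2, so the
    // update is approximately lr * sign(g).
    st.step(&mut p, &[0.5], 0.1, 0.9, 0.999, 1e-8);
    println!("{:.4}", p[0]); // ≈ 0.9000
}
```

Because `m`, `v`, and `t` are plain data, saving and restoring them (e.g. with serde) and replaying the same gradients reproduces a bit-identical trajectory, which is what makes `--resume` match an uninterrupted run.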
Reserve 1/4 of the context window for generation instead of just 1 token, and snap history truncation to turn boundaries so the model sees coherent conversation context. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add Tokenizer::truncate_chat_prompt() to the library — reserves 1/4 of the context window for generation and snaps truncation to turn boundaries. Use it in both the CLI chat loop and the /chat endpoint. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
truncate_chat_prompt() now returns the number of tokens dropped. CLI prints a notice when history is truncated; the /chat endpoint includes tokens_dropped in the response body. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
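The truncation policy described above can be sketched as follows. This is an illustrative standalone version, assuming each turn arrives pre-tokenized; the PR's actual `Tokenizer::truncate_chat_prompt()` signature may differ:

```rust
// Illustrative sketch of the truncation policy: drop whole turns from the
// oldest end until the prompt fits in 3/4 of the context window, keeping
// 1/4 free for generation, and report how many tokens were dropped.
fn truncate_chat_prompt(turns: &[Vec<u32>], context_len: usize) -> (Vec<u32>, usize) {
    let budget = context_len - context_len / 4; // reserve 1/4 for generation
    let total: usize = turns.iter().map(|t| t.len()).sum();

    // Snap to turn boundaries: drop whole turns, never mid-turn, so the
    // model always sees coherent conversation context.
    let mut dropped = 0;
    let mut start = 0;
    while start < turns.len() && total - dropped > budget {
        dropped += turns[start].len();
        start += 1;
    }

    let kept: Vec<u32> = turns[start..].iter().flatten().copied().collect();
    (kept, dropped)
}

fn main() {
    // Window of 16 tokens → budget of 12; three turns of 6 tokens each.
    let turns: Vec<Vec<u32>> = (0u32..3).map(|i| vec![i; 6]).collect();
    let (kept, dropped) = truncate_chat_prompt(&turns, 16);
    println!("kept={} dropped={}", kept.len(), dropped); // kept=12 dropped=6
}
```

Returning the dropped count is what lets the CLI print its truncation notice and the /chat endpoint include `tokens_dropped` in the response body without re-deriving it.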
Use TimeoutLayer::with_status_code(408) instead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Split main.rs into three files:
- main.rs: CLI parsing and dispatch
- train.rs: unified training loop via TrainingData trait
- infer.rs: generate, chat, and info commands

Eliminates ~100 lines of duplication between the old run_train and run_train_chat functions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
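One plausible shape for the `TrainingData` trait that lets train.rs share a single loop between the plain-text and chat paths (the trait and type names here are hypothetical, inferred from the commit message, not taken from the PR's code):

```rust
// Hypothetical sketch: a data-source trait so one generic training loop
// can replace the duplicated run_train / run_train_chat functions.
trait TrainingData {
    // Yield one (input, target) pair of token-id sequences.
    fn next_batch(&mut self) -> (Vec<u32>, Vec<u32>);
}

// A plain-text source: targets are inputs shifted by one token
// (next-token prediction).
struct TextData { tokens: Vec<u32>, pos: usize, block: usize }

impl TrainingData for TextData {
    fn next_batch(&mut self) -> (Vec<u32>, Vec<u32>) {
        if self.pos + self.block + 1 > self.tokens.len() {
            self.pos = 0; // wrap around at the end of the corpus
        }
        let x = self.tokens[self.pos..self.pos + self.block].to_vec();
        let y = self.tokens[self.pos + 1..self.pos + self.block + 1].to_vec();
        self.pos += self.block;
        (x, y)
    }
}

// The unified loop in train.rs can then be generic over the source.
// A real loop would run forward/backward and an optimizer step here;
// this stub just counts tokens consumed.
fn train_steps<D: TrainingData>(data: &mut D, steps: usize) -> usize {
    let mut seen = 0;
    for _ in 0..steps {
        let (x, _y) = data.next_batch();
        seen += x.len();
    }
    seen
}

fn main() {
    let mut d = TextData { tokens: (0u32..100).collect(), pos: 0, block: 8 };
    println!("{}", train_steps(&mut d, 5)); // 40 tokens consumed
}
```

A chat source would implement the same trait but assemble batches from formatted conversation turns, which is how the ~100 duplicated lines collapse into one loop.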
Summary
- Replaces `candle_nn::AdamW` with a custom Adam optimizer with serializable m/v accumulators and step counter. New `--resume` and `--checkpoint-every` CLI flags let long training runs be interrupted and continued with identical results.
- Adds `Tokenizer::truncate_chat_prompt()` and uses it in both the CLI and the inference service. Reports `tokens_dropped` to users.
- Fixes the `server_pal` deprecation warning.

Fixes #1012
Test plan
- `test_checkpoint_resume_roundtrip` — trains N steps, checkpoints, resumes M more, verifies weights match an uninterrupted N+M run
- `test_optimizer_state_serialization_roundtrip` — verifies m/v save/load preserves values exactly
- `cargo test -p microgpt` — tests pass
- `bazel test //...` — 75/75 pass
- `cargo build` — zero warnings across entire workspace

🤖 Generated with Claude Code