refactor(archon): extract utility functions and simplify engine code#954
refactor(archon): extract utility functions and simplify engine code#954garrett4wade merged 1 commit intomainfrom
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request focuses on enhancing the maintainability and robustness of the Archon engine through significant code refactoring. It extracts complex logic into dedicated utility functions, streamlines the handling of parallel process groups by caching them, and adopts a safer, more idiomatic approach to distributed lock management using Python's context manager protocol. These changes are purely structural, ensuring no alteration to the existing behavior or functionality of the system. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a significant and well-executed refactoring of the Archon engine. Key improvements include extracting complex setup logic from ArchonEngine into a new archon_utils.py module, which greatly enhances modularity and readability. The caching of Tensor Parallel and Context Parallel groups is a good performance optimization that avoids repeated lookups in hot paths. Furthermore, the adoption of context managers for DistributedLock makes resource handling safer and more idiomatic. The minor inlining and documentation updates are also positive changes. Overall, this is a high-quality refactoring with no issues found.
Move optimizer/scheduler creation, activation checkpoint config, zero-bubble validation, deterministic mode setup, and pad_to_maximum validation into archon_utils.py for reuse and testability. Cache tp/cp parallel groups to avoid repeated lookups, and use context managers for DistributedLock. Key changes: - Extract 6 utility functions into new archon_utils.py module - Cache _tp_group and _cp_group on engine initialization - Add __enter__/__exit__ to DistributedLock for context manager usage - Replace manual lock acquire/release with `with` statements - Add venv activation note to installation docs
…nclusionAI#954) Move optimizer/scheduler creation, activation checkpoint config, zero-bubble validation, deterministic mode setup, and pad_to_maximum validation into archon_utils.py for reuse and testability. Cache tp/cp parallel groups to avoid repeated lookups, and use context managers for DistributedLock. Key changes: - Extract 6 utility functions into new archon_utils.py module - Cache _tp_group and _cp_group on engine initialization - Add __enter__/__exit__ to DistributedLock for context manager usage - Replace manual lock acquire/release with `with` statements - Add venv activation note to installation docs
…nclusionAI#954) Move optimizer/scheduler creation, activation checkpoint config, zero-bubble validation, deterministic mode setup, and pad_to_maximum validation into archon_utils.py for reuse and testability. Cache tp/cp parallel groups to avoid repeated lookups, and use context managers for DistributedLock. Key changes: - Extract 6 utility functions into new archon_utils.py module - Cache _tp_group and _cp_group on engine initialization - Add __enter__/__exit__ to DistributedLock for context manager usage - Replace manual lock acquire/release with `with` statements - Add venv activation note to installation docs
Description
Extract six utility functions from
ArchonEngineinto a newarchon_utils.pymodule for improved reuse and testability. Cache TP/CP parallel groups on initialization to eliminate repeated lookups in hot paths. Add context manager protocol toDistributedLockand convert manual acquire/release patterns towithstatements for safer lock handling.Related Issue
N/A
Type of Change
Checklist
jb build docs/gemini review)Breaking Change Details (if applicable):
N/A
Additional Context
Key changes:
archon_utils.py(new):create_optimizer,create_lr_scheduler,build_ac_config,validate_zero_bubble_compatibility,setup_deterministic_mode,validate_and_force_pad_to_maximumarchon_engine.py: Delegates to extracted utilities; caches_tp_groupand_cp_groupto avoid repeatedparallel_dims.get_group()callsarchon_weight_sync.py: Replaces manualacquire()/release()withwith engine.engine_lock:context managerarchon_checkpoint.py/archon_runner.py: Minor expression inlininglock.py: Adds__enter__/__exit__toDistributedLockinstallation.md: Documents venv activation requirementAll changes are strictly behavioral no-ops — no logic or control flow modifications.