This release note summarizes changes since the last GitHub release (0.0.42), up to 0.0.90.
Highlights
- Major serialization upgrade with a high-level Checkpointer, AsyncManager improvements, and structured checkpoints.
- Expanded quantization stack with NF4 improvements, 1-bit support, a dedicated config object, and stronger tests.
- Mesh creation and sharding utilities modernized, including a switch to JAX-driven dynamic mesh creation.
- Executor/Ray tooling enhanced with better error handling, new helpers, and device management updates.
- API documentation reorganized into a hierarchical layout with a dedicated generation script.
Breaking and Migration Notes
- Legacy
calliband several oldopssubpackages were removed during the codebase restructure. Update imports to the current module layout (notably undereformer/ops/quantizationand consolidated packages). - Python 3.10 support was dropped and JAX was updated (to 0.7.x); verify your runtime compatibility.
- API docs paths and module names changed to a nested structure; update any custom doc links.
Serialization and Checkpointing
- Added a high-level Checkpointer with policy-based management and improved lifecycle handling.
- Introduced structured checkpoint support and fsspec path utilities.
- Added distributed checkpointing and sharded checkpoint helpers.
- Implemented CPU offloading paths and SafeTensors support.
- Improved AsyncManager behavior, version management, and checkpoint safety.
- Added and refined GCS support with unified index handling.
Quantization and Implicit Arrays
- Added a quantization configuration object and updated training examples.
- Extended NF4 kernel support and fixed issues across implicit array implementations.
- Added 1-bit quantization support and improved 8-bit/NF4 behavior.
- Expanded quantization tests and runtime coverage for sharded use cases.
Mesh, Sharding, and Partitioning
- Switched dynamic mesh creation to
jax.make_meshinstead of manual topology calculations. - Added CPU mesh utilities and barrier synchronization helpers.
- Improved partition constraints and error handling for sharding utilities.
- Added new tests for mesh creation utilities.
Executor, Ray, and TPU Tooling
- Added a
device_remotedecorator and expanded executor helper utilities. - Improved execute_multislice error handling and pool management (including DeviceHostActor support).
- Refined Ray executor health checks, blocking behavior, and resource manager logic.
- Updated TPU patching and Ray integration utilities.
Optimizers
- Refactored the optimizer stack into clearer builders, factories, and tx utilities.
- Improved WhiteKron and Mars implementations with new tests.
- Updated optimizer documentation to match the new layout.
Paths and Storage
- Fixed GCSPath glob/iterdir edge cases and improved error handling.
- Improved LocalPath mkdir handling and path safety checks.
Docs, Tooling, and Tests
- API docs reorganized into hierarchical packages with per-package indices.
- Added
format_and_generate_docs.pyfor formatting and documentation generation. - Added tests for mesh creation, aparser, paths, mixed precision utilities, and serialization helpers.
- Expanded optimizer, quantization, cluster, and logging tests.
Versioning and Packaging
- Version bumps from 0.0.43 through 0.0.90 with incremental fixes and dependency updates.
- Added optional dev dependencies (including pytest) and updated requirements for serialization backends.