fix: let HTTPClientConfig and template regen work on non-Linux#291
fix: let HTTPClientConfig and template regen work on non-Linux#291nv-alicheng merged 3 commits intomainfrom
Conversation
Two independent issues that together broke pre-commit (and any HTTPClientConfig construction) on macOS: 1. HTTPClientConfig()._resolve_defaults calls _get_auto_num_workers when num_workers=-1, which invokes get_current_numa_node(). That function is @require_linux and raises UnsupportedPlatformError on darwin. Fix: catch UnsupportedPlatformError and fall back to min_workers=10, matching the existing "NUMA not discoverable" branch. 2. scripts/regenerate_templates.py::_dump_defaults was documented to avoid running model validators, but called default_factory() for every field — which constructs the nested model (and runs its validators) whenever the factory is a BaseModel subclass. Fix: when the factory is a BaseModel subclass, recurse into _dump_defaults(factory) instead. Factories that dynamically pick a concrete subclass (e.g. TransportConfig.create_default -> ZMQTransportConfig) are not types, so they still get called as before and the concrete subclass is walked via the existing isinstance(default, BaseModel) branch. Verified on Linux: pre-commit passes, templates are byte-identical. Verified against simulated non-Linux (UnsupportedPlatformError patched to always raise): HTTPClientConfig() constructs, and the regen script completes without entering the NUMA code path at all. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅ |
There was a problem hiding this comment.
Pull request overview
Fixes macOS/non-Linux breakages by making HTTP client worker auto-detection resilient to unsupported NUMA probing and by preventing template regeneration from constructing nested Pydantic models (and thus running validators) when extracting defaults.
Changes:
- Catch
UnsupportedPlatformErrorduring NUMA probing in_get_auto_num_workers()and fall back tomin_workers. - Update
scripts/regenerate_templates.py::_dump_defaults()to recurse intoBaseModelsubclasses used asdefault_factoryinstead of calling them.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
src/inference_endpoint/endpoint_client/config.py |
Makes HTTPClientConfig(num_workers=-1) safe on non-Linux by handling unsupported NUMA probing. |
scripts/regenerate_templates.py |
Avoids running validators during template regeneration by not calling BaseModel class factories. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Code Review
This pull request improves cross-platform compatibility and prevents side effects during template generation. Specifically, scripts/regenerate_templates.py was updated to recurse into BaseModel subclasses when used as a default_factory to avoid triggering validators. Additionally, src/inference_endpoint/endpoint_client/config.py now handles UnsupportedPlatformError during NUMA probing, allowing the client configuration to be instantiated on non-Linux systems. I have no feedback to provide as the review comments were purely explanatory.
Adds unit tests for both fixes in this PR: - tests/unit/endpoint_client/test_http_client_config.py: patches get_current_numa_node / get_cpus_in_numa_node to raise UnsupportedPlatformError and asserts _get_auto_num_workers and HTTPClientConfig() fall back to min_workers=10. - tests/unit/config/test_regenerate_templates.py: defines a BaseModel with a counter-incrementing model_validator used as default_factory, and asserts _dump_defaults does not invoke the validator while still emitting the nested defaults. Also covers the non-BaseModel callable factory path (lambda) to ensure it still gets called. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Personal editor/settings files (.claude.local.md, .nvimrc.lua) were pulled in by git add -A in the previous commit. They should stay untracked. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Verified on MacOS, thanks! |
Summary
Two independent issues broke
pre-commit run(and anyHTTPClientConfig()construction) on macOS. Fixing both:(A)
src/inference_endpoint/endpoint_client/config.py—_get_auto_num_workersnow catchesUnsupportedPlatformErrorfromget_current_numa_node/get_cpus_in_numa_nodeand falls back tomin_workers=10. Matches the existing "NUMA not discoverable" branch. Unbricks the CLI on macOS too.(B)
scripts/regenerate_templates.py—_dump_defaultswas documented to avoid running model validators, but it calleddefault_factory()on every field, which constructs nested models (and runs their validators) whenever the factory is aBaseModelsubclass. Now it recurses into_dump_defaults(factory)when the factory itself is aBaseModelsubclass. Factories that dynamically pick a concrete subclass (e.g.TransportConfig.create_default → ZMQTransportConfig) are not types, so they still get called and the resulting instance is walked via the existing branch.Either fix alone unbricks pre-commit on macOS — (A) makes the validator safe to run, (B) stops the script from running it. Together they're belt-and-suspenders: (A) fixes the product, (B) hardens the tooling against future validators adding side effects.
Template output is unchanged (and platform-independent)
Generated templates are byte-identical before and after this change, and identical across macOS / Linux.
_dump_defaultsemits class-level declared defaults (e.g.num_workers: -1— the "auto" sentinel), not the validator-resolved values. Even on Linux before this PR, the resolved value fromHTTPClientConfig()was silently discarded because the serializer re-walks the class (not the instance). So the validator's NUMA-aware resolution never landed in the template file — it was only ever a runtime concern. This PR just stops the pointless construction.Alternatives considered
IS_LINUXfromcpu_affinity.pyand checking it at the top of_get_auto_num_workers): arguably more idiomatic Python for a non-racy condition, and could replace thetry/exceptin (A). Kept the current form for minimal diff; happy to swap in a follow-up.cpu_affinity.py(drop@require_linuxfrom read-only queries, keep it on mutators likepin_loadgen): the most principled shape — "errors for exceptional conditions, sentinels for absent data" — but a broader change that needsAttributeErrorhandling foros.sched_getaffinityon macOS. Deferred.Test plan
pre-commit run --all-filespasses, generated templates are byte-identicalUnsupportedPlatformErrorpatched to always raise):HTTPClientConfig()constructs withnum_workers=10; regen script completes without entering the NUMA code pathpre-commit run --all-fileson darwin🤖 Generated with Claude Code