feat: enable cpu affinity, pin loadgen to CPU-0 #69
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
Summary of Changes
Hello @viraatc, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a significant new feature that enables CPU affinity for worker processes within the inference endpoint client. By allowing workers to be pinned to specific CPU cores, this enhancement aims to optimize resource utilization, potentially leading to improved performance and stability, particularly in demanding or specialized computing environments. The implementation provides flexible configuration options, including automatic core distribution and explicit core selection.
Pull request overview
This PR adds CPU affinity configuration to pin worker processes to specific CPU cores for performance optimization. The feature supports automatic core assignment, manual core specification, or can be disabled entirely.
Key Changes:
- Added cpu_affinity configuration parameter supporting "auto", explicit core list, or None
- Implemented CPU core pinning logic in worker spawn process with error handling
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/inference_endpoint/endpoint_client/configs.py | Adds cpu_affinity configuration field with three modes: None (disabled), "auto" (automatic assignment), or explicit core list |
| src/inference_endpoint/endpoint_client/worker.py | Implements CPU affinity logic that pins spawned workers to cores based on configuration, with fallback error handling |
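The three modes described for the cpu_affinity field can be illustrated with a small sketch. This is not the code from configs.py: the class name `ClientConfigSketch`, the `num_workers` field, and the validation rules are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import List, Literal, Optional, Union

# Hypothetical type alias covering the three modes named in the review:
# None (disabled), "auto" (automatic assignment), or an explicit core list.
CpuAffinity = Optional[Union[Literal["auto"], List[int]]]


@dataclass
class ClientConfigSketch:
    """Illustrative stand-in; not the real class in configs.py."""

    num_workers: int = 4  # assumed field, for illustration only
    cpu_affinity: CpuAffinity = None  # None disables pinning

    def __post_init__(self) -> None:
        value = self.cpu_affinity
        if value is None or value == "auto":
            return
        # An explicit list must be non-empty and hold non-negative ints.
        is_core_list = (
            isinstance(value, list)
            and len(value) > 0
            and all(
                isinstance(c, int) and not isinstance(c, bool) and c >= 0
                for c in value
            )
        )
        if not is_core_list:
            raise ValueError(
                "cpu_affinity must be None, 'auto', or a non-empty list "
                f"of core ids, got {value!r}"
            )
```

Validating at construction time is one possible design choice: a malformed value then fails before any worker is spawned rather than inside the spawn path.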
Code Review
This pull request introduces CPU affinity for worker processes, allowing them to be pinned to specific CPU cores for better performance and stability. The implementation is mostly correct, but I've found a potential ZeroDivisionError that could crash the application if no available CPUs are found. My review includes a fix for this issue along with improved logging for better diagnostics.
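The reviewed code is not shown in this conversation, so the following is only a sketch of the kind of guard the review describes. It assumes workers are mapped to cores round-robin with a modulo over the list of available CPUs, which is where an empty list would raise ZeroDivisionError. The function name `pick_core` and its parameters are hypothetical.

```python
import logging
from typing import Optional, Sequence

logger = logging.getLogger(__name__)


def pick_core(worker_id: int, available_cpus: Sequence[int]) -> Optional[int]:
    """Return the core for a worker, or None if pinning is not possible."""
    if not available_cpus:
        # Without this check, `worker_id % len(available_cpus)` would
        # raise ZeroDivisionError when no CPUs are found.
        logger.warning(
            "No CPUs available for pinning; worker %d stays unpinned",
            worker_id,
        )
        return None
    # Round-robin assignment (assumed strategy, not confirmed by the PR).
    return available_cpus[worker_id % len(available_cpus)]
```

For example, `pick_core(5, [2, 3, 4])` selects index `5 % 3 == 2` and returns core 4, while `pick_core(0, [])` logs a warning and returns None instead of crashing.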
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
Force-pushed from bc575b2 to 9e1a96c.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Force-pushed from 56dcefb to 1e22394.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Force-pushed from 01ae2bf to 9022145.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Force-pushed from 4c9edb9 to 2a69896.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
src/inference_endpoint/utils/cpu_affinity.py:1
- The comment references 'cpu_affinity.py' but this file itself is 'worker.py'. The comment should reference the module name 'inference_endpoint.utils.cpu_affinity' or clarify that it refers to the imported AVAILABLE_CPUS variable to avoid confusion.
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Force-pushed from 2a69896 to 860eefa.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (1)
src/inference_endpoint/endpoint_client/worker.py:1
- The comment states 'excluding CPU 0', but the actual implementation excludes the loadgen CPU (which may not be CPU 0 if a different core is configured or detected as fastest). Update the comment to say 'excluding loadgen CPU' for accuracy.
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
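The distinction this comment draws, excluding the loadgen CPU rather than hard-coding CPU 0, can be sketched as follows. This is not the code from worker.py: the helper names `worker_cpu_pool` and `pin_current_process` are hypothetical, and the pinning call assumes a platform where Python exposes `os.sched_setaffinity` (some Unix platforms such as Linux).

```python
import os
from typing import Iterable, List


def worker_cpu_pool(available_cpus: Iterable[int], loadgen_cpu: int) -> List[int]:
    """Cores workers may use: every available core except the loadgen one."""
    return sorted(cpu for cpu in set(available_cpus) if cpu != loadgen_cpu)


def pin_current_process(cpu: int) -> bool:
    """Try to pin the calling process to one core; report success."""
    if not hasattr(os, "sched_setaffinity"):
        # Platform does not expose the call (e.g. macOS, Windows).
        return False
    try:
        os.sched_setaffinity(0, {cpu})  # pid 0 means the calling process
    except (OSError, ValueError):
        # Core not present or not permitted; caller can run unpinned.
        return False
    return True
```

With the loadgen on core 2, `worker_cpu_pool({0, 1, 2}, 2)` yields `[0, 1]`, so core 0 stays usable by workers, which is why a comment saying "excluding CPU 0" would be inaccurate in that case.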
Force-pushed from b12ecb3 to 6020265.
Force-pushed from 6020265 to 2f10034.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.
What does this PR do?
Type of change
Related issues
Testing
Checklist