Conversation
|
MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅ |
There was a problem hiding this comment.
Pull request overview
This PR introduces a ZMQ transport protocol abstraction for worker pool inter-process communication. The changes refactor the existing ZMQ implementation into a clean protocol-based architecture that decouples transport mechanisms from worker logic.
Key Changes:
- Introduced transport protocol abstractions (
WorkerPoolTransport,WorkerConnector,ReceiverTransport,SenderTransport) - Implemented ZMQ transport with event-driven I/O using asyncio event loop integration
- Moved SSE accumulator logic out of Worker into dedicated protocol and implementations
- Removed
ZMQConfigin favor of transport-specific configuration - Updated worker initialization to use connector pattern instead of direct ZMQ socket management
Reviewed changes
Copilot reviewed 27 out of 27 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
src/inference_endpoint/endpoint_client/transport/protocol.py |
Defines transport protocol interfaces for worker IPC abstraction |
src/inference_endpoint/endpoint_client/transport/zmq.py |
ZMQ implementation with event-driven transport and automatic temp directory management |
src/inference_endpoint/endpoint_client/transport/__init__.py |
Transport module exports and documentation |
src/inference_endpoint/endpoint_client/worker.py |
Refactored to use transport protocols; removed ZMQ-specific code and accumulator classes |
src/inference_endpoint/endpoint_client/worker_manager.py |
New dedicated worker manager using transport factory pattern |
src/inference_endpoint/endpoint_client/configs.py |
Added accumulator and transport configuration; removed ZMQConfig |
src/inference_endpoint/endpoint_client/accumulator_protocol.py |
Protocol definition for SSE accumulators |
src/inference_endpoint/openai/accumulator.py |
OpenAI SSE accumulator implementation |
src/inference_endpoint/sglang/accumulator.py |
SGLang SSE accumulator implementation |
src/inference_endpoint/endpoint_client/http_client.py |
Updated to use worker manager and simplified initialization |
tests/unit/endpoint_client/test_zmq_config.py |
Replaced ZMQConfig tests with ZmqWorkerPoolTransport factory tests |
tests/integration/endpoint_client/test_zmq_communication.py |
Simplified integration tests using new transport layer |
| Multiple test files | Updated to remove ZMQConfig usage and use new transport connector pattern |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Summary of ChangesHello @viraatc, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly refactors the internal architecture of the inference endpoint client by introducing a robust abstraction layer for inter-process communication (IPC) and Server-Sent Events (SSE) stream accumulation. This change decouples the core client and worker logic from specific implementation details of ZMQ and SSE handling, enhancing modularity, maintainability, and extensibility. The refactoring involves encapsulating ZMQ-specific logic into a new Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a significant and well-executed refactoring of the endpoint client's inter-process communication. By abstracting the transport layer into a separate protocol-based component, the ZMQ-specific logic has been cleanly separated from the core client and worker implementation. This greatly improves modularity, maintainability, and testability. The introduction of SSEAccumulatorProtocol and moving accumulator logic to dedicated classes is also a welcome improvement.
My review includes a few suggestions to address potential bugs and improve code clarity, particularly in the probe command and one of the integration tests. Overall, this is a high-quality refactoring that strengthens the client's architecture.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 30 out of 30 changed files in this pull request and generated 9 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
2fe5c29 to
237fb7f
Compare
237fb7f to
5b5c50d
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 27 out of 27 changed files in this pull request and generated 7 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 29 out of 29 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 31 out of 31 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
59fc3e6 to
0ba3f97
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 32 out of 32 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
ec1bc22 to
e63e7d9
Compare
e63e7d9 to
3306e6c
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 34 out of 34 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
9b077b8 to
e02a762
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 35 out of 35 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
0656ebf to
cbe58d8
Compare
arekay-nv
left a comment
There was a problem hiding this comment.
Much nicer to read now. Thanks!
src/inference_endpoint/endpoint_client/transport/zmq/transport.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 35 out of 35 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (3)
tests/integration/endpoint_client/test_zmq_communication.py:1
- The copyright year is 2026, which is in the future. This should be changed to 2025 or the actual year when the code was created.
tests/unit/endpoint_client/test_zmq_config.py:1 - The copyright year is 2026, which is in the future. This should be changed to 2025 or the actual year when the code was created.
src/inference_endpoint/endpoint_client/zmq_utils.py:1 - The copyright year is 2026, which is in the future. This should be changed to 2025 or the actual year when the code was created.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
What does this PR do?
MR includes refactors:
MR Includes changes to ZMQ:
worker,worker_manager,http_clientzmq_utils.pyNew impl. ~= Old Impl. for msg-size 64kb+ (since ser/des overhead starts dominating).
Much faster (with tighter latencies) for smaller packet sizes.
new:
old:
Type of change
Related issues
Testing
Checklist