fix: sampling-param defaults, and make futures-http-client a testing util #32

Merged
viraatc merged 2 commits into main from fix/viraatc-refactor-chore1 on Dec 5, 2025

Conversation

@viraatc viraatc (Collaborator) commented Nov 20, 2025

What does this PR do?

This PR includes the following changes:

  • The futures client is now a minimal test wrapper around HttpClient and is no longer used by probe.py.
  • The futures client drops its callback functionality, which is no longer needed.
  • The HTTP client tests are cleaned up.
  • The sampling-param defaults are updated (repetition-penalty and top-k added).
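
As a rough illustration of what a callback-free futures wrapper can look like, here is a minimal sketch. It is not the code from this PR: `EchoClient`, `FuturesTestClient`, `submit`, and `drain_one` are hypothetical names, and the stand-in client simply upper-cases its input instead of making HTTP requests. The point is only that each request maps to one future that the test awaits directly, so no callback registration is needed.

```python
import asyncio


class EchoClient:
    """Hypothetical stand-in for the real HTTP client (assumption, not the PR's class)."""

    def __init__(self) -> None:
        self.responses: asyncio.Queue = asyncio.Queue()

    async def issue(self, request_id: int, payload: str) -> None:
        # Pretend the "server" answered immediately with an upper-cased payload.
        await self.responses.put((request_id, payload.upper()))


class FuturesTestClient:
    """Test-only wrapper: one future per request, no callbacks."""

    def __init__(self, client: EchoClient) -> None:
        self._client = client
        self._pending: dict = {}
        self._next_id = 0

    async def submit(self, payload: str) -> asyncio.Future:
        request_id = self._next_id
        self._next_id += 1
        future = asyncio.get_running_loop().create_future()
        self._pending[request_id] = future
        await self._client.issue(request_id, payload)
        return future

    async def drain_one(self) -> None:
        # Match one response to its pending future and resolve it.
        request_id, result = await self._client.responses.get()
        self._pending.pop(request_id).set_result(result)


async def demo() -> str:
    wrapper = FuturesTestClient(EchoClient())
    future = await wrapper.submit("ping")
    await wrapper.drain_one()
    return await future


result = asyncio.run(demo())
```

In a test, the returned future can be awaited with a timeout (for example via `asyncio.wait_for`) so that a missing response fails fast instead of hanging.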

Type of change

  • Bug fix
  • New feature
  • Documentation update
  • Refactor/cleanup

Related issues

Testing

  • Tests added/updated
  • All tests pass locally
  • Manual testing completed

Checklist

  • Code follows project style
  • Pre-commit hooks pass
  • Documentation updated (if needed)

@viraatc viraatc requested a review from a team as a code owner November 20, 2025 22:11
Copilot AI review requested due to automatic review settings November 20, 2025 22:11
github-actions Bot commented Nov 20, 2025

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@gemini-code-assist

Summary of Changes

Hello @viraatc, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on refactoring and cleaning up the futures-client module, primarily by simplifying the FuturesHttpClient class. It removes direct ZMQ-related setup and teardown, delegating these concerns to its parent class, thereby making the client more focused and reducing its internal complexity. Additionally, an unimplemented method was removed from http_sample_issuer.py for further code hygiene.

Highlights

  • Refactored Futures Client: The FuturesHttpClient class has been significantly simplified by removing its direct ZMQ-related initialization and management, delegating these responsibilities to its parent class for a cleaner implementation.
  • Removed ZMQ Dependencies: Imports and usage of zmq.asyncio, WorkerManager, ZMQPullSocket, and ZMQPushSocket have been eliminated from futures_client.py, streamlining the client's dependencies.
  • Streamlined Initialization and Shutdown: The async_start method in FuturesHttpClient now primarily calls its superclass's async_start, reducing boilerplate. Additionally, the shutdown timeout for the response handler task was reduced from 1.0s to 0.2s.
  • Improved Error Handling: A cleanup call (await self.async_shutdown()) has been added within the async_start exception block to ensure proper resource release in case of startup failures.
  • Cleanup of Unused Method: The process_sample_data method, which previously raised a NotImplementedError, has been removed from HttpClientSampleIssuer for better code hygiene.
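
The startup-cleanup and bounded-shutdown behaviour described in the highlights could look roughly like the sketch below. `BaseClient` is a hypothetical stand-in for the parent class and every method body is an assumption; only the `async_start` / `async_shutdown` names and the 0.2s timeout come from the summary above.

```python
import asyncio


class BaseClient:
    """Hypothetical parent class that owns worker setup and teardown."""

    def __init__(self, fail_on_start: bool = False) -> None:
        self.fail_on_start = fail_on_start
        self.started = False
        self.shutdown_calls = 0

    async def async_start(self) -> None:
        self.started = True  # a resource is acquired before the failure
        if self.fail_on_start:
            raise RuntimeError("worker setup failed")

    async def async_shutdown(self) -> None:
        self.shutdown_calls += 1
        self.started = False


class FuturesClient(BaseClient):
    HANDLER_SHUTDOWN_TIMEOUT_S = 0.2  # value taken from the PR summary

    def __init__(self, fail_on_start: bool = False) -> None:
        super().__init__(fail_on_start)
        self._handler_task = None

    async def _handle_responses(self) -> None:
        while True:  # placeholder loop; a real handler would read responses
            await asyncio.sleep(3600)

    async def async_start(self) -> None:
        try:
            await super().async_start()
            self._handler_task = asyncio.create_task(self._handle_responses())
        except Exception:
            # Release anything acquired so far, then re-raise the original error.
            await self.async_shutdown()
            raise

    async def async_shutdown(self) -> None:
        task, self._handler_task = self._handler_task, None
        if task is not None:
            task.cancel()
            # Give the cancelled task a bounded amount of time to finish.
            await asyncio.wait([task], timeout=self.HANDLER_SHUTDOWN_TIMEOUT_S)
        await super().async_shutdown()


async def demo():
    failing = FuturesClient(fail_on_start=True)
    try:
        await failing.async_start()
    except RuntimeError:
        pass

    healthy = FuturesClient()
    await healthy.async_start()
    await healthy.async_shutdown()
    return failing.started, failing.shutdown_calls, healthy.shutdown_calls


failed_started, failed_shutdowns, healthy_shutdowns = asyncio.run(demo())
```

Re-raising after the cleanup keeps the original startup error visible to the caller while still leaving the client in a stopped state.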

Copilot AI left a comment

Pull Request Overview

This PR simplifies futures_client.py by removing redundant initialization code and unused imports. The main change delegates worker and ZMQ setup to the parent class instead of duplicating it.

Key changes:

  • Removed duplicate ZMQ and worker initialization logic in FuturesHttpClient.async_start()
  • Removed unused imports (zmq.asyncio, WorkerManager, ZMQ socket utilities)
  • Removed unused process_sample_data method from HttpClientSampleIssuer

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/inference_endpoint/endpoint_client/http_sample_issuer.py | Removed unused process_sample_data method and Any import |
| src/inference_endpoint/endpoint_client/futures_client.py | Delegated initialization to parent class, removed duplicate setup code and unused imports |

Comment thread tests/futures_client.py
@viraatc viraatc requested a review from nv-alicheng November 20, 2025 22:12

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request simplifies the FuturesHttpClient by refactoring the ZMQ and worker management logic, delegating it to the parent HTTPEndpointClient. The changes make the code cleaner and reduce duplication. I have one suggestion to improve the robustness of error handling during client startup.

Comment thread tests/futures_client.py
@viraatc viraatc force-pushed the fix/viraatc-refactor-chore1 branch from d152aee to e72b129 Compare November 25, 2025 08:54
Copilot AI review requested due to automatic review settings November 25, 2025 21:41

Copilot AI left a comment

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 6 comments.


Comment thread tests/integration/endpoint_client/test_http_client_streaming.py
Comment thread tests/integration/endpoint_client/test_http_client_core.py
Comment thread src/inference_endpoint/endpoint_client/zmq_utils.py
Comment thread src/inference_endpoint/endpoint_client/configs.py
Comment thread src/inference_endpoint/dataset_manager/dataloader.py
Comment thread src/inference_endpoint/commands/probe.py Outdated
Copilot AI review requested due to automatic review settings November 26, 2025 22:18

Copilot AI left a comment

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

src/inference_endpoint/commands/probe.py:1

  • The assertion was weakened from checking the error message content to just verifying an exception was raised. This reduces test effectiveness by not validating that the correct type of error occurred.
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
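
To make the concern in this suppressed comment concrete, here is a small, self-contained sketch of the difference between the two assertion styles. `parse_port` and `raises` are hypothetical helpers written for this illustration and are unrelated to probe.py or its tests.

```python
def parse_port(value: str) -> int:
    """Hypothetical function under test."""
    port = int(value)  # also raises ValueError for non-numeric input
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


def raises(exc_type, func, *args, match=None) -> bool:
    """Return True if func(*args) raises exc_type (and, optionally, a matching message)."""
    try:
        func(*args)
    except exc_type as exc:
        return match is None or match in str(exc)
    return False


# Weak check: any ValueError counts, even the unrelated one raised by int("abc").
weak_passes = raises(ValueError, parse_port, "abc")

# Strict check: the message must identify the error the test is actually about.
strict_passes = raises(ValueError, parse_port, "abc", match="port out of range")
strict_range_passes = raises(ValueError, parse_port, "70000", match="port out of range")
```

The weak check passes for the non-numeric input even though the range validation never ran, which is the kind of false positive that matching on the message guards against.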

Comment thread tests/futures_client.py
Comment thread src/inference_endpoint/dataset_manager/dataloader.py
@viraatc viraatc changed the title from "chore: simplify futures-client" to "fix: sampling-param defaults, and make futures-http-client a testing util" Dec 3, 2025
Copilot AI review requested due to automatic review settings December 3, 2025 22:46
@viraatc viraatc force-pushed the fix/viraatc-refactor-chore1 branch from 8760f42 to 34ff0c8 Compare December 3, 2025 22:46

Copilot AI left a comment

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.


Comment thread src/inference_endpoint/dataset_manager/dataloader.py
Comment thread tests/futures_client.py
Comment thread src/inference_endpoint/endpoint_client/configs.py
Comment thread src/inference_endpoint/endpoint_client/configs.py
@arekay-nv arekay-nv (Collaborator) left a comment

LGTM!

Comment thread src/inference_endpoint/commands/probe.py Outdated
Comment thread src/inference_endpoint/commands/probe.py Outdated
Comment thread src/inference_endpoint/config/schema.py Outdated
@viraatc viraatc force-pushed the fix/viraatc-refactor-chore1 branch 2 times, most recently from 3e9a12f to 1a06a2e Compare December 5, 2025 01:21
Copilot AI review requested due to automatic review settings December 5, 2025 01:40
@viraatc viraatc force-pushed the fix/viraatc-refactor-chore1 branch from 1a06a2e to 60644b1 Compare December 5, 2025 01:40

Copilot AI left a comment

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.


Comment thread src/inference_endpoint/config/schema.py Outdated
Comment thread src/inference_endpoint/endpoint_client/configs.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings December 5, 2025 03:43

Copilot AI left a comment

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.


Comment thread src/inference_endpoint/endpoint_client/configs.py
Comment thread tests/integration/endpoint_client/test_http_client_core.py
@viraatc viraatc merged commit d4d5e16 into main Dec 5, 2025
4 checks passed
@github-actions github-actions Bot locked and limited conversation to collaborators Dec 5, 2025
@viraatc viraatc deleted the fix/viraatc-refactor-chore1 branch February 6, 2026 23:08
Labels

None yet

Projects

None yet

4 participants