Use reporter-generated Report as source for final metrics #43
Conversation
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
Gemini Code Assist: Summary of Changes

Hello @anandhu-eng, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request refactors the benchmark command to standardize how performance metrics are collected and reported. Instead of calculating metrics ad hoc, it now leverages a dedicated report object generated by the session's load generator. This change aims to improve the accuracy and reliability of benchmark results by using a single, authoritative source for all key performance indicators, while also adding error handling for missing reports.
Gemini Code Assist: Code Review
This pull request refactors the benchmark command to use the reporter-generated Report object as the authoritative source for final metrics. This is a good change that centralizes metric calculation and improves accuracy. The changes in BenchmarkSession to store the report and in _run_benchmark to consume it are correct. My review includes a suggestion to simplify the metric-gathering logic in benchmark.py for better readability and maintainability.
Pull request overview
This PR refactors the benchmark command to use the reporter-generated Report object as the authoritative source for final metrics, rather than manually tracking timing and sample counts. This improves consistency and reduces redundancy by leveraging the comprehensive metrics already computed by the MetricsReporter.
Key changes:
- Store the generated report on the `BenchmarkSession` object for external access
- Replace manual timing (`start_time`/`elapsed_time`) and counting (`scheduler.total_samples_to_issue`) with values from the report
- Add error handling to ensure the report is available before extracting metrics
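A minimal sketch of what the consuming side in `_run_benchmark` could look like under these changes. The `session.report` attribute and the field names `duration_ns`, `n_samples_issued`, `n_samples_completed`, and `qps` come from this PR's description; the function shape and the error type are assumptions for illustration:

```python
# Illustrative sketch only: report field names are taken from the PR
# description; the surrounding wiring is assumed, not quoted from the diff.
def extract_final_metrics(session):
    report = session.report
    if report is None:
        # The PR adds error handling for exactly this case: the report
        # is only populated once the load generator test completes.
        raise RuntimeError("benchmark run produced no report")
    return {
        "duration_s": report.duration_ns / 1e9,  # report stores nanoseconds
        "samples_issued": report.n_samples_issued,
        "samples_completed": report.n_samples_completed,
        "qps": report.qps,  # computed once by MetricsReporter, not re-derived
    }
```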
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/inference_endpoint/load_generator/session.py | Added self.report attribute to store the generated report after test completion, enabling external callers to access authoritative metrics |
| src/inference_endpoint/commands/benchmark.py | Refactored metrics calculation to use report fields (duration_ns, n_samples_issued, n_samples_completed, qps) instead of manual tracking, with error handling for missing reports |
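On the session side, the stored-report change described in the table above might be wired in roughly like this; only the `self.report` attribute is stated in the PR, while the constructor shape and the `generate_report()` method name are assumptions:

```python
from typing import Optional

class BenchmarkSession:
    """Sketch of the relevant slice of load_generator/session.py."""

    def __init__(self, reporter) -> None:
        self.reporter = reporter
        # Populated after the test completes so external callers (the
        # benchmark command) can read authoritative metrics from it.
        self.report: Optional["Report"] = None

    async def run(self) -> None:
        # ... issue the load and wait for all samples to complete ...
        # Store the generated report for external access; the
        # generate_report() method name is an assumption.
        self.report = self.reporter.generate_report()
```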
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
tests/unit/commands/test_utils.py:194
- This test creates multiple template files in the current working directory without cleanup. Each iteration creates a file like "offline_template.yaml", "online_template.yaml", etc., which remain after the test completes. This pollutes the workspace and can cause issues across test runs. Consider adding cleanup logic to remove generated files after assertions, or use tmp_path to create files in a temporary directory that pytest automatically cleans up.
```python
from pathlib import Path
from unittest.mock import MagicMock

async def test_init_all_templates(self):
    """Test generating all template types."""
    templates = ["offline", "online", "eval", "submission"]
    for template_type in templates:
        # Relative path resolves against the current working directory,
        # so the generated file is left behind after the test run.
        output_file = Path(f"{template_type}_template.yaml")
        args = MagicMock()
        args.template = template_type
        await run_init_command(args)
        assert output_file.exists()
        assert output_file.stat().st_size > 0
```
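A sketch of the `tmp_path`-based fix the comment suggests, assuming the pytest-asyncio plugin is available and that `run_init_command` writes its output relative to the current working directory (as the quoted test implies); the import path for `run_init_command` is a guess:

```python
import pytest
from unittest.mock import MagicMock

# Assumed import path, mirroring tests/unit/commands/test_utils.py.
from inference_endpoint.commands.utils import run_init_command


@pytest.mark.asyncio  # assumes the pytest-asyncio plugin is installed
async def test_init_all_templates(tmp_path, monkeypatch):
    """Generate every template type inside a pytest-managed temp dir."""
    # chdir into tmp_path so the command's relative output paths land in
    # a directory pytest deletes automatically after the test finishes.
    monkeypatch.chdir(tmp_path)
    templates = ["offline", "online", "eval", "submission"]
    for template_type in templates:
        output_file = tmp_path / f"{template_type}_template.yaml"
        args = MagicMock()
        args.template = template_type
        await run_init_command(args)
        assert output_file.exists()
        assert output_file.stat().st_size > 0
```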
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
What does this PR do?
Uses the reporter-generated `Report` object produced by the load generator session as the authoritative source for the benchmark command's final metrics, replacing manually tracked timing and sample counts.
Type of change
Related issues
Testing
Checklist