Add report building #216
Conversation
**MLCommons CLA bot:** All contributors have signed the MLCommons CLA ✍️ ✅
**Gemini Code Assist — Summary of Changes:** This pull request introduces a new, streamlined reporting system for benchmark metrics, designed to replace the older, more complex …
Code Review
This pull request introduces a new, simplified metrics reporting system. It includes a compute_summary function for statistical rollups, a Report dataclass to hold benchmark results, and a build_report function to construct reports from KVStore data. The changes also include comprehensive unit tests and design documentation. Feedback suggests improving robustness by using zip(..., strict=True) in compute_summary and _display_metric to prevent potential length mismatches, and ensuring consistency in the Report dataclass by having the tps property return 0.0 instead of None when the total output sequence length is zero.
**Review Council — Multi-AI Code Review**
Reviewed by: Claude | Depth: standard
Found 2 issues across 2 files:
🤖 Generated with Claude Code
**nv-alicheng** left a comment
**Review Council — Multi-AI Code Review**
Reviewed by: Claude (Codex unavailable) | Depth: standard
Found 3 issues across 2 files:
- 2 medium
- 1 low
| # | File | Line | Severity | Category | Summary |
|---|------|------|----------|----------|---------|
| 1 | report_builder.py | 55 | medium | performance | `_summarize` copies all raw values into numpy for every series metric. At 50k+ QPS over 60s, that's … |
| 2 | report.py | 127 | medium | performance | `dataclasses.asdict(self)` deep-copies every field, then `msgspec.json.encode` traverses it again. C… |
| 3 | report.py | 222 | low | data-integrity | All histogram buckets display as `[lo, hi)`, but `np.histogram`'s last bin is actually `[lo, hi]` (i… |
🤖 Generated with Claude Code
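Issue 3 is straightforward to reproduce: `np.histogram` uses half-open bins `[lo, hi)` for every bucket except the last, which is closed `[lo, hi]`, so the maximum value lands in the final bucket rather than being dropped.

```python
import numpy as np

values = np.array([0.0, 2.5, 5.0, 7.5, 10.0])
counts, edges = np.histogram(values, bins=2)

# edges -> [0., 5., 10.]; the bins are [0, 5) and [5, 10] (last bin closed).
# 10.0 is counted in the second bin, so counts -> [2, 3].
```

A display that labels every bucket `[lo, hi)` would therefore mislabel the last one; only the final bucket's label needs the closing bracket.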
- …uint64 or float64 for precision purposes
- …tead of linear interpolation for numpy percentile
What does this PR do?
Adds functionality to build a report like the legacy MetricsReporter from the MetricsAggregator service.
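As a rough sketch of the shape of such a builder (the KVStore is stood in for by a plain dict, and every name below is a hypothetical placeholder, not the PR's actual API):

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Report:
    # Hypothetical fields standing in for the PR's Report dataclass.
    name: str
    summaries: dict = field(default_factory=dict)


def build_report(name: str, kv: dict) -> Report:
    # Roll each recorded metric series up into a mean; the real build_report
    # reads series out of the MetricsAggregator's KVStore, not a plain dict,
    # and computes a fuller statistical summary than a single mean.
    return Report(name=name, summaries={k: mean(v) for k, v in kv.items()})


report = build_report("example-run", {"ttft_ms": [12.0, 15.0, 18.0]})
# report.summaries["ttft_ms"] == 15.0
```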
Type of change
Related issues
Testing
Checklist