feat: GitHub annotations, human-readable output, auto baseline loading #1

Merged: farhan-syah merged 8 commits into main, Feb 13, 2026
Conversation
Add a cache deletion step to prevent stale baseline artifacts when re-running the same commit. GitHub Actions caches are immutable per key, so re-runs on the same SHA would fail to update the baseline without this fix. Also add the actions:write permission required by the cache deletion API.
Make format_duration public and export from the report crate root, enabling consistent human-readable duration formatting (ns/us/ms/s) across CLI and report generation.
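A minimal sketch of what such an adaptive formatter might look like; the exact signature and rounding precision used by the report crate are assumptions here, only the name format_duration and the ns/us/ms/s units come from the commit message:

```rust
/// Format a duration given in nanoseconds using the largest unit
/// (ns/us/ms/s) that keeps the value readable.
fn format_duration(nanos: f64) -> String {
    if nanos < 1_000.0 {
        format!("{:.2} ns", nanos)
    } else if nanos < 1_000_000.0 {
        format!("{:.2} us", nanos / 1_000.0)
    } else if nanos < 1_000_000_000.0 {
        format!("{:.2} ms", nanos / 1_000_000.0)
    } else {
        format!("{:.2} s", nanos / 1_000_000_000.0)
    }
}

fn main() {
    // 2,500,000 ns renders as milliseconds instead of raw nanoseconds.
    println!("{}", format_duration(2_500_000.0)); // 2.50 ms
}
```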
Replace raw nanosecond values with adaptive formatting in human-readable output. Metrics now display as "2.5 ms" instead of "2500000.00 ns", eliminating mental unit conversion.
Improve baseline path resolution:
- Make --baseline flag accept an optional path argument
- Fall back to config baseline_path or default target/fluxbench/baseline.json
- Enable simpler CLI usage: just --baseline instead of --baseline /path/to/file

Add GitHub Actions annotations:
- Emit ::error:: for crashed/failed benchmarks with file/line location
- Annotate significant regressions with baseline comparison
- Mark verification failures as errors or warnings based on severity
- Annotations appear inline on PR diffs in GitHub CI

Use human-readable duration formatting in comparison output for consistency with other reports.
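The ::error:: markers above are GitHub Actions workflow commands: lines printed to stdout in the documented `::error file=...,line=...::message` shape, which Actions renders inline on the PR diff. A hedged sketch of how such a line might be produced (the function name and arguments are illustrative, not the crate's actual API):

```rust
/// Build a GitHub Actions error annotation for a failed benchmark.
/// Printed to stdout inside a workflow run, a line of this shape is
/// surfaced inline at the given file/line on the PR diff.
fn github_error_annotation(file: &str, line: u32, message: &str) -> String {
    format!("::error file={file},line={line}::{message}")
}

fn main() {
    // Hypothetical crashed benchmark at benches/parse.rs:42.
    println!("{}", github_error_annotation("benches/parse.rs", 42, "benchmark crashed"));
}
```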
Binary executables don't need API documentation, only library crates do.
Extract baseline comparison logic into apply_baseline_comparison() to eliminate code duplication between run_benchmarks and compare_benchmarks. This enables baseline comparison when using the --baseline flag with the run command, not just the compare command. Also fix documentation formatting in the ComparisonSeries field comment.
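A rough sketch of the extraction described above. Only the helper's name, apply_baseline_comparison, comes from the commit message; the result type, its fields, and the baseline representation are all hypothetical:

```rust
// Hypothetical report type: a benchmark result with an optional
// relative change versus a stored baseline.
#[derive(Debug)]
struct BenchResult {
    name: String,
    mean_ns: f64,
    change_pct: Option<f64>,
}

/// Shared helper so both the run and compare paths apply the same
/// baseline comparison instead of duplicating the logic.
fn apply_baseline_comparison(results: &mut [BenchResult], baseline: &[(String, f64)]) {
    for r in results.iter_mut() {
        if let Some((_, base)) = baseline.iter().find(|(n, _)| *n == r.name) {
            r.change_pct = Some((r.mean_ns - base) / base * 100.0);
        }
    }
}

fn main() {
    let mut results = vec![BenchResult { name: "parse".into(), mean_ns: 150.0, change_pct: None }];
    let baseline = vec![("parse".to_string(), 100.0)];
    apply_baseline_comparison(&mut results, &baseline);
    println!("{:?}", results[0].change_pct); // Some(50.0) -- 50% slower than baseline
}
```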
Enable benchmarks to override the global CI regression threshold with benchmark-specific values. Critical benchmarks can now enforce stricter thresholds while less sensitive benchmarks use the global setting.

Changes:
- Add threshold field to BenchmarkReportResult for per-benchmark overrides
- Thread the threshold value through the execution and reporting pipeline
- Implement threshold selection logic (per-benchmark > 0 overrides global)
- Display custom thresholds in GitHub Actions annotations
- Add comprehensive unit tests for threshold precedence scenarios
- Raise the default CI threshold to 25% to accommodate shared-runner variance

The per-benchmark threshold is specified via the bench macro attribute and defaults to 0.0 (use global). When set, it takes precedence over the global regression_threshold in regression detection and reporting.
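The selection rule described above, "per-benchmark > 0 overrides global", can be sketched as follows (the function name is assumed; the 0.0-means-unset convention and the 25% default come from the commit message):

```rust
/// Pick the regression threshold for one benchmark: a per-benchmark
/// value greater than zero wins; 0.0 means "unset", so fall back to
/// the global CI threshold (25% by default per this PR).
fn effective_threshold(per_bench: f64, global: f64) -> f64 {
    if per_bench > 0.0 { per_bench } else { global }
}

fn main() {
    // A benchmark with no override uses the global 25% threshold.
    println!("{}", effective_threshold(0.0, 0.25)); // 0.25
    // A critical benchmark with a stricter 5% override wins.
    println!("{}", effective_threshold(0.05, 0.25)); // 0.05
}
```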
Summary
- config.ci.github_annotations: emit ::error:: and ::warning:: annotations with file/line for crashed benchmarks, regressions, and verification failures
- --baseline now works without a path, defaulting to config.output.baseline_path or target/fluxbench/baseline.json

Test plan

All workspace tests pass. Verified format_duration output and baseline resolution logic.