feat: GitHub annotations, human-readable output, auto baseline loading #1

Merged: farhan-syah merged 8 commits into main, Feb 13, 2026
Conversation
Add a cache deletion step to prevent stale baseline artifacts when re-running the same commit. GitHub Actions caches are immutable per key, so re-runs on the same SHA would fail to update the baseline without this fix. Also add the actions:write permission required by the cache deletion API.
Make format_duration public and export from the report crate root, enabling consistent human-readable duration formatting (ns/us/ms/s) across CLI and report generation.
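A minimal sketch of what such an adaptive formatter might look like; the exact signature and rounding precision used by the report crate are assumptions here, only the name format_duration and the ns/us/ms/s units come from the commit message:

```rust
/// Format a duration given in nanoseconds using the largest unit
/// (ns/us/ms/s) that keeps the value readable.
fn format_duration(nanos: f64) -> String {
    if nanos < 1_000.0 {
        format!("{:.2} ns", nanos)
    } else if nanos < 1_000_000.0 {
        format!("{:.2} us", nanos / 1_000.0)
    } else if nanos < 1_000_000_000.0 {
        format!("{:.2} ms", nanos / 1_000_000.0)
    } else {
        format!("{:.2} s", nanos / 1_000_000_000.0)
    }
}

fn main() {
    // 2,500,000 ns renders as milliseconds instead of raw nanoseconds.
    println!("{}", format_duration(2_500_000.0)); // 2.50 ms
}
```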
Replace raw nanosecond values with adaptive formatting in human-readable output. Metrics now display as "2.5 ms" instead of "2500000.00 ns", eliminating mental unit conversion.
Improve baseline path resolution:
- Make --baseline flag accept an optional path argument
- Fall back to config baseline_path or default target/fluxbench/baseline.json
- Enable simpler CLI usage: just --baseline instead of --baseline /path/to/file

Add GitHub Actions annotations:
- Emit ::error:: for crashed/failed benchmarks with file/line location
- Annotate significant regressions with baseline comparison
- Mark verification failures as errors or warnings based on severity
- Annotations appear inline on PR diffs in GitHub CI

Use human-readable duration formatting in comparison output for consistency with other reports.
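The ::error:: markers above are GitHub Actions workflow commands: lines printed to stdout in the documented `::error file=...,line=...::message` shape, which Actions renders inline on the PR diff. A hedged sketch of how such a line might be produced (the function name and arguments are illustrative, not the crate's actual API):

```rust
/// Build a GitHub Actions error annotation for a failed benchmark.
/// Printed to stdout inside a workflow run, a line of this shape is
/// surfaced inline at the given file/line on the PR diff.
fn github_error_annotation(file: &str, line: u32, message: &str) -> String {
    format!("::error file={file},line={line}::{message}")
}

fn main() {
    // Hypothetical crashed benchmark at benches/parse.rs:42.
    println!("{}", github_error_annotation("benches/parse.rs", 42, "benchmark crashed"));
}
```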
Binary executables don't need API documentation, only library crates do.
Extract baseline comparison logic into apply_baseline_comparison() to eliminate code duplication between run_benchmarks and compare_benchmarks. This enables baseline comparison when using the --baseline flag with the run command, not just the compare command. Also fix documentation formatting in the ComparisonSeries field comment.
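A rough sketch of the extraction described above. Only the helper's name, apply_baseline_comparison, comes from the commit message; the result type, its fields, and the baseline representation are all hypothetical:

```rust
// Hypothetical report type: a benchmark result with an optional
// relative change versus a stored baseline.
#[derive(Debug)]
struct BenchResult {
    name: String,
    mean_ns: f64,
    change_pct: Option<f64>,
}

/// Shared helper so both the run and compare paths apply the same
/// baseline comparison instead of duplicating the logic.
fn apply_baseline_comparison(results: &mut [BenchResult], baseline: &[(String, f64)]) {
    for r in results.iter_mut() {
        if let Some((_, base)) = baseline.iter().find(|(n, _)| *n == r.name) {
            r.change_pct = Some((r.mean_ns - base) / base * 100.0);
        }
    }
}

fn main() {
    let mut results = vec![BenchResult { name: "parse".into(), mean_ns: 150.0, change_pct: None }];
    let baseline = vec![("parse".to_string(), 100.0)];
    apply_baseline_comparison(&mut results, &baseline);
    println!("{:?}", results[0].change_pct); // Some(50.0) -- 50% slower than baseline
}
```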
Enable benchmarks to override the global CI regression threshold with benchmark-specific values. Critical benchmarks can now enforce stricter thresholds while less sensitive benchmarks use the global setting.

Changes:
- Add threshold field to BenchmarkReportResult for per-benchmark overrides
- Thread the threshold value through the execution and reporting pipeline
- Implement threshold selection logic (per-benchmark > 0 overrides global)
- Display custom thresholds in GitHub Actions annotations
- Add comprehensive unit tests for threshold precedence scenarios
- Raise the default CI threshold to 25% to accommodate shared-runner variance

The per-benchmark threshold is specified via the bench macro attribute and defaults to 0.0 (use global). When set, it takes precedence over the global regression_threshold in regression detection and reporting.
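The selection rule described above, "per-benchmark > 0 overrides global", can be sketched as follows (the function name is assumed; the 0.0-means-unset convention and the 25% default come from the commit message):

```rust
/// Pick the regression threshold for one benchmark: a per-benchmark
/// value greater than zero wins; 0.0 means "unset", so fall back to
/// the global CI threshold (25% by default per this PR).
fn effective_threshold(per_bench: f64, global: f64) -> f64 {
    if per_bench > 0.0 { per_bench } else { global }
}

fn main() {
    // A benchmark with no override uses the global 25% threshold.
    println!("{}", effective_threshold(0.0, 0.25)); // 0.25
    // A critical benchmark with a stricter 5% override wins.
    println!("{}", effective_threshold(0.05, 0.25)); // 0.05
}
```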
Summary
- config.ci.github_annotations: emit ::error:: and ::warning:: annotations with file/line for crashed benchmarks, regressions, and verification failures
- --baseline now works without a path, defaulting to config.output.baseline_path or target/fluxbench/baseline.json

Test plan

All workspace tests pass. Verified format_duration output and baseline resolution logic.