
feat: GitHub annotations, human-readable output, auto baseline loading#1

Merged
farhan-syah merged 8 commits into main from 0.1.2
Feb 13, 2026
Conversation

@farhan-syah
Collaborator

Summary

  • GitHub Actions annotations: Wire config.ci.github_annotations to emit ::error:: and ::warning:: annotations with file/line for crashed benchmarks, regressions, and verification failures
  • Human-readable terminal output: Replace raw nanosecond display with scaled duration formatting (ns/us/ms/s) in terminal output, matching GitHub summary readability
  • Automatic baseline loading: --baseline now works without a path, defaulting to config.output.baseline_path or target/fluxbench/baseline.json
  • CI fix: Clear stale baseline cache before saving so workflow re-runs on the same commit succeed
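The scaled-duration formatting described above can be sketched as follows. This is a minimal illustration of the ns/us/ms/s behaviour, not the crate's actual `format_duration` implementation, whose signature and rounding may differ:

```rust
/// Format a duration given in nanoseconds as a human-readable string,
/// scaling the unit to ns, us, ms, or s. Hypothetical sketch of the
/// behaviour described in the summary above.
fn format_duration(nanos: f64) -> String {
    if nanos < 1_000.0 {
        format!("{:.2} ns", nanos)
    } else if nanos < 1_000_000.0 {
        format!("{:.2} us", nanos / 1_000.0)
    } else if nanos < 1_000_000_000.0 {
        format!("{:.2} ms", nanos / 1_000_000.0)
    } else {
        format!("{:.2} s", nanos / 1_000_000_000.0)
    }
}

fn main() {
    // The example from the commit message: 2500000.00 ns reads as ms.
    println!("{}", format_duration(2_500_000.0)); // prints "2.50 ms"
}
```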

Test plan

All workspace tests pass. Verified format_duration output and baseline resolution logic.

Add cache deletion step to prevent stale baseline artifacts when re-running
the same commit. GitHub Actions caches are immutable per key, so re-runs on
the same SHA would fail to update the baseline without this fix.
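A workflow fragment along these lines clears the stale entry before the save step. The step name and cache key are illustrative, not the repository's actual workflow:

```yaml
# Illustrative fragment (cache key and step name are hypothetical).
permissions:
  actions: write   # required by the cache deletion API

steps:
  - name: Clear stale baseline cache
    # Caches are immutable per key, so delete before re-saving on the same SHA.
    run: gh cache delete "baseline-${{ github.sha }}" || true
    env:
      GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```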

Also add the actions:write permission required by the cache deletion API.

Make format_duration public and export from the report crate root, enabling
consistent human-readable duration formatting (ns/us/ms/s) across CLI and
report generation.

Replace raw nanosecond values with adaptive formatting in human-readable
output. Metrics now display as "2.5 ms" instead of "2500000.00 ns",
eliminating mental unit conversion.

Improve baseline path resolution:
- Make --baseline flag accept optional path argument
- Fall back to config baseline_path or default target/fluxbench/baseline.json
- Enable simpler CLI usage: just --baseline instead of --baseline /path/to/file
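The fallback chain above can be sketched like this. The function and parameter names are illustrative; the actual CLI code will differ:

```rust
use std::path::PathBuf;

/// Resolve the baseline path: an explicit CLI value wins, then the
/// configured baseline_path, then the built-in default. Hypothetical
/// sketch of the precedence described in the commit message above.
fn resolve_baseline(cli: Option<PathBuf>, config: Option<PathBuf>) -> PathBuf {
    cli.or(config)
        .unwrap_or_else(|| PathBuf::from("target/fluxbench/baseline.json"))
}

fn main() {
    // Bare `--baseline` with no config falls through to the default.
    let path = resolve_baseline(None, None);
    println!("{}", path.display()); // prints "target/fluxbench/baseline.json"
}
```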

Add GitHub Actions annotations:
- Emit ::error:: for crashed/failed benchmarks with file/line location
- Annotate significant regressions with baseline comparison
- Mark verification failures as errors or warnings based on severity
- Annotations appear inline on PR diffs in GitHub CI
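The `::error::` workflow-command syntax is GitHub's; a minimal emitter for the file/line-anchored annotations described above might look like this (function name and call site are illustrative):

```rust
/// Build a GitHub Actions error annotation anchored to a file and line.
/// GitHub's runner parses lines of this shape from stdout and surfaces
/// them inline on the PR diff. Sketch only; not the crate's actual code.
fn emit_error_annotation(file: &str, line: u32, message: &str) -> String {
    format!("::error file={file},line={line}::{message}")
}

fn main() {
    // Hypothetical crashed-benchmark annotation.
    println!("{}", emit_error_annotation("benches/parse.rs", 42, "benchmark crashed"));
}
```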

Use human-readable duration formatting in comparison output for consistency
with other reports.

Binary executables don't need API documentation, only library crates do.

Extract baseline comparison logic into apply_baseline_comparison() to
eliminate code duplication between run_benchmarks and compare_benchmarks.
This enables baseline comparison support when using --baseline flag with
the run command, not just the compare command.

Also fixes documentation formatting in ComparisonSeries field comment.

Enable benchmarks to override the global CI regression threshold with
benchmark-specific values. Critical benchmarks can now enforce stricter
thresholds while less sensitive benchmarks use the global setting.

Changes:
- Add threshold field to BenchmarkReportResult for per-benchmark overrides
- Thread threshold value through execution and reporting pipeline
- Implement threshold selection logic (per-benchmark > 0 overrides global)
- Display custom thresholds in GitHub Actions annotations
- Add comprehensive unit tests for threshold precedence scenarios
- Raise default CI threshold to 25% to accommodate shared runner variance

The per-benchmark threshold is specified via the bench macro attribute
and defaults to 0.0 (use global). When set, it takes precedence over
the global regression_threshold in regression detection and reporting.
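The precedence rule above (per-benchmark > 0 overrides global, 0.0 means "use global") can be sketched as:

```rust
/// Pick the effective regression threshold for a benchmark.
/// A per-benchmark value greater than zero overrides the global one;
/// the 0.0 default means "use the global setting". Sketch of the
/// selection logic described above, not the crate's actual code.
fn effective_threshold(per_bench: f64, global: f64) -> f64 {
    if per_bench > 0.0 { per_bench } else { global }
}

fn main() {
    // Default (0.0) defers to the global 25% threshold.
    println!("{}", effective_threshold(0.0, 0.25)); // prints "0.25"
    // A stricter per-benchmark threshold takes precedence.
    println!("{}", effective_threshold(0.10, 0.25)); // prints "0.1"
}
```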
@farhan-syah farhan-syah merged commit 0a163cc into main Feb 13, 2026
5 checks passed
@farhan-syah farhan-syah deleted the 0.1.2 branch February 13, 2026 07:46
