Conversation
There was a problem hiding this comment.
Looks good overall — small, targeted CI fixes. Two inline comments follow.
Summary: The pkgin path fix for NetBSD and the stale-process cleanup for port 4433 are straightforward. The kill_port helper in perfcompare.py is clean. One formatting/style issue on the NetBSD line worth addressing.
There was a problem hiding this comment.
Pull request overview
This PR makes small CI reliability fixes by cleaning up leftover processes that can block required ports and by correcting NetBSD pkgin usage in the VM setup action.
Changes:
- Kill stale processes holding port 4433 before running the bench workflow.
- Add a
kill_port()helper inperfcompare.pyand invoke it during setup to avoid port conflicts. - Fix NetBSD package installation by calling
pkginvia its full path and adding missing build tools.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| .github/workflows/bench.yml | Proactively frees port 4433 prior to running benchmarks on the self-hosted runner. |
| .github/scripts/perfcompare.py | Adds port cleanup in the perf-compare setup to avoid failures from stale listeners. |
| .github/actions/check-vm/action.yml | Fixes NetBSD VM dependency installation by using /usr/pkg/bin/pkgin and installing additional build tools. |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3566 +/- ##
==========================================
- Coverage 94.55% 94.42% -0.14%
==========================================
Files 128 133 +5
Lines 39736 40163 +427
Branches 39736 40163 +427
==========================================
+ Hits 37573 37924 +351
- Misses 1327 1393 +66
- Partials 836 846 +10
Flags with carried forward coverage won't be shown. Click here to find out more.
|
- Kill old processes hogging needed ports. - Fix `pkgin`. - Turn off qlog for benches (oops). - Generate a separate profile for each bench variant.
ba49681 to
2905bc9
Compare
There was a problem hiding this comment.
Clean CI maintenance PR. Four independent fixes, all straightforward.
The test-fixture/bench feature propagation is the most important change here. Without it, new_neqo_qlog() in test-fixture/src/lib.rs:553 was creating live qlog instances during benchmarks (the cfg!(feature = "bench") guard was never true for test-fixture), adding I/O noise to measurements. The fix correctly threads the feature through neqo-http3/bench and neqo-transport/bench. This does create a mutual feature cycle (neqo-http3[bench] ↔ test-fixture[bench] and neqo-transport[bench] ↔ test-fixture[bench]), but Cargo handles this correctly since the back-edge is through a dev-dependency.
The per-variant perf profiling loop in bench.yml is a nice improvement — individual flamegraphs per benchmark variant will be much more useful than a single combined profile. The fallback to whole-binary profiling for non-criterion benchmarks (e.g., min_bandwidth) is handled correctly.
The NetBSD ;; formatting issue noted by an earlier reviewer is still present on this commit.
| for proto in ("udp", "tcp"): | ||
| subprocess.run( | ||
| ["sudo", "fuser", "-k", f"{port}/{proto}"], | ||
| stdout=subprocess.DEVNULL, | ||
| stderr=subprocess.DEVNULL, | ||
| ) |
There was a problem hiding this comment.
Tip
subprocess.run defaults to check=False, so this works correctly. However, since fuser -k returns non-zero when no matching process exists, adding check=False explicitly would make the intent clearer — consistent with how sh() is defined elsewhere in this file.
(Re: Copilot's suggestion to add shutil.which guards and timeouts — I'd push back on that. This script only runs on the self-hosted benchmark runner where sudo, fuser, and passwordless sudo are guaranteed. Those guards add complexity for an environment that doesn't need them.)
| for proto in ("udp", "tcp"): | |
| subprocess.run( | |
| ["sudo", "fuser", "-k", f"{port}/{proto}"], | |
| stdout=subprocess.DEVNULL, | |
| stderr=subprocess.DEVNULL, | |
| ) | |
| for proto in ("udp", "tcp"): | |
| subprocess.run( | |
| ["sudo", "fuser", "-k", f"{port}/{proto}"], | |
| stdout=subprocess.DEVNULL, | |
| stderr=subprocess.DEVNULL, | |
| check=False, | |
| ) |
Benchmark resultsSignificant performance differences relative to a1331f5. transfer/walltime/pacing-false/varying-seeds: 💚 Performance has improved by -71.505%. time: [23.089 ms 23.107 ms 23.127 ms]
change: [-71.533% -71.505% -71.475] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severetransfer/walltime/pacing-true/varying-seeds: 💚 Performance has improved by -71.676%. time: [23.422 ms 23.453 ms 23.503 ms]
change: [-71.723% -71.676% -71.610] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low mild
5 (5.00%) high mild
1 (1.00%) high severetransfer/walltime/pacing-false/same-seed: 💚 Performance has improved by -71.459%. time: [23.213 ms 23.241 ms 23.281 ms]
change: [-71.504% -71.459% -71.405] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low mild
2 (2.00%) high mild
1 (1.00%) high severetransfer/walltime/pacing-true/same-seed: 💚 Performance has improved by -71.283%. time: [23.833 ms 23.852 ms 23.872 ms]
change: [-71.347% -71.283% -71.236] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) low mild
5 (5.00%) high mildAll resultstransfer/1-conn/1-100mb-resp (aka. Download): No change in performance detected. time: [207.53 ms 207.87 ms 208.29 ms]
thrpt: [480.10 MiB/s 481.06 MiB/s 481.85 MiB/s]
change:
time: [-0.1756% +0.0447% +0.2784] (p = 0.71 > 0.05)
thrpt: [-0.2776% -0.0447% +0.1759]
No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severetransfer/1-conn/10_000-parallel-1b-resp (aka. RPS): Change within noise threshold. time: [288.00 ms 289.66 ms 291.34 ms]
thrpt: [34.324 Kelem/s 34.523 Kelem/s 34.722 Kelem/s]
change:
time: [-2.0491% -1.3367% -0.5871] (p = 0.00 < 0.05)
thrpt: [+0.5906% +1.3548% +2.0920]
Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mildtransfer/1-conn/1-1b-resp (aka. HPS): No change in performance detected. time: [38.626 ms 38.787 ms 38.970 ms]
thrpt: [25.661 B/s 25.782 B/s 25.889 B/s]
change:
time: [-0.3990% +0.2171% +0.7923] (p = 0.50 > 0.05)
thrpt: [-0.7861% -0.2166% +0.4006]
No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) high mild
5 (5.00%) high severetransfer/1-conn/1-100mb-req (aka. Upload): Change within noise threshold. time: [208.98 ms 209.34 ms 209.73 ms]
thrpt: [476.81 MiB/s 477.68 MiB/s 478.51 MiB/s]
change:
time: [+0.2296% +0.4689% +0.7234] (p = 0.00 < 0.05)
thrpt: [-0.7182% -0.4667% -0.2291]
Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severestreams/walltime/1-streams/each-1000-bytes: No change in performance detected. time: [589.05 µs 591.07 µs 593.52 µs]
change: [-0.5011% +0.0241% +0.5607] (p = 0.93 > 0.05)
No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) high mild
7 (7.00%) high severestreams/walltime/1000-streams/each-1-bytes: No change in performance detected. time: [12.272 ms 12.290 ms 12.309 ms]
change: [-0.4312% -0.2115% +0.0096] (p = 0.06 > 0.05)
No change in performance detected.streams/walltime/1000-streams/each-1000-bytes: Change within noise threshold. time: [44.056 ms 44.133 ms 44.237 ms]
change: [-0.5031% -0.2755% -0.0037] (p = 0.02 < 0.05)
Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severetransfer/walltime/pacing-false/varying-seeds: 💚 Performance has improved by -71.505%. time: [23.089 ms 23.107 ms 23.127 ms]
change: [-71.533% -71.505% -71.475] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severetransfer/walltime/pacing-true/varying-seeds: 💚 Performance has improved by -71.676%. time: [23.422 ms 23.453 ms 23.503 ms]
change: [-71.723% -71.676% -71.610] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low mild
5 (5.00%) high mild
1 (1.00%) high severetransfer/walltime/pacing-false/same-seed: 💚 Performance has improved by -71.459%. time: [23.213 ms 23.241 ms 23.281 ms]
change: [-71.504% -71.459% -71.405] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low mild
2 (2.00%) high mild
1 (1.00%) high severetransfer/walltime/pacing-true/same-seed: 💚 Performance has improved by -71.283%. time: [23.833 ms 23.852 ms 23.872 ms]
change: [-71.347% -71.283% -71.236] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) low mild
5 (5.00%) high mildDownload data for |
Failed Interop TestsQUIC Interop Runner, client vs. server, differences relative to
All resultsSucceeded Interop TestsQUIC Interop Runner, client vs. server neqo-pr as client
neqo-pr as server
Unsupported Interop TestsQUIC Interop Runner, client vs. server neqo-pr as client
neqo-pr as server
|
Client/server transfer resultsPerformance differences relative to a1331f5. Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.
Table above only shows statistically significant changes. See all results below. All resultsTransfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.
Download data for |
pkgin.