feat: move dns lookup to run independently#3654
Conversation
ben-dz
left a comment
There was a problem hiding this comment.
Other than these two changes, I think this looks good.
|
@ben-dz apologies for the late reply Will make the suggested changes |
|
@0xzrf Can you please rebase this branch? |
|
@vihu done |
## Summary of Changes - Report each trusted (`/run-e2e`) e2e and shreds shard's result as a check run on the **PR head SHA**, so branch protection's required `e2e (shard N)` / `shard-e2e (shard N)` contexts are satisfied. External fork PRs can now be merged normally instead of requiring a maintainer to bypass the `main` ruleset. - Checks are created by the GitHub Actions app — the integration the required contexts are pinned to — and gated to `workflow_dispatch`, so internal `pull_request` runs keep using their native checks (no double-reporting). - Make the `trusted-fork-e2e` dispatcher's confirmation comment non-fatal: a capped `GITHUB_TOKEN` denying `issues:write` was failing the dispatch job *after* the runs had already launched. It is now wrapped in try/catch so the failure is a warning, not a red job. This is builds on #3777: that PR let maintainers *run* trusted e2e on fork PRs, but the runs execute on the base ref so their native checks attach to `main`, never the PR head — leaving fork PRs BLOCKED even after a green run. This change makes those results *count* by re-reporting them onto the validated PR head SHA. Related RFC/PRs: builds on [#3777](#3777); unblocks fork PRs such as [#3654](#3654). ## Diff Breakdown | Category | Files | Lines (+/-) | Net | |-------------------|-------|-----------------|--------| | Config/build | 3 | +90 / -13 | +77 | | Docs | 1 | +1 / -0 | +1 | | **Total** | 4 | +91 / -13 | +78 | Entirely GitHub Actions workflow changes plus a changelog entry; no application code. <details> <summary>Key files (click to expand)</summary> - `.github/workflows/e2e.yml` — add `checks: write`; in each `e2e (shard N)` job, create an `in_progress` check run on the dispatched `head_sha` before the tests and update it to success/failure after, gated to `workflow_dispatch`. - `.github/workflows/shreds-e2e.yml` — same per-shard check-run reporting for the `shard-e2e (shard N)` jobs. - `.github/workflows/trusted-fork-e2e.yml` — wrap the dispatcher's confirmation `createComment` in try/catch so a denied `issues:write` no longer fails the job. </details> ## Testing Verification - `actionlint` (with the repo's `.github/actionlint.yaml` runner-label config) reports no findings on all three modified workflows. - End-to-end validation requires a real dispatch (the check-run path only runs under `workflow_dispatch`): after merge, comment `/run-e2e` on a fork PR (e.g. #3654) and confirm the `e2e (shard N)` / `shard-e2e (shard N)` checks turn green **on the PR head** and clear branch protection. ## Notes for reviewers - To satisfy *all* required contexts, the maintainer must run the full `/run-e2e` (the default, which dispatches both suites). A partial `/run-e2e e2e` posts only the 5 `e2e` checks and leaves the 4 `shard-e2e` contexts unreported, so the PR stays blocked. - Checks bind to a specific SHA. If the contributor pushes a new commit, a maintainer must re-run `/run-e2e` — this is intentional (never auto-run untrusted new code). - Known follow-up (not in this PR): if the dispatched `setup` job fails before the shards run, no checks post and the PR stays blocked; `setup` could post failure checks for all shards to surface that on the PR.
|
/run-e2e |
|
Hey @0xzrf This looks good, and we now are able to run CI on it, so we will be able to merge it. I'm ready to approve it, but our branch protection policy requires that your commits are signed. if you already have commit signing set up:
|
9c4249d to
e2994e5
Compare
|
@ben-dz I've made the changes, rebasing with |
|
The branch protection requires all the commits to be signed, but you should be able to run that rebase command (HEAD~3 instead of 2), and force-push over the non-signed commits with 3 signed ones. |
|
@ben-dz done |
|
/run-e2e |
Summary of Changes
Describe what changed in the PR
Introduced a
DeliveryDNSRefresherthat runs DNS for result-destinationhost:portstrings outside the measurement loop: it refreshes on a 5-minute ticker, once at startup, and whenever the desired set of destinations changes (after outbound or ICMP target discovery updates). The measurement/send path now usesDNSCache.LookupDeliveryUDPAddr, which never performsLookupHostfor hostnames—it only returns an address if the cache is already populated and still valid; literal public IPs are still resolved synchronously with the existingValidateScope()checks.sendCompositeOffsetsusesdeliveryDNS.Lookupinstead ofdnsCache.Resolve, so slow or timing DNS no longer runs inside the RTT/measurement cycle. If a hostname is not ready yet, that target is skipped for that cycle with a warning log.Explain why the change is necessary
Due to: geolocation: move geoprobe-agent DNS lookups out of measurement loop #3544
Is there supporting documentation or external resources that explain the change? No
Is a CHANGELOG.md update needed? No
Testing Verification
go testoncontrolplane/telemetry/internal/geoprobeforDNSCache_*,DeliveryDNS_*, andLookupDelivery*tests: pass (covers startup refresh, coalesced refresh afterSetDesiredHostPorts, literal-IP lookup without waiting, domain miss before refresh, cache hit/miss/TTL, private IP rejection, invalidhost:port).GOOS=linux go build ./controlplane/telemetry/cmd/geoprobe-agent/: succeeds (agent is//go:build linux).go test ./controlplane/telemetry/cmd/geoprobe-agent/...may fail on macOS because main_test.go is not tagged linux while main.go is—run agent package tests on Linux / CI for a full green check.