Skip to content

Verification Fixes#5822

Merged
RobertJoonas merged 11 commits intomasterfrom
verification-fixes
Oct 27, 2025
Merged

Verification Fixes#5822
RobertJoonas merged 11 commits intomasterfrom
verification-fixes

Conversation

@RobertJoonas
Copy link
Copy Markdown
Contributor

@RobertJoonas RobertJoonas commented Oct 23, 2025

Changes

NOTE: Reviewing commit by commit recommended

The only customer facing functional change

When the Browserless request responds with an unexpected status code (400 | 429), or the request times out, we'll show this to the customer:

image

Observability changes

  1. Add the check module name into the service_error when our own Elixir check timeout kicks in. This makes it clearer in the logged diagnostics, whether it was cache bust or the initial installation_v2 check that timed out.

  2. Set the Req receive_timeout option to the current endpoint_timeout which gets passed to the /function API request. With this we should stop getting 408s from Browserless. See this commit (description) for more info: 557f18f

  3. In case we arrive to the case displayed in the above screenshot, a warning log + telemetry :unhandled event are emitted (current behaviour). With this PR, we'll start notifying Sentry with a new error message that will be separate from the current "Unhandled case for site verification (v2)". These new sentry errors, called "Browserless failure in verification (v2)" will be turned into escalating issues in Sentry, because a certain volume of those events is expected.

Refactorings

  • Make verification/checks_test.exs more DRY, also organise them into describe blocks
  • Turn service error into a map with code and extra keys, where code should always be an atom.

Tests

  • Automated tests have been added

Changelog

  • This PR does not make a user-facing change

Documentation

  • This change does not need a documentation update

Dark mode

  • This PR does not change the UI

Otherwise, it can sometimes remain unclear in the diagnostics, whether
it was InstallationV2 or InstallationV2CacheBust that timed out.
The current production logs show two types of verification timeouts:

* service_error: "Unhandled Browserless response status: 408" (vast
  majority of cases)
* service_error: :timeout (only a few cases)

The latter happens when we hit the Req receive_timeout
(endpoint_timeout + 2s). I've seen Browserless not respect the timeout
param from time to time, so it's better to keep the timeout logic
"in-house" only.
...but still consider them "unhandled" for telemetry, also notifying Sentry
and logging the warning.
@RobertJoonas RobertJoonas requested a review from apata October 23, 2025 09:36
Copy link
Copy Markdown
Contributor

@apata apata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good with the new functionality. The app-side refactors are good, though I think there's two potential spots to discuss (whether we still want a Browserless-side timeout and whether we need two different Sentry messages).

Regarding refactored tests, I left a few non-blocking nitpicks which are probably a matter of personal preference. Ideally, though, we'd come to some sort of agreement what's good enough so we don't end up doing too many stylistic test refactors.

@RobertJoonas RobertJoonas added this pull request to the merge queue Oct 27, 2025
Merged via the queue into master with commit ad2c8e8 Oct 27, 2025
16 checks passed
@RobertJoonas RobertJoonas deleted the verification-fixes branch October 27, 2025 09:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants