fix(validators): reject HTML metacharacters and whitespace in websiteUrl #1249

Merged
rdimitrov merged 1 commit into main from fix/validator-canonicalize-website-url
May 4, 2026
@rdimitrov
Member

`validateWebsiteURL` only checked that the URL parses, is absolute, and
uses the `https` scheme. It accepted literal `"`, `'`, `<`, `>`, and
ASCII space. Of these, `"`, `<`, `>`, and space are not valid in a URI per
RFC 3986 (`'` is a permitted sub-delimiter), and all of them broke rendering
when the value flowed into the catalogue UI's `<a href="...">` template.

Add a `strings.IndexAny` check after the scheme check that rejects any
of these characters with a clear `website-url-invalid-characters` issue
that points at the offending byte position. Control characters (`\t`,
`\n`, `\r`) are also covered: they're caught one step earlier by Go's
`url.Parse`, but the bytes are listed in the rejection set so the
behaviour is explicit and survives any future relaxation of `url.Parse`.
Already-percent-encoded URLs (e.g. `?q=hello%20world`) continue to
validate cleanly.
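
For readers without the diff at hand, the check described above can be sketched roughly as follows. This is a minimal illustration, not the merged code: `validateWebsiteURLSketch`, `findInvalidChar`, the `invalidURLChars` constant, and every issue name other than `website-url-invalid-characters` are hypothetical stand-ins, and the real validator's signature and issue type are not shown in this PR description.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// invalidURLChars is an assumed rejection set reconstructed from the PR
// description: HTML metacharacters, ASCII space, and \t, \n, \r.
const invalidURLChars = "\"'<> \t\n\r"

// findInvalidChar returns the byte offset of the first rejected character
// in raw, or -1 if raw contains none of them.
func findInvalidChar(raw string) int {
	return strings.IndexAny(raw, invalidURLChars)
}

// validateWebsiteURLSketch is a hypothetical stand-in for validateWebsiteURL.
// It keeps the order described above: parse, absolute, https, then characters.
func validateWebsiteURLSketch(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("website-url-unparseable: %w", err)
	}
	if !u.IsAbs() {
		return errors.New("website-url-not-absolute")
	}
	if u.Scheme != "https" {
		return errors.New("website-url-not-https")
	}
	if i := findInvalidChar(raw); i >= 0 {
		// Report the offending byte and its position in the raw input.
		return fmt.Errorf("website-url-invalid-characters: %q at byte %d", raw[i], i)
	}
	return nil
}

func main() {
	for _, raw := range []string{
		"https://example.com/?q=hello%20world",
		"https://example.com/\"><script>",
	} {
		fmt.Printf("%q -> %v\n", raw, validateWebsiteURLSketch(raw))
	}
}
```

Running the character check on the raw string, rather than on the parsed `url.URL` fields, is what lets the reported position line up with what the submitter typed.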

Add table-driven cases for `"`, `'`, `<`/`>`, space, and newline, plus a
positive case for percent-encoded special characters.
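
The shape of those cases might look something like the table below. This is a sketch under assumptions: the `urlCase` type, the case names, the example URLs, and the `hasRejectedChar` helper are all hypothetical, and the loop in `main` stands in for what would be `t.Run` subtests in a real `_test.go` file.

```go
package main

import (
	"fmt"
	"strings"
)

// urlCase is a hypothetical table row: an input URL and whether the
// character check is expected to accept it.
type urlCase struct {
	name      string
	url       string
	wantValid bool
}

// hasRejectedChar is a stand-in for the character check only; it does not
// cover the parse, absolute, or https checks of the real validator.
func hasRejectedChar(raw string) bool {
	return strings.IndexAny(raw, "\"'<> \t\n\r") >= 0
}

// websiteURLCases mirrors the cases listed in the description.
func websiteURLCases() []urlCase {
	return []urlCase{
		{"double quote", "https://example.com/\"x", false},
		{"single quote", "https://example.com/'x", false},
		{"angle brackets", "https://example.com/<script>", false},
		{"space", "https://example.com/a b", false},
		{"newline", "https://example.com/a\nb", false},
		{"percent-encoded", "https://example.com/?q=hello%20world%22%3C%3E", true},
	}
}

func main() {
	for _, tc := range websiteURLCases() {
		status := "ok"
		if got := !hasRejectedChar(tc.url); got != tc.wantValid {
			status = "FAIL"
		}
		fmt.Printf("%-16s %s\n", tc.name, status)
	}
}
```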

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rdimitrov rdimitrov merged commit 78b7bbd into main May 4, 2026
5 checks passed
@rdimitrov rdimitrov deleted the fix/validator-canonicalize-website-url branch May 4, 2026 13:26