
Bug 5501: Squid may exit when ACLs decode an invalid URI (#2145)#2369

Open
squidadm wants to merge 1 commit into squid-cache:v7 from squidadm:v7-backport-pr2145

Conversation

@squidadm (Collaborator) commented Feb 6, 2026

2025/08/14 09:28:51| FATAL: invalid pct-encoded triplet
    exception location: Uri.cc(102) Decode

The bug affects url_regex and urllogin ACLs. However, not every use of
those ACLs results in a FATAL exit. The exact preconditions are unknown.
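For reference, the strict decoding behavior that produces this FATAL can be sketched as follows (a minimal Python illustration of the behavior described above, not Squid's actual AnyP::Uri::Decode() code): any malformed %XX triplet raises, and if the exception escapes the ACL code, the process exits.

```python
HEX = set("0123456789abcdefABCDEF")

def strict_pct_decode(s: str) -> str:
    # Decode %XX triplets; raise on any malformed triplet. This mirrors
    # the strict behavior described in the bug report, not Squid's code.
    out, i = [], 0
    while i < len(s):
        if s[i] == "%":
            pair = s[i + 1:i + 3]
            if len(pair) != 2 or not set(pair) <= HEX:
                raise ValueError("invalid pct-encoded triplet")
            out.append(chr(int(pair, 16)))
            i += 3
        else:
            out.append(s[i])
            i += 1
    return "".join(out)
```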

Three pct-encoding (RFC 3986) error handling algorithms were considered:

Algorithm A: An ACL that cannot decode, mismatches

This algorithm is similar to the algorithm used for handling "ACL is
used in context without ALE" and similar errors, but there is a
significant context difference: Those "without ALE" errors are Squid
misconfigurations or bugs! Decoding failures, on the other hand, are
caused by request properties outside of admin or Squid control.

With this algorithm, a request can easily avoid a "deny urlHasX" rule
match by injecting an invalid pct-encoding (e.g., `X%bad`). Such
injections may not be practical for URLs of most resources outside of
client control because most servers are unlikely to recognize the
malformed URL as something useful for the client. As for resources that
the client does control, a urlHasX ACL cannot be effective for those
anyway because the client can change URLs.

Algorithm A does not let Squid admins match problematic URLs!

Algorithm B: An ACL that cannot decode X, tests raw/encoded X

With this algorithm, a request can trigger some "allow urlHasY" rule
matches by injecting an invalid pct-encoding that looks like Y (e.g., if
an "allow" rule looks for the word `good`, a request may contain a
`%good` or `%XXgood` sequence). Just like with algorithm A, such
injections probably have little practical value, for similar reasons.

Algorithm B lets Squid admins match problematic URLs.
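Algorithm B can be sketched in a few lines (Python, with invented helper names; Squid's real ACL code differs): attempt a strict decode, and on failure hand the ACL the raw, still-encoded URL, so the problematic bytes remain matchable.

```python
HEX = set("0123456789abcdefABCDEF")

def strict_pct_decode(s: str) -> str:
    # Strict decoder: any malformed %XX triplet raises, like the
    # decoding failure described in the bug report.
    out, i = [], 0
    while i < len(s):
        if s[i] == "%":
            pair = s[i + 1:i + 3]
            if len(pair) != 2 or not set(pair) <= HEX:
                raise ValueError("invalid pct-encoded triplet")
            out.append(chr(int(pair, 16)))
            i += 3
        else:
            out.append(s[i])
            i += 1
    return "".join(out)

def url_text_for_acl(url: str) -> str:
    # Algorithm B: if the URL cannot be decoded, test the raw
    # (still-encoded) form instead of mismatching or exiting.
    try:
        return strict_pct_decode(url)
    except ValueError:
        return url
```

With this sketch, a url_regex-style rule still sees the invalid sequence (e.g., `%zz`) verbatim and can be written to match it.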

Algorithm C: An ACL that cannot decode X, tests partially decoded X

With this algorithm, a "partially decoded X" is X where invalid
pct-encoding sequences (or their parts) are left "as is" while valid
pct-encoding triplets are decoded. This is actually a family of similar
algorithms because there are multiple ways to define invalid
pct-encoding sequence boundaries in certain URLs! For example,
`%6Fne%f%6Fo` can be replaced with `one%foo` or `one%f%6Fo`. This
additional complexity/uncertainty aggravates the two concerns below.

Algorithm B's notes apply to Algorithm C as well.

Algorithm C lets admins match problematic URLs but, again, it requires
that admins know exactly how Squid is going to isolate problematic
pct-encoding triplets (e.g., whether it skips/leaves just the `%` byte
that starts an invalid pct-encoding sequence or the following two bytes
as well).
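The boundary ambiguity can be made concrete with a toy partial decoder (Python; the `skip_whole_triplet` switch is invented for illustration) whose two settings reproduce the two decodings of the example above:

```python
HEX = set("0123456789abcdefABCDEF")

def partial_decode(s: str, skip_whole_triplet: bool) -> str:
    # Algorithm C sketch: decode valid %XX triplets, leave invalid
    # sequences as-is. The boundary of an invalid sequence is a choice:
    #   skip_whole_triplet=False: keep only the '%' byte and rescan
    #     from the next character
    #   skip_whole_triplet=True:  keep the '%' and the two bytes after it
    out, i = [], 0
    while i < len(s):
        if s[i] == "%":
            pair = s[i + 1:i + 3]
            if len(pair) == 2 and set(pair) <= HEX:
                out.append(chr(int(pair, 16)))
                i += 3
                continue
            if skip_whole_triplet:
                out.append(s[i:i + 3])
                i += 3
            else:
                out.append("%")
                i += 1
        else:
            out.append(s[i])
            i += 1
    return "".join(out)

# Both outputs for the example from the text are defensible:
assert partial_decode("%6Fne%f%6Fo", skip_whole_triplet=False) == "one%foo"
assert partial_decode("%6Fne%f%6Fo", skip_whole_triplet=True) == "one%f%6Fo"
```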

Algorithm C family includes rfc1738_unescape() behavior. That decoding
function was used for the two ACLs before commit cbb9bf1 and commit
226394f started to use AnyP::Uri::Decode() added in commit 26256f2.
For example, rfc1738_unescape() decodes `%%` as `%` and leaves some
other invalid pct-encoding one-, two-, and three-byte sequences in the
decoded result. It is unlikely that many admins know exactly what that
old decoding does, but they could tune their rules to "work" as they
expect for specific cases. Those rules could stop working after the
above commits (v7.0.1+) and this change, to their surprise.
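A rough sketch of that family-C behavior, based only on the properties stated above (`%%` decodes to `%`; other invalid sequences are left as-is; the real rfc1738_unescape() boundary handling may differ):

```python
HEX = set("0123456789abcdefABCDEF")

def lenient_unescape(s: str) -> str:
    # Sketch of rfc1738_unescape()-style decoding as described above:
    # valid %XX triplets decode, "%%" collapses to "%", and other
    # invalid sequences pass through to the output unchanged.
    out, i = [], 0
    while i < len(s):
        if s[i] == "%":
            if s[i + 1:i + 2] == "%":
                out.append("%")
                i += 2
                continue
            pair = s[i + 1:i + 3]
            if len(pair) == 2 and set(pair) <= HEX:
                out.append(chr(int(pair, 16)))
                i += 3
                continue
        out.append(s[i])
        i += 1
    return "".join(out)
```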

This change implements Algorithm B:

  • Unlike Algorithm A, B allows admins to match bad URLs.
  • Unlike Algorithm C, B does not force admins to guess how Squid
    mangles a bad URL before matching it.

Also updated ACLs documentation to reflect current implementation.

@rousskov (Contributor) commented Feb 6, 2026

Section titles in the original commit message were lost while backporting. Please restore them.

@yadij (Contributor) commented Feb 6, 2026

> Section titles in the original commit message were lost while backporting.

No, the Markdown is dropped by the original Anubis patch (https://github.com/squidadm/squid/commit/0f39e8caed5e867b9771a9c21e112764f4dbe92f.patch).

> Please restore them.

I have now manually copied the PR description from the original PR to this one.

Please be aware, though, that this will most likely cause the titles to be erased on merge, since lines whose first non-whitespace character is "#" are treated as comments and ignored by git.

