Function `url.escape` currently encodes all characters other than alphanumeric ASCII and underscore. This goes beyond what the encoding specification calls for, and it runs against the specification's recommendation not to encode the so-called unreserved characters. Per RFC 3986, section 2.3:
> For consistency, percent-encoded octets in the ranges of ALPHA (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E), underscore (%5F), or tilde (%7E) should not be created by URI producers and, when found in a URI, should be decoded to their corresponding unreserved characters by URI normalizers.

This guidance is also aligned with the specification for `application/x-www-form-urlencoded` at https://url.spec.whatwg.org/#urlencoded-serializing.
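The normalizer half of the quoted rule can be sketched in Lua as shown below. `normalize_unreserved` is a hypothetical helper written for illustration, not part of `nselib/url.lua`. It assumes the input is an already percent-encoded string, decodes only escapes that map to unreserved characters, and leaves every other escape (for example `%2F`) untouched.

```lua
-- Hypothetical helper (not part of nselib/url.lua): decode only those
-- percent-encoded octets that map to RFC 3986 unreserved characters
-- (ALPHA, DIGIT, "-", ".", "_", "~"); every other escape is kept as-is.
local function normalize_unreserved(s)
  -- The parentheses drop gsub's second return value (the match count).
  return (string.gsub(s, "%%(%x%x)", function (hex)
    local c = string.char(tonumber(hex, 16))
    if c:match("^[A-Za-z0-9_.~-]$") then
      return c -- unreserved character: replace the escape with the literal
    end
    return nil -- returning nil tells gsub to keep the original "%XX" text
  end))
end

print(normalize_unreserved("a%2Db%2ec%7Ed%2F")) --> a-b.c~d%2F
```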
This "over-encoding" is causing issues with some HTTP targets that do not process such payloads correctly. I therefore propose updating `url.lua` so that it no longer encodes these unreserved characters:
```diff
--- a/nselib/url.lua
+++ b/nselib/url.lua
@@ -66,7 +66,7 @@
 -- @param s Binary string to be encoded.
 -- @return Escaped representation of string.
 local function protect_segment(s)
-  return string.gsub(s, "([^A-Za-z0-9_])", function (c)
+  return string.gsub(s, "([^A-Za-z0-9_.~-])", function (c)
     if segment_set[c] then return c
     else return string.format("%%%02x", string.byte(c)) end
   end)
@@ -108,7 +108,7 @@
 -- @return Escaped representation of string.
 -----------------------------------------------------------------------------
 function escape(s)
-  return string.gsub(s, "([^A-Za-z0-9_])", function(c)
+  return string.gsub(s, "([^A-Za-z0-9_.~-])", function(c)
     return string.format("%%%02x", string.byte(c))
   end)
 end
```
(The first of the two changes is made for consistency; it is not technically necessary because all unreserved characters are already included in `segment_set`. The second change is the critical one.)
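To illustrate the effect of the critical change, the sketch below runs the current and the proposed `escape` patterns side by side. Both functions are local copies written for illustration; the real function lives in `nselib/url.lua` and returns both of `string.gsub`'s results, while these copies keep only the escaped string. The sample input is arbitrary.

```lua
-- Local copy of the current pattern: only ALPHA, DIGIT and "_" pass through.
local function escape_current(s)
  return (string.gsub(s, "([^A-Za-z0-9_])", function (c)
    return string.format("%%%02x", string.byte(c))
  end))
end

-- Local copy of the proposed pattern: all RFC 3986 unreserved characters
-- ("-", ".", "_", "~" plus ALPHA and DIGIT) pass through unencoded.
-- The trailing "-" in the set is treated literally by Lua patterns.
local function escape_proposed(s)
  return (string.gsub(s, "([^A-Za-z0-9_.~-])", function (c)
    return string.format("%%%02x", string.byte(c))
  end))
end

local input = "report-v1.2_final~draft.pdf?x=1"
print(escape_current(input))  --> report%2dv1%2e2_final%7edraft%2epdf%3fx%3d1
print(escape_proposed(input)) --> report-v1.2_final~draft.pdf%3fx%3d1
```

Reserved characters such as `?` and `=` are still encoded under the proposed pattern; only the unreserved set stops being escaped.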
Please leave a note if you have any questions or concerns. Otherwise the patch will be committed in a few weeks.