URL (percent) encoding should not be applied to unreserved characters #936

nnposter opened this issue Jul 9, 2017 · 2 comments

@nnposter nnposter commented Jul 9, 2017

The function url.escape currently percent-encodes every character other than ASCII alphanumerics and the underscore. This goes beyond what the encoding specification requires and contradicts its recommendation not to encode the so-called unreserved characters, namely:

unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

Per RFC 3986, section 2.3:

For consistency, percent-encoded octets in the ranges of ALPHA (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E), underscore (%5F), or tilde (%7E) should not be created by URI producers and, when found in a URI, should be decoded to their corresponding unreserved characters by URI normalizers.
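The normalizer half of that recommendation can be sketched in a few lines of Lua. This is only an illustration and not part of url.lua: the function name normalize_unreserved and the sample input are hypothetical, and escapes of all other characters are deliberately left untouched.

```lua
-- Hypothetical helper (not in nselib/url.lua): decode percent-encoded
-- octets only when they represent RFC 3986 unreserved characters.
local function normalize_unreserved(s)
  return (string.gsub(s, "%%(%x%x)", function (hex)
    local c = string.char(tonumber(hex, 16))
    -- ALPHA / DIGIT / "-" / "." / "_" / "~"
    if c:find("^[A-Za-z0-9_.~-]$") then
      return c
    end
    -- Returning nil makes gsub keep the original escape unchanged.
  end))
end

print(normalize_unreserved("%7Euser%2Fa%2db"))  --> ~user%2Fa-b
```

Here %7E and %2d decode to "~" and "-", while %2F stays encoded because "/" is a reserved character.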

This guidance is also aligned with the specification for application/x-www-form-urlencoded at

This "over-encoding" is causing issues with some HTTP targets that do not process such payloads correctly. I therefore propose updating url.lua so that it no longer encodes these unreserved characters:

--- a/nselib/url.lua
+++ b/nselib/url.lua
@@ -66,7 +66,7 @@
 -- @param s Binary string to be encoded.
 -- @return Escaped representation of string.
 local function protect_segment(s)
-  return string.gsub(s, "([^A-Za-z0-9_])", function (c)
+  return string.gsub(s, "([^A-Za-z0-9_.~-])", function (c)
     if segment_set[c] then return c
     else return string.format("%%%02x", string.byte(c)) end
@@ -108,7 +108,7 @@
 -- @return Escaped representation of string.
 function escape(s)
-  return string.gsub(s, "([^A-Za-z0-9_])", function(c)
+  return string.gsub(s, "([^A-Za-z0-9_.~-])", function(c)
     return string.format("%%%02x", string.byte(c))

(The first of the two changes is done for consistency reasons; it is not technically necessary because all unreserved characters are included in segment_set. The second change is the critical one.)
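To make the effect of the second change concrete, here is a minimal standalone sketch that applies the old and the proposed character classes to the same input. The names escape_old and escape_new and the sample string are assumptions made for illustration; only the two patterns and the format string are taken from the patch.

```lua
-- Old behavior: everything except ASCII alphanumerics and "_" is encoded.
local function escape_old(s)
  return (string.gsub(s, "([^A-Za-z0-9_])", function (c)
    return string.format("%%%02x", string.byte(c))
  end))
end

-- Proposed behavior: ".", "~" and "-" are also left as-is.
-- The trailing "-" in the set is literal because it is the last character.
local function escape_new(s)
  return (string.gsub(s, "([^A-Za-z0-9_.~-])", function (c)
    return string.format("%%%02x", string.byte(c))
  end))
end

local sample = "file-name.v1~bak_2 x"
print(escape_old(sample))  --> file%2dname%2ev1%7ebak_2%20x
print(escape_new(sample))  --> file-name.v1~bak_2%20x
```

Both variants still encode the space, so only the handling of the unreserved characters differs.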

Please leave a note if you have any questions or concerns. Otherwise the patch will be committed in a few weeks.

@cldrn cldrn commented Jul 9, 2017

Good catch!


@dmiller-nmap dmiller-nmap commented Jul 10, 2017

Sounds good to me.


@nmap-bot nmap-bot closed this in 86cf5a1 Jul 22, 2017