Adds libraries for Punycode and IDNA conversions. #958

Closed
wants to merge 24 commits into
from

2 participants
Contributor

rewanth1997 commented Aug 3, 2017

No description provided.

Ok, this is almost done! Now we're done bickering over rulesets and we're defining the interface for other NSE programmers and streamlining the code. Take a look at these changes and let me know if you have time to keep working on it.

nselib/idna.lua
+-- Since this library is dependent on punycode and vice-versa, we need to
+-- import each of the libraries into another. This prevents idna from entering
+-- into a recursive loop.
+package.loaded["idna"] = _ENV
@dmiller-nmap

dmiller-nmap Sep 21, 2017

We can eliminate this confusion if we divide the functions cleanly between the two libraries. In punycode.lua we should only handle Punycode encoding and decoding of labels. These functions should be in idna.lua instead of punycode.lua:

  • Splitting names into labels based on RFC 3490 separators.
  • Unicode de/encoding. In fact, we could even make this the responsibility of the caller, and only accept tables of code points for Unicode inputs (i.e. never a string of encoded bytes)
  • Joining labels with "."

This has the potential to really simplify a lot of the code here. Most likely, scripts and other libraries will only require idna.lua, since that's the higher-level interface.
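To make the proposed division concrete, here is a minimal sketch of the splitting and joining that would live in idna.lua. The helper names `split_labels` and `join_labels` are hypothetical and not taken from the PR; the separator list is the one from RFC 3490 (U+002E, U+3002, U+FF0E, U+FF61). It assumes input arrives as a table of code points, as suggested above.

```lua
-- Hypothetical helpers; names are not from the PR.
-- RFC 3490 label separators.
local SEPARATORS = {
  [0x002E] = true, -- full stop
  [0x3002] = true, -- ideographic full stop
  [0xFF0E] = true, -- fullwidth full stop
  [0xFF61] = true, -- halfwidth ideographic full stop
}

-- Split a table of code points into a list of labels,
-- each label being its own table of code points.
local function split_labels(codepoints)
  local labels = { {} }
  for _, cp in ipairs(codepoints) do
    if SEPARATORS[cp] then
      labels[#labels + 1] = {}
    else
      local label = labels[#labels]
      label[#label + 1] = cp
    end
  end
  return labels
end

-- Join labels that are already ASCII strings with ".".
local function join_labels(labels)
  return table.concat(labels, ".")
end
```

With this split, punycode.lua would only ever see one label at a time.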

@rewanth1997

rewanth1997 Sep 24, 2017

Contributor

Done.

nselib/idna.lua
+-- Since this library is dependent on punycode and vice-versa, we need to
+-- import each of the libraries into another. This prevents idna from entering
+-- into a recursive loop.
+package.loaded["idna"] = _ENV
@dmiller-nmap

dmiller-nmap Sep 21, 2017

We can avoid this confusion by limiting punycode.lua to simply encoding and decoding labels instead of whole names. In other words, only idna.lua is responsible for:

  • Separating and joining names and labels on RFC 3490 separators.
  • Unicode-to-bytes encoding/decoding. In fact, we could require the caller to handle this instead, though that might lead to code duplication.
@rewanth1997

rewanth1997 Sep 24, 2017

Contributor

Done.

nselib/idna.lua
+ for index, cp in ipairs(decoded_tbl) do
+ local lookup = idnaMappings[cp]
+ if type(lookup) == "table" then
+ if lookup.status == "deviation" then
@dmiller-nmap

dmiller-nmap Sep 21, 2017

There is no reason to keep looping over and over the code points. One loop is sufficient. Inside the loop:

  1. Do the lookup in idnaMappings.
  2. If the result is a number, replace the code point with it.
  3. If it's a table, check the status. You can use `if transitionalProcessing and lookup.status == "deviation" then` to handle that case.
  4. Handle the code point as needed.

In other words, handle each code point exactly once. The output of idnaMappings is always final; if it's mapped, then it is mapped to one or more valid code points which do not need to be checked again.
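A single-pass version of the loop might look like the sketch below. The tiny `idnaMappings` table here is a toy stand-in, and its layout (a number for a one-to-one mapping, a table with a `status` field and optional replacement code points otherwise) is an assumption; the real idnaMappings.lua may be shaped differently. The function name `map_codepoints` is also hypothetical.

```lua
-- Toy stand-in for idnaMappings; the layout is assumed, not taken from the PR.
local idnaMappings = {
  [0x0041] = 0x0061,                                   -- 'A' maps to 'a'
  [0x00DF] = { status = "deviation", 0x0073, 0x0073 }, -- sharp s, "ss" when transitional
  [0x00AD] = { status = "ignored" },                   -- soft hyphen is dropped
  [0x0080] = { status = "disallowed" },
}

-- Visit every code point exactly once; mapped output is never re-checked.
local function map_codepoints(codepoints, transitionalProcessing)
  local out = {}
  for _, cp in ipairs(codepoints) do
    local lookup = idnaMappings[cp]
    if type(lookup) == "number" then
      out[#out + 1] = lookup
    elseif type(lookup) == "table" then
      if lookup.status == "disallowed" then
        return nil, ("Disallowed code point U+%04X"):format(cp)
      elseif lookup.status == "deviation" and not transitionalProcessing then
        out[#out + 1] = cp -- deviation characters pass through unchanged
      elseif lookup.status ~= "ignored" then
        for _, mapped in ipairs(lookup) do
          out[#out + 1] = mapped
        end
      end
    else
      out[#out + 1] = cp -- no entry: treated as valid in this sketch
    end
  end
  return out
end
```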

@rewanth1997

rewanth1997 Sep 24, 2017

Contributor

Done.

nselib/idna.lua
+ decoded_tbl = concat_table_in_tables(decoded_tbl)
+
+ -- Regular expressions (RFC 3490 separators)
+ for index, cp in ipairs(decoded_tbl) do
@dmiller-nmap

dmiller-nmap Sep 21, 2017

This mapping is actually done in idnaMappings, so we don't need to do the additional checks here.

@rewanth1997

rewanth1997 Sep 24, 2017

Contributor

Done.

nselib/idna.lua
+ decoded_tbl = concat_table_in_tables(decoded_tbl)
+
+ -- Regular expressions (RFC 3490 separators)
+ for index, cp in ipairs(decoded_tbl) do
@dmiller-nmap

dmiller-nmap Sep 21, 2017

These are already handled by idnaMappings, so no need to check separately for them.

@rewanth1997

rewanth1997 Sep 24, 2017

Contributor

Removed this piece of code.

nselib/idna.lua
+-- @param checkBidi Boolean flag to represent if the input is of Bidi type.
+-- @param checkJoiners Boolean flag to check for ContextJ rules in input.
+-- @param useSTD3ASCIIRules Boolean value to represent ASCII rules.
+-- @param transitionalProcessing Boolean value.
@dmiller-nmap

dmiller-nmap Sep 21, 2017

Add default values to the NSEdoc and note any flags that are not supported. We can't rely on consumers to read the code; at best they will look at the documentation, so it must be correct.
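As an illustration of what documented defaults could look like, here is a sketch. The parameter names come from the diff above, but the default values and the "not supported" notes are placeholders chosen for the example, and `resolve_flags` is a hypothetical helper, not part of the PR.

```lua
--- Illustrative only; the defaults below are placeholders, not the library's.
-- @param transitionalProcessing Boolean. Default: true.
-- @param checkBidi Boolean. Default: false. Not supported in this sketch; ignored.
-- @param checkJoiners Boolean. Default: false. Not supported in this sketch; ignored.
-- @param useSTD3ASCIIRules Boolean. Default: true.
-- @return Table holding the effective value of each flag.
local function resolve_flags(transitionalProcessing, checkBidi, checkJoiners, useSTD3ASCIIRules)
  -- Compare against nil so an explicit `false` is not overwritten
  -- (`flag = flag or true` would turn false into true).
  if transitionalProcessing == nil then transitionalProcessing = true end
  if useSTD3ASCIIRules == nil then useSTD3ASCIIRules = true end
  return {
    transitionalProcessing = transitionalProcessing,
    checkBidi = false,    -- unsupported here: always off, whatever was passed
    checkJoiners = false, -- unsupported here: always off, whatever was passed
    useSTD3ASCIIRules = useSTD3ASCIIRules,
  }
end
```

Keeping the default in the doc comment and in the code side by side makes it easier to spot when the two drift apart.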

@rewanth1997

rewanth1997 Sep 24, 2017

Contributor

Done.

nselib/idna.lua
+ local inputString = unicode.encode(codepoints, encoder)
+
+ -- Checks for invalid domain codepoints and proceeds further.
+ if not match(inputString, "[^a-z0-9_@=+/%-%(%)%%]") then
@dmiller-nmap

dmiller-nmap Sep 21, 2017

What's the reference for these valid code points/characters? I'm surprised to see _@=+()% in there.

@rewanth1997

rewanth1997 Sep 24, 2017

Contributor

My bad, I deleted this piece of code.

+ -- incrementing `n` each time, so we'll fix that now:
+ if (floor(i / out) > maxInt - n) then
+ --error('overflow');
+ return nil, "Overflow exception occurred."
@dmiller-nmap

dmiller-nmap Sep 21, 2017

Are we checking for this and other punycode errors when we call these functions from idna.lua? We should be doing so and passing the error up to the caller.

@rewanth1997

rewanth1997 Sep 24, 2017

Contributor

punycode.mapLabels is the only punycode function used in idna.lua, and its result is returned directly to the caller, so if there is an error the caller receives nil along with the error message.

@rewanth1997

rewanth1997 Sep 24, 2017

Contributor

So, effectively, it becomes the caller's job to handle any errors that are returned.
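The usual Lua convention for this is to return `nil, err` and check it at each level. A minimal sketch follows, with a stub `encode_label` standing in for the real punycode routine; both function names and the error strings are assumptions made for the example.

```lua
-- Stub standing in for a punycode routine that can fail.
local function encode_label(codepoints)
  local chars = {}
  for i, cp in ipairs(codepoints) do
    if cp > 0x7F then
      return nil, "non-ASCII input is not handled by this stub"
    end
    chars[i] = string.char(cp)
  end
  return table.concat(chars)
end

-- Higher-level function: checks every result and passes errors up unchanged,
-- adding only the position of the failing label.
local function to_ascii(labels)
  local encoded = {}
  for i, label in ipairs(labels) do
    local result, err = encode_label(label)
    if not result then
      return nil, ("label %d: %s"):format(i, err)
    end
    encoded[i] = result
  end
  return table.concat(encoded, ".")
end
```

A caller would then write `local name, err = to_ascii(labels)` and test `name` before using it.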

+function encode_label(s, decoder)
+
+ local flag = false
+ local decoded_tbl = unicode.decode(s, decoder)
@dmiller-nmap

dmiller-nmap Sep 21, 2017

Remember, for simplicity we can require callers of punycode functions to already do the decoding and pass in a table of code points. After all, this is what idna.lua has after doing the mapping/validation, so it doesn't make sense to keep encoding and decoding between calls.

@rewanth1997

rewanth1997 Sep 24, 2017

Contributor

I thought decoding here would help NSE developers who import only this function.
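For comparison, a label encoder that takes a table of code points directly, as the review suggests, could be sketched like this. It follows the encoding procedure in RFC 3492 but is not the PR's code: the names are hypothetical, there is no overflow check, and it returns the raw Punycode without any "xn--" prefix.

```lua
-- Punycode parameters from RFC 3492.
local base, tMin, tMax, skew, damp = 36, 1, 26, 38, 700
local initialBias, initialN = 72, 128
local floor = math.floor

-- 0..25 -> 'a'..'z', 26..35 -> '0'..'9'
local function digit_to_char(d)
  if d < 26 then return string.char(97 + d) end
  return string.char(22 + d)
end

-- Bias adaptation function from RFC 3492.
local function adapt(delta, numPoints, firstTime)
  delta = firstTime and floor(delta / damp) or floor(delta / 2)
  delta = delta + floor(delta / numPoints)
  local k = 0
  while delta > floor(((base - tMin) * tMax) / 2) do
    delta = floor(delta / (base - tMin))
    k = k + base
  end
  return k + floor((base - tMin + 1) * delta / (delta + skew))
end

-- Encode one label given as a table of code points (no overflow check).
local function encode_label(codepoints)
  local output = {}
  for _, cp in ipairs(codepoints) do
    if cp < 0x80 then output[#output + 1] = string.char(cp) end
  end
  local basicLength = #output
  local handled = basicLength
  if basicLength > 0 then output[#output + 1] = "-" end

  local n, delta, bias = initialN, 0, initialBias
  while handled < #codepoints do
    -- Find the smallest code point not yet handled.
    local m = math.huge
    for _, cp in ipairs(codepoints) do
      if cp >= n and cp < m then m = cp end
    end
    delta = delta + (m - n) * (handled + 1)
    n = m
    for _, cp in ipairs(codepoints) do
      if cp < n then delta = delta + 1 end
      if cp == n then
        -- Emit delta as a generalized variable-length integer.
        local q, k = delta, base
        while true do
          local t = k <= bias and tMin or (k >= bias + tMax and tMax or k - bias)
          if q < t then break end
          output[#output + 1] = digit_to_char(t + (q - t) % (base - t))
          q = floor((q - t) / (base - t))
          k = k + base
        end
        output[#output + 1] = digit_to_char(q)
        bias = adapt(delta, handled + 1, handled == basicLength)
        delta = 0
        handled = handled + 1
      end
    end
    delta = delta + 1
    n = n + 1
  end
  return table.concat(output)
end
```

A developer who only has a string can still decode it once up front and pass the resulting table in, so the convenience is kept without repeated encoding and decoding between calls.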

+end
+
+-- Table of punycode test cases.
+local testCases = {
@dmiller-nmap

dmiller-nmap Sep 21, 2017

When you convert the libraries to move the domain splitting/joining and Unicode stuff to idna.lua, you'll have to change these test cases to each be a single label.

@rewanth1997

rewanth1997 Sep 24, 2017

Contributor

Done.

Contributor

rewanth1997 commented Sep 25, 2017

Almost all the requested changes have been made; please review the modified code.

Just add missing requires for math and table to both, address these 2 typos, and commit! Thanks so much!

nselib/idna.lua
+ -- To use this part of code, add disallowed_STD3_mapped and disallowed_STD3_valid
+ -- codepoints to idnaMappings.lua. For now, we ignore these because idnaMappings.lua
+ -- is set to support only for the latest version of IDNA.
+ if UseSTD3ASCIIRules then
@dmiller-nmap

dmiller-nmap Sep 27, 2017

This is miscapitalized and therefore not referring to useSTD3ASCIIRules from the parameters list.
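The reason this matters: Lua identifiers are case-sensitive, and in plain Lua (with no strict-mode checker installed) reading an undefined global simply yields nil, so the branch is silently skipped. A small sketch of the effect, using a hypothetical function name:

```lua
local function std3_enabled(useSTD3ASCIIRules)
  -- Typo on purpose: this reads a global named `UseSTD3ASCIIRules`,
  -- not the parameter, and an undefined global evaluates to nil.
  if UseSTD3ASCIIRules then
    return true
  end
  return false
end

local function std3_enabled_fixed(useSTD3ASCIIRules)
  if useSTD3ASCIIRules then
    return true
  end
  return false
end
```

Here `std3_enabled(true)` returns false even though the flag was passed, while `std3_enabled_fixed(true)` returns true.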

nselib/punycode.lua
+ local delimiterCodePoint = 0x002E
+ local delimiter = unicode.encode({0x002E}, encoder)
+
+ codepoints = unicode.decode(input, decoder)
@dmiller-nmap

dmiller-nmap Sep 27, 2017

Need to make this local
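Without `local`, the assignment creates or overwrites a global, which outlives the call and can collide with other code. A minimal sketch in plain Lua, with hypothetical function names and `string.byte` standing in for `unicode.decode`:

```lua
local function decode_leaky(input)
  codepoints = { input:byte(1, -1) } -- no `local`: assigns a global
  return #codepoints
end

local function decode_scoped(input)
  local codepoints = { input:byte(1, -1) } -- visible only inside this function
  return #codepoints
end
```

After calling `decode_scoped`, no global named `codepoints` exists; after calling `decode_leaky`, one does.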

@nmap-bot nmap-bot closed this in fdc9b19 Sep 27, 2017
