Adds libraries for Punycode and IDNA conversions.#958
Adds libraries for Punycode and IDNA conversions.#958rewanthtammana wants to merge 24 commits intonmap:masterfrom
Conversation
dmiller-nmap
left a comment
There was a problem hiding this comment.
Ok, this is almost done! Now we're done bickering over rulesets and we're defining the interface for other NSE programmers and streamlining the code. Take a look at these changes and let me know if you have time to keep working on it.
nselib/idna.lua
Outdated
| -- Since this library is dependent on punycode and vice-versa, we need to | ||
| -- import each of the libraries into another. This prevents idna from entering | ||
| -- into a recursive loop. | ||
| package.loaded["idna"] = _ENV |
There was a problem hiding this comment.
We can eliminate this confusion if we divide the functions cleanly between the two libraries. In punycode.lua we should only handle Punycode encoding and decoding of labels. These functions should be in idna.lua instead of punycode.lua:
- Splitting names into labels based on RFC 3490 separators.
- Unicode de/encoding. In fact, we could even make this the responsibility of the caller, and only accept tables of code points for Unicode inputs (i.e. never a string of encoded bytes)
- Joining labels with "."
This has the potential to really simplify a lot of the code here. Most likely, scripts and other libraries will only require idna.lua, since that's the higher-level interface.
nselib/idna.lua
Outdated
| decoded_tbl = concat_table_in_tables(decoded_tbl) | ||
|
|
||
| -- Regular expressions (RFC 3490 separators) | ||
| for index, cp in ipairs(decoded_tbl) do |
There was a problem hiding this comment.
This mapping is actually done in idnaMappings, so we don't need to do the additional checks here.
nselib/idna.lua
Outdated
| for index, cp in ipairs(decoded_tbl) do | ||
| local lookup = idnaMappings[cp] | ||
| if type(lookup) == "table" then | ||
| if lookup.status == "deviation" then |
There was a problem hiding this comment.
There is no reason to keep looping over and over the code points. One loop is sufficient. Inside the loop:
- do the lookup in
idnaMappings - Replace the code point if it's a number
- if it's a table, check the status. You can do
if transitionalProcessing and lookup.status == "deviation" thento handle that case. - handle the code point as needed.
In other words, handle each code point exactly once. The output of idnaMappings is always final; if it's mapped, then it is mapped to one or more valid code points which do not need to be checked again.
nselib/idna.lua
Outdated
| -- Since this library is dependent on punycode and vice-versa, we need to | ||
| -- import each of the libraries into another. This prevents idna from entering | ||
| -- into a recursive loop. | ||
| package.loaded["idna"] = _ENV |
There was a problem hiding this comment.
We can avoid this confusion by limiting punycode.lua to simply encoding and decoding labels instead of whole names. In other words, only idna.lua is responsible for:
- Separating and joining names and labels on RFC 3490 separators.
- Unicode-to-bytes encoding/decoding. In fact, we could require the caller to handle this instead, though that might lead to code duplication.
nselib/idna.lua
Outdated
| decoded_tbl = concat_table_in_tables(decoded_tbl) | ||
|
|
||
| -- Regular expressions (RFC 3490 separators) | ||
| for index, cp in ipairs(decoded_tbl) do |
There was a problem hiding this comment.
These are already handled by idnaMappings, so no need to check separately for them.
There was a problem hiding this comment.
Removed this piece of code.
nselib/idna.lua
Outdated
| -- @param checkBidi Boolean flag to represent if the input is of Bidi type. | ||
| -- @param checkJoiners Boolean flag to check for ContextJ rules in input. | ||
| -- @param useSTD3ASCIIRules Boolean value to represent ASCII rules. | ||
| -- @param transitionalProcessing Boolean value. |
There was a problem hiding this comment.
Add default values to the NSEdoc and note any flags that are not supported. We can't rely on consumers to read the code; at best they will look at the documentation, so it must be correct.
nselib/idna.lua
Outdated
| local inputString = unicode.encode(codepoints, encoder) | ||
|
|
||
| -- Checks for invalid domain codepoints and proceeds further. | ||
| if not match(inputString, "[^a-z0-9_@=+/%-%(%)%%]") then |
There was a problem hiding this comment.
What's the reference for these valid code points/characters? I'm surprised to see _@=+()% in there.
There was a problem hiding this comment.
My bad, I deleted this piece of code.
| -- incrementing `n` each time, so we'll fix that now: | ||
| if (floor(i / out) > maxInt - n) then | ||
| --error('overflow'); | ||
| return nil, "Overflow exception occurred." |
There was a problem hiding this comment.
Are we checking for this and other punycode errors when we call these functions from idna.lua? We should be doing so and passing the error up to the caller.
There was a problem hiding this comment.
punycode.mapLabels is the only function being used in idna.lua and this function is returned to the caller, so if there is an error the error will be returned to the caller along with nil as response value.
There was a problem hiding this comment.
So, effectively it becomes the job of the caller to handle the errors being thrown.
| function encode_label(s, decoder) | ||
|
|
||
| local flag = false | ||
| local decoded_tbl = unicode.decode(s, decoder) |
There was a problem hiding this comment.
Remember, for simplicity we can require callers of punycode functions to already do the decoding and pass in a table of code points. After all, this is what idna.lua has after doing the mapping/validation, so it doesn't make sense to keep encoding and decoding between calls.
There was a problem hiding this comment.
I thought decoding here will help the NSE developers who import this function exclusively.
| end | ||
|
|
||
| -- Table of punycode test cases. | ||
| local testCases = { |
There was a problem hiding this comment.
When you convert the libraries to move the domain splitting/joining and Unicode stuff to idna.lua, you'll have to change these test cases to each be a single label.
|
Almost all the requested changes have been done, please review the modified code. |
dmiller-nmap
left a comment
There was a problem hiding this comment.
Just add missing requires for math and table to both, address these 2 typos, and commit! Thanks so much!
nselib/idna.lua
Outdated
| -- To use this part of code, add disallowed_STD3_mapped and disallowed_STD3_valid | ||
| -- codepoints to idnaMappings.lua. For now, we ignore these because idnaMappings.lua | ||
| -- is set to support only for the latest version of IDNA. | ||
| if UseSTD3ASCIIRules then |
There was a problem hiding this comment.
This is miscapitalized and therefore not referring to useSTD3ASCIIRules from the parameters list.
nselib/punycode.lua
Outdated
| local delimiterCodePoint = 0x002E | ||
| local delimiter = unicode.encode({0x002E}, encoder) | ||
|
|
||
| codepoints = unicode.decode(input, decoder) |
No description provided.