Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds libraries for Punycode and IDNA conversions. #958

Closed
wants to merge 24 commits into from

Conversation

rewanthtammana
Copy link
Contributor

@rewanthtammana rewanthtammana commented Aug 3, 2017

No description provided.

Copy link

@dmiller-nmap dmiller-nmap left a comment

Ok, this is almost done! Now we're done bickering over rulesets and we're defining the interface for other NSE programmers and streamlining the code. Take a look at these changes and let me know if you have time to keep working on it.

nselib/idna.lua Outdated
-- Since this library is dependent on punycode and vice-versa, we need to
-- import each of the libraries into another. This prevents idna from entering
-- into a recursive loop.
package.loaded["idna"] = _ENV

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can eliminate this confusion if we divide the functions cleanly between the two libraries. In punycode.lua we should only handle Punycode encoding and decoding of labels. These functions should be in idna.lua instead of punycode.lua:

  • Splitting names into labels based on RFC 3490 separators.
  • Unicode de/encoding. In fact, we could even make this the responsibility of the caller, and only accept tables of code points for Unicode inputs (i.e. never a string of encoded bytes)
  • Joining labels with "."

This has the potential to really simplify a lot of the code here. Most likely, scripts and other libraries will only require idna.lua, since that's the higher-level interface.

Copy link
Contributor Author

@rewanthtammana rewanthtammana Sep 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

nselib/idna.lua Outdated
decoded_tbl = concat_table_in_tables(decoded_tbl)

-- Regular expressions (RFC 3490 separators)
for index, cp in ipairs(decoded_tbl) do

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mapping is actually done in idnaMappings, so we don't need to do the additional checks here.

Copy link
Contributor Author

@rewanthtammana rewanthtammana Sep 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

nselib/idna.lua Outdated
for index, cp in ipairs(decoded_tbl) do
local lookup = idnaMappings[cp]
if type(lookup) == "table" then
if lookup.status == "deviation" then

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no reason to keep looping over and over the code points. One loop is sufficient. Inside the loop:

  1. do the lookup in idnaMappings
  2. Replace the code point if it's a number
  3. if it's a table, check the status. You can do if transitionalProcessing and lookup.status == "deviation" then to handle that case.
  4. handle the code point as needed.

In other words, handle each code point exactly once. The output of idnaMappings is always final; if it's mapped, then it is mapped to one or more valid code points which do not need to be checked again.

Copy link
Contributor Author

@rewanthtammana rewanthtammana Sep 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

nselib/idna.lua Outdated
-- Since this library is dependent on punycode and vice-versa, we need to
-- import each of the libraries into another. This prevents idna from entering
-- into a recursive loop.
package.loaded["idna"] = _ENV

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can avoid this confusion by limiting punycode.lua to simply encoding and decoding labels instead of whole names. In other words, only idna.lua is responsible for:

  • Separating and joining names and labels on RFC 3490 separators.
  • Unicode-to-bytes encoding/decoding. In fact, we could require the caller to handle this instead, though that might lead to code duplication.

Copy link
Contributor Author

@rewanthtammana rewanthtammana Sep 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

nselib/idna.lua Outdated
decoded_tbl = concat_table_in_tables(decoded_tbl)

-- Regular expressions (RFC 3490 separators)
for index, cp in ipairs(decoded_tbl) do

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are already handled by idnaMappings, so no need to check separately for them.

Copy link
Contributor Author

@rewanthtammana rewanthtammana Sep 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this piece of code.

nselib/idna.lua Outdated
-- @param checkBidi Boolean flag to represent if the input is of Bidi type.
-- @param checkJoiners Boolean flag to check for ContextJ rules in input.
-- @param useSTD3ASCIIRules Boolean value to represent ASCII rules.
-- @param transitionalProcessing Boolean value.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add default values to the NSEdoc and note any flags that are not supported. We can't rely on consumers to read the code; at best they will look at the documentation, so it must be correct.

Copy link
Contributor Author

@rewanthtammana rewanthtammana Sep 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

nselib/idna.lua Outdated
local inputString = unicode.encode(codepoints, encoder)

-- Checks for invalid domain codepoints and proceeds further.
if not match(inputString, "[^a-z0-9_@=+/%-%(%)%%]") then

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reference for these valid code points/characters? I'm surprised to see _@=+()% in there.

Copy link
Contributor Author

@rewanthtammana rewanthtammana Sep 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, I deleted this piece of code.

-- incrementing `n` each time, so we'll fix that now:
if (floor(i / out) > maxInt - n) then
--error('overflow');
return nil, "Overflow exception occurred."

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we checking for this and other punycode errors when we call these functions from idna.lua? We should be doing so and passing the error up to the caller.

Copy link
Contributor Author

@rewanthtammana rewanthtammana Sep 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

punycode.mapLabels is the only function being used in idna.lua and this function is returned to the caller, so if there is an error the error will be returned to the caller along with nil as response value.

Copy link
Contributor Author

@rewanthtammana rewanthtammana Sep 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, effectively it becomes the job of the caller to handle the errors being thrown.

function encode_label(s, decoder)

local flag = false
local decoded_tbl = unicode.decode(s, decoder)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remember, for simplicity we can require callers of punycode functions to already do the decoding and pass in a table of code points. After all, this is what idna.lua has after doing the mapping/validation, so it doesn't make sense to keep encoding and decoding between calls.

Copy link
Contributor Author

@rewanthtammana rewanthtammana Sep 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought decoding here will help the NSE developers who import this function exclusively.

end

-- Table of punycode test cases.
local testCases = {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you convert the libraries to move the domain splitting/joining and Unicode stuff to idna.lua, you'll have to change these test cases to each be a single label.

Copy link
Contributor Author

@rewanthtammana rewanthtammana Sep 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@rewanthtammana
Copy link
Contributor Author

@rewanthtammana rewanthtammana commented Sep 25, 2017

Almost all the requested changes have been done, please review the modified code.

Copy link

@dmiller-nmap dmiller-nmap left a comment

Just add missing requires for math and table to both, address these 2 typos, and commit! Thanks so much!

nselib/idna.lua Outdated
-- To use this part of code, add disallowed_STD3_mapped and disallowed_STD3_valid
-- codepoints to idnaMappings.lua. For now, we ignore these because idnaMappings.lua
-- is set to support only for the latest version of IDNA.
if UseSTD3ASCIIRules then

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is miscapitalized and therefore not referring to useSTD3ASCIIRules from the parameters list.

local delimiterCodePoint = 0x002E
local delimiter = unicode.encode({0x002E}, encoder)

codepoints = unicode.decode(input, decoder)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to make this local

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants