tools: non-Ascii linter for /lib only #18043

SirR4T · 2018-01-08T17:51:36Z

Non-ASCII characters in /lib get compiled into the node binary,
and may bloat the binary size unnecessarily. A linter rule may
help prevent this.

Fixes: #11209

Checklist

make -j4 test (UNIX), or vcbuild test (Windows) passes
commit message follows commit guidelines

Affected core subsystem(s)

eslint

addaleax

I would make it clear somewhere in a comment that this only catches characters in literals, not the source code itself (which is what we would eventually want to have instead)

SirR4T · 2018-01-08T18:02:30Z

Oh? Then @hkal's approach of parsing all tokens would be preferred? I had assumed the eslint selector would select the source code too; I see where I might have gone wrong 😅

Also, doubts:

The regexp /[^\n\x20-\x7e]/ doesn't match \r, which I have seen in some files while running the linter. I should add it to the regexp, right?
I have seen instances of \u0000 and \u000c etc. in code. I should be disabling this rule for these, right?

addaleax · 2018-01-08T18:04:51Z

@SirR4T I’m not sure how to do it, but yes, you want all tokens and even comments.

I have seen instances of \u0000 and \u000c etc. in code. I should be disabling this rule for these, right?

It’s fine to have the escape sequences spelled out as escape sequences in the source code. What matters is whether the files themselves contain characters that don’t fit into the ASCII range, because that’s what the script that bakes those files into the node binary looks for.

I hope that’s helpful. :)
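To make the distinction concrete, here is a minimal sketch (not code from this PR) that applies the pattern discussed in the thread to two hypothetical source strings. The variable names are illustrative only. One string spells the check mark as an escape sequence, the other contains the character itself.

```javascript
// Pattern from the thread: anything other than \r, \n and printable ASCII.
const nonAscii = /[^\r\n\x20-\x7e]/;

// String.raw keeps the backslash, so this models a file whose bytes are
// the six ASCII characters \ u 2 7 1 3 inside the quotes.
const escapedSource = String.raw`const mark = '\u2713';`;

// This models a file that contains the check mark character itself.
const literalSource = "const mark = '✓';";

console.log(nonAscii.test(escapedSource)); // false
console.log(nonAscii.test(literalSource)); // true
```

Only the second string would need an eslint-disable comment or a rewrite to the escaped form.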

SirR4T · 2018-01-08T18:30:51Z

now I'm running into a very peculiar error, while doing make lint-js:

I'm unable to disable the below error, via the regular methods (//eslint-disable-line, //eslint-disable-next-line, etc.)

$ make lint-js
Running JS linter...

/Users/sarat/Play/Github/node/lib/internal/test/unicode.js
  1:1  error  Non-ASCII character '✓' detected  non-ascii-character

✖ 1 problem (1 error, 0 warnings)
  1 error, 0 warnings potentially fixable with the `--fix` option.

make: *** [lint-js] Error 1

SirR4T · 2018-01-09T03:35:23Z

@addaleax: could you review the latest commit? It follows the approach from the (now closed) pull request #11371, but I think it catches far fewer violations.

not-an-aardvark · 2018-01-10T01:17:45Z

I'm a bit confused about why non-ascii characters would bloat the binary size. Aren't the files encoded as UTF-8?

SirR4T · 2018-01-10T13:21:26Z

@not-an-aardvark that is what I understood, from #11129 (comment) :

I would have to assume it is because the external string API does not directly support UTF8. That is, if you look at ExternalStringResource, it assumes a utf16_t* buffer while ExternalOneByteStringResource assumes const char*. If the strings were stored as UTF8 we would incur an additional cost at startup that is not necessary.

Is that not correct?

addaleax · 2018-01-10T13:29:30Z

@SirR4T Yes, that is correct. Files that don’t fit into ASCII are currently saved as UTF-16 instead:

node/tools/js2c.py

Lines 230 to 240 in 1e0f331

    def Render(var, data):
      # Treat non-ASCII as UTF-8 and convert it to UTF-16.
      if any(ord(c) > 127 for c in data):
        template = TWO_BYTE_STRING
        data = map(ord, data.decode('utf-8').encode('utf-16be'))
        data = [data[i] * 256 + data[i+1] for i in xrange(0, len(data), 2)]
        data = ToCArray(data)
      else:
        template = ONE_BYTE_STRING
        data = ToCString(data)
      return template.format(var=var, data=data)
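A rough way to see the size effect of that branch is sketched below. `embeddedSize` is a hypothetical helper written for illustration. It mirrors only the one-byte versus two-byte decision and ignores the C array formatting that js2c.py performs.

```javascript
// Hypothetical helper: bytes of string data under the scheme above,
// one byte per character if the file is pure ASCII, otherwise UTF-16.
function embeddedSize(source) {
  const isAscii = !/[^\x00-\x7f]/.test(source);
  return isAscii ?
    Buffer.byteLength(source, 'latin1') :
    Buffer.byteLength(source, 'utf16le');
}

const ascii = '// ok\n'.repeat(1000); // 6000 ASCII characters
const tainted = ascii + '// ✓\n';     // 5 more characters, one non-ASCII

console.log(embeddedSize(ascii));   // 6000
console.log(embeddedSize(tainted)); // 12010
```

A single non-ASCII character anywhere in the file switches the whole file to two bytes per character, which is the bloat this lint rule is meant to prevent.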

addaleax

I mean, this still looks fine to me, but I’m not an expert for eslint anyway.

If it doesn’t catch the diagram in lib/timers.js and the check mark in lib/internal/test/unicode.js (for both of which we’ll want eslint-disable comments), it’s probably not quite strict enough, though.

Trott · 2018-01-10T21:48:22Z

tools/eslint-rules/non-ascii-character.js

+// Rule Definition
+//------------------------------------------------------------------------------
+
+const nonAsciiRegexPattern = new RegExp(/[^\r\n\x20-\x7e]/);


Any reason not to use a RegExp literal like this?:

const nonAsciiRegexPattern = /[^\r\n\x20-\x7e]/;

Seems more readable to me and also more in line with our general coding style. (I'm kind of surprised this isn't caught by a lint rule itself, to be honest. Or maybe it is?)

not-an-aardvark · 2018-01-10T22:12:59Z

tools/eslint-rules/non-ascii-character.js

+      const commentTokens = source.getAllComments();
+      const tokens = sourceTokens.concat(commentTokens);
+
+      tokens.forEach((token) => reportIfError(node, token));


It might be better to match on source.text rather than each individual token, to ensure that non-ascii whitespace is detected (which would not be part of any token or comment).
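The gap can be illustrated without ESLint. In the sketch below, splitting on whitespace is a crude stand-in for the real tokenizer (an assumption made for brevity), and the input string is hypothetical. A no-break space between tokens is invisible to a per-token check but is found when the whole text is scanned.

```javascript
const nonAscii = /[^\r\n\x20-\x7e]/;

// U+00A0 (no-break space) written as an escape so this snippet stays ASCII.
const text = 'const x =\u00a01;';

// Crude stand-in for a tokenizer: split on whitespace. \s matches U+00A0.
const tokens = text.split(/\s+/);

console.log(tokens);                               // [ 'const', 'x', '=', '1;' ]
console.log(tokens.some((t) => nonAscii.test(t))); // false
console.log(text.search(nonAscii));                // 9
```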

SirR4T · 2018-01-11T09:30:32Z

Yay! 🎉

Using source.text, the rule now catches lib/timers.js too. But I am still unable to disable the rule for it ☹️

Any ideas? @Trott @not-an-aardvark

SirR4T · 2018-01-19T16:54:36Z

cc: @Trott @not-an-aardvark @addaleax

Can someone help me out here? I'm unable to skip the lint checks inside the lib/ folder. I need to skip the rule for lib/timers.js and for the check mark in lib/internal/test/unicode.js.

not-an-aardvark · 2018-01-19T17:00:28Z

I think there are a few issues:

You're using a comment like // eslint-disable non-ascii-character, but I think it should be a block comment (/* eslint-disable non-ascii-character */)
When the rule reports a problem, the location of the problem is currently always the top of the file, because you're passing the Program node to context.report. Instead, it would be better if the report location were the point in the file that contains the non-ascii character. To do this, you could find the index of the non-ascii character in sourceCode.text, and then pass { ... loc: sourceCode.getLocFromIndex(theIndex) } to context.report.

If you make both of those changes, I think the disable comments will work as expected.
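The second suggestion could look roughly like the sketch below. It is an illustration under stated assumptions, not the file that landed in tools/eslint-rules/. The name `nonAsciiRule` is made up, only the first offending character is reported, and the message text is copied from the lint output earlier in the thread.

```javascript
// Sketch of a rule object that reports at the offending character.
const nonAsciiRule = {
  create(context) {
    const sourceCode = context.getSourceCode();
    const nonAscii = /[^\r\n\x20-\x7e]/;

    return {
      Program(node) {
        // Scan the raw text so comments and whitespace are covered too.
        const index = sourceCode.text.search(nonAscii);
        if (index === -1) return;

        // Read a whole code point so astral characters are not split.
        const character =
          String.fromCodePoint(sourceCode.text.codePointAt(index));

        context.report({
          node,
          // Point at the character instead of the top of the file.
          loc: sourceCode.getLocFromIndex(index),
          message: "Non-ASCII character '{{character}}' detected",
          data: { character },
        });
      },
    };
  },
};
```

With the report located on the offending line, a block comment such as /* eslint-disable non-ascii-character */ placed before that line should be able to suppress it.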

SirR4T · 2018-01-19T17:43:10Z

Thanks @not-an-aardvark! That worked like a charm! I'll push the latest changes once the build is done.

BridgeAR · 2018-02-02T10:21:31Z

CI https://ci.nodejs.org/job/node-test-pull-request/12908/

@not-an-aardvark PTAL

addaleax · 2018-02-04T15:55:50Z

Landed in c45afe8

Non-ASCII characters in /lib get compiled into the node binary, and may bloat the binary size unnecessarily. A linter rule may help prevent this. PR-URL: #18043 Fixes: #11209 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de> Reviewed-By: Teddy Katz <teddy.katz@gmail.com>

MylesBorins · 2018-03-20T16:44:25Z

Should this be backported to v8.x-staging or v6.x-staging? If yes please follow the guide and raise a backport PR, if not let me know or add the dont-land-on label.

Non-ASCII characters in /lib get compiled into the node binary, and may bloat the binary size unnecessarily. A linter rule may help prevent this. PR-URL: #18043 Backport-PR-URL: #19499 Fixes: #11209 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de> Reviewed-By: Teddy Katz <teddy.katz@gmail.com>

nodejs-github-bot added the tools Issues and PRs related to the tools directory. label Jan 8, 2018

addaleax approved these changes Jan 8, 2018

SirR4T force-pushed the eslintForLib branch from 1233f7b to 0403094 Compare January 9, 2018 03:37

addaleax approved these changes Jan 10, 2018

Trott reviewed Jan 10, 2018

not-an-aardvark reviewed Jan 10, 2018

SirR4T force-pushed the eslintForLib branch from 0403094 to cf8e821 Compare January 11, 2018 08:54

SirR4T force-pushed the eslintForLib branch from d4fb97c to 004ffb3 Compare January 11, 2018 16:58

SirR4T added 3 commits January 19, 2018 22:36

tools: non-Ascii linter for /lib only

3d63b03

Non-ASCII characters in /lib get compiled into the node binary, and may bloat the binary size unnecessarily. A linter rule may help prevent this. Fixes: nodejs#11209

tools: fix non-ascii linter detection

1399bf5

the linter should detect not just literals, but also source code and all comments too.

tools: using source.text instead of tokens

bd452d5

Adding loc to report

1e0ab23

SirR4T force-pushed the eslintForLib branch from 004ffb3 to 1e0ab23 Compare January 19, 2018 18:10

maclover7 force-pushed the master branch from bb5575a to 993b716 Compare January 26, 2018 22:02

cjihrig force-pushed the master branch from 993b716 to 082f952 Compare January 26, 2018 22:36

BridgeAR approved these changes Feb 2, 2018

not-an-aardvark approved these changes Feb 2, 2018

BridgeAR added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Feb 2, 2018

addaleax approved these changes Feb 2, 2018

addaleax closed this Feb 4, 2018

addaleax removed the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Feb 4, 2018

MylesBorins mentioned this pull request Feb 21, 2018

v9.6.0 proposal #18902

Merged

MylesBorins added backport-requested-v6.x labels Mar 20, 2018

SirR4T mentioned this pull request Mar 20, 2018

[v6.x backport] tools: non-Ascii linter for /lib only #19493

Closed

4 tasks

SirR4T mentioned this pull request Mar 21, 2018

[v8.x backport] tools: non-Ascii linter for /lib only #19499

Closed

4 tasks

MylesBorins mentioned this pull request May 2, 2018

v8.11.2 proposal #20478

Merged
