Less fails at non-standard whitespace and newline chars #2542

ralphsmith80 · 2015-04-07T18:51:34Z

This a broken issue with less@2.5.0. I've tested less@2.4.0 and it is working.

Related grunt-contrib-less which is where I discoverd this.

Here's a test case that breaks

.opacity(@op:100) {
    filter:alpha(opacity=@op);
    -moz-opacity:@op/100;
    -khtml-opacity:@op/100;
    opacity:@op/100;
}
body {
  .opacity();
}

seven-phases-max · 2015-04-07T23:00:18Z

What options you compile this with and what result do you actually get? (I can't reproduce this with lessc 2.5.0, I didn't yet try this with grunt-contrib-less though).

Upd.: I've tested with grunt-contrib-less (with the 2.5.0 less module) with default options and it compiles as expected... So waiting for more details.

ralphsmith80 · 2015-04-07T23:18:32Z

I'm not using any options. I created a simple test project that only has a dependency on less@2.5.0. Then I added a test.less file that is exactly what I posted above. After that it's all the basic stuff. Here's the output just doing npm install then ./node_modules/less/bin/lessc test.less.

$ npm install
npm WARN package.json test@1.0.0 No description
npm WARN package.json test@1.0.0 No repository field.
npm WARN package.json test@1.0.0 No README data
less@2.5.0 node_modules/less
├── graceful-fs@3.0.6
├── mime@1.3.4
├── image-size@0.3.5
├── promise@6.1.0 (asap@1.0.0)
├── errno@0.1.2 (prr@0.0.0)
├── source-map@0.4.2 (amdefine@0.1.0)
├── mkdirp@0.5.0 (minimist@0.0.8)
└── request@2.55.0 (caseless@0.9.0, json-stringify-safe@5.0.0, aws-sign2@0.5.0, forever-agent@0.6.1, stringstream@0.0.4, oauth-sign@0.6.0, tunnel-agent@0.4.0, isstream@0.1.2, node-uuid@1.4.3, qs@2.4.1, combined-stream@0.0.7, form-data@0.2.0, http-signature@0.10.1, tough-cookie@0.12.1, hawk@2.3.1, bl@0.9.4, mime-types@2.0.10, har-validator@1.6.1)

$ ./node_modules/less/bin/lessc test.less 
ParseError: Missing closing ')' in /Users/ralphs/Workspace/test-play/test/test.less on line 2, column 25:
1 .opacity(@op:100) {
2     filter:alpha(opacity=@op);
3     -moz-opacity:@op/100;

This is what the package.json looks like but it's just barebones so I could use npm install to install less.

{
  "name": "test",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "dependencies": {
    "less": "2.5.0"
  },
  "author": "",
  "license": "ISC"
}

Also I'm my npm and node versions are as follows:
npm: 2.1.18
node: 0.10.29

seven-phases-max · 2015-04-07T23:44:43Z

I see. OK, mysterious (still no error with all the same settings and modules in my environment (using node 0.11.16 though)... some char-encoding/new-line symbols maybe?). I'll add bug label then.

ralphsmith80 · 2015-04-08T01:17:13Z

Just a heads up I tried using node 0.11.16 and got the same failure I posted above. This might be obvious but did you clear your npm cache npm cache clear to ensure it's actually pulling the version you expect?

seven-phases-max · 2015-04-08T01:57:40Z

Yes I did try a clean npm install as well.
Btw. it's not just me who can't reproduce this. Notice that this particular syntax is covered by the tests and 2.5.0 build passes this test part in the travis environment too (it fails at further test stage but that stage is unrelated).

ralphsmith80 · 2015-04-08T12:51:18Z

Well it look like it is some kind of new-line symbol. Not really sure what just yet. I'm using sublime text and there is some indication of the line-endings not being the same. Here's a screenshot just in case someone else runs into this.

Notice the dots are space indentations, dashes would be tab indentations. Despite having 'unix' line-endings set there are some lines that do not have the 'unix' line-ending.

I fixed this by going through and manually re-tabbing each line.

Here's what the re-tabbed code looks like in sublime text.

lukeapage · 2015-04-08T12:54:05Z

We should cope with both. Was it the line endings or indentation?

ralphsmith80 · 2015-04-08T13:41:37Z

It's the indentation.

I'm still miffed as to how this file ever got into this state but so as not to be defeated I printed out the ascii values of the file. Since the file is small it wasn't to hard to grep through after that. Looks like the values are mapping to something outside of the basic ASCII range.

The ASCII sequence of the offending indentation is 194,160,194,160,194,160,194,160. If that's of any use. Since this was working in 2.4.0 that's probably irrelevant to any fix you'll have.

seven-phases-max · 2015-04-08T15:24:28Z

194,160 is UTF-8 c2a0->no-break space symbol.

ralphsmith80 · 2015-04-08T15:50:20Z

Thanks for that piece of knowledge @seven-phases-max.
Learn something new today ✅

seven-phases-max · 2016-01-30T11:02:42Z

Just a self-note:
Currently Less understands only [ \t\n\r] "whitespaces" (per 29f8ae2#diff-d0b0e282e3de141dd4f97e210d0c6df0R62).
But should handle [ \f\n\r\t\v\u00a0\u1680\u180e\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff] to match initial "regexp-based" behaviour (maybe with a few exceptions? I'm not sure about all those \u codes).

matthew-dean · 2016-01-30T18:05:34Z

It should handle them, but they aren't white space I don't think?

I've been working on Less grammar in my spare time to correspond with node types and possibly use with a parser generator (which I would only be proposed would be used if it clearly outperformed the current parser).

So, this is (part of) what I have so far, which if I recall correctly, was largely lifted from CSS grammar:

Ident
  = "-"? NameStart NameChar*

// Primitives
Int
  = [0-9]+  
Hex3
  = Hex Hex Hex
Hex
  = [a-fA-F0-9]
NameStart
  = [a-zA-Z_] / NonAscii / Escape 
NameChar
  = [A-Za-z0-9_] / NonAscii / Escape 
NonAscii
  = [\u0080-\uD7FF\uE000-\uFFFD]
Escape
  = Unicode / "\\" [\u0020-\u007E\u0080-\uD7FF\uE000-\uFFFD]
Unicode
  = "\\" Hex+ _?


// Whitespace
_  = __?
__ = [ \t\r\n\f]+

So that should pretty well define what's called "ident" in the CSS spec, with the exception that it misses some unicode ranges that aren't supported by JavaScript regex. I can't recall though, if the white space regex came from somewhere else.

The CSS spec just defines whitespace as:

whitespace
A newline, U+0009 CHARACTER TABULATION, or U+0020 SPACE

Where did you find those whitespace ranges?

seven-phases-max · 2016-01-30T18:34:28Z

Where did you find those whitespace ranges?

MDN
Actually I'm also wondering of the characters they listed.

Technically W3C specs (both CSS2 and CSS3) only permit [ \f\n\r\t] (not counting this \u0000 remark), but it seems that browsers more tolerate to this. And the problem of current Less behaviour is that many editors tend to occasionally insert these exotic charcodes (usually when we accidentally press some [ctrl]+[alt]+[shift]+[bla-bla-bla] thing) and debugging it may become a head ache since these characters are usually invisible and the compiler only says "Missing closing ')/}'" or "Unexpected input" at best. (I saw at least two issues with this at Stackoverflow after 2.5.0 is released).

So alternatively we could still support only [ \f\n\r\t] but make errors more detailed (by printing the symbol/charcode the compiler stop at). This can require patching of too many messages though.

matthew-dean · 2016-01-30T18:42:25Z

Hmm, MDN is just listing the values of regex characters. The definition of "white space" in regex is different from the definition of "white space" in the CSS spec, which is why we don't just put [\s].

That said, your point about exotic charcodes is valid. Buuuut.... If the characters are known to be invisible (if that's determinable), I think Less might be smarter to just silently gobble them while parsing (if that's sane). Because really, how would you debug that, even if Less told you about where it was? Some editors/IDEs will represent invisibles with a visible placeholder, but some won't.

lukeapage · 2016-01-30T18:46:22Z

Seems an easy win to just support all unicode whitespace?

Re: grammar, nice one. I doubt it will be as fast but more maintainable. Maybe you can get close enough. I wanted to do this one day and architecturally things have moved slowly in that direction. The problem with the less grammar is
A)
/* comments are usually global but in css that would be valid in an unquoted url. So to add proper support you have to mention everywhere comments can be used instead of globally tokenising first. I guess mixing with whitespace and using comment or whitespace everywhere might do it. In fact unquoted urls are nightmares and it would make life easier to say less is a superset but dorsnt support unquoted urls
B)
You might need to find a n recursive backtracking grammar.. the hand built one relies on this. Alot of generated grammars i dont think would support a direct translation of the current parser.

matthew-dean · 2016-01-30T18:46:52Z

As far as browser tolerance... that may or may not be relevant, but I don't know how much time should be spent on it. If we decide to just use a regex \s, I'm not sure there'd be any downside, as long as those characters can't be interpreted as part of anything else that's valid? (Looks like what @lukeapage just said.)

matthew-dean · 2016-01-30T18:55:56Z

@lukeapage Yeah, I've been tinkering with it ever since it came up a long time ago. I've played with a few parser generators now. Most of them allow you to capture the grammar and transform the result. For instance, my Less grammar currently has stuff like:

Ruleset
  = a:SelectorList _ b:Block
    { return { selectors:a, block:b } }

Block
  = "{" _ primary:Primary* _ "}" ";"?
    { return primary }

SelectorList
  = a:Selector b:(_ "," _ Selector)* c:Guard?
    { return [a].concat(sel(b, [3])); }

As you can see, it's incomplete, but the point is you can probably gobble comments and then send them through a filtering function to separate them without backtracking.

Maybe I'll put my work in a public repo somewhere? The only reason I haven't is because it's not "official Less" and it's not guaranteed to be more performant (although I agree that currently, it's far more readable and maintainable). And I don't want to make any promises that I'll finish it. That said, if you or anyone else wants to help on it, I'd be happy to put it up. The parser generator I settled on, after using 3 or 4 is PEG.js: http://pegjs.org/

By the way, another upside is that the PEG.js generated parser is a bit better at capturing start / end of lines and columns on first pass, so that source map generation (and error reporting) might be faster. It looks like currently Less captures just the character index, and then goes through and does another pass to count new lines to return just the start line / column position.

matthew-dean · 2016-01-30T19:03:28Z

You could also have the parser check for comment starts within optional whitespace, so that any place where optional whitespace appears could potentially extract a comment. You can do a lot with smart grammar.

ralphsmith80 mentioned this issue Apr 7, 2015

Unable to compile mixins with default paramaters gruntjs/grunt-contrib-less#266

Closed

vladikoff mentioned this issue Apr 7, 2015

Release 1.1.0 when compat is fixed with less 2.5 gruntjs/grunt-contrib-less#268

Closed

seven-phases-max added the needs info label Apr 7, 2015

seven-phases-max added the bug label Apr 7, 2015

ralphsmith80 closed this as completed Apr 8, 2015

lukeapage reopened this Apr 8, 2015

seven-phases-max removed the needs info label Apr 8, 2015

seven-phases-max mentioned this issue Nov 4, 2015

Compiler error with v-2.5.3 #2721

Closed

seven-phases-max changed the title ~~Unable to compile mixins with default paramaters~~ Less fails at non-standard whitespace and newline chars Jan 15, 2016

seven-phases-max added the medium priority label Jan 15, 2016

allybee mentioned this issue May 2, 2016

Replace NO-BREAK SPACE characters concretecms/concretecms#3718

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Less fails at non-standard whitespace and newline chars #2542

Less fails at non-standard whitespace and newline chars #2542

ralphsmith80 commented Apr 7, 2015

seven-phases-max commented Apr 7, 2015

ralphsmith80 commented Apr 7, 2015

seven-phases-max commented Apr 7, 2015

ralphsmith80 commented Apr 8, 2015

seven-phases-max commented Apr 8, 2015

ralphsmith80 commented Apr 8, 2015

lukeapage commented Apr 8, 2015

ralphsmith80 commented Apr 8, 2015

seven-phases-max commented Apr 8, 2015

ralphsmith80 commented Apr 8, 2015

seven-phases-max commented Jan 30, 2016

matthew-dean commented Jan 30, 2016

seven-phases-max commented Jan 30, 2016

matthew-dean commented Jan 30, 2016

lukeapage commented Jan 30, 2016

matthew-dean commented Jan 30, 2016

matthew-dean commented Jan 30, 2016

matthew-dean commented Jan 30, 2016

Less fails at non-standard whitespace and newline chars #2542

Less fails at non-standard whitespace and newline chars #2542

Comments

ralphsmith80 commented Apr 7, 2015

seven-phases-max commented Apr 7, 2015

ralphsmith80 commented Apr 7, 2015

seven-phases-max commented Apr 7, 2015

ralphsmith80 commented Apr 8, 2015

seven-phases-max commented Apr 8, 2015

ralphsmith80 commented Apr 8, 2015

lukeapage commented Apr 8, 2015

ralphsmith80 commented Apr 8, 2015

seven-phases-max commented Apr 8, 2015

ralphsmith80 commented Apr 8, 2015

seven-phases-max commented Jan 30, 2016

matthew-dean commented Jan 30, 2016

seven-phases-max commented Jan 30, 2016

matthew-dean commented Jan 30, 2016

lukeapage commented Jan 30, 2016

matthew-dean commented Jan 30, 2016

matthew-dean commented Jan 30, 2016

matthew-dean commented Jan 30, 2016