Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Less fails at non-standard whitespace and newline chars #2542

Open
ralphsmith80 opened this issue Apr 7, 2015 · 18 comments
Open

Less fails at non-standard whitespace and newline chars #2542

ralphsmith80 opened this issue Apr 7, 2015 · 18 comments

Comments

@ralphsmith80
Copy link

This a broken issue with less@2.5.0. I've tested less@2.4.0 and it is working.

Related grunt-contrib-less which is where I discoverd this.

Here's a test case that breaks

.opacity(@op:100) {
    filter:alpha(opacity=@op);
    -moz-opacity:@op/100;
    -khtml-opacity:@op/100;
    opacity:@op/100;
}
body {
  .opacity();
}
@seven-phases-max
Copy link
Member

What options you compile this with and what result do you actually get? (I can't reproduce this with lessc 2.5.0, I didn't yet try this with grunt-contrib-less though).

Upd.: I've tested with grunt-contrib-less (with the 2.5.0 less module) with default options and it compiles as expected... So waiting for more details.

@ralphsmith80
Copy link
Author

I'm not using any options. I created a simple test project that only has a dependency on less@2.5.0. Then I added a test.less file that is exactly what I posted above. After that it's all the basic stuff. Here's the output just doing npm install then ./node_modules/less/bin/lessc test.less.

$ npm install
npm WARN package.json test@1.0.0 No description
npm WARN package.json test@1.0.0 No repository field.
npm WARN package.json test@1.0.0 No README data
less@2.5.0 node_modules/less
├── graceful-fs@3.0.6
├── mime@1.3.4
├── image-size@0.3.5
├── promise@6.1.0 (asap@1.0.0)
├── errno@0.1.2 (prr@0.0.0)
├── source-map@0.4.2 (amdefine@0.1.0)
├── mkdirp@0.5.0 (minimist@0.0.8)
└── request@2.55.0 (caseless@0.9.0, json-stringify-safe@5.0.0, aws-sign2@0.5.0, forever-agent@0.6.1, stringstream@0.0.4, oauth-sign@0.6.0, tunnel-agent@0.4.0, isstream@0.1.2, node-uuid@1.4.3, qs@2.4.1, combined-stream@0.0.7, form-data@0.2.0, http-signature@0.10.1, tough-cookie@0.12.1, hawk@2.3.1, bl@0.9.4, mime-types@2.0.10, har-validator@1.6.1)

$ ./node_modules/less/bin/lessc test.less 
ParseError: Missing closing ')' in /Users/ralphs/Workspace/test-play/test/test.less on line 2, column 25:
1 .opacity(@op:100) {
2     filter:alpha(opacity=@op);
3     -moz-opacity:@op/100;

This is what the package.json looks like but it's just barebones so I could use npm install to install less.

{
  "name": "test",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "dependencies": {
    "less": "2.5.0"
  },
  "author": "",
  "license": "ISC"
}

Also I'm my npm and node versions are as follows:
npm: 2.1.18
node: 0.10.29

@seven-phases-max
Copy link
Member

I see. OK, mysterious (still no error with all the same settings and modules in my environment (using node 0.11.16 though)... some char-encoding/new-line symbols maybe?). I'll add bug label then.

@ralphsmith80
Copy link
Author

Just a heads up I tried using node 0.11.16 and got the same failure I posted above. This might be obvious but did you clear your npm cache npm cache clear to ensure it's actually pulling the version you expect?

@seven-phases-max
Copy link
Member

Yes I did try a clean npm install as well.
Btw. it's not just me who can't reproduce this. Notice that this particular syntax is covered by the tests and 2.5.0 build passes this test part in the travis environment too (it fails at further test stage but that stage is unrelated).

@ralphsmith80
Copy link
Author

Well it look like it is some kind of new-line symbol. Not really sure what just yet. I'm using sublime text and there is some indication of the line-endings not being the same. Here's a screenshot just in case someone else runs into this.

line-ending

Notice the dots are space indentations, dashes would be tab indentations. Despite having 'unix' line-endings set there are some lines that do not have the 'unix' line-ending.

I fixed this by going through and manually re-tabbing each line.

Here's what the re-tabbed code looks like in sublime text.

fixed

@lukeapage
Copy link
Member

We should cope with both. Was it the line endings or indentation?

@lukeapage lukeapage reopened this Apr 8, 2015
@ralphsmith80
Copy link
Author

It's the indentation.

I'm still miffed as to how this file ever got into this state but so as not to be defeated I printed out the ascii values of the file. Since the file is small it wasn't to hard to grep through after that. Looks like the values are mapping to something outside of the basic ASCII range.

The ASCII sequence of the offending indentation is 194,160,194,160,194,160,194,160. If that's of any use. Since this was working in 2.4.0 that's probably irrelevant to any fix you'll have.

@seven-phases-max
Copy link
Member

194,160 is UTF-8 c2a0->no-break space symbol.

@ralphsmith80
Copy link
Author

Thanks for that piece of knowledge @seven-phases-max.
Learn something new today ✅

@seven-phases-max seven-phases-max changed the title Unable to compile mixins with default paramaters Less fails at non-standard whitespace and newline chars Jan 15, 2016
@seven-phases-max
Copy link
Member

Just a self-note:
Currently Less understands only [ \t\n\r] "whitespaces" (per 29f8ae2#diff-d0b0e282e3de141dd4f97e210d0c6df0R62).
But should handle [ \f\n\r\t\v​\u00a0\u1680​\u180e\u2000​-\u200a​\u2028\u2029\u202f\u205f​\u3000\ufeff] to match initial "regexp-based" behaviour (maybe with a few exceptions? I'm not sure about all those \u codes).

@matthew-dean
Copy link
Member

It should handle them, but they aren't white space I don't think?

I've been working on Less grammar in my spare time to correspond with node types and possibly use with a parser generator (which I would only be proposed would be used if it clearly outperformed the current parser).

So, this is (part of) what I have so far, which if I recall correctly, was largely lifted from CSS grammar:

Ident
  = "-"? NameStart NameChar*

// Primitives
Int
  = [0-9]+  
Hex3
  = Hex Hex Hex
Hex
  = [a-fA-F0-9]
NameStart
  = [a-zA-Z_] / NonAscii / Escape 
NameChar
  = [A-Za-z0-9_] / NonAscii / Escape 
NonAscii
  = [\u0080-\uD7FF\uE000-\uFFFD]
Escape
  = Unicode / "\\" [\u0020-\u007E\u0080-\uD7FF\uE000-\uFFFD]
Unicode
  = "\\" Hex+ _?


// Whitespace
_  = __?
__ = [ \t\r\n\f]+

So that should pretty well define what's called "ident" in the CSS spec, with the exception that it misses some unicode ranges that aren't supported by JavaScript regex. I can't recall though, if the white space regex came from somewhere else.

The CSS spec just defines whitespace as:

whitespace
A newline, U+0009 CHARACTER TABULATION, or U+0020 SPACE

Where did you find those whitespace ranges?

@seven-phases-max
Copy link
Member

Where did you find those whitespace ranges?

MDN
Actually I'm also wondering of the characters they listed.


Technically W3C specs (both CSS2 and CSS3) only permit [ \f\n\r\t] (not counting this \u0000 remark), but it seems that browsers more tolerate to this. And the problem of current Less behaviour is that many editors tend to occasionally insert these exotic charcodes (usually when we accidentally press some [ctrl]+[alt]+[shift]+[bla-bla-bla] thing) and debugging it may become a head ache since these characters are usually invisible and the compiler only says "Missing closing ')/}'" or "Unexpected input" at best. (I saw at least two issues with this at Stackoverflow after 2.5.0 is released).

So alternatively we could still support only [ \f\n\r\t] but make errors more detailed (by printing the symbol/charcode the compiler stop at). This can require patching of too many messages though.

@matthew-dean
Copy link
Member

Hmm, MDN is just listing the values of regex characters. The definition of "white space" in regex is different from the definition of "white space" in the CSS spec, which is why we don't just put [\s].

That said, your point about exotic charcodes is valid. Buuuut.... If the characters are known to be invisible (if that's determinable), I think Less might be smarter to just silently gobble them while parsing (if that's sane). Because really, how would you debug that, even if Less told you about where it was? Some editors/IDEs will represent invisibles with a visible placeholder, but some won't.

@lukeapage
Copy link
Member

Seems an easy win to just support all unicode whitespace?

Re: grammar, nice one. I doubt it will be as fast but more maintainable. Maybe you can get close enough. I wanted to do this one day and architecturally things have moved slowly in that direction. The problem with the less grammar is
A)
/* comments are usually global but in css that would be valid in an unquoted url. So to add proper support you have to mention everywhere comments can be used instead of globally tokenising first. I guess mixing with whitespace and using comment or whitespace everywhere might do it. In fact unquoted urls are nightmares and it would make life easier to say less is a superset but dorsnt support unquoted urls
B)
You might need to find a n recursive backtracking grammar.. the hand built one relies on this. Alot of generated grammars i dont think would support a direct translation of the current parser.

@matthew-dean
Copy link
Member

As far as browser tolerance... that may or may not be relevant, but I don't know how much time should be spent on it. If we decide to just use a regex \s, I'm not sure there'd be any downside, as long as those characters can't be interpreted as part of anything else that's valid? (Looks like what @lukeapage just said.)

@matthew-dean
Copy link
Member

@lukeapage Yeah, I've been tinkering with it ever since it came up a long time ago. I've played with a few parser generators now. Most of them allow you to capture the grammar and transform the result. For instance, my Less grammar currently has stuff like:

Ruleset
  = a:SelectorList _ b:Block
    { return { selectors:a, block:b } }

Block
  = "{" _ primary:Primary* _ "}" ";"?
    { return primary }

SelectorList
  = a:Selector b:(_ "," _ Selector)* c:Guard?
    { return [a].concat(sel(b, [3])); }

As you can see, it's incomplete, but the point is you can probably gobble comments and then send them through a filtering function to separate them without backtracking.

Maybe I'll put my work in a public repo somewhere? The only reason I haven't is because it's not "official Less" and it's not guaranteed to be more performant (although I agree that currently, it's far more readable and maintainable). And I don't want to make any promises that I'll finish it. That said, if you or anyone else wants to help on it, I'd be happy to put it up. The parser generator I settled on, after using 3 or 4 is PEG.js: http://pegjs.org/

By the way, another upside is that the PEG.js generated parser is a bit better at capturing start / end of lines and columns on first pass, so that source map generation (and error reporting) might be faster. It looks like currently Less captures just the character index, and then goes through and does another pass to count new lines to return just the start line / column position.

@matthew-dean
Copy link
Member

You could also have the parser check for comment starts within optional whitespace, so that any place where optional whitespace appears could potentially extract a comment. You can do a lot with smart grammar.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants