Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve R syntax highlighting #383

Merged
merged 25 commits into from
Jun 2, 2016
Merged

Improve R syntax highlighting #383

merged 25 commits into from
Jun 2, 2016

Conversation

klmr
Copy link
Contributor

@klmr klmr commented Feb 3, 2016

This PR adds the following features to the R lexer:

  • Recognition of filenames ending in lowercase .r
  • Backtick delimited tokens are not strings, they are identifiers
  • Fix recognition of reserved constants (used to be recognised as normal identifiers)
  • ... (and in fact any combination of dots in identifiers) is a valid name, and not a keyword
  • Identifiers may not start with _
  • Operators delimited by %% are recognised, and may contain whitespace
  • return isn’t a keyword in R, it’s a normal identifier
  • Comments starting with #' are marked as doc comments

The following improvements are still outstanding:

A separate lexer should be added to support the various flavours (RMarkdown and TeX) of Knitr. This is beyond the scope of this PR.

Backtick delimited identifiers are names, not strings, and should be
highlighted accordingly.
A backtick delimited name cannot contain backticks, and contains at
least one character.
This fixes the recognition of `NULL.` as an identifier, rather than a
constant followed by a dot.
Identifiers in R may contain any number of letters, digits, dots and
underscores; their only constraint is that they may not start with an
underscore.
Identifiers that start with a dot cannot immediately be followed by a
number.
Having all constants in an array ensures fewer errors due to uniform
handling; case in point: there was still a bug that would have
highlighted `NA.` etc as a constant followed by a dot.
@klmr
Copy link
Contributor Author

klmr commented Feb 4, 2016

Regarding the highlighting of function names, several strategies are viable:

  1. Highlight all primitive functions (those defined in terms of the .Primitive directive). This would include return, switch etc.
  2. Same as (1), but only when used as a function call (i.e. in foo(…), but not in x = foo).
  3. Same as (2) but for all (also user-defined) functions, not just primitives.
  4. Don’t highlight anything.

I’m not a fan of (1) since R’s functions are just normal identifiers, and thus it’s impossible to tell whether sum refers to a user-defined variable, a user-defined function, or the builtin function, base::sum. Case in point, the following code:

sum = a + b + c

I can’t think of any good argument to suggest that sum should be highlighted as a function here: it’s just any old identifier (it’s worth noting that many editors disagree; for instance, RStudio will happily highlight return and switch no matter how they are used; however, I think that RStudio’s highlighting is unhelpful in this instance).

I like (2) although it has unavoidable false negatives. For instance, it’s impossible to tell in general whether the names in lapply(foo, names) refers to a (built-in) function. The same is true for (3). In fact, because functions in R are just normal variables, it’s arguable whether visually distinguishing functions and other names makes sense at all.

@klmr
Copy link
Contributor Author

klmr commented Feb 4, 2016

Roxygen highlighting isn’t currently supported by the syntax highlighter, if I understand the list of tokens correctly: there’s no way of highlighting something as a keyword (or similar) inside a comment. I’ve reported this as a separate issue (#388).

@kevinushey
Copy link

Although return() isn't a keyword per-se it should definitely be highlighted as such IMHO, although note that recent versions of RStudio distinguish highlighting of return() and return.

FWIW, recent versions of RStudio provide an option to allow users to opt-in for 2); however, in practice, it can be a bit visually noisy.

https://github.com/rstudio/rstudio/blob/master/src/gwt/acesupport/acemode/r_highlight_rules.js might be useful for inspiration (although obviously not written in Ruby)

@klmr
Copy link
Contributor Author

klmr commented Feb 7, 2016

@kevinushey Have a look at my most recent commit, which implements (2) after a fashion, though I intentionally chose to highlight primitive functions not as Keyword but rather as Name::Function. The precise token is of course negotiable; however, highlighting it as Keyword is simply wrong and misleading in my opinion (after all, the whole point of a keyword is that its meaning cannot be overloaded by the user, and they thus act as visual markers of control structure in the code, which isn’t the case here); and Name::Builtin, while supported by Rouge, doesn’t actually perform any highlighting in the default stylesheet.

@kevinushey
Copy link

Looks good! Thanks for being the one to put in the work here :)

I think highlighting primitives as Functions is an okay compromise. My bias is still to treat return(), when called as a function, as a keyword, even if that is strictly not the case.

@klmr klmr mentioned this pull request Feb 9, 2016
@klmr klmr changed the title [WIP] Improve R syntax highlighting Improve R syntax highlighting Feb 10, 2016
@jneen jneen merged commit b534063 into rouge-ruby:master Jun 2, 2016
@jneen
Copy link
Member

jneen commented Jun 6, 2016

Thanks for your work! This is released in v1.11.0.

@klmr klmr deleted the fix-r branch June 6, 2016 22:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants