Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add syntax highlighting in the REPL #2254

Merged
merged 23 commits into from
May 31, 2024

Conversation

Snehil-Shah
Copy link
Member

@Snehil-Shah Snehil-Shah commented May 20, 2024

Resolves #2072

Description

What is the purpose of this pull request?

This pull request:

  • adds syntax highlighting to the REPL
  • add more color themes and API for user-defined themes
  • add settings to control behavior and change themes.
  • handle non-TTY contexts
  • add tests

This PR is in progress. Requesting for feedback..

Related Issues

Does this pull request have any related issues?

This pull request:

Questions

Any questions for reviewers of this pull request?

As there are only a few colors we can choose from to highlight using ANSI codes, I have kept a minimalist approach to highlighting. We don't highlight identifiers, operators and brackets. This is consistent with how IPython and node's repl-rewrite do their syntax highlighting. This can easily be changed as acorn's tokenizer is pretty verbose, so would love to hear your thoughts..

Other

Any other information relevant to this pull request? This may include screenshots, references, and/or implementation notes.

Currently, I have only added one theme. I will be extending this and also going to add an API for user-defined themes.

Tokens gettings highlighted:

  • String literals: strings, regexes, templates
  • Numeric literals
  • Keywords
  • Looping keywords: while, for etc
  • REPL commands
  • Comments

Attaching some screenshots at different terminal backgrounds:

Light:

light

Dark:

dark

Monokai:

monokai

Solarized:

solarized

Checklist

Please ensure the following tasks are completed before submitting this pull request.


@stdlib-js/reviewers

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
@kgryte kgryte added Enhancement Issue or pull request for enhancing existing functionality. REPL Issue or pull request specific to the project REPL. labels May 21, 2024
Copy link
Member

@Planeshifter Planeshifter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this looks pretty good. I left a suggestion on how we could move the colors to their own package for further downstream use.

In general, I would suggest keeping the scope of this PR limited and do theming and settings in follow-up PRs. Tests and the non-TTY handling may be good as part of this PR.

My main other suggestion would be to see whether we want to support any other token types; it's probably better to be exhaustive here as this would then allow for more styling options. As a reference, here are the token types used by the popular highlight.js library:
https://highlightjs.readthedocs.io/en/latest/css-classes-reference.html

I propose going through their list and see whether there are additional token types to add to the tokenizer.js handling.

*/
var ANSI = {
// Original colors:
'black': '\u001b[30m',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's get this in for now as-is, but as a follow-up it may be worthwhile to add a separate package specifically for colors, as an alternative to https://www.npmjs.com/package/colors or https://www.npmjs.com/package/chalk. Similar to what chalk is doing, we could do feature detection in there for which ANSI colors are supported and adjust the exports accordingly. A potential location for such a package may be @stdlib/cli/colors, with color support detection potentially living in the assert namespace.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be nice to have! will add it as a TODO.

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
@Snehil-Shah Snehil-Shah marked this pull request as ready for review May 25, 2024 16:03
@Snehil-Shah
Copy link
Member Author

Snehil-Shah commented May 25, 2024

@Planeshifter I went ahead and added more token types.

image

Punctuation and operators can also be colored now, I have decided not to with the default theme.

I am planning to add tests together with settings and API in the follow-up PR as custom themes would make testing each token-type easier. Is that okay?

I also tried adding escape sequences as a token type. I wrote a regex that matches any single character after \ or any 2 or 4 characters after \u or \x but apparently the escape sequences in the strings weren't escaped (by double backslash), so the regex wasn't working as javascript was treating \n as a newline char instead of an n after a backslash.
The only solution I can think of is having a list of common escape sequences (\n, \t etc) to individually match from but that doesn't cover all cases.

@Snehil-Shah
Copy link
Member Author

Snehil-Shah commented May 26, 2024

@Planeshifter @kgryte I was also wondering if we could highlight output values (like true in the above image)?

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
@kgryte
Copy link
Member

kgryte commented May 26, 2024

@Snehil-Shah I'd probably avoid highlighting return values, as string representations of return values may not be syntactically correct (e.g., ndarray string serialization, etc). Additionally, the lack of highlighting in return values helps distinguish output from input, as only input is syntax highlighted.

Re: tests. Yes, deferring until we've added custom theming makes sense. As I believe you are suggesting, custom theming will allow toggling on and off token types, etc, thus allowing for easier unit tests.

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
@Snehil-Shah
Copy link
Member Author

Snehil-Shah commented May 28, 2024

@kgryte I have also decided to highlight when the user is declaring an existing variable as I feel it can help the user avoid writing conflicting variable names. So if the user goes var console, console will be highlighted as an object token. What do you think?

@kgryte
Copy link
Member

kgryte commented May 28, 2024

I have also decided to highlight when the user is declaring an existing variable as I feel it can help the user avoid writing conflicting variable names.

@Snehil-Shah Yes, that seems reasonable. Some overlap with preview completions, which also hint at existing identifiers, but likely worth distinguishing. I suspect that most themes will make existing and new identifiers color the same, but there may be some which would like to distinguish as you propose.

Copy link
Member

@Planeshifter Planeshifter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Snehil-Shah Is the PR ready for a final review? Would be great if we could get it in shortly and save follow-up work for subsequent PRs.

@Snehil-Shah
Copy link
Member Author

@Planeshifter Yes, it's ready!

var isStringTokenType = contains.factory( [ 'string', 'template' ] );
var isLiteralType = contains.factory( [ 'string', 'boolean', 'number' ] );
var isReservedName = contains.factory( [ 'undefined', 'async', 'await', 'let' ] );
var RE_PUNCTUATION = /(\{|\}|\[|\]|\(|\)|,|;|:|\.|\?|\?\.|=>|\.\.\.|`|\${)/;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could consider moving this into its own @stdlib/regexp/* package down the line.

// VARIABLES //

var COMMANDS = commands();
var isControlKeyword = contains.factory( [ 'if', 'else', 'switch', 'case', 'catch', 'finally', 'try', 'return', 'break', 'continue' ] );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be a good idea to move this array and the ones below into a separate JSON file such as the one below:

{
  "controlKeywords": ["if", "else", "switch", "case", "catch", "finally", "try", "return", "break", "continue"],
  "specialIdentifiers": ["this", "super"],
  "reservedLiterals": ["null", "true", "false"],
  "unrecognizedKeywords": ["async", "await", "let"],
  "stringTokenTypes": ["string", "template"],
  "literalTypes": ["string", "boolean", "number"],
  "reservedNames": ["undefined", "async", "await", "let"]
}

Such separation of data from the code logic may be helpful when writing tests and would make reuse of these sets easier.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these sets mean much apart from this code logic. For example, the keywords only contains the keywords that acorn doesn't recognise as a keyword token. So it kind of becomes specific to the tokenizer. We can still move it to a separate JSON if that makes the code cleaner. Should I make a separate module that returns the object or should I make a JSON file (and at which directory)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Snehil-Shah That's fair; let's get the PR in now and we could still move it to its own file later if needed for testing etc. Since specific to the tokenizer, separate file would live in the same directory as the tokenizer.js file. In general, we prefer JSON when there is no reason that the exported object would ever be mutated, but use a module with a function instantiating the object when we want to allow returning a new copy.

Copy link
Member

@Planeshifter Planeshifter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few last comments. Should be ready to land!

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
@Planeshifter Planeshifter merged commit 24f4a8f into stdlib-js:develop May 31, 2024
6 checks passed
kgryte added a commit that referenced this pull request Jun 8, 2024
PR-URL: #2291
Ref: #2254
Co-authored-by: Athan Reines <kgryte@gmail.com>
Reviewed-by: Athan Reines <kgryte@gmail.com> 
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Signed-off-by: Athan Reines <kgryte@gmail.com>
aman-095 pushed a commit to aman-095/stdlib that referenced this pull request Jun 13, 2024
PR-URL: stdlib-js#2254
Resolves: stdlib-js#2072

---------

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Reviewed-by: Athan Reines <kgryte@gmail.com> 
Reviewed-by: Philipp Burckhardt <pburckhardt@outlook.com>
aman-095 pushed a commit to aman-095/stdlib that referenced this pull request Jun 13, 2024
PR-URL: stdlib-js#2291
Ref: stdlib-js#2254
Co-authored-by: Athan Reines <kgryte@gmail.com>
Reviewed-by: Athan Reines <kgryte@gmail.com> 
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Signed-off-by: Athan Reines <kgryte@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Issue or pull request for enhancing existing functionality. REPL Issue or pull request specific to the project REPL.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RFC]: add syntax highlighting support to the REPL
3 participants