add an extension point to run per-lang setups #661

Closed
wants to merge 129 commits into from

Sponsor Contributor

@capfredf capfredf commented May 30, 2023

It works, but I think @greghendershott should have a better idea of how the code is organized.

Notes on the implementation:

A lang name is extracted textually in the Racket backend. I feel this piece of information should be provided by DrRacket or some related library eventually.

Also, it looks to me that per-language configuration is quite desirable.

This commit is a squash of my old commits from over a year ago on the
old "lang-lexer" branch. This was my starting point for the new
"hash-lang" branch.
Change the token-map struct into a hash-lang% class. We need to supply
an object% with certain methods for drracket:indentation functions,
so, we may as well just make the whole thing a class instead of a
struct. (Maybe this could be refactored into smaller pieces via mixins
or something because it's starting to feel a little "fat".)

Abandon our old strategies for parens and whitespace. Just go with the
latest flow from drracket/expeditor.
Including:

Implement a skip-whitespace method that produces same result as that
of racket:text%. However {backward forward}-match and don't pass.

Start to sketch out an alternative to forward-sexp-function handling,
using drracket:grouping-position.
Although it's not baked into the token struct anymore, we create
similar values when notifying, using paren-matches.
Rename some things from "lexindent" and "token-map" to hash-lang.

Delete many files that are now N/A compared to the approach from a
year ago, including steps toward making some of the code its own
"token-map" package.

For now, just focus on this being code for Racket Mode. Although I
want to contribute the efficient interval-map idea for use in
expeditor and other tools, some of that is still a moving target, and
anyway I want to continue validating/refining things for use in Emacs.
When my piece settles down, as well as some things in expeditor, I can
take another look at how best to make some of this available for
reuse.

Also note:

- This needs Racket 6.12+ for interval-map-ref/bounds.

- A few tests need `framework` and are skipped on headless Racket --
e.g. CI like GitHub Actions -- but do run locally.
This is only necessary due to hash-lang.rkt test using a racket:text%
object from the framework collection.
Eliminate bounds+token struct; just use list.

Eliminate -get-tokens; use get-tokens in tests.

Initial implementation to use drracket:range-indent -- but not yet
actually tested with a lang that supplies it, like e.g. rhombus.
Implement failure token bounds as needed by forward-sexp-function.

Implement racket-hash-lang-{up down} commands. I'm still not 100% sure if
these are needed vs. standard Emacs {up down}-list commands which use
forward-sexp, now that the previous item is implemented. But I think
good to have for now, for development/testing.

Re-implement position-paragraph and paragraph-{start end}-position
methods. These still do a linear search from the start, which isn't
ideal, but I'm not yet convinced it's worth trying to do a trickier
incremental update like we do for the tokens interval-map.

Start to add tests exercising #lang rhombus. Some of the indentation
tests don't yet pass, for reasons I don't yet understand.
Incorporate some expeditor commits:

  racket/expeditor@e017835

  racket/expeditor@5b1a374

Also update tests.

At this point all the tests pass except for some shrubbery indentation
tests in a small number of lines in the shrubbery demo.rkt file.
Handle it being available (with new-enough versions of Racket and/or
syntax-color-lib) or not. After all, a goal of Racket Mode is for it to
work to the extent possible on older versions of Racket.

This is a good opportunity to move the tests to their own file, since
they need to be skipped when color-textoid is not available. Anyway,
they were starting to become unwieldy in the same file; it was
becoming more tests than tested.

Some of this will become N/A when we move it to syntax-color-lib
itself; then hash-lang-bridge.rkt will need to do a similar test. But
moving the tests to their own file is good prep, as those will end up
in syntax-color-test not syntax-color-lib.
From racket/expeditor#10 it sounds like the
"traditional" approach isn't desirable after all; remove it.

Make the paragraph numbering "moderately" optimized: On do-update! we
invalidate it, and then the paragraph methods recalculate it
on-demand.

This is a mid point on design spectrum between "naively scan from 0
every single time" and "do some minimal rebuild on every update".

Note that I'm not 100% confident about the concurrency safety. But I'm
committing anyway, for now, because I plan to take a hard look at
concurrency for the class, soon, including this as well as the
potential for update generations arriving on command threads out of
order and needing to be queued something like TCP packets.
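
As a rough illustration of the "invalidate on update, recalculate on demand" idea described above, here is a minimal Python sketch. It is not the Racket code from this branch: the class name `ParagraphIndex`, the `rebuilds` counter, and the assumption that paragraphs are simply newline-delimited are all hypothetical.

```python
import bisect
from typing import List, Optional


class ParagraphIndex:
    """Hypothetical sketch: cache paragraph starts, rebuild lazily after edits."""

    def __init__(self, text: str) -> None:
        self._text = text
        self._starts: Optional[List[int]] = None  # None means "invalidated"
        self.rebuilds = 0  # counts full rescans, for illustration only

    def update(self, pos: int, old_len: int, new_text: str) -> None:
        # Apply the edit, then just drop the cache; no work is done here.
        self._text = self._text[:pos] + new_text + self._text[pos + old_len:]
        self._starts = None

    def _ensure(self) -> List[int]:
        # Recompute paragraph starts only when a query needs them.
        if self._starts is None:
            starts = [0]
            for i, ch in enumerate(self._text):
                if ch == "\n":
                    starts.append(i + 1)
            self._starts = starts
            self.rebuilds += 1
        return self._starts

    def position_paragraph(self, pos: int) -> int:
        # Binary search over the cached starts (position -> paragraph number).
        return bisect.bisect_right(self._ensure(), pos) - 1

    def paragraph_start_position(self, para: int) -> int:
        return self._ensure()[para]
```

Several queries between edits share one rescan, and a burst of edits with no queries in between costs no rescans at all, which is the middle ground on the design spectrum mentioned above.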
Make hash-lang.rkt itself less idiosyncratic to how we want to use it
in Emacs.

Instead of supplying the class an async-channel, now it takes an
optional on-notify procedure. Something using the class only for
indent might not need changed-token notifications at all. And even
something that needs them can choose how to handle them (although some
non-blocking technique like an async-channel is important).

Similarly, move the code that massaged the notifications into the
desired format for Emacs, from hash-lang.rkt to hash-lang-bridge.rkt.
By supplying paren-matches as the first argument to on-notify, we
give it the ability to massage parenthesis tokens. Also, since
on-notify gets the full token struct, it can handle the case where
token-type is a hash-table instead of a symbol, as with module-lexer*.

Although I'm not sure this is everything that needs to be done, or
that all of these details are just right, it's at least a first step.
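
To make the shape of the optional on-notify design concrete, here is a small Python sketch under stated assumptions. The names `HashLangSketch` and `make_bridge_notify`, the 4-tuple token layout, and the "open"/"close" labels are hypothetical stand-ins; the real class is a Racket `hash-lang%` object and re-lexes on update rather than being handed tokens.

```python
import queue
from typing import Callable, List, Optional, Tuple

# Hypothetical token layout: (start, end, type, text).
Token = Tuple[int, int, str, str]
ParenMatches = List[Tuple[str, str]]
OnNotify = Callable[[ParenMatches, List[Token]], None]


class HashLangSketch:
    """Stand-in for the class: notification is optional and caller-defined."""

    def __init__(self, paren_matches: ParenMatches,
                 on_notify: Optional[OnNotify] = None) -> None:
        self.paren_matches = paren_matches
        self.on_notify = on_notify

    def update(self, changed_tokens: List[Token]) -> int:
        # A real implementation would re-lex; here the caller supplies the
        # changed tokens so the sketch stays self-contained.
        if self.on_notify is not None:
            self.on_notify(self.paren_matches, changed_tokens)
        return len(changed_tokens)


def make_bridge_notify(out: "queue.Queue") -> OnNotify:
    """Build a callback that massages tokens and enqueues without blocking."""

    def notify(paren_matches: ParenMatches, tokens: List[Token]) -> None:
        opens = {o for o, _ in paren_matches}
        closes = {c for _, c in paren_matches}
        for start, end, kind, text in tokens:
            if kind == "parenthesis" and text in opens:
                kind = "open"
            elif kind == "parenthesis" and text in closes:
                kind = "close"
            out.put_nowait((start, end, kind))

    return notify
```

A consumer that only needs indentation would construct `HashLangSketch(paren_matches)` with no callback, while an editor bridge would pass `make_bridge_notify(some_queue)` and drain the queue on its own schedule.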
This test allows ours to take <= 3X the time.

Although I'm not sure that's good enough, it sets an initial bar.

(Although 3X isn't superb, the test example is range-indents
rhombus/demo.rkt, a 600 line file -- and does so 10 times. The
total time is around 54 ms for us vs. 24 ms racket:text%.)

(To some extent this is measuring how "fancy" we are being in
supporting the position-paragraph method, which effectively is
position->line-number.)
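
For reference, the arithmetic behind the 3X bar can be written out as a tiny Python check. The helper names `slowdown` and `within_budget` are hypothetical; only the 54 ms, 24 ms, and 3X figures come from the commit message above.

```python
def slowdown(ours_ms: float, baseline_ms: float) -> float:
    """Ratio of our time to the baseline time."""
    return ours_ms / baseline_ms


def within_budget(ours_ms: float, baseline_ms: float, max_ratio: float = 3.0) -> bool:
    """True when our time is at most max_ratio times the baseline."""
    return ours_ms <= max_ratio * baseline_ms


# Figures quoted above: about 54 ms for us vs. 24 ms for racket:text%.
ratio = slowdown(54, 24)          # 2.25
passes = within_budget(54, 24)    # 54 <= 72
```

With those numbers the measured slowdown is 2.25X, so the test passes with some headroom; the budget at 3X would be 72 ms.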
Some things broke in commit 45197ed.

Furthermore, now that hash-lang% uses an on-notify procedure (instead
of a channel), we can simplify things and have our procedure transform and
put things directly to the token-notify-channel; we no longer need a
channel per hash-lang% in hash-lang-bridge.rkt.
Upon RET, the electric-indent stuff does an indent-line for the
original line, as well as an indent-line for the new line. The former
doesn't work well with hash-lang indenters. Although maybe I should
figure out why, it also seems reasonable simply to rebind RET to
newline-and-indent, which this commit does.
In real use in Emacs, I saw this happen sometimes, which caused a
blocking command like indent to timeout and things to be left in an
inconsistent state.
This commit just updates a bunch of variables "tm" (after the old
token-map struct) to "o" (objects of the hash-lang% class).
Only check for new #lang when a change position is <= the end of the
last lang spec we read.
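
The position check can be sketched in Python as follows. This is only an illustration under assumptions: `LangTracker`, `read_lang_spec`, and the `reads` counter are hypothetical names, and the regexp is a textual stand-in for however the back end actually reads the lang spec.

```python
import re
from typing import Optional, Tuple

_LANG_RE = re.compile(r"#lang[ \t]+(\S+)")


def read_lang_spec(text: str) -> Tuple[Optional[str], int]:
    """Return (lang name or None, end position of the spec we looked at)."""
    m = _LANG_RE.match(text)
    if m is not None:
        return m.group(1), m.end()
    # No lang yet: treat the whole first line as the region to watch.
    newline = text.find("\n")
    return None, (len(text) if newline == -1 else newline)


class LangTracker:
    def __init__(self, text: str) -> None:
        self.reads = 1  # counts how often we re-read the lang spec
        self.lang, self.lang_end = read_lang_spec(text)

    def on_change(self, text: str, change_pos: int) -> bool:
        """Return True when the lang changed and the front end needs a notice."""
        if change_pos > self.lang_end:
            return False  # edit is after the lang spec; skip the re-read
        self.reads += 1
        new_lang, new_end = read_lang_spec(text)
        changed = new_lang != self.lang
        self.lang, self.lang_end = new_lang, new_end
        return changed
```

Using `<=` rather than `<` matters in this sketch: typing one more character at the very end of `#lang racke` lands exactly at the recorded end position and still has to trigger a re-read.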

When we have a changed lang, notify the front end. So now, there is a
'hash-lang notification of varieties 'lang or 'token.

The notification includes lang-info information like
paren-matches, quote-matches, and booleans for whether the lang
supplies a grouping-position, line-indenter, and range-indenter.

The idea here is to supply enough information to appropriately
configure things like the Emacs syntax-table, as well as
indent-line-function, indent-region-function, and
forward-sexp-function. That will be the next commit. The idea of this
commit is to get enough back end stuff sorted out for this that I can
focus on the Emacs end for a while.
Back end:

Handle case of inserting in middle of existing token; split it before
expanding the interval-map to make the old/new compare work correctly
and emit minimal changed-token notifications.

Add a message when racket-hash-lang-mode is used with a #lang that
supplies nothing special beyond a color-lexer -- no grouping-position
or indent functions. In that case plain old racket-mode would suffice,
unless a user prefers the "more boring" (or "less garish", depending
on your opinion) coloring. Similarly append symbols to the mode-line
lighter when each of these is supplied by a lang. (Although I'm not
sure this is really the right UX, it's information that at least is
helpful to me now when dogfooding and trying to understand what is
supposed to work and why.)

Front end:

Stop setting forward-sexp-function. Instead supply navigation commands.
When the lang supplies no grouping-position at all, or when it does
but a specific call returns "use s-expression", we use the default
Emacs commands.

Switch from cl-case to pcase.
@greghendershott
Owner

Looking at this again, I'm coming around to your original suggestion, mostly.

  • I do think it's fine to rely on Emacs auto-mode-alist as the mechanism to say, "Given some file extension, what Emacs major mode to use?" So for example all of .rkt, .scrbl, .rhm, etc. can use racket-mode, and, the user racket-mode-hook can enable the racket-hash-lang-mode minor mode. All these extensions are handled the same way with the hash langs.

  • Things like comment-start seem like a hole in the lang info spec. Whether it's DrRacket or Emacs or vscode, any editor that wants to offer comment/uncomment commands needs to know this. So stuff like this seems like something where Robby and Matthew and I could/should coordinate to add an info key.

  • Having said that, there may be miscellaneous config that a user wants to do based on the module language. Like in your example, you want M-q to fill-paragraph (not reindent) when in scribble lang. In that vein:

    • It turns out that a language's "info" function can support a 'module-language key, as discussed for the #:info option of syntax/module-reader. (It's possible that older langs might not use this, but maybe the best answer there is we submit PRs to update those?) We don't need to re-implement read-language with regexps.

    • I could define an Emacs hook, say racket-hash-lang-module-language, which is called with the mod lang value whenever that changes (from loading a file or from user editing). Users can add hook functions. This could be the point to do stuff like tweak M-q.

    So this could handle "all other" config. Even in cases where we think adding a new lang info key is the ultimate Right Way, this could help in the meantime.


Any thoughts?

I've been sketching this out and doing some initial testing, so I'm not asking you to update your PR.

greghendershott added a commit that referenced this pull request Aug 25, 2023
This is roughly in the same spirit as PR #661, but using the
module-language key supported by new lang's info function.

Also this defines a hook as the means for users to customize. A
default hook function sets comment-start for a few popular
langs (although ultimately comment-start should become a new
"official" lang info key).
@greghendershott
Owner

I pushed a commit to the hash-lang branch. When you have a chance, let me know if it seems OK?

greghendershott added a commit that referenced this pull request Aug 26, 2023
Although we keep the racket-hash-lang-module-language-hook added in
the previous commit, for end users, we no longer use a default hook to
set comment-{start end padding}. Instead have the back end provide
those values. The goal is to have a new info key, e.g.
"drracket:comments", for langs to supply this. Meanwhile, any
fallbacks live down there (although a user could still use the hook to
add their own fallback or work-around).

For background on this commit and the previous commit see PR #661.
@greghendershott
Owner

I pushed commit 2075184 which moves some stuff down to the back end with a view toward adding a new info key for langs to supply.

@capfredf
Sponsor Contributor Author

@greghendershott Thank you. I will give it a shot and get back to you in a week or so.

greghendershott added a commit that referenced this pull request Sep 4, 2023
Use token class to decide whether to do prog-indent-sexp,
fill-paragraph, fill-comment, or nothing.

This should avoid users needing to do the kind of configuration as in
issue #661. Furthermore it can vary smartly within e.g. a Scribble doc
to support three possible behaviors based on location. (In my
dog-fooding so far this works well.)
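
The dispatch described in this commit message can be pictured with a short Python sketch. The token type names (`"text"`, `"comment"`), the `None` case, and the function name `dwim_action` are assumptions made for illustration; the actual command is Emacs Lisp and its exact mapping is not shown in this thread.

```python
from typing import Optional

# Hypothetical mapping from lexer token type to the Emacs command to run.
ACTIONS = {
    "text": "fill-paragraph",
    "comment": "fill-comment",
}


def dwim_action(token_type: Optional[str]) -> str:
    """Pick an action from the token type under point."""
    if token_type is None:
        return "nothing"  # assumed: no token at point, so do nothing
    # Anything that is not prose or a comment is treated as code.
    return ACTIONS.get(token_type, "prog-indent-sexp")
```

Because the choice is made per token, the same key can fill prose in one part of a Scribble document and re-indent code in another.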
@greghendershott
Owner

This isn't a nudge; on the contrary it's a summary for when you do have time to catch up:

  1. A lang can now supply a drracket:comment-delimiters info key. racket-hash-lang-mode will use this to set comment-xxx variables.

    Proposal: New info key drracket:comments racket/drracket#634 tracks the progress. But even now, before my PRs for scribble and rhombus are merged to supply this, the Racket Mode back end supplies fallbacks for those.

  2. A new command racket-mode-C-M-q-dwim is bound to C-M-q by default. Based on the lang lexer's token under point, it does a prog-indent-sexp or fill-paragraph or fill-comment.

    This has worked well for me so far editing a .scrbl file -- it fills in text section, but indents in racketblock code examples. But let me know of any problems/omissions.

  3. Although the previous points address your configuration motivation (IIUC), there is also the new racket-hash-lang-module-language-hook for other configuration.


Finally I think I might go ahead and merge the hash-lang branch by the end of this week. (I might slap an "experimental" caveat in the docs. But these days it's probably better for this to live on the main branch, to get more use and improvement.)

@capfredf
Sponsor Contributor Author

capfredf commented Sep 5, 2023 via email

@greghendershott
Owner

That's an open ended question. 😄 A couple answers:

  • Apparently the set of tokens a lexer may return is open.

    • If a relatively popular lang like rhombus or scribble adds a new token type, then I'll probably update this to look for that and map it to a specific face. e.g. I did this for rhombus "at" and "operator" tokens.

    • At the same time, there should probably be a config var alist for users to add/override those choices. That's still TO-DO.

  • Overall, the font-lock is rather "plain" compared to classic racket-mode. Much like DrRacket. e.g. there aren't regexp rules to highlight popular functions, or variable names in let or define, etc. Of course that also means there aren't buggy corner cases, because regexps. My current thinking is:

    • The lang lexer and racket-hash-lang-mode is about "syntactic" highlighting. By design that's somewhat basic... but also guaranteed to be correct.

    • Sometimes people refer to "semantic" highlighting. This is much of the value-add from the "gaudy" regexp rules in classic Racket Mode. But here something like racket-xp-mode, based on check-syntax analysis, could do a good, and more-correct job. e.g. Highlight everything that's a variable. Or give font-lock-keyword-face to things imported from certain modules like racket/base. etc. I think that would give back much of the classic variety, but again with fewer regexp gotchas? Probably, but TBD.

    And in fact this is another reason I'd like to merge to racket-hash-lang-mode. My other long-running project is a check-syntax db, "pdb". With those on separate branches, it's awkward to experiment with this mix of lexer highlighting and semantic highlighting.

That's my little brain dump. If you were actually asking some other, third question, please let me know. 😄

@greghendershott
Owner

New commit f314ae9 has racket-xp-mode contribute faces to text not already fontified by racket-hash-lang-mode -- specifically identifiers at binding definition and use sites. This gives a more colorful presentation (if desired), closer to "classic" racket-mode than to DrRacket.

Although I'm not 100.0% sure about all the details, this feels like the right basic approach.

  • On the Emacs side: It goes with the grain of Emacs modes -- a buffer has a major mode, optionally enhanced by one or more minor modes. A basic unit of user preference is the mode; changing the major mode for a buffer, and enabling/disabling minor modes on top of that. Furthermore there are customizable faces, and a customizable map of token types to faces.

  • On the Racket side: It corresponds to the division of labor between syntax/color-lexer for basic token coloring and drracket/check-syntax for "semantic" highlighting.

@greghendershott greghendershott added the racket-hash-lang-mode Issues using racket-hash-lang-mode instead of "classic" racket-mode for edit buffers label Sep 23, 2023
greghendershott added a commit that referenced this pull request Nov 8, 2023
This commit is a squash of nearly 250 commits from the long-running
branch, `hash-lang`.

Major themes:

1. Change REPL I/O. We no longer use a TCP connection to do I/O for
each REPL. Instead use commands (input) and notifications (output).
Furthermore send various kinds of output as distinct notifications.

2. Support use of hash-lang colors, indent, navigation when editing
and in REPL.

Add racket-hash-lang-mode, an alternative to racket-mode for editing
source files, which uses coloring, indent and navigation supplied by a
lang.

Any number of racket-mode or racket-hash-lang-mode buffers may take
turns using the same racket-repl-mode. The last-run edit buffer's
settings are used in the REPL.

Needs Racket 6.12+ for interval-map-ref/bounds.

Use syntax-color/color-textoid when available (with new-enough
versions of Racket and/or syntax-color-lib) but not required.

3. racket-xp-mode: Do "semantic" highlighting of binding sites.
Intended for use by racket-hash-lang-mode to get more than just lexer
colors.

---

Closes #661.
Fixes #642.
Fixes #667.
Fixes #671.
Fixes #672.
Fixes #673.
greghendershott added a commit that referenced this pull request Nov 9, 2023
This commit is a squash of nearly 250 commits from the long-running
branch, `hash-lang`.

Major themes:

1. Change REPL I/O. We no longer use a TCP connection to do I/O for
each REPL. Instead use commands (input) and notifications (output).
Furthermore send various kinds of output as distinct notifications.

2. Support use of hash-lang colors, indent, navigation when editing
and in REPL.

Add racket-hash-lang-mode, an alternative to racket-mode for editing
source files, which uses coloring, indent and navigation supplied by a
lang.

Any number of racket-mode or racket-hash-lang-mode buffers may take
turns using the same racket-repl-mode. The last-run edit buffer's
settings are used in the REPL.

Needs Racket 6.12+ for interval-map-ref/bounds.

Use syntax-color/color-textoid when available (with new-enough
versions of Racket and/or syntax-color-lib) but not required.

3. racket-xp-mode: Do "semantic" highlighting of binding sites.
Intended for use by racket-hash-lang-mode to get more than just lexer
colors.

---

Fixes #482.
Fixes #619.
Fixes #642.
Fixes #663.
Fixes #667.
Fixes #671.
Fixes #672.
Fixes #673.

Closes #64.
Closes #633.

Closes PR #661.

2 participants