[Lisp] WIP Overhaul #2387

Closed
wants to merge 20 commits into from

Conversation

michaelblyons
Collaborator

@michaelblyons michaelblyons commented Jun 16, 2020

I don't really know what I'm doing with Lisp, so someone else will have to vet this.

One particularly scary thing is "How do I know if the first thing in some parentheses is a function or not?" Is it always triggered by the '()? If not, what are each of these: (a b c), ( a b c), ("foo"), (1 2)?

I also do not really understand # stuff. Like #c(4 -8) is a complex number, right? But there is also #(1 2 3) that is a vector or list... and it's immutable? Maybe?

/cc @gighmmpiob @JellyWX @rotty @samuel-jimenez

For the Packages veterans:

  • Do we have any analogue for ', which apparently tells Lisp not to execute a group or symbol?
  • Are there opinions on what the scopes should be for complex (a type) versus the #c above, a specific complex number?
    • There are lots of these, from # for vector to #2A for a two-dimensional array
    • I've only added scopes for #\ which is a character

Status

  • Symbols
    • Can be virtually any character string
    • Cannot be only .s or digits
    • \-escaping symbol chars works
    • |-delimiting a whole symbol works
    • |-delimiting portions of a symbol works
  • Defining a function/macro/type/struct
    • Have parameters
    • Have &-word annotations
    • Are split, so that &-word annotations are def*-specific
  • Builtin types
    • Exist
    • Highlight only in special places? If that's a thing?
    • Two of them are listed as deprecated as of 1989.
  • "Dispatching" macros
    • Characters #\
    • Numbers #o, #b, #x, #c (kind of), #_r
    • Functions #'
    • Exceptions
    • Load-/read-time evaluation #,, #.
    • Feature flags #+, #-
    • Local assignment #_=, #_#
    • Lists/Vectors #(
    • Arrays #_a
    • #:
    • Structs #s
    • #p
  • Conditionals
    • Slightly more specific scoping for some terms
    • Context-aware
  • Loops
    • Slightly more specific scoping for some terms
    • Context-aware
  • Templates
    • You can use `
    • ` allows you to use , where you couldn't before (instead, , highlights everywhere)
  • Other "special" handling
    • quote
    • let
  • Strings
    • Escape chars
    • First item in parens doesn't try to be a function
  • format string mini-language
    • Newline and similar ~%, etc
    • Formatters for objects ~D, ~,,,,:D, etc.
    • Control structures
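
As a rough illustration of the format mini-language entries above, here is a minimal Python sketch that picks directives such as ~%, ~D and ~,,,,:D out of a format string. It is not taken from this PR: the names PARAM, DIRECTIVE and directives are hypothetical, and the pattern only assumes the usual shape of a directive (a tilde, optional comma-separated prefix parameters, optional : and @ modifiers, then one directive character).

```python
import re

# One prefix parameter: a signed integer, a quoted character, V, # or nothing.
PARAM = r"(?:[+-]?\d+|'.|[vV#])?"

# Hypothetical pattern: tilde, parameters, modifiers, directive character.
DIRECTIVE = re.compile(
    rf"~(?P<params>{PARAM}(?:,{PARAM})*)(?P<modifiers>[:@]{{0,2}})(?P<char>.)",
    re.DOTALL,  # lets "." match the newline in the tilde-newline directive
)


def directives(format_string):
    """Return (params, modifiers, directive char) for each directive found."""
    return [
        (m["params"], m["modifiers"], m["char"])
        for m in DIRECTIVE.finditer(format_string)
    ]
```

A real syntax definition would need separate contexts for the control-structure directives, since those nest; this sketch only finds individual directives.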

@ghost

ghost commented Jun 17, 2020

Hey, this looks very good so far! I'm not a Lisp expert, but I can try to provide some answers here.

One particularly scary thing is "How do I know if the first thing in some parentheses is a function or not?" Is it always triggered by the '()? If not, what are each of these: (a b c), ( a b c), ("foo"), (1 2)?

It is impossible to know whether the first thing will be called as a function or not, without running the code. But I think the best solution would be to always highlight the first thing as a function name, unless the list is quoted, or the first thing is a string or a number.
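
A minimal Python sketch of that heuristic, for illustration only. The tokenizer and the names TOKEN, NUMBER and classify_heads are hypothetical and are not part of this PR or of Sublime Text's syntax engine. It labels the first element of each list as a function unless the list sits inside a form quoted with ', or the element is a string or a number. It does not try to recognise the word quote itself, and it ignores comments.

```python
import re

# Hypothetical tokenizer: a quote mark, a paren, a string, or a bare atom.
TOKEN = re.compile(r"""'|\(|\)|"(?:\\.|[^"\\])*"|[^\s()'"]+""")
NUMBER = re.compile(r"[+-]?\d+(?:\.\d*)?$")


def classify_heads(source):
    """Return (token, kind) for the first element of every list in source."""
    tokens = TOKEN.findall(source)
    quoted = []  # one flag per open list: is it inside a quoted form?
    heads = []
    for i, tok in enumerate(tokens):
        prev = tokens[i - 1] if i > 0 else None
        if tok == "(":
            inherited = bool(quoted) and quoted[-1]
            quoted.append(inherited or prev == "'")
        elif tok == ")":
            if quoted:
                quoted.pop()
        elif tok != "'" and prev == "(":
            # First atom after an opening paren: decide how to label it.
            if quoted[-1]:
                kind = "data"
            elif tok.startswith('"'):
                kind = "string"
            elif NUMBER.match(tok):
                kind = "number"
            else:
                kind = "function"
            heads.append((tok, kind))
    return heads
```

With this sketch, the a in (a b c) and ( a b c) is labelled a function, the heads of ("foo") and (1 2) are labelled string and number, and the 1 in '(1 2) is labelled data.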

(a b c) is simply a list of three identifiers. Whether it will be evaluated as function a called with arguments b and c depends on the context. Also, note that 'something is equivalent to (quote something). And () is equivalent to '(), which is equivalent to nil, which is in fact equivalent to 'nil. Consider this example:

(setq x (quote (1 2)))
(print (eval (append (list '+) x))) ; prints 3

Here (1 2) is just a literal list (because it is quoted), and then we prepend + and evaluate as (+ 1 2)

Do we have any analogue for ', which apparently tells Lisp not to execute a group or symbol?

Yes, 'x is (quote x). However, note that quote can be redefined. In this example, the second quote is actually a function that increments a number:

(defun f (a) (+ a 1))
(setq quote #'f)
(print (eval (append '(funcall) (quote (quote 4))))) ; prints 5

I also do not really understand # stuff.

The # character introduces a special syntax. It tells the parser that the next element should be parsed in a special way, depending on the next character. It has five different forms:

  • #x1234 - Interpret as a hexadecimal number
  • #b1010 - Interpret as a binary number
  • #'x - equivalent to (function x)
  • #c(2 3) - Create a new complex number
  • #(4 5 6) - Create a new vector

There are a few more, for characters and multidimensional arrays, and I think some compilers introduce even more non-standard interpretations. I'm not sure what the specification says about it.
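
To make the "depends on the next character" idea concrete, here is a small Python sketch that names a # form from the character following the # and an optional decimal argument. The table is hypothetical and covers only forms mentioned in this thread; the names DISPATCH, KINDS and dispatch_kind are made up for the example and say nothing about how the PR scopes each form.

```python
import re

# "#", an optional decimal argument, then the dispatching character.
DISPATCH = re.compile(r"#(\d*)(.)", re.DOTALL)

# Hypothetical labels for forms mentioned in this thread; readers support more.
KINDS = {
    "\\": "character",
    "'": "function",
    "(": "vector",
    "b": "binary number",
    "o": "octal number",
    "x": "hexadecimal number",
    "r": "number in a given radix",
    "c": "complex number",
    "a": "array",
    "s": "struct",
    "+": "feature flag",
    "-": "feature flag",
    "|": "block comment",
}


def dispatch_kind(text):
    """Name the # form that text starts with, or None if there is none."""
    match = DISPATCH.match(text)
    if match is None:
        return None
    # Dispatching characters are looked up case-insensitively here.
    return KINDS.get(match.group(2).lower(), "unknown")
```

For instance, dispatch_kind("#x1234") gives "hexadecimal number" and dispatch_kind("#2A((1 2) (3 4))") gives "array".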


Regarding this PR, I'm a bit confused by the changes. What is the point of keeping the list of all built-in functions, while you instructed the highlighter to color every first element of a list as a function name?

@samuel-jimenez

# is an example of a macro character.
For anyone interested, the associated documentation is here.

@michaelblyons
Collaborator Author

It is impossible to know whether the first thing will be called as a function or not, without running the code. But I think the best solution would be to always highlight the first thing as a function name, unless the list is quoted, or the first thing is a string or a number.
[…] (a b c) is simply a list of three identifiers. Whether it will be evaluated as function a called with arguments b and c depends on the context.

Thanks. This is helpful. I'll roll with that.

Also, note that 'something is equivalent to (quote something).

Interesting. I think I will use the same scope for ' as for quote, then. Until someone disagrees.

And () is equivalent to '(), which is equivalent to nil, which is in fact equivalent to 'nil. Consider this example:

(setq x (quote (1 2)))
(print (eval (append (list '+) x))) ; prints 3

Here (1 2) is just a literal list (because it is quoted), and then we prepend + and evaluate as (+ 1 2)

Your example is a little scary. I don't really want to detect quote and push a no-initial-function context for its nested parens. If someone can then redefine quote to do something else, we obviously can't highlight it differently anyway.

Also, if (), '(), and 'nil are all the same as nil, should they have the constant.language scope? Or just () and nil? Or just nil?

The # character introduces a special syntax. It tells the parser that the next element should be parsed in a special way, depending on the next character. It has five different forms:

  • #x1234 - Interpret as a hexadecimal number
  • #b1010 - Interpret as a binary number
  • #'x - equivalent to (function x)
  • #c(2 3) - Create a new complex number
  • #(4 5 6) - Create a new vector

There are a few more, for characters and multidimensional arrays, and I think some compilers introduce even more non-standard interpretations. I'm not sure what the specification says about it.

I think I understand what they do now (h/t Samuel's link and CMU), but not necessarily what they are. I'm using a variety of scopes depending on the behavior of each one (minus a few that aren't done), but I can imagine someone being upset that the # in #( is scoped differently than #+. (As a counterexample, I don't think anyone would complain that #| is scoped like comments in other languages, and not like other dispatching macros in this one.)

Regarding this PR, I'm a bit confused by the changes. What is the point of keeping the list of all built-in functions, while you instructed the highlighter to color every first element of a list as a function name?

The scopes are subtly different (and obv depending on color scheme may look exactly the same) between built-in functions and things we assume are user-created functions. In my color scheme, for instance, built-ins are the same color but italic.

@michaelblyons michaelblyons changed the title [Lisp] WIP tweaks [Lisp] WIP Overhaul Jun 17, 2020
@FichteFoll
Collaborator

Just like seemingly everyone else, I have no experience with Lisp, but I will provide some general comments/advice after taking a really quick look at the changes and the comments in here.

Interesting. I think I will use the same scope for ' as for quote, then. Until someone disagrees.

Since ' uses special syntax unlike a function, one might consider it to be a keyword or keyword.operator. I don't think it can be redefined either, judging from the examples, but some confirmation would be nice.

Your example is a little scary. I don't really want to detect quote and push a no-initial-function context for its nested parens. If someone can then redefine quote to do something else, we obviously can't highlight it differently anyway.

Depending on how likely this is, it may actually be a "good enough" heuristic to roll with. At least until someone complains about it.

+1 on using constant.language on all nil equivalents, although I would exclude the ' from that.
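
One way to picture this suggestion is the following Python sketch. It finds nil and empty parentheses while leaving a leading ' outside the match. The pattern and the name nil_spans are hypothetical and are not the rule used in the PR.

```python
import re

# nil or an empty pair of parentheses, not glued to other symbol characters.
# A leading ' may come before the match but is deliberately left out of it.
NIL = re.compile(r"(?<![^\s()'])(?:nil|\(\s*\))(?![^\s()])", re.IGNORECASE)


def nil_spans(source):
    """Return the matched text and start offset of each nil-like token."""
    return [(m.group(0), m.start()) for m in NIL.finditer(source)]
```

Note that a pattern like this also matches an empty parameter list, such as the () in (defun f () ...), so a real rule would have to decide whether that is acceptable.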

I'm using a variety of scopes depending on the behavior of each one

LGTM. Most of these have a punctuation character (the numeric ones and #()), while #' could be seen as a function? No idea about this.

@jwortmann
Contributor

Also, if (), '(), and 'nil are all the same as nil, should they have the constant.language scope? Or just () and nil? Or just nil?

I have never used Lisp either, but I would use the constant.language scope only for nil, but not for (), even though they have the same meaning. As Sublime always automatically adds the closing ) after typing an opening parenthesis, that would result in constant color changes for those while typing. I would imagine that to be a pretty unpleasant experience after a while, especially because parentheses seem to be used quite often in Lisp...

@FichteFoll
Collaborator

Note that other functional languages, such as Haskell and Clojure, already scope () as a constant, though I agree that you arguably use parentheses a lot more often in Lisp.

@michaelblyons
Collaborator Author

would result in constant color changes for those while typing.

Haha. I don't notice this because I have my active bracket settings bold/glow/fg-color (rather than underline). That's a very valid point you bring up. I don't expect it would be that unnerving in the circumstances, but ymmv.

Since ' uses special syntax unlike a function, one might consider it to be a keyword or keyword.operator.

No objection. My code highlighting right now is very function-heavy.

I don't think it can be redefined either, judging from the examples, but some confirmation would be nice.

Agreed.

[RE quote context] Depending on how likely this is, it may actually be a "good enough" heuristic to roll with. At least until someone complains about it.

Yeah, but how many special functions do we need to do special things for? I'll start to add the handled and unhandled bits to OP for reference.

@s-clerc

s-clerc commented Jul 13, 2020

I think most of these changes are very welcome. See my PR for changes related to the scoping of ( and ) characters.

In terms of scoping () differently, I strongly support this change. I've already changed it in my personal settings and it hasn't been irritating. Apart from the consistency argument, the other argument in favour is that you might accidentally leave a stray () somewhere, and it would be difficult to see in themes which grey out block-scoping characters, although it is true that nil vs () is generally used to communicate semantic differences.

I would advise against trying to special-case quote, except maybe as a control structure. 99% of Lisp code will use ', so if quote is used, it's probably for a special/weird reason, and so I think we shouldn't change the scoping. In terms of how quasiquoting works, I think for the most part normal scoping should apply so that things like functions are still highlighted appropriately, while better-designed Lisp-specific styles could perhaps apply a background colour or something to the quoted forms. In lists properly quoted with ', there shouldn't really be any highlighting of things like functions, but I'm not 100% sure about this. It should also be possible to determine when we're in a quoted form for similar background highlighting.

I really like the idea of highlighting functions/macros in s-expressions, but I'm not sure if it's a good idea due to how common false positives will be. Regardless, I think it's more important to highlight the things below than random function calls.

Concrete additions:

  • +constant+ should be scoped as constants.
  • In package:symbol, the package, : and symbol parts should be scoped differently from each other.
  • Similarly, for package::symbol, which indicates the use of a private symbol, the scope should be similar IMO but differentiable.
  • Because the let and let* special forms are common enough for most Lisps, they should have special scoping so that the variable definitions are scoped properly. So in the example below, def and dec should be marked as variable definitions and declarations (assuming such a distinction is made):
(let ((def 32) 
        dec)
  normal code here)
  • Similarly for the flet and labels special forms.
  • "function" calls prefixed with...
    • define- should be scoped like other definition things. I would doubt people would define things weirdly here.
    • def- similarly, though I'm not entirely sure because there could be some false positives, albeit at a very low rate, as only about 500 words in English start with that sequence.
    • with- should be scoped as a control structure. I'm not sure tbh, but it's used fairly often to add new bindings to the scope, a little like with in JS.
    • make- should be scoped as a constructor function.
  • "function" calls ending with...
    • a ! should be scoped as a variable.function.mutating.lisp or similar. These indicate functions which will change the value in place.
    • a ? or -p or p except if it ends with ship should be scoped as a variable.function.predicate.lisp or similar. Predicates are functions which return boolean values.
  • variables ending with the above should also be scoped as variable.predicate.lisp or similar.
    For the above, the problem is that in Common Lisp, predicates end with just p or -p (in-set-p vs insidep) so we exclude ship as it's the most common English word and suffix ending with p lol. I just redid the maths, and I was wrong. I don't think there is a reasonable way to do this without it seeming arbitrary. The best regex is probably [^aeiou]p$ but I think that it might be too inconsistent to be useful, although applying that to the CL symbol index seems to yield good results. What would be good is if we used a dictionary approach excluding words like ship etc. although that seems like it will make the regex very big. I don't really know if it'll slow ST down or not.
  • It should be possible to differentiate defmacros from defuns, right now they're scoped the same way. I understand most schemes don't differentiate, so IMO a suffix should be added to the end of defmacro's scopes to allow for lisp-specific themes to colour them differently.
  • Add my PR mentioned above to fix scoping of ( and )
  • Add Scheme specific things like... (right now I feel it's too Common Lisp focused)
    • define-syntax, syntax-case, syntax-rule and syntax as forms similar to defmacro. I'm not too sure about the last one because I don't use Scheme, but it seems similar enough.
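
The naming conventions in the list above can be tried out with a short Python sketch. Everything here is hypothetical: the label strings, the function names, and the choice to read "def-" as "any name starting with def" are assumptions made for the example, not scopes or rules from the PR. The suffix pattern is the [^aeiou]p$ idea quoted above, extended with ? and -p.

```python
import re

# Suffix test from the list above: ?, -p, or a non-vowel followed by p.
PREDICATE_SUFFIX = re.compile(r"(?:\?|-p|[^aeiou]p)$")


def split_package(name):
    """Split package:symbol or package::symbol into (package, sep, symbol)."""
    package, sep, symbol = name.partition(":")
    if not sep:
        return "", "", name
    if symbol.startswith(":"):
        return package, "::", symbol[1:]
    return package, ":", symbol


def classify(name):
    """Return hypothetical labels for a symbol name, following the list above."""
    package, sep, symbol = split_package(name.lower())
    labels = []
    if sep == "::":
        labels.append("private-symbol")
    elif sep == ":" and package:
        labels.append("external-symbol")
    if len(symbol) > 2 and symbol.startswith("+") and symbol.endswith("+"):
        labels.append("constant")
    # Assumption: "def-" is read as any name starting with "def".
    if symbol.startswith("def"):
        labels.append("definition")
    elif symbol.startswith("with-"):
        labels.append("control")
    elif symbol.startswith("make-"):
        labels.append("constructor")
    if symbol.endswith("!"):
        labels.append("mutating")
    elif PREDICATE_SUFFIX.search(symbol):
        labels.append("predicate")
    return labels
```

The sketch also shows why the suffix rule can feel arbitrary: listp and in-set-p are labelled as predicates and ship is not, but insidep is missed and help is picked up as a false positive. Likewise, a name such as default would be labelled a definition.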

@michaelblyons
Copy link
Collaborator Author

@s-clerc Thanks! Sorry I missed @-ing you with the first set of changes. I'll probably end up having questions about some of the specifics when I dig into them.

@s-clerc

s-clerc commented Jul 13, 2020

No problem! Honestly you’re doing me a huge favour with all this work as it is.

@s-clerc

s-clerc commented Jul 13, 2020

  • One thing I think we should add, but I'm not sure about the feasibility is a counterpart to the stray bracket indication, a missing bracket indication.
    The way it should work IMO is that if a form isn't closed, the bracket marking the start of the form is indicated in red. While I agree that ideally, we'd mark exactly where it's missing I'm not sure if that's possible:
; bracket line below in the first column is highlighted in my proposal
(let ((test x))
   (do-whatever) ; bracket missing here, we want a marker here ideally

(let ((test y))
    (do-nothing)) ; but theoretically, this could be contained within the form above

To achieve an at-end bracket warning, we'd need to consider both line-breaks and indentation to guess where people meant the form to end.
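
A minimal Python sketch of the "mark the opening bracket" variant, under one loudly stated assumption that is not from this thread: a ( in column 0 is taken to start a new top-level form, so anything still open at that point is reported as unclosed. The name find_unbalanced is hypothetical, and the sketch ignores strings and character literals that contain parentheses or semicolons.

```python
def find_unbalanced(source):
    """Return (unclosed, stray) lists of (line, column) positions.

    Lines are 1-based and columns are 0-based.
    """
    open_parens, unclosed, stray = [], [], []
    for lineno, line in enumerate(source.splitlines(), start=1):
        code = line.split(";", 1)[0]  # naive comment stripping
        for col, ch in enumerate(code):
            if ch == "(":
                if col == 0 and open_parens:
                    # Assumed convention: a "(" in column 0 starts a new
                    # top-level form, so anything still open was never closed.
                    unclosed.extend(open_parens)
                    open_parens.clear()
                open_parens.append((lineno, col))
            elif ch == ")":
                if open_parens:
                    open_parens.pop()
                else:
                    stray.append((lineno, col))
    return unclosed + open_parens, stray
```

On the example above, this reports the opening bracket of the first let, at line 1, column 0, as unclosed. The column-0 assumption also catches cases a plain stack would miss, where a missing bracket in one form is cancelled out by an extra one in the next.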

@wbond
Member

wbond commented Jul 21, 2020

I'm not sure what the state of this is, or how much of an improvement this is over the existing lisp syntax.

That said, if we are at a point where it provides some significant improvements, we could treat it like we did Haskell in #2225 and merge the current state, which will hopefully get feedback from other users and allow future improvements.

It all sort of depends on if this is a superset of the existing syntax, or if there are significant regressions.

@michaelblyons
Collaborator Author

My personal non-Lisp-programmer opinion:

  • let statements are currently regressed.

  • Related very strongly to the let case, I'll quote @s-clerc:

    I really like the idea of highlighting functions/macros in s-expressions, but I'm not sure if it's a good idea due to how common false positives will be. Regardless, I think it's more important to highlight the things below than random function calls.

Everything else I'm aware of is in a better or equivalent state.

@s-clerc

s-clerc commented Jul 26, 2020

There seems to be a bug in the stray bracket detection; in this file, on line 300 LispWIP will mark the last bracket as a stray which is incorrect and a regression as no stray is reported by Lisp.

@deathaxe
Collaborator

Closing as superseded by #3896 as everything of this PR is included and has evolved further.

@deathaxe deathaxe closed this Feb 10, 2024
deathaxe added a commit that referenced this pull request Apr 19, 2024
Resolves #1968 

Supersedes #2387
Supersedes #2312

Inspired by #2387 

This PR actually started with #2387 but ended up being a complete rewrite. 
Hence opening a new PR seems more reasonable.

It uses rules from https://www.lispworks.com/documentation/common-lisp.html

7 participants