Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Haskell] Rewrite Haskell syntax #2225

Merged
merged 19 commits into from Jul 21, 2020
Merged

Conversation

FichteFoll
Copy link
Collaborator

@FichteFoll FichteFoll commented Dec 3, 2019

After working with Haskell a bit, I couldn't help but notice how poor the syntax highlighting for it was. It got the job done, but there were and still are several things blatantly missing or using the wrong scopes.

This PR is in a mergeable state. I'm not done or satisfied with it, but I submitted it anyway to collect some feedback if people happen to stumble upon it. I am by no means a Haskell pro. However, this is already a significant improvement over the current syntax.

Notably, the following things still need attention:

  • class definitions
  • type definitions
  • lambdas & \case
  • curlies & semicolon
  • records
  • separate subscopes for known operators, such as assignment
  • | and -> are actual keywords, but context-dependant. Others?
  • unicode operators
  • accessors
  • (builtin) types
  • import scope names
  • more I don't remember or know about yet

Fixes #214.

- Use variables
- Highlight '*' (and combinations) as operator
- Add punctuation scope to infix notation
- Scope non-infix notation as `keyword.operator`
- Add proper scopes to control keywords
- Add proper scopes to declarations
- Use proper scope names for entities in declarations
Also highlight superfluous characters.
Not final due to sublimehq#1842 being
unresolved, but still an improvement.
@FichteFoll FichteFoll changed the title [Haskell] [WIP] Rewrite Haskell syntax [Haskell] Rewrite Haskell syntax Dec 3, 2019
@@ -93,21 +102,28 @@
--
a a = (+) a 2
-- ^ keyword.operator.haskell
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be keyword.operator.assignment.haskell? Also perhaps keyword.operator.declaration.haskell.

# note that Haskell permits the definition of new operators
# which can be nearly any string of punctuation characters,
# such as $%^&*.
operator_infix: '[*|!%$?~+:\-.=</>\\]+'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With extensions, Haskell also permits arbitrary unicode operators, so this regex is actually overly-specific.

Copy link
Collaborator Author

@FichteFoll FichteFoll Dec 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly arbitrary, I believe. I intended to go through the list I linked in OP and add those to this variable.

# such as $%^&*.
operator_infix: '[*|!%$?~+:\-.=</>\\]+'
operator_parens: '(?:{{operator_infix}}|,+)'
module_name: (?:[A-Z][A-Za-z._']*)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unicode should probably be considered here as well.

Copy link
Collaborator Author

@FichteFoll FichteFoll Dec 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would I differentiate a unicode name and a unicode operator?

[:upper:][:alnum:] ? Do you happen to know if there is a lexical description of the Haskell syntax somewhere?

I did most on this based on my 2 day experience and some guides on areas I haven't dug into yet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would I differentiate a unicode name and a unicode operator?

Hmm, I'll see if I can find a more comprehensive description of Haskell's lexical syntax in this area. It's entirely possible that you can't. In the Scala mode, for example, I gave up entirely on this and just matched identifiers as… identifiers, since there's no meaningful distinction. Haskell does distinguish both at definition site and at call site, but only in a context-sensitive fashion:

  • At definition site, parens/backticks may distinguish names from operators, as well as infix usage of the type-declared name (this is the context-sensitive bit)
  • At call site, parens/backticks may also be found, but also the use of an identifier in infix position rather than prefix, which should theoretically be parseable

There's got to be some distinguishing lexical feature though, which should give us something to hang our hats on. I'll do some digging

@@ -198,56 +305,55 @@ contexts:
- match: '-\}'
scope: punctuation.definition.comment.end.haskell
pop: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems insufficient. For example, it would not catch the following:

Prelude> {- {- test -} 42 -} 42
42

-- ^ constant.numeric.integer.decimal.haskell
a a = (-) a 2
-- ^ keyword.operator.haskell
-- ^^^ entity.name.function.infix.haskell
-- ^^^ variable.function.infix.haskell
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the variable.function scope really cover the whole thing? I generally think of the parentheses more as operators in this context (similar to the backticks in the ascii case) and the contents as the actual function.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same problem as #2145 (comment).

-- ^^^^^^^^^ keyword.control.import.haskell
-- ^^^^^^^^^^^^^^^^^^^ support.other.module.haskell
-- ^^ keyword.control.import.haskell
-- ^^^^^^^^^^^^^ support.other.module.haskell
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This scoping is inconsistent with Scala, which has a similar import mechanism and which is itself consistent with JavaScript, Python, and C#. For reference:

   import Data.Vector.{Mutable => MutableVector}
// ^^^^^^ meta.import.scala keyword.other.import.scala
//        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ meta.import.scala
//                    ^ meta.import.scala meta.import.selector.scala punctuation.section.group.begin.scala
//                     ^^^^^^^^^^^^^^^^^^^^^^^^^ meta.import.scala meta.import.selector.scala
//                             ^^ meta.import.scala meta.import.selector.scala keyword.operator.arrow.scala

-- ^^^^^^^^^^^^^^^ support.other.module.haskell
-- ^^^^^^^^^ meta.declaration.exports.haskell
-- ^ punctuation.section.group.begin.haskell
-- ^^^^^^^ variable.function.haskell
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do like scoping this as variable.function, but it is inconsistent with other import scopes.

test a = case a of
-- ^^^^ keyword.control.conditional.select.haskell
-- ^^ keyword.control.conditional.select.haskell
Nothing -> 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the arrows scoped? They should be.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are matched by the operator pattern.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's probably worth scoping them specifically, since operators should not be scoped as keyword, but -> should.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, ->should be a proper keyword, much like guards are. It is used in many different situations and you can't assign to it.

Haskell/syntax_test_haskell.hs Show resolved Hide resolved
@@ -186,3 +260,25 @@
0XdeafBEEF42
-- ^^^^^^^^^^^^ constant.numeric.integer.hexadecimal
-- ^^ punctuation.definition.numeric.base
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably worth also adding a test for the 0x prefix, just to ensure the case insensitivity is working.

@FichteFoll
Copy link
Collaborator Author

@djspiewak please note that none of the things you commented on have been changed in the PR, but I appreciate the comments. I'll translate them into todo items for the OP.

@FichteFoll
Copy link
Collaborator Author

Seeing how I probably won't be able to return back to this any time soon, I'm removing the draft status. As mentioned in the OP, the PR is mergeable and already an improvement, but there is much more to be improved on.

Copy link
Contributor

@moodmosaic moodmosaic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, much more precise than the current syntax. ❤️ 😍 💯 🥇 Thank you so much, @FichteFoll.

See comments about meta.group.haskell and meta.sequence.haskell scopes.

Haskell/Haskell.sublime-syntax Show resolved Hide resolved
Haskell/Haskell.sublime-syntax Show resolved Hide resolved
@moodmosaic
Copy link
Contributor

Notably, the following things still need attention:

After merging those latest pull requests, apart from

the rest look good enough after doing a comparison with @lierdakil's atom-haskell/language-haskell grammar (which I consider one of the most precise grammars for Haskell).

@lierdakil
Copy link

Fwiw, I think both atom-haskell/language-haskell and Sublime's Haskell grammar are more or less based on the same TextMate grammar originally. Which in theory means that it should be possible to backport atom-haskell changes here. Or alternatively, convert language-haskell to tmlanguage. Although I think sublime grammar is strictly more powerful than tmlanguage, from which Haskell grammar could benefit a lot -- my implementation has a few limitations that stem from the lack of direct control over context, which I hear sublime's grammar engine allows.

@FichteFoll
Copy link
Collaborator Author

I'm pretty sure Sublime's grammar was copied from Textmate and basically never touched since then.

Trying to backport atom's grammer doesn't look like a good option because of how convoluted its regular expressions are and I'd like to take advantage of ST's power with the context stack and branching where feasible, which is better done from scratch. Thanks for the heads-up, however.

On the PR at hand, I incorporated the changes suggested by @moodmosaic. I still believe record syntax to be pretty broken, as it's basically unhandled right now, but the syntax doesn't break on it and has some highlighting at least.

@moodmosaic
Copy link
Contributor

moodmosaic commented Feb 26, 2020

@FichteFoll

I still believe record syntax to be pretty broken, as it's basically unhandled right now, but the syntax doesn't break on it and has some highlighting at least.

There's only one thing missing to fix the record syntax, just don't match the first field and be done:

image

As you can see above, we're matching the first field as meta.function.type-declaration.haskell but if we can avoid that, we're all set.

@AllanZyne
Copy link

This syntax file is what I wrote when I was interested in Haskell. It might be helpful as reference for unicode operator, which I studied from The Haskell 2010 language report by some effort.

  id_char: '[\p{Ll}\p{Lu}\p{Lt}\p{Nd}_'']' 
  con_id: '[\p{Lu}\p{Lt}]{{id_char}}*'
  var_id: '[\p{Ll}_]{{id_char}}*'
  reserved_id : 'case|class|data|default|deriving|do|else|foreign|if|import|in|infix|infixl|infixr|instance|let|module|newtype|of|then|type|where|_'
  mod_id: (?:(?:{{con_id}}.)*{{con_id}})
  symbol: '[\p{S}\p{P}&&[^(),;\[\]`{}_"'']]'

  qcon_id: '{{con_id}}(?:\.{{con_id}})*'
  qvar_id: (?:{{qcon_id}}\.)?{{var_id}}
  qvar_sym: (?:{{mod_id}}\.)?{{symbol}}+

  qvar: \b(?:{{qvar_id}}|\({{qvar_sym}}\))
  qvar_op: \b({{qvar_sym}}|`{{qvar_id}}`)

  operator_char: (?:[\p{S}\p{P}&&[^(),;\[\]`{}_"']])
  operator: '{{operator_char}}+'
  operator_fun: (?:\((?!--+\)){{operator}}\))

@moodmosaic
Copy link
Contributor

moodmosaic commented Mar 1, 2020

@FichteFoll

There's only one thing missing to fix the record syntax, just don't match the first field and be done:

What I mean is that we go from this:
image

To this:
image


Here's the above code

data Post =
  Post {
      postId :: PostId
    , postId :: PostId
    , postUserId :: UserId
    , postTitle :: Text
    , postBody :: Text
    , postCreatedAt :: UTCTime
    } deriving (Eq, Ord, Show)

@moodmosaic
Copy link
Contributor

Ping

@FichteFoll
Copy link
Collaborator Author

It's still on my radar, but so are many other things. I don't have any updates to share atm.

@moodmosaic
Copy link
Contributor

@FichteFoll, @wbond, this should be merged, even without the improvement described in #2225 (comment). If there's anything I can do to help further, let me know.

@wbond
Copy link
Member

wbond commented Jul 20, 2020

@FichteFoll So you are still of the opinion that this is a pretty significant step forward?

@FichteFoll
Copy link
Collaborator Author

Definitely, it's mergeable and a significant improvement. The unchecked items in the OP are merely a checklist of things to also check out that I haven't gotten around to yet and probably won't in the near future either. They are no blockers.

@wbond wbond merged commit 3a5b387 into sublimehq:master Jul 21, 2020
@wbond
Copy link
Member

wbond commented Jul 21, 2020

Thanks for the work on writing this and the reviews!

@FichteFoll FichteFoll deleted the pr/haskell/misc branch July 21, 2020 17:04
@wbond wbond mentioned this pull request Jul 21, 2020
36 tasks
moodmosaic added a commit to moodmosaic/SublimeHaskell that referenced this pull request Sep 8, 2020
- Match $ in keyword.operator.haskell
- Match via in meta.deriving.haskell
- Match instance declaration under newtype declaration

    -- | Derive 'Read' / 'Show' using @DerivingVia@.
    newtype Quiet a =
      Quiet {
          unQuiet :: a
        }

    instance (Generic a, QShow (Rep a)) => Show (Quiet a) where
      showsPrec n =
        qshowsPrec n . unQuiet

    instance (Generic a, QRead (Rep a)) => Read (Quiet a) where
      readPrec =
        fmap Quiet qreadPrec
- Match @# in keyword.operator.haskell
- Match @# in entity.name.function.infix.haskell
- Add scope support.function.prelude.haskell
- Sort support.function.prelude.haskell
- Base support.function.prelude.haskell on language-haskell haskell.cson
  https://github.com/atom-haskell/language-haskell/blob/e036e449909816e616b880157e2703e70fc9b5df/grammars/haskell.cson
- Sync syntax_test_haskell.hs with upstream
  https://github.com/sublimehq/Packages/blob/549ee49a38d4b77bcb1df068ef2c122a3bec75e7/Haskell/syntax_test_haskell.hs
- Match type signatures that are split in newlines
- Set name as Haskell (λ)
- Test OPTIONS_HADDOCK
- Test via (deriving)
- Test type signatures that are split in newlines
- Test keyword.operator.haskell (*, @, and #)
- Test entity.name.function.infix.haskell (:@, and :#)
mitranim pushed a commit to mitranim/Packages that referenced this pull request Mar 25, 2022
* [Haskell] Rewrite operator matching

- Use variables
- Highlight '*' (and combinations) as operator
- Add punctuation scope to infix notation
- Scope non-infix notation as `keyword.operator`

* [Haskell] Update keyword matching

- Add proper scopes to control keywords
- Add proper scopes to declarations
- Use proper scope names for entities in declarations

* [Haskell] Restructure with contexts

* [Haskell] Remove usages of double quoted scalars

* [Haskell] Simplify string matches

Also highlight superfluous characters.

* [Haskell] Reduce max line length

* [Haskell] Match groups

* [Haskell] Adjust scopes for imports

Not final due to sublimehq#1842 being
unresolved, but still an improvement.

* [Haskell] Match lists

* [Haskell] Correctly match idents with trailing '

* [Haskell] More gracious infix operator matching

* [Haskell] Adjust keyword scopes to recent standards

* [Haskell] match OPTIONS_HADDOCK

Same as https://github.com/sublimehq/Packages/pull/2270/files

* [Haskell] match deriving (..) via (..)

Same as https://github.com/sublimehq/Packages/pull/2271/files

* [Haskell] match @ and # in keyword.operator.haskell

Same as
- https://github.com/sublimehq/Packages/pull/2272/files
- https://github.com/sublimehq/Packages/pull/2273/files


Co-Authored-By: Nikos Baxevanis <nikos.baxevanis@gmail.com>

* [Haskell] match deriving instance (..)

* [Haskell] Match functions from the prelude

Based on https://github.com/atom-haskell/language-haskell/blob/e036e449909816e616b880157e2703e70fc9b5df/grammars/haskell.cson#L1306-L1307

Co-Authored-By: Nikos Baxevanis <nikos.baxevanis@gmail.com>

* [Haskell] Add tests for `via` derives

* [Haskell] match deriving instance (..) without breaking data deriving

This fixes a bug introduced via 0d36dd1

Co-authored-by: Nikos Baxevanis <nikos.baxevanis@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Haskell] "&" does not get highlighted
6 participants