New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Haskell] Rewrite Haskell syntax #2225
Conversation
- Use variables - Highlight '*' (and combinations) as operator - Add punctuation scope to infix notation - Scope non-infix notation as `keyword.operator`
- Add proper scopes to control keywords - Add proper scopes to declarations - Use proper scope names for entities in declarations
Also highlight superfluous characters.
Not final due to sublimehq#1842 being unresolved, but still an improvement.
@@ -93,21 +102,28 @@ | |||
-- | |||
a a = (+) a 2 | |||
-- ^ keyword.operator.haskell |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be keyword.operator.assignment.haskell
? Also perhaps keyword.operator.declaration.haskell
.
Haskell/Haskell.sublime-syntax
Outdated
# note that Haskell permits the definition of new operators | ||
# which can be nearly any string of punctuation characters, | ||
# such as $%^&*. | ||
operator_infix: '[*|!%$?~+:\-.=</>\\]+' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With extensions, Haskell also permits arbitrary unicode operators, so this regex is actually overly-specific.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not exactly arbitrary, I believe. I intended to go through the list I linked in OP and add those to this variable.
# such as $%^&*. | ||
operator_infix: '[*|!%$?~+:\-.=</>\\]+' | ||
operator_parens: '(?:{{operator_infix}}|,+)' | ||
module_name: (?:[A-Z][A-Za-z._']*) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unicode should probably be considered here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would I differentiate a unicode name and a unicode operator?
[:upper:][:alnum:]
? Do you happen to know if there is a lexical description of the Haskell syntax somewhere?
I did most on this based on my 2 day experience and some guides on areas I haven't dug into yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would I differentiate a unicode name and a unicode operator?
Hmm, I'll see if I can find a more comprehensive description of Haskell's lexical syntax in this area. It's entirely possible that you can't. In the Scala mode, for example, I gave up entirely on this and just matched identifiers as… identifiers, since there's no meaningful distinction. Haskell does distinguish both at definition site and at call site, but only in a context-sensitive fashion:
- At definition site, parens/backticks may distinguish names from operators, as well as infix usage of the type-declared name (this is the context-sensitive bit)
- At call site, parens/backticks may also be found, but also the use of an identifier in infix position rather than prefix, which should theoretically be parseable
There's got to be some distinguishing lexical feature though, which should give us something to hang our hats on. I'll do some digging
@@ -198,56 +305,55 @@ contexts: | |||
- match: '-\}' | |||
scope: punctuation.definition.comment.end.haskell | |||
pop: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems insufficient. For example, it would not catch the following:
Prelude> {- {- test -} 42 -} 42
42
-- ^ constant.numeric.integer.decimal.haskell | ||
a a = (-) a 2 | ||
-- ^ keyword.operator.haskell | ||
-- ^^^ entity.name.function.infix.haskell | ||
-- ^^^ variable.function.infix.haskell |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the variable.function
scope really cover the whole thing? I generally think of the parentheses more as operators in this context (similar to the backticks in the ascii case) and the contents as the actual function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same problem as #2145 (comment).
-- ^^^^^^^^^ keyword.control.import.haskell | ||
-- ^^^^^^^^^^^^^^^^^^^ support.other.module.haskell | ||
-- ^^ keyword.control.import.haskell | ||
-- ^^^^^^^^^^^^^ support.other.module.haskell |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This scoping is inconsistent with Scala, which has a similar import mechanism and which is itself consistent with JavaScript, Python, and C#. For reference:
import Data.Vector.{Mutable => MutableVector}
// ^^^^^^ meta.import.scala keyword.other.import.scala
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ meta.import.scala
// ^ meta.import.scala meta.import.selector.scala punctuation.section.group.begin.scala
// ^^^^^^^^^^^^^^^^^^^^^^^^^ meta.import.scala meta.import.selector.scala
// ^^ meta.import.scala meta.import.selector.scala keyword.operator.arrow.scala
-- ^^^^^^^^^^^^^^^ support.other.module.haskell | ||
-- ^^^^^^^^^ meta.declaration.exports.haskell | ||
-- ^ punctuation.section.group.begin.haskell | ||
-- ^^^^^^^ variable.function.haskell |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do like scoping this as variable.function
, but it is inconsistent with other import scopes.
test a = case a of | ||
-- ^^^^ keyword.control.conditional.select.haskell | ||
-- ^^ keyword.control.conditional.select.haskell | ||
Nothing -> 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the arrows scoped? They should be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are matched by the operator pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's probably worth scoping them specifically, since operators should not be scoped as keyword
, but ->
should.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, ->
should be a proper keyword, much like guards are. It is used in many different situations and you can't assign to it.
@@ -186,3 +260,25 @@ | |||
0XdeafBEEF42 | |||
-- ^^^^^^^^^^^^ constant.numeric.integer.hexadecimal | |||
-- ^^ punctuation.definition.numeric.base |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably worth also adding a test for the 0x
prefix, just to ensure the case insensitivity is working.
@djspiewak please note that none of the things you commented on have been changed in the PR, but I appreciate the comments. I'll translate them into todo items for the OP. |
Seeing how I probably won't be able to return back to this any time soon, I'm removing the draft status. As mentioned in the OP, the PR is mergeable and already an improvement, but there is much more to be improved on. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, much more precise than the current syntax. ❤️ 😍 💯 🥇 Thank you so much, @FichteFoll.
See comments about meta.group.haskell
and meta.sequence.haskell
scopes.
Same as - https://github.com/sublimehq/Packages/pull/2272/files - https://github.com/sublimehq/Packages/pull/2273/files Co-Authored-By: Nikos Baxevanis <nikos.baxevanis@gmail.com>
After merging those latest pull requests, apart from
the rest look good enough after doing a comparison with @lierdakil's atom-haskell/language-haskell grammar (which I consider one of the most precise grammars for Haskell). |
Fwiw, I think both atom-haskell/language-haskell and Sublime's Haskell grammar are more or less based on the same TextMate grammar originally. Which in theory means that it should be possible to backport atom-haskell changes here. Or alternatively, convert language-haskell to tmlanguage. Although I think sublime grammar is strictly more powerful than tmlanguage, from which Haskell grammar could benefit a lot -- my implementation has a few limitations that stem from the lack of direct control over context, which I hear sublime's grammar engine allows. |
Based on https://github.com/atom-haskell/language-haskell/blob/e036e449909816e616b880157e2703e70fc9b5df/grammars/haskell.cson#L1306-L1307 Co-Authored-By: Nikos Baxevanis <nikos.baxevanis@gmail.com>
I'm pretty sure Sublime's grammar was copied from Textmate and basically never touched since then. Trying to backport atom's grammer doesn't look like a good option because of how convoluted its regular expressions are and I'd like to take advantage of ST's power with the context stack and branching where feasible, which is better done from scratch. Thanks for the heads-up, however. On the PR at hand, I incorporated the changes suggested by @moodmosaic. I still believe record syntax to be pretty broken, as it's basically unhandled right now, but the syntax doesn't break on it and has some highlighting at least. |
There's only one thing missing to fix the record syntax, just don't match the first field and be done: As you can see above, we're matching the first field as |
This syntax file is what I wrote when I was interested in Haskell. It might be helpful as reference for unicode operator, which I studied from The Haskell 2010 language report by some effort.
|
This fixes a bug introduced via 0d36dd1
Ping |
It's still on my radar, but so are many other things. I don't have any updates to share atm. |
@FichteFoll, @wbond, this should be merged, even without the improvement described in #2225 (comment). If there's anything I can do to help further, let me know. |
@FichteFoll So you are still of the opinion that this is a pretty significant step forward? |
Definitely, it's mergeable and a significant improvement. The unchecked items in the OP are merely a checklist of things to also check out that I haven't gotten around to yet and probably won't in the near future either. They are no blockers. |
Thanks for the work on writing this and the reviews! |
- Match $ in keyword.operator.haskell - Match via in meta.deriving.haskell - Match instance declaration under newtype declaration -- | Derive 'Read' / 'Show' using @DerivingVia@. newtype Quiet a = Quiet { unQuiet :: a } instance (Generic a, QShow (Rep a)) => Show (Quiet a) where showsPrec n = qshowsPrec n . unQuiet instance (Generic a, QRead (Rep a)) => Read (Quiet a) where readPrec = fmap Quiet qreadPrec - Match @# in keyword.operator.haskell - Match @# in entity.name.function.infix.haskell - Add scope support.function.prelude.haskell - Sort support.function.prelude.haskell - Base support.function.prelude.haskell on language-haskell haskell.cson https://github.com/atom-haskell/language-haskell/blob/e036e449909816e616b880157e2703e70fc9b5df/grammars/haskell.cson - Sync syntax_test_haskell.hs with upstream https://github.com/sublimehq/Packages/blob/549ee49a38d4b77bcb1df068ef2c122a3bec75e7/Haskell/syntax_test_haskell.hs - Match type signatures that are split in newlines - Set name as Haskell (λ) - Test OPTIONS_HADDOCK - Test via (deriving) - Test type signatures that are split in newlines - Test keyword.operator.haskell (*, @, and #) - Test entity.name.function.infix.haskell (:@, and :#)
* [Haskell] Rewrite operator matching - Use variables - Highlight '*' (and combinations) as operator - Add punctuation scope to infix notation - Scope non-infix notation as `keyword.operator` * [Haskell] Update keyword matching - Add proper scopes to control keywords - Add proper scopes to declarations - Use proper scope names for entities in declarations * [Haskell] Restructure with contexts * [Haskell] Remove usages of double quoted scalars * [Haskell] Simplify string matches Also highlight superfluous characters. * [Haskell] Reduce max line length * [Haskell] Match groups * [Haskell] Adjust scopes for imports Not final due to sublimehq#1842 being unresolved, but still an improvement. * [Haskell] Match lists * [Haskell] Correctly match idents with trailing ' * [Haskell] More gracious infix operator matching * [Haskell] Adjust keyword scopes to recent standards * [Haskell] match OPTIONS_HADDOCK Same as https://github.com/sublimehq/Packages/pull/2270/files * [Haskell] match deriving (..) via (..) Same as https://github.com/sublimehq/Packages/pull/2271/files * [Haskell] match @ and # in keyword.operator.haskell Same as - https://github.com/sublimehq/Packages/pull/2272/files - https://github.com/sublimehq/Packages/pull/2273/files Co-Authored-By: Nikos Baxevanis <nikos.baxevanis@gmail.com> * [Haskell] match deriving instance (..) * [Haskell] Match functions from the prelude Based on https://github.com/atom-haskell/language-haskell/blob/e036e449909816e616b880157e2703e70fc9b5df/grammars/haskell.cson#L1306-L1307 Co-Authored-By: Nikos Baxevanis <nikos.baxevanis@gmail.com> * [Haskell] Add tests for `via` derives * [Haskell] match deriving instance (..) without breaking data deriving This fixes a bug introduced via 0d36dd1 Co-authored-by: Nikos Baxevanis <nikos.baxevanis@gmail.com>
After working with Haskell a bit, I couldn't help but notice how poor the syntax highlighting for it was. It got the job done, but there were and still are several things blatantly missing or using the wrong scopes.
This PR is in a mergeable state. I'm not done or satisfied with it, but I submitted it anyway to collect some feedback if people happen to stumble upon it. I am by no means a Haskell pro. However, this is already a significant improvement over the current syntax.
Notably, the following things still need attention:
\case
|
and->
are actual keywords, but context-dependant. Others?Fixes #214.