Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add to GitHub #14

Closed
lpil opened this issue Feb 24, 2022 · 22 comments
Closed

Add to GitHub #14

lpil opened this issue Feb 24, 2022 · 22 comments

Comments

@lpil
Copy link
Member

lpil commented Feb 24, 2022

Looks like we may be able to add this to GitHub to get better support for Gleam.

https://twitter.com/importantshock/status/1496884996317011968

Create an issue on your repo and CC me (/patrickt on GitHub) or /dcreager.

Is this something the grammar and queries are ready for?

@the-mikedavis
Copy link
Member

We'll need tags.scm queries (see elixir-lang/tree-sitter-elixir#18 and tree-sitter/tree-sitter#1649). Otherwise I think this parser is in great shape :)

I can take a look at adding these, I bet it wouldn't be too tough given how the nodes are structured 👍

@J3RN
Copy link
Member

J3RN commented Feb 24, 2022

Hi @lpil! Sorry for the delay. As @the-mikedavis said, the parser is in pretty good shape. I have a few items on my shortlist:

  • Move corpus under test (I think this can just be done with no further changes; just an organization item)
  • Rename type_constructor* to data_constructor*; it's a misnomer as-is

New one:

And also tags.scm, but it looks like @the-mikedavis got that one 😁

Somewhat related, but I think we should move the canonical version of this repository to under the github.com/gleam-lang organization once those are done, but before we ask for it to be used with GitHub.

I'll post updates here; I'm hoping to get my shortlist done this evening 😁

@lpil
Copy link
Member Author

lpil commented Feb 25, 2022

All sounds fab to me!

@J3RN
Copy link
Member

J3RN commented Mar 11, 2022

Hey @lpil, sorry for the delay! The repository is now at a good point to transition ownership to under the gleam-lang org. Whether you want to fork it to that org or clone-and-push a copy is totally up to you! Once it's over there, I'd love if you could add myself and @the-mikedavis as maintainers of the project, and it should also be ready to provide to GitHub for integration 🙏🎉

@the-mikedavis
Copy link
Member

I remember transferring a repo to be a bit strange when working on tree-sitter-iex. Iirc I transferred it to José and then José transferred it to the elixir-lang organization, and doing that way you keep the git history, stars and prs/issues

@J3RN
Copy link
Member

J3RN commented Mar 11, 2022

Ah, I didn't even think about GitHub having an official mechanism for this! 😅 Sure enough, I tried requesting a transfer to gleam-lang, but that didn't work as it said I didn't have access to create repositories for gleam-lang. I have now requested a transfer to @lpil, such that he can then transfer it to gleam-lang.

@J3RN
Copy link
Member

J3RN commented Mar 19, 2022

Hey @lpil! Just wondering if we've reached out to the team at GitHub about adding tree-sitter-gleam functionality to their platform. If not we can cc them here 😁

@patrickt
Copy link
Contributor

👋🏻 GitHubber here. We’ve added Gleam support to our roadmap. I can’t make guarantees as to a timeframe, unfortunately, but we’re on the case!

@lpil
Copy link
Member Author

lpil commented Mar 22, 2022

Amazing, thank you @patrickt !!

@patrickt
Copy link
Contributor

patrickt commented May 9, 2022

We’ve just integrated highlights.scm-based syntax highlighting for Gleam! You may notice slightly faster page load times.

@lpil
Copy link
Member Author

lpil commented May 9, 2022

@patrickt Great news! Though to me it doesn't look quite right. We have lost syntax highlighting for types and constructors, and labels and arguments are two different colours now.

image

The orange module names stand out a lot more than I think is ideal too. In this snippet I think they are not very important but they are very bold. Previously they were white, like variables.

image

Is this something we can fix on our end?

@lpil
Copy link
Member Author

lpil commented May 10, 2022

Found a few more places in which syntax highlighting is broken:

Type definitions are no longer highlighted

image

https://github.com/gleam-lang/stdlib/blob/a78e3a2eca5593eb0e9eb3c97d0d776a25081573/test/gleam/string_test.gleam#L442-L446

Strings with // in them break highlighting

https://github.com/gleam-lang/stdlib/blob/a78e3a2eca5593eb0e9eb3c97d0d776a25081573/test/gleam/string_test.gleam#L602-L622

image

@J3RN
Copy link
Member

J3RN commented May 10, 2022

FWIW, the general non-highlighting of types (and data constructors/names, which we treat as types for better or for worse) appears to be an issue on GitHub's side. Those use the @type highlight name, which tree-sitter-rust uses as well (our matching, for comparison), and works well in the tree-sitter based Emacs mode:
image

We also verify this in some of our highlighting tests.

The "comment in string" bug is real and affects environments other than GitHub. I'll try to run that down this evening.

@J3RN
Copy link
Member

J3RN commented May 17, 2022

To give a small update on this, the string/comment parsing issue appears to be an artifact of parsing strings as non-terminal nodes/tokens, which means that any field in extras (whitespace, comments) should be able to appear before any child node of the string node.

To counter this, I tried wrapping the string child nodes with token.immediate, whose documentation states:

Usually, whitespace (and any other extras, such as comments) is optional before each token. This function means that the token will only match if there is no whitespace.

It may truly be specific to whitespace, because it had no effect.

My approach looks like this:

    string: ($) =>
      seq(
        '"',
        repeat(choice($.escape_sequence, $.quoted_content)),
        token.immediate('"')
      ),
    escape_sequence: ($) => token.immediate(/\\[efnrt\"\\]/),
    quoted_content: ($) => token.immediate(/(?:[^\\\"]|\\[^efnrt\"\\])+/),

FWIW, tree-sitter-rust's approach is very similar:

    string_literal: $ => seq(
      alias(/b?"/, '"'),
      repeat(choice(
        $.escape_sequence,
        $._string_content
      )),
      token.immediate('"')
    ),
    // ...
    escape_sequence: $ => token.immediate(
      seq('\\',
        choice(
          /[^xu]/,
          /u[0-9a-fA-F]{4}/,
          /u{[0-9a-fA-F]+}/,
          /x[0-9a-fA-F]{2}/
        )
      )),

A notable difference, however, is that tree-sitter-rust's _string_content anonymous node is implemented as an external scanner, whereas our equivalent (quoted_content) is implemented in the tree-sitter DSL. I'll attempt to make it an external scanner to see if that makes a difference.

the-mikedavis added a commit to the-mikedavis/helix that referenced this issue May 23, 2022
With respect to the queries:

The locals scope for functions was not large enough, so a function's
parameter could outlive the function body. To fix it, we just widen
the scope to the `function` node.

See also gleam-lang/tree-sitter-gleam#25

With respect to the parser:

An external scanner has been added that fixes the parsing of strings.
Previously, a comment inside a string would act like a comment rather
than string contents.

See also gleam-lang/tree-sitter-gleam#14 (comment)
@patrickt
Copy link
Contributor

patrickt commented May 24, 2022

@patrickt Great news!

Thanks! I’m glad you’re enjoying it and I’m grateful for your patience.

Though to me it doesn't look quite right. We have lost syntax highlighting for types and constructors, and labels and arguments are two different colours now.

As regards type names, that is technically correct. @type is defined to map to an smi CSS class, that in dark mode is meant to map to an uncolored face (see also Java interfaces, which do the same). This is true also for @property, which, if you prefer argument labels to be highlighted the same as arguments, might not be the right choice. Given that argument labels are a pretty common construct, we could perhaps add a new @property.label class, if you think it’s too jarring. However, all other things being equal, our team would prefer to keep these highlighted differently so as to be consistent with highlighting for languages that have this distinction such as Swift.

With respect to constructors, I think those should be mapped to the @constructor highlight, which doesn’t appear to be mentioned in highlights.scm, so I think that should be fixable on your end.

The orange module names stand out a lot more than I think is ideal too. In this snippet I think they are not very important but they are very bold. Previously they were white, like variables.

I agree it’s pretty loud. Unfortunately, we don’t have any facilities for grammars to direct the manner in which things are highlighted: on our end, it’s basically just a map from highlighting classes to CSS classes (closed-source, unfortunately, though I can give a subject-to-change summary of what is done should you need it). I think it would be cool to allow highlighting grammars to suggest how emphatically or subtly a given class should be highlighted, but that would be a long-term project (as would making the module highlighting a little less punchy, given that we’d have to get input from design, plan for the various colorblindness modes, etc etc). Note that Elixir module names are also highlighted thusly, so at least we’re consistent in our gaudiness. :)

If you’d prefer for us to go back to the old highlighting queries, that’s not a problem on our end (though you will notice some marginally longer page load times).

@patrickt
Copy link
Contributor

Oh, I see that the comment-in-string bug is fixed. I can bump that today.

@J3RN
Copy link
Member

J3RN commented May 24, 2022

I forgot to address the property vs argument bit. The TL;DR is

  1. Labels and the arguments they label aren't the same syntactically, so I believe that there should be an option to highlight them differently. Personally, I find having them highlighted differently to aid readability.
  2. The same parsing logic and AST nodes are used for arguments to data constructors and arguments to functions. The use of the "property" highlight was born from the former as "property" seemed to fit "the fields of a record/struct" better than anything else.

I'm happy to change what highlight is applied to labels and also to differentiate the highlight for data constructor argument labels vs function argument labels if desired (though I'm not especially fond of that idea), but I do believe that labels and the arguments they point to should be able to be highlighted differently.

the-mikedavis added a commit to the-mikedavis/helix that referenced this issue May 24, 2022
With respect to the queries:

The locals scope for functions was not large enough, so a function's
parameter could outlive the function body. To fix it, we just widen
the scope to the `function` node.

See also gleam-lang/tree-sitter-gleam#25

With respect to the parser:

An external scanner has been added that fixes the parsing of strings.
Previously, a comment inside a string would act like a comment rather
than string contents.

See also gleam-lang/tree-sitter-gleam#14 (comment)

A new constructor node has been added as well which makes type
highlighting more fine grained.

See also gleam-lang/tree-sitter-gleam#29
archseer pushed a commit to helix-editor/helix that referenced this issue May 25, 2022
With respect to the queries:

The locals scope for functions was not large enough, so a function's
parameter could outlive the function body. To fix it, we just widen
the scope to the `function` node.

See also gleam-lang/tree-sitter-gleam#25

With respect to the parser:

An external scanner has been added that fixes the parsing of strings.
Previously, a comment inside a string would act like a comment rather
than string contents.

See also gleam-lang/tree-sitter-gleam#14 (comment)

A new constructor node has been added as well which makes type
highlighting more fine grained.

See also gleam-lang/tree-sitter-gleam#29
@patrickt
Copy link
Contributor

The new highlighting queries have been bumped in production.

mtoohey31 pushed a commit to mtoohey31/helix that referenced this issue Jun 15, 2022
With respect to the queries:

The locals scope for functions was not large enough, so a function's
parameter could outlive the function body. To fix it, we just widen
the scope to the `function` node.

See also gleam-lang/tree-sitter-gleam#25

With respect to the parser:

An external scanner has been added that fixes the parsing of strings.
Previously, a comment inside a string would act like a comment rather
than string contents.

See also gleam-lang/tree-sitter-gleam#14 (comment)

A new constructor node has been added as well which makes type
highlighting more fine grained.

See also gleam-lang/tree-sitter-gleam#29
mtoohey31 pushed a commit to mtoohey31/helix that referenced this issue Jun 15, 2022
With respect to the queries:

The locals scope for functions was not large enough, so a function's
parameter could outlive the function body. To fix it, we just widen
the scope to the `function` node.

See also gleam-lang/tree-sitter-gleam#25

With respect to the parser:

An external scanner has been added that fixes the parsing of strings.
Previously, a comment inside a string would act like a comment rather
than string contents.

See also gleam-lang/tree-sitter-gleam#14 (comment)

A new constructor node has been added as well which makes type
highlighting more fine grained.

See also gleam-lang/tree-sitter-gleam#29
@lpil
Copy link
Member Author

lpil commented Nov 17, 2022

Hi @patrickt , is there a process for updating the grammar on GitHub? Thank you 🙏

@lpil
Copy link
Member Author

lpil commented Dec 12, 2022

https://github.com/github/linguist/tree/master/vendor/grammars

I think we may be able to update this here!

@the-mikedavis
Copy link
Member

I think the new tree-sitter highlighting/analysis is separate from the linguist configurations and is probably closed-source for now. The grammars in linguist are textmate grammars (regex-based) I believe - for example the Elixir one points at https://github.com/elixir-editors/elixir-tmbundle/blob/b01fffc49179bdec936ca19b53ba4fc7c51a2cc0/Syntaxes/Elixir.tmLanguage although Elixir is highlighted/analyzed with tree-sitter

@lpil
Copy link
Member Author

lpil commented Dec 13, 2022

Ah boo, I misread and thought they had moved the tree sitters into this repo as well.

@lpil lpil closed this as completed Feb 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants