Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(perl)!: replace parser #5222

Merged
merged 2 commits into from Oct 21, 2023

Conversation

rabbiveesh
Copy link
Contributor

This replaces the perl parser from ganezdragon with an actively maintained + considerably
more correct version. There are some features that we're still working on, but by and
large this has already proven very useful for me in my day to day.

One of the things that we support is coloring scalars, arrays and hashes differently. Is
there a way to expose that in a simple way to users? We're doing it by making a
@variable.scalar, @variable.array and @variable.hash.

@clason
Copy link
Contributor

clason commented Aug 10, 2023

Is this the one @leonerd has been working on?

queries/perl/injections.scm Outdated Show resolved Hide resolved
@leonerd
Copy link
Contributor

leonerd commented Aug 10, 2023

Is this the one @leonerd has been working on?

It is, yes

@clason
Copy link
Contributor

clason commented Aug 10, 2023

So, first order of business:

  1. we use a different query precedence than upstream (for now; they'll change eventually): latest match wins
  2. we use different capture names, see https://github.com/nvim-treesitter/nvim-treesitter/blob/master/CONTRIBUTING.md#highlights -- any other (toplevel) captures are illegal and trip CI.

@leonerd
Copy link
Contributor

leonerd commented Aug 10, 2023

  • we use a different query precedence than upstream (for now; they'll change eventually): latest match wins

Ah as a total aside, I was curious about that for my own implementation. I think this seems to be the most common interpretation. Good.

But I'm unclear why you bring it up specifically now - I don't think we rely on that anywhere, do we? I've mostly been using/testing this via nvim anyway, so I believe nvim's interpretation is correct.

Ahyes; probably some adjusting needs doing there then.

Copy link
Member

@ObserverOfTime ObserverOfTime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing highlights for operators and punctuation.

Also missing this for shebangs:

((source_file . (comment) @preproc)
  (#lua-match? @preproc "^#!/"))

And possibly more.

queries/perl/highlights.scm Outdated Show resolved Hide resolved
queries/perl/highlights.scm Outdated Show resolved Hide resolved
queries/perl/highlights.scm Outdated Show resolved Hide resolved
queries/perl/highlights.scm Outdated Show resolved Hide resolved
queries/perl/highlights.scm Show resolved Hide resolved
queries/perl/highlights.scm Outdated Show resolved Hide resolved
queries/perl/injections.scm Outdated Show resolved Hide resolved
@clason
Copy link
Contributor

clason commented Aug 10, 2023

Ah as a total aside, I was curious about that for my own implementation. I think this seems to be the most common interpretation. Good.

It's the interpretation upstream will switch to, as far as I know.

But I'm unclear why you bring it up specifically now - I don't think we rely on that anywhere, do we? I've mostly been using/testing this via nvim anyway, so I believe nvim's interpretation is correct.

It's just a very common gotcha when queries are downstreamed, so I thought I'd mention it just to potentially save some time.

Copy link
Member

@ObserverOfTime ObserverOfTime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should also highlight the angle brackets as @punctutation.delimiter if possible.

I would suggest adding these as well:

(interior_sequence
  (sequence_letter) @character
  (#eq? @letter "X")
  (content) @text.reference)
(interior_sequence
  (sequence_letter) @character
  (#eq? @letter "E")
  (content) @string.escape)

queries/pod/highlights.scm Outdated Show resolved Hide resolved
queries/pod/highlights.scm Show resolved Hide resolved
queries/pod/highlights.scm Outdated Show resolved Hide resolved
queries/pod/highlights.scm Outdated Show resolved Hide resolved
queries/pod/highlights.scm Outdated Show resolved Hide resolved
queries/pod/highlights.scm Outdated Show resolved Hide resolved
@clason
Copy link
Contributor

clason commented Aug 12, 2023

@rabbiveesh we've just switched the format of injection queries to that of upstream, so please rebase and adapt yours. (Holler if you need help, but should be straightforward.)

@leonerd
Copy link
Contributor

leonerd commented Aug 12, 2023

Damn, I literally just yesterday implemented injections in Text::Treesitter. Looks like I'm gonna have to alter it again then.

@lucario387
Copy link
Member

There isn't much to alter, just that only @injection.language and @injection.content now

queries/perl/injections.scm Outdated Show resolved Hide resolved
@clason
Copy link
Contributor

clason commented Aug 12, 2023

Damn, I literally just yesterday implemented injections in Text::Treesitter. Looks like I'm gonna have to alter it again then.

Oh, you were basing your own implementation on Neovim's custom format? That's not a good idea... The new format is the same that tree-sitter (and everybody else) is using, so you should, too ;)

@rabbiveesh
Copy link
Contributor Author

I'm working on the operator + punctuation highlighting RN, and I'm having an issue. We handle parsing semicolons in our scanner, and I can't for the life of me get tree-sitter to match the semicolon token. Any ideas?
the obvious
";" @punctuation.delimiter is not working, b/c it doesn't come out as a token. And I can't get a match using #eq? to work either.

Worst comes to worst i can always rework the grammar to parse semicolon as a normal token, but looking fo rideas

@amaanq
Copy link
Member

amaanq commented Aug 13, 2023

alias $._PERLY_SEMICOLON to ';' wherever it's used

@ObserverOfTime
Copy link
Member

Or change the name so it isn't hidden.

@clason clason added the breaking change changes that users need to adapt to label Aug 14, 2023
@rabbiveesh
Copy link
Contributor Author

Rebased + force-pushed so CI can run.

Still open question about the variable.scalar and such highlights that i've moved to the bottom of the perl highlights file. Are those not allowed?

@ObserverOfTime
Copy link
Member

Still open question about the variable.scalar and such highlights that i've moved to the bottom of the perl highlights file. Are those not allowed?

They are, but they won't be highlighted any differently by default.

@rabbiveesh
Copy link
Contributor Author

Okay, now it's ready!

queries/perl/highlights.scm Outdated Show resolved Hide resolved
queries/perl/highlights.scm Show resolved Hide resolved
queries/perl/highlights.scm Outdated Show resolved Hide resolved
queries/perl/highlights.scm Outdated Show resolved Hide resolved
queries/perl/highlights.scm Outdated Show resolved Hide resolved
queries/perl/highlights.scm Outdated Show resolved Hide resolved
queries/perl/highlights.scm Outdated Show resolved Hide resolved
@amaanq
Copy link
Member

amaanq commented Aug 14, 2023

I'm also a bit concerned about state count, a 9000 large state count might be very laggy/slow on lower end machines, just something to consider

@leonerd
Copy link
Contributor

leonerd commented Aug 14, 2023

I'm also a bit concerned about state count, a 9000 large state count might be very laggy/slow on lower end machines, just something to consider

Yeah; we're not happy about that either. A big part of that is because tree-sitter itself doesn't have a native concept of list-associative, or non-associative operators. When we tried to properly implement things like chaining comparisons 1 <= $x <= 10 or the non-associative isa operator, these all made the resulting parser.c file enormous, and very slow to generate and compile. We'd love to find a better way to implement them without some of the hackery we have, but we've bounced back and forward between lots of other tree-sitter related folks and not found a solution. I suspect to do it better needs tree-sitter itself to gain support for list-associative operators. It was rather beyond our scope to make such a feature addition.

But if you have any way on the process, or even know the right folks to talk to, please do get in touch :) We want that fixed.

Copy link
Member

@ObserverOfTime ObserverOfTime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perl is still missing @punctuation.bracket highlights, $# (maybe @operator ?), some operators (e.g. &&), and regexp modifiers (@character.special). Built-in variables (e.g. ARGV) should be highlighted as @variable.builtin (or @constant.builtin where applicable).

Pod should include injections for =begin and =for blocks. You could also add folds but they aren't necessary.

queries/pod/highlights.scm Outdated Show resolved Hide resolved
queries/perl/highlights.scm Outdated Show resolved Hide resolved
queries/perl/highlights.scm Outdated Show resolved Hide resolved
queries/perl/highlights.scm Outdated Show resolved Hide resolved
@ObserverOfTime
Copy link
Member

ObserverOfTime commented Aug 15, 2023

Also, if you can make the regexp content a named node or field so it can be captured in textobjects, that'd be appreciated (but not necessary).

@rabbiveesh
Copy link
Contributor Author

I actually meant INTERNAL consistency, not here.

I need to test a different part of the capture in one of the injections, and that's tripping the queries test. How can I do that w/o getting an error (basically, we inject perl into itself when the replacement of a regexp is a perl expression, ie when the modifiers on your s operator is a single e)

@clason
Copy link
Contributor

clason commented Oct 12, 2023

If you only need a capture to refer to it in a predicate, you need to mark it as private with an underscore: @_modifiers. That is tripping the test.

@rabbiveesh
Copy link
Contributor Author

Ah, thanks a mil, fixed + repushed

queries/perl/highlights.scm Show resolved Hide resolved
queries/perl/highlights.scm Show resolved Hide resolved
queries/perl/highlights.scm Outdated Show resolved Hide resolved
queries/perl/highlights.scm Outdated Show resolved Hide resolved
queries/perl/highlights.scm Outdated Show resolved Hide resolved
queries/perl/injections.scm Outdated Show resolved Hide resolved
queries/pod/highlights.scm Outdated Show resolved Hide resolved
queries/perl/highlights.scm Outdated Show resolved Hide resolved
@ObserverOfTime
Copy link
Member

the language names aren't standardized, so we're not sure how much usefulness it'll provide

This is a problem that all markup languages share (#812).
I don't know how useful language injections would be in pod's case,
but if you can do it without having to update the parser then why not?

@rabbiveesh
Copy link
Contributor Author

the language names aren't standardized, so we're not sure how much usefulness it'll provide

This is a problem that all markup languages share (#812). I don't know how useful language injections would be in pod's case, but if you can do it without having to update the parser then why not?

well, only for blocks would work out of the box. Otherwise, we'd need to add parsing for begin/end sections, and you'll have to get @leonerd to do that b/c I tried and it didn't go well. But like i said, we can add it in later + take care of it then

@ObserverOfTime
Copy link
Member

That's fine then.

queries/pod/highlights.scm Outdated Show resolved Hide resolved
queries/pod/highlights.scm Outdated Show resolved Hide resolved
queries/perl/injections.scm Outdated Show resolved Hide resolved
@rabbiveesh
Copy link
Contributor Author

Okay, only question remaining is if i can keep using plain-ol match or if we should switch it to lua-match

@rabbiveesh
Copy link
Contributor Author

Is CI borked? these failures don't seem to have anything to do w/ my PR

@ObserverOfTime
Copy link
Member

Please avoid merge commits. :)

@rabbiveesh
Copy link
Contributor Author

oh sorry, i'm used to a squash merge flow. Let me rebase

@ObserverOfTime
Copy link
Member

ObserverOfTime commented Oct 16, 2023

Is CI borked? 

CI does seem to be borked.

i'm used to a squash merge flow

You can squash the commits if you want.
Maybe one for perl and one for pod?

lockfile.json Outdated Show resolved Hide resolved
@clason
Copy link
Contributor

clason commented Oct 17, 2023

You can squash the commits if you want.

No, please do squash commits, especially if you want to keep perl and pod separate. (In that case, the pod commit needs to be first for the injections to not be broken.)

@rabbiveesh
Copy link
Contributor Author

Sure, but could I just squash it down to one? POD doesn't get too much use outside of perl, so it's effectively one commit. Unless you specifically want them separate

@clason
Copy link
Contributor

clason commented Oct 17, 2023

That's fine, too.

@rabbiveesh
Copy link
Contributor Author

k, squashed; did it in 2 commits b/c the commit log reads bettter that way

@clason clason requested a review from amaanq October 17, 2023 20:07
@clason
Copy link
Contributor

clason commented Oct 21, 2023

@amaanq

@amaanq amaanq merged commit bb3f8f4 into nvim-treesitter:master Oct 21, 2023
8 checks passed
@amaanq
Copy link
Member

amaanq commented Oct 21, 2023

Thanks for your patience and contribution @rabbiveesh!

@rabbiveesh
Copy link
Contributor Author

🥳 thanks for your patience!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change changes that users need to adapt to
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants