Add Perl 6 Pod lexer #978

Tyil · 2018-08-29T10:38:11Z

I've been working to add Perl 6 Pod as a Lexer, to improve the current support for Perl 6 Pod on GitLab.

I've tested my lexer visually, and I am reasonably content with the results thus far. However, before this can be merged, there are still some issues to iron out.

rake fails, reporting a lot of <Token Error> symbols.
There's a TODO that should be resolved. It won't pose problems most of the time, but I'd rather not have it pose problems ever.

If possible, I'd like to have some feedback to grasp the issue that makes rake fail right now, so that I can fix this. I'd also like to have some pointers on fixing the TODO.

vidarh

Hi, thanks for your submission. I'm trying to help triage lexer submissions to see if we can work down the pull request backlog... Please take a look at the comments.

lib/rouge/demos/perl6pod

vidarh · 2019-01-11T17:34:06Z

lib/rouge/lexers/perl6pod.rb

+      state :item do
+         rule(/\n/, Text::Whitespace, :pop!)
+
+         rule(/B\</, Punctuation::Indicator, :formatting_b)


The :formatting_... part seems excessive. You can do this with far fewer rules. E.g., something like:

formatting_tokens = { "B" => Generic::String, ... }
rule(/([BCEIKLN...]<)([^>]*)>/) do |m|
  t = formatting_tokens[m[0][0]]
  groups Punctuation::Indicator, t, Punctuation::Indicator
end

.. or similar ought to work. (Untested; let me know if you run into problems)

If I use this, I lose the < and > in the output.
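A possible explanation, sketched in plain Ruby without Rouge: assuming the groups helper emits only the text of each capture group, anything the pattern matches outside a group never reaches the output. In the suggested pattern the closing > is matched but not captured. The letter list below is the one that appears later in this PR, and the variable names are illustrative only.

```ruby
# Pattern as suggested: the closing ">" is matched but sits outside any group.
lossy = /([BCEIKLNTUZ]<)([^>]*)>/
# Variant that puts every part of the match inside its own capture group.
lossless = /([BCEIKLNTUZ])(<)([^>]*)(>)/

source = "B<bold text>"

lossy_match = lossy.match(source)
lossless_match = lossless.match(source)

# Joining the captures approximates what a group-by-group emitter would output.
lossy_output = lossy_match.captures.join
lossless_output = lossless_match.captures.join

puts lossy_output
puts lossless_output
```

If that assumption holds, a rule built on the second pattern would need one token per group (four in total) so that the delimiters are emitted as well.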

lib/rouge/lexers/perl6pod.rb

labster · 2019-07-06T08:53:49Z

I can look at this for a bit. Do you need any help putting these changes together @Tyil?

I'm also wondering if there's a way to call into this parser from a theoretical Perl 6 parser I might write. Or would it make more sense to just merge it in at a later date? POD6 files are valid Perl 6, right?

Tyil · 2019-07-06T10:56:47Z

On Sat, 06 Jul 2019 01:53:52 -0700 Brent Laabs wrote:

> I can look at this for a bit. Do you need any help putting these changes together @Tyil?

I probably will, Ruby is not my first language after all. However, this weekend I have plans already, but I should have some spare time during the week and the following weekend to look at this again.

> I'm also wondering if there's a way to call into this parser from a theoretical Perl 6 parser I might write. Or would it make more sense to just merge it in at a later date? POD6 files are valid Perl 6, right?

POD6 is valid Perl 6, yes. If you were to write a Perl 6 parser, I would be interested in looking at it, as the language is huge and I would presume it to be very hard to parse in different languages. The people in #perl6 on Freenode can certainly help you out if needed in this aspect, though.

-- With kind regards, Patrick Spek www: https://www.tyil.nl/ mail: p.spek@tyil.nl pgp: 1660 F6A2 DFA7 5347 322A 4DC0 7A6A C285 E2D9 8827

Tyil · 2019-07-11T07:38:58Z

@labster I've added some changes that hopefully make the PR itself better. However, rake still fails with a number of Token Errors, which all seem to be about newlines ("\n" and "\n\n"). I'm unsure what I'm doing wrong. If you have any pointers, it would be much appreciated!

pyrmont · 2019-07-11T09:53:02Z

@Tyil I suspect the problem is that /./ does not match newlines. You also need a rule in the :root state to match newlines that might come after =end pod. Finally, Rouge uses 2-space indentation—could you tidy up the indentation to conform with that?

Tyil · 2019-07-11T09:56:20Z

The white space is easily solved, and should be OK now. I'll take a look at the other points you've made.

Tyil · 2019-07-11T10:06:25Z

OK, I updated the /./ to /./m to match newlines. I've checked the output with rackup, and it still seems to look alright. The Token Errors on \n seem to have been resolved with this as well.
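For reference, a small plain-Ruby sketch of the difference between the two patterns. The sample string is made up for illustration and is not taken from the demo file.

```ruby
text = "=end pod\n"

# Without the m flag, "." matches any character except a newline.
plain = text.scan(/./)
# With the m flag (Ruby's multiline mode), "." also matches "\n".
multiline = text.scan(/./m)

puts plain.length
puts multiline.length
```

The trailing newline is only picked up by the second pattern, which is consistent with the Token Errors on "\n" going away after the change.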

pyrmont · 2019-07-11T18:36:22Z

@jneen Is creating tokens per character (as happens in a number of places here) worse from a performance perspective than matching as much of the text as possible? Is that an anti-pattern we should discourage?

pyrmont · 2019-07-18T17:02:44Z

Pinging @jneen regarding the question in the previous comment.

jneen · 2019-07-18T17:05:12Z

Yes - try to avoid matching character-at-a-time.

jneen · 2019-07-18T17:06:21Z

If you're waiting to encounter a specific set of characters, use a negative character class, like /[^:=]+/
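A rough sketch of the difference, using Ruby's StringScanner rather than Rouge itself. The chunks helper and the sample text are hypothetical and only illustrate how many pieces each pattern produces.

```ruby
require "strscan"

# Split text into chunks: take whatever the pattern matches at the current
# position, or fall back to a single character (here ":" or "=").
def chunks(text, pattern)
  scanner = StringScanner.new(text)
  result = []
  until scanner.eos?
    result << (scanner.scan(pattern) || scanner.getch)
  end
  result
end

text = "name = value"

per_char = chunks(text, /[^:=]/)   # one chunk per character
per_run  = chunks(text, /[^:=]+/)  # one chunk per run of ordinary characters

puts per_char.length
puts per_run.inspect
```

Both versions cover the same text, but the second does so in far fewer steps, which is the point of matching as much as possible per rule.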

pyrmont

Here's some comments :) Oh, and you're missing a spec file that will run tests. You can see some examples in spec/lexers/ :)

lib/rouge/lexers/perl6pod.rb

pyrmont · 2019-07-19T10:00:07Z

lib/rouge/lexers/perl6pod.rb

+        rule(/\=output/, Keyword, :output)
+        rule(/\=defn/, Keyword)
+
+        rule(/(BCEIKLNTUZ)<([^>]*)>/) do |m|


Is (BCEIKLNTUZ) meant to be a character class?

Yes, it was, so I tried to fix that. However, this particular rule seemed to drop the < and > characters, which are important to keep. If I can get pointers on how to do something like this that keeps the < and > characters, I can put a working variant back into the PR.

I was wondering if there's a way to forcefully insert a character into the resulting document, so I can simply push the < and > back in. If there is such a method, I could rewrite some of the =begin ... =end logic to be a single block as well, I think. This would reduce the overall size of the codebase and duplicated efforts for all the special blocks that I have right now.

You might want to read through Ruby's Regexp documentation. It sounds like that might be helpful in consolidating a few things.

To answer your specific question, I think what you want to use are capture groups. This allows you to 'capture' the value matching the capture group's pattern. Using this, you could rewrite this:

rule(/\=begin code/, Keyword, :block_code)
rule(/\=begin input/, Keyword, :block_input)
rule(/\=begin output/, Keyword, :block_output)

as this:

rule(/(\=begin)( )(code|input|output)/i) do |m|
  # use the result of the third capture group
  s = case m[3].downcase
      when "code" then :block_code
      when "input" then :block_input
      when "output" then :block_output
      end
  # use the groups method to assign tokens to capture groups in order
  groups Keyword, Text, Keyword
  # push the new state onto the stack
  push s
end

The rule method can take a block (as in the example above) that accepts a single variable (the convention in Rouge is to call the variable m for 'match'). Within the block you can access the capture groups with the [] operator. m[0] is the complete pattern, m[1] is the first capture group, m[2] is the second capture group and so on.
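The indexing can be tried outside Rouge with an ordinary Ruby MatchData object. This is only a sketch: the input string is made up, and the state names are the ones used in the rules above.

```ruby
pattern = /(=begin)( )(code|input|output)/i

m = pattern.match("=begin Output")

# m[0] is the whole match; m[1], m[2] and m[3] are the capture groups in order.
whole, keyword, space, kind = m[0], m[1], m[2], m[3]

# Map the third capture to a state name, case-insensitively.
state = case kind.downcase
        when "code"   then :block_code
        when "input"  then :block_input
        when "output" then :block_output
        end

puts whole
puts state
```

Because the three groups together cover the entire match, assigning one token per group leaves no text unaccounted for.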

lib/rouge/lexers/perl6pod.rb

The markup stuff has been removed, as it currently doesn't work (the < and > characters were being dropped, which is not desired). Additionally, applying some requests eventually broke the block attributes, so a new solution to get these highlighted correctly needs to be found. If I can find a solution to the former, I can most likely also resolve the latter, while reducing the codebase.

Tyil · 2019-07-19T11:37:16Z

I will look into adding a spec once I get the rendering to look the way I intended.

Tyil force-pushed the perl6pod branch from 41f3fb0 to 2991fa9 August 29, 2018 11:02

vidarh suggested changes Jan 11, 2019

pyrmont added the needs-review label Jun 26, 2019

Tyil added 4 commits July 11, 2019 09:08

Add Perl 6 Pod lexer

539bfcb

Make perl6pod demo file smaller, as requested in the PR

cf2852c

Merge semantic matchers together

0f2bb64

Put inline formatting into smaller blocks

0e96fff

Tyil force-pushed the perl6pod branch from 2991fa9 to 0e96fff July 11, 2019 07:31

Redo indentation, to conform to the Rouge codebase

334e56c

Update /./ to /./m, to match newlines

ec2a637

pyrmont added maintainer-action and removed needs-review labels Jul 11, 2019

pyrmont added needs-review and removed maintainer-action labels Jul 19, 2019

pyrmont suggested changes Jul 19, 2019

Tyil force-pushed the perl6pod branch from 4582dc3 to bdb5210 July 19, 2019 11:32

pyrmont added author-action and removed needs-review labels Jul 19, 2019

pyrmont self-assigned this Aug 26, 2019

rypervenche mentioned this pull request Aug 20, 2020

Need Raku in popular syntax highlighters Raku/raku-most-wanted#50

Open

pyrmont removed their assignment Dec 9, 2020
