Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Perl 6 Pod lexer #978

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open

Add Perl 6 Pod lexer #978

wants to merge 7 commits into from

Conversation

Tyil
Copy link

@Tyil Tyil commented Aug 29, 2018

I've been working to add Perl 6 Pod as a Lexer, to improve the current support for Perl 6 Pod on GitLab.

I've tested my lexer visually, and I am reasonably content with the results thus far. However, before this can be merged, there's still some issues to iron out.

  1. rake fails, reporting a lot of <Token Error> symbols.
  2. There's a TODO that should be resolved. It won't pose problems most of the times, but I'd rather not have it posing problems ever.

If possible, I'd like to have some feedback to grasp the issue that makes rake fail right now, so that I can fix this. I'd also like to have some pointers on fixing the TODO.

Copy link
Contributor

@vidarh vidarh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks for your submission. I'm trying to help triage lexer submissions to see if we can work down the pull request backlog... Please take a look at the comments.

lib/rouge/demos/perl6pod Show resolved Hide resolved
state :item do
rule(/\n/, Text::Whitespace, :pop!)

rule(/B\</, Punctuation::Indicator, :formatting_b)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The :formatting_... part seems excessive. You can do this with far fewer rules. E.g, something like:

    formatting_tokens = { "B" => Generic::String, ... }
    rule(/([BCEIKLN...]<)([^>]*)>/) do m
      t = formatting_tokens[m[0][0]]
      groups Punctuation::Indicator, t , Punctuation::Indicator
    end 

.. or similar ought to work. (Untested; let me know if you run into problems)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I use this, I lose the < and > in the output.

lib/rouge/lexers/perl6pod.rb Outdated Show resolved Hide resolved
@pyrmont pyrmont added the needs-review The PR needs to be reviewed label Jun 26, 2019
@labster
Copy link
Contributor

labster commented Jul 6, 2019

I can look at this for a bit. Do you need any help putting these changes together @Tyil?

I'm also wondering if there's a way to call into this parser from a theoretical Perl 6 parser I might write. Or would it make more sense to just merge it in at a later date? POD6 files are valid Perl 6, right?

@Tyil
Copy link
Author

Tyil commented Jul 6, 2019 via email

@Tyil
Copy link
Author

Tyil commented Jul 11, 2019

@labster I've added some changes that hopefully make the PR on itself better, however, rake still fails with a number of Token Errors, which seem to be all about newlines ("\n" and "\n\n"). I'm unsure what I'm doing wrong. If you have any pointers, it would be much appreciated!

@pyrmont
Copy link
Contributor

pyrmont commented Jul 11, 2019

@Tyil I suspect the problem is that /./ does not match newlines. You also need a rule in the :root state to match newlines that might come after =end pod. Finally, Rouge uses 2-space indentation—could you tidy up the indentation to conform with that?

@Tyil
Copy link
Author

Tyil commented Jul 11, 2019

The white space is easily solved, and should be OK now. I'll take a look at the other points you've made.

@Tyil
Copy link
Author

Tyil commented Jul 11, 2019

OK, I updated the /./ to /./m to match newlines. I've checked the output with rackup, and it still seems to look alright. The Token Errors on \n seem to have been resolved with this as well.

@pyrmont
Copy link
Contributor

pyrmont commented Jul 11, 2019

@jneen Is creating tokens per character (as happens in a number of places here) worse from a performance perspective than matching as much of the text as possible? Is that an anti-pattern we should discourage?

@pyrmont pyrmont added maintainer-action The PR has been reviewed but action by a maintainer is required and removed needs-review The PR needs to be reviewed labels Jul 11, 2019
@pyrmont
Copy link
Contributor

pyrmont commented Jul 18, 2019

Pinging @jneen regarding the question in the previous comment.

@jneen
Copy link
Member

jneen commented Jul 18, 2019

Yes - try to avoid matching character-at-a-time.

@jneen
Copy link
Member

jneen commented Jul 18, 2019

If you're waiting to encounter a specific set of characters, use a negative character class, like /[^:=]+/

@pyrmont pyrmont added needs-review The PR needs to be reviewed and removed maintainer-action The PR has been reviewed but action by a maintainer is required labels Jul 19, 2019
Copy link
Contributor

@pyrmont pyrmont left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's some comments :) Oh, and you're missing a spec file that will run tests. You can see some examples in spec/lexers/ :)

lib/rouge/lexers/perl6pod.rb Outdated Show resolved Hide resolved
rule(/\=output/, Keyword, :output)
rule(/\=defn/, Keyword)

rule(/(BCEIKLNTUZ)<([^>]*)>/) do |m|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is (BCEIKLNTUZ) meant to be a character class?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it was, so I tried to fix that. However, this particular rule seemed to drop the < and > characters, which are important to keep. If I can get pointers on how to do something like this that keeps the < and > characters, I can put a working variant back into the PR.

I was wondering if there's a way to forcefully insert a character into the resulting document, so I can simply push the < and > back in. If there is such a method, I could rewrite some of the =begin ... =end logic to be a single block as well, I think. This would reduce the overall size of the codebase and duplicated efforts for all the special blocks that I have right now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to read through Ruby's Regexp documentation. It sounds like that might be helpful in consolidating a few things.

To answer your specific question, I think what you want to use are capture groups. This allows you to 'capture' the value matching the capture group's pattern. Using this, you could rewrite this:

rule(/\=begin code/, Keyword, :block_code)
rule(/\=begin input/, Keyword, :block_input)
rule(/\=begin output/, Keyword, :block_output)

as this:

rule(/(\=begin)( )(code|input|output)/i) do |m|
  s = case |m[3].downcase| # use the result of the third capture group
      when "code" then :block_code
      when "input" then :block_input
      when "output" then :block_output
  end
  groups Keyword, Text, Keyword # use the groups method to assign tokens to capture groups in order
  push s # push the new state onto the stack
end

The rule method can take a block (as in the example above) that accepts a single variable (the convention in Rouge is to call the variable m for 'match'). Within the block you can access the capture groups with the [] operator. m[0] is the complete pattern, m[1] is the first capture group, m[2] is the second capture group and so on.

lib/rouge/lexers/perl6pod.rb Outdated Show resolved Hide resolved
lib/rouge/lexers/perl6pod.rb Outdated Show resolved Hide resolved
lib/rouge/lexers/perl6pod.rb Outdated Show resolved Hide resolved
lib/rouge/lexers/perl6pod.rb Outdated Show resolved Hide resolved
lib/rouge/lexers/perl6pod.rb Outdated Show resolved Hide resolved
lib/rouge/lexers/perl6pod.rb Outdated Show resolved Hide resolved
The markup stuff has been removed, as it currently doesn't work (the <
and > characters were being dropped, which is not desired).

Additionally, applying some requests eventually broke the block
attributes, so a new solution to get these highlighted correctly needs
to be found.

If I can find a solution to the former, I can most likely also resolve
the latter, while reducing the codebase.
@Tyil
Copy link
Author

Tyil commented Jul 19, 2019

I will look into adding a spec once I get the rendering to look the way I intended.

@pyrmont pyrmont added author-action The PR has been reviewed but action by the author is needed and removed needs-review The PR needs to be reviewed labels Jul 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author-action The PR has been reviewed but action by the author is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants