Recovering segments from pygmentized code fails for some lexers #143

shiftyp · 2013-12-01T10:00:20Z

The regex that splits the segments out of the pygmentized code assumes there is exactly one whitespace character between SEGMENT and DIVIDER. That assumption does not hold for some lexers, so the split fails. Relaxing it fixes the issue.

kmdavis · 2013-12-20T16:09:04Z

lib/utils.coffee

@@ -509,7 +509,7 @@ module.exports = Utils =
      result = result.replace('<div class="highlight"><pre>', '').replace('</pre></div>', '')

      # Extract our segments from the pygmentized source.
-      highlighted = "\n#{result}\n".split ///.*<span.*#{seg}\s#{div}.*<\/span>.*///
+      highlighted = "\n#{result}\n".split ///.*<span.*#{seg}.*?#{div}.*<\/span>.*///


Using .*? seems a bit dangerous. It could match anything. \s would be safer.

I agree it's not ideal, but for some languages, JSP for instance, the words SEGMENT and DIVIDER end up in separate spans. It might be possible to come up with a more exact regex. I'll try to post an example of the parsed JSP comment HTML today and we can discuss options.
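To make the failure concrete, here is a minimal sketch in plain JavaScript comparing the two patterns from the diff. The HTML strings are assumed for illustration only (they are not captured Pygments output), and `splitSegments` and `sample` are hypothetical helpers, not part of groc.

```javascript
// Marker words, kept in two variables as in the diff.
const seg = 'SEGMENT';
const div = 'DIVIDER';

// Original pattern: exactly one whitespace character between the words.
const strict = new RegExp(`.*<span.*${seg}\\s${div}.*<\\/span>.*`);
// Pattern from this pull request: any lazily matched text between the words.
const relaxed = new RegExp(`.*<span.*${seg}.*?${div}.*<\\/span>.*`);

// Assumed highlighter output: both words in one span, or one span per word.
const oneSpan = '<span class="c1"># SEGMENT DIVIDER</span>';
const twoSpans = '<span class="n">SEGMENT</span> <span class="n">DIVIDER</span>';

// Hypothetical helper: put a marker line between two highlighted lines.
const sample = (marker) =>
  `<span class="k">a</span>\n${marker}\n<span class="k">b</span>`;

// Mirrors the split in the diff: pad with newlines, then split on the marker line.
function splitSegments(html, pattern) {
  return `\n${html}\n`.split(pattern);
}

console.log(splitSegments(sample(oneSpan), strict).length);   // 2 segments
console.log(splitSegments(sample(twoSpans), strict).length);  // 1: marker not found
console.log(splitSegments(sample(twoSpans), relaxed).length); // 2 segments
```

When the marker is not found, the whole highlighted file comes back as a single segment.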

they wind up in separate spans??? wow... I wasn't expecting that...

this is especially disconcerting because ORIGINALLY "#{seg}\s#{div}" was just a literal "SEGMENT DIVIDER"

I split it up into 2 words like this to prevent an error when running groc on itself.

you need this for the xml/html language support?

I suspect that the lexer might be doing something weird... hmm...

one option might be: (?:\s|INSERT_VERY_SPECIFIC_REGEX_FOR_THIS_ISSUE) so we match either a single space or the extra close/open span

I suppose .*? was a bit lazy of me, sorry. Here is the output for JSP comments:

<%-- SEGMENT DIVIDER --%>

so the closing and reopening span between the two words is the bad stuff we need to match

(?:\s|<\/span>\s*<span\sclass="n">)

Yeah, that would work. I'm not sure if the same pattern applies to comments in other languages though. I'll have to check XML and HTML.
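As a rough check of the narrower alternative, the sketch below drops the alternation from this thread into the split pattern and tries it on three lines. The input strings are assumptions made up for illustration, and `narrow` and `loose` are hypothetical names.

```javascript
const segWord = 'SEGMENT';
const divWord = 'DIVIDER';

// Alternation proposed in the thread: one whitespace character,
// or a closing span followed by a reopened span with class "n".
const gap = '(?:\\s|<\\/span>\\s*<span\\sclass="n">)';

const narrow = new RegExp(`.*<span.*${segWord}${gap}${divWord}.*<\\/span>.*`);
const loose = new RegExp(`.*<span.*${segWord}.*?${divWord}.*<\\/span>.*`);

// Assumed inputs, not captured highlighter output.
const cases = {
  singleSpace: '<span class="c1"># SEGMENT DIVIDER</span>',
  splitSpans: '<span class="n">SEGMENT</span> <span class="n">DIVIDER</span>',
  unrelated: '<span class="c1"># SEGMENT of text before a DIVIDER</span>',
};

for (const [name, line] of Object.entries(cases)) {
  console.log(name, 'narrow:', narrow.test(line), 'loose:', loose.test(line));
}
```

The narrow pattern accepts the first two lines and rejects the third, while the loose pattern accepts all three, which is the over-matching concern raised about .*?. Whether other lexers emit a different class or extra markup between the words is not covered here.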

sjorek · 2013-12-21T10:09:49Z

Please read this comment: #134 (comment)

If we implement the code segmentation in the proper tool, which to my mind is the syntax highlighter, groc would receive a well-defined JSON-formatted stream of raw comments and formatted code. That would not only simplify groc's implementation but also raise its precision dramatically. Another benefit of this approach: to add a new syntax highlighter, we would only need to implement the glue API that emits this preprocessed JSON stream. Possible candidates are Ruby's rouge or Node's highlight.js. Since rouge uses the same styles as pygments, only highlight.js would need different styling.
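To picture the proposal, here is a small sketch of how groc might consume such a stream. The format (one JSON object per line with `comments` and `code` fields) and the `readSegments` helper are purely hypothetical; nothing in this thread specifies them.

```javascript
// Hypothetical output of a highlighter-side glue script: one JSON object per segment.
const stream = [
  '{"comments": ["Adds two numbers."], "code": "<span class=\\"k\\">function</span> add"}',
  '{"comments": [], "code": "<span class=\\"k\\">return</span> a + b"}',
].join('\n');

// Parse each non-empty line into a segment object; no regex recovery is needed.
function readSegments(text) {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}

const segments = readSegments(stream);
console.log(segments.length);
console.log(segments[0].comments);
```

With segment boundaries supplied as data, the marker comment and the splitting regex discussed above would no longer be required.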

This was referenced Dec 1, 2013

Added JSP language support #142

Merged

Added XML/HTML language support #141

Open

Commit 7c96417

kmdavis reviewed Dec 20, 2013

sjorek mentioned this pull request Dec 21, 2013

Add Multi Line Comments to SCSS #134

Merged
