Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Textmate's HAML grammar #3627

Merged
merged 2 commits into from
May 16, 2017
Merged

Use Textmate's HAML grammar #3627

merged 2 commits into from
May 16, 2017

Conversation

vmg
Copy link
Contributor

@vmg vmg commented May 16, 2017

The language-haml syntax for HAML has a recursive include (https://github.com/ezekg/language-haml/blob/master/grammars/ruby%20haml.cson#L360-L408) which is causing unbounded nesting of tags when parsing some HAML documents.

As an example, see the HTML source for this rendered document: https://github.com/ManageIQ/manageiq-ui-classic/blob/9b4bfe77184a177f24159fce36cd9377a2f020da/app/views/layouts/angular/_multi_auth_credentials.html.haml

It's not clear to me how to fix the HAML grammar, but I really believe the recursive inclusion is not correct. Hence, I propose we rollback to TextMate's grammar, which does not have this issue.

cc @github/linguist
Fixes https://github.com/github/github/issues/72875

@vmg
Copy link
Contributor Author

vmg commented May 16, 2017

Warnings:
vendor/licenses/grammar/ruby-haml.tmbundle.txt:
  - license needs reviewed: other.

How do I review a license? :)

@Alhadis
Copy link
Collaborator

Alhadis commented May 16, 2017

When you say "unbounded nesting", which lines are you referring to?

I've only taken a cursory glance through the current HAML grammar, but I see a few oddities:

  1. Matching newlines in patterns is handled inconsistently between implementors of the TextMate grammar system. Having a newline included as part of a match is actually a hack used by grammar authors: \n should actually be \n? instead.

  2. This line isn't doing anything. It should be this instead:

    -'patterns': [
    +'captures': [
    +  '0': 'patterns': [
  3. This should end in haml, not html, I assume.

  4. This should also include an .embedded scope, so it becomes source.embedded.ruby.

I doubt any of these errors are what're causing the nesting errors. May I see exactly which lines in the file you linked to are being directly affected?

@Alhadis
Copy link
Collaborator

Alhadis commented May 16, 2017

Regarding the license review: I believe our approach is to whitelist the license file's checksum in test/test_grammars.rb. @pchaigno knows more about that than I do, however.

@pchaigno Should the hash list be moved to a config file in lib/linguist/? It took me ages to recall where it was located...

@lildude
Copy link
Member

lildude commented May 16, 2017

Further to @Alhadis's comments, it appears the original author (@meganemura) of the switch (#3355) soon encountered issues with the grammar they implemented (see #3371) and asked for this to be reverted but it was missed.

With that in mind, I think it may be worth reverting (this PR) to address #3371 and https://github.com/github/github/issues/68429 and https://github.com/github/github/issues/72875

As for the failing test: the license that has been detected (other) is none of those on the whitelist. This was "permissive" when we last used this grammar as we can see in #3355. I'm not sure how it got that though.

@Alhadis
Copy link
Collaborator

Alhadis commented May 16, 2017

Oh! I forgot all about that...

Alright, in that case, revert the hell out of this. =)

@vmg
Copy link
Contributor Author

vmg commented May 16, 2017

@Alhadis: Just for completeness, you can see how https://github.com/ezekg/language-haml/blob/master/grammars/ruby%20haml.cson#L408 includes the original pattern again (i.e. recursively) when hitting a chunk of inline Ruby code.

The result is that if you have several lines in a row, each line contains all the tags from the previous lines.

It's very obvious in the HAML file I posted because of the huge chunks of redundant HTML in the page:

<td id="LC308" class="blob-code blob-code-inner js-file-line"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1"><span class="pl-sre"><span class="pl-s1">              =<span class="pl-sre"> _(&quot;Used to gather Capacity &amp; Utilization metrics.&quot;)</span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></td>
      </tr>

That is one line; the following line has the same contents + 2 extra tags, and the following one +4, and so on.

@lildude: Fixed the licensing. Can you please approve?

@vmg vmg merged commit a1e09ae into master May 16, 2017
@vmg
Copy link
Contributor Author

vmg commented May 16, 2017

Thank you @Alhadis!

@lildude lildude deleted the vmg/tm-haml branch May 16, 2017 10:46
@Alhadis
Copy link
Collaborator

Alhadis commented May 16, 2017

@bighairydude beat @lildude. =) You're most welcome.

@meganemura
Copy link
Contributor

meganemura commented May 16, 2017

@lildude Thanks for the mention to me. I'll close the #3371.

@pchaigno
Copy link
Contributor

@pchaigno Should the hash list be moved to a config file in lib/linguist/? It took me ages to recall where it was located...

It was supposed to be only temporary, but I'm having a hard time getting rid of these unrecognized licenses. I suppose we could move the whitelists to their own file.

@vmg Has the issue been reported to the language-haml grammar? Would be great if we could go back to that grammar in the long term, since it receives updates more frequently.

@Alhadis
Copy link
Collaborator

Alhadis commented May 16, 2017

I'll take a break from developing a graphical troff emulator for Atom to have a look at what's wrong...

@vmg
Copy link
Contributor Author

vmg commented May 16, 2017

@pchaigno: No, I have not reported anything upstream. As @Alhadis already reviewed earlier in the thread, that grammar is very complex and has a lot of dubious features.

Of course if anybody wants to spend some time fixing the bugs there, that'd be awesome. Opensource is magic ✨🌈

@Alhadis
Copy link
Collaborator

Alhadis commented May 16, 2017

Already fixed. How's this look?

@lildude
Copy link
Member

lildude commented May 16, 2017

Already fixed. How's this look?

I wouldn't trust it 😉. Lightshow is currently running Linguist 4.3.1 😱

I'll be updating it later.

@vmg
Copy link
Contributor Author

vmg commented May 16, 2017

@Alhadis Dayum, that works like a charm. Highlighting looks great and it doesn't create any redundant nodes. I've verified it using our production highlighting engine. 🙏

Could you please upstream that? We can then revert this PR right away.

@Alhadis
Copy link
Collaborator

Alhadis commented May 16, 2017

Already on it. =)

@Alhadis
Copy link
Collaborator

Alhadis commented May 16, 2017

:sorted => HarlemSquirrel/language-haml#60

#hashrocket #hashtag #hashtagrocket

ezekg pushed a commit to HarlemSquirrel/language-haml that referenced this pull request May 16, 2017
@lildude
Copy link
Member

lildude commented May 16, 2017

Upstream fix has been merged (🙇 @Alhadis) so I'm going to revert this and then update the grammar SHA.

lildude added a commit that referenced this pull request May 16, 2017
@vmg
Copy link
Contributor Author

vmg commented May 16, 2017

@Alhadis: Since I have you here (he hehehe), I've just noticed another bug on that grammar. It's basically #3371 -- where some string interpolations with # are showing up as comments instead.

I'm afraid your changes didn't catch that one. If you have some spare time to look into that, that'd be n e a t o. ✨

lildude added a commit that referenced this pull request May 16, 2017
* Revert "Use Textmate's HAML grammar (#3627)"

This reverts commit a1e09ae.

* Add back missing grammar sources
@Alhadis
Copy link
Collaborator

Alhadis commented May 16, 2017

Sorry! I was AFK (which happens every now and then).

I'll tackle it now.

@Alhadis
Copy link
Collaborator

Alhadis commented May 16, 2017

Is it possible to update the version of Linguist used by Lightshow? Or is that a bit more complicated than punching in a few commands?

Both Atom and Lightshow generate the same output, which is different to what's showing on GitHub. Which kinda makes it impossible to tell if changes will actually fix anything...

Other than Lightshow, is there anything else I can use for previewing highlighted output?

@lildude
Copy link
Member

lildude commented May 16, 2017

Is it possible to update the version of Linguist used by Lightshow? Or is that a bit more complicated than punching in a few commands?

As simple as updating a Gemfile and redeploying 😄 I was going to wait for the next linguist, but I can do it now if it helps you.

@Alhadis
Copy link
Collaborator

Alhadis commented May 16, 2017

Yes please! =) That'd be ace.

@Alhadis
Copy link
Collaborator

Alhadis commented May 16, 2017

@lildude Please ping me once it's done. I'm afraid there's very little I can do in the meantime.

@lildude
Copy link
Member

lildude commented May 16, 2017

@Alhadis will do. Things are so far behind that it's taking a little more than updating a Gemfile to solve. Will let you know when it's good to go.

@lildude
Copy link
Member

lildude commented May 20, 2017

@Alhadis Done. Lightshow is now running the same version of the parser and linguist (5.0.9) as we're running on github.com.

I've also added a little tweak to the interface to show the version of Linguist so peeps can determine this easily and give us a prod when we forget to update Lightshow 😄

If you have the time, and are able to sort out the HAML issue and get it merged by the end of the day on Thursday, that would be amazing.

@ezekg
Copy link

ezekg commented May 20, 2017

Hey all! Maintainer of the Haml language package jumping in here—so I'm curious as to why GitHub renders the grammar differently than Atom? Running this test code through Lightshow (and of course on GitHub) renders very different results than Atom.

This is what it looks like in Atom (the whole file renders is correctly),

image

Is the GitHub parser vastly different than the one you're using in Atom?

Let me know if I need to dig in and fix things with the Haml grammar—when I took over that language package I didn't know a whole lot about grammars. After taking it over, I went in and closed out a lot of the issues that had been lingering in the Textmate grammar (mostly multiline stuff), but most of that was honestly by trial and error playing with different regexes.

tldr; I knew regex, but not how to put together a grammar so it may be incorrect/messy.

@Alhadis
Copy link
Collaborator

Alhadis commented May 23, 2017

Pushed what I believe will be a fix.

It's a silly kludge but it fixes the interpolation highlighting:

Figure 1

@lildude
Copy link
Member

lildude commented May 23, 2017

Is the GitHub parser vastly different than the one you're using in Atom?

@ezekg it is quite different so there can be some slight differences. @Alhadis is probably the most knowledgable when it comes to getting grammar working correctly with the GitHub parser.

I think to keep this alive and getting the attention it needs, @ezekg can you move this to a new issue? 🙇

ezekg pushed a commit to HarlemSquirrel/language-haml that referenced this pull request May 23, 2017
@lildude lildude mentioned this pull request May 25, 2017
@github-linguist github-linguist locked as resolved and limited conversation to collaborators Jun 17, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants