Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quoted-printable encoding unrecognized on GitHub #15

Closed
1 task done
pchaigno opened this issue Aug 14, 2018 · 12 comments
Closed
1 task done

Quoted-printable encoding unrecognized on GitHub #15

pchaigno opened this issue Aug 14, 2018 · 12 comments
Labels
Bug ❤️ Community By the Open Source Community Help Wanted

Comments

@pchaigno
Copy link

pchaigno commented Aug 14, 2018

Prerequisites

I haven't reproduced the issue in Atom. Please see the description.

Description

We're considering (cf. github-linguist/linguist#4201) using this EML grammar to highlight all .eml and .mbox files on GitHub and GitLab. Unfortunately, the grammar's output has some errors (see Lightshow example) for quoted-printable encoding (if that's not obvious, I don't know much about EML).

I haven't tested under Atom yet, so it's possible the issue is only with GitHub's interpreter for grammars. GitHub uses a PCRE-based regular expression engine, which has a few differences in interpretation. I had a quick look at the grammar and did not spotted any though.

Steps to Reproduce

  1. Try to highlight a EML file containing quoted-printable encoding with Lightshow.

Expected behavior: Characters following the quoted-printable marker (=3D) to be highlighted correctly.

Actual behavior: Characters following the quoted-printable marker are highlighted with a red background.

Reproduces how often: 100%

Versions

The highlighting bug happens with commit c9f354e of the present grammar.

Additional Information

Lightshow is a simple interface to test the highlighter used on GitHub.com. It simply takes the grammar provided and interprets it on the given example.

@pchaigno
Copy link
Author

@mariozaizar Did you get a chance to look into this?

@mariozaizar
Copy link
Owner

@pchaigno I'm sorry, somehow I'm didn't got this notification until now. Let me take a closer look in the next couple days. Thanks for reporting it 🙏

@wesinator
Copy link
Contributor

=3D text looks normal in Atom; the 2 color code values are still highlighted red by the inherited HTML highlighting.

<body lang=3D"RU" link=3D"#0563C1" vlink=3D"#954F72">

@pchaigno
Copy link
Author

=3D text looks normal in Atom; the 2 color code values are still highlighted red by the inherited HTML highlighting.

@wesinator So the highlighting is the same as with Lightshow? Could you post a screenshot of the result?

@wesinator
Copy link
Contributor

@pchaigno No, Atom has the expected behaviour.

<meta http-equiv=3D"Content-Type" content=3D"text/html; charset=3Dwindows-1=
is not highlighted red.

image

@pchaigno
Copy link
Author

@wesinator Ah, I see. Thanks!

@Alhadis Looks like there is a discrepancy between the interpretations GitHub and Atom are having for this grammar. I couldn't find any non-PCRE-compliant regular expressions, but maybe you will...?

@Alhadis
Copy link
Contributor

Alhadis commented Sep 24, 2018

The expressions aren't the problem. This is:

# Content-Type: text/html;
{
	'match': '^(?i)<html>(.|\s)*<\/html>$',
	'include': 'text.html.basic'
},

We're still using the old TextMate grammar for HTML highlighting, which is the likely culprit. Ergo, we have two choices:

  1. Submit a fix upstream
  2. Switch to Atom's grammar for HTML highlighting (recommended)

@Alhadis
Copy link
Contributor

Alhadis commented Sep 24, 2018

FYI, this grammar has numerous mistakes (several patterns are incorrectly escaped, and the HTML pattern mentioned above is ignoring the match key because the include rule takes precedence. The intended entry should have been:

 # Content-Type: text/html;
 {
 	'match': '^(?i)<html>(.|\s)*<\/html>$',
-	'include': 'text.html.basic'
+	'captures':
+		'0':
+			'patterns': [
+				{
+					'include': 'text.html.basic'
+				}
+			]
+	}
 },

Or much more succinctly:

 # Content-Type: text/html;
 {
 	'match': '^(?i)<html>(.|\s)*<\/html>$',
-	'include': 'text.html.basic'
+	'captures':
+		'0': 'patterns': ['include': 'text.html.basic']
 },

But this alone is inadequate, because:

  1. TextMate patterns are restricted to matching one line at a time, so <html>(.|\s)</html> will only match lines which start with <html> and end in </html>. Multiline markup isn't matched.
  2. An opening <html> tag with attributes will fail to match, as in the examples.

The only reason there even is HTML highlighting is because of the incorrect captures key (explained above). Which is also the reason HTML highlighting appears outside of a pair of <html> tags.

I'll send a pull-request.

@mariozaizar
Copy link
Owner

mariozaizar commented Sep 24, 2018

Sorry folks, I'm quite disconnected from work and opensource for the last weeks due to an imminent baby arrival 🍼 ; but if you guys want to open PRs against this addressable comments I'm more than happy to review and merge to keep things moving. Thanks again!

@mariozaizar
Copy link
Owner

💡I just published John's fixes #16 as v1.1.1.

Unfortunately, I'm still seeing the red background on Lightshow using the new version in the same way as the old version, but as Ԝеѕ pointed out, Atom doesn't show any red background 🤔

bug

@Alhadis
Copy link
Contributor

Alhadis commented Sep 25, 2018

Aye, that's because it's loading the old TextMate grammar, which is applying the highlighting. I'm literally in the middle of prepping a PR to get it replaced on Linguist's side.

Once merged, changes will become visible across the site when the next release of Linguist has been cut.

@pchaigno
Copy link
Author

Closing this as it's an issue with the HTML grammar. Thanks a lot @Alhadis!

@mariozaizar Once the fix is in place at Linguist, we're likely to use this grammar to highlight all .eml and .mbox files on GitHub.com (cf. github-linguist/linguist#4201).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug ❤️ Community By the Open Source Community Help Wanted
Projects
None yet
Development

No branches or pull requests

4 participants