Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GFM Enhancements (Backticks) #132

Merged
merged 3 commits into from Nov 28, 2013
Merged

GFM Enhancements (Backticks) #132

merged 3 commits into from Nov 28, 2013

Conversation

jaswrks
Copy link
Contributor

@jaswrks jaswrks commented Sep 23, 2013

  • Adding support for GFM backticks (for fenced code blocks); in additional to ~~~ already supported in Markdown Extra.
  • Adding support for GFM language marker with fenced code blocks.

- Adding support for GFM backticks ``` (for fenced code blocks); in additional to ~~~ already supported in Markdown Extra.
- Adding support for GFM language marker; e.g. ```php
@jaswrks
Copy link
Contributor Author

jaswrks commented Sep 23, 2013

Hi Michel :-) I think the modifications I made (just minor tweaks); should get GFM-style code blocks working in Markdown Extra. If there are any problems, please let me know. I will be MORE than happy to adjust further and/or conform in any what that you deem necessary. Thanks for reviewing my pull request! :-)

It is now possible to do this in Markdown Extra

```
<?php
echo 'sample code';
echo 'oh nice! GFM-style fenced code blocks are now supported!';
?>
```

Also supports language specification (resulting in lang="php" as one example). This is really helpful because it allows the language of the code to be specified and later a syntax highlighter (such as Prettify) can do a better job :-)

```php
<?php
echo 'sample code';
echo 'oh nice! GFM-style fenced code blocks are now supported!';
?>
```

The existing ~~~ delimiters are still supported also of course.

~~~php
<?php
echo 'sample code';
echo 'oh nice! Markdown Extra fenced code blocks are still supported!';
?>
~~~

@michelf
Copy link
Owner

michelf commented Sep 25, 2013

The HTML lang attribute is not intended for programming languages. Quoting HTML5:

The lang attribute (in no namespace) specifies the primary language for the element's contents and for any of the element's attributes that contain text. Its value must be a valid BCP 47 language tag, or the empty string. Setting the attribute to the empty string indicates that the primary language is unknown.

Thus, "php" is an invalid value for attribute lang (not a BCP 47 language tag). Also, your language specification syntax is a conflicting subset of the optional standalone class name attribute syntax for a code block, which already works and is intended for the same purpose (identify the code's language), and also works similarly to other Markdown parsers out there. Both cannot coexist peacefully.

I'm also concerned by that you allow mixing backticks with tildes when writing a fence. It should be one or the other, mixing not allowed. Try replacing []{3,}with(?:{3,}|{3,}), and also ([~]+)with(~+|+) in your regular expressions, that should do the trick.

@jaswrks
Copy link
Contributor Author

jaswrks commented Oct 1, 2013

I'm also concerned by that you allow mixing backticks with tildes when writing a fence. It should be one or the other, mixing not allowed. Try replacing [~]{3,} with (?:~{3,}|{3,}), and also ([~]+) with (~+|+) in your regular expressions, that should do the trick.

Yes, I agree on not mixing these together. Sorry, I overlooked that before. I updated this in my last commit per your instructions and tested this myself again to be sure it's all still working as expected. Looks great!


Thus, "php" is an invalid value for attribute lang (not a BCP 47 language tag). Also, your language specification syntax is a conflicting subset of the optional standalone class name attribute syntax for a code block, which already works and is intended for the same purpose (identify the code's language), and also works similarly to other Markdown parsers out there. Both cannot coexist peacefully.

I agree on this also (e.g. lang="" is not the right approach). It sounds like you want this out completely. However, before I take this out, I would ask that you please reconsider after my latest change to use data-lang="" instead, which is valid in HTML 5; see: http://ejohn.org/blog/html-5-data-attributes/

The reason I would like to see this in your lib.

This will NOT work as expected; e.g. GFM-style fenced code blocks; where the language is used commonly here at GitHub. Without the addition I propose, this will not work.

```php
<?php
echo 'hello world';
?>
```

With the latest change I propose.

Both cannot coexist peacefully.

Please correct me if I'm wrong, but I think they can. The existing implementation requires whitespace and a dotted notation; or curly brackets. The addition that I made allows the existing implementation/syntax to remain fully functional but also allows for the optional language specifier used with GFM-style fenced code blocks.

I can confirm that all of the following tests work as expected.

~~~
<?php
echo 'hello world';
?>
~~~

```
<?php
echo 'hello world';
?>
```

~~~php
<?php
echo 'hello world';
?>
~~~

```php
<?php
echo 'hello world';
?>
```

``` .html
<?php
echo 'hello world';
?>
```

~~~php .html
<?php
echo 'hello world';
?>
~~~

```php {.html #example-1}
<?php
echo 'hello world';
?>
```

This works because the language specifier CANNOT be preceded by whitespace, and it may NOT contain a dot . or a curly bracket { as would be required in the existing implementation.

$text = preg_replace_callback('{
        (?:\n|\A)
        # 1: Opening marker
        (
            (?:~{3,}|`{3,}) # 3 or more tildes/backticks.
        )
        # 2: Code language marker (optional).
        (
            [A-Za-z0-9_\-]* # Word chars and/or `-`.
        )
        [ ]*
        (?:
            \.?([-_:a-zA-Z0-9]+) # 3: Stand-alone class name.
        |
            '.$this->id_class_attr_catch_re.' # 4: Extra attributes.
        )?

That being said, if you just don't want this in there, please let me know and I'll take it out :-) Thanks Michel!

@jaswrks
Copy link
Contributor Author

jaswrks commented Oct 1, 2013

Update: I see the whitespace and dot are actually both optional in the existing implementation Hmm, it would be nice if the whitespace were required (e.g. [ ]+ instead of [ ]*), and that way they could both co-exist peacefully as you say. However, that would likely break things for people that have used your lib in this way previously. Bummer.

I'll leave this for a couple days for your review; otherwise I'll take out the language specifier completely unless you can think of a way to make it work effectively.

@michelf
Copy link
Owner

michelf commented Oct 5, 2013

The syntax allowing for a class class name is already intended to be a language specifier. We can't have two different syntaxes for the language specifier that differ only by one whitespace, that'd be too error-prone.

If you need the language specifier in the output to be more explicit, you can set the code_class_prefix configuration variable to "lang-" and the class name will become "lang-php", or "lang-c", etc. Details here.

If you need the language specifier to be something else in the output, then maybe we could add a configuration variable for that (if you can explain why the current output fails you). But that's a separate matter that should have nothing to do with Github-style code blocks.

Removing the additional code that I applied previously to implement the `data-lang` attribute. In retrospect, this was a bad idea; as the existing implementation does a good job with this already when used properly.

Many thanks to Michel Fortin for his help in accomplishing these changes :-) I hope this might be merged into the official copy at some point in the future.
@jaswrks
Copy link
Contributor Author

jaswrks commented Oct 13, 2013

Hi Michel. My latest commit removed the extra code I added previously to deal with data-lang; which is not necessary as you pointed out! Thanks for the help with this.

https://github.com/JasWSInc/php-markdown/commit/ab786ae1d5f8d83ff661fb1dd5a78f07a27c8e8b

Removing the additional code that I applied previously to implement the data-lang attribute. In retrospect, this was a bad idea; as the existing implementation does a good job with this already; when used properly.

Many thanks to Michel Fortin for the help in accomplishing these changes :-) I hope what is now just very MINOR tweaks; might be merged into the official copy at some point in the future.

@michelf
Copy link
Owner

michelf commented Nov 6, 2013

Looks good. I'll merge this for the next release.

@jaswrks
Copy link
Contributor Author

jaswrks commented Nov 6, 2013

Excellent news. Thank you Michel :-)

@markseuffert
Copy link

I had a problem with markdown that mixes code span marker (single backtick) and code block (three backticks)... you might want to add a guard in function _hashHTMLBlocks_inMarkdown(). See forked example. Hope this helps!

@michelf
Copy link
Owner

michelf commented Nov 11, 2013

@markseu Unfortunately, code spans can have markers of any length, so strlen($tag)==1 won't do. This is so a code span can have backticks within them. So a backtick fenced code block also happen to be a valid code span, we just have to prioritize the code block if it matches. Here's the official documentation on Daring Fireball: http://daringfireball.net/projects/markdown/syntax#code

It's a good observation. I think this pull request has the same problem: it'll give priority to code spans in the HTML block parser. We might need to check for fenced code blocks first, and also be more stringent about matching them in a way that lets legitimate code spans pass through. Hard problem.

@markseuffert
Copy link

@michelf Let me know how to help. I would be happy if PHP Markdown Extra (1.2.7 or later) adds GFM backticks and I can delete my fork.

@alrik11es
Copy link

We really need this.

@michelf
Copy link
Owner

michelf commented Nov 28, 2013

@alrik11es I'm of the same opinion. We need this. Here's my plan:

  • incorporate new tests into MDTest -- Add test cases for backtick fenced code blocks mdtest#4
  • add a test for code spans and code blocks that could be mixed up by the HTML block parser given the order markers are checked in the HTML block aprser (as stated in my last comment)
  • add a test for a fenced code block containing another one using a different marker, in both directions
  • figure out tests that could potentially confuse the parser
  • fix any issue that arising from those tests
  • create a new version, push to lib branch, write the change log, update the website

I've reserved some time for this this week.

@michelf michelf merged commit ab786ae into michelf:lib Nov 28, 2013
@michelf
Copy link
Owner

michelf commented Nov 28, 2013

Ok, so this is now merged. Rejoice! :-)

As a followup, in d6ed0d0 I fixed an issue where the HTML block parser was matching backtick fenced code blocks as code spans. This was most of the time inconsequential, as the HTML block parser treats the same way the content of a code span as it does with a code block. But if you had a marker-like backticks run somewhere in your code block it could cause problems.

@michelf michelf mentioned this pull request Nov 28, 2013
@markseuffert
Copy link

Works great, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants