Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP

Loading…

GFM Enhancements (Backticks) #132

Merged
merged 3 commits into from

4 participants

@jaswsinc
  • Adding support for GFM backticks (for fenced code blocks); in additional to ~~~ already supported in Markdown Extra.
  • Adding support for GFM language marker with fenced code blocks.
@jaswsinc jaswsinc GFM Enhancements (Backticks)
- Adding support for GFM backticks ``` (for fenced code blocks); in additional to ~~~ already supported in Markdown Extra.
- Adding support for GFM language marker; e.g. ```php
2383db7
@jaswsinc

Hi Michel :-) I think the modifications I made (just minor tweaks); should get GFM-style code blocks working in Markdown Extra. If there are any problems, please let me know. I will be MORE than happy to adjust further and/or conform in any what that you deem necessary. Thanks for reviewing my pull request! :-)

It is now possible to do this in Markdown Extra

```
<?php
echo 'sample code';
echo 'oh nice! GFM-style fenced code blocks are now supported!';
?>
```

Also supports language specification (resulting in lang="php" as one example). This is really helpful because it allows the language of the code to be specified and later a syntax highlighter (such as Prettify) can do a better job :-)

```php
<?php
echo 'sample code';
echo 'oh nice! GFM-style fenced code blocks are now supported!';
?>
```

The existing ~~~ delimiters are still supported also of course.

~~~php
<?php
echo 'sample code';
echo 'oh nice! Markdown Extra fenced code blocks are still supported!';
?>
~~~
@michelf
Owner

The HTML lang attribute is not intended for programming languages. Quoting HTML5:

The lang attribute (in no namespace) specifies the primary language for the element's contents and for any of the element's attributes that contain text. Its value must be a valid BCP 47 language tag, or the empty string. Setting the attribute to the empty string indicates that the primary language is unknown.

Thus, "php" is an invalid value for attribute lang (not a BCP 47 language tag). Also, your language specification syntax is a conflicting subset of the optional standalone class name attribute syntax for a code block, which already works and is intended for the same purpose (identify the code's language), and also works similarly to other Markdown parsers out there. Both cannot coexist peacefully.

I'm also concerned by that you allow mixing backticks with tildes when writing a fence. It should be one or the other, mixing not allowed. Try replacing [`~]{3,} with (?:~{3,}|`{3,}), and also ([~`]+) with (~+|`+) in your regular expressions, that should do the trick.

@jaswsinc

I'm also concerned by that you allow mixing backticks with tildes when writing a fence. It should be one or the other, mixing not allowed. Try replacing [~]{3,} with (?:~{3,}|{3,}), and also ([~]+) with (~+|+) in your regular expressions, that should do the trick.

Yes, I agree on not mixing these together. Sorry, I overlooked that before. I updated this in my last commit per your instructions and tested this myself again to be sure it's all still working as expected. Looks great!


Thus, "php" is an invalid value for attribute lang (not a BCP 47 language tag). Also, your language specification syntax is a conflicting subset of the optional standalone class name attribute syntax for a code block, which already works and is intended for the same purpose (identify the code's language), and also works similarly to other Markdown parsers out there. Both cannot coexist peacefully.

I agree on this also (e.g. lang="" is not the right approach). It sounds like you want this out completely. However, before I take this out, I would ask that you please reconsider after my latest change to use data-lang="" instead, which is valid in HTML 5; see: http://ejohn.org/blog/html-5-data-attributes/

The reason I would like to see this in your lib.

This will NOT work as expected; e.g. GFM-style fenced code blocks; where the language is used commonly here at GitHub. Without the addition I propose, this will not work.

```php
<?php
echo 'hello world';
?>
```

With the latest change I propose.

Both cannot coexist peacefully.

Please correct me if I'm wrong, but I think they can. The existing implementation requires whitespace and a dotted notation; or curly brackets. The addition that I made allows the existing implementation/syntax to remain fully functional but also allows for the optional language specifier used with GFM-style fenced code blocks.

I can confirm that all of the following tests work as expected.

~~~
<?php
echo 'hello world';
?>
~~~

```
<?php
echo 'hello world';
?>
```

~~~php
<?php
echo 'hello world';
?>
~~~

```php
<?php
echo 'hello world';
?>
```

``` .html
<?php
echo 'hello world';
?>
```

~~~php .html
<?php
echo 'hello world';
?>
~~~

```php {.html #example-1}
<?php
echo 'hello world';
?>
```

This works because the language specifier CANNOT be preceded by whitespace, and it may NOT contain a dot . or a curly bracket { as would be required in the existing implementation.

$text = preg_replace_callback('{
        (?:\n|\A)
        # 1: Opening marker
        (
            (?:~{3,}|`{3,}) # 3 or more tildes/backticks.
        )
        # 2: Code language marker (optional).
        (
            [A-Za-z0-9_\-]* # Word chars and/or `-`.
        )
        [ ]*
        (?:
            \.?([-_:a-zA-Z0-9]+) # 3: Stand-alone class name.
        |
            '.$this->id_class_attr_catch_re.' # 4: Extra attributes.
        )?

That being said, if you just don't want this in there, please let me know and I'll take it out :-) Thanks Michel!

@jaswsinc

Update: I see the whitespace and dot are actually both optional in the existing implementation Hmm, it would be nice if the whitespace were required (e.g. [ ]+ instead of [ ]*), and that way they could both co-exist peacefully as you say. However, that would likely break things for people that have used your lib in this way previously. Bummer.

I'll leave this for a couple days for your review; otherwise I'll take out the language specifier completely unless you can think of a way to make it work effectively.

@michelf
Owner

The syntax allowing for a class class name is already intended to be a language specifier. We can't have two different syntaxes for the language specifier that differ only by one whitespace, that'd be too error-prone.

If you need the language specifier in the output to be more explicit, you can set the code_class_prefix configuration variable to "lang-" and the class name will become "lang-php", or "lang-c", etc. Details here.

If you need the language specifier to be something else in the output, then maybe we could add a configuration variable for that (if you can explain why the current output fails you). But that's a separate matter that should have nothing to do with Github-style code blocks.

@jaswsinc jaswsinc GFM-Style Fenced Code Blocks Only (no lang attribute)
Removing the additional code that I applied previously to implement the `data-lang` attribute. In retrospect, this was a bad idea; as the existing implementation does a good job with this already when used properly.

Many thanks to Michel Fortin for his help in accomplishing these changes :-) I hope this might be merged into the official copy at some point in the future.
ab786ae
@jaswsinc

Hi Michel. My latest commit removed the extra code I added previously to deal with data-lang; which is not necessary as you pointed out! Thanks for the help with this.

JasWSInc/php-markdown@ab786ae

Removing the additional code that I applied previously to implement the data-lang attribute. In retrospect, this was a bad idea; as the existing implementation does a good job with this already; when used properly.

Many thanks to Michel Fortin for the help in accomplishing these changes :-) I hope what is now just very MINOR tweaks; might be merged into the official copy at some point in the future.

@michelf
Owner

Looks good. I'll merge this for the next release.

@jaswsinc

Excellent news. Thank you Michel :-)

@markseu

I had a problem with markdown that mixes code span marker (single backtick) and code block (three backticks)... you might want to add a guard in function _hashHTMLBlocks_inMarkdown(). See forked example. Hope this helps!

@michelf
Owner

@markseu Unfortunately, code spans can have markers of any length, so strlen($tag)==1 won't do. This is so a code span can have backticks within them. So a backtick fenced code block also happen to be a valid code span, we just have to prioritize the code block if it matches. Here's the official documentation on Daring Fireball: http://daringfireball.net/projects/markdown/syntax#code

It's a good observation. I think this pull request has the same problem: it'll give priority to code spans in the HTML block parser. We might need to check for fenced code blocks first, and also be more stringent about matching them in a way that lets legitimate code spans pass through. Hard problem.

@markseu

@michelf Let me know how to help. I would be happy if PHP Markdown Extra (1.2.7 or later) adds GFM backticks and I can delete my fork.

@alrik11es

We really need this.

@michelf
Owner

@alrik11es I'm of the same opinion. We need this. Here's my plan:

  • incorporate new tests into MDTest -- michelf/mdtest#4
  • add a test for code spans and code blocks that could be mixed up by the HTML block parser given the order markers are checked in the HTML block aprser (as stated in my last comment)
  • add a test for a fenced code block containing another one using a different marker, in both directions
  • figure out tests that could potentially confuse the parser
  • fix any issue that arising from those tests
  • create a new version, push to lib branch, write the change log, update the website

I've reserved some time for this this week.

@michelf michelf merged commit ab786ae into michelf:lib
@michelf
Owner

Ok, so this is now merged. Rejoice! :-)

As a followup, in d6ed0d0 I fixed an issue where the HTML block parser was matching backtick fenced code blocks as code spans. This was most of the time inconsequential, as the HTML block parser treats the same way the content of a code span as it does with a code block. But if you had a marker-like backticks run somewhere in your code block it could cause problems.

@michelf michelf referenced this pull request
Closed

GitHub markdown #107

@markseu

Works great, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Commits on Sep 23, 2013
  1. @jaswsinc

    GFM Enhancements (Backticks)

    jaswsinc authored
    - Adding support for GFM backticks ``` (for fenced code blocks); in additional to ~~~ already supported in Markdown Extra.
    - Adding support for GFM language marker; e.g. ```php
Commits on Oct 1, 2013
  1. @jaswsinc
Commits on Oct 13, 2013
  1. @jaswsinc

    GFM-Style Fenced Code Blocks Only (no lang attribute)

    jaswsinc authored
    Removing the additional code that I applied previously to implement the `data-lang` attribute. In retrospect, this was a bad idea; as the existing implementation does a good job with this already when used properly.
    
    Many thanks to Michel Fortin for his help in accomplishing these changes :-) I hope this might be merged into the official copy at some point in the future.
This page is out of date. Refresh to see the latest.
Showing with 3 additions and 3 deletions.
  1. +3 −3 Michelf/Markdown.php
View
6 Michelf/Markdown.php
@@ -1846,7 +1846,7 @@ protected function _hashHTMLBlocks_inMarkdown($text, $indent = 0,
|
# Fenced code block marker
(?<= ^ | \n )
- [ ]{0,'.($indent+3).'}~{3,}
+ [ ]{0,'.($indent+3).'}(?:~{3,}|`{3,})
[ ]*
(?:
\.?[-_:a-zA-Z0-9]+ # standalone class name
@@ -1918,7 +1918,7 @@ protected function _hashHTMLBlocks_inMarkdown($text, $indent = 0,
#
# Check for: Fenced code block marker.
#
- else if (preg_match('{^\n?([ ]{0,'.($indent+3).'})(~+)}', $tag, $capture)) {
+ else if (preg_match('{^\n?([ ]{0,'.($indent+3).'})(~+|`+)}', $tag, $capture)) {
# Fenced code block marker: find matching end marker.
$fence_indent = strlen($capture[1]); # use captured indent in re
$fence_re = $capture[2]; # use captured fence in re
@@ -2767,7 +2767,7 @@ protected function doFencedCodeBlocks($text) {
(?:\n|\A)
# 1: Opening marker
(
- ~{3,} # Marker: three tilde or more.
+ (?:~{3,}|`{3,}) # 3 or more tildes/backticks.
)
[ ]*
(?:
Something went wrong with that request. Please try again.