better approach to extending markdown #74

cebe · 2014-01-29T17:01:18Z

this approach is totally based on callbacks and more consistent.

obsoletes #70 and #73

this approach is totally based on callbacks and more consistent. obsoletes erusev#70 and erusev#73

cebe · 2014-01-29T17:16:57Z

Usage example for code highlighting:

$handler = function(&$block, &$markup)
{
    if (strncmp($block['text'], '<?php', 5) === 0) {
        $text = highlight_string($block['text'], true);
    } else {
        $text = highlight_string("<?php\n".$block['text'], true);
    }

        $markup .= '<pre><code';

        if (isset($block['language']))
        {
            if ($block['language'] !== 'php') {
                return false;
            }
            $markup .= ' class="language-'.$block['language'].'"';
        }

    $markup .= '>'.$text.'</code></pre>'."\n";
    return true;
}

$p  = Parsedown::instance();
$p->register_block_hander('code', $handler);
$p->register_block_hander('fenced', $handler);
$p->parse(...);

scottchiefbaker · 2014-01-29T17:22:25Z

This would be to override the internal handlers for certain types of blocks?

cebe · 2014-01-29T17:41:04Z

if you return true you will overide it completely i.e. only your implementation will handle it. if function returns nothing or false you can manipulate the block but let Parsedown do the rendering.

keeping the array index itself does not help

scottchiefbaker · 2014-01-29T17:58:20Z

Perfect... I like this implementation. There is going to be some serious documentation/examples that will need to be written to support this. Are you also able to do that?

cebe · 2014-01-29T17:59:17Z

Sure, just wanted to make sure it is going to get accepted before writing docs.

scottchiefbaker · 2014-01-29T18:04:10Z

It's not my project, but I fully support this. I imagine @erusev's main concern would be performance for instances where the user is NOT doing any overrides. We'd want to make very sure the impact is low.

cebe · 2014-01-29T18:11:38Z

the effect on span parsing is nearly 0 as I only added isset() check in default part of switch.
In block parsing there one isset() call more per block which should also not do too much as it is one of the fastest things you can do in php.
the only thing that may have any impact is ksort() that is added to span parsing and comes ones per span element that occurs. Still nothing I would be worried about.
Imo it is worth compared to the flexibility it brings.

scottchiefbaker · 2014-01-29T18:19:52Z

Some benchmarks before/after on big blocks of Markdown would be pretty simple to do.

scottchiefbaker · 2014-01-29T18:24:47Z

If you can implement/document this without a major impact to the speed of the code (one of the major focuses of Parsedown) I'd fully support this. It seems like a great feature.

hkdobrev · 2014-01-29T20:56:37Z

@cebe These handlers should really just return HTML. Every function is a better function if it relies only on its arguments and return a value based on these arguments. The so-called "pure" functions.

I don't see why the $block and $markup should be passed by reference.

Here is an example:

$handler = function($block)
{
    if (strncmp($block['text'], '<?php', 5) === 0) {
        $text = highlight_string($block['text'], true);
    } else {
        $text = highlight_string("<?php\n".$block['text'], true);
    }

    $markup = '<pre><code';

    if (isset($block['language']))
    {
        if ($block['language'] !== 'php') {
            return null;
        }
        $markup .= ' class="language-'.$block['language'].'"';
    }

    $markup .= '>'.$text.'</code></pre>'."\n";
    return $markup;
}

$p  = Parsedown::instance();
$p->register_block_hander('code', $handler);
$p->register_block_hander('fenced', $handler);
$p->parse(...);

What you gain with your current implementation is the ability to update not only the markup, but the $block as well. Which would be a very rare case IMHO.

You could write some tests about this.

You could squash your commits into one.

cebe · 2014-02-01T18:15:13Z

Thanks for your review and comments, will get back to them when I find the time for it.

* master: simplify em/strong routine outdented is shorter and probably more accurate improve contributing guidelines improve consistency of list item add contributing guidelines dense list items that follow sparse ones should not be rendered as sparse ones improve parsing of list item and code block by measuring line indentation Remove one unnecessary /u flag. Remove /u flag from '*' chars. Add /u to urls. some edge case tests for the code tag Add unicode support for strong/em regex.

cebe · 2014-02-03T14:11:54Z

Some benchmarks before/after on big blocks of Markdown would be pretty simple to do.

Created a benchmark:

<?php

require('Parsedown.php');

$markdown = file_get_contents('http://daringfireball.net/projects/markdown/syntax.text');

$m = [];
$t = [];

for ($n = 0; $n < 1000; $n++) {
    $oldmem = memory_get_usage();
    $begin = microtime(true);

    // ---

    $pd = new Parsedown();
    $pd->parse($markdown);

    // ---

    $mem = memory_get_usage();
    $t[$n] = microtime(true) - $begin;
    $m[$n] = $mem - $oldmem;
    unset($pd);
}

// discard the first iteration
array_shift($m);
array_shift($t);

echo "memory usage: \n";
echo " - min: ". (min($m)/1024)." kb\n";
echo " - avg: ". (array_sum($m)/count($m)/1024)." kb\n";
echo " - max: ". (($max=max($m))/1024)." kb\n";
foreach($m as $k => $v) {
    if ($v == $max) {
        echo " - max at $k\n";
    }
}
echo "\n";

echo "time: \n";
echo " - min: ". (min($t))." s\n";
echo " - avg: ". (array_sum($t)/count($t))." s\n";
echo " - max: ". ($max=max($t))." s\n";
foreach($t as $k => $v) {
    if ($v > $max - 0.00001) {
        echo " - max at $k\n";
    }
}
echo "\n";

Ran it 3 times on current master:

# php benchmark.php 
memory usage: 
 - min: 4.8359375 kb
 - avg: 4.8752971721722 kb
 - max: 5.1328125 kb
 - max at 569

time: 
 - min: 0.0085000991821289 s
 - avg: 0.0086580107997249 s
 - max: 0.0097930431365967 s
 - max at 458

# php benchmark.php 
memory usage: 
 - min: 4.8359375 kb
 - avg: 4.8752971721722 kb
 - max: 5.1328125 kb
 - max at 569

time: 
 - min: 0.0084569454193115 s
 - avg: 0.0086186533575659 s
 - max: 0.010159015655518 s
 - max at 461

# php benchmark.php 
php benchmark.php 
memory usage: 
 - min: 4.8359375 kb
 - avg: 4.8752971721722 kb
 - max: 5.1328125 kb
 - max at 569

time: 
 - min: 0.0082619190216064 s
 - avg: 0.0084668878797774 s
 - max: 0.012995004653931 s
 - max at 825
 - max at 826

Ran it 3 times on this branch:

# php benchmark.php 
memory usage: 
 - min: 4.859375 kb
 - avg: 4.8985626251251 kb
 - max: 5.140625 kb
 - max at 519

time: 
 - min: 0.0085580348968506 s
 - avg: 0.0088489790697833 s
 - max: 0.013104915618896 s
 - max at 340

# php benchmark.php 
memory usage: 
 - min: 4.859375 kb
 - avg: 4.8985626251251 kb
 - max: 5.140625 kb
 - max at 519

time: 
 - min: 0.0089371204376221 s
 - avg: 0.0092274615237185 s
 - max: 0.01418399810791 s
 - max at 355

# php benchmark.php 
memory usage: 
 - min: 4.859375 kb
 - avg: 4.8985626251251 kb
 - max: 5.140625 kb
 - max at 519

time: 
 - min: 0.0089361667633057 s
 - avg: 0.0091655559845276 s
 - max: 0.010545015335083 s
 - max at 961

hkdobrev · 2014-02-03T14:13:28Z

@cebe BTW I would really love to have benchmarks for Parsedown. Do you think creating a separate issue for that and discussing it there would make sense? Thanks!

cc @erusev

cebe · 2014-02-03T14:42:31Z

@cebe These handlers should really just return HTML. Every function is a better function if it relies only on its arguments and return a value based on these arguments. The so-called "pure" functions.

I don't see why the $block and $markup should be passed by reference.

In general you are right. A good function does not take parameters by reference and modifies them when it could just return the modified result.
But as these functions here are hooks into a parser they should be designed like it.
They get a text to parse and the markup to append to but also have to ensure the text is shortened by the part they parsed. For the block it might work to only return the HTML markup but this is not possible for the inline callback. I'd like to keep both handlers consistent in how they work.
Another argument for giving by reference is that the text would be duplicated in memory in each call. when working with large files this will increase memory usage significantly.
Imo the signature of the functions fit good for the usecase in a parser.

What you gain with your current implementation is the ability to update not only the markup, but the $block as well. Which would be a very rare case IMHO.

It is not possible to call the rendering of inline markup from within the handler so you might want to modify the block and let parsedown render it finally. This can be useful for complex blocks like lists or tables where you would need to duplicate the complete rendering logic when you only want to adjust minimal parts of it.

You could squash your commits into one.

I could do this but I think there are good reasons not to do it. For example the introduction of the ksort() method will allow you to see the reason directly when doing a git blame. I can rebase the commits and squash those which could fit well into one when we agree that implementation is final.

You could write some tests about this.

will do when we agree on this to be merged. there is not much to test in this except that handlers are called in the right place so I am going to do it when code is finalized.

I would really love to have benchmarks for Parsedown. Do you think creating a separate issue for that and discussing it there would make sense? Thanks!

yeah, better open a new issue for this, it would mess up the discussion here.

cebe · 2014-02-03T14:45:21Z

@erusev can I please have your opinion on this one? Feel free to share your doubts, if any, so we can discuss them.

erusev · 2014-02-03T23:12:05Z

@cebe

Before I decide to go with an implementation that uses callbacks, I'd like to explore one that uses inheritance.

I appreciate your contributions.

p.s. It is unlikely that I'd merge a pull request that introduces a feature or changes the API.

scottchiefbaker · 2014-02-03T23:23:14Z

My vote is callbacks over inheritance. I think it would be more logical, and easier to maintain.

hkdobrev · 2014-02-03T23:48:10Z

My vote would go to inheritance. Here is why:

Extensions to Parsedown are an ideal candidate for object-oriented polymorphism. Every child would do what the parent would do with a little more. Example child is ParsedownExtra (for Markdown Extra syntax).
In the long run inheritance would contribute to the better structure of the Parsedown core.
Methods declarations (both in Parsedown and extensions) could have PHPDoc hints and more. It is less likely people would write them for closures.
There would be literally zero performance hit. You would not even think the core Parsedown could be slowed down because of an extension if you don't use it.
Extensions could be Composer packages. You could then just alias ParsedownExtra as Parsedown using the PHP 5.3+ use feature. You would use only one line of PHP to define which extension is used.
You could use PHP 5.4+ Traits. You could extend the Parsedown class and use some traits about syntax-highlighting, ASCII art or whatever and all of them could be in separate repos.

cebe · 2014-02-04T00:32:53Z

@erusev thanks for you answer. I am happy to propose an extension way that uses inheritance but I am not sure if it can be as fast as this one (as far as I understood you are concerned about speed :) ). It is also not as easy to implement as it would be with callbacks because the existing methods would need some re-arrangement.

However, I feel that I have to say that it is highly unlikely that I'd merge a pull request that introduces a feature or changes the API. I've updated the contributing guidelines to make this clear. I hope that you would understand.

I'm not sure I understand. If there is a PR that implements a feature you would like to have (there is an issue for this: #18) and it adds some methods, you are free to propose different naming and arrangement if you think it should be implemented differently. This is an open source project so you can have other people do some work for you if you just let them. Not merging PRs because you just want to do it on your own does not make sense if we can come up with a better solution together.
I am not saying that mine or your solution would be the better one but I am sure the one we think/create together would be better than one I or you have written on my own.
You have created a great tool. You should not refuse to allow making it a perfect one together.

erusev · 2014-02-04T00:39:52Z

I am not saying that mine or your solution would better but I am sure the one we think/create together would be better than one I or you have written on my own.

I'd welcome any pull request that improves an existing feature.

scottchiefbaker · 2014-02-04T00:42:11Z

Not merging PRs because you just want to do it on your own does not make sense if we can come up with a better solution together.

👍

cebe · 2014-02-04T00:48:01Z

Btw: We are currently in the process of choosing a markdown parser that will be bundled with the Yii Framework 2.0 (I am part of the core dev team there). I am quite unhappy with most of the existing implementations and I found yours which is great considering speed and I also like the implementation approach.
The only thing that is missing here is extension support so I am happy to help you adding this feature.

I could start maintaining a fork that adds it but I prefer working together with the original community and contribute directly instead of creating a competition. You have a healthy community here, you should use it.

hkdobrev · 2014-02-04T00:50:30Z

+1 to all of what @cebe just said!

erusev · 2014-02-04T00:55:39Z

The only thing that is missing here is extension support so I am happy to help you adding this feature.

I'm working on it and I'm open to ideas.

hkdobrev · 2014-02-04T00:58:46Z

@erusev If you use a branch for your ideas/experiments it would be much easier to suggest ideas.

@cebe The benchmark of Ciconia's author is really cool. It really shows Parsedown is much faster even with GFM baked-in. Thank you for the link!

rr- · 2014-03-13T19:06:02Z

👍 for subclassing approach. I (ab)use it all the time with PHP-markdown, adding my own handlers and replacing existing ones. The current implementation of the project isn't very flexible in that regard, mainly due to usage of big switches. Each of the case statement could be replaced to separate method. That alone would allow users to hack around quite a lot. I don't think calling handlers, as opposed to having them hard-coded into switch, would hit performance very much.

Edit: I see it's partially done in better-extendability branch. Great! All that's left is a way for users to add custom block parsers and allow to customize span parsing (for example, let the users configure their subclasses to parse \s*\n into <br/>, not just [ ][ ]\n).

I haven't dug into the implementation very deep, but I have this random idea to drop switches altogether in favor of some user-controllable dictionary, that would map blocks/span markers to respective handlers. The more methods, the more control users have, and that's a good thing.

hkdobrev · 2014-04-17T19:08:20Z

@erusev @cebe This should be implemented now with the many public and protected function introduced in Parsedown.

cebe · 2014-04-17T19:24:47Z

yeah, closing this then.

better approach to extending markdown

8fb24b2

this approach is totally based on callbacks and more consistent. obsoletes erusev#70 and erusev#73

This was referenced Jan 29, 2014

added pre_process_block_element() method hook #70

Closed

allows to add markers for extending span markdown parsing #73

Closed

cebe added 2 commits January 29, 2014 18:56

forgot to index inline span handler

98b5b6e

preserve order of markers

807a1fc

keeping the array index itself does not help

This was referenced Jan 30, 2014

added parseLine() method for parsing oneliners #71

Closed

Enhance api doc generator yiisoft/yii2#1797

Closed

Allow "extensions" #18

Closed

wkpark mentioned this pull request Feb 17, 2014

syntax highlight() method added #111

Closed

cebe closed this Apr 17, 2014

cebe deleted the better-extendability branch April 17, 2014 19:24

hkdobrev mentioned this pull request Dec 17, 2015

Improved extensibility by adding hooks which allow custom methods to be added without inheritance. #340

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

better approach to extending markdown #74

better approach to extending markdown #74

cebe commented Jan 29, 2014

cebe commented Jan 29, 2014

scottchiefbaker commented Jan 29, 2014

cebe commented Jan 29, 2014

scottchiefbaker commented Jan 29, 2014

cebe commented Jan 29, 2014

scottchiefbaker commented Jan 29, 2014

cebe commented Jan 29, 2014

scottchiefbaker commented Jan 29, 2014

scottchiefbaker commented Jan 29, 2014

hkdobrev commented Jan 29, 2014

cebe commented Feb 1, 2014

cebe commented Feb 3, 2014

hkdobrev commented Feb 3, 2014

cebe commented Feb 3, 2014

cebe commented Feb 3, 2014

erusev commented Feb 3, 2014

scottchiefbaker commented Feb 3, 2014

hkdobrev commented Feb 3, 2014

cebe commented Feb 4, 2014

erusev commented Feb 4, 2014

scottchiefbaker commented Feb 4, 2014

cebe commented Feb 4, 2014

hkdobrev commented Feb 4, 2014

erusev commented Feb 4, 2014

hkdobrev commented Feb 4, 2014

rr- commented Mar 13, 2014

hkdobrev commented Apr 17, 2014

cebe commented Apr 17, 2014

better approach to extending markdown #74

better approach to extending markdown #74

Conversation

cebe commented Jan 29, 2014

cebe commented Jan 29, 2014

scottchiefbaker commented Jan 29, 2014

cebe commented Jan 29, 2014

scottchiefbaker commented Jan 29, 2014

cebe commented Jan 29, 2014

scottchiefbaker commented Jan 29, 2014

cebe commented Jan 29, 2014

scottchiefbaker commented Jan 29, 2014

scottchiefbaker commented Jan 29, 2014

hkdobrev commented Jan 29, 2014

cebe commented Feb 1, 2014

cebe commented Feb 3, 2014

hkdobrev commented Feb 3, 2014

cebe commented Feb 3, 2014

cebe commented Feb 3, 2014

erusev commented Feb 3, 2014

scottchiefbaker commented Feb 3, 2014

hkdobrev commented Feb 3, 2014

cebe commented Feb 4, 2014

erusev commented Feb 4, 2014

scottchiefbaker commented Feb 4, 2014

cebe commented Feb 4, 2014

hkdobrev commented Feb 4, 2014

erusev commented Feb 4, 2014

hkdobrev commented Feb 4, 2014

rr- commented Mar 13, 2014

hkdobrev commented Apr 17, 2014

cebe commented Apr 17, 2014