Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

better approach to extending markdown #74

Closed
wants to merge 4 commits into from

Conversation

cebe
Copy link
Contributor

@cebe cebe commented Jan 29, 2014

this approach is totally based on callbacks and more consistent.

obsoletes #70 and #73

this approach is totally based on callbacks and more consistent.

obsoletes erusev#70 and erusev#73
@cebe
Copy link
Contributor Author

cebe commented Jan 29, 2014

Usage example for code highlighting:

$handler = function(&$block, &$markup)
{
    if (strncmp($block['text'], '<?php', 5) === 0) {
        $text = highlight_string($block['text'], true);
    } else {
        $text = highlight_string("<?php\n".$block['text'], true);
    }

        $markup .= '<pre><code';

        if (isset($block['language']))
        {
            if ($block['language'] !== 'php') {
                return false;
            }
            $markup .= ' class="language-'.$block['language'].'"';
        }

    $markup .= '>'.$text.'</code></pre>'."\n";
    return true;
}

$p  = Parsedown::instance();
$p->register_block_hander('code', $handler);
$p->register_block_hander('fenced', $handler);
$p->parse(...);

@scottchiefbaker
Copy link

This would be to override the internal handlers for certain types of blocks?

@cebe
Copy link
Contributor Author

cebe commented Jan 29, 2014

if you return true you will overide it completely i.e. only your implementation will handle it. if function returns nothing or false you can manipulate the block but let Parsedown do the rendering.

@scottchiefbaker
Copy link

Perfect... I like this implementation. There is going to be some serious documentation/examples that will need to be written to support this. Are you also able to do that?

@cebe
Copy link
Contributor Author

cebe commented Jan 29, 2014

Sure, just wanted to make sure it is going to get accepted before writing docs.

@scottchiefbaker
Copy link

It's not my project, but I fully support this. I imagine @erusev's main concern would be performance for instances where the user is NOT doing any overrides. We'd want to make very sure the impact is low.

@cebe
Copy link
Contributor Author

cebe commented Jan 29, 2014

the effect on span parsing is nearly 0 as I only added isset() check in default part of switch.
In block parsing there one isset() call more per block which should also not do too much as it is one of the fastest things you can do in php.
the only thing that may have any impact is ksort() that is added to span parsing and comes ones per span element that occurs. Still nothing I would be worried about.
Imo it is worth compared to the flexibility it brings.

@scottchiefbaker
Copy link

Some benchmarks before/after on big blocks of Markdown would be pretty simple to do.

@scottchiefbaker
Copy link

If you can implement/document this without a major impact to the speed of the code (one of the major focuses of Parsedown) I'd fully support this. It seems like a great feature.

@hkdobrev
Copy link
Contributor

@cebe These handlers should really just return HTML. Every function is a better function if it relies only on its arguments and return a value based on these arguments. The so-called "pure" functions.

I don't see why the $block and $markup should be passed by reference.

Here is an example:

$handler = function($block)
{
    if (strncmp($block['text'], '<?php', 5) === 0) {
        $text = highlight_string($block['text'], true);
    } else {
        $text = highlight_string("<?php\n".$block['text'], true);
    }

    $markup = '<pre><code';

    if (isset($block['language']))
    {
        if ($block['language'] !== 'php') {
            return null;
        }
        $markup .= ' class="language-'.$block['language'].'"';
    }

    $markup .= '>'.$text.'</code></pre>'."\n";
    return $markup;
}

$p  = Parsedown::instance();
$p->register_block_hander('code', $handler);
$p->register_block_hander('fenced', $handler);
$p->parse(...);

What you gain with your current implementation is the ability to update not only the markup, but the $block as well. Which would be a very rare case IMHO.


You could write some tests about this.


You could squash your commits into one.

@cebe
Copy link
Contributor Author

cebe commented Feb 1, 2014

Thanks for your review and comments, will get back to them when I find the time for it.

* master:
  simplify em/strong routine
  outdented is shorter and probably more accurate
  improve contributing guidelines
  improve consistency of list item
  add contributing guidelines
  dense list items that follow sparse ones should not be rendered as sparse ones
  improve parsing of list item and code block by measuring line indentation
  Remove one unnecessary /u flag.
  Remove /u flag from '*' chars. Add /u to urls.
  some edge case tests for the code tag
  Add unicode support for strong/em regex.
@cebe
Copy link
Contributor Author

cebe commented Feb 3, 2014

Some benchmarks before/after on big blocks of Markdown would be pretty simple to do.

Created a benchmark:

<?php

require('Parsedown.php');

$markdown = file_get_contents('http://daringfireball.net/projects/markdown/syntax.text');

$m = [];
$t = [];

for ($n = 0; $n < 1000; $n++) {
    $oldmem = memory_get_usage();
    $begin = microtime(true);

    // ---

    $pd = new Parsedown();
    $pd->parse($markdown);

    // ---

    $mem = memory_get_usage();
    $t[$n] = microtime(true) - $begin;
    $m[$n] = $mem - $oldmem;
    unset($pd);
}

// discard the first iteration
array_shift($m);
array_shift($t);

echo "memory usage: \n";
echo " - min: ". (min($m)/1024)." kb\n";
echo " - avg: ". (array_sum($m)/count($m)/1024)." kb\n";
echo " - max: ". (($max=max($m))/1024)." kb\n";
foreach($m as $k => $v) {
    if ($v == $max) {
        echo " - max at $k\n";
    }
}
echo "\n";

echo "time: \n";
echo " - min: ". (min($t))." s\n";
echo " - avg: ". (array_sum($t)/count($t))." s\n";
echo " - max: ". ($max=max($t))." s\n";
foreach($t as $k => $v) {
    if ($v > $max - 0.00001) {
        echo " - max at $k\n";
    }
}
echo "\n";

Ran it 3 times on current master:

# php benchmark.php 
memory usage: 
 - min: 4.8359375 kb
 - avg: 4.8752971721722 kb
 - max: 5.1328125 kb
 - max at 569

time: 
 - min: 0.0085000991821289 s
 - avg: 0.0086580107997249 s
 - max: 0.0097930431365967 s
 - max at 458

# php benchmark.php 
memory usage: 
 - min: 4.8359375 kb
 - avg: 4.8752971721722 kb
 - max: 5.1328125 kb
 - max at 569

time: 
 - min: 0.0084569454193115 s
 - avg: 0.0086186533575659 s
 - max: 0.010159015655518 s
 - max at 461

# php benchmark.php 
php benchmark.php 
memory usage: 
 - min: 4.8359375 kb
 - avg: 4.8752971721722 kb
 - max: 5.1328125 kb
 - max at 569

time: 
 - min: 0.0082619190216064 s
 - avg: 0.0084668878797774 s
 - max: 0.012995004653931 s
 - max at 825
 - max at 826

Ran it 3 times on this branch:

# php benchmark.php 
memory usage: 
 - min: 4.859375 kb
 - avg: 4.8985626251251 kb
 - max: 5.140625 kb
 - max at 519

time: 
 - min: 0.0085580348968506 s
 - avg: 0.0088489790697833 s
 - max: 0.013104915618896 s
 - max at 340

# php benchmark.php 
memory usage: 
 - min: 4.859375 kb
 - avg: 4.8985626251251 kb
 - max: 5.140625 kb
 - max at 519

time: 
 - min: 0.0089371204376221 s
 - avg: 0.0092274615237185 s
 - max: 0.01418399810791 s
 - max at 355

# php benchmark.php 
memory usage: 
 - min: 4.859375 kb
 - avg: 4.8985626251251 kb
 - max: 5.140625 kb
 - max at 519

time: 
 - min: 0.0089361667633057 s
 - avg: 0.0091655559845276 s
 - max: 0.010545015335083 s
 - max at 961

@hkdobrev
Copy link
Contributor

hkdobrev commented Feb 3, 2014

@cebe BTW I would really love to have benchmarks for Parsedown. Do you think creating a separate issue for that and discussing it there would make sense? Thanks!

cc @erusev

@cebe
Copy link
Contributor Author

cebe commented Feb 3, 2014

@cebe These handlers should really just return HTML. Every function is a better function if it relies only on its arguments and return a value based on these arguments. The so-called "pure" functions.

I don't see why the $block and $markup should be passed by reference.

In general you are right. A good function does not take parameters by reference and modifies them when it could just return the modified result.
But as these functions here are hooks into a parser they should be designed like it.
They get a text to parse and the markup to append to but also have to ensure the text is shortened by the part they parsed. For the block it might work to only return the HTML markup but this is not possible for the inline callback. I'd like to keep both handlers consistent in how they work.
Another argument for giving by reference is that the text would be duplicated in memory in each call. when working with large files this will increase memory usage significantly.
Imo the signature of the functions fit good for the usecase in a parser.

What you gain with your current implementation is the ability to update not only the markup, but the $block as well. Which would be a very rare case IMHO.

It is not possible to call the rendering of inline markup from within the handler so you might want to modify the block and let parsedown render it finally. This can be useful for complex blocks like lists or tables where you would need to duplicate the complete rendering logic when you only want to adjust minimal parts of it.

You could squash your commits into one.

I could do this but I think there are good reasons not to do it. For example the introduction of the ksort() method will allow you to see the reason directly when doing a git blame. I can rebase the commits and squash those which could fit well into one when we agree that implementation is final.

You could write some tests about this.

will do when we agree on this to be merged. there is not much to test in this except that handlers are called in the right place so I am going to do it when code is finalized.

I would really love to have benchmarks for Parsedown. Do you think creating a separate issue for that and discussing it there would make sense? Thanks!

yeah, better open a new issue for this, it would mess up the discussion here.

@cebe
Copy link
Contributor Author

cebe commented Feb 3, 2014

@erusev can I please have your opinion on this one? Feel free to share your doubts, if any, so we can discuss them.

@erusev
Copy link
Owner

erusev commented Feb 3, 2014

@cebe

Before I decide to go with an implementation that uses callbacks, I'd like to explore one that uses inheritance.

I appreciate your contributions.

p.s. It is unlikely that I'd merge a pull request that introduces a feature or changes the API.

@scottchiefbaker
Copy link

My vote is callbacks over inheritance. I think it would be more logical, and easier to maintain.

@hkdobrev
Copy link
Contributor

hkdobrev commented Feb 3, 2014

My vote would go to inheritance. Here is why:

  • Extensions to Parsedown are an ideal candidate for object-oriented polymorphism. Every child would do what the parent would do with a little more. Example child is ParsedownExtra (for Markdown Extra syntax).
  • In the long run inheritance would contribute to the better structure of the Parsedown core.
  • Methods declarations (both in Parsedown and extensions) could have PHPDoc hints and more. It is less likely people would write them for closures.
  • There would be literally zero performance hit. You would not even think the core Parsedown could be slowed down because of an extension if you don't use it.
  • Extensions could be Composer packages. You could then just alias ParsedownExtra as Parsedown using the PHP 5.3+ use feature. You would use only one line of PHP to define which extension is used.
  • You could use PHP 5.4+ Traits. You could extend the Parsedown class and use some traits about syntax-highlighting, ASCII art or whatever and all of them could be in separate repos.

@cebe
Copy link
Contributor Author

cebe commented Feb 4, 2014

@erusev thanks for you answer. I am happy to propose an extension way that uses inheritance but I am not sure if it can be as fast as this one (as far as I understood you are concerned about speed :) ). It is also not as easy to implement as it would be with callbacks because the existing methods would need some re-arrangement.

However, I feel that I have to say that it is highly unlikely that I'd merge a pull request that introduces a feature or changes the API. I've updated the contributing guidelines to make this clear. I hope that you would understand.

I'm not sure I understand. If there is a PR that implements a feature you would like to have (there is an issue for this: #18) and it adds some methods, you are free to propose different naming and arrangement if you think it should be implemented differently. This is an open source project so you can have other people do some work for you if you just let them. Not merging PRs because you just want to do it on your own does not make sense if we can come up with a better solution together.
I am not saying that mine or your solution would be the better one but I am sure the one we think/create together would be better than one I or you have written on my own.
You have created a great tool. You should not refuse to allow making it a perfect one together.

@erusev
Copy link
Owner

erusev commented Feb 4, 2014

I am not saying that mine or your solution would better but I am sure the one we think/create together would be better than one I or you have written on my own.

I'd welcome any pull request that improves an existing feature.

@scottchiefbaker
Copy link

Not merging PRs because you just want to do it on your own does not make sense if we can come up with a better solution together.

👍

@cebe
Copy link
Contributor Author

cebe commented Feb 4, 2014

Btw: We are currently in the process of choosing a markdown parser that will be bundled with the Yii Framework 2.0 (I am part of the core dev team there). I am quite unhappy with most of the existing implementations and I found yours which is great considering speed and I also like the implementation approach.
The only thing that is missing here is extension support so I am happy to help you adding this feature.

I could start maintaining a fork that adds it but I prefer working together with the original community and contribute directly instead of creating a competition. You have a healthy community here, you should use it.

@hkdobrev
Copy link
Contributor

hkdobrev commented Feb 4, 2014

+1 to all of what @cebe just said!

@erusev
Copy link
Owner

erusev commented Feb 4, 2014

The only thing that is missing here is extension support so I am happy to help you adding this feature.

I'm working on it and I'm open to ideas.

@hkdobrev
Copy link
Contributor

hkdobrev commented Feb 4, 2014

@erusev If you use a branch for your ideas/experiments it would be much easier to suggest ideas.

@cebe The benchmark of Ciconia's author is really cool. It really shows Parsedown is much faster even with GFM baked-in. Thank you for the link!

@rr-
Copy link

rr- commented Mar 13, 2014

👍 for subclassing approach. I (ab)use it all the time with PHP-markdown, adding my own handlers and replacing existing ones. The current implementation of the project isn't very flexible in that regard, mainly due to usage of big switches. Each of the case statement could be replaced to separate method. That alone would allow users to hack around quite a lot. I don't think calling handlers, as opposed to having them hard-coded into switch, would hit performance very much.

Edit: I see it's partially done in better-extendability branch. Great! All that's left is a way for users to add custom block parsers and allow to customize span parsing (for example, let the users configure their subclasses to parse \s*\n into <br/>, not just [ ][ ]\n).

I haven't dug into the implementation very deep, but I have this random idea to drop switches altogether in favor of some user-controllable dictionary, that would map blocks/span markers to respective handlers. The more methods, the more control users have, and that's a good thing.

@hkdobrev
Copy link
Contributor

@erusev @cebe This should be implemented now with the many public and protected function introduced in Parsedown.

@cebe
Copy link
Contributor Author

cebe commented Apr 17, 2014

yeah, closing this then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants