Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add noMarkup option to escape user HTML #224

Merged
merged 1 commit into from
Sep 21, 2014
Merged

Add noMarkup option to escape user HTML #224

merged 1 commit into from
Sep 21, 2014

Conversation

hkdobrev
Copy link
Contributor

Resolves #106. Addresses #161.

This change introduces a new option - noMarkup. You could set it in the
setNoMarkup() method similar to the setBreaksEnabled() one.

Example usage:

<?php

$parsedown = new Parsedown();
$parsedown->setNoMarkup(true);
$parsedown->text('<div><strong>*Some text*</strong></div>');

// Outputs:
// <p>&lt;div>&lt;strong><em>Some text</em>&lts;/strong>&lt;/div></p>

Resolves #106.

This change introduces a new option - `noMarkup`. You could set it the
`setNoMarkup()` method similar to the `setBreaksEnabled()` one.

Example usage:

``` php
<?php

$parsedown = new Parsedown();
$parsedown->setNoMarkup(true);
$parsedown->text('<div><strong>*Some text*</strong></div>');

// Outputs:
// <p>&lt;div>&lt;strong><em>Some text</em>&lts;/strong>&lt;/div></p>
```
This was referenced Sep 20, 2014
@xPaw
Copy link

xPaw commented Sep 20, 2014

Shouldn't it escape '>' like it does with '<'?

@hkdobrev
Copy link
Contributor Author

@xPaw No need. http://stackoverflow.com/questions/9010678/html-should-i-encode-greater-than-or-not-gt

I think you'd need to escape < in an XML document.

It should also be escaped in attributes, but this is not relevant to this change.

I think Parsedown does not escape it for performance reasons and because there is no reason to do it.

@xPaw
Copy link

xPaw commented Sep 20, 2014

HTML spec suggests escaping it. See http://www.w3.org/html/wg/drafts/html/master/syntax.html#serializing-html-fragments

This includes & and ".

@hkdobrev
Copy link
Contributor Author

@xPaw I believe this text in the spec is targeted towards user agents and other software parsing HTML or working with the DOM.

Even if Parsedown should escape <, this is not related to this pull request. This is the way Parsedown works regardless of the noMarkup option. You should probably discuss it here: #197 or open a new issue.

@hkdobrev
Copy link
Contributor Author

BTW @javiereguiluz has raised the question about releasing in #106 (comment)

@erusev I'd like to add that if you want to be very strict about Semantic Versioning this should be released as a new minor version instead of a patch version, because it changes the interface and the major version is bigger than 0. It doesn't matter that it is backwards compatible.

  1. Minor version Y (x.Y.z | x > 0) MUST be incremented if new, backwards compatible functionality is introduced to the public API.

erusev added a commit that referenced this pull request Sep 21, 2014
Add `noMarkup` option to escape user HTML
@erusev erusev merged commit e0965ce into erusev:master Sep 21, 2014
@erusev
Copy link
Owner

erusev commented Sep 21, 2014

@hkdobrev

I'd like to add that if you want to be very strict about Semantic Versioning this should be released as a new minor version instead of a patch version

👍

@hkdobrev hkdobrev deleted the no-markup-option branch September 21, 2014 20:23
@hkdobrev
Copy link
Contributor Author

@erusev I've just updated the wiki with the example usage.

https://github.com/erusev/parsedown/wiki/Usage

@erusev
Copy link
Owner

erusev commented Sep 21, 2014

@hkdobrev I'm contemplating changing the name of the method to make it more consistent with the name of the other setter. Probably to setMarkupEscaped.

@hkdobrev
Copy link
Contributor Author

@erusev I was considering the following names: disallowMarkup, setMarkupDisallowed, setEscapeMarkup, setDisallowHtml, setNoHtml.

I went for setNoMarkup for the following reasons:

  • It is consistent with what users have used with other parsers.
  • It is short and clear.
  • It is very clear that you are setting an option which is named noMarkup. In the case of setMarkupEscaped one could think you could pass it some markup which is already escaped. Generally such options should be named so it is clear they are options. Thus said I am not happy with breaksEnabled as well. It should be enableBreaks and the method would be setEnableBreaks.

@erusev
Copy link
Owner

erusev commented Sep 21, 2014

@hkdobrev

The name needs to be an adjective, as it represents a boolean, and the attribute in question doesn't remove markup. It escapes it.

I appreciate the PR and I hope you wouldn't mind me making this change.

@markseuffert
Copy link

Does setMarkupEscaped(true) escape HTML elements and entities?

@erusev
Copy link
Owner

erusev commented Oct 9, 2014

@markseu It does. You can test at http://parsedown.org/demo?set[MarkupEscaped]=1.

@javiereguiluz
Copy link

@markseu it does and you must be aware of this if you use the HTML generated by Parsedown afterwards. For instance, I use Parsedown to convert Markdown into HTML and then I use GeSHi to highlight the code listings. This is my workflow:

// use Parsedown to get HTML from Markdown
$parser = new \ParsedownExtra();
$parser->setMarkupEscaped(true);

$html = $parser->text($content);


// highlight code with GeSHi
$html = preg_replace_callback(
    '/^<pre><code( class="language\-(?<syntax>.*)")?>(?<source>.*)<\/code><\/pre>/Ums',
    'highlightCode',
    $html
);

The problem is that I need an ugly hack on highlightCode() function in order to unescape elements and entities. Otherwise, GeSHi would double escape them:

public function highlightCode($matches)
{
    $source = trim($matches['source']);

    // this is the ugly hack to avoid double escaping
    $source = str_replace(
        array('&lt;', '&gt;', '&amp;'),
        array('<',    '>',    '&'),
        $source
    );

    // highlight source code with GeSHi ...
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parse all html tags as normal strings
5 participants