According to the web component spec, a dash is required, which breaks md parsing #407

Closed
rodsimpson opened this issue Jul 13, 2016 · 4 comments

@rodsimpson

rodsimpson commented Jul 13, 2016

If you use web components in your HTML / Markdown, the element name must contain a dash. For example:

Invalid:
<mycomponent></mycomponent>

Valid:
<my-component></my-component>

The latter (valid) case stops further parsing of Markdown. For example:

Works:
<mycomponent></mycomponent>
# header

Doesn't work:
<my-component></my-component>
# header

In the second case, # header is not converted to an H1 tag, but it should be.

@pablotheissen
Contributor

pablotheissen commented Sep 22, 2016

The following seems to fix it:

Line 676:
- if (preg_match('/^<(\w*)(?:[ ]*'.$this->regexHtmlAttribute.')*[ ]*(\/)?>/', $Line['text'], $matches))
+ if (preg_match('/^<(\w[\w-]*)(?:[ ]*'.$this->regexHtmlAttribute.')*[ ]*(\/)?>/', $Line['text'], $matches))

Ditto on line 1277:
- if ($Excerpt['text'][1] !== ' ' and preg_match('/^<\w*(?:[ ]*'.$this->regexHtmlAttribute.')*[ ]*\/?>/s', $Excerpt['text'], $matches))
+ if ($Excerpt['text'][1] !== ' ' and preg_match('/^<\w[\w-]*(?:[ ]*'.$this->regexHtmlAttribute.')*[ ]*\/?>/s', $Excerpt['text'], $matches))

Line 1261:
- if ($Excerpt['text'][1] === '/' and preg_match('/^<\/\w*[ ]*>/s', $Excerpt['text'], $matches))
+ if ($Excerpt['text'][1] === '/' and preg_match('/^<\/\w[\w-]*[ ]*>/s', $Excerpt['text'], $matches))

The changed regex allows custom element names such as <valid-tag> but not <-notvalid>. I have not tested the code against all the other Unicode characters mentioned in the link above.

-- edit: after studying the WHATWG document again: <valid-.> as well as <valid-ℬ> should also be valid, but supporting them would make the RegExp far more complex.

-- edit2: changed the regexps so that closing tags (line 1261) are matched as well, which didn't work before.
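
For anyone who wants to try the change outside Parsedown, here is a minimal sketch of the line 676 patterns before and after the patch. It is a simplification, not the actual Parsedown code: the attribute part ($this->regexHtmlAttribute) is left out, so only bare tags are tested, and the names $before, $after, $samples and $results are made up for this example.

```php
<?php
// Simplified stand-ins for the line 676 patterns. Assumption: the attribute
// sub-pattern ($this->regexHtmlAttribute) is omitted, so only bare tags match.
$before = '/^<(\w*)[ ]*(\/)?>/';       // original: tag name is \w* only
$after  = '/^<(\w[\w-]*)[ ]*(\/)?>/';  // patched: dashes allowed after the first character

$samples = ['<mycomponent>', '<my-component>', '<my-component />', '<-notvalid>'];

// Record whether each sample is recognised by the old and the new pattern.
$results = [];
foreach ($samples as $tag) {
    $results[$tag] = [
        'before' => preg_match($before, $tag) === 1,
        'after'  => preg_match($after, $tag) === 1,
    ];
}

foreach ($results as $tag => $r) {
    printf(
        "%-18s before=%s after=%s\n",
        $tag,
        var_export($r['before'], true),
        var_export($r['after'], true)
    );
}
```

With these simplified patterns, only the patched one accepts the dashed names, and both reject a name that starts with a dash.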

@PhrozenByte
Contributor

@pablotheissen: Don't overcomplicate it. Parsedown is not an HTML validator; it's about matching HTML elements, not validating them. So just add the . next to the - as an allowed character and this should be good to go.
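
A rough sketch of what that suggestion could look like, again with the attribute part dropped for brevity. The pattern in $dashAndDot is only one reading of "add the . next to the -", not code taken from Parsedown, and the variable names are made up for this example.

```php
<?php
// Assumption: simplified patterns without $this->regexHtmlAttribute.
$dashOnly   = '/^<(\w[\w-]*)[ ]*(\/)?>/';   // pattern from the proposed patch
$dashAndDot = '/^<(\w[\w.-]*)[ ]*(\/)?>/';  // hypothetical: "." added to the character class

$dotSamples = ['<valid-tag>', '<valid-.>'];

$dotResults = [];
foreach ($dotSamples as $tag) {
    $dotResults[$tag] = [
        'dashOnly'   => preg_match($dashOnly, $tag) === 1,
        'dashAndDot' => preg_match($dashAndDot, $tag) === 1,
    ];
}

foreach ($dotResults as $tag => $r) {
    printf(
        "%-12s dashOnly=%s dashAndDot=%s\n",
        $tag,
        var_export($r['dashOnly'], true),
        var_export($r['dashAndDot'], true)
    );
}
```

Non-ASCII names such as <valid-ℬ> are not covered by either pattern in this sketch.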

@taufik-nurrohman

What about namespaces? For example, <foo:bar baz:qux="xyz">

/<[\w:.-]+(\s[^<>]*?)?>/
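
A quick way to see what the pattern above accepts is to run it over a few inputs. This is only a sketch: the sample strings and variable names are made up for illustration. Note that the pattern is not anchored and puts no restriction on the first character of the name, so it is looser than the patched Parsedown patterns.

```php
<?php
// The namespace-friendly pattern quoted above, unchanged.
$nsPattern = '/<[\w:.-]+(\s[^<>]*?)?>/';

// Hypothetical sample inputs.
$nsSamples = [
    '<foo:bar baz:qux="xyz">',  // namespaced tag and attribute
    '<my-component>',           // dashed custom element
    '<valid-.>',                // dot in the name
    '<-notvalid>',              // leading dash, still accepted by this pattern
    '< foo>',                   // space right after "<", no tag name
];

$nsResults = [];
foreach ($nsSamples as $input) {
    $nsResults[$input] = preg_match($nsPattern, $input) === 1;
}

foreach ($nsResults as $input => $matched) {
    printf("%-26s %s\n", $input, $matched ? 'match' : 'no match');
}
```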

@aidantwoods
Collaborator

Provided that you include a newline between the HTML and the Markdown that follows it, this should work as expected in the next release :)
