Parser completely removes body elements if html string has open <html> and <body> tags, but does not have close </body> and </html> tags #18

sanex3339 · 2019-10-29T05:06:20Z

Expected Behavior

Parser should try to keep all existing html tags if the html string has opening <html> and <body> tags, but does not have the closing </body> and </html> tags

Actual Behavior

Parser completely removes all child elements if the html string has opening <html> and <body> tags, but does not have the matching closing tags

Steps to Reproduce

Just try to parse the following html string:

    <html>
      <body>
        <h1 style="font-family: Arial;">
          html-react-parser
        </h1>

Reproducible Demo

https://jsfiddle.net/d2g59ch4/

Environment

Version: 0.2.2
Platform: Mac OS
Browser: Chrome 77


sanex3339 · 2019-10-29T08:04:58Z

So, looks like the problem is here:
https://github.com/remarkablemark/html-dom-parser/blob/master/lib/domparser.js#L140

remarkablemark · 2019-10-31T01:37:13Z

Thanks for opening this issue (and closing remarkablemark/html-react-parser#126) @sanex3339

I can confirm that this is a bug. Along with your fiddle, I can reproduce that this happens on the client parser (see repl) and not on the server parser (see repl).

Would you be interested in making a fix? Otherwise, I should have some time over the weekend to work on it.

sanex3339 · 2019-10-31T17:21:55Z

Hi, unfortunately, I don't have time to fix this bug, but I can wait until you fix it.

remarkablemark · 2019-11-01T01:46:14Z

No worries, I'll let you know once I have the fix.

Because the head and body regexes test against the closing tag, this causes html with unclosed head or body to not be parsed correctly.

For example, given the following:

```js
parse('<html><body>');
```

The expected output is:

```
[
  {
    type: 'tag',
    name: 'html',
    attribs: {},
    children: [
      {
        type: 'tag',
        name: 'body',
        attribs: {},
        children: [],
        next: null,
        prev: null,
        parent: [Circular]
      }
    ],
    next: null,
    prev: null,
    parent: null
  }
]
```

But the actual output is:

```
[
  {
    "next": null,
    "prev": null,
    "parent": null,
    "name": "html",
    "attribs": {},
    "type": "tag",
    "children": []
  }
]
```

The fix is to update the regex to use the opening tag instead of the closing tag.

Add test case.

Fixes #18
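To make the closing-tag versus opening-tag distinction concrete, here is a minimal sketch of the idea. The pattern names (`CLOSING_BODY`, `OPENING_BODY`) and the `hasBody` helper are hypothetical and written only for illustration; they are not copied from `lib/domparser.js`, and the actual regexes in html-dom-parser may differ.

```javascript
// Hypothetical patterns for illustration only; not taken from html-dom-parser.
// Detects a body element by its closing tag (misses unclosed markup).
const CLOSING_BODY = /<\/body>/i;
// Detects a body element by its opening tag, with optional attributes.
const OPENING_BODY = /<body(?:\s[^>]*)?>/i;

// Returns true when the given pattern finds a body element in the markup.
function hasBody(html, pattern) {
  return pattern.test(html);
}

const unclosed = '<html><body><h1>html-react-parser</h1>';
const closed = '<html><body><h1>html-react-parser</h1></body></html>';

const results = {
  // The closing tag is absent, so a closing-tag check reports no body.
  closingOnUnclosed: hasBody(unclosed, CLOSING_BODY),
  // The opening tag is present, so an opening-tag check still finds the body.
  openingOnUnclosed: hasBody(unclosed, OPENING_BODY),
  // Both checks agree when the markup is fully closed.
  closingOnClosed: hasBody(closed, CLOSING_BODY),
  openingOnClosed: hasBody(closed, OPENING_BODY),
};

console.log(results);
```

Under these assumptions, only the closing-tag check fails on the unclosed string, which matches the symptom reported above: the body and its children are dropped when `</body>` is missing.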

remarkablemark · 2019-11-04T05:17:11Z

html-dom-parser@0.2.3 has been released and published:

# update with npm
npm i -S html-dom-parser@0.2.3

# or with yarn
yarn add html-dom-parser@0.2.3

I'll have a followup PR for html-react-parser.

html-dom-parser 0.2.2 → 0.2.3

This fixes a bug related to client-side DOM parsing for unclosed HTML markup.

Relates to #126 and remarkablemark/html-dom-parser#18

sanex3339 · 2019-11-04T07:36:08Z

Thank you!

remarkablemark added the bug label Oct 31, 2019

remarkablemark mentioned this issue Nov 4, 2019

fix: harden head and body regex in domparser #22

Merged

remarkablemark closed this as completed in #22 Nov 4, 2019

remarkablemark mentioned this issue Nov 4, 2019

build(package): upgrade dependency html-dom-parser@0.2.3 remarkablemark/html-react-parser#128

Merged
