Parser completely removes body elements if html string has open <html> and <body> tags, but does not have close </body> and </html> tags #18

Closed
sanex3339 opened this issue Oct 29, 2019 · 6 comments · Fixed by #22
@sanex3339

Expected Behavior

Parser should try to keep all existing html tags if the html string has open <html> and <body> tags, but does not have close </body> and </html> tags

Actual Behavior

Parser completely removes all html children elements if the html string has open <html> and <body> tags, but does not have close </body> and </html> tags

Steps to Reproduce

Just try to parse the following html string:

 <html>
        <body>
          <h1 style="font-family: Arial;">
            html-react-parser
          </h1>

Reproducible Demo

https://jsfiddle.net/d2g59ch4/

Environment

  • Version: 0.2.2
  • Platform: Mac OS
  • Browser: Chrome 77
@remarkablemark
Owner

Thanks for opening this issue (and closing remarkablemark/html-react-parser#126) @sanex3339

I can confirm that this is a bug. Along with your fiddle, I can reproduce that this happens on the client parser (see repl) and not on the server parser (see repl).

Would you be interested in making a fix? Otherwise, I should have some time over the weekend to work on it.

@sanex3339
Author

Hi, unfortunately, I don't have time to fix this bug, but I can wait until you fix it.

@remarkablemark
Owner

No worries, I'll let you know once I have the fix.

remarkablemark added a commit that referenced this issue Nov 4, 2019
Because the head and body regexes test against the closing tag,
this causes html with unclosed head or body to not be parsed
correctly.

For example, given the following:

```js
parse('<html><body>');
```

The expected output is:

```
[ { type: 'tag',
    name: 'html',
    attribs: {},
    children:
     [ { type: 'tag',
         name: 'body',
         attribs: {},
         children: [],
         next: null,
         prev: null,
         parent: [Circular] } ],
    next: null,
    prev: null,
    parent: null } ]
```

But the actual output is:

```
[
  {
    "next": null,
    "prev": null,
    "parent": null,
    "name": "html",
    "attribs": {},
    "type": "tag",
    "children": []
  }
]
```

The fix is to update the regex to use the opening tag instead of
the closing tag.

Add test case.

Fixes #18
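
As a rough illustration of the change described in the commit message above, here is a minimal sketch of why a check against the closing tag misses unclosed markup while a check against the opening tag does not. The regexes and the function names `hasBodyByClosingTag` and `hasBodyByOpeningTag` are hypothetical and were written for this example; they are not copied from the html-dom-parser source, which may differ.

```js
// Hypothetical patterns for illustration only (not the library's actual regexes).
const CLOSING_BODY_REGEX = /<\/body>/i;          // old idea: look for </body>
const OPENING_BODY_REGEX = /<body(\s[^>]*)?>/i;  // new idea: look for <body ...>

// Returns true only when the markup contains a closing </body> tag.
function hasBodyByClosingTag(html) {
  return CLOSING_BODY_REGEX.test(html);
}

// Returns true when the markup contains an opening <body> tag,
// whether or not it is ever closed.
function hasBodyByOpeningTag(html) {
  return OPENING_BODY_REGEX.test(html);
}

// Unclosed markup similar to the string reported in this issue.
const unclosed = '<html><body><h1>html-react-parser</h1>';

console.log(hasBodyByClosingTag(unclosed)); // false: the body goes undetected
console.log(hasBodyByOpeningTag(unclosed)); // true: the body is detected
```

With a closing-tag check, a parser that decides whether to keep the body based on the first function would drop it for unclosed input, which matches the behavior reported here.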
@remarkablemark
Owner

html-dom-parser@0.2.3 has been released and published:

# update with npm
npm i -S html-dom-parser@0.2.3

# or with yarn
yarn add html-dom-parser@0.2.3

I'll have a followup PR for html-react-parser.

remarkablemark added a commit to remarkablemark/html-react-parser that referenced this issue Nov 4, 2019
 html-dom-parser    0.2.2  →   0.2.3

This fixes a bug related to client-side DOM parsing for unclosed
HTML markup.

Relates to #126 and remarkablemark/html-dom-parser#18
@sanex3339
Author

Thank you!
