html-serializer doesn't work with nested blocks #1497

nghuuphuoc · 2018-01-02T07:11:11Z

Do you want to request a feature or report a bug?

A bug

What's the current behavior?

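// Assumes the slate-html-serializer package, which exports the serializer as its default export.
import HtmlSerializer from 'slate-html-serializer'

// Maps lowercase HTML tag names to Slate block types.
const BLOCK_TAGS = {
  blockquote: 'quote',
  p: 'paragraph',
  // div: 'div'
}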

const rules = [
  {
    deserialize(el, next) {
      const type = BLOCK_TAGS[el.tagName.toLowerCase()]
      if (!type) return
      return {
        kind: 'block',
        type: type,
        nodes: next(el.childNodes)
      }
    }
  }
]

const pureHtml = '<blockquote><div>a text<blockquote>inner quote</blockquote></div></blockquote>'
const initialValue = new HtmlSerializer({ rules: rules }).deserialize(pureHtml);

It only renders the text "a text"; the inner quote is missing.
See https://jsfiddle.net/oj53q1n2/26/

What's the expected behavior?

We should see both text and quote.

nghuuphuoc · 2018-01-02T07:14:06Z

If I add div to BLOCK_TAGS:

Parsed correctly (https://jsfiddle.net/oj53q1n2/27/):

const pureHtml = '<blockquote><div>a text</div><blockquote>inner quote</blockquote></blockquote>'

Parsed incorrectly (https://jsfiddle.net/oj53q1n2/28/):

const pureHtml = '<blockquote><div>a text<blockquote>inner quote</blockquote></div></blockquote>'

zhujinxuan · 2018-01-02T19:28:13Z

My solution for 'div':

if-div
  return next(children)

nghuuphuoc · 2018-01-03T02:26:38Z

Do you mean the following approach?

const rules = [
  {
    deserialize(el, next) {
      const tag = el.tagName.toLowerCase()
      if (tag === 'div') {
        // Skip the div itself and deserialize its children in its place.
        return next(el.childNodes)
      }
      const type = BLOCK_TAGS[tag]
      if (!type) return
      return {
        kind: 'block',
        type: type,
        nodes: next(el.childNodes)
      }
    }
  }
]

zhujinxuan · 2018-01-03T03:09:21Z

Yes

bengotow · 2018-01-08T17:20:10Z

Hey folks, I've experimented with the JSFiddle and created what I think is a minimal example of this problem (https://jsfiddle.net/oj53q1n2/29/).

Input HTML:
<div>aa<div>missing</div></div>

Output Value:

{
  "object": "value",
  "document": {
    "object": "document",
    "data": {},
    "nodes": [
      {
        "object": "block",
        "type": "paragraph",
        "isVoid": false,
        "data": {},
        "nodes": [
          {
            "object": "block",
            "type": "div",
            "isVoid": false,
            "data": {},
            "nodes": [
              {
                "object": "text",
                "leaves": [
                  {
                    "object": "leaf",
                    "text": "aa",
                    "marks": []
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
  }
}

I'm not sure why the second text ("missing") isn't present in the result; I can't see why this input shouldn't be valid.

bengotow · 2018-01-08T18:07:50Z

After a bit more digging, the problem appears to be this constraint in the core Slate schema (which is enforced by Value.fromJSON inside the HTML deserializer):

/**
 * Only allow block nodes or inline and text nodes in blocks.
 *
 * @type {Object}
 */

{
  validateNode: function validateNode(node) {
    if (node.object != 'block') return;
    var first = node.nodes.first();
    if (!first) return;
    var objects = first.object == 'block' ? ['block'] : ['inline', 'text'];
    var invalids = node.nodes.filter(function (n) {
      return !objects.includes(n.object);
    });
    if (!invalids.size) return;

    return function (change) {
      invalids.forEach(function (child) {
        change.removeNodeByKey(child.key, { normalize: false });
      });
    };
  }
},

If the input HTML contains <div> tags and the serializer rules convert those divs to blocks rather than ignoring them, it's easy to create a structure that will be ripped apart by the schema validation after parsing. Slate does not allow a block to have both block children and text / inline children, and mixing the two is a very common <div> case.
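To make the effect of that rule concrete, here is a minimal sketch that applies the same first-child check to plain JSON nodes instead of Slate's Immutable models. The helper name findRemovedChildren is hypothetical and not part of Slate; it only reports which children the rule would remove.

```javascript
// Sketch only: mirrors the core schema rule on plain JSON nodes (not Slate's
// Immutable models). `findRemovedChildren` is a hypothetical helper name.
function findRemovedChildren(node) {
  if (node.object !== 'block') return []
  const first = node.nodes[0]
  if (!first) return []
  // The first child decides which kinds of children the block may keep.
  const allowed = first.object === 'block' ? ['block'] : ['inline', 'text']
  return node.nodes.filter(child => !allowed.includes(child.object))
}

// <div>aa<div>missing</div></div> with div mapped to a block:
const outer = {
  object: 'block',
  type: 'div',
  nodes: [
    { object: 'text', leaves: [{ object: 'leaf', text: 'aa', marks: [] }] },
    {
      object: 'block',
      type: 'div',
      nodes: [
        { object: 'text', leaves: [{ object: 'leaf', text: 'missing', marks: [] }] },
      ],
    },
  ],
}

const removed = findRemovedChildren(outer)
console.log(removed.map(n => n.type)) // [ 'div' ]
```

Because the first child is a text node, only text and inline children survive, so the inner div block (and its "missing" text) is the part that gets dropped.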

My solution is here: https://gist.github.com/bengotow/f5408e9cb543f22409d033df58e34579. Before running the HTML deserializer, I traverse the DOM tree and ensure that divs, blockquotes, and other nodes converted to Slate blocks contain either text + inline children OR block children, wrapping children into blocks as necessary. Curious whether this would be welcomed as default behavior in some way (cc @ianstormtaylor).

ianstormtaylor · 2018-01-08T21:32:54Z

Hey @bengotow good digging! I'd be open to a way to make the default behavior more helpful. I'm not sure if there's a way to do it that doesn't get too restrictive though, but if you have ideas I'd love to hear them.

I'd also be open to a PR that adds warning logging for these kinds of "shouldn't really be encountered often" normalization rules, so that it's more obvious what's going on.

Kornil · 2018-03-11T10:27:55Z

I feel this issue is really important and should be addressed by Slate itself.
At the moment, pasting a nested list does not raise an error; it just copies the non-nested content. Users have no way to notice the omission unless they manually go back and check the pasted content.

In light of this, I'd suggest using part of the gist by @bengotow (which I have already adapted for a private project to fix the same issue) to make blocks that contain BOTH text and other blocks readable by Slate, by adding a div (or p) node around the text.

bengotow · 2018-03-11T22:02:09Z

Hey @Kornil! After a bit more polish, I actually ended up switching to an approach that adds wrapping blocks, etc. to the resulting Slate graph before passing it through the normalizer, rather than changing the HTML before converting it. I think that's preferable because it works with any HTML <> Slate mapping rather than relying on an assumed set of conversions.
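As a rough illustration of that idea (not the Mailspring code itself), the sketch below walks a plain JSON node tree and wraps each run of text / inline children in its own block whenever a block also has block children. The helper name wrapMixedChildren and the 'paragraph' wrapper type are assumptions made for this example.

```javascript
// Sketch only: works on plain JSON nodes before they are handed to
// Value.fromJSON. `wrapMixedChildren` and the 'paragraph' wrapper type are
// assumptions, not part of Slate or of the linked code.
function wrapMixedChildren(node, wrapperType = 'paragraph') {
  if (!Array.isArray(node.nodes)) return node // text nodes have no children
  const children = node.nodes.map(child => wrapMixedChildren(child, wrapperType))
  const hasBlock = children.some(c => c.object === 'block')
  const hasOther = children.some(c => c.object !== 'block')
  if (node.object !== 'block' || !hasBlock || !hasOther) {
    return { ...node, nodes: children }
  }

  const result = []
  let run = null // wrapper block for the current run of text/inline children
  for (const child of children) {
    if (child.object === 'block') {
      run = null
      result.push(child)
    } else {
      if (!run) {
        run = { object: 'block', type: wrapperType, data: {}, nodes: [] }
        result.push(run)
      }
      run.nodes.push(child)
    }
  }
  return { ...node, nodes: result }
}

// A quote that mixes text with a nested quote:
const mixed = {
  object: 'block',
  type: 'quote',
  nodes: [
    { object: 'text', leaves: [{ object: 'leaf', text: 'a text', marks: [] }] },
    {
      object: 'block',
      type: 'quote',
      nodes: [
        { object: 'text', leaves: [{ object: 'leaf', text: 'inner quote', marks: [] }] },
      ],
    },
  ],
}

const fixed = wrapMixedChildren(mixed)
console.log(fixed.nodes.map(n => `${n.object}:${n.type}`)) // [ 'block:paragraph', 'block:quote' ]
```

After wrapping, every child of the outer quote is a block, so the core schema rule has nothing to remove.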

You can find the latest code I'm using here: https://github.com/Foundry376/Mailspring/blob/master/app/src/components/composer-editor/conversion.jsx#L172. I also wrote code to join adjacent text nodes rather than letting Slate do it during normalization, which sped things up a LOT because it's a simple transform and Slate "assumes the worst" when it runs a normalization step (and spends time re-finding the nodes, etc.).
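A minimal sketch of what joining adjacent text nodes might look like on plain JSON nodes follows. It is not taken from the linked file; joinAdjacentTexts and the text helper are hypothetical names, and the node shape ({ object: 'text', leaves: [...] }) is assumed from the output shown earlier in this thread.

```javascript
// Sketch only: `joinAdjacentTexts` is a hypothetical helper operating on plain
// JSON nodes shaped like { object: 'text', leaves: [...] }.
function joinAdjacentTexts(node) {
  if (!Array.isArray(node.nodes)) return node // text nodes have no children
  const joined = []
  for (const child of node.nodes.map(joinAdjacentTexts)) {
    const prev = joined[joined.length - 1]
    if (prev && prev.object === 'text' && child.object === 'text') {
      // Merge by concatenating leaves; each leaf keeps its own marks.
      joined[joined.length - 1] = { ...prev, leaves: [...prev.leaves, ...child.leaves] }
    } else {
      joined.push(child)
    }
  }
  return { ...node, nodes: joined }
}

// Hypothetical helper to build a text node for the example.
const text = s => ({ object: 'text', leaves: [{ object: 'leaf', text: s, marks: [] }] })

const paragraph = {
  object: 'block',
  type: 'paragraph',
  nodes: [text('Hello, '), text('world')],
}

const merged = joinAdjacentTexts(paragraph)
console.log(merged.nodes.length) // 1
console.log(merged.nodes[0].leaves.map(l => l.text).join('')) // Hello, world
```

Doing this on the plain JSON up front means the tree is already in the merged form before Slate's normalization runs.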

pvande · 2018-04-11T18:16:12Z

I'd be curious to hear more about why that validation rule exists at all. While I agree that there's a conceptual correctness to it, there's no such restriction in HTML. In addition to the example provided by @nghuuphuoc, the following is valid HTML that is "unrepresentable" in Slate.

<ul>
  <li>
    Text Content
    <ul>
      <li>Nested Text Content</li>
    </ul>
  </li>
</ul>

The implicit behavior of silently destroying content feels like it needs a strong justification and prominent documentation.

crisward · 2018-10-03T15:30:42Z

For what it's worth, I'm also having issues with mixed inline / block level elements in slate with

<figure>
  <img src="" />
  <figcaption>Some Caption</figcaption>
</figure>

This is pretty standard html. I can get around this by wrapping the image in a div, but that's pretty nasty.

ianstormtaylor · 2018-10-03T18:43:41Z

Hey folks, this is by design. Slate does not allow you to have mixed inline and block level content in the same node. A block can either contain all block nodes, or it can contain inline and text nodes. This is enforced in the core editor-level schema.

The reason for this is that it makes implementing editing behaviors much simpler. It allows you to avoid a whole class of issues and questions that crop up related to intermingling. I realize there are no restrictions on HTML, but that's also what makes the native contenteditable behaviors so hard to standardize and predict.

If someone wants to open a pull request with a specific improvement to the docs for this, I'd be happy to merge it. I'm going to close this otherwise, since it's not something that is a bug that we can address.

q1998763 · 2018-12-11T02:55:01Z

@crisward I have the same problem. Have you found a solution?

crisward · 2018-12-11T11:28:05Z

Bit of a hack: when I convert the source to Slate, I wrap inlines in blocks, then remove them again on save, e.g.:

<figure>
  <div class="inlinewrapper">
    <img src="" />
  </div>
  <figcaption>Some Caption</figcaption>
</figure>

Not ideal, but easy enough to remove.

crisward · 2018-12-11T11:32:18Z

Rough code here - https://gist.github.com/crisward/b61bd926d44c1e58d05f0c0c472262a4
There is a bit of sanitisation code mixed in with that method, I was using it when pulling in content from our older cms.

dmitrizzle mentioned this issue Mar 12, 2018

Unwrap divs on paste roast-cms/french-press-editor#13

Closed

ianstormtaylor added improvement ♥ help labels Mar 21, 2018

t3chguy mentioned this issue Jul 11, 2018

Replace Draft with Slate matrix-org/matrix-react-sdk#1890

Merged

33 tasks

ianstormtaylor closed this as completed Oct 3, 2018

barrymun mentioned this issue May 8, 2019

Deserialization results in broken plugin functionality ConvertKit/slate-plugins#33

Open
