Removing jQuery HTML parsing by default is breaking #1880

mbest · 2015-09-24T23:17:59Z

I was originally hopeful that we could stop piggybacking on jQuery's HTML parser, at least for modern browsers. But after attempting to upgrade my current project to KO 3.4.0, I can see it's a bit more complicated than that!

Some things that the $.parseHTML handles that KO's built-in parser doesn't (even on modern browsers):

<col> and <colgroup> elements and probably others like <legend>
Markup with special elements like <thead> prefixed by comments or whitespace, e.g., <thead>...</thead>

For KO 3.4.0 to ship without pretty bad breaking changes here, I think we need to:

Go back to using $.parseHTML when available by default (possibly with some ko.options flag to disable that, but it shouldn't be the default)
Or, extend KO's native parser to handle the full set of edge cases that jQuery does

Looking at jQuery's source for $.parseHTML, I don't feel too enthusiastic about putting that much stuff into KO, so am leaning towards "go back to using jQuery's parser again". I know that jQuery's parser doesn't handle certain bizarrely-named custom elements (e.g., <th-mything>) but that's much more of an edge case than standard elements like <col> that jQuery does support.

Thoughts?

brianmhunt · 2015-09-15T19:03:47Z

I don't disagree with either approach (reverting to $.parseHTML or updating KO).

The actual wrapping is not too complex, really just wrapMap and buildFragment.

Comparing from ko's parser to the list of "hoistable tags" in jQuery 1.9.1, which is:

        option: [ 1, "<select multiple='multiple'>", "</select>" ],
        legend: [ 1, "<fieldset>", "</fieldset>" ],
        area: [ 1, "<map>", "</map>" ],
        param: [ 1, "<object>", "</object>" ],
        thead: [ 1, "<table>", "</table>" ],
        tr: [ 2, "<table><tbody>", "</tbody></table>" ],
        col: [ 2, "<table><tbody></tbody><colgroup>", "</colgroup></table>" ],
        td: [ 3, "<table><tbody><tr>", "</tr></tbody></table>" ],

... only a few are missing. It doesn't look tremendously difficult (at first glance, at least) to patch KO to do the missing elements correctly.

That said, if jQuery already does it, that certainly keeps KO cleaner.

To get 3.4 out sooner, I'd say $.parseHTML feels like the better option, until we are inclined to sort out the edge cases of a parser, in hopefully something leaner than the jQuery parsing.

Perhaps we might note in the documentation the problems with <thead-*> parsing.

Glad you were checking this out, @SteveSanderson.

brianmhunt · 2015-09-16T13:18:37Z

So .... I've experimented with this a little, and may have devised something useful for browsers that support the template tag, i.e. those going forward. Best explanation is to have a look:

> d = document.createElement('div')
<div></div>
> d.innerHTML = '<tr></tr>'
"<tr></tr>"
> d.innerHTML
""
// ^^ Uh oh.  This is why we need all those workarounds.

> t = document.createElement('template')
<template>…</template>
> t.innerHTML = '<tr></tr>'
"<tr></tr>"
> t.innerHTML
"<tr></tr>"
// ^^ Happy dance?

... meaning on browsers that support the <template> tag we ought to be able to use innerHTML to parse the nodes, and the cloneNode to copy them into a target in the DOM, without all the workarounds for DOM elements that require parents of a particular type. As far as I know, nobody has published about this way to parse HTML, so it's difficult to tell how effective it will be in practice– i.e. we'd need to experiment a bit to make sure it works as expected across platforms.

If this is a viable option, for "legacy" browsers I think we ought to keep the current and jQuery parsing, but document the caveats (the dash-checking-workaround @mbest suggested is IMO probably also worthwhile).

In other words, I'm proposing that we not change parsing too much now, because we might want to test the template tag option in future, because it may solve the problem more elegantly. To know for sure we'll have to wait for the ongoing IE-Edge support of the template tag to wrap up.

Thoughts?

brianmhunt · 2015-09-17T02:05:06Z

Just ran the small patch above through, expanding the tests slightly and adding a <template> parser, just to see how it turns out. Will see what the auto-tests turn up (since it's derived from #1360, but should be easy to cherry-pick).

Travis-CI test results – only issue is IE 7 and <legend> tag, which I think is trivial to fix.

Also, I've not yet added comments-before-tags parsing . TODO. :)

brianmhunt · 2015-09-17T13:24:52Z

So the strategy for parsing in the branch I've created (#1881) is:

Use <template> tag parsing by default, otherwise
Use jQuery $.parsingHTML unless the html string matches a regular expression indicating that there's a problematic case e.g. <tr-something>, otherwise
Use the Simple HTML parser (which fails in cases like <thead>)

Parsing HTML via a template tag is trivial and ought to work for all expected use cases, so that is preferred.

To support the <tr- workaround for jQuery I had to add a regular expression to test for tags that have <tr|thead|caption...-something.

The regular expression might really slow things down, but will be 110% accurate (the extra 10% is for the false-positives like ).

Newer browsers using template tag parsing will, as a result of not needing any workarounds whatsoever, be substantially faster.

This feels like the most correct option and the most forward-looking, but comes at the expense of speed for legacy support.

Thoughts?

brianmhunt · 2015-09-19T16:41:49Z

If the regular expression causes a huge performance issue with detecting the <tr- case (i.e. when jQuery cannot parse the expression), we can add an option that causes the parsing to always use jQuery e.g. disableSimpleHTMLParser, that circumvents the tagWithDashRegex.test(html). When the option is true then KO would alway use jQuery for parsing (except where template tag parsing is available).

mbest · 2015-09-24T10:06:09Z

I found that the tr-* parsing problem will be fixed in jQuery 3.0 (jquery/jquery#1988). That issue led me to a related Angular issue (angular/angular.js#10617). Looking at their code for this issue and the related test, they just skip the test if you're using a version of jQuery lower than 3.0.

mbest · 2015-09-24T10:24:49Z

Here's what I think we should do:

Without jQuery, do our best to support custom element names, but don't worry about initial comments or other things that we didn't previously support.
With jQuery, always use jQuery parsing with the understanding that full support for custom elements won't happen until jQuery 3.0.0.
Continue to explore and investigate using <template> for parsing, possibly for use in an upcoming Knockout version.

brianmhunt · 2015-09-24T12:32:51Z

Thanks @mbest. I'm okay with that.

Unless there's a pressing desire to get the template-tag parsing in, I am fine with it going into 3.5.0 (or whenever it makes sense).

Template-tag parsing will invariably be faster and simpler, but there's probably a bit of risk to it (particularly with IE Edge 13 being untested, though I did review the W3C spec and compliance with the spec should mean it'll work as we expect it to).

rniemeyer · 2015-09-24T13:24:40Z

@mbest - I think the plan that you outlined sounds very reasonable

…versions fail with certain tags.

mbest · 2015-09-24T23:18:55Z

spec/lib/loadDependencies.js

@@ -40,7 +40,7 @@
        if (url) {
            var shouldInclude = getParam(name);
            if ((dependency.include || shouldInclude) && shouldInclude !== "0" && shouldInclude !== "false") {
-                if (shouldInclude && /^[0-9]+\.[0-9.]+$/.test(shouldInclude)) {
+                if (shouldInclude && shouldInclude !== "1" && shouldInclude !== "true") {
                    url = url.replace(dependency.versionString || 'latest', shouldInclude);


This change allows the use of jquery=git to test with the 3.0.0-pre version.

mbest · 2015-09-24T23:19:59Z

So this should be ready to merge. Hopefully this is the last code change for 3.4.0.

Removing jQuery HTML parsing by default is breaking

…current versions fail with certain tags." This reverts commit b85e327. re. knockout/knockout#1880 # Conflicts: # spec/lib/loadDependencies.js

SteveSanderson added this to the 3.4.0 milestone Sep 15, 2015

mbest added severity: major type: bug affected: most labels Sep 15, 2015

brianmhunt added a commit to brianmhunt/knockout that referenced this pull request Sep 17, 2015

Re knockout#1880 add <template> parser and more parser tests

0696316

This was referenced Sep 17, 2015

1880 template html parser #1881

Open

simpleHtmlParse() fails to parse components having names beginning with thead, tbody, tfoot, tr, th and td #1772

Closed

Always use jQuery for HTML parsing if available, even though current …

b85e327

…versions fail with certain tags.

mbest reviewed Sep 24, 2015
View reviewed changes

mbest added a commit that referenced this pull request Sep 28, 2015

Merge pull request #1880 from knockout/1880-revert-to-jquery-parsing

a2d8d56

Removing jQuery HTML parsing by default is breaking

mbest merged commit a2d8d56 into master Sep 28, 2015

mbest deleted the 1880-revert-to-jquery-parsing branch September 28, 2015 09:25

brianmhunt mentioned this pull request Dec 20, 2015

🚀 Merges and goals knockout/tko#1

Open

43 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Removing jQuery HTML parsing by default is breaking #1880

Removing jQuery HTML parsing by default is breaking #1880

mbest commented Sep 24, 2015

brianmhunt commented Sep 15, 2015

brianmhunt commented Sep 16, 2015

brianmhunt commented Sep 17, 2015

brianmhunt commented Sep 17, 2015

brianmhunt commented Sep 19, 2015

mbest commented Sep 24, 2015

mbest commented Sep 24, 2015

brianmhunt commented Sep 24, 2015

rniemeyer commented Sep 24, 2015

mbest Sep 24, 2015

mbest commented Sep 24, 2015

Removing jQuery HTML parsing by default is breaking #1880

Removing jQuery HTML parsing by default is breaking #1880

Conversation

mbest commented Sep 24, 2015

brianmhunt commented Sep 15, 2015

brianmhunt commented Sep 16, 2015

brianmhunt commented Sep 17, 2015

brianmhunt commented Sep 17, 2015

brianmhunt commented Sep 19, 2015

mbest commented Sep 24, 2015

mbest commented Sep 24, 2015

brianmhunt commented Sep 24, 2015

rniemeyer commented Sep 24, 2015

mbest Sep 24, 2015

Choose a reason for hiding this comment

mbest commented Sep 24, 2015