Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removing jQuery HTML parsing by default is breaking #1880

Merged
merged 1 commit into from
Sep 28, 2015

Conversation

mbest
Copy link
Member

@mbest mbest commented Sep 24, 2015

I was originally hopeful that we could stop piggybacking on jQuery's HTML parser, at least for modern browsers. But after attempting to upgrade my current project to KO 3.4.0, I can see it's a bit more complicated than that!

Some things that the $.parseHTML handles that KO's built-in parser doesn't (even on modern browsers):

  • <col> and <colgroup> elements and probably others like <legend>
  • Markup with special elements like <thead> prefixed by comments or whitespace, e.g., <!-- ko something --><thead>...</thead>

For KO 3.4.0 to ship without pretty bad breaking changes here, I think we need to:

  • Go back to using $.parseHTML when available by default (possibly with some ko.options flag to disable that, but it shouldn't be the default)
  • Or, extend KO's native parser to handle the full set of edge cases that jQuery does

Looking at jQuery's source for $.parseHTML, I don't feel too enthusiastic about putting that much stuff into KO, so am leaning towards "go back to using jQuery's parser again". I know that jQuery's parser doesn't handle certain bizarrely-named custom elements (e.g., <th-mything>) but that's much more of an edge case than standard elements like <col> that jQuery does support.

Thoughts?

@SteveSanderson SteveSanderson added this to the 3.4.0 milestone Sep 15, 2015
@brianmhunt
Copy link
Member

I don't disagree with either approach (reverting to $.parseHTML or updating KO).

The actual wrapping is not too complex, really just wrapMap and buildFragment.

Comparing from ko's parser to the list of "hoistable tags" in jQuery 1.9.1, which is:

        option: [ 1, "<select multiple='multiple'>", "</select>" ],
        legend: [ 1, "<fieldset>", "</fieldset>" ],
        area: [ 1, "<map>", "</map>" ],
        param: [ 1, "<object>", "</object>" ],
        thead: [ 1, "<table>", "</table>" ],
        tr: [ 2, "<table><tbody>", "</tbody></table>" ],
        col: [ 2, "<table><tbody></tbody><colgroup>", "</colgroup></table>" ],
        td: [ 3, "<table><tbody><tr>", "</tr></tbody></table>" ],

... only a few are missing. It doesn't look tremendously difficult (at first glance, at least) to patch KO to do the missing elements correctly.

That said, if jQuery already does it, that certainly keeps KO cleaner.

To get 3.4 out sooner, I'd say $.parseHTML feels like the better option, until we are inclined to sort out the edge cases of a parser, in hopefully something leaner than the jQuery parsing.

Perhaps we might note in the documentation the problems with <thead-*> parsing.

Glad you were checking this out, @SteveSanderson.

@brianmhunt
Copy link
Member

So .... I've experimented with this a little, and may have devised something useful for browsers that support the template tag, i.e. those going forward. Best explanation is to have a look:

> d = document.createElement('div')
<div></div>​
> d.innerHTML = '<tr></tr>'
"<tr></tr>"
> d.innerHTML
""
// ^^ Uh oh.  This is why we need all those workarounds.

> t = document.createElement('template')
<template></template>​
> t.innerHTML = '<tr></tr>'
"<tr></tr>"
> t.innerHTML
"<tr></tr>"
// ^^ Happy dance?

... meaning on browsers that support the <template> tag we ought to be able to use innerHTML to parse the nodes, and the cloneNode to copy them into a target in the DOM, without all the workarounds for DOM elements that require parents of a particular type. As far as I know, nobody has published about this way to parse HTML, so it's difficult to tell how effective it will be in practice– i.e. we'd need to experiment a bit to make sure it works as expected across platforms.

If this is a viable option, for "legacy" browsers I think we ought to keep the current and jQuery parsing, but document the caveats (the dash-checking-workaround @mbest suggested is IMO probably also worthwhile).

In other words, I'm proposing that we not change parsing too much now, because we might want to test the template tag option in future, because it may solve the problem more elegantly. To know for sure we'll have to wait for the ongoing IE-Edge support of the template tag to wrap up.

Thoughts?

brianmhunt added a commit to brianmhunt/knockout that referenced this pull request Sep 17, 2015
@brianmhunt
Copy link
Member

Just ran the small patch above through, expanding the tests slightly and adding a <template> parser, just to see how it turns out. Will see what the auto-tests turn up (since it's derived from #1360, but should be easy to cherry-pick).

Travis-CI test results – only issue is IE 7 and <legend> tag, which I think is trivial to fix.

Also, I've not yet added comments-before-tags parsing <!-- ... -->. TODO. :)

@brianmhunt
Copy link
Member

So the strategy for parsing in the branch I've created (#1881) is:

  1. Use <template> tag parsing by default, otherwise
  2. Use jQuery $.parsingHTML unless the html string matches a regular expression indicating that there's a problematic case e.g. <tr-something>, otherwise
  3. Use the Simple HTML parser (which fails in cases like <!-- ko ...--><thead>)

Parsing HTML via a template tag is trivial and ought to work for all expected use cases, so that is preferred.

To support the <tr- workaround for jQuery I had to add a regular expression to test for tags that have <tr|thead|caption...-something.

The regular expression might really slow things down, but will be 110% accurate (the extra 10% is for the false-positives like <!-- the <tr-tag is in a comment but matches the regex -->).

Newer browsers using template tag parsing will, as a result of not needing any workarounds whatsoever, be substantially faster.

This feels like the most correct option and the most forward-looking, but comes at the expense of speed for legacy support.

Thoughts?

@brianmhunt
Copy link
Member

If the regular expression causes a huge performance issue with detecting the <tr- case (i.e. when jQuery cannot parse the expression), we can add an option that causes the parsing to always use jQuery e.g. disableSimpleHTMLParser, that circumvents the tagWithDashRegex.test(html). When the option is true then KO would alway use jQuery for parsing (except where template tag parsing is available).

@mbest
Copy link
Member

mbest commented Sep 24, 2015

I found that the tr-* parsing problem will be fixed in jQuery 3.0 (jquery/jquery#1988). That issue led me to a related Angular issue (angular/angular.js#10617). Looking at their code for this issue and the related test, they just skip the test if you're using a version of jQuery lower than 3.0.

@mbest
Copy link
Member

mbest commented Sep 24, 2015

Here's what I think we should do:

  1. Without jQuery, do our best to support custom element names, but don't worry about initial comments or other things that we didn't previously support.
  2. With jQuery, always use jQuery parsing with the understanding that full support for custom elements won't happen until jQuery 3.0.0.
  3. Continue to explore and investigate using <template> for parsing, possibly for use in an upcoming Knockout version.

@brianmhunt
Copy link
Member

Thanks @mbest. I'm okay with that.

Unless there's a pressing desire to get the template-tag parsing in, I am fine with it going into 3.5.0 (or whenever it makes sense).

Template-tag parsing will invariably be faster and simpler, but there's probably a bit of risk to it (particularly with IE Edge 13 being untested, though I did review the W3C spec and compliance with the spec should mean it'll work as we expect it to).

@rniemeyer
Copy link
Member

@mbest - I think the plan that you outlined sounds very reasonable

@@ -40,7 +40,7 @@
if (url) {
var shouldInclude = getParam(name);
if ((dependency.include || shouldInclude) && shouldInclude !== "0" && shouldInclude !== "false") {
if (shouldInclude && /^[0-9]+\.[0-9.]+$/.test(shouldInclude)) {
if (shouldInclude && shouldInclude !== "1" && shouldInclude !== "true") {
url = url.replace(dependency.versionString || 'latest', shouldInclude);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change allows the use of jquery=git to test with the 3.0.0-pre version.

@mbest
Copy link
Member

mbest commented Sep 24, 2015

So this should be ready to merge. Hopefully this is the last code change for 3.4.0.

mbest added a commit that referenced this pull request Sep 28, 2015
Removing jQuery HTML parsing by default is breaking
@mbest mbest merged commit a2d8d56 into master Sep 28, 2015
@mbest mbest deleted the 1880-revert-to-jquery-parsing branch September 28, 2015 09:25
@brianmhunt brianmhunt mentioned this pull request Dec 20, 2015
43 tasks
brianmhunt added a commit to knockout/tko that referenced this pull request Dec 20, 2015
…current versions fail with certain tags."

This reverts commit b85e327.

re. knockout/knockout#1880

# Conflicts:
#	spec/lib/loadDependencies.js
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants