Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Corner case for fragments #39

Closed
petterek opened this issue Sep 10, 2012 · 16 comments
Closed

Corner case for fragments #39

petterek opened this issue Sep 10, 2012 · 16 comments
Labels

Comments

@petterek
Copy link

    Dim s As String = "a < b"
    Dim cs = CsQuery.CQ.Create(s)
    Assert.AreEqual("a &lt; b", cs.Render())

Do not work..

cs.Render returns "a "

@jamietre
Copy link
Owner

Sure enough. I think the parser doesn't reject a carat followed by whitespace properly.

This text

"a<b"

should actually be parsed as just "a" (at least, that is how chrome handles it - rejecting the unclosed tag). With spaces, though, the caret should be treated as text. Will get this fixed.

@petterek
Copy link
Author

Superb..

As you say only < followed by a-z should be treated as a tag.
Perhaps it's possible to implement some kind of "sliding caret" logic so.. can only have 1 open tag active at any time.
Then

"a<b" returns "a&lt;b" because there is no end >.. 

and

"a<<b" returns "a&lt;&lt;b"

Could be rather hard to implement?

@jamietre
Copy link
Owner

I don't think it would be hard to treat an unclosed caret as text, but I'm going for consistency with the HTML5 spec.
Since the actual spec is, sometimes, extraordinarily complicated, sometimes I just see how Chrome parses edge cases instead of being sure I understand the exact handling according to the spec :)

Chrome could also be doing it wrong, though, but it seems like as good a model as any to the extent that I don't have the time to fully delve into the low level details of the spec...

@petterek
Copy link
Author

:)

@jamietre
Copy link
Owner

This got me thinking a bit... just to make sure chrome handles things the same for parsing a whole document vs. a fragment

var div= document.createElement('div');
div.innerHTML='a<b';
console.log(div.innerHTML);
==> a

it does indeed toss the unclosed tag when parsing HTML fragments.

@petterek
Copy link
Author

hmmm
Then I belive your strategy "do as Chrome does" is a good one..

I am using CSQuery to sanitize input from users. And as you might know, user are capeable of doing all the wrong things.. :)
I thought I upload it to Github when it's in production, with a link back to CSQuery..
Keep up the good work ..

@jamietre
Copy link
Owner

Sanitizing input where you want to allow some valid HTML and also allow people to just enter text is definitely a little tricky!

The HTML parser does have a couple internal settings for different ways to handle broken tags. It might be not that hard to pass through unclosed tags as a special parsing option, I'll take a look when I'm looking at this later.

@petterek
Copy link
Author

Superb :) !

jamietre added a commit that referenced this issue Sep 13, 2012
@jamietre
Copy link
Owner

Pushed out a change that should deal with this. If you want to try it out, pull the update or grab the DLLs from the "distribution" folder.

I still haven't decided what to do about the rendering options from your last issue -- and no major bugs have come up since the last release. So I'm not in a hurry to push this out NuGet.

@petterek
Copy link
Author

That is ok. I will give it a go.

@petterek
Copy link
Author

    Dim s As String = "a < bsdf <>"
    Dim cs = CsQuery.CQ.Create(s)

    Assert.AreEqual("a &lt; bsdf &lt;&gt;", cs.Render()) <= OK
    Assert.AreEqual("a &lt; bsdf &lt;&gt;", cs.Render(CsQuery.DomRenderingOptions.HtmlEncodingMinimum)) <=FAILS

@jamietre
Copy link
Owner

Fixed in last push.

@petterek
Copy link
Author

<Test> Sub UnwrapWithOutParentFails()
    Dim s As String = "This is <b> a big</b> text"
    Dim f = CsQuery.CQ.Create(s)

    Assert.DoesNotThrow(Sub() f("b").Unwrap().Render())
End Sub

Just another corner case.. probably not a bug...

@jamietre
Copy link
Owner

Bug - this is actually related to the one substantial change i had to make b/c of the new parser. Everything has a "Document" now, before, sometimes fragments did not before depending on how they were created. This is a correct model now but it looks like a piece of code is still testing for a missing parent. Will get this updated shortly.

@jamietre
Copy link
Owner

Fixed now.

@jamietre
Copy link
Owner

jamietre commented Oct 9, 2012

This fix is now on nuget (prerelease) as beta2 - closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants