Corner case for fragments #39
Comments
Sure enough. I think the parser doesn't properly reject a caret followed by whitespace. The text "a<b" should actually be parsed as just "a" (at least, that is how Chrome handles it, rejecting the unclosed tag). With spaces, though, the caret should be treated as text. Will get this fixed. |
Superb. As you say, only a < followed by a-z should be treated as a tag.
and
Could be rather hard to implement? |
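For reference, the rule discussed above can be sketched in a few lines. This is only an illustration of the HTML5 "tag open" decision, not CsQuery's parser code: the function name `classify_lt` and its return labels are made up for this example, and the end-tag case is simplified (the spec has further rules for what may follow `</`).

```python
import string

# ASCII letters are the only characters that can begin a tag name.
ASCII_LETTERS = set(string.ascii_letters)


def classify_lt(text: str, i: int) -> str:
    """Classify the '<' at text[i] by looking at the character after it.

    Hypothetical helper for illustration; labels are not from any library.
    """
    if text[i] != "<":
        raise ValueError("text[i] must be '<'")
    # Empty string when '<' is the last character of the input.
    nxt = text[i + 1] if i + 1 < len(text) else ""
    if nxt in ASCII_LETTERS:
        return "start-tag"          # e.g. "<b"
    if nxt == "/":
        return "end-tag"            # e.g. "</b" (simplified)
    if nxt in ("!", "?"):
        return "comment-or-declaration"
    return "text"                   # e.g. "< b", "<3", or a trailing "<"


print(classify_lt("a < b", 2))  # '<' followed by a space
print(classify_lt("a<b", 1))    # '<' followed by a letter
```

So "a < b" keeps its `<` as plain text, while in "a<b" the `<` does open a tag, which is why the unclosed `<b` ends up being dropped.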
I don't think it would be hard to treat an unclosed caret as text, but I'm going for consistency with the HTML5 spec. Chrome could be doing it wrong, of course, but it seems as good a model as any, since I don't have the time to fully delve into the low-level details of the spec... |
:) |
This got me thinking a bit... Just to make sure Chrome handles things the same way when parsing a whole document vs. a fragment:
It does indeed toss the unclosed tag when parsing HTML fragments. |
Hmm, I am using CsQuery to sanitize input from users. And as you might know, users are capable of doing all the wrong things. :) |
Sanitizing input where you want to allow some valid HTML and also allow people to just enter text is definitely a little tricky! The HTML parser does have a couple of internal settings for different ways to handle broken tags. It might not be that hard to pass through unclosed tags as a special parsing option; I'll take a look when I'm looking at this later. |
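Until such an option exists, one possible workaround on the caller's side is to escape the problem characters before the text reaches the parser. The sketch below is an assumption about how that could look, not a CsQuery feature: `escape_stray_lt` is a hypothetical helper, and it deliberately ignores quoted attribute values, so a `<` inside an attribute would be escaped too.

```python
import re

# Matches a '<' that either
#   1. cannot open a tag (not followed by a letter, '/', '!' or '?'), or
#   2. is never closed: no '>' appears before the next '<' or end of input.
_STRAY_LT = re.compile(r"<(?![A-Za-z/!?])|<(?=[^<>]*(?:<|\Z))")


def escape_stray_lt(text: str) -> str:
    """Replace stray or unclosed '<' with '&lt;' so it survives as text.

    Rough pre-processing sketch; real sanitizing still needs a whitelist
    of allowed tags and attributes after parsing.
    """
    return _STRAY_LT.sub("&lt;", text)


print(escape_stray_lt("a < b"))
print(escape_stray_lt("a<b <i>x</i>"))
```

With this, "a < b" and "a<b" both come through as visible text, while well-formed markup such as `<i>x</i>` is left alone for the parser.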
Superb :) ! |
Pushed out a change that should deal with this. If you want to try it out, pull the update or grab the DLLs from the "distribution" folder. I still haven't decided what to do about the rendering options from your last issue, and no major bugs have come up since the last release, so I'm not in a hurry to push this out to NuGet. |
That is ok. I will give it a go. |
Fixed in last push. |
Just another corner case.. probably not a bug... |
Bug. This is actually related to the one substantial change I had to make because of the new parser. Everything has a "Document" now; before, fragments sometimes did not, depending on how they were created. This is the correct model now, but it looks like a piece of code is still testing for a missing parent. Will get this updated shortly. |
Fixed now. |
This fix is now on NuGet (prerelease) as beta2 - closing. |
Does not work.
cs.Render returns "a "