Update to 1.6.9 breaks HTML entities #62

Closed
cyruscollier opened this issue Apr 4, 2016 · 5 comments

@cyruscollier

When I updated to 1.6.9, the way HTML entities are treated in parsed documents appears to have changed, and some characters are now handled incorrectly. The only entity I have seen convert incorrectly so far is &#x69; (lower case "i"). When requesting the text of a node containing this entity, it returns "\n5;" instead (newline, "5", semicolon). As soon as I reverted to 1.6.8, the problem went away.

Is it possible to disable all entity handling entirely, since I can easily do that myself with html_entity_decode() if I need to?

@paquettg
Owner

paquettg commented Apr 5, 2016

Interesting, we do not do any handling of entities in the module. We do use the &#10 entity to preserve new lines though, which might be what is causing this issue for you.

I am not able to reproduce this with just the line you provided; it returns the correct line. I have added a test to confirm this (I'll tag this comment in my new test).

Could you please change the test so that it breaks for you/causes the issue you are experiencing? After that is done I can get to fixing the issue.

Thanks.

@paquettg
Owner

paquettg commented Apr 5, 2016

Here is the commit that added the test.
01a9e2f

@cyruscollier
Author

Thanks, I'll take a look at the test and see if I can replicate it. The newline entity you mentioned may indeed have something to do with it. Is it possible that hexadecimal entities are converted to decimal ones somewhere in the process? The hex entity &#x69; becomes the dec entity &#105;, which matches my output if that entity somehow gets broken up into two parts.
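
The arithmetic behind that guess can be checked with a small sketch. It is written in Python purely as a language-neutral illustration, not as the library's PHP code, and the variable names are hypothetical. Whether the parser really converts hex entities to decimal is not confirmed anywhere in this thread; the sketch only shows that &#x69; and &#105; name the same character, and that replacing a bare `&#10` prefix inside `&#105;` yields exactly the reported "\n5;".

```python
import html

# Both spellings refer to the same code point: 0x69 == 105 == "i".
hex_entity = "&#x69;"
dec_entity = "&#105;"
same_character = html.unescape(hex_entity) == html.unescape(dec_entity) == "i"

# Assumption: somewhere a bare "&#10" (no trailing semicolon) is turned
# back into a newline. Applied to "&#105;", that splits the entity in two.
broken = dec_entity.replace("&#10", "\n")

print(same_character)  # True
print(repr(broken))    # '\n5;'
```

The hex form contains no `&#10` substring, so it would only be affected if it had already been rewritten to the decimal form.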

@paquettg
Owner

paquettg commented Apr 5, 2016

hey @clarinetlord,

I added a semicolon to the end of my conversion code, which really should have been there in the first place... this is probably why they have that.

This should solve the issue. Let me know if you have any other issue.

@cyruscollier
Author

Yep, looks like that did it! You could now change that new TextNode test if you want, to test something like &#105; instead of &#x69;, since any decimal entities starting with &#10 are what actually broke, not the particular hex entity I originally suspected.
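
For anyone reading later, here is a rough sketch of why the missing semicolon mattered. It is Python used as a stand-in, not the library's actual PHP code; `round_trip` and its `placeholder` parameter are hypothetical names, and the assumption is simply that newlines are swapped for a placeholder entity before parsing and swapped back afterwards.

```python
def round_trip(text: str, placeholder: str) -> str:
    """Swap newlines for a placeholder, then swap them back."""
    protected = text.replace("\n", placeholder)
    # A real parser would process `protected` here.
    return protected.replace(placeholder, "\n")


source = "first line\nvalue: &#105;"

# Without the semicolon, the restore step also matches the start of "&#105;".
without_semicolon = round_trip(source, "&#10")
# With the semicolon, only the complete newline entity matches.
with_semicolon = round_trip(source, "&#10;")

print(repr(without_semicolon))  # 'first line\nvalue: \n5;'
print(repr(with_semicolon))     # 'first line\nvalue: &#105;'

# Every decimal entity from &#100; to &#109; shares the "&#10" prefix.
candidates = [f"&#{n};" for n in range(100, 110)]
mangled = [e for e in candidates if round_trip(e, "&#10") != e]
print(len(mangled))  # 10
```

A test built on a decimal entity such as &#105; exercises this collision directly, while one built on &#x69; only does so if the hex form is converted to decimal first.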
