[Gumbo] Entity &Ouml; is converted to Ö;, leaving the semicolon behind #114

Closed
dfguo opened this issue Aug 27, 2014 · 9 comments

dfguo commented Aug 27, 2014

I'm not sure why this happens:

val = "Ö"
=> "Ö"
r = Sanitize.fragment(val, Sanitize::Config.merge(Sanitize::Config::BASIC))
=> "Ö;"

Was this intentional? It breaks German characters by adding an extra ';'.

Owner

rgrove commented Aug 27, 2014

I can't reproduce this. What version of Ruby and Sanitize are you using?

Author

dfguo commented Aug 28, 2014

@rgrove Sorry that my input got rendered to HTML. Here is the code version.

> val = "&Ouml;"
=> "&Ouml;"
> r = Sanitize.fragment(val, Sanitize::Config.merge(Sanitize::Config::BASIC))
=> "Ö;"

I'm using ruby 2.0.0p481 and sanitize (3.0.0). Thank you for looking into this.

Owner

rgrove commented Aug 28, 2014

Thanks.

So, it's normal for the entity &Ouml; to be converted into Ö as part of the parsing step, but it shouldn't have a trailing semicolon after the conversion. I'll see if I can figure out what's causing that.
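
To make the expected behavior concrete, here is a minimal pure-Ruby sketch of decoding a named character reference so that the trailing semicolon is consumed together with the name. It is an illustration only, not Gumbo's or Sanitize's implementation; the `ENTITIES` table and the `decode_entities` helper are hypothetical names, and the table holds just a few entries.

```ruby
# Hypothetical, deliberately tiny entity table (illustration only).
# A real HTML5 parser uses the full named character reference list.
ENTITIES = { "Ouml" => "Ö", "ouml" => "ö", "amp" => "&" }.freeze

# Replace each known named reference with its character.
# The optional ";" is part of the match, so it is removed along with
# the name instead of being left behind in the output.
def decode_entities(text)
  text.gsub(/&([A-Za-z]+);?/) do |match|
    # Unknown names are left untouched.
    ENTITIES.fetch(Regexp.last_match(1), match)
  end
end

puts decode_entities("&Ouml;")  # prints Ö (no trailing semicolon)
```

A result of "Ö;" for the same input would mean the name was replaced but the terminating semicolon was not consumed, which is the symptom reported here.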

rgrove added the bug label and removed the need info label Aug 28, 2014
Owner

rgrove commented Aug 28, 2014

Yep, looks like this is a Gumbo parser bug. I'll try to get it fixed upstream.

https://github.com/google/gumbo-parser/blob/3a61e9ad963cacfb3246468feab28c5058f621c1/src/char_ref.c#L467-L468

rgrove changed the title from "non-unicode UFT characters" to "[Gumbo] Entity &Ouml; is converted to Ö;, leaving the semicolon behind" Aug 28, 2014
Author

dfguo commented Aug 29, 2014

Awesome! Glad you found the issue!

Author

dfguo commented Sep 1, 2014

@rgrove when will the next release be? Thanks!

Owner

rgrove commented Sep 2, 2014

@dfguo Next step is to get a new Nokogumbo gem with the Gumbo fix: rubys/nokogumbo#11

Once that happens, I'll release Sanitize 3.0.1 with an updated Nokogumbo dependency.

rgrove added this to the 3.0.1 milestone Sep 2, 2014
rgrove closed this as completed in 30f27fe Sep 3, 2014
Owner

rgrove commented Sep 3, 2014

@dfguo Just pushed out the 3.0.1 gem with this fix.

Edit: Correction, 3.0.2 has the fix. Looks like Nokogumbo 1.1.11 silently reverted the change we wanted. Oops!
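
For anyone who wants to check whether an installed version already includes the fix, a small sketch using `Gem::Version` from RubyGems follows. The constant `FIXED_IN` and the helper `fix_included?` are hypothetical names introduced for illustration; the only fact taken from this thread is that 3.0.2 is the first release with the fix.

```ruby
require "rubygems"

# First Sanitize release that contains the fix, per the comment above.
FIXED_IN = Gem::Version.new("3.0.2")

# Returns true when the given version string is 3.0.2 or newer.
# Gem::Version compares segment by segment, so "3.0.10" > "3.0.2".
def fix_included?(version_string)
  Gem::Version.new(version_string) >= FIXED_IN
end

puts fix_included?("3.0.0")  # prints false
puts fix_included?("3.0.2")  # prints true
```

In an application this could be called as `fix_included?(Sanitize::VERSION)`, assuming the sanitize gem is loaded. A Gemfile constraint such as `gem "sanitize", ">= 3.0.2"` achieves the same thing at install time.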

Author

dfguo commented Sep 6, 2014

Awesome! Thanks!
