Fix and test case for href with unicode in it (Ruby 1.9.2). #30

Closed
dabble wants to merge 1 commit


@dabble dabble commented Jun 6, 2011

Fix issue with parsing UTF-8 encoded attributes. Problem encountered with some RSS feeds and feedzirra.


Encoding::CompatibilityError: incompatible encoding regexp match (ASCII-8BIT regexp with UTF-8 string)
loofah/lib/loofah/html5/scrub.rb:20:in `gsub'
loofah/lib/loofah/html5/scrub.rb:20:in `block in scrub_attributes'
loofah/lib/loofah/html5/scrub.rb:11:in `each'
loofah/lib/loofah/html5/scrub.rb:11:in `scrub_attributes'
loofah/lib/loofah/scrubber.rb:95:in `html5lib_sanitize'
loofah/lib/loofah/scrubbers.rb:98:in `scrub'
loofah/lib/loofah/scrubber.rb:108:in `traverse_conditionally_top_down'
loofah/lib/loofah/scrubber.rb:78:in `traverse'
loofah/lib/loofah/instance_methods.rb:46:in `scrub!'
loofah/lib/loofah/instance_methods.rb:54:in `block in scrub!'
.rvm/gems/ruby-1.9.2-p180/gems/nokogiri-1.4.4/lib/nokogiri/xml/node_set.rb:239:in `block in each'
.rvm/gems/ruby-1.9.2-p180/gems/nokogiri-1.4.4/lib/nokogiri/xml/node_set.rb:238:in `upto'
.rvm/gems/ruby-1.9.2-p180/gems/nokogiri-1.4.4/lib/nokogiri/xml/node_set.rb:238:in `each'
loofah/lib/loofah/instance_methods.rb:54:in `scrub!'
loofah/lib/loofah/instance_methods.rb:44:in `scrub!'
loofah/lib/loofah.rb:52:in `scrub_fragment'
loofah/test/html5/test_sanitizer.rb:16:in `test_unicode_quote'
.rvm/gems/ruby-1.9.2-p180/gems/mocha-0.9.12/lib/mocha/integration/mini_test/version_142_to_172.rb:27:in `run'

@flavorjones (Owner)

Thanks for this! I'll pull it in (with some small changes) this week.

@flavorjones (Owner)

I've modified your patch to actually use:

/[`\u0000-\u0020\u007F\s\u0080-\u0101]/

This will be released early next week. I'm closing this pull request, even though I haven't committed the fix to master yet, since this issue is tracked in #25 and #29.

Thanks for your help!
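
For anyone reading this later, here is a rough sketch of why the `\u`-escaped pattern avoids the error in the trace. The pre-patch regexp is not quoted in this thread, so `binary_pattern` below is only an assumed stand-in (a byte-oriented regexp forced to ASCII-8BIT with the `/n` flag), and the helper names `strip_unsafe` and `matches_cleanly?` are hypothetical, not part of loofah. `unicode_pattern` is the pattern quoted above.

```ruby
# Assumed stand-in for a byte-oriented pattern: the /n flag plus the
# non-ASCII \xNN escapes give the regexp a fixed ASCII-8BIT encoding.
binary_pattern = /`|[\x00-\x20\x7F\s]+|\xC2[\x80-\xA0]/n

# The pattern quoted in the comment above: \u escapes above U+007F
# give the regexp a fixed UTF-8 encoding.
unicode_pattern = /[`\u0000-\u0020\u007F\s\u0080-\u0101]/

# Hypothetical helper: remove every character the pattern matches.
def strip_unsafe(value, pattern)
  value.gsub(pattern, "")
end

# Hypothetical helper: true if the pattern can be applied to the string
# without Ruby raising an encoding mismatch.
def matches_cleanly?(value, pattern)
  strip_unsafe(value, pattern)
  true
rescue Encoding::CompatibilityError
  false
end

# A UTF-8 attribute value with a non-breaking space hidden inside the
# scheme and Unicode quotes in the rest of the value.
href = "java\u00A0script:\u2019x\u2019"

puts binary_pattern.encoding
puts unicode_pattern.encoding
puts matches_cleanly?(href, binary_pattern)
puts matches_cleanly?(href, unicode_pattern)
puts strip_unsafe(href, unicode_pattern)
```

With the UTF-8 pattern the non-breaking space is stripped and the Unicode quotes are left alone, while the binary pattern cannot be matched against the non-ASCII UTF-8 string at all. Note that the quoted range `\u0080-\u0101` also covers accented Latin letters such as `é`, so those are stripped as well.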

@flavorjones (Owner)

Fix is in 1.1.0, just released.
