
Don't assume all pages are UTF-8. Allow the page encoding to be set via an :encoding option.
1 parent dd14a86 commit 31ae81280412e03c06cdcb7a0873ce80d61dd3a6; @andykent committed Jul 3, 2011
Showing with 4 additions and 2 deletions.
  1. +4 −2 lib/readability.rb
6 lib/readability.rb
@@ -9,7 +9,8 @@ class Document
:remove_unlikely_candidates => true,
:weight_classes => true,
:clean_conditionally => true,
- :remove_empty_nodes => true
+ :remove_empty_nodes => true,
+ :encoding => 'UTF-8'
}.freeze
attr_accessor :options, :html
@@ -20,11 +21,12 @@ def initialize(input, options = {})
@remove_unlikely_candidates = @options[:remove_unlikely_candidates]
@weight_classes = @options[:weight_classes]
@clean_conditionally = @options[:clean_conditionally]
+ @encoding = @options[:encoding]
make_html
end
def make_html
- @html = Nokogiri::HTML(@input, nil, 'UTF-8')
+ @html = Nokogiri::HTML(@input, nil, @encoding)
end
REGEXES = {
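
The change above replaces the hard-coded 'UTF-8' with a value read from the options hash, so callers can declare the real encoding of a page. The following is a minimal, stdlib-only sketch of why that matters. It does not use Nokogiri or the gem's own classes; `DEFAULTS`, `resolve_encoding`, and `decode_page` are hypothetical names introduced only for illustration, and the merge-over-defaults behaviour is an assumption based on how `@options[:encoding]` is read in the diff.

```ruby
# Hypothetical stand-in for the defaults hash in the diff (only the new key shown).
DEFAULTS = { :encoding => 'UTF-8' }.freeze

# Assumed behaviour: caller-supplied options override the frozen defaults.
def resolve_encoding(options = {})
  DEFAULTS.merge(options)[:encoding]
end

# Label the raw bytes with the resolved encoding, then transcode to UTF-8.
# This mimics what a parser has to do once it is told the page encoding.
def decode_page(raw_bytes, options = {})
  raw_bytes.dup.force_encoding(resolve_encoding(options)).encode('UTF-8')
end

# "café" as ISO-8859-1 bytes: 0xE9 is "é" in Latin-1 but is not valid UTF-8 here.
latin1_bytes = "caf\xE9".b

# With the correct encoding declared, the text decodes cleanly.
decoded = decode_page(latin1_bytes, :encoding => 'ISO-8859-1')

# Treating the same bytes as UTF-8 (the old hard-coded assumption) yields an invalid string.
assumed_utf8 = latin1_bytes.dup.force_encoding('UTF-8')

puts decoded
puts assumed_utf8.valid_encoding?
```

With the patched library, the equivalent call would presumably pass `:encoding => 'ISO-8859-1'` in the options argument of `initialize(input, options = {})`, which then reaches `Nokogiri::HTML(@input, nil, @encoding)` as shown in the diff.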
