Fix bug in Mechanize::Page#charset
drbrain committed Apr 9, 2011
1 parent d934857 commit 287913b
Showing 3 changed files with 17 additions and 13 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.rdoc
@@ -62,6 +62,7 @@ Mechanize is now under the MIT license
 * file URIs are now read in binary mode. GH #83
 * Content-Encoding: x-gzip is now treated like gzip per RFC 2616.
 * Mechanize now unescapes URIs for meta refresh. GH #68
+* Mechanize now has more robust HTML charset detection. GH #43

 === 1.0.0

4 changes: 3 additions & 1 deletion lib/mechanize/file.rb
@@ -29,7 +29,9 @@ class File
     alias :content :body

     def initialize(uri=nil, response=nil, body=nil, code=nil)
-      @uri, @body, @code = uri, body, code
+      @uri = uri
+      @body = body
+      @code = code
       @response = Headers.new

       # Copy the headers in to a hash to prevent memory leaks
25 changes: 13 additions & 12 deletions lib/mechanize/page.rb
@@ -34,22 +34,23 @@ def initialize(uri=nil, response=nil, body=nil, code=nil, mech=nil)
         @encoding = charset value
       end

-      # Force the encoding to be 8BIT so we can perform regular expressions.
-      # We'll set it to the detected encoding later
-      body.force_encoding('ASCII-8BIT') if
-        body && body.respond_to?(:force_encoding)
+      if body
+        # Force the encoding to be 8BIT so we can perform regular expressions.
+        # We'll set it to the detected encoding later
+        body.force_encoding('ASCII-8BIT') if body.respond_to?(:force_encoding)

-      body.scan /<meta .*?>/i do |meta|
-        next unless meta =~ /http-equiv=(["'])?content-type\1/i
+        body.scan(/<meta .*?>/i) do |meta|
+          next unless meta =~ /http-equiv=(["'])?content-type\1/i

-        meta =~ /content=(["'])?(.*?)\1/i
+          meta =~ /content=(["'])?(.*?)\1/i

-        encoding = charset $2
+          encoding = charset $2

-        @encoding = encoding if encoding
-      end if body
+          @encoding = encoding if encoding
+        end

-      @encoding ||= Mechanize::Util.detect_charset(body)
+        @encoding ||= Mechanize::Util.detect_charset(body)
+      end

       super(uri, response, body, code)
     end
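
The meta-tag scan in this hunk can be tried in isolation. The following is a minimal sketch, not the library's code: `meta_charset` is a hypothetical helper name, and it inlines the charset extraction instead of calling `Mechanize::Page#charset`. It reuses the same regular expressions and, like the new code, does nothing when `body` is nil.

```ruby
# Hypothetical standalone helper, for illustration only.
# Returns the charset declared by the last matching
# <meta http-equiv="content-type" ...> tag, or nil if none is found.
def meta_charset(body)
  return nil unless body

  found = nil
  body.scan(/<meta .*?>/i) do |meta|
    # Only consider meta tags that declare a content type.
    next unless meta =~ /http-equiv=(["'])?content-type\1/i
    # Skip tags without a content attribute (the sketch guards this
    # explicitly so that $2 is never stale).
    next unless meta =~ /content=(["'])?(.*?)\1/i

    charset = $2[/charset=([^; ]+)/i, 1]
    found = charset if charset && charset != 'none'
  end
  found
end

html = '<head><meta http-equiv="Content-Type" ' \
       'content="text/html; charset=ISO-8859-1"></head>'
puts meta_charset(html).inspect
puts meta_charset(nil).inspect
```

With the quoted attribute values shown, the first call should report `"ISO-8859-1"` and the second `nil`.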
@@ -72,7 +73,7 @@ def title

     def charset content_type
       charset = content_type[/charset=([^; ]+)/i, 1]
-      return nil if encoding == 'none'
+      return nil if charset == 'none'
       charset
     end

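
The one-line fix in `charset` is easiest to see outside the class. The sketch below is an assumption-laden standalone copy: `extract_charset` is a hypothetical name, not part of Mechanize. It shows the corrected guard, which tests the local `charset` just extracted. The old guard tested `encoding` instead, so a `charset=none` value was presumably passed through unfiltered.

```ruby
# Hypothetical standalone copy of the fixed helper, for illustration only.
def extract_charset(content_type)
  charset = content_type[/charset=([^; ]+)/i, 1]
  # Compare the value just extracted, not some other attribute.
  return nil if charset == 'none'
  charset
end

puts extract_charset('text/html; charset=UTF-8').inspect  # => "UTF-8"
puts extract_charset('text/html; charset=none').inspect   # => nil
puts extract_charset('text/html').inspect                 # => nil
```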
