add optional fallback encoding to HTML::Document.parse w/ autodetect #660

Closed

Add an escape hatch for passing a default encoding to HTML::Document.parse that acts only as a fallback,
so Nokogiri's encoding autodetection still happens.
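
To make the intended precedence concrete, here is a minimal pure-Ruby sketch. It is not Nokogiri code and not the patch in this pull request: `resolve_encoding`, its `fallback:` keyword, and the `META_CHARSET` pattern are hypothetical names used only for illustration. It assumes that a charset declared in the document wins and that the fallback applies only when nothing is declared.

```ruby
# Hypothetical illustration only; not part of Nokogiri or of this patch.
# Matches both <meta charset="..."> and the http-equiv content="...; charset=..." form.
META_CHARSET = /<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9_\-]+)/i

# Returns the charset declared in the markup if there is one,
# otherwise the caller-supplied fallback (assumed behaviour), otherwise nil.
def resolve_encoding(html, fallback: nil)
  # Scan a binary copy so bytes that are invalid in the string's
  # current encoding cannot raise during matching.
  declared = html.b[META_CHARSET, 1]
  declared || fallback
end
```

With this sketch, `resolve_encoding('<meta charset="Shift_JIS">', fallback: "UTF-8")` picks the declared charset, while a document with no declaration falls through to the fallback. Real detection may also consider byte order marks and other signals that this sketch ignores.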

Sending this pull request as discussed on the mailing list.

Owner

flavorjones commented Apr 23, 2012

Thanks, will take a look.

Owner

flavorjones commented Apr 27, 2012

Hi there!

Because we're very likely going to heavily refactor EncodingReader soon, I want to make sure we've got complete test coverage for Nokogiri's behavior in all scenarios.

Based on your nokogiri-talk thread, I think there should probably be clear test coverage for the following cases where .parse is passed an IO object:

  • HTML declares charset X, encoding Y is passed to .parse -> what happens?
  • HTML declares charset X, no encoding passed to .parse -> parsed as X
  • HTML declares charset X, fallback encoding Y is passed to .parse -> parsed as X
  • HTML with encoding X does not declare charset, encoding Y is passed to .parse -> what happens?
  • HTML with encoding X does not declare charset, no encoding is passed to .parse -> what happens?
  • HTML with encoding X does not declare charset, fallback encoding Y is passed to .parse -> what happens?

And then repeat all these tests for a String object.

I don't see explicit coverage for all these cases ... would you mind making sure they're covered?
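
One way to keep track of the matrix above is to enumerate it as data and generate one test per row. The following is a rough sketch under that assumption: `Scenario`, `KNOWN`, and `build_input` are hypothetical names, not Nokogiri API, and only the two outcomes stated in the list are filled in. Every "what happens?" case is left undetermined rather than guessed.

```ruby
require "stringio"

# Hypothetical scenario record; names are illustrative only.
Scenario = Struct.new(:input_kind, :declares_charset, :encoding_arg, :expected)

INPUT_KINDS   = [:io, :string]                 # repeat every case for both
DECLARES      = [true, false]                  # does the HTML declare charset X?
ENCODING_ARGS = [:explicit, :none, :fallback]  # what is passed to .parse

# Only the outcomes the list above states outright.
KNOWN = {
  [true, :none]     => "parsed as X",
  [true, :fallback] => "parsed as X",
}

# 2 input kinds x 2 declaration states x 3 argument styles = 12 scenarios.
SCENARIOS = INPUT_KINDS.product(DECLARES, ENCODING_ARGS).map do |kind, declared, arg|
  Scenario.new(kind, declared, arg, KNOWN.fetch([declared, arg], :undetermined))
end

# Wraps the markup in an IO-like object for the IO cases.
def build_input(kind, html)
  kind == :io ? StringIO.new(html) : html
end
```

Each row can then drive a single test that parses `build_input(kind, html)` and records the resulting document encoding, which also pins down the undetermined rows before any refactoring of EncodingReader.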

I should have some free time tomorrow. I'll try to get it done in a new
branch and send you a pull request for that ASAP.

twitter: @riffraff
blog (en, it): www.riffraff.info riffraff.blogsome.com
work: circleme.com

Owner

leejarvis commented Jan 18, 2014

@riffraff Closing this because of how much time has passed; I'm cleaning up. Happy to discuss a merge should it be revisited.

leejarvis closed this Jan 18, 2014
