encoding issues #1113 #1251

Closed
wants to merge 4 commits into from


mkristian
Contributor

Feel free to cherry-pick any commit; each one should be self-contained.

Commit 2f43a0c breaks the following tests:

1) Failure:
Nokogiri::XML::TestNode#test_node_context_parsing_of_malformed_html_fragment_with_recover_is_corrected [test/xml/test_node.rb:109]:
Expected: 1
  Actual: 0

  2) Failure:
Nokogiri::XML::TestNode#test_node_context_parsing_of_malformed_html_fragment_without_recover_is_not_corrected [test/xml/test_node.rb:119]:
Expected: 1
  Actual: 0

nekohtml simply no longer reports the missing attribute error (or I did not find a way to tell nekohtml to do so), and I am not sure how to resolve these failures.

So any input or ideas are welcome.

The xerces update is not really needed, but I think it is overdue.

This is a partial fix for sparklemotion#1113, so that character entities are NOT used when the document's encoding can encode the data.

Sponsored by Lookout Inc.
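The rule in this commit can be illustrated with a minimal Java sketch (Java because the JRuby backend is built on Java and xerces). It assumes a standalone helper, `escapeUnencodable`, which is hypothetical and not Nokogiri's actual code: each code point is written as-is when the target charset can encode it, and only otherwise as a hexadecimal numeric character reference. It uses the standard `java.nio.charset.CharsetEncoder.canEncode(CharSequence)` check and assumes markup characters such as `<` and `&` are escaped elsewhere.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class EntityEscaper {
    // Hypothetical helper (not Nokogiri's code): keeps every code point the
    // target charset can encode and replaces only the rest with a hexadecimal
    // numeric character reference. Markup characters such as '<' and '&' are
    // assumed to be escaped elsewhere.
    static String escapeUnencodable(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int codePoint = text.codePointAt(i);
            // Check the whole code point so surrogate pairs are tested together.
            String chunk = new String(Character.toChars(codePoint));
            if (encoder.canEncode(chunk)) {
                out.append(chunk);
            } else {
                out.append("&#x")
                   .append(Integer.toHexString(codePoint).toUpperCase(Locale.ROOT))
                   .append(';');
            }
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "cafe" with an acute e, a space, then the euro sign.
        String text = "caf\u00e9 \u20ac";
        // UTF-8 can encode every character, so no entities are emitted.
        System.out.println(escapeUnencodable(text, StandardCharsets.UTF_8));
        // ISO-8859-1 has the acute e but no euro sign, so only the euro
        // sign is replaced, giving "caf" + e-acute + " &#x20AC;".
        System.out.println(escapeUnencodable(text, StandardCharsets.ISO_8859_1));
    }
}
```

Calling `canEncode` once per code point keeps the sketch easy to follow; a real serializer would more likely run the encoder over a buffer and handle unmappable characters as they are reported.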
The last release of xerces was quite a while ago, so using the latest version seems appropriate.

Sponsored by Lookout Inc.
The new version of nekohtml introduced a few regressions. This commit fixes all of them except two, which concern error warnings.

It avoids auto-completing the tbody tag around the tr tags of a table. The check for unknown HTML changed upstream, and the code was adjusted accordingly.

fixes sparklemotion#1113

Sponsored by Lookout Inc.
@vilius

vilius commented Jul 9, 2015

👍 would love to see this fixed

@flavorjones flavorjones added this to the 1.6.8 milestone Nov 30, 2015
@flavorjones
Member

tagging @jvshahid

@jvshahid
Member

I was able to fix the regression in nekohtml; the commit is in my nekohtml fork. Both the nekohtml tests and the nokogiri tests pass, and the JRuby special case introduced in this PR is no longer required. I'll try to write a test and submit it upstream. I also reverted the xerces.jar update, since it was already updated in #1212.

@jvshahid jvshahid closed this in 86a683a Jan 2, 2016
jvshahid added a commit that referenced this pull request Mar 25, 2018
The patch accidentally removed the parents of the TR element. As a result, any document fragment with a dangling (i.e. parentless) TD or TR element caused a stack overflow.

fixes #1501
jvshahid added a commit that referenced this pull request Mar 25, 2018
This is an ugly change whose only purpose is to mask the difference between libxml and nekohtml. We agreed a while ago to stop doing that and to simply accept that different libraries behave differently. Furthermore, it caused a stack overflow while parsing documents containing a TD element that has no parents, as reported in #1501.

fixes #1501
flavorjones added a commit that referenced this pull request Mar 29, 2018
remove monkey patch introduced in #1251