Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug fix: _prepDocument removes every other <font> element #313

Closed
wants to merge 5 commits into from

Conversation

andrei-ch
Copy link
Contributor

Since doc.getElementsByTagName("font") returns live array, _forEachNode
must be called with backward=true or it will skip elements.

Also, _forEachNode with backward=true was NYI.

Since doc.getElementsByTagName("font") returns live array, _forEachNode
must be called with backward=true or it will skip elements.

Also, _forEachNode with backward=true was NYI.
@gijsk
Copy link
Contributor

gijsk commented Nov 2, 2016

Thanks for your PR.

Over in #300, we decided instead of having the 'backward' param, we'd just have a helper to remove items instead. I think it would be clearer if we did the same for the _setNodeTag usage - we could have a helper called _rewriteAllNodes that takes the original tag name and new tag name as an argument, for instance. We should remove the confusing unused backward parameter - I'm a bit surprised eslint didn't pick those up.

eslint noted some styling issues with the indentation, could you fix those?

Otherwise, could you explain how the extra cleanup calls for input/textarea/button/select relate to the bugfix for <font> ?

1) Avoid conversion of whitespace text nodes into paragraphs. They
create a lot of noise and actually prevent sibling joining logic from
working in many pages.

2) Handle case when adjacent content is actually located in parent's
sibling node instead of top candidate’s sibling.
Solution: strip one level of empty <DIV> elements so they don’t
obstruct merging adjacent content downstream.
@gijsk
Copy link
Contributor

gijsk commented Dec 17, 2016

I separated out the font tag mistakes and cleaning up the form elements, fixed some tests and code style issues and merged those to master. Thank you!

It seems to me some of the reparenting logic would be interesting and probably fixes real bugs, but at the moment it's causing a lot of test failures. Have you had a chance to look at those at all? You should be able to run the tests locally with something like mocha test/test-readability.js, optionally passing something like -g clean-links to select a specific test to run. It's possible that in some (maybe even most?) cases we simply end up updating the testcases, if the result is not any worse (ie the failures do not indicate actual issues). You can do this automatically using node generate-testcase.js <testcase-slug>.

@gijsk
Copy link
Contributor

gijsk commented Jan 21, 2017

All of these changes have now been merged separately / after fixing test failures. Thank you!

@gijsk gijsk closed this Jan 21, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants