Bug fix: _prepDocument removes every other <font> element #313

Closed
wants to merge 5 commits into from

Commits on Nov 2, 2016

  1. Bug fix: _prepDocument removes every other <font> element

    Since doc.getElementsByTagName("font") returns a live collection,
    _forEachNode must be called with backward=true or it will skip elements.
    
    Also, _forEachNode with backward=true was not yet implemented (NYI).
    andrei-ch committed Nov 2, 2016
    7fcde95
  2. e7ec661
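
The skipping behaviour described in the first commit message can be reproduced without a browser. The sketch below is only an illustration, not the code from this pull request: liveByTag, makeParent, and retag are hypothetical helpers, and plain objects stand in for DOM nodes. The fake collection recomputes its contents on every access, which is the property of a live collection that matters here.

```javascript
// Hypothetical stand-in for a live collection: length and item(i) are
// recomputed from the parent's current children on every access.
function liveByTag(parent, tag) {
  const current = () => parent.children.filter((c) => c.tag === tag);
  return {
    get length() { return current().length; },
    item(i) { return current()[i]; },
  };
}

// Build a parent holding n fake <font> nodes with ids 0..n-1.
function makeParent(n) {
  const parent = { children: [] };
  for (let i = 0; i < n; i++) parent.children.push({ tag: "font", id: i });
  return parent;
}

// Swap a <font> node for a <span>, so it drops out of the live collection.
function retag(parent, node) {
  const idx = parent.children.indexOf(node);
  parent.children[idx] = { tag: "span", id: node.id };
}

// Forward iteration: each retag shifts the remaining items down by one,
// so the index steps over the element that moved into the current slot.
function convertForward(parent) {
  const fonts = liveByTag(parent, "font");
  for (let i = 0; i < fonts.length; i++) retag(parent, fonts.item(i));
}

// Backward iteration: removing the last item never shifts earlier ones.
function convertBackward(parent) {
  const fonts = liveByTag(parent, "font");
  for (let i = fonts.length - 1; i >= 0; i--) retag(parent, fonts.item(i));
}

const forwardParent = makeParent(4);
convertForward(forwardParent);
const backwardParent = makeParent(4);
convertBackward(backwardParent);

const remainingForward = forwardParent.children
  .filter((c) => c.tag === "font")
  .map((c) => c.id);
const remainingBackward = backwardParent.children
  .filter((c) => c.tag === "font")
  .map((c) => c.id);

console.log(remainingForward);  // ids of <font> nodes the forward pass missed
console.log(remainingBackward); // ids of <font> nodes the backward pass missed
```

With four nodes, the forward pass leaves the nodes with ids 1 and 3 untouched, which is the "every other element" pattern in the title, while the backward pass converts all of them.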

Commits on Nov 20, 2016

  1. Bug fix: many pages only grab partial content (dirty.ru, nytimes.com)

    1) Avoid converting whitespace-only text nodes into paragraphs. They
    create a lot of noise and prevent the sibling-joining logic from
    working on many pages.
    
    2) Handle the case where adjacent content is located in a sibling of
    the top candidate’s parent rather than a sibling of the top candidate.
    andrei-ch committed Nov 20, 2016
    8ec087f
  2. Bug fix: still not grabbing full content from nytimes.com articles

    Solution: strip one level of empty <DIV> elements so they don’t
    obstruct merging adjacent content downstream.
    andrei-ch committed Nov 20, 2016
    310b636
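
Point 1 of the first Nov 20 commit (not turning whitespace-only text nodes into paragraphs) might look roughly like the sketch below. This is an assumption about the shape of the check, not the code from this pull request: isWhitespaceOnly and wrapTextNodes are hypothetical names, nodes are plain objects rather than DOM nodes, and leaving whitespace nodes in place (instead of removing them) is a choice made for the example.

```javascript
// Node type constants matching the DOM's Node.ELEMENT_NODE and Node.TEXT_NODE.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// True for a text node that contains nothing but whitespace.
function isWhitespaceOnly(node) {
  return node.nodeType === TEXT_NODE && node.textContent.trim().length === 0;
}

// Wrap meaningful text nodes in a <p>-like object; leave elements and
// whitespace-only text nodes exactly as they are.
function wrapTextNodes(children) {
  return children.map((child) => {
    if (child.nodeType !== TEXT_NODE || isWhitespaceOnly(child)) {
      return child;
    }
    return { nodeType: ELEMENT_NODE, tagName: "P", textContent: child.textContent };
  });
}

const children = [
  { nodeType: TEXT_NODE, textContent: "\n    " },
  { nodeType: TEXT_NODE, textContent: "Lead sentence." },
  { nodeType: ELEMENT_NODE, tagName: "DIV", textContent: "Body" },
];
const wrapped = wrapTextNodes(children);
const paragraphCount = wrapped.filter((n) => n.tagName === "P").length;
console.log(paragraphCount); // number of paragraphs created from text nodes
```

Only the text node with real content becomes a paragraph here, so no empty paragraph ends up between the lead sentence and the DIV.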

Commits on Dec 5, 2016

  1. 53d1e15