Be more accurate when excluding text that is below the minimum length threshold(s): don't count leading or trailing whitespace when counting the length of a node of content.
1 parent d00f172 commit a3a1725695a7d89e649d8abe982e5d5c7422adce Lee Mallabone committed with cantino Mar 29, 2010
Showing with 2 additions and 2 deletions.
  1. +1 −1 lib/readability.rb
  2. +1 −1 spec/fixtures/samples/channel4-1-fragments.rb
lib/readability.rb
@@ -233,7 +233,7 @@ def sanitize(node, candidates, options = {})
counts = %w[p img li a embed input].inject({}) { |m, kind| m[kind] = el.css(kind).length; m }
counts["li"] -= 100
- content_length = el.text.length
+ content_length = el.text.strip.length # Count the text length excluding any surrounding whitespace
link_density = get_link_density(el)
to_remove = false
reason = ""
spec/fixtures/samples/channel4-1-fragments.rb
@@ -9,6 +9,6 @@
]
$excluded_fragments = [
-# "Share this article" # ideally this would not be present
+ "Share this article" # ideally this would not be present
]
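
To illustrate why stripping matters, here is a minimal sketch of how surrounding whitespace can push a short node over a length threshold. The `MIN_TEXT_LENGTH` constant, the `too_short?` helper, and the padded sample string are assumptions made up for this example; they are not taken from `lib/readability.rb`, and the real `sanitize` method combines the length with other signals such as link density.

```ruby
# Hypothetical threshold, for illustration only; not a value from the commit.
MIN_TEXT_LENGTH = 25

# Stand-in for el.text on a node whose markup is mostly indentation and newlines.
padded_text = "\n      Share this article\n      \n"

raw_length     = padded_text.length        # includes leading/trailing whitespace
trimmed_length = padded_text.strip.length  # counts only the text between the whitespace

# Hypothetical helper: true when a node's text is below the minimum length.
def too_short?(length, minimum)
  length < minimum
end

puts "raw: #{raw_length}, trimmed: #{trimmed_length}"
puts "too short using raw length?     #{too_short?(raw_length, MIN_TEXT_LENGTH)}"
puts "too short using trimmed length? #{too_short?(trimmed_length, MIN_TEXT_LENGTH)}"
```

With the raw length, the whitespace padding alone can carry a fragment like "Share this article" past the threshold; with the stripped length, only the visible text counts. That matches the fixture change above, where the fragment moves into the list of expected exclusions.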
