# encoding=utf-8

require 'nokogiri'

class WordCounter

  attr_accessor :text
  
  def initialize(text)
    @text = text
  end
  
  # Counts the words in the text nodes of the HTML body; markup is ignored.
  # Hyphens and apostrophes are stripped before scanning for runs of word
  # characters, so "one-word" and "one's" each count as one word, not two.
  # A double hyphen (--) is first replaced with an em dash (U+2014), which
  # is not stripped, so "one--two" counts as two words.
  def count
    total = 0
    body = Nokogiri::HTML(@text).xpath('//body').first
    # Guard: empty input can parse to a document without a <body> element.
    return total if body.nil?
    body.traverse do |node|
      if node.is_a? Nokogiri::XML::Text
        total += node.inner_text.gsub(/--/, "—").gsub(/['’‘-]/, "").scan(/[[:word:]]+/).size
      end
    end
    total
  end
  
end
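
The normalization rules described in the comments on `count` can be checked on a plain string, without parsing any HTML. The sketch below is only an illustration: `tokenize_words` is a hypothetical helper that is not part of `WordCounter`, and it assumes the same substitution and scan chain used inside `count`.

```ruby
# encoding: utf-8

# Hypothetical helper (not part of WordCounter): applies the same
# normalization chain as WordCounter#count to a plain string.
def tokenize_words(str)
  str.gsub(/--/, "\u2014")          # double hyphen becomes an em dash, which separates words
     .gsub(/['\u2019\u2018-]/, "")  # drop apostrophes and hyphens inside words
     .scan(/[[:word:]]+/)           # collect runs of word characters
end

words = tokenize_words("one-word one's one--two")
puts words.inspect  # => ["oneword", "ones", "one", "two"]
puts words.size     # => 4
```

Because the em dash is not a word character, it splits `one--two` into two tokens, while the stripped hyphen and apostrophe leave `one-word` and `one's` as single tokens.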