WordStats provides a set of methods useful for counting character and word frequencies.
Add this line to your application's Gemfile:

    gem 'word_stats'

And then execute:

    $ bundle

Or install it yourself as:

    $ gem install word_stats
Require the WordStats gem as follows:

    require 'word_stats' # Remember to require RubyGems first if using Ruby 1.8

    text = "The quick brown fox jumps over the lazy dog."
    # Note: all strings processed by WordStats are downcased!

WordStats provides shortcuts for single-letter frequencies, bigrams, and trigrams. The WordStats::Characters.ngrams(n,text) method can be used to find n-grams of any length. The output is a hash that maps each n-gram (as a symbol) to its count.

    letter_frequencies = WordStats::Characters.letters(text)
    letter_frequencies[:'u'] #=> 2

    bigrams = WordStats::Characters.bigrams(text)
    bigrams[:'th'] #=> 2

    trigrams = WordStats::Characters.trigrams(text)
    trigrams['qui'.to_sym] #=> 1

    octocats = WordStats::Characters.ngrams(8,text)
    octocats[:'The quic'] #=> 0
    octocats[:'the quic'] #=> 1

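To make the idea of character n-gram counting concrete, here is a minimal plain-Ruby sketch that needs no gem. The `char_ngrams` helper is hypothetical and is not part of the WordStats API; it assumes ASCII text, keeps spaces as characters, and uses a default of 0 for unseen n-grams. The gem's own implementation may differ.

```ruby
# Hypothetical helper for illustration only; not WordStats' implementation.
# Assumptions: ASCII input, spaces are kept, unseen n-grams count as 0.
def char_ngrams(n, text)
  # Downcase and drop everything except letters, digits and whitespace.
  cleaned = text.downcase.gsub(/[^a-z0-9\s]/, '')
  counts = Hash.new(0)
  # Slide a window of n characters across the cleaned string.
  cleaned.each_char.each_cons(n) { |chars| counts[chars.join.to_sym] += 1 }
  counts
end

text = "The quick brown fox jumps over the lazy dog."
bigrams = char_ngrams(2, text)
bigrams[:th]                      #=> 2
char_ngrams(8, text)[:'the quic'] #=> 1
char_ngrams(8, text)[:'The quic'] #=> 0
```

The `each_cons` call yields every run of n consecutive characters, so a string of length L produces L - n + 1 windows.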
Similarly, WordStats provides a method to count single words and word sequences of any length:

    word_count = WordStats::Words.nwords(1,text)
    word_count[:'the'] #=> 2

    word_pairs = WordStats::Words.nwords(2,text)
    word_pairs[:'quick brown'] #=> 1

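Word sequences can be counted the same way by sliding a window over words instead of characters. The sketch below is plain Ruby; `word_ngrams` is a hypothetical name, not a WordStats method, and it assumes ASCII text with words separated by whitespace.

```ruby
# Hypothetical helper for illustration only; not WordStats' implementation.
# Assumptions: ASCII input, words are separated by whitespace.
def word_ngrams(n, text)
  # Downcase, strip punctuation, then split on whitespace.
  words = text.downcase.gsub(/[^a-z0-9\s]/, '').split
  counts = Hash.new(0)
  # Each window of n consecutive words becomes one space-joined key.
  words.each_cons(n) { |seq| counts[seq.join(' ').to_sym] += 1 }
  counts
end

text = "The quick brown fox jumps over the lazy dog."
word_ngrams(1, text)[:the]            #=> 2
word_ngrams(2, text)[:'quick brown']  #=> 1
```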
WordStats will downcase any string that you pass into it. It also strips punctuation before processing.
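A practical consequence of this normalization is that "The" and "the", or "dog" and "dog.", are counted as the same token. A rough sketch of such a step in plain Ruby follows; `normalize` is a hypothetical name, and the exact set of characters the gem treats as punctuation is an assumption here.

```ruby
# Hypothetical normalization step; the gem's exact rules may differ.
# Assumption: anything other than ASCII letters, digits and whitespace
# is treated as punctuation and removed.
def normalize(text)
  text.downcase.gsub(/[^a-z0-9\s]/, '')
end

normalize("The lazy dog.") #=> "the lazy dog"
```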
- Fork it
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Added some feature')
- Push to the branch (git push origin my-new-feature)
- Create new Pull Request