layout | title | date |
---|---|---|
post | A Simple Ruby NGram Generator | 2012-04-24 08:45:43 -0700 |
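I was playing around with Ruby the other night and wrote a simple n-gram generator. It reads the text file named on the command line, counts every bi-gram and tri-gram, and prints the most frequent ones. In case anyone is interested, here is the script: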
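{% highlight ruby %}
#!/usr/bin/env ruby -w

# Lowercase the input file and split it into alphabetic words.
$words = File.read(ARGV[0]).downcase.scan(/[a-z]+/)

bi_grams = Hash.new(0)
tri_grams = Hash.new(0)

# Count the bi-gram and tri-gram starting at each position i.
# Note: the loop stops where the last tri-gram starts, so the final bi-gram is skipped.
num = $words.length - 2
num.times {|i|
  bi = $words[i] + ' ' + $words[i+1]
  tri = bi + ' ' + $words[i+2]
  bi_grams[bi] += 1
  tri_grams[tri] += 1
}

# Show at most num / 10 entries per list; `first` avoids indexing past the
# end of the sorted list when the text has few distinct n-grams.
top = [num / 10, 0].max

puts "## -- bi-grams -- ##"
bg = bi_grams.sort{|a,b| b[1] <=> a[1]}
bg.first(top).each {|gram, count| puts "#{gram} : #{count}"}
puts "\n"
puts "## -- tri-grams -- ##"
tg = tri_grams.sort{|a,b| b[1] <=> a[1]}
tg.first(top).each {|gram, count| puts "#{gram} : #{count}"}
{% endhighlight %}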
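If you want n-grams of any size rather than just two and three, the same counting idea can be written around `Enumerable#each_cons`, which yields every run of `n` consecutive elements. This is only a rough sketch, not the script above: the helper names `ngram_counts` and `top_ngrams` and the sample sentence are made up for illustration.

```ruby
# Hypothetical helper (not part of the original script):
# counts n-grams of any size n in an array of words.
def ngram_counts(words, n)
  counts = Hash.new(0)
  # each_cons yields every run of n consecutive words, including the last one.
  words.each_cons(n) { |gram| counts[gram.join(' ')] += 1 }
  counts
end

# Hypothetical helper: the `limit` most frequent n-grams, highest count first.
def top_ngrams(counts, limit)
  counts.sort_by { |_gram, count| -count }.first(limit)
end

# Sample text made up for this example; tokenized the same way as the script.
words = "the cat sat on the mat and the cat ran".downcase.scan(/[a-z]+/)

top_ngrams(ngram_counts(words, 2), 3).each do |gram, count|
  puts "#{gram} : #{count}"
end
```

With this sample sentence, "the cat" is the only bi-gram that occurs twice, so it is printed first; the order of the remaining ties is not guaranteed. Passing `3` instead of `2` to `ngram_counts` gives tri-grams without any other change.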