Add my solutions for last month's SacRuby meeting
jeremyevans committed Aug 17, 2016
1 parent c6c97bd commit c61fd3f
Showing 3 changed files with 279 additions and 0 deletions.
55 changes: 55 additions & 0 deletions 2016-07-shakespeare/README.md
@@ -0,0 +1,55 @@
# rubyquiz-201607 Fun with words

Your challenge is to create a program that reads the entire Shakespeare set of texts
and processes the files to answer the following questions:

1. What is the distribution of words according to initial letter of the word?
2. What are the top ten words that appear in the entire body of texts?
3. What are the top three-letter combinations that appear in the text, ignoring all special characters, punctuation, and spaces?

Create three functions (and any other supporting programming you desire). The specification is as follows:

`$ ruby shakespeare.rb [alpha] [dir to data set]`

This should print a histogram to standard output in the form:

    A | 2003
    B | 1243
    ...
    Z | 45
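
As a rough illustration of the histogram format above, here is a minimal sketch that works on an in-memory string rather than the data set. The method name `initial_letter_histogram` is hypothetical, and treating every run of ASCII letters as a word is an assumption; the challenge does not define what counts as a word.

```ruby
# Hypothetical helper, not part of the challenge specification.
# Counts words by their first letter and formats one histogram line per letter.
def initial_letter_histogram(text)
  counts = Hash.new(0)
  # Assumption: a "word" is any run of ASCII letters, so "don't" counts as two.
  text.downcase.scan(/[a-z]+/) { |word| counts[word[0]] += 1 }
  # Sort by letter and render in the "A | 2003" style shown above.
  counts.sort.map { |letter, count| "#{letter.upcase} | #{count}" }
end

puts initial_letter_histogram("To be, or not to be")
```

Letters that never start a word are simply omitted here; whether they should be printed with a zero count is left open by the specification.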

The second program should be run with the command line:

`$ ruby shakespeare.rb [ten] [dir to data set]`

It should print a list of ten words in alphabetical order, comma separated:

apple,boy,charlie,...,ten
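
One possible reading of this requirement is: pick the most frequent words first, then sort only those alphabetically. The sketch below follows that reading on an in-memory string. The method name `top_words` and its `n` parameter are hypothetical, and the letters-only word definition is an assumption.

```ruby
# Hypothetical helper, not part of the challenge specification.
# Returns the n most frequent words, sorted alphabetically and comma separated.
def top_words(text, n = 10)
  counts = Hash.new(0)
  # Assumption: a "word" is any run of ASCII letters.
  text.downcase.scan(/[a-z]+/) { |word| counts[word] += 1 }
  # Select by descending frequency first, then sort the selection by name.
  counts.sort_by { |_, count| -count }.first(n).map(&:first).sort.join(',')
end

puts top_words("the cat and the dog and the bird", 2)
```

When several words tie for the last place, which of them is kept depends on the sort; the specification does not say how ties should be broken.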

The final function should be run with the command line:

`$ ruby shakespeare.rb [three] [dir to data set]`

It should print a comma-separated list of three-letter combinations in alphabetical order:

abb,cat,egg,...,zip
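
The three-letter case can be sketched as a sliding window over the text once everything except letters has been removed, so combinations may span word boundaries. The method name `top_trigrams` and the default of ten results are assumptions for illustration; the specification does not state how many combinations to print.

```ruby
# Hypothetical helper, not part of the challenge specification.
# Returns the n most frequent three-letter combinations, sorted alphabetically.
def top_trigrams(text, n = 10)
  # Drop spaces, punctuation, and digits so only ASCII letters remain.
  letters = text.downcase.gsub(/[^a-z]/, '')
  counts = Hash.new(0)
  # Slide a window of three consecutive letters across the whole string.
  letters.each_char.each_cons(3) { |triple| counts[triple.join] += 1 }
  counts.sort_by { |_, count| -count }.first(n).map(&:first).sort.join(',')
end

puts top_trigrams("Banana bandana!", 2)
```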

All functions should be able to be run from the same program file.

The main program file should be called `shakespeare.rb`.

Submit your programs in a self-contained directory as a pull request to this challenge. Use your team or user name as the subdirectory in the solutions folder.

## Testing

We'll compare solutions on the testing machine, which has Ruby 2.3.1 installed,
using the Unix `time` tool.

Runs can be executed in any order.

It is OK to write temporary files for your processing, but they must be written
in the current working directory, not the data directory.

## Credits

The data used for this exercise is from http://www.folgerdigitaltexts.org/. See the license in the data folder and on the web site.
123 changes: 123 additions & 0 deletions 2016-07-shakespeare/shakespeare-optimized.rb
@@ -0,0 +1,123 @@
#!/usr/bin/env ruby

=begin
# Times using AMD Athlon(tm) Processor TF-20, 1596.25 MHz
# Previous, unoptimized code
alpha 0m05.29s real 0m04.53s user 0m00.59s system
ten 0m02.61s real 0m02.14s user 0m00.35s system
three 0m39.84s real 0m35.30s user 0m03.56s system
# Optimized? code:
alpha 0m04.75s real 0m04.57s user 0m00.10s system
ten 0m04.54s real 0m04.40s user 0m00.11s system
three 0m20.91s real 0m20.71s user 0m00.05s system
=end

require 'find'
require 'strscan'

class Shake
  def self.usage
    $stderr.puts "usage: ruby shakespeare.rb [alpha|ten|three] dir"
    exit(1)
  end

  def self.call(dir)
    new(dir).call
  end

  attr_reader :dir

  def initialize(dir)
    @dir = dir
  end

  # Yield the path of every regular file under dir.
  def each_file
    Find.find(dir) do |f|
      yield f if File.file?(f)
    end
  end

  # Yield the downcased contents of each file, one file at a time.
  def each_text
    each_file{|f| yield File.binread(f).downcase}
  end

  def counter
    Hash.new(0)
  end

  # Print the ten most frequent keys in alphabetical order, comma separated.
  def top_ten(hash)
    puts hash.sort_by{|_, v| -v}[0...10].map(&:first).sort.join(',')
  end

  class Scanner < StringScanner
    # Yield each successive match of regexp, skipping any text in between.
    def each_match(regexp)
      skipper = /(?=#{regexp})/
      while skip_until(skipper) && (match = scan(regexp))
        yield match
      end
    end
  end

  class Alpha < self
    def call
      c = counter

      each_text do |text|
        Scanner.new(text).each_match(/\b[a-z]+/) do |word|
          c[word[0]] += 1
        end
      end

      c.sort.each do |k, v|
        puts "#{k.upcase} | #{v}"
      end
    end
  end

  class Ten < self
    def call
      c = counter

      each_text do |text|
        Scanner.new(text).each_match(/\b[a-z]+\b/) do |word|
          c[word] += 1
        end
      end

      top_ten(c)
    end
  end

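  class Three < self
    def call
      c = counter
      # Previous two letters; kept across files so that combinations span
      # file boundaries, as in the unoptimized version.
      p1 = nil
      p2 = nil

      each_text do |text|
        Scanner.new(text).each_match(/[a-z]/) do |letter|
          # Count only once a full three-letter window is available.
          c["#{p2}#{p1}#{letter}"] += 1 if p2
          p2 = p1
          p1 = letter
        end
      end

      top_ten(c)
    end
  end
end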

if $0 == __FILE__
  case v = ARGV.first
  when 'alpha', 'ten', 'three'
    if dir = ARGV[1]
      Shake.const_get(v.capitalize).call(dir)
    else
      Shake.usage
    end
  else
    Shake.usage
  end
end
101 changes: 101 additions & 0 deletions 2016-07-shakespeare/shakespeare.rb
@@ -0,0 +1,101 @@
#!/usr/bin/env ruby

=begin
# Unoptimized code, times using AMD Athlon(tm) Processor TF-20, 1596.25 MHz
alpha
0m05.29s real 0m04.53s user 0m00.59s system
ten
0m02.61s real 0m02.14s user 0m00.35s system
three
0m39.84s real 0m35.30s user 0m03.56s system
=end

require 'find'

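class Shake
  def self.usage
    $stderr.puts "usage: ruby shakespeare.rb [alpha|ten|three] dir"
    exit(1)
  end

  def self.call(dir)
    new(dir).call
  end

  attr_reader :dir, :files

  # Collect the paths of all regular files under dir.
  def initialize(dir)
    @dir = dir
    files = []
    Find.find(dir) do |f|
      next unless File.file?(f)
      files << f
    end
    @files = files
  end

  # All files joined into a single downcased string.
  def text
    @text ||= files.map{|f| File.binread(f)}.join(' ').downcase
  end

  # Whitespace-separated tokens; punctuation is not stripped.
  def words
    @words ||= text.split
  end

  # Count the elements of enum. With a block, count the block's return
  # value instead and skip elements for which it returns nil or false.
  def counter(enum)
    h = Hash.new(0)
    if block_given?
      enum.each do |w|
        if value = yield(w)
          h[value] += 1
        end
      end
    else
      enum.each do |w|
        h[w] += 1
      end
    end
    h
  end

  # Print the ten most frequent keys in alphabetical order, comma separated.
  def top_ten(hash)
    puts hash.sort_by{|_, v| -v}[0...10].map(&:first).sort.join(',')
  end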

  class Alpha < self
    def call
      letters = counter(words) do |word|
        letter = word[0]
        letter if letter =~ /[a-z]/
      end
      letters.sort.each do |k, v|
        puts "#{k.upcase} | #{v}"
      end
    end
  end

  class Ten < self
    def call
      top_ten(counter(words))
    end
  end

  class Three < self
    def call
      top_ten(counter(text.gsub(/[^a-z]/, '').split(//).each_cons(3), &:join))
    end
  end
end

if $0 == __FILE__
  case v = ARGV.first
  when 'alpha', 'ten', 'three'
    if dir = ARGV[1]
      Shake.const_get(v.capitalize).call(dir)
    else
      Shake.usage
    end
  else
    Shake.usage
  end
end
