Add my solutions for last month's SacRuby meeting
1 parent c6c97bd, commit c61fd3f
Showing 3 changed files with 279 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,55 @@ | ||
# rubyquiz-201607 Fun with words | ||
|
||
Your challenge is to create a program that reads the entire Shakespeare set of texts | ||
and processes the files to answer the following questions: | ||
|
||
1. What is the distribution of words according to initial letter of the word? | ||
2. What are the top ten words that appear in the entire body of texts? | ||
3. What are the top three-letter combinations that appear in the text, ignoring all special characters, punctuation, and spaces? | ||
|
||
Create three functions (and any other supporting programming you desire). The specification is as follows: | ||
|
||
`$ ruby shakespeare.rb [alpha] [dir to data set]` | ||
|
||
Should print out a histogram to standard output in the form of: | ||
|
||
A | 2003 | ||
B | 1243 | ||
... | ||
Z | 45 | ||
|
||
The second program should be run with the command line: | ||
|
||
`$ ruby shakespeare.rb [ten] [dir to data set]` | ||
|
||
And should print out a list of ten words in alphabetical order, comma separated: | ||
|
||
apple,boy,charlie,...,ten | ||
|
||
The final program function should be run with the command line: | ||
|
||
`$ ruby shakespeare.rb [three] [dir to data set]` | ||
|
||
And should print out a comma separated list of three letter combinations in alphabetical order: | ||
|
||
abb,cat,egg,...,zip | ||
|
||
All functions should be able to be run from the same program file. | ||
|
||
The main program file should be called `shakespeare.rb`. | ||
|
||
Submit your programs in a self-contained directory as a pull request to this challenge. Use your team or user name as the subdirectory in the solutions folder. | ||
|
||
## Testing | ||
|
||
We'll compare solutions on the testing machine, which has Ruby 2.3.1 installed, | ||
using the Unix `time` tool. | ||
|
||
Runs can be executed in any order. | ||
|
||
It is ok to write temporary files for your processing, but they must be written | ||
in the current working directory, not the data directory. | ||
|
||
## Credits | ||
|
||
The data used for this exercise is from http://www.folgerdigitaltexts.org/. See the license in the data folder and on the web site. |
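
The first two questions in the README above can be illustrated on a small in-memory string. This is only a minimal sketch, not either of the solutions in this commit: the helper names `initial_letter_histogram` and `top_words` and the sample sentence are hypothetical, and it assumes a "word" is any run of the letters a-z after downcasing (so an apostrophe splits a word in two).

```ruby
# Hypothetical helpers for illustration; a "word" is assumed to be a run of a-z.

# Returns histogram lines such as "A | 3", sorted by initial letter.
def initial_letter_histogram(text)
  counts = Hash.new(0)
  text.downcase.scan(/[a-z]+/) { |word| counts[word[0]] += 1 }
  counts.sort.map { |letter, n| "#{letter.upcase} | #{n}" }
end

# Returns the `limit` most frequent words, alphabetized and comma separated.
def top_words(text, limit = 10)
  counts = Hash.new(0)
  text.downcase.scan(/[a-z]+/) { |word| counts[word] += 1 }
  counts.sort_by { |_, n| -n }.first(limit).map(&:first).sort.join(',')
end

sample = "To be, or not to be: that is the question."
puts initial_letter_histogram(sample)
puts top_words(sample, 2)
```

Note that ties in frequency are broken arbitrarily here, since `sort_by` is not guaranteed to be stable; the README does not say how ties should be handled.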
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,123 @@ | ||
#!/usr/bin/env ruby | ||
|
||
=begin | ||
# Times using AMD Athlon(tm) Processor TF-20, 1596.25 MHz | ||
# Previous, unoptimized code | ||
alpha 0m05.29s real 0m04.53s user 0m00.59s system | ||
ten 0m02.61s real 0m02.14s user 0m00.35s system | ||
three 0m39.84s real 0m35.30s user 0m03.56s system | ||
# Optimized? code: | ||
alpha 0m04.75s real 0m04.57s user 0m00.10s system | ||
ten 0m04.54s real 0m04.40s user 0m00.11s system | ||
three 0m20.91s real 0m20.71s user 0m00.05s system | ||
=end | ||
|
||
require 'find' | ||
require 'strscan' | ||
|
||
class Shake | ||
def self.usage | ||
$stderr.puts "usage: ruby shakespeare.rb [alpha|ten|three] dir" | ||
exit(1) | ||
end | ||
|
||
def self.call(dir) | ||
new(dir).call | ||
end | ||
|
||
attr_reader :dir | ||
|
||
def initialize(dir) | ||
@dir = dir | ||
end | ||
|
||
def each_file | ||
Find.find(dir) do |f| | ||
yield f if File.file?(f) | ||
end | ||
end | ||
|
||
def each_text | ||
each_file{|f| yield File.binread(f).downcase} | ||
end | ||
|
||
def counter | ||
Hash.new(0) | ||
end | ||
|
||
def top_ten(hash) | ||
puts hash.sort_by{|_, v| -v}[0...10].map(&:first).sort.join(',') | ||
end | ||
|
||
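class Scanner < StringScanner | ||
# Yields each successive match of regexp, skipping the text between matches. | ||
def each_match(regexp) | ||
# Zero-width lookahead, so skip_until stops just before the next match. | ||
skipper = /(?=#{regexp})/ | ||
while (skip_until(skipper); match = scan(regexp)) | ||
yield match | ||
end | ||
end | ||
end | ||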
|
||
class Alpha < self | ||
def call | ||
c = counter | ||
|
||
each_text do |text| | ||
Scanner.new(text).each_match(/\b[a-z]+/) do |letter| | ||
c[letter[0]] += 1 | ||
end | ||
end | ||
|
||
c.sort.each do |k, v| | ||
puts "#{k.upcase} | #{v}" | ||
end | ||
end | ||
end | ||
|
||
class Ten < self | ||
def call | ||
c = counter | ||
|
||
each_text do |text| | ||
Scanner.new(text).each_match(/\b[a-z]+\b/) do |word| | ||
c[word] += 1 | ||
end | ||
end | ||
|
||
top_ten(c) | ||
end | ||
end | ||
|
||
class Three < self | ||
def call | ||
c = counter | ||
p1 = nil | ||
p2 = nil | ||
|
||
each_text do |text| | ||
Scanner.new(text).each_match(/[a-z]/) do |letter| | ||
# Count only complete three-letter windows; p2 is nil for the first two letters. | ||
c["#{p2}#{p1}#{letter}"] += 1 if p2 | ||
p2 = p1 | ||
p1 = letter | ||
end | ||
end | ||
|
||
top_ten(c) | ||
end | ||
end | ||
end | ||
|
||
case v = ARGV.first | ||
when 'alpha', 'ten', 'three' | ||
if dir = ARGV[1] | ||
Shake.const_get(v.capitalize).call(dir) | ||
else | ||
Shake.usage | ||
end | ||
else | ||
Shake.usage | ||
end if $0 == __FILE__ |
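
The `Scanner#each_match` helper in the file above walks a string match by match instead of splitting it into an array first. A similar effect can be sketched with `StringScanner#scan_until` and `StringScanner#matched` from the standard library. This is a simplified variant for illustration, not the code from this commit: the method name `each_word` is hypothetical, and the pattern deliberately omits `\b` so that the result does not depend on how the scanner anchors at its current position.

```ruby
require 'strscan'

# Hypothetical helper: yields each run of lowercase letters in text, in order.
def each_word(text)
  return enum_for(:each_word, text) unless block_given?

  scanner = StringScanner.new(text)
  # scan_until advances past the next match and returns nil when none is left;
  # matched then holds just the matched substring.
  while scanner.scan_until(/[a-z]+/)
    yield scanner.matched
  end
end

counts = Hash.new(0)
each_word("to be, or not to be") { |word| counts[word] += 1 }
p counts
```

Whether this is faster than `String#scan` or `String#split` on the full data set is not something the sketch shows; the timings in the file header are the only measurements available here.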
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,101 @@ | ||
#!/usr/bin/env ruby | ||
|
||
=begin | ||
# Unoptimized code, times using AMD Athlon(tm) Processor TF-20, 1596.25 MHz | ||
alpha | ||
0m05.29s real 0m04.53s user 0m00.59s system | ||
ten | ||
0m02.61s real 0m02.14s user 0m00.35s system | ||
three | ||
0m39.84s real 0m35.30s user 0m03.56s system | ||
=end | ||
|
||
require 'find' | ||
|
||
class Shake | ||
def self.usage | ||
$stderr.puts "usage: ruby shakespeare.rb [alpha|ten|three] dir" | ||
exit(1) | ||
end | ||
|
||
def self.call(dir) | ||
new(dir).call | ||
end | ||
|
||
attr_reader :dir, :files | ||
|
||
def initialize(dir) | ||
@dir = dir | ||
files = [] | ||
Find.find(dir) do |f| | ||
next unless File.file?(f) | ||
files << f | ||
end | ||
@files = files | ||
end | ||
|
||
def text | ||
@text ||= files.map{|f| File.binread(f)}.join(' ').downcase | ||
end | ||
|
||
def words | ||
@words ||= text.split | ||
end | ||
|
||
def counter(enum) | ||
h = Hash.new(0) | ||
if block_given? | ||
enum.each do |w| | ||
if value = yield(w) | ||
h[value] += 1 | ||
end | ||
end | ||
else | ||
enum.each do |w| | ||
h[w] += 1 | ||
end | ||
end | ||
h | ||
end | ||
|
||
def top_ten(hash) | ||
puts hash.sort_by{|_, v| -v}[0...10].map(&:first).sort.join(',') | ||
end | ||
|
||
class Alpha < self | ||
def call | ||
letters = counter(words) do |word| | ||
letter = word[0] | ||
letter if letter =~ /[a-z]/ | ||
end | ||
letters.sort.each do |k, v| | ||
puts "#{k.upcase} | #{v}" | ||
end | ||
end | ||
end | ||
|
||
class Ten < self | ||
def call | ||
top_ten(counter(words)) | ||
end | ||
end | ||
|
||
class Three < self | ||
def call | ||
top_ten(counter(text.gsub(/[^a-z]/, '').split(//).each_cons(3), &:join)) | ||
end | ||
end | ||
end | ||
|
||
case v = ARGV.first | ||
when 'alpha', 'ten', 'three' | ||
if dir = ARGV[1] | ||
Shake.const_get(v.capitalize).call(dir) | ||
else | ||
Shake.usage | ||
end | ||
else | ||
Shake.usage | ||
end if $0 == __FILE__ |
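
The `Three` class in the file above counts three-letter combinations by stripping everything except a-z and sliding a window of three over the remaining letters. The same idea on a tiny input, as a hedged sketch: `top_trigrams` is a hypothetical name, and the sample phrase is made up for illustration.

```ruby
# Hypothetical helper: returns the `limit` most frequent three-letter
# combinations in text, alphabetized, ignoring everything except a-z.
def top_trigrams(text, limit = 10)
  letters = text.downcase.gsub(/[^a-z]/, '')
  counts = Hash.new(0)
  # each_cons(3) yields every window of three consecutive letters.
  letters.each_char.each_cons(3) { |triple| counts[triple.join] += 1 }
  counts.sort_by { |_, n| -n }.first(limit).map(&:first).sort
end

# "the cat, the hat" collapses to "thecatthehat", where "the" occurs twice.
puts top_trigrams("the cat, the hat", 1).join(',')
```

Because spaces and punctuation are removed before windowing, combinations such as `tth` that span a word boundary are counted too, which matches the README's instruction to ignore special characters, punctuation, and spaces.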