# Ch 9: Case Study: Word Play

### Reading Word Lists

This chapter presents the second case study, which involves solving word puzzles by searching for words that have certain properties. For example, we’ll find the longest palindromes in English and search for words whose letters appear in alphabetical order.

We will use one of the word lists collected and contributed to the public domain by Grady Ward as part of the Moby lexicon project (see https://wikipedia.org/wiki/Moby_Project). It is a list of 113809 official crosswords; that is, words that are considered valid in crossword puzzles and other word games. In the Moby collection, the filename is 113809of.fic; you can download a copy, with the simpler name words.txt, from https://github.com/BenLauwens/ThinkJulia.jl/blob/master/data/words.txt.

In [1]:
fin = open("words.txt")    # fin is a file stream

IOStream(<file words.txt>)

In [2]:
readline(fin)

"aa"

In [3]:
readline(fin)     # readline remembers which line we are at

"aah"

In [4]:
close(fin)
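A common alternative to pairing `open` with `close` by hand is the `do`-block form of `open`, which closes the stream when the block finishes. The sketch below is only illustrative: so that it runs without `words.txt`, it assumes a three-word stand-in file created with `mktemp`, and the names `path`, `io`, and `first_two` are made up for this example.

```julia
# Write a tiny stand-in word list to a temporary file (hypothetical sample data).
path, io = mktemp()
write(io, "aa\naah\naahed\n")
close(io)

# The do-block form closes the stream automatically, even if an error occurs.
first_two = open(path) do fin
    [readline(fin), readline(fin)]   # readline advances one line per call
end

println(first_two)
```

With the real word list, `open("words.txt") do fin ... end` would be used the same way.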

### Exercise 9-1

Write a program that reads words.txt and prints only the words with more than 20 characters (not counting whitespace).

In [5]:
for word in eachline("words.txt")
    
    #word_wo_space = filter(x -> !isspace(x), word)
    word_wo_space = [ c for c in word if !isspace(c) ]
    if length(word_wo_space) > 20
        println(word)
    end
    
end


counterdemonstrations
hyperaggressivenesses
microminiaturizations


### Exercise 9-2

In 1939 Ernest Vincent Wright published a 50,000-word novel called Gadsby that does not contain the letter e. Since e is the most common letter in English, that’s not easy to do. In fact, it is difficult to construct a solitary thought without using that most common symbol. It is slow going at first, but with caution and hours of training you can gradually gain facility.

Write a function called `hasno_e` that returns true if the given word doesn’t have the letter e in it. Modify your program from the previous section to print only the words that have no e and compute the percentage of the words in the list that have no e.

In [5]:
function hasno_e(word)
    for letter in word
        if letter == 'e'
            return false
        end
    end
    true
end

hasno_e (generic function with 1 method)

In [6]:
count_all = 0
count_no_e = 0

for word in eachline("words.txt")

    count_all += 1
    if hasno_e(word)
        count_no_e += 1
#        println(word)
    end
    
end


count_all, count_no_e, count_no_e / count_all * 100

(113809, 37641, 33.07383423103621)
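The same computation can be written more compactly. This is a sketch on a made-up five-word sample rather than the full list; `hasno_e_short`, `sample`, `n_no_e`, and `percent` are hypothetical names chosen for the example.

```julia
# Hypothetical sample; the notebook would use readlines("words.txt") instead.
sample = ["aa", "aah", "ace", "hello", "tomb"]

# 'e' ∉ word is true when no character of word equals 'e'.
hasno_e_short(word) = 'e' ∉ word

n_no_e = sum(hasno_e_short, sample)        # Bools add up as 0 and 1
percent = n_no_e / length(sample) * 100    # share of words without an e
println((length(sample), n_no_e, percent))
```

Here three of the five sample words (`aa`, `aah`, `tomb`) contain no e.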

### Exercise 9-3

Write a function named `avoids` that takes a word and a string of forbidden letters, and that returns true if the word doesn’t use any of the forbidden letters. Modify your program to prompt the user to enter a string of forbidden letters and then print the number of words that don’t contain any of them.

You can write a concise program using the ∉ ( \notin TAB ) operator.

Can you find a combination of 5 forbidden letters that excludes the smallest number of words?

In [7]:
function avoids(word, forbidden)
    for letter in word
        if letter ∈ forbidden
            return false
        end
    end
    true
end

avoids (generic function with 1 method)
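The ∉ hint from the exercise can be taken literally. A possible one-line variant, named `avoids_short` here only to keep it apart from the loop version above, combines `all` with a generator:

```julia
# True when every letter of word is absent from forbidden.
avoids_short(word, forbidden) = all(letter ∉ forbidden for letter in word)

println(avoids_short("hello", "xyz"))   # no x, y, or z in "hello"
println(avoids_short("lazy", "xyz"))    # "lazy" contains z and y
println(avoids_short("", "xyz"))        # the empty word avoids everything
```

`all` stops at the first forbidden letter it finds, so it does the same work as the explicit loop with the early `return false`.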

In [15]:
println("Input a string of forbidden letters: ")

forbidden_input = readline()
count = 0

for word in eachline("words.txt")

    if avoids(word, forbidden_input)
        count += 1
    end
    
end

count

Input a string of forbidden letters: 
stdin> xyz


95196

Now, to try out all possible combinations of 5 characters, we will use a Julia package called `Combinatorics`.

In [9]:
using Pkg
Pkg.add("Combinatorics")
using Combinatorics

[32m[1m  Updating[22m[39m registry at `C:\Users\ST\.julia\registries\General`
[32m[1m  Updating[22m[39m git-repo `https://github.com/JuliaRegistries/General.git`
[?25l[2K[?25h[32m[1m Resolving[22m[39m package versions...
[32m[1m  Updating[22m[39m `C:\Users\ST\.julia\environments\v1.2\Project.toml`
[90m [no changes][39m
[32m[1m  Updating[22m[39m `C:\Users\ST\.julia\environments\v1.2\Manifest.toml`
[90m [no changes][39m


In [10]:
?combinations

search: combinations multiset_combinations CoolLexCombinations



```
combinations(a, n)
```

Generate all combinations of `n` elements from an indexable object `a`. Because the number of combinations can be very large, this function returns an iterator object. Use `collect(combinations(a, n))` to get an array of all combinations.

---

```
combinations(a)
```

Generate combinations of the elements of `a` of all orders. Chaining of order iterators is eager, but the sequence at each order is lazy.


In [119]:
letters = ['a', 'b', 'c','d']
combinations(letters,3)

Combinatorics.Combinations{Array{Char,1}}(['a', 'b', 'c', 'd'], 3)

Whenever you expect to get an array or sequence of some items and yet you do not, try `collect`.

In [120]:
collect(combinations(letters,3))

4-element Array{Array{Char,1},1}:
 ['a', 'b', 'c']
 ['a', 'b', 'd']
 ['a', 'c', 'd']
 ['b', 'c', 'd']
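Lazy iterators are not specific to `Combinatorics`; several functions in Julia's standard library return them as well. As a small illustration (the variable name `pairs_iter` is made up for this example), `zip` and ranges both produce nothing until they are iterated or collected:

```julia
pairs_iter = zip("ab", 1:2)      # lazy: nothing is materialized yet
println(collect(pairs_iter))     # an actual array of (Char, Int) tuples
println(collect(1:2:9))          # ranges are lazy too
```

Note that a `for` loop can iterate over such an object directly, so `collect` is only needed when an actual array is required.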

In [36]:
for combo in collect(combinations(letters,3))
    println(join(combo))
    println(typeof(join(combo)))
end
    

abc
String
abd
String
acd
String
bcd
String


In [12]:
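# Trying all C(26,5) = 65,780 five-letter combinations would take hours here,
# because every combination rereads words.txt. So let's try 3 letters: C(26,3) = 2,600.
# Note: the `count < fewest_exclusion` test below keeps the combination that leaves the
# FEWEST words, i.e. the one that excludes the MOST. To answer the exercise as posed
# (exclude the fewest words), keep the largest count instead.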

alphabets = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']


fewest_exclusion = 113809  # worst case
forbidden_wfe =""

for combo in collect(combinations(alphabets,3))
    
    forbidden_input = join(combo)
    
    count = 0
    for word in eachline("words.txt")
        
        if avoids(word, forbidden_input)
            count += 1
        end
    end
    
    if count < fewest_exclusion
        fewest_exclusion = count
        forbidden_wfe = forbidden_input
    end
    
end

fewest_exclusion, forbidden_wfe



(5597, "aei")
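The exercise asks for the combination that excludes the fewest words, which is the combination with the largest number of words that avoid it. A minimal sketch of that search follows, under several assumptions made for the sake of a quick, self-contained run: a made-up six-word list instead of `words.txt`, a five-letter alphabet, pairs instead of 5-letter combinations, and nested loops instead of `Combinatorics`. The names `least_excluding_pair`, `best_count`, and `best_pair` are hypothetical.

```julia
# Hypothetical sample; in the notebook this would be readlines("words.txt"),
# read once instead of once per combination.
words = ["ace", "bad", "cab", "dab", "bed", "add"]
letters = collect("abcde")

# Compact restatement of avoids from Exercise 9-3.
avoids(word, forbidden) = all(c ∉ forbidden for c in word)

function least_excluding_pair(words, letters)
    best_count = -1       # largest number of surviving words seen so far
    best_pair = ""
    for i in 1:length(letters)-1, j in i+1:length(letters)
        forbidden = string(letters[i], letters[j])
        kept = sum(avoids(w, forbidden) for w in words)
        if kept > best_count          # more survivors = fewer words excluded
            best_count, best_pair = kept, forbidden
        end
    end
    best_count, best_pair
end

println(least_excluding_pair(words, letters))
```

On this sample the pair `ce` leaves three words (`bad`, `dab`, `add`), more than any other pair. The same idea carries over to `combinations(alphabets, 5)` by replacing the nested loops.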

### Exercise 9-4

Write a function named `usesonly` that takes a word and a string of letters, and that returns true if the word contains only letters in the list. Can you make a sentence using only the letters acefhlo ? Other than "Hoe alfalfa?"

In [16]:
function usesonly(word, available)
    for letter in word
        if letter ∉ available
            return false
        end
    end
    true
end

usesonly (generic function with 1 method)

In [18]:
available = "acefhlo"

for word in eachline("words.txt")

    if usesonly(word,available)
        println(word)
    end
    
end

aa
aah
aal
ace
ache
achoo
ae
aff
ah
aha
ahchoo
ala
alae
alcohol
ale
alec
alee
alef
alfa
alfalfa
all
allele
allheal
aloe
aloha
aloof
cacao
cache
caeca
caecal
cafe
caleche
calf
call
calla
ceca
cecal
cee
cell
cella
cellae
cello
chafe
chaff
chalah
chaleh
challah
chef
chela
chelae
cholla
clach
clef
cloaca
cloacae
cloacal
cloche
coach
coal
coala
coalhole
coca
coccal
cochlea
cochleae
coco
cocoa
coff
coffee
coffle
coho
col
cola
cole
coo
cooch
cooee
coof
cool
each
eche
echo
ecole
eel
ef
eff
efface
eh
el
elf
ell
fa
face
faecal
fall
fallal
falloff
feal
fecal
fee
feel
fell
fella
fellah
felloe
feoff
feoffee
flea
fleche
flee
fleece
fleech
floc
floe
foal
focal
foe
foh
fool
ha
haaf
hae
hah
halala
halalah
hale
half
hall
hallah
hallel
hallo
halloa
halloo
halo
haole
he
heal
heel
hell
hellhole
hello
ho
hoe
hole
holla
hollo
holloa
holloo
hooch
hoof
la
lac
lace
lall
lea
leach
leaf
leal
lech
lee
leech
lo
loach
loaf
loca
local
locale
loch
loco
locofoco
loll
loo
loof
loofa
loofah
oaf
oca
oe
of
off
offal
oh
oho

### Exercise 9-5

Write a function named `usesall` that takes a word and a string of required letters, and that returns true if the word uses all the required letters at least once. How many words are there that use all the vowels aeiou ? How about aeiouy ?

In [19]:
function usesall(word, required)
    for letter in required
        if letter ∉ word
            return false
        end
    end
    true
end

usesall (generic function with 1 method)

If you were really thinking like a computer scientist, you would have recognized that usesall was an instance of a
previously solved problem, and you would have written:

In [20]:
function usesall(word, required)
    usesonly(required, word)
end

usesall (generic function with 1 method)

This is an example of a program development plan called **reduction** to a previously solved problem, which means that you
recognize the problem you are working on as an instance of a solved problem and apply an existing solution.
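One way to gain confidence in a reduction is to run both versions side by side. The sketch below restates `usesonly` compactly and gives the two `usesall` definitions distinct, hypothetical names (`usesall_loop`, `usesall_reduced`) so that they can be compared on a few hand-picked cases:

```julia
usesonly(word, available) = all(letter ∈ available for letter in word)

# Direct version: every required letter must occur in word.
function usesall_loop(word, required)
    for letter in required
        if letter ∉ word
            return false
        end
    end
    true
end

# Reduction: "word uses all of required" is "required uses only letters of word".
usesall_reduced(word, required) = usesonly(required, word)

cases = [("sequoia", "aeiou"), ("banana", "aeiou"),
         ("facetiously", "aeiouy"), ("", "a"), ("abc", "")]

for (word, required) in cases
    println(repr(word), " ", repr(required), " ",
            usesall_loop(word, required), " ", usesall_reduced(word, required))
end
```

Agreement on a handful of cases is not a proof, but a disagreement would immediately expose a mistake in the reduction.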

### Exercise 9-6

Write a function called `isabecedarian` that returns true if the letters in a word appear in alphabetical order (double letters are ok). How many abecedarian words are there?

In [21]:
# using for loop

function isabecedarian(word)
    
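    # the empty string is trivially abecedarian; indexing into it would throw
    if isempty(word)
        return true
    end
    
    i = firstindex(word)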
    previous = word[i]
    j = nextind(word, i)
    
    for c in word[j:end]
        if c < previous
            return false
        end
        previous = c
    end
    
    true
    
end

isabecedarian (generic function with 1 method)

In [22]:
# using while loop

function isabecedarian(word)
    
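    # the empty string is trivially abecedarian; nextind on it would throw
    if isempty(word)
        return true
    end
    
    i = firstindex(word)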
    j = nextind(word, i)
        
    while j <= sizeof(word)
        if word[j] < word[i]
            return false
        end
        i = j
        j = nextind(word, i)
    end
    
    true
    
end

isabecedarian (generic function with 1 method)

The loop starts at `i = firstindex(word)` (which is 1) and `j = nextind(word, i)`, and ends when `j > sizeof(word)`. Each time through the loop, it compares the $i$-th character (which you can think of as the current character) to the $j$-th character (which you can think of as the next). If the next character is less than (alphabetically before) the current one, then we have discovered a break in the abecedarian trend, and we return false. If we get to the end of the loop without finding a fault, then the word passes the test. To convince yourself that the loop ends correctly, consider an example like "flossy".
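The reason for stepping with `nextind` instead of `j + 1` is that Julia strings are indexed by byte, and a character outside ASCII can occupy more than one byte. A small illustration, using a made-up string `s`:

```julia
s = "añb"                       # 'ñ' takes two bytes in UTF-8

println(length(s))              # number of characters
println(sizeof(s))              # number of bytes
println(collect(eachindex(s)))  # indices where a character starts
println(nextind(s, 2))          # index of the character after 'ñ'
println(isvalid(s, 3))          # index 3 falls inside 'ñ'
```

Here `s[3]` would raise an error, because byte 3 is the middle of `ñ`; `nextind(s, 2)` skips straight to index 4. For the all-ASCII words in `words.txt` the two approaches coincide.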


In [25]:
isabecedarian("flossy")

true

In [75]:
# using recursion

function isabecedarian(word)
    
    println("Running with $word")
    
    if length(word) <= 1
        return true
    end
    
    i = firstindex(word)
    j = nextind(word, i)
    
    if word[i] > word[j]
        return false
    end
    
    isabecedarian(word[j:end])
    
end

isabecedarian (generic function with 1 method)

In [77]:
isabecedarian("flossy")

Running with flossy
Running with lossy
Running with ossy
Running with ssy
Running with sy
Running with y

true




### Further Examples

Here is a version of `ispalindrome` that uses two indices; one starts at the beginning and goes up; the other starts at the
end and goes down.

In [27]:
function ispalindrome(word)
    
    i = firstindex(word)
    j = lastindex(word)
    
    while i<j
        if word[i] != word[j]
            return false
        end  
        i = nextind(word, i)
        j = prevind(word, j)
    end
    
    true

end

ispalindrome (generic function with 1 method)

Or we could reduce to a previously solved problem (`isreverse`) to implement `ispalindrome`:

In [29]:
function ispalindrome(word)
    isreverse(word, word)
end

ispalindrome (generic function with 1 method)

Well, speaking of reduction, let's see if we can rewrite `isreverse`:

In [32]:
function isreverse(word1,word2)
    
    word1 == reverse(word2)   # reverse works on whole characters, so non-ASCII strings are safe too
        
end

isreverse (generic function with 1 method)

In [33]:
isreverse("pots","stop")

true

### Debugging

Testing programs is hard. The functions in this chapter are relatively easy to test because you can check the results by hand. Even so, it is somewhere between difficult and impossible to choose a set of words that test for all possible errors. You should test long words, short words, and very short words, like the empty string. The empty string is an example of a special case, which is one of the non-obvious cases where errors often lurk.

In general, testing can help you find bugs, but it is not easy to generate a good set of test cases, and even if you do, you can’t be sure your program is correct. According to a legendary computer scientist: 

**"Program testing can be used to show the presence of bugs, but never to show their absence!"**

— Edsger W. Dijkstra
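As a sketch of what such a set of test cases might look like, the following uses the `Test` standard library on the two-index `ispalindrome` from this chapter (repeated here so the snippet runs on its own). The particular words are just one possible choice.

```julia
using Test

# Two-index version from this chapter, repeated so the snippet is self-contained.
function ispalindrome(word)
    i = firstindex(word)
    j = lastindex(word)
    while i < j
        if word[i] != word[j]
            return false
        end
        i = nextind(word, i)
        j = prevind(word, j)
    end
    true
end

@testset "ispalindrome" begin
    @test ispalindrome("")            # special case: the empty string
    @test ispalindrome("a")           # very short word
    @test ispalindrome("noon")        # even length
    @test ispalindrome("redivider")   # odd length
    @test !ispalindrome("abca")       # ends match, middle does not
end
```

Passing tests like these raise confidence but, in keeping with the quote above, do not prove the function correct.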

### Exercise 9-7

This exercise is based on a Puzzler that was broadcast on the radio program Car Talk
(https://www.cartalk.com/puzzler/browse):

"Give me a word with three consecutive double letters. I’ll give you a couple of words that almost qualify, but don’t. For example, the word committee, c-o-m-m-i-t-t-e-e. It would be great except for the i that sneaks in there. Or Mississippi: M-i-s-s-i-s-s-i-p-p-i. If you could take out those i’s it would work. But there is a word that has three consecutive pairs of letters and to the best of my knowledge this may be the only word. Of course there are probably 500 more but I can only think of one. What is the word? Write a program to find it."

Tips:

- First write a function `is_three_consec_double` and then scan words.txt using that function.
- When testing the function, you may want to consider cases such as "abbbccdde", "aabbbccdde", and "aaabbccd".

In [147]:
function is_three_consec_double(word)
    
    if length(word) < 6
        return false
    end
    
    i = firstindex(word)
    j = nextind(word,i)
    
    while j <= lastindex(word) - 4

# refactored the following
#
#         if (word[i] == word[j])
            
#             k = nextind(word, j)
#             l = nextind(word, k)
            
#             if word[k] == word[l]
                
#                 m = nextind(word, l)
#                 n = nextind(word, m)
            
#                 if word[m] == word[n]
#                     return true
#                 end

#             end
        
#         end
        
        if (word[i] == word[j]) && mycheck_next4(word,j)
            return true
        end
        
        i = nextind(word, i)
        j = nextind(word, j)
                
    end

    return false
    
end


function mycheck_next4(word,j)

    word[nextind(word, j,1)] == word[nextind(word, j,2)] && word[nextind(word, j,3)] == word[nextind(word,j,4)]
    
end
    


mycheck_next4 (generic function with 1 method)

In [148]:
is_three_consec_double("Mississippi"), is_three_consec_double("Mssssppi")

(false, true)

In [149]:
is_three_consec_double("abbbccdde"),is_three_consec_double("aabbbccdde"),is_three_consec_double("aaabbccde")

(true, true, true)

In [151]:
is_three_consec_double("aaabbbccde"),is_three_consec_double("aaabbbccdde")

(false, true)

In [152]:
for word in eachline("words.txt")

    if is_three_consec_double(word)
        println(word)
    end
    
end


bookkeeper
bookkeepers
bookkeeping
bookkeepings
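For comparison, the same check can be sketched with a regular expression that uses backreferences. This is an alternative technique, not the approach taken above, and `has_three_pairs` is a hypothetical name.

```julia
# (.)\1 matches any character followed by a copy of itself; three of these
# groups in a row means three consecutive double letters.
has_three_pairs(word) = occursin(r"(.)\1(.)\2(.)\3", word)

for w in ["committee", "Mississippi", "bookkeeper", "Mssssppi"]
    println(w, " => ", has_three_pairs(w))
end
```

Like the loop version, this treats a run such as "ssss" as two pairs, so "Mssssppi" qualifies.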


Let's see if we can rewrite the function `is_three_consec_double` using a `for` loop.

In [153]:
# function is_three_consec_double_vfor(word)
    
#     if length(word) < 6
#         return false
#     end
    
#     i = firstindex(word)
#     j = nextind(word,i)
#     k = nextind(word,j)
#     l = nextind(word,k)
#     m = nextind(word,l)
#     n = nextind(word,m)
        
#     prev = word[i]
#     next1 = word[k]
#     next2 = word[l]
#     next3 = word[m]
#     next4 = word[n]
    
    
#     for c in word[j:end-4]  # j is in the index, so we'd better not change it inside the loop?
        
#         if c == prev && next1 == next2 && next3 == next4
#             return true
#         else
#             prev = c
#             next1 = next2
#             next2 = next3
#             next4 = ??
#
#
#   Hmm, this is getting cumbersome to handle.  I need to carry and track index (or indices) inside the for loop



# Let's assume every character is a plain ASCII letter, so that i+1, ..., i+5 are all valid indices

function is_three_consec_double_vfor(word)
    
    if length(word) < 6
        return false
    end
    
        
    for i in 1:lastindex(word)-5
        
        if word[i] == word[i+1] && word[i+2] == word[i+3] && word[i+4] == word[i+5]
            return true
        end
        
    end
    
    return false
    
end



is_three_consec_double_vfor (generic function with 1 method)

In [154]:
is_three_consec_double_vfor("Mississippi"), is_three_consec_double_vfor("Mssssppi")

(false, true)

In [155]:
is_three_consec_double_vfor("abbbccdde"),is_three_consec_double_vfor("aabbbccdde"),is_three_consec_double_vfor("aaabbccde")

(true, true, true)

In [156]:
is_three_consec_double_vfor("aaabbbccde"),is_three_consec_double_vfor("aaabbbccdde")

(false, true)

In [157]:
for word in eachline("words.txt")

    if is_three_consec_double_vfor(word)
        println(word)
    end
    
end


bookkeeper
bookkeepers
bookkeeping
bookkeepings


### Exercise 9-8


I was driving on the highway the other day and I happened to notice my odometer. Like most odometers, it shows six digits, in whole miles only. So, if my car had 300000 miles, for example, I’d see 3-0-0-0-0-0.

Now, what I saw that day was very interesting. I noticed that the last 4 digits were palindromic; that is, they read the same forward as backward. For example, 5-4-4-5 is a palindrome, so my odometer could have read 3-1-5-4-4-5.

One mile later, the last 5 numbers were palindromic. For example, it could have read 3-6-5-4-5-6.

One mile after that, the middle 4 out of 6 numbers were palindromic. And you ready for this? One mile later, all 6 were palindromic!

The question is, what was on the odometer when I first looked?

Write a Julia program that tests all the six-digit numbers and prints any numbers that satisfy these requirements.

TIP: For this exercise and the next, you may find the function `lpad` useful, along with `string(n::Int)`, which converts an integer into a string.
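A quick sketch of how these two functions fit the odometer puzzle (the variable name `reading` is made up for this example):

```julia
reading = lpad(string(42), 6, '0')   # pad on the left with '0' up to 6 characters
println(reading)                     # 000042
println(reading[3:6])                # last four digits: 0042
println(lpad(315445, 6, '0')[3:6])   # lpad stringifies its first argument: 5445
```

Padding to a fixed width means the slices `[3:6]`, `[2:6]`, and `[2:5]` always refer to the same odometer positions, even for small numbers.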

In [6]:
?lpad

search: lpad clipboard relpath realpath splitpath LOAD_PATH



```
lpad(s, n::Integer, p::Union{AbstractChar,AbstractString}=' ') -> String
```

Stringify `s` and pad the resulting string on the left with `p` to make it `n` characters (code points) long. If `s` is already `n` characters long, an equal string is returned. Pad with spaces by default.

# Examples

```jldoctest
julia> lpad("March", 10)
"     March"
```


In [134]:
for i = 0:999999
    
    ic = lpad(string(i), 6,'0')
    
    if ispalindrome(ic[3:6])
        ic_p1 = lpad(string(i+1), 6,'0')
        if ispalindrome(ic_p1[2:6])
            ic_p2 = lpad(string(i+2), 6,'0')
            if ispalindrome(ic_p2[2:5])
                ic_p3 = lpad(string(i+3), 6,'0')
                if ispalindrome(ic_p3)
                    println(ic)
                end
            end
        end
    end
    
end

                

198888
199999


I am seeing a lot of `lpad(string(...))` repeated. Let's refactor the code above!

In [159]:
function to0padstr(number, digits=6)    # we set 6 as the default value (taken if not supplied) for digits
    
    lpad(string(number), digits,'0')
    
end


for i = 0:999999
        
    if ispalindrome(to0padstr(i)[3:6]) && ispalindrome(to0padstr(i+1)[2:6]) && 
        ispalindrome(to0padstr(i+2)[2:5]) && ispalindrome(to0padstr(i+3))
        
        println(i)
    
    end
    
end

198888
199999


### Exercise 9-9

"Recently I had a visit with my mom and we realized that the two digits that make up my age when reversed resulted in her age. For example, if she’s 73, I’m 37. We wondered how often this has happened over the years but we got sidetracked with other topics and we never came up with an answer.

When I got home I figured out that the digits of our ages have been reversible six times so far. I also figured out that if we’re lucky it would happen again in a few years, and if we’re really lucky it would happen one more time after that. In other words, it would have happened 8 times over all. So the question is, what were (and would be) my ages when these happened (and would happen)?"

Write a Julia program that searches for solutions to this Puzzler.

TIP: One way to go about it is to loop over the plausible range of his mother's age (maybe 14 to 40?) when she gave birth to him. Let's call the corresponding variable `mom_preg_age`.

In [139]:
for mom_preg_age = 15:40
    
    # start fresh for each candidate age so matches from other ages cannot accumulate
    my_age_array = []
    mom_age_array = []
    count = 0
    
    for mom_age = mom_preg_age:100
       
        my_age = mom_age - mom_preg_age
        if isreverse(lpad(mom_age, 2, '0'),lpad(my_age, 2, '0'))
            count += 1
            push!(my_age_array,my_age)
            push!(mom_age_array,mom_age)
        end
    
    end
    
    if count == 8
        println("my ages:  ", my_age_array)
        println("mom's ages:  ", mom_age_array)
    end
        
end



my ages:  Any[2, 13, 24, 35, 46, 57, 68, 79]
mom's ages:  Any[20, 31, 42, 53, 64, 75, 86, 97]
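A short piece of arithmetic explains why the ages in this output are 18 years apart. If the mother's age has digits $a$ and $b$ (so she is $10a + b$) and the child's age is the reverse ($10b + a$), then, writing the two ages as mom and me:

$$
\text{mom} - \text{me} = (10a + b) - (10b + a) = 9\,(a - b)
$$

So the age gap must be a multiple of 9. Within the range `15:40` searched above, that leaves only 18, 27, and 36 as candidates for `mom_preg_age`, which is consistent with the gap of 18 in the printed result.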
