## Longest ordered word

My friend Susie asked on Facebook, "What is the longest word that is in order, where in order means that each letter comes after the previous one alphabetically?"

Some references that I used:
- https://docs.python.org/3/library/functions.html
- http://www.grantjenks.com/docs/sortedcontainers/

In [4]:
%ls
# list the directory - magic command

[0m[01;34mdata[0m/  lecture_3_scrabble.ipynb


In [6]:
%ls data 
# here's the file that I want

sowpods.txt


In [1]:
!head data/sowpods.txt

AA
AAH
AAHED
AAHING
AAHS
AAL
AALII
AALIIS
AALS
AARDVARK


In [14]:
def read_dictionary(filename="data/sowpods.txt"):
    """Create a list of words in the scrabble dictionary"""
    with open(filename, 'r') as scrabblefile:
        # strip() drops the trailing newline from each line
        scrabble_dict = [word.strip() for word in scrabblefile]
    return scrabble_dict

In [15]:
scrabble_dict = read_dictionary()

In [16]:
print(scrabble_dict[:10])

['AA', 'AAH', 'AAHED', 'AAHING', 'AAHS', 'AAL', 'AALII', 'AALIIS', 'AALS', 'AARDVARK']


In [17]:
scrabble_dict[10:0:-2]

['AARDVARKS', 'AALS', 'AALII', 'AAHS', 'AAHED']

In [18]:
tw = scrabble_dict[1111]
print(tw)

ACCESSORIZING


In [9]:
## Practice with above word

In [11]:
'd' < 'c'

False

In [21]:
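def is_in_order(word):
    """Return True if the letters of word are in alphabetical order (repeats allowed)"""
    witer = iter(word)
    lettera = next(witer, None)  # None for an empty word, so the loop below never runs
    for letterb in witer:
        if lettera > letterb:
            return False
        lettera = letterb
    return True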

In [22]:
is_in_order(tw)

False
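
The same check can also be sketched by pairing each letter with the one after it. This is only a sketch: `is_in_order_pairwise` is a hypothetical name that does not appear elsewhere in this notebook, and like `is_in_order` it accepts repeated letters.

```python
def is_in_order_pairwise(word):
    """Return True if no letter is followed by an alphabetically earlier one."""
    # zip pairs each letter with its successor; an empty or one-letter
    # word produces no pairs, so all() returns True for it.
    return all(a <= b for a, b in zip(word, word[1:]))


print(is_in_order_pairwise("BILLOWY"))        # True
print(is_in_order_pairwise("ACCESSORIZING"))  # False
```

Changing `<=` to `<` would enforce the stricter reading of the question, where each letter must come strictly after the previous one.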

In [25]:
wi = iter(tw)

In [27]:
next(wi)

'C'

In [30]:
def longest_word(scrabble_dict):
    """Return all of the longest ordered words in the scrabble dictionary"""
    curr_len = 0
    word_list = []  # stays empty if no word is in order
    for word in scrabble_dict:
        if is_in_order(word):
            wl = len(word)
            if wl > curr_len:
                # new record length: start a fresh list
                curr_len = wl
                word_list = [word]
            elif wl == curr_len:
                # ties the record (elif, so a record-setting word is not added twice)
                word_list.append(word)
    return word_list

In [31]:
longest_word(scrabble_dict)

['ADDEEMS', 'BEEFILY', 'BILLOWY', 'CHIKORS', 'DIKKOPS', 'GIMMORS']
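
A two-pass alternative is sketched below: first keep the ordered words, then keep those of maximum length. The name `longest_ordered_words` and the sample list are hypothetical, chosen only so the example runs without the dictionary file.

```python
def longest_ordered_words(words):
    """Return every ordered word that has the maximum length."""
    # A word is in order exactly when sorting its letters changes nothing.
    ordered = [w for w in words if list(w) == sorted(w)]
    if not ordered:
        return []
    best = max(len(w) for w in ordered)
    return [w for w in ordered if len(w) == best]


sample = ["AA", "BEEFILY", "ACCESSORIZING", "BILLOWY", "ZA"]
print(longest_ordered_words(sample))  # ['BEEFILY', 'BILLOWY']
```

Sorting each word costs a little more than the single pass above, but the two steps are easier to read separately.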

In [32]:
#Why do we make doc strings?
longest_word?

## Unique letters: list comprehensions, dictionaries, sets

In [13]:
set("AAHDVAHK")

{'A', 'D', 'H', 'K', 'V'}

In [35]:
%%timeit
scrabble_sets = [] # construct word sets
for word in scrabble_dict:
    scrabble_sets.append(set(word))

1 loop, best of 3: 284 ms per loop


In [36]:
%%timeit
scrabble_sets = [set(word) for word in scrabble_dict]

1 loop, best of 3: 252 ms per loop


In [37]:
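# Let's look at the array object, which stores values of a single numeric type more compactly than a list
from array import array
from math import log

# Names assigned inside a %%timeit cell are not kept, so build the sets again here
scrabble_sets = [set(word) for word in scrabble_dict]
ws_lens = array('i',(len(ws) for ws in scrabble_sets)) #Initialize the array with a gen expr
n = len(scrabble_sets)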

In [55]:
n

267751

In [56]:
len(ws_lens)

267751

In [57]:
# Count how many words have each number of unique letters
lens = [0]*20
for l in ws_lens:
    lens[l-1] += 1

In [58]:
lens

[5,
 347,
 3002,
 11852,
 27695,
 46474,
 57964,
 52950,
 37879,
 19927,
 7559,
 1843,
 233,
 19,
 2,
 0,
 0,
 0,
 0,
 0]

In [None]:
Counter(ws_lens)

In [39]:
#Is there a better way using Counter?
from collections import Counter
Counter.__init__?

In [43]:
a = Counter([1,2,1])
b = Counter([2,3,2])
sum([a,b],Counter())

Counter({1: 2, 2: 3, 3: 1})

In [59]:
word_lens = Counter(ws_lens)

In [60]:
word_lens

Counter({1: 5,
         2: 347,
         3: 3002,
         4: 11852,
         5: 27695,
         6: 46474,
         7: 57964,
         8: 52950,
         9: 37879,
         10: 19927,
         11: 7559,
         12: 1843,
         13: 233,
         14: 19,
         15: 2})

In [46]:
# Let's explore iterables! Print one line per key of word_lens
I = iter(range(len(word_lens)))
for l in word_lens:  # iterating over a Counter yields its keys, not its counts
    i = next(I)
    if l > 0:
        print("{}: {}> {}".format(i,"-"*round(log(l)),l))

0: > 1
1: -> 2
2: -> 4
3: --> 5
4: -> 3
5: --> 6
6: --> 7
7: --> 9
8: --> 8
9: --> 10
10: --> 12
11: --> 11
12: ---> 13
13: ---> 14
14: ---> 15


In [None]:
# What is a more succinct way to do this with enumerate?
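
One possible answer, as a sketch: `enumerate` yields `(index, item)` pairs, so the manual `iter`/`next` bookkeeping goes away. The small `word_lens` below is a hypothetical stand-in so the example runs on its own. Because iterating over a `Counter` yields its keys, the sketch looks up each count explicitly and draws the bar from the count.

```python
from collections import Counter
from math import log

# Hypothetical stand-in for word_lens: unique-letter count -> number of words
word_lens = Counter({1: 5, 2: 347, 3: 3002})

lines = []
for i, n_unique in enumerate(word_lens):
    n_words = word_lens[n_unique]  # the count stored under this key
    bar = "-" * round(log(n_words))  # bar length grows with log of the count
    lines.append("{}: {}> {}".format(i, bar, n_words))

print("\n".join(lines))
```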

In [18]:
# I want to look up the number of unique letters in a word in O(1) time. How can I do this?
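
One way to get there, sketched with a hypothetical three-word sample: precompute a dictionary that maps each word to its number of unique letters. Building it takes one pass over the words; each lookup afterwards is O(1) on average.

```python
# Hypothetical sample; in the notebook this would be the full word list
words = ["AA", "AARDVARK", "ACCESSORIZING"]

# Dict comprehension: word -> number of distinct letters in that word
unique_letter_count = {word: len(set(word)) for word in words}

print(unique_letter_count["ACCESSORIZING"])  # 10
```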

In [None]:
# Similarity between words?
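
One possible measure, offered here as an assumption rather than a settled choice, is the Jaccard similarity of the two letter sets: the number of shared letters divided by the number of distinct letters across both words. The function name `jaccard` is hypothetical.

```python
def jaccard(word_a, word_b):
    """Jaccard similarity of the letter sets of two words (0.0 to 1.0)."""
    set_a, set_b = set(word_a), set(word_b)
    union = set_a | set_b
    if not union:
        return 1.0  # two empty words: treat them as identical
    return len(set_a & set_b) / len(union)


# 'AAH' and 'AAHS' share A and H, out of three distinct letters overall
print(jaccard("AAH", "AAHS"))
```

This ignores letter order and repetition, so anagrams score 1.0; whether that is desirable depends on what "similar" should mean.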

In [47]:
ws_lens[1111]

10

In [48]:
scrabble_dict[1111]

'ACCESSORIZING'

## Hash table and dictionaries
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/d0/Hash_table_5_0_1_1_1_1_1_LL.svg/450px-Hash_table_5_0_1_1_1_1_1_LL.svg.png">

In [49]:
ind_dict = {word: i for i, word in enumerate(scrabble_dict)}

In [52]:
%%timeit 
ind_dict['ACCESSORIZING']

The slowest run took 63.08 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 113 ns per loop


In [53]:
%%timeit
scrabble_dict.index('ACCESSORIZING')

10000 loops, best of 3: 25.7 µs per loop


In [54]:
hash('ACCESSORIZING')

922876837765020496

In [None]:
# Example of hash
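
As a rough illustration of the picture above, here is a toy hash table with chaining: the hash of a key, taken modulo the number of buckets, picks the bucket to search. All names (`N_BUCKETS`, `put`, `get`) are hypothetical, and this is not how Python's own `dict` is built (CPython uses open addressing rather than chains). String hashes are also randomized per process by default, so the `hash` value shown above will differ from run to run.

```python
N_BUCKETS = 8
buckets = [[] for _ in range(N_BUCKETS)]  # each bucket is a list of [key, value] pairs


def bucket_for(key):
    """Pick the bucket from the key's hash; % keeps the index in range."""
    return buckets[hash(key) % N_BUCKETS]


def put(key, value):
    bucket = bucket_for(key)
    for pair in bucket:
        if pair[0] == key:
            pair[1] = value  # key already present: overwrite
            return
    bucket.append([key, value])


def get(key):
    # Only one bucket is scanned, not the whole table
    for k, v in bucket_for(key):
        if k == key:
            return v
    raise KeyError(key)


put("AA", 0)
put("ACCESSORIZING", 1111)
print(get("ACCESSORIZING"))  # 1111
```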

## B tree and bisection
<img src="http://btechsmartclass.com/DS/images/B-Tree%20Example.jpg">

In [17]:
# Sorted lists (binary bisection)
from sortedcontainers import SortedList

In [19]:
sorted_scrabble = SortedList(scrabble_dict)

In [21]:
sorted_scrabble.bisect('HUMIDIFIER')

108281
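
The same idea can be sketched with the standard library's `bisect` module on a plain list that is already sorted. The five-word list and the helper `contains` are hypothetical. As I understand it, `SortedList.bisect` behaves like `bisect_right`, so for a word that is present it returns the position just after that word.

```python
from bisect import bisect_left, bisect_right

words = ["AA", "AAH", "AAHED", "AAHING", "AAHS"]  # must already be sorted

print(bisect_left(words, "AAHED"))   # 2: index of the word itself
print(bisect_right(words, "AAHED"))  # 3: insertion point just after it


def contains(sorted_words, word):
    """Membership test in O(log n) comparisons using bisection."""
    i = bisect_left(sorted_words, word)
    return i < len(sorted_words) and sorted_words[i] == word


print(contains(words, "AAHS"))  # True
print(contains(words, "AB"))    # False
```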
