Several hash-table-based data structures are commonly used in Python: set, dict, collections.defaultdict, and collections.Counter. The difference between set and the other three is that set simply stores keys, whereas the others store key-value pairs. None of them allows duplicate keys, unlike, for example, list.

In a dict, accessing the value associated with a key that is not present leads to a KeyError exception. A collections.defaultdict, however, inserts and returns a default value produced by the factory that was specified when the collection was instantiated, e.g., if d = collections.defaultdict(list) and k is not in d, then d[k] is []. A collections.Counter is used for counting the number of occurrences of keys, and supports a number of set-like operations, as illustrated below.
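
Before turning to Counter, the contrast between dict and defaultdict on a missing key can be made concrete with a minimal sketch. The variable names (`plain`, `grouped`, and so on) are illustrative only and do not appear elsewhere in this notebook.

```python
import collections

# A plain dict raises KeyError when a missing key is read.
plain = {}
try:
    plain["missing"]
    missing_raised = False
except KeyError:
    missing_raised = True

# A defaultdict calls its factory (here, list) on a missing key,
# stores the result under that key, and returns it.
grouped = collections.defaultdict(list)
grouped["fruits"].append("apple")  # key is created on first access
empty = grouped["vegetables"]      # an empty list; the key now exists too

print(missing_raised, dict(grouped))
```

Note that merely reading a missing key from a defaultdict adds that key, which can be surprising when the collection is later iterated.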

In [2]:
import collections

In [3]:
c = collections.Counter(a=3, b=1)
d = collections.Counter(a=1, b=2)
# Add two counters together: c[x] + d[x]
c + d

Counter({'a': 4, 'b': 3})

In [4]:
# Subtract (keeping only positive counts)
c - d

Counter({'a': 2})

In [5]:
# Intersection: min(c[x], d[x])
c & d

Counter({'a': 1, 'b': 1})

In [6]:
# Union: max(c[x], d[x])
c | d

Counter({'a': 3, 'b': 2})

In [7]:
A = [1,2,4,5,2,4,6,3,2]
Aset = set(A)

In [8]:
Aset

{1, 2, 3, 4, 5, 6}

In [9]:
Aset.add(9)

In [10]:
Aset

{1, 2, 3, 4, 5, 6, 9}

In [11]:
Aset.remove(6)

In [12]:
Aset

{1, 2, 3, 4, 5, 9}

In [13]:
Aset.discard(5)

In [14]:
Aset

{1, 2, 3, 4, 9}

In [15]:
newset = {1,2,3}

In [16]:
newset <= Aset

True

In [17]:
Aset.remove(6)

KeyError: 6

In [18]:
Aset.discard(6)

The difference between remove() and discard() is what happens when the element is absent: remove() removes the element from the set if it is contained and raises a KeyError if it is not, whereas discard() removes the element if it is contained and does nothing otherwise.

The basic operations on the three key-value collections are similar to those on set. One difference is with iterators: iteration over a key-value collection yields the keys. To iterate over the key-value pairs, iterate over items(); to iterate over values, use values(). (The keys() method returns a view of the keys.)
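
As a small illustration of the three iteration styles, here is a sketch on a made-up dict (`prices` is a hypothetical example, not data from this notebook):

```python
prices = {"apple": 3, "banana": 1}

# Iterating over the dict itself yields keys.
keys = [k for k in prices]

# items() yields (key, value) pairs.
pairs = [(k, v) for k, v in prices.items()]

# values() yields only the values.
values = list(prices.values())

print(keys, pairs, values)
```

The same three calls work on collections.defaultdict and collections.Counter, since both are dict subclasses.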

Not every type is "hashable", i.e., can be added to a set or used as a key in a dict. In particular, mutable containers are not hashable. This prevents a client from modifying an object after adding it to the container, since the lookup would then fail to find it if the slot that the modified object hashes to is different.
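
The following sketch shows the rule in practice: immutable containers such as tuple and frozenset can be added to a set, while a list is rejected with a TypeError. The names `seen` and `list_rejected` are illustrative.

```python
seen = set()
seen.add((1, 2))             # tuples are immutable, hence hashable
seen.add(frozenset({3, 4}))  # frozenset is the hashable counterpart of set

try:
    seen.add([1, 2])         # lists are mutable, hence unhashable
    list_rejected = False
except TypeError:
    list_rejected = True

print(len(seen), list_rejected)
```

A common workaround is to convert a list to a tuple (or a set to a frozenset) before using it as a key.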

## 12.1 Test for Palindromic Permutations

Write a program to test whether the letters forming a string can be permuted to form a palindrome. For example, "edified" can be permuted to form "deified".

**Hint:** All characters must occur in pairs for a string to be permutable into a palindrome, with one exception, if the string is of odd length. 

In [19]:
def can_form_palindrome(s: str) -> bool:
    # A string can be permuted to form a palindrome if and only if the number
    # of chars whose frequency is odd is at most 1.
    return sum(v % 2 for v in collections.Counter(s).values()) <= 1

In [20]:
s = 'loeloev'

In [21]:
can_form_palindrome(s)

True

In [22]:
s = 'loelooe'
can_form_palindrome(s)

True

In [23]:
s = 'loeloe'
can_form_palindrome(s)

True

In [24]:
s = 'loelov'
can_form_palindrome(s)

False

In [25]:
hash_s = collections.Counter(s)

In [26]:
hash_s

Counter({'l': 2, 'o': 2, 'e': 1, 'v': 1})

In [27]:
hash_s.values()

dict_values([2, 2, 1, 1])

In [28]:
hash_s.items()

dict_items([('l', 2), ('o', 2), ('e', 1), ('v', 1)])

In [29]:
hash_s.keys()

dict_keys(['l', 'o', 'e', 'v'])

In [32]:
small_sum = 0 
for v in hash_s.values():
    small_sum += v %2 
    print(small_sum )

0
0
1
2


The time complexity is O(n), where n is the length of the string. The space complexity is O(c), where c is the number of distinct characters appearing in the string.
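
The same test can be written without Counter by tracking only the characters seen an odd number of times so far. This is a sketch of an alternative, not the solution above; the function name `can_form_palindrome_with_set` is made up for illustration.

```python
def can_form_palindrome_with_set(s: str) -> bool:
    # Holds exactly the characters that have occurred an odd number of
    # times in the prefix processed so far.
    chars_with_odd_count = set()
    for c in s:
        if c in chars_with_odd_count:
            chars_with_odd_count.remove(c)  # odd -> even
        else:
            chars_with_odd_count.add(c)     # even -> odd
    # At most one character may be left unpaired.
    return len(chars_with_odd_count) <= 1


print(can_form_palindrome_with_set('edified'))
print(can_form_palindrome_with_set('loelov'))
```

The time and space bounds are the same as for the Counter version: O(n) time and O(c) space.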

## 12.2 Is an Anonymous Letter Constructible?

Write a program which takes text for an anonymous letter and text for a magazine and determines if it is possible to write the anonymous letter using the magazine. The anonymous letter can be written using the magazine if for each character in the anonymous letter, the number of times it appears in the anonymous letter is no more than the number of times it appears in the magazine. 

**Hint:** Count the number of distinct characters appearing in the letter. 

An efficient approach is to make a single pass over the letter, storing the character counts for the letter in a single hash table: keys are characters, and values are the number of times that character appears. Next, we make a pass over the magazine. When processing a character c, if c appears in the hash table, we reduce its count by 1; we remove it from the hash table when its count goes to zero. If the hash table becomes empty, we return true. If we reach the end of the magazine and the hash table is nonempty, we return false, since each of the characters remaining in the hash table occurs more times in the letter than in the magazine.

In [34]:
def is_letter_constructible_from_magazine(letter_text: str, 
                                          magazine_text: str) -> bool:
    # Compute the frequencies for all chars in letter_text
    char_frequency_for_letter = collections.Counter(letter_text)
    
    # Checks if characters in magazine_text can cover characters in 
    # char_frequency_for_letter.
    for c in magazine_text:
        if c in char_frequency_for_letter:
            char_frequency_for_letter[c] -= 1
            if char_frequency_for_letter[c] == 0:
                del char_frequency_for_letter[c]
                if not char_frequency_for_letter:
                    # All characters for letter_text are matched
                    return True
    
    # Empty char_frequency_for_letter means every char in letter_text 
    # can be covered by a character in magazine_text
    return not char_frequency_for_letter


In [36]:
letter_text = 'I went to school in Sunday'
magazine_text = 'Sunday is so great Marry and I went to shopping instead of taking classes in school'

In [37]:
is_letter_constructible_from_magazine(letter_text, magazine_text)

True

In [38]:
letter_text = 'I like singing'
is_letter_constructible_from_magazine(letter_text, magazine_text)

True

In [39]:
letter_text = 'Brown like dancing'
is_letter_constructible_from_magazine(letter_text, magazine_text)

False

In [40]:
collections.Counter(letter_text)

Counter({'B': 1,
         'r': 1,
         'o': 1,
         'w': 1,
         'n': 3,
         ' ': 2,
         'l': 1,
         'i': 2,
         'k': 1,
         'e': 1,
         'd': 1,
         'a': 1,
         'c': 1,
         'g': 1})

In [41]:
collections.Counter(magazine_text)

Counter({'S': 1,
         'u': 1,
         'n': 7,
         'd': 3,
         'a': 7,
         'y': 2,
         ' ': 15,
         'i': 5,
         's': 8,
         'o': 6,
         'g': 3,
         'r': 3,
         'e': 4,
         't': 5,
         'M': 1,
         'I': 1,
         'w': 1,
         'h': 2,
         'p': 2,
         'f': 1,
         'k': 1,
         'c': 2,
         'l': 2})

In [42]:
collections.Counter(letter_text) - collections.Counter(magazine_text)

Counter({'B': 1})

In [43]:
# Pythonic solution that exploits collections.Counter. Note that the
# subtraction only keeps keys with positive counts.
def is_letter_constructible_from_magazine_pythonic(letter_text, magazine_text):
    return (not collections.Counter(letter_text) - collections.Counter(magazine_text))

In [44]:
is_letter_constructible_from_magazine_pythonic(letter_text, magazine_text)

False

In the worst case, the letter is not constructible or the last character of the magazine is required. Therefore, the time complexity is O(m+n), where m and n are the number of characters in the letter and magazine, respectively. The space complexity is the size of the hash table constructed in the pass over the letter, i.e., O(L), where L is the number of distinct characters appearing in the letter.

## 12.3 Implement an ISBN Cache

The International Standard Book Number (ISBN) is a unique commercial book identifier. It is a string of length 10. The first 9 characters are digits; the last character is a check character. The check character is a weighted sum of the first 9 digits, mod 11, with 10 represented by 'X'.
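
For reference, in the standard ISBN-10 scheme the sum is weighted: the i-th digit (counting from 1) is multiplied by i before the total is reduced mod 11. A minimal sketch follows; the helper name `isbn10_check_char` is hypothetical, and the cache exercise itself does not need it because it treats ISBNs as plain integers.

```python
def isbn10_check_char(first_nine: str) -> str:
    # Hypothetical helper: expects exactly nine ASCII digits.
    if len(first_nine) != 9 or any(ch not in '0123456789' for ch in first_nine):
        raise ValueError('expected a string of exactly 9 digits')
    # Weight the digits 1, 2, ..., 9 from left to right.
    total = sum(weight * int(digit)
                for weight, digit in enumerate(first_nine, start=1))
    remainder = total % 11
    return 'X' if remainder == 10 else str(remainder)


print(isbn10_check_char('030640615'))
print(isbn10_check_char('080442957'))
```

For example, the nine digits 030640615 give a weighted total of 145, and 145 mod 11 is 2, so the full ISBN is 0306406152.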

Create a cache for looking up prices of books identified by their ISBN. For the purpose of this exercise, treat ISBNs and prices as positive integers. You must implement lookup, insert, and erase methods. Use the Least Recently Used (LRU) policy for cache eviction.

* Insert: If an ISBN is already present, insert should not update the price, but should update the ISBN to be the most recently used entry.
* Lookup: Given an ISBN, return the corresponding price; if the element is not present, return -1. If the ISBN is present, update the entry to be the most recently used ISBN. 
* Erase: Remove the specified ISBN and corresponding value from the cache. Return true if the ISBN was present; otherwise, return false. 

**Sol:** Hash tables are ideally suited for fast lookups. We can use a hash table to quickly look up prices by using ISBNs as keys, and a counter, which we use to record when an operation was performed: every time we do an insert or a lookup we increment the counter. For each ISBN we store a value which is the price and the "timestamp", which is the count corresponding to when the ISBN was most recently inserted or looked up.

This approach has O(1) lookup and delete times. Inserts are O(1) time, until the cache is full. Once the cache fills up, to add a new ISBN we have to find the LRU ISBN, which will be evicted to make place for the new entry. Finding the entry takes O(n) time, where n is the cache size, since we have to scan all entries to find the one with the smallest timestamp. Therefore the time complexity of inserts is O(n). 
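
The timestamp scheme described above can be sketched as follows. The class name `TimestampLruCache` and its private attributes are made up for illustration, and the sketch assumes a capacity of at least 1.

```python
class TimestampLruCache:
    def __init__(self, capacity: int) -> None:
        # Assumption: capacity >= 1.
        self._capacity = capacity
        self._table = {}  # isbn -> [price, timestamp]
        self._clock = 0   # incremented on every insert or lookup

    def _tick(self) -> int:
        self._clock += 1
        return self._clock

    def lookup(self, isbn: int) -> int:
        if isbn not in self._table:
            return -1
        entry = self._table[isbn]
        entry[1] = self._tick()  # mark as most recently used
        return entry[0]

    def insert(self, isbn: int, price: int) -> None:
        if isbn in self._table:
            # Keep the existing price; only refresh the timestamp.
            self._table[isbn][1] = self._tick()
            return
        if len(self._table) == self._capacity:
            # O(n) scan for the entry with the smallest timestamp.
            oldest = min(self._table, key=lambda k: self._table[k][1])
            del self._table[oldest]
        self._table[isbn] = [price, self._tick()]

    def erase(self, isbn: int) -> bool:
        return self._table.pop(isbn, None) is not None


cache = TimestampLruCache(capacity=2)
cache.insert(1, 10)
cache.insert(2, 20)
cache.lookup(1)      # ISBN 1 becomes the most recently used
cache.insert(3, 30)  # evicts ISBN 2, the least recently used
print(cache.lookup(2), cache.lookup(1), cache.lookup(3))
```

The `min` call is the linear scan that makes inserts into a full cache O(n).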

The way to improve efficiency is to avoid processing all ISBNs. Conceptually, the ISBNs are ordered by when they were most recently used, and we only need to be able to find the oldest ISBN efficiently. This suggests recording the ISBNs in a queue (in addition to the hash table).

Specifically, for each ISBN, we store a reference to its location in the queue. Each time we do a lookup on an ISBN, we move it to the front of the queue. (This requires us to use a linked list implementation of the queue, so that items in the middle of the queue can be moved to the head.) We do the same when we insert an ISBN that's already present. If an insert results in the queue size exceeding n, the ISBN at the tail of the queue is deleted from the cache, i.e., from the queue and the hash table.

In [46]:
from collections import OrderedDict

In [85]:
class LruCache:
    def __init__(self, capacity: int) -> None: 
        self._isbn_price_table = collections.OrderedDict()
        self._capacity = capacity
        
    def lookup(self, isbn: int) -> int:
        if isbn not in self._isbn_price_table:
            return -1 
        price = self._isbn_price_table.pop(isbn)
        self._isbn_price_table[isbn] = price
        return price 
    
    def insert(self, isbn: int, price: int) -> None:
        # We add the value for key only if key is not present -- we don't update 
        # existing value 
        if isbn in self._isbn_price_table:
            price = self._isbn_price_table.pop(isbn)
        elif len(self._isbn_price_table) == self._capacity:
            self._isbn_price_table.popitem(last=False)
        self._isbn_price_table[isbn] = price
        
    def erase(self, isbn: int) -> bool:
        return self._isbn_price_table.pop(isbn, None) is not None
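
As a possible variation on the class above, OrderedDict also provides move_to_end(), which marks a key as most recently used without the pop-and-reinsert step. The sketch below uses a hypothetical class name, `LruCacheMoveToEnd`, and assumes a capacity of at least 1.

```python
import collections


class LruCacheMoveToEnd:
    def __init__(self, capacity: int) -> None:
        # Assumption: capacity >= 1.
        self._table = collections.OrderedDict()
        self._capacity = capacity

    def lookup(self, isbn: int) -> int:
        if isbn not in self._table:
            return -1
        self._table.move_to_end(isbn)  # now the most recently used entry
        return self._table[isbn]

    def insert(self, isbn: int, price: int) -> None:
        if isbn in self._table:
            # Existing price is kept; only the recency changes.
            self._table.move_to_end(isbn)
            return
        if len(self._table) == self._capacity:
            self._table.popitem(last=False)  # evict least recently used
        self._table[isbn] = price

    def erase(self, isbn: int) -> bool:
        return self._table.pop(isbn, None) is not None


cache = LruCacheMoveToEnd(capacity=2)
cache.insert(1, 10)
cache.insert(2, 20)
cache.lookup(1)
cache.insert(3, 30)  # evicts ISBN 2
print(list(cache._table.items()))
```

The behavior is intended to match the pop-and-reinsert version; which form reads better is largely a matter of taste.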

### Ordered Dict in Python 

In [48]:
print("This is a Dict:\n")
d = {}
d['a'] = 1
d['b'] = 2
d['c'] = 3
d['d'] = 4

This is a Dict:



In [49]:
for key,value in d.items():
    print(key, value)

a 1
b 2
c 3
d 4


In [50]:
print("This is an Ordered Dict: \n")
od = OrderedDict()
od['a'] = 1
od['b'] = 2
od['c'] = 3
od['d'] = 4

for key,value in od.items():
    print(key, value)

This is an Ordered Dict: 

a 1
b 2
c 3
d 4


1. Key value change: If the value of a certain key is changed, the position of the key remains unchanged in OrderedDict. 

In [51]:
print("Before: \n")
for key,value in od.items():
    print(key, value)
print("\n After:\n")
od['c'] = 5
for key,value in od.items():
    print(key, value)

Before: 

a 1
b 2
c 3
d 4

 After:

a 1
b 2
c 5
d 4


2. Deletion and re-insertion: Deleting and re-inserting the same key pushes it to the back, since OrderedDict maintains the order of insertion. 

In [52]:
print("Before deleting:\n")

for key, value in od.items():
    print(key, value)
    
print("\n After deleting: \n")
od.pop('c')
for key,value in od.items():
    print(key, value)

print("\n After re-inserting: \n")
od['c'] = 3
for key, value in od.items():
    print(key, value)

Before deleting:

a 1
b 2
c 5
d 4

 After deleting: 

a 1
b 2
d 4

 After re-inserting: 

a 1
b 2
d 4
c 3


In [53]:
od.popitem()  # popitem() takes a boolean `last` flag, not a key; it removes the last item

('c', 3)

In [54]:
for key, value in od.items():
    print(key, value)

a 1
b 2
d 4


In [55]:
od.popitem(last = False)

('a', 1)

In [56]:
for key,value in od.items():
    print(key, value)

b 2
d 4


In [57]:
od.popitem(last = True)
for key, value in od.items():
    print(key, value)

b 2


In [86]:
lalala_table = LruCache(capacity = 5)

In [87]:
lalala_table._capacity

5

In [88]:
lalala_table.insert(isbn = 123421332, price = 5)

In [89]:
for isbn, price in lalala_table._isbn_price_table.items():
    print(isbn, price)

123421332 5


In [90]:
lalala_table.insert(isbn = 123421333, price = 8)
lalala_table.insert(isbn = 123421334, price = 2)
lalala_table.insert(isbn = 123421335, price = 11)
lalala_table.insert(isbn = 123421336, price = 3)
lalala_table.insert(isbn = 123421337, price = 4)
lalala_table.insert(isbn = 123421338, price = 12)
lalala_table.insert(isbn = 123421339, price = 7)

In [91]:
for isbn, price in lalala_table._isbn_price_table.items():
    print(isbn, price)

123421335 11
123421336 3
123421337 4
123421338 12
123421339 7


In [92]:
lalala_table.lookup(123421225)

-1

In [93]:
lalala_table.lookup(123421335)

11

In [94]:
for isbn, price in lalala_table._isbn_price_table.items():
    print(isbn, price)

123421336 3
123421337 4
123421338 12
123421339 7
123421335 11


In [81]:
lalala_table._isbn_price_table.pop(123421335)

11

In [82]:
for isbn, price in lalala_table._isbn_price_table.items():
    print(isbn, price)

123421336 3
123421337 4
123421338 12
123421339 7


In [83]:
lalala_table._isbn_price_table[123421335] = 11

In [84]:
for isbn, price in lalala_table._isbn_price_table.items():
    print(isbn, price)

123421336 3
123421337 4
123421338 12
123421339 7
123421335 11


In [95]:
lalala_table.erase(123421334)

False

In [96]:
lalala_table.erase(123421335)

True

In [97]:
for isbn, price in lalala_table._isbn_price_table.items():
    print(isbn, price)

123421336 3
123421337 4
123421338 12
123421339 7


The time complexity for each lookup is O(1) for the hash table lookup and O(1) for updating the queue, i.e. O(1) overall. 

## 12.4 Compute the LCA, Optimizing for Close Ancestors

Design an algorithm for computing the LCA of two nodes in a binary tree. The algorithm's time complexity should depend only on the distance from the nodes to the LCA. 

**Sol:** Intuitively, the brute-force approach is suboptimal because it potentially processes nodes well above the LCA. We can avoid this by alternating moving upwards from the two nodes and storing the nodes visited as we move up in a hash table. Each time we visit a node we check to see if it has been visited before. 

In [99]:
class BinaryTreeNode:
    def __init__(self, data=None, left=None, right= None, parent = None, nexxt = None):
        self.data = data
        self.left = left
        self.right = right
        self.parent = parent
        self.next = nexxt 

In [108]:
def lca(node0: BinaryTreeNode,
        node1: BinaryTreeNode) -> BinaryTreeNode:
    iter0, iter1 = node0, node1
    nodes_on_path_to_root = set()
    while iter0 or iter1:
        # Ascend tree in tandem for these two nodes.
        if iter0:
            if iter0 in nodes_on_path_to_root:
                return iter0
            print('the traversal point of iter0')
            print(tree_traversal_inorder(iter0))
            nodes_on_path_to_root.add(iter0)
            iter0 = iter0.parent
        if iter1:
            if iter1 in nodes_on_path_to_root:
                return iter1 
            print('the traversal point of iter1')
            print(tree_traversal_inorder(iter1))
            nodes_on_path_to_root.add(iter1)
            iter1 = iter1.parent
    raise ValueError('node0 and node1 are not in the same tree')

In [101]:
node1 = BinaryTreeNode(314)
node2 = BinaryTreeNode(6)
node3 = BinaryTreeNode(6)
node4 = BinaryTreeNode(271)
node5 = BinaryTreeNode(561)
node6 = BinaryTreeNode(2)
node7 = BinaryTreeNode(271)
node8 = BinaryTreeNode(28)
node9 = BinaryTreeNode(0)
node10 = BinaryTreeNode(3)
node11 = BinaryTreeNode(1)
node12 = BinaryTreeNode(28)
node13 = BinaryTreeNode(17)
node14 = BinaryTreeNode(401)
node15 = BinaryTreeNode(257)
node16 = BinaryTreeNode(641)

In [102]:
node1.left = node2
node1.right = node3
node2.left = node4
node2.right = node5
node3.left = node6
node3.right = node7
node4.left = node8
node4.right = node9
node5.right = node10
node10.left = node13
node6.right = node11
node11.left = node14
node11.right = node15
node14.right = node16
node7.right = node12

In [103]:
node2.parent = node1
node3.parent = node1
node4.parent = node2
node5.parent = node2
node6.parent = node3
node7.parent = node3
node8.parent = node4
node9.parent = node4
node10.parent = node5
node13.parent = node10
node11.parent = node6
node14.parent = node11
node15.parent = node11
node16.parent = node14
node12.parent = node7

In [104]:
def tree_traversal_inorder(root: BinaryTreeNode) -> None:
    if root:
        tree_traversal_inorder(root.left) 
        print('Inorder: %d' % root.data)
        tree_traversal_inorder(root.right)

In [105]:
tree_traversal_inorder(node1)

Inorder: 28
Inorder: 271
Inorder: 0
Inorder: 6
Inorder: 561
Inorder: 17
Inorder: 3
Inorder: 314
Inorder: 2
Inorder: 401
Inorder: 641
Inorder: 1
Inorder: 257
Inorder: 6
Inorder: 271
Inorder: 28


In [110]:
lca(node14, node12)

the traversal point of iter0
Inorder: 401
Inorder: 641
None
the traversal point of iter1
Inorder: 28
None
the traversal point of iter0
Inorder: 401
Inorder: 641
Inorder: 1
Inorder: 257
None
the traversal point of iter1
Inorder: 271
Inorder: 28
None
the traversal point of iter0
Inorder: 2
Inorder: 401
Inorder: 641
Inorder: 1
Inorder: 257
None
the traversal point of iter1
Inorder: 2
Inorder: 401
Inorder: 641
Inorder: 1
Inorder: 257
Inorder: 6
Inorder: 271
Inorder: 28
None


<__main__.BinaryTreeNode at 0x10d47cb00>

In [107]:
tree_traversal_inorder(lca(node14, node15))

Inorder: 401
Inorder: 641
Inorder: 1
Inorder: 257


Note that we are trading space for time. The algorithm for Solution 9.4 on Page 125 used O(1) space and O(h) time, whereas the algorithm presented above uses O(D0 + D1) space and time, where D0 is the distance from the LCA to the first node, and D1 is the distance from the LCA to the second node. In the worst case, the nodes are leaves whose LCA is the root, and we end up using O(h) space and time, where h is the height of the tree. 
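
The constant-space alternative is not shown in this section. One common way to achieve O(1) space and O(h) time with parent pointers is to compute both depths, lift the deeper node, and then ascend in tandem. The sketch below is an assumption about what such a solution looks like and may differ from the referenced one; `Node`, `depth`, and `lca_constant_space` are illustrative names, and both nodes are assumed to be in the same tree.

```python
class Node:
    # Minimal stand-in node: only the parent pointer matters here.
    def __init__(self, data, parent=None):
        self.data = data
        self.parent = parent


def depth(node: Node) -> int:
    d = 0
    while node.parent:
        node = node.parent
        d += 1
    return d


def lca_constant_space(node0: Node, node1: Node) -> Node:
    depth0, depth1 = depth(node0), depth(node1)
    # Make node0 the deeper of the two.
    if depth1 > depth0:
        node0, node1 = node1, node0
        depth0, depth1 = depth1, depth0
    # Lift the deeper node until both are at the same depth.
    for _ in range(depth0 - depth1):
        node0 = node0.parent
    # Ascend in tandem until the paths meet.
    while node0 is not node1:
        node0, node1 = node0.parent, node1.parent
    return node0


root = Node(314)
left = Node(6, parent=root)
right = Node(7, parent=root)
leaf = Node(271, parent=left)
print(lca_constant_space(leaf, right).data)
```

Computing the depths always walks to the root, which is why this version costs O(h) time even when the two nodes are close to their LCA.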

## 12.5 Find the Nearest Repeated Entries in an Array

People do not like reading text in which a word is used multiple times in a short paragraph. You are to write a program which helps identify such a problem. 

Write a program which takes as input an array and finds the distance between a closest pair of equal entries. For example, if s = <"All", "work", "and", "no", "play", "makes", "for", "no", "work", "no", "fun", "and", "no", "results">, then the second and third occurrences of "no" are the closest pair. 

**Sol:** The brute-force approach is to iterate over all pairs of entries, check if they are the same, and if so, if the distance between them is less than the smallest such distance seen so far. The time complexity is O(n^2), where n is the array length. 
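
The brute-force approach just described can be sketched as below. The function name `find_nearest_repetition_brute_force` is made up, and returning -1 when no entry repeats is an assumed convention.

```python
def find_nearest_repetition_brute_force(paragraph: list) -> int:
    best = float('inf')
    # Examine every pair (i, j) with i < j.
    for i in range(len(paragraph)):
        for j in range(i + 1, len(paragraph)):
            if paragraph[i] == paragraph[j]:
                best = min(best, j - i)
    # -1 signals that no entry is repeated (assumed convention).
    return best if best != float('inf') else -1


words = ['All', 'work', 'and', 'no', 'play', 'makes', 'for', 'no', 'work',
         'no', 'fun', 'and', 'no', 'results']
print(find_nearest_repetition_brute_force(words))
```

The nested loops make the quadratic cost explicit.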

We can improve upon the brute-force algorithm by noting that when examining an entry, we do not need to look at every other entry; we only care about entries which are the same. We can store the set of indices corresponding to a given value using a hash table and iterate over all such sets. However, there is a better approach: when processing an entry, all we care about is the closest previous equal entry. Specifically, as we scan through the array, for each value seen so far, we store in a hash table the latest index at which it appears. When processing an element, we use the hash table to find the latest index less than the current index holding the same value. 

In [115]:
import typing

In [116]:
def find_nearest_repetition(paragraph: list) -> int:
    # Maps each word to the latest index at which it has appeared.
    word_to_latest_index = {}
    nearest_repeated_distance = float('inf')
    for i, word in enumerate(paragraph):
        if word in word_to_latest_index:
            latest_equal_word = word_to_latest_index[word]
            nearest_repeated_distance = min(nearest_repeated_distance,
                                            i - latest_equal_word)
        word_to_latest_index[word] = i
    return typing.cast(int, nearest_repeated_distance
                      ) if nearest_repeated_distance != float('inf') else -1
    

In [117]:
paragraph = ['All', 'work', 'and', 'no', 'play', 'makes', 'for', 'no', 'work',
            'no', 'fun', 'and', 'no', 'results']
find_nearest_repetition(paragraph)

2

In [120]:
paragraph = ['wow', 'such' ,'a' ,'wonderful' ,'day', 'wow', 'so', 'so']

In [121]:
find_nearest_repetition(paragraph)

1

The time complexity is O(n), since we perform a constant amount of work per entry. The space complexity is O(d), where d is the number of distinct entries in the array. 