# Chapter 12: Hash Tables

## Notes

* Given a corpus of `n` words, where `m` is the maximum length of any word, the groups of anagrams can be generated in `O(nm)` time as shown below (treating each multiplication as `O(1)`). Each letter maps to a distinct prime, so by unique factorization two words get the same product iff they are anagrams. With fixed-width integers the product can overflow; the risk can be reduced by assigning the smallest primes to the most frequently used letters. Also see this [link](https://stackoverflow.com/questions/11108541/get-list-of-anagrams-from-a-dictionary).

```
procedure anagrams(corpus):
    - dict := empty HashMap from hash value to list of words
    - For word in corpus:
        - Append word to dict[anagram_hash(word)]
    - Return the lists in dict that contain at least 2 words

procedure anagram_hash(word):
    - prime_array := array of the first 26 prime numbers
    - hash := 1
    - For char in word:
        - hash := hash * prime_array[ascii(char) - ascii('a')]
    - Return hash
```
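A minimal Python sketch of the pseudocode above, assuming words consist only of lowercase ASCII letters `a`-`z`; the name `PRIMES` is introduced here for illustration. Python integers never overflow, but the product grows with word length, so the `O(1)` cost per multiplication only holds while the numbers stay small.

```python
from collections import defaultdict

# First 26 primes, one per lowercase letter 'a'..'z'
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]

def anagram_hash(word):
    """Product of the letters' primes; equal for two words iff they are anagrams."""
    h = 1
    for ch in word:
        if not ("a" <= ch <= "z"):
            raise ValueError("expected lowercase ASCII letters only: {!r}".format(word))
        h *= PRIMES[ord(ch) - ord("a")]
    return h

def anagrams(corpus):
    """Groups words by anagram_hash and keeps the groups with at least two words."""
    groups = defaultdict(list)
    for word in corpus:
        groups[anagram_hash(word)].append(word)
    return [group for group in groups.values() if len(group) >= 2]

print(anagrams(["listen", "silent", "enlist", "google", "banana"]))
# [['listen', 'silent', 'enlist']]
```

Sorting each word's letters (`"".join(sorted(word))`) is a common alternative key that needs no primes, at `O(m log m)` per word.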

* `set.remove(x)` vs `set.discard(x)`? The former raises a `KeyError` if `x` is not in the set. The latter silently does nothing in that case (it always returns `None`).

* The built-in `hash()` function can greatly simplify implementing `__hash__(self)` for a user-defined class, e.g., by returning `hash()` of a tuple of the fields that `__eq__` compares.

* `frozenset` is a hashable alternative to `set` when a collection of unique elements needs to be hashed (e.g., used as a dict key or stored in another set).
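The three notes above can be checked in one short snippet. `Contact` is a hypothetical class used only to illustrate delegating `__hash__` to the built-in `hash()`; it hashes exactly the fields `__eq__` compares, so equal objects always get equal hashes.

```python
# 1. set.discard() vs set.remove()
s = {"apple", "banana"}
discard_result = s.discard("cherry")   # missing element: no error, s unchanged
try:
    s.remove("cherry")                 # missing element: raises KeyError
    remove_raised = False
except KeyError:
    remove_raised = True

# 2. __hash__ via the built-in hash() (hypothetical Contact class)
class Contact:
    def __init__(self, name, email):
        self.name = name
        self.email = email

    def __eq__(self, other):
        if not isinstance(other, Contact):
            return NotImplemented
        return (self.name, self.email) == (other.name, other.email)

    def __hash__(self):
        # Hash the same fields that __eq__ compares
        return hash((self.name, self.email))

contacts = {Contact("Ada", "ada@example.com"), Contact("Ada", "ada@example.com")}

# 3. frozenset is hashable, a plain set is not
groups = {frozenset({"a", "b"}), frozenset({"b", "a"}), frozenset({"c"})}
try:
    {{"a", "b"}}                       # TypeError: unhashable type: 'set'
    set_hashable = True
except TypeError:
    set_hashable = False

print(discard_result, remove_raised)   # None True
print(len(contacts), len(groups))      # 1 2
print(set_hashable)                    # False
```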

The solutions below contain a lot of repeated code. This is deliberate, to facilitate practice.

```python
# Imports
from collections import deque, OrderedDict, Counter, defaultdict
from copy import copy
```

## 12.1 Test for palindromic permutations

```python
def can_form_palindrome(s):
    """
    Returns True iff some permutation of the given string is a palindrome,
    i.e., at most one character occurs an odd number of times
    """
    return sum(v % 2 for v in Counter(s).values()) <= 1

# Tests
assert not can_form_palindrome("hakuna")
assert can_form_palindrome("edified")
```
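An alternative sketch that avoids building a full `Counter`: toggle each character in and out of a set of characters seen an odd number of times so far. It is still `O(n)` time, with space bounded by the alphabet size; `can_form_palindrome_set` is a hypothetical name.

```python
def can_form_palindrome_set(s):
    """
    Returns True iff some permutation of s is a palindrome
    (at most one character has an odd count)
    """
    odd_chars = set()
    for c in s:
        # Each occurrence flips c between odd and even count
        if c in odd_chars:
            odd_chars.remove(c)
        else:
            odd_chars.add(c)
    return len(odd_chars) <= 1

print(can_form_palindrome_set("edified"), can_form_palindrome_set("hakuna"))  # True False
```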

## 12.2 Is an anonymous letter constructible?

```python
def is_letter_constructible(letter, magazine):
    """
    Returns True iff the given letter can be constructed from the given magazine
    """
    letter_count = Counter(letter)
    mag_count = Counter(magazine)
    # Counter subtraction drops non-positive counts, so the difference is empty
    # iff the magazine has at least as many copies of every character as the letter
    return not letter_count - mag_count


# Tests
assert is_letter_constructible("hey", "hasenasdfy")
assert not is_letter_constructible("heyz", "hasenasdfy")
assert not is_letter_constructible("hhh", "hasdfas")
```
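The emptiness check in `is_letter_constructible` relies on `Counter`'s `-` operator keeping only strictly positive counts. The snippet below contrasts it with `Counter.subtract()`, which keeps zero and negative counts, so a plain emptiness check would not work with it.

```python
from collections import Counter

# `-` keeps only strictly positive counts
print(Counter("hhh") - Counter("hasdfas"))     # Counter({'h': 2})
print(Counter("hey") - Counter("hasenasdfy"))  # Counter()

# subtract() keeps zero and negative counts
c = Counter("hey")
c.subtract(Counter("hasenasdfy"))
print(c["h"], c["a"])                          # 0 -2
```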

```python
def is_letter_constructible_2(letter, magazine):
    """
    Single pass over the magazine that stops as soon as every
    character of the letter has been covered
    """
    letter_count = Counter(letter)
    for c in magazine:
        if c in letter_count:
            letter_count[c] -= 1
            if letter_count[c] == 0:
                del letter_count[c]
                if not letter_count:
                    return True
    # An empty letter is trivially constructible
    return not letter_count


# Tests
assert is_letter_constructible_2("hey", "hasenasdfy")
assert not is_letter_constructible_2("heyz", "hasenasdfy")
assert not is_letter_constructible_2("hhh", "hasdfas")
assert not is_letter_constructible_2("az", "aaay")
```

## 12.3 Implement an ISBN cache

```python
class ISBNCache:
    """
    LRU cache mapping ISBNs to prices. The OrderedDict keeps keys ordered
    from least recently used (front) to most recently used (end).
    """
    def __init__(self, capacity):
        self._cache = OrderedDict()
        self._capacity = capacity

    def lookup(self, isbn):
        if isbn not in self._cache:
            raise KeyError("ISBN {} does not exist".format(isbn))
        # Mark the entry as most recently used by moving it to the end
        self._cache.move_to_end(isbn)
        return self._cache[isbn]

    def insert(self, isbn, price):
        if isbn in self._cache:
            self._cache.move_to_end(isbn)
        elif len(self._cache) >= self._capacity:
            # Evict the least recently used entry, which sits at the front
            self._cache.popitem(last=False)
        self._cache[isbn] = price

    def delete(self, isbn):
        """Returns True iff the ISBN was present and has been removed"""
        if isbn in self._cache:
            del self._cache[isbn]
            return True
        return False
```
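The LRU policy comes from two `OrderedDict` operations: `move_to_end(key)` marks a key as most recently used, and `popitem(last=False)` evicts the entry at the front, i.e., the least recently used one. A minimal sketch in isolation, with a hypothetical `touch` helper and a capacity of 2:

```python
from collections import OrderedDict

cache = OrderedDict()
capacity = 2

def touch(key, value):
    # Hypothetical helper: insert or refresh `key` as most recently used
    if key in cache:
        cache.move_to_end(key)      # now the most recently used
    elif len(cache) >= capacity:
        cache.popitem(last=False)   # evict the least recently used entry
    cache[key] = value

touch("A", 1)
touch("B", 2)
touch("A", 1)   # "B" is now the least recently used
touch("C", 3)   # evicts "B"
print(list(cache))  # ['A', 'C']
```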

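```python
# Generic LRU cache without using OrderedDict: a dict maps keys to nodes of a
# circular doubly linked list. The LRUCache object itself is the sentinel node:
# self.next is the least recently used node and self.prev the most recently used.
# Other implementations: https://discuss.leetcode.com/topic/14591/python-dict-double-linkedlist/9

class Node:
    def __init__(self, k, v):
        self.key = k
        self.val = v
        self.prev = None
        self.next = None

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.dic = dict()
        # Empty circular list: the sentinel points to itself
        self.prev = self.next = self

    def get(self, key):
        """Returns the value for `key` (marking it most recently used), or -1 if absent"""
        if key in self.dic:
            n = self.dic[key]
            self._remove(n)
            self._add(n)
            return n.val
        return -1

    def put(self, key, value):
        if key in self.dic:
            self._remove(self.dic[key])
        n = Node(key, value)
        self._add(n)
        self.dic[key] = n
        if len(self.dic) > self.capacity:
            # Evict the least recently used node, right after the sentinel
            n = self.next
            self._remove(n)
            del self.dic[n.key]

    def _remove(self, node):
        """Unlinks `node` from the list"""
        p = node.prev
        n = node.next
        p.next = n
        n.prev = p

    def _add(self, node):
        """Links `node` just before the sentinel, i.e., as the most recently used"""
        p = self.prev
        p.next = node
        self.prev = node
        node.prev = p
        node.next = self
```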
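When the goal is simply memoizing an expensive function, the standard library already ships an LRU cache: `functools.lru_cache(maxsize=...)`. A minimal sketch; `price_of` and its pricing rule are hypothetical stand-ins for a slow lookup, and `calls` records which arguments were actually computed.

```python
from functools import lru_cache

calls = []

@lru_cache(maxsize=2)
def price_of(isbn):
    calls.append(isbn)       # only runs on a cache miss
    return len(isbn) * 10    # hypothetical pricing rule

price_of("111")  # miss
price_of("222")  # miss
price_of("111")  # hit: "111" becomes most recently used
price_of("333")  # miss: evicts "222", the least recently used
price_of("222")  # miss again, since it was evicted

print(calls)                   # ['111', '222', '333', '222']
print(price_of.cache_info())   # CacheInfo(hits=1, misses=4, maxsize=2, currsize=2)
```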

## 12.5 Find the nearest repeated entries in an array

```python
def nearest_entries(A):
    """
    Given an array of words
    Returns the word whose two closest occurrences are nearest to each other,
    or None if no word repeats
    """
    d = {}  # word -> index of its latest occurrence
    nearest = None
    min_dist = float("inf")

    for i, word in enumerate(A):
        if word in d:
            if i - d[word] < min_dist:
                min_dist = i - d[word]
                nearest = word
        d[word] = i
    return nearest


# Tests
A = "All work and no play makes for no work no fun and no results".split()
assert nearest_entries(A) == "no"
assert nearest_entries("Its a beautiful day".split()) is None
```
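A variant sketch that returns the distance between the closest pair of equal entries instead of the word, using the same "latest index" hash map; `nearest_repeat_distance` is a hypothetical name.

```python
def nearest_repeat_distance(A):
    """
    Returns the smallest j - i with A[i] == A[j] and i < j, or None if nothing repeats
    """
    last_seen = {}  # item -> index of its latest occurrence
    min_dist = None
    for i, item in enumerate(A):
        if item in last_seen:
            d = i - last_seen[item]
            if min_dist is None or d < min_dist:
                min_dist = d
        last_seen[item] = i
    return min_dist

words = "All work and no play makes for no work no fun and no results".split()
print(nearest_repeat_distance(words))  # 2 ("no" at indices 7 and 9)
```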

## 12.6 Find the smallest subarray covering all the values

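```python
def subarray_cover_1(A, s):
    """
    Given an array of strings and a set of strings
    Returns the indices of the smallest subarray of A that covers all the strings in s
    """
    # Counts of each keyword of s inside the current window A[left..right]. Counts
    # are needed because a keyword can occur several times in the window.
    window_count = Counter()
    result = (-1, -1)
    left = 0

    for right, rword in enumerate(A):
        if rword in s:
            window_count[rword] += 1

        # Shrink from the left while the window still covers every keyword
        while len(window_count) == len(s):
            if result == (-1, -1) or right - left < result[1] - result[0]:
                result = (left, right)
            lword = A[left]
            if lword in s:
                window_count[lword] -= 1
                if window_count[lword] == 0:
                    del window_count[lword]
            left += 1
    return result


# Tests
assert subarray_cover_1("apple banana apple apple dog cat apple dog banana apple cat dog".split(),
                        {"banana", "cat"}) == (8, 10)
assert subarray_cover_1(["banana", "banana", "cat"], {"banana", "cat"}) == (1, 2)
```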

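The time complexity of the above solution is `O(n)`, since `left` and `right` each advance at most `n` times. However, it reads `A[left]` again while shrinking the window, so the subarray `A[left..right]` has to be kept in memory. How can the algorithm be modified to handle the case where `A` is a stream that can be read only once? Only the index of the latest occurrence of each word in `s` matters: keeping those indices in a doubly linked list ordered by position, plus a hash map from each word to its node, makes every update `O(1)`, and the candidate window runs from the head to the tail of the list.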

```python
class DoublyLinkedNode:
    """
    Represents a linked list node with prev and next pointers
    """
    def __init__(self, data=None):
        self.data = data
        self.prev = self.next = None

class LinkedList:
    """
    Represents a linked list of doubly linked nodes
    """
    def __init__(self):
        self.head = self.tail = None
        self._size = 0

    def __len__(self):
        return self._size

    def remove(self, node):
        """
        Removes the given `node` from this list
        """
        if node.next:
            node.next.prev = node.prev
        else:
            self.tail = node.prev

        if node.prev:
            node.prev.next = node.next
        else:
            self.head = node.next

        node.next = node.prev = None
        self._size -= 1

    def append(self, data):
        """
        Appends a node encapsulating the given data to the end of the list
        """
        node = DoublyLinkedNode(data=data)
        if self.tail:
            self.tail.next = node
            node.prev = self.tail
        else:
            self.head = node
        self.tail = node
        self._size += 1


def subarray_cover_2(A, s):
    """
    Given an iterable (possibly a one-pass stream) of strings and a set of strings
    Returns the indices of the smallest subarray of A that covers all the strings in s
    """
    # Maps each word in s to the list node holding the index of its latest occurrence
    node_map = {q: None for q in s}
    result = (-1, -1)
    # Latest-occurrence indices, ordered from oldest (head) to most recent (tail)
    dllist = LinkedList()
    for i, word in enumerate(A):
        if word in node_map:
            node_ref = node_map[word]
            if node_ref is not None:
                dllist.remove(node_ref)
            dllist.append(i)
            node_map[word] = dllist.tail

            # Every word of s has been seen: the window spans head..tail
            if len(dllist) == len(s):
                if (result == (-1, -1)) or (dllist.tail.data - dllist.head.data < result[1] - result[0]):
                    result = (dllist.head.data, dllist.tail.data)
    return result

# Tests
assert subarray_cover_2(iter("apple banana apple apple dog cat apple dog banana apple cat dog".split()),
                        {"banana", "cat"}) == (8, 10)
```
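The hash map plus doubly linked list in `subarray_cover_2` can also be sketched with an `OrderedDict`, which provides the same `O(1)` "remove and re-append" behavior. This is an alternative, not the approach above; `subarray_cover_ordered` is a hypothetical name.

```python
from collections import OrderedDict

def subarray_cover_ordered(stream, s):
    """
    Keys are words of s, values are the index of each word's latest occurrence,
    ordered from oldest to most recent occurrence
    """
    latest = OrderedDict()
    result = (-1, -1)
    for i, word in enumerate(stream):
        if word in s:
            latest.pop(word, None)  # drop the stale occurrence, if any
            latest[word] = i        # re-insert at the end as the most recent
            if len(latest) == len(s):
                start = next(iter(latest.values()))  # oldest of the latest occurrences
                if result == (-1, -1) or i - start < result[1] - result[0]:
                    result = (start, i)
    return result

words = "apple banana apple apple dog cat apple dog banana apple cat dog".split()
print(subarray_cover_ordered(iter(words), {"banana", "cat"}))  # (8, 10)
```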

### Variant 1: Given an array `A`, find the shortest subarray `A[i, j]` such that each distinct value present in `A` is also present in the subarray

```python
class DoublyLinkedNode:
    def __init__(self, data):
        self.data = data
        self.prev = self.next = None

class LinkedList:
    def __init__(self):
        self.tail = self.head = None
        self._size = 0

    def __len__(self):
        return self._size

    def append(self, data):
        node = DoublyLinkedNode(data)
        if self.tail:
            self.tail.next = node
            node.prev = self.tail
        else:
            self.head = node
        self.tail = node
        self._size += 1

    def remove(self, node):
        if node.next:
            node.next.prev = node.prev
        else:
            self.tail = node.prev

        if node.prev:
            node.prev.next = node.next
        else:
            self.head = node.next
        node.next = node.prev = None
        self._size -= 1


def subarray_cover_var1(A, s):
    """
    Given an iterable of strings and the set `s` of its distinct values
    Returns the indices of the shortest subarray of A that contains every value in s
    """
    node_map = {q: None for q in s}
    result = (-1, -1)
    dllist = LinkedList()
    for i, word in enumerate(A):
        if word in node_map:
            node_ref = node_map[word]
            if node_ref is not None:
                dllist.remove(node_ref)
            dllist.append(i)
            node_map[word] = dllist.tail

            if len(dllist) == len(s):
                if (result == (-1, -1)) or (dllist.tail.data - dllist.head.data) < (result[1] - result[0]):
                    result = (dllist.head.data, dllist.tail.data)
    return result

# Tests
A = "apple banana apple apple dog cat apple dog banana apple cat dog".split()
s = set(A)
assert subarray_cover_var1(iter(A), s) == (5, 8)
```
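To gain confidence in `subarray_cover_2` and `subarray_cover_var1`, a brute-force `O(n^2)` reference can be compared against them on small inputs. A sketch, with `shortest_cover_bruteforce` as a hypothetical name: for every start index it extends the end until all of `s` is covered. On ties it keeps the earliest window, which matches the strict `<` comparison used above.

```python
def shortest_cover_bruteforce(A, s):
    """
    Returns (i, j) for the shortest A[i..j] containing every value in s, or (-1, -1)
    """
    result = (-1, -1)
    for i in range(len(A)):
        seen = set()
        for j in range(i, len(A)):
            if A[j] in s:
                seen.add(A[j])
            if len(seen) == len(s):
                if result == (-1, -1) or j - i < result[1] - result[0]:
                    result = (i, j)
                break
    return result

words = "apple banana apple apple dog cat apple dog banana apple cat dog".split()
print(shortest_cover_bruteforce(words, set(words)))        # (5, 8)
print(shortest_cover_bruteforce(words, {"banana", "cat"}))  # (8, 10)
```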

### Variant 2: Given an array A, rearrange the elements so that the shortest subarray containing all the distinct values in A has maximum possible length

### Possible algorithm

- Partition the array such that the duplicates are grouped together
- Find the smallest subarray of distinct elements
- Push the distinct elements to the ends

Also see [this discussion](http://talk.elementsofprogramminginterviews.com/t/variant-13-9-2-given-an-array-a-rearrange-the-elements-so-that-the-shortest-subarray-containing-all-the-distinct-values-in-a-has-maximum-possible-length/311/4).
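Since the possible algorithm above is only a sketch, a brute-force checker can help test it on tiny inputs. It tries every ordering of `A` and keeps the one whose shortest covering subarray is longest. This is exponential and meant only for experimentation; both function names are hypothetical.

```python
from itertools import permutations

def shortest_cover_length(A):
    """Length of the shortest subarray of A containing every distinct value of A."""
    distinct = set(A)
    best = len(A)
    for i in range(len(A)):
        seen = set()
        for j in range(i, len(A)):
            seen.add(A[j])
            if len(seen) == len(distinct):
                best = min(best, j - i + 1)
                break
    return best

def best_rearrangement_bruteforce(A):
    """Tries all orderings of A (tiny inputs only)."""
    return max(permutations(A), key=shortest_cover_length)

print(shortest_cover_length([1, 2, 3, 1]))  # 3
print(shortest_cover_length([2, 1, 1, 3]))  # 4: the values occurring once sit at the ends
```

Comparing `shortest_cover_length` of a candidate arrangement against that of `best_rearrangement_bruteforce(A)` on random small arrays is a quick way to check a proposed rearrangement rule.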
