## Hash Tables

### Phone Book (Open Addressing Method with Double Hashing)

**Task:** In this task your goal is to implement a simple phone book manager. It should be able to process the
following types of user queries:

- **add** number name. It means that the user adds a person with name **name** and phone number **number** to the phone book. If there exists a user with such number already, then your manager has to overwrite the corresponding name.
- **del** number. It means that the manager should erase a person with number **number** from the phone book. If there is no such person, then it should just ignore the query.
- **find** number. It means that the user looks for a person with phone number **number**. The manager should reply with the appropriate name, or with the string "not found" (without quotes) if there is no such person in the book.

**Input Format:** There is a single integer $N$ in the first line, the number of queries. It's followed by $N$
lines, each of which contains one query in the format described above.

**Constraints:** $1 \leq N \leq 10^5$. All phone numbers consist of decimal digits, they don't have leading zeros, and
each of them has no more than $7$ digits. All names are non-empty strings of Latin letters, and each of
them has length at most $15$. It's guaranteed that there is no person with name "not found".

**Output Format:** Print the result of each **find** query: the name corresponding to the phone number, or
"not found" (without quotes) if there is no person in the phone book with such phone number. Output
one result per line in the same order as the **find** queries are given in the input.

In [18]:
class HashTable():
    size = 31                # initial table size (prime)
    prime = 29               # prime below the initial size, used by the second hash
    max_load_factor = 0.75

    def __init__(self):
        self.num_of_keys = 0     # live entries
        self.num_of_used = 0     # live entries plus tombstones (slots set to 0)
        self.table = [None] * self.size

    def hashing(self, key):
        return key % self.size

    def double_hashing(self, key):
        # Probe step in 1..prime; it is coprime to the table size because the size stays prime.
        return self.prime - key % self.prime

    def insert(self, key, value):
        hashed_key = self.hashing(key)
        step = self.double_hashing(key)
        free_slot = None  # first tombstone seen on the probe path
        while self.table[hashed_key] != None:
            if self.table[hashed_key] == 0:
                if free_slot == None:
                    free_slot = hashed_key
            elif self.table[hashed_key][0] == key:
                # The number is already stored: overwrite the name only.
                self.table[hashed_key] = (key, value)
                return
            hashed_key = (hashed_key + step) % self.size
        if free_slot == None:
            free_slot = hashed_key
            self.num_of_used += 1  # a previously empty slot becomes occupied
        self.table[free_slot] = (key, value)
        self.num_of_keys += 1
        self.rehash()

    def find(self, key):
        hashed_key = self.hashing(key)
        step = self.double_hashing(key)
        while self.table[hashed_key] != None:
            if self.table[hashed_key] != 0 and self.table[hashed_key][0] == key:
                print(self.table[hashed_key][1])
                return
            hashed_key = (hashed_key + step) % self.size
        print('not found')

    def delete(self, key):
        hashed_key = self.hashing(key)
        step = self.double_hashing(key)
        while self.table[hashed_key] != None:
            # Skip tombstones; indexing into one would raise a TypeError.
            if self.table[hashed_key] != 0 and self.table[hashed_key][0] == key:
                self.num_of_keys -= 1
                self.table[hashed_key] = 0  # tombstone, so later probes keep going
                return
            hashed_key = (hashed_key + step) % self.size

    def next_prime(self, n):
        # Smallest prime >= n; trial division is enough for these table sizes.
        while any(n % d == 0 for d in range(2, int(n ** 0.5) + 1)):
            n += 1
        return n

    def rehash(self):
        # Tombstones count towards the load so that every probe sequence reaches an empty slot.
        if self.num_of_used / self.size > self.max_load_factor:
            temp = self.table
            self.size = self.next_prime(2 * self.size)
            self.table = [None] * self.size
            self.num_of_keys = 0
            self.num_of_used = 0
            for i in temp:
                if i != None and i != 0:
                    self.insert(i[0], i[1])

    def print_table(self):
        print(self.table)
                
if __name__ == '__main__':
    phone_book = HashTable()
    n = int(input())
    for i in range(n):
        query = input().split()
        if query[0] == 'add':
            phone_book.insert(int(query[1]), query[2])
        elif query[0] == 'del':
            phone_book.delete(int(query[1]))
        elif query[0] == 'find':
            phone_book.find(int(query[1]))
        else:
            assert(0)

10
add 3610838 Ata
add 3163318 Emi
find 5463423
not found
add 4995588 Men
add 6727102 Ayturan
find 3610838
Ata
del 3610838
find 4995588
Men
find 3610838
not found
find 6727102
Ayturan


In [4]:
if __name__ == '__main__':
    phone_book = {}
    n = int(input())
    for i in range(n):
        query = input().split()
        if query[0] == 'add':
            phone_book[query[1]] = query[2]
        elif query[0] == 'del':
            phone_book.pop(query[1], None)
        elif query[0] == 'find':
            if query[1] not in phone_book.keys():
                print('not found')
            else:
                print(phone_book[query[1]])
        else:
            assert(0)
    print(phone_book)

10
add 4995588 Arturk
add 7563690 Aqsin
add 8306699 Adil
find 4531022
not found
add 7777777 Noone
add 3131033 Orxan
find 7563690
Aqsin
del 7777777
add 4995588 Erturk
find 4995588
Erturk
{'4995588': 'Erturk', '7563690': 'Aqsin', '8306699': 'Adil', '3131033': 'Orxan'}


### Hashing with Chains (Separate Chaining Method with Linked List)

**Task:** In this task your goal is to implement a hash table with list chaining. You are already given the
number of buckets $m$ and the hash function. It is a polynomial hash function:
$$h(S) = \Bigg(\sum_{i=0}^{|S|-1} S[i] \, x^i \bmod p \Bigg) \bmod m$$

where $S[i]$ is the ASCII code of the $i$-th symbol of $S$, $p = 1\,000\,000\,007$ and $x = 263$. Your program
should support the following kinds of queries:
- **add** string: insert string into the table. If there is already such a string in the hash table, then just ignore the query.
- **del** string: remove string from the table. If there is no such string in the hash table, then just ignore the query.
- **find** string: output "yes" or "no" (without quotes) depending on whether the table contains string or not.
- **check** $i$: output the content of the $i$-th list in the table. Use spaces to separate the elements of the list. If the $i$-th list is empty, output a blank line.

When inserting a new string into a hash chain, you must insert it in the beginning of the chain.

**Input Format:** There is a single integer $m$ in the first line, the number of buckets you should have. The
next line contains the number of queries $N$. It's followed by $N$ lines, each of which contains one query
in the format described above.

**Constraints:** $1 \leq N \leq 10^5$; $\frac{N}{5} \leq m \leq N$. All the strings consist of Latin letters. Each of them is non-empty and has length at most $15$.

**Output Format:** Print the result of each of the **find** and **check** queries, one result per line, in the same
order as these queries are given in the input.
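
As a quick check of the hash formula in this task, the polynomial can be evaluated with Horner's rule, which avoids computing the powers of $x$ explicitly. This is only a sketch: the function name `poly_hash` is hypothetical and not part of the task statement.

```python
def poly_hash(s, m, x=263, p=1000000007):
    """Evaluate h(S) = (sum of S[i] * x^i mod p) mod m using Horner's rule."""
    h = 0
    # Process characters from last to first so that S[i] ends up multiplied by x^i.
    for ch in reversed(s):
        h = (h * x + ord(ch)) % p
    return h % m


if __name__ == '__main__':
    # With m = 5 buckets, both strings land in bucket 4.
    print(poly_hash('world', 5), poly_hash('HellO', 5))
```

Two strings that share a bucket, as here, end up in the same chain, which is why the order of insertion into the chain matters for the **check** query.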

In [16]:
class Node:
    def __init__(self, key):
        self.key = key
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None
        self.tail = None
        
    def push_front(self, key):
        node = Node(key)
        node.next = self.head
        self.head = node
        if self.tail == None:
            self.tail = self.head
    
    def pop_front(self):
        if self.head == None:
            return
        else:
            self.head = self.head.next
        if self.head == None:
            self.tail = self.head
    
    def find_key(self, key):
        temp = self.head
        while(temp != None and temp.key != key):
            temp = temp.next
        if self.head == None or temp == None:
            print('no')
        else:
            print('yes')          

    def delete_key(self, key):
        temp = self.head
        if temp == None:
            return
        elif temp.key == key:
            self.pop_front()
        else:
            while(temp.next != None and temp.next.key != key):
                temp = temp.next
            if temp.next == None and temp.key != key:
                return
            else:
                if temp.next.next != None:
                    temp.next = temp.next.next
                else:
                    self.tail = temp
                    temp.next = None
    
    def print_linked_list(self):
        temp = self.head
        if temp == None:
            print()
        while(temp != None):
            if temp != self.tail:
                print(temp.key, end = ' ')
            else:
                print(temp.key)
            temp = temp.next
        
class HashChainTable:
    multiplier = 263
    prime = 1000000007

    def __init__(self, bucket_count):
        self.bucket_count = bucket_count
        self.hash_table = [None] * self.bucket_count
        for i in range(self.bucket_count) :
            self.hash_table[i] = LinkedList()
        
    def hash_func(self, string):
        hash_value = 0
        for i in range(len(string)):
            hash_value = (hash_value + ord(string[i]) * (self.multiplier ** i)) % self.prime
        return hash_value % self.bucket_count
    
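    def insert(self, string):
        index = self.hash_func(string)
        # Ignore the query if the string is already in its chain.
        node = self.hash_table[index].head
        while node != None:
            if node.key == string:
                return
            node = node.next
        self.hash_table[index].push_front(string)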

    def find(self, string):
        index = self.hash_func(string)
        self.hash_table[index].find_key(string)
        
    def delete(self, string):
        index = self.hash_func(string)
        self.hash_table[index].delete_key(string)
                
    def print_chain(self, index):
        self.hash_table[index].print_linked_list()

if __name__ == '__main__':
    bucket_count = int(input())
    chain = HashChainTable(bucket_count)
    query_count = int(input())
    for i in range(query_count):
        query = input().split()
        if query[0] == 'add':
            chain.insert(query[1])
        elif query[0] == 'del':
            chain.delete(query[1])
        elif query[0] == 'find':
            chain.find(query[1])
        elif query[0] == 'check':
            chain.print_chain(int(query[1]))
        else:
            assert(0)

5
12
add world
add HellO
check 4
HellO world
find World
no
find world
yes
del world
check 4
HellO
del HellO
add luck
add GooD
check 1

check 2
GooD luck


In [17]:
from collections import deque

def hash_func(s, m):
    h = 0
    for i in range(len(s)):
        h = (h + ord(s[i]) * (263 ** i)) % 1000000007
    return h % m

if __name__ == '__main__':
    hash_chains = {}
    elements = set()  # a set gives O(1) average membership tests; a list would be O(n)
    m = int(input())
    for i in range(m):
        hash_chains[i] = deque()
    n = int(input())
    for _ in range(n):
        query = input().split()
        if query[0] == 'add':
            if query[1] not in elements:
                elements.add(query[1])
                index = hash_func(query[1], m)
                hash_chains[index].appendleft(query[1])
        elif query[0] == 'del':
            if query[1] in elements:
                elements.remove(query[1])
                index = hash_func(query[1], m)
                hash_chains[index].remove(query[1])
        elif query[0] == 'find':
            index = hash_func(query[1], m)
            if query[1] in hash_chains[index]:
                print('yes')
            else:
                print('no')
        elif query[0] == 'check':
            # Joining an empty chain yields an empty string, so this also prints the blank line.
            print(' '.join(hash_chains[int(query[1])]))
        else:
            assert(0)
    print(hash_chains)

3
12
check 0

find help
no
add help
add del
add add
find add
yes
find del
yes
del del
find del
no
check 0

check 2

check 1
add help
{0: deque([]), 1: deque(['add', 'help']), 2: deque([])}


### Find Pattern in Text (Rabin–Karp Algorithm)

**Task:** In this problem your goal is to implement the Rabin–Karp algorithm for searching for the given pattern
in the given text.

**Input Format:** There are two strings in the input: the pattern $P$ and the text $T$.

**Constraints:** $1 \leq |P| \leq |T| \leq 5 \cdot 10^5$.
The total length of all occurrences of $P$ in $T$ doesn't exceed $10^8$.
The pattern and the text contain only Latin letters.

**Output Format:** Print all the positions of the occurrences of $P$ in $T$ in ascending order. Use 0-based
indexing of positions in the text $T$.
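
The notebook does not include a solution for this task, so the following is only a minimal sketch of one way to do it. It reuses the polynomial hash from the chaining task with $x = 263$ and $p = 1\,000\,000\,007$ (an assumption, since this task does not fix the hash parameters), and the function name `rabin_karp` is hypothetical. Window hashes are filled from right to left using $H_i = x H_{i+1} + T[i] - x^{|P|} T[i + |P|] \pmod p$.

```python
def rabin_karp(pattern, text, x=263, p=1000000007):
    """Return all 0-based positions where pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m > n:
        return []

    def poly_hash(s):
        # Horner's rule: s[i] is multiplied by x^i.
        h = 0
        for ch in reversed(s):
            h = (h * x + ord(ch)) % p
        return h

    pattern_hash = poly_hash(pattern)
    # x^m mod p, needed to drop the rightmost character when the window slides left.
    x_pow = pow(x, m, p)

    # hashes[i] is the hash of text[i:i+m]; fill from right to left.
    hashes = [0] * (n - m + 1)
    hashes[n - m] = poly_hash(text[n - m:])
    for i in range(n - m - 1, -1, -1):
        hashes[i] = (x * hashes[i + 1] + ord(text[i]) - x_pow * ord(text[i + m])) % p

    # Compare the actual strings only when the hashes agree, to rule out collisions.
    return [i for i in range(n - m + 1)
            if hashes[i] == pattern_hash and text[i:i + m] == pattern]


if __name__ == '__main__':
    print(*rabin_karp('aba', 'abacaba'))
```

Because every hash match is confirmed by a direct comparison, the result is exact; the hash only serves to skip most non-matching windows.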

### Substring Equality

**Task:** In this problem, you will use hashing to design an algorithm that is able to preprocess a given string $s$
to answer any query of the form "are these two substrings of $s$ equal?" efficiently. This, in turn, is a basic
building block in many string processing algorithms.

**Input Format:** The first line contains a string $s$ consisting of small Latin letters. The second line contains
the number of queries $q$. Each of the next $q$ lines specifies a query by three integers $a$, $b$, and $l$.

**Constraints:** $1 \leq |s| \leq 500000$. $1 \leq q \leq 100000$. $0 \leq a, b \leq |s| - l$ (hence the indices $a$ and $b$ are 0-based).

**Output Format:** For each query, output **"Yes"** if $s_a s_{a+1} \dotsc s_{a+l-1} = s_b s_{b+1} \dotsc s_{b+l-1}$, and **"No"** otherwise.
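
No solution is given for this task in the notebook. One possible approach, sketched below under stated assumptions, precomputes prefix hashes so that the hash of any substring follows in constant time. The class name `SubstringHasher`, the multiplier $x = 263$, and the two moduli are illustrative choices rather than values from the task statement. Equal hashes mean the substrings are equal only with high probability, so two moduli are used to make a false "Yes" unlikely.

```python
class SubstringHasher:
    """Prefix hashes of s under two moduli for O(1) substring comparison."""

    def __init__(self, s, x=263, moduli=(1000000007, 1000000009)):
        self.moduli = moduli
        n = len(s)
        # prefix[k][i] is the hash of s[:i] modulo moduli[k];
        # powers[k][i] is x^i modulo moduli[k].
        self.prefix = []
        self.powers = []
        for p in moduli:
            pref = [0] * (n + 1)
            pw = [1] * (n + 1)
            for i, ch in enumerate(s):
                pref[i + 1] = (pref[i] * x + ord(ch)) % p
                pw[i + 1] = (pw[i] * x) % p
            self.prefix.append(pref)
            self.powers.append(pw)

    def substring_hash(self, k, a, l):
        """Hash of s[a:a+l] modulo moduli[k]."""
        p = self.moduli[k]
        return (self.prefix[k][a + l] - self.prefix[k][a] * self.powers[k][l]) % p

    def equal(self, a, b, l):
        """True if s[a:a+l] and s[b:b+l] have equal hashes under every modulus."""
        return all(self.substring_hash(k, a, l) == self.substring_hash(k, b, l)
                   for k in range(len(self.moduli)))


if __name__ == '__main__':
    hasher = SubstringHasher('trololo')
    for a, b, l in [(0, 0, 7), (2, 4, 3), (3, 5, 1), (1, 3, 2)]:
        print('Yes' if hasher.equal(a, b, l) else 'No')
```

Preprocessing takes $O(|s|)$ time and each query is answered in $O(1)$.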

### Longest Common Substring

**Task:** In the longest common substring problem one is given two strings $s$ and $t$ and the goal is to find a string $w$
of maximal length that is a substring of both $s$ and $t$. This is a natural measure of similarity between two
strings. The problem has applications in text comparison and compression as well as in bioinformatics.

The problem can be seen as a special case of the edit distance problem (where only insertions and
deletions are allowed). Hence, it can be solved in time $O(|s| \cdot |t|)$ using dynamic programming. Later in
this specialization, we will learn highly non-trivial data structures for solving this problem in linear time
$O(|s| + |t|)$. In this problem, your goal is to use hashing to solve it in almost linear time.

**Input Format:** Every line of the input contains two strings $s$ and $t$ consisting of lower case Latin letters.

**Constraints:** The total length of all $s$'s as well as the total length of all $t$'s does not exceed $100000$.

**Output Format:** For each pair of strings $s$ and $t$, find its longest common substring and specify it by
outputting three integers: its starting position in $s$, its starting position in $t$ (both 0-based), and its
length. More formally, output integers $0 \leq i < |s|$, $0 \leq j < |t|$, and $l \geq 0$ such that $s_i s_{i+1} \dotsc s_{i+l-1} = t_j t_{j+1} \dotsc t_{j+l-1}$ and $l$ is maximal. (As usual, if there are many such triples with maximal $l$, output any
of them.)
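
The notebook leaves this task unsolved. A common way to get an almost linear running time, sketched here as an assumption rather than as the intended solution, is to binary search on the length $l$: if a common substring of length $l$ exists, one of every shorter length exists too. For a fixed length, the hashes of all windows of $s$ go into a dictionary and the windows of $t$ are looked up in it. The function names and hash parameters are hypothetical.

```python
def prefix_hashes(s, x, p):
    """Return (prefix, powers): prefix[i] is the hash of s[:i], powers[i] = x^i mod p."""
    prefix = [0] * (len(s) + 1)
    powers = [1] * (len(s) + 1)
    for i, ch in enumerate(s):
        prefix[i + 1] = (prefix[i] * x + ord(ch)) % p
        powers[i + 1] = (powers[i] * x) % p
    return prefix, powers


def longest_common_substring(s, t, x=263, p=1000000007):
    """Return (i, j, l) such that s[i:i+l] == t[j:j+l] and l is maximal."""
    pre_s, pw = prefix_hashes(s, x, p)
    pre_t, _ = prefix_hashes(t, x, p)

    def find_common(k):
        # Map the hash of each length-k window of s to the starts that produce it.
        starts = {}
        for i in range(len(s) - k + 1):
            h = (pre_s[i + k] - pre_s[i] * pw[k]) % p
            starts.setdefault(h, []).append(i)
        for j in range(len(t) - k + 1):
            h = (pre_t[j + k] - pre_t[j] * pw[k]) % p
            for i in starts.get(h, []):
                # Confirm the match directly to rule out hash collisions.
                if s[i:i + k] == t[j:j + k]:
                    return i, j
        return None

    best = (0, 0, 0)
    lo, hi = 1, min(len(s), len(t))
    while lo <= hi:
        mid = (lo + hi) // 2
        found = find_common(mid)
        if found is not None:
            best = (found[0], found[1], mid)
            lo = mid + 1   # a common substring of length mid exists, try longer
        else:
            hi = mid - 1   # none of length mid, so none longer either
    return best


if __name__ == '__main__':
    print(*longest_common_substring('cool', 'toolbox'))
```

The direct comparison after each hash hit keeps the answer exact at the cost of some extra work per candidate; dropping it and using two moduli instead is a possible trade-off.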

### Pattern Matching with Mismatches

**Task:** A natural generalization of the pattern matching problem is the following: find all text locations where the distance
from the pattern is sufficiently small. This problem has applications in text searching (where mismatches
correspond to typos) and bioinformatics (where mismatches correspond to mutations).

For an integer parameter $k$ and two strings $t = t_0 t_1 \dotsc t_{m-1}$ and $p = p_0 p_1 \dotsc p_{n-1}$, we say that
$p$ occurs in $t$ at position $i$ with at most $k$ mismatches if the strings $p$ and $t[i : i+n) = t_i t_{i+1} \dotsc t_{i+n-1}$ differ in at most $k$ positions.

**Input Format:** Every line of the input contains an integer $k$ and two strings $t$ and $p$ consisting of lower
case Latin letters.

**Constraints:** $0 \leq k \leq 5$, $1 \leq |t| \leq 200000$, $1 \leq |p| \leq \min(|t|, 100000)$. The total length of all $t$'s does not exceed $200000$, the total length of all $p$'s does not exceed $100000$.

**Output Format:** For each triple $(k, t, p)$, find all positions $0 \leq i_1 < i_2 < \dots < i_l < |t|$ where $p$ occurs in $t$ with at most $k$ mismatches. Output $l$ and $i_1, i_2, \dotsc , i_l$.
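
This task also has no solution in the notebook. The sketch below shows one hashing-based idea, under the assumption that hash collisions are negligible: at each text position, repeatedly binary search for the next mismatch by comparing prefix hashes, and stop once more than $k$ mismatches are found. Since $k \leq 5$, each position needs at most $k + 1$ binary searches. All names and hash parameters here are hypothetical.

```python
def prefix_hashes(s, x, mod):
    """Return (prefix, powers): prefix[i] is the hash of s[:i], powers[i] = x^i mod mod."""
    prefix = [0] * (len(s) + 1)
    powers = [1] * (len(s) + 1)
    for i, ch in enumerate(s):
        prefix[i + 1] = (prefix[i] * x + ord(ch)) % mod
        powers[i + 1] = (powers[i] * x) % mod
    return prefix, powers


def occurrences_with_mismatches(k, text, pattern, x=263, mod=1000000007):
    """Return positions i where pattern matches text[i:i+n] with at most k mismatches."""
    n = len(pattern)
    pre_t, pw = prefix_hashes(text, x, mod)
    pre_p, _ = prefix_hashes(pattern, x, mod)

    def sub_hash(pre, a, l):
        return (pre[a + l] - pre[a] * pw[l]) % mod

    def common_prefix(i, j):
        # Longest l with text[i+j : i+j+l] == pattern[j : j+l], judged by hash equality.
        lo, hi = 0, n - j
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if sub_hash(pre_t, i + j, mid) == sub_hash(pre_p, j, mid):
                lo = mid
            else:
                hi = mid - 1
        return lo

    result = []
    for i in range(len(text) - n + 1):
        mismatches = 0
        j = 0
        while j < n:
            j += common_prefix(i, j)   # skip the matching stretch
            if j == n:
                break
            mismatches += 1            # position j is a mismatch
            if mismatches > k:
                break
            j += 1
        if mismatches <= k:
            result.append(i)
    return result


if __name__ == '__main__':
    positions = occurrences_with_mismatches(2, 'xabcabc', 'ccc')
    print(len(positions), *positions)
```

Unlike the exact-matching sketches, this version trusts hash equality without a direct check, so it can in principle err on a collision; using two moduli would lower that risk.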