# BASICS

# Index Mapping (or Trivial Hashing) with negatives allowed

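Given an array whose elements are limited to the range -MAX to +MAX (so it may contain both negative and non-negative numbers), the task is to check whether a given number is present in the array in O(1) time. The idea is to use the value itself as an index into a table with two columns: column 0 marks non-negative values and column 1 marks negative values.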

In [1]:
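# INDEX MAPPING/TRIVIAL MAPPING

MAX=1000   # values are assumed to lie in the range -MAX..+MAX

def insert(table,value):
    # Column 0 marks non-negative values, column 1 marks negative values
    if value>=0:
        table[value][0]=1
    else:
        table[abs(value)][1]=1

def search(table,value):
    if value>=0:
        return table[value][0]==1
    return table[abs(value)][1]==1


if __name__ == '__main__':
    arr=[-1, 9, -5, -8, -5, -2]
    # MAX+1 rows so that index MAX itself is valid; named `table`
    # to avoid shadowing the built-in hash()
    table=[[0 for j in range(2)]for i in range(MAX+1)]
    for i in arr:
        insert(table,i)
    x=-5
    print(search(table,x))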


True
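As a variation on the same idea, a single flat list can be used by shifting every value by MAX, so that -MAX lands on index 0. The following is only a sketch under the assumption that every value lies in [-MAX, MAX]; the names `make_offset_table`, `insert_offset` and `search_offset` are hypothetical and not part of the code above.

```python
MAX = 1000  # assumed bound: every value lies in [-MAX, MAX]

def make_offset_table():
    # One slot per possible value: indices 0 .. 2*MAX
    return [False] * (2 * MAX + 1)

def insert_offset(table, value):
    # Shift by MAX so that -MAX maps to index 0 and +MAX to index 2*MAX
    table[value + MAX] = True

def search_offset(table, value):
    return table[value + MAX]

offset_table = make_offset_table()
for v in [-1, 9, -5, -8, -5, -2]:
    insert_offset(offset_table, v)

print(search_offset(offset_table, -5))  # True
print(search_offset(offset_table, 5))   # False
```

Both layouts give O(1) insert and search; the shifted version simply avoids the branch on the sign of the value.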


# Chaining - Handling Collisions

When two keys have the same hash value, it is called a collision.

The idea of chaining is to make each cell of the hash table point to a linked list of the records that share the same hash value.

In [2]:
# SEPARATE CHAINING FOR COLLISION HANDLING

class Node:
    def __init__(self,data):
        self.data=data
        self.next=None

def insert(arr,value):
    # The bucket index is the value modulo the table size
    index=value%len(arr)
    if arr[index] is None:
        arr[index]=Node(value)
    else:
        # Walk to the end of the chain and append the new node
        temp=arr[index]
        while temp.next:
            temp=temp.next
        temp.next=Node(value)

def traverseLinkedList(head):
    temp=head
    while temp:
        print(temp.data,end=" ")
        temp=temp.next
    print("\n")

def traverseHashTable(arr):
    for i in range(len(arr)):
        print(i)
        traverseLinkedList(arr[i])

def searchLinkedList(head,value):
    # An empty chain (head is None) skips the loop and returns False
    temp=head
    while temp:
        if temp.data==value:
            return True
        temp=temp.next
    return False

def searchValue(arr,value):
    # Must use the same index computation as insert
    index=value%len(arr)
    return searchLinkedList(arr[index],value)

if __name__ == '__main__':
    values=[50, 700, 76, 85, 92, 73, 101]
    arr=[None]*(len(values))
    for i in values:
        insert(arr,i)
    traverseHashTable(arr)
    print("\n\n")
    print("Element Found" if searchValue(arr,85) else "Element Not Found")


0
700 

1
50 85 92 

2


3
73 101 

4


5


6
76 




Element Found


# Open Addressing | Collision Handling 

In Open Addressing, all elements are stored in the hash table itself. So at any point, the size of the table must be greater than or equal to the total number of keys (note that the table can be enlarged by copying the old data into a bigger one if needed).

Insert(k): Keep probing until an empty slot is found. Once an empty slot is found, insert k. 

Search(k): Keep probing until a slot’s key equals k or an empty slot is reached.

Delete(k): The delete operation needs care. If we simply empty the slot of a deleted key, a later search for a key stored further along the same probe sequence would stop at that empty slot and fail. So slots of deleted keys are marked specially as “deleted”.

Insert can place an item in a deleted slot, but search does not stop at a deleted slot.
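The three operations above can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes integer keys, the hash function `key % capacity`, and a probe that simply moves to the next slot (linear probing). The class name `LinearProbingTable` and the `DELETED` sentinel are hypothetical names introduced for this example.

```python
EMPTY = None
DELETED = object()  # sentinel ("tombstone") marking a deleted slot

class LinearProbingTable:
    def __init__(self, capacity=7):
        self.slots = [EMPTY] * capacity

    def _probe(self, key):
        # Visit every slot once, starting at the key's home slot
        start = key % len(self.slots)
        for step in range(len(self.slots)):
            yield (start + step) % len(self.slots)

    def insert(self, key):
        first_free = None
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:
                if first_free is None:
                    first_free = i
                break                      # key cannot be further along
            if slot is DELETED:
                if first_free is None:
                    first_free = i         # a deleted slot may be reused
            elif slot == key:
                return True                # already present
        if first_free is None:
            return False                   # table is full
        self.slots[first_free] = key
        return True

    def search(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:
                return False               # an empty slot ends the search
            if slot is not DELETED and slot == key:
                return True                # deleted slots are skipped
        return False

    def delete(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:
                return False
            if slot is not DELETED and slot == key:
                self.slots[i] = DELETED    # mark, do not empty
                return True
        return False

t = LinearProbingTable(7)
for k in [50, 700, 76, 85]:   # 50 -> slot 1, 85 collides and moves to slot 2
    t.insert(k)
t.delete(50)
print(t.search(85))  # True: the search passes over the deleted slot 1
print(t.search(50))  # False
```

If `delete` reset the slot to `EMPTY` instead, the search for 85 would stop at slot 1 and wrongly report that the key is missing.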

Open Addressing is done in the following ways: 

**Linear Probing**

**Quadratic Probing**

**Double Hashing**
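The three strategies differ only in which slot is tried on the i-th attempt. The sketch below uses the common textbook forms for a table of size `m`; the secondary hash `prime - (key % prime)` used for double hashing is one frequently seen choice and is an assumption here, as are the function names.

```python
def linear_probe(key, i, m):
    # i-th attempt: move i slots past the home slot
    return (key % m + i) % m

def quadratic_probe(key, i, m):
    # i-th attempt: move i*i slots past the home slot
    return (key % m + i * i) % m

def double_hash_probe(key, i, m, prime=5):
    # Step size comes from a second hash; `prime` is an assumed prime
    # smaller than m. The step lies in 1..prime, so it is never 0.
    step = prime - (key % prime)
    return (key % m + i * step) % m

m = 7
key = 85  # home slot is 85 % 7 == 1
print([linear_probe(key, i, m) for i in range(4)])       # [1, 2, 3, 4]
print([quadratic_probe(key, i, m) for i in range(4)])    # [1, 2, 5, 3]
print([double_hash_probe(key, i, m) for i in range(4)])  # [1, 6, 4, 2]
```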

# Load Factor and Rehashing

**How hashing works:**

For insertion of a key (K) – value (V) pair into a hash map, the following steps are required:

1) K is converted into a small integer (called its hash code) using a hash function.

2) The hash code is used to find an index (hashCode % arrSize), and the entire linked list at that index (separate chaining) is first searched to check whether K is already present.

3) If found, its value is updated; if not, the K-V pair is stored as a new node in the list.
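The steps above can be sketched as a small map with separate chaining. This is only an illustration: it uses Python's built-in `hash()` as the hash function and plain Python lists as the chains instead of linked lists, and the class name `ChainedHashMap` is hypothetical.

```python
class ChainedHashMap:
    def __init__(self, num_buckets=7):
        # One chain (here a Python list of [key, value] pairs) per index
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key):
        # Steps 1 and 2: hash code, then hashCode % arrSize
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for pair in bucket:
            if pair[0] == key:
                pair[1] = value        # step 3: key found, update its value
                return
        bucket.append([key, value])    # step 3: key absent, store a new pair

    def get(self, key, default=None):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return default

m = ChainedHashMap()
m.put("apple", 1)
m.put("apple", 2)     # same key again: the value is updated, not duplicated
m.put("banana", 3)
print(m.get("apple"))   # 2
print(m.get("cherry"))  # None
```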

**Complexity and Load Factor**

For the first step, the time taken depends on K and the hash function.
For example, if the key is the string “abcd”, the cost of hashing it may depend on the length of the string. But for very large values of n (the number of entries in the map), the length of the keys is almost negligible in comparison to n, so hash computation can be considered to take constant time, i.e., O(1).

For the second step, the list of K-V pairs present at that index has to be traversed. In the worst case, all n entries are at the same index, so the time complexity would be O(n). However, well-designed hash functions distribute the keys close to uniformly across the array, so this almost never happens in practice.

So, on average, **if there are n entries and b is the size of the array, there would be n/b entries at each index. This value n/b is called the load factor, and it represents the load on our map.**

This **load factor needs to be kept low, so that the number of entries at each index stays small and the complexity stays almost constant, i.e., O(1).**

**Rehashing:**
    
When the load factor rises above a pre-defined threshold, the complexity increases. A commonly used threshold is 0.75, which is for example the default in Java's `HashMap`. To overcome this, the size of the array is increased (typically doubled), and all existing entries are hashed again and stored in the new, larger array, which brings the load factor and the complexity back down.
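A minimal sketch of this behaviour is shown below, assuming separate chaining with Python lists as chains, the built-in `hash()` as the hash function, and a threshold of 0.75. The class name `RehashingMap` and its method names are hypothetical.

```python
class RehashingMap:
    MAX_LOAD = 0.75  # assumed threshold for triggering a rehash

    def __init__(self, num_buckets=4):
        self.buckets = [[] for _ in range(num_buckets)]
        self.size = 0  # n, the number of stored entries

    def load_factor(self):
        # n / b: entries divided by the size of the array
        return self.size / len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for pair in bucket:
            if pair[0] == key:
                pair[1] = value
                return
        bucket.append([key, value])
        self.size += 1
        if self.load_factor() > self.MAX_LOAD:
            self._rehash()

    def _rehash(self):
        # Double the array and hash every existing entry again, because
        # the index hashCode % arrSize changes when arrSize changes
        old = self.buckets
        self.buckets = [[] for _ in range(2 * len(old))]
        for bucket in old:
            for key, value in bucket:
                self.buckets[hash(key) % len(self.buckets)].append([key, value])

    def get(self, key, default=None):
        for k, v in self.buckets[hash(key) % len(self.buckets)]:
            if k == key:
                return v
        return default

m = RehashingMap(4)
for k in range(4):
    m.put(k, k * k)
# The 4th insert pushes the load factor to 4/4 = 1.0 > 0.75,
# so the array is doubled to 8 and the load factor drops to 4/8.
print(len(m.buckets), m.load_factor())  # 8 0.5
```

Each rehash costs O(n), but because the array doubles every time, rehashes become rarer as the map grows.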

