In simple terms, a hash function maps a large number or a string to a small integer that can be used as an index into a hash table.
A good hash function should have the following properties:
1) It is efficiently computable.
2) It distributes the keys uniformly (each table position is equally likely for each key).

Collision Handling: When a newly inserted key maps to an already occupied slot in the hash table, this is called a collision, and it must be handled with a collision handling technique.

There are two methods to handle collision: 

Chaining: The idea is to make each cell of the hash table point to a linked list of records that have the same hash function value. Chaining is simple, but requires additional memory outside the table.

Open Addressing: In open addressing, all elements are stored in the hash table itself. Each table entry contains either a record or NIL. When searching for an element, we examine table slots one by one until the desired element is found or it is clear that the element is not in the table.

Inserting an array of unique numbers from a bounded range into a hash table

In [3]:
def insert(a, n):
    for i in range(n):
        if a[i] >= 0: 
            has[a[i]][0] = 1
        else: 
            has[abs(a[i])][1] = 1

MAX = 1000
has = [[0, 0] for i in range(MAX + 1)]
# Here we created a 2D array that holds the pair 0, 0 for every number up to MAX. To add a new value to this hash table,
# we set its entry in the array to 1 (the first entry if the value is positive, the second if it is negative).
# This way, we can easily look the value up in the hash table later.
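
The lookup mentioned in the comment can be sketched as below. The `search` function is a hypothetical helper that is not part of the original cell, and it assumes `abs(x) <= MAX`; the table and `insert` are repeated so the block runs on its own.

```python
MAX = 1000
# has[v][0] marks +v as present, has[v][1] marks -v as present
has = [[0, 0] for _ in range(MAX + 1)]

def insert(a, n):
    """Mark the first n values of a as present in the table."""
    for i in range(n):
        if a[i] >= 0:
            has[a[i]][0] = 1
        else:
            has[abs(a[i])][1] = 1

def search(x):
    """Return True if x was inserted (assumes abs(x) <= MAX)."""
    if x >= 0:
        return has[x][0] == 1
    return has[abs(x)][1] == 1

a = [-1, 9, -5, -8, -2]
insert(a, len(a))
print(search(-5))  # -5 was inserted
print(search(5))   # 5 was not, even though -5 was
```

Both operations touch a single array cell, so they run in constant time, at the cost of memory proportional to MAX.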

    Separate Chaining (Open hashing):

    The idea is to make each cell of the hash table point to a linked list of records that have the same hash function value.

![image.png](attachment:image.png)

    Advantage : The hash table never fills up; we can always add more elements to a chain.

    Disadvantage : Cache performance of chaining is not good as keys are stored using a linked list. Open addressing provides better cache performance as everything is stored in the same table. 

    In open hashing, if there are n elements in the set, then each cell will hold roughly λ = n / tableSize members (we hope that the elements are spread roughly evenly across the cells). λ is called the load factor of the hash table. If we can estimate n, then we can choose tableSize so that the lists have only 1-3 members on average. 
    Open hashing becomes slow once λ > 2. 
    
    In open hashing, the insert, find and remove operations each take O(1 + n / tableSize) time on average. If we can choose tableSize to be about n, the running time will be constant. 
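
A minimal sketch of separate chaining with a load factor, under some assumptions: the class name `ChainedHashTable` and its method names are made up for illustration, keys are integers hashed with `key % table_size`, and plain Python lists stand in for the linked lists.

```python
class ChainedHashTable:
    def __init__(self, table_size=7):
        self.table_size = table_size
        self.buckets = [[] for _ in range(table_size)]  # one chain per cell
        self.n = 0  # number of stored keys

    def _index(self, key):
        return key % self.table_size

    def insert(self, key):
        chain = self.buckets[self._index(key)]
        if key not in chain:      # keep keys unique
            chain.append(key)
            self.n += 1

    def find(self, key):
        return key in self.buckets[self._index(key)]

    def remove(self, key):
        chain = self.buckets[self._index(key)]
        if key in chain:
            chain.remove(key)
            self.n -= 1
            return True
        return False

    def load_factor(self):
        return self.n / self.table_size  # λ = n / tableSize

table = ChainedHashTable(table_size=7)
for key in [50, 700, 76, 85, 92, 73, 101]:
    table.insert(key)
print(table.buckets)
print(table.load_factor())
```

Each operation scans only one chain, which is where the O(1 + n / tableSize) average cost comes from.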

    Open Addressing (Closed hashing):

    In Open Addressing, all elements are stored in the hash table itself. So at any point, the size of the table must be greater than or equal to the total number of keys.
    (A prime table size is usually preferred, because taking keys modulo a prime tends to spread them more evenly across the slots.) 

    Insert(k): Keep probing until an empty slot is found. Once an empty slot is found, insert k. 

    Search(k): Keep probing until the slot’s key equals k or an empty slot is reached. Reaching an empty slot means the key is not in the table, so the search stops.  

    Delete(k): The delete operation is more subtle. If we simply empty the slot of a deleted key, a later search may fail: it will stop at the emptied slot even though the key it is looking for lies further along the probe sequence. So slots of deleted keys are marked specially as “deleted”. 
    Insert can place an item in a deleted slot, but search doesn’t stop at a deleted slot. 
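
The three operations above can be sketched as follows. This is a minimal illustration rather than a reference implementation: the class name `OpenAddressingTable`, the `DELETED` sentinel, and stepping to the next slot as the probe order are all assumptions made for the example, and keys are assumed to be non-negative integers that are never inserted twice.

```python
EMPTY = None
DELETED = object()  # sentinel marking a slot whose key was removed

class OpenAddressingTable:
    def __init__(self, size=7):
        self.size = size
        self.slots = [EMPTY] * size

    def _probe(self, key):
        """Yield slot indices in probe order (here: step by 1, wrapping around)."""
        start = key % self.size
        for i in range(self.size):
            yield (start + i) % self.size

    def insert(self, key):
        for idx in self._probe(key):
            # Both empty and deleted slots can take a new key.
            if self.slots[idx] is EMPTY or self.slots[idx] is DELETED:
                self.slots[idx] = key
                return True
        return False  # every slot is occupied

    def search(self, key):
        for idx in self._probe(key):
            if self.slots[idx] is EMPTY:
                return False  # a truly empty slot ends the search
            if self.slots[idx] is not DELETED and self.slots[idx] == key:
                return True
        return False

    def delete(self, key):
        for idx in self._probe(key):
            if self.slots[idx] is EMPTY:
                return False
            if self.slots[idx] is not DELETED and self.slots[idx] == key:
                self.slots[idx] = DELETED  # mark, do not empty
                return True
        return False

t = OpenAddressingTable(size=7)
for k in [10, 17, 24]:  # all three keys hash to slot 3
    t.insert(k)
t.delete(17)
print(t.search(24))  # still found: the search skips the deleted slot
```

If `delete` reset the slot to `EMPTY` instead, the search for 24 would stop at the slot that used to hold 17 and report the key as missing.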

    There are several ways to do open addressing: 

    a) Linear Probing: In linear probing, we probe the following slots one after another. The gap between two probes is typically 1, as in the example below, where S is the table size.

    If slot hash(x) % S is full, then we try (hash(x) + 1) % S

    If (hash(x) + 1) % S is also full, then we try (hash(x) + 2) % S

    If (hash(x) + 2) % S is also full, then we try (hash(x) + 3) % S 

    ..................................................


![image.png](attachment:image.png)

    Advantages : As long as the table is not full, a free cell can always be found. 
    Disadvantages : Over time, inserting and searching take longer and longer. An insert can take quite long even if the table is relatively empty, because blocks of occupied cells form. This effect is known as primary clustering.
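
Primary clustering can be made visible by counting how many slots each insert examines. The helper `linear_probe_insert` below is a hypothetical sketch that assumes integer keys, `None` for empty slots, and a table with at least one free slot.

```python
def linear_probe_insert(table, key):
    """Insert key with linear probing and return the number of slots examined."""
    size = len(table)
    index = key % size
    probes = 1
    while table[index] is not None:
        index = (index + 1) % size  # try the next slot, wrapping around
        probes += 1
    table[index] = key
    return probes

table = [None] * 11
# 11, 22, 33, 44 all hash to slot 0; 12 hashes to slot 1
probe_counts = [linear_probe_insert(table, k) for k in [11, 22, 33, 44, 12]]
print(probe_counts)
print(table)
```

Key 12 has a different home slot than the other keys, yet it still has to walk through the block they built, even though more than half of the table is empty.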


    b) Quadratic Probing: We look for the i^2’th slot in the i’th iteration.  

    If slot hash(x) % S is full, then we try (hash(x) + 1*1) % S

    If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S

    If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S

    Disadvantages : There is no guarantee that an empty cell will be found if the table is more than half full, or even before that if the table size is not prime. 
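
The limitation can be checked directly by listing which slots a quadratic probe sequence can ever reach. The function name `quadratic_probe_slots` is made up for this sketch; since (i + S)^2 and i^2 leave the same remainder modulo S, looking at i = 0 .. S-1 is enough.

```python
def quadratic_probe_slots(home, table_size):
    """Distinct slots visited by (home + i*i) % table_size for i = 0 .. table_size - 1."""
    return sorted({(home + i * i) % table_size for i in range(table_size)})

prime_slots = quadratic_probe_slots(0, 13)     # prime table size
nonprime_slots = quadratic_probe_slots(0, 16)  # non-prime table size
print(len(prime_slots), prime_slots)
print(len(nonprime_slots), nonprime_slots)
```

With table size 13 the sequence reaches 7 of the 13 slots, a little over half. With table size 16 it reaches only 4 slots, so an insert can fail while most of the table is still empty.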
    

    c) Double Hashing: We use another hash function hash2(x) and look for the slot at offset i*hash2(x) in the i’th iteration. 

    If slot hash(x) % S is full, then we try (hash(x) + 1*hash2(x)) % S

    If (hash(x) + 1*hash2(x)) % S is also full, then we try (hash(x) + 2*hash2(x)) % S

    If (hash(x) + 2*hash2(x)) % S is also full, then we try (hash(x) + 3*hash2(x)) % S
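
The probe sequence can be written out as a short sketch. The concrete choices here are assumptions for illustration: a table size of 13, `hash1(key) = key % 13`, and `hash2(key) = 7 - (key % 7)`, which always lies between 1 and 7 and therefore never produces a step of 0.

```python
TABLE_SIZE = 13  # prime table size (assumed for this sketch)
PRIME = 7        # a prime smaller than TABLE_SIZE (assumed)

def hash1(key):
    return key % TABLE_SIZE

def hash2(key):
    # Always in 1..PRIME, so the step is never 0.
    return PRIME - (key % PRIME)

def probe_sequence(key):
    """Slots tried for key: (hash1 + i * hash2) % TABLE_SIZE for i = 0, 1, 2, ..."""
    return [(hash1(key) + i * hash2(key)) % TABLE_SIZE for i in range(TABLE_SIZE)]

print(probe_sequence(10))
print(probe_sequence(23))
```

Keys 10 and 23 share the home slot 10 but use different step sizes (4 and 5), so their sequences diverge after the first probe. Because the table size is prime and the step is never 0, each sequence visits every slot exactly once.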


    Advantages of open addressing : Open addressing provides better cache performance as everything is stored in the same table.

    Disadvantages : In open addressing, the table may become full (when λ = 1, i.e. n = tableSize). Open addressing is best used when the frequency and number of keys are known in advance.
    Quadratic probing can fail if λ > 0.5.
    Linear probing and double hashing are slow if λ > 0.5.

Double hashing example

In [2]:
class Hash:
    TABLE_SIZE = 13  # size of the hash table (a prime)
    Prime = 7        # prime smaller than TABLE_SIZE, used by hash2

    def __init__(self):
        self.curr_size = 0
        # -1 marks an empty slot, so keys are assumed to be non-negative
        self.hashTable = [-1] * self.TABLE_SIZE

    def isFull(self):
        return self.curr_size == self.TABLE_SIZE

    def hash1(self, key):
        return key % self.TABLE_SIZE

    def hash2(self, key):
        # step size in 1..Prime; it must never be 0, or probing would not advance
        return self.Prime - (key % self.Prime)

    def insertHash(self, key):
        # if hash table is full
        if self.isFull():
            return

        # get index from first hash
        index = self.hash1(key)

        # if collision occurs
        if self.hashTable[index] != -1:
            # get step size from second hash
            index2 = self.hash2(key)
            i = 1
            while True:
                # next slot in the probe sequence
                newIndex = (index + i * index2) % self.TABLE_SIZE

                # if no collision occurs, store the key
                if self.hashTable[newIndex] == -1:
                    self.hashTable[newIndex] = key
                    break
                i += 1
        # if no collision occurs
        else:
            self.hashTable[index] = key
        self.curr_size += 1

newHash = Hash()
lst = [19, 27, 36, 10, 64]
for x in lst:
    newHash.insertHash(x)
print(newHash.hashTable)

[-1, 27, -1, -1, -1, 10, 19, -1, -1, -1, 36, -1, 64]

    For a hash function there are two desired properties: it is simple to evaluate, and it distributes keys as evenly as possible.  

    When the table gets too full, operations start taking too long. The solution to this problem is to build a new table that is twice as big and insert all the items into the new table. This process is called rehashing. Its running time is O(n), but it happens rarely. 
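
Rehashing can be sketched as a function that builds the larger table and reinserts every key. The name `rehash` and the details are assumptions for illustration: keys are integers, empty slots are `None`, collisions are resolved by linear probing, and deleted markers are not handled.

```python
def rehash(old_table):
    """Return a table twice as large containing every key of old_table."""
    new_size = 2 * len(old_table)
    new_table = [None] * new_size
    for key in old_table:
        if key is None:
            continue  # skip empty slots
        # Positions depend on the table size, so each key is hashed again.
        index = key % new_size
        while new_table[index] is not None:
            index = (index + 1) % new_size
        new_table[index] = key
    return new_table

old = [None, 15, 8, None, 11, 19, 6]  # size 7; 15 and 8 collided at slot 1
new = rehash(old)
print(new)
```

Keys cannot simply be copied to the same positions, because `key % size` changes with the size: 8 sat next to 15 in the old table but lands in slot 8 of the new one. Plain doubling gives a non-prime size; picking a prime near twice the old size is a common refinement.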

    A string or an object (or a value of any other data type) must be converted to a number before hashing, because the hash functions described here operate on integers. 

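    For a sequence of n operations, the maximum number of rehashes is about log n (base 2), since the table size doubles each time.
    Total time analysis : 
    - a regular operation (without rehashing) takes time a.
    - rehashing a table of k elements takes time b*k.
    - total time = a*n + b*(1 + 2 + 4 + ... + n) = a*n + b*(2*n - 1) 
    So the total time is O(n), which is O(1) per operation on average (amortized).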
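
The sum in this analysis can be checked with a small simulation. The function `total_rehash_cost` is a hypothetical helper that assumes the table starts with capacity 1 and doubles whenever it is full, and it counts only the elements copied during rehashes.

```python
def total_rehash_cost(n):
    """Count elements copied by rehashing while inserting n items."""
    capacity, size, copied = 1, 0, 0
    for _ in range(n):
        if size == capacity:  # table is full: rehash into a table twice as big
            copied += size
            capacity *= 2
        size += 1
    return copied

for n in [8, 1024, 1025]:
    print(n, total_rehash_cost(n))
```

The copy counts are 1 + 2 + 4 + ..., so the total stays below 2*n for every n, which is why the rehashing work adds only a constant amount per insert on average.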