## A Tutorial on HashMap
Data structures help us represent and efficiently manipulate the data associated with real-world problems.

 Let's work on such a problem.

### The Problem Scenario

In a class of students, store heights for each student.



The problem itself is very simple. We have the height of each student, and we want to store this data so that the next time someone asks for the height of a student, we can easily return the value. But how can we store these heights?

Obviously we can use a database and store these values. But let's say we don't want to do that for now. We want to use a data structure to store these values as part of our program. For the sake of simplicity, our problem is limited to storing the heights of students. But you can certainly imagine scenarios where you have to store such `key-value` pairs and, later on, when someone gives you a `key`, efficiently return the corresponding `value`.

A skeleton of the `HashMap` class would look something like this.

In [None]:
class HashMap:
    
    def __init__(self):
        self.num_entries = 0
    
    def put(self, key, value):
        pass
    
    def get(self, key):
        pass
    
    def size(self):
        return self.num_entries

### Arrays, Linked-Lists, Queues and Stacks

We can try each of the above data structures to see whether it can be used to implement a hashmap.
<br><br>
An array to store the names and another to store the heights would work. However, in the worst case we would need to traverse the entire array to find the requested name, which gives `O(n)` time complexity. Keeping the array sorted by name allows binary search, which improves this to `O(log(n))` at best.
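
As a rough sketch of the two array-based options, the snippet below stores names and heights in parallel lists and looks a height up first with a linear scan and then with binary search over a sorted copy. The student names, the heights, and the helper names `linear_lookup` and `sorted_lookup` are made up for illustration; `bisect_left` comes from Python's standard `bisect` module.

```python
from bisect import bisect_left

# Hypothetical sample data: names[i] belongs with heights[i]
names = ["Maya", "Arjun", "Zoe", "Liam"]
heights = [152, 160, 148, 155]


def linear_lookup(name):
    # O(n): in the worst case every name is compared
    for i, candidate in enumerate(names):
        if candidate == name:
            return heights[i]
    return None


# Sort the (name, height) pairs once so that binary search becomes possible
sorted_pairs = sorted(zip(names, heights))
sorted_names = [n for n, _ in sorted_pairs]
sorted_heights = [h for _, h in sorted_pairs]


def sorted_lookup(name):
    # O(log n): bisect_left halves the search range at every step
    i = bisect_left(sorted_names, name)
    if i < len(sorted_names) and sorted_names[i] == name:
        return sorted_heights[i]
    return None


print(linear_lookup("Zoe"))  # 148
print(sorted_lookup("Zoe"))  # 148
```

Neither variant reaches constant time, which is why we keep looking.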
<br><br>
A linked list would also work, but it still has to be traversed, so it would not bring the lookup down to constant time.
<br><br>
Queues and stacks give `O(1)` access to the oldest and newest elements respectively. For any other element, however, we would still need to traverse the queue or the stack.
<br><br>
Looking again at arrays: if we could associate an array index with each `key` (in this case a name) and use it to look up the `value` (in this case a height), e.g. `arr[3]`, then the lookup would take constant time. **The only problem now is how to turn a string, or any other kind of key, into an array index.**
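
To make the idea concrete, here is a small sketch for the special case where the key is already a small integer, for example a hypothetical roll number. The list name and the values are invented for illustration.

```python
# Hypothetical data: the roll number (0 to 4) doubles as the list index
heights_by_roll_number = [150, 162, 158, 149, 171]


def get_height(roll_number):
    # Indexing a list is O(1): no traversal is needed
    return heights_by_roll_number[roll_number]


print(get_height(3))  # 149
```

Names are not integers, so we need a way to compute such an index from a string.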

#### Hashing functions
A hashing function answers the question above: it turns a key into an integer, which is what lets our hashmap aim for constant lookup time.


In [1]:
def hash_function(string):
    # Use the sum of the characters' ASCII values as the hash value.
    # ord(character) returns the ASCII value of a particular character.
    hash_code = 0
    for char in string:
        hash_code += ord(char)

    return hash_code

In [2]:
print(hash_function("abcd"))

394


The above hashing function is not a good one, because it gives the same answer for `abcd` and `bcad`, leading to a collision. A good hashing function should keep collisions to a minimum. In practice, different types of keys require different hashing functions: hash functions for strings differ from hash functions for integers, and differ again for objects of our own classes.
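
The collision can be checked directly. The sketch below repeats `hash_function` from above so that it runs on its own.

```python
def hash_function(string):
    # Sum of the character codes, same as the function defined earlier
    hash_code = 0
    for char in string:
        hash_code += ord(char)
    return hash_code


# Both strings contain the same characters, so the sums are identical
print(hash_function("abcd"))  # 394
print(hash_function("bcad"))  # 394
print(hash_function("abcd") == hash_function("bcad"))  # True
```

Any rearrangement of the same characters collides, because addition ignores the order of the characters.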

## Hash Functions for Strings
For a string like `abcde`, an effective solution is to treat it as a number in base `p`, where `p` is a prime number. To see what this means, recall that
a number such as `578` can be written in the base 10 number system as $$5*10^2+7*10^1+8*10^0$$

Now for our string we can similarly write: $$a*p^4+b*p^3+c*p^2+d*p^1+e*p^0$$ where each letter is replaced by its corresponding ASCII value. This is among the better-performing methods in the well-researched area of string hash functions. Prime bases are preferred because they tend to distribute the hash codes well. The most commonly used primes are `31` and `37`.
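
A minimal sketch of this base-`p` idea follows, assuming `p = 31` and a standalone helper named `polynomial_hash` (both chosen only for this illustration). It uses Horner's rule, so the first character ends up with the highest power of `p`.

```python
def polynomial_hash(key, p=31):
    # Horner's rule: multiplying the running total by p at every step
    # gives the first character the highest power of p
    hash_code = 0
    for character in key:
        hash_code = hash_code * p + ord(character)
    return hash_code


print(polynomial_hash("ab"))  # 97*31 + 98 = 3105
print(polynomial_hash("ba"))  # 98*31 + 97 = 3135
```

Unlike the plain sum, swapping two characters now changes the result. Assigning the powers in the opposite direction, lowest power first, works just as well.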


Hence, using the above algorithm, we can get a corresponding integer value for each string key and **use it as an array index** into an array, e.g. `bucket_array`. Each entry in this array is called a `bucket`, and the index at which a bucket is stored is its `bucket_index`. The `bucket_array` can be visualised as shown below.

<img style="float: center;height:500px" src="bucket0.png"><br>

Let's define our class with these details.

In [4]:
class HashMap:

    def __init__(self, initial_size=10) -> None:
        self.bucket_array = [None for _ in range(initial_size)]
        self.num_entries = 0
        self.p = 37
        
    def put(self, key, value):
        pass
    
    def get(self, key):
        pass
    
    def size(self):
        return self.num_entries
    
    def get_bucket_index(self, key):
        return self.get_hash_code(key)

    def get_hash_code(self, key):
        # convert the key to a string so that we can iterate over its characters
        key = str(key)
        hash_code = 0
        # the first character is multiplied by self.p^0 = 1;
        # each later character uses the next higher power of self.p
        coefficient = 1

        for character in key:
            hash_code += ord(character) * coefficient
            coefficient *= self.p

        return hash_code

In [5]:
hash_map = HashMap()

bucket_index = hash_map.get_bucket_index("abcd")
print(bucket_index)

5204554


In [6]:
hash_map = HashMap()

bucket_index = hash_map.get_bucket_index("bcda")
print(bucket_index)

5054002


The above shows that, unlike the simple function discussed earlier, our hash function does not produce a collision for these two keys. However, the `bucket_index` values are *huge*, and **creating arrays that large would be a space complexity issue, which is not viable**. A way out is to use a compression function that compresses the values output above, so that we can work with reasonably sized arrays.<br><br>
A simple and effective compression function is `mod len(array)`, which uses the modulo operator `%` to return the remainder of one number divided by another.<br><br>
So, if we have an array of size 10, we can be sure that the modulo of any of our hash codes with 10 will be less than 10, allowing it to fit into our bucket array. You can visualize the `bucket array` again as shown in the figure below, in which the `bucket_index` is generated by the string key:

<img style="float:center;height:500px" src="bucket1.png"><br>
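
As a quick check of the compression step, the sketch below applies `%` to the two hash codes printed earlier, assuming a bucket array of size 10. The helper name `compress` is hypothetical.

```python
def compress(hash_code, array_size):
    # The remainder is always in the range 0 .. array_size - 1
    return hash_code % array_size


# Hash codes printed earlier for "abcd" and "bcda"
print(compress(5204554, 10))  # 4
print(compress(5054002, 10))  # 2
```

Both indices now fit into an array of 10 buckets.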

**Note that here we are storing the string key and corresponding numeric value in a Node**. 
Because of how the modulo operator works, we can write the compression logic inside our `get_hash_code()` function itself instead of creating a new function.

https://www.khanacademy.org/computing/computer-science/cryptography/modarithmetic/a/modular-multiplication
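
One way to fold the compression step into the hash computation is sketched below. It relies on the modular-arithmetic property that taking the remainder after every step gives the same result as taking it once at the end, while keeping the intermediate numbers small. The standalone function name `compressed_hash_code` and its parameters `num_buckets` and `p` are assumptions made for this illustration, not part of the class above.

```python
def compressed_hash_code(key, num_buckets=10, p=37):
    # Same polynomial as get_hash_code: the first character is multiplied by p^0
    key = str(key)
    hash_code = 0
    coefficient = 1

    for character in key:
        hash_code += ord(character) * coefficient
        hash_code %= num_buckets      # compress the running total
        coefficient *= p
        coefficient %= num_buckets    # keep the coefficient small as well

    return hash_code


print(compressed_hash_code("abcd"))  # 4, the same as 5204554 % 10
print(compressed_hash_code("bcda"))  # 2, the same as 5054002 % 10
```

The results match the uncompressed hash codes reduced modulo 10, so the value returned can be used directly as a `bucket_index`.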