## **HASH TABLE**

### ***LIST IMPLEMENTATION***

In [3]:
stock_prices = []
with open("data_files/stock_prices.csv", "r") as file:
    for line in file:
        tokens = line.split(",")     # each line is split into day and price
        days = tokens[0]
        prices = float(tokens[1])    # float() ignores the trailing newline
        stock_prices.append([days, prices])

In [4]:
stock_prices

[['march 6', 310.0],
 ['march 7', 340.0],
 ['march 8', 380.0],
 ['march 9', 302.0],
 ['march 10', 297.0],
 ['march 11', 323.0]]

* ***If we want to know the price on march 9***

In [5]:
for element in stock_prices:
    if element[0] == 'march 9':
        print(element[1])

302.0


* Using a list here is not an efficient approach.
* For example, if the file holds a million records and you want the price for the last day, the loop needs a million iterations.
    * The number of iterations grows with the size of the list.
    * So the complexity of this method is ***Big-O: O(n)***
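
To make the O(n) cost concrete, here is a minimal sketch that counts how many comparisons a front-to-back scan makes. The helper `find_price` and the synthetic `records` list are assumptions for illustration; they are not part of the stock price file.

```python
def find_price(records, day):
    """Scan the list front to back; return (price, comparisons made)."""
    comparisons = 0
    for name, price in records:
        comparisons += 1
        if name == day:
            return price, comparisons
    return None, comparisons

# Synthetic data (hypothetical): 1,000 days with made-up prices.
records = [[f"day {i}", float(i)] for i in range(1000)]

first = find_price(records, "day 0")
last = find_price(records, "day 999")
print(first)  # (0.0, 1)
print(last)   # (999.0, 1000)
```

The last record costs as many comparisons as there are records, which is the linear growth described above.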

### ***DICTIONARY IMPLEMENTATION***

In [10]:
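stock_prices = {}
with open("data_files/stock_prices.csv", "r") as file:
    for line in file:
        tokens = line.split(",")     # each line is split into day and price
        days = tokens[0]
        prices = float(tokens[1])    # float() ignores the trailing newline

        stock_prices[days] = prices  # the day is the key, the price is the value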

In [5]:
stock_prices

{'march 6': 310.0,
 'march 7': 340.0,
 'march 8': 380.0,
 'march 9': 302.0,
 'march 10': 297.0,
 'march 11': 323.0}

* ***If we want to know the price on march 9***

In [6]:
stock_prices['march 9']

302.0

* To see why this is fast, we must look at how the dictionary is implemented: ***a dictionary uses a hash table*** as its underlying data structure.
* The complexity of this lookup is ***Big-O: O(1)*** on average.
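
The difference can be observed with a rough timing sketch. It assumes synthetic data (100,000 made-up records) and uses the standard `timeit` module; the exact numbers depend on the machine, so no output is shown.

```python
import timeit

# Synthetic data (hypothetical): 100,000 days with made-up prices.
n = 100_000
as_list = [[f"day {i}", float(i)] for i in range(n)]
as_dict = {day: price for day, price in as_list}
target = f"day {n - 1}"  # worst case for the list: the last record

def list_lookup():
    # Linear scan, as in the list implementation
    for day, price in as_list:
        if day == target:
            return price

def dict_lookup():
    # Single hashed lookup, as in the dictionary implementation
    return as_dict[target]

list_time = timeit.timeit(list_lookup, number=20)
dict_time = timeit.timeit(dict_lookup, number=20)
print(f"list: {list_time:.6f} s, dict: {dict_time:.6f} s")
```

Both functions return the same price; only the amount of work needed to find it differs.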

### ***Compare Memory Operations*** 

### ***List***

* When you store data as ***two dimensional list***:

![image](images/ram_list.png)

### ***Dictionary***

* When you store data as ***dictionary***:

![image](images/ram_dict.png)

* A ***hash function is used to compute the index of each element***
* The hash function converts the string "key" into an index into an array
* Lists and arrays are indexed by integers, but a hash map (hash table) can be indexed by a string key

#### NOTE : 
* ***How did we get the number 9 from "march 6"? How does a hash function work?***
    * There are different ways to implement a hash function.
    * Let's look at the ASCII method:

![image](images/hash_function.png)
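
The arithmetic behind the ASCII method can be checked step by step. This is a small sketch that assumes an array size of 100; the variable names are chosen only for illustration.

```python
key = "march 6"
codes = [ord(ch) for ch in key]   # ASCII code of each character
total = sum(codes)                # add the codes together
index = total % 100               # assumed array size of 100

print(codes)   # [109, 97, 114, 99, 104, 32, 54]
print(total)   # 609
print(index)   # 9
```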

### ***HASH TABLE IMPLEMENTATION***

In [7]:
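def hash_function(key: str) -> int:
    total = 0                # named total to avoid shadowing the built-in sum()
    for char in key:
        total += ord(char)   # add the ASCII code of each character
    return total % 100       # assume the hash map array size is 100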

In [8]:
hash_function('march 6')

9

In [9]:
ord('m')  ## ASCII VALUE 

109

In [10]:
class HashTable:  
    def __init__(self):
        self.MAX = 20
        self.arr = [None for i in range(self.MAX)]
        
    def get_hash(self, key):
        hash = 0
        for char in key:
            hash += ord(char)
        return hash % self.MAX
    
    def add(self, key, val):
        h = self.get_hash(key)
        self.arr[h] = val
    
    def get(self, key):
        h = self.get_hash(key)
        return self.arr[h]        

In [11]:
t = HashTable()
# t.get_hash('march 6') --> 9
t.add('march 6', 130)

In [12]:
# t.arr

In [13]:
t.get('march 6')

130

#### NOTE : 
* There is an easier way to do this:
* Python supports operator overloading, so the table can be used with the square-bracket syntax:

In [14]:
t = HashTable()
t["march 6"] = 310  # raises TypeError: __setitem__ is not defined yet
t["march 7"] = 420

TypeError: 'HashTable' object does not support item assignment

* **But first you must define the methods `__getitem__` and `__setitem__`:**

In [20]:
class HashTable:  
    def __init__(self):
        self.MAX = 20
        self.arr = [None for i in range(self.MAX)]
        
    def get_hash(self, key):
        hash = 0
        for char in key:
            hash += ord(char)
        return hash % self.MAX
    
    def __setitem__(self, key, val):
        h = self.get_hash(key)
        self.arr[h] = val
    
    def __getitem__(self, key):
        h = self.get_hash(key)
        return self.arr[h]        
    
    def __delitem__(self, key):
        h = self.get_hash(key)
        self.arr[h] = None


In [21]:
t = HashTable()  # item assignment works now that __setitem__ is defined
t["march 6"] = 310
t["march 7"] = 420
t["december 12"] = 1000

In [22]:
t["march 6"]

310

In [23]:
t["march 7"]

420

In [24]:
t["december 12"]

1000

In [25]:
del t["march 7"]

In [26]:
t.arr

[None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 310,
 None,
 None,
 None,
 None,
 1000,
 None,
 None,
 None,
 None,
 None]

### ***Big-O Notation of Hash Table***


![image](images/bigo.png)

####  NOTE : Hash Table in Python/Java/C++

![image](images/C.png)

### ***COLLISION HANDLING***

### ***Separate Chaining***

![img](images/sperate.png)

* In this case we see a collision at "march 17": the hash function maps it to an index that already holds a value.
* We therefore need special handling, **because two keys are trying to store their values at the same location.**
* We can use an approach called **separate chaining: instead of storing the value directly, each slot stores a linked list (or a list) of entries.**
* With this method, when "march 6" arrives it is appended to the list at that index; when a later key collides, it is appended to the tail of the same list.
* ***NOTE: This way the list can keep growing, and multiple keys can share the same hash value.***


![img](images/10.png)

* To return the value for the key "march 17":
    - first apply the hash function to get the index: 9
    - then go to index 9
    - run a linear search through the linked list stored there.
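
To see which keys end up sharing a bucket, the keys can be grouped by their hash value. This is a sketch that reuses the ASCII-sum hash and assumes a table size of 10; the standalone `get_hash` function and the `buckets` mapping are illustrative names, not part of the class used in this section.

```python
from collections import defaultdict

def get_hash(key, size=10):
    # ASCII-sum hash from this section; size 10 is an assumed table size
    return sum(ord(ch) for ch in key) % size

keys = ["march 6", "march 7", "march 8", "march 17"]
buckets = defaultdict(list)
for key in keys:
    buckets[get_hash(key)].append(key)   # colliding keys share one list

print(dict(buckets))
# {9: ['march 6', 'march 17'], 0: ['march 7'], 1: ['march 8']}
```

Bucket 9 holds two keys, so a lookup for "march 17" has to search that short list after computing the index.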
    

### ***Separate Chaining Implementation***

In [27]:
class HashTable:  
    def __init__(self):
        self.MAX = 10
        self.arr = [None for i in range(self.MAX)]
        
    def get_hash(self, key):
        hash = 0
        for char in key:
            hash += ord(char)
        return hash % self.MAX
    
    def __getitem__(self, key):
        h = self.get_hash(key)
        return self.arr[h]
    
    def __setitem__(self, key, val):
        h = self.get_hash(key)
        self.arr[h] = val
       

In [28]:
t = HashTable()

In [29]:
t.get_hash("march 6")

9

In [30]:
t.get_hash("march 17")    # collision

9

In [31]:
t["march 6"] = 120
t["march 8"] = 67
t["march 9"] = 4
t["march 17"] = 459

In [32]:
t["march 6"]  # expected 120, but "march 17" overwrote it with 459

459

#### **SOLUTION**

In [33]:
class HashTable:  
    def __init__(self):
        self.MAX = 10
        self.arr = [[] for i in range(self.MAX)]
        
    def get_hash(self, key):
        hash = 0
        for char in key:
            hash += ord(char)
        return hash % self.MAX
    
    def __getitem__(self, key):
        h = self.get_hash(key)
        for element in self.arr[h]:
            if element[0] == key:
                return element[1]
            
    def __setitem__(self, key, val):
        h = self.get_hash(key)
        found = False
        for index, element in enumerate(self.arr[h]):
            if len(element) == 2 and element[0] == key:
                self.arr[h][index] = (key, val)   # key exists: overwrite in place
                found = True
                break
        if not found:
            self.arr[h].append((key, val))        # new key: append to the chain

    def __delitem__(self, key):
        h = self.get_hash(key)
        for index, element in enumerate(self.arr[h]):
            if element[0] == key:
                print("del", index)
                del self.arr[h][index]
                return  # stop so we do not keep iterating over a list we just modified


In [34]:
t = HashTable()
t["march 6"] = 310
t["march 7"] = 420
t["march 8"] = 67
t["march 17"] = 63457

In [35]:
t["march 6"]

310

In [36]:
t["march 17"]

63457

In [37]:
t.arr

[[('march 7', 420)],
 [('march 8', 67)],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 [('march 6', 310), ('march 17', 63457)]]

In [38]:
t["march 6"] = 11

In [39]:
t.arr

[[('march 7', 420)],
 [('march 8', 67)],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 [('march 6', 11), ('march 17', 63457)]]

In [40]:
t = HashTable()
t["march 6"] = 310
t["march 6"] = 10000
t["march 7"] = 420
t["march 8"] = 67
t["march 17"] = 63457
t["march 17"] = 10
t["march 26"] = 1387218973
t["march 26"] = 435
t["march 26"] = 5
t["march 26"] = 0
t.arr

[[('march 7', 420)],
 [('march 8', 67)],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 [('march 6', 10000), ('march 17', 10), ('march 26', 0)]]

In [41]:
del t["march 26"]

del 2


In [42]:
t.arr

[[('march 7', 420)],
 [('march 8', 67)],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 [('march 6', 10000), ('march 17', 10)]]

### NOTE : `enumerate` yields (index, element) pairs

In [64]:
def fn():
    l = []
    x = [('march 6', 310), ('march 6', 63457)]
    for idx, element in enumerate(x):
        l.append((idx, element))
        print(l)

In [65]:
fn()

[(0, ('march 6', 310))]
[(0, ('march 6', 310)), (1, ('march 6', 63457))]


### ***Linear Probing***

* With this approach, when a collision occurs at an index, **we move forward to the next available location in the array**, wrapping around to the start when we reach the end.




![images](images/k.png)

### ***Linear Probing Implementation***

In [24]:
class HashTable:  
    def __init__(self):
        self.MAX = 10
        self.arr = [None for i in range(self.MAX)]
        
    def get_hash(self, key):
        hash = 0
        for char in key:
            hash += ord(char)
        return hash % self.MAX

    def get_probe_range(self, index):
        return [*range(index, len(self.arr))] + [*range(0, index)]
    
    def __getitem__(self, key):
        h = self.get_hash(key)
        probe_range = self.get_probe_range(h)
        for probe_index in probe_range:
            element = self.arr[probe_index]
            if element is None:
                return  # empty slot reached: the key is not in the table
            if element[0] == key:
                return element[1]
                
    def find_slot(self, key, h):
        probe_range = self.get_probe_range(h)
        for probe_index in probe_range:
            if self.arr[probe_index] is None:
                return probe_index
            if self.arr[probe_index][0] == key:
                return probe_index
        raise Exception("HASH TABLE FULL.")
        
    def __setitem__(self, key, val):
        h = self.get_hash(key)
        if self.arr[h] is None:
            self.arr[h] = (key,val)
        else:
            new_h = self.find_slot(key, h)
            self.arr[new_h] = (key,val)
        print (self.arr)

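    def __delitem__(self, key):
        h = self.get_hash(key)
        probe_range = self.get_probe_range(h)
        for probe_index in probe_range:
            if self.arr[probe_index] is None:
                return  # empty slot reached: the key is not in the table
            if self.arr[probe_index][0] == key:
                # Caveat: clearing the slot to None can cut a probe chain, so keys
                # stored past this slot after a collision may become unreachable.
                self.arr[probe_index] = None
                break
        print(self.arr)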

In [25]:
index = 2
[*range(index, 5)] + [*range(0, index)] 

[2, 3, 4, 0, 1]
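
The same wrap-around order can also be produced with modulo arithmetic instead of joining two ranges. This is an alternative sketch; `probe_sequence` is a hypothetical helper and not a method of the class above.

```python
def probe_sequence(start, size):
    """Yield every slot index once, beginning at start and wrapping around."""
    for step in range(size):
        yield (start + step) % size

print(list(probe_sequence(2, 5)))   # [2, 3, 4, 0, 1]
print(list(probe_sequence(9, 10)))  # [9, 0, 1, 2, 3, 4, 5, 6, 7, 8]
```

A generator avoids building the full list when the target slot is found after only a few steps.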

In [26]:
t = HashTable()
t["march 6"] = 20
t["march 17"] =  88

[None, None, None, None, None, None, None, None, None, ('march 6', 20)]
[('march 17', 88), None, None, None, None, None, None, None, None, ('march 6', 20)]


In [27]:
t["march 17"] = 29

[('march 17', 29), None, None, None, None, None, None, None, None, ('march 6', 20)]


In [28]:
t["nov 1"] = 1

[('march 17', 29), ('nov 1', 1), None, None, None, None, None, None, None, ('march 6', 20)]


In [29]:
t["march 33"] = 234

[('march 17', 29), ('nov 1', 1), None, None, None, None, None, ('march 33', 234), None, ('march 6', 20)]


In [30]:
t["dec 1"]

In [31]:
t["march 33"]

234

In [32]:
t["march 33"] = 999

[('march 17', 29), ('nov 1', 1), None, None, None, None, None, ('march 33', 999), None, ('march 6', 20)]


In [33]:
t["march 33"]

999

In [34]:
t["april 1"]=87

[('march 17', 29), ('nov 1', 1), None, None, None, None, None, ('march 33', 999), ('april 1', 87), ('march 6', 20)]


In [35]:
t["april 2"]=123

[('march 17', 29), ('nov 1', 1), ('april 2', 123), None, None, None, None, ('march 33', 999), ('april 1', 87), ('march 6', 20)]


In [36]:
t["april 3"]=234234

[('march 17', 29), ('nov 1', 1), ('april 2', 123), ('april 3', 234234), None, None, None, ('march 33', 999), ('april 1', 87), ('march 6', 20)]


In [37]:
t["april 4"]=91

[('march 17', 29), ('nov 1', 1), ('april 2', 123), ('april 3', 234234), ('april 4', 91), None, None, ('march 33', 999), ('april 1', 87), ('march 6', 20)]


In [38]:
t["May 22"]=4

[('march 17', 29), ('nov 1', 1), ('april 2', 123), ('april 3', 234234), ('april 4', 91), ('May 22', 4), None, ('march 33', 999), ('april 1', 87), ('march 6', 20)]


In [40]:
t["May 7"]=47

[('march 17', 29), ('nov 1', 1), ('april 2', 123), ('april 3', 234234), ('april 4', 91), ('May 22', 4), ('May 7', 47), ('march 33', 999), ('april 1', 87), ('march 6', 20)]


In [18]:
t["Jan 1"]=0

Exception: HASH TABLE FULL.

In [19]:
del t["april 2"]

[('march 17', 29), ('nov 1', 1), None, ('april 3', 234234), ('april 4', 91), ('May 22', 4), ('May 7', 47), ('march 33', 999), ('april 1', 87), ('march 6', 20)]


In [20]:
t["Jan 1"]=0

[('march 17', 29), ('nov 1', 1), ('Jan 1', 0), ('april 3', 234234), ('april 4', 91), ('May 22', 4), ('May 7', 47), ('march 33', 999), ('april 1', 87), ('march 6', 20)]


### ***Big-O Notation With Separate Chaining - Linear Probing***

* The Big-O analysis here is usually:
   - On average: **O(1)**
   - But it can degrade to **O(n)**
        - **For example**: with a bad hash function, all the keys generate the same index, so all the values get stuck in one bucket and form one long linked list.
        - The complexity then depends on the size of that linked list (n).
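
The worst case can be reproduced with a deliberately poor hash function. In this sketch, `bad_hash`, `good_hash`, and `longest_chain` are hypothetical helpers, and the 30 keys are made up for illustration.

```python
def bad_hash(key, size=10):
    return 0  # deliberately poor: every key lands in bucket 0

def good_hash(key, size=10):
    return sum(ord(ch) for ch in key) % size  # ASCII-sum hash from this section

def longest_chain(hash_fn, keys, size=10):
    """Build separate-chaining buckets and return the longest chain length."""
    buckets = [[] for _ in range(size)]
    for key in keys:
        buckets[hash_fn(key, size)].append(key)
    return max(len(bucket) for bucket in buckets)

keys = [f"march {day}" for day in range(1, 31)]  # 30 made-up keys
print(longest_chain(bad_hash, keys))   # 30
print(longest_chain(good_hash, keys))  # shorter, since keys spread across buckets
```

With the constant hash, a lookup may have to walk all 30 entries, which is the O(n) behavior described above.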