## Direct Address Table

A direct address table is a data structure that maps keys to records using an array: each record is stored at the index equal to its key. Because arrays support random access, searching, insertion and deletion are all fast.

The concept is easiest to see with an example. We create an array whose size equals the maximum key value plus one (assuming 0-based indexing) and then use the keys as indexes. For example, in the following diagram the key 21 is used directly as the index.

 

- **Advantages:**

    - Searching in O(1) time: Direct address tables use arrays, which are random access data structures, so the key (which is also the array index) locates the record in O(1) time.
    - Insertion in O(1) time: Writing a record at a known array index takes O(1) time, and the key gives us that index directly.
    - Deletion in O(1) time: Clearing the slot at a known array index also takes O(1) time.
 
- **Limitations:**

    - The maximum key value must be known in advance.
    - It is practical only if the maximum key value is small.
    - It wastes memory if the maximum key value is much larger than the total number of records.
    - Hashing can overcome these limitations of direct address tables.
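
The operations described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the class name `DirectAddressTable`, its method names and the use of `None` for an empty slot are assumptions made for this example.

```python
class DirectAddressTable:
    """Sketch of a direct address table for integer keys in [0, max_key]."""

    def __init__(self, max_key):
        # One slot per possible key; None marks an empty slot.
        self.slots = [None] * (max_key + 1)

    def insert(self, key, record):
        self.slots[key] = record      # O(1): the key is the index

    def search(self, key):
        return self.slots[key]        # O(1): None if the key is absent

    def delete(self, key):
        self.slots[key] = None        # O(1)


table = DirectAddressTable(max_key=25)
table.insert(21, "record for key 21")
print(table.search(21))  # record for key 21
table.delete(21)
print(table.search(21))  # None
```

Note that the table allocates 26 slots to hold a single record, which is the memory wastage listed under the limitations.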

**How to handle collisions?**
<br>

Collisions (two records with the same key) can be handled as in hashing, using either chaining or open addressing. The only difference from hashing is that we do not use a hash function to find the index; we use the key itself as the index.

# Hashing Function
Hashing is a technique for mapping keys and their values into a hash table by using a hash function.<br>
It is done for faster access to elements. The efficiency of the mapping depends on the quality of the hash function used.

```
Let a hash function H(x) map the value x to the index x % 10 in an array. For example, if the list of values is [11, 12, 13, 14, 15], they will be stored at positions {1, 2, 3, 4, 5} of the array (the hash table), respectively.

```
**The mod method:**
<br>

In this method for creating hash functions, we map a key into one of the slots of the table by taking the remainder of the key divided by table_size. That is, the hash function is

```
h(key) = key mod table_size 

i.e. key % table_size

For example:
37599 % 17 = 12
However, key = 573 hashes to the same slot: 573 % 17 = 12, so the two keys collide.

```
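
The arithmetic above can be checked directly. The helper name `mod_hash` is an assumption made for this sketch.

```python
def mod_hash(key, table_size):
    """Division (mod) method: the slot is the remainder of key / table_size."""
    return key % table_size


table_size = 17
print(mod_hash(37599, table_size))  # 12
print(mod_hash(573, table_size))    # 12, the same slot: a collision
```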

**The multiplication method:**
```
In the multiplication method, we multiply the key k by a constant real number c in the range 0 < c < 1 and extract the fractional part of k * c.
Then we multiply this value by the table size m and take the floor of the result. It can be represented as
h(k) = floor (m * (k * c mod 1))
                     or
h(k) = floor (m * frac (k * c))
```
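
A minimal sketch of the formula above. The function name `mult_hash` is an assumption, and so is the default constant `c = (sqrt(5) - 1) / 2`: it is a commonly suggested choice, but any c with 0 < c < 1 fits the definition.

```python
import math


def mult_hash(k, m, c=(math.sqrt(5) - 1) / 2):
    """Multiplication method: floor(m * frac(k * c)), with 0 < c < 1."""
    frac = (k * c) % 1.0          # fractional part of k * c
    return math.floor(m * frac)


# 3 * 0.5 = 1.5, fractional part 0.5, floor(10 * 0.5) = 5
print(mult_hash(3, 10, c=0.5))  # 5

for key in (37599, 573, 21):
    print(key, mult_hash(key, 17))  # each result is a slot index below 17
```

Unlike the mod method, the choice of the table size m is not critical here, because the spreading of keys comes from the fractional part of k * c.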

# Collision Handling: 
Since a hash function produces a small number for a big key, two keys may map to the same value. The situation where a newly inserted key maps to an already occupied slot in the hash table is called a collision, and it must be handled using a collision handling technique. The following are the ways to handle collisions:


### Chaining:
The idea is to make each cell of the hash table point to a linked list of records that have the same hash function value. Chaining is simple, but requires additional memory outside the table.


### Open Addressing: 
In open addressing, all elements are stored in the hash table itself. Each table entry contains either a record or NIL. When searching for an element, we examine the table slots one by one until the desired element is found or it is clear that the element is not in the table.
### Separate Chaining:
The idea behind separate chaining is to make each slot of the array the head of a linked list, called a chain. Separate chaining is one of the most commonly used techniques for handling collisions.
<br>
<br>
When multiple elements hash to the same slot index, they are inserted into that slot's singly linked list.

### Open Addressing:
```
Like separate chaining, open addressing is a method for handling collisions. In open addressing, all elements are stored in the hash table itself, so at any point the size of the table must be greater than or equal to the total number of keys (note that we can increase the table size by copying the old data if needed). This approach is also known as closed hashing. The entire procedure is based on probing; the types of probing are described below:
```
### 1. Linear Probing: 
```
In linear probing, the hash table is searched sequentially, starting from the original hash location. If that location is already occupied, we check the next location.
The function used for rehashing is as follows: rehash(key) = (n+1) % table_size, where n is the index of the slot that was just probed.
```

### 2. Quadratic Probing 
Linear probing tends to build long runs of occupied slots (clustering), which makes later probes slower. Quadratic probing reduces this clustering by letting the interval between probes grow quadratically: in the i-th iteration we look at the slot i² positions past the original hash location. We always start from the original hash location, and we check the other slots only if that location is occupied.

```
Let hash(x) be the slot index computed using the hash function, and S the table size.

If slot hash(x) % S is full, then we try (hash(x) + 1*1) % S
If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S
If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S
```

### Double Hashing 
```
Double hashing is a technique that reduces clustering further: the interval between probes is computed by a second hash function. We use another hash function hash2(x) and look at slot (hash(x) + i*hash2(x)) % S in the i-th iteration, where S is the table size.
```
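
The probe sequence described above can be sketched as a small open-addressing table. This is an illustration under stated assumptions, not a definitive implementation: the class name `DoubleHashTable`, the secondary hash `1 + x % (cap - 1)` and the use of `None` for empty slots are all choices made for this example. The capacity is assumed to be prime so that every step size visits all slots, and deletion is left out.

```python
class DoubleHashTable:
    """Open-addressing sketch using double hashing; capacity should be prime."""

    def __init__(self, capacity=7):
        self.cap = capacity
        self.table = [None] * capacity  # None marks an empty slot

    def _h1(self, x):
        return x % self.cap

    def _h2(self, x):
        # Assumed secondary hash; it never returns 0, so the probe always moves.
        return 1 + (x % (self.cap - 1))

    def _probe(self, x):
        # Yield slots (h1 + i * h2) % cap for i = 0, 1, ..., cap - 1.
        start, step = self._h1(x), self._h2(x)
        for i in range(self.cap):
            yield (start + i * step) % self.cap

    def insert(self, x):
        for slot in self._probe(x):
            if self.table[slot] is None or self.table[slot] == x:
                self.table[slot] = x
                return True
        return False  # every probed slot was occupied by another key

    def search(self, x):
        for slot in self._probe(x):
            if self.table[slot] is None:
                return False
            if self.table[slot] == x:
                return True
        return False


h = DoubleHashTable(7)
for key in (70, 71, 9, 56, 72):
    h.insert(key)
print(h.search(56))  # True
print(h.search(57))  # False
```

Here 70 and 56 both hash to slot 0, but 56 has step size 3 and lands in slot 3, so keys that collide on the first hash usually follow different probe sequences.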


In [75]:
# Chaining
class MyHash:
    def __init__(self, b):
        self.BUCKET = b
        self.table = [[] for x in range(b)]

    def insert(self, x):
        i = x % self.BUCKET
        self.table[i].append(x)

    def remove(self, x):
        i = x % self.BUCKET
        if x in self.table[i]:
            self.table[i].remove(x)

    def search(self, x):
        i = x % self.BUCKET
        return x in self.table[i]


# h = MyHash(7)
# h.insert(70)
# h.insert(71)
# h.insert(9)
# h.insert(56)
# h.insert(72)
# print(h.search(56))
# h.remove(56)
# print(h.search(56))
# h.remove(56)


In [77]:
# Open Addressing
class MyHash:
    def __init__(self, c):
        self.cap = c
        self.table = [-1] * c
        self.size = 0

    def hash(self, x):
        return x % self.cap

    def search(self, x):
        h = self.hash(x)
        t = self.table
        i = h
        while t[i] != -1:
            if t[i] == x:
                return True
            i = (i + 1) % self.cap
            if i == h:
                return False
        return False

    def insert(self, x):
        if self.size == self.cap:
            return False

        if self.search(x) == True:
            return False
        i = self.hash(x)
        t = self.table
        while t[i] not in (-1, -2):
            i = (i + 1) % self.cap

        t[i] = x
        self.size = self.size + 1
        return True

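    def remove(self, x):
        h = self.hash(x)
        t = self.table
        i = h
        while t[i] != -1:
            if t[i] == x:
                # -2 is a tombstone: it keeps later keys in the probe chain reachable.
                t[i] = -2
                self.size = self.size - 1  # the slot can be reused by insert
                return True
            i = (i + 1) % self.cap
            if i == h:
                return False
        return False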


# h = MyHash(7)
# h.insert(70)
# h.insert(71)
# h.insert(9)
# h.insert(56)
# h.insert(72)
# print(h.search(56))
# h.remove(56)
# print(h.search(56))
# h.remove(56)


<br><br><br><br>

# Chaining vs Open Addressing
 


### Chaining
- Chaining is simpler to implement.
- In chaining, the hash table never fills up; we can always add more elements to a chain.
- Chaining is less sensitive to the hash function and the load factor.
- Chaining is mostly used when it is unknown how many keys will be inserted or deleted, and how frequently.
- Cache performance of chaining is poor because keys are stored in linked lists.
- Space is wasted: some slots of the hash table may never be used.
- Chaining uses extra space for links.


### Open Addressing
- Open addressing requires more computation.

- In open addressing, the table may become full.

- Open addressing requires extra care to avoid clustering and to keep the load factor low.

- Open addressing is used when the number of keys and the frequency of operations are known.

- Open addressing provides better cache performance because everything is stored in the same table.

- In open addressing, a slot can be used even if no input hashes to it.

- Open addressing uses no links.

<br><br><br><br>

# Set
**A Set is an unordered collection data type that is iterable, mutable and has no duplicate elements.**

**Sets are written with curly braces, e.g. {1, 2, 3}.**

The major advantage of using a set, as opposed to a list, is that it has a highly optimized method for checking whether a specific element is contained in the set. This is based on a data structure known as a hash table. Since sets are unordered, we cannot access items using indexes as we do in lists.
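
A short illustration of these properties; the variable names and values are made up for the example.

```python
fruits = {"apple", "banana", "cherry", "apple"}  # the duplicate "apple" is dropped
print(len(fruits))           # 3
print("banana" in fruits)    # True, an average O(1) hash lookup
print("mango" in fruits)     # False

empty = set()                # note: {} creates an empty dict, not an empty set
print(type(empty).__name__)  # set
print(type({}).__name__)     # dict
```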
<br>
<br>

### Methods for Sets

#### Adding elements to Python Sets
Elements are inserted with the **set.add()** method, which stores the value in the underlying hash table. Like a membership check, insertion takes O(1) time on average, but it can become O(n) in the worst case.

#### Union operation on Python Sets
Two sets can be merged using the **union() method or the | operator**. The elements of both hash tables are traversed and combined, and duplicates are removed along the way. The time complexity is O(len(s1) + len(s2)), where s1 and s2 are the two sets being merged.

#### Intersection operation on Python Sets
The common elements of two sets are selected using the **intersection() method or the & operator**. This amounts to iterating over one set and keeping the values that also occur in the other. The time complexity is O(min(len(s1), len(s2))), where s1 and s2 are the two sets being intersected.

 

#### Finding Difference of Sets in Python
The difference of two sets, that is, the elements of the first set that are not in the second, is computed using the **difference() method or the - operator**. The time complexity of finding the difference s1 - s2 is O(len(s1)).

 

#### Clearing Python Sets
The **clear()** method empties the whole set in place.
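
The methods above in one short example; `s1` and `s2` follow the names used in the complexity notes, and the values are made up. The results are printed through `sorted()` because sets have no guaranteed order.

```python
s1 = {1, 2, 3, 4}
s2 = {3, 4, 5}

s1.add(6)                       # s1 now holds 1, 2, 3, 4, 6
print(sorted(s1 | s2))          # union: [1, 2, 3, 4, 5, 6]
print(sorted(s1 & s2))          # intersection: [3, 4]
print(sorted(s1 - s2))          # difference: [1, 2, 6]
print(s1.union(s2) == s1 | s2)  # True: method and operator agree

s1.clear()
print(s1)                       # set()
```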

<br><br><br><br>

# Python Dictionary

**A dictionary in Python is a collection of key-value pairs, used to store data like a map. Unlike other data types, which hold only a single value as an element, each dictionary element holds a key together with its value.**

### Example of Dictionary in Python 

**A dictionary holds key:value pairs. Storing a key with each value allows the value to be looked up quickly by its key.**

#### Creating a Dictionary
In Python, a dictionary can be created by placing a comma-separated sequence of key:value pairs within curly braces {}. Values in a dictionary can be of any data type and can be duplicated, whereas keys cannot be repeated and must be of an immutable (hashable) type.

<br>
<br>

**Note – Dictionary keys are case sensitive, the same name but different cases of Key will be treated distinctly.**
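
A brief illustration of these rules; the dictionary `student` and its contents are hypothetical.

```python
student = {"name": "Asha", "Name": "Ravi", "marks": [90, 85]}
print(len(student))        # 3: "name" and "Name" are different keys
print(student["name"])     # Asha
print(student["Name"])     # Ravi

student["name"] = "Meera"  # assigning to an existing key overwrites its value
print(student["name"])     # Meera

try:
    bad = {[1, 2]: "list key"}  # lists are mutable, hence not usable as keys
except TypeError as err:
    print("TypeError:", err)
```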

In [63]:
def findSymPairs(arr, row):
    hM = dict()
    for i in range(row):
        first = arr[i][0]
        sec = arr[i][1]
        if sec in hM.keys() and hM[sec] == first:
            print("(", sec, ",", first, ")")
        else:
            hM[first] = sec

if __name__ == '__main__':
    arr = [[0 for i in range(2)] for j in range(5)]
    arr[0][0], arr[0][1] = 11, 20
    arr[1][0], arr[1][1] = 30, 40
    arr[2][0], arr[2][1] = 5, 10
    arr[3][0], arr[3][1] = 40, 30
    arr[4][0], arr[4][1] = 10, 5
    findSymPairs(arr, 5)


( 30 , 40 )
( 5 , 10 )


## Questions

#### Hashing for pair - 1
You are given an array of distinct integers and a sum. Check if there's a pair with the given sum in the array.

In [24]:
def sumExists(arr, N, sum1):
    # Stores, for every element seen so far, the value that would complete the pair.
    complement_hashset = set()
    for i in range(N):
        if arr[i] in complement_hashset:
            return True
        complement_hashset.add(sum1 - arr[i])
    return False

### Linear Probing in Hashing
Linear probing is a collision-handling technique in hashing. Linear probing says that whenever a collision occurs, search for the immediate next position.

- Given an array of integers and a hash table size. 
- Fill the array elements into a hash table using Linear Probing to handle collisions. 
- Duplicate elements must be mapped to the same position in the hash table while colliding elements must be mapped to the [(value+1)%hashSize] position.

- Note :-  If there's no more space to insert a new element, just drop that element. 

In [65]:
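def linearProbing(hashSize, arr, sizeOfArray):
    hashArr = [-1] * hashSize
    for value in arr:
        start = value % hashSize
        curr = start
        # Probe until we reach an empty slot or the value itself (a duplicate).
        while hashArr[curr] != -1 and hashArr[curr] != value:
            curr = (curr + 1) % hashSize
            if curr == start:
                break  # wrapped around: the table is full, drop this element
        if hashArr[curr] == -1:
            hashArr[curr] = value
    return hashArr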
            

### Given an array which may contain duplicates, print all elements and their frequencies.


In [72]:
# Python3 program to count frequencies 
# of array items
def countFreq(arr, n):

    mp = dict()

    # Traverse through array elements 
    # and count frequencies
    for i in range(n):
        if arr[i] in mp.keys():
            mp[arr[i]] += 1
        else:
            mp[arr[i]] = 1
            
    # Traverse through map and print 
    # frequencies
    for x in mp:
        print(x, " ", mp[x])

# Driver code
arr = [10, 20, 20, 10, 10, 20, 5, 20 ]
n = len(arr)
countFreq(arr, n)


10   3
20   4
5   1


### Quadratic Probing

In [81]:
def QuadraticProbing(hashSize, arr, N):
    hashTable = [-1] * hashSize
    for i in range(N):
        index = arr[i] % hashSize
        # Try offsets 0, 1, 4, 9, ... The sequence repeats after hashSize probes,
        # so stop there and drop the element instead of looping forever.
        for j in range(hashSize):
            slot = (index + j * j) % hashSize
            if hashTable[slot] == -1:
                hashTable[slot] = arr[i]
                break
    return hashTable


In [82]:
QuadraticProbing(11,[21, 10, 32, 43],4)

[10, -1, -1, 32, -1, -1, -1, -1, 43, -1, 21]

### Count Non-Repeated Elements
Hashing is very useful to keep track of the frequency of the elements in a list.

You are given an array of integers. You need to print the count of non-repeated elements in the array.

In [125]:
arr = [1, 1, 2, 2, 8, 3, 3, 4, 5, 6, 7]
hashset = set()  # every distinct value seen so far
ref = set()      # values seen more than once
for i in arr:
    if i in hashset:
        ref.add(i)
    else:
        hashset.add(i)
# Non-repeated elements are those seen exactly once.
print(len(hashset - ref))

### Winner of an election

Given an array of n names arr of candidates in an election, where each name is a string of lowercase characters. 
- A candidate name in the array represents a vote cast for that candidate.
- Print the name of the candidate that received the maximum number of votes.
- If there is a tie between two candidates, print the lexicographically smaller name.

In [126]:
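def winner(arr, n):
    # Count the votes received by each candidate.
    votes = dict()
    for name in arr:
        if name in votes:
            votes[name] += 1
        else:
            votes[name] = 1

    # Pick the highest count; on a tie, prefer the lexicographically smaller name.
    best_count = 0
    best_name = ''
    for name, count in votes.items():
        if count > best_count or (count == best_count and name < best_name):
            best_count = count
            best_name = name

    return best_name, best_count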



### First Repeating Element
Given an array arr[] of size n, find the first repeating element. The element should occur more than once and the index of its first occurrence should be the smallest.

Note:- The position you return should be according to 1-based indexing.

In [151]:
def firstRepeated(arr, n):
    # Map each value to the list of 1-based positions where it occurs.
    myHash = dict()
    for i in range(n):
        if arr[i] in myHash:
            myHash[arr[i]].append(i + 1)
        else:
            myHash[arr[i]] = [i + 1]

    # Among values that occur more than once, keep the smallest first position.
    idx = -1
    for key, positions in myHash.items():
        if len(positions) > 1:
            if idx == -1 or positions[0] < idx:
                idx = positions[0]
    return idx

firstRepeated([1, 2, 2, 1, 32, 2, 3, 432, 323], 9)


1

In [139]:
a = {1:'Hardik',2:'Harshit'}