# <center><b> Data Structures Part II </b></center>

## <center><b> Dictionary </b></center>

A data structure that represents the mathematical concept of a partial function $R: D \to C$, i.e. a set of key-value associations.

- D is the domain (keys)
- C is the codomain (values)

<u> Operators </u>
- **Lookup**: return the value associated with a given key if present, `None` otherwise
- **Insert**: add a new key-value association, replacing any association already present for that key
- **Remove**: delete an existing key-value association
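
As a rough illustration, the three operators map onto Python's built-in `dict` as shown in this sketch. The names and ages are made up for the example.

```python
# A dictionary mapping keys (names) to values (ages); the data is illustrative
ages = {"Luca": 30, "Elena": 25}

# Lookup: returns the value if the key is present, None otherwise
found = ages.get("Luca")
missing = ages.get("Davide")

# Insert: adds a new association, replacing any existing one for that key
ages["Davide"] = 41
ages["Luca"] = 31

# Remove: deletes an existing association (None is returned if the key is absent)
removed = ages.pop("Elena", None)

print(found, missing, removed)
print(ages)
```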

<center><img src="./img/56.png" width="500"/></center>


Complexity of the operations for different data structures:
<center><img src="./img/57.png" width="500"/></center>

<hr>


## <center><b> Hash Tables </b></center>

- choose a hash function $h$ that maps each key $k$ to an integer $h(k)$
- the key-value pair $(k, v)$ is stored in a list at position $h(k)$
- the list is called a **hash table**


<u> Definitions </u>

- all possible keys belong to the universe $U$, of size $u$
- the table is stored in a list $T[0 \dots m-1]$ of size $m$
- a hash function is defined as $h: U \to \{0, 1, \dots, m-1\}$

<center><img src="./img/58.png" width="500"/></center>

<u> Rules </u>

- if two objects are **equal**, then their hashes should be equal
- if two objects have the **same hash**, then they are *likely* to be equal (but not necessarily: distinct objects may share a hash)

i.e. your hash function should avoid returning values that generate collisions

- in order for an object to be hashable, it must be **immutable**

i.e. the hash value of an object should not change over time
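
These rules can be observed with Python's built-in `hash()`. The sketch below only uses built-in types; the specific values are arbitrary.

```python
# Equal objects must have equal hashes: 1 == 1.0, so their hashes agree
assert 1 == 1.0
same_hash = hash(1) == hash(1.0)

# Immutable built-ins such as tuples and strings are hashable
tuple_hash = hash((1, 2, 3))

# Mutable built-ins such as lists are not hashable: hash() raises TypeError
try:
    hash([1, 2, 3])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

print(same_hash, list_is_hashable)
```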


<u> Collisions </u>

- when two or more keys in the dictionary have the same hash value, we say that a **collision** has happened
- ideally, we would like a hash function **without** collisions
- collisions cannot always be avoided, but there are several ways to deal with them

<center><img src="./img/59.png" width="500"/></center>
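
A minimal sketch of a collision, assuming integer keys and a hypothetical `toy_hash` function that takes the remainder of the division by the table size `m`:

```python
def toy_hash(key, m=10):
    # Hypothetical hash function for illustration: remainder of the division by m
    return key % m

keys = [12, 22, 35, 47]

# Group the keys by the slot they are mapped to
slots = {}
for k in keys:
    slots.setdefault(toy_hash(k), []).append(k)

# Any slot holding more than one key is a collision
collisions = {slot: ks for slot, ks in slots.items() if len(ks) > 1}
print(collisions)
```

Here 12 and 22 both end up in slot 2, while 35 and 47 get a slot of their own.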




#### <b> Direct Access Table </b>
Example of a hash function that does not generate collisions:

**idea**: our universe $U$ is a (small) subset of the non-negative integers. We use the identity function as hash function: $h(k) = k$


**example**: days of the year


**problems**: 
- if $U$ is **very large**, it might be infeasible because the list would need one slot for every possible key
- if the number of keys actually stored is much smaller than the size of $U$, we **waste a lot of memory**
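
The "days of the year" example could look like the sketch below. The stored events are made up, and index 0 is left unused so that day `d` sits at position `d`.

```python
DAYS_IN_YEAR = 366  # size of the universe U: days 1..366 (leap years included)

# One slot per possible key; None marks an empty slot
table = [None] * (DAYS_IN_YEAR + 1)  # index 0 is unused

def h(day):
    # Identity hash function: the key is its own index
    return day

table[h(1)] = "New Year's Day"
table[h(359)] = "Christmas (non-leap year)"

# Only 2 of the 367 slots are in use: the rest is wasted memory
used = sum(1 for slot in table if slot is not None)
print(table[h(1)], used, len(table))
```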


#### <b> Perfect Hash Function </b>

<center><img src="./img/60.png" width="500"/></center>

<br>

#### <b> How can we minimize the number of collisions? </b>
We want a hash function that distributes keys over the hash indexes $[0, \dots, m-1]$ in a uniform way.
<center><img src="./img/61.png" width="500"/></center>

To obtain a hash function with simple uniformity, the probability distribution $P$ of the keys should be known.
Most of the time, we don't have this information.

Example:

If $U$ is the set of real numbers in $[0, 1)$ and each key has the same probability of being selected, then $H(k) = \lfloor km \rfloor$ has simple uniformity.
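
A quick empirical check of this example, as a sketch: keys are drawn with `random.random()`, which returns floats in $[0, 1)$, and the values of `m`, `n` and the seed are arbitrary choices.

```python
import random
from collections import Counter

def h(k, m):
    # k is assumed to lie in [0, 1); floor(k * m) is then an index in 0..m-1
    return int(k * m)

random.seed(0)   # fixed seed only to make the run repeatable
m = 10
n = 10_000
counts = Counter(h(random.random(), m) for _ in range(n))

# Each index should receive roughly n / m keys
for index in range(m):
    print(index, counts[index])
```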



#### <b> Implementation </b>

Each key can be translated to a non-negative numerical value by reading its internal representation as a number.

**Example**: string transformation

**ord**(c): ordinal value (numeric code) of character $c$ in ASCII

**bin**(k): binary representation of key $k$, obtained by concatenating the binary values of its characters

**int**(b): numerical value associated with the binary number $b$

In [5]:
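def H(in_string):
    # bin(ord(x)) gives e.g. '0b1001100'; dropping the 'b' leaves '01001100'
    # The per-character binary strings are then concatenated
    d = "".join([str(bin(ord(x))).replace("b", "") for x in in_string])
    # Read the concatenated binary string as a single integer
    int_d = int(d, 2)
    return int_d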

people = ["Luca", "Davide", "Cristina", "Elena", "Alessandro", "Alan Turing"]

print(f"{'Name':<15}{'Hash Value'}")
print('-' * 25)

for p in people:
    print(f"{p:<15}{H(p)}")

Name           Hash Value
-------------------------
Luca           1282761569
Davide         75185389134949
Cristina       4860062892481408609
Elena          298171330145
Alessandro     308953380059970024010351
Alan Turing    39545995566905718680940135
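
The ord, bin, int pipeline can be cross-checked against Python's built-in `int.from_bytes`. The sketch below assumes ASCII characters with code 64 or above (which covers letters): for those, `bin` yields 7 digits and the `0` left over from the `0b` prefix pads each character to 8 bits, so the two computations agree. For characters with smaller codes (such as the space in "Alan Turing") the widths differ and the two results do not match. `H_bytes` is a hypothetical helper introduced only for this comparison.

```python
def H(in_string):
    # Same computation as the H defined above
    d = "".join(bin(ord(x)).replace("b", "") for x in in_string)
    return int(d, 2)

def H_bytes(in_string):
    # Read the ASCII bytes of the string as one big-endian integer
    return int.from_bytes(in_string.encode("ascii"), "big")

for name in ["Luca", "Davide", "Elena"]:
    print(name, H(name), H_bytes(name), H(name) == H_bytes(name))
```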


But how do we **convert** these numbers into values in $[0, \dots, m-1]$, where $m$ is the size of the hash table?
We use the modulo operator (the division method).

**DIVISION METHOD**:

- let $m$ be an odd number, ideally a prime
- $H(k) = \mathrm{int}(k) \bmod m$

In [8]:
def H2(in_string):
    # Convert each character in the input string to its binary representation
    # ord(x) gets the ASCII value of x
    # bin(...) converts this ASCII value to binary
    # str(...) converts the binary value to a string
    # The replace("b", "") removes the 'b' from the binary representation
    # "".join(...) concatenates all the binary strings into one string
    d = "".join([str(bin(ord(x))).replace("b", "") for x in in_string])
    
    # Convert the binary string to an integer
    int_d = int(d, 2)
    
    # Return the integer
    return int_d

def my_hash_func(key_str, m=383):
    # Compute the hash code of the key string using the H2 function
    h = H2(key_str)
    
    # Compute the hash key by taking the modulus of the hash code with m
    # This ensures that the hash key is within the range [0, m-1]
    hash_key = h % m
    
    # Return the hash key
    return hash_key

print(my_hash_func("Hello"))

34


#### <b> HANDLING COLLISIONS </b>

<center><img src="./img/63.png" width="500"/></center>


##### **Complexity of Separate Chaining**

<u> Worst case analysis </u>

- all the keys end up in a single list
- insert(): $\Theta(1)$
- lookup(), remove(): $\Theta(n)$


<u> Average case analysis </u>
- Let's assume the hash function has simple uniformity
- Computing the hash function costs $\Theta(1)$, to be added to all searches

<u> How long are the lists? </u>
- The expected length of a list is equal to the load factor $\alpha = n/m$, where $n$ is the number of stored keys and $m$ the number of slots

<center><img src="./img/64.png" width="400"/></center>
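
To see the load factor in practice, here is a minimal sketch. It assumes integer keys, the division method with a hypothetical table size `m = 101`, and chains stored as plain Python lists; the seed and key range are arbitrary.

```python
import random

random.seed(1)          # fixed seed only to make the run repeatable
m = 101                 # number of slots (a prime, as in the division method)
n = 1000                # number of keys to insert

# Separate chaining: one list per slot
table = [[] for _ in range(m)]
for key in random.sample(range(1_000_000), n):
    table[key % m].append(key)

lengths = [len(chain) for chain in table]
alpha = n / m
average_length = sum(lengths) / m

print(f"alpha = {alpha:.2f}, average chain length = {average_length:.2f}")
print(f"longest chain = {max(lengths)}")
```

The average chain length always equals $\alpha$, since the $n$ keys are spread over $m$ chains. The longest chain shows how far a single list can deviate from that average.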

In [9]:
# TODO

class HashTable:
    def __init__(self, size):
        self.size = size
        self.table = [[] for _ in range(self.size)]

    def _hash(self, key):
        return hash(key) % self.size

    def set(self, key, value):
        bucket = self.table[self._hash(key)]

        for i, (k, _) in enumerate(bucket):
            if key == k:
                # Key already present: overwrite its value
                bucket[i] = (key, value)
                return

        # Key not found: append a new pair to the chain
        bucket.append((key, value))

    def get(self, key):
        hash_key = self._hash(key)
        bucket = self.table[hash_key]
        
        for i, kv in enumerate(bucket):
            k, v = kv
            if key == k:
                return v
        return None

    def remove(self, key):
        hash_key = self._hash(key)
        bucket = self.table[hash_key]
        
        for i, kv in enumerate(bucket):
            k, v = kv
            if key == k:
                del bucket[i]
                return True
        return False

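In Python, `set` and `dict` are implemented through hash tables. A set is a degenerate form of dictionary, with only keys and no values.


The hash function does not preserve the order between keys, so a set is an unordered data structure: this is why you get elements in an arbitrary order when printing it. A `dict` is hashed in the same way but, since Python 3.7, it also remembers insertion order, so its items are printed in the order in which they were added.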
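
As a small sketch of how the two built-in types relate (the names and ages are made up):

```python
# A set stores only keys; a dict stores a value next to each key
names = {"Luca", "Elena", "Davide"}
ages = {"Luca": 30, "Elena": 25, "Davide": 41}

# Membership tests use the hash of the key, so they take O(1) time on average
print("Elena" in names)
print("Elena" in ages)

# The keys of a dict behave like a set
print(set(ages) == names)

# Unhashable (mutable) objects cannot be used as set elements or dict keys
try:
    names.add(["Cristina"])
except TypeError as err:
    print("TypeError:", err)
```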

<b> Python Set </b>
<center><img src="./img/65.png" width="400"/></center>

<b> Python dict </b>
<center><img src="./img/66.png" width="400"/></center>

<hr>

## <center><b> Stack </b></center>

A stack is a **linear, dynamic data structure** in which the remove operation removes (and returns) the element that has remained in the data structure for the **least time**.

This policy is known as LIFO (Last In, First Out).

<center><img src="./img/67.png" width="300"/>
        <img src="./img/69.png" width="300"/></center>


<u>Operations</u>:
- isEmpty(): returns True if the stack is empty
- size(): returns the number of elements in the stack
- push(object v): inserts v *on top* of the stack
- pop(): removes the top element of the stack and returns it to the caller
- peek(): reads the top element of the stack without modifying the stack


<u>Applications:</u>

- to balance parentheses in **compilers** and code editors such as VS Code
- to keep track of function call activation records in **interpreters**

*Possible implementations:*
- through **bidirectional** (doubly linked) lists, keeping a reference to the top element
- through **vectors** (limited size, small overhead)
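
A minimal sketch of the list-based option. Note that it uses a singly linked list rather than a bidirectional one, since a stack only ever touches the top element; `Node` and `LinkedStack` are hypothetical names chosen for this example.

```python
class Node:
    """One element of the list: a value plus a reference to the node below it."""
    def __init__(self, value, below=None):
        self.value = value
        self.below = below

class LinkedStack:
    """Linked-list stack; only the top node is referenced directly."""
    def __init__(self):
        self.top = None
        self.count = 0

    def push(self, value):
        # The new node becomes the top and points to the previous top
        self.top = Node(value, self.top)
        self.count += 1

    def pop(self):
        # Return None on an empty stack
        if self.top is None:
            return None
        value = self.top.value
        self.top = self.top.below
        self.count -= 1
        return value

s = LinkedStack()
for fruit in ["Apple", "Banana", "Cherry"]:
    s.push(fruit)
print(s.pop(), s.pop(), s.count)
```

Both `push` and `pop` only touch the top node, so they run in constant time and the stack can grow without a fixed size limit.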


In [14]:
def my_funct(x):
    if x <= 2:
        return x
    else:
        print(f"{x} + my_funct({x//4})")
        return x + my_funct(x//4)
    
print(my_funct(106))

106 + my_funct(26)
26 + my_funct(6)
6 + my_funct(1)
139
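
To make the role of the call stack visible, the same computation can be written with an explicit stack. This is only a sketch: `my_funct_iterative` is a hypothetical name, and a plain Python list stands in for the interpreter's stack of activation records.

```python
def my_funct_iterative(x):
    # Each pending "x + ..." plays the role of an activation record
    pending = []
    while x > 2:
        pending.append(x)   # push: this call waits for the result of the next one
        x = x // 4
    result = x              # base case reached
    while pending:
        result += pending.pop()   # pop: resume the most recent call first
    return result

print(my_funct_iterative(106))
```

For the input 106 the stack grows to `[106, 26, 6]` before the base case returns 1, and unwinding it gives 1 + 6 + 26 + 106 = 139, the same result as the recursive version.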


In [16]:
# to fix
class Stack:
    def __init__(self):
        # Initialize an empty list to store the stack elements
        self.stack = []

    def push(self, item):
        # Add an item to the top of the stack
        self.stack.append(item)

    def pop(self):
        # Remove and return the top item from the stack
        # If the stack is empty, return None
        if self.isEmpty():
            return None
        return self.stack.pop()

    def peek(self):
        # Return the top item from the stack without removing it
        # If the stack is empty, return None
        if self.isEmpty():
            return None
        return self.stack[-1]

    def __len__(self):
        # Return the number of items in the stack
        return len(self.stack)

    def isEmpty(self):
        # Return True if the stack is empty, False otherwise
        return len(self.stack) == 0

    def __str__(self):
        # Return a string representation of the stack
        return str(self.stack)
    
my_stack = Stack()

my_stack.push("Apple")
my_stack.push("Banana")
my_stack.push("Cherry")

print(len(my_stack))  # Outputs: 3
print(my_stack.isEmpty())  # Outputs: False

print(my_stack.peek())  # Outputs: Cherry
print(my_stack.pop())  # Outputs: Cherry

print(my_stack)  # Outputs: ['Apple', 'Banana']

3
False
Cherry
Cherry
['Apple', 'Banana']


*EXAMPLE*

**PAR CHECKER:**

Check whether the parentheses are balanced:
- implement this function with a stack
- the only allowed input characters are parentheses

In [18]:
def par_checker(symbol_string):
    # Create an empty stack
    s = Stack()
    
    # Initialize a variable to keep track of whether the parentheses are balanced
    balanced = True
    
    # Initialize an index to iterate over the string
    index = 0

    # Iterate over the string while the index is less than the length of the string and the parentheses are still balanced
    while index < len(symbol_string) and balanced:
        # Get the symbol at the current index
        symbol = symbol_string[index]
        
        # If the symbol is an opening parenthesis, push it onto the stack
        if symbol == "(":
            s.push(symbol)
        else:
            # If the symbol is a closing parenthesis, check if the stack is empty
            if s.isEmpty():
                # If the stack is empty, it means there's a closing parenthesis without a matching opening parenthesis, so the parentheses are not balanced
                balanced = False
            else:
                # If the stack is not empty, pop the opening parenthesis from the stack
                s.pop()

        # Move to the next symbol
        index = index + 1

    # After processing the entire string, the parentheses are balanced only if
    # no unmatched closing parenthesis was found and the stack is empty
    return balanced and s.isEmpty()

# Example usage:

print(par_checker('((()))'))  # Outputs: True
print(par_checker('(()'))     # Outputs: False

True
False


<hr>

## <center><b> Queue </b></center>

A queue is a **linear, dynamic data structure** in which the remove operation removes (and returns) the element that has remained in the data structure for **the longest time** (FIFO: First In, First Out).

<center><img src="./img/70.png" width="300"/></center>

<u>Operations</u>:
- isEmpty(): returns True if the queue is empty
- size(): size of the queue
- enqueue(object v): inserts v at the end of the queue
- dequeue(): removes the element at the beginning of the queue and returns it to the caller
- top(): reads the first element of the queue without modifying the queue


<u>Applications:</u>

- To queue requests performed on a limited resource (e.g. printers, servers, etc.)
- To visit graphs
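
As an illustration of the graph application, here is a sketch of a breadth-first visit. It assumes the graph is given as a dict mapping each node to a list of neighbours; the small graph below is made up.

```python
from collections import deque

def bfs(graph, start):
    # graph is assumed to be a dict mapping each node to a list of neighbours
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()           # dequeue the oldest discovered node
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)  # enqueue newly discovered nodes
    return order

graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
}
print(bfs(graph, "A"))
```

Because nodes are dequeued in the order they were discovered, the visit proceeds level by level: first A, then its neighbours B and C, then D.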


*Possible implementations:*
- Through lists (add to the tail, remove from the head)
- Through circular arrays (limited size, small overhead)
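
A minimal sketch of the circular-array option. `CircularQueue` is a hypothetical name, and returning `False` when the queue is full is just one possible design choice.

```python
class CircularQueue:
    """Fixed-capacity queue stored in a circular array."""
    def __init__(self, capacity):
        self.items = [None] * capacity
        self.head = 0      # index of the oldest element
        self.count = 0     # number of stored elements

    def enqueue(self, item):
        if self.count == len(self.items):
            return False   # queue full: the size is limited
        # The tail wraps around to the start of the array when needed
        tail = (self.head + self.count) % len(self.items)
        self.items[tail] = item
        self.count += 1
        return True

    def dequeue(self):
        if self.count == 0:
            return None
        item = self.items[self.head]
        self.items[self.head] = None
        self.head = (self.head + 1) % len(self.items)
        self.count -= 1
        return item

q = CircularQueue(3)
for job in ["a", "b", "c"]:
    q.enqueue(job)
print(q.enqueue("d"))   # the queue is full
print(q.dequeue())
print(q.enqueue("d"))   # the freed slot is reused
print(q.items)
```

After the dequeue, the new element is written into the slot at index 0, which is how the array "wraps around" without ever shifting elements.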

In [19]:
from collections import deque

class Queue:
    def __init__(self):
        # Initialize an empty deque to store the queue elements
        self.queue = deque()

    def isEmpty(self):
        # Return True if the queue is empty, False otherwise
        return len(self.queue) == 0

    def size(self):
        # Return the number of items in the queue
        return len(self.queue)

    def enqueue(self, item):
        # Add an item to the end of the queue
        self.queue.append(item)

    def dequeue(self):
        # Remove and return the first item from the queue
        # If the queue is empty, return None
        if self.isEmpty():
            return None
        return self.queue.popleft()

    def top(self):
        # Return the first item from the queue without removing it
        # If the queue is empty, return None
        if self.isEmpty():
            return None
        return self.queue[0]