### Map
- a relationship between two pieces of information where each input (key) corresponds to exactly one output (value)

#### Key points:
- Unique keys: Each key can only be associated with one value.
- Not necessarily a two-way relationship: A map might not be reversible. For example, several people can share the same eye color, so a map from people to eye color cannot be turned around into a map from eye color to people (one eye color would need several values).
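
A Python `dict` is a map in exactly this sense. Here is a minimal sketch with hypothetical people and eye colors. The forward direction is a map. Turning it around requires a list of people for each eye color, so the reverse is not a map.

```python
# Hypothetical data: each person (key) maps to exactly one eye color (value).
eye_color = {
    "Ana": "brown",
    "Ben": "blue",
    "Cleo": "brown",
}

# Inverting: one eye color can belong to many people,
# so each eye color needs a list of people rather than a single value.
people_by_color = {}
for person, color in eye_color.items():
    people_by_color.setdefault(color, []).append(person)

print(people_by_color)  # {'brown': ['Ana', 'Cleo'], 'blue': ['Ben']}
```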

#### Examples of maps:
- People to their heights
- Baseball players to their batting averages
- Placemats to the people seated at them

#### Example of a non-map:
- States to jazz musicians born there (multiple musicians could be born in the same state)

### Hash Map
- a data structure used to store key-value pairs
- uses a hashing function to solve the problem of efficiently finding the value associated with a given key
- built on top of an array using a special indexing system
- key-value storage with fast assignment and lookup
- a table that represents a map from a set of keys to a set of values
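
Python's built-in `dict` is a hash map. It gives a quick feel for key-value assignment and lookup before you build one by hand. The names and heights below are hypothetical sample data.

```python
# Python's built-in dict is implemented as a hash table.
heights_cm = {}

# Assignment: the key is hashed to decide where the value is stored.
heights_cm["Ana"] = 165
heights_cm["Ben"] = 180

# Lookup: hashing the key again finds the value without scanning every entry.
print(heights_cm["Ben"])       # 180
print(heights_cm.get("Cleo"))  # None (key not present)

# Keys must be hashable; hash() exposes the integer hash code Python uses.
print(isinstance(hash("Ana"), int))  # True
```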

#### Key Points
- Hash Function: A function that maps input data to a numerical value.
- Hash Code: The numerical value generated by a hash function.
- Array: A data structure used to store elements at specific indices.


#### Hashing Function
- A hashing function takes a key as input and produces a number, known as a hash code.
- This hash code, once reduced to a valid index (see the modulus step below), determines where the corresponding value is stored in the array.


##### How Hash Functions Work:
- Hashing: The hash function processes the input key and generates a hash code.
- Modulus Operation: The hash code is divided by the array size, and the remainder is used as the index.
- Storage: The value associated with the key is stored at the calculated index in the array.
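
The three steps above fit in a few lines. This sketch assumes the byte-sum hash used by the `HashMap` class in these notes, with an array size of 20 chosen only for illustration. It stores the value alone and does not yet deal with collisions.

```python
ARRAY_SIZE = 20  # assumed array size for this sketch
array = [None] * ARRAY_SIZE


def hash_code(key):
    # Hashing: sum the UTF-8 byte values of the key (a simple, assumed hash function).
    return sum(key.encode())


def index_for(key):
    # Modulus: the remainder after dividing by the array size is always a valid index.
    return hash_code(key) % ARRAY_SIZE


# Storage: put the value at the computed index.
key, value = "Joan McNeil", "Libra"
array[index_for(key)] = value

# Retrieval recomputes the same index from the same key.
print(array[index_for(key)])  # Libra
```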

##### Why Hash Functions Are Not Reversible:
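- Data Loss: A hash function maps a huge (often unlimited) set of possible inputs onto a limited range of numbers. Different inputs can therefore produce the same output, and the original input cannot be recovered from the hash code.
- Security: Cryptographic hash functions are deliberately designed so that the input cannot be reconstructed from the hash. This property is essential in security applications.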
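
The data-loss point is easy to see with a simple byte-sum hash, the same idea the `HashMap` class in these notes uses. Anagrams contain the same bytes, so they produce the same hash code, and the hash code alone cannot tell you which key produced it. This illustrates information loss only; a byte-sum is not a cryptographic hash.

```python
def hash_code(key):
    # Sum of the key's UTF-8 byte values: many different strings share one sum.
    return sum(key.encode())


# Anagrams contain the same characters, so their sums are identical.
print(hash_code("listen"), hash_code("silent"))  # 655 655

# Given only 655, there is no way to tell which key produced it.
print(hash_code("listen") == hash_code("silent"))  # True
```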

###### Example:
1) Consider a hash map storing the astrological signs of friends:
    - Key: Friend's name (e.g., "Joan McNeil")
    - Value: Astrological sign (e.g., "Libra")
    - Hashing: Apply a hashing function to "Joan McNeil" to get a hash code, say 17.
    - Indexing: Use 17 as the index to store "Libra" in the array.

2) Designing Hash Functions:
    - Vowel Count: Add up the number of vowels in the key.
    - Letter Values: Assign values to letters (A=1, B=2, etc.) and sum them.
    - Custom Functions: Create specific functions for different data types.
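
The first two ideas can be written as small Python functions. This is a sketch: the function names are hypothetical, and ignoring non-letter characters in the letter-value version is an assumption.

```python
def vowel_count_hash(key):
    # Vowel count: add up the number of vowels in the key.
    return sum(1 for ch in key.lower() if ch in "aeiou")


def letter_value_hash(key):
    # Letter values: A=1, B=2, ..., Z=26; non-letters are ignored (an assumption).
    return sum(ord(ch) - ord("a") + 1 for ch in key.lower() if "a" <= ch <= "z")


print(vowel_count_hash("Libra"))  # 2
print(letter_value_hash("abc"))   # 6
```

A vowel count is easy to compute but sends most words to a handful of small numbers, so many keys collide. Summing letter values spreads keys over a wider range.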


e.g.,
Key: test

Split into characters: ['t', 'e', 's', 't']

Code points: [116, 101, 115, 116]

-> Add them up:
116 + 101 + 115 + 116 = 448 (hash value)
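
The same calculation in Python: `ord()` returns a character's Unicode code point. For ASCII text, that equals the byte value `key.encode()` produces in the `HashMap` class.

```python
key = "test"
characters = list(key)                        # ['t', 'e', 's', 't']
code_points = [ord(ch) for ch in characters]  # [116, 101, 115, 116]
hash_value = sum(code_points)
print(hash_value)  # 448
```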



#### Key Takeaways:
- Hash maps are efficient for storing and retrieving data based on keys.
- Hash functions are crucial for mapping keys to array indices.
- Effective hash functions minimize collisions (multiple keys mapping to the same index)


![Screenshot 2024-10-27 at 9.07.19 PM.png](<attachment:Screenshot 2024-10-27 at 9.07.19 PM.png>)




##### Collision:
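- Collision: When two different keys end up at the same array index, it's called a collision. This happens when they produce the same hash code, or when different hash codes leave the same remainder after the modulus step.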
- Collision Handling: Hash maps employ strategies like separate chaining or open addressing to handle collisions.
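
Collisions can happen even when the hash codes differ, because the modulus step folds many hash codes onto the same index. Here is a sketch with the byte-sum hash and an assumed array size of 10:

```python
ARRAY_SIZE = 10  # assumed small array for this sketch


def hash_code(key):
    return sum(key.encode())


for key in ("a", "k"):
    code = hash_code(key)
    print(key, code, code % ARRAY_SIZE)
# a 97 7
# k 107 7   -> different hash codes, same index: a collision
```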


##### Separate Chaining in Hash Maps

- a technique used to handle collisions in hash maps
- When two or more keys hash to the same index, instead of overwriting the existing value, a linked list is created at that index
- Each node in the linked list stores a key-value pair
- important to choose a good hash function to minimize collisions and maintain efficient performance


##### How it works:
- Hashing: The key is hashed to obtain a hash code.
- Indexing: The hash code is mapped to an index in the array.

##### Collision Handling:
- Empty Index: A new linked list is created at that index with the key-value pair.
- Non-Empty Index: The key-value pair is added to the existing linked list at that index.
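
Here is a minimal separate-chaining sketch, assuming the byte-sum hash and a hand-written singly linked list. `Node` and `ChainedHashMap` are hypothetical names. `assign` follows the empty-index and non-empty-index cases above, and `retrieve` compares stored keys while walking the chain.

```python
class Node:
    """A linked-list node that stores one key-value pair."""

    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.next = None


class ChainedHashMap:
    """Hash map that resolves collisions with separate chaining."""

    def __init__(self, array_size):
        self.array_size = array_size
        # Each slot holds the head node of a linked list, or None if empty.
        self.array = [None] * array_size

    def index_for(self, key):
        # Hashing + indexing: byte-sum hash compressed with the modulus operator.
        return sum(key.encode()) % self.array_size

    def assign(self, key, value):
        index = self.index_for(key)
        node = self.array[index]
        if node is None:
            # Empty index: start a new linked list containing this pair.
            self.array[index] = Node(key, value)
            return
        # Non-empty index: walk the chain, updating the key if it is already present.
        while True:
            if node.key == key:
                node.value = value
                return
            if node.next is None:
                # Reached the tail without finding the key: append a new node.
                node.next = Node(key, value)
                return
            node = node.next

    def retrieve(self, key):
        node = self.array[self.index_for(key)]
        while node is not None:
            if node.key == key:
                return node.value
            node = node.next
        return None


hash_map = ChainedHashMap(10)
hash_map.assign("a", "first")
hash_map.assign("k", "second")  # "a" and "k" both land at index 7
print(hash_map.retrieve("a"), hash_map.retrieve("k"))  # first second
```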


##### Advantages
- Efficient Handling of Collisions: By using linked lists, separate chaining can handle collisions gracefully, even when multiple keys hash to the same index.
- Good Average-Case Performance: When the hash function distributes keys evenly, chains stay short, so lookups and insertions take constant time on average.


##### Disadvantages
- Worst-Case Performance: In the worst case, where all keys hash to the same index, the hash map degenerates into a single linked list, and lookups and insertions take linear time, O(n).
- Space Overhead: Separate chaining requires additional space for linked list nodes.


![Screenshot 2024-10-27 at 9.12.54 PM.png](<attachment:Screenshot 2024-10-27 at 9.12.54 PM.png>)
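
To make the worst case concrete, this sketch counts chain lengths for 100 keys under two hash functions. Python lists stand in for the linked lists. `constant_hash` is a deliberately bad, hypothetical hash that sends every key to the same index.

```python
def build_buckets(keys, array_size, hash_function):
    # Separate chaining, with Python lists standing in for the linked lists.
    buckets = [[] for _ in range(array_size)]
    for key in keys:
        buckets[hash_function(key) % array_size].append(key)
    return buckets


def byte_sum_hash(key):
    # The byte-sum hash used by the HashMap class in these notes.
    return sum(key.encode())


def constant_hash(key):
    # Worst case: every key gets the same hash code.
    return 0


keys = [f"key{i}" for i in range(100)]

spread = build_buckets(keys, 16, byte_sum_hash)
degenerate = build_buckets(keys, 16, constant_hash)

print("longest chain, byte-sum hash:", max(len(b) for b in spread))
print("longest chain, constant hash:", max(len(b) for b in degenerate))  # 100
```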




#### Saving Keys for Hash Collision Resolution

In hash maps, collisions can occur when two different keys produce the same hash code. To address this, a common technique is separate chaining. This involves storing colliding key-value pairs in a linked list at the corresponding array index.

##### Why Save Keys?
- Disambiguation: By saving both the key and the value, we can distinguish between different key-value pairs that might have the same hash code.
- Accurate Retrieval: When retrieving a value, we can compare the provided key with the saved keys in the linked list to ensure we get the correct value.

##### Process of Reading or Writing
1. Hash Calculation: Calculate the hash code for the given key.
2. Index Determination: Use the hash code to determine the appropriate array index.
3. Linked List Iteration: Traverse the linked list at the specified index.
4. Key Comparison: Compare the saved key with the provided key.
5. Value Retrieval or Update: If the keys match, retrieve or update the value. Otherwise, continue iterating.


![Screenshot 2024-10-27 at 9.15.45 PM.png](<attachment:Screenshot 2024-10-27 at 9.15.45 PM.png>)
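
The five read/write steps map directly onto code. In this sketch, each bucket is a Python list of `[key, value]` pairs standing in for a linked list, and `write`/`read` are hypothetical helper names. The numbered comments match the steps above.

```python
ARRAY_SIZE = 10  # assumed array size
# Each bucket is a chain of [key, value] pairs; a Python list stands in for a linked list.
buckets = [[] for _ in range(ARRAY_SIZE)]


def write(key, value):
    index = sum(key.encode()) % ARRAY_SIZE   # 1-2. hash calculation, index determination
    for pair in buckets[index]:              # 3. iterate over the chain
        if pair[0] == key:                   # 4. compare saved key with provided key
            pair[1] = value                  # 5. keys match: update the value
            return
    buckets[index].append([key, value])      # end of chain reached: add a new pair


def read(key):
    index = sum(key.encode()) % ARRAY_SIZE   # 1-2. hash calculation, index determination
    for saved_key, value in buckets[index]:  # 3. iterate over the chain
        if saved_key == key:                 # 4. compare saved key with provided key
            return value                     # 5. keys match: retrieve the value
    return None                              # end of chain reached: key not stored


write("b", "first")
write("l", "second")   # "b" (98) and "l" (108) both land at index 8
write("b", "updated")  # same key again: the value is updated, not duplicated
print(read("b"), read("l"))  # updated second
```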




#### Open Addressing: Linear Probing
- another strategy to handle hash collisions in hash maps
- Instead of using linked lists, it involves searching for an empty slot in the array to store the colliding key-value pair.

##### Linear Probing:
- a common technique within open addressing. How it works:
    1. Hash Calculation: Calculate the hash code for the key.
    2. Index Determination: Use the hash code to determine the initial index in the array.
    3. Slot Check: If the slot is empty (or already holds the same key), store the key-value pair there.
    4. Collision Handling: If the slot holds a different key, probe the next slot in a linear sequence (index + 1, index + 2, and so on, wrapping around to 0 at the end of the array) until an empty slot is found.

##### Example:
- Consider a hash map to store famous horses and their owners:
    - Key: Horse's name (e.g., "Bucephalus", "Seabiscuit")
    - Value: Owner's name (e.g., "Alexander the Great", "Charles Howard")
    - Hashing "Bucephalus": Hash function returns index 3.
    - Storing "Bucephalus": Store "Bucephalus" and "Alexander the Great" at index 3.
    - Hashing "Seabiscuit": Hash function also returns index 3 (collision).
    - Linear Probing: Since index 3 is occupied, probe to index 4, which is empty.
    - Storing "Seabiscuit": Store "Seabiscuit" and "Charles Howard" at index 4.
- Retrieving a Value:
    1. Hash Calculation: Calculate the hash code for the key ("Seabiscuit").
    2. Index Determination: Determine the initial index (3).
    3. Slot Check:
        - Index 3: Key mismatch ("Bucephalus" ≠ "Seabiscuit").
        - Index 4: Key match ("Seabiscuit" = "Seabiscuit").
    4. Value Retrieval: Retrieve "Charles Howard" from index 4.

By using linear probing, we can efficiently handle collisions and store key-value pairs within the array itself.

![Screenshot 2024-10-27 at 9.20.19 PM.png](<attachment:Screenshot 2024-10-27 at 9.20.19 PM.png>)
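
The horse example can be replayed with a hypothetical `toy_hash` that always returns 3, which forces the collision described above. Probing wraps around to index 0 at the end of the array. The array size of 8 is an assumption.

```python
ARRAY_SIZE = 8  # assumed array size
array = [None] * ARRAY_SIZE


def toy_hash(key):
    # Hypothetical hash function that sends every key to index 3, forcing a collision.
    return 3


def assign(key, value):
    index = toy_hash(key) % ARRAY_SIZE
    for step in range(ARRAY_SIZE):
        probe = (index + step) % ARRAY_SIZE  # linear probing, wrapping around at the end
        if array[probe] is None or array[probe][0] == key:
            array[probe] = [key, value]
            return
    raise RuntimeError("array is full")


def retrieve(key):
    index = toy_hash(key) % ARRAY_SIZE
    for step in range(ARRAY_SIZE):
        slot = array[(index + step) % ARRAY_SIZE]
        if slot is None:
            return None     # empty slot: the key was never stored
        if slot[0] == key:
            return slot[1]  # key match: return the value
    return None


assign("Bucephalus", "Alexander the Great")
assign("Seabiscuit", "Charles Howard")
print(array[3], array[4])
print(retrieve("Seabiscuit"))  # Charles Howard
```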


Other technique
![Screenshot 2024-10-27 at 9.20.52 PM.png](<attachment:Screenshot 2024-10-27 at 9.20.52 PM.png>)

class HashMap:
    """
    A fixed-size array of key-value pairs that resolves collisions
    with open addressing (linear probing).
    """

    def __init__(self, array_size):
        self.array_size = array_size
        self.array = [None] * array_size

    def hash(self, key, count_collisions=0):
        """
        Convert a string into its corresponding bytes and return the sum of the bytes.
        count_collisions is added to the sum, so each retry moves one slot further along.
        """
        key_bytes = key.encode()
        hash_code = sum(key_bytes)
        return hash_code + count_collisions

    def compressor(self, hash_code):
        """
        Map a hash code to a valid array index (0 to array_size - 1)
        by taking the modulus (%) of the hash code with the array size.
        """
        return hash_code % self.array_size

    def assign(self, key, value):
        # Probe at most array_size slots so a full array cannot cause an infinite loop.
        for number_collisions in range(self.array_size):
            array_index = self.compressor(self.hash(key, number_collisions))
            current_array_value = self.array[array_index]

            # Empty slot: place the new key-value pair here.
            if current_array_value is None:
                self.array[array_index] = [key, value]
                return

            # Slot already holds this key: overwrite its value.
            if current_array_value[0] == key:
                self.array[array_index] = [key, value]
                return

            # Slot holds a different key (collision): try the next slot.

        raise RuntimeError("HashMap is full")

    def retrieve(self, key):
        # Follow the same probe sequence that assign() used.
        for retrieval_collisions in range(self.array_size):
            array_index = self.compressor(self.hash(key, retrieval_collisions))
            possible_return_value = self.array[array_index]

            # Empty slot: the key was never stored.
            if possible_return_value is None:
                return None

            # Matching key: return only the value, not the whole pair.
            if possible_return_value[0] == key:
                return possible_return_value[1]

            # Different key: keep probing.

        # Every slot was checked without finding the key.
        return None

hash_map = HashMap(15)
hash_map.assign('gabbro','igneous')
hash_map.assign('sandstone', 'sedimentary')
hash_map.assign('gneiss', 'metamorphic')

print(hash_map.retrieve('gabbro'))
print(hash_map.retrieve('sandstone'))
print(hash_map.retrieve('gneiss'))

igneous
sedimentary
metamorphic
