# 5 🔑 Hash Tables (The Speed King)

The **Hash Table** (or Hash Map) is the engine behind Python's `dict` and `set`. On average, it offers **$O(1)$** (Constant Time) for Search, Insert, and Delete; the worst case, when many keys collide, is $O(N)$.

**Key Topics Covered:**
* **The Magic:** How `hash()` turns a String into an Array Index.
* **The Problem:** Collisions and how to fix them (Chaining).
* **Implementation:** Building a dictionary from scratch.
* **Metrics:** Load Factor and Resizing.
* **Use Case:** Solving the famous "Two Sum" problem.

## 5.1 🎩 The Magic Trick: Hashing

How does a dictionary find `"Alice"` instantly among 1 million keys? It doesn't search. It does math. 

1.  **Input:** Key (`"Alice"`)
2.  **Hash Function:** Converts text to a unique-ish integer (`8402394`).
3.  **Compression:** Fits that huge integer into our small array size using Modulo (`8402394 % 10 = 4`).
4.  **Storage:** Put data in Array Index `4`.

To find it, we just re-run the math. The result `4` tells us exactly where to look.

In [None]:
def simple_hash(key: str, table_size: int) -> int:
    """A toy hash function."""
    hash_val = 0
    for char in key:
        hash_val += ord(char)  # Sum the character codes (ASCII values for plain English text)
    return hash_val % table_size

size = 10
print(f"'Alice' maps to index: {simple_hash('Alice', size)}")
print(f"'Bob'   maps to index: {simple_hash('Bob', size)}")

## 5.2 💥 The Problem: Collisions

What if `"Alice"` and `"Bob"` both calculate to Index 4? This is a **Collision**.
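
A collision is easy to provoke with a toy hash that sums character codes, because the sum ignores character order. The sketch below assumes a summing hash like the `simple_hash` toy from section 5.1 (redefined here under the hypothetical name `toy_hash` so the block runs on its own); the example words are arbitrary.

```python
def toy_hash(key: str, table_size: int) -> int:
    """Toy hash: sum of character codes, compressed with modulo."""
    return sum(ord(char) for char in key) % table_size

size = 10
# Anagrams contain the same characters, so their sums (and indices) match.
index_listen = toy_hash("listen", size)
index_silent = toy_hash("silent", size)

print(f"'listen' -> {index_listen}, 'silent' -> {index_silent}")
print(f"Collision: {index_listen == index_silent}")
```

Two different keys now compete for the same slot, which is exactly the situation a collision strategy has to handle.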

**The Solution: Separate Chaining**
Instead of storing the value directly in the slot, we store a **List** (or Linked List) at that slot. If a collision happens, we just append the new item to the list. 

-   **Best Case:** 1 item per slot $\rightarrow$ $O(1)$.
-   **Worst Case:** All items in 1 slot $\rightarrow$ $O(N)$ (the table degenerates into a single Linked List that every lookup must scan).
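
The worst case can be forced on purpose. The sketch below assumes a 5-slot table and small integer keys; in CPython a small non-negative integer hashes to itself, so every multiple of 5 compresses to slot 0. The `buckets` layout is an illustration, not a full hash map.

```python
table_size = 5
buckets = [[] for _ in range(table_size)]

# Multiples of 5 all compress to index 0, so one chain holds every item.
for key in (0, 5, 10, 15):
    buckets[hash(key) % table_size].append((key, f"value-{key}"))

chain_lengths = [len(bucket) for bucket in buckets]
print(chain_lengths)  # [4, 0, 0, 0, 0]
```

A lookup for key `15` now has to walk past three other entries first; with $N$ keys in one chain, that walk is $O(N)$.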

In [None]:
from typing import Any, List, Tuple

class HashMap:
    """A Hash Map using Separate Chaining."""
    def __init__(self, size=10):
        self.size = size
        # Array of buckets. Each bucket is a list of (key, value) tuples.
        self.map: List[List[Tuple[Any, Any]]] = [[] for _ in range(self.size)]

    def _get_index(self, key):
        return hash(key) % self.size

    def put(self, key, value):
        index = self._get_index(key)
        bucket = self.map[index]

        # Update existing key
        for i, (k, v) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)
                return
        
        # Or append new key (Collision handled here!)
        bucket.append((key, value))

    def get(self, key):
        index = self._get_index(key)
        bucket = self.map[index]

        for k, v in bucket:
            if k == key:
                return v
        
        return None # Key not found

    def delete(self, key):
        index = self._get_index(key)
        bucket = self.map[index]

        for i, (k, v) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return True
        return False

# --- Test ---
hm = HashMap(5) # Small size to force collisions
hm.put("Apple", 100)
hm.put("Banana", 200)
hm.put("Cherry", 300) 

print(f"Apple: {hm.get('Apple')}")
print(f"Internal Map: {hm.map}") # You might see multiple items in one bucket! (Layout varies between runs: Python randomizes string hashes per process.)

## 5.3 ⚙️ Engineering Metrics: Load Factor

When do we resize the hash table?

$$ \text{Load Factor} (\alpha) = \frac{\text{Number of Items}}{\text{Size of Array}} $$

-   If $\alpha$ gets too high (e.g., $> 0.7$), collisions increase, and speed drops.
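-   **Rehashing:** CPython's `dict` automatically allocates a larger array and re-calculates the index of every item once the table is about 2/3 full (the exact growth factor has varied between versions). A single resize is an $O(N)$ operation, but resizes become rarer as the table grows, so the cost averaged over all inserts stays constant (Amortized $O(1)$).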
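
Resizing can be sketched on top of separate chaining. The class below is a minimal illustration, not how CPython does it: the name `ResizingHashMap`, the starting size of 4, the doubling policy, and the 0.7 threshold (borrowed from the example figure above) are all assumptions.

```python
class ResizingHashMap:
    """Separate-chaining map that doubles its bucket array when it gets crowded."""

    MAX_LOAD_FACTOR = 0.7  # assumed threshold for this sketch

    def __init__(self, size: int = 4):
        self.size = size
        self.count = 0
        self.buckets = [[] for _ in range(size)]

    def load_factor(self) -> float:
        return self.count / self.size

    def put(self, key, value) -> None:
        bucket = self.buckets[hash(key) % self.size]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # update in place, count unchanged
                return
        bucket.append((key, value))
        self.count += 1
        if self.load_factor() > self.MAX_LOAD_FACTOR:
            self._resize()

    def get(self, key):
        for existing_key, value in self.buckets[hash(key) % self.size]:
            if existing_key == key:
                return value
        return None

    def _resize(self) -> None:
        """Double the array and re-insert every item: the O(N) rehash step."""
        old_items = [item for bucket in self.buckets for item in bucket]
        self.size *= 2
        self.buckets = [[] for _ in range(self.size)]
        for key, value in old_items:
            self.buckets[hash(key) % self.size].append((key, value))


table = ResizingHashMap(size=4)
for n in range(10):
    table.put(n, n * n)
    print(f"items={table.count:2d} slots={table.size:2d} load={table.load_factor():.2f}")
```

Every item must be re-inserted because its index depends on the array size: a key that sat at `hash(key) % 4` generally belongs somewhere else under `hash(key) % 8`.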

---

## Mini-Challenge: Two Sum ($O(N)$)

**The Problem:** Given an array of integers `nums` and an integer `target`, return indices of the two numbers such that they add up to `target`.

-   Input: `nums = [2, 7, 11, 15]`, `target = 9`
-   Output: `[0, 1]` (Because 2 + 7 = 9)

**Constraint:** You must do this in **$O(N)$** time. (Nested loops are $O(N^2)$ and will fail).

**Hint:** Use a Dictionary to remember numbers you've already seen.

In [None]:
from typing import List

def two_sum(nums: List[int], target: int) -> List[int]:
    # Key: The number, Value: Its index
    seen = {}
    
    for i, num in enumerate(nums):
        complement = target - num
        
        if complement in seen:
            return [seen[complement], i]
        
        seen[num] = i
    return []

print(two_sum([2, 7, 11, 15], 9))
print(two_sum([3, 2, 4], 6))

---

## 5.4 🌍 Real-World System Map

Where do you use Hash Tables?

### 1. Database Indexing
*   **Example:** **SQL lookups**.
*   **Why?** When you run `SELECT * FROM Users WHERE ID = 42`, the database doesn't scan millions of rows. It uses an Index (often a B-Tree or Hash Table) to jump straight to the disk location of User 42. It's the difference between reading every page of a book vs. using the index at the back.
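
The idea behind a hash index can be sketched in a few lines. The rows, the `id` field, and the `index_by_id` name below are hypothetical, and a real database keeps its index on disk with far more machinery; this only contrasts a scan with a lookup.

```python
# Hypothetical table: a list of rows, each row a dict.
users = [{"id": user_id, "name": f"user{user_id}"} for user_id in range(1, 1001)]

def find_by_scan(rows, wanted_id):
    """Full table scan: checks rows one by one, O(N)."""
    for row in rows:
        if row["id"] == wanted_id:
            return row
    return None

# Build the index once: ID -> position of the row in the list.
index_by_id = {row["id"]: position for position, row in enumerate(users)}

def find_by_index(rows, index, wanted_id):
    """Indexed lookup: one dict access, O(1) on average."""
    position = index.get(wanted_id)
    return rows[position] if position is not None else None

print(find_by_scan(users, 42))
print(find_by_index(users, index_by_id, 42))
```

Both calls return the same row; the difference is that the indexed version does not slow down as the table grows.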

### 2. Caching (Redis/Memcached)
*   **Example:** **Twitter API**.
*   **Why?** Generating a user's timeline might take 200ms of CPU time. Storing the result in Redis (in essence a giant in-memory Hash Map) lets us fetch it in about 1ms the next time they refresh. Key=`"User_123_Timeline"`, Value=`JSON_Data`.
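
The caching pattern can be sketched with a plain dictionary standing in for Redis. Nothing below is the Redis API: `cache`, `stats`, `build_timeline`, and `get_timeline` are hypothetical names, and the "expensive" step is faked.

```python
import json

cache = {}              # stand-in for Redis: key -> JSON string
stats = {"builds": 0}   # counts how often the slow path runs

def build_timeline(user_id: int) -> list:
    """Hypothetical expensive step (imagine database queries and ranking)."""
    stats["builds"] += 1
    return [f"post {n} for user {user_id}" for n in range(3)]

def get_timeline(user_id: int) -> list:
    key = f"User_{user_id}_Timeline"
    if key in cache:                      # cache hit: one hash lookup
        return json.loads(cache[key])
    timeline = build_timeline(user_id)    # cache miss: do the slow work once
    cache[key] = json.dumps(timeline)
    return timeline

first = get_timeline(123)   # miss: builds and stores
second = get_timeline(123)  # hit: served from the dict
print(f"build_timeline ran {stats['builds']} time(s)")
```

A production cache also needs expiry and invalidation so that stale timelines are not served forever; those are left out of this sketch.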

### 3. Symbol Tables (Compilers)
*   **Example:** **Python Interpreter**.
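*   **Why?** When you write `x = 10` at the top level of a module, Python stores `"x"` as a key in a Hash Table (the module's namespace dictionary, returned by `globals()`). When you print `x`, it looks up the string `"x"` to find the value `10`. (Inside a function, CPython optimizes local variables into indexed slots, but global names, module attributes, and most object attributes still live in dictionaries.)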
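
That a namespace is an ordinary dictionary can be checked directly. The sketch below passes a plain dict (the name `namespace` is arbitrary) to the built-in `exec`, which uses it as the global namespace for the executed code.

```python
namespace = {}

# Run an assignment with `namespace` as its global namespace.
exec("x = 10", namespace)

print("x" in namespace)   # True: the name is a string key
print(namespace["x"])     # 10: the binding is the value

# Reading the variable is a dictionary lookup on that same key.
exec("y = x + 1", namespace)
print(namespace["y"])     # 11
```

`exec` also adds a `__builtins__` entry to the dict, so `namespace` ends up with more keys than the two assigned here.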