# Chapter 2: The Power of Hash Maps

## 2.1 🚀 The Most Important Data Structure: Hash Maps

In the landscape of data structures, the **Hash Map** (known as a `dictionary`, or `dict`, in Python) stands out as arguably the most critical tool for a software engineer, particularly in the context of coding interviews. Its prevalence stems from a single, powerful capability: **lookups, insertions, and deletions that each run in $O(1)$ time on average**.

This constant-time performance allows for the creation of highly efficient algorithms that would otherwise be sluggish. Many brute-force solutions with quadratic complexity, such as $O(n^2)$, can often be optimized to linear time, $O(n)$, by the clever application of a hash map. This optimization is not just an incremental improvement; it is a fundamental shift in efficiency that separates a non-viable solution from a production-ready one.

Therefore, it is essential to internalize the following critical thinking question. It should be the first thing you ask yourself when aiming to optimize a slow algorithm:

> **If you are ever stuck on an $O(n^2)$ solution, ask yourself: 'Can a hash map help?'**

## 2.2 ⚙️ Under the Hood: How Hashing Works

To wield hash maps effectively, it is crucial to understand their internal mechanics. A hash map is a generalization of a standard array. While an array uses an integer index to access an element, a hash map allows you to use a key of almost any type (e.g., a string, a number, or a tuple) to find its corresponding value. This is achieved through three core components:

### Hash Function
A hash function is a mathematical function that takes a key as input and deterministically produces an integer output. This integer is the "hash code." In the context of a hash map, this hash code is then mapped to an index within the underlying array.
$$ \text{index} = \text{hash}(\text{key}) \pmod{M} $$
Where $M$ is the size of the underlying array. A good hash function distributes keys uniformly across the available indices to minimize collisions.
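As a rough illustration of the formula above, the sketch below maps a few keys to bucket indices using Python's built-in `hash()`. The helper name `bucket_index` and the table size `M = 8` are assumptions made for this example; they do not describe how CPython's `dict` chooses indices internally.

```python
def bucket_index(key, num_buckets: int) -> int:
    """Map a hashable key to a bucket index in the range [0, num_buckets)."""
    # Python's % returns a non-negative result for a positive modulus,
    # even when hash(key) is negative.
    return hash(key) % num_buckets


M = 8  # illustrative table size (an assumption for this sketch)
for key in ["apple", 42, (3, 4)]:
    print(f"{key!r} -> bucket {bucket_index(key, M)}")
```

Note that string hashes are randomized per interpreter process by default, so the bucket printed for `"apple"` may differ between runs. Within a single run the mapping is deterministic, which is all a hash map requires.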

### Buckets
The underlying data store for a hash map is an array. Each slot in this array is called a **bucket**. The index calculated by the hash function determines which bucket a key-value pair should be placed in.

### Collision Resolution
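A **hash collision** occurs when two distinct keys map to the same bucket index, either because they produce the same hash code or because their different hash codes reduce to the same index modulo $M$. Since two items cannot occupy the same single slot, a strategy is needed to handle this. The classic textbook method is **Chaining**. (CPython's built-in `dict` uses a different strategy, open addressing, which probes for another free slot in the same array, but the performance intuition below carries over.)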

With chaining, each bucket, instead of holding a single value, holds a pointer to another data structure, typically a linked list. If multiple keys map to the same bucket, their key-value pairs are simply appended to the linked list at that index. When retrieving a value, the hash map first finds the correct bucket and then traverses the linked list, comparing the given key with the keys stored in the list until a match is found.
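The chaining scheme described above can be sketched as a toy class. This is a minimal illustration, not how Python's `dict` is implemented: the class name `ChainedHashMap` is hypothetical, plain Python lists stand in for linked lists, and resizing is omitted entirely.

```python
class ChainedHashMap:
    """Toy hash map using separate chaining (lists stand in for linked lists)."""

    def __init__(self, num_buckets: int = 8) -> None:
        # One chain (initially empty) per bucket.
        self._buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # index = hash(key) mod M
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value) -> None:
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # key already present: overwrite
                return
        bucket.append((key, value))  # new key: append to the chain

    def get(self, key, default=None):
        # Walk the chain, comparing keys until a match is found.
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default


table = ChainedHashMap(num_buckets=4)
table.put(1, "one")
table.put(5, "five")  # 1 % 4 == 5 % 4, so both keys share a bucket
table.put(1, "uno")   # overwrites the earlier value for key 1
print(table.get(1), table.get(5), table.get(9))  # uno five None
```

Because the table never grows, chains get longer as more keys are added. Real implementations resize the underlying array to keep chains short, which is what preserves the $O(1)$ average.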

## 2.3 🛠️ Core Dictionary Methods & Time Complexity

Before diving into problem patterns, it is essential to master the fundamental operations of Python's `dict`. These methods are your primary tools for implementing hash map-based solutions.

- **Accessing and Updating (`my_dict[key]`)**: The most direct way to get or set a value. It will raise a `KeyError` if the key does not exist on access.
- **Safe Access (`.get(key, default)`)**: A crucial method that returns `None` (or a specified `default` value) if a key is not found, preventing `KeyError` exceptions. This is vital for safely checking for existence while retrieving a value.
- **Checking for a Key (`key in my_dict`)**: The standard Pythonic way to check if a key exists in a dictionary. This is a highly efficient $O(1)$ average-time operation.
- **Removing an Item (`del` or `.pop()`)**: `del my_dict[key]` removes a key-value pair, while `my_dict.pop(key, default)` removes the pair and returns the value, with an option to provide a default if the key is not found.
- **Iteration (`.keys()`, `.values()`, `.items()`)**: These methods provide views for iterating over the dictionary's keys, values, or key-value pairs, respectively. Using `for key, value in my_dict.items():` is the standard for iterating over both simultaneously.

```python
# Demonstrating core dictionary methods
student_grades = {}

# 1. Accessing and Updating
student_grades['Alice'] = 91
student_grades['Bob'] = 85
print(f"Initial grades: {student_grades}")
student_grades['Alice'] = 95 # Update value
print(f"Alice's updated grade: {student_grades['Alice']}")

# 2. Safe Access with .get()
charlie_grade = student_grades.get('Charlie', 'Not Found')
print(f"Getting Charlie's grade safely: {charlie_grade}")

# 3. Checking for a key
has_bob = 'Bob' in student_grades
print(f"Does Bob exist in the dictionary? {has_bob}")

# 4. Removing an item
bob_grade = student_grades.pop('Bob')
print(f"Popped Bob's grade: {bob_grade}")
print(f"Grades after popping Bob: {student_grades}")

# 5. Iteration
print("\nIterating through students and their grades:")
for student, grade in student_grades.items():
    print(f"- {student}: {grade}")
```

### Big-O Complexity for Hash Map (Dictionary) Operations

The primary reason hash maps are so powerful is their exceptional average-case time complexity. However, it's critical to also understand the worst-case scenario.

| Operation              | Complexity (Average) | Complexity (Worst) | Notes                                                                                                                |
|------------------------|----------------------|--------------------|--------------------------------------------------------------------------------------------------------------------------|
| Access/Update (`d[k]`) | $O(1)$               | $O(n)$             | The worst case occurs if a poorly designed hash function causes all keys to collide into a single bucket.                        |
| Deletion (`del d[k]`)  | $O(1)$               | $O(n)$             | Similar to access, finding the item to delete may require scanning a bucket's entire linked list in the worst case.         |
| Search (`k in d`)      | $O(1)$               | $O(n)$             | The efficiency of checking for a key's existence is the core strength of a hash map.                                   |
| Iteration (`for k in d`)| $O(n)$               | $O(n)$             | Iteration must visit every key-value pair in the dictionary, so its complexity is always linear in the number of items.    |
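To make the worst-case column concrete, the following sketch counts key comparisons during a single lookup. The `Key` class and the helper `comparisons_for_last_lookup` are hypothetical names invented for this demonstration. One configuration gives every key a distinct hash, and the other deliberately forces every key onto the same hash value.

```python
class Key:
    """Hypothetical key type that counts how often __eq__ is invoked."""

    eq_calls = 0

    def __init__(self, ident: int, constant_hash: bool) -> None:
        self.ident = ident
        self.constant_hash = constant_hash

    def __hash__(self) -> int:
        # A constant hash forces every key to collide.
        return 1 if self.constant_hash else self.ident

    def __eq__(self, other) -> bool:
        Key.eq_calls += 1
        return isinstance(other, Key) and self.ident == other.ident


def comparisons_for_last_lookup(n: int, constant_hash: bool) -> int:
    """Build a dict of n keys, then count __eq__ calls for one lookup."""
    keys = [Key(i, constant_hash) for i in range(n)]
    table = {key: key.ident for key in keys}
    Key.eq_calls = 0
    _ = table[keys[-1]]  # look up the most recently inserted key
    return Key.eq_calls


good = comparisons_for_last_lookup(200, constant_hash=False)
bad = comparisons_for_last_lookup(200, constant_hash=True)
print(f"distinct hashes: {good} comparisons, constant hash: {bad} comparisons")
```

With distinct hashes the lookup needs few or no equality checks. With a constant hash it has to compare against many of the other keys, which is the $O(n)$ behavior listed in the table.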


## 2.4 💡 Common Problem Pattern: Finding Duplicates & Pairs

The $O(1)$ lookup capability of a hash map makes it exceptionally well-suited for problems that involve searching for items. This includes detecting duplicates, finding pairs that satisfy a specific condition, or, more generally, checking for the existence of an element in a collection.

A classic example is the **"Two Sum"** problem (LeetCode 1): given an array of integers `nums` and a target integer `target`, find the indices of two numbers such that they add up to `target`.

A naive, brute-force approach would use nested loops to check every possible pair of numbers, resulting in an $O(n^2)$ time complexity. However, a hash map can optimize this to $O(n)$.

The optimized algorithm works as follows: iterate through the array once. For each element `num`, calculate its required complement, `complement = target - num`. Then, check if `complement` already exists in the hash map. 
* If it does, we have found our pair.
* If it does not, add the current element `num` and its index to the hash map and proceed to the next element.

This works because by the time we are at index `i`, our hash map contains all the elements seen at indices `0` through `i-1`. The lookup for the complement is an $O(1)$ operation on average, and we only pass through the array once, yielding a total time complexity of $O(n)$.

### Problem Identification: Finding Duplicates & Pairs

Recognizing this pattern involves looking for problems where the core task is to check for the **existence of elements** you've previously encountered. The goal is often to find a relationship between the current element and an element seen in the past.

Here is a simple checklist:
- ✅ The input is a linear data structure like an array, list, or string.
- ✅ The problem asks you to find a **pair** of elements that satisfy a condition (e.g., `a + b = target`).
- ✅ The problem asks if the collection **"contains duplicates"** or to find the first duplicate.
- ✅ A brute-force solution would involve nested loops ($O(n^2)$), comparing every element with every other element.
- ✅ The key insight is that you can process the collection in a single pass ($O(n)$) by using a hash map to remember what you've seen.
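The "remember what you've seen" idea from the checklist can be sketched for the duplicate-detection case. Here a `set` (a hash-based container with the same average $O(1)$ membership test) plays the role of the hash map, since no value needs to be stored alongside each key. The function names are illustrative.

```python
from typing import Hashable, Iterable, Optional


def contains_duplicate(items: Iterable[Hashable]) -> bool:
    """Return True if any item appears more than once (single pass)."""
    seen = set()
    for item in items:
        if item in seen:  # O(1) average membership check
            return True
        seen.add(item)
    return False


def first_duplicate(items: Iterable[Hashable]) -> Optional[Hashable]:
    """Return the first item whose second occurrence is encountered, else None."""
    seen = set()
    for item in items:
        if item in seen:
            return item
        seen.add(item)
    return None


print(contains_duplicate([1, 2, 3, 1]))       # True
print(contains_duplicate([1, 2, 3]))          # False
print(first_duplicate([3, 1, 4, 1, 5, 3]))    # 1
```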

### Boilerplate: The 'Two Sum' Pattern

This pattern is fundamental. It uses a hash map to store seen values and their indices, enabling a single-pass solution.

```python
from typing import List

def find_sum_pair(nums: List[int], target: int) -> List[int]:
    """
    Finds two numbers in a list that sum to a target value.

    This function uses a hash map to achieve O(n) time complexity.
    Args:
        nums: A list of integers.
        target: The target sum.
    Returns:
        A list containing the indices of the two numbers, or an empty list if no pair is found.
    """
    seen_map = {}  # Key: number, Value: index

    for i, num in enumerate(nums):
        complement = target - num

        # O(1) average time lookup
        if complement in seen_map:
            # Found the pair
            return [seen_map[complement], i]

        # Add the current number and its index to the map for future lookups
        seen_map[num] = i

    return [] # Return empty list if no solution is found

# Example usage:
my_nums = [2, 7, 11, 15]
my_target = 9
result = find_sum_pair(my_nums, my_target)
print(f"Input: {my_nums}, Target: {my_target}")
print(f"Result (Indices): {result}") # Expected output: [0, 1]
```

## 2.5 📊 Common Problem Pattern: Grouping & Counting

Another powerful application of hash maps is aggregating data. This pattern involves iterating through a collection of items and using a hash map to either group them by a common property or count their frequencies.

For example, to count the frequency of each character in a string, you can iterate through the string. For each character, you use it as a key in a hash map. The value associated with that key is an integer counter, which you increment on each appearance. This simple process efficiently categorizes and counts elements in linear time.
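The character-counting process just described might look like the following sketch, which uses only a plain `dict` and `.get()`. The function name `char_frequencies` is an illustrative choice.

```python
def char_frequencies(text: str) -> dict:
    """Count how many times each character appears in text (one pass)."""
    counts = {}
    for ch in text:
        # .get() supplies 0 the first time a character is seen.
        counts[ch] = counts.get(ch, 0) + 1
    return counts


print(char_frequencies("hello"))  # {'h': 1, 'e': 1, 'l': 2, 'o': 1}
```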

### Problem Identification: Grouping & Counting

This pattern applies when you need to **categorize** or **aggregate** items from a collection based on some shared, derivable property.

Look for these signals:
- ✅ The problem asks you to **"group"** items together (e.g., "group anagrams").
- ✅ The problem asks you to **"count the frequency"** of elements (e.g., "find the most frequent element," "count character occurrences").
- ✅ The core of the problem involves processing a list of items and partitioning them into distinct buckets.
- ✅ The solution requires you to invent a **"canonical key"** that represents the group property (e.g., a sorted string for anagrams, the element itself for frequency counting).

### Boilerplate: Grouping and Counting

Python's `collections` module provides specialized dictionary subclasses that make these tasks even more straightforward. The two most important are `defaultdict` for grouping and `Counter` for counting.

```python
from collections import defaultdict

# Template 1: Grouping items into lists using defaultdict
words = ["apple", "ant", "ball", "bat", "cat", "car"]

# Group words by their first letter
grouped_words = defaultdict(list)

for word in words:
    key = word[0]
    grouped_words[key].append(word)

print("Grouping with defaultdict:")
print(dict(grouped_words)) # Convert to regular dict for clean printing
```

The `defaultdict(list)` automatically provides an empty list for any new key that is accessed, saving you from writing boilerplate code to check if the key already exists.

```python
from collections import Counter

# Template 2: Counting item frequencies with Counter
data = ['a', 'b', 'a', 'c', 'b', 'a', 'd']

# Counter can be initialized directly from an iterable
frequency_counts = Counter(data)

print("\nCounting with Counter:")
print(frequency_counts)

# Accessing counts is easy
print(f"Count of 'a': {frequency_counts['a']}") # Outputs 3
print(f"Count of 'z': {frequency_counts['z']}") # Outputs 0, no error
```

## 2.6 ⚡️ Application: Caching & Memoization

A hash map is the perfect data structure for caching the results of expensive computations. The input to a function can serve as the key, and the result can serve as the value. Before re-computing a result, the function first checks if the input already exists as a key in its cache. If so, it returns the stored value directly, which is an $O(1)$ operation.

This powerful technique is known as **Memoization** and is a cornerstone of **Dynamic Programming**. It is used to optimize recursive algorithms that have overlapping subproblems, such as the classic Fibonacci sequence calculation, transforming an exponential-time complexity into a linear-time one.

### Problem Identification: Caching & Memoization

This pattern is a key optimization technique, especially for problems that can be solved with recursion. It is the foundation of Dynamic Programming.

Look for these signals:
- ✅ The problem can be solved with a **recursive function**.
- ✅ The recursive solution results in **"overlapping subproblems"**—that is, the same function is called with the same arguments multiple times.
- ✅ This redundancy leads to exponential time complexity (e.g., $O(2^n)$ or, more generally, $O(c^n)$ for a constant $c > 1$), which typically causes the solution to time out.
- ✅ The goal is to **optimize the recursion** by storing the result of each unique function call so it only needs to be computed once.

### Boilerplate: Memoization

This template shows how to add a cache (a dictionary) to a recursive function to store and retrieve previously computed results.

```python
# A dictionary used as a cache for memoization
fib_cache = {}

def fibonacci_memoized(n: int) -> int:
    """
    Calculates the n-th Fibonacci number using memoization.
    """
    # Base cases
    if n <= 1:
        return n

    # Check the cache first (O(1) lookup)
    if n in fib_cache:
        return fib_cache[n]

    # If not in cache, compute it, then store it
    result = fibonacci_memoized(n - 1) + fibonacci_memoized(n - 2)
    fib_cache[n] = result

    return result

# Example usage:
n = 30
print(f"Fibonacci({n}) = {fibonacci_memoized(n)}")
```
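As an alternative to managing the cache dictionary by hand, Python's standard library offers `functools.lru_cache`, which wraps a function with a cache keyed by its arguments. The sketch below is one possible equivalent of the template above. The function name `fibonacci_cached` is illustrative, and `maxsize=None` is chosen here so the cache is unbounded.

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # unbounded cache keyed by the function's arguments
def fibonacci_cached(n: int) -> int:
    """Calculate the n-th Fibonacci number; repeated calls hit the cache."""
    if n <= 1:
        return n
    return fibonacci_cached(n - 1) + fibonacci_cached(n - 2)


print(fibonacci_cached(30))           # 832040
print(fibonacci_cached.cache_info())  # hit/miss statistics for the cache
```

The decorator requires every argument to be hashable, for the same reason dictionary keys must be. In an interview, writing the dictionary explicitly is often preferable because it makes the technique visible.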

### Differentiating Hash Map Use Cases

Understanding the subtle differences between these patterns is key to quickly identifying the correct approach during an interview.

| Aspect | Finding Pairs / Duplicates | Grouping / Counting | Memoization (Caching) |
| :--- | :--- | :--- | :--- |
| **Primary Goal** | Check for the **existence** of an element or a relationship to a past element. | **Categorize** or **aggregate** a collection of items into distinct buckets. | **Store and retrieve** the results of expensive function calls to avoid re-computation. |
| **Map Key** | The element itself, or a derived value (e.g., `target - num`). | A **canonical representation** of the group property (e.g., sorted string, character count). | The **arguments** to the function (often a single value or a tuple of arguments). |
| **Map Value** | The element's index, a simple counter, or a boolean flag. | A **collection** of items that belong to the group (e.g., a list or set). | The **return value** of the function call for the given arguments. |
| **Problem Signal** | "Find a pair...", "contains duplicate", optimizing an $O(n^2)$ search. | "Group by...", "count frequencies", "partition into sets based on..." | "Optimize a recursive function", "overlapping subproblems". |


## 2.7 🐍 Advanced Techniques & Pythonic Tips

Mastery of hash maps in Python involves understanding the nuances of different dictionary types and adhering to language-specific rules.

### `defaultdict` vs. `Counter` vs. Standard `dict`

* **`dict` (with `.get()`)**: Use a standard dictionary when you need full control. The `.get(key, default)` method is your primary tool for avoiding `KeyError` exceptions. Use it when you need to provide a default value on-the-fly without permanently adding it to the dictionary. *Example: `count = my_dict.get('key', 0) + 1`*

* **`collections.defaultdict`**: Use this when you are grouping items into collections (like lists, sets, or other dictionaries). It simplifies the code by automatically creating a default item (e.g., an empty list) for a new key upon its first access, eliminating the need for existence checks.

* **`collections.Counter`**: Use this specifically for counting hashable objects. It is a dictionary subclass where elements are stored as keys and their counts are stored as values. It provides helpful methods like `.most_common()` and allows for direct arithmetic operations (e.g., subtracting one Counter from another to see the difference, which is invaluable for anagram problems).
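The behavioral differences between the three types are easiest to see side by side. The sketch below uses made-up sample data and highlights one subtlety worth knowing: reading a missing key from a `defaultdict` inserts it, whereas `.get()` on a `dict` and indexing a `Counter` do not.

```python
from collections import Counter, defaultdict

# 1. Plain dict: .get() returns the default without inserting the key.
plain = {}
print(plain.get("missing", 0))  # 0
print("missing" in plain)       # False

# 2. defaultdict: merely reading a missing key creates it.
groups = defaultdict(list)
_ = groups["missing"]
print("missing" in groups)      # True

# 3. Counter: missing keys read as 0 and are not inserted.
letters = Counter("listen")
print(letters["z"])             # 0
print("z" in letters)           # False

# Counter extras: equality check for anagrams, most_common, subtraction.
print(Counter("listen") == Counter("silent"))  # True
print(Counter("banana").most_common(1))        # [('a', 3)]
print(Counter("hello") - Counter("hole"))      # Counter({'l': 1})
```

Counter subtraction keeps only positive counts, which is why the final result contains just the one surplus `'l'`.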

### Hashability: The Immutable Key Rule

A crucial constraint for dictionary keys in Python is that they must be **hashable**. An object is hashable if it has a hash value that never changes during its lifetime and can be compared for equality with other objects. In practice, Python's built-in immutable types are hashable, while its built-in mutable containers are not.

* **Hashable (Valid Keys)**: `str`, `int`, `float`, `bool`, `frozenset`, and `tuple` (provided every element of the tuple is itself hashable).
* **Unhashable (Invalid Keys)**: `list`, `dict`, `set`. Attempting to use these as keys will raise a `TypeError: unhashable type`.

The reason for this rule is simple: if a key's value could change, its hash would also change, and the dictionary would no longer be able to find the object.
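A short experiment makes the rule tangible. The variable names below are arbitrary. The second failing case shows that a tuple is only hashable when everything inside it is hashable too.

```python
cache = {}
errors = []

# Immutable built-ins work as keys.
cache[(1, 2)] = "tuple key"
cache[frozenset({1, 2})] = "frozenset key"

# A list is mutable, so it is rejected as a key.
try:
    cache[[1, 2]] = "list key"
except TypeError as exc:
    errors.append(str(exc))

# A tuple that contains a list is also rejected.
try:
    cache[(1, [2, 3])] = "tuple containing a list"
except TypeError as exc:
    errors.append(str(exc))

print(f"{len(cache)} keys stored")
for message in errors:
    print(f"TypeError: {message}")
```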

### Using Tuples as Composite Keys

A powerful and common technique is to use a tuple as a dictionary key. Since tuples are immutable, they are hashable. This is extremely useful when a single piece of data is insufficient to uniquely identify an entry. For example, in problems involving a grid or matrix, you can use a coordinate pair `(row, col)` as a single key to store information about a specific cell.

```python
# Example: Storing data for grid coordinates
grid_data = {}
point1 = (10, 25) # A (row, col) tuple
point2 = (30, 15)

grid_data[point1] = "Start Node"
grid_data[point2] = "End Node"
to_find = (10, 25)
if to_find in grid_data:
    print(f"Data at {to_find}: {grid_data[to_find]}")
```
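Tuple keys also combine naturally with memoization when a function takes more than one argument. As a hedged illustration, the sketch below counts the paths from the top-left to the bottom-right cell of a grid when only right and down moves are allowed, caching each result under its `(row, col)` key. The problem choice and the names `count_paths` and `paths_from` are assumptions made for this example.

```python
from typing import Dict, Tuple


def count_paths(rows: int, cols: int) -> int:
    """Count right/down paths across a rows x cols grid (rows, cols >= 1)."""
    memo: Dict[Tuple[int, int], int] = {}

    def paths_from(row: int, col: int) -> int:
        # On the last row or last column there is exactly one way to finish.
        if row == rows - 1 or col == cols - 1:
            return 1
        key = (row, col)  # composite key: one entry per grid cell
        if key not in memo:
            memo[key] = paths_from(row + 1, col) + paths_from(row, col + 1)
        return memo[key]

    return paths_from(0, 0)


print(count_paths(3, 3))  # 6
print(count_paths(3, 7))  # 28
```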

### Choosing a Canonical Key

For many advanced hash map problems, particularly those involving grouping, the central challenge is not the data structure itself, but rather devising a **canonical representation** for the data that can serve as the key. A canonical representation is a standard, unique format for data that may have multiple equivalent forms.

For a problem like "Group Anagrams," where we group words like "eat" and "tea", two primary strategies exist for creating a canonical key:

1.  **Sorted String (Intuitive, Less Performant)**: The most intuitive key is the sorted version of the string (e.g., `"eat"` becomes `"aet"`). While correct, this involves an $O(K \log K)$ sorting operation for every string of length $K$.

2.  **Character Count Array (Optimal)**: An asymptotically faster key is a tuple representing the frequency of each character. For lowercase English letters, this would be a 26-element tuple where each index corresponds to a letter `a-z`. For `"eat"`, the key would be `(1, 0, 0, 0, 1, ..., 1, ...)`, with a count of 1 at index 0 (`a`), index 4 (`e`), and index 19 (`t`). This key can be generated in $O(K)$ time, as it only requires a single pass over the string plus a constant amount of work for the 26 slots. This makes it the superior choice asymptotically, although for very short strings the fixed cost of building and hashing a 26-element tuple can offset the gain in practice.
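For comparison, the first strategy (the sorted-string key) can be sketched in a few lines. The function name `group_anagrams_sorted` is illustrative. The structure is the standard `defaultdict(list)` grouping template, with only the key computation specific to anagrams.

```python
from collections import defaultdict
from typing import List


def group_anagrams_sorted(strs: List[str]) -> List[List[str]]:
    """Group anagrams using the sorted string as the canonical key."""
    groups = defaultdict(list)
    for word in strs:
        key = "".join(sorted(word))  # O(K log K) per word, e.g. "eat" -> "aet"
        groups[key].append(word)
    return list(groups.values())


print(group_anagrams_sorted(["eat", "tea", "tan", "ate", "nat", "bat"]))
```

Swapping in the character-count tuple changes only the line that computes `key`. The grouping logic stays the same.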

## 2.8 🧩 LeetCode Case Study: "Group Anagrams"

Let's apply these concepts to a classic LeetCode problem, using the optimal canonical key strategy.

### Problem Statement

**LeetCode 49: Group Anagrams**

Given an array of strings `strs`, group the anagrams together. You can return the answer in any order.

An **Anagram** is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once.

**Example:**
```
Input: strs = ["eat","tea","tan","ate","nat","bat"]
Output: [["bat"],["nat","tan"],["ate","eat","tea"]]
```

### Problem Identification & Strategy

1.  **Identify the Pattern**: The problem asks us to **"group"** items. Following our identification guide, this immediately signals the "Grouping & Counting" pattern, making a hash map the right data structure.

2.  **Devise a Key**: We need a canonical representation for anagrams. As discussed, the optimal key is a character count tuple. The words "eat" and "tea" are anagrams because they both contain exactly one 'a', one 'e', and one 't'. We represent this signature as a 26-element tuple, which is immutable and thus a perfect hash map key.

3.  **Choose the Right Tool**: Since we are grouping strings into lists, `collections.defaultdict(list)` is the ideal tool. It automatically handles the creation of a new list for any new character-count key we encounter.

### Step-by-Step Solution

The optimal algorithm is as follows:

1.  Initialize an empty `defaultdict(list)` named `anagram_map`.
2.  Iterate through each `word` in the input list `strs`.
3.  For each `word`, create a frequency array, `count`, of 26 zeros.
4.  Iterate through each character `c` of the `word`. For each character, increment the appropriate index in the `count` array (e.g., `count[ord(c) - ord('a')] += 1`).
5.  Once the `word` is fully processed, convert the `count` list into a `tuple`. This makes it immutable and thus hashable.
6.  Use this count-tuple as the key to access `anagram_map` and append the original `word` to the list at that key.
7.  After the loop finishes, the values of `anagram_map` are the lists of grouped anagrams. Return these values.

Here is the complete, optimized Python implementation for the 'Group Anagrams' problem.

```python
from collections import defaultdict
from typing import List

def group_anagrams_optimal(strs: List[str]) -> List[List[str]]:
    """
    Groups anagrams together from a list of strings using an optimal
    character count key.

    Args:
        strs: A list of strings (assumed to be lowercase English letters).
    Returns:
        A list of lists, where each inner list contains a group of anagrams.
    """
    anagram_map = defaultdict(list)

    for word in strs:
        # Create a fixed-size array (list) for character counts.
        count = [0] * 26

        # Populate the count array for the current word.
        for char in word:
            count[ord(char) - ord('a')] += 1

        # Convert the list to a tuple to make it hashable.
        # This tuple is our canonical key.
        canonical_key = tuple(count)
        anagram_map[canonical_key].append(word)

    # The values of the map are the lists of grouped anagrams.
    return list(anagram_map.values())

# Example usage:
input_strs = ["eat","tea","tan","ate","nat","bat"]
grouped = group_anagrams_optimal(input_strs)
print(f"Input: {input_strs}")
print(f"Grouped Anagrams: {grouped}")
```

### Complexity Analysis

Let $N$ be the number of strings in the input list `strs`.
Let $K$ be the maximum length of a string in `strs`.

* **Time Complexity: $O(N \cdot K)$**
    * We iterate through each of the $N$ strings.
    * For each string of length up to $K$, the dominant operation is creating the character count array. This requires a single pass through the string's characters, taking $O(K)$ time.
    * The subsequent operations cost a constant amount per word: converting the 26-element count list to a tuple and hashing it takes $O(26)$ time regardless of $K$, and the hash map insertion is $O(1)$ on average.
    * Therefore, the total time complexity is $O(N \cdot (K + 26)) = O(N \cdot K)$. This is an asymptotic improvement over the $O(N \cdot K \log K)$ complexity of the sorting-based approach.

* **Space Complexity: $O(N \cdot K)$**
    * The space complexity is determined by the storage required for the `anagram_map`.
    * In the worst-case scenario, every string is unique, and no anagrams are found. In this case, the hash map must store all $N$ original strings.
    * The total space required for the strings is the sum of their lengths, which is bounded by $O(N \cdot K)$. The keys add at most one 26-element tuple per distinct group, i.e. $O(26 \cdot N) = O(N)$, which does not change the overall bound.