## Rubric

| Criteria                    | Ratings                                                                                                                                      | Pts    |
| --------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------- | ------ |
| **Sort Algorithms**         | - 20 pts Full Marks<br>- 10 pts data description (The arrangement of the data needs to be described)<br>- 0 pts No Marks                     | 20 pts |
| **Insertion Sort**          | - 20 pts Full Marks<br>- 0 pts No Marks                                                                                                      | 20 pts |
| **Missing Numbers**         | - 20 pts Full Marks<br>- 15 pts slower than O(n) solution<br>- 10 pts only one function implemented<br>- 0 pts No Marks                      | 20 pts |
| **Anagram Clusters**        | - 20 pts Full Marks (O(n) solution)<br>- 15 pts does not work with all test cases<br>- 10 pts slower than O(kn) solution<br>- 0 pts No Marks | 20 pts |
| **Longest Subarray Length** | - 20 pts Full Marks<br>- 15 pts does not work with certain cases<br>- 10 pts could be more efficient                                         | 20 pts |
| **Total Points**            |                                                                                                                                              | 100    |

## 1. Dataset Sorting Cases

Describe the worst case data and the best case data for each of the following sorting algorithms. Also, include the big O notation for each case.

- Bubble Sort
- Selection Sort
- Insertion Sort
- Merge Sort
- Quicksort

---
### Answers
**Bubble Sort**

- **Best Case Data**:
    - Already sorted array (ascending order).
    - With the early-exit optimization (stop as soon as a pass makes no swaps), it needs only **one pass** of n – 1 comparisons.
    - **Best Case Complexity**: **O(n)**
- **Worst Case Data**:
    - Array sorted in **reverse order**.
    - Every comparison leads to a swap in every pass.
    - **Worst Case Complexity**: **O(n²)**
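
A minimal sketch of Bubble Sort with the early-exit check described above. It counts passes so the best and worst cases can be observed directly. The helper name `bubble_sort_passes` is hypothetical and it sorts a copy rather than the caller's list.

```python
def bubble_sort_passes(arr: list[int]) -> tuple[list[int], int]:
    """Bubble sort with early exit; returns (sorted copy, number of passes)."""
    a = list(arr)  # work on a copy so the input is untouched
    n = len(a)
    passes = 0
    for end in range(n - 1, 0, -1):
        passes += 1
        swapped = False
        for j in range(end):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:  # no swaps in this pass: the list is already sorted
            break
    return a, passes


print(bubble_sort_passes([1, 2, 3, 4, 5]))  # ([1, 2, 3, 4, 5], 1)
print(bubble_sort_passes([5, 4, 3, 2, 1]))  # ([1, 2, 3, 4, 5], 4)
```

Sorted input stops after one pass, while reverse-sorted input needs all n – 1 passes.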


**Selection Sort**

- **Best Case Data**:
    - Unfortunately, **Selection Sort doesn’t improve with input order**.
    - Even if the array is already sorted, it still scans the entire unsorted section each pass to find the minimum.
    - **Best Case Complexity**: **O(n²)**
- **Worst Case Data**:
    - Reverse order, random order, or sorted order: it doesn’t matter.
    - It always performs the same n(n – 1)/2 comparisons.
    - **Worst Case Complexity**: **O(n²)**
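
To see that input order does not change the work, this sketch counts comparisons in a straightforward Selection Sort. The helper name `selection_sort_comparisons` is hypothetical. Both a sorted and a reverse-sorted list of 10 elements produce n(n – 1)/2 = 45 comparisons.

```python
def selection_sort_comparisons(arr: list[int]) -> tuple[list[int], int]:
    """Selection sort on a copy; returns (sorted copy, number of comparisons)."""
    a = list(arr)
    n = len(a)
    comparisons = 0
    for i in range(n - 1):
        min_idx = i
        # Scan the whole unsorted suffix, regardless of input order
        for j in range(i + 1, n):
            comparisons += 1
            if a[j] < a[min_idx]:
                min_idx = j
        a[i], a[min_idx] = a[min_idx], a[i]
    return a, comparisons


n = 10
print(selection_sort_comparisons(list(range(n)))[1])         # 45
print(selection_sort_comparisons(list(range(n, 0, -1)))[1])  # 45
```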


**Insertion Sort**

- **Best Case Data**:
    - Already sorted array.
    - Only **n – 1 comparisons** and no shifts required.
    - **Best Case Complexity**: **O(n)**
- **Worst Case Data**:
    - Array sorted in **reverse order**.
    - Each element must be compared against and shifted past all earlier elements.
    - **Worst Case Complexity**: **O(n²)**
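
A small sketch that counts element shifts makes the gap between the two cases concrete. The helper name `insertion_sort_shifts` is hypothetical. A sorted list needs no shifts, while a reversed list of 10 elements needs 0 + 1 + ... + 9 = 45.

```python
def insertion_sort_shifts(arr: list[int]) -> tuple[list[int], int]:
    """Insertion sort on a copy; returns (sorted copy, number of element shifts)."""
    a = list(arr)
    shifts = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]  # shift the larger element one position right
            shifts += 1
            j -= 1
        a[j + 1] = key
    return a, shifts


n = 10
print(insertion_sort_shifts(list(range(n)))[1])         # 0
print(insertion_sort_shifts(list(range(n, 0, -1)))[1])  # 45
```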


**Merge Sort**

- **Best Case Data**:
    - Merge sort always divides and merges regardless of input.
    - Even if already sorted, it still performs the divide-and-merge process.
    - **Best Case Complexity**: **O(n log n)**
- **Worst Case Data**:
    - Same as best case: input order doesn’t matter.
    - Still splits and merges every time.
    - **Worst Case Complexity**: **O(n log n)**
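
A sketch of top-down Merge Sort that counts how many elements are written during merges. The helper name `merge_sort_writes` is hypothetical. For a length-8 input, each of the 3 levels of merging writes all 8 elements, so sorted and reverse-sorted inputs both give 24 writes. The number of comparisons can differ with input order, but it stays Θ(n log n).

```python
def merge_sort_writes(arr: list[int]) -> tuple[list[int], int]:
    """Top-down merge sort; returns (sorted list, elements written during merges)."""
    if len(arr) <= 1:
        return list(arr), 0
    mid = len(arr) // 2
    left, w_left = merge_sort_writes(arr[:mid])
    right, w_right = merge_sort_writes(arr[mid:])

    merged: list[int] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= keeps the sort stable
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    # Every merge writes all len(left) + len(right) elements
    return merged, w_left + w_right + len(merged)


print(merge_sort_writes(list(range(8)))[1])         # 24
print(merge_sort_writes(list(range(8, 0, -1)))[1])  # 24
```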


**Quicksort**

- **Best Case Data**:
    - Data is arranged so that the chosen pivot **always splits the array into two equal halves** (e.g., the pivot is always the median of the current subarray).
    - Balanced partitions reduce recursion depth.
    - **Best Case Complexity**: **O(n log n)**
- **Worst Case Data**:
    - The pivot is consistently the **smallest or largest element** (a bad pivot choice).
    - This happens with sorted or reverse-sorted arrays when the first or last element is chosen as the pivot.
    - Partitions become highly unbalanced (sizes n – 1 and 0) → recursion depth = n.
    - **Worst Case Complexity**: **O(n²)**
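
A sketch of the "last element as pivot" case described above, using the Lomuto partition scheme and counting comparisons. The helper name `quicksort_comparisons` is hypothetical. On sorted or reverse-sorted input of size 100, every partition peels off a single element, giving 100 · 99 / 2 = 4950 comparisons. A shuffled input typically needs far fewer.

```python
import random


def quicksort_comparisons(arr: list[int]) -> tuple[list[int], int]:
    """Quicksort with last-element (Lomuto) pivot on a copy; returns (sorted copy, comparisons)."""
    a = list(arr)
    comparisons = 0

    def partition(lo: int, hi: int) -> int:
        nonlocal comparisons
        pivot = a[hi]  # last element as pivot: bad for sorted input
        i = lo - 1
        for j in range(lo, hi):
            comparisons += 1
            if a[j] <= pivot:
                i += 1
                a[i], a[j] = a[j], a[i]
        a[i + 1], a[hi] = a[hi], a[i + 1]
        return i + 1

    def sort(lo: int, hi: int) -> None:
        if lo < hi:
            p = partition(lo, hi)
            sort(lo, p - 1)
            sort(p + 1, hi)

    sort(0, len(a) - 1)
    return a, comparisons


n = 100
print(quicksort_comparisons(list(range(n)))[1])         # 4950
print(quicksort_comparisons(list(range(n, 0, -1)))[1])  # 4950
shuffled = list(range(n))
random.Random(0).shuffle(shuffled)
print(quicksort_comparisons(shuffled)[1])  # typically far fewer than 4950
```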


## 2. Insertion Sort

Implement an insertion sort function.

---
### Answers

```python
from typing import TypeVar

T = TypeVar("T")

def insertion_sort(arr: list[T]) -> list[T]:
    """
    Sorts a list in place using Insertion Sort.

    Args:
        arr (list[T]): List of comparable elements (modified in place).

    Returns:
        list[T]: The same list, now sorted in ascending order.
    """
    for i in range(1, len(arr)):
        key: T = arr[i]
        j: int = i - 1

        # Shift elements greater than key one position to the right
        while j >= 0 and arr[j] > key:
            arr[j + 1] = arr[j]
            j -= 1

        # Place key into the gap left by the shifts
        arr[j + 1] = key

    return arr


# Example usage
numbers: list[int] = [5, 2, 9, 1, 5, 6]
print("Original:", numbers)
print("Sorted:", insertion_sort(numbers))
```
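
As an optional variant, not required by the assignment, insertion sort can locate each insertion point with binary search. The sketch below uses the standard library's `bisect.insort` and builds a new list, so the input is left unchanged. The name `binary_insertion_sort` is hypothetical. Binary search cuts comparisons to O(log i) per element, but inserting into a Python list still shifts O(i) elements, so the worst case remains O(n²).

```python
import bisect
from typing import TypeVar

T = TypeVar("T")


def binary_insertion_sort(arr: list[T]) -> list[T]:
    """Returns a new sorted list; the input list is left unchanged."""
    result: list[T] = []
    for item in arr:
        # insort (same as insort_right) inserts after any equal items,
        # so equal elements keep their original relative order (stable)
        bisect.insort(result, item)
    return result


numbers = [5, 2, 9, 1, 5, 6]
print(binary_insertion_sort(numbers))  # [1, 2, 5, 5, 6, 9]
print(numbers)                         # [5, 2, 9, 1, 5, 6] (unchanged)
```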

## 3. Missing Number Tracker

Implement **two** versions of a function that identifies missing numbers from an input array.

- **First Function:** Use a hashtable or hashset to track missing numbers.
- **Second Function:** Use an array-based approach instead of a hash structure.

Input:

- An array of size `n`, containing random integers in the range `[0, n-1]`.
- The array may contain duplicates and is not necessarily sorted.

Output:

- An array containing all missing numbers from the range `[0, n-1]`.

Both implementations must run in `O(n)` time complexity to receive full credit.

Example:

```python
find_missing([0, 3, 6, 7, 3, 3, 0, 4]) 

# Returns
[1, 2, 5]
```
---
### Answers

#### Version 1: HashSet Approach

- Traverse array, insert elements into a set.
- Then iterate over the range `[0, n-1]` and collect numbers not in the set.
- Complexity: O(n) time, O(n) space.

```python
def find_missing_hashset(arr: list[int]) -> list[int]:
    n = len(arr)
    seen: set[int] = set(arr)  # O(n)

    missing = [x for x in range(n) if x not in seen]  # O(n)
    return missing


# Example
print(find_missing_hashset([0, 3, 6, 7, 3, 3, 0, 4]))  # [1, 2, 5]
```

#### Version 2: Array-Based Approach

- Use an auxiliary boolean/int list of length n to mark presence.
- Traverse input, mark seen values.
- Collect unmarked indices.
- Complexity: O(n) time, O(n) space (but avoids hash overhead).

```python
def find_missing_array(arr: list[int]) -> list[int]:
    n = len(arr)
    seen = [False] * n   # O(n) space

    for num in arr:      # O(n) pass
        if 0 <= num < n:
            seen[num] = True

    missing = [i for i, present in enumerate(seen) if not present]  # O(n)
    return missing


# Example
print(find_missing_array([0, 3, 6, 7, 3, 3, 0, 4]))  # [1, 2, 5]
```
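
If the auxiliary array must also be avoided, one known array-based trick marks presence inside the input itself. Since every value lies in `[0, n-1]`, adding `n` to `arr[v]` records that `v` was seen, while `arr[i] % n` still recovers the original value. This is a sketch under that assumption, with a hypothetical name `find_missing_inplace`. It runs in O(n) time with O(1) extra space beyond the output, and it restores the input before returning.

```python
def find_missing_inplace(arr: list[int]) -> list[int]:
    """Hypothetical O(1)-extra-space variant; assumes every value is in [0, n-1].

    Temporarily modifies arr, then restores it before returning.
    """
    n = len(arr)
    # Mark value v as present by adding n to arr[v];
    # arr[i] % n still recovers the original value stored at index i.
    for i in range(n):
        arr[arr[i] % n] += n
    # Indices that were never marked still hold a value < n
    missing = [i for i in range(n) if arr[i] < n]
    # Undo the marking so the caller's list is unchanged
    for i in range(n):
        arr[i] %= n
    return missing


data = [0, 3, 6, 7, 3, 3, 0, 4]
print(find_missing_inplace(data))  # [1, 2, 5]
print(data)                        # [0, 3, 6, 7, 3, 3, 0, 4]
```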

## 4. Anagram Clusters

Write a function that accepts a list of words and groups them into clusters of anagrams.

An anagram is a word formed by rearranging the letters of another word, such as "listen" becoming "silent." 

**Requirements**:

- Input: A list of lowercase words with no spaces, symbols, or non-alphabetic characters
- Output: A list of lists, where each inner list contains anagram words grouped together
    - The order of words within each group is not important
- Make your code as time-efficient as possible
- State its time complexity using n as the number of words and k as the average word length

  
For example, given
```python
["listen", "silent", "enlist", "google", "gooegl", "elbow", "below", "bored", "robed"]
```

Output
```python
[  
  ["listen", "silent", "enlist"],  
  ["google", "gooegl"],  
  ["elbow", "below"],  
  ["bored", "robed"]  
]
```

---
### Answers

- For each word, generate a canonical key that uniquely identifies its anagram group.
- Option 1: use `''.join(sorted(word))` as the key, but sorting costs O(k log k) per word, or O(n·k log k) overall.
- Option 2 (faster): count the characters (26 lowercase letters only) and use the tuple of counts as the key → O(k) per word.
- Group words by this key using a dictionary.
- Collect dictionary values as the result.

```python
from collections import defaultdict

def group_anagrams(words: list[str]) -> list[list[str]]:
    groups: dict[tuple[int, ...], list[str]] = defaultdict(list)

    for word in words:
        # Character frequency (26 letters)
        count = [0] * 26
        for ch in word:
            count[ord(ch) - ord('a')] += 1
        key = tuple(count)  # immutable → usable as dict key
        groups[key].append(word)

    return list(groups.values())


# Example
words = ["listen", "silent", "enlist", "google", "gooegl", "elbow", "below", "bored", "robed"]
print(group_anagrams(words))
```
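
For comparison, Option 1 (the sorted-letters key) can be sketched as follows. The name `group_anagrams_sorted` is hypothetical. It is asymptotically slower at O(n·k log k), though for short words it may be competitive in practice because the sort runs inside the interpreter's built-in `sorted`.

```python
from collections import defaultdict


def group_anagrams_sorted(words: list[str]) -> list[list[str]]:
    """Option 1: key each word by its sorted letters (O(k log k) per word)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for word in words:
        key = "".join(sorted(word))  # e.g. "listen" -> "eilnst"
        groups[key].append(word)
    return list(groups.values())


words = ["listen", "silent", "enlist", "google", "gooegl", "elbow", "below", "bored", "robed"]
print(group_anagrams_sorted(words))
# [['listen', 'silent', 'enlist'], ['google', 'gooegl'], ['elbow', 'below'], ['bored', 'robed']]
```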


#### Complexity Analysis
- Let:
    - `n` = number of words
    - `k` = average word length
- Building character counts for each word: O(k)
- Doing it for all words: O(n·k)
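- Hashing the fixed-length (26-element) key and the dictionary lookup/append: O(1) average per word
- Total Complexity = O(n·k)
- Space Complexity = O(n·k) (dominated by storing the words in the groups; each distinct key adds only 26 counts)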

## 5. Longest Subarray Length

Write a function that returns the length of the longest contiguous subarray whose sum equals a given target.

- Input: an array of integers and a target value
- Output: an integer representing the length of the longest subarray with a sum equal to the target
- Your solution must run in O(n) time for full credit

Given an array: 
```python
[3, 1, -1, 2, -1, 5, -2, 3]
```

and a target value of 3, the longest subarray length is 

```txt
5
Length of [-1, 2, -1, 5, -2] (sum of 3)
```

--- 

### Answers

- Use a running prefix sum as we iterate.
- This approach checks whether `prefix_sum[i] - prefix_sum[j] == target` for some earlier index `j < i`.
- Store the first index where each prefix sum appears in a hashmap.
- At each step:
    - If `prefix_sum - target` has been seen before, then the subarray between that index+1 and the current index sums to target.
    - Update the max length if this subarray is longer.
- Also handle the case when the prefix sum itself equals the target (subarray from start).

#### Proof
Let  

$$
\text{prefix}[i] = \sum_{m=0}^{i} \text{arr}[m], \quad \text{with } \text{prefix}[-1] = 0
$$

The sum of a subarray from index $j+1$ to $i$ is  

$$
\text{sum}(j+1 \dots i) = \text{prefix}[i] - \text{prefix}[j]
$$

We want  

$$
\text{sum}(j+1 \dots i) = \text{target}
$$

So  

$$
\text{prefix}[i] - \text{prefix}[j] = \text{target}
$$

Rearrange:  

$$
\text{prefix}[j] = \text{prefix}[i] - \text{target}
$$

Thus, if at index $i$ the running sum is $\text{prefix}[i]$, and we have already seen a previous prefix sum equal to $\text{prefix}[i] - \text{target}$, then the subarray $(j+1 \dots i)$ must sum to the target.  

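Its length is  

$$
i - j
$$

This length grows as $j$ decreases, so for each prefix sum value only the earliest index at which it occurs needs to be stored.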
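
Applying this to the example array `[3, 1, -1, 2, -1, 5, -2, 3]` with a target of 3, the prefix sums are

$$
\text{prefix} = [3,\ 4,\ 3,\ 5,\ 4,\ 9,\ 7,\ 10]
$$

At $i = 6$, $\text{prefix}[6] - \text{target} = 7 - 3 = 4$, and the prefix sum 4 first appears at $j = 1$, so

$$
\text{sum}(2 \dots 6) = \text{prefix}[6] - \text{prefix}[1] = 7 - 4 = 3, \qquad \text{length} = 6 - 1 = 5
$$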



```python
def longest_subarray_sum(nums: list[int], target: int) -> int:
    prefix_sum = 0
    first_occurrence: dict[int, int] = {}  # prefix_sum -> earliest index
    max_len = 0

    for i, num in enumerate(nums):
        prefix_sum += num

        # Case 1: subarray from start
        if prefix_sum == target:
            max_len = max(max_len, i + 1)

        # Case 2: subarray between two indices
        if (prefix_sum - target) in first_occurrence:
            length = i - first_occurrence[prefix_sum - target]
            max_len = max(max_len, length)

        # Store prefix sum index only if not seen (to maximize length)
        if prefix_sum not in first_occurrence:
            first_occurrence[prefix_sum] = i

    return max_len


# Example
arr = [3, 1, -1, 2, -1, 5, -2, 3]
target = 3
print(longest_subarray_sum(arr, target))  # 5
```
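
A compact variant, sketched here under a hypothetical name, seeds the map with `{0: -1}` (the $\text{prefix}[-1] = 0$ convention from the proof), which folds "Case 1" into "Case 2". It is paired with a hypothetical O(n²) brute-force reference, `longest_subarray_brute`, and a randomized cross-check that exercises negatives and zeros, the kinds of cases where prefix-sum solutions tend to break.

```python
import random


def longest_subarray_sum_seeded(nums: list[int], target: int) -> int:
    """Same prefix-sum idea, seeded with prefix[-1] = 0 at index -1."""
    first_occurrence: dict[int, int] = {0: -1}  # empty prefix before index 0
    prefix_sum = 0
    max_len = 0
    for i, num in enumerate(nums):
        prefix_sum += num
        j = first_occurrence.get(prefix_sum - target)
        if j is not None:
            max_len = max(max_len, i - j)
        first_occurrence.setdefault(prefix_sum, i)  # keep the earliest index only
    return max_len


def longest_subarray_brute(nums: list[int], target: int) -> int:
    """O(n^2) reference: try every (start, end) pair."""
    best = 0
    for start in range(len(nums)):
        total = 0
        for end in range(start, len(nums)):
            total += nums[end]
            if total == target:
                best = max(best, end - start + 1)
    return best


print(longest_subarray_sum_seeded([3, 1, -1, 2, -1, 5, -2, 3], 3))  # 5

# Randomized cross-check against the brute force
rng = random.Random(42)
for _ in range(500):
    nums = [rng.randint(-3, 3) for _ in range(rng.randint(0, 12))]
    t = rng.randint(-4, 4)
    assert longest_subarray_sum_seeded(nums, t) == longest_subarray_brute(nums, t)
print("all random checks passed")
```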


#### Complexity

- Time: O(n) (single pass through array)
- Space: O(n) (hashmap storing prefix sums)