# Question 1 <a class="anchor" id="one"></a>

Describe the worst case data and the best case data for each of the following sorting algorithms. Also, include the big O notation for each case.

* **Bubble Sort**
    * **Explanation**: This algorithm repeatedly steps through the array from the beginning, comparing each element with the one after it and swapping the pair if they are out of order, so that the higher value moves right. It is called bubble sort because on each pass the highest remaining value "bubbles" up to its proper position at the end of the array.
    * **Best case**: $O(n)$ - if the array is already sorted, the first pass makes no swaps, so we can detect this and exit the loop after a single pass
    * **Worst case**: $O(n^2)$ - if the array is in reverse order, about $n$ passes are needed and each pass iterates over the array (nested iteration), for roughly $n(n-1)/2$ comparisons and swaps in total
    * **Space**: $O(1)$ - array is sorted in place
* **Selection Sort**
    * **Explanation**: This algorithm starts with a pointer at the beginning of the array. It scans every index from the pointer to the end of the array to find the lowest number, then "selects" it and swaps it into the pointer's position. The pointer is then incremented and the process is repeated on the remaining unsorted portion until the array is sorted. There is no check to see if the array is already sorted, resulting in the same best and worst case time complexities.
    * **Best case**: $O(n^2)$
    * **Worst case**: $O(n^2)$
    * **Space**: $O(1)$ - array is sorted in place
* **Insertion Sort**
    * **Explanation**: Insertion sort divides the array into a sorted portion and an unsorted portion. Starting from the beginning, the first element is treated as the "sorted" portion. The pointer is then incremented, and the value at the pointer is compared with the rightmost element of the sorted portion. If it is smaller, we keep moving left through the sorted portion until we find the correct position, and the value is "inserted" there. In the best case the array is already sorted, so each element needs only one comparison with its left neighbor and nothing moves. In the worst case every element must travel to the front of the sorted portion, so the $i$-th insertion takes up to $i$ comparisons and the total grows quadratically.
    * **Best case**: $O(n)$ - array is already sorted
    * **Worst case**: $O(n^2)$ - array in reverse order
    * **Space**: $O(1)$ - array is sorted in place
* **Merge Sort**
    * **Explanation**: Merge sort is a divide and conquer algorithm that recursively splits an array in half, forming a "reverse pyramid" of subarrays, until each subarray holds a single element. It then works from the bottom up, merging pairs of sorted subarrays until one sorted array is returned. The same splits and merges happen whether or not the input is already sorted, because element order is only examined during the merge step, resulting in the same best and worst case time complexities.
    * **Best case**: $O(n \log n)$
    * **Worst case**: $O(n \log n)$
    * **Space**: $O(n)$ - when the array is merged, we need an additional holder array of size n to hold the merged result before copying back to the original array.
* **Quicksort**
    * **Explanation**: Quicksort chooses a value to be the pivot (ideally one near the median). The array is then partitioned so that the pivot lands in its final position, with all values less than the pivot to its left and all values greater than the pivot to its right. The two subarrays on either side of the pivot are then sorted recursively with the same process until the full array is sorted.
    * **Best case**: $O(n \log n)$ - when each pivot is close to the median of its subarray, so every partition splits it roughly in half
    * **Worst case**: $O(n^2)$ - when the smallest or largest value is chosen as the pivot at every step.
    * **Space**:
      * Average is $O(\log n)$: balanced partitions give a recursion tree about $\log n$ levels deep, and each level adds one stack frame
      * Worst case is $O(n)$: when every partition is as lopsided as possible (subarrays of length $1$ and $n-1$), the recursion tree grows to a depth of $n$.
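
As a rough illustration of the bubble sort best and worst cases described above, the sketch below counts comparisons on sorted and reverse-ordered input. The function name `bubble_sort_count` and the comparison counter are assumptions made for this example only.

```python
def bubble_sort_count(arr: list[int]) -> tuple[list[int], int]:
    """Return a sorted copy of arr and the number of comparisons made."""
    items = list(arr)  # copy so the caller's list is left untouched
    comparisons = 0
    for end in range(len(items) - 1, 0, -1):
        swapped = False
        for i in range(end):
            comparisons += 1
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        if not swapped:  # a pass without swaps means the list is sorted
            break
    return items, comparisons


n = 8
_, best = bubble_sort_count(list(range(n)))          # already sorted
_, worst = bubble_sort_count(list(range(n, 0, -1)))  # reverse order
print(best, worst)  # 7 28
```

For $n = 8$, the sorted input needs $n - 1 = 7$ comparisons (one pass), while the reversed input needs $n(n-1)/2 = 28$.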


# Question 2 <a class="anchor" id="2"></a>

Implement an insertion sort function.


In [7]:
def insertion_sort(arr: list[int]) -> list[int]:

    for i in range(1, len(arr)):  # arr[0] on its own is already a sorted portion

        sorting_boundary = i  # separate variable so the loop variable i is not modified

        while sorting_boundary > 0 and arr[sorting_boundary - 1] > arr[sorting_boundary]:
            # swap the element with its left neighbor until it is in position
            arr[sorting_boundary - 1], arr[sorting_boundary] = arr[sorting_boundary], arr[sorting_boundary - 1]
            sorting_boundary -= 1

    return arr

The above algorithm loops through the array, taking the first element outside the sorted portion and swapping it with its left neighbor until it reaches its correct position. The list is sorted in place, and the same (now sorted) list is returned.
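
For comparison, here is a sketch of a common variant that shifts larger elements one slot to the right and writes the inserted value once, instead of swapping at every step. It is not the implementation above; the name `insertion_sort_shift` is chosen only for this example.

```python
def insertion_sort_shift(arr: list[int]) -> list[int]:
    """Sort arr in place by shifting larger elements right, then return it."""
    for i in range(1, len(arr)):
        current = arr[i]  # value to insert into the sorted prefix arr[:i]
        j = i - 1
        while j >= 0 and arr[j] > current:
            arr[j + 1] = arr[j]  # shift the larger element one slot right
            j -= 1
        arr[j + 1] = current  # drop the value into the gap that opened up
    return arr


print(insertion_sort_shift([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]
```

The time complexity is unchanged ($O(n)$ best case, $O(n^2)$ worst case); the variant only reduces the number of writes per step.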

# Question 3 <a class="anchor" id="3"></a>


Write a function that accepts an array and returns an array of numbers missing from it

* The input array:
  * has a size $n$
  * contains random numbers ranging from $0$ to $n-1$
  * may contain duplicate values
  * is not necessarily sorted
* The output array:
  * contains missing numbers ranging from $0$ to $n-1$.

This function must have a time complexity of $O(n)$ to get full credit.

For example:
Given the array `[0, 3, 6, 7, 3, 3, 0, 4]`, this function should return `[1, 2, 5]`.

In [52]:
def find_missing_nums(arr: list[int]) -> list[int]:
    n = len(arr)  # values are expected to lie in the range 0..n-1
    seen_values = set(arr)

    missing_nums = []

    for i in range(n):
        if i not in seen_values:
            missing_nums.append(i)

    return missing_nums

This function takes the array length $n$ as the upper bound of the expected range rather than the maximum value in the array, because the largest value is not guaranteed to be $n-1$ (for example, `[0, 0, 0]` has $n = 3$ but a maximum of $0$, and its missing numbers are `[1, 2]`). It then creates a set from the values in the inputted array. We iterate one number at a time from $0$ to $n-1$ and check if the number is in the set. This check is $O(1)$ on average, since sets are implemented using hash tables in Python and the check is simply a hash table lookup; it only degrades to $O(n)$ if many hash collisions occur. If the value is not in the set, it is appended to a list of missing values which is returned at the end.

* `len()` on a list is $O(1)$, and building the set with `set(arr)` is $O(n)$ on average
* Checking if an item is in a set is $O(1)$ on average and $O(n)$ worst case - [Source](https://www.geeksforgeeks.org/internal-working-of-set-in-python/)
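
If the worst-case cost of hash lookups is a concern, one alternative is to skip hashing altogether. The sketch below assumes, as the problem states, that every value lies between $0$ and $n-1$, and records which values occur in a list of $n$ boolean flags. The name `find_missing_nums_flags` is made up for this example.

```python
def find_missing_nums_flags(arr: list[int]) -> list[int]:
    """Return the numbers in 0..n-1 that do not occur in arr (n = len(arr))."""
    n = len(arr)
    present = [False] * n
    for value in arr:
        present[value] = True  # assumes 0 <= value <= n - 1
    return [i for i in range(n) if not present[i]]


print(find_missing_nums_flags([0, 3, 6, 7, 3, 3, 0, 4]))  # [1, 2, 5]
```

Both loops touch each index once, so the running time is $O(n)$ even in the worst case, at the cost of $O(n)$ extra space.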


# Question 4 <a class="anchor" id="4"></a>

Write a function that returns the first non-repeating character in a string with O(n) efficiency. It should return none or null if there are no non-repeating consecutive characters.

For example:

* string `"aaaaabbbbbbc"`, this function should return `"c"`
* string `"aabab"` should return `"b"`
* string `"aababb"` should return `None` ("b" is repeating)

In [50]:
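from typing import Optional

def first_non_repeating(text: str) -> Optional[str]:
    # `text` is used instead of `input` to avoid shadowing the built-in
    repeat_chars = []

    # Collect every character that sits next to an identical neighbor
    for i in range(len(text) - 1):
        if text[i] == text[i+1]:
            repeat_chars.append(text[i])

    repeat_chars = set(repeat_chars)

    # Return the first character that is never consecutively repeated
    for val in text:
        if val not in repeat_chars:
            return val

    return None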


In [51]:
test_cases = {'abbbb': 'a',
              'aaaaabbbbbbc': 'c',
              'aabab': 'b',
              'aababb': None}

for text, expected_output in test_cases.items():
    output = first_non_repeating(text)
    passed = output == expected_output

    print(f"Input: {text}, Output: {output}, Expected output: {expected_output}, Passed: {passed}")

Input: abbbb, Output: a, Expected output: a, Passed: True
Input: aaaaabbbbbbc, Output: c, Expected output: c, Passed: True
Input: aabab, Output: b, Expected output: b, Passed: True
Input: aababb, Output: None, Expected output: None, Passed: True


This function works by iterating through the input string and comparing each character with the one after it. Any character that matches its neighbor is consecutively repeated and is added to a list. We then turn this list into a set, so that membership checks take near constant time, since sets are implemented using hash tables. Finally, we iterate through the input string again and return the first character that is not in the set of repeated values, or `None` if every character is in it. Both passes are $O(n)$, so the function is $O(n)$ overall.

I acknowledge that it is technically possible for checking whether a character is in a set to take $O(n)$ time in the case of hash collisions. Since this is extremely unlikely, I will consider it $O(1)$ for the purposes of this exercise.
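
The same two-pass idea can be written more compactly. The sketch below pairs each character with its right-hand neighbor using `zip` and uses `next` with a default to return `None` when nothing qualifies. The name `first_non_repeating_zip` is an assumption for this example.

```python
from typing import Optional


def first_non_repeating_zip(text: str) -> Optional[str]:
    # Characters that appear next to an identical neighbor at least once.
    repeated = {a for a, b in zip(text, text[1:]) if a == b}
    # First character never consecutively repeated, or None if there is none.
    return next((ch for ch in text if ch not in repeated), None)


print(first_non_repeating_zip("aabab"))   # b
print(first_non_repeating_zip("aababb"))  # None
```

An empty string returns `None`, and a single-character string returns that character, since it has no neighbor to repeat.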

# Question 5 <a class="anchor" id="five"></a>

Write a function that given an array of integers and a target value, returns the length of the longest subarray with a sum equal to the target value. Write the function with $O(n)$ efficiency for full credit.

Note: while the sliding window technique is acceptable as a solution, try solving this using a hash table.
 
For example:
Given an array `[3, 1, -1, 2, -1, 5, -2, 3]` and a target value of `3`, the longest subarray length is `5` (`[-1, 2, -1, 5, -2]`)


For this algorithm, we will iterate through an array, and then return the longest subarray within it that sums to a target value.

First, we'll initialize the following variables: 
* `cum_sum`: cumulative sum as we iterate through the array
* `len_long_sub`: the length of the longest subarray 
* `index_start`: index of the start of the longest subarray
* `cum_sums`: hash table storing cumulative sums as keys and the first index at which each sum occurs as values

Then, we'll loop through the array, adding the value at index `i` to the cumulative sum. If the cumulative sum is equal to the target, we'll update `len_long_sub` and `index_start`, since the subarray from `0` to the current index sums to the target and is the longest subarray that can end at this index.

Next, we'll check if $cum\_sum - target$ is in the hash table. We do this because, for any subarray that starts just after index $i$ and ends at index $j$, the sum of the subarray is equal to: $cum\_sum_j - cum\_sum_i$.

Since we are specifically looking for instances where $cum\_sum_j - cum\_sum_i = target$, we can rewrite this equation to be:

$cum\_sum_j - target = cum\_sum_i$

Thus, if $cum\_sum_j - target$ is in the hash table, we know that there must exist a subarray summing to the target that starts just after the index stored for that key (its value) and ends at $j$ (the current index). If we find it in the hash table and this subarray is longer than the best one so far, we update the max length and starting index variables.

Next, we store the cumulative sum in the hash table if it hasn't been seen before. It is crucial that we only store the first index at which each cumulative sum occurs, rather than overwriting it at each index, because the earliest index gives the longest possible subarray ending at any later position.

Finally, we extract and return the longest subarray.
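
To make the cumulative-sum identity concrete, the sketch below traces it on the example array for a single ending index, $j = 6$. For simplicity it builds the whole first-seen table up front, whereas the algorithm described above fills it in as it iterates. The names `prefix` and `first_seen` are chosen for this example.

```python
from itertools import accumulate

arr = [3, 1, -1, 2, -1, 5, -2, 3]
target = 3

prefix = list(accumulate(arr))  # prefix[j] = arr[0] + ... + arr[j]
print(prefix)  # [3, 4, 3, 5, 4, 9, 7, 10]

# Earliest index at which each cumulative sum occurs.
first_seen = {}
for index, total in enumerate(prefix):
    first_seen.setdefault(total, index)

j = 6                               # prefix[6] == 7
i = first_seen[prefix[j] - target]  # 7 - 3 == 4, first seen at index 1
print(i, arr[i + 1 : j + 1])        # 1 [-1, 2, -1, 5, -2]
```

The subarray starts just after index $i = 1$ and ends at $j = 6$, giving length $j - i = 5$.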


In [72]:
def longest_subarray(arr: list[int], target: int) -> tuple[int, list[int]]:
    cum_sum = 0
    len_long_sub = 0
    index_start = 0
    cum_sums = {}

    for i in range(len(arr)):
        cum_sum += arr[i]

        if cum_sum == target:
            len_long_sub = i + 1
            index_start = 0

        if (cum_sum - target) in cum_sums:
            if len_long_sub < (i - cum_sums[cum_sum - target]):
                len_long_sub = i - cum_sums[cum_sum - target]
                index_start = cum_sums[cum_sum - target] + 1
        
        if cum_sum not in cum_sums:
            cum_sums[cum_sum] = i

    return len_long_sub, arr[index_start:index_start + len_long_sub]

In [76]:
arr = [3, 1, -1, 2, -1, 5, -2, 3]
target = 3
length, subarray = longest_subarray(arr, target)
print(f"Length of longest subarray: {length}")
print(f"Longest subarray: {subarray}")

Length of longest subarray: 5
Longest subarray: [-1, 2, -1, 5, -2]
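
Since the question only asks for the length, a shorter variant is possible. The sketch below seeds the hash table with a cumulative sum of `0` at index `-1`, which stands for the empty prefix, so a subarray starting at index `0` no longer needs a separate check. The name `longest_subarray_len` and the sentinel entry are assumptions of this example, not part of the solution above.

```python
def longest_subarray_len(arr: list[int], target: int) -> int:
    """Return the length of the longest subarray of arr that sums to target."""
    first_index = {0: -1}  # empty prefix: sum 0 "ends" just before index 0
    running = 0
    best = 0
    for j, value in enumerate(arr):
        running += value
        start = first_index.get(running - target)
        if start is not None:
            best = max(best, j - start)
        first_index.setdefault(running, j)  # keep only the earliest index
    return best


print(longest_subarray_len([3, 1, -1, 2, -1, 5, -2, 3], 3))  # 5
```

It returns `0` when no subarray sums to the target.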
