# Chapter 4: Sorting

## Applications of Sorting: Numbers

### 4-1
The Grinch is given the job of partitioning $2n$ players into two teams of $n$ players each. Each player has a numerical rating that measures how good he or she is at the game. The Grinch seeks to divide the players as *unfairly* as possible, so as to create the biggest possible talent imbalance between the teams. Show how the Grinch can do the job in $O(n\log{}n)$ time.

In [1]:
def grinch_sort(players):
    # it takes nlogn time to sort the players
    players.sort()

    # most unfair partition is to arrange the players in sorted order based on the ratings and assign the first and second halves to different teams
    middle_index = len(players) // 2
    team_a = players[:middle_index]
    team_b = players[middle_index:]

    return (team_a, team_b)

How do we know that this is truly the biggest imbalance? Measure the imbalance as the difference $D$ between the total ratings of the two teams. After sorting, every player on Team B is rated at least as high as every player on Team A. Suppose, for contradiction, that a different partition creates a bigger imbalance. Any other partition is obtained from ours by swapping some $k$ players of Team B with $k$ players of Team A. Each player leaving Team B is rated at least as high as each player leaving Team A, so the swap lowers Team B's total and raises Team A's total by the same amount $\Delta$, with $0 \le \Delta \le D$. The new difference is $D - 2\Delta$, whose absolute value is at most $D$, which contradicts our assumption. In the edge case where the exchanged players have the same ratings, $\Delta = 0$ and the imbalance is unchanged, so the partition produced by the algorithm is still optimal.
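
The argument can be sanity-checked by brute force on a small input. The sketch below is illustrative only: the helper names and the sample ratings are assumptions, not part of the exercise. It compares the sorted split against every possible choice of one team.

```python
from itertools import combinations

def sorted_split_imbalance(ratings):
    """Imbalance of the split produced by sorting and cutting in the middle."""
    ordered = sorted(ratings)
    half = len(ordered) // 2
    return sum(ordered[half:]) - sum(ordered[:half])

def brute_force_imbalance(ratings):
    """Largest imbalance over every way to choose one team of n players."""
    total = sum(ratings)
    half = len(ratings) // 2
    best = 0
    for team in combinations(ratings, half):
        # the other team's total is total - sum(team)
        best = max(best, abs(total - 2 * sum(team)))
    return best

ratings = [7, 3, 9, 1, 4, 8]  # hypothetical ratings for 2n = 6 players
print(sorted_split_imbalance(ratings), brute_force_imbalance(ratings))  # 16 16
```

The brute force is exponential in $n$, so it is only useful as a check on tiny inputs.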

### 4-2
For each of the following problems, give an algorithm that finds the desired numbers within the given amount of time. To keep your answers brief, feel free to use algorithms from the book as subroutines. For the example, $S = \{6, 13, 19, 3, 8\},\  19 − 3$ maximizes the difference, while $8 − 6\ $ minimizes the difference.

1. Let $S$ be an *unsorted* array of $n$ integers. Give an algorithm that finds the pair $x, y \in S$ that *maximizes* $|x−y|$. Your algorithm must run in $O(n)$ worst-case time.

In [2]:
def find_pair_1(numbers):
    if not numbers:
        return None

    min_element = max_element = numbers[0]

    for n in numbers:
        if n < min_element:
            min_element = n
            continue
        if n > max_element:
            max_element = n
            
    return max_element - min_element

2. Let $S$ be a *sorted* array of $n$ integers. Give an algorithm that finds the pair $x, y \in S$ that *maximizes* $|x−y|$. Your algorithm must run in $O(1)$ worst-case time.

In [3]:
def find_pair_2(numbers):
    if not numbers:
        return None
            
    return numbers[-1] - numbers[0]

3. Let $S$ be an *unsorted* array of $n$ integers. Give an algorithm that finds the pair $x, y \in S$ that *minimizes* $|x − y|$, for $x \neq y$. Your algorithm must run in $O(n \log{} n)$ worst-case time.

In [4]:
def find_pair_3(numbers):
    if len(numbers) < 2:
        return None

    # sorting takes O(n logn) time
    numbers.sort()
    min_dist = float('inf')

    for i in range(1, len(numbers)):
        current_dist = numbers[i] - numbers[i - 1]
        if current_dist != 0 and current_dist < min_dist:
            min_dist = current_dist

    return min_dist

4. Let $S$ be a *sorted* array of $n$ integers. Give an algorithm that finds the pair $x, y \in S$ that *minimizes* $|x − y|$, for $x \neq y$. Your algorithm must run in $O(n)$ worst-case time.

In [5]:
def find_pair_4(numbers):
    # it is basically the same as the previous one but without sorting
    if len(numbers) < 2:
        return None

    min_dist = float('inf')

    for i in range(1, len(numbers)):
        current_dist = numbers[i] - numbers[i - 1]
        if current_dist != 0 and current_dist < min_dist:
            min_dist = current_dist

    return min_dist
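
The functions above return the size of the difference. If the pair itself is wanted, as the exercise asks, a variant could look like the following sketch. The function names are hypothetical, and the input is the example set $S$ from the problem statement.

```python
def max_diff_pair(numbers):
    """O(n): the pair with the largest difference is (min, max)."""
    return (min(numbers), max(numbers))

def min_diff_pair(numbers):
    """O(n log n): the closest distinct pair is adjacent after sorting."""
    ordered = sorted(numbers)
    best = None
    for low, high in zip(ordered, ordered[1:]):
        # skip equal neighbours because the exercise requires x != y
        if low != high and (best is None or high - low < best[1] - best[0]):
            best = (low, high)
    return best  # None when fewer than two distinct values exist

S = [6, 13, 19, 3, 8]
print(max_diff_pair(S))  # (3, 19)
print(min_diff_pair(S))  # (6, 8)
```

For an already sorted input the `sorted` call can be dropped, which leaves the $O(n)$ scan required by part 4.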

### 4-3
Take a list of $2n$ real numbers as input. Design an $O(n \log{} n)$ algorithm that partitions the numbers into $n$ pairs, with the property that the partition minimizes the maximum sum of a pair. For example, say we are given the numbers $(1,3,5,9)$. The possible partitions are $((1,3),(5,9))$, $((1,5),(3,9))$, and $((1,9),(3,5))$. The pair sums for these partitions are $(4,14)$, $(6,12)$, and $(10,8)$. Thus, the third partition has $10$ as its maximum sum, which is the minimum over the three partitions.

In [6]:
def partition_with_min_max_sum_of_pairs(numbers):
    pairs = []
    numbers.sort()
    size = len(numbers)
    for i in range(size // 2):
        pair_left = numbers[i]
        pair_right = numbers[size - i - 1] 
        pairs.append((pair_left, pair_right))
    return pairs
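
Pairing the smallest remaining number with the largest remaining one can be checked against the example from the problem statement. The sketch below is an illustration under assumed helper names (`min_max_pairing`, `all_pairings`); it enumerates every pairing and confirms that none has a smaller maximum sum.

```python
def min_max_pairing(numbers):
    """Pair the i-th smallest with the i-th largest after sorting."""
    ordered = sorted(numbers)
    n = len(ordered)
    return [(ordered[i], ordered[n - 1 - i]) for i in range(n // 2)]

def all_pairings(items):
    """Yield every way to split items into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in all_pairings(remaining):
            yield [(first, partner)] + tail

numbers = [1, 3, 5, 9]
greedy_max = max(a + b for a, b in min_max_pairing(numbers))
brute_max = min(max(a + b for a, b in p) for p in all_pairings(numbers))
print(greedy_max, brute_max)  # 10 10
```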

### 4-5
The *mode* of a bag of numbers is the number that occurs most frequently in the set. The set $\{4, 6, 2, 4, 3, 1\}$ has a mode of $4$. Give an efficient and correct algorithm to compute the mode of a bag of $n$ numbers.

In [7]:
def find_mode(numbers):
    if not numbers:
        return None

    numbers.sort()
    current_number = mode = numbers[0]
    max_occurrences = 1
    current_occurrences = 0 

    for n in numbers:
        if current_number != n:
            current_number = n
            current_occurrences = 0
        current_occurrences += 1
        if current_occurrences > max_occurrences:
            max_occurrences = current_occurrences
            mode = n

    return mode
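
The sort-based solution runs in $O(n \log{} n)$. As an alternative sketch, assuming the numbers are hashable, counting occurrences with `collections.Counter` gives expected $O(n)$ time. The function name is hypothetical.

```python
from collections import Counter

def find_mode_by_counting(numbers):
    """Return a most frequent value, or None for an empty bag."""
    if not numbers:
        return None
    counts = Counter(numbers)            # one pass over the input
    value, _ = counts.most_common(1)[0]  # (value, occurrences) of the top entry
    return value

print(find_mode_by_counting([4, 6, 2, 4, 3, 1]))  # 4
```

Unlike `find_mode`, this version does not reorder the input list.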

### 4-6
Given two sets $S1$ and $S2$ (each of size $n$), and a number $x$, describe an $O(n \log{} n)$ algorithm for finding whether there exists a pair of elements, one from $S1$ and one from $S2$, that add up to $x$. (For partial credit, give a $\Theta(n^2)$ algorithm for this problem.)

In [8]:
def binary_search(sorted_array, x):
    # iterate over index bounds instead of slicing: each slice copies O(n) elements,
    # which would make a single search linear rather than logarithmic
    low, high = 0, len(sorted_array) - 1

    while low <= high:
        mid_index = (low + high) // 2
        if sorted_array[mid_index] == x:
            return x
        if sorted_array[mid_index] > x:
            high = mid_index - 1
        else:
            low = mid_index + 1

    return None

def find_pairs(s1, s2, x):
    s1.sort()
    s2.sort()

    for n in s1:
        pair_to_find = x - n
        if binary_search(s2, pair_to_find) is not None:
            return (n, pair_to_find)

    return None
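
Once both sets are sorted, the binary search can also be replaced by a two-pointer scan. This is a sketch of that alternative, not the method used above; the function name and the sample inputs are assumptions. Sorting still dominates at $O(n \log{} n)$, while the scan itself is $O(n)$.

```python
def find_pair_two_pointers(s1, s2, x):
    """Return (a, b) with a in s1, b in s2 and a + b == x, or None."""
    a = sorted(s1)
    b = sorted(s2)
    i, j = 0, len(b) - 1  # i walks up through a, j walks down through b
    while i < len(a) and j >= 0:
        total = a[i] + b[j]
        if total == x:
            return (a[i], b[j])
        if total < x:
            i += 1  # need a larger sum
        else:
            j -= 1  # need a smaller sum
    return None

print(find_pair_two_pointers([1, 5, 9], [2, 4, 8], 13))  # (5, 8)
```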

## Heaps

### 4-17
Devise an algorithm for finding the $k$ smallest elements of an unsorted set of $n$ integers in $O(n + k \log{} n)$.

In [9]:
import heapq # use the standard library

def find_k_smallest(li, k):
    if li is None or k < 0 or k > len(li):
        raise ValueError("Invalid input")

    heapq.heapify(li) # convert the list to a heap in O(n) time (in place)
    # popping the min k times takes O(klogn) time and yields the k smallest elements
    return [heapq.heappop(li) for _ in range(k)]

li = [5, 7, 9, 1, 3]
find_k_smallest(li, 3)

[1, 3, 5]

### 4-18
Give an $O(n \log{} k)$ -time algorithm that merges $k$ sorted lists with a total
of $n$ elements into one sorted list. (Hint: use a heap to speed up the obvious
$O(kn)$-time algorithm).

In [10]:
import heapq # use the standard library

def merge_sorted_lists(lists, n):
    if lists is None:
        raise Exception("Invalid input")

    mins = [] # store the current min of each of the k lists in a pq
    merged_list = []

    for idx, x in enumerate(lists):
        if len(x) == 0:
            continue
        # (value, list index, position in list); heapq orders tuples by their first element
        heapq.heappush(mins, (x[0], idx, 0))

    # construct the merged list with n elements; stop early if the lists run out
    while mins and len(merged_list) < n:
        num, list_index, pos = heapq.heappop(mins)
        merged_list.append(num)

        # advance an index instead of calling pop(0), which costs O(len(list)) per call
        if pos + 1 < len(lists[list_index]):
            new_element = (lists[list_index][pos + 1], list_index, pos + 1)
            heapq.heappush(mins, new_element) # add new element to the pq in O(logk) time

    return merged_list


li1 = [2, 4, 5, 6, 8]
li2 = [0, 1, 9]

merge_sorted_lists([li1, li2], 8)

[0, 1, 2, 4, 5, 6, 8, 9]
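
As a cross-check, the standard library already offers a heap-based k-way merge, `heapq.merge`, which lazily merges sorted iterables. A minimal usage sketch on the same two lists:

```python
import heapq

li1 = [2, 4, 5, 6, 8]
li2 = [0, 1, 9]

# heapq.merge returns an iterator, so materialize it to inspect the result
merged = list(heapq.merge(li1, li2))
print(merged)  # [0, 1, 2, 4, 5, 6, 8, 9]
```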

### 4-19
You wish to store a set of $n$ numbers in either a max-heap or a sorted array.
For each application below, state which data structure is better, or if it does not
matter. Explain your answers.

1. Find the maximum element quickly.
> It does not matter: both a max-heap and a sorted array provide $O(1)$ access to the maximum element.
2. Delete an element quickly.
> The max-heap is better. Given the element's position, a max-heap deletes it in $O(\log{} n)$ time by moving the last element into the gap and restoring the heap order.\
> For a sorted array deletion is $O(n)$, as the remaining elements need to be shifted to fill the gap after the removal.
3. Form the structure quickly.
> The max-heap is better. Sorting takes $O(n \log{} n)$ time.\
> A heap can be constructed in $O(n)$ time.
4. Find the minimum element quickly.
> The sorted array is better: it provides $O(1)$ access to the minimum element.\
> A max-heap is optimized for the maximum; finding the minimum requires scanning the leaves of the tree in $O(n)$ time.
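
The comparison can be made concrete with a small sketch. Python's `heapq` only provides a min-heap, so the max-heap here is simulated by storing negated values; that trick and all helper names are assumptions made for illustration.

```python
import heapq

def build_max_heap(values):
    """Build a max-heap in O(n) by heapifying the negated values."""
    heap = [-v for v in values]
    heapq.heapify(heap)
    return heap

def heap_max(heap):
    """O(1): the maximum sits at the root."""
    return -heap[0]

def heap_min(heap):
    """O(n): the minimum must be a leaf; leaves occupy indices n//2 .. n-1."""
    n = len(heap)
    return -max(heap[n // 2:])

values = [5, 7, 9, 1, 3]
heap = build_max_heap(values)   # O(n)
array = sorted(values)          # O(n log n)
print(heap_max(heap), array[-1])  # 9 9
print(heap_min(heap), array[0])   # 1 1
```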

### 4-20
1. Give an efficient algorithm to find the second-largest key among $n$ keys.
You can do better than $2n − 3$ comparisons.

In [11]:
class BinaryTreeNode:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

def build_max_tree(li):
    """
    Builds a tree with leaves corresponding to list elements and parents as the respective max value
    The root of the tree should contain the max value
    """
    if not li:
        # also rejects an empty list, which would otherwise recurse forever
        raise Exception("Wrong input")
    
    if len(li) == 1:
        return BinaryTreeNode(li[0])
    
    middle_index = len(li) // 2
    left = build_max_tree(li[:middle_index])
    right = build_max_tree(li[middle_index:])
    return BinaryTreeNode(
        max(left.data, right.data),
        left,
        right
    )

def find_keys_compared_to(node):
    """
    Find all the keys the given node data was compared against.
    The second max should be the maximum element out of all these values.
    """    
    max_value = node.data  # avoid shadowing the built-in max()
    ret = []
    
    while node.left is not None and node.right is not None:
        if node.left.data == max_value:
            ret.append(node.right.data)
            node = node.left
        else:
            ret.append(node.left.data)
            node = node.right
        
    return ret

def get_second_max(li):
    node = build_max_tree(li)
    keys = find_keys_compared_to(node)
    return max(keys)

2. Then, give an efficient algorithm to find the third-largest key among $n$ keys.
How many key comparisons does your algorithm do in the worst case? Must your
algorithm determine which key is largest and second-largest in the process?

In [12]:
def get_third_max(li):
    """
    First find all the keys the max data node was compared against.
    Then find all the keys the second max data node was compared against.
    The third max value is the max of the two lists combined, excluding the second largest key.
    """  
    node = build_max_tree(li)
    max_value = node.data
    keys_compared_to_largest = find_keys_compared_to(node)
    second_largest = max(keys_compared_to_largest)
    
    # Find second largest key data node
    while node.left is not None and node.right is not None:
        if node.left.data == second_largest:
            node = node.left
            break
        elif node.right.data == second_largest:
            node = node.right
            break
        elif node.left.data == max_value:
            node = node.left
        else:
            node = node.right
            
    keys_compared_to_second_largest = find_keys_compared_to(node)
    combined_list = keys_compared_to_second_largest + keys_compared_to_largest
    
    combined_list.remove(second_largest)
    return max(combined_list)
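
To see how the tournament idea compares with the $2n − 3$ bound, the comparisons can be counted explicitly. The following is a standalone sketch with a hypothetical function name, separate from the tree-based code above. A knockout tournament costs $n − 1$ comparisons, and the second-largest key is then the best of the at most $\lceil \log_2 n \rceil$ keys that lost directly to the winner, for a total of at most $n + \lceil \log_2 n \rceil − 2$.

```python
def second_largest_with_count(keys):
    """Return (second largest, comparisons used); needs at least two keys."""
    comparisons = 0
    # each entry is (key, list of keys it has beaten so far)
    current = [(k, []) for k in keys]
    while len(current) > 1:
        next_round = []
        for i in range(0, len(current) - 1, 2):
            (a, beaten_a), (b, beaten_b) = current[i], current[i + 1]
            comparisons += 1
            if a >= b:
                beaten_a.append(b)
                next_round.append((a, beaten_a))
            else:
                beaten_b.append(a)
                next_round.append((b, beaten_b))
        if len(current) % 2 == 1:
            next_round.append(current[-1])  # the odd one out gets a bye
        current = next_round
    _, beaten = current[0]
    # the runner-up is the largest key that lost directly to the winner
    second = beaten[0]
    for k in beaten[1:]:
        comparisons += 1
        if k > second:
            second = k
    return second, comparisons

print(second_largest_with_count([5, 7, 9, 1, 3, 8, 2, 6]))  # (8, 9)
```

For these $n = 8$ keys the sketch uses 9 comparisons, against $2n − 3 = 13$ for the approach of finding the maximum and then scanning the rest.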

### 4-21
Use the partitioning idea of quicksort to give an algorithm that finds the
*median* element of an array of $n$ integers in expected $O(n)$ time. (Hint: must
you look at both sides of the partition?)

In [13]:
import random

def partition(array, low, high):

    # choose the pivot at random and move it to the end of the range
    pivot_index = random.randrange(low, high + 1)
    (array[pivot_index], array[high]) = (array[high], array[pivot_index])
    pivot = array[high]

    # i marks the end of the region holding elements <= pivot
    i = low - 1

    # traverse through all elements
    # compare each element with the pivot value
    for j in range(low, high):
        if array[j] <= pivot:

            # an element not greater than the pivot is found:
            # swap it with the first greater element, just past i
            i = i + 1

            # Swapping element at i with element at j
            (array[i], array[j]) = (array[j], array[i])

    # Move the pivot into its final position, just past the smaller elements
    (array[i + 1], array[high]) = (array[high], array[i + 1])

    # Return the final position of the pivot
    return i + 1

def find_median(array):
    if not array:
        raise Exception("Wrong input")

    start = 0
    end = len(array) - 1
    med_index = len(array) // 2

    while start < end:
        pivot = partition(array, start, end)
        # key idea, only continue on one side to achieve expected O(n) running time
        if pivot - med_index == 0:
            break
        elif pivot - med_index < 0:
            start = pivot + 1
        else:
            end = pivot - 1

    return array[med_index]


li = []
for i in range(10000):
    li.append(i)

find_median(li)

5000
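
The same idea can be written more compactly if extra memory is acceptable. The sketch below is a non-in-place variant under an assumed name, `quickselect`; it keeps only the side of the partition that contains the wanted rank, which is what gives the expected $O(n)$ running time.

```python
import random

def quickselect(values, k):
    """Return the k-th smallest element (0-based) of a non-empty list."""
    pivot = random.choice(values)
    smaller = [v for v in values if v < pivot]
    equal = [v for v in values if v == pivot]
    larger = [v for v in values if v > pivot]
    if k < len(smaller):
        return quickselect(smaller, k)          # answer lies left of the pivot
    if k < len(smaller) + len(equal):
        return pivot                            # the pivot itself has rank k
    return quickselect(larger, k - len(smaller) - len(equal))

data = [9, 1, 8, 2, 7, 3, 6]
print(quickselect(data, len(data) // 2))  # 6
```

Grouping the keys equal to the pivot also keeps the recursion shallow on inputs with many duplicates.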

### 4-24
Give an efficient algorithm to rearrange an array of $n$ keys so that all
the negative keys precede all the non-negative keys. Your algorithm must be
in-place, meaning you cannot allocate another array to temporarily hold the
items. How fast is your algorithm?

In [14]:
def do_special_rearrangement(arr):
    # O(n) in-place rearrangement
    
    if arr is None:
        raise Exception("Wrong input")
    
    left = 0
    right = len(arr) - 1
    pivot = 0
    
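    while left < right:
        if arr[left] < pivot:
            # already on the correct (negative) side
            left += 1
        elif arr[right] >= pivot:
            # already on the correct (non-negative) side; >= also covers zeros,
            # which would otherwise stall the loop
            right -= 1
        else:
            arr[left], arr[right] = arr[right], arr[left]
            left += 1
            right -= 1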
            
arr = []
for i in range(10):
    arr.append(random.randrange(-50, 50))
print(arr)

do_special_rearrangement(arr)
print(arr)

[-38, 22, -1, 4, -25, -3, -18, 21, -16, 14]
[-38, -16, -1, -18, -25, -3, 4, 21, 22, 14]
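
A single-pointer variant is also possible. This sketch, with a hypothetical function name and sample keys, sweeps the array once and swaps each negative key to the front, so it is also in-place and $O(n)$. It additionally returns the index where the non-negative keys start.

```python
def negatives_first(arr):
    """Move all negative keys before the non-negative ones, in place."""
    boundary = 0  # arr[:boundary] holds the negative keys found so far
    for i in range(len(arr)):
        if arr[i] < 0:
            arr[boundary], arr[i] = arr[i], arr[boundary]
            boundary += 1
    return boundary

keys = [4, -1, 0, -7, 3, -2]
split = negatives_first(keys)
print(keys, split)  # [-1, -7, -2, 4, 3, 0] 3
```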


### 4-33
Show that $n$ positive integers in the range 1 to $k$ can be sorted in $O(n \log{} k)$
time. The interesting case is when $k \ll n$.

In [15]:
def quicksort_with_median_partition(arr, low, high, k_start, k_end):
    """
    Quicksort that partitions the array using the median of the range [k_start, k_end] as a pivot.

    Parameters:
    - arr: List of integers to sort.
    - low: Starting index of the current range.
    - high: Ending index of the current range.
    - k_start: Lower bound of the range [k_start, k_end].
    - k_end: Upper bound of the range [k_start, k_end].

    Returns:
    - Sorted array (in-place).
    """
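    # a single possible value (k_start == k_end) means the range is already sorted
    if low < high and k_start < k_end:
        # Median of range
        pivot = k_start + (k_end - k_start) // 2

        # Partition the array around the pivot
        partition_index = partition(arr, low, high, pivot)

        # Recursively sort the two halves, updating k for the new ranges:
        # the left half holds values in [k_start, pivot], the right half values in [pivot + 1, k_end]
        quicksort_with_median_partition(arr, low, partition_index - 1, k_start, pivot)
        quicksort_with_median_partition(arr, partition_index, high, pivot + 1, k_end)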


def partition(arr, low, high, pivot):
    """
    Partition the array around a given pivot value.

    Parameters:
    - arr: List of integers to partition.
    - low: Starting index of the range to partition.
    - high: Ending index of the range to partition.
    - pivot: The pivot value for partitioning.

    Returns:
    - Index where the partition happens.
    """
    i = low
    for j in range(low, high + 1):
        if arr[j] <= pivot:
            arr[i], arr[j] = arr[j], arr[i]
            i += 1
    return i


arr = [12, 4, 6, 8, 1, 16, 5, 9]
k = 16
print("Original array:", arr)
quicksort_with_median_partition(arr, 0, len(arr) - 1, 1, k)
print("Sorted array:", arr)


Original array: [12, 4, 6, 8, 1, 16, 5, 9]
Sorted array: [1, 4, 5, 6, 8, 9, 12, 16]
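
Another way to reach the bound is to count how often each distinct key occurs and sort only the distinct keys, of which there are at most $k$. The sketch below uses an assumed function name and relies on hashing, so the $O(n + k \log{} k)$ bound, which is within $O(n \log{} k)$, holds in expectation. A balanced search tree keyed by value would make it worst-case.

```python
from collections import Counter

def sort_small_range(values):
    """Sort values drawn from a small set of distinct keys."""
    counts = Counter(values)        # expected O(n)
    result = []
    for key in sorted(counts):      # at most k distinct keys: O(k log k)
        result.extend([key] * counts[key])
    return result

print(sort_small_range([3, 1, 2, 3, 1, 1, 2]))  # [1, 1, 1, 2, 2, 3, 3]
```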


### 4-35
Let $A[1..n]$ be an array such that the first $n−\sqrt{n}$ elements are already sorted
(though we know nothing about the remaining elements). Give an algorithm
that sorts $A$ in substantially better than $O(n\log{}n)$ steps.

This can be done in $O(n)$ time, because only the last $\sqrt{n}$ elements are unsorted. The steps to solve the problem are the following:
* Sort the remaining $\sqrt{n}$ unsorted elements. With an $O(m \log{} m)$ sort on $m = \sqrt{n}$ elements this takes $O(\sqrt{n} \log{} n)$ time
* Merge the two sorted subsets together in $O(n)$ time
* The total time complexity is $O(\sqrt{n} \log{} n + n)$ where the dominant term is $O(n)$

To make the solution more obvious we can even use an $O(m^2)$ sorting algorithm on the $m = \sqrt{n}$ unsorted elements. This takes $O((\sqrt{n})^2) = O(n)$ time, so the total complexity is $O(n + n)$, which again simplifies to $O(n)$.
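
The two steps can be sketched as follows. The function name is hypothetical, and the sketch assumes that the unsorted tail has length $\lceil \sqrt{n} \rceil$, rounding up so that the head certainly lies within the sorted prefix.

```python
import math

def sort_nearly_sorted(a):
    """Sort a list whose first n - sqrt(n) elements are already sorted."""
    n = len(a)
    tail_len = math.isqrt(n)
    if tail_len * tail_len < n:
        tail_len += 1  # round sqrt(n) up (assumption, see lead-in)
    head = a[:n - tail_len]           # already sorted by assumption
    tail = sorted(a[n - tail_len:])   # O(sqrt(n) log n)

    # standard two-way merge in O(n)
    merged = []
    i = j = 0
    while i < len(head) and j < len(tail):
        if head[i] <= tail[j]:
            merged.append(head[i])
            i += 1
        else:
            merged.append(tail[j])
            j += 1
    merged.extend(head[i:])
    merged.extend(tail[j:])
    return merged

a = [1, 4, 6, 8, 10, 12, 9, 2, 5]  # first 6 of 9 elements are sorted
print(sort_nearly_sorted(a))  # [1, 2, 4, 5, 6, 8, 9, 10, 12]
```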