## 10.1 Merge Sorted Files 

Write a program that takes as input a set of sorted sequences and computes the union of these sequences as a sorted sequence. For example, if the input is <3,5,7>, <0,6>, and <0,6,28>, then the output is <0,0,3,5,6,6,7,28>.

In [3]:
import heapq

In [4]:
def merge_sorted_arrays(sorted_arrays: list) -> list:
    min_heap = []
    # Builds a list of iterators for each array in sorted_arrays
    sorted_arrays_iters = [iter(x) for x in sorted_arrays]
    
    # Puts first element from each iterator in min_heap
    for i, it in enumerate(sorted_arrays_iters):
        first_element = next(it, None)
        if first_element is not None:
            heapq.heappush(min_heap, (first_element, i))
            
    result = []
    while min_heap:
        smallest_entry, smallest_array_i = heapq.heappop(min_heap)
        smallest_array_iter = sorted_arrays_iters[smallest_array_i]
        result.append(smallest_entry)
        next_element = next(smallest_array_iter, None)
        if next_element is not None:
            heapq.heappush(min_heap, (next_element, smallest_array_i))
    return result 

In [5]:
sorted_arrays = [[3,5,7], [0,6], [0,6,28]]
merge_sorted_arrays(sorted_arrays)

[0, 0, 3, 5, 6, 6, 7, 28]

In [7]:
from heapq import heappush, heappop
heap = []
data = [1,3,5,7,9,2,4,6,8,0]
for item in data:
    heappush(heap, item)

In [8]:
print(heap)

[0, 1, 2, 6, 3, 5, 4, 7, 8, 9]


In [9]:
ordered = []
while heap:
    ordered.append(heappop(heap))

In [10]:
ordered

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [11]:
data.sort()
data == ordered

True

In [12]:
heap = []
data = [(1, 'J'), (4, 'N'), (3, 'H'), (2, 'O')]
for item in data:
    heappush(heap, item)

In [13]:
while heap:
    print(heappop(heap)[1])

J
O
H
N


In [14]:
# Pythonic solution, uses the heapq.merge() method which takes multiple inputs 
def merge_sorted_arrays_pythonic(sorted_arrays):
    return list(heapq.merge(*sorted_arrays))

In [15]:
merge_sorted_arrays_pythonic(sorted_arrays)

[0, 0, 3, 5, 6, 6, 7, 28]

Let k be the number of input sequences. Then there are no more than k elements in the min-heap. Both extract-min and insert take O(log k) time. Hence, we can do the merge in O(n log k) time. The space complexity is O(k) beyond the space needed to write the final result. In particular, if the data comes from files and is written to a file, instead of arrays, we would need only O(k) additional storage. 
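To make the O(k) additional-storage claim concrete, here is a minimal sketch of a streaming variant. It assumes each input is a line-oriented stream (an open file, for instance) holding one integer per line in ascending order. The name `merge_sorted_streams` and the one-integer-per-line format are assumptions made for illustration and do not appear in the original solution.

```python
import heapq
import io


def merge_sorted_streams(streams):
    """Lazily yields integers from several sorted, line-oriented streams.

    Assumption: every stream holds one integer per line, in ascending order.
    """
    # Each generator parses its stream one line at a time, so heapq.merge
    # only ever holds one pending value per stream (O(k) extra space).
    parsed = [(int(line) for line in stream) for stream in streams]
    yield from heapq.merge(*parsed)


# In-memory stand-ins for files, using the example input from above.
streams = [
    io.StringIO("3\n5\n7\n"),
    io.StringIO("0\n6\n"),
    io.StringIO("0\n6\n28\n"),
]
for value in merge_sorted_streams(streams):
    print(value)
```

Because the function is a generator, the caller can write each value to an output file as it arrives rather than collecting the whole result in memory.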

Alternatively, we could recursively merge the k files, two at a time, using the merge step from merge sort. We would go from k to k/2, then k/4, etc. files. There would be log k stages, and each has time complexity O(n), so the time complexity is the same as that of the heap-based approach, i.e., O(n log k). The space complexity of any reasonable implementation of merge sort would end up being O(n), which is considerably worse than the heap-based approach when k << n.
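The pairwise alternative can be sketched as follows. This is an illustrative, iterative version that works on in-memory lists rather than files. The helper names `merge_two` and `merge_sorted_arrays_pairwise` are hypothetical and not part of the original solution.

```python
def merge_two(a: list, b: list) -> list:
    """Standard merge step from merge sort for two sorted lists."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            result.append(a[i])
            i += 1
        else:
            result.append(b[j])
            j += 1
    # At most one of these two tails is non-empty.
    result.extend(a[i:])
    result.extend(b[j:])
    return result


def merge_sorted_arrays_pairwise(sorted_arrays: list) -> list:
    arrays = [list(a) for a in sorted_arrays]
    if not arrays:
        return []
    # Each pass roughly halves the number of arrays, giving about log k passes,
    # and every pass touches each of the n elements at most once.
    while len(arrays) > 1:
        merged = [
            merge_two(arrays[i], arrays[i + 1])
            for i in range(0, len(arrays) - 1, 2)
        ]
        if len(arrays) % 2 == 1:
            # An unpaired last array is carried over to the next pass.
            merged.append(arrays[-1])
        arrays = merged
    return arrays[0]


print(merge_sorted_arrays_pairwise([[3, 5, 7], [0, 6], [0, 6, 28]]))
```

Each pass builds new lists holding all n elements, which is where the O(n) space cost mentioned above comes from.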

## 10.2 Sort an Increasing-Decreasing Array

An array is said to be k-increasing-decreasing if elements repeatedly increase up to a certain index after which they decrease, then again increase, a total of k times. Design an efficient algorithm for sorting a k-increasing-decreasing array.

In [20]:
def sort_k_increasing_decreasing_array(A: list) -> list:
    # Decomposes A into a set of sorted subarrays
    sorted_subarrays = []
    increasing, decreasing = range(2)
    subarray_type = increasing 
    start_idx = 0
    for i in range(1, len(A)+1):
        if (i == len(A) or #A is ended. Adds the last subarray.
            (A[i -1] < A[i] and subarray_type == decreasing) or 
            (A[i-1] >= A[i] and subarray_type == increasing)):
                sorted_subarrays.append(A[start_idx : i] if subarray_type == increasing
                                       else A[i-1: start_idx -1: -1])
                start_idx = i
                subarray_type = (decreasing
                                if subarray_type == increasing else increasing)
    return merge_sorted_arrays(sorted_subarrays)
        

In [21]:
A =[57, 131, 493, 294, 221, 339, 418, 452, 442, 190]
sort_k_increasing_decreasing_array(A)

[57, 131, 190, 221, 294, 339, 418, 442, 452, 493]

In [24]:
import itertools

In [29]:
# Pythonic solution, uses a stateful object to trace the monotonic subarrays. 
def sort_k_increasing_decreasing_array_pythonic(A):
    class Monotonic:
        def __init__(self):
            self._last = float('-inf')
        def __call__(self, curr):
            result = curr < self._last
            self._last = curr
            return result 
    
    return merge_sorted_arrays([
        list(group)[::-1 if is_decreasing else 1]
        for is_decreasing, group in itertools.groupby(A, Monotonic())
    ])

In [30]:
sort_k_increasing_decreasing_array_pythonic(A)

[57, 131, 190, 221, 294, 339, 418, 442, 452, 493]

The time complexity is O(n log k), the same as for merging k sorted arrays with n elements in total.

## 10.3 Sort an Almost-sorted Array

Often data is almost-sorted -- for example, a server receives timestamped stock quotes and earlier quotes may arrive slightly after later quotes because of differences in server loads and network routes.

Write a program which takes as input a very long sequence of numbers and prints the numbers in sorted order. Each number is at most k away from its correctly sorted position. 

**Hint:** How many numbers must you read after reading the ith number to be sure you can place it in the correct location? k+1.

The brute-force solution is to put the sequence in an array, sort it, and then print it. The time complexity is O(n log n), where n is the length of the input sequence. The space complexity is O(n).

We can do better by taking advantage of the almost-sorted property. Specifically, after we have read k+1 numbers, the smallest number in the group must be smaller than all following numbers. We need to store k+1 numbers and want to be able to efficiently extract the minimum number and add a new number. A min-heap is exactly what we need. We add the first k numbers to a min-heap. Then, for each additional number, we add it to the min-heap and extract the minimum from the heap.

In [None]:
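def sort_approximately_sorted_array(sequence, k: int) -> list:
    # Wraps sequence in an iterator so the second loop resumes after the first
    # k elements instead of restarting from the beginning (as a list would).
    sequence = iter(sequence)
    min_heap = []
    # Adds the first k elements into min_heap. Stops if there are fewer than k elements.
    for x in itertools.islice(sequence, k):
        heapq.heappush(min_heap, x)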
        
    result = []
    # For every new element, add it to min_heap and extract the smallest. 
    for x in sequence:
        smallest = heapq.heappushpop(min_heap, x)
        result.append(smallest)
        
    # sequence is exhausted, iteratively extracts the remaining elements
    while 