# Template - Binary Search

Binary search is a powerful array searching algorithm, but it's [notoriously tough to implement correctly](https://en.wikipedia.org/wiki/Binary_search_algorithm#Implementation_issues). When I've tried implementing the "classical version" from scratch, it usually breaks for various reasons (invalid `hi < lo` index, can't handle both odd- and even-length arrays, doesn't find the value at `arr[0]`/`arr[-1]`, etc.). I make two changes to make it more robust.

### Test lo and hi for adjacency, not gt/gte/lt/lte
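Writing the loop test as `while lo < hi-1` stops as soon as `lo` and `hi` are adjacent (or equal). Inside the loop, `mid` is always strictly between them, so `lo` and `hi` can be assigned `mid` directly, with no `mid+1`/`mid-1` adjustments that could overshoot. At the end, we just test `arr[lo]` and `arr[hi]` for equality with `val` and return -1 otherwise.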
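
To make the exit condition concrete, here is a minimal sketch of the adjacency test on its own, without callbacks. The function name `adjacent_binary_search` is an assumption for illustration only, and the sketch assumes `arr` is sorted in ascending order.

```python
def adjacent_binary_search(arr, val):
    """Return an index of val in sorted arr, or -1 (illustrative helper)."""
    if not arr:
        return -1

    lo, hi = 0, len(arr) - 1
    # Shrink the window until lo and hi are equal or adjacent.
    while lo < hi - 1:
        mid = (lo + hi) // 2  # strictly between lo and hi here
        if arr[mid] > val:
            hi = mid
        else:
            lo = mid

    # At most two candidates remain, so check both explicitly.
    if arr[lo] == val:
        return lo
    if arr[hi] == val:
        return hi
    return -1
```

Because neither bound ever moves past `mid`, the value at either end of the array is still inside the final two-element window.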

### Use callbacks for calculating indexes and the return value
When the problem needs a simple binary search (i.e. search for a value in a sorted array of ints and return its index or -1), this isn't necessary. However, some problems need a more complicated condition to choose the next `lo` and `hi`, and writing that logic inline makes the loop hard to read.

Using callbacks and the modified comparison, the main body of binary search reduces to the following template:

In [1]:
def binary_search(arr, val, cmp, ret):
    if not arr:
        return -1

    lo, hi = 0, len(arr)-1
    while lo < hi-1:
        lo, hi = cmp(arr, lo, val, hi)
    
    return ret(arr, lo, val, hi)

In [2]:
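def default_cmp(arr, l, v, h):
    # Halve the window: keep the left half if arr[m] overshoots v, else the right half
    m = (l+h)//2
    return (l,m) if arr[m] > v else (m,h)

def default_test(arr, l, v, h):
    # l and h are equal or adjacent here, so v can only be at one of them
    if arr[l] == v:
        return l
    if arr[h] == v:
        return h
    return -1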

In [3]:
cases = [
    ([], 10, -1), # Empty arr
    ([10], 10, 0), # Singleton arr, val present
    ([0], 10, -1), # Singleton arr, val missing
    ([i for i in range(10)], 0, 0), # Even length array, value at lowest bound
    ([i for i in range(10)], 5, 5), # Even length array, value between bounds
    ([i for i in range(10)], 9, 9), # Even length array, value at highest bound    
    ([i for i in range(10)], 67, -1), # Even length array, value missing
    ([i for i in range(11)], 0, 0), # Odd length array, value at lowest bound
    ([i for i in range(11)], 6, 6), # Odd length array, value between bounds
    ([i for i in range(11)], 10, 10), # Odd length array, value at highest bound    
    ([i for i in range(11)], 67, -1), # Odd length array, value missing
]
for arr, val, expected in cases:
    actual = binary_search(arr, val, default_cmp, default_test)
    assert actual == expected, f"{arr},{val}: {expected} != {actual}"

With the template in place, handling a new variant only requires writing new `cmp` and `ret` callbacks.
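
As a small illustration of that flexibility, the sketch below reuses the template to find the first index whose element is greater than or equal to `val`, which also gives the leftmost match when the array has duplicates. The callback names `first_ge_cmp` and `first_ge_ret` are hypothetical, and the template is repeated only so the block runs on its own.

```python
def binary_search(arr, val, cmp, ret):
    # Same template as above, repeated so this sketch is self-contained.
    if not arr:
        return -1
    lo, hi = 0, len(arr) - 1
    while lo < hi - 1:
        lo, hi = cmp(arr, lo, val, hi)
    return ret(arr, lo, val, hi)

def first_ge_cmp(arr, lo, val, hi):
    # Hypothetical callback: keep the left half whenever arr[mid] >= val.
    mid = (lo + hi) // 2
    return (lo, mid) if arr[mid] >= val else (mid, hi)

def first_ge_ret(arr, lo, val, hi):
    # Hypothetical callback: check the two remaining candidates left to right.
    if arr[lo] >= val:
        return lo
    if arr[hi] >= val:
        return hi
    return -1  # every element is smaller than val

dupes = [1, 3, 3, 3, 8]
leftmost_three = binary_search(dupes, 3, first_ge_cmp, first_ge_ret)
```

Only the comparison (`>=` instead of `>`) and the final check change; the loop itself is untouched.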

### Example: [Kth missing positive number](https://leetcode.com/problems/kth-missing-positive-number/)
There's an O(n) solution where we enumerate every missing number until we find the `k`-th one. However, the array is strictly increasing, so `arr[i]-i-1` is the number of positive integers missing to the left of `arr[i]`, and we can use this to find the `k`-th missing:
```
     i =  [  0 1 2     3         4]
    arr = [  2,3,4,    7,       11]
missing = [1       5,6,  8,9,10,    12, ...]

i arr[i] (arr[i]-i-1) = left missing   
0   2      2-0-1 = 1
1   3      3-1-1 = 1
2   4      4-2-1 = 1
3   7      7-3-1 = 3
4   11    11-4-1 = 6

k   i     arr[i] - (left missing - k) - 1 = kth missing
5   4       11   - (   6   -       5) - 1 =   9     
1   0        2   - (   1   -       1) - 1 =   1


Note that because left missing = arr[i]-i-1:
kth missing = arr[i] - (left missing - k) - 1
            = arr[i] - (arr[i] - i - 1 - k) - 1 
            = arr[i] - arr[i] + i + 1 + k - 1
            = i + k
```
So with this, we can use a binary search to find the minimum `i` such that `k <= arr[i]-i-1`, from which we can then calculate the `k`-th missing as `i + k`. Note that the problem allows the `k`-th missing number to lie beyond the end of the array (`k=250` is valid for the above input), but we can handle this with a check before the binary search.
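
Before switching to binary search, the `i + k` identity can be sanity-checked with a linear scan against plain enumeration. This is only a sketch: the helper names `kth_missing_brute_force` and `kth_missing_linear` are assumptions for illustration, not part of the template.

```python
def kth_missing_brute_force(arr, k):
    """Reference answer: walk the positive integers and count the gaps."""
    present = set(arr)
    candidate = 0
    while k > 0:
        candidate += 1
        if candidate not in present:
            k -= 1
    return candidate

def kth_missing_linear(arr, k):
    """Scan for the minimum i with k <= arr[i]-i-1, then return i + k."""
    for i, x in enumerate(arr):
        if x - i - 1 >= k:
            return i + k
    # No such i: the k-th missing number lies past the end of the array.
    return len(arr) + k

sample = [2, 3, 4, 7, 11]
for k in range(1, 20):
    assert kth_missing_linear(sample, k) == kth_missing_brute_force(sample, k)
```

The binary search replaces only the `for` loop in `kth_missing_linear`; the out-of-range fallback stays as a separate check.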

In [4]:
arr = [2,3,4,7,11]

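def kth_missing_cmp(arr, lo, k, hi):
    mid = (hi+lo)//2
    # Number of positive integers missing before arr[mid]
    mid_missing_before = arr[mid] - mid - 1
    # Too few missing before mid: the answer index lies to the right
    return (mid, hi) if mid_missing_before < k else (lo, mid)

def kth_missing_ret(arr, lo, k, hi):
    # Assumes at least k numbers are missing before arr[hi]; prefer lo if it also qualifies
    return hi if arr[lo] - lo - 1 < k else lo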

assert binary_search(arr, 6, kth_missing_cmp, kth_missing_ret) == 4
assert binary_search(arr, 5, kth_missing_cmp, kth_missing_ret) == 4
assert binary_search(arr, 4, kth_missing_cmp, kth_missing_ret) == 4
assert binary_search(arr, 3, kth_missing_cmp, kth_missing_ret) == 3
assert binary_search(arr, 2, kth_missing_cmp, kth_missing_ret) == 3
assert binary_search(arr, 1, kth_missing_cmp, kth_missing_ret) == 0

Then we can wrap this into the solution as follows:

In [21]:
class Solution:
    def findKthPositive(self, arr, k):
        # Fewer than k numbers are missing before arr[-1], so the answer lies past the end
        if arr[-1] - len(arr) < k:
            return k + len(arr)

        nearest_i = binary_search(arr, k, kth_missing_cmp, kth_missing_ret)
        return k + nearest_i

sol = Solution()
arr = [2,3,4,7,11]
cases = [
    (arr, 7, 12),
    (arr, 6, 10),
    (arr, 5, 9),
    (arr, 4, 8),
    (arr, 3, 6),
    (arr, 2, 5),
    (arr, 1, 1),
]
for arr, k, expected in cases:
    actual = sol.findKthPositive(arr, k)   
    assert actual == expected, f"{arr, k}: {expected} != {actual}"