The P-th percentile of a list of __N ordered values__ (sorted from least to greatest) is the __smallest value__ in the list, 
such that __no more than P percent__ of the data is __strictly less than__ the value 
and __at least P percent__ of the data is __less than or equal__ to that value.

It can be obtained by first calculating the _ordinal rank_ and then taking the value at that rank (counting from 1) in the ordered list. The _ordinal rank_ __k__ is calculated using this formula, where $N$ is the number of values in the list:

$k= \lceil \frac{P}{100} \times N \rceil$
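
As a minimal sketch of this formula (the helper names `ordinal_rank` and `percentile` are illustrative and not part of the solution in this notebook), the rank can be computed with integer arithmetic only. This sidesteps floating-point rounding: in Python `7/100 * 100` can evaluate to slightly more than 7, in which case `math.ceil` would give 8 instead of 7. The sketch assumes an integer `p` with `1 <= p <= 100` and a non-empty list.

```python
def ordinal_rank(p, n):
    """Return k = ceil(p/100 * n) for integers p and n, using integer arithmetic only."""
    # -(-x // y) is ceiling division for a positive y
    return -(-p * n // 100)

def percentile(values, p):
    """Return the p-th percentile of `values` by the ordinal-rank rule.

    Assumes an integer 1 <= p <= 100 and a non-empty list (assumptions of this sketch).
    """
    ordered = sorted(values)
    k = ordinal_rank(p, len(ordered))
    return ordered[k - 1]  # ranks count from 1, list indices from 0

print(percentile([15, 20, 35, 40, 50], 30))  # k = ceil(1.5) = 2, so this prints 20
```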

__Example__

For instance, what should the 90th percentile of an array with 5 elements be?

The 5th element is the 90th percentile:

__no more than P percent of the data is strictly less than the value__: 80% of the data is strictly less than the largest element, which is no more than 90%.

__at least P percent of the data is less than or equal to that value__: 100% of the data is less than or equal to the 5th element, which is at least 90%.

And the 5th element is the smallest one that satisfies both conditions (even if the 4th and 5th elements are equal, the value of the 5th element is still the smallest such value, because the percentile is about the value, not the position).

For fine-tuning the formula, border cases are more interesting, like the 79th, 80th and 81st percentiles of a 5-element list.

| element position: | 1 | 2 | 3 | 4 | 5 |
|--|--|--|--|--|--|
| strictly less: | 0% | 20% | 40% | 60% | 80% |
| less or equal: | 20% | 40% | 60% | 80% | 100% |

79th percentile: the 4th element is expected (60% < 79%, 79% <= 80%)

80th percentile: the 4th element is expected (60% < 80%, 80% <= 80%)

81st percentile: the 5th element is expected (80% < 81%, 81% <= 100%)

| P | 5\*P/100 | k=ceil(5\*P/100) |
|--|--|--|
| 79 | 3.95 | 4 |
| 80 | 4 | 4 |
| 81 | 4.05 | 5 |
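
The border cases can also be checked mechanically. The sketch below (function names are illustrative) finds the percentile directly from the definition by testing both conditions for every value, and compares it with the value at rank `k`. It compares integer counts instead of percentages so that no rounding is involved, and it assumes an integer `p` between 1 and 100.

```python
def percentile_by_definition(ordered, p):
    """Smallest value v such that at most p% of the data is < v and at least p% is <= v."""
    n = len(ordered)
    for v in ordered:  # ascending order, so the first match is the smallest
        strictly_less = sum(1 for x in ordered if x < v)
        less_or_equal = sum(1 for x in ordered if x <= v)
        # compare count * 100 with p * n to stay in integer arithmetic
        if strictly_less * 100 <= p * n and less_or_equal * 100 >= p * n:
            return v
    return None

def percentile_by_rank(ordered, p):
    """Value at ordinal rank k = ceil(p/100 * n), counting ranks from 1."""
    k = -(-p * len(ordered) // 100)  # integer ceiling division
    return ordered[k - 1]

data = [10, 20, 30, 40, 50]  # any 5 distinct ordered values
for p in (79, 80, 81):
    print(p, percentile_by_definition(data, p), percentile_by_rank(data, p))
# prints:
# 79 40 40
# 80 40 40
# 81 50 50
```

Both functions agree on the three border cases: the 4th value for P = 79 and P = 80, the 5th value for P = 81.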

In [3]:
import math
import random
import time

def merge(a,b):
    """
    merge two ordered lists
    """
    result = []
    i, j = 0, 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            result.append(a[i])
            i += 1
        else:
            result.append(b[j])
            j += 1
    while i < len(a):
        result.append(a[i])
        i += 1
    while j < len(b):
        result.append(b[j])
        j += 1
    return result

def find_percentile_refrence(a, b, p):
    """
    reference solution of find_percentile: merge both lists, then take the value at rank k
    """
    c = merge(a,b)
    k = math.ceil(p/100 * len(c))
    return c[k-1]

def kth(arr1, arr2, m, n, k, st1, st2):
    """
    return the k-th smallest (1-based) element of arr1[st1:m] and arr2[st2:n]
    """
    # In case we have reached the end of array 1
    if st1 == m:
        return arr2[st2 + k - 1]

    # In case we have reached the end of array 2
    if st2 == n:
        return arr1[st1 + k - 1]

    # k should never reach 0 or exceed the number of remaining elements
    if k == 0 or k > (m - st1) + (n - st2):
        return None

    # Compare the first remaining elements of both arrays and return the smaller
    if k == 1:
        return arr1[st1] if arr1[st1] < arr2[st2] else arr2[st2]

    curr = k // 2

    # Fewer than k // 2 elements remain in array 1
    if curr - 1 >= m - st1:
        # The last element of array 1 comes before the (k // 2)-th remaining
        # element of array 2, so the target is the (k - (m - st1))-th
        # remaining element of array 2
        if arr1[m - 1] < arr2[st2 + curr - 1]:
            return arr2[st2 + (k - (m - st1) - 1)]
        # Otherwise skip k // 2 elements of array 2
        return kth(arr1, arr2, m, n, k - curr, st1, st2 + curr)

    # Fewer than k // 2 elements remain in array 2
    if curr - 1 >= n - st2:
        if arr2[n - 1] < arr1[st1 + curr - 1]:
            return arr1[st1 + (k - (n - st2) - 1)]
        # Otherwise skip k // 2 elements of array 1
        return kth(arr1, arr2, m, n, k - curr, st1 + curr, st2)

    # Normal comparison: move the starting index of the array with the
    # smaller (k // 2)-th remaining element k // 2 to the right
    if arr1[curr + st1 - 1] < arr2[curr + st2 - 1]:
        return kth(arr1, arr2, m, n, k - curr, st1 + curr, st2)
    return kth(arr1, arr2, m, n, k - curr, st1, st2 + curr)
        


def find_percentile_recusion(a, b, p):

    len_a = len(a)
    len_b = len(b)

    k = math.ceil(p/100 * (len_a + len_b))

    return kth(a, b, len_a, len_b, k, 0, 0)

def find_percentile(a, b, p):

    m = len(a)
    n = len(b)
    # first calculate the ordinal rank k = ⌈p/100 * (m + n)⌉ of the target in the merged array
    k = math.ceil(p/100 * (m + n))
    # initialize two pointers, one per array; each only moves from the beginning towards the end
    p1 = 0
    p2 = 0
    # while neither pointer has moved past the end of its array,
    # each pass increases one pointer by k//2 and decreases k by k//2,
    # and the target is always at or to the right of both pointers
    while p1 <= m and p2 <= n:
        # one of the arrays is exhausted: the target is in the other array, at pointer position + k - 1
        if p1 == m:
            return b[p2 + k - 1]
        if p2 == n:
            return a[p1 + k - 1]
        # when k is 1, we only need the first element of the merged remainder:
        # compare the elements at the pointer positions and return the smaller one
        if (k == 1): 
            return  a[p1] if (a[p1] < b[p2]) else b[p2]
        # k//2 is the number of elements we try to skip in one of the arrays
        curr = k // 2
        # if fewer than k//2 elements remain in `a`:
        # if the last element of `a` is less than the element of `b` at `p2 + curr - 1`,
        # the target must be in `b`, and its position is `p2 + (k - (m - p1) - 1)`;
        # otherwise increase `p2` by `k//2`, because the target lies to the right of those k//2 elements of `b`
        if (curr - 1) >= (m - p1):
            if (a[-1] < b[p2 + curr - 1]):
                return b[p2 + (k - (m - p1) - 1)]
            else:
                p2 = p2 + curr
        # if fewer than k//2 elements remain in `b`:
        # if the last element of `b` is less than the element of `a` at `p1 + curr - 1`,
        # the target must be in `a`, and its position is `p1 + (k - (n - p2) - 1)`;
        # otherwise increase `p1` by `k//2`, because the target lies to the right of those k//2 elements of `a`
        elif (curr - 1) >= (n - p2):
            if (b[-1] < a[p1 + curr - 1]):
                return a[p1 + (k - (n - p2) - 1)]
            else:
                p1 = p1 + curr
        # if both arrays have at least k//2 elements remaining:
        # if `a[curr + p1 - 1]` is less than `b[curr + p2 - 1]`,
        # increase `p1` by `k//2`, because the target lies to the right of those k//2 elements of `a`;
        # otherwise increase `p2` by `k//2`, because the target lies to the right of those k//2 elements of `b`
        else:
            if (a[curr + p1 - 1] < b[curr + p2 - 1]):
                p1 = p1 + curr
            else:
                p2 = p2 + curr
        # finally decrease k by k//2
        k = k - curr

def test_find_percentile(a, b, p, correct_answer):
    # run the solution and compare to the correct answer
    result = find_percentile(a, b, p)
    error_str = 'Test failed!\nInput: {0}, {1}, {2}\nOutput: {3}\nAnswer: {4}'
    assert result == correct_answer, error_str.format(a, b, p, result, correct_answer)
    

def run_unit_tests():
    # run several test_find_percentile for different tests
    test_cases = [
        ([1, 2, 7, 8, 10], [6, 12], 50, 7),
        ([1, 2, 7, 8], [6, 12], 50, 6),
        (range(100), range(100), 50, 49),
        ([15, 20, 35, 40, 50], [], 30, 20),
        ([], [15, 20, 35, 40, 50], 30, 20),
        ([15, 20], [25, 40, 50], 40, 20),
    ]

    for case in test_cases:
        test_find_percentile(*case)

    print('Unit test passed!')

def get_random_test(test_size, max_step):
    random_step_a = random.randint(1, max_step)
    random_step_b = random.randint(1, max_step)
    return range(0, test_size * random_step_a, random_step_a), range(0, test_size * random_step_b, random_step_b), random.randint(1, 100)


def run_stress_test(max_test_size=100, max_step = 10, max_attempt = 100):
    # run several random tests
    # don't forget to find the right answer with the reference solution
    random.seed(100)
    for test_size in range(10, max_test_size+1, 10):
        print('Test size:' , test_size // 10)
        for step in range(1, max_step):
            for attempt in range(max_attempt):
                # random_step = random.randint(1, step)
                # a = range(0, test_size, random_step)
                # b = range(0, test_size, random_step)
                # p = random.randint(0, 100)
                arguments = get_random_test(test_size, step)
                result = test_find_percentile(*arguments, find_percentile_refrence(*arguments))
    print('Stress test passed!')

# find_percentile works 10 seconds on the max test
def run_max_test():
    # generate arrays a and b of maximum possible sizes
    # len(a), len(b) <= 150000 for the problem
    # calculate working time for your solution, not reference!
    # run test_find_percentile for max_test

    random.seed(100)
    
    arguments = get_random_test(1500000, 10)

    start = time.time()

    result = find_percentile(*arguments)

    end = time.time()

    time_cost = end-start

    timeout_str = 'Max Test failed!\nInput: {0}, {1}, {2}\nTime cost {3}'

    assert time_cost < 0.1, timeout_str.format(*arguments, time_cost)

    correct_answer = find_percentile_refrence(*arguments)

    error_str = 'Max Test failed!\nInput: {0}, {1}, {2}\nOutput: {3}\nAnswer: {4}'
    assert result == correct_answer, error_str.format(*arguments, result, correct_answer)

    print('Max test passed')

# some test code
if __name__ == "__main__":

    run_unit_tests()
    run_stress_test()
    run_max_test()

Unit test passed!
Test size: 1
Test size: 2
Test size: 3
Test size: 4
Test size: 5
Test size: 6
Test size: 7
Test size: 8
Test size: 9
Test size: 10
Stress test passed!
Max test passed


__Solution explanation:__
```
def find_percentile(a, b, p):
```

>first calculate the ordinal rank k = ⌈p/100 * (m + n)⌉ of the target in the merged array

```
    m = len(a)
    n = len(b)
    k = math.ceil(p/100 * (m + n))
```

>initialize two pointers, one per array; each only moves from the beginning towards the end


```
    p1 = 0
    p2 = 0
```

>while neither pointer has moved past the end of its array,  
>each pass increases one pointer by `k//2` and decreases `k` by `k//2`,  
>and the target is always at or to the right of both pointers  

```
    while p1 <= m and p2 <= n:
```

>if one of the arrays is exhausted, the target is in the other array, at pointer position + k - 1
        
```
        if p1 == m:
            return b[p2 + k - 1]
        if p2 == n:
            return a[p1 + k - 1]
```
        
> when k is 1, we only need the first element of the merged remainder:
> compare the elements at the pointer positions and return the smaller one

```
        if (k == 1): 
            return  a[p1] if (a[p1] < b[p2]) else b[p2]
```

> `k//2` is the number of elements we try to skip in one of the arrays

```
        curr = k // 2
```

> if fewer than `k//2` elements remain in `a`:
> if the last element of `a` is less than the element of `b` at `p2 + curr - 1`,
> the target must be in `b`, and its position is `p2 + (k - (m - p1) - 1)`;
> otherwise increase `p2` by `k//2`, because the target lies to the right of those `k//2` elements of `b`

```
        if (curr - 1) >= (m - p1):
            if (a[-1] < b[p2 + curr - 1]):
                return b[p2 + (k - (m - p1) - 1)]
            else:
                p2 = p2 + curr
```
        
> if fewer than `k//2` elements remain in `b`:
> if the last element of `b` is less than the element of `a` at `p1 + curr - 1`,
> the target must be in `a`, and its position is `p1 + (k - (n - p2) - 1)`;
> otherwise increase `p1` by `k//2`, because the target lies to the right of those `k//2` elements of `a`

```
        elif (curr - 1) >= (n - p2):
            if (b[-1] < a[p1 + curr - 1]):
                return a[p1 + (k - (n - p2) - 1)]
            else:
                p1 = p1 + curr
```
        
> if both arrays have at least `k//2` elements remaining:
> if `a[curr + p1 - 1]` is less than `b[curr + p2 - 1]`,
> increase `p1` by `k//2`, because the target lies to the right of those `k//2` elements of `a`;
> otherwise increase `p2` by `k//2`, because the target lies to the right of those `k//2` elements of `b`
        
```
        else:
            if (a[curr + p1 - 1] < b[curr + p2 - 1]):
                p1 = p1 + curr
            else:
                p2 = p2 + curr
```
        
> finally decrease `k` by `k//2`
        
```
        k = k - curr
```

__Time complexity__: $O(\log(m+n))$. Each pass of the `while` loop discards `k//2` elements from one array and replaces `k` with `k - k//2`, so `k` is roughly halved every pass. The loop therefore runs about $\log_2 k$ times, and $k \le m+n$.
The rest of the code is $O(1)$.
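
To make the halving argument concrete, the following sketch (the helper `halving_steps` is illustrative and not part of the solution) counts how many times `k` can be replaced by `k - k//2` before it reaches 1. This bounds the number of pointer moves, and the count matches $\lceil \log_2 k \rceil$.

```python
def halving_steps(k):
    """Count updates k -> k - k // 2 until k reaches 1 (assumes k >= 1)."""
    steps = 0
    while k > 1:
        k -= k // 2
        steps += 1
    return steps

for k in (1, 2, 5, 1000, 3_000_000):
    # (k - 1).bit_length() equals ceil(log2(k)) for k >= 1
    print(k, halving_steps(k), (k - 1).bit_length())
```

Even for two arrays of 1,500,000 elements each, `k` is at most 3,000,000, so only a few dozen passes are needed.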

__Space complexity__: $O(m+n)$ counting the inputs, because `a` takes $O(m)$ and `b` takes $O(n)$; the working variables `m, n, k, p1, p2` take only $O(1)$.

- __Loop invariant__: two pointers and `k` are stored so that the target is the `k`-th smallest element among the elements at or to the right of the pointers; in particular the target is never to the left of a pointer. Each pass increases one pointer by `k//2` and decreases `k` by `k//2`.

- __Initialization__: Calculate `k = ceil(p/100*(m+n))` and initialize two pointers `p1` and `p2` at the start of the two arrays. Nothing has been discarded yet, so the target cannot be to the left of either pointer.

- __Maintenance__: Normally, at least `k//2` elements remain in both arrays. Compare the `k//2`-th remaining element of each array: the first `k//2` remaining elements of the array with the smaller such element can be discarded without losing the target. So we move that array's pointer to the right by `k//2` and use `k - k//2` as `k` in the next iteration.  
If fewer than `k//2` elements remain in array 1, we compare the last element of array 1 with the `k//2`-th remaining element of array 2. If the last element of array 1 is smaller, the target must be in the remaining part of array 2 and the loop can end. Otherwise, move the pointer on array 2 by `k//2` and use `k - k//2` as `k` in the next iteration.  
The symmetric rule applies to array 1 if fewer than `k//2` elements remain in array 2.

- __Termination__:  
If `k == 1`, compare the first remaining elements of arrays 1 and 2 and return the smaller one.  
When one of the arrays is exhausted, the target is in the other array, at index (its pointer) plus `k - 1`.  
If fewer than `k//2` elements remain in array 1 and the last element of array 1 is smaller than the `k//2`-th remaining element of array 2, then the target must be in the remaining part of array 2, and its index is (the pointer on array 2) plus `k` minus (the number of remaining elements of array 1) minus 1;  
if fewer than `k//2` elements remain in array 2 and the last element of array 2 is smaller than the `k//2`-th remaining element of array 1, then the target must be in the remaining part of array 1, and its index is (the pointer on array 1) plus `k` minus (the number of remaining elements of array 2) minus 1.  
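
The invariant can be checked by brute force on small inputs. The sketch below is a simplified variant of the loop, not the solution code itself: instead of the two special branches for a short array, it clamps the probe index to the last remaining element. The function name and the `assert` are additions for illustration, and the slice-and-sort check makes every pass linear, so it is meant for testing only.

```python
def kth_smallest_checked(a, b, k):
    """Return the k-th smallest (1-based) of two sorted lists, asserting the invariant.

    Assumes 1 <= k <= len(a) + len(b).
    """
    target = sorted(a + b)[k - 1]  # brute-force answer, used only by the check
    p1 = p2 = 0
    while True:
        # Invariant: the target is the k-th smallest of a[p1:] + b[p2:]
        assert sorted(a[p1:] + b[p2:])[k - 1] == target
        if p1 == len(a):
            return b[p2 + k - 1]
        if p2 == len(b):
            return a[p1 + k - 1]
        if k == 1:
            return min(a[p1], b[p2])
        half = k // 2
        # probe the half-th remaining element, or the last one if fewer remain
        i = min(p1 + half, len(a)) - 1
        j = min(p2 + half, len(b)) - 1
        if a[i] <= b[j]:
            k -= i - p1 + 1  # discard a[p1..i]
            p1 = i + 1
        else:
            k -= j - p2 + 1  # discard b[p2..j]
            p2 = j + 1

print(kth_smallest_checked([1, 2, 7, 8, 10], [6, 12], 4))  # 4th of 7 values: prints 7
```

If the invariant were ever violated, the `assert` would fail on the pass where the target was discarded, which points directly at the faulty branch.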