# Linear-Time Selection

* Finding Max/Min
* Randomized Lin-Select

In [2]:
import random
import numpy as np

## Finding Maximum and Minimum

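* Brute force: $2(n-1) = 2n-2 = O(n)$ comparisons (i.e. 2 comparisons per element after the first)
* Pair-per-time: about $3(n/2) = O(n)$ comparisons (i.e. 3 comparisons per pair: compare the two elements with each other, then the smaller against the minimum and the larger against the maximum)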
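To make the two comparison counts concrete, here is a minimal sketch that instruments both strategies with a counter. The names `count_brute` and `count_pairs` are hypothetical helpers introduced only for this illustration; they are not the timed implementations in this notebook.

```python
import random

def count_brute(arr):
    """Return (max, min, comparisons) using two comparisons per element."""
    maximum = minimum = arr[0]
    comparisons = 0
    for elem in arr[1:]:
        comparisons += 2
        if elem > maximum:
            maximum = elem
        if elem < minimum:
            minimum = elem
    return maximum, minimum, comparisons

def count_pairs(arr):
    """Return (max, min, comparisons) using three comparisons per pair."""
    n = len(arr)
    comparisons = 0
    if n % 2:                      # odd length: seed with the first element
        maximum = minimum = arr[0]
        start = 1
    else:                          # even length: seed with the first pair
        comparisons += 1
        if arr[0] > arr[1]:
            maximum, minimum = arr[0], arr[1]
        else:
            maximum, minimum = arr[1], arr[0]
        start = 2
    for i in range(start, n, 2):
        a, b = arr[i], arr[i + 1]
        comparisons += 3
        if a < b:
            small, large = a, b
        else:
            small, large = b, a
        if small < minimum:
            minimum = small
        if large > maximum:
            maximum = large
    return maximum, minimum, comparisons

data = list(range(10000))
random.shuffle(data)
print(count_brute(data))   # (9999, 0, 19998)
print(count_pairs(data))   # (9999, 0, 14998)
```

Fewer comparisons do not automatically mean a faster run in pure Python, since indexing and loop overhead can outweigh the saved comparisons.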

In [3]:
arr = list(range(10000))  # list() so that shuffle also works on Python 3
random.shuffle(arr)

### Brute Force

In [26]:
def maxmin_brute(arr):
    maximum, minimum = arr[0], arr[0]
    for elem in arr:
        if elem > maximum:
            maximum = elem
        if elem < minimum:
            minimum = elem
    return maximum, minimum

In [27]:
%%timeit

maxmin_brute(arr)

The slowest run took 16.66 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 222 µs per loop


In [16]:
print('max: %6f | min: %6f' % maxmin_brute(arr))

max: 9999.000000 | min: 0.000000


### Pair-per-time

In [23]:
def maxmin_pair(arr, arrlen, maximum, minimum):
    # Process elements two at a time: 3 comparisons per pair.
    for i in range(0, arrlen, 2):
        a, b = arr[i], arr[i+1]
        if a < b:
            if a < minimum: minimum = a
            if b > maximum: maximum = b
        else:
            if b < minimum: minimum = b
            if a > maximum: maximum = a
    return maximum, minimum

def call_maxmin_pair(arr):
    arrlen = len(arr)
    # Odd length: seed max and min with the first element.
    if arrlen % 2 != 0:
        return maxmin_pair(arr[1:], arrlen-1, arr[0], arr[0])
    # Even length: seed max and min with the first pair (1 comparison).
    if arr[0] > arr[1]:
        return maxmin_pair(arr[2:], arrlen-2, arr[0], arr[1])
    return maxmin_pair(arr[2:], arrlen-2, arr[1], arr[0])

In [28]:
%%timeit

call_maxmin_pair(arr)

1000 loops, best of 3: 346 µs per loop


In [29]:
print('max: %6f | min: %6f' % call_maxmin_pair(arr))

max: 9999.000000 | min: 0.000000


## Randomized Lin-Select (avg. linear time)

* Sorting solution: sort the array (e.g. with quicksort), then return the $i$th element.
* Randomized select: partition as in quicksort, but recurse into only one side.

In [110]:
def gen_arr(n):
    arr = list(range(0,2*n,2))  # list() so that shuffle also works on Python 3
    random.shuffle(arr)
    return arr

### Select with Randomized QuickSort

* Procedure:
    * Sort array with QuickSort;
    * Return the $i$th element.
* Complexity: $O(n \log n)$ on average.
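As a reference point for checking results, the sort-then-index procedure can be sketched with Python's built-in `sorted`. Note that `sorted` uses Timsort rather than quicksort, and `sorted_select` is a hypothetical helper name used only here; the rank `i` is assumed to be 1-based, as elsewhere in this notebook.

```python
def sorted_select(arr, i):
    """Return the i-th smallest element of arr (1-based) by sorting a copy."""
    if not 1 <= i <= len(arr):
        raise IndexError("i must be between 1 and len(arr)")
    return sorted(arr)[i - 1]

values = [8, 2, 6, 0, 4]
print(sorted_select(values, 2))  # 2
```

Because it sorts a copy, the input list is left untouched, which makes it convenient for cross-checking an in-place selection routine.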

In [153]:
def partition(arr, l, r):
    rand_id = random.randrange(l, r)  # random pivot index in the half-open range [l, r)
    arr[l],arr[rand_id] = arr[rand_id],arr[l]
    piv = arr[l]
    i = l+1
    for j in range(l+1,r): 
        if arr[j] < piv:
            arr[i],arr[j] = arr[j],arr[i]
            i += 1
    arr[l],arr[i-1] = arr[i-1],arr[l]
    return i-1

def quicksort(arr, l, r):
    if l<r:
        piv_id = partition(arr, l, r)
        quicksort(arr, l, piv_id)
        quicksort(arr, piv_id+1, r)
        
def quicksort_select(arr, i):
    # Sorts arr in place; i is 1-based (i=1 is the smallest element).
    quicksort(arr, 0, len(arr))
    return arr[i-1] if i-1>=0 else 0

In [154]:
%%timeit

quicksort_select(gen_arr(1000), 125)

1000 loops, best of 3: 1.62 ms per loop


### Randomized Lin-Select

* Procedure:
    * Partition as in Quicksort;
    * Let $k$ be the rank of the pivot within the current subarray. Recurse into the left subarray if $i<k$, into the right subarray (looking for rank $i-k$) if $i>k$, and return the pivot if $i=k$ (i.e. the pivot is the $i$th smallest element).
* Complexity: $O(n)$ on average.
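The same procedure can also be written as a loop instead of a recursion. The following is a minimal sketch under a few assumptions: `quickselect_iter` is a hypothetical name, `i` is 1-based, bounds are half-open, and the function works on a copy so the caller's list is not reordered.

```python
import random

def quickselect_iter(arr, i):
    """Return the i-th smallest element (1-based) of arr; works on a copy."""
    if not 1 <= i <= len(arr):
        raise IndexError("i must be between 1 and len(arr)")
    a = list(arr)
    lo, hi = 0, len(a)          # current search range is the half-open [lo, hi)
    target = i - 1              # 0-based position of the answer in sorted order
    while True:
        # Move a randomly chosen pivot to the front of the range.
        p = random.randrange(lo, hi)
        a[lo], a[p] = a[p], a[lo]
        pivot = a[lo]
        boundary = lo + 1       # a[lo+1:boundary] holds the elements < pivot
        for j in range(lo + 1, hi):
            if a[j] < pivot:
                a[boundary], a[j] = a[j], a[boundary]
                boundary += 1
        # Put the pivot into its final sorted position.
        a[lo], a[boundary - 1] = a[boundary - 1], a[lo]
        pos = boundary - 1
        if target == pos:
            return a[pos]
        elif target < pos:
            hi = pos            # continue in the left part only
        else:
            lo = pos + 1        # continue in the right part only

data = list(range(0, 2000, 2))
random.shuffle(data)
print(quickselect_iter(data, 125))  # 248
```

Only one side of each partition is examined further, which is what separates selection from a full quicksort.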

In [124]:
def partition(arr, l, r):
    rand_id = random.randrange(l, r)  # random pivot index in the half-open range [l, r)
    arr[l],arr[rand_id] = arr[rand_id],arr[l]
    piv = arr[l]
    i = l+1
    for j in range(l+1,r): 
        if arr[j] < piv:
            arr[i],arr[j] = arr[j],arr[i]
            i += 1
    arr[l],arr[i-1] = arr[i-1],arr[l]
    return i-1

def rand_linselect(arr, l, r, i):
    if r-l==1:  # one element left in the half-open range [l, r)
        return arr[l]
    piv = partition(arr, l, r)
    k = piv - l + 1 # k: #elems in the subarray arr[l..piv]
    if i==k:
        return arr[piv]
    elif i<k:
        return rand_linselect(arr, l, piv, i)
    else:
        return rand_linselect(arr, piv+1, r, i-k)

In [162]:
%%timeit

rand_linselect(gen_arr(1000), 0, 1000, 125)

The slowest run took 7.51 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 303 µs per loop
