# More Python Exercises

## Problem 0 - Fast Searching in Ordered Collections (10 points)

In [1]:
# Suppose you're given a list of already sorted numbers:
A = [2, 16, 26, 32, 52, 71, 80, 88]

# These are already sorted:
assert A == sorted(A)

In [2]:
# Check to see if an element exists in the list:
def contains(A, x):
    """Returns True only if the value `x` exists in `A`."""
    return x in A

print("A contains 32: {}".format(contains(A, 32)))
print("A contains 7: {}".format(contains(A, 7)))
print("A contains -10: {}".format(contains(A, -10)))

A contains 32: True
A contains 7: False
A contains -10: False


This method works and is reasonably fast on small lists. However, for a list, `x in A` scans the elements one at a time, so in the worst case it examines every element. On a very large list, that is wasteful.

That's because it does **not** take advantage of the fact that `A` is already ordered. In such a case, it should be easier to determine whether the element exists. (How?)
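One way to exploit the ordering is binary search: compare `x` with the middle element and discard the half that cannot contain it, so the number of comparisons grows roughly with the logarithm of the list length rather than with the length itself. The sketch below only illustrates that idea; the name `binary_search_contains` is hypothetical and not part of the exercise.

```python
def binary_search_contains(S, x):
    """Return True if `x` occurs in the sorted list `S` (illustrative sketch)."""
    lo, hi = 0, len(S)          # search the half-open range S[lo:hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if S[mid] < x:
            lo = mid + 1        # x can only lie to the right of mid
        elif S[mid] > x:
            hi = mid            # x can only lie to the left of mid
        else:
            return True         # S[mid] == x
    return False                # the range became empty, so x is absent

A = [2, 16, 26, 32, 52, 71, 80, 88]
print(binary_search_contains(A, 32))
print(binary_search_contains(A, 7))
```

Each pass through the loop halves the remaining range, so a list of a million elements needs about twenty comparisons.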

**Exercise 0** (3 + 7 == 10 points). Write a function, `ordered_contains(S, x)`, that takes an **already sorted** list, `S`, as input, and determines whether it contains `x`. But there is one more condition: your method **must** be **at least ten times faster** than `contains()` for "large" lists!

In particular, there are two test cells for this exercise. The first one checks that your procedure does, indeed, return the correct result by comparing its output to `contains()`, which we will assume is correct. Correctness is worth three (3) points of partial credit out of ten (10). The second test cell checks whether your implementation is at least ten times faster than `contains()` for a relatively large, but ordered, list. If your implementation is slower for smaller lists, that is okay!

> **Hint.** If you can find a standard Python library routine to help you, by all means, use it!

In [3]:
import bisect

In [6]:
bisect.bisect_left(A, 32)

3

In [11]:
A[3] == 32

True

In [8]:
bisect.bisect_left(A, -10)

0

In [10]:
A[0] == -10

False

In [12]:
bisect.bisect_left(A, 1000000)

8

In [13]:
A[8]

IndexError: list index out of range

In [14]:
len(A)

8
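The cells above suggest three cases for `bisect.bisect_left(S, x)`: the returned index points at `x` if it is present, at the first larger element if `x` is absent, or equals `len(S)` if `x` exceeds every element. Here is a small sketch of that behaviour on the same list `A`; the helper name `describe` is hypothetical.

```python
import bisect

A = [2, 16, 26, 32, 52, 71, 80, 88]

def describe(S, x):
    """Classify where bisect_left places `x` in the sorted list `S`."""
    i = bisect.bisect_left(S, x)
    if i == len(S):
        return "past the end"         # indexing S[i] would raise IndexError
    if S[i] == x:
        return "found"
    return "insertion point only"     # S[i] is the first element greater than x

for x in (32, -10, 7, 1000000):
    print(x, bisect.bisect_left(A, x), describe(A, x))
```

The "past the end" case is why a membership test built on `bisect_left` has to check the index against `len(S)` before indexing.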

In [21]:
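# Write your code into this function here:
def ordered_contains(S, x):
    """Returns True only if `x` exists in the already sorted list `S`."""
    # `bisect_left` returns the leftmost position where `x` could be
    # inserted while keeping `S` sorted. That position can equal `len(S)`,
    # so check the bounds before indexing.
    x_index = bisect.bisect_left(S, x)
    if x_index < len(S):
        return S[x_index] == x
    else:
        return False
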
print("A contains 32: {}".format(ordered_contains(A, 32)))
print("A contains 7: {}".format(ordered_contains(A, 7)))
print("A contains -10: {}".format(ordered_contains(A, -10)))
print("\n(Did those results match the earlier example?)")

A contains 32: True
A contains 7: False
A contains -10: False

(Did those results match the earlier example?)


In [22]:
# Test cell: `test_is_correct` (3 points)

from random import randint, sample

def gen_list(n, v_max, v_min=0):
    return sample(range(v_min, v_max), n)

def gen_sorted_list(n, v_max, v_min=0):
    return sorted(gen_list(n, v_max, v_min))

def check_case(S, x):
    msg = "`contains(S, {}) == {}` while `ordered_contains(S, {}) == {}`!"
    true_solution = contains(S, x)
    your_solution = ordered_contains(S, x)
    # The message has four placeholders: x, expected result, x, actual result.
    assert your_solution == true_solution, msg.format(x, true_solution, x, your_solution)

S = gen_sorted_list(13, 100)
print("Checking your code on this input: S = {}".format(S))

check_case(S, S[0])
check_case(S, S[0]-1)
check_case(S, S[-1])
check_case(S, S[-1]+1)

for x in gen_list(50, 100, -100):
    check_case(S, x)
print("\n(Passed basic correctness checks.)")

print("\nTiming `contains()`...")
x = randint(-100, 100)
%timeit contains(S, x)

print("\nTiming `ordered_contains()`...")
%timeit ordered_contains(S, x)

print("\n(This problem is small, so it's okay if your method is slower.)")
print("\n(Passed!)")

Checking your code on this input: S = [2, 15, 29, 30, 34, 41, 42, 48, 56, 85, 89, 94, 99]

(Passed basic correctness checks.)

Timing `contains()`...
241 ns ± 17.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Timing `ordered_contains()`...
628 ns ± 8.02 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

(This problem is small, so it's okay if your method is slower.)

(Passed!)


In [23]:
# Test cell: `test_is_faster` (7 points)

N_MIN = 1000000
N_MAX = 2*N_MIN
R_MAX = max(10*N_MAX, 1000000000)

n = randint(N_MIN, N_MAX)
print("Generating a list of size n={}...".format(n))

S_large = gen_sorted_list(n, R_MAX)

print("Quick correctness check...")
x = randint(-R_MAX, R_MAX)
check_case(S_large, x)
print("\n(Passed.)")

print("\nTiming `contains()`...")
t_baseline = %timeit -o contains(S_large, x)
print("\nTiming `ordered_contains()`...")
t_better = %timeit -o ordered_contains(S_large, x)

speedup = t_baseline.average / t_better.average
assert speedup >= 10, "Your method was only {:.2f}x faster (< 1 means it was slower)!".format(speedup)

print("\n(Passed -- you were {:.1f}x faster!)".format(speedup))

Generating a list of size n=1740566...
Quick correctness check...

(Passed.)

Timing `contains()`...
136 ms ± 6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Timing `ordered_contains()`...
1.09 µs ± 47.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

(Passed -- you were 124239.4x faster!)
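`%timeit` is an IPython magic, so the test cell above does not run as a plain Python script. What follows is a rough sketch of a similar comparison using the standard `timeit` module. The list size, the searched value, and the loop counts are illustrative assumptions rather than the values used by the test cell, and both functions are redefined so the block is self-contained.

```python
import bisect
import timeit

def contains(A, x):
    return x in A

def ordered_contains(S, x):
    i = bisect.bisect_left(S, x)
    return i < len(S) and S[i] == x

S_demo = list(range(0, 400_000, 2))   # 200,000 sorted even numbers (illustrative)
x = 399_999                           # odd, so absent: the linear scan must visit every element

assert contains(S_demo, x) == ordered_contains(S_demo, x)

# timeit.timeit returns the total seconds for `number` calls; divide for per-call time.
t_linear = timeit.timeit(lambda: contains(S_demo, x), number=20) / 20
t_bisect = timeit.timeit(lambda: ordered_contains(S_demo, x), number=20_000) / 20_000

print("speedup: {:.1f}x".format(t_linear / t_bisect))
```

The exact speedup depends on the machine and on where `x` falls in the list, so treat the printed number as indicative only.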
