### How would you uniformly pick a random element from a gigantic stream? 

This is the problem that reservoir sampling solves. 

Naive solution:
- process the stream 
- store all the elements
- find its size
- pick a random index from $[0, size-1]$ and return the element at that index
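
The naive steps above can be sketched as follows. This is only a minimal illustration: the name `pick_naive` is hypothetical, and the sketch assumes the stream is a finite, non-empty iterable.

```python
import random

def pick_naive(stream):
    # Hypothetical helper: materialize the whole stream, which costs O(N) memory.
    elements = list(stream)
    size = len(elements)
    # random.randrange(size) picks an index uniformly from [0, size-1].
    index = random.randrange(size)
    return elements[index]

# A generator expression stands in for the stream here.
print(pick_naive(x * x for x in range(10)))
```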

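The problem with the naive solution is that it takes $O(N)$ space, which is prohibitive for a large $N$. We can do much better: constant space. The key is to maintain a loop invariant: after the $i$-th iteration (counting from 1), the stored element is a uniformly random pick from the first $i$ elements of the stream. To keep this true, on the $i$-th iteration replace the stored element with the new one with probability $1/i$. 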

See: https://stackoverflow.com/questions/3221577/what-is-a-loop-invariant

>A loop invariant is a condition [among program variables] that is necessarily true immediately before and immediately after each iteration of a loop. (Note that this says nothing about its truth or falsity part way through an iteration.)

>By itself, a loop invariant doesn't do much. However, given an appropriate invariant, it can be used to help prove the correctness of an algorithm. The simple example in CLRS probably has to do with sorting. For example, let your loop invariant be something like, at the start of the loop, the first i entries of this array are sorted. If you can prove that this is indeed a loop invariant (i.e. that it holds before and after every loop iteration), you can use this to prove the correctness of a sorting algorithm: at the termination of the loop, the loop invariant is still satisfied, and the counter i is the length of the array. Therefore, the first i entries are sorted means the entire array is sorted.
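
Applied to the stream problem, the invariant to prove is that after $n$ elements have been processed, each of them is the stored element with probability $1/n$. A short check, assuming the $k$-th element (counting from 1) replaces the stored one with probability $1/k$: element $j$ must be picked when it arrives and then survive every later step, so

$$P(\text{element } j \text{ is stored after } n \text{ elements}) = \frac{1}{j}\prod_{k=j+1}^{n}\left(1-\frac{1}{k}\right) = \frac{1}{j}\prod_{k=j+1}^{n}\frac{k-1}{k} = \frac{1}{j}\cdot\frac{j}{n} = \frac{1}{n}.$$

The product telescopes, so the result does not depend on $j$: every element is equally likely to be the one stored.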

For the actual reservoir sampling algorithm... 

See: https://en.wikipedia.org/wiki/Reservoir_sampling


In [20]:
import random 

def pick(big_stream): 
    random_element = None
    
    # i = counter (0-based)
    # e = value 
    # big_stream is any iterable, e.g. a massive list or a generator
    
    for i, e in enumerate(big_stream): 
        # random.randint(1,i+1) chooses uniformly from [1,i+1] inclusive,
        # so e replaces the stored element with probability 1/(i+1).
        # For i == 0 this always succeeds, so the first element is always stored.
        if random.randint(1,i+1) == 1: 
            random_element = e
            
    # return only after the whole stream has been consumed
    return random_element
    
# since we are only storing a single variable, this only takes up constant space
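
One way to sanity-check the uniformity claim is to run the sampler many times and count how often each element comes back. The sketch below is an assumption-laden illustration, not part of the original notebook: `pick_one` and `letters` are hypothetical names, `pick_one` is a stand-alone single-element reservoir sampler, and the seed is fixed only so the run is repeatable.

```python
import random
from collections import Counter

def pick_one(stream):
    # Hypothetical stand-alone version of single-element reservoir sampling.
    chosen = None
    for i, e in enumerate(stream):
        # Keep e with probability 1/(i+1); the first element is always kept.
        if random.randint(1, i + 1) == 1:
            chosen = e
    return chosen

def letters():
    # A generator stands in for a stream whose length is not known up front.
    yield from "abcde"

random.seed(0)  # fixed seed only so the run is repeatable
trials = 10_000
counts = Counter(pick_one(letters()) for _ in range(trials))
for letter in sorted(counts):
    # Each observed frequency should be close to 1/5 = 0.2.
    print(letter, counts[letter] / trials)
```

Because the sampler only ever holds one element, it works on the generator without first building a list.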

In [28]:
# random.randint(1,i+1) notes
# random.randint(1,5) picks uniformly from [1,5] inclusive
for i in range(10): 
    print(random.randint(1,5))


3
3
1
1
2
4
1
1
1
1


In [16]:
# Enumerate notes
# See: http://book.pythontips.com/en/latest/enumerate.html

for counter, value in enumerate([5,6,10,15,8]): 
    print(counter, value) 
    # print(value)

0 5
1 6
2 10
3 15
4 8


In [18]:
my_list = ['apple', 'banana', 'grape','pear']
for c, value in enumerate(my_list,5): 
    # the optional second argument (5 here) tells it where to start counting
    print(c,value) 

5 apple
6 banana
7 grape
8 pear


In [19]:
my_list = ['apple', 'banana', 'grape','pear']
counter_list = list(enumerate(my_list,1)) # creates a list of (counter, value) tuples
print(counter_list) 

[(1, 'apple'), (2, 'banana'), (3, 'grape'), (4, 'pear')]
