### 528. Random Pick with Weight

You are given a 0-indexed array of positive integers `w`, where `w[i]` is the weight of index `i`. **(Without loss of generality, `w[i]` can also be a float.)**

You need to implement the function `pickIndex()`, which randomly picks an index in the range `[0, w.length - 1]` (inclusive) and returns it. The probability of picking an index `i` is `w[i] / sum(w)`.

For example, if `w = [1, 3]`, the probability of picking index 0 is 1 / (1 + 3) = 0.25 (i.e., 25%), and the probability of picking index 1 is 3 / (1 + 3) = 0.75 (i.e., 75%).


<ins>Logic</ins>

Use the CDF-based sampling technique from statistics (inverse transform sampling):

- Build the cumulative sums of `w`, called `w_cum` (stored as `self.cum_w` in the code below)

- Randomly generate a number `x` from `[0, w_cum[-1])`

- Find the first index `i` in `w_cum` such that `w_cum[i] > x`; this is the index to pick
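
The steps above work because index `i` is chosen exactly when `w_cum[i-1] <= x < w_cum[i]`, an interval of length `w[i]` inside `[0, sum(w))`, so a uniform `x` lands there with probability `w[i] / sum(w)`. A minimal sketch of the lookup step with fixed values of `x`, assuming the standard-library `bisect_right` in place of a hand-written binary search (`index_for` is a hypothetical helper name, not part of the solution below):

```python
from bisect import bisect_right
from itertools import accumulate


def index_for(x, w_cum):
    """Return the first index i with w_cum[i] > x (hypothetical helper)."""
    # bisect_right gives the insertion point after any entries equal to x,
    # which is the position of the first entry strictly greater than x.
    return bisect_right(w_cum, x)


w = [1, 3]
w_cum = list(accumulate(w))  # [1, 4]

# Index 0 owns [0, 1) and index 1 owns [1, 4).
picks = [index_for(x, w_cum) for x in (0.0, 0.5, 1.0, 3.9)]
print(picks)  # [0, 0, 1, 1]
```

Zero weights need no special handling in the lookup: a zero weight repeats the previous cumulative value, so its interval is empty and it can never be the first entry greater than `x`.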

In [4]:
import random

class WeightedPick:

    def __init__(self, w):
        '''
        Build the cumulative weight list and validate w once
        '''
        # O(N)
        self.w = w
        self.cum_w = []
        prev = 0
        for weight in w:
            self.cum_w.append(weight + prev)
            prev = self.cum_w[-1]

        # edge cases, checked here so pickIndex stays O(logN):
        # 1. w is an empty list
        # 2. w has negative values (check w itself: cum_w can hide them, e.g. [5, -1])
        # 3. all weights are zero
        self.valid = bool(w) and min(w) >= 0 and self.cum_w[-1] > 0

    def pickIndex(self):
        '''
        Binary search for the first index s.t. cum_w[i] > random number
        '''
        # O(logN)

        # return -1 for any of the edge cases detected in __init__
        if not self.valid:
            return -1

        # random number gen
        random_num = random.random() * self.cum_w[-1]

        # find index using BS
        start, end = 0, len(self.cum_w) - 1
        while start <= end:
            mid = (start + end) // 2
            if self.cum_w[mid] > random_num:
                end = mid - 1
            else:
                start = mid + 1
        
        # cum_w[-1] > random_num always holds, so the search always ends
        # with start at a valid index and no extra handling is needed
        return start

    def compare_distn(self, n_sim=100000):
        # print the target distribution and the simulated one, rounded to 2 d.p.
        total = sum(self.w)
        normalize_w = [round(weight / total, 2) for weight in self.w]
        
        freqs = [0] * len(normalize_w)
        for _ in range(n_sim):
            freqs[self.pickIndex()] += 1
        distn = [round(freq / n_sim, 2) for freq in freqs]
        
        print(normalize_w, distn, sep='\n')
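
As an independent cross-check on the simulation in `compare_distn`, the same target distribution can be sampled with the standard library's `random.choices`, which accepts a `weights` argument. This is a sketch of an alternative, not the method used in the class above; `empirical_distribution` and the fixed seed are assumptions made for illustration and repeatability.

```python
import random
from collections import Counter


def empirical_distribution(w, n_sim=100_000, seed=0):
    """Estimate pick frequencies with random.choices (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatable runs
    picks = rng.choices(range(len(w)), weights=w, k=n_sim)
    counts = Counter(picks)
    return [counts[i] / n_sim for i in range(len(w))]


w = [1, 2, 3.5, 4, 5]
expected = [weight / sum(w) for weight in w]
observed = empirical_distribution(w)

# Print target probability next to the simulated frequency for each index.
for e, o in zip(expected, observed):
    print(f"{e:.3f}  {o:.3f}")
```

If the frequencies from `pickIndex` and from `random.choices` both sit close to `w[i] / sum(w)`, that is reasonable evidence the hand-written binary search is picking the intended index.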

In [5]:
print('Test: Edge Case 1')
w = []
test = WeightedPick(w)
print(w, test.pickIndex(), sep='\n')

print('\nTest: Edge Case 2')
w = [1, 2, -10, 4, 0]
test = WeightedPick(w)
print(w, test.pickIndex(), sep='\n')

print('\nTest: Edge Case 3')
w = [0, 0, 0, 0]
test = WeightedPick(w)
print(w, test.pickIndex(), sep='\n')

print('\nTest: normal case')
w = [1, 2, 3.5, 4, 5]
test = WeightedPick(w)
test.compare_distn()

print('\nTest: len(w) == 1')
w = [3]
test = WeightedPick(w)
test.compare_distn()

print('\nTest: contain 0')
w = [0, 1, 2, 0, 10]
test = WeightedPick(w)
test.compare_distn()


Test: Edge Case 1
[]
-1

Test: Edge Case 2
[1, 2, -10, 4, 0]
-1

Test: Edge Case 3
[0, 0, 0, 0]
-1

Test: normal case
[0.06, 0.13, 0.23, 0.26, 0.32]
[0.06, 0.13, 0.22, 0.26, 0.32]

Test: len(w) == 1
[1.0]
[1.0]

Test: contain 0
[0.0, 0.08, 0.15, 0.0, 0.77]
[0.0, 0.08, 0.15, 0.0, 0.77]
