# Draw with replacement

Imagine we have an urn filled with numbered balls. We draw a number of these balls, but after each draw we place the drawn ball back in the urn. How many different numbers will we have seen after a given number of draws?

To make this precise, imagine the urn contains $N$ balls, labeled with the numbers $0, 1, \ldots, N - 1$. Let us refer to $N$ as the _urn size_. We draw $S$ balls, our _sample size_, placing each ball back in the urn after it is drawn. That is, each of the $S$ draws samples from all $N$ balls. How many different numbers do we expect to have seen in a sample of size $S$?
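
The question can also be answered exactly. A short derivation, assuming the draws are independent and each one is uniform over the $N$ balls: let $X_i$ be $1$ if ball $i$ is drawn at least once and $0$ otherwise. A given ball is missed by all $S$ draws with probability $(1 - 1/N)^S$, so by linearity of expectation

$$
\mathbb{E}\left[\sum_{i=0}^{N-1} X_i\right] = \sum_{i=0}^{N-1} \Pr(X_i = 1) = N\left(1 - \left(1 - \frac{1}{N}\right)^S\right).
$$

For $N = S = 5$ this evaluates to $5\,(1 - 0.8^5) = 5 \cdot 0.67232 = 3.3616$.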

```python
import numpy as np
```

## Setting urn and sample size

```python
N = 5  # Urn size
S = 5  # Sample size
```

## A first implementation
Before trying to calculate how many different numbers we expect to see, let us write a function that performs the experiment a given number of times and returns the average number of distinct numbers observed.

```python
def compute_expected_sample_size(urn_size: int, sample_size: int, no_samples: int = 1_000) -> float:
    """Estimates the expected number of distinct numbers seen when drawing ``sample_size``
    balls with replacement from an urn of ``urn_size`` balls, by simulating the
    experiment ``no_samples`` times."""
    # The urn is represented as a list of numbers 0, 1, ..., urn_size - 1
    urn = list(range(urn_size))
    
    # Run the experiment
    diff_numbers_in_sample = []
    for _ in range(int(no_samples)):
        sample = np.random.choice(urn, sample_size, replace=True)
        diff_numbers = len(set(sample))
        diff_numbers_in_sample.append(diff_numbers)
        
    # Compute and return mean
    return np.mean(diff_numbers_in_sample)
```

```python
# Let us run a single experiment
%time print(compute_expected_sample_size(N, S))
```

```
3.368
CPU times: user 23.7 ms, sys: 4.03 ms, total: 27.7 ms
Wall time: 23.4 ms
```
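
As a sanity check on the simulation, the small sketch below evaluates the closed form $N\left(1 - (1 - 1/N)^S\right)$, which follows from linearity of expectation: each ball is missed by all $S$ draws with probability $(1 - 1/N)^S$. It assumes independent, uniform draws with replacement; the helper `exact_expected_distinct` is hypothetical and not part of the original notebook.

```python
def exact_expected_distinct(urn_size: int, sample_size: int) -> float:
    """Closed-form expected number of distinct numbers: N * (1 - (1 - 1/N)**S).

    Hypothetical helper, assuming independent, uniform draws with replacement.
    """
    # Probability that one particular ball never shows up in ``sample_size`` draws
    p_missed = (1 - 1 / urn_size) ** sample_size
    # Linearity of expectation: sum P(ball i is seen) over all ``urn_size`` balls
    return urn_size * (1 - p_missed)


print(f"{exact_expected_distinct(5, 5):.4f}")  # prints 3.3616
```

For $N = S = 5$ this gives 3.3616, close to the simulated value of 3.368 above; the gap is consistent with the Monte Carlo noise of 1,000 repetitions.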


## Making it a bit faster

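Instead of drawing one sample per loop iteration, we can draw all `no_samples` samples in a single call to `np.random.choice`, as a 2D array with one sample per row, and then count the distinct numbers in each row with array operations.

```python
def compute_expected_sample_size(urn_size: int, sample_size: int, no_samples: int = 1_000) -> float:
    """Estimates the expected number of distinct numbers seen when drawing ``sample_size``
    balls with replacement from an urn of ``urn_size`` balls, by simulating the
    experiment ``no_samples`` times in a single vectorized call."""
    # The urn is represented as a list of numbers 0, 1, ..., urn_size - 1
    urn = list(range(urn_size))
    
    # Run all experiments at once: each row of ``all_samples`` is one sample
    all_samples = np.random.choice(urn, (int(no_samples), sample_size), replace=True)
    
    # Count the distinct numbers in each row: after sorting a row, every change
    # between neighbouring entries marks a new distinct number
    all_samples.sort(axis=1)
    diff_numbers_in_sample = 1 + np.count_nonzero(np.diff(all_samples, axis=1), axis=1)
    
    # Compute and return mean
    return np.mean(diff_numbers_in_sample)
```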
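
Counting the distinct numbers in each row of a 2D sample array without a Python loop can be done by sorting each row and counting where the value changes. The sketch below checks that trick against the straightforward `len(set(...))` count from the first implementation. It assumes every sample has at least one draw; `count_distinct_per_row` is a hypothetical helper, and `np.random.default_rng` with a fixed seed is used only to make the check reproducible.

```python
import numpy as np


def count_distinct_per_row(samples: np.ndarray) -> np.ndarray:
    """Number of distinct values in each row of a 2D integer array (hypothetical helper).

    After sorting a row, every position where the value changes marks the start
    of a new distinct number; the first entry always counts as one.
    """
    sorted_samples = np.sort(samples, axis=1)  # sorts a copy, the input is untouched
    changes = np.diff(sorted_samples, axis=1) != 0
    return 1 + np.count_nonzero(changes, axis=1)


# Compare against the set-based count on the same samples
rng = np.random.default_rng(0)
samples = rng.integers(0, 5, size=(1_000, 5))  # urn size 5, sample size 5
vectorized = count_distinct_per_row(samples)
with_sets = np.array([len(set(row)) for row in samples])
print(np.array_equal(vectorized, with_sets))  # prints True
```

Sorting costs $O(S \log S)$ per row, but all of it happens inside NumPy, which is where the speedup over the Python loop comes from.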