### Basic settings

**Source:** https://www.facebook.com/mameaw14/posts/1783992211619138

There are $N$ members.  Each member has $K$ types of pictures.  Each pack consists of $M$ pictures of distinct members.  We want to buy $L$ packs of pictures.

### Number of members (Uniq)

We first calculate the expected number of distinct members whose pictures you get from buying $L$ packs.  Since we are interested only in which members appear, the picture types can be ignored.

Let's consider member $i$.  For each pack, the probability that you get member $i$'s picture is exactly

$$M/N.$$
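
One way to see this, assuming each pack is a uniformly random set of $M$ distinct members (an assumption; the source does not spell out how packs are drawn): among the ${N \choose M}$ equally likely packs, ${N-1 \choose M-1}$ contain member $i$, so the probability is

$$\frac{{N-1 \choose M-1}}{{N \choose M}} = \frac{(N-1)!}{(M-1)!\,(N-M)!}\cdot\frac{M!\,(N-M)!}{N!} = \frac{M}{N}.$$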

Thus, if we buy $L$ packs and the packs are drawn independently, the probability that none of them contains member $i$'s picture is

$$(1 - M/N)^L.$$

For each member $i$, let the indicator random variable $X_i$ be 1 if you get at least one picture of member $i$, and 0 otherwise.  Therefore,

$$E[X_i] = Pr[X_i=1] = 1 - (1-M/N)^L.$$

Let the random variable $X$ be the number of members for whom you get at least one picture.  Note that

$$X = \sum_{i=1}^{N} X_i.$$

Therefore, using linearity of expectation, we have

$$E[X] = E\left[\sum_{i=1}^{N} X_i\right] = \sum_{i=1}^{N} E[X_i] = N\left(1-(1-M/N)^L\right) = N - N(1-M/N)^L$$

#### Let's see the numbers

Let's try to plug in $N=27$ and $M=5$, and calculate $E[X]$ for various values of $L$.

In [1]:
def uniq(n,m,l):
    # Expected number of distinct members after l packs: N - N(1 - M/N)^L.
    return n - n*((1-m/n)**l)

In [2]:
# Print every value up to 10 packs, then every 5th one.
for i in range(1,101):
    if (i <= 10) or (i % 5 == 0):
        print("#sets = %3d, exp. uniq = %.5f" % (i,uniq(27,5,i)))

#sets =   1, exp. uniq = 5.00000
#sets =   2, exp. uniq = 9.07407
#sets =   3, exp. uniq = 12.39369
#sets =   4, exp. uniq = 15.09856
#sets =   5, exp. uniq = 17.30253
#sets =   6, exp. uniq = 19.09836
#sets =   7, exp. uniq = 20.56163
#sets =   8, exp. uniq = 21.75392
#sets =   9, exp. uniq = 22.72541
#sets =  10, exp. uniq = 23.51700
#sets =  15, exp. uniq = 25.74903
#sets =  20, exp. uniq = 26.55069
#sets =  25, exp. uniq = 26.83862
#sets =  30, exp. uniq = 26.94204
#sets =  35, exp. uniq = 26.97918
#sets =  40, exp. uniq = 26.99252
#sets =  45, exp. uniq = 26.99731
#sets =  50, exp. uniq = 26.99904
#sets =  55, exp. uniq = 26.99965
#sets =  60, exp. uniq = 26.99988
#sets =  65, exp. uniq = 26.99996
#sets =  70, exp. uniq = 26.99998
#sets =  75, exp. uniq = 26.99999
#sets =  80, exp. uniq = 27.00000
#sets =  85, exp. uniq = 27.00000
#sets =  90, exp. uniq = 27.00000
#sets =  95, exp. uniq = 27.00000
#sets = 100, exp. uniq = 27.00000
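
As a rough cross-check of the formula, here is a minimal Monte Carlo sketch. It assumes each pack is a uniformly random set of `m` distinct members, drawn independently of the other packs. The name `simulate_uniq` and its `trials` and `seed` parameters are introduced here for illustration and are not part of the original notebook.

```python
import random

def uniq(n, m, l):
    # Closed-form expectation from the derivation above.
    return n - n * (1 - m / n) ** l

def simulate_uniq(n, m, l, trials=5000, seed=0):
    # Assumption: every pack is a uniformly random m-subset of the n members.
    rng = random.Random(seed)
    total = 0
    for _trial in range(trials):
        seen = set()
        for _pack in range(l):
            seen.update(rng.sample(range(n), m))
        total += len(seen)
    # Average number of distinct members over all simulated purchases.
    return total / trials

if __name__ == "__main__":
    for l in (1, 5, 10, 20):
        print(l, uniq(27, 5, l), simulate_uniq(27, 5, l))
```

With enough trials, the simulated average should land close to `uniq(27, 5, l)` for the same `l`.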


### Number of members with comp (u_comp)

Next, let's consider the number of members for whom you get at least one complete set, i.e., all $K$ types of that member's pictures.  The resulting formula is fairly involved, but each step is not hard to follow.

#### Single member $i$

Suppose we get $P$ pictures of member $i$.  What is the probability that they form a complete set?  It is hard to deal with general $K$, but for $K=3$, with each picture equally likely to be any of the three types, we have a simple formula.  For $P\geq 1$, there are 2 cases in which you do not get a complete set:

**Case 1:** You get exactly one type.  This occurs with probability $3\cdot (1/3)^P$.  (There are 3 choices for the type, and every picture has to be of that type.)

**Case 2:** You get exactly two types.  Let's consider the probability that you get both type-1 and type-2 pictures and nothing else.  With probability $(2/3)^P$ you do not get any type-3 pictures.  However, this includes the outcomes where all pictures are type-1 or all are type-2, each of which has probability $(1/3)^P$.  Excluding both possibilities, the desired probability is

$$(2/3)^P - 2(1/3)^P.$$

There are ${3 \choose 2}=3$ possible pairs of types, thus Case 2 occurs with probability 

$$3\left((2/3)^P - 2(1/3)^P\right).$$

Since the two cases are disjoint, we can add their probabilities.  Hence we get a complete set with probability

$$1 - \left(3\cdot (1/3)^P + 3\left((2/3)^P - 2(1/3)^P\right)\right) = 1 - 3(2/3)^P + 3(1/3)^P.$$

Let's call this quantity $q(P)$; for $P=0$ we set $q(0)=0$, since with no pictures there is no complete set.  We will use $q(P)$ to find the probability that, when buying $L$ packs, you get at least one complete set of member $i$.  Let $A_P$ be the event that you get exactly $P$ pictures of member $i$.  Let $B$ be the event that you get at least one complete set of member $i$.  We want to compute

$$Pr[B]=Pr\left[\bigcup_{P=0}^{L} B\cap A_P\right].$$

Since the events $B\cap A_P$ in the union are pairwise disjoint, we have that

$$Pr[B]=\sum_{P=0}^{L} Pr\left[B\cap A_P\right]=\sum_{P=0}^{L} Pr[B|A_P]\cdot Pr[A_P].$$

Recall that $Pr[B|A_P]=q(P)$ and that the number of pictures of member $i$ that you get is a binomial random variable with parameters $(L,M/N)$.  Hence,

$$Pr[B]=\sum_{P=0}^{L} q(P)\cdot {L \choose P}(M/N)^P(1-M/N)^{L-P}.$$

#### All members

To get the expected number of members for whom you get complete sets, we define, for each member $i$, an indicator random variable $Y_i$ that is 1 iff you get a complete set of member $i$, and let the random variable $Y$ be the number of members for whom you get complete sets.  Thus, $Y=\sum_{i=1}^N Y_i$.

From the previous section, we have $Pr[Y_i=1]=Pr[B]$.

We then have

$$E[Y] = E\left[\sum_{i=1}^{N} Y_i\right] = \sum_{i=1}^N E[Y_i] = N\cdot Pr[B]
= N\left(\sum_{P=0}^{L} q(P)\cdot {L \choose P}(M/N)^P(1-M/N)^{L-P}\right).$$
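
The sum can apparently be collapsed with the binomial theorem. Writing $q(P) = 1 - 3(2/3)^P + 3(1/3)^P$ for $P \geq 1$ and correcting for $q(0)=0$, each term of the form $a^P$ sums to $(1 - M/N + a\cdot M/N)^L$, which suggests

$$Pr[B] = 1 - 3\left(1-\frac{M}{3N}\right)^L + 3\left(1-\frac{2M}{3N}\right)^L - \left(1-\frac{M}{N}\right)^L.$$

The sketch below compares this closed form with the original sum numerically. The function names `prb_sum` and `prb_closed` are hypothetical and introduced only for this check. It uses `math.comb` from the standard library rather than SciPy.

```python
from math import comb

def q(p):
    # Probability that p pictures of one member cover all 3 types (q(0) = 0).
    if p == 0:
        return 0.0
    return 1 - 3 * (2 / 3) ** p + 3 * (1 / 3) ** p

def prb_sum(l, n=27, m=5):
    # Pr[B] as the sum over P ~ Binomial(l, m/n), as in the text.
    r = m / n
    return sum(q(p) * comb(l, p) * r ** p * (1 - r) ** (l - p)
               for p in range(l + 1))

def prb_closed(l, n=27, m=5):
    # Candidate closed form obtained by applying the binomial theorem.
    r = m / n
    return 1 - 3 * (1 - r / 3) ** l + 3 * (1 - 2 * r / 3) ** l - (1 - r) ** l

if __name__ == "__main__":
    for l in (3, 10, 50, 100):
        print(l, 27 * prb_sum(l), 27 * prb_closed(l))
```

The closed form can also be read as inclusion-exclusion over the three types: $(1-M/(3N))^L$ is the probability of never getting one particular type of member $i$ across $L$ packs.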

#### Let's see the numbers

We will plug in, again, $N=27$ and $M=5$.

In [3]:
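def q(p):
    """Probability that p pictures of one member cover all 3 types."""
    if p == 0:
        # The case analysis assumes p >= 1; with no pictures there is no complete set.
        return 0
    # 1 - (Case 1: exactly one type) - (Case 2: exactly two types)
    return 1 - (3*(1/3**p) + 3*((2**p/3**p) - 2*(1/3**p)))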

In [4]:
# Let's see if q makes sense.

print([(i,q(i)) for i in range(0,10)])

[(0, 0), (1, 0.0), (2, 0.0), (3, 0.22222222222222232), (4, 0.4444444444444444), (5, 0.6172839506172839), (6, 0.7407407407407407), (7, 0.8257887517146776), (8, 0.8834019204389575), (9, 0.9221155311690291)]


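As a sanity check, if $P=3$, a complete set requires the three pictures to be of three different types, which happens with probability $3!/3^3 = 6/27 \approx 0.2222$.  This matches $q(3)$, so our formula seems correct.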
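
The same check can be extended to other small values of $P$ by brute force. The sketch below enumerates every sequence of types and counts those that contain all three, using exact fractions. It assumes each picture is independently and uniformly one of the 3 types; `q_formula` and `q_bruteforce` are names chosen here so they do not clash with the notebook's `q`.

```python
from fractions import Fraction
from itertools import product

def q_formula(p):
    # The formula from the text, evaluated exactly; q(0) is defined as 0.
    if p == 0:
        return Fraction(0)
    third = Fraction(1, 3)
    return 1 - (3 * third ** p + 3 * ((2 * third) ** p - 2 * third ** p))

def q_bruteforce(p, k=3):
    # Count type sequences of length p in which all k types appear.
    complete = sum(1 for seq in product(range(k), repeat=p)
                   if len(set(seq)) == k)
    return Fraction(complete, k ** p)

if __name__ == "__main__":
    for p in range(0, 9):
        print(p, q_formula(p), q_bruteforce(p))
```

The enumeration visits $3^P$ sequences, so it is only practical for small $P$.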

In [5]:
from scipy.stats import binom

def prb(l):
    # Pr[B] for N=27, M=5: average q(P) over P ~ Binomial(l, 5/27).
    s = sum([q(i) * binom.pmf(i,l,5./27.)
            for i in range(0,l+1)])
    return s

In [6]:
# finally the numbers
for i in range(1,101):
    if (i <= 10) or (i % 5 == 0):
        print("#sets = %3d, exp. u_comp = %.5f" % (i,27 * prb(i)))

#sets =   1, exp. u_comp = 0.00000
#sets =   2, exp. u_comp = 0.00000
#sets =   3, exp. u_comp = 0.03810
#sets =   4, exp. u_comp = 0.13830
#sets =   5, exp. u_comp = 0.31411
#sets =   6, exp. u_comp = 0.57136
#sets =   7, exp. u_comp = 0.91044
#sets =   8, exp. u_comp = 1.32792
#sets =   9, exp. u_comp = 1.81785
#sets =  10, exp. u_comp = 2.37271
#sets =  15, exp. u_comp = 5.82459
#sets =  20, exp. u_comp = 9.70830
#sets =  25, exp. u_comp = 13.37323
#sets =  30, exp. u_comp = 16.52004
#sets =  35, exp. u_comp = 19.07439
#sets =  40, exp. u_comp = 21.07557
#sets =  45, exp. u_comp = 22.60730
#sets =  50, exp. u_comp = 23.76152
#sets =  55, exp. u_comp = 24.62201
#sets =  60, exp. u_comp = 25.25880
#sets =  65, exp. u_comp = 25.72762
#sets =  70, exp. u_comp = 26.07152
#sets =  75, exp. u_comp = 26.32316
#sets =  80, exp. u_comp = 26.50695
#sets =  85, exp. u_comp = 26.64101
#sets =  90, exp. u_comp = 26.73872
#sets =  95, exp. u_comp = 26.80988
#sets = 100, exp. u_comp = 26.86169
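
These numbers can also be cross-checked by simulation. The following is a minimal sketch, assuming that each pack is a uniformly random set of `m` distinct members and that each picture's type is uniform over `k` types, independently of everything else. The name `simulate_u_comp` and its `trials` and `seed` parameters are made up for this example.

```python
import random

def simulate_u_comp(n, m, l, k=3, trials=3000, seed=0):
    # Assumptions: uniform random packs, uniform independent picture types.
    rng = random.Random(seed)
    total = 0
    for _trial in range(trials):
        types_seen = [set() for _member in range(n)]
        for _pack in range(l):
            for member in rng.sample(range(n), m):
                types_seen[member].add(rng.randrange(k))
        # Count the members for whom all k types were collected.
        total += sum(1 for seen in types_seen if len(seen) == k)
    return total / trials

if __name__ == "__main__":
    for l in (5, 10, 20, 40):
        print("#sets = %3d, simulated u_comp = %.3f"
              % (l, simulate_u_comp(27, 5, l)))
```

With a few thousand trials, the simulated averages should be close to the values in the table above.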
