- Suppose I randomly and independently place $n$ marbles into $m$ urns, where $m\geq2$
- Let $N_{i}$ be the number of marbles placed in urn number $i$, so $N_{i}$ is a binomial random variable with parameters $n$ and $p=1/m$
- Calculate $\rho(N_{i},N_{j})$ for $i\neq j$
- Explain your answer in the case $m=2$

____

### Simple Example 1

- Let's start thinking about this with the easiest scenario: $m=2$ and $n=1$
    - So, what we want to model is placing a single ball into one of two urns
    
- There are two possible scenarios:
    1. Drop the ball into urn 1
    2. Drop the ball into urn 2
    
- This means that:

$$
P(N_{1}=x) = \left\{\begin{matrix} 1/2 & x=0\\  1/2 & x=1 \end{matrix}\right.
$$

$$
P(N_{2}=x) = \left\{\begin{matrix} 1/2 & x=0\\  1/2 & x=1 \end{matrix}\right.
$$

- In other words, it's a 50/50 chance that the ball will end up in either urn
    - So the expected number of balls placed in each urn is 0.5, i.e. $E(N_{1}) = 0.5$ and $E(N_{2})=0.5$

- Since the single ball cannot be in both urns at once, we know that $N_{1}=1\implies N_{2}=0$ (and likewise $N_{2}=1\implies N_{1}=0$)
- **Recall**: $Cov(X,Y) = E[XY]-E[X]E[Y]$

- Since $N_{1}$ and $N_{2}$ can't both be non-zero at the same time, we know that $N_{1}\cdot N_{2} = 0 \implies E[N_{1}N_{2}]=0$

- Therefore

$$
Cov(N_{1},N_{2}) = 0 - (1/2)(1/2) = -1/4
$$

- Now, to calculate $\rho(N_{1},N_{2})$, we need to calculate $\sigma_{N_{1}}$ and $\sigma_{N_{2}}$
    - **Recall**: the variance is the *mean squared distance from the mean*

$$
\sigma^{2}_{N_{1}} = (1/2)(0-1/2)^{2} + (1/2)(1-1/2)^{2}= (1/2)(1/4+1/4) = 1/4 \implies \sigma^{2}_{N_{2}} = 1/4
$$

$$
\implies \sigma_{N_{1}} = 1/2 = \sigma_{N_{2}}
$$

$$
\implies \rho(N_{1},N_{2}) = \frac{-1/4}{(1/2)(1/2)} = -1
$$

- This result makes sense since if $N_{1}$ increases by one, $N_{2}$ **must decrease** by one
    - Therefore, they're perfectly negatively correlated
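
- As a sanity check, the calculation in this example can be reproduced exactly by listing every equally likely placement. This is only a minimal sketch: the helper name `exact_moments`, its arguments, and the 0-based urn labels are illustrative choices, not part of the original notes

```python
from fractions import Fraction
from itertools import product
from math import sqrt


def exact_moments(n, m, i=0, j=1):
    """Exact Cov(N_i, N_j), Var(N_i), Var(N_j) by enumerating all m**n placements.

    Urns are labelled 0..m-1 here (a choice made for this sketch).
    """
    placements = list(product(range(m), repeat=n))
    w = Fraction(1, len(placements))  # every placement is equally likely
    e_i = e_j = e_ij = e_ii = e_jj = Fraction(0)
    for placement in placements:
        n_i, n_j = placement.count(i), placement.count(j)
        e_i += w * n_i
        e_j += w * n_j
        e_ij += w * n_i * n_j
        e_ii += w * n_i ** 2
        e_jj += w * n_j ** 2
    # Cov = E[XY] - E[X]E[Y], Var = E[X^2] - E[X]^2
    return e_ij - e_i * e_j, e_ii - e_i ** 2, e_jj - e_j ** 2


# One ball, two urns
cov, var_1, var_2 = exact_moments(n=1, m=2)
rho = float(cov) / sqrt(float(var_1 * var_2))
print(cov, var_1, var_2, rho)  # prints: -1/4 1/4 1/4 -1.0
```

- Enumeration is only practical for small $n$ and $m$, since the number of placements grows as $m^{n}$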

### Simple Example 2

- Now, let's assume we have two balls and three urns

- $P(N_{1} = 0) = 4/9$, $P(N_{1}=1) = 4/9$ and $P(N_{1}=2) = 1/9$
    - These probabilities are the same for $N_{2}$ and $N_{3}$
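
- These values follow from the binomial pmf with $n=2$ and $p=1/3$. A small sketch that recomputes them with exact fractions (the variable names are just for illustration):

```python
from fractions import Fraction
from math import comb

n, m = 2, 3          # two balls, three urns
p = Fraction(1, m)   # each ball lands in a given urn with probability 1/m

# Binomial pmf: P(N = k) = C(n, k) p^k (1-p)^(n-k)
pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}
mean = sum(k * prob for k, prob in pmf.items())

for k, prob in pmf.items():
    print(k, prob)
print("mean:", mean)
```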

- Let's think about $E[N_{1}N_{2}]$
    - The possible values of $N_{1}N_{2}$ are 0 or 1
        - *Why can't it be 2?*
            - Because if one is 2, the other must be zero (since there are only two balls)
        - $N_{1}N_{2} = 1 \implies$ both $N_{1}$ and $N_{2}$ are 1
            - $P(N_{1}N_{2} = 1) = P(N_{1} = 1 \cap N_{2} = 1) = 2(1/3)(1/3) = 2/9$, since either ball can be the one that lands in urn 1 while the other lands in urn 2
        - $N_{1}N_{2} = 0 \implies$ at least one of $N_{1}$ or $N_{2}$ is zero
            - This has probability $1 - 2/9 = 7/9$
                - For example, $P(N_{1}=0\cap N_{2}=0) = P(N_{3}=2) = 1/9$ is one of the cases included in this total

$$
\implies E[N_{1}N_{2}] = (7/9)(0) + (2/9)(1) = 2/9
$$

$$
E[N_{1}] = (4/9)(0) + (4/9)(1) + (1/9)(2) = 4/9 + 2/9 = 6/9 = 2/3
$$

- $E[N_{2}] = E[N_{1}]$

$$
\implies Cov(N_{1},N_{2}) = E[N_{1}N_{2}]-E[N_{1}]E[N_{2}] = 2/9 - (2/3)(2/3) = 2/9 - 4/9 = -2/9
$$
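
- To finish this example, the correlation also needs the standard deviations. A short worked calculation, using the binomial variance $np(1-p)$ with $n=2$, $p=1/3$, together with $E[N_{1}N_{2}] = 2/9$ and $E[N_{1}]=E[N_{2}]=2/3$ (so that $Cov(N_{1},N_{2}) = 2/9 - 4/9 = -2/9$):

$$
\sigma^{2}_{N_{1}} = \sigma^{2}_{N_{2}} = 2(1/3)(2/3) = 4/9
$$

$$
\implies \rho(N_{1},N_{2}) = \frac{-2/9}{\sqrt{4/9}\sqrt{4/9}} = \frac{-2/9}{4/9} = -1/2
$$

- The correlation is still negative, but weaker than in the two-urn case, because a ball that misses urn 1 can now land in urn 3 instead of urn 2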

In [22]:
```python
import numpy as np
import pandas as pd
```

In [46]:
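```python
n_trials = 1000000

# Each row is one trial; each column is the urn (1, 2, or 3) chosen for one of the two balls
random_array = np.random.randint(1,4,size=(n_trials, 2))
```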

In [47]:
```python
# Count, for each trial, how many of the two balls landed in each urn
placements = pd.DataFrame(random_array)
df = pd.DataFrame(index=range(n_trials))
df['N1'] = (placements==1).sum(axis=1)
df['N2'] = (placements==2).sum(axis=1)
df['N3'] = (placements==3).sum(axis=1)
```

In [48]:
```python
# Empirical distribution of the product N1*N2
(df['N1']*df['N2']).value_counts(normalize=True)
```

```
0    0.777992
1    0.222008
dtype: float64
```

In [52]:
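```python
# Empirical P(N1 = 1 and N2 = 0); this forces N3 = 1, so its exact value is also 2/9
len(df[(df['N1']==1) & (df['N2']==0)])/n_trials
```

```
0.222168
```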

In [45]:
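```python
# List the distinct (N1, N2, N3) outcomes, indexed by the first trial in which each occurred
df.drop_duplicates()
```

|   | N1 | N2 | N3 |
|---|----|----|----|
| 0 | 0  | 0  | 2  |
| 1 | 1  | 1  | 0  |
| 2 | 0  | 2  | 0  |
| 3 | 2  | 0  | 0  |
| 4 | 1  | 0  | 1  |
| 9 | 0  | 1  | 1  |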


In [50]:
```python
# Exact value, for comparison with the simulated frequencies
2/9
```

```
0.2222222222222222
```
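
- The same kind of simulation can estimate the covariance and correlation directly. This is a self-contained sketch rather than a continuation of the cells above: the seed, the smaller trial count, and the 0-based urn labels are assumptions made here for reproducibility and speed

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily so the run is repeatable
n_trials, n_balls, n_urns = 200_000, 2, 3

# Each row is one trial; each entry is the urn (0..n_urns-1) that a ball landed in
draws = rng.integers(0, n_urns, size=(n_trials, n_balls))

# Per-trial counts for the first two urns
n1 = (draws == 0).sum(axis=1)
n2 = (draws == 1).sum(axis=1)

cov_hat = np.cov(n1, n2)[0, 1]        # sample covariance of N1 and N2
rho_hat = np.corrcoef(n1, n2)[0, 1]   # sample correlation of N1 and N2
print(cov_hat, rho_hat)
```

- With two balls and three urns, the estimates should land close to $-2/9$ for the covariance and $-1/2$ for the correlation, up to sampling noise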