## Marginalization

Given a joint probability table, we often want the probability table for just one of the random variables. We can get it by summing out, or "marginalizing" out, the other random variables. For example, to get the probability table for random variable $W$, we do the following:

<img src="../images/images_sec-joint-rv-marg-rows.png" alt="Drawing" style="width: 400px;"/>

We take the joint probability table (left-hand side) and compute the row sums (which we've written in the margin).

The right-hand side table is the probability table $p_W$ for random variable $W$; we call this resulting probability distribution the marginal distribution of $W$ (put another way, it is the distribution obtained by marginalizing out the random variables that aren't $W$).

In terms of notation, the above marginalization procedure whereby we used the joint distribution of $W$ and $T$ to produce the marginal distribution of $W$ is written:

$$p_{W}(w)=\sum _{t\in \mathcal{T}}p_{W,T}(w,t),$$
 
where $\mathcal{T}$ is the set of values that random variable $T$ can take on. In fact, throughout this course, we will often omit explicitly writing out the alphabet of values that a random variable takes on, e.g., writing instead

$$p_{W}(w)=\sum _{t}p_{W,T}(w,t).$$
 
It's clear from context that we're summing over all possible values for $t$, which is going to be the values that random variable $T$ can possibly take on.

As a specific example,

$$p_{W}(\text {rainy})=\sum _{t}p_{W,T}(\text {rainy},t)=\underbrace{p_{W,T}(\text {rainy},\text {hot})}_{1/30}+\underbrace{p_{W,T}(\text {rainy},\text {cold})}_{2/15}=\frac{1}{6}.$$
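The row sum above can be checked with exact fractions. This is a minimal sketch, not the course's own code: the name `p_WT` is made up here, and the (sunny, cold) and (snowy, cold) entries are assumed values chosen only so that the table sums to 1, since the text quotes just the other four entries.

```python
from fractions import Fraction as F

# Joint table p_{W,T}: outer key is the weather w, inner key is the temperature t.
# The two entries marked "assumed" are NOT quoted in the text; they are
# illustrative values that make the whole table sum to 1.
p_WT = {
    "sunny": {"hot": F(3, 10), "cold": F(1, 5)},   # cold entry assumed
    "rainy": {"hot": F(1, 30), "cold": F(2, 15)},
    "snowy": {"hot": F(0),     "cold": F(1, 3)},   # cold entry assumed
}

# Marginalize out T: sum each row over all temperatures t.
p_W = {w: sum(row.values()) for w, row in p_WT.items()}

print(p_W["rainy"])  # 1/6
```

Using `Fraction` rather than floats keeps the arithmetic exact, so the result can be compared directly with the hand computation.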
 
We could similarly marginalize out random variable $W$ to get the marginal distribution $p_T$ for random variable $T$:

<img src="../images/images_sec-joint-rv-marg-cols.png" alt="Drawing" style="width: 400px;"/>

(Note that whether we write a probability table for a single variable horizontally or vertically doesn't actually matter.)

As a formula, we would write:

$$p_{T}(t)=\sum _{w}p_{W,T}(w,t).$$
 
For example,

$$p_{T}(\text {hot})=\sum _{w}p_{W,T}(w,\text {hot})=\underbrace{p_{W,T}(\text {sunny},\text {hot})}_{3/10}+\underbrace{p_{W,T}(\text {rainy},\text {hot})}_{1/30}+\underbrace{p_{W,T}(\text {snowy},\text {hot})}_{0}=\frac{1}{3}.$$

In general:

**Marginalization:** Consider two random variables $X$ and $Y$ (that take on values in the sets $\mathcal{X}$ and $\mathcal{Y}$ respectively) with joint probability table $p_{X,Y}$. For any $x \in \mathcal{X}$, the marginal probability that $X=x$ is given by

$$p_{X}(x)=\sum _{y}p_{X,Y}(x,y).$$
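One way to turn this formula into code is to store the joint table as a dictionary keyed by $(x, y)$ pairs and accumulate over $y$. The sketch below is only an illustration under that assumed representation; the function name `marginalize_out_second` and the small table `p_xy` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction as F

def marginalize_out_second(p_xy):
    """Return p_X given a joint table p_xy keyed by (x, y) pairs."""
    p_x = defaultdict(int)
    for (x, y), prob in p_xy.items():
        p_x[x] += prob  # accumulate over every y paired with this x
    return dict(p_x)

# Hypothetical joint table, used only to exercise the function.
p_xy = {
    ("a", 0): F(1, 8), ("a", 1): F(3, 8),
    ("b", 0): F(1, 4), ("b", 1): F(1, 4),
}
p_x = marginalize_out_second(p_xy)
print(p_x)
```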


### Exercise: Marginalization

Consider the following two joint probability tables.

<img src="../images/images_sec-joint-rv-ex-marg.png" alt="Drawing" style="width: 400px;"/>

Express the probability table for random variable $X$ as a Python dictionary (the keys should be the Python strings 'sunny', 'rainy', and 'snowy'). (Your answer should be the Python dictionary itself, and not the dictionary assigned to a variable, so please do not include, for instance, "prob_table =" before specifying your answer. You can use fractions. If you use decimals instead, please be accurate and use at least $5$ decimal places.)

In [1]:
p_X_Y = {
    'sunny': {1: 1/4 ,  0: 1/4 },
    'rainy': {1: 1/12,  0: 1/12},
    'snowy': {1: 1/6 ,  0: 1/6 }
}

In [2]:
# Marginalize out Y: for each value x, sum the row over all values y.
p_X = {}
for x, row in p_X_Y.items():
    p_X[x] = sum(row.values())

p_X

{'rainy': 0.16666666666666666, 'snowy': 0.3333333333333333, 'sunny': 0.5}

Express the probability table for random variable $Y$ as a Python dictionary (the keys should be the Python integers $0$ and $1$). (Your answer should be the Python dictionary itself, and not the dictionary assigned to a variable, so please do not include, for instance, "prob_table =" before specifying your answer. You can use fractions. If you use decimals instead, please be accurate and use at least $5$ decimal places.)

In [3]:
# Marginalize out X: accumulate each column's probabilities across all rows.
p_Y = {}
for x, row in p_X_Y.items():
    for y, prob in row.items():
        p_Y[y] = p_Y.get(y, 0) + prob

p_Y

{0: 0.5, 1: 0.5}

For two random variables $U$ and $V$ that take on values in the same alphabet, we say that $U$ and $V$ have the same distribution if $p_U(a)=p_V(a)$ for all $a$. For the above tables:

Do $W$ and $X$ have the same distribution?

In [4]:
p_W_I = {
    'sunny': {1: 1/2},
    'rainy': {0: 1/6},
    'snowy': {0: 1/3}
}

In [5]:
p_W = {}
for key, value in p_W_I.items():
    p_W_value = 0
    for key1, value1 in value.items():
        p_W_value += value1
    
    p_W[key] = p_W_value
    
p_W

{'rainy': 0.16666666666666666, 'snowy': 0.3333333333333333, 'sunny': 0.5}

In [6]:
def is_samePMF(X, Y):
    """
    Return True if X and Y are the same PMF (up to a small tolerance), else False.
    
    >>> is_samePMF({5: 0.8999999999999999, 7: 0.1}, {5: 0.9, 7: 0.1})
    True
    
    >>> is_samePMF({5: 0.7, 7: 0.3}, {5: 0.9, 7: 0.1})
    False
    
    >>> is_samePMF({6: 0.8999999999999999, 7: 0.1}, {5: 0.9, 7: 0.1})
    False
    
    >>> is_samePMF(dict(), {5: 0.9, 7: 0.1})
    False
    
    >>> is_samePMF({5: 0.9, 7: 0.1}, {5: 0.1})
    False
    """
    # Different alphabets means different distributions.
    if X.keys() != Y.keys():
        return False

    # Compare entrywise, allowing for floating-point round-off.
    for key in X:
        if abs(X[key] - Y[key]) > 0.00001:
            return False

    return True

if __name__ == "__main__":
    import doctest
    doctest.testmod()

In [7]:
is_samePMF(p_W, p_X)

True

Do $I$ and $Y$ have the same distribution?

In [8]:
p_I = {}
for key, value in p_W_I.items():
    for key1, value1 in value.items():
        if key1 not in p_I:
            p_I[key1] = value1

        else:
            p_I[key1] += value1

p_I

{0: 0.5, 1: 0.5}

In [9]:
is_samePMF(p_I, p_Y)

True

**True or false:** Consider two pairs of random variables $(S,T)$ and $(U,V)$, where $S$ and $U$ have the same distribution, and $T$ and $V$ have the same distribution. Then $(S,T)$ and $(U,V)$ have the same joint distribution.

False. Matching marginal distributions do not determine the joint distribution.
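The two exercise tables already illustrate this. The sketch below re-enters them with exact fractions, keyed by pairs of values, and compares marginals and joints; the helper `marginal` and the tuple-keyed layout are choices made here for illustration, and pairs missing from `p_WI` are assumed to have probability $0$.

```python
from fractions import Fraction as F

# Joint tables from the exercise, keyed by (first value, second value).
# Pairs missing from p_WI are assumed to have probability 0.
p_XY = {("sunny", 1): F(1, 4), ("sunny", 0): F(1, 4),
        ("rainy", 1): F(1, 12), ("rainy", 0): F(1, 12),
        ("snowy", 1): F(1, 6), ("snowy", 0): F(1, 6)}
p_WI = {("sunny", 1): F(1, 2), ("rainy", 0): F(1, 6), ("snowy", 0): F(1, 3)}

def marginal(joint, index):
    """Sum a tuple-keyed joint table down to the variable at position `index`."""
    out = {}
    for key, prob in joint.items():
        out[key[index]] = out.get(key[index], 0) + prob
    return out

same_first = marginal(p_XY, 0) == marginal(p_WI, 0)    # X versus W
same_second = marginal(p_XY, 1) == marginal(p_WI, 1)   # Y versus I
same_joint = all(p_XY.get(k, 0) == p_WI.get(k, 0)
                 for k in set(p_XY) | set(p_WI))

print(same_first, same_second, same_joint)  # True True False
```

Both pairs of marginals agree, yet the joint tables differ, for instance at (sunny, 1), where the entries are $1/4$ and $1/2$.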

## Marginalization for Many Random Variables

What happens when we have more than two random variables? Let's build on our earlier example and suppose that in addition to weather $W$ and temperature $T$, we also have a random variable $H$ for humidity that takes on values in the alphabet $\{\text{dry}, \text{humid}\}$. With a third random variable, we can draw a 3D joint probability table for random variables $W$, $T$, and $H$. As an example, we could have the following:

<img src="../images/images_sec-joint-rv-marg-many-rv-joint-table.png" alt="Drawing" style="width: 400px;"/>

Here, each of the cubes/boxes stores a probability. Not visible are two of the cubes in the back left column, which for this particular example both have probability values of $0$.

Then to marginalize out the humidity $H$, we would add values as follows:

<img src="../images/images_sec-joint-rv-marg-many-rv-marg.png" alt="Drawing" style="width: 400px;"/>

The result is the joint probability table for weather $W$ and temperature $T$, still drawn as 3D cubes with each cube storing a single probability.

As an equation:

$$p_{W,T}(w,t) = \sum _ h p_{W,T,H}(w, t, h).$$
 
In general, for three random variables $X, Y$, and $Z$ with joint probability table $p_{X,Y,Z}$, we have

$$
\begin{align}
p_{X,Y}(x,y) &= \sum_{z} p_{X,Y,Z}(x,y,z), \\
p_{X,Z}(x,z) &= \sum_{y} p_{X,Y,Z}(x,y,z), \\
p_{Y,Z}(y,z) &= \sum_{x} p_{X,Y,Z}(x,y,z).
\end{align}
$$
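These three identities share one pattern: keep some coordinates of each outcome and add up the probabilities of everything that agrees on them. A minimal sketch, assuming the joint table is stored as a dictionary keyed by $(x, y, z)$ tuples; the function `marginalize` and the table `p_xyz` are hypothetical and only serve as an example.

```python
from collections import defaultdict
from fractions import Fraction as F
from itertools import product

def marginalize(joint, keep):
    """Keep the variables at positions `keep`; sum out all the others."""
    out = defaultdict(int)
    for key, prob in joint.items():
        out[tuple(key[i] for i in keep)] += prob
    return dict(out)

# Hypothetical joint table over (x, y, z) in {0, 1}^3 with weights 1..8 out of 36.
p_xyz = {key: F(i + 1, 36) for i, key in enumerate(product([0, 1], repeat=3))}

p_xy = marginalize(p_xyz, keep=(0, 1))   # sums over z
p_xz = marginalize(p_xyz, keep=(0, 2))   # sums over y
p_yz = marginalize(p_xyz, keep=(1, 2))   # sums over x
```

The same function handles more variables without change, since `keep` can list any subset of positions.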

Note that we can marginalize out different random variables in succession. For example, given joint probability table $p_{X,Y,Z}$, if we want the probability table $p_X$, we can get it by marginalizing out the two random variables $Y$ and $Z$:

$$p_ X(x) = \sum _{y} p_{X,Y}(x,y) = \sum _{y} \Big( \sum _{z} p_{X,Y,Z}(x,y,z) \Big).$$
 
Even with more than three random variables, the idea is the same. For example, with four random variables $W, X, Y$, and $Z$ with joint probability table $p_{W,X,Y,Z}$, if we want the joint probability table for $X$ and $Y$, we would do the following:

$$p_{X,Y}(x, y) = \sum_w \Big( \sum_z p_{W,X,Y,Z}(w,x,y,z) \Big).$$

### Exercise: Marginalization for Many Random Variables

Suppose that we have the joint probability table $p_{V,W,X,Y,Z}$ where random variable $V$ takes on $k$ values (i.e., the alphabet for $V$ has $k$ elements in it), $W$ takes on $\ell$ values, $X$ takes on $m$ values, $Y$ takes on $n$ values, and $Z$ takes on $o$ values.

1.  **Question:** How many entries are in the joint probability table $p_{V,W,X,Y,Z}$?

    **Answer:** $k \times \ell \times m \times n \times o$
    
    $ $
2.  **Question:** If we marginalize out $X$ and $Z$, the resulting joint probability table is for which random variables?

    **Answer:** $V$, $W$, and $Y$
    
    $ $
3. **Question:** If we marginalize out $V, Y,$ and $Z$, the resulting joint probability table has how many entries?

    **Answer:** $\ell \times m$
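The counting answers above can be sanity-checked numerically. In this sketch the alphabet sizes are hypothetical stand-ins for $k, \ell, m, n, o$, and `table_size` is a helper invented for the example.

```python
from itertools import product
from math import prod

# Hypothetical alphabet sizes standing in for k, l, m, n, o.
sizes = {"V": 2, "W": 3, "X": 4, "Y": 5, "Z": 6}

def table_size(variables):
    """Number of entries in a joint table over the given variables."""
    return prod(sizes[v] for v in variables)

full = table_size("VWXYZ")   # k * l * m * n * o
after = table_size("WX")     # V, Y, Z marginalized out: l * m

# Cross-check by enumerating every (w, x) pair explicitly.
pairs = list(product(range(sizes["W"]), range(sizes["X"])))
print(full, after, len(pairs))  # 720 12 12
```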

Now suppose that we have the joint probability table $p_{X,Y,Z}$ for three random variables $X, Y,$ and $Z$. We want to compute the probability table $p_X$ for random variable $X$.

1. **True or false:** If we marginalize out $Z$ first and then $Y$, or if we marginalize out $Y$ first and then $Z$, we get the same answer for the probability table $p_X$. In other words, we have

    $$p_X(x) = \sum _y \Big( \sum_z p_{X,Y,Z}(x,y,z)\Big) = \sum_z \Big( \sum_y p_{X,Y,Z}(x,y,z) \Big).$$

    **Answer:** True
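A quick numerical check of this identity, as a sketch only: the alphabets and the joint table `p_xyz` below are hypothetical, and exact fractions are used so the two orders of summation can be compared with `==`.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical alphabets for X, Y, and Z.
xs, ys, zs = [0, 1], ["a", "b", "c"], ["p", "q"]

# Hypothetical joint table with unequal entries that sum to 1.
keys = list(product(xs, ys, zs))
total = sum(range(1, len(keys) + 1))
p_xyz = {key: F(i + 1, total) for i, key in enumerate(keys)}

# Inner sum runs first: marginalize out Z, then Y ...
z_then_y = {x: sum(sum(p_xyz[(x, y, z)] for z in zs) for y in ys) for x in xs}
# ... versus marginalize out Y, then Z.
y_then_z = {x: sum(sum(p_xyz[(x, y, z)] for y in ys) for z in zs) for x in xs}

print(z_then_y == y_then_z)  # True
```

Both orders add up exactly the same finite set of entries for each $x$, which is why the results agree.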