![](coin_paradox_thumbnail.png)

## Introduction

Recently, in my stochastic processes discussion, we went over a coin-flipping problem. My probability course had already covered many coin-toss problems, but I found this one especially interesting, so much so that it sent me down a rabbit hole trying to understand it. The problem went as follows:

**Imagine you have two coins. One is perfectly fair (50% chance of heads), and the other is biased, with a 60% chance of heads. Visually, the coins are indistinguishable. We flip the left coin twice, not knowing whether it is fair or biased. What is the variance of the number of heads, $\text{Var}(\text{\# heads})$?**

## Coin Flip Variance Solution

The solution goes as follows: 

Let  
$$
X = X_1 + X_2
$$  
where $X_1$ and $X_2$ represent the results of the first and second coin flips, respectively (1 = heads, 0 = tails).

We want to compute:
$$
\operatorname{Var}(X) = \operatorname{Var}(X_1 + X_2)
$$

Using the identity:
$$
\operatorname{Var}(X_1 + X_2) = \operatorname{Var}(X_1) + \operatorname{Var}(X_2) + 2 \cdot \operatorname{Cov}(X_1, X_2)
$$

> *Note: The two coin flips are **not independent**. Seeing heads on one flip makes it more likely that the coin is the biased one, which in turn makes heads more likely on the other flip. Hence, we must account for covariance.*

---

### Step 1: Compute $\operatorname{Var}(X_1)$

$$
\operatorname{Var}(X_1) = \mathbb{E}[X_1^2] - (\mathbb{E}[X_1])^2
$$

Since $X_1 \in \{0, 1\}$ (Bernoulli random variable), $X_1^2 = X_1$, so:
$$
\mathbb{E}[X_1^2] = \mathbb{E}[X_1]
$$

We compute $\mathbb{E}[X_1]$ using the law of total expectation:
$$
\mathbb{E}[X_1] = \mathbb{P}(\text{Heads}) = \frac{1}{2}(0.5) + \frac{1}{2}(0.6) = 0.55
$$

$$
\operatorname{Var}(X_1) = 0.55 - (0.55)^2 = 0.55 - 0.3025 = 0.2475
$$

By symmetry:
$$
\operatorname{Var}(X_2) = \operatorname{Var}(X_1) = 0.2475
$$

---

### Step 2: Compute $\operatorname{Cov}(X_1, X_2)$

We use:
$$
\operatorname{Cov}(X_1, X_2) = \mathbb{E}[X_1 X_2] - \mathbb{E}[X_1]\mathbb{E}[X_2]
$$

#### Compute $\mathbb{E}[X_1 X_2]$

This is the probability that both flips are heads:
$$
\mathbb{P}(\text{both H}) = \mathbb{P}(\text{both H} \mid \text{fair}) \cdot \mathbb{P}(\text{fair}) + \mathbb{P}(\text{both H} \mid \text{biased}) \cdot \mathbb{P}(\text{biased})
$$

$$
= (0.5)(0.5)^2 + (0.5)(0.6)^2 = 0.5(0.25) + 0.5(0.36) = 0.125 + 0.18 = 0.305
$$

#### Compute $\mathbb{E}[X_1]\mathbb{E}[X_2]$

$$
= (0.55)^2 = 0.3025
$$

So,
$$
\operatorname{Cov}(X_1, X_2) = 0.305 - 0.3025 = 0.0025
$$
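Steps 1 and 2 can be double-checked with exact arithmetic. The snippet below is a minimal sketch, not part of the original solution; the variable names (`coin_heads_prob`, `mean_single`, and so on) are my own, and the only inputs are the 50/50 coin choice and the head probabilities 0.5 and 0.6 from the problem statement.

```python
from fractions import Fraction

# Head probabilities from the problem statement; each coin is picked with probability 1/2.
coin_heads_prob = {"fair": Fraction(1, 2), "biased": Fraction(3, 5)}
coin_weight = Fraction(1, 2)

# E[X1] = E[X2] = P(heads on a single flip), averaging over the two coins.
mean_single = sum(coin_weight * q for q in coin_heads_prob.values())

# E[X1 * X2] = P(both heads). The flips are independent only *given* the coin,
# so we square the head probability inside the average, not outside it.
mean_product = sum(coin_weight * q**2 for q in coin_heads_prob.values())

var_single = mean_single - mean_single**2   # Bernoulli: E[X^2] = E[X]
covariance = mean_product - mean_single**2

print(float(var_single), float(covariance))
```

Using `Fraction` instead of floats avoids rounding noise, which matters here because the covariance is small compared with the other terms.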

---

### Final Step: Combine All

$$
\operatorname{Var}(X) = \operatorname{Var}(X_1) + \operatorname{Var}(X_2) + 2 \cdot \operatorname{Cov}(X_1, X_2)
$$

$$
= 0.2475 + 0.2475 + 2(0.0025) = 0.495 + 0.005 = \boxed{0.5}
$$
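As an independent check that does not use the covariance identity at all, we can enumerate every (coin, flip 1, flip 2) outcome and compute the variance straight from the distribution of the head count. This is a sketch under the same assumptions as the problem (equal chance of either coin, head probabilities 0.5 and 0.6); the function name `heads_count_pmf` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Head probability of each coin; each coin is chosen with probability 1/2.
heads_prob = [Fraction(1, 2), Fraction(3, 5)]

def heads_count_pmf():
    """Return {k: P(X = k)} for two flips of a coin chosen 50/50."""
    pmf = {0: Fraction(0), 1: Fraction(0), 2: Fraction(0)}
    for q in heads_prob:
        for x1, x2 in product((0, 1), repeat=2):
            prob = Fraction(1, 2)              # probability of picking this coin
            for x in (x1, x2):
                prob *= q if x == 1 else 1 - q  # flips are independent given the coin
            pmf[x1 + x2] += prob
    return pmf

pmf = heads_count_pmf()
mean = sum(k * pr for k, pr in pmf.items())
variance = sum((k - mean) ** 2 * pr for k, pr in pmf.items())

print(pmf)
print(float(mean), float(variance))
```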

## Interpreting Our Answer

I don’t know about you, but I found it hard to believe that the answer was actually 1/2. 

For example, we know: 

- A fair coin (p=0.5) flipped twice has variance 0.5
- A biased coin (p=0.6) flipped twice has variance 0.48
- We have a 50/50 chance of using either coin

We can also graph the relationship of variance to probability of heads:

```python
#| echo: false

import plotly.io as pio
pio.renderers.default = "iframe"
```

```python
#| echo: false

import numpy as np
import plotly.graph_objects as go

# Number of coin flips
n = 2

# Create an array of probability values from 0 to 1
p_values = np.linspace(0, 1, 1000)

# Calculate variance for each probability value
# For binomial distribution: Var(X) = n * p * (1-p)
variances = n * p_values * (1 - p_values)

# Create a plotly figure
fig = go.Figure()

# Add the main curve
fig.add_trace(
    go.Scatter(
        x=p_values,
        y=variances,
        mode='lines',
        name='Variance',
        line=dict(color='blue', width=3),
        fill='tozeroy',
        fillcolor='rgba(0, 0, 255, 0.1)',
        hovertemplate='Probability: %{x:.3f}<br>Variance: %{y:.3f}<extra></extra>'
    )
)

# Add points for p=0.5 (fair coin, where the variance is largest) and p=0.6 (biased coin)
fig.add_trace(
    go.Scatter(
        x=[0.5],
        y=[n * 0.5 * 0.5],
        mode='markers+text',
        name='Fair coin (p=0.5)',
        marker=dict(color='red', size=12),
        text=["Fair coin"],
        textposition="top center",
        hovertemplate='Fair coin<br>Probability: 0.5<br>Variance: 0.5<extra></extra>'
    )
)

fig.add_trace(
    go.Scatter(
        x=[0.6],
        y=[n * 0.6 * 0.4],
        mode='markers+text',
        name='Biased coin (p=0.6)',
        marker=dict(color='green', size=12),
        text=["Biased coin"],  # label was missing although mode includes text
        textposition="top center",
        hovertemplate='Biased coin<br>Probability: 0.6<br>Variance: %{y:.3f}<extra></extra>'
    )
)

# Update the layout
fig.update_layout(
    title='Variance of Number of Heads in Two Coin Flips',
    xaxis_title='Probability of Heads (p)',
    yaxis_title='Variance',
    xaxis=dict(range=[0, 1], tickformat='.1f'),
    yaxis=dict(range=[0, 0.6]),
    hovermode='closest',
    legend=dict(x=0.02, y=0.98),
    template='plotly_white',
    width=800,
    height=500
)

# Add grid lines
fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='lightgray')
fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='lightgray')

# Add a horizontal line at y=0
fig.add_shape(
    type="line",
    x0=0,
    y0=0,
    x1=1,
    y1=0,
    line=dict(color="black", width=1, dash="solid"),
)

# Add annotations for the endpoints, where the variance is zero
annotations = [
    dict(
        x=0,
        y=0,
        xref="x",
        yref="y",
        text="Var = 0 at p=0",
        showarrow=True,
        arrowhead=2,
        ax=40,
        ay=40
    ),
    dict(
        x=1,
        y=0,
        xref="x",
        yref="y",
        text="Var = 0 at p=1",
        showarrow=True,
        arrowhead=2,
        ax=-40,
        ay=40
    )
]
fig.update_layout(annotations=annotations)

fig.show()
```

So, kind of like a magic trick, how is it possible that a coin that might be biased gives the same variance as a fair coin flipped twice?

The answer is in the wording of the problem. We picked the left coin without knowing whether it is fair or biased, and that uncertainty adds variance of its own, which exactly compensates for the reduced variance of the biased coin. On top of that, it wouldn't matter whether the biased coin's probability of heads was 0.6, 0.7, or 0.8: for any bias, the variance of the number of heads in the two flips is always 0.5.
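If the claim feels too good to be true, a quick simulation is an easy sanity check. The sketch below is my own illustration, not something from the discussion: the function name `simulate_heads_variance`, the trial count, and the seed are arbitrary choices, and the estimates will only be close to 0.5, not exactly equal to it.

```python
import random

def simulate_heads_variance(p_biased, trials=100_000, seed=0):
    """Estimate Var(# heads) in two flips of a coin chosen 50/50 (fair vs. biased)."""
    rng = random.Random(seed)
    counts = []
    for _ in range(trials):
        # Pick a coin once, then flip that same coin twice.
        q = 0.5 if rng.random() < 0.5 else p_biased
        heads = (rng.random() < q) + (rng.random() < q)
        counts.append(heads)
    mean = sum(counts) / trials
    return sum((c - mean) ** 2 for c in counts) / trials

for p_biased in (0.6, 0.7, 0.8):
    print(p_biased, round(simulate_heads_variance(p_biased), 3))
```

The important detail is that the coin is drawn once per trial and reused for both flips. Drawing a fresh coin for each flip would make the flips independent and give a different answer.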

At this point I left the discussion knowing the answer to the paradox, but I still wasn't fully convinced. So, as any curious mathematician would, I set out to understand why the solution works the way it does by generalizing the problem.


## Generalizing the Problem

**Imagine you have two coins. One is perfectly fair (50% chance of heads), and the other is biased, but you don't necessarily know its exact bias – just that it has some probability *p* of landing heads.**

Now, consider this experiment:

1. You randomly pick one of the two coins (with equal probability, 0.5 each).
2. You flip the *chosen* coin twice.
3. You count the total number of heads, let's call this *Y*.

Let's define our variables more formally:

- $C$: The random variable representing the chosen coin.
    * $C = F$ (Fair coin) with $P(C=F) = 0.5$
    * $C = B$ (Biased coin) with $P(C=B) = 0.5$
- Probabilities of Heads:
    * $P(\text{Heads} | C=F) = p_F = 0.5$
    * $P(\text{Heads} | C=B) = p_B = p$ (where $0 \leq p \leq 1$)
- **$X_1, X_2$**: The outcomes of the two flips (1 for heads, 0 for tails). These are Bernoulli trials *given* the chosen coin.
- **$Y$**: The total number of heads in the two flips. $Y = X_1 + X_2$. *Given* the coin, $Y$ follows a Binomial distribution $\text{Bin}(n=2, \text{probability}=p_C)$.

Our goal is to calculate $\text{Var}(Y)$, the variance of the total number of heads *before* we know which coin was chosen.

### Explaining the Constant Variance

The key to understanding this result lies in the **Law of Total Variance** (also known as Eve's Law). For random variables $Y$ and $C$, it states:

$$\text{Var}(Y) = E[\text{Var}(Y | C)] + \text{Var}(E[Y | C])$$

Let's break this down:

1. **$E[\text{Var}(Y | C)]$**: The *expected value of the conditional variance*. This is the average variance *within* each coin type. We calculate the variance of $Y$ *assuming* we know which coin was picked ($\text{Var}(Y | C=F)$ and $\text{Var}(Y | C=B)$) and then find the weighted average of these variances based on the probability of picking each coin.

2. **$\text{Var}(E[Y | C])$**: The *variance of the conditional expectation*. This measures the variability *between* the average outcomes of the different coin types. We calculate the expected value of $Y$ *assuming* we know which coin was picked ($E[Y | C=F]$ and $E[Y | C=B]$) and then find the variance of these expected values, treating $E[Y | C]$ itself as a random variable that depends on $C$.

Let's calculate each term.

### Step 1: Conditional Expectations and Variances

First, let's find the expected value and variance of $Y$, *conditional* on knowing which coin was chosen.

* **Given the Fair Coin ($C=F$):**
    * The number of heads $Y$ follows $\text{Bin}(n=2, p_F=0.5)$.
    * $E[Y | C=F] = n \cdot p_F = 2 \cdot 0.5 = 1$
    * $\text{Var}(Y | C=F) = n \cdot p_F \cdot (1 - p_F) = 2 \cdot 0.5 \cdot 0.5 = 0.5$
* **Given the Biased Coin ($C=B$):**
    * The number of heads $Y$ follows $\text{Bin}(n=2, p_B=p)$.
    * $E[Y | C=B] = n \cdot p_B = 2p$
    * $\text{Var}(Y | C=B) = n \cdot p_B \cdot (1 - p_B) = 2p(1-p)$

### Step 2: Calculate $E[\text{Var}(Y | C)]$

This is the average of the conditional variances, weighted by the probability of choosing each coin:

$$E[\text{Var}(Y | C)] = \text{Var}(Y | C=F) \cdot P(C=F) + \text{Var}(Y | C=B) \cdot P(C=B)$$

$$E[\text{Var}(Y | C)] = (0.5) \cdot (0.5) + (2p(1-p)) \cdot (0.5)$$

$$E[\text{Var}(Y | C)] = 0.25 + p(1-p)$$

$$E[\text{Var}(Y | C)] = 0.25 + p - p^2$$
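Steps 1 and 2 translate almost line for line into code. Here is a small sketch; the helper names `binomial_moments` and `expected_conditional_variance` are hypothetical, and I plug in $p = 3/5$ only as an example value.

```python
from fractions import Fraction

def binomial_moments(n, q):
    """Mean and variance of a Binomial(n, q) count."""
    return n * q, n * q * (1 - q)

def expected_conditional_variance(p):
    """E[Var(Y | C)] for two flips, with a 50/50 choice between a fair coin and a coin with P(heads) = p."""
    half = Fraction(1, 2)
    _, var_fair = binomial_moments(2, half)
    _, var_biased = binomial_moments(2, p)
    # Weighted average of the two conditional variances.
    return half * var_fair + half * var_biased

p = Fraction(3, 5)
print(expected_conditional_variance(p))   # computed from the conditional variances
print(Fraction(1, 4) + p * (1 - p))       # closed form 0.25 + p(1 - p)
```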

### Step 3: Calculate $\text{Var}(E[Y | C])$

This is the variance of the conditional means. We have a random variable $E[Y | C]$ which takes the value $E[Y | C=F] = 1$ with probability 0.5, and the value $E[Y | C=B] = 2p$ with probability 0.5.

To find its variance, we use the identity $\text{Var}(Z) = E[Z^2] - (E[Z])^2$ with $Z = E[Y | C]$.

* First, find the mean $E[E[Y | C]]$:

    $$E[E[Y | C]] = E[Y | C=F] \cdot P(C=F) + E[Y | C=B] \cdot P(C=B)$$
    
    $$E[E[Y | C]] = (1) \cdot (0.5) + (2p) \cdot (0.5)$$
    
    $$E[E[Y | C]] = 0.5 + p$$
    
    *(Note: By the law of total expectation, this is also $E[Y]$).*

* Next, find the expected value of the square $E[(E[Y | C])^2]$:

    $$E[(E[Y | C])^2] = (E[Y | C=F])^2 \cdot P(C=F) + (E[Y | C=B])^2 \cdot P(C=B)$$
    
    $$E[(E[Y | C])^2] = (1)^2 \cdot (0.5) + (2p)^2 \cdot (0.5)$$
    
    $$E[(E[Y | C])^2] = 1 \cdot 0.5 + 4p^2 \cdot 0.5$$
    
    $$E[(E[Y | C])^2] = 0.5 + 2p^2$$

* Now, calculate the variance:

    $$\text{Var}(E[Y | C]) = E[(E[Y | C])^2] - (E[E[Y | C]])^2$$
    
    $$\text{Var}(E[Y | C]) = (0.5 + 2p^2) - (0.5 + p)^2$$
    
    $$\text{Var}(E[Y | C]) = 0.5 + 2p^2 - (0.25 + p + p^2)$$
    
    $$\text{Var}(E[Y | C]) = 0.5 + 2p^2 - 0.25 - p - p^2$$
    
    $$\text{Var}(E[Y | C]) = 0.25 - p + p^2$$

### Step 4: Combine the Terms

Now we add the two components according to the Law of Total Variance:

$$\text{Var}(Y) = E[\text{Var}(Y | C)] + \text{Var}(E[Y | C])$$

$$\text{Var}(Y) = (0.25 + p - p^2) + (0.25 - p + p^2)$$

Notice how the terms involving $p$ and $p^2$ cancel out!

$$\text{Var}(Y) = 0.25 + 0.25 + (p - p) + (-p^2 + p^2)$$

$$\text{Var}(Y) = 0.5$$

The variance $\text{Var}(Y)$ is indeed $0.5$, regardless of the value of $p$.
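To see the cancellation numerically, we can compute both terms of the Law of Total Variance from their definitions and add them for a range of biases. This is a minimal sketch with names of my own choosing (`variance_components`, `HALF`); it is fixed to two flips and a 50/50 coin choice, exactly as in the setup above, and makes no claim about other settings.

```python
from fractions import Fraction

HALF = Fraction(1, 2)   # probability of choosing either coin, and the fair coin's P(heads)
N_FLIPS = 2             # two flips, as in the problem

def variance_components(p):
    """Return (E[Var(Y|C)], Var(E[Y|C])) for the fair-vs-biased mixture."""
    means = [N_FLIPS * HALF, N_FLIPS * p]
    variances = [N_FLIPS * HALF * (1 - HALF), N_FLIPS * p * (1 - p)]
    within = sum(HALF * v for v in variances)
    overall_mean = sum(HALF * m for m in means)
    between = sum(HALF * (m - overall_mean) ** 2 for m in means)
    return within, between

for k in range(11):
    p = Fraction(k, 10)
    within, between = variance_components(p)
    print(f"p={float(p):.1f}  within={float(within):.2f}  "
          f"between={float(between):.2f}  total={float(within + between):.2f}")
```

Because the arithmetic is done with `Fraction`, the totals are exactly one half rather than approximately so.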

### Intuition: Why Does $p$ Cancel Out?

The cancellation happens because the two components of the total variance move in opposite directions as the bias $p$ changes:

* **Average Within-Coin Variance ($E[\text{Var}(Y | C)] = 0.25 + p(1-p)$):** This term represents the inherent randomness *within* each coin type. The variance of a Bernoulli or binomial random variable is largest at $p=0.5$, so as the biased coin's $p$ moves away from 0.5 (towards 0 or 1), its variance $2p(1-p)$ *decreases*. This makes the *average* variance term smaller when $p$ is far from 0.5.

* **Variance Between Coin Averages ($\text{Var}(E[Y | C]) = 0.25 - p + p^2$):** This term represents how different the *average* outcomes are for the two coins. The expected values are $E[Y|C=F]=1$ and $E[Y|C=B]=2p$. When $p=0.5$, both expectations are 1, so there's *no variance* between them ($\text{Var}(E[Y|C]) = 0.25 - 0.5 + 0.25 = 0$). As $p$ moves away from 0.5, the difference between the average outcomes (1 and 2p) *increases*, leading to a *larger* variance between the conditional expectations.

These two effects perfectly offset each other. As the bias $p$ makes one term smaller, it makes the other term larger by exactly the same amount, keeping their sum constant at 0.5.
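One way to make the offset explicit is to complete the square in both components. This is only a rearrangement of the two expressions already derived, not a new result:

$$E[\text{Var}(Y | C)] = 0.25 + p - p^2 = 0.5 - (p - 0.5)^2$$

$$\text{Var}(E[Y | C]) = 0.25 - p + p^2 = (p - 0.5)^2$$

The same squared distance $(p - 0.5)^2$ is subtracted from one term and added to the other. For the original problem with $p = 0.6$, this gives $0.49 + 0.01 = 0.5$.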

```python
#| echo: false

import numpy as np
import plotly.graph_objects as go

# Define the range of p (bias of the second coin)
p = np.linspace(0, 1, 200)

# Calculate the two components of variance
# E[Var(Y|C)] = 0.25 + p(1-p)
avg_within_coin_variance = 0.25 + p * (1 - p)

# Var(E[Y|C]) = 0.25 - p + p**2
variance_between_coin_means = 0.25 - p + p**2

# Calculate the total variance (sum of the two components)
total_variance = avg_within_coin_variance + variance_between_coin_means

# Create the figure
fig = go.Figure()

# Add the three lines with custom hover templates
fig.add_trace(
    go.Scatter(
        x=p,
        y=avg_within_coin_variance,
        mode='lines',
        name='E[Var(Y | C)] (Avg. Within-Coin Variance)',
        line=dict(width=2, color='blue'),
        hovertemplate='Bias p: %{x:.3f}<br>E[Var(Y|C)]: %{y:.3f}<extra></extra>'
    )
)

fig.add_trace(
    go.Scatter(
        x=p,
        y=variance_between_coin_means,
        mode='lines',
        name='Var(E[Y | C]) (Variance Between Coin Means)',
        line=dict(width=2, color='green'),
        hovertemplate='Bias p: %{x:.3f}<br>Var(E[Y|C]): %{y:.3f}<extra></extra>'
    )
)

fig.add_trace(
    go.Scatter(
        x=p,
        y=total_variance,
        mode='lines',
        name='Var(Y) (Total Variance)',
        line=dict(width=3, dash='dash', color='black'),
        hovertemplate='Bias p: %{x:.3f}<br>Total Variance: %{y:.3f}<extra></extra>'
    )
)

# Add a point for p=0.6 to highlight that specific case
fig.add_trace(
    go.Scatter(
        x=[0.6],
        y=[0.5],
        mode='markers+text',
        marker=dict(size=10, color='red'),
        text=["p=0.6"],
        textposition="top center",
        name="Biased Coin (p=0.6)",
        hovertemplate='Bias p: 0.6<br>Total Variance: 0.5<extra></extra>'
    )
)

# Mark the two variance components at p=0.6
fig.add_trace(
    go.Scatter(
        x=[0.6],
        y=[0.25 + 0.6 * (1 - 0.6)],
        mode='markers',
        marker=dict(size=8, color='blue'),
        name="Within-Coin at p=0.6",
        hovertemplate='E[Var(Y|C)] at p=0.6: %{y:.3f}<extra></extra>'
    )
)

fig.add_trace(
    go.Scatter(
        x=[0.6],
        y=[0.25 - 0.6 + 0.6**2],
        mode='markers',
        marker=dict(size=8, color='green'),
        name="Between-Coin at p=0.6",
        hovertemplate='Var(E[Y|C]) at p=0.6: %{y:.3f}<extra></extra>'
    )
)

# Update layout
fig.update_layout(
    title='Components of Variance vs. Biased Coin Probability',
    xaxis_title="Bias 'p' of the Second Coin (P(Heads))",
    yaxis_title='Variance Component',
    yaxis=dict(range=[0, 0.6]),  # Limit the y-axis to better see the components
    legend=dict(y=1.0, x=1.9, xanchor='right'),
    template='plotly_white',
    width=800,
    height=500,
    hovermode='closest'
)

# Add grid
fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='lightgray')
fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='lightgray')

# Add a dotted vertical line at p=0.5, where the two coins are identical
fig.add_shape(
    type="line",
    x0=0.5, y0=0,
    x1=0.5, y1=0.5,
    line=dict(
        color="gray",
        width=1,
        dash="dot",
    )
)

# Add annotation for the fair coin (p=0.5)
fig.add_annotation(
    x=0.5,
    y=0.1,
    text="Fair Coin (p=0.5)",
    showarrow=True,
    arrowhead=2,
    arrowsize=1,
    arrowwidth=1,
    ax=40
)

fig.show()
```

## Conclusion

The constant variance of 0.5 in this coin mixture problem is a fascinating result that stems directly from the Law of Total Variance. While it seems paradoxical that the overall variability doesn't depend on the specific bias $p$ of the second coin, the mathematical breakdown shows a perfect cancellation effect. The average variance *within* the coin types and the variance *between* the coin types' average outcomes compensate for each other precisely.

This example highlights how decomposing variance can reveal underlying structures and sometimes lead to surprising, constant results even when parameters within the mixture model are changing. It also serves as a reminder that our intuition about how probability distributions combine can sometimes be misleading!