# Prior Elicitation


A key paper in this area is [Mikkola et al. 2024](https://projecteuclid.org/journals/bayesian-analysis/volume-19/issue-4/Prior-Knowledge-Elicitation-The-Past-Present-and-Future/10.1214/23-BA1381.full).

There is also a great blog post from [Michael Betancourt](https://betanalpha.github.io/assets/case_studies/prior_modeling.html).

# Whence priors?

A key question in Bayesian modelling is: what are priors, anyhow? What do they represent? The most succinct definition is that priors are representations of our personal beliefs. But how can we write a belief down as a probability distribution? The idea seems both sensible and nonsensical at the same time.

> "...statistics are always to some extent constructed on the basis of judgements, and it would be an obvious delusion to think the full complexity of personal experience can be unambiguously coded and put into a spreadsheet or other software." -- David Spiegelhalter

And yet this is ultimately what Bayes' theorem demands of us.

> "In Bayesian theory, a 'prior' represents one's personal degree of belief before considering current evidence." -- ET Jaynes

So if this is the case, and we wish to set priors in a principled way, how can we go about it? How should we go about specifying our own priors? And how can we specify the priors of others? How do we translate our intuitive sense of plausibility into a probability distribution?

In [None]:
# Import packages
%matplotlib inline
from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
from scipy import stats
import scipy as sp
from scipy.optimize import minimize
import warnings
warnings.filterwarnings('ignore')

# Set style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (10, 6)

## The Triangular Distribution

When we ask students (or domain experts) about a question, such as "what percentage of the earth does the Atlantic Ocean cover", we typically don't get a single number. Instead, we might hear:

> *"I'm guessing it's around 15% of Earth's surface, but it could be as low as 10% or as high as 30%"*

This gives us three key values:
- **Minimum** (a): The lowest plausible value (10%)
- **Mode** (c): The most likely value (15%)
- **Maximum** (b): The highest plausible value (30%)

The **triangular distribution** is perfect for capturing this kind of belief. It is useful for elicitation because it requires only three intuitive parameters: a minimum, a maximum, and a mode (most likely value). It has been widely used in conservation decision making (a niche, but very interesting field): https://scholar.google.com/citations?user=O4YYKCsAAAAJ&hl=en&oi=ao

In [None]:
# Elicited parameters from our hypothetical person
min_val = 0.10  # 10% minimum
mode_val = 0.15  # 15% most likely
max_val = 0.30  # 30% maximum

# The triangular distribution in scipy needs the mode scaled between 0 and 1
# c = (mode - min) / (max - min)
c_param = (mode_val - min_val) / (max_val - min_val)

# Create the triangular distribution
tri_dist = stats.triang(c=c_param, loc=min_val, scale=max_val - min_val)

print(f"Elicited prior for Atlantic Ocean coverage:")
print(f"  Minimum: {min_val*100:.0f}%")
print(f"  Mode:    {mode_val*100:.0f}%")
print(f"  Maximum: {max_val*100:.0f}%")
print(f"\nTriangular distribution parameters:")
print(f"  c = {c_param:.3f}, loc = {min_val:.2f}, scale = {max_val - min_val:.2f}")

So, using the `scipy` package in Python, we get the distribution's parameters from our elicited min, mode, and max values: `c` (the position of the mode as a fraction of the range between the minimum and maximum), `loc` (the minimum value), and `scale` (the range of the distribution). These are most easily understood visually:

In [None]:
# Visualize the elicited prior
x = np.linspace(0, 1, 1000)
y = tri_dist.pdf(x)

plt.figure(figsize=(12, 6))
plt.plot(x, y, 'b-', linewidth=2, label='Triangular prior')
plt.axvline(min_val, color='red', linestyle='--', alpha=0.5, label=f'Min: {min_val*100:.0f}%')
plt.axvline(mode_val, color='green', linestyle='--', alpha=0.5, label=f'Mode: {mode_val*100:.0f}%')
plt.axvline(max_val, color='red', linestyle='--', alpha=0.5, label=f'Max: {max_val*100:.0f}%')
plt.fill_between(x, y, alpha=0.3)
plt.xlabel('Proportion of Earth covered by Atlantic Ocean', fontsize=12)
plt.ylabel('Probability density', fontsize=12)
#plt.title('Elicited Prior: Triangular Distribution', fontsize=14)
plt.legend(fontsize=10)
plt.xlim(0, 1)
plt.grid(alpha=0.3)
plt.savefig('triangular.jpg',dpi=300);

In [None]:
# Summary statistics of our elicited prior
tri_mean = tri_dist.mean()
tri_std = tri_dist.std()
tri_median = tri_dist.median()

print(f"Summary statistics of triangular prior:")
print(f"  Mean:   {tri_mean:.3f} ({tri_mean*100:.1f}%)")
print(f"  Median: {tri_median:.3f} ({tri_median*100:.1f}%)")
print(f"  Std:    {tri_std:.3f} ({tri_std*100:.1f}%)")
print(f"\nNote: For triangular distributions, the mean is (a + b + c)/3")
print(f"      Calculated: ({min_val:.2f} + {max_val:.2f} + {mode_val:.2f})/3 = {(min_val + max_val + mode_val)/3:.3f}")

## Summary Statistics of the Triangular Distribution

For a triangular distribution with parameters:
- $\text{min}$ = minimum value
- $\text{max}$ = maximum value  
- $\text{mode}$ = most likely value

### Mean (Expected Value)

The mean of a triangular distribution is simply the average of the three parameters:

$$\mu = \frac{\text{min} + \text{max} + \text{mode}}{3}$$

**Intuition**: The mean balances the three "anchor points" equally. This differs from the uniform distribution where the mean would be just $\frac{\text{min} + \text{max}}{2}$.

**Example**: For $\text{min} = 0.10$, $\text{mode} = 0.15$, $\text{max} = 0.30$:

$$\mu = \frac{0.10 + 0.30 + 0.15}{3} = \frac{0.55}{3} = 0.183$$

In [None]:
# Visualize the elicited prior
x = np.linspace(0, 1, 1000)
y = tri_dist.pdf(x)

plt.figure(figsize=(12, 6))
plt.plot(x, y, 'b-', linewidth=2, label='Triangular prior')
plt.axvline(min_val, color='red', linestyle='--', alpha=0.5, label=f'Min: {min_val*100:.0f}%')
plt.axvline(mode_val, color='green', linestyle='--', alpha=0.5, label=f'Mode: {mode_val*100:.0f}%')
plt.axvline(max_val, color='red', linestyle='--', alpha=0.5, label=f'Max: {max_val*100:.0f}%')
plt.axvline(tri_mean, color='blue', linestyle=':', alpha=0.5, label=f'Mean: {tri_mean:.3f}')
plt.fill_between(x, y, alpha=0.3)
plt.xlabel('Proportion of Earth covered by Atlantic Ocean', fontsize=12)
plt.ylabel('Probability density', fontsize=12)
#plt.title('Elicited Prior: Triangular Distribution', fontsize=14)
plt.legend(fontsize=10)
plt.xlim(0, 1)
plt.grid(alpha=0.3)
plt.savefig('triangular2.jpg',dpi=300);

### Median

The median depends on whether the mode is below or above the midpoint:

$$\text{Median} = \begin{cases}
\text{min} + \sqrt{\frac{(\text{max}-\text{min})(\text{mode}-\text{min})}{2}} & \text{if } \text{mode} \geq \frac{\text{min}+\text{max}}{2} \\[10pt]
\text{max} - \sqrt{\frac{(\text{max}-\text{min})(\text{max}-\text{mode})}{2}} & \text{if } \text{mode} < \frac{\text{min}+\text{max}}{2}
\end{cases}$$

**Intuition**: The median is the value that splits the area under the triangle in half.

**Example**: For $\text{min} = 0.10$, $\text{mode} = 0.15$, $\text{max} = 0.30$:

First, find the midpoint:
$$\frac{\text{min} + \text{max}}{2} = \frac{0.10 + 0.30}{2} = 0.20$$

Since $\text{mode} = 0.15 < 0.20$, we use the second formula:

$$\text{Median} = \text{max} - \sqrt{\frac{(\text{max}-\text{min})(\text{max}-\text{mode})}{2}}$$

$$= 0.30 - \sqrt{\frac{(0.30-0.10)(0.30-0.15)}{2}}$$

$$= 0.30 - \sqrt{\frac{(0.20)(0.15)}{2}}$$

$$= 0.30 - \sqrt{\frac{0.03}{2}}$$

$$= 0.30 - \sqrt{0.015}$$

$$= 0.30 - 0.122 = 0.178$$
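The two-branch formula above can be sketched as a small function and cross-checked against `scipy`. This is only an illustrative check; the helper name `triangular_median` is made up here and is not part of `scipy`.

```python
import numpy as np
from scipy import stats

def triangular_median(lo, mode, hi):
    """Closed-form median of a triangular distribution on [lo, hi] (hypothetical helper)."""
    midpoint = (lo + hi) / 2
    if mode >= midpoint:
        # Mode at or above the midpoint: the median lies on the rising side
        return lo + np.sqrt((hi - lo) * (mode - lo) / 2)
    # Mode below the midpoint: the median lies on the falling side
    return hi - np.sqrt((hi - lo) * (hi - mode) / 2)

lo, mode, hi = 0.10, 0.15, 0.30
closed_form = triangular_median(lo, mode, hi)

# scipy parameterises the mode as a fraction c of the range
dist = stats.triang(c=(mode - lo) / (hi - lo), loc=lo, scale=hi - lo)
print(f"closed form: {closed_form:.4f}, scipy: {dist.median():.4f}")
print(f"CDF at the closed-form median: {dist.cdf(closed_form):.3f}")
```

If the formula is right, the CDF evaluated at the closed-form median should be one half, which is exactly the "splits the area in half" intuition.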

In [None]:
# Visualize the elicited prior
x = np.linspace(0, 1, 1000)
y = tri_dist.pdf(x)

plt.figure(figsize=(12, 6))
plt.plot(x, y, 'b-', linewidth=2, label='Triangular prior')
plt.axvline(min_val, color='red', linestyle='--', alpha=0.5, label=f'Min: {min_val*100:.0f}%')
plt.axvline(mode_val, color='green', linestyle='--', alpha=0.5, label=f'Mode: {mode_val*100:.0f}%')
plt.axvline(max_val, color='red', linestyle='--', alpha=0.5, label=f'Max: {max_val*100:.0f}%')
plt.axvline(tri_mean, color='blue', linestyle=':', alpha=0.5, label=f'Mean: {tri_mean:.3f}')
plt.axvline(tri_median, color='blue', linestyle='--', alpha=0.5, label=f'Median: {tri_median:.3f}')
plt.fill_between(x, y, alpha=0.3)
plt.xlabel('Proportion of Earth covered by Atlantic Ocean', fontsize=12)
plt.ylabel('Probability density', fontsize=12)
#plt.title('Elicited Prior: Triangular Distribution', fontsize=14)
plt.legend(fontsize=10)
plt.xlim(0, 1)
plt.grid(alpha=0.3)
plt.savefig('triangular3.jpg',dpi=300);

### Standard Deviation

The variance of a triangular distribution is:

$$\sigma^2 = \frac{\text{min}^2 + \text{max}^2 + \text{mode}^2 - \text{min} \cdot \text{max} - \text{min} \cdot \text{mode} - \text{max} \cdot \text{mode}}{18}$$

And the standard deviation is:

$$\sigma = \sqrt{\frac{\text{min}^2 + \text{max}^2 + \text{mode}^2 - \text{min} \cdot \text{max} - \text{min} \cdot \text{mode} - \text{max} \cdot \text{mode}}{18}}$$

**Intuition**: The spread increases as the range $(\text{max} - \text{min})$ increases and as the mode moves away from the center.

**Example**: For $\text{min} = 0.10$, $\text{mode} = 0.15$, $\text{max} = 0.30$:

First, calculate each term:
- $\text{min}^2 = (0.10)^2 = 0.01$
- $\text{max}^2 = (0.30)^2 = 0.09$
- $\text{mode}^2 = (0.15)^2 = 0.0225$
- $\text{min} \cdot \text{max} = (0.10)(0.30) = 0.03$
- $\text{min} \cdot \text{mode} = (0.10)(0.15) = 0.015$
- $\text{max} \cdot \text{mode} = (0.30)(0.15) = 0.045$

Now substitute:

$$\sigma^2 = \frac{0.01 + 0.09 + 0.0225 - 0.03 - 0.015 - 0.045}{18}$$

$$= \frac{0.1225 - 0.09}{18}$$

$$= \frac{0.0325}{18} = 0.00181$$

Therefore:

$$\sigma = \sqrt{0.00181} = 0.043$$


### Properties to Remember

1. **Mean is intuitive**: Just average the three values
2. **Symmetric case**: When $\text{mode} = \frac{\text{min}+\text{max}}{2}$ (mode at center), the mean, median, and mode all coincide
3. **Skewness**: When mode is closer to min, the distribution is right-skewed; when closer to max, it's left-skewed
4. **Maximum standard deviation**: Occurs when the mode is at one of the extremes ($\text{mode} = \text{min}$ or $\text{mode} = \text{max}$), giving $\sigma = \frac{\text{max}-\text{min}}{\sqrt{18}} \approx 0.236(\text{max}-\text{min})$
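As a rough check of the last property, we can slide the mode across a fixed range and watch the standard deviation. The sketch below reuses the Atlantic range (0.10 to 0.30) purely as an example; the value $(\text{max}-\text{min})/\sqrt{24}$ for a centred mode follows from substituting $\text{mode} = (\text{min}+\text{max})/2$ into the variance formula above.

```python
import numpy as np
from scipy import stats

lo, hi = 0.10, 0.30
modes = np.linspace(lo, hi, 21)  # candidate modes from min to max

# Standard deviation of the triangular distribution for each mode position
stds = np.array([
    stats.triang(c=(m - lo) / (hi - lo), loc=lo, scale=hi - lo).std()
    for m in modes
])

max_std = (hi - lo) / np.sqrt(18)  # mode at either extreme
min_std = (hi - lo) / np.sqrt(24)  # mode at the midpoint

print(f"std with mode at min:      {stds[0]:.4f} (formula: {max_std:.4f})")
print(f"std with mode at midpoint: {stds[10]:.4f} (formula: {min_std:.4f})")
print(f"std with mode at max:      {stds[-1]:.4f} (formula: {max_std:.4f})")
```

So, for a fixed range, moving the mode changes the spread only modestly; the range itself does most of the work.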

In [None]:
# Visualize the elicited prior
x = np.linspace(0, 1, 1000)
y = tri_dist.pdf(x)

plt.figure(figsize=(12, 6))
plt.plot(x, y, 'b-', linewidth=2, label='Triangular prior')
plt.axvline(min_val, color='red', linestyle='--', alpha=0.5, label=f'Min: {min_val*100:.0f}%')
plt.axvline(mode_val, color='green', linestyle='--', alpha=0.5, label=f'Mode: {mode_val*100:.0f}%')
plt.axvline(max_val, color='red', linestyle='--', alpha=0.5, label=f'Max: {max_val*100:.0f}%')
plt.axvline(tri_mean, color='blue', linestyle=':', alpha=0.5, label=f'Mean: {tri_mean:.3f}')
plt.axvline(tri_median, color='blue', linestyle='--', alpha=0.5, label=f'Median: {tri_median:.3f}')
plt.axvline(tri_mean+2*tri_std, color='blue', linestyle='-', alpha=0.5, label=f'~U95: {tri_mean+2*tri_std:.3f}')
plt.axvline(tri_mean-2*tri_std, color='blue', linestyle='-', alpha=0.5, label=f'~L95: {tri_mean-2*tri_std:.3f}')
plt.fill_between(x, y, alpha=0.3)
plt.xlabel('Proportion of Earth covered by Atlantic Ocean', fontsize=12)
plt.ylabel('Probability density', fontsize=12)
#plt.title('Elicited Prior: Triangular Distribution', fontsize=14)
plt.legend(fontsize=10)
plt.xlim(0, 1)
plt.grid(alpha=0.3)
plt.savefig('triangular4.jpg',dpi=300);
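The mean ± 2 standard deviations lines are only a rough stand-in for a 95% interval, since the triangular distribution is bounded and, here, skewed. A minimal sketch of the comparison, using the exact quantiles from `ppf` on the same Atlantic example:

```python
from scipy import stats

lo, mode, hi = 0.10, 0.15, 0.30
dist = stats.triang(c=(mode - lo) / (hi - lo), loc=lo, scale=hi - lo)

# Normal-style approximation: mean +/- 2 standard deviations
approx_lower = dist.mean() - 2 * dist.std()
approx_upper = dist.mean() + 2 * dist.std()

# Exact central 95% interval from the quantile function
exact_lower, exact_upper = dist.ppf([0.025, 0.975])

print(f"mean ± 2 sd:     [{approx_lower:.3f}, {approx_upper:.3f}]")
print(f"exact quantiles: [{exact_lower:.3f}, {exact_upper:.3f}]")
```

The approximate bounds can land outside the elicited range, whereas the exact quantiles always stay between the minimum and maximum.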

## Converting to a Beta Distribution
    
While the triangular distribution is great for elicitation, the **Beta distribution** is more commonly used as a prior for proportions in Bayesian analysis. The Beta distribution is the conjugate prior for the binomial likelihood, which makes it computationally convenient and provides nice interpretability.

We can match the moments (mean and variance) of our triangular distribution to find equivalent Beta parameters:

From the formulas:

$$\mu = \frac{\alpha}{\alpha + \beta}$$

$$\sigma^2 = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}$$

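Writing $\nu = \alpha + \beta$ and substituting $\alpha = \mu\nu$ and $\beta = (1-\mu)\nu$ into the variance gives $\sigma^2 = \frac{\mu(1-\mu)}{\nu + 1}$. Solving for $\nu$ (called `common` here), we can derive: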

$$\text{common} = \frac{\mu(1-\mu)}{\sigma^2} - 1$$

$$\alpha = \mu \cdot \text{common}$$

$$\beta = (1-\mu) \cdot \text{common}$$

so given the triangular distribution with:
- $\mu = 0.183$ (mean)
- $\sigma = 0.043$ (standard deviation)
- $\sigma^2 = (0.043)^2 = 0.001849$ (variance)

### Step 1: Calculate common

$$\text{common} = \frac{\mu(1-\mu)}{\sigma^2} - 1$$

$$= \frac{0.183 \times (1 - 0.183)}{0.001849} - 1$$

$$= \frac{0.183 \times 0.817}{0.001849} - 1$$

$$= \frac{0.1495}{0.001849} - 1$$

$$= 80.86 - 1$$

$$= 79.86$$

### Step 2: Calculate α

$$\alpha = \mu \cdot \text{common}$$

$$= 0.183 \times 79.86$$

$$= 14.61$$

### Step 3: Calculate β

$$\beta = (1-\mu) \cdot \text{common}$$

$$= 0.817 \times 79.86$$

$$= 65.25$$

### Result

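The equivalent Beta distribution is: **Beta(14.61, 65.25)**

Note that this hand calculation uses the rounded values $\mu = 0.183$ and $\sigma = 0.043$. Working from the unrounded moments gives slightly different parameters, roughly Beta(15.0, 66.9).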

### Verification

Check that we get back the original moments:

**Mean:**
$$\mu = \frac{\alpha}{\alpha + \beta} = \frac{14.61}{14.61 + 65.25} = \frac{14.61}{79.86} = 0.183 \,\checkmark$$

**Variance:**
$$\sigma^2 = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)} = \frac{14.61 \times 65.25}{(79.86)^2 \times 80.86}$$

$$= \frac{953.30}{6{,}377.62 \times 80.86} = \frac{953.30}{515{,}694} \approx 0.001849 \,\checkmark$$

In [None]:
# Method of moments: match mean and variance
# For Beta(α, β): mean = α/(α+β), var = αβ/[(α+β)²(α+β+1)]

def beta_params_from_moments(mean, var):
    """
    Convert mean and variance to Beta distribution parameters.
    
    Parameters:
    mean: float, mean of the distribution (between 0 and 1)
    var: float, variance of the distribution
    
    Returns:
    alpha, beta: shape parameters for Beta distribution
    """
    # From the formulas:
    # mean = α/(α+β)
    # var = αβ/[(α+β)²(α+β+1)]
    # We can derive:
    common = mean * (1 - mean) / var - 1
    alpha = mean * common
    beta = (1 - mean) * common
    return alpha, beta

# Get Beta parameters matching our triangular prior
alpha, beta = beta_params_from_moments(tri_mean, tri_std**2)

print(f"Beta distribution parameters matching triangular prior:")
print(f"  α (alpha) = {alpha:.3f}")
print(f"  β (beta)  = {beta:.3f}")

# Create the Beta distribution
beta_dist = stats.beta(alpha, beta)

# Verify the match
beta_mean = beta_dist.mean()
beta_std = beta_dist.std()

print(f"\nVerification:")
print(f"  Beta mean: {beta_mean:.3f} (Triangular mean: {tri_mean:.3f})")
print(f"  Beta std:  {beta_std:.3f} (Triangular std: {tri_std:.3f})")

In [None]:
# Compare the distributions
x = np.linspace(0, 1, 1000)
y_tri = tri_dist.pdf(x)
y_beta = beta_dist.pdf(x)

plt.figure(figsize=(12, 6))
plt.plot(x, y_tri, 'b-', linewidth=2, alpha=0.7, label='Triangular (elicited)')
plt.plot(x, y_beta, 'r--', linewidth=2, alpha=0.7, label=f'Beta({alpha:.2f}, {beta:.2f})')
plt.fill_between(x, y_tri, alpha=0.2, color='blue')
plt.fill_between(x, y_beta, alpha=0.2, color='red')
plt.axvline(tri_mean, color='blue', linestyle=':', alpha=0.5, label=f'Mean: {tri_mean:.3f}')
plt.xlabel('Proportion of Earth covered by Atlantic Ocean', fontsize=12)
plt.ylabel('Probability density', fontsize=12)
#plt.title('Triangular vs Beta Distribution', fontsize=14)
plt.legend(fontsize=10)
plt.xlim(0, 1)
plt.grid(alpha=0.3)
plt.savefig('triangular_beta.jpg',dpi=300);
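One consequence of moving from the triangular to the Beta is worth checking: the Beta lives on the whole interval (0, 1), so it puts some probability outside the elicited minimum and maximum. The sketch below repeats the moment matching on the Atlantic example and measures that leakage. It is an illustration under the same assumptions as above, not a recommendation for how much leakage is acceptable.

```python
from scipy import stats

lo, mode, hi = 0.10, 0.15, 0.30
tri = stats.triang(c=(mode - lo) / (hi - lo), loc=lo, scale=hi - lo)

# Moment matching, following the same algebra as beta_params_from_moments
mean, var = tri.mean(), tri.var()
common = mean * (1 - mean) / var - 1
beta_fit = stats.beta(mean * common, (1 - mean) * common)

below = beta_fit.cdf(lo)  # Beta mass below the elicited minimum
above = beta_fit.sf(hi)   # Beta mass above the elicited maximum

print(f"P(p < min) under the Beta: {below:.4f}")
print(f"P(p > max) under the Beta: {above:.4f}")
```

Whether this is a problem depends on how literally the person meant "could be as low as" and "as high as"; often a little mass beyond the stated bounds is a reasonable hedge.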

# Eliciting from Multiple People and Creating a Joint Prior

In real-world applications, we often want to gather prior information from multiple sources - different students, multiple experts, or various stakeholders. Rather than relying on a single person's belief, pooling multiple opinions creates a more robust prior.

Let's ask people for their beliefs about the coverage of the **Pacific Ocean**; each person provides their own minimum, mode, and maximum estimate.

### Input Elicitations

In [None]:
# ========== EDIT THESE VALUES ==========
# Each row is [minimum, mode, maximum] for one person's belief
# Values should be between 0 and 1 (proportions, not percentages)

elicitations = np.array([
    # Person   Min    Mode   Max
    [0.15, 0.30, 0.60],  # Person 1: Very uncertain, wide range
    [0.25, 0.35, 0.45],  # Person 2: More confident, narrower range
    [0.30, 0.45, 0.60],  # Person 3: Thinks it's quite large
    [0.20, 0.25, 0.35],  # Person 4: Conservative, lower estimate
    [0.25, 0.33, 0.50],  # Person 5: Moderate uncertainty
    [0.30, 0.40, 0.55],  # Person 6: Higher estimate
    [0.18, 0.28, 0.40],  # Person 7: Lower estimate
    [0.26, 0.30, 0.36],  # Person 8: Very confident, tight range
    [0.20, 0.35, 0.55],  # Person 9: Wide uncertainty
    [0.24, 0.32, 0.42],  # Person 10: Moderate confidence
])

# ========================================

### Create Individual Triangular Distributions

In [None]:
# Create triangular distributions for each person
individual_dists = []

# Loop-local names avoid overwriting the Atlantic min_val/mode_val/max_val from earlier
for lo, mode, hi in elicitations:
    c = (mode - lo) / (hi - lo)  # mode as a fraction of the range
    dist = stats.triang(c=c, loc=lo, scale=hi - lo)
    individual_dists.append(dist)

# Number of people; used by the plots and printouts in later cells
n_people = len(individual_dists)
print(f"Created {n_people} triangular distributions")

In [None]:
# Visualize all individual priors
x = np.linspace(0, 1, 1000)

plt.figure(figsize=(14, 7))
colors = plt.cm.tab10(np.linspace(0, 1, n_people))

for i, dist in enumerate(individual_dists):
    y = dist.pdf(x)
    plt.plot(x, y, linewidth=1.5, alpha=0.6, color=colors[i], label=f'Person {i+1}')
    plt.fill_between(x, y, alpha=0.1, color=colors[i])

plt.xlabel('Proportion of Earth covered by Pacific Ocean', fontsize=12)
plt.ylabel('Probability density', fontsize=12)
plt.title(f'Individual Elicited Priors (n={n_people})', fontsize=14)
plt.legend(ncol=2, fontsize=9, loc='upper right')
plt.xlim(0, 1)
plt.grid(alpha=0.3)
plt.savefig('multi_triangular.jpg',dpi=300);

### Pool the Elicitations into a Joint Beta Prior

Now we combine these individual beliefs. We'll use **moment matching** again to pool the means and variances from all individuals, then fit a single Beta distribution that captures the consensus.

This approach:
- Captures the central tendency across all experts
- Accounts for both within-person uncertainty and between-person disagreement
- Produces a conjugate Beta prior for convenient Bayesian updating

In [None]:
# Calculate mean and variance for each person's triangular distribution
individual_means = [dist.mean() for dist in individual_dists]
individual_vars = [dist.var() for dist in individual_dists]

# Pool the statistics
# Mean: Average across all people
pooled_mean = np.mean(individual_means)

# Variance: Average within-person variance + between-person variance
# This captures both individual uncertainty AND disagreement between people
within_person_var = np.mean(individual_vars)
between_person_var = np.var(individual_means)
pooled_var = within_person_var + between_person_var

print(f"Pooling statistics:")
print(f"  Average of individual means: {pooled_mean:.3f}")
print(f"  Within-person uncertainty:   {within_person_var:.4f}")
print(f"  Between-person disagreement: {between_person_var:.4f}")
print(f"  Total pooled variance:       {pooled_var:.4f}")
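The within-plus-between formula is the variance of an equal-weight mixture of the individual priors (the law of total variance). A quick simulation makes that concrete. The three elicitations below are an illustrative subset, and the sampling scheme (pick a person at random, then draw from their prior) is an assumption about what "pooling" means here.

```python
import numpy as np
from scipy import stats

# Three illustrative [min, mode, max] elicitations
elicited = np.array([[0.15, 0.30, 0.60],
                     [0.25, 0.35, 0.45],
                     [0.20, 0.25, 0.35]])
dists = [stats.triang(c=(m - lo) / (hi - lo), loc=lo, scale=hi - lo)
         for lo, m, hi in elicited]

# Pooled moments: mean of means, and within-person plus between-person variance
means = np.array([d.mean() for d in dists])
variances = np.array([d.var() for d in dists])
pooled_mean = means.mean()
pooled_var = variances.mean() + means.var()

# Simulate the mixture: choose a person uniformly, then draw from their prior
rng = np.random.default_rng(0)
who = rng.integers(len(dists), size=200_000)
draws = np.empty(who.size)
for k, d in enumerate(dists):
    idx = who == k
    draws[idx] = d.rvs(size=int(idx.sum()), random_state=rng)

print(f"pooled mean {pooled_mean:.4f} vs simulated {draws.mean():.4f}")
print(f"pooled var  {pooled_var:.5f} vs simulated {draws.var():.5f}")
```

The simulated moments should agree with the formulas up to Monte Carlo error, which is why `np.var` (the population variance) is the natural choice for the between-person term.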

In [None]:
# Convert pooled moments to Beta parameters
alpha_pooled, beta_pooled = beta_params_from_moments(pooled_mean, pooled_var)
joint_prior = stats.beta(alpha_pooled, beta_pooled)

print(f"Joint prior (pooled from {n_people} people):")
print(f"  Beta({alpha_pooled:.3f}, {beta_pooled:.3f})")
print(f"\nJoint prior statistics:")
print(f"  Mean: {joint_prior.mean():.3f} ({joint_prior.mean()*100:.1f}%)")
print(f"  Std:  {joint_prior.std():.3f} ({joint_prior.std()*100:.1f}%)")
print(f"  Median: {joint_prior.median():.3f}")
print(f"  95% credible interval: [{joint_prior.ppf(0.025):.3f}, {joint_prior.ppf(0.975):.3f}]")

In [None]:
# Visualize the joint prior against individual priors
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Left: All individual priors (faint) with joint prior
for i, dist in enumerate(individual_dists):
    axes[0].plot(x, dist.pdf(x), linewidth=0.8, alpha=0.3, color='gray')
axes[0].plot(x, joint_prior.pdf(x), 'darkblue', linewidth=3, 
             label=f'Joint Prior: Beta({alpha_pooled:.2f}, {beta_pooled:.2f})')
axes[0].fill_between(x, joint_prior.pdf(x), alpha=0.3, color='darkblue')
axes[0].axvline(pooled_mean, color='red', linestyle='--', linewidth=2, 
                label=f'Mean: {pooled_mean:.3f}', alpha=0.7)
axes[0].set_xlabel('Proportion', fontsize=12)
axes[0].set_ylabel('Density', fontsize=12)
axes[0].set_title('Joint Prior from Pooled Elicitations', fontsize=13)
axes[0].legend(fontsize=10)
axes[0].set_xlim(0, 1)
axes[0].grid(alpha=0.3)

# Right: Comparison showing spread
axes[1].plot(x, joint_prior.pdf(x), 'darkblue', linewidth=3, label='Joint prior')
axes[1].fill_between(x, joint_prior.pdf(x), alpha=0.2, color='darkblue')

# Show the range of individual means
for mean_i in individual_means:
    axes[1].axvline(mean_i, color='orange', alpha=0.3, linewidth=1)
axes[1].axvline(individual_means[0], color='orange', alpha=0.3, linewidth=1, 
                label='Individual means')

axes[1].axvline(pooled_mean, color='red', linestyle='--', linewidth=2.5, 
                label=f'Pooled mean: {pooled_mean:.3f}')

# Add credible interval
ci_lower = joint_prior.ppf(0.025)
ci_upper = joint_prior.ppf(0.975)
axes[1].axvspan(ci_lower, ci_upper, alpha=0.2, color='green', 
                label=f'95% CI: [{ci_lower:.3f}, {ci_upper:.3f}]')

axes[1].set_xlabel('Proportion', fontsize=12)
axes[1].set_ylabel('Density', fontsize=12)
axes[1].set_title('Joint Prior with Individual Variation', fontsize=13)
axes[1].legend(fontsize=10)
axes[1].set_xlim(0, 1)
axes[1].grid(alpha=0.3)

plt.tight_layout()
plt.savefig('multi_beta.jpg',dpi=300);

The joint prior represents our best synthesis of the collective knowledge from all ten people elicited above. The spread reflects both:
1. **Individual uncertainty**: How uncertain each person was (width of their triangular distributions)
2. **Expert disagreement**: How much people disagreed with each other (spread of their individual means)

### Bayesian Updating with Observed Data

Now let's use our joint prior for Bayesian inference. Imagine we collect some actual data by randomly tossing a globe and recording whether our finger lands on the Pacific Ocean.

In [None]:
# Simulate some data
# True value: Pacific Ocean is about 32% of Earth's surface (165.25M km² / 510M km²)
true_proportion = 0.32

# Simulate 50 globe tosses
n_tosses = 50
np.random.seed(42)
tosses = np.random.binomial(1, true_proportion, n_tosses)
n_pacific = tosses.sum()

print(f"Observed data:")
print(f"  Number of tosses: {n_tosses}")
print(f"  Landed on Pacific: {n_pacific}")
print(f"  Sample proportion: {n_pacific/n_tosses:.3f}")
print(f"  True proportion: {true_proportion:.3f}")

In [None]:
# Bayesian updating: Beta-Binomial conjugacy
# Prior: Beta(α, β)
# Likelihood: Binomial(n, p) with k successes
# Posterior: Beta(α + k, β + n - k)

alpha_post = alpha_pooled + n_pacific
beta_post = beta_pooled + (n_tosses - n_pacific)
posterior = stats.beta(alpha_post, beta_post)

print(f"Posterior: Beta({alpha_post:.2f}, {beta_post:.2f})")
print(f"\nPosterior statistics:")
print(f"  Mean: {posterior.mean():.3f}")
print(f"  Std:  {posterior.std():.3f}")
print(f"  95% credible interval: [{posterior.ppf(0.025):.3f}, {posterior.ppf(0.975):.3f}]")
print(f"\nComparison:")
print(f"  Prior mean (from {n_people} people): {joint_prior.mean():.3f}")
print(f"  Sample proportion:               {n_pacific/n_tosses:.3f}")
print(f"  Posterior mean:                  {posterior.mean():.3f}")
print(f"  True value:                      {true_proportion:.3f}")

In [None]:
# Visualize the updating process
x = np.linspace(0, 1, 1000)

# Prior (from pooled elicitations)
y_prior = joint_prior.pdf(x)

# Likelihood (scaled for visualization)
likelihood = stats.binom(n_tosses, x).pmf(n_pacific)
likelihood_scaled = likelihood / likelihood.max() * y_prior.max() * 0.8

# Posterior
y_post = posterior.pdf(x)

plt.figure(figsize=(14, 7))
plt.plot(x, y_prior, 'b-', linewidth=2.5, 
         label=f'Prior: Beta({alpha_pooled:.2f}, {beta_pooled:.2f})\n(from {n_people} elicitations)', 
         alpha=0.8)
plt.plot(x, likelihood_scaled, 'g-', linewidth=2, 
         label=f'Likelihood (scaled)\n({n_pacific}/{n_tosses} Pacific)', 
         alpha=0.7)
plt.plot(x, y_post, 'r-', linewidth=3, 
         label=f'Posterior: Beta({alpha_post:.2f}, {beta_post:.2f})')
plt.axvline(true_proportion, color='black', linestyle='--', linewidth=2, 
            label=f'True value: {true_proportion:.3f}', alpha=0.7)
plt.fill_between(x, y_post, alpha=0.2, color='red')

# Add vertical lines for means
plt.axvline(joint_prior.mean(), color='blue', linestyle=':', linewidth=1.5, alpha=0.5)
plt.axvline(n_pacific/n_tosses, color='green', linestyle=':', linewidth=1.5, alpha=0.5)
plt.axvline(posterior.mean(), color='red', linestyle=':', linewidth=1.5, alpha=0.5)

plt.xlabel('Proportion of Earth covered by Pacific Ocean', fontsize=12)
plt.ylabel('Density', fontsize=12)
plt.legend(fontsize=10, loc='upper right')
plt.xlim(0, 0.8)
plt.grid(alpha=0.3)
plt.savefig('multi_beta2.jpg',dpi=300);

The posterior represents a synthesis of:
1. **Prior knowledge**: The pooled beliefs of the ten people
2. **New data**: 50 globe tosses with observed outcomes

Notice how the posterior mean is a weighted average between the prior mean and the sample proportion. With more data, the posterior would increasingly reflect the likelihood and less the prior.
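The weights in that average can be written down explicitly: the prior counts as $\alpha + \beta$ pseudo-observations and the data as $n$ real ones. A small sketch, with a hypothetical prior and data chosen only to make the arithmetic visible:

```python
# Hypothetical prior and data, for illustration only
alpha_prior, beta_prior = 12.0, 24.0
n, k = 50, 14  # tosses, and tosses landing on the Pacific

prior_mean = alpha_prior / (alpha_prior + beta_prior)
sample_prop = k / n

# Weight on the prior: its pseudo-count relative to the total count
w = (alpha_prior + beta_prior) / (alpha_prior + beta_prior + n)

weighted = w * prior_mean + (1 - w) * sample_prop
direct = (alpha_prior + k) / (alpha_prior + beta_prior + n)  # mean of Beta(α + k, β + n - k)

print(f"weight on prior: {w:.3f}")
print(f"weighted average: {weighted:.4f}, posterior mean: {direct:.4f}")
```

As $n$ grows, `w` shrinks towards zero, which is the precise sense in which more data lets the likelihood dominate.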

## Log-Odds Transformation for Logistic Regression

In logistic regression and many GLMs, we work with log-odds (logit) rather than probabilities directly. The logit transformation is:

$$\text{logit}(p) = \log\left(\frac{p}{1-p}\right)$$

This maps probabilities from (0,1) to the entire real line (-∞, +∞), which is useful when we want to use normal priors in our models.

In [None]:
# Define logit and inverse logit functions
def logit(p):
    """Convert probability to log-odds."""
    return np.log(p / (1 - p))

def inv_logit(x):
    """Convert log-odds to probability."""
    return 1 / (1 + np.exp(-x))

# Transform our joint Beta prior to log-odds scale
# Sample from Beta, transform to logit
n_samples = 10000
beta_samples = joint_prior.rvs(n_samples)
logit_samples = logit(beta_samples)

print(f"Log-odds transformation of Beta({alpha_pooled:.2f}, {beta_pooled:.2f}):")
print(f"  Mean log-odds: {logit_samples.mean():.3f}")
print(f"  Std log-odds:  {logit_samples.std():.3f}")
print(f"\nFor reference:")
print(f"  logit(0.25) = {logit(0.25):.3f}")
print(f"  logit(0.33) = {logit(0.33):.3f}")
print(f"  logit(0.50) = {logit(0.50):.3f}")
print(f"  logit(0.75) = {logit(0.75):.3f}")

In [None]:
# Visualize the transformed distribution
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left panel: probability scale
axes[0].hist(beta_samples, bins=50, density=True, alpha=0.6, color='blue', edgecolor='black')
x_prob = np.linspace(0, 1, 1000)
axes[0].plot(x_prob, joint_prior.pdf(x_prob), 'r-', linewidth=2, label='Beta PDF')
axes[0].axvline(pooled_mean, color='green', linestyle='--', linewidth=2, 
                label=f'Mean: {pooled_mean:.3f}')
axes[0].set_xlabel('Proportion (probability scale)', fontsize=12)
axes[0].set_ylabel('Density', fontsize=12)
axes[0].set_title('Joint Beta Prior (Probability Scale)', fontsize=13)
axes[0].legend()
axes[0].grid(alpha=0.3)

# Right panel: log-odds scale
axes[1].hist(logit_samples, bins=50, density=True, alpha=0.6, color='purple', edgecolor='black')
axes[1].axvline(logit_samples.mean(), color='green', linestyle='--', linewidth=2, 
                label=f'Mean: {logit_samples.mean():.3f}')
axes[1].axvline(0, color='black', linestyle=':', alpha=0.5, label='logit(0.5) = 0')
axes[1].set_xlabel('Log-odds (logit scale)', fontsize=12)
axes[1].set_ylabel('Density', fontsize=12)
axes[1].set_title('Transformed Distribution (Log-Odds Scale)', fontsize=13)
axes[1].legend()
axes[1].grid(alpha=0.3)

plt.tight_layout();

The log-odds transformation creates an approximately normal distribution. This is useful because we can now approximate this with a normal prior in logistic regression models.

In [None]:
# Fit a normal distribution to the log-odds
from scipy.stats import norm

logit_mean = logit_samples.mean()
logit_std = logit_samples.std()

# Create a normal distribution with these parameters
logit_normal = norm(logit_mean, logit_std)

print(f"Normal approximation in log-odds space:")
print(f"  N({logit_mean:.3f}, {logit_std:.3f}²)")
print(f"\nThis can be used as a prior in logistic regression:")
print(f"\nIn PyMC:")
print(f"  intercept = pm.Normal('intercept', mu={logit_mean:.3f}, sigma={logit_std:.3f})")
print(f"\nIn Stan:")
print(f"  intercept ~ normal({logit_mean:.3f}, {logit_std:.3f});")
print(f"\nIn R (brms):")
print(f"  prior(normal({logit_mean:.3f}, {logit_std:.3f}), class = Intercept)")
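Sampling is not strictly needed for these two numbers. For $p \sim \text{Beta}(\alpha, \beta)$, the mean of $\text{logit}(p)$ is $\psi(\alpha) - \psi(\beta)$ and its variance is $\psi_1(\alpha) + \psi_1(\beta)$, where $\psi$ is the digamma function and $\psi_1$ the trigamma function. A minimal sketch comparing the closed form with simulation; the Beta parameters are hypothetical stand-ins for the pooled prior:

```python
import numpy as np
from scipy import stats
from scipy.special import digamma, polygamma

a, b = 12.0, 24.0  # hypothetical Beta parameters, for illustration

# Closed-form moments of logit(p) when p ~ Beta(a, b)
logit_mean_exact = digamma(a) - digamma(b)
logit_std_exact = np.sqrt(polygamma(1, a) + polygamma(1, b))

# Monte Carlo version, mirroring the sampling approach
rng = np.random.default_rng(1)
p = stats.beta(a, b).rvs(size=200_000, random_state=rng)
z = np.log(p / (1 - p))

print(f"mean: exact {logit_mean_exact:.4f}, simulated {z.mean():.4f}")
print(f"std:  exact {logit_std_exact:.4f}, simulated {z.std():.4f}")
```

The closed form avoids Monte Carlo noise in the prior you hand to PyMC, Stan, or brms, though the normal itself remains an approximation to the transformed Beta.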

In [None]:
# Compare log-odds transformation vs normal approximation
x_logit = np.linspace(-3, 2, 1000)

plt.figure(figsize=(12, 6))
plt.hist(logit_samples, bins=50, density=True, alpha=0.4, color='purple', 
         label='Transformed Beta samples', edgecolor='black')
plt.plot(x_logit, logit_normal.pdf(x_logit), 'r-', linewidth=2.5, 
         label=f'Normal({logit_mean:.3f}, {logit_std:.3f})')
plt.axvline(logit_mean, color='green', linestyle='--', linewidth=2, alpha=0.7,
            label=f'Mean: {logit_mean:.3f}')
plt.axvline(0, color='black', linestyle=':', alpha=0.5)
plt.xlabel('Log-odds', fontsize=12)
plt.ylabel('Density', fontsize=12)
plt.title('Log-Odds Prior: Transformed Beta vs Normal Approximation', fontsize=14)
plt.legend(fontsize=10)
plt.grid(alpha=0.3)

# Add second x-axis showing probability scale
ax1 = plt.gca()  # keep a handle on the main axes; twiny() makes the new axes current
ax2 = ax1.twiny()
ax2.set_xlim(ax1.get_xlim())
logit_ticks = np.array([-2, -1, 0, 1])
prob_ticks = inv_logit(logit_ticks)
ax2.set_xticks(logit_ticks)
ax2.set_xticklabels([f'{p:.2f}' for p in prob_ticks])
ax2.set_xlabel('Corresponding probability', fontsize=12, color='gray');

# Implied sample size

Besides eliciting bounds or ranges, there are many other ways of getting at the underlying (implied) parameters of a probability distribution. One highlighted by O'Hagan *et al.* is the *equivalent prior sample* (EPS) method, which recognizes that elicitation methods often understate uncertainty, even for an opinion from one person.

The EPS method asks for a point estimate of the expected value (in my case, 0.72) and then asks "based on how many samples?" Given $n$, the parameters follow as $\alpha = n\hat{p}$ and $\beta = n(1-\hat{p})$.
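The mapping from a point estimate and an implied sample size to Beta parameters is a one-liner. The sketch below wraps it in a helper (the name `beta_from_equivalent_sample` is made up for this notebook) and shows how the prior standard deviation shrinks as the claimed sample size grows.

```python
from scipy import stats

def beta_from_equivalent_sample(p_hat, n):
    """Beta parameters implied by a point estimate p_hat and an equivalent sample size n."""
    if not 0 < p_hat < 1:
        raise ValueError("p_hat must lie strictly between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return n * p_hat, n * (1 - p_hat)

p_hat = 0.72
for n in (1, 5, 10, 50, 500):
    a_n, b_n = beta_from_equivalent_sample(p_hat, n)
    prior = stats.beta(a_n, b_n)
    print(f"n = {n:>3}: Beta({a_n:.2f}, {b_n:.2f}), mean = {prior.mean():.2f}, sd = {prior.std():.3f}")
```

The mean stays at the point estimate for every $n$; only the spread changes, so $n$ acts purely as a confidence dial.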

For examples about the earth, there is only one, so $n = 1$ and the resulting pdf would simply be $Beta(0.72, 0.28)$.

In [None]:
# Generate points for the Beta distribution PDF
# With n = 1 and p_hat = 0.72: alpha = 0.72, beta = 1 - 0.72 = 0.28
pdf_y = sp.stats.beta.pdf(x, 0.72, 0.28)

# Plot the Beta distribution PDF
plt.figure(figsize=(8, 6))
plt.plot(x, pdf_y, label="Beta PDF (α=0.72, β=0.28)", color='green')

# Add graph details
plt.title("Some Beta Distribution PDF", fontsize=14)
plt.xlabel("Probability", fontsize=12)
plt.ylabel("PDF Value", fontsize=12)
plt.axhline(0, color='black', linewidth=0.8, linestyle='--')
plt.axvline(0, color='black', linewidth=0.8, linestyle='--')
plt.legend()
plt.grid(alpha=0.4)
plt.savefig('one_earth.jpg',dpi=300);

Which is a bit of a crap estimate. But let's instead try a range of implied sample sizes, as people might report them, to give some perspective.

In [None]:
# Beta PDFs for increasing implied sample sizes n (alpha = n*0.72, beta = n*0.28)

# Plot the Beta distribution PDFs
plt.figure(figsize=(8, 6))
plt.plot(x, sp.stats.beta.pdf(x, 0.72, 0.28), label="Beta PDF (α=0.72, β=0.28)")
plt.plot(x, sp.stats.beta.pdf(x, 5*0.72, 5*0.28), label="Beta PDF  (α=5x0.72, β=5x0.28)")
plt.plot(x, sp.stats.beta.pdf(x, 10*0.72, 10*0.28), label="Beta PDF  (α=10x0.72, β=10x0.28)")
plt.plot(x, sp.stats.beta.pdf(x, 50*0.72, 50*0.28), label="Beta PDF  (α=50x0.72, β=50x0.28)")
plt.plot(x, sp.stats.beta.pdf(x, 500*0.72, 500*0.28), label="Beta PDF  (α=500x0.72, β=500x0.28)")


# Add graph details
plt.title("Some Beta Distributions PDF", fontsize=14)
plt.xlabel("Probability", fontsize=12)
plt.ylabel("PDF Value", fontsize=12)
plt.axhline(0, color='black', linewidth=0.8, linestyle='--')
plt.axvline(0, color='black', linewidth=0.8, linestyle='--')
plt.legend()
plt.grid(alpha=0.4)
plt.savefig('multi_earth.jpg',dpi=300);

In general, this looks worse than what we got from the CDF approach with only a few people, and in practice it frequently is; the CDF is just better. But it's good to know that there are other options.
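To put rough numbers on the curves above, one option is to look at how wide the central 90% interval is for each implied sample size. This is only a sketch: the 90% level and the list of sample sizes are choices made here for illustration, and the parameters follow the $\alpha = n\hat{p}$, $\beta = n(1-\hat{p})$ rule.

```python
import numpy as np
from scipy import stats

p_hat = 0.72  # point estimate used in the text
sample_sizes = [1, 5, 10, 50, 500]  # implied sample sizes to compare

widths = []
for n in sample_sizes:
    # Central 90% interval of Beta(n * p_hat, n * (1 - p_hat))
    lower, upper = stats.beta.ppf([0.05, 0.95], n * p_hat, n * (1 - p_hat))
    widths.append(upper - lower)
    print(f"n = {n:>3}: 90% interval = ({lower:.3f}, {upper:.3f}), width = {upper - lower:.3f}")
```

With an implied sample size of 1 the interval covers almost the whole unit range, which is another way of seeing why that prior says so little.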

While the elicitation of probabilities is good in the sense that it is bounded and therefore tractable, what about parameters in a geocentric linear model? How can people elicit such things?

In [None]:
# Simulate expert responses
# Assume three experts each provide a mean and standard deviation for the slope
expert_responses = {
    "Expert 1": {"mean": 2.0, "std": 0.5},  # Mean and standard deviation
    "Expert 2": {"mean": 2.2, "std": 0.3},
    "Expert 3": {"mean": 1.8, "std": 0.4},
}

# Simulate individual distributions
n_samples = 1000
x = np.linspace(0, 4, n_samples)  # Range for the slope
distributions = {
    name: sp.stats.norm.pdf(x, loc=data["mean"], scale=data["std"])
    for name, data in expert_responses.items()
}

# Aggregate expert opinions using weighted averaging
# Equal weights for simplicity
weights = np.array([1/3, 1/3, 1/3])
aggregated_pdf = sum(weight * dist for weight, dist in zip(weights, distributions.values()))


# Calculate the aggregated mean and standard deviation
aggregated_mean = sum(weights[i] * expert_responses[f"Expert {i+1}"]["mean"] for i in range(len(weights)))
aggregated_variance = sum(weights[i] * (expert_responses[f"Expert {i+1}"]["std"]**2 + 
                                       (expert_responses[f"Expert {i+1}"]["mean"] - aggregated_mean)**2) for i in range(len(weights)))
aggregated_std = round(np.sqrt(aggregated_variance),2)

In [None]:
# Plot individual expert distributions and the aggregated distribution
plt.figure(figsize=(10, 6))
for name, pdf in distributions.items():
    plt.plot(x, pdf, label=f"{name} (Mean: {expert_responses[name]['mean']}, Std: {expert_responses[name]['std']})")
plt.plot(x, aggregated_pdf, label=f"Aggregated (Mean: {aggregated_mean:.2f}, Std: {aggregated_std:.2f})", color="black", lw=2, linestyle="--")

# Add plot details
plt.title("Direct elicitation of Slope in Simple Linear Regression", fontsize=14)
plt.xlabel("Slope", fontsize=12)
plt.ylabel("Density", fontsize=12)
plt.legend()
plt.grid(alpha=0.4)

plt.savefig('direct.jpg',dpi=300);
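
The equal-weight average used above is often called a linear opinion pool, and the mean and variance formulas in the code are the standard moments of a mixture. As a sanity check, the sketch below recomputes those moments by brute-force numerical integration of the pooled PDF. The grid limits and resolution are arbitrary choices made here, and the variable names are new to this example.

```python
import numpy as np
from scipy import stats

# Same illustrative expert inputs as in the cell above
means = np.array([2.0, 2.2, 1.8])
stds = np.array([0.5, 0.3, 0.4])
weights = np.full(3, 1 / 3)

# Closed-form moments of the weighted mixture
pool_mean = np.sum(weights * means)
pool_var = np.sum(weights * (stds**2 + (means - pool_mean) ** 2))

# Numerical moments from the pooled PDF on a wide, fine grid
grid = np.linspace(-3, 7, 20001)
dx = grid[1] - grid[0]
pool_pdf = sum(w * stats.norm.pdf(grid, m, s) for w, m, s in zip(weights, means, stds))
num_mean = np.sum(grid * pool_pdf) * dx
num_var = np.sum((grid - num_mean) ** 2 * pool_pdf) * dx

print(f"closed form: mean = {pool_mean:.4f}, var = {pool_var:.4f}")
print(f"numerical:   mean = {num_mean:.4f}, var = {num_var:.4f}")
```

The variance has two parts: the average of each expert's own variance, plus the spread of the experts' means around the pooled mean. Disagreement between experts therefore widens the aggregate.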

This is fine if we have experts who know something about, and can think in terms of, the mean and standard deviation of a regression slope. But what about normal people? Well, first the language has to be good. Asking "what is the slope?" isn't good, but asking "what is the most change you would expect in Y given a change from X1 to X2?" (with context-appropriate words for Y, X1 and X2), followed by "and what is the minimum change you would expect?" and "how sure are you that the change would be within this range?", works much better.

With these statements in hand, we can convert them into quantitative estimates.
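For a single expert the conversion looks like this. The sketch assumes the stated range is a central interval of a normal distribution, which is a modelling choice rather than something the expert told us. The function name `normal_from_range` is invented for this example.

```python
from scipy import stats


def normal_from_range(lower, upper, confidence):
    """Hypothetical helper: normal mean and std implied by a central interval."""
    if not lower < upper or not 0 < confidence < 1:
        raise ValueError("need lower < upper and 0 < confidence < 1")
    mean = (lower + upper) / 2
    # z-score that leaves (1 - confidence) / 2 in each tail
    z = stats.norm.ppf((1 + confidence) / 2)
    std = (upper - lower) / (2 * z)
    return mean, std


# Example: "between 1.5 and 2.5, and I'm 90% sure"
mean, std = normal_from_range(1.5, 2.5, 0.90)

# Round trip: the fitted normal should put 90% of its mass inside the range
mass_inside = stats.norm.cdf(2.5, mean, std) - stats.norm.cdf(1.5, mean, std)
print(f"mean = {mean:.2f}, std = {std:.3f}, mass inside range = {mass_inside:.2f}")
```

The less sure the expert is about the same range, the wider the implied distribution: at 75% confidence the same range gives a std of about 0.43.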

In [None]:
# Experts provide a plausible range (lower and upper bounds) with a confidence level (e.g., 95%)
expert_ranges = {
    "Expert 1": {"lower": 1.5, "upper": 2.5, "confidence": 0.90},
    "Expert 2": {"lower": 2.0, "upper": 2.4, "confidence": 0.95},
    "Expert 3": {"lower": 1.6, "upper": 2.0, "confidence": 0.75},
}

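# Convert ranges to mean and std assuming a normal distribution:
# the midpoint is the mean, and the half-width divided by the z-score
# for the stated confidence gives the std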
for name, data in expert_ranges.items():
    mean = (data["lower"] + data["upper"]) / 2
    std = (data["upper"] - data["lower"]) / (2 * sp.stats.norm.ppf((1 + data["confidence"]) / 2))
    expert_ranges[name]["mean"] = mean
    expert_ranges[name]["std"] = std

# Calculate the aggregated mean from the weighted means
aggregated_mean = sum(
    weights[i] * expert_ranges[f"Expert {i+1}"]["mean"] for i in range(len(weights))
)

# Calculate the aggregated variance from the weighted variances
aggregated_variance = sum(
    weights[i] * (expert_ranges[f"Expert {i+1}"]["std"]**2 +
                  (expert_ranges[f"Expert {i+1}"]["mean"] - aggregated_mean)**2)
    for i in range(len(weights))
)

# Compute the aggregated standard deviation
aggregated_std = np.sqrt(aggregated_variance)


# Approximate the pooled opinion with a single normal that matches the aggregated mean and std
aggregated_pdf = sp.stats.norm.pdf(x, aggregated_mean, scale=aggregated_std)


In [None]:
# Plot individual expert distributions and the aggregated distribution
plt.figure(figsize=(10, 6))
for name, data in expert_ranges.items():
    # Use the normal implied by each expert's range, not the PDFs from the direct example
    pdf = sp.stats.norm.pdf(x, loc=data["mean"], scale=data["std"])
    plt.plot(x, pdf, label=f"{name} (Lower: {data['lower']}, Upper: {data['upper']}, Conf.: {data['confidence']})")
plt.plot(x, aggregated_pdf, label=f"Aggregated (Mean: {aggregated_mean:.2f}, Std: {aggregated_std:.2f})", color="black", lw=2, linestyle="--")

# Add plot details
plt.title("Sort-of indirect elicitation of Slope in Simple Linear Regression", fontsize=14)
plt.xlabel("Slope", fontsize=12)
plt.ylabel("Density", fontsize=12)
plt.legend()
plt.grid(alpha=0.4)

plt.savefig('indirect.jpg',dpi=300);

All this is but the tip of the elicitation iceberg - there are many other, complex ways to derive estimates!