# Lab 27: Custom Prior Probability Models

## Overview

This notebook explores custom prior probability models in Bonsai v3, which are implemented in the `prior.py` module. Prior probability models allow you to incorporate prior knowledge and population-specific information into the pedigree reconstruction process, enhancing the accuracy of genetic relationship inference.

**Learning Objectives:**
- Understand the role of prior probabilities in Bayesian inference for genetic genealogy
- Explore the structure and implementation of the `prior.py` module in Bonsai v3
- Learn how to develop and customize prior probability models for different populations
- Apply custom priors to improve relationship inference accuracy
- Evaluate the impact of different prior models on reconstruction results

**Prerequisites:**
- Completion of Lab 6: Probabilistic Relationship Inference
- Completion of Lab 7: PwLogLike Class
- Basic understanding of Bayesian statistics

**Estimated completion time:** 60-90 minutes

In [None]:
# Standard imports
import os
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import networkx as nx
from IPython.display import display, HTML, Markdown
import inspect
import importlib

sys.path.append(os.path.dirname(os.getcwd()))

# Cross-compatibility setup
from scripts_support.lab_cross_compatibility import setup_environment, is_jupyterlite, save_results, save_plot

# Set up environment-specific paths
DATA_DIR, RESULTS_DIR = setup_environment()

# Set visualization styles
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_context("notebook")
sns.set_palette("colorblind")  # Improve accessibility with colorblind-friendly palette

# Configure plot defaults for better readability
plt.rcParams.update({
    'figure.figsize': (10, 6),
    'font.size': 12,
    'axes.labelsize': 12,
    'axes.titlesize': 14,
    'xtick.labelsize': 10,
    'ytick.labelsize': 10
})

In [None]:
# Setup Bonsai module paths
if not is_jupyterlite():
    # In local environment, add the utils directory to system path
    utils_dir = os.getenv('PROJECT_UTILS_DIR', os.path.join(os.path.dirname(DATA_DIR), 'utils'))
    bonsaitree_dir = os.path.join(utils_dir, 'bonsaitree')
    
    # Add to path if it exists and isn't already there
    if os.path.exists(bonsaitree_dir) and bonsaitree_dir not in sys.path:
        sys.path.append(bonsaitree_dir)
        print(f"Added {bonsaitree_dir} to sys.path")
else:
    # In JupyterLite, use a simplified approach
    print("⚠️ Running in JupyterLite: Some Bonsai functionality may be limited.")
    print("This notebook is primarily designed for local execution where the Bonsai codebase is available.")

In [None]:
# Helper functions for exploring modules
def display_module_classes(module_name):
    """Display classes and their docstrings from a module"""
    try:
        # Import the module
        module = importlib.import_module(module_name)
        
        # Find all classes
        classes = inspect.getmembers(module, inspect.isclass)
        
        # Filter classes defined in this module (not imported)
        classes = [(name, cls) for name, cls in classes if cls.__module__ == module_name]
        
        if not classes:
            print(f"No classes found in module {module_name}")
            return
            
        # Print info for each class
        for name, cls in classes:
            display(Markdown(f"### Class: {name}"))
            
            # Get docstring
            doc = inspect.getdoc(cls)
            if doc:
                display(Markdown(f"**Documentation:**\n{doc}"))
            else:
                display(Markdown("*No documentation available*"))
            
            # Get methods
            methods = inspect.getmembers(cls, inspect.isfunction)
            public_methods = [(method_name, method) for method_name, method in methods 
                             if not method_name.startswith('_')]
            
            if public_methods:
                display(Markdown("**Public Methods:**"))
                for method_name, method in public_methods:
                    sig = inspect.signature(method)
                    display(Markdown(f"- `{method_name}{sig}`"))
            else:
                display(Markdown("*No public methods*"))
            
            display(Markdown("---"))
    except ImportError as e:
        print(f"Error importing module {module_name}: {e}")
    except Exception as e:
        print(f"Error processing module {module_name}: {e}")

def display_module_functions(module_name):
    """Display functions and their docstrings from a module"""
    try:
        # Import the module
        module = importlib.import_module(module_name)
        
        # Find all functions
        functions = inspect.getmembers(module, inspect.isfunction)
        
        # Filter functions defined in this module (not imported)
        functions = [(name, func) for name, func in functions if func.__module__ == module_name]
        
        if not functions:
            print(f"No functions found in module {module_name}")
            return
            
        # Filter public functions
        public_functions = [(name, func) for name, func in functions if not name.startswith('_')]
        
        if not public_functions:
            print(f"No public functions found in module {module_name}")
            return
            
        # Print info for each function
        for name, func in public_functions:                
            display(Markdown(f"### Function: {name}"))
            
            # Get signature
            sig = inspect.signature(func)
            display(Markdown(f"**Signature:** `{name}{sig}`"))
            
            # Get docstring
            doc = inspect.getdoc(func)
            if doc:
                display(Markdown(f"**Documentation:**\n{doc}"))
            else:
                display(Markdown("*No documentation available*"))
                
            display(Markdown("---"))
    except ImportError as e:
        print(f"Error importing module {module_name}: {e}")
    except Exception as e:
        print(f"Error processing module {module_name}: {e}")

def view_function_source(module_name, function_name):
    """Display the source code of a function"""
    try:
        # Import the module
        module = importlib.import_module(module_name)
        
        # Get the function
        func = getattr(module, function_name)
        
        # Get the source code
        source = inspect.getsource(func)
        
        # Print the source code with syntax highlighting
        display(Markdown(f"### Source code for `{function_name}`\n```python\n{source}\n```"))
    except ImportError as e:
        print(f"Error importing module {module_name}: {e}")
    except AttributeError:
        print(f"Function {function_name} not found in module {module_name}")
    except Exception as e:
        print(f"Error processing function {function_name}: {e}")

def view_class_source(module_name, class_name):
    """Display the source code of a class"""
    try:
        # Import the module
        module = importlib.import_module(module_name)
        
        # Get the class
        cls = getattr(module, class_name)
        
        # Get the source code
        source = inspect.getsource(cls)
        
        # Print the source code with syntax highlighting
        display(Markdown(f"### Source code for class `{class_name}`\n```python\n{source}\n```"))
    except ImportError as e:
        print(f"Error importing module {module_name}: {e}")
    except AttributeError:
        print(f"Class {class_name} not found in module {module_name}")
    except Exception as e:
        print(f"Error processing class {class_name}: {e}")

def explore_module(module_name):
    """Display a comprehensive overview of a module with classes and functions"""
    try:
        # Import the module
        module = importlib.import_module(module_name)
        
        # Module docstring
        doc = inspect.getdoc(module)
        display(Markdown(f"# Module: {module_name}"))
        
        if doc:
            display(Markdown(f"**Module Documentation:**\n{doc}"))
        else:
            display(Markdown("*No module documentation available*"))
            
        display(Markdown("---"))
        
        # Display classes
        display(Markdown("## Classes"))
        display_module_classes(module_name)
        
        # Display functions
        display(Markdown("## Functions"))
        display_module_functions(module_name)
        
    except ImportError as e:
        print(f"Error importing module {module_name}: {e}")
    except Exception as e:
        print(f"Error exploring module {module_name}: {e}")

## Check Bonsai Installation

Let's verify that the Bonsai v3 module is available for import:

In [None]:
try:
    from bonsaitree import v3
    print("✅ Successfully imported Bonsai v3 module")
    
    # Print Bonsai version information if available
    if hasattr(v3, "__version__"):
        print(f"Bonsai v3 version: {v3.__version__}")
    
    # List key submodules
    print("\nAvailable Bonsai submodules:")
    for module_name in dir(v3):
        if not module_name.startswith("_"):
            print(f"- {module_name}")
except ImportError as e:
    print(f"❌ Failed to import Bonsai v3 module: {e}")
    print("This lab requires access to the Bonsai v3 codebase.")
    print("Make sure you've properly set up your environment with the Bonsai repository.")

## Introduction

In genetic genealogy, we often have prior knowledge or assumptions about relationships before examining the genetic evidence. For example, we might know that parent-child relationships are more common than second-cousin relationships in our dataset, or that certain age differences make some relationships more likely than others. Bayesian inference provides a principled framework for incorporating this prior knowledge into relationship inference.

The Bonsai v3 framework incorporates prior probability models through the `prior.py` module, which allows us to specify and utilize different types of prior information to enhance relationship inference accuracy. By applying appropriate priors, we can improve the reconstruction of pedigrees, especially in cases with limited or ambiguous genetic evidence.

**Key concepts we'll cover:**
- The Bayesian framework for relationship inference in Bonsai v3
- Population genetic models for generating appropriate priors
- Implementation of custom prior probability models
- Integration of prior models with genetic evidence

## Part 1: The Role of Prior Probabilities in Bonsai v3

### Theory and Background

Bonsai v3 uses a Bayesian inference framework to deduce the most likely relationships between individuals based on their shared IBD (Identity by Descent) segments. In Bayesian inference, we aim to calculate the posterior probability of a relationship $R$ given the observed genetic data $D$:

$$P(R|D) = \frac{P(D|R) \times P(R)}{P(D)}$$

Where:
- $P(R|D)$ is the posterior probability of relationship $R$ given the genetic data $D$
- $P(D|R)$ is the likelihood of observing the genetic data $D$ if the relationship is $R$
- $P(R)$ is the prior probability of relationship $R$ before seeing any genetic data
- $P(D)$ is the marginal likelihood of the data (a normalization constant)

The prior probability $P(R)$ represents our beliefs about the relative frequencies of different relationships before examining the genetic evidence. Well-designed priors can significantly improve relationship inference, especially when the genetic evidence is limited or ambiguous.
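
To make the formula concrete, here is a minimal sketch of the posterior calculation over three candidate relationships. The likelihood and prior values are made-up illustrative numbers, not Bonsai output, and `posterior` is a hypothetical helper name rather than a Bonsai function.

```python
def posterior(likelihoods, priors):
    """Apply Bayes' rule over a discrete set of candidate relationships.

    Both arguments map relationship name -> probability. The values used
    below are illustrative assumptions, not output from Bonsai.
    """
    # Numerator of Bayes' rule: P(D|R) * P(R) for each relationship R
    joint = {rel: likelihoods[rel] * priors[rel] for rel in likelihoods}
    # Denominator P(D): sum of the joint terms over all candidates
    evidence = sum(joint.values())
    return {rel: value / evidence for rel, value in joint.items()}


# Hypothetical likelihoods P(D|R) for one pair's IBD data
likelihoods = {"half siblings": 0.30, "grandparent": 0.30, "first cousins": 0.05}
# Hypothetical priors P(R)
priors = {"half siblings": 0.2, "grandparent": 0.5, "first cousins": 0.3}

post = posterior(likelihoods, priors)
for rel, p in post.items():
    print(f"{rel:>15}: {p:.3f}")
```

Because the two leading likelihoods are equal in this example, the prior alone decides between half siblings and grandparent.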

In Bonsai v3, the prior probability model fulfills several key roles:

1. **Relationship Priors**: Specifying the expected frequencies of different relationship types in the population
2. **Age-Based Priors**: Incorporating constraints based on age differences between individuals
3. **Population-Specific Priors**: Adapting priors to reflect the demographic characteristics of specific populations
4. **External Knowledge Integration**: Incorporating non-genetic information into the inference process

The `prior.py` module in Bonsai v3 provides functions to calculate these prior probabilities, which are then combined with genetic likelihood models to determine the most probable relationships.

### Implementation in Bonsai v3

Let's examine the structure and key functions of the `prior.py` module in Bonsai v3. This module contains functions to calculate prior probabilities for different aspects of pedigree reconstruction.

In [None]:
# Import prior module
try:
    from bonsaitree.v3 import prior
    print("✅ Successfully imported the prior module")
except ImportError as e:
    print(f"❌ Failed to import prior module: {e}")
    print("Will proceed with theoretical discussion.")

In [None]:
# Explore the functions in the prior module
try:
    display_module_functions("bonsaitree.v3.prior")
except Exception as e:
    print(f"Error displaying prior module functions: {e}")

The `prior.py` module provides several key functions for calculating prior probabilities related to IBD sharing and ancestral relationships. Let's examine some of the most important functions in more detail.

In [None]:
# View the source code of a key function: get_prior_g
try:
    view_function_source("bonsaitree.v3.prior", "get_prior_g")
except Exception as e:
    print(f"Error viewing function source: {e}")

### Main Components of Bonsai's Prior Model

The prior model in Bonsai v3 consists of several interconnected components:

1. **Population Size Model**: Functions like `exp_num_female_common_ancs`, `exp_num_total_common_ancs`, and `exp_num_shared_common_ancs` model how many common ancestors two individuals are expected to have based on population size and number of generations.

2. **IBD Transmission Model**: Functions like `prob_share_any_ibd_seg` and `prob_share_no_ibd_seg` calculate the probability of IBD segment sharing between individuals separated by a given number of meioses.

3. **Generation Prior**: The function `get_prior_g` calculates the prior probability that a common ancestor lived g generations in the past, based on population demographics and IBD detection parameters.

These components work together to create a comprehensive prior probability framework that informs the pedigree reconstruction process.
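
To build intuition for the IBD transmission component, the sketch below uses a widely used approximation: the expected number of shared segments falls by half with each additional meiosis, and segment lengths are roughly exponential with mean 100/m cM. This is an assumption-laden illustration, not the code in `prior.py`; the function names, the 35-Morgan genome length, and the Poisson step are all assumptions made for this example.

```python
import math

GENOME_LENGTH_MORGANS = 35.0  # assumed autosomal genetic map length
NUM_CHROMOSOMES = 22          # autosomes only


def expected_detectable_segments(m, a=1, min_seg_len=7.0):
    """Expected number of IBD segments longer than min_seg_len (cM).

    m: number of meioses separating the two individuals
    a: number of common ancestors (1 or 2)
    """
    # Expected number of shared segments of any length
    n_segments = a * (GENOME_LENGTH_MORGANS * m + NUM_CHROMOSOMES) / 2 ** (m - 1)
    # Fraction of segments exceeding the threshold (exponential lengths,
    # mean 100/m cM)
    frac_long_enough = math.exp(-m * min_seg_len / 100.0)
    return n_segments * frac_long_enough


def approx_prob_share_any(m, a=1, min_seg_len=7.0):
    """Poisson approximation to P(at least one detectable segment)."""
    return 1.0 - math.exp(-expected_detectable_segments(m, a, min_seg_len))


for m in (2, 6, 10, 14):
    print(f"m={m:2d}: P(any segment >= 7 cM) ~ {approx_prob_share_any(m):.3f}")
```

Under these assumptions the sharing probability stays near 1 for close relatives and drops quickly once the path exceeds roughly ten meioses, which is the qualitative behaviour to look for when plotting the real `prob_share_any_ibd_seg` in Exercise 1.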

### Exercise 1: Understanding the Common Ancestor Prior

In this exercise, we'll explore how the population size affects the expected number of common ancestors and the resulting prior probabilities.

**Task:** Complete the function below to calculate and visualize how the expected number of common ancestors varies with population size and generation depth.

**Hint:** Use the `exp_num_shared_common_ancs` function from the prior module to perform the calculation.

In [None]:
# Exercise 1: Calculate and visualize the expected number of common ancestors
def visualize_common_ancestors():
    """Visualize how the expected number of common ancestors varies with population size and generation depth."""
    # TODO: Implement the visualization
    
    # Create a range of population sizes to test
    population_sizes = [1000, 5000, 10000, 50000, 100000]
    
    # Create a range of generations to consider
    generations = list(range(1, 11))  # 1 to 10 generations
    
    # Create a figure with subplots
    fig, axs = plt.subplots(1, 2, figsize=(14, 6))
    
    # Plot 1: Expected number of common ancestors vs. generations for different population sizes
    for N in population_sizes:
        # Calculate the expected number of common ancestors for each generation
        expected_common_ancs = [prior.exp_num_shared_common_ancs(N, g) for g in generations]
        
        # Plot the results
        axs[0].plot(generations, expected_common_ancs, marker='o', label=f'N = {N:,}')
    
    axs[0].set_title('Expected Number of Shared Common Ancestors')
    axs[0].set_xlabel('Generations in the Past')
    axs[0].set_ylabel('Expected Number of Common Ancestors')
    axs[0].set_yscale('log')
    axs[0].legend()
    axs[0].grid(True, alpha=0.3)
    
    # Plot 2: Probability of sharing any IBD segment vs. meioses
    meioses = list(range(2, 21, 2))  # 2 to 20 meioses in steps of 2
    
    # Calculate probability of sharing any IBD segment for different min_seg_len values
    min_seg_lengths = [1, 5, 7, 10, 15]
    
    for min_seg_len in min_seg_lengths:
        # Calculate probability for each number of meioses
        prob_share = [prior.prob_share_any_ibd_seg(m, a=1, min_seg_len=min_seg_len) for m in meioses]
        
        # Plot the results
        axs[1].plot(meioses, prob_share, marker='o', label=f'min_seg_len = {min_seg_len} cM')
    
    axs[1].set_title('Probability of Sharing Any IBD Segment')
    axs[1].set_xlabel('Number of Meioses')
    axs[1].set_ylabel('Probability')
    axs[1].legend()
    axs[1].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    return fig

# Run the visualization
try:
    fig = visualize_common_ancestors()
except Exception as e:
    print(f"Error in visualization: {e}")

### The Role of Population Size in Prior Models

The two plots show different effects. Population size controls the expected number of shared ancestors: in a smaller population, two randomly selected individuals are much more likely to share a common ancestor in the recent past than in a larger one. The probability of sharing a detectable IBD segment, as plotted here, depends on the number of meioses and the minimum segment length rather than on population size. Together these effects have important implications for genealogical research in different demographic contexts:

1. **Small, Isolated Populations**: Higher rates of shared ancestry and IBD segment sharing are expected, requiring adjusted priors to avoid overestimating close relationships.

2. **Large, Diverse Populations**: Lower rates of background IBD sharing, allowing more confident inference of close relationships based on observed segments.

3. **Historically Varying Populations**: Many human populations have experienced significant size changes over time (bottlenecks, expansions), requiring more complex prior models that account for these demographic events.

The functions in Bonsai's `prior.py` module allow us to compute appropriate priors for different population scenarios, enhancing the accuracy of relationship inference.
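
As a rough check on the population-size effect, the following sketch assumes a constant-size, randomly mating population in which each ancestral slot is filled uniformly at random. It is a deliberately crude approximation for intuition only, not the formula used in `prior.py`, and `approx_expected_shared_ancestors` is a hypothetical name.

```python
def approx_expected_shared_ancestors(N, g):
    """Crude expected number of shared ancestors g generations back.

    Assumes each of a person's 2**g ancestral slots is filled uniformly at
    random from a population of N individuals. Only meaningful while
    4**g is much smaller than N.
    """
    slots = 2 ** g
    # Each (slot of person 1, slot of person 2) pair coincides with prob 1/N
    return slots * slots / N


for N in (1_000, 10_000, 100_000):
    row = ", ".join(
        f"g={g}: {approx_expected_shared_ancestors(N, g):.4f}" for g in (1, 3, 5)
    )
    print(f"N={N:>7,}  {row}")
```

In this toy model a tenfold smaller population gives a tenfold larger expected number of shared ancestors at every generation depth.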

In [None]:
# Demonstration: Calculating generational priors
try:
    # Calculate prior probabilities for common ancestors at different generations
    N = 10000  # Population size
    a = 1      # Number of common ancestors
    min_seg_len = 7  # Minimum observable segment length in cM
    g_range = np.arange(1, 11)  # Generations 1 to 10
    
    # Get the prior probability mass function
    prior_probabilities = prior.get_prior_g(
        g_range=g_range,
        N=N,
        a=a,
        min_seg_len=min_seg_len
    )
    
    # Visualize the prior probability distribution
    plt.figure(figsize=(10, 6))
    plt.bar(g_range, prior_probabilities)
    plt.xlabel('Generations in the Past')
    plt.ylabel('Prior Probability')
    plt.title(f'Prior Probability of Common Ancestor Generation\n(N={N}, min_seg_len={min_seg_len} cM)')
    plt.xticks(g_range)
    plt.grid(True, alpha=0.3)
    plt.show()
    
    # Print numerical values
    print("Generation | Prior Probability")
    print("-" * 30)
    for g, p in zip(g_range, prior_probabilities):
        print(f"{g:^10} | {p:.4f}")
    
except Exception as e:
    print(f"Error calculating generational priors: {e}")

## Part 2: Relationship Prior Probability Models

### Theory and Background

Relationship prior probabilities represent our expectations about the relative frequencies of different relationship types in a population before considering any genetic evidence. In genetic genealogy, relationships are often parameterized using a tuple representation (up, down, num_ancs) where:

- `up`: Number of generations from the first individual up to the common ancestor
- `down`: Number of generations from the common ancestor down to the second individual
- `num_ancs`: Number of common ancestors (typically 1 or 2)

For example:
- Parent-child: (1, 0, 1) or (0, 1, 1)
- Full siblings: (1, 1, 2)
- Half siblings: (1, 1, 1)
- First cousins: (2, 2, 2)
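
The tuple encoding maps directly onto two familiar quantities: the number of meioses between the pair and the coefficient of relatedness. The helpers below are a small illustrative sketch (hypothetical names, not Bonsai functions) that assumes no inbreeding.

```python
def total_meioses(up, down):
    """Number of meioses on the path between the two individuals."""
    return up + down


def relatedness(up, down, num_ancs):
    """Coefficient of relatedness, assuming no inbreeding.

    Each common ancestor contributes (1/2) ** (up + down).
    """
    return num_ancs * 0.5 ** total_meioses(up, down)


examples = {
    "Parent-child": (1, 0, 1),
    "Full siblings": (1, 1, 2),
    "Half siblings": (1, 1, 1),
    "First cousins": (2, 2, 2),
}
for name, (up, down, num_ancs) in examples.items():
    print(f"{name:<14} meioses={total_meioses(up, down)} "
          f"relatedness={relatedness(up, down, num_ancs):.4f}")
```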

The relationship prior model assigns probabilities to these different relationship types, typically based on:

1. **Demographic Patterns**: Age structure, marriage patterns, and fertility rates in the population
2. **Genealogical Structure**: Typical family structures in the culture or time period
3. **Study Design**: The sampling approach used to collect genetic data

In Bonsai v3, relationship priors can be customized to fit different population contexts, improving the accuracy of pedigree reconstruction for specific applications.
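
One simple way to customize a prior, sketched here under stated assumptions, is to scale selected relationship types and renormalize so the result still sums to 1. `reweight_prior` is a hypothetical helper written for this lab, not part of Bonsai, and the multiplier value is illustrative.

```python
def reweight_prior(prior_probs, multipliers):
    """Scale selected relationship priors and renormalize to sum to 1.

    prior_probs: dict mapping (up, down, num_ancs) -> probability
    multipliers: dict mapping (up, down, num_ancs) -> positive scale factor
    """
    scaled = {rel: p * multipliers.get(rel, 1.0) for rel, p in prior_probs.items()}
    total = sum(scaled.values())
    if total <= 0:
        raise ValueError("Reweighted prior has no probability mass")
    return {rel: p / total for rel, p in scaled.items()}


# Illustrative flat prior over three relationship tuples
flat = {(1, 0, 1): 1 / 3, (1, 1, 2): 1 / 3, (2, 2, 2): 1 / 3}
# Hypothetical adjustment: first cousins judged three times as common
adjusted = reweight_prior(flat, {(2, 2, 2): 3.0})
print(adjusted)
```

Keeping the scaling and the renormalization in one function avoids the easy mistake of adjusting weights and forgetting to renormalize afterwards.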

### Implementation in Bonsai v3

Bonsai v3 derives relationship priors through a series of calculations that convert demographic models into prior probabilities. The `prior.py` module does not itself define a distribution over relationship types; instead, it provides the foundation for calculating the likelihood of ancestral connections, which informs the relationship prior.

Let's create a custom relationship prior model based on Bonsai's prior probability framework:

In [None]:
# Implementation of a custom relationship prior model
def create_relationship_prior(population_size=10000, min_seg_len=7):
    """Create a relationship prior probability distribution based on population size and IBD detection parameters.
    
    Args:
        population_size: Size of the population
        min_seg_len: Minimum observable segment length in cM
        
    Returns:
        Dictionary mapping relationship tuples to prior probabilities
    """
    # Define the relationship tuples we want to calculate priors for
    # (up, down, num_ancs) format
    relationships = [
        (0, 1, 1),  # Parent-child (parent to child)
        (1, 0, 1),  # Parent-child (child to parent)
        (1, 1, 2),  # Full siblings
        (1, 1, 1),  # Half siblings
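        (1, 2, 1),  # Aunt/Uncle-Niece/Nephew (num_ancs=1 is the half relationship; full would be 2)
        (2, 1, 1),  # Niece/Nephew-Aunt/Uncle (num_ancs=1 is the half relationship; full would be 2)
        (2, 2, 2),  # First cousins
        (2, 3, 1),  # First cousin once removed (up); num_ancs=1 is the half relationship
        (3, 2, 1),  # First cousin once removed (down); num_ancs=1 is the half relationship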
        (3, 3, 2),  # Second cousins
    ]
    
    # Initialize the prior dictionary
    relationship_prior = {}
    
    # Calculate generational priors using prior.py functions
    g_range = np.arange(1, 6)  # Consider generations 1 to 5
    gen_priors = prior.get_prior_g(
        g_range=g_range,
        N=population_size,
        a=1,
        min_seg_len=min_seg_len
    )
    
    # Calculate probability of sharing at least one IBD segment for different meioses distances
    meioses_probs = {}
    for up, down, num_ancs in relationships:
        total_meioses = up + down
        share_prob = prior.prob_share_any_ibd_seg(
            m_lst=total_meioses,
            a=num_ancs,
            min_seg_len=min_seg_len
        )
        meioses_probs[(up, down, num_ancs)] = share_prob
    
    # Normalize the IBD sharing probabilities to create relationship priors.
    # Note: this simplified example weights by IBD sharing probability only;
    # gen_priors (computed above) is not folded into the weights.
    total_weight = sum(meioses_probs.values())
    
    for rel_tuple, prob in meioses_probs.items():
        relationship_prior[rel_tuple] = prob / total_weight
    
    # Apply demographic adjustments (simplified example)
    # Increase probability of parent-child and full sibling relationships
    for rel_tuple in [(0, 1, 1), (1, 0, 1), (1, 1, 2)]:
        if rel_tuple in relationship_prior:
            relationship_prior[rel_tuple] *= 2.0
    
    # Renormalize
    total = sum(relationship_prior.values())
    for rel_tuple in relationship_prior:
        relationship_prior[rel_tuple] /= total
    
    return relationship_prior

# Create and visualize a relationship prior for a standard population
try:
    # Create the prior
    standard_prior = create_relationship_prior(population_size=10000, min_seg_len=7)
    
    # Define a mapping of relationship tuples to human-readable names
    relationship_names = {
        (0, 1, 1): "Parent→Child",
        (1, 0, 1): "Child→Parent",
        (1, 1, 2): "Full Siblings",
        (1, 1, 1): "Half Siblings",
        (1, 2, 1): "Aunt/Uncle",
        (2, 1, 1): "Niece/Nephew",
        (2, 2, 2): "First Cousins",
        (2, 3, 1): "1C1R (up)",
        (3, 2, 1): "1C1R (down)",
        (3, 3, 2): "Second Cousins",
    }
    
    # Convert to a format suitable for visualization
    relationships = []
    probabilities = []
    for rel_tuple, prob in sorted(standard_prior.items(), key=lambda x: -x[1]):
        relationships.append(relationship_names.get(rel_tuple, str(rel_tuple)))
        probabilities.append(prob)
    
    # Create a bar chart
    plt.figure(figsize=(12, 6))
    bars = plt.bar(relationships, probabilities)
    
    # Add probability values on top of bars
    for bar in bars:
        height = bar.get_height()
        plt.text(bar.get_x() + bar.get_width()/2., height + 0.005,
                f'{height:.3f}',
                ha='center', va='bottom', rotation=0)
    
    plt.xlabel('Relationship')
    plt.ylabel('Prior Probability')
    plt.title('Relationship Prior Probability Distribution (Standard Population)')
    plt.xticks(rotation=45, ha='right')
    plt.grid(axis='y', alpha=0.3)
    plt.tight_layout()
    plt.show()
    
except Exception as e:
    print(f"Error creating relationship prior: {e}")

### Exercise 2: Population-Specific Prior Models

Different populations have different demographic characteristics that affect the expected distribution of relationships. In this exercise, you'll create custom relationship prior models for two different population scenarios.

**Task:** Complete the function to create and compare relationship prior models for an endogamous population (small, isolated community) and a large, diverse population.

**Hint:** For an endogamous population, use a smaller population size and adjust the demographic modifiers to increase the probability of distant relationships.

In [None]:
# Exercise 2: Create population-specific prior models
def compare_population_priors():
    """Create and compare relationship prior models for different population types."""
    # TODO: Implement the comparison
    
    # Define population parameters
    populations = {
        "Large, Diverse": {"size": 50000, "min_seg_len": 7},
        "Standard": {"size": 10000, "min_seg_len": 7},
        "Endogamous": {"size": 1000, "min_seg_len": 7}
    }
    
    # Create prior models for each population
    priors = {}
    for pop_name, params in populations.items():
        # Create basic prior
        prior_model = create_relationship_prior(
            population_size=params["size"],
            min_seg_len=params["min_seg_len"]
        )
        
        # Apply population-specific adjustments
        if pop_name == "Endogamous":
            # Increase probability of distant relationships
            for rel_tuple in [(2, 2, 2), (2, 3, 1), (3, 2, 1), (3, 3, 2)]:
                if rel_tuple in prior_model:
                    prior_model[rel_tuple] *= 3.0
        elif pop_name == "Large, Diverse":
            # Decrease probability of distant relationships
            for rel_tuple in [(2, 2, 2), (2, 3, 1), (3, 2, 1), (3, 3, 2)]:
                if rel_tuple in prior_model:
                    prior_model[rel_tuple] *= 0.5
        
        # Renormalize
        total = sum(prior_model.values())
        for rel_tuple in prior_model:
            prior_model[rel_tuple] /= total
        
        priors[pop_name] = prior_model
    
    # Define relationship names for visualization
    relationship_names = {
        (0, 1, 1): "Parent→Child",
        (1, 0, 1): "Child→Parent",
        (1, 1, 2): "Full Siblings",
        (1, 1, 1): "Half Siblings",
        (1, 2, 1): "Aunt/Uncle",
        (2, 1, 1): "Niece/Nephew",
        (2, 2, 2): "First Cousins",
        (2, 3, 1): "1C1R (up)",
        (3, 2, 1): "1C1R (down)",
        (3, 3, 2): "Second Cousins",
    }
    
    # Get the set of all relationships across all priors
    all_relationships = set()
    for prior_model in priors.values():
        all_relationships.update(prior_model.keys())
    
    # Sort relationships by a typical order
    relationship_order = [(0, 1, 1), (1, 0, 1), (1, 1, 2), (1, 1, 1), 
                           (1, 2, 1), (2, 1, 1), (2, 2, 2), 
                           (2, 3, 1), (3, 2, 1), (3, 3, 2)]
    
    sorted_relationships = sorted(
        all_relationships, 
        key=lambda rel: relationship_order.index(rel) if rel in relationship_order else 999
    )
    
    # Create a dataframe for comparison (pandas is imported as pd at the top of the notebook)
    df = pd.DataFrame(index=[relationship_names.get(rel, str(rel)) for rel in sorted_relationships])
    
    for pop_name, prior_model in priors.items():
        df[pop_name] = [prior_model.get(rel, 0) for rel in sorted_relationships]
    
    # Display the dataframe
    display(df.style.format("{:.4f}").background_gradient(cmap='Blues'))
    
    # Create a grouped bar chart
    df_plot = df.copy()
    ax = df_plot.plot(kind='bar', figsize=(14, 7))
    ax.set_ylabel('Prior Probability')
    ax.set_title('Relationship Prior Probability Distributions by Population Type')
    ax.grid(axis='y', alpha=0.3)
    ax.legend(title='Population Type')
    plt.xticks(rotation=45, ha='right')
    plt.tight_layout()
    plt.show()
    
    return df

# Run the comparison
try:
    df_priors = compare_population_priors()
except Exception as e:
    print(f"Error comparing population priors: {e}")

### Population Demographics and Prior Models

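As demonstrated in the comparison, the population-specific adjustments significantly change the relationship prior probabilities. In this simplified model the differences come from the hand-set multipliers in `compare_population_priors`, because the weights returned by `create_relationship_prior` are built from IBD sharing probabilities that do not take population size as an input: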

1. **Large, Diverse Populations**:
   - Higher probability of close relationships like parent-child and siblings
   - Lower probability of distant relationships like cousins
   - Reflect the lower background IBD sharing in large populations

2. **Endogamous Populations**:
   - Relatively lower probability of immediate family relationships
   - Higher probability of cousin and distant relationships
   - Account for the elevated background IBD sharing due to population structure

Tailoring the prior model to the specific population context can significantly improve the accuracy of relationship inference, especially in populations with unusual demographic histories or marriage patterns.
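
To see why this matters for inference, the sketch below applies two different priors to the same ambiguous likelihoods and reports the maximum a posteriori (MAP) relationship under each. All numbers are hypothetical and chosen only to illustrate the effect; `map_relationship` is not a Bonsai function.

```python
def map_relationship(likelihoods, priors):
    """Return the maximum a posteriori relationship and all posteriors."""
    joint = {rel: likelihoods[rel] * priors[rel] for rel in likelihoods}
    evidence = sum(joint.values())
    post = {rel: v / evidence for rel, v in joint.items()}
    return max(post, key=post.get), post


# Hypothetical likelihoods for an ambiguous pair: the IBD data slightly
# favor first cousins over second cousins
likelihoods = {"first cousins": 0.55, "second cousins": 0.45}

# Hypothetical priors for two population types
priors_by_population = {
    "Large, Diverse": {"first cousins": 0.5, "second cousins": 0.5},
    "Endogamous": {"first cousins": 0.3, "second cousins": 0.7},
}

results = {}
for pop_name, priors in priors_by_population.items():
    best, post = map_relationship(likelihoods, priors)
    results[pop_name] = best
    print(f"{pop_name:<15} MAP = {best} (posterior {post[best]:.3f})")
```

With identical genetic evidence, the inferred relationship changes once the prior shifts enough weight toward the more distant relationship, which is the situation where choosing a population-appropriate prior has the most influence.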

## Summary

In this lab, we explored the role of prior probability models in Bonsai v3's pedigree reconstruction framework. We've learned how to:

1. Understand the functions in the `prior.py` module that calculate ancestral sharing probabilities
2. Visualize how population size affects the expected number of common ancestors
3. Create custom relationship prior models for different population scenarios
4. Recognize how population demographics shape appropriate prior probability distributions

These concepts are fundamental to improving the accuracy of relationship inference in genetic genealogy, especially when dealing with limited or ambiguous genetic evidence.

### Connections to Other Labs

The concepts covered in this lab connect to:
- **Lab 6: Probabilistic Relationship Inference**: The prior models developed here complement the likelihood models discussed in that lab.
- **Lab 7: PwLogLike Class**: The `PwLogLike` class incorporates prior probabilities when calculating relationship likelihoods.
- **Lab 25: Real-World Datasets**: When working with real-world data, appropriate prior models are essential for accurate relationship inference.

### Further Reading

To deepen your understanding of these topics, consider exploring:

- Browning, S. R., & Browning, B. L. (2012). Identity by descent between distant relatives: detection and applications. Annual Review of Genetics, 46, 617-633.
- Wakeley, J. (2008). Coalescent theory: an introduction. Roberts & Company Publishers.
- Ralph, P., & Coop, G. (2013). The geography of recent genetic ancestry across Europe. PLoS Biology, 11(5), e1001555.

---

## Answer Key (for instructors)

### Exercise 1
```python
def visualize_common_ancestors():
    """Visualize how the expected number of common ancestors varies with population size and generation depth."""
    # Create a range of population sizes to test
    population_sizes = [1000, 5000, 10000, 50000, 100000]
    
    # Create a range of generations to consider
    generations = list(range(1, 11))  # 1 to 10 generations
    
    # Create a figure with subplots
    fig, axs = plt.subplots(1, 2, figsize=(14, 6))
    
    # Plot 1: Expected number of common ancestors vs. generations for different population sizes
    for N in population_sizes:
        # Calculate the expected number of common ancestors for each generation
        expected_common_ancs = [prior.exp_num_shared_common_ancs(N, g) for g in generations]
        
        # Plot the results
        axs[0].plot(generations, expected_common_ancs, marker='o', label=f'N = {N:,}')
    
    axs[0].set_title('Expected Number of Shared Common Ancestors')
    axs[0].set_xlabel('Generations in the Past')
    axs[0].set_ylabel('Expected Number of Common Ancestors')
    axs[0].set_yscale('log')
    axs[0].legend()
    axs[0].grid(True, alpha=0.3)
    
    # Plot 2: Probability of sharing any IBD segment vs. meioses
    meioses = list(range(2, 21, 2))  # 2 to 20 meioses in steps of 2
    
    # Calculate probability of sharing any IBD segment for different min_seg_len values
    min_seg_lengths = [1, 5, 7, 10, 15]
    
    for min_seg_len in min_seg_lengths:
        # Calculate probability for each number of meioses
        prob_share = [prior.prob_share_any_ibd_seg(m, a=1, min_seg_len=min_seg_len) for m in meioses]
        
        # Plot the results
        axs[1].plot(meioses, prob_share, marker='o', label=f'min_seg_len = {min_seg_len} cM')
    
    axs[1].set_title('Probability of Sharing Any IBD Segment')
    axs[1].set_xlabel('Number of Meioses')
    axs[1].set_ylabel('Probability')
    axs[1].legend()
    axs[1].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    return fig
```

### Exercise 2
```python
def compare_population_priors():
    """Create and compare relationship prior models for different population types."""
    # Define population parameters
    populations = {
        "Large, Diverse": {"size": 50000, "min_seg_len": 7},
        "Standard": {"size": 10000, "min_seg_len": 7},
        "Endogamous": {"size": 1000, "min_seg_len": 7}
    }
    
    # Create prior models for each population
    priors = {}
    for pop_name, params in populations.items():
        # Create basic prior
        prior_model = create_relationship_prior(
            population_size=params["size"],
            min_seg_len=params["min_seg_len"]
        )
        
        # Apply population-specific adjustments
        if pop_name == "Endogamous":
            # Increase probability of distant relationships
            for rel_tuple in [(2, 2, 2), (2, 3, 1), (3, 2, 1), (3, 3, 2)]:
                if rel_tuple in prior_model:
                    prior_model[rel_tuple] *= 3.0
        elif pop_name == "Large, Diverse":
            # Decrease probability of distant relationships
            for rel_tuple in [(2, 2, 2), (2, 3, 1), (3, 2, 1), (3, 3, 2)]:
                if rel_tuple in prior_model:
                    prior_model[rel_tuple] *= 0.5
        
        # Renormalize
        total = sum(prior_model.values())
        for rel_tuple in prior_model:
            prior_model[rel_tuple] /= total
        
        priors[pop_name] = prior_model
    
    # Rest of the visualization code...
```

### Self-Assessment Answers

1. How does the expected number of common ancestors change with population size?
   * Answer: For a fixed number of generations back, the expected number of common ancestors decreases as population size increases. In smaller populations, two individuals are more likely to share common ancestors in the recent past.

2. Why is it important to adjust prior models for different population demographics?
   * Answer: Different populations have different expected frequencies of relationship types. Using an inappropriate prior model can lead to systematic biases in relationship inference, especially in populations with unusual demographic histories or marriage patterns.

3. How does the minimum detectable segment length affect the relationship prior model?
   * Answer: A smaller minimum segment length allows detection of more distant relationships, which shifts the prior probability distribution toward more distant relationships. Conversely, a larger minimum segment length means fewer distant relationships can be detected, shifting the prior distribution toward closer relationships.
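
For instructors who want a quick numerical check of the first answer, the sketch below uses a simple back-of-the-envelope approximation. It assumes an idealized, randomly mating population of constant size `pop_size`, in which each person has `2**g` distinct ancestors `g` generations back and each of them coincides with one of the other person's ancestors with probability of about `2**g / pop_size`. This is not necessarily the formula used by `prior.exp_num_shared_common_ancs`, and the function name `approx_shared_ancestors` is hypothetical.

```python
def approx_shared_ancestors(pop_size, generations):
    """Approximate expected number of shared ancestors at a given depth.

    Assumes random mating, constant population size, and that 2**generations
    is small compared with pop_size (so ancestors rarely repeat).
    """
    n_ancestors = 2 ** generations
    if n_ancestors > pop_size:
        raise ValueError("approximation requires 2**generations <= pop_size")
    return n_ancestors * n_ancestors / pop_size


# Same generation depth, increasing population size
for n in [1_000, 10_000, 100_000]:
    print(f"N = {n:>7,}: {approx_shared_ancestors(n, generations=3):.5f}")
```

Under these assumptions the expectation scales as `4**g / N`, so a tenfold larger population gives a tenfold smaller expected number of shared ancestors at the same depth.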

In [None]:
# Optional: Convert this notebook to PDF
# Uncomment and run this cell if you want to generate a PDF version

# !jupyter nbconvert --to pdf "$(basename \"$PWD\").ipynb"