# Lab 30: Advanced Applications and Future Directions

## Overview

This final lab explores advanced applications of the Bonsai v3 system across various domains and examines emerging research directions in computational pedigree reconstruction. We'll investigate how Bonsai can be adapted for specialized use cases beyond standard genealogy and discuss potential future enhancements to the system.

**Learning Objectives:**
- Apply Bonsai v3 to specialized domains including founder populations, biomedical research, and historical research
- Adapt core algorithms for non-standard applications like conservation genetics and forensic investigation
- Explore emerging approaches like machine learning integration and multi-modal data fusion
- Design experimental approaches for population-scale reconstruction challenges
- Evaluate ethical considerations and privacy implications of advanced genetic genealogy applications

**Prerequisites:**
- Completion of Lab 20: Error Handling
- Completion of Lab 22: Interpreting Results
- Completion of Lab 29: End-to-End Implementation

**Estimated completion time:** 90-120 minutes

In [None]:
# Standard imports
import os
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import networkx as nx
from IPython.display import display, HTML, Markdown
import inspect
import importlib

sys.path.append(os.path.dirname(os.getcwd()))

# Cross-compatibility setup
from scripts_support.lab_cross_compatibility import setup_environment, is_jupyterlite, save_results, save_plot

# Set up environment-specific paths
DATA_DIR, RESULTS_DIR = setup_environment()

# Set visualization styles
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_context("notebook")
sns.set_palette("colorblind")  # Improve accessibility with colorblind-friendly palette

# Configure plot defaults for better readability
plt.rcParams.update({
    'figure.figsize': (10, 6),
    'font.size': 12,
    'axes.labelsize': 12,
    'axes.titlesize': 14,
    'xtick.labelsize': 10,
    'ytick.labelsize': 10
})

In [None]:
# Setup Bonsai module paths
if not is_jupyterlite():
    # In local environment, add the utils directory to system path
    utils_dir = os.getenv('PROJECT_UTILS_DIR', os.path.join(os.path.dirname(DATA_DIR), 'utils'))
    bonsaitree_dir = os.path.join(utils_dir, 'bonsaitree')
    
    # Add to path if it exists and isn't already there
    if os.path.exists(bonsaitree_dir) and bonsaitree_dir not in sys.path:
        sys.path.append(bonsaitree_dir)
        print(f"Added {bonsaitree_dir} to sys.path")
else:
    # In JupyterLite, use a simplified approach
    print("⚠️ Running in JupyterLite: Some Bonsai functionality may be limited.")
    print("This notebook is primarily designed for local execution where the Bonsai codebase is available.")

In [None]:
# Helper functions for exploring modules
def display_module_classes(module_name):
    """Display classes and their docstrings from a module"""
    try:
        # Import the module
        module = importlib.import_module(module_name)
        
        # Find all classes
        classes = inspect.getmembers(module, inspect.isclass)
        
        # Filter classes defined in this module (not imported)
        classes = [(name, cls) for name, cls in classes if cls.__module__ == module_name]
        
        if not classes:
            print(f"No classes found in module {module_name}")
            return
            
        # Print info for each class
        for name, cls in classes:
            display(Markdown(f"### Class: {name}"))
            
            # Get docstring
            doc = inspect.getdoc(cls)
            if doc:
                display(Markdown(f"**Documentation:**\n{doc}"))
            else:
                display(Markdown("*No documentation available*"))
            
            # Get methods
            methods = inspect.getmembers(cls, inspect.isfunction)
            public_methods = [(method_name, method) for method_name, method in methods 
                             if not method_name.startswith('_')]
            
            if public_methods:
                display(Markdown("**Public Methods:**"))
                for method_name, method in public_methods:
                    sig = inspect.signature(method)
                    display(Markdown(f"- `{method_name}{sig}`"))
            else:
                display(Markdown("*No public methods*"))
            
            display(Markdown("---"))
    except ImportError as e:
        print(f"Error importing module {module_name}: {e}")
    except Exception as e:
        print(f"Error processing module {module_name}: {e}")

def display_module_functions(module_name):
    """Display functions and their docstrings from a module"""
    try:
        # Import the module
        module = importlib.import_module(module_name)
        
        # Find all functions
        functions = inspect.getmembers(module, inspect.isfunction)
        
        # Filter functions defined in this module (not imported)
        functions = [(name, func) for name, func in functions if func.__module__ == module_name]
        
        if not functions:
            print(f"No functions found in module {module_name}")
            return
            
        # Filter public functions
        public_functions = [(name, func) for name, func in functions if not name.startswith('_')]
        
        if not public_functions:
            print(f"No public functions found in module {module_name}")
            return
            
        # Print info for each function
        for name, func in public_functions:                
            display(Markdown(f"### Function: {name}"))
            
            # Get signature
            sig = inspect.signature(func)
            display(Markdown(f"**Signature:** `{name}{sig}`"))
            
            # Get docstring
            doc = inspect.getdoc(func)
            if doc:
                display(Markdown(f"**Documentation:**\n{doc}"))
            else:
                display(Markdown("*No documentation available*"))
                
            display(Markdown("---"))
    except ImportError as e:
        print(f"Error importing module {module_name}: {e}")
    except Exception as e:
        print(f"Error processing module {module_name}: {e}")

def view_function_source(module_name, function_name):
    """Display the source code of a function"""
    try:
        # Import the module
        module = importlib.import_module(module_name)
        
        # Get the function
        func = getattr(module, function_name)
        
        # Get the source code
        source = inspect.getsource(func)
        
        # Print the source code with syntax highlighting
        display(Markdown(f"### Source code for `{function_name}`\n```python\n{source}\n```"))
    except ImportError as e:
        print(f"Error importing module {module_name}: {e}")
    except AttributeError:
        print(f"Function {function_name} not found in module {module_name}")
    except Exception as e:
        print(f"Error processing function {function_name}: {e}")

def view_class_source(module_name, class_name):
    """Display the source code of a class"""
    try:
        # Import the module
        module = importlib.import_module(module_name)
        
        # Get the class
        cls = getattr(module, class_name)
        
        # Get the source code
        source = inspect.getsource(cls)
        
        # Print the source code with syntax highlighting
        display(Markdown(f"### Source code for class `{class_name}`\n```python\n{source}\n```"))
    except ImportError as e:
        print(f"Error importing module {module_name}: {e}")
    except AttributeError:
        print(f"Class {class_name} not found in module {module_name}")
    except Exception as e:
        print(f"Error processing class {class_name}: {e}")

def explore_module(module_name):
    """Display a comprehensive overview of a module with classes and functions"""
    try:
        # Import the module
        module = importlib.import_module(module_name)
        
        # Module docstring
        doc = inspect.getdoc(module)
        display(Markdown(f"# Module: {module_name}"))
        
        if doc:
            display(Markdown(f"**Module Documentation:**\n{doc}"))
        else:
            display(Markdown("*No module documentation available*"))
            
        display(Markdown("---"))
        
        # Display classes
        display(Markdown("## Classes"))
        display_module_classes(module_name)
        
        # Display functions
        display(Markdown("## Functions"))
        display_module_functions(module_name)
        
    except ImportError as e:
        print(f"Error importing module {module_name}: {e}")
    except Exception as e:
        print(f"Error exploring module {module_name}: {e}")

## Check Bonsai Installation

Let's verify that the Bonsai v3 module is available for import:

In [None]:
try:
    from bonsaitree import v3
    print("✅ Successfully imported Bonsai v3 module")
    
    # Print Bonsai version information if available
    if hasattr(v3, "__version__"):
        print(f"Bonsai v3 version: {v3.__version__}")
    
    # List key submodules
    print("\nAvailable Bonsai submodules:")
    for module_name in dir(v3):
        if not module_name.startswith("_"):
            print(f"- {module_name}")
except ImportError as e:
    print(f"❌ Failed to import Bonsai v3 module: {e}")
    print("This lab requires access to the Bonsai v3 codebase.")
    print("Make sure you've properly set up your environment with the Bonsai repository.")

## Introduction

The Bonsai v3 system we've explored throughout this course is a powerful framework for pedigree reconstruction from genetic data. While our primary focus has been on standard genealogical applications, the core algorithms and approaches can be adapted and extended for a wide range of specialized domains.

In this final lab, we'll investigate several advanced applications of Bonsai v3 and explore promising research directions for the future of computational pedigree reconstruction. These applications span diverse fields including population genetics, biomedical research, conservation biology, historical studies, and forensic science.

We'll also examine how emerging technologies like machine learning, multi-modal data integration, and distributed computing could enhance pedigree reconstruction methods. Throughout the lab, we'll consider ethical implications and privacy challenges that arise in these advanced applications.

**Key concepts we'll cover:**
- Adapting Bonsai for specialized population scenarios
- Domain-specific applications in research and practice
- Algorithmic extensions for future capabilities
- Ethical considerations in advanced applications

## Part 1: Specialized Population Applications

### Theory and Background

Specialized population scenarios present unique challenges and opportunities for pedigree reconstruction. In this section, we'll explore how Bonsai v3 can be adapted for populations with distinctive genetic characteristics or historical contexts.

#### Founder Populations

Founder populations are groups that originated from a small number of ancestors and have experienced limited admixture with outside groups. Examples include religious isolates (like Amish and Hutterite communities), geographical isolates (like island populations), and cultural isolates.

Key characteristics of founder populations include:
- **High levels of endogamy** (marriage within the group)
- **Population bottlenecks** (historical reductions in population size)
- **Increased IBD sharing** across seemingly distant relatives
- **Distinctive IBD patterns** that differ from outbred populations

In founder populations, standard methods for relationship inference can produce misleading results due to:
1. Background relatedness that exceeds typical thresholds
2. Higher prevalence of multiple common ancestors for any pair of individuals
3. Complex interrelated pedigree structures that span many generations
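
The effect of multiple common ancestors can be made concrete with the kinship coefficient, which accumulates contributions from every ancestor a pair shares. The sketch below is not part of Bonsai: the `PEDIGREE` dictionary, the individual labels, and the `depth` and `kinship` helpers are hypothetical names chosen for illustration. It applies the standard recursive definition of kinship to a toy pedigree in which two brothers marry two sisters, so their children (double first cousins) share four grandparents instead of two.

```python
from functools import lru_cache

# Hypothetical toy pedigree: person -> (father, mother).
# Assumption: each person has either both parents listed or none (founders).
PEDIGREE = {
    "A": (None, None), "B": (None, None),   # parents of brothers X1, X2, X3
    "C": (None, None), "D": (None, None),   # parents of sisters Y1, Y2
    "S": (None, None),                      # unrelated spouse of X3
    "X1": ("A", "B"), "X2": ("A", "B"), "X3": ("A", "B"),
    "Y1": ("C", "D"), "Y2": ("C", "D"),
    "C1": ("X1", "Y1"),   # child of brother X1 and sister Y1
    "C2": ("X2", "Y2"),   # child of brother X2 and sister Y2
    "C3": ("X3", "S"),    # child of brother X3 and an unrelated spouse
}

@lru_cache(maxsize=None)
def depth(person):
    """Number of generations below the founders (founders have depth 0)."""
    father, mother = PEDIGREE[person]
    if father is None and mother is None:
        return 0
    return 1 + max(depth(father), depth(mother))

@lru_cache(maxsize=None)
def kinship(a, b):
    """Probability that one random allele from a and one from b are IBD."""
    if a is None or b is None:
        return 0.0
    if a == b:
        father, mother = PEDIGREE[a]
        return 0.5 * (1.0 + kinship(father, mother))
    # Recurse through the parents of the deeper individual, who cannot be
    # an ancestor of the other one.
    if depth(a) < depth(b):
        a, b = b, a
    father, mother = PEDIGREE[a]
    return 0.5 * (kinship(father, b) + kinship(mother, b))

print(kinship("C1", "C3"))  # ordinary first cousins  -> 0.0625
print(kinship("C1", "C2"))  # double first cousins    -> 0.125
```

The double first cousins have twice the kinship of ordinary first cousins even though both pairs would be labeled "first cousins" in a family tree, which is the kind of inflation that misleads threshold-based inference in endogamous groups.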

#### Historical Populations

Reconstructing pedigrees for historical populations involves additional challenges:
- Limited or degraded genetic data from ancient samples
- Need to integrate documentary evidence with genetic data
- Temporal dynamics spanning multiple centuries
- Cultural practices that differ from contemporary standards
- Gaps in the genetic record due to incomplete sampling

#### Biomedical Applications

In biomedical research, pedigree reconstruction can help identify inherited genetic patterns associated with diseases:
- **Disease variant tracking** through families
- **Carrier status detection** in extended family networks
- **Penetrance estimation** through accurately reconstructed relationships
- **Risk prediction** leveraging family structure

These applications often require higher precision in relationship inference and more comprehensive error handling than standard genealogy.
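
For carrier status detection, a common first approximation is that a relative's chance of carrying a rare variant found in a heterozygous proband equals their coefficient of relationship: each connecting path contributes `(1/2)^m`, where `m` is the number of meioses on that path. The sketch below is illustrative only. The function name and the path encoding are assumptions rather than Bonsai API, and the calculation ignores inbreeding, de novo mutations, and the variant's population frequency.

```python
def carrier_probability(path_meioses):
    """Approximate probability that a relative carries a proband's rare variant.

    path_meioses: one entry per connecting path (one per shared ancestor, or a
    single entry for a direct ancestor or descendant), giving the number of
    meioses along that path.
    """
    return sum(0.5 ** m for m in path_meioses)

# Hypothetical relationship encodings for illustration
relatives = {
    "child": [1],
    "full sibling": [2, 2],      # one path through each parent
    "half sibling": [2],
    "first cousin": [4, 4],      # one path through each shared grandparent
    "second cousin": [6, 6],
}

for name, paths in relatives.items():
    print(f"{name}: {carrier_probability(paths):.4f}")
```

In an endogamous pedigree, each additional connecting path adds another entry to the list, which is one reason accurately reconstructed relationships matter for downstream risk estimates.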

### Implementation in Bonsai v3

Bonsai v3 includes several components that can be adapted for specialized population applications. Let's explore these extensions and the necessary modifications to handle the unique challenges presented by founder populations, historical datasets, and biomedical applications.

#### Handling Founder Populations: Adjusting Prior Probabilities

Bonsai's relationship inference system can be adapted for founder populations by:

1. **Recalibrating IBD thresholds**: Increasing expected IBD sharing levels for relationship classes
2. **Adjusting prior probabilities**: Changing the expected distribution of relationship types
3. **Modeling endogamy explicitly**: Incorporating historical endogamy rates into relationship models
4. **Ancestral haplotype tracking**: Identifying founder-specific segments shared by descendants
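
To see how the first two adaptations interact, the following sketch combines relationship priors with a simple likelihood for the observed total IBD. It is a minimal illustration under stated assumptions, not Bonsai's actual inference code: the normal likelihood is a stand-in, `relationship_posterior` is a hypothetical helper, and the means and standard deviations are the illustrative values used in this lab's plots rather than calibrated parameters.

```python
import math

def normal_pdf(x, mean, std):
    """Normal density, used here as a stand-in likelihood for total IBD."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def relationship_posterior(total_ibd_cm, ibd_expectations, priors):
    """Bayes' rule over relationship types: prior x likelihood, then normalize."""
    weights = {
        rel: priors[rel] * normal_pdf(total_ibd_cm, *ibd_expectations[rel])
        for rel in priors
    }
    total = sum(weights.values())
    return {rel: w / total for rel, w in weights.items()}

# Illustrative priors restricted to three candidate relationships
# (the final normalization rescales them to sum to 1).
priors = {
    "first_cousin": 0.05,
    "first_cousin_once_removed": 0.1,
    "second_cousin": 0.2,
}

# (mean, std) of total IBD in cM; assumed illustrative values
standard_expectations = {
    "first_cousin": (850, 200),
    "first_cousin_once_removed": (425, 150),
    "second_cousin": (212.5, 100),
}
founder_expectations = {
    "first_cousin": (1050, 300),
    "first_cousin_once_removed": (600, 225),
    "second_cousin": (380, 200),
}

observed = 600  # cM
standard_post = relationship_posterior(observed, standard_expectations, priors)
founder_post = relationship_posterior(observed, founder_expectations, priors)

for rel in priors:
    print(f"{rel}: standard={standard_post[rel]:.3f}, founder={founder_post[rel]:.3f}")
```

With these assumed numbers, the same 600 cM observation points fairly clearly to one relationship under the standard expectations, while under the founder expectations the probability mass spreads across more distant relationships.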

Let's examine the configuration modules in Bonsai that allow for these adaptations:

In [ ]:
try:
    # Explore the prior model configuration in Bonsai
    display(Markdown("### Prior Model Configuration in Bonsai v3"))
    
    try:
        from bonsaitree.v3 import priors
        print("Successfully imported priors module")
        
        # Display the module structure and classes
        display_module_classes("bonsaitree.v3.priors")
        
        # Look at prior distribution functions
        display_module_functions("bonsaitree.v3.priors")
        
    except ImportError:
        print("Could not import Bonsai v3 priors module.")
        print("Will use theoretical discussion instead.")
        
    # Let's also look at the configuration options for relationship models
    try:
        from bonsaitree.v3 import models
        print("\nSuccessfully imported models module")
        
        # Display relationship model configuration
        display_module_classes("bonsaitree.v3.models")
        
    except ImportError:
        print("Could not import Bonsai v3 models module.")
        
except Exception as e:
    print(f"Error exploring Bonsai modules: {e}")
    print("Will continue with theoretical discussion of adaptations for specialized populations.")

In [ ]:
# Create visualizations for IBD expectations in standard vs. founder populations

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Define relationship degrees and expected IBD values for standard populations
relationships = ['Parent-Child', 'Full Siblings', 
                 'Half Siblings/\nAvuncular/\nGrandparent', 
                 'First Cousins',
                 'First Cousins\nOnce Removed', 
                 'Second Cousins']

# Mean IBD sharing (cM) in standard populations
standard_ibd = [3400, 2550, 1700, 850, 425, 212.5]

# Increased IBD sharing in founder populations (hypothetical values)
founder_ibd = [3400, 2550, 1850, 1050, 600, 380]

# Standard deviations (rough, hard-coded estimates for visualization purposes)
standard_std = [50, 200, 300, 200, 150, 100]
founder_std = [50, 200, 375, 300, 225, 200]  # Higher variance in founder populations

# Set up the figure
plt.figure(figsize=(12, 8))

# Plot settings
bar_width = 0.4
opacity = 0.8
index = np.arange(len(relationships))

# Plot bars for standard population
plt.bar(index, standard_ibd, bar_width,
        yerr=standard_std,
        alpha=opacity,
        color='steelblue',
        label='Standard Population')

# Plot bars for founder population
plt.bar(index + bar_width, founder_ibd, bar_width,
        yerr=founder_std,
        alpha=opacity,
        color='darkred',
        label='Founder Population')

# Add labels and title
plt.xlabel('Relationship Type', fontsize=12)
plt.ylabel('Expected IBD Sharing (cM)', fontsize=12)
plt.title('Expected IBD Sharing in Standard vs. Founder Populations', fontsize=14)
plt.xticks(index + bar_width/2, relationships)
plt.legend()

# Add a text annotation describing founder effects
plt.figtext(0.5, 0.01, 
            "Founder populations typically show increased IBD sharing, especially for more distant relationships,\n" +
            "due to background relatedness and historical endogamy.",
            ha="center", fontsize=11, bbox={"facecolor":"lightyellow", "alpha":0.5, "pad":5})

plt.tight_layout(rect=[0, 0.05, 1, 1])  # Adjust layout to make room for the annotation
plt.show()

# Create a second visualization showing pedigree complexity comparison
plt.figure(figsize=(12, 6))

# Number of unique ancestors at each generation for standard and founder populations
generations = np.arange(1, 11)  # 10 generations back
standard_ancestors = [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]  # 2^n in ideal case
founder_ancestors = [2, 4, 7, 12, 18, 26, 35, 45, 55, 62]  # Hypothetical values with ancestor collapse

plt.plot(generations, standard_ancestors, 'o-', linewidth=2, markersize=8, color='steelblue', label='Standard Population')
plt.plot(generations, founder_ancestors, 'o-', linewidth=2, markersize=8, color='darkred', label='Founder Population')

# Fill area between curves to highlight the difference
plt.fill_between(generations, standard_ancestors, founder_ancestors, color='lightgray', alpha=0.5)

# Annotate the "pedigree collapse" region
plt.annotate('Pedigree Collapse Region', 
             xy=(7, 150), 
             xytext=(5, 300),
             arrowprops=dict(facecolor='black', shrink=0.05, width=1.5, headwidth=8),
             fontsize=12)

plt.xlabel('Generations Back', fontsize=12)
plt.ylabel('Number of Unique Ancestors', fontsize=12)
plt.title('Pedigree Collapse in Founder Populations', fontsize=14)
plt.xticks(generations)
plt.grid(True, alpha=0.3)
plt.legend()

plt.tight_layout()
plt.show()

In [ ]:
# Configuration for specialized population scenarios

# Simulate a founder population prior model (hypothetical code for demonstration)
def create_founder_population_priors(endogamy_factor=2.0, generations_isolated=10):
    """
    Create a prior distribution model for a founder population.
    
    Args:
        endogamy_factor: Factor by which to increase probability of close relationships compared to standard population
        generations_isolated: Number of generations the population has been isolated
        
    Returns:
        Dictionary with prior configuration for founder populations
    """
    # Start with standard priors
    standard_priors = {
        'parent_child': 0.01,
        'full_sibling': 0.01,
        'half_sibling': 0.02,
        'grandparent': 0.02,
        'avuncular': 0.03,
        'first_cousin': 0.05,
        'first_cousin_once_removed': 0.1,
        'second_cousin': 0.2,
        'distant': 0.56
    }
    
    # Calculate influence of endogamy based on generations isolated
    isolation_effect = min(0.9, endogamy_factor * (1 - 0.9**generations_isolated))
    
    # Adjust priors for founder effect
    founder_priors = {}
    for rel_type, prior in standard_priors.items():
        if rel_type == 'distant':
            # Reduce probability of truly distant relationships
            founder_priors[rel_type] = max(0.05, prior * (1 - isolation_effect))
        else:
            # Increase probability of close and moderately distant relationships
            adjustment = isolation_effect * (0.5 if 'cousin' in rel_type else 0.3)
            founder_priors[rel_type] = min(0.25, prior * (1 + adjustment))
    
    # Renormalize to ensure priors sum to 1
    total = sum(founder_priors.values())
    for rel_type in founder_priors:
        founder_priors[rel_type] /= total
    
    return founder_priors

# Create and display priors for different population scenarios
standard_priors = {
    'parent_child': 0.01,
    'full_sibling': 0.01,
    'half_sibling': 0.02,
    'grandparent': 0.02,
    'avuncular': 0.03,
    'first_cousin': 0.05,
    'first_cousin_once_removed': 0.1,
    'second_cousin': 0.2,
    'distant': 0.56
}

moderate_founder_priors = create_founder_population_priors(endogamy_factor=1.5, generations_isolated=5)
strong_founder_priors = create_founder_population_priors(endogamy_factor=3.0, generations_isolated=15)

# Plot the prior distributions for comparison
plt.figure(figsize=(14, 8))

# Organize relationship types for better visualization
rel_types = ['parent_child', 'full_sibling', 'half_sibling', 'grandparent', 'avuncular', 
             'first_cousin', 'first_cousin_once_removed', 'second_cousin', 'distant']
rel_labels = ['Parent-Child', 'Full Sibling', 'Half Sibling', 'Grandparent', 'Avuncular',
              'First Cousin', 'First Cousin\nOnce Removed', 'Second Cousin', 'Distant']

# Extract values in consistent order
standard_values = [standard_priors[rel] for rel in rel_types]
moderate_values = [moderate_founder_priors[rel] for rel in rel_types]
strong_values = [strong_founder_priors[rel] for rel in rel_types]

# Plot settings
bar_width = 0.25
opacity = 0.8
index = np.arange(len(rel_types))

# Plot bars for each scenario
plt.bar(index, standard_values, bar_width, alpha=opacity, color='steelblue', label='Standard Population')
plt.bar(index + bar_width, moderate_values, bar_width, alpha=opacity, color='darkorange', 
        label='Moderate Founder Effect (5 generations)')
plt.bar(index + 2*bar_width, strong_values, bar_width, alpha=opacity, color='darkred',
        label='Strong Founder Effect (15 generations)')

# Add labels and formatting
plt.xlabel('Relationship Type', fontsize=12)
plt.ylabel('Prior Probability', fontsize=12)
plt.title('Relationship Prior Distributions Across Population Scenarios', fontsize=14)
plt.xticks(index + bar_width, rel_labels, rotation=45, ha='right')
plt.legend()

plt.tight_layout()
plt.show()

# Now show how IBD interpretation would change
from IPython.display import Markdown, display

display(Markdown("""
### Adapting IBD Interpretation for Founder Populations

The table below contrasts how the same total IBD sharing might be interpreted in a standard population and in a founder population:

| Total IBD (cM) | Standard Interpretation | Founder Population Interpretation |
|----------------|-------------------------|----------------------------------|
| 3400           | Parent-Child            | Parent-Child (unchanged)         |
| 2500           | Full Siblings           | Full Siblings (unchanged)        |
| 1700           | Half Siblings/Avuncular | Half Siblings/Avuncular (unchanged) |
| 950            | First Cousins           | First Cousins (higher confidence needed) |
| 600            | First Cousins           | **Could be First Cousins or First Cousins Once Removed** |
| 400            | First Cousins Once Removed | **Could be multiple relationships in range 1C1R-2C** |
| 250            | Second Cousins          | **Could be 2C or multiple pathways of more distant relationship** |
| 100            | Distant Relative        | **Could be multiple intersecting distant relationships** |

This shows how the same amount of IBD sharing requires different interpretations in founder populations due to the effects of endogamy and pedigree collapse.
"""))

### Exercise 1: Adapting Bonsai for Founder Populations

In this exercise, you'll adapt Bonsai's relationship inference approach to better handle founder populations.

**Task:** Create a configuration function that adjusts IBD thresholds and prior probabilities for a founder population scenario.

**Hint:** Consider how both the expected IBD sharing distributions and the prior probabilities of different relationship types need to be adjusted.

In [ ]:
def configure_bonsai_for_founder_population(
    population_name,
    estimated_endogamy_factor=1.5,
    generations_isolated=8,
    background_ibd_shift=20,
    min_confidence_threshold=0.7
):
    """
    Configure Bonsai's relationship inference for a founder population.
    
    Args:
        population_name: Name of the founder population (e.g., "Ashkenazi Jewish", "Finnish")
        estimated_endogamy_factor: Factor by which to adjust relationship probabilities (1.0 = standard)
        generations_isolated: Estimated number of generations of isolation
        background_ibd_shift: Amount to shift distant relationship IBD expectations (in cM)
        min_confidence_threshold: Minimum confidence required for relationship classifications
        
    Returns:
        Dictionary with Bonsai configuration parameters adjusted for the founder population
    """
    # TODO: Implement configuration adjustments for founder populations
    
    # Start with default configuration
    config = {
        "population_name": population_name,
        "relationship_priors": {},
        "ibd_expectations": {},
        "confidence_thresholds": {},
        "pedigree_parameters": {}
    }
    
    # 1. Adjust the prior probabilities for each relationship type
    
    # 2. Modify the expected IBD sharing distributions for each relationship type
    
    # 3. Adjust confidence thresholds to require stronger evidence for distant relationships
    
    # 4. Configure pedigree reconstruction parameters to handle higher interconnectedness
    
    return config

# Test your solution
# founder_config = configure_bonsai_for_founder_population("Ashkenazi Jewish", estimated_endogamy_factor=2.0)

In [ ]:
# Example solution for Exercise 1

def configure_bonsai_for_founder_population(
    population_name,
    estimated_endogamy_factor=1.5,
    generations_isolated=8,
    background_ibd_shift=20,
    min_confidence_threshold=0.7
):
    """
    Configure Bonsai's relationship inference for a founder population.
    
    Args:
        population_name: Name of the founder population (e.g., "Ashkenazi Jewish", "Finnish")
        estimated_endogamy_factor: Factor by which to adjust relationship probabilities (1.0 = standard)
        generations_isolated: Estimated number of generations of isolation
        background_ibd_shift: Amount to shift distant relationship IBD expectations (in cM)
        min_confidence_threshold: Minimum confidence required for relationship classifications
        
    Returns:
        Dictionary with Bonsai configuration parameters adjusted for the founder population
    """
    # Start with default configuration
    config = {
        "population_name": population_name,
        "relationship_priors": {},
        "ibd_expectations": {},
        "confidence_thresholds": {},
        "pedigree_parameters": {}
    }
    
    # 1. Adjust the prior probabilities for each relationship type
    # Standard priors for reference
    standard_priors = {
        'parent_child': 0.01,
        'full_sibling': 0.01,
        'half_sibling': 0.02,
        'grandparent': 0.02,
        'avuncular': 0.03,
        'first_cousin': 0.05,
        'first_cousin_once_removed': 0.1,
        'second_cousin': 0.2,
        'distant': 0.56
    }
    
    # Calculate isolation effect
    isolation_effect = min(0.9, estimated_endogamy_factor * (1 - 0.9**generations_isolated))
    
    # Adjust priors based on endogamy
    founder_priors = {}
    for rel_type, prior in standard_priors.items():
        if rel_type == 'distant':
            # Reduce probability of truly distant relationships
            founder_priors[rel_type] = max(0.05, prior * (1 - isolation_effect))
        else:
            # Increase probability of close and moderately distant relationships
            adjustment = isolation_effect * (0.5 if 'cousin' in rel_type else 0.3)
            founder_priors[rel_type] = min(0.25, prior * (1 + adjustment))
    
    # Renormalize priors
    total = sum(founder_priors.values())
    for rel_type in founder_priors:
        founder_priors[rel_type] /= total
    
    config["relationship_priors"] = founder_priors
    
    # 2. Modify the expected IBD sharing distributions for each relationship type
    # Standard IBD expectations (mean, std) in cM
    standard_ibd_expectations = {
        'parent_child': (3400, 100),
        'full_sibling': (2550, 180),
        'half_sibling': (1700, 200),
        'grandparent': (1700, 200),  
        'avuncular': (1700, 220),
        'first_cousin': (850, 120),
        'first_cousin_once_removed': (425, 90),
        'second_cousin': (212.5, 50),
        'distant': (50, 30)
    }
    
    # Adjust IBD expectations for founder effect
    founder_ibd_expectations = {}
    for rel_type, (mean, std) in standard_ibd_expectations.items():
        if rel_type in ['parent_child', 'full_sibling', 'half_sibling', 'grandparent', 'avuncular']:
            # Close relationships are less affected
            adjusted_mean = mean * (1 + 0.05 * min(1.0, isolation_effect))
            adjusted_std = std * (1 + 0.1 * min(1.0, isolation_effect))
        elif 'cousin' in rel_type:
            # Cousins are more affected
            degree = 1 if 'first' in rel_type else 2
            removed = 1 if 'removed' in rel_type else 0
            # More distant relationships see larger increase in expected IBD
            adjustment_factor = 0.1 + 0.05 * (degree + removed)
            adjusted_mean = mean * (1 + adjustment_factor * isolation_effect)
            adjusted_std = std * (1 + 0.2 * isolation_effect)  # Increased variance
        else:
            # Distant relationships have shifted baseline
            adjusted_mean = mean + background_ibd_shift * isolation_effect
            adjusted_std = std * (1 + 0.3 * isolation_effect)  # Much higher variance
            
        founder_ibd_expectations[rel_type] = (adjusted_mean, adjusted_std)
    
    config["ibd_expectations"] = founder_ibd_expectations
    
    # 3. Adjust confidence thresholds to require stronger evidence for distant relationships
    standard_confidence_thresholds = {
        'parent_child': 0.8,
        'full_sibling': 0.8,
        'half_sibling': 0.7,
        'grandparent': 0.7,
        'avuncular': 0.7,
        'first_cousin': 0.6,
        'first_cousin_once_removed': 0.6,
        'second_cousin': 0.5,
        'distant': 0.4
    }
    
    # Increase thresholds for more distant relationships
    founder_confidence_thresholds = {}
    for rel_type, threshold in standard_confidence_thresholds.items():
        if rel_type in ['parent_child', 'full_sibling']:
            # Keep high confidence for very close relationships
            adjusted_threshold = threshold
        elif rel_type in ['half_sibling', 'grandparent', 'avuncular']:
            # Slightly increased threshold for close relationships
            adjusted_threshold = min(0.95, threshold + 0.05 * isolation_effect)
        else:
            # Significant increase for distant relationships
            increase = 0.1 * isolation_effect
            adjusted_threshold = min(0.95, threshold + increase)
            # Ensure we don't go below minimum confidence threshold
            adjusted_threshold = max(adjusted_threshold, min_confidence_threshold)
            
        founder_confidence_thresholds[rel_type] = adjusted_threshold
    
    config["confidence_thresholds"] = founder_confidence_thresholds
    
    # 4. Configure pedigree reconstruction parameters to handle higher interconnectedness
    config["pedigree_parameters"] = {
        "max_pedigree_size": 500,  # Larger pedigrees to accommodate interconnections
        "max_generations": int(10 + generations_isolated / 3),  # Deeper genealogy
        "allow_multiple_common_ancestors": True,  # Critical for founder populations
        "max_relationship_paths": 5,  # Consider multiple relationship paths
        "min_path_confidence": min_confidence_threshold,  # Minimum confidence for a path
        "ungenotyped_ancestor_penalty": 0.8 / estimated_endogamy_factor,  # Reduce penalty in founder populations
        "endogamy_aware": True,  # Enable endogamy-specific optimizations
        "background_ibd_threshold": max(5, 20 - background_ibd_shift),  # Lower threshold for background relatedness
    }
    
    return config

# Test with different founder populations
ashkenazi_config = configure_bonsai_for_founder_population(
    "Ashkenazi Jewish", 
    estimated_endogamy_factor=2.0,
    generations_isolated=20
)

finnish_config = configure_bonsai_for_founder_population(
    "Finnish", 
    estimated_endogamy_factor=1.2,
    generations_isolated=100
)

# Display the configurations
print(f"Configuration for {ashkenazi_config['population_name']}:")
print("\nRelationship Prior Probabilities:")
for rel, prior in ashkenazi_config['relationship_priors'].items():
    print(f"  {rel}: {prior:.3f}")

print("\nConfidence Thresholds:")
for rel, threshold in ashkenazi_config['confidence_thresholds'].items():
    print(f"  {rel}: {threshold:.2f}")

print("\nPedigree Parameters:")
for param, value in ashkenazi_config['pedigree_parameters'].items():
    print(f"  {param}: {value}")

# Visualize the differences in IBD expectations
plt.figure(figsize=(12, 8))

# Only show a subset of relationships for clarity
relationships_to_show = ['first_cousin', 'first_cousin_once_removed', 'second_cousin', 'distant']
rel_labels = ['First Cousin', 'First Cousin\nOnce Removed', 'Second Cousin', 'Distant']

# Standard values
means_standard = [standard_ibd_expectations[rel][0] for rel in relationships_to_show]
std_standard = [standard_ibd_expectations[rel][1] for rel in relationships_to_show]

# Ashkenazi values
means_ashkenazi = [ashkenazi_config['ibd_expectations'][rel][0] for rel in relationships_to_show]
std_ashkenazi = [ashkenazi_config['ibd_expectations'][rel][1] for rel in relationships_to_show]

# Finnish values
means_finnish = [finnish_config['ibd_expectations'][rel][0] for rel in relationships_to_show]
std_finnish = [finnish_config['ibd_expectations'][rel][1] for rel in relationships_to_show]

# Plot settings
bar_width = 0.25
opacity = 0.8
index = np.arange(len(relationships_to_show))

# Plot bars
plt.bar(index, means_standard, bar_width, yerr=std_standard, alpha=opacity, color='steelblue', 
        label='Standard Population')
plt.bar(index + bar_width, means_ashkenazi, bar_width, yerr=std_ashkenazi, alpha=opacity, color='darkred',
        label='Ashkenazi Jewish (Strong Founder Effect)')
plt.bar(index + 2*bar_width, means_finnish, bar_width, yerr=std_finnish, alpha=opacity, color='darkorange',
        label='Finnish (Moderate Founder Effect)')

# Add labels and formatting
plt.xlabel('Relationship Type', fontsize=12)
plt.ylabel('Expected IBD Sharing (cM)', fontsize=12)
plt.title('Adjusted IBD Expectations for Different Populations', fontsize=14)
plt.xticks(index + bar_width, rel_labels)
plt.legend()

plt.tight_layout()
plt.show()

## Part 2: Conservation Genetics and Non-Human Applications

### Theory and Background

The principles and algorithms of Bonsai v3 can be extended beyond human genealogy to conservation genetics and breeding programs for endangered species. This adaptation requires understanding the key differences between human and non-human genetic inheritance patterns:

#### Key Differences in Non-Human Applications:

1. **Different Genetic Architecture**:
   - Chromosome numbers vary widely across species
   - Recombination rates and patterns differ from humans
   - Some species have unique inheritance mechanisms (e.g., haplodiploidy in bees)

2. **Breeding Patterns**:
   - Many species have non-monogamous mating structures
   - Some species produce large numbers of offspring with high mortality
   - Seasonal breeding cycles can create distinct generational cohorts

3. **Population Structures**:
   - Conservation programs often involve highly managed breeding
   - Many endangered species have extreme population bottlenecks
   - Captive populations may have detailed pedigree records but limited genetic data

4. **Data Characteristics**:
   - Often lower density of genetic markers compared to human data
   - Samples may be collected non-invasively (e.g., from feces or hair)
   - Reference genomes may be less complete or accurate
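
Haplodiploidy shows why human relationship thresholds cannot be reused unchanged. The sketch below uses a hypothetical helper, `expected_sibling_sharing`, that is not part of Bonsai. It averages the chance that two full siblings inherited the same allele from each parent. In a haplodiploid species the haploid father passes on his entire genome, so sisters are more closely related than diploid full siblings.

```python
def expected_sibling_sharing(paternal_share, maternal_share):
    """Mean relatedness between full siblings, averaged over both parental contributions.

    Each argument is the probability that two siblings inherited the same
    allele from that parent at a given locus.
    """
    return (paternal_share + maternal_share) / 2


# Diploid species: each parent transmits one of two alleles at random
diploid_siblings = expected_sibling_sharing(0.5, 0.5)

# Haplodiploid sisters: the haploid father transmits his whole genome unchanged
haplodiploid_sisters = expected_sibling_sharing(1.0, 0.5)

print(f"Diploid full siblings: {diploid_siblings:.2f}")
print(f"Haplodiploid sisters:  {haplodiploid_sisters:.2f}")
```

A relationship model calibrated on diploid expectations would therefore misread haplodiploid sisters as something closer than full siblings.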

#### Applications in Conservation Genetics:

Conservation genetics uses pedigree reconstruction to:

1. **Manage Genetic Diversity**:
   - Identify optimal breeding pairs to maximize diversity
   - Avoid inbreeding depression in small populations
   - Balance competing conservation goals (genetic diversity vs. adaptation)

2. **Monitor Wild Populations**:
   - Estimate census and effective population sizes
   - Detect population structure and migration patterns
   - Identify kinship groups and social structures

3. **Forensic Applications**:
   - Track illegal wildlife trade through genetic relationships
   - Identify the origins of confiscated animals or animal products
   - Provide evidence for wildlife crime prosecution
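
Breeding-pair selection is usually framed in terms of kinship coefficients computed from the pedigree. The following is a minimal sketch, assuming a small hypothetical pedigree (`PEDIGREE`) with known sires and dams; the names, the sex assignments, and the `kinship` helper are illustrative and are not Bonsai APIs. It uses the standard recursive definition of the kinship coefficient and then picks the male-female pair with the lowest kinship, which equals the expected inbreeding coefficient of their offspring.

```python
from functools import lru_cache
from itertools import product

# Hypothetical pedigree: individual -> (sire, dam); founders have (None, None).
# Entries are listed so that parents always appear before their offspring.
PEDIGREE = {
    "F1": (None, None),
    "F2": (None, None),
    "F3": (None, None),
    "F4": (None, None),
    "A": ("F1", "F2"),
    "B": ("F1", "F2"),
    "C": ("F3", "F4"),
    "D": ("A", "C"),
}
ORDER = {name: i for i, name in enumerate(PEDIGREE)}


@lru_cache(maxsize=None)
def kinship(a, b):
    """Kinship coefficient between a and b (founders assumed unrelated)."""
    if a is None or b is None:
        return 0.0
    if a == b:
        sire, dam = PEDIGREE[a]
        return 0.5 * (1.0 + kinship(sire, dam))
    # Always recurse through the parents of the younger individual
    if ORDER[a] < ORDER[b]:
        a, b = b, a
    sire, dam = PEDIGREE[a]
    return 0.5 * (kinship(sire, b) + kinship(dam, b))


# Hypothetical breeding candidates
males = ["A"]
females = ["B", "C", "D"]

best_pair = min(product(males, females), key=lambda pair: kinship(*pair))
for male, female in product(males, females):
    print(f"{male} x {female}: kinship = {kinship(male, female):.3f}")
print("Lowest-kinship pair:", best_pair)
```

In practice, the pedigree would come from the reconstruction itself, and mate selection would weigh mean kinship across the whole population rather than a single pairwise minimum.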

### Implementation for Conservation Genetics

Adapting Bonsai v3 for conservation genetics requires several key modifications to its core algorithms and data structures:

#### 1. Genetic Map Adaptations

The genetic map defines the relationship between physical position (base pairs) and genetic distance (centiMorgans). This map varies substantially between species and must be calibrated for each target species:

```python
# Example of species-specific genetic map configuration
def load_species_genetic_map(species_name, chromosome):
    """Load the genetic map for a specific species and chromosome."""
    if species_name == "panthera_tigris":  # Tiger
        # Tigers have 19 chromosome pairs
        if not 1 <= chromosome <= 19:
            raise ValueError(f"Invalid chromosome number {chromosome} for tigers (max 19)")
        map_file = f"data/genetic_maps/{species_name}/chr{chromosome}.map"
    elif species_name == "gorilla_gorilla":  # Gorilla
        # Gorillas have 24 chromosome pairs (close to humans)
        if not 1 <= chromosome <= 24:
            raise ValueError(f"Invalid chromosome number {chromosome} for gorillas (max 24)")
        map_file = f"data/genetic_maps/{species_name}/chr{chromosome}.map"
    else:
        raise ValueError(f"Unsupported species: {species_name}")
    
    # Load the map data
    positions = []
    genetic_distances = []
    
    # In a real implementation, this would read from an actual file
    # Here we simulate with random data
    import numpy as np
    chrom_length = {
        "panthera_tigris": [150e6, 160e6, 170e6, 135e6, 140e6, 130e6, 120e6, 
                          125e6, 115e6, 110e6, 100e6, 95e6, 90e6, 85e6, 
                          80e6, 75e6, 70e6, 65e6, 60e6],
        "gorilla_gorilla": [230e6, 240e6, 200e6, 210e6, 180e6, 170e6, 160e6,
                          150e6, 140e6, 135e6, 130e6, 125e6, 120e6, 110e6,
                          100e6, 90e6, 80e6, 70e6, 65e6, 60e6, 50e6, 55e6, 
                          56e6, 40e6],
    }
    
    # Create a simulated genetic map with appropriate length for the species/chromosome
    chr_len = chrom_length[species_name][chromosome-1]
    
    # Generate physical positions (bp)
    positions = np.linspace(0, chr_len, 1000, dtype=int)
    
    # Generate genetic distances (cM)
    # Use species-specific recombination rates
    recombination_rate = {
        "panthera_tigris": 1.2,  # cM/Mb, hypothetical value
        "gorilla_gorilla": 0.9,  # cM/Mb, hypothetical value
    }
    
    rate = recombination_rate[species_name]
    
    # Vary the local recombination rate per interval, then accumulate.
    # Clipping keeps every interval non-negative so the map stays monotonic.
    interval_mb = np.diff(positions, prepend=0) / 1e6
    noise = np.random.normal(0, 0.2, len(positions))
    genetic_distances = np.cumsum(interval_mb * rate * np.clip(1 + noise, 0, None))
    
    return positions, genetic_distances
```

#### 2. Species-Specific IBD Thresholds

IBD expectations must be adjusted for:
- Chromosome number and size differences
- Recombination rate variations
- Mating system characteristics

#### 3. Relationship Model Adaptations

Relationship models need to accommodate:
- Polygamous mating systems
- Different inbreeding patterns
- Species-specific demographic constraints
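
One concrete consequence of polygamous mating is that two offspring of the same mother may be full or half siblings. The sketch below treats their expected sharing as a two-component mixture. The means (2550 cM and 1700 cM) are the human baselines used in this lab; the standard deviations, the probability `p_full`, and the function name are assumptions made for illustration.

```python
import math


def mixed_sibling_expectation(p_full, full=(2550.0, 180.0), half=(1700.0, 160.0)):
    """Mean and sd of IBD sharing for a maternal sibling pair that is a full
    sibling with probability p_full and a half sibling otherwise.

    Each tuple is (mean cM, sd cM); the sd values are illustrative assumptions.
    """
    if not 0.0 <= p_full <= 1.0:
        raise ValueError("p_full must be between 0 and 1")
    (mu_f, sd_f), (mu_h, sd_h) = full, half
    mean = p_full * mu_f + (1 - p_full) * mu_h
    # Second moment of a two-component mixture, then subtract the squared mean
    second_moment = p_full * (sd_f**2 + mu_f**2) + (1 - p_full) * (sd_h**2 + mu_h**2)
    return mean, math.sqrt(second_moment - mean**2)


for p in (1.0, 0.3, 0.0):
    mean, sd = mixed_sibling_expectation(p)
    print(f"p_full={p:.1f}: mean={mean:.0f} cM, sd={sd:.0f} cM")
```

The mixture is much wider than either component on its own, which is one reason a single "sibling" threshold performs poorly when the mating system is not monogamous.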

In [ ]:
# Implement species-specific IBD threshold calculator

def calculate_species_ibd_thresholds(
    species_name,
    genome_size_mb,
    chromosome_count,
    avg_recombination_rate,
    mating_system="monogamous"
):
    """
    Calculate IBD thresholds for a specific species based on its genetic characteristics.
    
    Args:
        species_name: Name of the species
        genome_size_mb: Genome size in megabases
        chromosome_count: Number of chromosome pairs
        avg_recombination_rate: Average recombination rate in cM/Mb
        mating_system: Mating system type ('monogamous', 'polygamous', 'polyandrous', etc.)
        
    Returns:
        Dictionary with IBD thresholds for different relationship types
    """
    # Calculate total genetic map length in cM
    total_genetic_length = genome_size_mb * avg_recombination_rate
    
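    # Scaling factor relative to a human reference map length of 3000 cM.
    # This is an approximation: the human baselines below (3400 cM for parent-child)
    # imply a somewhat longer map, so humans themselves do not scale to exactly 1.0.
    scaling_factor = total_genetic_length / 3000.0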
    
    # Standard human IBD expectations (mean values in cM)
    human_ibd_thresholds = {
        'parent_child': 3400,
        'full_sibling': 2550,
        'half_sibling': 1700,
        'grandparent': 1700,
        'avuncular': 1700,
        'first_cousin': 850,
        'first_cousin_once_removed': 425,
        'second_cousin': 212.5,
        'distant': 100
    }
    
    # Adjust for species-specific genome characteristics
    species_thresholds = {}
    for rel_type, threshold in human_ibd_thresholds.items():
        # Scale the threshold based on genome size
        adjusted_threshold = threshold * scaling_factor
        
        # Adjust for mating system
        if mating_system == "polygamous" and "sibling" in rel_type:
            # Sibling sets in polygamous species mix full and half siblings,
            # so lower the threshold slightly to allow for the extra variance
            adjusted_threshold *= 0.9
        # Polyandrous systems leave paternity uncertain, but a parent-child pair
        # is still identifiable from IBD alone, so no adjustment is applied.
        
        species_thresholds[rel_type] = adjusted_threshold
    
    # Add species-specific metadata
    return {
        "species_name": species_name,
        "genome_size_mb": genome_size_mb,
        "chromosome_count": chromosome_count,
        "total_genetic_length_cm": total_genetic_length,
        "thresholds": species_thresholds
    }

# Define some example species
species_database = [
    {
        "name": "Homo sapiens",
        "common_name": "Human",
        "genome_size_mb": 3200,
        "chromosome_count": 23,
        "avg_recombination_rate": 1.0,
        "mating_system": "monogamous"
    },
    {
        "name": "Panthera tigris",
        "common_name": "Tiger",
        "genome_size_mb": 2400,
        "chromosome_count": 19,
        "avg_recombination_rate": 1.2,
        "mating_system": "polygamous"
    },
    {
        "name": "Gorilla gorilla",
        "common_name": "Western Gorilla",
        "genome_size_mb": 3100,
        "chromosome_count": 24,
        "avg_recombination_rate": 0.9,
        "mating_system": "polygamous"
    },
    {
        "name": "Canis lupus familiaris",
        "common_name": "Domestic Dog",
        "genome_size_mb": 2500,
        "chromosome_count": 39,
        "avg_recombination_rate": 0.97,
        "mating_system": "polygamous"
    },
    {
        "name": "Ailuropoda melanoleuca",
        "common_name": "Giant Panda",
        "genome_size_mb": 2300,
        "chromosome_count": 21,
        "avg_recombination_rate": 1.05,
        "mating_system": "polygamous"
    }
]

# Calculate and compare IBD thresholds across species
thresholds_by_species = {}
for species in species_database:
    thresholds = calculate_species_ibd_thresholds(
        species["name"],
        species["genome_size_mb"],
        species["chromosome_count"],
        species["avg_recombination_rate"],
        species["mating_system"]
    )
    thresholds_by_species[species["name"]] = thresholds

# Print formatted results
print("Species-Specific IBD Thresholds for Key Relationships (in cM)")
print("-" * 70)
print(f"{'Species':<20} {'Parent-Child':>13} {'Full Siblings':>13} {'First Cousins':>13}")
print("-" * 70)

for species in species_database:
    name = species["common_name"]
    thresholds = thresholds_by_species[species["name"]]["thresholds"]
    
    pc = thresholds["parent_child"]
    fs = thresholds["full_sibling"]
    fc = thresholds["first_cousin"]
    
    print(f"{name:<20} {pc:13.1f} {fs:13.1f} {fc:13.1f}")

print("-" * 70)

# Visualize the comparison
# Create a figure for threshold comparison
plt.figure(figsize=(12, 8))

# Relationship types to plot
relationships = ["parent_child", "full_sibling", "half_sibling", "first_cousin", "second_cousin"]
rel_labels = ["Parent-Child", "Full Sibling", "Half Sibling", "First Cousin", "Second Cousin"]

# Set up bar positions
bar_width = 0.15
positions = np.arange(len(relationships))
species_colors = ['steelblue', 'darkred', 'forestgreen', 'darkorange', 'purple']

# Plot each species
for i, species in enumerate(species_database):
    species_name = species["name"]
    common_name = species["common_name"]
    thresholds = thresholds_by_species[species_name]["thresholds"]
    
    # Extract values for each relationship
    values = [thresholds[rel] for rel in relationships]
    
    # Plot bars
    plt.bar(positions + i*bar_width, values, bar_width, 
            label=common_name, color=species_colors[i], alpha=0.8)

# Add labels and formatting
plt.xlabel('Relationship Type', fontsize=12)
plt.ylabel('Expected IBD Threshold (cM)', fontsize=12)
plt.title('IBD Thresholds by Species and Relationship Type', fontsize=14)
plt.xticks(positions + bar_width * (len(species_database)-1)/2, rel_labels)
plt.legend()

plt.tight_layout()
plt.show()

### Exercise 2: Adapting Bonsai for a Non-Human Species

In this exercise, you'll adapt the Bonsai algorithm for a non-human conservation genetics application.

**Task:** Create a function that adapts Bonsai's pedigree reconstruction for an endangered species. Consider species-specific genetic characteristics, breeding patterns, and conservation goals.

**Hint:** Think about how to modify the IBD detection, relationship inference, and pedigree building components for a species that might have different genetic architecture and mating patterns from humans.

In [ ]:
def configure_bonsai_for_conservation(
    species_name,
    genetic_map_dir,
    genome_size_mb,
    chromosome_count,
    recombination_rate,
    mating_system,
    conservation_goal="genetic_diversity",
    population_bottleneck_severity=0.0  # 0.0 = no bottleneck, 1.0 = extreme bottleneck
):
    """
    Configure Bonsai for a conservation genetics application.
    
    Args:
        species_name: Scientific name of the species
        genetic_map_dir: Directory containing genetic maps for the species
        genome_size_mb: Genome size in megabases
        chromosome_count: Number of chromosome pairs
        recombination_rate: Average recombination rate in cM/Mb
        mating_system: Mating system type ('monogamous', 'polygamous', 'polyandrous')
        conservation_goal: Primary goal for conservation ('genetic_diversity', 'adaptation', 'balanced')
        population_bottleneck_severity: Severity of historical bottleneck (0.0-1.0)
        
    Returns:
        Dictionary with Bonsai configuration adapted for the species
    """
    # TODO: Implement species-specific adaptations
    
    # Return species-adapted configuration
    return {
        "species": {
            "name": species_name,
            "genome_size_mb": genome_size_mb,
            "chromosome_count": chromosome_count,
            "recombination_rate": recombination_rate,
            "mating_system": mating_system
        },
        "genetic_map": {
            "directory": genetic_map_dir,
            "format": "standard"  # or custom format if needed
        },
        "ibd_detection": {
            # Parameters for IBD detection adapted to the species
        },
        "relationship_inference": {
            # Relationship model parameters
        },
        "pedigree_construction": {
            # Pedigree building parameters
        },
        "conservation_parameters": {
            "goal": conservation_goal,
            "bottleneck_severity": population_bottleneck_severity,
            # Additional conservation-specific parameters
        }
    }

# Test your solution
# giant_panda_config = configure_bonsai_for_conservation(
#     species_name="Ailuropoda melanoleuca",
#     genetic_map_dir="/data/maps/panda",
#     genome_size_mb=2300,
#     chromosome_count=21,
#     recombination_rate=1.05,
#     mating_system="polygamous",
#     conservation_goal="genetic_diversity",
#     population_bottleneck_severity=0.8  # Severe historical bottleneck
# )

In [ ]:
# Example solution for Exercise 2

def configure_bonsai_for_conservation(
    species_name,
    genetic_map_dir,
    genome_size_mb,
    chromosome_count,
    recombination_rate,
    mating_system,
    conservation_goal="genetic_diversity",
    population_bottleneck_severity=0.0  # 0.0 = no bottleneck, 1.0 = extreme bottleneck
):
    """
    Configure Bonsai for a conservation genetics application.
    
    Args:
        species_name: Scientific name of the species
        genetic_map_dir: Directory containing genetic maps for the species
        genome_size_mb: Genome size in megabases
        chromosome_count: Number of chromosome pairs
        recombination_rate: Average recombination rate in cM/Mb
        mating_system: Mating system type ('monogamous', 'polygamous', 'polyandrous')
        conservation_goal: Primary goal for conservation ('genetic_diversity', 'adaptation', 'balanced')
        population_bottleneck_severity: Severity of historical bottleneck (0.0-1.0)
        
    Returns:
        Dictionary with Bonsai configuration adapted for the species
    """
    # Calculate total genetic map length
    total_genetic_length = genome_size_mb * recombination_rate
    
    # Scaling factor relative to human genome
    scaling_factor = total_genetic_length / 3000.0
    
    # Base IBD detection parameters, adjusted for species characteristics.
    # The 5 cM human default is scaled with map length, so species with
    # shorter genetic maps use a shorter minimum segment.
    min_segment_length = 5.0 * scaling_factor
    
    # Different segment filtering based on mating system
    if mating_system == "polygamous":
        # More conservative segment filtering in polygamous species due to higher genetic diversity
        min_segment_length *= 1.2
    
    # Calculate IBD thresholds adapted to the species
    ibd_thresholds = {
        'parent_child': 3400 * scaling_factor,
        'full_sibling': 2550 * scaling_factor,
        'half_sibling': 1700 * scaling_factor,
        'grandparent': 1700 * scaling_factor,
        'avuncular': 1700 * scaling_factor,
        'first_cousin': 850 * scaling_factor,
        'first_cousin_once_removed': 425 * scaling_factor,
        'second_cousin': 212.5 * scaling_factor,
        'distant': max(50 * scaling_factor, 20)  # Set minimum threshold
    }
    
    # Adjust confidence based on mating system
    confidence_adjustments = {
        "monogamous": {
            "parent_child": 0.0,   # No adjustment
            "sibling": 0.0,        # No adjustment
            "cousin": 0.0          # No adjustment
        },
        "polygamous": {
            "parent_child": -0.05, # Slightly less confident in parent-child
            "sibling": -0.1,       # Less confident in sibling relationships
            "cousin": -0.15        # Much less confident in cousin relationships
        },
        "polyandrous": {
            "parent_child": -0.1,  # Less confident in parent-child (paternal)
            "sibling": -0.15,      # Less confident in sibling relationships
            "cousin": -0.2         # Much less confident in cousin relationships
        }
    }
    
    # Base confidence thresholds
    confidence_thresholds = {
        'parent_child': 0.8,
        'full_sibling': 0.8,
        'half_sibling': 0.7,
        'grandparent': 0.7,
        'avuncular': 0.7,
        'first_cousin': 0.6,
        'first_cousin_once_removed': 0.6,
        'second_cousin': 0.5,
        'distant': 0.4
    }
    
    # Apply mating system adjustments to confidence thresholds
    mating_adjustments = confidence_adjustments[mating_system]
    adjusted_confidence = {}
    
    for rel_type, threshold in confidence_thresholds.items():
        if 'parent_child' in rel_type:
            adjustment = mating_adjustments["parent_child"]
        elif 'sibling' in rel_type:
            adjustment = mating_adjustments["sibling"]
        elif 'cousin' in rel_type:
            adjustment = mating_adjustments["cousin"]
        else:
            # Grandparent, avuncular, and distant pairs reuse the cousin adjustment
            adjustment = mating_adjustments["cousin"]
            
        adjusted_confidence[rel_type] = max(0.3, threshold + adjustment)
    
    # Apply bottleneck effects if present
    if population_bottleneck_severity > 0:
        # Increase expected IBD sharing for distant relationships in bottlenecked populations
        bottleneck_factor = 1.0 + (0.3 * population_bottleneck_severity)
        for rel_type in ['first_cousin', 'first_cousin_once_removed', 'second_cousin', 'distant']:
            ibd_thresholds[rel_type] *= bottleneck_factor
        
        # Decrease confidence thresholds for distant relationships
        confidence_penalty = -0.1 * population_bottleneck_severity
        for rel_type in ['first_cousin', 'first_cousin_once_removed', 'second_cousin', 'distant']:
            adjusted_confidence[rel_type] = max(0.3, adjusted_confidence[rel_type] + confidence_penalty)
    
    # Adjust pedigree construction based on conservation goal
    if conservation_goal == "genetic_diversity":
        # Prioritize detecting distant relationships to maximize diversity
        distant_rel_bonus = 0.2
        for rel_type in ['distant', 'second_cousin', 'first_cousin_once_removed']:
            adjusted_confidence[rel_type] += distant_rel_bonus
            distant_rel_bonus -= 0.05  # Less bonus for closer relationships
    elif conservation_goal == "adaptation":
        # Prioritize close relationships to maintain adaptive traits
        close_rel_bonus = 0.1
        for rel_type in ['parent_child', 'full_sibling', 'half_sibling']:
            adjusted_confidence[rel_type] += close_rel_bonus
            close_rel_bonus -= 0.03
    
    # Set appropriate pedigree constraints for the species
    if mating_system == "polygamous":
        max_spouses_per_individual = 5
        max_children_per_family = 20  # Higher in polygamous species
    elif mating_system == "polyandrous":
        max_spouses_per_individual = 3
        max_children_per_family = 15
    else:  # monogamous
        max_spouses_per_individual = 2  # Allow for sequential monogamy
        max_children_per_family = 10
    
    # Return complete configuration
    return {
        "species": {
            "name": species_name,
            "genome_size_mb": genome_size_mb,
            "chromosome_count": chromosome_count,
            "recombination_rate": recombination_rate,
            "mating_system": mating_system,
            "total_genetic_length_cm": total_genetic_length
        },
        "genetic_map": {
            "directory": genetic_map_dir,
            "format": "standard",
            "scaling_factor": scaling_factor
        },
        "ibd_detection": {
            "min_segment_length": min_segment_length,
            "min_snps": max(1000, int(3000 / scaling_factor)),  # Scale minimum SNPs by genome size
            "max_gap": 500000,  # Standard gap threshold
            "phase_sensitivity": mating_system != "polyandrous"  # Phase is less important in polyandrous species
        },
        "relationship_inference": {
            "ibd_thresholds": ibd_thresholds,
            "confidence_thresholds": adjusted_confidence,
            "use_demographic_data": True,
            "allow_multiple_relationships": mating_system in ["polygamous", "polyandrous"],
            "relationship_prior_adjustment": {
                "bottleneck_effect": population_bottleneck_severity,
                "mating_system": mating_system
            }
        },
        "pedigree_construction": {
            "max_pedigree_size": 1000,  # Allow large pedigrees for conservation
            "max_generations": 10,  # Standard generational depth
            "allow_cycles": False,  # No inbreeding cycles in conservation
            "max_spouses_per_individual": max_spouses_per_individual,
            "max_children_per_family": max_children_per_family,
            "enforce_demographic_constraints": True,
            "ungenotyped_ancestor_penalty": 0.5,
            "allow_half_relationships": True,
            "chromosome_count": chromosome_count  # Important for crossover modeling
        },
        "conservation_parameters": {
            "goal": conservation_goal,
            "bottleneck_severity": population_bottleneck_severity,
            "prioritize_genetic_diversity": conservation_goal == "genetic_diversity",
            "prioritize_adaptation": conservation_goal == "adaptation",
            "inbreeding_depression_risk": population_bottleneck_severity > 0.5,
            "recommended_mate_selection_method": (
                "min_kinship" if conservation_goal == "genetic_diversity" 
                else "optimal_contribution" if conservation_goal == "balanced"
                else "trait_preservation"
            ),
            "min_effective_population_size": max(50, int(100 * (1 - population_bottleneck_severity))),
            "genetic_rescue_threshold": 0.25 if population_bottleneck_severity > 0.7 else 0.125
        }
    }

# Test with endangered species examples
endangered_species = [
    {
        "name": "Ailuropoda melanoleuca",
        "common_name": "Giant Panda",
        "genetic_map_dir": "/data/maps/panda",
        "genome_size_mb": 2300,
        "chromosome_count": 21,
        "recombination_rate": 1.05,
        "mating_system": "polygamous",
        "conservation_goal": "genetic_diversity",
        "population_bottleneck_severity": 0.8  # Severe historical bottleneck
    },
    {
        "name": "Panthera leo",
        "common_name": "African Lion",
        "genetic_map_dir": "/data/maps/lion",
        "genome_size_mb": 2400,
        "chromosome_count": 19,
        "recombination_rate": 1.1,
        "mating_system": "polygamous",
        "conservation_goal": "balanced",
        "population_bottleneck_severity": 0.4  # Moderate bottleneck
    }
]

# Configure Bonsai for conservation applications
conservation_configs = {}
for species in endangered_species:
    config = configure_bonsai_for_conservation(
        species_name=species["name"],
        genetic_map_dir=species["genetic_map_dir"],
        genome_size_mb=species["genome_size_mb"],
        chromosome_count=species["chromosome_count"],
        recombination_rate=species["recombination_rate"],
        mating_system=species["mating_system"],
        conservation_goal=species["conservation_goal"],
        population_bottleneck_severity=species["population_bottleneck_severity"]
    )
    conservation_configs[species["name"]] = config

# Display key parameters for conservation applications
def display_conservation_config_comparison(configs, endangered_species):
    """Display a comparison of key conservation configurations."""
    from IPython.display import Markdown, display
    
    # Create a markdown table with key parameters
    markdown_text = "### Conservation-Specific Bonsai Configurations\n\n"
    
    # Add species information
    markdown_text += "#### Species Information\n\n"
    markdown_text += "| Species | Mating System | Genome Size | Chromosome Count | Bottleneck Severity | Conservation Goal |\n"
    markdown_text += "|---------|--------------|-------------|------------------|---------------------|-------------------|\n"
    
    for species in endangered_species:
        name = species["common_name"]
        mating = species["mating_system"].capitalize()
        genome = f"{species['genome_size_mb']} Mb"
        chromosomes = str(species["chromosome_count"])
        bottleneck = f"{species['population_bottleneck_severity'] * 100:.0f}%"
        goal = species["conservation_goal"].replace("_", " ").capitalize()
        
        markdown_text += f"| {name} | {mating} | {genome} | {chromosomes} | {bottleneck} | {goal} |\n"
    
    markdown_text += "\n\n"
    
    # Add relationship inference parameters
    markdown_text += "#### Relationship Inference Thresholds\n\n"
    markdown_text += "| Species | Parent-Child | Full Sibling | First Cousin | Second Cousin |\n"
    markdown_text += "|---------|--------------|--------------|--------------|---------------|\n"
    
    for species in endangered_species:
        name = species["common_name"]
        config = configs[species["name"]]
        thresholds = config["relationship_inference"]["ibd_thresholds"]
        
        pc = f"{thresholds['parent_child']:.1f} cM"
        fs = f"{thresholds['full_sibling']:.1f} cM"
        fc = f"{thresholds['first_cousin']:.1f} cM"
        sc = f"{thresholds['second_cousin']:.1f} cM"
        
        markdown_text += f"| {name} | {pc} | {fs} | {fc} | {sc} |\n"
    
    markdown_text += "\n\n"
    
    # Add conservation-specific parameters
    markdown_text += "#### Conservation-Specific Parameters\n\n"
    markdown_text += "| Species | Mate Selection Method | Min Effective Population | Genetic Rescue Threshold |\n"
    markdown_text += "|---------|------------------------|--------------------------|---------------------------|\n"
    
    for species in endangered_species:
        name = species["common_name"]
        config = configs[species["name"]]
        cons_params = config["conservation_parameters"]
        
        mate_method = cons_params["recommended_mate_selection_method"].replace("_", " ").capitalize()
        min_pop = str(cons_params["min_effective_population_size"])
        rescue = f"{cons_params['genetic_rescue_threshold'] * 100:.1f}%"
        
        markdown_text += f"| {name} | {mate_method} | {min_pop} | {rescue} |\n"
    
    # Display the markdown
    display(Markdown(markdown_text))

# Display the comparison
display_conservation_config_comparison(conservation_configs, endangered_species)

## Part 3: Future Directions and Research Opportunities

### Theory and Background

As computational methods advance and genomic data becomes more abundant, several exciting research directions are emerging in computational pedigree reconstruction. This section explores potentially transformative approaches that could enhance Bonsai v3 and similar systems in the future.

#### Machine Learning Integration

Modern machine learning techniques show promise for improving several aspects of pedigree reconstruction:

1. **Relationship Classification**:
   - Neural networks can learn complex patterns in IBD sharing distributions
   - Deep learning models can integrate multiple sources of evidence
   - Ensemble methods can combine multiple classification approaches

2. **Segment Detection**:
   - Convolutional neural networks (CNNs) can improve IBD segment detection
   - Recurrent neural networks (RNNs) can model sequential aspects of genomes
   - Self-supervised learning can leverage unlabeled genomic data

3. **Pedigree Structure Prediction**:
   - Graph neural networks can learn valid pedigree structures
   - Reinforcement learning can optimize pedigree construction strategies
   - Generative models can propose multiple plausible pedigree hypotheses
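
As a minimal sketch of the ensemble idea listed above, the following example combines the outputs of two relationship classifiers by weighted soft voting. The classifier names, probabilities, and weights are hypothetical values chosen for illustration; they are not part of Bonsai v3.

```python
from typing import Dict, List


def soft_vote(predictions: List[Dict[str, float]], weights: List[float]) -> Dict[str, float]:
    """Combine per-classifier relationship probabilities by weighted averaging."""
    if not predictions or len(predictions) != len(weights):
        raise ValueError("Provide exactly one weight per classifier")

    total_weight = sum(weights)
    labels = set().union(*predictions)  # every relationship label seen by any classifier

    combined = {
        label: sum(w * p.get(label, 0.0) for p, w in zip(predictions, weights)) / total_weight
        for label in labels
    }

    # Renormalize in case a classifier omitted some labels
    norm = sum(combined.values())
    return {label: value / norm for label, value in combined.items()}


# Hypothetical outputs from two classifiers for the same pair of individuals
ibd_length_model = {"half_sibling": 0.6, "first_cousin": 0.3, "distant": 0.1}
segment_count_model = {"half_sibling": 0.2, "first_cousin": 0.7, "distant": 0.1}

# Assumption: the IBD-length model is trusted twice as much as the segment-count model
ensemble = soft_vote([ibd_length_model, segment_count_model], weights=[2.0, 1.0])
best = max(ensemble, key=ensemble.get)
print(best, round(ensemble[best], 3))
```

Soft voting keeps the full probability distribution, so a downstream pedigree search could still consider the second-ranked relationship when the margin is small.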

#### Multi-Modal Data Integration

Future pedigree reconstruction systems will likely integrate evidence from multiple data sources:

1. **Documentary Evidence**:
   - Automated extraction of relationship information from historical records
   - Integration of birth, death, and marriage records 
   - Confidence scoring for documentary evidence

2. **Phenotype Data**:
   - Incorporation of heritable trait information
   - Modeling of complex polygenic traits
   - Integration of medical and phenotypic records

3. **Ancient DNA**:
   - Connecting modern populations to historical samples
   - Temporal modeling across centuries
   - Compensating for DNA degradation and contamination
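
One simple way to picture evidence fusion is a reliability-weighted Bayesian update, sketched below. It assumes each source supplies a likelihood for every candidate relationship and that a reliability weight between 0 and 1 tempers less trustworthy sources. The function name `fuse_evidence`, the weights, and all numbers are illustrative assumptions rather than an existing Bonsai interface.

```python
import math
from typing import Dict


def fuse_evidence(
    prior: Dict[str, float],
    likelihoods: Dict[str, Dict[str, float]],
    reliabilities: Dict[str, float],
) -> Dict[str, float]:
    """Posterior proportional to prior * product(likelihood ** reliability) over sources."""
    log_posterior = {}
    for relationship, prior_prob in prior.items():
        score = math.log(prior_prob)
        for source, table in likelihoods.items():
            score += reliabilities[source] * math.log(table[relationship])
        log_posterior[relationship] = score

    # Subtract the maximum before exponentiating for numerical stability
    max_log = max(log_posterior.values())
    unnormalized = {rel: math.exp(s - max_log) for rel, s in log_posterior.items()}
    total = sum(unnormalized.values())
    return {rel: value / total for rel, value in unnormalized.items()}


# Three relationships with similar expected total IBD, so genetics alone barely separates them
prior = {"half_sibling": 1 / 3, "grandparent": 1 / 3, "avuncular": 1 / 3}
likelihoods = {
    "genetic": {"half_sibling": 0.34, "grandparent": 0.33, "avuncular": 0.33},
    # Hypothetical documentary evidence, e.g. birth records implying a two-generation gap
    "documentary": {"half_sibling": 0.10, "grandparent": 0.70, "avuncular": 0.20},
}
reliabilities = {"genetic": 1.0, "documentary": 0.5}  # assumed confidence scores

posterior = fuse_evidence(prior, likelihoods, reliabilities)
best_relationship = max(posterior, key=posterior.get)
print(best_relationship, {rel: round(p, 3) for rel, p in posterior.items()})
```

Raising a likelihood to a power below 1 flattens it, so a record with low confidence shifts the posterior less than a fully trusted one.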

#### Population-Scale Reconstruction

As genetic testing becomes ubiquitous, computational methods must scale to population-level pedigree reconstruction:

1. **Algorithm Scaling**:
   - Parallelization techniques for billion-scale genetic datasets
   - Approximate algorithms with provable error bounds
   - GPU and cloud acceleration of computationally intensive processes

2. **Privacy Preservation**:
   - Secure multi-party computation for relationship inference
   - Differential privacy approaches for genetic data
   - Homomorphic encryption techniques for sensitive computations

3. **Graph Database Integration**:
   - Specialized storage formats for massive pedigree structures
   - Query optimization for relationship path finding
   - Real-time updates to giant relationship graphs
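
A common first step toward parallel processing is to split the IBD sharing graph into connected components, since individuals with no meaningful IBD link can be reconstructed independently. The sketch below does this with a union-find structure. The function name `ibd_components` and the 20 cM cutoff are assumptions made for this example, not Bonsai defaults.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def ibd_components(pairs: Iterable[Tuple[str, str, float]], min_cm: float = 20.0) -> List[List[str]]:
    """Group individuals into components linked by at least min_cm of shared IBD."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b, shared_cm in pairs:
        root_a, root_b = find(a), find(b)  # registers both individuals
        if shared_cm >= min_cm and root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for person in list(parent):
        groups[find(person)].add(person)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical pairwise totals in cM
pairs = [
    ("A", "B", 1700.0),
    ("B", "C", 850.0),
    ("D", "E", 212.5),
    ("C", "D", 8.0),   # below the cutoff, so it does not join the two families
    ("A", "F", 5.0),
]

components = ibd_components(pairs)
print(components)
```

Each component can then be handed to a separate worker, which is the kind of decomposition that makes distributed reconstruction tractable.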

#### Ethical Considerations

Advances in pedigree reconstruction raise important ethical questions that must be addressed:

1. **Privacy Implications**:
   - Impact of indirect genetic identification
   - Control over unexpected relationship discoveries
   - Rights of biological relatives who haven't consented to testing

2. **Cultural Sensitivity**:
   - Alignment with diverse cultural family structures
   - Respect for cultural practices around genetic relatedness
   - Inclusivity in computational models

3. **Equity and Access**:
   - Ensuring diverse population representation in genetic databases
   - Accessibility of computational tools to underrepresented communities
   - Preventing exploitative uses of relationship inference

In [ ]:
# Visualize future research directions and their connections

import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import networkx as nx
import numpy as np

# Create a graph to represent research directions and their relationships
G = nx.Graph()

# Add nodes for main research areas
main_areas = [
    "Machine Learning\nIntegration",
    "Multi-Modal Data\nIntegration",
    "Population-Scale\nReconstruction",
    "Ethical\nConsiderations",
    "Bonsai v3"
]

# Add nodes for specific research topics
research_topics = {
    "Machine Learning\nIntegration": [
        "Neural Networks for\nRelationship Classification",
        "Deep Learning for\nIBD Detection",
        "Graph Neural Networks\nfor Pedigree Structures",
        "Reinforcement Learning\nfor Optimization"
    ],
    "Multi-Modal Data\nIntegration": [
        "Historical Record\nIntegration",
        "Phenotype Data\nIncorporation",
        "Ancient DNA\nAnalysis",
        "Medical Data\nLinkage"
    ],
    "Population-Scale\nReconstruction": [
        "Distributed\nAlgorithms",
        "Privacy-Preserving\nComputation",
        "Graph Database\nOptimization",
        "Cloud Computing\nArchitectures"
    ],
    "Ethical\nConsiderations": [
        "Privacy Impact\nAssessment",
        "Cultural Sensitivity\nFrameworks",
        "Equity and\nAccess",
        "Informed Consent\nModels"
    ]
}

# Add nodes to the graph
positions = {}
positions["Bonsai v3"] = (0, 0)

# Add main research areas
for i, area in enumerate(main_areas[:-1]):  # Skip Bonsai v3
    angle = i * 2 * np.pi / 4
    x = 10 * np.cos(angle)
    y = 10 * np.sin(angle)
    positions[area] = (x, y)
    G.add_node(area, type="main_area")
    G.add_edge("Bonsai v3", area, type="foundation")

# Add research topics
for area, topics in research_topics.items():
    area_x, area_y = positions[area]
    for i, topic in enumerate(topics):
        # Calculate position around the main area
        angle = (i / len(topics)) * 2 * np.pi / 4 + (main_areas.index(area) * 2 * np.pi / 4)
        radius = 5
        x = area_x + radius * np.cos(angle)
        y = area_y + radius * np.sin(angle)
        positions[topic] = (x, y)
        G.add_node(topic, type="research_topic")
        G.add_edge(area, topic, type="subtopic")

# Add some cross-connections between related topics
cross_connections = [
    ("Neural Networks for\nRelationship Classification", "Privacy-Preserving\nComputation"),
    ("Graph Neural Networks\nfor Pedigree Structures", "Graph Database\nOptimization"),
    ("Historical Record\nIntegration", "Privacy Impact\nAssessment"),
    ("Deep Learning for\nIBD Detection", "Distributed\nAlgorithms"),
    ("Ancient DNA\nAnalysis", "Cultural Sensitivity\nFrameworks"),
    ("Privacy-Preserving\nComputation", "Privacy Impact\nAssessment"),
    ("Phenotype Data\nIncorporation", "Medical Data\nLinkage"),
    ("Reinforcement Learning\nfor Optimization", "Cloud Computing\nArchitectures")
]

for source, target in cross_connections:
    G.add_edge(source, target, type="cross_connection")

# Set up the figure
plt.figure(figsize=(18, 18))

# Collect node sizes and colors based on node type
node_sizes = []
node_colors = []

for node in G.nodes():
    if node == "Bonsai v3":
        node_sizes.append(3000)
        node_colors.append("#1f77b4")  # Blue
    elif G.nodes[node].get("type") == "main_area":
        node_sizes.append(2000)
        node_colors.append("#ff7f0e")  # Orange
    else:  # research_topic
        node_sizes.append(1500)
        node_colors.append("#2ca02c")  # Green

# Draw edges with different styles based on type
foundation_edges = [(u, v) for u, v, d in G.edges(data=True) if d.get("type") == "foundation"]
subtopic_edges = [(u, v) for u, v, d in G.edges(data=True) if d.get("type") == "subtopic"]
cross_edges = [(u, v) for u, v, d in G.edges(data=True) if d.get("type") == "cross_connection"]

nx.draw_networkx_edges(G, positions, edgelist=foundation_edges, width=3, alpha=0.7, edge_color="#1f77b4")
nx.draw_networkx_edges(G, positions, edgelist=subtopic_edges, width=2, alpha=0.5, edge_color="#ff7f0e")
nx.draw_networkx_edges(G, positions, edgelist=cross_edges, width=1.5, alpha=0.3, edge_color="#d62728", 
                       style="dashed")

# Draw nodes
nx.draw_networkx_nodes(G, positions, node_size=node_sizes, node_color=node_colors, alpha=0.8)

# Draw labels with appropriate font sizes
main_labels = {node: node for node in G.nodes() if node == "Bonsai v3" or G.nodes[node].get("type") == "main_area"}
topic_labels = {node: node for node in G.nodes() if G.nodes[node].get("type") == "research_topic"}

nx.draw_networkx_labels(G, positions, labels=main_labels, font_size=14, font_weight="bold")
nx.draw_networkx_labels(G, positions, labels=topic_labels, font_size=10)

# Add a title and legend
plt.title("Future Research Directions in Computational Pedigree Reconstruction", fontsize=20, pad=20)

# Create a custom legend
legend_elements = [
    mpatches.Patch(color="#1f77b4", label="Bonsai v3 Core"),
    mpatches.Patch(color="#ff7f0e", label="Main Research Areas"),
    mpatches.Patch(color="#2ca02c", label="Specific Research Topics"),
    mpatches.Patch(color="#d62728", label="Interdisciplinary Connections", alpha=0.3)
]

plt.legend(handles=legend_elements, loc="lower right", fontsize=12)

# Remove axis
plt.axis("off")
plt.tight_layout()
plt.show()

# Create a simplified taxonomy of future enhancements
def create_ml_application_taxonomy():
    """Create a taxonomy visualization of ML applications in genetic genealogy."""
    # Create a figure
    plt.figure(figsize=(12, 8))
    ax = plt.subplot(111)
    
    # Hide axis
    ax.set_axis_off()
    
    # Define the main categories and subcategories
    categories = {
        "Relationship Inference": [
            "Neural Network Classifiers",
            "Ensemble Methods",
            "Transfer Learning from Model Organisms"
        ],
        "IBD Detection": [
            "CNN-based Segment Detection",
            "Anomaly Detection for Error Identification",
            "Self-supervised Phasing"
        ],
        "Pedigree Construction": [
            "Graph Neural Networks",
            "Reinforcement Learning",
            "Generative Adversarial Networks"
        ],
        "Privacy Protection": [
            "Federated Learning",
            "Differential Privacy",
            "Homomorphic Encryption"
        ]
    }
    
    # Define colors for each main category
    colors = {
        "Relationship Inference": "#3498db",  # Blue
        "IBD Detection": "#e74c3c",  # Red
        "Pedigree Construction": "#2ecc71",  # Green
        "Privacy Protection": "#f39c12"  # Orange
    }
    
    # Position calculations
    y_offset = 0
    y_spacing = 1.5
    y_box_height = 1.2
    x_category = 1
    x_subcategory = 6
    
    # Draw the title
    ax.text(0.5, 0.95, "Machine Learning Applications in Genetic Genealogy", 
            ha="center", va="center", fontsize=16, fontweight="bold", 
            transform=ax.transAxes)
    
    # Draw each category and its subcategories
    for i, (category, subcategories) in enumerate(categories.items()):
        # Calculate y position
        y_pos = y_offset + i * (len(subcategories) + 1) * y_spacing
        
        # Draw category box
        rect = mpatches.Rectangle((0.5, y_pos - 0.5), 4, y_box_height, 
                                 ec="black", fc=colors[category], alpha=0.7)
        ax.add_patch(rect)
        
        # Add category text
        ax.text(x_category, y_pos, category, 
                ha="left", va="center", fontweight="bold", fontsize=12)
        
        # Draw subcategories
        for j, subcategory in enumerate(subcategories):
            # Calculate subcategory y position
            sub_y = y_pos + (j + 1) * y_spacing
            
            # Draw an arrow from the right edge of the category box (x=4.5)
            # to the left edge of the subcategory box (x=5.5)
            ax.arrow(4.5, y_pos, 1, sub_y - y_pos, 
                    head_width=0.2, head_length=0.2, fc='black', ec='black',
                    length_includes_head=True)
            
            # Draw subcategory box
            sub_rect = mpatches.Rectangle((5.5, sub_y - 0.4), 8, 0.8, 
                                        ec="black", fc=colors[category], alpha=0.3)
            ax.add_patch(sub_rect)
            
            # Add subcategory text
            ax.text(x_subcategory, sub_y, subcategory, 
                   ha="left", va="center", fontsize=10)
    
    # Set limits
    ax.set_xlim(0, 14)
    ax.set_ylim(-1, y_offset + len(categories) * 5 * y_spacing)
    
    plt.tight_layout()
    plt.show()

# Generate the ML application taxonomy
create_ml_application_taxonomy()

### Exercise 3: Design a Future Enhancement

In this exercise, you'll design a future enhancement for the Bonsai v3 system based on emerging technologies.

**Task:** Propose a specific machine learning or multi-modal data integration enhancement for Bonsai, including:

1. A clear description of the enhancement
2. The problem it solves or opportunity it addresses
3. A high-level architecture or algorithm description
4. Expected benefits and potential challenges

**Hint:** Consider how techniques like deep learning, federated learning, or document analysis could enhance specific aspects of pedigree reconstruction.

In [ ]:
# Example solution for Exercise 3

proposal = BonsaiEnhancementProposal()
proposal.document(
    title="Federated Multi-Modal Pedigree Enhancement (FedMPE)",
    description="""
    A privacy-preserving framework for enhancing pedigree reconstruction by integrating multiple data modalities (genetic, documentary, and phenotypic) across distributed databases without central data sharing. The framework uses federated learning to train relationship classification and pedigree optimization models while keeping sensitive data local to each participating organization.
    """,
    
    problem="""
    Current pedigree reconstruction approaches face three critical limitations:
    
    1. **Privacy Constraints**: Genetic and genealogical data is highly sensitive, limiting data sharing across organizations and regions, which fragments available evidence.
    
    2. **Modal Isolation**: Genetic data (IBD segments), documentary evidence (birth/marriage/death records), and phenotypic data (heritable traits) are typically stored in different systems with different access controls.
    
    3. **Confidence Assessment**: Determining the reliability of reconstructed relationships requires reconciling evidence from multiple sources with varying levels of trustworthiness.
    
    These limitations reduce the accuracy, completeness, and utility of reconstructed pedigrees, particularly for individuals with ancestry spanning multiple regions or data sources.
    """,
    
    algorithm="""
    FedMPE integrates three key components:
    
    1. **Federated Learning Architecture**:
       - Client-side models at each participating organization (genetic testing companies, genealogical databases, research institutions)
       - Secure aggregation protocol for model updates without raw data sharing
       - Differential privacy mechanisms to prevent membership inference attacks
    
    2. **Multi-Modal Integration Framework**:
       - Local fusion models that combine genetic, documentary, and phenotypic evidence
       - Confidence scoring mechanisms for each evidence type
       - Weighted integration based on evidence quality and consistency
       
    3. **Distributed Pedigree Optimization**:
       - Graph neural network for structural prediction trained via federated learning
       - Consensus protocols for resolving conflicting pedigree hypotheses
       - Privacy-preserving record linkage for identifying the same individuals across databases
    
    The algorithm flow is as follows:
    
    ```
    1. Initialize global model structure shared with all participants
    2. For each training round:
       a. Each participant trains local models on their private data
       b. Local models generate encrypted gradient updates
       c. Secure aggregation server combines updates without seeing raw data
       d. Updated global model is distributed to participants
    3. For inference:
       a. Users submit queries with consent-based access controls
       b. Federated inference across relevant participant databases
       c. Results combined with confidence scores and provenance tracking
    ```
    """,
    
    benefits=[
        "Privacy Preservation: Organizations can contribute to improved pedigree reconstruction without compromising sensitive data",
        "Enhanced Accuracy: Integration of multiple evidence types leads to more accurate relationship inferences",
        "Improved Coverage: Access to broader data sources while respecting privacy and regional regulations",
        "Confidence Metrics: Clear provenance and reliability measures for each inferred relationship",
        "Scalability: Distributed computation enables processing of much larger combined datasets",
        "Ethical Compliance: Respects individual consent and data sovereignty principles"
    ],
    
    challenges=[
        "Technical Complexity: Requires sophisticated federated learning infrastructure and secure computation methods",
        "Standardization: Participants need common data schemas and evidence evaluation metrics",
        "Computational Overhead: Secure computation and privacy mechanisms add significant processing requirements",
        "Cold Start Problem: Initial models need sufficient training data before producing reliable results",
        "Regulatory Alignment: Must comply with diverse and evolving privacy regulations across jurisdictions",
        "Incentive Alignment: Ensuring all participants benefit proportionally to their contributions"
    ],
    
    implementation="""
    The implementation strategy follows a phased approach:
    
    **Phase 1: Prototype and Validation (12 months)**
    - Develop core federated learning architecture with synthetic data
    - Implement and test multi-modal fusion models for genetic and documentary evidence
    - Create evaluation framework with metrics for accuracy, privacy, and utility
    - Conduct validation studies with 3-5 partner organizations using limited datasets
    
    **Phase 2: Limited Deployment (18 months)**
    - Expand to 10-15 participating organizations with diverse data types
    - Implement production-grade security and privacy controls
    - Develop user interfaces for querying and visualizing results
    - Create governance framework for participant coordination
    
    **Phase 3: Full-Scale Implementation (24+ months)**
    - Scale to 50+ global participants across multiple jurisdictions
    - Implement advanced features (automated documentary evidence extraction, etc.)
    - Develop specialized applications for medical research, forensics, etc.
    - Establish standard protocols for new participant onboarding
    
    The implementation would require an interdisciplinary team with expertise in:
    - Federated learning and secure multi-party computation
    - Genetic genealogy and relationship inference
    - Privacy engineering and differential privacy
    - Distributed systems and high-performance computing
    """
)

# Display the proposal
proposal.display()

# Create a visualization of the FedMPE architecture
plt.figure(figsize=(14, 10))

# Create a circle of participants around a central aggregation server
num_participants = 6
participant_radius = 4
participant_positions = {}

# Central aggregation server
plt.plot(0, 0, 'o', markersize=20, color='#3498db', alpha=0.8)
plt.text(0, 0, "Secure\nAggregation\nServer", ha='center', va='center', fontsize=12, fontweight='bold', color='white')

# Draw participants
for i in range(num_participants):
    angle = i * 2 * np.pi / num_participants
    x = participant_radius * np.cos(angle)
    y = participant_radius * np.sin(angle)
    
    # Store position for later use
    participant_positions[i] = (x, y)
    
    # Determine participant type
    if i % 3 == 0:
        participant_type = "Genetic\nTesting\nCompany"
        color = '#e74c3c'  # Red
    elif i % 3 == 1:
        participant_type = "Genealogical\nDatabase"
        color = '#2ecc71'  # Green
    else:
        participant_type = "Research\nInstitution"
        color = '#f39c12'  # Orange
    
    # Draw participant
    plt.plot(x, y, 'o', markersize=18, color=color, alpha=0.8)
    plt.text(x, y, participant_type, ha='center', va='center', fontsize=10, fontweight='bold', color='white')
    
    # Draw connection to central server
    plt.arrow(x * 0.8, y * 0.8, -x * 0.6, -y * 0.6, 
              head_width=0.2, head_length=0.2, fc=color, ec=color, alpha=0.6,
              length_includes_head=True)
    
    # Draw connection from central server to participant
    plt.arrow(x * 0.2, y * 0.2, x * 0.4, y * 0.4,
              head_width=0.2, head_length=0.2, fc='#3498db', ec='#3498db', alpha=0.6,
              length_includes_head=True)

# Draw participant data repositories
for i, (x, y) in participant_positions.items():
    # Data repositories
    data_x = x + 1.2 * np.cos(i * 2 * np.pi / num_participants)
    data_y = y + 1.2 * np.sin(i * 2 * np.pi / num_participants)
    
    if i % 3 == 0:
        data_type = "Genetic\nData"
        color = '#e74c3c'  # Red
    elif i % 3 == 1:
        data_type = "Documentary\nRecords"
        color = '#2ecc71'  # Green
    else:
        data_type = "Research\nData"
        color = '#f39c12'  # Orange
    
    plt.plot(data_x, data_y, 's', markersize=14, color=color, alpha=0.4)
    plt.text(data_x, data_y, data_type, ha='center', va='center', fontsize=8)
    
    # Connect data to participant
    plt.plot([x, data_x], [y, data_y], '-', color=color, alpha=0.4, linewidth=2)

# Draw the user query
user_x, user_y = -6, -6
plt.plot(user_x, user_y, 'o', markersize=16, color='#9b59b6', alpha=0.8)  # Purple
plt.text(user_x, user_y, "User\nQuery", ha='center', va='center', fontsize=10, fontweight='bold', color='white')

# Connect user to server
plt.arrow(user_x + 0.7, user_y + 0.7, -user_x * 0.5, -user_y * 0.5,
          head_width=0.2, head_length=0.2, fc='#9b59b6', ec='#9b59b6', alpha=0.6,
          length_includes_head=True)

# Draw result back to user
plt.arrow(-user_x * 0.3, -user_y * 0.3, user_x * 0.6 + 0.7, user_y * 0.6 + 0.7,
          head_width=0.2, head_length=0.2, fc='#3498db', ec='#3498db', alpha=0.6,
          length_includes_head=True)

# Draw result pedigree near user
result_x, result_y = user_x + 2, user_y + 1
plt.plot(result_x, result_y, 's', markersize=18, color='#3498db', alpha=0.4)
plt.text(result_x, result_y, "Integrated\nPedigree\nResult", ha='center', va='center', fontsize=9)

# Connect result to user
plt.plot([user_x, result_x], [user_y, result_y], '--', color='#3498db', alpha=0.6, linewidth=1.5)

# Add titles and annotations
plt.title("Federated Multi-Modal Pedigree Enhancement (FedMPE) Architecture", fontsize=16, pad=20)
plt.text(0, -7, "Data remains local to each organization\nOnly model updates are securely shared", 
         ha='center', fontsize=12, bbox=dict(facecolor='lightyellow', alpha=0.5, boxstyle="round,pad=0.5"))

# Create legend
legend_elements = [
    mpatches.Patch(color='#3498db', label='Global Model & Aggregation'),
    mpatches.Patch(color='#e74c3c', label='Genetic Testing Companies'),
    mpatches.Patch(color='#2ecc71', label='Genealogical Databases'),
    mpatches.Patch(color='#f39c12', label='Research Institutions'),
    mpatches.Patch(color='#9b59b6', label='User Interface')
]
plt.legend(handles=legend_elements, loc='upper right', fontsize=10)

# Remove axis
plt.axis('off')
plt.tight_layout()
plt.show()
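
The secure aggregation step in the FedMPE training round (steps 2a to 2d of the proposal) can be illustrated with pairwise additive masking. In this toy sketch every pair of participants shares a random mask that one adds and the other subtracts, so the server only sees masked updates, yet their average equals the true average. It is a simplification under stated assumptions: the function names are hypothetical, the masks come from one shared random generator instead of pairwise key agreement, and participant dropout is not handled.

```python
import random
from typing import List


def masked_updates(updates: List[List[float]], seed: int = 0) -> List[List[float]]:
    """Hide each participant's update behind pairwise masks that cancel in the sum."""
    rng = random.Random(seed)
    n, dim = len(updates), len(updates[0])
    masked = [list(update) for update in updates]

    for i in range(n):
        for j in range(i + 1, n):
            # Mask shared by participants i and j: i adds it, j subtracts it
            mask = [rng.uniform(-100.0, 100.0) for _ in range(dim)]
            for k in range(dim):
                masked[i][k] += mask[k]
                masked[j][k] -= mask[k]
    return masked


def aggregate(masked: List[List[float]]) -> List[float]:
    """Server-side averaging over masked updates (federated averaging step)."""
    n = len(masked)
    return [sum(column) / n for column in zip(*masked)]


# Hypothetical two-parameter gradient updates from three organizations
local_updates = [[0.10, -0.20], [0.30, 0.00], [0.20, 0.50]]

masked = masked_updates(local_updates)
global_update = aggregate(masked)
print([round(value, 6) for value in global_update])
```

Because the masks cancel only in the sum, an individual masked update reveals little about the underlying gradients, which is the property the "Data remains local" annotation in the diagram relies on.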

## Summary

In this final lab, we've explored advanced applications and future directions for the Bonsai v3 pedigree reconstruction system. We've covered several key areas:

1. **Specialized Population Applications**:
   - Adapting Bonsai for founder populations with high endogamy
   - Customizing IBD thresholds and relationship models for specific scenarios
   - Using demographic context to refine relationship inferences

2. **Conservation Genetics Applications**:
   - Extending Bonsai for non-human species
   - Scaling genetic parameters based on species characteristics
   - Accounting for different mating systems and breeding patterns

3. **Future Research Directions**:
   - Machine learning integration for improved relationship classification
   - Multi-modal data fusion across genetic, documentary, and phenotypic sources
   - Population-scale reconstruction techniques
   - Privacy-preserving approaches like federated learning

Through exercises, you've practiced adapting Bonsai for specialized applications and designing future enhancements. These adaptations demonstrate the flexibility of the core Bonsai architecture and the potential for extending it to new domains.

### Connections to Other Labs

The concepts covered in this lab connect to:
- **Lab 5: Statistical Models**: Foundation for adapting relationship models to different populations
- **Lab 8: Age-Based Relationship Modeling**: Demographic integration for specialized applications
- **Lab 14: Optimizing Pedigrees**: Core algorithms that can be enhanced with ML techniques
- **Lab 20: Error Handling**: Critical for robust specialized applications
- **Lab 28: Integration Tools**: Interfaces for connecting with other data sources

### Further Reading

To deepen your understanding of advanced applications and future directions in computational pedigree reconstruction:

- Browning, B. L., & Browning, S. R. (2021). "Haplotype phasing and imputation with deep learning techniques." *Nature Reviews Genetics*, 22(10), 594-607.
- Kaplanis, J., et al. (2018). "Quantitative analysis of population-scale family trees with millions of relatives." *Science*, 360(6385), 171-175.
- Bellet, A., Guerraoui, R., & Hendrikx, H. (2022). "Federated learning with personalization layers." *Communications of the ACM*, 65(7), 50-57.
- Field, Y., et al. (2016). "Detection of human adaptation during the past 2000 years." *Science*, 354(6313), 760-764.
- Rosenberg, N. A., & Edge, M. D. (2021). "Genetic relatedness of indigenous peoples in North America." *Science Advances*, 7(35).

## Course Reflection

Congratulations on reaching the final lab in the "Computational Pedigree Reconstruction: Deep Dive into Bonsai v3" course! Over the past 30 labs, you've gained a comprehensive understanding of computational genetic genealogy, from foundational concepts to advanced applications.

### Key Learning Journey

Throughout this course, you've:

1. **Built a Strong Foundation**:
   - Explored the fundamentals of IBD and genetic inheritance
   - Learned how to process and interpret genetic data
   - Understood the statistical models underlying relationship inference

2. **Mastered Core Bonsai v3 Components**:
   - Implemented pairwise likelihood calculations
   - Built and manipulated pedigree data structures
   - Developed algorithms for optimizing and merging pedigrees
   - Created visualization techniques for complex family structures

3. **Applied Knowledge to Real Problems**:
   - Handled challenging cases like twins and complex relationships
   - Worked with real-world datasets
   - Optimized performance for large-scale applications
   - Integrated different data sources and tools

4. **Explored Advanced Applications**:
   - Adapted methods for founder populations
   - Extended techniques to non-human applications
   - Considered ethical implications and privacy concerns
   - Investigated cutting-edge research directions

### Continuing Your Journey

As you move forward from this course, there are many ways to build on what you've learned:

- **Contribute to Open Source**: Many genetic genealogy projects welcome contributors
- **Explore Research Opportunities**: This field offers rich research possibilities at the intersection of genetics, computer science, and genealogy
- **Build Applications**: Apply these techniques to create tools for specific use cases
- **Join Communities**: Connect with others working in genetic genealogy for continuing education and collaboration

Thank you for your dedication and engagement throughout this course. We hope you'll continue exploring the fascinating world of computational genetic genealogy!

## Self-Assessment Questions

Test your understanding of advanced applications and future directions:

1. What adjustments are needed when adapting Bonsai for a founder population, and why?

2. How would you modify IBD thresholds when applying Bonsai to a non-human species with a genome half the size of the human genome?

3. What are the three key components of a federated learning approach to pedigree reconstruction, and what privacy benefits does it provide?

4. How might graph neural networks improve the pedigree construction process compared to current algorithmic approaches?

5. What ethical considerations should be addressed when implementing advanced pedigree reconstruction systems with increased power and scope?

*Answers to self-assessment questions can be found at the end of the lab document.*

In [ ]:
# Optional: Convert this notebook to PDF
# Uncomment and run this cell if you want to generate a PDF version

# !jupyter nbconvert --to pdf Lab30_Advanced_Applications.ipynb

## Answer Key (for instructors)

### Exercise 1 Solution
```python
def configure_bonsai_for_founder_population(
    population_name,
    estimated_endogamy_factor=1.5,
    generations_isolated=8,
    background_ibd_shift=20,
    min_confidence_threshold=0.7
):
    """Configure Bonsai for a founder population."""
    # Calculate isolation effect
    isolation_effect = min(0.9, estimated_endogamy_factor * (1 - 0.9**generations_isolated))
    
    # Adjust prior probabilities for relationship types
    standard_priors = {
        'parent_child': 0.01,
        'full_sibling': 0.01,
        'half_sibling': 0.02,
        'grandparent': 0.02,
        'avuncular': 0.03,
        'first_cousin': 0.05,
        'first_cousin_once_removed': 0.1,
        'second_cousin': 0.2,
        'distant': 0.56
    }
    
    founder_priors = {}
    for rel_type, prior in standard_priors.items():
        if rel_type == 'distant':
            # Reduce probability of truly distant relationships
            founder_priors[rel_type] = max(0.05, prior * (1 - isolation_effect))
        else:
            # Increase probability of close and moderately distant relationships
            adjustment = isolation_effect * (0.5 if 'cousin' in rel_type else 0.3)
            founder_priors[rel_type] = min(0.25, prior * (1 + adjustment))
    
    # Renormalize to ensure priors sum to 1
    total = sum(founder_priors.values())
    for rel_type in founder_priors:
        founder_priors[rel_type] /= total
    
    # Return configuration
    return {
        "population_name": population_name,
        "relationship_priors": founder_priors,
        "ibd_expectations": {
            # Expected IBD sharing would be increased for distant relationships
        },
        "confidence_thresholds": {
            # Higher thresholds for declaring relationship confidence
        },
        "pedigree_parameters": {
            # Configuration for pedigree construction with endogamy handling
        }
    }
```
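
To make the isolation-effect term in the solution above concrete, the following sketch evaluates the same formula, `min(0.9, factor * (1 - 0.9**generations))`, for a few isolation lengths. The helper name `isolation_effect` is introduced here only for illustration.

```python
def isolation_effect(endogamy_factor: float, generations_isolated: int) -> float:
    """Same expression as in configure_bonsai_for_founder_population, capped at 0.9."""
    return min(0.9, endogamy_factor * (1 - 0.9 ** generations_isolated))


for generations in (2, 8, 20):
    effect = isolation_effect(1.5, generations)
    # Unnormalized prior for 'distant' relationships, starting from the standard 0.56
    distant_prior = max(0.05, 0.56 * (1 - effect))
    print(f"{generations:>2} generations: effect={effect:.3f}, distant prior={distant_prior:.3f}")
```

With the default endogamy factor of 1.5, the effect grows from about 0.285 after 2 generations to about 0.854 after 8, and it reaches the 0.9 cap by 20 generations. The cap prevents the prior for distant relationships from collapsing to zero before renormalization.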

### Exercise 2 Solution
```python
def configure_bonsai_for_conservation(
    species_name,
    genetic_map_dir,
    genome_size_mb,
    chromosome_count,
    recombination_rate,
    mating_system,
    conservation_goal="genetic_diversity",
    population_bottleneck_severity=0.0
):
    """Configure Bonsai for a conservation genetics application."""
    # Calculate total genetic map length
    total_genetic_length = genome_size_mb * recombination_rate
    
    # Scaling factor relative to human genome
    scaling_factor = total_genetic_length / 3000.0
    
    # Calculate IBD thresholds adapted to the species
    ibd_thresholds = {
        'parent_child': 3400 * scaling_factor,
        'full_sibling': 2550 * scaling_factor,
        'half_sibling': 1700 * scaling_factor,
        'grandparent': 1700 * scaling_factor,
        'avuncular': 1700 * scaling_factor,
        'first_cousin': 850 * scaling_factor,
        'first_cousin_once_removed': 425 * scaling_factor,
        'second_cousin': 212.5 * scaling_factor,
        'distant': max(50 * scaling_factor, 20)  # Set minimum threshold
    }
    
    # Return configuration
    return {
        "species": {
            "name": species_name,
            "genome_size_mb": genome_size_mb,
            "chromosome_count": chromosome_count,
            "recombination_rate": recombination_rate,
            "mating_system": mating_system,
            "total_genetic_length_cm": total_genetic_length
        },
        "genetic_map": {
            "directory": genetic_map_dir,
            "format": "standard",
            "scaling_factor": scaling_factor
        },
        "ibd_detection": {
            # Parameters for IBD detection adapted to species
        },
        "relationship_inference": {
            "ibd_thresholds": ibd_thresholds,
            # Additional parameters
        },
        "conservation_parameters": {
            "goal": conservation_goal,
            "bottleneck_severity": population_bottleneck_severity,
            # Additional parameters
        }
    }
```

### Exercise 3 Solution
A good solution should include:
1. Clear description of a specific enhancement (e.g., machine learning model, multi-modal integration)
2. Explanation of the problem it addresses
3. Architecture description with key components
4. Benefits and challenges assessment
5. Implementation approach

### Self-Assessment Answers

1. **Adapting Bonsai for founder populations**:
   - Increase expected IBD sharing thresholds for distant relationships
   - Adjust prior probabilities to account for higher likelihood of close relationships
   - Add endogamy handling in pedigree construction algorithms
   - Increase confidence thresholds for declaring relationships
   - These adaptations are needed because founder populations have higher background relatedness, making distant relationships appear closer than in outbred populations.
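
The first two adaptations above can be sketched as follows. This is a minimal illustration, not Bonsai v3's API: the function names, the `background_ibd_cm` estimate of IBD shared by nominally unrelated pairs, and the `close_boost` multiplier are all hypothetical.

```python
# Hypothetical relationship groupings, for illustration only
DISTANT_RELATIONSHIPS = {"first_cousin_once_removed", "second_cousin", "distant"}
CLOSE_RELATIONSHIPS = {"parent_child", "full_sibling", "half_sibling", "first_cousin"}


def raise_distant_thresholds(thresholds, background_ibd_cm):
    """Add the population's background IBD (in cM) to distant-relationship thresholds."""
    return {
        rel: cm + background_ibd_cm if rel in DISTANT_RELATIONSHIPS else cm
        for rel, cm in thresholds.items()
    }


def reweight_priors(priors, close_boost):
    """Upweight close relationships, then renormalize so the priors sum to 1."""
    weighted = {
        rel: p * (close_boost if rel in CLOSE_RELATIONSHIPS else 1.0)
        for rel, p in priors.items()
    }
    total = sum(weighted.values())
    return {rel: w / total for rel, w in weighted.items()}


# Example with made-up numbers: 30 cM of background sharing
outbred = {"first_cousin": 850.0, "second_cousin": 212.5, "distant": 50.0}
founder = raise_distant_thresholds(outbred, background_ibd_cm=30.0)
priors = reweight_priors({"full_sibling": 0.1, "unrelated": 0.9}, close_boost=2.0)
print(founder)
print(priors)
```

In practice the background level would have to be estimated from the population itself, for example from pairs with no documented relationship.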

2. **Modifying IBD thresholds for a non-human species**:
   - Scale IBD thresholds by a factor of approximately 0.5 (proportional to genome size)
   - Adjust for differences in recombination rate
   - Consider mating system structure in relationship priors
   - This scaling is needed because expected IBD sharing (in cM) is proportional to the total genetic map length, which depends on both genome size and recombination rate.
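
As a worked example of why both genome size and recombination rate matter, the sketch below compares two hypothetical species with the same physical genome size but different recombination rates. It assumes a human reference map length of 3400 cM (the parent-child value used in the Exercise 2 solution); the species values are invented for illustration.

```python
HUMAN_MAP_LENGTH_CM = 3400.0  # assumed human reference, in cM


def species_scaling_factor(genome_size_mb, recombination_rate_cm_per_mb):
    """Ratio of the species' genetic map length to the human reference."""
    species_map_length_cm = genome_size_mb * recombination_rate_cm_per_mb
    return species_map_length_cm / HUMAN_MAP_LENGTH_CM


# Two hypothetical species, both with a 1500 Mb genome
low_recombination = species_scaling_factor(1500, 1.0)   # 1500 cM map
high_recombination = species_scaling_factor(1500, 2.0)  # 3000 cM map

# Scale the human first-cousin threshold (850 cM) for each species
for label, factor in [("low", low_recombination), ("high", high_recombination)]:
    print(f"{label} recombination: factor={factor:.3f}, "
          f"first_cousin={850 * factor:.1f} cM")
```

The same physical genome size yields thresholds that differ by a factor of two, which is why scaling by genome size alone is only a first approximation.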

3. **Federated learning components for pedigree reconstruction**:
   - Local models at each participating organization
   - Secure aggregation protocol for model updates
   - Privacy-preserving inference mechanism
   - Privacy benefits include keeping sensitive data within its organization of origin, minimizing risk of data breaches, and allowing participation without central data sharing.
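
The interaction between local models and aggregation can be sketched with plain federated averaging. This is a toy illustration on synthetic data, assuming a simple linear model. The unencrypted averaging step stands in for a real secure aggregation protocol, which this sketch does not implement.

```python
import numpy as np


def local_update(weights, features, labels, lr=0.1):
    """One least-squares gradient step using only a site's private data."""
    predictions = features @ weights
    gradient = features.T @ (predictions - labels) / len(labels)
    return weights - lr * gradient


def federated_average(site_weights, site_sizes):
    """Average site models, weighting each by its number of samples."""
    sizes = np.asarray(site_sizes, dtype=float)
    stacked = np.stack(site_weights)
    return (stacked * sizes[:, None]).sum(axis=0) / sizes.sum()


# Synthetic data for three hypothetical organizations
rng = np.random.default_rng(0)
true_weights = np.array([2.0, -1.0])
sites = []
for n_samples in (50, 120, 80):
    X = rng.normal(size=(n_samples, 2))
    y = X @ true_weights + rng.normal(scale=0.1, size=n_samples)
    sites.append((X, y))

# Each round: sites train locally, only model weights leave the site
global_weights = np.zeros(2)
for _ in range(200):
    updates = [local_update(global_weights, X, y) for X, y in sites]
    global_weights = federated_average(updates, [len(y) for _, y in sites])

print(global_weights)
```

Only the weight vectors are shared between sites and the aggregator; the raw `X` and `y` arrays never leave the loop body that represents each organization.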

4. **Graph neural network improvements for pedigree construction**:
   - Learn optimal pedigree structures from training data
   - Better handle ambiguous relationship evidence
   - Incorporate both genetic and non-genetic information
   - Generalize across different population structures
   - Current algorithmic approaches rely on hard-coded rules and heuristics, whereas GNNs can learn relationship patterns directly from data.
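
To make the GNN idea concrete, the sketch below implements a single mean-aggregation message passing step in NumPy, where individuals are nodes and edges are weighted by pairwise IBD. The feature choices, edge weights, and random (untrained) weight matrices are assumptions for illustration; a real model would be trained in a deep learning framework.

```python
import numpy as np


def message_passing_layer(node_features, ibd_adjacency, weight_self, weight_neighbor):
    """Combine each node's features with the IBD-weighted mean of its neighbors."""
    degree = ibd_adjacency.sum(axis=1, keepdims=True)
    degree[degree == 0] = 1.0  # isolated nodes: avoid division by zero
    neighbor_mean = (ibd_adjacency @ node_features) / degree
    combined = node_features @ weight_self + neighbor_mean @ weight_neighbor
    return np.maximum(combined, 0.0)  # ReLU activation


# Three hypothetical individuals with two features each (scaled age, sex)
node_features = np.array([[0.6, 1.0], [0.3, 0.0], [0.3, 1.0]])

# Symmetric edge weights: fraction of the genome shared IBD (made-up values)
ibd_adjacency = np.array([
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.25],
    [0.5, 0.25, 0.0],
])

rng = np.random.default_rng(42)
weight_self = rng.normal(size=(2, 4))      # untrained, random weights
weight_neighbor = rng.normal(size=(2, 4))  # untrained, random weights

embeddings = message_passing_layer(
    node_features, ibd_adjacency, weight_self, weight_neighbor
)
print(embeddings.shape)
```

Stacking several such layers lets each individual's embedding reflect evidence from relatives several steps away, which is how a trained model could resolve ambiguous pairwise evidence.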

5. **Ethical considerations for advanced pedigree reconstruction**:
   - Privacy protection and consent mechanisms
   - Potential for unexpected relationship discoveries
   - Diverse cultural perspectives on family structures
   - Equitable access and representation in databases
   - Prevention of misuse for discrimination or surveillance
   - Protecting vulnerable populations and indigenous groups