# Lab 24: Complex Relationship Patterns

## Overview

This notebook explores Bonsai v3's handling of complex relationship patterns through the `relationships.py` module. We'll examine how Bonsai represents, calculates, and analyzes non-standard and compound genealogical connections that go beyond simple parent-child or sibling relationships.

**Learning Objectives:**
- Understand Bonsai v3's relationship representation system and tuple format
- Learn how to convert between different relationship representations
- Explore the implementation of compound relationship handling
- Analyze relationship distance computation in complex pedigrees
- Apply these techniques to real-world genetic genealogy scenarios

**Prerequisites:**
- Completion of Lab 9: Pedigree Data Structures
- Completion of Lab 12: Relationship Assessment
- Familiarity with basic relationship types and genetic inheritance patterns

**Estimated completion time:** 60-90 minutes

In [None]:
# Standard imports
import os
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import networkx as nx
from IPython.display import display, HTML, Markdown
import inspect
import importlib

sys.path.append(os.path.dirname(os.getcwd()))

# Cross-compatibility setup
from scripts_support.lab_cross_compatibility import setup_environment, is_jupyterlite, save_results, save_plot

# Set up environment-specific paths
DATA_DIR, RESULTS_DIR = setup_environment()

# Set visualization styles
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_context("notebook")
sns.set_palette("colorblind")  # Improve accessibility with colorblind-friendly palette

# Configure plot defaults for better readability
plt.rcParams.update({
    'figure.figsize': (10, 6),
    'font.size': 12,
    'axes.labelsize': 12,
    'axes.titlesize': 14,
    'xtick.labelsize': 10,
    'ytick.labelsize': 10
})

In [None]:
# Setup Bonsai module paths
if not is_jupyterlite():
    # In local environment, add the utils directory to system path
    utils_dir = os.getenv('PROJECT_UTILS_DIR', os.path.join(os.path.dirname(DATA_DIR), 'utils'))
    bonsaitree_dir = os.path.join(utils_dir, 'bonsaitree')
    
    # Add to path if it exists and isn't already there
    if os.path.exists(bonsaitree_dir) and bonsaitree_dir not in sys.path:
        sys.path.append(bonsaitree_dir)
        print(f"Added {bonsaitree_dir} to sys.path")
else:
    # In JupyterLite, use a simplified approach
    print("⚠️ Running in JupyterLite: Some Bonsai functionality may be limited.")
    print("This notebook is primarily designed for local execution where the Bonsai codebase is available.")

In [None]:
# Helper functions for exploring modules
def display_module_classes(module_name):
    """Display classes and their docstrings from a module"""
    try:
        # Import the module
        module = importlib.import_module(module_name)
        
        # Find all classes
        classes = inspect.getmembers(module, inspect.isclass)
        
        # Filter classes defined in this module (not imported)
        classes = [(name, cls) for name, cls in classes if cls.__module__ == module_name]
        
        if not classes:
            print(f"No classes found in module {module_name}")
            return
            
        # Print info for each class
        for name, cls in classes:
            display(Markdown(f"### Class: {name}"))
            
            # Get docstring
            doc = inspect.getdoc(cls)
            if doc:
                display(Markdown(f"**Documentation:**\n{doc}"))
            else:
                display(Markdown("*No documentation available*"))
            
            # Get methods
            methods = inspect.getmembers(cls, inspect.isfunction)
            public_methods = [(method_name, method) for method_name, method in methods 
                             if not method_name.startswith('_')]
            
            if public_methods:
                display(Markdown("**Public Methods:**"))
                for method_name, method in public_methods:
                    sig = inspect.signature(method)
                    display(Markdown(f"- `{method_name}{sig}`"))
            else:
                display(Markdown("*No public methods*"))
            
            display(Markdown("---"))
    except ImportError as e:
        print(f"Error importing module {module_name}: {e}")
    except Exception as e:
        print(f"Error processing module {module_name}: {e}")

def display_module_functions(module_name):
    """Display functions and their docstrings from a module"""
    try:
        # Import the module
        module = importlib.import_module(module_name)
        
        # Find all functions
        functions = inspect.getmembers(module, inspect.isfunction)
        
        # Filter functions defined in this module (not imported)
        functions = [(name, func) for name, func in functions if func.__module__ == module_name]
        
        if not functions:
            print(f"No functions found in module {module_name}")
            return
            
        # Filter public functions
        public_functions = [(name, func) for name, func in functions if not name.startswith('_')]
        
        if not public_functions:
            print(f"No public functions found in module {module_name}")
            return
            
        # Print info for each function
        for name, func in public_functions:                
            display(Markdown(f"### Function: {name}"))
            
            # Get signature
            sig = inspect.signature(func)
            display(Markdown(f"**Signature:** `{name}{sig}`"))
            
            # Get docstring
            doc = inspect.getdoc(func)
            if doc:
                display(Markdown(f"**Documentation:**\n{doc}"))
            else:
                display(Markdown("*No documentation available*"))
                
            display(Markdown("---"))
    except ImportError as e:
        print(f"Error importing module {module_name}: {e}")
    except Exception as e:
        print(f"Error processing module {module_name}: {e}")

def view_function_source(module_name, function_name):
    """Display the source code of a function"""
    try:
        # Import the module
        module = importlib.import_module(module_name)
        
        # Get the function
        func = getattr(module, function_name)
        
        # Get the source code
        source = inspect.getsource(func)
        
        # Print the source code with syntax highlighting
        display(Markdown(f"### Source code for `{function_name}`\n```python\n{source}\n```"))
    except ImportError as e:
        print(f"Error importing module {module_name}: {e}")
    except AttributeError:
        print(f"Function {function_name} not found in module {module_name}")
    except Exception as e:
        print(f"Error processing function {function_name}: {e}")

def view_class_source(module_name, class_name):
    """Display the source code of a class"""
    try:
        # Import the module
        module = importlib.import_module(module_name)
        
        # Get the class
        cls = getattr(module, class_name)
        
        # Get the source code
        source = inspect.getsource(cls)
        
        # Print the source code with syntax highlighting
        display(Markdown(f"### Source code for class `{class_name}`\n```python\n{source}\n```"))
    except ImportError as e:
        print(f"Error importing module {module_name}: {e}")
    except AttributeError:
        print(f"Class {class_name} not found in module {module_name}")
    except Exception as e:
        print(f"Error processing class {class_name}: {e}")

def explore_module(module_name):
    """Display a comprehensive overview of a module with classes and functions"""
    try:
        # Import the module
        module = importlib.import_module(module_name)
        
        # Module docstring
        doc = inspect.getdoc(module)
        display(Markdown(f"# Module: {module_name}"))
        
        if doc:
            display(Markdown(f"**Module Documentation:**\n{doc}"))
        else:
            display(Markdown("*No module documentation available*"))
            
        display(Markdown("---"))
        
        # Display classes
        display(Markdown("## Classes"))
        display_module_classes(module_name)
        
        # Display functions
        display(Markdown("## Functions"))
        display_module_functions(module_name)
        
    except ImportError as e:
        print(f"Error importing module {module_name}: {e}")
    except Exception as e:
        print(f"Error exploring module {module_name}: {e}")

## Check Bonsai Installation

Let's verify that the Bonsai v3 module is available for import:

In [None]:
try:
    from bonsaitree import v3
    print("✅ Successfully imported Bonsai v3 module")
    
    # Print Bonsai version information if available
    if hasattr(v3, "__version__"):
        print(f"Bonsai v3 version: {v3.__version__}")
    
    # List key submodules
    print("\nAvailable Bonsai submodules:")
    for module_name in dir(v3):
        if not module_name.startswith("_"):
            print(f"- {module_name}")
except ImportError as e:
    print(f"❌ Failed to import Bonsai v3 module: {e}")
    print("This lab requires access to the Bonsai v3 codebase.")
    print("Make sure you've properly set up your environment with the Bonsai repository.")

## Introduction

Real-world genealogical relationships are not always simple. Basic relationships such as parent-child, sibling, and first cousin are well understood, but many pedigrees contain compound or indirect connections that challenge standard analysis methods.

In this lab, we'll explore how Bonsai v3's `relationships.py` module handles these complex patterns.

**Key concepts we'll cover:**
- The relationship tuple representation system in Bonsai
- Converting between different relationship formats
- Handling compound relationships (related in multiple ways)
- Computing relationship distances in complex pedigrees
- Comparing and classifying relationships

## Part 1: The Relationship Representation System

### Theory and Background

In genetic genealogy, representing relationships mathematically is essential for analysis and inference. Bonsai v3 uses a tuple-based system to represent genealogical relationships. Let's explore the key formats used:

#### Relationship Tuple Format (u, d, a)

The primary relationship tuple format in Bonsai is a 3-element tuple with the structure:

```
(u, d, a)
```

where:
- `u` (up): Number of generations to go up from the first person to their ancestor
- `d` (down): Number of generations to go down from the ancestor to the second person
- `a` (ancestors): Number of common ancestors connecting the two people

This format can represent any relationship that runs through a single common ancestor or ancestral couple. Examples:

- Parent-child: `(0, 1, 1)` or `(1, 0, 1)` (depending on direction)
- Full siblings: `(1, 1, 2)` (up 1 to parents, down 1 to sibling, 2 common ancestors)
- First cousins: `(2, 2, 2)` (up 2 to grandparents, down 2 to cousin, 2 common ancestors)
- Second cousins: `(3, 3, 2)` (up 3 to great-grandparents, down 3 to second cousin, 2 common ancestors)
- Half-siblings: `(1, 1, 1)` (up 1 to parent, down 1 to half-sibling, 1 common ancestor)
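
The direction dependence noted for parent-child applies to every tuple: swapping `u` and `d` describes the same pair from the other person's point of view. The sketch below is a minimal illustration with a hypothetical helper name (`reverse_rel_sketch`), not the Bonsai function itself.

```python
def reverse_rel_sketch(rel):
    """Swap the up and down counts of a (u, d, a) tuple; None stays None."""
    if rel is None:
        return None
    u, d, a = rel
    return (d, u, a)


# (1, 0, 1): go up one generation from person 1 to reach person 2 (a parent).
parent = (1, 0, 1)
print(reverse_rel_sketch(parent))        # (0, 1, 1): seen from the parent, a child

# Symmetric relationships are unchanged by reversal.
first_cousins = (2, 2, 2)
print(reverse_rel_sketch(first_cousins))  # (2, 2, 2)
```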

#### Degree of Relationship

From the (u, d, a) tuple, we can compute the "degree" of relationship, which is:

```
degree = u + d - a + 1
```

This formula yields:
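- Degree 1: Parent-child (`(0, 1, 1)` → 0+1-1+1 = 1), full siblings (`(1, 1, 2)` → 1+1-2+1 = 1)
- Degree 2: Grandparent, half-sibling, aunt/uncle (`(2, 1, 2)` → 2+1-2+1 = 2)
- Degree 3: Great-grandparent, first cousin (`(2, 2, 2)` → 2+2-2+1 = 3)
- Degree 4 and 5: First cousin once removed (`(2, 3, 2)`) and second cousin (`(3, 3, 2)`), respectively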

The degree is often used as a simplified measure of relationship closeness.
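
To check the formula by hand, the following sketch applies `degree = u + d - a + 1` to a few common tuples. The function name `degree_sketch` and the example table are illustrative assumptions. The zero-meiosis special case follows the `get_deg` behaviour shown later in this lab.

```python
def degree_sketch(rel):
    """Degree of a (u, d, a) tuple: u + d - a + 1, with 0 for zero meioses."""
    u, d, a = rel
    if u + d == 0:
        return 0
    return u + d - a + 1


examples = {
    "parent-child": (1, 0, 1),
    "full siblings": (1, 1, 2),
    "half-siblings": (1, 1, 1),
    "grandparent": (2, 0, 1),
    "aunt/uncle": (2, 1, 2),
    "first cousins": (2, 2, 2),
    "second cousins": (3, 3, 2),
}

for name, rel in examples.items():
    print(f"{name:<15} {rel} -> degree {degree_sketch(rel)}")
```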

#### Alternative Format (m, a)

An alternative representation uses:
```
(m, a)
```

where:
- `m` (meioses): Total number of meioses separating the individuals (= u + d)
- `a` (ancestors): Number of common ancestors connecting the two people

This format is useful for certain genetic calculations, as the number of meioses directly affects the expected amount of shared DNA.
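
As a rough illustration of why `(m, a)` is convenient, the sketch below uses the standard approximation that each common ancestor contributes an expected shared fraction of `(1/2)**m`, giving `a * (1/2)**m` in total. This is a textbook expectation, not Bonsai's likelihood model, and the helper names are hypothetical. It also shows that different `(u, d, a)` tuples can collapse to the same `(m, a)` pair.

```python
def to_m_a(rel):
    """Collapse a (u, d, a) tuple to (m, a), where m = u + d."""
    u, d, a = rel
    return (u + d, a)


def expected_shared_fraction(m, a):
    """Expected fraction of autosomal DNA shared: a * (1/2)**m."""
    return a * 0.5 ** m


examples = {
    "parent-child": (1, 0, 1),
    "full siblings": (1, 1, 2),
    "half-siblings": (1, 1, 1),
    "grandparent": (2, 0, 1),
    "first cousins": (2, 2, 2),
}

for name, rel in examples.items():
    m, a = to_m_a(rel)
    share = expected_shared_fraction(m, a)
    print(f"{name:<14} (m, a) = ({m}, {a})  expected share = {share:.3f}")

# Half-siblings and grandparent/grandchild both map to (2, 1),
# so (m, a) alone cannot tell them apart.
```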

### Implementation in Bonsai v3

Let's explore how Bonsai v3 implements its relationship representation system in the `relationships.py` module. First, we'll look at the functions responsible for relationship conversions and calculations:

In [ ]:
try:
    # Import the relationships module from Bonsai v3
    from bonsaitree.v3 import relationships
    print("✅ Successfully imported the relationships module")
    
    # Examine the module structure
    explore_module("bonsaitree.v3.relationships")
except ImportError as e:
    print(f"❌ Failed to import relationships module: {e}")
    print("Using alternative approach to explore module...")
    
    # Alternative approach: import directly from path
    try:
        sys.path.append(os.path.join(os.path.dirname(os.getcwd()), 'utils'))
        from bonsaitree.bonsaitree.v3 import relationships
        print("✅ Successfully imported the relationships module using alternative path")
        
        # Examine the module structure
        explore_module("bonsaitree.bonsaitree.v3.relationships")
    except ImportError as e:
        print(f"❌ Failed to import relationships module using alternative path: {e}")
        print("We'll continue with manual explanation of the module.")

In [ ]:
if 'relationships' not in globals():
    # If import failed, display key functions manually
    relationships_functions = """
    ### Function: reverse_rel

    **Signature:** `reverse_rel(rel: Optional[tuple[int, int, int]])`

    **Documentation:**
    Reverse a relationship tuple (u, d, a) to (d, u, a) or None to None.

    Args:
        rel: relationship tuple of the form (u, d, a) or None

    Returns:
        rev_rel: (d, u, a) or None

    ### Function: get_deg

    **Signature:** `get_deg(rel: Optional[tuple[int, int, int]])`

    **Documentation:**
    Get the degree of a relationship tuple.

    Args:
        rel: relationship tuple of the form (u, d, a), (m, a), or None

    Returns:
        deg: if rel is None or a is None: INF
             else: u + d - a + 1

    ### Function: join_rels

    **Signature:** `join_rels(rel_ab: Optional[tuple[int, int, int]], rel_bc: Optional[tuple[int, int, int]])`

    **Documentation:**
    For three individuals A, B, and C related by relAB and relBC, find relAC.

    Args:
        rel_ab: relationship between A and B of the form (u, d, a) or None
        rel_bc: relationship between B and C of the form (u, d, a) or None

    Returns:
        relAC: relationship between A and C of the form (u, d, a) or None

    ### Function: get_transitive_rel

    **Signature:** `get_transitive_rel(rel_list: list[Optional[tuple[int, int, int]]])`

    **Documentation:**
    For a list of relationships in rel_list that represent pairwise relationships from one person to the next in a chain
    of relatives, find the relationship between the first person in the list and the last person in the list.

    Args:
        rel_list: chain of relationships of the form [(up, down, num_ancs), ...]
                  where rel_list[i] is the relationship between individuals i and i+1.

    Returns:
        rel: The relationship between individual 0 and individual n specified
             by the chain of relationships in rel_list.

    ### Function: a_m_to_rel

    **Signature:** `a_m_to_rel(a: int, m: int)`

    **Documentation:**
    Convert number of meioses and number of common ancestors
    to a good approximate relationship of the form (up, down, num_common_ancs).

    Note that m and a do not capture the full distribution so
    this is a one-to-many map and we choose the "best" relationship,
    which is a collateral relationship, which I assume will be more
    common in the pedigrees we encounter.

    The distribution is "similar" for direct ancestral relationships
    so I think we can use it.

    Args:
        a: number of common ancestors
        m: number of meioses

    Returns:
        rel: "best" relationship corresponding to a and m
    """

    display(Markdown(relationships_functions))
    
    # Display implementations of key functions
    reverse_rel_impl = """
    def reverse_rel(
        rel : Optional[tuple[int, int, int]],
    ):
        \"\"\"
        Reverse a relationship tuple (u, d, a)
        to (d, u, a) or None to None.

        Args:
            rel: relationship tuple of the form (u, d, a) or None

        Returns:
            rev_rel: (d, u, a) or None
        \"\"\"
        if type(rel) is tuple:
            rev_rel = (rel[1], rel[0], rel[2])
        else:
            rev_rel = rel
        return rev_rel
    """
    
    get_deg_impl = """
    def get_deg(
        rel : Optional[tuple[int, int, int]],
    ):
        \"\"\"
        Get the degree of a relationship tuple.

        Args:
            rel: relationship tuple of the form (u, d, a), (m, a), or None

        Returns:
            deg: if rel is None or a is None: INF
                 else: u + d - a + 1
        \"\"\"
        if type(rel) is tuple:

            # get m and a
            if len(rel) == 3:
                u,d,a = rel
                m = u+d
            elif len(rel) == 2:
                m, a = rel

            # get degree
            if a is None:
                deg = INF
            elif m == 0:
                deg = 0
            else:
                deg = m - a + 1
        else:
            deg = INF
        return deg
    """
    
    display(Markdown(f"```python\n{reverse_rel_impl}\n```"))
    display(Markdown(f"```python\n{get_deg_impl}\n```"))

### Exercise 1: Relationship Representation and Conversion

Let's practice working with Bonsai's relationship representation system by implementing functions to convert between different relationship formats and compute relationship degrees.

**Task:** Implement the following functions to convert between relationship representations and calculate relationship degrees:

1. `convert_to_degree`: Calculate the degree of relationship from a (u, d, a) tuple
2. `classify_relationship`: Classify common relationships based on their (u, d, a) tuples
3. `convert_between_formats`: Convert between (u, d, a) and (m, a) formats

**Hint:** Remember that `degree = u + d - a + 1` and `m = u + d` (total meioses)

In [ ]:
# Exercise 1 code template
import math

def convert_to_degree(rel_tuple):
    """
    Calculate the degree of relationship from a (u, d, a) tuple.
    
    Args:
        rel_tuple: Relationship tuple of the form (u, d, a) or (m, a)
        
    Returns:
        degree: Degree of relationship (or float('inf') for unrelated/None)
    """
    # Check if rel_tuple is None
    if rel_tuple is None:
        return float('inf')
    
    # Normalize both tuple formats to (m, a)
    if len(rel_tuple) == 3:
        # (u, d, a) format
        u, d, a = rel_tuple
        m = u + d
    elif len(rel_tuple) == 2:
        # (m, a) format
        m, a = rel_tuple
    else:
        # Invalid tuple format
        return float('inf')

    # A missing ancestor count means the pair is unrelated
    if a is None:
        return float('inf')

    # Special case for identical individuals (applies to both formats,
    # matching the m == 0 branch of get_deg)
    if m == 0:
        return 0

    return m - a + 1

def classify_relationship(rel_tuple):
    """
    Classify common relationships based on their (u, d, a) tuple.
    
    Args:
        rel_tuple: Relationship tuple of the form (u, d, a)
        
    Returns:
        relationship_name: String description of the relationship
    """
    # Handle None or invalid inputs
    if rel_tuple is None or len(rel_tuple) != 3:
        return "Unknown or invalid relationship"
    
    # Unpack the tuple
    u, d, a = rel_tuple
    
    # Handle special cases
    if a is None:
        return "Unrelated"
    
    # Self
    if u == 0 and d == 0:
        if a == 1:
            return "Self (identical individual)"
        if a == 2:
            return "Self (identical twins)"
    
    # Direct ancestors/descendants
    if u == 0 and d > 0 and a == 1:
        if d == 1:
            return "Child"
        if d == 2:
            return "Grandchild"
        if d == 3:
            return "Great-grandchild"
        return f"{d-2}x great-grandchild"
    
    if u > 0 and d == 0 and a == 1:
        if u == 1:
            return "Parent"
        if u == 2:
            return "Grandparent"
        if u == 3:
            return "Great-grandparent"
        return f"{u-2}x great-grandparent"
    
    # Siblings
    if u == 1 and d == 1:
        if a == 2:
            return "Full sibling"
        if a == 1:
            return "Half-sibling"
    
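    # Aunts/Uncles and Nieces/Nephews (a == 2: full, a == 1: half)
    if u == 2 and d == 1 and a in (1, 2):
        return "Aunt/Uncle" if a == 2 else "Half aunt/uncle"
    if u == 1 and d == 2 and a in (1, 2):
        return "Niece/Nephew" if a == 2 else "Half niece/nephew"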
    
    # Cousins
    if u >= 2 and d >= 2:
        if u == d:  # Same generation
            cousin_degree = u - 1
            if a == 2:
                if cousin_degree == 1:
                    return "First cousin"
                if cousin_degree == 2:
                    return "Second cousin"
                if cousin_degree == 3:
                    return "Third cousin"
                return f"{cousin_degree}th cousin"
            if a == 1:
                if cousin_degree == 1:
                    return "Half first cousin"
                if cousin_degree == 2:
                    return "Half second cousin"
                if cousin_degree == 3:
                    return "Half third cousin"
                return f"Half {cousin_degree}th cousin"
        else:  # Different generations
            min_side = min(u, d)
            removal = abs(u - d)
            cousin_degree = min_side - 1
            
            if cousin_degree == 1:
                cousin_str = "First cousin"
            elif cousin_degree == 2:
                cousin_str = "Second cousin"
            elif cousin_degree == 3:
                cousin_str = "Third cousin"
            else:
                cousin_str = f"{cousin_degree}th cousin"
                
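            if a == 1:  # a single common ancestor makes this a half relationship
                cousin_str = f"Half {cousin_str.lower()}"

            if removal == 1: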
                return f"{cousin_str} once removed"
            if removal == 2:
                return f"{cousin_str} twice removed"
            return f"{cousin_str} {removal} times removed"
    
    # Default case for other relationships
    return f"Complex relationship (u={u}, d={d}, a={a})"

def convert_between_formats(rel_tuple, target_format='m_a'):
    """
    Convert between (u, d, a) and (m, a) relationship formats.
    
    Args:
        rel_tuple: Relationship tuple to convert
        target_format: Target format, either 'u_d_a' or 'm_a'
        
    Returns:
        converted_tuple: Relationship tuple in the target format
    """
    # Handle None inputs
    if rel_tuple is None:
        return None
    
    # Convert to (m, a) format
    if target_format == 'm_a':
        if len(rel_tuple) == 3:
            # Current format is (u, d, a)
            u, d, a = rel_tuple
            m = u + d
            return (m, a)
        elif len(rel_tuple) == 2:
            # Already in (m, a) format
            return rel_tuple
        else:
            # Invalid input
            return None
            
    # Convert to (u, d, a) format
    elif target_format == 'u_d_a':
        if len(rel_tuple) == 2:
            # Current format is (m, a)
            m, a = rel_tuple
            
            # Special cases for self-relationship or invalid input
            if m == 0:
                return (0, 0, a)
                
            # For other cases, we need to make an assumption about u and d
            # We'll use a_m_to_rel logic: prefer collateral relationship when possible
            if m == 1:
                return (1, 0, 1)  # Parent-child
            else:
                # Default to symmetric relationship (similar to cousins)
                u = m // 2
                d = m - u
                return (u, d, a)
        elif len(rel_tuple) == 3:
            # Already in (u, d, a) format
            return rel_tuple
        else:
            # Invalid input
            return None
    else:
        # Invalid target format
        return None

# Test cases for the functions
rel_tuples = [
    (0, 0, 1),     # Self
    (1, 0, 1),     # Parent
    (0, 1, 1),     # Child
    (1, 1, 2),     # Full sibling
    (1, 1, 1),     # Half-sibling
    (2, 0, 1),     # Grandparent
    (0, 2, 1),     # Grandchild
    (2, 2, 2),     # First cousin
    (3, 3, 2),     # Second cousin
    (2, 3, 2),     # First cousin once removed
    None,          # None (no known relationship)
    (4, 4, 2),     # Third cousin
    (2, 1, 1),     # Half aunt/uncle (one common ancestor)
]

# Test the convert_to_degree function
print("Testing convert_to_degree:")
for rel in rel_tuples:
    # convert_to_degree handles None itself, so no special case is needed
    print(f"Tuple {rel} has degree: {convert_to_degree(rel)}")

print("\nTesting classify_relationship:")
for rel in rel_tuples:
    # classify_relationship handles None itself, so no special case is needed
    print(f"Tuple {rel} represents: {classify_relationship(rel)}")

print("\nTesting convert_between_formats:")
for rel in rel_tuples:
    if rel is not None and len(rel) == 3:
        m_a = convert_between_formats(rel, 'm_a')
        back_to_u_d_a = convert_between_formats(m_a, 'u_d_a')
        print(f"(u,d,a)={rel} → (m,a)={m_a} → (u,d,a)={back_to_u_d_a}")

## Part 2: Joining and Combining Relationships

### Theory and Background

In real-world genealogies, individuals are often related in multiple ways or through complex chains of relationships. Bonsai v3 provides functions to compute these complex relationships:

1. **Joining Relationships**: Finding the relationship between A and C given the relationships between A and B, and B and C

2. **Transitive Relationships**: Computing the relationship between the first and last individuals in a chain of relatives

3. **Compound Relationships**: Handling cases where two individuals are related in multiple different ways (e.g., double first cousins)

Let's examine these concepts and how they're implemented in Bonsai v3.
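
A single `(u, d, a)` tuple describes one ancestral path, so a compound relationship needs more than one tuple. The sketch below assumes, purely for illustration, that a compound relationship is written as a list of tuples over independent paths, and sums the standard expectation `a * (1/2)**(u + d)` across them. The list representation and the function name are assumptions, not Bonsai's API.

```python
def expected_shared_fraction(paths):
    """Sum a * (1/2)**(u + d) over independent (u, d, a) paths."""
    return sum(a * 0.5 ** (u + d) for u, d, a in paths)


# Ordinary first cousins: one path through one pair of grandparents.
first_cousins = [(2, 2, 2)]

# Double first cousins: related through both pairs of grandparents.
double_first_cousins = [(2, 2, 2), (2, 2, 2)]

print(expected_shared_fraction(first_cousins))         # 0.125
print(expected_shared_fraction(double_first_cousins))  # 0.25
```

Double first cousins are therefore expected to share about as much DNA as half-siblings, even though each individual path is a first-cousin path.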

### Implementation in Bonsai v3

Let's look at how Bonsai v3 implements joining and combining relationships in the `relationships.py` module. The key functions for this are `join_rels` and `get_transitive_rel`.

#### The `join_rels` Function

The `join_rels` function finds the relationship between A and C given the relationships between A and B, and B and C. Let's examine its implementation:

In [ ]:
join_rels_impl = """
def join_rels(
    rel_ab : Optional[tuple[int, int, int]],
    rel_bc : Optional[tuple[int, int, int]],
):
    \"\"\"
    For three individuals A, B, and C
    related by relAB and relBC, find relAC.

    Args:
        rel_ab: relationship between A and B of the form (u, d, a) or None
        rel_bc: relationship between B and C of the form (u, d, a) or None

    Returns:
        relAC: relationship between A and C of the form (u, d, a) or None
    \"\"\"

    if (rel_ab is None) or (rel_bc is None):
        return None

    if rel_ab == (0, 0, 2):
        return rel_bc

    if rel_bc == (0, 0, 2):
        return rel_ab

    u1, d1, a1 = rel_ab
    u2, d2, a2 = rel_bc

    if a1 is None or a2 is None:
        return None

    if d1 > 0 and u2 > 0:
        return None

    if u1 > 0 and d1 > 0 and u2 == 0 and d2 > 0:
        a = a1
    elif u1 == 0 and d1 > 0 and u2 == 0 and d2 > 0:
        a = a2
    elif u1 > 0 and d1 == 0 and u2 > 0 and d2 == 0:
        a = a2
    elif u1 > 0 and d1 == 0 and u2 > 0 and d2 > 0:
        a = a2
    elif u1 > 0 and d1 == 0 and u2 == 0 and d2 > 0:
        a = a1
    elif u1 > 0 and d1 > 0 and u2 == 0 and d2 == 0:
        a = a1
    elif u1 == 0 and d1 > 0 and u2 == 0 and d2 == 0:
        a = a1
    elif u1 > 0 and d1 == 0 and u2 == 0 and d2 == 0:
        a = a1
    elif u1 == 0 and d1 == 0 and u2 == 0 and d2 > 0:
        a = a2
    elif u1 == 0 and d1 == 0 and u2 > 0 and d2 > 0:
        a = a2

    u = u1 + u2
    d = d1 + d2

    return (u, d, a)
"""

display(Markdown(f"```python\n{join_rels_impl}\n```"))

get_transitive_rel_impl = """
def get_transitive_rel(
    rel_list : list[Optional[tuple[int, int, int]]],
):
    \"\"\"
    For a list of relationships in rel_list that represent
    pairwise relationships from one person to the next in a chain
    of relatives, find the relationship between the first person
    in the list and the last person in the list.

    Args:
        rel_list: chain of relationships of the form [(up, down, num_ancs), ...]
                  where rel_list[i] is the relationship between individuals i and i+1.

    Returns:
        rel: The relationship between individual 0 and individual n specified
             by the chain of relationships in rel_list.
    \"\"\"
    if rel_list == []:
        return None

    rel = rel_list.pop(0)
    while rel_list:
        next_rel = rel_list.pop(0)
        rel = join_rels(rel, next_rel)

    # ensure that ancestor/descendant relationships
    # only have one ancestor. We can't ensure this
    # within join_rels() so we have to do it here.
    if rel is not None and (rel[0] == 0 or rel[1] == 0):
        rel = (rel[0], rel[1], 1)

    return rel
"""

display(Markdown(f"```python\n{get_transitive_rel_impl}\n```"))

#### Understanding the `join_rels` Function

The `join_rels` function works as follows:

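1. **Input Validation**: If either input relationship is `None`, or either ancestor count is `None`, it returns `None`. It also returns `None` when `d1 > 0` and `u2 > 0`: a path that descends from A to B and then climbs from B to C passes through a descendant rather than a common ancestor, so the chain does not determine how A and C are related (for example, C could be the other parent of A's child).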

2. **Special Case Handling**: 
   - If `rel_ab` is the self-relationship `(0, 0, 2)`, then A and B are identical, so A's relationship to C is the same as B's relationship to C: `return rel_bc`.
   - Similarly, if `rel_bc` is the self-relationship, A's relationship to C is the same as A's relationship to B: `return rel_ab`.

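3. **Ancestor Determination**:
   - A series of conditionals on which of `u1`, `d1`, `u2`, and `d2` are zero decides whether the result takes its ancestor count `a` from `rel_ab` (`a1`) or from `rel_bc` (`a2`).
   - For example, when the A-to-B link is purely ancestral and the B-to-C link is collateral (`u1 > 0`, `d1 == 0`, `u2 > 0`, `d2 > 0`), the result uses `a2`.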

4. **Final Calculation**:
   - `u = u1 + u2`: The total "up" generations is the sum of the "up" generations from A to B and from B to C.
   - `d = d1 + d2`: The total "down" generations is the sum of the "down" generations from A to B and from B to C.
   - Returns the new relationship tuple `(u, d, a)`.
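
The steps above can be traced with a simplified sketch. It is not Bonsai's implementation: `join_sketch` is a hypothetical name, it assumes neither input is a self-relationship, and it condenses the ancestor-count branches for that restricted case into two conditionals.

```python
def join_sketch(rel_ab, rel_bc):
    """Simplified join of A->B and B->C (assumes no self-relationships)."""
    if rel_ab is None or rel_bc is None:
        return None
    u1, d1, a1 = rel_ab
    u2, d2, a2 = rel_bc
    if u1 + d1 == 0 or u2 + d2 == 0:
        raise ValueError("this sketch does not handle self-relationships")
    # Descending to B and then climbing to C passes through a descendant,
    # so the chain does not run through a common ancestor.
    if d1 > 0 and u2 > 0:
        return None
    # Choose which input supplies the ancestor count.
    if d1 > 0:
        a = a1 if u1 > 0 else a2
    else:
        a = a2 if u2 > 0 else a1
    return (u1 + u2, d1 + d2, a)


parent = (1, 0, 1)
child = (0, 1, 1)
full_sibling = (1, 1, 2)

print(join_sketch(parent, full_sibling))  # (2, 1, 2): parent's sibling, an aunt/uncle
print(join_sketch(full_sibling, child))   # (1, 2, 2): sibling's child, a niece/nephew
print(join_sketch(child, parent))         # None: child's parent is not determined
```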

#### Understanding the `get_transitive_rel` Function

The `get_transitive_rel` function uses `join_rels` to compute the relationship between the first and last individuals in a chain of relatives:

1. If the list is empty, return `None`.
2. Start with the first relationship in the list.
3. Iteratively join it with the next relationship in the chain using `join_rels`.
4. For ancestor/descendant relationships (where either u or d is 0), ensure exactly one ancestor.
5. Return the final relationship.
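
The chain computation is a left fold over the list. The version shown earlier consumes `rel_list` with `pop(0)`, so a caller who still needs the chain afterwards would pass a copy. The sketch below is a non-mutating variant under stated assumptions: `join_sketch` is a simplified, hypothetical stand-in for `join_rels` that ignores self-relationships, and `transitive_rel_sketch` is likewise an illustrative name.

```python
from functools import reduce


def join_sketch(rel_ab, rel_bc):
    """Simplified stand-in for join_rels (assumes no self-relationships)."""
    if rel_ab is None or rel_bc is None:
        return None
    (u1, d1, a1), (u2, d2, a2) = rel_ab, rel_bc
    if d1 > 0 and u2 > 0:  # down then up: no common-ancestor path
        return None
    if d1 > 0:
        a = a1 if u1 > 0 else a2
    else:
        a = a2 if u2 > 0 else a1
    return (u1 + u2, d1 + d2, a)


def transitive_rel_sketch(rel_list):
    """Fold a chain of (u, d, a) tuples without modifying rel_list."""
    if not rel_list:
        return None
    rel = reduce(join_sketch, rel_list)
    # Ancestor/descendant results are normalized to one common ancestor.
    if rel is not None and (rel[0] == 0 or rel[1] == 0):
        rel = (rel[0], rel[1], 1)
    return rel


# Person -> parent -> parent's full sibling -> that sibling's child
chain = [(1, 0, 1), (1, 1, 2), (0, 1, 1)]
print(transitive_rel_sketch(chain))  # (2, 2, 2): first cousins
print(len(chain))                    # 3: the input list is left intact
```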

### Exercise 2: Joining Relationships in a Pedigree

Let's practice working with Bonsai's relationship joining functions by implementing a function to calculate relationships in a simple pedigree.

**Task:** Implement the `find_relationship` function to find the relationship between any two individuals in a simple pedigree by finding a path of known relationships.

**Hint:** Use a graph search algorithm to find a path between the individuals, then use `get_transitive_rel` to compute the relationship along that path.

In [ ]:
# Exercise 2 code template
import networkx as nx
from collections import deque

def find_relationship(pedigree_rels, id1, id2):
    """
    Find the relationship between two individuals in a pedigree.
    
    Args:
        pedigree_rels: Dictionary mapping (id1, id2) pairs to relationship tuples
        id1: ID of the first individual
        id2: ID of the second individual
        
    Returns:
        relationship: Relationship tuple of the form (u, d, a) or None if no path exists
    """
    # Direct relationship check
    if (id1, id2) in pedigree_rels:
        return pedigree_rels[(id1, id2)]
    if (id2, id1) in pedigree_rels:
        # Reverse the relationship if needed
        rel = pedigree_rels[(id2, id1)]
        if rel is not None:
            return (rel[1], rel[0], rel[2])  # Reverse (u, d, a) to (d, u, a)
    
    # Build a graph representing the pedigree
    G = nx.Graph()
    
    # Add all known relationships as edges
    for (person1, person2), rel in pedigree_rels.items():
        if rel is not None:  # Only add valid relationships
            # Add both individuals as nodes if they don't exist
            if person1 not in G:
                G.add_node(person1)
            if person2 not in G:
                G.add_node(person2)
            
            # Add an edge between them with the relationship as an attribute
            G.add_edge(person1, person2, relationship=rel)
    
    # Check if both individuals exist in the graph
    if id1 not in G or id2 not in G:
        return None
    
    # Find the shortest path between the individuals
    try:
        path = nx.shortest_path(G, id1, id2)
    except nx.NetworkXNoPath:
        return None
    
    # Extract the relationships along the path
    rel_list = []
    for i in range(len(path) - 1):
        person1 = path[i]
        person2 = path[i + 1]
        
        # Get the relationship between these two individuals
        if (person1, person2) in pedigree_rels:
            rel = pedigree_rels[(person1, person2)]
        elif (person2, person1) in pedigree_rels:
            # Reverse the relationship if needed
            rel = pedigree_rels[(person2, person1)]
            if rel is not None:
                rel = (rel[1], rel[0], rel[2])  # Reverse (u, d, a) to (d, u, a)
        else:
            # This shouldn't happen if the graph was built correctly
            rel = None
        
        # Add to the relationship list
        rel_list.append(rel)
    
    # Compute the transitive relationship
    return get_transitive_rel(rel_list)

# Implementation of get_transitive_rel for this exercise
def get_transitive_rel(rel_list):
    """
    For a list of relationships in rel_list that represent
    pairwise relationships from one person to the next in a chain
    of relatives, find the relationship between the first person
    in the list and the last person in the list.
    """
    if not rel_list:
        return None
    
    # Make a copy to avoid modifying the original
    rel_list_copy = rel_list.copy()
    
    rel = rel_list_copy.pop(0)
    while rel_list_copy:
        next_rel = rel_list_copy.pop(0)
        rel = join_rels(rel, next_rel)
    
    # Ensure ancestor/descendant relationships have one ancestor
    if rel is not None and (rel[0] == 0 or rel[1] == 0):
        rel = (rel[0], rel[1], 1)
    
    return rel

# Implementation of join_rels for this exercise
def join_rels(rel_ab, rel_bc):
    """
    For three individuals A, B, and C
    related by relAB and relBC, find relAC.
    """
    if (rel_ab is None) or (rel_bc is None):
        return None
    
    # Special cases for self relationships
    if rel_ab == (0, 0, 2):
        return rel_bc
    if rel_bc == (0, 0, 2):
        return rel_ab
    
    # Extract components
    u1, d1, a1 = rel_ab
    u2, d2, a2 = rel_bc
    
    # Check for invalid relationships
    if a1 is None or a2 is None:
        return None
    if d1 > 0 and u2 > 0:
        return None
    
    # Determine the number of common ancestors
    if u1 > 0 and d1 > 0 and u2 == 0 and d2 > 0:
        a = a1
    elif u1 == 0 and d1 > 0 and u2 == 0 and d2 > 0:
        a = a2
    elif u1 > 0 and d1 == 0 and u2 > 0 and d2 == 0:
        a = a2
    elif u1 > 0 and d1 == 0 and u2 > 0 and d2 > 0:
        a = a2
    elif u1 > 0 and d1 == 0 and u2 == 0 and d2 > 0:
        a = a1
    elif u1 > 0 and d1 > 0 and u2 == 0 and d2 == 0:
        a = a1
    elif u1 == 0 and d1 > 0 and u2 == 0 and d2 == 0:
        a = a1
    elif u1 > 0 and d1 == 0 and u2 == 0 and d2 == 0:
        a = a1
    elif u1 == 0 and d1 == 0 and u2 == 0 and d2 > 0:
        a = a2
    elif u1 == 0 and d1 == 0 and u2 > 0 and d2 > 0:
        a = a2
    else:
        a = min(a1, a2)  # Default case
    
    # Calculate the total number of "up" and "down" generations
    u = u1 + u2
    d = d1 + d2
    
    return (u, d, a)

# Test pedigree relationships
# A simple family tree with parent-child and sibling relationships
pedigree_relationships = {
    # Grandparents and parents
    ('GF1', 'F1'): (0, 1, 1),  # GF1 is father of F1
    ('GM1', 'F1'): (0, 1, 1),  # GM1 is mother of F1
    ('GF1', 'F2'): (0, 1, 1),  # GF1 is father of F2
    ('GM1', 'F2'): (0, 1, 1),  # GM1 is mother of F2
    
    # F1 and M1 have children C1, C2
    ('F1', 'C1'): (0, 1, 1),   # F1 is father of C1
    ('M1', 'C1'): (0, 1, 1),   # M1 is mother of C1
    ('F1', 'C2'): (0, 1, 1),   # F1 is father of C2
    ('M1', 'C2'): (0, 1, 1),   # M1 is mother of C2
    
    # F2 and M2 have children N1, N2
    ('F2', 'N1'): (0, 1, 1),   # F2 is father of N1
    ('M2', 'N1'): (0, 1, 1),   # M2 is mother of N1
    ('F2', 'N2'): (0, 1, 1),   # F2 is father of N2
    ('M2', 'N2'): (0, 1, 1),   # M2 is mother of N2
    
    # Sibling relationships (automatically derivable, but included for completeness)
    ('C1', 'C2'): (1, 1, 2),   # C1 and C2 are full siblings
    ('N1', 'N2'): (1, 1, 2),   # N1 and N2 are full siblings
    ('F1', 'F2'): (1, 1, 2),   # F1 and F2 are full siblings
}

# Test cases for the find_relationship function
test_pairs = [
    ('GF1', 'C1'),    # Grandparent to grandchild
    ('C1', 'C2'),     # Siblings
    ('C1', 'N1'),     # First cousins
    ('F1', 'N1'),     # Uncle/Aunt to niece/nephew
    ('GF1', 'N1'),    # Grandparent to grandchild
    ('M1', 'M2'),     # Unrelated by blood (every connecting chain goes down then up, so the result is None)
]

# Test the function
for id1, id2 in test_pairs:
    rel = find_relationship(pedigree_relationships, id1, id2)
    rel_name = classify_relationship(rel) if rel is not None else "No relationship found"
    print(f"Relationship between {id1} and {id2}: {rel} - {rel_name}")

# Visualize the pedigree for clarity
G = nx.Graph()
for (person1, person2), rel in pedigree_relationships.items():
    G.add_edge(person1, person2, relationship=rel)

plt.figure(figsize=(12, 8))
pos = nx.spring_layout(G, seed=42)
nx.draw(G, pos, with_labels=True, node_color='lightblue', node_size=500, font_size=10)
nx.draw_networkx_edge_labels(G, pos, edge_labels={(u, v): str(d['relationship']) for u, v, d in G.edges(data=True)})
plt.title('Pedigree Relationship Graph')
plt.tight_layout()
plt.show()

## Part 3: Compound and Complex Relationships

### Theory and Background

In real-world genealogies, individuals are sometimes related in multiple ways, creating **compound relationships**. Common examples include:

1. **Double First Cousins**: When two siblings from one family marry two siblings from another family, their children are related through both their maternal and paternal lines, sharing more DNA than typical first cousins.

2. **Endogamous Populations**: In populations with high rates of intermarriage (like isolated communities, island populations, or certain cultural groups), individuals may be related through multiple different ancestral paths.

3. **Pedigree Collapse**: When the same ancestor appears multiple times in a person's family tree, creating multiple paths of relationship between descendants.

These complex relationship patterns pose challenges for genetic genealogy:

- **Increased Genetic Sharing**: Individuals with compound relationships typically share more DNA than those with simple relationships of the same degree.
- **Multiple Paths**: Relationship inference must account for all paths of connection.
- **Ambiguity**: The increased sharing can make relationship types harder to distinguish.

Bonsai v3 addresses these challenges through its relationship model and by computing total expected genetic sharing from all paths of relationship.
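
To make the "increased sharing" and "ambiguity" points concrete, the sketch below sums the standard coefficient of relatedness, `a * (1/2)**(u + d)`, over each path. This is a textbook calculation under the assumption that the paths are independent (no inbreeding along them); it is not Bonsai's likelihood model, and `relatedness` is a hypothetical helper.

```python
from fractions import Fraction

def relatedness(paths):
    """Sum a * (1/2)**(u + d) over a list of (u, d, a) tuples."""
    return sum(Fraction(a, 2 ** (u + d)) for u, d, a in paths)

first_cousins = [(2, 2, 2)]
double_first_cousins = [(2, 2, 2), (2, 2, 2)]
half_siblings = [(1, 1, 1)]

print(relatedness(first_cousins))         # 1/8
print(relatedness(double_first_cousins))  # 1/4
print(relatedness(half_siblings))         # 1/4
```

Double first cousins and half-siblings have the same expected value here, which is why total sharing alone cannot separate them.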

### Implementation for Compound Relationships

When dealing with compound relationships, Bonsai v3 follows these general principles:

1. **Represent Each Path Separately**: Each path of relationship is represented by its own relationship tuple.

2. **Compute Expected Sharing for Each Path**: For each path, calculate the expected amount of genetic sharing.

3. **Combined Model**: The total expected sharing is a function of all paths, accounting for potential overlaps.
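
One simple way to account for overlaps, offered here as an assumption rather than as Bonsai's method, is to treat each path as independently covering some fraction of the genome and to take the union. The helper `combined_fraction` is hypothetical.

```python
def combined_fraction(path_fractions):
    """Fraction of the genome covered by at least one path, assuming independence."""
    not_shared = 1.0
    for f in path_fractions:
        not_shared *= (1.0 - f)  # probability a locus is missed by every path so far
    return 1.0 - not_shared

paths = [0.25, 0.25]
print(sum(paths))                # 0.5    (naive sum double-counts overlapping regions)
print(combined_fraction(paths))  # 0.4375 (union of two independent 25% coverages)
```

The gap between the two numbers grows as the per-path fractions grow, so the naive sum is a reasonable approximation for distant paths and a poor one for close ones.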

Let's explore a simple model for handling compound relationships by calculating expected IBD sharing:

In [ ]:
# Model for calculating expected IBD sharing

def calculate_expected_ibd(relationship_tuple, genome_length=3545):
    """
    Calculate expected IBD sharing for a relationship.
    
    Args:
        relationship_tuple: Tuple of the form (u, d, a) or (m, a)
        genome_length: Total genome length in centiMorgans (default: 3545 cM for autosomes)
        
    Returns:
        expected_ibd: Expected IBD sharing in centiMorgans
    """
    # Handle None input
    if relationship_tuple is None:
        return 0
    
    # Extract tuple components
    if len(relationship_tuple) == 3:
        # (u, d, a) format
        u, d, a = relationship_tuple
        m = u + d  # Total meioses
    elif len(relationship_tuple) == 2:
        # (m, a) format
        m, a = relationship_tuple
    else:
        # Invalid format
        return 0
    
    # Check for self relationship
    if m == 0:
        return genome_length
    
    # Calculate expected sharing
    # a * (1/2)^m is the coefficient of relatedness: each common ancestor
    # contributes (1/2)^m. Multiply by genome_length to express it in cM.
    # This is a simplification; it does not model IBD1 and IBD2 separately.
    expected_ibd = a * (0.5 ** m) * genome_length
    
    return expected_ibd

def calculate_compound_ibd(relationship_tuples, genome_length=3545):
    """
    Calculate expected total IBD sharing for compound relationships.
    
    Args:
        relationship_tuples: List of relationship tuples (u, d, a) or (m, a)
        genome_length: Total genome length in centiMorgans
        
    Returns:
        total_expected_ibd: Total expected IBD sharing in centiMorgans
    """
    # Simple model: calculate sharing for each relationship and sum
    # This is a simplification as it doesn't account for overlaps
    total_expected_ibd = 0
    for rel_tuple in relationship_tuples:
        total_expected_ibd += calculate_expected_ibd(rel_tuple, genome_length)
    
    # Cap at genome length (simplistic approach to handle overlaps)
    total_expected_ibd = min(total_expected_ibd, genome_length)
    
    return total_expected_ibd

# Examples of compound relationships
print("Expected IBD sharing for simple and compound relationships:")
print("-" * 60)

# Simple relationships
simple_relationships = [
    ((1, 0, 1), "Parent-child"),
    ((1, 1, 2), "Full siblings"),
    ((1, 1, 1), "Half-siblings"),
    ((2, 2, 2), "First cousins"),
    ((3, 3, 2), "Second cousins")
]

for rel_tuple, rel_name in simple_relationships:
    expected_ibd = calculate_expected_ibd(rel_tuple)
    print(f"{rel_name} {rel_tuple}: {expected_ibd:.2f} cM")

print("\nCompound relationships:")
print("-" * 60)

# Compound relationships
compound_relationships = [
    ([((2, 2, 2), "First cousins paternal side"), ((2, 2, 2), "First cousins maternal side")], "Double first cousins"),
    ([((1, 1, 1), "Half-siblings paternal side"), ((3, 3, 2), "Second cousins maternal side")], "Half-siblings + second cousins"),
    ([((2, 2, 2), "First cousins path 1"), ((3, 3, 2), "Second cousins path 2"), ((4, 4, 2), "Third cousins path 3")], "Multiple cousin paths")
]

for rel_tuples_with_names, compound_name in compound_relationships:
    rel_tuples = [rel_tuple for rel_tuple, _ in rel_tuples_with_names]
    expected_ibd = calculate_compound_ibd(rel_tuples)
    
    # Calculate individual IBD values for comparison
    individual_ibds = [(name, calculate_expected_ibd(rel_tuple)) for rel_tuple, name in rel_tuples_with_names]
    individual_ibds_str = ", ".join([f"{name}: {ibd:.2f} cM" for name, ibd in individual_ibds])
    
    print(f"{compound_name}:")
    print(f"  Individual paths: {individual_ibds_str}")
    print(f"  Combined expected IBD: {expected_ibd:.2f} cM")
    print()

In [ ]:
# Visualize compound relationships
import matplotlib.pyplot as plt
import numpy as np

def create_relationship_chart(relationship_info, title):
    """Create a bar chart comparing IBD sharing for relationships"""
    labels = [name for name, _ in relationship_info]
    ibd_values = [ibd for _, ibd in relationship_info]
    
    plt.figure(figsize=(10, 6))
    bars = plt.bar(labels, ibd_values, color='skyblue')
    
    # Add data labels
    for bar in bars:
        height = bar.get_height()
        plt.text(bar.get_x() + bar.get_width()/2., height + 20,
                 f'{height:.1f} cM',
                 ha='center', va='bottom')
    
    plt.xlabel('Relationship')
    plt.ylabel('Expected IBD Sharing (cM)')
    plt.title(title)
    plt.xticks(rotation=45, ha='right')
    plt.tight_layout()
    plt.ylim(0, max(ibd_values) * 1.2)  # Add some headroom for labels
    plt.show()

# Prepare data for visualization
# 1. Simple relationships
simple_rel_data = [(rel_name, calculate_expected_ibd(rel_tuple)) 
                  for rel_tuple, rel_name in simple_relationships]
create_relationship_chart(simple_rel_data, 'Expected IBD Sharing by Relationship Type')

# 2. Compound relationships
compound_rel_data = []

# Add individual and combined paths for double first cousins
double_fc_data = []
double_fc_data.append(("First Cousin (1 path)", calculate_expected_ibd((2, 2, 2))))
double_fc_data.append(("Double First Cousin", 
                      calculate_compound_ibd([(2, 2, 2), (2, 2, 2)])))
create_relationship_chart(double_fc_data, 'Comparison: First Cousin vs Double First Cousin')

# Compare multiple compound relationships
compound_comparisons = [
    ("First Cousin", calculate_expected_ibd((2, 2, 2))),
    ("Half Sibling", calculate_expected_ibd((1, 1, 1))),
    ("Double First Cousin", calculate_compound_ibd([(2, 2, 2), (2, 2, 2)])),
    ("Half Sibling + 2nd Cousin", 
     calculate_compound_ibd([(1, 1, 1), (3, 3, 2)])),
    ("Triple Path Cousin", 
     calculate_compound_ibd([(2, 2, 2), (3, 3, 2), (4, 4, 2)]))
]

create_relationship_chart(compound_comparisons, 'Comparison of Simple and Compound Relationships')

### Exercise 3: Modeling Compound Relationships

In this exercise, you'll implement a more sophisticated model for calculating expected IBD sharing in compound relationships, and visualize complex family structures with multiple relationship paths.

**Task:** Implement a function to create and visualize a pedigree with compound relationships, and calculate expected IBD sharing between individuals who are related in multiple ways.

**Hint:** Use a graph to represent the pedigree, find all paths between individuals, and calculate the expected IBD sharing accounting for all paths.
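
As a starting point for the hint, the sketch below finds the common ancestors of two individuals in a small parent-to-child `DiGraph` using `nx.ancestors`, then reads off `u` and `d` as path lengths from each ancestor. The toy pedigree and its node names are made up for this illustration.

```python
import networkx as nx

# Edges point from parent to child
G = nx.DiGraph()
G.add_edges_from([
    ("GF", "P1"), ("GM", "P1"),
    ("GF", "P2"), ("GM", "P2"),
    ("P1", "X"), ("P2", "Y"),
])

# Ancestors are all nodes with a directed path to the individual
common = nx.ancestors(G, "X") & nx.ancestors(G, "Y")
print(sorted(common))  # ['GF', 'GM']

for anc in sorted(common):
    u = nx.shortest_path_length(G, anc, "X")  # generations from X up to the ancestor
    d = nx.shortest_path_length(G, anc, "Y")  # generations from the ancestor down to Y
    print(anc, (u, d, 1))
```

Each common ancestor yields one `(u, d, 1)` path; X and Y are first cousins through both GF and GM. In a pedigree with loops, an ancestor can be reached by several routes, so shortest path length alone is not enough and all simple paths need to be enumerated.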

In [ ]:
# Exercise 3 code template
import networkx as nx
import matplotlib.pyplot as plt
import numpy as np
from collections import defaultdict

def create_compound_relationship_pedigree():
    """
    Create a pedigree with compound relationships.
    
    In this case, we'll create a pedigree with:
    1. Double first cousins
    2. Half-siblings who are also second cousins
    3. Multiple paths between other individuals
    
    Returns:
        G: NetworkX DiGraph representing the pedigree
    """
    # Create a directed graph for the pedigree
    G = nx.DiGraph()
    
    # Generation 1 (great-grandparents)
    G.add_node("GGF1", generation=1, sex="M")
    G.add_node("GGM1", generation=1, sex="F")
    G.add_node("GGF2", generation=1, sex="M")
    G.add_node("GGM2", generation=1, sex="F")
    
    # Generation 2 (grandparents)
    # Family 1
    G.add_node("GF1", generation=2, sex="M")
    G.add_node("GM1", generation=2, sex="F")
    G.add_node("GF2", generation=2, sex="M")
    G.add_node("GM2", generation=2, sex="F")
    
    # Family 2
    G.add_node("GF3", generation=2, sex="M")
    G.add_node("GM3", generation=2, sex="F")
    G.add_node("GF4", generation=2, sex="M")
    G.add_node("GM4", generation=2, sex="F")
    
    # Connect Generation 1 to 2
    G.add_edge("GGF1", "GF1")
    G.add_edge("GGM1", "GF1")
    G.add_edge("GGF1", "GF2")
    G.add_edge("GGM1", "GF2")
    
    G.add_edge("GGF2", "GM1")
    G.add_edge("GGM2", "GM1")
    G.add_edge("GGF2", "GM2")
    G.add_edge("GGM2", "GM2")
    
    # Generation 3 (parents)
    # Family 1
    G.add_node("F1", generation=3, sex="M")  # Child of GF1 and GM1
    G.add_node("M1", generation=3, sex="F")  # Child of GF3 and GM3
    G.add_node("F2", generation=3, sex="M")  # Child of GF2 and GM2
    G.add_node("M2", generation=3, sex="F")  # Child of GF4 and GM4
    
    # Connect Generation 2 to 3
    G.add_edge("GF1", "F1")
    G.add_edge("GM1", "F1")
    G.add_edge("GF2", "F2")
    G.add_edge("GM2", "F2")
    G.add_edge("GF3", "M1")
    G.add_edge("GM3", "M1")
    G.add_edge("GF4", "M2")
    G.add_edge("GM4", "M2")
    
    # Generation 4 (subject generation)
    G.add_node("C1", generation=4, sex="M")  # Child of F1 and M1
    G.add_node("C2", generation=4, sex="F")  # Child of F1 and M1 (sibling of C1)
    G.add_node("C3", generation=4, sex="M")  # Child of F2 and M2
    G.add_node("C4", generation=4, sex="F")  # Child of F2 and M2 (sibling of C3)
    
    # Connect Generation 3 to 4
    G.add_edge("F1", "C1")
    G.add_edge("M1", "C1")
    G.add_edge("F1", "C2")
    G.add_edge("M1", "C2")
    G.add_edge("F2", "C3")
    G.add_edge("M2", "C3")
    G.add_edge("F2", "C4")
    G.add_edge("M2", "C4")
    
    # Create a compound relationship - add an additional parent-child relationship to create 
    # half-siblings who are also related in other ways
    G.add_node("H", generation=4, sex="M")  # Half-sibling to C1 and C2
    G.add_edge("F1", "H")  # Same father as C1 and C2
    # Different mother (not shown) for H
    
    # Connect families in additional ways to create more compound relationships
    # Make GF3 and GF4 brothers (children of same great-grandparents)
    G.add_edge("GGF1", "GF3")
    G.add_edge("GGM1", "GF3")
    G.add_edge("GGF1", "GF4")
    G.add_edge("GGM1", "GF4")
    
    return G

def visualize_pedigree(G, highlight_path=None):
    """
    Visualize a pedigree graph.
    
    Args:
        G: NetworkX graph representing the pedigree
        highlight_path: Optional list of nodes to highlight as a path
    """
    plt.figure(figsize=(16, 10))
    
    # Position nodes by generation level
    pos = {}
    generation_nodes = defaultdict(list)
    
    # Group nodes by generation
    for node, data in G.nodes(data=True):
        gen = data.get('generation', 0)
        generation_nodes[gen].append(node)
    
    # Position nodes by generation (vertical) and distribute horizontally
    for gen, nodes in generation_nodes.items():
        y = -gen * 2  # Vertical position by generation
        
        for i, node in enumerate(nodes):
            x = i - len(nodes)/2  # Center nodes horizontally
            pos[node] = (x, y)
    
    # Draw regular edges
    nx.draw_networkx_edges(G, pos, width=1.0, alpha=0.5, 
                           arrowstyle='->', arrowsize=15)
    
    # Highlight specific path if provided
    if highlight_path and len(highlight_path) > 1:
        path_edges = [(highlight_path[i], highlight_path[i+1]) 
                     for i in range(len(highlight_path)-1)]
        nx.draw_networkx_edges(G, pos, edgelist=path_edges, 
                               width=3.0, edge_color='red', 
                               arrowstyle='->', arrowsize=20)
    
    # Draw nodes with different colors by sex
    male_nodes = [n for n, d in G.nodes(data=True) if d.get('sex') == 'M']
    female_nodes = [n for n, d in G.nodes(data=True) if d.get('sex') == 'F']
    
    nx.draw_networkx_nodes(G, pos, nodelist=male_nodes, 
                          node_color='skyblue', node_size=500)
    nx.draw_networkx_nodes(G, pos, nodelist=female_nodes, 
                          node_color='lightpink', node_size=500)
    
    # Draw node labels
    nx.draw_networkx_labels(G, pos, font_size=10)
    
    plt.axis('off')
    plt.title('Pedigree with Compound Relationships')
    plt.tight_layout()
    plt.show()

def find_all_relationship_paths(G, id1, id2, max_depth=8):
    """
    Find all paths that represent genealogical relationships between two individuals.
    
    Args:
        G: NetworkX graph representing the pedigree
        id1: ID of the first individual
        id2: ID of the second individual
        max_depth: Maximum path length to consider
        
    Returns:
        paths: List of paths that connect the individuals through common ancestors
    """
    # Find all ancestors of each individual
    def get_ancestors(G, node, max_depth):
        ancestors = set()
        queue = [(node, 0)]  # (node, depth)
        
        while queue:
            current, depth = queue.pop(0)
            
            if depth > max_depth:
                continue
                
            # Get all parents
            parents = list(G.predecessors(current))
            
            for parent in parents:
                ancestors.add(parent)
                queue.append((parent, depth + 1))
                
        return ancestors
    
    # Get ancestors of both individuals
    ancestors1 = get_ancestors(G, id1, max_depth)
    ancestors2 = get_ancestors(G, id2, max_depth)
    
    # Find common ancestors
    common_ancestors = ancestors1.intersection(ancestors2)
    
    if not common_ancestors:
        return []  # No common ancestors
    
    # For each common ancestor, find paths to both individuals
    all_paths = []
    
    for ancestor in common_ancestors:
        # Find all paths from ancestor to id1
        paths_to_id1 = list(nx.all_simple_paths(G, ancestor, id1, cutoff=max_depth))
        
        # Find all paths from ancestor to id2
        paths_to_id2 = list(nx.all_simple_paths(G, ancestor, id2, cutoff=max_depth))
        
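        # Combine paths to create full relationship paths
        for path1 in paths_to_id1:
            for path2 in paths_to_id2:
                # Skip pairs that share a node below the ancestor: such a pair
                # really meets at a more recent common ancestor and would
                # otherwise be counted again
                if set(path1[1:]) & set(path2[1:]):
                    continue
                # Create full path: id1 -> ... -> ancestor -> ... -> id2
                full_path = list(reversed(path1)) + path2[1:]
                all_paths.append(full_path)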
    
    return all_paths

def convert_path_to_relationship(path):
    """
    Convert a path in the pedigree to a relationship tuple.
    
    Args:
        path: List of nodes representing a path from one individual to another
              through a common ancestor
        
    Returns:
        relationship: Tuple of the form (u, d, a) representing the relationship
    """
    if not path or len(path) < 3:
        return None
    
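    # Find the position of the common ancestor (highest point in the path).
    # Taking the middle node assumes both individuals are the same number of
    # generations below the ancestor, which holds for the generation-4 test
    # pairs in this lab but not for relationships such as aunt/niece.
    ancestor_pos = len(path) // 2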
    
    # Calculate u (up from first person to common ancestor)
    u = ancestor_pos
    
    # Calculate d (down from common ancestor to second person)
    d = len(path) - ancestor_pos - 1
    
    # Common ancestor count is always 1 in this simple model
    # In a more complex model, we'd count the actual common ancestors
    a = 1
    
    return (u, d, a)

def calculate_compound_relationship(G, id1, id2):
    """
    Calculate the compound relationship between two individuals.
    
    Args:
        G: NetworkX graph representing the pedigree
        id1: ID of the first individual
        id2: ID of the second individual
        
    Returns:
        relationships: List of relationship tuples
        total_expected_ibd: Total expected IBD sharing
    """
    # Find all relationship paths
    all_paths = find_all_relationship_paths(G, id1, id2)
    
    if not all_paths:
        return [], 0
    
    # Convert paths to relationships
    relationships = []
    for path in all_paths:
        rel = convert_path_to_relationship(path)
        if rel is not None:
            relationships.append((rel, path))
    
    # Calculate expected IBD for each relationship
    expected_ibds = []
    for rel, path in relationships:
        expected_ibd = calculate_expected_ibd(rel)
        expected_ibds.append((rel, path, expected_ibd))
    
    # Calculate total expected IBD
    rel_tuples = [rel for rel, _, _ in expected_ibds]
    total_expected_ibd = calculate_compound_ibd(rel_tuples)
    
    return expected_ibds, total_expected_ibd

# Create and visualize the pedigree
pedigree = create_compound_relationship_pedigree()
visualize_pedigree(pedigree)

# Test pairs to analyze
test_pairs = [
    ("C1", "C3"),  # Second cousins through several paths (F1 and F2 are double first cousins)
    ("C1", "H"),   # Half-siblings
    ("C2", "C4"),  # Second cousins through several paths
    ("C1", "C2"),  # Full siblings
    ("H", "C3"),   # Complex relationship
]

# Analyze each pair
for id1, id2 in test_pairs:
    print(f"\nAnalyzing relationship between {id1} and {id2}:")
    
    # Calculate compound relationship
    relationships, total_ibd = calculate_compound_relationship(pedigree, id1, id2)
    
    if not relationships:
        print(f"No relationship found between {id1} and {id2}")
        continue
    
    # Display each path and its expected IBD
    print(f"Found {len(relationships)} relationship paths:")
    for i, (rel, path, expected_ibd) in enumerate(relationships):
        rel_name = classify_relationship(rel)
        print(f"  Path {i+1}: {rel} - {rel_name}")
        print(f"    Path: {' -> '.join(path)}")
        print(f"    Expected IBD: {expected_ibd:.2f} cM")
    
    print(f"Total expected IBD: {total_ibd:.2f} cM")
    
    # Visualize the first path for this pair
    if relationships:
        visualize_pedigree(pedigree, highlight_path=relationships[0][1])

## Real-World Application: Endogamous Populations

Compound relationships are particularly common in endogamous populations, which are groups with a history of marriage within a relatively small community. Examples include:

1. **Island communities**: Small island populations with limited geographic mobility
2. **Religious isolates**: Communities that marry predominantly within their faith
3. **Cultural or ethnic groups**: Populations that maintain marriage patterns within their group

In these populations, genetic genealogy presents unique challenges:

- **Elevated IBD sharing**: Individuals tend to share more DNA than expected for their stated relationship
- **Multiple relationship paths**: Many pairs of individuals are related in numerous ways
- **Ambiguous relationship inference**: The standard relationship models may struggle to distinguish between different relationship types
- **Complex pedigrees**: Family trees contain many loops and interconnections

When working with endogamous populations, Bonsai v3 makes several adjustments:

1. **Calibrating background IBD**: Adjusting the expected background level of IBD sharing
2. **Considering multiple paths**: Accounting for all possible relationship connections
3. **Using additional information**: Incorporating age, geography, and historical context
4. **Probabilistic approaches**: Using statistical methods that express uncertainty appropriately

A key principle in these cases is to not over-interpret the genetic data and to carefully combine it with traditional genealogical research.
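
The effect of calibration on a probabilistic call can be sketched with a toy Gaussian model. Everything here is an assumption for illustration: the means and standard deviations are the same illustrative values used in this lab's simulation cell, priors are equal, and scaling the mean by `endogamy_factor` is a crude stand-in for background calibration, not Bonsai's actual likelihood.

```python
import math

# (mean cM, std cM): illustrative values, matching this lab's simulation cell
MODELS = {
    "2nd Cousins": (212, 70),
    "3rd Cousins": (53, 30),
    "4th Cousins": (13, 10),
}

def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def relationship_posteriors(observed_cm, endogamy_factor=1.0):
    """Normalised likelihoods under equal priors; means scaled by endogamy_factor."""
    likes = {
        name: normal_pdf(observed_cm, mean * endogamy_factor,
                         std * math.sqrt(endogamy_factor))
        for name, (mean, std) in MODELS.items()
    }
    total = sum(likes.values())
    return {name: like / total for name, like in likes.items()}

for factor in (1.0, 2.0):
    post = relationship_posteriors(140, endogamy_factor=factor)
    best = max(post, key=post.get)
    print(f"factor={factor}: most likely {best}")
```

With 140 cM observed, the uncalibrated model favours second cousins while the model with a factor of 2 favours third cousins, which illustrates how ignoring background sharing can make a relationship look closer than it is.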

In [ ]:
# Simulation of IBD sharing in endogamous vs. non-endogamous populations

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import random

# Set random seed for reproducibility
np.random.seed(42)
random.seed(42)

def simulate_endogamous_population(n_pairs=100, endogamy_factor=1.5):
    """
    Simulate IBD sharing in an endogamous population compared to a non-endogamous population.
    
    Args:
        n_pairs: Number of pairs to simulate for each relationship type
        endogamy_factor: Factor by which to increase IBD sharing in endogamous population
        
    Returns:
        DataFrame with simulated IBD sharing data
    """
    # Define relationship types and their expected IBD sharing
    relationships = {
        "2nd Cousins": {"mean_cm": 212, "std_cm": 70},
        "3rd Cousins": {"mean_cm": 53, "std_cm": 30},
        "4th Cousins": {"mean_cm": 13, "std_cm": 10},
        "5th Cousins": {"mean_cm": 3, "std_cm": 2}
    }
    
    # Initialize data for the simulation
    data = []
    
    # Generate data for non-endogamous population
    for rel_type, params in relationships.items():
        for _ in range(n_pairs):
            # Generate normal random IBD sharing
            ibd = max(0, np.random.normal(params["mean_cm"], params["std_cm"]))
            
            data.append({
                "relationship": rel_type,
                "population": "Non-Endogamous",
                "ibd_cm": ibd
            })
    
    # Generate data for endogamous population
    for rel_type, params in relationships.items():
        for _ in range(n_pairs):
            # Increase mean and std by endogamy factor
            endogamous_mean = params["mean_cm"] * endogamy_factor
            endogamous_std = params["std_cm"] * np.sqrt(endogamy_factor)
            
            # Generate normal random IBD sharing
            ibd = max(0, np.random.normal(endogamous_mean, endogamous_std))
            
            data.append({
                "relationship": rel_type,
                "population": "Endogamous",
                "ibd_cm": ibd
            })
    
    return pd.DataFrame(data)

# Simulate data
endogamous_data = simulate_endogamous_population(n_pairs=200, endogamy_factor=2.0)

# Create boxplots to compare IBD sharing
plt.figure(figsize=(14, 8))
sns.boxplot(x="relationship", y="ibd_cm", hue="population", data=endogamous_data)
plt.title("IBD Sharing in Endogamous vs. Non-Endogamous Populations")
plt.xlabel("Relationship Type")
plt.ylabel("IBD Sharing (cM)")
plt.ylim(0, 500)  # Adjust y-axis for better visibility
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

# Create violin plots for more detailed distribution visualization
plt.figure(figsize=(14, 8))
sns.violinplot(x="relationship", y="ibd_cm", hue="population", data=endogamous_data, split=True)
plt.title("Distribution of IBD Sharing in Endogamous vs. Non-Endogamous Populations")
plt.xlabel("Relationship Type")
plt.ylabel("IBD Sharing (cM)")
plt.ylim(0, 500)  # Adjust y-axis for better visibility
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

# Calculate and display summary statistics
summary = endogamous_data.groupby(['relationship', 'population'])['ibd_cm'].agg(['mean', 'std', 'min', 'max'])
print("Summary Statistics of IBD Sharing:\n")
print(summary)

# Calculate overlap percentages between endogamous and non-endogamous distributions
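print("\nOverlap Analysis - Risk of Misclassification:")
# The `relationships` dict is local to simulate_endogamous_population,
# so take the relationship labels from the simulated data instead
for rel_type in endogamous_data['relationship'].unique():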
    # Get data for this relationship type
    non_endo = endogamous_data[(endogamous_data['relationship'] == rel_type) & 
                              (endogamous_data['population'] == 'Non-Endogamous')]['ibd_cm'].values
    endo = endogamous_data[(endogamous_data['relationship'] == rel_type) & 
                          (endogamous_data['population'] == 'Endogamous')]['ibd_cm'].values
    
    # Calculate 10th and 90th percentiles
    non_endo_10th = np.percentile(non_endo, 10)
    non_endo_90th = np.percentile(non_endo, 90)
    endo_10th = np.percentile(endo, 10)
    endo_90th = np.percentile(endo, 90)
    
    # Calculate percentage of endogamous values falling in non-endogamous range
    misclass_pct = np.mean((endo >= non_endo_10th) & (endo <= non_endo_90th)) * 100
    
    print(f"{rel_type}: {misclass_pct:.1f}% of endogamous pairs could be misclassified as non-endogamous")

## Self-Assessment Questions

Test your understanding with these questions:

1. In Bonsai's relationship tuple format (u, d, a), what do each of the three components represent, and how would you represent a parent-child relationship?

2. How does Bonsai calculate the degree of relationship from a relationship tuple, and why is it useful?

3. What is the key challenge posed by compound relationships for genetic genealogy, and how does Bonsai address it?

4. When computing expected IBD sharing for two individuals who are related through multiple paths, why can't we simply add the expected sharing from each path?

5. What adjustments are necessary when analyzing relationships in endogamous populations compared to non-endogamous ones?

*Answers to self-assessment questions can be found at the end of the lab document.*

## Summary

In this lab, we explored how Bonsai v3 handles complex relationship patterns through its `relationships.py` module. Key takeaways include:

1. **Relationship Representation**: Bonsai uses the (u, d, a) tuple format to represent genealogical relationships, where u is the number of generations up, d is the number of generations down, and a is the number of common ancestors.

2. **Relationship Calculations**: The module provides functions to calculate relationship degrees, join relationships through common individuals, and find transitive relationships along chains of relatives.

3. **Compound Relationships**: Individuals can be related in multiple ways (like double first cousins), and Bonsai handles these by considering all paths of relationship and their combined genetic sharing.

4. **Endogamous Populations**: In populations with high rates of intermarriage, Bonsai adjusts its models to account for elevated background IBD sharing and multiple relationship paths.

5. **Relationship Uncertainty**: Complex relationships introduce uncertainty, and Bonsai uses probabilistic approaches to express confidence in its relationship predictions.

### Connections to Other Labs

The concepts covered in this lab connect to:
- **Lab 6: Probabilistic Relationship Inference** - Using likelihood models to infer relationships
- **Lab 12: Relationship Assessment** - Evaluating and classifying relationships
- **Lab 16: Merging Pedigrees** - Complex relationships affect how pedigrees can be merged
- **Lab 23: Handling Twins** - Twins are a special case of close relationships

### Further Reading

To deepen your understanding of these topics, consider exploring:

- Huff, C. D., et al. (2011). "Maximum-likelihood estimation of recent shared ancestry (ERSA)." *Genome Research*, 21(5), 768-774.
- Browning, S. R., & Browning, B. L. (2012). "Identity by descent between distant relatives: Detection and applications." *Annual Review of Genetics*, 46, 617-633.
- Baran, Y., et al. (2012). "Fast and accurate inference of local ancestry in Latino populations." *Bioinformatics*, 28(10), 1359-1367.
- Staples, J., et al. (2014). "PRIMUS: Rapid reconstruction of pedigrees from genome-wide estimates of identity by descent." *American Journal of Human Genetics*, 95(5), 553-564.

---

## Answer Key (for instructors)

### Exercise 1
The solution code for the relationship representation and conversion functions is provided in the notebook. The key aspects include:

- Converting between (u, d, a) and (m, a) formats
- Calculating relationship degrees using the formula `degree = u + d - a + 1`
- Properly classifying relationships based on their (u, d, a) tuples

### Exercise 2
The solution code for finding relationships in a pedigree is provided in the notebook. The key aspects include:

- Building a graph representation of the pedigree
- Finding paths between individuals
- Converting paths to relationship tuples
- Using `get_transitive_rel` to compute relationships along chains
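
The path-to-tuple step can be sketched without any Bonsai calls. The following is a minimal illustration, not the notebook's solution or Bonsai's implementation: it assumes a hypothetical `parents` dictionary mapping each individual to a list of known parents, and the helper names `ancestor_depths` and `closest_rel_tuple` are invented for this example. It reports only the closest connection, so compound relationships (Exercise 3) would need every path enumerated instead.

```python
from collections import deque


def ancestor_depths(parents, person):
    """Return {ancestor: generations up}, with the person at depth 0."""
    depths = {person: 0}
    queue = deque([person])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, ()):
            if parent not in depths:  # breadth-first, so first visit is shortest
                depths[parent] = depths[current] + 1
                queue.append(parent)
    return depths


def closest_rel_tuple(parents, id1, id2):
    """Return (u, d, a) for the closest common ancestors, or None if unrelated."""
    up1 = ancestor_depths(parents, id1)
    up2 = ancestor_depths(parents, id2)
    common = set(up1) & set(up2)
    if not common:
        return None
    pairs = {c: (up1[c], up2[c]) for c in common}
    # Pick the shortest total path; break ties by the smaller "up" count.
    u, d = min(pairs.values(), key=lambda ud: (ud[0] + ud[1], ud[0]))
    a = sum(1 for ud in pairs.values() if ud == (u, d))
    return (u, d, a)


# Hypothetical toy pedigree: child -> list of parents
parents = {
    "p1": ["gp1", "gp2"],
    "p2": ["gp1", "gp2"],
    "c1": ["p1", "x1"],
    "c2": ["p2", "x2"],
    "h1": ["p1", "x3"],
}

for pair in [("p1", "c1"), ("p1", "p2"), ("c1", "h1"), ("c1", "c2"), ("x1", "x2")]:
    print(pair, closest_rel_tuple(parents, *pair))
```

In this toy pedigree, `p1` and `c1` come out as parent-child, `p1` and `p2` as full siblings, `c1` and `h1` as half siblings, and `c1` and `c2` as first cousins.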

### Exercise 3
The solution code for modeling compound relationships is provided in the notebook. The key aspects include:

- Creating a pedigree with compound relationships
- Finding all relationship paths between individuals
- Converting each path to a relationship tuple
- Calculating expected IBD sharing for each path
- Computing total expected IBD sharing for the compound relationship

### Self-Assessment Answers

1. In Bonsai's relationship tuple format (u, d, a):
   - `u` (up): Number of generations to go up from the first person to a common ancestor
   - `d` (down): Number of generations to go down from the ancestor to the second person
   - `a` (ancestors): Number of common ancestors connecting the two people
   - A parent-child relationship, listed parent first, is (0, 1, 1): zero generations up from the parent (who is the common ancestor), one generation down to the child, and one common ancestor. Listed child first, the same relationship is (1, 0, 1).

2. Bonsai calculates the degree of relationship using the formula `degree = u + d - a + 1`. This is useful because:
   - It provides a standardized measure of relationship closeness
   - Relationships with the same degree have similar genetic sharing expectations
   - It allows for comparing and sorting relationships of different types
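
As a quick check of the formula, the sketch below applies `degree = u + d - a + 1` to a few familiar relationships. The function name `degree_from_tuple` and the example dictionary are hypothetical and written for this answer key; they are not taken from `relationships.py`, and special cases such as self or unrelated pairs are not handled.

```python
def degree_from_tuple(rel_tuple):
    """Degree of relationship from a (u, d, a) tuple using u + d - a + 1."""
    u, d, a = rel_tuple
    return u + d - a + 1


# Illustrative (u, d, a) tuples for common relationships
examples = {
    "parent-child": (0, 1, 1),
    "full siblings": (1, 1, 2),
    "half siblings": (1, 1, 1),
    "grandparent-grandchild": (0, 2, 1),
    "avuncular": (1, 2, 2),
    "first cousins": (2, 2, 2),
}

for name, rel in examples.items():
    print(f"{name:24s} {rel} -> degree {degree_from_tuple(rel)}")
```

Parent-child and full siblings both come out as degree 1, half siblings, grandparent-grandchild, and avuncular pairs as degree 2, and first cousins as degree 3. This grouping is what makes degree useful as a coarse measure of closeness.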

3. The key challenge of compound relationships is that individuals may be related through multiple paths, which increases their genetic sharing beyond what would be expected for a single relationship. Bonsai addresses this by:
   - Identifying all paths of relationship between individuals
   - Calculating expected genetic sharing for each path
   - Modeling the combined sharing so that segments inherited through different paths are not double-counted where they overlap
   - Using these multiple paths in pedigree construction and relationship inference

4. We can't simply add expected IBD sharing from multiple paths because:
   - Segments shared through different paths may overlap
   - The total sharing can't exceed the genome length (3545 cM for autosomes)
   - Simple addition would overestimate the expected sharing
   - The statistical distribution of sharing becomes more complex
   - A more sophisticated model is needed to account for the correlation between segments
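
The overlap argument can be made concrete with a simplified calculation. The sketch below is not Bonsai's model: it assumes each path contributes an independent per-locus probability of sharing (reasonable when the paths run through different parents of both individuals), and the function names are hypothetical. Under that assumption, the fraction of the genome covered by at least one shared segment is one minus the product of the uncovered fractions, which is never larger than the naive sum.

```python
GENOME_CM = 3545  # autosomal genome length quoted in the answer above


def naive_sum(path_fractions):
    """Add per-path coverage fractions, ignoring any overlap."""
    return sum(path_fractions)


def union_coverage(path_fractions):
    """Fraction of the genome covered by at least one path.

    Assumes the paths are independent at each locus (an illustrative
    simplification, not the model Bonsai uses).
    """
    uncovered = 1.0
    for fraction in path_fractions:
        uncovered *= 1.0 - fraction
    return 1.0 - uncovered


# Double first cousins: two first-cousin paths, each covering 1/4 of the genome
double_first_cousins = [0.25, 0.25]
# Full siblings: one path through each parent, each covering 1/2 of the genome
full_siblings = [0.5, 0.5]

for label, paths in [("double first cousins", double_first_cousins),
                     ("full siblings", full_siblings)]:
    added = naive_sum(paths)
    covered = union_coverage(paths)
    print(f"{label}: naive sum {added:.4f} ({added * GENOME_CM:.0f} cM), "
          f"union {covered:.4f} ({covered * GENOME_CM:.0f} cM)")
```

For double first cousins the naive sum is 0.5 while the union is 0.4375. For full siblings the naive sum claims the whole genome, while the union gives the familiar 0.75. In both cases the gap is the portion of the genome shared on both haplotypes at once.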

5. When analyzing relationships in endogamous populations, necessary adjustments include:
   - Calibrating for elevated background IBD sharing across the population
   - Adjusting relationship likelihood models to account for multiple relationship paths
   - Using additional non-genetic information (age, geography, historical records)
   - Expressing greater uncertainty in relationship predictions
   - Considering many more potential relationship types when making inferences
   - Using more conservative confidence thresholds for relationship classification

In [ ]:
# Optional: Convert this notebook to PDF
# Uncomment and run this cell if you want to generate a PDF version

# !jupyter nbconvert --to pdf "Lab24_Complex_Relationships.ipynb"