# Lab 6: Pedigree Data Structures Implementation

## Overview

This lab explores the core pedigree data structures used in Bonsai v3. We'll examine how family relationships are represented computationally and how these representations facilitate efficient pedigree operations. Key topics include:

1. The `up_node_dict` and `down_node_dict` fundamental data structures
2. Key algorithms for navigating and manipulating pedigree structures
3. Finding common ancestors, descendants, and relationship paths
4. Working with the core functions in the `pedigrees.py` module

In [None]:
# Standard imports
import os
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import inspect
import importlib
import json
from IPython.display import display, HTML, Markdown
import warnings
warnings.filterwarnings('ignore')

sys.path.append(os.path.dirname(os.getcwd()))

# Cross-compatibility setup
from scripts_support.lab_cross_compatibility import setup_environment, is_jupyterlite, save_results, save_plot

# Set up environment-specific paths
DATA_DIR, RESULTS_DIR = setup_environment()

# Set visualization styles
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_context("notebook")

In [None]:
# Setup Bonsai module paths
if not is_jupyterlite():
    # In local environment, add the utils directory to system path
    utils_dir = os.getenv('PROJECT_UTILS_DIR', os.path.join(os.path.dirname(DATA_DIR), 'utils'))
    bonsaitree_dir = os.path.join(utils_dir, 'bonsaitree')
    
    # Add to path if it exists and isn't already there
    if os.path.exists(bonsaitree_dir) and bonsaitree_dir not in sys.path:
        sys.path.append(bonsaitree_dir)
        print(f"Added {bonsaitree_dir} to sys.path")
else:
    # In JupyterLite, use a simplified approach
    print("⚠️ Running in JupyterLite: Some Bonsai functionality may be limited.")
    print("This notebook is primarily designed for local execution where the Bonsai codebase is available.")

In [None]:
# Helper functions for exploring modules
def display_module_classes(module_name):
    """Display classes and their docstrings from a module"""
    try:
        # Import the module
        module = importlib.import_module(module_name)
        
        # Find all classes
        classes = inspect.getmembers(module, inspect.isclass)
        
        # Filter classes defined in this module (not imported)
        classes = [(name, cls) for name, cls in classes if cls.__module__ == module_name]
        
        # Print info for each class
        for name, cls in classes:
            print(f"\n## {name}")
            
            # Get docstring
            doc = inspect.getdoc(cls)
            if doc:
                print(f"Docstring: {doc}")
            else:
                print("No docstring available")
            
            # Get methods
            methods = inspect.getmembers(cls, inspect.isfunction)
            if methods:
                print("\nMethods:")
                for method_name, method in methods:
                    if not method_name.startswith('_'):  # Skip private methods
                        print(f"- {method_name}")
    except ImportError as e:
        print(f"Error importing module {module_name}: {e}")
    except Exception as e:
        print(f"Error processing module {module_name}: {e}")

def display_module_functions(module_name):
    """Display functions and their docstrings from a module"""
    try:
        # Import the module
        module = importlib.import_module(module_name)
        
        # Find all functions
        functions = inspect.getmembers(module, inspect.isfunction)
        
        # Filter functions defined in this module (not imported)
        functions = [(name, func) for name, func in functions if func.__module__ == module_name]
        
        # Print info for each function
        for name, func in functions:
            if name.startswith('_'):  # Skip private functions
                continue
                
            print(f"\n## {name}")
            
            # Get signature
            sig = inspect.signature(func)
            print(f"Signature: {name}{sig}")
            
            # Get docstring
            doc = inspect.getdoc(func)
            if doc:
                print(f"Docstring: {doc}")
            else:
                print("No docstring available")
    except ImportError as e:
        print(f"Error importing module {module_name}: {e}")
    except Exception as e:
        print(f"Error processing module {module_name}: {e}")

def view_function_source(module_name, function_name):
    """Display the source code of a function"""
    try:
        # Import the module
        module = importlib.import_module(module_name)
        
        # Get the function
        func = getattr(module, function_name)
        
        # Get the source code
        source = inspect.getsource(func)
        
        # Print the source code
        from IPython.display import display, Markdown
        display(Markdown(f"```python\n{source}\n```"))
    except ImportError as e:
        # Re-raise so the calling cell's except branch can define a simplified fallback
        raise ImportError(f"Error importing module {module_name}: {e}") from e
    except AttributeError as e:
        raise AttributeError(f"Function {function_name} not found in module {module_name}") from e

## Importing Bonsai Modules

Let's start by importing the relevant Bonsai v3 modules for pedigree data structures:

In [2]:
try:
    # Import Bonsai v3 modules
    from utils.bonsaitree.bonsaitree.v3 import pedigrees
    from utils.bonsaitree.bonsaitree.v3 import relationships
    
    print("✅ Successfully imported Bonsai v3 modules")
except ImportError as e:
    print(f"❌ Failed to import Bonsai v3 modules: {e}")
    print("This lab requires access to the Bonsai v3 codebase.")

❌ Failed to import Bonsai v3 modules: No module named 'utils'
This lab requires access to the Bonsai v3 codebase.


## Core Pedigree Data Structures

Bonsai v3 represents pedigrees primarily using two key data structures:

1. **up_node_dict**: A dictionary mapping each individual to their parents
2. **down_node_dict**: A dictionary mapping each individual to their children

Let's examine these data structures and how they're used in Bonsai v3.

In [None]:
# Let's look at the key functions for manipulating pedigree data structures
try:
    display_module_functions('utils.bonsaitree.bonsaitree.v3.pedigrees')
except Exception as e:
    print(f"Could not display pedigrees module functions: {e}")
    print("\nThe pedigrees module contains core functions for representing and manipulating pedigree structures")

### The up_node_dict and down_node_dict

Let's create a simple pedigree to understand these data structures:

In [None]:
def create_example_pedigree():
    r"""
    Create a simple example pedigree to work with.
    
    The pedigree represents a family with grandparents (1, 2, 3, 4),
    parents (5, 6), and children (7, 8).
    
    Structure:
    
         1    2      3    4
          \  /        \  /
           5           6
            \         /
             \       /
              7     8
    
    Note: Positive IDs (>0) represent genotyped individuals (real individuals with DNA data),
    while negative IDs (<0) would represent ungenotyped individuals (inferred ancestors).
    """
    # Create the up_node_dict (mapping of individuals to their parents)
    up_node_dict = {
        # Individual 1 and 2 are founders (no parents)
        1: {},
        2: {},
        # Individual 3 and 4 are also founders
        3: {},
        4: {},
        # Individual 5 has parents 1 and 2
        5: {1: 1, 2: 1},  # "1" here represents 1 meiosis away (direct parent-child)
        # Individual 6 has parents 3 and 4
        6: {3: 1, 4: 1},
        # Individual 7 has parents 5 and 6
        7: {5: 1, 6: 1},
        # Individual 8 also has parents 5 and 6 (siblings)
        8: {5: 1, 6: 1}
    }
    
    return up_node_dict

# Create an example pedigree
up_node_dict = create_example_pedigree()
print("Example pedigree (up_node_dict):")
for individual, parents in up_node_dict.items():
    if parents:
        parent_list = [f"{parent} (d={degree})" for parent, degree in parents.items()]
        print(f"Individual {individual} has parents: {', '.join(parent_list)}")
    else:
        print(f"Individual {individual} is a founder (no parents)")

Now, let's convert the `up_node_dict` to a `down_node_dict` using Bonsai's `reverse_node_dict` function:

In [None]:
# Let's examine the reverse_node_dict function
try:
    view_function_source('utils.bonsaitree.bonsaitree.v3.pedigrees', 'reverse_node_dict')
except Exception as e:
    print(f"Could not display the function: {e}")
    # Simplified implementation
    def reverse_node_dict(dct):
        """
        Reverse a node dict. If it's a down dict make it an up dict
        and vice versa.
        """
        rev_dct = {}
        for i, info in dct.items():
            for a, d in info.items():
                if a not in rev_dct:
                    rev_dct[a] = {}
                rev_dct[a][i] = d
        return rev_dct

In [None]:
# Convert up_node_dict to down_node_dict
try:
    down_node_dict = pedigrees.reverse_node_dict(up_node_dict)
except Exception as e:
    # Use our simplified implementation if Bonsai isn't available
    down_node_dict = reverse_node_dict(up_node_dict)

print("Example pedigree (down_node_dict):")
for individual, children in down_node_dict.items():
    if children:
        child_list = [f"{child} (d={degree})" for child, degree in children.items()]
        print(f"Individual {individual} has children: {', '.join(child_list)}")
    else:
        print(f"Individual {individual} has no children")
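
One detail worth checking by hand is what happens to individuals whose entry is empty. The sketch below re-implements the reversal under the hypothetical name `reverse_node_dict_sketch` (a stand-in written for this note, not Bonsai's function) and shows that individuals with no children never become keys in the reversed dictionary. Whether Bonsai's own `reverse_node_dict` behaves identically is an assumption here.

```python
def reverse_node_dict_sketch(dct):
    """Swap the direction of every edge in a node dict (hypothetical helper)."""
    rev = {}
    for node, neighbors in dct.items():
        for neighbor, degree in neighbors.items():
            rev.setdefault(neighbor, {})[node] = degree
    return rev

# Same structure as the example pedigree in this lab
up = {
    1: {}, 2: {}, 3: {}, 4: {},
    5: {1: 1, 2: 1},
    6: {3: 1, 4: 1},
    7: {5: 1, 6: 1},
    8: {5: 1, 6: 1},
}

down = reverse_node_dict_sketch(up)
# Individuals that have no children are absent from the reversed dict
missing_from_down = set(up) - set(down)

# Reversing again loses the founders, whose entries were empty in `up`
round_trip = reverse_node_dict_sketch(down)
missing_from_round_trip = set(up) - set(round_trip)

print("Children of 5:", down[5])
print("Not in down dict:", sorted(missing_from_down))
print("Lost after round trip:", sorted(missing_from_round_trip))
```

Code that iterates over a reversed dictionary should therefore use `.get(node, {})` rather than direct indexing, as the traversal functions in this lab do.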

### Visualizing the Pedigree

Let's create a simple visualization of our pedigree using NetworkX:

In [None]:
try:
    import networkx as nx
    
    def visualize_pedigree(up_dict):
        """Create a visualization of the pedigree using NetworkX"""
        G = nx.DiGraph()
        
        # Add all individuals as nodes
        for individual in up_dict.keys():
            G.add_node(individual)
        
        # Add edges from parents to children
        for child, parents in up_dict.items():
            for parent in parents:
                G.add_edge(parent, child)
        
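        # Assign each node a generation: founders are 0, children are 1 + their deepest parent
        for node in nx.topological_sort(G):
            G.nodes[node]["layer"] = max((G.nodes[p]["layer"] + 1 for p in G.predecessors(node)), default=0)

        # Create positions based on generational structure (layered layout)
        pos = nx.multipartite_layout(G, subset_key="layer", align="horizontal")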
        
        # Draw the pedigree
        plt.figure(figsize=(10, 7))
        nx.draw(G, pos, with_labels=True, node_color='lightblue', 
                node_size=500, arrowsize=20, font_size=12, font_weight='bold')
        plt.title("Family Pedigree")
        plt.show()
    
    # Visualize our example pedigree
    visualize_pedigree(up_node_dict)
except ImportError:
    print("NetworkX is not available in this environment for visualization.")
    print("Here's a text representation of the pedigree:")
    print(r"""
     1    2      3    4
      \  /        \  /
       5           6
        \         /
         \       /
          7     8
    """)

## Key Algorithms for Pedigree Navigation

Now, let's explore some of the key algorithms used for navigating and analyzing pedigrees in Bonsai v3.

### 1. Finding Sets of Related Individuals

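One common operation is finding everyone connected to a given person in one direction: ancestors when traversing the `up_node_dict`, descendants when traversing the `down_node_dict`. In the simplified implementation below, the returned set also includes the starting individual: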

In [None]:
# Let's look at functions for finding related individuals
try:
    view_function_source('utils.bonsaitree.bonsaitree.v3.pedigrees', 'get_rel_set')
except Exception as e:
    print(f"Could not display the function: {e}")
    # Simplified implementation of get_rel_set
    def get_rel_set(node_dict, i):
        """Get a set of all related individuals"""
        rel_set = {i}  # Start with the individual
        for parent in node_dict.get(i, {}):
            # Recursively add all relatives of each parent
            rel_set |= get_rel_set(node_dict, parent)
        return rel_set

In [None]:
# Find all ancestors of individual 7
try:
    ancestors = pedigrees.get_rel_set(up_node_dict, 7)
except Exception as e:
    # Use our simplified implementation
    ancestors = get_rel_set(up_node_dict, 7)
    
print(f"Ancestors of individual 7: {ancestors}")

# Find all descendants of individual 5
try:
    descendants = pedigrees.get_rel_set(down_node_dict, 5)
except Exception as e:
    # Use our simplified implementation
    descendants = get_rel_set(down_node_dict, 5)
    
print(f"Descendants of individual 5: {descendants}")
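
The traversal above reports only who is related, not how far away they are. A variant that also records the number of generations to each relative is sketched below. `get_rel_depths` is a hypothetical helper written for this note, not a function from `pedigrees.py`, and it assumes the same dictionary layout as the example pedigree.

```python
from collections import deque

def get_rel_depths(node_dict, start):
    """Map each relative reachable from start to its minimum distance in generations."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        # .get() handles individuals that have no entry in the dict
        for relative in node_dict.get(current, {}):
            if relative not in depths:
                depths[relative] = depths[current] + 1
                queue.append(relative)
    return depths

up = {
    1: {}, 2: {}, 3: {}, 4: {},
    5: {1: 1, 2: 1},
    6: {3: 1, 4: 1},
    7: {5: 1, 6: 1},
    8: {5: 1, 6: 1},
}

ancestor_depths = get_rel_depths(up, 7)
print(ancestor_depths)
```

Because the search is breadth-first, each relative is assigned the shortest distance at which it is first reached, and the iterative loop avoids deep recursion on tall pedigrees.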

### 2. Finding Relationship Paths Between Individuals

Another important operation is finding all possible paths between two individuals, which helps determine their relationship:

In [None]:
# Let's explore the function for finding paths between individuals
try:
    view_function_source('bonsaitree.bonsaitree.v3.pedigrees', 'get_all_paths')
except Exception as e:
    print(f"Could not display the function: {e}")
    # Simplified implementation
    def get_all_paths(up_node_dict, i, j):
        """Find paths between individuals i and j through their most recent common ancestors"""
        # This is a simplified version: it assumes no inbreeding, so each ancestor
        # is reached by a single lineage. The real function handles more cases.
        def lineages(node):
            """Map each ancestor of node (including node) to one upward path from node"""
            found, stack = {node: (node,)}, [node]
            while stack:
                current = stack.pop()
                for parent in up_node_dict.get(current, {}):
                    if parent not in found:
                        found[parent] = found[current] + (parent,)
                        stack.append(parent)
            return found

        i_lineages, j_lineages = lineages(i), lineages(j)

        # Find common ancestors
        common_ancestors = set(i_lineages) & set(j_lineages)

        if not common_ancestors:
            return set(), set()  # No paths found, no MRCAs

        # An MRCA is a common ancestor that is not an ancestor of another common ancestor
        mrcas = {a for a in common_ancestors
                 if not any(a != c and a in lineages(c) for c in common_ancestors)}

        # Join i's upward lineage to j's reversed lineage at each MRCA: i -> ... -> mrca -> ... -> j
        paths = {i_lineages[a] + j_lineages[a][-2::-1] for a in mrcas}

        return paths, mrcas

In [None]:
# Find all paths between individuals 7 and 8 (siblings)
try:
    paths, mrcas = pedigrees.get_all_paths(up_node_dict, 7, 8)
except Exception as e:
    # Use our simplified implementation
    paths, mrcas = get_all_paths(up_node_dict, 7, 8)
    
print(f"Paths between individual 7 and 8: {paths}")
print(f"Most recent common ancestors (MRCAs): {mrcas}")

# Find paths between individuals 7 and 1 (grandparent-grandchild)
try:
    paths, mrcas = pedigrees.get_all_paths(up_node_dict, 7, 1)
except Exception as e:
    # Use our simplified implementation
    paths, mrcas = get_all_paths(up_node_dict, 7, 1)
    
print(f"\nPaths between individual 7 and 1: {paths}")
print(f"Most recent common ancestors (MRCAs): {mrcas}")

### 3. Determining Relationship Types

Bonsai represents relationship types using tuples of the form `(up, down, num_ancs)`, where:
- `up`: Number of generations up from the first individual to the common ancestor
- `down`: Number of generations down from the common ancestor to the second individual
- `num_ancs`: Number of common ancestors (1 or 2)
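
To make the tuple format concrete, the sketch below converts a relationship tuple into the expected fraction of the genome shared (the coefficient of relationship). `expected_relatedness` is a hypothetical helper, not part of Bonsai. It assumes a pedigree without inbreeding, that `up` and `down` count meioses, and that direct-line relationships such as parent-child are written with `num_ancs = 1`.

```python
from fractions import Fraction

def expected_relatedness(rel_tuple):
    """Coefficient of relationship implied by an (up, down, num_ancs) tuple.

    Hypothetical helper: each meiosis halves the expected sharing, and each
    common ancestor contributes one such path.
    """
    if rel_tuple is None:
        return Fraction(0)          # unrelated
    up, down, num_ancs = rel_tuple
    if up == 0 and down == 0:
        return Fraction(1)          # self
    return num_ancs * Fraction(1, 2) ** (up + down)

examples = {
    "parent-child": (0, 1, 1),
    "full siblings": (1, 1, 2),
    "half siblings": (1, 1, 1),
    "grandparent-grandchild": (0, 2, 1),
    "first cousins": (2, 2, 2),
}
for name, rel_tuple in examples.items():
    print(f"{name}: {rel_tuple} -> {expected_relatedness(rel_tuple)}")
```

Note that full siblings and parent-child pairs both come out at 1/2 even though their tuples differ, which is one reason the tuple carries more information than a single relatedness number.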

Let's examine how Bonsai computes these relationship tuples:

In [None]:
# Let's look at how Bonsai determines relationship tuples
try:
    view_function_source('bonsaitree.bonsaitree.v3.pedigrees', 'get_simple_rel_tuple')
except Exception as e:
    print(f"Could not display the function: {e}")
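    # Simplified implementation
    def get_simple_rel_tuple(up_node_dict, i, j):
        """Determine the relationship tuple between individuals i and j"""
        if i == j:
            return (0, 0, 2)  # Self-relationship

        def ancestor_depths(node):
            """Map each ancestor of node (including node) to its meiosis count from node"""
            depths, queue = {node: 0}, [node]
            while queue:
                current = queue.pop(0)
                for parent in up_node_dict.get(current, {}):
                    if parent not in depths:
                        depths[parent] = depths[current] + 1
                        queue.append(parent)
            return depths

        i_depths, j_depths = ancestor_depths(i), ancestor_depths(j)
        common = set(i_depths) & set(j_depths)

        if not common:
            return None  # No relationship found

        # Count meioses up from i and down to j through the closest common ancestor
        # (simplified: ignores pedigrees where i and j are related through several lineages)
        up, down = min(((i_depths[a], j_depths[a]) for a in common), key=sum)

        # Number of common ancestors at that same distance (2 for full, 1 for half or direct)
        num_ancs = sum(1 for a in common if (i_depths[a], j_depths[a]) == (up, down))

        return up, down, num_ancs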

In [None]:
def describe_relationship(rel_tuple):
    """Convert a relationship tuple to a human-readable description"""
    if rel_tuple is None:
        return "Unrelated"
    
    up, down, num_ancs = rel_tuple
    
    if up == 0 and down == 0:
        return "Self"
    elif up == 0 and down == 1:
        return "Parent-Child"
    elif up == 1 and down == 0:
        return "Child-Parent"
    elif up == 1 and down == 1 and num_ancs == 2:
        return "Full Siblings"
    elif up == 1 and down == 1 and num_ancs == 1:
        return "Half Siblings"
    elif up == 0 and down == 2:
        return "Grandparent-Grandchild"
    elif up == 2 and down == 0:
        return "Grandchild-Grandparent"
    elif up == 1 and down == 2:
        return "Avuncular (Aunt/Uncle-Niece/Nephew)"
    elif up == 2 and down == 1:
        return "Avuncular (Niece/Nephew-Aunt/Uncle)"
    elif up == 2 and down == 2 and num_ancs == 2:
        return "First Cousins"
    elif up == 2 and down == 2 and num_ancs == 1:
        return "Half First Cousins"
    else:
        return f"Complex Relationship (up={up}, down={down}, num_ancs={num_ancs})"

# Analyze relationships in our example pedigree
relationships_to_check = [
    (7, 8),  # Siblings
    (7, 5),  # Child-Parent
    (1, 7),  # Grandparent-Grandchild
    (5, 6),  # Unrelated (spouses)
    (3, 8),  # Grandparent-Grandchild
]

print("Relationships in the example pedigree:")
for i, j in relationships_to_check:
    try:
        rel_tuple = pedigrees.get_simple_rel_tuple(up_node_dict, i, j)
    except Exception:
        # Use our simplified implementation
        rel_tuple = get_simple_rel_tuple(up_node_dict, i, j)
    
    rel_description = describe_relationship(rel_tuple)
    print(f"Individual {i} and {j}: {rel_description} {rel_tuple}")

## Advanced Pedigree Operations

Now let's explore some more advanced operations on pedigree structures:

### 1. Re-rooting a Pedigree

The `re_root_up_node_dict` function is used to reorient a pedigree around a specific individual:

In [None]:
# Let's examine the function for re-rooting a pedigree
try:
    view_function_source('bonsaitree.bonsaitree.v3.pedigrees', 're_root_up_node_dict')
except Exception as e:
    print(f"Could not display the function: {e}")
    # Simplified implementation
    def re_root_up_node_dict(up_dct, node):
        """Re-root an up node dict on a particular node"""
        print("Using simplified re-rooting function")
        # This is a complex operation that creates a view of the pedigree centered on a specific node
        # For simplicity, we'll just return a copy of the original in this example
        import copy
        return copy.deepcopy(up_dct)

In [None]:
# Re-root the pedigree at individual 7
try:
    rerooted_dict = pedigrees.re_root_up_node_dict(up_node_dict, 7)
except Exception as e:
    # Use simplified implementation
    rerooted_dict = re_root_up_node_dict(up_node_dict, 7)
    
print("Re-rooted pedigree (focused on individual 7):")
for individual, relatives in rerooted_dict.items():
    if relatives:
        rel_list = [f"{rel} (d={degree})" for rel, degree in relatives.items()]
        print(f"Individual {individual} connects to: {', '.join(rel_list)}")
    else:
        print(f"Individual {individual} has no connections in this view")

### 2. Finding Common Ancestors

Finding common ancestors is a critical operation for identifying relationships:

In [None]:
# Let's look at the common ancestor finding function
try:
    view_function_source('bonsaitree.bonsaitree.v3.pedigrees', 'get_common_anc_set')
except Exception as e:
    print(f"Could not display the function: {e}")
    # Simplified implementation
    def get_common_anc_set(up_dct, id_set):
        """Get all common ancestors of the nodes in id_set"""
        # Get the ancestors of the first individual
        first_id = next(iter(id_set))
        common_ancs = get_rel_set(up_dct, first_id)
        
        # For each other individual, intersect with their ancestors
        for node_id in id_set:  # node_id avoids shadowing the built-in id()
            if node_id == first_id:
                continue
            ancs = get_rel_set(up_dct, node_id)
            common_ancs &= ancs
        
        return common_ancs

In [None]:
# Find common ancestors of individuals 7 and 8
try:
    common_ancs = pedigrees.get_common_anc_set(up_node_dict, {7, 8})
except Exception as e:
    # Use simplified implementation
    common_ancs = get_common_anc_set(up_node_dict, {7, 8})
    
print(f"Common ancestors of individuals 7 and 8: {common_ancs}")

# Find common ancestors of individuals 5 and 6
try:
    common_ancs = pedigrees.get_common_anc_set(up_node_dict, {5, 6})
except Exception as e:
    # Use simplified implementation
    common_ancs = get_common_anc_set(up_node_dict, {5, 6})
    
print(f"Common ancestors of individuals 5 and 6: {common_ancs}")
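
The set of all common ancestors usually contains more people than the ones that define a relationship. For siblings 7 and 8 it includes all four grandparents, but only the parents are the most recent common ancestors. The sketch below filters a common-ancestor set down to those most recent members. Both helpers are hypothetical and written for this note, not taken from `pedigrees.py`, and they assume a non-empty `id_set`.

```python
def ancestors_of(up_dct, node):
    """Return the proper ancestors of node (the node itself is excluded)."""
    found = set()
    stack = list(up_dct.get(node, {}))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(up_dct.get(current, {}))
    return found

def most_recent_common_ancestors(up_dct, id_set):
    """Keep only common ancestors that are not ancestors of another common ancestor."""
    # Each individual counts as its own ancestor here, so a parent can be the MRCA
    # of itself and its child
    common = set.intersection(*(ancestors_of(up_dct, i) | {i} for i in id_set))
    older = set()
    for anc in common:
        older |= ancestors_of(up_dct, anc)
    return common - older

up = {
    1: {}, 2: {}, 3: {}, 4: {},
    5: {1: 1, 2: 1},
    6: {3: 1, 4: 1},
    7: {5: 1, 6: 1},
    8: {5: 1, 6: 1},
}

print(most_recent_common_ancestors(up, {7, 8}))
print(most_recent_common_ancestors(up, {5, 6}))
print(most_recent_common_ancestors(up, {7, 5}))
```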

### 3. Working with Genotyped vs. Ungenotyped Individuals

In Bonsai, positive IDs (>0) represent genotyped individuals (real individuals with DNA data), while negative IDs (<0) represent ungenotyped individuals (inferred ancestors). Let's explore functions for working with these different types of individuals:

In [None]:
# Let's look at functions for working with genotyped individuals
try:
    view_function_source('bonsaitree.bonsaitree.v3.pedigrees', 'get_gt_id_set')
except Exception as e:
    print(f"Could not display the function: {e}")
    # Simplified implementation
    def get_gt_id_set(ped):
        """Get genotyped IDs from a pedigree"""
        # Get all IDs in the pedigree
        all_ids = set(ped.keys()).union(*[set(parents.keys()) for parents in ped.values()])
        # Filter to only positive IDs (genotyped individuals)
        return {i for i in all_ids if i > 0}

In [None]:
# Create a pedigree with some ungenotyped individuals (negative IDs)
def create_pedigree_with_ungenotyped():
    r"""
    Create a pedigree with some ungenotyped (inferred) individuals
    
    Structure:
          -1    -2
            \  /
             1     2
              \   /
               \ /
                3
    
    Here -1 and -2 are ungenotyped (inferred) ancestors of individual 1.
    Individuals 1, 2, and 3 are genotyped (have DNA data).
    """
    return {
        # Individual -1 and -2 are ungenotyped founders
        -1: {},
        -2: {},
        # Individual 1 has ungenotyped parents -1 and -2
        1: {-1: 1, -2: 1},
        # Individual 2 is a founder (no parents in this pedigree)
        2: {},
        # Individual 3 has parents 1 and 2
        3: {1: 1, 2: 1}
    }

# Create the pedigree
mixed_pedigree = create_pedigree_with_ungenotyped()

# Get genotyped individuals
try:
    genotyped_ids = pedigrees.get_gt_id_set(mixed_pedigree)
except Exception as e:
    # Use simplified implementation
    genotyped_ids = get_gt_id_set(mixed_pedigree)
    
print(f"All individuals in the pedigree: {set(mixed_pedigree.keys())}")
print(f"Genotyped individuals: {genotyped_ids}")
print(f"Ungenotyped individuals: {set(mixed_pedigree.keys()) - genotyped_ids}")

### 4. Finding Founders

Founders are individuals with no parents recorded in the pedigree. They mark where each lineage in the pedigree begins:

In [None]:
# Let's look at the function for finding founders
try:
    view_function_source('bonsaitree.bonsaitree.v3.pedigrees', 'get_founder_set')
except Exception as e:
    print(f"Could not display the function: {e}")
    # Simplified implementation
    def get_founder_set(up_dct):
        """Get all pedigree founders"""
        # Founders are nodes with no parents (empty dictionaries in up_node_dict)
        return {node for node, parents in up_dct.items() if not parents}

In [None]:
# Find founders in our original pedigree
try:
    founders = pedigrees.get_founder_set(up_node_dict)
except Exception as e:
    # Use simplified implementation
    founders = get_founder_set(up_node_dict)
    
print(f"Founders in the original pedigree: {founders}")

# Find founders in the mixed pedigree
try:
    mixed_founders = pedigrees.get_founder_set(mixed_pedigree)
except Exception as e:
    # Use simplified implementation
    mixed_founders = get_founder_set(mixed_pedigree)
    
print(f"Founders in the mixed pedigree: {mixed_founders}")
print(f"Genotyped founders: {mixed_founders & genotyped_ids}")
print(f"Ungenotyped founders: {mixed_founders - genotyped_ids}")
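
Once the founders are known, every other individual can be placed in a generation relative to them. The sketch below assigns generation 0 to founders and, to everyone else, one more than their deepest parent. `assign_generations` is a hypothetical helper, not a Bonsai function, and it assumes the pedigree contains no loops (the recursion would not terminate otherwise).

```python
def assign_generations(up_dct):
    """Give founders generation 0 and everyone else 1 + the deepest parent's generation."""
    generations = {}

    def generation_of(node):
        if node not in generations:
            parents = up_dct.get(node, {})
            # For a founder the parent list is empty, so max() returns -1 and the result is 0
            generations[node] = 1 + max(map(generation_of, parents), default=-1)
        return generations[node]

    for node in up_dct:
        generation_of(node)
    return generations

# Same structure as the mixed pedigree in this lab: -1 and -2 are ungenotyped
mixed = {
    -1: {},
    -2: {},
    1: {-1: 1, -2: 1},
    2: {},
    3: {1: 1, 2: 1},
}

generations = assign_generations(mixed)
print(generations)
```

Individual 2 is a founder and so lands in generation 0, even though their partner, individual 1, is in generation 1. Generation numbers computed this way describe depth from the founders, not birth cohort.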

## Creating a Relationship Dictionary

A powerful operation is creating a comprehensive relationship dictionary that maps each pair of individuals to their relationship tuple:

In [None]:
# Let's look at the function for creating a relationship dictionary
try:
    view_function_source('bonsaitree.bonsaitree.v3.pedigrees', 'get_rel_dict')
except Exception as e:
    print(f"Could not display the function: {e}")
    # Simplified implementation
    def get_rel_dict(up_dct):
        """Get a dictionary mapping each pair of individuals to their relationship tuple"""
        from itertools import combinations
        
        # Get all individuals in the pedigree
        all_ids = set(up_dct.keys()).union(*[set(parents.keys()) for parents in up_dct.values()])
        
        # Create the dictionary
        rel_dict = {}
        for i in all_ids:
            rel_dict[i] = {}
            # Self-relationship
            rel_dict[i][i] = (0, 0, 2)
        
        # Compute relationship tuples for all pairs
        for i, j in combinations(all_ids, 2):
            # Compute relationship from i to j
            rel_tuple = get_simple_rel_tuple(up_dct, i, j)
            
            # Record the relationship
            rel_dict[i][j] = rel_tuple
            
            # Compute the reverse relationship from j to i
            if rel_tuple is None:
                rev_rel_tuple = None
            else:
                up, down, num_ancs = rel_tuple
                rev_rel_tuple = (down, up, num_ancs)  # Reverse the relationship
                
            rel_dict[j][i] = rev_rel_tuple
        
        return rel_dict

In [None]:
# Create a relationship dictionary for our original pedigree
try:
    rel_dict = pedigrees.get_rel_dict(up_node_dict)
except Exception as e:
    # Use simplified implementation
    rel_dict = get_rel_dict(up_node_dict)

# Display selected relationships
print("Relationships in the pedigree:")
for i, j in [(1, 5), (5, 7), (7, 8), (1, 7)]:
    rel_tuple = rel_dict[i][j]
    print(f"Individual {i} to {j}: {describe_relationship(rel_tuple)} {rel_tuple}")

## Manipulating Pedigree Structures

Bonsai v3 provides functions for modifying pedigree structures, which is essential for pedigree reconstruction:

In [None]:
# Let's look at functions for manipulating pedigrees
try:
    # Function for adding a parent
    view_function_source('bonsaitree.bonsaitree.v3.pedigrees', 'add_parent')
except Exception as e:
    print(f"Could not display the add_parent function: {e}")
    # Simplified implementation
    def add_parent(node, up_dct, min_id=None):
        """Add an ungenotyped parent to node in up_dct"""
        import copy
        up_dct_copy = copy.deepcopy(up_dct)
        
        # Check if node exists and has room for another parent
        if node not in up_dct_copy:
            return up_dct_copy, None
        if len(up_dct_copy[node]) >= 2:
            return up_dct_copy, None
        
        # Find the minimum ID
        if min_id is None:
            all_ids = set(up_dct.keys()).union(*[set(parents.keys()) for parents in up_dct.values()])
            min_id = min(all_ids) if all_ids else 0
            min_id = min(-1, min_id)  # Ensure negative ID
        
        # Add new parent with ID one less than min_id
        new_parent_id = min_id - 1
        up_dct_copy[node][new_parent_id] = 1
        
        return up_dct_copy, new_parent_id

In [None]:
# Create a simpler pedigree to manipulate
simple_pedigree = {
    1: {},  # A founder with no parents
    2: {1: 1},  # Child of 1
    3: {2: 1}   # Child of 2 (grandchild of 1)
}

# Add an ungenotyped parent to individual 1
try:
    updated_pedigree, new_parent = pedigrees.add_parent(1, simple_pedigree)
except Exception as e:
    # Use simplified implementation
    updated_pedigree, new_parent = add_parent(1, simple_pedigree)

print(f"Original pedigree: {simple_pedigree}")
print(f"Updated pedigree: {updated_pedigree}")
print(f"New parent ID: {new_parent}")

# Add another parent to individual 1
try:
    final_pedigree, another_parent = pedigrees.add_parent(1, updated_pedigree, min_id=new_parent)
except Exception as e:
    # Use simplified implementation
    final_pedigree, another_parent = add_parent(1, updated_pedigree, min_id=new_parent)

print(f"\nFinal pedigree: {final_pedigree}")
print(f"Second parent ID: {another_parent}")

## Pedigree Consistency and Validation

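Bonsai v3 provides various functions to check the consistency of pedigrees. Rather than call them here, the cell below implements two basic checks by hand: that nobody has more than two parents, and that nobody is their own ancestor: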

In [None]:
def check_pedigree_consistency(up_dct):
    """Perform basic checks on a pedigree structure"""
    # Check 1: Every individual should have at most 2 parents
    for individual, parents in up_dct.items():
        if len(parents) > 2:
            return False, f"Individual {individual} has more than 2 parents: {len(parents)}"
    
    # Check 2: No loops (a person can't be their own ancestor)
    for individual in up_dct:
        ancestors = set()
        to_check = [individual]
        
        while to_check:
            current = to_check.pop()
            ancestors.add(current)
            
            for parent in up_dct.get(current, {}):
                if parent == individual:
                    return False, f"Loop detected: Individual {individual} is their own ancestor"
                if parent not in ancestors:
                    to_check.append(parent)
    
    return True, "Pedigree is consistent"

# Check our example pedigree
is_consistent, message = check_pedigree_consistency(up_node_dict)
print(f"Original pedigree consistency: {is_consistent}, {message}")

# Create an inconsistent pedigree (loop)
inconsistent_pedigree = {
    1: {3: 1},  # 3 is listed as a parent of 1, which closes the loop 1 -> 2 -> 3 -> 1
    2: {1: 1},
    3: {2: 1}
}

is_consistent, message = check_pedigree_consistency(inconsistent_pedigree)
print(f"Inconsistent pedigree: {is_consistent}, {message}")
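
The loop check above restarts a traversal from every individual. An alternative that visits each parent-child link once is to try to order the pedigree so that parents always come before their children (Kahn's topological sort); anyone who cannot be placed is on a loop or descends from one. The sketch below is an illustration of that technique, not Bonsai's validation code, and `find_loop_members` is a hypothetical name.

```python
from collections import deque

def find_loop_members(up_dct):
    """Return the individuals that cannot be ordered parents-before-children."""
    # Collect every individual, including parents that have no entry of their own
    nodes = set(up_dct)
    for parents in up_dct.values():
        nodes.update(parents)

    remaining_parents = {n: len(up_dct.get(n, {})) for n in nodes}
    children = {n: [] for n in nodes}
    for child, parents in up_dct.items():
        for parent in parents:
            children[parent].append(child)

    # Start from founders and release each child once all of its parents are placed
    queue = deque(n for n in nodes if remaining_parents[n] == 0)
    while queue:
        current = queue.popleft()
        del remaining_parents[current]
        for child in children[current]:
            remaining_parents[child] -= 1
            if remaining_parents[child] == 0:
                queue.append(child)

    # Whoever is left was never released
    return set(remaining_parents)

valid = {1: {}, 2: {1: 1}, 3: {2: 1}}
looped = {1: {3: 1}, 2: {1: 1}, 3: {2: 1}, 4: {3: 1}}

print(find_loop_members(valid))
print(find_loop_members(looped))
```

In the looped example individual 4 is reported along with 1, 2, and 3: 4 is not on the loop, but all of 4's ancestry passes through it, so 4 can never be placed either.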

## Building a Simple Pedigree from IBD Data

Let's put together what we've learned to build a simple pedigree from simulated IBD data:

In [None]:
def build_simple_pedigree_from_ibd(ibd_data):
    """
    Build a simple pedigree from IBD data.
    
    Args:
        ibd_data: List of tuples (id1, id2, total_ibd), where total_ibd is in cM
    
    Returns:
        up_node_dict: The reconstructed pedigree
    """
    # Initialize empty pedigree
    pedigree = {}
    
    # Extract all individuals
    individuals = set()
    for id1, id2, _ in ibd_data:
        individuals.add(id1)
        individuals.add(id2)
    
    # Initialize pedigree with empty parent dictionaries
    for ind in individuals:
        pedigree[ind] = {}
    
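    # Process IBD data to infer relationships
    next_ungenotyped_id = -1

    def is_ancestor(anc, node):
        """Return True if anc is an ancestor of node in the pedigree built so far"""
        stack, seen = [node], set()
        while stack:
            current = stack.pop()
            for parent in pedigree[current]:
                if parent == anc:
                    return True
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return False

    # Handle the closest pairs first so parent-child links exist before siblings are inferred
    for id1, id2, total_ibd in sorted(ibd_data, key=lambda row: row[2], reverse=True):
        # Simple heuristic: 
        # >1500 cM: Parent-Child
        # 700-1500 cM: Full Siblings or Grandparent-Grandchild or Avuncular
        # <700 cM: More distant relationships (not handled in this simple example)
        
        if total_ibd > 1500:  # Parent-Child relationship
            # Arbitrary decision: make the smaller ID the parent
            parent, child = min(id1, id2), max(id1, id2)
            
            # Add the parent-child relationship
            pedigree[child][parent] = 1
            
        elif 700 <= total_ibd <= 1500:  # Full Siblings or similar
            # Skip pairs already explained by a direct line (e.g. grandparent-grandchild)
            if is_ancestor(id1, id2) or is_ancestor(id2, id1):
                continue
            
            # For simplicity, assume full siblings who share the same two parents
            shared_parents = set(pedigree[id1]) | set(pedigree[id2])
            if len(shared_parents) > 2:
                continue  # Conflicting evidence: leave this pair unresolved
            
            # Fill the remaining parent slots with new ungenotyped individuals
            while len(shared_parents) < 2:
                pedigree[next_ungenotyped_id] = {}
                shared_parents.add(next_ungenotyped_id)
                next_ungenotyped_id -= 1
            
            # Connect both individuals to both parents
            for parent in shared_parents:
                pedigree[id1][parent] = 1
                pedigree[id2][parent] = 1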
    
    return pedigree

# Simulated IBD data (id1, id2, total_ibd_in_cM)
simulated_ibd_data = [
    (1, 2, 1800),  # Parent-child
    (1, 3, 1750),  # Parent-child
    (2, 3, 900),   # Full siblings
    (1, 4, 900),   # Grandparent-grandchild or avuncular
    (2, 4, 1800),  # Parent-child
    (3, 4, 450),   # More distant relationship
]

# Build a pedigree from this data
reconstructed_pedigree = build_simple_pedigree_from_ibd(simulated_ibd_data)

print("Reconstructed pedigree from IBD data:")
for individual, parents in reconstructed_pedigree.items():
    if parents:
        parent_list = [f"{parent} (d={degree})" for parent, degree in parents.items()]
        print(f"Individual {individual} has parents: {', '.join(parent_list)}")
    else:
        print(f"Individual {individual} is a founder (no parents)")
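
A quick sanity check on any reconstructed pedigree is that nobody ends up with more than two direct parents. The sketch below is illustrative only: `invert_up_node_dict` and `too_many_parents` are hypothetical helper names, and the inverted layout `{parent: {child: degree}}` is assumed to mirror `up_node_dict` rather than taken from the Bonsai source.

```python
def invert_up_node_dict(up_dict):
    """Turn {child: {parent: degree}} into {parent: {child: degree}}."""
    down_dict = {node: {} for node in up_dict}
    for child, parents in up_dict.items():
        for parent, degree in parents.items():
            # setdefault covers parents that never appear as keys of up_dict
            down_dict.setdefault(parent, {})[child] = degree
    return down_dict


def too_many_parents(up_dict, max_parents=2):
    """List individuals with more than max_parents direct (degree-1) parents."""
    return sorted(
        node
        for node, parents in up_dict.items()
        if sum(1 for degree in parents.values() if degree == 1) > max_parents
    )


# Hypothetical example: individual 2 was given three parents
example_pedigree = {1: {}, -1: {}, -2: {}, 2: {1: 1, -1: 1, -2: 1}, 3: {1: 1}}

print(too_many_parents(example_pedigree))        # [2]
print(invert_up_node_dict(example_pedigree)[1])  # {2: 1, 3: 1}
```

Calling `too_many_parents(reconstructed_pedigree)` on the output above shows whether the pairwise heuristics assigned conflicting parents to anyone.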

## Visualizing the Reconstructed Pedigree

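Let's visualize our reconstructed pedigree as a directed graph with edges pointing from parent to child. Genotyped individuals (positive IDs) are drawn in light blue and ungenotyped individuals (negative IDs) in light gray: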

In [None]:
try:
    import networkx as nx
    
    # Create a directed graph from the pedigree
    G = nx.DiGraph()
    
    # Add all individuals as nodes (colors are assigned at draw time below)
    G.add_nodes_from(reconstructed_pedigree)
    
    # Add edges from parents to children
    for child, parents in reconstructed_pedigree.items():
        for parent in parents:
            G.add_edge(parent, child)
    
    # Create a layout for the graph
    pos = nx.spring_layout(G, seed=42)
    
    # Draw the graph
    plt.figure(figsize=(10, 7))
    
    # Draw the nodes with different colors for genotyped vs ungenotyped
    node_colors = ['lightblue' if n > 0 else 'lightgray' for n in G.nodes()]
    nx.draw(G, pos, with_labels=True, node_color=node_colors, 
            node_size=500, arrowsize=20, font_size=12, font_weight='bold')
    
    plt.title("Reconstructed Pedigree from IBD Data")
    plt.show()
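except ImportError:
    print("NetworkX is not available for visualization. Here's a text representation of the pedigree:")
    # Generate the listing from the data so it always matches the reconstruction
    for child, parents in reconstructed_pedigree.items():
        if parents:
            print(f"  {', '.join(str(p) for p in parents)} -> {child}")
        else:
            print(f"  {child} (founder)")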

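A force-directed layout such as `spring_layout` does not place parents above their children. As a minimal sketch (the helper name `generation_layout` is hypothetical, and the pedigree is assumed to contain no loops), positions can instead be derived from each individual's generation depth in the `up_node_dict`:

```python
def generation_layout(up_dict):
    """Return {node: (x, y)} with founders on the top row.

    Hypothetical helper; assumes an acyclic {child: {parent: degree}} dict.
    """
    depth = {}

    def node_depth(node):
        # Founders sit at depth 0; everyone else is one row below their deepest parent
        if node not in depth:
            parents = up_dict.get(node, {})
            depth[node] = 0 if not parents else 1 + max(node_depth(p) for p in parents)
        return depth[node]

    for node in up_dict:
        node_depth(node)

    # Group individuals by generation
    rows = {}
    for node, d in depth.items():
        rows.setdefault(d, []).append(node)

    pos = {}
    for d, nodes in rows.items():
        for i, node in enumerate(sorted(nodes)):
            # Center each row around x = 0; y decreases with each generation
            pos[node] = (i - (len(nodes) - 1) / 2, -d)
    return pos


# Hypothetical example pedigree: 1 and -1 are the parents of 2 and 3; 2 is the parent of 4
example_pedigree = {1: {}, -1: {}, 2: {1: 1, -1: 1}, 3: {1: 1, -1: 1}, 4: {2: 1}}
layout = generation_layout(example_pedigree)
print(layout[4])  # (0.0, -2)
```

The returned dictionary can be passed as the `pos` argument of `nx.draw` in place of the `spring_layout` result.
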
## Summary

In this lab, we've explored the core pedigree data structures in Bonsai v3 and their associated algorithms:

1. **Core Data Structures**:
   - `up_node_dict`: Maps individuals to their parents
   - `down_node_dict`: Maps individuals to their children

2. **Key Algorithms**:
   - Finding related individuals using `get_rel_set`
   - Discovering relationship paths with `get_all_paths`
   - Determining relationship types with `get_simple_rel_tuple`
   - Re-rooting pedigrees with `re_root_up_node_dict`
   - Identifying common ancestors with `get_common_anc_set`
   - Working with genotyped/ungenotyped individuals using `get_gt_id_set`
   - Finding pedigree founders with `get_founder_set`

3. **Pedigree Manipulation**:
   - Adding parents with `add_parent`
   - Building and modifying pedigree structures

4. **Practical Application**:
   - Building simple pedigrees from IBD data
   - Visualizing pedigree structures

These data structures and algorithms form the foundation of Bonsai v3's pedigree reconstruction capabilities, enabling efficient representation, manipulation, and analysis of family relationships.