# Lab 10: Up-Node Dictionary and Pedigree Representation

## Overview

In this lab, we'll explore the up-node dictionary, a fundamental data structure in Bonsai v3 for representing pedigrees. Understanding how Bonsai stores and manipulates pedigree relationships is crucial for comprehending the core algorithms used in genetic genealogy inference.

In [None]:
# 🧬 Google Colab Setup - Run this cell first!
import os
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import networkx as nx
from IPython.display import display, HTML, Markdown

def is_colab():
    '''Check if running in Google Colab'''
    try:
        import google.colab
        return True
    except ImportError:
        return False

if is_colab():
    print("🔬 Setting up Google Colab environment...")
    
    # Install dependencies
    print("📦 Installing packages...")
    !pip install -q pysam biopython scikit-allel networkx pygraphviz seaborn plotly
    !apt-get update -qq && apt-get install -qq samtools bcftools tabix graphviz-dev
    
    # Create directories
    !mkdir -p /content/class_data /content/results
    
    # Download essential class data
    print("📥 Downloading class data...")
    S3_BASE = "https://computational-genetic-genealogy.s3.us-east-2.amazonaws.com/class_data/"
    data_files = [
        "pedigree.fam", "pedigree.def", 
        "merged_opensnps_autosomes_ped_sim.seg",
        "merged_opensnps_autosomes_ped_sim-everyone.fam",
        "ped_sim_run2.seg", "ped_sim_run2-everyone.fam"
    ]
    
    for file in data_files:
        !wget -q -O /content/class_data/{file} {S3_BASE}{file}
        print(f"  ✅ {file}")
    
    # Define utility functions
    def setup_environment():
        return "/content/class_data", "/content/results"
    
    def save_results(dataframe, filename, description="results"):
        os.makedirs("/content/results", exist_ok=True)
        full_path = f"/content/results/{filename}"
        dataframe.to_csv(full_path, index=False)
        display(HTML(f'''
        <div style="padding: 10px; background-color: #e3f2fd; border-left: 4px solid #2196f3; margin: 10px 0;">
            <p><strong>💾 Results saved!</strong> To download: 
            <code>from google.colab import files; files.download('{full_path}')</code></p>
        </div>
        '''))
        return full_path
    
    def save_plot(plt, filename, description="plot"):
        os.makedirs("/content/results", exist_ok=True)
        full_path = f"/content/results/{filename}"
        plt.savefig(full_path, dpi=300, bbox_inches='tight')
        plt.show()
        display(HTML(f'''
        <div style="padding: 10px; background-color: #e8f5e8; border-left: 4px solid #4caf50; margin: 10px 0;">
            <p><strong>📊 Plot saved!</strong> To download: 
            <code>from google.colab import files; files.download('{full_path}')</code></p>
        </div>
        '''))
        return full_path
    
    print("✅ Colab setup complete! Ready to explore genetic genealogy.")
    
else:
    print("🏠 Local environment detected")
    def setup_environment():
        return "class_data", "results"
    def save_results(df, filename, description=""):
        os.makedirs("results", exist_ok=True)
        path = f"results/{filename}"
        df.to_csv(path, index=False)
        return path
    def save_plot(plt, filename, description=""):
        os.makedirs("results", exist_ok=True)
        path = f"results/{filename}"
        plt.savefig(path, dpi=300, bbox_inches='tight')
        plt.show()
        return path

# Set up paths and configure visualization
DATA_DIR, RESULTS_DIR = setup_environment()
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_context("notebook")

In [None]:
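# Setup Bonsai module paths
def is_jupyterlite():
    '''Check if running in JupyterLite (assumes the Pyodide kernel, where sys.platform is "emscripten")'''
    return sys.platform == "emscripten"

if not is_jupyterlite():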
    # In local environment, add the utils directory to system path
    utils_dir = os.getenv('PROJECT_UTILS_DIR', os.path.join(os.path.dirname(DATA_DIR), 'utils'))
    bonsaitree_dir = os.path.join(utils_dir, 'bonsaitree')
    
    # Add to path if it exists and isn't already there
    if os.path.exists(bonsaitree_dir) and bonsaitree_dir not in sys.path:
        sys.path.append(bonsaitree_dir)
        print(f"Added {bonsaitree_dir} to sys.path")
else:
    # In JupyterLite, use a simplified approach
    print("⚠️ Running in JupyterLite: Some Bonsai functionality may be limited.")
    print("This notebook is primarily designed for local execution where the Bonsai codebase is available.")

In [None]:
# Helper functions for exploring modules
import importlib
import inspect

def display_module_classes(module_name):
    """Display classes and their docstrings from a module"""
    try:
        # Import the module
        module = importlib.import_module(module_name)
        
        # Find all classes
        classes = inspect.getmembers(module, inspect.isclass)
        
        # Filter classes defined in this module (not imported)
        classes = [(name, cls) for name, cls in classes if cls.__module__ == module_name]
        
        # Print info for each class
        for name, cls in classes:
            print(f"\n## {name}")
            
            # Get docstring
            doc = inspect.getdoc(cls)
            if doc:
                print(f"Docstring: {doc}")
            else:
                print("No docstring available")
            
            # Get methods
            methods = inspect.getmembers(cls, inspect.isfunction)
            if methods:
                print("\nMethods:")
                for method_name, method in methods:
                    if not method_name.startswith('_'):  # Skip private methods
                        print(f"- {method_name}")
    except ImportError as e:
        print(f"Error importing module {module_name}: {e}")
    except Exception as e:
        print(f"Error processing module {module_name}: {e}")

def display_module_functions(module_name):
    """Display functions and their docstrings from a module"""
    try:
        # Import the module
        module = importlib.import_module(module_name)
        
        # Find all functions
        functions = inspect.getmembers(module, inspect.isfunction)
        
        # Filter functions defined in this module (not imported)
        functions = [(name, func) for name, func in functions if func.__module__ == module_name]
        
        # Print info for each function
        for name, func in functions:
            if name.startswith('_'):  # Skip private functions
                continue
                
            print(f"\n## {name}")
            
            # Get signature
            sig = inspect.signature(func)
            print(f"Signature: {name}{sig}")
            
            # Get docstring
            doc = inspect.getdoc(func)
            if doc:
                print(f"Docstring: {doc}")
            else:
                print("No docstring available")
    except ImportError as e:
        print(f"Error importing module {module_name}: {e}")
    except Exception as e:
        print(f"Error processing module {module_name}: {e}")

def view_source(obj):
    """Display the source code of an object (function or class)"""
    try:
        source = inspect.getsource(obj)
        display(Markdown(f"```python\n{source}\n```"))
    except Exception as e:
        print(f"Error retrieving source: {e}")

## Check Bonsai Installation

Let's verify that the Bonsai v3 module is available for import:

In [None]:
try:
    from utils.bonsaitree.bonsaitree import v3
    print("✅ Successfully imported Bonsai v3 module")
except ImportError as e:
    print(f"❌ Failed to import Bonsai v3 module: {e}")
    print("This lab requires access to the Bonsai v3 codebase.")
    print("Make sure you've properly set up your environment with the Bonsai repository.")

## Lab 10: Up-Node Dictionary and Pedigree Representation

The up-node dictionary is a core data structure in Bonsai v3 that represents pedigree relationships. It serves as the foundation for efficient storage, querying, and manipulation of family structures during pedigree inference.

In this lab, we'll explore:
1. The structure and design of the up-node dictionary
2. Key operations and functions for manipulating up-node dictionaries
3. Advanced concepts including tree traversal and pedigree combination

## Part 1: The Structure and Design of the Up-Node Dictionary

Let's begin by exploring the core concept of the up-node dictionary. This data structure represents a directed graph with individuals as nodes and parent-child relationships as edges, always pointing from child to parent (upward in a traditional pedigree diagram).

In [None]:
# First, let's examine the actual Bonsai v3 functions
if not is_jupyterlite():
    try:
        # Import the key pedigree functions for examination
        from utils.bonsaitree.bonsaitree.v3.pedigrees import (
            reverse_node_dict as actual_reverse_node_dict,
            get_subdict as actual_get_subdict,
            get_rel_set as actual_get_rel_set,
            get_simple_rel_tuple as actual_get_simple_rel_tuple,
            get_all_paths as actual_get_all_paths,
            get_mrca_set as actual_get_mrca_set,
            get_gt_id_set as actual_get_gt_id_set
        )
        
        # Display the actual implementations
        print("Examining actual Bonsai v3 implementations:")
        print("\nreverse_node_dict:")
        view_source(actual_reverse_node_dict)
        
        print("\nget_rel_set:")
        view_source(actual_get_rel_set)
        
        print("\nget_simple_rel_tuple:")
        view_source(actual_get_simple_rel_tuple)
        
        # Now import them normally for use
        from utils.bonsaitree.bonsaitree.v3.pedigrees import (
            reverse_node_dict,
            get_subdict,
            get_rel_set,
            get_simple_rel_tuple,
            get_all_paths,
            get_mrca_set,
            get_gt_id_set
        )
        
        print("\nSuccessfully imported key pedigree functions from Bonsai v3")
    
    except Exception as e:
        print(f"Error accessing Bonsai v3 implementations: {e}")
        print("Using simplified implementations instead")
        
        # Define simplified versions as a fallback
        def reverse_node_dict(dct):
            """Reverse a node dict. If it's a down dict make it an up dict and vice versa."""
            rev_dct = {}
            for i, info in dct.items():
                for a, d in info.items():
                    if a not in rev_dct:
                        rev_dct[a] = {}
                    rev_dct[a][i] = d
            return rev_dct
        
        def get_subdict(dct, node):
            """Get the cone above/below node in a node dict."""
            if node not in dct:
                return {}
            import copy
            sub_dct = {}
            sub_dct[node] = copy.deepcopy(dct[node])
            for n in dct[node]:
                n_dct = get_subdict(dct, n)
                if n_dct:
                    sub_dct.update(n_dct)
            return sub_dct
        
        def get_rel_set(node_dict, i):
            """Find all ancestors/descendants of i based on node_dict type."""
            rel_set = {i}
            if i not in node_dict:
                return rel_set
            for relative in node_dict.get(i, {}):
                rel_set.add(relative)
                rel_set.update(get_rel_set(node_dict, relative))
            return rel_set
        
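        def get_all_paths(up_node_dict, i, j):
            """Find all paths between i and j that meet at a common ancestor.

            Simplified fallback: each path is a tuple of node IDs running from i
            up to the common ancestor and back down to j.
            """
            def up_paths(node):
                # Every path that starts at node and climbs through parents
                paths = [(node,)]
                for parent in up_node_dict.get(node, {}):
                    paths += [(node,) + p for p in up_paths(parent)]
                return paths

            paths, common_ancs = set(), set()
            for pi in up_paths(i):
                for pj in up_paths(j):
                    # Keep pairs that end at the same ancestor and share no other node
                    if pi[-1] == pj[-1] and len(set(pi) & set(pj)) == 1:
                        paths.add(pi + pj[-2::-1])
                        common_ancs.add(pi[-1])
            return paths, common_ancs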
            
        def get_simple_rel_tuple(up_node_dict, i, j):
            """Get relationship tuple (up, down, num_ancs) between individuals i and j."""
            if i == j:
                return (0, 0, 2)
            path_set, _ = get_all_paths(up_node_dict, i, j)
            if not path_set:
                return None
            num_ancs = len(path_set)
            path = path_set.pop() if isinstance(path_set, set) else path_set[0]
            up = down = 0
            for idx in range(len(path)-1):
                i1, i2 = path[idx], path[idx+1]
                if i1 in up_node_dict and i2 in up_node_dict[i1]:
                    up += up_node_dict[i1][i2]
                elif i2 in up_node_dict and i1 in up_node_dict[i2]:
                    down += up_node_dict[i2][i1]
            return up, down, num_ancs
        
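        def get_mrca_set(up_dct, id_set):
            """Get the set of most recent common ancestors of id_set."""
            if len(id_set) == 1:
                return id_set
            common_ancs = None
            for i in id_set:
                i_ancs = get_rel_set(up_dct, i)
                if common_ancs is None:
                    common_ancs = i_ancs
                else:
                    common_ancs &= i_ancs
            if not common_ancs:
                return set()
            # Keep only the most recent: drop any common ancestor that is a
            # proper ancestor of another common ancestor
            older = set()
            for a in common_ancs:
                older |= get_rel_set(up_dct, a) - {a}
            return common_ancs - older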
        
        def get_gt_id_set(ped):
            """Get all genotyped IDs (positive) from the pedigree."""
            all_ids = set()
            for node, parents in ped.items():
                all_ids.add(node)
                all_ids.update(parents.keys())
            return {i for i in all_ids if i > 0}

# For JupyterLite, provide simplified implementations
elif is_jupyterlite():
    print("Running in JupyterLite environment. Using simplified implementations.")
    
    def reverse_node_dict(dct):
        """Reverse a node dict. If it's a down dict make it an up dict and vice versa."""
        rev_dct = {}
        for i, info in dct.items():
            for a, d in info.items():
                if a not in rev_dct:
                    rev_dct[a] = {}
                rev_dct[a][i] = d
        return rev_dct
    
    def get_subdict(dct, node):
        """Get the cone above/below node in a node dict."""
        if node not in dct:
            return {}
        import copy
        sub_dct = {}
        sub_dct[node] = copy.deepcopy(dct[node])
        for n in dct[node]:
            n_dct = get_subdict(dct, n)
            if n_dct:
                sub_dct.update(n_dct)
        return sub_dct
    
    def get_rel_set(node_dict, i):
        """Find all ancestors/descendants of i based on node_dict type."""
        rel_set = {i}
        if i not in node_dict:
            return rel_set
        for relative in node_dict.get(i, {}):
            rel_set.add(relative)
            rel_set.update(get_rel_set(node_dict, relative))
        return rel_set
    
    def get_simple_rel_tuple(up_node_dict, i, j):
        """Get relationship tuple (up, down, num_ancs) between individuals i and j."""
        if i == j:
            return (0, 0, 2)
        path_set, _ = get_all_paths(up_node_dict, i, j)
        if not path_set:
            return None
        num_ancs = len(path_set)
        path = path_set.pop() if isinstance(path_set, set) else path_set[0]
        up = down = 0
        for idx in range(len(path)-1):
            i1, i2 = path[idx], path[idx+1]
            if i1 in up_node_dict and i2 in up_node_dict[i1]:
                up += up_node_dict[i1][i2]
            elif i2 in up_node_dict and i1 in up_node_dict[i2]:
                down += up_node_dict[i2][i1]
        return up, down, num_ancs
    
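    def get_all_paths(up_node_dict, i, j):
        """Find all paths between i and j that meet at a common ancestor.

        Simplified fallback: each path is a tuple of node IDs running from i
        up to the common ancestor and back down to j.
        """
        def up_paths(node):
            # Every path that starts at node and climbs through parents
            paths = [(node,)]
            for parent in up_node_dict.get(node, {}):
                paths += [(node,) + p for p in up_paths(parent)]
            return paths

        paths, common_ancs = set(), set()
        for pi in up_paths(i):
            for pj in up_paths(j):
                # Keep pairs that end at the same ancestor and share no other node
                if pi[-1] == pj[-1] and len(set(pi) & set(pj)) == 1:
                    paths.add(pi + pj[-2::-1])
                    common_ancs.add(pi[-1])
        return paths, common_ancs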
    
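    def get_mrca_set(up_dct, id_set):
        """Get the set of most recent common ancestors of id_set."""
        if len(id_set) == 1:
            return id_set
        common_ancs = None
        for i in id_set:
            i_ancs = get_rel_set(up_dct, i)
            if common_ancs is None:
                common_ancs = i_ancs
            else:
                common_ancs &= i_ancs
        if not common_ancs:
            return set()
        # Keep only the most recent: drop any common ancestor that is a
        # proper ancestor of another common ancestor
        older = set()
        for a in common_ancs:
            older |= get_rel_set(up_dct, a) - {a}
        return common_ancs - older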
    
    def get_gt_id_set(ped):
        """Get all genotyped IDs (positive) from the pedigree."""
        all_ids = set()
        for node, parents in ped.items():
            all_ids.add(node)
            all_ids.update(parents.keys())
        return {i for i in all_ids if i > 0}

### 1.1 The Structure of the Up-Node Dictionary

Let's examine the basic structure of an up-node dictionary. In Bonsai v3, it follows this format:

```python
up_node_dict = {
    individual_id: {parent_id1: degree1, parent_id2: degree2},
    ...
}
```

Where:
- Each key is an individual ID
- Each value is a dictionary mapping parent IDs to the degrees of relationship (usually 1 for direct parent-child relationships)
- Positive IDs represent observed/genotyped individuals
- Negative IDs represent inferred/latent (ungenotyped) ancestors
- Empty dictionaries (`{}`) represent founder individuals with no recorded parents
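
As a minimal sketch of these conventions, the hypothetical helper below (`summarize_up_node_dict` is an illustration, not a Bonsai function) sorts the IDs of an up-node dictionary into genotyped, latent, and founder sets. It assumes integer IDs and treats anyone who appears only as a parent, without a key of their own, as a founder.

```python
def summarize_up_node_dict(up_node_dict):
    """Classify IDs by the sign and founder conventions described above."""
    # Collect every ID that appears as a key or as someone's parent
    all_ids = set(up_node_dict)
    for parents in up_node_dict.values():
        all_ids.update(parents)
    return {
        "genotyped": {i for i in all_ids if i > 0},
        "latent": {i for i in all_ids if i < 0},
        # Founders have an empty parent dict or no entry at all
        "founders": {i for i in all_ids if not up_node_dict.get(i)},
    }


example = {3: {1: 1, -1: 1}, 1: {}, -1: {}}
summary = summarize_up_node_dict(example)
print(summary)
```

Here individual 3 is genotyped with one genotyped parent (1) and one latent parent (-1); both parents are founders.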

Let's create a simple example pedigree to explore these concepts:

In [None]:
# Create a three-generation family example
# Grandparents: 1, 2, 3, 4
# Parents: 5, 6
# Child: 7
# Note: Individuals 1-7 are genotyped (positive IDs)

simple_pedigree = {
    7: {5: 1, 6: 1},    # Child with parents 5 and 6 (degree 1)
    5: {1: 1, 2: 1},    # Parent 5 with parents 1 and 2 (degree 1)
    6: {3: 1, 4: 1},    # Parent 6 with parents 3 and 4 (degree 1)
    1: {},              # Founder (no parents)
    2: {},              # Founder (no parents)
    3: {},              # Founder (no parents)
    4: {}               # Founder (no parents)
}

# Let's examine the structure
for person_id, parents in simple_pedigree.items():
    if parents:  # If the person has parents
        parent_info = ", ".join([f"parent {p} (degree {d})" for p, d in parents.items()])
        print(f"Person {person_id} has parents: {parent_info}")
    else:  # If the person is a founder
        print(f"Person {person_id} is a founder (no parents)")

Let's also define a function to visualize pedigrees, which will help us understand the up-node dictionary structure better:

In [None]:
def visualize_pedigree(up_node_dict, title="Pedigree", node_labels=None):
    """Visualize a pedigree from an up_node_dict using networkx."""
    # Create a directed graph (edges point from child to parent)
    G = nx.DiGraph()
    
    # Add all nodes to the graph (combine all IDs from keys and values)
    all_ids = set(up_node_dict.keys())
    for parents in up_node_dict.values():
        all_ids.update(parents.keys())
    
    # Create node labels (default to the node ID if not provided)
    if node_labels is None:
        node_labels = {node_id: str(node_id) for node_id in all_ids}
    
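    # Register every individual as a node so the color list lines up with the graph's node order
    G.add_nodes_from(all_ids)

    # Create a color map - blue for genotyped (positive IDs), gray for ungenotyped (negative IDs)
    color_map = ['lightblue' if node_id > 0 else 'lightgray' for node_id in G.nodes()]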
    
    # Add edges (from child to parent)
    edges = []
    for child, parents in up_node_dict.items():
        for parent in parents:
            edges.append((child, parent))
    
    G.add_edges_from(edges)
    
    # Create plot
    plt.figure(figsize=(10, 6))
    plt.title(title)
    
    # Layout: spring_layout places nodes by force simulation, so parents are not guaranteed to appear above children
    pos = nx.spring_layout(G, seed=42)  # Fixed seed for reproducibility
    
    # Draw nodes
    nx.draw(G, pos, with_labels=True, labels=node_labels, node_color=color_map, 
            node_size=800, font_weight='bold')
    
    # Draw edges
    nx.draw_networkx_edges(G, pos, width=1.0, alpha=0.5, arrows=True)
    
    plt.tight_layout()
    plt.show()

# Visualize our simple pedigree
visualize_pedigree(simple_pedigree, title="Three-Generation Pedigree")

Let's look at the actual implementation of the up-node dictionary in the Bonsai v3 codebase:

In [None]:
# Import the Bonsai v3 pedigrees module (if not in JupyterLite)
if not is_jupyterlite():
    import utils.bonsaitree.bonsaitree.v3.pedigrees as pedigrees_module
    
    # Let's examine some of the key functions that operate on up-node dictionaries
    print("Key functions for up-node dictionaries in Bonsai v3:")
    for func_name in [
        'reverse_node_dict', 
        'get_subdict', 
        'get_rel_set', 
        'get_all_paths', 
        'get_simple_rel_tuple'
    ]:
        if hasattr(pedigrees_module, func_name):
            func = getattr(pedigrees_module, func_name)
            # Print the function signature and docstring
            print(f"\n## {func_name}")
            print(f"Signature: {func_name}{inspect.signature(func)}")
            doc = inspect.getdoc(func)
            if doc:
                print(f"Docstring: {doc}")
else:
    print("Cannot access the Bonsai v3 codebase directly in JupyterLite environment.")

### 1.2 Converting Between Up-Node and Down-Node Dictionaries

The up-node dictionary represents relationships from child to parent (upward in the pedigree). Bonsai v3 also uses down-node dictionaries, which represent relationships from parent to child (downward in the pedigree).
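
The reversal itself can be sketched in a few lines. The function name `reverse_edges` is hypothetical and the code is an illustration of the idea rather than the Bonsai implementation. Note that founders stored as empty dictionaries carry no edges, so in this sketch they do not come back as keys after a round trip.

```python
def reverse_edges(node_dict):
    """Flip every child -> parent edge into parent -> child (or back again)."""
    reversed_dict = {}
    for node, relatives in node_dict.items():
        for relative, degree in relatives.items():
            # Create the inner dict on first use, then record the flipped edge
            reversed_dict.setdefault(relative, {})[node] = degree
    return reversed_dict


up = {3: {1: 1, 2: 1}, 1: {}, 2: {}}
down = reverse_edges(up)
round_trip = reverse_edges(down)
print(down)        # {1: {3: 1}, 2: {3: 1}}
print(round_trip)  # {3: {1: 1, 2: 1}}
```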

Converting between these two representations is a common operation. Let's examine the `reverse_node_dict` function:

In [None]:
# View the source code for reverse_node_dict (if not in JupyterLite)
if not is_jupyterlite():
    print("Source code for reverse_node_dict:")
    view_source(pedigrees_module.reverse_node_dict)
else:
    print("Using simplified reverse_node_dict in JupyterLite environment:")
    view_source(reverse_node_dict)

In [None]:
# Convert our simple_pedigree from an up-node dict to a down-node dict
down_node_dict = reverse_node_dict(simple_pedigree)

# Examine the down-node structure
print("Down-node dictionary structure:")
for parent_id, children in down_node_dict.items():
    children_info = ", ".join([f"child {c} (degree {d})" for c, d in children.items()])
    print(f"Person {parent_id} has children: {children_info}")

# Reverse the down-node dict again and visualize the result: the edges match the original pedigree
# (founders stored as empty dicts contribute no edges, so they may not reappear as keys)
visualize_pedigree(reverse_node_dict(down_node_dict), title="Same Pedigree After Double Reversal")

### 1.3 Representing Complex Family Structures

The up-node dictionary can represent various complex family structures, including:
- Half-siblings (two individuals who share only one parent)
- Consanguinity (inbreeding or marriages between relatives)
- Pedigrees with missing ancestors or complex topologies
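
To make the half-sibling case concrete, here is a small sketch that compares the parent sets of two individuals. The helper `sibling_type` is hypothetical and assumes parents are recorded as direct (degree 1) entries.

```python
def sibling_type(up_node_dict, a, b):
    """Classify a and b by how many recorded parents they share."""
    shared = set(up_node_dict.get(a, {})) & set(up_node_dict.get(b, {}))
    if len(shared) >= 2:
        return "full siblings"
    if len(shared) == 1:
        return "half siblings"
    return "not siblings"


ped = {
    7: {5: 1, 6: 1},
    8: {5: 1, 6: 1},
    9: {-1: 1, 5: 1},  # shares only parent 5 with 7 and 8
}
print(sibling_type(ped, 7, 8))  # full siblings
print(sibling_type(ped, 7, 9))  # half siblings
```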

Let's create a more complex pedigree with some of these features:

In [None]:
# Create a complex pedigree with half-siblings
complex_pedigree = {
    # Children (third generation)
    7: {5: 1, 6: 1},    # Child of parents 5 and 6
    8: {5: 1, 6: 1},    # Sibling of 7 (same parents)
    9: {-1: 1, 5: 1},   # Half-sibling (shares parent 5 with 7 and 8, other parent is ungenotyped -1)
    
    # Parents (second generation)
    5: {1: 1, 2: 1},    # Parent with own parents 1 and 2
    6: {3: 1, 4: 1},    # Parent with own parents 3 and 4
    -1: {},             # Ungenotyped parent (no known ancestors)
    
    # Grandparents (first generation)
    1: {},
    2: {},
    3: {},
    4: {}
}

# Visualize the complex pedigree
visualize_pedigree(complex_pedigree, title="Complex Pedigree with Half-Siblings")

Let's create a pedigree with consanguinity, in which individuals 7 and 8 are first cousins who have a child together (individual 10):

In [None]:
# Create a pedigree with consanguinity (inbreeding)
consanguineous_pedigree = {
    # Children (fourth generation)
    10: {7: 1, 8: 1},   # Child of cousins 7 and 8
    
    # Parents (third generation - cousins who partnered)
    7: {5: 1, 6: 1},    # First cousin of 8
    8: {-1: 1, -2: 1},  # First cousin of 7 (note: 8's parents -1 and -2 are themselves siblings)
    9: {5: 1, 6: 1},    # Sibling of 7
    
    # Grandparents (second generation - siblings)
    5: {1: 1, 2: 1},    # Sibling of -1 and -2
    6: {3: 1, 4: 1},    # Unrelated to others
    -1: {1: 1, 2: 1},   # Sibling of 5 and -2
    -2: {1: 1, 2: 1},   # Sibling of 5 and -1
    
    # Great-grandparents (first generation)
    1: {},
    2: {},
    3: {},
    4: {}
}

# Visualize the consanguineous pedigree
visualize_pedigree(consanguineous_pedigree, title="Pedigree with Consanguinity")

## Part 2: Key Operations on Up-Node Dictionaries

Now let's explore the key operations that can be performed on up-node dictionaries in Bonsai v3. These functions allow us to query relationships, find paths between individuals, and extract information about the pedigree structure.

### 2.1 Finding Ancestors and Descendants

One of the most common operations is finding the ancestors of an individual (using the up-node dictionary) or the descendants of an individual (using the down-node dictionary). Let's explore these operations:

In [None]:
# Let's look at how get_rel_set finds ancestors in an up-node dictionary
if not is_jupyterlite():
    print("Source code for get_rel_set:")
    view_source(pedigrees_module.get_rel_set)
else:
    print("Using simplified get_rel_set in JupyterLite environment:")
    view_source(get_rel_set)

In [None]:
# Find ancestors of individual 7 in the simple pedigree
ancestors_of_7 = get_rel_set(simple_pedigree, 7)
print(f"Ancestors of individual 7 (including self): {ancestors_of_7}")

# Convert to down-node dict and find descendants of individual 1
simple_down_dict = reverse_node_dict(simple_pedigree)
descendants_of_1 = get_rel_set(simple_down_dict, 1)
print(f"Descendants of individual 1 (including self): {descendants_of_1}")

# Find ancestors of individual 10 in the consanguineous pedigree
# (ancestors reachable through both parents, such as 1 and 2, appear only once because the result is a set)
ancestors_of_10 = get_rel_set(consanguineous_pedigree, 10)
print(f"Ancestors of individual 10 (including self): {ancestors_of_10}")
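
The traversal behind these calls can also be written iteratively, which avoids deep recursion on tall pedigrees. The sketch below is an illustration with a hypothetical name (`collect_relatives`), not the Bonsai code. It follows whatever edges the dictionary stores, so an up-node dictionary yields ancestors and a down-node dictionary yields descendants.

```python
from collections import deque


def collect_relatives(node_dict, start):
    """Return start plus every node reachable from it by following edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for relative in node_dict.get(node, {}):
            if relative not in seen:  # visit each individual only once
                seen.add(relative)
                queue.append(relative)
    return seen


up = {7: {5: 1, 6: 1}, 5: {1: 1, 2: 1}, 6: {}, 1: {}, 2: {}}
down = {5: {7: 1}, 6: {7: 1}, 1: {5: 1}, 2: {5: 1}}
ancestors = collect_relatives(up, 7)
descendants = collect_relatives(down, 1)
print(ancestors)
print(descendants)
```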

### 2.2 Finding Relationships Between Individuals

To understand how individuals are related, we need to find the paths connecting them and determine their relationship type. In Bonsai v3, this is done using the `get_all_paths` and `get_simple_rel_tuple` functions.
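
One way to picture the computation is to list every upward path from each individual and keep the pairs that meet at a shared ancestor without overlapping anywhere else. The sketch below follows that idea. The names `upward_paths` and `relationship_tuple` are hypothetical, and the code assumes degree-1 edges and no inbreeding; it is not the Bonsai implementation.

```python
def upward_paths(up_node_dict, node):
    """List every path that starts at node and climbs through its ancestors."""
    paths = [(node,)]
    for parent in up_node_dict.get(node, {}):
        paths += [(node,) + rest for rest in upward_paths(up_node_dict, parent)]
    return paths


def relationship_tuple(up_node_dict, i, j):
    """Return (up, down, num_ancs) for i and j, or None if they are unrelated.

    Assumes degree-1 edges and no inbreeding, so each common ancestor is
    reached by exactly one pair of non-overlapping paths.
    """
    if i == j:
        return (0, 0, 2)  # convention used in this lab for "self"
    steps_by_ancestor = {}
    for path_i in upward_paths(up_node_dict, i):
        for path_j in upward_paths(up_node_dict, j):
            same_top = path_i[-1] == path_j[-1]
            # The two paths may only touch at the shared ancestor itself
            if same_top and len(set(path_i) & set(path_j)) == 1:
                steps_by_ancestor[path_i[-1]] = (len(path_i) - 1, len(path_j) - 1)
    if not steps_by_ancestor:
        return None
    # Report the closest connection and count the ancestors that realize it
    up, down = min(steps_by_ancestor.values(), key=sum)
    num_ancs = sum(1 for steps in steps_by_ancestor.values() if steps == (up, down))
    return (up, down, num_ancs)


family = {
    7: {5: 1, 6: 1}, 8: {5: 1, 6: 1}, 9: {-1: 1, 5: 1},
    5: {1: 1, 2: 1}, 6: {}, -1: {}, 1: {}, 2: {},
}
print(relationship_tuple(family, 7, 8))  # full siblings
print(relationship_tuple(family, 7, 9))  # half siblings
print(relationship_tuple(family, 7, 1))  # grandchild to grandparent
```

Rejecting path pairs that overlap below the top node is what keeps grandparents from being counted as extra connections between siblings.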

In [None]:
# Let's examine the source of get_simple_rel_tuple
if not is_jupyterlite():
    print("Source code for get_simple_rel_tuple:")
    view_source(pedigrees_module.get_simple_rel_tuple)
else:
    print("Using simplified get_simple_rel_tuple in JupyterLite environment:")
    view_source(get_simple_rel_tuple)

In [None]:
# Define common relationship patterns for reference
relationships = {
    "identical/self": (0, 0, 2),
    "parent-child": (0, 1, 1),
    "child-parent": (1, 0, 1),
    "full sibling": (1, 1, 2),
    "half sibling": (1, 1, 1),
    "grandparent-grandchild": (0, 2, 1),
    "grandchild-grandparent": (2, 0, 1),
    "aunt/uncle-niece/nephew": (1, 2, 1),
    "niece/nephew-aunt/uncle": (2, 1, 1),
    "first cousin": (2, 2, 1),
    "first cousin once removed (down)": (2, 3, 1),
    "first cousin once removed (up)": (3, 2, 1),
    "second cousin": (3, 3, 1),
}

# Create a reverse mapping for lookup
rel_names = {v: k for k, v in relationships.items()}

In [None]:
# Find all paths between individuals 7 and 5 in the simple pedigree
paths_7_5, common_ancestors_7_5 = get_all_paths(simple_pedigree, 7, 5)
print(f"Paths between 7 and 5: {paths_7_5}")
print(f"Common ancestors of 7 and 5: {common_ancestors_7_5}")

# Find the relationship between individuals 7 and 5
rel_7_5 = get_simple_rel_tuple(simple_pedigree, 7, 5)
print(f"Relationship between 7 and 5: {rel_7_5} ({rel_names.get(rel_7_5, 'Unknown')})")

# Find the relationship between siblings
rel_7_8 = get_simple_rel_tuple(complex_pedigree, 7, 8)
print(f"Relationship between 7 and 8: {rel_7_8} ({rel_names.get(rel_7_8, 'Unknown')})")

# Find the relationship between half-siblings
rel_7_9 = get_simple_rel_tuple(complex_pedigree, 7, 9)
print(f"Relationship between 7 and 9: {rel_7_9} ({rel_names.get(rel_7_9, 'Unknown')})")

# Find the relationship between cousins who partnered
rel_7_8_cons = get_simple_rel_tuple(consanguineous_pedigree, 7, 8)
print(f"Relationship between 7 and 8 in consanguineous pedigree: {rel_7_8_cons} ({rel_names.get(rel_7_8_cons, 'Unknown')})")

### 2.3 Working with Subtrees and Extracting Subpedigrees

Often, we need to work with just a portion of a pedigree - for example, extracting the subtree connecting a specific set of individuals. Let's explore how to do this using Bonsai v3 functions:

In [None]:
# First, let's look at the actual implementation in Bonsai v3
if not is_jupyterlite():
    try:
        # Get the actual function from Bonsai v3
        from utils.bonsaitree.bonsaitree.v3.pedigrees import get_sub_up_node_dict as actual_get_sub_up_node_dict
        
        # Display the source code
        print("Source code for get_sub_up_node_dict in Bonsai v3:")
        view_source(actual_get_sub_up_node_dict)
        
        # Import for use
        from utils.bonsaitree.bonsaitree.v3.pedigrees import get_sub_up_node_dict
        
        print("\nSuccessfully imported get_sub_up_node_dict from Bonsai v3")
    except Exception as e:
        print(f"Error accessing the actual implementation: {e}")
        print("Using a simplified version instead")
        
        # Define a simplified version as fallback
        def get_subtree_node_set(up_dct, id_set):
            """Get nodes in the subtree connecting IDs."""
            if len(id_set) == 1:
                return id_set
            node_set = set()
            for i, j in [(a, b) for a in id_set for b in id_set if a != b]:
                paths, _ = get_all_paths(up_dct, i, j)
                for path in paths:
                    node_set.update(path)
            return node_set
            
        def get_sub_up_dct_for_id_set(up_dct, id_set):
            """Extract subset of pedigree containing only nodes in id_set."""
            return {
                i: {n: d for n, d in up_dct.get(i, {}).items() if n in id_set}
                for i in id_set
            }
        
        def get_sub_up_node_dict(up_dct, id_set):
            """Get subtree connecting all IDs in id_set."""
            # Get nodes in the connecting subtree
            subtree_node_set = get_subtree_node_set(up_dct, id_set)
            # Extract the subtree
            return get_sub_up_dct_for_id_set(up_dct, subtree_node_set)
            
# For JupyterLite compatibility
elif is_jupyterlite():
    print("Running in JupyterLite environment. Using simplified implementation.")
    
    def get_subtree_node_set(up_dct, id_set):
        """Get nodes in the subtree connecting IDs."""
        if len(id_set) == 1:
            return id_set
        node_set = set()
        for i, j in [(a, b) for a in id_set for b in id_set if a != b]:
            paths, _ = get_all_paths(up_dct, i, j)
            for path in paths:
                node_set.update(path)
        return node_set
        
    def get_sub_up_dct_for_id_set(up_dct, id_set):
        """Extract subset of pedigree containing only nodes in id_set."""
        return {
            i: {n: d for n, d in up_dct.get(i, {}).items() if n in id_set}
            for i in id_set
        }
    
    def get_sub_up_node_dict(up_dct, id_set):
        """Get subtree connecting all IDs in id_set."""
        # Get nodes in the connecting subtree
        subtree_node_set = get_subtree_node_set(up_dct, id_set)
        # Extract the subtree
        return get_sub_up_dct_for_id_set(up_dct, subtree_node_set)

In [None]:
# Extract the subtree connecting individuals 7 and 9 in the complex pedigree
subtree_7_9 = get_sub_up_node_dict(complex_pedigree, {7, 9})

print("Subtree connecting individuals 7 and 9:")
for person, parents in subtree_7_9.items():
    parent_info = ", ".join([f"parent {p} (degree {d})" for p, d in parents.items()]) if parents else "no parents"
    print(f"Person {person} has {parent_info}")

# Visualize the subtree
visualize_pedigree(subtree_7_9, title="Subtree Connecting Individuals 7 and 9")

As a second example, let's extract the subtree connecting all third-generation individuals in the consanguineous pedigree:

In [None]:
# Extract the subtree connecting third-generation individuals (7, 8, 9)
subtree_third_gen = get_sub_up_node_dict(consanguineous_pedigree, {7, 8, 9})

# Visualize the subtree
visualize_pedigree(subtree_third_gen, title="Subtree Connecting Third-Generation Individuals")
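
The filtering step can also be seen in isolation on a toy pedigree. The sketch below re-implements the same dictionary comprehension as `get_sub_up_dct_for_id_set` under a hypothetical name (`restrict_up_dct`) so that it runs on its own; `toy_up_dct` is made-up data, not one of the lab's pedigrees. Note how a parent outside `id_set` is dropped, which can make a non-founder look like a founder in the extracted sub-pedigree.

```python
def restrict_up_dct(up_dct, id_set):
    """Keep only the nodes in id_set and the parent links between them."""
    return {
        i: {p: d for p, d in up_dct.get(i, {}).items() if p in id_set}
        for i in id_set
    }

# Hypothetical toy pedigree: 3 is the child of 1 and 2; 5 is the child of 3 and 4
toy_up_dct = {
    1: {},
    2: {},
    3: {1: 1, 2: 1},
    4: {},
    5: {3: 1, 4: 1},
}

# Keep the single lineage 1 -> 3 -> 5; parents 2 and 4 fall outside the set
lineage = restrict_up_dct(toy_up_dct, {1, 3, 5})
for person in sorted(lineage):
    print(person, lineage[person])
```

Because the function builds a new dictionary, `toy_up_dct` itself is left unchanged. An ID that is absent from the input simply comes back with an empty parent dictionary.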

## Part 3: Advanced Operations and Applications

In this section, we'll explore more advanced operations on up-node dictionaries: finding common ancestors, modifying pedigree structures, building a pedigree from parent-child records, and summarizing the relationship types a pedigree contains.

### 3.1 Finding Most Recent Common Ancestors (MRCAs)

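The Most Recent Common Ancestor (MRCA) is a fundamental concept in genealogy. For a set of individuals, an MRCA is a most recent ancestor they all share. There can be more than one (full siblings, for example, share both parents), which is why `get_mrca_set` returns a set. Let's explore how to find MRCAs using Bonsai v3: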

In [None]:
# Find the most recent common ancestors of individuals 7, 8, and 9 in the complex pedigree
mrca_7_8_9 = get_mrca_set(complex_pedigree, {7, 8, 9})
print(f"Most recent common ancestors of 7, 8, and 9: {mrca_7_8_9}")

# Find the most recent common ancestors of 7 and 8 in the complex pedigree
mrca_7_8 = get_mrca_set(complex_pedigree, {7, 8})
print(f"Most recent common ancestors of 7 and 8: {mrca_7_8}")

# Find the most recent common ancestors of 7 and 9 in the complex pedigree
mrca_7_9 = get_mrca_set(complex_pedigree, {7, 9})
print(f"Most recent common ancestors of 7 and 9: {mrca_7_9}")
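
To see what an MRCA search involves, here is a minimal sketch that works directly on an up-node dictionary. It is not Bonsai's `get_mrca_set` implementation; the function names (`ancestors_with_self`, `mrca_set_sketch`) and the toy pedigree are hypothetical. The sketch assumes a simple definition: a common ancestor is "most recent" if no other common ancestor descends from it, and an individual counts as its own ancestor for this purpose.

```python
def ancestors_with_self(up_dct, node):
    """Return the set containing node and every ancestor reachable from it."""
    seen = set()
    stack = [node]
    while stack:
        cur = stack.pop()
        if cur in seen:
            continue
        seen.add(cur)
        stack.extend(up_dct.get(cur, {}))  # push the parents of cur
    return seen

def mrca_set_sketch(up_dct, id_set):
    """Common ancestors of id_set that are not ancestors of another common ancestor."""
    if not id_set:
        return set()
    common = set.intersection(*(ancestors_with_self(up_dct, i) for i in id_set))
    # Collect every strict ancestor of a common ancestor; those are "older" ones
    older = set()
    for c in common:
        older |= ancestors_with_self(up_dct, c) - {c}
    return common - older

# Hypothetical toy pedigree (not one of the lab's pedigrees)
toy_up_dct = {
    1: {}, 2: {}, 3: {}, 4: {}, -1: {},
    5: {1: 1, 2: 1},
    6: {3: 1, 4: 1},
    7: {5: 1, 6: 1},
    8: {5: 1, -1: 1},
    9: {5: 1, 6: 1},
}

print(mrca_set_sketch(toy_up_dct, {7, 9}))     # full siblings
print(mrca_set_sketch(toy_up_dct, {7, 8}))     # half siblings
print(mrca_set_sketch(toy_up_dct, {7, 8, 9}))  # all three
```

In this toy data the full siblings share both parents, while the half siblings share only individual 5, so the size of the returned set already hints at the kind of relationship.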

### 3.2 Manipulating Pedigree Structures

Let's explore how to modify pedigree structures by adding or removing individuals:

In [None]:
# First, let's examine the actual implementations in Bonsai v3
if not is_jupyterlite():
    try:
        # Get the actual functions for manipulation from Bonsai v3
        from utils.bonsaitree.bonsaitree.v3.pedigrees import (
            add_parent as actual_add_parent,
            delete_node as actual_delete_node,
            get_min_id as actual_get_min_id
        )
        
        # Display the source code for these functions
        print("Source code for add_parent in Bonsai v3:")
        view_source(actual_add_parent)
        
        print("\nSource code for delete_node in Bonsai v3:")
        view_source(actual_delete_node)
        
        print("\nSource code for get_min_id in Bonsai v3:")
        view_source(actual_get_min_id)
        
        # Import for use
        from utils.bonsaitree.bonsaitree.v3.pedigrees import add_parent, delete_node, get_min_id
        
        print("\nSuccessfully imported pedigree manipulation functions from Bonsai v3")
    except Exception as e:
        print(f"Error accessing the actual implementations: {e}")
        print("Using simplified versions instead")
        
        # Define simplified versions as fallback
        def get_min_id(dct):
            """Get the minimal ID in a node dict."""
            all_ids = set(dct.keys())
            for parents in dct.values():
                all_ids.update(parents.keys())
            min_id = min(all_ids) if all_ids else 0
            return min(-1, min_id)  # ensure ID is negative
        
        def add_parent(node, up_dct, min_id=None):
            """Add an ungenotyped parent to node in up_dct."""
            import copy
            up_dct = copy.deepcopy(up_dct)
            
            if node not in up_dct:
                raise ValueError(f"Node {node} is not in up dct.")
                
            pid_dict = up_dct[node]
            if len(pid_dict) >= 2:
                return up_dct, None
                
            if min_id is None:
                min_id = get_min_id(up_dct)
                
            new_pid = min_id - 1
            up_dct[node][new_pid] = 1
            up_dct[new_pid] = {}
            
            return up_dct, new_pid
        
        def delete_node(dct, node):
            """Delete node from a node dict."""
            new_dct = {}
            for k, v in dct.items():
                if k != node:
                    new_dct[k] = {r: d for r, d in v.items() if r != node}
            return new_dct
            
# For JupyterLite compatibility
else:
    print("Running in JupyterLite environment. Using simplified implementations.")
    
    def get_min_id(dct):
        """Get the minimal ID in a node dict."""
        all_ids = set(dct.keys())
        for parents in dct.values():
            all_ids.update(parents.keys())
        min_id = min(all_ids) if all_ids else 0
        return min(-1, min_id)  # ensure ID is negative
    
    def add_parent(node, up_dct, min_id=None):
        """Add an ungenotyped parent to node in up_dct."""
        import copy
        up_dct = copy.deepcopy(up_dct)
        
        if node not in up_dct:
            raise ValueError(f"Node {node} is not in up dct.")
            
        pid_dict = up_dct[node]
        if len(pid_dict) >= 2:
            return up_dct, None
            
        if min_id is None:
            min_id = get_min_id(up_dct)
            
        new_pid = min_id - 1
        up_dct[node][new_pid] = 1
        up_dct[new_pid] = {}
        
        return up_dct, new_pid
    
    def delete_node(dct, node):
        """Delete node from a node dict."""
        new_dct = {}
        for k, v in dct.items():
            if k != node:
                new_dct[k] = {r: d for r, d in v.items() if r != node}
        return new_dct

In [None]:
# Start with a copy of the simple pedigree
import copy
pedigree_to_modify = copy.deepcopy(simple_pedigree)

# Find the minimum ID in the pedigree (for assigning new ungenotyped IDs)
min_id = get_min_id(pedigree_to_modify)
print(f"Minimum ID in the pedigree: {min_id}")

# Add an ungenotyped parent to individual 1 (who is a founder)
modified_pedigree, new_parent_id = add_parent(1, pedigree_to_modify)
print(f"Added parent {new_parent_id} to individual 1")

# Ensure the new parent has its own (empty) entry; this is a no-op if add_parent already created it
if new_parent_id is not None and new_parent_id not in modified_pedigree:
    modified_pedigree[new_parent_id] = {}

# Visualize the modified pedigree
visualize_pedigree(modified_pedigree, title="Pedigree with Added Ungenotyped Parent")

In [None]:
# Now let's delete a node
pedigree_with_deletion = delete_node(modified_pedigree, 2)
print("Deleted individual 2 from the pedigree")

# Visualize the pedigree after deletion
visualize_pedigree(pedigree_with_deletion, title="Pedigree After Deleting Individual 2")

### 3.3 Building a Custom Pedigree

Let's implement a function to build a custom pedigree from a set of relationships, demonstrating how up-node dictionaries are constructed in practice:

In [None]:
def build_pedigree(relationships):
    """
    Build an up-node dictionary from a list of parent-child relationships.
    
    Args:
        relationships: List of tuples (child_id, parent1_id, parent2_id)
                      where parent IDs can be None if unknown
    
    Returns:
        up_node_dict: Dictionary representing the pedigree
    """
    up_node_dict = {}
    
    # Process each relationship
    for child, parent1, parent2 in relationships:
        # Ensure child is in the pedigree
        if child not in up_node_dict:
            up_node_dict[child] = {}
        
        # Add first parent if provided
        if parent1 is not None:
            up_node_dict[child][parent1] = 1
            if parent1 not in up_node_dict:
                up_node_dict[parent1] = {}
        
        # Add second parent if provided
        if parent2 is not None:
            up_node_dict[child][parent2] = 1
            if parent2 not in up_node_dict:
                up_node_dict[parent2] = {}
    
    return up_node_dict

# Define relationships for a new pedigree: (child, parent1, parent2)
custom_relationships = [
    # Fourth generation
    (10, 7, 8),       # Child of 7 and 8
    
    # Third generation
    (7, 5, 6),        # Child of 5 and 6
    (8, 5, -1),       # Child of 5 and ungenotyped -1
    (9, 5, 6),        # Sibling of 7
    
    # Second generation
    (5, 1, 2),        # Child of 1 and 2
    (6, 3, 4),        # Child of 3 and 4
    (-1, None, None), # Ungenotyped individual with no parents
    
    # First generation (founders)
    (1, None, None),
    (2, None, None),
    (3, None, None),
    (4, None, None)
]

# Build the custom pedigree
custom_pedigree = build_pedigree(custom_relationships)

# Visualize the custom pedigree
visualize_pedigree(custom_pedigree, title="Custom-Built Pedigree")
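
When a pedigree is assembled from hand-written records, a quick structural check can catch mistakes before they reach later analysis. The sketch below is a hypothetical helper (`find_up_node_dict_problems` is not a Bonsai function). It assumes the conventions used in this lab's examples: every individual has at most two parents, and every parent that is referenced also has its own entry in the dictionary.

```python
def find_up_node_dict_problems(up_dct):
    """Return a list of human-readable problems found in an up-node dictionary."""
    problems = []
    for child, parents in up_dct.items():
        # Assumed convention: at most two recorded parents per individual
        if len(parents) > 2:
            problems.append(f"{child} has {len(parents)} parents")
        # Nobody can be their own parent
        if child in parents:
            problems.append(f"{child} is listed as its own parent")
        # Assumed convention: every parent also appears as a key
        for parent in parents:
            if parent not in up_dct:
                problems.append(f"parent {parent} of {child} has no entry")
    return problems

# A well-formed toy pedigree and a deliberately broken one (both hypothetical)
good_up_dct = {1: {}, 2: {}, 3: {1: 1, 2: 1}}
bad_up_dct = {1: {2: 1, 3: 1, 4: 1}, 2: {}}

print(find_up_node_dict_problems(good_up_dct))
for problem in find_up_node_dict_problems(bad_up_dct):
    print(problem)
```

A check like this is useful with a record-based builder, because listing the same child in several records with different parents would silently give that child more than two parents.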

### 3.4 Practical Application: Analyzing Relationship Types in a Pedigree

Let's create a function to analyze all relationships in a pedigree and count the different types:

In [None]:
def analyze_pedigree_relationships(up_node_dict):
    """
    Analyze the relationships present in a pedigree.
    
    Args:
        up_node_dict: Up-node dictionary representing the pedigree
    
    Returns:
        Tuple (df, rel_counts): a DataFrame with one row per related pair,
        and a DataFrame counting how often each relationship type occurs
    """
    # Get all genotyped IDs
    genotyped_ids = get_gt_id_set(up_node_dict)
    
    # Store all relationships
    relationships_found = []
    
    # For each pair of genotyped individuals
    for i in genotyped_ids:
        for j in genotyped_ids:
            if i >= j:  # Skip redundant pairs and self-relationships
                continue
                
            # Get the relationship tuple
            rel_tuple = get_simple_rel_tuple(up_node_dict, i, j)
            
            # Get the relationship name
            rel_name = rel_names.get(rel_tuple, "Unknown")
            
            # Store the relationship
            if rel_tuple is not None:
                up, down, num_ancs = rel_tuple
                relationships_found.append({
                    "Person 1": i,
                    "Person 2": j,
                    "Up": up,
                    "Down": down,
                    "Num Common Ancestors": num_ancs,
                    "Relationship": rel_name
                })
    
    # Convert to DataFrame for easy analysis
    if relationships_found:
        df = pd.DataFrame(relationships_found)
        
        # Count relationship types
        rel_counts = df['Relationship'].value_counts().reset_index()
        rel_counts.columns = ['Relationship Type', 'Count']
        
        return df, rel_counts
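    else:
        # No related pairs: return empty frames, keeping the expected count columns
        empty_counts = pd.DataFrame(columns=['Relationship Type', 'Count'])
        return pd.DataFrame(), empty_counts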

# Analyze the complex pedigree
rel_df, rel_counts = analyze_pedigree_relationships(complex_pedigree)

# Display the results
print("All relationships in the complex pedigree:")
display(rel_df)

print("\nRelationship type counts:")
display(rel_counts)

# Plot the relationship distribution
plt.figure(figsize=(10, 6))
sns.barplot(x='Relationship Type', y='Count', data=rel_counts)
plt.title('Relationship Types in Complex Pedigree')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

Let's analyze the consanguineous pedigree as well:

In [None]:
# Analyze the consanguineous pedigree
rel_df_con, rel_counts_con = analyze_pedigree_relationships(consanguineous_pedigree)

# Display the results
print("All relationships in the consanguineous pedigree:")
display(rel_df_con)

print("\nRelationship type counts:")
display(rel_counts_con)

# Plot the relationship distribution
plt.figure(figsize=(10, 6))
sns.barplot(x='Relationship Type', y='Count', data=rel_counts_con)
plt.title('Relationship Types in Consanguineous Pedigree')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
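
The `(up, down, num_ancs)` tuples in these tables can be derived from ancestor depths. The following is a simplified sketch, not Bonsai's `get_simple_rel_tuple`: it assumes a pedigree without consanguinity, measures depth as the minimum number of meioses to each ancestor, and picks the common ancestors with the fewest total meioses. The names `ancestor_depths`, `rel_tuple_sketch`, and `toy_up_dct` are hypothetical.

```python
def ancestor_depths(up_dct, node):
    """Map node (depth 0) and each of its ancestors to the minimum depth in meioses."""
    depths = {node: 0}
    frontier = [node]
    while frontier:  # breadth-first, so the first depth recorded is the minimum
        next_frontier = []
        for cur in frontier:
            for parent in up_dct.get(cur, {}):
                if parent not in depths:
                    depths[parent] = depths[cur] + 1
                    next_frontier.append(parent)
        frontier = next_frontier
    return depths

def rel_tuple_sketch(up_dct, i, j):
    """Return (up, down, num_ancs) for i and j, or None if they share no ancestor."""
    di = ancestor_depths(up_dct, i)
    dj = ancestor_depths(up_dct, j)
    common = set(di) & set(dj)
    if not common:
        return None
    # Closest common ancestors: fewest total meioses, ties broken by smaller `up`
    up, down = min(((di[c], dj[c]) for c in common),
                   key=lambda ud: (ud[0] + ud[1], ud[0]))
    num_ancs = sum(1 for c in common if (di[c], dj[c]) == (up, down))
    return (up, down, num_ancs)

# Hypothetical toy pedigree: 3 and 4 are full siblings, 5 is their half sibling
toy_up_dct = {
    1: {}, 2: {}, -1: {},
    3: {1: 1, 2: 1},
    4: {1: 1, 2: 1},
    5: {1: 1, -1: 1},
    6: {3: 1},
    7: {4: 1},
}

for pair in [(3, 4), (3, 5), (1, 3), (6, 7), (6, 4), (2, -1)]:
    print(pair, rel_tuple_sketch(toy_up_dct, *pair))
```

In this sketch an individual is treated as its own ancestor at depth 0, so a parent-child pair comes out with `up` equal to 0 and a single connecting ancestor. Consanguineous pedigrees can connect a pair through several distinct paths, which this simplified version does not model.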

## Summary

In this lab, we've explored the up-node dictionary, a core data structure in Bonsai v3 for representing pedigrees. Key takeaways include:

1. **Structure and Design**: In an up-node dictionary, each key is an individual ID and each value is a dictionary mapping that individual's parent IDs to a degree of relationship (1 for a direct parent in this lab's examples). This design encodes ancestral relationships compactly and can represent complex pedigree structures.

2. **ID Conventions**: The Bonsai v3 pedigree representation uses positive IDs for genotyped individuals and negative IDs for ungenotyped/inferred individuals. Empty dictionaries represent founders (individuals with no recorded parents).

3. **Relationship Representation**: Relationships are encoded as tuples (up, down, num_ancs), where up is the number of meioses up to the common ancestor, down is the number of meioses down to the second individual, and num_ancs is the number of common ancestors (1 for half relationships, 2 for full relationships).

4. **Key Operations**: Bonsai v3 provides a rich set of functions for working with pedigrees, including finding ancestors/descendants, determining relationships, extracting subtrees, and identifying common ancestors.

5. **Bidirectional Navigation**: By converting between up-node and down-node dictionaries, we can efficiently navigate both up (toward ancestors) and down (toward descendants) through the pedigree.

6. **Advanced Applications**: The up-node dictionary structure supports advanced operations like finding connection points between pedigrees, analyzing relationship distributions, and handling complex structures like consanguinity.
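
As a compact illustration of the bidirectional navigation mentioned in point 5, the sketch below inverts an up-node dictionary into a down-node dictionary that maps each individual to their children. The helper name `invert_up_node_dict` and the toy data are hypothetical; this is a minimal version written for this summary, not Bonsai's own conversion routine.

```python
def invert_up_node_dict(up_dct):
    """Build a down-node dictionary (parent -> {child: degree}) from an up-node one."""
    # Start with an empty entry for every individual so childless nodes are kept
    down_dct = {node: {} for node in up_dct}
    for child, parents in up_dct.items():
        for parent, degree in parents.items():
            # setdefault covers parents that lack their own key in up_dct
            down_dct.setdefault(parent, {})[child] = degree
    return down_dct

# Hypothetical toy pedigree: 3 and 4 are children of 1 and 2
toy_up_dct = {
    1: {},
    2: {},
    3: {1: 1, 2: 1},
    4: {1: 1, 2: 1},
}

toy_down_dct = invert_up_node_dict(toy_up_dct)
for person in sorted(toy_down_dct):
    print(person, toy_down_dct[person])
```

Applying the same function to the result swaps the direction again, so in this sketch a well-formed dictionary survives a round trip unchanged.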

The up-node dictionary is the foundation upon which the Bonsai v3 algorithms operate, enabling efficient pedigree construction, analysis, and inference from genetic data.

In [None]:
# Convert this notebook to PDF using poetry
!poetry run jupyter nbconvert --to pdf Lab10_Up_Node_Dictionary.ipynb

# Note: PDF conversion requires LaTeX to be installed on your system
# If you encounter errors, you may need to install it:
# On Ubuntu/Debian: sudo apt-get install texlive-xetex
# On macOS with Homebrew: brew install texlive