# Lab 9: Pedigree Data Structures Implementation

## Overview

In this lab, we'll explore the core pedigree data structures in the Bonsai v3 codebase. Understanding how Bonsai represents and manipulates pedigrees is crucial for comprehending the rest of the codebase. The pedigree implementation is at the heart of Bonsai's ability to infer relationships and build family trees from genetic data.

In [None]:
# Standard imports
import os
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import networkx as nx
from IPython.display import display, HTML, Markdown
import inspect
import importlib

sys.path.append(os.path.dirname(os.getcwd()))

# Cross-compatibility setup
from scripts_support.lab_cross_compatibility import setup_environment, is_jupyterlite, save_results, save_plot

# Set up environment-specific paths
DATA_DIR, RESULTS_DIR = setup_environment()

# Set visualization styles
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_context("notebook")

In [None]:
# Setup Bonsai module paths
if not is_jupyterlite():
    # In local environment, add the utils directory to system path
    utils_dir = os.getenv('PROJECT_UTILS_DIR', os.path.join(os.path.dirname(DATA_DIR), 'utils'))
    bonsaitree_dir = os.path.join(utils_dir, 'bonsaitree')
    
    # Add to path if it exists and isn't already there
    if os.path.exists(bonsaitree_dir) and bonsaitree_dir not in sys.path:
        sys.path.append(bonsaitree_dir)
        print(f"Added {bonsaitree_dir} to sys.path")
else:
    # In JupyterLite, use a simplified approach
    print("⚠️ Running in JupyterLite: Some Bonsai functionality may be limited.")
    print("This notebook is primarily designed for local execution where the Bonsai codebase is available.")

In [None]:
# Helper functions for exploring modules
def display_module_classes(module_name):
    """Display classes and their docstrings from a module"""
    try:
        # Import the module
        module = importlib.import_module(module_name)
        
        # Find all classes
        classes = inspect.getmembers(module, inspect.isclass)
        
        # Filter classes defined in this module (not imported)
        classes = [(name, cls) for name, cls in classes if cls.__module__ == module_name]
        
        # Print info for each class
        for name, cls in classes:
            print(f"\n## {name}")
            
            # Get docstring
            doc = inspect.getdoc(cls)
            if doc:
                print(f"Docstring: {doc}")
            else:
                print("No docstring available")
            
            # Get methods
            methods = inspect.getmembers(cls, inspect.isfunction)
            if methods:
                print("\nMethods:")
                for method_name, method in methods:
                    if not method_name.startswith('_'):  # Skip private methods
                        print(f"- {method_name}")
    except ImportError as e:
        print(f"Error importing module {module_name}: {e}")
    except Exception as e:
        print(f"Error processing module {module_name}: {e}")

def display_module_functions(module_name):
    """Display functions and their docstrings from a module"""
    try:
        # Import the module
        module = importlib.import_module(module_name)
        
        # Find all functions
        functions = inspect.getmembers(module, inspect.isfunction)
        
        # Filter functions defined in this module (not imported)
        functions = [(name, func) for name, func in functions if func.__module__ == module_name]
        
        # Print info for each function
        for name, func in functions:
            if name.startswith('_'):  # Skip private functions
                continue
                
            print(f"\n## {name}")
            
            # Get signature
            sig = inspect.signature(func)
            print(f"Signature: {name}{sig}")
            
            # Get docstring
            doc = inspect.getdoc(func)
            if doc:
                print(f"Docstring: {doc}")
            else:
                print("No docstring available")
    except ImportError as e:
        print(f"Error importing module {module_name}: {e}")
    except Exception as e:
        print(f"Error processing module {module_name}: {e}")

def view_source(obj):
    """Display the source code of an object (function or class)"""
    try:
        source = inspect.getsource(obj)
        display(Markdown(f"```python\n{source}\n```"))
    except Exception as e:
        print(f"Error retrieving source: {e}")

## Check Bonsai Installation

Let's verify that the Bonsai v3 module is available for import:

In [None]:
try:
    from utils.bonsaitree.bonsaitree import v3
    print("✅ Successfully imported Bonsai v3 module")
except ImportError as e:
    print(f"❌ Failed to import Bonsai v3 module: {e}")
    print("This lab requires access to the Bonsai v3 codebase.")
    print("Make sure you've properly set up your environment with the Bonsai repository.")

## Lab 9: Pedigree Data Structures in Bonsai v3

In this lab, we'll explore how Bonsai v3 implements and manipulates pedigree data structures. Pedigrees are the core representation in Bonsai, allowing it to model family relationships, track inheritance, and infer connections between individuals.

We'll investigate:
1. The fundamental data structures used to represent pedigrees
2. How to navigate and query these data structures
3. How to build and modify pedigrees programmatically

The key module for pedigree manipulation in Bonsai v3 is `pedigrees.py`, which contains most of the functions for working with pedigree representations.

## Part 1: Fundamental Pedigree Representations

Bonsai v3 uses dictionary structures to represent pedigrees. Let's explore the fundamental data structures:

### 1.1 Node Dictionaries: up_node_dict and down_node_dict

The primary pedigree representation in Bonsai v3 consists of two main dictionaries:

1. **up_node_dict**: Maps individuals to their ancestors with degrees of relationship
   - Format: `{id: {parent1: degree1, parent2: degree2, ...}, ...}`
   
2. **down_node_dict**: Maps individuals to their descendants with degrees of relationship
   - Format: `{id: {child1: degree1, child2: degree2, ...}, ...}`

These dictionaries use integer IDs (positive for genotyped individuals, negative for inferred/ungenotyped individuals).
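
The two dictionaries carry the same edges viewed from opposite ends. As a minimal sketch of that correspondence, every `child -> parent` entry can be flipped into a `parent -> child` entry. The helper name `invert_node_dict` is hypothetical and is not Bonsai's own `reverse_node_dict`, whose implementation may differ:

```python
def invert_node_dict(node_dict):
    """Flip every {a: {b: degree}} edge into {b: {a: degree}}.

    Hypothetical illustration only; not part of the Bonsai API.
    """
    # Keep every ID as a key so that individuals with no relatives
    # in the flipped direction still map to an empty dict.
    inverted = {node: {} for node in node_dict}
    for node, relatives in node_dict.items():
        for relative, degree in relatives.items():
            inverted.setdefault(relative, {})[node] = degree
    return inverted


# A child (3) with two parents (1 and 2)
example_up = {3: {1: 1, 2: 1}, 1: {}, 2: {}}
example_down = invert_node_dict(example_up)
print(example_down)
```

Applying the helper twice returns the original dictionary, which is a quick way to check that no edge was lost.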

In [None]:
# Create a simple pedigree example
# Let's model a small family with IDs 1, 2, 3, 4
# Where 1 and 2 are parents of 3 and 4

# up_node_dict representation (mapping individuals to their parents)
up_node_dict = {
    3: {1: 1, 2: 1},  # 3 has parents 1 and 2, with degree 1
    4: {1: 1, 2: 1},  # 4 has parents 1 and 2, with degree 1
    1: {},            # 1 has no parents in this pedigree
    2: {}             # 2 has no parents in this pedigree
}

# Let's examine how this is structured
print("up_node_dict example:")
for person_id, parents in up_node_dict.items():
    parent_info = ", ".join([f"parent {p} (degree {d})" for p, d in parents.items()])
    print(f"Person {person_id} has {'parents: ' + parent_info if parent_info else 'no parents'}")

Now let's look at how Bonsai converts between up_node_dict and down_node_dict using the `reverse_node_dict` function:

In [None]:
# Import the function from v3.pedigrees
from utils.bonsaitree.bonsaitree.v3.pedigrees import reverse_node_dict

# Let's examine the source code
if not is_jupyterlite():
    view_source(reverse_node_dict)
else:
    print("Cannot view source code in JupyterLite environment")

In [None]:
# Now use the function to convert our up_node_dict to a down_node_dict
down_node_dict = reverse_node_dict(up_node_dict)

print("down_node_dict example:")
for person_id, children in down_node_dict.items():
    children_info = ", ".join([f"child {c} (degree {d})" for c, d in children.items()])
    print(f"Person {person_id} has {'children: ' + children_info if children_info else 'no children'}")

### 1.2 Representing Complex Relationships

Bonsai v3 represents relationships as tuples of the form `(up, down, num_ancs)`:

- `up`: Number of meioses up from individual 1 to common ancestor
- `down`: Number of meioses down from common ancestor to individual 2
- `num_ancs`: Number of common ancestors (1 for half relationships, 2 for full relationships)
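
A tuple can be reduced to the conventional degree of relationship, and it can be reversed to describe the same pair from the other individual's point of view. The sketch below assumes the usual convention in which parent-child pairs and full siblings are both degree 1, and it only covers simple, non-inbred relationships. The names `rel_tuple_degree` and `reverse_rel_tuple` are hypothetical helpers, not Bonsai functions:

```python
def rel_tuple_degree(rel_tuple):
    """Return the degree of relationship for a simple (up, down, num_ancs) tuple."""
    up, down, num_ancs = rel_tuple
    if up == 0 and down == 0:
        return 0  # self / identical
    if up == 0 or down == 0:
        return up + down  # direct line: one individual is the other's ancestor
    # A second common ancestor (full relationship) lowers the degree by one
    return up + down - (num_ancs - 1)


def reverse_rel_tuple(rel_tuple):
    """Describe the same pair starting from the other individual."""
    up, down, num_ancs = rel_tuple
    return (down, up, num_ancs)


for label, rel in [("full sibling", (1, 1, 2)),
                   ("half sibling", (1, 1, 1)),
                   ("parent-child", (0, 1, 1))]:
    print(f"{label}: tuple {rel}, degree {rel_tuple_degree(rel)}, reversed {reverse_rel_tuple(rel)}")
```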

Let's see how this works for a few example relationships:

In [None]:
# Define some common relationship tuples
# (num_ancs is 2 for full relationships through an ancestral couple,
#  and 1 for half relationships and direct-line relationships)
relationships = {
    "identical/self": (0, 0, 2),
    "parent-child": (0, 1, 1),
    "child-parent": (1, 0, 1),
    "full sibling": (1, 1, 2),
    "half sibling": (1, 1, 1),
    "grandparent-grandchild": (0, 2, 1),
    "grandchild-grandparent": (2, 0, 1),
    "aunt/uncle-niece/nephew": (1, 2, 2),
    "niece/nephew-aunt/uncle": (2, 1, 2),
    "first cousin": (2, 2, 2),
    "half first cousin": (2, 2, 1),
    "first cousin once removed (down)": (2, 3, 2),
    "first cousin once removed (up)": (3, 2, 2),
    "second cousin": (3, 3, 2),
}

# Create a DataFrame to display the relationship tuples
rel_df = pd.DataFrame([(name, up, down, num_ancs) for name, (up, down, num_ancs) in relationships.items()],
                       columns=["Relationship", "Up", "Down", "Num Common Ancestors"])
display(rel_df)

Let's define a function to visualize a pedigree using networkx, which will help us understand these data structures better:

In [None]:
def visualize_pedigree(up_node_dict, title="Pedigree"):
    """Visualize a pedigree from an up_node_dict using networkx."""
    G = nx.DiGraph()
    
    # Add all nodes to the graph, including individuals with no recorded relatives
    all_ids = set(up_node_dict.keys()).union(*[set(parents.keys()) for parents in up_node_dict.values()])
    G.add_nodes_from(all_ids)
    
    # Add edges (from parent to child)
    for child, parents in up_node_dict.items():
        for parent in parents:
            G.add_edge(parent, child)
    
    # Build the color map in the graph's own node order so each color matches its node
    # (genotyped in blue, ungenotyped in gray)
    color_map = ['lightblue' if node_id > 0 else 'lightgray' for node_id in G.nodes()]
    labels = {node_id: str(node_id) for node_id in G.nodes()}
    
    # Create plot
    plt.figure(figsize=(10, 6))
    plt.title(title)
    pos = nx.spring_layout(G, seed=42)  # positions for all nodes
    
    # Draw nodes, labels, and directed edges in a single call
    nx.draw(G, pos, with_labels=True, labels=labels, node_color=color_map,
            node_size=800, font_weight='bold', arrows=True)
    
    plt.tight_layout()
    plt.show()
    
# Test the visualization function with our simple pedigree
visualize_pedigree(up_node_dict, "Simple Family Pedigree")

## Part 2: Navigating and Querying Pedigrees

Let's explore the key functions in Bonsai v3 for navigating pedigrees. These functions allow finding relatives, computing relationships, and extracting subsets of pedigrees.

### 2.1 Finding Relatives and Paths

Let's start with functions that find relatives and paths in a pedigree:

In [None]:
# Let's create a more complex pedigree for testing
# This models a three-generation family with grandparents, parents, and children
complex_pedigree = {
    # Third generation (children)
    7: {5: 1, 6: 1},  # 7 has parents 5 and 6
    8: {5: 1, 6: 1},  # 8 has parents 5 and 6
    9: {4: 1, -1: 1}, # 9 has parent 4 and an ungenotyped parent -1
    
    # Second generation (parents)
    4: {1: 1, 2: 1},  # 4 has parents 1 and 2
    5: {1: 1, 2: 1},  # 5 has parents 1 and 2 (full sibling of 4)
    6: {3: 1, -2: 1}, # 6 has parent 3 and an ungenotyped parent -2
    
    # First generation (grandparents)
    1: {},            # 1 has no parents in this pedigree
    2: {},            # 2 has no parents in this pedigree
    3: {},            # 3 has no parents in this pedigree
    
    # Ungenotyped individuals
    -1: {},           # -1 has no parents in this pedigree
    -2: {}            # -2 has no parents in this pedigree
}

# Visualize the complex pedigree
visualize_pedigree(complex_pedigree, "Complex Family Pedigree")

In [None]:
# Import necessary functions from v3.pedigrees when the Bonsai codebase is available
if not is_jupyterlite():
    from utils.bonsaitree.bonsaitree.v3.pedigrees import (
        get_rel_set,
        get_all_paths,
        get_simple_rel_tuple
    )

# For JupyterLite compatibility, let's define simplified versions of these functions
if is_jupyterlite():
    def get_rel_set(node_dict, i):
        """Find all ancestors of i if node_dict is an up_dict or all descendants if node_dict is a down_dict."""
        # Start with the node itself
        rel_set = {i}
        
        # If the node has no relatives in the node_dict, return just the node
        if i not in node_dict:
            return rel_set
        
        # For each direct relative, add it and recursively get its relatives
        for relative in node_dict.get(i, {}):
            rel_set.add(relative)
            rel_set.update(get_rel_set(node_dict, relative))
            
        return rel_set
    
    def get_all_paths(up_node_dict, i, j):
        """Find all paths between individuals i and j."""
        # Simplified implementation for JupyterLite
        # Get ancestors of i and j
        i_ancs = get_rel_set(up_node_dict, i)
        j_ancs = get_rel_set(up_node_dict, j)
        
        # Find common ancestors
        common_ancs = i_ancs.intersection(j_ancs)
        
        # Simple path representation
        if not common_ancs:
            return set(), set()
        
        # Just return a simple path through the first common ancestor
        common_anc = next(iter(common_ancs))
        path = (i, common_anc, j)
        
        return {path}, {common_anc}
    
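    def get_simple_rel_tuple(up_node_dict, i, j):
        """Get relationship tuple (up, down, num_ancs) between individuals i and j."""
        # Handle self relationship
        if i == j:
            return (0, 0, 2)

        def ancestor_depths(node):
            """Map node and each of its ancestors to the fewest meioses separating them from node."""
            depths = {node: 0}
            stack = [node]
            while stack:
                current = stack.pop()
                for parent, degree in up_node_dict.get(current, {}).items():
                    depth = depths[current] + degree
                    if parent not in depths or depth < depths[parent]:
                        depths[parent] = depth
                        stack.append(parent)
            return depths

        i_depths = ancestor_depths(i)
        j_depths = ancestor_depths(j)
        common_ancs = set(i_depths) & set(j_depths)
        if not common_ancs:
            return None

        # Simplified logic: keep the common ancestors joined by the shortest path
        # and report the up/down split through one of them
        shortest = min(i_depths[a] + j_depths[a] for a in common_ancs)
        closest = [a for a in common_ancs if i_depths[a] + j_depths[a] == shortest]
        return (i_depths[closest[0]], j_depths[closest[0]], len(closest))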

In [None]:
# Find all ancestors of individual 7
ancestors_of_7 = get_rel_set(complex_pedigree, 7)
print(f"Ancestors of individual 7: {ancestors_of_7}")

# Find all paths between individuals 7 and 9
paths, common_ancestors = get_all_paths(complex_pedigree, 7, 9)
print("\nPaths between 7 and 9:")
for path in paths:
    print(f"  {' -> '.join(map(str, path))}")
print(f"Common ancestors: {common_ancestors}")

# Find the relationship between individuals 7 and 9
relationship = get_simple_rel_tuple(complex_pedigree, 7, 9)
print(f"\nRelationship between 7 and 9: {relationship}")

# Determine the relationship name based on the tuple
rel_names = {v: k for k, v in relationships.items()}
rel_name = rel_names.get(relationship, "Unknown")
print(f"This is a {rel_name} relationship")

### 2.2 Working with Pedigree Subsets

Let's explore functions that extract portions of pedigrees based on specific criteria:

In [None]:
# Import functions for pedigree subsetting when the Bonsai codebase is available
if not is_jupyterlite():
    from utils.bonsaitree.bonsaitree.v3.pedigrees import (
        get_subdict,
        get_sub_up_node_dict,
        get_gt_id_set
    )

# For JupyterLite compatibility
if is_jupyterlite():
    def get_subdict(dct, node):
        """Get the cone above/below node in a node dict."""
        import copy
        
        if node not in dct:
            return {}
            
        # Start with the node itself
        sub_dct = {}
        sub_dct[node] = copy.deepcopy(dct[node])
        
        # Add subdicts for all relatives of the node
        for n in dct[node]:
            n_dct = get_subdict(dct, n)
            if n_dct:
                sub_dct.update(n_dct)
                
        return sub_dct
    
    def get_sub_up_node_dict(up_dct, id_set):
        """Get subtree connecting all IDs in id_set."""
        # Simplified for JupyterLite - just combined subdicts for each ID
        result = {}
        for node_id in id_set:
            sub_dict = get_subdict(up_dct, node_id)
            for k, v in sub_dict.items():
                if k not in result:
                    result[k] = v
                else:
                    result[k].update(v)
        return result
    
    def get_gt_id_set(ped):
        """Get all genotyped IDs (positive) from the pedigree."""
        all_ids = set(ped.keys()).union(*[set(parents.keys()) for parents in ped.values()])
        return {i for i in all_ids if i > 0}

In [None]:
# Get the cone of ancestors above individual 7
# (get_subdict follows the direction of the dict it is given; in an up dict,
#  a founder such as individual 1 would yield only itself)
subset_pedigree_7 = get_subdict(complex_pedigree, 7)
print("Pedigree subset containing individual 7 and its ancestors:")
for person, parents in subset_pedigree_7.items():
    print(f"  Person {person}: {parents}")

# Visualize this subset
visualize_pedigree(subset_pedigree_7, "Pedigree Subset: Ancestors of Individual 7")

# Get all genotyped individuals
genotyped_ids = get_gt_id_set(complex_pedigree)
print(f"\nGenotyped individuals: {genotyped_ids}")

# Get the subtree connecting individuals 7 and 9
subtree_7_9 = get_sub_up_node_dict(complex_pedigree, {7, 9})
print("\nSubtree connecting individuals 7 and 9:")
for person, parents in subtree_7_9.items():
    print(f"  Person {person}: {parents}")

# Visualize this subtree
visualize_pedigree(subtree_7_9, "Subtree Connecting Individuals 7 and 9")

## Part 3: Building and Modifying Pedigrees

In this section, we'll explore how to build and modify pedigrees programmatically using Bonsai's functions.

### 3.1 Adding and Deleting Nodes

In [None]:
# Import functions for modifying pedigrees when the Bonsai codebase is available
if not is_jupyterlite():
    from utils.bonsaitree.bonsaitree.v3.pedigrees import (
        add_parent,
        delete_node,
        get_min_id,
        replace_ids
    )

# For JupyterLite compatibility
if is_jupyterlite():
    def get_min_id(dct):
        """Get the minimal ID in a node dict."""
        all_ids = set(dct.keys()).union(*[set(parents.keys()) for parents in dct.values()])
        min_id = min(all_ids) if all_ids else 0
        return min(-1, min_id)  # ensure ID is negative
    
    def add_parent(node, up_dct, min_id=None):
        """Add an ungenotyped parent to node in up_dct."""
        import copy
        up_dct = copy.deepcopy(up_dct)
        
        if node not in up_dct:
            raise ValueError(f"Node {node} is not in up dct.")
            
        pid_dict = up_dct[node]
        if len(pid_dict) >= 2:
            return up_dct, None
            
        if min_id is None:
            min_id = get_min_id(up_dct)
            
        new_pid = min_id - 1
        up_dct[node][new_pid] = 1
        up_dct[new_pid] = {}
        
        return up_dct, new_pid
    
    def delete_node(dct, node):
        """Delete node from a node dict."""
        import copy
        new_dct = {}
        for k, v in dct.items():
            if k != node:
                new_dct[k] = {r: d for r, d in v.items() if r != node}
        return new_dct
    
    def replace_ids(rep_dct, dct):
        """Replace IDs in dct according to mapping in rep_dct."""
        if not isinstance(dct, dict):
            return dct
            
        new_dct = {}
        for k, v in dct.items():
            new_k = rep_dct.get(k, k)
            if isinstance(v, dict):
                new_v = {}
                for k2, v2 in v.items():
                    new_k2 = rep_dct.get(k2, k2)
                    new_v[new_k2] = v2
            else:
                new_v = v
            new_dct[new_k] = new_v
        return new_dct

In [None]:
# Start with a simple pedigree and modify it
import copy
pedigree = copy.deepcopy(up_node_dict)  # The simple pedigree we defined earlier

# Visualize the starting pedigree
visualize_pedigree(pedigree, "Starting Pedigree")

# 1. Add a parent to individual 1
modified_pedigree, new_parent_id = add_parent(1, pedigree)
print(f"Added parent {new_parent_id} to individual 1")

# Ensure the ungenotyped parent appears in the pedigree
if new_parent_id not in modified_pedigree:
    modified_pedigree[new_parent_id] = {}

# Visualize after adding a parent
visualize_pedigree(modified_pedigree, "Pedigree After Adding Parent to 1")

# 2. Delete a node (individual 4)
deleted_pedigree = delete_node(modified_pedigree, 4)
print("Deleted individual 4")

# Visualize after deleting a node
visualize_pedigree(deleted_pedigree, "Pedigree After Deleting Individual 4")

# 3. Replace IDs
rep_dict = {1: 101, 2: 102, 3: 103}
renamed_pedigree = replace_ids(rep_dict, deleted_pedigree)
print("Replaced IDs: 1→101, 2→102, 3→103")

# Visualize after replacing IDs
visualize_pedigree(renamed_pedigree, "Pedigree After Replacing IDs")

### 3.2 Working with Pedigree Relationships

Let's explore functions that help us work with relationships in a pedigree:

In [None]:
# Import relationship-related functions when the Bonsai codebase is available
if not is_jupyterlite():
    from utils.bonsaitree.bonsaitree.v3.pedigrees import (
        get_rel_dict,
        get_mrca_set,
        get_sib_set
    )

# For JupyterLite compatibility
if is_jupyterlite():
    def get_rel_dict(up_dct):
        """Get dict mapping each ID pair to their relationship tuple."""
        from itertools import combinations
        
        all_ids = set(up_dct.keys()).union(*[set(parents.keys()) for parents in up_dct.values()])
        
        rel_dict = {}
        for i in all_ids:
            rel_dict[i] = {}
            rel_dict[i][i] = (0, 0, 2)  # Self relationship
            
            for j in all_ids:
                if i != j:
                    rel = get_simple_rel_tuple(up_dct, i, j)
                    if rel:
                        rel_dict[i][j] = rel
        
        return rel_dict
    
    def get_mrca_set(up_dct, id_set):
        """Get the set of most recent common ancestors of id_set."""
        if len(id_set) == 1:
            return id_set
            
        # Simplified version that just finds common ancestors
        common_ancs = None
        for i in id_set:
            i_ancs = get_rel_set(up_dct, i)
            if common_ancs is None:
                common_ancs = i_ancs
            else:
                common_ancs &= i_ancs
                
        return common_ancs if common_ancs else set()
    
    def get_sib_set(up_dct, down_dct, node):
        """Get all siblings of node."""
        if node not in up_dct:
            return set()
            
        # Get parents of the node
        parents = set(up_dct[node].keys())
        
        # Get all children of these parents
        sibling_set = set()
        for parent in parents:
            if parent in down_dct:
                sibling_set.update(down_dct[parent].keys())
                
        # Remove the node itself
        sibling_set.discard(node)
        
        return sibling_set

In [None]:
# Work with our complex pedigree
pedigree = complex_pedigree

# Get the down_node_dict needed for finding siblings
down_dict = reverse_node_dict(pedigree)

# Find siblings of individual 7
siblings_of_7 = get_sib_set(pedigree, down_dict, 7)
print(f"Siblings of individual 7: {siblings_of_7}")

# Find most recent common ancestors of individuals 7, 8, and 9
mrcas = get_mrca_set(pedigree, {7, 8, 9})
print(f"Most recent common ancestors of 7, 8, and 9: {mrcas}")

# Get all relationships in the pedigree
rel_dict = get_rel_dict(pedigree)

# Display a few interesting relationships
relationships_to_check = [(7, 9), (7, 4), (4, 9)]
print("\nInteresting relationships:")
for i, j in relationships_to_check:
    rel_tuple = rel_dict[i].get(j)
    rel_name = rel_names.get(rel_tuple, "Unknown")
    print(f"  {i} to {j}: {rel_tuple} ({rel_name})")

### 3.3 Implementing a Simple Pedigree Constructor

Let's implement a simple function to build a pedigree from a list of parent-child relationships:

In [None]:
def build_pedigree_from_relationships(relationships):
    """
    Build an up_node_dict from a list of parent-child relationships.
    
    Args:
        relationships: List of tuples (child_id, parent1_id, parent2_id)
                      where parent2_id can be None if only one parent is known
                      
    Returns:
        up_node_dict: Dictionary representing the pedigree
    """
    pedigree = {}
    
    # Process each relationship
    for rel in relationships:
        child, parent1, parent2 = rel
        
        # Ensure child is in the pedigree
        if child not in pedigree:
            pedigree[child] = {}
        
        # Add first parent
        if parent1 is not None:
            pedigree[child][parent1] = 1
            if parent1 not in pedigree:
                pedigree[parent1] = {}
        
        # Add second parent if present
        if parent2 is not None:
            pedigree[child][parent2] = 1
            if parent2 not in pedigree:
                pedigree[parent2] = {}
    
    return pedigree

# Example relationships to build a new pedigree
# Format: (child_id, parent1_id, parent2_id)
relationships_list = [
    (101, None, None),     # 101 has no parents
    (102, None, None),     # 102 has no parents
    (103, 101, 102),       # 103 has parents 101 and 102
    (104, 101, 102),       # 104 has parents 101 and 102 (full sibling of 103)
    (105, None, None),     # 105 has no parents
    (106, 103, 105),       # 106 has parents 103 and 105
    (107, 104, -1),        # 107 has parent 104 and ungenotyped parent -1
    (108, 106, 107)        # 108 has parents 106 and 107 (consanguineous)
]

# Build the pedigree
new_pedigree = build_pedigree_from_relationships(relationships_list)

# Ensure ungenotyped parent appears in the pedigree
if -1 not in new_pedigree:
    new_pedigree[-1] = {}

# Visualize the pedigree
visualize_pedigree(new_pedigree, "Custom Built Pedigree")

This pedigree includes a consanguineous relationship: individual 108 has parents 106 and 107, who are first cousins because their parents 103 and 104 are full siblings. Let's examine the relationships in this pedigree:

In [None]:
# Calculate all relationships in the new pedigree
new_rel_dict = get_rel_dict(new_pedigree)

# Find the relationship between 106 and 107
rel_106_107 = new_rel_dict[106].get(107)
print(f"Relationship between 106 and 107: {rel_106_107}")

# Get all relationships to 108
relationships_to_108 = {
    person: rel_tuple 
    for person, rel_tuple in new_rel_dict[108].items() 
    if person != 108
}

# Create a DataFrame to show relationships
rel_df_108 = pd.DataFrame(
    [(person, up, down, num_ancs, rel_names.get((up, down, num_ancs), "Unknown")) 
     for person, (up, down, num_ancs) in relationships_to_108.items()],
    columns=["Person ID", "Up", "Down", "Num Common Ancestors", "Relationship Name"]
)

rel_df_108 = rel_df_108.sort_values(by=["Up", "Down"])
display(rel_df_108)
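
Consanguinity can also be detected directly from an `up_node_dict` by checking whether a child's two parents share any ancestor. The following is a minimal sketch of that idea on a pedigree with the same shape as the one built above. The helper names `ancestors_of` and `find_consanguineous_children` are hypothetical and are not part of the Bonsai codebase:

```python
from itertools import combinations


def ancestors_of(up_node_dict, node):
    """Return the set of all ancestors of node (excluding node itself)."""
    found = set()
    stack = list(up_node_dict.get(node, {}))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(up_node_dict.get(current, {}))
    return found


def find_consanguineous_children(up_node_dict):
    """Map each child whose parents are related to the ancestors those parents share."""
    result = {}
    for child, parents in up_node_dict.items():
        for p1, p2 in combinations(parents, 2):
            # Include each parent in its own lineage so that a parent who is
            # an ancestor of the other parent is detected as well.
            line1 = ancestors_of(up_node_dict, p1) | {p1}
            line2 = ancestors_of(up_node_dict, p2) | {p2}
            shared = line1 & line2
            if shared:
                result[child] = shared
    return result


demo_pedigree = {
    108: {106: 1, 107: 1},
    106: {103: 1, 105: 1},
    107: {104: 1, -1: 1},
    103: {101: 1, 102: 1},
    104: {101: 1, 102: 1},
    101: {}, 102: {}, 105: {}, -1: {},
}
print(find_consanguineous_children(demo_pedigree))
```

Only individual 108 is reported, with founders 101 and 102 as the ancestors shared by its parents.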

## Summary

In this lab, we explored the core pedigree data structures in the Bonsai v3 codebase. Key takeaways include:

1. **Fundamental Data Structures**: Bonsai represents pedigrees using two primary dictionary structures: `up_node_dict` (mapping individuals to their ancestors) and `down_node_dict` (mapping individuals to their descendants).

2. **Relationship Representation**: Relationships are represented as tuples of the form `(up, down, num_ancs)`, where `up` is the number of meioses up to the common ancestor, `down` is the number of meioses down from the common ancestor, and `num_ancs` is the number of common ancestors (1 for half relationships, 2 for full relationships).

3. **Navigation and Querying**: Bonsai provides a rich set of functions for navigating pedigrees, including finding ancestors, descendants, siblings, and computing relationships between individuals. Key functions include `get_rel_set()`, `get_all_paths()`, and `get_simple_rel_tuple()`.

4. **Building and Modification**: The codebase includes functions for building and modifying pedigrees, such as adding parents, deleting nodes, and replacing IDs. These functions allow for sophisticated manipulation of pedigree structures.

5. **Handling Consanguinity**: Bonsai's pedigree representation can handle complex scenarios like consanguinity (inbreeding), where individuals may be related through multiple paths.

Understanding these fundamental data structures and functions is essential for working with Bonsai v3, as they form the foundation upon which more complex algorithms for relationship inference and pedigree reconstruction are built.

In [None]:
# Convert this notebook to PDF using poetry
!poetry run jupyter nbconvert --to pdf Lab09_Pedigree_Data_Structures.ipynb

# Note: PDF conversion requires LaTeX to be installed on your system
# If you encounter errors, you may need to install it:
# On Ubuntu/Debian: sudo apt-get install texlive-xetex
# On macOS with Homebrew: brew install texlive