# Lab 2: Bonsai v3 Architecture Overview and Core Data Structures

## Overview

This lab introduces the architecture and core data structures of Bonsai v3. We'll explore the high-level design, key components, and basic operations of the system. Through practical exercises, you'll gain an understanding of:

1. The overall architecture and design philosophy of Bonsai v3
2. The up-node dictionary structure for representing pedigrees
3. Core modules and their responsibilities
4. Basic pedigree operations and manipulations

In [None]:
# Standard imports
import os
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import networkx as nx
from IPython.display import display, HTML, Markdown
import inspect
import importlib
import pprint

sys.path.append(os.path.dirname(os.getcwd()))

# Cross-compatibility setup
from scripts_support.lab_cross_compatibility import setup_environment, is_jupyterlite, save_results, save_plot

# Set up environment-specific paths
DATA_DIR, RESULTS_DIR = setup_environment()

# Set visualization styles
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_context("notebook")

In [None]:
# Setup Bonsai module paths
if not is_jupyterlite():
    # In local environment, add the utils directory to system path
    utils_dir = os.getenv('PROJECT_UTILS_DIR', os.path.join(os.path.dirname(DATA_DIR), 'utils'))
    bonsaitree_dir = os.path.join(utils_dir, 'bonsaitree')
    
    # Add to path if it exists and isn't already there
    if os.path.exists(bonsaitree_dir) and bonsaitree_dir not in sys.path:
        sys.path.append(bonsaitree_dir)
        print(f"Added {bonsaitree_dir} to sys.path")
else:
    # In JupyterLite, use a simplified approach
    print("⚠️ Running in JupyterLite: Some Bonsai functionality may be limited.")
    print("This notebook is primarily designed for local execution where the Bonsai codebase is available.")

In [None]:
# Helper functions for exploring modules
def display_module_classes(module_name):
    """Display classes and their docstrings from a module"""
    try:
        # Import the module
        module = importlib.import_module(module_name)
        
        # Find all classes
        classes = inspect.getmembers(module, inspect.isclass)
        
        # Filter classes defined in this module (not imported)
        classes = [(name, cls) for name, cls in classes if cls.__module__ == module_name]
        
        # Print info for each class
        for name, cls in classes:
            print(f"\n## {name}")
            
            # Get docstring
            doc = inspect.getdoc(cls)
            if doc:
                print(f"Docstring: {doc}")
            else:
                print("No docstring available")
            
            # Get methods
            methods = inspect.getmembers(cls, inspect.isfunction)
            if methods:
                print("\nMethods:")
                for method_name, method in methods:
                    if not method_name.startswith('_'):  # Skip private methods
                        print(f"- {method_name}")
    except ImportError as e:
        print(f"Error importing module {module_name}: {e}")
    except Exception as e:
        print(f"Error processing module {module_name}: {e}")

def display_module_functions(module_name):
    """Display functions and their docstrings from a module"""
    try:
        # Import the module
        module = importlib.import_module(module_name)
        
        # Find all functions
        functions = inspect.getmembers(module, inspect.isfunction)
        
        # Filter functions defined in this module (not imported)
        functions = [(name, func) for name, func in functions if func.__module__ == module_name]
        
        # Print info for each function
        for name, func in functions:
            if name.startswith('_'):  # Skip private functions
                continue
                
            print(f"\n## {name}")
            
            # Get signature
            sig = inspect.signature(func)
            print(f"Signature: {name}{sig}")
            
            # Get docstring
            doc = inspect.getdoc(func)
            if doc:
                print(f"Docstring: {doc}")
            else:
                print("No docstring available")
    except ImportError as e:
        print(f"Error importing module {module_name}: {e}")
    except Exception as e:
        print(f"Error processing module {module_name}: {e}")

def view_function_source(module_name, function_name):
    """Display the source code of a function"""
    try:
        # Import the module
        module = importlib.import_module(module_name)
        
        # Get the function
        func = getattr(module, function_name)
        
        # Get the source code
        source = inspect.getsource(func)
        
        # Print the source code
        from IPython.display import display, Markdown
        display(Markdown(f"```python\n{source}\n```"))
    except ImportError as e:
        print(f"Error importing module {module_name}: {e}")
    except AttributeError:
        print(f"Function {function_name} not found in module {module_name}")
    except Exception as e:
        print(f"Error processing function {function_name}: {e}")

## Bonsai v3 Architecture

Bonsai v3 is designed as a modular system for pedigree reconstruction from IBD data. Let's start by exploring the high-level structure of the codebase.

In [None]:
# Check for Bonsai v3 availability
try:
    from utils.bonsaitree.bonsaitree import v3
    print("✅ Successfully imported Bonsai v3 module")
    
    # List key modules in Bonsai v3
    v3_modules = [name for name in dir(v3) if not name.startswith('_')]
    print("\nBonsai v3 modules:")
    for module in sorted(v3_modules):
        print(f"- {module}")
except ImportError as e:
    print(f"❌ Failed to import Bonsai v3 module: {e}")
    print("This lab requires access to the Bonsai v3 codebase.")
    print("Make sure you've properly set up your environment with the Bonsai repository.")
    
    # Provide a fallback list of key modules
    print("\nKey modules in Bonsai v3 include:")
    print("- bonsai.py: Main entry point and orchestration module")
    print("- ibd.py: IBD data management and processing")
    print("- likelihoods.py: Relationship likelihood calculation")
    print("- connections.py: Pedigree connection logic")
    print("- pedigrees.py: Pedigree representation and operations")
    print("- utils.py: Utility functions")
    print("- constants.py: System constants and configuration")

### Bonsai v3 Module Organization

Bonsai v3 is organized into several key modules, each with specific responsibilities:

1. **bonsai.py**: Main entry point and orchestration module
2. **ibd.py**: IBD data management and processing
3. **likelihoods.py**: Relationship likelihood calculation
4. **connections.py**: Pedigree connection logic
5. **pedigrees.py**: Pedigree representation and operations
6. **utils.py**: Utility functions
7. **constants.py**: System constants and configuration
8. **caching.py**: Performance optimization through caching
9. **exceptions.py**: Error handling and exceptions
10. **rendering.py**: Visualization and rendering functions

This modular design promotes separation of concerns and makes the system easier to understand and maintain.

### Exploring the Main Entry Point: bonsai.py

Let's examine the main entry point of Bonsai v3, the `bonsai.py` module:

In [None]:
try:
    # Display functions in the bonsai module
    display_module_functions('bonsaitree.bonsaitree.v3.bonsai')
except Exception as e:
    print(f"Could not display bonsai module functions: {e}")
    print("\nThe main entry point in Bonsai v3 is the build_pedigree() function in bonsai.py.")
    print("This function orchestrates the entire pedigree reconstruction process, including:")
    print("- Processing IBD data")
    print("- Computing pairwise relationships")
    print("- Building small pedigrees")
    print("- Merging pedigrees into a final structure")

The main entry point for Bonsai v3 is the `build_pedigree()` function, which takes the following key inputs:

- `bio_info`: Biological metadata (age, sex, coverage)
- `unphased_ibd_seg_list`: Unphased IBD segments (from detectors like IBIS)
- `phased_ibd_seg_list`: Phased IBD segments (optional)

This function orchestrates the entire pedigree reconstruction process, delegating specific tasks to other modules.

## The Up-Node Dictionary: Bonsai's Central Data Structure

A key design choice in Bonsai v3 is the up-node dictionary, a compact representation that maps each individual's ID to a dictionary keyed by the IDs of that individual's parents. Let's explore this data structure in detail.

In [None]:
# Example up-node dictionary
example_pedigree = {
    1000: {1001: 1, 1002: 1},  # Individual 1000 has parents 1001 and 1002
    1003: {1001: 1, 1002: 1},  # Individual 1003 has the same parents (siblings)
    1004: {-1: 1, -2: 1},      # Individual 1004 has inferred parents -1 and -2
    1005: {-1: 1, 1002: 1},    # Individual 1005 has one inferred parent and one known parent
    -1: {1006: 1, 1007: 1},    # Inferred individual -1 has parents 1006 and 1007
    # Empty dictionaries represent founder individuals with no recorded parents
    1001: {},
    1002: {},
    1006: {},
    1007: {},
    -2: {}
}

# Display the structure
print("Up-Node Dictionary Structure:\n")
pprint.pprint(example_pedigree)

### Key Features of the Up-Node Dictionary

The up-node dictionary has several important characteristics:

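1. **Sparse Representation**: Each entry stores only an individual's direct parents, so more distant relationships are derived by traversal rather than stored, making it memory-efficient
2. **Directed Structure**: Each key is an individual and its value lists that individual's parents, so the dictionary is read upward from child to parent (hence "up-node")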
3. **ID Conventions**:
   - Positive IDs represent observed individuals
   - Negative IDs represent inferred (latent) ancestors
4. **Founder Representation**: Empty dictionaries represent individuals with no recorded parents
5. **Flexibility**: Can represent complex pedigree structures including half-siblings and multi-generational relationships
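
The conventions above can be checked mechanically. The following is a minimal sketch, not part of Bonsai itself: `summarize_up_node_dict` is a hypothetical helper name, and it assumes integer IDs where the sign distinguishes observed from inferred individuals, as in the example pedigree.

```python
# Illustrative helper (hypothetical, not a Bonsai API): summarize an up-node dictionary.
def summarize_up_node_dict(up_node_dict):
    """Return the observed IDs, inferred IDs, and founders of a pedigree."""
    # Collect every ID, whether it appears as a key or only as someone's parent.
    all_ids = set(up_node_dict)
    for parents in up_node_dict.values():
        all_ids.update(parents)

    observed = {i for i in all_ids if i > 0}   # positive IDs: observed individuals
    inferred = {i for i in all_ids if i < 0}   # negative IDs: inferred ancestors
    # A founder has no recorded parents: an empty entry or no entry at all.
    founders = {i for i in all_ids if not up_node_dict.get(i)}
    return observed, inferred, founders


pedigree = {
    1000: {1001: 1, 1002: 1},
    1004: {-1: 1, 1002: 1},
    -1: {},
    1001: {},
}
observed, inferred, founders = summarize_up_node_dict(pedigree)
print(sorted(observed))  # [1000, 1001, 1002, 1004]
print(sorted(inferred))  # [-1]
print(sorted(founders))  # [-1, 1001, 1002]
```

Note that 1002 is treated as a founder even though it has no entry of its own; whether Bonsai requires every parent to appear as a key is not something this sketch assumes.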

### Visualizing Up-Node Dictionaries

Let's create a function to visualize up-node dictionaries as pedigree graphs:

In [None]:
def visualize_up_node_dict(up_node_dict, title="Pedigree Visualization"):
    """Visualize a pedigree from an up-node dictionary"""
    # Create a directed graph
    G = nx.DiGraph()
    
    # Add nodes and edges
    for child, parents in up_node_dict.items():
        # Add the child node
        is_inferred = child < 0 if isinstance(child, int) else False
        G.add_node(child, inferred=is_inferred)
        
        # Add parent nodes and edges
        for parent in parents:
            is_parent_inferred = parent < 0 if isinstance(parent, int) else False
            G.add_node(parent, inferred=is_parent_inferred)
            G.add_edge(parent, child)  # Edge from parent to child
    
    # Position nodes using a hierarchical layout
    try:
        pos = nx.nx_agraph.graphviz_layout(G, prog='dot')
    except Exception:
        # Fall back to a force-directed layout (not hierarchical) if Graphviz/pygraphviz is unavailable
        pos = nx.spring_layout(G, seed=42)
    
    # Draw the pedigree
    plt.figure(figsize=(12, 10))
    
    # Draw regular and inferred nodes differently
    regular_nodes = [node for node in G.nodes() if not G.nodes[node]['inferred']]
    inferred_nodes = [node for node in G.nodes() if G.nodes[node]['inferred']]
    
    nx.draw_networkx_nodes(G, pos, nodelist=regular_nodes, node_color='skyblue', 
                         node_size=1000, node_shape='o')
    nx.draw_networkx_nodes(G, pos, nodelist=inferred_nodes, node_color='lightgray', 
                         node_size=1000, node_shape='s')
    
    # Draw edges
    nx.draw_networkx_edges(G, pos, arrows=True, arrowsize=20)
    
    # Add labels
    labels = {node: f"ID: {node}" for node in G.nodes()}
    nx.draw_networkx_labels(G, pos, labels=labels, font_size=10)
    
    plt.axis('off')
    plt.title(title)
    
    # Add a legend
    plt.figtext(0.15, 0.02, "● Regular individual (observed)", color='black', backgroundcolor='skyblue')
    plt.figtext(0.55, 0.02, "■ Inferred individual (latent)", color='black', backgroundcolor='lightgray')
    
    plt.tight_layout()
    plt.show()
    
    return G

# Visualize our example pedigree
G = visualize_up_node_dict(example_pedigree, "Example Pedigree Structure")
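
When Graphviz is not installed, the plot loses its generational layering. As a text-only alternative, here is a small sketch that groups individuals by their depth below the founders. `generation_depths` is a hypothetical helper, not a Bonsai function, and it assumes the pedigree contains no cycles.

```python
# Illustrative helper (hypothetical): text view of a pedigree's generations.
def generation_depths(up_node_dict):
    """Map each ID to the length of its longest chain of recorded ancestors."""
    depths = {}

    def depth(node):
        if node not in depths:
            parents = up_node_dict.get(node, {})
            # Founders sit at depth 0; everyone else is one below their deepest parent.
            depths[node] = (1 + max(depth(p) for p in parents)) if parents else 0
        return depths[node]

    for node in up_node_dict:
        depth(node)
    return depths


pedigree = {
    1000: {1001: 1, 1002: 1},
    1003: {1001: 1, 1002: 1},
    1004: {-1: 1, -2: 1},
    1005: {-1: 1, 1002: 1},
    -1: {1006: 1, 1007: 1},
    1001: {}, 1002: {}, 1006: {}, 1007: {}, -2: {},
}
depths = generation_depths(pedigree)
for level in sorted(set(depths.values())):
    members = sorted(node for node, d in depths.items() if d == level)
    print(f"depth {level}: {members}")
```

Depth here means distance from the oldest recorded ancestor, so two individuals at the same depth are not necessarily in the same biological generation.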

### Common Operations on Up-Node Dictionaries

Let's implement some basic operations that are commonly performed on up-node dictionaries:

In [None]:
def get_parents(up_node_dict, individual_id):
    """Get the parents of an individual in the pedigree"""
    if individual_id not in up_node_dict:
        return []
    
    return list(up_node_dict[individual_id].keys())

def get_children(up_node_dict, individual_id):
    """Get the children of an individual in the pedigree"""
    children = []
    for child_id, parents in up_node_dict.items():
        if individual_id in parents:
            children.append(child_id)
    return children

def get_siblings(up_node_dict, individual_id):
    """Get the full siblings of an individual (those with the same recorded parent set)"""
    if individual_id not in up_node_dict:
        return []
    
    # Get the individual's parents
    parents = get_parents(up_node_dict, individual_id)
    if not parents:
        return []
    
    # Find all individuals who share the same parents
    siblings = []
    for child_id, child_parents in up_node_dict.items():
        if child_id == individual_id:
            continue
        
        # Check if this child has the same parents
        child_parent_set = set(child_parents.keys())
        if child_parent_set == set(parents):
            siblings.append(child_id)
    
    return siblings

def get_ancestors(up_node_dict, individual_id, max_generations=None):
    """Get all ancestors of an individual up to max_generations"""
    if individual_id not in up_node_dict:
        return set()
    
    ancestors = set()
    queue = [(individual_id, 0)]  # (id, generation)
    
    while queue:
        current_id, gen = queue.pop(0)
        
        # Check if we've reached the maximum generation depth
        if max_generations is not None and gen >= max_generations:
            continue
        
        # Get parents of the current individual
        parents = get_parents(up_node_dict, current_id)
        
        # Add parents to ancestors and queue
        for parent in parents:
            ancestors.add(parent)
            queue.append((parent, gen + 1))
    
    return ancestors

def get_descendants(up_node_dict, individual_id, max_generations=None):
    """Get all descendants of an individual up to max_generations"""
    descendants = set()
    queue = [(individual_id, 0)]  # (id, generation)
    
    while queue:
        current_id, gen = queue.pop(0)
        
        # Check if we've reached the maximum generation depth
        if max_generations is not None and gen >= max_generations:
            continue
        
        # Get children of the current individual
        children = get_children(up_node_dict, current_id)
        
        # Add children to descendants and queue
        for child in children:
            descendants.add(child)
            queue.append((child, gen + 1))
    
    return descendants

def add_individual(up_node_dict, individual_id, parent_ids=None):
    """Add a new individual to the pedigree"""
    if individual_id in up_node_dict:
        print(f"Warning: Individual {individual_id} already exists in the pedigree")
        return up_node_dict
    
    # Shallow copy: the outer dict is new, but existing inner parent dicts are shared with the original
    new_pedigree = up_node_dict.copy()
    
    # Add the individual with parents if specified
    if parent_ids:
        new_pedigree[individual_id] = {parent_id: 1 for parent_id in parent_ids}
    else:
        new_pedigree[individual_id] = {}  # Founder (no parents)
    
    return new_pedigree

In [None]:
# Test the operations on our example pedigree
print(f"Parents of 1000: {get_parents(example_pedigree, 1000)}")
print(f"Children of 1001: {get_children(example_pedigree, 1001)}")
print(f"Siblings of 1000: {get_siblings(example_pedigree, 1000)}")
print(f"Ancestors of 1004: {get_ancestors(example_pedigree, 1004)}")
print(f"Descendants of 1001: {get_descendants(example_pedigree, 1001)}")
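
`get_siblings` matches only individuals whose parent sets are identical, so half-siblings such as 1005 and 1000 (who share only parent 1002) are not returned. A possible companion function is sketched here; `get_half_siblings` is a hypothetical name, and the rule "exactly one shared recorded parent" is an assumption of this sketch rather than Bonsai's definition.

```python
# Illustrative helper (hypothetical): find individuals sharing exactly one parent.
def get_half_siblings(up_node_dict, individual_id):
    """Return IDs that share exactly one recorded parent with individual_id."""
    own_parents = set(up_node_dict.get(individual_id, {}))
    half_siblings = []
    for other_id, other_parents in up_node_dict.items():
        if other_id == individual_id:
            continue
        # Intersect the two parent sets; one shared parent means half-sibling.
        shared = own_parents & set(other_parents)
        if len(shared) == 1:
            half_siblings.append(other_id)
    return half_siblings


pedigree = {
    1000: {1001: 1, 1002: 1},
    1003: {1001: 1, 1002: 1},
    1004: {-1: 1, -2: 1},
    1005: {-1: 1, 1002: 1},
}
print(get_half_siblings(pedigree, 1005))  # [1000, 1003, 1004]
print(get_half_siblings(pedigree, 1000))  # [1005]
```

If an individual has only one recorded parent, this rule cannot distinguish a half-sibling from a full sibling whose second parent is simply unknown.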

### Modifying Pedigrees

Let's demonstrate how to modify pedigrees by adding new individuals:

In [None]:
# Add a new individual to our example pedigree
modified_pedigree = add_individual(example_pedigree, 1008, parent_ids=[1000, 1005])

# Verify the addition
print(f"Parents of new individual 1008: {get_parents(modified_pedigree, 1008)}")
print(f"Children of 1000 (should include 1008): {get_children(modified_pedigree, 1000)}")

# Visualize the modified pedigree
G_modified = visualize_up_node_dict(modified_pedigree, "Modified Pedigree with New Individual")
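
`add_individual` copies only the outer dictionary, so the inner parent dictionaries remain shared between the original and the modified pedigree, and parents that are not yet in the pedigree are not registered. A more defensive variant might look like the sketch below. `add_individual_safe` is a hypothetical name, and the two-parent limit and automatic founder entries are assumptions of this sketch.

```python
import copy


# Illustrative variant (hypothetical): add an individual without sharing state.
def add_individual_safe(up_node_dict, individual_id, parent_ids=()):
    """Return a deep copy of the pedigree with individual_id added."""
    if individual_id in up_node_dict:
        raise ValueError(f"Individual {individual_id} already exists")
    if len(set(parent_ids)) > 2:
        raise ValueError("An individual can have at most two parents")

    # Deep copy so later edits to the new pedigree cannot leak into the original.
    new_pedigree = copy.deepcopy(up_node_dict)
    new_pedigree[individual_id] = {parent_id: 1 for parent_id in parent_ids}

    # Register any previously unseen parent as a founder.
    for parent_id in parent_ids:
        new_pedigree.setdefault(parent_id, {})
    return new_pedigree


base = {1000: {1001: 1, 1002: 1}, 1001: {}, 1002: {}}
extended = add_individual_safe(base, 1008, parent_ids=[1000, 1009])
print(extended[1008])  # {1000: 1, 1009: 1}
print(extended[1009])  # {}
print(1008 in base)    # False
```

Raising an exception instead of printing a warning makes a duplicate ID hard to miss in a longer pipeline; the deep copy costs more than a shallow one, which may matter for large pedigrees.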

## Core Library Components

Now that we understand the central data structure, let's explore some of the core modules in Bonsai v3 to understand their roles and responsibilities.

### IBD Data Management: ibd.py

The `ibd.py` module handles IBD data processing and conversion.

In [None]:
try:
    # Display functions in the ibd module
    display_module_functions('bonsaitree.bonsaitree.v3.ibd')
except Exception as e:
    print(f"Could not display ibd module functions: {e}")
    print("\nThe ibd.py module in Bonsai v3 handles IBD data processing, including:")
    print("- Converting between phased and unphased IBD formats")
    print("- Extracting IBD statistics (counts, lengths)")
    print("- Filtering and normalizing IBD segments")

### Relationship Likelihoods: likelihoods.py

The `likelihoods.py` module implements the statistical models for relationship inference.

In [None]:
try:
    # Display classes in the likelihoods module
    display_module_classes('bonsaitree.bonsaitree.v3.likelihoods')
except Exception as e:
    print(f"Could not display likelihoods module classes: {e}")
    print("\nThe likelihoods.py module in Bonsai v3 implements relationship inference, including:")
    print("- The PwLogLike class for pairwise relationship likelihood calculation")
    print("- Statistical models for IBD length and count distributions")
    print("- Age-based relationship models")
    print("- Combined likelihood calculation from multiple evidence sources")

### Pedigree Connections: connections.py

The `connections.py` module handles the logic for connecting individuals in pedigrees.

In [None]:
try:
    # Display functions in the connections module
    display_module_functions('bonsaitree.bonsaitree.v3.connections')
except Exception as e:
    print(f"Could not display connections module functions: {e}")
    print("\nThe connections.py module in Bonsai v3 handles pedigree connection logic, including:")
    print("- Finding optimal connection points between pedigrees")
    print("- Evaluating different connection configurations")
    print("- Creating the physical connections between pedigrees")
    print("- Handling the iterative merging process")

### Pedigree Operations: pedigrees.py

The `pedigrees.py` module provides operations for manipulating pedigree structures.

In [None]:
try:
    # Display functions in the pedigrees module
    display_module_functions('bonsaitree.bonsaitree.v3.pedigrees')
except Exception as e:
    print(f"Could not display pedigrees module functions: {e}")
    print("\nThe pedigrees.py module in Bonsai v3 provides pedigree operations, including:")
    print("- Creating and manipulating up-node dictionaries")
    print("- Traversing pedigree structures")
    print("- Computing relationship properties")
    print("- Validating pedigree consistency")
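
To make "validating pedigree consistency" concrete, here is a minimal sketch of the kind of checks such a routine could perform on an up-node dictionary. It is not the validation code in `pedigrees.py`; `validate_up_node_dict` is a hypothetical name, and the chosen checks (at most two parents, no self-parenting, no ancestry cycles) are assumptions.

```python
# Illustrative checks (hypothetical, not Bonsai's own validation routine).
def validate_up_node_dict(up_node_dict):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    for child, parents in up_node_dict.items():
        if len(parents) > 2:
            problems.append(f"{child} has more than two parents")
        if child in parents:
            problems.append(f"{child} is listed as its own parent")

    state = {}  # node -> "visiting" while on the current path, "done" afterwards

    def has_cycle(node):
        # Depth-first search along child -> parent links.
        if state.get(node) == "done":
            return False
        if state.get(node) == "visiting":
            return True  # reached a node that is still on the current path
        state[node] = "visiting"
        found = any(has_cycle(parent) for parent in up_node_dict.get(node, {}))
        state[node] = "done"
        return found

    if any(has_cycle(node) for node in up_node_dict):
        problems.append("pedigree contains a cycle (someone is their own ancestor)")
    return problems


valid = {1000: {1001: 1, 1002: 1}, 1001: {}, 1002: {}}
cyclic = {1: {2: 1}, 2: {1: 1}}
print(validate_up_node_dict(valid))   # []
print(validate_up_node_dict(cyclic))  # ['pedigree contains a cycle (someone is their own ancestor)']
```

The recursive search is fine for the small pedigrees in this lab; very deep pedigrees would need an iterative traversal to stay within Python's recursion limit.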

## Data Flow in Bonsai v3

Let's visualize the high-level data flow in the Bonsai v3 system:

In [None]:
# Create a visualization of the Bonsai v3 data flow
plt.figure(figsize=(14, 10))

# Create a directed graph for the data flow
G = nx.DiGraph()

# Add nodes for each component
components = [
    "IBD Detector Output",
    "ibd.py",
    "IBD Statistics",
    "likelihoods.py",
    "Pairwise Relationships",
    "connections.py",
    "Small Pedigrees",
    "Merged Pedigrees",
    "Final Pedigree",
    "bonsai.py"
]

# Add nodes to the graph
for component in components:
    G.add_node(component)

# Add edges representing data flow
edges = [
    ("IBD Detector Output", "ibd.py"),
    ("ibd.py", "IBD Statistics"),
    ("IBD Statistics", "likelihoods.py"),
    ("likelihoods.py", "Pairwise Relationships"),
    ("Pairwise Relationships", "connections.py"),
    ("connections.py", "Small Pedigrees"),
    ("Small Pedigrees", "Merged Pedigrees"),
    ("Merged Pedigrees", "Final Pedigree"),
    ("bonsai.py", "ibd.py"),
    ("bonsai.py", "likelihoods.py"),
    ("bonsai.py", "connections.py"),
]

G.add_edges_from(edges)

# Define node colors
node_colors = {
    "IBD Detector Output": "#ffcc99",  # Input (orange)
    "ibd.py": "#99ccff",               # Processing module (blue)
    "IBD Statistics": "#ccffcc",        # Intermediate data (green)
    "likelihoods.py": "#99ccff",        # Processing module (blue)
    "Pairwise Relationships": "#ccffcc", # Intermediate data (green)
    "connections.py": "#99ccff",        # Processing module (blue)
    "Small Pedigrees": "#ccffcc",       # Intermediate data (green)
    "Merged Pedigrees": "#ccffcc",      # Intermediate data (green)
    "Final Pedigree": "#ffcccc",        # Output (red)
    "bonsai.py": "#ff99ff"              # Orchestration (purple)
}

# Set layout
pos = {
    "IBD Detector Output": (0, 5),
    "ibd.py": (2, 5),
    "IBD Statistics": (4, 5),
    "likelihoods.py": (6, 5),
    "Pairwise Relationships": (8, 5),
    "connections.py": (10, 5),
    "Small Pedigrees": (12, 5),
    "Merged Pedigrees": (14, 5),
    "Final Pedigree": (16, 5),
    "bonsai.py": (8, 2)
}

# Draw the graph
colors = [node_colors[node] for node in G.nodes()]
nx.draw(G, pos, with_labels=True, node_color=colors, node_size=2000, 
        font_size=10, font_weight='bold', arrowsize=20, width=2)

# Add a title
plt.title("Bonsai v3 Data Flow", fontsize=16)

# Add a legend
legend_elements = [
    plt.Line2D([0], [0], marker='o', color='w', markerfacecolor="#ffcc99", markersize=15, label='Input Data'),
    plt.Line2D([0], [0], marker='o', color='w', markerfacecolor="#99ccff", markersize=15, label='Processing Module'),
    plt.Line2D([0], [0], marker='o', color='w', markerfacecolor="#ccffcc", markersize=15, label='Intermediate Data'),
    plt.Line2D([0], [0], marker='o', color='w', markerfacecolor="#ffcccc", markersize=15, label='Output Data'),
    plt.Line2D([0], [0], marker='o', color='w', markerfacecolor="#ff99ff", markersize=15, label='Orchestration')
]
plt.legend(handles=legend_elements, loc='lower center')

plt.tight_layout()
plt.show()

## Pedigree Tracking Structures

In addition to the up-node dictionary, Bonsai v3 uses several tracking structures to manage pedigrees during the construction process:

1. **idx_to_up_dict_ll_list**: Maps pedigree indices to lists of (pedigree, log-likelihood) pairs
2. **id_to_idx**: Maps individual IDs to their pedigree indices
3. **idx_to_id_set**: Maps pedigree indices to sets of contained individual IDs

Let's implement a simplified version of these structures and demonstrate their usage:

In [None]:
def initialize_tracking_structures(individuals):
    """Initialize pedigree tracking structures"""
    # Create a separate pedigree for each individual
    idx_to_up_dict_ll_list = {}
    id_to_idx = {}
    idx_to_id_set = {}
    
    for i, individual_id in enumerate(individuals):
        # Create a single-person pedigree
        up_dict = {individual_id: {}}
        log_likelihood = 0.0  # Initial log-likelihood
        
        # Add to tracking structures
        idx_to_up_dict_ll_list[i] = [(up_dict, log_likelihood)]
        id_to_idx[individual_id] = i
        idx_to_id_set[i] = {individual_id}
    
    return idx_to_up_dict_ll_list, id_to_idx, idx_to_id_set

def merge_pedigrees(idx1, idx2, idx_to_up_dict_ll_list, id_to_idx, idx_to_id_set, connection_likelihood=0.0):
    """Merge two pedigrees in the tracking structures"""
    # Get the best pedigrees for each index
    up_dict1, ll1 = idx_to_up_dict_ll_list[idx1][0]  # Take the first (best) pedigree
    up_dict2, ll2 = idx_to_up_dict_ll_list[idx2][0]  # Take the first (best) pedigree
    
    # Merge the dictionaries
    merged_up_dict = {**up_dict1, **up_dict2}
    
    # Calculate the merged likelihood
    merged_ll = ll1 + ll2 + connection_likelihood
    
    # Create a new index for the merged pedigree
    new_idx = max(idx_to_up_dict_ll_list.keys()) + 1
    
    # Update tracking structures
    idx_to_up_dict_ll_list[new_idx] = [(merged_up_dict, merged_ll)]
    
    # Update the id_to_idx mapping for all individuals in both pedigrees
    merged_id_set = idx_to_id_set[idx1].union(idx_to_id_set[idx2])
    for individual_id in merged_id_set:
        id_to_idx[individual_id] = new_idx
    
    # Update the idx_to_id_set mapping
    idx_to_id_set[new_idx] = merged_id_set
    
    # Remove the old pedigrees from tracking structures
    del idx_to_up_dict_ll_list[idx1]
    del idx_to_up_dict_ll_list[idx2]
    del idx_to_id_set[idx1]
    del idx_to_id_set[idx2]
    
    return new_idx

In [None]:
# Demonstrate the tracking structures with our example individuals
individuals = [1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007]

# Initialize tracking structures
idx_to_up_dict_ll_list, id_to_idx, idx_to_id_set = initialize_tracking_structures(individuals)

# Display initial state
print("Initial state of tracking structures:")
print("\nid_to_idx:")
pprint.pprint(id_to_idx)
print("\nidx_to_id_set:")
pprint.pprint(idx_to_id_set)
print("\nidx_to_up_dict_ll_list (simplified):")
for idx, pedigrees in idx_to_up_dict_ll_list.items():
    print(f"  Index {idx}: {len(pedigrees)} pedigree(s) with log-likelihood {pedigrees[0][1]:.2f}")

# Merge some pedigrees
print("\nMerging pedigrees for individuals 1000 and 1003...")
idx1 = id_to_idx[1000]
idx2 = id_to_idx[1003]
new_idx = merge_pedigrees(idx1, idx2, idx_to_up_dict_ll_list, id_to_idx, idx_to_id_set, connection_likelihood=5.0)

# Display merged state
print("\nState after merging:")
print("\nid_to_idx:")
pprint.pprint(id_to_idx)
print("\nidx_to_id_set:")
pprint.pprint(idx_to_id_set)
print("\nidx_to_up_dict_ll_list (simplified):")
for idx, pedigrees in idx_to_up_dict_ll_list.items():
    print(f"  Index {idx}: {len(pedigrees)} pedigree(s) with log-likelihood {pedigrees[0][1]:.2f}")
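
Because the three tracking structures must be updated together on every merge, it can help to verify that they still agree. The following is a sketch of such a check for the simplified structures used in this lab; `check_tracking_consistency` is a hypothetical helper, not a Bonsai function.

```python
# Illustrative check (hypothetical): do the three tracking structures agree?
def check_tracking_consistency(idx_to_up_dict_ll_list, id_to_idx, idx_to_id_set):
    """Return a list of inconsistencies; an empty list means all checks passed."""
    problems = []
    # Both index-keyed structures must cover the same pedigree indices.
    if set(idx_to_up_dict_ll_list) != set(idx_to_id_set):
        problems.append("index-keyed structures cover different pedigree indices")

    for idx, id_set in idx_to_id_set.items():
        # Every ID in a pedigree's set must map back to that pedigree.
        for individual_id in id_set:
            if id_to_idx.get(individual_id) != idx:
                problems.append(f"id_to_idx[{individual_id}] does not point to pedigree {idx}")
        # Every tracked ID must appear in each candidate pedigree, as a key or a parent.
        for up_dict, _log_likelihood in idx_to_up_dict_ll_list.get(idx, []):
            nodes = set(up_dict).union(*up_dict.values())
            missing = id_set - nodes
            if missing:
                problems.append(f"pedigree {idx} is missing tracked IDs {sorted(missing)}")

    # Every ID must point to a pedigree whose ID set actually contains it.
    for individual_id, idx in id_to_idx.items():
        if individual_id not in idx_to_id_set.get(idx, set()):
            problems.append(f"{individual_id} is not in idx_to_id_set[{idx}]")
    return problems


ll_lists = {0: [({1000: {}, 1003: {}}, 5.0)], 1: [({1001: {}}, 0.0)]}
ids = {1000: 0, 1003: 0, 1001: 1}
id_sets = {0: {1000, 1003}, 1: {1001}}
print(check_tracking_consistency(ll_lists, ids, id_sets))  # []
```

Calling a check like this after each call to `merge_pedigrees` would catch a stale index early, for example if a merge deleted an old pedigree but left an ID pointing at it.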

## Summary

In this lab, we've explored the architecture and core data structures of Bonsai v3.

Key takeaways include:

1. **System Architecture**: Bonsai v3 is organized into modular components, each with specific responsibilities
2. **Up-Node Dictionary**: The central data structure for representing pedigrees; parent lookups are direct dictionary accesses, while child, sibling, and descendant lookups scan or traverse the dictionary
3. **Core Modules**: Key modules include `bonsai.py`, `ibd.py`, `likelihoods.py`, `connections.py`, and `pedigrees.py`
4. **Data Flow**: The system processes IBD data through a series of transformations to produce pedigree structures
5. **Tracking Structures**: Additional data structures (`idx_to_up_dict_ll_list`, `id_to_idx`, `idx_to_id_set`) manage pedigrees during construction

This understanding of Bonsai v3's architecture provides the foundation for exploring the specific algorithms and techniques used in the system, which we'll cover in subsequent labs.

In [None]:
# Convert this notebook to PDF using poetry
!poetry run jupyter nbconvert --to pdf Lab01_IBD_and_Genealogy_Intro.ipynb

# Note: PDF conversion requires LaTeX to be installed on your system
# If you encounter errors, you may need to install it:
# On Ubuntu/Debian: sudo apt-get install texlive-xetex
# On macOS with Homebrew: brew install texlive