# Cosmograph Ingress Framework Tutorial

This notebook demonstrates how to use the modular ingress framework for `cosmograph.cosmo`.

## What are Ingress Functions?

Ingress functions are transformations applied to the kwargs dictionary before it is passed to cosmograph. Each one takes the kwargs dict and returns a kwargs dict, possibly modified. They allow you to:

- Validate inputs
- Resolve data sources (load from files/URLs)
- Infer missing parameters using smart defaults
- Transform or enrich data
- Log and debug your data pipeline
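
To make the idea concrete, here is a minimal sketch of an ingress written as a plain function, without the framework. The function name `default_link_columns` is hypothetical and only for illustration; it is not part of `xcosmo`.

```python
from typing import Any


def default_link_columns(kwargs: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical ingress: fill in link column names the caller omitted."""
    updated = dict(kwargs)  # shallow copy, so the caller's dict is left untouched
    updated.setdefault("link_source_by", "source")
    updated.setdefault("link_target_by", "target")
    return updated


before = {"links": [(0, 1), (1, 2)]}
after = default_link_columns(before)
print(sorted(after))  # ['link_source_by', 'link_target_by', 'links']
```

Because `setdefault` only fills missing keys, a value the caller passes explicitly is never overwritten.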

## Setup

In [None]:
import pandas as pd
import numpy as np
from xcosmo import (
    cosmo,
    IngressPipeline,
    compose_ingresses,
    as_ingress,
    INGRESS_REGISTRY,
    list_ingresses,
    # Pre-built ingresses
    check_points_and_links_format,
    guess_point_xy_columns,
    infer_color_by_from_clusters,
    create_smart_defaults_pipeline,
)

## Example 1: Basic Usage with Smart Defaults

Let's create some sample data and use the smart defaults pipeline.

In [None]:
# Create sample points data
np.random.seed(42)
n_points = 100

points = pd.DataFrame({
    'x': np.random.randn(n_points),
    'y': np.random.randn(n_points),
    'title': [f'Node {i}' for i in range(n_points)],
    'cluster_id': np.random.randint(0, 5, n_points),
    'size': np.random.uniform(1, 10, n_points),
})

# Create sample links
n_links = 200
links = pd.DataFrame({
    'source': np.random.randint(0, n_points, n_links),
    'target': np.random.randint(0, n_points, n_links),
})

print(f"Points shape: {points.shape}")
print(f"Points columns: {list(points.columns)}")
print(f"\nLinks shape: {links.shape}")
print(f"Links columns: {list(links.columns)}")

### Without ingress: every parameter is specified manually

In [None]:
# The old way - manually specify all parameters
# graph = cosmo(
#     points=points,
#     links=links,
#     point_x_by='x',
#     point_y_by='y',
#     point_label_by='title',
#     point_color_by='cluster_id',
#     point_size_by='size',
#     link_source_by='source',
#     link_target_by='target',
# )

### With the smart defaults ingress: parameters are inferred

In [None]:
# The new way - let ingress infer smart defaults
smart_pipeline = create_smart_defaults_pipeline()

# This will automatically:
# - Detect x, y columns for positioning
# - Use 'title' for labels
# - Use 'cluster_id' for coloring
# - Use 'size' for point sizes
# - Detect 'source' and 'target' in links

graph = cosmo(
    points=points,
    links=links,
    ingress=smart_pipeline,
)

# Display the graph
graph
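
How the pre-built pipeline infers columns is not shown in this notebook. As a rough illustration of the kind of heuristic involved, the sketch below guesses x/y columns from a list of candidate name pairs. `XY_CANDIDATES`, `guess_xy`, and `fill_xy` are hypothetical names, and the matching rule (case-insensitive, first pair wins) is an assumption, not the behavior of `guess_point_xy_columns`.

```python
from types import SimpleNamespace

# Hypothetical candidate name pairs, checked in order
XY_CANDIDATES = (("x", "y"), ("lon", "lat"))


def guess_xy(columns):
    """Return the first candidate (x, y) pair found in columns, else None."""
    by_lower = {str(col).lower(): col for col in columns}
    for x_name, y_name in XY_CANDIDATES:
        if x_name in by_lower and y_name in by_lower:
            return by_lower[x_name], by_lower[y_name]
    return None


def fill_xy(kwargs):
    """Set point_x_by / point_y_by only when the caller gave neither."""
    points = kwargs.get("points")
    if points is None or "point_x_by" in kwargs or "point_y_by" in kwargs:
        return kwargs
    pair = guess_xy(points.columns)
    if pair is None:
        return kwargs
    return {**kwargs, "point_x_by": pair[0], "point_y_by": pair[1]}


# Stand-in for a DataFrame: anything with a .columns attribute works here
points_stub = SimpleNamespace(columns=["X", "y", "title"])
result = fill_xy({"points": points_stub})
print(result["point_x_by"], result["point_y_by"])  # X y
```

Leaving explicit arguments alone keeps inference as a fallback rather than an override.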

## Example 2: Composing Custom Ingress Pipelines

You can compose your own pipelines from individual ingress functions.

In [None]:
# List available ingresses by category
print("Available ingresses:")
for category in ['validation', 'resolution', 'parameter_resolution']:
    ingresses = list_ingresses(category=category)
    print(f"\n{category}:")
    for name in ingresses:
        print(f"  - {name}")

In [None]:
# Compose a custom pipeline
my_pipeline = compose_ingresses(
    check_points_and_links_format,  # Validate data types
    guess_point_xy_columns,         # Infer x/y columns
    infer_color_by_from_clusters,   # Infer coloring
    name="my_custom_pipeline"
)

print(f"Created pipeline: {my_pipeline}")
print(f"Ingress names: {my_pipeline.ingress_names}")
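
Conceptually, composing ingresses amounts to applying functions left to right, each receiving the previous one's output. The following is a minimal sketch of that idea; `compose_sketch` is a hypothetical stand-in and does not reproduce `compose_ingresses` or `IngressPipeline`.

```python
from functools import reduce


def compose_sketch(*ingresses):
    """Return a function that applies each ingress in order to kwargs."""
    def pipeline(kwargs):
        return reduce(lambda acc, ingress: ingress(acc), ingresses, kwargs)
    return pipeline


def add_a(kw):
    return {**kw, "a": 1}


def double_a(kw):
    return {**kw, "a": kw["a"] * 2}


run = compose_sketch(add_a, double_a)
print(run({}))  # {'a': 2}
```

Order matters: `compose_sketch(double_a, add_a)` would raise a `KeyError` on an empty dict, because `double_a` reads a key that `add_a` has not yet set. The same reasoning is why validation steps usually go first in a pipeline.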

## Example 3: Creating Your Own Ingress Function

Use the `@as_ingress` decorator to create custom ingress functions.

In [None]:
@as_ingress
def add_node_degree(kwargs):
    """Add a 'degree' column to points based on link connectivity."""
    points = kwargs.get('points')
    links = kwargs.get('links')
    
    if points is None or links is None:
        return kwargs
    
    # Fall back to a positional 'id' column when points has none
    if 'id' not in points.columns:
        points['id'] = range(len(points))
    
    # Count connections
    source_col = kwargs.get('link_source_by', 'source')
    target_col = kwargs.get('link_target_by', 'target')
    
    if source_col in links.columns and target_col in links.columns:
        degree = pd.concat([
            links[source_col].value_counts(),
            links[target_col].value_counts()
        ]).groupby(level=0).sum()
        
        # Add to points
        points['degree'] = points['id'].map(degree).fillna(0)
        print(f"Added 'degree' column to points (range: {points['degree'].min()}-{points['degree'].max()})")
    
    return kwargs

# Test it
enriched_pipeline = compose_ingresses(
    create_smart_defaults_pipeline(),
    add_node_degree,
    name="enriched_pipeline"
)

graph = cosmo(
    points=points.copy(),
    links=links,
    ingress=enriched_pipeline,
    point_size_by='degree',  # Use the new column for sizing
)

graph
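
The degree computed above counts every appearance of a node as either a source or a target. A pandas-free sketch of the same arithmetic on a tiny edge list can help when checking results by hand; `node_degrees` is a hypothetical helper name.

```python
from collections import Counter


def node_degrees(edges):
    """Count how often each node appears as a source or as a target."""
    counts = Counter()
    for source, target in edges:
        counts[source] += 1
        counts[target] += 1
    return counts


edges = [(0, 1), (1, 2), (2, 2)]  # the last edge is a self-loop
degrees = node_degrees(edges)
print(degrees[0], degrees[1], degrees[2], degrees[3])  # 1 2 3 0
```

Two details carry over to the pandas version: a self-loop contributes 2 to its node's degree, and a node that appears in no link gets 0 (there via `fillna(0)`, here via `Counter`'s default).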

## Example 4: Debugging with Logging Ingress

In [None]:
from xcosmo import debug_ingress

# Create a pipeline with debugging
debug_pipeline = IngressPipeline(
    [
        debug_ingress,
        guess_point_xy_columns,
        debug_ingress,
        infer_color_by_from_clusters,
        debug_ingress,
    ],
    name="debug_pipeline",
    log_transforms=True,  # Enable detailed logging
)

# Run with logging
result_kwargs = debug_pipeline({
    'points': points.copy(),
    'links': links,
})

print("\nFinal kwargs keys:", list(result_kwargs.keys()))
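
If you want to see what a single step contributes, one option is to wrap it so each call reports the keys it added or replaced. The sketch below assumes nothing about how `debug_ingress` or `log_transforms` work internally; `traced` and its `report` parameter are hypothetical.

```python
def traced(ingress, report=print):
    """Wrap an ingress so each call reports the keys it added or replaced."""
    name = getattr(ingress, "__name__", "ingress")

    def wrapper(kwargs):
        result = ingress(kwargs)
        added = sorted(set(result) - set(kwargs))
        # Identity check: detects replaced values, not in-place mutation
        changed = sorted(
            key for key in set(result) & set(kwargs)
            if result[key] is not kwargs[key]
        )
        report(f"{name}: added={added} changed={changed}")
        return result

    return wrapper


def set_label(kw):
    return {**kw, "point_label_by": "title"}


traced(set_label)({"points": "stub"})
# prints: set_label: added=['point_label_by'] changed=[]
```

The identity comparison avoids calling `!=` on DataFrames, which returns an element-wise result rather than a single boolean. The trade-off is that an ingress mutating `points` in place will not show up as a change.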

## Example 5: Using the Registry

Register a custom ingress so it can be retrieved by name and reused in other pipelines.

In [None]:
@as_ingress(register=True, category="custom", name="highlight_outliers")
def highlight_outliers(kwargs):
    """Add a column marking outliers based on degree."""
    points = kwargs.get('points')
    if points is None or 'degree' not in points.columns:
        return kwargs
    
    # Mark points whose degree exceeds the 90th percentile as outliers
    threshold = points['degree'].quantile(0.9)
    points['is_outlier'] = points['degree'] > threshold
    
    print(f"Marked {points['is_outlier'].sum()} outliers (degree > {threshold:.1f})")
    return kwargs

# Now you can retrieve it by name
from xcosmo import get_ingress

outlier_ingress = get_ingress('highlight_outliers')
print(f"Retrieved ingress: {outlier_ingress}")

# Use in a pipeline
full_pipeline = compose_ingresses(
    create_smart_defaults_pipeline(),
    add_node_degree,
    outlier_ingress,
    name="full_analysis_pipeline"
)

graph = cosmo(
    points=points.copy(),
    links=links,
    ingress=full_pipeline,
)

graph
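
The decorator is used in two forms in this notebook: bare (`@as_ingress`) and with arguments (`@as_ingress(register=True, ...)`). A common way to support both, together with a name-based registry, is sketched below. `REGISTRY`, `as_ingress_sketch`, and `get_ingress_sketch` are hypothetical names; the real `as_ingress` and `INGRESS_REGISTRY` may be implemented differently.

```python
REGISTRY = {}  # hypothetical: maps name -> (category, function)


def as_ingress_sketch(func=None, *, register=False, category="custom", name=None):
    """Decorator usable both bare and with keyword arguments."""
    def decorate(f):
        if register:
            REGISTRY[name or f.__name__] = (category, f)
        return f

    # Bare use passes the function directly; parameterized use passes func=None
    return decorate(func) if func is not None else decorate


@as_ingress_sketch
def noop(kw):
    return kw


@as_ingress_sketch(register=True, category="custom", name="add_flag")
def add_flag(kw):
    return {**kw, "flag": True}


def get_ingress_sketch(name):
    """Look up a registered function by name."""
    return REGISTRY[name][1]


print(sorted(REGISTRY))                    # ['add_flag']
print(get_ingress_sketch("add_flag")({}))  # {'flag': True}
```

In this sketch the registry is an in-process dictionary, so a registered ingress is available only after the module that defines it has been imported.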

## Example 6: Conditional Ingress

Apply ingress only when certain conditions are met.

In [None]:
from xcosmo import conditional_ingress

# Only shrink point sizes when the dataset is large (more than 50 points)
def has_large_points(kwargs):
    points = kwargs.get('points')
    return points is not None and len(points) > 50

conditional_normalize = conditional_ingress(
    condition=has_large_points,
    ingress=lambda kw: {**kw, 'point_size_scale': 0.5}  # Reduce size for large datasets
)

# Use it
adaptive_pipeline = compose_ingresses(
    create_smart_defaults_pipeline(),
    conditional_normalize,
    name="adaptive_pipeline"
)

graph = cosmo(
    points=points,
    links=links,
    ingress=adaptive_pipeline,
)

graph
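
The pattern behind a conditional ingress is small: evaluate a predicate on the kwargs and either apply the wrapped ingress or pass the kwargs through unchanged. A minimal sketch, with `conditional_sketch` and `has_many_points` as hypothetical names rather than the `xcosmo` implementation:

```python
def conditional_sketch(condition, ingress):
    """Apply ingress only when condition(kwargs) is true; otherwise pass through."""
    def wrapper(kwargs):
        return ingress(kwargs) if condition(kwargs) else kwargs
    return wrapper


def has_many_points(kw):
    return len(kw.get("points", ())) > 50


shrink = conditional_sketch(
    has_many_points,
    lambda kw: {**kw, "point_size_scale": 0.5},
)

print("point_size_scale" in shrink({"points": range(100)}))  # True
print("point_size_scale" in shrink({"points": range(10)}))   # False
```

Keeping the predicate as a separate named function makes it easy to test on its own and to reuse across several conditional steps.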

## Summary

The ingress framework provides:

1. **Composability**: Chain multiple transformations using `compose_ingresses()` or `IngressPipeline`
2. **Reusability**: Register custom ingresses with `@as_ingress(register=True)`
3. **Smart Defaults**: Use pre-built pipelines like `create_smart_defaults_pipeline()`
4. **Debugging**: Enable logging with `log_transforms=True` or use `debug_ingress`
5. **Flexibility**: Create custom ingresses for your specific needs
6. **Validation**: Pre-built ingresses such as `check_points_and_links_format` check the format of your inputs before they reach cosmograph

### Best Practices

- Use `create_smart_defaults_pipeline()` as a starting point
- Add custom ingresses for domain-specific transformations
- Enable logging during development, disable in production
- Register frequently-used ingresses for reuse
- Validate your data early in the pipeline