# NetSmith: Null Models and Statistical Testing

This notebook demonstrates null model generation and statistical testing in NetSmith.

## Overview

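Null models are ensembles of randomized graphs that keep selected properties of an observed graph fixed. Comparing a network statistic against its distribution over a null model shows whether the observed value is unusual or could plausibly have arisen by chance. NetSmith provides: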
- **Configuration model** - Preserves degree sequence
- **Erdos-Renyi model** - Random graph with the same number of nodes (n) and edges (m)
- **Degree-preserving randomization** - Edge swapping while preserving degrees
- **Permutation tests** - Statistical significance testing
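The edge-swapping idea behind degree-preserving randomization is usually implemented as a *double edge swap*: pick two edges (a, b) and (c, d), rewire them to (a, d) and (c, b), and reject any swap that would create a self-loop or a duplicate edge. Below is a minimal pure-Python sketch on a plain edge list, assuming a simple undirected graph. `double_edge_swap` and `degrees` are hypothetical helpers written for illustration, not NetSmith functions, and NetSmith's own implementation may differ.

```python
import random


def double_edge_swap(edges, n_swaps, seed=None, max_tries=None):
    """Degree-preserving randomization of a simple undirected graph (sketch).

    Repeatedly picks two edges (a, b) and (c, d) and rewires them to
    (a, d) and (c, b), rejecting swaps that would create a self-loop or a
    duplicate edge. Every node keeps its degree.
    """
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    if max_tries is None:
        max_tries = 100 * n_swaps  # guard against graphs with no valid swap
    swaps = tries = 0
    while swaps < n_swaps and tries < max_tries:
        tries += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # randomize the orientation of the second edge
        # Four distinct endpoints rule out self-loops and no-op swaps
        if len({a, b, c, d}) < 4:
            continue
        # Reject swaps that would duplicate an existing edge
        if frozenset((a, d)) in edge_set or frozenset((c, b)) in edge_set:
            continue
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {frozenset((a, d)), frozenset((c, b))}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        swaps += 1
    return edge_list


def degrees(edge_list, n):
    """Degree of each node 0..n-1 from an undirected edge list."""
    deg = [0] * n
    for u, v in edge_list:
        deg[u] += 1
        deg[v] += 1
    return deg


edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)]
rewired = double_edge_swap(edges, n_swaps=10, seed=42)
print(degrees(edges, 5))    # [2, 2, 3, 2, 1]
print(degrees(rewired, 5))  # same sequence, by construction
```

Each accepted swap only exchanges endpoints between two existing edges, so the degree sequence is preserved exactly while triangles and other higher-order structure are randomized.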


In [None]:
import numpy as np
from netsmith.core import Graph
from netsmith.core.nulls import null_models, permutation_tests
from netsmith.core.metrics import clustering, degree


## Create a Sample Graph

Let's create a small undirected, unweighted test graph: a triangle on nodes 0, 1, 2 with a path 2-3-4 attached to node 2:


In [None]:
edges = [
    (0, 1), (1, 2), (2, 0),  # Triangle
    (2, 3), (3, 4),           # Path
]

graph = Graph(
    edges=edges,
    n_nodes=5,
    directed=False,
    weighted=False
)

print(f"Original graph: {graph.n_nodes} nodes, {graph.n_edges} edges")
print(f"Edges: {edges}")
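
As a reference point for the tests below, the statistic they use (mean clustering coefficient) can be worked out by hand for this graph. The sketch below does it in pure Python with a hypothetical `local_clustering` helper, assuming the common convention (also used by networkx) that nodes with degree below 2 get coefficient 0. If NetSmith's `clustering` treats those nodes differently, its mean will differ.

```python
from itertools import combinations


def local_clustering(edges, n_nodes):
    """Local clustering coefficient of each node in a simple undirected graph.

    C_i = (links among neighbors of i) / (k_i * (k_i - 1) / 2);
    nodes with degree < 2 get 0.0.
    """
    neighbors = {i: set() for i in range(n_nodes)}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    coeffs = []
    for i in range(n_nodes):
        k = len(neighbors[i])
        if k < 2:
            coeffs.append(0.0)
            continue
        # Count neighbor pairs that are themselves connected
        links = sum(1 for u, v in combinations(neighbors[i], 2) if v in neighbors[u])
        coeffs.append(links / (k * (k - 1) / 2))
    return coeffs


edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)]
coeffs = local_clustering(edges, 5)
print([round(c, 3) for c in coeffs])        # [1.0, 1.0, 0.333, 0.0, 0.0]
print(round(sum(coeffs) / len(coeffs), 3))  # 0.467
```

Nodes 0 and 1 sit in a closed triangle (coefficient 1), node 2 has one connected pair among its three neighbor pairs (1/3), and nodes 3 and 4 have none, so the mean is (1 + 1 + 1/3) / 5 = 7/15 ≈ 0.467.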


## Configuration Model

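The configuration model keeps every node's degree fixed and randomizes which nodes are connected to each other, so it tests whether a property can be explained by the degree sequence alone.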
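The classic construction is *stub matching*: each node contributes one half-edge ("stub") per unit of degree, and the stubs are paired up uniformly at random. Below is a minimal pure-Python sketch; `configuration_model_edges` is a hypothetical helper, not NetSmith's API. Note that raw stub matching produces a multigraph, so self-loops and repeated edges can appear. Whether NetSmith keeps, rejects, or removes them is not stated here; if they are removed, the degree sequence is only approximately preserved.

```python
import random
from collections import Counter


def configuration_model_edges(degree_sequence, seed=None):
    """Stub-matching sketch of the configuration model.

    Node i contributes degree_sequence[i] stubs; a uniformly random
    perfect matching of the stubs gives the edges. The result may
    contain self-loops and repeated edges.
    """
    if sum(degree_sequence) % 2:
        raise ValueError("degree sum must be even")
    stubs = [node for node, k in enumerate(degree_sequence) for _ in range(k)]
    rng = random.Random(seed)
    rng.shuffle(stubs)
    # Pair consecutive stubs of the shuffled list: (s0, s1), (s2, s3), ...
    return list(zip(stubs[::2], stubs[1::2]))


degree_sequence = [2, 2, 3, 2, 1]  # degrees of the sample graph above
multi_edges = configuration_model_edges(degree_sequence, seed=42)

# Every node keeps its degree (a self-loop counts twice).
deg = Counter()
for u, v in multi_edges:
    deg[u] += 1
    deg[v] += 1
print([deg[i] for i in range(len(degree_sequence))])  # [2, 2, 3, 2, 1]
```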


In [None]:
try:
    result = null_models(
        graph,
        method="configuration",
        n_samples=10,
        seed=42
    )
    null_graphs = result["graphs"]
    print(f"Generated {len(null_graphs)} null graphs using configuration model")
    
    # Compare original to null models
    original_deg = degree(graph)
    print(f"\nOriginal degree sequence: {original_deg}")
    
    print(f"\nFirst 3 null graphs:")
    for i, null_graph in enumerate(null_graphs[:3]):
        null_deg = degree(null_graph)
        print(f"  Null graph {i+1}: degrees {null_deg}, edges {null_graph.n_edges}")
        assert np.array_equal(sorted(original_deg), sorted(null_deg)), "Degree sequences should match!"
    
except ImportError as e:
    print(f"Import error: {e}")
    print("Make sure networkx is installed: pip install networkx")
except Exception as e:
    print(f"Error: {e}")


## Erdos-Renyi Model

The Erdos-Renyi model (the G(n, m) variant) draws graphs with the same number of nodes and edges uniformly at random. Because n and m are fixed, the mean degree 2m/n is fixed too, but the individual degrees are not preserved.
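A minimal sketch of G(n, m) sampling for a simple undirected graph: choose m distinct pairs uniformly from all n(n-1)/2 possible node pairs. `gnm_random_edges` is a hypothetical helper, not NetSmith's API.

```python
import random
from itertools import combinations


def gnm_random_edges(n_nodes, n_edges, seed=None):
    """Sample a simple undirected G(n, m) graph: m distinct edges chosen
    uniformly at random from the n*(n-1)/2 possible node pairs."""
    possible = list(combinations(range(n_nodes), 2))
    if n_edges > len(possible):
        raise ValueError("too many edges for a simple graph")
    rng = random.Random(seed)
    return rng.sample(possible, n_edges)


edges = gnm_random_edges(5, 5, seed=42)
deg = [0] * 5
for u, v in edges:
    deg[u] += 1
    deg[v] += 1
print(len(edges), sum(deg) / 5)  # 5 2.0: the mean degree is always 2m/n
print(deg)                       # individual degrees vary between samples
```

Enumerating every pair is fine for small graphs but needs memory quadratic in n; for large sparse graphs, rejection sampling of random pairs is the usual alternative.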


In [None]:
try:
    result = null_models(
        graph,
        method="erdos_renyi",
        n_samples=10,
        seed=42
    )
    null_graphs = result["graphs"]
    print(f"Generated {len(null_graphs)} null graphs using Erdos-Renyi model")
    
    print(f"\nFirst 3 null graphs:")
    for i, null_graph in enumerate(null_graphs[:3]):
        print(f"  Null graph {i+1}: {null_graph.n_nodes} nodes, {null_graph.n_edges} edges")
        print(f"    Degree sequence: {degree(null_graph)}")
        
except Exception as e:
    print(f"Error: {e}")


## Permutation Tests

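A permutation test compares an observed network statistic with its distribution over many randomized versions of the graph. The p-value estimates how often a randomized graph gives a value at least as extreme as the observed one; a small p-value means the observed value is unusual under the null model.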
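The p-value, null mean, and null standard deviation reported by a permutation test can all be computed from a list of null-model statistics. A minimal sketch, assuming a one-sided test and the common (1 + b) / (1 + N) estimator, where b is the number of null values at least as extreme as the observed one and N is the number of null samples. NetSmith's `permutation_tests` may use a different convention (two-sided, or without the +1). The helper names are hypothetical, and the null values below are toy numbers for illustration, not NetSmith output.

```python
import statistics


def monte_carlo_p_value(observed, null_values, alternative="greater"):
    """Empirical p-value of an observed statistic against null samples.

    Uses (1 + b) / (1 + N), which counts the observed value as one of the
    samples, so the p-value is never exactly 0.
    """
    null_values = list(null_values)
    if alternative == "greater":
        extreme = sum(1 for x in null_values if x >= observed)
    elif alternative == "less":
        extreme = sum(1 for x in null_values if x <= observed)
    else:
        raise ValueError("alternative must be 'greater' or 'less'")
    return (1 + extreme) / (1 + len(null_values))


def z_score(observed, null_values):
    """Distance of the observed value from the null mean, in null std units."""
    mu = statistics.mean(null_values)
    sigma = statistics.pstdev(null_values)  # population std, like np.std's default
    return (observed - mu) / sigma if sigma > 0 else float("nan")


# Toy null values, made up purely for illustration.
null_values = [0.0, 0.2, 0.0, 0.467, 0.1, 0.0, 0.3, 0.0, 0.2, 0.0]
observed = 0.467
print(monte_carlo_p_value(observed, null_values))  # (1 + 1) / 11 ≈ 0.182
print(round(z_score(observed, null_values), 2))
```

With this estimator the smallest attainable p-value is 1 / (1 + N), about 0.0099 for 100 permutations, so `n_permutations` limits how small a p-value the test can report.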


In [None]:
# Define a statistic to test: mean clustering coefficient
def mean_clustering(g):
    """Compute mean clustering coefficient."""
    clust = clustering(g)
    return float(np.mean(clust))

# Run permutation test
try:
    result = permutation_tests(
        graph,
        statistic=mean_clustering,
        n_permutations=100,
        seed=42
    )
    
    print(f"Permutation Test Results:")
    print(f"  Observed statistic: {result['statistic']:.3f}")
    print(f"  Null mean: {result['null_mean']:.3f}")
    print(f"  Null std: {result['null_std']:.3f}")
    print(f"  p-value: {result['p_value']:.3f}")
    print(f"  n_permutations: {result['n_permutations']}")
    
    print(f"\nInterpretation:")
    if result['p_value'] < 0.05:
        print("  Significant (p < 0.05): the observed clustering is unlikely under the null model")
    else:
        print("  Not significant (p >= 0.05): the observed clustering is consistent with the null model")
        
except Exception as e:
    print(f"Error: {e}")


## Visualizing Null Distributions

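Let's plot the mean clustering coefficient of 50 configuration-model samples as a histogram and mark the observed value, to see where the real graph falls within the null distribution: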


In [None]:
try:
    import matplotlib.pyplot as plt
    
    # Generate null models and compute statistics
    result = null_models(graph, method="configuration", n_samples=50, seed=42)
    null_graphs = result["graphs"]
    
    original_clust = mean_clustering(graph)
    null_clustering = [mean_clustering(g) for g in null_graphs]
    
    # Plot histogram
    plt.figure(figsize=(10, 6))
    plt.hist(null_clustering, bins=15, alpha=0.7, label='Null distribution')
    plt.axvline(original_clust, color='red', linestyle='--', linewidth=2, label=f'Observed ({original_clust:.3f})')
    plt.xlabel('Mean Clustering Coefficient')
    plt.ylabel('Frequency')
    plt.title('Distribution of Mean Clustering in Null Models')
    plt.legend()
    plt.grid(alpha=0.3)
    plt.show()
    
    print(f"Observed: {original_clust:.3f}")
    print(f"Null mean: {np.mean(null_clustering):.3f}")
    print(f"Null std: {np.std(null_clustering):.3f}")
    
except ImportError:
    print("Matplotlib not available for plotting")
except Exception as e:
    print(f"Error: {e}")
