# Architecture-Based Tenant Replication with Visualization

This notebook demonstrates the complete architecture-based replication workflow:

1. **Pattern Detection**: Analyze source tenant and detect architectural patterns
2. **Instance Selection**: Select connected architectural instances
3. **Target Graph Building**: Build target pattern graph from selected instances
4. **Visualization & Comparison**: Compare source vs target graphs

## Key Concepts

- **Pattern Graph**: Type-level aggregation of the instance resource graph
- **Architectural Instances**: Groups of resources sharing a **ResourceGroup** (common parent)
- **Goal**: Build target pattern graph that MATCHES source pattern graph structure
- **Spectral Distance**: Mathematical measure of structural similarity (lower = better)
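
The notebook later calls `replicator._compute_spectral_distance`, whose implementation is not shown here. As a rough illustration of what a spectral distance can look like, the sketch below compares the sorted Laplacian eigenvalues of two undirected graphs. The function names, the zero-padding of the shorter spectrum, and the Euclidean norm are all assumptions made for this example and may differ from what the replicator actually does.

```python
import numpy as np

def laplacian_spectrum(nodes, edges):
    """Eigenvalues of the unweighted, undirected graph Laplacian L = D - A."""
    index = {n: i for i, n in enumerate(nodes)}
    adjacency = np.zeros((len(nodes), len(nodes)))
    for u, v in edges:
        if u != v:  # ignore self-loops in this sketch
            adjacency[index[u], index[v]] = 1.0
            adjacency[index[v], index[u]] = 1.0
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    return np.linalg.eigvalsh(laplacian)

def laplacian_spectral_distance(graph_a, graph_b):
    """Euclidean distance between zero-padded, sorted spectra (hypothetical metric)."""
    spec_a = laplacian_spectrum(*graph_a)
    spec_b = laplacian_spectrum(*graph_b)
    size = max(len(spec_a), len(spec_b))
    # Pad the shorter spectrum with zeros, then sort so eigenvalues line up by rank
    spec_a = np.sort(np.concatenate([spec_a, np.zeros(size - len(spec_a))]))
    spec_b = np.sort(np.concatenate([spec_b, np.zeros(size - len(spec_b))]))
    return float(np.linalg.norm(spec_a - spec_b))

# Each graph is a (nodes, edges) pair; the type names are illustrative only
source = (["vnet", "subnet", "vm"], [("vnet", "subnet"), ("subnet", "vm")])
same_shape = (["a", "b", "c"], [("a", "b"), ("b", "c")])
smaller = (["vnet", "subnet"], [("vnet", "subnet")])

print(laplacian_spectral_distance(source, same_shape))
print(laplacian_spectral_distance(source, smaller))
```

The first comparison ignores node labels entirely, which is the point of a spectral measure: only the shape of the graph matters. Note that two different graphs can share a spectrum, so a distance of zero indicates spectral agreement rather than a guaranteed isomorphism.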

## Setup

In [None]:
import sys
import os
from pathlib import Path
import matplotlib.pyplot as plt
import networkx as nx
from collections import Counter
import numpy as np

sys.path.insert(0, str(Path.cwd().parent))

from src.architecture_based_replicator import ArchitectureBasedReplicator

# Set up matplotlib
plt.style.use('seaborn-v0_8-darkgrid')
%matplotlib inline

print("✅ Setup complete")

## Configuration

In [None]:
NEO4J_URI = os.getenv("NEO4J_URI", "bolt://localhost:7687")
NEO4J_USER = os.getenv("NEO4J_USER", "neo4j")
NEO4J_PASSWORD = os.getenv("NEO4J_PASSWORD", "neo4j123")

# Number of instances to select for demonstration (use None for all instances)
TARGET_INSTANCE_COUNT = 10

print(f"Neo4j URI: {NEO4J_URI}")
print(f"Target instance count: {TARGET_INSTANCE_COUNT}")

---
# Part 1: Pattern Detection & Instance Selection
---

## Step 1: Analyze Source Tenant and Detect Architectural Patterns

**How this works**:
1. Queries the **instance resource graph** (all individual resources in Neo4j)
2. Aggregates instance relationships by resource type
3. Creates **pattern graph** (type-level view)
4. Detects which architectural patterns exist in the pattern graph
5. Finds **instances** by grouping resources that share a ResourceGroup
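
To make the aggregation in steps 2 and 3 concrete, here is a minimal sketch that collapses instance-level edges into type-level edges with a frequency count. The resource IDs, type names, relationship names, and the `aggregate_to_pattern_edges` helper are hypothetical. The real replicator reads this data from Neo4j and stores `relationship` and `frequency` as edge attributes on a graph object.

```python
from collections import Counter

# Hypothetical instance-level data: resource id -> resource type
resource_types = {
    "vnet-1": "virtualNetworks",
    "subnet-1": "subnets",
    "subnet-2": "subnets",
    "vm-1": "virtualMachines",
    "nic-1": "networkInterfaces",
}

# Hypothetical instance-level edges: (source id, relationship, target id)
instance_edges = [
    ("vnet-1", "CONTAINS", "subnet-1"),
    ("vnet-1", "CONTAINS", "subnet-2"),
    ("vm-1", "USES", "nic-1"),
    ("nic-1", "ATTACHED_TO", "subnet-1"),
]

def aggregate_to_pattern_edges(resource_types, instance_edges):
    """Collapse instance edges into (type, relationship, type) edges with counts."""
    pattern_edges = Counter()
    for src, rel, dst in instance_edges:
        pattern_edges[(resource_types[src], rel, resource_types[dst])] += 1
    return pattern_edges

pattern_edges = aggregate_to_pattern_edges(resource_types, instance_edges)
for (src_type, rel, dst_type), freq in pattern_edges.most_common():
    print(f"{src_type} -{rel}-> {dst_type} ({freq} times)")
```

The two `vnet-1` to subnet edges collapse into a single `virtualNetworks -CONTAINS-> subnets` pattern edge with frequency 2, which is the type-level view the rest of the notebook works with.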

In [None]:
replicator = ArchitectureBasedReplicator(
    neo4j_uri=NEO4J_URI,
    neo4j_user=NEO4J_USER,
    neo4j_password=NEO4J_PASSWORD
)

print("🔍 Analyzing source tenant and detecting architectural patterns...")
analysis = replicator.analyze_source_tenant()

print(f"\n📊 Source Tenant:")
print(f"   Resource Types: {analysis['resource_types']}")
print(f"   Pattern Graph Edges: {analysis['pattern_graph_edges']}")
print(f"   Detected Patterns: {analysis['detected_patterns']}")
print(f"   Total Pattern Resources: {analysis.get('total_pattern_resources', 0)}")

print(f"\n📐 Detected Architectural Patterns (Connected Instances):")
for pattern_name, pattern_info in replicator.detected_patterns.items():
    instances = replicator.pattern_resources.get(pattern_name, [])
    total_resources = sum(len(instance) for instance in instances)
    print(f"  {pattern_name}:")
    print(f"    Connected Instances: {len(instances)}")
    print(f"    Total Resources: {total_resources}")
    print(f"    Completeness: {pattern_info['completeness']:.1%}")
    if instances:
        avg_size = total_resources / len(instances)
        print(f"    Avg Instance Size: {avg_size:.1f} resources")

## Step 2: Generate Replication Plan

In [None]:
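print(f"🔨 Generating replication plan for {TARGET_INSTANCE_COUNT} architectural instances...\n")

selected_pattern_instances, spectral_history = replicator.generate_replication_plan(
    target_instance_count=TARGET_INSTANCE_COUNT,
    hops=2,
)

# Report the number actually selected, which may differ from the requested count
selected_instance_count = sum(len(instances) for _, instances in selected_pattern_instances)
print(f"\n✅ Selected {selected_instance_count} architectural instances")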

# Count total resources across all selected instances
total_resources = 0
for pattern_name, instances in selected_pattern_instances:
    for instance in instances:
        total_resources += len(instance)

print(f"   Total resources: {total_resources}")

print(f"\n📦 Selected Patterns and Instances:")
for pattern_name, instances in selected_pattern_instances:
    pattern_total = sum(len(instance) for instance in instances)
    print(f"  {pattern_name}: {len(instances)} instances ({pattern_total} resources)")
    # Show first few instances
    for i, instance in enumerate(instances[:3], 1):
        resource_types = Counter(r['type'] for r in instance)
        type_summary = ', '.join([f"{count} {rtype}" for rtype, count in resource_types.most_common(3)])
        print(f"    Instance {i}: {len(instance)} resources ({type_summary})")

## Step 3: Build Target Pattern Graph

In [None]:
print("🔍 Building target pattern graph from selected instances...")

# Flatten instances for graph building
flattened_instances = []
for pattern_name, instances in selected_pattern_instances:
    for instance in instances:
        flattened_instances.append((pattern_name, instance))

target_pattern_graph = replicator._build_target_pattern_graph_from_instances(
    flattened_instances
)

print(f"\n📊 Target Pattern Graph:")
print(f"   Resource Types: {target_pattern_graph.number_of_nodes()}")
print(f"   Pattern Edges: {target_pattern_graph.number_of_edges()}")
print(f"   Total Resources: {total_resources}")

print(f"\n📊 Source Pattern Graph (for comparison):")
print(f"   Resource Types: {replicator.source_pattern_graph.number_of_nodes()}")
print(f"   Pattern Edges: {replicator.source_pattern_graph.number_of_edges()}")

if target_pattern_graph.number_of_edges() > 0:
    print(f"\n✅ SUCCESS: Target graph has {target_pattern_graph.number_of_edges()} edges!")
    print("\nEdge types:")
    edge_counter = Counter()
    for u, v, data in target_pattern_graph.edges(data=True):
        edge_key = (u, data.get('relationship'), v)
        edge_counter[edge_key] += data.get('frequency', 1)
    
    for (u, rel, v), freq in edge_counter.most_common(20):
        print(f"   {u} -{rel}-> {v} ({freq} times)")
else:
    print("\n⚠️  No edges found - selected instances may not have direct resource connections")

---
# Part 2: Graph Comparison & Visualization
---

## Step 4: Compare Graph Statistics

In [None]:
source_graph = replicator.source_pattern_graph

print("📊 Detailed Graph Comparison:\n")
print(f"{'Metric':<30} {'Source':<15} {'Target':<15} {'Ratio'}")
print("=" * 70)

# Nodes
source_nodes = source_graph.number_of_nodes()
target_nodes = target_pattern_graph.number_of_nodes()
node_ratio = f"{target_nodes/source_nodes:.1%}" if source_nodes > 0 else "N/A"
print(f"{'Resource Types (nodes)':<30} {source_nodes:<15} {target_nodes:<15} {node_ratio}")

# Edges
source_edges = source_graph.number_of_edges()
target_edges = target_pattern_graph.number_of_edges()
edge_ratio = f"{target_edges/source_edges:.1%}" if source_edges > 0 else "N/A"
print(f"{'Pattern Edges':<30} {source_edges:<15} {target_edges:<15} {edge_ratio}")

# Density
source_density = nx.density(source_graph.to_undirected())
target_density = nx.density(target_pattern_graph.to_undirected()) if target_nodes > 0 else 0
print(f"{'Graph Density':<30} {source_density:<15.4f} {target_density:<15.4f}")

# Average degree
source_avg_degree = sum(dict(source_graph.degree()).values()) / source_nodes if source_nodes > 0 else 0
target_avg_degree = sum(dict(target_pattern_graph.degree()).values()) / target_nodes if target_nodes > 0 else 0
print(f"{'Average Degree':<30} {source_avg_degree:<15.2f} {target_avg_degree:<15.2f}")

# Spectral distance
spectral_distance = replicator._compute_spectral_distance(source_graph, target_pattern_graph)
print(f"\n{'Spectral Distance':<30} {spectral_distance:.4f}")
print("   (Lower is better, 0.0 = perfect match)")

## Step 5: Visualize Node Overlap

In [None]:
source_nodes_set = set(source_graph.nodes())
target_nodes_set = set(target_pattern_graph.nodes())

common_nodes = source_nodes_set.intersection(target_nodes_set)
source_only = source_nodes_set - target_nodes_set
target_only = target_nodes_set - source_nodes_set

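print("📊 Node Overlap Analysis:\n")
# Guard against an empty source graph before computing the share
common_share = len(common_nodes) / len(source_nodes_set) if source_nodes_set else 0.0
print(f"   Common Resource Types: {len(common_nodes)} ({common_share:.1%} of source)")
print(f"   Source-only Types: {len(source_only)}")
print(f"   Target-only Types: {len(target_only)}")

if common_nodes:
    # Sort first, then truncate, so the preview is deterministic
    preview = sorted(common_nodes)[:10]
    suffix = "..." if len(common_nodes) > 10 else ""
    print(f"\n   Common types: {', '.join(preview)}{suffix}")

# Bar chart of the overlap (a simple stand-in for a Venn diagram)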
fig, ax = plt.subplots(figsize=(10, 6))

categories = ['Source\nOnly', 'Common', 'Target\nOnly']
counts = [len(source_only), len(common_nodes), len(target_only)]
colors = ['#ff9999', '#66b3ff', '#99ff99']

bars = ax.bar(categories, counts, color=colors, alpha=0.7, edgecolor='black', linewidth=2)

# Add value labels on bars (share of all distinct resource types across both graphs)
total_types = len(source_nodes_set | target_nodes_set)
for bar, count in zip(bars, counts):
    height = bar.get_height()
    share = count / total_types if total_types else 0.0
    ax.text(bar.get_x() + bar.get_width()/2., height,
            f'{count}\n({share:.1%})',
            ha='center', va='bottom', fontsize=12, fontweight='bold')

ax.set_ylabel('Number of Resource Types', fontsize=12)
ax.set_title('Resource Type Overlap: Source vs Target Pattern Graphs', 
             fontsize=14, fontweight='bold', pad=20)
ax.grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.show()

## Step 6: Side-by-Side Graph Visualization

In [None]:
# Limit to top nodes by degree for better visualization
TOP_N = 20

# Get top nodes from source graph
source_degrees = dict(source_graph.degree())
top_source_nodes = sorted(source_degrees.items(), key=lambda x: x[1], reverse=True)[:TOP_N]
source_subgraph = source_graph.subgraph([n for n, _ in top_source_nodes]).copy()

# Get top nodes from target graph (or all if fewer than TOP_N)
target_degrees = dict(target_pattern_graph.degree())
top_target_nodes = sorted(target_degrees.items(), key=lambda x: x[1], reverse=True)[:TOP_N]
target_subgraph = target_pattern_graph.subgraph([n for n, _ in top_target_nodes]).copy()

# Identify missing edges: edges in source but not in target
# Build sets of (source, target, relationship) tuples for comparison
source_edge_set = set()
for u, v, data in source_subgraph.edges(data=True):
    rel = data.get('relationship', 'UNKNOWN')
    source_edge_set.add((u, v, rel))

# Build the target edge set from the FULL target graph: the check below tests nodes
# against the full target node set, so using only the top-N target subgraph would
# misreport edges whose endpoints fall outside the target's top N as missing
target_edge_set = set()
for u, v, data in target_pattern_graph.edges(data=True):
    rel = data.get('relationship', 'UNKNOWN')
    target_edge_set.add((u, v, rel))

# Edges that exist in source but not in target (only for nodes present in both graphs)
missing_edges = []
for u, v, rel in source_edge_set:
    # Only consider edges between nodes that exist in target's node set
    if u in target_nodes_set and v in target_nodes_set:
        if (u, v, rel) not in target_edge_set:
            missing_edges.append((u, v, rel))

print(f"📊 Edge Analysis:")
print(f"   Source subgraph edges: {source_subgraph.number_of_edges()}")
print(f"   Target subgraph edges: {target_subgraph.number_of_edges()}")
print(f"   Missing edges (in source, not in target): {len(missing_edges)}")

# Create figure with two subplots
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(24, 10))

# Function to draw a graph with edge highlighting
def draw_pattern_graph(G, ax, title, highlight_missing=False):
    if G.number_of_nodes() == 0:
        ax.text(0.5, 0.5, 'No nodes to display', 
                ha='center', va='center', transform=ax.transAxes, fontsize=16)
        ax.set_title(title, fontsize=14, fontweight='bold', pad=20)
        ax.axis('off')
        return
    
    # Layout
    pos = nx.spring_layout(G, k=2, iterations=50, seed=42)
    
    # Node sizes based on degree
    degrees = dict(G.degree())
    node_sizes = [degrees[node] * 100 + 200 for node in G.nodes()]
    
    # Node colors: highlight common nodes
    common = source_nodes_set.intersection(target_nodes_set)
    node_colors = ['#66b3ff' if node in common else '#ff9999' for node in G.nodes()]
    
    # Draw edges with highlighting for source graph
    if highlight_missing and missing_edges:
        # Draw regular edges first (gray)
        regular_edges = []
        for u, v, data in G.edges(data=True):
            rel = data.get('relationship', 'UNKNOWN')
            if (u, v, rel) not in missing_edges:
                regular_edges.append((u, v))
        
        if regular_edges:
            nx.draw_networkx_edges(G, pos, edgelist=regular_edges, alpha=0.2, 
                                  edge_color='gray', arrows=True, arrowsize=10, 
                                  width=1.5, connectionstyle='arc3,rad=0.1', ax=ax)
        
        # Draw missing edges in red (highlighted)
        missing_edge_list = []
        for u, v, rel in missing_edges:
            if G.has_edge(u, v):
                missing_edge_list.append((u, v))
        
        if missing_edge_list:
            nx.draw_networkx_edges(G, pos, edgelist=missing_edge_list, alpha=0.6, 
                                  edge_color='#FF6B6B', arrows=True, arrowsize=12, 
                                  width=3, connectionstyle='arc3,rad=0.1', ax=ax)
    else:
        # Draw all edges normally (for target graph)
        nx.draw_networkx_edges(G, pos, alpha=0.3, edge_color='#4CAF50', 
                              arrows=True, arrowsize=10, width=2,
                              connectionstyle='arc3,rad=0.1', ax=ax)
    
    # Draw nodes
    nx.draw_networkx_nodes(G, pos, node_size=node_sizes, node_color=node_colors,
                          alpha=0.9, edgecolors='black', linewidths=2, ax=ax)
    
    # Draw labels
    nx.draw_networkx_labels(G, pos, font_size=9, font_weight='bold', ax=ax)
    
    # Title with stats
    stats_text = f"{G.number_of_nodes()} types, {G.number_of_edges()} edges"
    if highlight_missing:
        stats_text += f"\n{len(missing_edges)} missing edges (red)"
    ax.set_title(f"{title}\n{stats_text}", fontsize=14, fontweight='bold', pad=20)
    ax.axis('off')

# Draw both graphs
draw_pattern_graph(source_subgraph, ax1, f"Source Pattern Graph (Top {TOP_N})", highlight_missing=True)
draw_pattern_graph(target_subgraph, ax2, f"Target Pattern Graph (Top {min(TOP_N, target_nodes)})", highlight_missing=False)

# Add legend
from matplotlib.patches import Patch
legend_elements = [
    Patch(facecolor='#66b3ff', edgecolor='black', label='Common Types'),
    Patch(facecolor='#ff9999', edgecolor='black', label='Unique Types'),
    Patch(facecolor='white', edgecolor='#FF6B6B', linewidth=3, label='Missing Edges (in source, not target)')
]
fig.legend(handles=legend_elements, loc='upper center', ncol=3, 
          fontsize=12, bbox_to_anchor=(0.5, 0.98))

plt.tight_layout(rect=[0, 0, 1, 0.96])
plt.show()

print(f"\n📊 Visualization shows top {TOP_N} nodes by degree from each graph")
print(f"   Blue nodes: Common resource types in both graphs")
print(f"   Red nodes: Unique to that graph")
print(f"   Red edges (source graph): Missing relationships not yet in target")
print(f"   Green edges (target graph): Captured relationships")

if missing_edges:
    print(f"\n🔍 Top missing edge types (need more instances to capture):")
    missing_edge_types = Counter([rel for u, v, rel in missing_edges])
    for rel, count in missing_edge_types.most_common(5):
        print(f"   {rel}: {count} missing edges")

## Step 7: Edge Type Comparison

In [None]:
# Extract edge types from both graphs
def get_edge_types(G):
    edge_types = Counter()
    for u, v, data in G.edges(data=True):
        rel = data.get('relationship', 'UNKNOWN')
        edge_types[rel] += 1
    return edge_types

source_edge_types = get_edge_types(source_graph)
target_edge_types = get_edge_types(target_pattern_graph)

# Get all edge types
all_edge_types = set(source_edge_types.keys()) | set(target_edge_types.keys())

print("📊 Edge Type Comparison:\n")
print(f"{'Relationship Type':<30} {'Source':<15} {'Target':<15} {'Match'}")
print("=" * 75)

for edge_type in sorted(all_edge_types, key=lambda x: source_edge_types.get(x, 0), reverse=True)[:15]:
    source_count = source_edge_types.get(edge_type, 0)
    target_count = target_edge_types.get(edge_type, 0)
    match = "✓" if target_count > 0 else "✗"
    print(f"{edge_type:<30} {source_count:<15} {target_count:<15} {match}")

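# Summary (guard against a source graph with no edges)
common_edge_types = set(source_edge_types.keys()).intersection(set(target_edge_types.keys()))
if source_edge_types:
    print(f"\nCommon edge types: {len(common_edge_types)}/{len(source_edge_types)} ({len(common_edge_types)/len(source_edge_types):.1%})")
else:
    print("\nCommon edge types: 0/0 (source graph has no edges)")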

## Step 8: Spectral Distance Evolution

In [None]:
if spectral_history:
    fig, ax = plt.subplots(figsize=(12, 6))
    
    ax.plot(range(len(spectral_history)), spectral_history, 
            marker='o', linewidth=2, markersize=8, alpha=0.7, color='#2E86AB')
    
    ax.set_xlabel('Instance Selection Step', fontsize=12, fontweight='bold')
    ax.set_ylabel('Spectral Distance', fontsize=12, fontweight='bold')
    ax.set_title('Spectral Distance Evolution as Instances are Added\n(Lower = Better Match)', 
                 fontsize=14, fontweight='bold', pad=20)
    
    ax.grid(True, alpha=0.3)
    
    # Add horizontal line for final value
    ax.axhline(y=spectral_history[-1], color='red', linestyle='--', 
               alpha=0.5, linewidth=2, label=f'Final: {spectral_history[-1]:.4f}')
    
    # Add horizontal line for initial value
    ax.axhline(y=spectral_history[0], color='orange', linestyle='--', 
               alpha=0.5, linewidth=2, label=f'Initial: {spectral_history[0]:.4f}')
    
    ax.legend(fontsize=11, loc='best')
    ax.set_xlim(-0.5, len(spectral_history) - 0.5)
    
    plt.tight_layout()
    plt.show()
    
    print(f"\n📈 Spectral Distance Analysis:")
    initial, final = spectral_history[0], spectral_history[-1]
    change = final - initial
    # Guard against a zero initial distance before computing the relative change
    relative = f"{change / initial:.1%}" if initial else "N/A"
    print(f"   Initial distance: {initial:.4f}")
    print(f"   Final distance: {final:.4f}")
    print(f"   Change: {change:.4f} ({relative})")
    print(f"   Min distance: {min(spectral_history):.4f} (at step {spectral_history.index(min(spectral_history))})")
else:
    print("⚠️  No spectral history available")

---
# Summary & Interpretation
---

## Architecture-Based Approach

This approach operates at the **architectural instance layer** - selecting connected groups of resources:

### How It Works

1. **Detects architectural patterns**: Uses `ArchitecturalPatternAnalyzer.detect_patterns()` to identify pattern types

2. **Understands the relationship model**:
   - **Pattern graph**: Type-level aggregation of the instance resource graph
   - **Instance connections**: Resources are related through **shared parents** (ResourceGroup, Subscription)
   - **Direct edges**: Some resources have explicit edges (e.g., VirtualNetwork→Subnet)
   - The instance graph creates the pattern graph through aggregation

3. **Finds connected instances**: Groups resources by their shared ResourceGroup:
   - Example: ResourceGroup "rg-prod-web" contains:
     - Web App (sites)
     - Storage Account (storageAccounts)
     - Application Insights (components)
   - These form an architectural instance of "Web Application" pattern

4. **Merges with direct connections**: Also includes resources connected by explicit edges:
   - VirtualNetwork and its Subnets (across ResourceGroups if needed)

5. **Selects instances iteratively**: Adds one architectural instance at a time to build target pattern graph

6. **Goal**: Build target pattern graph that MATCHES source pattern graph structure
   - Same resource types (nodes)
   - Same relationship patterns (edges)
   - Uses spectral comparison to measure similarity
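
The iterative selection in step 5 can be pictured as a greedy loop: at each step, add the candidate instance that brings the target pattern graph closest to the source. The sketch below is an assumption about the general shape of such a loop, not the code behind `generate_replication_plan`. To stay short it swaps in a Jaccard distance on type-level edge sets where the replicator uses a spectral distance, and the instance dictionaries and `greedy_select` helper are hypothetical.

```python
def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; stand-in for the spectral distance used by the replicator."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def greedy_select(source_edges, candidates, target_count):
    """Pick instances one at a time, each time minimizing distance to the source."""
    selected, target_edges, history = [], set(), []
    remaining = list(candidates)
    while remaining and len(selected) < target_count:
        # Evaluate the distance that would result from adding each remaining candidate
        best = min(
            remaining,
            key=lambda inst: jaccard_distance(source_edges, target_edges | set(inst["edges"])),
        )
        remaining.remove(best)
        selected.append(best)
        target_edges |= set(best["edges"])
        history.append(jaccard_distance(source_edges, target_edges))
    return selected, history

# Hypothetical type-level edges of the source pattern graph
source_edges = {
    ("sites", "storageAccounts"),
    ("sites", "components"),
    ("virtualNetworks", "subnets"),
    ("virtualMachines", "networkInterfaces"),
}

# Hypothetical candidate instances and the type-level edges each one contributes
candidates = [
    {"name": "web-1", "edges": [("sites", "storageAccounts"), ("sites", "components")]},
    {"name": "net-1", "edges": [("virtualNetworks", "subnets")]},
    {"name": "web-2", "edges": [("sites", "storageAccounts")]},
    {"name": "vm-1", "edges": [("virtualMachines", "networkInterfaces")]},
]

selected, history = greedy_select(source_edges, candidates, target_count=3)
print([inst["name"] for inst in selected])
print(history)
```

The `history` list plays the same role as `spectral_history` in the notebook: one distance value per selection step. In this toy data the redundant `web-2` instance is never picked, because it adds no edge the target does not already have.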

### Key Insight

**The pattern graph is derived FROM the instance graph through type-level aggregation.**

When resources share a ResourceGroup or have direct connections, those instance-level relationships aggregate into type-level edges in the pattern graph.
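
A small sketch of the shared-parent half of this insight follows: resources are grouped by ResourceGroup, and every pair of distinct types found in the same group contributes one type-level co-membership edge. The `resource_group` key, the sample resources, and both helper functions are assumptions for illustration. Only the `type` key is known from the notebook's own code.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical resources; each dict mirrors the `r['type']` access used in the notebook
resources = [
    {"id": "app-1", "type": "sites", "resource_group": "rg-prod-web"},
    {"id": "st-1", "type": "storageAccounts", "resource_group": "rg-prod-web"},
    {"id": "ai-1", "type": "components", "resource_group": "rg-prod-web"},
    {"id": "app-2", "type": "sites", "resource_group": "rg-dev-web"},
    {"id": "st-2", "type": "storageAccounts", "resource_group": "rg-dev-web"},
]

def group_by_resource_group(resources):
    """Each ResourceGroup's members form one candidate architectural instance."""
    groups = defaultdict(list)
    for resource in resources:
        groups[resource["resource_group"]].append(resource)
    return dict(groups)

def co_membership_type_edges(groups):
    """Count, per pair of types, how many groups contain both types."""
    edges = Counter()
    for members in groups.values():
        types = sorted({r["type"] for r in members})
        for type_a, type_b in combinations(types, 2):
            edges[(type_a, type_b)] += 1
    return edges

groups = group_by_resource_group(resources)
type_edges = co_membership_type_edges(groups)
for (type_a, type_b), freq in type_edges.most_common():
    print(f"{type_a} <-> {type_b} (shared by {freq} resource groups)")
```

Both resource groups pair `sites` with `storageAccounts`, so that type-level edge gets a count of 2 even though no direct edge connects any of the individual resources.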

### Interpretation Guide

**Node Coverage**: 
- High percentage of common nodes = Target captures key resource types from source
- Target should include most high-degree source nodes

**Edge Coverage**:
- Target edges / Source edges ratio shows relationship preservation
- Common edge types indicate structural similarity

**Spectral Distance**:
- Measures overall structural similarity (topology, connectivity patterns)
- Lower values = Better match
- Distance → 0 means the graphs are structurally near-equivalent, as measured by their spectra

**Graph Density & Degree**:
- Similar density = Similar connectivity patterns
- Similar average degree = Similar resource interconnection
- Note: Target density is often higher than source density, because the target is a smaller, connected subgraph
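
The coverage and density readings above can be checked by hand on a toy example. The sketch below uses made-up type names and two hypothetical helpers. The density formula is the standard one for a simple undirected graph, 2m / (n(n-1)), which is what `nx.density` computes for undirected graphs.

```python
def coverage(source_items, target_items):
    """Fraction of source items that also appear in the target."""
    source_items, target_items = set(source_items), set(target_items)
    return len(source_items & target_items) / len(source_items) if source_items else 0.0

def undirected_density(node_count, edge_count):
    """Density of a simple undirected graph: 2m / (n(n-1))."""
    if node_count < 2:
        return 0.0
    return 2 * edge_count / (node_count * (node_count - 1))

# Hypothetical source pattern graph: 6 types, 5 edges
source_nodes = {"sites", "storageAccounts", "components",
                "virtualNetworks", "subnets", "virtualMachines"}
source_edges = {
    ("sites", "storageAccounts"),
    ("sites", "components"),
    ("virtualNetworks", "subnets"),
    ("virtualMachines", "subnets"),
    ("virtualMachines", "storageAccounts"),
}

# Hypothetical target built from a single web-application instance: 3 types, 2 edges
target_nodes = {"sites", "storageAccounts", "components"}
target_edges = {("sites", "storageAccounts"), ("sites", "components")}

node_coverage = coverage(source_nodes, target_nodes)
edge_coverage = coverage(source_edges, target_edges)
source_density = undirected_density(len(source_nodes), len(source_edges))
target_density = undirected_density(len(target_nodes), len(target_edges))

print(f"Node coverage: {node_coverage:.1%}")
print(f"Edge coverage: {edge_coverage:.1%}")
print(f"Source density: {source_density:.4f}")
print(f"Target density: {target_density:.4f}")
```

Here the target covers only half of the source's types, yet its density is twice as high. A high target density on its own therefore says little about match quality and should be read alongside the coverage figures.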

### Key Advantages

Compared to looking only for direct Resource→Resource edges:
- ✅ Uses the actual relationship model (shared parents + direct edges)
- ✅ Finds realistic architectural instances (resources in same ResourceGroup)
- ✅ Target pattern graph has many edges (aggregated from instance relationships)
- ✅ Preserves natural architectural groupings (how Azure organizes resources)
- ✅ Realistic replication (creates coherent architectural units)