# GLI (Graph Language Interface) Tutorial

This notebook demonstrates how to use the GLI library for efficient graph operations with both Python and Rust backends.

## Features Covered:
- Basic graph creation and manipulation
- Backend switching (Python vs Rust)
- Complex attributes and metadata
- Graph algorithms
- Branching and versioning
- Performance comparisons

## 1. Setup and Installation

First, let's import the GLI library and check available backends.

In [18]:
# Import the GLI library
import sys
import os

# Add the 'python' directory under the parent of the working directory to the import path
sys.path.insert(0, os.path.join(os.path.dirname(os.getcwd()), 'python'))

import gli
from gli import Graph, get_available_backends, set_backend, get_current_backend
import time
import random

print("GLI Tutorial - Graph Operations Demo")
print(f"Available backends: {get_available_backends()}")
print(f"Current backend: {get_current_backend()}")

GLI Tutorial - Graph Operations Demo
Available backends: ['python', 'rust']
Current backend: rust


## 2. Basic Graph Operations

Let's start with basic graph creation and manipulation.

In [2]:
# Create a new graph
g = Graph()

# Add nodes, passing an explicit node ID as the first argument
alice = g.add_node("alice", label="Alice", age=30, city="New York")
bob = g.add_node("bob", label="Bob", age=25, city="Boston")
charlie = g.add_node("charlie", label="Charlie", age=35, city="Chicago")

print(f"Created nodes: Alice={alice}, Bob={bob}, Charlie={charlie}")
print(f"Total nodes: {len(g.nodes)}")

# Store the node IDs for later use
alice_id = "alice"
bob_id = "bob"
charlie_id = "charlie"

Created nodes: Alice=alice, Bob=bob, Charlie=charlie
Total nodes: 3


In [3]:
# Add edges between nodes
friendship1 = g.add_edge(alice, bob, label="friends", since=2020, strength=0.8)
friendship2 = g.add_edge(bob, charlie, label="friends", since=2019, strength=0.9)
coworkers = g.add_edge(alice, charlie, label="coworkers", since=2021, strength=0.6)

print(f"Created edges: {[friendship1, friendship2, coworkers]}")
print(f"Total edges: {len(g.edges)}")

# Display graph structure
print("\nGraph structure:")
for node_id in g.nodes:
    node = g.get_node(node_id)
    neighbors = g.get_neighbors(node_id)
    print(f"  {node.get('label', node_id)}: connected to {len(neighbors)} nodes")

Created edges: ['alice->bob', 'bob->charlie', 'alice->charlie']
Total edges: 3

Graph structure:
  Charlie: connected to 2 nodes
  Alice: connected to 2 nodes
  Bob: connected to 2 nodes


## 3. Working with Node and Edge Attributes

GLI supports rich attributes on both nodes and edges.

In [4]:
# Access and modify node attributes
alice_node = g.get_node(alice)
print(f"Alice's attributes: {dict(alice_node)}")

# Update attributes
g.set_node_attribute(alice, "occupation", "Software Engineer")
g.set_node_attribute(alice, "skills", ["Python", "Rust", "Graph Theory"])

print(f"Alice's updated attributes: {dict(g.get_node(alice))}")

# Work with edge attributes
friendship_edge = g.get_edge(friendship1)
print(f"\nFriendship edge attributes: {dict(friendship_edge)}")

# Add complex edge metadata
g.set_edge_attribute(friendship1, "interactions", {
    "messages_per_week": 25,
    "last_meetup": "2024-12-15",
    "common_interests": ["hiking", "coding", "movies"]
})

print(f"Updated friendship attributes: {dict(g.get_edge(friendship1))}")

Alice's attributes: {'city': 'New York', 'label': 'Alice', 'age': 30}
Alice's updated attributes: {'label': 'Alice', 'age': 30, 'occupation': 'Software Engineer', 'city': 'New York', 'skills': ['Python', 'Rust', 'Graph Theory']}

Friendship edge attributes: {'label': 'friends', 'strength': 0.8, 'since': 2020}
Updated friendship attributes: {'strength': 0.8, 'since': 2020, 'label': 'friends', 'interactions': {'common_interests': ['hiking', 'coding', 'movies'], 'last_meetup': '2024-12-15', 'messages_per_week': 25}}


## 4. Backend Comparison

Let's compare the performance of Python vs Rust backends.

In [5]:
def benchmark_backend(backend_name, num_nodes=1000, num_edges=2000):
    """Benchmark basic operations on a specific backend."""
    print(f"\n=== Benchmarking {backend_name} Backend ===")
    
    # Switch to the specified backend
    set_backend(backend_name)
    
    # Create a new graph
    g = Graph()
    
    # Benchmark node creation
    start_time = time.time()
    nodes = []
    for i in range(num_nodes):
        node_id = g.add_node(
            label=f"Node_{i}",
            value=random.randint(1, 100),
            category=random.choice(["A", "B", "C"])
        )
        nodes.append(node_id)
    
    node_time = time.time() - start_time
    print(f"  Node creation: {node_time:.3f}s ({num_nodes/node_time:.0f} nodes/sec)")
    
    # Benchmark edge creation
    start_time = time.time()
    edges = []
    for i in range(num_edges):
        source = random.choice(nodes)
        target = random.choice(nodes)
        if source != target:  # Avoid self-loops
            edge_id = g.add_edge(
                source, target,
                weight=random.random(),
                edge_type=random.choice(["connects", "similar_to", "depends_on"])
            )
            edges.append(edge_id)
    
    edge_time = time.time() - start_time
    print(f"  Edge creation: {edge_time:.3f}s ({len(edges)/edge_time:.0f} edges/sec)")
    
    # Benchmark neighbor queries
    start_time = time.time()
    total_neighbors = 0
    sample = random.sample(nodes, min(100, len(nodes)))
    for node in sample:
        neighbors = g.get_neighbors(node)
        total_neighbors += len(neighbors)
    
    query_time = time.time() - start_time
    # Average over the nodes actually sampled, which may be fewer than 100
    print(f"  Neighbor queries: {query_time:.3f}s (avg {total_neighbors/max(len(sample), 1):.1f} neighbors/node)")
    
    print(f"  Final graph: {len(g.nodes)} nodes, {len(g.edges)} edges")
    
    return {
        'backend': backend_name,
        'node_time': node_time,
        'edge_time': edge_time,
        'query_time': query_time,
        'nodes': len(g.nodes),
        'edges': len(g.edges)
    }

# Run benchmarks
results = []
for backend in get_available_backends():
    try:
        result = benchmark_backend(backend, num_nodes=500, num_edges=1000)
        results.append(result)
    except Exception as e:
        print(f"Error with {backend} backend: {e}")

# Compare results
if len(results) > 1:
    print("\n=== Performance Comparison ===")
    for metric in ['node_time', 'edge_time', 'query_time']:
        print(f"\n{metric.replace('_', ' ').title()}:")
        for result in results:
            print(f"  {result['backend']}: {result[metric]:.3f}s")


=== Benchmarking python Backend ===
  Node creation: 0.004s (135134 nodes/sec)
  Edge creation: 0.003s (371237 edges/sec)
  Neighbor queries: 0.034s (avg 4.1 neighbors/node)
  Final graph: 500 nodes, 995 edges

=== Benchmarking rust Backend ===
  Node creation: 0.003s (149572 nodes/sec)
  Edge creation: 0.002s (421585 edges/sec)
  Neighbor queries: 0.000s (avg 4.0 neighbors/node)
  Final graph: 500 nodes, 998 edges

=== Performance Comparison ===

Node Time:
  python: 0.004s
  rust: 0.003s

Edge Time:
  python: 0.003s
  rust: 0.002s

Query Time:
  python: 0.034s
  rust: 0.000s
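
Timings this small (for example the `0.000s` neighbor-query figure) sit close to the resolution of a single `time.time()` measurement. Below is a minimal sketch of a steadier timing helper using only the standard library. `best_of` and `build_adjacency` are hypothetical names introduced here, not part of GLI, and the workload is a plain dictionary rather than a GLI graph.

```python
import time

def best_of(func, repeats=5):
    """Run func several times and return the fastest wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        func()
        durations.append(time.perf_counter() - start)
    return min(durations)

def build_adjacency(num_nodes=500):
    """Stand-in workload: build a ring-shaped adjacency dict (not a GLI graph)."""
    return {i: [(i + 1) % num_nodes] for i in range(num_nodes)}

elapsed = best_of(build_adjacency)
print(f"best of 5: {elapsed * 1e6:.1f} us")
```

Taking the minimum of several runs reduces noise from other processes. To apply this to GLI, wrap the loop being measured in a zero-argument function and pass it in.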


## 5. Graph Algorithms and Analysis

This section builds a small social network and reports each person's neighbor count alongside their node attributes.

In [6]:
# Prefer the Rust backend when it is available, otherwise fall back to Python
if 'rust' in get_available_backends():
    set_backend('rust')
    print("Using Rust backend for algorithms")
else:
    set_backend('python')
    print("Using Python backend for algorithms")

# Create a more complex graph for analysis
g = Graph()

# Create a social network scenario
people = {}
for name in ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve', 'Frank', 'Grace', 'Henry']:
    people[name] = g.add_node(
        label=name,
        age=random.randint(20, 60),
        city=random.choice(['NYC', 'LA', 'Chicago', 'Boston']),
        interests=random.sample(['sports', 'music', 'tech', 'art', 'travel'], 2)
    )

# Add friendship connections
connections = [
    ('Alice', 'Bob'), ('Alice', 'Charlie'), ('Bob', 'Diana'),
    ('Charlie', 'Eve'), ('Diana', 'Frank'), ('Eve', 'Grace'),
    ('Frank', 'Henry'), ('Grace', 'Alice'), ('Henry', 'Bob')
]

for person1, person2 in connections:
    g.add_edge(
        people[person1], people[person2],
        relationship='friends',
        strength=random.uniform(0.5, 1.0),
        duration_years=random.randint(1, 10)
    )

print(f"Created social network with {len(g.nodes)} people and {len(g.edges)} friendships")

# Analyze the network
print("\nNetwork Analysis:")
for name, node_id in people.items():
    neighbors = g.get_neighbors(node_id)
    node_data = g.get_node(node_id)
    print(f"  {name}: {len(neighbors)} friends, age {node_data['age']}, from {node_data['city']}")

Using Rust backend for algorithms
Created social network with 8 people and 9 friendships

Network Analysis:
  Alice: 3 friends, age 47, from LA
  Bob: 3 friends, age 46, from NYC
  Charlie: 2 friends, age 38, from LA
  Diana: 2 friends, age 20, from LA
  Eve: 2 friends, age 33, from LA
  Frank: 2 friends, age 32, from Chicago
  Grace: 2 friends, age 21, from Chicago
  Henry: 2 friends, age 60, from NYC
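
The neighbor counts above are the starting point for path-based algorithms. Below is a minimal sketch of a breadth-first shortest path over the same friendship pairs, in plain Python. It assumes the friendships are undirected, which is consistent with the neighbor counts shown. `shortest_path` is a hypothetical helper, not a GLI function.

```python
from collections import deque

# Same friendship pairs as the cell above, treated as undirected
connections = [
    ('Alice', 'Bob'), ('Alice', 'Charlie'), ('Bob', 'Diana'),
    ('Charlie', 'Eve'), ('Diana', 'Frank'), ('Eve', 'Grace'),
    ('Frank', 'Henry'), ('Grace', 'Alice'), ('Henry', 'Bob'),
]

adjacency = {}
for a, b in connections:
    adjacency.setdefault(a, []).append(b)
    adjacency.setdefault(b, []).append(a)

def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns one shortest list of names, or None if unreachable."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk the parent links back to the start, then reverse
            path = []
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1]
        for neighbor in adjacency.get(current, []):
            if neighbor not in parents:
                parents[neighbor] = current
                queue.append(neighbor)
    return None

print(shortest_path(adjacency, 'Alice', 'Henry'))  # ['Alice', 'Bob', 'Henry']
```

With a GLI graph, the adjacency lookup could presumably be replaced by calls to `get_neighbors`, as used in the cell above.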


## 6. Advanced Features: Branching and Versioning

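This section works through branching-style what-if scenarios. The cell below does not call a dedicated branching or versioning API; it simulates a branch by modifying attributes and then reverting them by hand.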

In [7]:
# Create a base graph
g = Graph()

# Add initial data
project_a = g.add_node(label="Project A", status="planning", budget=10000)
project_b = g.add_node(label="Project B", status="active", budget=15000)
dependency = g.add_edge(project_a, project_b, relationship="depends_on", priority="high")

print("Base graph created with projects and dependencies")
print(f"Project A status: {g.get_node(project_a)['status']}")
print(f"Project B status: {g.get_node(project_b)['status']}")

# Simulate branched development scenarios
print("\n=== Scenario 1: Budget Increase ===") 
# In a real implementation, you might create a branch here
# For now, we'll simulate by modifying and then reverting

original_budget_a = g.get_node(project_a)['budget']
g.set_node_attribute(project_a, "budget", 20000)
g.set_node_attribute(project_a, "status", "approved")

print(f"Modified - Project A budget: {g.get_node(project_a)['budget']}, status: {g.get_node(project_a)['status']}")

# Revert changes
g.set_node_attribute(project_a, "budget", original_budget_a)
g.set_node_attribute(project_a, "status", "planning")

print(f"Reverted - Project A budget: {g.get_node(project_a)['budget']}, status: {g.get_node(project_a)['status']}")

print("\n=== Scenario 2: Adding New Project ===\n")
project_c = g.add_node(label="Project C", status="proposed", budget=8000)
new_dependency = g.add_edge(project_b, project_c, relationship="enables", priority="medium")

print(f"Added Project C, now graph has {len(g.nodes)} nodes and {len(g.edges)} edges")

# Show final state
print("\nFinal project network:")
for node_id in g.nodes:
    node = g.get_node(node_id)
    neighbors = g.get_neighbors(node_id)
    print(f"  {node['label']}: {node['status']}, budget=${node['budget']}, connected to {len(neighbors)} projects")

Base graph created with projects and dependencies
Project A status: planning
Project B status: active

=== Scenario 1: Budget Increase ===
Modified - Project A budget: 20000, status: approved
Reverted - Project A budget: 10000, status: planning

=== Scenario 2: Adding New Project ===

Added Project C, now graph has 3 nodes and 2 edges

Final project network:
  Project A: planning, budget=$10000, connected to 1 projects
  Project C: proposed, budget=$8000, connected to 1 projects
  Project B: active, budget=$15000, connected to 2 projects
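
Reverting attributes one at a time, as in Scenario 1, becomes error-prone as the number of changes grows. Below is a minimal sketch of a snapshot-and-restore pattern using `copy.deepcopy` on a plain dictionary. The `projects` dictionary is a stand-in for node attributes and does not use any GLI branching API.

```python
import copy

# Plain-dict stand-in for node attributes (not a GLI graph)
projects = {
    "project_a": {"label": "Project A", "status": "planning", "budget": 10000},
    "project_b": {"label": "Project B", "status": "active", "budget": 15000},
}

# Take a deep copy so nested values are not shared with the working state
snapshot = copy.deepcopy(projects)

# Explore a what-if scenario on the working copy
projects["project_a"]["budget"] = 20000
projects["project_a"]["status"] = "approved"
modified = copy.deepcopy(projects)

# Restore every attribute at once instead of reverting them one by one
projects = copy.deepcopy(snapshot)
print(projects["project_a"])
```

A deep copy costs time and memory proportional to the data, so this pattern suits small what-if experiments rather than large graphs.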


## 7. Real-World Use Case: Knowledge Graph

Let's build a small knowledge graph to demonstrate practical usage.

In [8]:
# Create a knowledge graph about programming languages and concepts
kg = Graph()

# Add programming language nodes
languages = {
    'Python': kg.add_node(label="Python", type="language", year=1991, paradigm="multi-paradigm", performance="medium"),
    'Rust': kg.add_node(label="Rust", type="language", year=2010, paradigm="systems", performance="high"),
    'JavaScript': kg.add_node(label="JavaScript", type="language", year=1995, paradigm="multi-paradigm", performance="medium"),
    'C++': kg.add_node(label="C++", type="language", year=1985, paradigm="multi-paradigm", performance="high")
}

# Add concept nodes
concepts = {
    'Memory Safety': kg.add_node(label="Memory Safety", type="concept", importance="critical"),
    'Concurrency': kg.add_node(label="Concurrency", type="concept", importance="high"),
    'Web Development': kg.add_node(label="Web Development", type="domain", popularity="very high"),
    'Systems Programming': kg.add_node(label="Systems Programming", type="domain", complexity="high")
}

# Add relationships
relationships = [
    (languages['Python'], concepts['Web Development'], 'used_for', {'strength': 0.9}),
    (languages['JavaScript'], concepts['Web Development'], 'used_for', {'strength': 0.95}),
    (languages['Rust'], concepts['Memory Safety'], 'provides', {'strength': 1.0}),
    (languages['Rust'], concepts['Systems Programming'], 'used_for', {'strength': 0.9}),
    (languages['C++'], concepts['Systems Programming'], 'used_for', {'strength': 0.8}),
    (languages['Rust'], concepts['Concurrency'], 'supports', {'strength': 0.9}),
    (languages['Python'], concepts['Concurrency'], 'supports', {'strength': 0.7})
]

for source, target, relation_type, attrs in relationships:
    kg.add_edge(source, target, relationship=relation_type, **attrs)

print(f"Knowledge graph created with {len(kg.nodes)} entities and {len(kg.edges)} relationships")

# Query the knowledge graph
print("\n=== Knowledge Graph Queries ===")

# Find languages used for web development
web_dev_id = concepts['Web Development']
web_languages = []
for edge_id in kg.edges:
    edge = kg.get_edge(edge_id)
    if edge.target == web_dev_id and edge.get('relationship') == 'used_for':
        lang_node = kg.get_node(edge.source)
        web_languages.append((lang_node['label'], edge.get('strength', 0)))

print("Languages for web development:")
for lang, strength in sorted(web_languages, key=lambda x: x[1], reverse=True):
    print(f"  {lang}: {strength} strength")

# Find what Rust is used for
rust_id = languages['Rust']
rust_applications = []
for edge_id in kg.edges:
    edge = kg.get_edge(edge_id)
    if edge.source == rust_id:
        target_node = kg.get_node(edge.target)
        rust_applications.append((target_node['label'], edge.get('relationship'), edge.get('strength', 0)))

print("\nRust applications and capabilities:")
for app, relation, strength in rust_applications:
    print(f"  {relation} {app}: {strength} strength")

Knowledge graph created with 8 entities and 7 relationships

=== Knowledge Graph Queries ===
Languages for web development:
  JavaScript: 0.95 strength
  Python: 0.9 strength

Rust applications and capabilities:
  provides Memory Safety: 1.0 strength
  used_for Systems Programming: 0.9 strength
  supports Concurrency: 0.9 strength
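
Both queries above scan every edge. When the same kind of lookup is repeated, an index built once can answer it directly. Below is a minimal sketch in plain Python over tuples that mirror the edges in this section. `by_relation_target` and `sources_for` are hypothetical names, not part of GLI.

```python
from collections import defaultdict

# (source, relationship, target, strength) tuples mirroring the edges above
edges = [
    ("Python", "used_for", "Web Development", 0.9),
    ("JavaScript", "used_for", "Web Development", 0.95),
    ("Rust", "provides", "Memory Safety", 1.0),
    ("Rust", "used_for", "Systems Programming", 0.9),
    ("C++", "used_for", "Systems Programming", 0.8),
    ("Rust", "supports", "Concurrency", 0.9),
    ("Python", "supports", "Concurrency", 0.7),
]

# Build the index once: (relationship, target) -> [(source, strength), ...]
by_relation_target = defaultdict(list)
for source, relation, target, strength in edges:
    by_relation_target[(relation, target)].append((source, strength))

def sources_for(relation, target):
    """Return sources for a relationship/target pair, strongest first."""
    matches = by_relation_target.get((relation, target), [])
    return sorted(matches, key=lambda pair: pair[1], reverse=True)

print(sources_for("used_for", "Web Development"))
```

The index trades memory for lookup speed and has to be rebuilt or updated whenever edges change.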


## 8. Performance Tips and Best Practices

Here are some recommendations for optimal GLI usage.

In [9]:
print("=== GLI Performance Tips and Best Practices ===")
print()

print("1. Backend Selection:")
print("   - Use Rust backend for large graphs (>10K nodes) and performance-critical applications")
print("   - Use Python backend for small graphs and rapid prototyping")
print(f"   - Current backend: {get_current_backend()}")
print(f"   - Available backends: {get_available_backends()}")
print()

print("2. Memory Efficiency:")
print("   - Batch operations when possible (add multiple nodes/edges at once)")
print("   - Use appropriate data types for attributes (avoid storing large objects)")
print("   - Consider attribute indexing for frequently queried properties")
print()

print("3. Query Optimization:")
print("   - Cache frequently accessed node/edge data")
print("   - Use neighbor queries efficiently (get_neighbors is optimized)")
print("   - Consider graph structure when designing queries")
print()

print("4. Error Handling:")
try:
    # Demonstrate error handling
    g = Graph()
    non_existent_node = g.get_node("invalid_id")
except Exception as e:
    print(f"   - Always handle potential errors: {type(e).__name__}")
print()

print("5. Attribute Management:")
g = Graph()
node_id = g.add_node(label="Example", metadata={"created": "2025-01-01"})

# Good: structured attribute access
node = g.get_node(node_id)
print(f"   - Structured access: {node.get('label', 'Unknown')}")

# Update attributes one at a time with set_node_attribute
g.set_node_attribute(node_id, "updated", "2025-01-02")
g.set_node_attribute(node_id, "version", 2)
print(f"   - Updated node: {dict(g.get_node(node_id))}")
print()

print("6. Graph Size Recommendations:")
print("   - Small graphs (<1K nodes): Either backend works well")
print("   - Medium graphs (1K-100K nodes): Rust backend recommended")
print("   - Large graphs (>100K nodes): Rust backend required for good performance")
print("   - Consider distributed solutions for graphs >1M nodes")
print()

print("Tutorial completed! You now know how to use GLI effectively for graph operations.")

=== GLI Performance Tips and Best Practices ===

1. Backend Selection:
   - Use Rust backend for large graphs (>10K nodes) and performance-critical applications
   - Use Python backend for small graphs and rapid prototyping
   - Current backend: rust
   - Available backends: ['python', 'rust']

2. Memory Efficiency:
   - Batch operations when possible (add multiple nodes/edges at once)
   - Use appropriate data types for attributes (avoid storing large objects)
   - Consider attribute indexing for frequently queried properties

3. Query Optimization:
   - Cache frequently accessed node/edge data
   - Use neighbor queries efficiently (get_neighbors is optimized)
   - Consider graph structure when designing queries

4. Error Handling:
   - Always handle potential errors: KeyError

5. Attribute Management:
   - Structured access: Example
   - Updated node: {'metadata': {'created': '2025-01-01'}, 'version': 2, 'updated': '2025-01-02', 'label': 'Example'}

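6. Graph Size Recommendations:
   - Small graphs (<1K nodes): Either backend works well
   - Medium graphs (1K-100K nodes): Rust backend recommended
   - Large graphs (>100K nodes): Rust backend required for good performance
   - Consider distributed solutions for graphs >1M nodes

Tutorial completed! You now know how to use GLI effectively for graph operations.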

## 8. Advanced Graph Operations

Now let's explore some advanced functionality including subgraphs, connected components, and graph analysis.

In [10]:
# Create a larger network for demonstration
network = Graph.empty()

# Add researchers and their fields
researchers = {
    'Dr. Smith': {'field': 'AI', 'experience': 15, 'publications': 120, 'university': 'MIT'},
    'Dr. Jones': {'field': 'ML', 'experience': 8, 'publications': 45, 'university': 'Stanford'},
    'Dr. Brown': {'field': 'NLP', 'experience': 12, 'publications': 78, 'university': 'MIT'},
    'Dr. Wilson': {'field': 'CV', 'experience': 6, 'publications': 34, 'university': 'Berkeley'},
    'Dr. Davis': {'field': 'AI', 'experience': 20, 'publications': 200, 'university': 'CMU'},
    'Dr. Taylor': {'field': 'Robotics', 'experience': 10, 'publications': 56, 'university': 'Stanford'}
}

researcher_ids = {}
for name, attrs in researchers.items():
    researcher_ids[name] = network.add_node(name=name, **attrs)

# Add collaborations
collaborations = [
    ('Dr. Smith', 'Dr. Jones', {'projects': 3, 'strength': 0.8, 'since': 2018}),
    ('Dr. Smith', 'Dr. Brown', {'projects': 5, 'strength': 0.9, 'since': 2016}),
    ('Dr. Jones', 'Dr. Wilson', {'projects': 2, 'strength': 0.6, 'since': 2020}),
    ('Dr. Brown', 'Dr. Davis', {'projects': 4, 'strength': 0.7, 'since': 2017}),
    ('Dr. Davis', 'Dr. Taylor', {'projects': 1, 'strength': 0.5, 'since': 2021}),
    ('Dr. Wilson', 'Dr. Taylor', {'projects': 2, 'strength': 0.6, 'since': 2019})
]

for person1, person2, attrs in collaborations:
    network.add_edge(researcher_ids[person1], researcher_ids[person2], **attrs)

print(f"Research network created: {network.node_count()} researchers, {network.edge_count()} collaborations")

# Display network structure
print("\nResearch Network:")
for researcher_name, researcher_id in researcher_ids.items():
    researcher = network.get_node(researcher_id)
    neighbors = network.get_neighbors(researcher_id)
    neighbor_names = [network.get_node(nid)['name'] for nid in neighbors]
    print(f"  {researcher['name']} ({researcher['field']}, {researcher['experience']} yrs): collaborates with {neighbor_names}")

Research network created: 6 researchers, 6 collaborations

Research Network:
  Dr. Smith (AI, 15 yrs): collaborates with ['Dr. Brown', 'Dr. Jones']
  Dr. Jones (ML, 8 yrs): collaborates with ['Dr. Wilson', 'Dr. Smith']
  Dr. Brown (NLP, 12 yrs): collaborates with ['Dr. Davis', 'Dr. Smith']
  Dr. Wilson (CV, 6 yrs): collaborates with ['Dr. Taylor', 'Dr. Jones']
  Dr. Davis (AI, 20 yrs): collaborates with ['Dr. Taylor', 'Dr. Brown']
  Dr. Taylor (Robotics, 10 yrs): collaborates with ['Dr. Wilson', 'Dr. Davis']


### 8.1 Creating Subgraphs with Filters

GLI allows you to create subgraphs based on node and edge criteria.

In [11]:
# Create subgraph of senior researchers (>10 years experience)
senior_researchers = network.create_subgraph(
    node_filter=lambda node: node['experience'] > 10
)

print("=== Senior Researchers Subgraph ===")
print(f"Nodes: {senior_researchers.node_count()}, Edges: {senior_researchers.edge_count()}")

for node_id in senior_researchers.nodes:
    node = senior_researchers.get_node(node_id)
    print(f"  {node['name']}: {node['experience']} years, {node['publications']} publications")

# Create subgraph of MIT researchers
mit_network = network.create_subgraph(
    node_filter=lambda node: node['university'] == 'MIT'
)

print(f"\n=== MIT Researchers Subgraph ===")
print(f"Nodes: {mit_network.node_count()}, Edges: {mit_network.edge_count()}")

for node_id in mit_network.nodes:
    node = mit_network.get_node(node_id)
    neighbors = mit_network.get_neighbors(node_id)
    neighbor_names = [mit_network.get_node(nid)['name'] for nid in neighbors]
    print(f"  {node['name']}: collaborates with {neighbor_names}")

# Create subgraph of strong collaborations (>= 3 projects)
strong_collabs = network.create_subgraph(
    edge_filter=lambda edge: edge['projects'] >= 3
)

print(f"\n=== Strong Collaborations Subgraph ===")
print(f"Nodes: {strong_collabs.node_count()}, Edges: {strong_collabs.edge_count()}")

for edge_id in strong_collabs.edges:
    edge = strong_collabs.get_edge(edge_id)
    source_name = strong_collabs.get_node(edge.source)['name']
    target_name = strong_collabs.get_node(edge.target)['name']
    print(f"  {source_name} ↔ {target_name}: {edge['projects']} projects, strength {edge['strength']}")

=== Senior Researchers Subgraph ===
Nodes: 3, Edges: 2
  Dr. Smith: 15 years, 120 publications
  Dr. Davis: 20 years, 200 publications
  Dr. Brown: 12 years, 78 publications

=== MIT Researchers Subgraph ===
Nodes: 2, Edges: 1
  Dr. Smith: collaborates with ['Dr. Brown']
  Dr. Brown: collaborates with ['Dr. Smith']

=== Strong Collaborations Subgraph ===
Nodes: 6, Edges: 3
  Dr. Smith ↔ Dr. Jones: 3 projects, strength 0.8
  Dr. Smith ↔ Dr. Brown: 5 projects, strength 0.9
  Dr. Brown ↔ Dr. Davis: 4 projects, strength 0.7
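
The recorded output suggests two different behaviors. A node filter keeps only the edges whose two endpoints both pass, while an edge filter keeps every node (6 nodes, 3 edges). Below is a minimal plain-Python sketch of those semantics on the same data. `node_filtered` and `edge_filtered` are hypothetical helpers and say nothing about how `create_subgraph` is implemented.

```python
# Plain-Python stand-in for the research network (not the GLI API)
experience = {
    'Dr. Smith': 15, 'Dr. Jones': 8, 'Dr. Brown': 12,
    'Dr. Wilson': 6, 'Dr. Davis': 20, 'Dr. Taylor': 10,
}
# (person1, person2, number of joint projects)
collaborations = [
    ('Dr. Smith', 'Dr. Jones', 3), ('Dr. Smith', 'Dr. Brown', 5),
    ('Dr. Jones', 'Dr. Wilson', 2), ('Dr. Brown', 'Dr. Davis', 4),
    ('Dr. Davis', 'Dr. Taylor', 1), ('Dr. Wilson', 'Dr. Taylor', 2),
]

def node_filtered(nodes, edges, keep_node):
    """Keep matching nodes and only the edges whose two endpoints both survive."""
    kept = {name for name, value in nodes.items() if keep_node(value)}
    kept_edges = [e for e in edges if e[0] in kept and e[1] in kept]
    return kept, kept_edges

def edge_filtered(nodes, edges, keep_edge):
    """Keep every node and only the edges that match."""
    return set(nodes), [e for e in edges if keep_edge(e)]

senior_nodes, senior_edges = node_filtered(experience, collaborations, lambda years: years > 10)
strong_nodes, strong_edges = edge_filtered(experience, collaborations, lambda e: e[2] >= 3)

print(len(senior_nodes), len(senior_edges))  # 3 2
print(len(strong_nodes), len(strong_edges))  # 6 3
```

Both counts match the subgraph sizes reported in the output above.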


### 8.2 Connected Components Analysis

Find connected components in a graph to identify separate clusters or communities.

In [12]:
# Create a graph with multiple disconnected components
social_network = Graph.empty()

# Group 1: Tech friends
tech_people = ['Alice', 'Bob', 'Carol']
tech_ids = {}
for person in tech_people:
    tech_ids[person] = social_network.add_node(name=person, group='tech', interests=['coding', 'AI'])

social_network.add_edge(tech_ids['Alice'], tech_ids['Bob'], relationship='friend', years=5)
social_network.add_edge(tech_ids['Bob'], tech_ids['Carol'], relationship='colleague', years=2)

# Group 2: Sports friends (disconnected from tech group)
sports_people = ['Dave', 'Eve']
sports_ids = {}
for person in sports_people:
    sports_ids[person] = social_network.add_node(name=person, group='sports', interests=['football', 'tennis'])

social_network.add_edge(sports_ids['Dave'], sports_ids['Eve'], relationship='teammate', years=3)

# Group 3: Isolated person
frank_id = social_network.add_node(name='Frank', group='music', interests=['guitar', 'jazz'])

print(f"Social network: {social_network.node_count()} people, {social_network.edge_count()} connections")

# Find connected component starting from Alice
alice_component = social_network.get_connected_component(tech_ids['Alice'])
print(f"\nAlice's connected component:")
print(f"  Size: {alice_component.node_count()} nodes, {alice_component.edge_count()} edges")
for node_id in alice_component.nodes:
    node = alice_component.get_node(node_id)
    print(f"  - {node['name']} ({node['group']})")

# Find connected component starting from Dave
dave_component = social_network.get_connected_component(sports_ids['Dave'])
print(f"\nDave's connected component:")
print(f"  Size: {dave_component.node_count()} nodes, {dave_component.edge_count()} edges")
for node_id in dave_component.nodes:
    node = dave_component.get_node(node_id)
    print(f"  - {node['name']} ({node['group']})")

# Frank should be isolated
frank_component = social_network.get_connected_component(frank_id)
print(f"\nFrank's connected component:")
print(f"  Size: {frank_component.node_count()} nodes, {frank_component.edge_count()} edges")
for node_id in frank_component.nodes:
    node = frank_component.get_node(node_id)
    print(f"  - {node['name']} ({node['group']})")

Social network: 6 people, 3 connections

Alice's connected component:
  Size: 3 nodes, 2 edges
  - Bob (tech)
  - Carol (tech)
  - Alice (tech)

Dave's connected component:
  Size: 2 nodes, 1 edges
  - Eve (sports)
  - Dave (sports)

Frank's connected component:
  Size: 1 nodes, 0 edges
  - Frank (music)
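
Finding the component that contains a given node is a graph traversal. Below is a minimal breadth-first sketch in plain Python on the same six people. `connected_component` and `all_components` are hypothetical helpers, and the adjacency dictionary is a stand-in for the GLI graph.

```python
from collections import deque

# Same people and connections as the cell above, as an undirected adjacency dict
adjacency = {
    'Alice': ['Bob'], 'Bob': ['Alice', 'Carol'], 'Carol': ['Bob'],
    'Dave': ['Eve'], 'Eve': ['Dave'],
    'Frank': [],
}

def connected_component(adjacency, start):
    """Return the set of nodes reachable from start via breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

def all_components(adjacency):
    """Partition every node into components by repeated traversal."""
    remaining = set(adjacency)
    components = []
    while remaining:
        component = connected_component(adjacency, next(iter(remaining)))
        components.append(component)
        remaining -= component
    return components

print(sorted(connected_component(adjacency, 'Alice')))          # ['Alice', 'Bob', 'Carol']
print(sorted(len(c) for c in all_components(adjacency)))        # [1, 2, 3]
```

Each node is visited once, so the traversal runs in time linear in the number of nodes plus edges.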


### 8.3 Batch Operations for Performance

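Batch operations group many node and edge additions inside a single `batch_operations()` context. In the recorded run below, the batch path took 0.012 s against 0.009 s for individual calls at 1,000 nodes, so a speedup is not guaranteed at this size; measure on your own workload.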

In [None]:
import time
import random
import importlib
import gli.graph

# Force module reload to pick up latest changes
importlib.reload(gli.graph)

# Test batch operations
print("=== Performance Comparison: Individual vs Batch Operations ===")

def create_test_graph(backend_name, size=1000):
    """Create a test graph with specified backend"""
    gli.set_backend(backend_name)
    
    start_time = time.time()
    g = gli.Graph.empty()
    
    # Add nodes individually
    node_ids = []
    for i in range(size):
        node_id = g.add_node(
            id=i,
            category=random.choice(['A', 'B', 'C', 'D']),
            score=random.uniform(0, 100),
            active=random.choice([True, False])
        )
        node_ids.append(node_id)
    
    # Add edges individually
    for i in range(size // 2):
        source = random.choice(node_ids)
        target = random.choice(node_ids)
        if source != target:
            g.add_edge(source, target, weight=random.uniform(0, 1))
    
    creation_time = time.time() - start_time
    return g, creation_time

# Test individual operations
individual_graph, individual_time = create_test_graph('rust', 1000)
print(f"Individual operations: {individual_time:.3f} seconds")
print(f"  Created {individual_graph.node_count()} nodes and {individual_graph.edge_count()} edges")

# Test batch operations
start_time = time.time()
batch_graph = gli.Graph.empty()

# Use batch context for efficient operations
with batch_graph.batch_operations() as batch:
    batch_node_ids = []
    for i in range(1000):
        node_id = batch.add_node(
            id=i,
            category=random.choice(['A', 'B', 'C', 'D']),
            score=random.uniform(0, 100),
            active=random.choice([True, False])
        )
        batch_node_ids.append(node_id)
    
    for i in range(500):
        source = random.choice(batch_node_ids)
        target = random.choice(batch_node_ids)
        if source != target:
            batch.add_edge(source, target, weight=random.random())

batch_time = time.time() - start_time

print(f"Batch operations: {batch_time:.3f} seconds")
print(f"  Created {batch_graph.node_count()} nodes and {batch_graph.edge_count()} edges")

if individual_time > 0 and batch_time > 0:
    # A ratio below 1.0 means the batch path was slower than individual calls in this run
    print(f"Speedup: {individual_time/batch_time:.1f}x faster")
else:
    print("Both operations completed very quickly!")

=== Performance Comparison: Individual vs Batch Operations ===
Individual operations: 0.009 seconds
  Created 1000 nodes and 500 edges
Batch operations: 0.012 seconds
  Created 1000 nodes and 500 edges
Speedup: 0.7x faster
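
One common way to build a batch context is to buffer additions and apply them in a single step when the `with` block exits. Below is a minimal sketch of that idea on a toy class. `TinyGraph`, `_Buffer`, and `batch` are hypothetical names, and nothing here describes how GLI's `batch_operations()` actually works.

```python
from contextlib import contextmanager

class TinyGraph:
    """Toy graph used only to illustrate buffering; unrelated to GLI's internals."""

    def __init__(self):
        self.nodes = {}
        self.edges = []
        self.commits = 0  # counts how many times changes were applied

    def _apply(self, new_nodes, new_edges):
        self.nodes.update(new_nodes)
        self.edges.extend(new_edges)
        self.commits += 1

    def add_node(self, node_id, **attrs):
        # Individual path: one commit per call
        self._apply({node_id: attrs}, [])
        return node_id

    @contextmanager
    def batch(self):
        """Collect additions in a buffer and apply them in one step on exit."""
        buffer = _Buffer()
        yield buffer
        # Reached only if the with-block finished without raising
        self._apply(buffer.nodes, buffer.edges)

class _Buffer:
    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node_id, **attrs):
        self.nodes[node_id] = attrs
        return node_id

    def add_edge(self, source, target, **attrs):
        self.edges.append((source, target, attrs))

g = TinyGraph()
with g.batch() as batch:
    for i in range(1000):
        batch.add_node(i, score=i)
    batch.add_edge(0, 1, weight=0.5)

print(len(g.nodes), len(g.edges), g.commits)  # 1000 1 1
```

Whether buffering pays off depends on how expensive each individual commit is. When single calls are already cheap, the bookkeeping can outweigh the savings, which is one possible reading of the timings above.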


## 9. Graph Analysis and Metrics

Analyze graph properties, find patterns, and compute important metrics.

In [14]:
# Analyze the research network we created earlier
print("=== Research Network Analysis ===")
print(f"Total researchers: {network.node_count()}")
print(f"Total collaborations: {network.edge_count()}")
print(f"Network density: {2 * network.edge_count() / (network.node_count() * (network.node_count() - 1)):.3f}")

# Degree analysis
print(f"\n=== Collaboration Patterns ===")
degree_stats = {}
for researcher_name, researcher_id in researcher_ids.items():
    neighbors = network.get_neighbors(researcher_id)
    degree_stats[researcher_name] = len(neighbors)

# Sort by degree (most collaborative first)
sorted_researchers = sorted(degree_stats.items(), key=lambda x: x[1], reverse=True)
print("Most collaborative researchers:")
for name, degree in sorted_researchers:
    researcher = network.get_node(researcher_ids[name])
    print(f"  {name}: {degree} collaborations ({researcher['field']}, {researcher['publications']} papers)")

# Field analysis
print(f"\n=== Field Distribution ===")
field_counts = {}
field_publications = {}

for researcher_name, researcher_id in researcher_ids.items():
    researcher = network.get_node(researcher_id)
    field = researcher['field']
    
    field_counts[field] = field_counts.get(field, 0) + 1
    field_publications[field] = field_publications.get(field, 0) + researcher['publications']

for field, count in field_counts.items():
    avg_pubs = field_publications[field] / count
    print(f"  {field}: {count} researchers, avg {avg_pubs:.1f} publications")

# University collaboration analysis
print(f"\n=== Cross-University Collaborations ===")
cross_university_collabs = 0
same_university_collabs = 0

for edge_id in network.edges:
    edge = network.get_edge(edge_id)
    source_uni = network.get_node(edge.source)['university']
    target_uni = network.get_node(edge.target)['university']
    
    if source_uni != target_uni:
        cross_university_collabs += 1
    else:
        same_university_collabs += 1

print(f"  Cross-university: {cross_university_collabs}")
print(f"  Same university: {same_university_collabs}")
print(f"  Cross-university ratio: {cross_university_collabs / network.edge_count():.2%}")

# Experience vs collaboration strength analysis
print(f"\n=== Experience vs Collaboration Patterns ===")
high_exp_collabs = []
low_exp_collabs = []

for edge_id in network.edges:
    edge = network.get_edge(edge_id)
    source_exp = network.get_node(edge.source)['experience']
    target_exp = network.get_node(edge.target)['experience']
    avg_exp = (source_exp + target_exp) / 2
    
    if avg_exp >= 12:
        high_exp_collabs.append(edge['strength'])
    else:
        low_exp_collabs.append(edge['strength'])

if high_exp_collabs:
    print(f"  High experience pairs (12+ yrs): avg strength {sum(high_exp_collabs)/len(high_exp_collabs):.2f}")
if low_exp_collabs:
    print(f"  Lower experience pairs: avg strength {sum(low_exp_collabs)/len(low_exp_collabs):.2f}")

=== Research Network Analysis ===
Total researchers: 6
Total collaborations: 6
Network density: 0.400

=== Collaboration Patterns ===
Most collaborative researchers:
  Dr. Smith: 2 collaborations (AI, 120 papers)
  Dr. Jones: 2 collaborations (ML, 45 papers)
  Dr. Brown: 2 collaborations (NLP, 78 papers)
  Dr. Wilson: 2 collaborations (CV, 34 papers)
  Dr. Davis: 2 collaborations (AI, 200 papers)
  Dr. Taylor: 2 collaborations (Robotics, 56 papers)

=== Field Distribution ===
  AI: 2 researchers, avg 160.0 publications
  ML: 1 researchers, avg 45.0 publications
  NLP: 1 researchers, avg 78.0 publications
  CV: 1 researchers, avg 34.0 publications
  Robotics: 1 researchers, avg 56.0 publications

=== Cross-University Collaborations ===
  Cross-university: 5
  Same university: 1
  Cross-university ratio: 83.33%

=== Experience vs Collaboration Patterns ===
  High experience pairs (12+ yrs): avg strength 0.70
  Lower experience pairs: avg strength 0.67
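
The summary figures above can be re-derived by hand as a sanity check. The sketch below uses plain Python rather than GLI and copies the fields and publication counts from the printed output. It assumes the density was computed with the undirected formula 2E / (N(N - 1)), which matches the reported 0.400 for 6 nodes and 6 edges.

```python
from collections import defaultdict

# Values copied from the output above; this is plain Python, not the GLI API.
researchers = {
    "Dr. Smith": ("AI", 120),
    "Dr. Jones": ("ML", 45),
    "Dr. Brown": ("NLP", 78),
    "Dr. Wilson": ("CV", 34),
    "Dr. Davis": ("AI", 200),
    "Dr. Taylor": ("Robotics", 56),
}
num_nodes = len(researchers)
num_edges = 6
cross_university = 5

# Assumed undirected density: actual edges over possible node pairs.
density = 2 * num_edges / (num_nodes * (num_nodes - 1))

# Share of collaborations that cross university boundaries.
cross_ratio = cross_university / num_edges

# Group publication counts by field, then average each group.
pubs_by_field = defaultdict(list)
for field, pubs in researchers.values():
    pubs_by_field[field].append(pubs)
avg_pubs = {field: sum(p) / len(p) for field, p in pubs_by_field.items()}

print(f"density={density:.3f}, cross-university ratio={cross_ratio:.2%}")
print(f"AI average publications: {avg_pubs['AI']:.1f}")
```

The printed values should agree with the density, cross-university ratio, and AI field average reported by the analysis cell.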


## 10. Backend Switching and Performance

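GLI can switch between the Python and Rust backends at runtime with `set_backend`. The cell below builds the same 2,000-node graph on each backend and times graph creation, attribute access, and neighbor traversal.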

In [15]:
# Test backend switching and performance
print("=== Backend Performance Comparison ===")

def create_test_graph(backend_name, size=2000):
    """Create a test graph with specified backend"""
    gli.set_backend(backend_name)
    
    start_time = time.time()
    g = gli.Graph.empty()
    
    # Add nodes with complex attributes
    node_ids = []
    for i in range(size):
        node_id = g.add_node(
            id=i,
            category=random.choice(['A', 'B', 'C', 'D']),
            score=random.uniform(0, 100),
            active=random.choice([True, False]),
            tags=[f"tag_{j}" for j in range(random.randint(1, 5))],
            metadata={'created': time.time(), 'batch': i // 100}
        )
        node_ids.append(node_id)
    
    # Add edges with attributes
    for i in range(size // 2):
        source = random.choice(node_ids)
        target = random.choice(node_ids)
        if source != target:
            g.add_edge(source, target,
                      weight=random.uniform(0, 1),
                      type=random.choice(['friend', 'colleague', 'family']),
                      created_at=time.time())
    
    creation_time = time.time() - start_time
    
    # Test attribute access performance
    start_time = time.time()
    sample_nodes = random.sample(node_ids, min(100, len(node_ids)))
    for node_id in sample_nodes:
        node = g.get_node(node_id)
        attrs = dict(node.attributes)
        # Modify an attribute
        g.set_node_attribute(node_id, 'accessed', True)
    
    access_time = time.time() - start_time
    
    # Test graph traversal
    start_time = time.time()
    for node_id in sample_nodes:
        neighbors = g.get_neighbors(node_id)
        for neighbor_id in neighbors[:5]:  # Limit to first 5 neighbors
            neighbor = g.get_node(neighbor_id)
    
    traversal_time = time.time() - start_time
    
    return g, creation_time, access_time, traversal_time

# Test both backends
results = {}
test_size = 2000

for backend in ['python', 'rust']:
    print(f"\nTesting {backend.upper()} backend...")
    graph, create_t, access_t, traverse_t = create_test_graph(backend, test_size)
    
    results[backend] = {
        'creation_time': create_t,
        'access_time': access_t,
        'traversal_time': traverse_t,
        'nodes': graph.node_count(),
        'edges': graph.edge_count()
    }
    
    print(f"  Graph creation: {create_t:.3f}s ({results[backend]['nodes']} nodes, {results[backend]['edges']} edges)")
    print(f"  Attribute access: {access_t:.3f}s")
    print(f"  Graph traversal: {traverse_t:.3f}s")

# Compare results
print(f"\n=== Performance Summary ===")
python_total = sum([results['python'][k] for k in ['creation_time', 'access_time', 'traversal_time']])
rust_total = sum([results['rust'][k] for k in ['creation_time', 'access_time', 'traversal_time']])

print(f"Python backend total time: {python_total:.3f}s")
print(f"Rust backend total time: {rust_total:.3f}s")
print(f"Rust speedup: {python_total/rust_total:.1f}x faster")

# Each ratio is python_time / rust_time, so a value below 1.0 means Rust was slower
print(f"\nDetailed comparison:")
for operation in ['creation_time', 'access_time', 'traversal_time']:
    speedup = results['python'][operation] / results['rust'][operation]
    print(f"  {operation.replace('_', ' ').title()}: {speedup:.1f}x faster")

=== Backend Performance Comparison ===

Testing PYTHON backend...
  Graph creation: 0.018s (2000 nodes, 999 edges)
  Attribute access: 0.000s
  Graph traversal: 0.035s

Testing RUST backend...
  Graph creation: 0.025s (2000 nodes, 1000 edges)
  Attribute access: 0.000s
  Graph traversal: 0.000s

=== Performance Summary ===
Python backend total time: 0.054s
Rust backend total time: 0.025s
Rust speedup: 2.1x faster

Detailed comparison:
  Creation Time: 0.7x faster
  Access Time: 1.5x faster
  Traversal Time: 177.5x faster
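
Timings this small sit close to the clock's resolution (note the 0.000s entries), so a single run can mislead. A ratio below 1.0, such as 0.7x for creation, means the Rust backend was slower for that step in this run. A more robust harness, sketched below, uses `time.perf_counter`, repeats each measurement, and guards against division by zero. The helper names `median_time`, `speedup`, and `describe` are hypothetical, and the workloads are stand-ins rather than GLI calls.

```python
import statistics
import time


def median_time(func, repeats=5):
    """Run func() `repeats` times and return the median elapsed seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()  # high-resolution clock meant for benchmarking
        func()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_seconds, candidate_seconds):
    """Return baseline / candidate, or None if the candidate time is zero."""
    if candidate_seconds <= 0:
        return None
    return baseline_seconds / candidate_seconds


def describe(ratio):
    """Label a ratio as faster or slower instead of always printing 'faster'."""
    if ratio is None:
        return "too fast to measure"
    if ratio >= 1.0:
        return f"{ratio:.1f}x faster"
    return f"{1 / ratio:.1f}x slower"


# Stand-in workloads; in the notebook these would wrap the per-backend steps.
slow = median_time(lambda: sum(i * i for i in range(200_000)))
fast = median_time(lambda: sum(range(1_000)))
print(describe(speedup(slow, fast)))
```

Taking the median over several runs reduces the influence of one-off pauses, and the explicit "slower" label avoids reporting a slowdown as a speedup.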


## 11. Complex Data Types and Real-World Example

Let's build a small supply chain network whose node and edge attributes hold nested dictionaries and lists, then query those nested values directly.
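
Attribute values in this section are nested dictionaries, so a lookup chains several keys, and a missing key anywhere in the chain raises `KeyError`. The helper below is a hypothetical convenience, not part of GLI, that follows a dotted path and falls back to a default. It is written for plain dicts; whether GLI node and edge objects pass the same `isinstance` check is not confirmed here.

```python
def get_nested(mapping, path, default=None):
    """Follow a dotted path through nested dicts, returning default on a miss."""
    current = mapping
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# A plain-dict stand-in shaped like one of the company records in this section.
company = {
    "location": {"country": "Germany", "city": "Hamburg"},
    "financials": {"revenue": 8_000_000},
}
print(get_nested(company, "location.country"))          # Germany
print(get_nested(company, "contact.email", "unknown"))  # unknown
```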

In [16]:
# Build a comprehensive supply chain network
print("=== Building Supply Chain Network ===")

# Use Rust backend for performance
gli.set_backend('rust')
supply_chain = gli.Graph.empty()

# Complex company data with nested structures
companies = {
    'TechCorp': {
        'type': 'manufacturer',
        'location': {'country': 'USA', 'city': 'San Francisco', 'coordinates': [37.7749, -122.4194]},
        'financials': {'revenue': 50000000, 'employees': 500, 'founded': 2010},
        'products': ['smartphones', 'tablets', 'laptops'],
        'certifications': ['ISO9001', 'ISO14001'],
        'sustainability_score': 8.5,
        'contact': {'email': 'supply@techcorp.com', 'phone': '+1-555-0123'}
    },
    'GlobalSupply': {
        'type': 'supplier',
        'location': {'country': 'China', 'city': 'Shenzhen', 'coordinates': [22.5431, 114.0579]},
        'financials': {'revenue': 20000000, 'employees': 200, 'founded': 2005},
        'products': ['semiconductors', 'circuits', 'components'],
        'certifications': ['RoHS', 'CE'],
        'sustainability_score': 6.2,
        'contact': {'email': 'orders@globalsupply.com', 'phone': '+86-755-1234'}
    },
    'EcoLogistics': {
        'type': 'logistics',
        'location': {'country': 'Germany', 'city': 'Hamburg', 'coordinates': [53.5511, 9.9937]},
        'financials': {'revenue': 8000000, 'employees': 100, 'founded': 2015},
        'products': ['shipping', 'warehousing', 'distribution'],
        'certifications': ['ISO14001', 'Green_Logistics'],
        'sustainability_score': 9.1,
        'contact': {'email': 'logistics@ecolog.de', 'phone': '+49-40-567890'}
    },
    'RetailGiant': {
        'type': 'retailer',
        'location': {'country': 'USA', 'city': 'New York', 'coordinates': [40.7128, -74.0060]},
        'financials': {'revenue': 100000000, 'employees': 1000, 'founded': 1990},
        'products': ['retail', 'e-commerce', 'distribution'],
        'certifications': ['Fair_Trade', 'B_Corp'],
        'sustainability_score': 7.8,
        'contact': {'email': 'partners@retailgiant.com', 'phone': '+1-212-555-0199'}
    }
}

# Add companies as nodes
company_ids = {}
for name, data in companies.items():
    company_ids[name] = supply_chain.add_node(name=name, **data)

# Complex relationship data
relationships = [
    {
        'from': 'GlobalSupply',
        'to': 'TechCorp',
        'relationship': 'supplies',
        'contract': {
            'start_date': '2022-01-01',
            'end_date': '2024-12-31',
            'value': 5000000,
            'terms': 'NET30'
        },
        'delivery_schedule': {
            'frequency': 'weekly',
            'volume': 1000,
            'quality_metrics': {'defect_rate': 0.02, 'on_time_delivery': 0.95}
        },
        'risk_assessment': {
            'financial_risk': 'low',
            'operational_risk': 'medium',
            'geopolitical_risk': 'medium'
        }
    },
    {
        'from': 'TechCorp',
        'to': 'EcoLogistics',
        'relationship': 'ships_via',
        'contract': {
            'start_date': '2023-03-01',
            'end_date': '2025-02-28',
            'value': 2000000,
            'terms': 'NET15'
        },
        'delivery_schedule': {
            'frequency': 'daily',
            'volume': 500,
            'quality_metrics': {'damage_rate': 0.001, 'on_time_delivery': 0.98}
        },
        'risk_assessment': {
            'financial_risk': 'low',
            'operational_risk': 'low',
            'geopolitical_risk': 'low'
        }
    },
    {
        'from': 'EcoLogistics',
        'to': 'RetailGiant',
        'relationship': 'delivers_to',
        'contract': {
            'start_date': '2023-01-01',
            'end_date': '2026-12-31',
            'value': 3000000,
            'terms': 'NET20'
        },
        'delivery_schedule': {
            'frequency': 'daily',
            'volume': 800,
            'quality_metrics': {'damage_rate': 0.0005, 'on_time_delivery': 0.99}
        },
        'risk_assessment': {
            'financial_risk': 'low',
            'operational_risk': 'low',
            'geopolitical_risk': 'low'
        }
    }
]

# Add relationships as edges
for rel in relationships:
    supply_chain.add_edge(
        company_ids[rel['from']], 
        company_ids[rel['to']], 
        **{k: v for k, v in rel.items() if k not in ['from', 'to']}
    )

print(f"Supply chain network: {supply_chain.node_count()} companies, {supply_chain.edge_count()} relationships")

# Demonstrate complex queries and analysis
print(f"\n=== Supply Chain Analysis ===")

# Find high-sustainability companies
high_sustainability = []
for name, company_id in company_ids.items():
    company = supply_chain.get_node(company_id)
    if company['sustainability_score'] >= 8.0:
        high_sustainability.append((name, company['sustainability_score']))

print(f"High sustainability companies (score >= 8.0):")
for name, score in sorted(high_sustainability, key=lambda x: x[1], reverse=True):
    print(f"  {name}: {score}")

# Analyze contract values and risk
print(f"\nContract Analysis:")
total_contract_value = 0

for edge_id in supply_chain.edges:
    edge = supply_chain.get_edge(edge_id)
    source_name = supply_chain.get_node(edge.source)['name']
    target_name = supply_chain.get_node(edge.target)['name']
    
    contract_value = edge['contract']['value']
    total_contract_value += contract_value
    
    # Analyze risks
    financial_risk = edge['risk_assessment']['financial_risk']
    operational_risk = edge['risk_assessment']['operational_risk']
    
    print(f"  {source_name} → {target_name}: ${contract_value:,} ({edge['relationship']})")
    print(f"    Financial risk: {financial_risk}, Operational risk: {operational_risk}")
    print(f"    On-time delivery: {edge['delivery_schedule']['quality_metrics']['on_time_delivery']:.1%}")

print(f"\nTotal contract value: ${total_contract_value:,}")

# Geographic analysis
print(f"\nGeographic Distribution:")
countries = {}
for name, company_id in company_ids.items():
    company = supply_chain.get_node(company_id)
    country = company['location']['country']
    countries[country] = countries.get(country, 0) + 1

for country, count in countries.items():
    print(f"  {country}: {count} companies")

=== Building Supply Chain Network ===
Supply chain network: 4 companies, 3 relationships

=== Supply Chain Analysis ===
High sustainability companies (score >= 8.0):
  EcoLogistics: 9.1
  TechCorp: 8.5

Contract Analysis:
  GlobalSupply → TechCorp: $5,000,000 (supplies)
    Financial risk: low, Operational risk: medium
    On-time delivery: 95.0%
  TechCorp → EcoLogistics: $2,000,000 (ships_via)
    Financial risk: low, Operational risk: low
    On-time delivery: 98.0%
  EcoLogistics → RetailGiant: $3,000,000 (delivers_to)
    Financial risk: low, Operational risk: low
    On-time delivery: 99.0%

Total contract value: $10,000,000

Geographic Distribution:
  USA: 2 companies
  China: 1 companies
  Germany: 1 companies
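
The contract loop prints two of the three risk fields per edge; a tally across all fields gives a quicker overview. The sketch below works on plain dictionaries that copy only the needed fields from the relationship records in this section, so it does not depend on GLI. The name `contract_records` is hypothetical, and the value-weighted on-time rate is one possible summary metric rather than something the library provides.

```python
from collections import Counter

# Trimmed copies of the three relationship records defined in this section.
contract_records = [
    {"value": 5_000_000, "on_time": 0.95,
     "risk": {"financial_risk": "low", "operational_risk": "medium",
              "geopolitical_risk": "medium"}},
    {"value": 2_000_000, "on_time": 0.98,
     "risk": {"financial_risk": "low", "operational_risk": "low",
              "geopolitical_risk": "low"}},
    {"value": 3_000_000, "on_time": 0.99,
     "risk": {"financial_risk": "low", "operational_risk": "low",
              "geopolitical_risk": "low"}},
]

# Count how often each risk level appears across all risk fields.
risk_counts = Counter(
    level for record in contract_records for level in record["risk"].values()
)

# Weight each contract's on-time rate by its contract value.
total_value = sum(record["value"] for record in contract_records)
weighted_on_time = (
    sum(record["value"] * record["on_time"] for record in contract_records)
    / total_value
)

print(dict(risk_counts))
print(f"Value-weighted on-time delivery: {weighted_on_time:.1%}")
```

Weighting by contract value lets the largest contract (GlobalSupply to TechCorp) dominate the figure, which a simple average of the three rates would not do.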


## 12. Summary and Next Steps

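Congratulations! You've worked through the core features of the GLI (Graph Language Interface) library, from basic graph construction to backend switching and nested attributes. The final cell exercises them together as a quick check.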

In [17]:
# Final demonstration: Show all features working together
print("=== GLI Feature Summary ===")

# Check current backend
current_backend = gli.get_current_backend()
available_backends = gli.get_available_backends()

print(f"Current backend: {current_backend}")
print(f"Available backends: {', '.join(available_backends)}")

# Quick feature test
test_graph = gli.Graph.empty()

# Add nodes with various data types
node1 = test_graph.add_node(
    name="Feature Test", 
    number=42, 
    float_val=3.14159, 
    boolean=True,
    list_data=[1, 2, 3, "mixed", True],
    nested_dict={
        "level1": {
            "level2": {"value": "deep nesting works!"}
        }
    }
)

node2 = test_graph.add_node(name="Second Node", category="test")
edge_id = test_graph.add_edge(node1, node2, weight=0.75, metadata={"created": "tutorial"})

# Test attribute modification
test_graph.set_node_attribute(node1, "modified", True)
test_graph.set_edge_attribute(edge_id, "strength", "strong")

# Verify everything works
node1_data = test_graph.get_node(node1)
edge_data = test_graph.get_edge(edge_id)

print(f"\nFeature Verification:")
print(f"✅ Node creation with complex attributes")
print(f"✅ Edge creation with attributes") 
print(f"✅ Attribute modification")
print(f"✅ Data type preservation: {type(node1_data['float_val'])}, {type(node1_data['boolean'])}")
print(f"✅ Nested structures: {node1_data['nested_dict']['level1']['level2']['value']}")
print(f"✅ Graph traversal: {len(test_graph.get_neighbors(node1))} neighbors found")

print(f"\n🎉 GLI Tutorial Complete!")
print(f"\nWhat you learned:")
print(f"  • Basic graph creation and manipulation")
print(f"  • Rich attribute support with complex data types")
print(f"  • Backend switching (Python ↔ Rust)")
print(f"  • Performance optimization with batch operations")
print(f"  • Advanced analysis: subgraphs, connected components")
print(f"  • Real-world examples: social networks, projects, supply chains")
print(f"  • Graph metrics and pattern analysis")

print(f"\nNext steps:")
print(f"  • Explore the full API documentation")
print(f"  • Try GLI on your own graph data")
print(f"  • Experiment with larger datasets")
print(f"  • Contribute to the project on GitHub!")

# Show final graph statistics
print(f"\nGraphs created in this tutorial:")
graphs_info = [
    ("Social Network", 3, 3),
    ("Project Network", g.node_count(), g.edge_count()),
    ("Knowledge Graph", kg.node_count(), kg.edge_count()),
    ("Research Network", network.node_count(), network.edge_count()),
    ("Supply Chain", supply_chain.node_count(), supply_chain.edge_count()),
]

for name, nodes, edges in graphs_info:
    print(f"  {name}: {nodes} nodes, {edges} edges")

=== GLI Feature Summary ===
Current backend: rust
Available backends: python, rust

Feature Verification:
✅ Node creation with complex attributes
✅ Edge creation with attributes
✅ Attribute modification
✅ Data type preservation: <class 'float'>, <class 'bool'>
✅ Nested structures: deep nesting works!
✅ Graph traversal: 1 neighbors found

🎉 GLI Tutorial Complete!

What you learned:
  • Basic graph creation and manipulation
  • Rich attribute support with complex data types
  • Backend switching (Python ↔ Rust)
  • Performance optimization with batch operations
  • Advanced analysis: subgraphs, connected components
  • Real-world examples: social networks, projects, supply chains
  • Graph metrics and pattern analysis

Next steps:
  • Explore the full API documentation
  • Try GLI on your own graph data
  • Experiment with larger datasets
  • Contribute to the project on GitHub!

Graphs created in this tutorial:
  Social Network: 3 nodes, 3 edges
  Project Network: 1 nodes, 0 edges
  Kno