[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Hawksight-AI/semantica/blob/main/cookbook/advanced/02_Advanced_Graph_Analytics.ipynb)

# Advanced Graph Analytics

## Overview

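This notebook demonstrates advanced graph analytics with Semantica: graph metrics (GraphAnalyzer), centrality (CentralityCalculator), community detection (CommunityDetector), connectivity (ConnectivityAnalyzer), validation (GraphValidator), deduplication (DuplicateDetector and EntityMerger from `semantica.deduplication`), and persistent storage with **GraphStore**.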


**Documentation**: [API Reference](https://semantica.readthedocs.io/reference/kg/)

### Learning Objectives

- Use GraphAnalyzer for comprehensive graph analysis
- Use CentralityCalculator for advanced centrality measures
- Use CommunityDetector for community detection
- Use ConnectivityAnalyzer for connectivity analysis
- Use GraphValidator and the `semantica.deduplication` module (DuplicateDetector, EntityMerger) for graph quality
- **Use GraphStore to persist graphs to Neo4j or FalkorDB**

## Installation

Install Semantica from PyPI:

```bash
pip install semantica
# Or with all optional dependencies:
pip install semantica[all]
```

---

## Workflow: Graph Analysis → Centrality → Communities → Connectivity → Validation → Deduplication → **Persist to Graph Store**


In [None]:
%pip install -U "semantica[all]"
import semantica
print(semantica.__version__)


## Step 1: Graph Analysis

Build a small knowledge graph and compute its basic metrics.


In [None]:
from semantica.kg import GraphBuilder, GraphAnalyzer, CentralityCalculator, CommunityDetector, ConnectivityAnalyzer, GraphValidator
from semantica.deduplication import DuplicateDetector, EntityMerger, MergeStrategy

builder = GraphBuilder()
analyzer = GraphAnalyzer()

entities = [
    {"id": "e1", "type": "Organization", "name": "Apple Inc.", "properties": {}},
    {"id": "e2", "type": "Person", "name": "Tim Cook", "properties": {}},
    {"id": "e3", "type": "Location", "name": "Cupertino", "properties": {}}
]

relationships = [
    {"source": "e2", "target": "e1", "type": "CEO_of", "properties": {}},
    {"source": "e1", "target": "e3", "type": "located_in", "properties": {}}
]

kg = builder.build(entities, relationships)

metrics = analyzer.compute_metrics(kg)

print("Graph metrics:")
print(f"  Entities: {metrics.get('entity_count', 0)}")
print(f"  Relationships: {metrics.get('relationship_count', 0)}")
print(f"  Density: {metrics.get('density', 0):.3f}")
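
The `density` value can be sanity-checked by hand. The sketch below assumes the standard directed-graph definition, edges divided by `n * (n - 1)`. The notebook does not state which formula `GraphAnalyzer` uses, so the library's number may differ (an undirected definition would give twice this value). The helper name `directed_density` is illustrative only and is not part of Semantica.

```python
def directed_density(num_nodes: int, num_edges: int) -> float:
    """Fraction of possible directed edges (no self-loops) that are present."""
    if num_nodes < 2:
        return 0.0  # density is undefined below two nodes; report 0 by convention
    return num_edges / (num_nodes * (num_nodes - 1))


# The example graph above has 3 entities and 2 relationships.
density = directed_density(3, 2)
print(f"Density: {density:.3f}")
```

If the value printed by `GraphAnalyzer` differs from this one, the most likely explanation is a different convention (directed versus undirected) rather than an error in the graph.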


## Step 2: Advanced Centrality Measures

Calculate degree and betweenness centrality for every entity in the graph.
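
To make the two measures concrete, here is a minimal pure-Python sketch of degree centrality and of betweenness centrality (Brandes' algorithm) on an undirected, unweighted view of the example graph. It is an illustration under stated assumptions, not Semantica's implementation: `CentralityCalculator` may treat edges as directed or normalize differently, and the function names below are hypothetical.

```python
from collections import deque


def build_adjacency(nodes, edges):
    """Undirected adjacency sets built from (source, target) pairs."""
    adj = {n: set() for n in nodes}
    for s, t in edges:
        adj[s].add(t)
        adj[t].add(s)
    return adj


def degree_centrality(adj):
    """Degree divided by the maximum possible degree, n - 1."""
    n = len(adj)
    if n < 2:
        return {v: 0.0 for v in adj}
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def betweenness_centrality(adj):
    """Brandes' algorithm for unweighted, undirected graphs (normalized)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += (sigma[v] / sigma[w]) * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    n = len(adj)
    # Every unordered pair was visited from both ends, so divide by the
    # number of ordered pairs of other nodes, (n - 1) * (n - 2).
    scale = 1 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {v: c * scale for v, c in score.items()}


adj = build_adjacency(["e1", "e2", "e3"], [("e2", "e1"), ("e1", "e3")])
degree = degree_centrality(adj)
betweenness = betweenness_centrality(adj)
print(degree)
print(betweenness)
```

In this three-node chain, `e1` (Apple Inc.) touches both other nodes and sits on the only shortest path between them, so it gets the highest score on both measures.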


In [None]:
centrality_calculator = CentralityCalculator()

degree_centrality_result = centrality_calculator.calculate_degree_centrality(kg)
degree_centrality = degree_centrality_result.get('centrality', {})
betweenness_centrality_result = centrality_calculator.calculate_betweenness_centrality(kg)
betweenness_centrality = betweenness_centrality_result.get('centrality', {})

print(f"Degree centrality: {len(degree_centrality)} entities")
print(f"Betweenness centrality: {len(betweenness_centrality)} entities")


## Step 3: Community Detection

Detect communities in the graph.
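
The notebook does not say which algorithm `CommunityDetector` uses. Many common detectors (Louvain and Leiden, for example) search for a partition that maximizes modularity, so the sketch below computes Newman's modularity for a given partition of an undirected, unweighted graph. It is meant only to show what a "good" community split looks like numerically; the function name and the toy graph are illustrative assumptions.

```python
def modularity(edges, communities):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    if m == 0:
        return 0.0
    community_of = {
        node: i for i, members in enumerate(communities) for node in members
    }
    internal = [0] * len(communities)    # edges with both ends in community i
    degree_sum = [0] * len(communities)  # total degree of nodes in community i
    for s, t in edges:
        cs, ct = community_of[s], community_of[t]
        degree_sum[cs] += 1
        degree_sum[ct] += 1
        if cs == ct:
            internal[cs] += 1
    return sum(
        internal[i] / m - (degree_sum[i] / (2 * m)) ** 2
        for i in range(len(communities))
    )


# Two triangles joined by a single bridge edge (c-d).
toy_edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
two_groups = modularity(toy_edges, [{"a", "b", "c"}, {"d", "e", "f"}])
one_group = modularity(toy_edges, [{"a", "b", "c", "d", "e", "f"}])
print(f"Two communities: Q = {two_groups:.3f}")
print(f"One community:   Q = {one_group:.3f}")
```

Splitting the graph at the bridge scores higher than lumping everything together, which is the signal a modularity-based detector follows. On a graph as small as the three-entity example in this notebook, any detector has very little structure to work with.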


In [None]:
community_detector = CommunityDetector()

communities = community_detector.detect_communities(kg)

print(f"Detected {len(communities)} communities")
for i, community in enumerate(communities[:3], 1):
    print(f"  Community {i}: {len(community)} entities")


## Step 4: Connectivity Analysis

Check whether the graph is connected and list its components.
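
As a reference point for the `is_connected` and `components` fields, here is a minimal breadth-first sketch that finds weakly connected components (edge direction is ignored). Whether `ConnectivityAnalyzer` uses weak or strong connectivity is not stated in this notebook, so treat the choice below as an assumption; the helper name is illustrative.

```python
from collections import deque


def connected_components(nodes, edges):
    """Weakly connected components: edge direction is ignored."""
    adj = {n: set() for n in nodes}
    for s, t in edges:
        adj[s].add(t)
        adj[t].add(s)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        component, queue = [], deque([start])
        while queue:
            v = queue.popleft()
            component.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        components.append(sorted(component))
    return components


toy_nodes = ["e1", "e2", "e3", "e4"]
toy_edges = [("e2", "e1"), ("e1", "e3")]  # e4 has no relationships
components = connected_components(toy_nodes, toy_edges)
print(f"Is connected: {len(components) == 1}")
print(f"Components: {components}")
```

Adding the isolated node `e4` splits the toy graph into two components, so it is reported as not connected.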


In [None]:
connectivity_analyzer = ConnectivityAnalyzer()

connectivity = connectivity_analyzer.analyze_connectivity(kg)

print("Connectivity analysis:")
print(f"  Is connected: {connectivity.get('is_connected', False)}")
print(f"  Components: {len(connectivity.get('components', []))}")


## Step 5: Graph Validation and Deduplication

Validate the graph, then deduplicate its entities. The deduplication calls in the cell below are left commented out as a template.
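
The notebook does not list the checks `GraphValidator` performs. As an illustration of the kind of structural problems a validator typically reports, the sketch below flags duplicate entity IDs and relationships whose endpoints do not exist. The function name `find_graph_issues` and the sample data are hypothetical.

```python
def find_graph_issues(entities, relationships):
    """Return a list of human-readable structural problems."""
    issues = []
    seen_ids = set()
    for entity in entities:
        entity_id = entity.get("id")
        if entity_id is None:
            issues.append(f"entity without id: {entity!r}")
        elif entity_id in seen_ids:
            issues.append(f"duplicate entity id: {entity_id}")
        else:
            seen_ids.add(entity_id)
    for rel in relationships:
        for endpoint in ("source", "target"):
            if rel.get(endpoint) not in seen_ids:
                issues.append(
                    f"{rel.get('type')}: unknown {endpoint} {rel.get(endpoint)!r}"
                )
    return issues


sample_entities = [
    {"id": "e1", "name": "Apple Inc."},
    {"id": "e1", "name": "Apple"},  # same id used twice
]
sample_relationships = [
    {"source": "e2", "target": "e1", "type": "CEO_of"},  # e2 is undefined
]
issues = find_graph_issues(sample_entities, sample_relationships)
for issue in issues:
    print(issue)
```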


In [None]:
graph_validator = GraphValidator()

validation_result = graph_validator.validate(kg)

print(f"Graph validation: {validation_result.get('valid', False)}")
print(f"Issues found: {len(validation_result.get('issues', []))}")

# Deduplication template using the semantica.deduplication module (uncomment to run):
# from semantica.deduplication import DuplicateDetector, EntityMerger, MergeStrategy
# detector = DuplicateDetector(similarity_threshold=0.8)
# duplicate_groups = detector.detect_duplicate_groups(kg.get('entities', []))
# merger = EntityMerger()
# merge_operations = merger.merge_duplicates(kg.get('entities', []), strategy=MergeStrategy.KEEP_MOST_COMPLETE)
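
To show what a similarity threshold such as `0.8` means in practice, here is a standard-library sketch that groups entities by name similarity using `difflib.SequenceMatcher` and then keeps the entity with the most properties. This is one plausible reading of "keep most complete", not a description of how `DuplicateDetector` or `EntityMerger` work internally; all function names and sample records below are hypothetical.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def group_duplicates(entities, threshold=0.8):
    """Greedy grouping against the first member of each existing group."""
    groups = []
    for entity in entities:
        for group in groups:
            if name_similarity(entity["name"], group[0]["name"]) >= threshold:
                group.append(entity)
                break
        else:
            groups.append([entity])  # no similar group found: start a new one
    return [g for g in groups if len(g) > 1]


def keep_most_complete(group):
    """Pick the entity with the most properties (first one wins ties)."""
    return max(group, key=lambda e: len(e.get("properties", {})))


sample = [
    {"id": "e1", "name": "Apple Inc.", "properties": {}},
    {"id": "e4", "name": "Apple Inc", "properties": {"ticker": "AAPL"}},
    {"id": "e2", "name": "Tim Cook", "properties": {}},
]
duplicate_groups = group_duplicates(sample, threshold=0.8)
survivors = [keep_most_complete(g) for g in duplicate_groups]
print([[e["id"] for e in g] for g in duplicate_groups])
print([e["id"] for e in survivors])
```

Greedy grouping is order-dependent and compares only against the first member of each group, which is acceptable for a demonstration but too coarse for large or noisy entity sets.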


## Step 6: Persist to Graph Store

Store the analyzed graph in a persistent graph database using GraphStore.
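
The key pattern in this step is a two-pass write: create every node first and record the ID the database assigns, then create relationships using those assigned IDs. The sketch below demonstrates the pattern against a hypothetical in-memory stand-in (`InMemoryStore`), so it runs without a database server. The stand-in is not Semantica's `GraphStore` and mirrors only the general shape of the calls.

```python
from itertools import count


class InMemoryStore:
    """Hypothetical stand-in for a graph database; NOT Semantica's GraphStore."""

    def __init__(self):
        self._ids = count(1)
        self.nodes = {}
        self.relationships = []

    def create_node(self, labels, properties):
        node_id = next(self._ids)  # the store, not the caller, assigns IDs
        self.nodes[node_id] = {"labels": labels, "properties": properties}
        return {"id": node_id}

    def create_relationship(self, start_node_id, end_node_id, rel_type):
        if start_node_id not in self.nodes or end_node_id not in self.nodes:
            raise KeyError("both endpoints must exist before linking them")
        self.relationships.append((start_node_id, rel_type, end_node_id))


def persist(store, entities, relationships):
    """Pass 1 creates nodes and records store IDs; pass 2 links them."""
    id_map = {}
    for e in entities:
        node = store.create_node([e["type"]], {"name": e["name"]})
        id_map[e["id"]] = node["id"]
    skipped = []
    for r in relationships:
        src, dst = id_map.get(r["source"]), id_map.get(r["target"])
        if src is None or dst is None:
            skipped.append(r)  # endpoint was never stored; report, don't crash
            continue
        store.create_relationship(src, dst, r["type"])
    return id_map, skipped


demo_entities = [
    {"id": "e1", "type": "Organization", "name": "Apple Inc."},
    {"id": "e2", "type": "Person", "name": "Tim Cook"},
]
demo_relationships = [
    {"source": "e2", "target": "e1", "type": "CEO_of"},
    {"source": "e1", "target": "e3", "type": "located_in"},  # e3 not stored
]
store = InMemoryStore()
id_map, skipped = persist(store, demo_entities, demo_relationships)
print(id_map)
print(store.relationships)
print(f"Skipped: {len(skipped)}")
```

Keeping the original ID as a node property, as the cell below does with `original_id`, makes it possible to rebuild this mapping later from the database alone.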


In [None]:
from semantica.graph_store import GraphStore

# Neo4j backend (requires a running Neo4j server; adjust the URI and credentials to your setup)
graph_store = GraphStore(backend="neo4j", uri="bolt://localhost:7687", user="neo4j", password="password")
graph_store.connect()

# Store entities as nodes and track node ID mapping
node_id_map = {}
for entity in entities:
    node = graph_store.create_node(
        labels=[entity["type"]],
        properties={"name": entity["name"], "original_id": entity["id"]}
    )
    node_id_map[entity["id"]] = node.get("id")
    print(f"Stored node: {entity['name']} (ID: {node.get('id')})")

# Store relationships using mapped node IDs
for rel in relationships:
    source_id = node_id_map.get(rel["source"])
    target_id = node_id_map.get(rel["target"])
    
    if source_id is not None and target_id is not None:
        graph_store.create_relationship(
            start_node_id=source_id,
            end_node_id=target_id,
            rel_type=rel["type"],
            properties=rel.get("properties", {})
        )
        print(f"Stored relationship: {rel['source']} -{rel['type']}-> {rel['target']}")
    else:
        print(f"Warning: Could not find node IDs for relationship {rel['source']} -> {rel['target']}")

# Query using Cypher
results = graph_store.execute_query("MATCH (n) RETURN n.name, labels(n) LIMIT 10")
print(f"\nQuery results: {len(results.get('records', []))} nodes")

# Get statistics
stats = graph_store.get_stats()
print("\nGraph store statistics:")
print(f"  Node count: {stats.get('node_count', 'N/A')}")
print(f"  Relationship count: {stats.get('relationship_count', 'N/A')}")
print(f"  Label counts: {stats.get('label_counts', {})}")

graph_store.close()


## Summary

You've learned advanced graph analytics:

- **GraphAnalyzer**: Comprehensive graph analysis and metrics
- **CentralityCalculator**: Multiple centrality measures
- **CommunityDetector**: Community detection
- **ConnectivityAnalyzer**: Connectivity analysis
- **GraphValidator**: Graph validation
- **DuplicateDetector / EntityMerger**: Graph deduplication via `semantica.deduplication`
- **GraphStore**: Persist graphs to Neo4j or FalkorDB
