[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Hawksight-AI/semantica/blob/main/cookbook/use_cases/intelligence/01_Criminal_Network_Analysis.ipynb)

# Criminal Network Analysis - Graph Analytics & Centrality

## Overview

This notebook demonstrates **criminal network analysis** with Semantica, focusing on **network centrality**, **community detection**, and **relationship mapping**. The pipeline ingests intelligence text from OSINT RSS feeds (falling back to a small built-in sample when no feed is configured) and builds a knowledge graph for analyzing criminal networks and the relationships within them.

### Key Features

- **Network Centrality**: Ranks nodes by betweenness centrality to identify key players
- **Community Detection**: Detects criminal communities and groups within the graph
- **Relationship Mapping**: Maps relationships between persons, organizations, and events
- **Graph Analytics**: Runs the analysis with `GraphAnalytics` on a NetworkX-backed knowledge graph
- **Intelligence Reporting**: Prints a pipeline summary and exports an HTML visualization of the network

### Pipeline Architecture

1. **Phase 0**: Setup & Configuration
2. **Phase 1**: OSINT RSS Feed Ingestion (with sample-data fallback)
3. **Phase 2**: Text Normalization, Entity Extraction (Person, Organization, Event, Location) & Deduplication
4. **Phase 3**: Criminal Network Graph Construction
5. **Phase 4**: Network Centrality Analysis
6. **Phase 5**: Community Detection
7. **Phase 6**: Relationship Analysis
8. **Phase 7**: Visualization & Intelligence Reporting

---

## Installation


In [None]:
%pip install -qU semantica networkx matplotlib plotly pandas groq


---

## Phase 0: Setup & Configuration


In [None]:
import os
from semantica.core import Semantica, ConfigManager
from semantica.kg import GraphAnalytics

# Use GROQ_API_KEY from the environment; "your-key" is only a placeholder and must be replaced
os.environ.setdefault("GROQ_API_KEY", "your-key")

config_dict = {
    "project_name": "Criminal_Network_Analysis",
    "extraction": {"provider": "groq", "model": "llama-3.1-8b-instant"},
    "knowledge_graph": {"backend": "networkx"}
}

config = ConfigManager().load_from_dict(config_dict)
core = Semantica(config=config)
print("Configured for criminal network analysis with graph analytics focus")
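
Because the setup cell falls back to a placeholder key, a misconfigured environment only surfaces later, when extraction calls the provider. A minimal sketch of a fail-fast check follows. The helper name `require_api_key` is hypothetical and not part of Semantica.

```python
import os

def require_api_key(name="GROQ_API_KEY", placeholder="your-key"):
    """Return the key from the environment; raise if it is unset or still the placeholder.

    Hypothetical helper for illustration, not a Semantica API.
    """
    value = os.environ.get(name, "").strip()
    if not value or value == placeholder:
        raise RuntimeError(f"Set {name} before running the extraction phases.")
    return value

# Report the problem up front instead of failing mid-pipeline
try:
    require_api_key()
    print("API key found")
except RuntimeError as err:
    print(f"Configuration problem: {err}")
```

Calling this right after the configuration cell keeps the error next to its cause.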


---

## Phase 1: Real Data Ingestion (OSINT RSS Feeds)

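Ingest intelligence data from OSINT RSS feeds. The `osint_feeds` list is empty by default, so the cell falls back to a small sample text unless you add feed URLs.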


In [None]:
from semantica.ingest import FeedIngestor, FileIngestor
import os

os.makedirs("data", exist_ok=True)

# Ingest from OSINT RSS feeds (real data sources)
osint_feeds = [
    # Add OSINT feed URLs here
    # Example: "https://example.com/osint-feed.xml"
]

documents = []
for feed_url in osint_feeds:
    try:
        feed_ingestor = FeedIngestor()
        feed_documents = feed_ingestor.ingest(feed_url, method="rss")
        print(f"Ingested {len(feed_documents)} documents from {feed_url}")
        documents.extend(feed_documents)
    except Exception as e:
        print(f"Feed ingestion failed for {feed_url}: {e}")

# Fallback: Sample criminal network data
if not documents:
    network_data = """
    John Smith is associated with criminal organization XYZ.
    Jane Doe has connections to John Smith and organization XYZ.
    Event: Meeting on 2024-01-15 between John Smith and Jane Doe at Location A.
    Organization XYZ is linked to multiple criminal activities.
    Person: Mike Johnson connected to organization XYZ.
    """
    with open("data/criminal_network.txt", "w") as f:
        f.write(network_data)
    documents = FileIngestor().ingest("data/criminal_network.txt")
    print(f"Ingested {len(documents)} documents from sample data")
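
To make the ingestion step concrete without a live feed, here is a rough standard-library sketch of what RSS ingestion amounts to: each `<item>` becomes one document with a title and body. The inline feed and the `parse_rss_items` helper are made up for illustration. This is not how `FeedIngestor` is implemented, and its document objects may look different.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 payload standing in for a real OSINT feed
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example OSINT feed</title>
    <item>
      <title>Meeting reported at Location A</title>
      <description>John Smith met Jane Doe on 2024-01-15.</description>
    </item>
    <item>
      <title>Organization XYZ update</title>
      <description>Mike Johnson is connected to organization XYZ.</description>
    </item>
  </channel>
</rss>"""

def parse_rss_items(xml_text):
    """Turn each RSS <item> into a dict with 'title' and 'content' keys."""
    root = ET.fromstring(xml_text)
    docs = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        body = (item.findtext("description") or "").strip()
        docs.append({"title": title, "content": body})
    return docs

sample_docs = parse_rss_items(SAMPLE_RSS)
print(f"Parsed {len(sample_docs)} items")
```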


---

## Phase 2-3: Text Normalization, Graph Construction & Deduplication

Normalize the document text, build the knowledge graph from it, and deduplicate the extracted person records.


In [None]:
from semantica.normalize import TextNormalizer
from semantica.deduplication import DuplicateDetector

# Normalize entity names
normalizer = TextNormalizer()
normalized_documents = []
for doc in documents:
    normalized_text = normalizer.normalize(
        doc.content if hasattr(doc, 'content') else str(doc),
        clean_html=True,
        normalize_entities=True,
        remove_extra_whitespace=True
    )
    normalized_documents.append(normalized_text)

print(f"Normalized {len(normalized_documents)} documents")

# Build criminal network graph
result = core.build_knowledge_base(
    sources=normalized_documents,
    custom_entity_types=["Person", "Organization", "Event", "Location", "Relationship"],
    graph=True
)

kg = result["knowledge_graph"]
entities = result["entities"]

# Deduplicate person records
# Case-insensitive match on the entity type; tolerates a missing or None type
persons = [e for e in entities if "person" in (e.get("type") or "").lower()]

detector = DuplicateDetector()
duplicates = detector.detect_duplicates(persons, threshold=0.9)
deduplicated_persons = detector.resolve_duplicates(persons, duplicates)

print(f"Built criminal network with {len(kg.get('entities', []))} entities")
print(f"Deduplicated: {len(persons)} -> {len(deduplicated_persons)} unique persons")
print("Focus: Network centrality, community detection, relationship mapping, graph analytics")
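
The `threshold=0.9` argument above is easier to reason about with a concrete similarity measure. The sketch below uses `difflib.SequenceMatcher` from the standard library as a stand-in. That metric is an assumption: `DuplicateDetector` may score similarity differently. The helper names and the name list are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations

def name_similarity(a, b):
    """Similarity ratio in [0, 1] between two names, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def find_duplicate_pairs(names, threshold=0.9):
    """Return every pair of names whose similarity meets the threshold."""
    return [
        (a, b)
        for a, b in combinations(names, 2)
        if name_similarity(a, b) >= threshold
    ]

# Illustrative person names with a casing variant and a spelling variant
names = ["John Smith", "john smith ", "Jon Smith", "Jane Doe", "Mike Johnson"]
pairs = find_duplicate_pairs(names, threshold=0.9)
for a, b in pairs:
    print(f"{a!r} ~ {b!r}: {name_similarity(a, b):.3f}")
```

With this metric, a higher threshold treats "Jon Smith" and "John Smith" as different people, while a lower one starts merging genuinely distinct names. It is worth spot-checking the detected pairs before resolving them.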


---

## Phase 4-5: Network Centrality & Community Detection


In [None]:
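# Perform network analytics on the knowledge graph
analytics = GraphAnalytics(kg)
# Betweenness centrality scores a node by how often it lies on shortest paths between other nodes
centrality = analytics.calculate_centrality(method="betweenness")
communities = analytics.detect_communities()

# Identify key players: the five nodes with the highest centrality
key_players = sorted(centrality.items(), key=lambda x: x[1], reverse=True)[:5]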

print(f"Network centrality: {len(centrality)} nodes analyzed")
print(f"Community detection: {len(communities)} communities identified")
print(f"Key players: {[node for node, _ in key_players]}")
print("\n=== Pipeline Summary ===")
print(f"✓ Ingested {len(documents)} documents (OSINT RSS feeds or sample fallback)")
print(f"✓ Normalized {len(normalized_documents)} documents")
print(f"✓ Deduplicated {len(persons)} persons to {len(deduplicated_persons)} unique")
print("✓ This cookbook emphasizes graph analytics, centrality, and community detection")
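
As a cross-check on the analytics above, the same two measures can be computed directly with NetworkX, which this notebook installs and configures as the graph backend. The edge list below is a manual reading of the sample text, not Semantica's extraction output, and `greedy_modularity_communities` is just one possible community algorithm. Which one `detect_communities()` uses is not stated here.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Hand-derived edges from the sample data (assumption: one edge per stated association)
edges = [
    ("John Smith", "Organization XYZ"),
    ("Jane Doe", "John Smith"),
    ("Jane Doe", "Organization XYZ"),
    ("Mike Johnson", "Organization XYZ"),
]
G = nx.Graph()
G.add_edges_from(edges)

# Normalized betweenness: share of shortest paths between other nodes that pass through a node
bc = nx.betweenness_centrality(G)
ranked = sorted(bc.items(), key=lambda kv: kv[1], reverse=True)

# Modularity-based communities, converted to plain sets for easier inspection
communities_nx = [set(c) for c in greedy_modularity_communities(G)]

for node, score in ranked:
    print(f"{node}: {score:.3f}")
print(f"{len(communities_nx)} communities: {communities_nx}")
```

In this toy graph only Organization XYZ sits between otherwise unconnected people (Mike Johnson reaches the others solely through it), so it is the only node with nonzero betweenness.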


---

## Phase 6-7: Visualization & Intelligence Reporting


In [None]:
from semantica.visualization import KGVisualizer

visualizer = KGVisualizer()
visualizer.visualize(kg, output_path="criminal_network.html", layout="force-directed")

print("Criminal network analysis complete")
print("Emphasizes: Network centrality, community detection, relationship mapping, graph analytics")
