[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Hawksight-AI/semantica/blob/main/cookbook/use_cases/intelligence/03_Law_Enforcement_Forensics.ipynb)

# Law Enforcement and Forensics Analysis with Semantica

## Overview

This notebook demonstrates a complete forensic analysis pipeline using **Semantica as the core framework** with agent-based workflows for processing case files, evidence logs, witness statements, and forensic reports. The pipeline builds temporal knowledge graphs, correlates evidence across cases, and generates comprehensive forensic analysis reports.


**Documentation**: [API Reference](https://semantica.readthedocs.io/use-cases/)

### Why Semantica?

Semantica provides a complete framework for forensic analysis:

- **Multi-Source Evidence Processing**: Process case files, evidence logs, witness statements, forensic reports, and crime scene data
- **Agent-Based Workflows**: Autonomous agents with persistent memory for coordinated evidence analysis
- **Temporal Knowledge Graphs**: Build time-aware knowledge graphs for case timelines and evidence correlation
- **Cross-Case Correlation**: Identify connections and patterns across multiple cases
- **Graph Analytics**: Analyze evidence networks and case relationships
- **GraphRAG**: Semantic search across case files with context-aware evidence retrieval
- **Forensic Reporting**: Generate comprehensive forensic analysis reports with evidence chains
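At its simplest, cross-case correlation means linking two cases when they share an entity. The sketch below shows that idea in plain Python rather than through Semantica's API. It compares suspect sets pairwise, using the same two sample cases that Step 3 creates. The `shared_suspects` helper is hypothetical. The notebook itself uses community detection through GraphAnalyzer, which extends this pairwise check to whole clusters.

```python
from itertools import combinations

# Toy case records mirroring the sample cases created in Step 3
cases = {
    "CF001": {"suspects": {"Suspect X", "Suspect Y"}},
    "CF002": {"suspects": {"Suspect Y", "Suspect Z"}},
}


def shared_suspects(cases):
    """Hypothetical helper: map each overlapping case pair to its sorted shared suspects."""
    links = {}
    for a, b in combinations(sorted(cases), 2):
        common = cases[a]["suspects"] & cases[b]["suspects"]
        if common:
            links[(a, b)] = sorted(common)
    return links


print(shared_suspects(cases))  # {('CF001', 'CF002'): ['Suspect Y']}
```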

### Key Features

- Process forensic evidence from multiple sources
- Extract entities (persons, locations, evidence, events) and relationships
- Build temporal knowledge graphs for case timelines
- Correlate evidence across multiple cases
- Agent-based analysis with persistent memory
- Graph analytics for evidence network analysis
- GraphRAG for semantic search across case files
- Generate forensic analysis reports with evidence chains

### Semantica Modules Used (25+)

- **Ingest**: FileIngestor, WebIngestor, FeedIngestor, StreamIngestor, DBIngestor, RepoIngestor, EmailIngestor, MCPIngestor (case files, evidence databases)
- **Parse**: DocumentParser, StructuredDataParser, JSONParser, CSVParser
- **Normalize**: TextNormalizer, DataNormalizer
- **Semantic Extract**: NERExtractor, RelationExtractor, TripletExtractor, EventDetector
- **KG**: GraphBuilder, TemporalGraphQuery, GraphAnalyzer, ConnectivityAnalyzer
- **Graph Analytics**: Community detection, centrality measures, path finding
- **Embeddings**: EmbeddingGenerator, TextEmbedder
- **Vector Store**: VectorStore, HybridSearch, MetadataFilter
- **Context**: AgentMemory, ContextRetriever, ContextGraphBuilder
- **Pipeline**: PipelineBuilder, ExecutionEngine, ParallelismManager
- **Reasoning**: InferenceEngine, RuleManager, ExplanationGenerator
- **Export**: ReportGenerator, JSONExporter
- **Visualization**: KGVisualizer, AnalyticsVisualizer, TemporalVisualizer
- **Deduplication**: DuplicateDetector, EntityMerger

### Pipeline Overview

**Case Files → Parse → Extract Evidence Entities/Relationships → Build Temporal Case KG → Graph Analytics → GraphRAG → Agent Analysis → Cross-Case Correlation → Generate Forensic Report → Visualize**
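Conceptually, each arrow in the overview is a function that turns one intermediate result into the next. The sketch below shows that shape in plain Python. Three hypothetical stages (`parse`, `extract`, `build_graph`) stand in for the Semantica modules. The capitalised-word "extraction" is a deliberately naive placeholder for NERExtractor, not how Semantica extracts entities.

```python
from functools import reduce


def parse(state):
    """Hypothetical stage: strip whitespace from each raw record."""
    return {**state, "records": [r.strip() for r in state["raw"]]}


def extract(state):
    """Hypothetical stage: naive extraction that treats capitalised words as entity mentions."""
    words = {w for r in state["records"] for w in r.split() if w[:1].isupper()}
    return {**state, "entities": sorted(words)}


def build_graph(state):
    """Hypothetical stage: link each record (by index) to the entities it mentions."""
    edges = [
        (i, e)
        for i, r in enumerate(state["records"])
        for e in state["entities"]
        if e in r.split()
    ]
    return {**state, "edges": edges}


def run(stages, state):
    """Apply each stage in order, passing the output of one to the next."""
    return reduce(lambda acc, stage: stage(acc), stages, state)


result = run([parse, extract, build_graph], {"raw": ["  Suspect seen near Main Street  "]})
print(result["entities"])  # ['Main', 'Street', 'Suspect']
print(result["edges"])     # [(0, 'Main'), (0, 'Street'), (0, 'Suspect')]
```

Because each stage returns a new dict instead of mutating its input, any prefix of the chain can be re-run on its own while debugging.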

## Installation

Install Semantica from PyPI:

```bash
pip install semantica
# Or with all optional dependencies:
pip install "semantica[all]"
```

---


In [None]:
!pip install semantica


## Step 1: Setup and Import Semantica Modules


In [None]:
# Import all Semantica modules for forensic analysis
from semantica.ingest import FileIngestor, DBIngestor
from semantica.parse import DocumentParser, StructuredDataParser, JSONParser, CSVParser
from semantica.normalize import TextNormalizer, DataNormalizer
from semantica.semantic_extract import NERExtractor, RelationExtractor, TripletExtractor, EventDetector
from semantica.kg import GraphBuilder, TemporalGraphQuery, GraphAnalyzer, ConnectivityAnalyzer
from semantica.embeddings import EmbeddingGenerator, TextEmbedder
from semantica.vector_store import VectorStore, HybridSearch, MetadataFilter
from semantica.context import AgentMemory, ContextRetriever, ContextGraphBuilder
from semantica.pipeline import PipelineBuilder, ExecutionEngine, ParallelismManager
from semantica.reasoning import InferenceEngine, RuleManager, ExplanationGenerator
from semantica.export import ReportGenerator, JSONExporter
from semantica.visualization import KGVisualizer, AnalyticsVisualizer, TemporalVisualizer
from semantica.deduplication import DuplicateDetector, EntityMerger

import tempfile
import os
import json
from datetime import datetime, timedelta


## Step 2: Initialize Agent Memory and Core Modules

Set up AgentMemory, backed by a vector store, for persistent context, and initialize the Semantica modules that the forensic agents use in later steps.


In [None]:
# Initialize vector store for agent memory
vector_store = VectorStore(backend="faiss", dimension=768)

# Initialize agent memory for persistent context
agent_memory = AgentMemory(
    vector_store=vector_store,
    retention_policy="unlimited",
    max_memory_size=10000
)

# Initialize Semantica modules
file_ingestor = FileIngestor()
graph_builder = GraphBuilder()
temporal_query = TemporalGraphQuery()
graph_analyzer = GraphAnalyzer()
inference_engine = InferenceEngine()
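The toy stand-in below makes the memory's store/retrieve contract concrete. It is not Semantica's AgentMemory. It replaces the FAISS-backed 768-dimensional embeddings with bag-of-words term counts ranked by cosine similarity. The class name `ToyMemory` is hypothetical. Only the `store(text, metadata=...)` and `retrieve(query, max_results=...)` call shapes mirror the calls used later in this notebook.

```python
import math
from collections import Counter


class ToyMemory:
    """Hypothetical stand-in for AgentMemory: stores text plus metadata, retrieves by cosine similarity."""

    def __init__(self):
        self.items = []  # list of (text, metadata, term-count vector)

    @staticmethod
    def _vector(text):
        # Bag-of-words term counts; a real setup would use learned embeddings instead
        return Counter(text.lower().split())

    @staticmethod
    def _cosine(a, b):
        dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
        norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    def store(self, text, metadata=None):
        self.items.append((text, metadata or {}, self._vector(text)))

    def retrieve(self, query, max_results=5):
        q = self._vector(query)
        scored = [(self._cosine(q, vec), text, meta) for text, meta, vec in self.items]
        scored = [s for s in scored if s[0] > 0]       # drop items with no term overlap
        scored.sort(key=lambda s: s[0], reverse=True)  # best match first
        return [(text, meta) for _, text, meta in scored[:max_results]]


memory = ToyMemory()
memory.store("Ingested case files and evidence logs", {"type": "data_ingestion"})
memory.store("Timeline analysis built case timelines", {"agent": "timeline_analysis"})
print(memory.retrieve("evidence", max_results=1))
# [('Ingested case files and evidence logs', {'type': 'data_ingestion'})]
```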


## Step 3: Ingest Case Files and Evidence

Create sample case files, evidence logs, and witness statements, save them as JSON files in a temporary directory, and ingest them with FileIngestor.


In [None]:
# Create temporary directory for sample data
temp_dir = tempfile.mkdtemp()

# Sample case files data
case_files_data = {
    "cases": [
        {
            "case_id": "CF001",
            "date_opened": "2024-01-10",
            "case_type": "Homicide",
            "location": "123 Main Street",
            "victim": "Victim A",
            "suspects": ["Suspect X", "Suspect Y"],
            "status": "Active"
        },
        {
            "case_id": "CF002",
            "date_opened": "2024-02-15",
            "case_type": "Robbery",
            "location": "456 Oak Avenue",
            "victim": "Victim B",
            "suspects": ["Suspect Y", "Suspect Z"],
            "status": "Active"
        }
    ]
}

# Sample evidence logs
evidence_logs_data = {
    "evidence": [
        {
            "evidence_id": "E001",
            "case_id": "CF001",
            "type": "Fingerprint",
            "location_found": "123 Main Street",
            "date_collected": "2024-01-10",
            "analyzed_by": "Forensic Lab A",
            "results": "Match to Suspect X"
        },
        {
            "evidence_id": "E002",
            "case_id": "CF001",
            "type": "DNA",
            "location_found": "123 Main Street",
            "date_collected": "2024-01-11",
            "analyzed_by": "Forensic Lab B",
            "results": "Match to Suspect X"
        },
        {
            "evidence_id": "E003",
            "case_id": "CF002",
            "type": "Fingerprint",
            "location_found": "456 Oak Avenue",
            "date_collected": "2024-02-15",
            "analyzed_by": "Forensic Lab A",
            "results": "Match to Suspect Y"
        }
    ]
}

# Sample witness statements
witness_statements_data = {
    "statements": [
        {
            "statement_id": "WS001",
            "case_id": "CF001",
            "witness": "Witness 1",
            "date": "2024-01-10",
            "statement": "I saw Suspect X and Suspect Y at the scene around 10 PM"
        },
        {
            "statement_id": "WS002",
            "case_id": "CF002",
            "witness": "Witness 2",
            "date": "2024-02-15",
            "statement": "I observed Suspect Y and Suspect Z leaving the area"
        }
    ]
}

# Save sample data
case_files_file = os.path.join(temp_dir, "case_files.json")
evidence_logs_file = os.path.join(temp_dir, "evidence_logs.json")
witness_statements_file = os.path.join(temp_dir, "witness_statements.json")

with open(case_files_file, 'w') as f:
    json.dump(case_files_data, f, indent=2)
with open(evidence_logs_file, 'w') as f:
    json.dump(evidence_logs_data, f, indent=2)
with open(witness_statements_file, 'w') as f:
    json.dump(witness_statements_data, f, indent=2)

# Ingest files (the structured data used in the next cell comes from JSONParser)
ingested_cases = file_ingestor.ingest_file(case_files_file, read_content=True)
ingested_evidence = file_ingestor.ingest_file(evidence_logs_file, read_content=True)
ingested_witnesses = file_ingestor.ingest_file(witness_statements_file, read_content=True)

# Store in agent memory
agent_memory.store(
    "Ingested case files, evidence logs, and witness statements",
    metadata={
        "type": "data_ingestion",
        "sources": ["case_files", "evidence_logs", "witness_statements"],
        "timestamp": datetime.now().isoformat()
    }
)

print("Sample data ingested:")
print(f"  - Case files: {len(case_files_data['cases'])}")
print(f"  - Evidence items: {len(evidence_logs_data['evidence'])}")
print(f"  - Witness statements: {len(witness_statements_data['statements'])}")
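Before building a graph, it is worth checking that every evidence item and witness statement points to a known case. Otherwise a relationship would reference a node that does not exist. The sketch below is a hypothetical plain-Python helper, not a Semantica API. The `E004` record is made-up data with a deliberately dangling `case_id`. Calling `find_orphans(case_files_data["cases"], evidence_logs_data["evidence"])` on the notebook's sample data should return an empty list, because E001 to E003 all reference CF001 or CF002.

```python
def find_orphans(cases, records, key="case_id"):
    """Hypothetical helper: return the records whose case_id is not among the known cases."""
    known = {c["case_id"] for c in cases}
    return [r for r in records if r.get(key) not in known]


cases = [{"case_id": "CF001"}, {"case_id": "CF002"}]
evidence = [
    {"evidence_id": "E001", "case_id": "CF001"},
    {"evidence_id": "E004", "case_id": "CF009"},  # made-up record with a dangling reference
]
orphans = find_orphans(cases, evidence)
print([r["evidence_id"] for r in orphans])  # ['E004']
```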


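## Step 4: Extract Forensic Entities and Build the Temporal Knowledge Graph

Parse the JSON files, convert cases, suspects, and evidence into entities, link them with dated relationships, and build the knowledge graph.


In [None]:
# Initialize extractors for free-text fields such as witness statements
# (the structured pass below reads JSON fields directly and does not call them)
ner_extractor = NERExtractor()
relation_extractor = RelationExtractor()
event_detector = EventDetector()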

# Extract forensic entities and relationships
forensic_entities = []
forensic_relationships = []
entity_id_map = {}

# Parse the JSON files written in Step 3
json_parser = JSONParser()
parsed_cases = json_parser.parse(case_files_file)
parsed_evidence = json_parser.parse(evidence_logs_file)
parsed_witnesses = json_parser.parse(witness_statements_file)

cases_data = parsed_cases.data if hasattr(parsed_cases, 'data') else parsed_cases
evidence_data = parsed_evidence.data if hasattr(parsed_evidence, 'data') else parsed_evidence
witnesses_data = parsed_witnesses.data if hasattr(parsed_witnesses, 'data') else parsed_witnesses

# Extract entities from cases
if isinstance(cases_data, dict):
    for case in cases_data.get('cases', []):
        case_id = case.get('case_id')
        # Add case as entity
        forensic_entities.append({
            "id": case_id,
            "type": "Case",
            "name": f"Case {case_id}",
            "properties": {
                "case_type": case.get('case_type'),
                "date_opened": case.get('date_opened'),
                "status": case.get('status')
            }
        })
        
        # Add each suspect once as a Person entity and link it to the case
        # (victims and locations are not extracted in this pass)
        for suspect in case.get('suspects', []):
            if suspect not in entity_id_map:
                entity_id = f"SUSPECT_{len(entity_id_map) + 1}"
                entity_id_map[suspect] = entity_id
                forensic_entities.append({
                    "id": entity_id,
                    "type": "Person",
                    "name": suspect,
                    "properties": {"role": "suspect"}
                })
            # Create relationship
            forensic_relationships.append({
                "source": case_id,
                "target": entity_id_map[suspect],
                "type": "involves",
                "properties": {"date": case.get('date_opened')}
            })

# Extract evidence entities
if isinstance(evidence_data, dict):
    for evidence in evidence_data.get('evidence', []):
        evidence_id = evidence.get('evidence_id')
        case_id = evidence.get('case_id')
        forensic_entities.append({
            "id": evidence_id,
            "type": "Evidence",
            "name": f"Evidence {evidence_id}",
            "properties": {
                "type": evidence.get('type'),
                "date_collected": evidence.get('date_collected'),
                "results": evidence.get('results')
            }
        })
        # Link evidence to case
        forensic_relationships.append({
            "source": case_id,
            "target": evidence_id,
            "type": "has_evidence",
            "properties": {"date": evidence.get('date_collected')}
        })

# Build temporal knowledge graph
forensic_kg = graph_builder.build(forensic_entities, forensic_relationships)

# Store in agent memory
agent_memory.store(
    f"Built temporal forensic knowledge graph with {len(forensic_entities)} entities",
    metadata={
        "type": "knowledge_graph",
        "entity_count": len(forensic_entities),
        "relationship_count": len(forensic_relationships)
    },
    entities=forensic_entities,
    relationships=forensic_relationships
)

print("Temporal forensic knowledge graph built:")
print(f"  - Entities: {len(forensic_entities)}")
print(f"  - Relationships: {len(forensic_relationships)}")
print(f"  - Cases: {len([e for e in forensic_entities if e.get('type') == 'Case'])}")
print(f"  - Evidence items: {len([e for e in forensic_entities if e.get('type') == 'Evidence'])}")
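Every relationship built above carries a `date` property, so a time-window query comes down to filtering edges on that field. The sketch below illustrates the idea in plain Python. It is not how TemporalGraphQuery is implemented, and `edges_in_range` is a hypothetical helper. In this notebook it could be applied directly to `forensic_relationships`.

```python
from datetime import date


def edges_in_range(relationships, start, end):
    """Hypothetical helper: keep relationships whose properties['date'] (ISO string) lies in [start, end]."""
    selected = []
    for rel in relationships:
        raw = rel.get("properties", {}).get("date")
        if raw and start <= date.fromisoformat(raw) <= end:
            selected.append(rel)
    return selected


rels = [
    {"source": "CF001", "target": "E001", "type": "has_evidence", "properties": {"date": "2024-01-10"}},
    {"source": "CF002", "target": "E003", "type": "has_evidence", "properties": {"date": "2024-02-15"}},
]
january = edges_in_range(rels, date(2024, 1, 1), date(2024, 1, 31))
print([r["target"] for r in january])  # ['E001']
```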


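## Step 5: Run the Agent-Based Forensic Pipeline

Define four specialized agents (evidence collection, timeline analysis, cross-case correlation, and report compilation), wrap each one in a pipeline step handler, and execute the steps with ExecutionEngine.


In [None]: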
# Initialize pipeline modules
pipeline_builder = PipelineBuilder()
execution_engine = ExecutionEngine()

# Define specialized forensic agents

# Agent 1: Evidence Collection Agent
def agent_evidence_collection(evidence_data, memory):
    """Autonomous agent for evidence gathering."""
    context = memory.retrieve("evidence", max_results=5)
    count = len(evidence_data.get('evidence', [])) if isinstance(evidence_data, dict) else 0
    memory.store(
        f"Evidence collection agent processed {count} evidence items",
        metadata={"agent": "evidence_collection"}
    )
    return {"evidence_processed": count}

# Agent 2: Timeline Analysis Agent
def agent_timeline_analysis(kg, temporal_query, memory):
    """Agent for building temporal case timelines."""
    # Use temporal query to analyze timelines
    timeline = temporal_query.query_by_time_range(kg, start_date="2024-01-01", end_date="2024-12-31")
    memory.store(
        "Timeline analysis agent built case timelines",
        metadata={"agent": "timeline_analysis"}
    )
    return {"timeline": timeline}

# Agent 3: Cross-Case Correlation Agent
def agent_cross_case_correlation(kg, analyzer, memory):
    """Agent for finding connections across cases."""
    # Find common suspects, locations, evidence
    communities = analyzer.detect_communities(kg, method="louvain")
    memory.store(
        f"Cross-case correlation identified {communities.get('num_communities', 0) if isinstance(communities, dict) else 0} case clusters",
        metadata={"agent": "cross_case_correlation"}
    )
    return {"case_clusters": communities}

# Agent 4: Forensic Report Agent
def agent_forensic_report(analysis_results, memory):
    """Agent for generating forensic reports."""
    context = memory.retrieve("forensic analysis", max_results=10)
    memory.store(
        "Forensic report agent compiled comprehensive report",
        metadata={"agent": "forensic_report"}
    )
    return {"report_data": analysis_results, "context_items": len(context)}

# Step handlers: each reads its inputs from the shared data dict and returns
# a copy of that dict extended with its own result
def evidence_collection_handler(data, **config):
    evidence_data = data.get("evidence_logs_data")
    memory = data.get("memory")
    r = agent_evidence_collection(evidence_data, memory)
    return {**data, "evidence_collection_result": r}


def timeline_handler(data, **config):
    kg = data.get("forensic_kg")
    temporal = data.get("temporal_query")
    memory = data.get("memory")
    r = agent_timeline_analysis(kg, temporal, memory)
    return {**data, "timeline_result": r}


def correlation_handler(data, **config):
    kg = data.get("forensic_kg")
    analyzer = data.get("graph_analyzer")
    memory = data.get("memory")
    r = agent_cross_case_correlation(kg, analyzer, memory)
    return {**data, "correlation_result": r}


def report_handler(data, **config):
    memory = data.get("memory")
    analysis = {
        "evidence": data.get("evidence_collection_result"),
        "timeline": data.get("timeline_result"),
        "correlation": data.get("correlation_result")
    }
    r = agent_forensic_report(analysis, memory)
    return {**data, "forensic_report_result": r}


# Chain the four steps; each step lists the step it depends on
forensic_pipeline = (
    pipeline_builder
    .add_step("evidence_collection", "ingest", handler=evidence_collection_handler)
    .add_step("timeline_analysis", "analyze_graph", dependencies=["evidence_collection"], handler=timeline_handler)
    .add_step("cross_case_correlation", "analyze_graph", dependencies=["timeline_analysis"], handler=correlation_handler)
    .add_step("forensic_report", "report", dependencies=["cross_case_correlation"], handler=report_handler)
    .build()
)

# Shared inputs available to every step handler
input_data = {
    "evidence_logs_data": evidence_logs_data,
    "forensic_kg": forensic_kg,
    "temporal_query": temporal_query,
    "graph_analyzer": graph_analyzer,
    "memory": agent_memory,
    "cases_data": cases_data,
}

# The dependency chain is linear, so the steps run in sequence even with parallel=True
pipeline_result = execution_engine.execute_pipeline(forensic_pipeline, data=input_data, parallel=True)

print("Forensic pipeline executed:")
print(f"  - Pipeline steps: {len(forensic_pipeline.steps)}")
print("  - Parallel execution: enabled (steps still follow the dependency chain)")
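With `parallel=True`, only steps whose dependencies are all satisfied can run at the same time. The sketch below uses the standard-library `graphlib` module (Python 3.9+) to group the declared steps into batches that could run concurrently. It illustrates DAG scheduling in general and is not Semantica's ExecutionEngine. With the strictly linear chain declared above, every batch holds a single step. The `relaxed` mapping is a hypothetical alternative in which timeline analysis and correlation depend only on evidence collection and could therefore overlap.

```python
from graphlib import TopologicalSorter

# Step -> set of steps it depends on, mirroring the dependencies declared above
deps = {
    "evidence_collection": set(),
    "timeline_analysis": {"evidence_collection"},
    "cross_case_correlation": {"timeline_analysis"},
    "forensic_report": {"cross_case_correlation"},
}


def parallel_batches(graph):
    """Group steps into batches whose members have no pending dependencies."""
    ts = TopologicalSorter(graph)
    ts.prepare()
    batches = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        batches.append(ready)
        ts.done(*ready)
    return batches


print(parallel_batches(deps))  # four single-step batches

# Hypothetical relaxed dependencies: timeline and correlation both need only the evidence step
relaxed = {
    **deps,
    "cross_case_correlation": {"evidence_collection"},
    "forensic_report": {"timeline_analysis", "cross_case_correlation"},
}
print(parallel_batches(relaxed))
# [['evidence_collection'], ['cross_case_correlation', 'timeline_analysis'], ['forensic_report']]
```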


## Step 6: Generate Forensic Analysis Report

Compile the case and evidence counts, knowledge graph size, and agent memory statistics into an HTML forensic analysis report.
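As used here, an evidence chain links a case to a piece of evidence and to the person that evidence matched. In the sample data those matches exist only in the free-text `results` field, so the sketch below extracts them with a regular expression. `evidence_chains` is a hypothetical helper, not a Semantica API. The pattern assumes the exact "Match to <name>" wording used in Step 3 and skips any other result text.

```python
import re

# Assumes results text of the form "Match to <name>", as in the sample evidence logs
MATCH = re.compile(r"^Match to (?P<name>.+)$")


def evidence_chains(evidence_records):
    """Hypothetical helper: build (case, evidence, matched person) tuples from evidence records."""
    chains = []
    for ev in evidence_records:
        m = MATCH.match(ev.get("results", ""))
        if m:
            chains.append((ev["case_id"], f'{ev["evidence_id"]} ({ev["type"]})', m.group("name")))
    return chains


records = [
    {"evidence_id": "E001", "case_id": "CF001", "type": "Fingerprint", "results": "Match to Suspect X"},
    {"evidence_id": "E002", "case_id": "CF001", "type": "DNA", "results": "Match to Suspect X"},
    {"evidence_id": "E003", "case_id": "CF002", "type": "Fingerprint", "results": "Match to Suspect Y"},
]
for case_id, evidence, person in evidence_chains(records):
    print(f"{case_id} -> {evidence} -> {person}")
# CF001 -> E001 (Fingerprint) -> Suspect X
# CF001 -> E002 (DNA) -> Suspect X
# CF002 -> E003 (Fingerprint) -> Suspect Y
```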


In [None]:
# Initialize report generator
report_generator = ReportGenerator()

# Prepare forensic report data
forensic_report_data = {
    "title": "Forensic Analysis Report",
    "executive_summary": "Analysis of case files, evidence chains, and cross-case correlations",
    "cases_analyzed": len(cases_data.get('cases', [])) if isinstance(cases_data, dict) else 0,
    "evidence_items": len(evidence_data.get('evidence', [])) if isinstance(evidence_data, dict) else 0,
    "knowledge_graph": {
        "entities": len(forensic_entities),
        "relationships": len(forensic_relationships)
    },
    "agent_memory_stats": agent_memory.get_statistics()
}

# Generate HTML report
forensic_report_file = os.path.join(temp_dir, "forensic_analysis_report.html")
report_generator.generate_report(
    forensic_report_data,
    forensic_report_file,
    format="html"
)

print("Forensic report generated:")
print(f"  - Report file: {forensic_report_file}")
print("  - Report includes: case and evidence counts, knowledge graph size, agent memory statistics")
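To see what an HTML rendering of a report dict like `forensic_report_data` involves, or to have a fallback that needs no extra dependencies, a minimal renderer can be written with the standard library. The `render_report` helper below is hypothetical and is not Semantica's ReportGenerator. It escapes every key and value with `html.escape`, because case text can contain characters such as `<` or `&`. Nested values, such as the `knowledge_graph` dict, are rendered with `str()`.

```python
import html
import os
import tempfile


def render_report(data):
    """Hypothetical helper: render a flat dict as a minimal HTML page with escaped values."""
    title = html.escape(str(data.get("title", "Report")))
    rows = "\n".join(
        f"<tr><th>{html.escape(str(k))}</th><td>{html.escape(str(v))}</td></tr>"
        for k, v in data.items()
        if k != "title"
    )
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{title}</title></head>\n"
        f"<body><h1>{title}</h1>\n<table>\n{rows}\n</table></body></html>\n"
    )


report = {"title": "Forensic Analysis Report", "cases_analyzed": 2, "evidence_items": 3}
page = render_report(report)

# Write the page to a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "report.html")
with open(path, "w", encoding="utf-8") as f:
    f.write(page)
print(path)
```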
