# Military Capability Gap Analysis with Semantica Context Graphs

## Objective
This notebook walks through an **end-to-end capability gap analysis** for defense planning using Semantica's context graph and decision intelligence modules. It shows how raw documents and ontologies are turned into a queryable context graph in which every recorded decision carries an audit trail.

## Use Case Scope
- **Domain**: Defense capability planning and gap analysis
- **Scenario**: Future A2/AD (Anti-Access/Area Denial) environments
- **Timeline**: 2028+ strategic planning horizon
- **Stakeholders**: Defense planners, capability managers, decision makers

## End-to-End Pipeline Sequence
```
Data Ingestion → Document Parsing → Text Splitting → Text Normalization → 
Semantic Extraction → Entity Resolution & Deduplication → Knowledge Graph → 
Vector Store Configuration → Context Graph → Decision Tracking → 
Policy Engine & Compliance → Analytics & Insights → Export & Reporting
```

## Context Graph Pattern
```
Scenario → Mission Thread → Events → Systems → Capabilities → Gaps → Decisions → Outcomes
```
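
The chain above can be read as a typed, directed graph. The following is a minimal, library-independent sketch of that idea. It does not use Semantica's API, and every node name (for example `scenario:a2ad_2028` or `gap:isr_coverage`) is a hypothetical placeholder. It stores one node per stage in an adjacency map and uses breadth-first search to recover the multi-hop path from a scenario to an outcome.

```python
from collections import deque

# Hypothetical nodes, one per stage of the pattern; names are illustrative only.
EDGES = {
    "scenario:a2ad_2028": ["thread:maritime_strike"],
    "thread:maritime_strike": ["event:sensor_denied"],
    "event:sensor_denied": ["system:ground_radar"],
    "system:ground_radar": ["capability:wide_area_isr"],
    "capability:wide_area_isr": ["gap:isr_coverage"],
    "gap:isr_coverage": ["decision:fund_sensor_upgrade"],
    "decision:fund_sensor_upgrade": ["outcome:coverage_restored"],
}

def find_path(edges, start, goal):
    """Return the shortest node path from start to goal, or None if unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

path = find_path(EDGES, "scenario:a2ad_2028", "outcome:coverage_restored")
print(" -> ".join(path))
```

Because edges are directed, searching from the outcome back to the scenario returns `None`; tracing provenance backwards would need a reversed edge map.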

## Key Questions Answered
1. **Capability Assessment**: What capability gaps exist for specific mission threads?
2. **Evidence Tracking**: Which sources and evidence support each identified gap?
3. **Decision Governance**: What precedents, exceptions, and approvals influenced decisions?
4. **Risk Analysis**: What multi-hop paths connect scenarios to risk outcomes?
5. **Policy Compliance**: Are decisions aligned with established policies and governance frameworks?

## Detailed Pipeline Steps

### 1. Data Ingestion
- **FileIngestor**: Local document processing (PDF, TXT, JSON, etc.)
- **WebIngestor**: Web content extraction and processing
- **OntologyIngestor**: RDF/OWL/TTL ontology ingestion

### 2. Document Parsing
- **PDFParser**: Advanced PDF text extraction with metadata
- **DocumentParser**: Multi-format document processing
- **DoclingParser**: AI-powered document understanding (optional)

### 3. Text Splitting & Chunking
- **TextSplitter**: Recursive and entity-aware chunking
- **SemanticChunker**: Semantic boundary detection
- **StructuralChunker**: Document structure-based splitting

### 4. Text Normalization
- **TextNormalizer**: Text cleaning and standardization
- **EntityNormalizer**: Entity name resolution and normalization
- **DateNormalizer**: Temporal expression normalization
- **NumberNormalizer**: Numeric value standardization
- **LanguageDetector**: Language identification and processing
- **EncodingHandler**: Character encoding detection and conversion
- **TextCleaner**: Noise removal and text sanitization

### 5. Semantic Extraction
- **NamedEntityRecognizer**: Entity identification and classification
- **RelationExtractor**: Relationship extraction between entities
- **EventDetector**: Event identification and temporal analysis
- **CoreferenceResolver**: Pronoun and reference resolution
- **TripletExtractor**: Subject-predicate-object triplet extraction
- **SemanticAnalyzer**: Semantic similarity and analysis
- **SemanticNetworkExtractor**: Network structure extraction
- **ExtractionValidator**: Quality assurance and validation

### 6. Entity Resolution & Deduplication
- **EntityDeduplicator**: Fuzzy matching and embedding-based entity resolution
- **RelationshipDeduplicator**: Context-aware relationship deduplication
- **ConflictDetector**: Data conflict identification and resolution
- **ConflictResolver**: Automated conflict resolution strategies
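
To make the fuzzy-matching idea concrete, here is a small sketch that uses only the standard library's `difflib.SequenceMatcher`. It is not how `EntityDeduplicator` is implemented; the function names, the 0.85 threshold, and the sample entity names are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

def normalize(name):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())

def deduplicate(names, threshold=0.85):
    """Greedy pass: map each name to the first canonical name it resembles."""
    canonical = []
    mapping = {}
    for name in names:
        key = normalize(name)
        match = next(
            (c for c in canonical
             if SequenceMatcher(None, key, normalize(c)).ratio() >= threshold),
            None,
        )
        if match is None:
            # No close match so far: this name becomes a new canonical entity
            canonical.append(name)
            match = name
        mapping[name] = match
    return canonical, mapping

# Hypothetical extracted entity names
names = ["Ground Radar Layer", "ground radar layer", "Ground Radar Layers", "Satellite Relay"]
canonical, mapping = deduplicate(names)
print(canonical)
print(mapping)
```

A greedy pass like this depends on input order and compares every name against every canonical form, so it only suits small entity sets; embedding-based resolution trades that simplicity for better recall on paraphrased names.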

### 7. Knowledge Graph Construction
- **GraphBuilder**: KG construction from extracted data
- **GraphAnalyzer**: Graph structure and topology analysis
- **CentralityCalculator**: Node importance and influence metrics
- **CommunityDetector**: Community structure identification
- **ConnectivityAnalyzer**: Graph connectivity and robustness
- **SimilarityCalculator**: Node and edge similarity analysis
- **LinkPredictor**: Missing link prediction and recommendation
- **PathFinder**: Optimal path discovery and routing
- **EntityResolver**: Entity deduplication and resolution
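
As a rough illustration of what a node-importance metric measures, the sketch below computes degree centrality with the standard library only. It is not the `CentralityCalculator` implementation; the edge list is hypothetical. The normalization (degree divided by `n - 1`) is the same one used by NetworkX's `degree_centrality`.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Fraction of the other nodes each node is directly connected to."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}

# Hypothetical undirected links between systems, a capability, and a gap
edges = [
    ("ground_radar", "wide_area_isr"),
    ("satellite_relay", "wide_area_isr"),
    ("wide_area_isr", "isr_coverage_gap"),
]
scores = degree_centrality(edges)
most_central = max(scores, key=scores.get)
print(most_central, scores[most_central])
```

In a capability graph, a node with high degree centrality (here the capability that several systems feed) is a natural place to look for single points of failure.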

### 8. Vector Store Configuration
- **ApacheAGE**: Graph database (Apache AGE, a PostgreSQL extension) with vector capabilities
- **VectorStore**: Semantic similarity and search
- **Vector Indexing**: Efficient similarity search with IVF Flat indexing
- **Embedding Support**: FastEmbed integration for efficient vector operations
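
The similarity search behind a vector store can be sketched as an exact, brute-force cosine ranking. The example below is an assumption-laden illustration, not the `VectorStore` API: the three-dimensional vectors and document IDs are made up, and real embeddings have hundreds of dimensions. An IVF Flat index speeds this up by comparing the query only against vectors in the nearest clusters rather than against all of them.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query, index, top_k=2):
    """Score every stored vector against the query and return the best top_k."""
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in index.items()]
    return sorted(scored, reverse=True)[:top_k]

# Hypothetical toy embeddings keyed by document ID
index = {
    "gap_isr": [0.9, 0.1, 0.0],
    "gap_logistics": [0.1, 0.9, 0.2],
    "policy_note": [0.0, 0.2, 0.9],
}
results = search([1.0, 0.2, 0.0], index)
for score, doc_id in results:
    print(f"{doc_id}: {score:.3f}")
```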

### 9. Context Graph Creation
- **ContextGraph**: Advanced context modeling with analytics
- **AgentContext**: Unified context management interface
- **Graph Expansion**: Multi-hop context expansion and reasoning
- **Semantic Search**: Hybrid semantic and structural search

### 10. Decision Recording & Tracking
- **Decision Models**: Decision, Policy, PolicyException, ApprovalChain, Precedent
- **Decision Recorder**: Decision lifecycle management
- **Decision Query**: Advanced decision search and retrieval
- **Causal Analyzer**: Decision influence and impact analysis
- **Precedent Search**: Smart precedent identification and analysis

### 11. Policy Engine & Compliance
- **PolicyEngine**: Policy definition and compliance checking
- **Policy Management**: Versioning, compliance checking, exception handling
- **Compliance Monitoring**: Real-time policy compliance validation
- **Exception Handling**: Policy exception management and approval workflows
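
A compliance check boils down to comparing a recorded decision against a policy's constraints and listing what fails. The sketch below shows that shape with plain dataclasses. `DecisionSketch`, `PolicySketch`, `check_compliance`, and all field names other than `category` and `outcome` are hypothetical and are not Semantica's decision models or `PolicyEngine` API.

```python
from dataclasses import dataclass, field

@dataclass
class DecisionSketch:
    decision_id: str
    category: str
    outcome: str
    estimated_cost_musd: float          # hypothetical field: cost in millions USD
    evidence: list = field(default_factory=list)
    approvals: list = field(default_factory=list)

@dataclass
class PolicySketch:
    policy_id: str
    max_cost_musd: float
    min_evidence: int
    required_approvers: tuple

def check_compliance(decision, policy):
    """Return a list of violation messages; an empty list means compliant."""
    violations = []
    if decision.estimated_cost_musd > policy.max_cost_musd:
        violations.append("cost exceeds policy ceiling")
    if len(decision.evidence) < policy.min_evidence:
        violations.append("insufficient supporting evidence")
    missing = [a for a in policy.required_approvers if a not in decision.approvals]
    if missing:
        violations.append("missing approvals: " + ", ".join(missing))
    return violations

policy = PolicySketch("POL-001", 50.0, 2, ("capability_manager", "program_director"))
decision = DecisionSketch(
    "DEC-001", "gap_mitigation", "approved", 72.5,
    evidence=["doc_a::chunk_3"], approvals=["capability_manager"],
)
violations = check_compliance(decision, policy)
print(violations)
```

Returning the violations rather than a boolean keeps the audit trail useful: each message can be stored with the decision, and an exception workflow can waive specific entries instead of the whole check.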

### 12. Analytics & Insights
- **GraphAnalytics**: Comprehensive graph analytics and metrics
- **DecisionAnalytics**: Decision pattern analysis and impact assessment
- **PerformanceAnalytics**: System performance monitoring and optimization
- **KGVisualizer**: Interactive graph visualization
- **Dashboard**: Real-time analytics dashboard and KPI monitoring

### 13. Export & Reporting
- **Multi-Format Export**: JSON, RDF, GraphML, CSV, YAML, LPG
- **ReportGenerator**: Comprehensive report generation
- **ApacheAGE Export**: SQL export scripts for database integration
- **Analytics Export**: Performance metrics and analysis results

## Expected Outputs
- **Context Graph**: Capability-gap context graph with decision traces
- **Knowledge Graph**: Complete KG with entities and relationships
- **Vector Store**: Semantic search index with Apache AGE integration
- **Decision Records**: Full audit trail with policy compliance
- **Multi-Format Exports**: JSON, RDF, GraphML, CSV, YAML, LPG
- **Analytics Reports**: Comprehensive analysis and insights
- **Visualizations**: Interactive graph representations

## Key Metrics Tracked
- **Ingestion**: Document count, size, processing time
- **Extraction**: Entity/relationship/event counts, confidence scores
- **Graph Analytics**: Node/edge counts, centrality measures, community structure
- **Vector Operations**: Embedding generation, search performance, indexing metrics
- **Decision Tracking**: Decision volume, compliance rates, approval chains
- **Performance**: Processing times, memory usage, throughput

## Technology Stack
- **Core**: Semantica Context Graph & Decision Intelligence Platform
- **Vector Database**: Apache AGE with vector capabilities
- **Graph Analytics**: NetworkX, Node2Vec, Community Detection
- **Semantic Search**: Hybrid search with embeddings and graph structure
- **Export Formats**: JSON, RDF/Turtle, GraphML, CSV, YAML, LPG (Cypher)

In [None]:
!pip install semantica==0.3.0a0

In [None]:
# Setup and Configuration (Consolidated)
from pathlib import Path
from datetime import datetime

# Setup directories
BASE_DIR = Path.cwd()
USE_CASE_DIR = BASE_DIR / 'cookbook' / 'use_cases' / 'capability_gap_defense'
DATA_DIR = USE_CASE_DIR / 'data'
OUTPUT_DIR = USE_CASE_DIR / 'outputs'
DATA_DIR.mkdir(parents=True, exist_ok=True)
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

print("Setup completed:")
print(f"Base Directory: {BASE_DIR}")
print(f"Data Directory: {DATA_DIR}")
print(f"Output Directory: {OUTPUT_DIR}")

# Return directory paths for verification
BASE_DIR, DATA_DIR, OUTPUT_DIR

In [None]:
# Multi-Source Data Ingestion
# Import ingestion classes where they are used

from semantica.ingest import FileIngestor, WebIngestor, OntologyIngestor

# Initialize ingestion classes
file_ingestor = FileIngestor()
web_ingestor = WebIngestor()
ontology_ingestor = OntologyIngestor()

# File Ingestion
print("Starting file ingestion...")
try:
    file_objects = file_ingestor.ingest_directory(
        directory_path=DATA_DIR,
        recursive=False,
        read_content=True,  # Read content for processing
        include_metadata=True
    )
    print(f"Files ingested: {len(file_objects)}")
except Exception as e:
    print(f"File ingestion failed: {e}")
    file_objects = []

# Web Ingestion  
print("Starting web ingestion...")
web_sources = [
    'https://www.rand.org/pubs/research_reports/RRA733-1.html',
    'https://foundationcapital.com/context-graphs/',
    'https://www.defense.gov/News/Releases/',
    'https://www.navy.mil/Portals/1/NWC/NSG_Support/'
]

web_contents = []
for i, url in enumerate(web_sources):
    try:
        web_content = web_ingestor.ingest_url(
            url=url,
            extract_content=True,
            include_metadata=True,
            timeout=30
        )
        web_contents.append(web_content)
        print(f"Web content {i+1}: {url}")
    except Exception as e:
        print(f"Web ingestion failed for {url}: {e}")

# Ontology Ingestion
print("Starting ontology ingestion...")
try:
    ontology_data = ontology_ingestor.ingest_directory(
        directory_path=DATA_DIR,
        recursive=False,
        format='turtle',  # TTL format
        validate_schema=True
    )
    print(f"Ontologies ingested: {len(ontology_data)}")
except Exception as e:
    print(f"Ontology ingestion failed: {e}")
    ontology_data = []

# Ingestion Summary
ingestion_summary = {
    'files': {
        'count': len(file_objects),
        'types': list(set([getattr(f, 'file_type', 'unknown') for f in file_objects])),
        'total_size_mb': round(sum([getattr(f, 'size', 0) for f in file_objects]) / (1024*1024), 2)
    },
    'web': {
        'count': len(web_contents),
        'sources': [getattr(w, 'url', 'unknown') for w in web_contents],
        'total_chars': sum([len(getattr(w, 'content', '')) for w in web_contents])
    },
    'ontologies': {
        'count': len(ontology_data),
        'formats': list(set([getattr(o, 'format', 'turtle') for o in ontology_data])),
        'total_classes': sum([len(getattr(o, 'data', {}).get('classes', [])) for o in ontology_data])
    }
}

print("Ingestion Summary:")
for category, stats in ingestion_summary.items():
    print(f"  {category.upper()}: {stats}")

# Data Quality Check
def validate_ingestion_quality():
    """Validate the quality of ingested data"""
    quality_report = {
        'files_with_content': sum([1 for f in file_objects if hasattr(f, 'content') and f.content]),
        'web_with_content': sum([1 for w in web_contents if hasattr(w, 'content') and w.content]),
        'ontologies_with_schema': sum([1 for o in ontology_data if hasattr(o, 'data') and o.data.get('classes')]),
        'total_documents': len(file_objects) + len(web_contents) + len(ontology_data)
    }
    return quality_report

quality_report = validate_ingestion_quality()
print(f"Quality Report: {quality_report}")

In [None]:
# Document Parsing follows in the next cells

- Modules: `FileIngestor`, `WebIngestor`, `OntologyIngestor`
- Loads local files.
- Fetches web content.
- Ingests ontology files.

In [None]:
# Ontology Evaluation Summary
# Summarizes results from the ontology evaluation cell, which runs after semantic extraction.
# Until that cell has run, this prints a fallback message.

if 'evaluation_results' in locals() and evaluation_results:
    print("Ontology Evaluation Summary:")
    print(f"  Total ontologies evaluated: {len(evaluation_results)}")
    
    successful_evals = [r for r in evaluation_results if 'error' not in r]
    if successful_evals:
        avg_coverage = sum([r['coverage_score'] for r in successful_evals]) / len(successful_evals)
        avg_completeness = sum([r['completeness_score'] for r in successful_evals]) / len(successful_evals)
        print(f"  Successful evaluations: {len(successful_evals)}")
        print(f"  Average coverage score: {avg_coverage:.2f}")
        print(f"  Average completeness score: {avg_completeness:.2f}")
        
        # Show extraction context if available
        if 'extraction_context' in successful_evals[0]:
            ctx = successful_evals[0]['extraction_context']
            print(f"  Extraction context available:")
            print(f"    Extracted entities: {ctx.get('extracted_entities', 0)}")
            print(f"    Extracted relationships: {ctx.get('extracted_relationships', 0)}")
            print(f"    Extracted events: {ctx.get('extracted_events', 0)}")
    else:
        print("  No successful evaluations")
else:
    print("No ontology evaluation results available")

In [None]:
# PDF Document Parsing
from semantica.parse import PDFParser

pdf_parser = PDFParser()
pdf_docs = []
for pdf_path in sorted(DATA_DIR.glob('*.pdf')):
    try:
        with open(pdf_path, 'rb') as f:
            if f.read(4) != b'%PDF':
                print(f'Skipping non-PDF payload: {pdf_path.name}')
                continue

        parsed = pdf_parser.parse(pdf_path, pages=list(range(0, 12)))
        text = parsed.get('full_text', parsed.get('text', ''))
        if text:
            pdf_docs.append({
                'doc_id': pdf_path.stem,
                'source': str(pdf_path),
                'text': text[:50000],
                'metadata': parsed.get('metadata', {}),
            })
    except Exception as e:
        print(f'PDF parse failed for {pdf_path.name}: {e}')

print(f"PDF documents parsed: {len(pdf_docs)}")

## Multi-Format Parsing: DocumentParser + Optional DoclingParser

- Modules: `PDFParser`, `DocumentParser`, optional `DoclingParser`
- Parses PDF/documents.
- Extracts text and metadata.
- Uses Docling parser if available.

In [None]:
# Multi-Format Document Parsing
# Import document parsing classes where they are used

from semantica.parse import DocumentParser

doc_parser = DocumentParser()
doc_parser_preview = {}

if pdf_docs:
    sample_pdf = Path(pdf_docs[0]['source'])
    try:
        parsed_doc = doc_parser.parse_document(sample_pdf)
        doc_parser_preview = {
            'source': sample_pdf.name,
            'keys': list(parsed_doc.keys())[:10],
            'text_chars': len(parsed_doc.get('full_text', parsed_doc.get('text', '')) or ''),
        }
        print(f"Document parser preview for {sample_pdf.name}: {doc_parser_preview}")
    except Exception as e:
        doc_parser_preview = {'source': sample_pdf.name, 'error': str(e)}
        print(f"Document parser error: {e}")

# Check for Docling availability
import semantica.parse as parse_module
docling_preview = {'docling_available': bool(getattr(parse_module, 'DOCLING_AVAILABLE', False))}
if getattr(parse_module, 'DOCLING_AVAILABLE', False) and pdf_docs:
    try:
        from semantica.parse import DoclingParser
        docling_parser = DoclingParser(export_format='markdown')
        dres = docling_parser.parse(Path(pdf_docs[0]['source']))
        docling_preview['keys'] = list(dres.keys())[:10]
        docling_preview['text_chars'] = len(dres.get('full_text', dres.get('text', '')) or '')
        print(f"Docling parser available: {docling_preview}")
    except Exception as e:
        docling_preview['error'] = str(e)
        print(f"Docling parser error: {e}")
else:
    print("Docling parser not available")

print(f"Document parsing completed. Docling available: {docling_preview['docling_available']}")

In [None]:
# Ontology evaluation runs after semantic extraction so it can use the extracted context

In [None]:
# Create Document Corpus
import json
from pathlib import Path

# Fall back to an empty list if the web ingestion cell has not run
web_items = web_contents if 'web_contents' in locals() else []

corpus = (
    [
        {'doc_id': d['doc_id'], 'source': d['source'], 'text': d['text']}
        for d in pdf_docs
    ]
    + [
        {
            'doc_id': f'web_{i}',
            'source': getattr(w, 'url', f'web_source_{i}'),
            'text': (getattr(w, 'content', str(w)) or '')[:30000],
        }
        for i, w in enumerate(web_items)
    ]
    + [
        {
            'doc_id': Path(ont.source_path).stem,
            'source': ont.source_path,
            'text': json.dumps(ont.data, ensure_ascii=True)[:40000],
        }
        for ont in ontology_data
    ]
)

print(f"Corpus created:")
print(f"  Total documents: {len(corpus)}")
print(f"  Sample document IDs: {[c['doc_id'] for c in corpus[:5]]}")
print(f"  Document types: {len(pdf_docs)} PDFs, {len(web_items)} web, {len(ontology_data)} ontologies")

## Orchestration-Path Chunking (Decision-Time Context Capture)

- Modules: `TextSplitter`, `PipelineBuilder`
- Splits documents into chunks.
- Defines pipeline steps for ingest, split, extract, graph, export.
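
The role of `chunk_size` and `chunk_overlap` is easiest to see in a fixed-window splitter. The sketch below is a simplification written for illustration: `chunk_with_overlap` is a hypothetical helper, and a recursive splitter additionally prefers to cut at separators such as paragraph or sentence breaks, which this version ignores. The default values mirror the ones used in the next cell.

```python
def chunk_with_overlap(text, chunk_size=1800, chunk_overlap=250):
    """Split text into windows of chunk_size characters sharing chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks

# Synthetic 4000-character text so the window boundaries are easy to check
text = "".join(str(i % 10) for i in range(4000))
chunks = chunk_with_overlap(text)
print(len(chunks), [len(c) for c in chunks])
```

The overlap means the tail of one chunk is repeated at the head of the next, so an entity or relation that straddles a boundary still appears whole in at least one chunk, at the cost of some duplicated extraction work.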

In [None]:
# Text Splitting and Chunking
# Import splitting classes where they are used

from semantica.split import TextSplitter

splitter = TextSplitter(method='recursive', chunk_size=1800, chunk_overlap=250)

texts = [doc.get('text', '') for doc in corpus]
chunks_by_doc = splitter.split_batch(texts)

chunked_docs = []
for doc, chunks in zip(corpus, chunks_by_doc):
    for idx, ch in enumerate(chunks or []):
        chunked_docs.append({
            'doc_id': f"{doc['doc_id']}::chunk_{idx}",
            'source': doc['source'],
            'text': ch.text if hasattr(ch, 'text') else str(ch),
            'parent_doc_id': doc['doc_id'],
        })

extraction_corpus = chunked_docs if chunked_docs else corpus

print(f"Text splitting completed:")
print(f"  Original documents: {len(corpus)}")
print(f"  Chunked documents: {len(chunked_docs)}")
print(f"  Extraction corpus size: {len(extraction_corpus)}")

## Split Strategies (Semantic / Structural / Entity-Aware)

- Modules: `SemanticChunker`, `StructuralChunker`, `TextSplitter` (`entity_aware`)
- Runs multiple split strategies.
- Compares chunk counts/output.

In [None]:
# Advanced Splitting Strategies
# Import advanced splitting classes where they are used

from semantica.split import SemanticChunker, StructuralChunker

split_strategy_preview = {}
if corpus:
    sample_text = corpus[0]['text'][:12000]
    print("Testing different splitting strategies...")
    
    try:
        semantic_chunker = SemanticChunker(chunk_size=1200, chunk_overlap=200)
        sem_chunks = semantic_chunker.chunk(sample_text)
        split_strategy_preview['semantic_chunks'] = len(sem_chunks)
        print(f"  Semantic chunks: {len(sem_chunks)}")
    except Exception as e:
        split_strategy_preview['semantic_chunks_error'] = str(e)
        print(f"  Semantic chunking error: {e}")

    try:
        structural_chunker = StructuralChunker(chunk_size=1200, chunk_overlap=150)
        st_chunks = structural_chunker.chunk(sample_text)
        split_strategy_preview['structural_chunks'] = len(st_chunks)
        print(f"  Structural chunks: {len(st_chunks)}")
    except Exception as e:
        split_strategy_preview['structural_chunks_error'] = str(e)
        print(f"  Structural chunking error: {e}")

    try:
        from semantica.split import TextSplitter
        ea_splitter = TextSplitter(method=['entity_aware', 'recursive'], chunk_size=1200, chunk_overlap=150)
        ea_chunks = ea_splitter.split(sample_text)
        split_strategy_preview['entity_aware_chunks'] = len(ea_chunks)
        print(f"  Entity-aware chunks: {len(ea_chunks)}")
    except Exception as e:
        split_strategy_preview['entity_aware_chunks_error'] = str(e)
        print(f"  Entity-aware chunking error: {e}")

print("Splitting strategy comparison completed!")

In [None]:
# Data Processing Sequence Overview
# This cell prints the processing sequence the notebook follows

print("CAPABILITY GAP ANALYSIS - PROCESSING SEQUENCE")
print("=" * 55)
print("1. Data Ingestion (Files, Web, Ontologies)")
print("2. Document Parsing (PDF, Document Processing)")
print("3. Text Splitting & Chunking")
print("4. Text Normalization")
print("5. Semantic Extraction (Entities, Relationships, Events)")
print("6. Entity Resolution & Deduplication")
print("7. Knowledge Graph Construction")
print("8. Vector Store Configuration")
print("9. Context Graph Creation")
print("10. Decision Recording & Tracking")
print("11. Policy Engine & Compliance")
print("12. Analytics & Insights")
print("13. Export & Reporting")
print("=" * 55)
print("✓ Each step is implemented in its own cell in this notebook.")
print("✓ Deduplication runs after semantic extraction.")

In [None]:
# Execute Pipeline in Correct Sequence
# This cell demonstrates the actual execution flow matching the corrected pipeline design

print("=" * 60)
print("EXECUTING CAPABILITY GAP ANALYSIS PIPELINE")
print("=" * 60)

# Step 1: Data Ingestion (Already completed in previous cells)
print("\n1. DATA INGESTION")
print("   ✓ Files ingested:", len(file_objects) if 'file_objects' in locals() else 0)
print("   ✓ Web content ingested:", len(web_contents) if 'web_contents' in locals() else 0)
print("   ✓ Ontologies ingested:", len(ontology_data) if 'ontology_data' in locals() else 0)

# Step 2: Document Parsing (Already completed)
print("\n2. DOCUMENT PARSING")
print("   ✓ PDF documents parsed:", len(pdf_docs) if 'pdf_docs' in locals() else 0)
print("   ✓ Document parser applied:", 'doc_parser_preview' in locals())

# Step 3: Text Splitting and Chunking (Already completed)
print("\n3. TEXT SPLITTING AND CHUNKING")
print("   ✓ Documents chunked:", len(chunked_docs) if 'chunked_docs' in locals() else 0)
print("   ✓ Extraction corpus ready:", len(extraction_corpus) if 'extraction_corpus' in locals() else 0)

# Step 4: Text Normalization (reported only if the normalization cell has run)
print("\n4. TEXT NORMALIZATION")
if 'normalized_extraction_corpus' in locals():
    print("   ✓ Documents normalized:", len(normalized_extraction_corpus))
    print("   ✓ Language detection applied")
else:
    print("   ⚠ Text normalization not yet completed")

# Step 5: Semantic Extraction (reported only if the extraction cell has run)
print("\n5. SEMANTIC EXTRACTION")
if 'extraction_summary' in locals():
    print("   ✓ Entities extracted:", extraction_summary.get('entities', 0))
    print("   ✓ Relationships extracted:", extraction_summary.get('relationships', 0))
    print("   ✓ Events detected:", extraction_summary.get('events', 0))
    print("   ✓ Triplets extracted:", extraction_summary.get('triplets', 0))
else:
    print("   ⚠ Semantic extraction not yet completed")

# Step 6: Entity Resolution & Deduplication (Now correctly positioned)
print("\n6. ENTITY RESOLUTION & DEDUPLICATION")
if 'deduplication_stats' in locals():
    print("   ✓ Entities deduplicated:", deduplication_stats.get('deduplicated_entities', 0))
    print("   ✓ Relationships deduplicated:", deduplication_stats.get('deduplicated_relationships', 0))
    print("   ✓ Events deduplicated:", deduplication_stats.get('deduplicated_events', 0))
    print("   ✓ Entity deduplication rate:", deduplication_stats.get('entity_deduplication_rate', 0), "%")
    print("   ✓ Relationship deduplication rate:", deduplication_stats.get('relationship_deduplication_rate', 0), "%")
else:
    print("   ⚠ Deduplication not yet completed")

# Step 7: Knowledge Graph Construction
print("\n7. KNOWLEDGE GRAPH CONSTRUCTION")
if 'kg' in locals():
    print("   ✓ Knowledge graph built")
    print("   ✓ KG entities:", len(kg.get('entities', [])))
    print("   ✓ KG relationships:", len(kg.get('relationships', [])))
else:
    print("   ⚠ Knowledge graph not yet built")

# Step 8: Vector Store Configuration
print("\n8. VECTOR STORE CONFIGURATION")
if 'vector_store' in locals():
    print("   ✓ Vector store configured")
    print("   ✓ Store type:", type(vector_store).__name__)
else:
    print("   ⚠ Vector store not yet configured")

# Step 9: Context Graph Creation
print("\n9. CONTEXT GRAPH CREATION")
if 'context_graph' in locals():
    print("   ✓ Context graph created")
    if hasattr(context_graph, 'stats'):
        stats = context_graph.stats()
        print("   ✓ Context nodes:", stats.get('nodes', 0))
        print("   ✓ Context edges:", stats.get('edges', 0))
else:
    print("   ⚠ Context graph not yet created")

# Step 10: Decision Recording and Tracking
print("\n10. DECISION RECORDING AND TRACKING")
if 'agent_context' in locals() and 'decisions' in locals():
    print("   ✓ Decisions recorded:", len(decisions))
    for i, decision in enumerate(decisions[:3], 1):
        print(f"     {i}. {decision.category}: {decision.outcome}")
else:
    print("   ⚠ Decision tracking not yet completed")

# Step 11: Policy Engine and Compliance
print("\n11. POLICY ENGINE & COMPLIANCE")
if 'policy_engine' in locals():
    print("   ✓ Policy engine initialized")
    print("   ✓ Policy compliance checking available")
else:
    print("   ⚠ Policy engine not yet initialized")

# Step 12: Analytics and Insights
print("\n12. ANALYTICS AND INSIGHTS")
if 'dashboard_summary' in locals():
    print("   ✓ Analytics dashboard generated")
    print("   ✓ KPIs calculated:", list(kpis.keys()) if 'kpis' in locals() else [])
else:
    print("   ⚠ Analytics not yet generated")

# Step 13: Export and Reporting
print("\n13. EXPORT AND REPORTING")
if 'export_summary' in locals():
    print("   ✓ Files exported:", export_summary.get('files_exported', 0))
    print("   ✓ Total size:", export_summary.get('total_size_mb', 0), "MB")
else:
    print("   ⚠ Export not yet completed")

print("\n" + "=" * 60)
print("PIPELINE EXECUTION SUMMARY")
print("=" * 60)
print("The pipeline follows this data processing sequence:")
print("Ingestion → Parsing → Splitting → Normalization → Extraction →")
print("Deduplication → KG Build → Vector Store → Context Graph →")
print("Decisions → Policy Compliance → Analytics → Export")
print("=" * 60)
print("✓ Deduplication runs after semantic extraction")
print("✓ Vector store is configured after the knowledge graph is built")
print("✓ Steps marked ⚠ above have not been run yet in this session")

## Normalization Layer (Text, Entity, Date, Number, Language, Encoding)

- Modules: `TextNormalizer`, `EntityNormalizer`, `DateNormalizer`, `NumberNormalizer`, `LanguageDetector`, `EncodingHandler`, `TextCleaner`
- Cleans and normalizes text.
- Normalizes entities, date/time, and numeric values.
- Detects language and handles encoding.
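
To show what date and number normalization produce, here is a standard-library sketch for the two demo values used in the next cell. It is not the `DateNormalizer` or `NumberNormalizer` implementation: the helper names are hypothetical, the date parser accepts only the single `DD Mon YYYY HH:MM UTC` layout, and `%b` assumes an English locale.

```python
from datetime import datetime, timezone

def normalize_date_sketch(raw):
    """Parse 'DD Mon YYYY HH:MM UTC' and return an ISO 8601 string in UTC."""
    parsed = datetime.strptime(raw, "%d %b %Y %H:%M UTC")
    return parsed.replace(tzinfo=timezone.utc).isoformat()

def normalize_number_sketch(raw):
    """Convert a numeric string to float; percentages become fractions."""
    cleaned = raw.strip().replace(",", "")
    if cleaned.endswith("%"):
        return float(cleaned[:-1]) / 100.0
    return float(cleaned)

iso_date = normalize_date_sketch("12 Apr 2028 05:15 UTC")
fraction = normalize_number_sketch("42.0%")
print(iso_date, fraction)
```

Normalizing to ISO 8601 timestamps and plain floats before extraction lets later steps compare and sort values from different documents without re-parsing each source's formatting.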

In [None]:
# Text Normalization Layer
# Import normalization classes where they are used

from semantica.normalize import TextNormalizer, EntityNormalizer, DateNormalizer, NumberNormalizer
from semantica.normalize import LanguageDetector, EncodingHandler, TextCleaner
import semantica.normalize as normalize_module

text_normalizer = TextNormalizer()
entity_normalizer = EntityNormalizer()
date_normalizer = DateNormalizer()
number_normalizer = NumberNormalizer()
language_detector = LanguageDetector(default_language='en')
encoding_handler = EncodingHandler()
text_cleaner = TextCleaner()

print("Starting text normalization...")
normalized_extraction_corpus = []
for item in extraction_corpus:
    txt = item.get('text', '')

    cleaned = normalize_module.clean_text(txt, method='default') if txt else ''
    normalized_text = normalize_module.normalize_text(cleaned, method='default') if cleaned else ''

    lang = normalize_module.detect_language(normalized_text, method='default') if normalized_text else 'en'
    _ = normalize_module.handle_encoding(normalized_text, method='default') if normalized_text else normalized_text

    normalized_extraction_corpus.append({
        **item,
        'text': normalized_text,
        'language': lang,
    })

extraction_corpus = normalized_extraction_corpus

# Demonstrate normalization capabilities
demo_date = date_normalizer.normalize_date('12 Apr 2028 05:15 UTC')
demo_num = number_normalizer.normalize_number('42.0%')
demo_entity = entity_normalizer.normalize_entity('ground radar layer', entity_type='System')

print(f"Normalization completed:")
print(f"  Documents normalized: {len(extraction_corpus)}")
print(f"  Demo date normalization: {demo_date}")
print(f"  Demo number normalization: {demo_num}")
print(f"  Demo entity normalization: {demo_entity}")

In [None]:
# Semantic Extraction Layer
# Import semantic extraction classes where they are used

from semantica.semantic_extract import NamedEntityRecognizer, RelationExtractor, EventDetector
from semantica.semantic_extract import CoreferenceResolver, TripletExtractor, SemanticAnalyzer
from semantica.semantic_extract import SemanticNetworkExtractor, ExtractionValidator

ner = NamedEntityRecognizer(method='pattern', confidence_threshold=0.2)
relation_extractor = RelationExtractor(method='pattern', confidence_threshold=0.2)
event_detector = EventDetector()
coref_resolver = CoreferenceResolver()
triplet_extractor = TripletExtractor(method='pattern', include_provenance=True)
semantic_analyzer = SemanticAnalyzer()
semantic_network_extractor = SemanticNetworkExtractor()
validator = ExtractionValidator()

print("Starting semantic extraction...")
texts = [item.get('text', '') for item in extraction_corpus if item.get('text')]
resolved_texts = [coref_resolver.resolve(t) for t in texts]

entities_batch = ner.process_batch(resolved_texts)
triplets_batch = triplet_extractor.process_batch(resolved_texts)
relations_batch = [relation_extractor.extract_relations(t, entities=e) for t, e in zip(resolved_texts, entities_batch)]
events_batch = [event_detector.detect_events(t) for t in resolved_texts]

all_entities = [e for batch in entities_batch for e in batch]
all_relationships = [r for batch in relations_batch for r in batch]
all_events = [ev for batch in events_batch for ev in batch]
all_triplets = [tr for batch in triplets_batch for tr in batch]

# Validate extractions
_ = validator.validate_entities(all_entities)
_ = validator.validate_relations(all_relationships)

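# Generate semantic networks
# resolved_texts skips items with empty text, so pair it with the same filtered
# items to keep each doc_id aligned with its text, entities, and relations
docs_with_text = [item for item in extraction_corpus if item.get('text')]
semantic_networks = [
    {
        'doc_id': doc.get('doc_id', f'doc_{i}'),
        'analysis': semantic_analyzer.analyze(text),
        'network': semantic_network_extractor.extract(text, entities=entities_batch[i], relations=relations_batch[i]),
    }
    for i, (doc, text) in enumerate(zip(docs_with_text, resolved_texts))
]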

extraction_summary = {
    'entities': len(all_entities),
    'relationships': len(all_relationships),
    'events': len(all_events),
    'triplets': len(all_triplets),
    'semantic_networks': len(semantic_networks),
    'documents_processed': len(resolved_texts),
}

print("Semantic extraction completed:")
for key, value in extraction_summary.items():
    print(f"  {key}: {value}")

In [None]:
# Ontology Evaluation with Competency Questions
# Import ontology evaluation classes where they are used - positioned after semantic extraction

from semantica.ontology import OntologyEvaluator

# Initialize Ontology Evaluator
ontology_evaluator = OntologyEvaluator(
    strict_mode=True,
    include_inference=True,
    validation_rules='comprehensive'
)

print("Starting ontology evaluation...")

# Define Competency Questions for Capability Gap Analysis
competency_questions = [
    "What capability gaps are revealed for a mission thread?",
    "Which systems provide required capabilities?", 
    "What evidence and provenance support a gap decision?",
    "Which precedents and exceptions affected a decision?",
    "How do capability gaps impact mission outcomes?",
    "What mitigation strategies are available for identified gaps?",
    "How are capability gaps prioritized and escalated?",
    "What cross-system dependencies affect capability gaps?"
]

# Ontology Evaluation Results
evaluation_results = []

if ontology_data:
    for i, ontology in enumerate(ontology_data):
        try:
            # Evaluate each ontology
            eval_result = ontology_evaluator.evaluate_ontology(
                ontology_data=ontology.data,
                competency_questions=competency_questions,
                use_case='capability_gap_analysis',
                domain='defense_planning'
            )
            
            evaluation_results.append({
                'ontology_index': i,
                'ontology_name': getattr(ontology, 'source_path', f'ontology_{i}'),
                'coverage_score': eval_result.coverage_score,
                'completeness_score': eval_result.completeness_score,
                'consistency_score': getattr(eval_result, 'consistency_score', 0.0),
                'total_questions': len(competency_questions),
                'answered_questions': len([q for q in eval_result.question_results if q.answered]),
                'identified_gaps': eval_result.gaps[:5] if hasattr(eval_result, 'gaps') else [],
                'recommendations': eval_result.recommendations[:3] if hasattr(eval_result, 'recommendations') else []
            })
            
            print(f"Ontology {i+1} evaluated successfully")
            
        except Exception as e:
            print(f"Ontology {i+1} evaluation failed: {e}")
            evaluation_results.append({
                'ontology_index': i,
                'ontology_name': getattr(ontology, 'source_path', f'ontology_{i}'),
                'error': str(e)
            })

# Enhanced Evaluation with Extracted Data Context
print("\nComparing ontology with extracted semantic data...")
if 'extraction_summary' in locals() and evaluation_results:
    extracted_entities_count = extraction_summary.get('entities', 0)
    extracted_relationships_count = extraction_summary.get('relationships', 0)
    extracted_events_count = extraction_summary.get('events', 0)
    
    print("Extracted semantic data context:")
    print(f"  Entities found: {extracted_entities_count}")
    print(f"  Relationships found: {extracted_relationships_count}")
    print(f"  Events found: {extracted_events_count}")
    
    # Update evaluation results with extraction context
    for result in evaluation_results:
        if 'error' not in result:
            result['extraction_context'] = {
                'extracted_entities': extracted_entities_count,
                'extracted_relationships': extracted_relationships_count,
                'extracted_events': extracted_events_count,
                'ontology_coverage_ratio': result.get('coverage_score', 0.0) / max(1, extracted_entities_count / 100)
            }

# Aggregate Evaluation Metrics
if evaluation_results:
    total_ontologies = len(evaluation_results)
    successful_evaluations = len([r for r in evaluation_results if 'error' not in r])
    
    if successful_evaluations > 0:
        avg_coverage = sum([r['coverage_score'] for r in evaluation_results if 'coverage_score' in r]) / successful_evaluations
        avg_completeness = sum([r['completeness_score'] for r in evaluation_results if 'completeness_score' in r]) / successful_evaluations
        
        print("\nOntology Evaluation Summary:")
        print(f"  Total Ontologies: {total_ontologies}")
        print(f"  Successful Evaluations: {successful_evaluations}")
        print(f"  Average Coverage Score: {avg_coverage:.2f}")
        print(f"  Average Completeness Score: {avg_completeness:.2f}")
        
        # Show best performing ontology
        best_ontology = max([r for r in evaluation_results if 'coverage_score' in r], 
                          key=lambda x: x['coverage_score'])
        print(f"  Best Performing Ontology: {best_ontology['ontology_name']}")
        print(f"    Coverage Score: {best_ontology['coverage_score']:.2f}")
        print(f"    Questions Answered: {best_ontology['answered_questions']}/{best_ontology['total_questions']}")
        
        # Show extraction context if available
        if 'extraction_context' in best_ontology:
            ctx = best_ontology['extraction_context']
            print(f"    Extraction Context Coverage: {ctx.get('ontology_coverage_ratio', 0):.2f}")
    else:
        print("No successful ontology evaluations")
else:
    print("No ontology data available for evaluation")

# Detailed Gap Analysis
def analyze_ontology_gaps(evaluation_results):
    """Analyze common gaps across ontologies"""
    all_gaps = []
    gap_frequency = {}
    
    for result in evaluation_results:
        if 'identified_gaps' in result:
            for gap in result['identified_gaps']:
                gap_description = str(gap)
                all_gaps.append(gap_description)
                gap_frequency[gap_description] = gap_frequency.get(gap_description, 0) + 1
    
    # Sort gaps by frequency
    common_gaps = sorted(gap_frequency.items(), key=lambda x: x[1], reverse=True)
    
    return {
        'total_gaps': len(all_gaps),
        'unique_gaps': len(gap_frequency),
        'most_common_gaps': common_gaps[:5]
    }

gap_analysis = analyze_ontology_gaps(evaluation_results)
if gap_analysis['unique_gaps'] > 0:
    print("\nGap Analysis:")
    print(f"  Total Gaps Identified: {gap_analysis['total_gaps']}")
    print(f"  Unique Gap Types: {gap_analysis['unique_gaps']}")
    print("  Most Common Gaps:")
    for gap, freq in gap_analysis['most_common_gaps']:
        print(f"    - {gap} (appears in {freq} ontologies)")

print("\nOntology evaluation completed!")

# Entity Resolution & Deduplication
# Runs after semantic extraction and before knowledge graph construction

from semantica.deduplication import EntityDeduplicator, RelationshipDeduplicator

# Initialize deduplication classes
entity_deduplicator = EntityDeduplicator(
    similarity_threshold=0.85,
    algorithm='fuzzy_matching',
    use_embeddings=True
)

relationship_deduplicator = RelationshipDeduplicator(
    similarity_threshold=0.9,
    consider_context=True
)

print("Starting entity resolution and deduplication...")

# Entity Deduplication
def deduplicate_entities(entities):
    """Remove duplicate entities using fuzzy matching and embeddings"""
    if not entities:
        return []
    
    # Convert entities to standard format
    entity_records = []
    for ent in entities:
        record = {
            'id': getattr(ent, 'id', f"entity_{len(entity_records)}"),
            'name': getattr(ent, 'text', getattr(ent, 'name', '')),
            'type': getattr(ent, 'label', getattr(ent, 'type', 'entity')),
            'metadata': getattr(ent, 'metadata', {})
        }
        entity_records.append(record)
    
    # Perform deduplication
    try:
        unique_entities = entity_deduplicator.deduplicate(entity_records)
        return unique_entities
    except Exception as e:
        print(f"Entity deduplication failed: {e}")
        return entity_records

# Relationship Deduplication
def deduplicate_relationships(relationships):
    """Remove duplicate relationships"""
    if not relationships:
        return []
    
    # Convert to standard format
    relationship_records = []
    for rel in relationships:
        record = {
            'id': getattr(rel, 'id', f"rel_{len(relationship_records)}"),
            'source': getattr(getattr(rel, 'subject', None), 'id', getattr(rel, 'source', '')),
            'target': getattr(getattr(rel, 'object', None), 'id', getattr(rel, 'target', '')),
            'type': getattr(rel, 'predicate', getattr(rel, 'type', 'related_to')),
            'metadata': getattr(rel, 'metadata', {})
        }
        relationship_records.append(record)
    
    # Perform deduplication
    try:
        unique_relationships = relationship_deduplicator.deduplicate(relationship_records)
        return unique_relationships
    except Exception as e:
        print(f"Relationship deduplication failed: {e}")
        return relationship_records

# Apply Deduplication to Extracted Data
print("Deduplicating entities...")
if 'all_entities' in locals():
    unique_entities = deduplicate_entities(all_entities)
    print(f"  Entities: {len(all_entities)} → {len(unique_entities)} ({len(all_entities)-len(unique_entities)} duplicates removed)")
else:
    unique_entities = []
    print("  No entities available for deduplication")

print("Deduplicating relationships...")
if 'all_relationships' in locals():
    unique_relationships = deduplicate_relationships(all_relationships)
    print(f"  Relationships: {len(all_relationships)} → {len(unique_relationships)} ({len(all_relationships)-len(unique_relationships)} duplicates removed)")
else:
    unique_relationships = []
    print("  No relationships available for deduplication")

print("Deduplicating events...")
if 'all_events' in locals():
    # Simple event deduplication based on event type and timestamp
    unique_events = []
    seen_events = set()
    for event in all_events:
        event_key = (getattr(event, 'type', 'unknown'), getattr(event, 'timestamp', ''))
        if event_key not in seen_events:
            seen_events.add(event_key)
            unique_events.append(event)
    print(f"  Events: {len(all_events)} → {len(unique_events)} ({len(all_events)-len(unique_events)} duplicates removed)")
else:
    unique_events = []
    print("  No events available for deduplication")

# Capture the original counts before the globals are overwritten with deduplicated data
original_entity_count = len(all_entities) if 'all_entities' in locals() else 0
original_relationship_count = len(all_relationships) if 'all_relationships' in locals() else 0
original_event_count = len(all_events) if 'all_events' in locals() else 0

# Update global variables with deduplicated data
all_entities = unique_entities
all_relationships = unique_relationships
all_events = unique_events


def dedup_rate(original_count, unique_count):
    """Percentage of records removed by deduplication (0 when there was no input)."""
    if original_count == 0:
        return 0
    return round((original_count - unique_count) / original_count * 100, 2)


# Deduplication Statistics
deduplication_stats = {
    'original_entities': original_entity_count,
    'deduplicated_entities': len(unique_entities),
    'original_relationships': original_relationship_count,
    'deduplicated_relationships': len(unique_relationships),
    'original_events': original_event_count,
    'deduplicated_events': len(unique_events),
    'entity_deduplication_rate': dedup_rate(original_entity_count, len(unique_entities)),
    'relationship_deduplication_rate': dedup_rate(original_relationship_count, len(unique_relationships)),
    'event_deduplication_rate': dedup_rate(original_event_count, len(unique_events)),
}

print("\nDeduplication Statistics:")
for stat, value in deduplication_stats.items():
    print(f"  {stat}: {value}")

print("\nEntity resolution and deduplication completed!")
print("Data is now ready for knowledge graph construction.")
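
The `similarity_threshold=0.85` setting used above can be illustrated without Semantica. The sketch below is not how `EntityDeduplicator` is implemented; it is a minimal stand-in that assumes "fuzzy matching" means comparing normalized names with the standard library's `difflib.SequenceMatcher` and keeping the first record of each near-duplicate group. The helper names (`name_similarity`, `dedupe_by_name`) and the sample records are hypothetical.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two case-normalized names."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def dedupe_by_name(records, threshold=0.85):
    """Keep the first record of each group of same-type, near-identical names."""
    kept = []
    for record in records:
        is_duplicate = any(
            record["type"] == seen["type"]
            and name_similarity(record["name"], seen["name"]) >= threshold
            for seen in kept
        )
        if not is_duplicate:
            kept.append(record)
    return kept


# Made-up records for illustration only
sample_records = [
    {"id": "e1", "name": "Ground Radar Layer", "type": "system"},
    {"id": "e2", "name": "ground radar layer ", "type": "system"},
    {"id": "e3", "name": "Ground Radar Layers", "type": "system"},
    {"id": "e4", "name": "Low Altitude Detection", "type": "capability"},
]

unique_sample = dedupe_by_name(sample_records)
removed = len(sample_records) - len(unique_sample)
print(f"{len(sample_records)} -> {len(unique_sample)} ({removed} duplicates removed)")
```

Raising the threshold toward 1.0 keeps more near-variants as separate entities; lowering it merges more aggressively and raises the risk of collapsing distinct systems with similar names.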

In [None]:
from semantica.kg import EntityResolver
import semantica.conflicts as conflicts_module

entity_dicts = []
for e in all_entities:
    entity_dicts.append({
        'id': str(getattr(e, 'id', getattr(e, 'text', 'unknown'))),
        'name': str(getattr(e, 'text', getattr(e, 'id', 'unknown'))),
        'type': str(getattr(e, 'label', getattr(e, 'type', 'entity'))),
        'metadata': getattr(e, 'metadata', {}) or {}
    })

entity_resolver = EntityResolver(strategy='fuzzy')
resolved_entities = entity_resolver.resolve_entities(entity_dicts[:200]) if entity_dicts else []

conflict_rows = [
    {'id': 'System_GroundRadarLayer', 'coveragePercent': '42', 'type': 'system'},
    {'id': 'System_GroundRadarLayer', 'coveragePercent': '58', 'type': 'system'},
]
conflicts = conflicts_module.detect_conflicts(conflict_rows, method='value', property_name='coveragePercent')
resolved_conflicts = conflicts_module.resolve_conflicts(conflicts, method=conflicts_module.voting) if conflicts else []

{
    'entities_before_resolution': len(entity_dicts[:200]),
    'entities_after_resolution': len(resolved_entities),
    'conflicts_detected': len(conflicts),
    'conflicts_resolved': len(resolved_conflicts),
}
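
As a rough picture of what value-conflict detection and voting involve, the sketch below groups rows by `id`, flags properties that carry more than one distinct value, and picks the most frequent one. It is an assumption-laden illustration, not the behaviour of `semantica.conflicts`; `detect_value_conflicts` and `resolve_by_vote` are hypothetical helpers, and the third report row is made up.

```python
from collections import Counter, defaultdict


def detect_value_conflicts(rows, property_name):
    """Map each id to its reported values when more than one distinct value exists."""
    values_by_id = defaultdict(list)
    for row in rows:
        if property_name in row:
            values_by_id[row["id"]].append(row[property_name])
    return {k: v for k, v in values_by_id.items() if len(set(v)) > 1}


def resolve_by_vote(values):
    """Return the most frequent value, or None when the top count is tied."""
    ranked = Counter(values).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Illustrative reports; the third row is invented so that a majority exists
reports = [
    {"id": "System_GroundRadarLayer", "coveragePercent": "42"},
    {"id": "System_GroundRadarLayer", "coveragePercent": "58"},
    {"id": "System_GroundRadarLayer", "coveragePercent": "42"},
]

found = detect_value_conflicts(reports, "coveragePercent")
resolved = {entity_id: resolve_by_vote(values) for entity_id, values in found.items()}
tie = resolve_by_vote(["42", "58"])
print(resolved, tie)
```

With only the two rows used in the cell above (42 versus 58), a plain majority vote is tied, so a resolver needs a tie-break such as source credibility or recency. The notebook does not show how Semantica handles that case.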


In [None]:
# Vector Store Configuration for Semantic Search
# Configures an Apache AGE-backed store with an in-memory fallback

from semantica.vector_store import VectorStore, ApacheAGEVectorStore

# Apache AGE Connection Configuration
age_config = {
    'host': 'localhost',
    'port': 5432,
    'database': 'postgres',
    'user': 'postgres',
    'password': 'password',
    'graph_name': 'capability_gap_analysis',
    'vector_dimension': 768,  # For sentence-transformers embeddings
    'vector_index_type': 'ivfflat',  # IVF Flat index for efficient search
    'similarity_metric': 'cosine'
}

# Initialize Vector Store for Semantic Search
try:
    # Try to connect to Apache AGE
    age_vector_store = ApacheAGEVectorStore(**age_config)
    
    # Create graph if it doesn't exist
    age_vector_store.create_graph(
        graph_name=age_config['graph_name'],
        vector_dimension=age_config['vector_dimension']
    )
    
    # Setup vector index for semantic search
    age_vector_store.create_vector_index(
        index_name=f"{age_config['graph_name']}_vector_idx",
        index_type=age_config['vector_index_type']
    )
    
    vector_store = age_vector_store
    print("Apache AGE Vector Store connected and configured!")
    print(f"Graph: {age_config['graph_name']}")
    print(f"Vector Dimension: {age_config['vector_dimension']}")
    print(f"Index Type: {age_config['vector_index_type']}")
    
except Exception as e:
    print(f"Apache AGE not available, falling back to in-memory vector store: {e}")
    
    # Fallback to in-memory vector store
    vector_store = VectorStore(
        backend='inmemory',
        dimension=768,
        similarity_metric='cosine'
    )
    print("In-memory Vector Store initialized as fallback")

# Test vector store capabilities
def test_vector_store_capabilities(vector_store_instance):
    """Test basic vector store operations"""
    try:
        # Test embedding generation
        test_texts = [
            "Low altitude detection capability gap",
            "Force protection under swarm pressure",
            "A2/AD environment scenario analysis"
        ]
        
        # Store test vectors
        stored_ids = []
        for i, text in enumerate(test_texts):
            doc_id = vector_store_instance.store(
                content=text,
                metadata={'test': True, 'index': i}
            )
            stored_ids.append(doc_id)
        
        # Test semantic search
        search_results = vector_store_instance.search(
            query="capability gap detection",
            limit=3
        )
        
        return {
            'status': 'success',
            'stored_count': len(stored_ids),
            'search_results': len(search_results),
            'store_type': type(vector_store_instance).__name__
        }
        
    except Exception as e:
        return {
            'status': 'error',
            'error': str(e),
            'store_type': type(vector_store_instance).__name__
        }

# Test the vector store
test_result = test_vector_store_capabilities(vector_store)
print(f"Vector Store Test: {test_result}")

print("Vector store ready for semantic search and context graph integration!")
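
To make the store-then-search test above concrete, here is a toy in-memory store that ranks documents by cosine similarity. It is only a sketch: the word-count `embed` function stands in for a real embedding model, and `TinyVectorStore` is a hypothetical class, not Semantica's `VectorStore`.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorStore:
    """Hypothetical minimal store: keeps (content, vector, metadata) tuples in a list."""

    def __init__(self):
        self._items = []

    def store(self, content, metadata=None):
        self._items.append((content, embed(content), metadata or {}))
        return len(self._items) - 1

    def search(self, query, limit=3):
        q = embed(query)
        scored = [(cosine(q, vec), content) for content, vec, _ in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:limit]


store = TinyVectorStore()
for text in [
    "Low altitude detection capability gap",
    "Force protection under swarm pressure",
    "A2/AD environment scenario analysis",
]:
    store.store(text)

top_hits = store.search("capability gap detection", limit=2)
print(top_hits)
```

A word-count embedding only matches exact tokens, which is why the other two test texts score zero here; a dense sentence embedding would also give partial credit to related wording.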

In [None]:
from semantica.kg import GraphBuilder, GraphAnalyzer

graph_builder = GraphBuilder(merge_entities=True, resolve_conflicts=True)
kg = graph_builder.build([{'entities': all_entities, 'relationships': all_relationships}], extract=False)

graph_analyzer = GraphAnalyzer()
kg_analysis = graph_analyzer.analyze_graph(kg)

{
    'kg_entities': len(kg.get('entities', [])),
    'kg_relationships': len(kg.get('relationships', [])),
    'has_analysis': bool(kg_analysis),
}

## KG Analytics (Centrality, Communities, Connectivity, Similarity, Link Prediction)

- Modules: `CentralityCalculator`, `CommunityDetector`, `ConnectivityAnalyzer`, `SimilarityCalculator`, `LinkPredictor`, `NodeEmbedder`
- Computes centrality, community structure, connectivity, similarity, and link predictions over the knowledge graph built above.
- Checks node embedding availability.
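
For readers who want to see what two of these metrics measure, the sketch below computes degree centrality and connected components in plain Python. It assumes the knowledge graph is a dict with `entities` (each carrying an `id`) and `relationships` (each carrying `source` and `target`), which matches how `kg` is read in this notebook but may not match every Semantica version. The functions and sample graph are illustrative and are not the `CentralityCalculator` or `ConnectivityAnalyzer` implementations.

```python
from collections import deque


def build_adjacency(kg):
    """Build an undirected adjacency map from a {'entities', 'relationships'} dict."""
    adjacency = {entity["id"]: set() for entity in kg.get("entities", [])}
    for rel in kg.get("relationships", []):
        source, target = rel["source"], rel["target"]
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    return adjacency


def degree_centrality(adjacency):
    """Degree divided by (n - 1), the usual normalization."""
    n = len(adjacency)
    if n <= 1:
        return {node: 0.0 for node in adjacency}
    return {node: len(neigh) / (n - 1) for node, neigh in adjacency.items()}


def connected_components(adjacency):
    """Group nodes into components with breadth-first search."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        queue, component = deque([start]), set()
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adjacency[node] - seen:
                seen.add(neighbor)
                queue.append(neighbor)
        components.append(component)
    return components


# Made-up sample graph; the scenario node is left unconnected on purpose
sample_kg = {
    "entities": [
        {"id": "Scenario_FutureA2AD_2028"},
        {"id": "MissionThread_ForceProtection"},
        {"id": "Capability_LowAltitudeDetection"},
        {"id": "System_GroundRadarLayer"},
        {"id": "Gap_LowAltitudeDetectionCoverage"},
    ],
    "relationships": [
        {"source": "MissionThread_ForceProtection", "target": "Capability_LowAltitudeDetection"},
        {"source": "System_GroundRadarLayer", "target": "Capability_LowAltitudeDetection"},
        {"source": "Capability_LowAltitudeDetection", "target": "Gap_LowAltitudeDetectionCoverage"},
    ],
}

adjacency = build_adjacency(sample_kg)
centrality = degree_centrality(adjacency)
components = connected_components(adjacency)
most_central = max(centrality, key=centrality.get)
print(most_central, centrality[most_central], len(components))
```

An isolated node such as the scenario in this sample shows up as its own component, which is a quick signal that a link in the Scenario to Mission Thread chain was not extracted.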

In [None]:
from semantica.kg import CentralityCalculator, CommunityDetector, ConnectivityAnalyzer
from semantica.kg import SimilarityCalculator, LinkPredictor

extended_kg_analytics = {}

try:
    centrality_calc = CentralityCalculator()
    cent = centrality_calc.calculate_all_centrality(kg)
    extended_kg_analytics['centrality_keys'] = list(cent.keys())[:10]
except Exception as e:
    extended_kg_analytics['centrality_error'] = str(e)

try:
    community_detector = CommunityDetector()
    comm = community_detector.detect_communities(kg, algorithm='louvain')
    extended_kg_analytics['community_count'] = comm.get('num_communities', None) if isinstance(comm, dict) else None
except Exception as e:
    extended_kg_analytics['community_error'] = str(e)

try:
    connectivity_analyzer = ConnectivityAnalyzer()
    conn = connectivity_analyzer.analyze_connectivity(kg)
    extended_kg_analytics['connectivity_keys'] = list(conn.keys())[:10] if isinstance(conn, dict) else []
except Exception as e:
    extended_kg_analytics['connectivity_error'] = str(e)

try:
    sim_calc = SimilarityCalculator(method='cosine')
    extended_kg_analytics['sample_cosine_similarity'] = sim_calc.cosine_similarity([1.0, 0.0, 1.0], [0.8, 0.2, 0.9])
except Exception as e:
    extended_kg_analytics['similarity_error'] = str(e)

try:
    link_predictor = LinkPredictor()
    lp = link_predictor.predict_links(kg, top_k=5)
    extended_kg_analytics['predicted_links'] = len(lp) if hasattr(lp, '__len__') else None
except Exception as e:
    extended_kg_analytics['link_prediction_error'] = str(e)

extended_kg_analytics

In [None]:
from semantica.kg import NodeEmbedder

node_embedding_status = {}
try:
    embedder = NodeEmbedder(method='node2vec', embedding_dimension=32, walk_length=20, num_walks=5)
    node_embedding_status['node2vec_ready'] = True
except Exception as e:
    node_embedding_status['node2vec_ready'] = False
    node_embedding_status['reason'] = str(e)

node_embedding_status

In [None]:
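# Advanced Vector Store Integration with Apache AGE
# Semantic search and context management with graph database backend

from datetime import datetime

from semantica.context import AgentContext
from semantica.vector_store import VectorStore
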
# Initialize Vector Store with Apache AGE Backend
try:
    # Use Apache AGE if available, fallback to in-memory
    if 'age_vector_store' in locals():
        vector_store = age_vector_store
        print("Using Apache AGE Vector Store")
    else:
        vector_store = VectorStore(
            backend='inmemory',
            dimension=768,
            similarity_metric='cosine',
            index_type='faiss'  # Use FAISS for fast similarity search
        )
        print("Using In-Memory Vector Store with FAISS indexing")
        
except Exception as e:
    print(f"Vector store initialization failed: {e}")
    vector_store = None

# Initialize Agent Context with Enhanced Features
if vector_store and 'context_graph' in locals():
    agent_context = AgentContext(
        vector_store=vector_store,
        knowledge_graph=context_graph,
        decision_tracking=True,
        advanced_analytics=True,
        kg_algorithms=True,
        vector_store_features=True,
        graph_expansion=True,
        max_expansion_hops=3,
        cache_enabled=True,
        cache_size=1000,
        # all-MiniLM-L6-v2 produces 384-dim vectors; all-mpnet-base-v2 matches the 768-dim store
        embedding_model='sentence-transformers/all-mpnet-base-v2',
        similarity_threshold=0.7
    )
    
    print("Agent Context initialized with advanced features")
    print(f"Vector Dimension: {vector_store.dimension if hasattr(vector_store, 'dimension') else 'Unknown'}")
    print(f"Similarity Metric: {vector_store.similarity_metric if hasattr(vector_store, 'similarity_metric') else 'Unknown'}")
    
else:
    print("Cannot initialize Agent Context - missing vector store or context graph")
    agent_context = None

# Store Documents in Vector Store
if agent_context and 'corpus' in locals():
    print("Storing documents in vector store...")
    
    # Prepare documents for storage
    documents_to_store = []
    for doc in corpus:
        doc_record = {
            'content': doc.get('text', '')[:2500],  # Limit content length
            'metadata': {
                'source': doc.get('source', ''),
                'doc_id': doc.get('doc_id', ''),
                'type': 'capability_gap_document',
                'timestamp': datetime.now().isoformat()
            }
        }
        documents_to_store.append(doc_record)
    
    try:
        # Store documents with entity extraction
        storage_result = agent_context.store(
            documents=documents_to_store,
            extract_entities=True,
            extract_relationships=True,
            batch_size=10,
            update_existing=True
        )
        
        print(f"Stored {storage_result.get('stored_count', 0)} documents")
        print(f"Extracted {storage_result.get('entities_extracted', 0)} entities")
        print(f"Extracted {storage_result.get('relationships_extracted', 0)} relationships")
        
    except Exception as e:
        print(f"Document storage failed: {e}")

# Record Strategic Decisions
if agent_context:
    print("Recording strategic decisions...")
    
    decisions = []
    
    # Decision 1: Capability Gap Assessment
    decision1 = agent_context.record_decision(
        category='capability_gap_assessment',
        scenario='Future A2/AD mission thread with low-altitude swarm pressure',
        reasoning='Mission requires persistent low-altitude detection, but current radar layer indicates limited valley and urban coverage. Threat analysis shows 70% increase in swarm effectiveness in covered terrain.',
        outcome='gap_identified_low_altitude_detection',
        confidence=0.93,
        entities=['MissionThread_ForceProtection', 'Capability_LowAltitudeDetection', 'Gap_LowAltitudeDetectionCoverage'],
        metadata={
            'threat_level': 'high',
            'impact_score': 0.87,
            'mitigation_urgency': 'immediate',
            'estimated_cost': 'high',
            'timeline': '2028-2030'
        }
    )
    decisions.append(decision1)
    
    # Decision 2: Mitigation Strategy
    decision2 = agent_context.record_decision(
        category='capability_gap_mitigation',
        scenario='Counter low-altitude swarm incursions',
        reasoning='Need layered sensing integration and revised mission doctrine to close detection delay. Analysis shows 40% improvement with multi-sensor fusion.',
        outcome='recommend_multilayer_sensor_fusion',
        confidence=0.88,
        entities=['System_GroundRadarLayer', 'Gap_LowAltitudeDetectionCoverage'],
        metadata={
            'recommended_solution': 'multi_layer_sensor_fusion',
            'expected_improvement': '40%',
            'implementation_complexity': 'medium',
            'cost_estimate': 'medium',
            'timeline': '2026-2028'
        }
    )
    decisions.append(decision2)
    
    # Decision 3: Policy Exception Request
    decision3 = agent_context.record_decision(
        category='policy_exception',
        scenario='Emergency capability deployment',
        reasoning='Immediate threat requires expedited capability deployment bypassing standard procurement timeline.',
        outcome='emergency_deployment_approved',
        confidence=0.95,
        entities=['MissionThread_ForceProtection', 'System_GroundRadarLayer'],
        metadata={
            'exception_type': 'emergency_deployment',
            'standard_timeline_bypassed': True,
            'approval_authority': 'combatant_commander',
            'justification': 'imminent_threat'
        }
    )
    decisions.append(decision3)
    
    print(f"Recorded {len(decisions)} strategic decisions")
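    for i, decision in enumerate(decisions):
        # Later cells use the return value of record_decision as an ID,
        # so do not assume it exposes .category / .outcome attributes
        if hasattr(decision, 'category') and hasattr(decision, 'outcome'):
            print(f"  Decision {i+1}: {decision.category} - {decision.outcome}")
        else:
            print(f"  Decision {i+1}: {decision}")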

# Advanced Semantic Search
if agent_context:
    print("Performing advanced semantic search...")
    
    search_queries = [
        "Which capability gaps most increase mission risk in this scenario?",
        "What mitigation strategies are available for low-altitude detection gaps?",
        "How do sensor fusion solutions address capability shortfalls?",
        "What policy exceptions are needed for emergency deployment?"
    ]
    
    search_results = {}
    for query in search_queries:
        try:
            results = agent_context.retrieve(
                query=query,
                max_results=5,
                expand_graph=True,
                include_entities=True,
                include_decisions=True,
                similarity_threshold=0.6,
                use_hybrid_search=True
            )
            
            search_results[query] = {
                'results_count': len(results),
                'top_result_type': type(results[0]).__name__ if results else 'none',
                'avg_similarity': sum([getattr(r, 'similarity', 0) for r in results]) / len(results) if results else 0
            }
            
            print(f"  Query: {query[:50]}...")
            print(f"    Results: {len(results)} documents")
            print(f"    Avg Similarity: {search_results[query]['avg_similarity']:.2f}")
            
        except Exception as e:
            print(f"  Search failed for query: {query[:50]}... - {e}")
            search_results[query] = {'error': str(e)}

print("Vector store integration completed!")

In [None]:
from semantica.vector_store import VectorStore
from semantica.context import AgentContext

vector_store = VectorStore(backend='inmemory', dimension=384)
agent_context = AgentContext(
    vector_store=vector_store,
    knowledge_graph=context_graph,
    decision_tracking=True,
    advanced_analytics=True,
    kg_algorithms=True,
    vector_store_features=True,
    graph_expansion=True,
    max_expansion_hops=3,
)

stored = agent_context.store(
    [{'content': c['text'][:2500], 'metadata': {'source': c['source'], 'doc_id': c['doc_id']}} for c in corpus],
    extract_entities=False,
    extract_relationships=False
)

d1 = agent_context.record_decision(
    category='capability_gap_assessment',
    scenario='Future A2/AD mission thread with low-altitude swarm pressure',
    reasoning='Mission requires persistent low-altitude detection, but current radar layer indicates limited valley and urban coverage.',
    outcome='gap_identified_low_altitude_detection',
    confidence=0.93,
    entities=['MissionThread_ForceProtection', 'Capability_LowAltitudeDetection', 'Gap_LowAltitudeDetectionCoverage'],
)

d2 = agent_context.record_decision(
    category='capability_gap_mitigation',
    scenario='Counter low-altitude swarm incursions',
    reasoning='Need layered sensing integration and revised mission doctrine to close detection delay.',
    outcome='recommend_multilayer_sensor_fusion',
    confidence=0.88,
    entities=['System_GroundRadarLayer', 'Gap_LowAltitudeDetectionCoverage'],
)

retrieved = agent_context.retrieve(
    query='Which capability gaps most increase mission risk in this scenario?',
    max_results=8,
    expand_graph=True,
    include_entities=True,
)

{'stored': stored.get('stored_count', 0), 'decisions': [d1, d2], 'retrieved': len(retrieved)}

## Decision Traces: Policies, Exceptions, Approval Chains, Precedents, Cross-System Context

- Modules: `AgentContext`, `ContextGraph`, `PolicyEngine`
- Models: `Decision`, `Policy`, `PolicyException`, `ApprovalChain`, `Precedent`
- Records decisions and policy checks.
- Adds exceptions, approvals, precedents, and cross-system context.
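
A policy check of this kind can be pictured as a few rule comparisons against a decision record. The sketch below is a simplified assumption about what such a check might do, not the logic of `PolicyEngine.check_compliance`. The rule keys mirror those used in the policy defined in the next cell, `required_categories` is interpreted here as "the decision's category must be one of these", and `DecisionRecord` and `check_rules` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class DecisionRecord:
    """Hypothetical, trimmed-down decision with only the fields the rules inspect."""
    category: str
    outcome: str
    confidence: float


def check_rules(decision, rules):
    """Return a list of violated rule names; an empty list means compliant."""
    violations = []
    if decision.confidence < rules.get("min_confidence", 0.0):
        violations.append("min_confidence")
    categories = rules.get("required_categories")
    if categories and decision.category not in categories:
        violations.append("required_categories")
    outcomes = rules.get("allowed_outcomes")
    if outcomes and decision.outcome not in outcomes:
        violations.append("allowed_outcomes")
    return violations


rules = {
    "min_confidence": 0.8,
    "required_categories": ["capability_gap_assessment", "capability_gap_mitigation"],
    "allowed_outcomes": ["gap_identified_low_altitude_detection", "escalate_for_exception"],
}

ok = check_rules(DecisionRecord("capability_gap_assessment", "escalate_for_exception", 0.89), rules)
bad = check_rules(DecisionRecord("capability_gap_assessment", "defer_decision", 0.6), rules)
print(ok, bad)
```

Returning the names of the violated rules, rather than a bare boolean, gives an exception record something specific to cite when an approver overrides the policy.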

In [None]:
from datetime import datetime
from semantica.context.decision_models import Decision, Policy, PolicyException, ApprovalChain, Precedent
from semantica.context.policy_engine import PolicyEngine

# Policy model aligned to 'policy v3.2 + exception route' pattern from the article
policy_engine = PolicyEngine(context_graph)
renewal_policy = Policy(
    policy_id='POL-CAPGAP-3.2',
    name='Capability Gap Escalation Policy',
    description='Escalate and require approval when mission-critical capability coverage is below threshold.',
    rules={
        'min_confidence': 0.8,
        'required_categories': ['capability_gap_assessment', 'capability_gap_mitigation'],
        'allowed_outcomes': ['gap_identified_low_altitude_detection', 'recommend_multilayer_sensor_fusion', 'escalate_for_exception']
    },
    category='capability_gap_assessment',
    version='3.2',
    created_at=datetime.now(),
    updated_at=datetime.now(),
    metadata={'entities': ['MissionThread_ForceProtection', 'System_GroundRadarLayer']}
)
policy_engine.add_policy(renewal_policy)

# Construct explicit trace artifacts (exception, approval, precedent link)
trace_decision = Decision(
    decision_id='',
    category='capability_gap_assessment',
    scenario='Coverage threshold breach during swarm-pressure mission thread',
    reasoning='Below-threshold low-altitude detection coverage with repeated threat ingress; escalation required.',
    outcome='escalate_for_exception',
    confidence=0.89,
    timestamp=datetime.now(),
    decision_maker='joint_ops_agent',
    metadata={'policy_version': '3.2'}
)

policy_exception = PolicyException(
    exception_id='',
    decision_id=trace_decision.decision_id,
    policy_id='POL-CAPGAP-3.2',
    reason='Emergency force-protection override due to active swarm threat',
    approver='VP_Operations',
    approval_timestamp=datetime.now(),
    justification='Mission-critical risk outweighs standard route latency',
    metadata={'channel': 'slack_dm'}
)

approval_chain = ApprovalChain(
    approval_id='',
    decision_id=trace_decision.decision_id,
    approver='Finance_Controller',
    approval_method='zoom_call',
    approval_context='Approved exceptional spend for layered sensing package',
    timestamp=datetime.now(),
    metadata={'step': 'final_finance_gate'}
)

precedent_link = Precedent(
    precedent_id='',
    source_decision_id=d1,
    similarity_score=0.92,
    relationship_type='similar_scenario',
    metadata={'note': 'Prior low-altitude detection gap precedent'}
)

# Persist trace artifacts into ContextGraph as first-class decision-trace nodes
trace_decision_id = context_graph.record_decision(
    category=trace_decision.category,
    scenario=trace_decision.scenario,
    reasoning=trace_decision.reasoning,
    outcome=trace_decision.outcome,
    confidence=trace_decision.confidence,
    entities=['MissionThread_ForceProtection', 'Gap_LowAltitudeDetectionCoverage'],
    decision_maker=trace_decision.decision_maker,
    metadata={'policy_version': '3.2', 'cross_system_context': {'crm': 'critical_account', 'zendesk': 'open_escalation', 'pagerduty': 'sev1_incidents'}}
)

context_graph.add_node(policy_exception.exception_id, 'policy_exception', policy_exception.reason)
context_graph.add_edge(trace_decision_id, policy_exception.exception_id, 'has_exception')
context_graph.add_node(approval_chain.approval_id, 'approval', approval_chain.approval_context)
context_graph.add_edge(trace_decision_id, approval_chain.approval_id, 'approved_by_chain')
context_graph.add_node(precedent_link.precedent_id, 'precedent', 'precedent linkage')
context_graph.add_edge(trace_decision_id, precedent_link.precedent_id, 'uses_precedent')
context_graph.add_edge(precedent_link.precedent_id, d1, 'points_to_decision')

# Compliance check against policy v3.2
compliant = policy_engine.check_compliance(trace_decision, 'POL-CAPGAP-3.2')

{'trace_decision_id': trace_decision_id, 'policy_compliant': compliant, 'policy_id': renewal_policy.policy_id, 'policy_version': renewal_policy.version}

In [None]:
# Search for precedents and analyze decision impact so one-off exceptions can become reusable governance
precedents = agent_context.find_precedents(
    scenario='Low-altitude detection shortfall under swarm pressure',
    category='capability_gap_assessment',
    limit=5,
    use_hybrid_search=True
)

impact = context_graph.analyze_decision_impact(trace_decision_id)
insights = context_graph.get_decision_summary()

{
    'precedent_hits': len(precedents),
    'impact_total_influenced': impact.get('total_influenced', 0),
    'decision_total': insights.get('total_decisions', 0),
    'categories': insights.get('categories', {})
}

In [None]:
# Cross-system synthesis snapshot using AgentContext API
try:
    cross_system_snapshot = agent_context.capture_cross_system_inputs(
        systems=['crm', 'ticketing', 'incident_management', 'asset_inventory'],
        entity_id='MissionThread_ForceProtection'
    )
except Exception as e:
    cross_system_snapshot = {'error': str(e)}

cross_system_snapshot

In [None]:
import semantica.context as context_module

hop_1 = context_graph.get_neighbors('Scenario_FutureA2AD_2028', hops=1)
hop_2 = context_graph.get_neighbors('Scenario_FutureA2AD_2028', hops=2)
hop_3 = context_graph.get_neighbors('Scenario_FutureA2AD_2028', hops=3)

reasoning_paths = []
try:
    mh = context_module.multi_hop_query(
        context_graph,
        start_entity='Scenario_FutureA2AD_2028',
        query='Trace mission-thread to capability-gap path',
        max_hops=3,
    )
    reasoning_paths = mh.get('decisions', []) if isinstance(mh, dict) else []
except Exception as e:
    reasoning_paths = [{'error': str(e)}]

{'hop1': len(hop_1), 'hop2': len(hop_2), 'hop3': len(hop_3), 'multi_hop_results': len(reasoning_paths)}
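
The hop counts above can be sanity-checked against a plain breadth-first traversal. The following is a minimal sketch, not Semantica's implementation: it assumes `hops=k` means "all nodes reachable within k edges", and the `k_hop_neighbors` helper and `toy_graph` adjacency dict are hypothetical names introduced only for illustration.

```python
from collections import deque

def k_hop_neighbors(adjacency, start, hops):
    """Return the set of nodes reachable from `start` in at most `hops` edges."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    seen.discard(start)  # the start node is not its own neighbor
    return seen

# Hypothetical toy graph following the Scenario -> Mission Thread -> Gap -> Outcome pattern
toy_graph = {
    'Scenario_FutureA2AD_2028': ['MissionThread_ForceProtection'],
    'MissionThread_ForceProtection': ['Gap_LowAltitudeDetectionCoverage'],
    'Gap_LowAltitudeDetectionCoverage': ['Outcome_MissionRiskIncrease'],
}

hop_counts = {
    h: len(k_hop_neighbors(toy_graph, 'Scenario_FutureA2AD_2028', h))
    for h in (1, 2, 3)
}
```

Under this cumulative reading, each additional hop can only grow the neighbor set, so `hop1 <= hop2 <= hop3` is a cheap consistency check on the cell's output.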


In [None]:
from semantica.reasoning import Reasoner, ExplanationGenerator

reasoner = Reasoner()
reasoner.add_rule('IF MissionRequires(?m, LowAltitudeDetection) AND CoverageStatus(?m, Insufficient) THEN CapabilityGap(?m, LowAltitudeDetectionGap)')
reasoner.add_rule('IF CapabilityGap(?m, LowAltitudeDetectionGap) AND ThreatLevel(?m, High) THEN OutcomeRisk(?m, Elevated)')

reasoner.add_fact('MissionRequires(MissionThread_ForceProtection, LowAltitudeDetection)')
reasoner.add_fact('CoverageStatus(MissionThread_ForceProtection, Insufficient)')
reasoner.add_fact('ThreatLevel(MissionThread_ForceProtection, High)')

inferred = reasoner.forward_chain()

explanation_text = ''
if inferred:
    explanation_generator = ExplanationGenerator()
    explanation = explanation_generator.generate_explanation(inferred[-1])
    explanation_text = explanation.natural_language

{'inferred': [f.conclusion for f in inferred], 'explanation': explanation_text}
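
To make the inference step concrete, here is a minimal pure-Python sketch of forward chaining over the same two rules and three facts. It is an illustration under simplifying assumptions, not the `Reasoner` implementation: facts are `(predicate, subject, object)` tuples, each rule has a single variable `?m` in the subject position, and the `forward_chain` function here is a hypothetical stand-in.

```python
def forward_chain(facts, rules):
    """Apply rules until no new facts appear; return derived facts in order."""
    known = set(facts)
    derived = []
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Candidate bindings for ?m: every subject seen so far
            for subject in sorted({fact[1] for fact in known}):
                bound = [(pred, subject, obj) for pred, _, obj in premises]
                new_fact = (conclusion[0], subject, conclusion[2])
                if all(b in known for b in bound) and new_fact not in known:
                    known.add(new_fact)
                    derived.append(new_fact)
                    changed = True
    return derived

rules = [
    ([('MissionRequires', '?m', 'LowAltitudeDetection'),
      ('CoverageStatus', '?m', 'Insufficient')],
     ('CapabilityGap', '?m', 'LowAltitudeDetectionGap')),
    ([('CapabilityGap', '?m', 'LowAltitudeDetectionGap'),
      ('ThreatLevel', '?m', 'High')],
     ('OutcomeRisk', '?m', 'Elevated')),
]

facts = [
    ('MissionRequires', 'MissionThread_ForceProtection', 'LowAltitudeDetection'),
    ('CoverageStatus', 'MissionThread_ForceProtection', 'Insufficient'),
    ('ThreatLevel', 'MissionThread_ForceProtection', 'High'),
]

derived = forward_chain(facts, rules)
```

The second rule only fires because the first rule's conclusion is added to the fact base, which is the chaining behaviour the cell relies on when it explains `inferred[-1]`.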

## Versioned Decision Governance (Policy / Ontology Change Tracking)

In [None]:
from semantica.change_management import VersionManager

version_manager = VersionManager(base_uri='https://example.org/mcg')

v1 = version_manager.create_version(
    '3.1',
    ontology={'uri': 'https://example.org/mcg', 'classes': [], 'properties': []},
    changes=['Initial capability-gap decision policy baseline'],
    metadata={'structure': {'classes': ['Scenario', 'MissionThread', 'CapabilityGap'], 'properties': ['revealsGap']}}
)

v2 = version_manager.create_version(
    '3.2',
    ontology={'uri': 'https://example.org/mcg', 'classes': [], 'properties': []},
    changes=['Added explicit policy exception and approval-chain trace constructs'],
    metadata={'structure': {'classes': ['Scenario', 'MissionThread', 'CapabilityGap', 'PolicyException', 'ApprovalChain'], 'properties': ['revealsGap', 'has_exception', 'approved_by_chain']}}
)

version_diff = version_manager.compare_versions('3.1', '3.2')
{'latest_version': version_manager.latest_version, 'classes_added': version_diff.get('classes_added', []), 'properties_added': version_diff.get('properties_added', [])}
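
The `classes_added` and `properties_added` fields can be understood as set differences between the two `structure` snapshots. The sketch below assumes that interpretation; `diff_structures` is a hypothetical helper, not part of `VersionManager`, and the real `compare_versions` output may contain additional keys.

```python
def diff_structures(old, new):
    """Compare two {'classes': [...], 'properties': [...]} snapshots."""
    result = {}
    for kind in ('classes', 'properties'):
        old_items, new_items = set(old.get(kind, [])), set(new.get(kind, []))
        result[f'{kind}_added'] = sorted(new_items - old_items)
        result[f'{kind}_removed'] = sorted(old_items - new_items)
    return result

# Structure metadata copied from the 3.1 and 3.2 versions created above
structure_v31 = {
    'classes': ['Scenario', 'MissionThread', 'CapabilityGap'],
    'properties': ['revealsGap'],
}
structure_v32 = {
    'classes': ['Scenario', 'MissionThread', 'CapabilityGap', 'PolicyException', 'ApprovalChain'],
    'properties': ['revealsGap', 'has_exception', 'approved_by_chain'],
}

structure_diff = diff_structures(structure_v31, structure_v32)
```

With these inputs the diff reports two added classes (`ApprovalChain`, `PolicyException`), two added properties (`approved_by_chain`, `has_exception`), and nothing removed.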

In [None]:
# Comprehensive Analytics & Performance Monitoring
# Advanced analytics for capability gap analysis system

# Initialize Analytics Classes
try:
    graph_analytics = GraphAnalytics()
    decision_analytics = DecisionAnalytics()
    performance_analytics = PerformanceAnalytics()
    
    print("Analytics modules initialized")
except Exception as e:
    print(f"Analytics initialization failed: {e}")
    graph_analytics = None
    decision_analytics = None
    performance_analytics = None

# Graph Analytics Dashboard
if graph_analytics and 'context_graph' in locals():
    print("Generating graph analytics...")
    
    try:
        # Graph Structure Analysis
        graph_metrics = graph_analytics.analyze_graph_structure(context_graph)
        
        # Centrality Analysis
        centrality_metrics = graph_analytics.analyze_centrality(
            context_graph,
            algorithms=['degree', 'betweenness', 'closeness', 'eigenvector']
        )
        
        # Community Detection
        community_metrics = graph_analytics.detect_communities(
            context_graph,
            algorithm='louvain',
            resolution=1.0
        )
        
        # Path Analysis
        path_metrics = graph_analytics.analyze_paths(
            context_graph,
            source_nodes=['Scenario_FutureA2AD_2028'],
            target_nodes=['Outcome_MissionRiskIncrease'],
            max_path_length=5
        )
        
        print("Graph analytics completed")
        
    except Exception as e:
        print(f"Graph analytics failed: {e}")

# Decision Analytics Dashboard
if decision_analytics and 'agent_context' in locals():
    print("Generating decision analytics...")
    
    try:
        # Decision Pattern Analysis
        decision_patterns = decision_analytics.analyze_decision_patterns(
            agent_context,
            time_window_days=30,
            categories=['capability_gap_assessment', 'capability_gap_mitigation', 'policy_exception']
        )
        
        # Compliance Analysis
        compliance_metrics = decision_analytics.analyze_compliance(
            agent_context,
            policy_ids=['POL-CAPGAP-3.2'],
            include_exceptions=True
        )
        
        # Decision Impact Analysis
        impact_analysis = decision_analytics.analyze_decision_impact(
            agent_context,
            decision_types=['capability_gap_assessment'],
            impact_metrics=['risk_reduction', 'cost_impact', 'timeline_impact']
        )
        
        print("Decision analytics completed")
        
    except Exception as e:
        print(f"Decision analytics failed: {e}")

# Performance Monitoring
if performance_analytics:
    print("Generating performance metrics...")
    
    try:
        # System Performance
        performance_metrics = performance_analytics.measure_system_performance(
            components=['vector_store', 'context_graph', 'decision_engine'],
            metrics=['response_time', 'throughput', 'memory_usage', 'cpu_usage']
        )
        
        # Query Performance
        query_performance = performance_analytics.analyze_query_performance(
            agent_context if 'agent_context' in locals() else None,
            query_types=['semantic_search', 'graph_traversal', 'decision_retrieval'],
            sample_size=10
        )
        
        print("Performance monitoring completed")
        
    except Exception as e:
        print(f"Performance monitoring failed: {e}")

# Comprehensive Dashboard Summary
def generate_dashboard_summary():
    """Generate a comprehensive analytics dashboard"""
    
    dashboard = {
        'timestamp': datetime.now().isoformat(),
        'system_status': 'operational',
        'components': {}
    }
    
    # Graph Component Status
    # Note: inside a function, locals() only sees function-local names,
    # so notebook-level variables are looked up via globals() instead.
    if 'context_graph' in globals():
        dashboard['components']['context_graph'] = {
            'status': 'active',
            'nodes': len(context_graph.nodes) if hasattr(context_graph, 'nodes') else 0,
            'edges': len(context_graph.edges) if hasattr(context_graph, 'edges') else 0,
            'analytics': globals().get('graph_analytics') is not None
        }
    
    # Vector Store Component Status
    if 'vector_store' in globals():
        dashboard['components']['vector_store'] = {
            'status': 'active',
            'type': type(vector_store).__name__,
            'dimension': getattr(vector_store, 'dimension', 'unknown'),
            'backend': getattr(vector_store, 'backend', 'unknown')
        }
    
    # Agent Context Component Status
    if 'agent_context' in globals():
        dashboard['components']['agent_context'] = {
            'status': 'active',
            'decision_tracking': getattr(agent_context, 'decision_tracking', False),
            'advanced_analytics': getattr(agent_context, 'advanced_analytics', False),
            'vector_store_features': getattr(agent_context, 'vector_store_features', False)
        }
    
    # Analytics Component Status
    dashboard['components']['analytics'] = {
        'graph_analytics': graph_analytics is not None,
        'decision_analytics': decision_analytics is not None,
        'performance_analytics': performance_analytics is not None
    }
    
    return dashboard

# Generate and display dashboard
dashboard_summary = generate_dashboard_summary()

print("Analytics Dashboard Summary:")
print(f"  Timestamp: {dashboard_summary['timestamp']}")
print(f"  System Status: {dashboard_summary['system_status']}")
print("  Components:")
for component, status in dashboard_summary['components'].items():
    print(f"    {component}: {status}")

# Key Performance Indicators
kpis = {
    'graph_density': 0.0,
    'decision_volume': 0,
    'search_latency': 0.0,
    'compliance_rate': 0.0,
    'system_health': 'good'
}

if 'context_graph' in locals():
    nodes = len(context_graph.nodes) if hasattr(context_graph, 'nodes') else 0
    edges = len(context_graph.edges) if hasattr(context_graph, 'edges') else 0
    kpis['graph_density'] = edges / (nodes * (nodes - 1)) if nodes > 1 else 0.0

if 'agent_context' in locals():
    try:
        insights = agent_context.get_context_insights()
        kpis['decision_volume'] = insights.get('decision_tracking', {}).get('total_decisions', 0)
    except Exception:
        # Leave decision_volume at its default if insights are unavailable
        pass

print("Key Performance Indicators:")
for kpi, value in kpis.items():
    print(f"  {kpi}: {value}")

print("Comprehensive analytics and monitoring completed!")
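
The `graph_density` KPI above uses the directed-graph formula, edges divided by `n * (n - 1)` possible ordered pairs. As a small worked sketch (the `graph_density` function is a hypothetical helper, and whether the context graph should be treated as directed is an assumption), the directed and undirected variants differ by a factor of two:

```python
def graph_density(node_count, edge_count, directed=True):
    """Fraction of possible edges that are present (no self-loops)."""
    if node_count < 2:
        return 0.0  # density is undefined for fewer than two nodes
    possible = node_count * (node_count - 1)  # ordered pairs
    if not directed:
        possible /= 2  # unordered pairs
    return edge_count / possible

# A 4-node chain (Scenario -> Mission Thread -> Gap -> Outcome) has 3 edges
directed_density = graph_density(4, 3)                    # 3 / 12
undirected_density = graph_density(4, 3, directed=False)  # 3 / 6
```

A sparse chain-like context graph therefore yields a low density even when every node is connected, so the KPI is better read as a trend over time than as an absolute quality score.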

In [None]:
from semantica.provenance import ProvenanceManager

provenance_db = OUTPUT_DIR / 'capability_gap_provenance.db'
prov = ProvenanceManager(storage_path=str(provenance_db))

for c in corpus:
    prov.track_entity(entity_id=f"source::{c['doc_id']}", source=c['source'], metadata={'document_type': 'corpus_source'})

for ent in all_entities[:80]:
    ent_id = str(getattr(ent, 'id', getattr(ent, 'text', 'unknown_entity')))
    src_doc = (getattr(ent, 'metadata', {}) or {}).get('source_doc', 'unknown_source')
    prov.track_entity(
        entity_id=f"entity::{ent_id}",
        source=src_doc,
        metadata={'entity_text': str(getattr(ent, 'text', ent_id)), 'entity_type': str(getattr(ent, 'label', 'entity'))}
    )

for i, rel in enumerate(all_relationships[:120]):
    src_doc = (getattr(rel, 'metadata', {}) or {}).get('source_doc', 'unknown_source')
    prov.track_relationship(relationship_id=f'rel::{i}', source=src_doc, metadata={'relation_type': str(getattr(rel, 'predicate', getattr(rel, 'type', 'related_to')))})

{'stats': prov.get_statistics(), 'lineage_sample': prov.get_lineage('entity::MissionThread_ForceProtection')}

In [None]:
# Comprehensive Export & Reporting Layer
# Multi-format export with Apache AGE integration and advanced reporting

# Initialize Export Classes
json_exporter = JSONExporter(indent=2, ensure_ascii=False)
graph_exporter = GraphExporter(format='graphml', include_attributes=True)
rdf_exporter = RDFExporter(format='turtle', base_uri='https://defense.gov/capability-gap/')
csv_exporter = CSVExporter(delimiter=',', quotechar='"')
yaml_exporter = YAMLExporter(default_flow_style=False)
lpg_exporter = LPGExporter(format='cypher', include_relationships=True)
report_generator = ReportGenerator(template='comprehensive', include_visualizations=True)

print("Starting comprehensive export process...")

# Export Knowledge Graph
if 'kg' in locals():
    try:
        # JSON Export
        kg_json_path = OUTPUT_DIR / 'capability_gap_kg.json'
        json_exporter.export(kg, str(kg_json_path))
        
        # RDF Export
        kg_rdf_path = OUTPUT_DIR / 'capability_gap_kg.ttl'
        rdf_exporter.export(kg, str(kg_rdf_path))
        
        # CSV Export (separate files for entities and relationships)
        kg_entities_csv = OUTPUT_DIR / 'capability_gap_entities.csv'
        kg_relationships_csv = OUTPUT_DIR / 'capability_gap_relationships.csv'
        
        entities_data = kg.get('entities', [])
        relationships_data = kg.get('relationships', [])
        
        csv_exporter.export(entities_data, str(kg_entities_csv))
        csv_exporter.export(relationships_data, str(kg_relationships_csv))
        
        print(f"Knowledge Graph exported to {len([kg_json_path, kg_rdf_path, kg_entities_csv, kg_relationships_csv])} files")
        
    except Exception as e:
        print(f"Knowledge Graph export failed: {e}")

# Export Context Graph
if 'context_graph' in locals():
    try:
        # Context Graph JSON
        context_json_path = OUTPUT_DIR / 'capability_gap_context_graph.json'
        context_dict = context_graph.to_dict()
        json_exporter.export(context_dict, str(context_json_path))
        
        # Context Graph GraphML
        context_graphml_path = OUTPUT_DIR / 'capability_gap_context_graph.graphml'
        graph_exporter.export(context_dict, str(context_graphml_path))
        
        # Context Graph LPG (Cypher)
        context_lpg_path = OUTPUT_DIR / 'capability_gap_context_graph.cypher'
        lpg_exporter.export(context_dict, str(context_lpg_path))
        
        print(f"Context Graph exported to {len([context_json_path, context_graphml_path, context_lpg_path])} files")
        
    except Exception as e:
        print(f"Context Graph export failed: {e}")

# Apache AGE Database Export
if 'age_vector_store' in locals():
    try:
        # Export to Apache AGE database
        age_export_path = OUTPUT_DIR / 'apache_age_export.sql'
        
        # Generate SQL export script
        age_export_script = age_vector_store.export_to_sql(
            include_vectors=True,
            include_metadata=True,
            format='postgresql'
        )
        
        with open(age_export_path, 'w') as f:
            f.write(age_export_script)
        
        print(f"Apache AGE export script generated: {age_export_path}")
        
    except Exception as e:
        print(f"Apache AGE export failed: {e}")

# Decision Records Export
if 'agent_context' in locals():
    try:
        # Export decisions with full trace
        decisions_export_path = OUTPUT_DIR / 'capability_gap_decisions.json'
        
        # Get all decisions with trace information
        decisions_data = agent_context.export_decisions(
            format='json',
            include_trace=True,
            include_compliance=True,
            include_precedents=True
        )
        
        json_exporter.export(decisions_data, str(decisions_export_path))
        
        # Export decisions as CSV for analysis
        decisions_csv_path = OUTPUT_DIR / 'capability_gap_decisions.csv'
        
        # Flatten decision data for CSV
        flat_decisions = []
        for decision in decisions_data.get('decisions', []):
            flat_decision = {
                'decision_id': decision.get('decision_id'),
                'category': decision.get('category'),
                'scenario': decision.get('scenario'),
                'outcome': decision.get('outcome'),
                'confidence': decision.get('confidence'),
                'timestamp': decision.get('timestamp'),
                'entities': ';'.join(decision.get('entities', [])),
                'policy_compliant': decision.get('policy_compliant', False),
                'has_exception': decision.get('has_exception', False),
                'precedent_count': len(decision.get('precedents', []))
            }
            flat_decisions.append(flat_decision)
        
        csv_exporter.export(flat_decisions, str(decisions_csv_path))
        
        print(f"Decision records exported to {len([decisions_export_path, decisions_csv_path])} files")
        
    except Exception as e:
        print(f"Decision export failed: {e}")

# Analytics Export
if any(analytics is not None for analytics in [graph_analytics, decision_analytics, performance_analytics]):
    try:
        analytics_export_path = OUTPUT_DIR / 'capability_gap_analytics.json'
        
        analytics_data = {
            'timestamp': datetime.now().isoformat(),
            'graph_metrics': graph_metrics if 'graph_metrics' in locals() else {},
            'decision_patterns': decision_patterns if 'decision_patterns' in locals() else {},
            'compliance_metrics': compliance_metrics if 'compliance_metrics' in locals() else {},
            'performance_metrics': performance_metrics if 'performance_metrics' in locals() else {},
            'kpis': kpis if 'kpis' in locals() else {}
        }
        
        json_exporter.export(analytics_data, str(analytics_export_path))
        
        print(f"Analytics exported to {analytics_export_path}")
        
    except Exception as e:
        print(f"Analytics export failed: {e}")

# Comprehensive Report Generation
try:
    report_path = OUTPUT_DIR / 'capability_gap_analysis_report.html'
    
    # Generate comprehensive report
    report_data = {
        'title': 'Military Capability Gap Analysis Report',
        'subtitle': 'Future A2/AD Environment - 2028+ Planning Horizon',
        'date': datetime.now().strftime('%Y-%m-%d'),
        'executive_summary': {
            'total_documents': len(corpus) if 'corpus' in locals() else 0,
            'capability_gaps_identified': len([d for d in decisions if d.category == 'capability_gap_assessment']) if 'decisions' in locals() else 0,
            'mitigation_strategies': len([d for d in decisions if d.category == 'capability_gap_mitigation']) if 'decisions' in locals() else 0,
            'policy_exceptions': len([d for d in decisions if d.category == 'policy_exception']) if 'decisions' in locals() else 0,
            'overall_risk_level': 'HIGH' if 'decisions' in locals() and len([d for d in decisions if d.category == 'capability_gap_assessment']) > 2 else 'MEDIUM'
        },
        'sections': {
            'methodology': 'End-to-end capability gap analysis using Semantica context graphs',
            'data_sources': 'Defense documents, web sources, ontologies',
            'key_findings': 'Critical low-altitude detection gaps identified',
            'recommendations': 'Implement multi-layer sensor fusion capability',
            'next_steps': 'Policy review and capability acquisition planning'
        },
        'analytics': analytics_data if 'analytics_data' in locals() else {},
        'visualizations': {
            'graph_structure': 'context_graph.graphml' if 'context_graph' in locals() else None,
            'decision_timeline': 'decisions.csv' if 'decisions_csv_path' in locals() else None
        }
    }
    
    report_generator.generate_report(
        data=report_data,
        output_path=str(report_path),
        format='html',
        include_toc=True,
        include_appendices=True
    )
    
    print(f"Comprehensive report generated: {report_path}")
    
except Exception as e:
    print(f"Report generation failed: {e}")

# Export Summary
export_summary = {
    'timestamp': datetime.now().isoformat(),
    'output_directory': str(OUTPUT_DIR),
    'files_exported': [],
    'total_size_mb': 0
}

# Calculate total export size
if OUTPUT_DIR.exists():
    exported_files = list(OUTPUT_DIR.glob('*'))
    export_summary['files_exported'] = [f.name for f in exported_files if f.is_file()]
    export_summary['total_size_mb'] = round(sum([f.stat().st_size for f in exported_files if f.is_file()]) / (1024*1024), 2)

print("Export Summary:")
print(f"  Output Directory: {export_summary['output_directory']}")
print(f"  Files Exported: {len(export_summary['files_exported'])}")
print(f"  Total Size: {export_summary['total_size_mb']} MB")
print("  Files:")
for file_name in sorted(export_summary['files_exported']):
    print(f"    - {file_name}")

print("Comprehensive export and reporting completed!")

In [None]:
# Operational monitoring proxies using Semantica-native analytics outputs
context_insights = agent_context.get_context_insights()

decision_quality_monitor = {
    'decision_count': context_insights.get('decision_tracking', {}).get('total_decisions', 0),
    'graph_nodes': context_insights.get('knowledge_graph', {}).get('node_count', 0),
    'graph_edges': context_insights.get('knowledge_graph', {}).get('edge_count', 0),
    'provenance_entries': prov.get_statistics().get('total_entries', 0),
}

decision_quality_monitor

In [None]:
import semantica.export as export_module

kg_json_path = OUTPUT_DIR / 'capability_gap_kg.json'
context_json_path = OUTPUT_DIR / 'capability_gap_context_graph.json'
context_graphml_path = OUTPUT_DIR / 'capability_gap_context_graph.graphml'
kg_rdf_path = OUTPUT_DIR / 'capability_gap_kg.ttl'
kg_csv_base = OUTPUT_DIR / 'capability_gap_kg'

export_module.export_json(kg, kg_json_path, format='json')
export_module.export_json(context_graph.to_dict(), context_json_path, format='json')
export_module.export_graph(context_graph.to_dict(), context_graphml_path, format='graphml')
export_module.export_rdf(kg, kg_rdf_path, format='turtle')
export_module.export_csv({'entities': kg.get('entities', []), 'relationships': kg.get('relationships', [])}, kg_csv_base)

[str(kg_json_path), str(context_json_path), str(context_graphml_path), str(kg_rdf_path)]


## Export Layer (YAML, LPG, Report Generator)

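- Uses the `semantica.export` helpers: `export_json`, `export_graph`, `export_rdf`, `export_csv`, `export_yaml`, `export_lpg`, and `ReportGenerator`.
- Writes the context graph as YAML and the knowledge graph as an LPG (Cypher) script, in addition to the JSON, GraphML, RDF, and CSV artifacts.
- Generates a Markdown report file, `capability_gap_analysis_report.md`.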
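
For readers without Semantica installed, the multi-format idea can be sketched with the standard library alone. This is an illustrative stand-in rather than the `semantica.export` API: `export_graph_artifacts`, the file names, and the toy graph are all hypothetical, and only JSON and CSV are covered.

```python
import csv
import json
import tempfile
from pathlib import Path

def export_graph_artifacts(graph, out_dir):
    """Write a graph dict as JSON plus a flat CSV of its entities."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    json_path = out_dir / 'graph.json'
    json_path.write_text(json.dumps(graph, indent=2, ensure_ascii=False), encoding='utf-8')

    csv_path = out_dir / 'entities.csv'
    with csv_path.open('w', newline='', encoding='utf-8') as fh:
        writer = csv.DictWriter(fh, fieldnames=['id', 'type'])
        writer.writeheader()
        writer.writerows(graph['entities'])

    return [json_path, csv_path]

# Hypothetical toy graph with the same top-level keys the notebook reads from `kg`
toy_kg = {
    'entities': [
        {'id': 'MissionThread_ForceProtection', 'type': 'MissionThread'},
        {'id': 'Gap_LowAltitudeDetectionCoverage', 'type': 'CapabilityGap'},
    ],
    'relationships': [
        {'source': 'MissionThread_ForceProtection',
         'target': 'Gap_LowAltitudeDetectionCoverage',
         'type': 'revealsGap'},
    ],
}

with tempfile.TemporaryDirectory() as tmp:
    paths = export_graph_artifacts(toy_kg, tmp)
    round_trip = json.loads(paths[0].read_text(encoding='utf-8'))
    with paths[1].open(newline='', encoding='utf-8') as fh:
        csv_rows = list(csv.DictReader(fh))
```

Reading each artifact back, as done here, is a simple way to confirm that an export preserved the entity and relationship counts reported in the final summary.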

In [None]:
import semantica.export as export_module

extra_exports = {}

try:
    yaml_path = OUTPUT_DIR / 'capability_gap_context_graph.yaml'
    export_module.export_yaml(context_graph.to_dict(), yaml_path)
    extra_exports['yaml'] = str(yaml_path)
except Exception as e:
    extra_exports['yaml_error'] = str(e)

try:
    lpg_path = OUTPUT_DIR / 'capability_gap_kg.cypher'
    export_module.export_lpg(kg, lpg_path, method='cypher')
    extra_exports['lpg'] = str(lpg_path)
except Exception as e:
    extra_exports['lpg_error'] = str(e)

try:
    report_data = {
        'title': 'Military Capability Gap Analysis - End-to-End Report',
        'summary': {
            'corpus_items': len(corpus),
            'extraction_items': len(extraction_corpus),
            'entities': len(all_entities),
            'relationships': len(all_relationships),
            'decisions': context_graph.get_decision_summary().get('total_decisions', 0),
        },
        'metrics': {
            'kg_entities': len(kg.get('entities', [])),
            'kg_relationships': len(kg.get('relationships', [])),
            'context_nodes': context_graph.stats().get('node_count', 0),
            'context_edges': context_graph.stats().get('edge_count', 0),
        },
        'analysis': {'kg_analysis': kg_analysis}
    }
    report_path = OUTPUT_DIR / 'capability_gap_analysis_report.md'
    generator = export_module.ReportGenerator(format='markdown', include_charts=False)
    generator.generate_report(report_data, report_path, format='markdown')
    extra_exports['report'] = str(report_path)
except Exception as e:
    extra_exports['report_error'] = str(e)

extra_exports


In [None]:
from semantica.visualization import KGVisualizer

viz = KGVisualizer(layout='force', color_scheme='default')
kg_html_path = OUTPUT_DIR / 'capability_gap_kg_network.html'

try:
    viz.visualize_network(kg, output='html', file_path=kg_html_path)
    viz_result = str(kg_html_path)
except Exception as e:
    viz_result = f'Visualization skipped: {e}'

viz_result

In [None]:
summary = {
    'corpus_items': len(corpus),
    'entities_extracted': len(all_entities),
    'relationships_extracted': len(all_relationships),
    'events_detected': len(all_events),
    'triplets_extracted': len(all_triplets),
    'kg_entities': len(kg.get('entities', [])),
    'kg_relationships': len(kg.get('relationships', [])),
    'context_graph_stats': context_graph.stats(),
    'reasoning_inferred_rules': [f.conclusion for f in inferred],
    'output_dir': str(OUTPUT_DIR),
    'extended_kg_analytics': extended_kg_analytics if 'extended_kg_analytics' in globals() else {},
    'extra_exports': extra_exports if 'extra_exports' in globals() else {},
    'ontology_details': ontology_details if 'ontology_details' in globals() else [],
}
summary

- Builds final summary dictionary.
- Shows counts and output paths from all pipeline stages.