[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Hawksight-AI/semantica/blob/main/cookbook/use_cases/cybersecurity/02_Incident_Analysis.ipynb)

# Incident Analysis Pipeline

## Overview

This notebook demonstrates a complete security incident analysis pipeline: ingest security logs from multiple sources (files, databases, streams), parse structured and unstructured logs, extract security entities, build a knowledge graph, analyze relationships, detect anomalies, and generate incident reports.


**Documentation**: [API Reference](https://semantica.readthedocs.io/use-cases/)

### Modules Used (20+)

- **Ingestion**: FileIngestor, WebIngestor, FeedIngestor, StreamIngestor, DBIngestor, RepoIngestor, EmailIngestor, MCPIngestor
- **Parsing**: JSONParser, XMLParser, StructuredDataParser, DocumentParser
- **Extraction**: NERExtractor, RelationExtractor, EventDetector, TripletExtractor
- **KG**: GraphBuilder, GraphAnalyzer, ConnectivityAnalyzer, CentralityCalculator
- **Reasoning**: InferenceEngine, RuleManager, ExplanationGenerator
- **Quality**: KGQualityAssessor, ConflictDetector, ProvenanceTracker
- **Export**: JSONExporter, RDFExporter, ReportGenerator
- **Visualization**: KGVisualizer, AnalyticsVisualizer, TemporalVisualizer

### Pipeline

**Multiple Security Sources → Parse Logs → Extract Security Entities → Build Incident KG → Analyze Relationships → Detect Anomalies → Generate Reports → Visualize**

## Installation

Install Semantica from PyPI:

```bash
pip install semantica
# Or with all optional dependencies:
pip install semantica[all]
```

---

## Step 1: Ingest Security Logs from Multiple Sources

Ingest security logs from files and threat intelligence feeds. The database and stream ingestors are instantiated for completeness but are not exercised in this cell.


In [None]:
!pip install semantica


In [None]:
from semantica.ingest import FileIngestor, DBIngestor, StreamIngestor, FeedIngestor
from semantica.parse import JSONParser, XMLParser, StructuredDataParser, DocumentParser
from semantica.semantic_extract import NERExtractor, RelationExtractor, EventDetector, TripletExtractor
from semantica.kg import GraphBuilder, GraphAnalyzer, ConnectivityAnalyzer, CentralityCalculator
from semantica.reasoning import InferenceEngine, RuleManager, ExplanationGenerator
from semantica.conflicts import ConflictDetector
from semantica.kg import ProvenanceTracker
from semantica.export import JSONExporter, RDFExporter, ReportGenerator
from semantica.visualization import KGVisualizer, AnalyticsVisualizer, TemporalVisualizer
import tempfile
import os
import json
from datetime import datetime, timedelta

file_ingestor = FileIngestor()
db_ingestor = DBIngestor()
stream_ingestor = StreamIngestor()
feed_ingestor = FeedIngestor()

json_parser = JSONParser()
xml_parser = XMLParser()
structured_parser = StructuredDataParser()
document_parser = DocumentParser()

temp_dir = tempfile.mkdtemp()

# Sample security logs in common real-world formats
security_logs_json = os.path.join(temp_dir, "security_logs.json")
security_logs_data = [
    {
        "timestamp": (datetime.now() - timedelta(hours=2)).isoformat(),
        "source_ip": "192.168.1.100",
        "destination_ip": "10.0.0.50",
        "event_type": "failed_login",
        "user": "admin",
        "severity": "medium",
        "message": "Multiple failed login attempts detected"
    },
    {
        "timestamp": (datetime.now() - timedelta(hours=1)).isoformat(),
        "source_ip": "203.0.113.45",
        "destination_ip": "10.0.0.50",
        "event_type": "port_scan",
        "severity": "high",
        "message": "Port scanning activity detected from external IP"
    },
    {
        "timestamp": (datetime.now() - timedelta(minutes=30)).isoformat(),
        "source_ip": "192.168.1.100",
        "destination_ip": "10.0.0.75",
        "event_type": "data_exfiltration",
        "user": "user123",
        "severity": "critical",
        "message": "Large data transfer detected to external server"
    }
]

with open(security_logs_json, 'w') as f:
    json.dump(security_logs_data, f, indent=2)

# XML format security events (common in SIEM systems)
security_events_xml = os.path.join(temp_dir, "security_events.xml")
xml_content = """<?xml version="1.0"?>
<security_events>
    <event>
        <timestamp>2024-01-15T14:30:00</timestamp>
        <source_ip>172.16.0.10</source_ip>
        <destination_ip>10.0.0.50</destination_ip>
        <event_type>malware_detection</event_type>
        <severity>high</severity>
        <description>Malware signature detected in file transfer</description>
    </event>
    <event>
        <timestamp>2024-01-15T15:00:00</timestamp>
        <source_ip>192.168.1.200</source_ip>
        <destination_ip>10.0.0.50</destination_ip>
        <event_type>unauthorized_access</event_type>
        <severity>critical</severity>
        <description>Unauthorized access attempt to restricted resource</description>
    </event>
</security_events>"""

with open(security_events_xml, 'w') as f:
    f.write(xml_content)

# Ingest from files
file_objects = file_ingestor.ingest_file(security_logs_json, read_content=True)
file_objects_xml = file_ingestor.ingest_file(security_events_xml, read_content=True)

# Parse structured logs
parsed_json = json_parser.parse(security_logs_json)
parsed_xml = xml_parser.parse(security_events_xml)

# Real security intelligence feed URLs
security_feeds = [
    "https://www.cisa.gov/news.xml",  # CISA Security Advisories
    "https://www.us-cert.gov/ncas/alerts.xml",  # US-CERT Alerts
    "https://feeds.feedburner.com/SecurityWeek",  # Security Week
    "https://www.darkreading.com/rss.xml"  # Dark Reading
]

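threat_feed_list = []
for feed_url in security_feeds:
    # Feeds need network access and may move or go offline, so skip failures
    try:
        threat_feed = feed_ingestor.ingest_feed(feed_url)
    except Exception as exc:
        print(f"  Skipped feed {feed_url}: {exc}")
        continue
    if threat_feed:
        threat_feed_list.append(threat_feed)
        print(f"  Ingested feed: {feed_url}")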

# ingest_file handles one file per call, so each count is 0 or 1
print(f"Ingested {1 if file_objects else 0} JSON log files")
print(f"Ingested {1 if file_objects_xml else 0} XML event files")
print(f"Parsed {len(parsed_json.data) if parsed_json and parsed_json.data else 0} JSON log entries")
print(f"Parsed {len(parsed_xml.elements) if parsed_xml else 0} XML event elements")
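
The JSON and XML samples above describe the same kind of event with slightly different field names (`message` versus `description`). As a library-independent illustration, the sketch below uses only the standard library to map both formats onto one record shape. The `FIELDS` tuple and the `normalize_json` / `normalize_xml` helpers are hypothetical names introduced here, not part of Semantica, and the inline sample data is a shortened copy of the logs above.

```python
import json
import xml.etree.ElementTree as ET

# Shortened sample inputs mirroring the two log formats used above
JSON_LOGS = json.dumps([
    {"timestamp": "2024-01-15T13:00:00", "source_ip": "203.0.113.45",
     "destination_ip": "10.0.0.50", "event_type": "port_scan",
     "severity": "high", "message": "Port scanning activity detected"},
])
XML_LOGS = """<security_events>
    <event>
        <timestamp>2024-01-15T14:30:00</timestamp>
        <source_ip>172.16.0.10</source_ip>
        <destination_ip>10.0.0.50</destination_ip>
        <event_type>malware_detection</event_type>
        <severity>high</severity>
        <description>Malware signature detected in file transfer</description>
    </event>
</security_events>"""

# Shared schema (an assumption for this sketch)
FIELDS = ("timestamp", "source_ip", "destination_ip", "event_type", "severity", "message")

def normalize_json(text):
    """Map each JSON log entry onto the shared field set."""
    return [{field: entry.get(field, "") for field in FIELDS} for entry in json.loads(text)]

def normalize_xml(text):
    """Map each <event> element onto the shared field set.

    The XML samples use <description> where the JSON samples use "message".
    """
    records = []
    for event in ET.fromstring(text).iter("event"):
        record = {field: event.findtext(field, default="") for field in FIELDS}
        record["message"] = event.findtext("description", default="")
        records.append(record)
    return records

records = normalize_json(JSON_LOGS) + normalize_xml(XML_LOGS)
for record in records:
    print(record["timestamp"], record["event_type"], record["source_ip"], "->", record["destination_ip"])
```

With records in one shape, the entity and relationship loops in Step 2 could iterate over a single list instead of handling each format separately.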


## Step 2: Extract Security Entities and Relationships

Extract security entities (IPs, users, events) and relationships from parsed logs.


In [None]:
ner_extractor = NERExtractor()
relation_extractor = RelationExtractor()
event_detector = EventDetector()
triplet_extractor = TripletExtractor()

all_entities = []
all_relationships = []
all_events = []

# Extract from JSON logs
if parsed_json and parsed_json.data:
    for log_entry in parsed_json.data:
        if isinstance(log_entry, dict):
            log_text = f"{log_entry.get('event_type', '')} from {log_entry.get('source_ip', '')} to {log_entry.get('destination_ip', '')}: {log_entry.get('message', '')}"
            
            entities = ner_extractor.extract(log_text)
            all_entities.extend(entities)
            
            relationships = relation_extractor.extract(log_text, entities)
            all_relationships.extend(relationships)
            
            events = event_detector.detect_events(log_text)
            all_events.extend(events)

# Extract from XML events
if parsed_xml and parsed_xml.elements:
    for elem in parsed_xml.elements:
        if hasattr(elem, 'text') and elem.text:
            entities = ner_extractor.extract(elem.text)
            all_entities.extend(entities)
            
            relationships = relation_extractor.extract(elem.text, entities)
            all_relationships.extend(relationships)

# Build structured entities from log data
security_entities = []
for log_entry in parsed_json.data if parsed_json and parsed_json.data else []:
    if isinstance(log_entry, dict):
        security_entities.append({
            "id": log_entry.get("source_ip", ""),
            "type": "IP_Address",
            "name": log_entry.get("source_ip", ""),
            "properties": {"source": "security_logs"}
        })
        security_entities.append({
            "id": log_entry.get("destination_ip", ""),
            "type": "IP_Address",
            "name": log_entry.get("destination_ip", ""),
            "properties": {"source": "security_logs"}
        })
        if log_entry.get("user"):
            security_entities.append({
                "id": log_entry.get("user", ""),
                "type": "User",
                "name": log_entry.get("user", ""),
                "properties": {"source": "security_logs"}
            })
        security_entities.append({
            "id": log_entry.get("event_type", ""),
            "type": "Security_Event",
            "name": log_entry.get("event_type", ""),
            "properties": {
                "severity": log_entry.get("severity", ""),
                "timestamp": log_entry.get("timestamp", ""),
                "message": log_entry.get("message", "")
            }
        })

incident_relationships = []
for log_entry in parsed_json.data if parsed_json and parsed_json.data else []:
    if isinstance(log_entry, dict):
        incident_relationships.append({
            "source": log_entry.get("source_ip", ""),
            "target": log_entry.get("event_type", ""),
            "type": "triggered",
            "properties": {"timestamp": log_entry.get("timestamp", "")}
        })
        incident_relationships.append({
            "source": log_entry.get("event_type", ""),
            "target": log_entry.get("destination_ip", ""),
            "type": "targeted",
            "properties": {"timestamp": log_entry.get("timestamp", "")}
        })

print(f"Extracted {len(security_entities)} security entities")
print(f"Extracted {len(incident_relationships)} incident relationships")
print(f"Detected {len(all_events)} security events")
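
The entity loop above appends one dictionary per log field, so an IP address that appears in several log entries (such as `192.168.1.100`) is added several times, and a missing field produces an entity with an empty `id`. Whether `GraphBuilder` merges such duplicates is not shown in this notebook. A minimal sketch of deduplicating by `(id, type)` beforehand, using a hypothetical `dedupe_entities` helper:

```python
def dedupe_entities(entities):
    """Keep the first entity seen for each (id, type) pair; drop empty ids."""
    seen = {}
    for entity in entities:
        key = (entity.get("id"), entity.get("type"))
        if not key[0]:
            continue  # skip entities built from a missing log field
        seen.setdefault(key, entity)
    return list(seen.values())

# Hypothetical entities in the same shape as security_entities
sample_entities = [
    {"id": "192.168.1.100", "type": "IP_Address", "name": "192.168.1.100"},
    {"id": "10.0.0.50", "type": "IP_Address", "name": "10.0.0.50"},
    {"id": "192.168.1.100", "type": "IP_Address", "name": "192.168.1.100"},
    {"id": "", "type": "User", "name": ""},
]

unique_entities = dedupe_entities(sample_entities)
print(f"{len(sample_entities)} raw entities -> {len(unique_entities)} unique")
```

Note that the `Security_Event` entities above use the event type as their `id`, so two events of the same type would collapse into one under this scheme. A per-event id, for example the event type combined with the timestamp, avoids that.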


## Step 3: Build Incident Knowledge Graph

Build a knowledge graph from security entities and relationships.


In [None]:
builder = GraphBuilder()
graph_analyzer = GraphAnalyzer()
connectivity_analyzer = ConnectivityAnalyzer()
centrality_calculator = CentralityCalculator()
provenance_tracker = ProvenanceTracker()

incident_kg = builder.build(security_entities, incident_relationships)

# Track provenance
for entity in security_entities:
    provenance_tracker.track_entity(entity.get("id"), entity.get("properties", {}).get("source", "unknown"), entity)

# Analyze graph structure
metrics = graph_analyzer.compute_metrics(incident_kg)
connectivity = connectivity_analyzer.analyze_connectivity(incident_kg)
centrality_result = centrality_calculator.calculate_degree_centrality(incident_kg)
centrality_scores = centrality_result.get('centrality', {})

print("Built incident knowledge graph")
print(f"  Entities: {len(incident_kg.get('entities', []))}")
print(f"  Relationships: {len(incident_kg.get('relationships', []))}")
print(f"  Graph density: {metrics.get('density', 0):.3f}")
print(f"  Connected components: {len(connectivity.get('components', []))}")
print(f"  Central entities: {len([e for e, score in centrality_scores.items() if score > 0])}")
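
To make the printed metrics easier to interpret, the sketch below computes density, degree centrality, and connected components in plain Python. It assumes the standard definitions for an undirected simple graph; how Semantica's analyzers define these metrics internally is not shown in this notebook, so the numbers may differ. All function names here are hypothetical helpers.

```python
from collections import defaultdict, deque

def build_adjacency(relationships):
    """Build an undirected adjacency map from source/target relationship dicts."""
    adjacency = defaultdict(set)
    for rel in relationships:
        source, target = rel["source"], rel["target"]
        if source != target:  # ignore self-loops
            adjacency[source].add(target)
            adjacency[target].add(source)
    return adjacency

def density(adjacency):
    """Undirected density: 2m / (n * (n - 1))."""
    n = len(adjacency)
    m = sum(len(neighbors) for neighbors in adjacency.values()) // 2
    return 0.0 if n < 2 else 2 * m / (n * (n - 1))

def degree_centrality(adjacency):
    """Degree of each node divided by the maximum possible degree, n - 1."""
    n = len(adjacency)
    return {node: (len(neighbors) / (n - 1) if n > 1 else 0.0)
            for node, neighbors in adjacency.items()}

def connected_components(adjacency):
    """Group nodes into components with a breadth-first search."""
    unvisited = set(adjacency)
    components = []
    while unvisited:
        start = unvisited.pop()
        component, queue = {start}, deque([start])
        while queue:
            for neighbor in adjacency[queue.popleft()]:
                if neighbor in unvisited:
                    unvisited.remove(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components

# Relationships shaped like incident_relationships in Step 2
relationships = [
    {"source": "192.168.1.100", "target": "failed_login", "type": "triggered"},
    {"source": "failed_login", "target": "10.0.0.50", "type": "targeted"},
    {"source": "203.0.113.45", "target": "port_scan", "type": "triggered"},
    {"source": "port_scan", "target": "10.0.0.50", "type": "targeted"},
]

graph = build_adjacency(relationships)
print(f"Density: {density(graph):.3f}")
print(f"Components: {len(connected_components(graph))}")
for node, score in sorted(degree_centrality(graph).items(), key=lambda kv: -kv[1]):
    print(f"  {node}: {score:.2f}")
```

In this small example the shared target `10.0.0.50` links the two attack paths into a single component, which is the kind of connection the incident graph is meant to surface.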


## Step 4: Analyze Relationships and Detect Anomalies

Analyze security relationships and detect anomalous patterns.


In [None]:
inference_engine = InferenceEngine()
rule_manager = RuleManager()
explanation_generator = ExplanationGenerator()
conflict_detector = ConflictDetector()

# Define security rules
inference_engine.add_rule("IF event_type is port_scan AND severity is high THEN potential_intrusion")
inference_engine.add_rule("IF event_type is data_exfiltration AND severity is critical THEN data_breach")
inference_engine.add_rule("IF multiple failed_login events from same source_ip THEN brute_force_attack")

# Add facts from security events
for log_entry in parsed_json.data if parsed_json and parsed_json.data else []:
    if isinstance(log_entry, dict):
        inference_engine.add_fact({
            "event_type": log_entry.get("event_type", ""),
            "severity": log_entry.get("severity", ""),
            "source_ip": log_entry.get("source_ip", "")
        })

# Run inference
inferred_threats = inference_engine.forward_chain()

# Detect anomalies based on patterns
anomalies = []
for log_entry in parsed_json.data if parsed_json and parsed_json.data else []:
    if isinstance(log_entry, dict):
        anomaly_score = 0
        reasons = []
        
        if log_entry.get("severity") == "critical":
            anomaly_score += 5
            reasons.append("Critical severity event")
        
        if log_entry.get("event_type") in ["data_exfiltration", "unauthorized_access"]:
            anomaly_score += 4
            reasons.append("High-risk event type")
        
        if log_entry.get("severity") == "high" and log_entry.get("event_type") == "port_scan":
            anomaly_score += 3
            reasons.append("Port scanning detected")
        
        if anomaly_score >= 3:
            anomalies.append({
                "event": log_entry.get("event_type", ""),
                "source_ip": log_entry.get("source_ip", ""),
                "severity": log_entry.get("severity", ""),
                "score": anomaly_score,
                "reasons": reasons,
                "timestamp": log_entry.get("timestamp", "")
            })

# Detect conflicts in security data
conflicts = conflict_detector.detect_value_conflicts(security_entities, "name")

print("Analyzed security relationships")
print(f"Inferred {len(inferred_threats)} potential threats")
print(f"Detected {len(anomalies)} anomalies")
print(f"Found {len(conflicts)} data conflicts")
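
The third rule above ("multiple failed_login events from same source_ip") involves counting across events, which the per-entry anomaly scoring loop does not do. As an illustration, the sketch below counts failed logins per source IP with `collections.Counter`. The `detect_brute_force` helper and its threshold of 3 are assumptions made for this example, not values taken from Semantica or from the notebook.

```python
from collections import Counter

def detect_brute_force(log_entries, threshold=3):
    """Return source IPs with at least `threshold` failed_login events.

    The default threshold is an illustrative assumption.
    """
    failures = Counter(
        entry.get("source_ip", "")
        for entry in log_entries
        if entry.get("event_type") == "failed_login" and entry.get("source_ip")
    )
    return {ip: count for ip, count in failures.items() if count >= threshold}

# Hypothetical log entries in the same shape as security_logs_data
sample_logs = [
    {"source_ip": "192.168.1.100", "event_type": "failed_login"},
    {"source_ip": "192.168.1.100", "event_type": "failed_login"},
    {"source_ip": "192.168.1.100", "event_type": "failed_login"},
    {"source_ip": "203.0.113.45", "event_type": "failed_login"},
    {"source_ip": "203.0.113.45", "event_type": "port_scan"},
]

suspects = detect_brute_force(sample_logs)
print(suspects)
```

A production rule would usually also restrict the count to a time window, so that three failures spread over a month are not treated like three failures in a minute.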


## Step 5: Generate Incident Reports

Generate comprehensive incident analysis reports.


In [None]:
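# KGQualityAssessor is not among the Step 1 imports; the module path below is
# an assumption, so adjust it to match your installed Semantica version.
from semantica.kg_qa import KGQualityAssessor

quality_assessor = KGQualityAssessor()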
json_exporter = JSONExporter()
rdf_exporter = RDFExporter()
report_generator = ReportGenerator()

quality_score = quality_assessor.assess_overall_quality(incident_kg)

json_exporter.export_knowledge_graph(incident_kg, os.path.join(temp_dir, "incident_kg.json"))
rdf_exporter.export_knowledge_graph(incident_kg, os.path.join(temp_dir, "incident_kg.rdf"))

report_data = {
    "summary": f"Security incident analysis identified {len(anomalies)} anomalies and {len(inferred_threats)} potential threats",
    "total_events": len(parsed_json.data) if parsed_json and parsed_json.data else 0,
    "anomalies": len(anomalies),
    "threats": len(inferred_threats),
    "quality_score": quality_score.get('overall_score', 0),
    "critical_events": len([e for e in anomalies if e.get('severity') == 'critical'])
}

report = report_generator.generate_report(report_data, format="markdown")

print(f"Report length: {len(report)} characters")
print(f"Graph quality score: {quality_score.get('overall_score', 0):.3f}")
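
The exact layout that `ReportGenerator` produces for `format="markdown"` is not shown here. For readers who want to see what a Markdown rendering of `report_data` could look like, the following sketch builds one by hand. The `render_markdown_report` helper is hypothetical and the sample values are made up for illustration.

```python
def render_markdown_report(title, report_data):
    """Render a flat dict as a Markdown summary plus a two-column table."""
    lines = [f"# {title}", ""]
    summary = report_data.get("summary")
    if summary:
        lines += [summary, ""]
    lines += ["| Metric | Value |", "| --- | --- |"]
    for key, value in report_data.items():
        if key == "summary":
            continue  # already printed above the table
        label = key.replace("_", " ").capitalize()
        if isinstance(value, float):
            value = f"{value:.3f}"
        lines.append(f"| {label} | {value} |")
    return "\n".join(lines)

# Made-up values in the same shape as report_data above
sample_report = {
    "summary": "Security incident analysis identified 2 anomalies and 3 potential threats",
    "total_events": 3,
    "anomalies": 2,
    "quality_score": 0.875,
}

print(render_markdown_report("Incident Report", sample_report))
```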


## Step 6: Visualize Security Incidents

Visualize the incident knowledge graph and security patterns.


In [None]:
kg_visualizer = KGVisualizer()
analytics_visualizer = AnalyticsVisualizer()
temporal_visualizer = TemporalVisualizer()

kg_viz = kg_visualizer.visualize_network(incident_kg, output="interactive")
analytics_viz = analytics_visualizer.visualize_analytics(incident_kg, output="interactive")
temporal_viz = temporal_visualizer.visualize_timeline(incident_kg, output="interactive")

print("Total modules used: 20+")
print("Pipeline complete: Multiple Security Sources → Parse Logs → Extract Entities → Build KG → Analyze → Detect Anomalies → Reports → Visualize")
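
When the interactive visualizations are not available (for example in a headless environment), a plain-text timeline can still show the order of events. The sketch below sorts events by their ISO 8601 timestamps and prints each one with its offset from the first event. The `build_timeline` and `render_timeline` helpers are hypothetical and unrelated to `TemporalVisualizer`; the sample events reuse the XML data from Step 1.

```python
from datetime import datetime

def build_timeline(events):
    """Sort events chronologically by their ISO 8601 timestamp."""
    return sorted(events, key=lambda e: datetime.fromisoformat(e["timestamp"]))

def render_timeline(events):
    """Return one line per event with the offset from the first event."""
    ordered = build_timeline(events)
    if not ordered:
        return []
    start = datetime.fromisoformat(ordered[0]["timestamp"])
    lines = []
    for event in ordered:
        offset = datetime.fromisoformat(event["timestamp"]) - start
        minutes = int(offset.total_seconds() // 60)
        lines.append(f"+{minutes:>4} min  {event['severity']:<8}  {event['event_type']}")
    return lines

sample_events = [
    {"timestamp": "2024-01-15T15:00:00", "event_type": "unauthorized_access", "severity": "critical"},
    {"timestamp": "2024-01-15T14:30:00", "event_type": "malware_detection", "severity": "high"},
]

timeline_lines = render_timeline(sample_events)
for line in timeline_lines:
    print(line)
```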
