[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Hawksight-AI/semantica/blob/main/cookbook/use_cases/cybersecurity/05_Threat_Intelligence_Integration.ipynb)

# Threat Intelligence Integration Pipeline

## Overview

This notebook demonstrates how to use Python/FastMCP MCP servers as data sources for threat intelligence ingestion. It connects to a threat intelligence MCP server by URL, ingests threat feeds, vulnerability data, and security events, and builds a threat intelligence knowledge graph from them.

**IMPORTANT**: This implementation supports ONLY Python-based MCP servers and FastMCP servers. Users can bring their own Python/FastMCP MCP servers via URL connections.


**Documentation**: [API Reference](https://semantica.readthedocs.io/use-cases/)

### Modules Used (20+)

- **Ingestion**: MCPIngestor, ingest_mcp, WebIngestor, FeedIngestor
- **Parsing**: MCPParser, JSONParser, XMLParser, StructuredDataParser
- **Extraction**: NERExtractor, RelationExtractor, EventDetector, TripletExtractor
- **KG**: GraphBuilder, TemporalGraphQuery, GraphAnalyzer, ConnectivityAnalyzer
- **Embeddings**: EmbeddingGenerator, TextEmbedder
- **Vector Store**: VectorStore, HybridSearch
- **Reasoning**: Reasoner (Legacy), RuleManager, ExplanationGenerator
- **Export**: JSONExporter, RDFExporter, ReportGenerator
- **Visualization**: KGVisualizer, TemporalVisualizer, AnalyticsVisualizer

### Pipeline

**Connect to Threat Intel MCP Server → Ingest Threat Data via MCP → Parse MCP Responses → Extract Threat Entities → Build Threat KG → Generate Embeddings → Hybrid RAG → Analyze Threats → Generate Reports → Visualize**

## Installation

Install Semantica from PyPI:

```bash
pip install semantica
# Or with all optional dependencies:
pip install "semantica[all]"
```

---

## Step 1: Connect to Threat Intelligence MCP Server

Connect by URL to a Python/FastMCP MCP server that provides threat intelligence data. The server can expose resources (threat feeds, vulnerability databases) and tools (threat queries, IOC checks).
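The resource/tool split maps onto MCP's JSON-RPC 2.0 methods: `resources/list` and `resources/read` for resources, `tools/list` and `tools/call` for tools. The sketch below only builds the request payloads, to show roughly what an MCP client sends on your behalf. It is not Semantica's implementation: it skips the `initialize` handshake and the HTTP transport, and the resource URI and tool name are the hypothetical ones used later in this notebook.

```python
import json
from itertools import count
from typing import Optional

# Monotonically increasing JSON-RPC request ids
_request_ids = count(1)

def jsonrpc_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request object, the envelope MCP uses for every call."""
    request = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        request["params"] = params
    return request

def read_resource(uri: str) -> dict:
    # Resources are addressed by URI and fetched with `resources/read`
    return jsonrpc_request("resources/read", {"uri": uri})

def call_tool(name: str, arguments: dict) -> dict:
    # Tools are invoked by name with a JSON object of arguments via `tools/call`
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})

list_req = jsonrpc_request("resources/list")
feed_req = read_resource("resource://threats/feed")
ioc_req = call_tool("check_ioc", {"ioc_type": "hash", "ioc_value": "abc123def456"})

for req in (list_req, feed_req, ioc_req):
    print(json.dumps(req))
```

Resources suit data the server already holds (a feed, a database snapshot), while tools suit parameterized lookups such as an IOC check.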


In [None]:
!pip install semantica


In [None]:
from semantica.ingest import MCPIngestor, ingest_mcp
from semantica.parse import MCPParser, JSONParser, XMLParser, StructuredDataParser
from semantica.semantic_extract import NERExtractor, RelationExtractor, EventDetector, TripletExtractor
from semantica.kg import GraphBuilder, TemporalGraphQuery, GraphAnalyzer, ConnectivityAnalyzer
from semantica.embeddings import EmbeddingGenerator, TextEmbedder
from semantica.vector_store import VectorStore, HybridSearch
# Reasoning imports are disabled in this notebook:
# from semantica.reasoning import InferenceEngine, RuleManager, ExplanationGenerator
from semantica.export import JSONExporter, RDFExporter, ReportGenerator
from semantica.visualization import KGVisualizer, TemporalVisualizer, AnalyticsVisualizer
import json
from datetime import datetime, timedelta

# Initialize MCP ingestor
mcp_ingestor = MCPIngestor()

# Connect to threat intelligence MCP server via URL
# Replace with your actual MCP server URL
# Example: http://localhost:8000/mcp or https://api.example.com/threat-mcp
threat_mcp_url = "http://localhost:8000/mcp"

# Connect to MCP server with authentication (if required)
mcp_ingestor.connect(
    "threat_server",
    url=threat_mcp_url,
    headers={
        "Authorization": "Bearer your_token",
        "X-API-Key": "your_api_key"
    } if "api.example.com" in threat_mcp_url else {}
)

# List available resources (threat feeds, vulnerability databases)
resources = mcp_ingestor.list_available_resources("threat_server")
print(f"\n📊 Available Resources ({len(resources)}):")
for resource in resources[:5]:  # Show first 5
    print(f"  - {resource.uri}: {resource.name}")
    if resource.description:
        print(f"    {resource.description[:80]}...")

# List available tools (threat queries, IOC checks)
tools = mcp_ingestor.list_available_tools("threat_server")
print(f"\n🔧 Available Tools ({len(tools)}):")
for tool in tools[:5]:  # Show first 5
    print(f"  - {tool.name}: {tool.description or 'No description'}")


## Step 2: Ingest Threat Intelligence Data from MCP Server

Ingest threat feeds, vulnerability data, and security events using both resource-based and tool-based ingestion. If the server returns no data, the cell falls back to built-in sample data.
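The tool call below sends a seven-day window built from two separate `datetime.now()` calls, which are timezone-naive and very slightly apart. A minimal sketch of building the same `date_range` argument from a single UTC reference time, assuming the (hypothetical) `query_threat_indicators` tool accepts ISO 8601 strings with a UTC offset; the helper name `date_range_arguments` is also hypothetical:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def date_range_arguments(days: int, now: Optional[datetime] = None) -> dict:
    """Return a {"start", "end"} look-back window of `days` days ending at `now` (UTC)."""
    # Read the clock once so start and end are derived from the same instant
    end = now if now is not None else datetime.now(timezone.utc)
    start = end - timedelta(days=days)
    return {"start": start.isoformat(), "end": end.isoformat()}

# A fixed reference time keeps the output reproducible
tool_arguments = {
    "indicator_type": "IP",
    "date_range": date_range_arguments(7, now=datetime(2024, 6, 8, tzinfo=timezone.utc)),
}
print(tool_arguments["date_range"])
```

In the notebook you would call `date_range_arguments(7)` without `now` and pass the result as `arguments["date_range"]`.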


In [None]:
# Initialize parsers
mcp_parser = MCPParser()
json_parser = JSONParser()
xml_parser = XMLParser()
structured_parser = StructuredDataParser()

threat_data = []

# Method 1: Resource-based ingestion
# Ingest from MCP resources (threat feeds, vulnerability databases)
threat_feeds = mcp_ingestor.ingest_resources(
    "threat_server",
    resource_uris=["resource://threats/feed", "resource://vulnerabilities/database"]
)

for item in threat_feeds:
    threat_data.append(item)
    print(f"  Ingested resource: {item}")

# Method 2: Tool-based ingestion
# Call MCP tools to retrieve data dynamically
# Example: Query threat indicators
threat_indicators = mcp_ingestor.ingest_tool_output(
    "threat_server",
    tool_name="query_threat_indicators",
    arguments={
        "indicator_type": "IP",
        "date_range": {
            "start": (datetime.now() - timedelta(days=7)).isoformat(),
            "end": datetime.now().isoformat()
        }
    }
)

if threat_indicators:
    threat_data.append(threat_indicators)
    print(f"  Retrieved threat indicators")

# Example: Check IOC (Indicators of Compromise)
ioc_check = mcp_ingestor.ingest_tool_output(
    "threat_server",
    tool_name="check_ioc",
    arguments={
        "ioc_type": "hash",
        "ioc_value": "abc123def456"
    }
)

if ioc_check:
    threat_data.append(ioc_check)
    print(f"  Retrieved IOC check results")

# Fall back to sample threat intelligence data when the MCP server returned nothing
if not threat_data:
    sample_data = {
        "threat_indicators": [
            {
                "indicator_id": "TI001",
                "indicator_type": "IP",
                "indicator_value": "192.168.1.100",
                "threat_type": "malware",
                "severity": "high",
                "timestamp": (datetime.now() - timedelta(days=1)).isoformat(),
                "source": "ThreatFeed1"
            },
            {
                "indicator_id": "TI002",
                "indicator_type": "domain",
                "indicator_value": "malicious.example.com",
                "threat_type": "phishing",
                "severity": "medium",
                "timestamp": (datetime.now() - timedelta(hours=12)).isoformat(),
                "source": "ThreatFeed2"
            },
            {
                "indicator_id": "TI003",
                "indicator_type": "hash",
                "indicator_value": "abc123def456",
                "threat_type": "ransomware",
                "severity": "critical",
                "timestamp": datetime.now().isoformat(),
                "source": "ThreatFeed1"
            }
        ],
        "vulnerabilities": [
            {
                "cve_id": "CVE-2024-0001",
                "description": "Remote code execution vulnerability",
                "severity": "critical",
                "affected_products": ["Product A", "Product B"],
                "published_date": (datetime.now() - timedelta(days=5)).isoformat()
            }
        ]
    }
    threat_data.append(sample_data)
    print(f"  Loaded {len(sample_data['threat_indicators'])} threat indicators")
    print(f"  Loaded {len(sample_data['vulnerabilities'])} vulnerabilities")

print(f"\n📊 Total threat intelligence data items ingested: {len(threat_data)}")


## Step 3: Parse Threat Intelligence Data

Parse the threat intelligence data received from the MCP server, then split it into threat indicators and vulnerabilities.
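Per the MCP specification, a `tools/call` result holds a `content` list of typed parts and a `resources/read` result holds a `contents` list, with JSON payloads usually arriving as `text` parts. The sketch below shows one way to turn those parts into Python objects; it is an illustration of the message shape, not `MCPParser`'s actual logic, and `extract_json_payloads` is a hypothetical helper.

```python
import json

def extract_json_payloads(result: dict) -> list:
    """Collect JSON documents from the text parts of an MCP tool or resource result.

    Tool results carry a `content` list; resource reads carry a `contents` list.
    Text parts that are not valid JSON are kept as plain strings.
    """
    if result.get("isError"):
        raise RuntimeError(f"MCP tool reported an error: {result.get('content')}")
    payloads = []
    for part in result.get("content", []) + result.get("contents", []):
        text = part.get("text")
        if text is None:  # skip image/blob parts
            continue
        try:
            payloads.append(json.loads(text))
        except json.JSONDecodeError:
            payloads.append(text)
    return payloads

tool_result = {
    "content": [{"type": "text", "text": '{"indicator_id": "TI003", "severity": "critical"}'}],
    "isError": False,
}
print(extract_json_payloads(tool_result))
```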


In [None]:
parsed_threat_data = []

# Parse MCP responses
for data_item in threat_data:
    # Parse MCP response (handles JSON, XML, text, binary)
    if isinstance(data_item, dict):
        parsed_item = data_item
    else:
        parsed_item = mcp_parser.parse_response(data_item, response_type="json")
    
    parsed_threat_data.append(parsed_item)
    print(f"  Parsed data item")

# Extract threat indicators and vulnerabilities
threat_indicators = []
vulnerabilities = []

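for item in parsed_threat_data:
    if not isinstance(item, dict):
        continue
    # A single payload can carry both indicators and vulnerabilities
    # (the sample data does), so check each group independently
    if "threat_indicators" in item:
        threat_indicators.extend(item["threat_indicators"])
    elif "indicator_id" in item:
        threat_indicators.append(item)
    if "vulnerabilities" in item:
        vulnerabilities.extend(item["vulnerabilities"])
    elif "cve_id" in item:
        vulnerabilities.append(item)

print(f"  Threat indicators: {len(threat_indicators)}, vulnerabilities: {len(vulnerabilities)}")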


## Step 4: Extract Threat Entities and Relationships

Map the parsed records to threat entities (indicators, threat types, sources, vulnerabilities, affected products) and the relationships between them.
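The cell below deduplicates entities by `(id, type)` but leaves relationships as they are, so the same record ingested twice (for example, once from a resource and once from a tool call) yields parallel edges. A minimal sketch of one order-preserving helper that covers both cases; the name `unique_by` and the sample records are hypothetical:

```python
from typing import Callable, Hashable, Iterable, List

def unique_by(items: Iterable[dict], key: Callable[[dict], Hashable]) -> List[dict]:
    """Return items in their original order, keeping only the first one per key."""
    seen = set()
    result = []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            result.append(item)
    return result

sample_entities = [
    {"id": "TI001", "type": "ThreatIndicator"},
    {"id": "ThreatFeed1", "type": "ThreatSource"},
    {"id": "ThreatFeed1", "type": "ThreatSource"},  # repeated: two indicators share a feed
]
sample_relationships = [
    {"source": "TI001", "target": "ThreatFeed1", "type": "reported_by"},
    {"source": "TI001", "target": "ThreatFeed1", "type": "reported_by"},  # same record ingested twice
]

sample_entities = unique_by(sample_entities, key=lambda e: (e["id"], e["type"]))
sample_relationships = unique_by(sample_relationships, key=lambda r: (r["source"], r["target"], r["type"]))
print(len(sample_entities), len(sample_relationships))
```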


In [None]:
ner_extractor = NERExtractor()
relation_extractor = RelationExtractor()
event_detector = EventDetector()
triplet_extractor = TripletExtractor()

threat_entities = []
threat_relationships = []

# Extract from threat indicators
for indicator in threat_indicators:
    if isinstance(indicator, dict):
        indicator_id = indicator.get("indicator_id", "")
        indicator_type = indicator.get("indicator_type", "")
        threat_type = indicator.get("threat_type", "")
        source = indicator.get("source", "")
        
        # Threat Indicator entity
        threat_entities.append({
            "id": indicator_id,
            "type": "ThreatIndicator",
            "name": indicator_id,
            "properties": {
                "indicator_type": indicator_type,
                "indicator_value": indicator.get("indicator_value", ""),
                "threat_type": threat_type,
                "severity": indicator.get("severity", ""),
                "timestamp": indicator.get("timestamp", ""),
                "source": source
            }
        })
        
        # Threat Type entity
        if threat_type:
            threat_entities.append({
                "id": threat_type,
                "type": "ThreatType",
                "name": threat_type,
                "properties": {}
            })
            threat_relationships.append({
                "source": indicator_id,
                "target": threat_type,
                "type": "classified_as",
                "properties": {}
            })
        
        # Source entity
        if source:
            threat_entities.append({
                "id": source,
                "type": "ThreatSource",
                "name": source,
                "properties": {}
            })
            threat_relationships.append({
                "source": indicator_id,
                "target": source,
                "type": "reported_by",
                "properties": {}
            })

# Extract from vulnerabilities
for vuln in vulnerabilities:
    if isinstance(vuln, dict):
        cve_id = vuln.get("cve_id", "")
        
        # Vulnerability entity
        threat_entities.append({
            "id": cve_id,
            "type": "Vulnerability",
            "name": cve_id,
            "properties": {
                "description": vuln.get("description", ""),
                "severity": vuln.get("severity", ""),
                "published_date": vuln.get("published_date", "")
            }
        })
        
        # Affected products
        for product in vuln.get("affected_products", []):
            threat_entities.append({
                "id": product,
                "type": "Product",
                "name": product,
                "properties": {}
            })
            threat_relationships.append({
                "source": cve_id,
                "target": product,
                "type": "affects",
                "properties": {}
            })

# Remove duplicates
seen_entities = set()
unique_entities = []
for entity in threat_entities:
    entity_key = (entity["id"], entity["type"])
    if entity_key not in seen_entities:
        seen_entities.add(entity_key)
        unique_entities.append(entity)

threat_entities = unique_entities

print(f"  - Threat Indicators: {len([e for e in threat_entities if e['type'] == 'ThreatIndicator'])}")
print(f"  - Vulnerabilities: {len([e for e in threat_entities if e['type'] == 'Vulnerability'])}")
print(f"  - Threat Types: {len([e for e in threat_entities if e['type'] == 'ThreatType'])}")
print(f"  - Sources: {len([e for e in threat_entities if e['type'] == 'ThreatSource'])}")


## Step 5: Build Threat Intelligence Knowledge Graph

Build a temporal knowledge graph from the extracted threat entities and relationships.
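The graph is temporal because each indicator carries a `timestamp` property; `TemporalGraphQuery` is instantiated in the cell below but not exercised. As a plain-Python sketch of the kind of question a temporal query answers (this is not `TemporalGraphQuery`'s API), the hypothetical helper `within_window` keeps indicators observed inside a look-back window. It treats naive timestamps as UTC, which is an assumption.

```python
from datetime import datetime, timedelta, timezone

def within_window(entities, now, window):
    """Return ThreatIndicator entities whose `timestamp` falls in [now - window, now]."""
    recent = []
    for entity in entities:
        if entity.get("type") != "ThreatIndicator":
            continue
        raw = entity.get("properties", {}).get("timestamp")
        if not raw:
            continue
        ts = datetime.fromisoformat(raw)
        if ts.tzinfo is None:  # assumption: naive timestamps are UTC
            ts = ts.replace(tzinfo=timezone.utc)
        if now - window <= ts <= now:
            recent.append(entity)
    return recent

reference_now = datetime(2024, 6, 8, 12, 0, tzinfo=timezone.utc)
timeline_sample = [
    {"id": "TI001", "type": "ThreatIndicator", "properties": {"timestamp": "2024-06-07T12:00:00"}},
    {"id": "TI002", "type": "ThreatIndicator", "properties": {"timestamp": "2024-06-08T00:00:00"}},
    {"id": "CVE-2024-0001", "type": "Vulnerability", "properties": {}},
]
print([e["id"] for e in within_window(timeline_sample, reference_now, timedelta(hours=18))])
```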


In [None]:
builder = GraphBuilder()
temporal_query = TemporalGraphQuery()
graph_analyzer = GraphAnalyzer()
connectivity_analyzer = ConnectivityAnalyzer()

# Build knowledge graph
threat_kg = builder.build(threat_entities, threat_relationships)

# Analyze graph structure
metrics = graph_analyzer.compute_metrics(threat_kg)
connectivity = connectivity_analyzer.analyze_connectivity(threat_kg)

print(f"  Entities: {len(threat_kg.get('entities', []))}")
print(f"  Relationships: {len(threat_kg.get('relationships', []))}")
print(f"  Graph density: {metrics.get('density', 0):.3f}")
print(f"  Connectivity: {connectivity.get('connected_components', 0)} components")


## Step 6: Generate Embeddings and Set Up Hybrid RAG

Generate embeddings for the threat intelligence data, set up hybrid search (vector + temporal KG), and apply simple threat analysis rules to the indicators.
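Hybrid search blends vector similarity with graph context. How `HybridSearch` weights the two is not shown in this notebook, so the sketch below assumes one simple scheme: a weighted sum of cosine similarity and an exponential recency decay computed from each indicator's age. The weights (`alpha`, `half_life_hours`), the helper names, and the toy 3-dimensional vectors are illustrative assumptions, not Semantica's scoring.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_score(query_vec, doc_vec, age_hours, alpha=0.7, half_life_hours=24.0):
    """Blend semantic similarity with recency; the recency term halves every half-life."""
    recency = 0.5 ** (age_hours / half_life_hours)
    return alpha * cosine(query_vec, doc_vec) + (1 - alpha) * recency

# Toy "embeddings" standing in for real model output: (vector, age in hours)
query = [1.0, 0.0, 0.0]
docs = {
    "TI001 malware IP": ([0.9, 0.1, 0.0], 24.0),       # similar, one day old
    "TI003 ransomware hash": ([0.9, 0.1, 0.0], 0.0),   # equally similar, brand new
    "TI002 phishing domain": ([0.0, 1.0, 0.0], 12.0),  # semantically unrelated
}
ranked = sorted(docs, key=lambda name: hybrid_score(query, *docs[name]), reverse=True)
print(ranked)
```

With equal similarity, the fresher indicator wins; raising `alpha` shifts the ranking toward pure semantic match.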


In [None]:
# Generate embeddings
embedding_generator = EmbeddingGenerator()
text_embedder = TextEmbedder()

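# Build one text per indicator and vulnerability for embedding
threat_texts = []
for entity in threat_entities:
    props = entity.get("properties", {})
    if entity.get("type") == "ThreatIndicator":
        # Indicators have no description field, so embed value, threat type, and severity
        threat_texts.append(f"{props.get('indicator_value', '')} {props.get('threat_type', '')} {props.get('severity', '')}")
    elif entity.get("type") == "Vulnerability":
        threat_texts.append(f"{entity.get('id', '')} {props.get('description', '')}")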

embeddings = embedding_generator.generate_embeddings(threat_texts)

# Set up vector store
vector_store = VectorStore()
vector_store.add_embeddings(threat_texts, embeddings)

# Set up hybrid search (vector + temporal KG)
hybrid_search = HybridSearch()
hybrid_search.setup(vector_store, threat_kg)


# Threat analysis rules.
# The semantica.reasoning import is disabled in this notebook, so the rules are
# applied in plain Python: each rule pairs a condition on an indicator with an action.
threat_rules = [
    (lambda ind: ind.get("severity") == "critical" and ind.get("threat_type") == "ransomware",
     "immediate_response_required"),
    (lambda ind: ind.get("severity") == "high" and ind.get("indicator_type") == "IP",
     "block_ip"),
]

# Evaluate every rule against every indicator and record the triggered actions
threat_insights = []
for indicator in threat_indicators:
    if not isinstance(indicator, dict):
        continue
    for condition, action in threat_rules:
        if condition(indicator):
            threat_insights.append({
                "indicator_id": indicator.get("indicator_id", ""),
                "action": action,
            })

print(f"  Threat insights: {len(threat_insights)}")


## Step 7: Export and Visualize

Export the threat intelligence knowledge graph and generate visualizations.
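To see what an RDF export of this graph amounts to, the sketch below writes relationships as N-Triples (one `<subject> <predicate> <object> .` line per edge) using only the standard library. The base IRI `http://example.org/threat/` is a placeholder and the helper names are hypothetical; this says nothing about the vocabulary or serialization `RDFExporter` actually emits.

```python
from urllib.parse import quote

BASE = "http://example.org/threat/"  # hypothetical namespace, not Semantica's

def iri(local_name: str) -> str:
    """Percent-encode an id (e.g. spaces in product names) so it is a valid IRI."""
    return f"<{BASE}{quote(local_name, safe='')}>"

def to_ntriples(relationships) -> str:
    """Serialize relationship dicts as one N-Triples statement per line."""
    lines = [
        f"{iri(r['source'])} {iri(r['type'])} {iri(r['target'])} ."
        for r in relationships
    ]
    return "\n".join(lines) + "\n"

rels = [
    {"source": "CVE-2024-0001", "target": "Product A", "type": "affects"},
    {"source": "TI001", "target": "ThreatFeed1", "type": "reported_by"},
]
print(to_ntriples(rels), end="")
```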


In [None]:
import tempfile
import os

temp_dir = tempfile.mkdtemp()

json_exporter = JSONExporter()
rdf_exporter = RDFExporter()
report_generator = ReportGenerator()

# Export knowledge graph
json_exporter.export_knowledge_graph(threat_kg, os.path.join(temp_dir, "threat_kg.json"))
rdf_exporter.export_knowledge_graph(threat_kg, os.path.join(temp_dir, "threat_kg.rdf"))

# Generate report
report_data = {
    "summary": f"Threat intelligence integration from MCP server identified {len(threat_insights)} insights",
    "threat_indicators": len([e for e in threat_entities if e['type'] == 'ThreatIndicator']),
    "vulnerabilities": len([e for e in threat_entities if e['type'] == 'Vulnerability']),
    "threat_types": len([e for e in threat_entities if e['type'] == 'ThreatType']),
    "insights": len(threat_insights)
}

report = report_generator.generate_report(report_data, format="markdown")

print(f"  JSON: {os.path.join(temp_dir, 'threat_kg.json')}")
print(f"  RDF: {os.path.join(temp_dir, 'threat_kg.rdf')}")

# Visualize
kg_visualizer = KGVisualizer()
temporal_visualizer = TemporalVisualizer()
analytics_visualizer = AnalyticsVisualizer()

kg_viz = kg_visualizer.visualize_network(threat_kg, output="interactive")
temporal_viz = temporal_visualizer.visualize_timeline(threat_kg, output="interactive")
analytics_viz = analytics_visualizer.visualize_analytics(threat_kg, output="interactive")


# Cleanup: Disconnect from MCP server
mcp_ingestor.disconnect("threat_server")
print("  Disconnected from MCP server")

print(f"📊 Total modules used: 20+")
