[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Hawksight-AI/semantica/blob/main/cookbook/use_cases/cybersecurity/06_Vulnerability_Tracking.ipynb)

# Vulnerability Tracking Pipeline

## Overview

This notebook demonstrates a complete vulnerability tracking pipeline: it ingests CVE data from multiple real sources (NVD, CVE feeds, security databases), builds a temporal knowledge graph, correlates vulnerabilities, predicts impact, and generates vulnerability reports.


**Documentation**: [API Reference](https://semantica.readthedocs.io/use-cases/)

### Modules Used (20+)

- **Ingestion**: FileIngestor, WebIngestor, FeedIngestor, StreamIngestor, DBIngestor, RepoIngestor, EmailIngestor, MCPIngestor
- **Parsing**: JSONParser, XMLParser, StructuredDataParser
- **Extraction**: NERExtractor, RelationExtractor, EventDetector, TripletExtractor
- **KG**: GraphBuilder, TemporalGraphQuery, TemporalPatternDetector, GraphAnalyzer
- **Analytics**: CentralityCalculator, CommunityDetector, ConnectivityAnalyzer
- **Reasoning**: InferenceEngine, RuleManager, ExplanationGenerator
- **Quality**: KGQualityAssessor, ConflictDetector
- **Export**: JSONExporter, RDFExporter, ReportGenerator
- **Visualization**: KGVisualizer, TemporalVisualizer, AnalyticsVisualizer

### Pipeline

**Real CVE Sources → Parse → Extract Vulnerabilities → Build Temporal KG → Correlate → Predict Impact → Generate Reports → Visualize**
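
The flow above can be read as a chain of stages, each consuming the previous stage's output. The following is a minimal sketch of that idea only: `run_pipeline`, `parse`, `extract`, and `build_graph` are hypothetical stand-ins written for illustration, not Semantica classes, and the CVE identifier is a made-up placeholder.

```python
from functools import reduce
from typing import Any, Callable, Iterable

Stage = Callable[[Any], Any]


def run_pipeline(data: Any, stages: Iterable[Stage]) -> Any:
    """Feed the output of each stage into the next one, in order."""
    return reduce(lambda acc, stage: stage(acc), stages, data)


# Hypothetical stand-in stages; the notebook itself uses Semantica classes.
def parse(raw: list) -> list:
    # Turn non-empty text lines into simple records.
    return [{"id": line.strip()} for line in raw if line.strip()]


def extract(records: list) -> list:
    # Keep only records whose identifier looks like a CVE ID.
    return [r for r in records if r["id"].startswith("CVE-")]


def build_graph(records: list) -> dict:
    # Represent the graph as a plain dict of node IDs with no edges yet.
    return {"nodes": [r["id"] for r in records], "edges": []}


# "CVE-0000-0001" is a placeholder, not a real CVE entry.
graph = run_pipeline(["CVE-0000-0001\n", "not-a-cve\n", ""], [parse, extract, build_graph])
print(graph)
```

Keeping each stage as a plain function makes it easy to test a stage in isolation or to swap one implementation for another.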

## Installation

Install Semantica from PyPI:

```bash
pip install semantica
# Or with all optional dependencies:
pip install "semantica[all]"  # quoted so shells such as zsh do not expand the brackets
```
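
To confirm the install before running the rest of the notebook, a small check along these lines may help. It is a sketch that relies only on the standard library's `importlib.metadata`; the helper name `installed_version` is an assumption introduced here.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("semantica")
print(f"semantica {version}" if version else "semantica is not installed")
```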

---

## Step 1: Ingest CVE Data from Real Sources

Ingest CVE data from NVD, CVE feeds, and security databases.
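
These sources are live network endpoints, so any single request can fail or time out. One way to keep the cell running is to wrap each call in a retry helper with exponential backoff. The sketch below is an assumption added for illustration, not part of Semantica: `with_retries` and `flaky` are hypothetical names, and the exception handling is deliberately broad because the ingestors' exception types are not assumed.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def with_retries(
    fetch: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call `fetch` up to `attempts` times, doubling the delay after each failure.

    Returns None if every attempt raises, so the caller can skip that source.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as exc:  # broad on purpose: exception types are not assumed
            print(f"  attempt {attempt + 1}/{attempts} failed: {exc}")
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    return None


# Demonstration with a hypothetical function that fails twice, then succeeds.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = with_retries(flaky, sleep=lambda _: None)  # no real waiting in the demo
print(result)
```

In the cell below, a call such as `feed_ingestor.ingest_feed(feed_url)` could be passed as `with_retries(lambda: feed_ingestor.ingest_feed(feed_url))`.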


In [None]:
!pip install semantica


In [None]:
from semantica.ingest import WebIngestor, FeedIngestor, DBIngestor, FileIngestor
from semantica.parse import JSONParser, XMLParser, StructuredDataParser
from semantica.semantic_extract import NERExtractor, RelationExtractor, EventDetector, TripletExtractor
from semantica.kg import GraphBuilder, TemporalGraphQuery, TemporalPatternDetector, GraphAnalyzer
from semantica.kg import CentralityCalculator, CommunityDetector, ConnectivityAnalyzer
from semantica.reasoning import InferenceEngine, RuleManager, ExplanationGenerator
from semantica.conflicts import ConflictDetector
from semantica.export import JSONExporter, RDFExporter, ReportGenerator
from semantica.visualization import KGVisualizer, TemporalVisualizer, AnalyticsVisualizer
import tempfile
import os
import json
from datetime import datetime, timedelta

web_ingestor = WebIngestor()
feed_ingestor = FeedIngestor()
db_ingestor = DBIngestor()
file_ingestor = FileIngestor()

json_parser = JSONParser()
xml_parser = XMLParser()
structured_parser = StructuredDataParser()

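# Public CVE and vulnerability data sources (listed for reference; not fetched in this cell).
# The NVD legacy feeds (JSON 1.1, XML 2.0) have been announced as retired, so verify
# that a URL is still served before relying on it; the NVD API 2.0 below is the current route.
cve_sources = [
    "https://nvd.nist.gov/feeds/json/cve/1.1/nvdcve-1.1-recent.json.zip",  # NVD recent CVEs (legacy JSON feed)
    "https://nvd.nist.gov/feeds/xml/cve/2.0/nvdcve-2.0-recent.xml.zip",  # NVD recent CVEs (legacy XML feed)
    "https://cve.mitre.org/data/downloads/allitems.csv",  # MITRE CVE list, all items
    "https://www.cisa.gov/known-exploited-vulnerabilities-catalog/json"  # CISA KEV catalog
]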

# Real vulnerability feed URLs
vulnerability_feeds = [
    "https://www.cisa.gov/news.xml",  # CISA Security Advisories
    "https://www.us-cert.gov/ncas/alerts.xml",  # US-CERT Alerts
    "https://feeds.feedburner.com/SecurityWeek",  # Security Week
    "https://www.darkreading.com/rss.xml"  # Dark Reading
]

# Example database connection (placeholder credentials; replace with your own)
db_connection_string = "postgresql://user:password@localhost:5432/vulnerability_db"
db_query = (
    "SELECT cve_id, description, severity, published_date, affected_products "
    "FROM vulnerabilities "
    "WHERE published_date > NOW() - INTERVAL '30 days' "
    "ORDER BY published_date DESC"
)

# Real web API endpoints for CVE data
cve_apis = [
    "https://services.nvd.nist.gov/rest/json/cves/2.0",  # NVD CVE API v2.0
    "https://api.github.com/repos/CVEProject/cvelist",  # CVE Project on GitHub
    "https://cve.circl.lu/api/last"  # CVE Search API
]

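# Ingest from the vulnerability news and advisory feeds
cve_feed_list = []
for feed_url in vulnerability_feeds:
    try:
        cve_feed = feed_ingestor.ingest_feed(feed_url)
    except Exception as exc:  # a feed may be unreachable or malformed; skip it and continue
        print(f"  Skipped feed {feed_url}: {exc}")
        continue
    if cve_feed:
        cve_feed_list.append(cve_feed)
        print(f"  Ingested feed: {feed_url}")
        print(f"  Items: {len(cve_feed.items) if hasattr(cve_feed, 'items') else 0}")

# Ingest from the CVE APIs (only the first endpoint, the NVD API, is queried here)
cve_api_data = []
for api_url in cve_apis[:1]:
    try:
        api_content = web_ingestor.ingest_url(api_url)
    except Exception as exc:  # network errors should not abort the whole cell
        print(f"  Skipped CVE API {api_url}: {exc}")
        continue
    if api_content:
        cve_api_data.append(api_content)
        print(f"  Ingested CVE API: {api_url}")

# Database ingestion pattern. db_query is not passed to export_table; it is printed
# only to show the equivalent SQL. The connection string above is a placeholder,
# so this call fails unless a matching database is actually running.
db_data = None
try:
    db_data = db_ingestor.export_table(
        connection_string=db_connection_string,
        table_name="vulnerabilities",
        limit=1000
    )
except Exception as exc:
    print(f"  Database ingestion skipped: {exc}")
print(f"  Query pattern: {db_query}")

print("\n📊 CVE Ingestion Summary:")
print(f"  Vulnerability feeds: {len(cve_feed_list)}")
print(f"  CVE API sources: {len(cve_api_data)}")
print(f"  Database sources: {1 if db_data is not None else 0}")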
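
The cell above requests the NVD API endpoint without query parameters. As a hedged sketch of how the same 30-day window used in `db_query` could be expressed against the API, the example below builds a filtered URL and flattens a response into rows with the same column names. It uses only the standard library. The parameter names (`pubStartDate`, `pubEndDate`, `resultsPerPage`), the 120-day range limit, and the response keys reflect the NVD CVE API 2.0 documentation as understood at the time of writing and should be checked against it. `build_nvd_query` and `summarize_nvd_response` are hypothetical helper names, and the sample payload is made up to show the shape, not real CVE data.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import urlencode

NVD_API = "https://services.nvd.nist.gov/rest/json/cves/2.0"


def build_nvd_query(days: int = 30, results_per_page: int = 100,
                    now: Optional[datetime] = None) -> str:
    """Build an NVD API 2.0 URL for CVEs published in the last `days` days."""
    if not 1 <= days <= 120:
        raise ValueError("the publication date range is limited to 120 days")
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(days=days)
    params = {
        # ISO 8601 with milliseconds and an explicit UTC offset.
        "pubStartDate": start.isoformat(timespec="milliseconds"),
        "pubEndDate": end.isoformat(timespec="milliseconds"),
        "resultsPerPage": results_per_page,
    }
    return f"{NVD_API}?{urlencode(params)}"


def summarize_nvd_response(payload: dict) -> list:
    """Flatten an NVD API 2.0 response into rows shaped like the db_query columns."""
    rows = []
    for item in payload.get("vulnerabilities", []):
        cve = item.get("cve", {})
        # Prefer the English description when several languages are present.
        description = next(
            (d.get("value", "") for d in cve.get("descriptions", []) if d.get("lang") == "en"),
            "",
        )
        severity = None
        metrics = cve.get("metrics", {})
        for key in ("cvssMetricV31", "cvssMetricV30"):
            if metrics.get(key):
                severity = metrics[key][0].get("cvssData", {}).get("baseSeverity")
                break
        rows.append({
            "cve_id": cve.get("id"),
            "description": description,
            "severity": severity,
            "published_date": cve.get("published"),
        })
    return rows


# Made-up payload for illustration only; "CVE-0000-0001" is not a real entry.
sample_payload = {
    "vulnerabilities": [{
        "cve": {
            "id": "CVE-0000-0001",
            "published": "2024-01-15T10:00:00.000",
            "descriptions": [{"lang": "en", "value": "Placeholder description."}],
            "metrics": {"cvssMetricV31": [{"cvssData": {"baseSeverity": "HIGH"}}]},
        }
    }]
}

fixed_now = datetime(2024, 1, 31, tzinfo=timezone.utc)
url = build_nvd_query(days=30, now=fixed_now)
rows = summarize_nvd_response(sample_payload)
print(url)
print(rows)
```

The resulting URL can be passed to `web_ingestor.ingest_url` in place of the bare endpoint, and the flattened rows line up with the columns selected in `db_query`.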
