# Data Loading

## Loading Cyber SBOM Data into Neo4j

This notebook assumes that you have already loaded the [Cyber VPEM use case data](https://github.com/pedroleitao-neo4j/cyber-vpem) into Neo4j. It loads the additional data required for the **Software Bill of Materials (SBOM)** use case, focusing on **transitive dependencies** and software lineage within the supply chain.

### Transitive Dependency Scenario

A transitive dependency occurs when your application relies on a library, which in turn relies on another library. This creates a "hidden" attack surface where vulnerabilities can exist multiple layers deep in your software stack.

In this scenario, we simulate a supply chain "poisoning" event:

1. **Deep Vulnerability:** A critical CVE is identified in a low-level open-source component (e.g., `vulnerable-codec`).
2. **Shared Usage:** This component is a dependency for a widely used internal "Core Utility" library (`common-utils`).
3. **Silent Propagation:** Multiple business applications (e.g., `CustomerFacingAPI`, `LegacyParser`) include the utility library, unknowingly inheriting the high-risk vulnerability.
4. **Operational Risk:** By connecting this code lineage to infrastructure data, we can show that the "hidden" vulnerability is actually running on an internet-facing production server.
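The propagation described in the steps above can be illustrated without a database. The following is a minimal sketch in plain Python: the `DIRECT_DEPS` map and the `transitive_dependencies` helper are hypothetical and only mirror the component names used in this scenario, they are not part of the loaded dataset.

```python
from collections import deque

# Hypothetical direct-dependency map: each key depends directly on the listed values.
DIRECT_DEPS = {
    "CustomerFacingAPI": ["common-utils"],
    "LegacyParser": ["common-utils"],
    "common-utils": ["vulnerable-codec"],
    "vulnerable-codec": [],
}


def transitive_dependencies(component, direct_deps):
    """Return every dependency reachable from `component`, mapped to its depth."""
    depths = {}
    queue = deque((dep, 1) for dep in direct_deps.get(component, []))
    while queue:
        name, depth = queue.popleft()
        if name in depths:
            # Already reached earlier in the breadth-first walk, so at a depth
            # that is less than or equal to this one.
            continue
        depths[name] = depth
        queue.extend((dep, depth + 1) for dep in direct_deps.get(name, []))
    return depths


print(transitive_dependencies("CustomerFacingAPI", DIRECT_DEPS))
# {'common-utils': 1, 'vulnerable-codec': 2}
```

Neither application declares `vulnerable-codec` directly, yet both reach it at depth 2. This is the relationship that the graph makes explicit.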

### Synthetic Data Extension

To model this scenario, we reuse the existing `DEPENDENCY_OF` relationship to build multi-level dependency chains. A traditional flat SBOM (a spreadsheet, for example) lists components without showing how they relate, whereas the graph reveals the **functional hierarchy** of the code.

The data loading function `create_sbom_scenario_data()` performs the following actions:

* **Dependency Chain:** Establishes `DEPENDENCY_OF` relationships from a third-party library to a shared internal library, and finally to existing `BuildArtifact` nodes.
* **Vulnerability Mapping:** Links a critical **CVE** node (here `CVE-2021-44228`, the Log4Shell vulnerability in Log4j) directly to the deep-seated transitive library via an `IDENTIFIED_IN` relationship.
* **Environment Context:** Relies on the existing schema so that the code can be traced from the `Library` node up through the `Application` and on to the `ComputeInstance`.
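Once the chain described above exists, a variable-length traversal over `DEPENDENCY_OF` can recover every `BuildArtifact` that inherits the CVE. The sketch below uses only the labels and relationship types named in this section. The helper name `find_affected_artifacts`, the upper bound of 5 hops, and the returned column names are assumptions chosen for illustration. The `session` argument is expected to be a Neo4j driver session such as the one opened with `driver.session(database=DB)`. The relationships that connect artifacts to `Application` and `ComputeInstance` nodes are not shown in this section, so the sketch stops at the artifact.

```python
# Cypher sketch: start at the CVE, hop to the library it was identified in,
# then follow one to five DEPENDENCY_OF hops up to a BuildArtifact.
BLAST_RADIUS_QUERY = """
MATCH (v:CVE {id: $cve_id})-[:IDENTIFIED_IN]->(lib:Library)
MATCH path = (lib)-[:DEPENDENCY_OF*1..5]->(art:BuildArtifact)
RETURN art.id AS artifact,
       lib.name AS vulnerable_library,
       length(path) AS depth
ORDER BY depth, artifact
"""


def find_affected_artifacts(session, cve_id):
    """Return one dict per artifact that transitively depends on the vulnerable library."""
    result = session.run(BLAST_RADIUS_QUERY, cve_id=cve_id)
    return [record.data() for record in result]
```

With the scenario data in place, calling `find_affected_artifacts(session, "CVE-2021-44228")` should report both artifacts at depth 2, since `vulnerable-codec` reaches them through `common-utils`.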

By modeling the software supply chain as a **Security Knowledge Graph**, defenders can achieve **Code-to-Cloud traceability**: a single graph query can identify every running instance affected by a newly disclosed vulnerability.

In [3]:
import os
from neo4j import GraphDatabase
from dotenv import load_dotenv

load_dotenv()

# Connection details
URI = os.getenv("NEO4J_URI", "bolt://localhost:7687")
AUTH = (os.getenv("NEO4J_USER", "neo4j"), os.getenv("NEO4J_PASSWORD", "password"))
DB = os.getenv("NEO4J_DB", "nvd")

def create_sbom_scenario_data():
    query = """
    // Create a shared "Core Utility" library
    MERGE (lib_shared:Library {name: 'common-utils', version: '1.5.0', language: 'Java'})
    
    // REQUIRED: Pass lib_shared forward to the next MATCH section
    WITH lib_shared

    // Link this shared library to existing BuildArtifacts
    MATCH (artA:BuildArtifact {id: 'art-prod-001'})
    MATCH (artB:BuildArtifact {id: 'art-dev-999'})
    MERGE (lib_shared)-[:DEPENDENCY_OF]->(artA)
    MERGE (lib_shared)-[:DEPENDENCY_OF]->(artB)

    // REQUIRED: Bridge from previous MERGE to the next library creation
    WITH lib_shared

    // Create a transitive dependency
    MERGE (lib_transitive:Library {name: 'vulnerable-codec', version: '0.9.1', language: 'Java'})
    MERGE (lib_transitive)-[:DEPENDENCY_OF]->(lib_shared)

    // REQUIRED: Bridge to the final MATCH
    WITH lib_transitive

    // Connect a Critical CVE
    MATCH (v:CVE {id: 'CVE-2021-44228'})
    MERGE (v)-[:IDENTIFIED_IN]->(lib_transitive)
    """
    with GraphDatabase.driver(URI, auth=AUTH) as driver:
        with driver.session(database=DB) as session:
            session.run(query)
    print("SBOM transitive dependency data successfully added.")

In [4]:
create_sbom_scenario_data()

SBOM transitive dependency data successfully added.
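Note that the success message is printed whether or not the query wrote anything. The loader uses `MATCH` for the two `BuildArtifact` nodes and for the CVE, so if any of them is missing the query produces no rows from that point on and the remaining `MERGE` clauses are skipped. A small check along the following lines can confirm that the prerequisites exist. This is a sketch: the helper name `missing_prerequisites` and the column aliases are hypothetical, and `session` is assumed to be a Neo4j driver session for the same database.

```python
# OPTIONAL MATCH keeps the row even when a node is absent, so each column
# reports whether that prerequisite was found. LIMIT 1 guards against
# duplicate nodes producing more than one row.
PREREQ_QUERY = """
OPTIONAL MATCH (art_prod:BuildArtifact {id: 'art-prod-001'})
OPTIONAL MATCH (art_dev:BuildArtifact {id: 'art-dev-999'})
OPTIONAL MATCH (cve:CVE {id: 'CVE-2021-44228'})
RETURN art_prod IS NOT NULL AS art_prod,
       art_dev IS NOT NULL AS art_dev,
       cve IS NOT NULL AS cve
LIMIT 1
"""


def missing_prerequisites(session):
    """Return the aliases of the prerequisite nodes that were not found."""
    record = session.run(PREREQ_QUERY).single()
    return [name for name in ("art_prod", "art_dev", "cve") if not record[name]]
```

An empty list means all three nodes are present. Any alias that is returned points to data that still needs to be loaded from the Cyber VPEM dataset before rerunning `create_sbom_scenario_data()`.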


With the data loaded, we can now proceed to analyze potential attack paths in the [next notebook](sbom.ipynb).