graph: BOL photo evidence stored as raw Base64 in Neo4j Document nodes #845

@PetrefiedThunder

Summary

The mobile field capture endpoint (POST /event) stores Bill of Lading (BOL) photo evidence directly in Neo4j graph nodes as raw Base64 strings via the Document.raw_content property.

File: services/graph/app/routers/fsma/traceability.py lines 439–447

if payload.image_data:
    doc_id = str(uuid.uuid4())
    document = Document(
        document_id=doc_id,
        document_type="BOL",
        # In a production environment, we would upload to S3.
        # For this pilot, we store the Base64 as the raw_content.
        source_uri=f"base64://{doc_id}",
        raw_content=payload.image_data,   # <-- Base64 blob written to Neo4j
        ...
    )

Problems:

  1. Graph database bloat: Neo4j is a graph database optimized for relationship traversal, not binary blob storage. Large Base64 payloads (BOL photos can be 1–5 MB each) will degrade page cache efficiency and query performance for all tenants.
  2. Data integrity / FSMA 204: FSMA 204 requires that evidence documents be retrievable within 24 hours of FDA request. Blobs embedded in graph nodes are not indexed or addressable by standard FDA document retrieval workflows.
  3. OWASP A04 Insecure Design: Document payloads bypass object storage access controls, encryption-at-rest policies, and CDN-based content delivery. Sensitive BOL data (trading partner info, lot codes, quantities) sits in the graph without property-level access restrictions.
  4. Pilot shortcut shipped as-is: The code comment acknowledges that this is non-production behavior, yet the code has shipped as the operative implementation.
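
To put problem 1 in numbers: Base64 encodes every 3 bytes as 4 ASCII characters, so the string written to raw_content is roughly a third larger than the photo itself. The sketch below computes that overhead for the 1–5 MB range mentioned above, assuming that range refers to the raw image size; the function name is illustrative and not part of the codebase.

```python
import base64


def base64_encoded_len(raw_size_bytes: int) -> int:
    """Length of the padded Base64 encoding of a payload of the given size."""
    # Every group of up to 3 input bytes becomes exactly 4 output characters.
    return 4 * ((raw_size_bytes + 2) // 3)


# Sanity check of the formula against the standard library encoder.
assert base64_encoded_len(1000) == len(base64.b64encode(b"x" * 1000))

for mb in (1, 3, 5):
    raw = mb * 1024 * 1024
    encoded = base64_encoded_len(raw)
    print(f"{mb} MiB photo -> {encoded / (1024 * 1024):.2f} MiB of Base64 text")
```

The string property stored per Document node is therefore larger than the image it represents, before any Neo4j storage overhead is counted.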

Violates:

  • OWASP A04 Insecure Design: Threat model each KDE data flow; ensure encryption at rest/in transit.
  • FSMA 204 Audit Trail: Evidence links (evidence_link) should be S3 URIs, not opaque base64:// pseudo-URIs.

Action:

  1. Upload BOL images to S3 (or equivalent object store) in the endpoint handler.
  2. Store only the S3 URI as source_uri in the Document node.
  3. Remove raw_content from the Document node properties returned by node_properties (or gate it behind an explicit flag).
  4. Add an integration test asserting that raw_content is absent from the Neo4j node (reads back as None) after mobile ingestion.
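
A minimal sketch of actions 1, 2, and 4, under stated assumptions: the handler receives image_data as a plain Base64 string, and the object store client exposes a boto3-style put_object(Bucket=..., Key=..., Body=..., ServerSideEncryption=...) call. StoredDocument, store_bol_image, InMemoryStore, the bucket name, and the bol/{tenant_id}/{doc_id} key layout are all hypothetical stand-ins, not the project's actual Document model or configuration.

```python
import base64
import binascii
import uuid
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass
class StoredDocument:
    """Hypothetical stand-in for the fields of Document that matter here."""
    document_id: str
    document_type: str
    source_uri: str
    raw_content: Optional[str] = None  # never populated with image data


def store_bol_image(client: Any, bucket: str, tenant_id: str, image_data: str) -> StoredDocument:
    """Decode the Base64 photo, upload the bytes, and return URI-only metadata."""
    try:
        image_bytes = base64.b64decode(image_data, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("image_data is not valid Base64") from exc

    doc_id = str(uuid.uuid4())
    key = f"bol/{tenant_id}/{doc_id}"  # assumed key layout
    # boto3-style call; SSE-S3 encryption is requested per object.
    client.put_object(Bucket=bucket, Key=key, Body=image_bytes, ServerSideEncryption="AES256")
    return StoredDocument(document_id=doc_id, document_type="BOL", source_uri=f"s3://{bucket}/{key}")


class InMemoryStore:
    """Test double that records uploads instead of calling S3."""

    def __init__(self) -> None:
        self.objects: Dict[Tuple[str, str], bytes] = {}

    def put_object(self, **kwargs: Any) -> dict:
        self.objects[(kwargs["Bucket"], kwargs["Key"])] = kwargs["Body"]
        return {}


# Shape of the assertion from action 4, run against the test double.
store = InMemoryStore()
doc = store_bol_image(store, "evidence-bucket", "tenant-a", base64.b64encode(b"fake-jpeg").decode())
assert doc.raw_content is None
assert doc.source_uri.startswith("s3://evidence-bucket/bol/tenant-a/")
```

In the real endpoint the client would presumably be boto3.client("s3") or an equivalent, and the integration test would read the node back from Neo4j rather than inspect an in-memory object. Passing the client in as a parameter keeps the upload path testable without network access.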

Severity: Medium. The active pilot path writes binary blobs to the graph database; this is acknowledged in a code comment but not yet remediated.

Labels: graph, fsma-204, owasp-a04, data-integrity
