
STPA-for-AI extension: ML lifecycle hazards, data-driven UCAs, DeepSTPA patterns #105

@avrabe

Context

STPA is being rapidly adopted for AI/ML safety analysis. Recent research extends classic STPA to handle ML-specific failure modes:

  • DeepSTPA (arXiv 2302.10588): Extends control loops to cover the ML lifecycle (data collection → training → deployment → monitoring). Adds layer-wise analysis of ML model internals.
  • UniSTPA (arXiv 2505.15005): Extends along lifecycle and structural-depth dimensions for end-to-end autonomous driving.
  • MIT/SEI research: STPA for frontier AI — characterizing loss of control in AI-containing sociotechnical systems.
  • ISO/PAS 8800: Requires AI safety lifecycle analysis (aligns with STPA methodology).
  • SAE J3187: STPA is now a formal SAE standard.

Rivet's existing stpa.yaml (15+ artifact types) covers classic STPA. This issue extends it for AI/ML systems.

New artifact types for stpa.yaml or schemas/stpa-ai.yaml

ML lifecycle control loop extensions

| Artifact type | Description | Links |
| --- | --- | --- |
| `ml-controller` | An ML model acting as a controller (e.g., perception CNN, decision transformer) | `refines → controller` |
| `ml-process-model` | The ML model's learned representation of the world (implicit, unlike a traditional controller's explicit process model) | `models → ml-controller` |
| `training-data-source` | Training dataset with provenance, bias assessment, and distribution characteristics | `trains → ml-controller` |
| `data-hazard` | Hazard arising from data quality (bias, distribution shift, labeling errors, adversarial samples) | `leads-to-hazard → hazard` |
| `ml-uca` | Unsafe control action specific to ML failure modes; extends the 4 classic UCA types with ML-specific causes | `refines → uca` |
| `ml-loss-scenario` | Causal scenario specific to ML (training data bias → misclassification → UCA → hazard → loss) | `refines → loss-scenario` |
| `monitoring-trigger` | Post-deployment monitoring condition that indicates model degradation (distribution drift, accuracy decay) | `monitors → ml-controller` |
| `retraining-requirement` | Requirement triggered by monitoring (when to retrain, with what data, validation criteria) | `satisfies → monitoring-trigger` |
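To make the proposed types concrete, here is a hypothetical pair of artifact instances wired together with the new link types. All identifiers and field names are illustrative assumptions, not part of the current `stpa.yaml`:

```yaml
# Hypothetical instances (ids and field names are illustrative)
artifacts:
  - id: MLC-001
    type: ml-controller
    name: perception-cnn
    description: Camera-based object detection model in the control loop
    links:
      - type: refines
        target: CTRL-004        # the classic `controller` artifact it refines

  - id: TDS-001
    type: training-data-source
    name: highway-camera-dataset-v3
    description: 2M labeled frames; daytime-biased, bias assessment pending
    links:
      - type: trains
        target: MLC-001
```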

ML-specific UCA categories

Classic STPA defines 4 UCA types: (1) not providing, (2) providing causes hazard, (3) too early/late/wrong order, (4) stopped too soon/applied too long.

For ML controllers, additional categories:

  • Misclassification UCA: ML controller classifies input incorrectly → wrong control action
  • Confidence UCA: ML controller acts with inappropriate confidence (overconfident on OOD inputs)
  • Latency UCA: ML inference too slow for real-time control
  • Degradation UCA: Model performance degrades over time (data drift) → previously safe actions become unsafe
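An ML-specific UCA from these categories might be recorded like this. The instance below sketches a Confidence UCA; the field names and ids are assumptions about how the `ml-uca` type could look:

```yaml
# Hypothetical ml-uca instance (Confidence UCA; fields are illustrative)
- id: MLUCA-007
  type: ml-uca
  description: >
    Perception model commands lane-keep with high confidence on an
    out-of-distribution input (snow-covered lane markings)
  classic-uca-type: providing-causes-hazard   # the classic category it extends
  ml-failure-mode: overconfidence
  links:
    - type: refines
      target: UCA-012
    - type: leads-to-hazard
      target: HAZ-003
```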

New fields on existing types

```yaml
# Extend existing `uca` type with optional ML fields
- name: ml-failure-mode
  type: string
  required: false
  allowed-values:
    - misclassification
    - overconfidence
    - out-of-distribution
    - adversarial
    - data-drift
    - latency
    - mode-collapse
  description: ML-specific failure mode causing this UCA

- name: operational-design-domain
  type: text
  required: false
  description: ODD conditions under which this UCA can occur (ISO 21448 SOTIF alignment)
```
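A `uca` instance carrying these optional fields could then look like the following (identifiers and prose are illustrative):

```yaml
# Hypothetical uca instance using the proposed optional fields
- id: UCA-012
  type: uca
  description: Braking command not provided when an obstacle is present
  ml-failure-mode: out-of-distribution
  operational-design-domain: >
    Urban, daytime, dry road; OOD likelihood increases in heavy rain
    and at night (ISO 21448 SOTIF alignment)
```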

Traceability rules

  • Every ml-controller must have at least one training-data-source (error)
  • Every training-data-source should have a data-hazard assessment (warning)
  • Every ml-uca must link to a hazard (error)
  • Every monitoring-trigger must have a retraining-requirement response (warning)
  • Every ml-controller should have post-deployment monitoring (monitoring-trigger backlink, warning)
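These rules could be encoded declaratively alongside the schema. The syntax below is an assumption about how Rivet might express traceability checks, not its actual rule format:

```yaml
# Sketch of traceability rules (rule syntax assumed, not verified against rivet)
rules:
  - name: ml-controller-has-training-data
    applies-to: ml-controller
    requires: incoming link of type `trains` from a training-data-source
    severity: error

  - name: ml-uca-links-hazard
    applies-to: ml-uca
    requires: outgoing link of type `leads-to-hazard` to a hazard
    severity: error

  - name: monitoring-trigger-has-response
    applies-to: monitoring-trigger
    requires: incoming link of type `satisfies` from a retraining-requirement
    severity: warning
```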

Bridge to ISO/PAS 8800

```yaml
# stpa-ai-8800-bridge.yaml
link-types:
  - name: ai-lifecycle-phase
    description: Maps STPA-AI artifacts to ISO/PAS 8800 lifecycle phases
    source-types: [ml-controller, training-data-source, monitoring-trigger]
    target-types: [ai-assurance-argument]  # From safety-case schema (#103)
```

Integration with spar

Spar's EMV2 fault tree analysis already handles error propagation in AADL models. For ML components modeled in AADL (via WASI P3 threads or custom component categories):

  • EMV2 error states can map to ML failure modes (misclassification → error state)
  • Fault trees generated from AADL+EMV2 can reference STPA-AI UCAs
  • Architecture-level analysis (scheduling, latency) validates that ML inference meets timing requirements
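A bridge file in the same style as the 8800 bridge above could capture the first of these mappings. The file name, artifact types, and mapping fields below are hypothetical; `LateDelivery` refers to a timing error type in the EMV2 predefined error library, while `Misclassification` would be a custom error type declared in the AADL model:

```yaml
# stpa-ai-emv2-bridge.yaml (hypothetical; mirrors the 8800 bridge style)
link-types:
  - name: emv2-error-state
    description: Maps ML failure modes to EMV2 error types in the AADL model
    source-types: [ml-uca, ml-loss-scenario]
    target-types: [emv2-error-type-ref]   # assumed artifact wrapping an AADL+EMV2 reference

mappings:
  - ml-failure-mode: misclassification
    emv2-error-type: Misclassification   # custom error type in the model's library
  - ml-failure-mode: latency
    emv2-error-type: LateDelivery        # predefined EMV2 timing error type
```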
