Causal Analysis based on System Theory

Causal Analysis based on System Theory (CAST) is an accident analysis technique that maximizes learning from accidents and incidents.

https://psas.scripts.mit.edu/home/get_file4.php?name=CAST_handbook.pdf

CAST goals

Accidents create opportunities to look closely at how a system operates in times of stress and to identify areas for improvement. Examining individual system components can help, yet it is not enough: it is necessary to examine the operation of the system as a whole.

  1. Include all causes and optimize all learning, in order to improve the system as a whole; do not focus on merely a few “probable causes” or one “root cause”, and do not focus on fixing merely a few areas or one area.

  2. Reduce hindsight bias. A clue that hindsight bias is involved is when you see the words “should have”, “could have”, or “if only”.

  3. Take a system’s view of human behavior. Usually the actions of the operators will be found to be understandable when the reasons for their behaviors are examined.

  4. Provide a blame-free explanation of why the loss occurred; consider “why” and “how” rather than “who.”

  5. Use a comprehensive accident causality model that emphasizes why the controls that were created to prevent the particular type of loss were not effective in the case at hand and how to strengthen the safety control structure to prevent similar losses in the future.
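
Goal 2 names three phrases that hint at hindsight bias. As a rough illustration, a draft report could be scanned for them before review. This is a minimal sketch, not part of CAST itself: the function name `find_hindsight_phrases` and the sample draft text are hypothetical, and a hit is only a prompt for a human reviewer to reread the sentence.

```python
import re

# The three clue phrases listed in goal 2.
HINDSIGHT_PHRASES = ("should have", "could have", "if only")


def find_hindsight_phrases(report_text):
    """Return (line_number, phrase) pairs for each clue phrase found.

    Hypothetical helper: matching is case-insensitive and on whole words.
    """
    hits = []
    for line_number, line in enumerate(report_text.splitlines(), start=1):
        lowered = line.lower()
        for phrase in HINDSIGHT_PHRASES:
            if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
                hits.append((line_number, phrase))
    return hits


# Hypothetical draft text, invented for this example.
draft = (
    "The operator should have noticed the alarm.\n"
    "The alarm panel showed 40 active alarms at the time.\n"
    "If only the valve had been checked."
)
print(find_hindsight_phrases(draft))
```

The second line of the draft describes what the operator faced rather than what they failed to do, which is the kind of wording the goals above favor.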

Terminology

System Goals
The reason the system was created in the first place.
System Constraints
The ways that the goals can acceptably be achieved.
Accident a.k.a. Mishap
An undesired, unacceptable, and unplanned event that results in a loss. For short, simply a loss.
Incident or Near-Miss
An undesired, unacceptable, and unplanned event that does not result in a loss, but could have under different conditions or in a different environment.
Hazard or vulnerability
A system state or set of conditions that, together with specific environmental conditions, can lead to an accident or loss.
Causality model
A causality model explains why something happened. For an accident, a causality model is used to explain why the accident occurred.
Emergent properties
These arise from the relationships among the parts of the system, that is, by how they interact and fit together.
Controller
A controller adjusts the system, and receives feedback from it, in order to manage individual components, interactions among components, and emergent properties.
Process model
What a controller believes about a process, including the current state of the process, how it operates, and how to interact with it.
Mental model
What a person believes about a process, including the current state of the process, how it operates, and how to interact with it.
Safety information system (SIS)
A system that stores and communicates information about hazards, detects dangerous trends and deviations, evaluates the effectiveness of controls and standards, compares models and risk assessments with actual behavior, and identifies and controls hazards to improve designs and standards.
Safety management system (SMS)
The safety management system is theoretically the same as the safety control structure used in CAST analyses. The more general term “safety control structure” is used here as some industries define an SMS that excludes important controls necessary to prevent accidents.
Safety control structure (SCS)
Safety control structure (SCS) is a more general term than safety management system (SMS). This document uses SCS because some industries define an SMS that excludes important controls necessary to prevent accidents.
Safety culture
The values and assumptions in the industry and organization used to make safety-related decisions.
Hindsight bias
Hindsight bias is a psychological phenomenon whereby people convince themselves after an event that they could/should have predicted the event.
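
The controller and process model definitions above can be sketched in a few lines. This is an illustrative toy, assuming a hypothetical tank-filling process: the names `Tank`, `Controller`, `believed_level`, and all the numbers are invented for the example. It shows how a controller that stops receiving feedback keeps acting on a stale process model, so its actions stay reasonable from its own point of view while the real process drifts.

```python
from dataclasses import dataclass


@dataclass
class Tank:
    """Hypothetical controlled process: a tank with a fill level."""
    level: float = 0.0


class Controller:
    """Hypothetical controller that acts only on what it believes."""

    def __init__(self, target_level):
        self.target_level = target_level
        self.believed_level = 0.0  # the process model: believed state

    def receive_feedback(self, measured_level):
        # Feedback is the only way the process model gets updated.
        self.believed_level = measured_level

    def control_action(self):
        # The decision uses the believed level, not the actual level.
        return "fill" if self.believed_level < self.target_level else "hold"


tank = Tank()
controller = Controller(target_level=5.0)

for step in range(8):
    if controller.control_action() == "fill":
        tank.level += 1.0
    # Assumed fault: the level sensor stops reporting after step 2.
    if step < 3:
        controller.receive_feedback(tank.level)

print(tank.level, controller.believed_level)
```

After the loop the tank is above the target level, while the controller still believes it is below the target and keeps choosing to fill. An analysis in the spirit of CAST would ask why the feedback was lost, not why the controller "failed".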

Perspectives

Human error is a symptom, not a cause.
A systems approach to accident causation starts from the premise that human error is a symptom of a system that needs to be redesigned. Accident analysis should identify the design flaws and recommend ways they can be fixed, not blame the operators for the consequences of those design flaws.
Blame is the enemy of safety.
Focusing on blame seriously hinders what we learn from accidents, because important information is often deliberately hidden and investigation is often deliberately deflected. Focus on explanations, not accusations. Focus on learning and improvements, not finger-pointing and punishments.

CAST analysis

The goal of analysis is to identify the limitations of the safety control structure that allowed the loss and identify how to strengthen the structure in the future.

  1. Assemble the basic information needed to perform the analysis.
    1. System. Define the system involved and the boundary of the analysis.
    2. Accident. Describe the loss.
    3. Hazards. Describe the system states and environment conditions that led to the accident.
    4. Constraints. Identify the system-level safety constraints required to prevent the hazard.
    5. Events. Describe what happened without conclusions and without blame.
    6. Physical loss. Analyze the physical loss in terms of the physical equipment and controls, the requirements on the physical design to prevent the hazard involved, the physical controls (emergency and safety equipment) included in the design to prevent this type of accident, failures and unsafe interactions leading to the hazard, missing or inadequate physical controls that might have prevented the accident, and any contextual factors that influenced the events.
    7. Questions. Create questions that will be answered later.
  2. Model the existing safety control structure for this type of hazard.
  3. Analyze each component in the loss. Examine the components of the control structure to determine why they were not effective in preventing the loss. Show the role each component played in the accident and the explanation for their behavior (why they did what they did and why they thought it was the right thing to do at the time).
    1. Contributions to Accident
    2. Mental Model
    3. Flaws
    4. Context
    5. Questions. Create questions that will be answered later.
  4. Identify flaws in the control structure as a whole (general systemic factors) that contributed to the loss. The systemic factors span the individual system control structure components.
    1. Communication and coordination
    2. Safety Information System (SIS)
    3. Design of the safety management system
    4. Culture
    5. Changes and dynamics in the system and environment
    6. Economics
    7. System environment
    8. Questions. Create questions that will be answered later.
  5. Create recommendations for changes to the control structure to prevent a similar loss in the future. If appropriate, design a continuous improvement program for this hazard as part of your overall risk management program.
    1. Recommendations
    2. Implementations. Assign responsibilities for implementations of recommendations, including prioritizations, scheduling, budgeting, etc.
    3. Feedback. Establish a feedback system to determine whether the recommendations and implementations were effective in strengthening the controls.
    4. Follow-up
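
The outline above can be captured as a simple record so that the questions raised in steps 1, 3, and 4 are collected in one place. This is a minimal sketch under stated assumptions: the class and field names (`CastAnalysis`, `ComponentAnalysis`, `open_questions`) and the sample reactor content are hypothetical, and step 2 (the control structure model) is left out for brevity. It assumes Python 3.9 or later.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentAnalysis:
    """Step 3: one component of the safety control structure."""
    name: str
    contributions: list[str] = field(default_factory=list)
    mental_model: list[str] = field(default_factory=list)
    flaws: list[str] = field(default_factory=list)
    context: list[str] = field(default_factory=list)
    questions: list[str] = field(default_factory=list)


@dataclass
class CastAnalysis:
    # Step 1: basic information.
    system: str
    accident: str
    hazards: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)
    events: list[str] = field(default_factory=list)
    questions: list[str] = field(default_factory=list)
    # Step 3: per-component analyses.
    components: list[ComponentAnalysis] = field(default_factory=list)
    # Step 4: systemic factor name mapped to findings.
    systemic_factors: dict[str, list[str]] = field(default_factory=dict)
    systemic_questions: list[str] = field(default_factory=list)
    # Step 5: recommendations.
    recommendations: list[str] = field(default_factory=list)

    def open_questions(self) -> list[str]:
        """Gather the questions from steps 1, 3, and 4, in that order."""
        collected = list(self.questions)
        for component in self.components:
            collected.extend(component.questions)
        collected.extend(self.systemic_questions)
        return collected


# Hypothetical example content, invented for illustration.
analysis = CastAnalysis(
    system="Chemical batch reactor (hypothetical)",
    accident="Release of toxic vapor",
    questions=["Why was the relief valve isolated?"],
    components=[
        ComponentAnalysis(
            name="Shift operator",
            questions=["What did the control panel display at the time?"],
        )
    ],
    systemic_questions=["How are design changes communicated to operations?"],
)
for question in analysis.open_questions():
    print(question)
```

Keeping the questions alongside the step that raised them makes it easier to check, before writing recommendations, that each one has been answered.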

Controls

Control is interpreted broadly and, therefore, includes everything that is currently done in safety engineering, plus more.

  • Component controls e.g. interlocks, barriers, fail-safes, redundancy, and intentional design.

  • Process controls e.g. development and training processes, manufacturing processes and procedures, maintenance processes, and general system operating processes.

  • Social controls e.g. shared value systems, societal and organizational culture and incentive structures, government regulation, insurance, and individual self-interest.

System-Theoretic Accident Model and Processes (STAMP)

STAMP is the accident causality model that underlies CAST.

STAMP treats accidents as caused by complex interactions among physical systems, humans, and social systems. Safety is treated as a dynamic control problem rather than a failure prevention problem. No causes are omitted from the STAMP model. STAMP changes the emphasis from preventing failures to enforcing constraints on system behavior.

Key advantages:

  • STAMP applies to very complex systems because it works top-down from a high level of abstraction rather than bottom up.

  • STAMP includes software, humans, organizations, safety culture, etc. as causal factors in accidents and other types of losses without having to treat them differently or separately.

System Theoretic Process Analysis (STPA)

If STPA was used for designing the system, there will be an explicit listing of the scenarios leading to an accident that were identified and the controls created during system development.

If an STPA analysis for the system already exists, then it will provide a lot of information about what might have gone wrong. Theoretically, the STPA analysis should contain the scenario that occurred. If not, then there was a disconnect between the analysis during development and the operation of the system.

STPA disconnects may include:

  • The original STPA did not completely specify all the potential scenarios.

  • The scenario that occurred was identified, but an effective control was not implemented.

  • The system and its environment may have changed over time after the system went into operation, negating the effectiveness of the designed controls and introducing new causal scenarios that were not analyzed originally.

Safety culture

The values and assumptions in the industry and organization used to make safety-related decisions.

The safety culture in any organization is set by the top management. A sincere commitment by management to safety is often cited as the most important factor in achieving it. Management needs to support employees when they exhibit a reasonable concern for safety in their work and when they put safety ahead of other goals such as schedule and cost.

See safety philosophy

Negative examples of safety culture include a culture of risk acceptance, a culture of denial, a culture of compliance, a culture of documentation, and a culture of swagger.

Assumption-based leading indicators

Assumptions are made during system development that are used to design safety into a system. When, over time, those assumptions no longer hold, then the organization is likely to migrate to a state of higher risk.

Systems will always change and evolve over time, as will the environment in which the system operates. Changes may evolve slowly over time and their impact may not be obvious. Because changes are necessary and inevitable, processes must be created to ensure that safety is not degrading.

Leading indicators are commonly used in some industries to identify when the system is migrating toward a state of higher risk. Assumption-based leading indicators, then, can be identified by checking the original assumptions during operations to make sure that they are still true.
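
Checking the original assumptions during operations can be sketched as a list of recorded assumptions, each paired with a test against current operating data. This is only an illustration: the `Assumption` class, the `violated_assumptions` function, the two sample assumptions, and the data keys are all hypothetical, and real indicators would come from the system's own safety analysis.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Assumption:
    """A design-time assumption plus a check against operating data."""
    description: str
    still_holds: Callable[[dict], bool]


def violated_assumptions(assumptions, operating_data):
    """Return descriptions of assumptions that no longer hold."""
    return [
        a.description for a in assumptions if not a.still_holds(operating_data)
    ]


# Hypothetical assumptions recorded during system development.
assumptions = [
    Assumption(
        "Operators receive refresher training at least yearly",
        lambda data: data["months_since_training"] <= 12,
    ),
    Assumption(
        "No more than 10 alarms are active at once",
        lambda data: data["max_active_alarms"] <= 10,
    ),
]

# Hypothetical snapshot of current operations.
current = {"months_since_training": 9, "max_active_alarms": 40}
print(violated_assumptions(assumptions, current))
```

A violated assumption does not mean an accident is imminent; it signals that the system may be migrating toward higher risk and that the related controls deserve a fresh look.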

Introducing CAST into an organization

CAST involves a paradigm change in the way people think about and identify accident causes. Introducing CAST into an organization may require effort.

Advice:

  1. Grab the opportunity to make changes after a major loss.

  2. Demonstrate that CAST provides significantly improved results while also being time efficient and cost effective.

  3. Achieve buy-in at the top.

  4. Make the investigation team independent of the management of the group in which the events occurred, and report to a higher level of management.

Comparisons

CAST vs. System Theoretic Process Analysis (STPA)
CAST assists in identifying one particular scenario that occurred, after the fact. STPA is a hazard analysis tool based on the same powerful model of causality as CAST; STPA proactive analysis can identify all potential scenarios that may lead to losses, and can prevent accidents before they happen.
CAST vs. Root Cause Analysis (RCA)
CAST is complex and multifactorial, because systems safety covers the total spectrum of risk management. RCA aims to find a single straightforward cause for a loss; this tends to make it easier to devise a response to a loss such as a way to “solve the problem”.
CAST STAMP causality model vs. chain-of-events causality model
CAST treats safety as a holistic system control problem, including emergent properties, not as an individual component failure problem. A chain-of-events causality model assumes direct causality between one independent event and the next. Examples include Heinrich’s Domino Model, in which one falling domino triggers the next, and Reason’s Swiss Cheese Model, in which failures are akin to holes in successive slices.
