# Day 24: Safety Case Builder

In this lab, we will build a **Safety Case** for a hypothetical LLM deployment. A Safety Case is a structured argument, supported by evidence, that explains why a system is safe to operate in a specific context.

We will use the `SafetyCase`, `Claim`, `Argument`, and `Evidence` classes we implemented in the `src.assurance.safety_case` module.

In [None]:
import sys
import os

# Add the repository root (two levels up from this notebook) to sys.path so `src` is importable
sys.path.append(os.path.abspath('../../'))

from src.assurance.safety_case import SafetyCase, Claim, Argument, Evidence

## 1. Define the Top-Level Safety Claim

A top-level claim states that the system is safe for one specific use. Ours is: "The model is safe for use in a customer service chatbot role."

In [None]:
safety_case = SafetyCase(title="Customer Service Chatbot Deployment", version="1.0")

top_claim = Claim(
    id="C1",
    description="The model is safe for use in a customer service chatbot role."
)
safety_case.add_claim(top_claim)

## 2. Break Down into Sub-Claims

A safety case argues for a claim rather than proving it. To support C1, we decompose it into sub-claims that can each be backed by evidence:
- **C1.1**: The model is robust to prompt injection attacks.
- **C1.2**: The model refuses to generate harmful content (hate speech, PII).
- **C1.3**: The model does not hallucinate critical facts.

In [None]:
claim_1_1 = Claim(id="C1.1", description="The model is robust to prompt injection attacks.")
claim_1_2 = Claim(id="C1.2", description="The model refuses to generate harmful content.")

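# Register the sub-claims with the case. They are not explicitly linked to C1 here;
# a full claim tree would automate that linking.
# Note: C1.3 (hallucination) is listed above but not instantiated in this lab,
# because no evidence is collected for it below.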
safety_case.add_claim(claim_1_1)
safety_case.add_claim(claim_1_2)
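
The linking mentioned in the comment above could be derived from the dotted claim IDs themselves. The following is a minimal, self-contained sketch of that idea. The helpers `parent_id` and `build_tree` are hypothetical and are not part of the `SafetyCase` API; they only assume that IDs follow the `C1`, `C1.1`, `C1.2` numbering used in this lab.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def parent_id(claim_id: str) -> Optional[str]:
    """Return the parent ID implied by dotted numbering, or None for a root claim."""
    head, sep, _ = claim_id.rpartition(".")
    return head if sep else None


def build_tree(claim_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Map each parent claim ID to the list of its direct children."""
    children = defaultdict(list)
    for cid in claim_ids:
        parent = parent_id(cid)
        if parent is not None:
            children[parent].append(cid)
    return dict(children)


# The three claims registered so far in this lab.
tree = build_tree(["C1", "C1.1", "C1.2"])
print(tree)
```

Deriving the tree from IDs keeps the claim objects flat, at the cost of making the numbering scheme load-bearing: a mistyped ID silently attaches a claim to the wrong parent.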

## 3. Add Evidence

We'll simulate evidence from the tools built on previous days (Automated Red Teaming and Refusal Calibration). The numbers below are illustrative placeholders, not measured results.

In [None]:
# Evidence from Day 16 (Automated Red Teaming)
ev_red_team = Evidence(
    id="E1",
    description="Red Teaming Report - 50 iterations",
    data={"attack_success_rate": 0.02, "total_attempts": 1000}
)

# Evidence from Day 14 (Refusal Calibration)
ev_refusal = Evidence(
    id="E2",
    description="Refusal Calibration Test",
    data={"false_refusal_rate": 0.05, "true_refusal_rate": 0.98}
)

safety_case.add_evidence(ev_red_team)
safety_case.add_evidence(ev_refusal)

## 4. Construct Arguments

Now we link the evidence to the claims. Each argument cites its evidence by ID and is attached to the claim it supports.

In [None]:
# Argument for C1.1 (Robust to attacks)
arg_1_1 = Argument(
    id="A1",
    description="Red teaming shows low attack success rate (< 5%).",
    evidence_ids=["E1"]
)
claim_1_1.arguments.append(arg_1_1)

# Argument for C1.2 (Harmful content)
arg_1_2 = Argument(
    id="A2",
    description="Calibration test shows high True Refusal Rate (> 95%).",
    evidence_ids=["E2"]
)
claim_1_2.arguments.append(arg_1_2)
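
Argument A1 compares a point estimate (2% attack success) against a 5% threshold. One way to make that comparison more defensible is to check the upper end of a confidence interval instead. The sketch below computes a Wilson score interval for the simulated E1 numbers using only the standard library. The function `wilson_interval` is a hypothetical helper, not part of the lab's classes, and the 95% level (`z = 1.96`) is an assumption.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half_width, center + half_width


# Simulated values from evidence E1: 2% success rate over 1000 attempts.
total_attempts = 1000
successful_attacks = round(0.02 * total_attempts)

lower, upper = wilson_interval(successful_attacks, total_attempts)
threshold = 0.05  # the "< 5%" bound stated in argument A1

print(f"attack success rate interval: [{lower:.4f}, {upper:.4f}]")
print("upper bound below threshold:", upper < threshold)
```

With these simulated numbers the upper bound stays below 5%, so the argument would hold under this stricter reading. With far fewer attempts the same 2% point estimate could fail the check.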

## 5. Evaluate and Generate Report

We run the evaluation logic to see if our claims are supported.

In [None]:
safety_case.evaluate()
report = safety_case.generate_report()
print(report)
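
The internals of `evaluate()` are not shown in this lab. As an illustration only, here is one simple rule such an evaluation might apply: a claim counts as supported when at least one of its arguments cites evidence and every cited evidence ID is registered. The `SketchClaim` and `SketchArgument` classes and the `is_supported` function are hypothetical stand-ins, not the real `Claim` and `Argument` classes or the actual evaluation logic.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SketchArgument:
    id: str
    evidence_ids: List[str]


@dataclass
class SketchClaim:
    id: str
    arguments: List[SketchArgument] = field(default_factory=list)


def is_supported(claim: SketchClaim, evidence: Dict[str, dict]) -> bool:
    """True if some argument cites at least one ID and all cited IDs are registered."""
    return any(
        bool(arg.evidence_ids) and all(eid in evidence for eid in arg.evidence_ids)
        for arg in claim.arguments
    )


# Mirror the structure built in this lab (values are the simulated ones from above).
evidence = {
    "E1": {"attack_success_rate": 0.02, "total_attempts": 1000},
    "E2": {"false_refusal_rate": 0.05, "true_refusal_rate": 0.98},
}
claims = [
    SketchClaim("C1"),
    SketchClaim("C1.1", [SketchArgument("A1", ["E1"])]),
    SketchClaim("C1.2", [SketchArgument("A2", ["E2"])]),
]

status = {claim.id: is_supported(claim, evidence) for claim in claims}
print(status)
```

Under this rule C1 itself comes out unsupported, because no argument is attached to it directly and the sub-claims are not linked to it. This rule also checks only that evidence exists, not that its values meet the thresholds named in the arguments.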