This project implements **SafetyCopilot**, an enterprise-style workflow assistant prototype that turns short system descriptions into **phase-based functional safety and compliance checklists**.

The agent is built with **Google's Agent Development Kit (ADK)** as part of the *Agents Intensive Capstone Project on Kaggle*.

The project targets functional safety workflows inspired by standards such as **ISO 26262** and **IEC 61508** in domains like automotive and railway. The goal is to show how agents can support safety engineers by:

- Looking up relevant guideline snippets from a small **mini-standard** knowledge base  
- Generating concrete, actionable checklist items grouped by development phase  
- Reviewing the checklist to ensure that critical safety activities are not missing  
- Laying the groundwork for iterative, project-level refinement with sessions and memory


### ADK concepts demonstrated

This notebook focuses on the following ADK concepts:

- **Tools** – custom ADK tools for mini-standard lookup and checklist formatting, plus a small Python helper for coarse, keyword-based risk estimation  
- **Agent architectures** – a two-agent workflow with a **Planner agent** that proposes a phase-based checklist and a **Checker agent** that reviews and augments it  
- **Memory / sessions** – using `user_id` and `session_id` with `InMemorySessionService` and `Runner` to keep basic project-level context and support future iterative refinement  
- **Evaluation & observability** – a small test set of example systems with an automatic coverage metric over “must-have” topics (e.g. HARA, safety goals, verification plan, safety case)


# 1. Installation and imports

## 1.1 Install dependencies

This notebook uses the Google Agent Development Kit (ADK).

In [1]:
pip install google-adk


## 1.2 Configure Gemini API key

This notebook uses the Gemini API, which requires an API key.

**Step 1 - Create an API key**

Go to Google AI Studio and create a Gemini API key.

**Step 2 - Add the key to Kaggle Secrets**

1. In the top menu bar of the notebook editor, select `Add-ons` then `Secrets`.
2. Create a new secret with the label `GOOGLE_API_KEY`.
3. Paste your API key into the "Value" field and click "Save".
4. Ensure that the checkbox next to `GOOGLE_API_KEY` is selected so that the secret is attached to the notebook.

**Step 3 - Authenticate in the notebook**

Run the cell below to read the secret and set the environment variable:

In [2]:
import os
from kaggle_secrets import UserSecretsClient

try:
    GOOGLE_API_KEY = UserSecretsClient().get_secret("GOOGLE_API_KEY")
    os.environ["GOOGLE_API_KEY"] = GOOGLE_API_KEY

    # Set this flag to "TRUE" only when requests are routed through Vertex AI;
    # otherwise set it to "FALSE".
    os.environ["GOOGLE_GENAI_USE_VERTEXAI"] = "TRUE"
    
    print("✅ Setup and authentication complete.")
except Exception as e:
    print(
        f"🔑 Authentication Error: Please make sure you have added 'GOOGLE_API_KEY' to your Kaggle secrets. Details: {e}"
    )

✅ Setup and authentication complete.


## 1.3 Import ADK components

This section imports the components used from the Agent Development Kit (ADK).

Only the pieces needed for this notebook are included, focusing on:

- Defining LLM-based agents  
- Running agents with sessions (Runner + InMemorySessionService)  
- Wrapping Python functions as tools (FunctionTool)

In [3]:
import uuid
from typing import Dict, List, Any
import pandas as pd
from IPython.display import Markdown, display
from google.genai import types
from google.adk.agents import LlmAgent
from google.adk.models.google_llm import Gemini
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.adk.tools.function_tool import FunctionTool

MODEL_ID = "gemini-2.0-flash"

print("✅ ADK components imported successfully.")

✅ ADK components imported successfully.


# 2. Mini-standard knowledge base

## 2.1 Motivation

Instead of embedding full ISO or IEC standards, this notebook uses a lightweight **mini-standard** JSON structure that captures the spirit of key functional safety activities.

This keeps the project:

- **Legally safe** – only short, paraphrased summaries are stored, not the original standard text.  
- **Lightweight** – small enough to inspect, version, and extend inside a single notebook.  
- **Agent-friendly** – easy for tools and agents to filter by phase or topic.

## 2.2 Structure of the mini-standard

Each entry in the mini-standard has the following fields:

- `id`: internal identifier  
- `phase`: development phase  
  - e.g. `concept`, `design`, `implementation`, `verification`, `safety_case`  
- `topic`: short topic name  
  - e.g. `HARA`, `safety_goals`, `verification_plan`  
- `priority`: one of `low`, `medium`, or `high`  
- `summary`: a short, paraphrased description of the activity

The agent will later use tools to look up relevant entries for a given **standard** and **phase**.
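
The field list above amounts to a small schema, so entries can be checked before they are added to the knowledge base. The following is a minimal sketch, assuming the five phase names given as examples are the complete set (the notebook does not state this). `validate_entry` and the constants are hypothetical helpers, not part of the notebook.

```python
from typing import Any, Dict, List

REQUIRED_FIELDS = ("id", "phase", "topic", "priority", "summary")
# Assumption: the example phases listed above are the full set of allowed values.
ALLOWED_PHASES = {"concept", "design", "implementation", "verification", "safety_case"}
ALLOWED_PRIORITIES = {"low", "medium", "high"}


def validate_entry(entry: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the entry looks valid."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in entry]
    if "phase" in entry and entry["phase"] not in ALLOWED_PHASES:
        problems.append(f"unknown phase: {entry['phase']}")
    if "priority" in entry and entry["priority"] not in ALLOWED_PRIORITIES:
        problems.append(f"unknown priority: {entry['priority']}")
    return problems


example = {
    "id": "iso26262-concept-hara",
    "phase": "concept",
    "topic": "HARA",
    "priority": "high",
    "summary": "Perform a hazard analysis and risk assessment.",
}
print(validate_entry(example))  # prints []
print(validate_entry({"id": "x", "phase": "planning"}))
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a hand-edited entry at once.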

## 2.3 Define the mini-standard knowledge base

In [4]:
mini_standard: Dict[str, List[Dict[str, Any]]] = {
    "iso26262": [
        # Concept phase
        {
            "id": "iso26262-concept-hara",
            "phase": "concept",
            "topic": "HARA",
            "priority": "high",
            "summary": "Perform a hazard analysis and risk assessment for the system, identifying potential hazards and their risk levels."
        },
        {
            "id": "iso26262-concept-safety-goals",
            "phase": "concept",
            "topic": "safety_goals",
            "priority": "high",
            "summary": "Define safety goals that mitigate the most critical hazards and document their rationale."
        },
        # Design / development
        {
            "id": "iso26262-design-reqs",
            "phase": "design",
            "topic": "safety_requirements",
            "priority": "high",
            "summary": "Derive and allocate safety requirements to system, hardware, and software elements."
        },
        {
            "id": "iso26262-design-traceability",
            "phase": "design",
            "topic": "traceability",
            "priority": "medium",
            "summary": "Ensure traceability from safety goals to technical safety requirements and design elements."
        },
        # Implementation
        {
            "id": "iso26262-impl-sw-implementation",
            "phase": "implementation",
            "topic": "sw_implementation",
            "priority": "medium",
            "summary": "Implement software in accordance with coding guidelines, defensive programming, and safety mechanisms."
        },
        # Verification & validation
        {
            "id": "iso26262-verif-test-plan",
            "phase": "verification",
            "topic": "verification_plan",
            "priority": "high",
            "summary": "Plan verification and validation activities, including unit, integration, and system-level tests to show that safety requirements are met."
        },
        {
            "id": "iso26262-verif-independent-review",
            "phase": "verification",
            "topic": "independent_review",
            "priority": "medium",
            "summary": "Schedule independent reviews or assessments for key safety work products."
        },
        # Safety case & documentation
        {
            "id": "iso26262-safety-case",
            "phase": "safety_case",
            "topic": "safety_case",
            "priority": "high",
            "summary": "Plan and maintain a safety case that collects evidence and arguments for releasing the system."
        },
        {
            "id": "iso26262-safety-manual",
            "phase": "safety_case",
            "topic": "safety_manual",
            "priority": "medium",
            "summary": "Prepare safety manuals or usage guidelines for integrators and downstream users."
        },
    ],

    "generic_safety": [
        {
            "id": "generic-risk-assessment",
            "phase": "concept",
            "topic": "risk_assessment",
            "priority": "high",
            "summary": "Identify hazards, estimate risks, and decide which risks require mitigation."
        },
        {
            "id": "generic-controls",
            "phase": "design",
            "topic": "controls",
            "priority": "medium",
            "summary": "Design technical and organizational controls to reduce or monitor risks."
        },
        {
            "id": "generic-test",
            "phase": "verification",
            "topic": "testing",
            "priority": "medium",
            "summary": "Plan and execute tests or other verification activities to check that controls work as intended."
        },
        {
            "id": "generic-documentation",
            "phase": "safety_case",
            "topic": "documentation",
            "priority": "medium",
            "summary": "Document assumptions, limitations, and residual risks for stakeholders."
        },
    ]
}

# Quick sanity check: which standards are defined?
list(mini_standard.keys())

['iso26262', 'generic_safety']

## 2.4 Quick preview of the mini-standard

For example, the entries for the ISO 26262 **concept** phase are:

In [5]:
iso_concept = [item for item in mini_standard["iso26262"] if item["phase"] == "concept"]
pd.DataFrame(iso_concept)[["id", "phase", "topic", "priority", "summary"]]

| | id | phase | topic | priority | summary |
|---|---|---|---|---|---|
| 0 | iso26262-concept-hara | concept | HARA | high | Perform a hazard analysis and risk assessment ... |
| 1 | iso26262-concept-safety-goals | concept | safety_goals | high | Define safety goals that mitigate the most cri... |


# 3. Tools

The notebook defines three small helper functions that encapsulate domain logic:

- `standard_lookup_tool(standard, phase)`
- `risk_estimator_tool(system_description)`
- `checklist_formatter_tool(tasks)`

`standard_lookup_tool` and `checklist_formatter_tool` are exposed to the Planner agent as ADK tools, so that the agent can query the mini-standard and produce phase-based checklists.

`risk_estimator_tool` is used as a lightweight Python helper to pre-estimate a
risk level ("low", "medium", "high") from the system description. The resulting
`risk_level` is passed to both agents in the prompt as context.


## 3.1 Implement the core tools

In [6]:
def standard_lookup_tool(standard: str, phase: str) -> List[Dict[str, Any]]:
    """
    Lookup mini-standard entries for a given standard and phase.
    """
    items = mini_standard.get(standard, [])
    return [item for item in items if item["phase"] == phase]


def risk_estimator_tool(system_description: str) -> str:
    """
    Very simple, non-normative risk estimation based on keywords.
    Returns: 'low', 'medium', or 'high'.
    """
    text = system_description.lower()
    score = 0

    high_risk_keywords = [
        "brake", "braking",
        "steering",
        "train", "railway",
        "high-voltage", "high voltage",
        "battery", "bms",
        "powertrain",
        "airbag",
        "collision avoidance",
    ]
    medium_risk_keywords = [
        "monitor", "monitoring",
        "diagnostic", "diagnostics",
        "emergency",
        "fail-safe", "failsafe",
        "stability control",
        "safety monitoring",
    ]

    for kw in high_risk_keywords:
        if kw in text:
            score += 2
    for kw in medium_risk_keywords:
        if kw in text:
            score += 1

    if score >= 3:
        return "high"
    elif score >= 1:
        return "medium"
    else:
        return "low"


def checklist_formatter_tool(tasks: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """
    Group generated tasks by phase to create a checklist.
    Each task is expected to have 'phase' and 'description' fields.
    """
    grouped: Dict[str, List[str]] = {}
    for t in tasks:
        phase = t.get("phase", "general")
        grouped.setdefault(phase, []).append(f"- [ ] {t['description']}")
    return grouped


## 3.2 Quick sanity checks for the tools

### 3.2.1 Test the standard lookup tool

In [7]:
print("ISO 26262 – concept phase entries:")
standard_lookup_tool("iso26262", "concept")

ISO 26262 – concept phase entries:


[{'id': 'iso26262-concept-hara',
  'phase': 'concept',
  'topic': 'HARA',
  'priority': 'high',
  'summary': 'Perform a hazard analysis and risk assessment for the system, identifying potential hazards and their risk levels.'},
 {'id': 'iso26262-concept-safety-goals',
  'phase': 'concept',
  'topic': 'safety_goals',
  'priority': 'high',
  'summary': 'Define safety goals that mitigate the most critical hazards and document their rationale.'}]

### 3.2.2 Test the risk estimator tool

In [8]:
test_description = "An electronic control unit for braking and stability control in a car"
print("Risk level for test system:\n ", test_description)
print("=>", risk_estimator_tool(test_description))

Risk level for test system:
  An electronic control unit for braking and stability control in a car
=> high
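
Note that `risk_estimator_tool` matches keywords as plain substrings. As a result, `train` also matches inside the word `powertrain`, and a description containing `monitoring` scores once for `monitor` and once for `monitoring`. If that double counting is not wanted, one option is whole-word matching. The sketch below illustrates that alternative and is not the notebook's implementation; `count_whole_word_hits` is a hypothetical helper.

```python
import re
from typing import Iterable


def count_whole_word_hits(text: str, keywords: Iterable[str]) -> int:
    """Count how many keywords occur in the text as whole words or phrases."""
    lowered = text.lower()
    hits = 0
    for keyword in keywords:
        # \b anchors the match at word boundaries, so "train" does not
        # match inside "powertrain".
        pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
        if re.search(pattern, lowered):
            hits += 1
    return hits


print(count_whole_word_hits("Powertrain torque controller", ["train"]))  # prints 0
print(count_whole_word_hits("Door control for a regional train", ["train"]))  # prints 1
print(count_whole_word_hits("Monitoring of cell voltages", ["monitor", "monitoring"]))  # prints 1
```

The trade-off is that whole-word matching misses inflected forms such as `monitors` unless they are listed explicitly.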


### 3.2.3 Test the checklist formatter

In [9]:
# Test the checklist formatter
dummy_tasks = [
    {"phase": "concept", "description": "Perform hazard analysis and risk assessment (HARA)"},
    {"phase": "concept", "description": "Define safety goals for critical hazards"},
    {"phase": "design", "description": "Derive technical safety requirements"},
    {"phase": "verification", "description": "Create a verification / test plan"},
]

grouped = checklist_formatter_tool(dummy_tasks)

for phase, items in grouped.items():
    display(Markdown(f"### {phase.capitalize()} phase"))
    display(Markdown("\n".join(items)))

### Concept phase

- [ ] Perform hazard analysis and risk assessment (HARA)
- [ ] Define safety goals for critical hazards

### Design phase

- [ ] Derive technical safety requirements

### Verification phase

- [ ] Create a verification / test plan

# 4. Example systems for demo and evaluation

To make SafetyCopilot concrete, this notebook defines a small set of
**example systems** from automotive and railway domains.

These systems are reused in two places later in the notebook:

- A **single-system demo** (Section 6), which shows the full Planner + Checker
  pipeline on one system end to end.
- A simple **evaluation** (Section 7), which measures whether the final
  checklists mention a few must-have safety concepts.

For each system, the notebook defines:

- an `id` and a short human-readable `name`  
- a `domain` (e.g. automotive, railway)  
- a natural-language `description`  
- a list of `expected_must_have` concepts that the final checklist is
  expected to mention at least once

The `expected_must_have` field is later used for a lightweight coverage
metric in the evaluation section. Typical examples include:

- HARA / risk assessment  
- Safety goals  
- Safety requirements  
- Verification or test plan  
- Safety case or safety documentation

The next cell defines three representative systems:

1. An automotive brake control ECU  
2. A high-voltage battery management system (BMS)  
3. A railway signaling subsystem

In [10]:
# The example systems used for demo and evaluation

systems = [
    {
        "id": "brake_ecu",
        "name": "Automotive Brake Control ECU",
        "domain": "automotive",
        "description": (
            "An electronic control unit that monitors wheel speed sensors and controls "
            "hydraulic brake actuators to provide anti-lock braking and stability control."
        ),
        "expected_must_have": ["HARA", "safety_goals", "safety_requirements", "verification_plan", "safety_case"],
    },
    {
        "id": "bms",
        "name": "High-voltage Battery Management System",
        "domain": "automotive",
        "description": (
            "A battery management system for an electric vehicle that monitors cell voltages "
            "and temperatures, controls contactors, and communicates with the vehicle control unit."
        ),
        "expected_must_have": ["HARA", "safety_goals", "diagnostic", "verification_plan", "safety_case"],
    },
    {
        "id": "rail_signal",
        "name": "Railway Signaling Subsystem",
        "domain": "railway",
        "description": (
            "A signaling subsystem that controls track-side signals and interlocking logic "
            "to prevent train collisions and ensure safe routing."
        ),
        "expected_must_have": ["HARA", "safety_goals", "safety_requirements", "verification_plan", "safety_case"],
    },
]

# Compact overview of the defined systems
systems_df = pd.DataFrame([{"id": s["id"], "name": s["name"], "domain": s["domain"]} for s in systems])
systems_df

| | id | name | domain |
|---|---|---|---|
| 0 | brake_ecu | Automotive Brake Control ECU | automotive |
| 1 | bms | High-voltage Battery Management System | automotive |
| 2 | rail_signal | Railway Signaling Subsystem | railway |


In [11]:
systems_summary = pd.DataFrame(
    [
        {
            "id": s["id"],
            "name": s["name"],
            "domain": s["domain"],
            "num_expected_topics": len(s["expected_must_have"]),
        }
        for s in systems
    ]
)
systems_summary

| | id | name | domain | num_expected_topics |
|---|---|---|---|---|
| 0 | brake_ecu | Automotive Brake Control ECU | automotive | 5 |
| 1 | bms | High-voltage Battery Management System | automotive | 5 |
| 2 | rail_signal | Railway Signaling Subsystem | railway | 5 |
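
The `expected_must_have` lists can drive a simple coverage score: the fraction of expected topics that the final checklist text mentions at least once. The evaluation itself comes later in the notebook. The sketch below is only one plausible way to compute such a score, and both `TOPIC_ALIASES` and `coverage_score` are hypothetical names introduced here for illustration.

```python
from typing import Dict, List

# Hypothetical alias table: phrases that count as a mention of each topic.
TOPIC_ALIASES: Dict[str, List[str]] = {
    "HARA": ["hara", "hazard analysis", "risk assessment"],
    "safety_goals": ["safety goal"],
    "safety_requirements": ["safety requirement"],
    "verification_plan": ["verification plan", "test plan"],
    "safety_case": ["safety case"],
    "diagnostic": ["diagnostic"],
}


def coverage_score(checklist_text: str, expected_topics: List[str]) -> float:
    """Return the fraction of expected topics mentioned at least once in the text."""
    if not expected_topics:
        return 1.0
    lowered = checklist_text.lower()
    covered = 0
    for topic in expected_topics:
        # Fall back to the topic name itself when no aliases are defined.
        aliases = TOPIC_ALIASES.get(topic, [topic.replace("_", " ").lower()])
        if any(alias in lowered for alias in aliases):
            covered += 1
    return covered / len(expected_topics)


sample = """
- [ ] Perform hazard analysis and risk assessment (HARA)
- [ ] Define safety goals for braking hazards
- [ ] Plan and maintain a safety case
"""
expected = ["HARA", "safety_goals", "safety_requirements", "verification_plan", "safety_case"]
print(coverage_score(sample, expected))  # prints 0.6
```

A substring check like this rewards mentions, not quality: a checklist can score 1.0 and still be too vague to act on, so the number is best read as a smoke test.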


# 5. Agent architecture and implementation

## 5.1 Component architecture (SafetyCopilot core)
This section gives a high-level view of the main components of SafetyCopilot
and how they interact: the notebook UI, the SafetyCopilot core (agents,
tools, mini-standard and evaluation module), and the ADK / Gemini backend.






![architecture](https://raw.githubusercontent.com/haoj-de/kaggle-assets/main/component-architecture.png)

*Figure 1 - Component architecture of SafetyCopilot.*

## 5.2 Agent architecture: Planner and Checker

SafetyCopilot is implemented as a small two-agent system running on top of
Google's Agent Development Kit (ADK):

- **PlannerAgent**  
  A tool-augmented LLM agent that turns a short system description into a
  phase-based safety work plan. The notebook first computes a coarse-grained
  risk label (`low`, `medium`, `high`) in Python using the
  `risk_estimator_tool`. This `risk_level`, together with the system
  `description`, `domain`, and target standard (e.g. `iso26262`), is passed
  into the Planner as context.

  Inside the agent, two ADK tools are available:
  - `standard_lookup_tool(standard, phase)` queries the `mini_standard`
    knowledge base and returns phase-tagged guideline snippets for
    ISO 26262-inspired activities.
  - `checklist_formatter_tool(tasks)` groups raw task suggestions into a
    dictionary `phase -> ["- [ ] ..."]`, which is easy to render as a
    Markdown checklist.

  The Planner uses these tools to propose concrete, short, imperative tasks
  and to produce a phase-based checklist with headings such as
  `### Concept phase`, `### Design phase`, etc.

- **CheckerAgent**  
  A second LLM agent that reviews the Planner's checklist. It receives the
  system description, the pre-estimated risk level, and the grouped checklist
  in Markdown form. Guided by its instruction, it:
  - checks whether key safety concepts (HARA / risk assessment, safety goals,
    verification / test plan, safety case) are present for medium and high
    risk systems,
  - adds missing items where needed, and
  - returns an updated checklist plus short review notes.
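
The `phase -> ["- [ ] ..."]` dictionary returned by `checklist_formatter_tool` maps directly onto the heading layout described above. Here is a minimal sketch of that rendering step, assuming phase keys use snake_case as in the mini-standard. `render_checklist` is a hypothetical helper; the notebook itself renders each phase with `display(Markdown(...))`.

```python
from typing import Dict, List


def render_checklist(grouped: Dict[str, List[str]]) -> str:
    """Render a phase -> checkbox-lines mapping as Markdown with level-3 headings."""
    sections = []
    for phase, items in grouped.items():
        # "safety_case" becomes "Safety case"
        title = phase.replace("_", " ").capitalize()
        sections.append(f"### {title} phase\n\n" + "\n".join(items))
    return "\n\n".join(sections)


grouped = {
    "concept": ["- [ ] Perform hazard analysis and risk assessment (HARA)"],
    "safety_case": ["- [ ] Plan and maintain a safety case"],
}
print(render_checklist(grouped))
```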

The overall flow for a single system is therefore:

1. Pick a system entry from the `systems` list.
2. Compute `risk_level = risk_estimator_tool(system["description"])` in Python.
3. Send the description, domain, standard, and `risk_level` to the PlannerAgent.
4. Feed the Planner's checklist into the CheckerAgent together with the
   system description and `risk_level`.
5. Display the final checklist and review in the notebook as Markdown.
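
Step 5 only displays the review. The Checker is instructed to mark items reused from the Planner as `- [x]` and items it adds as `- [ ]`, so a later step could separate the two groups by parsing those markers. The following is a minimal sketch, assuming the model follows that convention exactly. `split_checklist_items` is a hypothetical helper, not part of the notebook.

```python
import re
from typing import Dict, List

# Matches Markdown checkbox lines such as "- [x] text" or "- [ ] text".
CHECKBOX_LINE = re.compile(r"^\s*- \[( |x|X)\] (.+)$")


def split_checklist_items(markdown: str) -> Dict[str, List[str]]:
    """Separate '- [x]' (reused) from '- [ ]' (new) items in a Markdown checklist."""
    result: Dict[str, List[str]] = {"reused": [], "new": []}
    for line in markdown.splitlines():
        match = CHECKBOX_LINE.match(line)
        if not match:
            continue  # headings, blank lines, and ordinary bullets are skipped
        marker, text = match.groups()
        key = "new" if marker == " " else "reused"
        result[key].append(text.strip())
    return result


review = """
### Concept phase

- [x] Perform hazard analysis and risk assessment (HARA)
- [x] Define safety goals for braking hazards
- [ ] Document assumptions made during HARA
"""
items = split_checklist_items(review)
print(len(items["reused"]), len(items["new"]))  # prints 2 1
```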

## 5.2 Build PlannerAgent and CheckerAgent

The cell below defines the two instruction prompts and then instantiates the
two LLM agents with their tools.

- The PlannerAgent uses the `PLANNER_INSTRUCTION` prompt and has access to:
  - `standard_lookup_tool`
  - `checklist_formatter_tool`

- The CheckerAgent uses the `CHECKER_INSTRUCTION` prompt and does not call
  any tools directly. It relies on the pre-estimated `risk_level` that is
  provided in the prompt and on the Planner's checklist.


In [12]:
if MODEL_ID is None:
    print("ADK MODEL_ID is not set. On Kaggle, please ensure MODEL_ID and imports are configured.")
else:
    print("Using model:", MODEL_ID)

PLANNER_INSTRUCTION = """
You are SafetyCopilot, a Functional Safety Planning Agent.

Given:
- A short system description
- A target standard (e.g., iso26262, generic_safety)
- A risk level (low, medium, high)

Your job:

1. For each development phase (concept, design, implementation, verification, safety_case),
   call the `standard_lookup_tool` to retrieve relevant guideline snippets
   from the mini-standard.

2. Based on the snippets and the risk level, propose concrete, actionable
   checklist items. Each item should have:
   - phase
   - topic
   - description (short, imperative, e.g. "Define safety goals for braking hazards")

3. Use `checklist_formatter_tool(tasks)` as your final tool call to group the
   items by phase into Markdown-style checklists with checkboxes
   (always use "- [ ] ..." for each task).

4. In your final response to the user, return a **readable Markdown document**:
   - Start with a short one-sentence summary of the safety work plan.
   - Then, for each phase that has tasks, add a level-3 heading, for example:
       "### Concept phase", "### Design phase", etc.
   - Under each heading, list the checklist items as Markdown checkboxes:
       "- [ ] Perform hazard analysis and risk assessment (HARA)"

Important constraints:
- Do NOT return JSON or Python objects in the final response.
- Do NOT expose internal tool calls.
- Be concise but clear, and avoid copying standard wording verbatim.
"""


CHECKER_INSTRUCTION = """
You are a Safety Checklist Reviewer Agent.

You receive:
- The system description
- The risk level
- A grouped checklist (Markdown-style lists grouped by phase) produced by another agent

Your job:

1. Check whether the checklist includes at least the following critical concepts
   for medium and high risk systems:
   - HARA or risk assessment
   - Safety goals
   - A verification or test plan
   - A safety case or similar documentation plan

2. If something is missing, add additional checklist items to fill the gaps.
   Use the same phase structure as the original checklist.

3. Improve the wording of vague items where needed, so that a functional safety
   engineer could implement them without confusion.

4. When producing the **updated checklist**, follow these rules:
   - Reuse all relevant checklist items from the original plan.
   - For items that come from the original plan, mark them as "- [x] ...".
   - For NEW items that you add during the review, mark them as "- [ ] ...".
   - Keep the phase-based structure ("### Concept phase", "### Design phase", etc.).

5. Return a **Markdown response** with the following structure:

   ## Executive summary
   - 2–3 short bullet points summarizing the overall quality of the checklist.

   ## Gaps & risks
   - Bullet points describing important gaps, weaknesses, or risks you observed.

   ## Updated checklist
   - Phase-based headings (e.g. "### Concept phase", "### Design phase").
   - Under each phase, checklist items with "- [x] ..." for reused items
     and "- [ ] ..." for new items you added.

   ## Suggested improvements
   - 2–4 bullet points describing what you added or changed.
   - Mention any remaining assumptions or limitations.

Important constraints:
- Do NOT return JSON or Python objects.
- Do NOT expose internal thought processes or tool calls.
- Keep explanations concrete and practical for a functional safety engineer.
"""

planner_agent = None
checker_agent = None

try:
    if MODEL_ID is not None:
        planner_agent = LlmAgent(
            model=MODEL_ID,
            name="planner_agent",
            instruction=PLANNER_INSTRUCTION,
            tools=[standard_lookup_tool, checklist_formatter_tool],
        )
        print("✅ planner_agent created.")

        checker_agent = LlmAgent(
            model=MODEL_ID,
            name="checker_agent",
            instruction=CHECKER_INSTRUCTION,
            tools=[],
        )
        print("✅ checker_agent created.")
except Exception as e:
    print("Failed to create agents.")
    print(e)

Using model: gemini-2.0-flash
✅ planner_agent created.
✅ checker_agent created.


## 5.3 Configure sessions and runners

To keep conversational state and reuse the same agents across multiple calls,
the notebook uses ADK's `InMemorySessionService` together with `Runner`
objects.

Two logical sessions are created:

- one for the PlannerAgent (`SESSION_ID_PLANNER`),
- one for the CheckerAgent (`SESSION_ID_CHECKER`).

Both sessions share the same `APP_NAME` and `USER_ID`, which can be used to
store project-level context over time. Each `Runner` wraps an agent plus the
session service and exposes an asynchronous `run_async(...)` API that streams
events until a final response is produced.
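
Consuming such an event stream follows a simple pattern: iterate asynchronously, ignore intermediate events, and keep the text of the final response. The sketch below imitates that pattern with stand-in classes so that it runs without ADK or an API key. `FakeEvent`, `FakeContent`, `FakePart`, and `fake_event_stream` are hypothetical and only mirror the attributes this notebook reads (`is_final_response()`, `content.parts`, `text`); they are not ADK classes.

```python
import asyncio
from dataclasses import dataclass, field
from typing import AsyncIterator, List, Optional


@dataclass
class FakePart:
    text: Optional[str] = None


@dataclass
class FakeContent:
    parts: List[FakePart] = field(default_factory=list)


@dataclass
class FakeEvent:
    """Stand-in for a streamed event; NOT the ADK class."""
    content: Optional[FakeContent] = None
    final: bool = False

    def is_final_response(self) -> bool:
        return self.final


async def fake_event_stream() -> AsyncIterator[FakeEvent]:
    """Yield one intermediate event, then a final response with two text parts."""
    yield FakeEvent(FakeContent([FakePart("calling a tool...")]))
    yield FakeEvent(
        FakeContent([FakePart("### Concept phase\n"), FakePart("- [ ] Perform HARA")]),
        final=True,
    )


async def collect_final_text(events: AsyncIterator[FakeEvent]) -> str:
    """Return the text of the final response, joining all of its text parts."""
    final_text = ""
    async for event in events:
        if event.is_final_response() and event.content and event.content.parts:
            # Parts without text are skipped instead of raising on None.
            final_text = "".join(part.text for part in event.content.parts if part.text)
    return final_text


# In a notebook cell, use `await collect_final_text(fake_event_stream())` instead,
# because an event loop is already running there.
print(asyncio.run(collect_final_text(fake_event_stream())))
```

Joining all text parts is a design choice of this sketch; reading only the first part is shorter but drops any text that arrives in later parts.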


In [13]:
APP_NAME = "safety_copilot_app"
USER_ID = "demo_user"

# create session ids for planner and checker
SESSION_ID_PLANNER = "planner_session"
SESSION_ID_CHECKER = "checker_session"

session_service = InMemorySessionService()

import asyncio

async def setup_sessions_and_runners():
    # create sessions
    await session_service.create_session(
        app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID_PLANNER
    )
    await session_service.create_session(
        app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID_CHECKER
    )

    global planner_runner, checker_runner
    planner_runner = Runner(
        agent=planner_agent,
        app_name=APP_NAME,
        session_service=session_service,
    )
    checker_runner = Runner(
        agent=checker_agent,
        app_name=APP_NAME,
        session_service=session_service,
    )

# Run the async setup
await setup_sessions_and_runners()
print("Runners ready ✅")

Runners ready ✅


# 6. Single-system demo
## 6.1 Demo: run SafetyCopilot on one example system

In this section, the full **Planner + Checker** pipeline will be run on the
first example system, the **Automotive Brake Control ECU**.

The demo illustrates:

- How the PlannerAgent turns a short system description into a structured safety plan  
- How the CheckerAgent reviews the plan, identifies gaps, and suggests improvements  
- What kind of output a safety engineer might see in practice

## 6.2 Pipeline implementation for a single system

The figure below shows the end-to-end pipeline for a single system, from a short description to a phase-based checklist and review.

![Pipeline overview](https://raw.githubusercontent.com/haoj-de/kaggle-assets/afc97cf1356521c3d974b96577514c8728dc97a4/pipeline-overview.png)

*Figure 2 - SafetyCopilot end-to-end pipeline*

In [14]:
import asyncio
from typing import Tuple

async def run_pipeline_once(
    system: Dict[str, Any],
    standard: str = "iso26262",
) -> Tuple[str, str]:
    """
    Run the full Planner + Checker pipeline for a single system and
    print the intermediate results.

    Args:
        system: One entry from the `systems` list.
        standard: Name of the standard profile, e.g. 'iso26262'.

    Returns:
        A tuple (planner_text, checker_text) with the raw LLM outputs.
    """
    description = system["description"]
    domain = system["domain"]
    risk_level = risk_estimator_tool(description)

    # ---- 1) Build the prompt for the PlannerAgent ----
    planner_prompt = f"""
You are SafetyCopilot, a functional safety planning assistant.

System description:
{description}

Domain: {domain}
Safety standards to consider: {standard}
Risk level (pre-estimated): {risk_level}

Please:
1. Propose a minimal but meaningful safety work plan for this system.
2. Use clear headings and bullet points.
3. Make explicit reference to typical phases and work products from {standard}
   where appropriate.
    """.strip()

    planner_content = types.Content(
        role="user",
        parts=[types.Part(text=planner_prompt)],
    )

    planner_text = ""
    async for event in planner_runner.run_async(
        user_id=USER_ID,
        session_id=SESSION_ID_PLANNER,
        new_message=planner_content,
    ):
        # Collect the final response content
        if event.is_final_response() and event.content and event.content.parts:
            planner_text = event.content.parts[0].text

    display(Markdown("## PlannerAgent proposal\n\n" + planner_text))

    # ---- 2) Build the prompt for the CheckerAgent ----
    checker_prompt = f"""
You are SafetyCopilot-Checker, a strict safety & compliance reviewer.

System description:
{description}

Domain: {domain}
Target standard: {standard}
Risk level: {risk_level}

Here is the proposed safety plan from another agent
(the checklist items use '- [ ] ...'):

\"\"\"{planner_text}\"\"\"

Tasks:
1. Identify any missing critical work products or phases.
2. Highlight vague items that would worry a functional safety manager.
3. Suggest 3–5 concrete improvements, in bullet points.
4. Output a short executive summary at the top.
5. Produce an UPDATED checklist where:
   - Items reused from the original plan are marked "- [x] ...".
   - NEW items that you add during the review are marked "- [ ] ...".
   - The checklist stays grouped by phase (Concept / Design / Implementation / Verification / Safety case).

Return your answer in Markdown, with sections:
- Executive summary
- Gaps & risks
- Updated checklist
- Suggested improvements
""".strip()

    checker_content = types.Content(
        role="user",
        parts=[types.Part(text=checker_prompt)],
    )

    checker_text = ""
    async for event in checker_runner.run_async(
        user_id=USER_ID,
        session_id=SESSION_ID_CHECKER,
        new_message=checker_content,
    ):
        if event.is_final_response() and event.content and event.content.parts:
            checker_text = event.content.parts[0].text

    display(Markdown("---\n\n## CheckerAgent review\n\n" + checker_text))

    return planner_text, checker_text


# Run the demo on the first example system
planner_text, checker_text = await run_pipeline_once(
    systems[0],
    standard="iso26262",
)



## PlannerAgent proposal

Here is a minimal safety work plan for the braking ECU, targeting ISO 26262 compliance.

### Concept phase

- [ ] Perform hazard analysis and risk assessment (HARA)
- [ ] Define safety goals for braking hazards

### Design phase

- [ ] Derive and allocate safety requirements
- [ ] Ensure traceability from safety goals to design elements

### Implementation phase

- [ ] Implement software with coding guidelines and safety mechanisms

### Verification phase

- [ ] Plan verification activities to show safety requirements are met
- [ ] Schedule independent reviews for key work products

### Safety case phase

- [ ] Plan and maintain a safety case
- [ ] Prepare safety manual and usage guidelines


---

## CheckerAgent review

## Executive summary
- The checklist covers the essential phases but lacks detail, especially for a high-risk system.
- There is no explicit safety validation phase.
- The checklist needs more specific guidance on verification and testing activities, and on safety documentation.

## Gaps & risks
- Missing: A dedicated validation phase to confirm that the implemented system meets the overall safety goals in realistic scenarios.
- Vague: "Define safety goals for braking hazards" needs to specify the format and content of the safety goals.
- Vague: "Plan verification activities to show safety requirements are met" requires details about verification methods and coverage criteria.
- Missing: Concrete acceptance criteria for verification and validation activities.
- Missing: A plan for configuration management and tool qualification.

## Updated checklist

### Concept phase

- [x] Perform hazard analysis and risk assessment (HARA)
- [x] Define safety goals for braking hazards
- [ ] Document assumptions made during HARA (e.g. environmental conditions, driver behavior)

### Design phase

- [x] Derive and allocate safety requirements
- [x] Ensure traceability from safety goals to design elements
- [ ] Define hardware and software safety requirements in a measurable and testable format
- [ ] Create a detailed safety architecture, including safety mechanisms and redundancy concepts

### Implementation phase

- [x] Implement software with coding guidelines and safety mechanisms
- [ ] Perform static code analysis to check for coding standard violations and potential vulnerabilities
- [ ] Conduct unit testing of software modules, ensuring sufficient code coverage

### Verification phase

- [x] Plan verification activities to show safety requirements are met
- [x] Schedule independent reviews for key work products
- [ ] Develop a test plan including unit, integration, and system testing to verify safety requirements
- [ ] Define test cases with clear pass/fail criteria based on safety requirements
- [ ] Track verification coverage to ensure all safety requirements are adequately verified

### Validation phase
- [ ] Define a validation plan to confirm that the integrated system meets the safety goals in the intended operational environment.
- [ ] Execute validation tests on a vehicle or a hardware-in-the-loop (HIL) test bench.
- [ ] Document the results of validation activities, including any deviations from expected behavior.

### Safety case phase

- [x] Plan and maintain a safety case
- [x] Prepare safety manual and usage guidelines
- [ ] Document the safety lifecycle, including activities, roles, and responsibilities
- [ ] Establish a configuration management plan to control changes to safety-related items
- [ ] Create a tool qualification plan (if necessary) to ensure the reliability of development tools

## Suggested improvements
- Added a "Validation phase" to explicitly address system-level validation.
- Specified that hardware and software safety requirements should be measurable and testable.
- Added concrete activities like static code analysis, unit testing, and code coverage analysis.
- Included the need for a configuration management plan and a tool qualification plan.
- This checklist assumes a standard V-model development process. If a different lifecycle model is used (e.g., agile), the checklist needs to be adapted accordingly.
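
The `- [x]` / `- [ ]` convention requested in the Checker prompt makes the review easy to post-process. The following is a minimal sketch, not part of the original pipeline, that groups the updated checklist by phase and separates reused items from newly added ones. The function name `parse_checklist` and the `"Unassigned"` fallback phase are hypothetical, and the sketch assumes that phases appear as `###` headings, as in the output above.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches "- [ ] text" and "- [x] text" checklist lines.
ITEM_RE = re.compile(r"^\s*-\s*\[( |x|X)\]\s+(.*\S)\s*$")
# Matches "### Concept phase" style headings (exactly three '#').
PHASE_RE = re.compile(r"^\s*###\s+(.*\S)\s*$")


def parse_checklist(markdown_text: str) -> Dict[str, Dict[str, List[str]]]:
    """Group checklist items by phase, split into reused ([x]) and new ([ ])."""
    phases: Dict[str, Dict[str, List[str]]] = defaultdict(
        lambda: {"reused": [], "new": []}
    )
    current_phase = "Unassigned"  # hypothetical bucket for items before any heading
    for line in markdown_text.splitlines():
        heading = PHASE_RE.match(line)
        if heading:
            current_phase = heading.group(1)
            continue
        item = ITEM_RE.match(line)
        if item:
            status = "new" if item.group(1) == " " else "reused"
            phases[current_phase][status].append(item.group(2))
    return dict(phases)


sample = """
## Updated checklist

### Concept phase

- [x] Perform hazard analysis and risk assessment (HARA)
- [x] Define safety goals for braking hazards
- [ ] Document assumptions made during HARA

### Validation phase
- [ ] Define a validation plan
"""

parsed = parse_checklist(sample)
for phase, items in parsed.items():
    print(f"{phase}: {len(items['reused'])} reused, {len(items['new'])} new")
# Concept phase: 2 reused, 1 new
# Validation phase: 0 reused, 1 new
```

In the notebook, the same function could be applied to `checker_text` to see at a glance how many items the CheckerAgent added per phase.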

# 7. Evaluation
## 7.1 Evaluation: coverage of key safety concepts

To obtain a simple quantitative view of SafetyCopilot's behavior, this project
evaluates the agent on the example systems and measures **concept coverage**.

For each system, the evaluation:

- defines a list of *must-have* key concepts, such as:
  - HARA / risk assessment
  - Safety goals
  - Safety requirements
  - Verification or test plan
  - Safety case or safety documentation
- runs the full **Planner + Checker pipeline** on the system description
  to generate a refined checklist
- counts how many of the expected concepts appear (as keywords) in the
  final text produced by the CheckerAgent

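This is **not** a formal safety assessment. Because matching is keyword-based
and the Checker prompt names these concepts explicitly, a high score shows only
that the combined Planner + Checker pipeline mentions the most important
high-level safety activities, not that they are adequately specified.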
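
As a worked illustration of the metric, the sketch below computes concept coverage for a short text using a small synonym table. It is a simplified stand-in for the evaluation code in this notebook, not a copy of it: the `SYNONYMS` table and the function name `concept_coverage` are hypothetical, and the concept keys other than `hara` and `verification_plan` are assumed for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical synonym table: a concept counts as covered if any phrase appears.
SYNONYMS: Dict[str, List[str]] = {
    "hara": ["hara", "hazard analysis"],
    "safety_goals": ["safety goal"],
    "safety_requirements": ["safety requirement"],
    "verification_plan": ["verification plan", "test plan"],
    "safety_case": ["safety case", "safety documentation"],
}


def concept_coverage(text: str, expected: List[str]) -> Tuple[float, List[str]]:
    """Return (coverage ratio, missing concepts) using substring matching."""
    text_lower = text.lower()
    missing: List[str] = []
    for concept in expected:
        # Fall back to the concept name itself if no synonyms are defined.
        phrases = SYNONYMS.get(concept, [concept.replace("_", " ")])
        if not any(phrase in text_lower for phrase in phrases):
            missing.append(concept)
    covered = len(expected) - len(missing)
    return covered / max(1, len(expected)), missing


EXPECTED = [
    "hara",
    "safety_goals",
    "safety_requirements",
    "verification_plan",
    "safety_case",
]
sample_text = "- [ ] Perform HARA\n- [ ] Define safety goals\n- [ ] Write a test plan"

ratio, missing = concept_coverage(sample_text, EXPECTED)
print(f"{ratio:.0%}", missing)
# 60% ['safety_requirements', 'safety_case']
```

Returning the missing concepts alongside the ratio makes a low score easier to act on. Note that plain substring matching can over-count: for example, `"hara"` also matches inside the word "character".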

## 7.2 Evaluation functions (Planner + Checker pipeline)

In [15]:
async def run_pipeline_get_text(
    system: Dict[str, Any],
    standard: str = "iso26262",
) -> str:
    """
    Run the full Planner + Checker pipeline for evaluation and return
    the final text produced by the CheckerAgent.

    This function:
    1. Calls the PlannerAgent via its Runner to generate an initial plan.
    2. Calls the CheckerAgent via its Runner to review and refine the plan.
    3. Returns the CheckerAgent's final text output.
    """
    if planner_runner is None or checker_runner is None:
        raise RuntimeError("Agents (Planner/Checker) are not initialized.")

    description = system["description"]
    domain = system["domain"]
    risk_level = risk_estimator_tool(description)

    # ---- 1) Call PlannerAgent to get an initial plan/checklist ----
    planner_input = (
        f"System description:\n{description}\n\n"
        f"Domain: {domain}\n"
        f"Target standard: {standard}\n"
        f"Risk level (pre-estimated): {risk_level}\n\n"
        "Your task:\n"
        "1. Propose a safety and compliance work plan for this system, grouped by phase.\n"
        "2. Mention key concepts such as HARA / risk assessment, safety goals,\n"
        "   safety requirements, verification / test plan, and safety case where appropriate.\n"
        "3. Keep the answer concise (ideally under 200 words).\n"
    )

    planner_content = types.Content(
        role="user",
        parts=[types.Part(text=planner_input)],
    )

    planner_text = ""
    async for event in planner_runner.run_async(
        user_id=USER_ID,
        session_id=SESSION_ID_PLANNER,
        new_message=planner_content,
    ):
        if event.is_final_response() and event.content and event.content.parts:
            # Part.text can be None for non-text parts; fall back to an empty string
            planner_text = event.content.parts[0].text or ""

    # ---- 2) Call CheckerAgent to refine / complete the checklist ----
    checker_input = (
        f"System description:\n{description}\n\n"
        f"Domain: {domain}\n"
        f"Target standard: {standard}\n"
        f"Risk level: {risk_level}\n\n"
        "Current checklist or plan from the PlannerAgent:\n"
        f"\"\"\"{planner_text}\"\"\"\n\n"
        "Your task:\n"
        "1. Check whether the checklist includes at least the following critical concepts\n"
        "   for medium and high risk systems: HARA / risk assessment, safety goals,\n"
        "   verification or test plan, and safety case or safety documentation.\n"
        "2. If something is missing, add additional checklist items to fill the gaps.\n"
        "3. Return an updated grouped checklist plus a short textual review (2–4 bullet points).\n"
        "4. Keep the whole answer concise (ideally under 200 words).\n"
    )

    checker_content = types.Content(
        role="user",
        parts=[types.Part(text=checker_input)],
    )

    checker_text = ""
    async for event in checker_runner.run_async(
        user_id=USER_ID,
        session_id=SESSION_ID_CHECKER,
        new_message=checker_content,
    ):
        if event.is_final_response() and event.content and event.content.parts:
            # Part.text can be None for non-text parts; fall back to an empty string
            checker_text = event.content.parts[0].text or ""

    return checker_text


async def evaluate_systems(
    systems: List[Dict[str, Any]],
    standard: str = "iso26262",
) -> pd.DataFrame:
    """
    Evaluate the combined Planner + Checker pipeline on a small set of
    example systems by checking how many expected key concepts appear
    in the final text.

    Returns:
        A pandas DataFrame with one row per system, including a coverage percentage.
    """
    rows: List[Dict[str, Any]] = []

    for s in systems:
        final_text = await run_pipeline_get_text(s, standard=standard)
        text_lower = final_text.lower()

        expected = s["expected_must_have"]
        covered = 0

        for kw in expected:
            # Allow some flexibility in matching; e.g., "HARA" or "hazard analysis"
            key = kw.lower()
            if key == "hara":
                match = ("hara" in text_lower) or ("hazard analysis" in text_lower)
            elif key == "verification_plan":
                match = (
                    ("verification" in text_lower and "plan" in text_lower)
                    or ("test plan" in text_lower)
                )
            else:
                match = key.replace("_", " ") in text_lower

            if match:
                covered += 1

        coverage_ratio = covered / max(1, len(expected))

        rows.append(
            {
                "id": s["id"],
                "name": s["name"],
                "domain": s["domain"],
                "risk": risk_estimator_tool(s["description"]),
                "expected_topics": len(expected),
                "covered_topics": covered,
                "coverage_ratio": coverage_ratio,
            }
        )

    df = pd.DataFrame(rows)

    # Add a coverage percentage column and reorder columns for readability
    df["coverage_%"] = (df["coverage_ratio"] * 100).round(1)
    df = df[
        [
            "id",
            "name",
            "domain",
            "risk",
            "expected_topics",
            "covered_topics",
            "coverage_%",
        ]
    ]

    return df


# Run the evaluation (may take some time depending on model and rate limits)
eval_df = await evaluate_systems(systems, standard="iso26262")

display(eval_df.style.format({"coverage_%": "{:.1f}%"}))
print(f"Average coverage: {eval_df['coverage_%'].mean():.1f} %")

| | id | name | domain | risk | expected_topics | covered_topics | coverage_% |
|---|---|---|---|---|---|---|---|
| 0 | brake_ecu | Automotive Brake Control ECU | automotive | high | 5 | 5 | 100.0% |
| 1 | bms | High-voltage Battery Management System | automotive | high | 5 | 5 | 100.0% |
| 2 | rail_signal | Railway Signaling Subsystem | railway | medium | 5 | 5 | 100.0% |


Average coverage: 100.0 %


# 8. Conclusion

This notebook implemented **SafetyCopilot**, a functional safety checklist agent built with Google's **Agent Development Kit (ADK)**.

The agent:

- Uses tools to look up a **mini-standard** knowledge base, estimate a simple risk level, and format tasks into phase-based checklists  
- Combines a **PlannerAgent** and a **CheckerAgent** to generate and refine functional safety work plans  
- Keeps basic project-level context in **sessions / memory**, identified by the `user_id` and `session_id` parameters  
- Provides a simple **evaluation** on example systems, measuring coverage of key safety concepts such as HARA, safety goals, verification plans, and safety cases

This project demonstrates how ADK can be used to prototype **enterprise agents** that support functional safety engineers in a structured way.  
With more time, SafetyCopilot could be extended by:

- Covering richer and more detailed safety standards and domain profiles  
- Using more nuanced risk modeling instead of a simple keyword heuristic  
- Allowing interactive editing of generated checklists in a human-in-the-loop workflow  
- Integrating with existing enterprise requirements / ALM tools to export tasks and traceability links
