chore: sync public mirror from internal by haasonsaas · Pull Request #417 · evalops/maestro

haasonsaas · 2026-05-16T04:14:38Z

Summary

sync the sanitized public tree from evalops/maestro-internal
keep evalops/maestro as a generated public mirror of the private source of truth
preserve public-owned CI and trusted-publishing workflows from the public checkout
internal source SHA: 103df07bcc7e84e2bc4e01428d1275a119e7dff8
last generated public sync base: 1e379abc58b01b695553bfbd0b39c99127af120d
previewed public-tree drift: 11 file(s) to copy/update and 0 stale file(s) to delete
public-only commits since last generated sync: 3

Source-of-truth status

Public Mirror Drift Audit

package: @evalops/maestro
private source: https://github.com/evalops/maestro-internal@main (103df07bcc7e)
public projection: https://github.com/evalops/maestro@main (09369c829785)
files to copy or update: 11
stale files to delete: 0
result: drift detected
invariant: public_projection_has_drift

Sample Changed Paths

copy/update .github/actionlint.yaml
copy/update MODULE.bazel
copy/update docs/development/bazel.md
copy/update docs/protocols/codex-operating-layer.json
copy/update docs/protocols/headless.md
copy/update docs/protocols/hosted-runner-contract.md
copy/update packages/tui-rs/src/hosted_runner.rs
copy/update packages/tui-rs/src/ui_state.rs
copy/update src/server/handlers/hosted-runner-drain.ts
copy/update test/server/hosted-runner-drain.test.ts
copy/update test/skill-package-format.test.ts

Guidance

Let internal main generate and merge the public sync PR before relying on public main.

Public-only commits since last generated sync

09369c8 fix: harden Bazel RBE follow-up (#418)
661ef73 Expose skill runtime activation manifests (#414)
185f1ea Add Bazel RBE lane (#416)

Validation

generated by the sync-public-release-mirror workflow in public-tree mode

Test Plan

generated by the sync-public-release-mirror workflow in public-tree mode
public-source-provenance require-internal-pr check confirms internal source PR lineage
CI, integration, rust-hosted-conformance, coverage, Socket, and Cursor checks must pass before merge

Staged Rollout

Staging is unnecessary for this generated mirror PR: it does not independently promote user-visible behavior. It mirrors already-reviewed internal source from evalops/maestro-internal@103df07bcc7e84e2bc4e01428d1275a119e7dff8, including existing hidden/evaluation surfaces, and keeps public package parity behind the established public-source-provenance gate.

Supersedes

updates existing generated public sync PR #417 (chore: sync public mirror from internal) to internal source 103df07bcc7e84e2bc4e01428d1275a119e7dff8

cursor · 2026-05-16T04:14:43Z

PR Summary

Medium Risk
Updates the hosted-runner drain/snapshot manifest shape and generation in both TypeScript and Rust, which affects Platform handoff/restore semantics and could break consumers expecting the previous manifest schema. Logic also changes interrupted-drain behavior to preserve partial runtime snapshot metadata, so correctness depends on new parsing/aggregation paths.

Overview
Hosted-runner drain now writes a new work_continuity block into the snapshot manifest (evalops.remote-runner.work-continuity.v1), summarizing active/tracked tools, pending-request counts, and Codex subagent tool-call/child-run/thread IDs without copying prompt payloads.
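As a rough illustration of the block described above, the sketch below types the `work_continuity` fields and fills in the three counts. The field names and version string come from the `docs/protocols/headless.md` change in this PR, and the pending count mirrors the five pending lists summed in the Rust drain path. The interface names, the simplified state shape, and `summarizeWorkContinuityCounts` are hypothetical and are not the handler's actual API.

```typescript
// Version string as introduced by this PR.
const WORK_CONTINUITY_VERSION = "evalops.remote-runner.work-continuity.v1";

// Field names follow the work_continuity example in docs/protocols/headless.md.
interface WorkContinuityManifest {
  protocol_version: string;
  active_tool_count: number;
  tracked_tool_count: number;
  pending_request_count: number;
  codex_subagent_tool_call_ids: string[];
  codex_subagent_child_run_ids: string[];
  codex_subagent_thread_ids: string[];
}

// Hypothetical, simplified view of runtime state: only the collections
// that feed the counts. The real snapshot carries more fields.
interface RuntimeStateCounts {
  activeTools: unknown[];
  trackedTools: unknown[];
  pendingApprovals: unknown[];
  pendingClientTools: unknown[];
  pendingMcpElicitations: unknown[];
  pendingUserInputs: unknown[];
  pendingToolRetries: unknown[];
}

// Builds the count portion of the block; ID lists are left empty here.
function summarizeWorkContinuityCounts(
  state: RuntimeStateCounts,
): WorkContinuityManifest {
  const pendingRequestCount =
    state.pendingApprovals.length +
    state.pendingClientTools.length +
    state.pendingMcpElicitations.length +
    state.pendingUserInputs.length +
    state.pendingToolRetries.length;
  return {
    protocol_version: WORK_CONTINUITY_VERSION,
    active_tool_count: state.activeTools.length,
    tracked_tool_count: state.trackedTools.length,
    pending_request_count: pendingRequestCount,
    codex_subagent_tool_call_ids: [],
    codex_subagent_child_run_ids: [],
    codex_subagent_thread_ids: [],
  };
}
```

Only counts and IDs are summarized, so tool arguments such as prompts never need to be read to produce this part of the manifest.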

Both the TypeScript drain handler and the Rust hosted runner implement continuity extraction from runtime snapshot state (including mixed camelCase/snake_case Codex work-graph fields) and add tests/docs/evidence-map updates to lock the contract. Interrupted drain handling is also hardened to carry through the last known runtime session_id/cursor/snapshot details via a typed error wrapper.
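The mixed-case handling can be sketched as follows. This is a simplified illustration modelled on the Rust helper shown in the diff, not the TypeScript handler's actual code: it only accepts a work graph whose schema version matches, and it omits the looser path that applies to `codex.subagent.*` tools. The function names `firstString` and `collectCodexChildRuns` are hypothetical.

```typescript
const CODEX_SUBAGENT_WORK_GRAPH_SCHEMA =
  "evalops.maestro.codex.subagent-workgraph.v1";

type JsonObject = Record<string, unknown>;

function isObject(value: unknown): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns the first non-empty, trimmed string found under any of the keys.
function firstString(object: JsonObject, keys: string[]): string | undefined {
  for (const key of keys) {
    const value = object[key];
    if (typeof value === "string" && value.trim() !== "") {
      return value.trim();
    }
  }
  return undefined;
}

// Hypothetical helper: pulls child-run and thread IDs out of tool args,
// accepting either camelCase or snake_case field names. Only IDs are
// returned, so prompt text in the args is never copied.
function collectCodexChildRuns(args: unknown): {
  childRunIds: string[];
  threadIds: string[];
} {
  const childRunIds = new Set<string>();
  const threadIds = new Set<string>();
  if (isObject(args)) {
    const graph = args["codexWorkGraph"] ?? args["codex_work_graph"];
    const schemaMatches =
      isObject(graph) &&
      firstString(graph, ["schemaVersion", "schema_version"]) ===
        CODEX_SUBAGENT_WORK_GRAPH_SCHEMA;
    if (isObject(graph) && schemaMatches) {
      const childRuns = graph["childRuns"] ?? graph["child_runs"];
      if (Array.isArray(childRuns)) {
        for (const run of childRuns) {
          if (!isObject(run)) continue;
          const childRunId = firstString(run, ["childRunId", "child_run_id"]);
          if (childRunId !== undefined) childRunIds.add(childRunId);
          const threadId = firstString(run, ["threadId", "thread_id"]);
          if (threadId !== undefined) threadIds.add(threadId);
        }
      }
    }
  }
  // Sorted output keeps the manifest stable across runs.
  return {
    childRunIds: Array.from(childRunIds).sort(),
    threadIds: Array.from(threadIds).sort(),
  };
}
```

Collecting into sets and sorting on the way out gives deduplicated, deterministic ID lists, which is the same effect the Rust path gets from `BTreeSet`.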

Separately, Bazel/CI naming is updated to the internal mirror (evalops_maestro_internal / evalops-maestro-internal-rbe), and Rust UI-state path tests are made deterministic by refactoring env-var handling.

Reviewed by Cursor Bugbot for commit 12934be.

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for both issues found in the latest run.

✅ Fixed: Bazelignore references internal repo symlink name
- Updated .bazelignore to ignore the public Bazel convenience symlink name bazel-maestro instead of the internal mirror name.
✅ Fixed: MODULE.bazel leaks internal module name to public
- Renamed the Bazel module from evalops_maestro_internal to the public-facing evalops_maestro.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 483d5cdbd6

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix prepared a fix for the issue found in the latest run.

✅ Fixed: Rust/TypeScript key-fallback behavior diverges on malformed args
- TypeScript now falls back to the snake_case key only when the camelCase value is not an array, matching the Rust drain behavior and covered by a regression test.
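
The fallback rule in this fix can be sketched as below, mirroring the Rust `collect_json_string_array_from_object` helper from the diff. The function name `collectStringArray` and its signature are hypothetical; the actual TypeScript handler may be structured differently.

```typescript
// Collects trimmed, non-empty strings from the first key whose value is an
// array. A key whose value is missing or not an array is skipped, so the
// snake_case key is consulted only when the camelCase value is malformed.
function collectStringArray(
  object: Record<string, unknown>,
  keys: string[],
  into: Set<string>,
): void {
  for (const key of keys) {
    const items = object[key];
    if (!Array.isArray(items)) {
      continue; // missing or malformed: try the next key
    }
    for (const item of items) {
      if (typeof item === "string" && item.trim() !== "") {
        into.add(item.trim());
      }
    }
    return; // the first array wins, even when it is empty
  }
}
```

Under this rule an empty camelCase array still takes precedence over a populated snake_case one; only a non-array value triggers the fallback.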

Reviewed by Cursor Bugbot for commit 4d0f4db.

cursor · 2026-05-16T04:55:56Z

Bugbot Autofix prepared a fix for the issue found in the latest run.

✅ Fixed: Actionlint config breaks public workflow runner label validation
- Added the public evalops-maestro-rbe runner label back to .github/actionlint.yaml so actionlint accepts both public and internal Bazel RBE workflows.

Preview (6202899018)

diff --git a/.github/actionlint.yaml b/.github/actionlint.yaml
--- a/.github/actionlint.yaml
+++ b/.github/actionlint.yaml
@@ -2,3 +2,4 @@
   labels:
     - blacksmith-*
     - evalops-maestro-rbe
+    - evalops-maestro-internal-rbe

diff --git a/MODULE.bazel b/MODULE.bazel
--- a/MODULE.bazel
+++ b/MODULE.bazel
@@ -1,5 +1,5 @@
 module(
-    name = "evalops_maestro",
+    name = "evalops_maestro_internal",
     version = "0.0.0",
 )
 

diff --git a/docs/cookbook/skills/README.md b/docs/cookbook/skills/README.md
--- a/docs/cookbook/skills/README.md
+++ b/docs/cookbook/skills/README.md
@@ -55,6 +55,21 @@
 maestro skill lint .maestro/skills/reviewing-prs --describe-toolbox

+## Runtime Activation
+
+Loading a skill through the Skill tool returns a skillRuntimeActivation
+object in the tool result details. The same shape is available locally:
+
+```bash
+maestro skill inspect reviewing-prs --json
+```
+
+Use this manifest in harnesses and adapters when you need to see which
+references, toolbox executables, MCP servers, and tool bounds become active. MCP
+environment values remain in mcp.json; they are not copied into the activation
+manifest. Servers with missing or malformed includeTools are reported as
+warnings and omitted from the activatable server list.
+

Eval Harness

Use maestro skill eval when a package needs pass/fail evidence rather than

diff --git a/docs/design/EVALOPS_AGENT_CORE_PARITY.md b/docs/design/EVALOPS_AGENT_CORE_PARITY.md
--- a/docs/design/EVALOPS_AGENT_CORE_PARITY.md
+++ b/docs/design/EVALOPS_AGENT_CORE_PARITY.md
@@ -88,6 +88,52 @@

toolbox/ is optional. Executables in that directory are expected to support MAESTRO_TOOLBOX_ACTION=describe so Maestro can register them as typed tools when the skill is active.

+When the Skill tool loads a package, it returns a skillRuntimeActivation
+manifest alongside artifact metadata. maestro skill inspect <name> --json
+emits the same contract for local harnesses:
+
+```json
+{
+  "runtimeActivation": {
+    "name": "reviewing-prs",
+    "source": "project",
+    "profile": {
+      "model": "gpt-5.5",
+      "mode": "review",
+      "isolatedContext": true
+    },
+    "tools": {
+      "allowed": ["github.get_pull_request"],
+      "builtin": ["read", "search"]
+    },
+    "resources": {
+      "directories": {
+        "reference": ".maestro/skills/reviewing-prs/reference",
+        "toolbox": ".maestro/skills/reviewing-prs/toolbox"
+      }
+    },
+    "toolPackage": {
+      "mcp": {
+        "configPath": ".maestro/skills/reviewing-prs/mcp.json",
+        "servers": [
+          {
+            "name": "github",
+            "command": "npx",
+            "includeTools": ["get_pull_request", "list_pull_request_files"]
+          }
+        ]
+      }
+    }
+  }
+}
+```

+The activation manifest exposes scoped paths, toolbox entries, MCP server names,
+and includeTools bounds. It does not copy MCP environment values into
+agent-visible details or telemetry. MCP servers with missing or malformed
+includeTools are omitted from the activatable server list and reported through
+manifest warnings.
+

CLI Contract

maestro skill is the public authoring surface:
@@ -117,6 +163,6 @@

 1. Local AgentRuntime ledger: one SQLite store for runs, tool calls, waits, summaries, checkpoints, and session search.
 2. maestro goal: persistent objective loop backed by the local ledger, promotable to Platform Objectives.
 3. maestro workboard: local multi-agent board mapped to Platform AgentRuns when attached.
-4. Skill-bundled MCP lifecycle: start servers only when the skill triggers, stop them on cooldown/session end.
-5. Skill-bundled toolbox registration: expose described toolbox commands as governed tools while the skill is active.
+4. Skill-bundled MCP lifecycle: activate servers from skillRuntimeActivation only when the skill triggers, stop them on cooldown/session end.
+5. Skill-bundled toolbox registration: expose described toolbox commands from skillRuntimeActivation as governed tools while the skill is active.
 6. Public cookbook and conformance fixtures for third-party skill authors.

diff --git a/docs/development/bazel.md b/docs/development/bazel.md
--- a/docs/development/bazel.md
+++ b/docs/development/bazel.md
@@ -29,5 +29,5 @@


Trusted CI should run on the repo-scoped Buildfarm runner label
-`evalops-maestro-rbe` after Deploy has registered that runner through
+`evalops-maestro-internal-rbe` after Deploy has registered that runner through
`additional_bazel_buildfarm_runners`.

diff --git a/docs/protocols/codex-operating-layer.json b/docs/protocols/codex-operating-layer.json
--- a/docs/protocols/codex-operating-layer.json
+++ b/docs/protocols/codex-operating-layer.json
@@ -276,6 +276,33 @@
			]
		},
		{
+			"area": "remote-runner-continuity",
+			"evidenceType": "source",
+			"path": "src/server/handlers/hosted-runner-drain.ts",
+			"anchors": [
+				"evalops.remote-runner.work-continuity.v1",
+				"collectHostedRunnerWorkContinuity",
+				"codex_subagent_child_run_ids",
+				"codex_subagent_thread_ids"
+			]
+		},
+		{
+			"area": "remote-runner-continuity",
+			"evidenceType": "test",
+			"path": "test/server/hosted-runner-drain.test.ts",
+			"anchors": [
+				"records Codex subagent continuity without copying prompt payloads into manifest metadata",
+				"HOSTED_RUNNER_WORK_CONTINUITY_VERSION",
+				"codex_subagent_child_run_ids"
+			]
+		},
+		{
+			"area": "remote-runner-continuity",
+			"evidenceType": "doc",
+			"path": "docs/protocols/hosted-runner-contract.md",
+			"anchors": ["work_continuity", "Codex subagent child runs"]
+		},
+		{
			"area": "realtime-streaming",
			"evidenceType": "source",
			"path": "src/server/handlers/runtime-app-server-ws.ts",

diff --git a/docs/protocols/headless.md b/docs/protocols/headless.md
--- a/docs/protocols/headless.md
+++ b/docs/protocols/headless.md
@@ -124,6 +124,15 @@
        }
      ]
    },
+    "work_continuity": {
+      "protocol_version": "evalops.remote-runner.work-continuity.v1",
+      "active_tool_count": 0,
+      "tracked_tool_count": 0,
+      "pending_request_count": 0,
+      "codex_subagent_tool_call_ids": [],
+      "codex_subagent_child_run_ids": [],
+      "codex_subagent_thread_ids": []
+    },
    "retention_policy": {
      "policy_version": "evalops.remote-runner.retention.v1",
      "managed_by": "platform",

diff --git a/docs/protocols/hosted-runner-contract.md b/docs/protocols/hosted-runner-contract.md
--- a/docs/protocols/hosted-runner-contract.md
+++ b/docs/protocols/hosted-runner-contract.md
@@ -187,7 +187,8 @@
`evalops.remote-runner.snapshot-manifest.v1`. Both Rust-hosted and
TypeScript-hosted drain paths write this same local manifest envelope, including
the runtime flush status, workspace export contract, headless runtime snapshot,
-and `retention_policy` metadata describing visibility and redaction classes.
+`work_continuity` metadata for active/pending Codex subagent child runs, and
+`retention_policy` metadata describing visibility and redaction classes.
Maestro does not upload to GCS, S3, Modal storage, Daytona storage, or any
other provider store. Upload, retention, workspace artifact hydration, and
choosing which manifest should be restored are Platform responsibilities. See

diff --git a/packages/tui-rs/src/hosted_runner.rs b/packages/tui-rs/src/hosted_runner.rs
--- a/packages/tui-rs/src/hosted_runner.rs
+++ b/packages/tui-rs/src/hosted_runner.rs
@@ -5,7 +5,7 @@
//! contract so Platform and conformance tests can target a Rust runtime without
//! routing through the Node web server.

-use std::collections::{HashMap, HashSet, VecDeque};
+use std::collections::{BTreeSet, HashMap, HashSet, VecDeque};
use std::ffi::OsString;
use std::fs;
use std::io;
@@ -40,12 +40,15 @@
pub const HOSTED_RUNNER_SNAPSHOT_MANIFEST_VERSION: &str =
    "evalops.remote-runner.snapshot-manifest.v1";
pub const HOSTED_RUNNER_RETENTION_POLICY_VERSION: &str = "evalops.remote-runner.retention.v1";
+pub const HOSTED_RUNNER_WORK_CONTINUITY_VERSION: &str = "evalops.remote-runner.work-continuity.v1";

const DEFAULT_LISTEN_HOST: &str = "0.0.0.0";
const DEFAULT_LISTEN_PORT: u16 = 8080;
const DEFAULT_HEARTBEAT_INTERVAL_MS: u64 = 15_000;
const CONNECTION_IDLE_MS: i64 = (DEFAULT_HEARTBEAT_INTERVAL_MS as i64) * 3;
const MAX_EVENTS: usize = 1024;
+const CODEX_SUBAGENT_TOOL_PREFIX: &str = "codex.subagent.";
+const CODEX_SUBAGENT_WORK_GRAPH_SCHEMA: &str = "evalops.maestro.codex.subagent-workgraph.v1";

#[derive(Debug, Clone)]
pub struct HostedRunnerConfig {
@@ -575,6 +578,8 @@
    workspace_root: PathBuf,
    runtime: RuntimeFlushManifest,
    workspace_export: WorkspaceExportManifest,
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    work_continuity: Option<WorkContinuityManifest>,
    snapshot: RuntimeSnapshot,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    retention_policy: Option<RetentionPolicyManifest>,
@@ -609,6 +614,17 @@
}

#[derive(Debug, Clone, Serialize, Deserialize)]
+struct WorkContinuityManifest {
+    protocol_version: String,
+    active_tool_count: usize,
+    tracked_tool_count: usize,
+    pending_request_count: usize,
+    codex_subagent_tool_call_ids: Vec<String>,
+    codex_subagent_child_run_ids: Vec<String>,
+    codex_subagent_thread_ids: Vec<String>,
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize)]
struct RetentionPolicyManifest {
    policy_version: String,
    managed_by: String,
@@ -656,6 +672,153 @@
    }
}

+fn default_work_continuity_manifest(snapshot: &RuntimeSnapshot) -> WorkContinuityManifest {
+    let state = &snapshot.state;
+    let mut tool_call_ids = BTreeSet::new();
+    let mut child_run_ids = BTreeSet::new();
+    let mut thread_ids = BTreeSet::new();
+    let pending_request_count = state.pending_approvals.len()
+        + state.pending_client_tools.len()
+        + state.pending_mcp_elicitations.len()
+        + state.pending_user_inputs.len()
+        + state.pending_tool_retries.len();
+    for source in state
+        .tracked_tools
+        .iter()
+        .chain(state.pending_approvals.iter())
+        .chain(state.pending_client_tools.iter())
+        .chain(state.pending_mcp_elicitations.iter())
+        .chain(state.pending_user_inputs.iter())
+        .chain(state.pending_tool_retries.iter())
+    {
+        let tool = json_string_field(source, &["tool"]).unwrap_or_default();
+        let is_codex_subagent_tool = tool.starts_with(CODEX_SUBAGENT_TOOL_PREFIX);
+        let has_codex_work_args = collect_codex_work_args(
+            source.get("args"),
+            &mut child_run_ids,
+            &mut thread_ids,
+            is_codex_subagent_tool,
+        );
+        if is_codex_subagent_tool || has_codex_work_args {
+            if let Some(call_id) =
+                json_string_field(source, &["call_id", "callId", "tool_call_id", "toolCallId"])
+            {
+                tool_call_ids.insert(call_id);
+            }
+        }
+    }
+    for active_tool in &state.active_tools {
+        let tool = json_string_field(active_tool, &["tool"]).unwrap_or_default();
+        if tool.starts_with(CODEX_SUBAGENT_TOOL_PREFIX) {
+            if let Some(call_id) = json_string_field(
+                active_tool,
+                &["call_id", "callId", "tool_call_id", "toolCallId"],
+            ) {
+                tool_call_ids.insert(call_id);
+            }
+        }
+    }
+    WorkContinuityManifest {
+        protocol_version: HOSTED_RUNNER_WORK_CONTINUITY_VERSION.to_string(),
+        active_tool_count: state.active_tools.len(),
+        tracked_tool_count: state.tracked_tools.len(),
+        pending_request_count,
+        codex_subagent_tool_call_ids: tool_call_ids.into_iter().collect(),
+        codex_subagent_child_run_ids: child_run_ids.into_iter().collect(),
+        codex_subagent_thread_ids: thread_ids.into_iter().collect(),
+    }
+}
+
+fn collect_codex_work_args(
+    args: Option<&serde_json::Value>,
+    child_run_ids: &mut BTreeSet<String>,
+    thread_ids: &mut BTreeSet<String>,
+    include_loose_args: bool,
+) -> bool {
+    let Some(args) = args.and_then(serde_json::Value::as_object) else {
+        return false;
+    };
+    let graph = args
+        .get("codexWorkGraph")
+        .or_else(|| args.get("codex_work_graph"));
+    let has_codex_graph = graph
+        .and_then(serde_json::Value::as_object)
+        .is_some_and(|graph| {
+            json_string_field_from_object(graph, &["schemaVersion", "schema_version"]).as_deref()
+                == Some(CODEX_SUBAGENT_WORK_GRAPH_SCHEMA)
+        });
+    if !include_loose_args && !has_codex_graph {
+        return false;
+    }
+    collect_json_string_array_from_object(args, &["childRunIds", "child_run_ids"], child_run_ids);
+    collect_json_string_array_from_object(
+        args,
+        &["receiverThreadIds", "receiver_thread_ids"],
+        thread_ids,
+    );
+    if let Some(graph) = graph.and_then(serde_json::Value::as_object) {
+        let child_runs = graph
+            .get("childRuns")
+            .or_else(|| graph.get("child_runs"))
+            .and_then(serde_json::Value::as_array);
+        if let Some(child_runs) = child_runs {
+            for child_run in child_runs {
+                if let Some(child_run) = child_run.as_object() {
+                    if let Some(child_run_id) =
+                        json_string_field_from_object(child_run, &["childRunId", "child_run_id"])
+                    {
+                        child_run_ids.insert(child_run_id);
+                    }
+                    if let Some(thread_id) =
+                        json_string_field_from_object(child_run, &["threadId", "thread_id"])
+                    {
+                        thread_ids.insert(thread_id);
+                    }
+                }
+            }
+        }
+    }
+    include_loose_args || has_codex_graph
+}
+
+fn collect_json_string_array_from_object(
+    object: &serde_json::Map<String, serde_json::Value>,
+    keys: &[&str],
+    values: &mut BTreeSet<String>,
+) {
+    for key in keys {
+        let Some(items) = object.get(*key).and_then(serde_json::Value::as_array) else {
+            continue;
+        };
+        for item in items {
+            if let Some(item) = item.as_str().map(str::trim).filter(|item| !item.is_empty()) {
+                values.insert(item.to_string());
+            }
+        }
+        return;
+    }
+}
+
+fn json_string_field(value: &serde_json::Value, keys: &[&str]) -> Option<String> {
+    value
+        .as_object()
+        .and_then(|object| json_string_field_from_object(object, keys))
+}
+
+fn json_string_field_from_object(
+    object: &serde_json::Map<String, serde_json::Value>,
+    keys: &[&str],
+) -> Option<String> {
+    keys.iter().find_map(|key| {
+        object
+            .get(*key)
+            .and_then(serde_json::Value::as_str)
+            .map(str::trim)
+            .filter(|value| !value.is_empty())
+            .map(ToOwned::to_owned)
+    })
+}
+
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
enum RuntimeFlushStatus {
@@ -2741,6 +2904,7 @@
            mode: "local_path_contract".to_string(),
            paths: workspace_export_paths,
        },
+        work_continuity: Some(default_work_continuity_manifest(&snapshot)),
        snapshot,
        retention_policy: Some(default_retention_policy_manifest()),
    };
@@ -3632,6 +3796,16 @@
            "directory"
        );
        assert_eq!(
+            manifest["work_continuity"]["protocol_version"],
+            HOSTED_RUNNER_WORK_CONTINUITY_VERSION
+        );
+        assert_eq!(manifest["work_continuity"]["active_tool_count"], 0);
+        assert_eq!(manifest["work_continuity"]["tracked_tool_count"], 0);
+        assert_eq!(
+            manifest["work_continuity"]["codex_subagent_tool_call_ids"],
+            json!([])
+        );
+        assert_eq!(
            manifest["retention_policy"]["policy_version"],
            HOSTED_RUNNER_RETENTION_POLICY_VERSION
        );
@@ -3940,6 +4114,15 @@
                    "type": "file"
                }]
            },
+            "work_continuity": {
+                "protocol_version": HOSTED_RUNNER_WORK_CONTINUITY_VERSION,
+                "active_tool_count": 1,
+                "tracked_tool_count": 1,
+                "pending_request_count": 0,
+                "codex_subagent_tool_call_ids": ["collab-spawn-ts"],
+                "codex_subagent_child_run_ids": ["agent-run-child-ts"],
+                "codex_subagent_thread_ids": ["child-thread-ts"]
+            },
            "retention_policy": {
                "policy_version": HOSTED_RUNNER_RETENTION_POLICY_VERSION,
                "managed_by": "platform",
@@ -4009,11 +4192,99 @@
                .policy_version,
            HOSTED_RUNNER_RETENTION_POLICY_VERSION
        );
+        let work_continuity = parsed.work_continuity.as_ref().expect("work continuity");
+        assert_eq!(
+            work_continuity.protocol_version,
+            HOSTED_RUNNER_WORK_CONTINUITY_VERSION
+        );
+        assert_eq!(
+            work_continuity.codex_subagent_child_run_ids,
+            vec!["agent-run-child-ts".to_string()]
+        );
        assert_eq!(parsed.snapshot.session_id, "session_ts");
        assert_eq!(parsed.snapshot.cursor, 7);
        assert_eq!(parsed.workspace_export.paths[0].relative_path, "README.md");
    }

+    #[test]
+    fn work_continuity_manifest_extracts_codex_subagent_ids() {
+        let snapshot: RuntimeSnapshot = serde_json::from_value(json!({
+            "protocolVersion": HEADLESS_PROTOCOL_VERSION,
+            "session_id": "session_rust",
+            "cursor": 9,
+            "last_init": null,
+            "state": {
+                "protocol_version": HEADLESS_PROTOCOL_VERSION,
+                "connection_count": 0,
+                "subscriber_count": 0,
+                "connections": [],
+                "model": "gpt-5.4",
+                "provider": "rust",
+                "session_id": "session_rust",
+                "pending_approvals": [{
+                    "id": "approval-rust",
+                    "call_id": "approval-call-rust",
+                    "tool": "shell"
+                }],
+                "pending_client_tools": [],
+                "pending_mcp_elicitations": [],
+                "pending_user_inputs": [{
+                    "id": "input-rust",
+                    "prompt": "continue?"
+                }],
+                "pending_tool_retries": [],
+                "tracked_tools": [{
+                    "call_id": "collab-spawn-rust",
+                    "tool": "codex.subagent.spawnAgent",
+                    "args": {
+                        "prompt": "Sensitive Rust subagent prompt",
+                        "codex_work_graph": {
+                            "schema_version": "evalops.maestro.codex.subagent-workgraph.v1",
+                            "child_runs": [{
+                                "thread_id": "child-thread-rust",
+                                "child_run_id": "agent-run-child-rust"
+                            }]
+                        }
+                    }
+                }],
+                "active_tools": [{
+                    "call_id": "collab-spawn-rust",
+                    "tool": "codex.subagent.spawnAgent",
+                    "output": "starting child"
+                }],
+                "active_utility_commands": [],
+                "active_file_watches": [],
+                "is_ready": true,
+                "is_responding": false
+            }
+        }))
+        .expect("runtime snapshot");
+
+        let continuity = default_work_continuity_manifest(&snapshot);
+
+        assert_eq!(
+            continuity.protocol_version,
+            HOSTED_RUNNER_WORK_CONTINUITY_VERSION
+        );
+        assert_eq!(continuity.active_tool_count, 1);
+        assert_eq!(continuity.tracked_tool_count, 1);
+        assert_eq!(continuity.pending_request_count, 2);
+        assert_eq!(
+            continuity.codex_subagent_tool_call_ids,
+            vec!["collab-spawn-rust".to_string()]
+        );
+        assert_eq!(
+            continuity.codex_subagent_child_run_ids,
+            vec!["agent-run-child-rust".to_string()]
+        );
+        assert_eq!(
+            continuity.codex_subagent_thread_ids,
+            vec!["child-thread-rust".to_string()]
+        );
+        let continuity_json = serde_json::to_string(&continuity).expect("continuity json");
+        assert!(!continuity_json.contains("Sensitive Rust subagent prompt"));
+    }
+
    #[tokio::test]
    async fn failed_restore_manifest_stays_not_ready_and_rejects_attach() {
        let workspace = tempdir().expect("workspace");

diff --git a/packages/tui-rs/src/ui_state.rs b/packages/tui-rs/src/ui_state.rs
--- a/packages/tui-rs/src/ui_state.rs
+++ b/packages/tui-rs/src/ui_state.rs
@@ -89,7 +89,12 @@
}

fn ui_state_path() -> Option<PathBuf> {
-    if let Ok(path) = env::var("MAESTRO_UI_STATE") {
+    let configured_path = env::var("MAESTRO_UI_STATE").ok();
+    ui_state_path_from_env_value(configured_path.as_deref())
+}
+
+fn ui_state_path_from_env_value(path: Option<&str>) -> Option<PathBuf> {
+    if let Some(path) = path {
        if !path.trim().is_empty() {
            let raw = PathBuf::from(path);
            if let Some(expanded) = expand_tilde(&raw) {
@@ -159,56 +164,33 @@
    // ========================================================================
    // UI State Path Tests
    // ========================================================================
-    // NOTE: These tests modify environment variables which is not thread-safe.
-    // They use unique env var names to avoid race conditions with parallel tests.

    #[test]
    fn test_ui_state_path_default() {
-        // Test default behavior by temporarily checking if env var is unset
-        // Note: Due to parallel test execution, we can't reliably clear env vars
-        // Instead, verify the function returns a valid path structure
-        let current_env = std::env::var("MAESTRO_UI_STATE").ok();
-        if current_env.is_none() || current_env.as_ref().is_some_and(|v| v.trim().is_empty()) {
-            let path = ui_state_path();
-            if let Some(p) = path {
-                assert!(
-                    p.ends_with("ui-state.json"),
-                    "Expected path to end with ui-state.json, got: {:?}",
-                    p
-                );
-                assert!(
-                    p.to_string_lossy().contains(".composer"),
-                    "Expected path to contain .composer, got: {:?}",
-                    p
-                );
-            }
+        let path = ui_state_path_from_env_value(None);
+        if let Some(p) = path {
+            assert!(
+                p.ends_with("ui-state.json"),
+                "Expected path to end with ui-state.json, got: {:?}",
+                p
+            );
+            assert!(
+                p.to_string_lossy().contains(".composer"),
+                "Expected path to contain .composer, got: {:?}",
+                p
+            );
        }
-        // If env var is set by another test, skip this test's assertions
    }

    #[test]
    fn test_ui_state_path_from_env() {
-        // Save original value
-        let original = std::env::var("MAESTRO_UI_STATE").ok();
-
-        std::env::set_var("MAESTRO_UI_STATE", "/tmp/custom-ui-state.json");
-        let path = ui_state_path();
+        let path = ui_state_path_from_env_value(Some("/tmp/custom-ui-state.json"));
        assert_eq!(path, Some(PathBuf::from("/tmp/custom-ui-state.json")));
-
-        // Restore original value
-        match original {
-            Some(v) => std::env::set_var("MAESTRO_UI_STATE", v),
-            None => std::env::remove_var("MAESTRO_UI_STATE"),
-        }
    }

    #[test]
    fn test_ui_state_path_empty_env() {
-        // Save original value
-        let original = std::env::var("MAESTRO_UI_STATE").ok();
-
-        std::env::set_var("MAESTRO_UI_STATE", "   ");
-        let path = ui_state_path();
+        let path = ui_state_path_from_env_value(Some("   "));
        // Should fall back to default when env var is empty/whitespace
        if let Some(p) = path {
            assert!(
@@ -217,31 +199,15 @@
                p
            );
        }
-
-        // Restore original value
-        match original {
-            Some(v) => std::env::set_var("MAESTRO_UI_STATE", v),
-            None => std::env::remove_var("MAESTRO_UI_STATE"),
-        }
    }

    #[test]
    fn test_ui_state_path_tilde_expansion() {
-        // Save original value
-        let original = std::env::var("MAESTRO_UI_STATE").ok();
-
-        std::env::set_var("MAESTRO_UI_STATE", "~/my-ui-state.json");
-        let path = ui_state_path();
+        let path = ui_state_path_from_env_value(Some("~/my-ui-state.json"));
        if let Some(p) = path {
            // Should not start with ~ after expansion
            assert!(!p.to_string_lossy().starts_with('~'));
            assert!(p.to_string_lossy().ends_with("my-ui-state.json"));
        }
-
-        // Restore original value
-        match original {
-            Some(v) => std::env::set_var("MAESTRO_UI_STATE", v),
-            None => std::env::remove_var("MAESTRO_UI_STATE"),
-        }
    }
}

diff --git a/src/cli/commands/skill.ts b/src/cli/commands/skill.ts
--- a/src/cli/commands/skill.ts
+++ b/src/cli/commands/skill.ts
@@ -4,6 +4,7 @@
import chalk from "chalk";
import { PATHS } from "../../config/constants.js";
import {
+	buildSkillRuntimeActivation,
	evaluateSkillPackages,
	findSkill,
	formatSkillEvalText,
@@ -172,6 +173,7 @@
		sourcePath: skill.sourcePath,
		resources: skill.resources,
		resourceDirs: skill.resourceDirs,
+		runtimeActivation: buildSkillRuntimeActivation(skill),
	};
	if (options.json) {
		console.log(JSON.stringify(payload, null, 2));

diff --git a/src/server/handlers/hosted-runner-drain.ts b/src/server/handlers/hosted-runner-drain.ts
--- a/src/server/handlers/hosted-runner-drain.ts
+++ b/src/server/handlers/hosted-runner-drain.ts
@@ -25,6 +25,13 @@
export const HOSTED_RUNNER_RETENTION_POLICY_VERSION =
	"evalops.remote-runner.retention.v1";

+export const HOSTED_RUNNER_WORK_CONTINUITY_VERSION =
+	"evalops.remote-runner.work-continuity.v1";
+
+const CODEX_SUBAGENT_TOOL_PREFIX = "codex.subagent.";
+const CODEX_SUBAGENT_WORK_GRAPH_SCHEMA =
+	"evalops.maestro.codex.subagent-workgraph.v1";
+
export enum HostedRunnerDrainStatusValue {
	Drained = "drained",
	Interrupted = "interrupted",
@@ -105,6 +112,24 @@
	snapshot?: HeadlessRuntimeSnapshot;
}

+class HostedRunnerRuntimeDrainError extends Error {
+	readonly sessionId: string;
+	readonly sessionFile?: string;
+	readonly protocolVersion?: string;
+	readonly cursor?: number;
+	readonly snapshot?: HeadlessRuntimeSnapshot;
+
+	constructor(message: string, runtime: HostedRunnerRuntimeDrainResult) {
+		super(message);
+		this.name = "HostedRunnerRuntimeDrainError";
+		this.sessionId = runtime.sessionId;
+		this.sessionFile = runtime.sessionFile;
+		this.protocolVersion = runtime.protocolVersion;
+		this.cursor = runtime.cursor;
+		this.snapshot = runtime.snapshot;
+	}
+}
+
export interface HostedRunnerWorkspaceExportPath {
	input: string;
	path: string;
@@ -134,6 +159,7 @@
		mode: HostedRunnerWorkspaceExportMode;
		paths: HostedRunnerWorkspaceExportPath[];
	};
+	work_continuity: HostedRunnerWorkContinuity;
	snapshot: HeadlessRuntimeSnapshot;
	retention_policy: HostedRunnerRetentionPolicy;
	git?: {
@@ -143,6 +169,16 @@
	};
}

+export interface HostedRunnerWorkContinuity {
+	protocol_version: typeof HOSTED_RUNNER_WORK_CONTINUITY_VERSION;
+	active_tool_count: number;
+	tracked_tool_count: number;
+	pending_request_count: number;
+	codex_subagent_tool_call_ids: string[];
+	codex_subagent_child_run_ids: string[];
+	codex_subagent_thread_ids: string[];
+}
+
export interface HostedRunnerDrainResult {
	status: HostedRunnerDrainStatus;
	runner_session_id: string;
@@ -412,6 +448,115 @@
	};
}

+function isRecord(value: unknown): value is Record<string, unknown> {
+	return Boolean(value && typeof value === "object" && !Array.isArray(value));
+}
+
+function stringArray(value: unknown): string[] {
+	return Array.isArray(value)
+		? value.filter(
+				(item): item is string => typeof item === "string" && item.length > 0,
+			)
+		: [];
+}
+
+function collectCodexWorkArgs(
+	args: unknown,
+	childRunIds: Set<string>,
+	threadIds: Set<string>,
+	includeLooseArgs = false,
+): boolean {
+	if (!isRecord(args)) {
+		return false;
+	}
+	const graph = args.codexWorkGraph ?? args.codex_work_graph;
+	const hasCodexGraph =
+		isRecord(graph) &&
+		(graph.schemaVersion === CODEX_SUBAGENT_WORK_GRAPH_SCHEMA ||
+			graph.schema_version === CODEX_SUBAGENT_WORK_GRAPH_SCHEMA);
+	if (!includeLooseArgs && !hasCodexGraph) {
+		return false;
+	}
+	for (const childRunId of stringArray(
+		args.childRunIds ?? args.child_run_ids,
+	)) {
+		childRunIds.add(childRunId);
+	}
+	for (const threadId of stringArray(
+		args.receiverThreadIds ?? args.receiver_thread_ids,
+	)) {
+		threadIds.add(threadId);
+	}
+	if (isRecord(graph)) {
+		const graphChildRuns = graph.childRuns ?? graph.child_runs;
... diff truncated: showing 800 of 2077 lines
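
The two guard helpers visible in the hunk above, isRecord and stringArray, can be exercised on their own. This is a minimal sketch that copies their bodies from the diff. The `sample` object and its values are hypothetical and only mirror the camelCase/snake_case fallbacks that collectCodexWorkArgs reads; the rest of that function is truncated in this view and is not reproduced.

```typescript
// Copied from the hunk above: narrows unknown to a plain (non-array) object.
function isRecord(value: unknown): value is Record<string, unknown> {
	return Boolean(value && typeof value === "object" && !Array.isArray(value));
}

// Copied from the hunk above: keeps non-empty strings, returns [] for non-arrays.
function stringArray(value: unknown): string[] {
	return Array.isArray(value)
		? value.filter(
				(item): item is string => typeof item === "string" && item.length > 0,
			)
		: [];
}

// Hypothetical tool-call arguments (not taken from the PR).
const sample: unknown = {
	child_run_ids: ["run-1", "", 42, "run-2"],
	receiverThreadIds: "not-an-array",
};

// Same key fallbacks as in the diff: camelCase first, then snake_case.
const childRunIds = isRecord(sample)
	? stringArray(sample.childRunIds ?? sample.child_run_ids)
	: [];
const threadIds = isRecord(sample)
	? stringArray(sample.receiverThreadIds ?? sample.receiver_thread_ids)
	: [];

console.log(childRunIds); // [ 'run-1', 'run-2' ]
console.log(threadIds); // []
```

Empty strings and non-string items are dropped, and a non-array value yields an empty list, so malformed arguments add nothing to the collected ID sets.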

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 768aaac306

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you:

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-05-16T05:07:35Z

@@ -1,5 +1,5 @@
 module(
-    name = "evalops_maestro",
+    name = "evalops_maestro_internal",


Restore public Bazel module name

Changing module(name = ...) from evalops_maestro to evalops_maestro_internal is a breaking API change for Bzlmod consumers: downstream workspaces that depend on this repo via bazel_dep(name = "evalops_maestro", ...) will no longer resolve the module name after this sync. Because this commit does not provide a compatibility/migration path, keeping the public module identity unchanged is necessary to avoid immediate dependency breakage.
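
One way to catch this kind of rename before a sync merges is a small guard that compares the declared module name with the expected public one. The sketch below is hypothetical and not part of the PR: `EXPECTED_PUBLIC_MODULE_NAME`, `readModuleName`, and `checkModuleName` are illustrative names, and the regular expression assumes `name` is the first argument of `module(...)`, as in the hunk above. It is not a Starlark parser.

```typescript
// Hypothetical guard; the names below are illustrative, not from the PR.
const EXPECTED_PUBLIC_MODULE_NAME = "evalops_maestro";

// Extracts the name from the first module(name = "...") call, if any.
// Assumes `name` is the first keyword argument.
function readModuleName(moduleBazel: string): string | undefined {
	const match = /\bmodule\(\s*name\s*=\s*"([^"]+)"/.exec(moduleBazel);
	return match?.[1];
}

// Returns an error message when the name differs, otherwise undefined.
function checkModuleName(moduleBazel: string): string | undefined {
	const name = readModuleName(moduleBazel);
	if (name === EXPECTED_PUBLIC_MODULE_NAME) {
		return undefined;
	}
	const found = name ?? "no module name";
	return `MODULE.bazel declares ${found}, expected ${EXPECTED_PUBLIC_MODULE_NAME}`;
}

// The two manifests mirror the minus and plus sides of the hunk above.
const publicManifest = 'module(\n    name = "evalops_maestro",\n)';
const syncedManifest = 'module(\n    name = "evalops_maestro_internal",\n)';

console.log(checkModuleName(publicManifest)); // undefined
console.log(checkModuleName(syncedManifest));
```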

chatgpt-codex-connector · 2026-05-16T05:07:35Z


 Trusted CI should run on the repo-scoped Buildfarm runner label
-`evalops-maestro-rbe` after Deploy has registered that runner through
+`evalops-maestro-internal-rbe` after Deploy has registered that runner through


Keep Bazel runner label aligned with CI workflow

The doc now tells operators to register evalops-maestro-internal-rbe, but the active Bazel RBE workflow still runs on evalops-maestro-rbe (.github/workflows/bazel-rbe.yml, runs-on label at line 47). Following this updated doc will provision the wrong label and leave the Bazel smoke job unschedulable on fresh environments.
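
The mismatch described here could be checked mechanically. The following is a rough sketch under stated assumptions: `documentedRunnerLabels` and `unscheduledLabels` are hypothetical helpers, the doc text is the two lines from the hunk above, and `runsOn` holds only the one label the review names (the real `runs-on` list in `.github/workflows/bazel-rbe.yml` may contain more entries and is not parsed here).

```typescript
// Hypothetical consistency check; neither helper exists in the repository.

// Collects backtick-quoted tokens ending in "-rbe" from a doc string.
function documentedRunnerLabels(doc: string): string[] {
	const labels: string[] = [];
	const pattern = /`([a-z0-9-]+-rbe)`/g;
	let match: RegExpExecArray | null;
	while ((match = pattern.exec(doc)) !== null) {
		if (match[1]) {
			labels.push(match[1]);
		}
	}
	return labels;
}

// Returns documented labels that no workflow job currently targets.
function unscheduledLabels(doc: string, runsOn: string[]): string[] {
	return documentedRunnerLabels(doc).filter(
		(label) => runsOn.indexOf(label) === -1,
	);
}

// Assumed runs-on labels: only the one named in the review comment.
const runsOn = ["evalops-maestro-rbe"];

const docBefore =
	"Trusted CI should run on the repo-scoped Buildfarm runner label\n" +
	"`evalops-maestro-rbe` after Deploy has registered that runner through";
const docAfter =
	"Trusted CI should run on the repo-scoped Buildfarm runner label\n" +
	"`evalops-maestro-internal-rbe` after Deploy has registered that runner through";

console.log(unscheduledLabels(docBefore, runsOn)); // []
console.log(unscheduledLabels(docAfter, runsOn)); // [ 'evalops-maestro-internal-rbe' ]
```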

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 12934beacd

chatgpt-codex-connector · 2026-05-16T05:11:17Z

 Trusted CI should run on the repo-scoped Buildfarm runner label
-`evalops-maestro-rbe` after Deploy has registered that runner through
+`evalops-maestro-internal-rbe` after Deploy has registered that runner through


Align Bazel runner label documentation with active workflow

The runner label in this doc was changed to evalops-maestro-internal-rbe, but the actual Bazel CI job still targets evalops-maestro-rbe in .github/workflows/bazel-rbe.yml (runs-on list). Following this updated guidance will register the wrong label and leave Bazel RBE jobs unschedulable in environments that provision runners from the documented value.

cursor Bot reviewed May 16, 2026

Comment thread .bazelignore

Comment thread MODULE.bazel

haasonsaas force-pushed the sync/public-release-mirror branch from 84c6227 to 483d5cd Compare May 16, 2026 04:38

chatgpt-codex-connector Bot reviewed May 16, 2026

Comment thread .github/actionlint.yaml

cursor Bot reviewed May 16, 2026

Comment thread .github/actionlint.yaml

haasonsaas force-pushed the sync/public-release-mirror branch from 6f652fe to 4d0f4db Compare May 16, 2026 04:48

cursor Bot reviewed May 16, 2026

Comment thread src/server/handlers/hosted-runner-drain.ts

haasonsaas enabled auto-merge (squash) May 16, 2026 05:07

chore: sync public mirror from internal

12934be

haasonsaas force-pushed the sync/public-release-mirror branch from 768aaac to 12934be Compare May 16, 2026 05:07

chatgpt-codex-connector Bot reviewed May 16, 2026

haasonsaas merged commit b0bc8c5 into main May 16, 2026
13 of 14 checks passed

haasonsaas deleted the sync/public-release-mirror branch May 16, 2026 05:15

This was referenced May 16, 2026

[codex] Guardrail backlog: Other feedback (other-feedback) evalops/.github#48

Open

[codex] Guardrail candidate: Other feedback (other-feedback) #394

Open

[codex] Recent unresolved review feedback evalops/.github#84

Closed
