chore: sync public mirror from internal#417

Merged: haasonsaas merged 1 commit into main from sync/public-release-mirror on May 16, 2026
Conversation

@haasonsaas haasonsaas commented May 16, 2026

Summary

  • sync the sanitized public tree from evalops/maestro-internal
  • keep evalops/maestro as a generated public mirror of the private source of truth
  • preserve public-owned CI and trusted-publishing workflows from the public checkout
  • internal source SHA: 103df07bcc7e84e2bc4e01428d1275a119e7dff8
  • last generated public sync base: 1e379abc58b01b695553bfbd0b39c99127af120d
  • previewed public-tree drift: 11 file(s) to copy/update and 0 stale file(s) to delete
  • public-only commits since last generated sync: 3

Source-of-truth status

Public Mirror Drift Audit

  • package: @evalops/maestro
  • private source: https://github.com/evalops/maestro-internal@main (103df07bcc7e)
  • public projection: https://github.com/evalops/maestro@main (09369c829785)
  • files to copy or update: 11
  • stale files to delete: 0
  • result: drift detected
  • invariant: public_projection_has_drift
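
The audit counts above (files to copy or update, stale files to delete) can be pictured as a comparison of two path-to-hash maps. The following is a minimal sketch under that assumption; `TreeDigest`, `DriftReport`, and `computeDrift` are hypothetical names, and the real sync-public-release-mirror workflow may compute drift differently (for example, it preserves public-owned CI files, which this sketch does not model).

```typescript
// Hypothetical model: each tree is a map from relative path to content hash.
type TreeDigest = Map<string, string>;

interface DriftReport {
  copyOrUpdate: string[]; // present in source, missing or different in mirror
  staleToDelete: string[]; // present in mirror, absent from source
  hasDrift: boolean;
}

function computeDrift(source: TreeDigest, mirror: TreeDigest): DriftReport {
  const copyOrUpdate: string[] = [];
  const staleToDelete: string[] = [];

  // A file needs copying when the mirror lacks it or holds different content.
  source.forEach((hash, path) => {
    if (mirror.get(path) !== hash) copyOrUpdate.push(path);
  });

  // A file is stale when the mirror still has it but the source does not.
  mirror.forEach((_hash, path) => {
    if (!source.has(path)) staleToDelete.push(path);
  });

  copyOrUpdate.sort();
  staleToDelete.sort();
  return {
    copyOrUpdate,
    staleToDelete,
    hasDrift: copyOrUpdate.length + staleToDelete.length > 0,
  };
}
```

Under this model, the report in this PR would correspond to 11 entries in `copyOrUpdate`, an empty `staleToDelete`, and `hasDrift` set to true.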

Sample Changed Paths

  • copy/update .github/actionlint.yaml
  • copy/update MODULE.bazel
  • copy/update docs/development/bazel.md
  • copy/update docs/protocols/codex-operating-layer.json
  • copy/update docs/protocols/headless.md
  • copy/update docs/protocols/hosted-runner-contract.md
  • copy/update packages/tui-rs/src/hosted_runner.rs
  • copy/update packages/tui-rs/src/ui_state.rs
  • copy/update src/server/handlers/hosted-runner-drain.ts
  • copy/update test/server/hosted-runner-drain.test.ts
  • copy/update test/skill-package-format.test.ts

Guidance

Let internal main generate and merge the public sync PR before relying on public main.

Public-only commits since last generated sync

Validation

  • generated by the sync-public-release-mirror workflow in public-tree mode

Test Plan

  • generated by the sync-public-release-mirror workflow in public-tree mode
  • public-source-provenance require-internal-pr check confirms internal source PR lineage
  • CI, integration, rust-hosted-conformance, coverage, Socket, and Cursor checks must pass before merge

Staged Rollout

  • Staging is unnecessary for this generated mirror PR: it does not independently promote user-visible behavior. It mirrors already-reviewed internal source from evalops/maestro-internal@103df07bcc7e84e2bc4e01428d1275a119e7dff8, including existing hidden/evaluation surfaces, and keeps public package parity behind the established public-source-provenance gate.

Supersedes

cursor Bot commented May 16, 2026

PR Summary

Medium Risk
Updates the hosted-runner drain/snapshot manifest shape and generation in both TypeScript and Rust, which affects Platform handoff/restore semantics and could break consumers expecting the previous manifest schema. Logic also changes interrupted-drain behavior to preserve partial runtime snapshot metadata, so correctness depends on new parsing/aggregation paths.

Overview
Hosted-runner drain now writes a new work_continuity block into the snapshot manifest (evalops.remote-runner.work-continuity.v1), summarizing active/tracked tools, pending-request counts, and Codex subagent tool-call/child-run/thread IDs without copying prompt payloads.
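
To make the shape of the work_continuity block concrete, here is a simplified sketch of how such a summary might be built from runtime state. The field names and the version string come from the diff in this PR; `DrainState`, `summarizeWorkContinuity`, and the helper functions are hypothetical, and the sketch only inspects tools with the `codex.subagent.` prefix (it omits the schema-version check and loose-argument handling that the Rust implementation performs).

```typescript
const WORK_CONTINUITY_VERSION = "evalops.remote-runner.work-continuity.v1";
const CODEX_SUBAGENT_TOOL_PREFIX = "codex.subagent.";

type Json = Record<string, unknown>;

interface WorkContinuity {
  protocol_version: string;
  active_tool_count: number;
  tracked_tool_count: number;
  pending_request_count: number;
  codex_subagent_tool_call_ids: string[];
  codex_subagent_child_run_ids: string[];
  codex_subagent_thread_ids: string[];
}

// Hypothetical, flattened view of the runtime snapshot state.
interface DrainState {
  activeTools: Json[];
  trackedTools: Json[];
  pendingRequests: Json[];
}

function isObject(value: unknown): value is Json {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns the first non-empty string found under any of the given keys.
function firstString(obj: Json, keys: string[]): string | undefined {
  for (const key of keys) {
    const value = obj[key];
    if (typeof value === "string" && value.trim() !== "") return value.trim();
  }
  return undefined;
}

function summarizeWorkContinuity(state: DrainState): WorkContinuity {
  const callIds = new Set<string>();
  const childRunIds = new Set<string>();
  const threadIds = new Set<string>();
  const entries = state.trackedTools.concat(state.pendingRequests, state.activeTools);
  for (const entry of entries) {
    const tool = firstString(entry, ["tool"]) ?? "";
    if (!tool.startsWith(CODEX_SUBAGENT_TOOL_PREFIX)) continue;
    const callId = firstString(entry, ["call_id", "callId", "tool_call_id", "toolCallId"]);
    if (callId !== undefined) callIds.add(callId);
    // Only IDs are read from args; prompt text is never copied into the summary.
    const args = entry["args"];
    const graph = isObject(args) ? (args["codexWorkGraph"] ?? args["codex_work_graph"]) : undefined;
    const runs = isObject(graph) ? (graph["childRuns"] ?? graph["child_runs"]) : undefined;
    if (!Array.isArray(runs)) continue;
    for (const run of runs) {
      if (!isObject(run)) continue;
      const childRunId = firstString(run, ["childRunId", "child_run_id"]);
      if (childRunId !== undefined) childRunIds.add(childRunId);
      const threadId = firstString(run, ["threadId", "thread_id"]);
      if (threadId !== undefined) threadIds.add(threadId);
    }
  }
  return {
    protocol_version: WORK_CONTINUITY_VERSION,
    active_tool_count: state.activeTools.length,
    tracked_tool_count: state.trackedTools.length,
    pending_request_count: state.pendingRequests.length,
    codex_subagent_tool_call_ids: Array.from(callIds).sort(),
    codex_subagent_child_run_ids: Array.from(childRunIds).sort(),
    codex_subagent_thread_ids: Array.from(threadIds).sort(),
  };
}
```

Collecting IDs into sets and sorting them keeps the output deterministic, which is the same effect the Rust side gets from `BTreeSet`.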

Both the TypeScript drain handler and the Rust hosted runner implement continuity extraction from runtime snapshot state (including mixed camelCase/snake_case Codex work-graph fields) and add tests/docs/evidence-map updates to lock the contract. Interrupted drain handling is also hardened to carry through the last known runtime session_id/cursor/snapshot details via a typed error wrapper.
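
The typed error wrapper mentioned above is not shown in the visible part of this diff. As a rough illustration of the idea only, the sketch below uses hypothetical names (`InterruptedDrainError`, `PartialRuntimeSnapshot`, `describeDrainFailure`) and hypothetical status strings to show how an interrupted drain could carry its last known `session_id` and `cursor` to the caller.

```typescript
// Hypothetical subset of runtime snapshot metadata known at interruption time.
interface PartialRuntimeSnapshot {
  session_id?: string;
  cursor?: number;
}

// Error subclass that keeps the partial metadata alongside the message.
class InterruptedDrainError extends Error {
  readonly partial: PartialRuntimeSnapshot;

  constructor(message: string, partial: PartialRuntimeSnapshot) {
    super(message);
    this.name = "InterruptedDrainError";
    // Restore the prototype chain so instanceof works on older compile targets.
    Object.setPrototypeOf(this, InterruptedDrainError.prototype);
    this.partial = partial;
  }
}

interface DrainFailureSummary extends PartialRuntimeSnapshot {
  status: "interrupted" | "failed"; // status values are assumptions
}

// Preserves partial metadata for interrupted drains; other errors carry none.
function describeDrainFailure(err: unknown): DrainFailureSummary {
  if (err instanceof InterruptedDrainError) {
    return { status: "interrupted", ...err.partial };
  }
  return { status: "failed" };
}
```

The point of the pattern is that the catch site does not need to re-query the runtime: whatever was known when the drain stopped travels with the error.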

Separately, Bazel/CI naming is updated to the internal mirror (evalops_maestro_internal / evalops-maestro-internal-rbe), and Rust UI-state path tests are made deterministic by refactoring env-var handling.

Reviewed by Cursor Bugbot for commit 12934be. Bugbot is set up for automated code reviews on this repo.

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Bazelignore references internal repo symlink name
    • Updated .bazelignore to ignore the public Bazel convenience symlink name bazel-maestro instead of the internal mirror name.
  • ✅ Fixed: MODULE.bazel leaks internal module name to public
    • Renamed the Bazel module from evalops_maestro_internal to the public-facing evalops_maestro.

Comment thread .bazelignore
Comment thread MODULE.bazel
@haasonsaas haasonsaas force-pushed the sync/public-release-mirror branch from 84c6227 to 483d5cd Compare May 16, 2026 04:38
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 483d5cdbd6

Comment thread .github/actionlint.yaml
Comment thread .github/actionlint.yaml
@haasonsaas haasonsaas force-pushed the sync/public-release-mirror branch from 6f652fe to 4d0f4db Compare May 16, 2026 04:48
@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Rust/TypeScript key-fallback behavior diverges on malformed args
    • TypeScript now falls back to the snake_case key only when the camelCase value is not an array, matching the Rust drain behavior and covered by a regression test.
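
The fallback rule described in this fix can be sketched as follows. This mirrors the behavior of the Rust helper `collect_json_string_array_from_object` in this PR; the TypeScript function name `collectStringArray` is hypothetical and is not necessarily what the handler uses.

```typescript
// Collects trimmed, non-empty strings from the first key whose value is an array.
// A non-array value (missing or malformed) falls through to the next key spelling.
function collectStringArray(
  obj: Record<string, unknown>,
  keys: string[],
  into: Set<string>,
): void {
  for (const key of keys) {
    const value = obj[key];
    if (!Array.isArray(value)) continue; // not an array: try the next spelling
    for (const item of value) {
      if (typeof item === "string" && item.trim() !== "") into.add(item.trim());
    }
    return; // the first array wins, even when it is empty
  }
}
```

Usage: calling it with `["childRunIds", "child_run_ids"]` on `{ childRunIds: "oops", child_run_ids: ["a"] }` collects `a`, while `{ childRunIds: [], child_run_ids: ["b"] }` collects nothing because the camelCase value is already an array.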

Reviewed by Cursor Bugbot for commit 4d0f4db.

Comment thread src/server/handlers/hosted-runner-drain.ts
cursor Bot commented May 16, 2026

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Actionlint config breaks public workflow runner label validation
    • Added the public evalops-maestro-rbe runner label back to .github/actionlint.yaml so actionlint accepts both public and internal Bazel RBE workflows.
Preview (6202899018)
diff --git a/.github/actionlint.yaml b/.github/actionlint.yaml
--- a/.github/actionlint.yaml
+++ b/.github/actionlint.yaml
@@ -2,3 +2,4 @@
   labels:
     - blacksmith-*
     - evalops-maestro-rbe
+    - evalops-maestro-internal-rbe

diff --git a/MODULE.bazel b/MODULE.bazel
--- a/MODULE.bazel
+++ b/MODULE.bazel
@@ -1,5 +1,5 @@
 module(
-    name = "evalops_maestro",
+    name = "evalops_maestro_internal",
     version = "0.0.0",
 )
 

diff --git a/docs/cookbook/skills/README.md b/docs/cookbook/skills/README.md
--- a/docs/cookbook/skills/README.md
+++ b/docs/cookbook/skills/README.md
@@ -55,6 +55,21 @@
 maestro skill lint .maestro/skills/reviewing-prs --describe-toolbox

+## Runtime Activation
+
+Loading a skill through the Skill tool returns a skillRuntimeActivation
+object in the tool result details. The same shape is available locally:
+
+```bash
+maestro skill inspect reviewing-prs --json
+```
+
+Use this manifest in harnesses and adapters when you need to see which
+references, toolbox executables, MCP servers, and tool bounds become active. MCP
+environment values remain in mcp.json; they are not copied into the activation
+manifest. Servers with missing or malformed includeTools are reported as
+warnings and omitted from the activatable server list.
+

 ## Eval Harness

Use maestro skill eval when a package needs pass/fail evidence rather than

diff --git a/docs/design/EVALOPS_AGENT_CORE_PARITY.md b/docs/design/EVALOPS_AGENT_CORE_PARITY.md
--- a/docs/design/EVALOPS_AGENT_CORE_PARITY.md
+++ b/docs/design/EVALOPS_AGENT_CORE_PARITY.md
@@ -88,6 +88,52 @@

toolbox/ is optional. Executables in that directory are expected to support MAESTRO_TOOLBOX_ACTION=describe so Maestro can register them as typed tools when the skill is active.

+When the Skill tool loads a package, it returns a skillRuntimeActivation
+manifest alongside artifact metadata. maestro skill inspect <name> --json
+emits the same contract for local harnesses:
+
+```json
+{
+  "runtimeActivation": {
+    "name": "reviewing-prs",
+    "source": "project",
+    "profile": {
+      "model": "gpt-5.5",
+      "mode": "review",
+      "isolatedContext": true
+    },
+    "tools": {
+      "allowed": ["github.get_pull_request"],
+      "builtin": ["read", "search"]
+    },
+    "resources": {
+      "directories": {
+        "reference": ".maestro/skills/reviewing-prs/reference",
+        "toolbox": ".maestro/skills/reviewing-prs/toolbox"
+      }
+    },
+    "toolPackage": {
+      "mcp": {
+        "configPath": ".maestro/skills/reviewing-prs/mcp.json",
+        "servers": [
+          {
+            "name": "github",
+            "command": "npx",
+            "includeTools": ["get_pull_request", "list_pull_request_files"]
+          }
+        ]
+      }
+    }
+  }
+}
+```

+The activation manifest exposes scoped paths, toolbox entries, MCP server names,
+and includeTools bounds. It does not copy MCP environment values into
+agent-visible details or telemetry. MCP servers with missing or malformed
+includeTools are omitted from the activatable server list and reported through
+manifest warnings.
+

CLI Contract

maestro skill is the public authoring surface:
@@ -117,6 +163,6 @@

 1. Local AgentRuntime ledger: one SQLite store for runs, tool calls, waits, summaries, checkpoints, and session search.
 2. maestro goal: persistent objective loop backed by the local ledger, promotable to Platform Objectives.
 3. maestro workboard: local multi-agent board mapped to Platform AgentRuns when attached.
-4. Skill-bundled MCP lifecycle: start servers only when the skill triggers, stop them on cooldown/session end.
-5. Skill-bundled toolbox registration: expose described toolbox commands as governed tools while the skill is active.
+4. Skill-bundled MCP lifecycle: activate servers from skillRuntimeActivation only when the skill triggers, stop them on cooldown/session end.
+5. Skill-bundled toolbox registration: expose described toolbox commands from skillRuntimeActivation as governed tools while the skill is active.
 6. Public cookbook and conformance fixtures for third-party skill authors.
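
The design-doc text in the diff above says that MCP servers with missing or malformed includeTools are omitted from the activatable server list and reported as warnings, and that MCP environment values are not copied into the manifest. A minimal sketch of that rule, assuming a name-to-config map as input, could look like this. `McpServerConfig`, `ActivatableServer`, and `selectActivatableServers` are hypothetical names, and treating "well-formed" as "an array of non-empty strings" is an assumption, not something this PR confirms.

```typescript
// Hypothetical view of one server entry in a skill's mcp.json.
interface McpServerConfig {
  command?: unknown;
  includeTools?: unknown;
  env?: Record<string, string>;
}

interface ActivatableServer {
  name: string;
  command: string;
  includeTools: string[];
}

// Assumed definition of well-formed: an array whose items are non-empty strings.
function isToolList(value: unknown): value is string[] {
  return (
    Array.isArray(value) &&
    value.every((item) => typeof item === "string" && item.trim() !== "")
  );
}

function selectActivatableServers(configs: Record<string, McpServerConfig>): {
  servers: ActivatableServer[];
  warnings: string[];
} {
  const servers: ActivatableServer[] = [];
  const warnings: string[] = [];
  for (const name of Object.keys(configs).sort()) {
    const config = configs[name];
    if (typeof config.command !== "string" || !isToolList(config.includeTools)) {
      warnings.push(`server "${name}" omitted: missing or malformed command/includeTools`);
      continue;
    }
    // Only name, command, and tool bounds are copied; env values stay in mcp.json.
    servers.push({ name, command: config.command, includeTools: config.includeTools.slice() });
  }
  return { servers, warnings };
}
```

Building the output object field by field, rather than spreading the config, is what keeps `env` out of the result in this sketch.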

diff --git a/docs/development/bazel.md b/docs/development/bazel.md
--- a/docs/development/bazel.md
+++ b/docs/development/bazel.md
@@ -29,5 +29,5 @@


Trusted CI should run on the repo-scoped Buildfarm runner label
-`evalops-maestro-rbe` after Deploy has registered that runner through
+`evalops-maestro-internal-rbe` after Deploy has registered that runner through
`additional_bazel_buildfarm_runners`.

diff --git a/docs/protocols/codex-operating-layer.json b/docs/protocols/codex-operating-layer.json
--- a/docs/protocols/codex-operating-layer.json
+++ b/docs/protocols/codex-operating-layer.json
@@ -276,6 +276,33 @@
			]
		},
		{
+			"area": "remote-runner-continuity",
+			"evidenceType": "source",
+			"path": "src/server/handlers/hosted-runner-drain.ts",
+			"anchors": [
+				"evalops.remote-runner.work-continuity.v1",
+				"collectHostedRunnerWorkContinuity",
+				"codex_subagent_child_run_ids",
+				"codex_subagent_thread_ids"
+			]
+		},
+		{
+			"area": "remote-runner-continuity",
+			"evidenceType": "test",
+			"path": "test/server/hosted-runner-drain.test.ts",
+			"anchors": [
+				"records Codex subagent continuity without copying prompt payloads into manifest metadata",
+				"HOSTED_RUNNER_WORK_CONTINUITY_VERSION",
+				"codex_subagent_child_run_ids"
+			]
+		},
+		{
+			"area": "remote-runner-continuity",
+			"evidenceType": "doc",
+			"path": "docs/protocols/hosted-runner-contract.md",
+			"anchors": ["work_continuity", "Codex subagent child runs"]
+		},
+		{
			"area": "realtime-streaming",
			"evidenceType": "source",
			"path": "src/server/handlers/runtime-app-server-ws.ts",

diff --git a/docs/protocols/headless.md b/docs/protocols/headless.md
--- a/docs/protocols/headless.md
+++ b/docs/protocols/headless.md
@@ -124,6 +124,15 @@
        }
      ]
    },
+    "work_continuity": {
+      "protocol_version": "evalops.remote-runner.work-continuity.v1",
+      "active_tool_count": 0,
+      "tracked_tool_count": 0,
+      "pending_request_count": 0,
+      "codex_subagent_tool_call_ids": [],
+      "codex_subagent_child_run_ids": [],
+      "codex_subagent_thread_ids": []
+    },
    "retention_policy": {
      "policy_version": "evalops.remote-runner.retention.v1",
      "managed_by": "platform",

diff --git a/docs/protocols/hosted-runner-contract.md b/docs/protocols/hosted-runner-contract.md
--- a/docs/protocols/hosted-runner-contract.md
+++ b/docs/protocols/hosted-runner-contract.md
@@ -187,7 +187,8 @@
`evalops.remote-runner.snapshot-manifest.v1`. Both Rust-hosted and
TypeScript-hosted drain paths write this same local manifest envelope, including
the runtime flush status, workspace export contract, headless runtime snapshot,
-and `retention_policy` metadata describing visibility and redaction classes.
+`work_continuity` metadata for active/pending Codex subagent child runs, and
+`retention_policy` metadata describing visibility and redaction classes.
Maestro does not upload to GCS, S3, Modal storage, Daytona storage, or any
other provider store. Upload, retention, workspace artifact hydration, and
choosing which manifest should be restored are Platform responsibilities. See

diff --git a/packages/tui-rs/src/hosted_runner.rs b/packages/tui-rs/src/hosted_runner.rs
--- a/packages/tui-rs/src/hosted_runner.rs
+++ b/packages/tui-rs/src/hosted_runner.rs
@@ -5,7 +5,7 @@
//! contract so Platform and conformance tests can target a Rust runtime without
//! routing through the Node web server.

-use std::collections::{HashMap, HashSet, VecDeque};
+use std::collections::{BTreeSet, HashMap, HashSet, VecDeque};
use std::ffi::OsString;
use std::fs;
use std::io;
@@ -40,12 +40,15 @@
pub const HOSTED_RUNNER_SNAPSHOT_MANIFEST_VERSION: &str =
    "evalops.remote-runner.snapshot-manifest.v1";
pub const HOSTED_RUNNER_RETENTION_POLICY_VERSION: &str = "evalops.remote-runner.retention.v1";
+pub const HOSTED_RUNNER_WORK_CONTINUITY_VERSION: &str = "evalops.remote-runner.work-continuity.v1";

const DEFAULT_LISTEN_HOST: &str = "0.0.0.0";
const DEFAULT_LISTEN_PORT: u16 = 8080;
const DEFAULT_HEARTBEAT_INTERVAL_MS: u64 = 15_000;
const CONNECTION_IDLE_MS: i64 = (DEFAULT_HEARTBEAT_INTERVAL_MS as i64) * 3;
const MAX_EVENTS: usize = 1024;
+const CODEX_SUBAGENT_TOOL_PREFIX: &str = "codex.subagent.";
+const CODEX_SUBAGENT_WORK_GRAPH_SCHEMA: &str = "evalops.maestro.codex.subagent-workgraph.v1";

#[derive(Debug, Clone)]
pub struct HostedRunnerConfig {
@@ -575,6 +578,8 @@
    workspace_root: PathBuf,
    runtime: RuntimeFlushManifest,
    workspace_export: WorkspaceExportManifest,
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    work_continuity: Option<WorkContinuityManifest>,
    snapshot: RuntimeSnapshot,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    retention_policy: Option<RetentionPolicyManifest>,
@@ -609,6 +614,17 @@
}

#[derive(Debug, Clone, Serialize, Deserialize)]
+struct WorkContinuityManifest {
+    protocol_version: String,
+    active_tool_count: usize,
+    tracked_tool_count: usize,
+    pending_request_count: usize,
+    codex_subagent_tool_call_ids: Vec<String>,
+    codex_subagent_child_run_ids: Vec<String>,
+    codex_subagent_thread_ids: Vec<String>,
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize)]
struct RetentionPolicyManifest {
    policy_version: String,
    managed_by: String,
@@ -656,6 +672,153 @@
    }
}

+fn default_work_continuity_manifest(snapshot: &RuntimeSnapshot) -> WorkContinuityManifest {
+    let state = &snapshot.state;
+    let mut tool_call_ids = BTreeSet::new();
+    let mut child_run_ids = BTreeSet::new();
+    let mut thread_ids = BTreeSet::new();
+    let pending_request_count = state.pending_approvals.len()
+        + state.pending_client_tools.len()
+        + state.pending_mcp_elicitations.len()
+        + state.pending_user_inputs.len()
+        + state.pending_tool_retries.len();
+    for source in state
+        .tracked_tools
+        .iter()
+        .chain(state.pending_approvals.iter())
+        .chain(state.pending_client_tools.iter())
+        .chain(state.pending_mcp_elicitations.iter())
+        .chain(state.pending_user_inputs.iter())
+        .chain(state.pending_tool_retries.iter())
+    {
+        let tool = json_string_field(source, &["tool"]).unwrap_or_default();
+        let is_codex_subagent_tool = tool.starts_with(CODEX_SUBAGENT_TOOL_PREFIX);
+        let has_codex_work_args = collect_codex_work_args(
+            source.get("args"),
+            &mut child_run_ids,
+            &mut thread_ids,
+            is_codex_subagent_tool,
+        );
+        if is_codex_subagent_tool || has_codex_work_args {
+            if let Some(call_id) =
+                json_string_field(source, &["call_id", "callId", "tool_call_id", "toolCallId"])
+            {
+                tool_call_ids.insert(call_id);
+            }
+        }
+    }
+    for active_tool in &state.active_tools {
+        let tool = json_string_field(active_tool, &["tool"]).unwrap_or_default();
+        if tool.starts_with(CODEX_SUBAGENT_TOOL_PREFIX) {
+            if let Some(call_id) = json_string_field(
+                active_tool,
+                &["call_id", "callId", "tool_call_id", "toolCallId"],
+            ) {
+                tool_call_ids.insert(call_id);
+            }
+        }
+    }
+    WorkContinuityManifest {
+        protocol_version: HOSTED_RUNNER_WORK_CONTINUITY_VERSION.to_string(),
+        active_tool_count: state.active_tools.len(),
+        tracked_tool_count: state.tracked_tools.len(),
+        pending_request_count,
+        codex_subagent_tool_call_ids: tool_call_ids.into_iter().collect(),
+        codex_subagent_child_run_ids: child_run_ids.into_iter().collect(),
+        codex_subagent_thread_ids: thread_ids.into_iter().collect(),
+    }
+}
+
+fn collect_codex_work_args(
+    args: Option<&serde_json::Value>,
+    child_run_ids: &mut BTreeSet<String>,
+    thread_ids: &mut BTreeSet<String>,
+    include_loose_args: bool,
+) -> bool {
+    let Some(args) = args.and_then(serde_json::Value::as_object) else {
+        return false;
+    };
+    let graph = args
+        .get("codexWorkGraph")
+        .or_else(|| args.get("codex_work_graph"));
+    let has_codex_graph = graph
+        .and_then(serde_json::Value::as_object)
+        .is_some_and(|graph| {
+            json_string_field_from_object(graph, &["schemaVersion", "schema_version"]).as_deref()
+                == Some(CODEX_SUBAGENT_WORK_GRAPH_SCHEMA)
+        });
+    if !include_loose_args && !has_codex_graph {
+        return false;
+    }
+    collect_json_string_array_from_object(args, &["childRunIds", "child_run_ids"], child_run_ids);
+    collect_json_string_array_from_object(
+        args,
+        &["receiverThreadIds", "receiver_thread_ids"],
+        thread_ids,
+    );
+    if let Some(graph) = graph.and_then(serde_json::Value::as_object) {
+        let child_runs = graph
+            .get("childRuns")
+            .or_else(|| graph.get("child_runs"))
+            .and_then(serde_json::Value::as_array);
+        if let Some(child_runs) = child_runs {
+            for child_run in child_runs {
+                if let Some(child_run) = child_run.as_object() {
+                    if let Some(child_run_id) =
+                        json_string_field_from_object(child_run, &["childRunId", "child_run_id"])
+                    {
+                        child_run_ids.insert(child_run_id);
+                    }
+                    if let Some(thread_id) =
+                        json_string_field_from_object(child_run, &["threadId", "thread_id"])
+                    {
+                        thread_ids.insert(thread_id);
+                    }
+                }
+            }
+        }
+    }
+    include_loose_args || has_codex_graph
+}
+
+fn collect_json_string_array_from_object(
+    object: &serde_json::Map<String, serde_json::Value>,
+    keys: &[&str],
+    values: &mut BTreeSet<String>,
+) {
+    for key in keys {
+        let Some(items) = object.get(*key).and_then(serde_json::Value::as_array) else {
+            continue;
+        };
+        for item in items {
+            if let Some(item) = item.as_str().map(str::trim).filter(|item| !item.is_empty()) {
+                values.insert(item.to_string());
+            }
+        }
+        return;
+    }
+}
+
+fn json_string_field(value: &serde_json::Value, keys: &[&str]) -> Option<String> {
+    value
+        .as_object()
+        .and_then(|object| json_string_field_from_object(object, keys))
+}
+
+fn json_string_field_from_object(
+    object: &serde_json::Map<String, serde_json::Value>,
+    keys: &[&str],
+) -> Option<String> {
+    keys.iter().find_map(|key| {
+        object
+            .get(*key)
+            .and_then(serde_json::Value::as_str)
+            .map(str::trim)
+            .filter(|value| !value.is_empty())
+            .map(ToOwned::to_owned)
+    })
+}
+
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
enum RuntimeFlushStatus {
@@ -2741,6 +2904,7 @@
            mode: "local_path_contract".to_string(),
            paths: workspace_export_paths,
        },
+        work_continuity: Some(default_work_continuity_manifest(&snapshot)),
        snapshot,
        retention_policy: Some(default_retention_policy_manifest()),
    };
@@ -3632,6 +3796,16 @@
            "directory"
        );
        assert_eq!(
+            manifest["work_continuity"]["protocol_version"],
+            HOSTED_RUNNER_WORK_CONTINUITY_VERSION
+        );
+        assert_eq!(manifest["work_continuity"]["active_tool_count"], 0);
+        assert_eq!(manifest["work_continuity"]["tracked_tool_count"], 0);
+        assert_eq!(
+            manifest["work_continuity"]["codex_subagent_tool_call_ids"],
+            json!([])
+        );
+        assert_eq!(
            manifest["retention_policy"]["policy_version"],
            HOSTED_RUNNER_RETENTION_POLICY_VERSION
        );
@@ -3940,6 +4114,15 @@
                    "type": "file"
                }]
            },
+            "work_continuity": {
+                "protocol_version": HOSTED_RUNNER_WORK_CONTINUITY_VERSION,
+                "active_tool_count": 1,
+                "tracked_tool_count": 1,
+                "pending_request_count": 0,
+                "codex_subagent_tool_call_ids": ["collab-spawn-ts"],
+                "codex_subagent_child_run_ids": ["agent-run-child-ts"],
+                "codex_subagent_thread_ids": ["child-thread-ts"]
+            },
            "retention_policy": {
                "policy_version": HOSTED_RUNNER_RETENTION_POLICY_VERSION,
                "managed_by": "platform",
@@ -4009,11 +4192,99 @@
                .policy_version,
            HOSTED_RUNNER_RETENTION_POLICY_VERSION
        );
+        let work_continuity = parsed.work_continuity.as_ref().expect("work continuity");
+        assert_eq!(
+            work_continuity.protocol_version,
+            HOSTED_RUNNER_WORK_CONTINUITY_VERSION
+        );
+        assert_eq!(
+            work_continuity.codex_subagent_child_run_ids,
+            vec!["agent-run-child-ts".to_string()]
+        );
        assert_eq!(parsed.snapshot.session_id, "session_ts");
        assert_eq!(parsed.snapshot.cursor, 7);
        assert_eq!(parsed.workspace_export.paths[0].relative_path, "README.md");
    }

+    #[test]
+    fn work_continuity_manifest_extracts_codex_subagent_ids() {
+        let snapshot: RuntimeSnapshot = serde_json::from_value(json!({
+            "protocolVersion": HEADLESS_PROTOCOL_VERSION,
+            "session_id": "session_rust",
+            "cursor": 9,
+            "last_init": null,
+            "state": {
+                "protocol_version": HEADLESS_PROTOCOL_VERSION,
+                "connection_count": 0,
+                "subscriber_count": 0,
+                "connections": [],
+                "model": "gpt-5.4",
+                "provider": "rust",
+                "session_id": "session_rust",
+                "pending_approvals": [{
+                    "id": "approval-rust",
+                    "call_id": "approval-call-rust",
+                    "tool": "shell"
+                }],
+                "pending_client_tools": [],
+                "pending_mcp_elicitations": [],
+                "pending_user_inputs": [{
+                    "id": "input-rust",
+                    "prompt": "continue?"
+                }],
+                "pending_tool_retries": [],
+                "tracked_tools": [{
+                    "call_id": "collab-spawn-rust",
+                    "tool": "codex.subagent.spawnAgent",
+                    "args": {
+                        "prompt": "Sensitive Rust subagent prompt",
+                        "codex_work_graph": {
+                            "schema_version": "evalops.maestro.codex.subagent-workgraph.v1",
+                            "child_runs": [{
+                                "thread_id": "child-thread-rust",
+                                "child_run_id": "agent-run-child-rust"
+                            }]
+                        }
+                    }
+                }],
+                "active_tools": [{
+                    "call_id": "collab-spawn-rust",
+                    "tool": "codex.subagent.spawnAgent",
+                    "output": "starting child"
+                }],
+                "active_utility_commands": [],
+                "active_file_watches": [],
+                "is_ready": true,
+                "is_responding": false
+            }
+        }))
+        .expect("runtime snapshot");
+
+        let continuity = default_work_continuity_manifest(&snapshot);
+
+        assert_eq!(
+            continuity.protocol_version,
+            HOSTED_RUNNER_WORK_CONTINUITY_VERSION
+        );
+        assert_eq!(continuity.active_tool_count, 1);
+        assert_eq!(continuity.tracked_tool_count, 1);
+        assert_eq!(continuity.pending_request_count, 2);
+        assert_eq!(
+            continuity.codex_subagent_tool_call_ids,
+            vec!["collab-spawn-rust".to_string()]
+        );
+        assert_eq!(
+            continuity.codex_subagent_child_run_ids,
+            vec!["agent-run-child-rust".to_string()]
+        );
+        assert_eq!(
+            continuity.codex_subagent_thread_ids,
+            vec!["child-thread-rust".to_string()]
+        );
+        let continuity_json = serde_json::to_string(&continuity).expect("continuity json");
+        assert!(!continuity_json.contains("Sensitive Rust subagent prompt"));
+    }
+
    #[tokio::test]
    async fn failed_restore_manifest_stays_not_ready_and_rejects_attach() {
        let workspace = tempdir().expect("workspace");

diff --git a/packages/tui-rs/src/ui_state.rs b/packages/tui-rs/src/ui_state.rs
--- a/packages/tui-rs/src/ui_state.rs
+++ b/packages/tui-rs/src/ui_state.rs
@@ -89,7 +89,12 @@
}

fn ui_state_path() -> Option<PathBuf> {
-    if let Ok(path) = env::var("MAESTRO_UI_STATE") {
+    let configured_path = env::var("MAESTRO_UI_STATE").ok();
+    ui_state_path_from_env_value(configured_path.as_deref())
+}
+
+fn ui_state_path_from_env_value(path: Option<&str>) -> Option<PathBuf> {
+    if let Some(path) = path {
        if !path.trim().is_empty() {
            let raw = PathBuf::from(path);
            if let Some(expanded) = expand_tilde(&raw) {
@@ -159,56 +164,33 @@
    // ========================================================================
    // UI State Path Tests
    // ========================================================================
-    // NOTE: These tests modify environment variables which is not thread-safe.
-    // They use unique env var names to avoid race conditions with parallel tests.

    #[test]
    fn test_ui_state_path_default() {
-        // Test default behavior by temporarily checking if env var is unset
-        // Note: Due to parallel test execution, we can't reliably clear env vars
-        // Instead, verify the function returns a valid path structure
-        let current_env = std::env::var("MAESTRO_UI_STATE").ok();
-        if current_env.is_none() || current_env.as_ref().is_some_and(|v| v.trim().is_empty()) {
-            let path = ui_state_path();
-            if let Some(p) = path {
-                assert!(
-                    p.ends_with("ui-state.json"),
-                    "Expected path to end with ui-state.json, got: {:?}",
-                    p
-                );
-                assert!(
-                    p.to_string_lossy().contains(".composer"),
-                    "Expected path to contain .composer, got: {:?}",
-                    p
-                );
-            }
+        let path = ui_state_path_from_env_value(None);
+        if let Some(p) = path {
+            assert!(
+                p.ends_with("ui-state.json"),
+                "Expected path to end with ui-state.json, got: {:?}",
+                p
+            );
+            assert!(
+                p.to_string_lossy().contains(".composer"),
+                "Expected path to contain .composer, got: {:?}",
+                p
+            );
        }
-        // If env var is set by another test, skip this test's assertions
    }

    #[test]
    fn test_ui_state_path_from_env() {
-        // Save original value
-        let original = std::env::var("MAESTRO_UI_STATE").ok();
-
-        std::env::set_var("MAESTRO_UI_STATE", "/tmp/custom-ui-state.json");
-        let path = ui_state_path();
+        let path = ui_state_path_from_env_value(Some("/tmp/custom-ui-state.json"));
        assert_eq!(path, Some(PathBuf::from("/tmp/custom-ui-state.json")));
-
-        // Restore original value
-        match original {
-            Some(v) => std::env::set_var("MAESTRO_UI_STATE", v),
-            None => std::env::remove_var("MAESTRO_UI_STATE"),
-        }
    }

    #[test]
    fn test_ui_state_path_empty_env() {
-        // Save original value
-        let original = std::env::var("MAESTRO_UI_STATE").ok();
-
-        std::env::set_var("MAESTRO_UI_STATE", "   ");
-        let path = ui_state_path();
+        let path = ui_state_path_from_env_value(Some("   "));
        // Should fall back to default when env var is empty/whitespace
        if let Some(p) = path {
            assert!(
@@ -217,31 +199,15 @@
                p
            );
        }
-
-        // Restore original value
-        match original {
-            Some(v) => std::env::set_var("MAESTRO_UI_STATE", v),
-            None => std::env::remove_var("MAESTRO_UI_STATE"),
-        }
    }

    #[test]
    fn test_ui_state_path_tilde_expansion() {
-        // Save original value
-        let original = std::env::var("MAESTRO_UI_STATE").ok();
-
-        std::env::set_var("MAESTRO_UI_STATE", "~/my-ui-state.json");
-        let path = ui_state_path();
+        let path = ui_state_path_from_env_value(Some("~/my-ui-state.json"));
        if let Some(p) = path {
            // Should not start with ~ after expansion
            assert!(!p.to_string_lossy().starts_with('~'));
            assert!(p.to_string_lossy().ends_with("my-ui-state.json"));
        }
-
-        // Restore original value
-        match original {
-            Some(v) => std::env::set_var("MAESTRO_UI_STATE", v),
-            None => std::env::remove_var("MAESTRO_UI_STATE"),
-        }
    }
}

diff --git a/src/cli/commands/skill.ts b/src/cli/commands/skill.ts
--- a/src/cli/commands/skill.ts
+++ b/src/cli/commands/skill.ts
@@ -4,6 +4,7 @@
import chalk from "chalk";
import { PATHS } from "../../config/constants.js";
import {
+	buildSkillRuntimeActivation,
	evaluateSkillPackages,
	findSkill,
	formatSkillEvalText,
@@ -172,6 +173,7 @@
		sourcePath: skill.sourcePath,
		resources: skill.resources,
		resourceDirs: skill.resourceDirs,
+		runtimeActivation: buildSkillRuntimeActivation(skill),
	};
	if (options.json) {
		console.log(JSON.stringify(payload, null, 2));

diff --git a/src/server/handlers/hosted-runner-drain.ts b/src/server/handlers/hosted-runner-drain.ts
--- a/src/server/handlers/hosted-runner-drain.ts
+++ b/src/server/handlers/hosted-runner-drain.ts
@@ -25,6 +25,13 @@
export const HOSTED_RUNNER_RETENTION_POLICY_VERSION =
	"evalops.remote-runner.retention.v1";

+export const HOSTED_RUNNER_WORK_CONTINUITY_VERSION =
+	"evalops.remote-runner.work-continuity.v1";
+
+const CODEX_SUBAGENT_TOOL_PREFIX = "codex.subagent.";
+const CODEX_SUBAGENT_WORK_GRAPH_SCHEMA =
+	"evalops.maestro.codex.subagent-workgraph.v1";
+
export enum HostedRunnerDrainStatusValue {
	Drained = "drained",
	Interrupted = "interrupted",
@@ -105,6 +112,24 @@
	snapshot?: HeadlessRuntimeSnapshot;
}

+class HostedRunnerRuntimeDrainError extends Error {
+	readonly sessionId: string;
+	readonly sessionFile?: string;
+	readonly protocolVersion?: string;
+	readonly cursor?: number;
+	readonly snapshot?: HeadlessRuntimeSnapshot;
+
+	constructor(message: string, runtime: HostedRunnerRuntimeDrainResult) {
+		super(message);
+		this.name = "HostedRunnerRuntimeDrainError";
+		this.sessionId = runtime.sessionId;
+		this.sessionFile = runtime.sessionFile;
+		this.protocolVersion = runtime.protocolVersion;
+		this.cursor = runtime.cursor;
+		this.snapshot = runtime.snapshot;
+	}
+}
+
export interface HostedRunnerWorkspaceExportPath {
	input: string;
	path: string;
@@ -134,6 +159,7 @@
		mode: HostedRunnerWorkspaceExportMode;
		paths: HostedRunnerWorkspaceExportPath[];
	};
+	work_continuity: HostedRunnerWorkContinuity;
	snapshot: HeadlessRuntimeSnapshot;
	retention_policy: HostedRunnerRetentionPolicy;
	git?: {
@@ -143,6 +169,16 @@
	};
}

+export interface HostedRunnerWorkContinuity {
+	protocol_version: typeof HOSTED_RUNNER_WORK_CONTINUITY_VERSION;
+	active_tool_count: number;
+	tracked_tool_count: number;
+	pending_request_count: number;
+	codex_subagent_tool_call_ids: string[];
+	codex_subagent_child_run_ids: string[];
+	codex_subagent_thread_ids: string[];
+}
+
export interface HostedRunnerDrainResult {
	status: HostedRunnerDrainStatus;
	runner_session_id: string;
@@ -412,6 +448,115 @@
	};
}

+function isRecord(value: unknown): value is Record<string, unknown> {
+	return Boolean(value && typeof value === "object" && !Array.isArray(value));
+}
+
+function stringArray(value: unknown): string[] {
+	return Array.isArray(value)
+		? value.filter(
+				(item): item is string => typeof item === "string" && item.length > 0,
+			)
+		: [];
+}
+
+function collectCodexWorkArgs(
+	args: unknown,
+	childRunIds: Set<string>,
+	threadIds: Set<string>,
+	includeLooseArgs = false,
+): boolean {
+	if (!isRecord(args)) {
+		return false;
+	}
+	const graph = args.codexWorkGraph ?? args.codex_work_graph;
+	const hasCodexGraph =
+		isRecord(graph) &&
+		(graph.schemaVersion === CODEX_SUBAGENT_WORK_GRAPH_SCHEMA ||
+			graph.schema_version === CODEX_SUBAGENT_WORK_GRAPH_SCHEMA);
+	if (!includeLooseArgs && !hasCodexGraph) {
+		return false;
+	}
+	for (const childRunId of stringArray(
+		args.childRunIds ?? args.child_run_ids,
+	)) {
+		childRunIds.add(childRunId);
+	}
+	for (const threadId of stringArray(
+		args.receiverThreadIds ?? args.receiver_thread_ids,
+	)) {
+		threadIds.add(threadId);
+	}
+	if (isRecord(graph)) {
+		const graphChildRuns = graph.childRuns ?? graph.child_runs;
... diff truncated: showing 800 of 2077 lines
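
The `isRecord` and `stringArray` helpers added in the `hosted-runner-drain.ts` hunk above are shown in full, so their behavior can be illustrated on its own. The following is a minimal sketch, not the PR's implementation: `collectIds` is a hypothetical, simplified stand-in for `collectCodexWorkArgs` that skips the work-graph schema check and the graph traversal cut off by the truncation. It only demonstrates how both camelCase and snake_case keys are read and how invalid items are filtered out.

```typescript
// Copied from the diff: narrows an unknown value to a plain (non-array) object.
function isRecord(value: unknown): value is Record<string, unknown> {
	return Boolean(value && typeof value === "object" && !Array.isArray(value));
}

// Copied from the diff: keeps only the non-empty string items of an array.
function stringArray(value: unknown): string[] {
	return Array.isArray(value)
		? value.filter(
				(item): item is string => typeof item === "string" && item.length > 0,
			)
		: [];
}

// Hypothetical helper, not part of the PR: gathers IDs from either key
// spelling into Sets so that repeated IDs are recorded once.
function collectIds(
	args: unknown,
	childRunIds: Set<string>,
	threadIds: Set<string>,
): void {
	if (!isRecord(args)) {
		return;
	}
	for (const id of stringArray(args.childRunIds ?? args.child_run_ids)) {
		childRunIds.add(id);
	}
	for (const id of stringArray(
		args.receiverThreadIds ?? args.receiver_thread_ids,
	)) {
		threadIds.add(id);
	}
}

const childRunIds = new Set<string>();
const threadIds = new Set<string>();

// camelCase keys; the number and the empty string are filtered out.
collectIds(
	{ childRunIds: ["run-1", 42, ""], receiverThreadIds: ["t-1"] },
	childRunIds,
	threadIds,
);
// snake_case keys; "run-1" repeats and is de-duplicated by the Set.
collectIds(
	{ child_run_ids: ["run-1", "run-2"], receiver_thread_ids: ["t-2"] },
	childRunIds,
	threadIds,
);
// Arrays are not records, so this call adds nothing.
collectIds(["run-9"], childRunIds, threadIds);

const collectedChildRunIds = Array.from(childRunIds);
const collectedThreadIds = Array.from(threadIds);
console.log(collectedChildRunIds, collectedThreadIds);
```

Collecting into `Set<string>` before converting to arrays is one way to produce the de-duplicated `codex_subagent_child_run_ids` and `codex_subagent_thread_ids` lists declared on `HostedRunnerWorkContinuity`; whether the PR orders or sorts those arrays is not visible in the truncated diff.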


@haasonsaas haasonsaas enabled auto-merge (squash) May 16, 2026 05:07
@haasonsaas haasonsaas force-pushed the sync/public-release-mirror branch from 768aaac to 12934be Compare May 16, 2026 05:07

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 768aaac306

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread MODULE.bazel
@@ -1,5 +1,5 @@
 module(
-    name = "evalops_maestro",
+    name = "evalops_maestro_internal",

P1: Restore public Bazel module name

Changing `module(name = ...)` from `evalops_maestro` to `evalops_maestro_internal` is a breaking API change for Bzlmod consumers: downstream workspaces that depend on this repo via `bazel_dep(name = "evalops_maestro", ...)` will no longer resolve the module name after this sync. Because this commit does not provide a compatibility or migration path, keeping the public module identity unchanged is necessary to avoid immediate dependency breakage.

Comment thread docs/development/bazel.md

 Trusted CI should run on the repo-scoped Buildfarm runner label
-`evalops-maestro-rbe` after Deploy has registered that runner through
+`evalops-maestro-internal-rbe` after Deploy has registered that runner through

P2: Keep Bazel runner label aligned with CI workflow

The doc now tells operators to register `evalops-maestro-internal-rbe`, but the active Bazel RBE workflow still runs on `evalops-maestro-rbe` (`.github/workflows/bazel-rbe.yml`, `runs-on` label at line 47). Following this updated doc will provision the wrong label and leave the Bazel smoke job unschedulable on fresh environments.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 12934beacd

Comment thread docs/development/bazel.md
Comment on lines 31 to +32
 Trusted CI should run on the repo-scoped Buildfarm runner label
-`evalops-maestro-rbe` after Deploy has registered that runner through
+`evalops-maestro-internal-rbe` after Deploy has registered that runner through

P2: Align Bazel runner label documentation with active workflow

The runner label in this doc was changed to `evalops-maestro-internal-rbe`, but the actual Bazel CI job still targets `evalops-maestro-rbe` in `.github/workflows/bazel-rbe.yml` (`runs-on` list). Following this updated guidance will register the wrong label and leave Bazel RBE jobs unschedulable in environments that provision runners from the documented value.

@haasonsaas haasonsaas merged commit b0bc8c5 into main May 16, 2026
13 of 14 checks passed
@haasonsaas haasonsaas deleted the sync/public-release-mirror branch May 16, 2026 05:15