Add fullsend run agent command and experiment for test it#231
Conversation
9aff39a to
0b3eb99
Compare
0b3eb99 to
1383726
Compare
|
Not that really matters, but I passed Leading to: I could test this quickly locally. |
Reconcile the harness schema with Marta's implementation in PR #231 and capture decisions from the Apr 9 meeting and async Slack threads. Schema additions from PR #231: image, description, host_files, providers, agent_input, runner_env. These fields were proven necessary during end-to-end experimentation. Key decisions captured: - Harness as template: required_env declares names only, CI provides values - Credentials never in harness YAML; delivered via host_files or tool servers - Container images are the preferred tool provisioning path (tools_binaries is the fallback) - Multi-agent orchestration at CI layer with trade-off noted (Ralph's cross-platform maintenance concern) - Validation loop, schema versioning, protected fields, and skills loading deferred to separate issues (#234-#237) Made-with: Cursor
Reconcile the harness schema with Marta's implementation in PR #231 and capture decisions from the Apr 9 meeting and async Slack threads. Schema additions from PR #231: image, description, host_files, providers, agent_input, runner_env. These fields were proven necessary during end-to-end experimentation. Key decisions captured: - Harness as template: required_env declares names only, CI provides values - Credentials never in harness YAML; delivered via host_files or tool servers - Container images are the preferred tool provisioning path (tools_binaries is the fallback) - Multi-agent orchestration at CI layer with trade-off noted (Ralph's cross-platform maintenance concern) - Validation loop, schema versioning, protected fields, and skills loading deferred to separate issues (#234-#237) Made-with: Cursor Signed-off-by: Adam Scerra <ascerra@redhat.com>
waynesun09
left a comment
There was a problem hiding this comment.
5 critical security issues that need addressing before merge. The most dangerous is the policy race window (#2) — verified against OpenShell's architecture that the fix (applying policy immediately after Create) is safe since SSH/SCP uses the control plane, not the restricted network namespace.
waynesun09
left a comment
There was a problem hiding this comment.
Round 2: High, Medium, and Low issues
18 additional comments covering supply chain risks (experiment files), test coverage gaps, code bugs, and documentation nits. See the first review for the 5 critical security issues in the core runner code.
|
Alright so I ran this locally and hit the same issue that @rh-hemartin raised on slack where I needed to add --model. Then the other thing that I had to work around was making my agent skill script available. Here's so info on that and what I did. Skills can depend on scripts, but there's no way to declare that in the harness as of right now I tested this locally with a real agent whose code-implementation skill references a scan-secrets helper script (a gitleaks wrapper). The skill's SKILL.md says run $SCAN_SECRETS — but the harness has no way to express "this skill needs this script on $PATH inside the sandbox." The skills: field copies the skill directory into .claude/skills/, but scripts that skills depend on aren't part of that directory. I had to work around it with host_files: This works but it's brittle — the skill and its script dependency are declared in two separate, unrelated places in the harness. If someone copies the skill without the host_files entry, the agent fails at runtime with a confusing "command not found." A few options:
|
Right! This is a bug with an easy fix, I'll take care of it.
I think the way to go for tools that a skill relies on is baking them into the sandbox image. The skill's SKILL.md would reference them in its front-matter as Bash(my-tool *), and since the tool is already on $PATH inside the image, it just works. This gives us reproducibility — the image is the single source of truth for what tools are available. I added host_files support for files that are dynamic by nature — things like the output of a previous agent, or files generated by a pre-script. Your workaround of using it for the scan-secrets script works, but I'd treat it as a short-term solution. For the long term, if a tool is a stable dependency of a skill, it belongs in the image. For the short term, I'd go with your option 3 — document the host_files workaround pattern so others don't have to discover it on their own. On a related but separate topic: there are also tools that run outside the sandbox entirely, like REST APIs that a skill might call. As for having skills declare their own tool dependencies and having fullsend run parse that, I'd rather not go down that path:
Agreed that we should add this to the HOW_TO — both the host_files workaround for now and the image-based approach as the recommended pattern going forward. Also about the REST APIs that isn't covered yet. |
|
Regarding scripts, know that skills that bundle their own We should support copying in any |
Automation layer that takes the code agent's local commit from the sandbox and turns it into a pushed branch and PR. Three files: harness YAML, provider YAML, and a post-script shell script. Depends on repo extraction (in-flight) and PR fullsend-ai#231 (fullsend run CLI). Assisted-by: Claude Code (Opus 4.6)
yep, but from the agent creator POV we should know that with this schema the scripts aren't reused, they aren't tools they are just for the skill, right? |
fff18c8 to
d2e5846
Compare
Site previewPreview: https://2637a2e1-site.fullsend-ai.workers.dev Commit: |
d2e5846 to
953dc2d
Compare
|
Added --model
If, as ralph mentioned, your script is specific to a skill, you can add it under the skill dir, it's supported as the If it's something you want to reuse, either the image, or the host_file option. There is new doc about this in the README.md |
Fixed. The agent was running in the wrong directory — it was seeing the fullsend source repo instead of the target repo. Two changes:
If you test again with REPO_NAME=kubearchive/kubearchive and --target-repo pointing at a kubearchive clone, the agent should see the right repo. |
waynesun09
left a comment
There was a problem hiding this comment.
Inline comment on validation exit code — see thread.
waynesun09
left a comment
There was a problem hiding this comment.
Two high-severity findings from second-round review — both involve the host trusting file paths and content originating from inside the sandbox, which is the security boundary being protected against.
Addresses review feedback from waynesun09 on PR #189. disallowedTools (H1, M1): Add bare-form (zero-arg) variants for all blocked commands, add gh api blocking, add git reset --hard and git rebase. Documented that disallowedTools is belt-and-suspenders — sandbox network policy and post-script validation are the load-bearing layers. scan-secrets (H3, M3, L1): Pin gitleaks version with SHA256 checksum verification for all supported platforms. Add tar extraction error handling. Fix pre-commit fallback to require the gitleaks-specific hook rather than falling through to generic hooks. Add sha256sum/shasum portability for macOS. architecture.md (M2, I2): Document the four-layer defense-in-depth model, three-layer secret scanning architecture, and protected-path enforcement via post-script. skill (M4, I3, L2, L3): Add MAX_RETRIES env var for retry limit. Add multi-run branch reuse guidance for review-rejected commits. Add partial work section. Remove constraint duplication — skill defers to agent definition as authoritative. H2 (protected-path enforcement in post-script) deferred to PR #208. Sandbox network policy and harness timeout deferred to PR #231. Signed-off-by: Adam Scerra <ascerra@redhat.com> Made-with: Cursor
waynesun09
left a comment
There was a problem hiding this comment.
Second-round review — 2 high, 7 medium, 4 low, 1 informational.
The first-round security fixes (credential handling, command injection, policy race) are all solid. The remaining issues center on the sandbox-to-host trust boundary — ExtractOutputFiles, ExtractTranscripts, and the repo write-back all trust file paths and content originating from inside the sandbox, which is the security boundary being protected against.
Must-fix (High):
- Path traversal via sandbox-controlled
findoutput inExtractOutputFiles - Unrestricted
scp -rwrite-back follows symlinks and enables git hook injection
Should-fix (Medium):
runAgentreturns nil on validation failure (exit code always 0)ProcessStatenil panic when ssh binary fails to startsandbox.Deletesilently swallows errors- Stale sandbox-side output contaminates iterations after the first
- Transcript path injection from sandbox-controlled
findoutput host_filessrc path traversal — no containment check- No
modelfield validation (inconsistent with agent name regex)
|
Future improvement: replace Several of the review findings (path traversal, symlink following, no SCP timeout, ProcessState nil panic) stem from shelling out to
Key wins:
Also worth noting: OpenShell has a gRPC Not blocking this PR — the immediate fixes (containment checks, nil guards) are sufficient. Suggesting this as a follow-up issue for the next iteration. All libraries are BSD/MIT licensed, compatible with Apache 2.0. Minimum |
Reconcile the harness schema with Marta's implementation in PR #231 and capture decisions from the Apr 9 meeting and async Slack threads. Schema additions from PR #231: image, description, host_files, providers, agent_input, runner_env. These fields were proven necessary during end-to-end experimentation. Key decisions captured: - Harness as template: required_env declares names only, CI provides values - Credentials never in harness YAML; delivered via host_files or tool servers - Container images are the preferred tool provisioning path (tools_binaries is the fallback) - Multi-agent orchestration at CI layer with trade-off noted (Ralph's cross-platform maintenance concern) - Validation loop, schema versioning, protected fields, and skills loading deferred to separate issues (#234-#237) Made-with: Cursor Signed-off-by: Adam Scerra <ascerra@redhat.com>
953dc2d to
56d7a7b
Compare
56d7a7b to
7375e69
Compare
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Marta Anon <manon@redhat.com>
7375e69 to
5d4d161
Compare
waynesun09
left a comment
There was a problem hiding this comment.
All issues from both review rounds have been addressed. Verified each fix in the diff:
Critical (Round 1) — all fixed:
- Command injection: regex validation
[a-zA-Z0-9_-]+ defense-in-depth escaping - Policy race: policy applied at
sandbox.Create()via--policyflag - Credential exposure in errors: redaction via
strings.ReplaceAll - CLI credential leak: bare-key
--credential KEY, secrets viacmd.Env - Pre/post script semantics: file paths +
ValidateFilesExist()
High (Round 2) — both fixed:
- Path traversal in
ExtractOutputFiles:filepath.Clean+ prefix containment check scp -rsymlink following: replaced withRsyncFromusing--no-links+--exclude .git/hooks/
Medium (Round 2) — all fixed:
runAgentvalidation exit code: returns error when!validationPassedProcessStatenil guard inSSH()/SSHStream()Delete()returns error, caller logs as warning- Stale sandbox output cleared between iterations
- Transcript path containment check added
host_filessrc traversal rejected inResolveRelativeTo- Model field validated with same regex as agent name
Low (Round 2) — all fixed:
- SCP/rsync timeout (5 min via
CommandContext) ApplyPolicydead code removed- Single
os.ExpandEnvpass (no double expansion) - Split on newlines instead of whitespace
No new security issues found. The code is solid.
Automation layer that takes the code agent's local commit from the sandbox and turns it into a pushed branch and PR. Three files: harness YAML, provider YAML, and a post-script shell script. Depends on repo extraction (in-flight) and PR fullsend-ai#231 (fullsend run CLI). Assisted-by: Claude Code (Opus 4.6)
Automation layer that takes the code agent's local commit from the sandbox and turns it into a pushed branch and PR. Three files: harness YAML, provider YAML, and a post-script shell script. Depends on repo extraction (in-flight) and PR fullsend-ai#231 (fullsend run CLI). Assisted-by: Claude Code (Opus 4.6)
Reconcile the harness schema with Marta's implementation in PR #231 and capture decisions from the Apr 9 meeting and async Slack threads. Schema additions from PR #231: image, description, host_files, providers, agent_input, runner_env. These fields were proven necessary during end-to-end experimentation. Key decisions captured: - Harness as template: required_env declares names only, CI provides values - Credentials never in harness YAML; delivered via host_files or tool servers - Container images are the preferred tool provisioning path (tools_binaries is the fallback) - Multi-agent orchestration at CI layer with trade-off noted (Ralph's cross-platform maintenance concern) - Validation loop, schema versioning, protected fields, and skills loading deferred to separate issues (#234-#237) Made-with: Cursor Signed-off-by: Adam Scerra <ascerra@redhat.com>
… ADR-0017 Reconcile the harness schema with Marta's implementation in PR fullsend-ai#231 and capture decisions from the Apr 9 meeting and async Slack threads. Schema additions from PR fullsend-ai#231: image, description, host_files, providers, agent_input, runner_env. These fields were proven necessary during end-to-end experimentation. Key decisions captured: - Harness as template: required_env declares names only, CI provides values - Credentials never in harness YAML; delivered via host_files or tool servers - Container images are the preferred tool provisioning path (tools_binaries is the fallback) - Multi-agent orchestration at CI layer with trade-off noted (Ralph's cross-platform maintenance concern) - Validation loop, schema versioning, protected fields, and skills loading deferred to separate issues (fullsend-ai#234-fullsend-ai#237) Made-with: Cursor Signed-off-by: Adam Scerra <ascerra@redhat.com>
fullsend run— agent runner with sandbox isolationAdds the
fullsend run <agent-name>command, which reads a harness YAML definition, provisions an OpenShell sandbox (optionally from a container image), bootstraps it with agent config/skills/credentials, runs a Claude Code agent inside it, and validates the output.What's included
Go CLI (
internal/cli/run.go,internal/harness/,internal/sandbox/):fullsend run <agent-name> --harness-dir <path>— new subcommand on root--fromfor container images), bootstrap via SCP/SSH, policy application, cleanupTARGET_REPO_DIRenv var to specify the target codebase directory (for two-repo setups where the harness and target repo are in separate repositories)Hello-world experiment (
experiments/runner-hello-world/):fullsend runflowContainerfile) with Claude Code and tool binaries pre-installed, based on the OpenShell base image.fullsendrepo, target codebase in a separate reporun-experiment.sh)Test org
The experiment runs against the test-fullsend org with two repos:
Successful workflow runs:
.fullsendrepoWhat's not tested by this experiment
How to reproduce
See experiments/runner-hello-world/HOW_TO.md.