fix: harden infrastructure and CLI for production readiness#34
fix: harden infrastructure and CLI for production readiness#34
Conversation
- Prevent directory traversal in workspace file injection (cloud-init.ts) - Restrict SSH to explicit CIDRs, default to no SSH ingress (shared-vpc.ts, openclaw-agent.ts) - Enforce IMDSv2 on EC2 instances to block unauthenticated metadata access - Replace silent catch blocks with warning logs across 6 files - Add graceful shutdown: track child processes, forward SIGINT/SIGTERM on exit - Add exponential backoff retry (3 attempts) to Tailscale API calls - Hide unimplemented Hetzner provider from CLI init wizard Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
📝 WalkthroughWalkthroughThe pull request introduces graceful shutdown for child processes, adds exponential backoff retry logic for Tailscale API calls, enhances error logging across modules, implements runtime validation for workspace file paths, restricts SSH access via configurable CIDR blocks, and enforces IMDSv2 for instance metadata security. Changes
Sequence Diagram(s)sequenceDiagram
participant OS as OS/Shell
participant CLI as CLI Process
participant SigHandler as Signal Handler
participant Children as Child Processes
participant Cleanup as Process Cleanup
OS->>CLI: User sends SIGINT/SIGTERM
CLI->>SigHandler: Receives signal
SigHandler->>SigHandler: Get tracked children from Set
loop For each tracked child
SigHandler->>Children: Forward SIGINT/SIGTERM
Children->>Children: Graceful shutdown
Children->>Cleanup: Emit close/error event
Cleanup->>SigHandler: Auto-unregister from tracked Set
end
SigHandler->>CLI: Exit with code (130 for SIGINT, 143 for SIGTERM)
CLI->>OS: Process terminates
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@cli/lib/tailscale.ts`:
- Around line 37-56: The code interpolates apiKey directly into a shell command
passed to execSync (e.g., in the device-list loop) which risks command
injection; change the calls that build the curl command (the execSync usage that
includes the Authorization header) to avoid shell interpolation by either using
child_process.execFileSync/ spawnSync with the curl binary and an args array or
by passing the key via an environment variable (e.g., set options.env = {
...process.env, TS_API_KEY: apiKey } and use the header "Authorization: Bearer
$TS_API_KEY"), and apply the same change to the deleteTailscaleDevice call so
neither function ever injects apiKey directly into a shell string.
In `@src/components/cloud-init.ts`:
- Around line 328-339: The validation uses the normalized variable but the code
later writes using the original filePath, which leaves Windows backslashes in
filenames; update the write logic in the workspaceFiles loop to use the
normalized path (the normalized variable) as the target for directory creation
and file writes—convert back to platform-specific paths as needed (e.g., using
path.join or path.resolve with workspace root) and ensure all checks
(startsWith, includes(".."), null byte) are performed on normalized before
creating directories or calling the file write functions so files are created in
the proper nested directories rather than as backslash-containing filenames.
🧹 Nitpick comments (2)
cli/lib/tailscale.ts (1)
16-17: Minor: Naming could be clearer for retry semantics.
MAX_RETRIES = 3with loopattempt <= MAX_RETRIESyields 4 total attempts. The warning message correctly says "4 attempts", but the constant name suggests 3 retries (implying 1 initial + 3 retries = 4 total). Consider renaming toMAX_ATTEMPTS = 4orRETRY_COUNT = 3with loopattempt < RETRY_COUNT + 1for clarity.This is a minor readability nit — the behavior is correct.
Also applies to: 37-37
cli/tools/destroy.ts (1)
50-73: Consider consolidating withcli/lib/tailscale.ts.This file has its own
listTailscaleDevicesanddeleteTailscaleDeviceimplementations that don't have the retry logic added incli/lib/tailscale.ts. The destroy flow would benefit from the same exponential backoff resilience.This can be deferred, but worth tracking to avoid divergent implementations.
Also applies to: 78-90
| for (let attempt = 0; attempt <= MAX_RETRIES; attempt++) { | ||
| try { | ||
| const result = execSync( | ||
| `curl -sf -H "Authorization: Bearer ${apiKey}" "https://api.tailscale.com/api/v2/tailnet/${tailnet}/devices?fields=default"`, | ||
| { encoding: "utf-8", stdio: ["pipe", "pipe", "pipe"], timeout: 15000 } | ||
| ); | ||
| const data = JSON.parse(result); | ||
| if (!data.devices || !Array.isArray(data.devices)) return null; | ||
| return data.devices.map((d: Record<string, unknown>) => ({ | ||
| id: d.id as string, | ||
| name: (d.name as string) ?? "", | ||
| hostname: (d.hostname as string) ?? "", | ||
| })); | ||
| } catch (err) { | ||
| lastError = err; | ||
| if (attempt < MAX_RETRIES) { | ||
| sleepSync(BASE_DELAY_MS * Math.pow(2, attempt)); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Potential command injection via API key.
The apiKey is interpolated directly into a shell command string passed to execSync. If the API key contains shell metacharacters (unlikely for legitimate Tailscale keys, but possible with malicious input), this could lead to command injection.
Consider using an environment variable or writing to a temp file to pass the authorization header safely.
🔒 Proposed fix using environment variable
export function listTailscaleDevices(
apiKey: string,
tailnet: string
): TailscaleDevice[] | null {
let lastError: unknown;
for (let attempt = 0; attempt <= MAX_RETRIES; attempt++) {
try {
const result = execSync(
- `curl -sf -H "Authorization: Bearer ${apiKey}" "https://api.tailscale.com/api/v2/tailnet/${tailnet}/devices?fields=default"`,
- { encoding: "utf-8", stdio: ["pipe", "pipe", "pipe"], timeout: 15000 }
+ `curl -sf -H "Authorization: Bearer $TAILSCALE_API_KEY" "https://api.tailscale.com/api/v2/tailnet/${encodeURIComponent(tailnet)}/devices?fields=default"`,
+ { encoding: "utf-8", stdio: ["pipe", "pipe", "pipe"], timeout: 15000, env: { ...process.env, TAILSCALE_API_KEY: apiKey } }
);Apply similar fix to deleteTailscaleDevice.
Also applies to: 77-90
🤖 Prompt for AI Agents
In `@cli/lib/tailscale.ts` around lines 37 - 56, The code interpolates apiKey
directly into a shell command passed to execSync (e.g., in the device-list loop)
which risks command injection; change the calls that build the curl command (the
execSync usage that includes the Authorization header) to avoid shell
interpolation by either using child_process.execFileSync/ spawnSync with the
curl binary and an args array or by passing the key via an environment variable
(e.g., set options.env = { ...process.env, TS_API_KEY: apiKey } and use the
header "Authorization: Bearer $TS_API_KEY"), and apply the same change to the
deleteTailscaleDevice call so neither function ever injects apiKey directly into
a shell string.
| for (const [filePath, content] of Object.entries(workspaceFiles)) { | ||
| // Validate path to prevent directory traversal | ||
| const normalized = filePath.replace(/\\/g, "/"); | ||
| if ( | ||
| normalized.includes("..") || | ||
| normalized.startsWith("/") || | ||
| normalized.includes("\0") | ||
| ) { | ||
| throw new Error( | ||
| `Invalid workspace file path: "${filePath}". Paths must be relative and cannot contain "..".` | ||
| ); | ||
| } |
There was a problem hiding this comment.
Use the normalized path for the actual write target.
Right now validation uses normalized, but the write path uses the original filePath, so Windows-style paths become filenames with backslashes instead of directories.
🛠️ Proposed fix
- const fullPath = `/home/ubuntu/.openclaw/workspace/${filePath}`;
+ const safePath = normalized;
+ const fullPath = `/home/ubuntu/.openclaw/workspace/${safePath}`;🤖 Prompt for AI Agents
In `@src/components/cloud-init.ts` around lines 328 - 339, The validation uses the
normalized variable but the code later writes using the original filePath, which
leaves Windows backslashes in filenames; update the write logic in the
workspaceFiles loop to use the normalized path (the normalized variable) as the
target for directory creation and file writes—convert back to platform-specific
paths as needed (e.g., using path.join or path.resolve with workspace root) and
ensure all checks (startsWith, includes(".."), null byte) are performed on
normalized before creating directories or calling the file write functions so
files are created in the proper nested directories rather than as
backslash-containing filenames.
The main stack now fully supports HetznerOpenClawAgent, so the provider selection and label should remain active instead of filtering to AWS-only. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Production hardening fixes identified during a full audit of the repo. Addresses 7 issues across security, resilience, and UX:
..and absolute paths0.0.0.0/0; SSH requires explicitallowedSshCidrs(matching the Hetzner component's existing approach)curlto the metadata endpointprocess.tsmodule tracks child processes; SIGINT/SIGTERM are forwarded before exit sopulumi up/destroyaren't orphanedagent-army initTest plan
pnpm run buildpasses for both root and CLI packages (verified locally)agent-army initskips Hetzner provider selection, shows "AWS (Hetzner coming soon)"pulumi previewshowsmetadataOptionson EC2 instances and empty SSH ingress on security groupsagent-army deploywith Ctrl+C — childpulumiprocess should be killed cleanlyagent-army destroylogs warnings on Tailscale API failures instead of silently continuing🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
allowedSshCidrsparameter.Improvements