Problem
There's currently no way to count how many times a CRS calls the builder sidecar APIs (run-pov, apply-patch-build, run-test) during a trial, or to measure how often those calls succeed or fail. This data is needed to evaluate each CRS's behavior: how many build iterations, POV verifications, and test runs it performs, how often they succeed, and how the CRS iterates through build-test-verify cycles.
Currently the sidecar processes requests and returns results, but doesn't persist any record of API calls. The only trace is Docker service stdout, which is lost if the container is SIGKILL'd.
Proposal
Register a log directory via libCRS register-log-dir at sidecar startup and write a structured JSONL log of every API call with its result. This ensures the log is persisted to the host-mounted LOG_DIR and survives container termination.
Each log entry should record: timestamp, API name, key parameters (harness, build_id), exit code, success/failure, and duration.
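A minimal sketch of what such a logger could look like, assuming the sidecar is written in Python and that log_dir is the directory already registered via libCRS register-log-dir (the registration call itself is not shown here). The class name ApiCallLogger, the file name builder_api_calls.jsonl, the JSON field names, and the summarize helper are all hypothetical and only illustrate the fields listed above.

```python
import json
import os
import time
from contextlib import contextmanager
from datetime import datetime, timezone


class ApiCallLogger:
    """Appends one JSON object per sidecar API call to a JSONL file.

    Hypothetical helper: names and record layout are assumptions.
    """

    def __init__(self, log_dir, filename="builder_api_calls.jsonl"):
        # log_dir is assumed to be the host-mounted, registered log directory.
        os.makedirs(log_dir, exist_ok=True)
        self.path = os.path.join(log_dir, filename)

    @contextmanager
    def record(self, api, **params):
        """Times one API call; the handler sets entry["exit_code"]."""
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "api": api,
            "params": params,  # e.g. harness, build_id
            "exit_code": None,
            "success": False,
        }
        start = time.monotonic()
        try:
            yield entry
        finally:
            # Written even if the handler raises; exit_code then stays None.
            entry["duration_s"] = round(time.monotonic() - start, 3)
            entry["success"] = entry["exit_code"] == 0
            with open(self.path, "a", encoding="utf-8") as f:
                f.write(json.dumps(entry) + "\n")
                # Push the line out of the process before returning, so a
                # later SIGKILL of the container does not lose it.
                f.flush()
                os.fsync(f.fileno())


def summarize(path):
    """Returns {api: {"calls": n, "successes": m}} from a JSONL log."""
    stats = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            rec = json.loads(line)
            s = stats.setdefault(rec["api"], {"calls": 0, "successes": 0})
            s["calls"] += 1
            s["successes"] += int(rec["success"])
    return stats
```

In this sketch a handler would wrap its work in `with logger.record("run-pov", harness=..., build_id=...) as entry:` and assign `entry["exit_code"]` before the block exits. Appending and flushing one line per call, rather than buffering, means each completed call is on disk before the response is returned, and summarize can compute per-API call counts and success counts after the trial.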
Motivation
Evaluating AI-agent CRSes (claude-code, codex, copilot-cli, gemini-cli) requires understanding their interaction patterns with the builder sidecar: how efficiently they iterate, their build success rate, how many POV attempts they make before finding a crash, and so on. This applies to both bug-finding and bug-fixing modes.