Context
CompletionCheck is referenced in two loop strategies in the Harness (#3):
LoopStrategy::Ralph { completion_check: Box<dyn CompletionCheck> } — the external check that determines when a Ralph continuation loop is actually done
LoopStrategy::SelfVerifying { verifier, evaluator_harness } — uses the termination policy's CompletionCheck to determine when the build phase should stop and hand off to the evaluator
CompletionCheck is currently a stub in the harness module, tagged // SPEC: full trait lives in this issue. Until the trait is implemented, the Ralph and SelfVerifying strategies both return HaltReason::StrategyNotYetImplemented.
Trait Definition
// Assumption: the async-trait crate is used here, because a plain async fn
// in a trait cannot be called through Box<dyn CompletionCheck>.
use async_trait::async_trait;

#[async_trait]
trait CompletionCheck {
    // Returns None if the task is complete, Some(reason) if it is not.
    // The reason is injected into the next turn's context to tell the agent
    // what it still needs to do. This is what prevents premature victory.
    async fn check(&self, state: &SessionStateSnapshot) -> Option<String>;

    // Human-readable description of what this check evaluates.
    // Injected into agent context at session start so it understands
    // what "done" means for this task.
    fn description(&self) -> String;
}
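Because LoopStrategy::Ralph stores the check as Box<dyn CompletionCheck>, the trait has to be usable through a trait object. The sketch below shows one way to get there with only the standard library, by returning a boxed future from check. It is an illustration, not the harness's actual definition: the SessionStateSnapshot fields, the CheckFuture alias, and the poll_once helper are all hypothetical names introduced for the example.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical stand-in for the harness's real snapshot type.
pub struct SessionStateSnapshot {
    pub last_response: String,
}

// Boxed future alias (hypothetical) so `check` works through `dyn CompletionCheck`.
pub type CheckFuture<'a> = Pin<Box<dyn Future<Output = Option<String>> + Send + 'a>>;

pub trait CompletionCheck: Send + Sync {
    // None means complete; Some(reason) is fed into the next turn.
    fn check<'a>(&'a self, state: &'a SessionStateSnapshot) -> CheckFuture<'a>;
    fn description(&self) -> String;
}

// The simplest standard implementation: trust the model's own claim.
pub struct AlwaysComplete;

impl CompletionCheck for AlwaysComplete {
    fn check<'a>(&'a self, _state: &'a SessionStateSnapshot) -> CheckFuture<'a> {
        Box::pin(async { None::<String> })
    }

    fn description(&self) -> String {
        "The task is done as soon as the agent says it is.".to_string()
    }
}

// Waker that does nothing; enough to poll a future that is ready at once.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Polls the future a single time. Returns Some(output) if it was ready,
// None if it was still pending. Intended for demos and tests only.
pub fn poll_once<T>(mut fut: Pin<Box<dyn Future<Output = T> + Send + '_>>) -> Option<T> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(value) => Some(value),
        Poll::Pending => None,
    }
}
```

In a real harness the future would be awaited on whatever async runtime the project uses; poll_once only exists so the sketch can be exercised without one.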
Standard Implementations
FeatureListCheck
Reads feature_list.json from the workspace. Returns Some with the list of incomplete features if any have passes: false. Returns None when all features pass.
struct FeatureListCheck {
    path: PathBuf, // default: "feature_list.json"
}
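The decision logic of FeatureListCheck can be sketched independently of file I/O. The example below assumes the JSON has already been parsed into a list of entries. The only field the spec names is passes, so the name field, the Feature struct, and the feature_list_reason function are hypothetical.

```rust
// Assumed shape of one entry in feature_list.json. Only `passes` comes
// from the spec; `name` is a hypothetical field used to build the reason.
#[derive(Debug, Clone)]
pub struct Feature {
    pub name: String,
    pub passes: bool,
}

// Returns None when every feature passes, otherwise a reason that lists
// the features that are still failing.
pub fn feature_list_reason(features: &[Feature]) -> Option<String> {
    let incomplete: Vec<&str> = features
        .iter()
        .filter(|f| !f.passes)
        .map(|f| f.name.as_str())
        .collect();

    if incomplete.is_empty() {
        None
    } else {
        Some(format!(
            "{} of {} features still failing: {}",
            incomplete.len(),
            features.len(),
            incomplete.join(", ")
        ))
    }
}
```

Note that this sketch treats an empty feature list as complete. Whether a missing or empty file should instead count as "not done" is a design decision the spec leaves open.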
TestSuiteCheck
Runs the test suite and returns Some(failure_summary) if any tests fail. Returns None when the full suite passes.
struct TestSuiteCheck {
    command: String, // e.g. "npm test", "cargo test", "pytest"
    working_dir: PathBuf,
    timeout: Duration,
    sandbox: Arc<dyn SandboxProvider>,
}
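A rough sketch of how the command, working_dir, and timeout fields might be used follows. It makes several simplifying assumptions: the command is run directly through sh -c rather than through the SandboxProvider, output is discarded instead of being collected into a failure summary, and the timeout is enforced by polling. The function name run_test_suite is hypothetical.

```rust
use std::path::Path;
use std::process::{Command, Stdio};
use std::thread;
use std::time::{Duration, Instant};

// Runs `command` via `sh -c` in `working_dir`. Returns None when the command
// exits successfully, otherwise Some(reason).
// Assumption: no sandboxing here; the real check would go through its
// SandboxProvider and would capture test output for the summary.
pub fn run_test_suite(command: &str, working_dir: &Path, timeout: Duration) -> Option<String> {
    let spawned = Command::new("sh")
        .arg("-c")
        .arg(command)
        .current_dir(working_dir)
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .spawn();

    let mut child = match spawned {
        Ok(child) => child,
        Err(err) => return Some(format!("could not start `{command}`: {err}")),
    };

    let deadline = Instant::now() + timeout;
    loop {
        match child.try_wait() {
            Ok(Some(status)) if status.success() => return None,
            Ok(Some(status)) => return Some(format!("`{command}` failed with {status}")),
            Ok(None) if Instant::now() >= deadline => {
                // Timed out: stop the process and reap it before reporting.
                let _ = child.kill();
                let _ = child.wait();
                return Some(format!("`{command}` timed out after {timeout:?}"));
            }
            Ok(None) => thread::sleep(Duration::from_millis(20)),
            Err(err) => return Some(format!("could not wait for `{command}`: {err}")),
        }
    }
}
```

Treating a timeout as "not done" rather than as a hard error keeps the return type aligned with the trait: the agent is told the suite did not finish and can act on that in the next turn.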
QuestionAnsweredCheck
LLM-as-judge: evaluates whether the agent's final response actually answered the user's question. Used for RAG and conversational agents.
struct QuestionAnsweredCheck {
    judge_model: ModelConfig,
    original_question: String,
    rubric: Option<String>, // custom evaluation criteria
}
SqlResultCheck
Validates that the SQL result set is non-empty and structurally correct (column names match expectation). Used for NL-to-SQL agents.
struct SqlResultCheck {
    expected_columns: Option<Vec<String>>,
    min_rows: Option<usize>,
}
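The validation rule described above can be sketched as a pure function over a simplified result. The SqlResult type and the evaluate method are hypothetical. The sketch also assumes that an unset min_rows means "at least one row" and that column names must match exactly and in order. Neither rule is stated in the spec.

```rust
// Hypothetical, simplified view of a query result.
pub struct SqlResult {
    pub columns: Vec<String>,
    pub row_count: usize,
}

pub struct SqlResultCheck {
    pub expected_columns: Option<Vec<String>>,
    pub min_rows: Option<usize>,
}

impl SqlResultCheck {
    // Returns None when the result satisfies both constraints,
    // Some(reason) otherwise.
    // Assumptions: min_rows defaults to 1 ("non-empty"), and columns are
    // compared exactly, including order.
    pub fn evaluate(&self, result: &SqlResult) -> Option<String> {
        let min_rows = self.min_rows.unwrap_or(1);
        if result.row_count < min_rows {
            return Some(format!(
                "query returned {} rows, expected at least {}",
                result.row_count, min_rows
            ));
        }

        if let Some(expected) = &self.expected_columns {
            if expected != &result.columns {
                return Some(format!(
                    "columns were [{}], expected [{}]",
                    result.columns.join(", "),
                    expected.join(", ")
                ));
            }
        }

        None
    }
}
```

The row-count test runs first so that an empty result is reported as such even when the columns are also wrong.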
AlwaysComplete
Returns None immediately — task is always considered done when the model claims it is. Used for simple single-turn tasks where the model's self-assessment is sufficient.
Relationship to TerminationPolicy
CompletionCheck is injected into TerminationPolicy and called only when agent_claims_done: true. The TerminationPolicy evaluates budget limits first (unconditionally), then sensor results, then calls CompletionCheck. This is the mechanism that prevents premature victory — the agent claims done, the check says "not yet, here's what's missing", and the harness injects that reason into the next turn.
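The evaluation order described above can be sketched as a single function. This is an illustration only: Decision, TurnInput, and evaluate_turn are hypothetical names, the completion check is modeled as a synchronous closure for brevity, and the sketch assumes that a failing sensor halts the loop, which the spec does not state.

```rust
// Hypothetical outcome of evaluating one turn.
#[derive(Debug, PartialEq)]
pub enum Decision {
    // Stop the loop, with a reason.
    Halt(String),
    // Keep going; Some(text) is injected into the next turn's context.
    Continue(Option<String>),
    // The agent claimed done and the check agreed.
    Complete,
}

// Hypothetical per-turn inputs to the termination policy.
pub struct TurnInput<'a> {
    pub budget_exhausted: bool,
    pub sensor_failure: Option<&'a str>,
    pub agent_claims_done: bool,
}

pub fn evaluate_turn(
    input: &TurnInput<'_>,
    completion_check: impl FnOnce() -> Option<String>,
) -> Decision {
    // 1. Budget limits are evaluated first, unconditionally.
    if input.budget_exhausted {
        return Decision::Halt("budget exhausted".to_string());
    }

    // 2. Sensor results come next (assumption: a failure halts the loop).
    if let Some(failure) = input.sensor_failure {
        return Decision::Halt(format!("sensor failure: {failure}"));
    }

    // 3. The completion check runs only when the agent claims to be done.
    if !input.agent_claims_done {
        return Decision::Continue(None);
    }
    match completion_check() {
        None => Decision::Complete,
        Some(reason) => Decision::Continue(Some(reason)),
    }
}
```

Passing the check as a closure makes the "called only when agent_claims_done" property easy to test: a closure that panics must never run on the first three paths.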
Checklist
CompletionCheck trait + all standard implementations
None/Some(reason) for its domain
CompletionCheck in Ralph and TerminationPolicy
TerminationPolicy (Implement TerminationPolicy #13) updated to call CompletionCheck correctly
fixtures/completion_checks/feature_list_complete.jsonl
Related Issues