test: add unit tests for stdlib/requirements (#814) #820
Conversation
The PR description has been updated. Please fill out the template for your PR to be reviewed.
psschwei left a comment
Solid test PR — the marker fix alone is worth merging, and the 36 new unit tests are well-structured with good edge case coverage. Two minor suggestions below, neither blocking.
psschwei left a comment
Suggestion (non-blocking): There's a gap in tool_arg_validator coverage when tool_name=None and none of the tool calls contain the target arg_name. The production code silently returns True in that case (the for loop finishes without failing), which is arguably a latent bug. A test documenting this would be useful:
```python
def test_tool_arg_validator_no_tool_name_arg_missing_everywhere():
    ctx = _ctx_with_tool_calls({"tool_a": _make_tool_call("tool_a", {"y": 5})})
    req = tool_arg_validator("x must be positive", tool_name=None, arg_name="x", validation_fn=lambda v: v > 0)
    result = req.validation_fn(ctx)
    assert result.as_bool() is True  # documents current (possibly surprising) behavior
```
Could be a follow-up issue too if you'd rather not expand scope here.
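A minimal sketch of the fix this comment hints at, in generic Python — `validate_tool_args`, the `Call` shape, and its field names are hypothetical stand-ins, not mellea's actual `tool_arg_validator` implementation. The idea: track whether the target argument was seen at all, and fail when it never appears.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for a tool call; not mellea's real data structure.
@dataclass
class Call:
    args: dict = field(default_factory=dict)

def validate_tool_args(tool_calls, arg_name, validation_fn):
    """Sketch of an arg validator that fails when arg_name never appears."""
    seen = False
    for call in tool_calls.values():
        if arg_name in call.args:
            seen = True
            if not validation_fn(call.args[arg_name]):
                return False
    # Without this final check the loop would fall through and return True
    # even though no tool call carried arg_name -- the latent bug noted above.
    return seen

# The scenario from the suggested test: arg "x" missing everywhere -> now fails.
print(validate_tool_args({"tool_a": Call({"y": 5})}, "x", lambda v: v > 0))  # False
```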
While you're here: there's a typo "Valiudation" in tool_reqs.py (appears twice in error reason strings around lines 109 and 119). Not introduced by this PR, but easy to fix if you're touching this area.
a1f1ad7
Type of PR

Misc

Description
36 unit tests for `stdlib/requirements/` — pure validation logic, no model or backend needed. Runs in ~6s.

Why these tests exist

`mellea/stdlib/requirements/` was at 42% line coverage from the full HPC test suite (all backends, 1034 tests). However, much of that came from e2e tests gated behind `ollama` markers that never run in CI, so unit-only coverage for the three target files was far lower. These are all deterministic functions — parsing, validation, type coercion — where unit tests provide high regression value. As with the granite formatter tests (#818), these won't necessarily catch new issues, but they help prevent us from unintentionally breaking existing behavior.
Coverage change (unit/integration tests only)
`tool_reqs.py`, `requirement.py`, `md.py`

Excluding `guardian.py` (deprecated, 0%, out of scope), the active requirements module is now at 96% from unit tests alone. For reference, the full HPC run (including e2e) had the whole module at 42%.

What's covered
- `test_reqlib_tools.py` — `_name2str` edge case, `uses_tool` (present/absent/no calls/callable/check_only), `tool_arg_validator` (valid/failed/missing tool/missing arg/no calls/all-tools mode)
- `test_requirement.py` — `requirement_check_to_bool` (threshold logic, missing key, invalid JSON), `reqify`/`req`/`check` shorthands, `simple_validate` with None output, `LLMaJRequirement`, `ALoraRequirement` init
- `test_reqlib_markdown.py` — `as_markdown_list` edge cases (paragraph, mixed content, empty, single item), `_md_list`/`_md_table` validation wrappers, table edge cases

Marker fix
`test_requirement.py` had a module-level `pytestmark = [pytest.mark.ollama, pytest.mark.e2e]` that incorrectly gated the existing `simple_validate` unit tests behind ollama. Moved to per-function decorators on the two async tests that actually need them.

Design decisions
- `ChatContext` + `ModelOutputThunk` objects with canned data — no mocking of the context pipeline
- `ModelToolCall.func` uses `unittest.mock.Mock` (only the func field, not the data structures)
- `ALoraRequirement` tests patch `Intrinsic.__init__` to avoid hitting the adapter catalogue

Testing
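The `Intrinsic.__init__` patching mentioned among the design decisions follows a standard `unittest.mock.patch.object` pattern. A self-contained sketch under assumed names — `AdapterBacked` and `Requirement` are stand-ins, not mellea's actual classes:

```python
from unittest.mock import patch

class AdapterBacked:
    """Stand-in for a base class whose __init__ does expensive external work."""
    def __init__(self):
        raise RuntimeError("pretend this hits an external adapter catalogue")

class Requirement(AdapterBacked):
    def __init__(self, name: str):
        super().__init__()  # normally expensive; patched away in tests
        self.name = name

# Patch the base __init__ to a no-op so construction stays local and fast.
with patch.object(AdapterBacked, "__init__", lambda self: None):
    req = Requirement("alora-req")

print(req.name)  # alora-req
```

The patch is scoped to the `with` block, so the real constructor is restored automatically afterwards — the same hygiene the tests get from decorator- or context-manager-based patching.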