L/small changes 12 #386
Merged
Preview deployment for your docs.
…/small-changes-12
Cursor Bugbot has reviewed your changes and found 2 potential issues.

Note
Medium Risk
Moderate risk: changes core evaluation helpers to be async (including subprocess execution) and extends CLI task discovery/import behavior, which could affect existing integrations and task loading in real projects.
Overview
Adds a significantly expanded `hud.native` evaluation toolkit: graders are now fully async, `BashGrader` uses async subprocesses with `timeout_seconds`, and a new `Grade.gather()` runs multiple graders/subscores in parallel. This also introduces `LLMJudgeGrader` (rubric-based LLM judging) plus built-in answer comparison/normalization helpers, and exports them via `hud.native`.

Improves CLI task collection to better match real project layouts: supports importing a directory as a package (running its own discovery), recursively finds `**/task.py`, and adds project-root `sys.path` injection to fix cross-module imports; includes extensive new tests for these edge cases.

Updates agent console output to show a tool discovery table at init and a structured per-step tool call/result summary at INFO, and refreshes docs for the new graders API plus minor doc link/typo fixes.
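The summary does not show the `hud.native` code, so the following is only a minimal standard-library sketch of the pattern it describes: async graders that shell out with a timeout, awaited concurrently. The names `SubScore`, `bash_grade`, and `gather_grades` are hypothetical and are not the real `BashGrader` or `Grade.gather()` API. For portability the demo commands invoke the current Python interpreter rather than bash.

```python
import asyncio
import sys
from dataclasses import dataclass


@dataclass
class SubScore:
    # Hypothetical result type; not the real hud.native Grade class.
    name: str
    value: float


async def bash_grade(name: str, argv: list[str], timeout_seconds: float = 10.0) -> SubScore:
    """Run a command asynchronously: exit code 0 scores 1.0, anything else 0.0."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        await asyncio.wait_for(proc.communicate(), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        # On timeout, kill the child and reap it so no zombie process remains.
        proc.kill()
        await proc.wait()
        return SubScore(name, 0.0)
    return SubScore(name, 1.0 if proc.returncode == 0 else 0.0)


async def gather_grades(*graders) -> list[SubScore]:
    """Await all grader coroutines concurrently; results keep the input order."""
    return list(await asyncio.gather(*graders))


async def main() -> list[SubScore]:
    py = sys.executable
    return await gather_grades(
        bash_grade("passes", [py, "-c", "import sys; sys.exit(0)"]),
        bash_grade("fails", [py, "-c", "import sys; sys.exit(1)"]),
        bash_grade("too_slow", [py, "-c", "import time; time.sleep(5)"], timeout_seconds=0.5),
    )
```

Calling `asyncio.run(main())` returns one score per grader. Because the three subprocesses run concurrently, total wall time is bounded by the slowest grader (here the 0.5 s timeout), not by the sum of all of them. That is the main benefit of making graders async.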
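The CLI changes are described only at a high level. This sketch shows one way to do two of them with `pathlib` and `importlib`: a recursive `**/task.py` search, and project-root `sys.path` injection so that task files can import sibling modules. It does not cover the directory-as-package mode. Treating the nearest `pyproject.toml` as the project-root marker is an assumption, and `find_project_root`, `discover_task_files`, and `load_task_modules` are hypothetical names, not the hud CLI's functions.

```python
import importlib.util
import sys
from pathlib import Path
from types import ModuleType


def find_project_root(start: Path) -> Path:
    """Walk upward to the first directory containing pyproject.toml (assumed marker)."""
    start = start.resolve()
    for candidate in [start, *start.parents]:
        if (candidate / "pyproject.toml").is_file():
            return candidate
    return start


def discover_task_files(root: Path) -> list[Path]:
    """Recursively collect every task.py under root, in a stable order."""
    return sorted(p for p in root.glob("**/task.py") if p.is_file())


def load_task_modules(start: Path) -> list[ModuleType]:
    root = find_project_root(start)
    # Put the project root on sys.path so cross-module imports such as
    # `from helpers import X` inside a task file resolve.
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    modules = []
    for path in discover_task_files(start):
        # Build a dotted module name from the path relative to the root so that
        # task.py files in different folders do not collide in sys.modules.
        rel = path.resolve().relative_to(root).with_suffix("")
        name = ".".join(rel.parts)
        spec = importlib.util.spec_from_file_location(name, path)
        module = importlib.util.module_from_spec(spec)
        sys.modules[name] = module
        spec.loader.exec_module(module)
        modules.append(module)
    return modules
```

Deriving module names from relative paths is one design choice among several. Importing each file under a bare name such as `task` would make later files overwrite earlier ones in `sys.modules`.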
Written by Cursor Bugbot for commit 2d5b407.