NO-JIRA: Add find-token skill and custom verification in eval framework #22

Merged
openshift-merge-bot[bot] merged 1 commit into openshift:main from harche:find-token-skill-and-eval-framework
May 28, 2026

@harche (Contributor) commented May 12, 2026

Summary

  • Adds find-token skill that generates random verification tokens, proving the agent can discover, execute, and return tool output
  • Adds custom verification via _fn in test_cases.yaml for validating against runtime data (alongside existing static matching)
  • evals/skills/find-token/ serves as reference implementation for both patterns
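
To illustrate the custom-verification idea, here is a minimal sketch of a check against runtime data. It assumes the verify function receives the agent's parsed output together with the raw JSON the tool printed; the name `verify_tokens`, its signature, and the `TOKEN_` prefix are hypothetical and are not taken from the framework in this PR.

```python
import json
from typing import Any


def collect_strings(node: Any) -> list[str]:
    """Recursively gather every string leaf in a parsed JSON value."""
    if isinstance(node, str):
        return [node]
    if isinstance(node, dict):
        return [s for value in node.values() for s in collect_strings(value)]
    if isinstance(node, list):
        return [s for value in node for s in collect_strings(value)]
    return []


def verify_tokens(agent_output: dict[str, Any], tool_stdout: str,
                  prefix: str = "TOKEN_") -> bool:
    """Hypothetical verify function.

    Passes only if the tool emitted at least one token at runtime and every
    emitted token also appears somewhere in the agent's structured output.
    The prefix used to recognise a token is an assumption.
    """
    emitted = {s for s in collect_strings(json.loads(tool_stdout))
               if s.startswith(prefix)}
    returned = set(collect_strings(agent_output))
    return bool(emitted) and emitted <= returned
```

Because the tokens are random per run, a static expected value cannot match them; comparing the agent's answer with what the tool actually printed is what shows the tool was executed.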

Test plan

  • bash evals/run.sh -k "find-token" — both test cases pass
  • Existing doc skill evals unaffected

🤖 Generated with Claude Code

@openshift-ci openshift-ci Bot requested a review from mrunalp May 12, 2026 17:51
@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 12, 2026
Comment thread evals/skills/find-token/verify.py Outdated

# Load the schema from test_cases.yaml so it's defined in one place.
_CASES = yaml.safe_load((Path(__file__).parent / "test_cases.yaml").read_text())
SCHEMA: dict[str, Any] = _CASES[0]["schema"]
Contributor Author

declared the schema, but forgot to use it below.

Contributor Author

Removed in 40f89d1 — the framework already validates via jsonschema.validate() in test_eval.py before verify functions run, so it was redundant.
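
The ordering described here can be sketched as follows. This is only an illustration: `check_required` is a simplified stand-in for `jsonschema.validate()` so the example needs no third-party package, and `run_case` and its parameters are hypothetical names, not the code in test_eval.py.

```python
from typing import Any, Callable, Optional


def check_required(instance: dict[str, Any], schema: dict[str, Any]) -> None:
    """Stand-in for jsonschema.validate(): checks only the 'required' keys."""
    missing = [key for key in schema.get("required", []) if key not in instance]
    if missing:
        raise ValueError(f"missing required keys: {missing}")


def run_case(output: dict[str, Any], schema: dict[str, Any],
             verify_fn: Optional[Callable[[dict[str, Any]], bool]] = None) -> bool:
    """Validate the output against the schema, then run the custom check."""
    # Step 1: schema validation happens first and raises on invalid output.
    check_required(output, schema)
    # Step 2: the verify function only ever sees schema-valid output,
    # so it does not need to load or re-check the schema itself.
    if verify_fn is None:
        return True
    return verify_fn(output)
```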

Comment thread evals/README.md

## Adding a New Skill

Two things are needed: eval definitions (what to test) and a workspace symlink (so the agent can access the skill inside the container).
Member

Do we no longer need the symlinks? If we do still need them, can we preserve this advice to create them when onboarding a new skill somewhere in this evals/README.md?

Contributor Author

Yes, symlinks are still needed. Restored the instructions in f33d279 — each eval skill needs a symlink under evals/workspace/skills/ pointing to the actual skill directory. run.sh dereferences them when building the container workspace.

Contributor Author

Symlinks are still needed. The instructions are preserved in the current version under "Adding a New Skill Eval" — each eval skill needs a symlink under evals/workspace/skills/ pointing to the actual skill directory.
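
As a rough sketch of the onboarding step discussed in this thread, the symlink could be created like this. The helper `link_skill` is hypothetical, and it assumes the actual skill directory sits at the repository root (as find-token/ does); the README's own instructions take precedence.

```python
import os
from pathlib import Path


def link_skill(repo_root: Path, skill_name: str) -> Path:
    """Create evals/workspace/skills/<skill_name> as a symlink to the skill."""
    target = repo_root / skill_name  # assumed location of the real skill directory
    link = repo_root / "evals" / "workspace" / "skills" / skill_name
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink():
        link.unlink()  # replace a stale link instead of failing
    # Use a relative target so the link stays valid wherever the repo is cloned.
    relative_target = os.path.relpath(target, start=link.parent)
    link.symlink_to(relative_target, target_is_directory=True)
    return link
```

Calling `link_skill(Path("."), "find-token")` from the repository root would produce evals/workspace/skills/find-token pointing at ../../../find-token.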

@harche (Contributor Author) commented May 19, 2026

/override ci/prow/eval

@openshift-ci Bot commented May 19, 2026

@harche: Overrode contexts on behalf of harche: ci/prow/eval, ci/prow/images

@harche (Contributor Author) commented May 19, 2026

/retest ci/prow/images

@harche (Contributor Author) commented May 19, 2026

/test images

@harche (Contributor Author) commented May 19, 2026

Tested locally with Claude Opus 4.6 — both find-token test cases pass:

evals/skills/test_eval.py::test_skill[claude-find-token-find_token_tool_execution] PASSED
evals/skills/test_eval.py::test_skill[claude-find-token-find_token_static_fields] PASSED

bash evals/run.sh -k "find-token" -v — 2 passed, 10 skipped (other providers not configured), 37.94s.

@harche (Contributor Author) commented May 19, 2026

/jira no-jira
/verified by #22 (comment)

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label May 19, 2026
@openshift-ci-robot

@harche: This PR has been marked as verified by https://github.com/openshift/agentic-skills/pull/22#issuecomment-4488982035.

@harche harche changed the title Add find-token skill and custom verification in eval framework NO-JIRA: Add find-token skill and custom verification in eval framework May 19, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 19, 2026
@openshift-ci-robot

@harche: This pull request explicitly references no jira issue.

@mrunalp (Member) commented May 19, 2026

@Cali0707 can you take a look?

Comment thread find-token/SKILL.md Outdated
---
name: find-token
description: Find the hidden verification token. Run the find-token script to retrieve a unique token.
allowed-tools: Bash(bash:*)
Contributor

Suggested change
- allowed-tools: Bash(bash:*)
+ allowed-tools: Bash

This was picked up by the skill scanner. IMO, since we seem to allow executing any bash command anyway, we should just simplify this to Bash.

Contributor Author

Done — simplified to allowed-tools: Bash in the squashed commit.

Comment thread find-token/SKILL.md Outdated

The script returns JSON with a unique token:
```json
{"token": "TOKEN_..."}
```
Contributor

I don't think this is the output format? Maybe we can just not mention the format in the skill? Or document the real format (not sure that is necessary, Claude seems good at understanding a JSON blob).

Contributor Author

Good catch. Removed the incorrect {"token": "TOKEN_..."} example. The script outputs a complex structured JSON blob, so rather than documenting the full format, the description now just says: "The script returns JSON with verification tokens embedded in a structured analysis response."

- Add find-token skill with SKILL.md and token generation script
- Add custom verification support (_fn) in eval framework
- Add find-token eval with both static matching and tool execution tests
- Simplify skill loading to use symlinks only
- Address review feedback: simplify allowed-tools, fix output docs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@harche harche force-pushed the find-token-skill-and-eval-framework branch from 0d963c8 to 0f732bb Compare May 28, 2026 19:53
@openshift-ci-robot openshift-ci-robot removed the verified Signifies that the PR passed pre-merge verification criteria label May 28, 2026
@Cali0707 (Contributor) left a comment

/lgtm
/approve

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 28, 2026
@openshift-ci Bot commented May 28, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Cali0707, harche

@Cali0707 (Contributor)

/verified by #22 (comment)

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label May 28, 2026
@openshift-ci-robot

@Cali0707: This PR has been marked as verified by https://github.com/openshift/agentic-skills/pull/22#issuecomment-4488982035.

@Cali0707 (Contributor)

/override ci/prow/eval

This is a pre-existing failure that predates this PR (it relates to the Docs skill).

@openshift-ci Bot commented May 28, 2026

@Cali0707: Overrode contexts on behalf of Cali0707: ci/prow/eval

@harche (Contributor Author) commented May 28, 2026

/override ci/prow/eval

@openshift-ci Bot commented May 28, 2026

@harche: Overrode contexts on behalf of harche: ci/prow/eval

@openshift-ci Bot commented May 28, 2026

@harche: all tests passed!

@openshift-merge-bot openshift-merge-bot Bot merged commit b58c42d into openshift:main May 28, 2026
3 checks passed
@harche harche deleted the find-token-skill-and-eval-framework branch May 28, 2026 20:09
Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
lgtm: Indicates that a PR is ready to be merged.
verified: Signifies that the PR passed pre-merge verification criteria.

5 participants