NO-JIRA: Add find-token skill and custom verification in eval framework by harche · Pull Request #22 · openshift/agentic-skills

harche · 2026-05-12T17:51:30Z

Summary

- Adds a find-token skill that generates random verification tokens, proving the agent can discover and execute a tool and return its output
- Adds custom verification via _fn in test_cases.yaml for validating against runtime data (alongside the existing static matching)
- evals/skills/find-token/ serves as the reference implementation for both patterns
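
To make the custom-verification idea concrete, here is a minimal sketch of a verify function that compares the agent's answer against a token that only exists at runtime. The names `make_token` and `verify_token`, the `(response, runtime)` signature, and the `token` field are assumptions for illustration; this PR does not show the framework's actual interface or the real token format.

```python
import secrets
from typing import Any


def make_token() -> str:
    # Hypothetical token format; the real script's format is not shown in this PR.
    return "tok-" + secrets.token_hex(8)


def verify_token(response: dict[str, Any], runtime: dict[str, Any]) -> bool:
    """Return True only when the agent echoed the token produced at runtime.

    Hypothetical signature: `response` is the agent's parsed output and
    `runtime` is data captured while the tool ran.
    """
    expected = runtime.get("token")
    return bool(expected) and response.get("token") == expected


if __name__ == "__main__":
    token = make_token()
    # An agent that actually ran the tool can return the fresh token.
    print(verify_token({"token": token}, {"token": token}))
    # A hard-coded or guessed answer cannot match a random token.
    print(verify_token({"token": "guessed"}, {"token": token}))
```

Because the token is random per run, static matching cannot express this check, which is presumably why a function hook is needed alongside it.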

Test plan

- bash evals/run.sh -k "find-token" (both test cases pass)
- Existing doc skill evals are unaffected

🤖 Generated with Claude Code

harche · 2026-05-12T18:43:35Z

+
+# Load the schema from test_cases.yaml so it's defined in one place.
+_CASES = yaml.safe_load((Path(__file__).parent / "test_cases.yaml").read_text())
+SCHEMA: dict[str, Any] = _CASES[0]["schema"]


declared the schema, but forgot to use it below.

Removed in 40f89d1 — the framework already validates via jsonschema.validate() in test_eval.py before verify functions run, so it was redundant.
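
The ordering described in this reply (schema validation first, verify function second) can be sketched as below. This is not the code in test_eval.py: `check_required` is a deliberately tiny stand-in for `jsonschema.validate()` so the example stays dependency-free, and `run_case` is a hypothetical name.

```python
from typing import Any, Callable


def check_required(instance: dict[str, Any], schema: dict[str, Any]) -> None:
    """Stand-in for jsonschema.validate(): only checks the schema's required keys."""
    missing = [key for key in schema.get("required", []) if key not in instance]
    if missing:
        raise ValueError(f"missing required fields: {missing}")


def run_case(
    response: dict[str, Any],
    schema: dict[str, Any],
    verify: Callable[[dict[str, Any]], bool],
) -> bool:
    # Schema validation happens first, so verify() can assume a well-formed response
    # and does not need its own copy of the schema.
    check_required(response, schema)
    return verify(response)


if __name__ == "__main__":
    schema = {"type": "object", "required": ["token"]}
    print(run_case({"token": "abc"}, schema, lambda r: r["token"] == "abc"))
    try:
        run_case({}, schema, lambda r: True)
    except ValueError as err:
        print("rejected before verify ran:", err)
```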

wking · 2026-05-12T21:42:22Z

-
-## Adding a New Skill
-
-Two things are needed: eval definitions (what to test) and a workspace symlink (so the agent can access the skill inside the container).


Do we no longer need the symlinks? If we do still need them, can we preserve this advice to create them when onboarding a new skill somewhere in this evals/README.md?

Yes, symlinks are still needed. Restored the instructions in f33d279 — each eval skill needs a symlink under evals/workspace/skills/ pointing to the actual skill directory. run.sh dereferences them when building the container workspace.

Symlinks are still needed. The instructions are preserved in the current version under "Adding a New Skill Eval" — each eval skill needs a symlink under evals/workspace/skills/ pointing to the actual skill directory.
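
For readers unfamiliar with the symlink-then-dereference flow, here is a small Python sketch of the idea. It is not what run.sh does internally (that script is not shown here): the `skills/<name>` source location, the helper names, and the use of `shutil.copytree` are assumptions for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def link_skill(repo: Path, skill: str) -> Path:
    """Create evals/workspace/skills/<skill> as a symlink to the skill directory."""
    target = repo / "skills" / skill  # assumed location of the real skill
    link = repo / "evals" / "workspace" / "skills" / skill
    link.parent.mkdir(parents=True, exist_ok=True)
    link.symlink_to(target.resolve(), target_is_directory=True)
    return link


def build_workspace(repo: Path, dest: Path) -> None:
    """Copy the workspace, following symlinks so real files land in dest."""
    # symlinks=False makes copytree copy the linked contents, not the links.
    shutil.copytree(repo / "evals" / "workspace", dest, symlinks=False)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp) / "repo"
        (repo / "skills" / "find-token").mkdir(parents=True)
        (repo / "skills" / "find-token" / "SKILL.md").write_text("name: find-token\n")
        link = link_skill(repo, "find-token")
        build_workspace(repo, Path(tmp) / "container")
        copied = Path(tmp) / "container" / "skills" / "find-token"
        print(link.is_symlink(), copied.is_symlink(), (copied / "SKILL.md").exists())
```

Dereferencing matters because a symlink copied as-is would point at a path that does not exist inside the container.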

harche · 2026-05-19T14:46:03Z

/override ci/prow/eval

openshift-ci · 2026-05-19T14:46:26Z

@harche: Overrode contexts on behalf of harche: ci/prow/eval, ci/prow/images

Details

In response to this:

/override ci/prow/eval
/override ci/prow/images

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

harche · 2026-05-19T14:46:57Z

/retest ci/prow/images

harche · 2026-05-19T14:47:14Z

/test images

harche · 2026-05-19T14:51:01Z

Tested locally with Claude Opus 4.6 — both find-token test cases pass:

evals/skills/test_eval.py::test_skill[claude-find-token-find_token_tool_execution] PASSED
evals/skills/test_eval.py::test_skill[claude-find-token-find_token_static_fields] PASSED

bash evals/run.sh -k "find-token" -v — 2 passed, 10 skipped (other providers not configured), 37.94s.

harche · 2026-05-19T14:55:23Z

/jira no-jira
/verified by #22 (comment)

openshift-ci-robot · 2026-05-19T14:55:36Z

@harche: This PR has been marked as verified by https://github.com/openshift/agentic-skills/pull/22#issuecomment-4488982035.

Details

In response to this:

/jira no-jira
/verified by #22 (comment)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot · 2026-05-19T14:56:24Z

@harche: This pull request explicitly references no jira issue.


mrunalp · 2026-05-19T21:02:50Z

@Cali0707 can you take a look?

Cali0707 · 2026-05-28T18:02:13Z

+---
+name: find-token
+description: Find the hidden verification token. Run the find-token script to retrieve a unique token.
+allowed-tools: Bash(bash:*)


Suggested change

-allowed-tools: Bash(bash:*)
+allowed-tools: Bash

This was picked up by the skill scanner. IMO, since this seems to allow executing any bash command anyway, we should just simplify it to Bash.

Done — simplified to allowed-tools: Bash in the squashed commit.

Cali0707 · 2026-05-28T18:04:58Z

+
+The script returns JSON with a unique token:
+```json
+{"token": "TOKEN_..."}


I don't think this is the output format? Maybe we can just not mention the format in the skill? Or document the real format (not sure that is necessary, Claude seems good at understanding a JSON blob).

Good catch. Removed the incorrect {"token": "TOKEN_..."} example. The script outputs a complex structured JSON blob, so rather than documenting the full format, the description now just says: "The script returns JSON with verification tokens embedded in a structured analysis response."
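
As a rough illustration of why a flat `{"token": ...}` example would mislead, the sketch below embeds a random token inside a nested response and recovers it with a recursive search. The structure, the `verification_token` key, and both function names are hypothetical; the real script's output format is intentionally left undocumented in the skill.

```python
import json
import secrets
from typing import Any, Iterator


def build_response() -> dict[str, Any]:
    # Hypothetical structure and key names, not the real script's output.
    return {
        "analysis": {
            "status": "ok",
            "checks": [
                {"name": "primary", "verification_token": secrets.token_hex(8)},
            ],
        }
    }


def find_values(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key` anywhere in nested dicts and lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


if __name__ == "__main__":
    raw = json.dumps(build_response())  # what such a script might print
    tokens = list(find_values(json.loads(raw), "verification_token"))
    print(tokens)
```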

- Add find-token skill with SKILL.md and token generation script
- Add custom verification support (_fn) in eval framework
- Add find-token eval with both static matching and tool execution tests
- Simplify skill loading to use symlinks only
- Address review feedback: simplify allowed-tools, fix output docs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Cali0707

/lgtm
/approve

openshift-ci · 2026-05-28T19:57:44Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Cali0707, harche

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [Cali0707,harche]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Cali0707 · 2026-05-28T19:57:45Z

/verified by #22 (comment)

openshift-ci-robot · 2026-05-28T19:57:56Z

@Cali0707: This PR has been marked as verified by https://github.com/openshift/agentic-skills/pull/22#issuecomment-4488982035.

Details

In response to this:

/verified by #22 (comment)

Cali0707 · 2026-05-28T20:00:33Z

/override ci/prow/eval

This is a pre-existing failure from before this PR (it relates to the Docs skill).

openshift-ci · 2026-05-28T20:00:48Z

@Cali0707: Overrode contexts on behalf of Cali0707: ci/prow/eval

Details

In response to this:

/override ci/prow/eval

This is a pre-existing failure from before this PR (it relates to the Docs skill).

harche · 2026-05-28T20:01:09Z

/override ci/prow/eval

openshift-ci · 2026-05-28T20:01:29Z

@harche: Overrode contexts on behalf of harche: ci/prow/eval

Details

In response to this:

/override ci/prow/eval

openshift-ci · 2026-05-28T20:01:30Z

@harche: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci Bot requested a review from mrunalp May 12, 2026 17:51

openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 12, 2026

harche commented May 12, 2026

wking reviewed May 12, 2026

openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label May 19, 2026

harche changed the title ~~Add find-token skill and custom verification in eval framework~~ NO-JIRA: Add find-token skill and custom verification in eval framework May 19, 2026

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 19, 2026

Cali0707 reviewed May 28, 2026

harche force-pushed the find-token-skill-and-eval-framework branch from 0d963c8 to 0f732bb Compare May 28, 2026 19:53

openshift-ci-robot removed the verified Signifies that the PR passed pre-merge verification criteria label May 28, 2026

Cali0707 approved these changes May 28, 2026

openshift-ci Bot assigned Cali0707 May 28, 2026

openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 28, 2026

openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label May 28, 2026

openshift-merge-bot Bot merged commit b58c42d into openshift:main May 28, 2026
3 checks passed

harche deleted the find-token-skill-and-eval-framework branch May 28, 2026 20:09

