Add E2 reporting metrics and refresh reports #58
Code review: No issues found. Checked for bugs and CLAUDE.md compliance.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 21826e9641
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```python
if tool_style:
    primary_prefix = "kube-linter/" if (fixture_type or "k8s") == "k8s" else "semgrep/"
    oracle = [v for v in oracle if v.get("id", "").startswith(primary_prefix)]
    predicted = [v for v in predicted if v.get("id", "").startswith(primary_prefix)]
```
Avoid dropping non-prefixed predictions
When any oracle or prediction uses tool-style IDs, `tool_style` becomes true and this block filters predictions down to `kube-linter/` or `semgrep/`. The model output schema allows arbitrary string IDs, so it's valid (and common) for a model to return a bare `rule_id` without a tool prefix. In those cases the new filter silently discards the prediction, which can turn real false positives into "clean passes" and inflate precision/F1. Consider normalizing unprefixed IDs, or only filtering when both the oracle and the predictions are consistently tool-prefixed.
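One way the suggested normalization could look, as a minimal sketch: `_normalize_id` and `align_ids` are hypothetical names, not functions from this PR, and the surrounding variables are taken from the snippet above.

```python
def _normalize_id(violation_id: str, primary_prefix: str) -> str:
    """Hypothetical helper: attach the expected tool prefix to bare rule IDs."""
    if "/" in violation_id:
        return violation_id
    return f"{primary_prefix}{violation_id}"


def align_ids(oracle, predicted, fixture_type, tool_style):
    """Sketch of the reviewer's suggestion: normalize IDs instead of filtering,
    so bare rule_ids stay in the prediction set rather than being dropped."""
    if not tool_style:
        return oracle, predicted
    primary_prefix = "kube-linter/" if (fixture_type or "k8s") == "k8s" else "semgrep/"
    oracle = [{**v, "id": _normalize_id(v.get("id", ""), primary_prefix)} for v in oracle]
    predicted = [{**v, "id": _normalize_id(v.get("id", ""), primary_prefix)} for v in predicted]
    return oracle, predicted
```

With this shape, a prediction whose `id` is a bare rule name is scored against the prefixed oracle entry instead of being silently excluded, so precision/F1 reflects the model's actual output.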
## Summary
- add positive-only F1 and clean-pass metrics to E2 summaries and reports
- document new metrics and schema updates
- add batch report generator + skill docs
- tighten env-config verification scoring and adapters

## Testing
- make check
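The PR description does not spell out the metric definitions here; one plausible reading, assumed purely for illustration, is that positive-only F1 is computed only over fixtures whose oracle contains at least one violation, while a clean pass counts an oracle-clean fixture where the model also predicts nothing. A sketch under that assumption:

```python
def positive_only_f1(per_fixture):
    """Assumed definition: micro-averaged F1 over fixtures whose oracle has at
    least one violation; oracle-clean fixtures are excluded from this score.

    per_fixture: list of (oracle_ids, predicted_ids) pairs, each a set of IDs.
    """
    tp = fp = fn = 0
    for oracle_ids, predicted_ids in per_fixture:
        if not oracle_ids:
            continue  # positive-only: skip fixtures with an empty oracle
        tp += len(oracle_ids & predicted_ids)
        fp += len(predicted_ids - oracle_ids)
        fn += len(oracle_ids - predicted_ids)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def clean_pass_rate(per_fixture):
    """Assumed definition: share of oracle-clean fixtures where the model
    also predicts no violations."""
    clean = [(o, p) for o, p in per_fixture if not o]
    if not clean:
        return None
    return sum(1 for _, p in clean if not p) / len(clean)
```

Splitting the two views this way is consistent with the review comment above: dropped or unmatched predictions show up in positive-only F1, while clean-pass rate isolates how often the model stays quiet on fixtures that should be quiet.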