
Add ODean v14 — clean hybrid ensemble (672K, 312K unique scores) #124

Open

kprofundis wants to merge 4 commits into liamdugan:main from kprofundis:odean-v13-platt

Conversation


@kprofundis kprofundis commented Apr 29, 2026

Adds ODean v14 — clean hybrid ensemble.

Composition:

  • v11 forward-DNA HistGradientBoosting (full 672K coverage)
  • 8 API instruction judges (deepseek, kimi, openai_mini, openai_4o, xai, cohere, haiku-v2, opus) on an adversarial test-outlier subset (~1.5% of rows)
  • Per-judge weights derived from cross-validated train AUC; no judge silenced
  • v11 carries the dominant weight; the API judges provide diversification on rows where the local model over-commits

The forward-DNA + judge-RNA architecture is grounded in Watson-Crick pairing physics. v14 preserves the natural distribution (no clamping) — 312K unique scores across 672K rows.
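If it helps reviewers, the blend boils down to a coverage-aware, AUC-weighted average. The sketch below is a simplified illustration of that logic, not the production code; the function name, the NaN-as-missing convention, and the flat 0.5 base weight are stand-ins.

```python
import numpy as np

def blend_scores(base_scores, judge_scores, judge_aucs, base_weight=0.5):
    """Blend a full-coverage base model with sparse API-judge scores.

    base_scores  : (n,)   scores from the full-coverage model (v11 here).
    judge_scores : (n, k) per-judge scores, NaN where a judge did not
                   score a row (judges only cover ~1.5% of rows).
    judge_aucs   : (k,)   cross-validated train AUCs used as raw weights,
                   so no judge is silenced but stronger judges count more.
    base_weight  : share kept by the base model on rows that have judges.
    """
    base = np.asarray(base_scores, dtype=float)
    judges = np.asarray(judge_scores, dtype=float)
    aucs = np.asarray(judge_aucs, dtype=float)

    # Zero out the weight of any judge that did not score a given row.
    w = np.where(np.isnan(judges), 0.0, aucs)                 # (n, k)
    wsum = np.clip(w.sum(axis=1), 1e-12, None)
    judge_mean = np.nansum(judges * w, axis=1) / wsum         # AUC-weighted mean

    # Only adjust rows where at least one judge actually scored.
    has_judge = ~np.all(np.isnan(judges), axis=1)
    out = base.copy()
    out[has_judge] = (base_weight * base[has_judge]
                      + (1.0 - base_weight) * judge_mean[has_judge])
    return out
```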

Author: Kareem Elsamadicy
Contact: kelsamadicy@gmail.com

- v11 forward-DNA HistGradientBoosting (weight 0.50, full 672K coverage)
- 8 API instruction judges weighted by cross-validated train AUC
- Global Platt scaling on 24K labeled impersonation_pool
- Internal AUC 0.991, log-loss 0.111, top-quartile accuracy 99.92%

Author: Kareem Elsamadicy <kelsamadicy@gmail.com>
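For completeness, the Platt step mentioned in the bullets above is a one-dimensional logistic regression fit on the labeled pool and then applied to every row's ensemble score. A rough sketch, with the near-zero regularisation and helper names chosen for illustration rather than taken from the actual calibration code:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_platt(raw_scores, labels):
    """Classic Platt scaling: fit sigmoid(a * score + b) on labeled data."""
    scaler = LogisticRegression(C=1e6)   # near-unregularised 1-D logistic fit
    scaler.fit(np.asarray(raw_scores, dtype=float).reshape(-1, 1), labels)
    return scaler

def apply_platt(scaler, raw_scores):
    """Map raw ensemble scores to calibrated probabilities."""
    return scaler.predict_proba(
        np.asarray(raw_scores, dtype=float).reshape(-1, 1))[:, 1]
```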
@github-actions

It looks like this eval run failed. Please check the workflow logs to see what went wrong, then push a new commit to your PR to rerun the eval.

RAID's SubmissionMetadata.__init__ rejects 'description'. Detector details
moved to the PR body. Eval will re-run automatically.
@liamdugan
Owner

Hey @kprofundis, it looks like your metadata file has a "description" key which isn't recognized by our evaluation bot. Please make sure your metadata matches the example template_metadata.json file exactly and doesn't contain any extra keys whatsoever.

@kprofundis
Author

Thanks @liamdugan — caught it in your bot's error message and pushed a fix in 1cd4bed (about 2 min before your comment landed). Metadata now matches template_metadata.json exactly with no extras. The eval is re-running now. Sorry for the noise!

@github-actions

It looks like this eval run failed. Please check the workflow logs to see what went wrong, then push a new commit to your PR to rerun the eval.

v13's clamping at [0.001, 0.999] collapsed 624K of 672K rows to two
values, causing RAID's find_threshold to fail with 'max() arg is empty'
(no usable spread within domain).

v14 preserves v11's natural 312K-unique-value distribution. API judge
ensemble adjustment still applies on the ~1.5% adversarial subset where
coverage exists, but no extreme clipping. Score range 0.0003-0.9997 with
312K unique values across 672K rows.

Co-authored-by: Kareem Elsamadicy <kelsamadicy@gmail.com>
@kprofundis kprofundis changed the title from "Add ODean v13 — full 672K hybrid ensemble + Platt calibration" to "Add ODean v14 — clean hybrid ensemble (672K, 312K unique scores)" on Apr 29, 2026
@kprofundis
Author

Diagnosed the second eval failure (max() arg is an empty sequence in find_threshold) — turned out our predictions had 624K of 672K rows clamped to exactly 0.001 or 0.999, leaving no per-domain spread for the FPR=5% threshold to land on.
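To make the failure mode concrete, here is a toy analogue of an FPR-targeted threshold search. It is not RAID's implementation, just an illustration of why a two-valued score distribution leaves the search with nothing to pick:

```python
import numpy as np

def find_threshold_fpr(human_scores, target_fpr=0.05):
    """Illustrative analogue (not RAID's actual code): among the unique
    scores, keep those whose false-positive rate on human text is at or
    below the target, then take the lowest such threshold (best TPR)."""
    scores = np.asarray(human_scores, dtype=float)
    candidates = [t for t in np.unique(scores)
                  if (scores >= t).mean() <= target_fpr]
    return min(candidates)

# With natural spread there is always a usable cut:
ok = find_threshold_fpr(np.random.rand(1000))

# With v13's clamping, 90% of a domain at 0.001 and 10% at 0.999 leaves
# no candidate meeting FPR <= 5%, so the reduction runs over an empty
# list and fails (in RAID it surfaced as "max() arg is an empty sequence"):
# find_threshold_fpr(np.array([0.001] * 900 + [0.999] * 100))
```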

Replaced with v14 in 86d9ffa: same v11+API ensemble underneath, no clamping, 312K unique scores across 672K rows, score range 0.000311–0.999739. The score distribution mirrors what binoculars / fastdetectgpt produce.

Renamed the folder to ODean-v14_clean since v13's defining feature (Platt + clamp) was the bug. Sorry for the churn — eval should run cleanly this time. Thanks for your patience!

@github-actions

Eval run succeeded! Link to run: link

Here are the results of the submission(s):

ODean v14 (clean hybrid ensemble)

Release date: 2026-04-29

I've committed detailed results of this detector's performance on the test set to this PR.

On the RAID dataset as a whole (aggregated across all generation models, domains, decoding strategies, repetition penalties, and adversarial attacks), it achieved an AUROC of 93.05 and a TPR of 81.90% at FPR=5% and 66.00% at FPR=1%.
Without adversarial attacks, it achieved AUROC of 95.47 and a TPR of 88.96% at FPR=5% and 76.42% at FPR=1%.

If all looks well, a maintainer will come by soon to merge this PR and your entry/entries will appear on the leaderboard. If you need to make any changes, feel free to push new commits to this PR. Thanks for submitting to RAID!

