Add ODean v14 — clean hybrid ensemble (672K, 312K unique scores)#124
kprofundis wants to merge 4 commits into liamdugan:main
- v11 forward-DNA HistGradientBoosting (weight 0.50, full 672K coverage)
- 8 API instruction judges weighted by cross-validated train AUC
- Global Platt scaling on 24K labeled impersonation_pool
- Internal AUC 0.991, log-loss 0.111, top-quartile accuracy 99.92%

Author: Kareem Elsamadicy <kelsamadicy@gmail.com>
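For readers unfamiliar with the two techniques named in the commit message, here is a minimal sketch of a weighted score average followed by Platt scaling (fitting a sigmoid to raw scores on labeled data). This is not the ODean code: the function names, the toy scores, the weights in the usage lines, and the gradient-descent settings are all assumptions made for illustration.

```python
import math


def weighted_average(scores, weights):
    """Combine per-detector scores for one row using normalized weights."""
    total = sum(weights)
    return sum(s * w for s, w in zip(scores, weights)) / total


def fit_platt(raw_scores, labels, lr=0.1, steps=5000):
    """Fit p = sigmoid(a * s + b) by gradient descent on mean log-loss.

    Hypothetical helper: a real pipeline would more likely use a library
    logistic-regression fit on the labeled calibration pool.
    """
    a, b = 1.0, 0.0
    n = len(raw_scores)
    for _ in range(steps):
        grad_a = 0.0
        grad_b = 0.0
        for s, y in zip(raw_scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s
            grad_b += p - y
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def apply_platt(score, a, b):
    """Map a raw ensemble score to a calibrated probability."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))


# Toy data (invented): one base score plus two judge scores for a single row.
combined = weighted_average([0.8, 0.6, 0.4], [0.5, 0.3, 0.2])

# Toy calibration set (invented): raw scores with 0 = human, 1 = machine.
raw = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 0, 1, 1]
a, b = fit_platt(raw, labels)
calibrated = apply_platt(combined, a, b)
```

Because the fitted sigmoid is monotonic whenever `a > 0`, calibration of this kind changes the score values but not their ranking, so it affects log-loss rather than AUC.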
|
It looks like this eval run failed. Please check the workflow logs to see what went wrong, then push a new commit to your PR to rerun the eval. |
RAID's SubmissionMetadata.__init__ rejects 'description'. Detector details moved to the PR body. Eval will re-run automatically.
|
Hey @kprofundis it looks like your metadata file has a |
|
Thanks @liamdugan — caught it in your bot's error message and pushed a fix in 1cd4bed (about 2 min before your comment landed). Metadata now matches |
|
It looks like this eval run failed. Please check the workflow logs to see what went wrong, then push a new commit to your PR to rerun the eval. |
v13's clamping at [0.001, 0.999] collapsed 624K of 672K rows to two values, causing RAID's find_threshold to fail with 'max() arg is empty' (no usable spread within domain). v14 preserves v11's natural 312K-unique-value distribution. API judge ensemble adjustment still applies on the ~1.5% adversarial subset where coverage exists, but no extreme clipping. Score range 0.0003-0.9997 with 312K unique values across 672K rows. Co-authored-by: Kareem Elsamadicy <kelsamadicy@gmail.com>
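The effect described in this commit message can be illustrated with a small sketch: clamping scores to [0.001, 0.999] maps every value outside that interval onto one of the two bounds, which reduces the number of distinct scores. The score list below is invented for illustration, and the snippet does not reproduce RAID's `find_threshold`.

```python
def clamp(score, lo=0.001, hi=0.999):
    """Clip a score into the closed interval [lo, hi]."""
    return max(lo, min(hi, score))


# Toy scores (invented): most of the mass sits near the extremes.
scores = [0.0003, 0.0004, 0.0007, 0.2, 0.5, 0.9992, 0.9995, 0.9997]
clamped = [clamp(s) for s in scores]

unique_before = len(set(scores))   # every toy score is distinct
unique_after = len(set(clamped))   # extremes collapse onto 0.001 and 0.999

print(unique_before, unique_after)
```

A threshold search needs distinct score values to step through, so a distribution where most rows share two values leaves it very few candidate thresholds to choose from.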
|
Diagnosed the second eval failure ( Replaced with v14 in 86d9ffa: same v11+API ensemble underneath, no clamping, 312K unique scores across 672K rows, score range 0.000311–0.999739. Distribution mirrors what binoculars / fastdetectgpt look like. Renamed the folder to |
|
Eval run succeeded! Link to run: link

Here are the results of the submission(s):

ODean v14 (clean hybrid ensemble)
Release date: 2026-04-29

I've committed detailed results of this detector's performance on the test set to this PR.

On the RAID dataset as a whole (aggregated across all generation models, domains, decoding strategies, repetition penalties, and adversarial attacks), it achieved an AUROC of 93.05 and a TPR of 81.90% at FPR=5% and 66.00% at FPR=1%.

If all looks well, a maintainer will come by soon to merge this PR and your entry/entries will appear on the leaderboard. If you need to make any changes, feel free to push new commits to this PR.

Thanks for submitting to RAID! |
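For context on the "TPR at FPR" figures reported above, here is a rough sketch of how such a metric can be computed: pick a threshold from the human-written scores so that at most the target fraction of them is flagged, then measure the fraction of machine-generated scores above that threshold. The function name and the toy scores are assumptions, and RAID's own threshold search may work differently.

```python
import math


def tpr_at_fpr(human_scores, machine_scores, target_fpr):
    """Return the true-positive rate at a threshold whose FPR is <= target_fpr.

    A text is flagged as machine-generated when its score is strictly
    greater than the threshold.
    """
    ranked = sorted(human_scores, reverse=True)
    # Number of human texts we are allowed to flag incorrectly.
    allowed = int(math.floor(target_fpr * len(ranked) + 1e-9))
    if allowed >= len(ranked):
        threshold = float("-inf")
    else:
        # Scores strictly above this value include at most `allowed` humans.
        threshold = ranked[allowed]
    flagged = sum(1 for s in machine_scores if s > threshold)
    return flagged / len(machine_scores)


# Toy data (invented): 10 human-written and 5 machine-generated scores.
human = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.6, 0.9]
machine = [0.3, 0.55, 0.7, 0.8, 0.95]

tpr = tpr_at_fpr(human, machine, target_fpr=0.10)
```

With these toy values the threshold lands at 0.6, so one of ten human texts is flagged and three of the five machine texts are caught.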
Adds ODean v14 — clean hybrid ensemble.
Composition:
The forward-DNA + judge-RNA architecture is grounded in Watson-Crick pairing physics. v14 preserves the natural distribution (no clamping) — 312K unique scores across 672K rows.
Author: Kareem Elsamadicy
Contact: kelsamadicy@gmail.com