
Conversation

msaroufim (Member) commented Feb 2, 2026

This renames the user-facing submission modes for clarity:

  • BENCHMARK → PRIVATE (run benchmarks without affecting leaderboard ranking)
  • LEADERBOARD → PUBLIC (official submission to the public leaderboard)

Also adds a SECRET mode for internal secret validation runs.

Updates Discord commands: /benchmark → /private, /ranked → /public

Review together with gpu-mode/popcorn-cli#33 and gpu-mode/reference-kernels#100.

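For reference, a minimal sketch of the renamed SubmissionMode in consts.py. The member names and string values follow the run keys used throughout this PR; the docstring, ordering, and comments are assumptions, not the PR's actual code:

    from enum import Enum

    class SubmissionMode(Enum):
        """Sketch of the new modes; comments paraphrase the PR description."""
        TEST = "test"        # correctness tests only
        PRIVATE = "private"  # benchmarks without leaderboard impact (was BENCHMARK)
        PROFILE = "profile"  # profiling runs
        PUBLIC = "public"    # official ranked submission (was LEADERBOARD)
        SECRET = "secret"    # internal secret validation runs (new)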
Copilot AI review requested due to automatic review settings (February 2, 2026 01:59)
Copilot AI (Contributor) left a comment

Pull request overview

This PR renames user-facing submission modes from BENCHMARK/LEADERBOARD to PRIVATE/PUBLIC, introduces a new SECRET mode for internal validation, and propagates these changes through the backend, launch pipeline, reports, Discord commands, and tests.

Changes:

  • Update SubmissionMode enum and evaluation pipeline to use TEST, PRIVATE, PROFILE, PUBLIC, and SECRET, and adjust timeouts and run orchestration accordingly (see the timeout sketch after this list).
  • Adapt reporting, backend scoring/DB persistence, GitHub/Modal launchers, API validation, and Discord cogs to the new mode names and semantics (PRIVATE = non‑ranked benchmarks, PUBLIC/SECRET = ranked).
  • Update tests to use the new mode names and updated report strings, plus new run keys like "private" and "public" instead of "benchmark" and "leaderboard".
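As a sketch of the timeout adjustment mentioned above, assuming the SubmissionMode sketch earlier. The per-file summary below notes the benchmark timeout is used for PRIVATE and the ranked timeout for PUBLIC; the constant names, values, and the SECRET/fallback handling here are assumptions:

    # Sketch only: map the renamed modes to their timeouts.
    BENCHMARK_TIMEOUT_MIN = 20  # assumed value
    RANKED_TIMEOUT_MIN = 30     # assumed value

    def submission_timeout(mode: SubmissionMode) -> int:
        if mode == SubmissionMode.PRIVATE:  # was BENCHMARK
            return BENCHMARK_TIMEOUT_MIN
        if mode in (SubmissionMode.PUBLIC, SubmissionMode.SECRET):  # was LEADERBOARD
            return RANKED_TIMEOUT_MIN
        return BENCHMARK_TIMEOUT_MIN  # TEST / PROFILE: assumed fallback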

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.

Summary per file:

  • tests/test_task.py: Swaps SubmissionMode.BENCHMARK for SubmissionMode.PRIVATE in task-config tests to reflect the new non-ranked benchmark mode.
  • tests/test_modal.py: Updates Modal launcher tests to use SubmissionMode.PRIVATE/PUBLIC, expect "private" run keys, and use "ranked submission" terminology.
  • tests/test_github.py: Adjusts GitHub launcher tests to use SubmissionMode.PUBLIC and to assert against "private" run keys in results.
  • tests/test_backend.py: Switches backend tests to PUBLIC/SECRET modes and the new short-report wording; still expects "leaderboard" in configs/DB runs, which no longer matches the implementation.
  • src/libkernelbot/submission.py: Extends compute_score to accept a mode_key so scores can be computed from "public" or "secret" runs instead of the hard-coded "leaderboard" (sketched after this list).
  • src/libkernelbot/run_eval.py: Reworks evaluation modes so PRIVATE replaces the bare benchmark mode and PUBLIC/SECRET replace leaderboard, updating how test, benchmark, and ranked runs and their timeouts are orchestrated.
  • src/libkernelbot/report.py: Changes reports to treat "private" as the benchmark run key and "public"/"secret" as ranked runs, and updates all user-facing messages from "Leaderboard" to "Ranked submission".
  • src/libkernelbot/launchers/github.py: Maps GitHub timeouts to the new enum values, using the benchmark timeout for PRIVATE and the ranked timeout for PUBLIC.
  • src/libkernelbot/consts.py: Redefines SubmissionMode as TEST, PRIVATE, PROFILE, PUBLIC, SECRET and updates the docstring to document the new semantics.
  • src/libkernelbot/backend.py: Updates submit_full, submit_leaderboard, and handle_submission to drive PUBLIC and SECRET runs correctly, compute scores from the appropriate run key, and adjust short reports and secrecy behavior; currently also (incorrectly) ranks PRIVATE runs.
  • src/kernelbot/cogs/verify_run_cog.py: Adjusts verification slash commands to use PRIVATE/PUBLIC and to default to PUBLIC when no mode is specified.
  • src/kernelbot/cogs/leaderboard_cog.py: Renames the Discord commands /benchmark to /private and /ranked to /public and wires them to the new submission modes.
  • src/kernelbot/api/api_utils.py: Restricts API-allowed submission modes to TEST, PRIVATE, PROFILE, and PUBLIC, matching the new public interface.
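For the submission.py change above, a rough sketch of the extended compute_score. Only the mode_key parameter is established by this PR; the body, return type, and the aggregate_benchmarks helper are assumptions:

    from typing import Optional

    def compute_score(result, mode_key: str) -> Optional[float]:
        # Sketch: read the run stored under mode_key ("public" or
        # "secret") instead of the previously hard-coded "leaderboard".
        run = result.runs.get(mode_key)
        if run is None or not (run.run.success and run.run.passed):
            return None
        return aggregate_benchmarks(run)  # hypothetical scoring helper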


Comment on lines +146 to +150

    + mode_key = mode.value
      if (
    -     "leaderboard" in result.runs
    -     and result.runs["leaderboard"].run.success
    -     and result.runs["leaderboard"].run.passed
    +     mode_key in result.runs
    +     and result.runs[mode_key].run.success
    +     and result.runs[mode_key].run.passed
Copilot AI commented Feb 2, 2026

This logic uses mode_key = mode.value and then computes a score whenever mode_key exists in result.runs, which means PRIVATE submissions will also receive a non-null score. Because create_submission_run later writes that score with secret=False, PRIVATE (non-ranked) runs will end up contributing to the public leaderboard, contradicting the new semantics where only PUBLIC (and internal SECRET) runs should affect ranking. To align behavior with the new mode definitions, restrict score calculation (and setting a non-None score) to ranked modes only (e.g., PUBLIC/SECRET), and keep PRIVATE runs’ score as None so they are excluded by the DB queries that filter on score IS NOT NULL and NOT secret.
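One way to implement the restriction suggested here, as a sketch. RANKED_MODES and the surrounding variable names are assumptions, not the PR's actual code:

    # In submit_leaderboard (sketch): compute a score only for ranked
    # modes, so PRIVATE runs keep score=None and are excluded by the
    # DB queries that filter on score IS NOT NULL and NOT secret.
    RANKED_MODES = {SubmissionMode.PUBLIC, SubmissionMode.SECRET}

    score = None
    if mode in RANKED_MODES:
        mode_key = mode.value
        run = result.runs.get(mode_key)
        if run is not None and run.run.success and run.run.passed:
            score = compute_score(result, mode_key)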

Comment on lines 249 to 255

      )
      reporter = MockMultReporter()
      s_id, results = await bot.submit_full(
    -     req, mode=consts.SubmissionMode.LEADERBOARD, reporter=reporter
    +     req, mode=consts.SubmissionMode.PUBLIC, reporter=reporter
      )

      expected_result = mock_launcher.run_submission.return_value
Copilot AI commented Feb 2, 2026

In this PUBLIC submission test, the mocked launcher still returns only a "leaderboard" run and the DB expectations below assert two runs with mode: "leaderboard" and non-null score, but submit_leaderboard now computes scores based on mode.value ("public"/"secret") and only when that key exists in result.runs. With the current setup the new code will not compute a score (because there is no "public"/"secret" key), and the inserted runs will have score is None, so the test will fail and will not exercise the updated ranking logic. To align the test with the implementation, update the mocked runs dict to use the new PUBLIC/SECRET keys (e.g., "public" and "secret", possibly plus "test"/"private" if you want realistic structure) and adjust the DB assertions so they expect those modes and only the PUBLIC, non-secret run to have a non-null score.
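A sketch of the mock setup this comment asks for. make_full_result and passing_eval_result are hypothetical helpers standing in for however the test actually builds its FullResult:

    # Return runs keyed by the new mode names; only the "public" run
    # should end up with a non-null score in the DB assertions.
    mock_launcher.run_submission.return_value = make_full_result(
        runs={
            "test": passing_eval_result,
            "private": passing_eval_result,  # non-ranked: score stays None
            "public": passing_eval_result,   # ranked: gets a score
        }
    )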

Update test data keys and expected values:
- test_report.py: Change "benchmark"/"leaderboard" keys to "private"/"public"
- test_submission.py: Update compute_score test to use "public" key
- test_backend.py: Update mode values and mock data keys
- Add 'secret' key to mock launcher runs so SECRET mode can find its result
- Fix second run's expected mode from 'public' to 'secret'
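The test_submission.py item above presumably reduces to something like this sketch (helper names are hypothetical):

    # compute_score now reads the run stored under the given mode_key.
    result = make_full_result(runs={"public": passing_eval_result})
    assert compute_score(result, "public") is not None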
Set GITHUB_BRANCH env var to use the PR's source branch instead of
falling back to main. Uses github.head_ref for PRs, github.ref_name
for direct pushes.

Use side_effect to return different FullResult for each call:
- First call (PUBLIC mode) returns {"public": eval_result}
- Second call (SECRET mode) returns {"secret": eval_result}

This prevents the backend from storing all keys from both calls.
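As a sketch, the side_effect setup described above might look like this (make_full_result is again a hypothetical helper):

    # Each call to the mocked launcher returns a FullResult whose runs
    # dict contains only that call's mode key, so the backend never
    # sees both "public" and "secret" in a single result.
    mock_launcher.run_submission.side_effect = [
        make_full_result(runs={"public": eval_result}),  # 1st call: PUBLIC
        make_full_result(runs={"secret": eval_result}),  # 2nd call: SECRET
    ]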
github-actions bot commented Feb 2, 2026

Coverage report

Coverage changed in src/libkernelbot: backend.py, consts.py, report.py, submission.py, utils.py.

This report was generated by python-coverage-comment-action
