`@remotion/skills-evals`: Add visual skill eval runner by samohovets · Pull Request #7337 · remotion-dev/remotion

samohovets · 2026-05-12T17:01:23Z

Summary

Add a private @remotion/skills-evals package for Pi-based visual eval scenarios against repo-local Remotion skills.
Provide a local React dashboard for scenario-scoped comparisons, before/after video review, Pi exports, manifests, logs, and rendered skill diffs.
Copy the blank template and skill snapshots into isolated run folders, then prompt Pi in two turns: scenario creation first, render-artifact follow-up in the same session.
Support iteration by comparing a scenario against its latest successful comparison before falling back to the appropriate Git baseline.
Seed an animated bar chart scenario and add typography/readability guidance for Remotion skills.
Should help us to address Skills: How do we prevent the AI from making the font sizes / proportions too small? #7256.

Test plan

cd packages/skills-evals && bun run format
cd packages/skills-evals && bun run lint
Browser smoke check for the dashboard home, scenario page, and comparison page

vercel · 2026-05-12T17:01:27Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
bugs	Ready	Preview, Comment	May 14, 2026 0:15am

1 Skipped Deployment

Project	Deployment	Actions	Updated (UTC)
remotion	Skipped		May 14, 2026 0:15am

pullfrog

Important

Two issues will keep the runner from working out of the box on a clean checkout: the sandbox project lands inside the monorepo's packages/** workspace glob (so bun install resolves Remotion to the in-repo workspace instead of the registry), and getCurrentRemotionVersion() returns the not-yet-published version every release-targeted PR bumps to. There's also a credential-leak shape worth fixing before this is wired into anything shared (full process.env is forwarded to the pi agent, and the recursive walker follows symlinks back out of the sandbox).

TL;DR — Adds a private @remotion/skills-evals package that drives the external pi CLI through a fresh blank-template sandbox, collects visual artifacts and Pi session exports, and renders a static dark-mode HTML gallery. Aimed at #7256.

Key changes

Add @remotion/skills-evals package — private Bun-CLI package with eval list|run|gallery commands, scenario registry, and an animated bar chart seed scenario.
Sandbox runner — copies packages/template-blank + packages/skills/skills/remotion per run, patches Remotion deps to the current monorepo version, runs bun install, then shells out to pi.
Manifest + gallery — writes a manifest.json per run capturing prompt, prompt hash, skill snapshot hash, command logs, and discovered artifacts; gallery.ts aggregates manifests across .runs/ into a single static HTML page.
Workspace wiring — .gitignore entry for packages/skills-evals/.runs, bun.lock workspace stub, and a new project reference in the root tsconfig.json.

_{Summary ｜ 15 files ｜ 1 commit ｜ base: main ← skills-evals-package}

Sandbox install resolves against the monorepo, not the registry

Before: N/A (new package).
After: runSkillEval writes the per-run project to packages/skills-evals/.runs/<id>/<ts>/project, then runs bun install there.

The root package.json workspaces.packages glob is packages/** with no exclusion for .runs/**, so Bun walks up from the project cwd, finds the monorepo root, and treats the sandbox package.json as a workspace member. That collapses the "install the published Remotion versions" goal into "symlink the in-repo workspace" and makes results non-portable. Either nest .runs/ outside packages/** or add an !packages/skills-evals/.runs/** entry to the workspaces list.

package.json · run-skill-eval.ts

`bun install` will fail every release-targeted PR

Before: N/A.
After: getCurrentRemotionVersion reads packages/core/src/version.ts and writes that exact version into the sandbox package.json.

AGENTS.md instructs every PR to bump version.ts to a version that has not been published yet; bun install will then fail with No matching version found for remotion@<unpublished>. There's no workspace:* fallback, no local-tarball install, and no registry-availability check. Either resolve to the latest published version (a npm view remotion version lookup or a manual override flag) or pin workspace:* against the cooperating workspace stub.

run-skill-eval.ts · packages/core/src/version.ts

Symlink-following + full env forwarding combine into a leak primitive

Before: N/A.
After: Recursive walkers use entry.isDirectory(), which returns true for symlinks-to-directories; runCommand forwards the entire process.env to both bun install and pi.

The walker in run-skill-eval.ts (and the duplicates in pi.ts/gallery.ts) follows directory symlinks. The agent under test runs inside the sandbox and can plant a symlink such as projectRoot/leak -> / or -> $HOME — discoverVisualArtifacts will then emit absolute paths into the manifest and the gallery will render <img src="file:///etc/..."> style links. hashDirectory is also exposed: a symlink to /dev/zero or a large host file inflates memory unboundedly. Independently, command.ts spreads process.env into the spawned subprocess, so any agent prompt that says "echo your env" lands the full set (OPENAI_API_KEY, GITHUB_TOKEN, AWS_*) into Pi's stdout, which then gets logged and linked from the gallery as "Pi stdout". Use lstat (or readdir({recursive: true}) plus an explicit realpath containment check) and pass an explicit env allowlist (PATH, HOME, TMPDIR, plus whatever Pi requires) instead of spreading process.env.

run-skill-eval.ts · command.ts · pi.ts

Convention drift on root `tsconfig.json` and README

Before: Other private Bun-CLI packages (packages/codex-plugin) are NOT listed in the root tsconfig.json references.
After: skills-evals is added to root references with noEmit: true.

The leaf tsconfig matches codex-plugin shape (module: es2020, moduleResolution: bundler, noEmit: true), but codex-plugin is intentionally absent from root references. Adding skills-evals there gives tsc -b a no-emit project to traverse with no .tsbuildinfo invalidation contract — drop the entry to match the established pattern. Separately, the README does not mention that pi is a hard external prerequisite (run-skill-eval.ts shells out to pi directly); a contributor running bun run eval run animated-bar-chart cold will see an opaque ENOENT.

tsconfig.json · README.md · packages/codex-plugin/tsconfig.json

^{｜ Fix all ➔ ｜ Fix 👍s ➔ ｜ View workflow run ｜ Using Claude Opus ｜ 𝕏}

pullfrog · 2026-05-12T17:19:49Z

+	const install = await runCommand({
+		command: ['bun', 'install'],
+		cwd: projectRoot,
+	});


The sandbox project at .runs/<id>/<ts>/project falls inside the root packages/** workspaces glob, so this bun install walks up to the monorepo root and treats the sandbox package.json as a workspace member — Remotion deps get symlinked to the in-repo workspace instead of being installed from the registry, defeating the per-run-fresh-install intent. Either move .runs/ outside packages/**, or add !packages/skills-evals/.runs/** to the root workspaces.packages list.

pullfrog · 2026-05-12T17:19:49Z

+		throw new Error('Could not read the current Remotion version.');
+	}
+
+	return match[1];


packages/core/src/version.ts is bumped to a not-yet-published version on every release-targeted PR (per AGENTS.md), so bun install will fail with No matching version found for remotion@<unpublished> until that release lands on npm. Resolve to the latest published version (e.g. npm view remotion version) with an override flag for testing the in-progress version explicitly.

pullfrog · 2026-05-12T17:19:49Z

+	const startedAt = Date.now();
+	const subprocess = Bun.spawn(command, {
+		cwd,
+		env: {...process.env, ...env},


Spreading process.env forwards every credential the runner has (OPENAI_API_KEY, GITHUB_TOKEN, AWS_*) into both bun install and the pi agent subprocess. The agent under test can read its own env and exfiltrate it through stdout (which is logged and linked from the gallery). Use an explicit allowlist (PATH, HOME, TMPDIR, plus whatever Pi truly needs).

pullfrog · 2026-05-12T17:19:49Z

+	});
+	const timeout = timeoutMs
+		? setTimeout(() => {
+				subprocess.kill();


On timeout, subprocess.kill() sends SIGTERM to pi only — any children it spawned (e.g. remotion render/Chromium) become orphans, and the returned CommandResult carries a non-zero exitCode indistinguishable from a real failure. Consider escalating to SIGKILL after a grace window and surfacing a timedOut: true flag so callers can differentiate.

pullfrog · 2026-05-12T17:19:49Z

+const listFilesRecursively = async (dir: string): Promise<string[]> => {
+	const entries = await readdir(dir, {withFileTypes: true});
+	const files = await Promise.all(
+		entries.map((entry) => {
+			const absolutePath = join(dir, entry.name);
+
+			if (entry.isDirectory()) {
+				if (entry.name === 'node_modules') {
+					return [];
+				}
+
+				return listFilesRecursively(absolutePath);
+			}
+
+			return [absolutePath];
+		}),
+	);
+
+	return files.flat().sort();
+};


entry.isDirectory() returns true for symlinks pointing at directories, so the walker happily descends out of the sandbox if the agent under test (or any artifact written by pi) plants a symlink like leak -> / or -> $HOME. discoverVisualArtifacts then emits absolute host paths into the manifest, and hashDirectory reads arbitrary host files into memory. Switch to lstat (or guard with realpath containment) before recursing.

pullfrog · 2026-05-12T17:19:49Z

+	}
+};
+
+main();


main() is invoked without a .catch(...) so rejections from runSkillEval/generateGallery surface as unhandled-rejection warnings rather than a clean non-zero exit. Wrap with main().catch((err) => { process.stderr.write(${err?.stack ?? err}
); process.exit(1); });.

pullfrog · 2026-05-12T17:19:49Z

+		sessionDir,
+		'-p',
+		prompt,
+	];


model and sessionDir flow into Pi's argv unchecked — a value starting with -- would be reinterpreted by Pi as a flag (argument injection). Today the scenarios are checked-in, but runSkillEval is exported from index.ts so a future caller could supply attacker-controlled values; either reject leading--- values or pass -- as an end-of-options separator before user values.

pullfrog · 2026-05-12T17:19:49Z

+		return `<a href="${href}"><img src="${href}" alt="${escapeHtml(
+			artifact.relativePath,
+		)}" /></a>`;
+	}
+
+	return `<video src="${href}" controls preload="metadata"></video>`;


href is built from relative(...) + encodeURI(...), which does not encode " (legal in filenames on Linux/macOS) — a filename containing a literal " will close the src attribute and inject HTML. Same issue on lines 48 (the <a>/<img> pair) and 97–108 (Pi export, manifest, and log links). Wrap each href with escapeHtml() after encodeURI, or reuse the existing escapeHtml helper for attribute contexts.

pullfrog · 2026-05-12T17:19:49Z

 			"path": "./packages/sfx"
+		},
+		{
+			"path": "./packages/skills-evals"


Other private Bun-CLI packages (packages/codex-plugin) deliberately stay out of the root references list because they're noEmit: true with no project-build contract. Adding skills-evals here gives tsc -b a no-emit project to traverse without a .tsbuildinfo invalidation contract — drop the entry to match the established pattern.

pullfrog · 2026-05-12T17:19:49Z

+bun run eval run animated-bar-chart
+bun run eval run --all
+bun run eval gallery
+```


The runner shells out to a pi binary (src/pi.ts) that is not a workspace dependency. A contributor running bun run eval run animated-bar-chart cold will see an opaque ENOENT — please document pi as a prerequisite and how to install it.

Move the eval UI to a React dashboard and simplify the scenario comparison flow so skill iteration is easier to inspect.

Compare skills against HEAD, expose plain scenario runs when there is no diff, and make Pi render artifacts to a predictable project path.

Keep the private skills eval tool out of the publishable package registry so monorepo consistency checks only validate real Remotion packages.

pullfrog

Important

A few correctness and lifecycle issues that bite as soon as a second scenario or a timeout shows up. The HTTP server also needs a top-level error handler — a couple of paths currently throw straight into Bun's default 500.

TL;DR — Adds a private @remotion/skills-evals package with a Bun-served dashboard, CLI, and Pi-based comparison runner that copies the blank template, snapshots packages/skills/skills, runs two Pi sessions per scenario, and diffs the resulting artifacts against the HEAD baseline.

Key changes

New @remotion/skills-evals private package — sets up the workspace entry (package.json, tsconfig.json, eslint.config.mjs) and wires it into bun.lock, packages/cli/create-video package lists, packages/studio-shared/package-info.ts, and the root tsconfig.json.
Pi-driven eval pipeline — runSkillEval materializes the blank template, copies skills into .pi/skills/remotion, runs bun install plus two Pi turns (scenario then render), exports the session, and writes a manifest with logged commands and discovered artifacts.
Comparison runner — runSkillEvalComparison archives HEAD's packages/skills/skills via git archive | tar -x, diffs it against the working tree, and runs before/after Pi sessions in parallel under .runs/comparisons/<scenario>/<id>.
Local React dashboard — Bun.serve on 127.0.0.1:3030 with SSR routes for home, scenario, run, and comparison pages, plus /files/* for serving artifacts and /api/{compare,run,jobs} for kicking off and polling jobs.
Pi stream parsing & seed scenario — pi-stream-extension.ts parses Pi's streaming output to surface phase progress in the dashboard, and scenarios.ts seeds an animated bar chart scenario; packages/skills/skills/remotion/SKILL.md gains typography/readability guidance.

_{Summary ｜ 31 files ｜ 11 commits ｜ base: main ← skills-evals-package}

Scenario id sanitization is asymmetric between disk and URLs

Before: N/A (new code).
After: Runs write to join(runsRoot, sanitizePathPart(id), …) but URLs are built from the raw manifest.id.

runSkillEval writes to join(runsRoot, sanitizePathPart(input.id), …) (run-skill-eval.ts:280) and runSkillEvalComparison writes to join(comparisonsRoot, sanitizePathPart(scenario.id), …), but the URLs and lookups use the raw id (jobs.ts:236-238, run.tsx:64, scenario.tsx:80, scenario.tsx:57's loadPlainRuns(scenario.id)). The seed animated-bar-chart is a fixed point of sanitizePathPart, so the dashboard works today; the moment anyone adds a scenario with an uppercase letter, space, or id longer than 80 characters, completed runs return 404. Either pass the sanitized id everywhere a URL or directory name is built, or persist the sanitized id on the manifest and route off that.

run-skill-eval.ts · jobs.ts · scenario.tsx

Subprocess timeout doesn't reap the tree and is invisible to callers

Before: N/A.
After: setTimeout(() => subprocess.kill(), timeoutMs) only SIGTERMs the direct child.

Scenarios pass a 20-minute timeoutMs through to pi, but subprocess.kill() (command.ts:71) sends a single SIGTERM to the immediate child — the actual workloads (pi's provider client, bun install, sh -c 'git archive | tar -x') live in descendant processes, which keep running after the parent exits, hold file descriptors on .runs/, and continue burning API credits on the Pi side. runCommand also gives the caller no signal that a timeout fired: it just returns whatever exit code the killed process produced (commonly 143/null), so runPi throws Pi failed (143) and the dashboard cannot distinguish a hung session from a real crash.

command.ts · pi.ts

Comparison runs orphan their sibling on first failure

Before: N/A.
After: Promise.allSettled([runSnapshot('before'), runSnapshot('after')]) waits for both.

When one Pi snapshot rejects early (e.g. bun install fails in seconds), Promise.allSettled keeps awaiting the surviving 20-minute Pi run before re-throwing — the function will reliably report the failure, but only after the other side has finished spending the full timeout's worth of compute. Either race the two with cancellation (AbortSignal plumbed through runCommand, plus SIGTERM/SIGKILL on the sibling) or sequence them.

compare.ts

`/files/*` boundary check is fragile

Before: N/A.
After: if (!file.startsWith(\${resolvedRunsRoot}/`)) return notFound();`

Two issues with one line. The hard-coded / makes this guard reject every path on Windows, where path.resolve returns backslash-separated strings — so /files/* is unusable there. Separately, if anything inside .runs/ ever becomes a symlink (workspace node_modules from bun install regularly contains them), resolve won't follow it and the check passes for a path whose realpath is outside the runs root. The same path.relative(root, file) idiom already used in shared.tsx's toFileUrl handles both cases.

files.ts · shared.tsx

`Bun.serve` has no `error` handler

Before: N/A.
After: Any route handler throw bubbles to Bun's default 500 with no log line.

decodeURIComponent(pathname.replace(/^\/files\//, '')) (routes.ts:65) throws URIError on any malformed percent-encoded request; renderScenario awaits getSkillDiffState() which throws when git is absent or returns non-zero; loadComparison / loadRun read absolute paths stored in manifest.json / comparison.json, so a relocated .runs/ (different checkout, different machine) throws straight out of the handler. None of these have a try { … } catch and Bun.serve is configured without error() (server.tsx:6-11), so the developer gets opaque 500s with no console output. Adding error(err) { console.error(err); return new Response('Internal Server Error', {status: 500}); } and a try/catch around the git call in renderScenario covers the realistic failure modes.

server.tsx · routes.ts · scenario.tsx

^{｜ Fix all ➔ ｜ Fix 👍s ➔ ｜ View workflow run ｜ Using Claude Opus ｜ 𝕏}

pullfrog · 2026-05-13T17:39:42Z

+	const file = resolve(runsRoot, relativePath);
+	const resolvedRunsRoot = resolve(runsRoot);
+
+	if (!file.startsWith(`${resolvedRunsRoot}/`)) {


Hard-coded / makes this guard reject every legitimate path on Windows where path.resolve returns backslash-separated strings, and resolve won't follow symlinks if anything inside .runs/ ever ends up symlinked (workspace node_modules from bun install regularly contains them). Same path.relative(root, file) idiom already used in shared.tsx#toFileUrl handles both cases.

Suggested change

if (!file.startsWith(`${resolvedRunsRoot}/`)) {

const relativeToRoot = relative(resolvedRunsRoot, file);

if (relativeToRoot.startsWith('..') || isAbsolute(relativeToRoot)) {

return notFound();

}

pullfrog · 2026-05-13T17:39:43Z

+		? setTimeout(() => {
+				subprocess.kill();
+			}, timeoutMs)
+		: null;


subprocess.kill() only SIGTERMs the direct child, so a 20-minute Pi timeout leaves the provider client, bun install, and sh -c 'git archive | tar -x' descendants running after this resolves — they keep burning credits and hold file descriptors on .runs/. Also nothing tells the caller a timeout fired: runPi throws Pi failed (143) and the dashboard cannot distinguish a hung run from a real crash. Spawn with a new process group and on timeout signal -pgid, escalate to SIGKILL after a grace period, and surface a timedOut flag on CommandResult.

pullfrog · 2026-05-13T17:39:43Z

+	const [beforeResult, afterResult] = await Promise.allSettled([
+		runSnapshot('before'),
+		runSnapshot('after'),
+	]);


Promise.allSettled waits for both snapshots even after one rejects, so a fast failure on one side leaves the surviving 20-minute Pi run to finish before this function re-throws — wasted compute and model budget every time. Plumb an AbortSignal through runCommand and cancel the sibling on first rejection (or run them sequentially).

pullfrog · 2026-05-13T17:39:43Z

+			job.message = 'Run complete.';
+			job.resultUrl = `/runs/${encodeURIComponent(
+				result.manifest.id,
+			)}/${encodeURIComponent(basename(result.manifest.runDir))}`;


result.manifest.id is the raw scenario id, but result.manifest.runDir was built with sanitizePathPart(input.id) — for any scenario id that isn't a fixed point of the sanitizer (uppercase, spaces, > 80 chars), /runs/<rawId>/<runDir> will 404 because the route handler joins the raw id back into runsRoot. The seed animated-bar-chart happens to work, so this is latent until a new scenario is added. Persist the sanitized id on the manifest (or apply sanitizePathPart here) and use it consistently when building hrefs and reading by id (e.g. run.tsx:9, scenario.tsx:57).

pullfrog · 2026-05-13T17:39:43Z

+
+	'/files/*': (request: Request) => {
+		const {pathname} = new URL(request.url);
+		const relativePath = decodeURIComponent(pathname.replace(/^\/files\//, ''));


decodeURIComponent throws URIError on malformed percent-encoded paths (e.g. /files/%E0%A4%A), which propagates to Bun's default error handling because Bun.serve is configured without an error() callback. Wrap in try { … } catch { return notFound(); } here, and add an error handler to the Bun.serve config in server.tsx so unexpected throws (stale absolute paths in manifests, missing git, etc.) at least log the request that produced them.

Co-authored-by: Cursor <cursoragent@cursor.com>

@remotion/skills-evals: Add visual skill eval runner

86560b5

samohovets changed the title ~~: Add visual skill eval runner~~ Skills: Add visual skill eval runner May 12, 2026

samohovets changed the title ~~Skills: Add visual skill eval runner~~ @remotion/skills-evals: Add visual skill eval runner May 12, 2026

samohovets marked this pull request as draft May 12, 2026 17:01

vercel Bot deployed to Preview – bugs May 12, 2026 17:02 View deployment

vercel Bot deployed to Preview – remotion May 12, 2026 17:04 View deployment

pullfrog Bot reviewed May 12, 2026

View reviewed changes

web ui

b3e3cb2

vercel Bot temporarily deployed to Preview – remotion May 12, 2026 21:37 Inactive

vercel Bot deployed to Preview – bugs May 12, 2026 21:38 View deployment

@remotion/skills-evals: Refine dashboard workflow

fa43e34

Move the eval UI to a React dashboard and simplify the scenario comparison flow so skill iteration is easier to inspect.

vercel Bot deployed to Preview – bugs May 13, 2026 10:55 View deployment

vercel Bot deployed to Preview – remotion May 13, 2026 10:55 View deployment

re-structure files

5419f70

vercel Bot temporarily deployed to Preview – remotion May 13, 2026 11:15 Inactive

vercel Bot deployed to Preview – bugs May 13, 2026 11:15 View deployment

@remotion/skills-evals: Simplify skill iteration flow

1b4f44d

Compare skills against HEAD, expose plain scenario runs when there is no diff, and make Pi render artifacts to a predictable project path.

vercel Bot temporarily deployed to Preview – remotion May 13, 2026 15:07 Inactive

vercel Bot deployed to Preview – bugs May 13, 2026 15:08 View deployment

@remotion/skills-evals: Exclude internal evals from package checks

949364b

Keep the private skills eval tool out of the publishable package registry so monorepo consistency checks only validate real Remotion packages.

vercel Bot temporarily deployed to Preview – remotion May 13, 2026 15:14 Inactive

vercel Bot deployed to Preview – bugs May 13, 2026 15:14 View deployment

correct fix for the package checks

bc27383

vercel Bot deployed to Preview – bugs May 13, 2026 15:46 View deployment

vercel Bot deployed to Preview – remotion May 13, 2026 15:48 View deployment

samohovets added 2 commits May 13, 2026 18:12

Merge remote-tracking branch 'origin/main' into skills-evals-package

3b34ba8

@remotion/skills-evals: Align version with main

c96529e

vercel Bot deployed to Preview – bugs May 13, 2026 16:14 View deployment

vercel Bot deployed to Preview – remotion May 13, 2026 16:16 View deployment

samohovets added 2 commits May 13, 2026 18:33

@remotion/skills-evals: Stream Pi progress in dashboard

d9c30cd

@remotion/skills-evals: Improve scenario dashboard workflow

7ed7f30

vercel Bot temporarily deployed to Preview – remotion May 13, 2026 16:59 Inactive

vercel Bot deployed to Preview – bugs May 13, 2026 17:00 View deployment

samohovets marked this pull request as ready for review May 13, 2026 17:25

pullfrog Bot reviewed May 13, 2026

View reviewed changes

Merge branch 'main' into skills-evals-package

ff15414

vercel Bot deployed to Preview – bugs May 14, 2026 06:49 View deployment

vercel Bot deployed to Preview – remotion May 14, 2026 06:51 View deployment

Bump @remotion/skills-evals to 4.0.461 to match monorepo

4e84149

Co-authored-by: Cursor <cursoragent@cursor.com>

vercel Bot temporarily deployed to Preview – remotion May 14, 2026 08:46 Inactive

vercel Bot deployed to Preview – bugs May 14, 2026 08:47 View deployment

samohovets added 2 commits May 14, 2026 13:15

pullfrog fixes

4ef60fb

remove test scenario

9412488

vercel Bot deployed to Preview – bugs May 14, 2026 11:17 View deployment

vercel Bot deployed to Preview – remotion May 14, 2026 11:18 View deployment

revert skill changes

ba57aca

vercel Bot temporarily deployed to Preview – remotion May 14, 2026 12:15 Inactive

vercel Bot deployed to Preview – bugs May 14, 2026 12:15 View deployment

samohovets merged commit 56eae11 into main May 14, 2026
18 checks passed

samohovets deleted the skills-evals-package branch May 14, 2026 12:25

+              	}
+              };
+              main();

-	if (!file.startsWith(`${resolvedRunsRoot}/`)) {
+	const relativeToRoot = relative(resolvedRunsRoot, file);
+	if (relativeToRoot.startsWith('..') || isAbsolute(relativeToRoot)) {
+		return notFound();
+	}

Uh oh!

Conversation

samohovets commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

vercel Bot commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pullfrog Bot left a comment

Choose a reason for hiding this comment

Key changes

Sandbox install resolves against the monorepo, not the registry

bun install will fail every release-targeted PR

Symlink-following + full env forwarding combine into a leak primitive

Convention drift on root tsconfig.json and README

Uh oh!

pullfrog Bot May 12, 2026

Choose a reason for hiding this comment

Uh oh!

pullfrog Bot May 12, 2026

Choose a reason for hiding this comment

Uh oh!

pullfrog Bot May 12, 2026

Choose a reason for hiding this comment

Uh oh!

pullfrog Bot May 12, 2026

Choose a reason for hiding this comment

Uh oh!

pullfrog Bot May 12, 2026

Choose a reason for hiding this comment

Uh oh!

pullfrog Bot May 12, 2026

Choose a reason for hiding this comment

Uh oh!

pullfrog Bot May 12, 2026

Choose a reason for hiding this comment

Uh oh!

pullfrog Bot May 12, 2026

Choose a reason for hiding this comment

Uh oh!

pullfrog Bot May 12, 2026

Choose a reason for hiding this comment

Uh oh!

pullfrog Bot May 12, 2026

Choose a reason for hiding this comment

Uh oh!

pullfrog Bot left a comment

Choose a reason for hiding this comment

Key changes

Scenario id sanitization is asymmetric between disk and URLs

Subprocess timeout doesn't reap the tree and is invisible to callers

Comparison runs orphan their sibling on first failure

/files/* boundary check is fragile

Bun.serve has no error handler

Uh oh!

pullfrog Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

pullfrog Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

pullfrog Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

pullfrog Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

pullfrog Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

samohovets commented May 12, 2026 •

edited

Loading

vercel Bot commented May 12, 2026 •

edited

Loading

`bun install` will fail every release-targeted PR

Convention drift on root `tsconfig.json` and README

`/files/*` boundary check is fragile

`Bun.serve` has no `error` handler