chore(bench): Phase 0 harness + Next.js and Vike/GemStack baseline apps by suleimansh · Pull Request #81 · gemstack-land/gemstack

suleimansh · 2026-06-28T01:54:45Z

Phase 0 of the "our AI vs Next.js" benchmark (#75). Closes #78.

Lays down a runnable Phase 0 baseline: two functionally equivalent apps and a contract-level acceptance harness that an AI agent can be timed against.

What's here

benchmarks/README.md - harness overview, manual run steps, the intervention rubric, and fairness rules.
benchmarks/spec/product.md - the shared "Notes" product and a single HTTP contract both apps implement (the trick that lets one acceptance script grade either).
benchmarks/spec/task-001-tags.md - the Phase 0 task (add tags) and its acceptance criteria.
benchmarks/tasks/task-001-tags/accept.mjs - contract-level acceptance check; BASE_URL=<url> node accept.mjs, exit 0 = pass.
examples/bench-app-next - vanilla Next.js (App Router) baseline. pnpm dev -> :4311.
examples/bench-app-gemstack - Vike + React baseline; summarize wired through @gemstack/ai-sdk via a deterministic stub provider (no network, no key). pnpm dev -> :3100.
pnpm-workspace.yaml - declare sharp:false so pnpm dev runs cleanly (both frameworks pull the optional, prebuilt sharp).
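
To illustrate what a "deterministic stub provider (no network, no key)" could look like, here is a minimal sketch. The names `stubSummary` and `createStubProvider` and the provider's shape are hypothetical; they are not taken from @gemstack/ai-sdk, whose real interface may differ.

```javascript
// Hypothetical sketch: not the actual @gemstack/ai-sdk provider interface.

// Pure helper: the same input always yields the same "summary".
function stubSummary(text, maxWords = 12) {
  const words = String(text).trim().split(/\s+/).filter(Boolean);
  const head = words.slice(0, maxWords).join(" ");
  // Mark truncation so callers can tell the note was shortened.
  return words.length > maxWords ? `${head}...` : head;
}

// Wraps the helper in an async method, mimicking the shape of a real
// provider call without touching the network or needing an API key.
function createStubProvider({ maxWords = 12 } = {}) {
  return {
    name: "stub",
    async summarize(text) {
      return stubSummary(text, maxWords);
    },
  };
}
```

Because the output depends only on the input, a contract check on summarize can compare against a fixed expected string in either app.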

Verification

pnpm install and pnpm --filter @gemstack/ai-sdk build both complete cleanly.
Both apps boot and pass the full baseline contract (login/cookie/create/list/get/summarize/delete + 401 when unauthenticated).
Running accept.mjs against each baseline fails the same 5 tag-specific checks and passes everything else, confirming that the two apps are equivalent and that the acceptance script detects the unimplemented task. Implementing tags correctly turns it green.
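
The verification logic above can be sketched roughly as follows. This is not the contents of accept.mjs: the function names and the check shape are assumptions, and only the BASE_URL input and the "exit 0 = pass" convention come from this PR.

```javascript
// Sketch only: names and check shape are hypothetical, not copied from accept.mjs.

// Runs each check against one app and returns the names of the failing checks.
async function runChecks(baseUrl, checks, fetchImpl = globalThis.fetch) {
  const failed = [];
  for (const check of checks) {
    try {
      const ok = await check.run(baseUrl, fetchImpl);
      if (!ok) failed.push(check.name);
    } catch {
      // A thrown error (for example, connection refused) counts as a failure.
      failed.push(check.name);
    }
  }
  return failed;
}

// Two baselines count as equivalent when they fail exactly the same checks.
function sameFailures(a, b) {
  const setA = new Set(a);
  const setB = new Set(b);
  return setA.size === setB.size && [...setA].every((name) => setB.has(name));
}

// Exit code convention from the PR: 0 = pass, anything else = fail.
function exitCodeFor(failed) {
  return failed.length === 0 ? 0 : 1;
}
```

A check here is an object such as `{ name: "401 when unauthenticated", run: async (base, f) => (await f(base + "/api/notes")).status === 401 }`, where the `/api/notes` route is a placeholder. A script would then call `runChecks(process.env.BASE_URL, checks)` and pass the result to `process.exit(exitCodeFor(failed))`.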

Notes

The apps are private @gemstack/example-* workspace packages (no build/publish, mirroring mcp-quickstart); no changeset needed.
Next step is the actual Phase 0 measurement: run the same agent on each app against task-001 and record time + interventions per the rubric.
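
As a rough idea of what one recorded measurement could look like, here is a small sketch. The record shape and the `summarizeRun` name are hypothetical; the actual rubric is the one in benchmarks/README.md and may track different fields.

```javascript
// Hypothetical record shape; the real rubric is defined in benchmarks/README.md.
// Timestamps are milliseconds (e.g. from Date.now()); interventions is a list
// of free-text notes, one per time a human had to step in.
function summarizeRun({ app, task, startedAt, finishedAt, interventions }) {
  return {
    app,
    task,
    seconds: (finishedAt - startedAt) / 1000,
    interventions: interventions.length,
  };
}
```

Producing one such record per app for task-001 keeps the two runs directly comparable.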

…e apps

Sets up the 'our AI vs Next.js' benchmark (#75) Phase 0 (#78):

- benchmarks/: shared product spec + a single HTTP contract both apps implement, the task-001 (add tags) spec, and a contract-level acceptance script (accept.mjs) that grades either app via BASE_URL.
- examples/bench-app-next: vanilla Next.js App Router baseline.
- examples/bench-app-gemstack: Vike + React baseline with the summarize feature wired through @gemstack/ai-sdk (deterministic stub provider).
- pnpm-workspace.yaml: declare sharp:false so pnpm dev runs cleanly.

Both baselines pass the contract and fail the identical 5 tag checks, confirming the apps are equivalent and the acceptance script is correct.

suleimansh added the enhancement (New feature or request) and priority: medium (Worth doing, not urgent) labels Jun 28, 2026

suleimansh self-assigned this Jun 28, 2026

suleimansh merged commit 24f4413 into main Jun 28, 2026
1 check passed

suleimansh deleted the bench/orchestration-vs-next branch June 28, 2026 07:30

suleimansh merged 1 commit into main from bench/orchestration-vs-next
