chore(bench): Phase 0 harness + Next.js and Vike/GemStack baseline apps #81
Merged
…e apps

Sets up the 'our AI vs Next.js' benchmark (#75) Phase 0 (#78):

- benchmarks/: shared product spec + a single HTTP contract both apps implement, the task-001 (add tags) spec, and a contract-level acceptance script (accept.mjs) that grades either app via BASE_URL.
- examples/bench-app-next: vanilla Next.js App Router baseline.
- examples/bench-app-gemstack: Vike + React baseline with the summarize feature wired through @gemstack/ai-sdk (deterministic stub provider).
- pnpm-workspace.yaml: declare sharp:false so pnpm dev runs cleanly.

Both baselines pass the contract and fail the identical 5 tag checks, confirming the apps are equivalent and the acceptance script is correct.
Phase 0 of the "our AI vs Next.js" benchmark (#75). Closes #78.
Lays down a runnable Phase 0 baseline: two functionally equivalent apps and a contract-level acceptance harness against which an AI agent can be timed.
What's here

- `benchmarks/README.md` - harness overview, manual run steps, the intervention rubric, and fairness rules.
- `benchmarks/spec/product.md` - the shared "Notes" product and a single HTTP contract both apps implement (the trick that lets one acceptance script grade either).
- `benchmarks/spec/task-001-tags.md` - the Phase 0 task (add tags) and its acceptance criteria.
- `benchmarks/tasks/task-001-tags/accept.mjs` - contract-level acceptance check; `BASE_URL=<url> node accept.mjs`, exit 0 = pass.
- `examples/bench-app-next` - vanilla Next.js (App Router) baseline. `pnpm dev` -> :4311.
- `examples/bench-app-gemstack` - Vike + React baseline; summarize wired through `@gemstack/ai-sdk` via a deterministic stub provider (no network, no key). `pnpm dev` -> :3100.
- `pnpm-workspace.yaml` - declare `sharp:false` so `pnpm dev` runs cleanly (both frameworks pull the optional, prebuilt `sharp`).

Verification

- `pnpm install` + `pnpm --filter @gemstack/ai-sdk build` clean.
- `accept.mjs` against each baseline fails the identical 5 tag-specific checks and passes everything else - confirming the two apps are equivalent and the acceptance script correctly detects the unimplemented task. Adding tags correctly turns it green.

Notes

- `@gemstack/example-*` workspace packages (no build/publish, mirroring `mcp-quickstart`); no changeset needed.
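
For reviewers who have not opened `accept.mjs`, here is a minimal sketch of how a contract-level check of this kind can be structured. It is an illustration, not the script in this PR. The `/api/notes` route, the payload shape, and the two checks are assumptions made for the example; the real contract lives in `benchmarks/spec/product.md`. The helper names `runChecks` and `exitCodeFor` are likewise hypothetical.

```javascript
// Illustrative sketch only. The route (/api/notes) and payload shapes are
// hypothetical stand-ins for the contract in benchmarks/spec/product.md.

// Each check is a named async function that throws when the app misbehaves.
const checks = [
  {
    name: "create a note", // hypothetical baseline check
    async run(api) {
      const res = await api("POST", "/api/notes", { title: "t", body: "b" });
      if (res.status !== 201) throw new Error(`expected 201, got ${res.status}`);
    },
  },
  {
    name: "created note keeps its tags", // hypothetical tag check
    async run(api) {
      const res = await api("POST", "/api/notes", { title: "t", body: "b", tags: ["x"] });
      const note = await res.json();
      if (!Array.isArray(note.tags) || note.tags[0] !== "x") {
        throw new Error("tags were not returned on the created note");
      }
    },
  },
];

// Runs every check against baseUrl and returns one result per check.
// fetchImpl is injectable so the runner can be exercised without a server.
async function runChecks(baseUrl, checkList = checks, fetchImpl = fetch) {
  const api = (method, path, body) =>
    fetchImpl(new URL(path, baseUrl), {
      method,
      headers: { "content-type": "application/json" },
      body: body === undefined ? undefined : JSON.stringify(body),
    });

  const results = [];
  for (const check of checkList) {
    try {
      await check.run(api);
      results.push({ name: check.name, ok: true });
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      results.push({ name: check.name, ok: false, error });
    }
  }
  return results;
}

// Maps results to a process exit code: 0 only when every check passed.
function exitCodeFor(results) {
  return results.every((r) => r.ok) ? 0 : 1;
}

// Only touch the network when BASE_URL is provided.
if (process.env.BASE_URL) {
  runChecks(process.env.BASE_URL).then((results) => {
    for (const r of results) console.log(`${r.ok ? "PASS" : "FAIL"} ${r.name}`);
    process.exitCode = exitCodeFor(results);
  });
}
```

Because the runner only speaks HTTP to `BASE_URL`, the same file can grade either app, for example `BASE_URL=http://localhost:4311 node accept.mjs` for the Next.js baseline, or the same command with port 3100 for the Vike one. Under these assumptions, an unmodified baseline would pass the first check and fail the tag check, giving a non-zero exit code.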