Export skills function added by Mirror Site maintainer #2138
@wuchengan734-a11y is attempting to deploy a commit to the Amantus Machina Team on Vercel. A member of the Team first needs to authorize it.
/review
const startIndexKey: IndexKey = decodedCursor ?? [undefined, endDate];
const endIndexKey: IndexKey = [undefined, startDate];

const result = await getPage(ctx, {
  table: "skillSearchDigest",
  index: "by_active_created",
Codex review: needs real behavior proof before merge.
Latest ClawSweeper review: 2026-05-22 19:20 UTC / May 22, 2026, 3:20 PM ET.
Workflow note: Future ClawSweeper reviews update this same comment in place.
Summary
Reproducibility: yes, for the review blockers. Source inspection of the PR head shows the auth-before-rate-limit ordering, owner visibility mismatch, storage-read-before-cap path, and metadata collision path. I did not execute the branch, and no inspectable runtime proof was provided.
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.
Review details
Best possible solution: Land a documented mirror export endpoint only after it reuses the established public visibility boundary, throttles all attempts, preflights file/byte caps, reserves generated ZIP paths, and has redacted proof from the mirror flow.
Do we have a high-confidence way to reproduce the issue? Yes, for the review blockers, through the source inspection described in the summary. The branch was not executed.
Is this the best way to solve the issue? No. The bulk export direction may be useful, but the current patch is not the narrowest maintainable solution until it matches existing public visibility, throttles failed auth, enforces caps before blob reads, reserves generated ZIP paths, and has mirror-flow proof.
Overall correctness: patch is incorrect
Codex review notes: model gpt-5.5, reasoning high; reviewed against 232017ba6f0e.
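One of the blockers above is that caps should be enforced before any blob is read. The following is a minimal sketch of such a preflight, assuming each skill version already records per-file sizes in its metadata. The type names, field names (`path`, `size`), and the `preflightExport` function are hypothetical and are not taken from the PR.

```typescript
// Hypothetical shapes: the PR's real metadata types are not shown in this thread.
interface FileMeta {
  path: string;
  size: number; // bytes, as recorded in metadata (no blob read needed)
}

interface SkillMeta {
  slug: string;
  files: FileMeta[];
}

interface ExportCaps {
  maxFiles: number;
  maxBytes: number;
}

interface PreflightResult {
  accepted: SkillMeta[]; // skills that fit within the caps, in page order
  totalFiles: number;
  totalBytes: number;
  truncated: boolean; // true if at least one skill was left for a later page
}

// Walk the page in order and stop at the first skill that would exceed a cap.
// Only metadata is inspected, so rejected skills never trigger a storage read.
function preflightExport(skills: SkillMeta[], caps: ExportCaps): PreflightResult {
  const accepted: SkillMeta[] = [];
  let totalFiles = 0;
  let totalBytes = 0;
  for (const skill of skills) {
    const files = skill.files.length;
    const bytes = skill.files.reduce((sum, f) => sum + f.size, 0);
    if (totalFiles + files > caps.maxFiles || totalBytes + bytes > caps.maxBytes) {
      return { accepted, totalFiles, totalBytes, truncated: true };
    }
    accepted.push(skill);
    totalFiles += files;
    totalBytes += bytes;
  }
  return { accepted, totalFiles, totalBytes, truncated: false };
}
```

A real handler would also need a policy for a single skill that exceeds the caps on its own (for example, recording it as an error entry), since this sketch would otherwise return an empty page for it.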
bc9b081 to d69b075
/review
d69b075 to 217930f
/review
ClawSweeper PR egg 🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.
234cec2 to d94c24a
🦞🧹 I asked ClawSweeper to review this item again.
/review
Add a new REST API endpoint that allows authenticated admin users to
export skills in bulk as a merged ZIP archive, designed for the ClawHub
China mirror site to efficiently sync skill data.
- New endpoint: GET /api/v1/skills/export?startDate=&endDate=&limit=&cursor=
- Admin-only auth via requireExportAuth (Bearer token + role check)
- Zip Slip protection: validateSlug + validateFilePath
- Duplicate ZIP path detection in buildMergedExportZip
- Per-skill metadata written to _export_skill_meta.json (avoids collision with skill files)
- Error recording: missing version/blob logged to _errors.json
- Dedicated rate limit tier: export { ip: 10, key: 60, adminKey: 600 }
- Cursor-based pagination on skillSearchDigest.by_active_created index
- Chunked parallel blob reads (50 concurrent)
Co-Authored-By: Claude <noreply@anthropic.com>
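The Zip Slip protection named in the commit message can be illustrated with a short sketch. This is not the PR's `validateSlug` or `validateFilePath`; the helper names below (`isSafeSlug`, `isSafeFilePath`, `toZipEntryPath`) and the assumed slug alphabet are hypothetical, chosen only to show the kind of checks involved.

```typescript
// Illustrative only: the PR's validateSlug/validateFilePath are not shown in this thread.

// Assumed slug alphabet: lowercase letters and digits, with single hyphens between runs.
const SLUG_PATTERN = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

function isSafeSlug(slug: string): boolean {
  return SLUG_PATTERN.test(slug);
}

function isSafeFilePath(path: string): boolean {
  if (path.length === 0) return false;
  if (path.startsWith("/")) return false; // absolute POSIX path
  if (/^[a-zA-Z]:/.test(path)) return false; // Windows drive prefix
  if (path.includes("\\")) return false; // backslash separators
  if (path.includes("\0")) return false; // embedded NUL byte
  // Reject empty, current-directory, and parent-directory segments.
  for (const segment of path.split("/")) {
    if (segment === "" || segment === "." || segment === "..") return false;
  }
  return true;
}

// Build the archive entry name only after both parts pass validation.
function toZipEntryPath(slug: string, filePath: string): string {
  if (!isSafeSlug(slug)) throw new Error(`unsafe slug: ${JSON.stringify(slug)}`);
  if (!isSafeFilePath(filePath)) throw new Error(`unsafe path: ${JSON.stringify(filePath)}`);
  return `${slug}/${filePath}`;
}
```

Validating before joining keeps a crafted slug such as `../../etc` from ever becoming part of an entry name that an extractor might resolve outside the target directory.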
efee6d5 to cb616f3
🦞🧹 I asked ClawSweeper to review this item again.
@wuchengan734-a11y I pushed a follow-up commit to this branch with some updates:
Verified with focused handler/query tests, format, lint, TypeScript, and autoreview. Could you please verify the updated branch against your mirror flow?
The above was generated by my AI, but @wuchengan734-a11y, I don't think we need to require this to be admin only. I don't think we would grant you guys admin access just to do this export. Will the 60 req/min work for your needs? We could definitely bump this but I just used the
Note that I also updated the exported folder structure to use
# Conflicts:
#	convex/skills.ts
🦞🧹 Reason: re-review requires an open issue or PR.
@wuchengan734-a11y also note that in #2376 I capped the pagination at 250 skills. Beyond that Convex was crashing consistently.
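As a rough illustration of the cap mentioned above, a handler could clamp the `limit` query parameter before running the query. This is only a sketch: the function name `resolvePageSize` and the default of 100 are assumptions, and the thread does not say whether the real endpoint clamps oversized values or rejects them.

```typescript
const MAX_EXPORT_PAGE_SIZE = 250; // cap mentioned in the comment above
const DEFAULT_EXPORT_PAGE_SIZE = 100; // assumed default, not taken from the PR

// Parse the raw `limit` query value and keep it within [1, MAX_EXPORT_PAGE_SIZE].
// Missing or malformed values fall back to the default instead of failing.
function resolvePageSize(raw: string | null): number {
  if (raw === null || raw.trim() === "") return DEFAULT_EXPORT_PAGE_SIZE;
  const parsed = Number(raw);
  if (!Number.isInteger(parsed) || parsed < 1) return DEFAULT_EXPORT_PAGE_SIZE;
  return Math.min(parsed, MAX_EXPORT_PAGE_SIZE);
}
```

With this shape, a mirror client that still asks for 1000 skills per page would receive at most 250 and would need to rely on the cursor to continue.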
Thanks for the update. I agree that requiring site-admin access is unnecessary for our mirror flow, so using a normal API token sounds good.
60 req/min is likely fine for incremental sync. We’ll verify whether it is sufficient for initial backfill/full mirror runs and let you know if we need a higher burst or rate limit.
The / folder structure also makes sense given that slugs are no longer globally unique. We’ll update/verify our mirror-side parsing accordingly.
We’ll test the updated branch against our mirror flow. Thanks for your feedback.
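For reference, a mirror-side sync loop under these terms might look like the sketch below. It assumes the `X-Next-Cursor` and `X-Has-More` response headers from the original PR description are still in place after the follow-up commit, and that `X-Has-More` carries the literal string `true`; neither is confirmed in this thread. `FetchLike`, `SyncOptions`, and `syncWindow` are hypothetical names, and the fetch function is injected so the loop can be exercised without network access.

```typescript
// Minimal response surface the loop needs; a real fetch Response satisfies it.
interface ExportResponse {
  ok: boolean;
  status: number;
  headers: { get(name: string): string | null };
  arrayBuffer(): Promise<ArrayBuffer>;
}

type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<ExportResponse>;

interface SyncOptions {
  baseUrl: string; // upstream origin, supplied by the caller
  token: string; // normal API token, sent as a Bearer credential
  startDate: number; // ms since epoch
  endDate: number; // ms since epoch
  limit: number;
  minIntervalMs: number; // pause between pages, e.g. 1000 to stay under 60 req/min
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Fetch every page of one date window and hand each ZIP to `onZip`.
// Returns the number of pages fetched.
async function syncWindow(
  fetchPage: FetchLike,
  opts: SyncOptions,
  onZip: (zip: Uint8Array) => Promise<void>,
): Promise<number> {
  let cursor: string | null = null;
  let pages = 0;
  for (;;) {
    const query: string[] = [
      `startDate=${opts.startDate}`,
      `endDate=${opts.endDate}`,
      `limit=${opts.limit}`,
    ];
    if (cursor !== null) query.push(`cursor=${encodeURIComponent(cursor)}`);
    const url = `${opts.baseUrl}/api/v1/skills/export?${query.join("&")}`;
    const res: ExportResponse = await fetchPage(url, {
      headers: { Authorization: `Bearer ${opts.token}` },
    });
    if (!res.ok) throw new Error(`export failed with HTTP ${res.status}`);
    await onZip(new Uint8Array(await res.arrayBuffer()));
    pages += 1;
    const next: string | null = res.headers.get("X-Next-Cursor");
    const hasMore = res.headers.get("X-Has-More") === "true"; // assumed encoding
    if (!hasMore || next === null || next === "") return pages;
    cursor = next;
    await sleep(opts.minIntervalMs);
  }
}
```

Passing the global `fetch` as `fetchPage` and a `minIntervalMs` of about one second would keep a single worker at or below 60 requests per minute; a backfill that needs more throughput is the case the comment above proposes to measure.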
Summary
We are the ClawHub China mirror site from ByteDance. To keep our mirror in sync with the upstream skill registry, we need an efficient way to bulk-fetch skill data. Currently there is no batch export API: the only option is to download skills one by one, which triggers rate limits and takes hours for 62k+ skills.
This PR adds a new GET /api/v1/skills/export endpoint that returns a merged ZIP archive of up to 1000 skills per request, with cursor-based pagination for full-catalog traversal.
What's changed
New endpoint: GET /api/v1/skills/export?startDate=<ms>&endDate=<ms>&limit=<n>&cursor=<token>
- <slug>/ subdirectories
- _manifest.json (skill metadata array) and _meta.json per skill
- X-Next-Cursor / X-Has-More response headers
Authentication & Authorization (convex/lib/apiTokenAuth.ts)
- requireExportAuth(): enforces Bearer token + admin-only role check
- EXPORT_ALLOWED_ROLES array
Security hardening (convex/lib/skillZip.ts)
- validateSlug(): prevents Zip Slip via malicious slug paths (../../etc)
- validateFilePath(): blocks absolute paths, .. traversal, backslashes, empty segments
- buildMergedExportZip(): detects duplicate ZIP paths to prevent silent overwrites
Error recording (convex/httpApiV1/skillsV1.ts)
- _errors.json inside the ZIP (not silently skipped)
- X-Export-Errors response header reports error count
Rate limiting (convex/lib/httpRateLimit.ts)
- export tier: { ip: 10, key: 60, adminKey: 600 } requests/min
Query layer (convex/skills.ts)
- listByDateRange internalQuery: range scan on skillSearchDigest.by_active_created index
Files changed (16)
- packages/schema/src/routes.ts: skillsExport route constant
- convex/lib/apiTokenAuth.ts: requireExportAuth (admin-only, enabled)
- convex/lib/httpRateLimit.ts: export rate limit tier
- convex/lib/skillZip.ts: validateSlug, validateFilePath, buildMergedExportZip
- convex/skills.ts: listByDateRange internalQuery
- convex/httpApiV1/skillsV1.ts: exportSkillsV1Handler with security checks
- convex/httpApiV1.ts
- convex/http.ts
- convex/devSeedExport.ts
- test_export_api.sh
- convex/lib/access.ts
- convex/_generated/api.d.ts
- packages/schema/dist/*
- readme_mirror_sync.md
Testing
- npx tsc --noEmit: 0 errors
- bun run test (vitest): 180 test files, 1701 tests passed, 0 failed
- bash test_export_api.sh: 31/31 passed, covering:
Notes
- No breaking changes
- export rate limit tier is additive (existing read/write/download unchanged)
- skillZip.ts additions are new exports; existing buildDeterministicZip unchanged
Reviewer attention points
- adminKey: 600: The admin rate limit for export is generous (600 req/min). This was intentional to allow fast full-catalog sync, but could be reduced to 60-120 if the team prefers a more conservative setting.
- Uint8Array.from(zipSync(...)) in buildMergedExportZip creates a copy of the ZIP buffer. Could be optimized to return zipSync() directly if memory pressure is a concern.
Context
We are the ClawHub China mirror site operators. China mainland users experience high latency and intermittent connectivity when accessing clawhub.ai directly. This export API enables us to efficiently bulk-sync the full skill catalog to our mirror, keeping it up-to-date with hourly incremental exports using date-range queries + cursor pagination. Without this endpoint, syncing 62k+ skills requires individual downloads that take 10+ hours and frequently hit rate limits.
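The hourly incremental export described above can be sketched as a function that splits the time since the last successful sync into date windows, each of which would then be paged through with the cursor. `ExportWindow` and `hourlyWindows` are hypothetical names. The description does not state whether `startDate` and `endDate` are inclusive, so adjacent windows here share a boundary timestamp and the mirror is assumed to deduplicate.

```typescript
const HOUR_MS = 60 * 60 * 1000;

interface ExportWindow {
  startDate: number; // ms since epoch
  endDate: number; // ms since epoch
}

// Split the span (lastSyncedMs, nowMs) into windows of at most one hour.
// A run that was skipped for several hours simply yields several windows.
function hourlyWindows(lastSyncedMs: number, nowMs: number): ExportWindow[] {
  if (!Number.isFinite(lastSyncedMs) || !Number.isFinite(nowMs)) {
    throw new Error("timestamps must be finite numbers");
  }
  const windows: ExportWindow[] = [];
  let start = lastSyncedMs;
  while (start < nowMs) {
    const end = Math.min(start + HOUR_MS, nowMs);
    windows.push({ startDate: start, endDate: end });
    start = end;
  }
  return windows;
}
```

Persisting the `endDate` of the last fully processed window as the next `lastSyncedMs` would let an interrupted run resume without re-exporting earlier windows.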