feat(cloud): upload compressed sqlite bundles #78

Merged
vincentkoc merged 1 commit into main from feature/compressed-sqlite-bundles
May 28, 2026

@vincentkoc
Member

Summary

  • bump crawlkit to v0.11.0
  • publish sanitized SQLite mirrors as gzip chunk bundles with explicit count/privacy manifests
  • keep D1 row ingest intact and verify the compressed bundle still excludes DM rows

Validation

  • GOWORK=off go test -count=1 ./... -coverprofile=coverage.out
  • filtered total coverage: 85.0%
  • GOWORK=off go vet ./...
  • git diff --check

@vincentkoc vincentkoc marked this pull request as ready for review May 28, 2026 18:47
@vincentkoc vincentkoc requested a review from a team as a code owner May 28, 2026 18:47
@clawsweeper

clawsweeper Bot commented May 28, 2026

Codex review: needs changes before merge. Reviewed May 28, 2026, 2:49 PM ET / 18:49 UTC.

Summary
The branch bumps github.com/openclaw/crawlkit to v0.11.0 and changes discrawl cloud publish to upload a sanitized SQLite archive as a gzip bundle with a count/privacy manifest while keeping D1 ingest tests around non-DM rows.

Reproducibility: not applicable. This is a feature PR that changes the cloud SQLite mirror upload format, not a bug report with a failing current-main reproduction path.

Review metrics: 2 noteworthy metrics.

  • PR surface: 5 files changed, +113/-72. The change mixes dependency, CLI publish contract, tests, and release-note edits, so review should check both behavior and upgrade shape.
  • Cloud output key: 1 key renamed. sqlite_object is replaced by sqlite_bundle, which is the main compatibility-sensitive part of the diff.

Merge readiness
Overall: 🧂 unranked krab
Proof: 🧂 unranked krab
Patch quality: 🦐 gold shrimp
Result: blocked until real behavior proof is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P1] Add redacted terminal output or logs from an after-fix discrawl cloud publish run that shows the bundle upload path.
  • Preserve or deliberately migrate the existing sqlite_object output contract.
  • Remove the normal-PR CHANGELOG.md edit.

Proof guidance:

  • [P1] Needs real behavior proof before merge: The PR body lists tests and vet output but does not include after-fix real behavior proof from a live or local cloud publish run; redacted terminal output or logs showing the bundle upload would satisfy the gate. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Risk before merge

  • [P1] Changing the published result key from sqlite_object to sqlite_bundle can break scripts or operators that parse current discrawl cloud publish output without a compatibility period.
  • [P1] The feature relies on new crawlkit bundle/upload behavior, so review should verify the worker/R2 side expects the new upload headers and manifest contract before merge.
  • [P1] The branch edits CHANGELOG.md, but repository policy says normal PRs should leave release-note aggregation to the release process.
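
For operators worried about the key rename, a consumer can be made tolerant of both shapes. The sketch below assumes the publish output is a flat JSON object whose `sqlite_bundle` and `sqlite_object` values are strings; that shape, the `publishOutput` type, and the `archiveKey` helper are assumptions for illustration, not something the PR confirms.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// publishOutput models only the two keys discussed in the review.
// The flat-object-of-strings shape is an assumption.
type publishOutput struct {
	SQLiteBundle string `json:"sqlite_bundle"`
	SQLiteObject string `json:"sqlite_object"`
}

// archiveKey prefers the new sqlite_bundle key and falls back to the
// older sqlite_object key, so one parser works before and after upgrade.
func archiveKey(output []byte) (string, error) {
	var parsed publishOutput
	if err := json.Unmarshal(output, &parsed); err != nil {
		return "", fmt.Errorf("parse publish output: %w", err)
	}
	switch {
	case parsed.SQLiteBundle != "":
		return parsed.SQLiteBundle, nil
	case parsed.SQLiteObject != "":
		return parsed.SQLiteObject, nil
	}
	return "", errors.New("publish output has neither sqlite_bundle nor sqlite_object")
}

func main() {
	// Placeholder values; real output will carry whatever the CLI reports.
	for _, sample := range []string{
		`{"sqlite_object":"old-style-value"}`,
		`{"sqlite_bundle":"new-style-value"}`,
	} {
		key, err := archiveKey([]byte(sample))
		if err != nil {
			panic(err)
		}
		fmt.Println(key)
	}
}
```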

Maintainer options:

  1. Preserve publish-output compatibility (recommended)
    Keep sqlite_object for existing consumers, add sqlite_bundle alongside it or document and test a deliberate migration path before merging.
  2. Accept the breaking output rename
    Maintainers can intentionally accept the sqlite_object to sqlite_bundle rename if they are comfortable with a user-visible CLI contract break in this feature release.
Recommended automerge instruction:
@clawsweeper automerge

Special instructions:
Preserve the current `discrawl cloud publish` result contract by keeping `sqlite_object` or adding a tested compatibility field while introducing `sqlite_bundle`; remove the normal-PR `CHANGELOG.md` edit and keep release-note context in the PR body or commit message.

Next step before merge

  • [P2] A repair worker can make the merge-blocking fixes mechanically by preserving the publish output contract and removing the release-owned changelog edit.

Security
Cleared: The diff does not broaden secrets, permissions, CI, install scripts, or dependency sources beyond the expected crawlkit version bump for the new bundle API.

Review findings

  • [P1] Preserve the existing publish result key — internal/cli/cloud_commands.go:132
  • [P3] Move release-note text out of CHANGELOG.md — CHANGELOG.md:21-22
Review details

Best possible solution:

Keep the compressed bundle behavior, but preserve or intentionally migrate the existing publish response contract and move release-note context out of CHANGELOG.md before merge.

Do we have a high-confidence way to reproduce the issue?

Not applicable: this is a feature PR for changing the cloud SQLite mirror upload format, not a bug report with a failing current-main reproduction path.

Is this the best way to solve the issue?

Unclear: uploading compressed SQLite bundles is a plausible direction, but replacing the existing output key and editing CHANGELOG.md are not the narrowest maintainable merge shape under the repository guidance.

Full review comments:

  • [P1] Preserve the existing publish result key — internal/cli/cloud_commands.go:132
    discrawl cloud publish currently reports the uploaded archive under sqlite_object; replacing it with sqlite_bundle makes existing parsers lose the field on upgrade. Please keep the old key as a compatibility alias, or add an explicit tested migration path if this rename is intentional.
    Confidence: 0.86
  • [P3] Move release-note text out of CHANGELOG.md — CHANGELOG.md:21-22
    Repository guidance says normal PRs should not edit CHANGELOG.md because release aggregation is handled separately. Please keep this release-note context in the PR body or commit message instead of changing the release-owned changelog file.
    Confidence: 0.88

Overall correctness: patch is incorrect
Overall confidence: 0.86

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 2e336070ed6d.

Label changes:

  • add P2: This is a normal-priority cloud publishing enhancement with limited blast radius but real compatibility concerns for publish output consumers.
  • add merge-risk: 🚨 compatibility: The PR changes an existing discrawl cloud publish result key from sqlite_object to sqlite_bundle without a compatibility shim or documented migration.
  • add rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🧂 unranked krab and patch quality is 🦐 gold shrimp.
  • add status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. The PR body lists tests and vet output but does not include after-fix real behavior proof from a live or local cloud publish run; redacted terminal output or logs showing the bundle upload would satisfy the gate. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Label justifications:

  • P2: This is a normal-priority cloud publishing enhancement with limited blast radius but real compatibility concerns for publish output consumers.
  • merge-risk: 🚨 compatibility: The PR changes an existing discrawl cloud publish result key from sqlite_object to sqlite_bundle without a compatibility shim or documented migration.
  • rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🧂 unranked krab and patch quality is 🦐 gold shrimp.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. The PR body lists tests and vet output but does not include after-fix real behavior proof from a live or local cloud publish run; redacted terminal output or logs showing the bundle upload would satisfy the gate. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.
Evidence reviewed

Acceptance criteria:

  • [P1] GOWORK=off go test -count=1 ./...
  • [P1] GOWORK=off go vet ./...
  • [P1] git diff --check

What I checked:

  • Repository policy read: Target AGENTS.md was read in full; it says changes must be covered by appropriate checks and normal PRs should not edit CHANGELOG.md because release note aggregation is separate. (AGENTS.md:1, 2e336070ed6d)
  • Existing cloud publish contract: Current main prints sqlite_object in the publish result after uploading the SQLite mirror, so consumers may already parse that key. (internal/cli/cloud_commands.go:127, 2e336070ed6d)
  • PR contract change: The PR diff replaces the emitted sqlite_object key with sqlite_bundle, changing the CLI JSON/status surface rather than adding the bundle field compatibly. (internal/cli/cloud_commands.go:132, 1790a4a7009a)
  • PR dependency change: The PR relies on github.com/openclaw/crawlkit v0.11.0 for BuildGzipSQLiteBundle and UploadSQLiteBundle, so the bundle implementation largely comes from the bumped dependency. (go.mod:47, 1790a4a7009a)
  • Current cloud area history: Recent main history shows cloud publishing and SQLite mirroring work landed in commits c83027c and 479ba45 before this PR; this is the code path the PR extends. (internal/cli/cloud_commands.go:116, c83027c03ef0)

Likely related people:

  • vincentkoc: Current-main blame and history for runCloudPublish and SQLite archive mirroring point to recent cloud publish commits in the same files touched by this PR. (role: recent area contributor; confidence: high; commits: c83027c03ef0, 479ba459ef7e; files: internal/cli/cloud_commands.go, internal/cli/cli_test.go)
  • Nate Spinale: The PR depends on new crawlkit SQLite bundle APIs; local module history only exposes the dependency bump here, while ownership of the external package is outside this checkout. (role: adjacent dependency contributor; confidence: low; commits: 1790a4a7009a; files: go.mod, go.sum)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added labels May 28, 2026:

  • rating: 🧂 unranked krab (Not merge-ready due to missing proof or serious correctness/safety concerns.)
  • status: 📣 needs proof (The PR needs real behavior proof before ClawSweeper can clear the contributor ask.)
  • P2 (Normal priority bug or improvement with limited blast radius.)
  • merge-risk: 🚨 compatibility (🚨 Merging this PR could break existing users, config, migrations, defaults, or upgrades.)

@vincentkoc vincentkoc force-pushed the feature/compressed-sqlite-bundles branch from 1790a4a to 45e25be May 28, 2026 18:57
@vincentkoc vincentkoc merged commit 09e3701 into main May 28, 2026
10 checks passed
@vincentkoc vincentkoc deleted the feature/compressed-sqlite-bundles branch May 28, 2026 19:01

Labels

  • merge-risk: 🚨 compatibility (🚨 Merging this PR could break existing users, config, migrations, defaults, or upgrades.)
  • P2 (Normal priority bug or improvement with limited blast radius.)
  • rating: 🧂 unranked krab (Not merge-ready due to missing proof or serious correctness/safety concerns.)
  • status: 📣 needs proof (The PR needs real behavior proof before ClawSweeper can clear the contributor ask.)
