You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Problem: large Git snapshot imports can exceed memory on small hosts during SQLite import / FTS rebuild.
Why it matters: a reported 7.3GB local archive on a ~7.4GB RAM server could not reliably complete discrawl update; the process was killed under memory pressure.
What changed: snapshot imports now use file-backed SQLite temporary storage and a smaller page cache (32 MiB instead of 256 MiB), while preserving WAL crash recovery settings.
What did NOT change (scope boundary): no CLI behavior changed; no discrawl update flags were added; no git wrapper / pull / checkout behavior was changed.
Update behavior: ImportIfChanged skips already-imported manifests, uses incremental import when a previous manifest supports it, and falls back to full import only for first imports or unsupported snapshot shape changes. This PR mainly affects full imports and FTS-rebuild imports.
Change Type (select all)
Bug fix
Feature
Refactor required for the fix
Docs
Security hardening
Chore/infra
Scope (select all touched areas)
Gateway / orchestration
Skills / tool execution
Auth / tokens
Memory / storage
Integrations
API / contracts
UI / DX
CI/CD / infra
Linked Issue/PR
Closes N/A
Related N/A
This PR fixes a bug or regression
Real behavior proof (required for external PRs)
Behavior or issue addressed: memory pressure / OOM during SQLite snapshot import and FTS rebuild.
Real environment tested:
Host: Linux x86_64, 32 vCPU, 62 GiB RAM (nproc = 32; free -h reports 62Gi total memory)
Repo under test: /data/code/openclaw/discrawl-oom-import-memory
Real snapshot repo: /data/code/openclaw/discord-store, du -sh reports 7.2G
Snapshot import row count observed from manifest/progress: 2,305,374
Exact steps or command run after this patch:
cd /data/code/openclaw/discrawl-oom-import-memory
/usr/bin/time -v env \
GOTOOLCHAIN=auto \
DISCRAWL_REAL_REPO=/data/code/openclaw/discord-store \
go test ./internal/share -run TestImportRealSnapshot -count=1 -timeout=90m -v
Evidence after fix:
=== RUN TestImportRealSnapshot
import_memory_test.go:36: import progress phase=start total_rows=2305374
import_memory_test.go:36: import progress phase=rebuild_fts total_rows=0
import_memory_test.go:36: import progress phase=done total_rows=2305374
--- PASS: TestImportRealSnapshot (576.16s)
PASS
ok github.com/openclaw/discrawl/internal/share 576.169s
Command being timed: "env GOTOOLCHAIN=auto DISCRAWL_REAL_REPO=/data/code/openclaw/discord-store go test ./internal/share -run TestImportRealSnapshot -count=1 -timeout=90m -v"
Elapsed (wall clock) time (h:mm:ss or m:ss): 9:36.64
Maximum resident set size (kbytes): 357076
Exit status: 0
Observed result after fix: full real snapshot import completed, FTS rebuild completed, and the test verified messages count equals message_fts count.
What was not tested: the git wrapper / discrawl update pull failure was intentionally out of scope for this PR.
Post-fix, the exact same memory-limited Docker command passes:
=== RUN TestImportMemoryBounded
import_memory_test.go:29: building synthetic snapshot rows=80000 text_bytes=2048
import_memory_test.go:31: synthetic snapshot built; starting import
--- PASS: TestImportMemoryBounded (55.45s)
PASS
ok github.com/openclaw/discrawl/internal/share 55.453s
Performance comparison on the same high-memory host (32 vCPU / 62 GiB RAM) and real 7.2G snapshot (2,305,374 rows), both measured with /usr/bin/time -v:
Baseline main (temp_store=memory, cache_size=-262144):
Elapsed: 9:35.14
Max RSS: 1,606,248 KB
This PR (temp_store=file, cache_size=-32768):
Elapsed: 9:36.64
Max RSS: 357,076 KB
Observed trade-off in this 32 vCPU / 62 GiB RAM run: wall time increased by ~1.5s (~0.3%) while Max RSS dropped by ~1.25GB (~78%). The speed trade-off may differ on slower disks or smaller hosts, but this high-memory baseline did not show a meaningful slowdown.
Root Cause (if applicable)
Root cause: applyImportPragmas forced pragma temp_store = memory and set pragma cache_size = -262144 (~256 MiB) for imports that can touch most of the archive and rebuild FTS indexes. On memory-constrained hosts, SQLite temporary structures plus page cache plus Go process memory can exceed available RAM.
Missing detection / guardrail: no memory-limited import regression existed; default unit tests used small fixtures and did not exercise large full import + FTS rebuild under cgroup limits.
Contributing context (if known): the JSONL gzip import path itself is streaming; the higher-risk phase is SQLite import/FTS rebuild configuration rather than reading the whole snapshot into Go memory.
Historical context: these import PRAGMAs were introduced in 9e2fd991 (perf: speed up git snapshot imports) as an import-speed optimization. That original change also disabled journaling. A later hardening change, 0487ccc1 (fix: harden discrawl archive imports), restored WAL / synchronous safety but left temp_store=memory and the 256 MiB cache unchanged. This PR continues that reliability direction by bounding memory while preserving the measured import throughput.
internal/share/import_memory_test.go: opt-in synthetic OOM regression and opt-in real snapshot validation.
Scenario the test should lock in:
imports use file-backed temp storage and bounded cache;
a synthetic large snapshot import + FTS rebuild completes under Docker memory limits after the fix;
a real snapshot import can be validated by maintainers with DISCRAWL_REAL_REPO=/path/to/store.
Why this is the smallest reliable guardrail: the default test checks the exact SQLite PRAGMA settings quickly, while the heavier OOM reproduction is opt-in so CI does not become slow or resource-dependent.
Existing test that already covers this (if any): none for memory-limited large import.
If no new test is added, why not: N/A.
User-visible / Behavior Changes
None. No CLI flags, config fields, git behavior, or output format changed.
Diagram (if applicable)
Before:
snapshot import -> SQLite temp_store=memory + 256 MiB cache -> full import/FTS rebuild -> high RSS / possible OOM
After:
snapshot import -> SQLite temp_store=file + 32 MiB cache -> full import/FTS rebuild -> bounded memory, same imported data
Security Impact (required)
New permissions/capabilities? No
Secrets/tokens handling changed? No
New/changed network calls? No
Command/tool execution surface changed? No
Data access scope changed? No
If any Yes, explain risk + mitigation: N/A
Repro + Verification
Environment
OS: Linux x86_64
Host resources for real-data benchmark: 32 vCPU, 62 GiB RAM
Runtime/container: Docker --memory=768m --memory-swap=768m for synthetic OOM repro; host Go test for real snapshot validation.
Model/provider: N/A
Integration/channel (if any): N/A
Relevant config (redacted): snapshot repo path only; no tokens/secrets used.
Steps
Run the synthetic memory regression in a constrained Docker container with DISCRAWL_OOM_REGRESSION=1.
Confirm pre-fix behavior is signal: killed after import starts.
Apply this patch.
Run the same Docker command again and confirm it passes.
Optionally validate a real snapshot with:
DISCRAWL_REAL_REPO=/path/to/discord-store \
go test ./internal/share -run TestImportRealSnapshot -count=1 -timeout=90m -v
Expected
Synthetic large import completes under the configured memory limit after the fix.
Real snapshot import completes and message_fts row count matches messages row count.
Post-fix real snapshot run: PASS, Max RSS 357,076 KB, 2,305,374 rows.
Same-host real snapshot comparison: baseline 9:35.14 / 1,606,248 KB; this PR 9:36.64 / 357,076 KB.
Evidence
Attach at least one:
Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)
Human Verification (required)
What I personally verified (not just CI), and how:
Verified scenarios:
pre-fix synthetic import under Docker 768 MiB was killed;
post-fix synthetic import under the same Docker limit passed;
post-fix real /data/code/openclaw/discord-store import passed;
compared baseline vs this PR on the same real snapshot and host;
GOTOOLCHAIN=auto go test ./internal/share passed.
Edge cases checked:
crash recovery settings remain enabled (journal_mode is not off; synchronous is non-zero);
opt-in heavy tests skip by default unless env vars are set.
What I did not verify:
discrawl update git wrapper behavior; intentionally out of scope.
Windows-specific import behavior.
Review Conversations
I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.
No bot review conversations have been addressed yet.
Compatibility / Migration
Backward compatible? Yes
Config/env changes? No user config changes. New env vars are test-only: DISCRAWL_OOM_REGRESSION, DISCRAWL_OOM_ROWS, DISCRAWL_OOM_TEXT_BYTES, DISCRAWL_REAL_REPO.
Migration needed? No
If yes, exact upgrade steps: N/A
Risks and Mitigations
Risk: file-backed SQLite temporary storage may be slower than memory temp storage on large imports.
Mitigation: this only affects snapshot import/rebuild phases; it trades memory headroom for possible speed impact. On the measured 7.2G real snapshot using a 32 vCPU / 62 GiB RAM host, wall time changed from 9:35.14 to 9:36.64 (~+0.3%) while Max RSS dropped from 1,606,248 KB to 357,076 KB.
Risk: file-backed temporary storage can increase temporary disk writes during full imports / FTS rebuilds, which may matter on slow disks, constrained ephemeral disks, or SSD write-budget-sensitive deployments.
Mitigation: the write amplification is limited to snapshot import/rebuild paths, not steady-state reads/searches. update is not always a full import: same manifests are skipped, supported manifest deltas use incremental import, and this PR mainly affects first imports or imports that must rebuild FTS. Operators running very large imports should keep enough temp disk space available.
Risk: smaller SQLite page cache may reduce import throughput.
Mitigation: the cache remains bounded at 32 MiB and the same-host real-data comparison showed negligible throughput impact for this snapshot.
hxy91819
changed the title
fix: bound SQLite import memory
fix(update): avoid OOM during large SQLite snapshot imports
May 13, 2026
Host: Ubuntu 24.04, Linux 6.8.0-71-generic (x64), 2 vCPU, 7.4GB RAM
Database: ~/.discrawl/discrawl.db — 7.3GB
Build: go build ./cmd/discrawl from branch codebuddy/oom-import-memory (v0.7.1)
Before fix (v0.7.0)
Metric
Value
Peak RSS
807MB (10.3% of 7.4GB)
Result
SIGKILL (OOM)
Runtime
>8min, never completed
DB write
None (killed before completion)
Metric
Value
Peak RSS (VmHWM)
232MB (3.1% of 7.4GB)
VmPeak
234MB
Result
Running, not killed
Runtime
>14min (process killed manually for time; was progressing normally)
DB write
7.3GB → 7.4GB (normal incremental)
• Memory reduction: 71% (807MB → 232MB)
• OOM resolved: process completes without SIGKILL
• DB integrity: normal incremental writes observed
• Runtime is I/O bound on this low-spec VPS (7.4GB RAM); larger machines should be significantly faster.
• The functional change is limited to import-time SQLite pragma
tuning (temp_store=file, smaller cache_size) plus tests that
validate import behavior under memory pressure. I did not find
a discrete, actionable regression introduced by this diff that
would break existing behavior or correctness.
Local full gate on PR code: GOWORK=off go test ./... passed.
Focused package gate: go test ./internal/share passed.
Reduced opt-in synthetic import path: DISCRAWL_OOM_REGRESSION=1 DISCRAWL_OOM_ROWS=50 DISCRAWL_OOM_TEXT_BYTES=256 GOWORK=off go test ./internal/share -run TestImportMemoryBounded -count=1 -v passed.
Live real snapshot import: /usr/bin/time -l env DISCRAWL_REAL_REPO=/Users/steipete/.discrawl/share GOWORK=off go test ./internal/share -run TestImportRealSnapshot -count=1 -timeout=90m -v passed against a 7.6G local share repo, importing 2,301,330 rows into a temp DB and rebuilding FTS. Runtime 180.45s; max resident set size 391,086,080 bytes.
GitHub checks on refreshed SHA c91e462: ci 25803359920, CodeQL 25803360121, Security Gate: Secret Scanning 25803359995 all passed.
Known proof gap: I did not mutate the production archive DB; the live import test writes to a temp DB by design.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
discrawl update; the process was killed under memory pressure.32 MiBinstead of256 MiB), while preserving WAL crash recovery settings.discrawl updateflags were added; no git wrapper / pull / checkout behavior was changed.ImportIfChangedskips already-imported manifests, uses incremental import when a previous manifest supports it, and falls back to full import only for first imports or unsupported snapshot shape changes. This PR mainly affects full imports and FTS-rebuild imports.Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
Real behavior proof (required for external PRs)
nproc= 32;free -hreports62Gitotal memory)/data/code/openclaw/discrawl-oom-import-memory/data/code/openclaw/discord-store,du -shreports7.2G2,305,374messagescount equalsmessage_ftscount.discrawl updatepull failure was intentionally out of scope for this PR.docker run --rm \ --memory=768m --memory-swap=768m \ -v /usr/local/go:/usr/local/go:ro \ -v /root/go:/root/go \ -v /root/.cache/go-build:/root/.cache/go-build \ -v /data/code/openclaw/discrawl-oom-import-memory:/src \ -w /src \ -e PATH=/usr/local/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \ -e GOTOOLCHAIN=auto \ -e DISCRAWL_OOM_REGRESSION=1 \ -e DISCRAWL_OOM_ROWS=80000 \ -e DISCRAWL_OOM_TEXT_BYTES=2048 \ openclaw-e2e-systemd-node:latest \ go test ./internal/share -run TestImportMemoryBounded -count=1 -timeout=30m -vPre-fix output after the synthetic snapshot was built and import started:
Post-fix, the exact same memory-limited Docker command passes:
Performance comparison on the same high-memory host (32 vCPU / 62 GiB RAM) and real 7.2G snapshot (
2,305,374rows), both measured with/usr/bin/time -v:Observed trade-off in this 32 vCPU / 62 GiB RAM run: wall time increased by ~1.5s (~0.3%) while Max RSS dropped by ~1.25GB (~78%). The speed trade-off may differ on slower disks or smaller hosts, but this high-memory baseline did not show a meaningful slowdown.
Root Cause (if applicable)
applyImportPragmasforcedpragma temp_store = memoryand setpragma cache_size = -262144(~256 MiB) for imports that can touch most of the archive and rebuild FTS indexes. On memory-constrained hosts, SQLite temporary structures plus page cache plus Go process memory can exceed available RAM.9e2fd991(perf: speed up git snapshot imports) as an import-speed optimization. That original change also disabled journaling. A later hardening change,0487ccc1(fix: harden discrawl archive imports), restored WAL / synchronous safety but lefttemp_store=memoryand the 256 MiB cache unchanged. This PR continues that reliability direction by bounding memory while preserving the measured import throughput.Regression Test Plan (if applicable)
internal/share/share_test.go: lightweight PRAGMA regression.internal/share/import_memory_test.go: opt-in synthetic OOM regression and opt-in real snapshot validation.DISCRAWL_REAL_REPO=/path/to/store.User-visible / Behavior Changes
None. No CLI flags, config fields, git behavior, or output format changed.
Diagram (if applicable)
Security Impact (required)
Yes, explain risk + mitigation: N/ARepro + Verification
Environment
--memory=768m --memory-swap=768mfor synthetic OOM repro; host Go test for real snapshot validation.Steps
DISCRAWL_OOM_REGRESSION=1.signal: killedafter import starts.DISCRAWL_REAL_REPO=/path/to/discord-store \ go test ./internal/share -run TestImportRealSnapshot -count=1 -timeout=90m -vExpected
message_ftsrow count matchesmessagesrow count.Actual
signal: killed.PASS.PASS, Max RSS357,076 KB,2,305,374rows.9:35.14/1,606,248 KB; this PR9:36.64/357,076 KB.Evidence
Attach at least one:
Human Verification (required)
What I personally verified (not just CI), and how:
/data/code/openclaw/discord-storeimport passed;GOTOOLCHAIN=auto go test ./internal/sharepassed.journal_modeis notoff;synchronousis non-zero);discrawl updategit wrapper behavior; intentionally out of scope.Review Conversations
No bot review conversations have been addressed yet.
Compatibility / Migration
DISCRAWL_OOM_REGRESSION,DISCRAWL_OOM_ROWS,DISCRAWL_OOM_TEXT_BYTES,DISCRAWL_REAL_REPO.Risks and Mitigations
9:35.14to9:36.64(~+0.3%) while Max RSS dropped from1,606,248 KBto357,076 KB.updateis not always a full import: same manifests are skipped, supported manifest deltas use incremental import, and this PR mainly affects first imports or imports that must rebuild FTS. Operators running very large imports should keep enough temp disk space available.