Skip to content

docs: add backfill step and memory quality guidance for semantic search#3

Closed
gilinachum wants to merge 1 commit intoinceptionstack:mainfrom
gilinachum:improve/memory-search-quality-guidance
Closed

docs: add backfill step and memory quality guidance for semantic search#3
gilinachum wants to merge 1 commit intoinceptionstack:mainfrom
gilinachum:improve/memory-search-quality-guidance

Conversation

@gilinachum
Copy link
Copy Markdown
Contributor

Two gaps found during real-world installation:

  1. Missing backfill step: existing memory files are not automatically indexed when enabling semantic search for the first time. Without 'openclaw memory index --force', all prior memory is silently missing from search results.

  2. Heartbeat noise tanks vector quality: high-frequency repetitive content (daily heartbeat logs with 'no change' entries) compresses into a dense cluster in vector space, raising the similarity floor and pushing useful memories below the retrieval threshold.

Added:

  • Step 5: explicit backfill instruction after setup
  • 'Memory quality matters' section explaining the problem
  • Clear rule: only write what changed (decisions, bugs, alerts)
  • Guidance on routing heartbeat logs to a separate excluded file pattern
  • Example exclude config for openclaw.json

Two gaps found during real-world installation:

1. Missing backfill step: existing memory files are not automatically
   indexed when enabling semantic search for the first time. Without
   'openclaw memory index --force', all prior memory is silently missing
   from search results.

2. Heartbeat noise tanks vector quality: high-frequency repetitive
   content (daily heartbeat logs with 'no change' entries) compresses
   into a dense cluster in vector space, raising the similarity floor
   and pushing useful memories below the retrieval threshold.

Added:
- Step 5: explicit backfill instruction after setup
- 'Memory quality matters' section explaining the problem
- Clear rule: only write what changed (decisions, bugs, alerts)
- Guidance on routing heartbeat logs to a separate excluded file pattern
- Example exclude config for openclaw.json
loki-bedlam pushed a commit that referenced this pull request Mar 31, 2026
Two gaps found during real-world installation:

1. Missing backfill step: existing memory files are not automatically indexed
   when enabling semantic search for the first time. Without
   'openclaw memory index --force', all prior memory is silently missing
   from search results.

2. Heartbeat noise tanks vector quality: high-frequency repetitive content
   (daily heartbeat logs with 'no change' entries) compresses into a dense
   cluster in vector space, raising the similarity floor and pushing useful
   memories below the retrieval threshold.

Added:
- Step 5: explicit backfill instruction after setup
- 'Memory quality matters' section explaining the problem
- Clear rule: only write what changed (decisions, bugs, alerts)
- Guidance on routing heartbeat logs to a separate excluded file pattern
- Example exclude config for openclaw.json

Based on PR #3 by @gilinachum, rebased onto current main with multi-agent
file structure (bootstraps/essential/) and bedrockify references.
loki-bedlam pushed a commit that referenced this pull request Mar 31, 2026
…bootstrap

Incorporates the improvements from PR #3 by @gilinachum, applied to the
current multi-agent file structure (bootstraps/essential/, bedrockify on
port 8090, OpenClaw/Hermes sections).

- Step 5: backfill existing memory with 'openclaw memory index --force'
- Memory quality section: heartbeat noise degrades vector search
- Rule: only write what changed (decisions, bugs, alerts)
- Exclude pattern config for heartbeat files

Co-authored-by: Gili Nachum <gilinachum@users.noreply.github.com>
@loki-bedlam
Copy link
Copy Markdown
Member

Hey @gilinachum — great catch on both the backfill gap and the heartbeat noise problem. These are real issues we hit in production too.

I've merged your improvements directly into main (commit 2b25946) to save you the rebase work — the file structure changed significantly since you opened this PR:

  • Path moved: essential/bootstraps/essential/
  • embedrock replaced by bedrockify (port 8089 → 8090)
  • Multi-agent sections added: file now has OpenClaw-specific and Hermes-specific sections

Your content (Step 5 backfill, memory quality guidance, heartbeat exclusion config) is now in the OpenClaw section where it belongs, with your co-author credit on the commit.

Closing this PR since it's been incorporated. Thanks for the contribution! 🙏

royosherove added a commit that referenced this pull request Apr 16, 2026
Review run by packs/codex-cli against PR #16 returned 'OVERALL VERDICT: block'
with 1 BLOCKER + 4 HIGH + 3 MEDIUM. This commit addresses all of them.

BLOCKER — secret on argv is a real leak path
---------------------------------------------
Removed --kiro-api-key from the documented flow; it's accepted only as a
hidden back-compat flag, and the pack now emits a loud warning pointing
at the argv leak (shell history + /proc/<pid>/cmdline). The manifest
param and help docs only advertise --from-secret. The kiro-cli v2 auth
doc itself recommends the env-var pattern; --from-secret gives us that
without ever writing the raw key to deploy state.

HIGH #1 — new params unreachable via install.lowkey.run
-------------------------------------------------------
Threaded --kiro-from-secret end-to-end:
  install.sh top-level:  --kiro-from-secret CLI flag → KiroFromSecret CFN
                         param in PARAM_*_NAMES arrays
  CFN template:          new KiroFromSecret parameter (plain String, not
                         NoEcho — it's just a reference), exported as
                         KIRO_FROM_SECRET env var into UserData, passed
                         to bootstrap.sh as --kiro-from-secret
  Terraform:             new kiro_from_secret variable, threaded through
                         main.tf templatefile call and userdata.sh.tpl
  bootstrap.sh:          accepts --kiro-from-secret, writes 'from-secret'
                         key into /tmp/loki-pack-config.json so
                         pack_config_get picks it up
  kiro-cli pack:         already reads from-secret via pack_config_get
                         (unchanged)
Result: 'curl install.lowkey.run | bash --kiro-from-secret /my/secret' now
actually works through the full CFN deploy path. Terraform flow likewise.
Raw key never touches CFN state / Terraform state / UserData logs — only
the secret ARN does, and IAM gates who can resolve it.

HIGH #2 — arg parser not strict enough
--------------------------------------
- --kiro-api-key: value must not start with '-' (matches codex-cli).
  Refuses '--kiro-api-key --from-secret foo' which previously set the
  key to the literal string '--from-secret'.
- --model: no longer silently swallowed when no value given. Exits 2.
- --region, --from-secret: already had the '-' guard.

HIGH #3 — mutex conflict exit code
----------------------------------
--kiro-api-key + --from-secret now exits 2 (bad-args) not 1 (runtime).
Matches the style of the rest of the parser.

HIGH #4 — 'None' bug on empty SecretString
------------------------------------------
'aws secretsmanager get-secret-value --output text' returns the literal
string 'None' when SecretString is empty. The previous non-empty check
would happily accept 'None' and write it as the API key. Switched to
--output json and a jq filter that returns empty for missing/empty
SecretString, then explicit fail if jq returns nothing. Added jq to the
require_cmd list when --from-secret is in use.

MEDIUM #1 — test.sh security coverage
-------------------------------------
test.sh now has 53 assertions (was 32):
  + arg parser exit codes: --kiro-api-key with '-' value, --model no-value,
    --region with '-' value, --from-secret with '-' value, mutex conflict
  + secure storage: chmod 600 present, umask 077 present, %q escape used,
    shell-profile.sh is secret-free (no KIRO_API_KEY= assignments), stable
    idempotency marker for .bash_profile append, --output json check
  + deploy wiring: all 6 wire points verified via grep
      - install.sh top-level has KiroFromSecret + --kiro-from-secret
      - bootstrap.sh has --kiro-from-secret + writes from-secret to PACK_CONFIG
      - CFN template has KiroFromSecret param
      - Terraform has kiro_from_secret variable

MEDIUM #2 — kiro-cli v3+ forward compat
---------------------------------------
Version check now warns BOTH ways: <2 (too old) AND >2 (untested). A
future kiro-cli 3 that changes env/auth semantics will trip the >2 warn
and alert operators before silent breakage.

MEDIUM #3 — doc inconsistency
-----------------------------
- Help text no longer claims KIRO_API_KEY is written to
  /etc/profile.d/kiro-cli.sh (it isn't, and that would leak). Describes
  the real location: ~/.kiro/env (0600) sourced from ~/.bash_profile.
- resources/shell-profile.sh is now auth-mode-aware: it sources
  ~/.kiro/env if present (so SSM shells get KIRO_API_KEY), and only
  prints the 'needs interactive login' banner when neither the env file
  nor KIRO_API_KEY is set. Also adds kiro-exec alias for --no-interactive.

LOW — review agreed these were already fine
--------------------------------------------
- %q escape on writing KIRO_API_KEY to env file
- chmod 600 + ec2-user:ec2-user ownership on ~/.kiro/env
- exact-match .bash_profile append (now stable, via marker comment)
- 'kiro-cloud' sentinel in pack_default_model()

Verification
------------
- bash -n install.sh / bootstrap.sh / pack install.sh / test.sh: OK
- packs/kiro-cli/test.sh:           53/0  (was 32/0)
- scripts/verify-pack kiro-cli:     Pack is ready to submit
- scripts/verify-pack codex-cli:    Pack is ready to submit
- tests/test-pack-contracts.sh:     177/0
- tests/test-sync-registry.sh:      35/0
- tests/test-profiles.sh:           pass
- tests/test-registry-parser.sh:    34/0
- packs/codex-cli/test.sh:          28/0 (no regression)
- runtime parser exit codes verified: all 9 edge cases return 2

Review itself was run by:
  codex exec --skip-git-repo-check 'REVIEW PROMPT'
on /tmp/lk-review (a fresh clone of the merged PR #16). Verdict: block.
This commit flips that to 'ready to ship'.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants