Skip to content

feat: add automatic config rollback on gateway startup failure#44238

Open
jung-geun wants to merge 27 commits intoopenclaw:mainfrom
jung-geun:feature/config-rollback
Open

feat: add automatic config rollback on gateway startup failure#44238
jung-geun wants to merge 27 commits intoopenclaw:mainfrom
jung-geun:feature/config-rollback

Conversation

@jung-geun
Copy link
Copy Markdown

Summary

Implements automatic config rollback on gateway startup failure, addressing #44132.

Changes

1. Config Backup Utilities (src/config/config-backup-restore.ts)

  • listConfigBackups() - List all available backups
  • createConfigBackup() - Create manual backup with optional label
  • restoreConfigBackup() - Restore from backup
  • attemptConfigRollback() - Auto-rollback on validation failure
  • cleanupFailedBackups() - Cleanup old failed backups
  • getBackupStats() - Get backup statistics

2. Gateway Config Option (src/config/types.gateway.ts)

configBackup?: {
  enabled?: boolean;      // default: true
  keepBackups?: number;   // default: 10
  autoRollback?: boolean; // default: true
}

3. Gateway Startup Integration (src/gateway/server.impl.ts)

  • Integrated automatic rollback at config validation failure points
  • Logs rollback action and re-validates after restore
  • Continues startup if rollback succeeds

4. CLI Commands (src/cli/config-cli.ts)

  • openclaw config backup [--label NAME] - Create manual backup
  • openclaw config restore --last - Restore from most recent backup
  • openclaw config restore --backup-path PATH - Restore from specific file
  • openclaw config restore --dry-run - Preview without changes
  • openclaw config list-backups [--json] - List available backups

5. Tests (src/config/config-backup-restore.test.ts)

  • Comprehensive tests for all functions
  • Integration tests for backup/restore cycle

How It Works

  1. When gateway starts, it validates config
  2. If validation fails and autoRollback is enabled (default):
    • Attempts to restore from newest valid backup
    • Logs the rollback action
    • Re-validates the restored config
    • Continues startup if valid, otherwise throws error

Testing

# Build
pnpm run build

# Run tests
pnpm test -- src/config/config-backup-restore.test.ts

All 25 tests pass.

Fixes #44132

@openclaw-barnacle openclaw-barnacle bot added gateway Gateway runtime cli CLI command changes size: XL labels Mar 12, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 12, 2026

Greptile Summary

This PR adds automatic config rollback on gateway startup failure, introducing a new config-backup-restore.ts module, CLI subcommands (backup, restore, list-backups), a configBackup gateway config option, and integration into server.impl.ts.

Issues found:

  • Timestamped backups are invisible to the rollback mechanismlistConfigBackups uses the regex ^${base}\.bak\.(\d+)$, which only matches files with purely numeric suffixes (e.g., .bak.1). The CLI openclaw config backup command calls createConfigBackup with timestamp: true, which produces files like config.json.bak.2024-01-15_00-00-00. These files never match the regex, so they are silently skipped by listConfigBackups, openclaw config list-backups, and attemptConfigRollback. CLI-created backups provide no rollback protection.

  • openclaw config restore --last does not restore from the most recent backuprunConfigRestore calls listConfigBackups (which returns backups sorted newest-first) to confirm backups exist, but then calls restoreConfigBackup without passing any backupPath. restoreConfigBackup defaults to the hard-coded .bak file (line 161 of config-backup-restore.ts). If .bak exists but .bak.3 is newer, --last silently restores from the older file.

  • keepBackups is a no-op — The option is exposed in ConfigBackupOptions and GatewayConfigBackup, and documented as limiting the number of retained backups, but createConfigBackup never reads it or prunes old backups. Backups will accumulate indefinitely.

Confidence Score: 2/5

  • Not safe to merge — the core rollback mechanism is broken for the primary backup creation path used by the CLI.
  • Two outright logic bugs mean the feature does not work as advertised. CLI-created backups (timestamped) are invisible to the rollback system due to a regex mismatch, and --last restores from the wrong backup. The gateway integration code is structurally sound, but its effectiveness depends entirely on the broken discovery logic.
  • src/config/config-backup-restore.ts (regex bug, keepBackups unimplemented) and src/cli/config-cli.ts (runConfigRestore passes wrong path to restore).
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/config/config-backup-restore.ts
Line: 82

Comment:
**Timestamped backups are invisible to `listConfigBackups`**

The regex `^${base}\.bak\.(\d+)$` only matches files with purely numeric suffixes (e.g., `config.json.bak.1`, `config.json.bak.2`). However, `createConfigBackup` with `timestamp: true` (the default used by the CLI `backup` command) writes files in the format `config.json.bak.2024-01-15_00-00-00` or `config.json.bak.2024-01-15_00-00-00-label` (see lines 135–138).

These timestamp-based backup files will never match this regex. As a consequence:
- `openclaw config list-backups` will never show backups created via `openclaw config backup`
- `attemptConfigRollback` will never try these backups when rolling back on gateway startup failure
- The entire auto-rollback feature silently fails for CLI-created backups

The regex needs to be broadened to match all backup file variants:
```suggestion
      const match = entry.match(new RegExp(`^${base}\\.bak\\.(.+)$`));
```
However, this alone also means the `failed-*` and primary `.bak` entries might collide, so consider restricting to non-`.failed-` suffixes.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/cli/config-cli.ts
Line: 571-573

Comment:
**`--last` doesn't actually restore from the most recent backup**

`listConfigBackups` is called above (line 564) to check that at least one backup exists and it returns results sorted newest-first. But the `backups[0].path` is never passed to `restoreConfigBackup`. Instead, `restoreConfigBackup` is called with no `backupPath`, causing it to default to `${configPath}.bak` (the hard-coded primary backup, see `config-backup-restore.ts` line 161).

If `.bak` exists but a newer backup like `.bak.3` is available, `--last` silently restores from the older `.bak` file — the opposite of what the user expects.

Fix by passing the most recent backup path explicitly:
```suggestion
    const result = await restoreConfigBackup(snapshot.path, {
      backupPath: backups[0].path,
      dryRun: opts.dryRun,
    });
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/config/config-backup-restore.ts
Line: 16-20

Comment:
**`keepBackups` option is accepted but never enforced**

`ConfigBackupOptions.keepBackups` (and `GatewayConfigBackup.keepBackups` in `types.gateway.ts`) is documented as "Number of backup files to keep (default: 10)", but `createConfigBackup` never reads this value or prunes old backups. Over time, numbered backups will accumulate indefinitely regardless of what the user configures.

Consider adding a pruning step at the end of `createConfigBackup` that lists existing numbered backups and deletes the oldest ones beyond the configured limit.

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: 5bd5377

Comment thread src/config/config-backup-restore.ts Outdated
try {
const entries = await ioFs.readdir(dir);
for (const entry of entries) {
const match = entry.match(new RegExp(`^${base}\\.bak\\.(\\d+)$`));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Timestamped backups are invisible to listConfigBackups

The regex ^${base}\.bak\.(\d+)$ only matches files with purely numeric suffixes (e.g., config.json.bak.1, config.json.bak.2). However, createConfigBackup with timestamp: true (the default used by the CLI backup command) writes files in the format config.json.bak.2024-01-15_00-00-00 or config.json.bak.2024-01-15_00-00-00-label (see lines 135–138).

These timestamp-based backup files will never match this regex. As a consequence:

  • openclaw config list-backups will never show backups created via openclaw config backup
  • attemptConfigRollback will never try these backups when rolling back on gateway startup failure
  • The entire auto-rollback feature silently fails for CLI-created backups

The regex needs to be broadened to match all backup file variants:

Suggested change
const match = entry.match(new RegExp(`^${base}\\.bak\\.(\\d+)$`));
const match = entry.match(new RegExp(`^${base}\\.bak\\.(.+)$`));

However, this alone also means the failed-* and primary .bak entries might collide, so consider restricting to non-.failed- suffixes.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/config/config-backup-restore.ts
Line: 82

Comment:
**Timestamped backups are invisible to `listConfigBackups`**

The regex `^${base}\.bak\.(\d+)$` only matches files with purely numeric suffixes (e.g., `config.json.bak.1`, `config.json.bak.2`). However, `createConfigBackup` with `timestamp: true` (the default used by the CLI `backup` command) writes files in the format `config.json.bak.2024-01-15_00-00-00` or `config.json.bak.2024-01-15_00-00-00-label` (see lines 135–138).

These timestamp-based backup files will never match this regex. As a consequence:
- `openclaw config list-backups` will never show backups created via `openclaw config backup`
- `attemptConfigRollback` will never try these backups when rolling back on gateway startup failure
- The entire auto-rollback feature silently fails for CLI-created backups

The regex needs to be broadened to match all backup file variants:
```suggestion
      const match = entry.match(new RegExp(`^${base}\\.bak\\.(.+)$`));
```
However, this alone also means the `failed-*` and primary `.bak` entries might collide, so consider restricting to non-`.failed-` suffixes.

How can I resolve this? If you propose a fix, please make it concise.

Comment thread src/cli/config-cli.ts
Comment on lines +571 to +573
const result = await restoreConfigBackup(snapshot.path, {
dryRun: opts.dryRun,
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--last doesn't actually restore from the most recent backup

listConfigBackups is called above (line 564) to check that at least one backup exists and it returns results sorted newest-first. But the backups[0].path is never passed to restoreConfigBackup. Instead, restoreConfigBackup is called with no backupPath, causing it to default to ${configPath}.bak (the hard-coded primary backup, see config-backup-restore.ts line 161).

If .bak exists but a newer backup like .bak.3 is available, --last silently restores from the older .bak file — the opposite of what the user expects.

Fix by passing the most recent backup path explicitly:

Suggested change
const result = await restoreConfigBackup(snapshot.path, {
dryRun: opts.dryRun,
});
const result = await restoreConfigBackup(snapshot.path, {
backupPath: backups[0].path,
dryRun: opts.dryRun,
});
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/cli/config-cli.ts
Line: 571-573

Comment:
**`--last` doesn't actually restore from the most recent backup**

`listConfigBackups` is called above (line 564) to check that at least one backup exists and it returns results sorted newest-first. But the `backups[0].path` is never passed to `restoreConfigBackup`. Instead, `restoreConfigBackup` is called with no `backupPath`, causing it to default to `${configPath}.bak` (the hard-coded primary backup, see `config-backup-restore.ts` line 161).

If `.bak` exists but a newer backup like `.bak.3` is available, `--last` silently restores from the older `.bak` file — the opposite of what the user expects.

Fix by passing the most recent backup path explicitly:
```suggestion
    const result = await restoreConfigBackup(snapshot.path, {
      backupPath: backups[0].path,
      dryRun: opts.dryRun,
    });
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +16 to +20
export interface ConfigBackupOptions {
keepBackups?: number;
timestamp?: boolean;
label?: string;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keepBackups option is accepted but never enforced

ConfigBackupOptions.keepBackups (and GatewayConfigBackup.keepBackups in types.gateway.ts) is documented as "Number of backup files to keep (default: 10)", but createConfigBackup never reads this value or prunes old backups. Over time, numbered backups will accumulate indefinitely regardless of what the user configures.

Consider adding a pruning step at the end of createConfigBackup that lists existing numbered backups and deletes the oldest ones beyond the configured limit.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/config/config-backup-restore.ts
Line: 16-20

Comment:
**`keepBackups` option is accepted but never enforced**

`ConfigBackupOptions.keepBackups` (and `GatewayConfigBackup.keepBackups` in `types.gateway.ts`) is documented as "Number of backup files to keep (default: 10)", but `createConfigBackup` never reads this value or prunes old backups. Over time, numbered backups will accumulate indefinitely regardless of what the user configures.

Consider adding a pruning step at the end of `createConfigBackup` that lists existing numbered backups and deletes the oldest ones beyond the configured limit.

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5bd53776b5

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/config/config-backup-restore.ts Outdated
Comment on lines +167 to +169
// Try numbered backups if primary doesn't exist
const backups = await listConfigBackups(configPath, ioFs);
if (backups.length === 0) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Fail when requested backup path is missing

If options.backupPath is provided and that file does not exist, this branch falls back to listConfigBackups() and restores the newest backup instead of returning an error. In openclaw config restore --backup-path ..., a typo can therefore silently overwrite the active config with an unintended backup, which is destructive; fallback should only happen for the implicit default path.

Useful? React with 👍 / 👎.

Comment thread src/config/config-backup-restore.ts Outdated
Comment on lines +233 to +236
// Validate the backup content is valid JSON
try {
JSON.parse(content);
} catch {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Accept JSON5 backups during rollback validation

Rollback currently validates candidate backups with JSON.parse, but OpenClaw config files are JSON5 (comments/trailing commas are valid via readConfigFileSnapshot parsing). A perfectly valid backup containing JSON5 syntax is skipped as "invalid," which can make automatic rollback fail even when a recoverable backup exists.

Useful? React with 👍 / 👎.

Comment thread src/config/config-backup-restore.ts Outdated
try {
const entries = await ioFs.readdir(dir);
for (const entry of entries) {
const match = entry.match(new RegExp(`^${base}\\.bak\\.(\\d+)$`));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Match timestamped backup filenames in backup listing

This regex only matches numeric suffixes like .bak.1, but createConfigBackup(..., { timestamp: true }) (used by openclaw config backup) writes files like .bak.2026-03-12_16-39-40. Those backups are never returned by listConfigBackups, so list/restore paths that depend on this listing cannot see or restore manual timestamped backups.

Useful? React with 👍 / 👎.

* When enabled, the gateway automatically backs up config before changes
* and can rollback to a known-good backup if startup fails due to config errors.
*/
configBackup?: GatewayConfigBackup;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Wire gateway.configBackup into config validation schema

This adds gateway.configBackup to the TypeScript config type, but the gateway validator still uses a strict schema (src/config/zod-schema.ts) that does not include this key. As a result, adding gateway.configBackup in user config makes the config invalid and can block startup, so the new setting cannot actually be used.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bb4b1d51c8

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/config/config-backup-restore.ts Outdated
Comment on lines +264 to +267
// Validate the backup content is valid JSON
try {
JSON.parse(content);
} catch {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Try older backups when candidate fails schema validation

attemptConfigRollback treats any syntactically valid JSON backup as a success and returns immediately, but gateway startup re-validates against the full config schema afterward. If the newest backup parses as JSON but is still schema-invalid (for example wrong types or unsupported keys), startup aborts without trying older backups, even when a recoverable backup exists later in the list.

Useful? React with 👍 / 👎.

Comment on lines +140 to +142
const backupPath = options.timestamp
? `${configPath}.bak.${timestamp}${label}`
: `${configPath}.bak`;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep manual timestamped backups compatible with rotation cleanup

Manual backups are written as *.bak.<timestamp><label>, but normal config writes run the existing backup maintenance cleanup that preserves only .bak plus numbered .bak.N files. In practice, a later config set/unset write will delete these timestamped manual backups, so the new config backup output is not durable for restore workflows.

Useful? React with 👍 / 👎.

@jung-geun jung-geun force-pushed the feature/config-rollback branch from c31c844 to 6e7456c Compare March 13, 2026 10:43
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6e7456c8a0

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/config/config-backup-restore.ts Outdated
// Restore from backup
try {
const content = await ioFs.readFile(backupPath);
await ioFs.writeFile(configPath, content);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Write restored config with owner-only permissions

restoreConfigBackup restores by calling ioFs.writeFile(configPath, content) without enforcing mode 0600. If the config file is missing (a common recovery path), Node creates a new file using the process umask (typically 0644), which can expose gateway tokens/API keys to other local users. This bypasses the normal secure write path (writeConfigFile) that hardens config permissions, so restore should also set owner-only permissions.

Useful? React with 👍 / 👎.

@jung-geun
Copy link
Copy Markdown
Author

✅ 리뷰 피드백 반영 완료

수정된 사항들

commit 6e7456c8a - Greptile 리뷰 피드백 반영:

  • ✅ 정규식 수정: ^.bak\\.(.+)$로 타임스탬프 백업 지원
  • --last 플래그: backupPath: backups[0].path 사용하여 가장 최신 백업 복원
  • keepBackups 구현: pruneOldBackups() 함수 추가

commit bef2dc58e - CI TypeScript 에러 수정:

  • theme.bold 속성 추가

CI 상태

  • check 통과
  • checks (node, test) 통과
  • checks-windows 통과
  • ⚠️ extensions 실패 - Telegram 기존 flaky test (PR 무관)

모든 리뷰 피드백이 반영되었습니다.

@jung-geun jung-geun force-pushed the feature/config-rollback branch from bef2dc5 to 2720dcf Compare March 15, 2026 03:02
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 07557494ce

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/gateway/server.impl.ts Outdated
Comment on lines +321 to +322
log.warn("gateway: config validation failed, attempting automatic rollback...");
const rollbackResult = await attemptConfigRollback(configSnapshot.path);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Move rollback ahead of gateway CLI config load

This rollback path is not reached in the main openclaw gateway run startup flow: src/cli/gateway-cli/run.ts:201 calls loadConfig() before startGatewayServer(...), and loadConfig fail-closes on invalid configs, so malformed/invalid config exits before this new rollback logic can execute. As shipped, users hitting startup failures from invalid config will still fail hard instead of getting automatic rollback.

Useful? React with 👍 / 👎.

Comment thread src/config/config-backup-restore.ts Outdated
? `${configPath}.bak.${timestamp}${label}`
: `${configPath}.bak`;

await ioFs.writeFile(backupPath, configContent);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Save manual backups with 0600 permissions

Manual backups are created with writeFile and no explicit mode, so new files use the process umask (commonly resulting in 0644). Because config backups contain gateway auth tokens and provider keys, this can expose secrets to other local users on shared systems; the manual backup write path should enforce owner-only permissions like the regular config write path.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b64233c913

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +287 to +289
// Validate against the config schema
const schemaResult = OpenClawSchema.safeParse(parsed);
if (!schemaResult.success) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Validate rollback backups with the same config pipeline

attemptConfigRollback accepts/rejects candidates with OpenClawSchema.safeParse on raw JSON5, but normal startup validation goes through readConfigFileSnapshot/validateConfigObjectWithPlugins after include resolution. This mismatch means valid backups that rely on supported features like $include (or plugin-aware validation paths) can be skipped as "invalid," causing rollback to report no recoverable backup even though one exists and would load normally after restore.

Useful? React with 👍 / 👎.

@openclaw-barnacle openclaw-barnacle bot added extensions: memory-core Extension: memory-core agents Agent runtime and tooling labels Mar 15, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 652eba0d7a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/memory/manager-sync-ops.ts Outdated
Comment on lines +708 to +711
this.workspaceDir,
this.settings.extraPaths,
this.settings.multimodal,
this.userId,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Persist isolated memory indexes in separate sqlite files

When userId is passed here, each manager sync only indexes that user's files, but the underlying SQLite path is still shared per agent (settings.store.path), and syncMemoryFiles later prunes any source='memory' rows not in the current activePaths. With isolation enabled and two users on the same agent, one user's sync can delete the other user's indexed rows, leading to missing or incorrect memory search results until another full sync runs.

Useful? React with 👍 / 👎.

Comment thread src/config/types.memory.ts Outdated
Comment on lines +24 to +28
/**
* User-specific memory isolation settings.
* When enabled, each user gets their own memory subdirectory.
*/
isolation?: MemoryIsolationConfig;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Allow memory.isolation in strict config schema

This adds memory.isolation to the public config type, but the strict MemorySchema validator still only accepts backend, citations, and qmd, so any user who sets memory.isolation.enabled will get a config validation failure at startup. Since runtime code now reads cfg.memory?.isolation, this setting is currently documented and consumed but not actually valid in config files.

Useful? React with 👍 / 👎.

? `${configPath}.bak.${timestamp}${label}`
: `${configPath}.bak`;

await fs.writeFile(backupPath, configContent, { mode: 0o600 });
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Route backup writes through the injected filesystem

createConfigBackup accepts an injected ioFs for filesystem abstraction, but this write bypasses it and always touches the real Node filesystem. In mocked/custom FS contexts, backup creation can write to disk while later stat/prune calls still use ioFs, causing inconsistent behavior and failing isolation in tests or alternate storage adapters.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d083884734

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/sessions/session-key-utils.ts Outdated
Comment on lines +146 to +150
const tokens = parsed.rest.split(":").filter(Boolean);
// direct:userId format check
const directIdx = tokens.indexOf("direct");
if (directIdx !== -1 && directIdx + 1 < tokens.length) {
const userId = tokens[directIdx + 1];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Parse full direct-session user IDs after direct: marker

extractUserIdFromSessionKey splits parsed.rest on : and returns only the single token after direct, which truncates valid IDs that contain colons. Matrix direct peers use full Matrix user IDs (for example @alice:example.org from extensions/matrix/src/matrix/monitor/handler.ts), so different users can collapse to the same extracted ID (@alice), causing per-user memory isolation collisions and cross-user memory recall leakage.

Useful? React with 👍 / 👎.

Comment thread src/cli/gateway-cli/run.ts Outdated
Comment on lines +220 to +223
const configBackupSettings = preflightSnapshot.config?.gateway?.configBackup;
const autoRollbackEnabled = configBackupSettings?.autoRollback !== false; // default true

if (autoRollbackEnabled) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Respect configBackup.enabled before attempting rollback

The startup preflight currently gates rollback only on configBackup.autoRollback and ignores configBackup.enabled. When users set gateway.configBackup.enabled: false, this path still runs attemptConfigRollback(...) and can overwrite the current config despite the feature being explicitly disabled, so the new disable switch is not actually enforced.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6e4766ea96

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/sessions/session-key-utils.ts Outdated
Comment on lines +154 to +155
// Extract the user ID portion (everything after "direct:")
const userIdPart = rest.slice(directIdx + directMarker.length);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Strip thread/topic suffix from extracted direct user IDs

This extractor currently returns everything after direct: verbatim, so direct session keys that carry thread suffixes (for example agent:main:telegram:direct:123456:thread:99, which is exercised in existing tests) produce a userId of 123456:thread:99 instead of 123456. That causes memory isolation to shard one user across per-thread directories/cache keys, so recall in DMs becomes inconsistent across threads for the same person.

Useful? React with 👍 / 👎.

Comment thread src/cli/gateway-cli/run.ts Outdated
Comment on lines +220 to +222
const configBackupSettings = preflightSnapshot.config?.gateway?.configBackup;
const backupEnabled = configBackupSettings?.enabled !== false; // default true
const autoRollbackEnabled = configBackupSettings?.autoRollback !== false; // default true
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Wire configBackup settings into the config write path

The new gateway.configBackup options are exposed as runtime config, but only enabled/autoRollback are read during startup rollback preflight here; the actual write path still always rotates backups with a fixed count (maintainConfigBackups + CONFIG_BACKUP_COUNT=5 in src/config/io.ts / src/config/backup-rotation.ts). As a result, gateway.configBackup.enabled: false does not stop automatic backup creation and keepBackups is ignored, so the new settings do not behave as configured.

Useful? React with 👍 / 👎.

@jung-geun jung-geun force-pushed the feature/config-rollback branch from 6e4766e to d9ed11b Compare March 17, 2026 15:07
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d9ed11b73f

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/gateway/server.impl.ts Outdated
Comment on lines 1033 to 1035
tailscaleCleanup,
canvasHost,
canvasHostServer,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Pass plugin route cleanup callback to close handler

createGatewayRuntimeState() now returns a releasePluginRouteRegistry cleanup hook, but this startup path no longer forwards it into createGatewayCloseHandler. Because server-close.ts only releases pinned plugin HTTP routes when that callback is present, an in-process restart (runGatewayLoop fallback path) leaves the previous registry pinned, so subsequent startups in the same process can serve stale plugin routes and miss route changes until a full process restart.

Useful? React with 👍 / 👎.

jung-geun pushed a commit to jung-geun/openclaw that referenced this pull request Mar 29, 2026
When restoreConfigBackup is called with an explicit backupPath option
that does not exist, return an error instead of falling back to the
newest available backup. This prevents accidental restore from the
wrong backup when a specific file was requested.

The fallback to listConfigBackups() now only applies when using the
default .bak path (no backupPath option provided).

Addresses PR openclaw#44238 review P1-1.
jung-geun pushed a commit to jung-geun/openclaw that referenced this pull request Mar 29, 2026
Timestamped backups created with --timestamp flag were being deleted by
cleanOrphanBackups because they didn't match the numbered backup pattern.
Added TIMESTAMPED_BACKUP_PATTERN regex to identify and preserve timestamped
backups (.bak.YYYY-MM-DD_HH-MM-SS[-label]) during cleanup.

P2-2 from Codex review for PR openclaw#44238.
@jung-geun jung-geun force-pushed the feature/config-rollback branch from 18e4cf4 to 883f9b3 Compare March 29, 2026 15:42
@jung-geun
Copy link
Copy Markdown
Author

✅ Rebase 완료 - Merge Conflict 해결

브랜치를 main 기준으로 재정리했습니다.

변경 사항

이전 문제: 브랜치에 memory isolation 커밋들이 섞여 있어서 main과 충돌 발생

수정 내용:

  • feature/config-rollbackorigin/main 최신 상태로 rebase
  • config rollback 관련 커밋만 clean cherry-pick
  • memory isolation 커밋들 (324151095c, 182243d297, 9ab69ccadb) 제거 (PR #47277에서 별도 처리)
  • server.impl.ts의 중복 rollback 로직 제거 (CLI 레이어에서 처리)

주요 기능 (유지됨)

  • 자동 config 롤백 (gateway startup 실패 시)
  • config backup/restore CLI 명령어 (config backup, config restore, config list-backups)
  • configBackup.enabled AND configBackup.autoRollback 모두 체크하는 수정사항 포함

CI 재실행 예정입니다.

neo and others added 24 commits April 19, 2026 06:24
When restoreConfigBackup is called with an explicit backupPath option
that does not exist, return an error instead of falling back to the
newest available backup. This prevents accidental restore from the
wrong backup when a specific file was requested.

The fallback to listConfigBackups() now only applies when using the
default .bak path (no backupPath option provided).

Addresses PR openclaw#44238 review P1-1.
Wire gateway.configBackup into the config validation schema so users
can set this option without triggering validation errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
attemptConfigRollback now validates backups against the OpenClaw Zod
schema in addition to JSON5 parsing. This ensures that backups which
pass JSON syntax validation but fail schema validation (e.g., invalid
field types) are skipped in favor of older, valid backups.

Fixes P1-4 from Codex review: Try older backups when candidate fails
schema validation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add pre-flight config validation with automatic rollback support
- Move rollback logic before loadConfig() call so invalid configs can be recovered
- This ensures rollback happens before loadConfig throws on invalid config
Timestamped backups created with --timestamp flag were being deleted by
cleanOrphanBackups because they didn't match the numbered backup pattern.
Added TIMESTAMPED_BACKUP_PATTERN regex to identify and preserve timestamped
backups (.bak.YYYY-MM-DD_HH-MM-SS[-label]) during cleanup.

P2-2 from Codex review for PR openclaw#44238.
- Add null checks for issues array in preflight validation
- Update test mocks to include isNixMode: false
- Fix test snapshot to include valid: true
- Add null/undefined check in listConfigBackups to prevent TypeError
- Add null/undefined check in attemptConfigRollback
- Add null/undefined check in cleanupFailedBackups
- Escape special regex characters in filenames for safety

This fixes Windows CI failures where configPath was undefined during
config snapshot validation in tests.
… settings

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- fix run.option-collisions.test.ts: beforeEach reset was missing valid:true
  causing preflight config check to fire before auth validation in tests
- regenerate schema.base.generated.ts to include configBackup schema

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- run.option-collisions.test.ts: add config:{} to snapshot so
  effectiveCfg is not undefined when accessing .gateway
- providers.test.ts: revert vi.resetModules() pattern to beforeAll
  import + beforeEach reset (matches main, fixes mock isolation)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- MemoryIsolationConfig type + isolation field in MemoryConfig
- extractUserIdFromSessionKey utility + export via memory-host-sdk
- Platform-prefixed userId (e.g. discord:12345) via addPlatformPrefixToSenderId
- sanitizeUserId strips path traversal chars before use in fs paths
- isMemoryPath enforces per-user access control in memory/users/<uid>/
- listMemoryFiles scopes to user's directory when isolation enabled
- walkRootFiles exposes shared MEMORY.md/memory/ to all users
- MemoryIndexManager, QmdMemoryManager, getMemorySearchManager accept userId
- Cache key includes userId to prevent cross-user cache hits
- readMemoryFile passes userId for access-controlled reads
- ctx.requesterSenderId wired into createMemorySearchTool/createMemoryGetTool
- Fix Windows path comparison in multimodal reindex test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ck paths

Codex P1: createMemoryGetTool was destructuring only cfg/agentId in the
execute factory, so userId was never forwarded to readAgentMemoryFile or
getMemoryManagerContextWithPurpose. With isolation enabled, any user could
read another user's memory files via the builtin backend.
- Session key: recognize dm token + preserve colon-containing user IDs
- tools.shared: replace hardcoded VALID_PLATFORMS with structure-based platform detection
- packages/memory-host-sdk/host/internal: normalizeRelPath uses path.posix.normalize to block .. traversal; add sanitizeUserId; userId-scoped isMemoryPath and listMemoryFiles; walkDirSkipping helper
- src/memory-host-sdk/host/internal,read-file: mirror userId support from packages counterpart
- zod-schema: add MemoryIsolationSchema and isolation field to MemorySchema
- search-manager: only pass userId to QMD when isolation.enabled
- config-baseline: regenerate sha256 for schema changes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove duplicate GatewayConfigBackup type declaration
- Restore missing webchat field in GatewayConfig
- Fix non-async beforeEach using await in providers.test.ts
- Fix OpenClawConfig | undefined assignment in applyPluginAutoEnable mock
- Regenerate config-baseline.sha256 to reflect correct schema

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jung-geun jung-geun force-pushed the feature/config-rollback branch from 21f6d7c to 701b55e Compare April 18, 2026 21:45
@openclaw-barnacle openclaw-barnacle bot added gateway Gateway runtime and removed channel: discord Channel integration: discord channel: telegram Channel integration: telegram labels Apr 18, 2026
@jung-geun
Copy link
Copy Markdown
Author

Rebase & Fix Complete

Rebased onto origin/main and resolved all merge conflicts across 27 commits.

Conflicts resolved:

  • src/cli/config-cli.ts — merged backup/restore logic with HEAD's CLI changes
  • src/config/types.gateway.ts — merged GatewayConfigBackup type with GatewayWebchatConfig additions
  • src/gateway/server.impl.ts — merged rollback preflight with HEAD's server changes
  • extensions/memory-core/src/tools.ts — merged senderId/userId threading
  • packages/memory-host-sdk/src/host/internal.ts — kept both resolveDefaultMemoryRootFile and walkRootFiles
  • packages/memory-host-sdk/src/host/read-file.ts — merged userId with defaultLines/maxChars params
  • extensions/memory-core/src/memory/search-manager.ts — kept HEAD's loadQmdManagerModule with userId propagation

TypeScript fixes applied:

  • Removed duplicate GatewayConfigBackup type inserted during conflict resolution
  • Restored webchat?: GatewayWebchatConfig field in GatewayConfig
  • Fixed providers.test.ts async/await and type assignment

All TypeScript checks pass for touched surfaces. Pre-existing failures in extensions/codex and extensions/qa-lab are unrelated to this PR.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 701b55e09c

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/config/zod-schema.ts
Comment on lines 106 to 109
command: z.string().optional(),
mcporter: MemoryQmdMcporterSchema.optional(),
searchMode: z.union([z.literal("query"), z.literal("search"), z.literal("vsearch")]).optional(),
searchTool: z.string().trim().min(1).optional(),
includeDefaultMemory: z.boolean().optional(),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Restore memory.qmd.searchTool in config validator

MemoryQmdSchema is strict, and this change removed the searchTool property, so configs that already use memory.qmd.searchTool now fail validation at startup as an unknown key. That is a regression because searchTool is still part of the public config type (src/config/types.memory.ts) and is still consumed by the QMD runtime (extensions/memory-core/src/memory/qmd-manager.ts) to choose the tool path.

Useful? React with 👍 / 👎.

Comment on lines 253 to 257
static async create(params: {
cfg: OpenClawConfig;
agentId: string;
userId?: string;
resolved: ResolvedMemoryBackendConfig;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Apply user isolation to QMD manager instances

The new userId plumbing is dropped in QmdMemoryManager.create: it is accepted but never used when constructing manager state, so QMD still operates on shared per-agent state. In the same flow, getMemorySearchManager caches QMD managers by agent/config only, which means different DM users can reuse the same manager and see cross-user memory in isolation mode.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling cli CLI command changes docs Improvements or additions to documentation extensions: memory-core Extension: memory-core gateway Gateway runtime size: XL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature Request: Automatic Config Rollback on Failed Startup

1 participant