Skip to content

perf(plugins): increase discovery cache TTL to reduce redundant filesystem scans during startup#48439

Open
who96 wants to merge 1 commit intoopenclaw:mainfrom
who96:fix/discovery-cache-ttl-48380
Open

perf(plugins): increase discovery cache TTL to reduce redundant filesystem scans during startup#48439
who96 wants to merge 1 commit intoopenclaw:mainfrom
who96:fix/discovery-cache-ttl-48380

Conversation

@who96
Copy link
Copy Markdown

@who96 who96 commented Mar 16, 2026

Summary

Partially mitigates #48380 (discovery-layer caching only)

The 1-second plugin discovery cache TTL (DEFAULT_DISCOVERY_CACHE_MS) expired mid-startup when the gateway boot sequence took longer than 1 second, causing redundant filesystem scans for 71+ plugin directories during startup.

This PR does not address the registry-level cache-key divergence described in #48380 — that requires changes to the startup call sites in server-plugins.ts, runtime-plugins.ts, and channel-resolution.ts to unify cache keys or apply onlyPluginIds filtering. That work should be tracked separately.

Changes

src/plugins/discovery.ts

  • Bump DEFAULT_DISCOVERY_CACHE_MS from 1000 to 5000
  • 5 seconds covers the full gateway startup sequence while remaining short enough to pick up changes during explicit config-reload scenarios
  • The existing OPENCLAW_PLUGIN_DISCOVERY_CACHE_MS and OPENCLAW_DISABLE_PLUGIN_DISCOVERY_CACHE env var overrides continue to work

src/plugins/loader.ts

  • Add debug-level diagnostic logs for plugin registry cache hits and misses
    • Cache hit: logs plugin count
    • Cache miss: logs enabled/total plugin counts and elapsed load time
  • These logs help diagnose startup performance regressions by revealing how many full load cycles actually occur

Testing

  • All existing tests pass (discovery.test.ts, loader.test.ts, server-plugins.test.ts)
  • 1 pre-existing test failure on upstream/main (does not prefer setupEntry for configured channel loads without startup opt-in) is unrelated to this change

Scope

This is a partial mitigation targeting the discovery layer only. The deeper root cause — registry-level cache key divergence caused by different callers passing different preferSetupRuntimeForChannelPlugins / onlyPluginIds combinations — requires a separate effort to unify the startup call sites.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 16, 2026

Greptile Summary

This PR increases the plugin discovery cache TTL from 1 s to 5 s and adds debug-level logs for cache hits and misses in the plugin loader. Together the changes address redundant filesystem scans when the gateway boot sequence exceeds the old 1-second window.

The TTL bump is a safe, targeted fix: the existing OPENCLAW_PLUGIN_DISCOVERY_CACHE_MS and OPENCLAW_DISABLE_PLUGIN_DISCOVERY_CACHE env-var escape hatches remain intact, so operators can override or disable the cache in CI or dev workflows without code changes.

The debug logging is a net positive for diagnosing future startup regressions, though two style concerns have been raised in existing review threads:

  • The (cache miss) label is printed even when cacheEnabled is false (i.e. options.cache = false), making the label inaccurate for non-caching callers such as loadOnboardingPluginRegistry().
  • The cache-hit log at line 838 still uses logger.debug?.() with a template literal, so the string is eagerly evaluated even when debug logging is off. The cache-miss path correctly uses an if (logger.debug) guard to avoid this, but the two paths are inconsistent.

Confidence Score: 4/5

  • This PR is safe to merge; the TTL change is low-risk and the debug logging is additive.
  • The core fix (bumping DEFAULT_DISCOVERY_CACHE_MS to 5000) is a minimal, well-justified change with env-var override safety valves already in place. The debug logging is purely observational and does not affect the load or caching logic. The only open concerns are pre-existing style nits already tracked in review threads (label accuracy and eager evaluation in the cache-hit log path), neither of which affects correctness or runtime behaviour.
  • No files require special attention beyond the two style nits in src/plugins/loader.ts already flagged in existing threads.

Last reviewed commit: 13bca3f

Comment on lines +1353 to +1356
const enabledCount = registry.plugins.filter((p) => p.enabled).length;
logger.debug?.(
`[plugins] loaded ${enabledCount}/${registry.plugins.length} plugins in ${Date.now() - loadStartMs}ms (cache miss)`,
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Misleading "(cache miss)" label when caching is disabled

The (cache miss) annotation is printed unconditionally — even when cacheEnabled is false (e.g. callers that pass options.cache = false, such as loadOnboardingPluginRegistry()). In that case no cache lookup was ever attempted, so calling it a "cache miss" is inaccurate and could confuse someone reading the logs.

Consider guarding the label on the actual cache state:

Suggested change
const enabledCount = registry.plugins.filter((p) => p.enabled).length;
logger.debug?.(
`[plugins] loaded ${enabledCount}/${registry.plugins.length} plugins in ${Date.now() - loadStartMs}ms (cache miss)`,
);
const enabledCount = registry.plugins.filter((p) => p.enabled).length;
const cacheLabel = cacheEnabled ? "cache miss" : "cache disabled";
logger.debug?.(
`[plugins] loaded ${enabledCount}/${registry.plugins.length} plugins in ${Date.now() - loadStartMs}ms (${cacheLabel})`,
);
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/plugins/loader.ts
Line: 1353-1356

Comment:
**Misleading "(cache miss)" label when caching is disabled**

The `(cache miss)` annotation is printed unconditionally — even when `cacheEnabled` is `false` (e.g. callers that pass `options.cache = false`, such as `loadOnboardingPluginRegistry()`). In that case no cache lookup was ever attempted, so calling it a "cache miss" is inaccurate and could confuse someone reading the logs.

Consider guarding the label on the actual cache state:

```suggestion
  const enabledCount = registry.plugins.filter((p) => p.enabled).length;
  const cacheLabel = cacheEnabled ? "cache miss" : "cache disabled";
  logger.debug?.(
    `[plugins] loaded ${enabledCount}/${registry.plugins.length} plugins in ${Date.now() - loadStartMs}ms (${cacheLabel})`,
  );
```

How can I resolve this? If you propose a fix, please make it concise.

@who96 who96 force-pushed the fix/discovery-cache-ttl-48380 branch from 682418d to 1824a0d Compare March 16, 2026 18:55
@who96
Copy link
Copy Markdown
Author

who96 commented Mar 16, 2026

@greptileai review

Comment on lines +1353 to +1357
const enabledCount = registry.plugins.filter((p) => p.enabled).length;
const cacheLabel = cacheEnabled ? "cache miss" : "cache disabled";
logger.debug?.(
`[plugins] loaded ${enabledCount}/${registry.plugins.length} plugins in ${Date.now() - loadStartMs}ms (${cacheLabel})`,
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eager evaluation of debug arguments

The template literal passed to logger.debug?.() — including the .filter() call to compute enabledCount and the Date.now() subtraction — is always evaluated before the optional-chained call, even when debug logging is disabled. While the cost per call is negligible, it's inconsistent with the intent of optional-chaining a debug log (i.e. to be a no-op when debug is off).

A guard block avoids the unnecessary work:

Suggested change
const enabledCount = registry.plugins.filter((p) => p.enabled).length;
const cacheLabel = cacheEnabled ? "cache miss" : "cache disabled";
logger.debug?.(
`[plugins] loaded ${enabledCount}/${registry.plugins.length} plugins in ${Date.now() - loadStartMs}ms (${cacheLabel})`,
);
if (logger.debug) {
const enabledCount = registry.plugins.filter((p) => p.enabled).length;
const cacheLabel = cacheEnabled ? "cache miss" : "cache disabled";
logger.debug(
`[plugins] loaded ${enabledCount}/${registry.plugins.length} plugins in ${Date.now() - loadStartMs}ms (${cacheLabel})`,
);
}
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/plugins/loader.ts
Line: 1353-1357

Comment:
**Eager evaluation of debug arguments**

The template literal passed to `logger.debug?.()` — including the `.filter()` call to compute `enabledCount` and the `Date.now()` subtraction — is always evaluated before the optional-chained call, even when debug logging is disabled. While the cost per call is negligible, it's inconsistent with the intent of optional-chaining a debug log (i.e. to be a no-op when debug is off).

A guard block avoids the unnecessary work:

```suggestion
  if (logger.debug) {
    const enabledCount = registry.plugins.filter((p) => p.enabled).length;
    const cacheLabel = cacheEnabled ? "cache miss" : "cache disabled";
    logger.debug(
      `[plugins] loaded ${enabledCount}/${registry.plugins.length} plugins in ${Date.now() - loadStartMs}ms (${cacheLabel})`,
    );
  }
```

How can I resolve this? If you propose a fix, please make it concise.

@who96 who96 force-pushed the fix/discovery-cache-ttl-48380 branch from 1824a0d to 13bca3f Compare March 16, 2026 19:23
@who96
Copy link
Copy Markdown
Author

who96 commented Mar 16, 2026

@greptileai review

@who96
Copy link
Copy Markdown
Author

who96 commented Mar 16, 2026

Addressed both Greptile review points:

  1. Misleading cache miss label — Fixed: now uses cacheLabel variable that prints "cache miss" when caching is enabled vs "cache disabled" when options.cache = false.

  2. Eager evaluation of debug arguments — Fixed: wrapped in if (logger.debug) { ... } guard so the .filter() call and template string are only evaluated when debug logging is actually available.

Copy link
Copy Markdown
Contributor

@obviyus obviyus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a partial mitigation, not a fix for #48380.

The issue describes repeated full plugin loads caused by registry cache-key divergence across startup callers. loadOpenClawPlugins() keys the registry on onlyPluginIds, preferSetupRuntimeForChannelPlugins, and runtimeSubagentMode in src/plugins/loader.ts:325 and src/plugins/loader.ts:815, and the gateway startup path uses a different key shape from other callers in src/gateway/server-plugins.ts:186, src/gateway/server.impl.ts:512, src/agents/runtime-plugins.ts:15, and src/infra/outbound/channel-resolution.ts:54.

This PR only increases the discovery-cache TTL in src/plugins/discovery.ts:43, so it can reduce some repeated filesystem scans, but it does not address the repeated registry rebuilds that the issue calls out. Marking this as Fixes #48380 seems too strong.

There is also still a separate 1-second manifest-registry cache in src/plugins/manifest-registry.ts:68, src/plugins/manifest-registry.ts:287, and src/plugins/manifest-registry.ts:429. Since loadOpenClawPlugins() always goes through loadPluginManifestRegistry() after discovery (src/plugins/loader.ts:942), slower startups can still end up reparsing all manifests after the original 1-second window even if discovery stays warm.

I think the better direction is to fix the startup call sites / registry caching directly: use the existing onlyPluginIds / preferSetupRuntimeForChannelPlugins machinery to avoid full loads for unneeded plugins, or otherwise unify the registry cache behavior. If we keep this TTL bump, it should be framed as partial mitigation and tested/measured as such, not as the fix for the issue.

@who96 who96 force-pushed the fix/discovery-cache-ttl-48380 branch from 13bca3f to 80a18f1 Compare March 17, 2026 07:24
@who96 who96 changed the title fix(plugins): increase discovery cache TTL to cover gateway startup window perf(plugins): increase discovery cache TTL to reduce redundant filesystem scans during startup Mar 17, 2026
@who96
Copy link
Copy Markdown
Author

who96 commented Mar 17, 2026

You're right. Saying "Fixes #48380" overstated what this PR does, so I've updated the description to call it a partial mitigation in the discovery layer.

The cache-key divergence in the registry path (onlyPluginIds / preferSetupRuntimeForChannelPlugins varying across startup callers) looks like separate work and should be handled in its own PR. The manifest-registry 1s TTL is probably worth revisiting there as well.

For this PR, the 5s TTL should reduce repeated filesystem scans during startup, and the added debug logging should make it easier to see how often the registry is being rebuilt. That should give us better data for the follow-up.

…indow

The 1-second discovery cache expired mid-startup, causing redundant
filesystem scans for 71+ plugin directories during startup sequences.

Changes:
- Bump DEFAULT_DISCOVERY_CACHE_MS from 1000 to 5000 to cover the full
  gateway startup sequence while remaining short enough for explicit
  config-reload scenarios.
- Add debug-level diagnostic logs for plugin registry cache hits and
  misses, including enabled/total plugin counts and elapsed time for
  full load cycles. This aids diagnosis of startup performance
  regressions.

The OPENCLAW_PLUGIN_DISCOVERY_CACHE_MS and
OPENCLAW_DISABLE_PLUGIN_DISCOVERY_CACHE env var overrides still work
as before.

Closes openclaw#48380
@who96 who96 force-pushed the fix/discovery-cache-ttl-48380 branch from 440a058 to ec5c69b Compare March 19, 2026 15:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants