perf(plugins): increase discovery cache TTL to reduce redundant filesystem scans during startup#48439
perf(plugins): increase discovery cache TTL to reduce redundant filesystem scans during startup#48439who96 wants to merge 1 commit intoopenclaw:mainfrom
Conversation
Greptile SummaryThis PR increases the plugin discovery cache TTL from 1 s to 5 s and adds The TTL bump is a safe, targeted fix: the existing The debug logging is a net positive for diagnosing future startup regressions, though two style concerns have been raised in existing review threads:
Confidence Score: 4/5
Last reviewed commit: 13bca3f |
src/plugins/loader.ts
Outdated
| const enabledCount = registry.plugins.filter((p) => p.enabled).length; | ||
| logger.debug?.( | ||
| `[plugins] loaded ${enabledCount}/${registry.plugins.length} plugins in ${Date.now() - loadStartMs}ms (cache miss)`, | ||
| ); |
There was a problem hiding this comment.
Misleading "(cache miss)" label when caching is disabled
The (cache miss) annotation is printed unconditionally — even when cacheEnabled is false (e.g. callers that pass options.cache = false, such as loadOnboardingPluginRegistry()). In that case no cache lookup was ever attempted, so calling it a "cache miss" is inaccurate and could confuse someone reading the logs.
Consider guarding the label on the actual cache state:
| const enabledCount = registry.plugins.filter((p) => p.enabled).length; | |
| logger.debug?.( | |
| `[plugins] loaded ${enabledCount}/${registry.plugins.length} plugins in ${Date.now() - loadStartMs}ms (cache miss)`, | |
| ); | |
| const enabledCount = registry.plugins.filter((p) => p.enabled).length; | |
| const cacheLabel = cacheEnabled ? "cache miss" : "cache disabled"; | |
| logger.debug?.( | |
| `[plugins] loaded ${enabledCount}/${registry.plugins.length} plugins in ${Date.now() - loadStartMs}ms (${cacheLabel})`, | |
| ); |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/plugins/loader.ts
Line: 1353-1356
Comment:
**Misleading "(cache miss)" label when caching is disabled**
The `(cache miss)` annotation is printed unconditionally — even when `cacheEnabled` is `false` (e.g. callers that pass `options.cache = false`, such as `loadOnboardingPluginRegistry()`). In that case no cache lookup was ever attempted, so calling it a "cache miss" is inaccurate and could confuse someone reading the logs.
Consider guarding the label on the actual cache state:
```suggestion
const enabledCount = registry.plugins.filter((p) => p.enabled).length;
const cacheLabel = cacheEnabled ? "cache miss" : "cache disabled";
logger.debug?.(
`[plugins] loaded ${enabledCount}/${registry.plugins.length} plugins in ${Date.now() - loadStartMs}ms (${cacheLabel})`,
);
```
How can I resolve this? If you propose a fix, please make it concise.682418d to
1824a0d
Compare
|
@greptileai review |
src/plugins/loader.ts
Outdated
| const enabledCount = registry.plugins.filter((p) => p.enabled).length; | ||
| const cacheLabel = cacheEnabled ? "cache miss" : "cache disabled"; | ||
| logger.debug?.( | ||
| `[plugins] loaded ${enabledCount}/${registry.plugins.length} plugins in ${Date.now() - loadStartMs}ms (${cacheLabel})`, | ||
| ); |
There was a problem hiding this comment.
Eager evaluation of debug arguments
The template literal passed to logger.debug?.() — including the .filter() call to compute enabledCount and the Date.now() subtraction — is always evaluated before the optional-chained call, even when debug logging is disabled. While the cost per call is negligible, it's inconsistent with the intent of optional-chaining a debug log (i.e. to be a no-op when debug is off).
A guard block avoids the unnecessary work:
| const enabledCount = registry.plugins.filter((p) => p.enabled).length; | |
| const cacheLabel = cacheEnabled ? "cache miss" : "cache disabled"; | |
| logger.debug?.( | |
| `[plugins] loaded ${enabledCount}/${registry.plugins.length} plugins in ${Date.now() - loadStartMs}ms (${cacheLabel})`, | |
| ); | |
| if (logger.debug) { | |
| const enabledCount = registry.plugins.filter((p) => p.enabled).length; | |
| const cacheLabel = cacheEnabled ? "cache miss" : "cache disabled"; | |
| logger.debug( | |
| `[plugins] loaded ${enabledCount}/${registry.plugins.length} plugins in ${Date.now() - loadStartMs}ms (${cacheLabel})`, | |
| ); | |
| } |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/plugins/loader.ts
Line: 1353-1357
Comment:
**Eager evaluation of debug arguments**
The template literal passed to `logger.debug?.()` — including the `.filter()` call to compute `enabledCount` and the `Date.now()` subtraction — is always evaluated before the optional-chained call, even when debug logging is disabled. While the cost per call is negligible, it's inconsistent with the intent of optional-chaining a debug log (i.e. to be a no-op when debug is off).
A guard block avoids the unnecessary work:
```suggestion
if (logger.debug) {
const enabledCount = registry.plugins.filter((p) => p.enabled).length;
const cacheLabel = cacheEnabled ? "cache miss" : "cache disabled";
logger.debug(
`[plugins] loaded ${enabledCount}/${registry.plugins.length} plugins in ${Date.now() - loadStartMs}ms (${cacheLabel})`,
);
}
```
How can I resolve this? If you propose a fix, please make it concise.1824a0d to
13bca3f
Compare
|
@greptileai review |
|
Addressed both Greptile review points:
|
obviyus
left a comment
There was a problem hiding this comment.
This looks like a partial mitigation, not a fix for #48380.
The issue describes repeated full plugin loads caused by registry cache-key divergence across startup callers. loadOpenClawPlugins() keys the registry on onlyPluginIds, preferSetupRuntimeForChannelPlugins, and runtimeSubagentMode in src/plugins/loader.ts:325 and src/plugins/loader.ts:815, and the gateway startup path uses a different key shape from other callers in src/gateway/server-plugins.ts:186, src/gateway/server.impl.ts:512, src/agents/runtime-plugins.ts:15, and src/infra/outbound/channel-resolution.ts:54.
This PR only increases the discovery-cache TTL in src/plugins/discovery.ts:43, so it can reduce some repeated filesystem scans, but it does not address the repeated registry rebuilds that the issue calls out. Marking this as Fixes #48380 seems too strong.
There is also still a separate 1-second manifest-registry cache in src/plugins/manifest-registry.ts:68, src/plugins/manifest-registry.ts:287, and src/plugins/manifest-registry.ts:429. Since loadOpenClawPlugins() always goes through loadPluginManifestRegistry() after discovery (src/plugins/loader.ts:942), slower startups can still end up reparsing all manifests after the original 1-second window even if discovery stays warm.
I think the better direction is to fix the startup call sites / registry caching directly: use the existing onlyPluginIds / preferSetupRuntimeForChannelPlugins machinery to avoid full loads for unneeded plugins, or otherwise unify the registry cache behavior. If we keep this TTL bump, it should be framed as partial mitigation and tested/measured as such, not as the fix for the issue.
13bca3f to
80a18f1
Compare
|
You're right. Saying "Fixes #48380" overstated what this PR does, so I've updated the description to call it a partial mitigation in the discovery layer. The cache-key divergence in the registry path ( For this PR, the 5s TTL should reduce repeated filesystem scans during startup, and the added debug logging should make it easier to see how often the registry is being rebuilt. That should give us better data for the follow-up. |
b9d2b5c to
440a058
Compare
…indow The 1-second discovery cache expired mid-startup, causing redundant filesystem scans for 71+ plugin directories during startup sequences. Changes: - Bump DEFAULT_DISCOVERY_CACHE_MS from 1000 to 5000 to cover the full gateway startup sequence while remaining short enough for explicit config-reload scenarios. - Add debug-level diagnostic logs for plugin registry cache hits and misses, including enabled/total plugin counts and elapsed time for full load cycles. This aids diagnosis of startup performance regressions. The OPENCLAW_PLUGIN_DISCOVERY_CACHE_MS and OPENCLAW_DISABLE_PLUGIN_DISCOVERY_CACHE env var overrides still work as before. Closes openclaw#48380
440a058 to
ec5c69b
Compare
Summary
Partially mitigates #48380 (discovery-layer caching only)
The 1-second plugin discovery cache TTL (
DEFAULT_DISCOVERY_CACHE_MS) expired mid-startup when the gateway boot sequence took longer than 1 second, causing redundant filesystem scans for 71+ plugin directories during startup.This PR does not address the registry-level cache-key divergence described in #48380 — that requires changes to the startup call sites in
server-plugins.ts,runtime-plugins.ts, andchannel-resolution.tsto unify cache keys or applyonlyPluginIdsfiltering. That work should be tracked separately.Changes
src/plugins/discovery.tsDEFAULT_DISCOVERY_CACHE_MSfrom 1000 to 5000OPENCLAW_PLUGIN_DISCOVERY_CACHE_MSandOPENCLAW_DISABLE_PLUGIN_DISCOVERY_CACHEenv var overrides continue to worksrc/plugins/loader.tsdebug-level diagnostic logs for plugin registry cache hits and missesTesting
discovery.test.ts,loader.test.ts,server-plugins.test.ts)upstream/main(does not prefer setupEntry for configured channel loads without startup opt-in) is unrelated to this changeScope
This is a partial mitigation targeting the discovery layer only. The deeper root cause — registry-level cache key divergence caused by different callers passing different
preferSetupRuntimeForChannelPlugins/onlyPluginIdscombinations — requires a separate effort to unify the startup call sites.