Skip to content

perf(es): lazy-load ES schemas per endpoint to cut startup heap (#171)#218

Merged
MattDevy merged 10 commits into
mainfrom
claude/trusting-joliot-608fc9
Apr 29, 2026
Merged

perf(es): lazy-load ES schemas per endpoint to cut startup heap (#171)#218
MattDevy merged 10 commits into
mainfrom
claude/trusting-joliot-608fc9

Conversation

@MattDevy
Copy link
Copy Markdown
Contributor

Summary

Fixes #171 by switching elastic stack es ... to per-endpoint lazy schema loading, using the per-endpoint Zod layout from elastic/elastic-client-generator-js#164 (now merged).

Scenario Before (#171) After
apis.ts import heap delta 373 MB 0.27 MB
elastic stack es --help RSS 599 MB 102 MB
elastic stack es bulk --help RSS ~599 MB 165 MB
elastic stack es search --help RSS ~599 MB 166 MB
elastic stack es indices create RSS ~599 MB 216 MB

Per-leaf deltas match the spot-check table in elastic/elastic-client-generator-js#164 (bulk 34 MB, search 36 MB, ml.get-job-stats 0.9 MB, cat.indices 10 MB).

Review-ability

Split into two commits:

  1. perf(es): lazy-load ES schemas per endpoint to cut startup heap (#171) - the architecture change. 7 files, +462/-148. This is what reviewers should read.
  2. chore(es): regenerate API surface with per-endpoint Zod layout - the regenerated output (842 new files, 64 deleted). Review it by running the pipeline documented in the commit body.

How it works

  • src/es/apis.ts is no longer an eager barrel. It re-exports the metadata-only apiManifest (294 entries, no Zod refs) and adds loadEsApi() / loadEsApisInFile() for dynamic-import on demand.
  • src/es/register.ts splits into a sync registerEsCommands(defs) (explicit definitions, used by tests) and an async registerEsCommandsLazy() (production). The lazy path sniffs process.argv to identify the invoked leaf, pre-loads only that endpoint's schema so Commander can register its flags before parsing, and registers every other leaf as a stub.
  • Stubs lazy-load on demand if the sniff misses (swap-and-reparse), as defence in depth for any argv shape I haven't explicitly covered.
  • Three scripts/*.mjs post-process generator output: prepend @ts-nocheck to 588 schema files (fixes a tsc OOM at 8 GB), split namespace-grouped api files into per-endpoint, build the manifest. They live in this repo for now; elastic/elastic-client-generator-js#166 tracks folding them into the generator.

Behavioural notes for reviewers

  • registerEsCommands() no longer has a default argument. Callers must pass an explicit EsApiDefinition[] or switch to registerEsCommandsLazy(). Only one test was affected; it now uses loadAllEsApis().
  • All 294 ES commands still appear under elastic stack es --help with the same names, namespaces, and descriptions. The only user-visible change is that top-level help is now fast and light.
  • Response format, flag shape, validation errors, and JSON-Schema --help --json output are unchanged for every leaf.

Test plan

  • tsc --noEmit clean (8 GB heap; would OOM without @ts-nocheck on schemas)
  • npm test passes 867/867
  • elastic stack es --help, elastic stack es bulk --help, elastic stack es indices --help, elastic stack es indices create --help, elastic stack es search --help all render correct flags and descriptions
  • RSS measured with /usr/bin/time -lp for every case in the table above
  • Reviewer: confirm functional tests still pass (npm run test:functional:es) against a live ES. I did not run them.
  • Reviewer: spot-check a few leaf invocations against a real cluster (not just --help).

Follow-ups

Closes #171.

@MattDevy MattDevy marked this pull request as draft April 22, 2026 15:46
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 22, 2026

MegaLinter analysis: Success

Descriptor Linter Files Fixed Errors Warnings Elapsed time
✅ ACTION actionlint 1 0 0 0.04s
✅ BASH shellcheck 2 0 0 0.07s
✅ COPYPASTE jscpd yes no no 10.96s
✅ REPOSITORY gitleaks yes no no 81.43s
✅ REPOSITORY git_diff yes no no 0.36s
✅ REPOSITORY secretlint yes no no 13.43s
✅ REPOSITORY trivy yes no no 19.36s
✅ TYPESCRIPT eslint 301 0 0 5.89s
✅ YAML yamllint 1 0 0 0.76s

See detailed reports in MegaLinter artifacts
Set VALIDATE_ALL_CODEBASE: true in mega-linter.yml to validate all sources, not only the diff

MegaLinter is graciously provided by OX Security
Show us your support by starring ⭐ the repository

@MattDevy
Copy link
Copy Markdown
Contributor Author

Binary size impact

Flagging this up-front: the memory win comes with a real package-size regression that deserves review attention.

Measurements (clean builds, main vs this branch)

Metric Main PR Multiplier
dist/ total 23 MB 152 MB 6.6×
npm tarball (compressed) 2.4 MB 21.9 MB 9.1×
npm package unpacked 25.7 MB 196.1 MB 7.6×

Where the bytes live in dist/ (PR branch):

File type Main PR
.js 3 MB 35 MB
.d.ts 15 MB 72 MB
.js.map 2 MB 26 MB
.d.ts.map 1 MB 17 MB

Nearly all of it (146 MB of the 152 MB dist/) is src/es/apis/schemas/. 588 per-endpoint files, each with its transitive type closure inlined and compiled once per file. Source maps and declaration maps balloon disproportionately because Zod's generic types expand into a lot of detail.

Mitigation options, sized

Strategy Unpacked Complexity
Current (ships src/ + full dist/) 196 MB baseline
Drop src/ from files[] in package.json 152 MB one line
Also drop *.map files from the package 108 MB npmignore / files glob
Also drop *.d.ts* under dist/es/apis/schemas/ 40 MB postbuild rm. No public-API loss - schemas carry @ts-nocheck so their external types are already any
Drop all .d.ts* from dist/ 36 MB loses public-API types

The 40 MB target is the natural stopping point: public-API types preserved, internal schema artefacts stripped. It extends the "schemas are codegen, consumers don't introspect their types" stance that @ts-nocheck already assumes.

Proposed handling

Not on this PR, unless reviewers disagree. #171 is framed as a memory issue; the size regression is a side effect that deserves its own review cycle and its own commit history. Plan:

  1. Merge this PR as-is. The memory fix stands on its own.
  2. Open a follow-up with:
    • Strip src/ from files[].
    • Add a postbuild step (rm dist/es/apis/schemas/*.d.ts dist/es/apis/schemas/*.d.ts.map dist/es/apis/schemas/*.js.map).
    • Result: 196 MB → 40 MB unpacked, 21.9 MB → ~5 MB tarball. Still ~2× over today, but the rest is genuinely new content (294 inlined closures) that only Option A (ajv-standalone from elastic/elastic-client-generator-js#164) would compress further.

Happy to fold the mitigation into this PR as a third commit if reviewers want the size win in the same merge. It would delay merge slightly because changing files[] needs checking against supply-chain scanners, npm view tooling, and GitHub's Dependency Graph, which I haven't audited.

@JoshMock
Copy link
Copy Markdown
Member

Happy to fold the mitigation into this PR as a third commit if reviewers want the size win in the same merge.

Let's just open an issue for package size mitigation. It's a real concern, but you've already mapped a clear path to solving it and reviewing separately will be much easier.

@MattDevy
Copy link
Copy Markdown
Contributor Author

Happy to fold the mitigation into this PR as a third commit if reviewers want the size win in the same merge.

Let's just open an issue for package size mitigation. It's a real concern, but you've already mapped a clear path to solving it and reviewing separately will be much easier.

Thanks for the input Josh! I'll get this tidied up and ready to merge first thing tomorrow

The `src/es/apis.ts` barrel statically imported every namespace file,
which transitively pulled in all ~294 Zod request schemas on any
`elastic stack es ...` invocation (even `--help`). Issue #171 measured
this as a 373 MB eager heap floor and `elastic stack es --help` at
599 MB RSS. Short-lived CLI processes tolerated it, but `docs chat`,
agent-loop invocations, and any future long-running mode cannot.

Switch to per-endpoint lazy dispatch. The regenerated code this
commit relies on (per-endpoint Zod schemas from
elastic/elastic-client-generator-js#164, per-endpoint api files, the
manifest) ships in the following commit so this one stays review-able
as an architecture change.

Wiring changes:

- `src/es/apis.ts` is rewritten as a lazy loader. It re-exports the
  cheap `apiManifest` (metadata only, no Zod refs) and adds
  `loadEsApi(meta)` / `loadEsApisInFile(stem)` / `loadAllEsApis()`
  for dynamic-import on demand.

- `src/es/register.ts` splits into:
  - `registerEsCommands(definitions)` - synchronous, takes an explicit
    list of `EsApiDefinition`. Used by tests (argument stays required,
    no default-to-`allApis` any more).
  - `registerEsCommandsLazy()` - async, the production path. Sniffs
    `process.argv` to identify the invoked leaf, pre-loads only that
    one endpoint's Zod schema so Commander can register its flags
    before parsing, and registers every other leaf as a stub. Stubs
    also lazy-load on demand if the sniff misses (defence in depth).

- `src/cli.ts` awaits `registerEsCommandsLazy()`.

- `test/es/register.test.ts` updates the "all built-in API schemas
  are JSON-Schema-serializable" test to use `loadAllEsApis()`
  explicitly, since `registerEsCommands` no longer defaults.

Three integration scripts in `scripts/` post-process generator output
until elastic/elastic-client-generator-js#166 folds them upstream:

- `ts-nocheck-schemas.mjs` prepends `// @ts-nocheck` to each per-
  endpoint schema file. Without this, `tsc` OOMs at 8 GB trying to
  type-check 46 MB of heavily-generic Zod types across 588 files.
  The schemas are auto-generated and consumers introspect them via
  `z.infer`, not the internal type structure, so this is safe.

- `split-api-files.mjs` splits each namespace-grouped `apis/*.ts`
  (e.g. `indices.ts` with ~60 APIs) into one file per endpoint
  (`indices_create.ts`, `indices_delete.ts`, ...) so each file
  imports exactly one schema. Without this, invoking
  `elastic stack es indices create` would load all 60 indices
  schemas (~1 GB), defeating per-endpoint lazy dispatch.

- `build-api-manifest.mjs` scans the now-split api files and emits
  `src/es/api-manifest.ts` (294 entries, metadata only, no Zod refs).

Measured impact (after applying this commit + the regenerated code):

| Scenario                          | Before  | After      |
| --------------------------------- | ------- | ---------- |
| apis.ts import heap delta         | 373 MB  | 0.27 MB    |
| elastic stack es --help RSS       | 599 MB  | 102 MB     |
| elastic stack es bulk --help RSS  | ~599 MB | 165 MB     |
| elastic stack es search --help    | ~599 MB | 166 MB     |
| indices create --help RSS         | ~599 MB | 216 MB     |

Per-leaf deltas match the spot-check table in
elastic/elastic-client-generator-js#164 exactly (bulk 34 MB,
search 36 MB, ml.get-job-stats 0.9 MB, cat.indices 10 MB).

Closes #171.
@MattDevy MattDevy force-pushed the claude/trusting-joliot-608fc9 branch from 58a5403 to e513853 Compare April 29, 2026 11:24
@MattDevy
Copy link
Copy Markdown
Contributor Author

Opened #250 to track the package size mitigation.

@MattDevy MattDevy marked this pull request as ready for review April 29, 2026 13:30
MattDevy and others added 3 commits April 29, 2026 16:30
On Windows, os.homedir() reads USERPROFILE rather than HOME. Without
USERPROFILE: dir, discoverConfigFile() probed the CI runner's actual home
(potentially a slow network path), causing the subprocess to exceed Bun's
5-second timeout. That single timeout left Bun's node:test runner in a broken
state, cascading into ~20 describe() inside another test() failures.
@MattDevy MattDevy merged commit e57920d into main Apr 29, 2026
18 checks passed
@MattDevy MattDevy deleted the claude/trusting-joliot-608fc9 branch April 29, 2026 16:32
flobernd added a commit that referenced this pull request May 13, 2026
- Add `codegen:kibana` target invoking `npx tsx cli/kibana/index.ts` and
  copying `output/kibana/apis/*.ts` into `src/kb/apis/`. The hand-written
  lazy loader at `src/kb/apis.ts` is intentionally not overwritten.
- Run `scripts/build-api-manifest.mjs` after ES codegen and
  `scripts/build-kb-manifest.mts` after Kibana codegen so the per-endpoint
  manifests stay in sync with the generated namespace files.
- Stop copying `output/es/index.ts` over `src/es/apis.ts` (it became a
  hand-written lazy loader in #218).
- Fix `build-kb-manifest.mts` to load definitions directly from each
  `src/kb/apis/*.ts` file (the previous `allKbApis` import was dead since
  the lazy-loader refactor in #266). Manifest regen also picks up the one
  endpoint that had been missing as a result.
- Workflow: append the kibana step, expand diff-detection and `add-paths`
  to include `src/kb`, refresh the PR body to list the kibana outputs.
- CONTRIBUTING: document the new target.

Refs: #79
flobernd added a commit that referenced this pull request May 13, 2026
- Add `codegen:kibana` target invoking `npx tsx cli/kibana/index.ts` and
  copying `output/kibana/apis/*.ts` into `src/kb/apis/`. The hand-written
  lazy loader at `src/kb/apis.ts` is intentionally not overwritten.
- Run `scripts/build-api-manifest.mjs` after ES codegen and
  `scripts/build-kb-manifest.mts` after Kibana codegen so the per-endpoint
  manifests stay in sync with the generated namespace files.
- Stop copying `output/es/index.ts` over `src/es/apis.ts` (it became a
  hand-written lazy loader in #218).
- Fix `build-kb-manifest.mts` to load definitions directly from each
  `src/kb/apis/*.ts` file (the previous `allKbApis` import was dead since
  the lazy-loader refactor in #266). Manifest regen also picks up the one
  endpoint that had been missing as a result.
- Workflow: append the kibana step, expand diff-detection and `add-paths`
  to include `src/kb`, refresh the PR body to list the kibana outputs.
- CONTRIBUTING: document the new target.

Refs: #79
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf: loading 292 ES API Zod schemas allocates 373 MB of heap

2 participants