feat(kb): add kibana functional tests #279
Conversation
| Descriptor | Linter | Files | Fixed | Errors | Warnings | Elapsed time |
|---|---|---|---|---|---|---|
| ⚠️ BASH | shellcheck | 8 | | 1 | 0 | 0.46s |
| ✅ COPYPASTE | jscpd | yes | | no | no | 5.26s |
| ✅ REPOSITORY | gitleaks | yes | | no | no | 92.27s |
| ✅ REPOSITORY | git_diff | yes | | no | no | 0.46s |
| ✅ REPOSITORY | secretlint | yes | | no | no | 17.72s |
| ✅ REPOSITORY | trivy | yes | | no | no | 15.94s |
| ✅ TYPESCRIPT | eslint | 7 | | 0 | 0 | 6.07s |
| ✅ YAML | yamllint | 1 | | 0 | 0 | 0.78s |
Detailed Issues
⚠️ BASH / shellcheck - 1 error
In .buildkite/run-kb-tests.sh line 65:
docker load < "$(ls "$ES_CACHE_DIR/elasticsearch-$STACK_VERSION"*.tar.gz | head -1)"
^-- SC2012 (info): Use find instead of ls to better handle non-alphanumeric filenames.
For more information:
https://www.shellcheck.net/wiki/SC2012 -- Use find instead of ls to better ...
See detailed reports in MegaLinter artifacts
Set VALIDATE_ALL_CODEBASE: true in mega-linter.yml to validate all sources, not only the diff
Not sure what's up with MegaLinter, but otherwise this LGTM.
yeah, it was flagging some dummy keys as a leak. Updated the linter to ignore those specific keys.
…ith n2-standard-4
**ES startup timeout: root cause and fixes**

Getting ES running in CI for the Kibana tests turned out to be surprisingly tricky. Here is what happened and how we fixed it.

**Why ES kept timing out**

When you run … the script was doing things in this order: … Meanwhile, ES was not even alive yet.

**What we changed**

1. Start ES before the build
2. Better agent image: switched from …
3. Two-phase ES health check (a sketch follows below): even after …
4. ES snapshot cache: the …
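A minimal sketch of the two-phase check from point 3, assuming ES is reachable on localhost:9200 and `ELASTIC_PASSWORD` is set; the variable names, retry budget, and timeouts here are illustrative, not the actual script:

```bash
ES_URL="http://localhost:9200"

# Phase 1: wait until the HTTP port accepts connections at all
# (a 401 from security is fine here -- it still means the port is up).
for _ in $(seq 1 60); do
  curl -s -o /dev/null "$ES_URL" && break
  sleep 5
done

# Phase 2: wait until the cluster itself reports yellow or green health.
for _ in $(seq 1 60); do
  status=$(curl -s -u "elastic:$ELASTIC_PASSWORD" "$ES_URL/_cluster/health" | jq -r '.status // empty')
  if [ "$status" = "green" ] || [ "$status" = "yellow" ]; then
    echo "ES is ready (status: $status)"
    break
  fi
  sleep 5
done
```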
…lth checks Ubuntu 24.04 agents use nftables which can break Docker --publish port forwarding. Resolve container IPs on the Docker bridge network immediately after docker run and use those for all health checks and CLI config. The Linux host can always reach bridge network container IPs directly without going through port publishing.
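For illustration, resolving the bridge IPs right after `docker run` looks roughly like this (container names `elasticsearch` and `kibana` are assumptions):

```bash
# Ask Docker for each container's IP on the bridge network; the Linux host
# can reach these addresses directly, bypassing --publish/NAT entirely.
ES_IP=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' elasticsearch)
KB_IP=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' kibana)

# Health checks and CLI config then target the bridge IPs instead of localhost.
curl -s "http://$ES_IP:9200/_cluster/health"
curl -s "http://$KB_IP:5601/api/status"
```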
Docker bridge/port-publishing both fail on kibana-ubuntu-2404 agents (Ubuntu 24.04 uses nftables which breaks Docker NAT rules). --network host puts both containers directly on the host network stack so localhost:9200 and localhost:5601 work unconditionally — equivalent to how Kibana's own CI runs ES as a native process via node scripts/es snapshot.
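Roughly what the `--network host` variant looks like; image tags and env vars are assumptions, and this approach was later found to be blocked on these agents (see the next change):

```bash
# Both containers share the host network stack, so no -p/--publish mapping
# is needed and localhost:9200 / localhost:5601 work directly on the host.
docker run -d --name elasticsearch --network host \
  -e discovery.type=single-node \
  -e ELASTIC_PASSWORD="$ELASTIC_PASSWORD" \
  "docker.elastic.co/elasticsearch/elasticsearch:$STACK_VERSION"

docker run -d --name kibana --network host \
  -e ELASTICSEARCH_HOSTS=http://localhost:9200 \
  "docker.elastic.co/kibana/kibana:$STACK_VERSION"
```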
…unner container On kibana-ubuntu-2404 agents, all host→container networking is broken: --network host is blocked (user namespace remapping), --publish doesn't work (nftables replaces iptables), and direct bridge IPs are not routed to the host. The only reliable networking is inter-container communication on a custom bridge network. This matches how Kibana's own CI works: Kibana runs ES natively so localhost always works; we achieve the same by running a dedicated test-runner container (node:NODE_VERSION-bookworm-slim) on the same network as ES and Kibana, using Docker DNS aliases (elasticsearch:9200, kibana:5601). The built workspace is mounted read-only so node dist/cli.js works without rebuilding.
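A sketch of that topology; the network name, entry-point path, and env var names are placeholders, not the actual script:

```bash
docker network create kb-tests

# Containers on a user-defined bridge network resolve each other by name
# via Docker DNS (elasticsearch:9200, kibana:5601).
docker run -d --name elasticsearch --network kb-tests \
  -e discovery.type=single-node -e ELASTIC_PASSWORD="$ELASTIC_PASSWORD" \
  "docker.elastic.co/elasticsearch/elasticsearch:$STACK_VERSION"

docker run -d --name kibana --network kb-tests \
  -e ELASTICSEARCH_HOSTS=http://elasticsearch:9200 \
  "docker.elastic.co/kibana/kibana:$STACK_VERSION"

# Test runner on the same network, with the built workspace mounted read-only
# so node dist/cli.js works without rebuilding inside the container.
docker run --rm --network kb-tests \
  -v "$PWD:/workspace:ro" -w /workspace \
  "node:$NODE_VERSION-bookworm-slim" \
  bash -c 'ES_URL=http://elasticsearch:9200 KB_URL=http://kibana:5601 ./run-suite.sh'
```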
The test runner container couldn't reach ES via Docker DNS (elasticsearch:9200) in build 268 for an unknown reason. This adds:

- Network diagnostics in the runner (resolv.conf, routes, first verbose curl) so the next failure gives us the exact error (DNS failure, TCP refused, etc.)
- IP fallback: docker inspect fetches ES/Kibana container IPs on the host and passes them as ES_IP / KB_IP; the runner uses these if DNS lookup fails
- docker logs for ES and Kibana in the cleanup trap for visibility
- Lower Node.js heap limit (6 GB -> 4 GB) to reduce memory pressure during build
ES 8.0+ auto-enables HTTPS on the HTTP layer when ELASTIC_PASSWORD is set. This caused "Empty reply from server" errors because the test runner and Kibana were connecting via http:// to a port expecting TLS. Kibana then crashed and its DNS entry was removed, explaining the secondary "kibana DNS failed" symptom. Setting xpack.security.http.ssl.enabled=false keeps security (auth, RBAC, API keys) enabled while allowing plain HTTP access, which is fine for CI.
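In docker-run terms the relevant flag looks like this (other settings elided; the official ES image accepts dotted setting names as environment variables):

```bash
# Security (auth, RBAC, API keys) stays enabled; only TLS on the HTTP layer
# is turned off, so http://...:9200 works for the test runner and Kibana.
docker run -d --name elasticsearch --network kb-tests \
  -e discovery.type=single-node \
  -e ELASTIC_PASSWORD="$ELASTIC_PASSWORD" \
  -e xpack.security.http.ssl.enabled=false \
  "docker.elastic.co/elasticsearch/elasticsearch:$STACK_VERSION"
```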
Two fixes, plus a consistency tweak:

- Kibana was crashing silently because --rm auto-removed the container before the cleanup trap could collect logs. Removed --rm so docker logs always works.
- Kibana was likely crashing because it tried to connect to ES before ES finished bootstrapping. Moved Kibana startup to after the npm build (~3 min buffer), so ES is fully ready when Kibana first connects. ES still starts early.
- Also adds xpack.security.transport.ssl.enabled=false to ES for consistency with the http.ssl flag (aligns with the elastic/start-local reference setup).
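The log-collection part amounts to a cleanup trap along these lines (container and log file names assumed):

```bash
# Because the containers are not started with --rm, docker logs still works
# here even if Kibana or ES has already exited.
cleanup() {
  docker logs elasticsearch > es.log 2>&1 || true
  docker logs kibana > kibana.log 2>&1 || true
  docker rm -f elasticsearch kibana >/dev/null 2>&1 || true
}
trap cleanup EXIT
```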
Kibana 9.x explicitly forbids ELASTICSEARCH_USERNAME=elastic with a fatal config validation error. We must use kibana_system instead. Since the host cannot reach ES directly on this agent, a one-shot Node.js container (setup-kibana.js) runs on the same Docker network, waits for ES cluster health and the security index, sets the kibana_system password, then exits. Kibana is then started with ELASTICSEARCH_USERNAME=kibana_system.
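The actual change uses a one-shot Node.js container (setup-kibana.js), but the logic is equivalent to this shell sketch against the standard ES security API (URL and variable names assumed):

```bash
ES_URL=http://elasticsearch:9200

# Wait for the cluster (and with it the security index) to come up.
until curl -sf -u "elastic:$ELASTIC_PASSWORD" \
    "$ES_URL/_cluster/health?wait_for_status=yellow&timeout=10s" >/dev/null; do
  sleep 2
done

# Set the kibana_system password; Kibana is then started with
# ELASTICSEARCH_USERNAME=kibana_system / ELASTICSEARCH_PASSWORD=$KIBANA_PASSWORD.
curl -sf -X POST -u "elastic:$ELASTIC_PASSWORD" \
  -H 'Content-Type: application/json' \
  "$ES_URL/_security/user/kibana_system/_password" \
  -d "{\"password\": \"$KIBANA_PASSWORD\"}"
```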
The repo has "type": "module" in package.json so .js files are treated as ESM, causing "require is not defined in ES module scope". Renaming to .cjs forces Node to treat it as CommonJS regardless of package.json. Also adds the missing SPDX-License-Identifier header to pass the test:spdx check.
The /api/actions/connector_types and /api/alerting/rules/_find health checks were timing out because the Fleet plugin's retry loop (FleetEncryptedSavedObjectEncryptionKeyRequired for agent binary source) was causing those endpoints to return non-200 responses. Fleet's issue is unrelated to our tests. Replace the 30-retry polling loop with a 15-second sleep after Kibana reports "available". By that point all essential plugins (alerting, actions) are initialised as part of the "available" state.
… check

Three root causes addressed:

1. `params` (alerting create) and `config`/`secrets` (connector create/update) were typed as "string" in the generated API definitions. The CLI factory only JSON-parses flag values for "object"/"array" typed params, so these were sent as raw string literals instead of JSON objects, producing 400 errors from the Kibana API. Fixed in the generator (elastic-client-generator-js#174) and reflected here.
2. The previous `sleep 15` after Kibana's "available" status was not reliable. Kibana's actions plugin serves 403 "license information is not available" until its license subscription fires after connecting to ES. Replaced with an active poll on GET /api/actions/connector_types which directly confirms the license is loaded and the actions API is ready.
3. Added stderr capture (2>/tmp/cli-err.txt + cat on failure) to the first CLI call in alerting.sh and connectors.sh so the actual HTTP error is visible in the Buildkite log if any future failure occurs.
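The stderr-capture pattern in (3) is roughly the following; the subcommand shown is a placeholder, not the actual first call in those scripts:

```bash
# Capture stderr from the first CLI call so the real HTTP error body ends up
# in the Buildkite log instead of being lost when set -euo pipefail aborts.
# "kb-alerting-find" is a hypothetical subcommand used only for illustration.
if ! elastic --json kb-alerting-find >/tmp/cli-out.json 2>/tmp/cli-err.txt; then
  echo "First CLI call failed; stderr was:"
  cat /tmp/cli-err.txt
  exit 1
fi
```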
Polling GET /api/actions/connector_types directly was causing repeated 500 Server Errors in Kibana's HTTP access log (the actions plugin HTTP context is not yet wired when Kibana first reports 'available', so early requests get a 500). This looked like the Fleet error resurfacing. Switch to polling /api/status and checking:

- `.status.plugins.actions.level == "available"`
- `.status.plugins.alerting.level == "available"`

The status endpoint always returns 200 and never causes log noise. Fleet degradation appears only in plugins.fleet and does not affect plugins.actions or plugins.alerting.
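A sketch of that poll with curl + jq; the URL, credentials, and retry budget are assumptions:

```bash
KB_URL=http://kibana:5601

# /api/status always returns 200 once Kibana is serving HTTP, so polling it
# produces no error noise in Kibana's access log.
for _ in $(seq 1 60); do
  if curl -s -u "elastic:$ELASTIC_PASSWORD" "$KB_URL/api/status" |
      jq -e '.status.plugins.actions.level == "available"
             and .status.plugins.alerting.level == "available"' >/dev/null; then
    echo "actions and alerting plugins are available"
    break
  fi
  sleep 5
done
```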
Kibana's Docker entrypoint only processes environment variables in SCREAMING_SNAKE_CASE format (e.g. XPACK_ENCRYPTEDSAVEDOBJECTS_ENCRYPTIONKEY). Dotted-notation names (e.g. xpack.encryptedSavedObjects.encryptionKey) are not picked up, so encryptedSavedObjects.canEncrypt stayed false in CI. Every call to getActionsClient() checks canEncrypt and throws 'Unable to create actions client because the Encrypted Saved Objects plugin is missing encryption key', causing a 500 on both POST /api/alerting/rule and GET /api/actions/connector_types. Confirmed: local Kibana (start-local) sets XPACK_ENCRYPTEDSAVEDOBJECTS_ENCRYPTIONKEY and all 5 functional tests pass (5/5 locally).
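The difference in docker-run terms; the network name and key variable are placeholders:

```bash
# Picked up by Kibana's Docker entrypoint (SCREAMING_SNAKE_CASE), so
# encryptedSavedObjects.canEncrypt becomes true:
docker run -d --name kibana --network kb-tests \
  -e ELASTICSEARCH_HOSTS=http://elasticsearch:9200 \
  -e ELASTICSEARCH_USERNAME=kibana_system \
  -e ELASTICSEARCH_PASSWORD="$KIBANA_PASSWORD" \
  -e XPACK_ENCRYPTEDSAVEDOBJECTS_ENCRYPTIONKEY="$KB_ENCRYPTION_KEY" \
  "docker.elastic.co/kibana/kibana:$STACK_VERSION"

# Silently ignored by the entrypoint (dotted notation), which is what left
# canEncrypt=false in CI:
#   -e xpack.encryptedSavedObjects.encryptionKey="$KB_ENCRYPTION_KEY"
```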
Summary
Closes #219
- Functional test suites for data-views, spaces, alerting, connectors, and saved-objects, following the same `set -euo pipefail` + `elastic --json` + `jq` assertion pattern as the ES functional tests
- Fixes to `kibana-client.ts` (introduced in feat(kb): Add Kibana API support with generated command definitions #195) required to make saved-objects work:
  - `application/x-ndjson` responses are now parsed line-by-line into a JSON array instead of throwing a JSON parse error
  - `multipart/form-data` requests are now sent via `FormData` (file upload) instead of `application/json`, fixing the `415 Unsupported Media Type` error from Kibana
  - adds `requestType` and `responseType` fields to `KbApiDefinition` so the request builder and client know which transport mode to use (depends on elastic/elastic-client-generator-js feat/kb-ndjson-multipart)
- A CI script (`run-kb-tests.sh`) that starts ES + Kibana Docker containers and runs the suite on Node 22 and 24
- A `test:functional:kb` npm script for local execution

Future optimization: #280
Test plan
- `npm run test:functional:kb`: 5 passed, 0 failed