
[Create Environment] Follow up fixes on the EA environment with CDR logs#4284

Open
opauloh wants to merge 15 commits into main from create-env/kibana-fixes

Conversation

@opauloh
Contributor

@opauloh opauloh commented Apr 7, 2026

To be merged after #4092

Summary of your changes

This PR is based on #4092. It adds a few fixes for issues identified in the Create environment workflow with Entity Store and Entity Analytics:

  • Fixes the Entity Store endpoints (the install and status APIs were moved from internal to public)
  • Removes the manual step to enable the Entity Store v2 Kibana setting from the test environment workflow (it's already covered by the Deploy CDR step)
  • Updates the Kibana RAM of the deployment from 1 GB to 8 GB (Entity Store v2 needs a minimum of 8 GB to run stably)
  • Reduces the deployment size of the AWS instances (this prevents errors with the maximum number of vCPUs, and the security-cloud-services team also requested the reduction). The chosen size is the one recommended in the Elastic Agent docs for general Elastic Agent usage.

Checklist

  • I have added tests that prove my fix is effective or that my feature works

seanrathier and others added 14 commits April 2, 2026 10:26
The /api/fleet/status endpoint returns 404 on ESS deployments running
Kibana 9.x. Switch to /api/fleet/agent_policies which is the first
endpoint the integration scripts rely on and confirms Fleet is truly
ready. Also log the response body on non-200 to aid future debugging,
add kbn-xsrf header, and add serverless-mode input to skip the Fleet
check on serverless deployments where Fleet is managed by Elastic Cloud.
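The readiness probe this commit describes could be sketched as follows. This is a hypothetical Python sketch; `fleet_check_request` and its parameters are illustrative names, not the actual script's API:

```python
def fleet_check_request(kibana_url, serverless_mode=False):
    """Describe the Fleet readiness probe for a deployment.

    On serverless deployments Fleet is managed by Elastic Cloud, so the
    check is skipped entirely (returns None). Otherwise probe
    GET /api/fleet/agent_policies -- the first endpoint the integration
    scripts rely on -- with the kbn-xsrf header that Kibana APIs require.
    """
    if serverless_mode:
        return None
    return {
        "method": "GET",
        "url": f"{kibana_url}/api/fleet/agent_policies",
        "headers": {"kbn-xsrf": "true"},
    }
```

Whatever executes this request description would then log the response body on any non-200, as the commit adds.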

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
GET /api/fleet/agent_policies returning 200 only confirms Kibana's Fleet
plugin is up, not that Fleet Server is ready for writes. Switch to
POST /api/fleet/setup which is idempotent and only returns
isInitialized:true once Fleet Server is fully configured and accepting
connections.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
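The idempotent setup poll could look like this. An illustrative sketch only: `call_setup` stands in for whatever performs the authenticated POST /api/fleet/setup and returns the parsed JSON body:

```python
import time

def wait_for_fleet_setup(call_setup, attempts=30, delay=5):
    """Poll POST /api/fleet/setup until it reports isInitialized: true.

    The endpoint is idempotent, so calling it repeatedly is safe; it only
    returns isInitialized: true once Fleet Server is fully configured and
    accepting connections.
    """
    for i in range(attempts):
        if call_setup().get("isInitialized") is True:
            return True
        if i < attempts - 1:  # no pointless sleep after the last attempt
            time.sleep(delay)
    return False
```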
POST /api/fleet/setup returning isInitialized:true only confirms Fleet
data is written to Elasticsearch; Fleet Server itself can still be
starting up. Add a second polling stage on GET /api/fleet/epm/packages
which is the first call every install script makes and requires the full
Fleet Server stack to be operational before returning 200.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
A single 200 from /api/fleet/epm/packages is not enough — Fleet Server
cycles through restart windows and one passing poll can be followed
immediately by 502/503. Require 3 consecutive 200s (up to 60 attempts)
before declaring Fleet stable.

Also retry the transient 400 "not available with the current
configuration" in perform_api_call — Fleet emits this during
initialisation but it resolves quickly and should not kill the script
on first occurrence.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
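The consecutive-success window could be sketched like this (hypothetical names; `get_status` stands in for a call returning the HTTP status of GET /api/fleet/epm/packages):

```python
import time

def fleet_is_stable(get_status, required=3, max_attempts=60, delay=5):
    """Require `required` consecutive 200s before declaring Fleet stable.

    Fleet Server cycles through restart windows, so a single 200 can be
    followed immediately by 502/503; any non-200 resets the counter.
    """
    consecutive = 0
    for i in range(max_attempts):
        if get_status() == 200:
            consecutive += 1
            if consecutive >= required:
                return True
        else:
            consecutive = 0
        if i < max_attempts - 1:
            time.sleep(delay)
    return False
```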
Require 5 consecutive 200s (up from 3) from GET /api/fleet/epm/packages
before declaring Fleet Server stable — a 3-pass window was too narrow
to catch restart cycles.

Increase perform_api_call max_retries default from 3 to 8 and cap
exponential backoff at 30s (5, 10, 20, 30, 30, 30, 30s ≈ 2.5 min),
giving scripts enough time to ride through a Fleet Server restart before
giving up.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
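The capped exponential backoff works out to the schedule below. A sketch under the commit's stated parameters; `backoff_schedule` is an illustrative helper, not the script's actual API:

```python
def backoff_schedule(max_retries=8, base=5, cap=30):
    """Delays between attempts: 5, 10, 20, 30, 30, 30, 30 seconds.

    Doubling from `base`, capped at `cap`; with 8 attempts there are 7
    waits, totalling 155 s (roughly 2.5 minutes) before giving up.
    """
    return [min(base * 2 ** i, cap) for i in range(max_retries - 1)]
```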
429 responses from Fleet (e.g. TLS handshake timeout surfaced as rate
limit) are transient and should be retried alongside 5xx errors.

get_package_version silently returned None on failure, causing all 16
install scripts to POST a null package version and receive a confusing
400 "expected string but got null". Re-raise the exception so scripts
fail immediately with a clear error instead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
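Taken together with the transient-400 case from the earlier commit, the retry classification can be sketched as follows (illustrative only; the real perform_api_call may structure this differently):

```python
def should_retry(status, body=""):
    """Decide whether an API failure is transient and worth retrying.

    - 5xx: Fleet Server restart windows.
    - 429: e.g. a TLS handshake timeout surfaced as a rate limit.
    - 400 with "not available with the current configuration": emitted
      during Fleet initialisation and resolves quickly.
    Anything else should be re-raised so install scripts fail
    immediately with a clear error instead of proceeding with None.
    """
    if status == 429 or 500 <= status < 600:
        return True
    return status == 400 and "not available with the current configuration" in body
```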
…Store v2 install

- Increase Kibana instance size to 8g for improved stability.
- Update default AWS EC2 instance type to m7i.large.
- Remove the manual Entity Store v2 installation step from the test environment workflow.
…nual setting

- Update the Entity Store v2 installation API to use the `/api/` endpoint instead of `/internal/`.
- Remove the manual step to enable the Entity Store v2 Kibana setting from the test environment workflow.
@opauloh opauloh requested a review from gurevichdmitry April 7, 2026 13:10
@opauloh opauloh requested a review from a team as a code owner April 7, 2026 13:10
@mergify mergify bot assigned opauloh Apr 7, 2026
@mergify
Contributor

mergify bot commented Apr 7, 2026

This pull request does not have a backport label. Could you fix it @opauloh? 🙏
To fix this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v\d.\d.\d is the label to automatically backport to the 8.\d branch, where \d is a digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

- Enable Security Hub findings and insights within the Cloudtrail integration setup.
- Add AWS region support to the integration configuration and CI environment variables.
- Remove the deprecated `leadGenerationDetailsEnabled` experimental Kibana setting.
- Update Security Hub insights collection interval to 1 hour.