Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fleet] Add retry logic to serverless API check #176808

Merged

Conversation

jillguyonnet
Copy link
Contributor

@jillguyonnet jillguyonnet commented Feb 13, 2024

Summary

Closes #176352
Closes #176399

#175315 added the possibility to configure new Fleet Server hosts in serverless, with the constraint that the host URL must match the default URL. The API integration tests written to test this have been flaky, due to failure retrieving the default Fleet Server host or default Elasticsearch output from saved objects. This PR adds retry in the tests.

Note: I have tried adding retry logic in the API handlers but kept hitting test flakiness.

This fix has been tested using the Flaky Test Runner Pipeline, with 48/49 test runs for observability and security project types:
🟢 https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5218
🟢 https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5222
🟢 https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5225

Checklist

@apmmachine
Copy link
Contributor

🤖 GitHub comments

Expand to view the GitHub comments

Just comment with:

  • /oblt-deploy : Deploy a Kibana instance using the Observability test environments.
  • /oblt-deploy-serverless : Deploy a serverless Kibana instance using the Observability test environments.
  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@jillguyonnet jillguyonnet force-pushed the fleet/176352-fix-flaky-api-tests branch from 2a12f1f to 5e7fc30 Compare February 14, 2024 16:28
@jillguyonnet
Copy link
Contributor Author

/ci

@jillguyonnet jillguyonnet self-assigned this Feb 19, 2024
@jillguyonnet jillguyonnet added Team:Fleet Team label for Observability Data Collection Fleet team release_note:skip Skip the PR/issue when compiling release notes labels Feb 19, 2024
@jillguyonnet
Copy link
Contributor Author

@elasticmachine merge upstream

@kibanamachine
Copy link
Contributor

merge conflict between base and head

@jillguyonnet jillguyonnet force-pushed the fleet/176352-fix-flaky-api-tests branch from a6609c7 to cfe7518 Compare February 19, 2024 14:50
@jillguyonnet
Copy link
Contributor Author

/ci

@jillguyonnet jillguyonnet marked this pull request as ready for review February 19, 2024 16:46
@jillguyonnet jillguyonnet requested a review from a team as a code owner February 19, 2024 16:46
@elasticmachine
Copy link
Contributor

Pinging @elastic/fleet (Team:Fleet)

// No need to validate on update if hosts are not passed.
if (outputId && !output.hosts) {
// No need to validate for other types.
if (output.type !== outputType.Elasticsearch) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the type here could be undefined as our PUT are in fact PATCH

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks. I've modified the unit tests slightly to test that update works as expected when the output type is not passed.

const defaultElasticsearchOutputId = 'es-default-output';
const defaultElasticsearchOutputHostUrl = 'https://localhost:9200';

async function expectDefaultFleetServer() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to move expectDefaultFleetServer and expectDefaultElasticsearchOutput to a helper to avoid duplication?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not super easy to read IMHO because of the different details (e.g. body.item.host_urls vs. body.item.hosts) 😅 maybe something like:

async function expectDefault({ name, endpoint, defaultId, hostField, defaultHostUrl }) {
    await retry.waitForWithTimeout(`get default ${name}`, 30_000, async () => {
      const { body, status } = await supertest.get(`/api/fleet/${endpoint}/${defaultId}`);
      if (status === 200 && body.item[hostField].includes(defaultHostUrl)) {
        return true;
      } else {
        throw new Error(`Expected default ${name} id ${defaultId} to exist`);
      }
    });
  }

and

const fleetServer = {
    name: 'Fleet Server',
    endpoint: 'fleet_server_hosts',
    defaultId: 'default-fleet-server',
    hostField: 'host_urls',
    defaultHostUrl: 'https://localhost:8220',
  };

Any thoughts or recommendation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I didn't mean joining the 2 functions, but the duplication in observability/fleet and security/fleet.
The suggested refactor could make sense too, though not really necessary, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, that makes a lot more sense 🙈 I wasn't sure as these two tests are identical anyway, but I tried to extract this and part of the config as they are linked. Let me know what you think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for suggesting 👍 I've re-run the flaky test runner for good measure, all green: https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5237

Copy link
Contributor

@juliaElastic juliaElastic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@kibana-ci
Copy link
Collaborator

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #24 / discover/group1 discover histogram "after all" hook for "should reset all histogram state when resetting the saved search"

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @jillguyonnet

Copy link
Member

@nchaulet nchaulet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@jillguyonnet jillguyonnet merged commit 83e64be into elastic:main Feb 21, 2024
19 checks passed
@jillguyonnet jillguyonnet deleted the fleet/176352-fix-flaky-api-tests branch February 21, 2024 09:14
@kibanamachine kibanamachine added v8.14.0 backport:skip This commit does not require backporting labels Feb 21, 2024
fkanout pushed a commit to fkanout/kibana that referenced this pull request Mar 4, 2024
## Summary

Closes elastic#176352
Closes elastic#176399

elastic#175315 added the possibility to
configure new Fleet Server hosts in serverless, with the constraint that
the host URL must match the default URL. The API integration tests
written to test this have been flaky, due to failure retrieving the
default Fleet Server host or default Elasticsearch output from saved
objects. This PR adds retry in the tests.

Note: I have tried adding retry logic in the API handlers but kept
hitting test flakiness.

This fix has been tested using the Flaky Test Runner Pipeline, with
48/49 test runs for observability and security project types:
🟢
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5218
🟢
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5222
🟢
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5225

### Checklist

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [x] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed
@jillguyonnet jillguyonnet added backport:prev-minor Backport to the previous minor version (i.e. one version back from main) and removed backport:skip This commit does not require backporting labels Mar 12, 2024
@kibanamachine
Copy link
Contributor

💔 All backports failed

Status Branch Result
8.13 Backport failed because of merge conflicts

Manual backport

To create the backport manually run:

node scripts/backport --pr 176808

Questions ?

Please refer to the Backport tool documentation

@jillguyonnet
Copy link
Contributor Author

💚 All backports created successfully

Status Branch Result
8.13

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

jillguyonnet added a commit to jillguyonnet/kibana that referenced this pull request Mar 12, 2024
## Summary

Closes elastic#176352
Closes elastic#176399

elastic#175315 added the possibility to
configure new Fleet Server hosts in serverless, with the constraint that
the host URL must match the default URL. The API integration tests
written to test this have been flaky, due to failure retrieving the
default Fleet Server host or default Elasticsearch output from saved
objects. This PR adds retry in the tests.

Note: I have tried adding retry logic in the API handlers but kept
hitting test flakiness.

This fix has been tested using the Flaky Test Runner Pipeline, with
48/49 test runs for observability and security project types:
🟢
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5218
🟢
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5222
🟢
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5225

### Checklist

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [x] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed

(cherry picked from commit 83e64be)

# Conflicts:
#	x-pack/test_serverless/api_integration/test_suites/security/fleet/fleet.ts
jillguyonnet added a commit that referenced this pull request Mar 12, 2024
…8508)

# Backport

This will backport the following commits from `main` to `8.13`:
- [[Fleet] Add retry logic to serverless API check
(#176808)](#176808)

<!--- Backport version: 8.9.8 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

<!--BACKPORT [{"author":{"name":"Jill
Guyonnet","email":"jill.guyonnet@elastic.co"},"sourceCommit":{"committedDate":"2024-02-21T09:14:10Z","message":"[Fleet]
Add retry logic to serverless API check (#176808)\n\n##
Summary\r\n\r\nCloses
#176352
#176399
added the possibility to\r\nconfigure new Fleet Server hosts in
serverless, with the constraint that\r\nthe host URL must match the
default URL. The API integration tests\r\nwritten to test this have been
flaky, due to failure retrieving the\r\ndefault Fleet Server host or
default Elasticsearch output from saved\r\nobjects. This PR adds retry
in the tests.\r\n\r\nNote: I have tried adding retry logic in the API
handlers but kept\r\nhitting test flakiness.\r\n\r\nThis fix has been
tested using the Flaky Test Runner Pipeline, with\r\n48/49 test runs for
observability and security project
types:\r\n🟢\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5218\r\n🟢\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5222\r\n🟢\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5225\r\n\r\n###
Checklist\r\n\r\n- [x] [Unit or
functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere
updated or added to match the most common scenarios\r\n- [x] [Flaky
Test\r\nRunner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1)
was\r\nused on any tests
changed","sha":"83e64be5d6730ec1f22ade806192bf1eab54751a","branchLabelMapping":{"^v8.14.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:skip","Team:Fleet","backport:prev-minor","v8.14.0"],"number":176808,"url":"#176808
Add retry logic to serverless API check (#176808)\n\n##
Summary\r\n\r\nCloses
#176352
#176399
added the possibility to\r\nconfigure new Fleet Server hosts in
serverless, with the constraint that\r\nthe host URL must match the
default URL. The API integration tests\r\nwritten to test this have been
flaky, due to failure retrieving the\r\ndefault Fleet Server host or
default Elasticsearch output from saved\r\nobjects. This PR adds retry
in the tests.\r\n\r\nNote: I have tried adding retry logic in the API
handlers but kept\r\nhitting test flakiness.\r\n\r\nThis fix has been
tested using the Flaky Test Runner Pipeline, with\r\n48/49 test runs for
observability and security project
types:\r\n🟢\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5218\r\n🟢\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5222\r\n🟢\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5225\r\n\r\n###
Checklist\r\n\r\n- [x] [Unit or
functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere
updated or added to match the most common scenarios\r\n- [x] [Flaky
Test\r\nRunner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1)
was\r\nused on any tests
changed","sha":"83e64be5d6730ec1f22ade806192bf1eab54751a"}},"sourceBranch":"main","suggestedTargetBranches":[],"targetPullRequestStates":[{"branch":"main","label":"v8.14.0","labelRegex":"^v8.14.0$","isSourceBranch":true,"state":"MERGED","url":"#176808
Add retry logic to serverless API check (#176808)\n\n##
Summary\r\n\r\nCloses
#176352
#176399
added the possibility to\r\nconfigure new Fleet Server hosts in
serverless, with the constraint that\r\nthe host URL must match the
default URL. The API integration tests\r\nwritten to test this have been
flaky, due to failure retrieving the\r\ndefault Fleet Server host or
default Elasticsearch output from saved\r\nobjects. This PR adds retry
in the tests.\r\n\r\nNote: I have tried adding retry logic in the API
handlers but kept\r\nhitting test flakiness.\r\n\r\nThis fix has been
tested using the Flaky Test Runner Pipeline, with\r\n48/49 test runs for
observability and security project
types:\r\n🟢\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5218\r\n🟢\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5222\r\n🟢\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5225\r\n\r\n###
Checklist\r\n\r\n- [x] [Unit or
functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere
updated or added to match the most common scenarios\r\n- [x] [Flaky
Test\r\nRunner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1)
was\r\nused on any tests
changed","sha":"83e64be5d6730ec1f22ade806192bf1eab54751a"}}]}]
BACKPORT-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment