[data.search] Expose wait_for_completion_timeout for clients and tests to use #107241

FrankHassanabad · 2021-07-29T21:09:45Z

Describe the feature:
The search strategies do not expose the wait_for_completion_timeout that can be set in REST by clients and instead gives everyone a global default of 100ms it looks like here:

We would like for this to be REST configurable as an option for the strategies mostly for end to end tests to where we can set this value higher as this causes us to either write utility functions or to handle when things come back async vs. sync, versus us just sending down this setting to allow us to wait longer and avoid flake.

Recent flakes on CI seen when queries look to be going from sync to async:
#94535

The text was updated successfully, but these errors were encountered:

elasticmachine · 2021-07-29T21:09:47Z

Pinging @elastic/kibana-app-services (Team:AppServices)

…flake, boilerplate, and technique choices (#116211) ## Summary Fixes flake tests of: #115918 #103273 #108640 #109447 #100630 #94535 #104260 Security solution has been using `bsearch` and has encountered flake in various forms. Different developers have been fixing the flake in a few odd ways (myself included) which aren't 100%. This PR introduces a once-in-for-all REST API retry service called `bsearch` which will query `bsearch` and if `bsearch` is not completed because of async occurring due to slower CI runtimes it will continuously call into the `bsearch` with the correct API to ensure it gets a complete response before returning. ## Usage Anyone can use this service like so: ```ts const bsearch = getService('bsearch'); const response = await bsearch.send<MyType>({ supertest, options: { defaultIndex: ['large_volume_dns_data'], } strategy: 'securitySolutionSearchStrategy', }); ``` If you're using a custom auth then you can set that beforehand like so: ```ts const bsearch = getService('bsearch'); const supertestWithoutAuth = getService('supertestWithoutAuth'); const supertest supertestWithoutAuth.auth(username, password); const response = await bsearch.send<MyType>({ supertest, options: { defaultIndex: ['large_volume_dns_data'], } strategy: 'securitySolutionSearchStrategy', }); ``` ## Misconceptions in the tests leading to flake * Can you just call the bsearch REST API and it will always return data first time? Not always true, as when CI slows down or data increases `bsearch` will give you back an async reference and then your test will blow up. * Can we wrap the REST API in `retry` to fix the flake? Not always but mostly true, as when CI slows down or data increases `bsearch` could return the async version continuously which could then fail your test. It's also tedious to tell everyone in code reviews to wrap everything in `retry` instead of just fixing it with a service as well as inform new people why we are constantly wrapping these tests in `retry`. * Can we manually parse the `bsearch` if it has `async` for each test? This is true but is error prone and I did this for one test and it's ugly and I had issues as I have to wrap 2 things in `retry` and test several conditions. Also it's harder for people to read the tests rather than just reading there is a service call. Also people in code reviews missed where I had bugs with it. Also lots of boiler plate. * Can we just increase the timeout with `wait_for_completion_timeout` and the tests will pass for sure then? Not true today but maybe true later, as this hasn't been added as plumbing yet. See this [open ticket](#107241). Even if it is and we increase the timeout to a very large number bsearch might return with an `async` or you might want to test the `async` path. Either way, if/when we add the ability we can increase it within 1 spot which is this service for everyone rather than going to each individual test to add it. If/when it's added if people don't use the bsearch service we can remove it later if we find this is deterministic enough and no one wants to test bsearch features with their strategies down the road. ## Manual test of bsearch service If you want to manually watch the bsearch operate as if the CI system is running slow or to cause an `async` manually you manually modify this setting here: https://github.com/elastic/kibana/blob/master/src/plugins/data/server/search/strategies/ese_search/request_utils.ts#L61 To be of a lower number such as `1ms` and then you will see it enter the `async` code within `bsearch` consistently ## Reference PRs We cannot set the wait_for_complete just yet #107241 so we decided this was the best way to reduce flake for testing for now. ### Checklist - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios

…flake, boilerplate, and technique choices (elastic#116211) ## Summary Fixes flake tests of: elastic#115918 elastic#103273 elastic#108640 elastic#109447 elastic#100630 elastic#94535 elastic#104260 Security solution has been using `bsearch` and has encountered flake in various forms. Different developers have been fixing the flake in a few odd ways (myself included) which aren't 100%. This PR introduces a once-in-for-all REST API retry service called `bsearch` which will query `bsearch` and if `bsearch` is not completed because of async occurring due to slower CI runtimes it will continuously call into the `bsearch` with the correct API to ensure it gets a complete response before returning. ## Usage Anyone can use this service like so: ```ts const bsearch = getService('bsearch'); const response = await bsearch.send<MyType>({ supertest, options: { defaultIndex: ['large_volume_dns_data'], } strategy: 'securitySolutionSearchStrategy', }); ``` If you're using a custom auth then you can set that beforehand like so: ```ts const bsearch = getService('bsearch'); const supertestWithoutAuth = getService('supertestWithoutAuth'); const supertest supertestWithoutAuth.auth(username, password); const response = await bsearch.send<MyType>({ supertest, options: { defaultIndex: ['large_volume_dns_data'], } strategy: 'securitySolutionSearchStrategy', }); ``` ## Misconceptions in the tests leading to flake * Can you just call the bsearch REST API and it will always return data first time? Not always true, as when CI slows down or data increases `bsearch` will give you back an async reference and then your test will blow up. * Can we wrap the REST API in `retry` to fix the flake? Not always but mostly true, as when CI slows down or data increases `bsearch` could return the async version continuously which could then fail your test. It's also tedious to tell everyone in code reviews to wrap everything in `retry` instead of just fixing it with a service as well as inform new people why we are constantly wrapping these tests in `retry`. * Can we manually parse the `bsearch` if it has `async` for each test? This is true but is error prone and I did this for one test and it's ugly and I had issues as I have to wrap 2 things in `retry` and test several conditions. Also it's harder for people to read the tests rather than just reading there is a service call. Also people in code reviews missed where I had bugs with it. Also lots of boiler plate. * Can we just increase the timeout with `wait_for_completion_timeout` and the tests will pass for sure then? Not true today but maybe true later, as this hasn't been added as plumbing yet. See this [open ticket](elastic#107241). Even if it is and we increase the timeout to a very large number bsearch might return with an `async` or you might want to test the `async` path. Either way, if/when we add the ability we can increase it within 1 spot which is this service for everyone rather than going to each individual test to add it. If/when it's added if people don't use the bsearch service we can remove it later if we find this is deterministic enough and no one wants to test bsearch features with their strategies down the road. ## Manual test of bsearch service If you want to manually watch the bsearch operate as if the CI system is running slow or to cause an `async` manually you manually modify this setting here: https://github.com/elastic/kibana/blob/master/src/plugins/data/server/search/strategies/ese_search/request_utils.ts#L61 To be of a lower number such as `1ms` and then you will see it enter the `async` code within `bsearch` consistently ## Reference PRs We cannot set the wait_for_complete just yet elastic#107241 so we decided this was the best way to reduce flake for testing for now. ### Checklist - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios # Conflicts: # x-pack/test/api_integration/apis/security_solution/hosts.ts

…flake, boilerplate, and technique choices (elastic#116211) ## Summary Fixes flake tests of: elastic#115918 elastic#103273 elastic#108640 elastic#109447 elastic#100630 elastic#94535 elastic#104260 Security solution has been using `bsearch` and has encountered flake in various forms. Different developers have been fixing the flake in a few odd ways (myself included) which aren't 100%. This PR introduces a once-in-for-all REST API retry service called `bsearch` which will query `bsearch` and if `bsearch` is not completed because of async occurring due to slower CI runtimes it will continuously call into the `bsearch` with the correct API to ensure it gets a complete response before returning. ## Usage Anyone can use this service like so: ```ts const bsearch = getService('bsearch'); const response = await bsearch.send<MyType>({ supertest, options: { defaultIndex: ['large_volume_dns_data'], } strategy: 'securitySolutionSearchStrategy', }); ``` If you're using a custom auth then you can set that beforehand like so: ```ts const bsearch = getService('bsearch'); const supertestWithoutAuth = getService('supertestWithoutAuth'); const supertest supertestWithoutAuth.auth(username, password); const response = await bsearch.send<MyType>({ supertest, options: { defaultIndex: ['large_volume_dns_data'], } strategy: 'securitySolutionSearchStrategy', }); ``` ## Misconceptions in the tests leading to flake * Can you just call the bsearch REST API and it will always return data first time? Not always true, as when CI slows down or data increases `bsearch` will give you back an async reference and then your test will blow up. * Can we wrap the REST API in `retry` to fix the flake? Not always but mostly true, as when CI slows down or data increases `bsearch` could return the async version continuously which could then fail your test. It's also tedious to tell everyone in code reviews to wrap everything in `retry` instead of just fixing it with a service as well as inform new people why we are constantly wrapping these tests in `retry`. * Can we manually parse the `bsearch` if it has `async` for each test? This is true but is error prone and I did this for one test and it's ugly and I had issues as I have to wrap 2 things in `retry` and test several conditions. Also it's harder for people to read the tests rather than just reading there is a service call. Also people in code reviews missed where I had bugs with it. Also lots of boiler plate. * Can we just increase the timeout with `wait_for_completion_timeout` and the tests will pass for sure then? Not true today but maybe true later, as this hasn't been added as plumbing yet. See this [open ticket](elastic#107241). Even if it is and we increase the timeout to a very large number bsearch might return with an `async` or you might want to test the `async` path. Either way, if/when we add the ability we can increase it within 1 spot which is this service for everyone rather than going to each individual test to add it. If/when it's added if people don't use the bsearch service we can remove it later if we find this is deterministic enough and no one wants to test bsearch features with their strategies down the road. ## Manual test of bsearch service If you want to manually watch the bsearch operate as if the CI system is running slow or to cause an `async` manually you manually modify this setting here: https://github.com/elastic/kibana/blob/master/src/plugins/data/server/search/strategies/ese_search/request_utils.ts#L61 To be of a lower number such as `1ms` and then you will see it enter the `async` code within `bsearch` consistently ## Reference PRs We cannot set the wait_for_complete just yet elastic#107241 so we decided this was the best way to reduce flake for testing for now. ### Checklist - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios

…flake, boilerplate, and technique choices (#116211) (#116500) ## Summary Fixes flake tests of: #115918 #103273 #108640 #109447 #100630 #94535 #104260 Security solution has been using `bsearch` and has encountered flake in various forms. Different developers have been fixing the flake in a few odd ways (myself included) which aren't 100%. This PR introduces a once-in-for-all REST API retry service called `bsearch` which will query `bsearch` and if `bsearch` is not completed because of async occurring due to slower CI runtimes it will continuously call into the `bsearch` with the correct API to ensure it gets a complete response before returning. ## Usage Anyone can use this service like so: ```ts const bsearch = getService('bsearch'); const response = await bsearch.send<MyType>({ supertest, options: { defaultIndex: ['large_volume_dns_data'], } strategy: 'securitySolutionSearchStrategy', }); ``` If you're using a custom auth then you can set that beforehand like so: ```ts const bsearch = getService('bsearch'); const supertestWithoutAuth = getService('supertestWithoutAuth'); const supertest supertestWithoutAuth.auth(username, password); const response = await bsearch.send<MyType>({ supertest, options: { defaultIndex: ['large_volume_dns_data'], } strategy: 'securitySolutionSearchStrategy', }); ``` ## Misconceptions in the tests leading to flake * Can you just call the bsearch REST API and it will always return data first time? Not always true, as when CI slows down or data increases `bsearch` will give you back an async reference and then your test will blow up. * Can we wrap the REST API in `retry` to fix the flake? Not always but mostly true, as when CI slows down or data increases `bsearch` could return the async version continuously which could then fail your test. It's also tedious to tell everyone in code reviews to wrap everything in `retry` instead of just fixing it with a service as well as inform new people why we are constantly wrapping these tests in `retry`. * Can we manually parse the `bsearch` if it has `async` for each test? This is true but is error prone and I did this for one test and it's ugly and I had issues as I have to wrap 2 things in `retry` and test several conditions. Also it's harder for people to read the tests rather than just reading there is a service call. Also people in code reviews missed where I had bugs with it. Also lots of boiler plate. * Can we just increase the timeout with `wait_for_completion_timeout` and the tests will pass for sure then? Not true today but maybe true later, as this hasn't been added as plumbing yet. See this [open ticket](#107241). Even if it is and we increase the timeout to a very large number bsearch might return with an `async` or you might want to test the `async` path. Either way, if/when we add the ability we can increase it within 1 spot which is this service for everyone rather than going to each individual test to add it. If/when it's added if people don't use the bsearch service we can remove it later if we find this is deterministic enough and no one wants to test bsearch features with their strategies down the road. ## Manual test of bsearch service If you want to manually watch the bsearch operate as if the CI system is running slow or to cause an `async` manually you manually modify this setting here: https://github.com/elastic/kibana/blob/master/src/plugins/data/server/search/strategies/ese_search/request_utils.ts#L61 To be of a lower number such as `1ms` and then you will see it enter the `async` code within `bsearch` consistently ## Reference PRs We cannot set the wait_for_complete just yet #107241 so we decided this was the best way to reduce flake for testing for now. ### Checklist - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios # Conflicts: # x-pack/test/api_integration/apis/security_solution/hosts.ts

…flake, boilerplate, and technique choices (#116211) (#116514) ## Summary Fixes flake tests of: #115918 #103273 #108640 #109447 #100630 #94535 #104260 Security solution has been using `bsearch` and has encountered flake in various forms. Different developers have been fixing the flake in a few odd ways (myself included) which aren't 100%. This PR introduces a once-in-for-all REST API retry service called `bsearch` which will query `bsearch` and if `bsearch` is not completed because of async occurring due to slower CI runtimes it will continuously call into the `bsearch` with the correct API to ensure it gets a complete response before returning. ## Usage Anyone can use this service like so: ```ts const bsearch = getService('bsearch'); const response = await bsearch.send<MyType>({ supertest, options: { defaultIndex: ['large_volume_dns_data'], } strategy: 'securitySolutionSearchStrategy', }); ``` If you're using a custom auth then you can set that beforehand like so: ```ts const bsearch = getService('bsearch'); const supertestWithoutAuth = getService('supertestWithoutAuth'); const supertest supertestWithoutAuth.auth(username, password); const response = await bsearch.send<MyType>({ supertest, options: { defaultIndex: ['large_volume_dns_data'], } strategy: 'securitySolutionSearchStrategy', }); ``` ## Misconceptions in the tests leading to flake * Can you just call the bsearch REST API and it will always return data first time? Not always true, as when CI slows down or data increases `bsearch` will give you back an async reference and then your test will blow up. * Can we wrap the REST API in `retry` to fix the flake? Not always but mostly true, as when CI slows down or data increases `bsearch` could return the async version continuously which could then fail your test. It's also tedious to tell everyone in code reviews to wrap everything in `retry` instead of just fixing it with a service as well as inform new people why we are constantly wrapping these tests in `retry`. * Can we manually parse the `bsearch` if it has `async` for each test? This is true but is error prone and I did this for one test and it's ugly and I had issues as I have to wrap 2 things in `retry` and test several conditions. Also it's harder for people to read the tests rather than just reading there is a service call. Also people in code reviews missed where I had bugs with it. Also lots of boiler plate. * Can we just increase the timeout with `wait_for_completion_timeout` and the tests will pass for sure then? Not true today but maybe true later, as this hasn't been added as plumbing yet. See this [open ticket](#107241). Even if it is and we increase the timeout to a very large number bsearch might return with an `async` or you might want to test the `async` path. Either way, if/when we add the ability we can increase it within 1 spot which is this service for everyone rather than going to each individual test to add it. If/when it's added if people don't use the bsearch service we can remove it later if we find this is deterministic enough and no one wants to test bsearch features with their strategies down the road. ## Manual test of bsearch service If you want to manually watch the bsearch operate as if the CI system is running slow or to cause an `async` manually you manually modify this setting here: https://github.com/elastic/kibana/blob/master/src/plugins/data/server/search/strategies/ese_search/request_utils.ts#L61 To be of a lower number such as `1ms` and then you will see it enter the `async` code within `bsearch` consistently ## Reference PRs We cannot set the wait_for_complete just yet #107241 so we decided this was the best way to reduce flake for testing for now. ### Checklist - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>

lukasolson · 2022-05-09T16:53:18Z

I just did a little test locally and I'm not sure when this was fixed, but you can definitely override this value when calling data.search both from the client & server now.

FrankHassanabad added Feature:Search Querying infrastructure in Kibana Team:AppServices Project:AsyncSearch Background search, partial results, async search services. labels Jul 29, 2021

exalate-issue-sync bot added impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. loe:small Small Level of Effort labels Jul 29, 2021

FrankHassanabad mentioned this issue Oct 25, 2021

[Security Solutions] Adds bsearch service to FTR e2e tests to reduce flake, boilerplate, and technique choices #116211

Merged

1 task

lukasolson added the NeededFor:Security label Dec 14, 2021

lukasolson closed this as completed May 9, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[data.search] Expose wait_for_completion_timeout for clients and tests to use #107241

[data.search] Expose wait_for_completion_timeout for clients and tests to use #107241

FrankHassanabad commented Jul 29, 2021 •

edited

Loading

elasticmachine commented Jul 29, 2021

lukasolson commented May 9, 2022

[data.search] Expose wait_for_completion_timeout for clients and tests to use #107241

[data.search] Expose wait_for_completion_timeout for clients and tests to use #107241

Comments

FrankHassanabad commented Jul 29, 2021 • edited Loading

elasticmachine commented Jul 29, 2021

lukasolson commented May 9, 2022

FrankHassanabad commented Jul 29, 2021 •

edited

Loading