
[ci] Provide some mechanism to follow the status of a specific FTR config #131879

Open
spalger opened this issue May 9, 2022 · 4 comments
Labels
enhancement New value added to drive a business result Team:Operations Team label for Operations Team

Comments

@spalger
Contributor

spalger commented May 9, 2022

When CI was broken up manually into CI Groups, you could watch a specific CI Group that you knew included a test you were working on, and when it passed you knew your work was done. We lost that ability when we moved to dynamically allocated FTR Config Groups: configs move around between anonymous "FTR Configs #X/Y" groups, so the only option is to wait for CI to finish completely.


I have a couple of ideas for how we might address this, but I'm open to suggestions:

  1. Automatically detect FTR configs which failed (not flaky) in the previous build of a PR and run them in a separate group
  2. Allow authors to list specific configs which they want to highlight in the description of a PR and run those "configs of interest" in a separate group

These "separate groups" would really be group types that are planned and automatically split based on the expected execution time of their tests. They would often contain only a single config, but importantly they would report a unique status item to GitHub and run in a separate Buildkite job, so PR authors could watch the status of those interesting configs.
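As a minimal sketch of the planning step described above (all names, shapes, and the duration budget here are assumptions for illustration, not the actual ci-stats API), "configs of interest" could each get their own group while the rest are bin-packed by expected duration:

```typescript
// Hypothetical sketch: give each "config of interest" its own group so it
// reports a distinct status, then batch the remaining configs by expected
// execution time. Field and function names are illustrative assumptions.
interface FtrConfig {
  path: string;
  expectedDurationMin: number; // e.g. derived from ci-stats historical data
}

function planGroups(
  configs: FtrConfig[],
  configsOfInterest: Set<string>,
  maxGroupDurationMin = 30,
): FtrConfig[][] {
  const interesting = configs.filter((c) => configsOfInterest.has(c.path));
  const rest = configs.filter((c) => !configsOfInterest.has(c.path));

  // Each interesting config runs alone, producing its own GitHub status.
  const groups: FtrConfig[][] = interesting.map((c) => [c]);

  // Greedily pack the remaining configs into duration-bounded groups.
  let current: FtrConfig[] = [];
  let currentDuration = 0;
  for (const config of rest) {
    if (current.length && currentDuration + config.expectedDurationMin > maxGroupDurationMin) {
      groups.push(current);
      current = [];
      currentDuration = 0;
    }
    current.push(config);
    currentDuration += config.expectedDurationMin;
  }
  if (current.length) groups.push(current);
  return groups;
}
```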

We should be able to implement just about all of this logic in the ci-stats API, but we'll need to update the kibana-buildkite-library to upload the right pipeline based on the results.

Thoughts?

@spalger added the Team:Operations and enhancement labels May 9, 2022
@elasticmachine
Contributor

Pinging @elastic/kibana-operations (Team:Operations)

@mattkime
Contributor

Broadly speaking, working on APIs differs from working on Kibana app functionality: in one case you know which suites you're interested in, and in the other you really have no idea what might break.

Placing previously failed test runs in a separate group could certainly give faster feedback.

Suggestion: use a GitHub check for each FTR config. This would be more granular than the suites we had before, but also more meaningful.

@spalger
Contributor Author

spalger commented May 10, 2022

Discussed with @mattkime and @brianseeders today. We're going to try running any FTR config that is expected to execute for over 2-3 minutes in its own worker, and all the remaining configs in small FTR config groups (mostly FTR configs where all tests are skipped). The hope is to reach a compromise where logs are as accessible as possible, CI can continue to scale while reducing costs, and users have a better experience because statuses will mostly be assigned to specific FTR configs and links will take you directly to the log output of that config.
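The split discussed above could be sketched roughly like this (the threshold, batch size, and all names are assumptions for illustration, not the actual implementation):

```typescript
// Hypothetical sketch of the discussed approach: configs expected to run
// longer than the threshold each get a dedicated worker; short configs
// (often ones where all tests are skipped) are batched into small groups.
interface ConfigEstimate {
  path: string;
  expectedMin: number; // expected execution time in minutes
}

function assignWorkers(
  configs: ConfigEstimate[],
  thresholdMin = 3, // assumed cutoff, per the 2-3 minute discussion
  batchSize = 5, // assumed batch size for the small-config groups
): { dedicated: ConfigEstimate[]; batches: ConfigEstimate[][] } {
  const dedicated = configs.filter((c) => c.expectedMin > thresholdMin);
  const small = configs.filter((c) => c.expectedMin <= thresholdMin);

  const batches: ConfigEstimate[][] = [];
  for (let i = 0; i < small.length; i += batchSize) {
    batches.push(small.slice(i, i + batchSize));
  }
  return { dedicated, batches };
}
```

Each entry in `dedicated` would map to its own Buildkite job and GitHub status, so its log link points directly at that config's output.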

@pheyos
Member

pheyos commented May 11, 2022

I think that besides watching previously failed tests, another aspect is new tests added as part of a PR, where the author has a particular interest in seeing their successful execution and maybe also their execution time. I like the idea of running many of the configs in separate workers, which makes it possible to follow the test groups more closely.
