Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support performance presets in the Elasticsearch output #37259

Merged
merged 17 commits into from Dec 5, 2023

Conversation

faec
Copy link
Contributor

@faec faec commented Nov 30, 2023

Proposed commit message

Adds the performance presets described in elastic/elastic-agent#3797 to the Elasticsearch output, configurable with the preset field.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (docs in followup PR since they aren't subject to feature freeze)
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

Live config flags aren't directly exposed, but there are a few ways to confirm what's happening on a live run:

  • Add e.g. preset: balanced to the Elasticsearch output config, and confirm that the resulting log files have the message Applying performance preset 'balanced'
  • Add a preset and also a conflicting flag setting, e.g. bulk_max_size: 500, and confirm that the logs have the message Setting 'bulk_max_size' is ignored because of performance preset
  • Configure the throughput preset with a large input load, and confirm that the process uses more than 1 CPU.

Related issues

@faec faec added enhancement Team:Elastic-Agent Label for the Agent team labels Nov 30, 2023
@faec faec self-assigned this Nov 30, 2023
@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Nov 30, 2023
Copy link
Contributor

mergify bot commented Nov 30, 2023

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @faec? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

@faec faec added the backport-skip Skip notification from the automated backport with mergify label Nov 30, 2023
@elasticmachine
Copy link
Collaborator

❕ Build Aborted

There is a new build on-going so the previous on-going builds have been aborted.

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Start Time: 2023-11-30T22:22:36.349+0000

  • Duration: 14 min 34 sec

Test stats 🧪

Test Results
Failed 0
Passed 3
Skipped 0
Total 3

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@elasticmachine
Copy link
Collaborator

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Duration: 132 min 25 sec

❕ Flaky test report

No test was executed to be analysed.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@faec
Copy link
Contributor Author

faec commented Dec 1, 2023

buildkite test it

@elasticmachine
Copy link
Collaborator

💔 Tests Failed

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-12-01T22:03:38.616+0000

  • Duration: 130 min 9 sec

Test stats 🧪

Test Results
Failed 2
Passed 28181
Skipped 1967
Total 30150

Test errors 2

Expand to view the tests failures

Build&Test / libbeat-unitTest / TestApplyPresetWithConflicts – github.com/elastic/beats/v7/libbeat/outputs/elasticsearch
    Expand to view the error details

     Failed 
    

    Expand to view the stacktrace

     === RUN   TestApplyPresetWithConflicts
        config_presets_test.go:102: 
            	Error Trace:	/var/lib/jenkins/workspace/PR-37259-3-71fc3fdb-0ae5-4424-871e-366494108326/src/github.com/elastic/beats/libbeat/outputs/elasticsearch/config_presets_test.go:102
            	Error:      	Not equal: 
            	            	expected: 7
            	            	actual  : 6
            	Test:       	TestApplyPresetWithConflicts
            	Messages:   	Number of conflicts should equal number of overridden fields
    --- FAIL: TestApplyPresetWithConflicts (0.00s)
     
    

Build&Test / libbeat-goIntegTest / TestApplyPresetWithConflicts – github.com/elastic/beats/v7/libbeat/outputs/elasticsearch
    Expand to view the error details

     Failed 
    

    Expand to view the stacktrace

     === RUN   TestApplyPresetWithConflicts
        config_presets_test.go:102: 
            	Error Trace:	/var/lib/jenkins/workspace/PR-37259-3-8b14170f-2674-4f13-912f-716ab2ae4ef6/src/github.com/elastic/beats/libbeat/outputs/elasticsearch/config_presets_test.go:102
            	Error:      	Not equal: 
            	            	expected: 7
            	            	actual  : 6
            	Test:       	TestApplyPresetWithConflicts
            	Messages:   	Number of conflicts should equal number of overridden fields
    --- FAIL: TestApplyPresetWithConflicts (0.00s)
     
    

Steps errors 8

Expand to view the steps failures

libbeat-unitTest - mage build unitTest
  • Took 12 min 20 sec . View more details here
  • Description: mage build unitTest
libbeat-unitTest - mage build unitTest
  • Took 10 min 32 sec . View more details here
  • Description: mage build unitTest
libbeat-unitTest - mage build unitTest
  • Took 9 min 29 sec . View more details here
  • Description: mage build unitTest
libbeat-goIntegTest - mage goIntegTest
  • Took 15 min 49 sec . View more details here
  • Description: mage goIntegTest
libbeat-goIntegTest - mage goIntegTest
  • Took 11 min 9 sec . View more details here
  • Description: mage goIntegTest
libbeat-goIntegTest - mage goIntegTest
  • Took 11 min 9 sec . View more details here
  • Description: mage goIntegTest
Print Message
  • Took 0 min 0 sec . View more details here
  • Description: �[39;49m[INFO] Allocating a worker with the labels 'immutable && ubuntu-22'.�[0m
Error signal
  • Took 0 min 0 sec . View more details here
  • Description: Error 'hudson.AbortException: script returned exit code 1'

🐛 Flaky test report

❕ There are test failures but not known flaky tests.

Expand to view the summary

Genuine test errors 2

💔 There are test failures but not known flaky tests, most likely a genuine test failure.

  • Name: Build&Test / libbeat-unitTest / TestApplyPresetWithConflicts – github.com/elastic/beats/v7/libbeat/outputs/elasticsearch
  • Name: Build&Test / libbeat-goIntegTest / TestApplyPresetWithConflicts – github.com/elastic/beats/v7/libbeat/outputs/elasticsearch

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@faec faec marked this pull request as ready for review December 4, 2023 17:32
@faec faec requested a review from a team as a code owner December 4, 2023 17:32
@faec
Copy link
Contributor Author

faec commented Dec 4, 2023

Still adding docs, but marking ready for review since the code and tests are done

@elasticmachine
Copy link
Collaborator

❕ Build Aborted

Either there was a build timeout or someone aborted the build.

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Duration: 7 min 59 sec

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

// config overrides.
// Returns a list of the user fields that were overwritten.
func applyPreset(preset string, userConfig *config.C) ([]string, error) {
presetConfig := presetConfigs[preset]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to do presetConfig, ok := to check if the preset exists?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what the presetConfig == nil check on the next line is for, the ok check wouldn't catch anything that nil doesn't already

libbeat/outputs/elasticsearch/elasticsearch.go Outdated Show resolved Hide resolved
Copy link
Member

@cmacknz cmacknz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to log all the values used for a preset when a preset is applied? We can just log the contents of presetConfigs for example.

Going to test this manually before approving besides some logging suggestions this looks good.

libbeat/outputs/elasticsearch/elasticsearch.go Outdated Show resolved Hide resolved
@elasticmachine
Copy link
Collaborator

❕ Build Aborted

Either there was a build timeout or someone aborted the build.

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Duration: 126 min 18 sec

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@cmacknz
Copy link
Member

cmacknz commented Dec 4, 2023

Built Beats from this branch, copied them into an 8.12.0-SNAPSHOT agent package, then enrolled that agent with an 8.12.0-SNAPSHOT stateful deployment on cloud.

I then added two custom output parameters bulk_max_size: 1024 and worker: 2 and applied them. Then I added preset: throughput to the output configuration in Fleet and I see it propagate to the agent but I don't see any sign it was actually applied.

❯ sudo elastic-agent inspect
...
outputs:
  default:
    api_key: <REDACTED>
    bulk_max_size: 1024
    hosts:
    - https://68a148d3f66443eb824a0e350a781b22.eastus2.staging.azure.foundit.no:443
    preset: throughput
    type: elasticsearch
    worker: 2
...

❯ sudo elastic-agent logs -n 1000 | rg preset
❯

Do we need to make modifications in the control protocol output reload path as well?

// Set those variables regardless of the outcome of output.Reload
// this ensures that if we're on a failed output state and a new
// output configuration is sent, the Beat will gracefully exit
cm.lastOutputCfg = expected.Config
cm.lastBeatOutputCfg = reloadConfig
err = output.Reload(reloadConfig)
if err != nil {
return false, fmt.Errorf("failed to reload output: %w", err)
}

@elasticmachine
Copy link
Collaborator

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Duration: 133 min 15 sec

❕ Flaky test report

No test was executed to be analysed.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@elasticmachine
Copy link
Collaborator

❕ Build Aborted

There is a new build on-going so the previous on-going builds have been aborted.

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Start Time: 2023-12-04T20:03:00.785+0000

  • Duration: 35 min 0 sec

Test stats 🧪

Test Results
Failed 2
Passed 15757
Skipped 553
Total 16312

Test errors 2

Expand to view the tests failures

Build&Test / x-pack/functionbeat-windows-2016-windows-2016 / [empty] – TEST-go-unit.xml
  • no error details
  • Expand to view the stacktrace

     Test report file C:\Users\jenkins\workspace\PR-37259-8-0fb30da6-e1b4-4929-9289-7141ce9334e7\src\github.com\elastic\beats\build\x-pack\functionbeat\build\TEST-go-unit.xml was length 0 
    

Build&Test / libbeat-unitTest / [empty] – TEST-go-unit.xml
  • no error details
  • Expand to view the stacktrace

     Test report file /var/lib/jenkins/workspace/PR-37259-8-6a539738-9ea1-4f0a-9eb9-030bcd605081/src/github.com/elastic/beats/build/libbeat/build/TEST-go-unit.xml was length 0 
    

Steps errors 1

Expand to view the steps failures

Error signal
  • Took 0 min 0 sec . View more details here
  • Description: Error 'org.jenkinsci.plugins.workflow.steps.FlowInterruptedException'

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@elasticmachine
Copy link
Collaborator

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Duration: 133 min 5 sec

❕ Flaky test report

No test was executed to be analysed.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@faec
Copy link
Contributor Author

faec commented Dec 4, 2023

Do we need to make modifications in the control protocol output reload path as well?

Possibly, but so far I think the issue is elsewhere. I can reproduce your problem, but some expected log messages seem to be missing even when the component is definitely restarted. I'm troubleshooting now with some extra-verbose custom builds.

@faec
Copy link
Contributor Author

faec commented Dec 4, 2023

I've verified (via incredibly verbose logging) that the presets are seen and correctly updates when changed in a live policy. However, none of the log messages at "warn" level are visible, which might be what was throwing us off. Is there some reason those might be getting filtered?

@faec
Copy link
Contributor Author

faec commented Dec 5, 2023

Found the real problem: our config library is doing some sort of tricky internal namespace tracking. It doesn't affect which flags are set / applied, but it does affect our override checking: when we call FlattenedKeys to get the user's original config keys, the config object "remembers" that it lives in an elasticsearch config tree and prepends elasticsearch. to every value which means we never detect any overrides.

Not all of the config helpers work this way: rendering the full config to a string doesn't include this prefix, and when we call Merge on the user and preset configs, it still overwrites the fields correctly despite only one of them having the elasticsearch namespace. This could be a bug in the library, or just a confusingly documented feature of the namespace types... for now I'll look for a workaround that will let us handle overridden fields correctly.

@faec faec requested a review from a team as a code owner December 5, 2023 16:27
@elasticmachine
Copy link
Collaborator

❕ Build Aborted

Either there was a build timeout or someone aborted the build.

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Duration: 11 min 29 sec

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@elasticmachine
Copy link
Collaborator

❕ Build Aborted

Either there was a build timeout or someone aborted the build.

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Duration: 33 min 18 sec

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@faec
Copy link
Contributor Author

faec commented Dec 5, 2023

"I'll just include the default/reference config changes in this PR too. This is surely not hubris and I will definitely not regret doing this the day before feature freeze," I thought yesterday. But anyway now all the CI checks are failing even though my local make check says everything is up to date. Investigating.

I'm planning to leave the other docs updates for a followup PR, to avoid similar issues. However, the problem from yesterday seems fixed, verified in manual testing, and I added a unit test for the specific workaround I had to add -- this should be ready for final review, subject to making CI happy.

@elasticmachine
Copy link
Collaborator

❕ Build Aborted

Either there was a build timeout or someone aborted the build.

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Duration: 45 min 57 sec

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@elasticmachine
Copy link
Collaborator

❕ Build Aborted

Either there was a build timeout or someone aborted the build.

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Duration: 39 min 26 sec

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

Copy link
Member

@ebeahan ebeahan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Config template changes LGTM. Only nit about a deleted module file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This module removal intentional? Didn't see a mention about the removal in the commit or description, but I may have missed it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm -- not intentional on my part, but it looks like it was one of the side effects of doing a make update on the full repo. Looking closer, I see that there's an identical file already in that directory without the -xpack suffix, and that every other module in that directory has only a single config file with no -xpack suffix, so I think this behavior is correct and maybe it got missed because the CI's make check wasn't strict enough to enforce that regeneration. Good catch, thanks :-)

@elasticmachine
Copy link
Collaborator

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Duration: 162 min 52 sec

❕ Flaky test report

No test was executed to be analysed.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

Copy link
Member

@cmacknz cmacknz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested again with Fleet, works as expected. Thanks! 🚢

@faec faec merged commit 76919e0 into elastic:main Dec 5, 2023
129 checks passed
@faec faec deleted the config-presets branch December 5, 2023 19:54
Scholar-Li pushed a commit to Scholar-Li/beats that referenced this pull request Feb 5, 2024
Adds the performance presets described in elastic/elastic-agent#3797 to the Elasticsearch output, configurable with the `preset` field.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-skip Skip notification from the automated backport with mergify enhancement Team:Elastic-Agent Label for the Agent team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add presets for performance tuning to ES output configuration
5 participants