Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: 'elasticsearchCaptureBodyUrls' config option #2873

Merged
merged 10 commits into from
Aug 18, 2022

Conversation

trentm
Copy link
Member

@trentm trentm commented Aug 12, 2022

This adds a new config option, per the
'elasticsearch_capture_body_urls' cross-agent spec, to control for which
ES client requests the request body is captured. By default it is just
the body of search-related APIs. The use case for configuring this is
to capture for all requests for reply in Kibana performance work. This
option is not expected to be used often.

Closes: #2880
Closes: #2019
Refs: elastic/apm#665
Refs: elastic/apm#670

checklist

This adds a new config option, per the
'elasticsearch_capture_body_urls' cross-agent spec, to control for which
ES client requests the request body is captured. By default it is just
the body of search-related APIs. The use case for configuring this is
to capture for *all* requests for reply in Kibana performance work. This
option is not expected to be used often.
@trentm trentm self-assigned this Aug 12, 2022
@github-actions github-actions bot added the agent-nodejs Make available for APM Agents project planning. label Aug 12, 2022
@apmmachine
Copy link
Contributor

apmmachine commented Aug 12, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-08-12T23:15:21.852+0000

  • Duration: 34 min 28 sec

Test stats 🧪

Test Results
Failed 0
Passed 260042
Skipped 0
Total 260042

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • run module tests for <modules> : Run TAV tests for one or more modules, where <modules> can be either a comma separated list of modules (e.g. memcached,redis) or the string literal ALL to test all modules

  • run benchmark tests : Run the benchmark test only.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@trentm
Copy link
Member Author

trentm commented Aug 12, 2022

run module tests for elasticsearch,@elastic/elasticsearch

@trentm trentm marked this pull request as ready for review August 13, 2022 00:06
@trentm trentm requested a review from astorm August 13, 2022 00:07
@trentm
Copy link
Member Author

trentm commented Aug 15, 2022

some premature optimization perf notes

I digressed slightly to sanity check that the change in ES URL path checking here (from a custom pathIsAQuery regexp, to a WildcardMatcher-created array of regexps) will not be a performance problem.

tl;dr: The WildcardMatcher-based comparison is much slower. However, it is on the order of 1 microsecond per path check on my MacBook Pro (2.4 GHz 8-Core Intel Core i9). Given that ES queries are remote calls and on the order of milliseconds, it would generally be premature optimization to bother getting fancy to improve performance of this.

details

Here is my benchmark script: https://gist.github.com/trentm/ba426e5a08317b18d40e897aeb536c0c

And a run:

% node bench-shouldCaptureBody.js
WilcardMatcher array:
	[/^.*\/_search$/i, /^.*\/_search\/template$/i, /^.*\/_msearch$/i, /^.*\/_msearch\/template$/i, /^.*\/_async_search$/i, /^.*\/_count$/i, /^.*\/_sql$/i, /^.*\/_eql\/search$/i]
Current pathIsAQuery RegExp: /\/(_search|_msearch|_count|_async_search|_sql|_eql)(\/|$)/
Naive WildcardMatcher in a single RegExp:
	/^(.*\/_search|.*\/_search\/template|.*\/_msearch|.*\/_msearch\/template|.*\/_async_search|.*\/_count|.*\/_sql|.*\/_eql\/search)$/i
customEquivReIgnoreCase:
	/\/(_search|_search\/template|_msearch|_msearch\/template|_async_search|_count|_sql|_eql\/search)$/i
customEquivRe:
	/\/(_search|_search\/template|_msearch|_msearch\/template|_async_search|_count|_sql|_eql\/search)$/

WildcardMatcher array x 719,208 ops/sec ±0.99% (90 runs sampled)
Current pathIsAQuery RegExp x 15,700,534 ops/sec ±0.97% (90 runs sampled)
Naive WildcardMatcher in a single RegExp x 856,094 ops/sec ±1.10% (88 runs sampled)
customEquivReIgnoreCase x 17,477,871 ops/sec ±1.37% (88 runs sampled)
customEquivRe x 17,942,769 ops/sec ±0.91% (88 runs sampled)
Fastest is customEquivRe
  1. The WildcardMatcher-based array of regexps that this PR changes to is ~20x slower than the current pathIsAQuery RegExp.
  2. A naive potential optimization might be to make a WildcardMatcher array config into a single regexp -- ^(pattern1|pattern2|...)$. This turns out to be not much faster, in this case.
  3. The major perf difference is that the (a) the WildcardMatcher necessarily always has a ^ anchor and (b) every pattern is leading with * to avoid that anchor. A possible optimization of WildcardMatcher usage would be to drop the ^ anchor and the leading * if and only if every pattern had that leading *. An equivalent optimization could be done for the '$' anchor and a trailing * on every pattern.
  4. There is a small perf win with dropping the case-insensitive flag. Arguably the default elasticsearch_capture_body_urls values could/should be case-sensitive. I don't believe the Elasticsearch REST API would accept POST /my-index/_SeArCh.

However, this is all fast enough, so I'm not proposing doing any WildcardMatcher perf improvements right now.

@trentm
Copy link
Member Author

trentm commented Aug 17, 2022

@astorm the spec for this will get merged tomorrow, so feel free to review this when you get a chance. Thanks.

@@ -18,6 +18,8 @@ const apm = require('../').start({ // elastic-apm-node
logUncaughtExceptions: true
})

// Note that version *8* is installed by default. To use v7 you'll need to:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Might be better to say version 7 is not installed by default instead of Note that version 8 is installed by default, since the later will inevitably be not true one day but the former will almost certainly be true into the future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me, thanks.

Note, if interested: For a while I'd had both v7 and v8 installed using an alias for v7 -- "@elastic/elasticsearch7": "npm:@elastic/elasticsearch@7.x". However the default npm in node v8 doesn't support that alias syntax, so I dropped it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commit f8631c0

@@ -249,6 +249,7 @@ declare namespace apm {
contextPropagationOnly?: boolean;
disableInstrumentations?: string | string[];
disableSend?: boolean;
elasticsearchCaptureBodyUrls?: Array<string>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No objection, but why Array<string> instead of string[]?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly just copying syntax from elsewhere in this file. https://www.typescriptlang.org/docs/handbook/2/everyday-types.html#arrays says both are fine and doesn't claim a preference, so I'm fine with either way. Both are used in index.d.ts.

@@ -10,64 +10,50 @@
// - elasticsearch - the legacy Elasticsearch JS client
// - @elastic/elasticsearch - the new Elasticsearch JS client

const querystring = require('querystring')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 dings the "two require('querystring') references left before we can drop another dep." bell 🛎️

@trentm trentm merged commit ee08f45 into main Aug 18, 2022
@trentm trentm deleted the trentm/elasticsearch_capture_body_urls branch August 18, 2022 23:54
@apmmachine
Copy link
Contributor

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-08-18T23:49:18.790+0000

  • Duration: 21 min 41 sec

Test stats 🧪

Test Results
Failed 0
Passed 260042
Skipped 0
Total 260042

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • run module tests for <modules> : Run TAV tests for one or more modules, where <modules> can be either a comma separated list of modules (e.g. memcached,redis) or the string literal ALL to test all modules

  • run benchmark tests : Run the benchmark test only.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@trentm
Copy link
Member Author

trentm commented Oct 19, 2022

elastic-apm-node@3.39.0 is release with this now

fpm-peter pushed a commit to fpm-git/apm-agent-nodejs that referenced this pull request Aug 20, 2024
This adds a new config option, per the
'elasticsearch_capture_body_urls' cross-agent spec, to control for which
ES client requests the request body is captured. By default it is just
the body of search-related APIs. The use case for configuring this is
to capture for *all* requests for reply in Kibana performance work. This
option is not expected to be used often.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
agent-nodejs Make available for APM Agents project planning. performance
Projects
None yet
3 participants