Reduces field capabilities event loop block times by scaling linearly using hashes #75718
Merged: FrankHassanabad merged 3 commits into elastic:master from FrankHassanabad:node-block-fix-caps on Aug 24, 2020
Conversation
FrankHassanabad changed the title from "Fixes field_capabilities to not block the NodeJS event loop by scaling linearly using hashes in two areas" to "Fixes field_capabilities reduce NodeJS event block times by scaling linearly using hashes" on Aug 22, 2020

FrankHassanabad changed the title from "Fixes field_capabilities reduce NodeJS event block times by scaling linearly using hashes" to "Reduces field capabilities event loop block times by scaling linearly using hashes" on Aug 22, 2020
Pinging @elastic/siem (Team:SIEM)
💚 Build Succeeded
mattkime approved these changes on Aug 24, 2020
Niiiiiiice!
FrankHassanabad added a commit to FrankHassanabad/kibana that referenced this pull request on Aug 24, 2020:
Reduces field capabilities event loop block times by scaling linearly using hashes (elastic#75718)
FrankHassanabad added a commit to FrankHassanabad/kibana that referenced this pull request on Aug 24, 2020:
Reduces field capabilities event loop block times by scaling linearly using hashes (elastic#75718)
FrankHassanabad added a commit that referenced this pull request on Aug 24, 2020:
Reduces field capabilities event loop block times by scaling linearly using hashes (#75718) (#75805)
FrankHassanabad added a commit that referenced this pull request on Aug 24, 2020:
Reduces field capabilities event loop block times by scaling linearly using hashes (#75718) (#75804)
parkiino pushed a commit to parkiino/kibana that referenced this pull request on Sep 1, 2020:
Reduces field capabilities event loop block times by scaling linearly using hashes (elastic#75718)
MindyRS added the Team: SecuritySolution label on Sep 23, 2021
Pinging @elastic/security-solution (Team: SecuritySolution)
Labels: release_note:fix, Team: SecuritySolution, Team:SIEM, v7.9.1, v7.10.0, v8.0.0
## Summary

On the `security_solution` project we have been getting reports from different customers of scaling issues and excessive NodeJS event loop blocking times. After in-depth tracing through some of the index and field capabilities calls, we identified two "hot paths" running through `field_capabilities` where it uses double-looped arrays rather than hashes. By switching these two hot spots to hashes, we are now able to reduce the event loop block times by an order of magnitude.
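To see why the double loop hurts, here is a minimal, self-contained micro-benchmark sketch (illustrative only, not the Kibana code): deduplicating with an array scan re-walks the accumulator for every element, roughly O(m × n), while a hash lookup costs constant time per element, roughly O(n).

```ts
// Hypothetical micro-benchmark: array-scan dedup vs. hash dedup.
const names: string[] = [];
for (let i = 0; i < 20000; i++) {
  names.push(`field_${i % 15000}`); // include some duplicates
}

console.time('array-scan dedup'); // re-scans the accumulator on every element
const viaArray = names.reduce<string[]>(
  (acc, value) => (acc.includes(value) ? acc : acc.concat(value)),
  []
);
console.timeEnd('array-scan dedup');

console.time('hash dedup'); // one constant-time lookup per element
const hash: Record<string, string> = {};
const viaHash: string[] = [];
for (const value of names) {
  if (hash[value] == null) {
    hash[value] = value;
    viaHash.push(value);
  }
}
console.timeEnd('hash dedup');

console.log(viaArray.length === viaHash.length); // true: same result, different cost
```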
Before this PR you can see event loop block times as high as:

```ts
field_cap: 575.131ms
```

And after this PR you will see event loop block times drop by an order of magnitude to:

```ts
field_cap: 31.783ms
```
when you're calling into indexes as large as `filebeat-*`. This number can be higher if you're concatenating several large indexes together and trying to get capabilities from each one all at once. We already call `getFieldCapabilities` with only one index at a time to spread out event loop block times.

The fix is to use a hash in two key areas within these two files:

```
src/plugins/data/server/index_patterns/fetcher/lib/field_capabilities/field_capabilities.ts
src/plugins/data/server/index_patterns/fetcher/lib/field_capabilities/field_caps_response.ts
```
This effect happens during the `SourceQuery`/`IndexFields` query within `security_solution`, but you should be able to trigger it with any application code that calls into those code paths with large index sizes such as `filebeat-*`, anywhere in Kibana.

## An explanation of how to see the block times for before and after
Add `console.time('field_cap');` and `console.timeEnd('field_cap');` around the synchronous code to test the optimizations before and after.

For example, around line 45 with the original code:

```ts
const esFieldCaps: FieldCapsResponse = await callFieldCapsApi(callCluster, indices);
console.time('field_cap'); // <--- start timer
const fieldsFromFieldCapsByName = keyBy(readFieldCapsResponse(esFieldCaps), 'name');
const allFieldsUnsorted = Object.keys(fieldsFromFieldCapsByName)
  .filter((name) => !name.startsWith('_'))
  .concat(metaFields)
  .reduce(concatIfUniq, [] as string[])
  .map<FieldDescriptor>((name) =>
    defaults({}, fieldsFromFieldCapsByName[name], {
      name,
      type: 'string',
      searchable: false,
      aggregatable: false,
      readFromDocValues: false,
    })
  )
  .map(mergeOverrides);
const sorted = sortBy(allFieldsUnsorted, 'name');
console.timeEnd('field_cap'); // <--- outputs the end timer
return sorted;
```
And around line 45 with this pull request:

```ts
const esFieldCaps: FieldCapsResponse = await callFieldCapsApi(callCluster, indices);
console.time('field_cap'); // <--- start timer
const fieldsFromFieldCapsByName = keyBy(readFieldCapsResponse(esFieldCaps), 'name');
const allFieldsUnsorted = Object.keys(fieldsFromFieldCapsByName)
  .filter((name) => !name.startsWith('_'))
  .concat(metaFields)
  .reduce<{ names: string[]; hash: Record<string, string> }>(
    (agg, value) => {
      // This is intentionally using a "hash" and a "push" to be highly optimized with very large indexes
      if (agg.hash[value] != null) {
        return agg;
      } else {
        agg.hash[value] = value;
        agg.names.push(value);
        return agg;
      }
    },
    { names: [], hash: {} }
  )
  .names.map<FieldDescriptor>((name) =>
    defaults({}, fieldsFromFieldCapsByName[name], {
      name,
      type: 'string',
      searchable: false,
      aggregatable: false,
      readFromDocValues: false,
    })
  )
  .map(mergeOverrides);
const sorted = sortBy(allFieldsUnsorted, 'name');
console.timeEnd('field_cap'); // <--- outputs the end timer
return sorted;
```
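For comparison, the same dedup could be sketched with a built-in `Set` (a hypothetical alternative, not the code this PR ships; it reuses `fieldsFromFieldCapsByName` and `metaFields` from the snippet above):

```ts
// Hypothetical alternative: a Set also gives amortized O(1) lookups and
// preserves insertion order, so the result matches the hash + push approach.
const uniqueNames = Array.from(
  new Set(
    Object.keys(fieldsFromFieldCapsByName)
      .filter((name) => !name.startsWith('_'))
      .concat(metaFields)
  )
);
```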
Then reload the Security Solutions application web page, or generically anything that is going to call the filebeat-* index or another large index; you could also concatenate several indexes together to test the performance. For Security Solutions we can just visit any page that uses a filebeat-* index, such as this one:

```
http://localhost:5601/app/security/timelines/
```
Be sure to load it _twice_ for testing, as NodeJS will sometimes report better numbers the second time: it performs optimizations after the first time it encounters some code paths.

You should begin to see numbers similar to this before the fix:

```ts
field_cap: 575.131ms
```
This indicates that it is blocking the event loop for around half a second before this fix. If an application adds additional indexes on top of `filebeat`, or if it tries to execute other code after this (which we do in Security Solutions), then the block times will climb even higher.

However, after this fix the m × n double loops are changed to use hashing, so the time only climbs by some constant × n, where n is the number of fields. For `filebeat-*` it should stay very low, around:

```ts
field_cap: 31.783ms
```
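Beyond `console.time`, Node's built-in `perf_hooks` can observe event loop delay directly. A minimal sketch (an editor-added illustration, not part of this PR; the `resolution` value and the simulated 300ms of blocking work are arbitrary choices):

```ts
import { monitorEventLoopDelay } from 'perf_hooks';

// Sample event loop delay while some synchronous work runs.
const histogram = monitorEventLoopDelay({ resolution: 20 }); // sample every 20ms
histogram.enable();

setTimeout(() => {
  histogram.disable();
  // Values are reported in nanoseconds; convert to milliseconds.
  console.log(`max:  ${(histogram.max / 1e6).toFixed(3)}ms`);
  console.log(`mean: ${(histogram.mean / 1e6).toFixed(3)}ms`);
  console.log(`p99:  ${(histogram.percentile(99) / 1e6).toFixed(3)}ms`);
}, 1000);

// Simulate a blocking hot path for ~300ms (stands in for the field_caps work).
const start = Date.now();
while (Date.now() - start < 300) {
  /* busy-wait */
}
```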
### Checklist

Unit tests are already present, so this shouldn't break anything 🤞.

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios