[APM][Poc] Optimize memory usage in the reduce script #187445
Draft
Summary
POC to reduce the memory consumption of the scripted metric aggregation.
Current version
Running the script with the oom.ts scenario caused an Out of Memory (OOM) error: heap usage spiked to around 96% before the crash. The problem is caused by the paths object, a matrix that holds the entire path for every event:
- Rows: 8517
- Columns (avg): 252 (could be many more)
- Fields in each item: 3 (service.name, service.environment, agent.name)
- Size of a field in bytes (avg): 55
paths object consumption (ROUGH ESTIMATION): 8517 rows * 252 columns * (3 fields * 55 bytes/field) = ~354 MB (all parallel requests combined)
This amount of data can also lead to a "content length bigger than the maximum allowed string" exception.
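The rough estimate above is plain arithmetic. The TypeScript sketch below reproduces it so the inputs (for example, longer paths) can be varied. estimateBytes is a hypothetical helper, and, like the original estimate, it ignores per-object and per-string overhead in the runtime, so real heap usage would likely be higher.

```typescript
// Rough estimate used in the text: items x fields per item x average
// field size in bytes. Runtime object/string overhead is ignored.
function estimateBytes(items: number, fieldsPerItem: number, avgFieldBytes: number): number {
  return items * fieldsPerItem * avgFieldBytes;
}

// paths: one item per path node, i.e. rows x average columns.
const rows = 8517;
const avgColumns = 252;
const pathsBytes = estimateBytes(rows * avgColumns, 3, 55);

console.log(pathsBytes); // 354136860
console.log(`${(pathsBytes / 1e6).toFixed(1)} MB`); // 354.1 MB (decimal)
console.log(`${(pathsBytes / 2 ** 20).toFixed(1)} MiB`); // 337.7 MiB (binary)
```

Because the cost is the product of rows and path length, doubling the average path length doubles the estimate.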
Refactored version
With the refactored version, heap usage appears to grow linearly and did not exceed 90% in any of the 3 calls made.
The query was streamlined to return only what the frontend needs to render the map, in the connections object. This object is much smaller than paths, so the query has far less to hold in memory. For example:
With the oom scenario, connections holds approximately 550 unique connections. Therefore:
- Keys: ~550 * 2 (source and destination) = 1100
- Fields in each key: 3 (service.name, service.environment, agent.name)
- Size of a field in bytes (avg): 55
connections object consumption (ROUGH ESTIMATION): 1100 keys * (3 fields * 55 bytes/field) = ~182 KB, or ~177 KiB (all parallel requests combined)
Worst-case scenarios, such as a long cyclic map, might still push the space complexity to O(N^2) and could potentially lead to an OOM.
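The actual change lives in the reduce script that runs inside Elasticsearch as part of the scripted metric aggregation, and that script is not shown here. As an illustration only, the TypeScript sketch below shows the general idea of collapsing full paths into unique source/destination connections. ServiceNode, Connection, nodeKey, toConnections, maxDirectedConnections and the example services are hypothetical names, not the PR's code. The sketch also gives one way to see why the worst case can stay quadratic: N distinct services can form at most N(N-1) directed connections.

```typescript
// Hypothetical node shape: the three fields tracked per service.
interface ServiceNode {
  "service.name": string;
  "service.environment": string;
  "agent.name": string;
}

interface Connection {
  source: ServiceNode;
  destination: ServiceNode;
}

// Key a node by its identifying fields so repeated nodes collapse.
function nodeKey(node: ServiceNode): string {
  return JSON.stringify([node["service.name"], node["service.environment"], node["agent.name"]]);
}

// Keep only unique source -> destination edges. Paths can be streamed
// (any Iterable), so only the deduplicated edges need to be retained,
// instead of (number of paths x path length) nodes.
function toConnections(paths: Iterable<ServiceNode[]>): Map<string, Connection> {
  const connections = new Map<string, Connection>();
  for (const path of paths) {
    for (let i = 0; i + 1 < path.length; i++) {
      const source = path[i];
      const destination = path[i + 1];
      const key = `${nodeKey(source)}->${nodeKey(destination)}`;
      if (!connections.has(key)) {
        connections.set(key, { source, destination });
      }
    }
  }
  return connections;
}

// Upper bound on distinct directed edges between n services, excluding
// self-loops: each service can point to the other n - 1 services.
function maxDirectedConnections(n: number): number {
  return n * (n - 1);
}

// Usage with hypothetical services: the repeated edge frontend -> checkout
// and the second trip around checkout -> payment -> checkout add no new entries.
const svc = (name: string): ServiceNode => ({
  "service.name": name,
  "service.environment": "production",
  "agent.name": "nodejs",
});
const frontend = svc("frontend");
const checkout = svc("checkout");
const payment = svc("payment");

const connections = toConnections([
  [frontend, checkout, payment, checkout, payment],
  [frontend, checkout],
]);
console.log(connections.size); // 3
console.log(maxDirectedConnections(3)); // 6
```

Deduplicating by edge bounds memory by the number of distinct connections, which is small for typical maps, but that number can still approach N(N-1) when most services talk to each other.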