[APM][Poc] Optimize memory usage in the reduce script #187445

Draft
wants to merge 2 commits into
base: main
@crespocarlos crespocarlos commented Jul 3, 2024

Summary

POC to reduce the memory consumption of the scripted metric aggregation.

Current version

Running the script with the oom.ts scenario caused an Out of Memory (OOM) error. The heap usage spiked to around 96% before crashing.

The problem is caused by the paths object, which is a matrix containing the entire path for all events:

Rows: 8517
Columns (avg): 252 (could be many more)
Fields in each item: 3 (service.name, service.environment, agent.name)
Size of a field in bytes (avg): 55

paths object consumption (ROUGH ESTIMATION): 8517 rows * 252 columns * (3 fields * 55 bytes/field) = ~354 MB (all parallel requests combined)
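The rough estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: `MatrixShape` and `estimateBytes` are hypothetical names introduced for illustration, the inputs are the averages quoted above, and real heap usage will differ because of object headers, string encoding, and serialization overhead.

```typescript
// Hypothetical helper for illustration; not part of the PR.
interface MatrixShape {
  rows: number;             // number of paths
  avgColumns: number;       // average number of items per path
  fieldsPerItem: number;    // service.name, service.environment, agent.name
  avgBytesPerField: number; // average size of one field value
}

// rows * columns * (fields * bytes per field)
function estimateBytes(shape: MatrixShape): number {
  return shape.rows * shape.avgColumns * (shape.fieldsPerItem * shape.avgBytesPerField);
}

const pathsBytes = estimateBytes({
  rows: 8517,
  avgColumns: 252,
  fieldsPerItem: 3,
  avgBytesPerField: 55,
});

// 354,136,860 bytes, i.e. roughly 354 MB
console.log(`${(pathsBytes / 1e6).toFixed(0)} MB`);
```

The result is in line with the rough figure quoted above; the point is the order of magnitude, not the exact number.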

This amount of data can also lead to a "content length bigger than the maximum allowed string" exception.

Refactored version

With the refactored version, heap usage grew roughly linearly and did not exceed 90% in any of the 3 calls made.

The query was streamlined to return only what the frontend needs to render the map, in the connections object. This object is much smaller than paths, so the query has to hold far less data in memory.

e.g.:

"service-446~>service-447": {
  "destination": {
    "span.subtype": "unknown",
    "span.destination.service.resource": "service-447",
    "span.type": "app"
  },
  "source": {
    "service.environment": "Synthtrace: service_map",
    "service.name": "service-446",
    "agent.name": "nodejs"
  }
},
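As an illustration of why this shape is smaller, the sketch below collapses a matrix of full paths into unique edges keyed as `source~>destination`. It is not the PR's reduce script (which runs inside Elasticsearch); `MapNode`, `Connection`, `nodeId`, and `toConnections` are hypothetical names, and identifying a node by `service.name` with a fallback to `span.destination.service.resource` is an assumption made for the example.

```typescript
// Hypothetical shapes for illustration; field names follow the example above.
type MapNode = Record<string, string | undefined>;

interface Connection {
  source: MapNode;
  destination: MapNode;
}

// Assumption: a node is identified by service.name, falling back to
// span.destination.service.resource for destinations without a service name.
function nodeId(node: MapNode): string {
  return node['service.name'] ?? node['span.destination.service.resource'] ?? 'unknown';
}

// Walk each path once and keep only the first occurrence of every edge,
// so repeated hops across many paths cost one entry instead of many.
function toConnections(paths: MapNode[][]): Record<string, Connection> {
  const connections: Record<string, Connection> = {};
  for (const path of paths) {
    for (let i = 0; i + 1 < path.length; i++) {
      const source = path[i];
      const destination = path[i + 1];
      const key = `${nodeId(source)}~>${nodeId(destination)}`;
      if (!(key in connections)) {
        connections[key] = { source, destination };
      }
    }
  }
  return connections;
}
```

With this approach the memory held grows with the number of distinct edges rather than with the number of events times the path length.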

With the oom scenario, connections holds approximately 550 unique connections. Therefore:

Keys: ~550 * 2 (source and destination) = 1100
Fields in each key: 3 (service.name, service.environment, agent.name fields)
Size of a field in bytes (avg): 55

connections object consumption (ROUGH ESTIMATION): 1100 keys * (3 fields * 55 bytes/field) = 181,500 bytes, ~177 KiB (all parallel requests combined)

Worst-case scenarios, such as a long cyclic map, might still cause the space complexity to be O(N^2) and could potentially lead to an OOM.
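One way to read the quadratic bound (an interpretation, not something the description spells out): because connections is keyed by source and destination pair, a map in which every service calls every other service can hold up to N * (N - 1) distinct keys. The sketch below counts those keys for a dense map; `countDenseConnectionKeys` is a hypothetical helper written for this example.

```typescript
// Count the distinct "source~>destination" keys in a map where every
// service calls every other service (self-calls excluded).
function countDenseConnectionKeys(serviceCount: number): number {
  const keys: Record<string, true> = {};
  for (let s = 0; s < serviceCount; s++) {
    for (let d = 0; d < serviceCount; d++) {
      if (s !== d) {
        keys[`service-${s}~>service-${d}`] = true;
      }
    }
  }
  return Object.keys(keys).length;
}

// Doubling the number of services roughly quadruples the number of keys.
console.log(countDenseConnectionKeys(10)); // 90
console.log(countDenseConnectionKeys(20)); // 380
```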

@crespocarlos crespocarlos force-pushed the 179229-service-map-reduce-script-improvement branch from 84430f4 to bf0bb62 Compare July 3, 2024 08:17
@crespocarlos crespocarlos changed the title 179229 service map reduce script improvement [APM][Poc] Optimize memory usage in the reduce script Jul 3, 2024
@crespocarlos
/ci

elasticmachine commented Jul 3, 2024

💔 Build Failed

Failed CI Steps

Test Failures

  • [job] [logs] Jest Tests #16 / AllCasesListGeneric Actions Assignees should show the assignees column on platinum license

Metrics [docs]

✅ unchanged
