
Improving lambda Cold start #727

Open
rapphil opened this issue May 30, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@rapphil
Contributor

rapphil commented May 30, 2023

Is your feature request related to a problem? Please describe.

A Lambda cold start happens when a new instance of a Lambda function must be created and initialized. The cold start refers to the delay between invocation and execution introduced by the initialization process.
New instances need to be initialized whenever existing instances have expired due to inactivity or when there are more concurrent invocations than active instances. Cold starts are an inherent problem with Lambda functions because it is not possible to keep a Lambda function initialized forever.

The OpenTelemetry SDK was not created with Lambda functions in mind. If you use OpenTelemetry inside a Lambda function, the overhead of initializing the SDK and, optionally, auto-instrumenting the application code adds to the cold start time. This is especially painful for users because it introduces high latency into their application and increases the cost of running their Lambdas.

Describe the solution you'd like

This proposal will tackle the cold start time of the OpenTelemetry lambda layers with the following plan:

Plan:

  • Continuously measure the cold start time of the layers with each release. This will help catch performance regressions and also show trends and where we should invest our time.
  • Profile each layer to identify where the time is spent in the code.
  • Propose optimizations to the initialization of the SDK in each layer: using the profiling information from the previous step, look for low-hanging fruit as well as more complex refactorings that will improve performance.

Methodology for measuring the cold start time:

  • Measure the cold start time for Lambdas with and without the layers, for each supported layer.
    • Create a sample application and deploy it to a Lambda function.
    • Generate load for this sample application.
    • Vary a parameter in the Lambda function configuration to force the execution environment to be recreated.
    • Parse the logs of the Lambda function with the following query:
filter @type="REPORT"
| filter ispresent(@initDuration)
| stats count(@initDuration) as coldStartCount, pct(@initDuration, 50) as p50Init, pct(@initDuration, 90) as p90Init, pct(@initDuration, 99) as p99Init group by @log 
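The same aggregation can also be reproduced offline. As a minimal sketch (the helper names `init_durations` and `percentile` are hypothetical, not part of this proposal), this parses Lambda `REPORT` log lines for `Init Duration` values, which only appear on cold starts, and computes the percentiles that the Logs Insights query above would return:

```python
import math
import re

# "Init Duration" appears in a REPORT line only when the invocation was a
# cold start, which is what the Logs Insights query filters on.
INIT_RE = re.compile(r"Init Duration: ([\d.]+) ms")

def init_durations(log_lines):
    """Extract init durations (ms) from REPORT lines that have one."""
    durations = []
    for line in log_lines:
        if line.startswith("REPORT"):
            match = INIT_RE.search(line)
            if match:
                durations.append(float(match.group(1)))
    return durations

def percentile(values, p):
    """Nearest-rank percentile; a simple stand-in for pct(@initDuration, p)."""
    ordered = sorted(values)
    rank = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[rank]

if __name__ == "__main__":
    # Two cold starts and one warm start, using the REPORT line format
    # Lambda writes to CloudWatch Logs.
    lines = [
        "START RequestId: abc Version: $LATEST",
        "REPORT RequestId: abc Duration: 12.3 ms Billed Duration: 13 ms "
        "Memory Size: 128 MB Max Memory Used: 60 MB Init Duration: 250.0 ms",
        "REPORT RequestId: def Duration: 10.1 ms Billed Duration: 11 ms "
        "Memory Size: 128 MB Max Memory Used: 60 MB",
        "REPORT RequestId: ghi Duration: 11.0 ms Billed Duration: 12 ms "
        "Memory Size: 128 MB Max Memory Used: 61 MB Init Duration: 310.0 ms",
    ]
    d = init_durations(lines)
    print(len(d), percentile(d, 50), percentile(d, 90), percentile(d, 99))
```

Running this against logs exported from CloudWatch would give the same `coldStartCount`/`p50Init`/`p90Init`/`p99Init` shape as the query, which is useful for checking results locally.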

Methodology for profiling the lambda functions:

  • TBD - We will need to

Additional context
References

#263

@rapphil rapphil added the enhancement New feature or request label May 30, 2023
@darnley

darnley commented Jun 12, 2023

Cool! Glad we're addressing this issue. :-)

@eiathom

eiathom commented Aug 15, 2023

Hi, is there any timeline or proposal, in planning or in place, to address the issues shown by the recorded cold start metrics?
