
Improving lambda Cold start #727

Open
rapphil opened this issue May 30, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@rapphil
Contributor

rapphil commented May 30, 2023

Is your feature request related to a problem? Please describe.

A Lambda cold start happens when a new instance of a Lambda function must be created and initialized. The cold start refers to the delay between invocation and execution introduced by the initialization process.
New instances need to be initialized whenever existing instances have expired due to inactivity or when there are more concurrent invocations than active instances. Cold starts are an inherent problem with Lambda functions because it is not possible to keep a Lambda function initialized forever.

The OpenTelemetry SDK was not created with Lambda functions in mind. If you use OpenTelemetry inside a Lambda function, the overhead of initializing the SDK and, optionally, auto-instrumenting the application code adds to the cold start time. This is especially painful for users because it introduces high latency into their application and increases the cost of running their Lambdas.

Describe the solution you'd like

This proposal will tackle the cold start time of the OpenTelemetry lambda layers with the following plan:

Plan:

  • Continuously measure the cold start time of the layers with each release. This will help catch performance regressions and also show trends and where we should invest our time.
  • Profile each layer to identify where the time is spent in the code.
  • Propose optimizations to the initialization of the SDK in each layer: using the profiling information from the previous step, look for low-hanging fruit as well as more complex refactorings that will improve performance.

Methodology for measuring the cold start time:

  • Measure the cold start time for Lambdas with and without the layers, for each supported layer.
    • Create a sample application and deploy it to a Lambda function.
    • Generate load for this sample application.
    • Vary a parameter in the Lambda function configuration to force the execution environment to be recreated.
    • Parse the logs of the Lambda function with the following query:
filter @type="REPORT"
| filter ispresent(@initDuration)
| stats count(@initDuration) as coldStartCount, pct(@initDuration, 50) as p50Init, pct(@initDuration, 90) as p90Init, pct(@initDuration, 99) as p99Init group by @log 
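The same aggregation can also be reproduced offline. As a minimal sketch (the helper names `init_durations` and `percentile` are hypothetical, not part of this proposal), this parses Lambda `REPORT` log lines for `Init Duration` values, which only appear on cold starts, and computes the percentiles that the Logs Insights query above would return:

```python
import math
import re

# "Init Duration" appears in a REPORT line only when the invocation was a
# cold start, which is what the Logs Insights query filters on.
INIT_RE = re.compile(r"Init Duration: ([\d.]+) ms")

def init_durations(log_lines):
    """Extract init durations (ms) from REPORT lines that have one."""
    durations = []
    for line in log_lines:
        if line.startswith("REPORT"):
            match = INIT_RE.search(line)
            if match:
                durations.append(float(match.group(1)))
    return durations

def percentile(values, p):
    """Nearest-rank percentile; a simple stand-in for pct(@initDuration, p)."""
    ordered = sorted(values)
    rank = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[rank]

if __name__ == "__main__":
    # Two cold starts and one warm start, using the REPORT line format
    # Lambda writes to CloudWatch Logs.
    lines = [
        "START RequestId: abc Version: $LATEST",
        "REPORT RequestId: abc Duration: 12.3 ms Billed Duration: 13 ms "
        "Memory Size: 128 MB Max Memory Used: 60 MB Init Duration: 250.0 ms",
        "REPORT RequestId: def Duration: 10.1 ms Billed Duration: 11 ms "
        "Memory Size: 128 MB Max Memory Used: 60 MB",
        "REPORT RequestId: ghi Duration: 11.0 ms Billed Duration: 12 ms "
        "Memory Size: 128 MB Max Memory Used: 61 MB Init Duration: 310.0 ms",
    ]
    d = init_durations(lines)
    print(len(d), percentile(d, 50), percentile(d, 90), percentile(d, 99))
```

Running this against logs exported from CloudWatch would give the same `coldStartCount`/`p50Init`/`p90Init`/`p99Init` shape as the query, which is useful for checking results locally.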

Methodology for profiling the lambda functions:

  • TBD - We will need to

Additional context
References

#263

@rapphil rapphil added the enhancement New feature or request label May 30, 2023
@darnley

darnley commented Jun 12, 2023

Cool! Glad we're addressing this issue. :-)

@eiathom

eiathom commented Aug 15, 2023

Hi, is there any timeline or proposal, in planning or in place, to address the issues shown by the recorded cold start metrics?
