Lambda Timeouts #573

toreyjs · 2019-09-03T20:47:05Z

I'm submitting a...

Bug report

Expected Behavior

Builds should finish in ~1s without erroring/timing out

Current Behavior

Somewhat randomly (though often in groups) error spikes will happen, accompanied by time outs.

Average error rate seems to be ~2-3% per day according to Alexa Dev Console session metrics (although it has gone above it by a lot for a few days, higher lambda timeouts seem to help a bit)

What I've Tried

I narrowed down what external services are used in my skill to: external assets, dynamoDB, analytics.

Based on logs, it does seem to save (Saved user: [id]), analytics return, it sends a response (Response JSON (jovo-framework) / Response (framework)) and ends with -- middleware 'platform.output' done - however at this point it eventually times out (based on how long my lambda timeout is, currently 8s).

Our working guess is that despite a response being sent the lambda never closes, which eventually sends and END RequestId: [id], causing an error despite everything going alright. However, nothing other than the lambda seem to be an issue when tested (everything seems to return, DB metrics look fine, mp3 files are only used on device side).

Small side note is that we have two jovo skills, and interestingly the large error / duration spikes happen at almost exactly the same time / amount (looking at cloudwatch metrics). Another skill the client has also uses Jovo and while it does have timeouts, they don't mirror the results like my two skills do. Note: My two skills use separate databases / lambdas, but do use the same server for assets (though nothing fetched server side) and same analytics platform (though again it always returns way before timeout); nothing else should be shared.

Logs

Logs look something like this (with DEBUG level on):

@timestamp,@message
(NOTE: the line below is a log I have at the top of app.js, right before 'new App()')
18:00:35.004,"2019-08-21T18:00:35.003Z	[id]	process.env.DEBUG = false, global.gAppMode = production
18:00:34.421,"END RequestId: [id]
18:00:34.421,"REPORT RequestId: [id]	Duration: 8005.38 ms	Billed Duration: 8000 ms 	Memory Size: 128 MB	Max Memory Used: 53 MB	
18:00:34.421,"2019-08-21T18:00:34.420Z [id] Task timed out after 8.01 seconds

18:00:27.724,"[id]	-- middleware 'platform.output' done (38 ms)
18:00:27.724,"[id]	
18:00:27.724,"[id]	Response  (framework) -------------------------------------------------
18:00:27.724,"[id]	
18:00:27.724,"[id]	
18:00:27.724,"[id]	  -- Response JSON (jovo-framework) ----------------------------------------
18:00:27.724,"[id]	
18:00:27.724,"[id]	{
18:00:27.724,"[id]		""version"": ""1.0"",
18:00:27.724,"[id]		""response"": {
18:00:27.724,"[id]			""shouldEndSession"": false,
18:00:27.724,"[id]			""outputSpeech"": { ... },
18:00:27.724,"[id]			""reprompt"": { ... },
18:00:27.724,"[id]			""directives"": [ ... ],
18:00:27.724,"[id]			""card"": { ... }
18:00:27.724,"[id]		},
18:00:27.724,"[id]		""sessionAttributes"": {
18:00:27.724,"[id]			""_JOVO_STATE_"": ""PlayerCountSelectState""
18:00:27.724,"[id]		}
18:00:27.724,"[id]	}
18:00:27.724,"[id]	-- middleware 'response' done (1 ms)
18:00:27.647,"[id]	 this.$user.$metaData
18:00:27.647,"[id]	-- middleware 'user.save' done (260 ms)
18:00:27.647,"[id]	
18:00:27.647,"[id]	Output object: this.$output (framework) -------------------------------
18:00:27.647,"[id]	
18:00:27.647,"[id]	{
18:00:27.647,"[id]		""Alexa"": {
18:00:27.647,"[id]			""DisplayTemplate"": { ... },
18:00:27.647,"[id]			""StandardCard"": { ... },
18:00:27.647,"[id]			""Directives"": [ ... ]
18:00:27.647,"[id]		},
18:00:27.647,"[id]		""ask"": { ... }
18:00:27.647,"[id]	}
18:00:27.646,"[id]	
18:00:27.646,"[id]	Jovo user: (save)  (framework) ----------------------------------------
18:00:27.646,"[id]	
18:00:27.646,"[id]	 Saved user: [account]
18:00:27.646,"[id]	 this.$user.$context
18:00:27.646,"[id]	{
18:00:27.646,"[id]		""prev"": [ ... ]
18:00:27.646,"[id]	}
18:00:27.646,"[id]	 this.$user.$data
18:00:27.646,"[id]	{ ... }
18:00:27.387,"[id]	
18:00:27.387,"[id]	  -- ON_REQUEST ------------------------------------------------------------
18:00:27.387,"[id]	
18:00:27.387,"[id]	 ON_REQUEST (0 ms)
18:00:27.387,"[id]	
18:00:27.387,"[id]	Handle  (framework) ---------------------------------------------------
18:00:27.387,"[id]	
18:00:27.387,"[id]	��
18:00:27.387,"[id]	-- middleware 'handler' done (160 ms)
18:00:27.386,"2019-08-21T18:00:27.385Z	[id]	startState: PlayerCountSelectState
18:00:27.386,"[id]	 toIntent: Start
18:00:27.386,"[id]	 NEW_SESSION (159 ms)
18:00:27.328,"2019-08-21T18:00:27.327Z	[id]	startState: StartGame
18:00:27.328,"[id]	 toIntent: Start
18:00:27.268,"[id]	 toIntent: Start
18:00:27.228,"2019-08-21T18:00:27.228Z	[id]	startState: Greeting
18:00:27.227,"[id]	{
18:00:27.227,"[id]		""type"": ""LAUNCH"",
18:00:27.227,"[id]		""path"": ""LAUNCH""
18:00:27.227,"[id]	}
18:00:27.227,"[id]	-- middleware 'router' done (2 ms)
18:00:27.227,"[id]	
18:00:27.227,"[id]	Jovo handler  (framework) ---------------------------------------------
18:00:27.227,"[id]	
18:00:27.227,"[id]	
18:00:27.227,"[id]	  -- NEW_SESSION -----------------------------------------------------------
18:00:27.227,"[id]	
18:00:27.226,"[id]	
18:00:27.226,"[id]	Jovo router  (framework) ----------------------------------------------
18:00:27.226,"[id]	
18:00:27.226,"[id]	Route object:
18:00:27.225,"[id]	this.$user.new = false
18:00:27.225,"[id]	{
18:00:27.225,"[id]		""prev"": [ ... ]
18:00:27.225,"[id]	}
18:00:27.225,"[id]	this.$user.$data
18:00:27.225,"[id]	{ ... }
18:00:27.225,"[id]	this.$user.$metaData
18:00:27.225,"[id]	-- middleware 'user.load' done (640 ms)
18:00:27.225,"[id]	-- middleware 'before.router' done (0 ms)
18:00:27.168,"[id]	
18:00:27.168,"[id]	Jovo user (load) (framework) ------------------------------------------
18:00:27.168,"[id]	
18:00:27.168,"[id]	this.$user.getId(): [id]
18:00:26.588,"[id]	-- middleware 'platform.nlu' done (37 ms)
18:00:26.588,"[id]	
18:00:26.588,"[id]	After nlu  (framework) ------------------------------------------------
18:00:26.588,"[id]	
18:00:26.588,"[id]	this.$nlu : undefined
18:00:26.588,"[id]	this.$inputs : {}
18:00:26.548,"[id]	-- middleware 'after.platform.init' done (1 ms)
18:00:26.548,"[id]	-- middleware 'platform.init' done (3 ms)
18:00:26.548,"[id]	
18:00:26.548,"[id]	After init  (framework) -----------------------------------------------
18:00:26.548,"[id]	
18:00:26.548,"[id]	this.$alexaSkill initialized
18:00:26.548,"[id]	this.$type: {""type"":""LAUNCH""}
18:00:26.548,"[id]	this.$session.$data : {}
18:00:26.545,"[id]	
18:00:26.545,"[id]	  -- Request JSON (jovo-framework) -----------------------------------------
18:00:26.545,"[id]	
18:00:26.545,"[id]	{
18:00:26.545,"[id]		""version"": ""1.0"",
18:00:26.545,"[id]		""session"": { ... },
18:00:26.545,"[id]		""context"": { ... },
18:00:26.545,"[id]		""request"": { ... }
18:00:26.545,"[id]	}
18:00:26.545,"[id]	-- middleware 'request' done (57 ms)
18:00:26.488,"[id]	-- middleware 'setup' done (69 ms)
18:00:26.424,"[id]	Adding resources to $cms object:
18:00:26.424,"[id]	{}
18:00:26.418,"[id]	
18:00:26.418,"[id]	Start request (framework) ---------------------------------------------
18:00:26.418,"[id]	
18:00:26.415,"START RequestId: [id] Version: $LATEST

==Conclusion==
I'm not sure if I am perhaps missing something obvious from the logs or if it's some internal Jovo thing I may not have accounted for, but after working with amazon a bit on trying to narrow down the issue we decided it would be best to reach out to you. Any help would of course be appreciated.

Your Environment

"jovo-db-dynamodb": "^2.2.0",
"jovo-framework": "^2.1.1",
"jovo-platform-alexa": "^2.1.2",
"jovo-plugin-debugger": "^2.1.1",

Operating System: Local: mac. Server: AWS Lambda

The text was updated successfully, but these errors were encountered:

aswetlow · 2019-09-12T07:26:11Z

Thanks for reaching out. Sorry for the late reply (I was on vacation :))

I heard about the issue but haven't been able to reproduce it. Will run some tests at the weekend.

It's really weird that it occurs in two independent lambdas at the same time.

toreyjs · 2019-09-18T22:39:34Z

Thanks, let me know if you need any additional details.

aswetlow · 2019-10-21T15:39:10Z

Any news on this?
I just found out that I had several timeouts in one of my skills. It was an erroneous API call that took longer than expected.

Do you call any external APIs that may take longer?

toreyjs · 2019-10-30T16:06:44Z

So far no solution; increasing the lambda's memory seems to have helped, but we've also had a decrease in users around the same time, so it's hard to know for sure which caused a decrease in errors. Only APIs we used were analytics, and were turned off to no noticeable effect.

After looking at x-ray logs (which mostly error out at AWS:Lambda), we're looking into database stuff again (though I'm not overly hopeful)

gshenar · 2019-11-27T15:20:52Z

I have been having these issues as well. The error rate is low, but its quite noticeable given that I'm handling millions of calls a month. I tried removing dashbot - doesn't seem to have been the fix. There aren't really any external calls I'm doing besides dynamo through the framework

jankoenig · 2019-11-27T16:07:55Z

Hey @gshenar, could you take a look at your DynamoDb monitoring? We've had issues in the past where read/write capacity was reached during peak times

gshenar · 2019-11-27T16:15:27Z

Yeah here's a screenshot. I have lots of alarms set up and it hasn't been going over.

klapper42 · 2020-02-09T08:09:06Z

I had a similar problem with jovo-db-mysql (reproducible). Not sure what config triggered the issue, but it looked like the connection pooling held the lambda function from terminating.
What helped was:

# src/index.js
exports.handler = async (event, context, callback) => {
    context.callbackWaitsForEmptyEventLoop = false;
    await app.handle(new Lambda(event, context, callback));
};

r-pai · 2020-04-25T15:11:47Z

I had a similar problem with jovo-db-mysql (reproducible). Not sure what config triggered the issue, but it looked like the connection pooling held the lambda function from terminating.
What helped was:
# src/index.js
exports.handler = async (event, context, callback) => {
    context.callbackWaitsForEmptyEventLoop = false;
    await app.handle(new Lambda(event, context, callback));
};

Thanks @klapper42 , the solution you provided solved my issue.

My use case was a basic one, I wanted my hello world jovo project to run in AWS lambda and for session data instead of filedDB chose Mysql DB. When added to run in lambda ,it timed out.
But when I run in my local with MYSQL database there was no issues.

Steps:

Create new jovo project with language as typescript
Followed steps in https://www.jovo.tech/docs/databases/mysql
Ran in local environment - success
Ran using AWS lambda, it timed out. Note 15 sec is the max

{
"errorMessage": "2020-04-25T15:08:59.734Z e7b73234-1977-4b7f-a3cf-939f55da2532 Task timed out after 15.02 seconds"
}

aswetlow · 2022-01-05T13:48:07Z

I close this issue due to incativity.

aswetlow closed this as completed Jan 5, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Lambda Timeouts #573

Lambda Timeouts #573

toreyjs commented Sep 3, 2019

aswetlow commented Sep 12, 2019

toreyjs commented Sep 18, 2019

aswetlow commented Oct 21, 2019

toreyjs commented Oct 30, 2019

gshenar commented Nov 27, 2019

jankoenig commented Nov 27, 2019

gshenar commented Nov 27, 2019

klapper42 commented Feb 9, 2020

r-pai commented Apr 25, 2020

aswetlow commented Jan 5, 2022

Lambda Timeouts #573

Lambda Timeouts #573

Comments

toreyjs commented Sep 3, 2019

I'm submitting a...

Expected Behavior

Current Behavior

What I've Tried

Logs

Your Environment

aswetlow commented Sep 12, 2019

toreyjs commented Sep 18, 2019

aswetlow commented Oct 21, 2019

toreyjs commented Oct 30, 2019

gshenar commented Nov 27, 2019

jankoenig commented Nov 27, 2019

gshenar commented Nov 27, 2019

klapper42 commented Feb 9, 2020

r-pai commented Apr 25, 2020

aswetlow commented Jan 5, 2022