Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lambda Timeouts #573

Closed
1 task done
toreyjs opened this issue Sep 3, 2019 · 10 comments
Closed
1 task done

Lambda Timeouts #573

toreyjs opened this issue Sep 3, 2019 · 10 comments

Comments

@toreyjs
Copy link

toreyjs commented Sep 3, 2019

I'm submitting a...

  • Bug report

Expected Behavior

Builds should finish in ~1s without erroring/timing out

Current Behavior

Somewhat randomly (though often in groups) error spikes will happen, accompanied by time outs.

Average error rate seems to be ~2-3% per day according to Alexa Dev Console session metrics (although it has gone above it by a lot for a few days, higher lambda timeouts seem to help a bit)

What I've Tried

I narrowed down what external services are used in my skill to: external assets, dynamoDB, analytics.

Based on logs, it does seem to save (Saved user: [id]), analytics return, it sends a response (Response JSON (jovo-framework) / Response (framework)) and ends with -- middleware 'platform.output' done - however at this point it eventually times out (based on how long my lambda timeout is, currently 8s).

Our working guess is that despite a response being sent the lambda never closes, which eventually sends and END RequestId: [id], causing an error despite everything going alright. However, nothing other than the lambda seem to be an issue when tested (everything seems to return, DB metrics look fine, mp3 files are only used on device side).

Small side note is that we have two jovo skills, and interestingly the large error / duration spikes happen at almost exactly the same time / amount (looking at cloudwatch metrics). Another skill the client has also uses Jovo and while it does have timeouts, they don't mirror the results like my two skills do. Note: My two skills use separate databases / lambdas, but do use the same server for assets (though nothing fetched server side) and same analytics platform (though again it always returns way before timeout); nothing else should be shared.

Logs

Logs look something like this (with DEBUG level on):

@timestamp,@message
(NOTE: the line below is a log I have at the top of app.js, right before 'new App()')
18:00:35.004,"2019-08-21T18:00:35.003Z	[id]	process.env.DEBUG = false, global.gAppMode = production
18:00:34.421,"END RequestId: [id]
18:00:34.421,"REPORT RequestId: [id]	Duration: 8005.38 ms	Billed Duration: 8000 ms 	Memory Size: 128 MB	Max Memory Used: 53 MB	
18:00:34.421,"2019-08-21T18:00:34.420Z [id] Task timed out after 8.01 seconds

18:00:27.724,"[id]	-- middleware 'platform.output' done (38 ms)
18:00:27.724,"[id]	
18:00:27.724,"[id]	Response  (framework) -------------------------------------------------
18:00:27.724,"[id]	
18:00:27.724,"[id]	
18:00:27.724,"[id]	  -- Response JSON (jovo-framework) ----------------------------------------
18:00:27.724,"[id]	
18:00:27.724,"[id]	{
18:00:27.724,"[id]		""version"": ""1.0"",
18:00:27.724,"[id]		""response"": {
18:00:27.724,"[id]			""shouldEndSession"": false,
18:00:27.724,"[id]			""outputSpeech"": { ... },
18:00:27.724,"[id]			""reprompt"": { ... },
18:00:27.724,"[id]			""directives"": [ ... ],
18:00:27.724,"[id]			""card"": { ... }
18:00:27.724,"[id]		},
18:00:27.724,"[id]		""sessionAttributes"": {
18:00:27.724,"[id]			""_JOVO_STATE_"": ""PlayerCountSelectState""
18:00:27.724,"[id]		}
18:00:27.724,"[id]	}
18:00:27.724,"[id]	-- middleware 'response' done (1 ms)
18:00:27.647,"[id]	 this.$user.$metaData
18:00:27.647,"[id]	-- middleware 'user.save' done (260 ms)
18:00:27.647,"[id]	
18:00:27.647,"[id]	Output object: this.$output (framework) -------------------------------
18:00:27.647,"[id]	
18:00:27.647,"[id]	{
18:00:27.647,"[id]		""Alexa"": {
18:00:27.647,"[id]			""DisplayTemplate"": { ... },
18:00:27.647,"[id]			""StandardCard"": { ... },
18:00:27.647,"[id]			""Directives"": [ ... ]
18:00:27.647,"[id]		},
18:00:27.647,"[id]		""ask"": { ... }
18:00:27.647,"[id]	}
18:00:27.646,"[id]	
18:00:27.646,"[id]	Jovo user: (save)  (framework) ----------------------------------------
18:00:27.646,"[id]	
18:00:27.646,"[id]	 Saved user: [account]
18:00:27.646,"[id]	 this.$user.$context
18:00:27.646,"[id]	{
18:00:27.646,"[id]		""prev"": [ ... ]
18:00:27.646,"[id]	}
18:00:27.646,"[id]	 this.$user.$data
18:00:27.646,"[id]	{ ... }
18:00:27.387,"[id]	
18:00:27.387,"[id]	  -- ON_REQUEST ------------------------------------------------------------
18:00:27.387,"[id]	
18:00:27.387,"[id]	 ON_REQUEST (0 ms)
18:00:27.387,"[id]	
18:00:27.387,"[id]	Handle  (framework) ---------------------------------------------------
18:00:27.387,"[id]	
18:00:27.387,"[id]	��
18:00:27.387,"[id]	-- middleware 'handler' done (160 ms)
18:00:27.386,"2019-08-21T18:00:27.385Z	[id]	startState: PlayerCountSelectState
18:00:27.386,"[id]	 toIntent: Start
18:00:27.386,"[id]	 NEW_SESSION (159 ms)
18:00:27.328,"2019-08-21T18:00:27.327Z	[id]	startState: StartGame
18:00:27.328,"[id]	 toIntent: Start
18:00:27.268,"[id]	 toIntent: Start
18:00:27.228,"2019-08-21T18:00:27.228Z	[id]	startState: Greeting
18:00:27.227,"[id]	{
18:00:27.227,"[id]		""type"": ""LAUNCH"",
18:00:27.227,"[id]		""path"": ""LAUNCH""
18:00:27.227,"[id]	}
18:00:27.227,"[id]	-- middleware 'router' done (2 ms)
18:00:27.227,"[id]	
18:00:27.227,"[id]	Jovo handler  (framework) ---------------------------------------------
18:00:27.227,"[id]	
18:00:27.227,"[id]	
18:00:27.227,"[id]	  -- NEW_SESSION -----------------------------------------------------------
18:00:27.227,"[id]	
18:00:27.226,"[id]	
18:00:27.226,"[id]	Jovo router  (framework) ----------------------------------------------
18:00:27.226,"[id]	
18:00:27.226,"[id]	Route object:
18:00:27.225,"[id]	this.$user.new = false
18:00:27.225,"[id]	{
18:00:27.225,"[id]		""prev"": [ ... ]
18:00:27.225,"[id]	}
18:00:27.225,"[id]	this.$user.$data
18:00:27.225,"[id]	{ ... }
18:00:27.225,"[id]	this.$user.$metaData
18:00:27.225,"[id]	-- middleware 'user.load' done (640 ms)
18:00:27.225,"[id]	-- middleware 'before.router' done (0 ms)
18:00:27.168,"[id]	
18:00:27.168,"[id]	Jovo user (load) (framework) ------------------------------------------
18:00:27.168,"[id]	
18:00:27.168,"[id]	this.$user.getId(): [id]
18:00:26.588,"[id]	-- middleware 'platform.nlu' done (37 ms)
18:00:26.588,"[id]	
18:00:26.588,"[id]	After nlu  (framework) ------------------------------------------------
18:00:26.588,"[id]	
18:00:26.588,"[id]	this.$nlu : undefined
18:00:26.588,"[id]	this.$inputs : {}
18:00:26.548,"[id]	-- middleware 'after.platform.init' done (1 ms)
18:00:26.548,"[id]	-- middleware 'platform.init' done (3 ms)
18:00:26.548,"[id]	
18:00:26.548,"[id]	After init  (framework) -----------------------------------------------
18:00:26.548,"[id]	
18:00:26.548,"[id]	this.$alexaSkill initialized
18:00:26.548,"[id]	this.$type: {""type"":""LAUNCH""}
18:00:26.548,"[id]	this.$session.$data : {}
18:00:26.545,"[id]	
18:00:26.545,"[id]	  -- Request JSON (jovo-framework) -----------------------------------------
18:00:26.545,"[id]	
18:00:26.545,"[id]	{
18:00:26.545,"[id]		""version"": ""1.0"",
18:00:26.545,"[id]		""session"": { ... },
18:00:26.545,"[id]		""context"": { ... },
18:00:26.545,"[id]		""request"": { ... }
18:00:26.545,"[id]	}
18:00:26.545,"[id]	-- middleware 'request' done (57 ms)
18:00:26.488,"[id]	-- middleware 'setup' done (69 ms)
18:00:26.424,"[id]	Adding resources to $cms object:
18:00:26.424,"[id]	{}
18:00:26.418,"[id]	
18:00:26.418,"[id]	Start request (framework) ---------------------------------------------
18:00:26.418,"[id]	
18:00:26.415,"START RequestId: [id] Version: $LATEST

==Conclusion==
I'm not sure if I am perhaps missing something obvious from the logs or if it's some internal Jovo thing I may not have accounted for, but after working with amazon a bit on trying to narrow down the issue we decided it would be best to reach out to you. Any help would of course be appreciated.

Your Environment

"jovo-db-dynamodb": "^2.2.0",
"jovo-framework": "^2.1.1",
"jovo-platform-alexa": "^2.1.2",
"jovo-plugin-debugger": "^2.1.1",
  • Operating System: Local: mac. Server: AWS Lambda
@aswetlow
Copy link
Member

Thanks for reaching out. Sorry for the late reply (I was on vacation :))

I heard about the issue but haven't been able to reproduce it. Will run some tests at the weekend.

It's really weird that it occurs in two independent lambdas at the same time.

@toreyjs
Copy link
Author

toreyjs commented Sep 18, 2019

Thanks, let me know if you need any additional details.

@aswetlow
Copy link
Member

Any news on this?
I just found out that I had several timeouts in one of my skills. It was an erroneous API call that took longer than expected.

Do you call any external APIs that may take longer?

@toreyjs
Copy link
Author

toreyjs commented Oct 30, 2019

So far no solution; increasing the lambda's memory seems to have helped, but we've also had a decrease in users around the same time, so it's hard to know for sure which caused a decrease in errors. Only APIs we used were analytics, and were turned off to no noticeable effect.

After looking at x-ray logs (which mostly error out at AWS:Lambda), we're looking into database stuff again (though I'm not overly hopeful)

@gshenar
Copy link
Contributor

gshenar commented Nov 27, 2019

I have been having these issues as well. The error rate is low, but its quite noticeable given that I'm handling millions of calls a month. I tried removing dashbot - doesn't seem to have been the fix. There aren't really any external calls I'm doing besides dynamo through the framework

@jankoenig
Copy link
Member

Hey @gshenar, could you take a look at your DynamoDb monitoring? We've had issues in the past where read/write capacity was reached during peak times

@gshenar
Copy link
Contributor

gshenar commented Nov 27, 2019

Yeah here's a screenshot. I have lots of alarms set up and it hasn't been going over.
image

@klapper42
Copy link

I had a similar problem with jovo-db-mysql (reproducible). Not sure what config triggered the issue, but it looked like the connection pooling held the lambda function from terminating.
What helped was:

# src/index.js
exports.handler = async (event, context, callback) => {
    context.callbackWaitsForEmptyEventLoop = false;
    await app.handle(new Lambda(event, context, callback));
};

@r-pai
Copy link

r-pai commented Apr 25, 2020

I had a similar problem with jovo-db-mysql (reproducible). Not sure what config triggered the issue, but it looked like the connection pooling held the lambda function from terminating.
What helped was:

# src/index.js
exports.handler = async (event, context, callback) => {
    context.callbackWaitsForEmptyEventLoop = false;
    await app.handle(new Lambda(event, context, callback));
};

Thanks @klapper42 , the solution you provided solved my issue.

My use case was a basic one, I wanted my hello world jovo project to run in AWS lambda and for session data instead of filedDB chose Mysql DB. When added to run in lambda ,it timed out.
But when I run in my local with MYSQL database there was no issues.

Steps:

  1. Create new jovo project with language as typescript
  2. Followed steps in https://www.jovo.tech/docs/databases/mysql
  3. Ran in local environment - success
  4. Ran using AWS lambda, it timed out. Note 15 sec is the max

{
"errorMessage": "2020-04-25T15:08:59.734Z e7b73234-1977-4b7f-a3cf-939f55da2532 Task timed out after 15.02 seconds"
}

@aswetlow
Copy link
Member

aswetlow commented Jan 5, 2022

I close this issue due to incativity.

@aswetlow aswetlow closed this as completed Jan 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants