feat(AWS Lambda): Add support for image config #8778

Merged: 3 commits, Jan 21, 2021

pgrzesik
Contributor

Closes: #8583

There's one thing that I'd like to clarify before merging this one. Initially, I added support for image config to deploy function as well; however, as it turns out, updating code of function with an image causes subsequent configuration update to fail (I believe there's some async process triggered on the AWS side that keeps updating the function even after the request completes). We could potentially introduce a check plus a warning that, when a function has an image, the configuration update is skipped, and that if you'd like to update the configuration you have to force it with the --update-config flag. We can also just skip it for now and potentially add support for it later. What do you think @medikoo?

@pgrzesik pgrzesik self-assigned this Jan 18, 2021
@pgrzesik pgrzesik force-pushed the feature-add-support-for-aws-lambda-image-config branch 2 times, most recently from 23c4765 to f6b1bdc Compare January 18, 2021 12:11
@codecov

codecov bot commented Jan 18, 2021

Codecov Report

Merging #8778 (f824d00) into master (420e937) will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #8778      +/-   ##
==========================================
+ Coverage   87.76%   87.80%   +0.03%     
==========================================
  Files         264      264              
  Lines        9820     9856      +36     
==========================================
+ Hits         8619     8654      +35     
- Misses       1201     1202       +1     
Impacted Files                                  Coverage Δ
lib/plugins/aws/deployFunction.js               94.85% <100.00%> (+0.34%) ⬆️
lib/plugins/aws/package/compile/functions.js    96.16% <100.00%> (+0.13%) ⬆️
lib/plugins/aws/provider.js                     94.72% <100.00%> (+0.21%) ⬆️
lib/Serverless.js                               96.49% <0.00%> (-0.88%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@pgrzesik pgrzesik requested a review from medikoo January 18, 2021 12:26
Contributor

@medikoo medikoo left a comment

@pgrzesik looks very good. I have just a few minor suggestions.

updating code of function with an image causes subsequent configuration update to fail

It's not clear exactly what you mean by code of function.

AFAIK with images there's no code updated. We update just the configuration properties related to the image. Code is specific to a regular handler. What am I missing?

if (imageDefinedInProvider) {
if (_.isObject(imageDefinedInProvider)) {
if (!imageDefinedInProvider.uri && !imageDefinedInProvider.path) {
async function (functionName, image) {
Contributor

Let's maybe accept just the functionName argument, as the image is then technically redundant and can be resolved as:

const { image } = this.serverless.service.getFunction(functionName);

properties: {
name: {
type: 'string',
pattern: '^[a-z]{1,32}$',
Contributor

It would be nice to reuse the imageNamePattern variable

Comment on lines 1587 to 1589
const modulesCacheStub = {
'child-process-ext/spawn': sinon.stub().resolves(),
};
Contributor

There's a dedicated shouldStubSpawn option that does just that

},
},
})
).to.be.rejectedWith('Referenced "undefinedimage" not defined in "provider.ecr.images"');
Contributor

Let's test against error codes
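As a rough illustration of why codes are a sturdier assertion target than message text, here is a minimal sketch. `CodedError`, `resolveImage`, `thrownCode` and the code `REFERENCED_IMAGE_NOT_DEFINED` are all hypothetical names made up for this example; they are not the Framework's actual error class or error codes.

```javascript
'use strict';

// Stand-in for an error class that carries a machine-readable code
// (hypothetical; not the Framework's own error class).
class CodedError extends Error {
  constructor(message, code) {
    super(message);
    this.code = code;
  }
}

// Hypothetical resolver that fails with a stable code when the image
// name is not among the defined images.
function resolveImage(name, definedImages) {
  if (!Object.prototype.hasOwnProperty.call(definedImages, name)) {
    throw new CodedError(
      `Referenced "${name}" not defined in "provider.ecr.images"`,
      'REFERENCED_IMAGE_NOT_DEFINED' // illustrative code only
    );
  }
  return definedImages[name];
}

// Returns the code of the error thrown by fn, or null if nothing was thrown.
function thrownCode(fn) {
  try {
    fn();
  } catch (error) {
    return error.code;
  }
  return null;
}
```

A test can then compare `thrownCode(() => resolveImage('undefinedimage', {}))` with the expected code, so rewording the message later does not break the assertion.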

@pgrzesik pgrzesik force-pushed the feature-add-support-for-aws-lambda-image-config branch from f6b1bdc to 850fef0 Compare January 18, 2021 18:48
@pgrzesik
Contributor Author

pgrzesik commented Jan 18, 2021

updating code of function with an image causes subsequent configuration update to fail

It's not clear exactly what you mean by code of function.

AFAIK with images there's no code updated. We update just the configuration properties related to the image. Code is specific to a regular handler. What am I missing?

Sorry, I've reread what I wrote and it's definitely not clear what I meant. The way it works is that updating ImageUri is done as part of the updateFunctionCode API call, whereas updating ImageConfig is done via the updateFunctionConfiguration API call. It's a little bit unintuitive, but unfortunately that's how it works on the AWS side. Doing an instant call to updateFunctionConfiguration after calling updateFunctionCode with ImageUri causes updateFunctionConfiguration to fail.
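To make the split concrete, here is a minimal sketch of which call carries which property. `planImageUpdate` and the shape of its `image` argument are hypothetical and exist only for this illustration; the operation names and the `ImageUri` / `ImageConfig` parameters are the ones discussed above.

```javascript
'use strict';

// Hypothetical helper: lists the Lambda API calls needed to apply an image
// definition. The input shape ({ uri, command, entryPoint, workingDirectory })
// is an assumption made for this sketch.
function planImageUpdate(functionName, image) {
  const calls = [
    // The image reference itself travels through updateFunctionCode...
    {
      method: 'updateFunctionCode',
      params: { FunctionName: functionName, ImageUri: image.uri },
    },
  ];

  const imageConfig = {};
  if (image.command) imageConfig.Command = image.command;
  if (image.entryPoint) imageConfig.EntryPoint = image.entryPoint;
  if (image.workingDirectory) imageConfig.WorkingDirectory = image.workingDirectory;

  // ...while ImageConfig is set through updateFunctionConfiguration.
  if (Object.keys(imageConfig).length > 0) {
    calls.push({
      method: 'updateFunctionConfiguration',
      params: { FunctionName: functionName, ImageConfig: imageConfig },
    });
  }
  return calls;
}
```

The second entry in the returned list is the call that fails when it is issued right after the first one.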

That raises a question: how do we handle the use case where someone wants to deploy such a function while also updating its configuration? I see the following options:

  • Not allowing configuration updates at all when using an image (seems prohibitive)
  • By default, not updating configuration when using an image and emitting a warning that you have to perform it separately with the --update-config flag
  • Waiting until it's possible to update configuration, with retry + backoff

What do you think?

@pgrzesik pgrzesik requested a review from medikoo January 18, 2021 19:11
@medikoo
Contributor

medikoo commented Jan 18, 2021

Doing an instant call to updateFunctionConfiguration after calling updateFunctionCode with ImageUri causes updateFunctionConfiguration to fail.

Thanks for the clarification.

Does the failure happen in all cases? Even if the updateFunctionCode call doesn't actually change anything (the uri stays the same)? Or does it happen only if the uri is changed, or only when we switch from regular code to an image?

@pgrzesik
Contributor Author

Thanks for the clarification, now all clear.

Does the failure happen in all cases? Even if the updateFunctionCode call doesn't actually change anything (the uri stays the same)? Or does it happen only if the uri is changed, or only when we switch from regular code to an image?

From my observations it looks like it's always failing in such case, even if the ImageUri is not changed at all: e.g. running sls deploy and, after it finishes, running sls deploy function -f fnName without changing anything will fail on updating the function configuration.

@medikoo
Contributor

medikoo commented Jan 19, 2021

From my observations it looks like it's always failing in such case

Will retrying help? (e.g. we can retry for ca. 15 seconds)

@pgrzesik
Contributor Author

pgrzesik commented Jan 19, 2021

From my observations it looks like it's always failing in such case

Will retrying help? (e.g. we can retry for ca. 15 seconds)

I believe it should work, as running it with --update-config flag after some time does the trick - I'll verify if it's a valid approach

@pgrzesik
Contributor Author

pgrzesik commented Jan 19, 2021

I've adjusted our retry logic in provider to accommodate this case, but our standard approach of 4 retries with 4-7 seconds of delay does not always do the trick: sometimes it's enough, sometimes it still fails. Please see below for what it looks like. I'm not sure about this approach, as it's not reliable and still fails in >50% of cases, at least in my observations. We could either force more retries here, or longer delays between retries, or choose another approach from the ones mentioned above.

Failing
Serverless: Successfully deployed function: helloFromFirst
Serverless: Recoverable error occurred (The operation cannot be performed at this time. An update is in progress for resource: arn:aws:lambda:us-east-1:600238737408:function:someproject-dev-helloFromFirst), sleeping for ~6 seconds. Try 1 of 4
Serverless: Recoverable error occurred (The operation cannot be performed at this time. An update is in progress for resource: arn:aws:lambda:us-east-1:600238737408:function:someproject-dev-helloFromFirst), sleeping for ~5 seconds. Try 2 of 4
Serverless: Recoverable error occurred (The operation cannot be performed at this time. An update is in progress for resource: arn:aws:lambda:us-east-1:600238737408:function:someproject-dev-helloFromFirst), sleeping for ~4 seconds. Try 3 of 4
Serverless: Recoverable error occurred (The operation cannot be performed at this time. An update is in progress for resource: arn:aws:lambda:us-east-1:600238737408:function:someproject-dev-helloFromFirst), sleeping for ~6 seconds. Try 4 of 4

  Serverless Error ---------------------------------------

  The operation cannot be performed at this time. An update is in progress for resource: arn:aws:lambda:us-east-1:600238737408:function:someproject-dev-helloFromFirst

  Get Support --------------------------------------------
     Docs:          docs.serverless.com
     Bugs:          github.com/serverless/serverless/issues
     Issues:        forum.serverless.com

  Your Environment Information ---------------------------
     Operating System:          linux
     Node Version:              15.6.0
     Framework Version:         2.19.0 (local)
     Plugin Version:            4.4.2
     SDK Version:               2.3.2
     Components Version:        3.4.7


Success
Serverless: Successfully deployed function: helloFromFirst
Serverless: Recoverable error occurred (The operation cannot be performed at this time. An update is in progress for resource: arn:aws:lambda:us-east-1:600238737408:function:someproject-dev-helloFromFirst), sleeping for ~4 seconds. Try 1 of 4
Serverless: Recoverable error occurred (The operation cannot be performed at this time. An update is in progress for resource: arn:aws:lambda:us-east-1:600238737408:function:someproject-dev-helloFromFirst), sleeping for ~4 seconds. Try 2 of 4
Serverless: Recoverable error occurred (The operation cannot be performed at this time. An update is in progress for resource: arn:aws:lambda:us-east-1:600238737408:function:someproject-dev-helloFromFirst), sleeping for ~6 seconds. Try 3 of 4
Serverless: Recoverable error occurred (The operation cannot be performed at this time. An update is in progress for resource: arn:aws:lambda:us-east-1:600238737408:function:someproject-dev-helloFromFirst), sleeping for ~5 seconds. Try 4 of 4
Serverless: Successfully updated function: helloFromFirst

@medikoo
Contributor

medikoo commented Jan 19, 2021

@pgrzesik as I think of it, the best DX approach could be to:

  1. Retrieve the current value of the function code from AWS
  2. If we discover that the user is switching between a regular handler and an image, reject the request and require a full deployment (I guess it's hard to get this right, and changing such state without doing a regular CF deployment feels like a bad idea)
  3. If it's a regular handler, proceed as usual
  4. If it's an image handler and we detect that the uri has changed, let's issue a function update
  5. If we updated the uri, let's first get the function configuration to confirm whether there's anything to update. If there is, issue an update and retry until success (ofc retry only on the "An update is in progress" error)
  6. If there was no uri update, unconditionally issue a configuration update.

It feels complicated, but I guess it secures a really solid experience. What do you think?
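The six steps above can be sketched as a small decision helper. This is only an illustration of the proposed flow: `planFunctionDeploy`, its input flags and the step labels it returns are hypothetical names, and step 1 (fetching the current state from AWS) is assumed to have already produced those flags.

```javascript
'use strict';

// Hypothetical decision helper: maps the observed state to an ordered list of
// step labels. All names and labels here are illustrative only.
function planFunctionDeploy({ remoteUsesImage, localUsesImage, uriChanged, configChanged }) {
  // Step 2: switching between handler and image needs a full deployment.
  if (remoteUsesImage !== localUsesImage) return ['reject:run-full-deploy'];

  // Step 3: regular handlers keep the existing flow.
  if (!localUsesImage) return ['proceed-as-usual'];

  if (uriChanged) {
    // Steps 4-5: update the image, then touch configuration only if it differs.
    const steps = ['updateFunctionCode'];
    if (configChanged) steps.push('updateFunctionConfiguration:retry-on-update-in-progress');
    return steps;
  }

  // Step 6: no uri update, so the configuration update can be issued directly.
  return ['updateFunctionConfiguration'];
}
```

Keeping the decision separate from the AWS calls would also make each branch easy to unit test without stubbing the provider.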

@pgrzesik
Contributor Author

pgrzesik commented Jan 19, 2021

Thanks @medikoo, I like the approach, we're already fetching results of getFunction call, so we have that information available. There are however two things:

  1. Our current request mechanism does not support retrying until success so we should first either add that functionality or not use retry mechanism from there and retry in other way - changes in request feel like they deserve a separate PR
  2. I feel like the changes in updateCode/Configuration logic that you propose deserve a separate PR, as the logic that checks if anything changed in configuration has to be implemented carefully

What do you think?

@medikoo
Contributor

medikoo commented Jan 19, 2021

Our current request mechanism does not support retrying until success so we should first either add that functionality or not use retry mechanism from there and retry in other way - changes in request feel like they deserve a separate PR

I wonder if it should be a part of request, as what's in request is retries on generic errors, while here it's about doing retries on a specific error. I'd probably let request retry on general errors (with its defaults in place), and in the context of this functionality wrap it with retry logic specific to this error.

However if you feel we can neatly incorporate retrying on specific errors into request (via some option), then it also can be a valid route.

I feel like the changes in updateCode/Configuration logic that you propose deserve a separate PR

It depends. Do you think that what we currently have in master deserves improvements? Or are those improvements preferred now, in the context of the properties we're adding here? If it's the latter, then it feels like it belongs in this PR.

@pgrzesik
Contributor Author

I wonder if it should be a part of request, as what's in request is retries on generic errors, while here it's about doing retries on a specific error. I'd probably let request retry on general errors (with its defaults in place), and in the context of this functionality wrap it with retry logic specific to this error.

However if you feel we can neatly incorporate retrying on specific errors into request (via some option), then it also can be a valid route.

I was thinking about it too, I've implemented a simple forceRetryOn option where you can provide error code that should always be retried, however, we would still need the functionality of retrying until success which current provider.request doesn't have, which would warrant a bigger refactoring/improvement. I think we could get away with specific retry only in deployFunction context.

It depends. Do you think that what we currently have in master deserves improvements? Or are those improvements preferred now, in the context of the properties we're adding here? If it's the latter, then it feels like it belongs in this PR.

I believe the current functionality on master might suffer from the exact same issue that I've noticed here. When someone uses image and has some other configuration to be updated, they will run into the same error, as it's only adding a few extra pieces of configuration and doesn't change anything in how the command is executed in general.

@pgrzesik pgrzesik force-pushed the feature-add-support-for-aws-lambda-image-config branch from 850fef0 to 99bcc58 Compare January 19, 2021 12:04
@pgrzesik
Contributor Author

pgrzesik commented Jan 19, 2021

In the last commit I added a simplified version of your proposal:

  1. Checking if we're switching between image and handler
  2. Checking if imageSha changed, if not, skipping deployment
  3. Always trying to retry updateFunctionConfiguration.

Number 3 should have no effect for regular deploys with handler defined and for updates where imageSha didn't change. The only downside is when we're trying to issue an update when the config didn't change but imageSha changed, but it's a bit tricky to check if config did in fact change (due to the fact that some fields are mapped differently and some of them might be e.g. missing when we removed some properties, e.g. VpcConfig). I'll still try to see if we can check it in a reasonable manner.

Minus the config change checking, what do you think about the approach to implementation?

Update: It gets quite messy to actually validate if something changed in the function's Configuration as every field has to be considered separately, not sure if it's worth the effort here

Contributor

@medikoo medikoo left a comment

Thanks @pgrzesik! I've put some comments under proposed changes.

Unfortunately it's getting a bit more complex, but I guess it's for the best. This should ensure a really reliable, non-flaky experience.

due to the fact that some fields are mapped differently and some of them might be e.g. missing when we removed some properties, e.g. VpcConfig

I assume that function configuration update is a patch-like operation (?), so AWS will attempt to update only the properties we pass. Is that the case?

If so, wouldn't it be enough to check, for just the properties we intend to pass to the function configuration update, whether what's already configured is the same (?)

Or are the values coming from AWS somewhat normalized, so that it's hard for us to have a solid answer for some of them?
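A minimal sketch of that idea, assuming the configuration returned by AWS uses the same keys and shapes as the parameters we send (which, as the question above suggests, may not hold for everything). `pickChangedProperties` is a hypothetical helper name; `util.isDeepStrictEqual` is Node's built-in deep comparison.

```javascript
'use strict';

const { isDeepStrictEqual } = require('util');

// Hypothetical helper: keeps only those intended parameters whose value differs
// from what is already configured remotely. Properties that are not about to
// be sent are ignored on purpose.
function pickChangedProperties(intendedParams, remoteConfiguration) {
  const changed = {};
  for (const [key, value] of Object.entries(intendedParams)) {
    // Assumption: remote values are comparable as-is. Values that AWS
    // normalizes or reshapes would need dedicated handling before this check.
    if (!isDeepStrictEqual(value, remoteConfiguration[key])) changed[key] = value;
  }
  return changed;
}
```

If the returned object is empty, the configuration update could be skipped altogether.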

' Please run "serverless deploy" to ensure consistent deploy.',
];
throw new this.serverless.classes.Error(
errorMessage,
Contributor

If I read correctly, it's an array, while what we pass to the ServerlessError class should be a string

(ideally we would simply provide this error message inline, without arrayification and joining - we can simply split the string into parts)
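For illustration, the message can be assembled as one string split across lines with concatenation. `buildImageToHandlerSwitchMessage` is a hypothetical helper used only to show the shape; in practice the string would be passed straight to the error constructor.

```javascript
'use strict';

// Hypothetical helper: builds the message as a single string, split into parts
// for readability instead of being collected in an array and joined.
function buildImageToHandlerSwitchMessage(functionName) {
  return (
    `The function "${functionName}" you want to update with handler was previously packaged as an image.` +
    ' Please run "serverless deploy" to ensure consistent deploy.'
  );
}
```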

`The function "${this.options.function}" you want to update with handler was previously packaged as an image.`,
' Please run "serverless deploy" to ensure consistent deploy.',
];
throw new this.serverless.classes.Error(
Contributor

To maintain a single convention it would be nice to always rely on ServerlessError as required from lib/serverless-error (although I know that probably other errors in this file are constructed via classes.Error - I've opened an issue to clean up all such instances: #8783)

this.serverless.cli.log(
`Retrying update of function: ${this.options.function}. Reason: ${err.message}`
);
await wait(5000);
Contributor

I would do 1s. I think there's no harm in issuing requests more frequently, and that way we can ensure the user won't wait too long (5s adds significant overhead)

Contributor Author

I was even considering going higher here, the problem is that it takes ~25-30 seconds in my case to actually succeed with an update and I can imagine it might be even longer for specific use cases. Do you think it's a good approach to issue 25-30 retries under regular circumstances?

Contributor

I guess it's not that harmful. The context is most likely that some developer is updating the dev stage and changed both the image uri and some config (but that's to be confirmed, I believe). So the case is quite rare and should not affect production deployments.

Additionally we have some backoff logic in case of rate limit errors. So it's not that we'll just crash in such case.

Contributor Author

Yeah, in general, that case should be quite rare if we have proper checking for configuration update only in case when something really changed.

@@ -63,6 +65,35 @@ class AwsDeployFunction {
}
}

// TODO: TESTS
Contributor

For completeness it would be nice to address those TODOs. Is this PR in a temporarily unfinished state?

Contributor Author

I pushed that commit with the intention of discussing the approach; it's definitely not in its final state, as it's not cleaned up or tested properly

async callUpdateFunctionConfiguration(params) {
await this.provider.request('Lambda', 'updateFunctionConfiguration', params);
// TODO: ADD max count?
Contributor

We may set some time restriction, e.g. do not go beyond one minute.
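A rough sketch of such a time-boxed retry, combining the 1s interval and the one-minute cap discussed in this thread. `retryWhileUpdateInProgress` and its options are hypothetical names, and detecting the error by its message text is an assumption based on the log output earlier in this conversation.

```javascript
'use strict';

const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Assumption: the "update in progress" failure is recognized by message text.
const isUpdateInProgress = (error) =>
  String(error && error.message).includes('An update is in progress for resource');

// Hypothetical wrapper: retries only on the "update in progress" error and
// gives up once the next attempt would start after the deadline.
async function retryWhileUpdateInProgress(
  operation,
  { intervalMs = 1000, timeoutMs = 60000 } = {}
) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    try {
      return await operation();
    } catch (error) {
      // Unrelated errors, and errors past the deadline, are rethrown as-is.
      if (!isUpdateInProgress(error) || Date.now() + intervalMs > deadline) throw error;
      await wait(intervalMs);
    }
  }
}
```

Inside callUpdateFunctionConfiguration this could wrap the existing call, e.g. `retryWhileUpdateInProgress(() => this.provider.request('Lambda', 'updateFunctionConfiguration', params))`, leaving the generic retries in request untouched.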

@@ -244,6 +318,7 @@ class AwsDeployFunction {

await this.provider.request('Lambda', 'updateFunctionCode', params);
this.serverless.cli.log(`Successfully deployed function: ${this.options.function}`);
return;
Contributor

This change I guess is not necessary (?)

Contributor Author

That's correct, I was toying with checking whether the function was deployed or not and returning a bool value here

@pgrzesik
Contributor Author

pgrzesik commented Jan 19, 2021

If so, wouldn't it be enough to check, for just the properties we intend to pass to the function configuration update, whether what's already configured is the same (?)

It's exactly the approach I took, but it's a bit messy to properly validate and also error-prone, especially in situations where we're considering layers, vpc configuration and anything that is a bit more complex than a simple string. I'll move forward with this type of implementation though and see if I can make it work in a reliable manner 👍

@pgrzesik pgrzesik force-pushed the feature-add-support-for-aws-lambda-image-config branch from 99bcc58 to 91eb1c9 Compare January 20, 2021 17:43
@pgrzesik
Contributor Author

I believe most of the points were addressed via: #8786 and I've rebased + added last pieces (checking for ImageConfig change) - should be ready to review 💯

@pgrzesik pgrzesik requested a review from medikoo January 21, 2021 10:02
Contributor

@medikoo medikoo left a comment

Looks great 👍

@pgrzesik pgrzesik merged commit 9a55537 into master Jan 21, 2021
@pgrzesik pgrzesik deleted the feature-add-support-for-aws-lambda-image-config branch January 21, 2021 10:51
Development

Successfully merging this pull request may close these issues.

AWS Lambda: Support ImageConfig properties
2 participants