Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(AWS Lambda): Add support for building Docker images #8725

Merged
merged 5 commits into from
Jan 18, 2021

Conversation

pgrzesik
Copy link
Contributor

@pgrzesik pgrzesik commented Jan 7, 2021

This is still WIP, but I wanted to consult a few smaller things before cleaning everything up, providing proper test coverage and docs.

A few questions:

  1. Is it dangerous to try to remove ECR repository at all times? I don't think so because we have a pretty unique name, however, we cannot be certain if someone never used Docker support or maybe someone, in the meantime, removed all defined images from their serverless.yml. What do you think here? Should we try to remove ECR repository only if there are provider.images defined?
  2. I was pondering with the idea of re-tagging images in case of deploy failure as it's a little bit messy from implementation standpoint and I'm wondering if we even need that. We're always pinning configuration to specific imageSha - if deploy fails, we won't trigger cleanupEcrRepository, which means the image won't be deleted and Lambdas won't be affected. There will be an untagged + tagged image that is not actually used in the repository, but it's going to be fixed automatically after next successful deployment, so unless someone purposefully removes the image from ECR, there shouldn't be any issues. What is your opinion on that? Do you think it's still worth attempting to re-tag the image in case of failure?
  3. Currently, while building multiple images and uploading them to ECR, the messages are "mixed', meaning you'll see the output from two docker build and docker push processes displayed simultaneously. I don't think we can "fix" it other than enforcing serial build process. Do you think it's fine to keep the outputs like that?
  4. I was thinking about potential improvements for the future e.g. allowing people to provide their own ECR repository as configuration. As of right now, we would have to introduce extra property on provider, in addition to images. I was thinking that maybe, we should create an "umbrella" property on provider like ecr or docker or something along those lines that will hold all related configurations (provider.docker.images, provider.docker.repository, etc)? We might get additional ones in the future (e.g. setting to keep old images instead of removing them). What do you think?
  5. What is the "recommended" way of handling errors from spawnExt? Should we try to catch them and provide a custom error message for user or just fail whenever process fails?
  6. Should we inspect if docker is installed on the machine before continuing further? Right now it will fail on build attepmt, but I was thinking that maybe we should fail faster in such case.

Closes: #8622

@codecov
Copy link

codecov bot commented Jan 7, 2021

Codecov Report

Merging #8725 (a4b317a) into master (0f7b31b) will increase coverage by 0.13%.
The diff coverage is 98.41%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #8725      +/-   ##
==========================================
+ Coverage   87.59%   87.72%   +0.13%     
==========================================
  Files         259      262       +3     
  Lines        9640     9763     +123     
==========================================
+ Hits         8444     8565     +121     
- Misses       1196     1198       +2     
Impacted Files Coverage Δ
lib/plugins/aws/lib/checkIfEcrRepositoryExists.js 88.88% <88.88%> (ø)
lib/plugins/aws/provider.js 94.50% <98.88%> (+1.41%) ⬆️
lib/plugins/aws/deploy/index.js 96.36% <100.00%> (-0.19%) ⬇️
lib/plugins/aws/deployFunction.js 95.86% <100.00%> (+0.86%) ⬆️
lib/plugins/aws/lib/naming.js 98.22% <100.00%> (+0.03%) ⬆️
lib/plugins/aws/package/compile/functions.js 96.02% <100.00%> (-0.17%) ⬇️
lib/plugins/aws/remove/index.js 100.00% <100.00%> (ø)
lib/plugins/aws/remove/lib/ecr.js 100.00% <100.00%> (ø)
lib/classes/ConfigSchemaHandler/index.js 91.35% <0.00%> (-0.06%) ⬇️
...classes/ConfigSchemaHandler/resolveDataPathSize.js 91.30% <0.00%> (ø)
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0f7b31b...a4b317a. Read the comment docs.

Copy link
Contributor

@medikoo medikoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pgrzesik it looks really good, and I think not much is left here. See my comments

Should we try to remove ECR repository only if there are provider.images defined?

Great question. I would attempt to remove repository unconditionally as a whole only when we're removing the service via sls remove, still I would attempt it in a way in which it doesn't prolong time of sls remove when repository doesn't exist (I proposed how we can do it in one of the comments

Do you think it's still worth attempting to re-tag the image in case of failure?

I think we may either:

  • Skip it, and do not attempt to search and remove unused images
  • Always be clean, and always attempt to leave clean slate (attempt to remove any not used images)

Second feels a bit aggressive, I think we may skip it for now, and eventually revisit if concern would be raised. Working with specific use case may give us better idea on what approach would be most optimal, so we may wait for that.

I don't think we can "fix" it other than enforcing serial build process. Do you think it's fine to keep the outputs like that?

I'd rather fix it. Outputs like that are very confusing and may lead to wrong assumptions. I'd fix it by flushing output of processes after they finalize and not on the go

I was thinking that maybe, we should create an "umbrella" property on provider like ecr or docker

Very good call. I propose to use ecr (AWS pointed to us once that ECR is not only for docker, in light of that docker doesn't seem as valid choice)

What is the "recommended" way of handling errors from spawnExt? Should we try to catch them and provide a custom error message for user or just fail whenever process fails?

I think we should catch, and crash with meaningful ServerlessError. If we pass through original error, it'll be assumed as programmer error (so like 500), and will be presented to user with full stack trace (and such errors unconditionally are treated as bugs in our logic).

Right now it will fail on build attepmt, but I was thinking that maybe we should fail faster in such case.

If from build fail we can easily diagnose error signals that docker is not installed. I would keep that, and just throw meaningful ServerlessError in response.


It'll be great to also test this on Windows. Do you have some access to Windows machine?

// TODO: VALIDATE THAT LENGTH
'^[a-z]{1,64}$': {
oneOf: [
// TODO: Combine into one?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's best to stick to one option, and validate exclusivity of path and uri manually.

It's because in next PR will introdoce common options that apply to all scenarios and we should not duplicate their definitions. Other reason is that it'll make validation errors more human
friendly, anyOf fail in such case could be reported just with provider.images.foo format is not supported which doesn't inform what properties didn't validate

Otherwise (as a side note), let's always rely on anyOf (not oneOf)

With oneOf AJV always validates all paths (if more that one passes validation fails), and applies coercion if needed in each case, which may malform final value.

With anyOf AJV stops at first passing option

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that's a great advice, I'll change it

type: 'string',
pattern:
'^\\d+\\.dkr\\.ecr\\.[a-z0-9-]+..amazonaws.com\\/([^@]+)|([^@:]+@sha256:[a-f0-9]{64})$',
oneOf: [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use anyOf

@@ -1698,6 +1744,170 @@ class AwsProvider {
return this.getStackResources(res.NextToken, allResources);
});
}

getEcrRepositoryName() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a dedicated naming.js module for such methods

Comment on lines 1790 to 1842
if (err.providerError && err.providerError.code === 'RepositoryNotFoundException') {
// TODO: CONSIDER MORE PARAMS FOR REPO CREATION?
const result = await this.request('ECR', 'createRepository', {
repositoryName,
});
repositoryUri = result.repository.repositoryUri;
} else {
throw err;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style hint.

It usually feels more concise to construct flow as:

if (!x) throw error;
... process ...

Instead of

if (x) {
   ... process ...
} else {
  throw error;
}

Comment on lines 1826 to 1828
const { repositoryUri, repositoryName } = await this.getOrCreateEcrRepository();

await this.dockerLoginToEcr();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those two can probably be issued in parallel

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you talking about getOrCreateEcrRepository and dockerLoginToEcr? I think it's not safe to issue them in parallel as if the repo doesn't exist, we have to create it before trying to use dockerLoginToEcr

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.. but aren't we logging in to ECR in general, and not to specific repo?

Or do you mean that if there's no single repo at given region AWS would reject login?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're totally right, please disregard my previous comment

lib/plugins/aws/provider.js Show resolved Hide resolved
if (err.providerError && err.providerError.code === 'RepositoryNotFoundException') {
// Pass silently
} else {
throw err;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether it's fine to unconditionally throw here.

e.g. what if user works in region in which AWS ECR is not supported at all (I'm not sure if it can be the case, but if it would we will throw at every removal in such region.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not consider that situation, I'll investigate if such a thing can happen

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to investigate it but I didn't find a single mention of ECR not being available in a specific region - do you envision any other potential issues with unconditionally throwing here @medikoo ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw you already updated to issue it only if it's confirmed that repository exists. In such case I think it's totally safe to run it unconditionally

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct, however, checking for repository existence has similar check at the moment, which can trigger (potentially) the same issue

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I just saw that. I think we can leave it for now, and eventually we'll react if anything is reported.

I'm not sure how it'll behave in e.g. Chinese regions, which are not accessible if you do not have registered business in China

Copy link
Contributor Author

@pgrzesik pgrzesik Jan 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about China specifically, but if you don't have access there, how all other API calls via SDK work (or not)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring more to case where someone has access but e.g. given AWS service is not provided in the region (I can imagine that china regions could be limited to US ones).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood 👍 I was actually researching availability of ECR service but didn't find a single region where it's not available, including China regions: https://www.amazonaws.cn/en/about-aws/regional-product-services/

Comment on lines 20 to 24
await this.validate();
await this.emptyS3Bucket();
const cfData = await this.removeStack();
await this.monitorStack('delete', cfData);
await this.removeEcrRepository();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to construct this in a way, that we have two methods. First checkIfEcRepositoryExist, which we should issue at beginning but do not await,
and await after monitorStack succeds, and if it returns true proceed with removeEcrRepository()

This will ensure we do not add time to remove command for services where ECR is not involved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a clever way to skip adding extra time for people that don't use docker at all, I'll update it

},
fnImageWithTag: {
image: '000000000000.dkr.ecr.sa-east-1.amazonaws.com/test-lambda-docker:stable',
},
fnImageWithExplicitUri: {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's maybe prefix name with fnProviderImage to differentiate those which have image config inline and those which refer to provider.images

@pgrzesik
Copy link
Contributor Author

pgrzesik commented Jan 8, 2021

Thanks @medikoo for all great points raised

Should we try to remove ECR repository only if there are provider.images defined?

Great question. I would attempt to remove repository unconditionally as a whole only when we're removing the service via sls remove, still I would attempt it in a way in which it doesn't prolong time of sls remove when repository doesn't exist (I proposed how we can do it in one of the comments

I really like the proposed idea 👍

Do you think it's still worth attempting to re-tag the image in case of failure?

I think we may either:

  • Skip it, and do not attempt to search and remove unused images
  • Always be clean, and always attempt to leave clean slate (attempt to remove any not used images)

Second feels a bit aggressive, I think we may skip it for now, and eventually revisit if concern would be raised. Working with specific use case may give us better idea on what approach would be most optimal, so we may wait for that.

The only thing I'm worried about here is the potential leftover images generating extra costs for users. Storage in ECR is $0.1/GB/month, which can quickly add up to a significant sum after some time. But I like the idea of potentially revisiting the approach after we get feedback from the community. With that in mind, do you suggest to remove the logic that cleans up unused images after each deploy for now?

I don't think we can "fix" it other than enforcing serial build process. Do you think it's fine to keep the outputs like that?

I'd rather fix it. Outputs like that are very confusing and may lead to wrong assumptions. I'd fix it by flushing output of processes after they finalize and not on the go

Yeah, that's the only way to ensure we don't mix up outputs. The only downside is that users won't see progress of build/push in real-time, but I guess we have to make a tradeoff here.

I was thinking that maybe, we should create an "umbrella" property on provider like ecr or docker

Very good call. I propose to use ecr (AWS pointed to us once that ECR is not only for docker, in light of that docker doesn't seem as valid choice)

👍

What is the "recommended" way of handling errors from spawnExt? Should we try to catch them and provide a custom error message for user or just fail whenever process fails?

I think we should catch, and crash with meaningful ServerlessError. If we pass through original error, it'll be assumedThanks @medikoo for all great points raised

Should we try to remove ECR repository only if there are provider.images defined?

Great question. I would attempt to remove repository unconditionally as a whole only when we're removing the service via sls remove, still I would attempt it in a way in which it doesn't prolong time of sls remove when repository doesn't exist (I proposed how we can do it in one of the comments

I really like the proposed idea 👍

Do you think it's still worth attempting to re-tag the image in case of failure?

I think we may either:

  • Skip it, and do not attempt to search and remove unused images
  • Always be clean, and always attempt to leave clean slate (attempt to remove any not used images)

Second feels a bit aggressive, I think we may skip it for now, and eventually revisit if concern would be raised. Working with specific use case may give us better idea on what approach would be most optimal, so we may wait for that.

The only thing I'm worried about here is the potential leftover images generating extra costs for users. Storage in ECR is $0.1/GB/month, which can quickly add up to a significant sum after some time. But I like the idea of potentially revisiting the approach after we get feedback from the community. With that in mind, do you suggest to remove the logic that cleans up unused images after each deploy for now?

I don't think we can "fix" it other than enforcing serial build process. Do you think it's fine to keep the outputs like that?

I'd rather fix it. Outputs like that are very confusing and may lead to wrong assumptions. I'd fix it by flushing output of processes after they finalize and not on the go

Yeah, that's the only way to ensure we don't mix up outputs. The only downside is that users won't see progress of build/push in real-time, but I guess we have to make a tradeoff here.

I was thinking that maybe, we should create an "umbrella" property on provider like ecr or docker

Very good call. I propose to use ecr (AWS pointed to us once that ECR is not only for docker, in light of that docker doesn't seem as valid choice)

👍

What is the "recommended" way of handling errors from spawnExt? Should we try to catch them and provide a custom error message for user or just fail whenever process fails?

I think wThanks @medikoo for all great points raised

Should we try to remove ECR repository only if there are provider.images defined?

Great question. I would attempt to remove repository unconditionally as a whole only when we're removing the service via sls remove, still I would attempt it in a way in which it doesn't prolong time of sls remove when repository doesn't exist (I proposed how we can do it in one of the comments

I really like the proposed idea 👍

Do you think it's still worth attempting to re-tag the image in case of failure?

I think we may either:

  • Skip it, and do not attempt to search and remove unused images
  • Always be clean, and always attempt to leave clean slate (attempt to remove any not used images)

Second feels a bit aggressive, I think we may skip it for now, and eventually revisit if concern would be raised. Working with specific use case may give us better idea on what approach would be most optimal, so we may wait for that.

The only thing I'm worried about here is the potential leftover images generating extra costs for users. Storage in ECR is $0.1/GB/month, which can quickly add up to a significant sum after some time. But I like the idea of potentially revisiting the approach after we get feedback from the community. With that in mind, do you suggest to remove the logic that cleans up unused images after each deploy for now?

I don't think we can "fix" it other than enforcing serial build process. Do you think it's fine to keep the outputs like that?

I'd rather fix it. Outputs like that are very confusing and may lead to wrong assumptions. I'd fix it by flushing output of processes after they finalize and not on the go

Yeah, that's the only way to ensure we don't mix up outputs. The only downside is that users won't see progress of build/push in real-time, but I guess we have to make a tradeoff here.

I was thinking that maybe, we should create an "umbrella" property on provider like ecr or docker

Very good call. I propose to use ecr (AWS pointed to us once that ECR is not only for docker, in light of that docker doesn't seem as valid choice)

👍

What is the "recommended" way of handling errors from spawnExt? Should we try to catch them and provide a custom error message for user or just fail whenever process fails?

I think we should catch, and crash with meaningful ServerlessError. If we pass through original error, it'll be assumed as programmer error (so like 500), and will be presented to user with full stack trace (and such errors unconditionally are treated as bugs in our logic).

Makes perfect sense, I'll adjust it

Right now it will fail on build attepmt, but I was thinking that maybe we should fail faster in such case.

If from build fail we can easily diagnose error signals that docker is not installed. I would keep that, and just throw meaningful ServerlessError in response.

Will try to do that, thanks for a suggestion 👍

It'll be great to also test this on Windows. Do you have some access to Windows machine?

Yes, I've planned to try to test it out on Windows machine after wrapping up the development.

Thanks once again for all the answers and suggestions 🙇 e should catch, and crash with meaningful ServerlessError. If we pass through original error, it'll be assumed as programmer error (so like 500), and will be presented to user with full stack trace (and such errors unconditionally are treated as bugs in our logic).

Makes perfect sense, I'll adjust it

Right now it will fail on build attepmt, but I was thinking that maybe we should fail faster in such case.

If from build fail we can easily diagnose error signals that docker is not installed. I would keep that, and just throw meaningful ServerlessError in response.

Will try to do that, thanks for a suggestion 👍

It'll be great to also test this on Windows. Do you have some access to Windows machine?

Yes, I've planned to try to test it out on Windows machine after wrapping up the development.

Thanks once again for all the answers and suggestions 🙇 as programmer error (so like 500), and will be presented to user with full stack trace (and such errors unconditionally are treated as bugs in our logic).

Makes perfect sense, I'll adjust it

Right now it will fail on build attepmt, but I was thinking that maybe we should fail faster in such case.

If from build fail we can easily diagnose error signals that docker is not installed. I would keep that, and just throw meaningful ServerlessError in response.

Will try to do that, thanks for a suggestion 👍

It'll be great to also test this on Windows. Do you have some access to Windows machine?

Yes, I've planned to try to test it out on Windows machine after wrapping up the development.

Thanks once again for all the answers and suggestions 🙇

@medikoo
Copy link
Contributor

medikoo commented Jan 8, 2021

Storage in ECR is $0.1/GB/month,

I wasn't aware it's the case. In light of that as we already have cleanup logic, maybe let's keep it, and it could be fine to keep the image on failed deployment as you proposed, as I take it's still destined to be removed on cleanup operation with next (successful) deployment if it for some reason it will end as not used (?)

@pgrzesik
Copy link
Contributor Author

pgrzesik commented Jan 8, 2021

Storage in ECR is $0.1/GB/month,

I wasn't aware it's the case. In light of that as we already have cleanup logic, maybe let's keep it, and it could be fine to keep the image on failed deployment as you proposed, as I take it's still destined to be removed on cleanup operation with next (successful) deployment if it for some reason it will end as not used (?)

Yes, in current form, if the deploy will fail for some reason, the image will be kept, but during next deployment we will once again push the newest version and re-tag even if nothing changed for the image and the function will be configured with that latest version. The old ones won't have tags so they will be removed

async cleanupEcrRepository() {
// TODO: Add check if we should even try removing anything
const repositoryName = this.provider.naming.getEcrRepositoryName();
const registryId = await this.provider.getAccountId();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you see any issues with memoize'ing getAccountId call on provider? I'm using it a few times and I dont' want to incur extra API calls - I think it's safe but wanted to double check here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be totally safe. I do not envision scenario where creds change over the course of processing command.

Still there might be an issue if this command is invoked prematurely (e.g. before variables are resolved), but hopefully there won't be such case.

Also memoizing may influence tests. If there are issues, it might be needed to reset cache here:

// Ensure to reset cache for local serverless installation resolution
// Leaking it across test files may introduce wrong assumptions when result is used for testing
resolveLocalServerless.clear();
// Ensure to reset memoization on artifacts generation after each test file run.
// Reason is that homedir is automatically cleaned for each test,
// therefore eventually built custom resource file is also removed
_ensureArtifact.clear();
but I think, that in this case it should not be needed

@pgrzesik
Copy link
Contributor Author

In addition, I've testing it extensively on Windows machine, I managed to catch one extra bug there but otherwise it works without issues 👍

@pgrzesik pgrzesik force-pushed the add-support-for-docker-images branch 9 times, most recently from 2417185 to 2f7ae4c Compare January 13, 2021 10:06
@pgrzesik pgrzesik marked this pull request as ready for review January 13, 2021 10:20
@pgrzesik
Copy link
Contributor Author

@medikoo I believe it's ready to be reviewed again - I've added all missing pieces, tested it both on Windows and Linux (including regression with just functions), all feedback will be much appreciated 🙇

Copy link
Contributor

@medikoo medikoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @pgrzesik that's outstanding!

Concerning implentation I have just some nitpicks, otherwise I think it'll be nice to improve tests organization and we should be good to go

}

// We should only keep the images that are tagged with imageNames that are defined in service
const imageTagsDefinedInService = Object.keys(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: It'll be nicer to have it as Set instance

images: {
type: 'object',
patternProperties: {
'^[a-z]{1,32}$': {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume it's case insensitve. For the sake of clarity, I would probably add A-Z also I think we should support numbers and - (just not as fist chars), what do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's in fact case sensitive and you cannot use uppercase. I totally agree that we can use numbers and additional chars though 👍

{ $ref: '#/definitions/ecrImageUri' },
{
type: 'string',
pattern: '^[a-z]{1,32}$',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'll be nice to keep this pattern in variable, as e.g. we do here:

const functionNamePattern = '^[a-zA-Z0-9-_]+$';

});
const { authorizationToken, proxyEndpoint } = result.authorizationData[0];
const decodedAuthToken = Buffer.from(authorizationToken, 'base64').toString().split(':')[1];
const dockerArgs = ['login', '--username', 'AWS', '--password-stdin', proxyEndpoint];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think docker has an option to simply pass password as an argument (--password ?).

It'll probably be cleaner to use that, instead of playing with stdin

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first I went with --password, however, Docker complains that it's insecure to use it --password directly and suggests to use --password-stdin instead. See message below:

WARNING! Using --password via the CLI is insecure. Use --password-stdin.

In general, even if password-stdin is used, there's still another warning:

WARNING! Your password will be stored unencrypted in /home/pgrzesik/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

But we cannot really do anything meaningful about that one if we want to automatically login people to docker.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think here? Should we go back to --password or keep the current approach (though I agree that it's so-so with stdin interaction)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's just a printed warning when --password is used, and not a deprecation of any kind, I would switch to regular param input

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed 👍

const pathToDockerfile = path.join(imagePath, 'Dockerfile');

try {
const stat = await fs.promises.stat(pathToDockerfile);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Convention is to name stats the result of fs.stat

);
}

const [{ repositoryUri, repositoryName }, ,] = await Promise.all([
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I guess trailing commas are not needed

},
})
).to.be.rejectedWith(
'Either "uri" or "path" property needs to be set on image "invalidimage"'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's rely on codes, as e.g. here:

).to.eventually.be.rejected.and.have.property(
'code',
'EXTERNAL_HTTP_API_AUTHORIZER_WITHOUT_EXTERNAL_HTTP_API'

(Human readable message are subject to typo's, grammar improvements)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call 👍

@@ -1756,6 +1756,363 @@ describe('lib/plugins/aws/package/compile/functions/index.test.js', () => {
expect(fooFunctionRole).to.equal(fooFunctionConfig.role);
});
});

it('should fail if `functions[].image` references image without path and uri', async () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't those tests duplicate of ones in deployFunction.test.js ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct, just tested in different use case scenario - do you feel like these should be just moved to provider.test.js?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved these to provider.test.js from both deployFunction.test.js and functions.test.js

},
};

it('should fail if `functions[].image` references image without path and uri', async () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think those tests should be in provider.test.js where we confirm custom validation of properties.

(in general I would test functionalities in test files which correspond to files in which they were implemented)

Also I guess we can use package command

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you go about ensuring that the resolution happens correctly also for deploy function -f functionName? While the functionality is in provider.js, both deploy and deploy function use it a bit differently.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good point (and I think in tons of other places we forgot about packaging that's implied by sls deploy function.

I think we should probably ensure one sanity test (happy path) for both sls package (implied by sls deploy) and sls deploy function, but then test specific scenarios on just one of them.

Otherwise ideally it would be to have this functionality implemented independently of Serverless instance (it could take a imageName and imagePath), and then we can do unit tests against that uitlity, and with runServerless just confirm happy path for sls package and sls deploy function.

Still I guess it's not possible until we have #8445 done (and it's likely that some other things will stand in a way)

(e.g with refactor I'm working on, I'm secluding all those bricks into independent utils, that makes right unit testing easier)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like that suggestion and that's what I went with - happy path cases in both functions.test.js and deployFunction.test.js and full coverage in provider.test.js. I agree that ideally, I would love to have image resolution implemented as independent util, but at the moment it's hard as there are direct dependencies on provider instance (the biggest one on request that you mentioned). But I know we're on a good path to better modularization 🙌

@pgrzesik pgrzesik force-pushed the add-support-for-docker-images branch from 4fde59f to fd1cdf7 Compare January 13, 2021 21:43
Copy link
Contributor

@medikoo medikoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @pgrzesik ! Looks great. I have just two last questions to clarify, and we're good to go

lib/plugins/aws/provider.js Show resolved Hide resolved
];
try {
const { stdBuffer } = await spawnExt('docker', dockerArgs);
process.stdout.write(stdBuffer);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the warning that we use --password now visible in a console?

If it's the case I would probably not show output of this command by default (what's more important is image operations actions, where it's simple login)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's visible in the console, we could hide it but I'm not sure if we should hide information that we're storing the credentials in an unencrypted way locally. That's what is showing right now from docker login

WARNING! Using --password via the CLI is insecure. Use --password-stdin.
WARNING! Your password will be stored unencrypted in /home/pgrzesik/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.. I think it's confusing for user. It's not user that made a choice of how password is transferred, and visible warning may imply to him that something is not right.

I would hide this output and instead show a visible notice:

Logged Docker successfully into AWS ECR.

Note: that authentication token is stored as unencrypted  in /home/pgrzesik/.docker/config.json.

Second line I'd put in orange as we do with other warnings. I believe we should also add a note about that into doc.

Also this triggers another question, if user will configure a credential helper that's mentioned in docs, can we somehow omit the login step? (we may ofc leave it for other PR)

Copy link
Contributor Author

@pgrzesik pgrzesik Jan 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About displaying that note: it might be tricky as the outputs vary between operating systems (as well as the location of .docker/config.json) - for example, on Windows machine I didn't get that warning. As for the credential helpers, I believe we still have to perform login for two reasons:

  1. Tokens obtained from AWS are short-lived and there's no information about their validity stored (we could try to fallback to login only if needed (when push operation is denied e.g.), but that still requires logging in)
  2. Even if credential helper is used (afaik), it's only about where the tokens are stored, e.g. in some kind of encrypted file instead of plaintext as in default, we still have to login, but it's just up to docker to properly store that token during login

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I'm thinking about it more - maybe we shouldn't start with docker login, but rather only issue it if docker push command fails due to incorrect credentials?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also https://github.com/awslabs/amazon-ecr-credential-helper utility that would make docker login step redundant in case someone uses it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but rather only issue it if docker push command fails due to incorrect credentials?

Good idea! Going that road will definitely be nicer. At least user will have a choice and be able to setup authentication via own means if that's preferred.

but if for some reason recognition of authentication error is tricky I would skip it for now.

About displaying that note: it might be tricky as the outputs vary between operating systems

I would probably then search for a phrase stored unencrypted in <somePath>, and show our note only if we find it in output, otherwise not show it at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've went for this type of output (warning with orange colour):

Serverless: Login to Docker succeeded!
Serverless: WARNING: Docker authentication token will be stored unencrypted. Configure Docker credential helper to remove this warning.
The push refers to repository [600238737408.dkr.ecr.us-east-1.amazonaws.com/serverless-someproject-dev]
bdfa6eb032c6: Preparing
61b3904fa195: Preparing
6a7f9b3693e2: Preparing

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, I would probably add ..will be stored unencrypted in docker config, so user is not confused where it's stored in general

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call 👍

@pgrzesik pgrzesik force-pushed the add-support-for-docker-images branch 3 times, most recently from 24ccb59 to da640c3 Compare January 14, 2021 15:20
@pgrzesik pgrzesik requested a review from medikoo January 14, 2021 15:31
Copy link
Contributor

@medikoo medikoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pgrzesik I have just one style improvement proposal. Let me know

Comment on lines 1888 to 1912
try {
({ stdBuffer: pushDockerStdBuffer } = await spawnExt('docker', pushDockerArgs));
process.stdout.write(pushDockerStdBuffer);
} catch (outerPushErr) {
let errToThrow = outerPushErr;
if (
outerPushErr.stdBuffer &&
outerPushErr.stdBuffer.includes('no basic auth credentials')
) {
errToThrow = null;
await this.dockerLoginToEcr();
try {
({ stdBuffer: pushDockerStdBuffer } = await spawnExt('docker', pushDockerArgs));
process.stdout.write(pushDockerStdBuffer);
} catch (innerPushError) {
errToThrow = innerPushError;
}
}
if (errToThrow) {
throw new this.serverless.classes.Error(
`Encountered error during executing: docker ${pushDockerArgs}\nOutput of the command:\n${errToThrow.stdBuffer}`,
'DOCKER_PUSH_ERROR'
);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'll be nicer to configure with more functional than imperative approach. Something as:

const pushTag = (args, options = {}) => {
  try {
    spawn('push tag', args);
  } catch (error) {
    if (!options.isLoggedIn && notLoggedInError) {
      login();
      return pushTag(args, { isLoggedIn: true })
    }
    throw error;
  }
}

pushTag(args);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great idea - changed

@pgrzesik pgrzesik force-pushed the add-support-for-docker-images branch from e2a5fab to a4b317a Compare January 15, 2021 09:31
Copy link
Contributor

@medikoo medikoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pgrzesik Great job!

Last thing that was holding us, was random crashes we observed in coverage build run against Node.js v10.

I've attempted to further investigate it, and so far outcome is as follows:

It looks that the orphaned crash happens either on fs.promises.open, fs.readFile or fs.open call. As it's only those functions (with fs.promises.readFile) which result with such error with no stack trace (Interestingly callback taking ones, sometimes expose stack trace and sometimes not).
I've excluded the fs.promises.readFile from that list, as having it patched I was able to reproduce the error without info in error message: https://github.com/serverless/serverless/runs/1707179867

At some points I've attempted to patch fs.readFileSync (although that was a false call, as those always produce stack trace), and after repeated runs got the error, but this time it was OOM error: https://github.com/serverless/serverless/runs/1708122362

This signals to me, that it's likely we're nearing end of memory and it causes the condition which produces the error we're occasionally seing

I had no further success in reproducing the error, when patching fs.promises.open, fs.promises.readFile, fs.readFile and fs.open. I've restarted build countless times and it always was successful.
It was during the weekend, so I wonder whether traffic on GitHub Actions servers doesn't influence some more restrictive memory usage which then makes error more likely.

I propose to merge it now, and if we observe the issue again, I'd commit-in the patch for fs methods to have better stack trace if error happens again. Then having it repeated again we'll decide on further steps.

At this point it looks as condition related to tight memory usage, which may no longer be likely (as e.g. we reverted AJV to v6 which is less consuming)

@pgrzesik
Copy link
Contributor Author

Thanks @medikoo for extremely detailed report of your investigation and all the time you spend researching it 🙇 Your suggestion about running out of memory makes a lot of sense as it was only visible on constrained CI environment, where locally I've never encountered it.

@brunocasado
Copy link

this doesn't works....

Your Environment Information ---------------------------
Operating System: linux
Node Version: 14.15.4
Framework Version: 2.15.0 (local)
Plugin Version: 4.2.0
SDK Version: 2.3.2
Components Version: 3.4.3

Serverless: Configuration warning:
Serverless: at 'functions.calculateBussofane.image': should match pattern "^\d+.dkr.ecr.[a-z0-9-]+..amazonaws.com/[^@]+@sha256:[a-f0-9]{64}$"
Serverless: at 'provider': unrecognized property 'ecr'

What i'm doing wrong?

@pgrzesik
Copy link
Contributor Author

pgrzesik commented Mar 1, 2021

Hello @brunocasado 👋 I believe you need to update your serverless version - according to your listing you're using 2.15.0 while support for building images was released with 2.20.0 - please refer to the changelog: https://github.com/serverless/serverless/blob/master/CHANGELOG.md#2200-2021-01-21

@brunocasado
Copy link

Hello @brunocasado 👋 I believe you need to update your serverless version - according to your listing you're using 2.15.0 while support for building images was released with 2.20.0 - please refer to the changelog: https://github.com/serverless/serverless/blob/master/CHANGELOG.md#2200-2021-01-21

omg i got.

sorry, now i realized that i did an update only in the global serverless and not inside the application.

@pgrzesik
Copy link
Contributor Author

pgrzesik commented Mar 1, 2021

No worries @brunocasado - hope it works correctly after upgrade 💯

@brunocasado
Copy link

No worries @brunocasado - hope it works correctly after upgrade 💯

thank you!

It's working!

My opinion: This works much better than zip method. My problem with layer + python requirements ( size problem) is solved. :)

@JacobWithACapitalJ
Copy link

Hi, are there any docs/ tutorials for this? I'm struggling to find anything that explains using this new feature. Thanks!

@pgrzesik
Copy link
Contributor Author

Hello @JacobWithACapitalJ 👋 We have a blog post that includes a walkthrough: https://www.serverless.com/blog/container-support-for-lambda/

@paulacampigotto
Copy link

@pgrzesik is it possible to create an ECR and upload a local image ? just like it's done with lambdas, but with no lambda? I would like to create an ECR and use it's image in a ECS's task definition, creating the image in the same serverless.yml file.

@pgrzesik
Copy link
Contributor Author

Hello @paulacampigotto - at the moment such functionality is not supported, building and uploading image is triggered only if the image definition is used for at least one function.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AWS Lambda: Auto create and push docker images to ECR repository
5 participants