feat(AWS Lambda): Add support for building Docker images #8725

pgrzesik · 2021-01-07T15:12:03Z

This is still WIP, but I wanted to consult a few smaller things before cleaning everything up, providing proper test coverage and docs.

A few questions:

Is it dangerous to try to remove ECR repository at all times? I don't think so because we have a pretty unique name, however, we cannot be certain if someone never used Docker support or maybe someone, in the meantime, removed all defined images from their serverless.yml. What do you think here? Should we try to remove ECR repository only if there are provider.images defined?
I was pondering with the idea of re-tagging images in case of deploy failure as it's a little bit messy from implementation standpoint and I'm wondering if we even need that. We're always pinning configuration to specific imageSha - if deploy fails, we won't trigger cleanupEcrRepository, which means the image won't be deleted and Lambdas won't be affected. There will be an untagged + tagged image that is not actually used in the repository, but it's going to be fixed automatically after next successful deployment, so unless someone purposefully removes the image from ECR, there shouldn't be any issues. What is your opinion on that? Do you think it's still worth attempting to re-tag the image in case of failure?
Currently, while building multiple images and uploading them to ECR, the messages are "mixed', meaning you'll see the output from two docker build and docker push processes displayed simultaneously. I don't think we can "fix" it other than enforcing serial build process. Do you think it's fine to keep the outputs like that?
I was thinking about potential improvements for the future e.g. allowing people to provide their own ECR repository as configuration. As of right now, we would have to introduce extra property on provider, in addition to images. I was thinking that maybe, we should create an "umbrella" property on provider like ecr or docker or something along those lines that will hold all related configurations (provider.docker.images, provider.docker.repository, etc)? We might get additional ones in the future (e.g. setting to keep old images instead of removing them). What do you think?
What is the "recommended" way of handling errors from spawnExt? Should we try to catch them and provide a custom error message for user or just fail whenever process fails?
Should we inspect if docker is installed on the machine before continuing further? Right now it will fail on build attepmt, but I was thinking that maybe we should fail faster in such case.

Closes: #8622

codecov · 2021-01-07T15:17:13Z

Codecov Report

Merging #8725 (a4b317a) into master (0f7b31b) will increase coverage by 0.13%.
The diff coverage is 98.41%.

@@            Coverage Diff             @@
##           master    #8725      +/-   ##
==========================================
+ Coverage   87.59%   87.72%   +0.13%     
==========================================
  Files         259      262       +3     
  Lines        9640     9763     +123     
==========================================
+ Hits         8444     8565     +121     
- Misses       1196     1198       +2

Impacted Files	Coverage Δ
lib/plugins/aws/lib/checkIfEcrRepositoryExists.js	`88.88% <88.88%> (ø)`
lib/plugins/aws/provider.js	`94.50% <98.88%> (+1.41%)`	⬆️
lib/plugins/aws/deploy/index.js	`96.36% <100.00%> (-0.19%)`	⬇️
lib/plugins/aws/deployFunction.js	`95.86% <100.00%> (+0.86%)`	⬆️
lib/plugins/aws/lib/naming.js	`98.22% <100.00%> (+0.03%)`	⬆️
lib/plugins/aws/package/compile/functions.js	`96.02% <100.00%> (-0.17%)`	⬇️
lib/plugins/aws/remove/index.js	`100.00% <100.00%> (ø)`
lib/plugins/aws/remove/lib/ecr.js	`100.00% <100.00%> (ø)`
lib/classes/ConfigSchemaHandler/index.js	`91.35% <0.00%> (-0.06%)`	⬇️
...classes/ConfigSchemaHandler/resolveDataPathSize.js	`91.30% <0.00%> (ø)`
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0f7b31b...a4b317a. Read the comment docs.

medikoo

Thanks @pgrzesik it looks really good, and I think not much is left here. See my comments

Should we try to remove ECR repository only if there are provider.images defined?

Great question. I would attempt to remove repository unconditionally as a whole only when we're removing the service via sls remove, still I would attempt it in a way in which it doesn't prolong time of sls remove when repository doesn't exist (I proposed how we can do it in one of the comments

Do you think it's still worth attempting to re-tag the image in case of failure?

I think we may either:

Skip it, and do not attempt to search and remove unused images
Always be clean, and always attempt to leave clean slate (attempt to remove any not used images)

Second feels a bit aggressive, I think we may skip it for now, and eventually revisit if concern would be raised. Working with specific use case may give us better idea on what approach would be most optimal, so we may wait for that.

I don't think we can "fix" it other than enforcing serial build process. Do you think it's fine to keep the outputs like that?

I'd rather fix it. Outputs like that are very confusing and may lead to wrong assumptions. I'd fix it by flushing output of processes after they finalize and not on the go

I was thinking that maybe, we should create an "umbrella" property on provider like ecr or docker

Very good call. I propose to use ecr (AWS pointed to us once that ECR is not only for docker, in light of that docker doesn't seem as valid choice)

What is the "recommended" way of handling errors from spawnExt? Should we try to catch them and provide a custom error message for user or just fail whenever process fails?

I think we should catch, and crash with meaningful ServerlessError. If we pass through original error, it'll be assumed as programmer error (so like 500), and will be presented to user with full stack trace (and such errors unconditionally are treated as bugs in our logic).

Right now it will fail on build attepmt, but I was thinking that maybe we should fail faster in such case.

If from build fail we can easily diagnose error signals that docker is not installed. I would keep that, and just throw meaningful ServerlessError in response.

It'll be great to also test this on Windows. Do you have some access to Windows machine?

medikoo · 2021-01-07T21:13:58Z

lib/plugins/aws/provider.js

+                // TODO: VALIDATE THAT LENGTH
+                '^[a-z]{1,64}$': {
+                  oneOf: [
+                    // TODO: Combine into one?


I think it's best to stick to one option, and validate exclusivity of path and uri manually.

It's because in next PR will introdoce common options that apply to all scenarios and we should not duplicate their definitions. Other reason is that it'll make validation errors more human
friendly, anyOf fail in such case could be reported just with provider.images.foo format is not supported which doesn't inform what properties didn't validate

Otherwise (as a side note), let's always rely on anyOf (not oneOf)

With oneOf AJV always validates all paths (if more that one passes validation fails), and applies coercion if needed in each case, which may malform final value.

With anyOf AJV stops at first passing option

Thanks, that's a great advice, I'll change it

medikoo · 2021-01-07T21:14:57Z

lib/plugins/aws/provider.js

-              type: 'string',
-              pattern:
-                '^\\d+\\.dkr\\.ecr\\.[a-z0-9-]+..amazonaws.com\\/([^@]+)|([^@:]+@sha256:[a-f0-9]{64})$',
+              oneOf: [


Let's use anyOf

medikoo · 2021-01-07T21:15:47Z

lib/plugins/aws/provider.js

@@ -1698,6 +1744,170 @@ class AwsProvider {
      return this.getStackResources(res.NextToken, allResources);
    });
  }
+
+  getEcrRepositoryName() {


We have a dedicated naming.js module for such methods

medikoo · 2021-01-07T21:20:04Z

lib/plugins/aws/provider.js

+          if (err.providerError && err.providerError.code === 'RepositoryNotFoundException') {
+            // TODO: CONSIDER MORE PARAMS FOR REPO CREATION?
+            const result = await this.request('ECR', 'createRepository', {
+              repositoryName,
+            });
+            repositoryUri = result.repository.repositoryUri;
+          } else {
+            throw err;


Style hint.

It usually feels more concise to construct flow as:

if (!x) throw error; ... process ...

Instead of

if (x) { ... process ... } else { throw error; }

medikoo · 2021-01-07T21:22:29Z

lib/plugins/aws/provider.js

+        const { repositoryUri, repositoryName } = await this.getOrCreateEcrRepository();
+
+        await this.dockerLoginToEcr();


Those two can probably be issued in parallel

Are you talking about getOrCreateEcrRepository and dockerLoginToEcr? I think it's not safe to issue them in parallel as if the repo doesn't exist, we have to create it before trying to use dockerLoginToEcr

Hmm.. but aren't we logging in to ECR in general, and not to specific repo?

Or do you mean that if there's no single repo at given region AWS would reject login?

You're totally right, please disregard my previous comment

lib/plugins/aws/provider.js

medikoo · 2021-01-07T21:31:29Z

lib/plugins/aws/remove/lib/ecr.js

+    if (err.providerError && err.providerError.code === 'RepositoryNotFoundException') {
+      // Pass silently
+    } else {
+      throw err;


I wonder whether it's fine to unconditionally throw here.

e.g. what if user works in region in which AWS ECR is not supported at all (I'm not sure if it can be the case, but if it would we will throw at every removal in such region.

I did not consider that situation, I'll investigate if such a thing can happen

I tried to investigate it but I didn't find a single mention of ECR not being available in a specific region - do you envision any other potential issues with unconditionally throwing here @medikoo ?

I saw you already updated to issue it only if it's confirmed that repository exists. In such case I think it's totally safe to run it unconditionally

That's correct, however, checking for repository existence has similar check at the moment, which can trigger (potentially) the same issue

Yes, I just saw that. I think we can leave it for now, and eventually we'll react if anything is reported.

I'm not sure how it'll behave in e.g. Chinese regions, which are not accessible if you do not have registered business in China

I was thinking about China specifically, but if you don't have access there, how all other API calls via SDK work (or not)?

I was referring more to case where someone has access but e.g. given AWS service is not provided in the region (I can imagine that china regions could be limited to US ones).

Understood 👍 I was actually researching availability of ECR service but didn't find a single region where it's not available, including China regions: https://www.amazonaws.cn/en/about-aws/regional-product-services/

medikoo · 2021-01-07T21:33:46Z

lib/plugins/aws/remove/index.js

+        await this.validate();
+        await this.emptyS3Bucket();
+        const cfData = await this.removeStack();
+        await this.monitorStack('delete', cfData);
+        await this.removeEcrRepository();


It would be nice to construct this in a way, that we have two methods. First checkIfEcRepositoryExist, which we should issue at beginning but do not await,
and await after monitorStack succeds, and if it returns true proceed with removeEcrRepository()

This will ensure we do not add time to remove command for services where ECR is not involved

That's a clever way to skip adding extra time for people that don't use docker at all, I'll update it

medikoo · 2021-01-07T21:35:12Z

test/unit/lib/plugins/aws/package/compile/functions.test.js

            },
            fnImageWithTag: {
              image: '000000000000.dkr.ecr.sa-east-1.amazonaws.com/test-lambda-docker:stable',
            },
+            fnImageWithExplicitUri: {


Let's maybe prefix name with fnProviderImage to differentiate those which have image config inline and those which refer to provider.images

pgrzesik · 2021-01-08T09:31:59Z

Thanks @medikoo for all great points raised

Should we try to remove ECR repository only if there are provider.images defined?

Great question. I would attempt to remove repository unconditionally as a whole only when we're removing the service via sls remove, still I would attempt it in a way in which it doesn't prolong time of sls remove when repository doesn't exist (I proposed how we can do it in one of the comments

I really like the proposed idea 👍

Do you think it's still worth attempting to re-tag the image in case of failure?

I think we may either:

Skip it, and do not attempt to search and remove unused images

Always be clean, and always attempt to leave clean slate (attempt to remove any not used images)

Second feels a bit aggressive, I think we may skip it for now, and eventually revisit if concern would be raised. Working with specific use case may give us better idea on what approach would be most optimal, so we may wait for that.

The only thing I'm worried about here is the potential leftover images generating extra costs for users. Storage in ECR is $0.1/GB/month, which can quickly add up to a significant sum after some time. But I like the idea of potentially revisiting the approach after we get feedback from the community. With that in mind, do you suggest to remove the logic that cleans up unused images after each deploy for now?

I don't think we can "fix" it other than enforcing serial build process. Do you think it's fine to keep the outputs like that?

I'd rather fix it. Outputs like that are very confusing and may lead to wrong assumptions. I'd fix it by flushing output of processes after they finalize and not on the go

Yeah, that's the only way to ensure we don't mix up outputs. The only downside is that users won't see progress of build/push in real-time, but I guess we have to make a tradeoff here.

I was thinking that maybe, we should create an "umbrella" property on provider like ecr or docker

Very good call. I propose to use ecr (AWS pointed to us once that ECR is not only for docker, in light of that docker doesn't seem as valid choice)

👍

What is the "recommended" way of handling errors from spawnExt? Should we try to catch them and provide a custom error message for user or just fail whenever process fails?

I think we should catch, and crash with meaningful ServerlessError. If we pass through original error, it'll be assumedThanks @medikoo for all great points raised

Should we try to remove ECR repository only if there are provider.images defined?

Great question. I would attempt to remove repository unconditionally as a whole only when we're removing the service via sls remove, still I would attempt it in a way in which it doesn't prolong time of sls remove when repository doesn't exist (I proposed how we can do it in one of the comments

I really like the proposed idea 👍

Do you think it's still worth attempting to re-tag the image in case of failure?

I think we may either:

Skip it, and do not attempt to search and remove unused images

Always be clean, and always attempt to leave clean slate (attempt to remove any not used images)

Second feels a bit aggressive, I think we may skip it for now, and eventually revisit if concern would be raised. Working with specific use case may give us better idea on what approach would be most optimal, so we may wait for that.

The only thing I'm worried about here is the potential leftover images generating extra costs for users. Storage in ECR is $0.1/GB/month, which can quickly add up to a significant sum after some time. But I like the idea of potentially revisiting the approach after we get feedback from the community. With that in mind, do you suggest to remove the logic that cleans up unused images after each deploy for now?

I don't think we can "fix" it other than enforcing serial build process. Do you think it's fine to keep the outputs like that?

I'd rather fix it. Outputs like that are very confusing and may lead to wrong assumptions. I'd fix it by flushing output of processes after they finalize and not on the go

Yeah, that's the only way to ensure we don't mix up outputs. The only downside is that users won't see progress of build/push in real-time, but I guess we have to make a tradeoff here.

I was thinking that maybe, we should create an "umbrella" property on provider like ecr or docker

Very good call. I propose to use ecr (AWS pointed to us once that ECR is not only for docker, in light of that docker doesn't seem as valid choice)

👍

What is the "recommended" way of handling errors from spawnExt? Should we try to catch them and provide a custom error message for user or just fail whenever process fails?

I think wThanks @medikoo for all great points raised

Should we try to remove ECR repository only if there are provider.images defined?

Great question. I would attempt to remove repository unconditionally as a whole only when we're removing the service via sls remove, still I would attempt it in a way in which it doesn't prolong time of sls remove when repository doesn't exist (I proposed how we can do it in one of the comments

I really like the proposed idea 👍

Do you think it's still worth attempting to re-tag the image in case of failure?

I think we may either:

Skip it, and do not attempt to search and remove unused images

Always be clean, and always attempt to leave clean slate (attempt to remove any not used images)

Second feels a bit aggressive, I think we may skip it for now, and eventually revisit if concern would be raised. Working with specific use case may give us better idea on what approach would be most optimal, so we may wait for that.

The only thing I'm worried about here is the potential leftover images generating extra costs for users. Storage in ECR is $0.1/GB/month, which can quickly add up to a significant sum after some time. But I like the idea of potentially revisiting the approach after we get feedback from the community. With that in mind, do you suggest to remove the logic that cleans up unused images after each deploy for now?

I don't think we can "fix" it other than enforcing serial build process. Do you think it's fine to keep the outputs like that?

I'd rather fix it. Outputs like that are very confusing and may lead to wrong assumptions. I'd fix it by flushing output of processes after they finalize and not on the go

Yeah, that's the only way to ensure we don't mix up outputs. The only downside is that users won't see progress of build/push in real-time, but I guess we have to make a tradeoff here.

I was thinking that maybe, we should create an "umbrella" property on provider like ecr or docker

Very good call. I propose to use ecr (AWS pointed to us once that ECR is not only for docker, in light of that docker doesn't seem as valid choice)

👍

What is the "recommended" way of handling errors from spawnExt? Should we try to catch them and provide a custom error message for user or just fail whenever process fails?

I think we should catch, and crash with meaningful ServerlessError. If we pass through original error, it'll be assumed as programmer error (so like 500), and will be presented to user with full stack trace (and such errors unconditionally are treated as bugs in our logic).

Makes perfect sense, I'll adjust it

Right now it will fail on build attepmt, but I was thinking that maybe we should fail faster in such case.

If from build fail we can easily diagnose error signals that docker is not installed. I would keep that, and just throw meaningful ServerlessError in response.

Will try to do that, thanks for a suggestion 👍

It'll be great to also test this on Windows. Do you have some access to Windows machine?

Yes, I've planned to try to test it out on Windows machine after wrapping up the development.

Thanks once again for all the answers and suggestions 🙇 e should catch, and crash with meaningful ServerlessError. If we pass through original error, it'll be assumed as programmer error (so like 500), and will be presented to user with full stack trace (and such errors unconditionally are treated as bugs in our logic).

Makes perfect sense, I'll adjust it

Right now it will fail on build attepmt, but I was thinking that maybe we should fail faster in such case.

If from build fail we can easily diagnose error signals that docker is not installed. I would keep that, and just throw meaningful ServerlessError in response.

Will try to do that, thanks for a suggestion 👍

It'll be great to also test this on Windows. Do you have some access to Windows machine?

Yes, I've planned to try to test it out on Windows machine after wrapping up the development.

Thanks once again for all the answers and suggestions 🙇 as programmer error (so like 500), and will be presented to user with full stack trace (and such errors unconditionally are treated as bugs in our logic).

Makes perfect sense, I'll adjust it

Right now it will fail on build attepmt, but I was thinking that maybe we should fail faster in such case.

If from build fail we can easily diagnose error signals that docker is not installed. I would keep that, and just throw meaningful ServerlessError in response.

Will try to do that, thanks for a suggestion 👍

It'll be great to also test this on Windows. Do you have some access to Windows machine?

Yes, I've planned to try to test it out on Windows machine after wrapping up the development.

Thanks once again for all the answers and suggestions 🙇

medikoo · 2021-01-08T13:11:21Z

Storage in ECR is $0.1/GB/month,

I wasn't aware it's the case. In light of that as we already have cleanup logic, maybe let's keep it, and it could be fine to keep the image on failed deployment as you proposed, as I take it's still destined to be removed on cleanup operation with next (successful) deployment if it for some reason it will end as not used (?)

pgrzesik · 2021-01-08T13:20:56Z

Storage in ECR is $0.1/GB/month,

I wasn't aware it's the case. In light of that as we already have cleanup logic, maybe let's keep it, and it could be fine to keep the image on failed deployment as you proposed, as I take it's still destined to be removed on cleanup operation with next (successful) deployment if it for some reason it will end as not used (?)

Yes, in current form, if the deploy will fail for some reason, the image will be kept, but during next deployment we will once again push the newest version and re-tag even if nothing changed for the image and the function will be configured with that latest version. The old ones won't have tags so they will be removed

pgrzesik · 2021-01-11T07:45:11Z

lib/plugins/aws/deploy/lib/cleanupEcrRepository.js

+  async cleanupEcrRepository() {
+    // TODO: Add check if we should even try removing anything
+    const repositoryName = this.provider.naming.getEcrRepositoryName();
+    const registryId = await this.provider.getAccountId();


Do you see any issues with memoize'ing getAccountId call on provider? I'm using it a few times and I dont' want to incur extra API calls - I think it's safe but wanted to double check here

I think it should be totally safe. I do not envision scenario where creds change over the course of processing command.

Still there might be an issue if this command is invoked prematurely (e.g. before variables are resolved), but hopefully there won't be such case.

Also memoizing may influence tests. If there are issues, it might be needed to reset cache here:

serverless/test/mochaPatch.js

Lines 22 to 28 in b4fef7d

// Ensure to reset cache for local serverless installation resolution

// Leaking it across test files may introduce wrong assumptions when result is used for testing

resolveLocalServerless.clear();

// Ensure to reset memoization on artifacts generation after each test file run.

// Reason is that homedir is automatically cleaned for each test,

// therefore eventually built custom resource file is also removed

_ensureArtifact.clear();

but I think, that in this case it should not be needed

test/unit/lib/plugins/aws/deploy/index.test.js

test/unit/lib/plugins/aws/remove/index.test.js

test/unit/lib/plugins/aws/package/compile/functions.test.js

pgrzesik · 2021-01-11T07:51:16Z

In addition, I've testing it extensively on Windows machine, I managed to catch one extra bug there but otherwise it works without issues 👍

pgrzesik · 2021-01-13T10:22:17Z

@medikoo I believe it's ready to be reviewed again - I've added all missing pieces, tested it both on Windows and Linux (including regression with just functions), all feedback will be much appreciated 🙇

medikoo

Thank you @pgrzesik that's outstanding!

Concerning implentation I have just some nitpicks, otherwise I think it'll be nice to improve tests organization and we should be good to go

medikoo · 2021-01-13T14:02:29Z

lib/plugins/aws/deploy/lib/cleanupEcrRepository.js

+    }
+
+    // We should only keep the images that are tagged with imageNames that are defined in service
+    const imageTagsDefinedInService = Object.keys(


nit: It'll be nicer to have it as Set instance

medikoo · 2021-01-13T14:11:08Z

lib/plugins/aws/provider.js

+                images: {
+                  type: 'object',
+                  patternProperties: {
+                    '^[a-z]{1,32}$': {


I assume it's case insensitve. For the sake of clarity, I would probably add A-Z also I think we should support numbers and - (just not as fist chars), what do you think?

it's in fact case sensitive and you cannot use uppercase. I totally agree that we can use numbers and additional chars though 👍

medikoo · 2021-01-13T14:12:20Z

lib/plugins/aws/provider.js

+                { $ref: '#/definitions/ecrImageUri' },
+                {
+                  type: 'string',
+                  pattern: '^[a-z]{1,32}$',


It'll be nice to keep this pattern in variable, as e.g. we do here:

serverless/lib/configSchema.js

Line 3 in 30015ea

const functionNamePattern = '^[a-zA-Z0-9-_]+$';

medikoo · 2021-01-13T14:13:59Z

lib/plugins/aws/provider.js

+        });
+        const { authorizationToken, proxyEndpoint } = result.authorizationData[0];
+        const decodedAuthToken = Buffer.from(authorizationToken, 'base64').toString().split(':')[1];
+        const dockerArgs = ['login', '--username', 'AWS', '--password-stdin', proxyEndpoint];


I think docker has an option to simply pass password as an argument (--password ?).

It'll probably be cleaner to use that, instead of playing with stdin

At first I went with --password, however, Docker complains that it's insecure to use it --password directly and suggests to use --password-stdin instead. See message below:

WARNING! Using --password via the CLI is insecure. Use --password-stdin.

In general, even if password-stdin is used, there's still another warning:

WARNING! Your password will be stored unencrypted in /home/pgrzesik/.docker/config.json. Configure a credential helper to remove this warning. See https://docs.docker.com/engine/reference/commandline/login/#credentials-store

But we cannot really do anything meaningful about that one if we want to automatically login people to docker.

What do you think here? Should we go back to --password or keep the current approach (though I agree that it's so-so with stdin interaction)

If it's just a printed warning when --password is used, and not a deprecation of any kind, I would switch to regular param input

Changed 👍

medikoo · 2021-01-13T14:17:01Z

lib/plugins/aws/provider.js

+        const pathToDockerfile = path.join(imagePath, 'Dockerfile');
+
+        try {
+          const stat = await fs.promises.stat(pathToDockerfile);


nit: Convention is to name stats the result of fs.stat

medikoo · 2021-01-13T14:18:20Z

lib/plugins/aws/provider.js

+          );
+        }
+
+        const [{ repositoryUri, repositoryName }, ,] = await Promise.all([


nit: I guess trailing commas are not needed

medikoo · 2021-01-13T14:28:47Z

test/unit/lib/plugins/aws/deployFunction.test.js

+          },
+        })
+      ).to.be.rejectedWith(
+        'Either "uri" or "path" property needs to be set on image "invalidimage"'


Let's rely on codes, as e.g. here:

serverless/test/unit/lib/plugins/aws/package/compile/events/httpApi.test.js

Lines 456 to 458 in 30015ea

).to.eventually.be.rejected.and.have.property(

'code',

'EXTERNAL_HTTP_API_AUTHORIZER_WITHOUT_EXTERNAL_HTTP_API'

(Human readable message are subject to typo's, grammar improvements)

good call 👍

medikoo · 2021-01-13T14:30:42Z

test/unit/lib/plugins/aws/package/compile/functions.test.js

@@ -1756,6 +1756,363 @@ describe('lib/plugins/aws/package/compile/functions/index.test.js', () => {
        expect(fooFunctionRole).to.equal(fooFunctionConfig.role);
      });
    });
+
+    it('should fail if `functions[].image` references image without path and uri', async () => {


Aren't those tests duplicate of ones in deployFunction.test.js ?

That's correct, just tested in different use case scenario - do you feel like these should be just moved to provider.test.js?

I've moved these to provider.test.js from both deployFunction.test.js and functions.test.js

medikoo · 2021-01-13T14:31:50Z

test/unit/lib/plugins/aws/deployFunction.test.js

+      },
+    };
+
+    it('should fail if `functions[].image` references image without path and uri', async () => {


I think those tests should be in provider.test.js where we confirm custom validation of properties.

(in general I would test functionalities in test files which correspond to files in which they were implemented)

Also I guess we can use package command

How would you go about ensuring that the resolution happens correctly also for deploy function -f functionName? While the functionality is in provider.js, both deploy and deploy function use it a bit differently.

Very good point (and I think in tons of other places we forgot about packaging that's implied by sls deploy function.

I think we should probably ensure one sanity test (happy path) for both sls package (implied by sls deploy) and sls deploy function, but then test specific scenarios on just one of them.

Otherwise ideally it would be to have this functionality implemented independently of Serverless instance (it could take a imageName and imagePath), and then we can do unit tests against that uitlity, and with runServerless just confirm happy path for sls package and sls deploy function.

Still I guess it's not possible until we have #8445 done (and it's likely that some other things will stand in a way)

(e.g with refactor I'm working on, I'm secluding all those bricks into independent utils, that makes right unit testing easier)

I really like that suggestion and that's what I went with - happy path cases in both functions.test.js and deployFunction.test.js and full coverage in provider.test.js. I agree that ideally, I would love to have image resolution implemented as independent util, but at the moment it's hard as there are direct dependencies on provider instance (the biggest one on request that you mentioned). But I know we're on a good path to better modularization 🙌

medikoo

Thank you @pgrzesik ! Looks great. I have just two last questions to clarify, and we're good to go

lib/plugins/aws/provider.js

medikoo · 2021-01-14T10:16:37Z

lib/plugins/aws/provider.js

+        ];
+        try {
+          const { stdBuffer } = await spawnExt('docker', dockerArgs);
+          process.stdout.write(stdBuffer);


Is the warning that we use --password now visible in a console?

If it's the case I would probably not show output of this command by default (what's more important is image operations actions, where it's simple login)

Yes, it's visible in the console, we could hide it but I'm not sure if we should hide information that we're storing the credentials in an unencrypted way locally. That's what is showing right now from docker login

WARNING! Using --password via the CLI is insecure. Use --password-stdin. WARNING! Your password will be stored unencrypted in /home/pgrzesik/.docker/config.json. Configure a credential helper to remove this warning. See https://docs.docker.com/engine/reference/commandline/login/#credentials-store Login Succeeded

Hmm.. I think it's confusing for user. It's not user that made a choice of how password is transferred, and visible warning may imply to him that something is not right.

I would hide this output and instead show a visible notice:

Logged Docker successfully into AWS ECR. Note: that authentication token is stored as unencrypted in /home/pgrzesik/.docker/config.json.

Second line I'd put in orange as we do with other warnings. I believe we should also add a note about that into doc.

Also this triggers another question, if user will configure a credential helper that's mentioned in docs, can we somehow omit the login step? (we may ofc leave it for other PR)

About displaying that note: it might be tricky as the outputs vary between operating systems (as well as the location of .docker/config.json) - for example, on Windows machine I didn't get that warning. As for the credential helpers, I believe we still have to perform login for two reasons:

Tokens obtained from AWS are short-lived and there's no information about their validity stored (we could try to fallback to login only if needed (when push operation is denied e.g.), but that still requires logging in)

Even if credential helper is used (afaik), it's only about where the tokens are stored, e.g. in some kind of encrypted file instead of plaintext as in default, we still have to login, but it's just up to docker to properly store that token during login

Now I'm thinking about it more - maybe we shouldn't start with docker login, but rather only issue it if docker push command fails due to incorrect credentials?

There's also https://github.com/awslabs/amazon-ecr-credential-helper utility that would make docker login step redundant in case someone uses it

but rather only issue it if docker push command fails due to incorrect credentials?

Good idea! Going that road will definitely be nicer. At least user will have a choice and be able to setup authentication via own means if that's preferred.

but if for some reason recognition of authentication error is tricky I would skip it for now.

About displaying that note: it might be tricky as the outputs vary between operating systems

I would probably then search for a phrase stored unencrypted in <somePath>, and show our note only if we find it in output, otherwise not show it at all.

I've went for this type of output (warning with orange colour):

Serverless: Login to Docker succeeded! Serverless: WARNING: Docker authentication token will be stored unencrypted. Configure Docker credential helper to remove this warning. The push refers to repository [600238737408.dkr.ecr.us-east-1.amazonaws.com/serverless-someproject-dev] bdfa6eb032c6: Preparing 61b3904fa195: Preparing 6a7f9b3693e2: Preparing

What do you think?

Looks great, I would probably add ..will be stored unencrypted in docker config, so user is not confused where it's stored in general

good call 👍

medikoo

Thanks @pgrzesik I have just one style improvement proposal. Let me know

medikoo · 2021-01-14T16:38:11Z

lib/plugins/aws/provider.js

+        try {
+          ({ stdBuffer: pushDockerStdBuffer } = await spawnExt('docker', pushDockerArgs));
+          process.stdout.write(pushDockerStdBuffer);
+        } catch (outerPushErr) {
+          let errToThrow = outerPushErr;
+          if (
+            outerPushErr.stdBuffer &&
+            outerPushErr.stdBuffer.includes('no basic auth credentials')
+          ) {
+            errToThrow = null;
+            await this.dockerLoginToEcr();
+            try {
+              ({ stdBuffer: pushDockerStdBuffer } = await spawnExt('docker', pushDockerArgs));
+              process.stdout.write(pushDockerStdBuffer);
+            } catch (innerPushError) {
+              errToThrow = innerPushError;
+            }
+          }
+          if (errToThrow) {
+            throw new this.serverless.classes.Error(
+              `Encountered error during executing: docker ${pushDockerArgs}\nOutput of the command:\n${errToThrow.stdBuffer}`,
+              'DOCKER_PUSH_ERROR'
+            );
+          }
+        }


I think it'll be nicer to configure with more functional than imperative approach. Something as:

const pushTag = (args, options = {}) => { try { spawn('push tag', args); } catch (error) { if (!options.isLoggedIn && notLoggedInError) { login(); return pushTag(args, { isLoggedIn: true }) } throw error; } } pushTag(args);

great idea - changed

medikoo

@pgrzesik Great job!

Last thing that was holding us, was random crashes we observed in coverage build run against Node.js v10.

I've attempted to further investigate it, and so far outcome is as follows:

It looks that the orphaned crash happens either on fs.promises.open, fs.readFile or fs.open call. As it's only those functions (with fs.promises.readFile) which result with such error with no stack trace (Interestingly callback taking ones, sometimes expose stack trace and sometimes not).
I've excluded the fs.promises.readFile from that list, as having it patched I was able to reproduce the error without info in error message: https://github.com/serverless/serverless/runs/1707179867

At some points I've attempted to patch fs.readFileSync (although that was a false call, as those always produce stack trace), and after repeated runs got the error, but this time it was OOM error: https://github.com/serverless/serverless/runs/1708122362

This signals to me, that it's likely we're nearing end of memory and it causes the condition which produces the error we're occasionally seing

I had no further success in reproducing the error, when patching fs.promises.open, fs.promises.readFile, fs.readFile and fs.open. I've restarted build countless times and it always was successful.
It was during the weekend, so I wonder whether traffic on GitHub Actions servers doesn't influence some more restrictive memory usage which then makes error more likely.

I propose to merge it now, and if we observe the issue again, I'd commit-in the patch for fs methods to have better stack trace if error happens again. Then having it repeated again we'll decide on further steps.

At this point it looks as condition related to tight memory usage, which may no longer be likely (as e.g. we reverted AJV to v6 which is less consuming)

pgrzesik · 2021-01-18T10:31:43Z

Thanks @medikoo for extremely detailed report of your investigation and all the time you spend researching it 🙇 Your suggestion about running out of memory makes a lot of sense as it was only visible on constrained CI environment, where locally I've never encountered it.

brunocasado · 2021-03-01T21:42:15Z

this doesn't works....

Your Environment Information ---------------------------
Operating System: linux
Node Version: 14.15.4
Framework Version: 2.15.0 (local)
Plugin Version: 4.2.0
SDK Version: 2.3.2
Components Version: 3.4.3

Serverless: Configuration warning:
Serverless: at 'functions.calculateBussofane.image': should match pattern "^\d+.dkr.ecr.[a-z0-9-]+..amazonaws.com/[^@]+@sha256:[a-f0-9]{64}$"
Serverless: at 'provider': unrecognized property 'ecr'

What i'm doing wrong?

pgrzesik · 2021-03-01T21:50:16Z

Hello @brunocasado 👋 I believe you need to update your serverless version - according to your listing you're using 2.15.0 while support for building images was released with 2.20.0 - please refer to the changelog: https://github.com/serverless/serverless/blob/master/CHANGELOG.md#2200-2021-01-21

brunocasado · 2021-03-01T21:53:32Z

Hello @brunocasado 👋 I believe you need to update your serverless version - according to your listing you're using 2.15.0 while support for building images was released with 2.20.0 - please refer to the changelog: https://github.com/serverless/serverless/blob/master/CHANGELOG.md#2200-2021-01-21

omg i got.

sorry, now i realized that i did an update only in the global serverless and not inside the application.

pgrzesik · 2021-03-01T21:55:18Z

No worries @brunocasado - hope it works correctly after upgrade 💯

brunocasado · 2021-03-01T23:43:22Z

No worries @brunocasado - hope it works correctly after upgrade 💯

thank you!

It's working!

My opinion: This works much better than zip method. My problem with layer + python requirements ( size problem) is solved. :)

JacobWithACapitalJ · 2021-03-15T11:17:58Z

Hi, are there any docs/ tutorials for this? I'm struggling to find anything that explains using this new feature. Thanks!

pgrzesik · 2021-03-15T11:45:48Z

Hello @JacobWithACapitalJ 👋 We have a blog post that includes a walkthrough: https://www.serverless.com/blog/container-support-for-lambda/

paulacampigotto · 2022-02-14T14:49:19Z

@pgrzesik is it possible to create an ECR and upload a local image ? just like it's done with lambdas, but with no lambda? I would like to create an ECR and use it's image in a ECS's task definition, creating the image in the same serverless.yml file.

pgrzesik · 2022-02-14T15:36:19Z

Hello @paulacampigotto - at the moment such functionality is not supported, building and uploading image is triggered only if the image definition is used for at least one function.

pgrzesik added enhancement cat/lambda-image labels Jan 7, 2021

pgrzesik self-assigned this Jan 7, 2021

pgrzesik requested a review from medikoo January 7, 2021 15:15

medikoo suggested changes Jan 7, 2021

View reviewed changes

pgrzesik commented Jan 11, 2021

View reviewed changes

test/unit/lib/plugins/aws/deploy/index.test.js Outdated Show resolved Hide resolved

pgrzesik commented Jan 11, 2021

View reviewed changes

test/unit/lib/plugins/aws/remove/index.test.js Outdated Show resolved Hide resolved

pgrzesik commented Jan 11, 2021

View reviewed changes

test/unit/lib/plugins/aws/package/compile/functions.test.js Outdated Show resolved Hide resolved

This was referenced Jan 11, 2021

test: Cover AwsRemove with runServerless-based tests #8745

Merged

test: Refactor several AwsDeploy tests to runServerless #8746

Merged

pgrzesik force-pushed the add-support-for-docker-images branch 9 times, most recently from 2417185 to 2f7ae4c Compare January 13, 2021 10:06

pgrzesik marked this pull request as ready for review January 13, 2021 10:20

pgrzesik requested a review from medikoo January 13, 2021 10:22

pgrzesik force-pushed the add-support-for-docker-images branch from 2f7ae4c to 20f8420 Compare January 13, 2021 11:06

medikoo suggested changes Jan 13, 2021

View reviewed changes

pgrzesik force-pushed the add-support-for-docker-images branch from 4fde59f to fd1cdf7 Compare January 13, 2021 21:43

medikoo reviewed Jan 14, 2021

View reviewed changes

pgrzesik force-pushed the add-support-for-docker-images branch 3 times, most recently from 24ccb59 to da640c3 Compare January 14, 2021 15:20

pgrzesik requested a review from medikoo January 14, 2021 15:31

medikoo reviewed Jan 14, 2021

View reviewed changes

pgrzesik added 4 commits January 14, 2021 18:22

feat(AWS Lambda): Add support for building Docker images

251442c

test: Move tests to provider module and remove duplicates

4be1460

feat: Use --password instead of --password-stdin for docker login

522a3a0

feat: Try to docker login only if really needed

41a499f

pgrzesik force-pushed the add-support-for-docker-images branch from 3b22d79 to e2a5fab Compare January 14, 2021 17:23

pgrzesik requested a review from medikoo January 14, 2021 18:49

refactor: Recursive approach to docker push & login

a4b317a

pgrzesik force-pushed the add-support-for-docker-images branch from e2a5fab to a4b317a Compare January 15, 2021 09:31

medikoo approved these changes Jan 18, 2021

View reviewed changes

medikoo mentioned this pull request Jan 18, 2021

[DO NOT MERGE] Diagnose failing packaging test #8763

Closed

pgrzesik merged commit 789c2e3 into master Jan 18, 2021

pgrzesik deleted the add-support-for-docker-images branch January 18, 2021 10:32

johan1252 mentioned this pull request Feb 3, 2021

As of v2.20.0 "serverless remove" requires ECR IAM permissions even if not using lambda container images #8892

Closed

		const { repositoryUri, repositoryName } = await this.getOrCreateEcrRepository();

		await this.dockerLoginToEcr();

	// Ensure to reset cache for local serverless installation resolution
	// Leaking it across test files may introduce wrong assumptions when result is used for testing
	resolveLocalServerless.clear();
	// Ensure to reset memoization on artifacts generation after each test file run.
	// Reason is that homedir is automatically cleaned for each test,
	// therefore eventually built custom resource file is also removed
	_ensureArtifact.clear();

	).to.eventually.be.rejected.and.have.property(
	'code',
	'EXTERNAL_HTTP_API_AUTHORIZER_WITHOUT_EXTERNAL_HTTP_API'

feat(AWS Lambda): Add support for building Docker images #8725

feat(AWS Lambda): Add support for building Docker images #8725

Conversation

pgrzesik commented Jan 7, 2021 • edited

codecov bot commented Jan 7, 2021 • edited

Codecov Report

medikoo left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pgrzesik Jan 13, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pgrzesik commented Jan 8, 2021

medikoo commented Jan 8, 2021

pgrzesik commented Jan 8, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pgrzesik commented Jan 11, 2021

pgrzesik commented Jan 13, 2021

medikoo left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

medikoo left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pgrzesik Jan 14, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

medikoo left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

medikoo left a comment

Choose a reason for hiding this comment

pgrzesik commented Jan 18, 2021

brunocasado commented Mar 1, 2021

pgrzesik commented Jan 7, 2021 •

edited

codecov bot commented Jan 7, 2021 •

edited

pgrzesik Jan 13, 2021 •

edited

pgrzesik Jan 14, 2021 •

edited