AWS metadata and STS API calls are slow #873

joeduffy · 2020-02-11T16:28:29Z

During configuration, the AWS provider calls into multiple AWS APIs to check various endpoints and identity metadata. Times vary quite a bit with the speed of your network, but it's not uncommon for these calls to add up to 10-15 seconds of lag before an update even begins running.

Here are the specific calls:

Note that you can set config variables to skip this logic:

pulumi config set aws:skipCredentialsValidation true
pulumi config set aws:skipMetadataApiCheck true
pulumi config set aws:skipRequestingAccountId true
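If you need these three settings on many stacks, a small script can generate the commands. This is a minimal sketch: the stack names `dev` and `prod` and the helper names are hypothetical, and it assumes `pulumi config set` accepts `--stack`. By default it only prints the commands; pass `dry_run=False` to run them.

```python
import shlex
import subprocess

# The three settings quoted above. CLI arguments are strings, so the
# value is spelled "true" rather than a Python bool.
SKIP_SETTINGS = {
    "aws:skipCredentialsValidation": "true",
    "aws:skipMetadataApiCheck": "true",
    "aws:skipRequestingAccountId": "true",
}


def config_commands(stack):
    """Return one `pulumi config set` argv list per setting for `stack`."""
    return [
        ["pulumi", "config", "set", "--stack", stack, key, value]
        for key, value in SKIP_SETTINGS.items()
    ]


def apply_settings(stacks, dry_run=True):
    """Print (dry run) or execute the commands for every stack."""
    for stack in stacks:
        for argv in config_commands(stack):
            if dry_run:
                print(shlex.join(argv))
            else:
                # Requires the pulumi CLI on PATH, run from the project directory.
                subprocess.run(argv, check=True)


apply_settings(["dev", "prod"])  # hypothetical stack names
```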

For some reason, the provider seems to call these APIs twice, and one of those passes ignores the config. @lukehoban posited that this could be due to the way we do a prepass over configuration to validate it and check for defaults. If so, it seems like a bug that we run that prepass before the configuration has been applied.

Note also that you can set AWS_METADATA_TIMEOUT=0, which shortens the timeouts of the AWS metadata API calls and has a small but noticeable effect.

I don't know precisely what to do here, but we could consider choosing our own defaults, different from those of the underlying Terraform provider. I don't know enough about what those APIs are doing; it appears, for instance, that the metadata API check determines whether the update is running from within an AWS data center (though I'm not quite sure why the code needs to know that).
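Assuming the metadata API check boils down to a reachability probe of the link-local EC2 instance metadata service (IMDS) at 169.254.169.254, such a probe with a short timeout could be sketched as below. `imds_reachable` is a hypothetical helper, not the provider's code. Off AWS, a request like this either fails fast or waits out its full timeout, which is one way a check of this kind adds latency.

```python
import urllib.error
import urllib.request

# Link-local address of the EC2 instance metadata service.
DEFAULT_IMDS_URL = "http://169.254.169.254/latest/meta-data/"

# Ignore any HTTP proxy from the environment: IMDS is link-local.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def imds_reachable(url=DEFAULT_IMDS_URL, timeout=0.1):
    """Return True if an HTTP server answers at `url` within `timeout` seconds.

    Any HTTP response counts as reachable, including error statuses such as
    401 from an instance that only accepts IMDSv2 requests.
    """
    try:
        with _OPENER.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        # The server answered, just not with a 2xx status.
        err.close()
        return True
    except OSError:
        # URLError, connection refused, and socket timeouts are all OSError.
        return False
```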


uLan08 · 2020-02-12T14:37:08Z

I encountered an issue somewhat related to this. I was trying to spin up the basic example from `pulumi new aws-javascript`, but `pulumi up` just hung indefinitely. The longest I waited was about 5 minutes.

Running `pulumi config set aws:skipCredentialsValidation true` fixed it for me.

Here is some of my environment info in case you need it.

▶ node -v
v10.16.3

▶ pulumi version
v1.10.1

"@pulumi/pulumi": "^1.0.0",
"@pulumi/aws": "^1.0.0",
"@pulumi/awsx": "^0.18.10"


config:
  aws:profile: foo
  aws:region: ap-southeast-1
  pulumi:template: aws-javascript

It was stuck here:

▶ pulumi up --logtostderr -v=9 2> out.txt
Previewing update (dev):
     Type                 Name             Plan        
     pulumi:pulumi:Stack  pulumi-demo-dev  create..

Last entries from logs:

I0212 22:33:41.285702   73852 eventsink.go:60] AWS Auth provider used: "SharedCredentialsProvider"
I0212 22:33:41.285742   73852 eventsink.go:63] eventSink::Debug(<{%reset%}>AWS Auth provider used: "SharedCredentialsProvider"<{%reset%}>)
I0212 22:33:41.288928   73852 eventsink.go:60] Trying to get account information via sts:GetCallerIdentity
I0212 22:33:41.288964   73852 eventsink.go:63] eventSink::Debug(<{%reset%}>Trying to get account information via sts:GetCallerIdentity<{%reset%}>)
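The `-v=9` log captured above timestamps every line, so one way to see where time goes is to compute the gap before each entry. A minimal sketch, assuming the glog-style format seen in these lines (level letter, MMDD, time, thread id, `file:line]`, message); `parse`, `gaps`, and the regex are hypothetical helpers. A hang shows up as the absence of any entry after the last line, not as a large gap.

```python
import re
from datetime import datetime

# <level><MMDD> <HH:MM:SS.micro> <thread> <file>:<line>] <message>
GLOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<file>[^:\]]+):(?P<line>\d+)\] (?P<msg>.*)$"
)


def parse(lines):
    """Yield (timestamp, message) for every line matching GLOG_LINE."""
    for raw in lines:
        m = GLOG_LINE.match(raw)
        if m:
            yield datetime.strptime(m["time"], "%H:%M:%S.%f"), m["msg"]


def gaps(lines):
    """Return (gap, message) pairs: time elapsed since the previous entry.

    Only the time of day is parsed, so a log that crosses midnight
    would produce a negative gap.
    """
    entries = list(parse(lines))
    return [(cur[0] - prev[0], cur[1]) for prev, cur in zip(entries, entries[1:])]
```

For example, `max(gaps(open("out.txt")), key=lambda g: g[0])` returns the step preceded by the longest pause.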

joeduffy · 2020-02-12T16:15:45Z

Interesting. @stack72, is @uLan08's issue the same as #814? And pulumi/pulumi#3604?

joeduffy · 2020-02-12T16:17:19Z

Assigning to @pgavlin during M32 to triage as part of the overall performance push. On slow networks, this is by far the dominant performance issue (when deploying to AWS), as far as I can tell.

pgavlin · 2020-02-12T17:18:02Z

I'm already full up for M32. @leezen, I'm going to bounce this to you and we can figure out where to put it.

lukehoban · 2020-02-26T01:40:07Z

The commentary in pulumi/pulumi#3671 (comment) is really mostly about the specific issue tracked here. Copying below as well:

A few more observations looking at some detailed logs of the initialization sequence:

- On the first-ever update of a stack, we Configure the AWS provider once for preview and once for update. On every later update, we Configure the AWS provider twice for preview and once for update (!).
- During the Configure for AWS we hit the EC2 metadata endpoint twice (once from our own PreConfigureCallback, once as part of the upstream provider's Configure). Despite being set to time out after 100ms, each call reliably takes 1-1.5s.
- After those two calls, we call sts/GetCallerIdentity twice (the upstream provider just happens to do this by default) and ec2/DescribeAccountAttributes once, each call taking 0.5-1s.

In total, Configure takes 3.5-6s, and we call it twice during preview.
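Treating each observed latency as a [low, high] range in seconds, the totals work out as follows (per Configure: two EC2 metadata calls, two sts/GetCallerIdentity calls, one ec2/DescribeAccountAttributes call; two Configures per preview on a previously updated stack):

```math
\begin{aligned}
T_{\text{Configure}} &= 2\,[1.0,\ 1.5] + 2\,[0.5,\ 1.0] + 1\,[0.5,\ 1.0] = [3.5,\ 6.0]\ \text{s} \\
T_{\text{preview}} &= 2\,T_{\text{Configure}} = [7.0,\ 12.0]\ \text{s}
\end{aligned}
```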

lukehoban · 2020-02-26T16:37:07Z

Given the above - I see a few things we could consider changing:

1. Removing the second Configure we do during a preview, which should be unnecessary. This would save 3.5-6s during preview.
2. Changing our defaults on skipCredentialsValidation (the source of the second GetCallerIdentity call, which would save 0.5-1s) and skipMetadataApiCheck (the source of the two EC2 metadata calls, which would save 2-3s).
3. Avoiding our PreConfigureCallback check (and accepting worse error messages, or string-matching error messages to overwrite them). This wouldn't help much if we also did (2), but if we didn't, it would save 1-1.5s.
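The three options can be compared with a small call-count model built from the observed ranges in the previous comment. This is a hypothetical sketch, not a measurement: it assumes each skipped call saves exactly its observed latency and that ec2/DescribeAccountAttributes always runs once.

```python
# Observed per-call latency ranges in seconds, from the comment above.
METADATA_CALL = (1.0, 1.5)  # EC2 metadata endpoint
API_CALL = (0.5, 1.0)       # sts/GetCallerIdentity or ec2/DescribeAccountAttributes


def configure_cost(skip_metadata_api_check=False,
                   skip_credentials_validation=False,
                   skip_preconfigure_check=False):
    """(low, high) seconds for one Configure under a simple call-count model."""
    if skip_metadata_api_check:
        metadata_calls = 0  # option 2: both metadata calls go away
    elif skip_preconfigure_check:
        metadata_calls = 1  # option 3: only the upstream call remains
    else:
        metadata_calls = 2
    # GetCallerIdentity runs twice unless validation is skipped (option 2);
    # DescribeAccountAttributes always runs once.
    api_calls = (1 if skip_credentials_validation else 2) + 1
    return (metadata_calls * METADATA_CALL[0] + api_calls * API_CALL[0],
            metadata_calls * METADATA_CALL[1] + api_calls * API_CALL[1])


def preview_cost(dedupe_configure=False, **flags):
    """Preview runs Configure twice today, once if the duplicate is removed (option 1)."""
    low, high = configure_cost(**flags)
    runs = 1 if dedupe_configure else 2
    return (runs * low, runs * high)


print(preview_cost())  # (7.0, 12.0)
print(preview_cost(dedupe_configure=True,
                   skip_metadata_api_check=True,
                   skip_credentials_validation=True))  # (1.0, 2.0)
```

Under this model, options 1 and 2 together take a preview from 7-12s down to 1-2s.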

I don't love changing our defaults on these things. It will no doubt lead to confusion in less common cases if we do. But it does feel like defaults that don't penalize 100% of usage in favor of relatively corner-case needs are sensible?

Thoughts?

stack72 · 2020-02-26T16:39:34Z

@lukehoban FWIW, there are well-known people in the Terraform ecosystem who override these settings by default, as the checks are known to be slow upstream as well.

I would suggest we change these defaults and just document that we have done so.

shousper · 2021-01-05T00:46:51Z

Seeing as this issue is coming up on 12 months old, would it be worth considering some action on Pulumi's side? I'm running into this issue pretty frequently myself with a fairly basic (standard?) cross-account setup that uses assume-role everywhere with AWS profiles.

Setting aws:skipCredentialsValidation to true in every pulumi stack I create is getting old pretty fast. Is it time to bite the bullet and enable this behaviour by default?

stack72 · 2021-01-05T22:24:09Z

Hi @shousper

So I've actually been looking at this issue tonight to see what we can set the default variables to.

Do you happen to know how long your pulumi run took before you set that value? I am trying to gauge whether you are seeing the same behaviour as me.

Paul

Fixes: #873
* `skipCredentialsValidation` now defaults to `true`.
* `skipGetEc2Platforms` now defaults to `true`.
* `skipMetadataApiCheck` now defaults to `true`.
* `skipRegionValidation` now defaults to `true`.
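To illustrate what these defaults mean for a stack that still needs one of the checks (for example, to let the provider authenticate through the EC2 metadata endpoint), here is a hypothetical sketch of explicit `aws:` stack config overriding them. It assumes explicitly set config always wins over a default and that values arrive as the strings "true"/"false" written by `pulumi config set`; `effective_settings` is not a real Pulumi API.

```python
# Defaults as listed in the change above.
NEW_DEFAULTS = {
    "skipCredentialsValidation": True,
    "skipGetEc2Platforms": True,
    "skipMetadataApiCheck": True,
    "skipRegionValidation": True,
}


def effective_settings(stack_config):
    """Overlay explicit `aws:<name>` string values onto the defaults."""
    settings = dict(NEW_DEFAULTS)
    for key, value in stack_config.items():
        namespace, _, name = key.partition(":")
        if namespace == "aws" and name in settings:
            settings[name] = value.strip().lower() == "true"
    return settings


# A stack opting back in to the metadata check; unrelated keys are ignored.
print(effective_settings({"aws:skipMetadataApiCheck": "false",
                          "aws:region": "ap-southeast-1"}))
```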

shousper · 2021-01-06T08:38:22Z

Sorry @stack72, it doesn't look like this is my issue. It ramped up, so I dug deeper, and it looks like I'm caught in this trap: hashicorp/terraform#27350 golang/go#42700

😞


This PR explores reverting the default to `aws:skipMetadataApiCheck=false` so that the provider can seamlessly authenticate against IMDS(v2) endpoints in the AWS environment. Doing so no longer appears to slow down provider startup perceptibly. I tested the speed delta by measuring a local empty preview of an AWS S3 bucket using AWS_PROFILE authentication (local machine <-> us-east-1); there is no perceptible difference.

Fixes: #1692

An integration test is added that exercises `pulumi preview` on an EC2 instance with IMDSv2 and asserts that the provider can authenticate successfully.

Background:
- #873
- #1288
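For context on what authenticating against IMDSv2 involves: the client first obtains a session token with a PUT to `/latest/api/token` (sending the `X-aws-ec2-metadata-token-ttl-seconds` header), then presents it in the `X-aws-ec2-metadata-token` header on every metadata read. A minimal standard-library sketch of that flow; `imdsv2_get` is a hypothetical helper, not the provider's code, and 21600 seconds is just an example TTL (the maximum AWS allows).

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254"
# Ignore any HTTP proxy from the environment: IMDS is link-local.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def imdsv2_get(path, base=IMDS_BASE, ttl_seconds=21600, timeout=1.0):
    """Fetch an instance metadata path using the IMDSv2 session-token flow."""
    # Step 1: request a session token with a PUT.
    token_req = urllib.request.Request(
        base + "/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )
    with _OPENER.open(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    # Step 2: present the token on the actual metadata read.
    data_req = urllib.request.Request(
        base + path, headers={"X-aws-ec2-metadata-token": token}
    )
    with _OPENER.open(data_req, timeout=timeout) as resp:
        return resp.read().decode()
```

On an EC2 instance, `imdsv2_get("/latest/meta-data/instance-id")` would return the instance ID. Off AWS, it raises `OSError` (typically a timeout), which is the latency the skip flag avoids.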

joeduffy mentioned this issue Feb 11, 2020

Improve startup performance pulumi/pulumi#3671 (Open, 5 tasks)

joeduffy assigned pgavlin Feb 12, 2020

joeduffy added this to the 0.32 milestone Feb 12, 2020

pgavlin assigned leezen and unassigned pgavlin Feb 12, 2020

lukehoban assigned lukehoban and unassigned leezen Mar 1, 2020

lukehoban modified the milestones: 0.32, 0.33 Mar 5, 2020

lukehoban mentioned this issue Mar 11, 2020

CLI hangs at sts:GetCallerIdentity when temporary AWS credentials exist but are expired #814 (Closed)

lukehoban modified the milestones: 0.33, 0.34 Apr 3, 2020

leezen removed this from the 0.34 milestone Apr 7, 2020

lukehoban mentioned this issue Jan 5, 2021

Breaking changes to consider for Pulumi 3.0 pulumi/pulumi#5731 (Closed, 5 tasks)

stack72 mentioned this issue Jan 5, 2021

Setting Provider defaults to remove slow calls to AWS STS and Metadata #1288 (Merged)

stack72 assigned stack72 and unassigned lukehoban Jan 5, 2021

stack72 added this to the current milestone Jan 5, 2021

leezen removed this from the current milestone Jan 12, 2021

leezen modified the milestones: 0.50, 0.51 Jan 12, 2021

leezen modified the milestones: 0.51, 0.52 Feb 2, 2021

stack72 closed this as completed in #1288 Feb 7, 2021

t0yv0 mentioned this issue May 17, 2024

Do not skip metadata API check by default #3960 (Merged)
