
SSM fetch waits on when use interpolation #4984

Closed
wbotelhos opened this issue May 16, 2018 · 35 comments

@wbotelhos

wbotelhos commented May 16, 2018

Description

Using version >= 1.27.0 makes my SSM variable hang in a "waited on" state and never resolve.

  # 2501:   ssm:/homolog/purger/VARNISH_DESCRIBING_ARN waited on by: ${ssm:/homolog/purger/VARNISH_DESCRIBING_ARN}
  • What did you expect should have happened?

It should fetch the variable normally, like when I do:

aws ssm get-parameter --name '/homolog/purger/VARNISH_DESCRIBING_ARN'

{
  "Parameter": {
    "Version": 1,
    "Type": "String",
    "Name": "/homolog/purger/VARNISH_DESCRIBING_ARN",
    "Value": "arn:aws:autoscaling:us-east-1:xxx:autoScalingGroup::varnish-homolog"
  }
}
  • What was the config you used?
custom:
  stage: ${opt:stage, 'development'}

frameworkVersion: "=1.27.2"

functions:
  purge:
    events:
      - http:
          method: post
          path: purge

      - sns: "arn:aws:sns:us-east-1:x:purges"

    handler: handler.purge
    memorySize: 128
    name: purger-purge-${self:custom.stage}

    tags:
      Environment: ${self:custom.stage}
      Function: purge
      Name: purger
      Owner: front
      Type: lambda

    vpc:
      security_group_ids:
        - sg-x

      subnets:
        - subnet-x
        - subnet-x
        - subnet-x

provider:
  environment:
    AZION_LOGIN_URL: "https://api.azion.net/tokens"
    AZION_PASSWORD: ${ssm:/${self:custom.stage}/purger/AZION_PASSWORD}
    AZION_PURGE_URL: "https://api.azion.net/purge/wildcard"
    AZION_USERNAME: ${ssm:/${self:custom.stage}/purger/AZION_USERNAME}
    HOST: ${self:custom.stage}
    NODE_ENV: ${ssm:/${self:custom.stage}/purger/NODE_ENV}
    VARNISH_AUTO_SCALING_NAME: "varnish-${self:custom.stage}"
    VARNISH_AUTO_SCALING_REGION: ${ssm:/${self:custom.stage}/purger/VARNISH_AUTO_SCALING_REGION}
    VARNISH_DESCRIBING_ARN: ${ssm:/${self:custom.stage}/purger/VARNISH_DESCRIBING_ARN}

  name: aws
  region: us-east-1
  runtime: nodejs8.10
  stage: ${self:custom.stage}
  timeout: 20

service: purger
  • What stacktrace or error message from your provider did you see?
Serverless Warning --------------------------------------
 
  ################################################################################
 
 
 Serverless Warning --------------------------------------
 
  # 2501: 8 of 9 promises have settled
 
 
 Serverless Warning --------------------------------------
 
  # 2501: 1 unsettled promises:
 
 
 Serverless Warning --------------------------------------
 
  # 2501:   ssm:/homolog/purger/VARNISH_DESCRIBING_ARN waited on by: ${ssm:/homolog/purger/VARNISH_DESCRIBING_ARN}
 
 
 Serverless Warning --------------------------------------
 
  ################################################################################
  • Serverless Framework Version you're using: 1.27.2
  • Operating System:
System Version: macOS 10.13.4 (17E199)
Kernel Version: Darwin 17.5.0
Boot Volume: Macintosh HD
@HyperBrain
Member

/cc @erikerikson

@HyperBrain HyperBrain added the bug label May 17, 2018
@erikerikson
Member

Thank you for this (and for CCing me Frank). Sorry for leaving it on the back burner, I have been spinning up a team but should have time to investigate today.

@erikerikson
Member

Hi @wbotelhos

I've been working to reproduce this using your listed config but struggling to do so.

I have altered my getValueFromSsm, to cut it short and return the variable name you use (rather than a value):

    return BbPromise.delay(Math.random() * 500)
      .then(() => BbPromise.resolve(variableString.slice(variableString.lastIndexOf('/')+1)))

If you add this to $(npm root -g)/node_modules/serverless/lib/classes/Variables.js does it clear up the problem (while obviously not giving you expected values)? If so, I need to look at it more but otherwise, I am seeing correct rendering without holdup.

I assume that you're seeing that warning message repeatedly, over and over until you kill the process? Let me know if not and it eventually completes/proceeds. That would indicate that it's taking longer than usual and the warning is just superfluous in your case.

Maybe there's a detail I'm missing? I could use some help experiencing this issue.

@Rhymond

Rhymond commented May 21, 2018

I'm getting the same warning messages. It looks like it shows promise progress in warning messages.


@erikerikson
Member

@Rhymond does the tool eventually complete its action, or does that final report of three pending promises repeat until you kill the process?

@Rhymond

Rhymond commented May 21, 2018

It doesn't repeat. It finishes and works as expected; it's just strange that it shows a warning for each SSM key.
I can't kill the process anyway because it runs on CircleCI.

@erikerikson
Member

erikerikson commented May 21, 2018

Thank you very much for adding this information.

The tool currently does this to assist users in avoiding deadlocks. I could see how it would trip up warning-averse CD pipelines though (i.e. some are configured to break on warnings).

Perhaps this should be disabled in non-interactive mode?

Maybe there aren't consequences but the messaging drives curiosity and I'm over-extrapolating?

I originally added this messaging for safety, in case my strategy for resolving deadlocks was incomplete and users would otherwise stare at an empty console while the variables were delayed indefinitely.

Honestly, I didn't expect users to see it so thank you for reporting that you have and helping me understand the circumstances.

@wbotelhos
Author

Hi @erikerikson ,

Unfortunately I cannot reproduce the problem again.
In my case, ROLLBACK happened and none of the variables were written in Lambda.
I did not try to check and kill the process while I was trying to find the problem.

Last week I solved the problem by avoiding interpolation. Today I put back the old code and variables and the problem no longer happened. If it happens again I will reopen this issue.

Thanks for the support. <3

@alanjds

alanjds commented May 21, 2018

Sidenote: @wbotelhos is targeting a us-east-1 region from a computer +200ms away from it. Maybe this can be related.

@Rhymond

Rhymond commented May 22, 2018

@erikerikson
I cannot get any warnings while running it locally.
I think there is a bit of latency between CircleCI and the AWS region, and it's enough to trigger the warning.

@erikerikson
Member

Thank you each very much for the extra information. I am very glad there were no direct negatives but I'm sorry the message consumed some of your time and I'll take it as an action item to look into timing assumptions and non-interactive mode behavior differences.

@fazelmk

fazelmk commented May 28, 2018

Not sure if related, but this used to work. Now it complains about the variable not being of type string.

EMAIL_CONFIG: '{"HOST":"<myHost>","PORT":465,"SECURE":true, "USER":"<myUser>","EMAIL":"<myEmail>","PASSWORD":"${ssm:/${self:custom.project_name}/dev/EMAIL_PASSWORD~true}"}'

I'm using a fork that contains a feature released in the new version, so I tried updating to see if it was working, and I encountered the "unsettled promises" warning. Ignoring it, the endpoints that don't use the EMAIL_CONFIG variable work; the ones that use it crash with the "not a string" error.

@HyperBrain
Member

@fazelmk I think this is also related to #4890, which added checks of the variable types. AWS only allows string values or AWS references (like Fn::GetAtt, Ref, etc.). Is your EMAIL_CONFIG a string or an object? The latter would trigger the error. Maybe the single quotes should be exchanged for double quotes, with escapes on the inner quotes?
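For illustration only (the values mirror fazelmk's earlier snippet; this is an untested sketch, not a verified fix), the escaped-double-quote variant might look like:

```yaml
# Hypothetical rewrite of fazelmk's EMAIL_CONFIG: outer double quotes,
# inner quotes escaped, so YAML still parses the value as a single string.
provider:
  environment:
    EMAIL_CONFIG: "{\"HOST\":\"<myHost>\",\"PORT\":465,\"SECURE\":true,\"USER\":\"<myUser>\",\"EMAIL\":\"<myEmail>\",\"PASSWORD\":\"${ssm:/${self:custom.project_name}/dev/EMAIL_PASSWORD~true}\"}"
```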

@fazelmk

fazelmk commented May 28, 2018

@HyperBrain it's a string.
I'll give double quotes with escaped quotes a try later today and let you know if it works.

@fazelmk

fazelmk commented May 31, 2018

Sorry for the delay. With double quotes it gives the same error.

A valid SSM parameter to satisfy the declaration 'ssm:/self:custom.project_name/prod/EMAIL_PASSWORD~true' could not be found.

If I remove the quotes it says it's not a string, since EMAIL_CONFIG then receives a JSON object.

The problem is ${self:custom.project_name} not being parsed in the new version.

@ChristopheBougere

We are seeing the same warnings (N of M promises have settled), but it doesn't seem to impact anything.
We are also querying us-east-1 from Europe, so it might be due to the delay?

@erikerikson
Member

That observation is almost certainly due to the delays @ChristopheBougere - the warnings are merely that. They were added because previously promises were "deadlocking" when circular references were used. Those bugs were fixed but I left the code in there to help debug any problems that might come up resulting from somewhat extensive changes to the variables system that I made to fix the "deadlocking" issue. They didn't alter the external contracts/user expected behaviors but it's very easy to get confused about the intersecting use cases in that sector of code.

@erikerikson
Member

@fazelmk I tried reproducing your concern using 1.27.3 and the following:

# serverless.yml
service: foo
provider: aws
custom:
  project_name: my-project
  EMAIL_CONFIG: '{"HOST":"<myHost>","PORT":465,"SECURE":true, "USER":"<myUser>","EMAIL":"<myEmail>","PASSWORD":"${ssm:/${self:custom.project_name}/dev/EMAIL_PASSWORD~true}"}'

I then observed:

$sls print
Serverless: Recoverable error occured (The security token included in the request is expired), sleeping 5 seconds. Try 1 of 4
 
 Serverless Warning --------------------------------------
 
  ################################################################################
 
 
 Serverless Warning --------------------------------------
 
  # 2505: 1 of 2 promises have settled
 
 
 Serverless Warning --------------------------------------
 
  # 2505: 1 unsettled promises:
 
 
 Serverless Warning --------------------------------------
 
  # 2505:   ssm:/my-project/dev/EMAIL_PASSWORD~true waited on by: {"HOST":"<myHost>","PORT":465,"SECURE":true, "USER":"<myUser>","EMAIL":"<myEmail>","PASSWORD":"${ssm:/my-project/dev/EMAIL_PASSWORD~true}"}
 
 
 Serverless Warning --------------------------------------
 
  ################################################################################

Note that I didn't use active credentials, so the observed behavior was expected. Note also that the references to ${self:custom.project_name} were successfully replaced with the value my-project, as expected but seemingly in contradiction to your report.

If you continue to experience your problem, please open a new issue with metadata as indicated in the issue template, as well as materials that reproduce the issue.

@erikerikson
Member

TL;DR: background to and request for opinions on correct behavior

For those on this issue thread experiencing the warning message and concerned about it, the code that schedules the warning messages that have been reported is at https://github.com/serverless/serverless/blob/master/lib/classes/PromiseTracker.js#L16 with the following code:

this.interval = setInterval(this.report.bind(this), 2500);

That provides 2.5 seconds as the interval at which to check for and, if any exist, warn users about promises that are latent in completing (resolving or rejecting). I can accept as reasonable an argument that this could be longer. My original thought was that 2.5 seconds seemed like a good point at which to "speak up", to avoid active users starting to wonder what was happening and taking anti-productive corrective action. I could see an argument for avoiding the messaging in a non-observed context, but subtracting that information in such a case seems like it could make any challenges that do occur harder to track down.
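To make the mechanism concrete, here is a minimal sketch of that watchdog pattern. The class and method names are illustrative only, not Serverless' actual PromiseTracker API:

```javascript
// Minimal sketch of a promise-settlement watchdog (illustrative names,
// not the real Serverless implementation).
class PromiseTracker {
  constructor(reportIntervalMs = 2500) {
    this.entries = [];
    this.timer = setInterval(() => this.report(), reportIntervalMs);
    if (this.timer.unref) this.timer.unref(); // don't keep the process alive just to report
  }
  // Track a labeled promise; the flag flips once it settles either way.
  add(label, promise) {
    const entry = { label, settled: false };
    promise.then(
      () => { entry.settled = true; },
      () => { entry.settled = true; }
    );
    this.entries.push(entry);
    return promise;
  }
  // Log and return the labels of promises still pending.
  report() {
    const pending = this.entries.filter(e => !e.settled);
    if (pending.length) {
      console.log(`# ${this.entries.length - pending.length} of ${this.entries.length} promises have settled`);
      pending.forEach(e => console.log(`#   still waiting on: ${e.label}`));
    }
    return pending.map(e => e.label);
  }
  stop() { clearInterval(this.timer); }
}
```

A slow SSM lookup simply stays in the pending list across report ticks, which is exactly the "N of M promises have settled" output reported above.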

There are a number of fixes that are possible. One would be improving the messaging so that it is more clear about the circumstance and the lack of need for worry. Another would be changing the business logic around timing or display of the messages. Surely there are others.

What do you believe would be the best way to deal with the circumstance while ensuring that if things get into a bad state that users can identify the cause and take productive action to resolve whatever issue might be involved?

@wbotelhos
Author

What about using the info level? The warning makes me think that I have to take an action, but I don't and I can't.

@ChristopheBougere

Thanks @erikerikson for the detailed explanation!
Agreed, maybe the info level would make more sense.
The message could also be changed to something like:
Still waiting for SSM parameter ${param_name} after 2.5 seconds. Is it normal?

@erikerikson
Member

Using info seems good. Unless I get an objection soon, I'll write up and submit a PR claiming resolution of this issue. Thank you @wbotelhos and @ChristopheBougere!

@brickpop

brickpop commented Jul 21, 2018

I am encountering this error in the following scenario:

  • I run serverless config credentials --provider aws --key ... --secret ...
  • If I run it with no serverless.yml at all, or without any interpolated variables, it works fine
  • Otherwise, it will try to resolve the SSM variables from serverless.yml (even before the credentials are registered), and will fail

Bypassing the checks in lib/classes/Variables.js > getValueFromSsm (lines 694-697) will allow it to complete.

@erikerikson
Member

Hi @ledfusion - it seems that you are experiencing an issue where the interpretation of your serverless.yml fails prior to establishing the credentials that you're trying to establish with your command. This is a different matter than raised by this issue; could you please open a new and separate issue to cover this bug?

As noted, it looks like you have identified a workaround despite the friction it creates.

Thank you.

@erikerikson
Member

Those who have followed this thread, I have opened #5151 to change the output in this circumstance to:

Serverless Information ----------------------------------

##########################################################################################
 # 0: 0 of 2 promises have settled
 # 0: 2 unsettled promises:
 # 0:   foo waited on by: ${foo:}
 # 0:   foo waited on by: ${foo:}
 # This can result from latent connections but may represent a cyclic variable dependency
##########################################################################################

In hindsight, this language is pretty poor so I'd be happy to take suggestions for improvement or accept modifications to my PR with improvements.

@dead10ck

@erikerikson Hello, and thank you for your work! As a recent user of serverless in a production environment, seeing this warning unsettled me a bit, since by chance, we didn't happen to see it during deploys to our dev environments, but did during deploys to our production environment.

As far as I see from previous comments in this thread, it seems the warning is printed in case there are any cyclic dependencies in the config. I'm not sure how difficult this might be to implement, but perhaps another solution might be to do static analysis on the config at an early stage to detect dependency cycles, and if one is detected, error out hard before any actions are performed. This is what terraform does.
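For what it's worth, the early cycle check described here could be sketched as a depth-first search, assuming the dependency graph were extractable up front (findCycle is a hypothetical helper for illustration, not part of Serverless):

```javascript
// Hypothetical sketch of an early cycle check: given a map of
// "variable -> variables it references", return one cycle (as a path
// that starts and ends on the same name) or null if the graph is acyclic.
function findCycle(deps) {
  const visiting = new Set(); // names on the current DFS path
  const done = new Set();     // names fully explored, known cycle-free
  let cycle = null;
  function visit(name, path) {
    if (cycle || done.has(name)) return;
    if (visiting.has(name)) {
      cycle = path.slice(path.indexOf(name)); // e.g. ['a', 'b', 'a']
      return;
    }
    visiting.add(name);
    (deps[name] || []).forEach(dep => visit(dep, path.concat(dep)));
    visiting.delete(name);
    done.add(name);
  }
  Object.keys(deps).forEach(name => visit(name, [name]));
  return cycle;
}
```

As erikerikson explains below the suggestion, the catch in Serverless is that the full graph isn't known until remote sources (SSM, S3, other configs) have already been fetched.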

@erikerikson
Member

Hi @dead10ck,

I'm very sorry that you were unsettled by this warning. I will attempt to share some background that I suspect you may agree makes it important if not also unfortunate.

The code that prints this warning runs on a timer that refreshes whenever a promise is resolved. One of the primary motivations was to detect cycles, but also hung promises. It shows up on slow connections too.

A less publicized reason for the check is that the variable system grew organically, without a necessarily consistent design, test suite, or set of rules, and had been problematically synchronous. Additionally, there's been a lot of rich and creative use of the system and it can be difficult to predict the consequences of changes. One of the results of the synchronicity was deadlocks; given the shift away from synchronous behavior and the tool's production use, I wanted to have a screaming/flashing-red trail if I had introduced any bugs.

I was once responsible for dropping a company's production database during the earlier stages of this variables rework due to a bug (please hard-code and regularly update your serverless dependency after testing). As a result, a major part of the reason for this warning is ensuring no unnoticed bugs were introduced (silent bugs are some of the worst, and tattling seems like the best means of voicing them). The whole community cares deeply about maintaining the trust users (including ourselves) put in this tool.

In a very real sense, the warning you saw is printed out during the "static" analysis process. Note that many asynchronous processes are involved in populating a service, such as with the variable type mentioned in the title of this issue. We cannot retrieve an SSM variable without interacting with the AWS service, using your credentials, et cetera. Because the variables or content in S3, among the other sources, can contain arbitrary JSON, there is no way to predict cycles without populating the service and every contained variable. Population is one of the earlier steps in many workflows of the framework; certainly it occurs prior to actual deployment of CloudFormation or any alternative assets. As such, taking into account the asynchronous actions, the static analysis you request is actually present.

I hope this helps explain this otherwise somewhat less than thoroughly documented subsystem. If I've missed discussing or explaining any concern of yours, please feel free to follow up.

@dead10ck

dead10ck commented Aug 13, 2019

Perhaps I'm just not familiar with the architecture of serverless, so please forgive me if I'm totally ignorant here. When I think of the "stages" a run goes through, from reading a user's config to a deployed environment, I would think that the first step is when the config is parsed in full, along with all "dependencies" (where my understanding of "dependencies" would mean any string interpolations, fields of a config that refer to other parts of the same config, etc.), and that this would happen before the "rendering" phase, where things like JSON text files, SSM lookups, etc., are performed to fill in the interpolations and end up with the final "execution plan."

Because the variables or content in S3 among the other sources can contain arbitrary JSON, there is no way to predict cycles without populating the service and every contained variable.

If I'm understanding you, you're saying that the dependencies cannot be resolved until after the parsing phase? What kind of dependency cycles can happen that you can't know about until during the rendering phase?

@erikerikson
Member

You have nothing to apologize for.

The initially loaded serverless config is (or is generated by) a local file in one of the supported formats (if I'm not out of date: JSON, YML, JavaScript). That file can reference something like an S3-hosted file, and references can also appear inside the contents of that file. An example of this would be the selection of a dev versus prod configuration file, where the two share naming structures but differ in the concrete values in order to support almost identical dev and prod environments. Likewise, the S3-hosted file can contain references to its own values in order to avoid repeating itself. Whether necessary or not, the initial file is parsed prior to being populated based on the parsing of its contents. Further, it is not until all of the populations have completed that you have the final vendor-specific asset that can be sent to the appropriate service to enact alterations of the cloud environment. Perhaps it would be helpful for me to state that the variable system makes no distinction between local and remote sources of configuration, except in the details of the generation of the promise that delivers the JavaScript object containing the configuration. The variable subsystem operates purely upon JavaScript objects.

Things got a little strange when the self-reference was introduced. See, for example, https://github.com/Nordstrom/hello-retail/blob/e4e1667c4e4376ea271175081f28716f6166ce04/product-catalog/builder/serverless.yml#L14. Significantly, the self-reference allows identical declarations within and outside of the same referenced serverless file, so that definitions shared between serverless config files can reduce duplicate declarations. This allows a file to be properly interpreted whether loaded as the primary config or as a dependency within another config. An envisioned option would be a new reference type that would be local to the file it's declared in, but that work was never prioritized.
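To make the self-reference idea concrete, a minimal illustrative fragment (the names here are invented for this example) might look like:

```yaml
# shared.yml: resolves the same way whether it is loaded directly as the
# primary config or pulled in as a dependency of another config, because
# ${self:...} always points at this file's own (eventual) values.
custom:
  stage: ${opt:stage, 'dev'}
  tableName: purger-${self:custom.stage}
```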

@dead10ck

@erikerikson ahh, I see, I wasn't aware of those features that dynamically load a wholly separate serverless config. That actually makes sense, thanks for the explanation!

One thing that comes to mind for me is that this is a bit like how Terraform does modules, where the module may be external or local. I think this is why Terraform requires the init command before running; I believe that command fetches and resolves all dependencies, so that all configs are prepared and the terraform command can do a complete static analysis of the entire set of resource definitions, including those that come from other sources. Not to keep throwing suggestions for more work at you guys, but perhaps something analogous is possible with serverless (though I'm not sure if it would help with the self-reference feature; that's an interesting one).

@erikerikson
Member

What you describe is, in effect, what serverless does every time a command requiring service population is run. An advantage of this is that configuration drift is reduced. That said you might want to open an issue and advocate for the feature if you have a valuable use case that could be enabled by the change.

@stephan-nordnes-eriksen

Thanks for the info here! I think the message could be worded more towards "still waiting", and hopefully also print an "all promises completed" message if/when that happens. The way it is now makes me think there is something wrong and that I have to restart the whole deployment process because it wasn't able to fetch all my parameters.

Perhaps this wording could work?

X out of Y promises have settled
(Y - X) promises are taking longer than expected. 
Promise 1: ssm:promiseName~true still waiting
Promise 2: ssm: promiseNameTwo~true still waiting
...

And then

All promises completed after X seconds.

or

(Y - X) promises failed after X seconds

This would give me the confidence that my deployment didn't fail, or perhaps didn't get all the data it needed.

@erikerikson
Member

@stephan-nordnes-eriksen (and anyone else who cares to express a preference) Do you like or have alternatives to these messages?

Progress:

############################################################################################
# 2: 3 of 5 promises have settled
# 2: 2 are taking longer than expected:
# 2:   foo waited on by: ${foo:}
# 2:   bar waited on by: ${bar:}
# This can result from latent connections but may represent a cyclic variable dependency
############################################################################################

Completion:

############################################################################################
# Completed after 6ms
# 5 promises are in the following states:
#   resolved: 3
#   rejected: 1
#   pending:  1
############################################################################################

@stephan-nordnes-eriksen

@erikerikson Not that I am in any way an authority on this, but this looks great to me! Thanks a lot!

@Rhymond

Rhymond commented Nov 15, 2019

agree with @stephan-nordnes-eriksen. It looks informative and descriptive enough to understand what's happening. Thanks @erikerikson!
