Nice2have: better boto error handling when AWS service isn't available (here: some authentication problems) #30808

Closed
Reiner030 opened this issue Feb 2, 2016 · 7 comments
Labels
  • Bug: broken, incorrect, or confusing behavior
  • fixed-pls-verify: fix is linked, bug author to confirm fix
  • P2: Priority 2
  • RIoT: Relates to integration with cloud providers, hypervisors, API-based services, etc.
  • severity-medium: 3rd level, incorrect or bad functionality, confusing and lacks a work around

@Reiner030

Hello,

Today I ran into problems during a highstate because boto calls are not working as expected:

root@ip-172-31-24-226:~# salt-call boto_ec2.get_id salt solr-master-euc1-01 -l debug
[DEBUG   ] Reading configuration from /etc/salt/minion
[DEBUG   ] Including configuration from '/etc/salt/minion.d/_schedule.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/_schedule.conf
[DEBUG   ] Configuration file path: /etc/salt/minion
[WARNING ] Insecure logging configuration detected! Sensitive data may be logged.
[DEBUG   ] Reading configuration from /etc/salt/minion
[DEBUG   ] Including configuration from '/etc/salt/minion.d/_schedule.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/_schedule.conf
[DEBUG   ] Please install 'virt-what' to improve results of the 'virtual' grain.
[DEBUG   ] Initializing new SAuth for ('/etc/salt/pki/minion', 'solr-master-euc1-01', 'tcp://52.28.xx.xx:4506')
[DEBUG   ] Generated random reconnect delay between '1000ms' and '11000ms' (2732)
[DEBUG   ] Setting zmq_reconnect_ivl to '2732ms'
[DEBUG   ] Setting zmq_reconnect_ivl_max to '11000ms'
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'solr-master-euc1-01', 'tcp://52.28.xx.xx:4506', 'aes')
[DEBUG   ] Initializing new SAuth for ('/etc/salt/pki/minion', 'solr-master-euc1-01', 'tcp://52.28.xx.xx:4506')
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'solr-master-euc1-01', 'tcp://52.28.xx.xx:4506', 'clear')
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[DEBUG   ] LazyLoaded jinja.render
[DEBUG   ] LazyLoaded yaml.render
[DEBUG   ] LazyLoaded boto.assign_funcs
[DEBUG   ] LazyLoaded boto_ec2.get_id
[ERROR   ] An un-handled exception was caught by salt's global exception handler:
NameError: global name '__salt__' is not defined
Traceback (most recent call last):
  File "/usr/bin/salt-call", line 11, in <module>
    salt_call()
  File "/usr/lib/python2.7/dist-packages/salt/scripts.py", line 335, in salt_call
    client.run()
  File "/usr/lib/python2.7/dist-packages/salt/cli/call.py", line 53, in run
    caller.run()
  File "/usr/lib/python2.7/dist-packages/salt/cli/caller.py", line 133, in run
    ret = self.call()
  File "/usr/lib/python2.7/dist-packages/salt/cli/caller.py", line 196, in call
    ret['return'] = func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/salt/modules/boto_ec2.py", line 191, in get_id
    keyid=keyid, profile=profile)
  File "/usr/lib/python2.7/dist-packages/salt/modules/boto_ec2.py", line 120, in find_instances
    conn = _get_conn(region=region, key=key, keyid=keyid, profile=profile)
  File "/usr/lib/python2.7/dist-packages/salt/utils/boto.py", line 176, in get_connection
    keyid, profile)
  File "/usr/lib/python2.7/dist-packages/salt/utils/boto.py", line 91, in _get_profile
    if not region and __salt__['config.option'](service + '.region'):
NameError: global name '__salt__' is not defined
Traceback (most recent call last):
  File "/usr/bin/salt-call", line 11, in <module>
    salt_call()
  File "/usr/lib/python2.7/dist-packages/salt/scripts.py", line 335, in salt_call
    client.run()
  File "/usr/lib/python2.7/dist-packages/salt/cli/call.py", line 53, in run
    caller.run()
  File "/usr/lib/python2.7/dist-packages/salt/cli/caller.py", line 133, in run
    ret = self.call()
  File "/usr/lib/python2.7/dist-packages/salt/cli/caller.py", line 196, in call
    ret['return'] = func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/salt/modules/boto_ec2.py", line 191, in get_id
    keyid=keyid, profile=profile)
  File "/usr/lib/python2.7/dist-packages/salt/modules/boto_ec2.py", line 120, in find_instances
    conn = _get_conn(region=region, key=key, keyid=keyid, profile=profile)
  File "/usr/lib/python2.7/dist-packages/salt/utils/boto.py", line 176, in get_connection
    keyid, profile)
  File "/usr/lib/python2.7/dist-packages/salt/utils/boto.py", line 91, in _get_profile
    if not region and __salt__['config.option'](service + '.region'):
NameError: global name '__salt__' is not defined
****
root@ip-172-31-24-226:~# salt-call --version
salt-call 2015.8.4 (Beryllium)
root@ip-172-31-24-226:~# lsb_release -d
Description:    Debian GNU/Linux 8.1 (jessie)
root@ip-172-31-24-226:~# apt-cache show python-boto | grep Version
Version: 2.34.0-2
  • Also tested with the updated boto 2.38.0-1 from sid: same problem.
  • My previously set-up instances can still run this command,
    but the new ones (which are created from a cloud profile) cannot.
  • The IAM profile is working, checked with, for example:
root@ip-172-31-24-226:~# wget -qO- http://169.254.169.254/latest/meta-data/iam/info ; echo
{
  "Code" : "Success",
  "LastUpdated" : "2016-02-02T13:03:22Z",
  "InstanceProfileArn" : "arn:aws:iam::123456789012:instance-profile/solr-master",
  "InstanceProfileId" : "AIPAJQXXFIYGXXXXXXXXX"
}
  • AWS seems to be having a temporary credentials problem, again ^^...
root@ip-172-31-24-226:~# aws ec2 describe-instances --filters 'Name=tag:Name,Values=solr-slave-euc1-01' --region=eu-central-1

A client error (AuthFailure) occurred when calling the DescribeInstances operation: AWS was not able to validate the provided access credentials
@Reiner030
Author

Ah, I forgot to mention why better error handling would be nice ^^...

In my states I have a function call to get the instance ID, and today, for new instances, I only get this error:

2016-02-02 14:17:23,927 [salt.state       ][CRITICAL][2725] Rendering SLS 'prod:solr' failed: mapping values are not allowed here; line 34

---
[...]
  File "/usr/lib/python2.7/dist-packages/salt/modules/boto_ec2.py", line 120, in find_instances
    conn = _get_conn(region=region, key=key, keyid=keyid, profile=profile)
  File "/usr/lib/python2.7/dist-packages/salt/utils/boto.py", line 176, in get_connection
    keyid, profile)
  File "/usr/lib/python2.7/dist-packages/salt/utils/boto.py", line 96, in _get_profile
    if not key and __salt__['config.option'](service + '.key'):    <======================
NameError: global name '__salt__' is not defined
Traceback (most recent call last):
  File "/usr/bin/salt-call", line 11, in <module>
    salt_call()
  File "/usr/lib/python2.7/dist-packages/salt/scripts.py", line 335, in salt_call
[...]
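
As a rough illustration of the kind of error handling meant here (a minimal sketch, not Salt's or boto's actual code): the lookup is wrapped so that an authentication failure comes back as a plain result dictionary instead of a traceback that ends up in the rendered SLS. `AuthFailure`, `safe_get_id`, and `broken_lookup` are hypothetical names invented for this example.

```python
class AuthFailure(Exception):
    """Hypothetical stand-in for the error raised on rejected credentials."""


def safe_get_id(lookup, name):
    """Call lookup(name) and return a result dict instead of raising.

    `lookup` is any callable that returns an instance ID or raises.
    """
    try:
        return {"result": True, "id": lookup(name)}
    except AuthFailure as exc:
        # Known failure mode: report it in one readable line.
        return {"result": False,
                "error": "AWS authentication failed: {0}".format(exc)}
    except Exception as exc:
        # Anything else: still no raw traceback in the caller's output.
        return {"result": False,
                "error": "{0}: {1}".format(type(exc).__name__, exc)}


def broken_lookup(name):
    # Simulates the AuthFailure reported by the aws CLI in this issue.
    raise AuthFailure(
        "AWS was not able to validate the provided access credentials")


outcome = safe_get_id(broken_lookup, "solr-master-euc1-01")
print(outcome["error"])
```

A state or template could then check `outcome["result"]` and fail with a clear message, rather than trying to parse a traceback as YAML.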

@jfindlay added the Bug, severity-medium, P2, and RIoT labels on Feb 3, 2016
@jfindlay added this to the Approved milestone on Feb 3, 2016
@jfindlay
Contributor

jfindlay commented Feb 3, 2016

@Reiner030, thanks for the report. Have you upgraded salt between the time that this was working and stopped working? I suspect there is more happening here than misplaced error messages since from what I know, the loader does not make __salt__ available for salt.utils.
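
To illustrate the point about the loader, here is a simplified sketch (not Salt's real loader) of what happens when a function relies on an injected `__salt__` global. `load`, `get_region`, and `fake_salt` are made-up names for this example; the only assumption is that the loader injects `__salt__` into some modules' globals and not into others.

```python
import types

# Source of a function that, like _get_profile in the traceback,
# expects a __salt__ global to exist at call time.
SOURCE = '''
def get_region(service):
    return __salt__['config.option'](service + '.region')
'''


def load(name, inject=None):
    """Build a module from SOURCE, optionally injecting extra globals."""
    module = types.ModuleType(name)
    if inject:
        module.__dict__.update(inject)
    exec(SOURCE, module.__dict__)
    return module


# Hypothetical replacement for the real __salt__ dictionary.
fake_salt = {
    'config.option': lambda key: 'eu-central-1' if key == 'ec2.region' else None,
}

# With injection the lookup works.
loaded = load('demo_execution_module', {'__salt__': fake_salt})
print(loaded.get_region('ec2'))

# Without injection the same code fails with a NameError on __salt__.
plain = load('demo_utils_module')
error = None
try:
    plain.get_region('ec2')
except NameError as exc:
    error = str(exc)
print(error)
```

Note that the `NameError` only appears once the offending line is actually reached, which would fit the failure showing up on some instances and not on others.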

@Reiner030
Author

Hello @jfindlay, no, all instances are running the same version.
The only difference is that AWS authentication seems to have been broken on new instances since yesterday...

I also tested with the AWS CLI directly (which is also installed in the Debian image):

root@jenkins-slave-euc1-03:~# aws ec2 describe-instances --filters 'Name=tag:Name,Values=jenkins-slave-euc1-03' --region=eu-central-1

A client error (AuthFailure) occurred when calling the DescribeInstances operation: AWS was not able to validate the provided access credentials
root@jenkins-slave-euc1-03:~# aws configure list
      Name                    Value             Type    Location
      ----                    -----             ----    --------
   profile                <not set>             None    None
access_key     ****************YYJA         iam-role
secret_key     ****************VOng         iam-role
    region                <not set>             None    None

(The only publicly documented cause I found for this problem, an incorrect clock on the instance, has been checked and is not the case here, which is hardly surprising on a freshly set-up instance ;) )
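
For anyone who wants to rule out the clock as well, a minimal sketch of how the local time could be compared against the `Date` header of any HTTP response. `clock_skew_seconds` is a hypothetical helper written for this example, and the acceptable skew is left open because the exact tolerance is not stated anywhere in this thread.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def clock_skew_seconds(date_header, local_now=None):
    """Return local time minus the server time from an HTTP Date header."""
    remote = parsedate_to_datetime(date_header)
    local = local_now or datetime.now(timezone.utc)
    return (local - remote).total_seconds()


# Example with fixed values so the result is reproducible:
# the local clock is 30 seconds ahead of the server.
skew = clock_skew_seconds(
    "Tue, 02 Feb 2016 13:03:22 GMT",
    datetime(2016, 2, 2, 13, 3, 52, tzinfo=timezone.utc),
)
print(skew)  # 30.0
```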

@rallytime
Contributor

@Reiner030 I have fixed this stacktrace bug in #30867. Apologies for the inconvenience. Are you in a position to give that change a try and confirm this issue is resolved for you?

@rallytime added the fixed-pls-verify label on Feb 3, 2016
@rallytime self-assigned this on Feb 3, 2016
@Reiner030
Author

Hello @rallytime, thanks for improving the function. So far I am only running the "stable" Debian packages from your repository, so testing is not possible for me at the moment.

@rallytime rallytime modified the milestones: B 3, Approved Feb 4, 2016
@rallytime
Contributor

@Reiner030 No problem. The fix, once merged, will be available in the 2015.8.6 release.

@rallytime
Contributor

This is fixed in 2015.8.7, which has now been released. This is also a duplicate of #30300. As such, I am going to close this issue. If this bug pops up again, leave a comment and we can readdress the issue. Thanks!
