Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[salt-cloud] libcloud >= 1.0.0 incompatible regarding node_state #33367

Closed
supertom opened this issue May 19, 2016 · 5 comments
Closed

[salt-cloud] libcloud >= 1.0.0 incompatible regarding node_state #33367

supertom opened this issue May 19, 2016 · 5 comments
Labels
Bug broken, incorrect, or confusing behavior P2 Priority 2 RIoT Relates to integration with cloud providers, hypervisors, API-based services, etc. Salt-Cloud severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around
Milestone

Comments

@supertom
Copy link
Contributor

Description of Issue/Question

The data type of NodeState members has changed in libcloud >= 1.0. In the latest stable (0.20.1), it is an integer, in >= 1.0.0, it is a string.

0.20.1
https://github.com/apache/libcloud/blob/87deb04498ea8bb5068e362810c3d2ab89359b48/libcloud/compute/types.py#L192

In version 1.0.0, it is a string

1.0.0
https://github.com/apache/libcloud/blob/735a786e05590a6a1425bff042e8cd35206579b2/libcloud/compute/types.py#L244

Salt-Cloud, explicitly the node_state function, is expecting an integer.

def node_state(id_):

This results in an error:
Failed to execute 'gce.list_nodes()' while querying for running nodes: 'running', which ultimately affects the deletion of nodes.

For context, here's the pull request in libcloud referencing where the change was made and the discussion behind it.

Possible Solutions

I see a few options to resolve this.

  • libcloud version guard for Salt. Set a max libcloud version < 1.0.0, probably around here
  • Push it on the provider wrappers. Have each wrapper do their own checks for data type and version
  • Have node_state do a data type (kind of a hack) or a version check

The first option is the easiest, of course. It's also not clear to me the future of node_state, as it's not used in too many places. Node state is an important concept; rather than a utility function, perhaps we should make it an enum or a class (in libcloud it's a class with defined data members) drivers can directly reference?

Happy to implement the fix but would like some feedback/thoughts before I do. FYI @erjohnso FYI @tonybaloney

Steps to Reproduce Issue

  1. Install libcloud >=1.0.0
  2. pip install -I apache-libcloud==1.0.0-rc2
  3. create a cloud node (tested on GCE, but reading the code, looks like ec2 would be affected)
  4. Run commands
    1. (delete a node) salt-cloud -d instance-name (No machines were found to be destroyed)
    2. (query nodes) salt-cloud -Q (no response)
  5. With the debug flag set, we see:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/salt/cloud/__init__.py", line 2352, in run_parallel_map_providers_query
    cloud.clouds[data['fun']]()
  File "/usr/lib/python2.7/dist-packages/salt/cloud/libcloudfuncs.py", line 443, in list_nodes
    'state': node_state(node.state)
  File "/usr/lib/python2.7/dist-packages/salt/cloud/libcloudfuncs.py", line 67, in node_state
    return states[id_]
KeyError: 'running' 

Versions Report

All recent salt-cloud versions expect node.state to be an integer

@Ch3LL
Copy link
Contributor

Ch3LL commented May 20, 2016

@supertom i'm not able to replicate this behavior on ec2. Here is my test case:

➜  salt git:(tagv2015.8.8.2) ✗ salt --version
salt 2015.8.8.2 (Beryllium)
  1. created the ec2 vm sudo salt-cloud --no-deploy -p ec2_centos_6 ch3ll_centos_6_1
  2. Then i deleted the vm with salt-cloud -d ch3ll_centos_6_1 with no stack trace:
amazon:
    ----------
    ec2:
        ----------
        ch3ll_centos_6_1:
            ----------
            currentState:
                ----------
                code:
                    32
                name:
                    shutting-down
            instanceId:
                i-6a4143ed
            newname:
                ch3ll_centos_6_1-DEL4917b51fa4284316bf77dfe0facbf07b
            previousState:
                ----------
                code:
                    16
                name:
                    running

as you can see my libcloud version is 1.0.0rc2:

➜  salt git:(tagv2015.8.8.2) ✗ pip list | grep -i libcloud
apache-libcloud (1.0.0rc2)
You are using pip version 8.1.1, however version 8.1.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

what version of salt are you running this on?

@Ch3LL Ch3LL added cannot-reproduce cannot be replicated with info/context provided info-needed waiting for more info labels May 20, 2016
@Ch3LL Ch3LL added this to the Blocked milestone May 20, 2016
@supertom
Copy link
Contributor Author

supertom commented May 23, 2016

@Ch3LL Thanks for taking a look.

Any provider that's delegating the list_nodes action to SaltStack rather than providing its own is possibly affected. I misspoke previously, EC2 isn't affected (see below as to why).

GCE is definitely affected. Test case for GCE is simply:

  • install latest salt; install libcloud 1.0.0x
  • salt-cloud --no-deploy -p zone1a test-minion
  • salt-cloud -d zone1a test-minion

You'll receive the error message: No machines were found to be destroyed. Stack trace shows:

  • KeyError: 'running'

(As I mentioned earlier, it's because node_state expects an int, but the new libcloud now returns a string.)

I'll read the code for some other providers and see if they are affected, but I don't have test suites to confirm that. Do you have those available? Others possibly affected look to be: dimensiondata, rackspace, cloudstack, and openstack.

EC2 isn't affected as it doesn't delegate this action. There is code directly in SaltStack to handle this.

We can code around this, as was done for EC2, but I think others will be affected, and then points to some potentially dead/not-useful code in libcloudfuncs.

Version Report:

Dependency Versions:
Jinja2: 2.6
M2Crypto: 0.21.1
Mako: Not Installed
PyYAML: 3.10
PyZMQ: 13.1.0
Python: 2.7.3 (default, Mar 13 2014, 11:03:55)
RAET: Not Installed
Tornado: 4.3
ZMQ: 3.2.3
cffi: Not Installed
cherrypy: Not Installed
dateutil: Not Installed
gitdb: Not Installed
gitpython: Not Installed
ioflo: Not Installed
libgit2: Not Installed
libnacl: Not Installed
msgpack-pure: Not Installed
msgpack-python: 0.1.10
mysql-python: Not Installed
pycparser: Not Installed
pycrypto: 2.6
pygit2: Not Installed
python-gnupg: Not Installed
smmap: Not Installed
timelib: Not Installed

System Versions:
dist: debian 7.10
machine: x86_64
release: 3.2.0-4-amd64
system: debian 7.10

@tonybaloney
Copy link
Contributor

Option 3. You already have the six library loaded. IMHO this is the best option because SaltStack doesn't dictate libcloud version (only minimum) and my perception is that 0.14 is most common. libcloud 1.0 (proper) is coming out v. soon so we need to support both in 2016.3.1

  • Have node_state do a data type (kind of a hack) or a version check

@Ch3LL
Copy link
Contributor

Ch3LL commented Jun 2, 2016

@supertom thanks for the clarification and pointing out I was just testing with the wrong cloud provider. Looks like I am actually seeing this with rackspace:

[DEBUG   ] Failed to execute 'openstack.list_nodes()' while querying for running nodes: 'running'
Traceback (most recent call last):
  File "/home/ch3ll/git/salt/salt/cloud/__init__.py", line 2364, in run_parallel_map_providers_query
    cloud.clouds[data['fun']]()
  File "/home/ch3ll/git/salt/salt/cloud/libcloudfuncs.py", line 443, in list_nodes
    'state': node_state(node.state)
  File "/home/ch3ll/git/salt/salt/cloud/libcloudfuncs.py", line 67, in node_state
    return states[id_]
KeyError: 'running'

Looks like @tonybaloney has created a fix. I did a quick smoke test with the fix and it seems to be working. Would you mind seeing if that fixes the issue on GCE?

@Ch3LL Ch3LL added Bug broken, incorrect, or confusing behavior P2 Priority 2 severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around Salt-Cloud RIoT Relates to integration with cloud providers, hypervisors, API-based services, etc. and removed cannot-reproduce cannot be replicated with info/context provided info-needed waiting for more info labels Jun 2, 2016
@Ch3LL Ch3LL modified the milestones: Approved, Blocked Jun 2, 2016
@supertom
Copy link
Contributor Author

supertom commented Jun 9, 2016

@Ch3LL Tested with GCE. LGTM.

@supertom supertom closed this as completed Jun 9, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug broken, incorrect, or confusing behavior P2 Priority 2 RIoT Relates to integration with cloud providers, hypervisors, API-based services, etc. Salt-Cloud severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around
Projects
None yet
Development

No branches or pull requests

3 participants