start command fails intermittently when creating a security group in a VPC #413

Open
antonio-osorio opened this Issue Jul 18, 2014 · 0 comments

Hi,
When starting clusters inside a VPC, I intermittently get the following error:

...
>>> Cluster template settings are valid
>>> Starting cluster...
>>> Launching a 1-node VPC cluster...
2014-07-18 17:06:10,097 cluster.py:1182 - DEBUG - Launching master (ami: ami-eeb77b86, type: g2.2xlarge)
>>> Creating security group @sc-my_cluster...
!!! ERROR - InvalidGroup.NotFound: The security group 'sg-0ee39c6b' does not exist
Traceback (most recent call last):
  File "/Users/xxxxx/Projects/external/StarCluster/starcluster/cli.py", line 274, in main
    sc.execute(args)
  File "/Users/xxxxx/Projects/external/StarCluster/starcluster/commands/start.py", line 244, in execute
    validate_running=validate_running)
  File "/Users/xxxxx/Projects/external/StarCluster/starcluster/cluster.py", line 1634, in start
    return self._start(create=create, create_only=create_only)
  File "<string>", line 2, in _start
  File "/Users/xxxxx/Projects/external/StarCluster/starcluster/utils.py", line 112, in wrap_f
    res = func(*arg, **kargs)
  File "/Users/xxxxx/Projects/external/StarCluster/starcluster/cluster.py", line 1649, in _start
    self.create_cluster()
  File "/Users/xxxxx/Projects/external/StarCluster/starcluster/cluster.py", line 1163, in create_cluster
    self._create_flat_rate_cluster()
  File "/Users/xxxxx/Projects/external/StarCluster/starcluster/cluster.py", line 1185, in _create_flat_rate_cluster
    force_flat=True)[0]
  File "/Users/xxxxx/Projects/external/StarCluster/starcluster/cluster.py", line 926, in create_nodes
    cluster_sg = self.cluster_group.name
  File "/Users/xxxxx/Projects/external/StarCluster/starcluster/cluster.py", line 655, in cluster_group
    vpc_id=vpc_id)
  File "/Users/xxxxx/Projects/external/StarCluster/starcluster/awsutils.py", line 307, in create_group
    to_port=ssh_port, cidr_ip=static.WORLD_CIDRIP)
  File "/Users/xxxxx/.virtualenvs/starcluster/lib/python2.7/site-packages/boto-2.31.1-py2.7.egg/boto/ec2/securitygroup.py", line 204, in authorize
    dry_run=dry_run)
  File "/Users/xxxxx/.virtualenvs/starcluster/lib/python2.7/site-packages/boto-2.31.1-py2.7.egg/boto/ec2/connection.py", line 3179, in authorize_security_group
    params, verb='POST')
  File "/Users/xxxxx/.virtualenvs/starcluster/lib/python2.7/site-packages/boto-2.31.1-py2.7.egg/boto/connection.py", line 1197, in get_status
    raise self.ResponseError(response.status, response.reason, body)
EC2ResponseError: EC2ResponseError: 400 Bad Request
<?xml version="1.0" encoding="UTF-8"?>
<Response><Errors><Error><Code>InvalidGroup.NotFound</Code><Message>The security group 'sg-0ee39c6b' does not exist</Message></Error></Errors><RequestID>0fe1f64d-76db-490d-8b78-6e4a6f649f25</RequestID></Response>

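After digging through the code, this seems to happen when StarCluster tries to modify a security group it has only just created: per the traceback, create_group in awsutils.py calls authorize on the new group, and EC2 responds that the group does not exist.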

Might be related to: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/query-api-troubleshooting.html#eventual-consistency
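
If eventual consistency is indeed the cause, one possible workaround is to retry the call with a short backoff until the new group becomes visible. The following is only a minimal sketch, not StarCluster's actual code: `retry_on_error_code` is a hypothetical helper, and it assumes the raised exception exposes the EC2 error code as an `error_code` attribute, which I believe boto 2's `EC2ResponseError` does.

```python
import time


def retry_on_error_code(func, error_code="InvalidGroup.NotFound",
                        attempts=5, delay=1.0, sleep=time.sleep):
    """Call func(), retrying while it raises an error carrying error_code.

    Hypothetical helper for illustration; not part of StarCluster or boto.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:
            # Assumption: the exception carries the EC2 error code in an
            # `error_code` attribute. Anything else is re-raised untouched.
            if getattr(exc, "error_code", None) != error_code:
                raise
            # Give up after the last attempt and surface the original error.
            if attempt == attempts:
                raise
            # Exponential backoff: delay, 2*delay, 4*delay, ...
            sleep(delay * 2 ** (attempt - 1))
```

The `authorize` call in `create_group` could then be wrapped in a lambda or small function and passed as `func`, so that only the `InvalidGroup.NotFound` case is retried and every other error still fails immediately.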
