Security Group patience #189

Closed
FinchPowers opened this Issue Dec 17, 2012 · 2 comments

@FinchPowers

Creating security group @sc-test...
!!! ERROR - InvalidGroup.NotFound: The security group 'sg-12e2d77a' does not exist

We'll need to put some patience there too.

@FinchPowers FinchPowers added a commit to datacratic/StarCluster that referenced this issue Dec 17, 2012
@FinchPowers FinchPowers #189 added patience when creating a security grp a03a6d2
@jtriley jtriley added a commit that closed this issue Dec 29, 2012
@jtriley awsutils: refetch group before calling authorize
In some cases operating on a security group object immediately after
it's been created returns a DNE error due to a race condition in the EC2
API back-end. Updated EasyEC2.create_group to avoid this issue by
waiting until the new group can successfully be refetched via
EasyEC2.get_group_or_none before calling authorize() or returning the
new group to the caller.

closes gh-189
closes gh-190
a248562
@jtriley jtriley closed this in a248562 Dec 29, 2012
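
The wait-and-refetch approach described in the commit message above could look roughly like this. It is a minimal sketch, not the actual StarCluster code: wait_for_group and its fetch_group parameter are hypothetical names, with fetch_group standing in for EasyEC2.get_group_or_none (assumed to return None while the group is not yet visible).

```python
import time


def wait_for_group(fetch_group, name, timeout=30.0, interval=1.0,
                   sleep=time.sleep):
    """Poll fetch_group(name) until it returns a group or we time out.

    fetch_group is a hypothetical callable standing in for
    EasyEC2.get_group_or_none; it must return None while the group
    cannot be fetched yet.
    """
    waited = 0.0
    while True:
        group = fetch_group(name)
        if group is not None:
            # The group is visible, so it should be safe to hand back.
            return group
        if waited >= timeout:
            raise RuntimeError(
                "security group %r not visible after %.0fs" % (name, timeout))
        sleep(interval)
        waited += interval
```

A caller would create the group first and then do something like group = wait_for_group(ec2.get_group_or_none, "@sc-test") before calling authorize() on it.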
@FinchPowers

It's back.

Launching master node (ami: ami-39e97d50, type: t1.micro)...
Creating security group @sc-test...
!!! ERROR - InvalidGroup.NotFound: The security group 'sg-1a141f72' does not exist
Traceback (most recent call last):
File "/home/fmlheureux/workspace/StarCluster/starcluster/cli.py", line 257, in main
sc.execute(args)
File "/home/fmlheureux/workspace/StarCluster/starcluster/commands/start.py", line 197, in execute
validate_running=validate_running)
File "/home/fmlheureux/workspace/StarCluster/starcluster/cluster.py", line 1572, in start
return self._start(create=create, create_only=create_only)
File "<string>", line 2, in _start
File "/home/fmlheureux/workspace/StarCluster/starcluster/utils.py", line 92, in wrap_f
res = func(arg, *kargs)
File "/home/fmlheureux/workspace/StarCluster/starcluster/cluster.py", line 1587, in _start
self.create_cluster()
File "/home/fmlheureux/workspace/StarCluster/starcluster/cluster.py", line 1090, in create_cluster
self._create_spot_cluster()
File "/home/fmlheureux/workspace/StarCluster/starcluster/cluster.py", line 1143, in _create_spot_cluster
force_flat=force_flat)
File "/home/fmlheureux/workspace/StarCluster/starcluster/cluster.py", line 824, in create_node
spot_bid=spot_bid, force_flat=force_flat)[0]
File "/home/fmlheureux/workspace/StarCluster/starcluster/cluster.py", line 856, in create_nodes
cluster_sg = self.cluster_group.name
File "/home/fmlheureux/workspace/StarCluster/starcluster/cluster.py", line 659, in cluster_group
sg.add_tag(static.VERSION_TAG, str(static.VERSION))
File "/home/fmlheureux/datacratic_starcluster/local/lib/python2.7/site-packages/boto-2.7.0-py2.7.egg/boto/ec2/ec2object.py", line 79, in add_tag
status = self.connection.create_tags([self.id], {key : value})
File "/home/fmlheureux/datacratic_starcluster/local/lib/python2.7/site-packages/boto-2.7.0-py2.7.egg/boto/ec2/connection.py", line 3260, in create_tags
return self.get_status('CreateTags', params, verb='POST')
File "/home/fmlheureux/datacratic_starcluster/local/lib/python2.7/site-packages/boto-2.7.0-py2.7.egg/boto/connection.py", line 1042, in get_status
raise self.ResponseError(response.status, response.reason, body)
EC2ResponseError: EC2ResponseError: 400 Bad Request
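<?xml version="1.0" encoding="UTF-8"?>
<Response><Errors><Error><Code>InvalidGroup.NotFound</Code><Message>The security group 'sg-1a141f72' does not exist</Message></Error></Errors><RequestID>92c1b816-85d1-49ab-bda2-4b3ddf56f41f</RequestID></Response>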

@FinchPowers

It's now quite clear how this can still happen even with the last fix. When I started that cluster, I had terminated another one just seconds before. My theory is the following:

  1. delete is called (terminating the previous cluster)
  2. the new cluster starts
  3. create is called
  4. verification succeeds, but only because the delete call has not propagated yet
  5. delete propagates
  6. we get the error I just posted, because the new create call has not propagated yet
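
If the theory above holds, an existence check alone can be fooled by the stale group, so one option is to also retry the operation that actually fails (sg.add_tag in the traceback). The following is a rough sketch under assumptions, not a tested fix: call_with_patience and is_group_not_found are hypothetical helpers, and the sketch assumes the raised exception exposes the EC2 error code as an error_code attribute.

```python
import time


def is_group_not_found(exc):
    # Assumption: the exception carries the EC2 error code in `error_code`.
    return getattr(exc, "error_code", None) == "InvalidGroup.NotFound"


def call_with_patience(operation, attempts=5, delay=1.0, sleep=time.sleep):
    """Call operation(), retrying while it fails with InvalidGroup.NotFound.

    Any other error, or a NotFound on the last attempt, is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:
            if not is_group_not_found(exc) or attempt == attempts:
                raise
            # Give the create call time to propagate before trying again.
            sleep(delay)
```

Usage would be along the lines of call_with_patience(lambda: sg.add_tag(static.VERSION_TAG, str(static.VERSION))). Retrying the failing call, rather than only checking that the group exists, sidesteps the case where the check sees the old group that is about to disappear.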