Error - Placement group is in use and may not be deleted. Starcluster 0.94 #277

Open
johnbot1 opened this issue Jul 25, 2013 · 6 comments

@johnbot1

Hi,

I'm able to successfully start a 2-node cluster (master, node001) in us-west-2, using an m1.xlarge as the head and an HVM cc2.8xlarge as node001, without a problem. Terminating the cluster, however, fails with an error when it attempts to remove the placement group @sc-testcluster1. Any ideas?

Thanks
John

ubuntu@ip-172-31-32-8:~$ starcluster terminate  testcluster1
StarCluster - (http://star.mit.edu/cluster) (v. 0.94)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu

Terminate EBS cluster testcluster1 (y/n)? y
>>> Running plugin starcluster.plugins.pkginstaller.PackageInstaller
>>> Running plugin starcluster.plugins.users.CreateUsers
>>> Running plugin starcluster.plugins.sge.SGEPlugin
>>> Running plugin starcluster.clustersetup.DefaultClusterSetup
>>> Terminating node: master (i-728bed45)
>>> Terminating node: node001 (i-dc88eeeb)
>>> Canceling spot instance request: sir-8f28603d
>>> Waiting for cluster to terminate... 
>>> Removing @sc-testcluster1 placement group
!!! ERROR - Failed to terminate cluster!

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94-py2.7.egg/starcluster/commands/terminate.py", line 87, in terminate
    self._terminate_cluster(cl)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94-py2.7.egg/starcluster/commands/terminate.py", line 64, in _terminate_cluster
    cl.terminate_cluster()
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94-py2.7.egg/starcluster/cluster.py", line 1434, in terminate_cluster
    pg.delete()
  File "/usr/local/lib/python2.7/dist-packages/boto-2.9.8-py2.7.egg/boto/ec2/placementgroup.py", line 49, in delete
    return self.connection.delete_placement_group(self.name)
  File "/usr/local/lib/python2.7/dist-packages/boto-2.9.8-py2.7.egg/boto/ec2/connection.py", line 3208, in delete_placement_group
    return self.get_status('DeletePlacementGroup', params, verb='POST')
  File "/usr/local/lib/python2.7/dist-packages/boto-2.9.8-py2.7.egg/boto/connection.py", line 1096, in get_status
    raise self.ResponseError(response.status, response.reason, body)
EC2ResponseError: EC2ResponseError: 400 Bad Request
<?xml version="1.0" encoding="UTF-8"?>
<Response><Errors><Error><Code>InvalidPlacementGroup.InUse</Code><Message>The placement group '@sc-testcluster1' is in use and may not be deleted.</Message></Error></Errors><RequestID>6562a634-e123-4b96-9c6a-d2e35793fd05</RequestID></Response>
!!! ERROR - Use -f to forcefully terminate the cluster
!!! ERROR - InvalidPlacementGroup.InUse: The placement group '@sc-testcluster1' is in use and may not be deleted.
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94-py2.7.egg/starcluster/cli.py", line 274, in main
    sc.execute(args)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94-py2.7.egg/starcluster/commands/terminate.py", line 101, in execute
    self.terminate(cluster_name, force=self.opts.force)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94-py2.7.egg/starcluster/commands/terminate.py", line 87, in terminate
    self._terminate_cluster(cl)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94-py2.7.egg/starcluster/commands/terminate.py", line 64, in _terminate_cluster
    cl.terminate_cluster()
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94-py2.7.egg/starcluster/cluster.py", line 1434, in terminate_cluster
    pg.delete()
  File "/usr/local/lib/python2.7/dist-packages/boto-2.9.8-py2.7.egg/boto/ec2/placementgroup.py", line 49, in delete
    return self.connection.delete_placement_group(self.name)
  File "/usr/local/lib/python2.7/dist-packages/boto-2.9.8-py2.7.egg/boto/ec2/connection.py", line 3208, in delete_placement_group
    return self.get_status('DeletePlacementGroup', params, verb='POST')
  File "/usr/local/lib/python2.7/dist-packages/boto-2.9.8-py2.7.egg/boto/connection.py", line 1096, in get_status
    raise self.ResponseError(response.status, response.reason, body)
EC2ResponseError: EC2ResponseError: 400 Bad Request
<?xml version="1.0" encoding="UTF-8"?>
<Response><Errors><Error><Code>InvalidPlacementGroup.InUse</Code><Message>The placement group '@sc-testcluster1' is in use and may not be deleted.</Message></Error></Errors><RequestID>6562a634-e123-4b96-9c6a-d2e35793fd05</RequestID></Response>
@johnbot1
Author

It appears to happen very fast, so I wonder if it's not waiting long enough for the instances using the security group to terminate? I should add that the worker node001 was a spot instance, if that helps.

@johnbot1
Author

After terminating the cluster, only the placement group remains. Terminating again with the -f (--force) option successfully removes the placement group.

ubuntu@ip-172-31-32-8:~$ starcluster listclusters
StarCluster - (http://star.mit.edu/cluster) (v. 0.94)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu

-----------------------------------------------
testcluster1 (security group: @sc-testcluster1)
-----------------------------------------------
Launch time: N/A
Uptime: N/A
Zone: N/A
Keypair: N/A
EBS volumes: N/A
Cluster nodes: N/A

ubuntu@ip-172-31-32-8:~$ starcluster  terminate -f testcluster1
StarCluster - (http://star.mit.edu/cluster) (v. 0.94)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu

*** WARNING - Ignoring cluster settings due to --force option
Terminate cluster testcluster1 (y/n)? y
*** WARNING - Cannot run plugins: No master node found!
>>> Waiting for cluster to terminate... 
>>> Removing @sc-testcluster1 placement group
>>> Removing @sc-testcluster1 security group... 

@FinchPowers
Contributor

Try #218

@jtriley
Owner

jtriley commented Aug 6, 2013

@FinchPowers I'm not sure #218 will fix this. The exception is being raised on the first call to pg.delete() so the "waiting for placement group/security group to delete" stuff wouldn't even be involved in this case...

I'm going to try to reproduce this and figure out what condition we need to wait for...unfortunately we might just have to do yet another try/except loop until it's successful.
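
For illustration only, the "try/except loop until it's successful" idea might look roughly like the sketch below. It is not StarCluster's code: `PlacementGroupInUse` and `delete_with_retry` are hypothetical names, and the exception class is a stand-in for the `EC2ResponseError` seen in the traceback (with boto 2 one would presumably catch that error and check that its `error_code` is `InvalidPlacementGroup.InUse`). The retry count and delay are arbitrary assumptions.

```python
import time


class PlacementGroupInUse(Exception):
    """Hypothetical stand-in for an EC2 'InvalidPlacementGroup.InUse' error."""


def delete_with_retry(delete, retries=10, delay=5.0, sleep=time.sleep):
    """Call delete() until it succeeds or the retries are exhausted.

    `delete` is any zero-argument callable, e.g. a bound pg.delete.
    Returns the number of attempts made; re-raises the last
    PlacementGroupInUse if every attempt fails.
    """
    for attempt in range(1, retries + 1):
        try:
            delete()
            return attempt
        except PlacementGroupInUse:
            if attempt == retries:
                raise  # give up and surface the original error
            sleep(delay)  # give EC2 time to release the group
```

Any other exception propagates immediately, so only the "in use" condition is retried.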

@jtriley
Owner

jtriley commented Aug 6, 2013

@FinchPowers Also, IIRC placement groups and security groups are not linked, so it really just comes down to whether the instances are terminated. Perhaps this could be related to the spot request not completely closing before terminating the PG? Worth testing...

@FinchPowers
Contributor

@jtriley You are right, this will not fix it. As you said, SC probably needs to wait for complete instance termination before making the call to delete the placement group.
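
A minimal sketch of that "wait for complete termination first" approach, under stated assumptions: `wait_for_terminated` and `get_states` are hypothetical names, not StarCluster or boto APIs. `get_states` is assumed to return the current EC2 state name of each cluster instance (for example `shutting-down` or `terminated`), and the timeout and polling interval are arbitrary.

```python
import time


def wait_for_terminated(get_states, timeout=300.0, interval=5.0,
                        sleep=time.sleep, clock=time.time):
    """Poll until every instance reports the 'terminated' state.

    `get_states` is a hypothetical zero-argument callable returning a
    list of EC2 state names, one per instance in the cluster.
    Returns True if all instances terminated before the timeout,
    False otherwise.
    """
    deadline = clock() + timeout
    while True:
        states = get_states()
        if all(state == "terminated" for state in states):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The placement group would then be deleted only after this returns True; whether that alone is sufficient for spot instances is exactly the open question in this thread.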
