nixops destroy frequently fails #58

andy-dean · 2018-01-23T19:20:12Z

I get an error on roughly 10-20% of my "nixops destroy" runs. When the error happens, I end up with a volume that is no longer attached to any EC2 instance.

Here's the command I use:

nixops destroy -d some-deploy --confirm

And here is the console output I see:

warning: are you sure you want to destroy EC2 machine ‘machine’? (y/N) y
machine> destroying EC2 machine... [shutting-down] [shutting-down] [shutting-down] [shutting-down] Traceback (most recent call last):
  File "/nix/store/0h2c0k9mr8y5pvjd3ml30ms5rdf4kia1-nixops-1.5.1/bin/..nixops-wrapped-wrapped", line 951, in <module>
    args.op()
  File "/nix/store/0h2c0k9mr8y5pvjd3ml30ms5rdf4kia1-nixops-1.5.1/bin/..nixops-wrapped-wrapped", line 400, in op_destroy
    wipe=args.wipe)
  File "/nix/store/0h2c0k9mr8y5pvjd3ml30ms5rdf4kia1-nixops-1.5.1/lib/python2.7/site-packages/nixops/deployment.py", line 1073, in destroy_resources
    self._destroy_resources(include, exclude, wipe)
  File "/nix/store/0h2c0k9mr8y5pvjd3ml30ms5rdf4kia1-nixops-1.5.1/lib/python2.7/site-packages/nixops/deployment.py", line 1067, in _destroy_resources
    nixops.parallel.run_tasks(nr_workers=-1, tasks=self.resources.values(), worker_fun=worker)
  File "/nix/store/0h2c0k9mr8y5pvjd3ml30ms5rdf4kia1-nixops-1.5.1/lib/python2.7/site-packages/nixops/parallel.py", line 41, in thread_fun
    result_queue.put((worker_fun(t), None))
  File "/nix/store/0h2c0k9mr8y5pvjd3ml30ms5rdf4kia1-nixops-1.5.1/lib/python2.7/site-packages/nixops/deployment.py", line 1060, in worker
    if m.destroy(wipe=wipe): self.delete_resource(m)
  File "/nix/store/0h2c0k9mr8y5pvjd3ml30ms5rdf4kia1-nixops-1.5.1/lib/python2.7/site-packages/nixops/backends/ec2.py", line 1257, in destroy
    instance = self._get_instance(update=True)
  File "/nix/store/0h2c0k9mr8y5pvjd3ml30ms5rdf4kia1-nixops-1.5.1/lib/python2.7/site-packages/nixops/backends/ec2.py", line 285, in _get_instance
    assert instance_id
AssertionError

coretemp · 2018-01-26T12:26:19Z

From what I can see, self.vm_id is probably None, which causes the assertion to fail. The root cause of the issue is that there is no documentation for vm_id. Additionally, nixops appears to assume that AWS APIs return an answer every single time, which is not the case.

If the size of your nixops deployment grows towards thousands of machines, the probability of a failed deployment will go to 1.
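To put a rough number on that claim, here is a small sketch. It assumes each machine's API interaction fails independently with the same probability, which is a simplification, and the 0.1% per-machine failure rate is a made-up figure for illustration, not a measured AWS value. The function name is hypothetical.

```python
def p_any_failure(p_single: float, n_machines: int) -> float:
    """Probability that at least one of n independent operations fails.

    Assumption: failures are independent and equally likely per machine.
    """
    return 1.0 - (1.0 - p_single) ** n_machines


if __name__ == "__main__":
    # 0.001 is a hypothetical per-machine failure rate, not real AWS data.
    for n in (1, 10, 100, 1000, 5000):
        print(n, round(p_any_failure(0.001, n), 3))
```

Under these assumptions, even a small per-machine failure rate compounds quickly as the machine count grows, because the deployment only succeeds if every single call succeeds.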

AWS APIs are rate limited, but there is nothing in nixops that tries to cope with failure. The automation in nixops seems to be limited currently, because of its many failure modes.
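One common way to cope with rate limiting and transient API errors is to retry with exponential backoff and jitter. The following is only a generic sketch of that technique, not code from nixops: `retry_with_backoff` and all of its parameters are hypothetical names, and it is written for Python 3, whereas the traceback above comes from a Python 2.7 build.

```python
import random
import time


def retry_with_backoff(call, max_attempts=5, base_delay=0.5, max_delay=30.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Run call() and retry on failure with exponential backoff and jitter.

    Hypothetical helper for illustration; nixops does not ship this function.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retry_on:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == max_attempts:
                raise
            # The delay cap doubles each attempt, bounded by max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            # "Full jitter": sleep a random time in [0, cap] so that many
            # workers do not retry in lockstep.
            sleep(random.uniform(0, cap))
```

A caller would wrap the flaky API request, for example `retry_with_backoff(lambda: fetch_instance(), retry_on=(IOError,))`, where `fetch_instance` stands in for whatever call is being rate limited. Narrowing `retry_on` to the errors that are actually transient avoids retrying genuine bugs such as the AssertionError above.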

grahamc transferred this issue from NixOS/nixops Apr 20, 2020
