Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixops destroy frequently fails #58

Open
andy-dean opened this issue Jan 23, 2018 · 1 comment
Open

nixops destroy frequently fails #58

andy-dean opened this issue Jan 23, 2018 · 1 comment

Comments

@andy-dean
Copy link

I occasionally get an error when running "nixops destroy" - probably 10-20% of the times I run "nixops destroy". When the error happens, I end up with a volume that is no longer attached to any EC2 instance.

Here's the command I use:

nixops destroy -d some-deploy --confirm

And here is the console output I see:

warning: are you sure you want to destroy EC2 machine ‘machine’? (y/N) y
machine> destroying EC2 machine... [shutting-down] [shutting-down] [shutting-down] [shutting-down] Traceback (most recent call last):
  File "/nix/store/0h2c0k9mr8y5pvjd3ml30ms5rdf4kia1-nixops-1.5.1/bin/..nixops-wrapped-wrapped", line 951, in <module>
    args.op()
  File "/nix/store/0h2c0k9mr8y5pvjd3ml30ms5rdf4kia1-nixops-1.5.1/bin/..nixops-wrapped-wrapped", line 400, in op_destroy
    wipe=args.wipe)
  File "/nix/store/0h2c0k9mr8y5pvjd3ml30ms5rdf4kia1-nixops-1.5.1/lib/python2.7/site-packages/nixops/deployment.py", line 1073, in destroy_resources
    self._destroy_resources(include, exclude, wipe)
  File "/nix/store/0h2c0k9mr8y5pvjd3ml30ms5rdf4kia1-nixops-1.5.1/lib/python2.7/site-packages/nixops/deployment.py", line 1067, in _destroy_resources
    nixops.parallel.run_tasks(nr_workers=-1, tasks=self.resources.values(), worker_fun=worker)
  File "/nix/store/0h2c0k9mr8y5pvjd3ml30ms5rdf4kia1-nixops-1.5.1/lib/python2.7/site-packages/nixops/parallel.py", line 41, in thread_fun
    result_queue.put((worker_fun(t), None))
  File "/nix/store/0h2c0k9mr8y5pvjd3ml30ms5rdf4kia1-nixops-1.5.1/lib/python2.7/site-packages/nixops/deployment.py", line 1060, in worker
    if m.destroy(wipe=wipe): self.delete_resource(m)
  File "/nix/store/0h2c0k9mr8y5pvjd3ml30ms5rdf4kia1-nixops-1.5.1/lib/python2.7/site-packages/nixops/backends/ec2.py", line 1257, in destroy
    instance = self._get_instance(update=True)
  File "/nix/store/0h2c0k9mr8y5pvjd3ml30ms5rdf4kia1-nixops-1.5.1/lib/python2.7/site-packages/nixops/backends/ec2.py", line 285, in _get_instance
    assert instance_id
AssertionError
@coretemp
Copy link

From what I can see, self.vm_id is probably None causing the assertion to fail. The root cause of the issue is that there is no documentation for vm_id. Additionally, it appears that nixops assumes that AWS APIs return an answer every single time, which is not the case.

If the size of your nixops deployment grows towards thousands of machines, the probability of a failed deployment will go to 1.

AWS APIs are rate limited, but there is nothing in nixops that tries to cope with failure. The automation in nixops seems to be limited currently, because of its many failure modes.

@grahamc grahamc transferred this issue from NixOS/nixops Apr 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants