Hangs at copying closure... #1557

Closed
n8henrie opened this issue Apr 26, 2023 · 13 comments

@n8henrie

I have a nixops configuration (my first one) that has been working for a couple of months. Earlier this week I went to redeploy (without config changes, trying to debug an intermittent docker issue on the target machine) and it hung at "copying closure...". Since then it hangs there every time. I've tried running with --debug, but no relevant additional information is printed.

I've tried rebooting the target machine, running nixos-rebuild --rollback to a prior working generation on the target machine, and updating nixopsUnstable on the host machine.

Rsyncing the config over and running nixos-rebuild switch locally works without errors (I have it set up so it uses the same config through nixops).

Are there other debugging steps I can take?

@lelit

lelit commented Apr 29, 2023

I had the same problem, and after some investigation I determined that it was caused by nix 2.15. The quickest workaround has been something like the following shell.nix:

let
  # Host nixpkgs, used for mkShell and nixops (assumes a <nixpkgs> channel is available)
  pkgs = import <nixpkgs> {};
  # As of 2023-04-19, nix 2.15 breaks nixops: it stalls at the "copying closure" step.
  # Use the latest version known to be working, 2.14.1.
  currentMasterPkgs = import (builtins.fetchTarball {
    url = "https://github.com/NixOS/nixpkgs/archive/8ad5e8132c5dcf977e308e7bf5517cc6cc0bf7d8.tar.gz";
  }) {};
  nix214 = currentMasterPkgs.nixVersions.nix_2_14;
in pkgs.mkShell {
  buildInputs = [
    nix214
    pkgs.nixops
  ];
}

@n8henrie
Author

@lelit you are a life-saver.

$ nix shell github:nixos/nixpkgs/8ad5e8132c5dcf977e308e7bf5517cc6cc0bf7d8#nix -c nixops deploy

works like a charm. Is this a documented issue somewhere else? If not, I'll start a bisect. I'm not even sure whether the issue belongs here, at nixpkgs/nix, or elsewhere.

@lelit

lelit commented Apr 29, 2023

I looked around but did not find anything. I identified the latest nix as the culprit by sheer luck: I knew that the previous deploy, a week earlier, had worked as usual, and the only difference on my work machine was a monthly update of its NixOS.

@sebastiant

sebastiant commented May 26, 2023

Thanks for the workaround! A slightly less reproducible but easier-to-remember nix-shell command:
$ nix-shell -p nixVersions.nix_2_14

@nyarly

nyarly commented Jun 9, 2023

So glad to have finally found this. Is there an issue opened against NixOS/nix for this?

@n8henrie
Author

n8henrie commented Jun 9, 2023

I tried to make a script to bisect nixpkgs but couldn't find the culprit commit. Maybe I'm bisecting the wrong codebase.

vincentbernat added a commit to vincentbernat/nixops-take1 that referenced this issue Jun 10, 2023
There is a bug in Nix 2.15 where NixOps does not work anymore. See:
NixOS/nixops#1557
@corngood
Contributor

corngood commented Jun 13, 2023

Just investigating this a little...

nixops is running

export NIX_SSHOPTS="-o Port=22 -o StrictHostKeyChecking=accept-new -i /tmp/nixops-tmpnsxeilim/id_nixops-*** -o ControlPath=/tmp/nixops-ssh-tmplqvmzvpi/master-socket"
nix-copy-closure --to *** /nix/store/5d1604l152flbxdzashz59165bqqc9hr-nixos-system-*** --use-substitutes

The socket doesn't actually exist. It doesn't hang without the -o ControlPath. I'll bisect nix, but it looks like it might be partially the fault of nixops.
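
The "socket doesn't actually exist" check can be repeated by hand. The following is a minimal sketch, assuming NIX_SSHOPTS has the shape shown above; the helper names control_path_from_sshopts and check_control_socket are hypothetical and are not part of nixops or nix.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of nixops or nix.

# Print the ControlPath value from an ssh option string; return 1 if there is none.
control_path_from_sshopts() (
  set -f  # disable globbing so words such as id_nixops-*** stay literal
  for word in $1; do
    case "$word" in
      ControlPath=*)   printf '%s\n' "${word#ControlPath=}"; return 0 ;;
      -oControlPath=*) printf '%s\n' "${word#-oControlPath=}"; return 0 ;;
    esac
  done
  return 1
)

# Report whether the configured control socket is present on disk.
check_control_socket() (
  if ! path=$(control_path_from_sshopts "$1"); then
    echo "no ControlPath set"
    return 0
  fi
  if [ -S "$path" ]; then
    echo "socket exists: $path"
  else
    echo "socket missing: $path"
    return 1
  fi
)
```

Calling check_control_socket "$NIX_SSHOPTS" in the same environment as the hanging command should say whether the master socket is there.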

@corngood
Contributor

NixOS/nix@5291a82 is the first bad commit on nix.

@nyarly

nyarly commented Jun 13, 2023

That's what I was finding: if you run the command without -o ControlPath, it works great. With it, the command hangs. Because nix-copy-closure makes one request to figure out what to copy and then starts doing the copies, I think it's guaranteed to trigger the bug here.
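
To compare the two cases directly, the ControlPath option can be dropped from NIX_SSHOPTS before rerunning the copy. This is a diagnostic sketch only, assuming the options are whitespace-separated as in the command quoted earlier in the thread; strip_control_path is a hypothetical helper, and HOST and STORE_PATH in the commented usage are placeholders.

```shell
#!/bin/sh
# Sketch only: strip_control_path is a hypothetical helper, not part of nixops or nix.

# Print the ssh option string with any ControlPath option removed.
strip_control_path() (
  set -f  # disable globbing so words such as id_nixops-*** stay literal
  out=""
  pending_o=0  # 1 when the previous word was a bare "-o"
  for word in $1; do
    if [ "$pending_o" = 1 ]; then
      pending_o=0
      case "$word" in
        ControlPath=*) continue ;;             # drop both "-o" and its value
        *) out="$out -o $word"; continue ;;    # keep any other "-o key=value"
      esac
    fi
    case "$word" in
      -o) pending_o=1 ;;
      -oControlPath=*) ;;                      # drop the joined form
      *) out="$out $word" ;;
    esac
  done
  if [ "$pending_o" = 1 ]; then out="$out -o"; fi
  printf '%s\n' "${out# }"
)

# Usage (placeholders, not run here):
#   NIX_SSHOPTS="$(strip_control_path "$NIX_SSHOPTS")" \
#     timeout 120 nix-copy-closure --to HOST STORE_PATH --use-substitutes
# GNU timeout exits with status 124 when the command was cut off.
```

Without the control socket each ssh invocation opens its own connection, so this is meant for narrowing down the hang rather than as a permanent setting.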

@nyarly

nyarly commented Jun 13, 2023

> NixOS/nix@5291a82 is the first bad commit on nix.

That checks out with NixOS/nix#8329. I did find that downgrading my NixOps shell.nix to Nix 2.13 fixed the issue.

@corngood
Contributor

NixOS/nix#8329 was fixed, but only for cases where the master could be found without NIX_SSHOPTS. The remaining issue is NixOS/nix#8480, and my proposed fix is in NixOS/nix#8506.

@kevincox
Contributor

This seems fixed. Tested on nixos-unstable. Can someone confirm so we can close?

@jerith666

Yes, with:

$ nix --version && nixops --version
nix (Nix) 2.17.0
NixOps 1.7

... nixops deploy works fine, I no longer get a hang at "copying closure". Thanks!
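
For anyone who wants a quick guard before deploying, the version reports in this thread can be turned into a small check. This is only a sketch: it encodes what was reported here (2.15 hanging; 2.13, 2.14 and 2.17 working), says nothing about other releases, and classify_nix_version is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: classify_nix_version is a hypothetical helper based on reports in this thread.

# Classify a `nix --version` line such as "nix (Nix) 2.17.0".
classify_nix_version() (
  version=${1##* }        # last word, e.g. 2.17.0
  major=${version%%.*}    # e.g. 2
  rest=${version#*.}      # e.g. 17.0
  minor=${rest%%.*}       # e.g. 17
  case "$major.$minor" in
    2.15)           echo "reported-hanging" ;;
    2.13|2.14|2.17) echo "reported-working" ;;
    *)              echo "unknown" ;;
  esac
)

# Usage (not run here):
#   classify_nix_version "$(nix --version)"
```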

@roberth closed this as completed Sep 30, 2023