
Ansible remote provisioner hangs building CentOS VMs in Virtualbox #6877

Open
DagSonsteboSB opened this issue Oct 18, 2018 · 8 comments · May be fixed by #8625

Comments

@DagSonsteboSB commented Oct 18, 2018

Environment:

  • Host platform: OSX Sierra 10.12 / OSX Mojave 10.14 (both platforms tested independently, same behaviour)
  • Packer version: 1.3.1
  • Ansible version: ansible 2.7.0
  • VirtualBox version: 5.2.18
  • Provisioner configuration in playbook:
  "provisioners": [
    {
      "type": "ansible",
  	  "playbook_file": "ansible/anscentosmgmt.yml",
      "extra_arguments": [ "-vv" ],
      "ansible_env_vars": [ "ANSIBLE_HOST_KEY_CHECKING=False", "ANSIBLE_SSH_ARGS='-o ControlMaster=auto -o ControlPersist=60s'" ],
      "user": "root"
    }
  ]

Summary of build task:

  • Kickstart build of CentOS 7.5 VM
  • Ansible playbook carrying out basic OS config followed by application install. Playbook consists of about 10 different roles.

Observed behaviour:

  • First of all, this is not new: I have seen this behaviour with previous Packer / Ansible / guest OS versions over the last couple of years. It is much worse with CentOS 7 guests; going back to CentOS 6 has been the workaround in the past.
  • Having reviewed a number of tickets, this appears to be a recurring regression: multiple tickets have been logged, fixed, and closed for it in the past.
  • The Ansible remote provisioner hangs forever at completely random points in the playbook, and the hangs do not appear to be tied to any particular Ansible task type.
  • "[DEBUG] Opening new ssh session" always seems to be the last log entry, so I suspect this relates to SSH session handling.
  • However, stopping the guest VM immediately kills the Packer job, so the SSH session is assumed to still be established.

Packer logs:
2018/10/18 16:33:33 packer: 2018/10/18 16:33:33 [DEBUG] Opening new ssh session
2018/10/18 16:33:33 packer: 2018/10/18 16:33:33 [DEBUG] starting remote command: /bin/sh -c 'chmod u+x /root/.ansible/tmp/ansible-tmp-1539876812.741955-174870228422670/ /root/.ansible/tmp/ansible-tmp-1539876812.741955-174870228422670/AnsiballZ_systemd.py && sleep 0'
2018/10/18 16:33:45 packer: 2018/10/18 16:33:45 [INFO] RPC endpoint: Communicator ended with: 0
2018/10/18 16:33:45 [INFO] 0 bytes written for 'stdout'
2018/10/18 16:33:45 [INFO] 0 bytes written for 'stderr'
2018/10/18 16:33:45 [INFO] RPC client: Communicator ended with: 0
2018/10/18 16:33:45 [INFO] RPC endpoint: Communicator ended with: 0
2018/10/18 16:33:45 [INFO] 0 bytes written for 'stdin'
2018/10/18 16:33:45 packer: 2018/10/18 16:33:45 [INFO] 0 bytes written for 'stdout'
2018/10/18 16:33:45 packer: 2018/10/18 16:33:45 [INFO] 0 bytes written for 'stderr'
2018/10/18 16:33:45 packer: 2018/10/18 16:33:45 [INFO] RPC client: Communicator ended with: 0
2018/10/18 16:33:45 packer: 2018/10/18 16:33:45 [INFO] 0 bytes written for 'stdin'
2018/10/18 16:33:45 packer: 2018/10/18 16:33:45 ansible provisioner pty-req request
2018/10/18 16:33:45 packer: 2018/10/18 16:33:45 new exec request: /bin/sh -c '/usr/bin/python /root/.ansible/tmp/ansible-tmp-1539876812.741955-174870228422670/AnsiballZ_systemd.py && sleep 0'
2018/10/18 16:33:45 packer: 2018/10/18 16:33:45 [DEBUG] Opening new ssh session

The following appears when you shut down the guest VM:
2018/10/18 17:26:13 packer: 2018/10/18 17:26:13 [ERROR] ssh session open error: 'ssh: unexpected packet in response to channel open: ', attempting reconnect
2018/10/18 17:26:13 packer: 2018/10/18 17:26:13 [DEBUG] reconnecting to TCP connection for SSH
2018/10/18 17:26:13 packer: 2018/10/18 17:26:13 [ERROR] reconnection error: dial tcp 127.0.0.1:3262: connect: connection refused
2018/10/18 17:26:13 [INFO] 0 bytes written for 'stderr'
2018/10/18 17:26:13 [INFO] 0 bytes written for 'stdout'
2018/10/18 17:26:13 [INFO] 0 bytes written for 'stdin'

@DanielMarquard commented Dec 31, 2018

I'm seeing the same issue. The weird thing is that I can open a new terminal and SSH into the VM without any issues.

@DanielMarquard commented Jan 1, 2019

Using -vvvv, I can see that a large yum update hangs. The output of the installation progress is cut off in the same place every time with no errors. I didn't have this issue building a similar image in qemu, but I'm not convinced that VirtualBox is the problem here. Still trying to isolate the issue.

@DagSonsteboSB, what do you see when you use -vvvv instead of -vv?

Happy New Year! 😄

@DagSonsteboSB (author) commented Jan 3, 2019

@DanielMarquard Happy New Year to you as well - all the best for 2019.

In my experience the process would hang at completely random places in the install. I did try with -vvvv and saw no further detail, just a full stop of processing. Same as you, I could still SSH into the VM in question, but the original install would remain hung until I ctrl-c'd it.

I have had to drop back to the ansible-local provisioner in the meantime, and I am following this bug for progress before I go back. In my case ansible-local works fine, but it relies on local execution rather than SSH, so I would not expect to see the same issue there.
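
For anyone else falling back the same way, here is a minimal sketch of the ansible-local configuration I mean (the playbook path is the one from my template above; note that ansible-local uploads the playbook and runs it inside the guest, so Ansible must already be installed in the VM):

```json
{
  "type": "ansible-local",
  "playbook_file": "ansible/anscentosmgmt.yml"
}
```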

@jalbstmeijer commented Feb 26, 2019

Hi,
Same issue here.
Running Packer 1.3.4 from a CentOS 6 host, using the amazon-ebs builder and the Ansible provisioner to create CentOS 6/7 AMIs.

It just hangs after this:

amazon-ebs: TASK [Gathering Facts] *********************************************************
2019/02/25 17:04:14 packer-1.3.4: 2019/02/25 17:04:14 SSH proxy: accepted connection
2019/02/25 17:04:14 packer-1.3.4: 2019/02/25 17:04:14 authentication attempt from 127.0.0.1:33032 to 127.0.0.1:41559 as xxx using none
2019/02/25 17:04:14 packer-1.3.4: 2019/02/25 17:04:14 authentication attempt from 127.0.0.1:33032 to 127.0.0.1:41559 as xxx using publickey
2019/02/25 17:04:14 packer-1.3.4: 2019/02/25 17:04:14 rejecting auth-agent-req@openssh.com request
2019/02/25 17:04:14 packer-1.3.4: 2019/02/25 17:04:14 new env request: LC_PAPER=nl_NL.UTF-8
2019/02/25 17:04:14 packer-1.3.4: 2019/02/25 17:04:14 new env request: LC_MONETARY=nl_NL.UTF-8
2019/02/25 17:04:14 packer-1.3.4: 2019/02/25 17:04:14 new env request: LC_NUMERIC=nl_NL.UTF-8
2019/02/25 17:04:14 packer-1.3.4: 2019/02/25 17:04:14 new env request: XMODIFIERS=@im=ibus
2019/02/25 17:04:14 packer-1.3.4: 2019/02/25 17:04:14 new env request: LANG=en_US.UTF-8
2019/02/25 17:04:14 packer-1.3.4: 2019/02/25 17:04:14 new env request: LC_MEASUREMENT=nl_NL.UTF-8
2019/02/25 17:04:14 packer-1.3.4: 2019/02/25 17:04:14 new env request: LC_TIME=nl_NL.UTF-8
2019/02/25 17:04:14 packer-1.3.4: 2019/02/25 17:04:14 new exec request: /bin/sh -c '/usr/bin/python && sleep 0'
2019/02/25 17:04:14 packer-1.3.4: 2019/02/25 17:04:14 [DEBUG] Opening new ssh session
2019/02/25 17:04:14 packer-1.3.4: 2019/02/25 17:04:14 [INFO] 213455 bytes written for 'stdin'
2019/02/25 17:04:14 [INFO] 213455 bytes written for 'stdin'
2019/02/25 17:04:14 packer-1.3.4: 2019/02/25 17:04:14 [DEBUG] starting remote command: /bin/sh -c '/usr/bin/python && sleep 0'

In #4993 I read that "using the ansible provisioner is discouraged in favor of shell-local".
Is that true? Is that why this issue does not seem to get much attention?

For now I'm using shell-local to invoke Ansible, working around the fact that the provisioner cannot look up the target IP, which does not make it feel like the preferred approach yet.
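
A rough sketch of that shell-local workaround, with the caveat that the instance IP, user, and key path all have to be supplied out of band (every value below is a placeholder, not something Packer fills in for you; the trailing comma after the IP makes ansible-playbook treat it as an inline inventory):

```json
{
  "type": "shell-local",
  "command": "ansible-playbook -i '203.0.113.10,' -u centos --private-key ./build_key.pem playbook.yml"
}
```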

Gr, J

@SwampDragons (member) commented Feb 26, 2019

Is that why this issue does not seem to get much attention?

Yes. The Ansible provisioner is a community-supported plugin, which means HashiCorp's two full-time engineers dedicated to Packer (myself and @azr) aren't going to spend time on it beyond reviewing pull requests.

We're trying to figure out a good way to share the target IP with the shell provisioner to make this easier.

@jalbstmeijer commented Feb 27, 2019

Thanks for clearing that up.

We're trying to figure out a good way to share the target ip with the shell provisioner to make this easier.

Yes; in my use case I need the SSH IP, SSH port, and SSH key available to the shell provisioner.

@alistairfay commented Nov 8, 2019

I've been troubleshooting a similar issue today (building a RHEL machine in VirtualBox and using the Ansible (remote) provisioner).

I had some success with adding 'pipelining = True' to the [ssh_connection] section of my ansible.cfg: the role I was running got much further, but it then stalled in a similar way later on.
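
For clarity, the change I mean is just this in ansible.cfg:

```ini
# ansible.cfg
[ssh_connection]
# Pipelining reduces the number of SSH operations per task by sending
# the module over the existing connection instead of opening extra ones.
pipelining = True
```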

It appears to me that the issue is that VirtualBox sets up port forwarding, and on large Ansible runs there are so many SSH connections that we eventually run out.

I'd like to understand how VirtualBox implements the port forwarding, and whether we can monitor the number of open connections and check it against any limit.

@nirix commented Nov 12, 2019

We had the same issue building VMs with Packer, Ansible, and VirtualBox. We run Linux hosts (Fedora and Ubuntu) to build Linux guests (RHEL and CentOS), and over the past year or so we've tried countless versions of Packer, Ansible, and VirtualBox.

I finally found a workaround for the hanging by running Ansible in the VM itself with ansible-local.

@SwampDragons linked a pull request that will close this issue (Jan 17, 2020)