Vagrant ssh hangs with multiple boxes. Standard ssh to the private ip as vagrant@192.x.x.x works fine. #10899

Open
queglay opened this issue Jun 10, 2019 · 3 comments

Comments

@queglay

queglay commented Jun 10, 2019

Vagrant version

2.2.4

Host operating system

macOS 10.13.6

Guest operating system

bento/ubuntu-16.04 and ubuntu/xenial64

Vagrantfile

# you must install this plugin to set the disk size:
# vagrant plugin install vagrant-disksize

Vagrant.configure("2") do |config|
  # Ubuntu 16.04
  # config.vm.box = "ubuntu/xenial64"
  # networking issues
  config.vm.box = "bento/ubuntu-16.04"
  config.vm.box_version = "201812.27.0"
  # cant install xserver-xorg-legacy
  # config.vm.box = "ubuntu/trusty64"
  #config.vm.box = "bento/ubuntu-17.10"
  #18 has no rc.local
  #config.vm.box = "bento/ubuntu-18.04"
  #config.vm.box_version = "20190411.0.0"
  #config.ssh.username = "vagrant"
  #config.ssh.password = ENV['TF_VAR_vagrant_password']

  mac_string = ENV['TF_VAR_vagrant_mac']
  vaultkeypresent = ENV['TF_VAR_vaultkeypresent']
  bridgenic = ENV['TF_VAR_bridgenic']
  envtier = ENV['TF_VAR_envtier']
  name = ENV['TF_VAR_openfirehawkserver_name']
  openfirehawkserver = ENV['TF_VAR_openfirehawkserver']
  network = ENV['TF_VAR_network']

  config.vm.define "ansible_control_"+envtier
  config.vagrant.plugins = ['vagrant-disksize', 'vagrant-reload']
  config.disksize.size = '65536MB'

  if network == 'public'
    config.vm.network "public_network", mac: mac_string
    #, bridge: bridgenic
  else
    # use a private network mode if you don't have control over the network environment - eg wifi in a cafe / other location.
    config.vm.network "private_network", ip: openfirehawkserver, mac: mac_string
  end
  
  # routing issues?  https://stackoverflow.com/questions/35208188/how-can-i-define-network-settings-with-vagrant
  config.vm.provider "virtualbox" do |vb|
    # Display the VirtualBox GUI when booting the machine
    vb.gui = true
    # Customize the amount of memory on the VM:
    vb.memory = ENV['TF_VAR_openfirehawkserver_ram']
    vb.cpus = ENV['TF_VAR_openfirehawkserver_vcpus']
    vb.customize ["modifyvm", :id, "--accelerate2dvideo", "on"]
    vb.customize ["modifyvm", :id, "--accelerate3d", "on"]
    vb.customize ['modifyvm', :id, '--clipboard', 'bidirectional']
    #enable promiscuous mode to enable routes from aws through the openfirehawkserver vpn into your local network
    #vb.customize ["modifyvm", :id, "--nicpromisc0", "allow-all"]
    #vb.customize ["modifyvm", :id, "--nicpromisc1", "allow-all"]
    vb.customize ["modifyvm", :id, "--nicpromisc2", "allow-all"]
    vb.customize ["modifyvm", :id, "--nicpromisc3", "allow-all"]
  end
  config.vm.provision "shell", inline: "echo 'source /vagrant/scripts/env.sh' > /etc/profile.d/sa-environment.sh", :run => 'always'
  config.vm.provision "shell", inline: "echo DEBIAN_FRONTEND=$DEBIAN_FRONTEND"

  config.vm.provision "shell", inline: "export DEBIAN_FRONTEND=noninteractive"
  config.vm.provision "shell", inline: "sudo rm /etc/localtime && sudo ln -s /usr/share/zoneinfo/Australia/Brisbane /etc/localtime", run: "always"
  config.vm.provision "shell", inline: "sudo apt-get update"
  # temp disable as we are getting freezing with ssh issues
  config.vm.provision "shell", inline: "sudo apt-get install -y sshpass"

  ### Install Ansible Block ###
  config.vm.provision "shell", inline: "sudo apt-get install -y software-properties-common"
  #config.vm.provision "shell", inline: "pip install --upgrade pip"
  #config.vm.provision "shell", inline: "sudo apt-get install -y python-pip python-dev"
  #pip install --upgrade pip
  #config.vm.provision "shell", inline: "sudo -H pip install ansible==2.7.11"
  # to list available versions - pip install ansible==
  config.vm.provision "shell", inline: "sudo apt-add-repository --yes --update ppa:ansible/ansible"
  config.vm.provision "shell", inline: "sudo apt-get install -y ansible"

  # we define the location of the ansible hosts file in an environment variable.
  config.vm.provision "shell", inline: "grep -qxF 'ANSIBLE_INVENTORY=/vagrant/ansible/hosts' /etc/environment || echo 'ANSIBLE_INVENTORY=/vagrant/ansible/hosts' | sudo tee -a /etc/environment"
  
  # these utils are likely required for promisc mode on ethernet, which is required if routing on a local network.
  config.vm.provision "shell", inline: "sudo apt-get install -y virtualbox-guest-dkms"
  # looks like guest utils is the culprit
  config.vm.provision "shell", inline: "sudo apt-get install -y virtualbox-guest-utils"

  #reboot required for desktop to function.

  # ### Install ubuntu desktop and virtualbox additions.  Because a reboot is required, provisioning is handled here. ###
  # # # Install the gui with vagrant or install the gui with ansible installed on the host.  
  # # # This creates potential issues because ideally, Ansible should be used within the vm only to limit ansible version issues if the user updates vagrant on their host.
  # config.vm.provision "shell", inline: "sudo apt-get install -y ubuntu-desktop"
  # # ...or xfce.  pick one.
  # #config.vm.provision "shell", inline: "sudo apt-get install -y curl xfce4"
  # config.vm.provision "shell", inline: "sudo apt-get install -y virtualbox-guest-dkms virtualbox-guest-utils virtualbox-guest-x11 xserver-xorg-legacy"
  # # Permit anyone to start the GUI
  # config.vm.provision "shell", inline: "sudo sed -i 's/allowed_users=.*$/allowed_users=anybody/' /etc/X11/Xwrapper.config"
  # ## End Ubuntu Desktop block ###

  # #disable the update notifier.  We do not want to update to ubuntu 18, currently deadline installer gui doesn't work in 18.
  config.vm.provision "shell", inline: "sudo sed -i 's/Prompt=.*$/Prompt=never/' /etc/update-manager/release-upgrades"
  

  # for dpkg or virtualbox issues, see https://superuser.com/questions/298367/how-to-fix-virtualbox-startup-error-vboxadd-service-failed

  config.vm.provision "shell", inline: "sudo reboot"
  # trigger reload
  config.vm.provision :reload
  config.trigger.after :up do |trigger|
    trigger.warn = "Taking Snapshot"
    trigger.run = {inline: "vagrant snapshot push"}
  end

  config.vm.provision "shell", inline: "sudo reboot"
  config.vm.provision :reload
  
end
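
A note for anyone reusing this Vagrantfile: it builds the machine name with "ansible_control_"+envtier, so an unset TF_VAR_envtier makes that line raise a TypeError before any VM is touched. Below is a minimal sketch of a pre-flight check. It assumes the four variables listed are the ones that matter, and the names REQUIRED_VARS and missing_env are illustrative, not part of the original file.

```ruby
# Hypothetical pre-flight check; REQUIRED_VARS and missing_env are
# illustrative names, not part of the reporter's Vagrantfile.
REQUIRED_VARS = %w[
  TF_VAR_envtier
  TF_VAR_vagrant_mac
  TF_VAR_openfirehawkserver
  TF_VAR_network
].freeze

# Return the names whose values are unset or blank in the given environment.
def missing_env(names, env = ENV)
  names.select { |name| env[name].to_s.strip.empty? }
end

missing = missing_env(REQUIRED_VARS)
warn "Unset variables: #{missing.join(', ')}" unless missing.empty?
```

Placed at the top of the Vagrantfile, with warn swapped for abort, this would stop vagrant up with a readable message instead of a stack trace.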

Debug output

Cannot generate debugging info because ssh is broken. Output below:

The error message is shown below. In many cases, errors from this
library are caused by ssh-agent issues. Try disabling your SSH
agent or removing some keys and try again.

If the problem persists, please report a bug to the net-ssh project.

timeout during server version negotiating
An error occurred in the underlying SSH library that Vagrant uses.
The error message is shown below. In many cases, errors from this
library are caused by ssh-agent issues. Try disabling your SSH
agent or removing some keys and try again.

If the problem persists, please report a bug to the net-ssh project.

timeout during server version negotiating
 INFO interface: Machine: error-exit ["Vagrant::Errors::NetSSHException", "An error occurred in the underlying SSH library that Vagrant uses.\nThe error message is shown below. In many cases, errors from this\nlibrary are caused by ssh-agent issues. Try disabling your SSH\nagent or removing some keys and try again.\n\nIf the problem persists, please report a bug to the net-ssh project.\n\ntimeout during server version negotiating"]

Expected behavior

vagrant ssh should not hang.

Actual behavior

After an unpredictable period of time, the vagrant ssh session hangs.

Interestingly, only vagrant ssh is broken. I can still ssh to the private IP on my local network as the vagrant user or as another new user.

Steps to reproduce

  1. I started with a fresh install of macOS 10.13.6, Vagrant 2.2.4, and Ansible 2.8.1.
  2. I launch my VM with vagrant up (note that this relies on quite a few environment variables that I set up elsewhere).
  3. After some time the session will freeze, sometimes within a day.
  4. I am still able to log in using standard ssh to the private IP; only vagrant ssh is broken. I am also able to log in through the VirtualBox GUI window. However, it appears that if I log in using standard ssh, although the VM has a valid IP on my local network, it has no access to the web because the default route is via 10.0.2.0.
  5. I have also tested with multiple NICs as the primary on the host (a Thunderbolt 10GbE NIC, and the built-in Wi-Fi NIC in a 2017 MacBook Pro).
  6. A reboot may fix the issue for some time, but it will recur.
  7. I have tested both bento/ubuntu-16.04 and ubuntu/xenial64 to rule out an issue specific to either box.
  8. I have reproduced the problem over many months and multiple reinstalls of the host OS, done to clean out potential leftover config from a bad version of VirtualBox or other software.
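
The default-route observation in step 4 can be checked from inside the guest by reading the output of ip route. Below is a minimal sketch of a parser that lists the default routes in the order the kernel would prefer them (lowest metric first). The sample output is hypothetical, modelled on a VirtualBox NAT adapter plus a bridged adapter, and is not taken from the affected VM. The helper name default_routes is also an assumption.

```ruby
# Hypothetical `ip route` output; addresses and device names are made up
# for illustration and are not from the reporter's machine.
SAMPLE_ROUTES = <<~ROUTES
  default via 10.0.2.2 dev enp0s3
  default via 192.168.1.1 dev enp0s8 metric 100
  10.0.2.0/24 dev enp0s3 proto kernel scope link src 10.0.2.15
  192.168.1.0/24 dev enp0s8 proto kernel scope link src 192.168.1.50
ROUTES

# Return every default route as a hash, sorted so the lowest metric comes first.
def default_routes(ip_route_output)
  routes = ip_route_output.each_line.map do |line|
    fields = line.split
    next unless fields.first == "default"

    # Look up the token that follows a keyword such as "via" or "dev".
    value_after = lambda do |key|
      index = fields.index(key)
      index && fields[index + 1]
    end

    {
      via: value_after.call("via"),
      dev: value_after.call("dev"),
      metric: (value_after.call("metric") || 0).to_i # no metric means 0
    }
  end
  routes.compact.sort_by { |route| route[:metric] }
end

default_routes(SAMPLE_ROUTES).each do |route|
  puts "default via #{route[:via]} on #{route[:dev]} (metric #{route[:metric]})"
end
```

If the first entry points at the NAT gateway rather than the LAN router, outbound traffic leaves through the NAT adapter, which would match the behaviour described in step 4.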
@briancain
Member

Hey there @queglay - What version of VirtualBox are you using? Have you tried upgrading to the latest version of that? Also, if you run vagrant ssh --debug, you should be able to see the exact ssh command vagrant is using to log into the box. If you run that yourself, do you experience the same issue? Thanks!

@queglay
Author

queglay commented Jun 12, 2019

I neglected to mention that I do need to control my routes because the VM runs OpenVPN as a router. I realised that the default route was 10.2.something.something.
I've been trying this next line out for the last three days to ensure the default route is actually 192.168.x.1, and it's been stable:

config.vm.network "public_network", mac: mac_string, use_dhcp_assigned_default_route: true

Why do Vagrant boxes change the default route? If this is indeed the fix, it's been a major source of pain for me to work out, and I don't even know why it has fixed the problem.

I'm using the latest major version of VirtualBox for those boxes, 5.2.30.
When I went to a higher version than that I had other problems.

@queglay
Author

queglay commented Aug 21, 2019

This problem has returned today. I was using a defined IP and MAC with this line:

config.vm.network "public_network", ip: openfirehawkserver, mac: mac_string, use_dhcp_assigned_default_route: true

I also tried using an IP assigned by the router instead, but got a hang as soon as I ssh'd into the instance.

config.vm.network "public_network", mac: mac_string, use_dhcp_assigned_default_route: true, bridge: bridgenic

I also ran vagrant ssh --debug and acquired the command. It looked like this:

/usr/bin/ssh "vagrant@127.0.0.1" -p "2222" -o LogLevel=FATAL -o Compression=yes -o DSAAuthentication=yes -o IdentitiesOnly=yes -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i /somepath/machines/project/virtualbox/private_key

It hangs if I just use this command, and eventually throws back:

ssh_exchange_identification: read: Connection reset by peer

The VM interacts fine through the standard VirtualBox window, so it's to do with ssh.
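
Both "timeout during server version negotiating" and "ssh_exchange_identification: read: Connection reset by peer" are reported before authentication, at the point where the client waits for the server's identification line. Below is a minimal sketch for checking whether that line ever arrives on the forwarded port, assuming the 127.0.0.1:2222 forward shown in the command above. The helper name ssh_banner and the 5 second timeout are assumptions.

```ruby
require "socket"

# Connect and return the first line the server sends (an SSH server normally
# starts with "SSH-2.0-..."), or nil if nothing usable arrives in time.
def ssh_banner(host, port, timeout: 5)
  Socket.tcp(host, port, connect_timeout: timeout) do |sock|
    # Wait for the server to speak first; give up after `timeout` seconds.
    return nil unless IO.select([sock], nil, nil, timeout)

    sock.readpartial(255).lines.first.to_s.strip
  end
rescue SystemCallError, EOFError => e
  # Covers connection refused, reset by peer, connect timeout, and early close.
  warn "probe failed: #{e.class}"
  nil
end

if __FILE__ == $PROGRAM_NAME
  banner = ssh_banner("127.0.0.1", 2222)
  puts(banner ? "server banner: #{banner}" : "no banner received")
end
```

Running the same probe against the VM's private IP on port 22 gives a comparison point: a banner there but none on the forwarded port would suggest the problem sits in the NAT port forward rather than in sshd itself.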


2 participants