virt: libvirt: configure virtio rng #53087

Open
ewenmcneill opened this issue May 17, 2019 · 5 comments

@ewenmcneill ewenmcneill commented May 17, 2019

Description of Issue/Question

Modern Linux (Debian Buster / Unstable, etc) is very slow to start services that depend on randomness (eg, ssh) if the random number generator takes a while to initialise. In particular (a) those services are typically trying to start before the random number generator has been re-seeded, and (b) at least by default the re-seeding of the random number generator doesn't contribute to counted entropy (which leads to the random number generator still waiting for "real" randomness before it will return results).

For virtual machines, the best solution to this is for the hypervisor to provide a virtual random number device. For instance, with libvirt it is possible to do this with something like:

    <rng model='virtio'>
      <rate bytes='192' period='300000'/>
      <backend model='random'>/dev/random</backend>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </rng>

(For more details see https://libvirt.org/formatdomain.html#elementsRng)
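To make the rate limit concrete, here is a minimal Python sketch (not part of Salt or libvirt) that parses an <rng> fragment like the one above and works out the average throughput it permits. The function name describe_rng is made up for illustration, and the sketch assumes the period is in milliseconds and defaults to 1000 when omitted, which is my reading of the libvirt domain format documentation.

```python
import xml.etree.ElementTree as ET

# The fragment from this issue, minus the PCI address line.
RNG_XML = """
<rng model='virtio'>
  <rate bytes='192' period='300000'/>
  <backend model='random'>/dev/random</backend>
</rng>
"""


def describe_rng(xml_text):
    """Summarise an <rng> device element as a plain dict (hypothetical helper)."""
    rng = ET.fromstring(xml_text)
    rate = rng.find("rate")
    backend = rng.find("backend")
    bytes_per_period = int(rate.get("bytes"))
    # Assumption: period is in milliseconds, 1000 if the attribute is omitted.
    period_ms = int(rate.get("period", "1000"))
    return {
        "model": rng.get("model"),
        "source": backend.text.strip(),
        "bytes_per_period": bytes_per_period,
        "period_ms": period_ms,
        # Average over a whole period; the guest may use its budget in a burst.
        "avg_bits_per_second": bytes_per_period * 8 / (period_ms / 1000.0),
    }
```

With the values above that is 192 bytes per 300 seconds, or 5.12 bits per second on average, so the rate element here is a cap on how much host randomness a guest can drain rather than a throughput target.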

As far as I can tell neither salt.states.virt.running nor salt.modules.virt.init currently provides any way to pass this type of <rng model='virtio'> configuration through to libvirt, resulting in virtual machines where the randomness is slow to fill, and ssh is not answering for many seconds after the virtual machine starts:

ewen@ashram:~$ ssh 172.20.2.64
ssh: connect to host 172.20.2.64 port 22: Connection refused
ewen@ashram:~$ 

(Note that the VM is up, as it's replying that it is refusing the connection, not just hanging; it's just that 30+ seconds after the VM booted it still hasn't got enough randomness to allow ssh to start. That VM is running Debian Unstable, but from other reading I've done I'd expect Debian Buster, recent Ubuntu, etc, to all be the same.)

Arguably the virtio rng model should probably be configured automatically / by default for libvirt these days, maybe pointed at /dev/urandom instead of /dev/random. But at minimum there should be some way to ensure that this can be part of the libvirt VM definition -- either by explicit parameters, or maybe by providing an XML fragment to include in the libvirt definition that is generated.

Versions Report

ewen@noc:~$ salt --versions-report
Salt Version:
           Salt: 2019.2.0
 
Dependency Versions:
           cffi: 0.8.6
       cherrypy: Not Installed
       dateutil: 2.2
      docker-py: Not Installed
          gitdb: 0.5.4
      gitpython: 0.3.2 RC1
          ioflo: Not Installed
         Jinja2: 2.9.4
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.4.2
   mysql-python: Not Installed
      pycparser: 2.10
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 2.7.9 (default, Sep 25 2018, 23:32:58)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 14.4.0
           RAET: Not Installed
          smmap: 0.8.2
        timelib: Not Installed
        Tornado: 4.4.3
            ZMQ: 4.0.5
 
System Versions:
           dist: debian 8.11 
         locale: ANSI_X3.4-1968
        machine: i686
        release: 4.9.0-0.bpo.9-686-pae
         system: Linux
        version: debian 8.11 
 
ewen@noc:~$

@ewenmcneill ewenmcneill commented May 17, 2019

FTR, this is what it looks like from inside the VM when it's waiting on randomness before it can fully start ssh:

ewen@debian-unstable:~$ ps ax | grep ssh
  267 ?        Ss     0:00 /usr/sbin/sshd -t
  290 ttyS0    S+     0:00 grep ssh
ewen@debian-unstable:~$ cat /proc/sys/kernel/random/entropy_avail
52
ewen@debian-unstable:~$ ps ax | grep ssh
  267 ?        Ss     0:00 /usr/sbin/sshd -t
  293 ttyS0    S+     0:00 grep ssh
ewen@debian-unstable:~$ 

and there's nothing listening on TCP/22. Ie, it's stuck in the "sshd -t" phase, where -t is the config test stage:

     -t      Test mode.  Only check the validity of the configuration file and
             sanity of the keys.  This is useful for updating sshd reliably as
             configuration options may change.

due to the minimal randomness.

Eventually after the randomness comes online, the sshd -t succeeds, and ssh can start properly:

ewen@debian-unstable:~$ ps ax | grep ssh
  305 ?        Ss     0:00 /usr/sbin/sshd -D
  308 ttyS0    S+     0:00 grep ssh
ewen@debian-unstable:~$ 

and then, eg, ssh to the VM will work. But as noted above, this can be 30-60 seconds for an otherwise idle (eg, test) VM.
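For anyone wanting to measure that delay rather than eyeball it, a rough Python sketch along these lines should do. It is only an illustration: wait_for_port is a made-up helper, and it just retries a TCP connect until something is listening or the timeout passes.

```python
import socket
import time


def wait_for_port(host, port, timeout=120.0, interval=1.0):
    """Return seconds waited until host:port accepts a TCP connection.

    Returns None if nothing is listening by the time `timeout` has passed.
    """
    start = time.monotonic()
    while True:
        try:
            # "Connection refused" raises an OSError subclass, so we retry.
            with socket.create_connection((host, port), timeout=interval):
                return time.monotonic() - start
        except OSError:
            if time.monotonic() - start >= timeout:
                return None
            time.sleep(interval)


# Example (address taken from the transcript earlier in this issue):
#   delay = wait_for_port("172.20.2.64", 22)
```

Run from the hypervisor right after the VM starts, the returned value is roughly how long sshd took to begin listening.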

Ewen


@ewenmcneill ewenmcneill commented May 17, 2019

Also FTR, this is what it looks like inside the VM when there is a rng virtio device provided:

ewen@debian-unstable:~$ sudo dmesg | grep rng
[    0.095228] random: get_random_bytes called from start_kernel+0x93/0x52c with crng_init=0
[    1.958468] random: crng init done
ewen@debian-unstable:~$ cat /proc/sys/kernel/random/entropy_avail
771
ewen@debian-unstable:~$ uptime
 16:39:51 up 0 min,  1 user,  load average: 0.60, 0.17, 0.06
ewen@debian-unstable:~$ ps ax | grep ssh
  272 ?        Ss     0:00 /usr/sbin/sshd -D
  309 ttyS0    S+     0:00 grep ssh
ewen@debian-unstable:~$ 

Note how in under a minute (well under 30 seconds), there is plenty of randomness, sshd is running in daemon mode, and it's possible to ssh into the VM. (There's actually enough randomness for the crng to be ready about 2 seconds after initial boot, instead of 30-90 seconds.)
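The same entropy_avail check can be scripted if you want to log it across several boots. A minimal Python sketch, assuming the usual Linux procfs path; the helper names are hypothetical:

```python
import time
from pathlib import Path

ENTROPY_PATH = "/proc/sys/kernel/random/entropy_avail"


def entropy_available(path=ENTROPY_PATH):
    """Return the kernel's current entropy estimate, in bits."""
    return int(Path(path).read_text().strip())


def sample_entropy(count, interval=1.0, path=ENTROPY_PATH):
    """Take `count` readings, `interval` seconds apart, and return them as a list."""
    samples = []
    for i in range(count):
        samples.append(entropy_available(path))
        if i + 1 < count:
            time.sleep(interval)
    return samples
```

The path argument is only there so the functions can be pointed at a test file; on a real VM the default is what you want.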

Ewen


@ewenmcneill ewenmcneill commented May 17, 2019

In case it helps anyone else, for now I've hacked my minion template for libvirt_domain to just write out the values that I want for the rng virtio device. This works (only when the VM is first defined), but it'd probably be helpful to others if it was configurable. (Obvious things to configure are the rate in bytes, over what time period, and whether it's from /dev/random or /dev/urandom.)

When the VM is deployed I get something like:

root@naosr620:~# grep -A 4 '<rng' /etc/libvirt/qemu/debian_unstable.xml
    <rng model='virtio'>
      <rate bytes='192' period='300000'/>
      <backend model='random'>/dev/random</backend>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </rng>
root@naosr620:~# 

(and I can ssh into the VM almost immediately -- see #53087 (comment)).

Ewen

PS: Patch against 2019.2, change on the minion, and then sudo service salt-minion restart before deploying a new VM with virt.running.

ewen@naosr620:~$ diff -u /usr/lib/python2.7/dist-packages/salt/templates/virt/libvirt_domain.jinja.old /usr/lib/python2.7/dist-packages/salt/templates/virt/libvirt_domain.jinja
--- /usr/lib/python2.7/dist-packages/salt/templates/virt/libvirt_domain.jinja.old	2019-02-16 19:13:46.000000000 +1300
+++ /usr/lib/python2.7/dist-packages/salt/templates/virt/libvirt_domain.jinja	2019-05-17 16:35:21.636845068 +1200
@@ -81,6 +81,13 @@
                 {% endif %}
                 {% endif %}
 
+                {# 2019-05-17: inject rng virtio module #}
+                {# See: https://github.com/saltstack/salt/issues/53087 #}
+                <rng model='virtio'>
+                       <rate bytes='192' period='300000'/>
+                       <backend model='random'>/dev/random</backend>
+                </rng>
+
         </devices>
         <features>
                 <acpi />
ewen@naosr620:~$ 

@garethgreenaway garethgreenaway commented May 17, 2019

@ewenmcneill Good find! This looks like a great start to fixing this issue. It should definitely be configurable, eg. a true/false value passed along to the template, defaulting to false, that enables the addition you made to the template above. The random device could also be configurable with an option, perhaps with a default. Would you be able to submit a PR with the changes?


@ewenmcneill ewenmcneill commented May 17, 2019

Definitely a feature request :-) It is one that's likely to become more urgent in the next 6 months or so, as I've been watching others running into these "limited randomness" issues particularly with ssh startup, for about 6 months now (and that's when I first hit it in my test Debian Unstable VM).

I'll put creating a PR for this on my (long!) todo list, but it might be some weeks before I get to look at it (among other things I have a bunch of VMs to get off old servers onto newer servers soon, which is how I found the issue). Happy if someone else wants to do it first :-)

For future reference, I suspect config like:

      - rng:
            source: /dev/urandom
            bits: 192
            interval: 300000     # ms
            model: virtio

is probably a reasonable virt.running state config snippet. And if passed through to the libvirt_domain.jinja template could configure something suitable. It'd probably also be useful to have defaults, at least for libvirt that are something like those.
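As a rough illustration of that pass-through (a sketch only, not Salt's actual code), something like the following Python could turn such a config dict into the <rng> element, filling in defaults. The function name, the key names and the default values are all assumptions; in particular the sketch calls the rate key "bytes" to match the libvirt attribute it ends up in.

```python
import xml.etree.ElementTree as ET

# Hypothetical defaults, based on the values used earlier in this issue.
RNG_DEFAULTS = {
    "model": "virtio",
    "source": "/dev/urandom",
    "bytes": 192,
    "interval": 300000,  # ms
}


def rng_xml(config=None):
    """Render an <rng> device element from a (hypothetical) rng config dict."""
    settings = dict(RNG_DEFAULTS)
    settings.update(config or {})

    rng = ET.Element("rng", {"model": settings["model"]})
    ET.SubElement(
        rng,
        "rate",
        {"bytes": str(settings["bytes"]), "period": str(settings["interval"])},
    )
    backend = ET.SubElement(rng, "backend", {"model": "random"})
    backend.text = settings["source"]
    return ET.tostring(rng, encoding="unicode")


# Example: keep the defaults but read from /dev/random instead.
#   print(rng_xml({"source": "/dev/random"}))
```

In Salt itself the rendering would presumably stay in libvirt_domain.jinja rather than Python, but the shape of the mapping (config keys in, one <rng> element out, defaults for anything omitted) would be the same.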

Ewen
