
Allowing autostart setting for KVM deployed VMs with persistent images, to restart them upon node reboot #599

Open
rsmontero opened this issue Nov 20, 2017 · 20 comments

Comments

@rsmontero
Member


Author Name: Olivier Berger (Olivier Berger)
Original Redmine Issue: 1290, https://dev.opennebula.org/issues/1290
Original Date: 2012-05-23


As discussed in http://lists.opennebula.org/pipermail/users-opennebula.org/2012-May/008959.html, I think it would be great to offer some support for automatically restarting VMs that use persistent images (the libvirt autostart setting) in case a node is rebooted (or in case of power outages and other restarts).

One main change involved is the need to define the domains, instead of creating them as transient domains.

See the linked discussion for the proposed changes.
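
For context, a quick illustration of the libvirt behaviour involved; the file name domain.xml and the domain name mydomain are placeholders:

    # A transient domain only exists while it is running and cannot be
    # marked for autostart:
    virsh create domain.xml
    virsh autostart mydomain
    # error: Requested operation is not valid: cannot set autostart for transient domain

    # A defined (persistent) domain is known to libvirt even when stopped,
    # so libvirt itself can start it again when the node boots:
    virsh define domain.xml
    virsh start mydomain
    virsh autostart mydomain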

@rsmontero
Member Author


Original Redmine Comment
Author Name: jordan pittier (jordan pittier)
Original Date: 2012-05-24T18:48:06Z


Sounds great, but isn't this KVM specific? Although I am using KVM, I like the way OpenNebula tries to be as "cross hypervisor" as possible.

@rsmontero
Member Author


Original Redmine Comment
Author Name: Olivier Berger (Olivier Berger)
Original Date: 2012-05-24T20:55:58Z


jordan pittier wrote:

Sounds great, but isn't this KVM specific? Although I am using KVM, I like the way OpenNebula tries to be as "cross hypervisor" as possible.

I don't think this is specific to KVM (although I mentioned KVM in the title of this ticket), as it is a libvirt option (see http://libvirt.org/sources/virshcmdref/html/sect-autostart.html).

Still, it may happen that libvirt only supports this for KVM; I haven't tested with others. But I can't imagine other hypervisors wouldn't have such a feature :-/

@rsmontero
Member Author


Original Redmine Comment
Author Name: jordan pittier (jordan pittier)
Original Date: 2012-05-31T14:14:34Z


You are correct.

The thing is, OpenNebula uses libvirt only to manage KVM hosts.

@rsmontero
Member Author


Original Redmine Comment
Author Name: Daniel Dehennin (Daniel Dehennin)
Original Date: 2013-10-02T12:07:19Z


+1

It would be great to have a checkbox option in the template definition and/or at VM instantiation time.

Some tests may be needed before enabling the “autostart”, as I'm not sure it will work for non-persistent disks.

Thanks.

@rsmontero
Member Author


Original Redmine Comment
Author Name: Daniel Dehennin (Daniel Dehennin)
Original Date: 2013-11-07T11:17:36Z


Daniel Dehennin wrote:

+1

It would be great to have a checkbox option in the template definition and/or at VM instantiation time.

We could also define a name prefix to use when ONE runs `onetemplate instantiate` automatically.

Some tests may be needed before enabling the “autostart”, as I'm not sure it will work for non-persistent disks.

I thought a little about this issue, as we need it, and I wonder if it could be implemented with existing ONE features instead of using the libvirt feature (for KVM).

In fact, I'm quite sure we should not use libvirt to manage this even for KVM VMs; on my own system, the service dependencies between libvirt and Open vSwitch are problematic.

Instead we could use the `suspend`, `stop` and `resume` mechanisms, making it work with non-persistent storage, if I understood these commands correctly.

We must handle different use cases, depending on what happens and whether we have a single node or multiple nodes.

For a single node:

  1. On a planned shutdown/reboot, for example after an upgrade (init 0 / init 6):

when the ONE node is shut down, we could just run `onevm suspend` on all the running VMs and put them in an `autoboot` state.

when the ONE node is booted, run `onevm resume` on each VM in the `autoboot` state (a rough sketch of this flow is given below).

  2. On a ONE node crash, such as a hardware failure:

when the ONE node is booted,

run `onevm boot` on each VM in the `unknown` state that uses an auto-start enabled template

search for auto-start enabled templates and, for each one, if no `running` VMs use it, run `onetemplate instantiate`

For multiple nodes:

  1. On a planned shutdown/reboot, for example after an upgrade (init 0 / init 6):
    * if the `System` datastore is shared, just live migrate all VMs to other nodes
    * if the `System` datastore is local to the node, run `onevm stop` and `onevm resume` to perform a “cold migration”

I'm not sure about the best thing to do on hardware failure; in fact, since I'm missing test machines for now, I don't even know what ONE does in such a situation.
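
A minimal sketch of the single-node planned-shutdown flow above, assuming it runs on the frontend as oneadmin; the `autoboot` tag is kept in a plain file whose name and location are made up for illustration, and the `onevm list` column layout may differ between versions:

    #!/bin/bash
    # Hypothetical helper for a planned node shutdown/boot; in practice the two
    # halves would live in separate init-script actions (stop/start).
    AUTOBOOT_LIST=/var/lib/one/autoboot.list   # made-up location for the "autoboot" tags

    on_shutdown() {
        # Remember every running VM ("runn" in the STAT column), then suspend it.
        onevm list | awk '/ runn /{print $1}' > "$AUTOBOOT_LIST"
        while read -r VMID; do
            onevm suspend "$VMID"
        done < "$AUTOBOOT_LIST"
    }

    on_boot() {
        # Resume everything that the shutdown half suspended.
        while read -r VMID; do
            onevm resume "$VMID"
        done < "$AUTOBOOT_LIST"
    }

    case "$1" in
        stop)  on_shutdown ;;
        start) on_boot ;;
    esac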

Regards.

@rsmontero
Member Author


Original Redmine Comment
Author Name: Daniel Dehennin (Daniel Dehennin)
Original Date: 2014-10-03T21:45:21Z


The ideal would be to have trusted, synchronous communication from the node to the frontend to report a shutdown/reboot.

My idea is to use the monitoring system with an init script on the node, like the `libvirt-guests` one:

  1. started last, stopped first
  2. push some kind of shutdown monitoring
  3. then the frontend runs a node hook and applies a policy like the one I described in my previous comment (single/multi node, with/without shared storage)
  4. the script must wait for some feedback from the frontend

This will not work with pull-based monitoring.

Another option is to hit the frontend directly via RPC, but this requires authentication/authorization of the nodes on the frontend.

Is there a way for nodes to notify the frontend of a shutdown/reboot?

@rsmontero
Member Author


Original Redmine Comment
Author Name: Ruben S. Montero (@rsmontero)
Original Date: 2014-10-07T14:28:56Z


Is there a way for nodes to notify the frontend of a shutdown/reboot?

In a shutdown/reboot cycle:

  1. If the cycle is long enough, the host should transition through the error -> on states, so a hook can easily be triggered.
  2. For quick reboots oned may not even notice the reboot; in that case we can add a probe to the monitoring system. However, the VMs will be moved to POWER-OFF. We can add a hook (on VM power-off) to power them back on if the host was rebooted. We need to add a probe for that.

So I'd suggest:

  1. Add a probe with uptime information.
  2. Write a hook for VMs on POWER-OFF; if the VM went to power-off just after the host booted, restart it.

I really like this approach as it is hypervisor independent.
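
A rough sketch of what such an uptime probe could look like, assuming monitoring probes only need to print KEY=VALUE pairs on stdout; the attribute name HOST_UPTIME and the probe's placement are assumptions:

    #!/bin/bash
    # Hypothetical monitoring probe reporting how long the hypervisor has been up.
    # Dropped into the host's IM probe directory, it would add a HOST_UPTIME
    # attribute (in seconds) to the host's monitoring data.

    UPTIME_SECONDS=$(cut -d ' ' -f 1 /proc/uptime | cut -d '.' -f 1)

    echo "HOST_UPTIME=${UPTIME_SECONDS}"

The power-off hook could then compare the VM's power-off timestamp with this value to decide whether the power-off was caused by a host reboot.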

@rsmontero
Member Author


Original Redmine Comment
Author Name: EOLE Team (EOLE Team)
Original Date: 2014-10-07T14:57:28Z


Ruben S. Montero wrote:

So I'd suggest

Add a probe with uptime information

Write a hook for VMs on POWER-OFF; if the VM went to power-off just after the host booted, restart it.

Could we differentiate between VMs in `power-off` because of the reboot and VMs in `power-off` because the user wants them powered off?

I'm not sure we can blindly boot VMs after a reboot.

I really like this approach as it is hypervisor independent.

Yes, me too, even if I personally only use KVM ;-)

Regards.

@rsmontero
Member Author


Original Redmine Comment
Author Name: Ruben S. Montero (@rsmontero)
Original Date: 2014-10-09T09:01:41Z


EOLE Team wrote:

Ruben S. Montero wrote:

So I'd suggest

Add a probe with uptime information

Write a hook for VMs on POWER-OFF; if the VM went to power-off just after the host booted, restart it.

Could we differentiate between VMs in `power-off` because of the reboot and VMs in `power-off` because the user wants them powered off?

Yes, I think we can use the REASON field of the history. Simply add a new reason for automatic transitions (vs. user-requested ones). This, together with the uptime of the host, should be enough...

Cheers

@rsmontero
Member Author


Original Redmine Comment
Author Name: Olivier Berger (Olivier Berger)
Original Date: 2015-08-12T14:17:46Z


FWIW, a discussion about this issue: https://forum.opennebula.org/t/automatically-restart-vms-after-host-restart/454

Any progress to expect?

@rsmontero
Member Author


Original Redmine Comment
Author Name: Olivier Berger (Olivier Berger)
Original Date: 2015-08-12T14:19:46Z


Olivier Berger wrote:

As discussed in http://lists.opennebula.org/pipermail/users-opennebula.org/2012-May/008959.html ...

Btw, the list archive is gone, but one can still find the thread at https://www.mail-archive.com/users%40lists.opennebula.org/msg06649.html

Hth

@rsmontero
Member Author


Original Redmine Comment
Author Name: Olivier Berger (Olivier Berger)
Original Date: 2015-08-13T09:31:22Z


Would it be possible to at least have KVM VMs created as non-transient, i.e. using virsh define + virsh start instead of just virsh create, so that it is possible to manually perform a virsh autostart if needed (virsh autostart won't work on transient VMs), like in https://gist.github.com/anonymous/2776202, but without line 34?

I'm not sure whether there would be any side effects, but that would be a first improvement for KVM until a more generic solution is found.
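
For illustration, assuming the KVM deployment currently boots the domain with something like virsh create, the non-transient variant would look roughly like this (the connection URI, paths and the one-42 domain name are only illustrative):

    # Current (transient) behaviour: the domain vanishes from libvirt once it
    # stops, so `virsh autostart` refuses to work on it.
    virsh --connect qemu:///system create /var/lib/one/datastores/0/42/deployment.0

    # Non-transient variant: define the domain first, then start it.  The
    # domain persists in libvirt, so autostart can be enabled per VM.
    virsh --connect qemu:///system define /var/lib/one/datastores/0/42/deployment.0
    virsh --connect qemu:///system start one-42
    virsh --connect qemu:///system autostart one-42

    # The driver would also need a matching undefine whenever the VM is
    # removed from the host (delete, migration, ...):
    virsh --connect qemu:///system undefine one-42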

@rsmontero
Member Author


Original Redmine Comment
Author Name: Ruben S. Montero (@rsmontero)
Original Date: 2015-08-14T15:00:21Z


Yes, the main reason for holding this back is the side effects on all the operations. I agree: the idea would be to define+start, and when the VM is removed from the host it needs to be undefined. We need to review all the operations to check when we need to do the undefine (e.g. poweroff, migrations, etc.); right now it is assumed that the VM is not defined.

@rsmontero
Member Author


Original Redmine Comment
Author Name: EOLE Team (EOLE Team)
Original Date: 2015-11-17T13:35:54Z


Ruben S. Montero wrote:

Yes, I think we can use the REASON field of the history. Simply add a new reason for automatic transitions (vs. user-requested ones). This, together with the uptime of the host, should be enough...

Could we open a new issue for this point?

This could solve the “host crashed” case:

  • we set a `Reason` for any operation (automatic or user requested)
  • in case of a host crash, the VMs will be reported as `PowerOff` without any `Reason`.

Then, we should add the possibility to have a `HOST_HOOK` executed when a host enters the `on` state, in which case we would list all VMs on that host in the `PowerOff` state without any `Reason` and `resume` them.
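
A rough sketch of what such a hook script could do, assuming it receives the host name as its argument; the hook trigger and the missing-`Reason` filter described above do not exist yet, and the `onevm list` column layout is only approximate:

    #!/bin/bash
    # Hypothetical host hook: resume the VMs left in POWEROFF on a host that
    # just came back to the "on" state.  $1 is assumed to be the host name.
    HOST="$1"

    # Pick VM IDs whose row mentions the host and whose STAT column is "poff";
    # a real implementation should also check that no user-requested Reason
    # was recorded for the power-off.
    onevm list | awk -v host="$HOST" '$0 ~ host && / poff /{print $1}' |
    while read -r VMID; do
        onevm resume "$VMID"
    done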

Regards.

@stale

stale bot commented Mar 6, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. The OpenNebula Dev Team

@baby-gnu
Contributor

Thanks for keeping this in the backlog 👍

@mathieumd

Does that mean that OpenNebula does not support restarting VMs after their host has been rebooted?

@Githopp192

Yes, this is really a good and important question. Normally you can easily configure that with "virsh autostart":

   autostart [--disable] domain
       Configure a domain to be automatically started at boot.

       The option --disable disables autostarting.

But this will not work on OpenNebula-controlled VMs:

error: Failed to mark domain 2 as autostarted
error: Requested operation is not valid: cannot set autostart for transient domain

@rsmontero
Member Author

Hi,

There are two considerations here:

  • OpenNebula creates transient domains, i.e. they are not persisted in libvirt.
  • Considering a distributed system, when a host fails you'd most probably be interested in restarting the VM on another host, rather than waiting for the hypervisor to be fixed. This functionality is described here:

https://docs.opennebula.io/5.12/advanced_components/ha/ftguide.html#host-failures

Maybe we can extend this hook to not recreate the VMs on other hosts, but rather wait for the host to be back online. This could be done by a simple hook (restart the host's VMs in UNKNOWN when the host goes back to "online").
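
For reference, a hedged sketch of how such a hook could be registered with the hook subsystem, modelled on the host_error hook from the HA guide linked above; whether MONITORED is accepted as a trigger state, and the restart_unknown_vms.sh script itself, are assumptions to verify:

    $ cat host_online_hook.tmpl
    NAME      = host_back_online
    TYPE      = state
    COMMAND   = restart_unknown_vms.sh   # hypothetical script: boot the host's VMs in UNKNOWN
    ARGUMENTS = "$TEMPLATE"
    RESOURCE  = HOST
    STATE     = MONITORED                # assumption: triggered when the host is monitored again
    REMOTE    = no

    $ onehook create host_online_hook.tmpl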

What do you think?

@Githopp192

Yes, Ruben, this would probably make sense (as an addition): extend the hook so that it is possible to NOT recreate the VMs on other hosts (I also thought of simply creating my own script for that).

On the other hand, I would need such a solution as well: when one host goes down or is not working properly, a hook would migrate the VMs to another host.

But I never tested that. I'm using this kind of design:
VMs of host1 are running on host1:/home/datastores
VMs of host2 are running on host2:/home/datastores

Important data are on host1, in host1:/datastore_ssd

All DSs are NFS-shared, but only the Sunstone frontend sees all the (NFS-shared) DSs.

What would happen with an automatic VM-migrate hook from host1 to host2? (host1 has all the VM OS images and data on local storage.)

    [root@sunstone ~]# onedatastore list
     ID USER     GROUP    NAME          SIZE AVA CLUSTERS IMAGES TYPE DS  TM    STAT
    101 oneadmin oneadmin data_hd_srv    16T 99% 0             1 img  fs  qcow2 on
    100 oneadmin oneadmin data_ssd_srv    4T 87% 0             1 img  fs  qcow2 on
      1 oneadmin oneadmin os_nvme_srv     4T 93% 0             5 img  fs  qcow2 on
      0 oneadmin oneadmin vm_nvme_srv     4T 93% 0             0 sys  -   qcow2 on

    [root@sunstone ~]# df -h
    host1:/home/datastores 4.0T 297G 3.8T  8% /mnt/datastores/nfs-dsnvme-srv
    host2:/home/datastores 1.0T  53G 972G  6% /mnt/datastores/nfs-dshdd-srv2
    host1:/datastore_ssd   4.0T 533G 3.5T 14% /mnt/datastores/nfs-dsssd-srv
    host1:/datastore_hdd    16T 216G  16T  2% /mnt/datastores/nfs-dshdd-srv
