
ansible-drbd_virt_cluster

A Traditional 2 Node Linux HA Virtualization cluster using DRBD/CLVM/OCFS2/Corosync/Pacemaker with IPMI fencing on Ubuntu 18.04

Description

This playbook builds and configures a traditional two-node, highly available Linux KVM virtualization cluster, using the technologies below to provide a managed shared-storage infrastructure and highly available virtual machine instances. It is currently a work in progress: removing any comments and running site.yml should, in theory, produce a working cluster stack, though slight modifications may be required.
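
For example, a minimal run might look like the following (the inventory file name is an assumption; adjust it to your environment):

ansible-playbook -i hosts site.yml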

  1. DRBD - Distributed Replicated Block Device. The nodes are cross-connected over a 10G interface (eno1) configured with a /30 subnet, which serves as the primary interface for storage replication between the DRBD nodes. (A minimal resource configuration sketch appears after this list.)

  2. DLM - Distributed Lock Manager. Critical for cluster operations and split-brain prevention.

  3. CLVM - Clustered LVM. LVM is enabled for clustering, and the /dev/drbd0 device is used as a physical volume in the CLVM volume group. (A setup sketch appears after this list.)

  4. OCFS2 - Oracle Cluster File System (version 2). OCFS2 provides a shared filesystem which lets each node read and write the filesystem on the drbd0 device at the same time; the O2CB cluster resource provides the locking necessary to prevent contention between nodes. The OCFS2 filesystems are mounted at /etc/libvirt/qemu, providing a shared location for the virtual machine (QEMU/KVM) XML configuration files, and at /var/lib/libvirt/images for the qcow2 file-backed virtual machines operating within the cluster.

  5. Corosync - The component of the Linux HA stack that provides the cluster communication mechanism and quorum.

  6. Pacemaker - Pacemaker manages the cluster resource definitions for the different resource types used in the cluster.

  7. STONITH - Shoot The Other Node In The Head. STONITH is a fencing mechanism for preventing split-brain scenarios and for split-brain recovery. In our case the cluster is configured to use the physical nodes' IPMI interfaces to forcefully fence (reboot) a node and prevent storage corruption.
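
A minimal DRBD resource definition for the layout in item 1 might look roughly like the sketch below. The backing block device and the /30 peer addresses are assumptions; the playbooks in this repository generate the actual configuration.

# /etc/drbd.d/r0.res - illustrative sketch only
resource r0 {
  protocol C;
  device    /dev/drbd0;
  disk      /dev/sdb1;            # assumed backing block device
  meta-disk internal;
  net {
    allow-two-primaries yes;      # needed for the dual-primary (OCFS2/CLVM) setup
  }
  on virt-cl-drbd-0 {
    address 192.168.100.1:7788;   # assumed /30 peer address on eno1
  }
  on virt-cl-drbd-1 {
    address 192.168.100.2:7788;
  }
}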
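
Items 3 and 4 translate, very roughly, into steps like the following once drbd0 is up in dual-primary mode. The volume group, logical volume, and label names here are assumptions, not necessarily the names used by the playbooks.

pvcreate /dev/drbd0
vgcreate --clustered y vg_virt /dev/drbd0            # clustered volume group on the DRBD device
lvcreate -L 20G -n lv_qemu_conf vg_virt
mkfs.ocfs2 -L qemu_conf /dev/vg_virt/lv_qemu_conf    # OCFS2 so both nodes can mount read/write
mount /dev/vg_virt/lv_qemu_conf /etc/libvirt/qemu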

Additional Technologies Used

  1. KVM/Libvirt/QEMU - The standard KVM/Libvirt/QEMU stack provides the virtualization. Virtualization networking is provided by an LACP bond on network interfaces ens3 and ens4; management and VM egress traffic is placed on a bridge (br0) which resides on top of the bonded interface (bond0). (A netplan sketch of this layout appears after this list.) Virtual machine deployments are configured for lights-out management, meaning all VM console operations for machines deployed with the provided scripts use serial communications. This ensures the virtual machine consoles remain accessible when managing the physical hypervisor hosts over ssh. (This is currently provided only for CentOS 7, Ubuntu 18.04 LTS, CoreOS, and RancherOS, as well as generic PXE, via the create_*_vm.sh scripts in ./files/bin.)

  2. Munin - Munin provides a lightweight monitoring infrastructure for performance visualization across the cluster. It is currently hosted on a virtual IP across the cluster, and the cluster resource can be migrated between the nodes should one of them need to be taken down. It is served via a location block in the default nginx site configured on each hypervisor host in the cluster (see the sketch after this list).
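
A netplan sketch of the bond/bridge layout described in item 1 might look roughly like the following; the file name, LACP parameters, and addressing mode are assumptions, and the actual templates live in this repository's roles.

# /etc/netplan/01-cluster.yaml - illustrative sketch only
network:
  version: 2
  ethernets:
    ens3: {}
    ens4: {}
  bonds:
    bond0:
      interfaces: [ens3, ens4]
      parameters:
        mode: 802.3ad            # LACP
  bridges:
    br0:
      interfaces: [bond0]
      dhcp4: true                # or a static management address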
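
The Munin pages are typically exposed through something like the location block below in the default nginx site; the exact alias path is an assumption (on Ubuntu the generated HTML usually lands in /var/cache/munin/www).

# inside the default nginx server block - illustrative only
location /munin/ {
    alias /var/cache/munin/www/;
    index index.html;
}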

Basic Cluster Operations

  • To edit the running CRM CIB
root@virt-cl-drbd-0:~# crm configure edit
  • Check the running cluster resource status (crm_mon -1)

    Removing the '-1' from the crm_mon command will keep the status display running in the foreground. This is useful for watching as you start the cluster resources.

root@virt-cl-drbd-0:~# crm_mon -1
Stack: corosync
Current DC: virt-cl-drbd-1 (version 1.1.18-2b07d5c5a9) - partition with quorum
Last updated: Tue Aug  7 09:47:15 2018
Last change: Tue Aug  7 09:47:13 2018 by root via cibadmin on virt-cl-drbd-0

2 nodes configured
26 resources configured

Online: [ virt-cl-drbd-0 virt-cl-drbd-1 ]

Active resources:

 p_fence_virt-cl-drbd-0	(stonith:fence_ipmilan):	Started virt-cl-drbd-1
 p_fence_virt-cl-drbd-1	(stonith:fence_ipmilan):	Started virt-cl-drbd-0
 Resource Group: g_vip_nginx
     p_virtual_ip	(ocf::heartbeat:IPaddr2):	Started virt-cl-drbd-0
     p_nginx	(ocf::heartbeat:nginx):	Started virt-cl-drbd-0
 Master/Slave Set: ms-drbd0 [p_drbd_r0]
     Masters: [ virt-cl-drbd-0 virt-cl-drbd-1 ]
 Clone Set: hasi-clone [g_hasi]
     Started: [ virt-cl-drbd-0 virt-cl-drbd-1 ]
 vm_ipam1	(ocf::heartbeat:VirtualDomain):	Started virt-cl-drbd-1
 vm_ipam2	(ocf::heartbeat:VirtualDomain):	Started virt-cl-drbd-0
 vm_jenkins	(ocf::heartbeat:VirtualDomain):	Started virt-cl-drbd-0
 vm_quartermaster	(ocf::heartbeat:VirtualDomain):	Started virt-cl-drbd-1
 vm_awx	(ocf::heartbeat:VirtualDomain):	Started virt-cl-drbd-1
 vm_puppetmaster	(ocf::heartbeat:VirtualDomain):	Started virt-cl-drbd-0
  • Check the DRBD replication status (drbdadm status)
root@virt-cl-drbd-0:~# drbdadm status
r0 role:Primary
  disk:UpToDate
  virt-cl-drbd-1 role:Primary
    peer-disk:UpToDate
  • Show all configured cluster resources including their state (crm resource show)
root@virt-cl-drbd-0:~# crm resource show
 p_fence_virt-cl-drbd-0	(stonith:fence_ipmilan):	Started
 p_fence_virt-cl-drbd-1	(stonith:fence_ipmilan):	Started
 Resource Group: g_vip_nginx
     p_virtual_ip	(ocf::heartbeat:IPaddr2):	Started
     p_nginx	(ocf::heartbeat:nginx):	Started
 Master/Slave Set: ms-drbd0 [p_drbd_r0]
     Masters: [ virt-cl-drbd-0 virt-cl-drbd-1 ]
 Clone Set: hasi-clone [g_hasi]
     Started: [ virt-cl-drbd-0 virt-cl-drbd-1 ]
 vm_ipam1	(ocf::heartbeat:VirtualDomain):	Started
 vm_ipam2	(ocf::heartbeat:VirtualDomain):	Started
 vm_jenkins	(ocf::heartbeat:VirtualDomain):	Started
 vm_quartermaster	(ocf::heartbeat:VirtualDomain):	Started
 vm_awx	(ocf::heartbeat:VirtualDomain):	Started
 vm_puppetmaster	(ocf::heartbeat:VirtualDomain):	Started

Virtual Machine Management Useful Commands

  • List VMs on the locally connected hypervisor
virsh list
  • List VMs on the remote hypervisor node
virsh --connect qemu+ssh://virt-cl-drbd-1/system list
  • To connect to a vm on the current hypervisor node
virsh console vm_ipam1
  • To connect to a vm on the remote hypervisor node
virsh --connect qemu+ssh://virt-cl-drbd-1/system console vm_ipam2
  • To live migrate a cluster managed resource
root@virt-cl-drbd-0:~# crm resource migrate vm_puppetmaster force
INFO: Move constraint created for vm_puppetmaster
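  • To clear the move constraint once the migration has completed (the exact crmsh subcommand can vary by version; unmigrate is assumed here)
root@virt-cl-drbd-0:~# crm resource unmigrate vm_puppetmaster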
  • To automatically create a CentOS 7 VM named vm_centos with a 40 GB hard drive image
root@virt-cl-drbd-0:~# ./bin/create_centos_vm.sh vm_centos 40
  • To automatically add the CentOS 7 VM named vm_centos as a cluster managed resource after creating it with the command above (a sketch of what such a .crm snippet contains appears below, after these commands)
root@virt-cl-drbd-0:~# crm configure < /etc/libvirt/qemu/vm_centos.crm
  • To automatically create an Ubuntu 18.04 LTS VM named vm_ubuntu with a 40 GB hard drive image
root@virt-cl-drbd-0:~# ./bin/create_ubuntu_vm.sh vm_ubuntu 40
  • To automatically add the Ubuntu 18.04 LTS VM named vm_ubuntu as a cluster managed resource after creating it with the command above
root@virt-cl-drbd-0:~# crm configure < /etc/libvirt/qemu/vm_ubuntu.crm
  • To automatically create a RancherOS 1.4.0 VM named vm_ros with a 40 GB hard drive image
root@virt-cl-drbd-0:~# ./bin/create_rancheros_vm.sh vm_ros 40
  • To automatically add the RancherOS 1.4.0 VM named vm_ros as a cluster managed resource after creating it with the command above
root@virt-cl-drbd-0:~# crm configure < /etc/libvirt/qemu/vm_ros.crm
  • To automatically create a CoreOS Alpha VM named vm_coreos with a 40 GB hard drive image
root@virt-cl-drbd-0:~# ./bin/create_coreos_vm.sh vm_coreos 40
  • To automatically add the CoreOS Alpha VM named vm_coreos as a cluster managed resource after creating it with the command above
root@virt-cl-drbd-0:~# crm configure < /etc/libvirt/qemu/vm_coreos.crm
  • To remove a virtual machine definition and delete its associated storage
root@virt-cl-drbd-0:~# virsh undefine --remove-all-storage myvm
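  • For reference, the generated .crm snippets define an ocf:heartbeat:VirtualDomain primitive roughly along the lines of the sketch below; the exact parameters are assumptions, so check the generated file before loading it

    primitive vm_centos ocf:heartbeat:VirtualDomain \
        params config="/etc/libvirt/qemu/vm_centos.xml" hypervisor="qemu:///system" \
        meta allow-migrate="true" \
        op start timeout="120s" interval="0" \
        op stop timeout="120s" interval="0" \
        op monitor interval="30s" timeout="30s"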
  • Recover from a cluster fencing loop or split brain

    • Ensure pacemaker is started and run:
    crm configure edit
    
    • Using vi commands, search and replace "Started" with "Stopped" throughout the file
    :%s/Started/Stopped/g
    
    • Save the changes.
    • Repeat on Other node if necessary.
    • Start the ms-drbd0 resource to begin resync.
    crm resource start ms-drbd0
    
    • Watch the progress; the roles will change from Secondary/Secondary to Primary/Primary when the resync is complete.
    watch drbdadm status
    
    • When drbd is Primary/Primary you may bring up the rest of the cluster stack
    crm resource start hasi-clone
    crm resource start g_vip_nginx
    crm resource start p_fence_virt-cl-drbd-0
    crm resource start p_fence_virt-cl-drbd-1
    
    • When all cluster resources are successfully started you may then start the virtual machine resources. For example:
    crm resource start vm_ipam1
    

References

Linux HA Reference

Netplan

DRBD (Active/Active)

DLM

OCFS2

CLVM

Stonith

Pacemaker

LibVirt/QEMU

Dell OpenManage

Munin

Munin Architecture

Workflow
