
LVM agents: Deactivate remotely active LVs before attempting EX activation

These changes address rhbz729812.  When a clustered volume group is part of
an HA LVM setup (an active/passive installation), the LVs that are activated
as part of the 'clvmd' startup may cause problems with the initialization of
HA LVM.  This is because HA LVM requires that logical volumes be active
exclusively.  If a remote machine activates the LVs via the 'clvmd' init
script before the local machine runs the service manager, it will be
impossible to activate the LVs exclusively.

Should the exclusive activation fail, the solution is to first attempt to
deactivate the LVs cluster-wide (unless the LVs are open - in which case
a deactivation would obviously fail).  If the deactivation is successful,
then another attempt can be made to activate exclusively.

This isn't a perfect solution.  It is still possible for yet another machine
to attempt a 'clvmd' activation between our deactivation and the retry of
the exclusive activation.  It does substantially narrow the window for
problems, though.  If this solution proves to be insufficient, then 'lvs' or
some other command will need to gain the ability to report whether an LV is
active remotely.  That would let us distinguish between an activation that
failed because '--partial' was not used and an exclusive activation that
failed because another machine has the LV open remotely.
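The deactivate-then-retry strategy described above can be sketched standalone.  This is a minimal illustration, not the agent's code: `lvchange` is stubbed so the sketch runs without LVM, and `vg0/lv0` plus the stub's failure behavior are assumptions for demonstration.

```shell
# Retry strategy from the commit message, with `lvchange` stubbed so the
# sketch runs anywhere; vg0/lv0 and the stub behavior are illustrative.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
attempt=0
lvchange() {   # stub: first exclusive activation fails (LV active remotely),
               # the deactivation and the retry then succeed
	case "$1" in
	-aey) attempt=$((attempt + 1)); [ "$attempt" -gt 1 ] ;;
	-an)  true ;;
	esac
}

lv_start_clustered_sketch() {
	lvchange -aey "$1" && return $OCF_SUCCESS      # first exclusive attempt
	lvchange -an  "$1" || return $OCF_ERR_GENERIC  # cluster-wide deactivate
	lvchange -aey "$1" && return $OCF_SUCCESS      # second exclusive attempt
	return $OCF_ERR_GENERIC
}

lv_start_clustered_sketch vg0/lv0 && echo "activated exclusively"
```

The real agent additionally skips the deactivation when the LV is open and falls through to `lvconvert --repair` when the retry fails, as the diff below shows.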
1 parent aed6066 commit 5419c920cf892e7582e12585f0b115a2bab97d34 @jbrassow committed Oct 8, 2012
Showing with 68 additions and 11 deletions.
  1. +40 −10 rgmanager/src/resources/lvm_by_lv.sh
  2. +28 −1 rgmanager/src/resources/lvm_by_vg.sh
50 rgmanager/src/resources/lvm_by_lv.sh
@@ -365,23 +365,53 @@ lv_activate()
function lv_start_clustered
{
- if ! lvchange -aey $OCF_RESKEY_vg_name/$OCF_RESKEY_lv_name; then
- ocf_log err "Failed to activate logical volume, $OCF_RESKEY_vg_name/$OCF_RESKEY_lv_name"
- ocf_log notice "Attempting cleanup of $OCF_RESKEY_vg_name/$OCF_RESKEY_lv_name"
+ if lvchange -aey $OCF_RESKEY_vg_name/$OCF_RESKEY_lv_name; then
+ return $OCF_SUCCESS
+ fi
- if ! lvconvert --repair --use-policies $OCF_RESKEY_vg_name/$OCF_RESKEY_lv_name; then
- ocf_log err "Failed to cleanup $OCF_RESKEY_vg_name/$OCF_RESKEY_lv_name"
+ # FAILED exclusive activation:
+ # This can be caused by an LV being active remotely.
+ # Before attempting a repair effort, we should attempt
+ # to deactivate the LV cluster-wide; but only if the LV
+ # is not open. Otherwise, it is senseless to attempt.
+ if ! [[ "$(lvs -o attr --noheadings $OCF_RESKEY_vg_name/$OCF_RESKEY_lv_name)" =~ ....ao ]]; then
+ # We'll wait a small amount of time for some settling before
+ # attempting to deactivate. Then the deactivate will be
+ # immediately followed by another exclusive activation attempt.
+ sleep 5
+ if ! lvchange -an $OCF_RESKEY_vg_name/$OCF_RESKEY_lv_name; then
+ # Someone could have the device open.
+ # We can't do anything about that.
+ ocf_log err "Unable to perform required deactivation of $OCF_RESKEY_vg_name/$OCF_RESKEY_lv_name before starting"
return $OCF_ERR_GENERIC
fi
- if ! lvchange -aey $OCF_RESKEY_vg_name/$OCF_RESKEY_lv_name; then
- ocf_log err "Failed second attempt to activate $OCF_RESKEY_vg_name/$OCF_RESKEY_lv_name"
- return $OCF_ERR_GENERIC
+ if lvchange -aey $OCF_RESKEY_vg_name/$OCF_RESKEY_lv_name; then
+ # Second attempt after deactivation was successful, we now
+ # have the lock exclusively
+ return $OCF_SUCCESS
fi
+ fi
- ocf_log notice "Second attempt to activate $OCF_RESKEY_vg_name/$OCF_RESKEY_lv_name successful"
- return $OCF_SUCCESS
+ # Failed to activate:
+ # This could be due to a device failure (or another machine could
+ # have snuck in between the deactivation/activation). We don't yet
+ # have a mechanism to check for remote activation, so we will proceed
+ # with repair action.
+ ocf_log err "Failed to activate logical volume, $OCF_RESKEY_vg_name/$OCF_RESKEY_lv_name"
+ ocf_log notice "Attempting cleanup of $OCF_RESKEY_vg_name/$OCF_RESKEY_lv_name"
+
+ if ! lvconvert --repair --use-policies $OCF_RESKEY_vg_name/$OCF_RESKEY_lv_name; then
+ ocf_log err "Failed to cleanup $OCF_RESKEY_vg_name/$OCF_RESKEY_lv_name"
+ return $OCF_ERR_GENERIC
+ fi
+
+ if ! lvchange -aey $OCF_RESKEY_vg_name/$OCF_RESKEY_lv_name; then
+ ocf_log err "Failed second attempt to activate $OCF_RESKEY_vg_name/$OCF_RESKEY_lv_name"
+ return $OCF_ERR_GENERIC
fi
+
+ ocf_log notice "Second attempt to activate $OCF_RESKEY_vg_name/$OCF_RESKEY_lv_name successful"
return $OCF_SUCCESS
}
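The `....ao` pattern in the hunk above keys off the `lvs` attr field, where the fifth character is the activation flag ('a') and the sixth the open flag ('o').  The same check can be exercised without LVM; the attr strings below are illustrative, not from a live system.

```shell
# attr field as printed by `lvs -o attr --noheadings`; char 5 is the
# activation flag ('a'), char 6 the open flag ('o').
is_open() {
	[[ "$1" =~ ....ao ]]   # same unanchored pattern the agent uses
}

is_open "-wi-ao----" && echo "open: deactivation would fail"
is_open "-wi-a-----" || echo "closed: safe to try lvchange -an"
```

The pattern is left unanchored on purpose: `lvs --noheadings` output carries leading whitespace.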
29 rgmanager/src/resources/lvm_by_vg.sh
@@ -194,10 +194,37 @@ function vg_start_clustered
local results
local all_pvs
local resilience
+ local try_again=false
ocf_log info "Starting volume group, $OCF_RESKEY_vg_name"
if ! vgchange -aey $OCF_RESKEY_vg_name; then
+ try_again=true
+
+ # Failure to activate:
+ # This could be caused by a remotely active LV. Before
+ # attempting any repair of the VG, we will first attempt
+ # to deactivate the VG cluster-wide.
+ # We must check for open LVs though, since these cannot
+ # be deactivated. We have no choice but to go one-by-one.
+
+ # Allow for some settling
+ sleep 5
+
+ results=(`lvs -o name,attr --noheadings $OCF_RESKEY_vg_name 2> /dev/null`)
+ a=0
+ while [ ! -z "${results[$a]}" ]; do
+ if [[ ! ${results[$(($a + 1))]} =~ ....ao ]]; then
+ if ! lvchange -an $OCF_RESKEY_vg_name/${results[$a]}; then
+ ocf_log err "Unable to perform required deactivation of $OCF_RESKEY_vg_name before starting"
+ return $OCF_ERR_GENERIC
+ fi
+ fi
+ a=$(($a + 2))
+ done
+ fi
+
+ if $try_again && ! vgchange -aey $OCF_RESKEY_vg_name; then
ocf_log err "Failed to activate volume group, $OCF_RESKEY_vg_name"
ocf_log notice "Attempting cleanup of $OCF_RESKEY_vg_name"
@@ -218,7 +245,7 @@ function vg_start_clustered
# Make sure all the logical volumes are active
results=(`lvs -o name,attr --noheadings 2> /dev/null $OCF_RESKEY_vg_name`)
a=0
- while [ ! -z ${results[$a]} ]; do
+ while [ ! -z "${results[$a]}" ]; do
if [[ ! ${results[$(($a + 1))]} =~ ....a. ]]; then
all_pvs=(`pvs --noheadings -o name 2> /dev/null`)
resilience=" --config devices{filter=["
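The one-by-one walk in `vg_start_clustered` relies on `lvs -o name,attr` emitting name/attr word pairs, which the loop consumes two array slots at a time.  A runnable sketch, with an illustrative `results` array standing in for real `lvs` output:

```shell
# Walking name/attr pairs as vg_start_clustered does; the `results`
# array stands in for `lvs -o name,attr --noheadings $vg` output.
results=(lv_data -wi-a----- lv_log -wi-ao----)
a=0
while [ -n "${results[$a]}" ]; do
	# deactivate only LVs that are not open (no 'o' in attr char 6)
	if [[ ! ${results[$(($a + 1))]} =~ ....ao ]]; then
		echo "would deactivate ${results[$a]}"
	fi
	a=$(($a + 2))
done
```

Here only `lv_data` would be deactivated; `lv_log` is open, so `lvchange -an` on it would fail and is skipped.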
