
feat: Added support for creating shared LVM setups #388

Merged · 2 commits · Dec 12, 2023

Conversation

@japokorn (Collaborator) commented Oct 2, 2023

Enhancement:
Support for creating shared VGs

Reason:
Requested by GFS2

Result:

tests/test-verify-pool.yml — resolved review thread (outdated)
@codecov (bot) commented Oct 3, 2023

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (c4147d2) 13.67% compared to head (7d8b953) 13.65%.
Report is 15 commits behind head on main.

Files               Patch %   Lines
library/blivet.py   0.00%     6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #388      +/-   ##
==========================================
- Coverage   13.67%   13.65%   -0.03%     
==========================================
  Files           8        8              
  Lines        1733     1736       +3     
  Branches       79       79              
==========================================
  Hits          237      237              
- Misses       1496     1499       +3     
Flag     Coverage Δ
sanity   16.54% <ø> (ø)

Flags with carried forward coverage won't be shown.


@richm (Contributor) commented Oct 11, 2023

ping - any updates?

README.md — resolved review thread
@japokorn (Collaborator, Author) commented:

A new blivet version was released today. I have incorporated all suggestions into the code and uncommented the test.

@japokorn force-pushed the main-shared_vg_support branch 3 times, most recently from 9e201d3 to 8dc721a on October 12, 2023 at 15:25
@japokorn marked this pull request as ready for review on October 12, 2023 at 15:55
@richm (Contributor) commented Nov 9, 2023

With the suggested fixes, I can run the test up until here on centos-9:

TASK [linux-system-roles.storage : Manage the pools and volumes to match the specified state] ***
task path: /home/rmeggins/linux-system-roles/storage/tests/roles/linux-system-roles.storage/tasks/main-blivet.yml:73
Thursday 09 November 2023  12:58:52 -0700 (0:00:00.017)       0:03:16.483 ***** 
fatal: [/home/rmeggins/.cache/linux-system-roles/centos-9.qcow2]: FAILED! => {
    "actions": [],
    "changed": false,
    "crypts": [],
    "leaves": [],
    "mounts": [],
    "packages": [],
    "pools": [],
    "volumes": []
}
MSG:

failed to set up pool 'vg1': __init__() got an unexpected keyword argument 'shared'
    def _create(self):
        if not self._device:
            members = self._manage_encryption(self._create_members())
            try:
                pool_device = self._blivet.new_vg(name=self._pool['name'], parents=members, shared=self._pool['shared'])
            except Exception as e:
                raise BlivetAnsibleError("failed to set up pool '%s': %s" % (self._pool['name'], str(e)))

what version of blivet has the support for shared? Is it in centos9 yet?

@japokorn force-pushed the main-shared_vg_support branch 3 times, most recently from d6181b4 to 7003e18 on November 15, 2023 at 12:57
@japokorn (Collaborator, Author) replied:

what version of blivet has the support for shared? Is it in centos9 yet?

I have added a switch that skips the test if needed, based on the blivet version, as suggested by vtrefny in #388 (comment).

meta: end_host
when: inventory_hostname == "localhost"

- name: Gather package facts

Review comment (Contributor):

Suggested change:

- name: Run the role to install blivet
  include_role:
    name: linux-system-roles.storage
  vars:
    storage_pools: []
    storage_volumes: []

- name: Gather package facts

@richm (Contributor) commented Nov 15, 2023

Otherwise, the task "Set blivet package name" fails, as blivet is not installed.
Are you able to run this test locally on your laptop?

@japokorn (Collaborator, Author) replied:

I moved the initial storage role (which installs blivet) run before the check.
While running the test I also noticed that in HA cluster role test_setup.yml there is this task:

  - name: Set node name to 'localhost' for single-node clusters
    set_fact:
      inventory_hostname: localhost  # noqa: var-naming
    when: ansible_play_hosts_all | length == 1

I am not sure what its purpose is in the role, but it messed up the test when I tried to run it on a single remote node. I replaced it with a task that changes the inventory name from 'localhost' to '127.0.0.1', and that seems to do the trick.

- feature requested by GFS2
- adds support for creating shared VGs
- shared LVM setup needs lvmlockd service with dlm lock manager to be running
- to test this change ha_cluster system role is used to set up degenerated cluster on localhost
- the test will be skipped if run locally due to an issue with underlying services
- requires blivet version with shared LVM setup support (storaged-project/blivet#1123)
@richm (Contributor) commented Nov 29, 2023

ok - but - is there some platform that has the correct version of blivet? Alternately - if you have some copr blivet build that you are using, can you attach the log output from running the test with the right version of blivet?

@japokorn (Collaborator, Author) replied:

ok - but - is there some platform that has the correct version of blivet? Alternately - if you have some copr blivet build that you are using, can you attach the log output from running the test with the right version of blivet?

I am running the test (not skipped) on Fedora 38 with the latest blivet package (python3-blivet-3.8.2-99.20231127115915812391.3.9.devel.64.gfc7f3fc5.fc38.noarch)

@richm (Contributor) commented Nov 29, 2023

[citest]

@@ -1527,7 +1527,7 @@ def _create(self):
         if not self._device:
             members = self._manage_encryption(self._create_members())
             try:
-                pool_device = self._blivet.new_vg(name=self._pool['name'], parents=members)
+                pool_device = self._blivet.new_vg(name=self._pool['name'], parents=members, shared=self._pool['shared'])

Review comment (Contributor):

Need some sort of logic here to avoid using the shared parameter if not supported

                if self._blivet.new_vg supports 'shared' parameter:
                    pool_device = self._blivet.new_vg(name=self._pool['name'], parents=members, shared=self._pool['shared'])
                else:
                    pool_device = self._blivet.new_vg(name=self._pool['name'], parents=members)

There's probably some way to use introspection to see if the new_vg method supports shared

or some other way to dynamically construct the new_vg arguments, e.g. build new_vg_args = {} and then pass them like new_vg(**new_vg_args)

This is what is causing some of the test failures
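
For reference, a minimal sketch of that introspection idea (the helper name and placement are hypothetical, not the merged code; it assumes the installed blivet's new_vg exposes an ordinary inspectable Python signature):

    import inspect

    def build_new_vg_kwargs(blivet_obj, pool, members):
        """Build new_vg() arguments, adding 'shared' only when blivet accepts it."""
        kwargs = {"name": pool["name"], "parents": members}
        # Older blivet releases do not know the 'shared' keyword, so check first.
        if "shared" in inspect.signature(blivet_obj.new_vg).parameters:
            kwargs["shared"] = pool.get("shared", False)
        return kwargs

    # inside _create():
    #     pool_device = self._blivet.new_vg(**build_new_vg_kwargs(self._blivet, self._pool, members))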

@japokorn (Collaborator, Author) replied:

I have modified the condition so the shared parameter is not used when its value is the default (false). This should fix the tests.
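
A rough sketch of what such a value-based condition can look like (hypothetical helper, not necessarily the exact merged diff):

    def create_pool_device(blivet_obj, pool, members):
        """Create the VG, passing 'shared' only when explicitly requested."""
        # Older blivet versions reject the 'shared' keyword, so the default
        # (False) falls back to the original call signature.
        if pool.get("shared", False):
            return blivet_obj.new_vg(name=pool["name"], parents=members, shared=True)
        return blivet_obj.new_vg(name=pool["name"], parents=members)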

Older versions of blivet do not support 'shared' parameter. This resulted in
failures in tests unrelated to shared VGs. This change fixes that
behavior as well as fixes minor condition error in a test.

@japokorn (Collaborator, Author) commented:

[citest]

@richm (Contributor) commented Dec 6, 2023

Looks like fedora 39 has the right version of blivet. When I try your latest like this:
tox -e qemu-ansible-core-2.15 -- --image-name fedora-39 --log-level debug -- tests/tests_lvm_pool_shared.yml
I get this error:

TASK [fedora.linux_system_roles.ha_cluster : Create a corosync.conf file content using pcs-0.10] ***
...
fatal: [/home/rmeggins/.cache/linux-system-roles/fedora-39.qcow2]: FAILED! => {
    "changed": true,
    "cmd": [
        "pcs",
        "cluster",
        "setup",
        "--corosync_conf",
        "/tmp/ansible.cjhl1_x4_ha_cluster_corosync_conf",
        "--overwrite",
        "--no-cluster-uuid",
        "--",
        "rhel9-1node",
        "/home/rmeggins/.cache/linux-system-roles/fedora-39.qcow2"
    ],
    "delta": "0:00:01.327931",
    "end": "2023-12-06 18:25:28.852939",
    "rc": 1,
    "start": "2023-12-06 18:25:27.525008"
}

STDERR:

Warning: Unable to read the known-hosts file: No such file or directory: '/var/lib/pcsd/known-hosts'
No addresses specified for host '/home/rmeggins/.cache/linux-system-roles/fedora-39.qcow2', using '/home/rmeggins/.cache/linux-system-roles/fedora-39.qcow2'
Error: Unable to resolve addresses: '/home/rmeggins/.cache/linux-system-roles/fedora-39.qcow2', use --force to override
Error: Errors have occurred, therefore pcs is unable to continue

The problem is that runqemu uses the file name of the qcow2 file as the hostname.

@richm (Contributor) commented Dec 6, 2023

If I add this to the test:

    - name: Set up test environment for the ha_cluster role
      include_role:
        name: fedora.linux_system_roles.ha_cluster
        tasks_from: test_setup.yml

    - name: Create cluster
...

Then I get much farther, until here:

    - name: >-
        Create a disk device; specify disks as non-list mounted on
        {{ mount_location }}

...

TASK [linux-system-roles.storage : Manage the pools and volumes to match the specified state] ***
...
fatal: [/home/rmeggins/.cache/linux-system-roles/fedora-39.qcow2]: FAILED! => {
...
MSG:

Failed to commit changes to disk: Process reported exit code 3:   Using a shared lock type requires lvmlockd (lvm.conf use_lvmlockd.)
  Run `vgcreate --help' for more information.

I guess somewhere in the blivet module or blivet library it manages lvm.conf?

I think we need to change https://github.com/linux-system-roles/ha_cluster/blob/main/tasks/test_setup.yml#L9 to make it more generally applicable.

- name: Set node name to 'localhost' for single-node clusters
  set_fact:
    inventory_hostname: localhost  # noqa: var-naming
  when: ansible_play_hosts_all | length == 1

@tomjelinek @spetrosi I think the intention of this code is - "If inventory_hostname is not resolvable (i.e. is a qcow2 path as used by tox -e qemu, or is some sort of hostaliases like sut as used by baseos ci), then use localhost as it will always be resolvable". The problem is the test "is hostname resolvable" is not easy to do, and even with getent hosts $name, you don't know if the user provided $name as some sort of alias that actually resolved to a real hostname that is incorrect. In Jan's case, he is using an external managed host (not a local qcow2 image file) which has a real, resolvable hostname and IP address that he wants to use. I think we need to introduce a flag like ha_cluster_test_use_given_hostname:

- name: Set node name to 'localhost' for single-node clusters
  set_fact:
    inventory_hostname: localhost  # noqa: var-naming
  when:
    - ansible_play_hosts_all | length == 1
    - not ha_cluster_test_use_given_hostname | d(false)

Then

  • all tox -e qemu tests, baseos ci, and downstream automated tests will work
  • Jan can provide -e ha_cluster_test_use_given_hostname=true or otherwise provide this parameter in his inventory when running his tests e.g.
tox -e qemu-ansible-core-2.15 -- --image-name fedora-39 --log-level debug -e ha_cluster_test_use_given_hostname=true -- tests/tests_lvm_pool_shared.yml

wdyt?

@tomjelinek (Member) commented:

@richm You got the intention absolutely right.

Adding the proposed flag works for me. It would be nice if it could be tested (@japokorn?) before merging it into the ha_cluster role. A comment explaining that the flag is meant for other roles, and thus must be kept in place even though it is not used anywhere in the ha_cluster role itself, would also be helpful. Feel free to open a PR after testing, or let me know and I will do it myself.

@richm (Contributor) commented Dec 7, 2023

@tomjelinek there's also an issue with lvmlockd - man lvmlockd

USAGE
   Initial set up
       Setting up LVM to use lvmlockd and a shared VG for the first time includes some one time set up steps:

   1. choose a lock manager
       dlm
        If dlm (or corosync) are already being used by other cluster software, then select dlm. dlm uses corosync, which requires
        additional configuration beyond the scope of this document. See corosync and dlm documentation for instructions on
        configuration, set up and usage.

how to choose the lock manager? What additional configuration is required by corosync and dlm? Seems like this is something we need to add to the ha_cluster role.

   2. configure hosts to use lvmlockd
       On all hosts running lvmlockd, configure lvm.conf:
       use_lvmlockd = 1

@japokorn where/how is this done? seems like something the storage role/blivet should do?

   3. start lvmlockd
       Start the lvmlockd daemon.
       Use systemctl, a cluster resource agent, or run directly, e.g.
       systemctl start lvmlockd

this seems like something the ha_cluster role should do after it installs lvm2-lockd and dlm.

4. start lock manager
...
       dlm
       Start the dlm and corosync daemons.
       Use systemctl, a cluster resource agent, or run directly, e.g.
       systemctl start corosync dlm

This also seems like something the ha_cluster role should do.

   5. create VG on shared devices
       vgcreate --shared <vgname> <devices>

the storage role does this

   6. start VG on all hosts
       vgchange --lock-start

       Shared VGs must be started before they are used.  Starting the VG performs lock manager initialization that is necessary  to  begin
       using locks (i.e.  creating and joining a lockspace).  Starting the VG may take some time, and until the start completes the VG may
       not be modified or activated.

@japokorn this seems like something the storage role should do?

   7. create and activate LVs
       Standard lvcreate and lvchange commands are used to create and activate LVs in a shared VG.

This also seems like something the storage role should do

   Normal start up and shut down
       After initial set up, start up and shut down include the following steps.  They can be performed directly or may be automated using
       systemd or a cluster resource manager/agents.

       • start lvmlockd
       • start lock manager
       • vgchange --lock-start
       • activate LVs in shared VGs

@tomjelinek this says ". . . may be automated using systemd or a cluster resource manager/agents." - is this something that the ha_cluster role can configure the cluster resource manager/agents to do?

@tomjelinek (Member) commented:

how to choose the lock manager?

Well, the documentation says that dlm should be used if corosync is in use. HA cluster uses corosync.

What additional configuration is required by corosync and dlm? Seems like this is something we need to add to the ha_cluster role.

I'm not aware of any configuration options in corosync related to dlm. And I'm not aware of any required dlm configuration, just run with the defaults.

"... may be automated using systemd or a cluster resource manager/agents." - is this something that the ha_cluster role can configure the cluster resource manager/agents to do?

It means: create cluster resources. So you just need to instruct the ha_cluster role to create the appropriate resources, ocf:pacemaker:controld and ocf:heartbeat:lvmlockd.

@richm (Contributor) commented Dec 11, 2023

@tomjelinek afaict the test is setting the appropriate parameters/resources - https://github.com/linux-system-roles/storage/pull/388/files#diff-2892843b9952fe8a2e8f5867b7f5092369acfd8ae20990b1689a366c01b1584cR68-R82

Then maybe the reason it is working in Jan's testing is because he has a "real" hostname and a real IP address, but in the baseos ci and local qemu testing, the inventory_hostname is fake?

@tomjelinek (Member) replied:

@richm Yes, the variables look good. I have verified that the cluster is able to start dlm and lvmlockd resources with no issues with such settings, if it uses a real node name. If the cluster is set up with the 'localhost' node, dlm times out on start. I'm not sure why that happens. I already tried debugging this back in October but I was unable to get any useful info from dlm debug logs.

@richm merged commit eec6543 into linux-system-roles:main on Dec 12, 2023
17 of 19 checks passed

5 participants