
"zpool online -e" doesn't work without running export + partprobe +import in the VM #7582

Closed
gordan-bobic opened this issue May 31, 2018 · 3 comments


@gordan-bobic
Contributor

gordan-bobic commented May 31, 2018

Latest CentOS (7.5) kernel and ZoL 0.7.9, KVM hypervisor

To reproduce:
Create a VM with a 50 GB second disk (disk2, seen as /dev/vdb in the guest), start the VM, and create a pool on it.

On hypervisor:

# lvresize -L 100G /dev/vms/testvm-disk2 
  Size of logical volume vms/testvm-disk2 changed from 50.00 GiB (12800 extents) to 100.00 GiB (25600 extents).
  Logical volume vms/testvm-disk2 successfully resized.
# virsh blockresize --path /dev/vms/testvm-disk2 --size 100G testvm
Block device '/dev/vms/testvm-disk2' is resized

On guest:

# dmesg | tail -2
[3368803.667305] virtio_blk virtio3: new size: 209715200 512-byte logical blocks (107 GB/100 GiB)
[3368803.667316] vdb: detected capacity change from 53687091200 to 107374182400
# zpool set autoexpand=on data
# zpool online -e data /dev/vdb
# zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
data  49.9G  47.4G  2.4G         -      -    95%  1.00x  ONLINE  -

Exporting the pool, running partprobe, re-importing the pool, and then running zpool online -e data /dev/vdb does work, but having to introduce downtime to extend the disk is far from ideal.
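
For reference, the sequence that does work today, spelled out as guest-side commands (pool and device names from the reproduction above; note the pool is unavailable between the export and the import):

# zpool export data
# partprobe /dev/vdb
# zpool import data
# zpool online -e data /dev/vdb
# zpool list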

@DeHackEd
Contributor

Sounds like a Linux kernel behaviour where a live partition can't be resized by re-reading the partition table while said partition is in use. Hence the export.
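
That kernel behaviour can be observed directly with blockdev from util-linux (an illustration; the exact error text varies by version). While the ZFS partition is held open by the pool, asking the kernel to re-read the partition table fails with something like:

# blockdev --rereadpt /dev/vdb
blockdev: ioctl error on BLKRRPART: Device or resource busy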

@gordan-bobic
Contributor Author

Perhaps, but it works for LVM volumes without unmounting and partprobing, so it should also be achievable with ZFS.
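
For comparison, the LVM equivalent of this operation is fully online (illustrative commands with hypothetical VG/LV names, assuming the grown guest disk is a PV):

# pvresize /dev/vdb
# lvextend -r -L +50G guestvg/data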

@shartse
Contributor

shartse commented May 31, 2018

autoexpand currently doesn't work at all. However, you should be able to manually expand the vdev without exporting the pool.

I'm currently working on a fix that has zpool reopen detect the new expandsize. Then zfs will know about the new space and you can run zpool online -e to use it: #7546.
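
Once that fix lands, the online flow should reduce to something like this (a sketch based on the description above, not yet merged behaviour):

# zpool reopen data
# zpool online -e data /dev/vdb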

As for the way things work now, here's a hacky workaround: run partprobe on your device (it should print the warning about new space). Then zpool online -e <pool> <dev> followed by partprobe again allows zfs to detect the new expandsz available for the vdev (visible through zpool list -v). To add the space to the vdev, run zpool online -e once more.
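
Put together for this reproduction, the interim workaround would be (device and pool names from the original report):

# partprobe /dev/vdb
# zpool online -e data /dev/vdb
# partprobe /dev/vdb
# zpool list -v data
# zpool online -e data /dev/vdb

After the second partprobe, zpool list -v should show a non-zero EXPANDSZ for vdb; the final online -e grows the vdev into it.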

behlendorf pushed a commit that referenced this issue May 31, 2018
Update bdev_capacity to have wholedisk vdevs query the
size of the underlying block device (correcting for the size
of the EFI partition and partition alignment) and therefore detect
expanded space.

Correct vdev_get_stats_ex so that the expandsize is aligned
to metaslab size and new space is only reported if it is large
enough for a new metaslab.

Reviewed by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Wren Kennedy <jwk404@gmail.com>
Signed-off-by: sara hartse <sara.hartse@delphix.com>
External-issue: LX-165
Closes #7546 
Issue #7582
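
The metaslab-alignment rule in the second paragraph can be illustrated with a little shell arithmetic (hypothetical numbers; the real computation lives in vdev_get_stats_ex):

raw_expand=$((50 * 1024 * 1024 * 1024))   # newly visible bytes
metaslab=$((512 * 1024 * 1024))           # metaslab size of this vdev
aligned=$(( raw_expand / metaslab * metaslab ))   # align down
# only report new space if at least one whole metaslab fits
[ "$aligned" -ge "$metaslab" ] && echo "expandsize: $aligned bytes"
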
behlendorf added a commit to behlendorf/zfs that referenced this issue Jun 13, 2018
While the autoexpand property may seem like a small feature it
depends on a significant amount of system infrastructure.  Enough
of that infrastructure is now in place that, with a few
customizations for Linux, the autoexpand property for whole-disk
configurations can be supported.

Autoexpand works as follows: when a block device is resized a
change event is generated by udev with the DISK_MEDIA_CHANGE key.
The ZED, which is monitoring udev events, detects the event for
disks (but not partitions) and hands it off to zfs_deliver_dle().
The zfs_deliver_dle() function appends the expected whole-disk
partition suffix, and if the partition can be matched against
a known pool vdev it re-opens it.

Re-opening the vdev will trigger a re-read of the partition
table so the maximum possible expansion size can be reported.
Next, if the autoexpand property is set to "on", a vdev expansion
will be attempted.  After performing some sanity checks on the
disk to verify it's safe to expand, the ZFS partition (-part1)
will be expanded and the partition table updated.  The partition
is then re-opened again to detect the updated size, which allows
the new capacity to be used.

Added PHYS_PATH="/dev/zvol/dataset" to vdev configuration for
ZFS volumes.  This was required for the test cases which test
expansion by layering a new pool on top of ZFS volumes.

Enable the zpool_expand_001_pos and zpool_expand_003_pos
test cases which exercise the autoexpand property.

Fixed zfs_zevent_wait() signal handling which could result
in the ZED spinning when a signal was not handled.

Removed vdev_disk_rrpart() functionality which can be abandoned
in favour of re-opening the device, which triggers a re-read of
the partition table as long as no other partitions are in use.
This will be true as long as we're working with whole disks.
As a bonus this allows us to remove two Linux kernel API checks.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#120
Issue openzfs#2437
Issue openzfs#5771
Issue openzfs#7582
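
The udev plumbing described in this commit can be poked by hand: writing "change" to a disk's uevent node synthesizes a change event resembling the one the ZED listens for (an approximation only; whether the DISK_MEDIA_CHANGE key is present depends on the kernel and udev rules):

# echo change > /sys/block/vdb/uevent
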
behlendorf added a commit to behlendorf/zfs that referenced this issue Jun 28, 2018
While the autoexpand property may seem like a small feature it
depends on a significant amount of system infrastructure.  Enough
of that infrastructure is now in place that, with a few
modifications for Linux, it can be supported.

Auto-expand works as follows: when a block device is modified
(re-sized, closed after being open r/w, etc.) a change uevent is
generated for udev.  The ZED, which is monitoring udev events,
passes the change event along to zfs_deliver_dle() if the disk
or partition contains a zfs_member as identified by blkid.

From here the device is matched against all imported pool vdevs
using the vdev_guid which was read from the label by blkid.  If
a match is found the ZED reopens the pool vdev.  This re-opening
is important because it allows the vdev to be briefly closed so
the disk partition table can be re-read.  Otherwise, it wouldn't
be possible to report the maximum possible expansion size.

Finally, if the property autoexpand=on, a vdev expansion will be
attempted.  After performing some sanity checks on the disk to
verify that it is safe to expand, the primary partition (-part1)
will be expanded and the partition table updated.  The partition
is then re-opened (again) to detect the updated size, which
allows the new capacity to be used.

In order to make all of the above possible the following changes
were required:

* Updated the zpool_expand_001_pos and zpool_expand_003_pos tests.
  These tests now create a pool which is layered on a loopback,
  scsi_debug, and file vdev.  This allows for testing of a non-
  partitioned block device (loopback), a partitioned block device
  (scsi_debug), and a file which does not receive udev change
  events.  This provides better test coverage, and by removing
  the layering on ZFS volumes the issues surrounding layering
  one pool on another are avoided.

* zpool_find_vdev_by_physpath() updated to accept a vdev guid.
  This allows for matching by guid rather than path which is a
  more reliable way for the ZED to reference a vdev.

* Fixed zfs_zevent_wait() signal handling which could result
  in the ZED spinning when a signal was not handled.

* Removed vdev_disk_rrpart() functionality which can be abandoned
  in favor of kernel provided blkdev_reread_part() function.

* Added a rwlock which is held as a writer while a disk is being
  reopened.  This is important to prevent errors from occurring
  for any configuration related IOs which bypass the SCL_ZIO lock.
  The zpool_reopen_007_pos.ksh test case was added to verify that
  IO errors are never observed when reopening.  This is not expected
  to impact IO performance.

Additional fixes which aren't critical but were discovered and
resolved in the course of developing this functionality:

* Added PHYS_PATH="/dev/zvol/dataset" to the vdev configuration for
  ZFS volumes.  This is as good as a unique physical path; while the
  volumes are no longer used in the test cases for other reasons,
  this improvement was still included.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#120
Issue openzfs#2437
Issue openzfs#5771
Issue openzfs#7366
Issue openzfs#7582
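
The blkid matching described above can be sanity-checked from the shell (a sketch; the /dev/vdb1 name assumes the whole-disk layout from earlier in this thread, and UUID_SUB is, as far as I can tell, where blkid reports the per-vdev guid):

# blkid -o value -s TYPE /dev/vdb1
zfs_member
# blkid -o value -s UUID_SUB /dev/vdb1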