Skip to content

Commit 2e5e205

Browse files
damien-lemoalgregkh
authored andcommitted
block: fix zone write plug removal
[ Upstream commit b7d4ffb ] Commit 7b29518 ("block: Do not remove zone write plugs still in use") modified disk_should_remove_zone_wplug() to add a check on the reference count of a zone write plug to prevent removing zone write plugs from a disk hash table when the plugs are still being referenced by BIOs or requests in-flight. However, this check does not take into account that a BIO completion may happen right after its submission by a zone write plug BIO work, and before the zone write plug BIO work releases the zone write plug reference count. This situation leads to disk_should_remove_zone_wplug() returning false as in this case the zone write plug reference count is at least equal to 3. If the BIO that completes in such manner transitioned the zone to the FULL condition, the zone write plug for the FULL zone will remain in the disk hash table. Furthermore, relying on a particular value of a zone write plug reference count to set the BLK_ZONE_WPLUG_UNHASHED flag is fragile as reading the atomic reference count and doing a comparison with some value is not overall atomic at all. Address these issues by reworking the reference counting of zone write plugs so that removing plugs from a disk hash table can be done directly from disk_put_zone_wplug() when the last reference on a plug is dropped. To do so, replace the function disk_remove_zone_wplug() with disk_mark_zone_wplug_dead(). This new function sets the zone write plug flag BLK_ZONE_WPLUG_DEAD (which replaces BLK_ZONE_WPLUG_UNHASHED) and drops the initial reference on the zone write plug taken when the plug was added to the disk hash table. This function is called either for zones that are empty or full, or directly in the case of a forced plug removal (e.g. when the disk hash table is being destroyed on disk removal). With this change, disk_should_remove_zone_wplug() is also removed. disk_put_zone_wplug() is modified to call the function disk_free_zone_wplug() to remove a zone write plug from a disk hash table and free the plug structure (with a call_rcu()), when the last reference on a zone write plug is dropped. disk_free_zone_wplug() always checks that the BLK_ZONE_WPLUG_DEAD flag is set. In order to avoid having multiple zone write plugs for the same zone in the disk hash table, disk_get_and_lock_zone_wplug() checked for the BLK_ZONE_WPLUG_UNHASHED flag. This check is removed and a check for the new BLK_ZONE_WPLUG_DEAD flag is added to blk_zone_wplug_handle_write(). With this change, we continue preventing adding multiple zone write plugs for the same zone and at the same time re-inforce checks on the user behavior by failing new incoming write BIOs targeting a zone that is marked as dead. This case can happen only if the user erroneously issues write BIOs to zones that are full, or to zones that are currently being reset or finished. Fixes: 7b29518 ("block: Do not remove zone write plugs still in use") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> [ dropped blk_zone_set_cond() and disk_zone_wplug_update_cond() calls due to missing zones_cond tracking prereq ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
1 parent d765bba commit 2e5e205

1 file changed

Lines changed: 56 additions & 89 deletions

File tree

block/blk-zoned.c

Lines changed: 56 additions & 89 deletions
Original file line numberDiff line numberDiff line change
@@ -85,17 +85,17 @@ static inline unsigned int disk_zone_wplugs_hash_size(struct gendisk *disk)
8585
* being executed or the zone write plug bio list is not empty.
8686
* - BLK_ZONE_WPLUG_NEED_WP_UPDATE: Indicates that we lost track of a zone
8787
* write pointer offset and need to update it.
88-
* - BLK_ZONE_WPLUG_UNHASHED: Indicates that the zone write plug was removed
89-
* from the disk hash table and that the initial reference to the zone
90-
* write plug set when the plug was first added to the hash table has been
91-
* dropped. This flag is set when a zone is reset, finished or become full,
92-
* to prevent new references to the zone write plug to be taken for
93-
* newly incoming BIOs. A zone write plug flagged with this flag will be
94-
* freed once all remaining references from BIOs or functions are dropped.
88+
* - BLK_ZONE_WPLUG_DEAD: Indicates that the zone write plug will be
89+
* removed from the disk hash table of zone write plugs when the last
90+
* reference on the zone write plug is dropped. If set, this flag also
91+
* indicates that the initial extra reference on the zone write plug was
92+
* dropped, meaning that the reference count indicates the current number of
93+
* active users (code context or BIOs and requests in flight). This flag is
94+
* set when a zone is reset, finished or becomes full.
9595
*/
9696
#define BLK_ZONE_WPLUG_PLUGGED (1U << 0)
9797
#define BLK_ZONE_WPLUG_NEED_WP_UPDATE (1U << 1)
98-
#define BLK_ZONE_WPLUG_UNHASHED (1U << 2)
98+
#define BLK_ZONE_WPLUG_DEAD (1U << 2)
9999

100100
/**
101101
* blk_zone_cond_str - Return string XXX in BLK_ZONE_COND_XXX.
@@ -479,65 +479,42 @@ static void disk_free_zone_wplug_rcu(struct rcu_head *rcu_head)
479479
mempool_free(zwplug, zwplug->disk->zone_wplugs_pool);
480480
}
481481

482-
static inline void disk_put_zone_wplug(struct blk_zone_wplug *zwplug)
482+
static void disk_free_zone_wplug(struct blk_zone_wplug *zwplug)
483483
{
484-
if (refcount_dec_and_test(&zwplug->ref)) {
485-
WARN_ON_ONCE(!bio_list_empty(&zwplug->bio_list));
486-
WARN_ON_ONCE(zwplug->flags & BLK_ZONE_WPLUG_PLUGGED);
487-
WARN_ON_ONCE(!(zwplug->flags & BLK_ZONE_WPLUG_UNHASHED));
488-
489-
call_rcu(&zwplug->rcu_head, disk_free_zone_wplug_rcu);
490-
}
491-
}
492-
493-
static inline bool disk_should_remove_zone_wplug(struct gendisk *disk,
494-
struct blk_zone_wplug *zwplug)
495-
{
496-
/* If the zone write plug was already removed, we are done. */
497-
if (zwplug->flags & BLK_ZONE_WPLUG_UNHASHED)
498-
return false;
484+
struct gendisk *disk = zwplug->disk;
485+
unsigned long flags;
499486

500-
/* If the zone write plug is still plugged, it cannot be removed. */
501-
if (zwplug->flags & BLK_ZONE_WPLUG_PLUGGED)
502-
return false;
487+
WARN_ON_ONCE(!(zwplug->flags & BLK_ZONE_WPLUG_DEAD));
488+
WARN_ON_ONCE(zwplug->flags & BLK_ZONE_WPLUG_PLUGGED);
489+
WARN_ON_ONCE(!bio_list_empty(&zwplug->bio_list));
503490

504-
/*
505-
* Completions of BIOs with blk_zone_write_plug_bio_endio() may
506-
* happen after handling a request completion with
507-
* blk_zone_write_plug_finish_request() (e.g. with split BIOs
508-
* that are chained). In such case, disk_zone_wplug_unplug_bio()
509-
* should not attempt to remove the zone write plug until all BIO
510-
* completions are seen. Check by looking at the zone write plug
511-
* reference count, which is 2 when the plug is unused (one reference
512-
* taken when the plug was allocated and another reference taken by the
513-
* caller context).
514-
*/
515-
if (refcount_read(&zwplug->ref) > 2)
516-
return false;
491+
spin_lock_irqsave(&disk->zone_wplugs_lock, flags);
492+
hlist_del_init_rcu(&zwplug->node);
493+
atomic_dec(&disk->nr_zone_wplugs);
494+
spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags);
517495

518-
/* We can remove zone write plugs for zones that are empty or full. */
519-
return !zwplug->wp_offset || disk_zone_wplug_is_full(disk, zwplug);
496+
call_rcu(&zwplug->rcu_head, disk_free_zone_wplug_rcu);
520497
}
521498

522-
static void disk_remove_zone_wplug(struct gendisk *disk,
523-
struct blk_zone_wplug *zwplug)
499+
static inline void disk_put_zone_wplug(struct blk_zone_wplug *zwplug)
524500
{
525-
unsigned long flags;
501+
if (refcount_dec_and_test(&zwplug->ref))
502+
disk_free_zone_wplug(zwplug);
503+
}
526504

527-
/* If the zone write plug was already removed, we have nothing to do. */
528-
if (zwplug->flags & BLK_ZONE_WPLUG_UNHASHED)
529-
return;
505+
/*
506+
* Flag the zone write plug as dead and drop the initial reference we got when
507+
* the zone write plug was added to the hash table. The zone write plug will be
508+
* unhashed when its last reference is dropped.
509+
*/
510+
static void disk_mark_zone_wplug_dead(struct blk_zone_wplug *zwplug)
511+
{
512+
lockdep_assert_held(&zwplug->lock);
530513

531-
/*
532-
* Mark the zone write plug as unhashed and drop the extra reference we
533-
* took when the plug was inserted in the hash table.
534-
*/
535-
zwplug->flags |= BLK_ZONE_WPLUG_UNHASHED;
536-
spin_lock_irqsave(&disk->zone_wplugs_lock, flags);
537-
hlist_del_init_rcu(&zwplug->node);
538-
atomic_dec(&disk->nr_zone_wplugs);
539-
spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags);
540-
disk_put_zone_wplug(zwplug);
514+
if (!(zwplug->flags & BLK_ZONE_WPLUG_DEAD)) {
515+
zwplug->flags |= BLK_ZONE_WPLUG_DEAD;
516+
disk_put_zone_wplug(zwplug);
517+
}
541518
}
542519

543520
static void blk_zone_wplug_bio_work(struct work_struct *work);
@@ -557,18 +534,7 @@ static struct blk_zone_wplug *disk_get_and_lock_zone_wplug(struct gendisk *disk,
557534
again:
558535
zwplug = disk_get_zone_wplug(disk, sector);
559536
if (zwplug) {
560-
/*
561-
* Check that a BIO completion or a zone reset or finish
562-
* operation has not already removed the zone write plug from
563-
* the hash table and dropped its reference count. In such case,
564-
* we need to get a new plug so start over from the beginning.
565-
*/
566537
spin_lock_irqsave(&zwplug->lock, *flags);
567-
if (zwplug->flags & BLK_ZONE_WPLUG_UNHASHED) {
568-
spin_unlock_irqrestore(&zwplug->lock, *flags);
569-
disk_put_zone_wplug(zwplug);
570-
goto again;
571-
}
572538
return zwplug;
573539
}
574540

@@ -654,14 +620,8 @@ static void disk_zone_wplug_set_wp_offset(struct gendisk *disk,
654620
zwplug->flags &= ~BLK_ZONE_WPLUG_NEED_WP_UPDATE;
655621
zwplug->wp_offset = wp_offset;
656622
disk_zone_wplug_abort(zwplug);
657-
658-
/*
659-
* The zone write plug now has no BIO plugged: remove it from the
660-
* hash table so that it cannot be seen. The plug will be freed
661-
* when the last reference is dropped.
662-
*/
663-
if (disk_should_remove_zone_wplug(disk, zwplug))
664-
disk_remove_zone_wplug(disk, zwplug);
623+
if (!zwplug->wp_offset || disk_zone_wplug_is_full(disk, zwplug))
624+
disk_mark_zone_wplug_dead(zwplug);
665625
}
666626

667627
static unsigned int blk_zone_wp_offset(struct blk_zone *zone)
@@ -1076,6 +1036,19 @@ static bool blk_zone_wplug_handle_write(struct bio *bio, unsigned int nr_segs)
10761036
return true;
10771037
}
10781038

1039+
/*
1040+
* If we got a zone write plug marked as dead, then the user is issuing
1041+
* writes to a full zone, or without synchronizing with zone reset or
1042+
* zone finish operations. In such case, fail the BIO to signal this
1043+
* invalid usage.
1044+
*/
1045+
if (zwplug->flags & BLK_ZONE_WPLUG_DEAD) {
1046+
spin_unlock_irqrestore(&zwplug->lock, flags);
1047+
disk_put_zone_wplug(zwplug);
1048+
bio_io_error(bio);
1049+
return true;
1050+
}
1051+
10791052
/* Indicate that this BIO is being handled using zone write plugging. */
10801053
bio_set_flag(bio, BIO_ZONE_WRITE_PLUGGING);
10811054

@@ -1144,7 +1117,7 @@ static void blk_zone_wplug_handle_native_zone_append(struct bio *bio)
11441117
disk->disk_name, zwplug->zone_no);
11451118
disk_zone_wplug_abort(zwplug);
11461119
}
1147-
disk_remove_zone_wplug(disk, zwplug);
1120+
disk_mark_zone_wplug_dead(zwplug);
11481121
spin_unlock_irqrestore(&zwplug->lock, flags);
11491122

11501123
disk_put_zone_wplug(zwplug);
@@ -1249,14 +1222,8 @@ static void disk_zone_wplug_unplug_bio(struct gendisk *disk,
12491222
}
12501223

12511224
zwplug->flags &= ~BLK_ZONE_WPLUG_PLUGGED;
1252-
1253-
/*
1254-
* If the zone is full (it was fully written or finished, or empty
1255-
* (it was reset), remove its zone write plug from the hash table.
1256-
*/
1257-
if (disk_should_remove_zone_wplug(disk, zwplug))
1258-
disk_remove_zone_wplug(disk, zwplug);
1259-
1225+
if (!zwplug->wp_offset || disk_zone_wplug_is_full(disk, zwplug))
1226+
disk_mark_zone_wplug_dead(zwplug);
12601227
spin_unlock_irqrestore(&zwplug->lock, flags);
12611228
}
12621229

@@ -1450,9 +1417,9 @@ static void disk_destroy_zone_wplugs_hash_table(struct gendisk *disk)
14501417
while (!hlist_empty(&disk->zone_wplugs_hash[i])) {
14511418
zwplug = hlist_entry(disk->zone_wplugs_hash[i].first,
14521419
struct blk_zone_wplug, node);
1453-
refcount_inc(&zwplug->ref);
1454-
disk_remove_zone_wplug(disk, zwplug);
1455-
disk_put_zone_wplug(zwplug);
1420+
spin_lock_irq(&zwplug->lock);
1421+
disk_mark_zone_wplug_dead(zwplug);
1422+
spin_unlock_irq(&zwplug->lock);
14561423
}
14571424
}
14581425

0 commit comments

Comments
 (0)