Merged
38 commits
2251588
reiserfs: Replace 1-element array with C99 style flex-array
Aug 21, 2023
a6e414a
perf tools: Update copy of libbpf's hashmap.c
acmel Sep 11, 2023
f787596
tools headers UAPI: Sync files changed by new fchmodat2 and map_shado…
acmel Sep 11, 2023
417ecb6
tools headers UAPI: Copy seccomp.h to be able to build 'perf bench' i…
acmel Sep 13, 2023
678ddf7
perf bench sched-seccomp-notify: Use the tools copy of seccomp.h UAPI
acmel Sep 13, 2023
15ca354
tools arch x86: Sync the msr-index.h copy with the kernel sources
acmel Sep 13, 2023
c2122b6
tools headers UAPI: Update tools's copy of drm.h headers
acmel Sep 13, 2023
4a73fca
perf bpf-prologue: Remove unused file
captain5050 Sep 13, 2023
33b725c
perf trace: Avoid compile error wrt redefining bool
captain5050 Sep 13, 2023
d1bac78
perf jevents metric: Fix type of strcmp_cpuid_str
captain5050 Sep 14, 2023
eaaebb0
perf pmu: Ensure all alias variables are initialized
captain5050 Sep 14, 2023
e47749f
perf jevent: fix core dump on software events on s390
Sep 13, 2023
8ed99af
selftests/user_events: Fix to unmount tracefs when test created mount
beaubelgrave Sep 15, 2023
a682821
workqueue: Removed double allocation of wq_update_pod_attrs_buf
rostedt Sep 5, 2023
dd64c87
workqueue: Fix missed pwq_release_worker creation in wq_cpu_intensive…
Sep 11, 2023
8287474
direct_write_fallback(): on error revert the ->ki_pos update from buf…
Sep 13, 2023
db7fcc8
aio: Annotate struct kioctx_table with __counted_by
kees Sep 15, 2023
be049c3
fs-writeback: do not requeue a clean inode having skipped pages
Sep 16, 2023
ae81711
fs/pipe: remove duplicate "offset" initializer
MaxKellermann Sep 19, 2023
2ba0dd6
porting: document new block device opening order
brauner Sep 15, 2023
060e6c7
porting: document superblock as block device holder
brauner Sep 15, 2023
2ed45c0
btrfs: fix race when refilling delayed refs block reserve
fdmanana Sep 8, 2023
a7ddeeb
btrfs: prevent transaction block reserve underflow when starting tran…
fdmanana Sep 8, 2023
1bf76df
btrfs: return -EUCLEAN for delayed tree ref with a ref count not equa…
fdmanana Sep 8, 2023
d2f79e6
btrfs: remove redundant BUG_ON() from __btrfs_inc_extent_ref()
fdmanana Sep 8, 2023
8ec0a4a
btrfs: log message if extent item not found when running delayed exte…
fdmanana Sep 8, 2023
58bfe2c
btrfs: properly report 0 avail for very full file systems
josefbacik Sep 18, 2023
74ee791
btrfs: reset destination buffer when read_extent_buffer() gets invali…
adam900710 Sep 19, 2023
20218df
btrfs: make sure to initialize start and len in find_free_dev_extent
josefbacik Sep 5, 2023
b4c639f
btrfs: initialize start_slot in btrfs_log_prealloc_extents
josefbacik Sep 5, 2023
2d1b3bb
ovl: disable IOCB_DIO_CALLER_COMP
axboe Sep 25, 2023
493c719
ntfs3: put resources during ntfs_fill_super()
brauner Sep 25, 2023
03dbab3
overlayfs: set ctime when setting mtime and atime
jtlayton Sep 13, 2023
5c519bc
Merge tag 'perf-tools-fixes-for-v6.6-1-2023-09-25' of git://git.kerne…
torvalds Sep 26, 2023
84422ae
Merge tag 'v6.6-rc4.vfs.fixes' of gitolite.kernel.org:pub/scm/linux/k…
torvalds Sep 26, 2023
50768a4
Merge tag 'linux-kselftest-fixes-6.6-rc4' of git://git.kernel.org/pub…
torvalds Sep 26, 2023
cac405a
Merge tag 'for-6.6-rc3-tag' of git://git.kernel.org/pub/scm/linux/ker…
torvalds Sep 26, 2023
0e94513
Merge tag 'wq-for-6.6-rc3-fixes' of git://git.kernel.org/pub/scm/linu…
torvalds Sep 26, 2023
96 changes: 96 additions & 0 deletions Documentation/filesystems/porting.rst
@@ -949,3 +949,99 @@ mmap_lock held. All in-tree users have been audited and do not seem to
depend on the mmap_lock being held, but out of tree users should verify
for themselves. If they do need it, they can return VM_FAULT_RETRY to
be called with the mmap_lock held.

---

**mandatory**

The order of opening block devices and matching or creating superblocks has
changed.

The old logic opened block devices first and then tried to find a
suitable superblock to reuse based on the block device pointer.

The new logic tries to find a suitable superblock first based on the device
number, and opens the block device afterwards.

Since opening block devices cannot happen under s_umount because of lock
ordering requirements, s_umount is now dropped while opening block devices and
reacquired before calling fill_super().

In the old logic concurrent mounters would find the superblock on the list of
superblocks for the filesystem type. Since the first opener of the block device
would hold s_umount they would wait until the superblock was either born or
discarded due to initialization failure.

Since the new logic drops s_umount, concurrent mounters could grab s_umount and
would spin. Instead, they are now made to wait using an explicit wait-wake
mechanism without having to hold s_umount.

---

**mandatory**

The holder of a block device is now the superblock.

The holder of a block device used to be the file_system_type, which wasn't
particularly useful: it wasn't possible to go from a block device to the owning
superblock without matching on the device pointer stored in the superblock.
This mechanism would only work for a single device so the block layer couldn't
find the owning superblock of any additional devices.

In the old mechanism reusing or creating a superblock for a racing mount(2) and
umount(2) relied on the file_system_type as the holder. This was, however,
severely underdocumented:

(1) Any concurrent mounter that managed to grab an active reference on an
existing superblock was made to wait until the superblock either became
ready or until the superblock was removed from the list of superblocks of
the filesystem type. If the superblock was ready the caller would simply
reuse it.

(2) If the mounter came after deactivate_locked_super() but before
the superblock had been removed from the list of superblocks of the
filesystem type the mounter would wait until the superblock was shutdown,
reuse the block device and allocate a new superblock.

(3) If the mounter came after deactivate_locked_super() and after
the superblock had been removed from the list of superblocks of the
filesystem type the mounter would reuse the block device and allocate a new
superblock (the bd_holder pointer may still be set to the filesystem type).

Because the holder of the block device was the file_system_type any concurrent
mounter could open the block devices of any superblock of the same
file_system_type without risking seeing EBUSY because the block device was
still in use by another superblock.

Making the superblock the owner of the block device changes this as the holder
is now a unique superblock and thus block devices associated with it cannot be
reused by concurrent mounters. So a concurrent mounter in (2) could suddenly
see EBUSY when trying to open a block device whose holder was a different
superblock.

The new logic thus waits until the superblock and the devices are shutdown in
->kill_sb(). Removal of the superblock from the list of superblocks of the
filesystem type is now moved to a later point when the devices are closed:

(1) Any concurrent mounter managing to grab an active reference on an existing
superblock is made to wait until the superblock is either ready or until
the superblock and all devices are shutdown in ->kill_sb(). If the
superblock is ready the caller will simply reuse it.

(2) If the mounter comes after deactivate_locked_super() but before
the superblock has been removed from the list of superblocks of the
filesystem type the mounter is made to wait until the superblock and the
devices are shut down in ->kill_sb() and the superblock is removed from the
list of superblocks of the filesystem type. The mounter will allocate a new
superblock and grab ownership of the block device (the bd_holder pointer of
the block device will be set to the newly allocated superblock).

(3) This case is now collapsed into (2) as the superblock is left on the list
of superblocks of the filesystem type until all devices are shutdown in
->kill_sb(). In other words, if the superblock isn't on the list of
superblocks of the filesystem type anymore then it has given up ownership of
all associated block devices (the bd_holder pointer is NULL).

As this is a VFS level change it has no practical consequences for filesystems
other than that all of them must use one of the provided kill_litter_super(),
kill_anon_super(), or kill_block_super() helpers.
2 changes: 1 addition & 1 deletion fs/aio.c
@@ -80,7 +80,7 @@ struct aio_ring {
struct kioctx_table {
struct rcu_head rcu;
unsigned nr;
struct kioctx __rcu *table[];
struct kioctx __rcu *table[] __counted_by(nr);
};

struct kioctx_cpu {
Expand Down
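The `__counted_by(nr)` annotation ties the flexible array's bound to the `nr` member, so the compiler and runtime checkers (FORTIFY_SOURCE, UBSAN bounds sanitizer) can flag out-of-bounds accesses. A minimal userspace sketch of the same pattern, with the attribute reduced to a no-op on compilers that lack it (the `struct table` names here are made up, not the kernel's):

```c
#include <assert.h>
#include <stdlib.h>

/* Reduce counted_by to a no-op where the attribute is unavailable. */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define counted_by(m) __attribute__((counted_by(m)))
# endif
#endif
#ifndef counted_by
# define counted_by(m)
#endif

struct table {
	unsigned nr;                     /* number of valid entries */
	void *entries[] counted_by(nr);  /* flex array bounded by nr */
};

static struct table *table_alloc(unsigned nr)
{
	struct table *t = calloc(1, sizeof(*t) + nr * sizeof(t->entries[0]));

	if (t)
		t->nr = nr;  /* set the counter before touching the array */
	return t;
}
```

With the annotation in effect, an access like `t->entries[t->nr]` becomes detectable at runtime instead of silently reading past the allocation.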
46 changes: 35 additions & 11 deletions fs/btrfs/delayed-ref.c
@@ -103,24 +103,17 @@ void btrfs_update_delayed_refs_rsv(struct btrfs_trans_handle *trans)
* Transfer bytes to our delayed refs rsv.
*
* @fs_info: the filesystem
* @src: source block rsv to transfer from
* @num_bytes: number of bytes to transfer
*
* This transfers up to the num_bytes amount from the src rsv to the
* This transfers up to the num_bytes amount, previously reserved, to the
* delayed_refs_rsv. Any extra bytes are returned to the space info.
*/
void btrfs_migrate_to_delayed_refs_rsv(struct btrfs_fs_info *fs_info,
struct btrfs_block_rsv *src,
u64 num_bytes)
{
struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv;
u64 to_free = 0;

spin_lock(&src->lock);
src->reserved -= num_bytes;
src->size -= num_bytes;
spin_unlock(&src->lock);

spin_lock(&delayed_refs_rsv->lock);
if (delayed_refs_rsv->size > delayed_refs_rsv->reserved) {
u64 delta = delayed_refs_rsv->size -
@@ -163,6 +156,8 @@ int btrfs_delayed_refs_rsv_refill(struct btrfs_fs_info *fs_info,
struct btrfs_block_rsv *block_rsv = &fs_info->delayed_refs_rsv;
u64 limit = btrfs_calc_delayed_ref_bytes(fs_info, 1);
u64 num_bytes = 0;
u64 refilled_bytes;
u64 to_free;
int ret = -ENOSPC;

spin_lock(&block_rsv->lock);
@@ -178,9 +173,38 @@ int btrfs_delayed_refs_rsv_refill(struct btrfs_fs_info *fs_info,
ret = btrfs_reserve_metadata_bytes(fs_info, block_rsv, num_bytes, flush);
if (ret)
return ret;
btrfs_block_rsv_add_bytes(block_rsv, num_bytes, false);
trace_btrfs_space_reservation(fs_info, "delayed_refs_rsv",
0, num_bytes, 1);

/*
* We may have raced with someone else, so check again if the block
* reserve is still not full and release any excess space.
*/
spin_lock(&block_rsv->lock);
if (block_rsv->reserved < block_rsv->size) {
u64 needed = block_rsv->size - block_rsv->reserved;

if (num_bytes >= needed) {
block_rsv->reserved += needed;
block_rsv->full = true;
to_free = num_bytes - needed;
refilled_bytes = needed;
} else {
block_rsv->reserved += num_bytes;
to_free = 0;
refilled_bytes = num_bytes;
}
} else {
to_free = num_bytes;
refilled_bytes = 0;
}
spin_unlock(&block_rsv->lock);

if (to_free > 0)
btrfs_space_info_free_bytes_may_use(fs_info, block_rsv->space_info,
to_free);

if (refilled_bytes > 0)
trace_btrfs_space_reservation(fs_info, "delayed_refs_rsv", 0,
refilled_bytes, 1);
return 0;
}
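The refill path above reserves `num_bytes` outside the lock, then re-checks under the lock how much is still needed, keeping only that and freeing the excess back to the space info. The arithmetic can be sketched in isolation (hypothetical `struct rsv` and `rsv_topup`, plain integers in place of block reserves):

```c
#include <assert.h>

struct rsv {
	unsigned long long size;      /* target reservation */
	unsigned long long reserved;  /* currently reserved */
};

/*
 * Top up the reserve with `got` freshly reserved bytes. Returns how
 * many bytes to give back; *refilled gets what was actually kept.
 * In the kernel this runs under the reserve's spinlock, because
 * another task may have refilled it while we were reserving.
 */
static unsigned long long rsv_topup(struct rsv *r, unsigned long long got,
				    unsigned long long *refilled)
{
	unsigned long long to_free;

	if (r->reserved < r->size) {
		unsigned long long needed = r->size - r->reserved;

		if (got >= needed) {
			r->reserved += needed;   /* now full */
			to_free = got - needed;  /* excess goes back */
			*refilled = needed;
		} else {
			r->reserved += got;      /* partial top-up */
			to_free = 0;
			*refilled = got;
		}
	} else {
		to_free = got;  /* someone else refilled it for us */
		*refilled = 0;
	}
	return to_free;
}
```

For example, reserving 32 bytes when only 10 are still missing keeps 10 and returns 22 to be freed, which is exactly the `to_free`/`refilled_bytes` split in the patch.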

1 change: 0 additions & 1 deletion fs/btrfs/delayed-ref.h
@@ -407,7 +407,6 @@ void btrfs_update_delayed_refs_rsv(struct btrfs_trans_handle *trans);
int btrfs_delayed_refs_rsv_refill(struct btrfs_fs_info *fs_info,
enum btrfs_reserve_flush_enum flush);
void btrfs_migrate_to_delayed_refs_rsv(struct btrfs_fs_info *fs_info,
struct btrfs_block_rsv *src,
u64 num_bytes);
bool btrfs_check_space_for_delayed_refs(struct btrfs_fs_info *fs_info);

18 changes: 10 additions & 8 deletions fs/btrfs/extent-tree.c
@@ -1514,15 +1514,14 @@ static int __btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
btrfs_release_path(path);

/* now insert the actual backref */
if (owner < BTRFS_FIRST_FREE_OBJECTID) {
BUG_ON(refs_to_add != 1);
if (owner < BTRFS_FIRST_FREE_OBJECTID)
ret = insert_tree_block_ref(trans, path, bytenr, parent,
root_objectid);
} else {
else
ret = insert_extent_data_ref(trans, path, bytenr, parent,
root_objectid, owner, offset,
refs_to_add);
}

if (ret)
btrfs_abort_transaction(trans, ret);
out:
@@ -1656,7 +1655,10 @@ static int run_delayed_extent_op(struct btrfs_trans_handle *trans,
goto again;
}
} else {
err = -EIO;
err = -EUCLEAN;
btrfs_err(fs_info,
"missing extent item for extent %llu num_bytes %llu level %d",
head->bytenr, head->num_bytes, extent_op->level);
goto out;
}
}
@@ -1699,12 +1701,12 @@ static int run_delayed_tree_ref(struct btrfs_trans_handle *trans,
parent = ref->parent;
ref_root = ref->root;

if (node->ref_mod != 1) {
if (unlikely(node->ref_mod != 1)) {
btrfs_err(trans->fs_info,
"btree block(%llu) has %d references rather than 1: action %d ref_root %llu parent %llu",
"btree block %llu has %d references rather than 1: action %d ref_root %llu parent %llu",
node->bytenr, node->ref_mod, node->action, ref_root,
parent);
return -EIO;
return -EUCLEAN;
}
if (node->action == BTRFS_ADD_DELAYED_REF && insert_reserved) {
BUG_ON(!extent_op || !extent_op->update_flags);
8 changes: 7 additions & 1 deletion fs/btrfs/extent_io.c
@@ -3995,8 +3995,14 @@ void read_extent_buffer(const struct extent_buffer *eb, void *dstv,
char *dst = (char *)dstv;
unsigned long i = get_eb_page_index(start);

if (check_eb_range(eb, start, len))
if (check_eb_range(eb, start, len)) {
/*
* Invalid range hit, reset the memory, so callers won't get
* some random garbage for their uninitialized memory.
*/
memset(dstv, 0, len);
return;
}

offset = get_eb_offset_in_page(eb, start);

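Zeroing `dstv` on a failed range check means callers that ignore the error read zeroes rather than uninitialized stack memory. A standalone sketch of the same defensive pattern (hypothetical `read_range` helper over a fixed buffer, not the btrfs function):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 64
static const char src[BUF_SIZE] = "payload";

/*
 * Copy len bytes starting at `start` into dst. On an invalid range,
 * zero dst instead of leaving it untouched, so a caller that forgets
 * to check the return value sees zeroes, not garbage.
 */
static int read_range(void *dst, size_t start, size_t len)
{
	if (start > BUF_SIZE || len > BUF_SIZE - start) {
		memset(dst, 0, len);  /* defensive: defined contents */
		return -1;
	}
	memcpy(dst, src + start, len);
	return 0;
}
```

Note the range check is written as `len > BUF_SIZE - start` only after ruling out `start > BUF_SIZE`, avoiding the unsigned-wrap trap that `start + len > BUF_SIZE` would have.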
2 changes: 1 addition & 1 deletion fs/btrfs/super.c
@@ -2117,7 +2117,7 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
* calculated f_bavail.
*/
if (!mixed && block_rsv->space_info->full &&
total_free_meta - thresh < block_rsv->size)
(total_free_meta < thresh || total_free_meta - thresh < block_rsv->size))
buf->f_bavail = 0;

buf->f_type = BTRFS_SUPER_MAGIC;
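The statfs change guards an unsigned subtraction: when `total_free_meta < thresh`, the old `total_free_meta - thresh` wraps to a huge value and the comparison can never be true, so a very full filesystem still reported nonzero `f_bavail`. A minimal standalone demonstration of the hazard and the guarded form (generic names, not btrfs code):

```c
#include <stdbool.h>
#include <stdint.h>

/* Buggy: free - thresh wraps when free < thresh, so the test misfires. */
static bool low_space_buggy(uint64_t free, uint64_t thresh, uint64_t need)
{
	return free - thresh < need;
}

/* Fixed: treat free < thresh as "low" before ever subtracting. */
static bool low_space_fixed(uint64_t free, uint64_t thresh, uint64_t need)
{
	return free < thresh || free - thresh < need;
}
```

With `free = 10`, `thresh = 100`, the buggy version computes `10 - 100` as roughly 2^64 and concludes space is plentiful; the fixed version short-circuits on `free < thresh`. Both agree whenever `free >= thresh`.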
6 changes: 3 additions & 3 deletions fs/btrfs/transaction.c
@@ -631,14 +631,14 @@ start_transaction(struct btrfs_root *root, unsigned int num_items,
reloc_reserved = true;
}

ret = btrfs_block_rsv_add(fs_info, rsv, num_bytes, flush);
ret = btrfs_reserve_metadata_bytes(fs_info, rsv, num_bytes, flush);
if (ret)
goto reserve_fail;
if (delayed_refs_bytes) {
btrfs_migrate_to_delayed_refs_rsv(fs_info, rsv,
delayed_refs_bytes);
btrfs_migrate_to_delayed_refs_rsv(fs_info, delayed_refs_bytes);
num_bytes -= delayed_refs_bytes;
}
btrfs_block_rsv_add_bytes(rsv, num_bytes, true);

if (rsv->space_info->force_alloc)
do_chunk_alloc = true;
2 changes: 1 addition & 1 deletion fs/btrfs/tree-log.c
@@ -4722,7 +4722,7 @@ static int btrfs_log_prealloc_extents(struct btrfs_trans_handle *trans,
struct extent_buffer *leaf;
int slot;
int ins_nr = 0;
int start_slot;
int start_slot = 0;
int ret;

if (!(inode->flags & BTRFS_INODE_PREALLOC))
13 changes: 6 additions & 7 deletions fs/btrfs/volumes.c
@@ -1594,25 +1594,24 @@ static int find_free_dev_extent(struct btrfs_device *device, u64 num_bytes,
u64 search_start;
u64 hole_size;
u64 max_hole_start;
u64 max_hole_size;
u64 max_hole_size = 0;
u64 extent_end;
u64 search_end = device->total_bytes;
int ret;
int slot;
struct extent_buffer *l;

search_start = dev_extent_search_start(device);
max_hole_start = search_start;

WARN_ON(device->zone_info &&
!IS_ALIGNED(num_bytes, device->zone_info->zone_size));

path = btrfs_alloc_path();
if (!path)
return -ENOMEM;

max_hole_start = search_start;
max_hole_size = 0;

if (!path) {
ret = -ENOMEM;
goto out;
}
again:
if (search_start >= search_end ||
test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state)) {
11 changes: 8 additions & 3 deletions fs/fs-writeback.c
@@ -1535,10 +1535,15 @@ static void requeue_inode(struct inode *inode, struct bdi_writeback *wb,

if (wbc->pages_skipped) {
/*
* writeback is not making progress due to locked
* buffers. Skip this inode for now.
* Writeback is not making progress due to locked buffers.
* Skip this inode for now. Although having skipped pages
* is odd for clean inodes, it can happen for some
* filesystems so handle that gracefully.
*/
redirty_tail_locked(inode, wb);
if (inode->i_state & I_DIRTY_ALL)
redirty_tail_locked(inode, wb);
else
inode_cgwb_move_to_attached(inode, wb);
return;
}

1 change: 1 addition & 0 deletions fs/libfs.c
@@ -1903,6 +1903,7 @@ ssize_t direct_write_fallback(struct kiocb *iocb, struct iov_iter *iter,
* We don't know how much we wrote, so just return the number of
* bytes which were direct-written
*/
iocb->ki_pos -= buffered_written;
if (direct_written)
return direct_written;
return err;
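The one-line fix rewinds `ki_pos` by however far the failed buffered write had advanced it, so the file position reflects only bytes actually written via direct I/O. The bookkeeping, sketched with userspace stand-ins (hypothetical `struct iocb_pos` and `fallback_error`, not the VFS types):

```c
#include <sys/types.h>

struct iocb_pos {
	off_t pos;  /* stand-in for kiocb.ki_pos */
};

/*
 * The buffered fallback advanced pos by buffered_written bytes and
 * then failed. Undo that advance, then report either the direct
 * portion that did complete or the error.
 */
static ssize_t fallback_error(struct iocb_pos *iocb, ssize_t direct_written,
			      ssize_t buffered_written, int err)
{
	iocb->pos -= buffered_written;  /* revert the speculative advance */
	if (direct_written)
		return direct_written;  /* partial direct I/O still counts */
	return err;
}
```

So if 4096 bytes went out via direct I/O and a 512-byte buffered tail failed, the caller gets 4096 back and the position sits at 4096, not 4608.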
1 change: 1 addition & 0 deletions fs/ntfs3/super.c
@@ -1562,6 +1562,7 @@ static int ntfs_fill_super(struct super_block *sb, struct fs_context *fc)
put_inode_out:
iput(inode);
out:
ntfs3_put_sbi(sbi);
kfree(boot2);
return err;
}
2 changes: 1 addition & 1 deletion fs/overlayfs/copy_up.c
@@ -337,7 +337,7 @@ static int ovl_set_timestamps(struct ovl_fs *ofs, struct dentry *upperdentry,
{
struct iattr attr = {
.ia_valid =
ATTR_ATIME | ATTR_MTIME | ATTR_ATIME_SET | ATTR_MTIME_SET,
ATTR_ATIME | ATTR_MTIME | ATTR_ATIME_SET | ATTR_MTIME_SET | ATTR_CTIME,
.ia_atime = stat->atime,
.ia_mtime = stat->mtime,
};
6 changes: 6 additions & 0 deletions fs/overlayfs/file.c
@@ -391,6 +391,12 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
if (!ovl_should_sync(OVL_FS(inode->i_sb)))
ifl &= ~(IOCB_DSYNC | IOCB_SYNC);

/*
* Overlayfs doesn't support deferred completions, don't copy
* this property in case it is set by the issuer.
*/
ifl &= ~IOCB_DIO_CALLER_COMP;

old_cred = ovl_override_creds(file_inode(file)->i_sb);
if (is_sync_kiocb(iocb)) {
file_start_write(real.file);
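Clearing a capability bit the lower layer cannot honor is the same mask-out pattern as the existing IOCB_DSYNC/IOCB_SYNC handling a few lines up. A standalone sketch with made-up flag values (not the kernel's IOCB_* constants):

```c
/* Hypothetical flag bits, mirroring the IOCB_* style. */
#define IO_DSYNC       (1 << 0)
#define IO_SYNC        (1 << 1)
#define IO_CALLER_COMP (1 << 2)  /* issuer can handle deferred completion */

/*
 * Build the flag set forwarded to a lower layer. Sync bits are
 * dropped when the lower layer doesn't need them; deferred-completion
 * support is never propagated because this layer can't honor it.
 */
static int forward_flags(int issuer_flags, int lower_needs_sync)
{
	int fl = issuer_flags;

	if (!lower_needs_sync)
		fl &= ~(IO_DSYNC | IO_SYNC);
	fl &= ~IO_CALLER_COMP;  /* unconditionally mask it out */
	return fl;
}
```

Masking rather than rejecting keeps the fast path simple: the request proceeds, just without the optimization the issuer asked for.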
1 change: 0 additions & 1 deletion fs/pipe.c
@@ -537,7 +537,6 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from)
break;
}
ret += copied;
buf->offset = 0;
buf->len = copied;

if (!iov_iter_count(from))