Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix remounting snapshots read-write #6515

Merged
merged 1 commit into from Aug 17, 2017
Merged

Conversation

loli10K
Copy link
Contributor

@loli10K loli10K commented Aug 15, 2017

Description

Apparently it's not enough to preserve/restore MS_RDONLY on the superblock flags to avoid remounting a snapshot read-write: be explicit about our intentions to the VFS layer so the readonly bit is updated correctly in do_remount_sb().

This is still causing another issue different from the one reported in #6510, i'm still investigating this:

[16498.559788] VERIFY(list_is_empty(list)) failed
[16498.560525] PANIC at list.h:88:list_destroy()
[16498.561242] Showing stack for process 908
[16498.561967] CPU: 0 PID: 908 Comm: dbu_evict Tainted: P           OE   4.9.0 #4
[16498.564434] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[16498.566161]  0000000000000000 ffffffff88129055 ffffffffc0a4e081 ffffbf2040e73e10
[16498.569740]  ffffffffc07eb03f ffffa08c4a3586c0 ffffffff00000028 ffffbf2040e73e20
[16498.573670]  ffffbf2040e73dc0 6c28594649524556 655f73695f747369 73696c287974706d
[16498.575983] Call Trace:
[16498.576456]  [<ffffffff88129055>] ? dump_stack+0x5c/0x77
[16498.578307]  [<ffffffffc07eb03f>] ? spl_panic+0xbf/0xf0 [spl]
[16498.580339]  [<ffffffff883f8dce>] ? mutex_lock+0xe/0x30
[16498.582178]  [<ffffffff883f8dce>] ? mutex_lock+0xe/0x30
[16498.584078]  [<ffffffffc0986fb9>] ? refcount_remove_many+0x179/0x280 [zfs]
[16498.585849]  [<ffffffffc0930935>] ? dbuf_rele_and_unlock+0x1e5/0x450 [zfs]
[16498.586479]  [<ffffffffc0930b65>] ? dbuf_rele_and_unlock+0x415/0x450 [zfs]
[16498.588155]  [<ffffffffc098525a>] ? multilist_insert+0xca/0x1a0 [zfs]
[16498.590031]  [<ffffffffc091cd0a>] ? remove_reference+0x8a/0x180 [zfs]
[16498.591858]  [<ffffffff883f8dce>] ? mutex_lock+0xe/0x30
[16498.592541]  [<ffffffffc0986fb9>] ? refcount_remove_many+0x179/0x280 [zfs]
[16498.594420]  [<ffffffffc096bffe>] ? dsl_dir_evict+0x1ae/0x1e0 [zfs]
[16498.596174]  [<ffffffffc07e8eb9>] ? taskq_thread+0x269/0x520 [spl]
[16498.597929]  [<ffffffff87ea9170>] ? wake_up_q+0x60/0x60
[16498.599595]  [<ffffffffc07e8c50>] ? taskq_thread_spawn+0x50/0x50 [spl]
[16498.600174]  [<ffffffff87e9dc10>] ? kthread+0xe0/0x100
[16498.601646]  [<ffffffff87e2b76b>] ? __switch_to+0x2bb/0x700
[16498.602361]  [<ffffffff87e9db30>] ? kthread_park+0x60/0x60
[16498.604091]  [<ffffffff883fbaf5>] ? ret_from_fork+0x25/0x30

Motivation and Context

Fix #6510

How Has This Been Tested?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • All commit messages are properly formatted and contain Signed-off-by.
  • Change has been approved by a ZFS on Linux member.

@loli10K loli10K added the Status: Work in Progress Not yet ready for general review label Aug 15, 2017
@behlendorf behlendorf added this to Needs Commit in 0.7.2 Aug 15, 2017
ASSERT(vfsp->vfs_do_readonly);
*flags |= MS_RDONLY;
return (EROFS);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The readonly flags is always exists in flags as MS_RDONLY. We cannot rely on the option string for common mount flags.
Also, we should do the same thing for reaonly import.

@@ -1769,6 +1769,13 @@ zfs_remount(struct super_block *sb, int *flags, zfs_mnt_t *zm)
if (error)
return (error);

if ((dmu_objset_is_snapshot(zfsvfs->z_os) ||
!spa_writeable(dmu_objset_spa(zfsvfs->z_os))) &&
(!vfsp->vfs_readonly || !(*flags & MS_RDONLY))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to remove the check on vfsp. As I said, we cannot rely on option string as MS_RDONLY is the only authoritative flag for readonly. It's perfectly normal to have MS_RDONLY and emtpy option string, and we need to treat that as readonly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that MS_RDONLY is the only authoritative flag for readonly to the VFS, but what happens if we have flags=MS_RDONLY and data="rw")?

I think we should stop processing is this other case too, otherwise we will execute a lot of code (zfs_register_callbacks/zfs_unregister_callbacks/zfsvfs_vfs_free) that is useless and potentially dangerous: for instance the panic in list_destroy() seems to be caused by unregistered callbacks.

*flags |= MS_RDONLY;
return (EROFS);
}

zfs_unregister_callbacks(zfsvfs);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[16498.559788] VERIFY(list_is_empty(list)) failed
[16498.560525] PANIC at list.h:88:list_destroy()
[16498.561242] Showing stack for process 908
[16498.561967] CPU: 0 PID: 908 Comm: dbu_evict Tainted: P           OE   4.9.0 #4
[16498.564434] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[16498.566161]  0000000000000000 ffffffff88129055 ffffffffc0a4e081 ffffbf2040e73e10
[16498.569740]  ffffffffc07eb03f ffffa08c4a3586c0 ffffffff00000028 ffffbf2040e73e20
[16498.573670]  ffffbf2040e73dc0 6c28594649524556 655f73695f747369 73696c287974706d
[16498.575983] Call Trace:
[16498.576456]  [<ffffffff88129055>] ? dump_stack+0x5c/0x77
[16498.578307]  [<ffffffffc07eb03f>] ? spl_panic+0xbf/0xf0 [spl]
[16498.580339]  [<ffffffff883f8dce>] ? mutex_lock+0xe/0x30
[16498.582178]  [<ffffffff883f8dce>] ? mutex_lock+0xe/0x30
[16498.584078]  [<ffffffffc0986fb9>] ? refcount_remove_many+0x179/0x280 [zfs]
[16498.585849]  [<ffffffffc0930935>] ? dbuf_rele_and_unlock+0x1e5/0x450 [zfs]
[16498.586479]  [<ffffffffc0930b65>] ? dbuf_rele_and_unlock+0x415/0x450 [zfs]
[16498.588155]  [<ffffffffc098525a>] ? multilist_insert+0xca/0x1a0 [zfs]
[16498.590031]  [<ffffffffc091cd0a>] ? remove_reference+0x8a/0x180 [zfs]
[16498.591858]  [<ffffffff883f8dce>] ? mutex_lock+0xe/0x30
[16498.592541]  [<ffffffffc0986fb9>] ? refcount_remove_many+0x179/0x280 [zfs]
[16498.594420]  [<ffffffffc096bffe>] ? dsl_dir_evict+0x1ae/0x1e0 [zfs]
[16498.596174]  [<ffffffffc07e8eb9>] ? taskq_thread+0x269/0x520 [spl]
[16498.597929]  [<ffffffff87ea9170>] ? wake_up_q+0x60/0x60
[16498.599595]  [<ffffffffc07e8c50>] ? taskq_thread_spawn+0x50/0x50 [spl]
[16498.600174]  [<ffffffff87e9dc10>] ? kthread+0xe0/0x100
[16498.601646]  [<ffffffff87e2b76b>] ? __switch_to+0x2bb/0x700
[16498.602361]  [<ffffffff87e9db30>] ? kthread_park+0x60/0x60
[16498.604091]  [<ffffffff883fbaf5>] ? ret_from_fork+0x25/0x3

The issue here is that zfs_unregister_callbacks() does not execute dsl_prop_unregister_all() if dmu_objset_is_snapshot(os), so we leave a dirtry/non-empty ds->ds_prop_cbs list when we remount successfully a snapshot (-o remount,noxattr).

Maybe we should skip callback registration in zfs_remount() when we are processing a snapshot.

@tuxoko
Copy link
Contributor

tuxoko commented Aug 16, 2017

Yeah, the snapshot didn't register the callback in the first place as it can't change properties anyway. And yes, most of the callback should be removed since they are all MS_* flags and handle by the kernel. But we can do the cleanup later.

Last bit,
http://man7.org/linux/man-pages/man2/mount.2.html
http://elixir.free-electrons.com/linux/latest/source/fs/super.c#L820

EACCES Mounting a read-only filesystem was attempted without giving the MS_RDONLY flag.

Otherwise LGTM.

@loli10K
Copy link
Contributor Author

loli10K commented Aug 16, 2017

@tuxoko the reason i choose EROFS is because i found it was used by other filesystem in their .remount_fs functions:

ext2_remount : http://elixir.free-electrons.com/linux/latest/source/fs/ext2/super.c#L1376
ext4_remount : http://elixir.free-electrons.com/linux/latest/source/fs/ext4/super.c#L5017

I will force-push an updated version with s/EROFS/EACCES/ as soon as the buildslaves are done running the new test case. Thanks!

@tuxoko
Copy link
Contributor

tuxoko commented Aug 16, 2017

@loli10K

The error values given below result from filesystem type independent errors. Each filesystem type may have its own special errors and its own special behavior. See the Linux kernel source code for details.

Actually I think you're right. Since this is specific for snapshot and readonly import, not a fs independent error, we can leave it as EROFS.

@loli10K loli10K removed the Status: Work in Progress Not yet ready for general review label Aug 16, 2017
zfsvfs_vfs_free(zfsvfs->z_vfs);

vfsp->vfs_data = zfsvfs;
zfsvfs->z_vfs = vfsp;
(void) zfs_register_callbacks(vfsp);
if (issnap)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check and the one above look inverted to me. We should only be registering and unregistering callbacks when it's not a snapshot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, the check is inverted. But we should be hitting VERIFY(list_is_empty(list)) failed from the dbu_evict thread in this case, so this means the test script is not covering every possible case.

I'll update this in a bit, thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you pointed out earlier zfs_unregister_callbacks() already does this check internally so no need to do it twice. I think we only need to check such that callbacks are registered for non-snapshots. And to fix the test case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can still trigger the failure in dbu_evict with a successfull mount -t zfs -o remount,ro $MNTPSNAP: i'm adding this to the test case.

@tuxoko
Copy link
Contributor

tuxoko commented Aug 16, 2017

@behlendorf
I have a question about the readonly property on dataset. Should this be a hard rule? Or should this be like atime that can be override by mount flag? Because currently it behaves like atime that is override by mount flag.

@behlendorf
Copy link
Contributor

Good question. The readonly property on a dataset was historically intended to be overridden as described in the "Temporary Mount Point Properties" section of zfs(8).

@tuxoko
Copy link
Contributor

tuxoko commented Aug 16, 2017

Oh, I see that now, then we're good since it's documented that way.

@loli10K loli10K force-pushed the issue-6510 branch 2 times, most recently from ed59dda to a2d4865 Compare August 17, 2017 06:45
@loli10K
Copy link
Contributor Author

loli10K commented Aug 17, 2017

This is not quite ready yet, sorry: some distro (CentOS 6, Ubuntu 14) don't handle mount -t zfs -o remount,ro but seem to accept mount -o remount,ro:

http://build.zfsonlinux.org/builders/CentOS%206%20x86_64%20%28TEST%29/builds/7/steps/shell_8/logs/log
http://build.zfsonlinux.org/builders/Ubuntu%2014.04%20i686%20%28TEST%29/builds/3426/steps/shell_8/logs/log

Also CentOS 7 x86_64 is failing every combination of mount [-t zfs] [-s] -o remount,ro: http://build.zfsonlinux.org/builders/CentOS%207%20x86_64%20%28TEST%29/builds/6/steps/shell_8/logs/log

07:02:59.90 NOTE: --start-- troubleshooting remount,ro
07:02:59.90 NOTE: mount -t zfs -o remount,ro /var/tmp/testdir/zfs_snap_mount
07:02:59.90 filesystem 'testpool/testfs@snap' cannot be mounted due to invalid option 'seclabel'.
07:02:59.90 Use the '-s' option to ignore the bad mount option.
07:02:59.90 1
07:02:59.90 NOTE: mount -o remount,ro /var/tmp/testdir/zfs_snap_mount
07:02:59.90 filesystem 'testpool/testfs@snap' cannot be mounted due to invalid option 'seclabel'.
07:02:59.90 Use the '-s' option to ignore the bad mount option.
07:02:59.90 1
07:02:59.90 NOTE: mount -s -o remount,ro /var/tmp/testdir/zfs_snap_mount
07:02:59.91 filesystem 'testpool/testfs@snap' cannot be mounted due to invalid option 'seclabel'.
07:02:59.91 Use the '-s' option to ignore the bad mount option.
07:02:59.91 1
07:02:59.91 NOTE: --end-- troubleshooting remount,ro

Finally, the following kmemleak warnings still need to be investigated: http://build.zfsonlinux.org/builders/Ubuntu%2016.04%20x86_64%20Kmemleak%20%28TEST%29/builds/2986/steps/shell_8/logs/kmemleak

unreferenced object 0xffff8b0d29f121e0 (size 96):
  comm "mount.zfs", pid 9375, jiffies 4295859680 (age 12740.472s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 e6 41 17 0d 8b ff ff  ..........A.....
    01 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00  ................
  backtrace:
    [<ffffffffafc8d22a>] kmemleak_alloc+0x7a/0x100
    [<ffffffffaf31f3bc>] __kmalloc_node+0x26c/0x510
    [<ffffffffc07aaa09>] spl_kmem_zalloc+0x159/0x2a0 [spl]
    [<ffffffffc0aa4dec>] zfsvfs_parse_options+0x4c/0x470 [zfs]
    [<ffffffffc0aa57f9>] zfs_remount+0x59/0x180 [zfs]
    [<ffffffffc0aed4cc>] zpl_remount_fs+0x5c/0x140 [zfs]
    [<ffffffffaf363916>] do_remount_sb+0xa6/0x300
    [<ffffffffaf395d70>] do_mount+0xbb0/0x1520
    [<ffffffffaf396bbd>] SyS_mount+0xdd/0x160
    [<ffffffffafc9ca7b>] entry_SYSCALL_64_fastpath+0x1e/0xad
    [<ffffffffffffffff>] 0xffffffffffffffff
unreferenced object 0xffff8b0d1741e600 (size 32):
  comm "mount.zfs", pid 9375, jiffies 4295859680 (age 12740.472s)
  hex dump (first 32 bytes):
    2f 76 61 72 2f 74 6d 70 2f 74 65 73 74 64 69 72  /var/tmp/testdir
    2f 7a 66 73 5f 73 6e 61 70 5f 6d 6f 75 6e 74 00  /zfs_snap_mount.
  backtrace:
    [<ffffffffafc8d22a>] kmemleak_alloc+0x7a/0x100
    [<ffffffffaf31dff3>] __kmalloc+0x263/0x3e0
    [<ffffffffaf630bdc>] match_strdup+0x2c/0x80
    [<ffffffffc0aa4f27>] zfsvfs_parse_options+0x187/0x470 [zfs]
    [<ffffffffc0aa57f9>] zfs_remount+0x59/0x180 [zfs]
    [<ffffffffc0aed4cc>] zpl_remount_fs+0x5c/0x140 [zfs]
    [<ffffffffaf363916>] do_remount_sb+0xa6/0x300
    [<ffffffffaf395d70>] do_mount+0xbb0/0x1520
    [<ffffffffaf396bbd>] SyS_mount+0xdd/0x160
    [<ffffffffafc9ca7b>] entry_SYSCALL_64_fastpath+0x1e/0xad
    [<ffffffffffffffff>] 0xffffffffffffffff

I'm in the process of building a box with kmemleak enabled, i hope to push an update later today: sorry about the noise.

@loli10K loli10K force-pushed the issue-6510 branch 2 times, most recently from 8ea558d to 05049dc Compare August 17, 2017 09:59
It's not enough to preserve/restore MS_RDONLY on the superblock flags
to avoid remounting a snapshot read-write: be explicit about our
intentions to the VFS layer so the readonly bit is updated correctly
in do_remount_sb().

Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
@behlendorf
Copy link
Contributor

@loli10K were you able to identify the kmem leak? The source code changes look the same as in the last version. Thanks for sorting out the test case failure.

@tuxoko
Copy link
Contributor

tuxoko commented Aug 17, 2017

@behlendorf
I think he moved the snapshot check up, so there's no leak when it return EROFS now.

@behlendorf
Copy link
Contributor

Got it. Then this looks good.

@loli10K
Copy link
Contributor Author

loli10K commented Aug 17, 2017

Yes, sorry, forgot to update: @tuxoko is exactly right.

@behlendorf behlendorf merged commit 08de8c1 into openzfs:master Aug 17, 2017
@behlendorf behlendorf moved this from Needs Commit to TODO in 0.7.2 Aug 17, 2017
@loli10K loli10K deleted the issue-6510 branch August 17, 2017 21:29
tonyhutter pushed a commit that referenced this pull request Aug 22, 2017
It's not enough to preserve/restore MS_RDONLY on the superblock flags
to avoid remounting a snapshot read-write: be explicit about our
intentions to the VFS layer so the readonly bit is updated correctly
in do_remount_sb().

Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6510
Closes #6515
@tonyhutter tonyhutter moved this from TODO to Done in 0.7.2 Aug 22, 2017
Fabian-Gruenbichler pushed a commit to Fabian-Gruenbichler/zfs that referenced this pull request Sep 29, 2017
It's not enough to preserve/restore MS_RDONLY on the superblock flags
to avoid remounting a snapshot read-write: be explicit about our
intentions to the VFS layer so the readonly bit is updated correctly
in do_remount_sb().

Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes openzfs#6510
Closes openzfs#6515
FransUrbo pushed a commit to FransUrbo/zfs that referenced this pull request Apr 28, 2019
It's not enough to preserve/restore MS_RDONLY on the superblock flags
to avoid remounting a snapshot read-write: be explicit about our
intentions to the VFS layer so the readonly bit is updated correctly
in do_remount_sb().

Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes openzfs#6510 
Closes openzfs#6515
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
0.7.2
Done
Development

Successfully merging this pull request may close these issues.

Snapshots was allowed to mount as read-write and attempt to write to that snapshot lead a kernel panic
3 participants