aufs Operation not allowed on linux 6.6.4, 6.6.5, 6.6.6 #33
Comments
|
With Linux kernel 6.6.1, 6.6.2, and 6.6.3 everything works. |
|
Hello,
KrasNIX:
**result:**
ls: unable to access '/au/res': Operation not allowed
? d????????? ? ? ? ? ? res
524291 drwxr-xr-x 2 root root 4096 dec 12 21:23 ro
524290 drwxr-xr-x 4 root root 4096 dec 12 21:26 rw
**dmesg:**
[ 463.707906] RIP: 0010:vfs_getattr+0x4c/0x60
:::
Thanks for the report.
It seems that you've met a problem which roadie2 found recently.
#32
Until aufs6.6.4 comes up, replace fs/stat.c from linux-v6.6.3.
J. R. Okajima
|
|
J.R.O, I don't know if this fix helps Puppy Linux users who were having a kernel panic as I don't know how aufs is handled in the initrd, though it must use a mount operation as well. This may also mean your patches don't need updating, hopefully that's the case. I'll add that "udba=reval" and "udba=notify" did not work. |
|
As far as I can tell (I'm not an expert), Puppy has in its initrd: I have been building with a reverted fs/stat.c while waiting for aufs-6.6.4, so I have not yet been able to check whether udba=none fixes the boot crash. What is the implication of changing from udba=reval to udba=none? |
|
Confirmed - changing init to udba=none does result in a successful boot to a working desktop. However - such a change would not be retrospective meaning that 6.6.4 onwards kernels could never be used on systems built before the change... so not viable from my perspective. |
|
Hello,
roadie2:
I've found that the boot problem Porteus and PorteuX users were having can be fixed by adding the "udba=none" option to the initrd mount code. The kernel versions that gave problems boot cleanly without replacing or editing fs/stat.c
Yes, udba=none makes aufs skip its internal getattr() and gain some
performance. Note that it is useful when all your branch filesystems are
local and hidden from others.
In other words, all your branches (layers) are accessed via aufs
only. If you or someone else touches a file on a branch directly,
i.e. bypassing aufs, it will confuse aufs and MAY cause some
destructive result.
J. R. Okajima
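As an illustration of the option discussed above, here is a minimal sketch that composes an aufs mount command with an explicit udba mode. The paths /au/rw, /au/ro and /au/res are taken from the directory names in the original report, and their roles as writable branch, read-only branch and mount point are assumptions. `aufs_mount_cmd` is a hypothetical helper that only prints the command instead of running it.

```shell
#!/bin/sh
# Sketch: build an aufs mount command line with a chosen udba mode.
# Branch paths and their rw/ro roles are assumptions for illustration.

# aufs_mount_cmd RW_BRANCH RO_BRANCH MOUNTPOINT [UDBA]
# Prints the command so it can be reviewed before being run as root.
aufs_mount_cmd() {
    rw=$1 ro=$2 mnt=$3 udba=${4:-none}
    # Only the three udba modes mentioned in this thread are accepted.
    case $udba in
        none|reval|notify) ;;
        *) echo "unknown udba mode: $udba" >&2; return 1 ;;
    esac
    printf 'mount -t aufs -o br:%s=rw:%s=ro,udba=%s none %s\n' \
        "$rw" "$ro" "$udba" "$mnt"
}

aufs_mount_cmd /au/rw /au/ro /au/res none
```

Printing rather than mounting keeps the sketch safe to run unprivileged; an initrd script would execute the resulting line as root. The caveat above still applies: udba=none is only appropriate when nothing modifies the branches while bypassing aufs.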
|
|
Whoa, you found the root cause!!
Now I can confirm that the commit
8a924db2d7b5 2023-11-18 fs: Pass AT_GETATTR_NOSEC flag to getattr interface function
in linux-v6.7-rc4 and in v6.6.4 (3fb0fa086419) made the change.
Aufs has to follow that change.
I will release aufs6.6.4 as soon as possible.
Unfortunately, I have no time to test aufs6.6.4, but I made a patch.
It is completely untested.
If you don't mind, please test it and report the result.
Thanks in advance.
J. R. Okajima
diff --git a/fs/aufs/i_op.c b/fs/aufs/i_op.c
index c3ff23f485d7..329d73968dc3 100644
--- a/fs/aufs/i_op.c
+++ b/fs/aufs/i_op.c
@@ -1288,9 +1288,13 @@ static int aufs_getattr(struct mnt_idmap *idmap, const struct path *path,
 		goto out_fill; /* pretending success */
 
 	positive = d_is_positive(h_path.dentry);
-	if (positive)
+	if (positive) {
 		/* no vfsub version */
-		err = vfs_getattr(&h_path, st, request, query);
+		if (query & AT_GETATTR_NOSEC)
+			err = vfs_getattr_nosec(&h_path, st, request, query);
+		else
+			err = vfs_getattr(&h_path, st, request, query);
+	}
 	if (!err) {
 		if (positive)
 			au_refresh_iattr(inode, st,
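
To decide whether a given tree still needs the patch above, one rough check is sketched below. It assumes aufs is built in-tree under fs/aufs/ and that the AT_GETATTR_NOSEC definition sits somewhere under include/. `needs_nosec_patch` is a hypothetical helper name, not part of aufs or the kernel.

```shell
#!/bin/sh
# Sketch: report whether a kernel tree with in-tree aufs needs the
# fs/aufs/i_op.c change. The directory layout is an assumption.

# needs_nosec_patch TREE
# Exit status: 0 = patch needed, 1 = not needed, 2 = no aufs source found.
needs_nosec_patch() {
    tree=$1
    [ -f "$tree/fs/aufs/i_op.c" ] || return 2
    # Mainline side: the flag only exists once commit 8a924db2d7b5
    # (or its stable backport) is in the tree.
    grep -rqs 'AT_GETATTR_NOSEC' "$tree/include" || return 1
    # aufs side: already patched if i_op.c calls vfs_getattr_nosec().
    if grep -qs 'vfs_getattr_nosec' "$tree/fs/aufs/i_op.c"; then
        return 1
    fi
    return 0
}

# Usage (the tree path is a placeholder):
#   needs_nosec_patch /usr/src/linux && echo "apply the i_op.c patch"
```

A text search like this cannot prove the patch is applied correctly; it only flags trees where the kernel has the new flag and aufs does not yet react to it.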
|
|
Many thanks - confirmed - k6.6.7 built with patched /fs/aufs/i_op.c boots correctly. Will use it as daily. |
|
Confirmed, booting well with Porteus. Thanks very much |
|
It’s not clear, does the latest patch (for i_op.c) fix the problem described in the first message? For some reason, the last messages from the system booted successfully. |
|
vladns:
It’s not clear, does the latest patch (for i_op.c) fix the problem described in the first message? For some reason, the last messages from the system booted successfully.
It's not clear to me what you mean by the "first" and "last"
messages, but the patch (for fs/aufs/i_op.c) is definitely necessary
for v6.6.4 and later.
And I guess the problem reported by KrasNIX (is this the one you call
the first msg?) will be fixed.
J. R. Okajima
|
|
this helps for me |
|
The fact is that on my 6.6.X kernels, a kernel message (error) periodically appears: After "Comm:", there may be other services. |
|
KrasNIX:
`Until aufs6.6.4 comes up, replace fs/stat.c from linux-v6.6.3.`
this helps for me
aufs6.6.4 hasn't been released yet, but I posted a patch.
I'd suggest you keep your fs/stat.c as in the vanilla kernel and apply
the patch to fs/aufs/i_op.c.
Happy Holidays!
|
|
vladns:
The fact is that on my 6.6.X kernels, a kernel message (error) periodically appears:
```
divide error: 0000 [#1] PREEMPT SMP PTI
router kernel: CPU: 5 PID: 8900 Comm: NTCP Pumper Not tainted 6.6.8-arch1-1-aufs #1 11d8e1235f53d9ca3d
Hardware name: MSI MS-7851, BIOS V4.10 XX/XX/20XX
RIP: 0010:tcp_rcv_space_adjust+0xbe/0x160
```
I guess this is a problem in a network-related module, and aufs is
NOT related. If you can, please post your full stack trace.
J. R. Okajima
|
|
|
Hello vladns,
Your stacktrace shows that
- the command "NTCP Pumper" whose pid is 8900 issued read(2) for a
socket.
- the kernel received the read(2) request and ipv6/tcp handled it.
- in tcp_recvmsg(), something went wrong.
tcp_recvmsg -> tcp_recvmsg_locked -> ... -> tcp_rcv_space_adjust
which means aufs is unrelated and I can do nothing to help you.
Do you think there is a scenario like this?
- "NTCP Pumper" has some configuration or library file on aufs
- aufs doesn't show the file to the command
- the command gets crazy and behaves wrong
If so, you need to find the file which triggers the problem.
J. R. Okajima
|
|
Thanks for the help. NTCP is part of the i2p(d) router. I don’t know what exactly is happening there. But this is not so important, because... instead of NTCP there may be another program, at least 2 or 3. This is the last one in the messages. But when I install kernel 6.5.9 with aufs support, these problems do not exist. |
|
@sfjro have had the NOSEC fix in our tree for a bit but running into something new it appears in 6.6.8 when launching a have tried enabling SHWH to no avail - same effect with and without. |
|
Hello RageLtMan,
RageLtMan:
aufs au_lkup_dentry:238:dockerd[496]: I/O Error, both of real entry and whiteout found, proc, err -5
Hmm, it should not happen (but happened, I know).
The normal case:
- assume you have two layers, upper RW and lower RO.
- on RO, there is a file (or dir) named 'proc'
- on RW, there is a file named '.wh.proc'
- then you won't see 'proc' in aufs.
The abnormal case (your case):
- on RO, there may or may not exist a file (or dir) named 'proc'
- on RW, there is a file (or dir) named 'proc'
- on RW, there is a file named '.wh.proc' too
- then you will see the log "I/O Error, both of ..."
The possible scenario (which is just my guess) is
- on RO, there is a file (or dir) named 'proc'
- run "rm proc", then a file named '.wh.proc' will be created on RW
- create a file (or dir) named 'proc' on RW directly (bypassing aufs)
- then you will have both of '.wh.proc' and 'proc' on RW, and the log
will appear.
Did you run such a command on RW directly (bypassing aufs)?
J. R. Okajima
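The abnormal case described above can be checked for on a branch with a small script. This is only a sketch: `find_whiteout_conflicts` is a hypothetical helper, the branch path in the usage line is a placeholder, and it assumes that names starting with `.wh..wh.` are aufs-internal files to be skipped and not whiteouts of user files.

```shell
#!/bin/sh
# Sketch: list names on one branch that have both a real entry and a
# whiteout (".wh.<name>") in the same directory.

# find_whiteout_conflicts BRANCH_DIR
find_whiteout_conflicts() {
    branch=$1
    # Skip ".wh..wh.*" entries (assumed aufs-internal, see lead-in).
    find "$branch" -name '.wh.*' ! -name '.wh..wh.*' |
    while IFS= read -r wh; do
        dir=$(dirname "$wh")
        name=$(basename "$wh")
        real="$dir/${name#.wh.}"
        # -e follows symlinks; -L also catches dangling symlinks.
        if [ -e "$real" ] || [ -L "$real" ]; then
            printf '%s\n' "$real"
        fi
    done
}

# Usage (placeholder path, run against the RW branch, not the aufs mount):
#   find_whiteout_conflicts /path/to/rw-branch
```

Any path it prints is a candidate for the "both of real entry and whiteout found" message; an empty result means the branch holds no such pair.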
|
|
@sfjro: thanks for pinging back. I ran no commands whatsoever, the invocation causing this is The This works in 6.1 by the way so figure its something with how 6.6 is stacking those fs' |
|
RageLtMan:
# docker run --rm -ti hello-world
docker: Error response from daemon: stat /var/lib/docker/aufs/mnt/2c41877fe89b5a846606a2c86b3e032d55dcdfbfb384819c99e365aea6c30bf5-init/dev/pts: input/output error.
See 'docker run --help'.
***@***.*** vagrant]# dmesg|grep aufs
[ 20.508179] aufs 6.6-20231106
[ 20.931173] aufs au_opts_verify:763:dockerd[497]: dirperm1 breaks the protection by the permission bits on the lower branch
[ 23.697143] aufs au_lkup_dentry:238:dockerd[493]: I/O Error, both of real entry and whiteout found, dev, err -5
:::
OK, now I see that something has changed in the mainline kernel, and I
have to check it. Unfortunately I am busy and can't do it right now.
I hope I will be able to return to aufs in a few months.
J. R. Okajima
|
|
Roger, thanks. In the meantime, would you be able to point me in the general area where i should look for a solution? |
|
RageLtMan:
Roger, thanks. In the meantime, would you be able to point me in the general area where i should look for a solution?
If I had enough time to investigate, I would try git-diff or git-bisect.
- first, identify the "good" kernel and "bad" kernel.
in this case, linux-v6.6.7 and linux-v6.6.8?
- run "git diff v6.6.7 v6.6.8" or "git log v6.6.7..v6.6.8"
to shrink the range, it is a good idea to append "fs", "include" or something.
- in a simple case, the diff or log tells us the cause.
- otherwise, we need to try git-bisect, which may involve compiling and
running each candidate kernel, and that takes a long time.
I don't force you to try these.
It's a simplified instruction for myself.
J. R. Okajima
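The steps above might look like the following in practice. This is a sketch, not a verified procedure for this bug: the tags v6.6.7 and v6.6.8 are the tentative "good" and "bad" versions from the post, the repository path is a placeholder, and `suspect_commits` is a hypothetical wrapper around `git log`.

```shell
#!/bin/sh
# Sketch: list commits between a known-good and a known-bad tag,
# optionally limited to some paths to shrink the range.

# suspect_commits REPO GOOD BAD [PATH...]
suspect_commits() {
    repo=$1 good=$2 bad=$3
    shift 3
    git -C "$repo" log --oneline "$good..$bad" -- "$@"
}

# Usage in a kernel tree (the repository path is a placeholder):
#   suspect_commits ~/linux v6.6.7 v6.6.8 fs include
#
# If the log does not reveal the cause, fall back to bisecting:
#   git bisect start v6.6.8 v6.6.7 -- fs include
#   ... build, boot and test, then mark the result:
#   git bisect good        # or: git bisect bad
#   git bisect reset       # when finished
```

Limiting the range to paths such as fs and include keeps the list short, at the risk of missing a cause that lives elsewhere in the tree.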
|
|
@sfjro I successfully compiled and ran kernel 6.6.10 in Porteus 5.01 x86_64 with your /fs/aufs/i_op.c patch |
|
@TurboBlaze - just vanilla 6.6.10?
# docker run --rm -ti hello-world
docker: Error response from daemon: stat /var/lib/docker/aufs/mnt/a23767eccedee97aa675346307b0b2cc37582d31e319d06d70d2f1049ee1a85f-init/sys: input/output error.
See 'docker run --help'.
# dmesg | tail -200| grep aufs
[ 67.592285] aufs au_opts_verify:763:dockerd[551]: dirperm1 breaks the protection by the permission bits on the lower branch
[ 69.952540] aufs au_lkup_dentry:238:dockerd[554]: I/O Error, both of real entry and whiteout found, sys, err -5
[ 69.954114] aufs au_lkup_dentry:238:dockerd[554]: I/O Error, both of real entry and whiteout found, sys, err -5
# uname -r
6.6.10
built in-tree with the following config options: |
|
@sempervictus config is |
Aufs simply follows the change in mainline,
8a924db 2023-11-18 fs: Pass AT_GETATTR_NOSEC flag to getattr interface function
On github, roadie2 found the cause of some issues.
See-also: sfjro/aufs-standalone#32
See-also: sfjro/aufs-standalone#33
Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
Aufs simply follows the change in mainline,
8a924db 2023-11-18 fs: Pass AT_GETATTR_NOSEC flag to getattr interface function
On github, roadie2 found the cause of some issues.
See-also: sfjro/aufs-standalone#32
See-also: sfjro/aufs-standalone#33
Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
(cherry picked from commit aeafb3b)
Simple test:
result:
ls: unable to access '/au/res': Operation not allowed
? d????????? ? ? ? ? ? res
524291 drwxr-xr-x 2 root root 4096 dec 12 21:23 ro
524290 drwxr-xr-x 4 root root 4096 dec 12 21:26 rw
dmesg:
[ 463.707906] RIP: 0010:vfs_getattr+0x4c/0x60
[ 463.707909] Code: 13 5b 5d 41 5c 41 5d 31 d2 31 c9 31 f6 31 ff c3 cc cc cc cc 89 d9 44 89 ea 5b 4c 89 e6 48 89 ef 5d 41 5c 41 5d e9 b4 fe ff ff <0f> 0b b8 ff ff ff ff 31 d2 31 c9 31 f6 31 ff c3 cc cc cc cc 90 90
[ 463.707912] RSP: 0018:ffff956d80c6bd88 EFLAGS: 00010286
[ 463.707915] RAX: ffff88c0d012e900 RBX: 0000000000000000 RCX: 0000000080000000
[ 463.707917] RDX: 000000000000035e RSI: ffff956d80c6be50 RDI: ffff956d80c6bda8
[ 463.707918] RBP: ffff956d80c6bde8 R08: 0000000000000000 R09: 0000000000000000
[ 463.707920] R10: 0000000000000000 R11: 0000000000000000 R12: ffff956d80c6be50
[ 463.707921] R13: ffff88c0c051c540 R14: ffff88c0d00f20d8 R15: ffff88c0c3bb2800
[ 463.707923] FS: 00007f2f200fc800(0000) GS:ffff88c1f7d40000(0000) knlGS:0000000000000000
[ 463.707925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 463.707926] CR2: 000055c4900afd58 CR3: 0000000101eee000 CR4: 00000000000006e0
[ 463.707930] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 463.707931] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 463.707933] Call Trace:
[ 463.707948] <TASK>
[ 463.707949] ? vfs_getattr+0x4c/0x60
[ 463.707954] ? __warn+0x7d/0x130
[ 463.707969] ? vfs_getattr+0x4c/0x60
[ 463.707972] ? report_bug+0x19e/0x1d0
[ 463.707977] ? handle_bug+0x42/0x80
[ 463.707983] ? exc_invalid_op+0x13/0x70
[ 463.707987] ? asm_exc_invalid_op+0x16/0x20
[ 463.707996] ? vfs_getattr+0x4c/0x60
[ 463.707998] aufs_getattr+0x11f/0x210
[ 463.708004] vfs_statx+0xc2/0x180
[ 463.708008] do_statx+0x67/0xc0
[ 463.708013] __x64_sys_statx+0x62/0x90
[ 463.708017] do_syscall_64+0x3a/0x90
[ 463.708020] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 463.708024] RIP: 0033:0x7f2f2029092a
[ 463.708030] Code: 48 8b 05 d9 a4 0d 00 ba ff ff ff ff 64 c7 00 16 00 00 00 e9 a5 fd ff ff e8 e3 05 02 00 0f 1f 00 41 89 ca b8 4c 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2e 89 c1 85 c0 74 0f 48 8b 05 a1 a4 0d 00 64
[ 463.708032] RSP: 002b:00007ffc778bf498 EFLAGS: 00000202 ORIG_RAX: 000000000000014c
[ 463.708035] RAX: ffffffffffffffda RBX: 000055c4900aac28 RCX: 00007f2f2029092a
[ 463.708036] RDX: 0000000000000900 RSI: 00007ffc778c072f RDI: 00000000ffffff9c
[ 463.708038] RBP: 000000000000035e R08: 00007ffc778bf4a0 R09: 0000000000000002
[ 463.708039] R10: 000000000000035e R11: 0000000000000202 R12: 00007ffc778c072f
[ 463.708040] R13: 0000000000000000 R14: 000055c4900aac10 R15: 0000000000000001
[ 463.708044] </TASK>
[ 463.708045] ---[ end trace 0000000000000000 ]---