Possible null pointer dereference in fuse_vnop_lookup() #298

pliard-chromium · 2016-07-15T21:01:24Z

Hi,

I have been experiencing a crash in the kernel extension in fuse_vnop_lookup() at this line: https://github.com/osxfuse/kext/blob/master/osxfuse/fuse_vnops.c#L1547

Is |parentvp| expected to always be non-null when fuse_vnop_lookup() gets called?

bfleischer · 2016-07-15T21:04:13Z

Could you post the complete panic log?

pliard-chromium · 2016-07-15T21:13:36Z

The call trace isn't symbolized unfortunately but I hope this helps. I pasted below the assembly code around RIP (0xe8bb) for convenience.

[0xe89e] <+600>: callq 0x8dc0 ; FSNodeGenericFromHNode
[0xe8a3] <+605>: movq 0x20(%rax), %rdi
[0xe8a7] <+609>: movq %rdi, -0xd0(%rbp)
[0xe8ae] <+616>: callq 0xe8b3 ; <+621>
[0xe8b3] <+621>: movq %rax, %rdi
[0xe8b6] <+624>: callq 0x8dc0 ; FSNodeGenericFromHNode
[0xe8bb] <+629>: movq 0x10(%rax), %r12
[0xe8bf] <+633>: movq 0xf0(%r15), %rax
[0xe8c6] <+640>: imull $0x64, %eax, %edx
[0xe8c9] <+643>: shrq $0x20, %rax
[0xe8cd] <+647>: addl %edx, %eax
[0xe8cf] <+649>: cmpl $0x2c4, %eax

OS X Problem Report.txt

bfleischer · 2016-07-15T22:18:07Z

This does not seem to be an official osxfuse release. Can you reproduce the issue with an official release?

pliard-chromium · 2016-07-15T22:34:22Z

We have only "re-branded" the extension by changing the constants in common/fuse_version.h. Our copy of the kext itself is otherwise unchanged and currently points to this commit: https://github.com/osxfuse/kext/tree/dac0c16c930680a3e33568d47dd62060ec104c7e

I also tried upgrading our copy of kext to 3.4.0 without any changes and the issue still reproduced. Though we do have a large version gap between our copy of the kext (https://github.com/osxfuse/kext/tree/dac0c16c930680a3e33568d47dd62060ec104c7e) and the rest of userspace osxfuse that we use (https://github.com/osxfuse/osxfuse/tree/595a6e4893e67fa50b889d8ecc9a5437019ef897)

Let me try to repro anyway with an up-to-date userspace although the kext should probably be resilient to a "bad userspace" ideally.

bfleischer · 2016-07-15T22:47:39Z

The crash log you attached is pretty much useless without the corresponding debug symbols.

If you can reproduce this with the official 3.4.0 release I'm happy to take a look. The fact that you even removed the bundle identifier in the panic log does not look good. I'm not offering free support for (possibly commercial) forks of osxfuse.

pliard-chromium · 2016-07-19T17:33:41Z

I just finished upgrading our copy of OSXFuse to 3.4.0 and I can still reproduce the same issue. I'm now working on getting a core dump/setting up remote debugging.

The crash only reproduces when the "allow root" mount option is set. We don't need that option anymore technically thanks to osxfuse/kext@7efb7fa although I'm worried that not allowing root is only masking the issue by preventing 'mds', the root process that the crashing thread belongs to, from using our file system.

To be clear, I'm not expecting free support. I'm more concerned about potential end users of OSXFuse in the field experiencing the same issue. I masked the name of the kernel extension as the project I'm working on is still confidential and it won't be used for commercial purposes either FWIW.

I will update the bug with a meaningful panic report as soon as I can get it.

pliard-chromium · 2016-07-19T23:25:19Z

I have the crash now in lldb and I'm attaching the call trace.
call_trace.txt

As you can see in the backtrace I was wrong. The |ap| pointer itself that is passed to fuse_vnop_lookup() is NULL.

I'm still digging more into the issue but FYI below is the |nameidata| struct being passed to lookup():
(lldb) p *ndp
(nameidata) $10 = {
ni_dirp = 581211003312668672
ni_segflg = 830705584
ni_op = -128
ni_startdir = 0xffffff8127b1bbf0
ni_rootdir = 0xffffff8127b1bc38
ni_usedvp = 0xffffff8000000100
ni_vp = 0xffffff8127b1bc67
ni_dvp = 0xeb90dc3800000002
ni_pathlen = 0
ni_next = 0x73656d7500000010 ""
ni_pathbuf = ""
ni_loopcnt = 4714980737810432
ni_cnd = {
cn_nameiop = 830705584
cn_flags = 4294967168
cn_context = 0xffffff8127b1bd38
cn_ndp = 0xffffff8127b1bd80
cn_pnbuf = 0xffffff8000000100 ""
cn_pnlen = 665959838
cn_nameptr = 0x812c58b300000010 ""
cn_namelen = 0
cn_hash = 4294967169
cn_consume = 0
}
ni_flag = -1676148577
ni_ncgeneration = -892027961
}
(lldb)

As well as the |nameidata| being passed to namei():
(nameidata) $12 = {
ni_dirp = 4451136240
ni_segflg = UIO_USERSPACE64
ni_op = OP_OPEN
ni_startdir = 0x0000000000000000
ni_rootdir = 0xffffff802f409960
ni_usedvp = 0x0000000000000000
ni_vp = 0x0000000000000000
ni_dvp = 0xffffff80363813c0
ni_pathlen = 1
ni_next = 0xffffff8127b1bc69 ""
ni_pathbuf = "/Volumes/FS/FS_12/54443fdfdfd.rtfd/.."
ni_loopcnt = 0
ni_cnd = {
cn_nameiop = 0
cn_flags = 135323732
cn_context = 0xffffff8031838fb0
cn_ndp = 0xffffff8127b1bbf0
cn_pnbuf = 0xffffff8127b1bc38 "/Volumes/FS/FS_12/54443fdfdfd.rtfd/.."
cn_pnlen = 256
cn_nameptr = 0xffffff8127b1bc67 ".."
cn_namelen = 2
cn_hash = 3952139320
cn_consume = 0
}
ni_flag = 16
ni_ncgeneration = 1936026997
}
(lldb)

pliard-chromium · 2016-07-19T23:51:44Z

I don't see how the OSXFuse kext could be responsible for being provided with a NULL pointer so I'm going to close this. We must be doing something wrong in the way we build the kext. Sorry for the noise.

pliard-chromium · 2016-07-20T20:00:09Z

It turns out I was using the wrong OSXFuse kext symbols yesterday which is why I was seeing a NULL pointer being provided to OSXFuse. The problem described in my first comment still happens. I dumped my lldb session as well as a fix that I applied that fixed the issue there FYI:
https://docs.google.com/document/d/1jsu7qUxt9QeT6U-spmSmzH855lHQ6ADqyzsSMfe74uM/edit#

Feel free to use the fix at the end of the document if you think this is the correct approach.

catwell · 2016-07-22T09:54:03Z

I can confirm that I have seen an issue that could have the same cause. When you try to rename(..) the kernel panics.

We run a modified version of PJD on our filesystem and it exhibits this issue in two tests.

bfleischer · 2016-07-24T20:00:48Z

Thanks @pliard-chromium for your extensive analysis. This issue might be related to osxfuse/kext#5. osxfuse seems to return vnodes that are in the process of being recycled.

@catwell, can you post your version of PJD and some repro steps? Being able to reproduce this issue reliably would help a lot.

catwell · 2016-07-25T06:35:01Z

Not sure I can post the whole PJD but I can try to extract code that reproduces it later today.

catwell · 2016-07-25T13:01:45Z

This small C program makes the kernel panic:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/stat.h>

#define die(...) do { \
    fprintf(stderr, __VA_ARGS__); \
    exit(EXIT_FAILURE); \
} while(0)

int main (int argc, char **argv) {
    if (argc != 2) die("USAGE: %s [path_to_mountpoint]\n", argv[0]);
    if (strlen(argv[1]) > 200) die("path to mountpoint too long");

    char path[256];
    sprintf(path, "%s/foo", argv[1]);
    if (mkdir(path)) perror("mkdir");
    sprintf(path, "%s/foo/..", argv[1]);
    if (rmdir(path)) perror("rmdir");

    return EXIT_SUCCESS;
}

Note that I personally do not think it is linked to osxfuse/kext#5, which occurs when changes are made without going through th FUSE interface (whereas this issue does not require anything like that, only doing something unexpected with ..).

bfleischer · 2016-07-27T05:44:35Z

Thanks @catwell, up until now I hadn't realized the panic was this easy to reproduce.

bfleischer · 2016-08-08T20:06:17Z

osxfuse 3.4.2 fixes the kernel panic @catwell reported.

@pliard-chromium, could you check if you are still able to reproduce your panic? Judging from your analysis those two panics might not be related.

catwell · 2016-08-08T20:44:18Z

Thank you for the fix!

bfleischer self-assigned this Jul 15, 2016

pliard-chromium closed this as completed Jul 19, 2016

pliard-chromium reopened this Jul 20, 2016

bfleischer added the bug label Jul 24, 2016

bfleischer added this to the 3.4.2 milestone Jul 24, 2016

bfleischer closed this as completed in osxfuse/kext@cf0da1e Aug 8, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Possible null pointer dereference in fuse_vnop_lookup() #298

Possible null pointer dereference in fuse_vnop_lookup() #298

pliard-chromium commented Jul 15, 2016

bfleischer commented Jul 15, 2016

pliard-chromium commented Jul 15, 2016 •

edited

bfleischer commented Jul 15, 2016

pliard-chromium commented Jul 15, 2016

bfleischer commented Jul 15, 2016 •

edited

pliard-chromium commented Jul 19, 2016

pliard-chromium commented Jul 19, 2016 •

edited

pliard-chromium commented Jul 19, 2016

pliard-chromium commented Jul 20, 2016

catwell commented Jul 22, 2016 •

edited

bfleischer commented Jul 24, 2016

catwell commented Jul 25, 2016

catwell commented Jul 25, 2016 •

edited

bfleischer commented Jul 27, 2016 •

edited

bfleischer commented Aug 8, 2016

catwell commented Aug 8, 2016

Possible null pointer dereference in fuse_vnop_lookup() #298

Possible null pointer dereference in fuse_vnop_lookup() #298

Comments

pliard-chromium commented Jul 15, 2016

bfleischer commented Jul 15, 2016

pliard-chromium commented Jul 15, 2016 • edited

bfleischer commented Jul 15, 2016

pliard-chromium commented Jul 15, 2016

bfleischer commented Jul 15, 2016 • edited

pliard-chromium commented Jul 19, 2016

pliard-chromium commented Jul 19, 2016 • edited

pliard-chromium commented Jul 19, 2016

pliard-chromium commented Jul 20, 2016

catwell commented Jul 22, 2016 • edited

bfleischer commented Jul 24, 2016

catwell commented Jul 25, 2016

catwell commented Jul 25, 2016 • edited

bfleischer commented Jul 27, 2016 • edited

bfleischer commented Aug 8, 2016

catwell commented Aug 8, 2016

pliard-chromium commented Jul 15, 2016 •

edited

bfleischer commented Jul 15, 2016 •

edited

pliard-chromium commented Jul 19, 2016 •

edited

catwell commented Jul 22, 2016 •

edited

catwell commented Jul 25, 2016 •

edited

bfleischer commented Jul 27, 2016 •

edited