Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible null pointer dereference in fuse_vnop_lookup() #298

Closed
pliard-chromium opened this issue Jul 15, 2016 · 16 comments
Closed

Possible null pointer dereference in fuse_vnop_lookup() #298

pliard-chromium opened this issue Jul 15, 2016 · 16 comments
Assignees
Labels
Milestone

Comments

@pliard-chromium
Copy link

Hi,

I have been experiencing a crash in the kernel extension in fuse_vnop_lookup() at this line: https://github.com/osxfuse/kext/blob/master/osxfuse/fuse_vnops.c#L1547

It looks like |pdp|, which is assigned from |parentvp|, is dereferenced in VTOI() while being NULL. My knowledge of the codebase is very limited but I see in other places including FSNodeGetOrCreateFileVNodeByID(), where it's set, that |parentvp|'s validity is checked before being dereferenced.

Is |parentvp| expected to always be non-null when fuse_vnop_lookup() gets called?

@bfleischer
Copy link
Member

Could you post the complete panic log?

@bfleischer bfleischer self-assigned this Jul 15, 2016
@pliard-chromium
Copy link
Author

pliard-chromium commented Jul 15, 2016

The call trace isn't symbolized unfortunately but I hope this helps. I pasted below the assembly code around RIP (0xe8bb) for convenience.

[0xe89e] <+600>: callq 0x8dc0 ; FSNodeGenericFromHNode
[0xe8a3] <+605>: movq 0x20(%rax), %rdi
[0xe8a7] <+609>: movq %rdi, -0xd0(%rbp)
[0xe8ae] <+616>: callq 0xe8b3 ; <+621>
[0xe8b3] <+621>: movq %rax, %rdi
[0xe8b6] <+624>: callq 0x8dc0 ; FSNodeGenericFromHNode
[0xe8bb] <+629>: movq 0x10(%rax), %r12
[0xe8bf] <+633>: movq 0xf0(%r15), %rax
[0xe8c6] <+640>: imull $0x64, %eax, %edx
[0xe8c9] <+643>: shrq $0x20, %rax
[0xe8cd] <+647>: addl %edx, %eax
[0xe8cf] <+649>: cmpl $0x2c4, %eax

OS X Problem Report.txt

@bfleischer
Copy link
Member

This does not seem to be an official osxfuse release. Can you reproduce the issue with an official release?

@pliard-chromium
Copy link
Author

We have only "re-branded" the extension by changing the constants in common/fuse_version.h. Our copy of the kext itself is otherwise unchanged and currently points to this commit: https://github.com/osxfuse/kext/tree/dac0c16c930680a3e33568d47dd62060ec104c7e

I also tried upgrading our copy of kext to 3.4.0 without any changes and the issue still reproduced. Though we do have a large version gap between our copy of the kext (https://github.com/osxfuse/kext/tree/dac0c16c930680a3e33568d47dd62060ec104c7e) and the rest of userspace osxfuse that we use (https://github.com/osxfuse/osxfuse/tree/595a6e4893e67fa50b889d8ecc9a5437019ef897)

Let me try to repro anyway with an up-to-date userspace although the kext should probably be resilient to a "bad userspace" ideally.

@bfleischer
Copy link
Member

bfleischer commented Jul 15, 2016

The crash log you attached is pretty much useless without the corresponding debug symbols.

If you can reproduce this with the official 3.4.0 release I'm happy to take a look. The fact that you even removed the bundle identifier in the panic log does not look good. I'm not offering free support for (possibly commercial) forks of osxfuse.

@pliard-chromium
Copy link
Author

I just finished upgrading our copy of OSXFuse to 3.4.0 and I can still reproduce the same issue. I'm now working on getting a core dump/setting up remote debugging.

The crash only reproduces when the "allow root" mount option is set. We don't need that option anymore technically thanks to osxfuse/kext@7efb7fa although I'm worried that not allowing root is only masking the issue by preventing 'mds', the root process that the crashing thread belongs to, from using our file system.

To be clear, I'm not expecting free support. I'm more concerned about potential end users of OSXFuse in the field experiencing the same issue. I masked the name of the kernel extension as the project I'm working on is still confidential and it won't be used for commercial purposes either FWIW.

I will update the bug with a meaningful panic report as soon as I can get it.

@pliard-chromium
Copy link
Author

pliard-chromium commented Jul 19, 2016

I have the crash now in lldb and I'm attaching the call trace.
call_trace.txt

As you can see in the backtrace I was wrong. The |ap| pointer itself that is passed to fuse_vnop_lookup() is NULL.

I'm still digging more into the issue but FYI below is the |nameidata| struct being passed to lookup():
(lldb) p *ndp
(nameidata) $10 = {
ni_dirp = 581211003312668672
ni_segflg = 830705584
ni_op = -128
ni_startdir = 0xffffff8127b1bbf0
ni_rootdir = 0xffffff8127b1bc38
ni_usedvp = 0xffffff8000000100
ni_vp = 0xffffff8127b1bc67
ni_dvp = 0xeb90dc3800000002
ni_pathlen = 0
ni_next = 0x73656d7500000010 ""
ni_pathbuf = ""
ni_loopcnt = 4714980737810432
ni_cnd = {
cn_nameiop = 830705584
cn_flags = 4294967168
cn_context = 0xffffff8127b1bd38
cn_ndp = 0xffffff8127b1bd80
cn_pnbuf = 0xffffff8000000100 ""
cn_pnlen = 665959838
cn_nameptr = 0x812c58b300000010 ""
cn_namelen = 0
cn_hash = 4294967169
cn_consume = 0
}
ni_flag = -1676148577
ni_ncgeneration = -892027961
}
(lldb)

As well as the |nameidata| being passed to namei():
(nameidata) $12 = {
ni_dirp = 4451136240
ni_segflg = UIO_USERSPACE64
ni_op = OP_OPEN
ni_startdir = 0x0000000000000000
ni_rootdir = 0xffffff802f409960
ni_usedvp = 0x0000000000000000
ni_vp = 0x0000000000000000
ni_dvp = 0xffffff80363813c0
ni_pathlen = 1
ni_next = 0xffffff8127b1bc69 ""
ni_pathbuf = "/Volumes/FS/FS_12/54443fdfdfd.rtfd/.."
ni_loopcnt = 0
ni_cnd = {
cn_nameiop = 0
cn_flags = 135323732
cn_context = 0xffffff8031838fb0
cn_ndp = 0xffffff8127b1bbf0
cn_pnbuf = 0xffffff8127b1bc38 "/Volumes/FS/FS_12/54443fdfdfd.rtfd/.."
cn_pnlen = 256
cn_nameptr = 0xffffff8127b1bc67 ".."
cn_namelen = 2
cn_hash = 3952139320
cn_consume = 0
}
ni_flag = 16
ni_ncgeneration = 1936026997
}
(lldb)

@pliard-chromium
Copy link
Author

I don't see how the OSXFuse kext could be responsible for being provided with a NULL pointer so I'm going to close this. We must be doing something wrong in the way we build the kext. Sorry for the noise.

@pliard-chromium
Copy link
Author

It turns out I was using the wrong OSXFuse kext symbols yesterday which is why I was seeing a NULL pointer being provided to OSXFuse. The problem described in my first comment still happens. I dumped my lldb session as well as a fix that I applied that fixed the issue there FYI:
https://docs.google.com/document/d/1jsu7qUxt9QeT6U-spmSmzH855lHQ6ADqyzsSMfe74uM/edit#

Feel free to use the fix at the end of the document if you think this is the correct approach.

@catwell
Copy link

catwell commented Jul 22, 2016

I can confirm that I have seen an issue that could have the same cause. When you try to rename(..) the kernel panics.

We run a modified version of PJD on our filesystem and it exhibits this issue in two tests.

@bfleischer bfleischer added the bug label Jul 24, 2016
@bfleischer bfleischer added this to the 3.4.2 milestone Jul 24, 2016
@bfleischer
Copy link
Member

Thanks @pliard-chromium for your extensive analysis. This issue might be related to osxfuse/kext#5. osxfuse seems to return vnodes that are in the process of being recycled.

@catwell, can you post your version of PJD and some repro steps? Being able to reproduce this issue reliably would help a lot.

@catwell
Copy link

catwell commented Jul 25, 2016

Not sure I can post the whole PJD but I can try to extract code that reproduces it later today.

@catwell
Copy link

catwell commented Jul 25, 2016

This small C program makes the kernel panic:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/stat.h>

#define die(...) do { \
    fprintf(stderr, __VA_ARGS__); \
    exit(EXIT_FAILURE); \
} while(0)

int main (int argc, char **argv) {
    if (argc != 2) die("USAGE: %s [path_to_mountpoint]\n", argv[0]);
    if (strlen(argv[1]) > 200) die("path to mountpoint too long");

    char path[256];
    sprintf(path, "%s/foo", argv[1]);
    if (mkdir(path)) perror("mkdir");
    sprintf(path, "%s/foo/..", argv[1]);
    if (rmdir(path)) perror("rmdir");

    return EXIT_SUCCESS;
}

Note that I personally do not think it is linked to osxfuse/kext#5, which occurs when changes are made without going through th FUSE interface (whereas this issue does not require anything like that, only doing something unexpected with ..).

@bfleischer
Copy link
Member

bfleischer commented Jul 27, 2016

Thanks @catwell, up until now I hadn't realized the panic was this easy to reproduce.

@bfleischer
Copy link
Member

osxfuse 3.4.2 fixes the kernel panic @catwell reported.

@pliard-chromium, could you check if you are still able to reproduce your panic? Judging from your analysis those two panics might not be related.

@catwell
Copy link

catwell commented Aug 8, 2016

Thank you for the fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants