New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
linux_get_parent_info: Check for NULL priv->sysfs_dir before strcmp #129
Conversation
Which version of libusb is this? The fact that linux_hotplug_enumerate() is On Sun, Nov 29, 2015 at 5:45 PM, staples1347 notifications@github.com
|
I've experienced the crash on libusb 1.0.19 and 1.0.20, but haven't tested on the latest master. Also, if it helps, the systems are all running on Gentoo with Network UPS Tools 2.7.2 or 2.7.3 and various linux kernels from 3.14 series as well as 4.1.6 from Gentoo's hardened-sources packages, and hardened gcc, and I've seen the crash with Socomec, Cyberpower, and APC USB UPSs. The crash occurs shortly after Linux re-detects the ups's usb interface. Would it help if I post some debug output from a system? |
Yes, debug output may be helpful. Also, please try the latest master, as
there were some changes to netlink processing since 1.0.20 was released.
|
Okay, latest master appears to have the problem as well. I don't seem to get the crash when libusb debug is enabled though or if the nut driver is running in the forground which makes it seem like a timing issue, so I have added some custom debugging output to the linux_get_parent_info function with the NULL check fix included which hopefully makes sense. Also, on this system at least, it looks like the issue only occurs with a physical disconnect and reconnect, or if the ehci driver is unbound or rebound both of which trigger a fast disconnect and reconnect on the ohci driver. Unbind and bind for ohci doesn't trigger the bug. I am using echo > unbind and echo > bind to /sys/bus/pci/drivers/ehci-pci/ to to trigger the bug on this system at the moment. The debug output appears to be exactly the same every time I trigger the bug on this system at the moment with nutdrv_qx on nut-2.7.2 and a Socomec UPS and kernel 4.1.6). This is the current lsusb list of devices on the system I am testing with at the moment: |
Here are two files with normal libusb (master) debug output while running nutdrv_qx, and unbinding and binding, to show what the output is normally (I don't get the crash with normal libusb debug enabled). With the first log, I did unbind from ehci at around 15 seconds which triggered a fast disconnect and reconnect of the ups. With the second log, I did unbind from ohci at around 12 seconds, and then bind back to ohci at around 22 seconds. Both tests started with all the usb interfaces bound to their proper drivers. libusb_standard_debug_nut_usbups_with_ehci_unbind.txt.gz |
Can you try the attached patch against master (without your fix) and debug output turned off? Since this appears to be timing critical, the patch will attempt to quickly dump out the netlink messages whilst avoiding the overhead of the standard debug logging. |
With that patch I got a buffer overflow, but was able to modify it to get suitable output: then after "ehci-pci unbind", this: then a few seconds later, this, and then the crash: This is the patch I've used to create the above output: netlink_dump_2.txt |
I don't see why the original patch would have run into a buffer overflow, but here's another to try that is simpler and more paranoid. Your version of the patch doesn't capture the required information. The netlink messages consist of a series of NULL-terminated strings, and your patch will only print the first line. That is why my patch seeks out the NULL bytes and replaces them with linefeed characters. |
Ah I see now. It looks like during the process of trying to solve this issue, we've uncovered another bug which is: because len is type: size_t, len < 32 doesn't evaluate -1 as true. The if test at the top of linux_netlink_read_message should be changed to something like if (len < 32 || len==-1) to catch all recvmsg errors. Alternatively, len should be changed to type: ssize_t as that's the type that recvmsg returns according to my man page and then -1 will evaluate as true for test: len < 32. This doesn't fix the crash when removing the usb device though so I've attached the output from your netlink_dump_3 patch up until the crash with some comments to indicate when the output occurred. |
Yes, I noticed the size_t issue, which is why I added the assertion statement. That was the only thing I could think of that would have caused a buffer overflow in my original patch. Does your application crash as a result of the assertion, or something else? |
I've done a bit more debugging (after fixing the size_t issue) to see if I could isolate the crashing issue even more. Hopefully the attached files will assist with fixing the problem, although at this stage I'm not sure what the correct code fix should be if not the one I committed. It looks like on modern systems this issue will only occur if the program is running as non-root, libusb has been set to not use udev, and there is a security system in place such as grsecurity blocking access to /sys/bus/usb for non-root users. Also, if I allow libusb to use udev on the grsecurity system, nut can't find any usb devices when non-root user so I can only use the legacy usbfs. Running the nut driver as root works around the crash as it then has access to sysfs which passes in valid sysfs_dir entries (sysfs_scan_device) although that isn't as secure. Also, on my laptop, running non-grsecurity kernel with udev, linux_udev_scan_devices also passes in valid sysfs_dir entries. Only usbfs_scan_busdir passes in NULL for the sysfs_dir parameter as it doesn't have access to the proper name for the sysfs directory so that's why there is a potential crash later on. A small test program I wrote didn't crash though. So far I've only been able to get the crash to occur via the nut drivers. This is a custom debug log while running nutdrv_qx as the nut user with grsecurity kernel and then it crashes when I unbind the usb device from the ehci-pci driver: usbdebug_no_udev_as_nut_user_with_grsecurity_based_kernel_with_crash.txt This is a custom debug log while running nutdrv_qx as the root user with grsecurity kernel and no crash when I unbind the usb device from the ehci-pci driver: usbdebug_no_udev_as_root_with_grsecurity_based_kernel_no_crash.txt |
Hi, Ok, given the comment from 11 Dec 2015 I understand how this can happen (under a somewhat exotic set of circumstances) and avoiding a null-ptr deref is always good, so I've applied the patch. Regards, Hans |
This is a quick fix for a crash seen when a usb ups disconnects and reconnects while Network UPS Tools is monitoring it. Here is the valgrind output:
==2383== Process terminating with default action of signal 11 (SIGSEGV)
==2383== Access not within mapped region at address 0x0
==2383== at 0x4C2F341: strcmp (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==2383== by 0x561800B: linux_enumerate_device (linux_usbfs.c:1037)
==2383== by 0x5618CB8: linux_hotplug_enumerate (linux_usbfs.c:1113)
==2383== by 0x561A192: linux_netlink_read_message (linux_netlink.c:322)
==2383== by 0x561A724: linux_netlink_hotplug_poll (linux_netlink.c:366)
==2383== by 0x560D168: libusb_get_device_list (core.c:807)
==2383== by 0x4E383DF: usb_find_busses (in /lib64/libusb-0.1.so.4.4.4)
==2383== by 0x120D55: libusb_open (libusb.c:164)
==2383== by 0x11EE09: upsdrv_updateinfo (usbhid-ups.c:1355)
==2383== by 0x11D016: main (main.c:708)