Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

linux_get_parent_info: Check for NULL priv->sysfs_dir before strcmp #129

Closed
wants to merge 1 commit into from

Conversation

staples1347
Copy link
Contributor

This is a quick fix for a crash seen when a usb ups disconnects and reconnects while Network UPS Tools is monitoring it. Here is the valgrind output:
==2383== Process terminating with default action of signal 11 (SIGSEGV)
==2383== Access not within mapped region at address 0x0
==2383== at 0x4C2F341: strcmp (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==2383== by 0x561800B: linux_enumerate_device (linux_usbfs.c:1037)
==2383== by 0x5618CB8: linux_hotplug_enumerate (linux_usbfs.c:1113)
==2383== by 0x561A192: linux_netlink_read_message (linux_netlink.c:322)
==2383== by 0x561A724: linux_netlink_hotplug_poll (linux_netlink.c:366)
==2383== by 0x560D168: libusb_get_device_list (core.c:807)
==2383== by 0x4E383DF: usb_find_busses (in /lib64/libusb-0.1.so.4.4.4)
==2383== by 0x120D55: libusb_open (libusb.c:164)
==2383== by 0x11EE09: upsdrv_updateinfo (usbhid-ups.c:1355)
==2383== by 0x11D016: main (main.c:708)

@dickens
Copy link
Member

dickens commented Dec 2, 2015

Which version of libusb is this? The fact that linux_hotplug_enumerate() is
called with a NULL sys_name argument means that
linux_netlink_read_message() parsed a message where the DEVPATH key did not
contain a '/' character. If this was indeed a USB device message being
handled, this is unusual.

On Sun, Nov 29, 2015 at 5:45 PM, staples1347 notifications@github.com
wrote:

This is a quick fix for a crash seen when a usb ups disconnects and
reconnects while Network UPS Tools is monitoring it. Here is the valgrind
output:
==2383== Process terminating with default action of signal 11 (SIGSEGV)
==2383== Access not within mapped region at address 0x0
==2383== at 0x4C2F341: strcmp (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==2383== by 0x561800B: linux_enumerate_device (linux_usbfs.c:1037)
==2383== by 0x5618CB8: linux_hotplug_enumerate (linux_usbfs.c:1113)
==2383== by 0x561A192: linux_netlink_read_message (linux_netlink.c:322)
==2383== by 0x561A724: linux_netlink_hotplug_poll (linux_netlink.c:366)
==2383== by 0x560D168: libusb_get_device_list (core.c:807)
==2383== by 0x4E383DF: usb_find_busses (in /lib64/libusb-0.1.so.4.4.4)
==2383== by 0x120D55: libusb_open (libusb.c:164)
==2383== by 0x11EE09: upsdrv_updateinfo (usbhid-ups.c:1355)

==2383== by 0x11D016: main (main.c:708)

You can view, comment on, or merge this pull request online at:

#129
Commit Summary

  • linux_get_parent_info: Check for NULL priv->sysfs_dir before strcmp

File Changes

Patch Links:


Reply to this email directly or view it on GitHub
#129.

@staples1347
Copy link
Contributor Author

I've experienced the crash on libusb 1.0.19 and 1.0.20, but haven't tested on the latest master. Also, if it helps, the systems are all running on Gentoo with Network UPS Tools 2.7.2 or 2.7.3 and various linux kernels from 3.14 series as well as 4.1.6 from Gentoo's hardened-sources packages, and hardened gcc, and I've seen the crash with Socomec, Cyberpower, and APC USB UPSs. The crash occurs shortly after Linux re-detects the ups's usb interface.

Would it help if I post some debug output from a system?

@dickens
Copy link
Member

dickens commented Dec 3, 2015 via email

@staples1347
Copy link
Contributor Author

Okay, latest master appears to have the problem as well. I don't seem to get the crash when libusb debug is enabled though or if the nut driver is running in the forground which makes it seem like a timing issue, so I have added some custom debugging output to the linux_get_parent_info function with the NULL check fix included which hopefully makes sense. Also, on this system at least, it looks like the issue only occurs with a physical disconnect and reconnect, or if the ehci driver is unbound or rebound both of which trigger a fast disconnect and reconnect on the ohci driver. Unbind and bind for ohci doesn't trigger the bug. I am using echo > unbind and echo > bind to /sys/bus/pci/drivers/ehci-pci/ to to trigger the bug on this system at the moment. The debug output appears to be exactly the same every time I trigger the bug on this system at the moment with nutdrv_qx on nut-2.7.2 and a Socomec UPS and kernel 4.1.6).

This is the current lsusb list of devices on the system I am testing with at the moment:
Bus 011 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 010 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 006 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 005 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 009 Device 064: ID 0665:5161 Cypress Semiconductor USB to Serial (This is the ups interface)
Bus 009 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 008 Device 002: ID 04d9:1603 Holtek Semiconductor, Inc. Keyboard
Bus 008 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 007 Device 002: ID 04e8:6012 Samsung Electronics Co., Ltd
Bus 007 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

usbdebug.txt

@staples1347
Copy link
Contributor Author

Here are two files with normal libusb (master) debug output while running nutdrv_qx, and unbinding and binding, to show what the output is normally (I don't get the crash with normal libusb debug enabled). With the first log, I did unbind from ehci at around 15 seconds which triggered a fast disconnect and reconnect of the ups. With the second log, I did unbind from ohci at around 12 seconds, and then bind back to ohci at around 22 seconds. Both tests started with all the usb interfaces bound to their proper drivers.

libusb_standard_debug_nut_usbups_with_ehci_unbind.txt.gz
libusb_standard_debug_nut_usbups_with_ohci_unbind_bind.txt.gz

@dickens
Copy link
Member

dickens commented Dec 4, 2015

Can you try the attached patch against master (without your fix) and debug output turned off? Since this appears to be timing critical, the patch will attempt to quickly dump out the netlink messages whilst avoiding the overhead of the standard debug logging.
netlink_dump.txt

@staples1347
Copy link
Contributor Author

With that patch I got a buffer overflow, but was able to modify it to get suitable output:
nutdrv_qx startup:
remove@/devices/pci0000:00/0000:00:13.0/usb9/9-3/9-3:1.0/usbmisc/hiddev0
remove@/usbmisc
remove@/devices/pci0000:00/0000:00:13.0/usb9/9-3/9-3:1.0/0003:0665:5161.004F/hidraw/hidraw2
remove@/devices/pci0000:00/0000:00:13.0/usb9/9-3/9-3:1.0/0003:0665:5161.004F

then after "ehci-pci unbind", this:
remove@/devices/pci0000:00/0000:00:13.2/usb6/6-0:1.0
remove@/devices/pci0000:00/0000:00:13.2/usb6
remove@/devices/pci0000:00/0000:00:13.0/usb9/9-3/9-3:1.0
remove@/devices/pci0000:00/0000:00:13.0/usb9/9-3

then a few seconds later, this, and then the crash:
add@/devices/pci0000:00/0000:00:13.0/usb9/9-3

This is the patch I've used to create the above output: netlink_dump_2.txt

@dickens
Copy link
Member

dickens commented Dec 8, 2015

I don't see why the original patch would have run into a buffer overflow, but here's another to try that is simpler and more paranoid. Your version of the patch doesn't capture the required information. The netlink messages consist of a series of NULL-terminated strings, and your patch will only print the first line. That is why my patch seeks out the NULL bytes and replaces them with linefeed characters.

netlink_dump_3.txt

@staples1347
Copy link
Contributor Author

Ah I see now. It looks like during the process of trying to solve this issue, we've uncovered another bug which is: because len is type: size_t, len < 32 doesn't evaluate -1 as true. The if test at the top of linux_netlink_read_message should be changed to something like if (len < 32 || len==-1) to catch all recvmsg errors. Alternatively, len should be changed to type: ssize_t as that's the type that recvmsg returns according to my man page and then -1 will evaluate as true for test: len < 32. This doesn't fix the crash when removing the usb device though so I've attached the output from your netlink_dump_3 patch up until the crash with some comments to indicate when the output occurred.
usbdebug.txt

@dickens
Copy link
Member

dickens commented Dec 10, 2015

Yes, I noticed the size_t issue, which is why I added the assertion statement. That was the only thing I could think of that would have caused a buffer overflow in my original patch. Does your application crash as a result of the assertion, or something else?

@staples1347
Copy link
Contributor Author

I've done a bit more debugging (after fixing the size_t issue) to see if I could isolate the crashing issue even more. Hopefully the attached files will assist with fixing the problem, although at this stage I'm not sure what the correct code fix should be if not the one I committed.

It looks like on modern systems this issue will only occur if the program is running as non-root, libusb has been set to not use udev, and there is a security system in place such as grsecurity blocking access to /sys/bus/usb for non-root users. Also, if I allow libusb to use udev on the grsecurity system, nut can't find any usb devices when non-root user so I can only use the legacy usbfs. Running the nut driver as root works around the crash as it then has access to sysfs which passes in valid sysfs_dir entries (sysfs_scan_device) although that isn't as secure. Also, on my laptop, running non-grsecurity kernel with udev, linux_udev_scan_devices also passes in valid sysfs_dir entries. Only usbfs_scan_busdir passes in NULL for the sysfs_dir parameter as it doesn't have access to the proper name for the sysfs directory so that's why there is a potential crash later on. A small test program I wrote didn't crash though. So far I've only been able to get the crash to occur via the nut drivers.

This is a custom debug log while running nutdrv_qx as the nut user with grsecurity kernel and then it crashes when I unbind the usb device from the ehci-pci driver: usbdebug_no_udev_as_nut_user_with_grsecurity_based_kernel_with_crash.txt

This is a custom debug log while running nutdrv_qx as the root user with grsecurity kernel and no crash when I unbind the usb device from the ehci-pci driver: usbdebug_no_udev_as_root_with_grsecurity_based_kernel_no_crash.txt

@hjelmn hjelmn added the linux label Mar 6, 2016
@jwrdegoede
Copy link
Contributor

Hi,

Ok, given the comment from 11 Dec 2015 I understand how this can happen (under a somewhat exotic set of circumstances) and avoiding a null-ptr deref is always good, so I've applied the patch.

Regards,

Hans

@jwrdegoede jwrdegoede closed this Nov 28, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants