Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HPE team - found a stack override problem on libusb-win32 #19

Closed
JamesHuangHPE opened this issue Aug 11, 2021 · 29 comments
Closed

HPE team - found a stack override problem on libusb-win32 #19

JamesHuangHPE opened this issue Aug 11, 2021 · 29 comments
Labels

Comments

@JamesHuangHPE
Copy link

JamesHuangHPE commented Aug 11, 2021

Hi Peter,

This is James from HPE, we recently found a stack override problem on the libusb-win32 driver.
https://sourceforge.net/projects/libusb-win32/

Please review below for details and let us know your opinion and suggestions.
It would be great if you can help us to fix this issue.
I have sent you an email around 8/5.

The problem is the irp->userIosb ptr member which points to local io_status which is allocated locally from the stack.

The kernel updates the 2 qword io_status which is originally on the stack of call_usbd_ex but sometimes the kernel stomps the stack of another routine if call_usbd_ex has already returned when the update happens.

Here’s a lot more detail from libusb_driver.c:

NTSTATUS call_usbd_ex(libusb_device_t *dev, void *urb, ULONG control_code,      <-- see note1
                                    int timeout, int max_timeout)
{
       KEVENT event;
       NTSTATUS status;
       IRP *irp;
       IO_STACK_LOCATION *next_irp_stack;
       LARGE_INTEGER _timeout;
       IO_STATUS_BLOCK io_status;                                                 <-- see note 2
 
       if (max_timeout > 0 && timeout > max_timeout)
       {
              timeout = max_timeout;
       }
       if (timeout <= 0)
              timeout = LIBUSB_MAX_CONTROL_TRANSFER_TIMEOUT;
 
       KeInitializeEvent(&event, NotificationEvent, FALSE);
 
       irp = IoBuildDeviceIoControlRequest(control_code, dev->target_device,     \
              NULL, 0, NULL, 0, TRUE,                                             > see note 3
              NULL, &io_status);                                                 /
 
       if (!irp)
       {
              return STATUS_NO_MEMORY;
       }
 
       next_irp_stack = IoGetNextIrpStackLocation(irp);
       next_irp_stack->Parameters.Others.Argument1 = urb;
       next_irp_stack->Parameters.Others.Argument2 = NULL;
 
       IoSetCompletionRoutine(irp, on_usbd_complete, &event, TRUE, TRUE, TRUE);
 
       status = IoCallDriver(dev->target_device, irp);                          \_ returns STATUS_PENDING so need to wait for irp completion
       if(status == STATUS_PENDING)                                             /
       {
              _timeout.QuadPart = -(timeout * 10000);
 
              if(KeWaitForSingleObject(&event, Executive, KernelMode,            \_ wait times-out after specified 500ms  --  see note 4
                     FALSE, &_timeout) == STATUS_TIMEOUT)                        /
              {
                     USBERR0("request timed out\n");
                     IoCancelIrp(irp);                                           <-- attempt to cancel the irp because of the time-out
              }
       }
 
       /* wait until completion routine is called */
       KeWaitForSingleObject(&event, Executive, KernelMode, FALSE, NULL);       <-- wait (with infinite time-out) for completion
 
       status = irp->IoStatus.Status;                                           <-- see note 5
 
       /* complete the request */
       IoCompleteRequest(irp, IO_NO_INCREMENT);                                 <-- see note 6
       
       USBDBG("status = %08Xh\n",status);
       return status;                                                           <-- see note 7
}

Note 1:
The call_usbd_ex() routine executes all the time but precisely every 60sec there is a unique usb request that always times-out after 500ms and fails:
2: kd> dt _URB ffff868d`0be18570 UrbControlVendorClassRequest.Value
USBPORT!_URB
+0x000 UrbControlVendorClassRequest :
+0x082 Value : 0x3fd <-- 0x3fd = 1021 is a unique usb vendor request code that always times-out
Eaton should be able to tell us what this 0x3fd (1021) request is, why it gets sent every 60sec, and what might cause it to always time-out.
We should check a “working” system to see if this request is being sent and if it always time-out and fails.
If this request didn’t get sent or if it didn’t time-out and fail then we would not have a problem.

Note 2:
The 2 qword IO_STATUS_BLOCK io_status is allocated as a local on the stack of the current routine:
2: kd> dt IO_STATUS_BLOCK Status Information
libusb0!IO_STATUS_BLOCK
+0x000 Status : Int4B <-- low dword of first qword will be updated with ntstatus 0xc0000120 (The I/O request was canceled)
+0x008 Information : Uint8B <-- second qword will be updated with the byte count of the transfer which will be 0x00000000’00000000
This is what gets updated or stomped on the stack and the results are:

  • If the writes happen before this routine returns then this is just a normal status block update
  • If the writes happen after this routine returns then this stomps the stack possibly of some other routine and timing / stack usage determines if it crashes

Note 3:
The irp is created here. Notice that the &io_status ptr is passed as an arg and it becomes the irp member UserIosb ptr:
2: kd> dt nt!_IRP IoStatus UserIosb
+0x030 IoStatus : _IO_STATUS_BLOCK
+0x048 UserIosb : Ptr64 _IO_STATUS_BLOCK <-- ptr to local io_status allocated on this routine’s stack
The irp that’s created has the following flags:
2: kd> dt nt!_IRP 0xffffe486`c1d56990 Flags
+0x010 Flags : 0x40060000 <-- 0x40000000 => FILE_FLAG_OVERLAPPED => it can be asynchronous (or synchronous)
When the kernel is later called to complete the request, it copies IoStatus into the ptr contents of *UserIosb (ie. io_status) which is the final status info for the irp:

  • This is what can stomp the stack of another routine depending upon when it happens
  • ie. it might be a synchronous or asynchronous completion
  • if it is async then the stack will get stomped later but there will only be a crash if another routine is actively using those stack addrs at that time
    Either of the following will work-around the problem:
  • If io_status is defined as a global (instead of as a local on the stack) then the stack will not get stomped
  • If io_status is replaced with a NULL then the kernel will not update user status (*UserIosb) and the stack will not get stomped
    o Note: I don’t think it is documented that this ptr can be a NULL ptr but I verified that this works (using kd to hack the assembly code)

Note 4:
This times-out after a specified 500ms wait/time-out (ie. the request/irp event does not get signaled within 500ms).
If I hack the assembly with kd to make it an infinite wait/time-out then it works:

  • This means that this usb request (ie. the specific one that happens every 60sec) is taking far too long
  • Eaton should be able to tell us what this vendor _URB request (value 0x3fd / 1021) is and why it takes so long
  • We should also check a “working” system to see if the same request is being sent every 60sec and if it also times-out
  • ie. if this request is not being sent and/or not timing-out (or we can stop it) then the system will not fail

Note 5:
The “status = irp->IoStatus.Status” should really be “status = irp->UserIosb->Status” (where the final status is copied into) but this doesn’t cause a problem.
It does mean that irp->UserIosb is not being used here.

Note 6:
This call (IoCompleteRequest) tells the kernel to finish and do clean-up of the irp. This includes the kernel executing nt!IopCompleteRequest to copy IoStatus into *UserIosb:

  • this can happen synchronously by direct calls (w/o an APC int) as shown in one of the callstacks earlier in this email
  • this can also happen asynchronously sometime later via an APC int as shown in another of the callstacks earlier in this email

Note 7:
The call_usbd_ex routine returns here and the locals on its stack should not be accessed from this point onwards:

  • if the irp was handled synchronously then io_status on the current stack was previously written and will not be accessed again => no problem
  • if the irp is being handled asynchronously then io_status on the stack might be written sometime after the return (on some other routine’s stack):
    o if the APC int and the write of io_status happens before the penter() check then the stack is stomped w/o problems because the addr is not yet in use
    o if the APC int and the write of io_status happens after the penter() check then the stack is stomped w/o problems if the addr is not currently used
    o if the APC int and the write of io_status happens near the penter() check then the args being passed up the stack during that time window get stomped
     this last case is where we see 1 or 2 stomped args on the stack causing a penter trap or crash
     ie. stack addr of old qword io_status.Information stomped w/ 0 and old dword io_status.Status stomped w/ ntstatus 0xc0000120

Best Regards,
James

@mcuee
Copy link
Owner

mcuee commented Aug 13, 2021

Sorry but I do not see any emails in libusb-win32 mailing list. You need to subscribe to the mailing list in order to post.

@mcuee
Copy link
Owner

mcuee commented Aug 13, 2021

@dontech Just wondering if you have some time to take a look whether this is a real issue or not.

@mcuee
Copy link
Owner

mcuee commented Aug 13, 2021

@JamesHuangHPE
Have you tried out the latest snapshot version here? Take note it is unsigned version so you may have to use a test machine and enable test signing.
https://sourceforge.net/projects/libusb-win32/files/libusb-win32-snapshots/20190918/
https://docs.microsoft.com/en-us/windows-hardware/drivers/install/the-testsigning-boot-configuration-option

Just wondering if it is possible for you to change to libusbk driver. It is difficult for use to support libusb-win32 driver now.

@mcuee
Copy link
Owner

mcuee commented Aug 13, 2021

Even for libusbk driver, we are shipping the old version because of the new Windows 10 kernel driver requirements. We have switched the focus on the libusbk.dll and not updating the libusbk.sys. WinUSB driver should be the way to go.

@JamesHuangHPE
Copy link
Author

Hi mucee,

I email to this address before ==> pdt@dontech.dk

James

@JamesHuangHPE
Copy link
Author

@JamesHuangHPE
Have you tried out the latest snapshot version here? Take note it is unsigned version so you may have to use a test machine and enable test signing.
https://sourceforge.net/projects/libusb-win32/files/libusb-win32-snapshots/20190918/
https://docs.microsoft.com/en-us/windows-hardware/drivers/install/the-testsigning-boot-configuration-option

Just wondering if it is possible for you to change to libusbk driver. It is difficult for use to support libusb-win32 driver now.

The issue also existing in latest version.

@mcuee
Copy link
Owner

mcuee commented Aug 13, 2021

I see. In that case, let see whether @dontech can come and comment.

@mcuee
Copy link
Owner

mcuee commented Aug 13, 2021

@JamesHuangHPE I will strongly encourage you to try out libusbk driver to see if that helps. libusbk.sys is almost a drop in replacement for libusb0.sys for most of the devices. You do not need to change your application as libusb0.dll is still working with libusbk.sys.

You can use the libusbk-inf-wizard to replace the driver.

@mcuee
Copy link
Owner

mcuee commented Aug 13, 2021

Ref: libusbk.sys vs libusb0.sys vs WinUSB
http://libusbk.sourceforge.net/UsbK3/usbk_comparisons.html

@dontech
Copy link
Collaborator

dontech commented Aug 15, 2021 via email

@mcuee mcuee added the bug label Aug 16, 2021
@mcuee
Copy link
Owner

mcuee commented Aug 16, 2021

@TravisRo If you have some time, please take a look as well.

@dontech
Copy link
Collaborator

dontech commented Sep 20, 2021

Any update on this?

James, did you try the code i posted?

@JamesHuangHPE
Copy link
Author

Hi Dontech,

Where did you post the code?
Is there any link?
Thank you.

James

@dontech
Copy link
Collaborator

dontech commented Sep 21, 2021

Hi James,

When I meant "the code" I meant the suggestion above which I wrote on Aug 15.

I will need to modify the driver to try it, but since you already have done that before, it should not be a problem.

Thanks,

/pedro

@JamesHuangHPE
Copy link
Author

hi Pedro,

Could you please modify the driver and provide it to us for testing?
I did not do the modification before, it was done by other engineer before.
Thank you.

James

@dontech
Copy link
Collaborator

dontech commented Sep 24, 2021

Hi James,

I am not sure when or if i have the time for this. Furthermore, i am not sure i can reproduce the problem here.

I will see what i can do, but i cannot guarantee when i will have time for this.

If you need it done within a reasonable time-frame, i would suggest you allocate some resources for it.

Thanks,

/pedro

@dontech
Copy link
Collaborator

dontech commented Sep 26, 2021

Hello again James.

OK i found a little time for this, and i think the fix works:

f41b963

@dontech
Copy link
Collaborator

dontech commented Sep 26, 2021

Hi James,

  1. Please review the changes i have made thoroughly.

  2. Please try these binaries. You will need to disable driver signing requirements or re-sign them for testing (using your own inf).

libusb_fix_stack_smash.zip

Thanks,

/pedro

@dontech
Copy link
Collaborator

dontech commented Sep 28, 2021

James, please update this.

@JamesHuangHPE
Copy link
Author

Hi Peter,

The HPE SW expert is reviewing the modification.
should have some update later.
Thank you.

James

@dontech
Copy link
Collaborator

dontech commented Oct 16, 2021

Hi James,

Any update on this? Does the fix work?

@JamesHuangHPE
Copy link
Author

Hi Peter,

Sorry for the late reply, we are seeking some other way to fix this issue.
I will provide more response once the solution is confirmed.
Thank you for your support.

Best Regards,
James

@dontech
Copy link
Collaborator

dontech commented Oct 18, 2021

OK, but any update on why this does not work would be appreciated.

@dontech
Copy link
Collaborator

dontech commented Oct 25, 2021

@mcuee
Copy link
Owner

mcuee commented Oct 25, 2021

@JamesHuangHPE
Please try the latest snapshot and check whether this issue has been fixed. Thanks.

@dontech
Copy link
Collaborator

dontech commented Nov 13, 2021

James, could you/HPE report back on this?

Lets get this one closed, so we are sure we have it right for the next release.

@JamesHuangHPE
Copy link
Author

Hi Peter,

Sorry for the late response.
We found other solution on the application software, fix the URB request timeout problem to cause the USB driver enter error path to cause this issue.
Thank you for you update, and please close this case.

Best Regards,
James

@dontech
Copy link
Collaborator

dontech commented Nov 15, 2021

OK The fix is still valid however, as the driver is now more safe from BSODs.

Thanks,

/pedro

@JamesHuangHPE
Copy link
Author

Hi Peter,

Thank you for all your response and driver updated.
Really appreciate your support.

Best Regards,
James

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants