-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Missing data for In transfer of size >65536 bytes #45
Comments
Does this occure if you do not use libusb-1.0 API, rather using libusb-win32 API or libusbK API? As far as I know, there is no such issue if I use libusbK API (using kBench) and libusb0.sys. Can you post the libusb-1.0 debug log (and probably also libusb0.sys debug log if you can). |
I didn't try without libusb-1.0, I only tried with different drivers, but the same top-level components. I will try to post some debug logs, yes, OK. |
What is the trick for getting some output from libsub0.sys by the way? It's quite frustrating that I cannot get a single line of output. On the opposite side of the spectrum, libsusb-1.0 with debug log level is so verbose that it slow down my application to the point where I get timeouts I don't usually have. What is your technique for debugging libusb0.sys? |
Here is some libusb-1.0 debug logs.
|
Hi @sonatique, thanks for the report. It would seem that the problem can be in many places. libusb-1.0 DLL calls libusbk DLL to access libusb0, so there are a few places this can go bad (apart from the device itself). WinUSB is supported directly in libusb-1.0, so the problem could lie in libusbk. After that as @mcuee stated, you could try porting the same program to libusb-0.1. The problem is still interesting to us, as running through libusb-1.0 is not an unlikely scenario. |
@dontech , thank you for your interest in my issue. Given the comment in transfer.c about some limit involving 65536, but given that the code below doesn't use the value 65536 at all, I have the feeling that something may be "out of sync" around here, maybe insufficiently tested. I'll be happy to try to reproduce by writing a libsub-0.1 program, but as said, not easy with my device, it would require porting a lot of code. And let's say that I do, and that I share the program with you: you would probably still be unable to reproduce, given you don't have my device. We should share a common device, firmware or something, without it it seems tedious. So this makes me want to ask these questions:
I am willing to help, but the curve to enter seems steep. Thank you for any information you may provide to me. |
First of all thanks for spending time on this! :-)
Great work. This means the problem is either in libusbK DLL when it calls libusb0.sys or in libusb0.sys itself. Let me see if I can reproduce this with one of my HS controllers (e.g. an STM32 controller or similar).
Yeah probably both. The limits could very well be the problem, as they are almost never tested. Also, I do not think this was the focus previously, as very few people transfer that much in each packet. But as data-rates go up, this becomes normal.
Probably I do not need your device. If the problem is in the library/driver, a loopback device in high speed should be fine.
There is currently no support for unit-testing or similar. Only on-device testing. WDM drivers are not typically written with unit-testing in mind, as the WDM system is from a time before that was a thing.
DebugView from microsoft should spit out debug messages if you use the debug build. Please file a report if that does not work. https://learn.microsoft.com/en-us/sysinternals/downloads/debugview
A legacy controller It was hot when the library was written. Personally i use STM32s for most of my testing, and embedded linux in host and device modes (gadget mode).
I would get an STM32F, or STM32H as they are cheap and accessible and have HS parts (most micros are FS only). But there are many brands and names that would work. For testing you typically only need the basic features and a simple loopback program running on the device.
That should work out-of-the-box. Install Visual Studio 2019, and simply "build". If that does not work, file a bug report here on github.
That should also work out-of-the-box if you use the right EWDK version 22000.
https://learn.microsoft.com/en-us/windows-hardware/drivers/other-wdk-downloads If that or any other make target does not work, file a bug report here on github.
I think I should probably update the README on how to build this... :-). It should not be that steep. Thanks for helping, /pedro |
@sonatique, please state what version of libusb-1.0 libraries you are using |
Yes STM32F/STM32H is good. Another good option is EZ-USB FX2LP (high speed, very cheap boards from online shops) and EZ-USB FX3 (superspeed). Reference discussion here: you can see that I have many boards. EZ-USB FX3 libusbK benchmark testing FW and test results. You can either use Cypress' original FW, or the modified FW by Travis. |
@sonatique Also please post the skeleton libusb-1.0 code you are using to write/read from/to the device. From your description you are using the async API, but please post some details. Also a dump of the descriptors on your device would be helpful. E.g. the size of the physical endpoints would be great. |
…B data loss with libusb0.sys v1.2.7.3 when transfer IN buffer size is larger than 65536. The idea is to write 64bit incrementing numbers, by bunch of 1000 (i.e. 8000 bytes write transfer, value chosen to ensure short packets) to loopback device, and concurrently read from loopback using a buffer of a given size, and compare received value with an incrementing counter. If received value doesn't match expected value, it indicates some data has been lost. I can see no loss data for IN transfer buffer size <= 65536 and lost data with read buffer size > 65536 Steps to reproduce issue "mcuee/libusb-win32#45" with CYUSBKIT-003 EZ-USB FX3 dev board and FX3 SDK demo firmware "dma_examples\cyfxbulklpauto" (binary file "USBBulkLoopAuto.img" attached here for simplification, source code available as part of Cypress/Infineon FX3 SDK) 1) Upload USBBulkLoopAuto.img to EZ-USB FX3 RAM (for instance using Infineon USB Control Center) 2) Use Zadig Version 2.8 to force libusb-win32 (v1.2.7.3) driver for "FX3" device 3) Compile and run loopbacktest program: everything should work (no "Integrity error!" message, only printf of current received data) 4) Stop running program, change source code line 86 from "static uint8_t buf[OK_ReadBufferSize];" to "static uint8_t buf[NotOK_ReadBufferSize];", recompile, run. 5) Program will promptly stop with "Integrity error!" message. From expected and received value we can easily see that some data has been lost. 6) Use Zadig to force libusbK device, run same program again: no issues. Note 1: I verified using a USB physical traffic sniffer that IN and OUT data present on the bus match expectation in any cases (traces available on demand) Note 2: Any loopback capable USB device should work for reproducing the issue
@dontech : sorry for the large delay before replying, but I am now back at work on this issue, I made a libusb-1.0 based program that can easily reproduce the issue using a device running a loopback firmware (I used a EZ-USB FX3 with default SDK "auto" loopback firmware). As said in this commit comments: The idea is to write 64bit incrementing numbers, by bunch of 1000 (i.e. 8000 bytes write transfer, value chosen to ensure short packets) to loopback device, and concurrently read from loopback device using a buffer of a given size, and compare received value with an incrementing counter. If received value doesn't match expected value, it indicates some data has been lost. I can see no loss data for IN transfer buffer size <= 65536 and lost data with read buffer size > 65536 Steps to reproduce issue with CYUSBKIT-003 EZ-USB FX3 dev board and FX3 SDK demo firmware "dma_examples\cyfxbulklpauto" (binary file "USBBulkLoopAuto.img" put in he commit for simplification, source code available as part of Cypress/Infineon FX3 SDK)
Note 1: I verified using a USB physical traffic sniffer that IN and OUT data present on the bus match expectation in any cases (traces available on demand) Note 2: Any loopback capable USB device should work for reproducing the issue Note 3: When I first see the bug I was using libusb-win32 v1.3.0.0, but for ease of testing I used latest Zadig which only carries 1.2.7.3. Issue is there in both cases. I hope that will help finding and solving the issue, thank you very much in advance for your work! |
OK @sonatique i reproduced the problem with your example, and with a similar libusb0 example. I have located the source of the problem. The problem is that the maximum size for a transfer to the core USB driver that libusb0 sends IRPs to is 64K. At least in Windows 10. Not sure what the limit was in previous Window versions. In the driver there are 2 paths. Small ones (below 64K) and "transfer", and large ones are "large transfer". Not sure why its made like this, as it would probably have been better just to have 1 path, so it got tested more. For large transfers, it splits the transfers in sub transfers and sends all the transfers to the driver below. This is where there is a problem. If the first transfer is "short", meaning that the entire transfer window is not used, the remaining transfers are cancelled. This is where it gets nasty. The "larger transfer" code is inventive, but has some problems, and some bugs. There are 2 problems/bugs here:
Need to do some more testing and fixing to get this under control. @sonatique: As a side-note, I can mention that I do not think going above 64K actually makes sense for an application. 1024*48 = 49152 bytes. So going above 64K should not yield any significant advantage. In your application, do you get more bandwidth by going above 64K? This would be interresting to know. |
@dontech Thansks a lot for having looked at this, for your detailed explanations, and in advance for your efforts at fixing these problems. To answer your question: actually with my application, I do not get an higher throughput if I, for instance, use 128K instead of 64K, I can reach my maximum (180MBytes/s) in both cases. However, when doubling the buffer size, I divide the CPU consumption by about 2, and this is very interesting for my needs. By increasing the buffer size even more (I found that 2MBytes was a nice value) I can make CPU consumption of the USB acquisition part going to something negligible, for the same maximal rate of 180MBytes/s |
OK I have made a conclusion. The entire "large_transfers" code is a no-go.
The problem is that the behavior for IoCancelIrp() for USB transfers is not well defined. A) In some cases the Cancel function ends up cancelling pending packets which have not started. OK. B) In some cases the Cancel causes the driver below to abort in the middle of an operation where data fetching has already started. This means the device sees this as sent, but the driver now throws it away. This will never work. I have made a patch that simply removes "large_transfers" support completely, as I am sure it is fundamentally broken. |
@mcuee @tormodvolden @TravisRo do you have any comments on this? |
Now, after this we need to find out why you have high CPU usage. My guess would be that for some reason your application IOCTL requests from your app is more expensive than the IOCTL from the driver to the core USB stack. IOCTL from applications in general to the kernel require more than intra-kernel IOCTLs I guess I have some questions: A) What is your endpoint parameters? Could you dump your USB descriptor? B) Is the CPU load smaller with libusbk ? |
@dontech thanks for your question. B) Quick answer: no. libusbK0.sys: WinUSB.sys: |
Wow those are really great numbers. Thanks. I should probably study the libusbk code to see how they solve the problem of large transfers. |
OK i figured it out. The difference is that libusbk queues all the writes, and then fires them serially after each other, until it gets a short packet. This will work. We could change libusb0 to do the same, but then i need to change quite a bit in the driver transfer code.... |
Minor thing -- you can build Zadig to use libusb-win32 1.3.0.1. Or you can use the binary I built. There is a driver display issue though, probably a libwdi bug. |
It is interesting to know that libusb-win32 has issues to deal with large transfer. Normally I use libusbK kBench program to carry out the test using different drivers (libusbK.sys, libusb0.sys and WinUSB) under Windows. Or libusbdotnet BenchmarkCon application under macOS (it also works under Windows but not as good as kBench). It does not seem to work with isochronous transfer though and it has not been updated for a long time. I have not encountered such issues for USB Bulk transfer -- maybe the test program can not catch the issue. |
@dontech |
@sonatique I think I have fixed the problem.
|
@sonatique: I made this for you to test with: https://github.com/mcuee/libusb-win32/releases/tag/test_release Please make sure to enable test signing before using it, or to load it up with zadig. Please report back if this works for you, and what performance you get. It should be good! |
Not any more... :-) |
@dontech : hum: how do you tell Zadig to install driver files from a given folder? I can not figure that out, I can only select one from the built-in list. |
This of course only works if you remember to disable driver signing: As admin: bcdedit.exe -set loadoptions DISABLE_INTEGRITY_CHECKS Looking forward to your results |
@dontech and @sonatique It seems to me that @dontech has signed properly the libwdi 1.5.0 source code: Just modify the configure file to point to the right locations.
|
@mcuee : thanks I will try. |
No. I only signed it with my signature. So you need to turn on test signing. Submitting to microsoft is a pain that i try to avoid unless necessary. The rest of what @mcuee said is correct. But turn ON testsigning. |
@dontech : I just tested your correct libusb0.sys for performance: results are equal to what I had with the bugged version, that's very good! From my measurement it appears that libusb0.sys seems to be the most efficient driver amongst the 3, excellent. Just to be sure I will test for data integrity on Monday, and let you know, but I assume it will be OK. That's great, thanks a lot for your work! |
@sonatique and @dontech I did some tests with libusbK kBench as well last time and indeed libusb0.sys is the most efficient. I assume it is because it is based on WDM and not KMDF (WinUSB.sys and libusbk.sys are based on KMDF). cyusb3.sys is also based on KMDF but it has the same performance as libusb0.sys (using Cypress's own host application) Reference discussion in OSR ntdev
BTW, I believe kBench (using libusbK API) is more efficient than libusb-1.0 API. |
@mcuee: libusb0 might be even faster now, to take the lead. The reason is that previously each bulk transfer call would result in one transfer in the driver below. Now, with the latest changes i made, even smaller transfers can be split in several USBD calls. So e.g.: before: 1 transfer of 1MB of data on an 4KB endpoint would result in 1 USBD transfer of 4KB = 4KB Each IOCTL has an overhead on CPU and bandwidth, so cutting them down is more efficient. |
Excellent. It also turned out the code was easier than i thought. Re-using the IRPs and URBs gave it the last boost. Now we just need someone to review. |
@dontech : I just test with the simple loopback test I provided on December 7 and I can no longer see any "data corruption", that is great, thanks a lot! However: I then tried to increase buffer size even more, just for fun (since you claimed we could go up to 4GBytes ;-) ) I have no idea where this comes from. Did you try on your side with 64 MBytes or more? |
OK that is probably a separate issues. The number 4GB was supposed to be 2GB, which is the limit imposed by the IOCTL from user-space to the driver. I never tried it, just looked at the possible values. Why 64MB is the limit before a crash is beyond me, but lets take this is a separate bug-report. |
Verified closed by reporter. |
Hello,
I discovered by accident that as soon as I want to use transfer size > 65536, for inbound transfers, I start randomly missing some pieces of data, even if the actual length of transferred data is well beyond 65536.
I am only using bulk In transfer on a USB2.0 High Speed link/device.
For instance: I am using libusb-1.0 with libusb0.sys driver (and therefore libusbK.dll in the middle).
If I use 65536 or less, everything is fine. If I use 65536+512, or any value larger than 66536 + N * 512, I get the missing data issue.
If I use Zadig to switch to libusbK.sys or WinUSB.sys, I get no issues at all, for larger or smaller than 65536 transfer buffer size.
I can see from the comment at line 352 of transfer.c that "each bulk/interrupt irp/urb pair can request at the most 65536 bytes" but I don't see how this value is actually used in the code, I find no other occurrence of this value. Instead I can see that "maxTransferSize" is used somehow, but I do not expect maxTransferSize to be equal to 65536 on my system (Windows 10).
And even if maxTransferSize was equal to 65536, I would expect the code to work, but it doesn't seem it does correctly.
Is this a known issue / limitation?
The text was updated successfully, but these errors were encountered: