CI: [TEST ONLY] Run macOS with debug level 3 to investigate stress_mt failure #1358
base: master
I have been trying to reproduce the stress_mt failure locally on an Intel MBP but to no avail. |
Force-pushed from 9e174b5 to 8272de2 |
@hjelmn Any idea why we see this intermittently on the macOS CI host? |
As a good heisenbug the stress_mt failure will likely not happen now...
Force-pushed from 8272de2 to 82fce0e |
Hm, so looking at those "another process has device opened for exclusive access" messages, does that mean that on macOS it's not possible to libusb_open the same device multiple times? Is this a bug in the backend or a general limitation of the OS? |
If it's the latter, one thing that could be done is to use reference counting in the backend's device wrapper. But first it would be good to understand where this limitation lies and why you can't reproduce it locally. Maybe different OS versions? Or simply different system load; e.g. it's worth trying to increase the number of threads as well as iterations locally to see if it reproduces under heavier contention. |
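To make the reference-counting suggestion concrete, here is a minimal sketch. It is not code from libusb's macOS backend: `mock_os_device`, `shared_device`, `shared_open`, `shared_close` and `run_stress` are all hypothetical names, and the "OS" is a mock that refuses a second exclusive open. It only illustrates the idea that the wrapper performs one OS-level open and hands out further logical opens by bumping a counter under a lock.

```c
#include <pthread.h>

/* Hypothetical stand-in for an OS device that allows only one opener. */
struct mock_os_device {
    int exclusive_opens; /* currently held OS-level opens, must stay <= 1 */
    int total_os_opens;  /* how many times the OS-level open was performed */
};

/* Hypothetical wrapper sharing one OS-level open among many logical opens. */
struct shared_device {
    pthread_mutex_t lock;
    int refcount;
    struct mock_os_device os;
};

/* Mock of an exclusive open: fails if someone already holds the device. */
static int os_open(struct mock_os_device *d)
{
    if (d->exclusive_opens > 0)
        return -1; /* "another process has device opened for exclusive access" */
    d->exclusive_opens++;
    d->total_os_opens++;
    return 0;
}

static void os_close(struct mock_os_device *d)
{
    d->exclusive_opens--;
}

/* Only the first logical open touches the OS; later ones bump the count. */
static int shared_open(struct shared_device *s)
{
    int rc = 0;
    pthread_mutex_lock(&s->lock);
    if (s->refcount == 0)
        rc = os_open(&s->os);
    if (rc == 0)
        s->refcount++;
    pthread_mutex_unlock(&s->lock);
    return rc;
}

/* Only the last logical close releases the OS-level open. */
static void shared_close(struct shared_device *s)
{
    pthread_mutex_lock(&s->lock);
    if (s->refcount > 0 && --s->refcount == 0)
        os_close(&s->os);
    pthread_mutex_unlock(&s->lock);
}

#define NUM_THREADS 8
#define ITERATIONS 1000

struct worker_arg {
    struct shared_device *dev;
    int failures;
};

/* Each worker repeatedly opens and closes the same shared device. */
static void *worker(void *p)
{
    struct worker_arg *a = p;
    for (int i = 0; i < ITERATIONS; i++) {
        if (shared_open(a->dev) != 0) {
            a->failures++;
            continue;
        }
        shared_close(a->dev);
    }
    return NULL;
}

/* Runs all workers and returns the total number of failed opens. */
static int run_stress(struct shared_device *dev)
{
    pthread_t threads[NUM_THREADS];
    struct worker_arg args[NUM_THREADS];
    int failures = 0;

    for (int i = 0; i < NUM_THREADS; i++) {
        args[i].dev = dev;
        args[i].failures = 0;
        pthread_create(&threads[i], NULL, worker, &args[i]);
    }
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_join(threads[i], NULL);
        failures += args[i].failures;
    }
    return failures;
}
```

Calling `run_stress` on a `shared_device` whose lock is initialized with `PTHREAD_MUTEX_INITIALIZER` and whose counters start at zero should report no failed opens, because the exclusive open is never attempted while another logical open is outstanding. Raising `NUM_THREADS` and `ITERATIONS` is the same lever as the "heavier contention" suggestion above. Whether the real backend can share a handle this way depends on where the exclusive-access limitation actually lies, which is still the open question.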
I'll probably leave further investigation to the macOS backend maintainers. But this looks like exactly the kind of issue with concurrent transfers I was hoping the updated test would help reveal, as it did for me in the WebUSB backend as well. |
The latest run's failure is a bit strange.
|
No issues here with git HEAD on my Mac Mini M1 running macOS 14.1.
|
Yes, the last run (with debug level 4) is different. There are some failing opens, but thread "0" also didn't find any devices. Naturally the full debug log is enormous (15 MB), so I copy in just the first lines of the stress_mt part here, with the 5 devices detected.
And here is the single "error" level message:
|
The CI is running macOS 12.7.1. I have been trying to reproduce on 13.6. Unless we have reason to believe this is a recent regression, I don't think we should block 1.0.27 on this issue. Especially if no one has seen it before or is able to reproduce it outside the CI. |
This reverts commit 31e889a.
The CI run on macOS 13.6 succeeded both with debug level 3 and 4. |
Does anyone around have macOS 12 so that they can try to reproduce the failure on it? |
I just sent an email to the libusb-devel mailing list to seek help from the community. |
We might wanna bump the CI to macOS 13 (although considered beta there) so it doesn't distract us so much. |
I can try to reproduce inside a VM and see if that makes a difference. I keep an Intel machine around for this reason. It will take a bit to get it back up and running. I can also try to reproduce in a VM on an M3 machine. |
Tried to reproduce on 11.6.1 but no error. |
The latest CI run timed out after 6 hours:
Was it just very slow, or is an infinite loop possible? |
Possibly unrelated, but there are also reports of a deadlock on Windows in #1376 (comment) |
FYI, at my work I have a build bot for every macOS release from 10.10 onwards. Maybe this is due to a threading bug. Could be related to: |