platform/surface: aggregator: Defer probing when serdev is not ready #152
Thanks for the PR (and especially the breakdown of what goes wrong). But as you said, I think we can do better: Generally, in cases like this the driver should return

    if (status == -ENXIO)  // or whatever is returned by serdev_device_open() when it fails due to not being ready
        return -EPROBE_DEFER;
    else if (status)
        return status;

Apart from me not knowing the specific error code, the reason I haven't added this yet is that I don't know where specifically this should be addressed. I.e., should Due to that, my current suggestion for a workaround would be to build-in the required modules ( If I remember correctly Hans de Goede mentioned that (on x86?) the pinctrl stuff should be always built in (or at least present in the initramfs). And since I'm already mentioning him: @jwrdegoede: Maybe you know what the proper way of returning Edit: Looks like I've been mistaken and Fedora doesn't include all the pinctrl modules. But it looks like they are all present in the initramfs. |
@phreer: I've opened a PR for building in the modules at linux-surface/linux-surface#1426. In case we merge this before you get around to testing with |
So |
I think the proper way is probably a mix of both: Build in the pinctrl modules or have them in the initramfs (as a lot of stuff relies on them) but don't do that for the serial ones as they are a bit more specific. Then return |
I just got a basic idea of how deferred driver probing works in the kernel and will play around with it soon, hopefully enabling this functionality for the SAM driver. But I don't get why we would still require modules like pinctrl to be built in even if deferred probing is supported. |
Returning |
Yeah, of course. But what if we also return |
Generally speaking, if you have some sort of get() function for a resource and the resource is not yet ready, then you would expect that get() function to return -EPROBE_DEFER and the caller will just propagate it. This is also where using dev_err_probe() is useful for logging errors from such get() functions during probe, since it will turn the msg into a dbg message when the error code is -EPROBE_DEFER.

As for this specific case, where does the serdev the driver tries to open come from? I would expect this to be created by the tty subsystem, specifically by Since this serdev is instantiated at the time the tty subsystem is probing it I find it weird that when your serdev driver (I assume you are using a serdev driver?) tries to bind to it that it is not ready yet.

I think it might be best to discuss this on the linux-serial mailing list. Feel free to Cc me. |
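To make the propagate-and-log idea above concrete, here is a minimal userspace sketch, not the kernel implementation. `model_err_probe()`, `model_resource_get()` and `model_probe()` are hypothetical stand-ins invented for illustration; the only behavior taken from the comment above is that the error code is passed through unchanged and that the message is demoted to debug level when the code is -EPROBE_DEFER.

```c
#include <stdarg.h>
#include <stdio.h>

#define EPROBE_DEFER 517 /* numeric value the kernel uses for this internal code */

enum log_level { LOG_NONE, LOG_DEBUG, LOG_ERROR };

/* Level of the most recent message, recorded only so the behavior can be inspected. */
static enum log_level last_level = LOG_NONE;

/* Hypothetical stand-in for dev_err_probe(): log, then return err unchanged. */
static int model_err_probe(int err, const char *fmt, ...)
{
	va_list args;

	/* A deferral is expected and will be retried, so it is only worth a debug message. */
	last_level = (err == -EPROBE_DEFER) ? LOG_DEBUG : LOG_ERROR;

	fprintf(stderr, "%s: ", last_level == LOG_DEBUG ? "debug" : "error");
	va_start(args, fmt);
	vfprintf(stderr, fmt, args);
	va_end(args);
	fprintf(stderr, " (error %d)\n", err);

	return err;
}

/* Hypothetical get() function: reports -EPROBE_DEFER until the resource is ready. */
static int model_resource_get(int ready)
{
	return ready ? 0 : -EPROBE_DEFER;
}

/* The probe function does not interpret the error, it just logs and propagates it. */
static int model_probe(int ready)
{
	int status = model_resource_get(ready);

	if (status)
		return model_err_probe(status, "failed to get resource");

	return 0;
}
```

The point of the design is that the caller needs no special case for deferral: one call both reports the failure at the right level and hands the code back for `return`.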
I think that would be the most ideal solution. But pinctrl drivers are spread out quite a bit, so I'm not really sure what all would need to be done/checked here. |
Yes, the SAM device in ACPI links to UartSerialBusV2 resource, so that's where the serdev comes from.
It is a serdev driver, yes. From the reports I've got, building in I will try to look into it a bit more and then send a mail to linux-serial. Thanks! |
9a2f740
to
7296e62
Compare
Hi @qzed, me again. As you suggested, I tried deferring the probe and it worked as expected. This only deals with the serdev case for now. Let's be patient and identify more race conditions in the future. |
I think it would look a bit nicer if you do something like

    if (status == -ENXIO)
        status = -EPROBE_DEFER;
    if (status) {
        ssam_err_probe(...);
        goto err_devopen;
    }

You'd have to add ssam_err_probe() similar to the other print macros though, with dev_err_probe() as underlying print function.
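As a rough illustration of the control flow suggested above, the following userspace sketch translates -ENXIO into -EPROBE_DEFER and then unwinds through a single error label. Everything named `model_*` and the `setup_done` field are hypothetical; the assumption that the open call reports -ENXIO when the UART is not ready comes from the discussion here, not from a checked reading of serdev_device_open().

```c
#include <errno.h>

#define EPROBE_DEFER 517 /* numeric value the kernel uses for this internal code */

/* Hypothetical driver state, just enough to observe the unwinding. */
struct model_hub {
	int setup_done;  /* an earlier init step that must be undone on failure */
	int serdev_open; /* set once the serial device was opened */
};

/* Hypothetical stand-in for opening the serdev; assumed to fail with -ENXIO. */
static int model_serdev_open(int uart_ready)
{
	return uart_ready ? 0 : -ENXIO;
}

static int model_hub_probe(struct model_hub *hub, int uart_ready)
{
	int status;

	hub->setup_done = 1;
	hub->serdev_open = 0;

	status = model_serdev_open(uart_ready);
	if (status == -ENXIO)
		status = -EPROBE_DEFER; /* "not ready yet" becomes "try again later" */
	if (status)
		goto err_devopen;       /* every failure shares one cleanup path */

	hub->serdev_open = 1;
	return 0;

err_devopen:
	hub->setup_done = 0;            /* undo the earlier step before reporting */
	return status;
}
```

Rewriting `status` first and checking it once afterwards keeps a single failure branch, so logging and cleanup do not have to be duplicated for the deferral case.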
@qzed Sure! Do you think it is a good idea to add more messages upon error, e.g., for gpiod_count? (IMHO dmesg is kind of lacking context when something unexpected happens.) |
7296e62
to
0fc940f
Compare
@phreer I think that could be quite helpful, yes. |
Thanks! Looks good to me.
6f0f68f
to
59c93ed
Compare
Hope those error messages in the new commit would help! Edit: I tested these changes with my SP5, all looking good. |
Hi @qzed , I've completed my code and would appreciate guidance on how to proceed further. |
Hi, sorry for the delay. I've been a bit busy with day-time work the last days again. I'm not a big fan of moving the controller init to the front, I'd prefer if we can keep the order (with smaller/lighter init stuff in the beginning). Just use the Apart from that I think it looks good. I think you could also submit the |
Sure, it makes sense to let the light stuff go first so we can fail fast. I'm happy to start a discussion of the EPROBE_DEFER patch on the mailing list. Lastly, I appreciate your assistance despite your busy schedule. Thank you! |
This is an attempt to alleviate race conditions in the SAM driver where essential resources like the serial device and GPIO pins are not ready at the time ssam_serial_hub_probe() is called. Instead of giving up probing, a better way would be to defer the probing by returning -EPROBE_DEFER, allowing the kernel to try again later. However, there is no way of distinguishing all such cases from other real errors in a few days. So let's take a gradual approach and identify and address these cases as they arise. This commit marks the initial step in this process.

Signed-off-by: Weifeng Liu <weifeng.liu.z@gmail.com>
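The effect of "allowing the kernel to try again later" can be pictured with a small userspace simulation. This is a deliberately simplified sketch: the real driver core decides when to retry deferred probes on its own terms, whereas the hypothetical `sim_driver_core()` below just loops a bounded number of times. All `sim_*` names are invented for illustration.

```c
#define EPROBE_DEFER 517 /* numeric value the kernel uses for this internal code */

/* Hypothetical UART that becomes ready after a number of probe attempts. */
struct sim_uart {
	int probes_until_ready;
};

/* Hypothetical probe: defers while the UART is not ready, succeeds afterwards. */
static int sim_probe(struct sim_uart *uart)
{
	if (uart->probes_until_ready > 0) {
		uart->probes_until_ready--;
		return -EPROBE_DEFER;
	}
	return 0;
}

/*
 * Simplified retry loop. Returns the number of attempts it took to bind,
 * a negative error for a real failure, or -EPROBE_DEFER if the device was
 * still deferring when the attempt budget ran out.
 */
static int sim_driver_core(struct sim_uart *uart, int max_attempts)
{
	int attempt;

	for (attempt = 1; attempt <= max_attempts; attempt++) {
		int status = sim_probe(uart);

		if (status != -EPROBE_DEFER)
			return status ? status : attempt;
	}

	return -EPROBE_DEFER;
}
```

With a UART that needs two more attempts, `sim_driver_core()` binds on the third try; without the deferral the first failure would have been final, which matches the randomly missing SAM functionality described in this PR.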
Emit messages upon errors during probing of SAM. Hopefully this provides useful context to users for diagnosis when something miserable happens.

Signed-off-by: Weifeng Liu <weifeng.liu.z@gmail.com>
73c322a
to
2e98a15
Compare
Thanks for sticking with it! Looks good to me now.
Since [1] we should not require the serial drivers to be built in any more. In essence, the PR causes the SAM driver to return EPROBE_DEFER when the serial device is not ready. Specifics on how to best handle this (i.e., which function should actually return EPROBE_DEFER) are still to be discussed, but for now this should provide a better workaround that hopefully fixes this problem as well. So let's test it. [1]: linux-surface/kernel#152
Changes: - Fix SAM driver probe failure when UART is not ready yet (@phreer). Links: - kernel: linux-surface/kernel@43ef589 - PR: linux-surface/kernel#152
Failure to probe the SSAM serial hub was observed from time to time (linux-surface/linux-surface#1271) due to a race condition where the underlying UART device might not be ready at the point of calling serdev_device_open(). Specifically, the failing path is
Retrying upon failure to open the serdev device greatly alleviates this situation. I tested this by rebooting my Surface Pro 5 10+ times and didn't see the issue anymore.
I admit this solution might be suboptimal, but it does help cure the issue where users randomly lose their battery indicator or touchpad functionality.