Skip to content

Conversation

@longvatrong111
Copy link
Contributor

@longvatrong111 longvatrong111 commented May 22, 2025

Follow up to hit_test failed occasionally when the touch event is sent and #36676 (comment), this PR adds a retry for hit tests with expired epoch in result.

Hit tests with outdated epoch mean it is too early to perform the hit test.

Solution: retry hit test for the event on the next webrender frame.
The retry should guarantee that:

  • Keep the correct order of events
  • Retry time is not too long

Test cases: ./mach test-wpt --product servodriver -r tests\wpt\tests\webdriver\tests\classic\element_click\events.py

cc: @xiaochengh , @yezhizhen

@longvatrong111 longvatrong111 changed the title [Just a draft to share draft solution] - Add retry for hit tests with expired epoch in result Add retry for hit tests with expired epoch in result May 22, 2025
@longvatrong111 longvatrong111 marked this pull request as ready for review May 22, 2025 09:42
@longvatrong111 longvatrong111 requested a review from mrobinson as a code owner May 22, 2025 09:42
@longvatrong111 longvatrong111 force-pushed the hit-testing-fix branch 2 times, most recently from afcd497 to 747726a Compare May 22, 2025 10:18
Copy link
Contributor Author

@longvatrong111 longvatrong111 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder should we retry all failed hit tests (the epoch may be mismatch but there is no item in the result to check)?

Copy link
Member

@yezhizhen yezhizhen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hit tests with expired epoch mean it is too early to perform the hit test.

I think it means "too late" instead "too early"? If we retry, we are essentially hit-test against the new display list. It may work well with WebDriver as it is fairly static, but may introduce weird experience for real browsing..
cc @jdm

@longvatrong111
Copy link
Contributor Author

Hit tests with expired epoch mean it is too early to perform the hit test.

I think it means "too late" instead "too early"? If we retry, we are essentially hit-test against the new display list. It may work well with WebDriver as it is fairly static, but may introduce weird experience for real browsing.. cc @jdm

If two events occur in sequence, their effects should be applied sequentially.
In this issue, the first event triggers a rebuild of the display list, and the second event arrives during that process so it get "outdated epoch". Servo should wait until the process is complete to handle the second event, rather than ignore it.

@mrobinson
Copy link
Member

I'm not sure a "retry" approach is the right one here. It feels like putting a bandaid over the real problem which is that hit testing needs to be done synchronously with script. 🫤

@longvatrong111
Copy link
Contributor Author

longvatrong111 commented May 23, 2025

I'm not sure a "retry" approach is the right one here. It feels like putting a bandaid over the real problem which is that hit testing needs to be done synchronously with script. 🫤

We can think retry and synchronization are similar terms here because we pick a specific point to retry.
What we do here is making some events to wait until hit testing is ready for them.
p/s: we even don't need the retry. As my understanding, all hit test will fail from when epoch is updated in compositor to when compositor receives next webrender frame. So if events come during that period, just make them wait immediately, dont need to try hit test.

@longvatrong111
Copy link
Contributor Author

I'm not sure a "retry" approach is the right one here. It feels like putting a bandaid over the real problem which is that hit testing needs to be done synchronously with script. 🫤

We can think retry and synchronization are similar terms here because we pick a specific point to retry. What we do here is making some events to wait until hit testing is ready for them. p/s: we even don't need the retry. As my understanding, all hit test will fail from when epoch is updated in compositor to when compositor receives next webrender frame. So if events come during that period, just make them wait immediately, dont need to try hit test.

Maybe it leads to the problem of notify new epoch to compositor.
However, I think if we only retry failing hit test with mismatch epoch once in the next frame, it is an improvement to ignoring it.

@longvatrong111 longvatrong111 force-pushed the hit-testing-fix branch 2 times, most recently from 759c376 to 9ced3cd Compare May 25, 2025 17:18
@xiaochengh
Copy link
Contributor

@mrobinson

I'm not sure a "retry" approach is the right one here. It feels like putting a bandaid over the real problem which is that hit testing needs to be done synchronously with script. 🫤

CMIIW but I think the situation is inevitable as long as we have compositor and script running on different threads, and possible solutions are:

  1. Let compositor retry failed hit test attempts
  2. Block compositor/script until synchronized
  3. Allow compositor to return out-of-date hit test results.

Here 1 seems to be the easiest fix. 2 is much more complicated and may cause a bigger performance hit, and I don't see much benefits in there.

However, my favorite is 3, and I would like to try it later in the year https://github.com/servo/servo/wiki/Roadmap#architecture later

@longvatrong111 longvatrong111 requested a review from gterzian as a code owner May 30, 2025 06:40
@longvatrong111 longvatrong111 requested a review from xiaochengh May 30, 2025 06:45
@longvatrong111 longvatrong111 force-pushed the hit-testing-fix branch 2 times, most recently from 5c762c6 to b68f5b1 Compare May 30, 2025 07:24
Signed-off-by: batu_hoang <longvatrong111@gmail.com>
Signed-off-by: batu_hoang <longvatrong111@gmail.com>
Signed-off-by: batu_hoang <longvatrong111@gmail.com>
Signed-off-by: batu_hoang <longvatrong111@gmail.com>
Signed-off-by: batu_hoang <longvatrong111@gmail.com>
Signed-off-by: batu_hoang <longvatrong111@gmail.com>
@xiaochengh xiaochengh added this pull request to the merge queue Jun 4, 2025
Merged via the queue into servo:main with commit b422a65 Jun 4, 2025
21 checks passed
@longvatrong111 longvatrong111 deleted the hit-testing-fix branch June 10, 2025 18:10
return false;
},
_ => {
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, may also need to retry.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just keeps the default logic, you can add a retry here with a proper reason.
I'm just curious, if so, basically we retry all failed hit test?

.borrow()
.hit_test_at_point(point, get_pipeline_details)
else {
// Don't need to process pending input events in this frame any more.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On some complex pages, such as https://m.huaweimossel.com, a retry failure may occur once.
For touchmove, failure only affects one swipe, which has a minor impact.
However, for touchdown and touchup, a retry failure has a significant impact on the application.
A touchdown failure will prevent the subsequent touchmove or click events from being completed.
A touchup failure will prevent the click event from being triggered. Additionally, the number of active_points saved in the script thread is incorrect. When the next touchdown is triggered, the application will receive two Touch messages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a retry failure has a significant impact on the application

If we don't retry touch events, there is no impact?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The touchdown and touchup events exist in pairs. If touchdown is successful and touchup fails, the application's JavaScript will consider there to be a touchpoint present. When touchdown is triggered again, the number of touchpoints becomes 2.
When this situation occurs on the https://m.huaweimossel.com/ website, the carousel will not be able to slide. This is because the developer has checked the size of e.touches.length in the JavaScript code. This is the issue I have discovered so far.

Copy link
Contributor

@kongbai1996 kongbai1996 Jun 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your details. As my understanding, the problem occurs because of hit test failings. Since the retry keeps the order of events, it doesn't play a role here.
Anyway, retry is just an improvement, not a final fix.
We can wait for this: #37085 (comment).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the problem found now, should we first solve it by increasing the number of retries?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Increase amount of retry may lead to a scenario in which, 1 event takes several frames to be processed.
I'm thinking of retrying touch events as pairs of down and up, how do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that if either down or up fails, everything will be retried?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes both must succeed or fail together.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But there could be two touchdowns, and then two touchups. Or more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants