Tests with network requests frequently time out under rr #22892
Comments
My attempt to write a standalone rust program that made the same kinds of requests to the server did not reproduce the same problem when run under rr.
I spent a few hours today tracing through the logs you gave me, and started to notice subtle differences with what is in the latest versions of tokio. Would it be possible to update hyper and all tokio crates to the newest version?
That is already the case as far as I know.
That being said, I am starting from master and upgrading everything again and I'll grab new logs.
I noticed a couple lines "starting background reactor", which with newer tokio, shouldn't happen automatically anymore, as each worker has its own reactor instead of them all sharing a background one.
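To illustrate the distinction being described, here is a rough std::thread analogy (this is not tokio code; both function names are invented): in the old model, every worker's events funnel through one shared background reactor thread, while in the new model each worker drives its own loop, so no background reactor needs to be started.

```rust
use std::sync::mpsc;
use std::thread;

fn shared_background_reactor(per_worker_events: Vec<Vec<u32>>) -> Vec<u32> {
    let n = per_worker_events.len();
    let (tx, rx) = mpsc::channel::<(usize, u32)>();
    // Old model: one shared background thread multiplexes every worker's
    // events and forwards each one to its owner.
    let reactor = thread::spawn(move || {
        for (worker, events) in per_worker_events.into_iter().enumerate() {
            for event in events {
                tx.send((worker, event)).unwrap();
            }
        }
        // tx drops here, which ends the receive loop below.
    });
    let mut totals = vec![0u32; n];
    for (worker, event) in rx {
        totals[worker] += event;
    }
    reactor.join().unwrap();
    totals
}

fn per_worker_reactors(per_worker_events: Vec<Vec<u32>>) -> Vec<u32> {
    // New model: each worker thread drives its own loop over its own events;
    // no shared background reactor is ever started.
    let handles: Vec<_> = per_worker_events
        .into_iter()
        .map(|events| thread::spawn(move || events.into_iter().sum::<u32>()))
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let events = vec![vec![1, 2], vec![3]];
    assert_eq!(shared_background_reactor(events.clone()), vec![3, 3]);
    assert_eq!(per_worker_reactors(events), vec![3, 3]);
    println!("both models observe the same events");
}
```

Either way the same events are delivered; the difference is only in which thread observes them, which is why a "starting background reactor" log line is a useful fingerprint of the old configuration.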
Reproduced with master...jdm:foo2. |
Could I bother you for logs from the newest versions? Since a few things have changed, it was difficult to trace execution through the older logs while comparing against the tokio repo.
https://gist.github.com/jdm/d91481985d554250a1988c3c34808639 The problem requests start on line 654. |
There is still a line about the background reactor, which on the newest version should only happen when a socket is being used outside of a runtime. I suspect this line here: servo/components/net/http_loader.rs Line 1198 in 2c63d12. What if you replace that with something that spawns the response future onto the runtime's executor, e.g.:

let fut = futures::sync::oneshot::spawn(response_future, &HANDLE.lock().unwrap().executor());
let (res, msg) = match fut.wait() {
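For readers unfamiliar with the `oneshot::spawn` + `wait()` pattern, here is a minimal std-only analogue of its shape (assumption: `spawn_oneshot` is a made-up name, and a plain thread plus channel stands in for the executor and the oneshot; this is not the futures 0.1 API):

```rust
use std::sync::mpsc;
use std::thread;

// Run the work on a separate executor (here just a thread) and hand back a
// one-shot channel to wait on, so the waiting thread never drives the work
// itself -- analogous to spawning the response future onto the runtime.
fn spawn_oneshot<T, F>(work: F) -> mpsc::Receiver<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let _ = tx.send(work());
    });
    rx
}

fn main() {
    // Analogue of `let (res, msg) = match fut.wait() { ... }`: block until
    // the spawned work completes and yields its result.
    let fut = spawn_oneshot(|| "response");
    let res = fut.recv().unwrap();
    assert_eq!(res, "response");
    println!("got: {}", res);
}
```

The point of the suggested change is the same: the future runs on the runtime's own reactor/executor rather than being driven by `wait()` on a thread outside any runtime, which is what would trigger the background reactor.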
Can it be …
It compiled, but I was still able to reproduce the timeout.
I'm specifically trying to understand why, at the end, there are no … Oh also, is this Mac-specific (and so I should be looking at kqueue), or Mac + Linux?
I have only observed this under rr, which is Linux-specific.
Here's a log from the oneshot build - first there's a log of a timing out run, and then there's a log of a successful run: https://gist.github.com/jdm/66cf22f47c568e59ee6c59721faa59c9
Oh, this only happens in rr?
Yeah, that was the motivation for my comment: "It's not clear to me whether this is a problem in tokio/hyper that rr exposes, or whether this is a problem caused by rr behaving incorrectly." It could be that rr's chaos mode is exposing a legitimate bug (which is its whole purpose). It could also be that this code is exposing a bug in rr's replaying of the relevant syscalls. I have no idea how to determine which it is.
Which is to say - I have observed inexplicable network-related timeouts in other tests before, when running outside of rr. I cannot guarantee that this timeout only reproduces in rr, only that I have consistently reproduced it in rr.
Ya, I didn't mean to imply the bug is in rr. So, the confusing part is that the logs claim the requests have been written to the socket, but then the reactor, when waiting on epoll, never sees those sockets become readable.
I have observed this in an unmodified Servo build using:

RUST_LOG=tokio,net,hyper ./mach test-wpt --debugger=rr --debugger-args="record --chaos" --no-pause-after-test --repeat-until-unexpected tests/wpt/web-platform-tests/fetch/content-type/script.window.js

I suspect this is not limited to this particular test.

The symptom is that the test makes a number of HTTP requests to the WPT python HTTP server, the server observes all of the requests and responds appropriately, and Servo only observes a subset of the responses. The RUST_LOG output shows that the sockets aren't being notified that there is data available.

It's not clear to me whether this is a problem in tokio/hyper that rr exposes, or whether this is a problem caused by rr behaving incorrectly.
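For reference, the externally visible shape of the symptom can be sketched with plain std sockets (assumption: `round_trip` is a made-up helper; no tokio/hyper involved): the client writes a request, the server observes it and responds, and the client then blocks until its socket becomes readable. Outside rr this completes; the report above is that under rr the readiness notification for some responses never arrives, so the read times out instead.

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;
use std::time::Duration;

fn round_trip(request: &[u8]) -> std::io::Result<Vec<u8>> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    let len = request.len();
    let server = thread::spawn(move || -> std::io::Result<()> {
        let (mut sock, _) = listener.accept()?;
        let mut buf = vec![0u8; len];
        sock.read_exact(&mut buf)?; // server observes the request...
        sock.write_all(b"pong")?;   // ...and responds appropriately
        Ok(())
    });

    let mut client = TcpStream::connect(addr)?;
    // Bound the wait, mirroring the test harness timeout.
    client.set_read_timeout(Some(Duration::from_secs(5)))?;
    client.write_all(request)?; // "request has been written to socket"
    let mut response = vec![0u8; 4];
    client.read_exact(&mut response)?; // blocks until the socket is readable
    server.join().unwrap()?;
    Ok(response)
}

fn main() {
    match round_trip(b"ping") {
        Ok(resp) => println!("response observed: {}", String::from_utf8_lossy(&resp)),
        Err(e) => println!("timed out / read failed: {}", e),
    }
}
```

In the failing runs, the equivalent of the final `read_exact` is the step that never completes: the data is on the wire, but the reactor waiting on epoll is never told the socket is readable.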