-
Notifications
You must be signed in to change notification settings - Fork 24
Make DAP protocol tests deterministic #1033
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: task/iopub-comm-unification
Are you sure you want to change the base?
Make DAP protocol tests deterministic #1033
Conversation
b440eac to
1f2a4a4
Compare
1b60b80 to
8802e70
Compare
43e46fe to
2208baf
Compare
DavisVaughan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't actually build locally and try out any debugging since this was mostly concerned with test improvements
| #line 1 "file:///test.R" | ||
| base::.ark_auto_step(base::.ark_breakpoint(browser(), "file:///test.R", "1")) | ||
| #line 1 "file:///test.R" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's supposed to be on line 6 and 8?
| serde_json = "1.0.94" | ||
| stdext = { path = "../stdext" } | ||
| tempfile = "3" | ||
| regex = "1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alphabetical order?
| /// On Windows ARM, ZMQ poll with timeout blocks forever instead of | ||
| /// respecting the timeout. Use non-blocking poll with manual timing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WAT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Windows ARM is very worrying. Couldn't find any upstream issues though!
Here is another issue (this time flaky) I just observed:
https://github.com/posit-dev/ark/actions/runs/21896771039/job/63214590132?pr=1037#step:9:864
failures:
test_dap_evaluate_in_different_frames
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 7 filtered out; finished in 6.48s
stderr ───
thread 'test_dap_evaluate_in_different_frames' (3508) panicked at crates\ark\tests\dap_evaluate.rs:107:19:
Failed to receive DAP message: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. (os error 10060)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
FAIL [ 6.611s] (538/765) ark::dap_evaluate test_dap_evaluate_error
| /// Accumulated stderr stream content | ||
| stream_stderr: RefCell<String>, | ||
| /// Put-back queue for non-stream messages encountered during stream assertions | ||
| pending_messages: RefCell<VecDeque<Message>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably say in the name that they are iopub messages
| trace_iopub_msg(&msg); | ||
| match msg { | ||
| Message::Stream(ref data) => self.buffer_stream(&data.content), | ||
| other => self.pending_messages.borrow_mut().push_back(other), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not super familiar with how this trace_iopub_msg() stuff works or why its useful, but isn't it weird that you can trace twice for some messages?
For a message you put in pending_messages, it is traced here and when you pop it off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same in assert_stream_matches_re
| stdout: std::mem::take(&mut *self.stream_stdout.borrow_mut()), | ||
| stderr: std::mem::take(&mut *self.stream_stderr.borrow_mut()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| stdout: std::mem::take(&mut *self.stream_stdout.borrow_mut()), | |
| stderr: std::mem::take(&mut *self.stream_stderr.borrow_mut()), | |
| stdout: std::mem::take(&mut self.stream_stdout.borrow_mut()), | |
| stderr: std::mem::take(&mut self.stream_stderr.borrow_mut()), |
i think
|
|
||
| #[deprecated = "Use assert_stream_stdout_contains() instead"] | ||
| #[allow(unused)] | ||
| pub fn recv_iopub_stream_stdout(&self, _expect: &str) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, should we just remove them then?
| // Overriding methods for base DummyFrontend's `recv_iopub_` methods. These | ||
| // use `recv_iopub_next()` to auto-skip streams. Question: Maybe this | ||
| // behaviour should live in the base dummy frontend? The stream issues are | ||
| // likely not R specific. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea having to override things when we also own the upstream DummyFrontend is feeling very complicated.
Thinking about touching any of this code a few months from now is already giving me anxiety haha
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep the only thing that gave me pause is the possibility of an amalthea kernel with more predictable streams? But that seems unlikely.
Branched from #1032 (Merge CommManager into Shell and IOPub)
Closes #1026
This PR is the payoff for the IOPub/Comm unification of #1032. It leverages deterministic message ordering to replace fuzzy message matching with strict sequential assertions in the test infrastructure. It also fixes several bugs made obvious by the new assertions, and removes emission of debug events during auto-stepping so that code injected for breakpoints is invisible to the frontend.
Replaces fuzzy matching with strict sequential assertions, making tests both more readable and more accurate. Before:
After (the different counts are due to other fixes, see below):
Eliminates workarounds like
recv_iopub_busy_skip_streams,recv_iopub_execute_input_skip_streams,recv_iopub_async, and the entireMessageAccumulator/recv_iopub_untilpatternAs part of a complete test review I also fixed:
#linedirective)Stream buffering
The one remaining source of non-determinism is R's stream output. R may batch or split
stdout/stderrmessages arbitrarily, independently of the ordering of non-stream messages. The infrastructure around stream assertions was reworked to robustly prevent test flakiness caused by stream buffering.All
recv_iopub_*methods now automatically buffer Stream messages and ignore them. The accumulated streams are now tested/asserted separately viaassert_stream_stdout_contains()/assert_stream_stderr_contains(). Stream assertions are still order-sensitive: each match drains the buffer up to that point. But a stray stream can no longer cause other assertions to fail.recv_iopub_idle()acts as a synchronization point: at that point, we know for sure R has finished streaming all output emitted during the request handling. The method panics if streams were received but not asserted, then clears the buffers.There must be at least one stream assertion per request handling if output is emitted. Unfortunately this setup still allows unexpected stream output to go unnoticed as long as there is one stream assertion in the test. Due to arbitrary splitting though, I don't think we can do better. Unless we go for full snapshots, but this approach has issues of its own (scrubbing of stateful output, and stability across platforms and versions of R).
Auto-stepping before
debug_start()Previously, auto-stepping through injected breakpoint wrappers happened after
debug_start()had already been sent to the frontend. Every breakpoint hit produced visible intermediate events (start_debug, Stopped, Continued, Continued, start_debug, Stopped). Tests had to account for all of these withrecv_auto_step_through()and multiple start/stop cycles.Now
maybe_auto_step()is called beforedebug_start(). If we detect we're at injected code, we immediately returnnto R without emitting any debug notifications. The frontend only sees a single start_debug + Stopped at the user expression.Similarly, when
pending_inputsis non-empty (multiple expressions queued), we skipdebug_start()for intermediate browser prompts.The main insight is that we never start a new debug session unless we're ready to close the active execution request, and we never close the active execution request until R has stopped at the final user-visible location.