Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integration testing harness #2359

Merged
merged 32 commits into from
Jun 21, 2022
Merged

Conversation

dead10ck
Copy link
Member

@dead10ck dead10ck commented May 1, 2022

I was already working on an integration testing branch in the background for a little while, and I thought since this is such a critical code path, it warrants introducing the integration testing harness at the same time as #2267. This is very much a "just getting the ball rolling" approach, and anything is up for discussion. I fully expect the integration test framework to change over time.

Some things to note about this implementation:

  • In order to facilitate easier testing, I moved the toml config parsing back into the main function. This undoes the move that was done in Add support for local language configuration #1249.

  • The integration tests require shutting off actual rendering, so to do this, a feature branch was introduced to disable rendering for the integration tests. Tests are run by doing cargo test --features integration

  • We need some way to run the Application with some starting document and input keys, and then inspect the internal state while the Application is still running. We also need to run the actual main event loop in order to handle more things than just the key press event—we need to drive document writes, config changes, and eventually LSP messages, if and when any tests are added for these things. The problem is the main event loop doesn't exit until all documents are closed, and then it discards a lot of internal state.

    So in order to inspect internal state, I first just did this with a simple timeout, but it quickly became clear that this is problematic, as the amount of time that a test finishes can differ dramatically, and it can lead to non-determinism: if the host system becomes busy for some reason, tests fail because they time out too soon, even when there isn't a problem. Additionally, we can't just use very long timeouts as this would artificially make it take a very long time for each integration test to finish, even if the actual work they did was done very quickly.

    So I came up with the idea that we could piggy-back on the idle timer: since the event loop is using a biased event loop, and we submit all key strokes ahead of actually running the Application, the idle timer should only fire when every key has been handled. To implement this, the event loop was split into one which exits after a successful idle timer handle, and the main call site simply loops over this as well. Additionally, the idle timer is reset every time any event gets handled successfully. This lets the application signal when it is done processing the test key strokes, and leaves the internal state open for inspection.

Another bug was discovered along the way while writing these integration tests: when a document is first opened, the cursor has the initial state of (0, 0), and is not updated to reflect the document's contents until the first cursor movement. This led to some weird bugs where the selection did not end up in the right direction after an edit. As far as I have been able to find, this does not cause any bugs in the actual text transformation, so it's likely this did not cause any real harm to users, but nonetheless, it was making testing very confusing until it was fixed. Now, if there is text in the opened document, the cursor is placed over the whole first grapheme cluster.

This PR was written along with #2267, so there are tests that pertain to that PR, but I left them in and ignored them, for purposes of demonstrating different kinds of tests. I'll unignore them along with that PR.

Open questions

  • How do we handle the feature flag? Should we simply leave it as is, and require adding the flag to run integration tests, or should we alias test to do test --features integration? The latter would help by not requiring contributors to remember that you have to do this to run the integration tests, but has the issue that eventually, we may have tests that depend upon external state, such as an LSP on the build system. Perhaps there could be an xtask or something? I'm open to suggestions.
  • When we do have tests that need consistent external state, how do we do this? A docker container?

Comment on lines 256 to 320
pub async fn event_loop_until_idle<S>(&mut self, input_stream: &mut S) -> bool
where
S: Stream<Item = crossterm::Result<crossterm::event::Event>> + Unpin,
{
loop {
if self.editor.should_close() {
return false;
}

use futures_util::StreamExt;

tokio::select! {
biased;

event = reader.next() => {
self.handle_terminal_events(event)
Some(event) = input_stream.next() => {
self.handle_terminal_events(event);
self.editor.reset_idle_timer();
}
Some(signal) = self.signals.next() => {
self.handle_signals(signal).await;
self.editor.reset_idle_timer();
}
Some((id, call)) = self.editor.language_servers.incoming.next() => {
self.handle_language_server_message(call, id).await;
// limit render calls for fast language server messages
let last = self.editor.language_servers.incoming.is_empty();
if last || last_render.elapsed() > deadline {

if last || self.last_render.elapsed() > LSP_DEADLINE {
self.render();
last_render = Instant::now();
self.last_render = Instant::now();
}

self.editor.reset_idle_timer();
}
Some(payload) = self.editor.debugger_events.next() => {
let needs_render = self.editor.handle_debugger_message(payload).await;
if needs_render {
self.render();
}
self.editor.reset_idle_timer();
}
Some(config_event) = self.editor.config_events.1.recv() => {
self.handle_config_events(config_event);
self.render();
self.editor.reset_idle_timer();
}
Some(callback) = self.jobs.futures.next() => {
self.jobs.handle_callback(&mut self.editor, &mut self.compositor, callback);
self.render();
self.editor.reset_idle_timer();
}
Some(callback) = self.jobs.wait_futures.next() => {
self.jobs.handle_callback(&mut self.editor, &mut self.compositor, callback);
self.render();
self.editor.reset_idle_timer();
}
_ = &mut self.editor.idle_timer => {
// idle timeout
self.editor.clear_idle_timer();
self.handle_idle_timeout();
return true;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only had the time for a brief look, but I don't like the event loop changes.

  1. It adds an extra async function call to what was before a single loop (that's very frequently called)
  2. Events & callbacks should not interact with the idle timer because it's intended to only be affected by keypresses. Continually resetting the idle timer on LSP events for example will ensure the timer never triggers for the first 10-20s while rust-analyzer boots up and continually spews log messages.

Alternate frontends will re-implement the application loop themselves, so maybe the solution is to treat the integration tests as a separate frontend? i.e. define a separate test::Application that has no rendering and declares it's own loop. I started working on this previously by moving the handle_debugger_message onto helix-view/Editor. We'd need to do the same for handle_language_server_message so we'd be able to share most of the code.

(Also, I move commands & parts of UI to helix-view on this branch: https://github.com/helix-editor/helix/tree/gui)

Copy link
Member Author

@dead10ck dead10ck May 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only had the time for a brief look, but I don't like the event loop changes.

  1. It adds an extra async function call to what was before a single loop (that's very frequently called)

Could you please elaborate on what your concern is about this? Is it performance? Aren't we just talking about an extra jump that happens once every few hundred milliseconds in the worst case?

  1. Events & callbacks should not interact with the idle timer because it's intended to only be affected by keypresses. Continually resetting the idle timer on LSP events for example will ensure the timer never triggers for the first 10-20s while rust-analyzer boots up and continually spews log messages.

I believe this is already the case. It's a biased select, so it will handle key presses, signals, LSP, etc, all before it polls the idle timer. I believe the only practical difference this makes is adding a few hundred ms once that initialization finishes.

Also, even if this were not the case, the LSP would not be able to serve completions until it's fully initialized anyway.

Perhaps we could make the reset calls conditional with the integration feature?

Alternate frontends will re-implement the application loop themselves, so maybe the solution is to treat the integration tests as a separate frontend? i.e. define a separate test::Application that has no rendering and declares it's own loop.

This was my first thought when I ran into this problem as well, but then I considered that we want to be able to test all features, and I realized we would just be copy/pasting the entire loop with a few small tweaks. And this kind of defeats the point of integration testing, which is to test the very same code that the user will be running.

We want to be able to test things like :wq saves the file and exits the Application, in that order. Or doesn't exit when there's an error saving.

And we don't want to have to worry about updating the testing harness's event loop every time we touch the term Application's event loop.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@archseer I consolidated the reset calls into a single place and put it behind the integration feature flag. This way, non test builds will see no difference in behavior; and additionally, since key presses are the only thing that triggers idle timer resets, this reduces the frequency that the call stack exits and reenters the inner event loop to once every idle timer fire, which should make the performance impact negligible. How to you feel about this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replying to @archseer's message from Matrix:

right, but that's the loop that the terminal app will be running (with the rendering disabled anyway). What about the UI app etc? Are we going to start running integration tests per frontend?

For me, significantly shrinking application.rs to a small event loop + platform specific stuff, then having a custom loop is a reasonable tradeoff. We're primarily interested in covering commands & how they interact with the editor state

This makes a lot of sense; however, I think if the goal is to provide a sensible abstraction for different frontends, this is already an orthogonal problem. The Application event loop is driving a whole bunch of stuff that should be frontend-agnostic, like config changes, LSP message handling, debug events, and file writes. As it is, a GUI frontend would also need to copy/paste a lot of the event loop.

What you're saying makes sense, but I think fixing that would require a larger refactor to better enable different frontends, and might be outside the scope of this PR.

I wish we could somehow force inline https://github.com/helix-editor/helix/pull/2359/files#diff-d9796de334de9f547d45fd8b7476be53c771200ab48f28266e5f62e3493b5df9R252= to avoid more async state generated by the compiler

If it's really that concerning, maybe the event loop logic could be made into a macro? But does the extra await really cause that much trouble?

@dead10ck dead10ck force-pushed the test-harness branch 2 times, most recently from 8fc5075 to 22a3b33 Compare May 17, 2022 03:02
@the-mikedavis the-mikedavis added the S-waiting-on-review Status: Awaiting review from a maintainer. label May 18, 2022
@dead10ck dead10ck force-pushed the test-harness branch 2 times, most recently from 3292fc2 to 074c4ed Compare May 23, 2022 12:53
@archseer
Copy link
Member

Can you give this a quick rebase? It should fix the the tests

@dead10ck
Copy link
Member Author

Can you give this a quick rebase? It should fix the the tests

Sure thing, done. Also, I mentioned this in the chat room, but if you'd like, I could take some extra time to refactor at least the main event loop to make all the events happen behind a single function, somewhat similar to how I had to put all the editor events behind one function in the write path PR. That was due to borrowing restrictions, but it could be extended to all events. That way, it would be more reasonable to make a separate test Application object. I'd be happy to do that if that would help decouple the integration tests a bit from the term Application.

@archseer
Copy link
Member

I'm okay with that happening in a follow-up, this PR is already quite large and it's useful as a standalone.

@dead10ck
Copy link
Member Author

Ok cool. Any thoughts on how to handle the feature flag? There are two things I could think of that I wrote at the bottom of the PR description. If merged as is, users would have to explicitly opt in to integration tests.

@archseer
Copy link
Member

I would add an integration-test alias to .cargo/config then run that as a separate step on the CI (cargo integration-test). That way it's clear if unit tests failed, or integration tests. We can document how to run the tests in the README, plus if it fails on the CI it's easy to figure out what failed: just run the same command.

helix-term/src/commands.rs Outdated Show resolved Hide resolved
helix-loader/Cargo.toml Outdated Show resolved Hide resolved
helix-term/tests/test/write.rs Outdated Show resolved Hide resolved
helix-term/tests/integration/movement.rs Outdated Show resolved Hide resolved
Comment on lines +71 to +83

#[cfg(not(feature = "integration"))]
use tui::backend::CrosstermBackend;

#[cfg(not(feature = "integration"))]
use std::io::stdout;
use tui::backend::{Backend, CrosstermBackend};

#[cfg(not(feature = "integration"))]
type Terminal = tui::terminal::Terminal<CrosstermBackend<std::io::Stdout>>;

#[cfg(feature = "integration")]
type Terminal = tui::terminal::Terminal<TestBackend>;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider cherry picking these changes:

14f9878 57d4a9b e0f9d86

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh interesting. Would that goal be to get rid of the feature flags? I think we'd still have to have them, since we're still building an Application.

Or are you just asking me to do it to get the code change into master?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's part of an upcoming refactor and I think it would simplify the amount of cfg clauses you need

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, I tried, but the code seems to be built on top of other bigger breaking API changes, like adding a new RenderContext. It's difficult to take just those changes without the rest of the branch.

Comment on lines -56 to -54
- name: Copy minimal languages config
run: cp .github/workflows/languages.toml ./languages.toml
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need the full languages config for integration tests

Do we? I'd list any languages we test with but avoid fetching uncommon grammars that won't ever be used in tests, that way we're not waiting for 50+ grammars to build.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we can do that. It just means that a contributor will have to remember to add a language if they want to add tests for it. And it will be a bit confusing because their tests will pass locally for them, but fail on GitHub, and for not obvious reasons. It was failing for me on the C indentation test, but the failure was the wrong number of spaces. It took a while to figure out it was because of the grammar missing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, maybe we should include languages.toml in the test directory, then conditionally compile using that languages.toml over the one in the root dir if running tests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I just tried implementing this, but I just realized, the feature flag would help if we were only running one test command, but if we separate regular unit tests from integration tests into separate phases, then I'm not sure how to detect if we're running unit tests without the integration feature flag.

helix-term/tests/test/write.rs Outdated Show resolved Hide resolved
helix-term/src/application.rs Outdated Show resolved Hide resolved
helix-term/src/ui/mod.rs Outdated Show resolved Hide resolved
helix-view/src/document.rs Outdated Show resolved Hide resolved
dead10ck and others added 6 commits June 18, 2022 23:54
* Use new macro syntax for encoding sequences of keys
* Make convenience helpers for common test pattern
* Use indoc for inline indented raw strings
* Add feature flag for integration testing to disable rendering
When a new View of a Document is created, a default cursor of 0, 0 is
created, and it does not get normalized to a single width cursor until
at least one movement of the cursor happens. This appears to have no
practical negative effect that I could find, but it makes tests difficult
to work with, since the initial selection is not what you expect it to be.

This changes the initial selection of a new View to be the width of the
first grapheme in the text.
Use the Application's main event loop to allow LSP, file writes, etc
During write-quit, if the file fails to be written for any reason, helix
will still quit without saving the changes. This fixes this behavior by
introducing fallibility to the asynchronous job queues. This will also
benefit all contexts which may depend on these job queues.

Fixes helix-editor#1575
Copy link
Member

@archseer archseer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's get this merged! The other pending comments can always be resolved later. Great work! 🎉

@the-mikedavis the-mikedavis removed the S-waiting-on-review Status: Awaiting review from a maintainer. label Jun 21, 2022
@archseer archseer merged commit 19dccad into helix-editor:master Jun 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants