Add Support for cargo nextest #3920
Conversation
Apologies for the delay, I was on vacation, still catching up.
That's okay!
Well, it's working; actually it's my second working version, the first one used locks. But according to the library documentation it shouldn't work (at least cross-platform).
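For context, a minimal sketch of what that lock-based approach could look like with the fs2 crate (which appears in this PR's dependency changes); the lock-file path and helper are hypothetical, and fs2's advisory-lock semantics are exactly the cross-platform caveat mentioned above:

```rust
use std::fs::File;

use fs2::FileExt;

// Hypothetical helper: serialize runner invocations with an advisory file
// lock. fs2 documents platform-specific caveats for these locks, which is
// why this approach was abandoned.
fn with_assembly_lock<T>(f: impl FnOnce() -> T) -> std::io::Result<T> {
    let lock = File::create("target/.wasm-bindgen-test.lock")?;
    lock.lock_exclusive()?; // blocks until no other runner holds the lock
    let result = f();
    lock.unlock()?;
    Ok(result)
}
```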
Anyway, the overhead was as bad as I expected, as each test execution now requires the same overhead as a complete assembly. I only tested on Firefox so far: about 12-13s each, thermal throttled. But when tests take longer than the overhead, the extra cores start to pay off. I'm having thermal throttling issues, but my speed boost might be around 3x when I have the tests configured for very heavy parameters (it's a property-based-testing variation).
I was waiting for your review, because as I do more tests I end up finding more stuff to fix, making the review harder on you. Sorry. I ended up fixing support for Deno; not sure why, but it was definitely broken.
@daxpedda I have been trying to create a macro to simplify the tests; this will be particularly useful for tests that should be run over the different supported runtimes. I'm still working on the syntax, as the macro by itself allows different usage patterns. But right now, this is the format already working:

```rust
feature! {
    given_there_is_an_assembly_with_one_failing_test();
    when_wasm_bindgen_test_runner_is_invoked_with_the_option("-V");
    "Outputs the version" {
        then_the_standard_output_should_have(
            &format!("wasm-bindgen-test-runner {}", env!("CARGO_PKG_VERSION")),
        );
    }
    "Returns success" {
        then_success_should_have_been_returned();
    }
}
```

It expands to two tests:

```rust
#[test]
fn outputs_the_wasm_bindgen_test_runner_version_information_feature() {
    let mut context = Context::new();
    given_there_is_an_assembly_with_one_failing_test(&mut context);
    when_wasm_bindgen_test_runner_is_invoked_with_the_option(&mut context, "-V");
    then_the_standard_output_should_have(
        &context,
        &format!("wasm-bindgen-test-runner {}", env!("CARGO_PKG_VERSION")),
    );
}

#[test]
fn returns_success_feature() {
    let mut context = Context::new();
    given_there_is_an_assembly_without_anything(&mut context);
    when_wasm_bindgen_test_runner_is_invoked_with_the_option(&mut context, "-V");
    then_success_should_have_been_returned(&context);
}
```

If the target platform is wasm it uses [wasm_bindgen_test::wasm_bindgen_test] instead. This allows for a more compact file, because there are sometimes 5-7 different outcomes for a single execution context. The idea is, by default, to respect the single outcome, allowing easy troubleshooting of regressions, but it's possible to aggregate the executions on CI for faster execution times.
Updated to use docopt
- It was already used in wasm-bindgen
- Not sure if you want to use something more modern; it's easy to change
- Just checked: it's unmaintained (https://github.com/docopt/docopt.rs) and recommends clap or structopt, but structopt's docs state "As clap v3 is now out, and the structopt features are integrated into (almost as-is), structopt is now in maintenance mode"
I think updating to clap would be nice, seeing that Cargo uses it as well.
That said, let's do this in a follow-up PR.
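For reference, a minimal sketch of what the runner's argument handling might look like with clap's derive API (assuming clap with the `derive` feature enabled; the flags shown are illustrative, not the runner's actual interface):

```rust
use clap::Parser;

/// Hypothetical subset of the test runner's CLI, for illustration only.
#[derive(Parser)]
#[command(name = "wasm-bindgen-test-runner", version)]
struct Cli {
    /// Path to the compiled test module.
    module: std::path::PathBuf,
    /// List tests instead of running them (as nextest requires).
    #[arg(long)]
    list: bool,
}

fn main() {
    let cli = Cli::parse();
    // ... hand off to the configured runtime from here ...
    let _ = (cli.module, cli.list);
}
```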
Updated the macro wasm_bindgen_test
- To export an additional method to allow the list handling required by nextest
- Not sure if it might make sense to override the current export
I think it would be better to update the current macro if possible.
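For anyone following along, the custom-test-harness protocol nextest relies on (documented at the link in the PR description) boils down to a small libtest-style CLI surface. A hedged sketch, with illustrative test names:

```rust
// A harness must at least answer `--list --format terse` with one
// `<test-name>: test` line per test; nextest then runs each test in its own
// process, selecting it by name.
fn main() {
    let args: Vec<String> = std::env::args().skip(1).collect();
    let tests = ["outputs_the_version", "returns_success"]; // illustrative

    if args.iter().any(|a| a == "--list") {
        for name in &tests {
            println!("{name}: test");
        }
        return;
    }

    // Otherwise, run the test(s) whose names were passed on the command line.
    for name in tests.iter().filter(|name| args.contains(&name.to_string())) {
        println!("running {name}");
    }
}
```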
Testing
- I wasn't sure where the best place for the tests was, because of the custom test runner in the cli crate, so I placed them in the main tests folder.
- I used a variation of BDD that I have used for many years now.
-- Although it seems a bit more verbose, it makes writing tests a lot simpler and faster; anyone can understand them and add new ones.
-- When something breaks it's very easy to understand why.
-- Although counter-intuitive, in practice I have found that they are a lot easier to update during refactors.
I'm unsure what exactly is going on here and I didn't review the tests yet, but from a glance I would like to change the following:
- Shouldn't the tests be in `/crates/cli/tests` instead of in `/tests`?
- Not sure why some folders are prefixed with `_`.
- Tests include extremely repetitive code and use ambiguous terms like "assembly". While I don't mind having all these files, I would prefer if we can cut down on the repeated code entirely.
- Why did you choose Firefox to be its own target? We should use the name "browser" instead.
Overhead
- When used from cargo nextest the overhead is pretty real, as it has to load the runtime and the wasm into it, so it can be a lot; but it runs the tests in parallel, so it makes up for it on tests that take a long time to execute.
That was to be expected; in follow-up steps we should discuss how to cut that down. The first idea that comes to mind is to introduce a way to start and end the headless browser separately from all the tests. That way each test doesn't spawn its own headless browser, which should cut down on overhead significantly.
Additionally some tests could be run in parallel without requiring their own context, e.g. tests in workers. But that would require some significant refactoring.
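To illustrate the idea (all types here are placeholders, not the runner's actual API): a single browser instance could be lazily spawned once and reused by every test.

```rust
use std::sync::OnceLock;

// Illustrative stand-in for whatever drives the headless browser.
struct HeadlessBrowser;

impl HeadlessBrowser {
    fn spawn() -> Self {
        // In the real runner this would start the WebDriver/browser process.
        HeadlessBrowser
    }
}

static BROWSER: OnceLock<HeadlessBrowser> = OnceLock::new();

fn shared_browser() -> &'static HeadlessBrowser {
    // The first test pays the startup cost; later tests reuse the session,
    // cutting the per-test overhead discussed above.
    BROWSER.get_or_init(HeadlessBrowser::spawn)
}
```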
Architecture:
- I would prefer to organize the code into:
-- a folder with the CLI and environment stuff
-- a folder with the wasm handling
-- a folder with the runtimes
Considering that the changes aren't actually that big, I would prefer if we do any further refactoring in a separate PR.
> @daxpedda I have been trying to create a macro to simplify the tests; this will be particularly useful for tests that should be run over the different supported runtimes.
I think that's a great idea, but let's not use the name `feature`, because it's unclear exactly what that means. Simply `test` or `multi_target_test` (the term target is clear in this context) would be enough, I believe.
Again, I apologize for the delay. I spontaneously extended my vacation and after coming back my backlog was so huge I had to prioritize other things until I reached this PR.
```toml
[dev-dependencies]
```
I would prefer if we move any changes to Deno to a separate PR to make it easier for me to review.
I have no problem with that, but you will have to guide me on the best way to do that.
It wasn't my intention; I thought it was working, so I added a test for it, and then it needed two fixes, without which it definitely wasn't working, as you can verify.
Deno is different than the browser issue; the main reason I don't want it in this PR is that I can't review it.
So I'm fine with proceeding here simply without testing Deno.
EDIT: Maybe I'm missing something, but my assumption here is that without any changes to Deno everything will still work, it's just not tested on Deno.
```rust
None if std::env::var("WASM_BINDGEN_USE_BROWSER").is_ok() => TestMode::Browser {
    no_modules: std::env::var("WASM_BINDGEN_USE_NO_MODULE").is_ok(),
},
```
Why this change?
I think we could at least discuss and do this in a separate PR.
I believe I added it to be able to execute the tests in the browser without having to require the runtime in the code; that way you can check whether your assembly runs in all runtimes without having to recompile it.
This is a great feature, but still belongs in a separate PR.
This is the same issue as with the Deno fixes: I didn't plan for them. Both had the same underlying objectives:
- add tests to make sure the application remained working as close as possible to the original, because of the changes implied by docopt and nextest
- enforce that the nextest API is implemented consistently across all the supported runtimes
That being said, I actually do agree with you, but please advise me on the best way to do this.
If I open PRs for them, once they are approved and merged, should a rebase to the latest version work?
I see ... I guess the issue is that running the tests on all targets would be more complicated without this change.
I'm fine with leaving it in this PR in this case, but it needs some added documentation.
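Based on the diff above, that documentation could note that setting `WASM_BINDGEN_USE_BROWSER` forces tests into a browser even when the code doesn't request one, with `WASM_BINDGEN_USE_NO_MODULE` additionally selecting the no-modules output, e.g. running `WASM_BINDGEN_USE_BROWSER=1 cargo test --target wasm32-unknown-unknown` against an already-compiled test suite (invocation illustrative).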
```diff
@@ -23,6 +23,7 @@ bin-dir = "wasm-bindgen-{ version }-{ target }/{ bin }{ binary-ext }"
 docopt = "1.0"
 env_logger = "0.8"
 anyhow = "1.0"
+fs2 = "0.4.3"
```
I assume this was an accidental leftover?
I don't remember, I'm going to review that.
```rust
Some(Some(_)) => "$",
Some(None) => "$",
```
Suggested change:
```rust
Some(_) => "$",
```
Yes, this was a leftover that I didn't clean up, sorry.
@daxpedda That's okay, the only issue is that the details aren't as fresh as they were. I already improved the tests a lot, and I hope I'm able to improve them a bit more still; my objective is to make them clearer, more intuitive, and easier to change. I haven't pushed those changes yet, because there are some things remaining in the macro. I have no issues with the naming, I just use naming from BDD, but different perspectives always enrich the solution. I'm going to finish those; once I push that, I'm going to tackle your requests one by one.
Just a quick update: I was able to move forward with the multi_target_test model, and I'm still fleshing out details in the library as I convert the existing tests I created. But it's pretty much working; right now it already runs on all the runtimes that we specify, and, as expected, it is flushing out runtime-specific issues and inconsistencies. One of the problems was that the browser was adding extra tabs to the output (I used cargo test with a #[test] as reference); it's fixed now. The other issue I'm having is some instability in the Safari driver, unfixed for now. But aside from that, the model seems to work well to ensure there is consistent behaviour across all runtimes, though I should know more as I move forward. I'll try to finish this part ASAP; I believe the only blockers will be the issues it uncovers.
FWIW, I've been having spurious failures using safaridriver lately as well; so far the only error I encountered is a port-binding issue. My guess here is that the new ARM macOS runners have some issue with binding random ports in quick succession, or an issue around that with the driver itself.
I was able to trace down the issue I was having: it was caused by multiple parallel Safari invocations; once I added a lock to only allow one at a time, the problem was gone. That lock doesn't limit test execution in practice, as Safari by design only allows one session at a time. The issue seems to be triggered by each safaridriver launching a `Safari --automation` instance, and for some reason when that happens, even with workarounds, they have to be manually terminated for the execution to continue. This solution seems cleaner, but the `Safari --automation` remains in memory, as it did before. I removed most experiments but left one: I moved the BackgroundChild inside the Client. I did it because I was having some issues with the drop order, and left it in because the code was cleaner; I'm not sure if those issues remain. I'm going to add some more Safari tests to stress it more, then I'll add a parallel PR like I did with Deno.
I still ended up hitting some more instability, which led me to beef up the lock; it seems pretty robust now. But sometimes `Safari --automation` still refuses to allow the creation of a new session, so in those situations I updated the runner to kill it and ask safaridriver for a new session, forcing it to initiate again; that seems to fix the remaining issues. I tested terminating the `Safari --automation` on every wasm-bindgen-test-runner execution, but that imposes a severe delay on overall test execution. Right now, as I convert more tests into the multi_target_test mode, I'm dealing with some inconsistencies of output between the targets, but the Safari instability seems at least a lot better.
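A minimal sketch of that acquire/kill/retry flow, with all helpers and names hypothetical (the actual lock and WebDriver plumbing live in the runner):

```rust
use std::process::Command;

// Illustrative placeholders, not the PR's actual API.
struct Session;
#[derive(Debug)]
struct SessionRefused;

fn request_session_from_safaridriver() -> Result<Session, SessionRefused> {
    // In the runner this negotiates a WebDriver session with safaridriver.
    Ok(Session)
}

// Callers are assumed to already hold the exclusive Safari lock, since
// Safari only allows one automation session at a time.
fn acquire_safari_session() -> Result<Session, SessionRefused> {
    request_session_from_safaridriver().or_else(|_| {
        // A stale `Safari --automation` instance is likely refusing the new
        // session: terminate it and ask safaridriver again (kill command
        // illustrative only).
        let _ = Command::new("pkill")
            .args(["-f", "Safari --automation"])
            .status();
        request_session_from_safaridriver()
    })
}
```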
@spigaz I believe you are spending quite a significant amount of time on testing facilities that don't have to be this perfect compared to all the other untested features.
@daxpedda That wasn't my objective, but nextest implied changes in many different places, including in the argument handling. As a rule I try to add tests before introducing changes, to make sure I don't make things worse. Because of that and the nextest tests I ended up with a lot of tests, and when I generalised them I found even more issues that I tried to fix. Right now I'm just trying to iron out some issues with Safari to avoid breaking the CI; it seems okay already, perhaps some starvation in the Safari instance access. But my point is to handle it like we did with the Deno PR, using the test base as reference.
The testing you are trying to implement here is greatly appreciated, but it is of way higher quality than this repository is used to. What I'm trying to say is that bringing the PR to a mergeable state could be way simpler if the tests are in a comparable scope to the rest of the repository. Extending and improving tests can be done in a separate PR. Of course, I will leave the decision on how to proceed here to you; I'm certainly not against getting this well tested in the same PR!
@daxpedda I understand your reasoning, and to be honest, I'm trying to narrow things down to make it easier for you to review. The problem is that cargo nextest stresses wasm-bindgen-test a lot (in my repo, 771 times), which means that all the instability issues pop up. Anyway, for now I'm not able to trigger issues with any of the supported runtimes... I was finally able to remove the hacks I had added to get cargo nextest working with the shared directory, by using a ResourceCoordinator. I just have some minor things left, then I'm going to let you take the lead on this, and I can create as many PRs as necessary.
This looks great! Has it stalled?
@ifiokjr No, it's done, although now I have some merge conflicts to solve. I have just been working on a way to make the tests more intuitive, to see if it eases the merging. I just haven't pushed it, because it didn't seem to be a priority for anyone and it would break the existing tests, but if it's a priority for you, I can try to speed things up...
This is still work in progress:
https://nexte.st/book/custom-test-harnesses.html
Progress
Updated to use clap
Updated the macro wasm_bindgen_test
Testing
I wasn't sure where the best place for the tests was, because of the custom test runner in the cli crate, so I placed them in the main tests folder.
I used a variation of BDD that I have used for many years now.
-- Although it seems a bit more verbose, it makes writing tests a lot simpler and faster; anyone can understand them and add new ones.
-- When something breaks it's very easy to understand why.
-- Although counter-intuitive, in practice I have found that they are a lot easier to update during refactors.
Overhead
Architecture:
-- a folder with the CLI and environment stuff
-- a folder with the wasm handling
-- a folder with the runtimes