RFC: Out-of-tree test suite #3557
Conversation
This PR proposes the creation of an external stand-alone test suite and associated tools for verifying the Rust toolchain that also fits into existing Rust development workflows.
I think that having some form of standalone end-to-end testing is a great idea. Also nice for other implementations, e.g. gccrs seems to be interested: https://gcc-rust.zulipchat.com/#narrow/stream/266897-general/topic/RFC.3A.20Out-of-tree.20test.20suite. A few questions/notes:
A downside of tests in a submodule is that the workflow for adding tests alongside a feature gets less ergonomic: two places to commit, which can potentially get out of sync. Perhaps the main test suite source could still live in
I think such a big RFC is the wrong way to tackle this. As far as I understand, the problem being solved here is that you want to be able to test the distributed toolchains.
I think there are simpler ways to solve this problem, for example by just shipping the current testsuite as-is with some changes to tooling that would allow testing distributed toolchains.
Instead of bikeshedding about this RFC, I think it would be more productive if you came to Zulip t-compiler with a precise problem description and we could then brainstorm a nice solution from scratch, considering all the potential downsides.
Users and automation would install the OOTTS and invoke its test driver to verify tools and libraries of their choosing.
Rust developers would use the OOTTS in-tree as a git submodule of `rust-lang/rust`. The existing Rust infrastructure would delegate the building, testing, and packaging of the OOTTS to the OOTTS infrastructure and test driver.
It needs to be a subtree, not a submodule, so that people can add and modify tests in their Rust PRs.
Based on your description of the problem, I am slightly worried that in general you have a different idea of how the build process works than what actually happens, both for toolchains and for actual executions of rustc involving cargo.
Industrial use of language toolchains requires assurance that the toolchain that is distributed has been rigorously tested and is guaranteed - even warranted - to work correctly. To achieve this level of assurance, the toolchain is built and packaged for distribution, installed on a completely clean machine, and then tested. This separation of concerns, which mimics the way a user will interact with the toolchain, ensures that the artifacts distributed are the artifacts tested.
Currently, the Rust infrastructure does not support this behavior. Instead, testing validates intermediate states of the Rust toolchain as they exist in the build tree. An intermediate state may contain leftover artifacts from a previous build that could influence the validation. The Rust infrastructure may be directed to produce a malformed or incomplete tool (e.g., `rustc` with a missing `libstd`), and yet hold the belief that the tool has successfully passed validation.
The "intermediate states" you describe often have more integrity than the actual release binary: we have had several bugs from combining, e.g. BOLT and strip
(yes, binutils strip!) and performance-guided optimization. I believe we added a final test run at some point for these multiply-optimized binaries (@Kobzol would know better, perhaps?) to catch them having bugs, but it has generally not been the case that the final release binary should be given enormously more confidence than just ./x.py clean && ./x.py build --stage 2
(with the dist configuration set in the config.toml, I suppose? my understanding of the argument in favor of release binaries has been they have better UX, not their trustworthiness).
Of course, that only adds to the importance of making running the tests with it easy, but still, it feels like a misframing.
That confidence, however, relies heavily on the correctness of post-test processing. If there is no post-install verification, then there is no way to detect a post-test bug in the infrastructure, which also includes pilot errors (forgetting to add `rust-std` to a `./x dist` call).
The problem of "obvious" post-install errors is easily solved with a few dozen lines of bash running all tools on hello world. For less obvious post-install errors, like a miscompilation, running the normal test suite on the in-tree code (like the current "post-optimization" tests) is already done today.
Basically, the thing this OOTTS would be testing that the current test suite isn't testing is `./x dist`. It would be nice to have more tests for it, but writing a test suite for that does not involve moving all UI tests etc., but instead creating something (a lot smaller) specifically for that, which just does some basic tests. We're not concerned about the compiler subtly breaking from the `tar -c && tar -x`; we're more concerned about components missing altogether or being completely wrong in a way that's detected with a simple test.
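As a rough illustration of the kind of "few dozen lines of bash" check described above (the file names and the particular set of tools exercised are assumptions for the sketch, not an existing script), it might look something like this:

```bash
#!/usr/bin/env bash
# Hypothetical post-install sanity check: run the installed tools on a
# hello-world program and fail loudly if any invocation does not succeed.
set -euo pipefail

rustc --version
cargo --version

# Plain rustc: compile and run a one-file program.
cat > hello.rs <<'EOF'
fn main() {
    println!("hello, world");
}
EOF
rustc hello.rs -o hello
./hello | grep -q "hello, world"

# Cargo: create, build, and test a fresh crate (no network needed).
cargo new sanity-check
(cd sanity-check && cargo build && cargo test)

# Other shipped components, if present.
rustdoc hello.rs -o doc-out
rustfmt --check hello.rs

echo "post-install sanity check passed"
```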
Such a small "install sanity check" test suite can still be in-tree just fine, optionally shipped as a component if you want. That wouldn't require nearly as much work as this RFC and could probably just be a compiler MCP (after talking to the compiler team about it).
I checked with our Certification team, and a workflow of build -> test -> package -> install -> sanity check is OK from the point of view of tool qualification.
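For concreteness, a minimal sketch of that workflow, assuming the standard `./x.py` bootstrap commands and a hypothetical `sanity-check.sh` script like the one above (exact tarball names, components, and install steps vary by channel, version, and target):

```bash
# 1. Build and test the in-tree toolchain.
./x.py build --stage 2
./x.py test --stage 2

# 2. Package the distribution artifacts.
./x.py dist

# 3. Install the packaged components into a clean prefix.
#    Tarball names are illustrative; they depend on channel, version, and target.
mkdir -p /tmp/unpack /tmp/prefix
for tarball in build/dist/rustc-*.tar.xz build/dist/rust-std-*.tar.xz; do
    tar -xf "$tarball" -C /tmp/unpack
done
for component in /tmp/unpack/*/; do
    "$component/install.sh" --prefix=/tmp/prefix
done

# 4. Post-install sanity check with only the installed toolchain on PATH.
PATH=/tmp/prefix/bin:$PATH ./sanity-check.sh
```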
Given that a dedicated sanity-check test suite would be considerably smaller in scope and effort, this RFC should probably be retired.
@Nilstrieb how should we proceed at this point?
Because of the issues I mentioned re: "rustc is a cross compiler and host and target remain build parameters", I am slightly concerned that a test driver might have to be compiled, for each host, for each target, in a combinatorially explosive fashion, as is the case (as far as I understand it) with libstd's artifacts? I suppose if we could get it down to one per target, that might be nice, but then we can't use rustup to distribute this "run the test suite" artifact and actually support all the targets that can run tests, because some targets don't support host toolchains, thus rustup on them is... nonsensical, if it even works... but they do support running rather a lot of the tests, which have to be compiled on some host and then run on the target.
I realize "run tests for a random target that can't host the toolchain but would support most of the test suite via the remote-test-client and remote-test-server" isn't exactly trivial, so improving the experience of doing that is certainly something I can get behind. Unless this proposal isn't actually concerned about such targets except as a "future possibility", and is only truly worried about hosts? In which case, it should probably say that, because there are more bins than "we ship a host Rust toolchain" and "this target doesn't support std". People are happy to accept "this isn't implemented for this target" and are disappointed to see "this was supposed to work for every target! ...but actually doesn't".
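For reference, cross-target runs of the in-tree suite today go through `remote-test-server` and `remote-test-client`; the sketch below shows roughly how that is wired up (the exact flags, port, address, and target triple are assumptions and may differ between versions):

```bash
# On the target device or emulator: start the test server.
# (Flag names and port are assumptions; check the remote-test-server sources.)
./remote-test-server --bind 0.0.0.0:12345

# On the host: tell the build system where the device is, then run the suite
# for the cross target; test binaries are shipped over via remote-test-client.
export TEST_DEVICE_ADDR="192.168.1.50:12345"
./x.py test tests/ui --target armv7-unknown-linux-gnueabihf
```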
my personal recommendation would be to make this RFC a living document e.g. on hackmd and then discuss this on zulip to figure out the details and potential smaller uncontroversial steps that you can implement immediately.
Ideally, that would be all tests, including codegen, UI, cargo, clippy, etc. For UI, I see several possibilities:
Given that the OOTTS is supposed to be a complete testing replacement, there will no longer be a
The OOTTS would run after the toolchain is built, packaged, and installed in a clean environment.
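A minimal sketch of what such a flow might look like, assuming an entirely hypothetical `ootts` driver name, flags, and install location (the RFC's actual interface may differ):

```bash
# Everything here is hypothetical: the driver name (`ootts`), its flags,
# and the install path are invented purely for illustration.
export PATH=/opt/rust-under-test/bin:$PATH

ootts run \
    --toolchain /opt/rust-under-test \
    --suite ui --suite codegen \
    --target x86_64-unknown-linux-gnu
```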
The problem that is being solved is to ensure that the currently performed post-test processing does not alter the validity of a toolchain. This basically amounts to testing an off-the-shelf toolchain.
If there are simpler solutions, then that would be great!
I will follow up on this.
My understanding, which is derived from poking around sources and looking at very verbose logs, and which could be very wrong, is that ultimately the testing portion of the infrastructure constructs a call to Is this close to reality? Note that even if the "right"
Unless I am misunderstanding, I do not think that this will be the case. The test driver would be compiled for each host. Since a testable cross Rust toolchain comes with its own
The proposal does take into account native and cross compilation, though it does not mention how to perform execution on the target (possibly via remote-test-client/-server).
Just to be sure: are you planning to do all the work that needs to be done for whatever solution we end up accepting?
We have had a few post-PGO/BOLT miscompilations which tests don't catch since those optimizations only happen on the
This has been fixed; we now run tests on post-PGO artifacts. The optimized artifacts are tested before they are shipped.
@oli-obk No, AdaCore will not be doing all the work, as we were hoping that this effort would be community-led.
@Nilstrieb Just to make sure I understand, the sanity checks you proposed are now performed?
What does that mean? To me, "community-led" means that someone from the community, like yourself or AdaCore, is doing the work.
While I do like the motivation, I don't think it is very high priority for volunteer contributors. To state this more controversially: Who other than AdaCore benefits from this work? I would recommend trying to find an existing contributor that you can contract to do this work. I do fear otherwise this RFC may, even if accepted, become nothing more than that. We've had various RFCs and MCPs before that were well motivated, but ended up lacking a dedicated implementor to actually do the work.
@oli-obk I suspect that Ferrous Systems (hello @skade, @pietroalbini !) would benefit from this work, as Ferrous Systems has and will continue to certify the Ferrocene toolchain for ISO 26262, and I assume they would care about the "Testing the artifacts to be distributed" and "Certification" motivations.
Thanks for the suggestion, I will pass this along.
@Nilstrieb Out of curiosity, what kind of test would detect an improper packaging of a tool or a library?
@m-ou-se @Nilstrieb @oli-obk @the8472 Assuming that AdaCore does all the implementation work on the RFC, what would it take to get the RFC accepted?
Compile hello world after extracting the tarball or something like that.
We'd want the best solution to your problems, which from what I've gathered so far, is not this RFC. If you list all the specific problems you're having, we may be able to come up with something better for each.
@Nilstrieb To understand the problem we are trying to solve with this RFC, we need to switch our viewpoint to the safety-critical software paradigm. In this world, human lives depend on the correct operation of software. In a worst-case scenario, a software error could lead to falling airplanes, exploding nuclear plants, etc. To prevent this from happening, compilers and their libraries and runtimes are certified against a software safety standard, such as DO-178C. The certification aims to build a high degree of confidence that the compiler will not generate wrong code under various conditions. To prove this, the compiler and the generated code are subjected to rigorous testing, where each test is linked back to requirements. Since nothing is without fault, known compiler issues are described and supplemented with detection, workaround, and mitigation strategies.
One way to achieve a high degree of testing confidence is to test the delivered version of the compiler. This method ensures that the compiler being certified is the exact same compiler used to compile safety-critical software. Our RFC is essentially trying to modify the existing testing process to achieve this methodology, while at the same time maintaining developer friendliness.
Given the complexity and limited value to the community at large, I am retracting this RFC. |