[Feature request] Support custom caching for non-reproducible actions? #573
Is this actually a build problem? Shouldn't your implementation in … It seems to me that the part that you really care about is not the …
It may not be a build problem but, more generally, an automation problem, though that might just come down to semantics. I'd be quite happy for this to look something like:
It's true that you could use a consistent seed for fuzzing and just fuzz n iterations, and in that case it's possible to get reproducible outputs. However, there are still use cases where it's nice to have a full execution graph (which buck2 provides via DICE) in which each node is not necessarily reproducible. A more concrete (though still toy) example of a non-reproducible execution graph might include:
But let's say that the scanners take 40 minutes to run and don't need to be run all that often. Some caching would be great, but you wouldn't want the results to be cached indefinitely (which is what buck2 would do). This execution graph is also non-reproducible by definition, because the website being scanned is outside of your control. Buck2 solves 90% of this automation problem by handling execution graphs, remote execution, caching, etc. I'm aware that this doesn't necessarily fit with buck2's primary goal of being a build system, but it has enough overlap that I find it interesting as a generalised declarative automation framework. Does this sound too far out of left field for buck2? I'm aware that this is kind of build-system adjacent.
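A minimal sketch of "cache, but not forever", assuming a simple in-memory cache keyed by a string with a fixed maximum age. `TtlActionCache`, `CachedResult` and their fields are hypothetical names for illustration, not a buck2 feature:

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class CachedResult:
    value: str
    created_at: float  # clock reading when the action last ran


@dataclass
class TtlActionCache:
    """Toy action cache whose entries go stale after max_age_s seconds.

    Hypothetical illustration only; this is not a buck2 API.
    """

    max_age_s: float
    clock: Callable[[], float] = time.time  # injectable so expiry can be tested
    _entries: Dict[str, CachedResult] = field(default_factory=dict, init=False)

    def run(self, key: str, action: Callable[[], str]) -> Tuple[str, bool]:
        """Return (result, was_cached), re-running the action if the entry is stale."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry.created_at < self.max_age_s:
            return entry.value, True
        value = action()
        self._entries[key] = CachedResult(value, now)
        return value, False
```

With `max_age_s=3600`, a scan requested 30 minutes after the last run is served from the cache, while one requested a full hour later re-runs the action. Injecting the clock keeps the expiry logic testable without waiting in real time.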
This is sort of true, although I think what I'm hoping for is something like …
Yeah, so we've talked about adding support for this kind of thing before, primarily under the name "volatile actions." I think the hypothetical API is that when you call … The use case that we had in mind at the time is better integration with system toolchains; for example, maybe you want to invalidate all your Rust library builds when you upgrade your rustc version. You could define a volatile action that prints the rustc version into a file, and then add that file as a never-read input to every rustc action. I think the vibe on volatile actions is basically positive. It just needs someone to go and write some code, I think.
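A sketch of what the body of such a volatile action might do, assuming it simply runs `<tool> --version` and writes the result to a stamp file. The function name and stamp file name are hypothetical, and how the action is declared volatile in buck2 is not shown because no such API exists yet:

```python
import subprocess
from pathlib import Path
from typing import List


def write_toolchain_stamp(tool_cmd: List[str], stamp_path: Path) -> str:
    """Body of a hypothetical 'volatile action': record a tool's version string.

    Intended to re-run on every build; its output file would be a never-read
    input of each compile action, so only a version change alters their inputs.
    """
    result = subprocess.run(
        [*tool_cmd, "--version"], capture_output=True, text=True, check=True
    )
    # Some tools print their version on stderr, so fall back to it.
    version = (result.stdout or result.stderr).strip()
    stamp_path.write_text(version + "\n")
    return version
```

In the rustc scenario, `tool_cmd` would be `["rustc"]`. If dependents are keyed on the contents of their inputs, rewriting an identical stamp on every build leaves them untouched, and only a real upgrade invalidates them.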
These two seem like they could be implemented on top of the first one. You can have a volatile action that writes the current timestamp / 3600 to a file, and then depend on that file from every other action: at the top of the hour, the contents of that file change and your actions get invalidated. I suppose that's not exactly the same as "expire after 1 hour," but it's pretty close. If you don't care about RE, then you can modify this scheme to use incremental actions and get exactly those semantics (have an action that writes the current timestamp to its output if it's been more than 1 hour since the timestamp currently written there).
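Both schemes can be sketched as plain functions, assuming the stamp is an ordinary file and times are seconds since the epoch. All names here (`hour_bucket`, `write_bucket_stamp`, `refresh_incremental_stamp`) are hypothetical, and the buck2 wiring (volatile or incremental action declarations) is deliberately left out:

```python
import time
from pathlib import Path
from typing import Optional

BUCKET_SECONDS = 3600  # one hour


def hour_bucket(now: float) -> int:
    """A value that only changes at the top of each hour (epoch-aligned)."""
    return int(now // BUCKET_SECONDS)


def write_bucket_stamp(stamp: Path, now: Optional[float] = None) -> str:
    """Volatile-action body: write timestamp // 3600 to the stamp file.

    Dependents are invalidated when the hour ticks over, even if they ran
    only a second earlier.
    """
    now = time.time() if now is None else now
    contents = f"{hour_bucket(now)}\n"
    stamp.write_text(contents)
    return contents


def refresh_incremental_stamp(stamp: Path, now: Optional[float] = None) -> bool:
    """Incremental variant: keep the previous output unless it is over an hour old.

    Returns True when the stamp was rewritten, i.e. dependents are invalidated.
    """
    now = time.time() if now is None else now
    if stamp.exists():
        last = float(stamp.read_text())
        if now - last <= BUCKET_SECONDS:
            return False  # still fresh: leave the old timestamp in place
    stamp.write_text(repr(now))
    return True
```

The bucket scheme invalidates at fixed wall-clock boundaries (a stamp written at 3599 s changes at 3600 s), whereas the incremental variant measures a full hour from its own last rewrite, which is the "expire after 1 hour" behaviour. The incremental variant relies on reading its previous output, which is why it sits awkwardly with remote execution.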
This one I'm a bit more hesitant on. My concern isn't around the caching, though, but rather around action execution management. Action executions are currently tied clearly to the lifetime of a single command: they are executed as part of that command, need to finish before the command can finish, and are cancelled if the command is cancelled. What you're suggesting seems like a deviation from that, which I think is probably hard to do correctly, both in principle and in practice.
I've thought about the fuzzing thing a number of times, and I sort of came to the conclusion that you probably want to fix the seeds in your fuzzing tests and try to have a reasonable number of them if you expect them to run under … But "volatile actions" are also really useful for a lot of other random things where a program may need to invoke some kind of ambient side effect on the system, which can actually be used to improve the precision of dependency tracking. When combined with early cut-off, a lot of the time they aren't so bad, like this example:
This is actually a great example; I used to do this all the time when using Shake (through a feature called "Oracles"). I think it's really important for some cases. For example, let's say a user builds a project with … In C or C++, this kind of mistake isn't so bad, because they have de facto stabilized ABIs. This exact case can happen today in Buck2 with …
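A toy model of the volatile-action-plus-early-cut-off idea, assuming a single oracle node whose answer is compared by digest between builds. The class name, version strings and target labels are hypothetical, and this is not how Shake or buck2 actually implement it:

```python
import hashlib
from typing import Callable, List, Optional


class OracleNode:
    """Toy volatile 'oracle' with early cut-off.

    The oracle (e.g. "what does `rustc --version` print?") is re-queried on
    every build, but its dependents are rebuilt only when the answer changes.
    """

    def __init__(self, query: Callable[[], str], dependents: List[str]) -> None:
        self.query = query
        self.dependents = dependents
        self._last_digest: Optional[str] = None

    def build(self) -> List[str]:
        """Re-run the oracle and return the dependents that must be rebuilt."""
        answer = self.query()  # volatile: never cached
        digest = hashlib.sha256(answer.encode()).hexdigest()
        if digest == self._last_digest:
            return []  # early cut-off: same answer, nothing downstream re-runs
        self._last_digest = digest
        return list(self.dependents)
```

The oracle itself is cheap and always re-runs; what early cut-off buys is that an unchanged answer stops the invalidation from propagating, so only a genuine toolchain change triggers a rebuild of everything that depends on it.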
I've been experimenting with using non-reproducible systems in build systems like bazel and buck2. Buck2 and many modern build systems make the assumption that all build actions are reproducible. However I'd really like to use buck2 with some non-deterministic build actions. I'm fully aware that an enormous amount of work has gone into making buck2 reproducible and hermetic, and I'm aware that what I'm suggesting here runs somewhat counter to that effort.
Would the buck2 team be against having an optional/experimental configuration that allows for more direct control over cache-artifact lifetimes? I could see the following caching strategies being useful at a "rule" level:
This might look something like:
Use cases
I'm a cyber-security researcher and I regularly conduct scans, do local fuzzing, etc. Something that I find excellent about buck2 is the ability to create complex, repeatable execution graphs. Currently buck2 works great as a declarative build tool where each execution in the execution graph is repeatable (e.g. clang). However, it would be great to combine this with non-reproducible execution graphs as well. This would be useful for: