Redesign the Builtin Rule API #453
Comments
I hope this can also help make progress with this issue: snowleopard/hadrian#217. Ideally, I need a way to implement something like this:

```haskell
-- Remember a value associated with a key and force a rebuild if the value has changed.
remember :: Int -> Int -> Action ()
remember key value = ...

checkArgsHash :: Target -> Action ()
checkArgsHash target = do
    args <- interpret target getArgs
    remember (hash target) (hash args)
```

At the moment we implement:

```haskell
checkArgsHash :: Target -> Action ()
checkArgsHash target = do
    _ <- askOracle $ ArgsHashKey target :: Action Int
    return ()

-- Oracle for storing per-target argument list hashes
argsHashOracle :: Rules ()
argsHashOracle = void $
    addOracle $ \(ArgsHashKey target) -> hash <$> interpret target getArgs
```

Storing all complete …
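For context, a minimal sketch of the key newtype the snippet above assumes — my guess, not Hadrian's actual definition — relying on Target already having the usual instances:

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving, DeriveDataTypeable #-}
import Development.Shake.Classes

-- Hypothetical oracle question wrapping a Target; assumes Target already
-- has Hashable/Binary/NFData instances to derive from.
newtype ArgsHashKey = ArgsHashKey Target
    deriving (Show, Eq, Hashable, Binary, NFData, Typeable)
```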
Questions and comments:
I'll also note that this is really close to what I have in my …

Thanks for the comments!

Great that we seem to be converging on an answer!
I don't think this is actually an ability. Suppose A depends on B unconditionally. If B changes, and A does not check its dependencies, then A is in an inconsistent state. And if A changes, you do not get any optimization by skipping the check for B, because A will call …

For conditional dependencies, all that matters is that Shake checks them in the same order as the rule did; i.e. … You mentioned changing Shake to do the …
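To make the ordering point concrete, here is a hedged example of a conditional dependency (the rule and file names are invented):

```haskell
import Development.Shake
import Control.Monad (when)

-- "out" depends on "extra.dat" only when "mode.txt" says so. On a rebuild
-- check, Shake must re-verify "mode.txt" before it can even know whether
-- "extra.dat" is a dependency at all - i.e. in the same order the rule ran.
outRule :: Rules ()
outRule = "out" %> \out -> do
    mode <- readFile' "mode.txt"
    when (mode == "full") $ need ["extra.dat"]
    writeFile' out mode
```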
I was thinking of a version of …

That explains …

Oh, and there's no need for ShakeOptions as an argument; it's in the Action monad, so …

But this only makes sense if there is a previous run. How about passing in the dependencies of the previous run, and letting the rule decide to either pass them through or create a new list?
@ndmitchell But I think I've shown the solution already, or is it wrong? All I need is this function to be implementable using Shake's rules:

```haskell
-- Remember a value associated with a key and force a rebuild if the value has changed.
remember :: Int -> Int -> Action ()
remember key value = ...
```

With this function we can implement:

```haskell
checkArgsHash :: Target -> Action ()
checkArgsHash target = do
    args <- interpret target getArgs
    remember (hash target) (hash args)
```

There are several cases to consider: …

So, if I haven't made a mistake above, we don't need to store …

P.S.: I am pretty sure …
My suggestions: …

Now, let's see if this solves #388. I'll consider rules which have the map …
OK, so I am convinced this does solve the problem. It does look annoying to write these functions, though.
Replying to @Mathnerd314
I agree this is the same as #427. If we decide in #427 that it's reasonable to check the stored value afterwards, then this should be a Bool. Let's discuss that particular issue there.
What do you imagine the use case for that is? It's certainly possible to do, I just can't think why it's useful - usually more fine-grained rules solve the problem in a simpler way.
The person calling …
The builtin rule will be pre-applied to the first two arguments, since given the Skip/Ignore/Rebuild flags in the options, it may be able to precompute some faster data structure to look up those values. Similarly, given the matches it may be able to precompute something. As a result, it's not able to run Action stuff at that point.
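A sketch of that staged shape, with all names invented (this is not the final API, just the currying being described):

```haskell
import qualified Data.ByteString as BS
import Development.Shake (Action)

data Options      -- stand-in for the Skip/Ignore/Rebuild flags
data Matches key  -- stand-in for the collected user-rule matches

-- The first two arguments are supplied once, at rule-construction time, so
-- the rule can precompute a faster lookup structure there; only the final
-- 'key -> Maybe stored -> Action value' part runs during the build.
type BuiltinRule key value
    = Options -> Matches key -> (key -> Maybe BS.ByteString -> Action value)
```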
That's certainly a design decision I considered, but it requires a value-level representation of dependencies, which otherwise isn't necessary.

@snowleopard I still don't see how this works. You need to define an operation that, given a key, says whether the value is still valid. From the hash of a target you can't compute the expected result, so you have to store the whole target. I'd love to see some kind of prototype using stamp files. If you can do it with stamp files, I'm sure this interface will let you do it more efficiently without.
Probably not a bad idea. Would also make it explicit that there are two staged input arguments, answering your point from item 3.
Phony rules are encoded as 0-byte storage, and are always considered to have rebuilt. It's the mirror of https://github.com/ndmitchell/shake/blob/master/src/Development/Shake/Experiment/Rules.hs#L180

Certainly this opens up a class of bugs, but the idea is that this is the most efficient interface possible, and then sugaring up to some kind of Changed value is quite reasonable. For … For …
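As a guess at what such sugar might look like (hypothetical, not a settled design), the result of a run could be reported as an enumeration rather than raw bytes:

```haskell
-- Did this run change anything that dependents need to care about?
data Changed
    = ChangedNothing        -- stored value still valid; dependents may skip
    | ChangedRecomputeSame  -- recomputed, but the result is equal to before
    | ChangedRecomputeDiff  -- recomputed and genuinely different
    deriving (Show, Eq)
```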
Yep, this is the low-level substrate which hopefully we can sugar up to something easy. The fact it's possible is the thing to dwell on 😄.
I've implemented most of this in my branch; it uses Binary / ByteString.Lazy instead of a new Encoding class, and there are other differences, but the general sense is there. My …

I think I get what @snowleopard's asking; it's possible in current Shake with a new cache primitive that uses a bimap. You would have:

```haskell
-- Core.hs:
newBidirectionalCacheIO :: (k -> Action v) -> IO (k -> Action v, v -> IO (Maybe k))

-- Rules/TargetChecker.hs
module TargetChecker(makeTargetChecker) where

newtype TargetHash = TargetHash ...

hash :: Target -> TargetHash
hashResult :: whatever -> HashResult

makeTargetChecker :: Rules (Target -> Action ())
makeTargetChecker = do
    (targetToHash, hashToTarget) <- liftIO $ newBidirectionalCacheIO (pure . hash)
    askOracle <- addOracle $ \(TargetHash thash) -> do
        Just target <- liftIO $ hashToTarget thash -- always successful since it can only be called from below
        hashResult <$> interpret target getArgs
    return $ \target -> do
        thash <- targetToHash target
        _ <- askOracle thash
        return ()
```

So really it belongs in a separate issue. Although, with the new rule API, having a …
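A minimal sketch of how that primitive could be implemented, assuming Hashable values and ignoring eviction and forward-direction memoisation (which a real cache would want):

```haskell
import Control.Monad.IO.Class (liftIO)
import Data.Hashable (Hashable)
import Data.IORef
import qualified Data.HashMap.Strict as Map
import Development.Shake (Action)

-- Every forward computation records the reverse mapping v -> k, so the
-- backward lookup succeeds for any value the forward direction produced.
newBidirectionalCacheIO
    :: (Eq v, Hashable v)
    => (k -> Action v)
    -> IO (k -> Action v, v -> IO (Maybe k))
newBidirectionalCacheIO compute = do
    ref <- newIORef Map.empty
    let forward k = do
            v <- compute k
            liftIO $ atomicModifyIORef' ref $ \m -> (Map.insert v k m, ())
            return v
        backward v = Map.lookup v <$> readIORef ref
    return (forward, backward)
```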
Pluto uses it in their C compile rule. Don't look too closely at it because I think it's buggy, but the general idea is sound. The rule is formulated as many-to-many (a single rule that compiles all files), but it only recompiles the C files that changed. So a directory that has …
Awesome @Mathnerd314! I'll take a look and see exactly what you've done. Any reason for going via Binary instead of Encoder? My benchmarks have shown Binary to be a bottleneck, hence the desire to switch to Encoder and avoid the performance penalty. Any thoughts on anything that did/did not work particularly well? Any hidden traps we hadn't considered?
I used Binary because it was already there, whereas using Encoder would require writing a new class and changing more files. And Binary is advertised as 'efficient'; in theory it should optimize to something close to your pointer arithmetic. In practice, well... it's probably worth benchmarking again, because the function layout is significantly different, so GHC may find better optimizations or lose old ones. But I expect that Shake will need to be significantly rearchitected again to support file-system watching, so I don't think it's needed yet. In terms of changes, it was relatively simple, although tedious; your proposal is really a series of changes (APIs become monomorphic, the Rule class goes away, etc.), so they decomposed well. I still haven't run the testsuite, so there could be some weird serialization or type coercion failures. But most of the changes were just moving around code rather than a particularly grand redesign (user rule matching being a mild exception). I didn't implement the whole API though; it's still tweaks here and there.
Benchmarking shows Binary to be a bottleneck. My intention is to present a Binary interface to the outside world (probably), but use Encoder underneath and implement Encoder using Binary for user types. Binary is nowhere near my bytestring code, partly because it can't be - it starts off with lazy bytestrings, has to be able to consume a prefix (my encoders rely on the length), etc. See https://www.fpcomplete.com/blog/2016/03/efficient-binary-serialization for benchmarks - although I came to the conclusion Encoder would be necessary before that blog post.

That said, it's certainly reasonable to get it all working on top of Binary first, then as a second step move to Encoder - no reason to break everything in one go.

@Mathnerd314 - how do you suggest I proceed? Are the patches in your branch clean enough to move over wholesale? Should I diff our trees and move diffs across? Should I reimplement but using yours as a guide?
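The Encoder class is never spelled out in this thread; my guess at its minimal shape (hypothetical) is a pair of total functions over strict ByteStrings:

```haskell
import qualified Data.ByteString as BS

-- Hypothetical: strict, length-aware (de)serialisation. Unlike Binary,
-- decode receives exactly the bytes encode produced, so fixed-width
-- encoders can rely on the ByteString's length instead of storing it.
class Encoder a where
    encode :: a -> BS.ByteString
    decode :: BS.ByteString -> a
```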
After looking at Binary more, I agree it's a poorly designed API. But it's more work to implement a new API, and I don't think Encoder is sufficient. So now I'm writing my own serialization API, using lenses... (on a local branch) As far as proceeding, at this point it's probably less work to merge my branch than to reimplement it or move patches one at a time. The diff is relatively readable, except for the Core / Core2 and Database / Database2 splits, which can be worked around with …
I recommend that you go through the diff and comment on any changes you don't like; I'll fix/reverse those in new commits, then you can merge it, either as a squashed commit (there are lots of half-baked commits and false starts) or with a normal git merge.
@Mathnerd314 I have to admit I don't understand your solution, but I'll meditate on it when I'm back to fixing …
I was curious what the status of this was, so I looked through master: …
Thanks for that summary - a good todo list - and I confirm all those things are still in progress. A couple of questions:
The current plan is most definitely Database refactoring. That code is important and complex, so I want to simplify it, then get rid of the execute/build/check distinction. After that everything else looks reasonably straightforward.
You have user rules as an extra argument to execute:
I didn't pass user rules at all; I stored user rules as part of Action's Global structure, and added two functions (userRule & getUserRules) to retrieve them.
The general expectation of internal modules is that they are exposed, so that you can actually use them in the rare cases where this is necessary. See for example …
Ah yes, there are lots of argument details; I copied about 90% of yours and then differ in this part. The reason is I want the builtin rules to be able to ask for all the user rules in advance, run advanced/time-consuming optimisations on them once, and then use them lots of times. That necessitates the rules being available during builtin rule construction rather than in Action. I don't yet do any such optimisations, but the hope is that if profiling ever shows up rule matching, I could compile a single finite state automaton from all the rules and match them in time linear in the length of the path, rather than linear in the number of rules.
I appreciate this is the standard pattern in Haskell, but I find it quite distasteful - it massively increases the API, adds a lot of ambiguity around semantic versioning, and means I can't easily figure out what people are doing with Shake. Instead I prefer to be responsive: if people need internals exposing for good reasons, then expose them properly and to everyone, and ideally very quickly. I'm certainly going to expose things like FileA in some form by the end, but not just everything in Internal. As an example of where exposing internals goes wrong, consider Text, which has 27 internal modules, including things like Data.Text.Internal.Builder.Int.Digits, which consists of a longish ByteString with no clear semantics...
As a concrete example of in-advance rules: for file rules, I sometimes generate a Set data structure if there are lots of literal paths to match. At the moment that has to be one separate Set per …
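A hedged sketch of that idea, merging the literal paths of many rules into one shared Set and keeping only genuine patterns for the linear scan (CompiledRules and both field names are invented):

```haskell
{-# LANGUAGE RecordWildCards #-}
import qualified Data.HashSet as Set
import Development.Shake (FilePattern, (?==))

-- Literal paths from all rules share one Set, so an exact match is a
-- single lookup; only wildcard patterns are tried one by one.
data CompiledRules a = CompiledRules
    { literals :: Set.HashSet FilePath   -- exact paths, merged across rules
    , patterns :: [(FilePattern, a)]     -- wildcard rules, scanned linearly
    }

matches :: CompiledRules a -> FilePath -> Bool
matches CompiledRules{..} path =
    path `Set.member` literals || any ((?== path) . fst) patterns
```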
Just backing up my in-progress notes, which I've been leaving in an unsaved text buffer for a few weeks, but now have to reboot: https://gist.github.com/ndmitchell/904306020655edb3b56caadc286ba8ee (I'm expecting them to be unintelligible to anyone but me!)
It seems quite readable; here are my notes for comparison: https://gist.githubusercontent.com/Mathnerd314/e9e9f0d667270eaf37a9bfced53c8b58/raw/4ad976d083f102107133ae640722ae8c0bb5a343/gistfile1.txt
I switched to running storedValue in the thread pool. With -j1 it goes twice as slow for checking 1000s of filetimes. At -j4 it goes slightly faster. The bottleneck is reported less in the pool and more in the waiting/rendezvous code, which might need some microoptimisation. However, such optimisation can wait - it's certainly in the ballpark of fast enough. Progress continues towards the end goal.
How is this going? Still waiting so I can port my code to a newer version of Shake 8) |
Progress continues - I think the …
I'm 80% through the implementation and hope to finish end of Jan. |
I think the API is completely rewritten now, so this can be closed. |
Various tickets have led me to the conclusion that I should redesign the Rules class. This ticket is to discuss that endeavour. Notes:

… *.exe files.

The code is all under https://github.com/ndmitchell/shake/tree/master/src/Development/Shake/Experiment. Interface.hs is the proposed API, and Rules.hs is my implementation of equivalent existing rules under that API, mostly stubs.
CC @ezyang and @Mathnerd314, whose requests prompted this redesign.