Kwxm/improve plc timing (SCP-1872) #2738
@@ -647,13 +687,14 @@ runEval (EvalOptions language inp ifmt evalMode printMode printtime) =
            CK  -> PLC.unsafeEvaluateCk PLC.defBuiltinsRuntime
            CEK -> PLC.unsafeEvaluateCek PLC.defBuiltinsRuntime
        body = void . PLC.toTerm $ prog
-    () <- Exn.evaluate $ rnf body
+    !_ = rnf body
This seems to give more consistent results than `Exn.evaluate`. Is it causing deeper evaluation?
It really shouldn't. Even with `Exn.evaluate` removed, the original line should be forcing the program fully.
I tried again and maybe I was imagining it, or there was something else going on. I'll take another look later just to be sure.
Yeah, not sure if there's really any difference. There do seem to be slight differences for very fast programs, but it could just be statistical noise. If there is a difference I don't think it's terribly significant.
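For what it's worth, the two idioms being compared can be sketched in isolation (a minimal standalone example, not plc code); both should reduce the value to normal form, since forcing the `()` returned by `rnf` forces the whole structure:

```haskell
{-# LANGUAGE BangPatterns #-}
module Main where

import Control.DeepSeq (NFData, rnf)
import Control.Exception (evaluate)

-- The original idiom: 'rnf x' has type '()', and 'evaluate'
-- forces that unit, which in turn forces x to normal form.
forceViaEvaluate :: NFData a => a -> IO ()
forceViaEvaluate x = evaluate (rnf x)

-- The new idiom: the bang pattern forces 'rnf x' when the
-- binding is evaluated, again forcing x to normal form.
forceViaBang :: NFData a => a -> IO ()
forceViaBang x = let !_ = rnf x in pure ()

main :: IO ()
main = do
  forceViaEvaluate [1 .. 5 :: Int]
  forceViaBang [1 .. 5 :: Int]
  putStrLn "both fully forced"
```

Any timing difference between the two should come down to when each forcing happens, not how deep it goes.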
Really wondering what's up with that first atypical execution.
-languageMode :: Parser Language
-languageMode = typedPLC <|> untypedPLC
+languagemode :: Parser Language
+languagemode = typedPLC <|> untypedPLC
I liked the previous version more, but whatever.
Yeah, the others are camel case too
I did that because we've got types with names like `LanguageMode`, parsers for them with names like `languageMode`, and variables with names like `languagemode`, and there was some inconsistency in how the latter two were capitalised. It might be less confusing to have `languageModeParser` and so on.
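The suggested naming convention could look something like this (a hypothetical sketch: `Maybe` stands in for the real optparse-applicative `Parser`, and the names are illustrative, not the PR's):

```haskell
module Main where

import Control.Applicative ((<|>))

data Language = TypedPLC | UntypedPLC deriving Show

-- Hypothetical 'Parser' suffix convention: the parser is
-- 'languageParser', leaving plain 'language' free for
-- variables of type Language, so the two can't be confused.
languageParser :: String -> Maybe Language
languageParser s = typedPLC s <|> untypedPLC s
  where
    typedPLC "plc"    = Just TypedPLC
    typedPLC _        = Nothing
    untypedPLC "uplc" = Just UntypedPLC
    untypedPLC _      = Nothing

main :: IO ()
main = print (languageParser "uplc")
```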
plutus-core/exe/Main.hs
Outdated
NoTiming -> case evaluate body of
    PLC.EvaluationSuccess v -> succeed v
    PLC.EvaluationFailure -> exitFailure
Timing n -> timeEval n evaluate body >> exitSuccess
Seems like we should still use the exit status? Which I guess will be the same every time, but still.
Yeah. I was too lazy to do that, but since you spotted it I'll fix it.
Done.
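The fix could be sketched along these lines (a standalone illustration with a mock result type; the real types and names in the PLC API may differ):

```haskell
{-# LANGUAGE LambdaCase #-}
module Main where

import System.Exit (exitFailure, exitSuccess)

-- Mock of the evaluation result type, for illustration only.
data EvaluationResult v = EvaluationSuccess v | EvaluationFailure

-- Exit with a status reflecting the final evaluation result,
-- whether or not timing was requested.
exitFor :: EvaluationResult v -> IO a
exitFor = \case
  EvaluationSuccess _ -> exitSuccess
  EvaluationFailure   -> exitFailure

main :: IO ()
main = do
  putStrLn "success: exiting with status 0"
  exitFor (EvaluationSuccess (42 :: Int))
```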
plutus-core/exe/Main.hs
Outdated
we measure the evaluation time (n+1) times and discard the first result. -}
timeEval :: NFData a => Integer -> (t -> a) -> t -> IO ()
timeEval n evaluate prog =
    if n <= 0 then error "Error: the number of repetitions should be at least 1"
Style nit: I'd probably use two clauses, with the first one having a `| n <= 0` guard on it. Why? It means that when you write the `do` block for the second clause you save a level of indentation.
I'm not sure that that helped. Having the guards is a bit clearer, but then emacs indented it like this:

```haskell
      | otherwise = do
    (results, times) <- unzip . tail <$> for (replicate (fromIntegral (n+1)) prog) (timeOnce evaluate)
    let mean = (fromIntegral $ sum times) / (fromIntegral n) :: Double
```

It looks a bit odd with the stuff inside the do-block to the left of the guard. I can cope with that though.
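The two-clause suggestion, applied to a standalone version of the timing loop, comes out like this (a sketch using `getCPUTime`; the real `timeEval` and `timeOnce` live in `plutus-core/exe/Main.hs` and may differ in detail):

```haskell
module Main where

import Control.DeepSeq (NFData, deepseq)
import Data.Traversable (for)
import System.CPUTime (getCPUTime)

-- Time one full (deep) evaluation, in picoseconds.
timeOnce :: NFData a => (t -> a) -> t -> IO (a, Integer)
timeOnce evaluate prog = do
  start <- getCPUTime
  let r = evaluate prog
  end <- r `deepseq` getCPUTime
  pure (r, end - start)

-- Two clauses: the guard handles the error case, and the main
-- clause's do-block sits one indentation level further left.
timeEval :: NFData a => Integer -> (t -> a) -> t -> IO ()
timeEval n _ _ | n <= 0 =
  error "Error: the number of repetitions should be at least 1"
timeEval n evaluate prog = do
  -- Run n+1 times and discard the first (atypical) measurement.
  (_, times) <- unzip . tail <$> for (replicate (fromIntegral (n + 1)) prog) (timeOnce evaluate)
  let mean = fromIntegral (sum times) / fromIntegral n :: Double
  putStrLn ("mean time (ps): " ++ show mean)

main :: IO ()
main = timeEval 5 (\x -> sum [1 .. x]) (100000 :: Int)
```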
Co-authored-by: Michael Peyton Jones <michael.peyton-jones@iohk.io>
I think the standard thing to do is to shrug your shoulders and say that it's probably got to do with caching. That's kind of plausible here: maybe the AST gets loaded into cache on the first iteration and then it's available quickly for later iterations. I got the results by using a shell loop to run the evaluation repeatedly.

There are some other odd features: in the first graph it descends pretty steeply but then it clearly jumps up again at 159 iterations. The times for 150 to 170 look like this:

Criterion does a garbage collection before each individual run, and each run does exactly the same thing, so I don't know why that would happen. There's another upwards jump at 995:

That's a tiny program though (the execution times are in microseconds), so it won't take a lot to disturb the numbers. For longer-running ones things seem to be a bit smoother.
This improves the timing feature in the `plc` command, so it should now be possible to quickly get reasonable benchmark times without having to set up lots of Criterion stuff. You can now say `plc evaluate <prog> -X <N>` to run a program N times and see the mean execution time. You can also use `-x` for 100 repetitions.

This turned out to be more difficult than I expected. I had a version using `Criterion.Measurement.measure` (see the documentation), which evaluates a function a given number of times and returns various statistics, including the total execution time. However, this gives odd results. It turns out that if you repeatedly execute a PLC program on the CEK machine the first evaluation can take significantly longer than subsequent ones (especially for programs which execute very quickly), perhaps because of warm-up effects. You can see the effect of this in the attached PDFs, which show the time reported by `measure` for 1, 2, 3, ... executions of a couple of programs (one just multiplies 88 by 44, the other is a crowdfunding validator). The reported average time starts off large and then drops asymptotically towards a stable value: this is because the atypically large first measurement drags the mean upwards, but the distortion decreases as large numbers of more typical measurements appear. Criterion's `measure` function doesn't retain the individual times, so it's impossible to adjust it. To overcome this I've written my own code which collects n+1 measurements and discards the first result (and if you look at the individual numbers you can see that the first one really is atypical). This gives much more consistent results. Possibly this isn't a sensible thing to do: in the real world the machine will be receiving a sequence of different validators to execute one after the other, and maybe any warm-up effects won't have the opportunity to appear.

Here are the files mentioned above:

mult.pdf
crowdfunding.pdf

Next PR: adding an option to report budgeting information.
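The asymptotic decay described above is easy to reproduce numerically: with one atypical first time T followed by n typical times t, the reported mean is (T + n·t)/(n + 1), which approaches t as n grows. A tiny demo (the 50µs/5µs figures are made up for illustration, not real measurements):

```haskell
module Main where

-- Mean of one atypical first measurement followed by n typical ones.
meanWithWarmUp :: Double -> Double -> Int -> Double
meanWithWarmUp firstT typicalT n =
  (firstT + fromIntegral n * typicalT) / fromIntegral (n + 1)

main :: IO ()
main =
  -- Hypothetical numbers: first run 50us, subsequent runs 5us.
  -- The mean drops towards 5 as the repetition count grows.
  mapM_ (print . meanWithWarmUp 50 5) [1, 10, 100, 1000]
```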
Pre-submit checklist:

- Run `nix-shell shell.nix --run updateMaterialized` to update the materialized Nix files
- Update `hie-*.yaml` files if needed
- Run `nix-shell shell.nix --run fix-stylish-haskell` to fix any formatting issues
- Run `nix-shell shell.nix --run fix-purty` to fix any formatting issues

Pre-merge checklist: