Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kwxm/improve plc timing (SCP-1872) #2738

Merged
merged 10 commits into from Feb 17, 2021
Merged

Kwxm/improve plc timing (SCP-1872) #2738

merged 10 commits into from Feb 17, 2021

Conversation

kwxm
Copy link
Contributor

@kwxm kwxm commented Feb 16, 2021

This improves the timing feature in the plc command, so it should now be possible to quickly get reasonable benchmark times without having to set up lots of Criterion stuff. You can now say plc evaluate <prog> -X <N> to run a program N times and see the mean execution time. You can also use -x for 100 repetitions.

This turned out to be more difficult than I expected. I had a version using Criterion.Measurement.measure (see the documentation), which evaluates a function a given number of times and returns various statistics, including the total execution time. However, this gives odd results. It turns out that if you repeatedly execute a PLC program on the CEK machine the first evaluation can take significantly longer than subsequent ones (especially for programs which execute very quickly), perhaps because of warm-up effects. You can see the effect of this in the attached PDFs, which show the time reported by measure for 1,2,3,... executions of a couple of programs (one just multiplies 88 by 44, the other is a crowfunding validator). The reported average time starts off large and then drops asymptotically towards a stable value: this is because the atypically large first measurement drags the mean upwards, but the distortion decreases as large numbers of more typical measurements appear. Criterion's measure function doesn't retain the individual times, so it's impossible to adjust it. To overcome this I've written my own code which collects n+1 measurements and discards the first result (and if you look at the individual numbers you can see that the first one really is atypical). This gives much more consistent results. Possibly this isn't a sensible thing to do: in the real world the machine will be receiving a sequence of different validators to execute one after the other and maybe any warm-up effects won't have the opportunity to appear.

Here are the files mentioned above:
mult.pdf
crowdfunding.pdf

Next PR: adding an option to report budgeting information.

Pre-submit checklist:

  • Branch
    • Commit sequence broadly makes sense
    • Key commits have useful messages
    • Relevant tickets are mentioned in commit messages
  • PR
    • Self-reviewed the diff
    • Useful pull request description
    • Reviewer requested
  • If you updated any cabal files or added Haskell packages:
    • nix-shell shell.nix --run updateMaterialized to update the materialized Nix files
    • Update hie-*.yaml files if needed
  • If you changed any Haskell files:
    • nix-shell shell.nix --run fix-stylish-haskell to fix any formatting issues
  • If you changed any Purescript files:
    • nix-shell shell.nix --run fix-purty to fix any formatting issues

Pre-merge checklist:

  • Someone approved it
  • Commits have useful messages
  • Review clarifications made it into the code
  • History is moderately tidy; or going to squash-merge

@kwxm kwxm changed the title Kwxm/improve plc timing Kwxm/improve plc timing (SCP-1872) Feb 16, 2021
@@ -647,13 +687,14 @@ runEval (EvalOptions language inp ifmt evalMode printMode printtime) =
CK -> PLC.unsafeEvaluateCk PLC.defBuiltinsRuntime
CEK -> PLC.unsafeEvaluateCek PLC.defBuiltinsRuntime
body = void . PLC.toTerm $ prog
() <- Exn.evaluate $ rnf body
!_ = rnf body
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to give more consistent results than Exn.evaluate. Is it causing deeper evaluation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It really shouldn't. Even with Exn.evaluate removed the original line should be forcing the program fully.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried again and maybe I was imagining it, or there was something else going on. I'll take another look later just to be sure.

Copy link
Contributor Author

@kwxm kwxm Feb 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, not sure if there's really any difference. There do seem to be slight differences for very fast programs, but it could just be statistical noise. If there is a difference I don't think it's terribly significant.

@kwxm kwxm added Costing Anything relating to costs, fees, gas, etc. Plutus Exe labels Feb 16, 2021
Copy link
Contributor

@effectfully effectfully left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really wondering what's up with that first atypical execution.

Comment on lines -128 to +129
languageMode :: Parser Language
languageMode = typedPLC <|> untypedPLC
languagemode :: Parser Language
languagemode = typedPLC <|> untypedPLC
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I liked the previous version more, but whatever.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the others are camel case too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did that because we've got types with names like LanguageMode, parsers for them with names like langaugeMode, and variables with names like languagemode, and there was some inconsistency in how the latter two were capitalised. It might be less confusing to have languageModeParser and so on.

@@ -647,13 +687,14 @@ runEval (EvalOptions language inp ifmt evalMode printMode printtime) =
CK -> PLC.unsafeEvaluateCk PLC.defBuiltinsRuntime
CEK -> PLC.unsafeEvaluateCek PLC.defBuiltinsRuntime
body = void . PLC.toTerm $ prog
() <- Exn.evaluate $ rnf body
!_ = rnf body
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It really shouldn't. Even with Exn.evaluate removed the original line should be forcing the program fully.

plutus-core/exe/Main.hs Show resolved Hide resolved
Comment on lines -128 to +129
languageMode :: Parser Language
languageMode = typedPLC <|> untypedPLC
languagemode :: Parser Language
languagemode = typedPLC <|> untypedPLC
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the others are camel case too

plutus-core/exe/Main.hs Outdated Show resolved Hide resolved
NoTiming -> case evaluate body of
PLC.EvaluationSuccess v -> succeed v
PLC.EvaluationFailure -> exitFailure
Timing n -> timeEval n evaluate body >> exitSuccess
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we should still use the exit status? Which I guess will be the same every time, but still.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I was too lazy to do that, but since you spotted it I'll fix it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

we measure the evaluation time (n+1) times and discard the first result. -}
timeEval :: NFData a => Integer -> (t -> a) -> t -> IO ()
timeEval n evaluate prog =
if n <= 0 then error "Error: the number of repetitions should be at least 1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style nit: I'd probably use two clauses with the first one having a | n <=0 guard on it. Why? it means that when you do the do block for the second clause you save a level of indentation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that that helped. Having the guards is a bit clearer, but then emacs indented it like this:

    | otherwise = do
  (results, times) <- unzip . tail <$> for (replicate (fromIntegral (n+1)) prog) (timeOnce evaluate)
  let mean = (fromIntegral $ sum times) / (fromIntegral n) :: Double

It looks a bit odd with the stuff inside the do-block to the left of the guard. I can cope with that though.

plutus-core/exe/Main.hs Outdated Show resolved Hide resolved
kwxm and others added 5 commits February 16, 2021 14:28
Co-authored-by: Michael Peyton Jones <michael.peyton-jones@iohk.io>
Co-authored-by: Michael Peyton Jones <michael.peyton-jones@iohk.io>
@kwxm
Copy link
Contributor Author

kwxm commented Feb 17, 2021

Really wondering what's up with that first atypical execution.

I think the standard thing to do is to shrug your shoulders and say that it's probably got to do with caching. That's kind of plausible here: maybe the AST gets loaded into cache on the first iteration and then it's available quickly for later iterations. I got the results by using a shell loop to run plc repeatedly, and I suppose that'll wipe out the cache because you have to load the source file from disk and parse it each time, and you can't know that you'll get the same AST each time. I don't know how you'd check that though.

There are some other odd features: in the first graph it descends pretty steeply but then it clearly jumps up again at 159 iterations. The times for 150 to 170 look like this:

 4.651 4.466 4.744 4.478 4.392 4.469 4.406 5.173 4.463 7.167 7.018 6.878 7.264 7.105 6.749 6.725 6.716 6.855 6.642 6.881 8.822

Criterion does a garbage collection before each individual run, and each run does exactly the same thing, so I don't know why that would happen. There's another upwards jump at 995:

2.916  2.951  2.834  3.039  2.807  3.616  3.977  3.709  3.526  3.440  3.925

That's a tiny program though (the execution times are in microseconds), so it won't take a lot to disturb the numbers. For longer-running ones things seem to be a bit smoother.

@kwxm kwxm merged commit d16b997 into master Feb 17, 2021
@kwxm kwxm deleted the kwxm/improve-plc-timing branch February 17, 2021 12:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Costing Anything relating to costs, fees, gas, etc. Plutus Exe
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants