
[Builtins] [Evaluation] Drop lookups #4914

Merged
merged 1 commit into master on Jan 20, 2023

Conversation

@effectfully (Contributor)

Don't look here yet.


effectfully commented Oct 20, 2022

Running the benchmarks manually (I did check that individual runs are consistent and not shaky at all):

Results table

| Script | Old | New | Change |
| --- | --- | --- | --- |
| auction_1-1 | 178.0 μs | 149.9 μs | -16.3% |
| auction_1-2 | 686.6 μs | 664.9 μs | -3.2% |
| auction_1-3 | 693.2 μs | 653.8 μs | -5.8% |
| auction_1-4 | 233.2 μs | 199.0 μs | -14.6% |
| auction_2-1 | 178.9 μs | 154.0 μs | -13.5% |
| auction_2-2 | 691.8 μs | 660.6 μs | -4.5% |
| auction_2-3 | 893.6 μs | 850.1 μs | -4.8% |
| auction_2-4 | 690.5 μs | 652.3 μs | -5.5% |
| auction_2-5 | 229.6 μs | 197.6 μs | -14.0% |
| crowdfunding-success-1 | 206.9 μs | 181.7 μs | -12.1% |
| crowdfunding-success-2 | 209.7 μs | 182.5 μs | -12.9% |
| crowdfunding-success-3 | 207.2 μs | 180.7 μs | -13.0% |
| currency-1 | 251.6 μs | 234.9 μs | -6.8% |
| escrow-redeem_1-1 | 368.0 μs | 332.9 μs | -9.8% |
| escrow-redeem_1-2 | 366.3 μs | 332.9 μs | -9.3% |
| escrow-redeem_2-1 | 440.2 μs | 396.3 μs | -10.0% |
| escrow-redeem_2-2 | 436.1 μs | 397.0 μs | -8.9% |
| escrow-redeem_2-3 | 435.2 μs | 395.8 μs | -9.2% |
| escrow-refund-1 | 156.8 μs | 137.2 μs | -12.2% |
| future-increase-margin-1 | 252.5 μs | 236.1 μs | -6.3% |
| future-increase-margin-2 | 578.6 μs | 540.3 μs | -6.6% |
| future-increase-margin-3 | 576.1 μs | 538.1 μs | -6.6% |
| future-increase-margin-4 | 542.8 μs | 507.9 μs | -6.5% |
| future-increase-margin-5 | 861.9 μs | 830.4 μs | -3.6% |
| future-pay-out-1 | 254.0 μs | 234.9 μs | -7.9% |
| future-pay-out-2 | 575.9 μs | 540.0 μs | -6.1% |
| future-pay-out-3 | 576.9 μs | 543.7 μs | -5.7% |
| future-pay-out-4 | 864.1 μs | 832.8 μs | -3.7% |
| future-settle-early-1 | 253.6 μs | 236.2 μs | -6.7% |
| future-settle-early-2 | 579.8 μs | 540.7 μs | -6.7% |
| future-settle-early-3 | 579.7 μs | 540.0 μs | -6.7% |
| future-settle-early-4 | 656.2 μs | 628.8 μs | -4.3% |
| game-sm-success_1-1 | 416.8 μs | 379.5 μs | -8.9% |
| game-sm-success_1-2 | 198.9 μs | 167.0 μs | -15.7% |
| game-sm-success_1-3 | 682.5 μs | 653.0 μs | -4.3% |
| game-sm-success_1-4 | 226.0 μs | 195.3 μs | -13.7% |
| game-sm-success_2-1 | 416.4 μs | 377.8 μs | -9.4% |
| game-sm-success_2-2 | 198.3 μs | 168.4 μs | -15.2% |
| game-sm-success_2-3 | 686.8 μs | 653.9 μs | -4.8% |
| game-sm-success_2-4 | 231.7 μs | 194.8 μs | -16.0% |
| game-sm-success_2-5 | 685.1 μs | 654.4 μs | -4.5% |
| game-sm-success_2-6 | 232.2 μs | 194.2 μs | -16.4% |
| multisig-sm-1 | 424.8 μs | 392.7 μs | -7.5% |
| multisig-sm-2 | 415.0 μs | 387.1 μs | -6.7% |
| multisig-sm-3 | 423.9 μs | 359.3 μs | -15.1% |
| multisig-sm-4 | 428.5 μs | 394.6 μs | -7.9% |
| multisig-sm-5 | 608.4 μs | 584.2 μs | -3.9% |
| multisig-sm-6 | 423.0 μs | 393.5 μs | -7.1% |
| multisig-sm-7 | 414.9 μs | 383.9 μs | -7.5% |
| multisig-sm-8 | 424.9 μs | 359.7 μs | -15.3% |
| multisig-sm-9 | 430.3 μs | 397.9 μs | -7.7% |
| multisig-sm-10 | 613.2 μs | 583.0 μs | -4.9% |
| ping-pong-1 | 348.6 μs | 321.0 μs | -7.8% |
| ping-pong-2 | 348.0 μs | 320.1 μs | -8.0% |
| ping-pong_2-1 | 205.8 μs | 184.9 μs | -10.2% |
| prism-1 | 165.8 μs | 140.8 μs | -15.2% |
| prism-2 | 448.1 μs | 414.7 μs | -7.6% |
| prism-3 | 379.8 μs | 345.4 μs | -9.0% |
| pubkey-1 | 137.8 μs | 119.3 μs | -13.1% |
| stablecoin_1-1 | 949.2 μs | 901.7 μs | -5.1% |
| stablecoin_1-2 | 189.9 μs | 162.7 μs | -14.3% |
| stablecoin_1-3 | 1.082 ms | 1.023 ms | 0.0% |
| stablecoin_1-4 | 200.3 μs | 174.9 μs | -13.0% |
| stablecoin_1-5 | 1.348 ms | 1.268 ms | 0.0% |
| stablecoin_1-6 | 250.4 μs | 218.4 μs | -12.8% |
| stablecoin_2-1 | 947.1 μs | 901.5 μs | -4.9% |
| stablecoin_2-2 | 190.1 μs | 163.5 μs | -14.2% |
| stablecoin_2-3 | 1.084 ms | 1.022 ms | 0.0% |
| stablecoin_2-4 | 202.0 μs | 173.8 μs | -14.4% |
| token-account-1 | 190.6 μs | 171.6 μs | -10.0% |
| token-account-2 | 337.7 μs | 315.2 μs | -6.5% |
| uniswap-1 | 432.7 μs | 415.2 μs | -3.9% |
| uniswap-2 | 226.2 μs | 204.8 μs | -9.7% |
| uniswap-3 | 1.707 ms | 1.615 ms | 0.0% |
| uniswap-4 | 336.5 μs | 291.3 μs | -13.4% |
| uniswap-5 | 1.180 ms | 1.096 ms | 0.0% |
| uniswap-6 | 323.5 μs | 279.9 μs | -13.6% |
| vesting-1 | 364.7 μs | 343.3 μs | -5.8% |

@effectfully (Contributor Author)

^ -8.57% on average. I was unable to fall asleep and that gave me enough time to finally realize how to remove those damn lookups.

@michaelpj (Contributor)

Wow!

@michaelpj (Contributor)

I am quite scandalized that we didn't try this before??????


effectfully commented Oct 20, 2022

> I am quite scandalized that we didn't try this before??????

I've tried something along these lines multiple times (see the roadmap):

[screenshot: Screenshot from 2022-10-20 15-20-26]

It's obviously the case that not doing lookups is better than doing them. However, doing them allowed us to cache a lot of stuff in the array, and the non-obvious part was figuring out how to get the same caching (i.e. lack of inlining and lack of annoyingly persistent join-point generation) for the runtime denotations of builtins, while retaining all the inlining within the runtime denotations themselves.

Note that these issues aren't specific to our extensible builtins. If we simply wrote (pseudocode)

```haskell
computeCek (VBuiltin AddInteger [x, y]) = do
    let (cost, z) = doCostingAndComputeAddInteger x y
    spendBudgetCek cost
    returnCek z
```

we'd still run into the problem of needing to precompute and cache the costing function inside of doCostingAndComputeAddInteger somehow (given a cost model, which we only get at runtime).
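The caching requirement can be sketched in plain Haskell. Everything below is illustrative, not the real PlutusCore API: CostModel, interpretModel and mkAddInteger are made-up names, and the IORef counter exists purely to make sharing observable. The trick is to partially apply the builtin to the cost model and force the result once, so that the interpreted model is cached in the returned closure rather than recomputed on each call.

```haskell
module Main where

import Data.IORef
import System.IO.Unsafe (unsafePerformIO)

-- demonstration-only counter for how many times the cost model is interpreted
interpretations :: IORef Int
interpretations = unsafePerformIO (newIORef 0)
{-# NOINLINE interpretations #-}

-- a toy stand-in for a cost model that only becomes available at runtime
newtype CostModel = CostModel Integer

-- pretend this is expensive: we want it to run once per model, not per call
interpretModel :: CostModel -> Integer
interpretModel (CostModel base) =
  unsafePerformIO (modifyIORef' interpretations (+ 1) >> pure base)
{-# NOINLINE interpretModel #-}

-- cache the interpreted model in the closure: `cost` is forced once,
-- before the two-argument worker is handed back
mkAddInteger :: CostModel -> Integer -> Integer -> (Integer, Integer)
mkAddInteger model =
  let cost = interpretModel model
  in cost `seq` \x y -> (cost, x + y)

main :: IO ()
main = do
  let add = mkAddInteger (CostModel 7)
  print (add 1 2)
  print (add 3 4)
  readIORef interpretations >>= print
```

Run interpreted, the counter prints 1 no matter how many times add is applied; re-interpreting the model inside the two-argument body instead would bump it on every call.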

So it's just non-trivial to do caching alongside full inlining, and I'd tried quite a few things to make that happen before this:

[screenshot: Screenshot from 2022-10-20 15-43-44]

did the trick.

@effectfully effectfully force-pushed the effectfully/builtins/evaluation/drop-lookups branch from 1c973b2 to 154b984 Compare November 4, 2022 15:38
@effectfully effectfully force-pushed the effectfully/builtins/evaluation/drop-lookups branch 2 times, most recently from 718e3cf to b45b50d Compare January 11, 2023 15:31
@effectfully
Copy link
Contributor Author

effectfully commented Jan 17, 2023

I've documented everything without changing any code on the evaluation path, rebased on the fresh master, ran the benchmarks once again and got a 2-3x slowdown. A day later, after a full-scale investigation and trying out anything and everything I could think of, I still have no idea what's going on. I've tried looking at the Core: it seems fine. I've tried comparing it with the Core that we were getting three months ago when I opened this PR: I can't observe any difference. I've tried tweaking things, to no avail. I've run the benchmarks for the old version of this PR, for the new one, and for a variety of masters: it's all the same. This PR used to make things faster, and now it makes them much, much slower. Since I opened this PR we've moved to a new version of GHC and dropped immediate unlifting, to name the most obvious potential culprits.

The reason why this small and very beneficial PR has been open for so long is that I never managed to get it into my sprint, and changing the sprint scope is explicitly discouraged, regardless of how non-urgent the tasks in the sprint are. The lack of automated benchmarking has not been very helpful either.

I'm closing this PR. I don't feel like I should pull my hair out to overcome a broken development process. Feel free to reopen and take over.

@effectfully effectfully deleted the effectfully/builtins/evaluation/drop-lookups branch January 17, 2023 20:21
@thealmarty (Contributor)

That's too bad.

@effectfully (Contributor Author)

> That's too bad.

Give me some time to recover from this little breakdown and once we have automated benchmarking, I'll start over. I've moved the Jira ticket (PLT-1066) back to the backlog. Although if somebody else wanted to take over, that would be great too.

@effectfully effectfully restored the effectfully/builtins/evaluation/drop-lookups branch January 18, 2023 20:07
@effectfully effectfully reopened this Jan 18, 2023
@effectfully (Contributor Author)

I've identified the problem. Two problems, actually. The first one is what I posted on slack, quoting verbatim:

> for the love of god, why do our benchmarks not run using the same code path as our prod? In the benchmarks the machine parameters are constructed via
>
> ```haskell
> defaultCekParameters :: MachineParameters CekMachineCosts CekValue DefaultUni DefaultFun
> defaultCekParameters = mkMachineParameters def defaultCekCostModel
> ```
>
> while in the prod the machine params are constructed via
>
> ```haskell
> mkDynEvaluationContext :: MonadError CostModelApplyError m => BuiltinVersion DefaultFun -> Plutus.CostModelParams -> m EvaluationContext
> mkDynEvaluationContext ver newCMP =
>     EvaluationContext <$> mkMachineParametersFor ver newCMP
> ```
>
> It's mkMachineParameters vs mkMachineParametersFor! They are different functions that get compiled VERY differently!

I've been banging my head against a wall and asking myself why the hell the same patch works differently depending on the date it's applied, despite the Core being more or less identical (I've even used -ddump-prep, which is substantially more painful to deal with than -ddump-simpl), until I came to the conclusion that it must be something else, that it can't be related to builtins. And then I observed this (not sure if it explains the weirdest results I've been getting, though).

Overall, all of our previous performance achievements are pretty much irrelevant now, since we've been benchmarking a different function, not the one that is actually used in prod.


effectfully commented Jan 19, 2023

The second one is related to

> (not sure if it explains the weirdest results I've been getting, though).

The answer is yes. If we stare at the Core of PlutusCore.Evaluation.Machine.ExBudgetingDefaults (which is the file that is used in benchmarks instead of PlutusCore.Evaluation.Machine.MachineParameters.Default from prod) for long enough, we'll find this gem there:

```haskell
Rec {
-- RHS size: {terms: 496, types: 226, coercions: 0, joins: 0/0}
defaultCekParameters2
  :: [DefaultFun]
     -> DefaultFun -> BuiltinRuntime (CekValue DefaultUni DefaultFun)
defaultCekParameters2
  = \ (ds_iiZU :: [DefaultFun]) (eta_B0 :: DefaultFun) ->
      case ds_iiZU of {
        [] ->
          case eta_B0 of {
            AddInteger -> lvl380_r1XIK;
            <...>
          }
        : y_iiZW ys_iiZX ->
          case y_iiZW of {
            AddInteger ->
              case lvl380_r1XIK of { __DEFAULT ->
              defaultCekParameters2 ys_iiZX eta_B0
              };
            <...>
          }
      }
end Rec }

-- RHS size: {terms: 8, types: 10, coercions: 6, joins: 0/0}
defaultCekParameters1
  :: BuiltinsRuntime DefaultFun (CekValue DefaultUni DefaultFun)
defaultCekParameters1
  = case $wgo3 0# of { (# ww1_s1UI5, ww2_s1UI6 #) ->
    (defaultCekParameters2 (: ww1_s1UI5 ww2_s1UI6)) `cast` <Co:6>
    }

-- RHS size: {terms: 6, types: 9, coercions: 11, joins: 0/0}
defaultCekParameters
  :: MachineParameters CekMachineCosts CekValue DefaultUni DefaultFun
defaultCekParameters
  = case defaultCekParameters1 `cast` <Co:5> of nt_s1U3j
    { __DEFAULT ->
    MachineParameters defaultCekMachineCosts (nt_s1U3j `cast` <Co:6>)
    }
```

So defaultCekParameters is a MachineParameters and contains a forced-to-WHNF defaultCekParameters1 which is a BuiltinsRuntime, which is a newtype-wrapper around DefaultFun -> BuiltinRuntime (CekValue DefaultUni DefaultFun). GHC is able to see through newtype-wrappers, so for all intents and purposes a BuiltinsRuntime is a function to GHC and GHC performs its usual backstabbery by turning

```haskell
case y of
    y' -> \x -> z
```

into

```haskell
\x -> case y of
    y' -> z
```

which is how we end up with defaultCekParameters2 binding eta_B0 :: DefaultFun outside of recursion. Coupled with unsaturated defaultCekParameters2 (: ww1_s1UI5 ww2_s1UI6) in defaultCekParameters1 this gives us a function that recurses over the list of all possible builtins each time a builtin is looked up, which as you can imagine is a tad inefficient.
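The sharing difference between those two shapes can be made observable at runtime with a toy sketch (illustrative names only; the IORef counter exists purely for demonstration, and the code should be run without optimisation, e.g. interpreted, since -O's full laziness could re-float the shared work):

```haskell
module Main where

import Data.IORef
import System.IO.Unsafe (unsafePerformIO)

-- demonstration-only counter for how many times the "cached" work is redone
evals :: IORef Int
evals = unsafePerformIO (newIORef 0)
{-# NOINLINE evals #-}

-- stands in for computing a builtin's runtime denotation
expensive :: () -> Int
expensive () = unsafePerformIO (modifyIORef' evals (+ 1) >> pure 42)
{-# NOINLINE expensive #-}

-- case outside the lambda: the work happens once, when the closure is built
sharedLookup :: Int -> Int
sharedLookup = case expensive () of c -> \x -> c + x
{-# NOINLINE sharedLookup #-}

-- lambda outside the case (the shape GHC's transformation produces):
-- the work is redone on every application
repeatedLookup :: Int -> Int
repeatedLookup x = case expensive () of c -> c + x
{-# NOINLINE repeatedLookup #-}

main :: IO ()
main = do
  print (sharedLookup 1 + sharedLookup 2)
  readIORef evals >>= print    -- expensive ran once for both calls
  print (repeatedLookup 1 + repeatedLookup 2)
  readIORef evals >>= print    -- and twice more, once per call
```

The first counter read prints 1, the second 3: the shared version paid once, the eta-expanded version pays on every application.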

Solution? Block GHC's backstabbery by turning BuiltinsRuntime into a data rather than a newtype. And that's it. This is the Core that we get:

```haskell
-- RHS size: {terms: 111, types: 2, coercions: 0, joins: 0/0}
lvl488_r1XNu
  :: DefaultFun -> BuiltinRuntime (CekValue DefaultUni DefaultFun)
lvl488_r1XNu
  = \ (x_i1Pr1 :: DefaultFun) ->
      case x_i1Pr1 of {
        AddInteger -> lvl380_r1XLD;
        <...>
      }

$j6_r1XNv
  :: BuiltinsRuntime DefaultFun (CekValue DefaultUni DefaultFun)
$j6_r1XNv = BuiltinsRuntime lvl488_r1XNu

Rec {
-- RHS size: {terms: 332, types: 224, coercions: 0, joins: 0/0}
defaultCekParameters_go1
  :: [DefaultFun]
     -> BuiltinsRuntime DefaultFun (CekValue DefaultUni DefaultFun)
defaultCekParameters_go1
  = \ (ds_ij0S :: [DefaultFun]) ->
      case ds_ij0S of {
        [] -> $j6_r1XNv;
        : y_ij0W ys_ij0X ->
          case y_ij0W of {
            AddInteger ->
              case lvl380_r1XLD of { __DEFAULT ->
              defaultCekParameters_go1 ys_ij0X
              };
            <...>
          }
      }
end Rec }

-- RHS size: {terms: 8, types: 10, coercions: 0, joins: 0/0}
defaultCekParameters1
  :: BuiltinsRuntime DefaultFun (CekValue DefaultUni DefaultFun)
defaultCekParameters1
  = case $wgo3 0# of { (# ww1_s1DOf, ww2_s1DOg #) ->
    defaultCekParameters_go1 (: ww1_s1DOf ww2_s1DOg)
    }

-- RHS size: {terms: 6, types: 14, coercions: 0, joins: 0/0}
defaultCekParameters
  :: MachineParameters CekMachineCosts CekValue DefaultUni DefaultFun
defaultCekParameters
  = case defaultCekParameters1 of { BuiltinsRuntime dt1_i1ytI ->
    MachineParameters defaultCekMachineCosts dt1_i1ytI
    }
```

Nice and tidy: defaultCekParameters_go1 only binds one variable as it's supposed to, making defaultCekParameters_go1 (: ww1_s1DOf ww2_s1DOg) in defaultCekParameters1 saturated, making the latter reduce properly in defaultCekParameters.

This suggests that we should simply have a policy prescribing that we always wrap functions in data rather than newtype, because who knows where else a newtype may backfire.
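A minimal sketch of the resulting shape, with toy stand-ins for DefaultFun and the runtime denotations (not the real PlutusCore definitions). Runtime behaviour is identical to a newtype version; the payoff only shows in the generated Core, where the opaque data constructor stops GHC from eta-expanding through the wrapper:

```haskell
module Main where

-- a toy builtin universe standing in for DefaultFun
data Builtin = AddInteger | SubtractInteger
  deriving (Show, Eq, Enum, Bounded)

-- the cached, costed denotation of a single builtin (heavily simplified)
newtype BuiltinRuntime = BuiltinRuntime (Integer -> Integer -> Integer)

-- data rather than newtype: the constructor is opaque to GHC, so the
-- wrapped lookup function cannot be eta-expanded through the wrapper
data BuiltinsRuntime = BuiltinsRuntime (Builtin -> BuiltinRuntime)

mkBuiltinsRuntime :: BuiltinsRuntime
mkBuiltinsRuntime = BuiltinsRuntime $ \b -> case b of
  AddInteger      -> BuiltinRuntime (+)
  SubtractInteger -> BuiltinRuntime (-)

runBuiltin :: BuiltinsRuntime -> Builtin -> Integer -> Integer -> Integer
runBuiltin (BuiltinsRuntime look) b =
  case look b of BuiltinRuntime f -> f

main :: IO ()
main = do
  print (runBuiltin mkBuiltinsRuntime AddInteger 2 3)
  print (runBuiltin mkBuiltinsRuntime SubtractInteger 7 4)
```

Pattern-matching on BuiltinsRuntime, as runBuiltin does, is what forces the wrapper and with it the cached lookup table, which a newtype cannot guarantee.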

@michaelpj (Contributor)

Great digging!

> This suggests that we should simply have a policy prescribing to always wrap functions in data rather than newtype, because who knows where else a newtype may backfire.

Well, it depends, sometimes the saving from having an equivalent-representation newtype might also matter.

@michaelpj (Contributor)

Were you going to make the fixes to which function we're using in the benchmarks and the data/newtype thing here?

@effectfully (Contributor Author)

> Well, it depends, sometimes the saving from having an equivalent-representation newtype might also matter.

Ok, yes, unless there's a strong reason to use newtype, exceptions from a general rule are always possible.

@effectfully (Contributor Author)

> Were you going to make the fixes to which function we're using in the benchmarks and the data/newtype thing here?

Yes, and then benchmark.

@effectfully effectfully force-pushed the effectfully/builtins/evaluation/drop-lookups branch from b45b50d to 9c119e4 Compare January 20, 2023 15:03

effectfully commented Jan 20, 2023

Ready for review.

Got even better results, -10.35% on average:

Results table

| Script | Old | New | Change |
| --- | --- | --- | --- |
| auction_1-1 | 168.1 μs | 140.5 μs | -16.7% |
| auction_1-2 | 607.5 μs | 578.0 μs | -4.8% |
| auction_1-3 | 596.5 μs | 563.1 μs | -5.5% |
| auction_1-4 | 218.2 μs | 181.7 μs | -17.0% |
| auction_2-1 | 169.0 μs | 143.3 μs | -15.4% |
| auction_2-2 | 604.7 μs | 567.3 μs | -6.1% |
| auction_2-3 | 787.8 μs | 753.8 μs | -4.3% |
| auction_2-4 | 597.3 μs | 562.1 μs | -5.9% |
| auction_2-5 | 220.1 μs | 179.7 μs | -18.6% |
| crowdfunding-success-1 | 196.2 μs | 166.5 μs | -15.3% |
| crowdfunding-success-2 | 196.0 μs | 167.1 μs | -14.8% |
| crowdfunding-success-3 | 197.9 μs | 167.6 μs | -15.2% |
| currency-1 | 235.9 μs | 214.9 μs | -8.9% |
| escrow-redeem_1-1 | 331.8 μs | 297.4 μs | -10.3% |
| escrow-redeem_1-2 | 330.6 μs | 296.7 μs | -10.3% |
| escrow-redeem_2-1 | 386.9 μs | 343.7 μs | -11.1% |
| escrow-redeem_2-2 | 387.7 μs | 345.6 μs | -10.9% |
| escrow-redeem_2-3 | 387.8 μs | 345.3 μs | -10.9% |
| escrow-refund-1 | 148.5 μs | 126.9 μs | -14.9% |
| future-increase-margin-1 | 236.8 μs | 214.7 μs | -9.3% |
| future-increase-margin-2 | 506.8 μs | 467.9 μs | -7.7% |
| future-increase-margin-3 | 513.7 μs | 467.0 μs | -9.0% |
| future-increase-margin-4 | 438.1 μs | 425.5 μs | -3.0% |
| future-increase-margin-5 | 766.1 μs | 730.9 μs | -4.7% |
| future-pay-out-1 | 236.1 μs | 214.7 μs | -9.3% |
| future-pay-out-2 | 505.5 μs | 462.5 μs | -8.5% |
| future-pay-out-3 | 511.1 μs | 466.5 μs | -8.8% |
| future-pay-out-4 | 760.1 μs | 723.2 μs | -4.9% |
| future-settle-early-1 | 232.6 μs | 213.5 μs | -8.2% |
| future-settle-early-2 | 503.5 μs | 462.3 μs | -8.2% |
| future-settle-early-3 | 508.3 μs | 462.1 μs | -9.1% |
| future-settle-early-4 | 566.8 μs | 533.0 μs | -5.8% |
| game-sm-success_1-1 | 355.9 μs | 329.0 μs | -7.3% |
| game-sm-success_1-2 | 184.5 μs | 155.9 μs | -15.8% |
| game-sm-success_1-3 | 603.6 μs | 556.7 μs | -7.8% |
| game-sm-success_1-4 | 214.8 μs | 179.4 μs | -16.4% |
| game-sm-success_2-1 | 355.2 μs | 329.8 μs | -7.3% |
| game-sm-success_2-2 | 184.8 μs | 155.2 μs | -15.8% |
| game-sm-success_2-3 | 602.0 μs | 554.1 μs | -8.0% |
| game-sm-success_2-4 | 214.7 μs | 179.8 μs | -16.4% |
| game-sm-success_2-5 | 603.7 μs | 557.6 μs | -7.6% |
| game-sm-success_2-6 | 216.7 μs | 180.3 μs | -16.7% |
| multisig-sm-1 | 370.0 μs | 338.9 μs | -8.6% |
| multisig-sm-2 | 360.5 μs | 336.3 μs | -6.7% |
| multisig-sm-3 | 363.2 μs | 336.4 μs | -7.4% |
| multisig-sm-4 | 371.6 μs | 333.1 μs | -10.2% |
| multisig-sm-5 | 526.4 μs | 494.3 μs | -6.1% |
| multisig-sm-6 | 372.1 μs | 340.8 μs | -8.6% |
| multisig-sm-7 | 359.4 μs | 332.8 μs | -7.5% |
| multisig-sm-8 | 365.1 μs | 336.2 μs | -7.9% |
| multisig-sm-9 | 363.7 μs | 332.4 μs | -8.5% |
| multisig-sm-10 | 525.2 μs | 492.5 μs | -6.3% |
| ping-pong-1 | 307.8 μs | 279.2 μs | -9.1% |
| ping-pong-2 | 306.6 μs | 280.6 μs | -8.5% |
| ping-pong_2-1 | 188.5 μs | 165.8 μs | -12.2% |
| prism-1 | 161.1 μs | 131.5 μs | -18.6% |
| prism-2 | 392.1 μs | 351.8 μs | -10.5% |
| prism-3 | 345.8 μs | 305.0 μs | -11.6% |
| pubkey-1 | 134.2 μs | 111.3 μs | -17.2% |
| stablecoin_1-1 | 853.8 μs | 796.4 μs | -6.7% |
| stablecoin_1-2 | 184.9 μs | 153.3 μs | -16.8% |
| stablecoin_1-3 | 986.5 μs | 911.5 μs | -7.6% |
| stablecoin_1-4 | 198.6 μs | 162.7 μs | -18.2% |
| stablecoin_1-5 | 1.256 ms | 1.167 ms | 0.0% |
| stablecoin_1-6 | 237.9 μs | 200.0 μs | -15.6% |
| stablecoin_2-1 | 853.9 μs | 791.7 μs | -7.3% |
| stablecoin_2-2 | 184.1 μs | 154.0 μs | -16.3% |
| stablecoin_2-3 | 977.8 μs | 915.9 μs | -6.3% |
| stablecoin_2-4 | 194.8 μs | 162.1 μs | -16.5% |
| token-account-1 | 178.5 μs | 158.1 μs | -11.2% |
| token-account-2 | 312.9 μs | 278.6 μs | -10.9% |
| uniswap-1 | 388.7 μs | 363.6 μs | -6.4% |
| uniswap-2 | 206.6 μs | 186.9 μs | -9.7% |
| uniswap-3 | 1.643 ms | 1.550 ms | 0.0% |
| uniswap-4 | 310.5 μs | 259.4 μs | -16.5% |
| uniswap-5 | 1.088 ms | 1.005 ms | 0.0% |
| uniswap-6 | 297.9 μs | 254.8 μs | -14.5% |
| vesting-1 | 322.8 μs | 294.5 μs | -8.7% |

@effectfully effectfully force-pushed the effectfully/builtins/evaluation/drop-lookups branch from 9c119e4 to bd9a3dc Compare January 20, 2023 15:36
@effectfully effectfully marked this pull request as ready for review January 20, 2023 15:42
@michaelpj (Contributor)

Wow, that's massive.

@michaelpj (Contributor) left a comment

LGTM

(review thread on plutus-core/plutus-core/src/PlutusCore/Builtin/Meaning.hs: outdated, resolved)
@effectfully effectfully force-pushed the effectfully/builtins/evaluation/drop-lookups branch from bd9a3dc to b0b5a74 Compare January 20, 2023 16:15
@effectfully (Contributor Author)

> Wow, that's massive.

Keep in mind that I'm running our current benchmarks, which don't necessarily reflect the performance of the production evaluator. But I'd like to get this merged already and then we can talk about fixing the benchmarks. If you're OK with this strategy, hit "merge".

@michaelpj michaelpj enabled auto-merge (squash) January 20, 2023 16:24
@michaelpj michaelpj merged commit ff72adc into master Jan 20, 2023
@michaelpj michaelpj deleted the effectfully/builtins/evaluation/drop-lookups branch January 20, 2023 17:59
@effectfully (Contributor Author)

Actually, it's -10.35% when I account for bench-compare calculating things incorrectly (reported on slack).

effectfully added a commit that referenced this pull request Apr 4, 2024
This unscrews another portion of the benchmarks. We still have benchmarks that are screwed up (see [PLT-6541](https://input-output.atlassian.net/browse/PLT-6541)). See [this](#4914 (comment)) comment for an explanation of what went wrong.
3 participants