Skip to content

Commit

Permalink
Better size measure + beta reduction for inlining fully saturated app…
Browse files Browse the repository at this point in the history
…lications (#5308)
  • Loading branch information
zliu41 committed May 23, 2023
1 parent 79ac43f commit 95c83d0
Show file tree
Hide file tree
Showing 34 changed files with 550 additions and 334 deletions.
4 changes: 2 additions & 2 deletions plutus-benchmark/lists/test/Sum/left-fold-data.budget.golden
Original file line number Diff line number Diff line change
@@ -1,2 +1,2 @@
({cpu: 394766685
| mem: 1218562})
({cpu: 378574685
| mem: 1148162})
4 changes: 2 additions & 2 deletions plutus-benchmark/lists/test/Sum/right-fold-data.budget.golden
Original file line number Diff line number Diff line change
@@ -1,2 +1,2 @@
({cpu: 401666685
| mem: 1248562})
({cpu: 385474685
| mem: 1178162})
2 changes: 1 addition & 1 deletion plutus-benchmark/nofib/test/Spec.hs
Original file line number Diff line number Diff line change
Expand Up @@ -46,7 +46,7 @@ testClausify = testGroup "clausify"
, testCase "formula3" $ mkClausifyTest Clausify.F3
, testCase "formula4" $ mkClausifyTest Clausify.F4
, testCase "formula5" $ mkClausifyTest Clausify.F5
, Tx.fitsInto "formula1 (size)" (Clausify.mkClausifyCode Clausify.F1) 3916
, Tx.fitsInto "formula1 (size)" (Clausify.mkClausifyCode Clausify.F1) 3766
, runTestNested $
Tx.goldenBudget "formulaBudget" $ Clausify.mkClausifyCode Clausify.F1
]
Expand Down
4 changes: 2 additions & 2 deletions plutus-benchmark/nofib/test/formulaBudget.budget.golden
Original file line number Diff line number Diff line change
@@ -1,2 +1,2 @@
({cpu: 11512613908
| mem: 48524048})
({cpu: 10595833908
| mem: 44538048})
Original file line number Diff line number Diff line change
@@ -1,2 +1,2 @@
({cpu: 597728251
| mem: 2440846})
({cpu: 595428251
| mem: 2430846})
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
### Changed

- Improved the inlining of fully saturated functions such that it measures the size
differences more accurately, and also performs beta reduction after inlining.
1 change: 1 addition & 0 deletions plutus-core/plutus-core/src/PlutusCore/Subst.hs
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@ module PlutusCore.Subst
, vTerm
, tvTerm
, tvTy
, purely
) where

import PlutusPrelude
Expand Down
266 changes: 145 additions & 121 deletions plutus-core/plutus-ir/src/PlutusIR/Transform/Inline/CallSiteInline.hs
Original file line number Diff line number Diff line change
Expand Up @@ -4,20 +4,24 @@
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE TypeFamilies #-}

{-|
{- |
Call site inlining machinery. For now there's only one part: inlining of fully applied functions.
We inline fully applied functions if the cost and size are acceptable.
See note [Inlining of fully applied functions].
-}

module PlutusIR.Transform.Inline.CallSiteInline where

import Control.Monad.State
import PlutusCore.Rename.Internal
import PlutusCore qualified as PLC
import PlutusCore.Rename
import PlutusIR.Analysis.Size
import PlutusIR.Contexts
import PlutusIR.Core
import PlutusIR.Transform.Inline.Utils
{- Note [Inlining of fully applied functions]
import PlutusIR.Transform.Substitute

import Control.Monad.State

{- Note [Inlining and beta reduction of fully applied functions]
We inline if (1) a function is fully applied (2) its cost and size are acceptable. We discuss
each in detail below.
Expand All @@ -32,7 +36,7 @@ Consider `let v = rhs in body`, in which `body` calls `v`.
We focus on cases when `v` is a function. (Non-functions have arity () or 0).
I.e., it has type/term lambda abstraction(s). E.g.:
let v = \x1.\x2...\xn.VBody in body
let v = \x1.\x2...\xn. v_body in body
(x1,x2...xn) or n is the syntactic arity of a term. That is, a record of the arguments that the
term expects before it may do some work. Since we have both type and lambda
Expand All @@ -45,10 +49,11 @@ an under-approximation of how many arguments the term may need.
e.g. consider the term @let id = \x -> x in id@: the variable @id@ has syntactic
arity @[]@, but does in fact need an argument before it does any work.
In the `body`, where `v` is *called*,
In `body`, where `v` is called,
if it was given the `n` term or type arguments in the correct order, then it is *fully applied*.
If `VBody` is acceptable in size and cost, we inline the call site of the fully applied `v` in this
case, i.e., we replace `v` in the `body` with `rhs`. E.g.
If the changes in size and cost caused by the inlining is acceptable, and the inlining and
the beta reduction is effect-safe, we inline the call site of the fully applied `v`, and perform
beta reduction. E.g.
let f = \x.\y -> x
in
Expand All @@ -60,120 +65,139 @@ becomes
let f = \x.\y -> x
in
let z = f q
in ((\x.\y -> x) a b)
With beta reduction, it becomes:
let f = \x.\y -> x
in
let z = f q
in a (more accurately it becomes (let { x = a, y = b } in x))
This is already a reduction in code size. However, because of this,
our dead code elimination pass is able to further reduce the code to just `a`.
Because a function can be called in the `body` multiple times and may not be fully applied for all
its calls, we cannot simply keep a substitution map like in `Inline`,
which substitute *all* occurrences of a variable. Instead, we store all in scope variables in a
map, `Utils.NonRecInScopeSet`, with all information needed for inlining saturated functions.
To find out whether a function is fully applied, when we encounter a variable in the `body`,
we find its arity from the `Utils.NonRecInScopeSet` map and compare it with the list
of arguments it has been applied to at that site. If it is fully applied, we inline it there.
Note that over-application of a function would also be inlined. E.g.:
```haskell
let id = \y -> y
f = \x -> id
in f x y
```
`f`'s arity is (\x) or 1, but is applied to 2 arguments. We inline `f` in this case if its cost and
size are acceptable.
(2) How do we decide whether cost and size are acceptable?
We currently reuse the heuristics 'Utils.sizeIsAcceptable' and 'Utils.costIsAcceptable'
that are used in unconditional inlining. For
let v = \x1.\x2...\xn.VBody in body
we check `VBody` with the above "acceptable" functions.
The "acceptable" functions are currently quite conservative, e.g.,
we currently reject `Constant` (has acceptable cost but not acceptable size).
We will work on more refined checks (e.g., checking their sizes instead of just rejecting them) to
improve the optimization.
in a
Note that because `f` occurs twice in the original term, and the RHS of `f` is not small enough,
`f` is not unconditionally inlined. Instead, the second usage of `f` is inlined. The first
usage is not inlined because it is not fully applied.
(2) How do we decide whether a fully saturated application can be inlined?
For size, we compare the sizes (in terms of AST nodes before and after the inlining and beta
reduction), and inline only if it does not increase the size. In the above example, we count
the number of AST nodes in `f a b` and in `a`. The latter is smaller, which means inlining
reduces the size.
We care about program size increases primarily because it
affects the size of the serialized script, which appears on chain and must be paid for.
This is different to many compilers which care about this also because e.g. larger ASTs
slow down compilation. We care about this too, but the serialized size is the main driver for us.
The number of AST nodes is an approximate rather than a precise measure. It correlates,
but doesn't directly map to the size of the serialised script. Different kinds of AST nodes
may take different number of bits to encode; in particular, a `Constant` node always
counts as one AST node, but the constant in it can be of arbitrary size. Then, would it be
better to use the size of the serialised term, instead of the number of AST nodes? Not
necessarily, because other transformations, such as CSE, may change the size further;
specifically, if a large constant occurs multiple times in a program, it may get CSE'd.
In general there's no reliable way to precisely predict the size impact of an inlining
decision, and we believe the number of AST nodes is in fact a good approximation.
For cost, we check whether the RHS (in this example, `\x. \y -> x`) has a small cost.
If the RHS has a non-zero arity, then the cost is always small (since a lambda or a type
abstraction is already a value). This may not be the case if the arity is zero.
For effect-safety, we require: (1) the RHS be pure, i.e., evaluating it is guaranteed to
not have side effects; (2) all arguments be pure, since otherwise it is unsafe to
perform beta reduction.
-}

-- | Computes the 'Utils.Arity' of a term. Also returns the function body, for checking whether
-- it's `Utils.acceptable`.
{- | Computes the 'Utils.Arity' of a term. Also returns the function body, for checking whether
it's `Utils.acceptable`.
-}
computeArity ::
Term tyname name uni fun ann
-> (Arity, Term tyname name uni fun ann)
Term tyname name uni fun ann ->
(Arity tyname name, Term tyname name uni fun ann)
computeArity = \case
LamAbs _ _ _ body ->
let (nextArgs, nextBody) = computeArity body in (TermParam:nextArgs, nextBody)
TyAbs _ _ _ body ->
let (nextArgs, nextBody) = computeArity body in (TypeParam:nextArgs, nextBody)
-- Whenever we encounter a body that is not a lambda or type abstraction, we are done counting
tm -> ([],tm)

-- | Given the arity of a function, and the list of arguments applied to it, return whether it is
-- fully applied or not.
isFullyApplied :: Arity -> AppContext tyname name uni fun ann -> Bool
isFullyApplied [] _ = True -- The function needs no more arguments
isFullyApplied (_lam:_lams) AppContextEnd = False -- under-application
isFullyApplied (TermParam:tlLams) (TermAppContext _ _ ctx) = isFullyApplied tlLams ctx
isFullyApplied (TypeParam:tlLams) (TypeAppContext _ _ ctx) = isFullyApplied tlLams ctx
isFullyApplied _ _ =
-- wrong argument type, i.e., we have an ill-typed term here. It's not what we define as fully
-- applied. Although if the term was ill-typed before, it will be ill-typed after the
-- inlining, and it won't make it any worse, so we could consider accepting this.
False

-- | Consider whether to inline a saturated application.
considerInlineSat :: forall tyname name uni fun ann. InliningConstraints tyname name uni fun
=> Term tyname name uni fun ann -- ^ The `body` of the `Let` term.
-> InlineM tyname name uni fun ann (Term tyname name uni fun ann)
considerInlineSat tm = do
-- collect all the arguments of the term being applied to
case splitApplication tm of
-- if it is a `Var` that is being applied to, check to see if it's fully applied
(Var _ann name, ctx) -> do
maybeVarInfo <- gets (lookupVarInfo name)
case maybeVarInfo of
Just varInfo -> do
let body = varBody varInfo
defAsInlineTerm = varDef varInfo
inlineTermToTerm :: InlineTerm tyname name uni fun ann
-> Term tyname name uni fun ann
inlineTermToTerm (Done (Dupable var)) = var
def = inlineTermToTerm defAsInlineTerm
fullyApplied = isFullyApplied (arity varInfo) ctx
-- It is the body that we will be left with in the program after we have
-- reduced the saturated application, so the size increase we will be left
-- with comes from the body, and that is what we need to check is okay
bodySizeOk = sizeIsAcceptable body
-- The definition itself will be inlined, so we need to check that the cost
-- of that is acceptable. Note that we do _not_ check the cost of the _body_.
-- We would have paid that regardless.
-- Consider e.g. `let y = \x. f x`. We pay the cost of the `f x` at every call
-- site regardless. The work that is being duplicated is the work for the lambda.
defCostOk = costIsAcceptable def
-- check if binding is pure to avoid duplicated effects.
-- For strict bindings we can't accidentally make any effects happen less often than
-- it would have before, but we can make it happen more often.
-- We could potentially do this safely in non-conservative mode.
defPure <- isTermBindingPure (varStrictness varInfo) def
if fullyApplied && bodySizeOk && defCostOk && defPure
LamAbs _ n _ body ->
let (nextArgs, nextBody) = computeArity body in (TermParam n : nextArgs, nextBody)
TyAbs _ n _ body ->
let (nextArgs, nextBody) = computeArity body in (TypeParam n : nextArgs, nextBody)
-- Whenever we encounter a body that is not a lambda or type abstraction, we are done counting
tm -> ([], tm)

{- | Fully apply the RHS of the given variable to the given arguments, and beta-reduce
the application, if possible.
-}
fullyApplyAndBetaReduce ::
forall tyname name uni fun ann.
(InliningConstraints tyname name uni fun) =>
-- | The variable
VarInfo tyname name uni fun ann ->
-- | The arguments
AppContext tyname name uni fun ann ->
InlineM tyname name uni fun ann (Maybe (Term tyname name uni fun ann))
fullyApplyAndBetaReduce info args0 = do
rhsBody <- liftDupable (let Done rhsBody = varRhsBody info in rhsBody)
let go ::
Term tyname name uni fun ann ->
Arity tyname name ->
AppContext tyname name uni fun ann ->
InlineM tyname name uni fun ann (Maybe (Term tyname name uni fun ann))
go acc arity args = case (arity, args) of
-- success
([], _) -> pure . Just $ fillAppContext acc args
(TermParam param : arity', TermAppContext arg _ args') -> do
safe <- safeToBetaReduce param arg
if safe
then do
-- rename the term before substituting in
renamed <- renameTerm defAsInlineTerm
pure $ fillAppContext renamed ctx
else pure tm
-- The variable maybe a *recursive* let binding, in which case it won't be in the map,
-- and we don't process it. ATM recursive bindings aren't inlined.
Nothing -> pure tm
-- if the term being applied is not a `Var`, don't inline
_ -> pure tm
acc' <-
termSubstNamesM
(\n -> if n == param then Just <$> PLC.rename arg else pure Nothing)
acc
go acc' arity' args'
else pure Nothing
(TypeParam param : arity', TypeAppContext arg _ args') -> do
acc' <-
termSubstTyNamesM
(\n -> if n == param then Just <$> PLC.rename arg else pure Nothing)
acc
go acc' arity' args'
_ -> pure Nothing

-- Is it safe to turn `(\a -> body) arg` into `body [a := arg]`?
-- The criteria is the same as the criteria for inlining `a` in
-- `let !a = arg in body`.
safeToBetaReduce ::
-- `a`
name ->
-- `arg`
Term tyname name uni fun ann ->
InlineM tyname name uni fun ann Bool
safeToBetaReduce a = shouldUnconditionallyInline Strict a rhsBody
go rhsBody (varArity info) args0

-- | Consider whether to inline an application.
inlineSaturatedApp ::
forall tyname name uni fun ann.
(InliningConstraints tyname name uni fun) =>
Term tyname name uni fun ann ->
InlineM tyname name uni fun ann (Term tyname name uni fun ann)
inlineSaturatedApp t
| (Var _ann name, args) <- splitApplication t =
gets (lookupVarInfo name) >>= \case
Just varInfo ->
fullyApplyAndBetaReduce varInfo args >>= \case
Just fullyApplied -> do
let rhs = varRhs varInfo
-- Inline only if the size is no bigger than not inlining.
sizeIsOk = termSize fullyApplied <= termSize t
-- The definition itself will be inlined, so we need to check that the cost
-- of that is acceptable. Note that we do _not_ check the cost of the _body_.
-- We would have paid that regardless.
-- Consider e.g. `let y = \x. f x`. We pay the cost of the `f x` at
-- every call site regardless. The work that is being duplicated is
-- the work for the lambda.
costIsOk = costIsAcceptable rhs
-- check if binding is pure to avoid duplicated effects.
-- For strict bindings we can't accidentally make any effects happen less often
-- than it would have before, but we can make it happen more often.
-- We could potentially do this safely in non-conservative mode.
rhsPure <- isTermBindingPure (varStrictness varInfo) rhs
pure $ if sizeIsOk && costIsOk && rhsPure then fullyApplied else t
Nothing -> pure t
-- The variable maybe a *recursive* let binding, in which case it won't be in the map,
-- and we don't process it. ATM recursive bindings aren't inlined.
Nothing -> pure t
| otherwise = pure t

1 comment on commit 95c83d0

@github-actions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Plutus Benchmarks'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.05.

Benchmark suite Current: 95c83d0 Previous: 79ac43f Ratio
validation-decode-auction_1-4 197.8 μs 186.8 μs 1.06

This comment was automatically generated by workflow using github-action-benchmark.

CC: @input-output-hk/plutus-core

Please sign in to comment.