Better size measure + beta reduction for inlining fully saturated app…

…lications (#5308)
IntersectMBO · May 23, 2023 · 95c83d0 · 95c83d0 · github-actions · May 23, 2023
1 parent 79ac43f
commit 95c83d0
Show file tree

Hide file tree

Showing 34 changed files with 550 additions and 334 deletions.
diff --git a/plutus-benchmark/lists/test/Sum/left-fold-data.budget.golden b/plutus-benchmark/lists/test/Sum/left-fold-data.budget.golden
@@ -1,2 +1,2 @@
-({cpu: 394766685
-| mem: 1218562})
+({cpu: 378574685
+| mem: 1148162})
diff --git a/plutus-benchmark/lists/test/Sum/right-fold-data.budget.golden b/plutus-benchmark/lists/test/Sum/right-fold-data.budget.golden
@@ -1,2 +1,2 @@
-({cpu: 401666685
-| mem: 1248562})
+({cpu: 385474685
+| mem: 1178162})
diff --git a/plutus-benchmark/nofib/test/Spec.hs b/plutus-benchmark/nofib/test/Spec.hs
@@ -46,7 +46,7 @@ testClausify = testGroup "clausify"
                , testCase "formula3" $ mkClausifyTest Clausify.F3
                , testCase "formula4" $ mkClausifyTest Clausify.F4
                , testCase "formula5" $ mkClausifyTest Clausify.F5
-               , Tx.fitsInto "formula1 (size)" (Clausify.mkClausifyCode Clausify.F1) 3916
+               , Tx.fitsInto "formula1 (size)" (Clausify.mkClausifyCode Clausify.F1) 3766
                , runTestNested $
                   Tx.goldenBudget "formulaBudget" $ Clausify.mkClausifyCode Clausify.F1
                ]

diff --git a/plutus-benchmark/nofib/test/formulaBudget.budget.golden b/plutus-benchmark/nofib/test/formulaBudget.budget.golden
@@ -1,2 +1,2 @@
-({cpu: 11512613908
-| mem: 48524048})
+({cpu: 10595833908
+| mem: 44538048})
diff --git a/plutus-benchmark/script-contexts/test/checkScriptContextEqualityTerm-20.budget.golden b/plutus-benchmark/script-contexts/test/checkScriptContextEqualityTerm-20.budget.golden
@@ -1,2 +1,2 @@
-({cpu: 597728251
-| mem: 2440846})
+({cpu: 595428251
+| mem: 2430846})
diff --git a/plutus-core/changelog.d/20230508_200232_unsafeFixIO_inline_sat_size_beta.md b/plutus-core/changelog.d/20230508_200232_unsafeFixIO_inline_sat_size_beta.md
@@ -0,0 +1,4 @@
+### Changed
+
+- Improved the inlining of fully saturated functions such that it measures the size
+  differences more accurately, and also performs beta reduction after inlining.
diff --git a/plutus-core/plutus-core/src/PlutusCore/Subst.hs b/plutus-core/plutus-core/src/PlutusCore/Subst.hs
@@ -24,6 +24,7 @@ module PlutusCore.Subst
     , vTerm
     , tvTerm
     , tvTy
+    , purely
     ) where
 
 import PlutusPrelude

diff --git a/plutus-core/plutus-ir/src/PlutusIR/Transform/Inline/CallSiteInline.hs b/plutus-core/plutus-ir/src/PlutusIR/Transform/Inline/CallSiteInline.hs
@@ -4,20 +4,24 @@
 {-# LANGUAGE LambdaCase       #-}
 {-# LANGUAGE TypeFamilies     #-}
 
-{-|
+{- |
 Call site inlining machinery. For now there's only one part: inlining of fully applied functions.
 We inline fully applied functions if the cost and size are acceptable.
 See note [Inlining of fully applied functions].
 -}
-
 module PlutusIR.Transform.Inline.CallSiteInline where
 
-import Control.Monad.State
-import PlutusCore.Rename.Internal
+import PlutusCore qualified as PLC
+import PlutusCore.Rename
+import PlutusIR.Analysis.Size
 import PlutusIR.Contexts
 import PlutusIR.Core
 import PlutusIR.Transform.Inline.Utils
-{- Note [Inlining of fully applied functions]
+import PlutusIR.Transform.Substitute
+
+import Control.Monad.State
+
+{- Note [Inlining and beta reduction of fully applied functions]
 
 We inline if (1) a function is fully applied (2) its cost and size are acceptable. We discuss
 each in detail below.
@@ -32,7 +36,7 @@ Consider `let v = rhs in body`, in which `body` calls `v`.
 We focus on cases when `v` is a function. (Non-functions have arity () or 0).
 I.e., it has type/term lambda abstraction(s). E.g.:
 
-let v = \x1.\x2...\xn.VBody in body
+let v = \x1.\x2...\xn. v_body in body
 
 (x1,x2...xn) or n is the syntactic arity of a term. That is, a record of the arguments that the
 term expects before it may do some work. Since we have both type and lambda
@@ -45,10 +49,11 @@ an under-approximation of how many arguments the term may need.
 e.g. consider the term @let id = \x -> x in id@: the variable @id@ has syntactic
 arity @[]@, but does in fact need an argument before it does any work.
 
-In the `body`, where `v` is *called*,
+In `body`, where `v` is called,
 if it was given the `n` term or type arguments in the correct order, then it is *fully applied*.
-If `VBody` is acceptable in size and cost, we inline the call site of the fully applied `v` in this
-case, i.e., we replace `v` in the `body` with `rhs`. E.g.
+If the changes in size and cost caused by the inlining is acceptable, and the inlining and
+the beta reduction is effect-safe, we inline the call site of the fully applied `v`, and perform
+beta reduction. E.g.
 
 let f = \x.\y -> x
 in
@@ -60,120 +65,139 @@ becomes
 let f = \x.\y -> x
 in
   let z = f q
-  in ((\x.\y -> x) a b)
-
-With beta reduction, it becomes:
-
-let f = \x.\y -> x
-in
-  let z = f q
-  in a (more accurately it becomes (let { x = a, y = b } in x))
-
-This is already a reduction in code size. However, because of this,
-our dead code elimination pass is able to further reduce the code to just `a`.
-
-Because a function can be called in the `body` multiple times and may not be fully applied for all
-its calls, we cannot simply keep a substitution map like in `Inline`,
-which substitute *all* occurrences of a variable. Instead, we store all in scope variables in a
-map, `Utils.NonRecInScopeSet`, with all information needed for inlining saturated functions.
-
-To find out whether a function is fully applied, when we encounter a variable in the `body`,
-we find its arity from the `Utils.NonRecInScopeSet` map and compare it with the list
-of arguments it has been applied to at that site. If it is fully applied, we inline it there.
-
-Note that over-application of a function would also be inlined. E.g.:
-
-```haskell
-let id = \y -> y
-     f = \x -> id
- in f x y
-```
-
-`f`'s arity is (\x) or 1, but is applied to 2 arguments. We inline `f` in this case if its cost and
-size are acceptable.
-
-(2) How do we decide whether cost and size are acceptable?
-
-We currently reuse the heuristics 'Utils.sizeIsAcceptable' and 'Utils.costIsAcceptable'
-that are used in unconditional inlining. For
-
-let v = \x1.\x2...\xn.VBody in body
-
-we check `VBody` with the above "acceptable" functions.
-The "acceptable" functions are currently quite conservative, e.g.,
-we currently reject `Constant` (has acceptable cost but not acceptable size).
-We will work on more refined checks (e.g., checking their sizes instead of just rejecting them) to
-improve the optimization.
+  in a
+
+Note that because `f` occurs twice in the original term, and the RHS of `f` is not small enough,
+`f` is not unconditionally inlined. Instead, the second usage of `f` is inlined. The first
+usage is not inlined because it is not fully applied.
+
+(2) How do we decide whether a fully saturated application can be inlined?
+
+For size, we compare the sizes (in terms of AST nodes before and after the inlining and beta
+reduction), and inline only if it does not increase the size. In the above example, we count
+the number of AST nodes in `f a b` and in `a`. The latter is smaller, which means inlining
+reduces the size.
+
+We care about program size increases primarily because it
+affects the size of the serialized script, which appears on chain and must be paid for.
+This is different to many compilers which care about this also because e.g. larger ASTs
+slow down compilation. We care about this too, but the serialized size is the main driver for us.
+
+The number of AST nodes is an approximate rather than a precise measure. It correlates,
+but doesn't directly map to the size of the serialised script. Different kinds of AST nodes
+may take different number of bits to encode; in particular, a `Constant` node always
+counts as one AST node, but the constant in it can be of arbitrary size. Then, would it be
+better to use the size of the serialised term, instead of the number of AST nodes? Not
+necessarily, because other transformations, such as CSE, may change the size further;
+specifically, if a large constant occurs multiple times in a program, it may get CSE'd.
+
+In general there's no reliable way to precisely predict the size impact of an inlining
+decision, and we believe the number of AST nodes is in fact a good approximation.
+
+For cost, we check whether the RHS (in this example, `\x. \y -> x`) has a small cost.
+If the RHS has a non-zero arity, then the cost is always small (since a lambda or a type
+abstraction is already a value). This may not be the case if the arity is zero.
+
+For effect-safety, we require: (1) the RHS be pure, i.e., evaluating it is guaranteed to
+not have side effects; (2) all arguments be pure, since otherwise it is unsafe to
+perform beta reduction.
 -}
 
--- | Computes the 'Utils.Arity' of a term. Also returns the function body, for checking whether
--- it's `Utils.acceptable`.
+{- | Computes the 'Utils.Arity' of a term. Also returns the function body, for checking whether
+it's `Utils.acceptable`.
+-}
 computeArity ::
-    Term tyname name uni fun ann
-    -> (Arity, Term tyname name uni fun ann)
+  Term tyname name uni fun ann ->
+  (Arity tyname name, Term tyname name uni fun ann)
 computeArity = \case
-    LamAbs _ _ _ body ->
-      let (nextArgs, nextBody) = computeArity body in (TermParam:nextArgs, nextBody)
-    TyAbs _ _ _ body  ->
-      let (nextArgs, nextBody) = computeArity body in (TypeParam:nextArgs, nextBody)
-    -- Whenever we encounter a body that is not a lambda or type abstraction, we are done counting
-    tm                -> ([],tm)
-
--- | Given the arity of a function, and the list of arguments applied to it, return whether it is
--- fully applied or not.
-isFullyApplied :: Arity -> AppContext tyname name uni fun ann -> Bool
-isFullyApplied [] _ = True -- The function needs no more arguments
-isFullyApplied (_lam:_lams) AppContextEnd = False -- under-application
-isFullyApplied (TermParam:tlLams) (TermAppContext _ _ ctx) = isFullyApplied tlLams ctx
-isFullyApplied (TypeParam:tlLams) (TypeAppContext _ _ ctx) = isFullyApplied tlLams ctx
-isFullyApplied _ _ =
-    -- wrong argument type, i.e., we have an ill-typed term here. It's not what we define as fully
-    -- applied. Although if the term was ill-typed before, it will be ill-typed after the
-    -- inlining, and it won't make it any worse, so we could consider accepting this.
-    False
-
--- | Consider whether to inline a saturated application.
-considerInlineSat :: forall tyname name uni fun ann. InliningConstraints tyname name uni fun
-    => Term tyname name uni fun ann -- ^ The `body` of the `Let` term.
-    -> InlineM tyname name uni fun ann (Term tyname name uni fun ann)
-considerInlineSat tm = do
-    -- collect all the arguments of the term being applied to
-    case splitApplication tm of
-      -- if it is a `Var` that is being applied to, check to see if it's fully applied
-      (Var _ann name, ctx) -> do
-        maybeVarInfo <- gets (lookupVarInfo name)
-        case maybeVarInfo of
-          Just varInfo -> do
-            let body = varBody varInfo
-                defAsInlineTerm = varDef varInfo
-                inlineTermToTerm :: InlineTerm tyname name uni fun ann
-                  -> Term tyname name uni fun ann
-                inlineTermToTerm (Done (Dupable var)) = var
-                def = inlineTermToTerm defAsInlineTerm
-                fullyApplied = isFullyApplied (arity varInfo) ctx
-                -- It is the body that we will be left with in the program after we have
-                -- reduced the saturated application, so the size increase we will be left
-                -- with comes from the body, and that is what we need to check is okay
-                bodySizeOk = sizeIsAcceptable body
-                -- The definition itself will be inlined, so we need to check that the cost
-                -- of that is acceptable. Note that we do _not_ check the cost of the _body_.
-                -- We would have paid that regardless.
-                -- Consider e.g. `let y = \x. f x`. We pay the cost of the `f x` at every call
-                -- site regardless. The work that is being duplicated is the work for the lambda.
-                defCostOk = costIsAcceptable def
-            -- check if binding is pure to avoid duplicated effects.
-            -- For strict bindings we can't accidentally make any effects happen less often than
-            -- it would have before, but we can make it happen more often.
-            -- We could potentially do this safely in non-conservative mode.
-            defPure <- isTermBindingPure (varStrictness varInfo) def
-            if fullyApplied && bodySizeOk && defCostOk && defPure
+  LamAbs _ n _ body ->
+    let (nextArgs, nextBody) = computeArity body in (TermParam n : nextArgs, nextBody)
+  TyAbs _ n _ body ->
+    let (nextArgs, nextBody) = computeArity body in (TypeParam n : nextArgs, nextBody)
+  -- Whenever we encounter a body that is not a lambda or type abstraction, we are done counting
+  tm -> ([], tm)
+
+{- | Fully apply the RHS of the given variable to the given arguments, and beta-reduce
+the application, if possible.
+-}
+fullyApplyAndBetaReduce ::
+  forall tyname name uni fun ann.
+  (InliningConstraints tyname name uni fun) =>
+  -- | The variable
+  VarInfo tyname name uni fun ann ->
+  -- | The arguments
+  AppContext tyname name uni fun ann ->
+  InlineM tyname name uni fun ann (Maybe (Term tyname name uni fun ann))
+fullyApplyAndBetaReduce info args0 = do
+  rhsBody <- liftDupable (let Done rhsBody = varRhsBody info in rhsBody)
+  let go ::
+        Term tyname name uni fun ann ->
+        Arity tyname name ->
+        AppContext tyname name uni fun ann ->
+        InlineM tyname name uni fun ann (Maybe (Term tyname name uni fun ann))
+      go acc arity args = case (arity, args) of
+        -- success
+        ([], _) -> pure . Just $ fillAppContext acc args
+        (TermParam param : arity', TermAppContext arg _ args') -> do
+          safe <- safeToBetaReduce param arg
+          if safe
             then do
-                -- rename the term before substituting in
-                renamed <- renameTerm defAsInlineTerm
-                pure $ fillAppContext renamed ctx
-            else pure tm
-          -- The variable maybe a *recursive* let binding, in which case it won't be in the map,
-          -- and we don't process it. ATM recursive bindings aren't inlined.
-          Nothing -> pure tm
-      -- if the term being applied is not a `Var`, don't inline
-      _ -> pure tm
+              acc' <-
+                termSubstNamesM
+                  (\n -> if n == param then Just <$> PLC.rename arg else pure Nothing)
+                  acc
+              go acc' arity' args'
+            else pure Nothing
+        (TypeParam param : arity', TypeAppContext arg _ args') -> do
+          acc' <-
+            termSubstTyNamesM
+              (\n -> if n == param then Just <$> PLC.rename arg else pure Nothing)
+              acc
+          go acc' arity' args'
+        _ -> pure Nothing
+
+      -- Is it safe to turn `(\a -> body) arg` into `body [a := arg]`?
+      -- The criteria is the same as the criteria for inlining `a` in
+      -- `let !a = arg in body`.
+      safeToBetaReduce ::
+        -- `a`
+        name ->
+        -- `arg`
+        Term tyname name uni fun ann ->
+        InlineM tyname name uni fun ann Bool
+      safeToBetaReduce a = shouldUnconditionallyInline Strict a rhsBody
+  go rhsBody (varArity info) args0
+
+-- | Consider whether to inline an application.
+inlineSaturatedApp ::
+  forall tyname name uni fun ann.
+  (InliningConstraints tyname name uni fun) =>
+  Term tyname name uni fun ann ->
+  InlineM tyname name uni fun ann (Term tyname name uni fun ann)
+inlineSaturatedApp t
+  | (Var _ann name, args) <- splitApplication t =
+      gets (lookupVarInfo name) >>= \case
+        Just varInfo ->
+          fullyApplyAndBetaReduce varInfo args >>= \case
+            Just fullyApplied -> do
+              let rhs = varRhs varInfo
+                  -- Inline only if the size is no bigger than not inlining.
+                  sizeIsOk = termSize fullyApplied <= termSize t
+                  -- The definition itself will be inlined, so we need to check that the cost
+                  -- of that is acceptable. Note that we do _not_ check the cost of the _body_.
+                  -- We would have paid that regardless.
+                  -- Consider e.g. `let y = \x. f x`. We pay the cost of the `f x` at
+                  -- every call site regardless. The work that is being duplicated is
+                  -- the work for the lambda.
+                  costIsOk = costIsAcceptable rhs
+              -- check if binding is pure to avoid duplicated effects.
+              -- For strict bindings we can't accidentally make any effects happen less often
+              -- than it would have before, but we can make it happen more often.
+              -- We could potentially do this safely in non-conservative mode.
+              rhsPure <- isTermBindingPure (varStrictness varInfo) rhs
+              pure $ if sizeIsOk && costIsOk && rhsPure then fullyApplied else t
+            Nothing -> pure t
+        -- The variable maybe a *recursive* let binding, in which case it won't be in the map,
+        -- and we don't process it. ATM recursive bindings aren't inlined.
+        Nothing -> pure t
+  | otherwise = pure t