Skip to content

Commit

Permalink
Implement cardinality analysis
Browse files Browse the repository at this point in the history
This major patch implements the cardinality analysis described
in our paper "Higher order cardinality analysis". It is joint
work with Ilya Sergey and Dimitrios Vytiniotis.

The basic is augment the absence-analysis part of the demand
analyser so that it can tell when something is used
	 never
	 at most once
 	 some other way

The "at most once" information is used
    a) to enable transformations, and
       in particular to identify one-shot lambdas
    b) to allow updates on thunks to be omitted.

There are two new flags, mainly there so you can do performance
comparisons:
    -fkill-absence   stops GHC doing absence analysis at all
    -fkill-one-shot  stops GHC spotting one-shot lambdas
                     and single-entry thunks

The big changes are:

* The Demand type is substantially refactored.  In particular
  the UseDmd is factored as follows
      data UseDmd
        = UCall Count UseDmd
        | UProd [MaybeUsed]
        | UHead
        | Used

      data MaybeUsed = Abs | Use Count UseDmd

      data Count = One | Many

  Notice that UCall recurses straight to UseDmd, whereas
  UProd goes via MaybeUsed.

  The "Count" embodies the "at most once" or "many" idea.

* The demand analyser itself was refactored a lot

* The previously ad-hoc stuff in the occurrence analyser for foldr and
  build goes away entirely.  Before if we had build (\cn -> ...x... )
  then the "\cn" was hackily made one-shot (by spotting 'build' as
  special.  That's essential to allow x to be inlined.  Now the
  occurrence analyser propagates info gotten from 'build's stricness
  signature (so build isn't special); and that strictness sig is
  in turn derived entirely automatically.  Much nicer!

* The ticky stuff is improved to count single-entry thunks separately.

One shortcoming is that there is no DEBUG way to spot if an
allegedly-single-entry thunk is acually entered more than once.  It
would not be hard to generate a bit of code to check for this, and it
would be reassuring.  But it's fiddly and I have not done it.

Despite all this fuss, the performance numbers are rather under-whelming.
See the paper for more discussion.

       nucleic2          -0.8%    -10.9%      0.10      0.10     +0.0%
         sphere          -0.7%     -1.5%      0.08      0.08     +0.0%
--------------------------------------------------------------------------------
            Min          -4.7%    -10.9%     -9.3%     -9.3%    -50.0%
            Max          -0.4%     +0.5%     +2.2%     +2.3%     +7.4%
 Geometric Mean          -0.8%     -0.2%     -1.3%     -1.3%     -1.8%

I don't quite know how much credence to place in the runtime changes,
but movement seems generally in the right direction.
  • Loading branch information
Simon Peyton Jones committed Jun 6, 2013
1 parent da4ff65 commit 99d4e5b
Show file tree
Hide file tree
Showing 15 changed files with 1,304 additions and 736 deletions.
1,034 changes: 712 additions & 322 deletions compiler/basicTypes/Demand.lhs

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions compiler/basicTypes/MkId.lhs
Expand Up @@ -321,10 +321,10 @@ mkDictSelId dflags no_unf name clas
strict_sig = mkStrictSig (mkTopDmdType [arg_dmd] topRes)
arg_dmd | new_tycon = evalDmd
| otherwise = mkProdDmd [ if the_arg_id == id then evalDmd else absDmd
| otherwise = mkManyUsedDmd $
mkProdDmd [ if the_arg_id == id then evalDmd else absDmd
| id <- arg_ids ]
tycon = classTyCon clas
new_tycon = isNewTyCon tycon
[data_con] = tyConDataCons tycon
Expand Down
7 changes: 4 additions & 3 deletions compiler/basicTypes/OccName.lhs
Expand Up @@ -58,7 +58,8 @@ module OccName (
-- ** Derived 'OccName's
isDerivedOccName,
mkDataConWrapperOcc, mkWorkerOcc, mkDefaultMethodOcc, mkGenDefMethodOcc,
mkDataConWrapperOcc, mkWorkerOcc, mkDefaultMethodOcc,
mkGenDefMethodOcc,
mkDerivedTyConOcc, mkNewTyCoOcc, mkClassOpAuxOcc,
mkCon2TagOcc, mkTag2ConOcc, mkMaxTagOcc,
mkClassDataConOcc, mkDictOcc, mkIPOcc,
Expand Down Expand Up @@ -574,8 +575,8 @@ isDerivedOccName occ =
\end{code}

\begin{code}
mkDataConWrapperOcc, mkWorkerOcc, mkDefaultMethodOcc, mkGenDefMethodOcc,
mkDerivedTyConOcc, mkClassDataConOcc, mkDictOcc,
mkDataConWrapperOcc, mkWorkerOcc, mkDefaultMethodOcc,
mkGenDefMethodOcc, mkDerivedTyConOcc, mkClassDataConOcc, mkDictOcc,
mkIPOcc, mkSpecOcc, mkForeignExportOcc, mkGenOcc1, mkGenOcc2,
mkGenD, mkGenR, mkGen1R, mkGenRCo,
mkDataTOcc, mkDataCOcc, mkDataConWorkerOcc, mkNewTyCoOcc,
Expand Down
3 changes: 1 addition & 2 deletions compiler/cmm/CmmParse.y
Expand Up @@ -1005,8 +1005,7 @@ stmtMacros = listToUFM [
tickyAllocPAP goods slop ),
( fsLit "TICK_ALLOC_UP_THK", \[goods,slop] ->
tickyAllocThunk goods slop ),
( fsLit "UPD_BH_UPDATABLE", \[reg] -> emitBlackHoleCode False reg ),
( fsLit "UPD_BH_SINGLE_ENTRY", \[reg] -> emitBlackHoleCode True reg )
( fsLit "UPD_BH_UPDATABLE", \[reg] -> emitBlackHoleCode reg )
]
emitPushUpdateFrame :: CmmExpr -> CmmExpr -> FCode ()
Expand Down
44 changes: 34 additions & 10 deletions compiler/codeGen/StgCmmBind.hs
Expand Up @@ -130,8 +130,7 @@ cgBind (StgNonRec name rhs)
= do { (info, fcode) <- cgRhs name rhs
; addBindC (cg_id info) info
; init <- fcode
; emit init
}
; emit init }
-- init cannot be used in body, so slightly better to sink it eagerly

cgBind (StgRec pairs)
Expand Down Expand Up @@ -209,9 +208,34 @@ cgRhs id (StgRhsCon cc con args)
buildDynCon id True cc con args

cgRhs name (StgRhsClosure cc bi fvs upd_flag _srt args body)
| null fvs -- See Note [Nested constant closures]
= do { (info, fcode) <- cgTopRhsClosure Recursive name cc bi upd_flag args body
; return (info, fcode >> return mkNop) }
| otherwise
= do dflags <- getDynFlags
mkRhsClosure dflags name cc bi (nonVoidIds fvs) upd_flag args body

{- Note [Nested constant closures]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If we have
f x = let funny = not True
in ...
then 'funny' is a nested closure (compiled with cgRhs) that has no free vars.
This does not happen often, because let-floating takes them all to top
level; but it CAN happen. (Reason: let-floating may make a function f smaller
so it can be inlined, so now (f True) may generate a local no-fv closure.
This actually happened during bootsrapping GHC itself, with f=mkRdrFunBind
in TcGenDeriv.)
If we have one of these things, AND they allocate, the heap check will
refer to the static funny_closure; but there isn't one! (Why does the
heap check refer to the static closure? Becuase nodeMustPointToIt is
False, which is fair enough.)
Simple solution: compile the RHS as if it was top level. Then
everything works. A minor benefit is eliminating the allocation code
too. -}

------------------------------------------------------------------------
-- Non-constructor right hand sides
------------------------------------------------------------------------
Expand Down Expand Up @@ -547,8 +571,9 @@ thunkCode cl_info fv_details _cc node arity body
; entryHeapCheck cl_info node' arity [] $ do
{ -- Overwrite with black hole if necessary
-- but *after* the heap-overflow check
; tickyEnterThunk cl_info
; when (blackHoleOnEntry cl_info && node_points)
(blackHoleIt cl_info node)
(blackHoleIt node)

-- Push update frame
; setupUpdate cl_info node $
Expand All @@ -568,14 +593,14 @@ thunkCode cl_info fv_details _cc node arity body
-- Update and black-hole wrappers
------------------------------------------------------------------------

blackHoleIt :: ClosureInfo -> LocalReg -> FCode ()
blackHoleIt :: LocalReg -> FCode ()
-- Only called for closures with no args
-- Node points to the closure
blackHoleIt closure_info node
= emitBlackHoleCode (closureSingleEntry closure_info) (CmmReg (CmmLocal node))
blackHoleIt node_reg
= emitBlackHoleCode (CmmReg (CmmLocal node_reg))

emitBlackHoleCode :: Bool -> CmmExpr -> FCode ()
emitBlackHoleCode is_single_entry node = do
emitBlackHoleCode :: CmmExpr -> FCode ()
emitBlackHoleCode node = do
dflags <- getDynFlags

-- Eager blackholing is normally disabled, but can be turned on with
Expand Down Expand Up @@ -603,7 +628,6 @@ emitBlackHoleCode is_single_entry node = do
-- work with profiling.

when eager_blackholing $ do
tickyBlackHole (not is_single_entry)
emitStore (cmmOffsetW dflags node (fixedHdrSize dflags))
(CmmReg (CmmGlobal CurrentTSO))
emitPrimCall [] MO_WriteBarrier []
Expand All @@ -614,7 +638,7 @@ setupUpdate :: ClosureInfo -> LocalReg -> FCode () -> FCode ()
-- so that the cost centre in the original closure can still be
-- extracted by a subsequent enterCostCentre
setupUpdate closure_info node body
| closureReEntrant closure_info
| not (lfUpdatable (closureLFInfo closure_info))
= body

| not (isStaticClosure closure_info)
Expand Down
25 changes: 17 additions & 8 deletions compiler/codeGen/StgCmmTicky.hs
Expand Up @@ -133,7 +133,7 @@ import TyCon

import Data.Maybe
import qualified Data.Char
import Control.Monad ( when )
import Control.Monad ( unless, when )

-----------------------------------------------------------------------------
--
Expand Down Expand Up @@ -238,13 +238,22 @@ tickyEnterDynCon = ifTicky $ bumpTickyCounter (fsLit "ENT_DYN_CON_ctr")
tickyEnterStaticCon = ifTicky $ bumpTickyCounter (fsLit "ENT_STATIC_CON_ctr")
tickyEnterViaNode = ifTicky $ bumpTickyCounter (fsLit "ENT_VIA_NODE_ctr")

tickyEnterThunk :: FCode ()
tickyEnterThunk = ifTicky $ do
bumpTickyCounter (fsLit "ENT_DYN_THK_ctr")
ifTickyDynThunk $ do
ticky_ctr_lbl <- getTickyCtrLabel
registerTickyCtrAtEntryDyn ticky_ctr_lbl
bumpTickyEntryCount ticky_ctr_lbl
tickyEnterThunk :: ClosureInfo -> FCode ()
tickyEnterThunk cl_info
= ifTicky $ do
{ bumpTickyCounter ctr
; unless static $ do
ticky_ctr_lbl <- getTickyCtrLabel
registerTickyCtrAtEntryDyn ticky_ctr_lbl
bumpTickyEntryCount ticky_ctr_lbl }
where
updatable = closureSingleEntry cl_info
static = isStaticClosure cl_info

ctr | static = if updatable then fsLit "ENT_STATIC_THK_SINGLE_ctr"
else fsLit "ENT_STATIC_THK_MANY_ctr"
| otherwise = if updatable then fsLit "ENT_DYN_THK_SINGLE_ctr"
else fsLit "ENT_DYN_THK_MANY_ctr"

tickyEnterStdThunk :: FCode ()
tickyEnterStdThunk = tickyEnterThunk
Expand Down
57 changes: 30 additions & 27 deletions compiler/coreSyn/CorePrep.lhs
Expand Up @@ -345,12 +345,13 @@ cpeBind :: TopLevelFlag -> CorePrepEnv -> CoreBind
-> UniqSM (CorePrepEnv, Floats)
cpeBind top_lvl env (NonRec bndr rhs)
= do { (_, bndr1) <- cpCloneBndr env bndr
; let is_strict = isStrictDmd (idDemandInfo bndr)
; let dmd = idDemandInfo bndr
is_unlifted = isUnLiftedType (idType bndr)
; (floats, bndr2, rhs2) <- cpePair top_lvl NonRecursive
(is_strict || is_unlifted)
dmd
is_unlifted
env bndr1 rhs
; let new_float = mkFloat is_strict is_unlifted bndr2 rhs2
; let new_float = mkFloat dmd is_unlifted bndr2 rhs2
-- We want bndr'' in the envt, because it records
-- the evaluated-ness of the binder
Expand All @@ -360,7 +361,7 @@ cpeBind top_lvl env (NonRec bndr rhs)
cpeBind top_lvl env (Rec pairs)
= do { let (bndrs,rhss) = unzip pairs
; (env', bndrs1) <- cpCloneBndrs env (map fst pairs)
; stuff <- zipWithM (cpePair top_lvl Recursive False env') bndrs1 rhss
; stuff <- zipWithM (cpePair top_lvl Recursive topDmd False env') bndrs1 rhss
; let (floats_s, bndrs2, rhss2) = unzip3 stuff
all_pairs = foldrOL add_float (bndrs2 `zip` rhss2)
Expand All @@ -375,11 +376,11 @@ cpeBind top_lvl env (Rec pairs)
add_float b _ = pprPanic "cpeBind" (ppr b)
---------------
cpePair :: TopLevelFlag -> RecFlag -> RhsDemand
cpePair :: TopLevelFlag -> RecFlag -> Demand -> Bool
-> CorePrepEnv -> Id -> CoreExpr
-> UniqSM (Floats, Id, CpeRhs)
-- Used for all bindings
cpePair top_lvl is_rec is_strict_or_unlifted env bndr rhs
cpePair top_lvl is_rec dmd is_unlifted env bndr rhs
= do { (floats1, rhs1) <- cpeRhsE env rhs
-- See if we are allowed to float this stuff out of the RHS
Expand All @@ -392,7 +393,7 @@ cpePair top_lvl is_rec is_strict_or_unlifted env bndr rhs
else WARN(True, text "CorePrep: silly extra arguments:" <+> ppr bndr)
-- Note [Silly extra arguments]
(do { v <- newVar (idType bndr)
; let float = mkFloat False False v rhs2
; let float = mkFloat topDmd False v rhs2
; return ( addFloat floats2 float
, cpeEtaExpand arity (Var v)) })
Expand All @@ -406,6 +407,8 @@ cpePair top_lvl is_rec is_strict_or_unlifted env bndr rhs
; return (floats3, bndr', rhs') }
where
is_strict_or_unlifted = (isStrictDmd dmd) || is_unlifted
platform = targetPlatform (cpe_dynFlags env)
arity = idArity bndr -- We must match this arity
Expand Down Expand Up @@ -650,9 +653,8 @@ cpeApp env expr
[] -> (topDmd, [])
(arg_ty, res_ty) = expectJust "cpeBody:collect_args" $
splitFunTy_maybe fun_ty
is_strict = isStrictDmd ss1
; (fs, arg') <- cpeArg env is_strict arg arg_ty
; (fs, arg') <- cpeArg env ss1 arg arg_ty
; return (App fun' arg', hd, res_ty, fs `appendFloats` floats, ss_rest) }
collect_args (Var v) depth
Expand Down Expand Up @@ -682,8 +684,8 @@ cpeApp env expr
-- N-variable fun, better let-bind it
collect_args fun depth
= do { (fun_floats, fun') <- cpeArg env True fun ty
-- The True says that it's sure to be evaluated,
= do { (fun_floats, fun') <- cpeArg env evalDmd fun ty
-- The evalDmd says that it's sure to be evaluated,
-- so we'll end up case-binding it
; return (fun', (fun', depth), ty, fun_floats, []) }
where
Expand All @@ -694,9 +696,9 @@ cpeApp env expr
-- ---------------------------------------------------------------------------
-- This is where we arrange that a non-trivial argument is let-bound
cpeArg :: CorePrepEnv -> RhsDemand
cpeArg :: CorePrepEnv -> Demand
-> CoreArg -> Type -> UniqSM (Floats, CpeTriv)
cpeArg env is_strict arg arg_ty
cpeArg env dmd arg arg_ty
= do { (floats1, arg1) <- cpeRhsE env arg -- arg1 can be a lambda
; (floats2, arg2) <- if want_float floats1 arg1
then return (floats1, arg1)
Expand All @@ -710,11 +712,12 @@ cpeArg env is_strict arg arg_ty
else do
{ v <- newVar arg_ty
; let arg3 = cpeEtaExpand (exprArity arg2) arg2
arg_float = mkFloat is_strict is_unlifted v arg3
arg_float = mkFloat dmd is_unlifted v arg3
; return (addFloat floats2 arg_float, varToCoreExpr v) } }
where
is_unlifted = isUnLiftedType arg_ty
want_float = wantFloatNested NonRecursive (is_strict || is_unlifted)
is_strict = isStrictDmd dmd
want_float = wantFloatNested NonRecursive (is_strict || is_unlifted)
\end{code}
Note [Floating unlifted arguments]
Expand Down Expand Up @@ -909,20 +912,16 @@ tryEtaReducePrep _ _ = Nothing
\end{code}
-- -----------------------------------------------------------------------------
-- Demands
-- -----------------------------------------------------------------------------
\begin{code}
type RhsDemand = Bool -- True => used strictly; hence not top-level, non-recursive
\end{code}
%************************************************************************
%* *
Floats
%* *
%************************************************************************
Note [Pin demand info on floats]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We pin demand info on floated lets so that we can see the one-shot thunks.
\begin{code}
data FloatingBind
= FloatLet CoreBind -- Rhs of bindings are CpeRhss
Expand Down Expand Up @@ -957,12 +956,16 @@ data OkToSpec
-- ok-to-speculate unlifted bindings
| NotOkToSpec -- Some not-ok-to-speculate unlifted bindings
mkFloat :: Bool -> Bool -> Id -> CpeRhs -> FloatingBind
mkFloat is_strict is_unlifted bndr rhs
mkFloat :: Demand -> Bool -> Id -> CpeRhs -> FloatingBind
mkFloat dmd is_unlifted bndr rhs
| use_case = FloatCase bndr rhs (exprOkForSpeculation rhs)
| otherwise = FloatLet (NonRec bndr rhs)
| is_hnf = FloatLet (NonRec bndr rhs)
| otherwise = FloatLet (NonRec (setIdDemandInfo bndr dmd) rhs)
-- See Note [Pin demand info on floats]
where
use_case = is_unlifted || is_strict && not (exprIsHNF rhs)
is_hnf = exprIsHNF rhs
is_strict = isStrictDmd dmd
use_case = is_unlifted || is_strict && not is_hnf
-- Don't make a case for a value binding,
-- even if it's strict. Otherwise we get
-- case (\x -> e) of ...!
Expand Down
6 changes: 5 additions & 1 deletion compiler/main/DynFlags.hs
Expand Up @@ -273,6 +273,8 @@ data GeneralFlag

-- optimisation opts
| Opt_Strictness
| Opt_KillAbsence
| Opt_KillOneShot
| Opt_FullLaziness
| Opt_FloatIn
| Opt_Specialise
Expand Down Expand Up @@ -2534,7 +2536,9 @@ fFlags = [
( "hpc", Opt_Hpc, nop ),
( "pre-inlining", Opt_SimplPreInlining, nop ),
( "flat-cache", Opt_FlatCache, nop ),
( "use-rpaths", Opt_RPath, nop )
( "use-rpaths", Opt_RPath, nop ),
( "kill-absence", Opt_KillAbsence, nop),
( "kill-one-shot", Opt_KillOneShot, nop)
]

-- | These @-f\<blah\>@ flags can all be reversed with @-fno-\<blah\>@
Expand Down

0 comments on commit 99d4e5b

Please sign in to comment.