Merge #3327

3327: Implement `CheckpointPolicy` data type r=HeinrichApfelmus a=HeinrichApfelmus ### Issue number ADP-1497 ### Overview This pull request implements an abstract data type `CheckpointPolicy` which describes a policy for keeping and discarding checkpoints based on their BlockHeight. Specifically, a boolean `keepWhereTip policy tip height` indicates whether the `policy` wants to store a checkpoint at the block height `height` , given that the blockchain is at `tip`. ### Details * The implementation of the data type is based on a function ``` nextCheckpoint policy tip height :: Maybe BlockHeight ``` which returns the next block height (`>= height`) at which a checkpoint should be made. This function is an efficient balance between the purely boolean predicate (`keepWhereTip`) and a full listing of all checkpoints (`toListAtTip`). * New policies can be created from old policies by using the semigroup operation `(<>)`, which represents the union of the checkpoint listings. This simplifies the implementation of the `sparseArithmetic` policy significantly. * The `sparseArithmetic` policy is modeled on the checkpointing policy currently implemented in `cardano-wallet`, but it is not equal to it. ### Comments * This pull request implements the data type, but does *not* use it yet, in order to keep this PR small and easy to review. The type will be used in a future pull request. * This data type is a reimplementation of the data type introduced in the stale pull request #3159. * The main reason for not merging the old PR is that a purely boolean predicate `keepWhereTip` is not suitable for use with `lightSync`. The problem is that `lightSync` does not always read individual blocks and can (potentially) skip over many BlockHeights — which would result in *no* checkpoints being created, as the predicate always returns `False`. Put differently, the checkpoints would "fall between the cracks". In contrast, an implementation based on `nextCheckpoint` can be used with `lightSync`, as it can now *specifically target* the next block height, at which a checkpoint is created. Co-authored-by: Heinrich Apfelmus <heinrich.apfelmus@iohk.io>
cardano-foundation · Jun 19, 2022 · e06071d · e06071d
2 parents df24232 + 9903b2e
commit e06071d
Show file tree

Hide file tree

Showing 3 changed files with 360 additions and 0 deletions.
diff --git a/lib/core/cardano-wallet-core.cabal b/lib/core/cardano-wallet-core.cabal
@@ -190,6 +190,7 @@ library
       Cardano.Wallet.Api.Server.Tls
       Cardano.Wallet.Api.Types
       Cardano.Wallet.Api.Types.SchemaMetadata
+      Cardano.Wallet.Checkpoints.Policy
       Cardano.Wallet.CoinSelection
       Cardano.Wallet.CoinSelection.Internal
       Cardano.Wallet.CoinSelection.Internal.Balance
@@ -450,6 +451,7 @@ test-suite unit
       Cardano.Wallet.Api.ServerSpec
       Cardano.Wallet.Api.TypesSpec
       Cardano.Wallet.ApiSpec
+      Cardano.Wallet.Checkpoints.PolicySpec
       Cardano.Wallet.CoinSelectionSpec
       Cardano.Wallet.CoinSelection.InternalSpec
       Cardano.Wallet.CoinSelection.Internal.BalanceSpec

diff --git a/lib/core/src/Cardano/Wallet/Checkpoints/Policy.hs b/lib/core/src/Cardano/Wallet/Checkpoints/Policy.hs
@@ -0,0 +1,202 @@
+-- |
+-- Copyright: © 2022 IOHK
+-- License: Apache-2.0
+--
+-- Abstract data type that describes a policy for keeping and discarding
+-- checkpoints. To be used with the 'Checkpoints' type.
+module Cardano.Wallet.Checkpoints.Policy
+    ( BlockHeight
+    , CheckpointPolicy
+    , nextCheckpoint
+    , keepWhereTip
+    , toListAtTip
+
+    -- * Construction
+    , atGenesis
+    , atTip
+    , trailingArithmetic
+    , sparseArithmetic
+    , gapSize
+
+    -- * Internal invariants
+    -- $invariants
+    ) where
+
+import Prelude
+
+import Data.List
+    ( unfoldr )
+
+{-------------------------------------------------------------------------------
+    CheckpointPolicy, abstract data type
+-------------------------------------------------------------------------------}
+type BlockHeight = Integer
+
+{-| [CheckpointPolicy]
+
+To save memory and time, we do not store every checkpoint.
+Instead, a 'CheckpointPolicy' determines which checkpoints
+to store and which ones to discard.
+The 'extendAndPrune' functions consults such a policy and
+drops checkpoints as it deems necessary.
+
+A 'CheckpointPolicy' determines whether a checkpoint is worth storing
+only based on its block height. The boolean
+
+  keepWhereTip policy tip blockheight
+
+indicates whether the checkpoint should be stored ('True') or
+not ('False').
+It is important that this function does not oscillate:
+If @blockheight <= tip@, the function result may change from 'True'
+to 'False' as the @tip@ increases, but not the other way round.
+This is because we can only create checkpoints the first time we
+read the corresponding block.
+
+TODO:
+The 'Checkpoints' collection currently relies on 'Slot' instead
+of 'BlockHeight' to store checkpoints. We need to better integrate
+this with 'BlockHeight'.
+
+I (Heinrich) actually prefer 'Slot'. However, not every slot contains a block,
+and we would lose too many checkpoints if we based the decision of
+whether to keep a checkpoint or not based on the slot number alone.
+In contrast, block height is "dense".
+-}
+newtype CheckpointPolicy = CheckpointPolicy
+    { nextCheckpoint :: BlockHeight -> BlockHeight -> Maybe BlockHeight
+        -- ^ Assuming that the tip of the chain is at block height @tip@,
+        -- @nextCheckpoint policy tip height@ returns the smallest
+        -- @height'@ satisfying @height' >= height#
+        -- at which the next checkpoint is to be made.
+    }
+
+{-$invariants
+
+Internal invariants of the 'CheckpointPolicy' type:
+
+* 'prop_monotonicHeight' — 'nextCheckpoint' returns the same height
+  for all heights between a given height and the height returned.
+* prop_monotonicTip' — when increasing the @tip@ height, 'nextCheckpoint'
+  will never return a blockheight that is smaller.
+-}
+
+-- | Assuming that the tip of the chain is at block height @tip@,
+-- the value @keepWhereTip policy tip height@
+-- indicates whether a checkpoint should ('True') or should not ('False')
+-- be stored at @height@.
+keepWhereTip
+    :: CheckpointPolicy -> BlockHeight -> BlockHeight
+    -> Bool
+keepWhereTip policy tip height =
+    nextCheckpoint policy tip height == Just height
+
+-- | List all checkpoints for a given tip.
+toListAtTip :: CheckpointPolicy -> BlockHeight -> [BlockHeight]
+toListAtTip policy tip = unfoldr (fmap next . nextCheckpoint policy tip) 0
+  where next x = (x,x+1)
+
+{-------------------------------------------------------------------------------
+    CheckpointPolicy, construction
+-------------------------------------------------------------------------------}
+-- | The combination of two 'CheckpointPolicy' makes a checkpoint
+-- where at least one of the policies wants to make a checkpoint.
+instance Semigroup CheckpointPolicy where
+    p1 <> p2 = CheckpointPolicy $ \t h ->
+        union min (nextCheckpoint p1 t h) (nextCheckpoint p2 t h)
+      where
+        union _ Nothing mb = mb
+        union _ ma Nothing = ma
+        union f (Just a) (Just b) = Just (f a b)
+
+instance Monoid CheckpointPolicy where
+    mempty = CheckpointPolicy $ \_ _ -> Nothing
+
+-- | The 'CheckpointPolicy' that keeps only the genesis block.
+atGenesis :: CheckpointPolicy
+atGenesis = CheckpointPolicy $ \_tip height ->
+    if height <= 0 then Just height else Nothing
+
+-- | The 'CheckpointPolicy' that only keeps the tip of the chain.
+atTip :: CheckpointPolicy
+atTip = CheckpointPolicy $ \tip height ->
+    if height <= tip then Just tip else Nothing
+
+-- | @trailingArithmetic n height@ keeps @n@ checkpoints
+-- at block heights that are multiples of @height@
+-- and which are closest to the tip of the chain.
+-- (Fewer than @n@ checkpoints are kept while the chain is too short
+-- to accommodate all checkpoints.)
+trailingArithmetic :: Integer -> BlockHeight -> CheckpointPolicy
+trailingArithmetic n grid = CheckpointPolicy $ \tip height ->
+    case [h | h <- window tip, h >= height] of
+        [] -> Nothing
+        (x:_) -> Just x
+  where
+    window tip = [a, a + grid .. tip]
+      where
+        m = n - 1
+        a = if tip > m * grid then toGrid (tip - m * grid) else 0
+    toGrid x = (x `div` grid) * grid
+
+{- | Note [sparseArithmeticPolicy]
+
+The 'sparseArithmetic' checkpoint policy contains essentially two
+sets of checkpoints: One fairly dense set near the tip of the chain
+in order to handle frequent potential rollbacks, and one sparse
+set that spans the entire epoch stability window. These two sets
+are arranged as arithmetic sequences.
+
+This policy is motivated by the following observations:
+
+  - We can't rollback for more than `k = epochStability` blocks in the past
+  - It is pretty fast to re-sync a few hundred blocks
+  - Small rollbacks near the tip may occur more often than long ones
+
+Hence, we should strive to
+
+- Prune any checkpoint that are more than `k` blocks in the past
+- Keep only one checkpoint every `largeGap` ~100 blocks
+- But still keep ~10 most recent checkpoints to cope with small rollbacks.
+
+Roughly, the 'sparseArithmetic' 
+
+0 ..... N*largeGap .... (N+1)*largeGap .. .. M*smallGap (M+1)*smallGap tip
+        |_______________________________________________________________|
+                 epochStability
+
+Note: In the event where chain following "fails completely" (because, for
+example, the node has switch to a different chain, different by more than `k`),
+we have no choice but rolling back from genesis.
+Therefore, we need to keep the very first checkpoint in the database, no
+matter what.
+-}
+sparseArithmetic :: BlockHeight -> CheckpointPolicy
+sparseArithmetic epochStability =
+    atGenesis
+    <> atTip
+    <> trailingArithmetic 10 1
+    <> trailingArithmetic n largeGap
+  where
+    largeGap = gapSize epochStability
+    n = epochStability `div` largeGap
+
+{- | A reasonable gap size used internally in 'sparseArithmeticPolicy'.
+
+'Reasonable' means that it's not _too frequent_ and it's not too large. A
+value that is too small in front of k would require generating much more
+checkpoints than necessary.
+
+A value that is larger than `k` may have dramatic consequences in case of
+deep rollbacks.
+
+As a middle ground, we current choose `k / 3`, which is justified by:
+
+- The current bandwidth of the network layer (several thousands blocks per seconds)
+- The current value of k = 2160
+
+So, `k / 3` = 720, which corresponds to around a second of time needed to catch
+up in case of large rollbacks (if our local node has caught up already).
+-}
+gapSize :: BlockHeight -> Integer
+gapSize epochStability = max 1 (epochStability  `div` 3)
diff --git a/lib/core/test/unit/Cardano/Wallet/Checkpoints/PolicySpec.hs b/lib/core/test/unit/Cardano/Wallet/Checkpoints/PolicySpec.hs
@@ -0,0 +1,156 @@
+{-# LANGUAGE DataKinds #-}
+module Cardano.Wallet.Checkpoints.PolicySpec
+    ( spec
+    ) where
+
+import Prelude
+
+import Cardano.Wallet.Checkpoints.Policy
+    ( BlockHeight, CheckpointPolicy, nextCheckpoint, toListAtTip )
+import Test.Hspec
+    ( Spec, describe, it )
+import Test.QuickCheck
+    ( Arbitrary (..)
+    , Gen
+    , NonNegative (..)
+    , Property
+    , choose
+    , elements
+    , forAll
+    , frequency
+    , oneof
+    , property
+    , (===)
+    )
+
+import qualified Cardano.Wallet.Checkpoints.Policy as CP
+import qualified Data.Set as Set
+
+spec :: Spec
+spec = do
+    describe "general laws, tested on trailingArithmetic" $
+        it "(<>) is union of checkpoints" $
+            property prop_mappendOnTrailing
+
+    describe "general laws, tested on sparseArithmetic" $ do
+        it "prop_monotonicHeight — next heights do not fluctuate" $
+            property $ \ctx@(GenHeightContext epochStability tip) ->
+                forAll (genBlockHeight ctx) $ \height ->
+                    prop_monotonicHeight
+                        (CP.sparseArithmetic epochStability)
+                        tip
+                        height
+
+        it "prop_monotonicTip — heights become sparser as tip increases" $
+            property $ \ctx@(GenHeightContext epochStability tip) ->
+                forAll (genBlockHeight ctx) $ \height ->
+                    prop_monotonicTip
+                        (CP.sparseArithmetic epochStability)
+                        tip
+                        height
+
+    describe "specific policies" $ do
+        it "atGenesis <> atTip has exactly two checkpoints" $
+            property $ \(GenHeightContext _ tip) ->
+                CP.toListAtTip (CP.atGenesis <> CP.atTip) tip == [0,tip]
+
+        it "trailingArithmetic n _  has at most n checkpoints" $
+            property prop_trailingLength
+
+        it "trailingArithmetic checkpoints are located at grid points" $
+            property prop_trailingGrid
+
+        it "sparseArithmetic checkpoints after genesis are close to tip" $
+            property $ \(GenHeightContext epochStability tip) ->
+                maybe False (>= tip - 2*epochStability - 20) $
+                    nextCheckpoint (CP.sparseArithmetic epochStability) tip 1
+
+{-------------------------------------------------------------------------------
+    Properties, general
+-------------------------------------------------------------------------------}
+-- | Internal invariant.
+prop_monotonicHeight
+    :: CheckpointPolicy
+    -> BlockHeight -> BlockHeight
+    -> Property
+prop_monotonicHeight policy tip h1 = case nextCheckpoint policy tip h1 of
+    Nothing -> forAll ((h1 +) <$> genNonNegative) $ \h ->
+        nextCheckpoint policy tip h === Nothing
+    Just h2 -> forAll (genInterval (h1,h2)) $ \h ->
+        nextCheckpoint policy tip h === Just h2
+
+-- | Internal invariant.
+prop_monotonicTip
+    :: CheckpointPolicy
+    -> BlockHeight -> BlockHeight
+    -> Property
+prop_monotonicTip policy tip height = case nextCheckpoint policy tip height of
+    Nothing -> property True
+    Just height1 -> forAll ((tip +) <$> genNonNegative) $ \tip2 ->
+        case nextCheckpoint policy tip2 height of
+            Nothing -> property False
+            Just height2 -> property $ height1 <= height2
+
+prop_mappendOnTrailing :: Property
+prop_mappendOnTrailing = let g17 = choose (1,7) in
+    forAll g17 $ \n1 -> forAll g17 $ \gap1 ->
+    forAll g17 $ \n2 -> forAll g17 $ \gap2 ->
+        let policy1 = CP.trailingArithmetic n1 gap1
+            policy2 = CP.trailingArithmetic n2 gap2
+            tip     = 100
+        in      Set.fromList (toListAtTip policy1 tip)
+                <> Set.fromList (toListAtTip policy2 tip)
+            ===
+                Set.fromList (toListAtTip (policy1 <> policy2) tip)
+
+{-------------------------------------------------------------------------------
+    Properties, specific
+-------------------------------------------------------------------------------}
+prop_trailingLength :: GenHeightContext -> Property
+prop_trailingLength (GenHeightContext gap tip) =
+    forAll (choose (0,5)) $ \n ->
+        fromIntegral n >= length (toListAtTip (CP.trailingArithmetic n gap) tip)
+
+prop_trailingGrid :: GenHeightContext -> Property
+prop_trailingGrid (GenHeightContext gap tip) =
+    forAll (choose (1,7)) $ \n ->
+        all (`divisibleBy` gap) $ toListAtTip (CP.trailingArithmetic n gap) tip
+  where
+    a `divisibleBy` b = a `mod` b == 0
+
+{-------------------------------------------------------------------------------
+    Generators
+-------------------------------------------------------------------------------}
+-- | Data type for generating sensible blockchain heights
+data GenHeightContext = GenHeightContext
+    { _epochStability :: BlockHeight
+    , _tip :: BlockHeight
+    } deriving Show
+
+instance Arbitrary GenHeightContext where
+    arbitrary = do
+        es <- max 1 <$> oneof
+            [choose (0,1000000), elements [1,3,10,30,100,300,1000] ]
+        tip <- max 1 <$> oneof
+            [choose (0,1000000), choose (0,10), choose (es-200,es+200)]
+        pure $ GenHeightContext es tip
+
+-- | Generate 'Integers' from an interval, skewed towards the ends.
+genInterval :: (Integer, Integer) -> Gen Integer
+genInterval (a,b) = clip <$> frequency
+    [ (1, choose (a,a+3))
+    , (1, choose (b,b+3))
+    , (5, choose (a,b))
+    ]
+  where
+    clip = max a . min b
+
+genNonNegative :: Gen Integer
+genNonNegative = getNonNegative <$> arbitrary
+
+genBlockHeight :: GenHeightContext -> Gen BlockHeight
+genBlockHeight (GenHeightContext epochStability tip) = oneof
+    [ genNonNegative
+    , choose (0,10)
+    , choose (tip-2*epochStability,tip+2*epochStability)
+    ]