Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Epoch boundary fix #1707

Merged
merged 2 commits into from
Jul 31, 2020
Merged

Epoch boundary fix #1707

merged 2 commits into from
Jul 31, 2020

Conversation

TimSheard
Copy link
Contributor

Tries to fix a bottle neck at epoch boundaries. Two small changes. Replaces aggregateOuts (EpochBoundary.hs) and
pulls the computation of a constant out of the inner loop of a list comprehension (Rewards.hs)

Map.fromListWith (+) (map (\(_, TxOut a c) -> (a, c)) $ Map.toList u)
-- Map.fromListWith (+) (map (\(_, TxOut a c) -> (a, c)) $ Map.toList u)
Map.foldr accum Map.empty u
where accum (TxOut a c) ans = Map.insertWith (+) a c ans
Copy link
Contributor

@mrBliss mrBliss Jul 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you're no longer using aggregateOuts at all, so the comment below may no longer be relevant:

When looking at the profiteur output (attached) of cardano-node doing a full sync with MC1, you can drill down on the top left corner until you get to aggregateOuts (38.6% of the total time), when you go deeper, you'll see that 24.9% of the total time is spent in deserialiseAddr. That's because pattern matching on a TxOut, which is a pattern synonym for a TxOutCompact, deserialises the address (from a ShortByteString):

pattern TxOut ::
  Crypto crypto =>
  Addr crypto ->
  Coin ->
  TxOut crypto
pattern TxOut addr coin <-
  (viewCompactTxOut -> (addr, coin))
  where
    TxOut addr (Coin coin) =
      TxOutCompact (BSS.toShort $ serialiseAddr addr) (fromIntegral coin)

viewCompactTxOut :: forall crypto. Crypto crypto => TxOut crypto -> (Addr crypto, Coin)
viewCompactTxOut (TxOutCompact bs c) = (addr, coin)
  where
    addr = case deserialiseAddr (BSS.fromShort bs) of
      Nothing -> panic "viewCompactTxOut: impossible"
      Just (a :: Addr crypto) -> a
    coin = word64ToCoin c

Idea: wouldn't it be more efficient to keep using the ShortByteStrings as keys in the map while doing Map.insertWith (+) and afterwards converting the keys in the final map from ShortByteStrings to Addrs?

cardano-node.prof.html.gz

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an excellent observation. But the problem really has two components. The observation is that
(TxOut addr (Coin coin)) is a pattern and the pattern has to deserializes the whole address,
to get the part of the address that holds the credential. It turns out there is a second problem that
that makes this really slow. The 'addr' is stored as a ShortByteString, and that has to be converted
to a ByteString before it can be deserialized. The solution is to deserialize directly from the ShortByteString, and only look
at the Credential part of the address. We implemented this, and created some benchmarks, The
new code is 10x faster than the old code.

case Map.lookup p ptrs of
Just cred -> (cred,c):ans
Nothing -> ans
accum _other ans = ans
Copy link
Contributor

@mrBliss mrBliss Jul 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So here we skip the AddrBootstrap case:

data Addr crypto
  = Addr !Network !(PaymentCredential crypto) !(StakeReference crypto)
  | AddrBootstrap !(BootstrapAddress crypto)

All translated Byron addresses are AddrBootstrap's, so (at least in the beginning) the majority of all Addrs will be AddrBootstraps. In other words, we're wasting most of our time deserialising (see my other comment about deserialiseAddr) Byron addresses that we'll ignore anyway.

Idea: in accum:

  1. Pattern match on the TxOutCompact constructor to avoid calling deserialiseAddr unconditionally for each address (ShortByteString).
  2. Write a function deserialiseStakeReference (or a better name) with as type: ByteString -> Maybe (StakeReference crypto) which returns Nothing for AddrBootstrap (just test a bit to see whether it's Byron and then skip it) and only deserialises the StakeReference in case it's a Shelley Addr (you could deserialise the Network and PaymentCredential as well if needed elsewhere). Maybe throw in an Either there, because we're conflating "deserialisation failure" with "not a Shelley Addr".
  3. Pattern match on the result of calling deserialiseStakeReference on the ShortByteString argument to TxOutCompact to get the same three cases you currently have in accum.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So all the above comments have been explored. I wrote a dozen different algorithms to try and make aggregateOuts (and its successors) faster. All of them were benchmarked and the fasted were wriiten into the latest version. We ran all the tests and all the benchmarks, every thig is working.

@TimSheard TimSheard requested a review from nc6 as a code owner July 30, 2020 23:34
@mrBliss
Copy link
Contributor

mrBliss commented Jul 31, 2020

I think something went wrong rebasing this branch (was master merged in this branch?). It makes it hard to look at the actual changes of this PR.

Jared Corduan added 2 commits July 31, 2020 13:56
…places aggregateOuts (EpochBoundary.hs) and

pulls the computation of a constant out of the inner loop of a list comprehension (Rewards.hs)

(Really from Tim Sheard)
Copy link
Contributor

@JaredCorduan JaredCorduan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@JaredCorduan JaredCorduan merged commit 963c801 into master Jul 31, 2020
@iohk-bors iohk-bors bot deleted the epochBoundaryFix branch July 31, 2020 19:55
@JaredCorduan
Copy link
Contributor

@mrBliss I'm sorry if I merged this before you had a chance to review after I cleaned up the PR.

@@ -880,7 +903,7 @@ data Query k v where
-- ======================================================================================

smart :: Bool
smart = True -- for debugging purposes, this can be set to False, in which case no rewrites occurr.
smart = False -- True -- for debugging purposes, this can be set to False, in which case no rewrites occurr.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intentional? This change is still there in master.

mRewards =
Map.fromList
[ ( hk,
memberRew poolR pool (StakeShare (fromIntegral c % tot)) (StakeShare sigma)
)
| (hk, Coin c) <- Map.toList stake,
hk `Set.notMember` (KeyHashObj `Set.map` _poolOwners pool)
hk `Set.notMember` poolHashes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we avoid this O(n) operation entirely:

poolHashes = (KeyHashObj `Set.map` _poolOwners pool)

We just need to check if kh is in poolOwners pool, something like

notPoolOwner (KeyHashObj hk) = hk `Set.notMember` _poolOwners pool
notPoolOwner (ScriptHashObj hk) = False

right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, notPoolOwner is all we need

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants