Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a ledger operations benchmarking analysis to db-analyser #4014

Merged
merged 1 commit into from
Dec 14, 2022

Conversation

nfrisby
Copy link
Contributor

@nfrisby nfrisby commented Sep 19, 2022

Add support to db-analyser for benchmarking the five major ledger calculations for each block:

  1. Forecast.
  2. Header tick.
  3. Header application.
  4. Block tick.
  5. Block application.

We focus on these 5 operations because they are involved in:

  • Chain syncing.
  • Block forging.
  • Block validation.

Fixes #4052

@nfrisby nfrisby added the consensus issues related to ouroboros-consensus label Sep 19, 2022
@nfrisby
Copy link
Contributor Author

nfrisby commented Sep 19, 2022

I invoke the tool like this:

$(cabal list-bin exe:db-analyser) --db state-node-mainnet/db-mainnet/ --only-immutable-db cardano --config mainnet-config.json --analyse-from 42942407 --benchmark-ledger-ops 2>/dev/null 1>db-analyser.log

and then feed it to this Gnuplot script

set terminal pngcairo size 1600,800
set border back
set output "bench-ledger-ops.png"

set xlabel "slot"
set ylabel "dur (μs)"

set multiplot layout 2,1

plot 'db-analyser.log' using 1:3 with impulses title "both",\
     'db-analyser.log' using 1:4 title "mut",\
     'db-analyser.log' using 1:5 title "gc"

# draw a vertical line at each every Shelley boundary
do for [sl in "4492800 4924800 5356800 5788800 6220800 6652800 7084800 7516800 7948800 8380800 8812800 9244800 9676800 10108800 10540800 10972800 11404800 11836800 12268800 12700800 13132800 13564800 13996800 14428800 14860800 15292800 15724800 16156800 16588800 17020800 17452800 17884800 18316800 18748800 19180800 19612800 20044800 20476800 20908800 21340800 21772800 22204800 22636800 23068800 23500800 23932800 24364800 24796800 25228800 25660800 26092800 26524800 26956800 27388800 27820800 28252800 28684800 29116800 29548800 29980800 30412800 30844800 31276800 31708800 32140800 32572800 33004800 33436800 33868800 34300800 34732800 35164800 35596800 36028800 36460800 36892800 37324800 37756800 38188800 38620800 39052800 39484800 39916800 40348800 40780800 41212800 41644800 42076800 42508800 42940800 43372800 43804800 44236800 44668800 45100800 45532800 45964800 46396800 46828800 47260800 47692800 48124800 48556800 48988800 49420800 49852800 50284800 50716800 51148800 51580800 52012800 52444800 52876800 53308800 53740800 54172800 54604800 55036800 55468800 55900800 56332800 56764800 57196800 57628800 58060800 58492800 58924800 59356800 59788800 60220800 60652800 61084800 61516800 61948800 62380800 62812800 63244800 63676800 64108800 64540800 64972800 65404800 65836800 66268800 66700800 67132800 67564800 67996800 68428800 68860800 69292800 69724800 70156800 70588800 71020800 71452800 71884800 72316800 72748800 73180800 73612800 74044800 74476800 74908800 75340800 75772800 76204800 76636800 77068800 77500800 77932800 78364800 78796800 79228800 79660800 80092800 80524800 80956800 81388800 81820800 82252800 82684800 83116800 83548800 83980800 84412800 84844800 85276800 85708800 86140800 86572800 87004800 87436800"] {
  set arrow from sl, screen 0.5 to sl, screen 0.6 nohead ls 28
  set arrow from sl, screen 0 to sl, screen 0.1 nohead ls 28
}

# draw a heavier line for the epoch transitions that are also HFC era transitions
do for [sl in "4492800 16588800 23068800 39916800"] {
  set arrow from sl, screen 0.5 to sl, screen 0.575 nohead ls 28 linewidth 5
  set arrow from sl, screen 0.0 to sl, screen 0.075 nohead ls 28 linewidth 5
}

plot 'db-analyser.log' using 1:4 with impulses title "mut",\
     'db-analyser.log' using 1:7 title "forcst",\
     'db-analyser.log' using 1:8 title "hdrTik",\
     'db-analyser.log' using 1:9 title "hdrApp",\
     'db-analyser.log' using 1:10 title "blkTik",\
     'db-analyser.log' using 1:11 title "blkApp" 

unset multiplot


This time, I interrupted the tool after a few minutes, so the tool didn't get through very many slots. It yielded the following chart. I used the commits from the cabal.project of cardano-node as of commit 36b5f661e (a recent master).

The x-axis is the slot number of the block, the y-axis is duration in microseconds. The extra-tall orange ticks demarcate the first slot of each epoch. Legend semantics:

https://github.com/input-output-hk/ouroboros-network/blob/3961b4d6bce2b90a95df22fd0226e2873638a758/ouroboros-consensus-cardano/tools/db-analyser/Analysis.hs#L515-L548

bench-ledger-ops


That top chart shows both significant mutator and GC time elapsed at the first block of each epoch (these epochs were still d>0, so the first slot was always filled by an overlay schedule block).

The second chart shows how that mutator time decomposes: mostly block ticking (not surprising) and header application (surprising!). I haven't checked yet, but strongly anticipate that this bang pattern isn't sufficient to force the entire forecast operation, and so significant portions of the duration of the forecast computation are instead being forced by (and so attributed to) the header application computation. (Edit: see PR #4028 -- my suspicion was correct and the plots in later comments on this PR used that patch)

https://github.com/input-output-hk/ouroboros-network/blob/3961b4d6bce2b90a95df22fd0226e2873638a758/ouroboros-consensus-cardano/tools/db-analyser/Analysis.hs#L519

@nfrisby
Copy link
Contributor Author

nfrisby commented Sep 20, 2022

bench-ledger-ops

@nfrisby
Copy link
Contributor Author

nfrisby commented Sep 21, 2022

Here's the entire history. The only thing you might add to this plot is the Byron epoch delimiters---we chose not too because they'd be very dense.

bench-ledger-ops

@nfrisby
Copy link
Contributor Author

nfrisby commented Sep 21, 2022

And the same plot, for just the youngest 100k blocks.
bench-ledger-ops-most-recent-100k-blocks

@nfrisby
Copy link
Contributor Author

nfrisby commented Sep 21, 2022

And finally for just the youngest 500 blocks -- note that these happen to not include a spike, so the y-axis is range is much smaller.

bench-ledger-ops-most-recent-500-blocks

@nfrisby
Copy link
Contributor Author

nfrisby commented Sep 21, 2022

Here's my latest gnuplot script, which I invoke like this nix-shell -p gnuplot --run "gnuplot -e \"benchmarkLedgerOpsLogFilePath='db-analyser7.log'\" gp.scr"

set terminal pngcairo size 1600,800
set border back
set output "bench-ledger-ops.png"

set xlabel "slot"
set ylabel "dur (μs)"

set multiplot layout 2,1

plot benchmarkLedgerOpsLogFilePath using 1:3 with impulses title "both",\
     benchmarkLedgerOpsLogFilePath using 1:4 title "mut",\
     benchmarkLedgerOpsLogFilePath using 1:5 title "gc"

# draw a vertical line at each every Shelley boundary
do for [sl in "4492800 4924800 5356800 5788800 6220800 6652800 7084800 7516800 7948800 8380800 8812800 9244800 9676800 10108800 10540800 10972800 11404800 11836800 12268800 12700800 13132800 13564800 13996800 14428800 14860800 15292800 15724800 16156800 16588800 17020800 17452800 17884800 18316800 18748800 19180800 19612800 20044800 20476800 20908800 21340800 21772800 22204800 22636800 23068800 23500800 23932800 24364800 24796800 25228800 25660800 26092800 26524800 26956800 27388800 27820800 28252800 28684800 29116800 29548800 29980800 30412800 30844800 31276800 31708800 32140800 32572800 33004800 33436800 33868800 34300800 34732800 35164800 35596800 36028800 36460800 36892800 37324800 37756800 38188800 38620800 39052800 39484800 39916800 40348800 40780800 41212800 41644800 42076800 42508800 42940800 43372800 43804800 44236800 44668800 45100800 45532800 45964800 46396800 46828800 47260800 47692800 48124800 48556800 48988800 49420800 49852800 50284800 50716800 51148800 51580800 52012800 52444800 52876800 53308800 53740800 54172800 54604800 55036800 55468800 55900800 56332800 56764800 57196800 57628800 58060800 58492800 58924800 59356800 59788800 60220800 60652800 61084800 61516800 61948800 62380800 62812800 63244800 63676800 64108800 64540800 64972800 65404800 65836800 66268800 66700800 67132800 67564800 67996800 68428800 68860800 69292800 69724800 70156800 70588800 71020800 71452800 71884800 72316800 72748800 73180800 73612800 74044800 74476800 74908800 75340800 75772800 76204800 76636800 77068800 77500800 77932800 78364800 78796800 79228800 79660800 80092800 80524800 80956800 81388800 81820800 82252800 82684800 83116800 83548800 83980800 84412800 84844800 85276800 85708800 86140800 86572800 87004800 87436800"] {
  set arrow from sl, screen 0.5 to sl, screen 0.6 nohead ls 28
  set arrow from sl, screen 0.0 to sl, screen 0.1 nohead ls 28
}

# draw a heavier line for the epoch transitions that are also HFC era transitions
do for [sl in "4492800 16588800 23068800 39916800"] {
  set arrow from sl, screen 0.5 to sl, screen 0.575 nohead ls 28 linewidth 5
  set arrow from sl, screen 0.0 to sl, screen 0.075 nohead ls 28 linewidth 5
}

plot benchmarkLedgerOpsLogFilePath using 1:4 with impulses title "mut",\
     benchmarkLedgerOpsLogFilePath using 1:7 title "forcst",\
     benchmarkLedgerOpsLogFilePath using 1:8 title "hdrTik",\
     benchmarkLedgerOpsLogFilePath using 1:9 title "hdrApp",\
     benchmarkLedgerOpsLogFilePath using 1:10 title "blkTik",\
     benchmarkLedgerOpsLogFilePath using 1:11 title "blkApp" 

unset multiplot

# the first shelley epoch was 208, and it began in slot 4492800
# there are 10k/f = 2160 * 10 * 20 = 432000 slots in each Shelley epoch

# these are the first epoches of each ledger era
# (see https://github.com/input-output-hk/cardano-ledger/wiki/First-Block-of-Each-Era)
# Byron		0
# Shelley	208
# Allegra 	236
# Mary	 	251
# Alonzo 	290
# Alonzo' 	298

# these are the first slots of each Shelley epoch
# 208 4492800
# 209 4924800
# 210 5356800
# 211 5788800
# 212 6220800
# 213 6652800
# 214 7084800
# 215 7516800
# 216 7948800
# 217 8380800
# 218 8812800
# 219 9244800
# 220 9676800
# 221 10108800
# 222 10540800
# 223 10972800
# 224 11404800
# 225 11836800
# 226 12268800
# 227 12700800
# 228 13132800
# 229 13564800
# 230 13996800
# 231 14428800
# 232 14860800
# 233 15292800
# 234 15724800
# 235 16156800
# 236 16588800
# 237 17020800
# 238 17452800
# 239 17884800
# 240 18316800
# 241 18748800
# 242 19180800
# 243 19612800
# 244 20044800
# 245 20476800
# 246 20908800
# 247 21340800
# 248 21772800
# 249 22204800
# 250 22636800
# 251 23068800
# 252 23500800
# 253 23932800
# 254 24364800
# 255 24796800
# 256 25228800
# 257 25660800
# 258 26092800
# 259 26524800
# 260 26956800
# 261 27388800
# 262 27820800
# 263 28252800
# 264 28684800
# 265 29116800
# 266 29548800
# 267 29980800
# 268 30412800
# 269 30844800
# 270 31276800
# 271 31708800
# 272 32140800
# 273 32572800
# 274 33004800
# 275 33436800
# 276 33868800
# 277 34300800
# 278 34732800
# 279 35164800
# 280 35596800
# 281 36028800
# 282 36460800
# 283 36892800
# 284 37324800
# 285 37756800
# 286 38188800
# 287 38620800
# 288 39052800
# 289 39484800
# 290 39916800
# 291 40348800
# 292 40780800
# 293 41212800
# 294 41644800
# 295 42076800
# 296 42508800
# 297 42940800
# 298 43372800
# 299 43804800
# 300 44236800
# 301 44668800
# 302 45100800
# 303 45532800
# 304 45964800
# 305 46396800
# 306 46828800
# 307 47260800
# 308 47692800
# 309 48124800
# 310 48556800
# 311 48988800
# 312 49420800
# 313 49852800
# 314 50284800
# 315 50716800
# 316 51148800
# 317 51580800
# 318 52012800
# 319 52444800
# 320 52876800
# 321 53308800
# 322 53740800
# 323 54172800
# 324 54604800
# 325 55036800
# 326 55468800
# 327 55900800
# 328 56332800
# 329 56764800
# 330 57196800
# 331 57628800
# 332 58060800
# 333 58492800
# 334 58924800
# 335 59356800
# 336 59788800
# 337 60220800
# 338 60652800
# 339 61084800
# 340 61516800
# 341 61948800
# 342 62380800
# 343 62812800
# 344 63244800
# 345 63676800
# 346 64108800
# 347 64540800
# 348 64972800
# 349 65404800
# 350 65836800
# 351 66268800
# 352 66700800
# 353 67132800
# 354 67564800
# 355 67996800
# 356 68428800
# 357 68860800
# 358 69292800
# 359 69724800
# 360 70156800
# 361 70588800
# 362 71020800
# 363 71452800
# 364 71884800
# 365 72316800
# 366 72748800
# 367 73180800
# 368 73612800
# 369 74044800
# 370 74476800
# 371 74908800
# 372 75340800
# 373 75772800
# 374 76204800
# 375 76636800
# 376 77068800
# 377 77500800
# 378 77932800
# 379 78364800
# 380 78796800
# 381 79228800
# 382 79660800
# 383 80092800
# 384 80524800
# 385 80956800
# 386 81388800
# 387 81820800
# 388 82252800
# 389 82684800
# 390 83116800
# 391 83548800
# 392 83980800
# 393 84412800
# 394 84844800
# 395 85276800
# 396 85708800
# 397 86140800
# 398 86572800
# 399 87004800
# 400 87436800

@dnadales
Copy link
Member

dnadales commented Oct 3, 2022

Some observations from these results:

  • We cannot conclude anything about ledger vs consensus overhead from these benchmarks. We are not measuring the consensus overhead. I think measuring this requires more careful analysis. Maybe we should create a new issue.
  • We seem to have a strong indication that HFC does not introduce significant overhead, at least for the operations benchmarked here. Why? Because we do not see a discrete increase in the total time these operations take as we cross the epoch boundaries.

@dnadales
Copy link
Member

dnadales commented Oct 3, 2022

Possible ways to drive this PR home:

  • Incorporate the Gnuplot scripts into the repository. This requires that we adjust the plot boundaries based on the range of the input data.
  • Improve structure (if applicable).
  • Add more comments.
  • Use more descriptive variable names.
  • Document this feature (eg what the benchmark does, how to run it).
  • Decide where to merge this PR: we should avoid developing on a release branch, however merging this to master will probably render db-analyser unusable since we won't have ledger or consensus version compatible with the Chain DB that was created using a previous version. So we should re-consider moving this tool somewhere else (eg cardano-node or to a separate repository).

Example of a place were we could add more comments:

        -- why do we return this:
        pure $ St rtsStats5 $ ExtLedgerState ldgrSt' headerState'

We might consider adding some ASCII version of these drawings to the documentation

image
image

@nfrisby
Copy link
Contributor Author

nfrisby commented Oct 17, 2022

FWIW, my initial reaction was the same as Arnaud's:

None beside sticking to gnuplot 🙂 Yes the doc is not straightforward but no graphing library in Haskell I am aware of is on par anyway. I would be interested if there were an alternative in Haskell though 🙂
Alternatively, python as some great plotting libraries

I've also had pleasant experiences with some JS libraries (plop your data in a JSON array, copy it in to a HTML file, and boom you have interactive plots).

I would hesitate to commit to any plotting library in Haskell -- I expect it all to inevitably break 😢.

But that's easy enough: just also commit the gnuplot script as a back-up?

Are you liking the vega interface?

@dnadales
Copy link
Member

@nfrisby one of the next steps I had in mind was to output each line as a JSON representation of a SlotDataPoint, eg:

data SlotDataPoint =
    SlotDataPoint
      { slot      :: Int
      , totalTime :: Int
      , mut       :: Int
      , gc        :: Int
       -- ... 
      }

and this is the format that the new plotting and analyzing tool will ingest. In this way, we make sure that if we change the output data in the db-analyser we will detect this in the tools that ingest this data. We can convert this data back to a CSV, but as a separate step (for instance when converting the file to something gnuplot can understand).

@dnadales
Copy link
Member

I've also had pleasant experiences with some JS libraries (plop your data in a JSON array, copy it in to a HTML file, and boom you have interactive plots).

Yeah, we could ultimately do that with vega-lite.

I would hesitate to commit to any plotting library in Haskell -- I expect it all to inevitably break 😢.

What would "breaking" mean in this case? If for instance version 4, the currently supported version of vega-lite, is not distributed anymore by the CDN? In that case this could be a problem indeed. We could then spit out a vega-lite conforming JSON directly.

But that's easy enough: just also commit the gnuplot script as a back-up?

Sure thing.

Are you liking the vega interface?

I am. I looked at 4 different Haskell plotting libraries, and this is by far the easiest one to learn and use, albeit not being all the type-safe one would like, but I guess that's a trade off for usability?

@nfrisby how should I proceed:

  • Option 0: remove the vega-lite implementation and generate a gnu-plot script (where I would basically accommodate the epoch and era boundaries in the plot to cover only the data being plotted).
  • Option 1: keep the vega-lite implementation, and also generate the gnu-plot script.

@nfrisby
Copy link
Contributor Author

nfrisby commented Oct 18, 2022

Option 0: remove the vega-lite implementation and generate a gnu-plot script (where I would basically accommodate the epoch and era boundaries in the plot to cover only the data being plotted).

Why do you have to do that? Can you plot them all (like my gnuplot script does) and then set the x-axis bounds explicitly to automatically crop out the "extra" epoch/era boundaries?

Option 1: keep the vega-lite implementation, and also generate the gnu-plot script.

Why generate the gnuplot script? Can't a static script suffice? Or maybe one that takes gnuplot parameters (you can set variable names on the command-line invocation)?


I don't have a preference.

Tangent: The JSON format is a lot of per-line overhead, so that'll slow things down for longer runs and swell file sizes. Neither of which really matters, but I personally prefer the X-separated values format for tabular data.

@dnadales
Copy link
Member

Option 0: remove the vega-lite implementation and generate a gnu-plot script (where I would basically accommodate the epoch and era boundaries in the plot to cover only the data being plotted).

Why do you have to do that? Can you plot them all (like my gnuplot script does) and the set the x-axis bounds explicitly to automatically crop out the "extra" epoch/era boundaries?

Won't you end up with vertical lines outside the x-axis as we can see in some of the plots above? Is there a way to prevent this in gnuplot?

I don't have a preference.

In that case I'm going to keep the vega-lite backend just because it gives us the possibility of zooming in and the plots show a tooltip with the data values when the mouse pointer hovers over the data point.

Tangent: The JSON format is a lot of per-line overhead, so that'll slow things down for longer runs and swell file sizes. Neither of which really matters, but I personally prefer the X-separated values format for tabular data.

Good point. This is the problem I want to solve: I'd like to couple the plotting program with the benchmarking program in such a way that if the latter outputs the data in a different format (eg swapping the order of the columns), we can detect this discrepancy at compile time. I guess I could define a Read instance for SlotDataPoint, and in the benchmarking program collect the benchmarking data for a point as a SlotDataPoint value, and output it as a single line (as it is now).

@dnadales dnadales force-pushed the nfrisby/old-benchmark-ledger-ops branch from c6795e2 to 891d380 Compare October 22, 2022 12:12
@jasagredo
Copy link
Contributor

Related PR that removes the forecasting step in the forging logic #4099

iohk-bors bot added a commit that referenced this pull request Oct 31, 2022
4028: consensus: make TickHardForkLedgerView fields strict r=nfrisby a=nfrisby

While preparing PR #4014, I noticed the `forecast` function for `CardanoBlock` was lazy. This small patch makes it strict.

This is a Draft PR because I'm not totally sure we want it to be strict.

Co-authored-by: Nicolas Frisby <nick.frisby@iohk.io>
iohk-bors bot added a commit that referenced this pull request Nov 1, 2022
4028: consensus: make TickHardForkLedgerView fields strict r=nfrisby a=nfrisby

While preparing PR #4014, I noticed the `forecast` function for `CardanoBlock` was lazy. This small patch makes it strict.

This is a Draft PR because I'm not totally sure we want it to be strict.

Co-authored-by: Nicolas Frisby <nick.frisby@iohk.io>
@dnadales
Copy link
Member

@nfrisby and I agree it's best to merge this into master, so I'll do that once I get approval from @nfrisby

@dnadales dnadales marked this pull request as ready for review November 15, 2022 10:55
Copy link
Contributor Author

@nfrisby nfrisby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing major. Looks good!

I realized something, though: this PR is not aimed at the master branch. So, we should:

  • change the PR to target master (.... I'm not even sure that's possible, but I'll try)

  • rebase the commits onto master (which will be a bit of a bother since db-analyser was relocated to ouroboros-consensus-cardano-tools).

  • also, the PR description is stale.

@dnadales dnadales force-pushed the nfrisby/old-benchmark-ledger-ops branch 2 times, most recently from ab6e540 to 3affef0 Compare December 3, 2022 00:32
@dnadales dnadales requested review from coot, newhoggy, a team, bolt12 and njd42 as code owners December 3, 2022 00:32
@dnadales dnadales changed the base branch from nfrisby/root-for-old-benchmark-ledger-ops to master December 3, 2022 00:32
@dnadales dnadales changed the title Draft ledger ops benchmark in db-analyser Add a ledger operations benchmarking analysis to db-analyser Dec 8, 2022
Copy link
Contributor Author

@nfrisby nfrisby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Damian and had a couple calls on this. I Approve 🙌 (though he'll have to click that, since I opened the PR originally.)

In addition:

- Add a gnuplot script to plot the results of ledger ops benchmarking.
- This commit also creates the `script` directory in the root of the
  `db-analyser` directory. The plotting script is placed inside the
  former.
- Remove stale sentence in the README
- Provide examples of how to use db-analyser to store ledger
  db-snapshots and benchmark the ledger operations.
@dnadales dnadales force-pushed the nfrisby/old-benchmark-ledger-ops branch from 3aa1d9b to 74929d4 Compare December 13, 2022 17:13
@dnadales
Copy link
Member

bors r+

iohk-bors bot added a commit that referenced this pull request Dec 13, 2022
4014: Add a ledger operations benchmarking analysis to db-analyser r=dnadales a=nfrisby

Add support to `db-analyser` for benchmarking the  five major ledger calculations for each block:

0. Forecast.
1. Header tick.
2. Header application.
3. Block tick.
4. Block application.

We focus on these 5 operations because they are involved in:

- Chain syncing.
- Block forging.
- Block validation.


Fixes #4052 

Co-authored-by: Nicolas Frisby <nick.frisby@iohk.io>
@iohk-bors
Copy link
Contributor

iohk-bors bot commented Dec 13, 2022

Timed out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
consensus issues related to ouroboros-consensus
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement a benchmark for the ledger operations
4 participants