
Make persistence incremental thread-safe for Network Reliability layer #1211

Merged 13 commits into master on Dec 16, 2023

Conversation

abailly-iohk
Contributor

@abailly-iohk abailly-iohk commented Dec 13, 2023

This PR attempts to fix #1202 by removing concurrent loading and appending of messages, and by using an in-memory cache to resend messages. This is achieved by preventing loading from a different thread in PersistenceIncremental once we have started appending to it.

This is a bit lame, and there are actually more problems lurking in this persistence layer, as we cannot guarantee the vector clock indices are always in sync with the messages we persist. We should start thinking about moving to a persistence backend where we can guarantee atomicity of updates to the vector clock and the persisted message.
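The guard described above can be sketched roughly as follows. This is a minimal illustration, not the actual Hydra API: the names `guarded`, `PersistenceIncremental`, and the boolean flag are simplifications (the real implementation may track things differently), but it shows the idea of refusing `loadAll` once `append` has been used.

```haskell
-- Minimal sketch (hypothetical names; not the actual Hydra API) of
-- guarding a persistence handle so that loading after appending has
-- started throws an IncorrectAccessException.
import Control.Exception (Exception, throwIO, try)
import Data.IORef

data PersistenceException = IncorrectAccessException
  deriving (Show)

instance Exception PersistenceException

data PersistenceIncremental a = PersistenceIncremental
  { append  :: a -> IO ()
  , loadAll :: IO [a]
  }

-- Wrap a handle: once 'append' has been used, any later 'loadAll' throws.
guarded :: PersistenceIncremental a -> IO (PersistenceIncremental a)
guarded PersistenceIncremental{append = doAppend, loadAll = doLoadAll} = do
  appendedRef <- newIORef False
  pure PersistenceIncremental
    { append = \x -> writeIORef appendedRef True >> doAppend x
    , loadAll = do
        appended <- readIORef appendedRef
        if appended
          then throwIO IncorrectAccessException
          else doLoadAll
    }
```

With such a wrapper, a concurrent loader racing an appender (as in the spec excerpt further down) deterministically surfaces an `IncorrectAccessException` instead of silently reading a half-written file.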


  • CHANGELOG updated or not needed
  • Documentation updated or not needed
  • Haddocks updated or not needed
  • No new TODOs introduced or explained hereafter


github-actions bot commented Dec 13, 2023

Transactions Costs

Sizes and execution budgets for Hydra protocol transactions. Note that unlisted parameters currently use arbitrary values, so results are not fully deterministic or comparable to previous runs.

Metadata

| Parameter | Value |
|---|---|
| Generated at | 2023-12-15 10:47:41.092821605 UTC |
| Max. memory units | 14000000 |
| Max. CPU units | 10000000000 |
| Max. tx size (kB) | 16384 |

Script summary

| Name | Hash | Size (Bytes) |
|---|---|---|
| νInitial | 985245919fcc6c0c5cd116023cd2c947c43e80dcbb5075fe12433fbb | 4072 |
| νCommit | 7cb20fa71eb4c563ca283566ebe0aa65859d96c3f8cba35c52c181fd | 2043 |
| νHead | 7a36661f5c15e9f1783aeaab890812c59b7286cbbc6de762d3110772 | 8816 |
| μHead | 8b111ac12274e46314769295a1c5dcab1d260096fc469fd698065463* | 3851 |

  • The minting policy hash is only usable for comparison. As the script is parameterized, the actual script is unique per Head.

Cost of Init Transaction

| Parties | Tx size | % max Mem | % max CPU | Min fee ₳ |
|---|---|---|---|---|
| 1 | 4374 | 10.48 | 4.05 | 0.46 |
| 2 | 4575 | 12.47 | 4.79 | 0.49 |
| 3 | 4774 | 14.87 | 5.71 | 0.52 |
| 5 | 5183 | 19.12 | 7.31 | 0.59 |
| 10 | 6184 | 30.20 | 11.51 | 0.75 |
| 41 | 12418 | 98.92 | 37.55 | 1.77 |

Cost of Commit Transaction

This uses ada-only outputs for better comparability.

| UTxO | Tx size | % max Mem | % max CPU | Min fee ₳ |
|---|---|---|---|---|
| 1 | 534 | 11.37 | 4.44 | 0.30 |
| 2 | 724 | 15.04 | 6.07 | 0.35 |
| 3 | 908 | 18.85 | 7.75 | 0.40 |
| 5 | 1285 | 26.90 | 11.27 | 0.51 |
| 10 | 2211 | 49.55 | 20.97 | 0.80 |
| 19 | 3905 | 99.43 | 41.75 | 1.43 |

Cost of CollectCom Transaction

| Parties | UTxO (bytes) | Tx size | % max Mem | % max CPU | Min fee ₳ |
|---|---|---|---|---|---|
| 1 | 57 | 480 | 21.78 | 8.51 | 0.41 |
| 2 | 114 | 594 | 34.08 | 13.42 | 0.55 |
| 3 | 170 | 704 | 45.93 | 18.28 | 0.68 |
| 4 | 225 | 810 | 62.12 | 24.85 | 0.87 |
| 5 | 283 | 920 | 82.43 | 33.02 | 1.09 |
| 6 | 339 | 1031 | 95.16 | 38.53 | 1.24 |

Cost of Close Transaction

| Parties | Tx size | % max Mem | % max CPU | Min fee ₳ |
|---|---|---|---|---|
| 1 | 551 | 16.64 | 7.66 | 0.36 |
| 2 | 728 | 18.45 | 9.31 | 0.40 |
| 3 | 887 | 19.81 | 10.63 | 0.43 |
| 5 | 1318 | 23.85 | 14.26 | 0.50 |
| 10 | 2065 | 31.71 | 21.62 | 0.65 |
| 50 | 8924 | 97.24 | 83.21 | 1.93 |

Cost of Contest Transaction

| Parties | Tx size | % max Mem | % max CPU | Min fee ₳ |
|---|---|---|---|---|
| 1 | 582 | 20.22 | 8.97 | 0.40 |
| 2 | 763 | 22.21 | 10.68 | 0.44 |
| 3 | 966 | 24.20 | 12.40 | 0.48 |
| 5 | 1237 | 27.83 | 15.47 | 0.54 |
| 10 | 2212 | 37.98 | 24.19 | 0.73 |
| 44 | 7848 | 99.59 | 77.87 | 1.86 |

Cost of Abort Transaction

Some variation occurs because of a random mixture of still-initial and already-committed outputs.

| Parties | Tx size | % max Mem | % max CPU | Min fee ₳ |
|---|---|---|---|---|
| 1 | 4222 | 15.14 | 6.24 | 0.50 |
| 2 | 4483 | 31.74 | 13.84 | 0.70 |
| 3 | 4674 | 47.58 | 20.89 | 0.89 |
| 4 | 4850 | 66.26 | 29.17 | 1.11 |
| 5 | 5013 | 86.75 | 38.33 | 1.35 |

Cost of FanOut Transaction

Involves spending head output and burning head tokens. Uses ada-only UTxO for better comparability.

| Parties | UTxO | UTxO (bytes) | Tx size | % max Mem | % max CPU | Min fee ₳ |
|---|---|---|---|---|---|---|
| 5 | 0 | 0 | 4205 | 7.92 | 3.32 | 0.42 |
| 5 | 1 | 57 | 4239 | 9.36 | 4.17 | 0.44 |
| 5 | 5 | 285 | 4375 | 14.19 | 7.18 | 0.51 |
| 5 | 10 | 570 | 4545 | 20.55 | 11.08 | 0.60 |
| 5 | 20 | 1136 | 4881 | 32.64 | 18.62 | 0.76 |
| 5 | 30 | 1708 | 5225 | 45.03 | 26.28 | 0.93 |
| 5 | 40 | 2277 | 5564 | 57.46 | 33.97 | 1.10 |
| 5 | 50 | 2847 | 5904 | 69.77 | 41.61 | 1.27 |
| 5 | 74 | 4212 | 6718 | 99.34 | 59.96 | 1.68 |

End-To-End Benchmark Results

This page collects the latest end-to-end benchmark results produced by Hydra's Continuous Integration system from the latest master code.

Please take these results with a grain of salt, as they are currently produced from very limited cloud VMs rather than controlled hardware. Instead of focusing on the absolute results, the emphasis should be on relative results, e.g. how the timings for a scenario evolve as the code changes.

Generated at 2023-12-15 10:50:09.935283054 UTC

Baseline Scenario

| Metric | Value |
|---|---|
| Number of nodes | 3 |
| Number of txs | 9000 |
| Avg. confirmation time (ms) | 23.084302383 |
| P99 | 112.53818575000017 ms |
| P95 | 32.08209404999999 ms |
| P50 | 20.4642205 ms |
| Number of invalid txs | 0 |

Baseline Scenario

| Metric | Value |
|---|---|
| Number of nodes | 1 |
| Number of txs | 3000 |
| Avg. confirmation time (ms) | 4.154921120 |
| P99 | 6.760193569999995 ms |
| P95 | 4.906416499999999 ms |
| P50 | 4.014542 ms |
| Number of invalid txs | 0 |

@abailly-iohk abailly-iohk force-pushed the abailly-iohk/thread-safe-reliability-persistence branch from 6bd29fa to a3e2aa6 Compare December 13, 2023 14:27

github-actions bot commented Dec 13, 2023

Test Results

378 tests (+1): 373 passed ✔️ (+1), 5 skipped 💤 (±0), 0 failed (±0)
128 suites (±0), 5 files (±0)
Duration: 21m 11s ⏱️ (−16s)

Results for commit 1b07b81. ± Comparison against base commit 5d1593a.

♻️ This comment has been updated with latest results.

@abailly-iohk abailly-iohk force-pushed the abailly-iohk/thread-safe-reliability-persistence branch from a3e2aa6 to 588a7a3 Compare December 13, 2023 15:25
@ch1bo ch1bo force-pushed the abailly-iohk/thread-safe-reliability-persistence branch from 588a7a3 to 63a1a83 Compare December 14, 2023 10:41
@ch1bo ch1bo requested a review from a team December 14, 2023 10:42
@ch1bo ch1bo force-pushed the abailly-iohk/thread-safe-reliability-persistence branch from 8853335 to 77d749b Compare December 14, 2023 17:29
Member

@ch1bo ch1bo left a comment


This should fix the bug and I added some minor nitpicks myself.

hydra-node/test/Hydra/Network/ReliabilitySpec.hs Outdated Show resolved Hide resolved
@ch1bo ch1bo requested a review from a team December 14, 2023 17:30
```haskell
(forever $ threadDelay 0.01 >> loadAll)
(forM_ moreItems $ \item -> append item >> threadDelay 0.01)
  `shouldThrow` \case
    IncorrectAccessException{} -> True
```
Contributor Author


Funnily enough, I had something like that initially, but I found giving a proper name to the `race_ ...` block more legible: it states explicitly what the block is supposed to do, instead of requiring the reader to unpack the whole thing.

abailly-iohk and others added 13 commits December 15, 2023 11:45
This small experiment demonstrates how to detect "race conditions" when an atomic result is expected: the idea is to use a TVar to simulate file operations and show the consequences of non-atomic writes and reads on message resending.

This test is not very satisfying, but for now I cannot come up with a better idea. It shows that with large messages (~10 kB) there are exceptions thrown but the peers can get stuck, which is a different problem than the one observed in production.
We try to improve the naming of the log messages' fields, make it more consistent between different messages, and replace field accessor functions and positional construction with proper field name punning.
We used to do that but then moved to always reloading from file, which led to race conditions when reading/writing concurrently and incorrect deserialisation of messages. It seems both safer and faster to use a local cache, and only reload messages at the start of the network layer.
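The cache approach from the last commit message above can be sketched as follows. All names here (`MessageStore`, `newMessageStore`, the `persist` callback) are illustrative, not Hydra's actual types: the point is that messages are loaded from disk once at startup, every append goes to both the file and an in-memory `TVar`, and resending reads only from the cache.

```haskell
-- Rough sketch (illustrative names, not Hydra's actual types) of keeping
-- an in-memory cache of persisted messages so that resending does not
-- re-read the file: load once at startup, append to both file and cache.
import Control.Concurrent.STM
import Data.Sequence ((|>))
import qualified Data.Sequence as Seq

data MessageStore msg = MessageStore
  { storeMessage :: msg -> IO ()          -- persist to file, then cache
  , getMessage   :: Int -> IO (Maybe msg) -- look up by 0-based index
  }

-- 'initial' is what was loaded from disk at startup; 'persist' stands in
-- for the real file-appending action.
newMessageStore :: [msg] -> (msg -> IO ()) -> IO (MessageStore msg)
newMessageStore initial persist = do
  cache <- newTVarIO (Seq.fromList initial)
  pure MessageStore
    { storeMessage = \m -> do
        persist m
        atomically $ modifyTVar' cache (|> m)
    , getMessage = \i -> do
        msgs <- readTVarIO cache
        pure (Seq.lookup i msgs)
    }
```

Because resends never touch the file, the read/write race on the persistence file disappears by construction, at the cost of keeping all messages of the current session in memory.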
@ch1bo ch1bo force-pushed the abailly-iohk/thread-safe-reliability-persistence branch from 77d749b to 1b07b81 Compare December 15, 2023 10:45
@v0d1ch v0d1ch merged commit 124156f into master Dec 16, 2023
21 checks passed
@v0d1ch v0d1ch deleted the abailly-iohk/thread-safe-reliability-persistence branch December 16, 2023 09:35

Successfully merging this pull request may close these issues.

Node fails to broadcast AckSn / PersistenceException
3 participants