Provide some tests demonstrating the resilience of a Hydra Head cluster #1106

Open
abailly-iohk opened this issue Oct 10, 2023 · 0 comments
Labels: green 💚 (Low complexity or well understood feature), L2 (Affect off-chain part of the Head protocol/network), network, task

Why

We have been working on improving the resilience of a Hydra Head cluster so that transient failures, whether from the network connection, a node crashing and recovering, or temporary partitions, do not leave the Head stuck and force an expensive closing and reopening of the Head.

We know there are still corner cases which are not covered, but they seem very unlikely to occur, and as this kind of issue does not breach the intrinsic safety of a Head, we think there's not much value in closing those gaps.

However:

  • We may be wrong about how frequently those issues occur: they may happen more easily than we think.
  • We don't have any concrete evaluation of how resilient a Hydra cluster is today with all those changes.

What

Implement some "Chaos monkey" tests, possibly manual or semi-automated, that demonstrate whether a Hydra Head can survive transient random crashes, network partitions, connection drops, etc.
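
To make the idea more concrete, here is a minimal sketch of how a reproducible fault schedule could be generated for such a test. Everything in it is an assumption made for illustration: the `Fault` type, the `fault_schedule` function, the fault kinds and the node names are hypothetical and do not exist in the Hydra code base. It only decides which fault hits which node and when; injecting the faults is left to the test harness.

```python
import random
from dataclasses import dataclass

# Hypothetical fault kinds, taken from the failure modes named in this issue.
FAULT_KINDS = ("crash", "partition", "connection_drop")


@dataclass(frozen=True)
class Fault:
    start: float     # seconds since the start of the run
    duration: float  # how long the fault lasts before it is healed
    kind: str        # one of FAULT_KINDS
    node: str        # name of the affected node (illustrative)


def fault_schedule(nodes, run_seconds, count, max_duration, seed):
    """Return `count` transient faults, sorted by start time.

    Assumes 1.0 <= max_duration <= run_seconds. The same seed always
    yields the same schedule, so a run that leaves the Head stuck can
    be replayed exactly.
    """
    rng = random.Random(seed)
    faults = []
    for _ in range(count):
        duration = rng.uniform(1.0, max_duration)
        # Start early enough that the fault is healed before the run ends,
        # which keeps every fault transient.
        start = rng.uniform(0.0, run_seconds - duration)
        kind = rng.choice(FAULT_KINDS)
        node = rng.choice(nodes)
        faults.append(Fault(start, duration, kind, node))
    return sorted(faults, key=lambda f: f.start)


if __name__ == "__main__":
    # Illustrative node names and timings only.
    schedule = fault_schedule(["alice", "bob", "carol"], 300.0, 5, 30.0, seed=42)
    for fault in schedule:
        print(f"{fault.start:7.1f}s  {fault.kind:<16} {fault.node:<6} for {fault.duration:.1f}s")
```

Seeding the generator is the main design choice here: a random exploration is only useful if a failing schedule can be rerun while investigating it.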

How

  • We could perhaps reuse Jepsen, although it's a bit complicated to set up.
  • We did some manual tests using iptables to drop connections and manually killing nodes, but we would like a more systematic exploration to improve coverage.
  • The hydra-cluster benchmarks would be a good basis, e.g. we don't care about L1 connectivity and could even use a mock L1.
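
As a starting point for turning the manual iptables tests into something scriptable, the following sketch builds the commands for a temporary connection drop towards one peer. It is an assumption about what such a helper could look like, not a record of the commands used in the manual tests: `drop_rule` and `partition_commands` are hypothetical names, and the peer address and port are placeholders. The script only prints the commands (a dry run), since applying them requires root.

```python
import shlex


def drop_rule(action, peer_ip, peer_port):
    """Build an iptables command that appends ("-A") or deletes ("-D")
    a rule dropping outgoing TCP traffic to one peer."""
    if action not in ("-A", "-D"):
        raise ValueError("action must be -A (append) or -D (delete)")
    return ["iptables", action, "OUTPUT", "-p", "tcp",
            "-d", peer_ip, "--dport", str(peer_port), "-j", "DROP"]


def partition_commands(peer_ip, peer_port, seconds):
    """Commands that cut the connection to a peer, wait, then restore it."""
    return [
        drop_rule("-A", peer_ip, peer_port),
        ["sleep", str(seconds)],
        drop_rule("-D", peer_ip, peer_port),
    ]


if __name__ == "__main__":
    # Placeholder peer address and port, for illustration only.
    for command in partition_commands("10.0.0.2", 5001, 10):
        print(shlex.join(command))
```

Keeping the commands as argument lists means a harness could later pass them unchanged to `subprocess.run` instead of printing them.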