Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reduce use of global quorum config #2805

Merged

Conversation

taccatisid
Copy link
Contributor

PR description

The use of the org.hyperledger.besu.config.GoQuorumOptions static variable is ugly. In particular does changing its value during tests cause flaky tests. To ameliorate the issue this PR

  • removes unused code that did rely on GoQuorumOptions
  • removes access to GoQuorumOptions deep in the call stack by looking it up in the transaction validator instead
  • change the TransactionDecoder to explicitly take the goQuorumCompatbility flag as an argument where access to the transaction validator is not easy
  • rewriting all tests that did change the GoQuorumOptions by mocking the option change instead.

Signed-off-by: Taccat Isid taccatisid@protonmail.com

Fixed Issue(s)

fixes #2775.

Changelog

@taccatisid taccatisid force-pushed the 2275-remove-goquorum-global-flag branch 2 times, most recently from 8279e4e to 9eb1dcb Compare September 27, 2021 06:17
@taccatisid taccatisid force-pushed the 2275-remove-goquorum-global-flag branch from 9960c9a to 2591f11 Compare September 28, 2021 06:47
@taccatisid taccatisid marked this pull request as ready for review September 28, 2021 07:15
macfarla
macfarla previously approved these changes Sep 28, 2021
Copy link
Contributor

@macfarla macfarla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting rid of changing the value in tests should fix the flakiness. It's not perfect ie still having different ways of getting the flag value but it's better than it was.

@taccatisid
Copy link
Contributor Author

@shemnon Given that you raised the issue, do you want to have a look at this PR?

Copy link
Contributor

@shemnon shemnon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reduces the use of, but does not eliminate the static field configuration. While it reduces the window for test flakiness. And its presence still makes it possible that new flakiness could be introduced unless we are very vigilant in the side effects. I think there is still room to eliminate it by changing a few classes from static methods to instance classes provided via Dependency Injection.

@macfarla macfarla self-assigned this Oct 5, 2021
@taccatisid taccatisid force-pushed the 2275-remove-goquorum-global-flag branch from 445eb95 to b6dd25e Compare October 11, 2021 02:52
@macfarla macfarla dismissed their stale review October 11, 2021 04:29

changing approach

@taccatisid taccatisid force-pushed the 2275-remove-goquorum-global-flag branch from b6dd25e to c0cd715 Compare October 11, 2021 23:01
@macfarla macfarla requested a review from pinges October 12, 2021 00:21
@macfarla
Copy link
Contributor

The second commit that fully eradicates the flag touches a lot of files. I think I prefer the "mostly gone" version. Can we memoize it so once it's set it can't be changed - that would prevent accidental misuse?

@taccatisid
Copy link
Contributor Author

This reduces the use of, but does not eliminate the static field configuration. While it reduces the window for test flakiness. And its presence still makes it possible that new flakiness could be introduced unless we are very vigilant in the side effects. I think there is still room to eliminate it by changing a few classes from static methods to instance classes provided via Dependency Injection.

@shemnon could you have another superficial look at the updated PR? The code surface required to be touched turned out to be quite large. And in light of looming tickets like the separation of privacy into a plugin, we (revenant) feel that threading all those extra objects through the call stack is not an improvement of the codebase.

If we would instead just take the first of the two commits, and add a getter and a write once setter for the static goQuorumCompatibilityMode we would achieve the same safety effects with much less code churn.

@shemnon
Copy link
Contributor

shemnon commented Oct 12, 2021

The value of this bug is to go from n to zero. To go from n to some smaller n still leaves the core issue that it is not safe to test in parallel. If we can't go to zero then it's not worth changing the code and should instead take a step back and try a different strategy to eliminate the static field. Having to use a static variable speaks to deep problems with the DI patterns used, as it completely subverts DI. Perhaps this will need to wait for the use of a DI framework like Dagger by the codebase to fully resolve.

@shemnon
Copy link
Contributor

shemnon commented Oct 12, 2021

I'll take a deeper look tonight, but making storage dependent on the genesis config doesn't look like a good solution.

@shemnon
Copy link
Contributor

shemnon commented Oct 12, 2021

Add SignatureAlgorithmFactory to the naughty list. That is another static configuration that is making this diffucult.

I think we need to refactor the main culprit first: the static methods on TransactionDecoder. This needs to become a factory instance, and two fields would be if goQuorum mode is enabled and what the signature algorithm is (secp256k1 vs secp256r1). The work will then be plumbing the factory instance into the places we need to get transactions decoded from RLP. I believe that will not impact the storage classes.

Once that is in place the few remaining instances can be inferred from other data. The TransactionSimulator can infer it from the presence of privacy parameters and then goquorum privacy parameters. The transaction can infer it from an empty chainId (the TransactionDecoder will guarantee that). The various controller builders can get it directly from the genesis config, which will be available as a field during their execution.
`

@taccatisid
Copy link
Contributor Author

The value of this bug is to go from n to zero. To go from n to some smaller n still leaves the core issue that it is not safe to test in parallel.

The issue that this is 100% safe for concurrent testing is easy to achieve without reducing that static flag use to zero. The first commit (2591f11) already uses mocking to test the goQorum = true case without touching the global variable. To make it impossible to have this cause a problem in test we would just have to replace the public static boolean with a private static boolean and a getter and write once setter.

I'll take a deeper look tonight, but making storage dependent on the genesis config doesn't look like a good solution.

It is already dependent, the PR just made that explicit. Storage needs to parse transactions which depends on the quorum flag which is stored in the genesis.

I think we need to refactor the main culprit first: the static methods on TransactionDecoder. This needs to become a factory instance, and two fields would be if goQuorum mode is enabled and what the signature algorithm is (secp256k1 vs secp256r1). The work will then be plumbing the factory instance into the places we need to get transactions decoded from RLP. I believe that will not impact the storage classes.

I considered this. It would be nicer code than what we have. But it would be a larger refactor: it would touch more or less all the places that the current PR touches but in a less mechanical way.

And without further changes it would not decouple the storage classes from transaction decoding (and thus the quorum flag). This mostly results from us not having decoupled the ability to read, write, and (in-memory) represent transactions. Thus as soon as you do one of these, you need the quorum flag--even if you never use it because you do not parse transactions.

@shemnon
Copy link
Contributor

shemnon commented Oct 13, 2021

Consider two simultaneous tests, one decoding a mainnet encoded transaction and one decoding a goquorum private transaction. The r values overlap and only the static field is used to declare one mainnet and one goquorum. It is not 100% safe to use a static field for such a basic operation.

@taccatisid taccatisid force-pushed the 2275-remove-goquorum-global-flag branch from c0cd715 to 5752224 Compare October 13, 2021 20:53
@taccatisid
Copy link
Contributor Author

When protecting the quorum flag from being changed during runtime like I proposed in the most recent commit, no tests could run concurrently with different flag values. Tests could not even change the quorum flag at all.

Instead, tests have to use mocks, static injection, or slightly adapt the tested class to locally overwrite the quorum config value. In the current code base, three test classes are playing with the static quorum flag. Changing them is much more economical than changing 200+ files as in the previous version of this PR.

@shemnon
Copy link
Contributor

shemnon commented Oct 13, 2021

That's like the Global Interpreter Lock in Python. We can do better.

@shemnon
Copy link
Contributor

shemnon commented Oct 13, 2021

I don't think any solution is going to "look good" unless something like Dagger is adopted, so the DI framework handles the plumbing of strange config settings to the far corners of the app. Adding another hack (like single threading the static value) isn't any better than the current situation.

@taccatisid taccatisid force-pushed the 2275-remove-goquorum-global-flag branch from 5752224 to c40a6fc Compare October 14, 2021 02:17
@taccatisid
Copy link
Contributor Author

We have two problems:

  1. Concurrent write access to the static flag causes flaky and unreliable tests.
  2. The code looks ugly.

This PR fixes 1 but does nothing (good) on 2.

I think we all agree that to fix this in a non-ugly way we need to solve the dependency injection problem. Which is a much bigger task and requires some careful coordination with turning parts of the codebase into plugins.

Can we merge this fix to get rid of flaky tests and discuss the proper way to deal with the issue afterwards?

Actually merging would of course require dealing with the other comments you made on first. But I would prefer to delay the polishing after we agreed what to do in principle.

@shemnon
Copy link
Contributor

shemnon commented Oct 14, 2021

I think we should have another maintainer look at this. I'm not comfortable approving it but I'm not blocking it either.

@taccatisid taccatisid force-pushed the 2275-remove-goquorum-global-flag branch from c40a6fc to 6793cbd Compare October 20, 2021 20:10
@taccatisid
Copy link
Contributor Author

After chats with revenant and @lucassaldanha we came to the conclusion that we dislike the protected static global variant of the PR a little less than the code churning one.

@taccatisid taccatisid force-pushed the 2275-remove-goquorum-global-flag branch from 6793cbd to 388f01d Compare October 20, 2021 21:37
Copy link
Contributor

@macfarla macfarla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mock investigation

@macfarla
Copy link
Contributor

I'm happy once the TransactionPoolTest gets cleaned up.

The use of the org.hyperledger.besu.config.GoQuorumOptions static
variable is ugly. In particular does changing its value during tests
cause flaky tests. To ameliorate the issue this commit
- removes unused code that did rely on GoQuorumOptions
- removes access to GoQuorumOptions deep in the call stack by looking
  it up in the transaction validator instead
- change the TransactionDecoder to explicitly take the
  goQuorumCompatbility flag as an argument where access to the
  transaction validator is not easy
- rewriting all tests that did change the GoQuorumOptions by mocking
  the option change instead.

Signed-off-by: Taccat Isid <taccatisid@protonmail.com>
Signed-off-by: Taccat Isid <taccatisid@protonmail.com>
Signed-off-by: Taccat Isid <taccatisid@protonmail.com>
Signed-off-by: Taccat Isid <taccatisid@protonmail.com>
@taccatisid taccatisid force-pushed the 2275-remove-goquorum-global-flag branch from 388f01d to aae58d6 Compare October 21, 2021 21:27
Copy link
Contributor

@macfarla macfarla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sonarcloud
Copy link

sonarcloud bot commented Oct 21, 2021

SonarCloud Quality Gate failed.    Quality Gate failed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

66.7% 66.7% Coverage
0.0% 0.0% Duplication

@macfarla
Copy link
Contributor

This also fixes #2813

@macfarla macfarla merged commit 9deb5ea into hyperledger:main Oct 22, 2021
garyschulte pushed a commit to garyschulte/besu that referenced this pull request Oct 22, 2021
* reduce use of global quorum config

The use of the org.hyperledger.besu.config.GoQuorumOptions static
variable is ugly. In particular does changing its value during tests
cause flaky tests. To ameliorate the issue this commit
- removes unused code that did rely on GoQuorumOptions
- removes access to GoQuorumOptions deep in the call stack by looking
  it up in the transaction validator instead
- change the TransactionDecoder to explicitly take the
  goQuorumCompatbility flag as an argument where access to the
  transaction validator is not easy
- rewriting all tests that did change the GoQuorumOptions by mocking
  the option change instead.

Signed-off-by: Taccat Isid <taccatisid@protonmail.com>

* protect access to GoQuorum flag by getter and write once setter

Signed-off-by: Taccat Isid <taccatisid@protonmail.com>

* deduce goQuorum flag from privacyOptions

Signed-off-by: Taccat Isid <taccatisid@protonmail.com>

* remove unnecessary mocks from TransactionPoolTest

Signed-off-by: Taccat Isid <taccatisid@protonmail.com>

Co-authored-by: Sally MacFarlane <sally.macfarlane@consensys.net>
eum602 pushed a commit to lacchain/besu that referenced this pull request Nov 3, 2023
* reduce use of global quorum config

The use of the org.hyperledger.besu.config.GoQuorumOptions static
variable is ugly. In particular does changing its value during tests
cause flaky tests. To ameliorate the issue this commit
- removes unused code that did rely on GoQuorumOptions
- removes access to GoQuorumOptions deep in the call stack by looking
  it up in the transaction validator instead
- change the TransactionDecoder to explicitly take the
  goQuorumCompatbility flag as an argument where access to the
  transaction validator is not easy
- rewriting all tests that did change the GoQuorumOptions by mocking
  the option change instead.

Signed-off-by: Taccat Isid <taccatisid@protonmail.com>

* protect access to GoQuorum flag by getter and write once setter

Signed-off-by: Taccat Isid <taccatisid@protonmail.com>

* deduce goQuorum flag from privacyOptions

Signed-off-by: Taccat Isid <taccatisid@protonmail.com>

* remove unnecessary mocks from TransactionPoolTest

Signed-off-by: Taccat Isid <taccatisid@protonmail.com>

Co-authored-by: Sally MacFarlane <sally.macfarlane@consensys.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Replace GoQuorum static flag with dynamic flag
3 participants