Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Fallback Option for Eth1 Nodes #8062

Merged
merged 17 commits into from
Dec 11, 2020
Merged

Add Fallback Option for Eth1 Nodes #8062

merged 17 commits into from
Dec 11, 2020

Conversation

nisdas
Copy link
Member

@nisdas nisdas commented Dec 7, 2020

What type of PR is this?

Feature Addition

What does this PR do? Why is it needed?

  • Adds the ability to have fallbacks for our eth1 endpoints.
  • Fix references across the repo.
  • Performs regular health checks to switch over to primary node in the event it comes back up.
  • Add unit test to test this feature.

Which issues(s) does this PR fix?

Fixes #7969

Other notes for review

@nisdas nisdas requested a review from a team as a code owner December 7, 2020 11:49
@nisdas nisdas added the Ready For Review A pull request ready for code review label Dec 7, 2020
@nisdas nisdas added this to the v1.0.5 milestone Dec 7, 2020
@nisdas nisdas added the Enhancement New feature or request label Dec 7, 2020
@nisdas nisdas mentioned this pull request Dec 7, 2020
beacon-chain/flags/base.go Outdated Show resolved Hide resolved
}
if err := ctx.Set(flag.Name, web3endpoint); err != nil {
return errors.Wrapf(err, "could not set %s to %s", flag.Name, web3endpoint)
rawFlags := sliceutil.SplitCommaSeparated(ctx.StringSlice(flags.Name))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think bootstrap nodes example uses a multiple flag and doesn't use SplitCommaSeparated c.BootstrapNodes = cliCtx.StringSlice(cmd.BootstrapNode.Name)

case strings.HasPrefix(rawValue, "ws://"):
case strings.HasPrefix(rawValue, "wss://"):
default:
web3endpoint, err := fileutil.ExpandPath(rawValue)
Copy link
Contributor

@shayzluf shayzluf Dec 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure this code handles yaml file input
good example for this is in function registerP2P using readbootNodes

func (b *BeaconNode) registerP2P(cliCtx *cli.Context) error {

https://github.com/prysmaticlabs/prysm/blob/14e1f08208553292d0015f960e6821729a444cf0/beacon-chain/node/node.go#L347:6

@mohamedmansour
Copy link
Contributor

mohamedmansour commented Dec 7, 2020

@nisdas Thanks for doing this! The main issue with this design is that all the http endpoints are treated equally. We just want to provide a fallback endpoint in case the primary endpoint stopped working. This design will only change current endpoint if a failure occurs.

Would be nice if there was a health check on the endpoints so that if the primary endpoint is back up, it will automatically switch back to that. So instead of doing that check if the call failed, maybe another worker should be maintaining the health status of every endpoint and setting its current url.

Another design issue is that what if we have different authentication for each eth1 node, this doesn’t doesnt work here.

One of the main usecase for this feature is that we are running a local unauthenticated geth node and a remote infura authenticated node.

@nisdas
Copy link
Member Author

nisdas commented Dec 7, 2020

Hey @mohamedmansour ,

Thanks for giving your thoughts on this issue, performing a continous health check will require quite a bit of shifting around of our code. For obvious reasons the service as of this moment is only meant to be running a single persistent eth1 connection. I will think a bit more on if its feasible to be running continous health check on the primary eth1 node without messing around too much with our other core eth1 code.

As for the latter issue, it isnt relevant to the main purpose of the PR as of the current moment prysm only supports unauthenticated connections to eth1 nodes so it is out of scope for the PR. However support for JWT auth can be added in with a follow up PR.

@mohamedmansour
Copy link
Contributor

@nisdas prysm currently supports cert based eth1 nodes. With the current design of the PR if we supply two different eth1 endpoints, we have to use the same cert which most likely isn’t the case.

Regarding my first question, it is very important for the fallback url not to become primary eth1 node for a long time if the primary eth1 node recovered. Otherwise this solution would only be valuable for those who only have strictly remote eth1 services like alchemy (which should be just a few).

The idea of introducing a failover is to be meerly be just temporal, but in this case it will be permanent (which is not what the issue on github is about).

Thanks for replying :)

@nisdas
Copy link
Member Author

nisdas commented Dec 8, 2020

prysm currently supports cert based eth1 nodes. With the current design of the PR if we supply two different eth1 endpoints, we have to use the same cert which most likely isn’t the case.

Currently prysm doesn't have the ability to provide a custom cert for an eth1 endpoint, we use the default certificate authorities to authenticate any upgraded https connections. From the view of the service there is no ability to provide a custom cert to an eth1 endpoint. This will also have to be designed along with JWT auth.

Regarding my first question, it is very important for the fallback url not to become primary eth1 node for a long time if the primary eth1 node recovered. Otherwise this solution would only be valuable for those who only have strictly remote eth1 services like alchemy (which should be just a few).

Understood :) , will take a look at this a bit more deeply then 👍

Copy link
Contributor

@rkapka rkapka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think changing the HTTPWeb3ProviderFlag to a slice is the best design. The problem is that if you specify the flag multiple times, it's not clear which endpoint is the primary one, and this is very important. Does the CLI library provide an ordering guarantee when using a slice flag?

A clearer approach IMO would be to keep the existing flag as it is, using it for the primary endpoint, and add a new FallbackHTTPWeb3Provider(s) flag for fallback endpoints, which can either be specified multiple times or as a comma-separated list of endpoints (adding s to the flag name in that case).

beacon-chain/powchain/service.go Outdated Show resolved Hide resolved
@rkapka
Copy link
Contributor

rkapka commented Dec 10, 2020

Another option would be to allow specifying a primary suffix e.g. http-web3provider=blah,primary and default to httpEndpoints[0] in case no endpoint is marked as primary, maybe also issuing a warning in this case. We should also handle the scenario where multiple endpoints are marked as primary (error? choose the first one?).

This solution will interfere with @shayzluf's authorization work, though.

shared/cmd/helpers.go Outdated Show resolved Hide resolved
shared/cmd/helpers_test.go Outdated Show resolved Hide resolved
shared/cmd/helpers_test.go Outdated Show resolved Hide resolved
@rkapka
Copy link
Contributor

rkapka commented Dec 10, 2020

What about #8062 (comment)? Do you think it's worthwhile to unify this?

@nisdas nisdas merged commit 99b3835 into develop Dec 11, 2020
@delete-merged-branch delete-merged-branch bot deleted the addMultipleEndpoints branch December 11, 2020 11:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement New feature or request Ready For Review A pull request ready for code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Multiple ETH1 endpoints
4 participants