Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove possibility of conflict between --web3provider and --http-web3provider #5826

Closed
handelaar2 opened this issue May 12, 2020 · 16 comments 路 Fixed by #6055
Closed

Remove possibility of conflict between --web3provider and --http-web3provider #5826

handelaar2 opened this issue May 12, 2020 · 16 comments 路 Fixed by #6055
Assignees
Labels
Discussion Simply a thread for talking about stuff Enhancement New feature or request Help Wanted Extra attention is needed Priority: High High priority item
Milestone

Comments

@handelaar2
Copy link

handelaar2 commented May 12, 2020

馃悶 Bug Report

Description

There are 2 parameters for pointing to an Eth1-node: --web3provider and --http-web3provider
However these parameters cannot be used "independently".
Only populating one parameter (say --http-web3provider ) and omit the other defaults that one to goerli.prylabs.net.

Having two eth1 sources seems to lead to conflicting data and the beacon log start showing errors:
[2020-05-12 08:16:29] ERROR powchain: Unable to process past logs Could not process deposit log: could not get correct merkle index: latest index observed is not accurate, wanted 30928, but received 3090

2020-05-12_10-24-02

The local (Nethermind) Eth1-node is also suffering from this conflict and is bombarded with RPC/http eth_logs-calls, as the conflicting data is causing a loop.
2020-05-12_09-50-57

In other words, -web3provider and --http-web3provider have to point to the same eth1-node so there is only one source of eth1 data.

@nisdas confirmed that any mixing here would be the cause of the error message and the looping.

If this is indeed a restriction of the current eth1-code, the web3-parameters should be defined accordingly:
--web3provider =>ip address only
--web3_ws-port
--web3_http-port

Has this worked before in a previous version?

n/a

Yes, the previous version in which this bug was not present was: ....

馃敩 Minimal Reproduction

Use Nethermind as local eth1 node. Run the beacon node with "only" the http parameter. That is causing the above (looping) error. Adding the ws-parameter is not possible. Nethermind does not support websocket subscriptions. So activating parameter --web3provider (ws) gives the following error messages (both in Nethermind and in beacon-node):
2020-05-12_11-35-23

So leaving the ws-parameter out:
-local_node=>http
-goerli.prylabs.net=>ws
This is causing the RPC loops.
From a user perspective: there is no error/warning message in the beacon that setting only one parameter is actually not possible.

馃敟 Error





馃實 Your Environment

Operating System:

  
docker with latest beacon and Nethermind 1.8.30
  

What version of Prysm are you running? (Which release)

  
96219fc9fad6b72b751352c2c198eb067749f3e0
  

Anything else relevant (validator index / public key)?

I think it is important to reduce the dependency on goerli.prylabs.net for eth1 data. Any obstacle or unclarity of using a local eth1-node should be removed.

@0xKiwi
Copy link
Contributor

0xKiwi commented May 12, 2020

Hello @handelaar2 ! Thanks for the report, asking for clarity here, but what is the issue you'd like to see solved? The flag dependency on eachother matching? Or better UX in terms of setting the flags to ensure they match eachother?

@0xKiwi 0xKiwi added the Bug Something isn't working label May 12, 2020
@terencechain
Copy link
Member

@0xKiwi
Copy link
Contributor

0xKiwi commented May 12, 2020

I see! So solving this would be making a --web3provider flag with --web3-ws-port and --web3-rpc-port flags.

@0xKiwi 0xKiwi added Enhancement New feature or request Good First Issue Good for newcomers and removed Bug Something isn't working labels May 12, 2020
@0xKiwi 0xKiwi changed the title Conflict between --web3provider and --http-web3provider Remove possibility of conflict between --web3provider and --http-web3provider May 12, 2020
@handelaar2
Copy link
Author

handelaar2 commented May 12, 2020

@0xKiwi That was indeed what Nishant called a good suggestion. We would have two port parameters but only one web3provider IP address. This way the prysm "eth1-code" would not be confused having "input" from two sources.
It would also block users from potentially entering 2 different IP addresses and/or different hardcoded defaults kicking in.

It would however not solve my problem which is: using Nethermind as my local eth1-node. (Hopefully they prioritize development of web socket subscriptions). In their discord however people write:
2020-05-12_18-14-18

@nisdas Anything from your side to add?

@tkstanczak Tomasz, care to comment about websocket subscriptions for Nethermind?

@nisdas
Copy link
Member

nisdas commented May 13, 2020

A subscription based api makes code cleaner and easier to write for. The alternative would be polling based apis, which would require a rewrite of portions of our eth1 code. Do you have the reasons stated out for why subscription methods are bad here?
@handelaar2

@MicahZoltu
Copy link

For context, my recommendation to not use eth_subscribe comes from the fact that its design makes it susceptible to unrecoverable faults. Augur ran into a ton of trouble with this during its development and I eventually wrote https://github.com/ethereumjs/ethereumjs-blockstream/ for them to deal with it, which doesn't use eth_subscribe at all.

The short version of the problem is that if you assume the backend node can disappear at any moment (and then come back later, or you get load balanced to a different node), you cannot rely on witnessing every block/log addition and removal necessary to correctly maintain whatever client-side chain state you are maintaining. The library linked above solves this by maintaining a some number of blocks in-memory client-side, and making sure that it always has a clear chain forward and backwards (during a reorg) before notifying the client of an addition/removal. If you just use eth_subscribe to get your block/log notifications, you may miss an addition, or miss a removal, which can lead to client-side sync issues.

I don't know anything about this project, but I do recommend against anyone using eth_subscribe for anything other than toy projects. For real stuff you really need chain history to be reconciled within the app's failure domain, or you need a much more robust system for maintaining synchronization with a backing node than eth_subscribe offers.

FWIW: This came up because Augur used Infura as its backend early on and when you talk to Infura you aren't talking to a single Ethereum node, you are talking to a load balancer that is backed by many nodes. These nodes may be out of sync with each other, which makes the "missed notification" problem actually pretty common. However, even if you are using a single backing Ethereum node, the missed notifications can occur if that node has an outage (even a very brief one).

@handelaar2
Copy link
Author

handelaar2 commented May 13, 2020

So if you ask me what would solve all issues: switch to HTTP RPC endpoint only. #2299 was a good start ;-)) but we also need HTTP RPC for fetching future logs.

  • more robust (Micah's experience)
  • no more mixing of web3 endpoints (one ip+one port; no more merkle index errors)
  • better integration with existing web3 providers (Nethermind,Infura, ....)

Just my 2 cents.

@prestonvanloon
Copy link
Member

It's not as easy as setting a different port. Providers like infura have an entirely different connection URL between websockets and json-rpc.

@nisdas
Copy link
Member

nisdas commented May 14, 2020

Looking at our code and what we have in our powchain package, it might not be too much work to sunset websockets and eth_subscribe from our codebase. We only have 1 method that uses it. Although whether we choose to use a subscription based method or instead a polling approach, both methods will require us to have some sort of client synchronization logic on our end to handle missing blocks and logs we receive from the eth1 chain along with any accompanying re-orgs.

What we have now is a bit handwavy, where we basically look at the current head block , and request all logs from Head - N . This is usually fine if N is a large enough number as goerli rarely ever has deep re-rorgs and it is even more unlikely on the mainchain. Although the spec does do the same thing, one of my goals this quarter was to have something more robust for reconcilling the eth1 state of our deposit contract. We do try request any missing blocks and logs in the event they are missing, however dealing with eth1 re-orgs is a bit more difficult. This change could be bundled into it, and I do not think it would be too much extra work.

@prysmaticlabs/core-team what do you guys think ?

@nisdas nisdas added Discussion Simply a thread for talking about stuff Refactor and removed Good First Issue Good for newcomers labels May 14, 2020
@prestonvanloon
Copy link
Member

I am in favor of removing websockets requirement in Prysm. We don't need on real time events in ETH1 given that we are only supposed to be looking at blocks that are at least 1024 older than the head of ETH1.

Removing websockets eases infrastructure requirements, code complexity, and potential failure vectors.

@nisdas is this something you are interested in working on? If not, I can delegate or work on it myself.

@prestonvanloon prestonvanloon added this to the Diamond milestone May 18, 2020
@nisdas
Copy link
Member

nisdas commented May 19, 2020

I don't have the immediate bandwidth right now to work on it, so its fine if you want to take this on right now or delegate it @prestonvanloon

@alonmuroch
Copy link
Contributor

@prestonvanloon I might be able to help.
The gist of it is to remove any dependency for eth 1 web socket + flags?

@prestonvanloon
Copy link
Member

@alonmuroch yes exactly. There is some functionality that needs to be rewritten. I wouldn't call this a trivial issue, but seems like something you could do. Let me know if I can assign to you

@alonmuroch
Copy link
Contributor

@prestonvanloon I'll give it a go.
Though it's marked as closed..

@alonmuroch
Copy link
Contributor

@rauljordan @prestonvanloon this issue is still open? should I take a look at it or the PR attached takes care of it?

@prestonvanloon
Copy link
Member

@alonmuroch I believe Raul is working on this in #6055.
This turned out to be a high priority item from user feedback on the survey we sent last week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Discussion Simply a thread for talking about stuff Enhancement New feature or request Help Wanted Extra attention is needed Priority: High High priority item
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants