RPC Redundancy/Failover Configuration #733

Closed
TrevorJTClarke opened this issue Nov 22, 2021 · 14 comments


@TrevorJTClarke
Contributor

TrevorJTClarke commented Nov 22, 2021

Is your feature request related to a problem? Please describe.
Current RPC providers can have downtime, temporary connectivity issues, or rate limits that make clients' transactions fail. Over the past year we have observed several windows of RPC failure, which could have been mitigated if near-api-js had configurations for multiple RPC providers.

Describe the solution you'd like
Similar to issues #703 and #717, the core JSON RPC provider should be refactored to not only have retries, but allow for retries against multiple Providers. The default provider list can configure both near foundation and openshards RPC services. Provider list can be a single node string for backward compatibility OR an array of node strings.
Up for discussion but needed: Create a failover threshold of retries for each provider, and a threshold for provider failures before defaulting to a different priority.

Configuration Example:

RPC_MAINNET_PROVIDERS="https://rpc.mainnet.near.org,https://mainnet-rpc.openshards.io"

RPC_GUILDNET_PROVIDERS="https://guildnet-rpc.openshards.io,https://rpc.guildnet.near.org"
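
To illustrate the backward-compatible shape described above (a single node string OR an array of node strings), here is a minimal sketch. `normalizeProviders` is a hypothetical helper name and is not part of near-api-js; it only assumes that a comma-separated string, as in the configuration example, should become a prioritized list.

```typescript
// Hypothetical helper, not an existing near-api-js API.
// Accepts one URL, a comma-separated string, or an array of URLs,
// and returns a prioritized list (first entry = highest priority).
function normalizeProviders(input: string | string[]): string[] {
  const parts = Array.isArray(input) ? input : input.split(",");
  const urls = parts.map((u) => u.trim()).filter((u) => u.length > 0);
  if (urls.length === 0) {
    throw new Error("at least one RPC provider URL is required");
  }
  return urls;
}

// Usage: a single string keeps working, and a list keeps its order.
const mainnetProviders = normalizeProviders(
  "https://rpc.mainnet.near.org,https://mainnet-rpc.openshards.io",
);
```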

NOTE: Because near-api-js is used in many dapps and repos, this functionality is key to giving clients the easiest way to decentralize their RPC access. This is critical for withstanding attacks against public resources.

Describe alternatives you've considered
Users must create multiple instances of the Near module with different providers configured and detect TXN failures. Not ideal at all.

Additional context
There is an ongoing effort to create a decentralized RPC for mainnet & guildnet using many of the openshards.io nodes with a redundant load-balancer.

@volovyks
Collaborator

I believe our strategy for the decentralization of RPC servers is a bit different, but some of these ideas can be implemented. At the near-api-js level we can provide support for multiple servers and fallback logic, but it will add an additional level of complexity and a source of potential bugs.
@frol @MaximusHaximus , any thoughts?

@volovyks
Collaborator

Similar suggestion from @artob: #735

@frol
Collaborator

frol commented Nov 25, 2021

@volovyk-s What is on your mind in terms of a different strategy? I feel near-api-js is the right abstraction layer to deal with the pool of RPC servers to enable true decentralization.

@volovyks
Collaborator

@frol there are two separate problems. The first is the stability of a single RPC server. The second is the ability to switch to another server when the first one is down (decentralization). As far as I know, our current strategy was to work on stability first (API keys). For the second, I agree that using multiple RPC servers with fallbacks on the client side is the best option. But we will need to design it carefully, because a simple fallback on each call can be slow. We will also need to support API keys for each such server. I will prioritize this issue.

@volovyks volovyks added the P2 Pretty important label Nov 29, 2021
@frol
Collaborator

frol commented Nov 29, 2021

@volovyk-s Ah, well, those are indeed two completely different efforts, but they came somewhat together, and we need both solutions: (1) extended RPC connection configuration, (2) failover configuration. This issue is about the second point.

@TrevorJTClarke
Contributor Author

> @volovyk-s Ah, well, those are indeed two completely different efforts, but they came somewhat together, and we need both solutions: (1) extended RPC connection configuration, (2) failover configuration. This issue is about the second point.

(and @volovyk-s)

I apologize for the side discussion on decentralization here. It was just to mention the additional context and reasons.

The goal of this issue is to add support for multiple RPC configurations, allowing retries against a prioritized list of RPC nodes. This at least mitigates single RPC provider failures/outages.
Decentralization should be handled by a very different setup than the SDK. :)
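
As a rough sketch of the goal stated above (retries against a prioritized list of RPC nodes, with a per-provider retry threshold), something like the following could work. The function name `sendWithFailover`, the `retriesPerProvider` option, and the injected `send` callback are all assumptions made for illustration; this is not how near-api-js is implemented.

```typescript
// Hypothetical sketch: try each URL in priority order, retrying a fixed
// number of times per provider before failing over to the next one.
async function sendWithFailover<T>(
  urls: string[],
  send: (url: string) => Promise<T>, // assumed transport, e.g. a JSON-RPC POST
  retriesPerProvider = 2, // assumed threshold; the issue leaves this open
): Promise<T> {
  let lastError: unknown = new Error("no RPC providers configured");
  for (const url of urls) {
    for (let attempt = 0; attempt < retriesPerProvider; attempt++) {
      try {
        return await send(url);
      } catch (err) {
        lastError = err; // remember the failure and keep going
      }
    }
  }
  // Every provider exhausted its retries: surface the most recent error.
  throw lastError;
}
```

A real implementation would probably add a delay between attempts and distinguish retryable errors (timeouts, rate limits) from permanent ones; both are left out here to keep the sketch short.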

@TrevorJTClarke
Contributor Author

@frol @volovyk-s any movement on this? There was another downtime/major latency issue on mainnet, with many apps unusable because of the dependency on a single RPC provider.

@volovyks
Collaborator

volovyks commented Dec 21, 2021

It seems the ideal solution here would be to create a FailoverJsonRpcProvider, which makes several simultaneous calls to all provided RPC URLs and returns the result if, say, more than 50% of them return the same value. Or it returns the first successful result if we want it to be snappy.
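
The majority variant described above could be sketched roughly as follows. `sendWithQuorum` and the injected `send` callback are hypothetical names, and comparing results through `JSON.stringify` is a simplifying assumption (it requires JSON-serializable results and is sensitive to key order); none of this is taken from near-api-js.

```typescript
// Hypothetical sketch: call every provider at once and accept a value only
// if a strict majority of the configured providers returned it.
async function sendWithQuorum<T>(
  urls: string[],
  send: (url: string) => Promise<T>, // assumed transport function
): Promise<T> {
  // Convert rejections into markers so one failing node cannot reject the batch.
  const settled = await Promise.all(
    urls.map((url) =>
      send(url).then(
        (value) => ({ ok: true as const, value }),
        () => ({ ok: false as const }),
      ),
    ),
  );
  // Tally identical answers, keyed by their JSON representation.
  const tally: Record<string, { value: T; count: number }> = {};
  for (const entry of settled) {
    if (!entry.ok) continue;
    const key = JSON.stringify(entry.value);
    const existing = tally[key];
    if (existing) existing.count += 1;
    else tally[key] = { value: entry.value, count: 1 };
  }
  for (const key of Object.keys(tally)) {
    const candidate = tally[key];
    if (candidate && candidate.count * 2 > urls.length) return candidate.value;
  }
  throw new Error("no result was returned by a majority of providers");
}
```

Note that every request fans out to all providers, which is the increased load the comment goes on to warn about.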

The problem here is the increased load on RPC Servers, something that we are trying to avoid.
Also, the code of near-api-js is heavily coupled and relies on JsonRpcProvider instead of the Provider interface. Using a new Provider would lead to a ton of breaking changes.
@MaximusHaximus I think we should move the provider to a separate library in the future. People should be able to create their own implementations and use them in near-api-js-x.

In our case, we will need to refactor the existing JsonRpcProvider, and these calls and checks will probably be sequential. That will increase response time when the main RPC server is down.

Also, we can refactor utils/web.ts to achieve the same result (with similar downsides).

@volovyks volovyks linked a pull request Dec 23, 2021 that will close this issue
@think-in-universe
Member

Enabling the configuration of multiple RPC nodes would also help resolve the feature request for switching RPC URLs in NEAR wallet: near/near-wallet-roadmap#36, if the wallet could set fallback RPC URLs by default.

@SteveYuOWO

SteveYuOWO commented Jan 25, 2022

I tried adding a new property, node_urls, to Near.ts. I tried polling different Connections and found that none of the Connection functions can return the status of the node. Invoking the status() function still retries the request 12 times rather than reporting a failed status.

Modifying only the implementation of Near.ts is the wrong approach; modifying json-rpc-provider.ts and utils/web.ts is more appropriate.

@SteveYuOWO

SteveYuOWO commented Jan 25, 2022

When executing fetchJson, poll through the list of all RPCs, find one that is connectable, and treat it as reliable by default. This avoids running exponentialBackoff against nodes that can never connect.
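
The idea above (find a connectable RPC once and keep using it) could look roughly like this. `ProviderSelector` and its `call` method are hypothetical names for illustration only; the actual fetchJson changes in the referenced commits may differ.

```typescript
// Hypothetical sketch: remember the last provider that answered and start
// from it on the next call, so dead nodes are not retried every time.
class ProviderSelector {
  private current = 0; // index of the provider currently treated as reliable

  constructor(private readonly urls: string[]) {}

  async call<T>(send: (url: string) => Promise<T>): Promise<T> {
    let lastError: unknown = new Error("no RPC providers configured");
    for (let i = 0; i < this.urls.length; i++) {
      const index = (this.current + i) % this.urls.length;
      try {
        const result = await send(this.urls[index]);
        this.current = index; // this node responded, so prefer it from now on
        return result;
      } catch (err) {
        lastError = err; // unreachable, move on to the next node
      }
    }
    throw lastError;
  }
}
```

With this shape, only the first call after an outage pays the cost of probing the unreachable node; later calls go straight to the node that last responded.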

SteveYuOWO added a commit to SteveYuOWO/near-api-js that referenced this issue Jan 25, 2022
@TrevorJTClarke
Contributor Author

@frol @volovyk-s Yet another downtime/major latency issue, primarily on testnet, because of the dependency on a single RPC provider.

@janewang janewang added the good first issue Good for newcomers label Aug 26, 2022
@exalate-issue-sync exalate-issue-sync bot added good_first_issue and removed good first issue Good for newcomers labels Oct 12, 2022
@leapsamvel

Dear Team,

We have a library named fallback-falooda that helps us pick a reliable node from a list of nodes in our Cosmos environment. We also use it in our node selector for our NEAR-specific use cases.

I have made a few changes to the code to make it work with fallback-falooda; see:

master...leapsamvel:near-api-js:master

We can either:

  1. Use fallback-falooda in the library, or
  2. Provide a getter method param that lets the user supply the URL dynamically when the RPC call is made.
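
For the second option, a minimal sketch of what a URL getter parameter might look like is shown here. `UrlSource` and `resolveUrl` are hypothetical names, not existing near-api-js APIs; the only assumption is that the URL is resolved at the moment each RPC call is made.

```typescript
// Hypothetical sketch: accept either a fixed URL or a getter that is
// evaluated on every call, so the caller can switch nodes dynamically.
type UrlSource = string | (() => string);

function resolveUrl(source: UrlSource): string {
  return typeof source === "function" ? source() : source;
}

// Usage: the caller's own node selector decides which URL is current.
let preferredNode = "https://rpc.mainnet.near.org";
const source: UrlSource = () => preferredNode;
```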

Could you let me know, and I will raise the PR accordingly with the test cases and documentation?

TIA.

@vikinatora
Collaborator

Resolved by #1334

8 participants