
RPC node priorities are broken by design #2306

Closed
roman-khimov opened this issue Apr 19, 2023 · 4 comments · Fixed by #2352
roman-khimov commented Apr 19, 2023

Expected Behavior

Once a NeoFS node is connected to an RPC node, it must stick to this connection for as long as it can. A node can only use one RPC node safely at a time (because two RPC nodes can have different states).

Current Behavior

The node can "switch to the most prioritized" RPC even if it has a properly functioning connection, which then leads to #1337.

Priorities themselves are just another way to order RPC endpoints, even though they are already perfectly ordered in the config.

Possible Solution

Drop switch_interval and drop priorities; use the order we get in the config. This is an incompatible change and is therefore tracked separately from #2304, even though the two are somewhat related.
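
A minimal sketch of the proposed behavior (hypothetical names, not the actual NeoFS code): walk the endpoints strictly in the order they appear in the config, connect to the first one that answers, and stay on that connection until it dies.

```go
package main

import (
	"context"
	"errors"
	"log"
	"time"
)

// conn stands in for a live RPC connection (a *rpcclient.Client in the
// real code); done is closed when the underlying connection breaks.
type conn struct {
	endpoint string
	done     chan struct{}
}

// dial is a hypothetical connect function; real code would create and
// verify a client here.
func dial(ctx context.Context, endpoint string) (*conn, error) {
	return nil, errors.New("not implemented")
}

// serve sticks to one RPC connection at a time: endpoints are tried
// strictly in config order, and the node only moves on when the current
// connection dies. No switch_interval, no priorities.
func serve(ctx context.Context, endpoints []string) {
	for {
		var c *conn
		for _, e := range endpoints { // config order is the only priority
			var err error
			if c, err = dial(ctx, e); err == nil {
				break
			}
			log.Printf("can't connect to %s: %v", e, err)
		}
		if c == nil {
			time.Sleep(time.Second) // all endpoints failed, retry later
			continue
		}
		select {
		case <-c.done: // connection broke; reconnect, again from the top
			log.Printf("connection to %s lost, switching", c.endpoint)
		case <-ctx.Done():
			return
		}
	}
}
```

The key property is that a healthy connection is never abandoned; reconnection logic runs only once the current connection is gone.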

@roman-khimov roman-khimov added bug Something isn't working neofs-ir Inner Ring node application issues neofs-storage Storage node application issues labels Apr 19, 2023
@roman-khimov roman-khimov added this to the v0.37.0 milestone Apr 19, 2023
roman-khimov added a commit that referenced this issue Apr 19, 2023
This is a temporary scheme, just enough to almost get rid of "deprecated"
warnings (refs. #2219). It also fixes #2304.

See also #2306, #2307 and #1337.

Signed-off-by: Roman Khimov <roman@nspcc.ru>
@notimetoname notimetoname self-assigned this May 17, 2023
notimetoname commented May 17, 2023

> Drop switch_interval

@roman-khimov, can you clarify that statement a little? In a private installation, a client and a server can be on the same machine or on different ones. To make communication as local as possible, we introduced the "switch to the most prioritized" feature, which tries to return to the first endpoint in the priority order (whether or not that order comes from a separate config field).

> drop priorities

One of the features was letting a user assign the same priority to some subset of endpoints. Of course, it only works if switching is still present, but that is the question above.

Also, can both features still be useful if #1337 is somehow solved separately?

@roman-khimov
Member Author
#1337 can be solved, sure, but a switch still means some state reset for the node, and that is better avoided unless you absolutely have to do it (like when the connection breaks), because it is a service interruption in one way or another. If we could switch seamlessly, maybe this would be useful, but I don't immediately see any way to do that. So theoretically we have a feature here, while practically it can be a bug (imagine some node performing a switch, one that it doesn't really need, during intensive load).

My primary problem is with switch_interval, though. If you think priorities can be useful, we can keep them as some kind of weight for the next connection attempt (instead of simple ordering), but I'm not sure it's worth the trouble.
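
If priorities did survive as weights, one possible reading of "weight for the next connection attempt" (purely illustrative, hypothetical code) is to bias the order in which endpoints are tried on reconnect, e.g. with an Efraimidis-Spirakis weighted shuffle, rather than to trigger switches away from a healthy connection:

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
	"sort"
)

// weightedOrder orders endpoints for the next reconnect attempt:
// a higher weight makes an endpoint more likely, but not guaranteed,
// to be tried first. Each endpoint gets the key rand^(1/weight) and
// the result is sorted by key, descending (Efraimidis-Spirakis).
func weightedOrder(endpoints []string, weights []float64) []string {
	type cand struct {
		endpoint string
		key      float64
	}
	cs := make([]cand, len(endpoints))
	for i, e := range endpoints {
		cs[i] = cand{e, math.Pow(rand.Float64(), 1/weights[i])}
	}
	sort.Slice(cs, func(i, j int) bool { return cs[i].key > cs[j].key })
	out := make([]string, len(cs))
	for i, c := range cs {
		out[i] = c.endpoint
	}
	return out
}

func main() {
	// Two equal-weight endpoints are tried in random order; the third,
	// with a much lower weight, usually ends up last.
	fmt.Println(weightedOrder(
		[]string{"rpc1:30333", "rpc2:30333", "backup:30333"},
		[]float64{1, 1, 0.1},
	))
}
```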

@notimetoname
Contributor
> if you think priorities can be useful

No; if we drop switch_interval, any weights or equal priorities are absolutely useless.

> If we could switch seamlessly, maybe this would be useful, but I don't immediately see any way to do that.

To sum up: we consider a connection as important as it can be and nothing should interrupt us (even if the connection is between "non-optimal" machines, network-wise?). A healthy connection is a good connection, and only its death can be a reason for instantiating a new neogo.Client?

> This is an incompatible change...

How will we do that? Just drop them with some notes in the CHANGELOG?

roman-khimov commented May 18, 2023

> we consider a connection as important as it can be and nothing should interrupt us

Yes, because you lose state if you're disconnected, and if you don't have the current state, you can't serve requests. Optimal versus suboptimal doesn't matter if you can't really serve anything.

This, by the way, is a nice argument for keeping a local (stripped, non-archive) copy of the chain, one that you can't be disconnected from. But that's a different story.

> How will we do that? Just drop them with some notes in the CHANGELOG?

Yep. This will deactivate these options, but old configs will still be valid, so it's less of an issue.
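
A sketch of how "inactive but still valid" could look at the config layer (hypothetical field and type names, not the actual NeoFS config code): the dropped options are still parsed, so old files keep loading, but setting them only produces a deprecation warning.

```go
package config

import (
	"log"
	"time"
)

// rpcConfig sketches how the dropped knobs could stay parseable so that
// old config files remain valid (hypothetical struct, not the real
// NeoFS config code).
type rpcConfig struct {
	Endpoints      []string      `yaml:"endpoints"`       // order is the only priority
	Priorities     []int         `yaml:"priorities"`      // parsed, but has no effect
	SwitchInterval time.Duration `yaml:"switch_interval"` // parsed, but has no effect
}

// warnDeprecated logs a note for every inactive option that is still set,
// instead of rejecting the config.
func warnDeprecated(c *rpcConfig) {
	if len(c.Priorities) != 0 {
		log.Println("rpc: priorities are deprecated and have no effect")
	}
	if c.SwitchInterval != 0 {
		log.Println("rpc: switch_interval is deprecated and has no effect")
	}
}
```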
