
Location of participants in Lotus stack, multiple identities in one instance #103

Closed
Tracked by #253
Kubuxu opened this issue Feb 29, 2024 · 3 comments

Comments

@Kubuxu
Collaborator

Kubuxu commented Feb 29, 2024

So far, I've assumed that the f3 active participant would live in Lotus.
That may not have been as good an assumption as I thought.

An active participant is tied to SP code and identity; that flow lives in lotus-miner/lotus-provider.
At the same time, the lotus-miner is (AFAIK) not connected to the global pubsub.

A single Lotus node can also host multiple providers which, if f3 continues to live there, would necessitate either multiple concurrent f3 instances in one Lotus node or f3 being able to handle multiple identities.

As far as I know, the protocol flow is independent of our own identity, which should make running multiple identities at the same time much easier. We could abstract signing and VRF generation out into the broadcast operation, so that the instance itself stops caring about our own identity.
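For illustration, here is a minimal sketch of that abstraction in Go (all names hypothetical, e.g. `Signer`, `Broadcaster`, `ActorID`; not the actual go-f3 API): the protocol instance emits an identity-free message, and a host-side broadcaster handles per-identity signing and VRF generation.

```go
// Hypothetical sketch, not the go-f3 API: signing and VRF generation are
// abstracted out of the protocol instance and into the broadcast path.
package f3sketch

// ActorID identifies a participant (an SP actor).
type ActorID uint64

// UnsignedMessage is what the protocol instance emits; it carries no
// sender identity.
type UnsignedMessage struct {
	Instance uint64
	Round    uint64
	Payload  []byte
}

// Signer abstracts key access so it can sit behind an RPC boundary,
// e.g. with keys held by lotus-miner rather than the Lotus node.
type Signer interface {
	Sign(sender ActorID, payload []byte) ([]byte, error)
	// GenerateTicket produces the VRF output used for CONVERGE tickets.
	GenerateTicket(sender ActorID, beacon []byte) ([]byte, error)
}

// Broadcaster signs a single identity-free message on behalf of every
// local identity and publishes each signed copy.
type Broadcaster struct {
	identities []ActorID
	signer     Signer
	publish    func(sender ActorID, msg UnsignedMessage, sig []byte) error
}

// Broadcast fans one message out across all local identities, so the
// protocol instance never needs to know which IDs it runs as.
func (b *Broadcaster) Broadcast(msg UnsignedMessage) error {
	for _, id := range b.identities {
		sig, err := b.signer.Sign(id, msg.Payload)
		if err != nil {
			return err
		}
		if err := b.publish(id, msg, sig); err != nil {
			return err
		}
	}
	return nil
}
```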

Kubuxu added this to the F3 Alpha milestone Apr 22, 2024
@Kubuxu
Collaborator Author

Kubuxu commented May 20, 2024

The design I settled on:

  • gpbft.Participant is unaware of the ID it is running as, since the protocol is independent of our own decisions
    • this requires a slight refactor and cleanup in gpbft to remove ParticipantID
  • gpbft requests that a given message be broadcast; the message is universal (not tied to any ID). It is passed to the Host, which knows which IDs it wants to broadcast as, and uses the power table from gpbft to resolve those IDs to public keys
  • The Host then builds a payload to be signed with each key. Signing can happen across an RPC boundary (necessary for offloading keys to lotus-miner): serialized payloads go out to be signed with the given key and come back to the Host for broadcasting
  • When the signed payloads are returned for broadcast, they are also immediately processed as incoming messages

See #188 for the PR; a rough sketch of this flow follows the list below. The suggestion there was to split the PR into:

  1. async delivery of local messages
  2. removal of ID from gpbft.Participant
  3. addition of the message builder pattern
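Sketching the flow with the hypothetical names from the earlier snippet (see #188 for the real implementation): gpbft hands the Host an identity-free message builder; the Host checks each local ID against the power table, gets the payload signed (possibly over RPC), then broadcasts the signed message and immediately loops it back in as an incoming message.

```go
// Continues the hypothetical f3sketch package above.
package f3sketch

// PowerTable resolves participant IDs to public keys; a stand-in for the
// power table gpbft exposes to the Host.
type PowerTable interface {
	PublicKey(id ActorID) ([]byte, error)
}

// MessageBuilder is the identity-free broadcast request produced by gpbft.
type MessageBuilder struct {
	msg UnsignedMessage
}

// SignedMessage binds a message to a concrete sender and signature.
type SignedMessage struct {
	Sender    ActorID
	Msg       UnsignedMessage
	Signature []byte
}

// Host owns the local identities, keys, and network, per the design above.
type Host struct {
	localIDs []ActorID
	power    PowerTable
	signer   Signer
	publish  func(SignedMessage) error
	receive  func(SignedMessage) // feeds messages back into gpbft
}

// RequestBroadcast signs and publishes the message for every local ID, and
// immediately processes each signed copy as an incoming message.
func (h *Host) RequestBroadcast(mb *MessageBuilder) error {
	for _, id := range h.localIDs {
		// Resolve the ID via the power table; an unknown ID cannot sign.
		if _, err := h.power.PublicKey(id); err != nil {
			return err
		}
		// Signing may cross an RPC boundary (keys held by lotus-miner).
		sig, err := h.signer.Sign(id, mb.msg.Payload)
		if err != nil {
			return err
		}
		signed := SignedMessage{Sender: id, Msg: mb.msg, Signature: sig}
		if err := h.publish(signed); err != nil {
			return err
		}
		h.receive(signed) // loop back as an incoming message
	}
	return nil
}
```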

@anorth
Member

anorth commented Jun 10, 2024

This has proven slightly trickier than we thought. The protocol code assumes that it can update state as a result of its own decisions more or less synchronously. This used to be achieved by internally receiving messages sent to self synchronously. That changed to async as part of #259, but it means there's now a race between receiving those messages sent to self and receiving alarms (#316).

I explored an internal send-to-self of unvalidated messages, but immediately ran into the node not knowing its own power, making the state updates impossible.

Some potential paths forward that I see:

  • Attempt to remove any algorithmic assumption of being able to update state as a result of the participant's own decisions. I don't know how easy this will be; we haven't tried. Possibly related to running gpbft without active participation (passive observation), #319.
  • When requesting broadcast of a message, synchronously receive back from the host the information needed to update state: a list of participant IDs and power amounts that the host will subsequently sign and broadcast on behalf of. This would still leave the tickets for CONVERGE message/s unknown (sketched below).
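A sketch of what that second option might look like, again with hypothetical types continuing the earlier snippets: the broadcast request synchronously returns the identities and power amounts the host will sign for, letting the participant update its state before its own messages loop back.

```go
// Continues the hypothetical f3sketch package above.
package f3sketch

import "math/big"

// LocalSender describes one identity the host will subsequently sign and
// broadcast on behalf of, together with its power.
type LocalSender struct {
	ID    ActorID
	Power *big.Int
}

// SyncBroadcastHost is a variant of the host interface in which the
// broadcast request synchronously reports which local senders it covers.
type SyncBroadcastHost interface {
	// RequestBroadcast signs and publishes msg for every local identity
	// asynchronously, but returns immediately with the IDs and power the
	// host will sign on behalf of, so the participant can update its
	// state without waiting for its own messages to arrive. Tickets for
	// any CONVERGE message remain unknown at this point, as noted above.
	RequestBroadcast(msg UnsignedMessage) ([]LocalSender, error)
}
```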

github-merge-queue bot pushed a commit that referenced this issue Jun 11, 2024
Relax the assumption of receiving own `CONVERGE` messages synchronously (#334)

* Relax the assumption of receiving own `CONVERGE` messages synchronously

The gpbft implementation implicitly assumes that `CONVERGE` messages
broadcast to self are delivered immediately. In practice this assumption
does not hold because of the complexity of deferred signing and async
message delivery.

The changes here relax this assumption by explicitly notifying the local
converge state that the self participant has begun the `CONVERGE` step,
providing the self proposal and its justification. The code then falls
back to this data whenever a lookup in the converge state yields no
results due to asynchronous message delivery. Further, the code ignores
the self converge value once at least one broadcast message is received.

Additionally, the changes remove zero-latency delivery of messages to
self in simulations, to assert more strongly that synchronous delivery
to self is no longer required (neither for `GMessage` nor alarms).

Fixes #316
Reverts #318
Relates to #103 (comment)

* Adjust naming and comments.

---------

Co-authored-by: Alex North <445306+anorth@users.noreply.github.com>
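Condensed to its essence (simplified, hypothetical types, continuing the sketch package above), the mechanism the commit describes looks roughly like this: the converge state records the self proposal when the `CONVERGE` step begins, falls back to it while no broadcast message has arrived, and ignores it as soon as one has.

```go
// Continues the hypothetical f3sketch package above.
package f3sketch

// ConvergeValue is a proposal with its justification, as carried by a
// CONVERGE message.
type ConvergeValue struct {
	Proposal      []byte
	Justification []byte
}

type convergeState struct {
	self     *ConvergeValue  // set when the self participant begins CONVERGE
	received []ConvergeValue // values from broadcast CONVERGE messages
}

// SetSelfValue records the local proposal at the start of the CONVERGE
// step, before any (asynchronously delivered) broadcast message arrives.
func (c *convergeState) SetSelfValue(v ConvergeValue) { c.self = &v }

// Receive records a broadcast CONVERGE message; once one has arrived, the
// self value is no longer consulted.
func (c *convergeState) Receive(v ConvergeValue) {
	c.received = append(c.received, v)
}

// Values returns the received values, falling back to the self value only
// when asynchronous delivery means nothing has been received yet.
func (c *convergeState) Values() []ConvergeValue {
	if len(c.received) > 0 {
		return c.received
	}
	if c.self != nil {
		return []ConvergeValue{*c.self}
	}
	return nil
}
```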
@masih
Member

masih commented Jun 13, 2024

@anorth @Kubuxu In light of #334, is there any remaining work here? If not, can we close this issue?

Kubuxu closed this as completed Jun 13, 2024