Use gossip to co-ordinate worker selection #507

Open
3 tasks
Tracked by #9
mudler opened this issue Aug 16, 2024 · 1 comment
Labels
blocked enhancement New feature or request

Comments

@mudler
Contributor

mudler commented Aug 16, 2024

Is your feature request related to a problem? Please describe.
With PR #504 and masa-finance/roadmap#69, we are introducing a way to select workers directly from the list of nodes. We then dial the selected nodes with libp2p and send the work over.

The problem with this approach is that the work is not orchestrated: if a node fails, the client has to iterate over the list of nodes until it finds one that handles the request successfully.
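
To make the problem concrete, here is a minimal Python sketch of the client-side fallback loop described above. `send_work`, `first_successful`, and the node names are hypothetical stand-ins for "dial the node over libp2p and send the job"; none of them come from the codebase.

```python
from typing import Callable, Iterable, Optional


def first_successful(nodes: Iterable[str],
                     send_work: Callable[[str], bool]) -> Optional[str]:
    """Try each node in list order and return the first that handles the job.

    `send_work` is a hypothetical callable standing in for "dial the node
    and send the work"; it returns True on success.
    """
    for node in nodes:
        try:
            if send_work(node):
                return node
        except ConnectionError:
            # Unreachable node: nothing coordinates the retry, so the
            # client simply moves on to the next entry in the list.
            continue
    return None


# Illustration: only the last node can actually handle the request.
attempts = []


def fake_send(node: str) -> bool:
    attempts.append(node)
    if node == "node-a":
        raise ConnectionError("dial failed")
    return node == "node-c"


chosen = first_successful(["node-a", "node-b", "node-c"], fake_send)
```

Every failed attempt costs a full dial and send, and the client learns nothing in advance about which nodes can take the job.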

Describe the solution you'd like
Before sending the job to workers selected round-robin (RR) from the list, we want a pre-orchestration phase that uses a separate communication channel (a gossip pub/sub channel) to let the nodes synchronize. The protocol should be the following:

  1. A client sends a work request to the pub/sub channel (the state exchange channel).
  2. The client receives answers from the workers and selects one. Only workers that can fulfill the request reply. The client keeps a list of the workers that answered successfully.
  3. The client tries to dial the worker and verifies that it can connect.
  4. The client sends the job.
  5. In case of failure, the client goes back to step 3 and selects another worker (least resources used).
  6. The client acknowledges that the work has been done and broadcasts to the state exchange pub/sub gossip channel that the job was executed successfully or refused.
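
The steps above can be sketched in a few lines of Python. This is only an in-memory illustration under stated assumptions: `StateExchange` stands in for the gossip pub/sub channel, the `reachable` and `fails_job` flags simulate dial and job failures, and all class names, field names, and job kinds (`"web"`, `"other"`) are hypothetical. No real libp2p calls are made.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Offer:
    """A worker's reply on the state exchange channel (hypothetical shape)."""
    worker_id: str
    load: float  # fraction of resources in use; lower is better


@dataclass
class Worker:
    worker_id: str
    capabilities: set
    load: float
    reachable: bool = True   # simulates whether the dial in step 3 succeeds
    fails_job: bool = False  # simulates a failure after the job is sent

    def on_request(self, kind: str) -> Optional[Offer]:
        # Only workers that can fulfill the request reply (step 2).
        if kind in self.capabilities:
            return Offer(self.worker_id, self.load)
        return None


class StateExchange:
    """In-memory stand-in for the gossip pub/sub channel."""

    def __init__(self, workers: List[Worker]):
        self.workers = {w.worker_id: w for w in workers}
        self.log = []  # broadcast job outcomes

    def request(self, kind: str) -> List[Offer]:
        # Step 1: publish the work request and collect the replies.
        offers = [w.on_request(kind) for w in self.workers.values()]
        return [o for o in offers if o is not None]

    def broadcast(self, job_id: str, worker_id: Optional[str], status: str):
        # Step 6: publish the outcome for state exchange.
        self.log.append((job_id, worker_id, status))


def run_job(channel: StateExchange, job_id: str, kind: str) -> Optional[str]:
    offers = channel.request(kind)                      # steps 1 and 2
    for offer in sorted(offers, key=lambda o: o.load):  # least used first
        worker = channel.workers[offer.worker_id]
        if not worker.reachable:                        # step 3: dial check
            continue
        if worker.fails_job:                            # step 4 failed, so step 5
            continue
        channel.broadcast(job_id, worker.worker_id, "done")
        return worker.worker_id
    channel.broadcast(job_id, None, "refused")
    return None


workers = [
    Worker("w1", {"web"}, load=0.1, reachable=False),
    Worker("w2", {"web"}, load=0.3, fails_job=True),
    Worker("w3", {"web"}, load=0.7),
    Worker("w4", {"other"}, load=0.0),
]
channel = StateExchange(workers)
picked = run_job(channel, "job-1", "web")
unserved = run_job(channel, "job-2", "unknown-kind")
```

Sorting the offers by reported load is one possible reading of "least resources used"; the actual selection rule is still open.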

When we have #496 in, we can:
8. have the ledger record that the work has been done
9. have the ledger select the worker in step 2, which automatically gives us a record of the reward for #382

Describe alternatives you've considered

In the long run, we want the ledger to select the workers that can fulfill a user request so that we can control fairness (steps 2 and 6 would be covered by the ledger). This card is scoped to provide only a gossip protocol for orchestration before we jump into a full implementation.

Additional context

Adding the diagram that was built from the brainstorming session:

(image: masa_protocol)

Acceptance criteria

  • We have a state exchange gossip pub/sub between all the nodes
  • The client and the workers follow the protocol defined above with the state exchange pub/sub channel to distribute the work
  • A follow-up card covers incorporating #496 (feat: consensus mechanism to write to the CAS/ledger) and having the ledger select the worker for the client
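
As a starting point for discussing the state exchange channel, here is a rough Python sketch of what the messages on it could look like. The message kinds, field names, and the JSON encoding are all assumptions made for illustration; the issue does not define a wire format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

# Hypothetical message kinds, one per protocol phase.
ALLOWED_KINDS = {"work_request", "worker_offer", "job_result"}


@dataclass
class StateMessage:
    kind: str                     # one of ALLOWED_KINDS
    job_id: str                   # identifies the work request
    sender: str                   # peer ID of the publisher
    status: Optional[str] = None  # for job_result: "done" or "refused"


def encode(msg: StateMessage) -> bytes:
    """Serialize a message for publishing on the pub/sub topic."""
    return json.dumps(asdict(msg), sort_keys=True).encode("utf-8")


def decode(raw: bytes) -> StateMessage:
    """Parse a received payload, rejecting unknown message kinds."""
    data = json.loads(raw.decode("utf-8"))
    if data.get("kind") not in ALLOWED_KINDS:
        raise ValueError(f"unknown message kind: {data.get('kind')!r}")
    return StateMessage(**data)


result = StateMessage("job_result", "job-1", "client-peer", status="done")
round_tripped = decode(encode(result))
```

Rejecting unknown kinds at decode time keeps malformed gossip from reaching the selection logic.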
@mudler
Contributor Author

mudler commented Oct 1, 2024

Blocked by #532
