Crash resistance for excludeMembers #123

Closed
11 tasks done
Powersource opened this issue May 24, 2023 · 3 comments · Fixed by ssbc/ssb-tribes2-demo#12
Labels: must (Needed to complete grant milestone)

Comments

@Powersource
Collaborator

Powersource commented May 24, 2023

For exclusion we post 3 different messages:

  1. First the group/exclude-member message. We don't need to recover from this: if we crash on this step, the user can tell and just has to try again.
  2. Then a group/init to init the new epoch. Hopefully it's enough to look for the message from step 1. But hmm, when should we search for that? If we call excludeMembers again with the exact same args? Should excludeMembers maybe just post exclude-member, with messages 2 and 3 left to listeners?
  3. Lastly all remaining members are re-added with group/add-member messages. The lib/epoch function getMissingMembers is probably very helpful here.
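
To make step 3 concrete, here is a minimal sketch of the "who still needs to be re-added" calculation. The function `missingMembers` and its arguments are hypothetical and written only for illustration; this is not the real `getMissingMembers` from lib/epoch, whose signature may differ.

```javascript
// Hypothetical helper, NOT the real lib/epoch getMissingMembers.
// Returns the members of the old epoch who should be in the new epoch
// (i.e. were not excluded) but have no group/add-member message there yet.
function missingMembers(oldMembers, excluded, addedToNewEpoch) {
  const excludedSet = new Set(excluded)
  const addedSet = new Set(addedToNewEpoch)
  return oldMembers.filter((id) => !excludedSet.has(id) && !addedSet.has(id))
}

// Example: we crashed after re-adding only alice.
const stillToAdd = missingMembers(
  ['alice', 'bob', 'carol', 'oscar'], // members of the old epoch
  ['oscar'], // peers being excluded
  ['alice'] // peers already re-added to the new epoch
)
console.log(stillToAdd) // [ 'bob', 'carol' ]
```

Because the result only depends on what has already been posted, running this again after a crash picks up where the previous attempt stopped.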

Todos:

@Powersource
Collaborator Author

Powersource commented May 24, 2023

Gonna write some more notes to try to figure out exactly how I should go about this.

Goals:

  1. We don't want a dangling exclude-member message in an epoch, that didn't end up having any effect
  2. We don't want a dangling epoch init, in an epoch that didn't end up with any members.
  3. We want everyone to get added to the new epoch except for the excluded peers.
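
One way to think about these goals is as a small state check over what has been posted so far. The sketch below is an assumption-laden model: the `attempt` object and the state names are invented for illustration, and it ignores details such as whether the author of the group/init already counts as a member of the new epoch.

```javascript
// Hypothetical model of one exclusion attempt (all field names are assumptions):
// {
//   excludePosted: boolean,     // group/exclude-member was posted
//   newEpochInited: boolean,    // group/init for the new epoch was posted
//   oldMembers: string[],       // members of the old epoch
//   excluded: string[],         // peers being excluded
//   addedToNewEpoch: string[],  // peers with a group/add-member in the new epoch
// }
function exclusionState(attempt) {
  if (!attempt.excludePosted) return 'not-started'
  if (!attempt.newEpochInited) return 'dangling-exclude' // goal 1 violated

  const excluded = new Set(attempt.excluded)
  const added = new Set(attempt.addedToNewEpoch)
  const remaining = attempt.oldMembers.filter((id) => !excluded.has(id))
  const missing = remaining.filter((id) => !added.has(id))

  if (missing.length === 0) return 'complete'
  if (missing.length === remaining.length) return 'dangling-epoch' // goal 2 violated
  return 'partially-added' // goal 3 violated
}

console.log(
  exclusionState({
    excludePosted: true,
    newEpochInited: true,
    oldMembers: ['alice', 'bob', 'oscar'],
    excluded: ['oscar'],
    addedToNewEpoch: ['alice'],
  })
) // partially-added
```

Each non-complete state maps to the next message a recovery step would need to post, regardless of who made the original attempt.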

Questions:

  1. Do we want to help others with their failed exclusions? Maybe yeah? Since the group/epoch is a common resource and one person crashing might break the group for all of us.
  2. When do we fix a broken state? When calling the function again? If other people should be able to fix it too, then they'll want a listener. Do we want to use that listener for ourselves as well? A listener would only check again on restart, is that fine?

@staltz
Member

staltz commented Aug 21, 2023

> Do we want to help others with their failed exclusions? Maybe yeah? Since the group/epoch is a common resource and one person crashing might break the group for all of us.

Tough question, but I'm also leaning towards anyone helping proceed with the exclusion, simply because that dangling exclude-member msg may be confusing.

> 2. When do we fix a broken state? When calling the function again? If other people should be able to fix it too, then they'll want a listener. Do we want to use that listener for ourselves as well? A listener would only check again on restart, is that fine?

Being eager about it shouldn't be a problem, because of the "same membership" forked epoch resolution. So if admin A tried to exclude Oscar but stopped in between, then admins B and C can proceed to do it, and they will create two forked epochs, but they'll have the same membership set, and then the tie-breaking rule applies.
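
A rough sketch of why two helpers finishing the same exclusion is harmless: if the forked epochs have identical member sets, every peer can pick the same winner deterministically. The epoch shape `{ id, members }` and the tie-break used here (lowest `id` by plain string comparison) are assumptions for illustration only; the actual rule is whatever the group spec defines.

```javascript
// Hypothetical epoch shape: { id: string, members: string[] }
function sameMembers(a, b) {
  const setA = new Set(a.members)
  const setB = new Set(b.members)
  if (setA.size !== setB.size) return false
  for (const id of setA) {
    if (!setB.has(id)) return false
  }
  return true
}

// ASSUMED tie-break: lowest id by string comparison. Returns null when the
// forks do not all share the same membership (a different case, not handled).
function pickPreferredEpoch(forks) {
  if (forks.length === 0) return null
  const first = forks[0]
  if (!forks.every((fork) => sameMembers(first, fork))) return null
  return forks.reduce((best, fork) => (fork.id < best.id ? fork : best))
}

// Admins B and C both finished A's exclusion of oscar.
const forkByB = { id: 'epoch-b', members: ['alice', 'bob', 'carol'] }
const forkByC = { id: 'epoch-c', members: ['carol', 'alice', 'bob'] }
console.log(pickPreferredEpoch([forkByC, forkByB]).id) // epoch-b
```

The result does not depend on the order in which a peer learns about the forks, which is the property that makes eager recovery safe in this model.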

In terms of code, I don't know how to organize it.

@Powersource
Collaborator Author

> Tough question, but I'm also leaning towards anyone helping proceed with the exclusion, simply because that dangling exclude-member msg may be confusing.

Yeah, I think I basically ended up going with being agnostic towards who created the broken state.

> Being eager about it shouldn't be a problem, because of the "same membership" forked epoch resolution.

I think I was about to try the eager solution as well but ended up deciding against it, since most/all the recovery logic uses long-ish timeouts in it, which would make regular function usage way too slow.
