Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Algorithms should not return errors if <⅓ is faulty. #24

Closed
afck opened this issue May 15, 2018 · 4 comments
Closed

Algorithms should not return errors if <⅓ is faulty. #24

afck opened this issue May 15, 2018 · 4 comments

Comments

@afck
Copy link
Collaborator

afck commented May 15, 2018

We should double-check some of our error conditions:

  • The algorithms should never return an error in any circumstance that can be brought about by <⅓ faulty nodes. In any such case, we need to clearly define the behavior (e.g. terminate without output) in a way that all good nodes will do the same. The problem with errors is that they can cause the user (e.g. a higher-level algorithm) to return early from a function, and fail to follow the correct protocol.
  • For things that can only happen if ≥⅓ is faulty, returning an error is probably the right thing: there is no correct protocol anymore, and the user needs to be alerted and needs to be able to decide programmatically what to do.
  • In cases that can only happen due to programming errors, I'd say we should use expect, with a string that argues why this cannot fail. (E.g. parity seems to do this.) Error variants whose instantiation should be unreachable code are confusing to the user.
@afck afck self-assigned this May 30, 2018
@afck
Copy link
Collaborator Author

afck commented May 30, 2018

On the other hand we probably do want some way to communicate faulty nodes to the user (and propagate that information upwards through the hierarchy of algorithms). I still don't think we should use errors for that: it should be fine to use ? everywhere and assume that won't return if <⅓ are faulty. Not sure what the best alternative is. Probably something like DistAlgorithm::get_faulty_node_ids?

@vkomenda
Copy link
Contributor

Or, more informative, DistAlgorithm::get_fault_log?

@afck
Copy link
Collaborator Author

afck commented May 30, 2018

Another case that's not quite clear is whether an error should be returned from handle_message if the algorithm has already terminated. Maybe it would be safer to just ignore the message and return? I think it is legitimate to use the API in a way where e.g. you'd have one "garbage collection" loop, and not check for termination after each handle_message call. What makes things worse is that we call handle_message internally, too, and I'm not even sure that can't happen after termination (especially with the message queue), so I'll remove that kind of error for now.

Similarly for Agreement's InputNotAccepted error. Or maybe that should be split into two cases:

  • If the user supplied more than one input that is basically a programming error, and we should at least return an error there. (Or even panic??)
  • If the epoch is greater than 0, that is a legitimate use-case again, I think, even in the way we're using it in CommonSubset: If we just received the broadcast rather late, and the others have already provided input for the agreement instance, that's totally fine. Our input should simply be ignored.

@afck afck removed their assignment Jun 21, 2018
@afck
Copy link
Collaborator Author

afck commented Oct 10, 2018

I think all of this has been addressed.

@afck afck closed this as completed Oct 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants