Algorithms should not return errors if <⅓ is faulty. #24

afck · 2018-05-15T07:09:52Z

We should double-check some of our error conditions:

The algorithms should never return an error in any circumstance that can be brought about by <⅓ faulty nodes. In any such case, we need to clearly define the behavior (e.g. terminate without output) in a way that all good nodes will do the same. The problem with errors is that they can cause the user (e.g. a higher-level algorithm) to return early from a function, and fail to follow the correct protocol.
For things that can only happen if ≥⅓ is faulty, returning an error is probably the right thing: there is no correct protocol anymore, and the user needs to be alerted and needs to be able to decide programmatically what to do.
In cases that can only happen due to programming errors, I'd say we should use expect, with a string that argues why this cannot fail. (E.g. parity seems to do this.) Error variants whose instantiation should be unreachable code are confusing to the user.

The text was updated successfully, but these errors were encountered:

afck · 2018-05-30T14:02:51Z

On the other hand we probably do want some way to communicate faulty nodes to the user (and propagate that information upwards through the hierarchy of algorithms). I still don't think we should use errors for that: it should be fine to use ? everywhere and assume that won't return if <⅓ are faulty. Not sure what the best alternative is. Probably something like DistAlgorithm::get_faulty_node_ids?

vkomenda · 2018-05-30T14:06:15Z

Or, more informative, DistAlgorithm::get_fault_log?

afck · 2018-05-30T14:17:42Z

Another case that's not quite clear is whether an error should be returned from handle_message if the algorithm has already terminated. Maybe it would be safer to just ignore the message and return? I think it is legitimate to use the API in a way where e.g. you'd have one "garbage collection" loop, and not check for termination after each handle_message call. What makes things worse is that we call handle_message internally, too, and I'm not even sure that can't happen after termination (especially with the message queue), so I'll remove that kind of error for now.

Similarly for Agreement's InputNotAccepted error. Or maybe that should be split into two cases:

If the user supplied more than one input that is basically a programming error, and we should at least return an error there. (Or even panic??)
If the epoch is greater than 0, that is a legitimate use-case again, I think, even in the way we're using it in CommonSubset: If we just received the broadcast rather late, and the others have already provided input for the agreement instance, that's totally fine. Our input should simply be ignored.

afck · 2018-10-10T15:09:44Z

I think all of this has been addressed.

afck self-assigned this May 30, 2018

This was referenced May 30, 2018

Report faulty nodes. #51

Closed

Make Broadcast handle large payloads, change some errors. #52

Merged

afck removed their assignment Jun 21, 2018

afck closed this as completed Oct 10, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Algorithms should not return errors if <⅓ is faulty. #24

Algorithms should not return errors if <⅓ is faulty. #24

afck commented May 15, 2018

afck commented May 30, 2018

vkomenda commented May 30, 2018

afck commented May 30, 2018

afck commented Oct 10, 2018

Algorithms should not return errors if <⅓ is faulty. #24

Algorithms should not return errors if <⅓ is faulty. #24

Comments

afck commented May 15, 2018

afck commented May 30, 2018

vkomenda commented May 30, 2018

afck commented May 30, 2018

afck commented Oct 10, 2018