If an error (not a panic) occurs while an actor is executing, the actor exits with `Err` without reporting to the local barrier manager. Its exit causes the upstream and downstream actors to exit as well, and eventually injecting the next barrier fails. The meta service detects that failure and triggers the fail-over procedure.
However, if the number of in-flight concurrent checkpoints has already reached the maximum (or the maximum is simply set to `1`), no further barrier is injected, so the failure is never detected. The compute nodes hang forever while the heartbeat still looks normal, and the cluster fails to recover.
We may need to explicitly report the error to the local barrier manager so that the meta service detects the failure as soon as possible.
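
To make the proposal concrete, here is a minimal sketch of explicit error reporting. It uses only the Rust standard library, with threads standing in for actors. Every name in it (`ActorEvent`, `spawn_actor`, `collect_barrier`, `run_epoch`) is hypothetical and is not taken from the actual codebase. The only point is that an actor sends a failure event before exiting, so the collector fails fast instead of waiting for a barrier that will never be collected.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Hypothetical event sent from an actor to the local barrier manager.
#[derive(Debug)]
enum ActorEvent {
    /// The actor collected the barrier for the current epoch.
    Collected { actor_id: u32 },
    /// The actor exited with an error (not a panic).
    Failed { actor_id: u32, reason: String },
}

/// Stand-in for an actor body; `fail` simulates an execution error.
fn run_actor(actor_id: u32, fail: bool) -> Result<(), String> {
    if fail {
        Err(format!("simulated error in actor {actor_id}"))
    } else {
        Ok(())
    }
}

/// Spawns an actor that always reports its outcome, including errors,
/// instead of silently exiting with `Err`.
fn spawn_actor(actor_id: u32, fail: bool, tx: Sender<ActorEvent>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        let event = match run_actor(actor_id, fail) {
            Ok(()) => ActorEvent::Collected { actor_id },
            Err(reason) => ActorEvent::Failed { actor_id, reason },
        };
        // The receiver may already be gone if another actor failed first.
        let _ = tx.send(event);
    })
}

/// Hypothetical collector: waits for `expected` actors to collect the
/// barrier and returns an error as soon as any actor reports a failure.
fn collect_barrier(rx: &Receiver<ActorEvent>, expected: usize) -> Result<(), String> {
    let mut collected = 0;
    while collected < expected {
        match rx.recv() {
            Ok(ActorEvent::Collected { .. }) => collected += 1,
            Ok(ActorEvent::Failed { actor_id, reason }) => {
                return Err(format!("actor {actor_id} failed: {reason}"));
            }
            // All senders were dropped before every actor reported.
            Err(_) => return Err("actors exited without reporting".to_string()),
        }
    }
    Ok(())
}

/// Runs one epoch with `actor_count` actors, optionally failing one of them.
fn run_epoch(actor_count: u32, failing_actor: Option<u32>) -> Result<(), String> {
    let (tx, rx) = channel();
    let handles: Vec<_> = (0..actor_count)
        .map(|id| spawn_actor(id, failing_actor == Some(id), tx.clone()))
        .collect();
    drop(tx); // keep only the actors' senders alive
    let result = collect_barrier(&rx, actor_count as usize);
    for handle in handles {
        let _ = handle.join();
    }
    result
}
```

In this sketch, `run_epoch(3, None)` completes normally, while `run_epoch(3, Some(1))` returns an error naming actor 1 right away. A real implementation would forward that error to the meta service rather than returning it to a local caller.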
cc @StrikeW @yezizp2012