Dispatch `reduce` tasks for all unmaterialized entries during start up #441
Our materializer has two sorts of "events" which must be re-attempted when a node quits prematurely, to ensure we don't lose data: tasks and unmaterialized operations.
They seem related but are actually independent of each other: tasks do not necessarily represent arriving operations. Say an operation arrives for the first time and kicks off a `reduce` task, followed by a `dependency` task. The node is shut off before that `dependency` task finishes. When we send that operation again on restart to re-attempt the flow, the `reduce` task quits early, reporting that it already did its work last time. No `dependency` task is dispatched, and we have lost data.

The reverse is also true: in some race conditions tasks are handled too late. The operation got stored successfully, but the node quit before the `reduce` task was created. We've lost data again.

We already solved the first point (tasks), but we also need to account for unmaterialized operations. This was not possible until now, because it wasn't easy to tell from our database whether an operation had been materialized or not. Now we have a `sorted_index` which represents that state, see: #438

On node startup we should check which operations have `sorted_index = None` and then issue `reduce` tasks for them.
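
The startup check could look roughly like the sketch below. It is an illustration only, not the actual implementation: `Operation`, `Task`, `pending_reduce_tasks` and the `document_id` field are hypothetical names, and issuing one `reduce` task per affected document (rather than one per operation) is an assumption made here to avoid duplicate tasks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Operation:
    """Hypothetical stand-in for a stored operation row."""
    operation_id: str
    document_id: str  # assumed grouping key, not taken from the issue
    # None means the operation has not been materialized yet.
    sorted_index: Optional[int] = None


@dataclass(frozen=True)
class Task:
    """Hypothetical task descriptor: a worker name plus its input."""
    worker: str
    document_id: str


def pending_reduce_tasks(operations: list[Operation]) -> list[Task]:
    """Return one reduce task per document with unmaterialized operations."""
    seen: set[str] = set()
    tasks: list[Task] = []
    for op in operations:
        # Skip operations which already carry a sorted_index.
        if op.sorted_index is None and op.document_id not in seen:
            seen.add(op.document_id)
            tasks.append(Task("reduce", op.document_id))
    return tasks


# Example state as it might be found in the database on node startup.
stored = [
    Operation("op1", "doc_a", sorted_index=0),
    Operation("op2", "doc_a"),  # stored, but never materialized
    Operation("op3", "doc_b"),
    Operation("op4", "doc_b"),
    Operation("op5", "doc_c", sorted_index=0),
]
startup_tasks = pending_reduce_tasks(stored)
```

In this example `doc_a` and `doc_b` each get a single `reduce` task, while `doc_c` is skipped because all of its operations already have a `sorted_index`.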