satellite/repair: decouple piece decoding from piece uploading
We have a situation in the repairer where slow uploads cause the error
"internal: quiescence" to be returned from _all_ ongoing uploads,
failing a repair entirely. The quiescence machinery is meant to catch
inactive/stalled _downloads_, but our downloading has already finished
by the time this happens.
It appears that what is happening is this: the piece decoder (the part
that recreates the original segment content) is connected directly to
the piece encoder (the part that encodes new pieces). When the repairer
is doing an upload, it is calling into both of those parts by way of
io.Copy. The quiescence machinery is built into the decoder. When
uploads are especially slow, the piece decoder can't read very far into
the temporary-file piece contents: the buffer between the decoder and the
encoder is limited, so backpressure keeps it from getting ahead. This causes the
decoder to complain that it hasn't read anything for 5 seconds, the
threshold for the quiescence error.
It looks like a bit of a complicated job to make an option that disables
quiescence checking in uplink. Instead, we will decouple the decoder and
the encoder in the repairer. Writing the whole segment contents to a
local tempfile through the decoder should make sure that the decoder
doesn't ever observe a 5 second stall (barring some crazy disk problem).
Then the encoder and the uploader can continue from the reconstructed
segment, and they can take as long as they need to take.
Once it is possible to suppress or avoid the quiescence error in
eestream.decodedReader, we can remove this tempfile step.
Change-Id: I71c68b3460fc4129320364a0514f893e1c47876e