-
Notifications
You must be signed in to change notification settings - Fork 242
Description
So today after delivering PR #600, I moved onto tests with the new Perf CLI branch that confirms both nodes, and with logging set to info.
Very interestingly, I found sawtooth graphs with 30second delays where the throughput stalled (from hundreds of TPS to zero).
This is the error I managed to capture - noting I could only reproduce the error with info logging (with debug long runs didn't hit it - even though I could still achieve similar TPS) :
[2022-03-15T03:43:43.151Z] ERROR node_1: Failed to parse payload referred in batch ID '7ec13697-ef79-48a4-9be2-94a687f35c5a' from transaction '000000000577/000000/000000': context deadline exceeded (Client.Timeout or context cancellation while reading body) role=event-manager
IPFS is well known for handling error with delay+timeout, so we need to handle that behavior.
We cannot afford the whole listener to blocks for this kind of error. So I think we need a set of download workers that cause rewinds for IPFS, a lot more like we have for DX events.
This does need some investigation to plan the right fix.