DLPX-87575 agent panic: request has timed out (openzfs#1160)
When restarting the agent, it uses ListObjects operations to find any objects that are part of the in-progress TXG. If there are a large number of these objects, this may take a while, resulting in the following agent panic:

```
thread 'zoa' panicked at 'called `Result::unwrap()` on an `Err` value: request has timed out

Caused by:
    operation attempt timeout (single attempt) occurred after 2s', zettaobject/src/object_access/mod.rs:491:42
stack backtrace:
...
zettaobject::object_access::ObjectAccess::try_list_after::{{closure}}
    at zfs/cmd/zfs_object_agent/zettaobject/src/object_access/mod.rs:491:35
...
8: zettaobject::pool::recover_list::{{closure}}::{{closure}}::{{closure}}
    at zfs/cmd/zfs_object_agent/zettaobject/src/pool.rs:1928:17
```

The problem is that we configure the “retrying” client of the AWS SDK to have a 2 second timeout. (This “retrying” client is only used for ListObjects requests.) After this timeout expires, the SDK returns an error. However, in this case the request may reasonably take longer, and we want to keep waiting.

The fix is to not configure a timeout on the “retrying” client.
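
The behavior difference can be illustrated with a minimal, standard-library-only sketch. This is not the agent's code and does not use the AWS SDK: `slow_list` and `call_with_timeout` are hypothetical names invented for this illustration, and the worker thread merely stands in for a slow ListObjects request. It shows that a caller with a per-attempt limit gets an error when the operation is slow, while a caller with no limit waits and receives the result.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a slow ListObjects call (not the agent's code).
fn slow_list(delay: Duration) -> Vec<String> {
    thread::sleep(delay);
    vec!["obj-1".to_string(), "obj-2".to_string()]
}

/// Runs `op` on a worker thread.
/// With `Some(limit)` the caller gives up after `limit` and returns an error;
/// with `None` the caller waits until the operation finishes.
fn call_with_timeout<T, F>(op: F, timeout: Option<Duration>) -> Result<T, String>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already have given up, so a failed send is ignored.
        let _ = tx.send(op());
    });
    match timeout {
        Some(limit) => rx.recv_timeout(limit).map_err(|e| match e {
            RecvTimeoutError::Timeout => {
                format!("request has timed out after {:?}", limit)
            }
            RecvTimeoutError::Disconnected => "worker exited without a result".to_string(),
        }),
        // No limit configured: block until the worker sends its result.
        None => rx.recv().map_err(|e| e.to_string()),
    }
}
```

Calling `call_with_timeout(move || slow_list(delay), Some(limit))` with `limit` shorter than `delay` yields an `Err`, and calling `unwrap()` on that value would panic much like the trace above. Passing `None` instead returns the listing once it completes, which mirrors the intent of leaving the timeout unconfigured.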