Illumos #3145, #3212 #1160


behlendorf
illumos-gate/commit/9253d63df408bb48584e0b1abfcc24ef2472382e
Illumos changeset: 13840:97fd5cdf328a

3145 single-copy arc
3212 ztest: race condition between vdev_online() and spa_vdev_remove()

Reviewed by: Matt Ahrens <matthew.ahrens@delphix.com>
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Eric Schrock <eric.schrock@delphix.com>
Reviewed by: Justin T. Gibbs <gibbs@scsiguy.com>
Approved by: Eric Schrock <eric.schrock@delphix.com>

Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #989
Issue #1137

@behlendorf

Merged as part of the feature flags branch for #3212

1eb5bfa Illumos #3145, #3212

@behlendorf behlendorf closed this Jan 8, 2013
pcd1193182 pushed a commit to pcd1193182/zfs that referenced this pull request Sep 26, 2023
When the agent restarts, it uses ListObjects operations to find any objects
that are part of the in-progress TXG. If there are many of these objects, the
listing may take a while, resulting in the following agent panic:

```
thread 'zoa' panicked at 'called `Result::unwrap()` on an `Err` value: request has timed out

Caused by:
    operation attempt timeout (single attempt) occurred after 2s', zettaobject/src/object_access/mod.rs:491:42
stack backtrace:
...
      zettaobject::object_access::ObjectAccess::try_list_after::{{closure}}
             at zfs/cmd/zfs_object_agent/zettaobject/src/object_access/mod.rs:491:35
...
   8: zettaobject::pool::recover_list::{{closure}}::{{closure}}::{{closure}}
             at zfs/cmd/zfs_object_agent/zettaobject/src/pool.rs:1928:17
```

The problem is that we configure the “retrying” client of the AWS SDK with a
2-second timeout. (This “retrying” client is only used for ListObjects
requests.) After this timeout expires, the SDK returns an error. However, in
this case the request may reasonably take longer, and we want to keep waiting.

The fix is to not configure a timeout on the “retrying” client.
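
The effect of dropping the timeout can be illustrated with a minimal, std-only sketch. It does not use the AWS SDK or the agent's actual code: `run_attempt`, `AttemptError`, and `slow_list` are hypothetical names introduced here, and a helper thread stands in for the in-flight request. With `Some(limit)` the caller gives up once the limit passes, and with `None` it waits until the operation finishes.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Why a single attempt produced no result (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub enum AttemptError {
    /// The per-attempt deadline passed before the operation finished.
    TimedOut,
    /// The worker ended (for example, panicked) without sending a result.
    Failed,
}

/// Runs `op` on a helper thread. With `Some(limit)` the caller stops waiting
/// after `limit`; with `None` it blocks until `op` completes.
pub fn run_attempt<T, F>(limit: Option<Duration>, op: F) -> Result<T, AttemptError>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // If the caller already timed out, the receiver is gone; ignore that.
        let _ = tx.send(op());
    });
    match limit {
        Some(d) => rx.recv_timeout(d).map_err(|e| match e {
            RecvTimeoutError::Timeout => AttemptError::TimedOut,
            RecvTimeoutError::Disconnected => AttemptError::Failed,
        }),
        None => rx.recv().map_err(|_| AttemptError::Failed),
    }
}

/// Hypothetical stand-in for a listing call that takes `delay` to complete.
pub fn slow_list(delay: Duration) -> Vec<String> {
    thread::sleep(delay);
    vec!["obj-1".to_string(), "obj-2".to_string()]
}
```

Calling `run_attempt(Some(limit), ...)` with a limit shorter than the listing's delay yields `Err(AttemptError::TimedOut)`, which mirrors the error that was being unwrapped in the panic above. Passing `None` lets the same call run to completion.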