Add upper bound for slop space calculation #11023
Conversation
module/zfs/spa_misc.c
```diff
-	return (MAX(space >> spa_slop_shift, MIN(space >> 1, spa_min_slop)));
+	return (MIN(spa_max_slop,
+	    MAX(space >> spa_slop_shift,
+	    MIN(space >> 1, spa_min_slop))));
```
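To make the new clamp concrete, here's a small userland sketch (not the kernel code itself; it assumes the upstream defaults spa_slop_shift = 5 and spa_min_slop = 128M, plus the proposed 128G cap) showing what the expression evaluates to at a few pool sizes:

```c
/* Standalone model of the clamped slop calculation above. */
#include <stdio.h>
#include <stdint.h>

#define MIN(a, b)	((a) < (b) ? (a) : (b))
#define MAX(a, b)	((a) > (b) ? (a) : (b))

static const int spa_slop_shift = 5;			/* reserve 1/32 ~= 3.2% */
static const uint64_t spa_min_slop = 128ULL << 20;	/* 128 MiB floor */
static const uint64_t spa_max_slop = 128ULL << 30;	/* proposed 128 GiB cap */

static uint64_t
slop_space(uint64_t space)
{
	return (MIN(spa_max_slop,
	    MAX(space >> spa_slop_shift,
	    MIN(space >> 1, spa_min_slop))));
}

int
main(void)
{
	/* 256 GiB, 8 TiB, and 1 PiB pools. */
	const uint64_t sizes[] = { 256ULL << 30, 8ULL << 40, 1ULL << 50 };

	for (int i = 0; i < 3; i++) {
		printf("pool %7llu GiB -> slop %3llu GiB\n",
		    (unsigned long long)(sizes[i] >> 30),
		    (unsigned long long)(slop_space(sizes[i]) >> 30));
	}
	return (0);
}
```

Without the cap, the 1 PiB pool would reserve 32 TiB of slop; with it, both large pools settle at 128 GiB, while the 256 GiB pool is unaffected.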
It looks like the comment above this function needs to be updated too.
Yeah, there are a few comments that'll need to be updated. I haven't worried about that yet, until we all agree this is something we can actually move forward with.
I think we can and should. 128GiB feels plenty conservative.
@ahrens do you agree with @behlendorf?
I was going to try to see how I might gather some data to verify whether this is a reasonable upper bound, but I'm not really sure how to do that, and I didn't make much progress toward that goal during the hackathon yesterday.
This would be a welcome change, especially for those of us with drastically different pool sizes. It seems like a small change; could we get this added soon?
@prakashsurya, if you get a moment, would you mind rebasing this on the latest code from the master branch?
@behlendorf done. FWIW, we've determined this change isn't strictly necessary for our product, so it's not high on my priority list anymore. Further, I was a little unsure how to guarantee the correctness of this change for all the different pool configurations and workloads that are possible, which is why I haven't tried to push it forward since initially opening the PR. If we all collectively agree this is a safe change to make as-is, I can try to push it forward; but if we think it'll take a decent amount of effort and/or a decent amount of changes to make it safe to land, I doubt I'll have the time to spend on it.
@prakashsurya thanks! Agreed, I think the open question is: can we determine the worst-case upper bound for a pool? Or, more specifically, how much space must we always leave free in order to accommodate any additional unaccounted-for changes which may be required when syncing a transaction group. This includes MOS updates, ZIL metaslabs, administrative operations, etc.

Let me try to make a convincing argument. While I don't think we can calculate an exact value, I would suggest a 256G upper bound. This would effectively mean that for any pool smaller than 8TB there wouldn't be any change to the slop space (8TB * 3.2% = 256G). That seems reasonable to me, though I could be talked into picking an even more conservative value (512G or even 1TB).

The other key concern would be the need to leave enough headroom to avoid pathologically bad allocation times when there isn't much free space. However, all of the recent allocator work has really improved things in this area, so it's perhaps somewhat less of a concern. And frankly, 256G of free space is a considerable amount of space by most measures!

@ahrens what do you think? Are there other concerns? Have I overlooked something important? My particular interest in this is for very large pools, >1PB, where we wind up unnecessarily reserving multiple terabytes of capacity.
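As a quick sanity check of that crossover arithmetic, here's a small standalone sketch (illustrative only, not part of the patch) that computes the pool size at which each candidate cap would start to bind; the cap only changes anything once space >> spa_slop_shift exceeds it, i.e. above cap << spa_slop_shift:

```c
/* Crossover pool sizes for several candidate caps. */
#include <stdio.h>
#include <stdint.h>

int
main(void)
{
	const int spa_slop_shift = 5;		/* 1/32 ~= 3.2% */
	const uint64_t caps[] = { 128ULL << 30, 256ULL << 30,
	    512ULL << 30, 1024ULL << 30 };	/* 128G .. 1T */

	for (int i = 0; i < 4; i++) {
		/* Pools smaller than this see no change from the cap. */
		const uint64_t crossover = caps[i] << spa_slop_shift;
		printf("cap %4llu GiB -> unchanged below %2llu TiB\n",
		    (unsigned long long)(caps[i] >> 30),
		    (unsigned long long)(crossover >> 40));
	}
	return (0);
}
```

That gives 4TB for a 128G cap, 8TB for 256G, 16TB for 512G, and 32TB for a 1TB cap.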
I agree. It's really the MOS updates that we need to worry about. The ZIL metaslabs are a known quantity, and "administrative operations" basically means changes to the MOS.
I don't see how the amount of dirty data is related to the amount of MOS updates, which kind of invalidates this line of reasoning. Let me try a few other lines of reasoning that end with a similar conclusion:
My takeaway is that it's hard to imagine ever needing more than hundreds of GB of slop space.
A tunable for those of us with small systems would be amazing as well. I have 128GB of RAM on my 150TB (raw) server, so there's no need to reserve even the maximum value here. If only it could be tuned per pool: I have a 256GB root pool plus 150TB and 48TB storage pools.
All good points! It sounds like we're in agreement that this change, as it stands, is safe and very unlikely to cause any problems.
It's not; I didn't quite convey what I meant very well. Let me try again briefly. Back in commit 3ec3bc2 we relaxed the
This won't change the behavior at all for small pools. We still do want to reserve ~3.2% (1/2^5) of the total pool capacity; for your 256G root pool, that's going to be 8G. That seems pretty reasonable to me.
Yeah, I guess I was thinking that if the max limit is set for 1TB of RAM and 1PB of storage, it could still safely be a tenth of that limit on a smaller pool.
@behlendorf I see. In practice, thanks to
@prakashsurya @ahrens I've marked this PR ready for review and approved it. Based on the discussion above, I suggest we go ahead and merge this change as-is.
@behlendorf OK, let me go back and fix up some of the comments that I haven't gotten to yet.
@behlendorf I've updated some of the comments, and rebased onto master. If you and @ahrens are good with this, I'm good with landing it. Thanks for all the help.
Looks good to me, thanks!
module/zfs/spa_misc.c
```diff
+	/*
+	 * Additionally, slop space should never exceed spa_max_slop.
+	 */
+	slop = MIN(slop, spa_max_slop);
```
To be extra precise, on medium to large pools, I think that the space that is unavailable today is the sum of:
- the 4MB labels/boot region per vdev (also not included in the zpool SIZE property)
- the metaslab rounding (up to 16GB on large vdevs) per vdev (also not included in the zpool SIZE property)
- 1/32nd of the remainder (which is the sum of the embedded log metaslab and the return value of spa_get_slop_space()) (included in zpool SIZE but not zfs USED+AVAIL)
Considering the interaction with the embedded log class, I think that under the proposed change, for large pools (vdev size >2TB, total pool size >4TB), the space that's unavailable will be the sum of:
- the 4MB labels/boot region per vdev (also not included in the zpool SIZE property)
- the metaslab rounding (up to 16GB per vdev) (also not included in the zpool SIZE property)
- the embedded log metaslab (16GB per vdev) (included in zpool SIZE but not zfs USED+AVAIL)
- 128GB (included in zpool SIZE but not zfs USED+AVAIL)
The embedded log metaslab change kept the user-visible slop space (the difference between zpool SIZE and zfs USED+AVAIL) the same. The proposed change exposes this difference, so the user-visible slop space will depend on the number of vdevs and the value of zfs_embedded_slog_min_ms.
We might want to keep the user-visible slop space constant regardless of the embedded log (on most configurations). For example we could set the default spa_max_slop to 256GB, but have spa_get_slop_space() return MAX(128GB, 256GB - embedded_log_space). That way with up to 8 top-level vdevs, the spa_max_slop exactly equals the user-visible slop space. To implement this, we would just move the new code before the embedded_log adjustment.
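A rough sketch of that idea (illustrative only, not the committed code; embedded_log_space is a hypothetical parameter standing in for the space consumed by the embedded log class, and the constants mirror the values proposed above):

```c
#include <stdio.h>
#include <stdint.h>

#define MIN(a, b)	((a) < (b) ? (a) : (b))
#define MAX(a, b)	((a) > (b) ? (a) : (b))

static const int spa_slop_shift = 5;			/* 1/32 ~= 3.2% */
static const uint64_t spa_min_slop = 128ULL << 20;	/* 128 MiB floor */
static const uint64_t spa_max_slop = 256ULL << 30;	/* proposed 256 GiB */

static uint64_t
slop_space_proposed(uint64_t space, uint64_t embedded_log_space)
{
	uint64_t slop = MAX(space >> spa_slop_shift,
	    MIN(space >> 1, spa_min_slop));

	/* Apply the cap before the embedded log adjustment. */
	slop = MIN(slop, spa_max_slop);

	/*
	 * On pools large enough to hit the cap, deduct the embedded
	 * log space but never return less than 128 GiB, so the
	 * embedded log plus the returned slop sums to spa_max_slop
	 * for up to 8 top-level vdevs (8 x 16 GiB).
	 */
	if (slop == spa_max_slop)
		slop = MAX(128ULL << 30, spa_max_slop - embedded_log_space);
	return (slop);
}

int
main(void)
{
	/* A 1 PiB pool with 4 top-level vdevs (4 x 16 GiB embedded log). */
	uint64_t slop = slop_space_proposed(1ULL << 50, 4 * (16ULL << 30));

	printf("returned slop: %llu GiB\n", (unsigned long long)(slop >> 30));
	return (0);
}
```

For pools that hit the cap, the embedded log space plus the returned value always sums to 256 GiB as long as the embedded log stays at or under 128 GiB, which matches the "user-visible slop space stays constant" goal above.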
This change modifies the behavior of how we determine how much slop space to use in the pool, such that now it has an upper limit. The default upper limit is 128G, but is configurable via a tunable.

Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
@ahrens let me know if my latest revision is what you had in mind.
@prakashsurya Looks great to me.
This change modifies the behavior of how we determine how much slop space to use in the pool, such that now it has an upper limit. The default upper limit is 128G, but is configurable via a tunable.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
Closes openzfs#11023
Is there any plan to add this minor patch to the 2.0 or 2.1 branches? I have a 192TB, 120TB, and a 1TB pool on my main system. With the default
This change modifies the behavior of how we determine how much slop space to use in the pool, such that now it has an upper limit. The default upper limit is 128G, but is configurable via a tunable. (Backporting note: Snipped out the embedded_log portion of the changes.)

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
Closes openzfs#11023