Prevent region allocation from filling pools #7912
Recent customer issues have highlighted problems with storage accounting: while there are quotas and reservations for individual Crucible regions, there's nothing set for the whole Crucible dataset. Crucible _could_ end up using the whole disk, or some large fraction of it, such that other users of the same U.2 could be starved out.

This commit adds a buffer to each zpool that the Crucible region allocation query will not allocate into. This overhead will be set to 250G initially (see oxidecomputer#7875 for reasoning) but can also be modified with omdb.

Part of this commit's changes include using a CTE in `regions_hard_delete`, which is much more efficient than the previous for loop but has the effect of overwriting `size_used` for all datasets. This undoes any manual change to that column made to prevent allocation into particular datasets / pools. Because of this, this commit also adds a `no_provision` flag for a Crucible dataset: if it is set, the region allocation query will not allocate into that dataset. This flag can be toggled with omdb.

Part of the upgrade to R14 will include a support procedure for the case where adding the 250G control plane storage buffer causes a Crucible dataset to be "overprovisioned". Such a dataset needs manually requested region replacements to reduce the size allocated to it. This commit adds an omdb command to show all overprovisioned Crucible datasets, and changes the region listing command so it can list regions for a particular dataset.

Fixes oxidecomputer#3480
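To make the allocation rule concrete, here is a minimal Rust sketch of the per-dataset eligibility check described above. It is not the actual region allocation query (which is SQL issued by Nexus); the `CrucibleDataset` struct, the `can_allocate` function, and the `STORAGE_BUFFER_BYTES` constant are hypothetical names, with fields mirroring the omdb column names shown later in this thread. It assumes "250G" means 250 GiB, which matches the 268435456000-byte buffer value in that omdb output.

```rust
/// Hypothetical, simplified view of a Crucible dataset and its zpool.
/// Field names mirror the omdb columns; these are not the real Nexus types.
#[derive(Debug, Clone)]
struct CrucibleDataset {
    size_used: i64,
    no_provision: bool,
    control_plane_storage_buffer: i64,
    pool_total_size: i64,
}

/// Assumed initial buffer: 250 GiB expressed in bytes.
const STORAGE_BUFFER_BYTES: i64 = 250 * 1024 * 1024 * 1024;

/// Returns true if a region of `region_size` bytes could be placed in
/// `dataset` without dipping into the control plane storage buffer.
fn can_allocate(dataset: &CrucibleDataset, region_size: i64) -> bool {
    // Datasets flagged `no_provision` are skipped regardless of free space.
    if dataset.no_provision {
        return false;
    }
    // Only the pool capacity outside the buffer is available to Crucible.
    let allocatable = dataset.pool_total_size - dataset.control_plane_storage_buffer;
    dataset.size_used + region_size <= allocatable
}

fn main() {
    let dataset = CrucibleDataset {
        size_used: 132_875_550_720,
        no_provision: false,
        control_plane_storage_buffer: STORAGE_BUFFER_BYTES,
        pool_total_size: 3_195_455_668_224,
    };
    let region_size: i64 = 1 << 30; // a 1 GiB region
    println!("fits: {}", can_allocate(&dataset, region_size));

    // Same dataset, but excluded from provisioning.
    let flagged = CrucibleDataset { no_provision: true, ..dataset.clone() };
    println!("fits when no_provision: {}", can_allocate(&flagged, region_size));
}
```

Keeping `no_provision` as a separate flag, rather than overloading `size_used`, is what lets the exclusion survive the `regions_hard_delete` CTE recomputing `size_used` for every dataset.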
This is using the existing zpool size numbers, and then taking 250G off of that, right?
If we had manually set the
Yes, and yes :)
Correct
This will prevent Crucible from using up space (which is a good thing), and I know about the urgency for R14 too. However, this won't prevent some other service from using up all the space in the pool, even if we keep Crucible in check.
This is true, but we planned to punt on quotas and reservations because assigning them can fail on deployed racks, and we need a fall-back plan. The "failure mode" introduced in this PR is "we don't overprovision further", which is a good thing and won't need support staff to remediate.
But anyway, yes, we definitely do want quotas / reservations to help keep this issue from crossing abstraction boundaries.
I just wanted to be sure we don't forget to do the additional work, and that the casual reader does not think all the problems are now solved. Things will be better with this, and will continue getting better :)
smklein left a comment
Looks good, nice job getting the non-provisionable + buffer merged together. And thanks for the tests!
```rust
    Ok(())
}

async fn cmd_crucible_dataset_show_overprovisioned(
```
This is an omdb command, so my bar for testing there is lower than it would be otherwise, but have you tested this API? (Even manually)
(I'm bringing this scrutiny because this command seems really useful, actually, and frankly like something we might want to pull into Nexus in the future)
I tested it by filling up all the space I could, then increasing the storage_buffer:
```
EVT22200005 # omdb-7912 db crucible-dataset show-overprovisioned 2> /dev/null
ID                                   |SIZE_USED     |NO_PROVISION |POOL_ID                              |CONTROL_PLANE_STORAGE_BUFFER |POOL_TOTAL_SIZE
-------------------------------------+--------------+-------------+-------------------------------------+-----------------------------+----------------
12e4105b-3dde-40ff-9f12-5ce5d57f8b4f |2925946470400 |false        |02d72adc-f403-4eef-bede-3a2a860a22a3 |375809638400                 |3195455668224
```
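The row above is flagged because its used size plus its raised 350 GiB buffer (375809638400 bytes) exceeds the pool's total size. Below is a minimal Rust sketch of that test, assuming a dataset counts as overprovisioned when `SIZE_USED + CONTROL_PLANE_STORAGE_BUFFER > POOL_TOTAL_SIZE`. That definition is an assumption consistent with this output, not taken from the omdb source, and `DatasetRow`, `is_overprovisioned`, and `overprovisioned_by` are hypothetical names.

```rust
/// Hypothetical row type mirroring the show-overprovisioned columns.
struct DatasetRow {
    id: &'static str,
    size_used: i64,
    control_plane_storage_buffer: i64,
    pool_total_size: i64,
}

/// Assumed definition: allocations plus the reserved buffer exceed the pool.
fn is_overprovisioned(row: &DatasetRow) -> bool {
    row.size_used + row.control_plane_storage_buffer > row.pool_total_size
}

/// Bytes that would have to be moved off this dataset (for example via
/// region replacement) before it stops being overprovisioned; 0 if it fits.
fn overprovisioned_by(row: &DatasetRow) -> i64 {
    (row.size_used + row.control_plane_storage_buffer - row.pool_total_size).max(0)
}

fn main() {
    // Values copied from the omdb output in this thread.
    let rows = [
        DatasetRow {
            id: "12e4105b-3dde-40ff-9f12-5ce5d57f8b4f",
            size_used: 2_925_946_470_400,
            control_plane_storage_buffer: 375_809_638_400,
            pool_total_size: 3_195_455_668_224,
        },
        DatasetRow {
            id: "1e9c36ea-9b9e-4762-b729-01124dc3d56c",
            size_used: 71_135_395_840,
            control_plane_storage_buffer: 268_435_456_000,
            pool_total_size: 3_195_455_668_224,
        },
    ];
    // Only the first row should be reported.
    for row in rows.iter().filter(|r| is_overprovisioned(r)) {
        let excess = overprovisioned_by(row);
        println!("{} over by {} bytes ({} GiB)", row.id, excess, excess / (1 << 30));
    }
}
```

For the row shown above, the sketch computes an excess of 106300440576 bytes, which is exactly 99 GiB.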
leftwo left a comment
Still looking, but I wanted these comments posted now.
Could we have an
leftwo left a comment
Just a few questions and comment comments
Done in 5de0a56
leftwo left a comment
Here is the omdb output with usage:
```
EVT22200005 # omdb-7912 db crucible-dataset list 2> /dev/null
ID                                   |TIME_DELETED |POOL_ID                              |ADDRESS                        |SIZE_USED    |NO_PROVISION |CONTROL_PLANE_STORAGE_BUFFER |POOL_TOTAL_SIZE |SIZE_LEFT
-------------------------------------+-------------+-------------------------------------+-------------------------------+-------------+-------------+-----------------------------+----------------+--------------
12e4105b-3dde-40ff-9f12-5ce5d57f8b4f |             |02d72adc-f403-4eef-bede-3a2a860a22a3 |[fd00:1122:3344:101::12]:32345 |132875550720 |false        |375809638400                 |3195455668224   |2686770479104
1e9c36ea-9b9e-4762-b729-01124dc3d56c |             |83cb6813-d89a-4996-bae8-43609c63b1dd |[fd00:1122:3344:101::13]:32345 |71135395840  |false        |268435456000                 |3195455668224   |2855884816384
238884a7-d934-4325-8d38-d6f5be067b3e |             |fd1796f2-f671-4d60-99dc-e1194024bebb |[fd00:1122:3344:101::18]:32345 |230854492160 |false        |268435456000                 |3195455668224   |2696165720064
5b1defd7-249c-452b-9d4c-f3464195c659 |             |571bc087-1c0a-4378-ab37-7a42633b05ae |[fd00:1122:3344:101::16]:32345 |175825223680 |false        |268435456000                 |3195455668224   |2751194988544
8cf5a521-a139-4596-bd3f-079c544381f1 |             |0fd4c7d7-2547-4769-a55f-54c1284f24e4 |[fd00:1122:3344:101::15]:32345 |71135395840  |false        |268435456000                 |3195455668224   |2855884816384
9d233755-b283-4b00-9f6c-1ee6492ad4d7 |             |0c4db663-9601-45f3-b6b3-bde209e8f7d7 |[fd00:1122:3344:101::17]:32345 |147639500800 |false        |268435456000                 |3195455668224   |2779380711424
daed3ed0-74ae-4acc-94b5-b8e4d26a97f9 |             |fb3ea440-ff27-43c4-bf37-a3f1d60ae265 |[fd00:1122:3344:101::14]:32345 |49660559360  |false        |268435456000                 |3195455668224   |2877359652864
```
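In this listing, SIZE_LEFT equals POOL_TOTAL_SIZE - CONTROL_PLANE_STORAGE_BUFFER - SIZE_USED for every row; for the first row, 3195455668224 - 375809638400 - 132875550720 = 2686770479104. The small Rust sketch below reproduces that derivation with a hypothetical `size_left` helper. It uses signed arithmetic on the assumption that an overprovisioned dataset should report a negative value rather than wrap around; whether omdb itself does this is not confirmed here.

```rust
/// Hypothetical helper: bytes still available for Crucible regions on a
/// dataset, i.e. pool size minus the reserved buffer minus what is used.
/// Signed so that an overprovisioned dataset yields a negative value.
fn size_left(pool_total_size: i64, control_plane_storage_buffer: i64, size_used: i64) -> i64 {
    pool_total_size - control_plane_storage_buffer - size_used
}

fn main() {
    // POOL_TOTAL_SIZE is identical for every row of the listing above.
    const POOL_TOTAL_SIZE: i64 = 3_195_455_668_224;
    // (SIZE_USED, CONTROL_PLANE_STORAGE_BUFFER, SIZE_LEFT) from the listing.
    let rows: [(i64, i64, i64); 3] = [
        (132_875_550_720, 375_809_638_400, 2_686_770_479_104),
        (71_135_395_840, 268_435_456_000, 2_855_884_816_384),
        (230_854_492_160, 268_435_456_000, 2_696_165_720_064),
    ];
    for (used, buffer, expected) in rows {
        let computed = size_left(POOL_TOTAL_SIZE, buffer, used);
        assert_eq!(computed, expected);
        println!("computed {} bytes left (omdb: {})", computed, expected);
    }
}
```

Applied to the overprovisioned dataset from show-overprovisioned, the same formula gives -106300440576, the negative of the 99 GiB excess.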