
Conversation

@jmpesp
Contributor

@jmpesp jmpesp commented Apr 10, 2025

Recently, regions_hard_delete was changed to use a CTE to update the size_used column of the crucible_dataset table. Unfortunately, this had the side effect of causing a massive amount of contention: the CTE would

  1. delete rows from the regions table
  2. read from the regions table during the CTE
  3. update the size_used column for all crucible_dataset rows

This almost certainly caused concurrent invocations of the CTE to contend with one another, as seen when deleting disks in parallel.

This commit changes the CTE to:

  1. delete rows from the regions table, returning the affected datasets
  2. read from the regions table during the CTE
  3. update the size_used column for the affected crucible_dataset rows only

This was tested by using Terraform to create and tear down 90 disks with parallelism settings of 10, 20, and 30. Before this change, that workload would not complete, as Nexus would inevitably return 500s.

Fixes #7952
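The three steps above can be sketched with a hypothetical, std-only in-memory model (this is not the actual Nexus/Diesel code; Region, hard_delete_regions, and the maps are illustrative stand-ins): deleting regions returns the set of affected dataset ids, and only those datasets get size_used recomputed, so unrelated crucible_dataset rows are never touched.

```rust
use std::collections::{HashMap, HashSet};

// Illustrative stand-in for a row in the regions table.
struct Region {
    dataset_id: u32,
    size: u64,
}

// Simplified model of the new CTE's behavior:
//   1. delete regions, *returning* the affected dataset ids
//   2. re-read the remaining regions
//   3. update size_used for the affected datasets only
fn hard_delete_regions(
    regions: &mut Vec<Region>,
    size_used: &mut HashMap<u32, u64>,
    to_delete: &HashSet<usize>, // indices of regions to delete
) -> HashSet<u32> {
    let mut affected = HashSet::new();
    let mut kept = Vec::new();
    for (idx, r) in regions.drain(..).enumerate() {
        if to_delete.contains(&idx) {
            affected.insert(r.dataset_id);
        } else {
            kept.push(r);
        }
    }
    *regions = kept;
    // Only the datasets that actually lost a region are recomputed;
    // every other entry in size_used is left alone.
    for id in &affected {
        let total: u64 = regions
            .iter()
            .filter(|r| r.dataset_id == *id)
            .map(|r| r.size)
            .sum();
        size_used.insert(*id, total);
    }
    affected
}
```

Touching only the returned dataset ids is what removes the cross-invocation contention: two parallel deletes over disjoint datasets no longer write to the same rows.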

@jmpesp jmpesp requested a review from smklein April 10, 2025 22:07
    .execute_async(&conn)
    .await?;

let query =
    regions_hard_delete::dataset_update_query(dataset_ids);
Collaborator

Yup, makes a lot of sense to have this be much more scoped!

use uuid::Uuid;

/// Update the affected Crucible dataset rows after hard-deleting regions
pub fn dataset_update_query(
Collaborator

This could probably do with an EXPECTORATE + EXPLAIN test, to make it easier for future changes.

Contributor Author

@jmpesp jmpesp Apr 11, 2025

👍 done in 5522603
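The golden-file idea behind an EXPECTORATE + EXPLAIN test can be sketched with a std-only helper (assert_matches_golden is a hypothetical stand-in, not the real expectorate crate API): the generated SQL, or its EXPLAIN output, is compared against a checked-in copy, so any future change to the query builder surfaces as a reviewable diff rather than a silent behavior change.

```rust
use std::fs;

// Hypothetical golden-file comparison helper. A real expectorate-style
// test would also support regenerating the golden file on demand; this
// sketch only shows the comparison half.
fn assert_matches_golden(generated: &str, golden_path: &str) {
    let expected = fs::read_to_string(golden_path).unwrap_or_default();
    assert_eq!(
        generated.trim(),
        expected.trim(),
        "generated output differs from golden file {golden_path}"
    );
}
```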

Comment on lines 42 to 49
for (idx, dataset_id) in dataset_ids.into_iter().enumerate() {
    if idx != 0 {
        builder.sql(",");
    }
    builder.param().bind::<sql_types::Uuid, _>(dataset_id);
}

builder.sql(
Collaborator

Would it be possible to use:

builder.param().bind::<diesel::pg::sql_types::Array<sql_types::Uuid>, _>(dataset_ids)

Instead of creating a new bind parameter for each individual index?

Contributor Author

I started with this, but hit a runtime error, something like couldn't map from uuid[] -> uuid - maybe I was doing it wrong? idk

Contributor Author

thread 'db::queries::regions_hard_delete::test::explainable' panicked at nexus/db-queries/src/db/queries/regions_hard_delete.rs:104:14:
Failed to explain query - is it valid SQL?: DatabaseError(Unknown, "invalid cast: uuid[] -> uuid")

Collaborator

The following works for me:

-      crucible_dataset.id IN (",
-    );
-
-    for (idx, dataset_id) in dataset_ids.into_iter().enumerate() {
-        if idx != 0 {
-            builder.sql(",");
-        }
-        builder.param().bind::<sql_types::Uuid, _>(dataset_id);
-    }
-
-    builder.sql(
-        ")
+      crucible_dataset.id = ANY (",
+        )
+        .param()
+        .bind::<diesel::pg::sql_types::Array<sql_types::Uuid>, _>(dataset_ids)
+        .sql(
+            ")

Contributor Author

Nice, yeah - I missed how IN is not the same as = ANY, changed in fc37358
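The distinction resolved above can be sketched with a std-only contrast of the two generated WHERE clauses (in_list_clause and any_array_clause are illustrative helpers, not the real Nexus query builder): IN wants a parenthesized list of scalar expressions, so the original code needed one bind parameter per id, while = ANY takes a single array expression, so one uuid[]-typed bind covers any number of ids.

```rust
// Before: one scalar placeholder per dataset id, joined into an IN list.
fn in_list_clause(n_ids: usize) -> String {
    let placeholders: Vec<String> =
        (1..=n_ids).map(|i| format!("${i}")).collect();
    format!("crucible_dataset.id IN ({})", placeholders.join(", "))
}

// After: a single placeholder, bound once with type uuid[].
// Binding a uuid[] into a scalar IN position is what produced the
// "invalid cast: uuid[] -> uuid" error; = ANY accepts the array directly.
fn any_array_clause() -> &'static str {
    "crucible_dataset.id = ANY ($1)"
}
```

Besides being shorter, the single array bind keeps the SQL text identical regardless of how many ids are passed, which also helps statement caching and golden-file tests.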

@jmpesp jmpesp enabled auto-merge (squash) April 11, 2025 18:16
@jmpesp jmpesp merged commit 2fd10bb into oxidecomputer:main Apr 11, 2025
16 checks passed
@jmpesp jmpesp deleted the regions_hard_delete_cte_contention branch April 11, 2025 19:37
iliana pushed a commit that referenced this pull request Apr 11, 2025

Development

Successfully merging this pull request may close these issues.

Slow disk deletion and resurrection of previously deleted disks