Conversation

@smklein (Collaborator) commented Dec 14, 2021

This PR re-works the disk creation/deletion pathways to allocate Crucible Downstairs regions via sagas.

Nexus

  • Modifies the disk creation API to actually allocate regions and ensure they exist.
  • Similarly, modifies the disk deletion API to actually remove those regions.
  • Converts "disk creation" and "disk deletion" to use sagas.

Datastore

  • Exposes APIs to idempotently allocate regions to back a disk.
  • Moves "check if disk attached" error into the project_delete_disk API, to avoid the documented race condition.
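The race condition in that last bullet comes from separating the "is the disk attached?" check from the delete itself. A minimal sketch of the fix, with a mutex standing in for the database's transactional conditional delete (the `Datastore`, `DiskState`, and error strings here are illustrative stand-ins, not Omicron's actual types):

```rust
use std::sync::Mutex;

#[derive(Clone, Copy, PartialEq, Debug)]
enum DiskState { Detached, Attached }

struct Datastore { disk: Mutex<Option<DiskState>> }

impl Datastore {
    // Check-and-delete under one lock, standing in for a conditional
    // DELETE ... WHERE state = 'detached' in the real database. Doing
    // both steps atomically closes the window where a disk could be
    // attached between the check and the delete.
    fn project_delete_disk(&self) -> Result<(), &'static str> {
        let mut disk = self.disk.lock().unwrap();
        match *disk {
            Some(DiskState::Detached) => { *disk = None; Ok(()) }
            Some(DiskState::Attached) => Err("disk is attached"),
            None => Err("no such disk"),
        }
    }
}
```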

Sled Agent

  • Creates a simulated Crucible Agent service, with hooks into the simulated Sled Agent.

Tests

  • Splits the disk integration tests into smaller tests
  • Adds datastore-specific tests for region allocation
  • Adds disk integration tests for region allocation, interaction with crucible agent, and undoing saga actions.

type TxnError = TransactionError<RegionAllocateError>;
let params: params::DiskCreate = params.clone();
self.pool()
    .transaction(move |conn| {
@smklein (Collaborator, Author) replied:

That's right. Looking up regions + datasets + other auxiliary data is fairly involved, and rather than optimizing a CTE for it (especially as the allocation algorithm might change) I figured I'd start with something "easy-to-understand, but less optimized".

Is that okay?
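To make that tradeoff concrete, here is a hedged sketch of the transaction-based approach: inside one transaction, read candidate datasets, pick enough of them, and record the regions. An in-memory `Vec` stands in for the database, and the names (`Dataset`, `Region`, `REGION_REDUNDANCY`, `size_used`) are illustrative, not the real schema:

```rust
#[derive(Clone, Debug)]
struct Dataset { id: u32, size_used: u64 }

#[derive(Debug, PartialEq)]
struct Region { dataset_id: u32, disk_id: u32 }

const REGION_REDUNDANCY: usize = 3;

fn allocate_regions(datasets: &mut Vec<Dataset>, disk_id: u32, size: u64)
    -> Result<Vec<Region>, String>
{
    // Step 1: prefer the datasets with the least space used.
    datasets.sort_by_key(|d| d.size_used);
    if datasets.len() < REGION_REDUNDANCY {
        // Returning Err would roll the whole transaction back.
        return Err(format!("only {} datasets available", datasets.len()));
    }
    // Step 2: record the regions and bump each chosen dataset's usage.
    let mut regions = Vec::new();
    for d in datasets.iter_mut().take(REGION_REDUNDANCY) {
        d.size_used += size;
        regions.push(Region { dataset_id: d.id, disk_id });
    }
    Ok(regions)
}
```

The point of the interactive transaction is that each step stays readable even if the selection policy later changes; a CTE could collapse this into one statement, at the cost of being harder to evolve.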

Region::new(
    dataset.id(),
    disk_id,
    params.block_size().try_into().unwrap(),
@smklein (Collaborator, Author) replied:

Sure, I'll update their types to ByteCount. I'm admittedly still inclined to leave "extent_count" as something other than a ByteCount, since it is a count, not a size.
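A small sketch of that type distinction, with `ByteCount` and `RegionParams` as illustrative stand-ins for the real types: sizes get the newtype, while the count stays a plain integer.

```rust
// Newtype for sizes measured in bytes; prevents mixing them up with
// plain counts at compile time.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ByteCount(u64);

impl ByteCount {
    fn to_bytes(self) -> u64 { self.0 }
}

struct RegionParams {
    block_size: ByteCount,   // a size, in bytes
    blocks_per_extent: u64,  // dimensionless multiplier
    extent_count: u64,       // a count, not a size
}

impl RegionParams {
    // The total size is the product of the three fields, and only the
    // result is byte-typed.
    fn total_size(&self) -> ByteCount {
        ByteCount(self.block_size.to_bytes()
            * self.blocks_per_extent
            * self.extent_count)
    }
}
```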

fn saga_disk_delete() -> SagaTemplate<SagaDiskDelete> {
    let mut template_builder = SagaTemplateBuilder::new();

    template_builder.append(
@smklein (Collaborator, Author) replied:

I thought it would be critical to modify the disk record before anything else - if we don't update the disk record first, couldn't other concurrent operations poke and prod at the disk while we're tearing down the backing storage?

In the "disk creation" saga, we include a final step to "finalize" the disk, which basically exposes it for access. I figured the most important invariant was "if a disk has been created, it should be backed by functioning regions at all times" - so taking it out of rotation felt like the most important step, before removing the regions.
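The ordering argument can be sketched as follows; the types are illustrative stand-ins for the real saga nodes, but the invariant is the one above: fence off the disk record first, then remove the regions.

```rust
#[derive(Debug, PartialEq)]
enum DiskState { Detached, Destroyed }

struct Disk { state: DiskState, region_count: usize }

fn delete_disk(disk: &mut Disk) {
    // Step 1: update the disk record first, taking it out of service.
    disk.state = DiskState::Destroyed;
    // Step 2: only then tear down the backing regions; any concurrent
    // attach has already been fenced off by step 1.
    disk.region_count = 0;
}

fn attach(disk: &Disk) -> Result<(), &'static str> {
    // Concurrent operations observe the updated record, so they can
    // never touch a disk whose backing storage is mid-teardown.
    match disk.state {
        DiskState::Detached => Ok(()),
        DiskState::Destroyed => Err("disk destroyed"),
    }
}
```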

@smklein (Collaborator, Author) commented Jan 20, 2022

Thanks for the reviews, y'all. I appreciate the help, and know this PR is a lot to get through.

With this most recent change, I've gotten rid of FULL SCANs by storing some usage info in the Dataset table.

> Nice! This is a big piece of work and looks good.
>
> How do you feel about the saga? My read is that the saga action is working just as we hoped. It's fairly straightforward to reason about the individual actions. And looking at what's involved, it feels untenable to try to write this code by hand (like, without sagas or something like it) and have it correctly unwind state on failure and work correctly after a crash, etc.

I think this is largely correct. Breaking the work into smaller units definitely makes reasoning about it more tractable, and it's nice to have the automated support for calling the "unwind" functions. However, I keep on struggling to be sure that the operations I'm writing are idempotent.

I'd really like to write more tests for these conditions (repeating each action, undoing from each node, repeating the undo actions) because they feel fairly easy to miss, and yet pretty dangerous if ultimately wrong.
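A minimal sketch of what that idempotency concern demands of each action, using an in-memory `Agent` as an illustrative stand-in for the Crucible agent: both the forward action and its undo must tolerate being run more than once, because a crash can cause either to be replayed.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
struct Region { id: u32 }

struct Agent { regions: HashMap<u32, Region> }

impl Agent {
    // Idempotent create: a repeated request for the same id returns the
    // existing region instead of erroring or duplicating it.
    fn ensure_region(&mut self, id: u32) -> Region {
        self.regions.entry(id).or_insert(Region { id }).clone()
    }
    // Idempotent undo: deleting an already-deleted region is a no-op.
    fn delete_region(&mut self, id: u32) {
        self.regions.remove(&id);
    }
}
```

Tests for these conditions amount to calling each action twice, and calling each undo after zero, one, and two runs of its action, checking that the end state is identical.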

@leftwo (Contributor) left a comment:

I believe all my concerns have been addressed. You have my approval, for what it's worth :)

Base automatically changed from explain to main January 21, 2022 15:57
@davepacheco (Collaborator) left a comment:

Thanks for making a bunch of those changes!

timeseries_client,
};

/* TODO-cleanup all the extra Arcs here seems wrong */
Collaborator:

What does "saga_type" => "recovery" mean? Is it that this saga has been recovered as opposed to having been created by an API call handled by this process?

@smklein (Collaborator, Author) replied:

When creating the other logger in execute_saga, I created the logger with the key template_name.

In this recovery setup, however, we create a SagaContext object before we know what templates we'll be processing.

So basically, yeah: I wanted some way to distinguish the "saga context for recovery" vs "the normal sagas".

This is totally arbitrary, though; if we'd prefer different keys, this could change.
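The two logger setups can be sketched as plain key/value pairs (illustrative only, not the real slog API): the recovery context is built before any template is known, so it can only carry a generic tag, while the per-saga logger can name its template.

```rust
// Logger created at recovery time, before any template is known: the
// only thing it can say is that these sagas came from recovery.
fn recovery_logger_kv() -> (&'static str, &'static str) {
    ("saga_type", "recovery")
}

// Logger created per-saga in execute_saga, where the template name is
// available and can be attached directly.
fn execution_logger_kv(template: &'static str) -> (&'static str, &'static str) {
    ("template_name", template)
}
```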

Collaborator:

Sounds reasonable. It'll be good to eventually get the template name in those too but we can do that later!

// License, v. 2.0. If a copy of the MPL was not distributed with this
// file, You can obtain one at https://mozilla.org/MPL/2.0/.

//! HTTP entrypoint functions for simulating the storage agent API.
Collaborator:

Alright. How do we make sure this stays in sync with the real Crucible? How would someone modifying Crucible even know that they have to update this too?

// License, v. 2.0. If a copy of the MPL was not distributed with this
// file, You can obtain one at https://mozilla.org/MPL/2.0/.

//! Simulated sled agent storage implementation
Collaborator:

That makes sense. It's just a little confusing -- in general, "disks" and "storage" are sort of synonymous. It sounds like the real difference between one of these is that one is virtualized -- one is an RFD 4 Disk and the other is a general-purpose lower-level storage subsystem.

@smklein smklein merged commit a75149c into main Jan 24, 2022
@smklein smklein deleted the regions branch January 24, 2022 17:40
smklein added a commit that referenced this pull request Jan 25, 2022
#633)

#511 made disk allocation more "real" - disks are allocated from a group of datasets.

Even for the Simulated Sled Agent, Crucible Regions may be allocated atop a Crucible Dataset (though the data plane won't exist).

However, this wasn't the default when running the "simulated sled agent" binary. This PR adds a default for the simulated sled agent: "pretend you have 10 zpools (representing U.2 storage), each with 1 TB".
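That default can be sketched as follows; the constants and the `Zpool` type are illustrative, not the binary's real configuration structures.

```rust
// One terabyte, as a power of two.
const TB: u64 = 1 << 40;

#[derive(Debug)]
struct Zpool { size: u64 }

// Default for the simulated sled agent binary: ten 1 TB zpools,
// standing in for U.2 storage.
fn default_simulated_zpools() -> Vec<Zpool> {
    (0..10).map(|_| Zpool { size: TB }).collect()
}
```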