Add VolumeInfo #1928

Merged
jmpesp merged 2 commits into oxidecomputer:main from jmpesp:volume_status on Apr 27, 2026

Conversation

@jmpesp (Contributor) commented Apr 23, 2026

Add a new VolumeInfo enum that will be returned and will provide a richer tree of information matching the shape of the Volume. The intended consumer of this is the control plane in a few areas:

  • when performing region replacement or region snapshot replacement, the control plane needs to know when to consider the live repair or reconciliation successful, ultimately proceeding to clean up the temporary resources and continue with another replacement. The upstairs has always been the source of this answer, but the plan is to move from using activation as this signal to using this newly introduced enum.

  • when activating with only 2 out of 3 downstairs available, the control plane needs to know the difference between an unhealthy volume and one that activated early with 2 out of 3 but eventually had all 3 mirrors online.

  • in the future, the control plane could query for health when performing updates or sled reboots, pausing until impacted Volumes become healthy again.

A VolumeInfo enum already existed, so this PR renames the old one to VolumeExtentInfo. Eventually we should probably combine the two.
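The first two bullet points amount to a consumer-side predicate over the returned tree. A minimal sketch of what that check could look like, written against the JSON shape shown in the example dumps later in this thread (the field names come from those dumps; the function name `volume_is_healthy` is hypothetical, not part of this PR):

```python
def volume_is_healthy(info: dict) -> bool:
    """Illustrative health check over the VolumeInfo tree: treat the
    volume as healthy only when the upstairs is active, no reconcile or
    live repair is running, and every downstairs target reports the
    "active" state. The read_only_parent is ignored in this sketch."""
    for sub_volume in info["volume"]["sub_volumes"]:
        upstairs = sub_volume["upstairs"]
        if upstairs["state"] != "active":
            return False
        if upstairs["reconcile_in_progress"] or upstairs["live_repair_in_progress"]:
            return False
        if any(t["state"]["type"] != "active" for t in upstairs["targets"]):
            return False
    return True
```

Under this reading, the control plane would consider a replacement finished once this predicate turns true again, rather than keying off activation alone.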

@jmpesp jmpesp requested a review from leftwo April 23, 2026 18:57
@jmpesp (Contributor, Author) commented Apr 23, 2026

Example output from a healthy volume:

  "info": {
    "volume": {
      "sub_volumes": [
        {
          "upstairs": {
            "state": "active",
            "block_size": 512,
            "upstairs_id": "08166582-285a-46a4-98cc-6f5eb966733b",
            "session_id": "db05a92f-f220-4611-aa4b-3d448f885045",
            "generation": 1776970947,
            "read_only": false,
            "encrypted": true,
            "reconcile_in_progress": false,
            "live_repair_in_progress": false,
            "targets": [
              {
                "region_id": "b3092ac3-115a-4480-8d30-3c20fca254c2",
                "target_addr": "127.0.0.1:44101",
                "repair_addr": "[::]:48101",
                "state": {
                  "type": "active"
                }
              },
              {
                "region_id": "73c9cd9f-8c8e-4cea-96ff-cae21dcc53dc",
                "target_addr": "127.0.0.1:44102",
                "repair_addr": "[::]:48102",
                "state": {
                  "type": "active"
                }
              },
              {
                "region_id": "9063af2f-7331-41ba-8bc8-db5d308a1228",
                "target_addr": "127.0.0.1:44103",
                "repair_addr": "[::]:48103",
                "state": {
                  "type": "active"
                }
              }
            ]
          }
        }
      ],
      "read_only_parent": null
    }
  }

@jmpesp (Contributor, Author) commented Apr 23, 2026

Example where I took the first downstairs offline:

  "info": {
    "volume": {
      "sub_volumes": [
        {
          "upstairs": {
            "state": "active",
            "block_size": 512,
            "upstairs_id": "08166582-285a-46a4-98cc-6f5eb966733b",
            "session_id": "db05a92f-f220-4611-aa4b-3d448f885045",
            "generation": 1776970947,
            "read_only": false,
            "encrypted": true,
            "reconcile_in_progress": false,
            "live_repair_in_progress": false,
            "targets": [
              {
                "region_id": "b3092ac3-115a-4480-8d30-3c20fca254c2",
                "target_addr": "127.0.0.1:44101",
                "repair_addr": null,
                "state": {
                  "type": "connecting",
                  "state": "negotiating",
                  "mode": "faulted"
                }
              },
              {
                "region_id": "73c9cd9f-8c8e-4cea-96ff-cae21dcc53dc",
                "target_addr": "127.0.0.1:44102",
                "repair_addr": "[::]:48102",
                "state": {
                  "type": "active"
                }
              },
              {
                "region_id": "9063af2f-7331-41ba-8bc8-db5d308a1228",
                "target_addr": "127.0.0.1:44103",
                "repair_addr": "[::]:48103",
                "state": {
                  "type": "active"
                }
              }
            ]
          }
        }
      ],
      "read_only_parent": null
    }
  }

then brought it back up:

  "info": {
    "volume": {
      "sub_volumes": [
        {
          "upstairs": {
            "state": "active",
            "block_size": 512,
            "upstairs_id": "08166582-285a-46a4-98cc-6f5eb966733b",
            "session_id": "db05a92f-f220-4611-aa4b-3d448f885045",
            "generation": 1776970947,
            "read_only": false,
            "encrypted": true,
            "reconcile_in_progress": false,
            "live_repair_in_progress": true,
            "targets": [
              {
                "region_id": "b3092ac3-115a-4480-8d30-3c20fca254c2",
                "target_addr": "127.0.0.1:44101",
                "repair_addr": "[::]:48101",
                "state": {
                  "type": "live_repair"
                }
              },
              {
                "region_id": "73c9cd9f-8c8e-4cea-96ff-cae21dcc53dc",
                "target_addr": "127.0.0.1:44102",
                "repair_addr": "[::]:48102",
                "state": {
                  "type": "active"
                }
              },
              {
                "region_id": "9063af2f-7331-41ba-8bc8-db5d308a1228",
                "target_addr": "127.0.0.1:44103",
                "repair_addr": "[::]:48103",
                "state": {
                  "type": "active"
                }
              }
            ]
          }
        }
      ],
      "read_only_parent": null
    }
  }

@jmpesp (Contributor, Author) commented Apr 23, 2026

And here's a reconcile:

  "info": {
    "volume": {
      "sub_volumes": [
        {
          "upstairs": {
            "state": "go_active",
            "block_size": 512,
            "upstairs_id": "08166582-285a-46a4-98cc-6f5eb966733b",
            "session_id": "9acfdd86-a830-42bd-8103-e4d46799de26",
            "generation": 1776974377,
            "read_only": false,
            "encrypted": true,
            "reconcile_in_progress": true,
            "live_repair_in_progress": false,
            "targets": [
              {
                "region_id": "b3092ac3-115a-4480-8d30-3c20fca254c2",
                "target_addr": "127.0.0.1:44101",
                "repair_addr": "[::]:48101",
                "state": {
                  "type": "connecting",
                  "state": "reconcile",
                  "mode": "new"
                }
              },
              {
                "region_id": "73c9cd9f-8c8e-4cea-96ff-cae21dcc53dc",
                "target_addr": "127.0.0.1:44102",
                "repair_addr": "[::]:48102",
                "state": {
                  "type": "connecting",
                  "state": "reconcile",
                  "mode": "new"
                }
              },
              {
                "region_id": "9063af2f-7331-41ba-8bc8-db5d308a1228",
                "target_addr": "127.0.0.1:44103",
                "repair_addr": "[::]:48103",
                "state": {
                  "type": "connecting",
                  "state": "reconcile",
                  "mode": "new"
                }
              }
            ]
          }
        }
      ],
      "read_only_parent": null
    }
  }
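Taken together, the four dumps above (healthy, faulted, live repair, reconcile) can be told apart mechanically from the same tree. A hedged sketch of such a classifier (the function name and the coarse labels are mine, not part of the crucible API):

```python
def classify_volume(info: dict) -> str:
    """Illustrative coarse classification of a VolumeInfo tree,
    mirroring the states shown in the example dumps: a running
    reconcile or live repair is reported first, then any non-active
    downstairs target marks the volume degraded."""
    for sub_volume in info["volume"]["sub_volumes"]:
        upstairs = sub_volume["upstairs"]
        if upstairs["reconcile_in_progress"]:
            return "reconciling"
        if upstairs["live_repair_in_progress"]:
            return "live_repairing"
        if any(t["state"]["type"] != "active" for t in upstairs["targets"]):
            return "degraded"
    return "healthy"
```

Applied to the dumps above, the first would classify as "healthy", the offline-downstairs one as "degraded", the recovery as "live_repairing", and the last as "reconciling".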

@leftwo (Contributor) left a comment:

Minor question and request for a multiple-sub-volume test.

Comment thread on integration_tests/src/lib.rs:
);
assert_eq!(*state, DownstairsInfoStatus::Active);
}
}
@leftwo (Contributor):
We should add a test here (or somewhere) that verifies a Volume with two sub-volumes will return with the proper info for each level.

@jmpesp (Contributor, Author):
👍 done in be58321

"email": "api@oxide.computer"
},
"version": "1.0.0"
"version": "2.0.0"
@leftwo (Contributor):

If we have live update running, and a new pantry with old nexus, will the old nexus still be able to query the new pantry?

@jmpesp (Contributor, Author):

Yep, the old nexus will send requests with the api-version header set to 1.0.0, and the pantry will handle those requests by converting the 2.0.0 response down to 1.0.0.
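The kind of header-based downgrade described here can be sketched in a few lines. Everything below is illustrative: the 1.0.0 response shape is a guess (assumed here to simply lack the new "info" tree), and the dispatch function is not the pantry's actual conversion code.

```python
def respond(api_version: str, v2_response: dict) -> dict:
    """Hypothetical sketch of serving one endpoint at two API versions:
    clients that sent api-version 1.0.0 get the response with the new
    "info" tree stripped out; everyone else gets the 2.0.0 body as-is."""
    if api_version == "1.0.0":
        return {k: v for k, v in v2_response.items() if k != "info"}
    return v2_response
```

The design point is that the conversion happens server-side, keyed off the client-supplied version header, so an old nexus never has to understand the 2.0.0 shape.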

@jmpesp jmpesp merged commit 3c1708d into oxidecomputer:main Apr 27, 2026
17 checks passed
@jmpesp jmpesp deleted the volume_status branch April 27, 2026 19:14