Testing: Expunge Sled #5480

Open
andrewjstone opened this issue Apr 9, 2024 · 9 comments

@andrewjstone
Contributor

Initial Notes about testing on testbed and madrid live here:

@andrewjstone self-assigned this Apr 9, 2024
@andrewjstone added this to the 8 milestone Apr 9, 2024
@andrewjstone
Contributor Author

@jmpesp and I did some preliminary testing in the Canada region earlier. We expunged 'gravytrain' and saw the policy for the sled and its disks change to 'expunged'. We also regenerated a blueprint and saw external DNS change generation numbers.

I'm going to do a run on a4x2 later tonight with @sunshowers' code to see if zones get expunged. I'll start by adding a sled and then removing one, and I'll take more detailed notes on that run.

@andrewjstone
Contributor Author

Overview

After adding g2

Current blueprint

root@oxz_switch:~# omdb nexus blueprints show current
note: Nexus URL not specified.  Will pick one from DNS.
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using Nexus URL http://[fd00:1122:3344:103::5]:12221
blueprint  b5b34652-ef51-4a53-ba87-7b41d10d7df5
parent:    f2a0c950-136a-408b-8c1f-7365b73adae2

  -----------------------------------------------------------------------------------------------
    zone type         zone ID                                disposition  underlay IP
  -----------------------------------------------------------------------------------------------

  sled 0dc22738-49db-458f-b56d-9607c10a5ed9: blueprint zones at generation 3
    crucible          24f1e0af-3753-4f18-9e59-00f957c04653   in service   fd00:1122:3344:121::26
    crucible          6299f241-da9e-4156-9470-8a313fb851f7   in service   fd00:1122:3344:121::25
    crucible          903203c1-d37e-430b-8377-608bc0c4e3ee   in service   fd00:1122:3344:121::24
    crucible          c5f8c4bb-88da-4f30-b80b-af6c1a6bc6be   in service   fd00:1122:3344:121::22
    crucible          fb448ef6-c8ca-476f-9f50-c2dd54f48e8e   in service   fd00:1122:3344:121::23
    internal_ntp      d6b4139d-1737-4ef0-b465-bcf929e3bed0   in service   fd00:1122:3344:121::21

  sled 310448d5-599c-4071-a770-dae06a517a84: blueprint zones at generation 5
    boundary_ntp      9e34c4dc-861e-4af9-969c-62eec8c3fdbb   in service   fd00:1122:3344:102::d
    cockroach_db      60bc290e-f564-4b3a-ae61-ebf4c714b394   in service   fd00:1122:3344:102::4
    cockroach_db      f6be95c7-ed82-4783-838d-b052e0702f8c   in service   fd00:1122:3344:102::3
    crucible          226d25a7-c468-46e8-b983-36eb351ee7fe   in service   fd00:1122:3344:102::c
    crucible          9d8a8f9d-f593-4b95-9174-c77f04473ce5   in service   fd00:1122:3344:102::b
    crucible          a7fb3380-116d-4cfa-98ad-f65c1d5939cd   in service   fd00:1122:3344:102::9
    crucible          bbabb6f1-0e87-439c-9f55-4c4920fafe47   in service   fd00:1122:3344:102::a
    crucible          d2f17fa9-9e40-4339-b21c-e84f375ab1d3   in service   fd00:1122:3344:102::8
    crucible_pantry   a7ffbb94-e638-4cd5-a48b-078535c0fb59   in service   fd00:1122:3344:102::7
    internal_dns      f0af6245-df82-4039-96fa-3cb2bb753c56   in service   fd00:1122:3344:2::1
    nexus             fdaf8c9c-b081-4fb5-a32a-cb23566baee5   in service   fd00:1122:3344:102::5
    oximeter          51b6dbbc-c396-4ac2-8a6f-bdcae7c6ae62   in service   fd00:1122:3344:102::6

  sled 82f77114-57bd-4577-bfbb-83049a03a2e2: blueprint zones at generation 5
    boundary_ntp      d9b11802-df1b-4a66-9a03-8647bb63c4ee   in service   fd00:1122:3344:101::d
    cockroach_db      2a5d27fe-6953-46cb-926d-aff8fc9924d8   in service   fd00:1122:3344:101::3
    cockroach_db      efd73f7c-23fb-4896-8ede-eec55febfa16   in service   fd00:1122:3344:101::4
    crucible          35da43ae-ba2b-4815-af9e-5e2afdfd0679   in service   fd00:1122:3344:101::b
    crucible          483758cc-fe0d-4183-a5cd-ace82774d097   in service   fd00:1122:3344:101::c
    crucible          aae29eeb-63e7-487b-92fb-ad3e33c718ee   in service   fd00:1122:3344:101::8
    crucible          e22def46-9651-4821-b342-28e1ad2691d7   in service   fd00:1122:3344:101::9
    crucible          f20c2183-9563-414b-af94-eb3d2399e60d   in service   fd00:1122:3344:101::a
    crucible_pantry   05294ddd-e5b0-479b-ab48-646d6fa6ffd3   in service   fd00:1122:3344:101::7
    external_dns      4f4cbe27-29af-4398-8828-8a649865c660   in service   fd00:1122:3344:101::5
    internal_dns      46b6d4f6-6777-467f-9e9b-b428ce165ea4   in service   fd00:1122:3344:1::1
    nexus             ea0fd689-6879-4749-a13f-cc381c6a12d6   in service   fd00:1122:3344:101::6

  sled b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e: blueprint zones at generation 5
    clickhouse        0829ca7d-1ac5-4064-8096-1c56dfa00c3c   in service   fd00:1122:3344:103::6
    cockroach_db      b6df9285-5deb-497f-a12e-c65552291090   in service   fd00:1122:3344:103::3
    crucible          19e03ca6-0d9b-41f5-8264-84d4b14a3653   in service   fd00:1122:3344:103::b
    crucible          2b278e59-07bb-4240-8206-dbe4d2bb998a   in service   fd00:1122:3344:103::8
    crucible          a3ff2b82-6ffd-41d7-972f-9f3758c1ac07   in service   fd00:1122:3344:103::a
    crucible          b7f60873-f094-4627-9933-4484e7681ec4   in service   fd00:1122:3344:103::9
    crucible          dbc84500-a28c-4174-ad70-c48ccb9a5250   in service   fd00:1122:3344:103::c
    crucible_pantry   52c15e58-c127-4717-a850-da2399bffefd   in service   fd00:1122:3344:103::7
    external_dns      bb937922-3197-4a35-a8ad-1d762da2eb62   in service   fd00:1122:3344:103::4
    internal_dns      2e34e35d-24fa-4eb4-930d-9cac7d04286c   in service   fd00:1122:3344:3::1
    internal_ntp      c24b800b-ace7-468b-973a-675a86167356   in service   fd00:1122:3344:103::d
    nexus             04e51e6c-9215-44a7-b4cc-dedeee1d8180   in service   fd00:1122:3344:103::5

METADATA:
  created by:            04e51e6c-9215-44a7-b4cc-dedeee1d8180
  created at:            2024-04-20T00:55:22.641Z
  comment:               (none)
  internal DNS version:  3
  external DNS version:  2

Zones on sled g2

root@oxz_switch:~# omdb sled-agent --sled-agent-url http://[fd00:1122:3344:121::1]:12345 zones list
zones:
    "oxz_crucible_24f1e0af-3753-4f18-9e59-00f957c04653"
    "oxz_crucible_6299f241-da9e-4156-9470-8a313fb851f7"
    "oxz_crucible_903203c1-d37e-430b-8377-608bc0c4e3ee"
    "oxz_crucible_c5f8c4bb-88da-4f30-b80b-af6c1a6bc6be"
    "oxz_crucible_fb448ef6-c8ca-476f-9f50-c2dd54f48e8e"
    "oxz_ntp_d6b4139d-1737-4ef0-b465-bcf929e3bed0"

zpools on g2

root@oxz_switch:~# omdb sled-agent --sled-agent-url http://[fd00:1122:3344:121::1]:12345 zpools list
zpools:
    Zpool { disk_type: M2, id: 7eee11ad-4e8b-470a-857a-5573e8cbe0f1 (zpool) }
    Zpool { disk_type: U2, id: 843675c2-0a83-4687-aebd-04a085841705 (zpool) }
    Zpool { disk_type: M2, id: b8a33aa5-c26f-4a64-8575-2ac14be40465 (zpool) }
    Zpool { disk_type: U2, id: 1b0749e8-6b46-44ef-b5dd-a9a09001b2b1 (zpool) }
    Zpool { disk_type: U2, id: fcba180e-dba2-493c-9ecf-4026ad38fb78 (zpool) }
    Zpool { disk_type: U2, id: 1eafa5bd-3bbf-4c34-9a90-cd36f4bedee8 (zpool) }
    Zpool { disk_type: U2, id: 608d6bb7-7c3d-4c51-a69e-5909e466fff1 (zpool) }

Interesting DB data

All Physical disks

root@[fd00:1122:3344:102::3]:32221/omicron> select id, time_modified, serial, disk_policy, disk_state from physical_disk;
                   id                  |         time_modified         |        serial         | disk_policy | disk_state
---------------------------------------+-------------------------------+-----------------------+-------------+-------------
  055f7b77-16ff-4f4f-a09a-312164791170 | 2024-04-20 00:20:15.980214+00 | synthetic-serial-g0_2 | in_service  | active
  2c27a19f-9456-41c8-9f2f-a748b225d256 | 2024-04-20 00:20:15.980214+00 | synthetic-serial-g3_3 | in_service  | active
  367ed0cd-8912-4548-8c2a-f40dbaf6b4c2 | 2024-04-20 00:20:15.980214+00 | synthetic-serial-g0_3 | in_service  | active
  423f152a-4763-4b7d-8341-ae5d46eb61be | 2024-04-20 00:40:54.662062+00 | synthetic-serial-g2_0 | in_service  | active
  47c0a7d2-4132-468d-8a65-9982b48082a6 | 2024-04-20 00:40:56.582444+00 | synthetic-serial-g2_4 | in_service  | active
  527e1766-02ee-4788-961c-36ed8a947d9d | 2024-04-20 00:20:15.980214+00 | synthetic-serial-g1_3 | in_service  | active
  74d39038-7610-4289-8008-af85d5598d2d | 2024-04-20 00:40:55.618475+00 | synthetic-serial-g2_2 | in_service  | active
  7d174018-b675-4d38-95bf-1f68cfe0464f | 2024-04-20 00:20:15.980214+00 | synthetic-serial-g0_0 | in_service  | active
  7e3de442-cce2-422c-86c0-9339994470d5 | 2024-04-20 00:20:15.980214+00 | synthetic-serial-g3_4 | in_service  | active
  82a61ae4-43d5-4576-905f-361f073d4af8 | 2024-04-20 00:20:15.980214+00 | synthetic-serial-g1_2 | in_service  | active
  844794cd-18be-4d46-acfe-0d8acee6f9ab | 2024-04-20 00:40:55.152085+00 | synthetic-serial-g2_1 | in_service  | active
  8ab75c65-1fa0-499a-81da-c80d93d1269f | 2024-04-20 00:40:56.095534+00 | synthetic-serial-g2_3 | in_service  | active
  8e1386f7-ee36-4864-96da-83cac01c2abe | 2024-04-20 00:20:15.980214+00 | synthetic-serial-g0_4 | in_service  | active
  a3b3bdab-e174-497b-a10b-82eb4ceca27c | 2024-04-20 00:20:15.980213+00 | synthetic-serial-g1_0 | in_service  | active
  bb1869d7-316e-42d2-8a87-567dcfd46d1a | 2024-04-20 00:20:15.980214+00 | synthetic-serial-g1_4 | in_service  | active
  be6b69e8-ffef-4a2a-bc3f-ba047ae2b500 | 2024-04-20 00:20:15.980214+00 | synthetic-serial-g3_1 | in_service  | active
  e3c5f704-0178-405b-a7c3-72ff69ee1d60 | 2024-04-20 00:20:15.980213+00 | synthetic-serial-g1_1 | in_service  | active
  e42168d8-f45c-4e7a-8e4c-e9576ba21591 | 2024-04-20 00:20:15.980214+00 | synthetic-serial-g0_1 | in_service  | active
  e47abcd8-a463-433b-89e1-0b691e5e623e | 2024-04-20 00:20:15.980214+00 | synthetic-serial-g3_0 | in_service  | active
  f2472ea4-88cc-4173-9472-90600826a72a | 2024-04-20 00:20:15.980214+00 | synthetic-serial-g3_2 | in_service  | active
(20 rows)

All sleds

root@[fd00:1122:3344:102::3]:32221/omicron> select id, time_modified, rcgen, serial_number, sled_policy, sled_state, sled_agent_gen from sled;
                   id                  |         time_modified         | rcgen | serial_number | sled_policy | sled_state | sled_agent_gen
---------------------------------------+-------------------------------+-------+---------------+-------------+------------+-----------------
  0dc22738-49db-458f-b56d-9607c10a5ed9 | 2024-04-20 00:40:52.492116+00 |    11 | g2            | in_service  | active     |              1
  310448d5-599c-4071-a770-dae06a517a84 | 2024-04-20 00:20:12.959726+00 |    11 | g1            | in_service  | active     |              1
  82f77114-57bd-4577-bfbb-83049a03a2e2 | 2024-04-20 00:20:04.850332+00 |    11 | g0            | in_service  | active     |              1
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | 2024-04-20 00:20:17.571108+00 |    11 | g3            | in_service  | active     |              1
(4 rows)

All omicron zones for the latest blueprint

root@[fd00:1122:3344:102::3]:32221/omicron> select sled_id, id, zone_type, disposition from bp_omicron_zone where blueprint_id =  'b5b34652-ef51-4a53-ba87-7b41d10d7df5';
                sled_id                |                  id                  |    zone_type    | disposition
---------------------------------------+--------------------------------------+-----------------+--------------
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | 04e51e6c-9215-44a7-b4cc-dedeee1d8180 | nexus           | in_service
  82f77114-57bd-4577-bfbb-83049a03a2e2 | 05294ddd-e5b0-479b-ab48-646d6fa6ffd3 | crucible_pantry | in_service
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | 0829ca7d-1ac5-4064-8096-1c56dfa00c3c | clickhouse      | in_service
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | 19e03ca6-0d9b-41f5-8264-84d4b14a3653 | crucible        | in_service
  310448d5-599c-4071-a770-dae06a517a84 | 226d25a7-c468-46e8-b983-36eb351ee7fe | crucible        | in_service
  0dc22738-49db-458f-b56d-9607c10a5ed9 | 24f1e0af-3753-4f18-9e59-00f957c04653 | crucible        | in_service
  82f77114-57bd-4577-bfbb-83049a03a2e2 | 2a5d27fe-6953-46cb-926d-aff8fc9924d8 | cockroach_db    | in_service
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | 2b278e59-07bb-4240-8206-dbe4d2bb998a | crucible        | in_service
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | 2e34e35d-24fa-4eb4-930d-9cac7d04286c | internal_dns    | in_service
  82f77114-57bd-4577-bfbb-83049a03a2e2 | 35da43ae-ba2b-4815-af9e-5e2afdfd0679 | crucible        | in_service
  82f77114-57bd-4577-bfbb-83049a03a2e2 | 46b6d4f6-6777-467f-9e9b-b428ce165ea4 | internal_dns    | in_service
  82f77114-57bd-4577-bfbb-83049a03a2e2 | 483758cc-fe0d-4183-a5cd-ace82774d097 | crucible        | in_service
  82f77114-57bd-4577-bfbb-83049a03a2e2 | 4f4cbe27-29af-4398-8828-8a649865c660 | external_dns    | in_service
  310448d5-599c-4071-a770-dae06a517a84 | 51b6dbbc-c396-4ac2-8a6f-bdcae7c6ae62 | oximeter        | in_service
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | 52c15e58-c127-4717-a850-da2399bffefd | crucible_pantry | in_service
  310448d5-599c-4071-a770-dae06a517a84 | 60bc290e-f564-4b3a-ae61-ebf4c714b394 | cockroach_db    | in_service
  0dc22738-49db-458f-b56d-9607c10a5ed9 | 6299f241-da9e-4156-9470-8a313fb851f7 | crucible        | in_service
  0dc22738-49db-458f-b56d-9607c10a5ed9 | 903203c1-d37e-430b-8377-608bc0c4e3ee | crucible        | in_service
  310448d5-599c-4071-a770-dae06a517a84 | 9d8a8f9d-f593-4b95-9174-c77f04473ce5 | crucible        | in_service
  310448d5-599c-4071-a770-dae06a517a84 | 9e34c4dc-861e-4af9-969c-62eec8c3fdbb | boundary_ntp    | in_service
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | a3ff2b82-6ffd-41d7-972f-9f3758c1ac07 | crucible        | in_service
  310448d5-599c-4071-a770-dae06a517a84 | a7fb3380-116d-4cfa-98ad-f65c1d5939cd | crucible        | in_service
  310448d5-599c-4071-a770-dae06a517a84 | a7ffbb94-e638-4cd5-a48b-078535c0fb59 | crucible_pantry | in_service
  82f77114-57bd-4577-bfbb-83049a03a2e2 | aae29eeb-63e7-487b-92fb-ad3e33c718ee | crucible        | in_service
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | b6df9285-5deb-497f-a12e-c65552291090 | cockroach_db    | in_service
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | b7f60873-f094-4627-9933-4484e7681ec4 | crucible        | in_service
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | bb937922-3197-4a35-a8ad-1d762da2eb62 | external_dns    | in_service
  310448d5-599c-4071-a770-dae06a517a84 | bbabb6f1-0e87-439c-9f55-4c4920fafe47 | crucible        | in_service
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | c24b800b-ace7-468b-973a-675a86167356 | internal_ntp    | in_service
  0dc22738-49db-458f-b56d-9607c10a5ed9 | c5f8c4bb-88da-4f30-b80b-af6c1a6bc6be | crucible        | in_service
  310448d5-599c-4071-a770-dae06a517a84 | d2f17fa9-9e40-4339-b21c-e84f375ab1d3 | crucible        | in_service
  0dc22738-49db-458f-b56d-9607c10a5ed9 | d6b4139d-1737-4ef0-b465-bcf929e3bed0 | internal_ntp    | in_service
  82f77114-57bd-4577-bfbb-83049a03a2e2 | d9b11802-df1b-4a66-9a03-8647bb63c4ee | boundary_ntp    | in_service
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | dbc84500-a28c-4174-ad70-c48ccb9a5250 | crucible        | in_service
  82f77114-57bd-4577-bfbb-83049a03a2e2 | e22def46-9651-4821-b342-28e1ad2691d7 | crucible        | in_service
  82f77114-57bd-4577-bfbb-83049a03a2e2 | ea0fd689-6879-4749-a13f-cc381c6a12d6 | nexus           | in_service
  82f77114-57bd-4577-bfbb-83049a03a2e2 | efd73f7c-23fb-4896-8ede-eec55febfa16 | cockroach_db    | in_service
  310448d5-599c-4071-a770-dae06a517a84 | f0af6245-df82-4039-96fa-3cb2bb753c56 | internal_dns    | in_service
  82f77114-57bd-4577-bfbb-83049a03a2e2 | f20c2183-9563-414b-af94-eb3d2399e60d | crucible        | in_service
  310448d5-599c-4071-a770-dae06a517a84 | f6be95c7-ed82-4783-838d-b052e0702f8c | cockroach_db    | in_service
  0dc22738-49db-458f-b56d-9607c10a5ed9 | fb448ef6-c8ca-476f-9f50-c2dd54f48e8e | crucible        | in_service
  310448d5-599c-4071-a770-dae06a517a84 | fdaf8c9c-b081-4fb5-a32a-cb23566baee5 | nexus           | in_service
(42 rows)

All omicron zones for the latest inventory

root@[fd00:1122:3344:102::3]:32221/omicron> select sled_id, id, zone_type from inv_omicron_zone where inv_collection_id='3b78d7bd-a85e-4925-94cd-0f490db46282';
                sled_id                |                  id                  |    zone_type
---------------------------------------+--------------------------------------+------------------
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | 04e51e6c-9215-44a7-b4cc-dedeee1d8180 | nexus
  82f77114-57bd-4577-bfbb-83049a03a2e2 | 05294ddd-e5b0-479b-ab48-646d6fa6ffd3 | crucible_pantry
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | 0829ca7d-1ac5-4064-8096-1c56dfa00c3c | clickhouse
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | 19e03ca6-0d9b-41f5-8264-84d4b14a3653 | crucible
  310448d5-599c-4071-a770-dae06a517a84 | 226d25a7-c468-46e8-b983-36eb351ee7fe | crucible
  0dc22738-49db-458f-b56d-9607c10a5ed9 | 24f1e0af-3753-4f18-9e59-00f957c04653 | crucible
  82f77114-57bd-4577-bfbb-83049a03a2e2 | 2a5d27fe-6953-46cb-926d-aff8fc9924d8 | cockroach_db
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | 2b278e59-07bb-4240-8206-dbe4d2bb998a | crucible
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | 2e34e35d-24fa-4eb4-930d-9cac7d04286c | internal_dns
  82f77114-57bd-4577-bfbb-83049a03a2e2 | 35da43ae-ba2b-4815-af9e-5e2afdfd0679 | crucible
  82f77114-57bd-4577-bfbb-83049a03a2e2 | 46b6d4f6-6777-467f-9e9b-b428ce165ea4 | internal_dns
  82f77114-57bd-4577-bfbb-83049a03a2e2 | 483758cc-fe0d-4183-a5cd-ace82774d097 | crucible
  82f77114-57bd-4577-bfbb-83049a03a2e2 | 4f4cbe27-29af-4398-8828-8a649865c660 | external_dns
  310448d5-599c-4071-a770-dae06a517a84 | 51b6dbbc-c396-4ac2-8a6f-bdcae7c6ae62 | oximeter
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | 52c15e58-c127-4717-a850-da2399bffefd | crucible_pantry
  310448d5-599c-4071-a770-dae06a517a84 | 60bc290e-f564-4b3a-ae61-ebf4c714b394 | cockroach_db
  0dc22738-49db-458f-b56d-9607c10a5ed9 | 6299f241-da9e-4156-9470-8a313fb851f7 | crucible
  0dc22738-49db-458f-b56d-9607c10a5ed9 | 903203c1-d37e-430b-8377-608bc0c4e3ee | crucible
  310448d5-599c-4071-a770-dae06a517a84 | 9d8a8f9d-f593-4b95-9174-c77f04473ce5 | crucible
  310448d5-599c-4071-a770-dae06a517a84 | 9e34c4dc-861e-4af9-969c-62eec8c3fdbb | boundary_ntp
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | a3ff2b82-6ffd-41d7-972f-9f3758c1ac07 | crucible
  310448d5-599c-4071-a770-dae06a517a84 | a7fb3380-116d-4cfa-98ad-f65c1d5939cd | crucible
  310448d5-599c-4071-a770-dae06a517a84 | a7ffbb94-e638-4cd5-a48b-078535c0fb59 | crucible_pantry
  82f77114-57bd-4577-bfbb-83049a03a2e2 | aae29eeb-63e7-487b-92fb-ad3e33c718ee | crucible
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | b6df9285-5deb-497f-a12e-c65552291090 | cockroach_db
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | b7f60873-f094-4627-9933-4484e7681ec4 | crucible
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | bb937922-3197-4a35-a8ad-1d762da2eb62 | external_dns
  310448d5-599c-4071-a770-dae06a517a84 | bbabb6f1-0e87-439c-9f55-4c4920fafe47 | crucible
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | c24b800b-ace7-468b-973a-675a86167356 | internal_ntp
  0dc22738-49db-458f-b56d-9607c10a5ed9 | c5f8c4bb-88da-4f30-b80b-af6c1a6bc6be | crucible
  310448d5-599c-4071-a770-dae06a517a84 | d2f17fa9-9e40-4339-b21c-e84f375ab1d3 | crucible
  0dc22738-49db-458f-b56d-9607c10a5ed9 | d6b4139d-1737-4ef0-b465-bcf929e3bed0 | internal_ntp
  82f77114-57bd-4577-bfbb-83049a03a2e2 | d9b11802-df1b-4a66-9a03-8647bb63c4ee | boundary_ntp
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | dbc84500-a28c-4174-ad70-c48ccb9a5250 | crucible
  82f77114-57bd-4577-bfbb-83049a03a2e2 | e22def46-9651-4821-b342-28e1ad2691d7 | crucible
  82f77114-57bd-4577-bfbb-83049a03a2e2 | ea0fd689-6879-4749-a13f-cc381c6a12d6 | nexus
  82f77114-57bd-4577-bfbb-83049a03a2e2 | efd73f7c-23fb-4896-8ede-eec55febfa16 | cockroach_db
  310448d5-599c-4071-a770-dae06a517a84 | f0af6245-df82-4039-96fa-3cb2bb753c56 | internal_dns
  82f77114-57bd-4577-bfbb-83049a03a2e2 | f20c2183-9563-414b-af94-eb3d2399e60d | crucible
  310448d5-599c-4071-a770-dae06a517a84 | f6be95c7-ed82-4783-838d-b052e0702f8c | cockroach_db
  0dc22738-49db-458f-b56d-9607c10a5ed9 | fb448ef6-c8ca-476f-9f50-c2dd54f48e8e | crucible
  310448d5-599c-4071-a770-dae06a517a84 | fdaf8c9c-b081-4fb5-a32a-cb23566baee5 | nexus
(42 rows)
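
Both queries above return 42 rows, which suggests the blueprint and the inventory agree. One way to confirm that zone-for-zone is to diff the (sled_id, zone id) pairs from the two outputs. The following is a minimal Python sketch, assuming the psql-style output has been pasted into strings. `parse_rows` and `zone_pairs` are hypothetical helpers, not part of omdb, and only three rows from each table are reproduced here for brevity.

```python
def parse_rows(table_text):
    """Split psql-style output into lists of stripped cell values.

    Blank lines, the '---+---' separator, and the '(N rows)' footer are
    skipped. The first remaining line is assumed to be the header.
    """
    rows = []
    for line in table_text.strip().splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("-") or stripped.startswith("("):
            continue
        rows.append([cell.strip() for cell in stripped.split("|")])
    return rows[1:]  # drop the header row


def zone_pairs(table_text):
    """Return the set of (sled_id, zone_id) pairs found in a pasted table."""
    return {(cells[0], cells[1]) for cells in parse_rows(table_text)}


# Excerpt of the bp_omicron_zone output (3 of the 42 rows).
BLUEPRINT = """
                sled_id                |                  id                  |    zone_type    | disposition
---------------------------------------+--------------------------------------+-----------------+--------------
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | 04e51e6c-9215-44a7-b4cc-dedeee1d8180 | nexus           | in_service
  0dc22738-49db-458f-b56d-9607c10a5ed9 | 24f1e0af-3753-4f18-9e59-00f957c04653 | crucible        | in_service
  310448d5-599c-4071-a770-dae06a517a84 | fdaf8c9c-b081-4fb5-a32a-cb23566baee5 | nexus           | in_service
"""

# Excerpt of the inv_omicron_zone output (the same 3 zones).
INVENTORY = """
                sled_id                |                  id                  |    zone_type
---------------------------------------+--------------------------------------+------------------
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | 04e51e6c-9215-44a7-b4cc-dedeee1d8180 | nexus
  0dc22738-49db-458f-b56d-9607c10a5ed9 | 24f1e0af-3753-4f18-9e59-00f957c04653 | crucible
  310448d5-599c-4071-a770-dae06a517a84 | fdaf8c9c-b081-4fb5-a32a-cb23566baee5 | nexus
"""

only_in_blueprint = zone_pairs(BLUEPRINT) - zone_pairs(INVENTORY)
only_in_inventory = zone_pairs(INVENTORY) - zone_pairs(BLUEPRINT)
print("only in blueprint:", sorted(only_in_blueprint))
print("only in inventory:", sorted(only_in_inventory))
```

Pasting the full 42-row outputs in place of the excerpts would run the same comparison over every zone; two empty sets mean the inventory matches the blueprint.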

Expunge sled g1

Expunge operations

Hyperstop sled g1

pfexec ./a4x2 hyperstop g1

Show current sleds

root@oxz_switch:~# omdb db sleds;
note: database URL not specified.  Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using database URL postgresql://root@[fd00:1122:3344:101::3]:32221,[fd00:1122:3344:102::4]:32221,[fd00:1122:3344:103::3]:32221,[fd00:1122:3344:101::4]:32221,[fd00:1122:3344:102::3]:32221/omicron?sslmode=disable
note: database schema version matches expected (53.0.0)
SERIAL IP                            ROLE     ID
g2     [fd00:1122:3344:121::1]:12345 -        0dc22738-49db-458f-b56d-9607c10a5ed9
g1     [fd00:1122:3344:102::1]:12345 -        310448d5-599c-4071-a770-dae06a517a84
g0     [fd00:1122:3344:101::1]:12345 scrimlet 82f77114-57bd-4577-bfbb-83049a03a2e2
g3     [fd00:1122:3344:103::1]:12345 scrimlet b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e

Expunge!

root@oxz_switch:~# omdb -w nexus sleds expunge  310448d5-599c-4071-a770-dae06a517a84
note: Nexus URL not specified.  Will pick one from DNS.
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using Nexus URL http://[fd00:1122:3344:103::5]:12221
note: database URL not specified.  Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using database URL postgresql://root@[fd00:1122:3344:101::3]:32221,[fd00:1122:3344:102::4]:32221,[fd00:1122:3344:103::3]:32221,[fd00:1122:3344:101::4]:32221,[fd00:1122:3344:102::3]:32221/omicron?sslmode=disable
note: database schema version matches expected (53.0.0)
WARNING: sled 310448d5-599c-4071-a770-dae06a517a84 is PRESENT in the most recent inventory collection (spotted at 2024-04-20 03:22:13.384882 UTC). It is dangerous to expunge a sled that is still running. Are you sure you want to proceed anyway?
y/N〉y
WARNING: This operation will PERMANENTLY and IRRECOVABLY mark sled 310448d5-599c-4071-a770-dae06a517a84 (g1) expunged. To proceed, type the sled's serial number.
sled serial number〉g1
expunged sled 310448d5-599c-4071-a770-dae06a517a84 (previous policy: InService(Provisionable))

Details after expunge

Before creating a new blueprint

Sleds

root@[fd00:1122:3344:101::3]:32221/omicron> select id, time_modified, rcgen, serial_number, sled_policy, sled_state, sled_agent_gen from sled;
                   id                  |         time_modified         | rcgen | serial_number | sled_policy | sled_state | sled_agent_gen
---------------------------------------+-------------------------------+-------+---------------+-------------+------------+-----------------
  0dc22738-49db-458f-b56d-9607c10a5ed9 | 2024-04-20 00:40:52.492116+00 |    11 | g2            | in_service  | active     |              1
  310448d5-599c-4071-a770-dae06a517a84 | 2024-04-20 03:23:54.238733+00 |    11 | g1            | expunged    | active     |              1
  82f77114-57bd-4577-bfbb-83049a03a2e2 | 2024-04-20 00:20:04.850332+00 |    11 | g0            | in_service  | active     |              1
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | 2024-04-20 00:20:17.571108+00 |    11 | g3            | in_service  | active     |              1
(4 rows)

Disks

root@[fd00:1122:3344:101::3]:32221/omicron> select id, time_modified, serial, disk_policy, disk_state from physical_disk;
                   id                  |         time_modified         |        serial         | disk_policy | disk_state
---------------------------------------+-------------------------------+-----------------------+-------------+-------------
  055f7b77-16ff-4f4f-a09a-312164791170 | 2024-04-20 00:20:15.980214+00 | synthetic-serial-g0_2 | in_service  | active
  2c27a19f-9456-41c8-9f2f-a748b225d256 | 2024-04-20 00:20:15.980214+00 | synthetic-serial-g3_3 | in_service  | active
  367ed0cd-8912-4548-8c2a-f40dbaf6b4c2 | 2024-04-20 00:20:15.980214+00 | synthetic-serial-g0_3 | in_service  | active
  423f152a-4763-4b7d-8341-ae5d46eb61be | 2024-04-20 00:40:54.662062+00 | synthetic-serial-g2_0 | in_service  | active
  47c0a7d2-4132-468d-8a65-9982b48082a6 | 2024-04-20 00:40:56.582444+00 | synthetic-serial-g2_4 | in_service  | active
  527e1766-02ee-4788-961c-36ed8a947d9d | 2024-04-20 00:20:15.980214+00 | synthetic-serial-g1_3 | expunged    | active
  74d39038-7610-4289-8008-af85d5598d2d | 2024-04-20 00:40:55.618475+00 | synthetic-serial-g2_2 | in_service  | active
  7d174018-b675-4d38-95bf-1f68cfe0464f | 2024-04-20 00:20:15.980214+00 | synthetic-serial-g0_0 | in_service  | active
  7e3de442-cce2-422c-86c0-9339994470d5 | 2024-04-20 00:20:15.980214+00 | synthetic-serial-g3_4 | in_service  | active
  82a61ae4-43d5-4576-905f-361f073d4af8 | 2024-04-20 00:20:15.980214+00 | synthetic-serial-g1_2 | expunged    | active
  844794cd-18be-4d46-acfe-0d8acee6f9ab | 2024-04-20 00:40:55.152085+00 | synthetic-serial-g2_1 | in_service  | active
  8ab75c65-1fa0-499a-81da-c80d93d1269f | 2024-04-20 00:40:56.095534+00 | synthetic-serial-g2_3 | in_service  | active
  8e1386f7-ee36-4864-96da-83cac01c2abe | 2024-04-20 00:20:15.980214+00 | synthetic-serial-g0_4 | in_service  | active
  a3b3bdab-e174-497b-a10b-82eb4ceca27c | 2024-04-20 00:20:15.980213+00 | synthetic-serial-g1_0 | expunged    | active
  bb1869d7-316e-42d2-8a87-567dcfd46d1a | 2024-04-20 00:20:15.980214+00 | synthetic-serial-g1_4 | expunged    | active
  be6b69e8-ffef-4a2a-bc3f-ba047ae2b500 | 2024-04-20 00:20:15.980214+00 | synthetic-serial-g3_1 | in_service  | active
  e3c5f704-0178-405b-a7c3-72ff69ee1d60 | 2024-04-20 00:20:15.980213+00 | synthetic-serial-g1_1 | expunged    | active
  e42168d8-f45c-4e7a-8e4c-e9576ba21591 | 2024-04-20 00:20:15.980214+00 | synthetic-serial-g0_1 | in_service  | active
  e47abcd8-a463-433b-89e1-0b691e5e623e | 2024-04-20 00:20:15.980214+00 | synthetic-serial-g3_0 | in_service  | active
  f2472ea4-88cc-4173-9472-90600826a72a | 2024-04-20 00:20:15.980214+00 | synthetic-serial-g3_2 | in_service  | active
(20 rows)
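
In the output above, the five disks whose policy changed to `expunged` are exactly the ones with `g1` in their serial. A minimal Python sketch of that check follows. It assumes the synthetic serials in this a4x2 deployment encode the sled name as `synthetic-serial-<sled>_<n>`, which is an observation from this output rather than a documented guarantee. `policy_by_sled` is a hypothetical helper, and only a subset of the 20 rows is reproduced.

```python
import re
from collections import defaultdict

# Assumed serial layout: synthetic-serial-<sled>_<disk index>
SERIAL_RE = re.compile(r"^synthetic-serial-(g\d+)_(\d+)$")

# (serial, disk_policy) pairs copied from the query output above (subset).
DISKS = [
    ("synthetic-serial-g0_2", "in_service"),
    ("synthetic-serial-g1_3", "expunged"),
    ("synthetic-serial-g1_2", "expunged"),
    ("synthetic-serial-g1_0", "expunged"),
    ("synthetic-serial-g1_4", "expunged"),
    ("synthetic-serial-g1_1", "expunged"),
    ("synthetic-serial-g2_0", "in_service"),
    ("synthetic-serial-g3_3", "in_service"),
]


def policy_by_sled(disks):
    """Group the set of disk policies seen under each sled name."""
    result = defaultdict(set)
    for serial, policy in disks:
        match = SERIAL_RE.match(serial)
        if match is None:
            raise ValueError(f"unexpected serial format: {serial!r}")
        result[match.group(1)].add(policy)
    return dict(result)


policies = policy_by_sled(DISKS)
# Sleds where every listed disk is expunged.
expunged_sleds = sorted(s for s, p in policies.items() if p == {"expunged"})
# Sleds with a mix of policies would indicate a partially applied expunge.
mixed_sleds = sorted(s for s, p in policies.items() if len(p) > 1)
print("fully expunged:", expunged_sleds)
print("mixed policies:", mixed_sleds)
```

Note that `disk_state` is still `active` for the expunged disks in the output above, so the sketch deliberately looks only at `disk_policy`.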

No sleds, zones, or disks remain for g1 in the latest inventory collection

root@[fd00:1122:3344:101::3]:32221/omicron> select sled_id, id, zone_type from inv_omicron_zone where inv_collection_id='cfd79d8b-f8b5-4aba-9772-b43e11210d3f'
and sled_id = '310448d5-599c-4071-a770-dae06a517a84';
  sled_id | id | zone_type
----------+----+------------
(0 rows)

I had to switch to a later collection because the earlier one had been garbage collected.

root@[fd00:1122:3344:101::3]:32221/omicron> select * from inv_physical_disk where inv_collection_id='45af27dc-d6e5-4792-8de5-d0a2eb2fa690' and sled_id = '310448d5-599c-4071-a770-dae06a517a84';
  inv_collection_id | sled_id | slot | vendor | model | serial | variant
--------------------+---------+------+--------+-------+--------+----------
(0 rows)
root@[fd00:1122:3344:101::3]:32221/omicron> select * from inv_sled_agent where inv_collection_id='45af27dc-d6e5-4792-8de5-d0a2eb2fa690' and sled_id = '310448d5-599c-4071-a770-dae06a517a84';
  inv_collection_id | time_collected | source | sled_id | hw_baseboard_id | sled_agent_ip | sled_agent_port | sled_role | usable_hardware_threads | usable_physical_ram | reservoir_size
--------------------+----------------+--------+---------+-----------------+---------------+-----------------+-----------+-------------------------+---------------------+-----------------
(0 rows)

Bug?

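Looking at the collections, we can see that the collection that included sled g1, which we expunged, is never garbage collected. It is the oldest row below (7d579ea6-97be-4dd9-a119-4428b2dee596); its time_done of 03:22:13.384882 matches the time at which the expunge warning last spotted g1 in inventory.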

root@[fd00:1122:3344:101::3]:32221/omicron> select * from inv_collection;
                   id                  |         time_started          |           time_done           |              collector
---------------------------------------+-------------------------------+-------------------------------+---------------------------------------
  183079ea-2681-4ac2-ac6b-f29fbc33a8a2 | 2024-04-20 03:39:54.775915+00 | 2024-04-20 03:40:55.071637+00 | 04e51e6c-9215-44a7-b4cc-dedeee1d8180
  7a064ede-d03d-4331-87d9-672bc5d101de | 2024-04-20 03:40:24.509742+00 | 2024-04-20 03:41:24.763388+00 | ea0fd689-6879-4749-a13f-cc381c6a12d6
  7d579ea6-97be-4dd9-a119-4428b2dee596 | 2024-04-20 03:22:12.96328+00  | 2024-04-20 03:22:13.384882+00 | ea0fd689-6879-4749-a13f-cc381c6a12d6
(3 rows)
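
To make the suspected bug concrete, the collections can be compared against the time g1 was expunged. This is a minimal Python sketch, assuming the expunge time is the `time_modified` value on g1's row in the `sled` table (03:23:54.238733) and that timestamps are always printed with a `+00` offset. `parse_ts` and `finished_before` are hypothetical names. The sketch only flags collections that finished before the expunge; it does not model the actual pruning policy in Nexus.

```python
from datetime import datetime, timezone


def parse_ts(text):
    """Parse a timestamp such as '2024-04-20 03:22:12.96328+00' as UTC."""
    if not text.endswith("+00"):
        raise ValueError(f"expected a +00 offset: {text!r}")
    # %f accepts one to six fractional digits and pads on the right.
    naive = datetime.strptime(text[:-3], "%Y-%m-%d %H:%M:%S.%f")
    return naive.replace(tzinfo=timezone.utc)


# (collection id, time_done) copied from the inv_collection output above.
COLLECTIONS = [
    ("183079ea-2681-4ac2-ac6b-f29fbc33a8a2", "2024-04-20 03:40:55.071637+00"),
    ("7a064ede-d03d-4331-87d9-672bc5d101de", "2024-04-20 03:41:24.763388+00"),
    ("7d579ea6-97be-4dd9-a119-4428b2dee596", "2024-04-20 03:22:13.384882+00"),
]

# Assumption: time_modified on the g1 sled row marks when it was expunged.
EXPUNGED_AT = parse_ts("2024-04-20 03:23:54.238733+00")


def finished_before(collections, cutoff):
    """Return ids of collections whose time_done is earlier than cutoff."""
    return [cid for cid, done in collections if parse_ts(done) < cutoff]


stale = finished_before(COLLECTIONS, EXPUNGED_AT)
print("collections finished before the expunge:", stale)
```

With the rows above, the only collection flagged is the one that last saw g1, while the two newer collections were both completed after the expunge.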

Create a new blueprint

root@oxz_switch:~# omdb -w nexus blueprints regenerate;
note: Nexus URL not specified.  Will pick one from DNS.
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using Nexus URL http://[fd00:1122:3344:103::5]:12221
generated new blueprint 183b04ff-d5ea-48ef-aa8b-20fd71bb3c8c

Diff the new blueprint against current target

root@oxz_switch:~# omdb nexus blueprints diff current 183b04ff-d5ea-48ef-aa8b-20fd71bb3c8c
note: Nexus URL not specified.  Will pick one from DNS.
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using Nexus URL http://[fd00:1122:3344:103::5]:12221
from: blueprint b5b34652-ef51-4a53-ba87-7b41d10d7df5
to:   blueprint 183b04ff-d5ea-48ef-aa8b-20fd71bb3c8c

  -----------------------------------------------------------------------------------------------------------
     zone type         zone ID                                disposition   underlay IP             status
  -----------------------------------------------------------------------------------------------------------

  UNCHANGED SLEDS:

   sled 82f77114-57bd-4577-bfbb-83049a03a2e2: blueprint zones at generation 5
     boundary_ntp      d9b11802-df1b-4a66-9a03-8647bb63c4ee   in service    fd00:1122:3344:101::d
     cockroach_db      2a5d27fe-6953-46cb-926d-aff8fc9924d8   in service    fd00:1122:3344:101::3
     cockroach_db      efd73f7c-23fb-4896-8ede-eec55febfa16   in service    fd00:1122:3344:101::4
     crucible          35da43ae-ba2b-4815-af9e-5e2afdfd0679   in service    fd00:1122:3344:101::b
     crucible          483758cc-fe0d-4183-a5cd-ace82774d097   in service    fd00:1122:3344:101::c
     crucible          aae29eeb-63e7-487b-92fb-ad3e33c718ee   in service    fd00:1122:3344:101::8
     crucible          e22def46-9651-4821-b342-28e1ad2691d7   in service    fd00:1122:3344:101::9
     crucible          f20c2183-9563-414b-af94-eb3d2399e60d   in service    fd00:1122:3344:101::a
     crucible_pantry   05294ddd-e5b0-479b-ab48-646d6fa6ffd3   in service    fd00:1122:3344:101::7
     external_dns      4f4cbe27-29af-4398-8828-8a649865c660   in service    fd00:1122:3344:101::5
     internal_dns      46b6d4f6-6777-467f-9e9b-b428ce165ea4   in service    fd00:1122:3344:1::1
     nexus             ea0fd689-6879-4749-a13f-cc381c6a12d6   in service    fd00:1122:3344:101::6

   sled b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e: blueprint zones at generation 5
     clickhouse        0829ca7d-1ac5-4064-8096-1c56dfa00c3c   in service    fd00:1122:3344:103::6
     cockroach_db      b6df9285-5deb-497f-a12e-c65552291090   in service    fd00:1122:3344:103::3
     crucible          19e03ca6-0d9b-41f5-8264-84d4b14a3653   in service    fd00:1122:3344:103::b
     crucible          2b278e59-07bb-4240-8206-dbe4d2bb998a   in service    fd00:1122:3344:103::8
     crucible          a3ff2b82-6ffd-41d7-972f-9f3758c1ac07   in service    fd00:1122:3344:103::a
     crucible          b7f60873-f094-4627-9933-4484e7681ec4   in service    fd00:1122:3344:103::9
     crucible          dbc84500-a28c-4174-ad70-c48ccb9a5250   in service    fd00:1122:3344:103::c
     crucible_pantry   52c15e58-c127-4717-a850-da2399bffefd   in service    fd00:1122:3344:103::7
     external_dns      bb937922-3197-4a35-a8ad-1d762da2eb62   in service    fd00:1122:3344:103::4
     internal_dns      2e34e35d-24fa-4eb4-930d-9cac7d04286c   in service    fd00:1122:3344:3::1
     internal_ntp      c24b800b-ace7-468b-973a-675a86167356   in service    fd00:1122:3344:103::d
     nexus             04e51e6c-9215-44a7-b4cc-dedeee1d8180   in service    fd00:1122:3344:103::5

  MODIFIED SLEDS:

*  sled 0dc22738-49db-458f-b56d-9607c10a5ed9: blueprint zones at generation: 3 -> 4
     crucible          24f1e0af-3753-4f18-9e59-00f957c04653   in service    fd00:1122:3344:121::26
     crucible          6299f241-da9e-4156-9470-8a313fb851f7   in service    fd00:1122:3344:121::25
     crucible          903203c1-d37e-430b-8377-608bc0c4e3ee   in service    fd00:1122:3344:121::24
     crucible          c5f8c4bb-88da-4f30-b80b-af6c1a6bc6be   in service    fd00:1122:3344:121::22
     crucible          fb448ef6-c8ca-476f-9f50-c2dd54f48e8e   in service    fd00:1122:3344:121::23
     internal_ntp      d6b4139d-1737-4ef0-b465-bcf929e3bed0   in service    fd00:1122:3344:121::21
+    nexus             b7c4cb2e-17ed-4d09-a958-1b0446d22a46   in service    fd00:1122:3344:121::27  added

*  sled 310448d5-599c-4071-a770-dae06a517a84: blueprint zones at generation: 5 -> 6
-    boundary_ntp      9e34c4dc-861e-4af9-969c-62eec8c3fdbb   in service    fd00:1122:3344:102::d   modified
+     ├─                                                      expunged      fd00:1122:3344:102::d
*     └─ changed: disposition
-    cockroach_db      60bc290e-f564-4b3a-ae61-ebf4c714b394   in service    fd00:1122:3344:102::4   modified
+     ├─                                                      expunged      fd00:1122:3344:102::4
*     └─ changed: disposition
-    cockroach_db      f6be95c7-ed82-4783-838d-b052e0702f8c   in service    fd00:1122:3344:102::3   modified
+     ├─                                                      expunged      fd00:1122:3344:102::3
*     └─ changed: disposition
-    crucible          226d25a7-c468-46e8-b983-36eb351ee7fe   in service    fd00:1122:3344:102::c   modified
+     ├─                                                      expunged      fd00:1122:3344:102::c
*     └─ changed: disposition
-    crucible          9d8a8f9d-f593-4b95-9174-c77f04473ce5   in service    fd00:1122:3344:102::b   modified
+     ├─                                                      expunged      fd00:1122:3344:102::b
*     └─ changed: disposition
-    crucible          a7fb3380-116d-4cfa-98ad-f65c1d5939cd   in service    fd00:1122:3344:102::9   modified
+     ├─                                                      expunged      fd00:1122:3344:102::9
*     └─ changed: disposition
-    crucible          bbabb6f1-0e87-439c-9f55-4c4920fafe47   in service    fd00:1122:3344:102::a   modified
+     ├─                                                      expunged      fd00:1122:3344:102::a
*     └─ changed: disposition
-    crucible          d2f17fa9-9e40-4339-b21c-e84f375ab1d3   in service    fd00:1122:3344:102::8   modified
+     ├─                                                      expunged      fd00:1122:3344:102::8
*     └─ changed: disposition
-    crucible_pantry   a7ffbb94-e638-4cd5-a48b-078535c0fb59   in service    fd00:1122:3344:102::7   modified
+     ├─                                                      expunged      fd00:1122:3344:102::7
*     └─ changed: disposition
-    internal_dns      f0af6245-df82-4039-96fa-3cb2bb753c56   in service    fd00:1122:3344:2::1     modified
+     ├─                                                      expunged      fd00:1122:3344:2::1
*     └─ changed: disposition
-    nexus             fdaf8c9c-b081-4fb5-a32a-cb23566baee5   in service    fd00:1122:3344:102::5   modified
+     ├─                                                      expunged      fd00:1122:3344:102::5
*     └─ changed: disposition
-    oximeter          51b6dbbc-c396-4ac2-8a6f-bdcae7c6ae62   in service    fd00:1122:3344:102::6   modified
+     ├─                                                      expunged      fd00:1122:3344:102::6
*     └─ changed: disposition

  METADATA:
    internal DNS version:  3 (unchanged)
    external DNS version:  2 (unchanged)

Set the new blueprint as the target

root@oxz_switch:~# omdb -w nexus blueprints target set  183b04ff-d5ea-48ef-aa8b-20fd71bb3c8c enabled
note: Nexus URL not specified.  Will pick one from DNS.
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using Nexus URL http://[fd00:1122:3344:103::5]:12221
set target blueprint to 183b04ff-d5ea-48ef-aa8b-20fd71bb3c8c

Show the new blueprint

root@oxz_switch:~# omdb nexus blueprints show current;
note: Nexus URL not specified.  Will pick one from DNS.
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using Nexus URL http://[fd00:1122:3344:103::5]:12221
blueprint  183b04ff-d5ea-48ef-aa8b-20fd71bb3c8c
parent:    b5b34652-ef51-4a53-ba87-7b41d10d7df5

  -----------------------------------------------------------------------------------------------
    zone type         zone ID                                disposition  underlay IP
  -----------------------------------------------------------------------------------------------

  sled 0dc22738-49db-458f-b56d-9607c10a5ed9: blueprint zones at generation 4
    crucible          24f1e0af-3753-4f18-9e59-00f957c04653   in service   fd00:1122:3344:121::26
    crucible          6299f241-da9e-4156-9470-8a313fb851f7   in service   fd00:1122:3344:121::25
    crucible          903203c1-d37e-430b-8377-608bc0c4e3ee   in service   fd00:1122:3344:121::24
    crucible          c5f8c4bb-88da-4f30-b80b-af6c1a6bc6be   in service   fd00:1122:3344:121::22
    crucible          fb448ef6-c8ca-476f-9f50-c2dd54f48e8e   in service   fd00:1122:3344:121::23
    internal_ntp      d6b4139d-1737-4ef0-b465-bcf929e3bed0   in service   fd00:1122:3344:121::21
    nexus             b7c4cb2e-17ed-4d09-a958-1b0446d22a46   in service   fd00:1122:3344:121::27

  sled 310448d5-599c-4071-a770-dae06a517a84: blueprint zones at generation 6
    boundary_ntp      9e34c4dc-861e-4af9-969c-62eec8c3fdbb   expunged     fd00:1122:3344:102::d
    cockroach_db      60bc290e-f564-4b3a-ae61-ebf4c714b394   expunged     fd00:1122:3344:102::4
    cockroach_db      f6be95c7-ed82-4783-838d-b052e0702f8c   expunged     fd00:1122:3344:102::3
    crucible          226d25a7-c468-46e8-b983-36eb351ee7fe   expunged     fd00:1122:3344:102::c
    crucible          9d8a8f9d-f593-4b95-9174-c77f04473ce5   expunged     fd00:1122:3344:102::b
    crucible          a7fb3380-116d-4cfa-98ad-f65c1d5939cd   expunged     fd00:1122:3344:102::9
    crucible          bbabb6f1-0e87-439c-9f55-4c4920fafe47   expunged     fd00:1122:3344:102::a
    crucible          d2f17fa9-9e40-4339-b21c-e84f375ab1d3   expunged     fd00:1122:3344:102::8
    crucible_pantry   a7ffbb94-e638-4cd5-a48b-078535c0fb59   expunged     fd00:1122:3344:102::7
    internal_dns      f0af6245-df82-4039-96fa-3cb2bb753c56   expunged     fd00:1122:3344:2::1
    nexus             fdaf8c9c-b081-4fb5-a32a-cb23566baee5   expunged     fd00:1122:3344:102::5
    oximeter          51b6dbbc-c396-4ac2-8a6f-bdcae7c6ae62   expunged     fd00:1122:3344:102::6

  sled 82f77114-57bd-4577-bfbb-83049a03a2e2: blueprint zones at generation 5
    boundary_ntp      d9b11802-df1b-4a66-9a03-8647bb63c4ee   in service   fd00:1122:3344:101::d
    cockroach_db      2a5d27fe-6953-46cb-926d-aff8fc9924d8   in service   fd00:1122:3344:101::3
    cockroach_db      efd73f7c-23fb-4896-8ede-eec55febfa16   in service   fd00:1122:3344:101::4
    crucible          35da43ae-ba2b-4815-af9e-5e2afdfd0679   in service   fd00:1122:3344:101::b
    crucible          483758cc-fe0d-4183-a5cd-ace82774d097   in service   fd00:1122:3344:101::c
    crucible          aae29eeb-63e7-487b-92fb-ad3e33c718ee   in service   fd00:1122:3344:101::8
    crucible          e22def46-9651-4821-b342-28e1ad2691d7   in service   fd00:1122:3344:101::9
    crucible          f20c2183-9563-414b-af94-eb3d2399e60d   in service   fd00:1122:3344:101::a
    crucible_pantry   05294ddd-e5b0-479b-ab48-646d6fa6ffd3   in service   fd00:1122:3344:101::7
    external_dns      4f4cbe27-29af-4398-8828-8a649865c660   in service   fd00:1122:3344:101::5
    internal_dns      46b6d4f6-6777-467f-9e9b-b428ce165ea4   in service   fd00:1122:3344:1::1
    nexus             ea0fd689-6879-4749-a13f-cc381c6a12d6   in service   fd00:1122:3344:101::6

  sled b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e: blueprint zones at generation 5
    clickhouse        0829ca7d-1ac5-4064-8096-1c56dfa00c3c   in service   fd00:1122:3344:103::6
    cockroach_db      b6df9285-5deb-497f-a12e-c65552291090   in service   fd00:1122:3344:103::3
    crucible          19e03ca6-0d9b-41f5-8264-84d4b14a3653   in service   fd00:1122:3344:103::b
    crucible          2b278e59-07bb-4240-8206-dbe4d2bb998a   in service   fd00:1122:3344:103::8
    crucible          a3ff2b82-6ffd-41d7-972f-9f3758c1ac07   in service   fd00:1122:3344:103::a
    crucible          b7f60873-f094-4627-9933-4484e7681ec4   in service   fd00:1122:3344:103::9
    crucible          dbc84500-a28c-4174-ad70-c48ccb9a5250   in service   fd00:1122:3344:103::c
    crucible_pantry   52c15e58-c127-4717-a850-da2399bffefd   in service   fd00:1122:3344:103::7
    external_dns      bb937922-3197-4a35-a8ad-1d762da2eb62   in service   fd00:1122:3344:103::4
    internal_dns      2e34e35d-24fa-4eb4-930d-9cac7d04286c   in service   fd00:1122:3344:3::1
    internal_ntp      c24b800b-ace7-468b-973a-675a86167356   in service   fd00:1122:3344:103::d
    nexus             04e51e6c-9215-44a7-b4cc-dedeee1d8180   in service   fd00:1122:3344:103::5

METADATA:
  created by:            04e51e6c-9215-44a7-b4cc-dedeee1d8180
  created at:            2024-04-20T03:44:16.246Z
  comment:               sled 310448d5-599c-4071-a770-dae06a517a84 (sled policy is expunged): 12 zones expunged
  internal DNS version:  3
  external DNS version:  2

Possible Bug

omdb db sleds should either mark an expunged sled as expunged or not show it at all

root@oxz_switch:~# omdb db sleds;
note: database URL not specified.  Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using database URL postgresql://root@[fd00:1122:3344:101::3]:32221,[fd00:1122:3344:102::4]:32221,[fd00:1122:3344:103::3]:32221,[fd00:1122:3344:101::4]:32221,[fd00:1122:3344:102::3]:32221/omicron?sslmode=disable
note: database schema version matches expected (53.0.0)
SERIAL IP                            ROLE     ID
g2     [fd00:1122:3344:121::1]:12345 -        0dc22738-49db-458f-b56d-9607c10a5ed9
g1     [fd00:1122:3344:102::1]:12345 -        310448d5-599c-4071-a770-dae06a517a84
g0     [fd00:1122:3344:101::1]:12345 scrimlet 82f77114-57bd-4577-bfbb-83049a03a2e2
g3     [fd00:1122:3344:103::1]:12345 scrimlet b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e

Show zones for the latest blueprint

root@[fd00:1122:3344:101::3]:32221/omicron> select sled_id, id, zone_type, disposition from bp_omicron_zone where blueprint_id = '183b04ff-d5ea-48ef-aa8b-20fd71bb3c8c';
                sled_id                |                  id                  |    zone_type    | disposition
---------------------------------------+--------------------------------------+-----------------+--------------
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | 04e51e6c-9215-44a7-b4cc-dedeee1d8180 | nexus           | in_service
  82f77114-57bd-4577-bfbb-83049a03a2e2 | 05294ddd-e5b0-479b-ab48-646d6fa6ffd3 | crucible_pantry | in_service
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | 0829ca7d-1ac5-4064-8096-1c56dfa00c3c | clickhouse      | in_service
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | 19e03ca6-0d9b-41f5-8264-84d4b14a3653 | crucible        | in_service
  310448d5-599c-4071-a770-dae06a517a84 | 226d25a7-c468-46e8-b983-36eb351ee7fe | crucible        | expunged
  0dc22738-49db-458f-b56d-9607c10a5ed9 | 24f1e0af-3753-4f18-9e59-00f957c04653 | crucible        | in_service
  82f77114-57bd-4577-bfbb-83049a03a2e2 | 2a5d27fe-6953-46cb-926d-aff8fc9924d8 | cockroach_db    | in_service
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | 2b278e59-07bb-4240-8206-dbe4d2bb998a | crucible        | in_service
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | 2e34e35d-24fa-4eb4-930d-9cac7d04286c | internal_dns    | in_service
  82f77114-57bd-4577-bfbb-83049a03a2e2 | 35da43ae-ba2b-4815-af9e-5e2afdfd0679 | crucible        | in_service
  82f77114-57bd-4577-bfbb-83049a03a2e2 | 46b6d4f6-6777-467f-9e9b-b428ce165ea4 | internal_dns    | in_service
  82f77114-57bd-4577-bfbb-83049a03a2e2 | 483758cc-fe0d-4183-a5cd-ace82774d097 | crucible        | in_service
  82f77114-57bd-4577-bfbb-83049a03a2e2 | 4f4cbe27-29af-4398-8828-8a649865c660 | external_dns    | in_service
  310448d5-599c-4071-a770-dae06a517a84 | 51b6dbbc-c396-4ac2-8a6f-bdcae7c6ae62 | oximeter        | expunged
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | 52c15e58-c127-4717-a850-da2399bffefd | crucible_pantry | in_service
  310448d5-599c-4071-a770-dae06a517a84 | 60bc290e-f564-4b3a-ae61-ebf4c714b394 | cockroach_db    | expunged
  0dc22738-49db-458f-b56d-9607c10a5ed9 | 6299f241-da9e-4156-9470-8a313fb851f7 | crucible        | in_service
  0dc22738-49db-458f-b56d-9607c10a5ed9 | 903203c1-d37e-430b-8377-608bc0c4e3ee | crucible        | in_service
  310448d5-599c-4071-a770-dae06a517a84 | 9d8a8f9d-f593-4b95-9174-c77f04473ce5 | crucible        | expunged
  310448d5-599c-4071-a770-dae06a517a84 | 9e34c4dc-861e-4af9-969c-62eec8c3fdbb | boundary_ntp    | expunged
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | a3ff2b82-6ffd-41d7-972f-9f3758c1ac07 | crucible        | in_service
  310448d5-599c-4071-a770-dae06a517a84 | a7fb3380-116d-4cfa-98ad-f65c1d5939cd | crucible        | expunged
  310448d5-599c-4071-a770-dae06a517a84 | a7ffbb94-e638-4cd5-a48b-078535c0fb59 | crucible_pantry | expunged
  82f77114-57bd-4577-bfbb-83049a03a2e2 | aae29eeb-63e7-487b-92fb-ad3e33c718ee | crucible        | in_service
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | b6df9285-5deb-497f-a12e-c65552291090 | cockroach_db    | in_service
  0dc22738-49db-458f-b56d-9607c10a5ed9 | b7c4cb2e-17ed-4d09-a958-1b0446d22a46 | nexus           | in_service
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | b7f60873-f094-4627-9933-4484e7681ec4 | crucible        | in_service
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | bb937922-3197-4a35-a8ad-1d762da2eb62 | external_dns    | in_service
  310448d5-599c-4071-a770-dae06a517a84 | bbabb6f1-0e87-439c-9f55-4c4920fafe47 | crucible        | expunged
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | c24b800b-ace7-468b-973a-675a86167356 | internal_ntp    | in_service
  0dc22738-49db-458f-b56d-9607c10a5ed9 | c5f8c4bb-88da-4f30-b80b-af6c1a6bc6be | crucible        | in_service
  310448d5-599c-4071-a770-dae06a517a84 | d2f17fa9-9e40-4339-b21c-e84f375ab1d3 | crucible        | expunged
  0dc22738-49db-458f-b56d-9607c10a5ed9 | d6b4139d-1737-4ef0-b465-bcf929e3bed0 | internal_ntp    | in_service
  82f77114-57bd-4577-bfbb-83049a03a2e2 | d9b11802-df1b-4a66-9a03-8647bb63c4ee | boundary_ntp    | in_service
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | dbc84500-a28c-4174-ad70-c48ccb9a5250 | crucible        | in_service
  82f77114-57bd-4577-bfbb-83049a03a2e2 | e22def46-9651-4821-b342-28e1ad2691d7 | crucible        | in_service
  82f77114-57bd-4577-bfbb-83049a03a2e2 | ea0fd689-6879-4749-a13f-cc381c6a12d6 | nexus           | in_service
  82f77114-57bd-4577-bfbb-83049a03a2e2 | efd73f7c-23fb-4896-8ede-eec55febfa16 | cockroach_db    | in_service
  310448d5-599c-4071-a770-dae06a517a84 | f0af6245-df82-4039-96fa-3cb2bb753c56 | internal_dns    | expunged
  82f77114-57bd-4577-bfbb-83049a03a2e2 | f20c2183-9563-414b-af94-eb3d2399e60d | crucible        | in_service
  310448d5-599c-4071-a770-dae06a517a84 | f6be95c7-ed82-4783-838d-b052e0702f8c | cockroach_db    | expunged
  0dc22738-49db-458f-b56d-9607c10a5ed9 | fb448ef6-c8ca-476f-9f50-c2dd54f48e8e | crucible        | in_service
  310448d5-599c-4071-a770-dae06a517a84 | fdaf8c9c-b081-4fb5-a32a-cb23566baee5 | nexus           | expunged
(43 rows)

Sled g1 is not yet decommissioned.

This is because we didn't complete gc yet. See #5552

root@[fd00:1122:3344:101::3]:32221/omicron> select id, time_modified, rcgen, serial_number, sled_policy, sled_state, sled_agent_gen from sled;
                   id                  |         time_modified         | rcgen | serial_number | sled_policy | sled_state | sled_agent_gen
---------------------------------------+-------------------------------+-------+---------------+-------------+------------+-----------------
  0dc22738-49db-458f-b56d-9607c10a5ed9 | 2024-04-20 00:40:52.492116+00 |    11 | g2            | in_service  | active     |              1
  310448d5-599c-4071-a770-dae06a517a84 | 2024-04-20 03:23:54.238733+00 |    11 | g1            | expunged    | active     |              1
  82f77114-57bd-4577-bfbb-83049a03a2e2 | 2024-04-20 00:20:04.850332+00 |    11 | g0            | in_service  | active     |              1
  b66c14cd-a6e3-4e0b-8723-2c6c8c558a8e | 2024-04-20 00:20:17.571108+00 |    11 | g3            | in_service  | active     |              1
(4 rows)

BUG: Nexus is not running on the added sled g2 after expunge of g1

root@oxz_switch:~#  omdb sled-agent --sled-agent-url http://[fd00:1122:3344:121::1]:12345 zones list
zones:
    "oxz_crucible_24f1e0af-3753-4f18-9e59-00f957c04653"
    "oxz_crucible_6299f241-da9e-4156-9470-8a313fb851f7"
    "oxz_crucible_903203c1-d37e-430b-8377-608bc0c4e3ee"
    "oxz_crucible_c5f8c4bb-88da-4f30-b80b-af6c1a6bc6be"
    "oxz_crucible_fb448ef6-c8ca-476f-9f50-c2dd54f48e8e"
    "oxz_ntp_d6b4139d-1737-4ef0-b465-bcf929e3bed0"

Double checked by logging into g2

root@g2:~# zoneadm list
global
oxz_ntp_d6b4139d-1737-4ef0-b465-bcf929e3bed0
oxz_crucible_6299f241-da9e-4156-9470-8a313fb851f7
oxz_crucible_24f1e0af-3753-4f18-9e59-00f957c04653
oxz_crucible_fb448ef6-c8ca-476f-9f50-c2dd54f48e8e
oxz_crucible_c5f8c4bb-88da-4f30-b80b-af6c1a6bc6be
oxz_crucible_903203c1-d37e-430b-8377-608bc0c4e3ee

Looks like the OmicronPhysicalDisksEnsure is making it to sled-agent on g2

04:10:53.822Z INFO SledAgent (StorageManager): Received OmicronPhysicalDisksEnsure { config: OmicronPhysicalDisksConfig { generation: Generation(1), disks: [OmicronPhysicalDiskConfig { identity: DiskIdentity { vendor: "synthetic-vendor", serial: "synthetic-serial-g2_0", model: "synthetic-model-U2" }, id: 423f152a-4763-4b7d-8341-ae5d46eb61be, pool_id: 843675c2-0a83-4687-aebd-04a085841705 (zpool) }, OmicronPhysicalDiskConfig { identity: DiskIdentity { vendor: "synthetic-vendor", serial: "synthetic-serial-g2_4", model: "synthetic-model-U2" }, id: 47c0a7d2-4132-468d-8a65-9982b48082a6, pool_id: 608d6bb7-7c3d-4c51-a69e-5909e466fff1 (zpool) }, OmicronPhysicalDiskConfig { identity: DiskIdentity { vendor: "synthetic-vendor", serial: "synthetic-serial-g2_2", model: "synthetic-model-U2" }, id: 74d39038-7610-4289-8008-af85d5598d2d, pool_id: fcba180e-dba2-493c-9ecf-4026ad38fb78 (zpool) }, OmicronPhysicalDiskConfig { identity: DiskIdentity { vendor: "synthetic-vendor", serial: "synthetic-serial-g2_1", model: "synthetic-model-U2" }, id: 844794cd-18be-4d46-acfe-0d8acee6f9ab, pool_id: 1b0749e8-6b46-44ef-b5dd-a9a09001b2b1 (zpool) }, OmicronPhysicalDiskConfig { identity: DiskIdentity { vendor: "synthetic-vendor", serial: "synthetic-serial-g2_3", model: "synthetic-model-U2" }, id: 8ab75c65-1fa0-499a-81da-c80d93d1269f, pool_id: 1eafa5bd-3bbf-4c34-9a90-cd36f4bedee8 (zpool) }] }, tx: ... }
    file = sled-storage/src/manager.rs:381
04:10:53.822Z INFO SledAgent (StorageManager): Reading ledger from /pool/int/7eee11ad-4e8b-470a-857a-5573e8cbe0f1/config/omicron-physical-disks.json
    file = common/src/ledger.rs:177
    request = omicron_physical_disks_ensure
04:10:53.823Z INFO SledAgent (StorageManager): Reading ledger from /pool/int/b8a33aa5-c26f-4a64-8575-2ac14be40465/config/omicron-physical-disks.json
    file = common/src/ledger.rs:177
    request = omicron_physical_disks_ensure
04:10:53.823Z INFO SledAgent (StorageManager): Comparing 'requested disks' to ledger on internal storage
    file = sled-storage/src/manager.rs:658
    request = omicron_physical_disks_ensure
04:10:53.823Z INFO SledAgent (StorageManager): Request looks newer than prior requests
    file = sled-storage/src/manager.rs:677
    request = omicron_physical_disks_ensure
04:10:53.823Z INFO SledAgent (StorageResources): Synchronizing disk managment
    file = sled-storage/src/resources.rs:339
04:10:53.823Z INFO SledAgent (StorageResources): Managing disk
    disk_identity = DiskIdentity { vendor: "synthetic-vendor", serial: "synthetic-serial-g2_0", model: "synthetic-model-U2" }
    file = sled-storage/src/resources.rs:379
04:10:53.823Z INFO SledAgent (StorageResources): Disk already managed successfully
    disk_identity = DiskIdentity { vendor: "synthetic-vendor", serial: "synthetic-serial-g2_0", model: "synthetic-model-U2" }
    file = sled-storage/src/resources.rs:428
04:10:53.823Z INFO SledAgent (StorageResources): Managing disk
    disk_identity = DiskIdentity { vendor: "synthetic-vendor", serial: "synthetic-serial-g2_1", model: "synthetic-model-U2" }
    file = sled-storage/src/resources.rs:379
04:10:53.823Z INFO SledAgent (StorageResources): Disk already managed successfully
    disk_identity = DiskIdentity { vendor: "synthetic-vendor", serial: "synthetic-serial-g2_1", model: "synthetic-model-U2" }
    file = sled-storage/src/resources.rs:428
04:10:53.823Z INFO SledAgent (StorageResources): Managing disk
    disk_identity = DiskIdentity { vendor: "synthetic-vendor", serial: "synthetic-serial-g2_2", model: "synthetic-model-U2" }
    file = sled-storage/src/resources.rs:379
04:10:53.823Z INFO SledAgent (StorageResources): Disk already managed successfully
    disk_identity = DiskIdentity { vendor: "synthetic-vendor", serial: "synthetic-serial-g2_2", model: "synthetic-model-U2" }
    file = sled-storage/src/resources.rs:428
04:10:53.823Z INFO SledAgent (StorageResources): Managing disk
    disk_identity = DiskIdentity { vendor: "synthetic-vendor", serial: "synthetic-serial-g2_3", model: "synthetic-model-U2" }
    file = sled-storage/src/resources.rs:379
04:10:53.823Z INFO SledAgent (StorageResources): Disk already managed successfully
    disk_identity = DiskIdentity { vendor: "synthetic-vendor", serial: "synthetic-serial-g2_3", model: "synthetic-model-U2" }
    file = sled-storage/src/resources.rs:428
04:10:53.823Z INFO SledAgent (StorageResources): Managing disk
    disk_identity = DiskIdentity { vendor: "synthetic-vendor", serial: "synthetic-serial-g2_4", model: "synthetic-model-U2" }
    file = sled-storage/src/resources.rs:379
04:10:53.823Z INFO SledAgent (StorageResources): Disk already managed successfully
    disk_identity = DiskIdentity { vendor: "synthetic-vendor", serial: "synthetic-serial-g2_4", model: "synthetic-model-U2" }
    file = sled-storage/src/resources.rs:428
04:10:53.899Z INFO SledAgent (dropshot (SledAgent)): request completed
    file = /home/ajs/.cargo/git/checkouts/dropshot-a4a923d29dccc492/29ae98d/dropshot/src/server.rs:849
    latency_us = 76852
    local_addr = [fd00:1122:3344:121::1]:12345
    method = PUT
    remote_addr = [fd00:1122:3344:103::5]:38172
    req_id = 77934a94-2e18-4813-8975-c3c9bbeaada0
    response_code = 200
    uri = /omicron-physical-disks

Looks like there is an error in blueprint executor trying to send to sled g1, which is expunged

task: "blueprint_executor"
  configured period: every 1m
  currently executing: iter 241, triggered by a periodic timer firing
    started at 2024-04-20T04:13:54.571Z, running for 1345ms
  last completed activation: iter 240, triggered by a periodic timer firing
    started at 2024-04-20T04:12:54.253Z (61s ago) and ran for 60322ms
warning: unknown background task: "blueprint_executor" (don't know how to interpret details: Object {"errors": Array [String("Failed to put OmicronPhysicalDisksConfig {\n    disks: [\n        OmicronPhysicalDiskConfig {\n            id: 527e1766-02ee-4788-961c-36ed8a947d9d,\n            identity: DiskIdentity {\n                vendor: \"synthetic-vendor\",\n                serial: \"synthetic-serial-g1_3\",\n                model: \"synthetic-model-U2\",\n            },\n            pool_id: 7b7b1abc-ab5a-4da4-8060-58452b1114c2 (zpool),\n        },\n        OmicronPhysicalDiskConfig {\n            id: 82a61ae4-43d5-4576-905f-361f073d4af8,\n            identity: DiskIdentity {\n                vendor: \"synthetic-vendor\",\n                serial: \"synthetic-serial-g1_2\",\n                model: \"synthetic-model-U2\",\n            },\n            pool_id: cc647fab-7546-449c-9933-2f641951e7db (zpool),\n        },\n        OmicronPhysicalDiskConfig {\n            id: a3b3bdab-e174-497b-a10b-82eb4ceca27c,\n            identity: DiskIdentity {\n                vendor: \"synthetic-vendor\",\n                serial: \"synthetic-serial-g1_0\",\n                model: \"synthetic-model-U2\",\n            },\n            pool_id: 2cd3d214-5c42-4426-9a64-1e7e7d4b8e55 (zpool),\n        },\n        OmicronPhysicalDiskConfig {\n            id: bb1869d7-316e-42d2-8a87-567dcfd46d1a,\n            identity: DiskIdentity {\n                vendor: \"synthetic-vendor\",\n                serial: \"synthetic-serial-g1_4\",\n                model: \"synthetic-model-U2\",\n            },\n            pool_id: 68249ca2-8439-44b6-b97d-f4086a563ae0 (zpool),\n        },\n        OmicronPhysicalDiskConfig {\n            id: e3c5f704-0178-405b-a7c3-72ff69ee1d60,\n            identity: DiskIdentity {\n                vendor: \"synthetic-vendor\",\n                serial: \"synthetic-serial-g1_1\",\n                model: \"synthetic-model-U2\",\n            },\n            pool_id: 
71ad7932-0734-4da7-b1b7-0a848ec24f5e (zpool),\n        },\n    ],\n    generation: Generation(\n        1,\n    ),\n} to sled 310448d5-599c-4071-a770-dae06a517a84: Communication Error: error sending request for url (http://[fd00:1122:3344:102::1]:12345/omicron-physical-disks): operation timed out: error sending request for url (http://[fd00:1122:3344:102::1]:12345/omicron-physical-disks): operation timed out: operation timed out")], "target_id": String("183b04ff-d5ea-48ef-aa8b-20fd71bb3c8c")})

Lots of errors in nexus on g0

root@g0:~# cat /pool/ext/513d0071-19c4-432c-8357-01173e02ee56/crypt/zone/oxz_nexus_ea0fd689-6879-4749-a13f-cc381c6a12d6/root/var/svc/log/oxide-nexus:default.log | looker | grep 'Failed to put'
03:45:16.831Z WARN ea0fd689-6879-4749-a13f-cc381c6a12d6 (ServerContext): Failed to put OmicronPhysicalDisksConfig {
03:46:17.075Z WARN ea0fd689-6879-4749-a13f-cc381c6a12d6 (ServerContext): Failed to put OmicronPhysicalDisksConfig {
03:47:17.311Z WARN ea0fd689-6879-4749-a13f-cc381c6a12d6 (ServerContext): Failed to put OmicronPhysicalDisksConfig {
03:48:17.535Z WARN ea0fd689-6879-4749-a13f-cc381c6a12d6 (ServerContext): Failed to put OmicronPhysicalDisksConfig {
03:49:17.752Z WARN ea0fd689-6879-4749-a13f-cc381c6a12d6 (ServerContext): Failed to put OmicronPhysicalDisksConfig {
03:50:18.013Z WARN ea0fd689-6879-4749-a13f-cc381c6a12d6 (ServerContext): Failed to put OmicronPhysicalDisksConfig {
03:51:18.298Z WARN ea0fd689-6879-4749-a13f-cc381c6a12d6 (ServerContext): Failed to put OmicronPhysicalDisksConfig {
03:52:18.550Z WARN ea0fd689-6879-4749-a13f-cc381c6a12d6 (ServerContext): Failed to put OmicronPhysicalDisksConfig {
03:53:18.785Z WARN ea0fd689-6879-4749-a13f-cc381c6a12d6 (ServerContext): Failed to put OmicronPhysicalDisksConfig {
03:54:19.001Z WARN ea0fd689-6879-4749-a13f-cc381c6a12d6 (ServerContext): Failed to put OmicronPhysicalDisksConfig {
03:55:19.220Z WARN ea0fd689-6879-4749-a13f-cc381c6a12d6 (ServerContext): Failed to put OmicronPhysicalDisksConfig {
03:56:19.465Z WARN ea0fd689-6879-4749-a13f-cc381c6a12d6 (ServerContext): Failed to put OmicronPhysicalDisksConfig {
03:57:19.688Z WARN ea0fd689-6879-4749-a13f-cc381c6a12d6 (ServerContext): Failed to put OmicronPhysicalDisksConfig {
03:58:19.900Z WARN ea0fd689-6879-4749-a13f-cc381c6a12d6 (ServerContext): Failed to put OmicronPhysicalDisksConfig {
03:59:20.107Z WARN ea0fd689-6879-4749-a13f-cc381c6a12d6 (ServerContext): Failed to put OmicronPhysicalDisksConfig {
04:00:20.307Z WARN ea0fd689-6879-4749-a13f-cc381c6a12d6 (ServerContext): Failed to put OmicronPhysicalDisksConfig {
04:01:20.519Z WARN ea0fd689-6879-4749-a13f-cc381c6a12d6 (ServerContext): Failed to put OmicronPhysicalDisksConfig {
04:02:20.774Z WARN ea0fd689-6879-4749-a13f-cc381c6a12d6 (ServerContext): Failed to put OmicronPhysicalDisksConfig {
04:03:20.999Z WARN ea0fd689-6879-4749-a13f-cc381c6a12d6 (ServerContext): Failed to put OmicronPhysicalDisksConfig {
04:04:21.216Z WARN ea0fd689-6879-4749-a13f-cc381c6a12d6 (ServerContext): Failed to put OmicronPhysicalDisksConfig {
04:05:21.434Z WARN ea0fd689-6879-4749-a13f-cc381c6a12d6 (ServerContext): Failed to put OmicronPhysicalDisksConfig {
04:06:21.637Z WARN ea0fd689-6879-4749-a13f-cc381c6a12d6 (ServerContext): Failed to put OmicronPhysicalDisksConfig {
04:07:21.873Z WARN ea0fd689-6879-4749-a13f-cc381c6a12d6 (ServerContext): Failed to put OmicronPhysicalDisksConfig {
04:08:22.097Z WARN ea0fd689-6879-4749-a13f-cc381c6a12d6 (ServerContext): Failed to put OmicronPhysicalDisksConfig {
04:09:22.313Z WARN ea0fd689-6879-4749-a13f-cc381c6a12d6 (ServerContext): Failed to put OmicronPhysicalDisksConfig {
04:10:22.537Z WARN ea0fd689-6879-4749-a13f-cc381c6a12d6 (ServerContext): Failed to put OmicronPhysicalDisksConfig {
04:11:22.828Z WARN ea0fd689-6879-4749-a13f-cc381c6a12d6 (ServerContext): Failed to put OmicronPhysicalDisksConfig {
04:12:23.063Z WARN ea0fd689-6879-4749-a13f-cc381c6a12d6 (ServerContext): Failed to put OmicronPhysicalDisksConfig {

I'm pretty sure what's happening is that we aren't even getting to the point to try to deploy zones, as we bail after failing to deploy disks to the expunged sled

See:

omicron_physical_disks::deploy_disks(
    &opctx,
    &sleds_by_id,
    &blueprint.blueprint_disks,
)
.await?;
omicron_zones::deploy_zones(
    &opctx,
    &sleds_by_id,
    &blueprint.blueprint_zones,
)
.await?;
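
To make the suspected failure mode concrete, here is a minimal synchronous sketch. The functions below are hypothetical stand-ins, not the real omicron functions (which are async and take an opctx, the sled map, and blueprint data). It only shows that `?` after the disk step returns before the zone step is attempted, and what a "run every step, collect all errors" variant might look like.

```rust
// Hypothetical stand-ins for the two execution steps. The real functions
// are async and take an OpContext, the sled map, and blueprint data.
fn deploy_disks(unreachable_sled: bool) -> Result<(), Vec<String>> {
    if unreachable_sled {
        Err(vec!["disks: operation timed out".to_string()])
    } else {
        Ok(())
    }
}

fn deploy_zones(log: &mut Vec<&'static str>) -> Result<(), Vec<String>> {
    // Record that the zone step actually ran.
    log.push("zones deployed");
    Ok(())
}

/// Same shape as the snippet above: `?` returns on the first error,
/// so the zone step never runs when the disk step fails.
fn execute_bail_early(
    unreachable_sled: bool,
    log: &mut Vec<&'static str>,
) -> Result<(), Vec<String>> {
    deploy_disks(unreachable_sled)?;
    deploy_zones(log)?;
    Ok(())
}

/// Alternative: run every step and report all errors at the end.
fn execute_collect_errors(
    unreachable_sled: bool,
    log: &mut Vec<&'static str>,
) -> Result<(), Vec<String>> {
    let mut errors = Vec::new();
    if let Err(e) = deploy_disks(unreachable_sled) {
        errors.extend(e);
    }
    if let Err(e) = deploy_zones(log) {
        errors.extend(e);
    }
    if errors.is_empty() {
        Ok(())
    } else {
        Err(errors)
    }
}
```

With an unreachable sled, `execute_bail_early` returns an error and leaves the log empty, while `execute_collect_errors` still returns the disk error but also runs the zone step. Whether later steps are safe to run after an earlier failure is a separate design question that this sketch does not answer.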

@jgallagher
Contributor

Thanks for collecting all of this! Just a couple of quick drive-by thoughts from a first read:

Bug?

Looking at the collections we can see that the collection from sled g1, which we expunged is never garbage collected.

Maybe not - do all the later collections report errors? The collection pruner always keeps around the most recent collection with 0 errors, so if all the new collections have errors, you'll see one old one stick around until a new collection with no errors arrives.
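
As a rough illustration of that retention rule, here is a sketch. It is not the actual pruner: the `CollectionSummary` type, the `keep_latest` parameter, and the function name are all assumptions made for the example. It keeps the newest `keep_latest` collections plus the newest collection with 0 errors, however old that one is.

```rust
/// Hypothetical summary of an inventory collection, for illustration only.
#[derive(Debug, Clone, PartialEq)]
struct CollectionSummary {
    id: u32,
    /// Larger means newer.
    time_done: u64,
    nerrors: usize,
}

/// Returns the ids a pruner following the rule above would delete:
/// everything except the newest `keep_latest` collections and the newest
/// collection that finished with zero errors.
fn collections_to_prune(
    collections: &[CollectionSummary],
    keep_latest: usize,
) -> Vec<u32> {
    let mut sorted: Vec<&CollectionSummary> = collections.iter().collect();
    // Newest first.
    sorted.sort_by(|a, b| b.time_done.cmp(&a.time_done));

    // The newest error-free collection is always retained.
    let newest_clean = sorted.iter().find(|c| c.nerrors == 0).map(|c| c.id);

    sorted
        .iter()
        .enumerate()
        .filter(|(idx, c)| *idx >= keep_latest && Some(c.id) != newest_clean)
        .map(|(_, c)| c.id)
        .collect()
}
```

If every collection after the expunge reports an error (for example, a timeout talking to the expunged sled), the old clean collection is never in the returned list, which would match an old collection appearing to stick around.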

I'm pretty sure what's happening is that we aren't even getting to the point to try to deploy zones, as we bail after failing to deploy disks to the expunged sled

"What should the executor do on failure" is a fair question and not something we're handling robustly today at all, but I think I'd prioritize "why is the executor trying to talk to an expunged sled at all"; by definition, that is going to fail, right?
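
One way to picture "don't talk to expunged sleds at all" is to filter the sled map by policy before any request is sent. This is only a sketch under assumptions: the `Sled` struct, the `SledPolicy` enum, and `sleds_to_contact` are hypothetical names for the example, modeled on the sled_policy column values (in_service, expunged) shown in the sled table above.

```rust
use std::collections::BTreeMap;

/// Mirrors the two sled_policy values seen in the sled table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SledPolicy {
    InService,
    Expunged,
}

/// Hypothetical, trimmed-down sled record for illustration.
#[derive(Debug, Clone)]
struct Sled {
    serial: String,
    policy: SledPolicy,
}

/// Returns the serials of the sleds the executor should still contact,
/// skipping any sled whose policy is expunged.
fn sleds_to_contact(sleds_by_id: &BTreeMap<String, Sled>) -> Vec<&str> {
    sleds_by_id
        .values()
        .filter(|sled| sled.policy != SledPolicy::Expunged)
        .map(|sled| sled.serial.as_str())
        .collect()
}
```

Applied to the four sleds in this test (g1 expunged), this yields g2, g0 and g3 in sled ID order, so no request would be attempted against g1.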

@sunshowers
Contributor

sunshowers commented Apr 23, 2024

notes from test 2024-04-23

  • omdb should have a way to list all sleds, not just uninitialized ones (can be done with omdb db sleds, added the ability to filter in [omdb] show sled policy and state, allow application of filter #5620)
  • omdb should have a way to show details of an inventory collection (is there a way to do this already? Update: there is)
  • omdb should let you trigger an inventory collection immediately: [omdb] add basic support for activating background tasks #5615
  • expunged Nexus zone was not removed from service_network_interface table, and still had an external IP attached to it
  • nexuses should stagger inventory collections maybe? not hugely important if for testing we can trigger an inventory collection immediately
  • nexus couldn't perform new collections due to a bb8 timeout: restarting the Nexus worked (is bb8 holding open connections to expunged cockroach?). But note that we're not targeting cockroach zone expungement for r8.
  • nexus collection background task took 60+ seconds because it couldn't reach out to the expunged sled-agent. (Note that it didn't fail, it logged an error.) We should not collect inventory from expunged sleds?
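
One way to picture the "don't talk to expunged sleds" idea from the last bullet (and from the executor discussion earlier) is to filter the sled map by policy before contacting any sled-agent. The sketch below is only that, a sketch: `SledPolicy`, `Sled`, and `sleds_to_contact` are simplified, hypothetical stand-ins, and the integer keys replace the real sled UUIDs.

```rust
use std::collections::BTreeMap;

/// Simplified policy; an assumption for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SledPolicy {
    InService,
    Expunged,
}

/// Minimal sled record (hypothetical fields).
#[derive(Debug, Clone, PartialEq, Eq)]
struct Sled {
    serial: String,
    policy: SledPolicy,
}

/// Returns only the sleds that are worth contacting, dropping any
/// sled whose policy is `Expunged`.
fn sleds_to_contact(sleds_by_id: &BTreeMap<u32, Sled>) -> BTreeMap<u32, &Sled> {
    sleds_by_id
        .iter()
        .filter(|(_, sled)| sled.policy != SledPolicy::Expunged)
        .map(|(id, sled)| (*id, sled))
        .collect()
}
```

With a filter like this applied up front, neither inventory collection nor blueprint execution would wait on a timeout from a sled that is, by definition, gone.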

@davepacheco
Collaborator

Nice!

expunged Nexus zone was not removed from service_network_interface table, and still had an external IP attached to it

Presumably #5203?

nexuses should stagger inventory collections maybe? not hugely important if for testing we can trigger an inventory collection immediately

Some discussion in #5296.

@sunshowers
Contributor

expunged Nexus zone was not removed from service_network_interface table, and still had an external IP attached to it

Presumably #5203?

I believe so yeah. Just wanted to make a note of it.

@andrewjstone
Contributor Author

Running sleds expunge and sleds list-uninitialized failed after the sled to be expunged was hyperstopped in a4x2. The Nexus client is not rotating which nexus it is talking to and continues to try to talk to the sled that is stopped.

root@oxz_switch:~# omdb -w nexus sleds list-uninitialized
note: Nexus URL not specified.  Will pick one from DNS.
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using Nexus URL http://[fd00:1122:3344:102::5]:12221
Error: listing uninitialized sleds

Caused by:
    0: Communication Error: error sending request for url (http://[fd00:1122:3344:102::5]:12221/sleds/uninitialized): operation timed out
    1: error sending request for url (http://[fd00:1122:3344:102::5]:12221/sleds/uninitialized): operation timed out
    2: operation timed out
root@oxz_switch:~# omdb db sleds
note: database URL not specified.  Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using database URL postgresql://root@[fd00:1122:3344:103::3]:32221,[fd00:1122:3344:102::4]:32221,[fd00:1122:3344:101::3]:32221,[fd00:1122:3344:102::3]:32221,[fd00:1122:3344:101::4]:32221/omicron?sslmode=disable
note: database schema version matches expected (54.0.0)
SERIAL IP                            ROLE     ID
g3     [fd00:1122:3344:103::1]:12345 scrimlet 3d0a399d-4a55-457b-bfee-cc5170486b19
g0     [fd00:1122:3344:101::1]:12345 scrimlet 53364acd-8829-4cd2-bb76-48b6be04babe
g1     [fd00:1122:3344:102::1]:12345 -        66fbff35-1daf-4d29-ad17-2f7cd71f34a4
g2     [fd00:1122:3344:121::1]:12345 -        8f0525a1-3fda-4969-a6e6-18e938965cdd
root@oxz_switch:~# omdb -w nexus sleds expunge  66fbff35-1daf-4d29-ad17-2f7cd71f34a4
note: Nexus URL not specified.  Will pick one from DNS.
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using Nexus URL http://[fd00:1122:3344:102::5]:12221
note: database URL not specified.  Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using database URL postgresql://root@[fd00:1122:3344:103::3]:32221,[fd00:1122:3344:102::4]:32221,[fd00:1122:3344:101::3]:32221,[fd00:1122:3344:102::3]:32221,[fd00:1122:3344:101::4]:32221/omicron?sslmode=disable
note: database schema version matches expected (54.0.0)
WARNING: sled 66fbff35-1daf-4d29-ad17-2f7cd71f34a4 is PRESENT in the most recent inventory collection (spotted at 2024-04-24 20:00:30.997452 UTC). It is dangerous to expunge a sled that is still running. Are you sure you want to proceed anyway?
y/N〉y
WARNING: This operation will PERMANENTLY and IRRECOVABLY mark sled 66fbff35-1daf-4d29-ad17-2f7cd71f34a4 (g1) expunged. To proceed, type the sled's serial number.
sled serial number〉g1
Error: expunging sled

Caused by:
    0: Communication Error: error sending request for url (http://[fd00:1122:3344:102::5]:12221/sleds/expunge): operation timed out
    1: error sending request for url (http://[fd00:1122:3344:102::5]:12221/sleds/expunge): operation timed out
    2: operation timed out

@davepacheco
Collaborator

It looks like these errors are coming from omdb. If that's the case, you should be able to work around it by pointing omdb at a specific Nexus instance (using OMDB_NEXUS_URL, I think it is). I expect what's happening here is we've never done the cueball-like work for DNS and connection pooling. Instead, every time a code path needs to make a request (as omdb does here), it gets a new connection to any of the IPs that are found in DNS for that service. If it gets unlucky and grabs one on the sled that's gone, you'll get an error like this.

For omdb, this is easy to work around. I'm not clear on whether there's some risk that we hit this inside Nexus during the expungement process. If so, that'll be harder to work around. It should generally work to retry, provided the APIs are idempotent.
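
As a sketch of what a client could do instead of committing to one address: walk the list of addresses found in DNS and return the first response that succeeds. This is not how omdb or the Nexus client work today, and `try_each_address` is a hypothetical helper; the `request` closure stands in for the real HTTP call.

```rust
/// Tries `request` against each address in order and returns the first
/// success. If every address fails, returns each address with its error.
fn try_each_address<T, E>(
    addrs: &[&str],
    mut request: impl FnMut(&str) -> Result<T, E>,
) -> Result<T, Vec<(String, E)>> {
    let mut failures = Vec::new();
    for &addr in addrs {
        match request(addr) {
            Ok(value) => return Ok(value),
            Err(err) => failures.push((addr.to_string(), err)),
        }
    }
    Err(failures)
}
```

This only helps when the request is safe to send more than once, which is the same idempotency caveat as for plain retries.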

@andrewjstone
Contributor Author

It looks like these errors are coming from omdb. If that's the case, you should be able to work around it by pointing omdb at a specific Nexus instance (using OMDB_NEXUS_URL, I think it is). I expect what's happening here is we've never done the cueball-like work for DNS and connection pooling. Instead, every time a code path needs to make a request (as omdb does here), it gets a new connection to any of the IPs that are found in DNS for that service. If it gets unlucky and grabs one on the sled that's gone, you'll get an error like this.

For omdb, this is easy to work around. I'm not clear on whether there's some risk that we hit this inside Nexus during the expungement process. If so, that'll be harder to work around. It should generally work to retry, provided the APIs are idempotent.

Retry didn't work, but I gave it an explicit nexus URL and voila!
