Adding a previously-expunged sled leaves it expunged #5625

Open · jgallagher opened this issue Apr 25, 2024 · 2 comments

@jgallagher (Contributor)

On an a4x2 testbed, I went through this sequence:

  • Set up with three sleds (g0, g1, g3)
  • Add sled g2
  • Step through blueprints to give g2 NTP and Crucible zones
  • Use omdb db reconfigurator-save, reconfigurator-cli, and omdb nexus blueprints import to craft a blueprint that gives g2 a Nexus zone
  • Confirm the Nexus zone comes online and is externally reachable
  • Manually clean-slate the sled (disable sled-agent and mg-ddm, remove relevant files in /pool/int/*)
  • a4x2 hyperstop g2 to shut off the sled
  • omdb db nexus sleds expunge ... to expunge the sled
  • a4x2 hyperstart g2 to bring the sled back online
  • Reenable sled-agent and mg-ddm

At this point the sled shows up as uninitialized:

```
root@oxz_switch:~# omdb nexus sleds list-uninitialized
RACK_ID                              CUBBY SERIAL PART  REVISION
73d99599-d25c-41d6-a78b-246ceba80faa 2     g2     i86pc 0
```

And adding it appears to succeed:

```
root@oxz_switch:~# omdb -w nexus sleds add g2 i86pc
added sled g2 (i86pc)
```

However, after doing so, it still shows up in list-uninitialized:

```
root@oxz_switch:~# omdb nexus sleds list-uninitialized
RACK_ID                              CUBBY SERIAL PART  REVISION
73d99599-d25c-41d6-a78b-246ceba80faa 2     g2     i86pc 0
```

and we can apparently add it again (which also appears to succeed):

```
root@oxz_switch:~# omdb -w nexus sleds add g2 i86pc
added sled g2 (i86pc)
```

but it remains in the list-uninitialized output.

We expect this to work: the add should create a new sled_id for the same baseboard we've seen before. But looking in CRDB, we only have one row for g2 (the one created when we added it the first time), and it is still expunged:

```
root@[fd00:1122:3344:103::3]:32221/omicron> select id,time_created,time_modified,serial_number,sled_policy from sled;
                   id                  |         time_created          |         time_modified         | serial_number | sled_policy
---------------------------------------+-------------------------------+-------------------------------+---------------+--------------
  353582e7-6a8d-4de4-8aba-ea97ecd2f11b | 2024-04-25 17:24:32.016339+00 | 2024-04-25 19:06:50.980526+00 | g2            | expunged
  692fd79d-6f93-4787-9fa4-bbc47a3cc494 | 2024-04-25 17:18:30.451498+00 | 2024-04-25 17:18:30.451498+00 | g3            | in_service
  cacc195b-5c78-4e08-833c-5af089815eba | 2024-04-25 17:18:28.702021+00 | 2024-04-25 17:18:28.702021+00 | g0            | in_service
  fca3f869-15ce-49ee-a872-7a4129877efb | 2024-04-25 17:18:31.583653+00 | 2024-04-25 17:18:31.583653+00 | g1            | in_service
(4 rows)
```

Tracing through the add-sled flow, I think the problem is in DataStore::allocate_sled_underlay_subnet_octets(). This is the method that creates a new sled ID (line 309); however, hw_baseboard_id is tied to the baseboard, which hasn't changed. If we find a matching allocation on lines 315-318, we stop and return that prior allocation. In CRDB, we can see that we do have a prior allocation for this baseboard (as expected, since we've added this sled before!):

```
root@[fd00:1122:3344:103::3]:32221/omicron> select * from sled_underlay_subnet_allocation;
            hw_baseboard_id            |               rack_id                |               sled_id                | subnet_octet
---------------------------------------+--------------------------------------+--------------------------------------+---------------
  a41243b4-dc81-4db4-9223-4dd83dd00eea | 73d99599-d25c-41d6-a78b-246ceba80faa | 353582e7-6a8d-4de4-8aba-ea97ecd2f11b |           33
(1 row)

root@[fd00:1122:3344:103::3]:32221/omicron> select * from hw_baseboard_id where id = 'a41243b4-dc81-4db4-9223-4dd83dd00eea';
                   id                  | part_number | serial_number
---------------------------------------+-------------+----------------
  a41243b4-dc81-4db4-9223-4dd83dd00eea | i86pc       | g2
(1 row)
```

I think (?) that if allocate_sled_underlay_subnet_octets() finds a prior allocation with a matching baseboard, it should check the sled table for that allocation's sled_id; if that sled is expunged, it should ignore the allocation and continue looping, resulting in it either finding a non-expunged match or exiting the loop and returning new_allocation.
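
For illustration, here's a minimal sketch of that proposed fix. This is not the real Omicron code (the actual method works against CRDB via Diesel); `Allocation`, `allocate_octet`, and `sled_is_expunged` are hypothetical stand-ins:

```rust
use uuid::Uuid;

#[derive(Clone)]
struct Allocation {
    hw_baseboard_id: Uuid,
    sled_id: Uuid,
    subnet_octet: u8,
}

fn allocate_octet(
    existing: &[Allocation], // prior rows, ordered by subnet_octet
    baseboard_id: Uuid,
    sled_is_expunged: impl Fn(Uuid) -> bool, // lookup against the `sled` table
) -> Allocation {
    let mut new_octet = 33; // first octet handed to added sleds (cf. octet 33 above)
    for alloc in existing {
        if alloc.hw_baseboard_id == baseboard_id {
            // Today we return this allocation unconditionally, which is the
            // bug. The fix: only reuse it if its sled is still in service.
            if !sled_is_expunged(alloc.sled_id) {
                return alloc.clone();
            }
            // Expunged sled: ignore this allocation and keep looping.
        }
        // Track the next free octet past any existing allocation.
        if alloc.subnet_octet >= new_octet {
            new_octet = alloc.subnet_octet + 1;
        }
    }
    // No reusable allocation found: mint a fresh sled ID and octet,
    // which is what the issue expects for a re-added baseboard.
    Allocation {
        hw_baseboard_id: baseboard_id,
        sled_id: Uuid::new_v4(),
        subnet_octet: new_octet,
    }
}
```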

@jgallagher jgallagher added this to the 8 milestone Apr 25, 2024

@davepacheco (Collaborator)

Would implementing #5554 fix this?

@jgallagher (Contributor, Author)

> Would implementing #5554 fix this?

Probably, with at least two caveats:

@jgallagher jgallagher self-assigned this Apr 25, 2024
jgallagher added a commit that referenced this issue May 1, 2024
I ran into this as part of #5625, where adding a previously-expunged
sled appeared to succeed but didn't actually add anything new.

Today if we try to add a sled that is already _running_, we get a 500,
because Nexus fails when it tries to tell the sled-agent to start. But
with this PR, we fail earlier: adding a sled that already has a subnet
allocation fails before we even try to talk to the sled-agent, because
it means someone has already added this sled:

```
root@oxz_switch:~# omdb -w nexus sleds add g2 i86pc
added sled g2 (i86pc): 90413e40-8139-43b4-9081-365dab6e5579
root@oxz_switch:~# omdb -w nexus sleds add g2 i86pc
Error: adding sled

Caused by:
    Error Response: status: 400 Bad Request; headers: {"content-type": "application/json", "x-request-id": "9eb95a9f-3fe0-4f75-8846-13490b95500e", "content-length": "188", "date": "Tue, 30 Apr 2024 20:54:49 GMT"}; value: Error { error_code: Some("ObjectAlreadyExists"), message: "already exists: sled \"g2 / i86pc (90413e40-8139-43b4-9081-365dab6e5579)\"", request_id: "9eb95a9f-3fe0-4f75-8846-13490b95500e" }
```

This does change the external API slightly (204 -> 201 created, and we
now return the ID), but I think (?) that's probably fine since we have
no real consumers of that yet.
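
A hedged sketch of the early check this commit message describes, reusing the hypothetical `Allocation` type from the sketch above; the real change lives in Nexus's add-sled path and returns a typed `ObjectAlreadyExists` error, not a `String`:

```rust
use uuid::Uuid;

fn add_sled(
    baseboard_id: Uuid,
    existing: &[Allocation], // rows from sled_underlay_subnet_allocation
) -> Result<Uuid, String> {
    if existing.iter().any(|a| a.hw_baseboard_id == baseboard_id) {
        // A prior allocation means someone already added this sled; fail
        // with an "already exists" error before contacting the sled-agent,
        // instead of a 500 partway through.
        return Err("already exists: sled".to_string());
    }
    // ...otherwise allocate an octet, insert the sled row, and tell the
    // sled-agent on that baseboard to start.
    Ok(Uuid::new_v4())
}
```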
jgallagher added a commit that referenced this issue May 13, 2024
This adds the `sled_state` map described in RFD 457 to `Blueprint`. The
motivating driver here is decommissioning expunged sleds, which is on
the path to addressing #5625, but this PR only adds the field - the
planner carries forward states from its inputs, but otherwise makes no
changes, and the executor doesn't read the new field at all.
@morlandi7 morlandi7 modified the milestones: 8, 9 May 13, 2024
jgallagher added a commit that referenced this issue May 13, 2024
This builds on (and is currently pointed at) #5663, and is on the path
to fixing #5625, although there is still more work to do there. (Should
be small, and I'll start on it while this is in review to make sure it
really fixes it.) I don't plan on landing this before R8 is out the
door, but wanted to go ahead and open it up to review.

Much of this is fallout from our discussions about what it means to
decommission a sled. Major changes are:

* `SledFilter::All` has been renamed to `SledFilter::Commissioned`, and
no longer returns sleds that are in `SledState::Decommissioned`.
* The blueprint planner will now update the desired sled state to
`Decommissioned` for sleds which satisfy our conditions. (See
`do_plan_decommission()`.)
* The blueprint planner will carry forward the Omicron zones of
decommissioned sleds to child blueprints. Pruning these is #5552.
* The blueprint planner will _not_ carry forward a desired sled state of
`Decommissioned` once the inputs report that the sled has already been
decommissioned.
* The blueprint executor will decommission sleds that the planner said
to.
* Decommissioning a sled implicitly decommissions all its disks. (This
matches what happens with sled expungement, and should not interfere
with region replacement, which keys off of policy, not state.)
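
To make the planner bullets concrete, here is a rough sketch of the shape of `do_plan_decommission()`. The actual conditions live in the Omicron planner and may differ; the enums and the `all_zones_expunged` input are assumptions, not the real types:

```rust
#[derive(Clone, Copy, PartialEq)]
enum SledPolicy { InService, Expunged }

#[derive(Clone, Copy, PartialEq)]
enum SledState { Active, Decommissioned }

fn do_plan_decommission(
    policy: SledPolicy,
    current_state: SledState,
    all_zones_expunged: bool, // every Omicron zone on the sled is gone
) -> SledState {
    // Only an expunged sled whose zones have all been expunged moves to
    // Decommissioned; otherwise the planner carries the state forward.
    // (Per the bullets above, once the inputs report the sled as already
    // decommissioned, the desired state is no longer carried forward.)
    if policy == SledPolicy::Expunged
        && current_state == SledState::Active
        && all_zones_expunged
    {
        SledState::Decommissioned
    } else {
        current_state
    }
}
```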