Skip to content

Conversation

@smklein
Copy link
Collaborator

@smklein smklein commented Aug 26, 2024

#6185 introduced a schema change which added a non-nullable column lldp_link_config_id to the switch_port_settings_link_config table.

However, on systems where switch_port_settings_link_config had rows, there was no "default" value, and the schema update appears to fail. This is currently the case on our dogfood system.

To mitigate: this PR makes the lldp_link_config_id column nullable, by updating the existing schema change (for cases where it could not complete previously) and by adding a new schema change (for cases where the switch_port_settings_link_config table was empty, and the schema change did previously complete).

Fixes #6433

.links
.iter()
.map(|link| link.lldp_link_config_id)
.filter_map(|link| link.lldp_link_config_id)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file deserves extra scrutiny - this function (switch_port_settings_get) and the call below (do_switch_port_settings_delete) can now see an empty vec of lldp_link_ids if this field is nullable.

@smklein smklein requested a review from Nieuwejaar August 26, 2024 18:18
@smklein smklein marked this pull request as ready for review August 26, 2024 18:21
Copy link
Contributor

@rcgoodfellow rcgoodfellow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @smklein. I read through the nexus/src/app/background/tasks/sync_switch_configuration.rs code which I believe is the primary consumer of this information. That code looks like it's prepared to deal with empty lists and generally the absence of LLDP config.

@smklein smklein enabled auto-merge (squash) August 26, 2024 21:15
@smklein smklein merged commit a032d2a into main Aug 26, 2024
@smklein smklein deleted the fix-lldp branch August 26, 2024 22:13
@davepacheco
Copy link
Collaborator

@smklein What do you think about adding a section to https://github.com/oxidecomputer/omicron/blob/main/schema/crdb/README.adoc documenting how to fix a broken schema update that's already landed in main? If I'm understanding the approach you took here, it's:

  1. change the old migration such that it won't be broken
  2. add a new schema migration so that systems that already ran the previous one will look the same as those that haven't, once the get to this new schema migration

Is this always guaranteed to work, including on systems that might have been updated to any version between "old" and "new"? I guess it's possible there are schema migrations between "old" and "new" that are broken by the fix to the old one and you'd have to update them in the same way.

@davepacheco
Copy link
Collaborator

It seems like it might also be useful to document constraints on schema updates, such as not adding a new required column without a default value.

This is a problem everybody has with schema migrations. And there are also other considerations too like not wanting to rewrite tables or take exclusive locks. I wonder if there are software packages out there that can look at a DDL statement and determine if it's going to be problematic in these ways.

nickryand pushed a commit to nickryand/omicron that referenced this pull request Aug 27, 2024
oxidecomputer#6185 introduced a schema
change which added a non-nullable column `lldp_link_config_id` to the
`switch_port_settings_link_config` table.

However, on systems where `switch_port_settings_link_config` had rows,
there was no "default" value, and the schema update appears to fail.
This is currently the case on our dogfood system.

To mitigate: this PR makes the `lldp_link_config_id` column nullable, by
updating the existing schema change (for cases where it could not
complete previously) and by adding a new schema change (for cases where
the `switch_port_settings_link_config` table was empty, and the schema
change **did** previously complete).

Fixes oxidecomputer#6433
smklein added a commit that referenced this pull request Aug 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Upgrade from CRDB schema version 88.0.0 failed: null value in column "lldp_link_config_id" violates not-null constraint

5 participants