Closed
Description
We updated dogfood today and ran into a new problem with instance provisions:
error_message_internal = saga ACTION error at node "sled_id": unexpected database error: type with ID 218 does not exist
After some digging, it appears that Nexus's OID caching is hanging onto a stale value that was changed by the schema migration. Namely, some sequence like this happens during the upgrade:
- After mupdate, new Nexus starts up, establishes its connections to CockroachDB, and populates its OID cache (which maps enum types like sled_resource_kind to their numeric database OID).
- We run the schema migration that drops and re-creates the sled_resource_kind enum. This invalidates the cache entry because now the name sled_resource_kind points to a different OID.
- In whatever context Diesel uses that cache (which appears to include at least INSERT statements), it uses the old OID, which does not correspond to any existing type any more, and we get this error from the database.
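To make the sequence concrete, here is a minimal toy model of it in Rust. It is only a sketch: FakeCatalog, OidCache, and replay_upgrade are hypothetical names, the OIDs 100 and 101 are made up, and none of this is Diesel's or Nexus's actual code. It just shows how a name-to-OID cache that is never invalidated keeps returning an OID that no longer exists.

```rust
use std::collections::HashMap;

/// Toy stand-in for the database's type catalog: enum type name -> OID.
/// (Hypothetical; real OIDs are assigned by the database.)
struct FakeCatalog {
    types: HashMap<String, u32>,
    next_oid: u32,
}

impl FakeCatalog {
    fn new() -> Self {
        FakeCatalog { types: HashMap::new(), next_oid: 100 }
    }

    /// Create a type and give it a fresh OID.
    fn create_type(&mut self, name: &str) -> u32 {
        let oid = self.next_oid;
        self.next_oid += 1;
        self.types.insert(name.to_string(), oid);
        oid
    }

    fn drop_type(&mut self, name: &str) {
        self.types.remove(name);
    }

    fn lookup(&self, name: &str) -> Option<u32> {
        self.types.get(name).copied()
    }

    /// True if some live type still has this OID.
    fn oid_exists(&self, oid: u32) -> bool {
        self.types.values().any(|&o| o == oid)
    }
}

/// Toy client-side cache: filled lazily, never invalidated.
#[derive(Default)]
struct OidCache {
    entries: HashMap<String, u32>,
}

impl OidCache {
    fn resolve(&mut self, catalog: &FakeCatalog, name: &str) -> Option<u32> {
        if let Some(&oid) = self.entries.get(name) {
            return Some(oid); // cache hit: the catalog is not consulted
        }
        let oid = catalog.lookup(name)?;
        self.entries.insert(name.to_string(), oid);
        Some(oid)
    }
}

/// Replays the three steps above.
/// Returns (cached OID, current OID, whether the cached OID still exists).
fn replay_upgrade() -> (u32, u32, bool) {
    let mut catalog = FakeCatalog::new();
    catalog.create_type("sled_resource_kind");
    let mut cache = OidCache::default();

    // Step 1: the client starts up and populates its cache.
    cache.resolve(&catalog, "sled_resource_kind").unwrap();

    // Step 2: the migration drops and re-creates the enum.
    catalog.drop_type("sled_resource_kind");
    catalog.create_type("sled_resource_kind");

    // Step 3: a later statement resolves the type through the cache.
    let cached = cache.resolve(&catalog, "sled_resource_kind").unwrap();
    let current = catalog.lookup("sled_resource_kind").unwrap();
    (cached, current, catalog.oid_exists(cached))
}
```

In this model the cached OID and the current OID differ after step 2, and the cached one no longer refers to any type, which is the situation the database error describes.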
The workaround is to restart the Nexus instances after this happens: when they come back up, they re-populate their cache with the correct value. The real fix will be to invalidate this cache after schema migrations, but we're still figuring out how to do that.
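As a rough illustration of what "invalidate the cache" would mean, here is a toy sketch in Rust. The OidCache type, its invalidate method, and the OIDs 100 and 101 are all assumptions made for the example; this is not Diesel's API, and it says nothing about how such a hook could actually be reached from Nexus.

```rust
use std::collections::HashMap;

/// Hypothetical name -> OID cache; not Diesel's real data structure.
#[derive(Default)]
struct OidCache {
    entries: HashMap<String, u32>,
}

impl OidCache {
    /// Return the cached OID, or call `lookup` (standing in for a
    /// catalog query against the database) on a miss.
    fn resolve<F>(&mut self, name: &str, lookup: F) -> Option<u32>
    where
        F: FnOnce(&str) -> Option<u32>,
    {
        if let Some(&oid) = self.entries.get(name) {
            return Some(oid);
        }
        let oid = lookup(name)?;
        self.entries.insert(name.to_string(), oid);
        Some(oid)
    }

    /// Forget every entry so the next resolve goes back to the database.
    fn invalidate(&mut self) {
        self.entries.clear();
    }
}

/// Returns the OID seen at startup, after the migration without
/// invalidation, and after invalidation.
fn resolve_across_migration() -> (u32, u32, u32) {
    let mut cache = OidCache::default();

    // Startup: the database answers with the original OID (made-up value).
    let populated = cache.resolve("sled_resource_kind", |_| Some(100)).unwrap();

    // The migration re-creates the type, so the database would now answer
    // 101, but the cache still holds the old entry and never asks.
    let stale = cache.resolve("sled_resource_kind", |_| Some(101)).unwrap();

    // After invalidation the lookup runs again and picks up the new OID.
    cache.invalidate();
    let fresh = cache.resolve("sled_resource_kind", |_| Some(101)).unwrap();

    (populated, stale, fresh)
}
```

Restarting Nexus has the same effect as calling invalidate here: the process starts with an empty cache and looks the type up again.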