On a customer system, after a MUPdate, we found that the normal schema migration that Nexus does was proceeding very slowly. In fact, we initially thought Nexus hadn't come up at all, because `omdb` could not connect to it:
```
$ omdb nexus update-status
note: Nexus URL not specified. Will pick one from DNS.
note: using DNS from system config (typically /etc/resolv.conf)
note: (if this is not right, use --dns-server to specify an alternate DNS server)
warning: failed to connect to NexusLockstep at ...:12232: connect "...:12232": Connection refused (os error 146)
warning: failed to connect to NexusLockstep at ...:12232: connect "...:12232": Connection refused (os error 146)
warning: failed to connect to NexusLockstep at ...:12232: connect "...:12232": Connection refused (os error 146)
Error: failed to connect to any instances of NexusLockstep
```
We looked at it for a while before realizing that Nexus was making forward progress, just very slowly. Nexus completes schema migrations before opening its listening sockets, which is why the connection attempts above were refused.
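To illustrate why "Connection refused" alone does not mean the process is down, here is a minimal Python sketch. It is not part of `omdb` or Nexus; `probe` and `demo` are hypothetical names, and the local socket only stands in for a service that has started but has not yet begun listening. A refused connection means only that nothing is accepting on that port yet, which is also what a service still doing startup work looks like from the outside.

```python
import socket


def probe(host: str, port: int, timeout: float = 1.0) -> str:
    """Classify one TCP connect attempt.

    Returns "listening", "refused", or "unreachable".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "listening"
    except ConnectionRefusedError:
        # Nothing is accepting connections on this port (yet).
        return "refused"
    except OSError:
        # Timeouts, routing failures, name resolution errors, and so on.
        return "unreachable"


def demo() -> list[str]:
    """Probe a local socket before and after it starts listening."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # The "service" exists and owns a port, but is still doing
        # startup work (standing in for schema migration here).
        srv.bind(("127.0.0.1", 0))
        port = srv.getsockname()[1]
        before = probe("127.0.0.1", port)

        # Startup work is finished; the service opens its listening socket.
        srv.listen()
        after = probe("127.0.0.1", port)
    finally:
        srv.close()
    return [before, after]
```

Calling `probe` in a loop against the address and port that `omdb` reports would show when the socket opens, but it cannot tell a slow startup apart from a crashed process. For that we still needed the logs and database state described below.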
For reference:
- I believe this was a MUPdate from R18 to R19.
- The rack was parked for MUPdate around 2026-04-30T16:00:00Z, or 9am PT.
We were able to tell that schema upgrade steps were happening very slowly, but it wasn't clear why. We looked at several sources of data:
- Nexus logs
- output from a DTrace script showing queries being executed by Nexus nodes
- the jobs table reported by CockroachDB
- CockroachDB logs
More details coming.